pyvalem

pyvalem is a Python 3 package for handling chemical formulae. It defines a syntax for specifying a formula with some structural information but is not itself a format for representing all molecular structures: for this purpose there are already many standards, including InChI and SMILES. Rather, it provides a simple way to parse the chemical formulae of atoms, isotopes, atomic ions and small molecules and to transform them into HTML for use on webpages, URL-safe "slug" strings and canonical stoichiometric formula form. It can also calculate masses (either as isotope-weighted averages or absolute values for specific isotopologues).

It is hosted on GitHub and is available as free software under the GPL-v3 open source licence.

Example: the L-tyrosine zwitterion

As an example, the L-tyrosine zwitterion may be represented by the following ChemFormula object:

>>> from pyvalem.chem_formula import ChemFormula
>>> Ltyrosine = ChemFormula('L-(-)-(NH3+)CH(CH2C6H4OH)CO2-')

Its HTML representation (accessed with Ltyrosine.html) produces:

L-(-)-(NH3+)CH(CH2C6H4OH)CO2-

(Note that D- and L- prefixes appear in small caps.) Other useful attributes and methods include:

>>> print(Ltyrosine.stoichiometric_formula())
H11C9NO3
>>> print(Ltyrosine.stoichiometric_formula('alphabetical'))
C9H11NO3
>>> print(Ltyrosine.slug)
L-m___NH3_p_CH_CH2C6H4OH_CO2_m
>>> print(Ltyrosine.rmm)    # relative molecular mass
181.18854

Creating a pyvalem ChemFormula

A ChemFormula object may be initialized by passing it a string consisting of element symbols and their stoichiometries. Any total charge on the species is indicated at the end of the string by -2, -1 (or just -), +1 (or just +), +2, etc. Do not use underscores (_) for subscripts or carets (^) for superscripts. For example,

>>> ethanol = ChemFormula('CH3CH2OH')
>>> carbonate = ChemFormula('CO3-2')
>>> hydronium = ChemFormula('H3O+')
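The charge convention described above can be illustrated with a minimal sketch. This is not pyvalem's own parser: `parse_simple_formula` is a hypothetical helper written for illustration, and it handles only plain formulae with no brackets, isotopes or prefixes.

```python
import re

# Hypothetical helper for illustration only; not part of pyvalem.
_CHARGE_RE = re.compile(r'([+-])(\d*)$')         # trailing +, -, +2, -2, ...
_ATOMS_RE = re.compile(r'(?:[A-Z][a-z]?\d*)+')   # e.g. CH3CH2OH
_TOKEN_RE = re.compile(r'([A-Z][a-z]?)(\d*)')    # one symbol, optional count


def parse_simple_formula(s):
    """Return (atom counts, net charge) for a plain formula string."""
    charge = 0
    m = _CHARGE_RE.search(s)
    if m:
        sign = 1 if m.group(1) == '+' else -1
        charge = sign * int(m.group(2) or 1)     # a bare + or - means 1
        s = s[:m.start()]
    if not _ATOMS_RE.fullmatch(s):
        raise ValueError('not a plain formula: %r' % s)
    counts = {}
    for symbol, number in _TOKEN_RE.findall(s):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts, charge


print(parse_simple_formula('CH3CH2OH'))  # ({'C': 2, 'H': 6, 'O': 1}, 0)
print(parse_simple_formula('CO3-2'))     # ({'C': 1, 'O': 3}, -2)
print(parse_simple_formula('H3O+'))      # ({'H': 3, 'O': 1}, 1)
```

The sketch does not check that the symbols are real elements; a fuller parser would validate them against a table of the elements.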

Enclose specific isotopes in parentheses:

>>> f1 = ChemFormula('(235U)+4')
>>> f2 = ChemFormula('(12C)(16O)2')
>>> f3 = ChemFormula('(13C)HCl3')

Prefixes and formulae including bracketed moieties are now (v1.0b) supported:

>>> isobutane = ChemFormula('CH(CH3)3')
>>> Dalanine = ChemFormula('D-CH3CH(NH2)COOH')
>>> chlorocarbon = ChemFormula('1,1,2-C2H3Cl3')
>>> beta_lysine = ChemFormula('β-H2NC3H6CH(NH2)CH2CO2H')

The supported molecular formula prefixes are listed below. Multiple prefixes are separated by a hyphen (as, for example, in '(L)-α-CH3CH(NH2)COOH'). Note that some of the prefixes require unicode characters.

'(+)', '(-)', '(±)',
'D', 'L',
'(R)', '(S)',
'(E)', '(Z)',
'cis', 'trans',
's', 'a',
'Δ', 'Λ',
'α', 'β', 'γ',
'n', 'i', 't', 'neo', 'sec',
'o', 'm', 'p', 'ortho', 'meta', 'para'
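To show how a bracketed moiety such as the (CH3)3 in 'CH(CH3)3' contributes to the overall atom count, here is a small stack-based sketch. It is an illustration under stated assumptions rather than pyvalem's implementation: `expand_brackets` is a hypothetical name, and the routine ignores prefixes, charges and isotopes.

```python
import re
from collections import Counter

# Hypothetical helper for illustration only; not part of pyvalem.
_TOKEN_RE = re.compile(r'([A-Z][a-z]?)|(\d+)|([()])')


def expand_brackets(formula):
    """Return total atom counts for a formula that may contain (...)n groups."""
    stack = [Counter()]   # one Counter per currently open bracket level
    last = None           # most recent atom or closed group, for multipliers
    pos = 0
    for m in _TOKEN_RE.finditer(formula):
        if m.start() != pos:
            raise ValueError('unexpected character at position %d' % pos)
        pos = m.end()
        symbol, number, bracket = m.groups()
        if symbol:
            last = Counter({symbol: 1})
            stack[-1].update(last)
        elif number:
            if last is None:
                raise ValueError('multiplier with nothing to multiply')
            # The atom or group was already counted once; add the remainder.
            for atom, n in last.items():
                stack[-1][atom] += n * (int(number) - 1)
            last = None
        elif bracket == '(':
            stack.append(Counter())
            last = None
        else:  # closing bracket: merge the group into the enclosing level
            if len(stack) == 1:
                raise ValueError('unbalanced closing bracket')
            last = stack.pop()
            stack[-1].update(last)
    if pos != len(formula) or len(stack) != 1:
        raise ValueError('malformed formula')
    return dict(stack[0])


print(expand_brackets('CH(CH3)3'))        # {'C': 4, 'H': 10}
print(expand_brackets('CH3CH(NH2)COOH'))  # {'C': 3, 'H': 7, 'N': 1, 'O': 2}
```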

Outputting a pyvalem ChemFormula

The string used to initialize the ChemFormula object is stored and is returned by the __str__() method. An HTML version (with the stoichiometric numbers as subscripts and the charge in its conventional form as a superscript) is stored in the attribute html. For example,

>>> print(carbonate)
CO3-2
>>> print(carbonate.html)
CO<sub>3</sub><sup>2-</sup>
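The transformation from the input string to HTML shown above can be sketched for the simplest case. The function below is a hypothetical illustration, not pyvalem's code; it assumes a plain formula with an optional trailing charge and does not handle isotopes, prefixes or brackets.

```python
import re

# Hypothetical helper for illustration only; not part of pyvalem.
def simple_formula_to_html(s):
    """Subscript the stoichiometric numbers and superscript the charge."""
    charge_html = ''
    m = re.search(r'([+-])(\d*)$', s)
    if m:
        sign, digits = m.groups()
        # Conventional form: magnitude before the sign, and no explicit 1.
        magnitude = '' if digits in ('', '1') else digits
        charge_html = '<sup>{}{}</sup>'.format(magnitude, sign)
        s = s[:m.start()]
    body = re.sub(r'(\d+)', r'<sub>\1</sub>', s)
    return body + charge_html


print(simple_formula_to_html('CO3-2'))  # CO<sub>3</sub><sup>2-</sup>
print(simple_formula_to_html('H3O+'))   # H<sub>3</sub>O<sup>+</sup>
```

The first result matches the carbonate output quoted above; the second is what this sketch produces and has not been checked against pyvalem itself.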

The slug attribute holds a URL-safe string representing the formula; this is guaranteed to be unique only for formulas without isotopes, prefixes or bracketed moieties.

>>> print(hydronium.slug)
H3O_p
>>> print(chlorocarbon.slug)
1_1_2__C2H3Cl3

Returning the Stoichiometric Formula

The stoichiometric formula of a ChemFormula can be returned with the elements ordered by atomic number (the default), alphabetically, or in Hill notation: first the carbons, if any, then the hydrogens, then the other atoms in alphabetical order. For example,

>>> f = ChemFormula('CH2FCH2Cl')
>>> print(f.stoichiometric_formula())   # ordered by atomic number, the default
H4C2FCl
>>> print(f.stoichiometric_formula('alphabetical'))
C2ClFH4
>>> print(f.stoichiometric_formula('hill'))
C2H4ClF
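The three orderings can be reproduced from a dictionary of atom counts with a few lines of Python. This is a sketch only: `format_counts` and the abbreviated `ATOMIC_NUMBERS` table are hypothetical names introduced here, and the handling of carbon-free formulae follows the usual Hill convention (everything alphabetical), which is an assumption about pyvalem's behaviour.

```python
# Hypothetical helper and lookup table for illustration; not part of pyvalem.
ATOMIC_NUMBERS = {'H': 1, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Cl': 17}


def format_counts(counts, fmt='atomic number'):
    """Join atom counts into a stoichiometric formula in the given order."""
    if fmt == 'atomic number':
        symbols = sorted(counts, key=lambda s: ATOMIC_NUMBERS[s])
    elif fmt == 'alphabetical':
        symbols = sorted(counts)
    elif fmt == 'hill':
        if 'C' in counts:
            # Carbon first, then hydrogen, then the rest alphabetically.
            first = [s for s in ('C', 'H') if s in counts]
            rest = sorted(s for s in counts if s not in ('C', 'H'))
            symbols = first + rest
        else:
            # Assumed convention: with no carbon, order alphabetically.
            symbols = sorted(counts)
    else:
        raise ValueError('unknown ordering: %r' % fmt)
    # A count of 1 is left implicit, as in the examples above.
    return ''.join(s + (str(counts[s]) if counts[s] > 1 else '')
                   for s in symbols)


counts = {'C': 2, 'H': 4, 'F': 1, 'Cl': 1}    # CH2FCH2Cl
print(format_counts(counts))                  # H4C2FCl
print(format_counts(counts, 'alphabetical'))  # C2ClFH4
print(format_counts(counts, 'hill'))          # C2H4ClF
```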

Relative Molecular Mass

The relative molecular mass, relative to C=12, for average isotopic abundances is held in the attribute rmm:

>>> print(ethanol.rmm)
46.06844

Where specific isotopes are given, however, the isotope mass is used instead. For example, the most abundant isotopologue of ethanol, ¹²C₂¹H₅¹⁶O¹H:

>>> f = ChemFormula('(12C)2(1H)5(16O)(1H)')
>>> print(f.rmm)
46.041865
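The two figures above can be checked by hand as a weighted sum over the atoms. The sketch below is not pyvalem's implementation: `relative_mass` is a hypothetical helper, and the small tables hold assumed values (older tabulated atomic weights and isotope masses) chosen because they reproduce the numbers quoted in this document.

```python
# Assumed values for illustration; pyvalem's own data tables may differ.
ATOMIC_WEIGHTS = {'H': 1.00794, 'C': 12.0107, 'N': 14.0067, 'O': 15.9994}
ISOTOPE_MASSES = {'1H': 1.00782503207, '12C': 12.0, '16O': 15.99491461956}


def relative_mass(counts):
    """Sum masses over a dict of counts keyed by element or isotope label.

    Keys such as 'C' use the abundance-weighted atomic weight; keys such
    as '12C' (mass number first) use the mass of that specific isotope.
    """
    total = 0.0
    for species, n in counts.items():
        if species[0].isdigit():
            total += n * ISOTOPE_MASSES[species]
        else:
            total += n * ATOMIC_WEIGHTS[species]
    return total


# Ethanol, C2H6O, with average isotopic abundances
print(round(relative_mass({'C': 2, 'H': 6, 'O': 1}), 5))       # 46.06844
# The isotopologue (12C)2(1H)5(16O)(1H)
print(round(relative_mass({'12C': 2, '1H': 6, '16O': 1}), 6))  # 46.041865
```

With the same tables, the L-tyrosine stoichiometry C9H11NO3 gives 181.18854, in agreement with the value shown in the first example.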

The support of the Atomic and Molecular Data Unit at the IAEA, the Data Center for Plasma Properties of the Korean National Fusion Research Institute, and the Virtual Atomic and Molecular Data Centre in the development of pyvalem is gratefully acknowledged.