mstk.topology.Molecule

class mstk.topology.Molecule(name='UNK')

A molecule is defined as atoms and the connectivity between them.

The term molecule does not strictly mean a chemical molecule. Some atoms may not be connected to any other atom in the same molecule. However, there cannot be bonds connecting atoms that belong to different molecules. Drude particles and virtual sites are also considered atoms. All bonds, angles, dihedrals and impropers should be defined explicitly.

Parameters:

name (str) –

id

Index of this molecule in the topology. -1 means the information has not been updated by the topology.

Type:

int

name

Name of the molecule, not necessarily unique

Type:

str

Methods

__init__([name])

add_angle(atom1, atom2, atom3[, check_existence])

Add an angle between three atoms.

add_atom(atom[, residue, index, update_topology])

Add an atom to this molecule.

add_bond(atom1, atom2[, order, check_existence])

Add a bond between two atoms.

add_dihedral(atom1, atom2, atom3, atom4[, ...])

Add a dihedral between four atoms.

add_improper(atom1, atom2, atom3, atom4[, ...])

Add an improper between four atoms.

add_residue(name, atoms[, refresh_residues])

Put a group of atoms into a new residue.

from_rdmol(rdmol[, name])

Initialize a molecule from an RDKit Mol object.

from_smiles(smiles)

Initialize a molecule from a SMILES string.

generate_angle_dihedral_improper([dihedral, ...])

Generate angles, dihedrals and impropers from bonds. The existing angles, dihedrals and impropers will be removed first. The atoms and bonds concerning Drude particles will be ignored.

generate_conformers([n_conformer])

Generate several conformers with RDKit.

generate_drude_particles(ff[, type_drude, ...])

Generate Drude particles from DrudeTerms in force field.

generate_virtual_sites(ff[, update_topology])

Generate virtual sites from VirtualSiteTerms in force field.

get_12_13_14_pairs()

Retrieve all the 1-2, 1-3 and 1-4 pairs based on the bond information.

get_adjacency_matrix()

get_distance_matrix([max_bond])

get_drude_pairs()

Retrieve all the Drude dipole pairs belonging to this molecule

get_sub_molecule(indexes[, deepcopy])

Extract a substructure from this molecule by indexes of atoms.

get_virtual_site_pairs()

Retrieve all the virtual site pairs belonging to this molecule

guess_connectivity_from_ff(ff[, bond_limit, ...])

Guess bonds, angles, dihedrals and impropers from force field.

is_similar_to(other)

Check if this molecule is similar to another molecule.

merge(molecules[, deepcopy])

Merge several molecules into a single molecule.

refresh_residues([update_topology])

Remove empty residues and update the id_in_mol attribute of each residue in this molecule

remove_atom(atom[, update_topology])

Remove an atom and all the bonds connected to the atom from this molecule.

remove_atoms(atoms[, update_topology])

Remove multiple atoms and all the bonds connected to these atoms from this molecule.

remove_connectivity(connectivity)

Remove a connectivity (bond, angle, dihedral or improper) from this molecule.

remove_drude_particles([update_topology])

Remove all Drude particles and the bonds belonging to them

remove_non_polar_hydrogens([update_topology])

Remove single-coordinated hydrogen atoms bonded to C and Si atoms

remove_residue(residue[, refresh_residues])

Remove a residue from this molecule, and put the relevant atoms into the default residue

remove_virtual_sites([update_topology])

Remove all virtual sites.

set_positions(positions)

Set the positions of all atoms in this molecule

split([consecutive])

Split the molecule into smaller pieces based on bond network.

split_residues()

Split the molecule into smaller pieces.

Attributes

angles

List of angles belonging to this molecule

atoms

List of atoms belonging to this molecule

bonds

List of bonds belonging to this molecule

dihedrals

List of dihedrals belonging to this molecule

has_position

Whether or not all the atoms in the molecule have positions

impropers

List of impropers belonging to this molecule

n_angle

Number of angles belonging to this molecule

n_atom

Number of atoms belonging to this molecule

n_bond

Number of bonds belonging to this molecule

n_dihedral

Number of dihedrals belonging to this molecule

n_improper

Number of impropers belonging to this molecule

n_residue

name

positions

Positions of all the atoms in this molecule

rdmol

The rdkit.Chem.Mol object associated with this molecule.

residues

All the residues in this molecule

topology

The topology this molecule belongs to

property name

static from_smiles(smiles)

Initialize a molecule from a SMILES string.

RDKit is used for parsing the SMILES. Hydrogen atoms will be added, and the positions of all atoms will be generated automatically. The SMILES string can contain the name of the molecule at the end, e.g. ‘CCCC butane’.

Parameters:

smiles (str) –

Returns:

molecule

Return type:

Molecule
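
The optional trailing name can be pictured with a small sketch. The helper split_smiles_and_name below is hypothetical and is not part of mstk or RDKit; it only assumes that the name is whatever follows the first run of whitespace.

```python
def split_smiles_and_name(text):
    """Split 'CCCC butane' into ('CCCC', 'butane'); the name is None if absent.

    Hypothetical helper for illustration only, not the mstk implementation.
    """
    parts = text.strip().split(None, 1)  # split on the first run of whitespace
    if not parts:
        raise ValueError('empty SMILES string')
    smiles = parts[0]
    name = parts[1].strip() if len(parts) > 1 else None
    return smiles, name


with_name = split_smiles_and_name('CCCC butane')
without_name = split_smiles_and_name('O')
```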

static from_rdmol(rdmol, name=None)

Initialize a molecule from an RDKit Mol object. If the RDKit Mol has conformers, the positions of the first conformer will be assigned to the atoms.

Parameters:
  • rdmol (rdkit.Chem.Mol) –

  • name (str) – The name of the molecule. If not provided, the formula will be used as the name.

Returns:

molecule

Return type:

Molecule

property rdmol

The rdkit.Chem.Mol object associated with this molecule.

It is required by the ZftTyper typing engine, which performs SMARTS matching on the molecule. The rdmol attribute will be assigned if the molecule is initialized from SMILES or an RDKit molecule. If it is not available, an RDKit molecule will be constructed from atoms and bonds. The positions will not be preserved.

Returns:

rdmol

Return type:

rdkit.Chem.Mol

generate_conformers(n_conformer=1)

Generate several conformers with RDKit.

The positions will be generated from elements and bonds only. Chiral centers will not be respected.

Parameters:

n_conformer (int) – How many conformers to generate

Returns:

molecules – Each conformer will be an independent molecule object

Return type:

list of Molecule

property topology

The topology this molecule belongs to

Returns:

topology

Return type:

Topology

add_atom(atom, residue=None, index=None, update_topology=True)

Add an atom to this molecule.

The id_in_mol attribute of all atoms will be updated after insertion.

TODO Make residue assignment more robust

Parameters:
  • atom (Atom) –

  • residue (Residue, Optional) – Add the atom to this residue. Make sure the residue belongs to this molecule. For performance reasons, this is not checked. If set to None, the atom will be added to the default residue.

  • index (int, Optional) – If None, the new atom will be the last atom. Otherwise, it will be inserted before the index-th atom.

  • update_topology (bool) – If True, the topology this molecule belongs to will update its atom list and assign ids to all atoms and residues. Otherwise, you have to re-init the topology manually so that the topological information is correct.
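
The effect of index on atom order and on id_in_mol can be illustrated with plain Python lists. This is a minimal sketch, not the mstk implementation: atoms are represented as dictionaries and insert_atom is a hypothetical helper.

```python
def insert_atom(atoms, atom, index=None):
    """Insert an atom and renumber id_in_mol for every atom in the list.

    Hypothetical stand-in for Molecule.add_atom; atoms are plain dicts here.
    """
    if index is None:
        atoms.append(atom)         # the new atom becomes the last atom
    else:
        atoms.insert(index, atom)  # the new atom goes before the index-th atom
    for i, a in enumerate(atoms):
        a['id_in_mol'] = i         # all ids are refreshed after insertion


atoms = [{'name': 'C1'}, {'name': 'C2'}]
insert_atom(atoms, {'name': 'H1'})           # appended at the end
insert_atom(atoms, {'name': 'O1'}, index=1)  # inserted before the atom at index 1
names = [a['name'] for a in atoms]
ids = [a['id_in_mol'] for a in atoms]
```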

remove_atom(atom, update_topology=True)

Remove an atom and all the bonds connected to the atom from this molecule.

The atom will also be removed from its residue. The id_in_mol attribute of all atoms will be updated after removal. The angles, dihedrals and impropers involving this atom are left untouched. Therefore, you may need to call generate_angle_dihedral_improper to refresh the connectivity.

Parameters:
  • atom (Atom) –

  • update_topology (bool) – If update_topology is True, the topology this molecule belongs to will update its atom list and assign id for all atoms and residues. Otherwise, you have to re-init the topology manually so that the topological information is correct.

remove_atoms(atoms, update_topology=True)

Remove multiple atoms and all the bonds connected to these atoms from this molecule.

The atoms will also be removed from their residues. The id_in_mol attribute of all atoms will be updated after removal. The angles, dihedrals and impropers involving these atoms are left untouched. Therefore, you may need to call generate_angle_dihedral_improper to refresh the connectivity.

Parameters:
  • atoms (list of Atom) –

  • update_topology (bool) – If update_topology is True, the topology this molecule belongs to will update its atom list and assign id for all atoms and residues. Otherwise, you have to re-init the topology manually so that the topological information is correct.

remove_non_polar_hydrogens(update_topology=True)

Remove single-coordinated hydrogen atoms bonded to C and Si atoms

Parameters:

update_topology (bool) – If update_topology is True, the topology this molecule belongs to will update its atom list and assign id for all atoms and residues. Otherwise, you have to re-init the topology manually so that the topological information is correct.

Returns:

ids_removed – The ids of the atoms removed

Return type:

list of int

add_residue(name, atoms, refresh_residues=True)

Put a group of atoms into a new residue. These atoms will be removed from their old residues.

Make sure that these atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • name (str) –

  • atoms (list of Atom) –

  • refresh_residues (bool) – If True, the residue list of this molecule will be updated. The id and id_in_mol for all residues will be assigned. This operation can be slow. If you have a lot of residues to add, it is more efficient to set it to False, and call refresh_residues manually after all residues are added.

Returns:

residue

Return type:

Residue

remove_residue(residue, refresh_residues=True)

Remove a residue from this molecule, and put the relevant atoms into the default residue

Make sure that this residue belongs to this molecule. For performance reasons, this is not checked.

Parameters:
  • residue (Residue) –

  • refresh_residues (bool) – If True, the residue list of this molecule will be updated. The id and id_in_mol for all residues will be assigned. This operation can be slow. If you have a lot of residues to remove, it is more efficient to set it to False, and call refresh_residues manually after all residues are removed.

refresh_residues(update_topology=True)

Remove empty residues and update the id_in_mol attribute of each residue in this molecule

Parameters:

update_topology (bool) – If True, the topology this molecule belongs to will assign id for all residues

add_bond(atom1, atom2, order=0, check_existence=False)

Add a bond between two atoms.

Make sure that both atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • atom1 (Atom) –

  • atom2 (Atom) –

  • check_existence (bool) – If set to True and there is already a bond between these two atoms, then do nothing and return None

Returns:

bond

Return type:

[Bond, None]

add_angle(atom1, atom2, atom3, check_existence=False)

Add an angle between three atoms.

The second atom is the central atom. Make sure that all three atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • atom1 (Atom) –

  • atom2 (Atom) –

  • atom3 (Atom) –

  • check_existence (bool) – If set to True and there is already an angle between these three atoms, then do nothing and return None

Returns:

angle

Return type:

[Angle, None]

add_dihedral(atom1, atom2, atom3, atom4, check_existence=False)

Add a dihedral between four atoms.

Make sure that all four atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • atom1 (Atom) –

  • atom2 (Atom) –

  • atom3 (Atom) –

  • atom4 (Atom) –

  • check_existence (bool) – If set to True and there is already a dihedral between these four atoms, then do nothing and return None

Returns:

dihedral

Return type:

[Dihedral, None]

add_improper(atom1, atom2, atom3, atom4, check_existence=False)

Add an improper between four atoms.

The first atom is the central atom. Make sure that all four atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • atom1 (Atom) –

  • atom2 (Atom) –

  • atom3 (Atom) –

  • atom4 (Atom) –

  • check_existence (bool) – If set to True and there is already an improper between these four atoms, then do nothing and return None

Returns:

improper

Return type:

[Improper, None]

remove_connectivity(connectivity)

Remove a connectivity (bond, angle, dihedral or improper) from this molecule.

Make sure that this connectivity belongs to this molecule. For performance reasons, this is not checked.

Note that when a bond is removed, the relevant angles, dihedrals and impropers are still there. You may call generate_angle_dihedral_improper to refresh the connectivity.

TODO This operation is slow. A batch version is required for better performance

Parameters:

connectivity ([Bond, Angle, Dihedral, Improper]) –

is_similar_to(other)

Check if this molecule is similar to another molecule.

It requires that the two molecules contain the same number of atoms. The corresponding atoms should have the same atom symbol, type and charge. The bonds should also be the same. Angles, dihedrals and impropers are not considered.

Parameters:

other (Molecule) –

Returns:

is

Return type:

bool
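
The comparison described above can be sketched with plain tuples. This is an illustration under stated assumptions, not the mstk implementation: atoms are (symbol, type, charge) tuples, bonds are pairs of atom indexes, and the charge tolerance of 1e-6 is an arbitrary choice made for this example.

```python
def is_similar(atoms_a, bonds_a, atoms_b, bonds_b, charge_tol=1e-6):
    """Compare two molecules atom by atom, then compare their bond sets.

    atoms_*: list of (symbol, type, charge); bonds_*: list of (i, j) index pairs.
    charge_tol is an assumption made for this sketch.
    """
    if len(atoms_a) != len(atoms_b):
        return False
    for (sym_a, type_a, q_a), (sym_b, type_b, q_b) in zip(atoms_a, atoms_b):
        if sym_a != sym_b or type_a != type_b or abs(q_a - q_b) > charge_tol:
            return False

    def normalized(bonds):
        # A bond i-j is the same bond as j-i
        return {tuple(sorted(b)) for b in bonds}

    # Angles, dihedrals and impropers are deliberately not compared
    return normalized(bonds_a) == normalized(bonds_b)


water = [('O', 'OW', -0.8), ('H', 'HW', 0.4), ('H', 'HW', 0.4)]
same = is_similar(water, [(0, 1), (0, 2)], water, [(1, 0), (2, 0)])
charged = [('O', 'OW', -1.0), ('H', 'HW', 0.5), ('H', 'HW', 0.5)]
different = is_similar(water, [(0, 1), (0, 2)], charged, [(0, 1), (0, 2)])
```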

get_adjacency_matrix()

get_distance_matrix(max_bond=None)

property n_atom

Number of atoms belonging to this molecule

Returns:

n

Return type:

int

property n_bond

Number of bonds belonging to this molecule

Returns:

n

Return type:

int

property n_angle

Number of angles belonging to this molecule

Returns:

n

Return type:

int

property n_dihedral

Number of dihedrals belonging to this molecule

Returns:

n

Return type:

int

property n_improper

Number of impropers belonging to this molecule

Returns:

n

Return type:

int

property n_residue

property atoms

List of atoms belonging to this molecule

Returns:

atoms

Return type:

list of Atom

property bonds

List of bonds belonging to this molecule

Returns:

bonds

Return type:

list of Bond

property angles

List of angles belonging to this molecule

Returns:

angles

Return type:

list of Angle

property dihedrals

List of dihedrals belonging to this molecule

Returns:

dihedrals

Return type:

list of Dihedral

property impropers

List of impropers belonging to this molecule

Returns:

impropers

Return type:

list of Improper

property residues

All the residues in this molecule

Returns:

residues

Return type:

list of Residue

property has_position

Whether or not all the atoms in the molecule have positions

Returns:

has

Return type:

bool

property positions

Positions of all the atoms in this molecule

Returns:

positions

Return type:

array_like

set_positions(positions)

Set the positions of all atoms in this molecule

Parameters:

positions (array_like) –

get_drude_pairs()

Retrieve all the Drude dipole pairs belonging to this molecule

Returns:

pairs – [(parent, drude)]

Return type:

list of tuple of Atom

get_virtual_site_pairs()

Retrieve all the virtual site pairs belonging to this molecule

Returns:

pairs – [(parent, atom_virtual_site)]

Return type:

list of tuple of Atom

get_12_13_14_pairs()

Retrieve all the 1-2, 1-3 and 1-4 pairs based on the bond information.

The pairs only concern real atoms. Drude particles will be ignored.

Returns:

  • pairs12 (list of tuple of Atom)

  • pairs13 (list of tuple of Atom)

  • pairs14 (list of tuple of Atom)
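
How such pairs follow from bonds alone can be sketched as below. This is not the mstk implementation: atoms are represented by integer indexes, get_pairs is a hypothetical helper, and the rule that a pair already present at a shorter separation is dropped from the longer lists (as happens in small rings) is an assumption of this sketch.

```python
from collections import defaultdict


def get_pairs(bonds):
    """Derive 1-2, 1-3 and 1-4 pairs from a list of (i, j) bonds."""
    neighbors = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    pairs12 = {tuple(sorted(b)) for b in bonds}

    # 1-3 pairs: two different neighbors of the same central atom
    pairs13 = set()
    for nbrs in neighbors.values():
        for i in nbrs:
            for k in nbrs:
                if i < k:
                    pairs13.add((i, k))

    # 1-4 pairs: outer atoms on either side of a central bond j-k
    pairs14 = set()
    for j, k in pairs12:
        for i in neighbors[j] - {k}:
            for l in neighbors[k] - {j}:
                if i != l:
                    pairs14.add(tuple(sorted((i, l))))

    # Assumption: the shortest separation wins when a pair appears twice
    pairs13 -= pairs12
    pairs14 -= pairs12 | pairs13
    return sorted(pairs12), sorted(pairs13), sorted(pairs14)


# Carbon backbone of butane: 0-1-2-3
chain12, chain13, chain14 = get_pairs([(0, 1), (1, 2), (2, 3)])
# Three-membered ring: every pair is already a 1-2 pair
ring12, ring13, ring14 = get_pairs([(0, 1), (1, 2), (0, 2)])
```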

generate_angle_dihedral_improper(dihedral=True, improper=True)

Generate angles, dihedrals and impropers from bonds. The existing angles, dihedrals and impropers will be removed first. The atoms and bonds concerning Drude particles will be ignored.

Parameters:
  • dihedral (bool) – Whether or not to generate dihedrals based on bonds

  • improper (bool) – Whether or not to generate impropers based on bonds
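
A minimal sketch of enumerating these terms from a bond list is given below. It is not the mstk implementation: atoms are integer indexes, enumerate_terms is a hypothetical helper, and the rule that an improper is created for every atom with exactly three neighbors is an assumption of this sketch. The ordering follows the conventions documented for add_angle (central atom second) and add_improper (central atom first).

```python
from collections import defaultdict
from itertools import combinations


def enumerate_terms(bonds, dihedral=True, improper=True):
    """Enumerate angles, dihedrals and impropers from (i, j) bonds."""
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Angle i-c-k: any two neighbors of a central atom c
    angles = [(i, c, k)
              for c in sorted(neighbors)
              for i, k in combinations(sorted(neighbors[c]), 2)]

    # Dihedral i-j-k-l: extend each central bond j-k on both sides
    dihedrals = []
    if dihedral:
        for j, k in bonds:
            for i in neighbors[j]:
                if i == k:
                    continue
                for l in neighbors[k]:
                    if l == j or l == i:
                        continue
                    dihedrals.append((i, j, k, l))

    # Improper c-i-j-k: assumed here for atoms with exactly three neighbors
    impropers = []
    if improper:
        impropers = [(c, *sorted(neighbors[c]))
                     for c in sorted(neighbors) if len(neighbors[c]) == 3]
    return angles, dihedrals, impropers


# Four-atom chain 0-1-2-3
chain = enumerate_terms([(0, 1), (1, 2), (2, 3)])
# Planar center: atom 0 bonded to atoms 1, 2 and 3
planar = enumerate_terms([(0, 1), (0, 2), (0, 3)])
```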

guess_connectivity_from_ff(ff, bond_limit=0.25, bond_tolerance=0.025, angle_tolerance=None, pbc='', cell=None)

Guess bonds, angles, dihedrals and impropers from force field.

It requires that atom types are defined and positions are available. The distance between nearby atoms will be calculated. If it is smaller than bond_limit, it will be compared with the equilibrium length in the FF. The bond will be added if a BondTerm is found in the FF and the deviation is smaller than bond_tolerance. Then angles will be constructed from bonds. If angle_tolerance is None, all angles will be added. If angle_tolerance is set (in degrees), an AngleTerm must be provided for these angles. An angle will be added only if the deviation between the angle and the equilibrium value in the FF is smaller than angle_tolerance. Dihedrals and impropers will be constructed from bonds and added if the relevant terms are present in the FF.

PBC is supported for determining bonds across the periodic cell. This is useful for simulating infinite structures. pbc can be ‘’, ‘x’, ‘y’, ‘xy’, ‘xz’ or ‘xyz’, which specifies the boundaries across which bonds are checked. cell should also be provided if pbc is not ‘’.

TODO Add support for triclinic cell

Parameters:
  • ff (ForceField) –

  • bond_limit (float) –

  • bond_tolerance (float) –

  • angle_tolerance (float) –

  • pbc (str) –

  • cell (UnitCell) –
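
The bond-guessing step can be sketched in plain Python as follows. This is an illustration, not the mstk implementation. The function guess_bonds, the dictionary of equilibrium lengths keyed by sorted type pairs, the rectangular box given as three edge lengths, and the use of nanometers are all assumptions made for this sketch.

```python
import math


def minimum_image(delta, length, periodic):
    """Wrap one distance component into [-length/2, length/2] if periodic."""
    if periodic:
        delta -= length * round(delta / length)
    return delta


def guess_bonds(positions, types, eq_lengths, box,
                pbc='', bond_limit=0.25, bond_tolerance=0.025):
    """Return (i, j) pairs whose distance matches an equilibrium bond length.

    eq_lengths maps a sorted pair of atom types to a length; it stands in
    for the BondTerms of a force field (hypothetical data structure).
    """
    bonds = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = [minimum_image(positions[j][k] - positions[i][k],
                               box[k], 'xyz'[k] in pbc)
                 for k in range(3)]
            dist = math.sqrt(sum(x * x for x in d))
            if dist >= bond_limit:
                continue  # too far apart to be considered at all
            r0 = eq_lengths.get(tuple(sorted((types[i], types[j]))))
            if r0 is not None and abs(dist - r0) < bond_tolerance:
                bonds.append((i, j))
    return bonds


# Two carbons on opposite sides of the x boundary of a 1 nm cubic box
positions = [(0.05, 0.5, 0.5), (0.90, 0.5, 0.5)]
types = ['CT', 'CT']
eq_lengths = {('CT', 'CT'): 0.153}
box = (1.0, 1.0, 1.0)
bonds_open = guess_bonds(positions, types, eq_lengths, box)
bonds_periodic = guess_bonds(positions, types, eq_lengths, box, pbc='x')
```

Without periodicity the two atoms are 0.85 nm apart and no bond is found; with pbc='x' the minimum-image distance is 0.15 nm, which is within the tolerance of the assumed equilibrium length.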

generate_drude_particles(ff, type_drude='DP_', seed=1, update_topology=True)

Generate Drude particles from DrudeTerms in force field.

The atom types should have been defined already. A Drude particle will not be generated for an atom if the DrudeTerm for its atom type cannot be found in the FF. Note that the existing Drude particles will be removed before generating new ones. The mass defined in the DrudeTerm will be transferred from the parent atom to the Drude particle. The Drude charge will be calculated from the DrudeTerm and transferred from the parent atom to the Drude particle. Bonds between parent atoms and Drude particles will be generated and added to the topology. If the AtomType and VdwTerm for the generated Drude particles are not found in the FF, these terms will be created and added to the FF.

Parameters:
  • ff (ForceField) –

  • type_drude (str) –

  • seed (int) –

  • update_topology (bool) –
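
The mass and charge bookkeeping can be illustrated with a short sketch. It is not the mstk implementation: atoms are plain dictionaries, attach_drude and detach_drude are hypothetical helpers, and the Drude mass and charge are passed in directly rather than derived from a DrudeTerm. The numbers are made up for the example.

```python
def attach_drude(parent, drude_mass, drude_charge):
    """Create a Drude particle and move mass and charge onto it from the parent."""
    drude = {'mass': drude_mass, 'charge': drude_charge}
    parent['mass'] -= drude_mass      # total mass of the pair is conserved
    parent['charge'] -= drude_charge  # total charge of the pair is conserved
    return drude


def detach_drude(parent, drude):
    """Reverse transfer, as done when Drude particles are removed."""
    parent['mass'] += drude['mass']
    parent['charge'] += drude['charge']


carbon = {'mass': 12.011, 'charge': -0.18}  # illustrative values only
drude = attach_drude(carbon, drude_mass=0.4, drude_charge=-1.0)
total_mass = carbon['mass'] + drude['mass']
total_charge = carbon['charge'] + drude['charge']
```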

remove_drude_particles(update_topology=True)

Remove all Drude particles and the bonds belonging to them

The charges and masses carried by Drude particles will be transferred back to parent atoms

Parameters:

update_topology (bool) –

generate_virtual_sites(ff, update_topology=True)

Generate virtual sites from VirtualSiteTerms in force field.

The atom types should have been defined already. Note that the existing virtual sites will be removed before generating new ones. Charges are not assigned by this method. Therefore, assign_charge_from_ff should be called to assign the charges on virtual sites.

Currently, only TIP4PSiteTerm has been implemented.

TODO Support other virtual site terms

Parameters:
  • ff (ForceField) –

  • update_topology (bool) –

remove_virtual_sites(update_topology=True)

Remove all virtual sites.

Parameters:

update_topology (bool) –

get_sub_molecule(indexes, deepcopy=True)

Extract a substructure from this molecule by indexes of atoms.

The substructure will not contain any bond, angle, dihedral or improper between atoms in the substructure and the remaining parts. Residue information will be reconstructed.

TODO Fix performance issue

Parameters:
  • indexes (list of int) – The atoms in the substructure will be in the same order as in indexes

  • deepcopy (bool) – If set to False, the atoms and connections in the substructure will be the same objects as the atoms and connections in this molecule. The data structure of this molecule will then be corrupted, and it should not be accessed afterwards.

Returns:

substructure

Return type:

Molecule
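
The selection rule for bonds can be sketched with integer indexes. This is an illustration rather than the mstk implementation, and sub_structure_bonds is a hypothetical helper: a bond is kept only if both of its atoms are selected, and atoms are renumbered in the order given by indexes.

```python
def sub_structure_bonds(indexes, bonds):
    """Keep bonds fully inside the selection and renumber their atoms.

    indexes: atom indexes of the substructure, in the desired order.
    bonds: (i, j) pairs using indexes of the original molecule.
    """
    new_index = {old: new for new, old in enumerate(indexes)}
    return [(new_index[i], new_index[j])
            for i, j in bonds
            if i in new_index and j in new_index]


# Chain 0-1-2-3; select atoms 2 and 1, in that order
kept = sub_structure_bonds([2, 1], [(0, 1), (1, 2), (2, 3)])
```

Here the bonds 0-1 and 2-3 cross the boundary of the selection and are dropped, while bond 1-2 survives with atom 2 renumbered to 0 and atom 1 renumbered to 1.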

static merge(molecules, deepcopy=True)

Merge several molecules into a single molecule.

Parameters:
  • molecules (list of Molecule) –

  • deepcopy (bool) – If True, the molecules will be deep-copied and remain intact after the merge. Otherwise, the atoms of the merged molecule and of the original molecules will be the same objects, and the original molecules will become unusable.

Returns:

merged

Return type:

Molecule
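
The difference between the two deepcopy modes is one of object identity, which a small sketch can make concrete. It is not the mstk implementation: molecules are plain lists of atom dictionaries and merge_atoms is a hypothetical helper.

```python
import copy


def merge_atoms(molecules, deepcopy=True):
    """Concatenate the atoms of several molecules into one list."""
    merged = []
    for atoms in molecules:
        for atom in atoms:
            # With deepcopy the originals stay independent of the result
            merged.append(copy.deepcopy(atom) if deepcopy else atom)
    return merged


mol_a = [{'name': 'O'}]
mol_b = [{'name': 'H'}]
merged_copy = merge_atoms([mol_a, mol_b])
merged_shared = merge_atoms([mol_a, mol_b], deepcopy=False)

merged_copy[0]['name'] = 'S'    # does not touch mol_a
merged_shared[1]['name'] = 'D'  # also changes mol_b, since the atom is shared
```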

split(consecutive=False)

Split the molecule into smaller pieces based on bond network.

The atoms in each piece will preserve the original order. However, atoms at the end of the original molecule may end up in a piece near the beginning, so that the order of all atoms across the pieces differs from the original order. To avoid this, set consecutive to True. In this case, all atoms in earlier pieces are guaranteed to have smaller atom ids than the atoms in later pieces.

Residue information will be reconstructed for each piece

Parameters:

consecutive (bool) –

Returns:

molecules

Return type:

list of Molecule
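
The ordering issue described above can be reproduced with a small connected-components sketch. It is not the mstk implementation: atoms are integer indexes, split_by_bonds is a hypothetical helper based on union-find, and the behavior of consecutive=True is not modeled here.

```python
def split_by_bonds(n_atom, bonds):
    """Group atom indexes into pieces connected by bonds.

    Atoms keep their original order inside each piece, and pieces are
    ordered by their first atom.
    """
    parent = list(range(n_atom))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in bonds:
        ri, rj = find(i), find(j)
        if ri != rj:
            # The smallest index of a piece always ends up as its root
            parent[max(ri, rj)] = min(ri, rj)

    pieces = {}
    for i in range(n_atom):
        pieces.setdefault(find(i), []).append(i)
    return list(pieces.values())


# Atom 4 is bonded to atom 0, so it lands in the first piece
pieces = split_by_bonds(5, [(0, 4), (1, 2)])
flattened = [i for piece in pieces for i in piece]
is_consecutive = flattened == sorted(flattened)
```

In this example the flattened order is 0, 4, 1, 2, 3, which differs from the original order. This is the situation that the consecutive option is meant to avoid.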

split_residues()

Split the molecule into smaller pieces. Each piece will be made of one residue.

Make sure that there are no inter-residue bonds, angles, dihedrals or impropers.

Returns:

molecules

Return type:

list of Molecule
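
A check for the precondition above could look like the following sketch. It is not part of mstk: inter_residue_bonds is a hypothetical helper, atoms are integer indexes, and residue_of maps each atom index to a residue index. Only bonds are checked here; angles, dihedrals and impropers could be checked in the same way.

```python
def inter_residue_bonds(residue_of, bonds):
    """Return the bonds whose two atoms sit in different residues."""
    return [(i, j) for i, j in bonds if residue_of[i] != residue_of[j]]


# Atoms 0 and 1 are in residue 0, atoms 2 and 3 in residue 1
residue_of = [0, 0, 1, 1]
offending = inter_residue_bonds(residue_of, [(0, 1), (1, 2), (2, 3)])
clean = inter_residue_bonds(residue_of, [(0, 1), (2, 3)])
```

An empty result means that splitting by residue would not cut any bond.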