mstk.topology.Molecule

class mstk.topology.Molecule(name='UNK')

A molecule is defined as atoms and the connectivity between them.

The term molecule here does not strictly mean a chemical molecule. Some atoms may not be connected to any other atoms in the same molecule. However, bonds cannot connect atoms belonging to different molecules. Drude particles and virtual sites are also considered as atoms. All bonds, angles, dihedrals and impropers should be defined explicitly.

Parameters:

name (str) –

id

Index of this molecule in the topology. -1 means the information has not been updated by the topology yet.

Type:

int

name

Name of the molecule, not necessarily unique

Type:

str

Methods

__init__([name])

add_angle(atom1, atom2, atom3[, check_existence])

Add an angle between three atoms.

add_atom(atom[, residue, index, update_topology])

Add an atom to this molecule.

add_bond(atom1, atom2[, order, check_existence])

Add a bond between two atoms.

add_dihedral(atom1, atom2, atom3, atom4[, ...])

Add a dihedral between four atoms.

add_improper(atom1, atom2, atom3, atom4[, ...])

Add an improper between four atoms.

add_residue(name, atoms[, refresh_residues])

Put a group of atoms into a new residue.

from_rdmol(rdmol[, name])

Initialize a molecule from an RDKit Mol object.

from_smiles(smiles[, generate_positions])

Initialize a molecule from a SMILES string.

generate_angle_dihedral_improper([dihedral, ...])

Generate angles, dihedrals and impropers from bonds. The existing angles, dihedrals and impropers will be removed first. Atoms and bonds concerning Drude particles will be ignored.

generate_conformers([n_conformer])

Generate several conformers with RDKit.

generate_drude_particles(ff[, type_drude, ...])

Generate Drude particles from DrudePolarTerms in force field.

generate_virtual_sites(ff[, update_topology])

Generate virtual sites from VirtualSiteTerms in force field.

get_12_13_14_pairs()

Retrieve all the 1-2, 1-3 and 1-4 pairs based on the bond information.

get_adjacency_matrix()

Get the adjacency matrix of this molecule.

get_distance_matrix([max_bond])

Get the distance matrix of this molecule.

get_drude_pairs()

Retrieve all the Drude dipole pairs belonging to this molecule

get_sub_molecule(indexes[, deepcopy])

Extract a substructure from this molecule by indexes of atoms.

get_virtual_site_pairs()

Retrieve all the virtual site pairs belonging to this molecule

guess_bonds_from_ff(ff[, max_bond_length, ...])

Guess bonds from force field.

is_similar_to(other)

Check if this molecule is similar to another molecule.

merge(molecules[, deepcopy])

Merge several molecules into a single molecule.

refresh_residues([update_topology])

Remove empty residues and update the id_in_mol attribute of each residue in this molecule

remove_atom(atom[, update_topology])

Remove an atom and all the bonds connected to the atom from this molecule.

remove_atoms(atoms[, update_topology])

Remove multiple atoms and all the bonds connected to these atoms from this molecule.

remove_connectivity(connectivity)

Remove a connectivity (bond, angle, dihedral or improper) from this molecule.

remove_drude_particles([update_topology])

Remove all Drude particles and the bonds belonging to Drude particles

remove_non_polar_hydrogens([update_topology])

Remove single-coordinated hydrogen atoms bonded to C and Si atoms

remove_residue(residue[, refresh_residues])

Remove a residue from this molecule, and put the relevant atoms into the default residue

remove_virtual_sites([update_topology])

Remove all virtual sites.

set_positions(positions)

Set the positions of all atoms in this molecule

split([consecutive])

Split the molecule into smaller pieces based on bond network.

split_residues([consecutive])

Split the molecule into smaller pieces based on bond network between residues.

Attributes

angles

List of angles belonging to this molecule

atoms

List of atoms belonging to this molecule

bonds

List of bonds belonging to this molecule

dihedrals

List of dihedrals belonging to this molecule

has_position

Whether or not all the atoms in the molecule have positions

impropers

List of impropers belonging to this molecule

n_angle

Number of angles belonging to this molecule

n_atom

Number of atoms belonging to this molecule

n_bond

Number of bonds belonging to this molecule

n_dihedral

Number of dihedrals belonging to this molecule

n_improper

Number of impropers belonging to this molecule

n_residue

name

positions

Positions of all the atoms in this molecule

rdmol

The rdkit.Chem.Mol object associated with this molecule.

residues

All the residues in this molecule

topology

The topology this molecule belongs to

property name

static from_smiles(smiles, generate_positions=True)

Initialize a molecule from a SMILES string.

RDKit is used to parse the SMILES string. Hydrogen atoms will be added. By default, the positions of all atoms will be generated with RDKit. The SMILES string can contain the name of the molecule at the end, e.g. ‘CCCC butane’.

Parameters:
  • smiles (str) –

  • generate_positions (bool) –

Returns:

molecule

Return type:

Molecule
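
The ‘CCCC butane’ convention can be illustrated with a small pure-Python sketch. The helper split_smiles_and_name is hypothetical and not part of mstk; it assumes that the name is whatever follows the first run of whitespace, which the description above does not state explicitly.

```python
def split_smiles_and_name(text):
    """Split 'SMILES name' into (smiles, name); name is None if absent.

    Assumption: the SMILES part and the optional name are separated by whitespace.
    """
    parts = text.strip().split(maxsplit=1)
    smiles = parts[0]
    name = parts[1] if len(parts) > 1 else None
    return smiles, name


print(split_smiles_and_name('CCCC butane'))  # ('CCCC', 'butane')
print(split_smiles_and_name('CCO'))          # ('CCO', None)
```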

static from_rdmol(rdmol, name=None)

Initialize a molecule from an RDKit Mol object. If the RDKit Mol has conformers, the positions of the first conformer will be assigned to the atoms.

Parameters:
  • rdmol (rdkit.Chem.Mol) –

  • name (str) – The name of the molecule. If not provided, the formula will be used as the name.

Returns:

molecule

Return type:

Molecule

property rdmol

The rdkit.Chem.Mol object associated with this molecule.

It is required by the SmartsTyper typing engine, which performs SMARTS matching on the molecule. The rdmol attribute will be assigned if the molecule is initialized from a SMILES string or an RDKit Mol. If it is not available, an RDKit molecule will be constructed from the atoms and bonds. The positions will not be preserved.

Returns:

rdmol

Return type:

rdkit.Chem.Mol

generate_conformers(n_conformer=1)

Generate several conformers with RDKit.

The positions will be generated from elements and bonds only. Chiral centers will not be respected.

Parameters:

n_conformer (int) – How many conformers to generate

Returns:

positions_list

Return type:

list of array_like

property topology

The topology this molecule belongs to

Returns:

topology

Return type:

Topology

add_atom(atom, residue=None, index=None, update_topology=True)

Add an atom to this molecule.

The id_in_mol attribute of all atoms will be updated after insertion.

TODO Make residue assignment more robust

Parameters:
  • atom (Atom) –

  • residue (Residue, Optional) – Add the atom to this residue. Make sure the residue belongs to this molecule. For performance reasons, this is not checked. If set to None, the atom will be added to the default residue.

  • index (int, Optional) – If None, the new atom will be the last atom. Otherwise, it will be inserted before the index-th atom.

  • update_topology (bool) – If True, the topology this molecule belongs to will update its atom list and assign id for all atoms and residues. Otherwise, you have to re-init the topology manually so that the topological information is correct.
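
The effect of index on atom order and on id_in_mol can be pictured with a toy model. This is only a sketch under the assumption that id_in_mol equals the position of the atom in the molecule's atom list; ToyAtom and insert_atom are hypothetical stand-ins, not mstk classes or functions.

```python
from dataclasses import dataclass


@dataclass
class ToyAtom:
    # Hypothetical stand-in for an atom; only the fields needed here
    name: str
    id_in_mol: int = -1


def insert_atom(atoms, atom, index=None):
    """Append when index is None, otherwise insert before the index-th atom."""
    if index is None:
        atoms.append(atom)
    else:
        atoms.insert(index, atom)
    # Renumber every atom so that id_in_mol matches its list position
    for i, a in enumerate(atoms):
        a.id_in_mol = i


atoms = []
insert_atom(atoms, ToyAtom('C1'))
insert_atom(atoms, ToyAtom('C2'))
insert_atom(atoms, ToyAtom('H1'), index=1)
print([(a.name, a.id_in_mol) for a in atoms])
# [('C1', 0), ('H1', 1), ('C2', 2)]
```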

remove_atom(atom, update_topology=True)

Remove an atom and all the bonds connected to the atom from this molecule.

The atom will also be removed from its residue. The id_in_mol attribute of all atoms will be updated after removal. The angles, dihedrals and impropers involving this atom are left untouched. Therefore, you may call generate_angle_dihedral_improper to refresh the connectivity.

Parameters:
  • atom (Atom) –

  • update_topology (bool) – If update_topology is True, the topology this molecule belongs to will update its atom list and assign id for all atoms and residues. Otherwise, you have to re-init the topology manually so that the topological information is correct.

remove_atoms(atoms, update_topology=True)

Remove multiple atoms and all the bonds connected to these atoms from this molecule.

The atoms will also be removed from their residues. The id_in_mol attribute of all atoms will be updated after removal. The angles, dihedrals and impropers involving these atoms are left untouched. Therefore, you may call generate_angle_dihedral_improper to refresh the connectivity.

Parameters:
  • atoms (list of Atom) –

  • update_topology (bool) – If update_topology is True, the topology this molecule belongs to will update its atom list and assign id for all atoms and residues. Otherwise, you have to re-init the topology manually so that the topological information is correct.

remove_non_polar_hydrogens(update_topology=True)

Remove single-coordinated hydrogen atoms bonded to C and Si atoms

Parameters:

update_topology (bool) – If update_topology is True, the topology this molecule belongs to will update its atom list and assign id for all atoms and residues. Otherwise, you have to re-init the topology manually so that the topological information is correct.

Returns:

ids_removed – The list of id_in_mol of atoms removed

Return type:

list of int
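
The selection rule (hydrogen atoms with exactly one bond, bonded to C or Si) can be sketched on plain lists. The function find_non_polar_hydrogens is a hypothetical illustration that works on element symbols and index pairs; it only finds the indexes and does not remove anything.

```python
def find_non_polar_hydrogens(symbols, bonds):
    """Return indexes of H atoms with exactly one bond, bonded to C or Si."""
    neighbors = {i: [] for i in range(len(symbols))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    ids = []
    for i, symbol in enumerate(symbols):
        if symbol != 'H' or len(neighbors[i]) != 1:
            continue
        if symbols[neighbors[i][0]] in ('C', 'Si'):
            ids.append(i)
    return ids


# Methanol: C0, O1, three H on carbon (2, 3, 4) and one H on oxygen (5)
symbols = ['C', 'O', 'H', 'H', 'H', 'H']
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
print(find_non_polar_hydrogens(symbols, bonds))  # [2, 3, 4]
```

The hydroxyl hydrogen (index 5) is kept because its only neighbor is oxygen.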

add_residue(name, atoms, refresh_residues=True)

Put a group of atoms into a new residue. These atoms will be removed from their old residues.

Make sure that these atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • name (str) –

  • atoms (list of Atom) –

  • refresh_residues (bool) – If True, the residue list of this molecule will be updated. The id and id_in_mol for all residues will be assigned. This operation can be slow. If you have a lot of residues to add, it is more efficient to set it to False, and call refresh_residues manually after all residues are added.

Returns:

residue

Return type:

Residue

remove_residue(residue, refresh_residues=True)

Remove a residue from this molecule, and put the relevant atoms into the default residue

Make sure that this residue belongs to this molecule. For performance reasons, this is not checked.

Parameters:
  • residue (Residue) –

  • refresh_residues (bool) – If True, the residue list of this molecule will be updated. The id and id_in_mol for all residues will be assigned. This operation can be slow. If you have a lot of residues to remove, it is more efficient to set it to False, and call refresh_residues manually after all residues are removed.

refresh_residues(update_topology=True)

Remove empty residues and update the id_in_mol attribute of each residue in this molecule

Parameters:

update_topology (bool) – If True, the topology this molecule belongs to will assign id for all residues

add_bond(atom1, atom2, order=0, check_existence=False)

Add a bond between two atoms.

Make sure that both atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • atom1 (Atom) –

  • atom2 (Atom) –

  • check_existence (bool) – If set to True and there is already a bond between these two atoms, then do nothing and return None

Returns:

bond

Return type:

[Bond, None]

add_angle(atom1, atom2, atom3, check_existence=False)

Add an angle between three atoms.

The second atom is the central atom. Make sure that all three atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • atom1 (Atom) –

  • atom2 (Atom) –

  • atom3 (Atom) –

  • check_existence (bool) – If set to True and there is already an angle between these three atoms, then do nothing and return None

Returns:

angle

Return type:

[Angle, None]

add_dihedral(atom1, atom2, atom3, atom4, check_existence=False)

Add a dihedral between four atoms.

Make sure that all four atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • atom1 (Atom) –

  • atom2 (Atom) –

  • atom3 (Atom) –

  • atom4 (Atom) –

  • check_existence (bool) – If set to True and there is already a dihedral between these four atoms, then do nothing and return None

Returns:

dihedral

Return type:

[Dihedral, None]

add_improper(atom1, atom2, atom3, atom4, check_existence=False)

Add an improper between four atoms.

The first atom is the central atom. Make sure that all four atoms belong to this molecule. For performance reasons, this is not checked.

Parameters:
  • atom1 (Atom) –

  • atom2 (Atom) –

  • atom3 (Atom) –

  • atom4 (Atom) –

  • check_existence (bool) – If set to True and there is already an improper between these four atoms, then do nothing and return None

Returns:

improper

Return type:

[Improper, None]

remove_connectivity(connectivity)

Remove a connectivity (bond, angle, dihedral or improper) from this molecule.

Make sure that this connectivity belongs to this molecule. For performance reasons, this is not checked.

Note that when a bond is removed, the relevant angles, dihedrals and impropers are still there. Usually you have to call generate_angle_dihedral_improper to keep the connectivity consistent.

# TODO This operation is slow. A batch version is required for better performance

Parameters:

connectivity ([Bond, Angle, Dihedral, Improper]) –

is_similar_to(other)

Check if this molecule is similar to another molecule.

It requires that the two molecules contain the same number of atoms. The corresponding atoms should have the same symbol, type and charge. The bonds should also be the same. Angles, dihedrals and impropers are not considered.

Parameters:

other (Molecule) –

Returns:

is

Return type:

bool
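
The comparison described above can be sketched on plain data. This is an illustration only: is_similar is a hypothetical function, atoms are represented as (symbol, type, charge) tuples, bonds as index pairs, and the charge tolerance charge_tol is an assumption that the description does not mention.

```python
import math


def _normalize(bonds):
    """Treat (i, j) and (j, i) as the same bond."""
    return {frozenset(bond) for bond in bonds}


def is_similar(atoms_a, bonds_a, atoms_b, bonds_b, charge_tol=1e-6):
    """Compare atom count, per-atom symbol/type/charge, and the bond set."""
    if len(atoms_a) != len(atoms_b):
        return False
    for (sym_a, type_a, q_a), (sym_b, type_b, q_b) in zip(atoms_a, atoms_b):
        if sym_a != sym_b or type_a != type_b:
            return False
        if not math.isclose(q_a, q_b, abs_tol=charge_tol):
            return False
    # Angles, dihedrals and impropers are deliberately not compared
    return _normalize(bonds_a) == _normalize(bonds_b)


water = [('O', 'OW', -0.8), ('H', 'HW', 0.4), ('H', 'HW', 0.4)]
scaled = [('O', 'OW', -1.0), ('H', 'HW', 0.5), ('H', 'HW', 0.5)]
print(is_similar(water, [(0, 1), (0, 2)], water, [(1, 0), (2, 0)]))   # True
print(is_similar(water, [(0, 1), (0, 2)], scaled, [(0, 1), (0, 2)]))  # False
```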

get_adjacency_matrix()

Get the adjacency matrix of this molecule.

The matrix is a symmetric matrix with shape (n_atom, n_atom). The element (i, j) is True if there is a bond between atom i and atom j. Otherwise, it’s False. The diagonal elements are always False.

Returns:

matrix

Return type:

np.ndarray, shape=(n_atom, n_atom), dtype=bool
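
To make the layout of the matrix concrete, here is a minimal pure-Python sketch. It uses nested lists instead of np.ndarray to stay dependency-free, and adjacency_matrix is a hypothetical helper that takes bonds as index pairs rather than Bond objects.

```python
def adjacency_matrix(n_atom, bonds):
    """Symmetric boolean matrix; element (i, j) is True if i and j are bonded."""
    matrix = [[False] * n_atom for _ in range(n_atom)]
    for i, j in bonds:
        matrix[i][j] = True
        matrix[j][i] = True
    return matrix


# Propane heavy atoms: C0-C1-C2
for row in adjacency_matrix(3, [(0, 1), (1, 2)]):
    print(row)
# [False, True, False]
# [True, False, True]
# [False, True, False]
```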

get_distance_matrix(max_bond=None)

Get the distance matrix of this molecule.

The matrix is a symmetric matrix with shape (n_atom, n_atom). The element (i, j) is the shortest path length between atom i and atom j. If max_bond is set and the shortest path length is larger than max_bond, then the element is set to 0. If there is no path between atom i and atom j, then the element is set to 0. The diagonal elements are always 0.

Parameters:

max_bond (int, optional) –

Returns:

matrix

Return type:

np.ndarray, shape=(n_atom, n_atom), dtype=int
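
The rules above (shortest path length in bonds, 0 beyond max_bond, 0 for unreachable atoms) can be sketched with a breadth-first search. This is an illustration on index pairs and nested lists, not the mstk implementation; distance_matrix is a hypothetical helper.

```python
from collections import deque


def distance_matrix(n_atom, bonds, max_bond=None):
    """Shortest path length in bonds; 0 if unreachable or beyond max_bond."""
    neighbors = [[] for _ in range(n_atom)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    matrix = [[0] * n_atom for _ in range(n_atom)]
    for start in range(n_atom):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if max_bond is not None and dist[cur] >= max_bond:
                continue  # do not walk further than max_bond bonds
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for end, d in dist.items():
            matrix[start][end] = d
    return matrix


# Butane heavy atoms C0-C1-C2-C3 plus an isolated atom 4
bonds = [(0, 1), (1, 2), (2, 3)]
print(distance_matrix(5, bonds)[0])              # [0, 1, 2, 3, 0]
print(distance_matrix(5, bonds, max_bond=2)[0])  # [0, 1, 2, 0, 0]
```

Because 0 is used both for the diagonal and for "no path", a caller that needs to tell the two apart has to check i == j separately.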

property n_atom

Number of atoms belonging to this molecule

Returns:

n

Return type:

int

property n_bond

Number of bonds belonging to this molecule

Returns:

n

Return type:

int

property n_angle

Number of angles belonging to this molecule

Returns:

n

Return type:

int

property n_dihedral

Number of dihedrals belonging to this molecule

Returns:

n

Return type:

int

property n_improper

Number of impropers belonging to this molecule

Returns:

n

Return type:

int

property n_residue

property atoms

List of atoms belonging to this molecule

Returns:

atoms

Return type:

list of Atom

property bonds

List of bonds belonging to this molecule

Returns:

bonds

Return type:

list of Bond

property angles

List of angles belonging to this molecule

Returns:

angles

Return type:

list of Angle

property dihedrals

List of dihedrals belonging to this molecule

Returns:

dihedrals

Return type:

list of Dihedral

property impropers

List of impropers belonging to this molecule

Returns:

impropers

Return type:

list of Improper

property residues

All the residues in this molecule

Returns:

residues

Return type:

list of Residue

property has_position

Whether or not all the atoms in the molecule have positions

Returns:

has

Return type:

bool

property positions

Positions of all the atoms in this molecule

Returns:

positions

Return type:

array_like

set_positions(positions)

Set the positions of all atoms in this molecule

Parameters:

positions (array_like) –

get_drude_pairs()

Retrieve all the Drude dipole pairs belonging to this molecule

Returns:

pairs – [(parent, drude)]

Return type:

list of tuple of Atom

get_virtual_site_pairs()

Retrieve all the virtual site pairs belonging to this molecule

Returns:

pairs – [(parent, atom_virtual_site)]

Return type:

list of tuple of Atom

get_12_13_14_pairs()

Retrieve all the 1-2, 1-3 and 1-4 pairs based on the bond information.

The pairs only concern real atoms. Drude particles will be ignored.

Returns:

  • pairs12 (list of tuple of Atom)

  • pairs13 (list of tuple of Atom)

  • pairs14 (list of tuple of Atom)
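
One way to derive such pairs from the bond list alone is sketched below. It is an illustration, not the mstk implementation: pairs_12_13_14 is a hypothetical helper working on atom indexes, and it assumes that a pair is reported only under its closest relationship, which matters in small rings.

```python
from itertools import combinations


def pairs_12_13_14(n_atom, bonds):
    """Return sorted lists of 1-2, 1-3 and 1-4 index pairs."""
    neighbors = [set() for _ in range(n_atom)]
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    pairs12 = {tuple(sorted(bond)) for bond in bonds}

    # 1-3: two different neighbors of the same central atom
    pairs13 = set()
    for center in range(n_atom):
        for i, j in combinations(sorted(neighbors[center]), 2):
            pairs13.add((i, j))

    # 1-4: the outer atoms of a chain i-j-k-l built around each bond j-k
    pairs14 = set()
    for j, k in pairs12:
        for i in neighbors[j] - {k}:
            for l in neighbors[k] - {j}:
                if i != l:
                    pairs14.add(tuple(sorted((i, l))))

    # Assumption: keep only the closest relationship for each pair
    pairs13 -= pairs12
    pairs14 -= pairs12 | pairs13
    return sorted(pairs12), sorted(pairs13), sorted(pairs14)


# Butane heavy atoms: C0-C1-C2-C3
p12, p13, p14 = pairs_12_13_14(4, [(0, 1), (1, 2), (2, 3)])
print(p12)  # [(0, 1), (1, 2), (2, 3)]
print(p13)  # [(0, 2), (1, 3)]
print(p14)  # [(0, 3)]
```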

generate_angle_dihedral_improper(dihedral=True, improper=True)

Generate angles, dihedrals and impropers from bonds. The existing angles, dihedrals and impropers will be removed first. Atoms and bonds concerning Drude particles will be ignored.

Parameters:
  • dihedral (bool) – Whether or not to generate dihedrals based on bonds

  • improper (bool) – Whether or not to generate impropers based on bonds
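
Enumerating angles and proper dihedrals from a bond list can be sketched as follows. This is a generic illustration on atom indexes, and angles_and_dihedrals is a hypothetical helper. Impropers are left out because the rule used to select them is not stated here.

```python
from itertools import combinations


def angles_and_dihedrals(n_atom, bonds):
    """Enumerate (i, j, k) angles and (i, j, k, l) dihedrals from bonds."""
    neighbors = [set() for _ in range(n_atom)]
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # An angle is a central atom plus any two of its neighbors
    angles = []
    for center in range(n_atom):
        for i, k in combinations(sorted(neighbors[center]), 2):
            angles.append((i, center, k))

    # A dihedral is a central bond j-k plus one extra neighbor on each side
    dihedrals = []
    for j, k in bonds:
        for i in sorted(neighbors[j] - {k}):
            for l in sorted(neighbors[k] - {j}):
                if i != l:  # skip three-membered rings
                    dihedrals.append((i, j, k, l))
    return angles, dihedrals


# Butane heavy atoms: C0-C1-C2-C3
angles, dihedrals = angles_and_dihedrals(4, [(0, 1), (1, 2), (2, 3)])
print(angles)     # [(0, 1, 2), (1, 2, 3)]
print(dihedrals)  # [(0, 1, 2, 3)]
```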

guess_bonds_from_ff(ff, max_bond_length=None, tolerance=0.025, pbc=None, cell=None)

Guess bonds from force field.

It requires that atom types are defined and positions are available. The distance between nearby atoms will be calculated. If it’s smaller than max_bond_length, then it will be compared with the equilibrium length in FF. The bond will be added if a BondTerm is found in FF and the deviation is smaller than tolerance.

The bond list will be cleared and re-created.

PBC is supported for determining bonds across the periodic cell. This is useful for simulating crystals and crosslinked polymers. pbc can be ‘x’, ‘y’, ‘z’, ‘xy’, ‘xz’, ‘yz’ or ‘xyz’, which selects the boundaries across which bonds are checked. cell must be provided if pbc is not None.

TODO Add support for triclinic cell

Parameters:
  • ff (ForceField) –

  • max_bond_length (float, optional) – If None, will use the maximum bond length in FF plus tolerance

  • tolerance (float) –

  • pbc (str, optional) –

  • cell (UnitCell) –
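
The procedure described above can be sketched in pure Python. Several things here are assumptions rather than facts about mstk: the deviation is taken as the absolute difference from the equilibrium length, the periodic cell is an orthogonal box given as three edge lengths (the hypothetical box argument stands in for UnitCell), equilibrium lengths come from a plain dict keyed by the pair of atom types, the numbers are illustrative values in nm, and all atom pairs are visited with a simple double loop.

```python
import math


def guess_bonds(positions, types, eq_lengths, tolerance=0.025,
                max_bond_length=None, pbc=None, box=None):
    """Return index pairs whose distance matches an equilibrium bond length."""
    if pbc and box is None:
        raise ValueError('box must be provided if pbc is not None')
    if max_bond_length is None:
        max_bond_length = max(eq_lengths.values()) + tolerance

    bonds = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            delta = [b - a for a, b in zip(positions[i], positions[j])]
            if pbc:
                # Minimum image convention on the selected axes only
                for axis, label in enumerate('xyz'):
                    if label in pbc:
                        delta[axis] -= box[axis] * round(delta[axis] / box[axis])
            dist = math.sqrt(sum(d * d for d in delta))
            if dist >= max_bond_length:
                continue
            r0 = eq_lengths.get(frozenset((types[i], types[j])))
            if r0 is not None and abs(dist - r0) < tolerance:
                bonds.append((i, j))
    return bonds


# Two carbons that are bonded only through the x boundary of a 1 nm box
positions = [(0.05, 0.0, 0.0), (0.897, 0.0, 0.0)]
types = ['CT', 'CT']
eq_lengths = {frozenset(('CT', 'CT')): 0.153}
print(guess_bonds(positions, types, eq_lengths))  # []
print(guess_bonds(positions, types, eq_lengths,
                  pbc='x', box=(1.0, 1.0, 1.0)))  # [(0, 1)]
```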

generate_drude_particles(ff, type_drude='DP_', seed=1, update_topology=True)

Generate Drude particles from DrudePolarTerms in force field.

The atom types should have been defined already. A Drude particle will not be generated for an atom if the DrudePolarTerm for its atom type cannot be found in the FF. Note that the existing Drude particles will be removed before generating. The mass defined in the DrudePolarTerm will be transferred from the parent atom to the Drude particle. The Drude charge will be calculated from the DrudePolarTerm and transferred from the parent atom to the Drude particle. Bonds between parent atoms and Drude particles will be generated and added to the topology. If AtomType and VdwTerm for generated Drude particles are not found in FF, these terms will be created and added to the FF.

Parameters:
  • ff (ForceField) –

  • type_drude (str) –

  • seed (int) –

  • update_topology (bool) –

remove_drude_particles(update_topology=True)

Remove all Drude particles and the bonds belonging to Drude particles

The charges and masses carried by Drude particles will be transferred back to parent atoms

Parameters:

update_topology (bool) –

generate_virtual_sites(ff, update_topology=True)

Generate virtual sites from VirtualSiteTerms in force field.

The atom types should have been defined already. Note that the existing virtual sites will be removed before generating. The charges will not be assigned by this method. Therefore, assign_charge_from_ff should be called to assign the charges on virtual sites.

Currently, only TIP4PSiteTerm has been implemented.

TODO Support other virtual site terms

Parameters:
  • ff (ForceField) –

  • update_topology (bool) –

remove_virtual_sites(update_topology=True)

Remove all virtual sites.

Parameters:

update_topology (bool) –

get_sub_molecule(indexes, deepcopy=True)

Extract a substructure from this molecule by indexes of atoms.

The substructure will not contain any bond, angle, dihedral or improper between atoms in the substructure and atoms in the remaining part. Because of this check, the method is inefficient. Avoid using it when an integrity check of the connectivity is not necessary.

Residue information will be reconstructed.

Parameters:
  • indexes (list of int) – The atoms in the substructure will be in the same order as in indexes

  • deepcopy (bool) – If set to False, then the atoms and connections in the substructure will be the same objects as the atoms and connections in this molecule. The data structure in this molecule will be messed up, and should not be accessed later.

Returns:

substructure

Return type:

Molecule

static merge(molecules, deepcopy=True)

Merge several molecules into a single molecule.

Parameters:
  • molecules (list of Molecule) –

  • deepcopy (bool) – If True, the molecules will be deep-copied and remain intact after the merge. Otherwise, the atoms of the merged molecule and of the original molecules will be the same objects, and the original molecules will be unusable.

Returns:

merged

Return type:

Molecule

split(consecutive=False)

Split the molecule into smaller pieces based on bond network.

The atoms in each piece will preserve the original order. However, atoms at the end of the original molecule may end up in one of the first pieces, so the overall order of atoms across all pieces can differ from the original order. To avoid this, set consecutive to True. In this case, it will make sure that all atoms in earlier pieces have smaller atom ids than atoms in later pieces.

Residue information will be reconstructed for each piece.

Parameters:

consecutive (bool) –

Returns:

molecules

Return type:

list of Molecule
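
The ordering issue described above is easy to see with a small connected-components sketch on atom indexes. split_by_bonds is a hypothetical helper that only illustrates the default grouping by bond network. It does not model the consecutive option or the reconstruction of residues.

```python
def split_by_bonds(n_atom, bonds):
    """Group atom indexes into pieces that are connected through bonds."""
    neighbors = [[] for _ in range(n_atom)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    piece_of = [None] * n_atom
    pieces = []
    for start in range(n_atom):
        if piece_of[start] is not None:
            continue
        index = len(pieces)
        piece_of[start] = index
        stack = [start]
        members = []
        while stack:
            cur = stack.pop()
            members.append(cur)
            for nxt in neighbors[cur]:
                if piece_of[nxt] is None:
                    piece_of[nxt] = index
                    stack.append(nxt)
        # Atoms inside a piece keep their original relative order
        pieces.append(sorted(members))
    return pieces


# Atom 3 is bonded to atom 0, so it lands in the first piece
print(split_by_bonds(4, [(0, 3), (1, 2)]))  # [[0, 3], [1, 2]]
```

Concatenating the pieces gives the order 0, 3, 1, 2, which differs from the original order 0, 1, 2, 3.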

split_residues(consecutive=False)

Split the molecule into smaller pieces based on bond network between residues.

The atoms belonging to the same residue will always be in the same piece.

The residues in each piece will preserve the original order. However, residues at the end of the original molecule may end up in one of the first pieces, so the overall order of residues across all pieces can differ from the original order. To avoid this, set consecutive to True. In this case, it will make sure that all residues in earlier pieces have smaller atom ids than residues in later pieces.

Residue information will be reconstructed for each piece.

Parameters:

consecutive (bool) –

Returns:

molecules

Return type:

list of Molecule