Topology#

In any molecular dynamics (MD) simulation package, a topology file (sometimes also called a structure or data file) is a critical component of the simulation. While the input script tells the program how to run the dynamics (ensembles, temperature control, run length, etc.), the topology tells it what system is being simulated. This includes a complete description of all particles: their identifiers, element or species type, masses, charges, and spatial coordinates, along with the connectivity information that defines which atoms are bonded and how angles and dihedrals are defined. Without such information, the force field cannot assign the correct interactions. In short, the topology is the “map” that defines the system at the atomic scale.

Constructing this file correctly is often the most tedious part of a simulation. In a system with more than 1,000 atoms, it is necessary to define thousands of bonds, angles, dihedrals, etc., which is not straightforward. Here, we will build the topology file using VMD (Visual Molecular Dynamics) starting from the atomic structure generated by the pyCSH code. VMD is a molecular visualization program for biomolecular systems but it is useful for preparing topology files for LAMMPS.

✒️ Both advanced and novice users will build the LAMMPS data file with the full topological information using VMD from the terminal with scripts.

Note

There are alternative ways to generate topology files, from automated tools such as LigParGen or Moltemplate to custom scripts using Python and ASE. Each method offers different levels of flexibility and complexity.


LAMMPS topology file#

Note

Users with experience with force fields that require topology (ClayFF, CSHFF, CHARMM, OPLS, etc.) can skip this section.

In LAMMPS, the topological information is usually stored in a data file, which is read using the read_data command. The structure of this file is modular and well defined:

  • A header section specifies the total and type numbers of atoms, bonds, angles, dihedrals, and impropers, as well as the dimensions of the simulation box.

  • A Masses section assigns masses to each atom type.

  • An Atoms section provides, for each atom, its ID, molecule ID (if relevant), atom type, charge, and coordinates (x, y, z). Depending on the chosen atom_style in the input file, additional fields may appear, such as velocities, dipoles, or extra properties.

  • Optional Bonds, Angles, Dihedrals, and Impropers sections describe bonded connectivity, listing the IDs of the participating atoms and their type.

As an example, the following snippet shows a LAMMPS data file for a water molecule:

LAMMPS data file for water model
3 atoms
2 bonds
1 angles

2 atom types
1 bond types
1 angle types

0.0 20.0 xlo xhi
0.0 20.0 ylo yhi
0.0 20.0 zlo zhi

Masses # atom type, atom mass

1 15.9994    # O_w (water oxygen)
2 1.008      # H_w (water hydrogen)

Atoms # atom-ID molecule-ID atom-type charge x y z

1 1 1 -0.820  10.0  9.0 10.0   # O_w (water oxygen)
2 1 2  0.410  10.0  9.9 10.7   # H_w (water hydrogen)
3 1 2  0.410  10.0  8.1 10.0   # H_w (water hydrogen)

Bonds # bond-ID bond-type atom1 atom2

1 1 1 2   # O_w – H_w (water bond)
2 1 1 3   # O_w – H_w (water bond)

Angles # angle-ID angle-type atom1 atom2 atom3

1 1 2 1 3  # H_w – O_w – H_w (water angle) 

Note

Atom styles and force fields The structure of a LAMMPS data file partly depends on the force field being used, as each force field requires a specific atom_style. For instance, classical force fields such as CHARMM, AMBER, or ClayFF use the full style. This format explicitly lists all topological terms (bonds, angles, dihedrals, and impropers) together with atom types, masses, charges, and coordinates. In contrast, ReaxFF does not rely on predefined connectivity and the data files should use the charge style, which only specifies atom IDs, types, charges, masses, and coordinates.

Regardless of the force field, the data file must be consistent with the atom_style defined in the input script, otherwise the simulation will fail to read it correctly. You can find a complete description of all available atom styles and their required fields in the LAMMPS documentation.


ClayFF atom types#

ClayFF was originally developed to describe clays and layered silicates, but has become common for modeling calcium silicate hydrate (C–S–H) and related phases. Its key advantage lies in its simplicity: ClayFF is essentially a non-bonded force field in which most atoms interact only through Lennard–Jones and Coulombic terms (see Table 1). The only exceptions are hydroxyl and water molecules, where explicit O–H bonds and H–O–H angles are included to preserve molecular geometry (see Table 2). The assigned charges are fractional (e.g., O ≈ –1.05 e and Si ≈ +2.1 e instead the formal charges -2 and +4) and were originally parameterized for clay minerals (see the Caution note).

ClayFF defines a limited set of atom types, each with specific Lennard–Jones parameters and partial charges. pyCSH writes a detailed label for each atom type, and you need to understand their correspondence:

Table 1. Nonbonded parameters for the ClayFF Force Field (reduced).

Species

ClayFF label

pyCSH label

charge (e)

ε (kcal/mol)

σ (Å)

SPC water hydrogen

h*

Hw

0.4100

0.0000

0.0000

SPC water oxygen

o*

Ow

-0.8200

0.1554

3.1655

Hydroxyl hydrogen

ho

H

0.4250

0.0000

0.0000

Hydroxyl oxygen

oh

Oh

-0.9500

0.1554

3.1655

Bridging oxygen

ob

Osi

-1.0500

0.1554

3.1655

Bridging oxygen with octahedral substitution

obos

Oca & O

-1.1808

0.1554

3.1655

Tetrahedral silicon

st

Si1 & Si2

2.1000

1.8405 × 10⁻⁶

3.3020

Octahedral calcium

cao

Ca1

1.3600

5.0298 × 10⁻⁶

5.5667

Hydroxide calcium

cah

Ca2

1.0500

5.0298 × 10⁻⁶

5.5667

Aqueous sodium ion

Na

Na

1.0000

0.1301

2.3500

Aqueous chloride ion

Cl

Cl

-1.0000

0.1001

4.4000

Caution

Clays are phyllosilicates, whereas tobermorite minerals and C–S–H are inosilicates, so applying ClayFF’s standard partial charges to these systems often produces a non-neutral configuration. Maintaining electroneutrality is crucial to obtain stable and physically meaningful simulations. It is essential to ensure that your system is neutral, i.e., that the sum of both formal and partial charges equals zero. For that, the partial charges can be slightly adjusted.

Important: the same issue applies to any force field that uses partial charges, such as CSHFF or OPLS, when “defects” are introduced.

To solve the charge problem, count the number of atoms of each element (M), multiply it by its partial charge (\(q_j\)), and sum over all the elements (N). The sum (Q) must be zero. If it is not, distribute the excess charge evenly among all oxygen atoms of the C-S-H (O, Osi, Oca):

\[ Q = \sum_{i=1}^{N} \sum_{j=1}^{M} q_j \qquad \qquad q_{new} = q_{old} - Q/N_{O} \]

This correction is undesirable, but acceptable as long as the charge is modified by a tiny percentage (typically less than 0.005%). Under these conditions, the effect on the Coulombic interactions is negligible, and the force field can be expected to behave consistently with the original parametrization.

Table 2. Bonded parameters for the ClayFF Force Field (reduced).

Bond / Angle

ClayFF label

pyCSH label

k (kcal/mol)

d (Å) / Angle (º)

Water bond

o*-h*

Ow-Hw

450.0

1.0

Hydroxyl bond

oh-ho

Oh-H

450.0

1.0

Water angle

h*-o*-h*

Hw-Ow-Hw

55.0

109.47


Generating the topology file in VMD#

In this section, we will learn how to create a LAMMPS data file from the structure generated by pyCSH, using VMD and its plugins TopoTools and PBCtools. Although it is possible to execute the commands directly from the Tk Console in VMD, in this tutorial we will use a Tcl script — a plain-text file containing the VMD commands necessary to generate the LAMMPS data file. Using this script has two key advantages:

  • it ensures the process is reproducible and less prone to manual errors.

  • it allows batch processing, i.e., generating data files for many structures in sequence reusing the Tcl script.

1. Building the Tcl script.

Open a text editor and create a new file, for instance xyz2data.tcl. This script will contain all the commands that VMD needs to read the structure of our model and export it to a format compatible with LAMMPS. Below is the general structure you should follow to obtain an appropriate LAMMPS data file:

1.1. Loading the structure. We should first include in the script a line to load our structure CSHmodel_filled.xyz. Using the command mol new, VMD opens the .xyz structure and reads the coordinates.

mol new CSHmodel_filled.xyz type xyz

1.2. Loading the plugins. To manipulate the molecular topology and define the simulation cell, VMD uses two plugins:

  • TopoTools provides a set of commands to manipulate molecular topologies and export them to formats compatible with LAMMPS.

  • PBCtools allows handling and visualizing periodic boundary conditions (PBC). To load both, we need to use the following commands:

package require topotools
package require pbctools

1.3. Defining the simulation box. Next, we have to set the periodic boundary conditions (PBC) for our system using the pbc setcommand. The first three numbers correspond to the box dimensions along x, y, and z (in Å), and the last three specify the angles between the box edges. Make sure these values match those of the actual model (check them in the .xyz file or in OVITO).

pbc set {20.0 20.0 20.0 90.0 90.0 90.0}
pbc box

1.4. Assigning masses, charges, and radii. By default, VMD does not assign atomic charges and must therefore be entered manually following the parameters defined in ClayFF (Table 1). However, VMD automatically assigns atomic masses and radii based on the detected chemical element. To do so, VMD reads the first one or two characters of the atom name to guess the element type, which can lead to misinterpretations when using custom atom labels. For example, an atom named Osi will be read as Os (osmium) instead of O (oxygen). For that reason, it is advisable to reassign the correct mass and radius values for each atom type (see the Warning below before setting the radii). To set the correct physical properties for each atom type use:

set Osi [atomselect top "name Osi"]
$Osi set radius XX
$Osi set charge XX
$Osi set mass XX

Warning

Avoiding unwanted bonds and angles in ClayFF systems. When using TopoTools to guess bonds and angles, it automatically generates all possible connections based on interatomic distances. However, in ClayFF only hydroxyl and water molecules should contain explicit bonds and angles. If you keep the default settings, VMD will create unnecessary bonds and angles such as Ca–O_w or Si–O_br–Si. To avoid this, you can assign a radius of zero to atoms that should not form bonds before recalculating connectivity (mol reanalyze top).

Be careful if atoms are too close to each other in your original structure. This may happen, for example, if packmol has to break the desired distance tolerance to meet all the geometrical requirements of your system. In such cases, VMD can detect unphysical bonds, and create new bonds, angles, or dihedrals that you would not expect otherwise. It is therefore recommended to check the VMD output visually and, if necessary, slightly adjust the radii for the hydrogen and oxygen atoms in water and hydroxyl groups. For example, reducing the radius to about 0.8 Å for Ow and 0.9 Å for Oh has been found to eliminate spurious angles and improve simulation stability.

1.5. Rebuilding the topology. After assigning all atomic properties, VMD can reconstruct the molecular connectivity (bonds and angles) using the following commands:

mol bondsrecalc top
topo retypebonds
topo guessangles

Once all this is done, reanalyze the molecule to ensure that the topology is fully consistent by typing:

mol reanalyze top

1.6. Exporting the LAMMPS data file. Finally, write the topology into a LAMMPS-compatible format using:

topo writelammpsdata atoms.data full
exit

This command creates the file atoms.data in the full style, which includes all atom, bond, and angle information required by LAMMPS. You can open it in any text editor to check its structure — you should see sections such as Masses, Atoms, Bonds, and Angles. This file is now ready to be combined with the force-field parameters in your LAMMPS input script.

2. Running VMD with a Tcl script.

Once your Tcl script (e.g. xyz2data.tcl) is ready, the next step is to run it directly from the terminal. To do this, we should call the VMD executable and use the option -dispdev text to execute the Tcl script specified after -e. Note that both the .xyz file and the .tcl script must be located in the same directory, and you should execute VMD from that directory.

vmd -dispdev text -e xyz2data.tcl

This starts VMD without opening the graphical interface, automatically loads the .xyz file, applies all the steps in the script, and writes the output file .data in the current working directory.

Tip

If your terminal does not recognise the vmd command, use the full path to the executable (e.g. on macOS: "/Applications/VMD 1.9.4a51.app/Contents/vmd/vmd_MACOSXX86_64" -dispdev text -e xyz2data.tcl and Windows C:\Program Files\University of Illinois\VMD2\).