TorchSim: A new PyTorch-based molecular dynamics engine
Table of Contents
- Introduction
- What is Molecular Dynamics (MD)?
- Machine-learned Interatomic Potentials
- TorchSim: Features and Examples
- Conclusions and Future Outlook
- Resources
Introduction
Simulating molecules and materials at the atomic scale is now a routine practice across many areas of research. These simulation techniques form one of the cornerstones of modern science, underpinning workflows in fields such as drug discovery, materials design, and catalyst optimization, among others.
Among these techniques, Molecular Dynamics (MD) has been rapidly adopted and extensively used by the scientific community since its first application by Alder in 1957.
In recent years, the rise of artificial intelligence has significantly reshaped the landscape of molecular simulations. Machine-Learned Interatomic Potentials (MLIPs) now stand at the forefront of this transformation, bridging the gap between accuracy and computational efficiency.
Before diving into what TorchSim is — and why it matters — let’s take a brief detour. Bear with me.
What is Molecular Dynamics (MD)?
In our daily life, we constantly move objects to perform tasks — by pushing or pulling, we apply a force in the direction we want the object to move.
In 1687, Newton showed through his equations that if we know an object’s initial position and velocity, as well as the force applied to it, we can predict exactly where the object will end up.
In molecular dynamics (MD), we apply the same principles to atoms. We solve Newton’s equations of motion for a set of N atoms with positions , velocities ), and masses ).
Positions and velocities can be initialized with random values without affecting the results significantly — if the simulation runs long enough, the system “forgets” its initial state as atoms move and their velocities change multiple times. Masses are usually known, either directly from the periodic table or computed in specific types of simulations (e.g., coarse-grained models).
The tricky part lies in the forces. What forces make the atoms move? A naïve approach would be to assign random values, but then all molecules would behave the same — clearly not what we observe in reality. There must be a physical explanation behind these forces, and indeed there is one: forces are the derivatives of the potential energy,
This gives us a way to compute the forces, but it introduces a new unknown — the potential energy U. Potential energy is arguably the most important part of MD. If we know U, we can find the forces; if we know the forces, we can run the simulation. So, what is potential energy, and how can we calculate it?
Potential energy represents the energy of interactions between atoms. There are two main approaches to compute it in molecular simulations:
Quantum mechanical forces
Force fields
Because molecules are quantum mechanical in nature, we could, in principle, use quantum mechanics directly to calculate the forces. In molecular dynamics, this approach is called ab initio MD. It is the most accurate and general method available, but also the most computationally expensive. As a result, it can only be used for relatively small systems and short simulation times.
Force fields, on the other hand, are much faster but less accurate and less transferable. This is the most common approach in classical MD, and over the years many force fields have been developed. A typical classical force field takes the form:
The bonded term includes bond stretching, angle bending, and dihedral torsion:
The non-bonded terms account for long- and short-range interactions: Coulomb potentials describe electrostatic interactions, while Lennard-Jones-type functions describe van der Waals interactions,
Once the functional form of the force field is chosen, it remains fixed, and the parameters are fitted to reproduce experimental data or quantum mechanical results. Each molecule has different parameters, so fitting must be done for every new system — a time-consuming process. To avoid this, force fields are typically parameterized for classes of molecules with similar behaviour (for example, the OPLS-AA force field is fitted to organic molecules).
Once the force field is defined, we can compute its derivative to obtain the forces, and then evolve positions and velocities over time. The resulting positions and velocities form trajectories, from which we can calculate physical observables such as diffusion coefficients or radial distribution functions.
👉 For a full MD primer, see the companion article: A primer on molecular dynamics.
Machine-learned Interatomic potentials
On one side, we have the ab initio approach — accurate and general, but slow. On the other, we have force field methods — less accurate and less general, but much faster. The real breakthrough would be a method that combines the accuracy and generality of quantum mechanics with the speed of classical force fields.
This is precisely what machine-learned force fields (MLFFs) set out to achieve. Over the past few years, numerous variants of machine-learned interatomic potentials (MLIPs) have been developed, but their goal is the same: if a machine learning model can be trained to reproduce quantum-mechanical energies and forces, we could in theory perform molecular dynamics simulations with quantum accuracy at force-field speed.
In practice, however, achieving this balance is far from trivial. There is no single “best” model — different architectures (neural networks, graph networks, Gaussian processes, etc.) are trained on different datasets and optimized for different chemical systems. Training such models requires large, high-quality datasets of atomic configurations with corresponding quantum energies and forces, and ensuring transferability beyond the training set remains an open challenge.
From an applied perspective, one might think it’s now as simple as choosing between ab initio MD, classical MD, or ML-based MD depending on the system of interest. In reality, the situation is more complex. Traditional MD engines are typically written in C++ or Fortran, optimized for numerical performance, while modern ML frameworks are built in Python, optimized for flexibility and automatic differentiation. Bridging these two worlds — physics-based simulation and data-driven modeling — is not straightforward. Developing or testing new hybrid ML+physics potentials often means modifying low-level C++ code, recompiling, and integrating it with Python — a process that significantly slows down experimentation and prototyping.
ML-based potentials are usually implemented and trained in frameworks such as PyTorch or TensorFlow, which rely heavily on GPU acceleration. However, most established atomistic simulation packages (e.g., LAMMPS, ASE, HOOMD-blue) were not originally designed for GPU-accelerated inference or large-scale batched simulations, meaning they can’t fully exploit modern hardware capabilities.
New MD engines have also emerged within ML frameworks themselves — for instance, JAX MD and TorchMD — enabling GPU-accelerated simulations, differentiable dynamics, and flexible potential definitions. While these packages make it easier to prototype new ML-based or hybrid potentials, they often lack tight integration with widely used materials informatics tools and state-of-the-art MLIP models.
| Feature | OpenMM | LAMMPS | TorchMD | ASE | JaxMD | TorchSim |
|---|---|---|---|---|---|---|
| Batching | ✗ | ✗ | ⊚ | ✗ | ⊚ | ✓ |
| Diverse MLIPs | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
| Differentiable | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ |
| Pure Python | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |
| GPU Dynamics | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ |
| Multi GPU | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
Finally, computational chemists and materials scientists often need to design workflows or pipelines that combine multiple software tools — from quantum chemistry codes to ML frameworks to MD engines. Building these workflows efficiently remains one of the main practical challenges in applying ML-based molecular simulations at scale.
This is where TorchSim truly shines: it functions as a native PyTorch molecular dynamics engine. Designed for small- to medium-scale atomistic systems, it supports multiple MLIPs and offers an intuitive interface to add new ones. It runs efficiently on GPUs and integrates seamlessly with the broader computational chemistry ecosystem. A key advantage is that, being built on PyTorch, it allows for easy retraining and fine-tuning of parameters through automatic differentiation (autograd).
TorchSim: Features and Examples
At its core, TorchSim is modular and minimal. Its main components are:
State: Holds positions, velocities, masses, and boundary conditions.
Models: Interface to MLIPs or other potential energy calculators.
Runners: Execute simulation schemes for molecular dynamics or optimization.
Trajectory: Tracks velocities, positions, and any property of interest.
AutoBatching: Enables automatic batching of multiple systems in parallel.
All components are built on PyTorch tensors.
The best way to understand how TorchSim works is through examples. There are many possibilities, but the minimum steps required to run a simulation are:
Generate a system using ASE or Pymatgen.
Choose a model.
Run the simulation.
Static Calculation of bulk FCC Copper
from ase.build import bulk
from torch_sim.models.lennard_jones import LennardJonesModel
from torch_sim.runners import static
# Create a Copper FCC structure using ASE
cu_atoms = bulk("Cu", "fcc", a=3.58, cubic=True)
# Create a Lennard-Jones model with parameters suitable for Cu
# Taken from (J. Phys. Chem. C, Vol. 112, No. 44, 2008)
lj_model = LennardJonesModel(
sigma=2.616, # Å, typical for Cu-Cu interaction
epsilon=0.205, # eV, typical for Cu-Cu interaction
)
# Run a static calculation
final_state = static(
system=cu_atoms,
model=lj_model,
)
This is a very basic static calculation for bulk silicon. The available runners are:
Static – as shown above, performs a single static potential energy calculation.
Optimize – relaxes atomic positions or the simulation cell until forces and stresses fall below a threshold (geometry optimization). This is similar to training an ML model, in both cases you also have to choose an optimizer (e.g. gradient descent).
Integrate – performs molecular dynamics simulations, which require an integrator specifying the algorithm to solve Newton’s equations of motion.
FCC Copper geometry optimisation
from torch_sim.optimizers import Optimizer
import torch
from ase.build import bulk
from torch_sim.models.lennard_jones import LennardJonesModel
from torch_sim.runners import optimize, generate_force_convergence_fn
from torch_sim.optimizers.cell_filters import CellFilter
# Create a Cu FCC structure using ASE
cu_atoms = bulk("Cu", "fcc", a=3.58, cubic=True)
# Create a Lennard-Jones model with parameters suitable for Cu
lj_model = LennardJonesModel(
sigma=2.616, # Å, typical for Cu-Cu interaction
epsilon=0.205, # eV, typical for Cu-Cu interaction
compute_stress=True, # Need to compute stress for cell optimisation
device=torch.device("cpu"), # Use CPU for computation
dtype=torch.float64, # Use Float64
)
# Define a convergence function based on force tolerance
force_convergence_fn = generate_force_convergence_fn(force_tol=1e-3)
# Run the optimisation
final_state = optimize(
system=cu_atoms,
model=lj_model,
optimizer=Optimizer.fire, # Specify which optimiser to use
convergence_fn=force_convergence_fn,
trajectory_reporter={"filenames": "lj_trajectory.h5"}, # Save trajectory file
max_steps = 1000, # Maximum number of optimisation steps
init_kwargs=dict(cell_filter=CellFilter.unit) # this will optimise the cell
)
This example is slightly more advanced and demonstrates additional TorchSim functionality. Most parts are self-explanatory, except perhaps init_kwargs: this argument allows you to modify the behavior of the optimizer by passing custom attributes. By specifying unit, we enable optimization of both atomic positions and cell parameters.
Molecular dynamics of ethylmethylethers with Mace-OFF MLIP
from mace.calculators.foundations_models import mace_off
from torch_sim.runners import integrate
import torch
from ase.build import molecule
from torch_sim.models.mace import MaceModel
from torch_sim.trajectory import TrajectoryReporter
from torch_sim.integrators import Integrator
from torch_sim.quantities import calc_kinetic_energy
device = "cpu"
dtype = torch.float64
ethylmethylether = molecule("ch3ch2och3",vacuum=5) #Create box padded by 5 Angstrom of vacuum each side
ethylmethylether = ethylmethylether.repeat([3,3,3]) #Replicate the molecule
ethylmethylether.pbc = True #Activate periodic boundary condition
mace = mace_off(model="large",device=device,dtype=dtype,return_raw_model=True)
mace_model = MaceModel(model=mace, device=device)
# Trajectory file
trajectory_file = "ethylmethylether_trajectory.h5"
# Define property calculators to track energies
# - Calculate potential energy every 10 steps
# - Calculate kinetic energy every 20 steps
prop_calculators = {
1: {"potential_energy": lambda state: state.energy},
1: {
"kinetic_energy": lambda state: calc_kinetic_energy(
momenta=state.momenta, masses=state.masses)},}
# Create a reporter that saves the state every 10 steps
reporter = TrajectoryReporter(
trajectory_file,
state_frequency=1, # Save the state every 10 steps
prop_calculators=prop_calculators,
)
n_steps = 1000
final_state = integrate(
system=ethylmethylether,
model=mace_model,
integrator=Integrator.nvt_langevin, # NVT Langevin integrator
n_steps=n_steps, # Number of steps
temperature=300, # Temperature in kelvin
timestep=0.001, # Timestep in picoseconds
trajectory_reporter=reporter, # Add the reporter
)
This is a slightly more complex example, but the basic workflow remains the same as in the static and optimization runs, with additional parameters such as the timestep, temperature, and integrator.
The key feature demonstrated here is the trajectory reporter, which can be used in both optimization and MD runs to track and store any property of interest. We first define the output filename, then specify a dictionary describing which properties to compute and how often to record them. You can track built-in TorchSim properties, such as potential and kinetic energy, or define your own by writing a simple function.
Now that we have saved the trajectory with all the relevant information — including the properties we calculated — how can we analyze it? TorchSim includes a trajectory reader, which allows you to read, analyze, and export trajectory data in different formats.
from torch_sim.trajectory import TrajectoryReporter, TorchSimTrajectory
# Reading the file with a TorchSimTrajectory
with TorchSimTrajectory("ethylmethylether_trajectory.h5", mode="r") as traj:
# Get raw arrays
positions = traj.get_array("positions")
steps = traj.get_steps("positions")
potential = traj.get_array("potential_energy")
kinetic = traj.get_array("kinetic_energy")
# Get a SimState object from the first cell
state = traj.get_state(0)
# Get ase atoms from the second cell
atoms = traj.get_atoms(0)
# write ase trajectory
traj.write_ase_trajectory("ethylmethylether.traj")
Batching and AutoBatching
One of the most powerful features of TorchSim (and my personal favourite) is its ability to simulate multiple systems in parallel. As a computational chemist you run calculations on computer clusters and being able to effectively use the hardware given is extremely important, hence why I find this feature to be the most important one in TorchSim. This functionality is particularly useful when working with machine learning interatomic potentials (MLIPs), which benefit from GPU acceleration. TorchSim makes it easy to run several systems at once, leveraging the full potential of modern hardware.
You can simulate all these systems in a single call to either integrate or optimize.
from mace.calculators.foundations_models import mace_mp
from torch_sim.runners import integrate
import torch
from ase.build import bulk
from torch_sim.models.mace import MaceModel
from torch_sim.trajectory import TrajectoryReporter
from torch_sim.integrators import Integrator
cu_atoms = bulk("Cu", "fcc", a=3.58, cubic=True)
fe_atoms = bulk("Fe", "fcc", a=5.26, cubic=True)
fe_atoms_supercell = fe_atoms.repeat([2, 2, 2])
cu_atoms_supercell = cu_atoms.repeat([2, 2, 2])
device = torch.device("cpu")
# Load the MACE "small" foundation model
mace = mace_mp(model="small", return_raw_model=True)
mace_model = MaceModel(
model=mace,
device=device,
dtype=torch.float64,
compute_forces=True,
)
# Pack them into a list
systems = [cu_atoms, fe_atoms, cu_atoms_supercell, fe_atoms_supercell]
n_steps = 1000
final_state = integrate(
system=systems, # List of systems to simulate
model=mace_model, # Single model for all systems
integrator=Integrator.nvt_langevin,
n_steps=n_steps,
temperature=2000,
timestep=0.002,
)
filenames = [f"batch_traj_{i}.h5" for i in range(len(systems))]
# Create a reporter that handles multiple trajectories
batch_reporter = TrajectoryReporter(
filenames,
state_frequency=10,
)
# Run the simulation with batch reporting
final_state = integrate(
system=systems,
model=mace_model,
integrator=Integrator.nvt_langevin,
n_steps=n_steps,
temperature=2000,
timestep=0.002,
trajectory_reporter=batch_reporter,
)
The integrate function also supports AutoBatching, which automatically determines the maximum number of systems that can fit into GPU memory.
It dynamically splits the systems to make optimal use of hardware resources, removing the need for manual memory management when running large batches.
You can enable AutoBatching simply by setting the autobatcher argument to True:
final_state = ts.integrate(
system=systems,
model=mace_model,
integrator=ts.Integrator.nvt_langevin,
n_steps=n_steps,
temperature=2000,
timestep=0.002,
autobatcher=True,
)
Conclusions and Future Outlook
We’ve finally reached the end of this post — but this is really just the beginning. There’s still so much more to explore, so dive in and run your own systems now!
TorchSim is built to bridge the gap between traditional molecular dynamics codes and the flexibility of modern machine learning frameworks. Whether you’re experimenting with new force fields, fine-tuning ML potentials, or building hybrid workflows, TorchSim makes it easier to prototype, train, and simulate — all within one coherent framework.
And of course, TorchSim is a community-driven project. Its evolution depends on curiosity, creativity, and collaboration. So if you’d like to contribute — whether through new features, model implementations, bug reports, or tutorials — your input is more than welcome!
👉 Join the discussion or contribute on GitHub Issues.
Resources
Selected papers
- Cohen, O. et al., TorchSim: An efficient atomistic simulation engine in PyTorch, Machine Learning: Science and Technology, 2025.
- Duignan, T. T. et al., The Potential of Neural Network Potentials, ACS Physical Chemistry Au, 2024, 4(3), 232–241.
- Jacobs, R. et al., A Practical Guide to Machine Learning Interatomic Potentials — Status and Future, Current Opinion in Solid State and Materials Science, 2025, 35, 101214.
- Zhang, Y.-W. et al., Roadmap for the Development of Machine Learning-Based Interatomic Potentials, Modelling and Simulation in Materials Science and Engineering, 2025, 33, 023301.
- Unke, O. T. et al., Machine Learning Force Fields, Chemical Reviews, 2021, 121, 10142–10186.
- Botu, V. et al., Machine Learning Force Fields: Construction, Validation, and Outlook, The Journal of Physical Chemistry C, 2017, 121(1), 511–522.
Recommended books for MD
- Frenkel, D. & Smit, B., Understanding Molecular Simulations — a cornerstone text covering algorithms and statistical foundations.
- Allen, M. P. & Tildesley, D. J., Computer Simulations of Liquids — classic introduction to simulation methods and analysis.
- Marx, D. & Hutter, J., Ab-initio Molecular Dynamics — an essential guide to quantum-based molecular simulations.
- Tuckerman, M., Statistical Mechanics: From Theory to Molecular Simulation — a rigorous treatment connecting theoretical mechanics to simulation practice.