Describing programs
Components are abstract objects. For an actual simulation to run, we need computer programs that implement the components of the simulation. As we’ve seen above, components refer to implementations, and those implementations can be either more models, or programs.
Where the model structure depends on the system being simulated, and the model configuration on the particular scenario that we want to simulate, the information in the programs section derives from the properties of the programs used in the simulation and how they are installed on the local machine.
A good way to organise this is to have a separate yMMSL file for each installed program, which could be derived from a template describing the program’s fixed properties and then adapted to the specific way it’s been compiled and installed.
Programs
Programs are defined in the programs section of a yMMSL file:
```yaml
programs:
  simplest:
    executable: /home/user/models/my_model

  python_script:
    virtual_env: /home/user/envs/my_env
    executable: python3
    args: /home/user/models/my_model.py

  with_env_and_args:
    env:
      LD_LIBRARY_PATH: /home/user/muscle3/lib
      ENABLE_AWESOME_SCIENCE: 1
    executable: /home/user/models/my_model
    args:
      - --some-lengthy-option
      - --some-other-lengthy-option=some-lengthy-value
```
As you can see, there are quite a few different ways of describing an implementation, but all implementations have a name, which is the key in the dictionary, by which a component can refer to it.
The simplest implementation only has an executable. This could be a (probably
statically linked) executable, or a script that sets up an environment and starts the
model.
If your model or other component is a Python script, then you may want to load a virtual
environment before starting it, to make the dependencies available. This is done using
the virtual_env attribute. If the script has a #!/usr/bin/env python line at the top,
then you can set the script itself as the executable. If it does not, then you need to
start the Python interpreter directly, and pass the location of the script as an
argument.
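For comparison with the python_script example, here is a minimal sketch of the shebang variant. It assumes the script is marked executable and starts with a #!/usr/bin/env python line; the program name python_with_shebang is hypothetical, and the paths are reused from the examples above.

```yaml
programs:
  python_with_shebang:
    # The virtual environment is still loaded before the script starts
    virtual_env: /home/user/envs/my_env
    # The script is started directly, so no separate interpreter or args are needed
    executable: /home/user/models/my_model.py
```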
Environment variables can be set through the env attribute, which contains a
dictionary mapping variable names to values, as shown for the with_env_and_args
example. This also shows that you can pass the arguments as a list, if that makes things
easier to read.
```yaml
programs:
  mpi_implementation:
    executable: /home/user/models/my_model
    execution_model: openmpi

  on_hpc_cluster:
    modules: GCC/14.1.0 OpenMPI/5.0.3
    executable: /home/user/models/my_model
    execution_model: openmpi

  with_script:
    script: |
      #!/bin/bash
      . /home/user/muscle3/bin/muscle3.env
      export ENABLE_AWESOME_SCIENCE=1
      /home/user/models/my_model -v -x
```
MPI programs are a bit special, as they need to be started via mpirun. However,
mpirun assumes that the program to start is going to use all of the available
resources. For a coupled simulation with multiple components, that is usually not what
you want. It is possible to tell mpirun to only use some of the resources, but of
course we don’t know which ones will be available while writing this file.
So, in yMMSL, you simply specify the path to the executable, and set the
execution_model attribute to openmpi, srunmpi or intelmpi depending on
the MPI implementation and HPC machine you’re using. When executing with MUSCLE3, the
MUSCLE Manager will then start the component on its designated subset of the resources
as required.
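As an illustration of the other execution models, the fragment below sketches the same program configured for a machine where MPI programs are launched through srun. The program name on_slurm_cluster is hypothetical, and whether srunmpi is the right choice depends on how the cluster is set up.

```yaml
programs:
  on_slurm_cluster:
    executable: /home/user/models/my_model
    # One of openmpi, srunmpi or intelmpi, matching the local MPI setup
    execution_model: srunmpi
```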
The on_hpc_cluster program demonstrates loading environment modules, as
commonly needed on HPC machines. They’re all in one line here, but if the modules have
long names, then like with the arguments you can make a list to keep things readable.
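Written as a list, the modules from the on_hpc_cluster example would look roughly like this. It is a sketch of the same configuration, not a different one.

```yaml
programs:
  on_hpc_cluster:
    # Same modules as before, one per line
    modules:
      - GCC/14.1.0
      - OpenMPI/5.0.3
    executable: /home/user/models/my_model
    execution_model: openmpi
```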
Finally, if you need to do something complicated, you can write an inline script to start the program, as the with_script program shows. This currently only works for non-MPI programs, however.
Programs are represented by ymmsl.v0_2.Program in Python.
Keeps state for next use
Implementations may indicate if they carry state between reuses. This is currently only used for Checkpoints, but might see further use in the future (e.g. for load balancers). There are three possible values an implementation may indicate.
- Necessary
This implementation remembers state between consecutive iterations of the reuse loop. That state is required for the proper execution of the implementation.
This is the default value when not specified.
Example: A micro model simulating an enclosed volume, where every reuse the boundary conditions are updated by the connected macro model. This micro model must keep track of the state inside the simulated volume between iterations of the reuse loop.
- No
This implementation has no state between consecutive iterations of the reuse loop.
Example: A data converter that receives on an F_INIT port, transforms the data and outputs it on an O_F port. The transformation is only dependent on the information of the F_INIT message.
- Helpful
This implementation remembers state between consecutive iterations of the reuse loop. However, this state is not required for proper execution.
Example: A simulation of a fluid in a pipe with obstacles. The simulation converges much faster when starting from the solution of the previous iteration. However, the same solution can still be found when starting from scratch.
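A minimal sketch of how a program might declare this in a yMMSL file. It assumes that the attribute is spelled keeps_state_for_next_use and takes the lowercase values necessary, no and helpful; please check the yMMSL reference for the exact spelling. The program names and paths are hypothetical.

```yaml
programs:
  # Hypothetical stateless converter: nothing carries over between reuses
  data_converter:
    executable: /home/user/models/converter
    keeps_state_for_next_use: no

  # Hypothetical pipe flow solver: converges faster from the previous
  # solution, but can also start from scratch
  pipe_flow:
    executable: /home/user/models/pipe_flow
    keeps_state_for_next_use: helpful
```

Leaving the attribute out is equivalent to the first option, since necessary is the default.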