Communicator#
eON has a server-client architecture for running its calculations. The
simulation data is stored on the server; clients are sent jobs and return the
results. Each time eON is run, it first checks whether any results have come
back from clients, processes them accordingly, and then submits more jobs if
needed. There are several ways to run jobs in eON: locally on the server, via
MPI, or through a job queuing system such as SGE.
Note
As of version 2.0, we recommend using dedicated workflow management tools (such as AiiDA, Snakemake, or Fireworks) instead of having eON generate submission scripts.
Configuration#
[Communicator]
- pydantic model eon.schema.CommunicatorConfig[source]#
- Config:
use_attribute_docstrings: bool = True
- Fields:
- field cancel_job: str = 'cancel_job.sh'#
Name of the script that cancels a job for the cluster communicator. It takes a single argument, the job ID.
- field client_path: str = 'eonclient'#
Path to the eON client binary. If only a name and not a path is given, eON looks for the binary in the same directory as the configuration file. If it is not found there, the directories in the $PATH environment variable are searched.
- field jobs_per_bundle: int = 1#
Number of jobs per bundle. In eON, a job is a task that eonclient executes, such as a process search or a parallel replica run. Sometimes it makes sense to run more than one job of the same type at a time. For example, when using empirical potentials to do saddle searches, a single search might take only a few seconds on a modern CPU. To improve performance, more than one client job (e.g., process search, dimer, minimization) can be bundled and run at the same time.
- field max_jobs: int = 0#
Maximum number of AKMC jobs that can be running at once for the current state. For communicators with queues (cluster), no more jobs will be queued if the number of queued and in-progress jobs equals or exceeds this number. The default of 0 means unlimited.
- field name_prefix: str = 'eon'#
Prefix added to job names to make them identifiable by the user for the cluster communicator.
- field num_jobs: int = 1#
Number of jobs. The meaning of this option depends on the communicator type. For local, it is the number of jobs run every time the program is invoked. For cluster, it is the desired sum of queued and running jobs.
- field number_of_CPUs: int = 1#
Number of jobs that will run simultaneously for the local communicator.
- field queued_jobs: str = 'queued_jobs.sh'#
Name of the script that returns the job IDs of all the running and queued jobs for the cluster communicator. Note that it may return more than just eON jobs.
- field script_path: str = './'#
Path to the user-defined scripts for submitting jobs to the communicator for the cluster communicator.
- field submit_job: str = 'submit_job.sh'#
Name of the script that submits a single job to the queuing system for the cluster communicator. It takes two command-line arguments. The first is the name of the job; this is not required for eON to function, but is highly recommended so that users can identify which job is which. The second argument is the working directory, the path where the eON client should be executed. All of the needed client files will be placed in this directory. The script must return the job ID of the submitted job, which is how eON keeps track of jobs internally.
- field type: Literal['local', 'cluster', 'mpi'] = 'local'#
- Options:
‘local’: The local communicator runs the calculations on the same computer that the server is run on.
‘cluster’: A job scheduler can be used to run jobs through user supplied shell scripts.
‘mpi’: Allows the server and clients to run as an MPI job.
Communicator type
Examples#
An example communicator section using the local communicator with an eON client
binary named eonclient-custom, found either in the $PATH or in the same
directory as the configuration file, making use of 8 CPUs:
[Communicator]
type = "local"
client_path = "eonclient-custom"
number_of_CPUs = 8
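When individual jobs are cheap, such as saddle searches with empirical potentials, bundling amortizes the client startup cost over several searches. A hypothetical fragment combining bundling with the local communicator might look like the following (the values are illustrative, not recommendations):

```toml
[Communicator]
type = "local"
# Four client processes run simultaneously on the server machine.
number_of_CPUs = 4
# Each client invocation performs 10 jobs of the same type.
jobs_per_bundle = 10
# Number of jobs run every time the program is invoked (local communicator).
num_jobs = 40
```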
Additional Topics#
Changed in version 2.0: Potentials that can run in parallel, such as those accessed through ASE (e.g., ORCA), are always run in parallel; for the others, there is little to no benefit from this additional overhead.
MPI#
Warning
Not tested on 2.0; only AKMC was ever supported.
The MPI communicator allows the server and client to be run as an MPI job. The number of clients that are run, and thus the number of jobs, is set at runtime by the MPI environment.
An MPI-aware client must be compiled; it is named eonclientmpi instead of eonclient and can only be used to run MPI jobs.
To run eON with MPI, two environment variables must be set: EON_NUMBER_OF_CLIENTS determines how many of the ranks become clients, and EON_SERVER_PATH is the path to the server Python script. In MPI mode the clients are started instead of the server, and one of them becomes the server process. Currently only AKMC is supported. Below is an example of running with the MPI communicator:
#!/bin/bash
export EON_NUMBER_OF_CLIENTS=7
export EON_SERVER_PATH=~/eon/akmc.py
mpirun -n 8 ~/eon/client/eonclientmpi
Cluster#
Warning
Not tested on 2.0
An example communicator section for the cluster communicator using the provided
sge6.2 scripts and a name prefix of al_diffusion_:
[Communicator]
type = "cluster"
name_prefix = "al_diffusion_"
script_path = "/home/user/eon/tools/clusters/sge6.2"
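The three scripts named by queued_jobs, cancel_job, and submit_job must follow the interface described above: submit_job.sh receives the job name and working directory and prints the job ID, queued_jobs.sh prints the IDs of queued and running jobs, and cancel_job.sh takes a job ID. A minimal sketch for a SLURM queue follows; the sbatch/squeue/scancel commands and all file names here are assumptions to adapt for your scheduler, not part of the eON distribution (which ships SGE scripts):

```shell
#!/bin/bash
# Write hypothetical SLURM versions of the three cluster scripts into a
# directory that could then be pointed to by script_path.
mkdir -p cluster_scripts

# submit_job.sh: $1 = job name, $2 = working directory; must print the job ID.
cat > cluster_scripts/submit_job.sh <<'EOF'
#!/bin/bash
cd "$2"
# sbatch prints "Submitted batch job <id>"; eON needs only the <id>,
# so extract the fourth whitespace-separated field.
sbatch --job-name="$1" --wrap="eonclient" | awk '{print $4}'
EOF

# queued_jobs.sh: print the IDs of all queued and running jobs, one per line.
# This may include non-eON jobs, which is acceptable.
cat > cluster_scripts/queued_jobs.sh <<'EOF'
#!/bin/bash
squeue -u "$USER" -h -o "%i"
EOF

# cancel_job.sh: $1 = job ID to cancel.
cat > cluster_scripts/cancel_job.sh <<'EOF'
#!/bin/bash
scancel "$1"
EOF

chmod +x cluster_scripts/*.sh
```

With these in place, script_path would be set to the cluster_scripts directory and type to "cluster".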