Communicator#
eON has a server-client architecture for running its calculations. The
simulation data is stored on the server; clients are sent jobs and return the
results. Each time eON is run, it first checks whether any results have come
back from clients, processes them accordingly, and then submits more jobs if
needed. There are several ways to run jobs in eON: locally on the server, via
MPI, or through a job queuing system such as SGE.
Note
As of version 2.0, we recommend using dedicated workflow management tools (such as AiiDA, Snakemake, or Fireworks) instead of having eON generate submission scripts.
Configuration#
[Communicator]
- pydantic model eon.schema.CommunicatorConfig[source]#
- Config:
use_attribute_docstrings: bool = True
- Fields:
- field cancel_job: str = 'cancel_job.sh'#
Name of the script that cancels a job for the cluster communicator. It takes a single argument, the job ID.
- field client_path: str = 'eonclient'#
Path to the eON client binary. If only a name and not a path is given, eON looks for the binary in the same directory as the configuration file. If it is not found there, the directories in the $PATH environment variable are searched.
- field jobs_per_bundle: int = 1#
Number of jobs per bundle. In eON, a job is a task that eonclient executes, such as a process search or a parallel replica run. Sometimes it makes sense to run more than one job of the same type at a time. For example, when using empirical potentials to do saddle searches, a single search might take only a few seconds on a modern CPU. To improve performance, more than one client job (e.g., process search, dimer, minimization) can be bundled and run at the same time.
- field max_jobs: int = 0#
Maximum number of AKMC jobs that can be running at once for the current state. For communicators with queues (cluster), no more jobs will be queued if the number of queued and in-progress jobs equals or exceeds this number. The default of 0 means unlimited.
- field name_prefix: str = 'eon'#
Prefix added to job names to make them identifiable by the user for the cluster communicator.
- field num_jobs: int = 1#
Number of jobs. The meaning of this option depends on the communicator type. For local, it is the number of jobs run every time the program is invoked. For cluster, it is the desired sum of queued and running jobs.
- field number_of_CPUs: int = 1#
Number of jobs that will run simultaneously for the local communicator.
- field queued_jobs: str = 'queued_jobs.sh'#
Name of the script that returns the job IDs of all the running and queued jobs for the cluster communicator. Note that it may return more than just eON jobs.
- field script_path: str = './'#
Path to the user-defined scripts for submitting jobs to the communicator for the cluster communicator.
- field submit_job: str = 'submit_job.sh'#
Name of the script that submits a single job to the queuing system for the cluster communicator. It takes two command-line arguments. The first is the name of the job; this is not required for eON to function, but is highly recommended so that users can identify which job is which. The second argument is the working directory, the path where the eON client should be executed. All of the needed client files will be placed in this directory. The script must return the job ID of the submitted job, which is how eON keeps track of jobs internally.
- field type: Literal['local', 'cluster', 'mpi'] = 'local'#
- Options:
‘local’: The local communicator runs the calculations on the same computer that the server is run on.
‘cluster’: A job scheduler can be used to run jobs through user supplied shell scripts.
‘mpi’: Allows the server and clients to run as an MPI job.
Communicator type
Examples#
An example communicator section using the local communicator with an eON client
binary named eonclient-custom, found either in the $PATH or in the same
directory as the configuration file, making use of 8 CPUs:
[Communicator]
type = "local"
client_path = "eonclient-custom"
number_of_CPUs = 8
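When individual jobs are cheap, such as saddle searches with empirical potentials, bundling amortizes the client startup cost over several searches. A hypothetical fragment combining bundling with the local communicator might look like the following (the values are illustrative, not recommendations):

```toml
[Communicator]
type = "local"
# Four client processes run simultaneously on the server machine.
number_of_CPUs = 4
# Each client invocation performs 10 jobs of the same type.
jobs_per_bundle = 10
# Number of jobs run every time the program is invoked (local communicator).
num_jobs = 40
```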
Additional Topics#
Changed in version 2.0: Potentials that can run in parallel, such as those accessed through ASE (e.g., ORCA), are always run in parallel; for the others, there is little to no benefit from this additional overhead.
MPI#
Warning
Not tested on 2.0; only AKMC was ever supported.
The MPI communicator allows the server and client to be run as an MPI job. The number of clients that are run, and thus the number of jobs, is set at runtime by the MPI environment.
An MPI-aware client must be compiled; it is named eonclientmpi instead of eonclient and can only be used to run MPI jobs.
To run eON with MPI, two environment variables must be set: EON_NUMBER_OF_CLIENTS determines how many of the ranks become clients, and EON_SERVER_PATH is the path to the server Python script. In MPI mode the clients are started instead of the server, and one of them becomes the server process. Currently only AKMC is supported. Below is an example of running with the MPI communicator:
#!/bin/bash
export EON_NUMBER_OF_CLIENTS=7
export EON_SERVER_PATH=~/eon/akmc.py
mpirun -n 8 ~/eon/client/eonclientmpi
Cluster#
Warning
Not tested on 2.0
An example communicator section for the cluster communicator using the provided
sge6.2 scripts and a name prefix of al_diffusion_:
[Communicator]
type = "cluster"
name_prefix = "al_diffusion_"
script_path = "/home/user/eon/tools/clusters/sge6.2"
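The three scripts named by queued_jobs, cancel_job, and submit_job must follow the interface described above: submit_job.sh receives the job name and working directory and prints the job ID, queued_jobs.sh prints the IDs of queued and running jobs, and cancel_job.sh takes a job ID. A minimal sketch for a SLURM queue follows; the sbatch/squeue/scancel commands and all file names here are assumptions to adapt for your scheduler, not part of the eON distribution (which ships SGE scripts):

```shell
#!/bin/bash
# Write hypothetical SLURM versions of the three cluster scripts into a
# directory that could then be pointed to by script_path.
mkdir -p cluster_scripts

# submit_job.sh: $1 = job name, $2 = working directory; must print the job ID.
cat > cluster_scripts/submit_job.sh <<'EOF'
#!/bin/bash
cd "$2"
# sbatch prints "Submitted batch job <id>"; eON needs only the <id>,
# so extract the fourth whitespace-separated field.
sbatch --job-name="$1" --wrap="eonclient" | awk '{print $4}'
EOF

# queued_jobs.sh: print the IDs of all queued and running jobs, one per line.
# This may include non-eON jobs, which is acceptable.
cat > cluster_scripts/queued_jobs.sh <<'EOF'
#!/bin/bash
squeue -u "$USER" -h -o "%i"
EOF

# cancel_job.sh: $1 = job ID to cancel.
cat > cluster_scripts/cancel_job.sh <<'EOF'
#!/bin/bash
scancel "$1"
EOF

chmod +x cluster_scripts/*.sh
```

With these in place, script_path would be set to the cluster_scripts directory and type to "cluster".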