experiment management and process schedule
sigclear-experiment aims to help researchers and engineers with their experiments.
An engineering experiment in sigclear-experiment is defined as
a set of experimental processes and a set of QC figures.
This package provides two high-level SCons commands,
Process and Figure, for describing experiments.
It automatically analyzes the dependencies,
schedules the experimental processes, and distributes them across clusters.
Features
sigclear-experiment defines only two SCons commands. They are high-level and easy to use, yet powerful enough for complicated scientific or engineering experiments.
Process
A process consists of some target datasets (targets), some source datasets (sources), and the programs to produce the targets using the sources. A Process has the following prototype:
Process(targets, sources, programs, options)
The options can be blank for most cases.
automatic dependency
sigclear-experiment automatically derives the dependency relationships among the target and source datasets of all processes. The targets of a Process depend on both the sources and the programs, including their parameters. When a parameter of a Process changes, only that process and the processes that depend on it are scheduled to rerun and have their targets updated; all other processes are left untouched.
process schedule and distribution
Dependent processes are automatically scheduled one after another, while independent processes can be scheduled simultaneously. In a cluster configuration, processes with high computational cost can be automatically distributed across multiple computing nodes to speed up the whole experiment.
Figure
QC figures are generated by the command Figure, defined as follows
Figure(targets, sources, programs, options)
The sources can be omitted
when they have the same trunk name as the targets:
Figure(targets, programs, options)
Figure is implemented as an alias of Process, but with different default values for some options.
list of options
option | Process | Figure | description |
---|---|---|---|
sprefix | data/ | data/ | sources prefix |
tprefix | data/ | data/ | targets prefix |
ssuffix | .sg | .sg | sources suffix |
tsuffix | .sg | .ps | targets suffix |
verb | True | False | verbose for this Process or Figure |
stdin | True | True | use the first source as stdin pipe |
stdout | True | True | output the first target to stdout |
nodes | None | NA | nodes dictionary for parallel processing |
ngroup | 0 | NA | total groups for parallel processing |
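The table can be read as two sets of keyword defaults that an individual call may override. A minimal sketch of that reading in plain Python follows. The two dictionaries copy the table, with the NA entries for Figure left out; `resolve_options` is a hypothetical helper name used only for illustration and is not part of sigclear-experiment.

```python
# Defaults copied from the options table (Process column).
PROCESS_DEFAULTS = {
    "sprefix": "data/", "tprefix": "data/",
    "ssuffix": ".sg", "tsuffix": ".sg",
    "verb": True, "stdin": True, "stdout": True,
    "nodes": None, "ngroup": 0,
}

# Figure column: only tsuffix and verb differ; nodes and ngroup are NA.
FIGURE_DEFAULTS = {
    "sprefix": "data/", "tprefix": "data/",
    "ssuffix": ".sg", "tsuffix": ".ps",
    "verb": False, "stdin": True, "stdout": True,
}


def resolve_options(defaults, **overrides):
    """Hypothetical helper: merge per-call overrides into a defaults dict."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unsupported options: {sorted(unknown)}")
    # Later keys win, so overrides replace the defaults.
    return {**defaults, **overrides}


figure_opts = resolve_options(FIGURE_DEFAULTS, tprefix="fig/")
```

Here `figure_opts` keeps the Figure default for `tsuffix` and takes the overridden `tprefix`.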
Get Started
design the first experiment
To start a new experiment, create a folder for it, then create an SCons script file named SConstruct in that folder:
- mkdir test
- cd test
- vi SConstruct
An example SConstruct file is as follows
from experiment import *
Process('test2', 'test1',
'sgcreate vectors=data2')
Process('test1', None,
'sgcreate nx=100')
To execute the experiment, run the command scons on the command line
- scons
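In the example SConstruct, test2 is declared before test1 but reads it as a source, so test1 has to be produced first. The dependency analysis itself is done by the package together with SCons; the sketch below only illustrates the idea with Python's standard `graphlib` module (Python 3.9 or later). The `deps` mapping and the `needs_rerun` helper are illustrative names, not part of sigclear-experiment.

```python
from graphlib import TopologicalSorter

# Each target maps to the set of sources it reads (taken from the example).
deps = {
    "test2": {"test1"},
    "test1": set(),
}

# static_order() yields every dataset after the datasets it depends on,
# regardless of the order in which the processes were declared.
order = list(TopologicalSorter(deps).static_order())
print(order)


def needs_rerun(changed, deps):
    """Return the changed target plus every target that depends on it."""
    stale = {changed}
    grew = True
    while grew:
        grew = False
        for target, sources in deps.items():
            # A target becomes stale as soon as one of its sources is stale.
            if target not in stale and sources & stale:
                stale.add(target)
                grew = True
    return stale
```

Under this sketch, changing the parameters of test1 marks both datasets as stale, while changing the parameters of test2 leaves test1 untouched.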
define customized process
You are free to define customized functions for experiment processes. They let you call the same steps repeatedly within one experiment and keep the main experiment script tidy. You can also share customized functions among multiple experiments, or distribute them to collaborators.
Learn more from an example of loading and pre-processing NASA Lunar spectrum measurements.
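As a smaller illustration, a customized process can be an ordinary Python function that issues several Process calls. The sketch below is hypothetical: `create_and_tag` is an invented name, and it only reuses the two program strings that appear elsewhere in this document. It takes the Process command as an argument so that it can be tried outside SCons with a stand-in that merely records the calls.

```python
def create_and_tag(process, name, nx=100):
    """Hypothetical customized process: create a dataset, then post-process it.

    `process` is expected to behave like Process(targets, sources, programs).
    """
    raw = name + "_raw"
    # Step 1: no sources, so the program generates the dataset from scratch.
    process(raw, None, f"sgcreate nx={nx}")
    # Step 2: read the raw dataset and write the final one.
    process(name, raw, "sgfieldmath head:i=1")
    return name


# Stand-in for Process that only records the calls it receives.
calls = []


def record(targets, sources, programs, **options):
    calls.append((targets, sources, programs))


create_and_tag(record, "a")
create_and_tag(record, "b", nx=200)
```

In an SConstruct one would pass the real command instead, for example `create_and_tag(Process, "a")`, assuming Process accepts the positional arguments shown in its prototype.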
FAQ
How to execute experiment in parallel
To run an experiment with at most 8 parallel processes, use the following command
- scons -j 8
sigclear-experiment automatically chooses how many processes
to run at once, based on the dependencies among the processes in each experiment.
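The combined effect of the job limit and the dependency graph can be pictured as running the experiment in waves: every process whose sources are ready may start, up to the limit given with -j. The sketch below shows this idea with the standard `graphlib` module. It is a simplification rather than the scheduler SCons actually uses (a real scheduler starts a new process as soon as a slot frees up), and the dataset names are made up.

```python
from graphlib import TopologicalSorter


def waves(deps, max_jobs):
    """Group targets into batches that could run at the same time."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # All targets whose sources are already finished.
        ready = sorted(sorter.get_ready())
        # Never start more than max_jobs processes at once.
        for i in range(0, len(ready), max_jobs):
            batches.append(ready[i:i + max_jobs])
        sorter.done(*ready)
    return batches


# Hypothetical experiment: three independent datasets feed one summary.
deps = {"a": set(), "b": set(), "c": set(), "sum": {"a", "b", "c"}}
print(waves(deps, max_jobs=8))
print(waves(deps, max_jobs=2))
```

With a limit of 8 the three independent datasets form a single batch even though more slots are available, because the summary cannot start until all of them are done.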
How to use Multiple Inputs Multiple Outputs (MIMO) for a process
Both sources and targets in Process can be a single dataset or a list of datasets. The first dataset in the sources is automatically passed to the standard input of the programs, while the standard output of the programs is redirected to the first dataset in the targets.
How to change suffix and prefix for datasets
The default suffix for sources and targets in Process is .sg, and the default prefix is data/, which means the following process
Process("b", "a", "sgfieldmath head:i=1")
is equivalent to
Process("data/b.sg", "data/a.sg", "sgfieldmath head:i=1")
and is similar to the following command in a shell terminal
- sgfieldmath head:i=1 < data/a.sg > data/b.sg
The source and target prefixes can be customized with the options sprefix
and tprefix respectively,
and the source and target suffixes can be customized with the options
ssuffix and tsuffix. For example,
Process("b", "a", "sgfieldmath head:i=1", tprefix="tmp/")
is equivalent to
- sgfieldmath head:i=1 < data/a.sg > tmp/b.sg
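The prefix and suffix rules can be summarized in a few lines of Python. The function below is a hypothetical helper, not the package's implementation. It assumes a single source and a single target, covers only the four prefix and suffix options, and does not handle names that already carry a prefix or suffix.

```python
def shell_command(target, source, program,
                  sprefix="data/", tprefix="data/",
                  ssuffix=".sg", tsuffix=".sg"):
    """Build the shell line that a one-source, one-target Process resembles."""
    src = f"{sprefix}{source}{ssuffix}"
    tgt = f"{tprefix}{target}{tsuffix}"
    # The first source is fed to stdin; the first target is taken from stdout.
    return f"{program} < {src} > {tgt}"


# Default options, then the same process with a customized target prefix.
print(shell_command("b", "a", "sgfieldmath head:i=1"))
print(shell_command("b", "a", "sgfieldmath head:i=1", tprefix="tmp/"))
```

The two printed lines correspond to the two shell commands shown in this section.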