== User environment ==
Zeroth, set up your own system (on Linux) by adding a host definition for circe to your $HOME/.ssh/config file. Then you don't have to keep typing circe's full hostname and your username when running ssh from Linux.
Host rc
    User <username on circe>
    Hostname rcslurm.rc.usf.edu
    ServerAliveInterval 30
    ServerAliveCountMax 120
    ForwardX11 yes
You may need:
mkdir -p $HOME/.ssh && chmod 700 $HOME/.ssh
vi $HOME/.ssh/config   # ':q' to exit
chmod 600 $HOME/.ssh/config
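With this in place, you can reach the cluster using the short host alias. A minimal usage sketch (the rc alias comes from the config above; the file name is a placeholder):
<source lang="bash">
# Connect to circe using the "rc" host alias from ~/.ssh/config
ssh rc

# Copy a job script to your home directory on the cluster (placeholder file name)
scp job.sh rc:
</source>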
The SLURM job status command is squeue. A helpful alias to monitor your own jobs is:
alias myq="squeue -u $USER"
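To have the alias available in every session, you can append it to your shell startup file; a small sketch assuming bash and ~/.bashrc:
<source lang="bash">
# Make the myq alias permanent (assumes bash; adjust for your shell)
echo 'alias myq="squeue -u $USER"' >> ~/.bashrc
source ~/.bashrc
</source>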
== Queue ==
Here's a basic template for queuing a job using MPI (here 2 whole nodes and 6h max run-time).
<source lang="bash">
#!/bin/bash
#SBATCH -J test
#SBATCH -N 2 -t 6:00:00

module load mpi/openmpi/1.4.5 compilers/intel/11.1.064

start=`date +%s`
mpirun parallel-executable
end=`date +%s`
echo "Job completed in $((end-start)) seconds."
</source>
By default, SLURM jobs start in the same directory from which sbatch was invoked.
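For jobs that don't use MPI, the same pattern works on a single core. A minimal sketch of a serial variant (serial-executable is a placeholder for your own program, and you may not need any module lines):
<source lang="bash">
#!/bin/bash
#SBATCH -J serial_test        # job name
#SBATCH -n 1                  # one task (single core)
#SBATCH -t 1:00:00            # 1 hour max run-time

# Run a single-process program (placeholder name)
./serial-executable
</source>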
A few more useful options are:
#SBATCH -o output_log_name.log
#SBATCH --mem=2000
The -o option specifies the name of the output log file instead of the default (names auto-generated from the job number, e.g. slurm-22425.out), and --mem requests memory for the job (in MB). By default, the log files include both standard output and standard error from the job.
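Putting these together, a job header might look like the sketch below (the values are illustrative; %j is SLURM's filename pattern for the job ID):
<source lang="bash">
#!/bin/bash
#SBATCH -J test
#SBATCH -N 2 -t 6:00:00
#SBATCH -o test_%j.log     # %j expands to the job ID, e.g. test_22425.log
#SBATCH --mem=2000         # request 2000 MB of memory
</source>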
Submit with
<source lang="bash">
sbatch -p saturn job.sh
</source>
The -p saturn option selects the default queue and can be left out. Other possible values select different node partitions:
- "jupiter": 444 cores; preemptable by "deadline" QOS (currently inactive).
- "saturn": 280 cores; default; preemptable by "deadline" QOS (currently inactive).
- "neptune": 168 cores; preemptable by "deadline" QOS (currently inactive).
- "hii_broad": 80 cores; testing "contributor" hardware pool; preemptable by "hii_broad" QOS (active).
- "titan": 16 cores; no preemption; 128 GB RAM for large memory jobs.
- "pluto": 8 cores; no preemption.
To check job execution and status, use squeue, or the myq command defined above as an alias (e.g. in your .bashrc).
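A couple of common invocations (the job ID shown is just the example number from earlier):
<source lang="bash">
myq                 # your own jobs (alias for: squeue -u $USER)
squeue -j 22425     # status of one specific job by ID
squeue -l           # long output format with time limits and job state details
</source>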
For more info, see LLNL's Slurm Quickstart Guide.
Here's a run-down of some of the environment variables available while the job is running (useful for scripting); see sbatch's manpage for more:
- SLURM_JOB_NAME - Name of the job.
- SLURM_JOB_ID - The ID of the job allocation.
- SLURM_CPUS_ON_NODE - Number of CPUS on the allocated node.
- SLURM_JOB_NODELIST - List of nodes allocated to the job in a compressed format.
- SLURM_JOB_NUM_NODES - Total number of nodes in the job’s resource allocation.
- SLURM_JOB_CPUS_PER_NODE - Count of processors available to the job on this node.
- SLURM_SUBMIT_DIR - The directory from which sbatch was invoked.
- SLURM_JOB_PARTITION - Name of the partition in which the job is running.
- SLURM_LOCALID - Node local task ID for the process within a job.
- SLURM_GTIDS - Global task IDs running on this node. Zero origin and comma separated.
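A short sketch of how these might be used inside a job script, e.g. to label log output and return to the submission directory:
<source lang="bash">
#!/bin/bash
#SBATCH -J env_demo

# Record where and what is running, using variables SLURM sets for the job
echo "Job $SLURM_JOB_NAME ($SLURM_JOB_ID) on partition $SLURM_JOB_PARTITION"
echo "Running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"

# Jobs already start in the submission directory by default, but being
# explicit does no harm if the script changes directories along the way
cd "$SLURM_SUBMIT_DIR"
</source>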