public:usage:fuchs (revision 2024/03/11 20:45, geier)
====== FUCHS Cluster Usage ======
<note tip>The cluster was updated to a new operating system (AlmaLinux 9). Please help us to help you. In the beginning such a transition is usually very stressful, because we get bombarded with tickets along the lines of "my software ran before, but now nothing works anymore". It can be necessary to set up software, e.g. Spack, from scratch. Also, some old SSH keys using the RSA cipher might not work anymore. Please give us some time to rearrange our documentation, and please discuss problems within your group first (maybe team members have already found a solution) before opening tickets. The fewer tickets we get, the faster we can provide a working cluster for everybody. Also see **common errors**.</note>
[[..:
===== Login =====
<
Am I connected to the right server? Please find our SSH host key fingerprints here:
++++fuchs.hhlr-gu.de fingerprints|
<WRAP center round info 90%>The ''
**ECDSA SHA256:
**ECDSA MD5:
**ED25519 SHA256:
**ED25519 MD5:
</WRAP>
++++
<note important>
You may receive a warning from your system that something is wrong with the security ("maybe somebody is eavesdropping"). This typically happens because the host keys changed with the update. In that case, remove the outdated key by running

ssh-keygen -R fuchs.hhlr-gu.de

and accept the new fuchs.hhlr-gu.de key. Above you will find our unique ECDSA and ED25519 fingerprints. Some programs tend to display the fingerprint in the SHA256 or MD5 format. Just click on //fuchs.hhlr-gu.de fingerprints// above and compare.

Please check with ''
</note>
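If you are unsure how the two fingerprint formats relate, you can reproduce both locally with ''ssh-keygen''. A minimal sketch, using a throwaway key (not the cluster's real host key):

```shell
# Create a throwaway ed25519 key pair just for demonstration
tmpdir=$(mktemp -d)
ssh-keygen -t ed25519 -N '' -f "$tmpdir/demo_key" -q

# Default fingerprint format is SHA256 ...
ssh-keygen -lf "$tmpdir/demo_key.pub"

# ... and the same key shown in MD5 format, as some clients display it
ssh-keygen -E md5 -lf "$tmpdir/demo_key.pub"

rm -rf "$tmpdir"
```

Both commands print the fingerprint of the same key, only the hash format differs.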
On Windows systems please use/install a Windows SSH client (e.g. PuTTY,
After your [[first login]] you will get the message that your password has expired and you have to change it. Please use the password provided by CSC at the prompt, choose a new one and retype it. You will be logged out automatically. Now you can log in with your new password and work on the cluster.
<note warning>
\\
''
on the command line. On the login node, any process that exceeds the CPU-time limit (e.g. a long-running test program or a long-running rsync) will be killed automatically.</note>
===== Environment Modules =====
There are several versions of software packages installed on our systems. The same name for an executable (e.g. ''mpirun'') and/or library file may be used by more than one package. The environment module system, with its ''module'' command, lets you select which version is active in your environment.
If you want to know more about module commands, the ''
===== Working with Intel oneAPI =====
With the command ''module load intel/<version>'' you load the Intel oneAPI environment, which makes additional oneAPI modulefiles available:

<note tip>To avoid errors, please use version numbers instead of ''latest''.</note>
<code>
module load intel/
module avail
...
# new modules available within the Intel oneAPI modulefile
----------------------------------- /
advisor/
advisor/
ccl/
ccl/
compiler-rt/
compiler-rt/
compiler-rt32/
compiler-rt32/
compiler/
compiler/
compiler32/
compiler32/
dal/
dal/
debugger/
debugger/

Key:
loaded
</code>
===== Compiling Software =====
You can compile your software on the login nodes (or on any other node, inside a job allocation).
  * GNU compilers
For the right compilation commands please consider:
<
[[https://www.intel.com/content/
</
To build and manage software which is not available as a module, you can use e.g. Spack.
===== Storage =====
{{ :
By default, the space in your home directory is limited to 30 GB, and in your scratch directory to 5 TB and/or 800000 inodes (which corresponds to approximately 200000+ files). You can check your homedir and scratch usage by running the ''
<note important>
While the data in your home directory is backed up nightly (please ask, if you want us to restore anything from there), there is no backup of your scratch directory.</note>
If you need local storage on the compute nodes, you have to add the ''
<code bash>
  * [[#
For every compute job you have to submit a job script (unless working interactively using [[#
sbatch jobscript.sh
on a login node. A SLURM job script is a shell script
#SBATCH ...
#SBATCH --nodes=3
#SBATCH --ntasks=60
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=512
#SBATCH --time=00:
#SBATCH --no-requeue
#SBATCH --mail-type=FAIL
#SBATCH --extra-node-info=2:
srun hostname
</code>
1) For SLURM, a CPU core (a CPU thread, to be more precise) is a CPU.\\
2) Prevent the job from being requeued after a failure.\\
3) Send an e-mail if something goes wrong.\\
4) Run job without
The ''
Although nodes are allocated exclusively,
After saving the above job script as e.g. ''
==== Job Monitoring ====
For job monitoring (to check the current state of your jobs) you can use the ''
If you need to cancel a job, you can use the ''
#SBATCH --partition=fuchs
</code>
to the job script when they want to use the FUCHS cluster.
In this (SIMD) example we assume that there is a program (called ''
==== Job Arrays ====
#SBATCH --mem-per-cpu=2000
#SBATCH --time=00:
#SBATCH --array=0-399:20
#SBATCH --mail-type=FAIL
</code>
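The step syntax ''--array=0-399:20'' makes SLURM start array tasks with IDs 0, 20, 40, ..., 380. Assuming each task then processes the 20 consecutive inputs following its ID (which the step of 20 suggests), the distribution can be sketched with plain shell arithmetic, independent of SLURM:

```shell
# Simulate the task IDs produced by --array=0-399:20 and the input
# range each task would cover (hypothetical 20-inputs-per-task scheme)
for task_id in $(seq 0 20 399); do
  first=$task_id
  last=$((task_id + 19))
  echo "array task ${task_id}: inputs ${first}..${last}"
done
```

This yields 20 array tasks covering inputs 0..399 without overlap.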
If the task running times vary a lot, consider using the //thread pool pattern//. Have a look at **GNU parallel**, for instance.
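A minimal sketch of the thread pool pattern using ''xargs -P'' (GNU parallel works similarly; the sleep durations here are toy stand-ins for tasks of varying length):

```shell
# Eight toy "tasks" of varying duration; at most 4 run concurrently.
# As soon as one worker finishes, it picks up the next pending task,
# so short tasks do not leave workers idle behind long ones.
printf '%s\n' 2 1 3 1 2 1 3 1 \
  | xargs -P 4 -I{} sh -c 'sleep 0.{}; echo "task with duration 0.{} done"'
```

Note that the completion order is nondeterministic, which is exactly the point of the pattern.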
==== OpenMP Jobs ====
#SBATCH --time=48:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./your_omp_program
</code>
==== MPI Jobs ====
**Remember:
See also: http://
As an example, we want to run a program that spawns 80 MPI ranks, allocating 1200 MB of RAM for each rank.
<code bash>#
module load mpi/
export OMP_NUM_THREADS=1
mpirun ./your_mpi_program
</code>
<note tip>
srun [--mpi=pmix] ./your_mpi_program
</note>
MPI implementations are typically designed to work seamlessly with job schedulers like Slurm. When you launch MPI tasks with ''
==== Hybrid Jobs: MPI/OpenMP ====
MVAPICH2 example script (20 ranks, 5 threads each and 200 MB per thread, i.e. 1 GB per rank; so, for 20*5 threads, you'll get five 20-core nodes):
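The sizing in that sentence can be sanity-checked with plain shell arithmetic (the 20-core node size is taken from the sentence above; variable names are illustrative):

```shell
ranks=20            # MPI ranks (--ntasks)
threads_per_rank=5  # OpenMP threads per rank (--cpus-per-task)
mb_per_thread=200   # --mem-per-cpu
cores_per_node=20   # cores per FUCHS node in this example

total_threads=$((ranks * threads_per_rank))
mb_per_rank=$((threads_per_rank * mb_per_thread))
# Round up when the thread count is not a multiple of the node size
nodes=$(( (total_threads + cores_per_node - 1) / cores_per_node ))

echo "${total_threads} threads, ${mb_per_rank} MB per rank, ${nodes} nodes"
# -> 100 threads, 1000 MB per rank, 5 nodes
```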
<code bash>#
#SBATCH --partition=fuchs
#SBATCH --ntasks=20
#SBATCH --cpus-per-task=5
#SBATCH --mem-per-cpu=200
which disables this feature. The OS scheduler is then responsible for the placement of the threads and may dynamically change it during the runtime of the program. This leads to cache invalidation,
==== Memory Allocation ====
Normally the memory available per CPU thread is the total amount of RAM divided by the number of threads: for instance, 128 GB / 40 threads = 3.2 GB per thread. Keep in mind that the FUCHS cluster provides two threads per core. Now imagine you need more memory, let's say 8192 MB per task. Type ''
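For a hypothetical node with 128 GB of RAM and 40 CPU threads, the numbers above work out as follows (plain shell arithmetic; in practice the usable amount is a bit lower because the operating system reserves some memory, which is why the example below requests only 15 tasks per node):

```shell
total_mb=$((128 * 1024))   # 128 GB expressed in MB: 131072
threads=40

# Default memory share per CPU thread
echo $((total_mb / threads))   # -> 3276 (about 3.2 GB)

# Raw upper bound of 8192 MB tasks fitting on one node
echo $((total_mb / 8192))      # -> 16
```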
<code bash>
#!/bin/bash
#SBATCH --job-name=<
#SBATCH --partition=fuchs
#SBATCH --ntasks=69
#SBATCH --cpus-per-task=1

##SBATCH --mem-per-cpu=8192
# therefore it's commented out.

#SBATCH --mem=0
#SBATCH --ntasks-per-node=15

srun hostname

# If everything works fine, you were granted 5 nodes, for example 4 nodes à 14 tasks and 1 node à 13 tasks:
# 56 tasks + 13 tasks = 69 tasks, as requested.
</code>
==== Local Storage ====
For interactive workflows you can use SLURM'
<code>
salloc: Granted job allocation
salloc: Waiting for resource configuration
salloc: Nodes node27-[012-015] are ready for job
[user@fuchs ~]$
</code>
Now you can ''
<code>
[user@node27-012 ~]$ hostname
node27-012
[user@node27-012 ~]$ exit
logout
Connection to node27-012 closed.
</code>
Or you can use ''
- | < | + | < |
- | node45-002.cm.cluster | + | node27-013 |
- | node45-003.cm.cluster | + | node27-012 |
- | node45-005.cm.cluster | + | node27-015 |
- | node45-004.cm.cluster | + | node27-014 |
- | [user@loginnode ~]$ | + | |
</ | </ | ||
Finally you can terminate your interactive job session by running ''exit'':
- | < | + | < |
- | salloc: Relinquishing job allocation | + | exit |
- | [user@loginnode ~]$ | + | salloc: Relinquishing job allocation |
</ | </ | ||