Sign up to receive notifications
Occasionally there are hardware or software failures, or we need to
take down nodes or related services for updates and maintenance. To
ensure you are aware of these changes, please sign up to
the mailing list at
https://mailman.ic.ac.uk/mailman/listinfo/ae-nektar-compute
This is only for service notifications, should be low traffic, and
you can unsubscribe at any time.
Compute Node Instructions
NOTE: Nodes are gradually being transitioned to use the Dept of Aeronautics infrastructure. This means:
- You will log in using your Imperial College credentials
- Your home directory is now different, so you will need to transfer files across
- Only group storage located on the departmental infrastructure is accessible
Please contact c.cantwell@imperial.ac.uk to be added to the access list for these nodes.
The following compute nodes are available:
- farringdon.ae.ic.ac.uk
- blackfriars.ae.ic.ac.uk (ElectroCardioMaths) (College login)
- clapham.ae.ic.ac.uk (with 2 x Intel Xeon Phi)
- kingscross.ae.ic.ac.uk
- charingcross.ae.ic.ac.uk
- londonbridge.ae.ic.ac.uk (AMD system)
- fenchurchstreet.ae.ic.ac.uk (Yongyun)
- paddington.ae.ic.ac.uk
Nodes are equipped with different numbers of processing cores. Use
lscpu to find the number of logical cores ("CPU(s)").
These will show up as 32/48/72/96 cores, but this figure counts
hardware threads, and two hardware threads share an arithmetic unit
on the CPU. Therefore, for compute-intensive tasks, 32 cores will not
provide twice the compute capacity of 16 cores. Please be conscious of
other users on the system and check whether the node is already in use
(using top) before running compute-intensive jobs.
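As a quick pre-flight check before launching a heavy job, you can compare the core count with the current load (a sketch; /proc/loadavg is the Linux load-average file that tools like top and uptime read):

```shell
# Logical core count (hardware threads), as a bare number.
nproc

# Current load averages (1, 5 and 15 minutes); if the first figure is
# already close to the core count, the node is busy.
cat /proc/loadavg
```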
Access
Access is via SSH or RDP from the campus network (i.e. from a
College computer or the VPN, if from outside). Allocation of
compute nodes to specific users is currently entirely informal.
SSH access
To login via SSH you must have an ae-nektar account. All users with
access to ae-nektar can also log into all compute nodes.
For example: ssh username@paddington.ae.ic.ac.uk
RDP access
For users who prefer (or need) a graphical environment, a desktop
session can be accessed using the standard Remote Desktop Protocol (as
used by Windows Remote Desktop).
When logging out, please select the Logout option.
DO NOT select shutdown or restart: this would take the node down for
all other users!
If you close the remote desktop viewer on your local computer without
selecting Logout, your session will continue running. Connecting as
normal will reconnect you to your existing session.
Sessions which have been idle for 14 days or more will be
automatically terminated to free resources for other users.
Changing your password
Once you have logged into one of the compute nodes you can change your
password by running the passwd command from a terminal. You will
be prompted to enter your old password first and then supply a new
password. The new password must:
- Be non-trivially different from your old password
- Contain characters from at least three of the character groups:
lowercase letters, uppercase letters, numbers, symbols.
Uppercase letters at the beginning and numbers at the end do not
count!
- Be at least 8 characters long
Note: If you access storage from a Windows computer (specifically
for the hhecm / NEKTAR drive), you must also change your password for
that separately using the smbpasswd command.
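The character-class rule above can be illustrated with a small shell check (illustrative only; the function name check_classes is an example, and the real validation is done by the system when you run passwd):

```shell
# Illustrative check of the "three of four character classes" rule.
# Returns success only if the candidate password is at least 8
# characters long and draws on at least three character classes.
check_classes() {
  pw="$1"; n=0
  case "$pw" in *[a-z]*) n=$((n+1));; esac          # lowercase letters
  case "$pw" in *[A-Z]*) n=$((n+1));; esac          # uppercase letters
  case "$pw" in *[0-9]*) n=$((n+1));; esac          # numbers
  case "$pw" in *[!a-zA-Z0-9]*) n=$((n+1));; esac   # symbols
  [ "${#pw}" -ge 8 ] && [ "$n" -ge 3 ]
}
```

Note that this sketch cannot check the first rule (being non-trivially different from the old password); that comparison is also done by the system.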
Available Software
Some standard and non-standard software is provided as modules which
can be loaded by the user. For instance the Boost libraries and Matlab
are available as modules. Here are some useful commands:
- List available modules: module avail
- Load a module: module load module-name
- Unload a module: module unload module-name
- Show loaded modules: module list
Matlab
To run Matlab in the graphical environment:
- Open a terminal
- Load the Matlab module: module load matlab
- Run matlab: matlab
Storage
The following network directories are shared between all compute nodes:
- /home (2.0TiB)
- /storage/scratch (20TiB)
- /storage/hhecm (24TiB) - ECM group
- /storage/*-scratch
The storage for these directories is shared among all users so please
do not abuse it. If no space is left on /home, other users may
not be able to log in or use the systems. A quota system is in place
which limits usage to 50 GB; if this quota is exceeded for more than
7 days, that user can write no further files to the filesystem.
Here are some useful commands:
- du -sh - check how much space the current directory occupies
- df -h - check how much space is left on shared and local storage
- quota -s - (ae-nektar.ae.ic.ac.uk only) check how much of your quota you are using
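For example, to see which directories in your home directory take the most space (a sketch using standard GNU tools):

```shell
# Largest top-level directories in your home directory, biggest last.
du -h --max-depth=1 "$HOME" 2>/dev/null | sort -h | tail -n 5

# Free space on the filesystem that holds your home directory.
df -h "$HOME"
```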
Each compute node also has its own local storage, /scratch,
which is much faster to access. This storage should only be used
temporarily while running jobs. Please move files back to the
network-based /storage/* shares when jobs are complete.
Usage / Workflow
In an ideal use case, a user script would copy the necessary files
from /home or /storage/scratch to /scratch and
run the compute job from there. At the end, the result is copied back to
the appropriate network share.
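This workflow can be sketched as a small wrapper script (a sketch: the function name run_job, the paths and the solver command are illustrative, not tools provided on the nodes):

```shell
#!/bin/bash
# Sketch of the stage-in / run / stage-out pattern described above.
run_job() {
  local src="$1"      # job directory on network storage
  local scratch="$2"  # per-job directory on fast local /scratch
  shift 2
  mkdir -p "$scratch"
  cp -r "$src"/. "$scratch"/   # stage input files to local scratch
  ( cd "$scratch" && "$@" )    # run the job from local scratch
  cp -r "$scratch"/. "$src"/   # copy results back to the network share
}

# Example (paths and solver command are placeholders):
# run_job /storage/scratch/me/job1 /scratch/me/job1 \
#     IncNavierStokesSolver Simulation.xml
```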
Project directories exist for different groups and research projects
and are shared between the people in that group. This enables
better organisation of the data store and continuity within projects
between existing and new users.
Please use these project directories where applicable/appropriate.
Ask for advice if you think a project space would be appropriate for
your work.
Disconnecting while running tasks
For long-running tasks it is desirable to leave them running while
detaching the remote connection. There are three approaches to
achieving this:
- Use 'screen'. The screen command is a terminal
multiplexer and allows the user to switch between multiple
virtual terminals within their real terminal. You can also
detach from the screen session and reattach at a later point, and
tasks running in the virtual terminals will continue running. After
running screen you can use a combination of
CTRL+a, <key> to create and manage additional
virtual terminals. Some common tasks are:
- CTRL-a c : Create a new virtual terminal
- CTRL-a <SPACE> : Cycle through virtual
terminals
- CTRL-a d : Detach from the screen session and leave
tasks in the virtual terminals running.
To reattach to the screen session run 'screen -rx'.
- Use RDP access. If the session is closed (by closing the
RDP window) without using the logout button, the session will
stay active. Reconnecting to the machine (from the same
computer) should restore the previous session. Tasks running in
that session will continue running while the RDP session is
disconnected. As noted above, sessions which have remained
disconnected for 14 days will be automatically terminated.
- Use nohup. The nohup command runs a program immune to
the hangup signal (SIGHUP). This can be used to prevent
background jobs from terminating when logging off from an SSH
server.
Example on how to use: nohup IncNavierStokesSolver
Simulation.xml >& output.txt &
The file output.txt contains the output that would be seen on the
terminal if nohup were not used. To check the progress of the
simulation, open the output file with 'vim output.txt' and press
'G' to jump to the end of the file, or use 'ls -l' to check when
the file was last updated.
If you wish to terminate the nohup process, first find the
process ID number (PID) of the task with the command 'ps -ef'
(note that if you are running a process in parallel there will be
multiple PIDs). The process can then be killed with
kill <PID number>
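The nohup workflow, including stopping the job afterwards, can be sketched end to end (a sketch: 'sleep 300' stands in for a real solver command, and here the PID is captured directly with $! rather than looked up via ps -ef):

```shell
# Start a long job immune to hangups; "sleep 300" stands in for a real
# solver command such as the IncNavierStokesSolver example above.
nohup sleep 300 > output.txt 2>&1 &
pid=$!                  # remember the PID of the background job
kill -0 "$pid"          # signal 0: just check that the job is alive

# ... later, when you want to stop it:
kill "$pid"
wait "$pid" 2>/dev/null || true
```

If the job was started in an earlier session, $! is no longer available and ps -ef, as described above, is the way to find the PID.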
Questions / Problems
If you require additional software installed, contact Chris:
c.cantwell [at] imperial.ac.uk. For all other issues, please
contact your supervisor/line-manager in the first instance, who can
redirect you to the most appropriate person for your needs.