Storage at CARC 

CARC supports a number of different storage devices on which users can read, write, and store their data and results. The devices vary in size, I/O speed, proximity to the process reading or writing them, whether quotas limit their use, and whether they are backed up. Certain storage devices, such as the home directories and the machine-wide scratch disks, are shared resources and serve multiple users simultaneously. This means that problems caused by poorly chosen storage locations can interrupt the work of other users as well as your own. Don't hesitate to open a help ticket if you have any questions about, for example, where to store a large dataset, how to read it quickly, how to write a PBS script that first moves data onto the compute node to take advantage of its fast I/O, or any other issue related to storage at CARC.

Specifically, the four different types of storage are:

  • Home directory - /users/username - 100GB/user - Upon logging into any CARC machine, you will find yourself in your home directory, /users/username, with "username" replaced by your actual username (the home directory is also referred to as "~" and "$HOME"). You may notice that regardless of which machine you log in to, the contents of your home directory are identical. This is because home directories are not part of any specific machine; they are stored on a separate computer entirely and mounted by every head and compute node at CARC. They are backed up weekly and have quotas set so that no single user can store more than 100GB of data there (a quick way to check your current usage is shown after this list).

  • Machine-wide scratch disk - ~/machine-scratch -> /machine/scratch/username - The most common place to store data after it is generated by running calculations and before it is further analyzed and then downloaded, deleted, or moved into some other long-term storage or archival plan. These disks are not backed up, and CARC reserves the right to delete this data without advance warning.

  • Hard drive (only on the compute nodes) - /tmp - ~1TB - On the machines that support this, each compute node has its own hard drive installed, which can be accessed simply by creating a directory in /tmp and placing data there. Since the hard drive is dedicated to that compute node, this is one of the fastest places for I/O.

  • Shared memory (only on the compute nodes) - /dev/shm - This is direct access to the machine's memory for use as storage. Limited to half of the total RAM on a node, the directory at /dev/shm appears to the user as a normal read/write directory just like /tmp, but files written to or read from any directory within it are simply stored in memory as if they were on a disk. This provides extremely fast I/O, and is very useful if small temporary files are written and read often. Be careful with the amount of data you write here, however, because it competes directly with all processes using the compute node's RAM, including yours. Like /tmp, /dev/shm is cleared at the end of a PBS job, so you must move any data you want to keep off the compute node before the end of the calculation or walltime.
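
The commands below are a minimal sketch of one way to check how much space you are using and how much is available in each of these locations, using standard Linux tools (du and df). The quota command assumes the quota utility is configured on the system, and ~/machine-scratch is the symlink described above.

# Check how much of the 100GB home directory quota you are currently using
du -sh $HOME

# If the quota utility is configured, it reports your limits directly
quota -s

# Check free space on the machine-wide scratch disk, via the ~/machine-scratch symlink
df -h ~/machine-scratch

# On a compute node (i.e. inside a PBS job), check the local hard drive and shared memory
df -h /tmp /dev/shm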

Figure 1. The various storage locations plotted by their relative size and I/O speed.

To determine which location to use, it may be helpful to consider which of the following broad categories your data falls into:

 

  • Data that is hard to produce and long-lived - e.g., source code, scripts, documents, results. This data is either used to produce, or is the product of, other calculations and work.

  • Results and temporary data - Data produced by your calculations, such as simulation logs, that is typically further analyzed or has some data of interest extracted and summarized, perhaps even reported in a publication. This data can be regenerated relatively easily by rerunning the calculation that produced it, and is often deleted once it has been further processed or, at the latest, at the end of a project.

 

We recommend that the first type of data be stored in your home directory, which is available from everywhere inside the CARC network and is backed up each week. The second kind of data is best produced and stored on the machine-scratch disks or, in cases of very high I/O, moved to the compute node and onto either the hard drive (assuming the machine in use has an internal hard drive) or shared memory (assuming the data is very small). The obvious benefit of using these devices, which sit very close to the CPU and are typically used by only a single user at a time, is the high read and write speed available. The cost of this speed is that data must first be moved to the compute node if it is to be read by the calculation, and anything intended to be saved must be moved back off the compute node before the end of walltime or before the job ends and the compute node is returned to the general resource pool.

 

An example PBS script that first copies an input file to the compute node, runs the calculation there, and then copies all results back into the directory the job was submitted from is shown below:


#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00
#PBS -N Local_storage

# First define a directory location as a variable, create the directory once
# the job has started, copy the input data there, then cd to it and run
SCRATCH=/tmp/$USER/$PBS_JOBID
mkdir -p "$SCRATCH"
cp -r "${PBS_O_WORKDIR}/large_input_data.dat" "$SCRATCH"
cd "$SCRATCH"

# Now run the calculation
run_my_program

# The job has finished, so copy the data back to where it came from
cp -r "$SCRATCH"/* "$PBS_O_WORKDIR"

# Finally, clean up the local directory on the compute node
rm -r "$SCRATCH"
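
To run it, the script would typically be submitted with qsub from the directory containing large_input_data.dat, so that $PBS_O_WORKDIR points there. If the temporary files are very small, the same staging pattern can be pointed at shared memory instead of the local hard drive. The lines below are a minimal sketch of both steps; the filename local_storage.pbs is only an assumption for illustration, and the shared-memory variant changes nothing but the scratch location.

# Submit the job (assuming the script above was saved as local_storage.pbs)
# from the directory that contains large_input_data.dat
qsub local_storage.pbs

# Variant for very small temporary files: stage through shared memory instead
# of the local hard drive by changing only the scratch location. Anything
# written here counts against the compute node's RAM.
SCRATCH=/dev/shm/$USER/$PBS_JOBID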