At the introductory level, one should read this knowledge base entry to learn how to organize files at SHARCNET.
Users with demanding storage needs should speak with a consultant to determine the best approach for their work.
How to manage really large files
Several SHARCNET systems use the SFS file system (an HP variant of an earlier version of Lustre). This means the storage presented as /work and /scratch is actually the amalgamation of the storage on several file servers (technically called an object store). By default, files are not split across the file servers, but reside on one of them. This means
- a file can only be as large as the remaining space on whichever server it is stored on, and
- it is possible to run out of space despite there appearing to be lots of free space,
where the latter occurs because the file server the file is on becomes full (why the system says out of space) despite other servers still having space (why the system says there is still space).
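The second point can be made concrete with a little arithmetic. The sketch below uses four invented free-space figures, which do not describe any real SHARCNET file server, and assumes the unstriped file lives on the first server.

```shell
#!/bin/sh
# Hypothetical free space (in GB) on four file servers; invented numbers.
free_gb="5 400 350 420"

# An unstriped file lives on exactly one server; assume it is the first.
file_server_free=${free_gb%% *}

# The free space shown for the file system is the sum over all servers.
total_free=0
for f in $free_gb; do
    total_free=$((total_free + f))
done

echo "free space shown for the file system: ${total_free} GB"
echo "room left for the unstriped file:     ${file_server_free} GB"
```

With these numbers the file system shows 1175 GB free, yet the file can grow by only 5 GB before the write fails.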
The long and short of it is that, to handle large files effectively, the system needs to be told to split them across multiple file servers. This is done via the following command
lfs setstripe <file or directory> <size> -1 <servers>
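where <file or directory> is the file or directory to split, <size> is the size in bytes of each piece written to one file server before moving on to the next, and <servers> is the number of file servers to split across (note that the third argument is a minus one and not a -l; in this form of the command it sits in the starting-server position and leaves the choice of first server to the system). The following points apply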
- specifying a directory causes all newly created files under it to be split (most likely what you want),
- a smaller size or more servers can improve performance, as multiple servers can stream data at once, but can also reduce it due to overhead (a suggested size might be on the order of 1-4 MB, and servers on the order of 4-8), and
- using more servers increases the maximum file size by decreasing the amount stored on each file server, but also decreases the maximum file size by making it more likely that a full server will be in the set.
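As a worked instance of the command, the following sketch converts a size given in megabytes to bytes and prints the resulting command line rather than running it. The directory name is hypothetical, and the 4 MB size and 8 servers are simply picked from the suggested ranges, not recommendations for any particular workload. The argument order is the one given in this entry; other Lustre versions may expect a different form, so check `lfs help setstripe` on the system in question.

```shell
#!/bin/sh
# Placeholder values; adjust for your own directory and workload.
target_dir="/work/myuser/bigfiles"   # hypothetical directory
stripe_mb=4                          # the <size> argument, expressed in MB
servers=8                            # number of file servers to split across

# The command expects the size in bytes.
stripe_bytes=$((stripe_mb * 1024 * 1024))

# Build the command in the form given in this entry; -1 is minus one.
cmd="lfs setstripe $target_dir $stripe_bytes -1 $servers"

# Print instead of executing, so the line can be checked first.
echo "$cmd"
```

Since the setting applies to newly created files, an existing large file would presumably need to be copied into the directory afterwards to pick up the new layout.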
Further SHARCNET-specific hints and tips
Some further SHARCNET information that can help to get you started:
These two articles provide a good walk-through of using strace to identify and measure I/O activity: