Revision as of 10:47, 4 August 2010 by Isaac (Talk | contribs) (How to manage really large files)

Introduction

At the introductory level, one should read this knowledge base entry to learn how to organize files at SHARCNET.

If you have demanding storage needs, we recommend speaking with a consultant to determine the best approach for your work.

How to manage really large files

Several SHARCNET systems use the SFS file system (an HP variant of an earlier version of Lustre). This means the storage presented as /work and /scratch is actually the amalgamation of the storage on several file servers (technically called Object Storage Targets, or OSTs). By default, files are not split across the file servers; each file resides on just one of them. This means

  • a file can only be as large as the remaining space on whichever server it is stored on, and
  • it is possible to run out of space despite there appearing to be lots of free space,

where the latter occurs because the file server holding the file becomes full (which is why the system reports that it is out of space) even though other servers still have space (which is why the system also reports free space).

In short, to handle large files effectively, the system needs to be told to split them across multiple file servers. This is done with the following command:

lfs setstripe <file or directory> <size> -1 <servers>

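where <file or directory> is the file or directory to split, <size> is the stripe size, i.e. the number of bytes written to one file server before moving on to the next (a suffix of k, m or g is accepted), -1 is the stripe index (the OST that receives the first stripe; -1 leaves the choice to the file system), and <servers> is the number of file servers to split across. Note that the third argument is a minus one, not a minus followed by the letter l.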

For example, on Saw you could try

lfs setstripe /scratch/isaac/test/ 4m -1 8

This sets an 8-way stripe on the directory: files created in it from now on are striped across 8 file servers. Please note that each system has a slightly different version of lfs. You can check the exact syntax with the built-in help:

[isaac@saw377 ~]$ lfs
lfs > help setstripe
setstripe: Create a new file with a specific striping pattern or
set the default striping pattern on an existing directory or
delete the default striping pattern from an existing directory
usage: setstripe <filename|dirname> <stripe_size> <stripe_index> <stripe_count>
      or 
      setstripe <filename|dirname> [--size|-s stripe_size]
                                   [--index|-i stripe_index]
                                   [--count|-c stripe_count]
      or 
      setstripe -d <dirname>   (to delete default striping)
      stripe_size:  Number of bytes on each OST (0 filesystem default)
      Can be specified with k, m or g (in KB, MB and GB respectively)
      stripe_index: OST index of first stripe (-1 filesystem default)
      stripe_count: Number of OSTs to stripe over (0 default, -1 all)


If you want to restripe an existing file on Whale, you still need to copy it, because changing the stripe settings does not move data that has already been written. For example, to restripe "foo", first set the striping on the directory, then run "mv foo foo-old && cp foo-old foo && rm foo-old". The 'cp' is necessary in order to create a new file, which then picks up the new stripe settings.
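
The same mv / cp / rm sequence can be wrapped in a small function that refuses to clobber a leftover copy and restores the original if the copy fails. This is only a sketch: restripe_file is a hypothetical name, the -p flag (keep mode and timestamps) is an addition to the one-liner above, and the directory is assumed to have had its striping set already.

```shell
# restripe_file FILE
# Rewrite FILE so that it picks up the striping currently set on its
# directory. Needs enough free space for a second copy while it runs.
restripe_file() {
    file=$1
    old="$file-old"
    # Refuse to overwrite a leftover from an earlier, interrupted run.
    if [ -e "$old" ]; then
        echo "restripe_file: $old already exists" >&2
        return 1
    fi
    mv -- "$file" "$old" || return 1
    # cp creates a brand-new file, which inherits the directory striping.
    # -p keeps the mode and timestamps of the original.
    if cp -p -- "$old" "$file"; then
        rm -- "$old"
    else
        # Copy failed (for example, out of space): put the original back.
        rm -f -- "$file"
        mv -- "$old" "$file"
        return 1
    fi
}

# Usage: restripe_file /scratch/isaac/test/foo
```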

Saw is the best system on which to practice this. For example, Saw currently has 93TB free, and every OST has at least 2TB available, which means that the "minimum maximum" size of an 8-way striped file is 16TB (8 × 2TB). The maximum effective size of a file is the stripe count times the lowest available space on any of its OSTs.
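
The rule in the last sentence is simple arithmetic, sketched below. The function name max_striped_size and the per-OST figures are made up for illustration; on a real system the free space per OST would have to be read from the file system (stock Lustre reports it with "lfs df", if the installed version provides that subcommand).

```shell
# max_striped_size COUNT FREE...
# Print COUNT times the smallest FREE value: the largest file that a
# COUNT-way stripe can hold when its OSTs have the given free space.
# FREE values are whole numbers in one unit of your choice (e.g. TB).
max_striped_size() {
    count=$1
    shift
    [ "$#" -gt 0 ] || return 1
    min=$1
    for free in "$@"; do
        if [ "$free" -lt "$min" ]; then
            min=$free
        fi
    done
    echo $((count * min))
}

# Eight OSTs with illustrative free space in TB; the fullest has 2 TB left.
max_striped_size 8 12 9 2 14 11 13 10 12    # prints 16
```

Note that the result depends only on the fullest OST in the set, not on the total free space.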

The following points apply:

  • specifying a directory causes all newly created files under it to be split (most likely what you want),
  • a smaller size or more servers improves performance because multiple servers can stream data at once, but also costs performance through added overhead (a reasonable starting point is a size on the order of 1-4MB and 4-8 servers),
  • more servers increase the maximum file size by decreasing the amount stored on each file server, but can also decrease it by making it more likely that a nearly full server will be in the set, and
  • there is a disadvantage to striping across too many servers: if any of the OSTs that the file is on goes down for some reason, any access to the file will hang. Access usually recovers once the problem is fixed.

Further SHARCNET-specific hints and tips

Some further SHARCNET information that can help to get you started:

Other resources

These two articles provide a good walk-through of using strace to identify and measure I/O activity: