Important data stored on Minerva can be protected by archiving it in the IBM Tivoli Storage Manager (TSM) system (renamed Spectrum Protect after v8.1.7) installed on a Minerva server.
The TSM system will create two long-term tape copies of your data. One copy will be stored in the IBM tape library that is part of the Minerva complex and be available for rapid recall; the second copy will be stored off-site in a secure data vault for disaster protection. Data on both copies will be encrypted to deter unauthorized access.
Data retention policy
Archived data has a retention time of 6 years, after which it is deleted. Please check the expiration date of your archived files; this is the responsibility of the user!
How to access TSM
The TSM client, or Spectrum Protect client v8.1.7, is installed on all internal login nodes (minerva13 and minerva14) and on the data nodes. Users can issue the archive commands, dsmc or dsmj, from either of the internal login nodes.
TSM cannot be accessed from external login nodes
Trying to use one of these commands on an external login node will result in a “Command not found” response.
Data that is archived is grouped in the TSM system by nodes. A node is an abstraction and can be physically many things. On Minerva, each user is considered a node to the TSM system and the node identity for each user is the userid.
The TSM system can be accessed via either a GUI or the command line. The command-line mode is particularly useful when archiving large datasets because the command can be issued inside a screen session. The screen can then be detached and the command can run unattended for the hours it may take to archive the data.
Tar small files before archive.
Every file that is archived is entered into a database. To prevent overflowing this database, we ask that you first use the tar command to bundle small files into a unix tar archive and then archive the tar file to TSM. For information about tar, see the man pages ( man tar ) on Minerva.
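As a rough illustration of the bundling step, the sketch below packs a directory of small files into one tar file and lists its contents. The function name bundle_dir, the demo directory, and the file names are made up for this example; only the tar options are standard.

```shell
# Bundle one directory of small files into a single tar file.
# Usage: bundle_dir <source_dir> <output_tar>   (hypothetical helper)
bundle_dir() {
    local src_dir=$1 out_tar=$2
    # -C keeps the paths inside the tar file relative to the parent directory
    tar -cf "$out_tar" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Demonstration on throwaway data (hypothetical file names)
demo=$(mktemp -d)
mkdir "$demo/small_files"
for i in 1 2 3; do
    echo "sample $i" > "$demo/small_files/file_$i.txt"
done

bundle_dir "$demo/small_files" "$demo/small_files.tar"

# List the bundle's contents to confirm everything was captured
tar -tf "$demo/small_files.tar"
```

The resulting single tar file, rather than the individual small files, is what you would then archive to TSM from an internal login node.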
Command line and screen are recommended for large data archiving or retrieval.
Using the GUI for large data archiving or retrieval is not recommended because you would have to keep the interactive session open until all of the data are archived. Instead, start a “screen” session and issue the command on the command line to perform the archiving. You can then detach the screen session and the command will continue executing.
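One possible way to script this is sketched below. It starts the command in an already detached screen session instead of detaching by hand. The helper run_detached, the session name tsm_archive, and the path are placeholders, and the sketch assumes the dsmc client does not need to prompt you interactively (for example, for a password); if it does, start screen interactively, run the command, and detach with Ctrl-a d instead.

```shell
# Launch a command in a detached screen session.
# Usage: run_detached <session_name> <command> [args...]   (hypothetical helper)
# With DRY_RUN=1 the screen command line is printed instead of executed.
run_detached() {
    local session=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "screen -dmS $session $*"
    else
        # -d -m starts the session already detached; -S gives it a name
        screen -dmS "$session" "$@"
    fi
}

# Print what would be run (the path is a placeholder)
DRY_RUN=1 run_detached tsm_archive dsmc archive /path/to/bundle.tar

# Later: "screen -ls" lists sessions; "screen -r tsm_archive" reattaches
```

Drop DRY_RUN=1 to actually launch the session. A session started this way closes when the command finishes, so redirect the output to a log file if you want to inspect it afterwards.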
Long retrieval time is expected.
Due to the large amount of archived data and the number of tapes, most of the tapes are stored in a cabinet rather than in the TSM library. Our operators get an email notification when you issue a retrieve request, and they then fetch the desired tape and load it into the library. This process is manual, and the response time for the operators is one and a half hours. During this time the process shows “[ -]” without progressing.
Once the tape is loaded into the TSM library, the library will automatically mount the tape and read its data. This data transfer time is reasonably fast.
Note that tape check-in errors may also occur when there are simultaneous retrieval requests. If you get an error such as “data is unavailable”, please send in a ticket and we will be happy to resolve it for you.
Warning: If you specify that files should be deleted automatically after archiving and then subsequently delete the archive object, the data will be permanently lost.
Frequently Asked Questions
For a more extensive discussion of using TSM, see IBM Spectrum Protect Manual
How to use TSM Backup
You can set up a cron job on one of the login nodes to run the backup. The main command for the cron job is “incr”, which stands for “incremental backup”:
dsmc incr -se=yournode /sc/orga/projects/PVG/ -sub=yes
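As a minimal sketch of the cron setup, the snippet below assembles a crontab line around the command above. The schedule (02:00 daily) and the log file location are assumptions chosen for illustration, and cron may need the full path to dsmc if it is not on cron's default PATH.

```shell
# Hypothetical schedule: every day at 02:00
schedule="0 2 * * *"
# The incremental backup command from this page
backup_cmd="dsmc incr -se=yournode /sc/orga/projects/PVG/ -sub=yes"
# Hypothetical log location; \$HOME stays literal so cron's shell expands it
log_file="\$HOME/tsm_backup.log"

cron_line="$schedule $backup_cmd >> $log_file 2>&1"
echo "$cron_line"

# To install, append the line to your existing crontab:
#   (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
```

Appending output to a log file gives you a record of each run that you can check for errors.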
To query backup
dsmc q backup /path/to/file
To restore a file
dsmc restore /path/to/file /different-restore-path-if-needed
The retention policy for backup
You can set up a cron job to connect to the backup node. It will compare the files on tape with those on orga and will keep a certain number of copies of the updated files:
Policy for backup is as follows:
If a file exists on Minerva, 14 versions will be retained.
If a file is deleted from Minerva, 2 versions will be retained.
If more than one version is on backup the older versions will be deleted after 30 days.
If only one version remains on backup it will be kept for 90 days and then deleted.
For an explanation of these numbers please see table 2 here
Basically, it will keep the most recent version of each file, and 14 older versions for 90 days.
It also keeps 2 versions of deleted files for 30 days if you deleted the original files on Minerva. The next time you make a connection to TSM, the corresponding files on tape will also be deleted once they are 30 days old.
To archive or backup?
If you have files that are constantly changing (e.g., projects you are currently developing), you can choose to back them up. If you have finished your project and won't need it for a long time, but would like to keep it for a while, you can archive it and free up the disk space.
When starting a project, the typical use is to initially archive all of the raw data, then back up all of your work files for the project daily. Once the project is completed, archive all required data and delete it from disk.
Backup compared to archive
You can safely remove your files from Minerva after you have archived them to TSM. The data stays on tape for 6 years. For backup, if you delete your original files, the backup data will be removed after 30 days.
How to access other users’ archived files
What to do if you receive Permission Denied
During retrieval you may encounter the following error message: ANS1590W I/O error writing file attribute: security.selinux for: /dir-to-file/file. errno = 13, Permission denied.
This error occurs because TSM failed to write some extended security information for your file; we have that feature turned off. The same error message may also appear when untarring tarballs that you imported from another installation. Despite this error message, the retrieval will continue and the data will not be affected.
Error message with file currently unavailable on server
Users are able to query the archived file, but the retrieve fails with error message:
11/14/17 18:03:31 ANS4035W File ‘filename’ currently unavailable on server.
This happens either because the tape information is not correct or because the tape did not load into the library correctly. Please send in a ticket and we will fix it for you.
Error message with file write protected and unable to retrieve to the disk
Users retrieving a file may get an error message stating that the file is “write protected” and cannot be written to the designated directory. TSM may ask for options, but choosing “Force an overwrite for this object” does not work. Specifying a different destination directory for retrieval, one for which the user has write permission, produces the same error.
This normally happens because the write permission bit was removed from the file when it was archived. Normally, this kind of file can be written to /hpc and /tmp, but not to GPFS file systems such as /sc/hydra, since GPFS has high security settings on the top layer. It is not possible to change the file permission once the file is archived, but we can provide a workaround.
If the file is small, please check the space in your home directory (/hpc/users/userid) or in the /tmp directory on the login nodes (use “df -h” and look for the available size in the “/” directory). Retrieve to one of these two directories first and then move the file to your desired directory. Please monitor the usage of /tmp constantly and do not let it go over 70%.
If you have a large file to retrieve that cannot fit in these two directories, please send in a ticket and the admins will retrieve it for you.
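The space check described above could be scripted roughly as follows. The helper usage_percent is a made-up name, and the 70% limit is the threshold mentioned above; df -P is used so that the output columns are in a fixed, portable layout.

```shell
# Print the usage percentage (integer, without the % sign) of the
# filesystem that holds the given path.   (hypothetical helper)
usage_percent() {
    # With df -P, line 2 is the data row and field 5 is the capacity, e.g. "45%"
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

limit=70
used=$(usage_percent /tmp)

if [ "$used" -ge "$limit" ]; then
    echo "/tmp is ${used}% full: do not retrieve here"
else
    echo "/tmp is ${used}% full: OK for a small retrieval"
fi
```

The same function can be pointed at your home directory to check it before retrieving there.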
Error message with exceeded maximum number of mount points
Users may get the following error when using TSM:
ANS0326E This node has exceeded its maximum number of mount points
This happens because a maximum of 4 TSM connections are allowed at a time for each node/user. Both dsmc and dsmj commands count as TSM connections. Please limit your concurrent TSM connections to 4 and the error will go away. You may also want to check whether there are orphan TSM processes from your earlier TSM activities. Terminating these orphan processes will also free up TSM connections.
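To look for leftover client processes, something like the sketch below may help. The helper count_tsm_clients is a made-up name, and the sketch assumes the clients appear in the process list under the names dsmc and dsmj; a GUI client that runs under another process name (for example, java) would not be counted.

```shell
# Count the lines on standard input that name a TSM client process.
# (hypothetical helper; expects one command name per line)
count_tsm_clients() {
    # -x matches whole lines only; grep exits 1 on zero matches, hence || true
    grep -c -x -E 'dsmc|dsmj' || true
}

# Real usage on a login node:
#   ps -u "$(id -un)" -o comm= | count_tsm_clients
# Demonstration with a made-up process list:
printf '%s\n' bash dsmc screen dsmc vim | count_tsm_clients
```

If the count is higher than the number of sessions you expect, listing the processes with ps shows their process IDs, and the orphaned ones can then be ended with kill.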
Can I keep my archived data over the 6 years’ retention time?
For files needed past their expiration date, we suggest you retrieve those files and archive them again. This is good practice for two reasons: