You might have data that is rarely, if ever, needed, but that you can't delete. You may want to remove it from the cluster storage to save on disk usage fees. Below are two approaches we suggest to achieve this.
We recommend you have two high-quality copies of all original data and difficult-to-reproduce data, and that they reside in different physical locations.
A regular USB hard drive you bought on Amazon does NOT count as a high-quality copy.
We suggest you implement two of the approaches described below, or something similar.
Purchase a small, good-quality desktop RAID system to store your data. This type of device is typically called a NAS (Network-Attached Storage), and you can configure it with as many drives as you need. Buy 3.5-inch enterprise-class (aka server-class) drives and set them up in a redundant RAID configuration (RAID level 5 at least; level 6 would be better). This means that if one of the disks in the system fails, the others will maintain the data and you can replace the bad drive without losing any data. However, you must have someone check the system's condition periodically, and you should set up email and other alerts so that it tells you when there is an issue. Most hard drives fail within 5 years of production.
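When sizing such a system, it helps to remember that redundancy costs capacity: RAID 5 gives up the equivalent of one drive to parity and RAID 6 gives up two. The following is a minimal sketch of that arithmetic, assuming identical drives; the function name `raid_usable_tb` and the four-drive, 4 TB example are illustrative assumptions, not a recommendation for a specific configuration.

```python
def raid_usable_tb(num_drives: int, drive_tb: float, level: int) -> float:
    """Approximate usable capacity (TB) of a RAID 5 or RAID 6 array of identical drives."""
    # RAID 5 spends one drive's worth of space on parity, RAID 6 spends two.
    parity_drives = {5: 1, 6: 2}
    if level not in parity_drives:
        raise ValueError("this sketch only handles RAID 5 and RAID 6")
    # RAID 5 needs at least 3 drives, RAID 6 at least 4.
    min_drives = parity_drives[level] + 2
    if num_drives < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    return (num_drives - parity_drives[level]) * drive_tb


if __name__ == "__main__":
    # Hypothetical example: four 4 TB drives.
    for level in (5, 6):
        usable = raid_usable_tb(4, 4.0, level)
        print(f"RAID {level}: 4 x 4 TB drives -> {usable:.0f} TB usable")
```

With four 4 TB drives this gives 12 TB usable under RAID 5 (survives one drive failure) and 8 TB under RAID 6 (survives two). Actual usable space will be somewhat lower once filesystem overhead is taken into account.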
We've had good experiences with the Synology DS (DiskStation) series of RAID systems, for example the DS416. These products have a good user interface and support connection over the network via NFS (Linux/OS X), iSCSI (Linux/OS X), rsync (Linux/OS X), SFTP, and Windows File Services. However, they do not support use as a directly connected drive over USB.
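Whichever protocol is used to copy data to the NAS, it can be worth confirming that the copy is complete before deleting anything from the cluster. Below is a minimal sketch of one way to do that by comparing SHA-256 checksums. It assumes the NAS share is mounted as a local directory (for example over NFS). The names `sha256_of` and `find_mismatches` and the script name are hypothetical; they are not part of any Synology tooling.

```python
import hashlib
import sys
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir: Path, copy_dir: Path) -> List[str]:
    """Return relative paths that are missing from copy_dir or differ in content."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(str(rel))
    return problems


if __name__ == "__main__":
    if len(sys.argv) == 3:
        bad = find_mismatches(Path(sys.argv[1]), Path(sys.argv[2]))
        print("\n".join(bad) if bad else "all files match")
    else:
        print("usage: python verify_copy.py SOURCE_DIR COPY_DIR")
```

The check only runs in one direction: it reports files from the source that are missing or different in the copy, which is the relevant question before removing the source. Reading every file twice is slow for large datasets, so this is best run once, right after the copy finishes.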
The PMACS archive service provides very easy access to a modern, robot-controlled, high-availability tape archiving system. It offers a simple filesystem-view interface with simple file retrieval, and custom Linux commands are provided for making archive copies. Note that this is an archiving service and is not meant to be a regular backup service. You are able to retrieve files, but such retrievals are expected to be rare.
Pricing is $0.015/GB/mo = $0.18/GB/year = $180/TB/year. This is a great price!
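To estimate a bill for a particular dataset, the per-GB monthly rate can simply be scaled by size and duration. The sketch below assumes decimal units (1 TB = 1000 GB), which is what the $180/TB/year figure implies, and a rate that stays constant. The function name `archive_cost_usd` and the 5 TB, 3-year example are illustrative only.

```python
PRICE_PER_GB_MONTH = 0.015  # USD, the rate quoted above


def archive_cost_usd(size_tb: float, years: float) -> float:
    """Estimated cost of archiving size_tb (decimal TB) for the given number of years."""
    size_gb = size_tb * 1000  # assumes 1 TB = 1000 GB
    months = years * 12
    return size_gb * months * PRICE_PER_GB_MONTH


if __name__ == "__main__":
    # Hypothetical example: 5 TB kept for 3 years.
    cost = archive_cost_usd(5, 3)
    print(f"5 TB for 3 years: ${cost:,.2f}")
```

For the hypothetical 5 TB kept for 3 years this comes to $2,700, and 1 TB for 1 year reproduces the $180 figure.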
Your data is stored on mirrored tapes, meaning there is a redundant copy on a different set of tapes. However, both copies reside in the same physical system, so a catastrophic event that destroys the system or the data center will wipe out all of your data stored there.
HIPAA-protected data: The system is not yet HIPAA-compliant, but the approval is “well underway and nearing completion” as of Fall 2015 (still no resolution as of 11/12/15).
To create a user account, PMACS needs the following information:
User Info:
PI Info:
Contact: pmacshpc@med.upenn.edu
For more information, see PMACS HPC Services and the HPC:Archive System Wiki.