fileaccess [2016/10/24 18:10] mgstauff [File Transfers & Accessing Data Drives] |
fileaccess [2018/03/26 14:57] (current) mgstauff [What server to transfer to/from ?] |
====== File Transfers & Accessing Data Drives ====== | ====== File Transfers & Accessing Data Drives ====== |
| |
====== Transfers To/From Non-Penn Collaborators ====== | ---- |
| |
If you have large amounts of data to transfer to/from collaborators outside of Penn, you can: | |
| |
| |
| |
| ====== Transfers To/From Non-CfN-Cluster Collaborators ====== |
| |
| If you have large amounts of data to transfer to/from collaborators outside of the CfN cluster, you can: |
| |
- use hard drives to transfer the data | - use hard drives to transfer the data |
- contact the sysadmins to discuss using our secure FTP server | - contact the sysadmins to discuss using our secure FTP server |
| |
====== Transfering To/From Your Cluster Account ====== | ====== Transferring from PACS/Sectra or Scanners ====== |
| |
| Data from PACS and scanners can be transferred directly to the cluster. [[pacs| See the PACS page]]. |
| |
| ====== Transferring To/From Your Cluster Account ====== |
| |
Generally you'll need to move files between your local/desktop computer and data directories on the cluster. There are a number of ways to do this: | Generally you'll need to move files between your local/desktop computer and data directories on the cluster. There are a number of ways to do this: |
170.212.169.225 - /data/picsl, /data/grossman | 170.212.169.225 - /data/picsl, /data/grossman |
170.212.169.49 - /data/tesla-data, /data/tesla-home | 170.212.169.49 - /data/tesla-data, /data/tesla-home |
| crich - /data/jux |
| |
---- | ---- |
===== scp - NOT RECOMMENDED ===== | ===== scp - NOT Highly Recommended ===== |
The Linux/Unix/Mac OSX ''scp'' command is not recommended for larger data transfers, because it does not verify the data integrity upon receipt at the destination. Note that TCP/IP packet-level checksums will be used for any internet traffic, but scp does not compute and compate a checksum on transferred data like ''rsync'' does. Generally, use ''rsync'' instead. | The Linux/Unix/Mac OSX ''scp'' command is not recommended for larger data transfers, because it does not verify the data integrity upon receipt at the destination. Note that TCP/IP packet-level checksums will be used for any internet traffic, but scp does not compute and compare a checksum on transferred data like ''rsync'' does. Generally, use ''rsync'' instead. |
---- | ---- |
===== rsync - secure remote copy ===== | ===== rsync - secure remote copy ===== |
| |
===Linux & Mac OSX=== | ===Linux & Mac OSX ( & Windows ) === |
| |
This is a powerful command line program for copying files between computers on the network. The basics are like this: | |
| |
rsync -a path-to-files-on-your-computer yourusername@chead:/data/your-data-directory/destination-directory | To use ''rsync'' on __Windows__, [[logging_in#windows_and_the_linux_bash_shell|see here]]. |
| |
| This is a powerful command line program for copying files between computers on the network. The recommended command for transferring to the cluster is this: |
| |
| rsync -prltD --chmod=Dug+rwx,Dg+s,Fug+rw,o-rwx <path-to-files-on-your-computer> <yourusername>@chead:</data/your-data-directory/sub-directory> |
| |
''-a'' - this option tells rsync to recursively copy all files and directories that you specify in the command, and to preserve file ownership and creation/modification dates. Also, **symlinks** are copied as symlinks, meaning the directory or file to which the symlink points is not copied. **NOTE** however that the ''-a'' option includes the options to copy permissions and user/group ownership. In some cases you may not want this, see below. | The options specified above tell rsync to recursively copy all files and directories that you specify in the command, to set file and directory permissions in a way that's appropriate for most cluster directories (without copying user/group ownership from the source), and to preserve creation/modification dates. |
| |
''path-to-files-on-your-computer'' is the path to the files on your computer that you want to copy to chead. **NOTE** that you should NOT have a ''/'' slash at the end of the path, if you want the directory itself to be copied to your destination. If you do have a ''/'' slash at the end, only the contents of the directory will be copied, and not the directory itself. Here's the example of this from the ''rsync'' man page: | Also, **symlinks** are copied as symlinks, meaning the directory or file to which the symlink points is not copied. |
| |
| ''<path-to-files-on-your-computer>'' is the path to the files on your computer that you want to copy to chead. **NOTE** that you should NOT have a ''/'' slash at the end of the path, if you want the directory itself to be copied to your destination. If you do have a ''/'' slash at the end, only the contents of the directory will be copied, and not the directory itself. Here's the example of this from the ''rsync'' man page: |
| |
| |
rsync -av /src/foo/ /dest/foo | rsync -av /src/foo/ /dest/foo |
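The trailing-slash rule described above can be tried safely on a local machine before running a real transfer. This is a minimal sketch, assuming ''rsync'' is installed locally; the scratch directory and the names ''dest_a'', ''dest_b'' and ''scan.txt'' are made up for the demonstration and are not cluster paths.

```shell
#!/bin/sh
# Hypothetical scratch area, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/src/foo" "$demo/dest_a" "$demo/dest_b"
echo "sample" > "$demo/src/foo/scan.txt"

# No trailing slash on the source: the directory foo itself
# is created inside dest_a.
rsync -a "$demo/src/foo" "$demo/dest_a"

# Trailing slash on the source: only the contents of foo
# are copied into dest_b.
rsync -a "$demo/src/foo/" "$demo/dest_b"

ls "$demo/dest_a"   # lists: foo
ls "$demo/dest_b"   # lists: scan.txt
```

The scratch directory is left in place so the result can be inspected; remove it when done.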
| |
''yourusername@chead'' tells rsync to login on chead using your username | ''<yourusername>@chead'' tells rsync to login on chead using your username |
| |
'':/data/your-data-directory/destination-directory'' is the path to your data directory on chead, e.g. /data/jet/mgstauff/destination | '':</data/your-data-directory/sub-directory>'' is the path to your data directory on chead, e.g. /data/jet/mgstauff/destination |
| |
===Permissions and Ownership considerations=== | ===Permissions and Ownership considerations=== |
As mentioned above, the ''-a'' option is an aggregate option, and among others it includes the ''-p'' option to preserve file permissions, and the ''-o'' and ''-g'' options to preserve file user and group ownership. This means the permissions, user and group from the source files will be copied to the destination directory. Sometimes this is what you want. Other times, you don't want it. | |
| Typically, users simply use the ''-a'' option to ''rsync'' rather than the detailed options above. The ''-a'' option is an aggregate option, and among others it includes the ''-p'' option to preserve file permissions, and the ''-o'' and ''-g'' options to preserve file user and group ownership. This means the permissions, user and group from the source files will be copied to the destination directory. Sometimes this is what you want. Other times, you don't want it. |
| |
For example, if you have a data dir on your local machine that you're sync'ing to your cluster data dir whenever you acquire new data, you may want different permissions/ownership on the cluster. You may have files that haven't changed locally, but on the cluster you've changed their group ownership to allow other users to access them. When you next ''rsync'' from your local dir and use just the ''-a'' option, the group ownership on the cluster will revert to what's on your local machine. You may also want different group permissions on the cluster, to facilitate sharing with other users. | For example, if you have a data dir on your local machine that you're sync'ing to your cluster data dir whenever you acquire new data, you may want different permissions/ownership on the cluster. You may have files that haven't changed locally, but on the cluster you've changed their group ownership to allow other users to access them. When you next ''rsync'' from your local dir and use just the ''-a'' option, the group ownership on the cluster will revert to what's on your local machine. You may also want different group permissions on the cluster, to facilitate sharing with other users. |
| |
To overcome this, you may use these options, or some combination: | To overcome this, you may use the options listed above, or some combination: |
| |
rsync -prltD --chmod=Dug+rwx,Dg+s,Fug+rw,o-rwx | rsync -prltD --chmod=Dug+rwx,Dg+s,Fug+rw,o-rwx |
| |
The ''-a'' option is really an aggregate of these options: ''-rlptgoD''. So above, I've passed all those manually except ''-o, -g''. This means that ownership on the cluster will be preserved for any files that already exist there. Also if you're copying into a directory that uses the setgid bit (sometimes loosely called the group 'sticky bit') to make all new files be owned by the directory's group (as we do for group data directories), then new files will get the appropriate group on the cluster. | The ''-a'' option is really an aggregate of these options: ''-rlptgoD''. So above, I've passed all those manually except ''-o, -g''. This means that ownership on the cluster will be preserved for any files that already exist there. Also if you're copying into a directory that uses the setgid bit (sometimes loosely called the group 'sticky bit') to make all new files be owned by the directory's group (as we do for group data directories), then new files will get the appropriate group on the cluster. |
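To see what the ''--chmod'' rules do to permissions, the options can be tried on a local copy first. This is only a sketch, assuming ''rsync'' is installed locally; the directories ''local/study'' and ''cluster'' are hypothetical stand-ins for a desktop data dir and a cluster data dir.

```shell
#!/bin/sh
# Hypothetical scratch area standing in for "local" and "cluster" dirs.
demo=$(mktemp -d)
mkdir -p "$demo/local/study" "$demo/cluster"
echo "sample" > "$demo/local/study/scan.txt"

# Make the source file private to its owner (mode 600).
chmod 600 "$demo/local/study/scan.txt"

# Same options as recommended above: copy permissions (-p) but not
# user/group ownership (no -o, -g), then adjust the permissions with
# --chmod so user and group can read/write and others get nothing.
rsync -prltD --chmod=Dug+rwx,Dg+s,Fug+rw,o-rwx \
    "$demo/local/study" "$demo/cluster"

# The source stays 600; the copy is group readable/writable.
ls -l "$demo/local/study/scan.txt"
ls -l "$demo/cluster/study/scan.txt"
```

The copied file should show ''-rw-rw----'' while the source remains ''-rw-------''. Whether the setgid bit (''Dg+s'') actually sticks on the copied directory depends on your group membership for that directory, so check it with ''ls -ld'' on the real destination.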
| |
Be sure to choose the ''SFTP'' protocol during setup. The ''FTP'' protocol won't work. | Be sure to choose the ''SFTP'' protocol during setup. The ''FTP'' protocol won't work. |
| |
=== Linux & Mac OSX === | === Linux & Mac OSX ( & Windows with Cygwin or Windows Subsystem For Linux ) === |
| |
You can also run ''sftp'' from the terminal command line for a text-based version. | You can also run ''sftp'' from the terminal command line for a text-based version. |
| |
| To use ''sftp'' from the terminal on Windows, [[logging_in#windows_and_the_linux_bash_shell|try here]]. |
| |
---- | ---- |
===== SSHFS - Directly connect/mount your data directory ===== | ===== SSHFS - Directly connect/mount your data directory ===== |