//clusterbasics//: current revision 2017/09/25 17:24 by mgstauff (section: Introduction to The Cluster); previous revision 2015/04/28 20:15 by mgstauff.
The cluster is a collection of servers that work together to provide flexible, high-volume computing resources to all researchers at CfN. All of these servers are physically housed together in a server room and are dedicated to running computing jobs. This differs from the old CfN cluster, in which many of the servers were desktop machines in various labs, so your desktop machine could slow down whenever the old cluster ran jobs on it. The new cluster does not have this problem.
The cluster has a single login server, called the **front end** or **head node**. To use the cluster, you log in to the front end (chead.uphs.upenn.edu in our case) with your CfN [[accounts_and_vpn|account]]. You then use a **job scheduler** [[using_ogs_sge|(SGE)]] to either:
  * a) ask for an **interactive login** session, to run something like the Matlab GUI or to use a terminal interactively (i.e. to type commands and run programs directly).
In either case, the job scheduler takes care of finding a compute node to do the actual work of running your batch job or interactive session (both are called '
The cluster runs CentOS 6.9, a version of the Linux operating system.
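As a rough sketch of the two ways of using the scheduler described above: in standard SGE an interactive session is requested with ''qlogin'', and a batch job is a shell script handed to ''qsub''. The script below writes a minimal batch job script and submits it only when ''qsub'' is available. The file and job names are hypothetical, the ''#$'' option lines are generic SGE options rather than settings specific to this cluster, and the ''ssh'' user name is a placeholder; see the [[using_ogs_sge|SGE page]] for how the scheduler is used here.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal SGE batch job script and submit it with qsub.
# Assumes you are logged in to the front end, e.g.
#   ssh your_username@chead.uphs.upenn.edu   (your_username is a placeholder)
# For an interactive session on a compute node you would run instead:
#   qlogin
set -euo pipefail

job_script="hello_job.sh"   # hypothetical file name

# Lines beginning with '#$' are read by SGE as qsub options:
#   -N job name, -cwd run in the current directory,
#   -j y merge stderr into stdout, -o output file.
cat > "$job_script" <<'EOF'
#!/bin/bash
#$ -N hello_job
#$ -cwd
#$ -j y
#$ -o hello_job.log
echo "Running on $(hostname)"
EOF

# Submit only where the SGE client tools exist (i.e. on the cluster).
if command -v qsub >/dev/null 2>&1; then
    qsub "$job_script"
else
    echo "qsub not found; job script written to $job_script"
fi
```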
===== General Cluster Diagram =====
[Figure: general cluster diagram]
===== The Queue - running multiple jobs =====
The great thing about the cluster is that you can submit a lot of jobs at once (thousands, even), and walk away while they all run. By default, up to 32 of them will run at once if there'
**NOTE:** the 32-job limit is actually a 32-//slot// limit. A multi-threaded job can occupy more than one slot, so if you multi-thread your jobs, fewer of them will run at once. See the discussion later in this wiki for details.
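To make the slot arithmetic concrete, the sketch below works out how many multi-threaded jobs fit under the 32-slot limit and prints the kind of ''qsub -pe'' request such a job would use. The parallel environment name ''threaded'', the thread count and ''my_job.sh'' are hypothetical; ''qconf -spl'' lists the parallel environments actually configured.

```shell
#!/usr/bin/env bash
# Sketch: how a per-user slot limit caps concurrent multi-threaded jobs.
set -euo pipefail

slot_limit=32        # default per-user limit described above
threads_per_job=4    # hypothetical: each job runs 4 threads and requests 4 slots

# Each running job holds threads_per_job slots; $(( )) does integer division.
max_concurrent_jobs=$(( slot_limit / threads_per_job ))
echo "With ${threads_per_job} slots per job, at most ${max_concurrent_jobs} jobs run at once."

# Multi-slot jobs are requested through a parallel environment (PE).
# "threaded" is a hypothetical PE name; list the configured ones with: qconf -spl
# my_job.sh is a hypothetical job script; inside it, SGE sets $NSLOTS to the
# number of slots granted, which can be used as the thread count.
pe_name="threaded"
echo "qsub -pe ${pe_name} ${threads_per_job} my_job.sh"
```

With 8 threads per job, the same arithmetic gives 32 / 8 = 4 jobs at once.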
If the cluster is busy, the scheduler may run fewer than 32 of your jobs at once, making sure that every user's queued jobs get run in turn.
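One common way to queue many similar jobs at once is an SGE array job: a single ''qsub -t'' submission creates numbered tasks, and each task finds its own number in ''$SGE_TASK_ID''. In the sketch below the input list, file names and job name are hypothetical; each task picks the line of ''inputs.txt'' that matches its task number, and the scheduler runs as many tasks at a time as your slot limit and the current load allow.

```shell
#!/usr/bin/env bash
# Sketch: queue one task per input file as a single SGE array job.
set -euo pipefail

# Hypothetical input list, one path per line.
printf '%s\n' subj01.nii subj02.nii subj03.nii > inputs.txt
n_tasks=$(wc -l < inputs.txt | tr -d ' ')

# Each task reads the line whose number equals its task index ($SGE_TASK_ID).
cat > array_job.sh <<'EOF'
#!/bin/bash
#$ -N array_demo
#$ -cwd
#$ -j y
input=$(sed -n "${SGE_TASK_ID}p" inputs.txt)
echo "Task ${SGE_TASK_ID} processing ${input}"
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub -t "1-${n_tasks}" array_job.sh   # tasks 1..n_tasks
    qstat -u "$(id -un)"                  # list your queued and running jobs
else
    echo "qsub not found; would run: qsub -t 1-${n_tasks} array_job.sh"
fi
```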
===== Don't (generally) run programs on the front end itself =====
Note that //

**EXCEPTIONS**

If you're running something quick like using fslview or ITK-SNAP to look at an image and do simple manipulations,
**Limits and TERMINATION**

__Applications/
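If you have a processing script that should never be started on the front end by accident, a small guard at the top of it can catch that. This sketch assumes the front end's short host name is ''chead'' (from chead.uphs.upenn.edu); the helper name ''is_front_end'' is hypothetical, and quick interactive tasks like the exceptions above are left to your judgment.

```shell
#!/usr/bin/env bash
# Sketch: stop a processing script if it is started on the front end.
# Assumes the front end's short host name is "chead" (chead.uphs.upenn.edu).

# Hypothetical helper: returns 0 if the given (or current) host is the front end.
is_front_end() {
    local host="${1:-$HOSTNAME}"
    [[ "${host%%.*}" == "chead" ]]   # compare only the part before the first dot
}

if is_front_end; then
    echo "On the front end: use qlogin or qsub to run this on a compute node." >&2
    exit 1
fi
echo "Running on ${HOSTNAME%%.*}; continuing."
```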
===== The Job Scheduler - SGE =====
The job scheduler is an open-source version of the standard Sun Grid Engine (which was recently made closed-source). Its name is Son of Grid Engine (SoGE, version 8.1.8). Its website:
Generally we'll call this SGE.
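Because the scheduler is standard SGE, the usual SGE client commands apply. The sketch below shows a few everyday ones for checking on your jobs and on the cluster; it runs them only where the SGE tools are installed, and ''JOB_ID'' in the comments is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: everyday SGE client commands, run only where SGE is installed.
set -uo pipefail

me="$(id -un)"
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$me"   # your jobs; state "qw" = waiting in the queue, "r" = running
    qhost            # execution hosts with their load and memory
else
    echo "SGE client tools not found on this machine."
fi

# Other standard commands (JOB_ID is a placeholder for a number shown by qstat):
#   qstat -j JOB_ID   detailed information about one job
#   qdel JOB_ID       delete one of your queued or running jobs
```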
If you're familiar with the PICSL cluster,
===== Applications / Programs =====

===== Data & Home Directories =====
**Data** is in ''/
/
New users get their data directories in ''/

**Your ''/

=== Backup ===
All data directories are backed up to tape quarterly, and the tapes are stored off-site. Users are responsible for keeping their own copies of their original data in case of a catastrophic failure of the system.

Home directories are backed up nightly to another location in the data center, but the backups are not archived: only the most recent contents of your home directory are kept, so older versions of your files cannot be recovered from them.
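Since keeping copies of original data is up to you, one simple approach is a dated, compressed archive stored somewhere other than the cluster. The sketch below defines a hypothetical ''backup_dir'' helper that packs a directory with ''tar'' and reads the archive back as a basic check, then demonstrates it on a throwaway directory; substitute your own source and destination paths.

```shell
#!/usr/bin/env bash
# Sketch: make a dated, compressed copy of a directory of original data.
set -euo pipefail

# Hypothetical helper: backup_dir SOURCE_DIR DEST_DIR
# Writes DEST_DIR/<name>_<YYYYMMDD>.tar.gz and prints its path.
backup_dir() {
    local src="$1" dest="$2"
    local name archive
    name="$(basename "$src")"
    archive="$dest/${name}_$(date +%Y%m%d).tar.gz"
    mkdir -p "$dest"
    # -C stores paths relative to the parent of SOURCE_DIR.
    tar -czf "$archive" -C "$(dirname "$src")" "$name"
    tar -tzf "$archive" > /dev/null   # fails if the archive cannot be read back
    echo "$archive"
}

# Demo on a throwaway directory; replace with your own paths.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/raw_data"
echo "example" > "$demo_root/raw_data/scan01.txt"
archive_path="$(backup_dir "$demo_root/raw_data" "$demo_root/backups")"
echo "Created $archive_path"
```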
===== Jobs Can Be Submitted from Nodes =====
All ''

''

If you get an error like this:

  denied: host "

tell the admins.
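Because nodes can submit jobs, a batch job can queue a follow-up job itself. The sketch below wraps ''qsub'' in a hypothetical ''submit_job'' function that turns the ''denied: host'' error described above into a reminder to contact the admins. Only the start of that message is matched, since its full text is not reproduced on this page.

```shell
#!/usr/bin/env bash
# Sketch: submit a job (e.g. from a compute node) and flag the "denied: host" error.

# Hypothetical wrapper: passes all of its arguments to qsub.
submit_job() {
    local out status
    out="$(qsub "$@" 2>&1)"   # capture qsub's stdout and stderr together
    status=$?
    if [[ "$out" == *"denied: host"* ]]; then
        echo "This host is not allowed to submit jobs; please tell the admins." >&2
        echo "$out" >&2
        return 1
    fi
    printf '%s\n' "$out"
    return "$status"
}

# Example (next_step.sh is a hypothetical job script), e.g. as the last line
# of a batch job that queues a follow-up job:
#   submit_job next_step.sh
```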