clusterbasics

Differences

This shows you the differences between two versions of the page.

clusterbasics [2016/01/10 14:41]
mgstauff [Data & Home Directories]
clusterbasics [2017/09/25 17:24] (current)
mgstauff [Introduction to The Cluster]
Line 5: Line 5:
 The cluster is a collection of servers that work together to provide flexible, high-volume computing resources to all researchers at CfN. All these servers are physically housed together in a server room, and are dedicated to running computing jobs. This differs from the old CfN cluster, in which many of the servers were desktop machines in various labs. This meant that your desktop machine might slow down as the old cluster used it. We don't have that problem now with this new cluster.
  
-The cluster has a single login server, called the **front end** or **head node**. To use the cluster you log in to the front end (chead.uphs.upenn.edu in our case) using your CfN [[accounts_and_vpn|account]]. You then use a **job scheduler** [[using_ogs_sge|(OGS/SGE)]] to either:
+The cluster has a single login server, called the **front end** or **head node**. To use the cluster you log in to the front end (chead.uphs.upenn.edu in our case) using your CfN [[accounts_and_vpn|account]]. You then use a **job scheduler** [[using_ogs_sge|(SGE)]] to either:
  
   * a) ask for an **interactive login** session to run something like the Matlab GUI, or use a terminal interactively - i.e. to type commands and run things that way.
Line 14: Line 14:
 In either case, the job scheduler takes care of finding a compute node to do the actual work of running your batch job or interactive session (both are called 'jobs' by the scheduler, actually).
  
-The cluster is running CentOS 6.with Rocks 6.1. This is the software that manages the collection of front and compute nodes, i.e. the cluster - Rocks is //not// the job scheduler. The job scheduler is OGS/SGE.
+The cluster is running CentOS 6.9 (Linux operating system version) with Rocks 6.2 (cluster management system). Rocks is the software that manages the collection of front end and compute nodes, i.e. the cluster - Rocks is //not// the job scheduler. The job scheduler is SGE (actually SoGE, Son of Grid Engine, see below).
  
 ===== General Cluster Diagram =====
Line 38: Line 38:
  
 If you're running something quick like using fslview or ITK-SNAP to look at an image and do simple manipulations, then it's ok to use the front end instead of an interactive job. This actually makes for more efficient use of the compute nodes since you won't be tying up a qlogin session when you're not really using much computational power. Just don't forget to exit your app when you're done. The important thing is not to run computationally intensive jobs on the front end, like using Matlab to run an analysis on some data.
- 
-__Certain FSL Commands__ 
- 
-Some FSL commands submit their own qsub jobs so must be run from the front end. 
-[[fsl_usage|See details here.]] 
  
 **Limits and TERMINATION**
  
-__Applications/jobs/processes that use more than 5GB RAM or 5 minutes of CPU time on the front end will be unceremoniously terminated__. You will see a message like ''CPU time limit exceeded''. This means five minutes of the CPU's time, not five minutes of real world time or "wall clock time". So viewing images or testing matlab scripts that don't do lots of computation will run for much longer than five minutes, pretty much indefinitely. If you really need to run something for longer on the front end, let me know - there's a way to increase the Memory and CPU limits case-by-case. But normally you should run things using ''qlogin'' or ''qsub'' - see the section below on the Job Scheduler (OGS/SGE).
+__Applications/jobs/processes that use more than 5GB RAM or 5 minutes of CPU time on the front end will be unceremoniously terminated__. You will see a message like ''CPU time limit exceeded''. This means five minutes of the CPU's time, not five minutes of real world time or "wall clock time". So viewing images or testing matlab scripts that don't do lots of computation will run for much longer than five minutes, pretty much indefinitely. If you really need to run something for longer on the front end, let me know - there's a way to increase the Memory and CPU limits case-by-case. But normally you should run things using ''qlogin'' or ''qsub'' - see the section below on the Job Scheduler (SGE).
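The difference between CPU time and wall-clock time described above can be seen with a small Python sketch. It is only an illustration and is not part of the cluster's tooling; the helper names ''measure'', ''idle'' and ''busy'' are made up for this example. A process that mostly waits (like an idle image viewer) accumulates wall-clock time but very little CPU time, so it is the CPU-heavy work that runs into the front end limit.

```python
import time


def measure(work):
    """Run work() and return (cpu_seconds, wall_seconds) it consumed."""
    cpu_start = time.process_time()    # CPU time used by this process
    wall_start = time.perf_counter()   # elapsed real-world time
    work()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


def idle():
    # Waiting (e.g. a GUI sitting open) uses wall-clock time but almost no CPU.
    time.sleep(0.5)


def busy():
    # A tight computation uses CPU time roughly equal to its wall-clock time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    for name, work in (("idle", idle), ("busy", busy)):
        cpu, wall = measure(work)
        print(f"{name}: cpu={cpu:.2f}s wall={wall:.2f}s")
```

Only the first number counts toward the 5 minute front end limit; the exact values printed will vary from machine to machine.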
-===== The Job Scheduler - OGS/SGE =====
+===== The Job Scheduler - SGE =====
  
-The job scheduler is an open-source version of the standard Sun Grid Engine (which was recently made closed-source). Its name is Open Grid Scheduler - this is what ships with Rocks now. Its website: http://gridscheduler.sourceforge.net/features.html. Generally we'll call this SGE, OGS or OGS/SGE. SGE was the term used for so long that it's hard to wean off of it.
+The job scheduler is an open-source version of the standard Sun Grid Engine (which was recently made closed-source). Its name is Son of Grid Engine (SoGE, version 8.1.8). Its website: https://arc.liv.ac.uk/trac/SGE
+Generally we'll call this SGE.
    
-If you're familiar with the PICSL cluster, OGS should be the same as SGE there, give or take just a little. The main commands for users are still qlogin, qsub, etc. We'll discuss these in detail later in the wiki ([[Using_OGS_SGE|Using OGS/SGE]]), but please first keep reading below.
+If you're familiar with the PICSL cluster, our version of SGE is very similar, give or take just a little. The main commands for users are still qlogin, qsub, etc. We'll discuss these in detail later in the wiki ([[Using_OGS_SGE|Using SGE]]), but please first keep reading below.
  
 ===== Applications / Programs =====
Line 85: Line 81:
 Home directories are backed up nightly to another location in the data center, but not archived. Only the most recent contents of your home directory are backed up.
  
-===== Jobs Must be Submitted from chead =====
-All qsub jobs and qlogin sessions must be started from chead. If you try to do either from a compute node, you'll get an error like this:
+===== Jobs Can Be Submitted from Nodes =====
+All ''qlogin'' sessions must be started from chead.
+
+''qsub'' jobs can be submitted from chead, from within ''qlogin'' sessions, or from within other ''qsub'' jobs.
+
+If you get an error like this:
  
   denied: host "compute-0-6.local" is no submit host
  
+tell the admins.
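As a rough illustration of the submission rule above, here is a hedged Python sketch of a wrapper that submits a script with ''qsub'' and recognizes the "no submit host" error. The function names, the script name ''myjob.sh'' and the job name are hypothetical and not part of the cluster's tooling. The only ''qsub'' options assumed are ''-N'' (job name) and ''-cwd'' (run in the current directory).

```python
import shutil
import subprocess

# Text that appears in the scheduler error quoted above.
NOT_SUBMIT_HOST = "is no submit host"


def build_qsub_command(script, job_name):
    """Return the qsub argument list for a script (hypothetical helper)."""
    return ["qsub", "-N", job_name, "-cwd", script]


def is_submit_host_error(message):
    """True if the scheduler refused the job because of the submitting host."""
    return NOT_SUBMIT_HOST in message


def submit(script, job_name):
    """Try to submit a job and return a short, human-readable result."""
    if shutil.which("qsub") is None:
        return "qsub not found on this machine"
    result = subprocess.run(
        build_qsub_command(script, job_name),
        capture_output=True,
        text=True,
    )
    output = result.stdout + result.stderr
    if is_submit_host_error(output):
        return "this host is not a submit host; tell the admins"
    return output.strip()


if __name__ == "__main__":
    # myjob.sh is a placeholder; replace it with your own job script.
    print(submit("myjob.sh", "example_job"))
```

Run from chead, a ''qlogin'' session or another ''qsub'' job, this should simply pass through whatever ''qsub'' prints.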
clusterbasics.1452436873.txt.gz · Last modified: 2016/01/10 14:41 by mgstauff