[farmshare-discuss] FarmShare 2
aseishas at stanford.edu
Wed Apr 19 16:37:56 PDT 2017
SRCC would like to invite all users to test the new FarmShare environment! A brief description can be found below. Log in via SSH at rice.stanford.edu. You might find that some things are missing or broken; please report any problems to research-computing-support at stanford.edu.
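Connecting works the same way as it did for the corn systems; a minimal sketch, assuming the usual SUNet ID login ("sunetid" below is a placeholder, not a real account):

```shell
# Log in to a FarmShare 2 login node.
# Replace "sunetid" with your own SUNet ID.
ssh sunetid@rice.stanford.edu
```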
These are the login nodes for the new environment and replace corn.stanford.edu. There are currently fourteen systems in service; each has 8 CPU cores and 48 GB of memory. They can be used for interactive work like the corn systems in the current environment, but some resource limits are enforced to ensure multiple users can comfortably share a node (each user is currently limited to 12 GB of memory, a ¼ share of CPU under load, and 128 GB of /tmp storage). These are the only systems with access to AFS in the new environment; see below.
These are managed (compute) nodes similar to the barley systems in the current environment. There are currently five nodes, each with 12 cores and 96 GB of memory, and two additional large-memory nodes, each with 16 cores and 768 GB of memory.
These are GPU nodes like the rye systems in the current environment. There are currently ten systems, each with 16 cores, 128 GB of memory, and one Tesla K40. Unlike the rye systems, these nodes are managed: you must submit a job to run on the oat systems.
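Since the oat systems are managed, GPU work has to go through the scheduler. A minimal sketch of a batch submission, using the "gpu" partition named later in this message; the `--gres=gpu:1` syntax is standard Slurm, but verify the exact GRES name on FarmShare before relying on it:

```shell
# Create a minimal GPU batch script targeting the oat nodes.
cat > gpu-job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
# Report which GPU the job landed on.
nvidia-smi
EOF

# Submit the script to the scheduler.
sbatch gpu-job.sh
```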
Documentation (https://srcc.stanford.edu/farmshare2) is a work in progress.
The new environment uses Slurm (https://slurm.schedmd.com/archive/slurm-16.05.8/) rather than GridEngine for job management. If you have used Sherlock (http://sherlock.stanford.edu/), the system should be familiar to you. We’ve installed a package that provides limited compatibility with GE commands, but we encourage you to learn and use the native Slurm commands (https://slurm.schedmd.com/pdfs/summary.pdf, PDF) whenever possible. There are separate Slurm partitions for the standard compute nodes (“normal”), the large-memory nodes (“bigmem”), and the GPU nodes (“gpu”). There are corresponding Slurm qualities of service, as well as a QoS for long-running jobs (“long”); normal jobs have a maximum runtime of two days, and long jobs a maximum of seven.
If you need an interactive session that exceeds the resource limits on rice, or you require a feature (like access to a GPU) not available on the login nodes, there is also an “interactive” QoS for this purpose. Be sure to explicitly request the resources you need when submitting a job. Each user is currently allowed one interactive session with a maximum runtime of one day. FarmVNC is no longer supported, but a TurboVNC server is installed as a module, and we hope to have better support for VNC use cases in the future.
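A sketch of requesting an interactive shell through that QoS, with the resources spelled out explicitly (the time, CPU, and memory values here are only examples):

```shell
# Interactive shell on a compute node, with explicit resource requests.
srun --partition=normal --qos=interactive \
     --time=4:00:00 --cpus-per-task=2 --mem=16G \
     --pty bash

# The same idea for a GPU-backed session on the oat nodes.
srun --partition=gpu --qos=interactive --gres=gpu:1 \
     --time=2:00:00 --pty bash
```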
A final note on one major difference in the new environment: AFS is accessible only on the rice systems, and is not used for home directories. Home directories are served from a dedicated file server, and the per-user quota is currently 50 GB. /farmshare/user_data is mounted everywhere, and can be used for additional storage or for transferring data between environments.
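For staying under the home-directory quota, larger data can be staged under the shared mount instead; a sketch, assuming a per-user subdirectory under /farmshare/user_data (the directory layout there is an assumption, and "big_dataset" is a placeholder):

```shell
# See how much of the 50 GB home quota is in use.
du -sh "$HOME"

# Stage larger data on the shared mount (per-user subdirectory is assumed).
mkdir -p /farmshare/user_data/"$USER"
cp -r big_dataset/ /farmshare/user_data/"$USER"/
```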
Research Computing Specialist
Stanford | University IT
aseishas at stanford.edu · 650.725.7490