- The Maxwell Cluster
The High Performance Computing (HPC) service provides large amounts of computational processing power in support of academic research. It is available to all University staff and students (when authorised by their supervisor). Access for external collaborators can be arranged on request.
What is High Performance Computing?
High Performance Computing involves using interconnected clusters of many computers - sometimes referred to as a supercomputer - to accelerate large computing tasks. The HPC service is suited to solving problems that require considerable computational power or involve huge amounts of data that would normally take weeks or months to analyse on a desktop PC, if it can be done at all. HPC clusters can provide over a thousand desktop computers’ worth of resources for days on end, completing the work a single desktop computer would take a year to do in just one day.
HPC clusters are already being used by a number of researchers across schools in a wide range of disciplines and research topics – e.g. genome sequencing and analysis, chemical pathway simulation, climate change impact assessment and financial systems modelling – and as a catalyst and enabler for inter-school and interdisciplinary research in areas such as systems biology.
If your research is being bottlenecked by computing resources, the HPC service could help by:
- Delivering results faster
- Improving time to science and market
- Propelling new results and discoveries
- Developing and innovating new and game-changing products
About the Service
The university operates its own in-house HPC cluster named Maxwell after James Clerk Maxwell, the eminent Scottish Theoretical Physicist who spent a number of years as a professor at the University’s Marischal College. Maxwell is a Linux supercomputing cluster housed in the Edward Wright Datacentre and provides:
- 1240 CPU cores and 12TB of RAM - minimum 256GB per node
- Several specialist nodes: One with 3TB of memory, others with GPU cards
- Resilient high-speed networks including 100Gb/s InfiniBand interconnects for multi-node MPI jobs and 10GbE to the Campus network
- Over 1 Petabyte tiered storage including 15TB of very high-performance storage
- A wide variety of commercial and free HPC-optimized software - see current list
- Compilers for user developed applications
- Galaxy - Access to a Galaxy server which utilises the HPC’s resources. This open-source, web-based platform is designed to make computational biology accessible to researchers that do not have computer programming experience. Galaxy uses a graphical user interface to run software on the worker nodes of the cluster - see current list of software. It allows for workflows to be saved and shared, supporting reproducible and transparent analysis. Researchers at the University of Aberdeen are part of the Galaxy Training Network, who run subject specific workshops on Galaxy throughout the academic year.
Where your requirements exceed what is available in Maxwell, we are also collaborating with a number of external bodies and commercial cloud providers and can advise on the use of much larger HPC Clusters.
The in-house HPC infrastructure is self-service orientated. We maintain the hardware and software on behalf of all users, but users are expected to understand how to use the software and perform the work themselves. Users are free to install their own software within their personal storage allocation, though IT Services will consider installing additional software onto the cluster on a case-by-case basis.
- Using the HPC
How do I request an account?
Maxwell uses the same login credentials as your central university account but you must get your individual account approved before you can use it on Maxwell.
To have your account enabled for Maxwell access, send an email to email@example.com with your university username, and the following information:
- A brief description of your research – i.e. what you’ll be using Maxwell for
- An estimate of the timescales – i.e. how long you will be using it
- An estimate of the scope of your resource requirements – i.e. how much memory, CPU, or storage you are expecting to use
- Any special software or support requirements you need.
- Maxwell is a research facility so students (both taught and research) will need to supply a letter of endorsement from their supervisor.
Your request will be forwarded to our dedicated Digital Research Services Team for approval. Please allow 2-3 working days for your request to be actioned. For non-routine requests, we may contact you to discuss your requirements and to assess whether the work you propose is viable.
Once your request has been approved you will be able to access Maxwell via a range of SSH, Remote Desktop and Portal options.
What are the costs for using Maxwell?
There is no charge to hold an account on Maxwell or to try out the service. Limited free use (up to 1000 core hours) is available for the following:
- Small pilot projects and unfunded PGR projects
- Those wishing to test or familiarise themselves with the service
- Those who wish to assess the suitability of the service prior to applying for funding
- Training and documentation use
Where the HPC service is used as part of a research project, you should ensure the costs are incorporated into your grant proposals so that your use of the service can be charged to the appropriate grants. The standard costs for using the HPC service are listed below. Costs for using the HPC are also included in the IT Services research costing tool.
- Standard use of the HPC Cluster costs 10p per core-hour of CPU (any nodes).
- Standard support, including training and documentation, is included at no additional charge
- Additional support, including installation and troubleshooting of bespoke applications, is charged at £400 per day in ½ day increments.
- Additional storage beyond the standard allocation is charged at £500 per terabyte per year
- Usage of specialist nodes is available on request.
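When budgeting for a grant proposal, the standard rates above can be combined into a rough estimate. The following is a minimal sketch only: the function and constant names are hypothetical, it assumes that a partial half-day of additional support is rounded up to the next half-day (the billing rule for partial increments is not stated here), and it does not cover specialist nodes or the free-use allowance.

```python
import math

# Standard rates quoted in the list above (GBP).
CORE_HOUR_RATE_GBP = 0.10          # 10p per core-hour of CPU
SUPPORT_DAY_RATE_GBP = 400.0       # additional support, per day
STORAGE_TB_YEAR_RATE_GBP = 500.0   # additional storage, per terabyte per year


def estimate_cost(core_hours, support_days=0.0, extra_storage_tb=0.0, storage_years=1.0):
    """Return an estimated charge in GBP for a project.

    Assumption: support is billed in half-day increments and any
    partial half-day is rounded up.
    """
    if min(core_hours, support_days, extra_storage_tb, storage_years) < 0:
        raise ValueError("all quantities must be non-negative")
    compute = core_hours * CORE_HOUR_RATE_GBP
    half_days = math.ceil(support_days * 2)
    support = half_days * (SUPPORT_DAY_RATE_GBP / 2)
    storage = extra_storage_tb * storage_years * STORAGE_TB_YEAR_RATE_GBP
    return round(compute + support + storage, 2)


# Example: 64 cores for 100 hours, 1.25 days of support, 2 TB extra storage for a year.
print(estimate_cost(64 * 100, support_days=1.25, extra_storage_tb=2))
```

For an authoritative figure, use the IT Services research costing tool rather than this sketch.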
Please contact the Digital Services Team via email using the firstname.lastname@example.org address if any items are unclear or if you are unsure of your project requirements.
How do I access Maxwell?
Maxwell can be accessed both on-campus and off-campus using a variety of methods. There is a basic web interface, and a web-based Galaxy instance. However, to make full use of the service you will need software that supports the SSH protocol and optionally X for graphical applications and SCP for transferring files. For novice users we recommend using the X2Go (graphical remote desktop) client, PuTTY for SSH access and WinSCP for file transfer. These applications are available to install automatically on university Windows PCs via the Software Centre and can also be downloaded free of charge to use on your own computer. For those already familiar with accessing Linux servers, you can use your own choice of SSH, X or SFTP software.
Several of these access methods will need alterations to their configuration to work from non-university computers. For detailed information on installing the software and connecting to Maxwell, please see our User Guide.
Is training available?
- Policies and Entitlement
Use of the HPC service is subject to the University of Aberdeen’s Conditions for using IT Facilities and other applicable IT Policies. In addition, specific information on the HPC is given below.
For general enquiries on using the service, such as logging in and registering, please contact email@example.com. For more complex queries or requests for a specific system configuration, the Digital Research Services Team, contactable via the Service Desk, will be happy to advise.
All standard HPC accounts are entitled to the following by default:
- Access to up to 200 job slots/CPU cores at a time, subject to availability
- 50GB of resilient, backed-up personal home space
- Access to the 1TB of scratch working storage
- Ability to run jobs with up to 200GB of RAM allocations
- Access to all preinstalled free and commercial software
- Support via the service desk during office hours
The cluster runs Open Grid Scheduler to automatically allocate jobs submitted by users onto available compute servers. The scheduler balances the availability of slots among all users in order to permit fair access to the system, taking into account the specific requirements of each job (e.g. number of CPUs, amount of RAM, job duration, and node affinity requirements).
The scheduler starts queued jobs as space becomes available and can be set to advise users of job status by email.
Larger jobs, which require more time and resource, are more difficult to schedule, so it is to everyone’s advantage to make sure the requested resource is as accurate as possible. This allows smaller jobs to be backfilled into available space and therefore started/completed earlier than larger jobs.
An awareness of the following will help users provide accurate information when scheduling a job:
- When insufficient memory is requested for a job, the job will not run and will need to be rescheduled with more memory requested.
- Where more memory is requested than is used, the user will have the full amount allocated to them as this resource is blocked and is not usable elsewhere.
- The default runtime of any job is 24 hours
- If more time is needed this must be explicitly stated.
- Less time can also be requested.
- When insufficient time is allocated to a job, the job will be stopped when the allocated time has elapsed and will need to be rescheduled with more time requested.
- Where the actual time used is less than the time requested, only the actual time will be attributed to a user’s account.
- This is because on completion of a job, Maxwell can begin to use the free resource on another job.
- Interactive jobs can run only when there is space immediately available on Maxwell.
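The runtime rules above can be illustrated with a small calculation. This is a sketch under stated assumptions, not the scheduler's actual accounting: the function name is hypothetical, and it assumes usage is counted as cores multiplied by wall-clock hours. It reflects two points from the list: a job is stopped once its requested time has elapsed, and only the time actually used is attributed to the user.

```python
DEFAULT_RUNTIME_HOURS = 24  # default runtime of any job, as stated above


def attributed_core_hours(cores, actual_hours, requested_hours=DEFAULT_RUNTIME_HOURS):
    """Core-hours attributed to a user's account for one job.

    Assumption: usage = cores x wall-clock hours. The job cannot run
    past its requested time, and unused requested time is not counted.
    """
    if cores <= 0 or actual_hours < 0 or requested_hours <= 0:
        raise ValueError("cores and requested_hours must be positive; actual_hours non-negative")
    return cores * min(actual_hours, requested_hours)


# A 16-core job that requests the default 24 hours but finishes in 6.
print(attributed_core_hours(16, actual_hours=6))

# A 16-core job that needs 30 hours but only requested the default:
# it is stopped at 24 hours and must be resubmitted with more time.
print(attributed_core_hours(16, actual_hours=30))
```

Note that memory behaves differently from time: the full requested amount is allocated for the life of the job, so over-requesting memory blocks it from other users even if it is never used.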
Very High Memory Node
Use of the very high memory node is governed by the Digital Research Services Team to ensure it is used appropriately. Users with the authority to do so can request the use of this node via Maxwell.
Once scheduled, statistics on the job are available from Maxwell using the ‘squeue’ command. This advises users of the status of a running job or the priority of a queued job.
Open Grid Scheduler can also be used to effectively prioritise jobs. The following policies are in place to assist with prioritisation:
- Prioritisation of jobs only comes into play when a job needs to be queued. Otherwise, where space is free a job will run immediately
- Jobs that have been queuing for longer to run automatically increase in priority over time
- Fair-share policy is taken into account for a user/project with a half-life of 14 days resource usage. This ensures one area does not unfairly monopolise the resource
- Users will be grouped initially by School and thereafter assigned to specific projects
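To give a feel for what a 14-day half-life means, the sketch below weights past usage with simple exponential decay. This is an illustrative assumption: the exact formula the scheduler applies is not given here, and the function name and record format are hypothetical.

```python
HALF_LIFE_DAYS = 14  # fair-share half-life stated in the policy above


def decayed_usage(usage_records, half_life_days=HALF_LIFE_DAYS):
    """Sum past usage, halving the weight of usage for every half-life of age.

    usage_records: iterable of (age_days, core_hours) pairs.
    Assumption: plain exponential decay, weight = 0.5 ** (age / half_life).
    """
    return sum(
        core_hours * 0.5 ** (age_days / half_life_days)
        for age_days, core_hours in usage_records
    )


# 100 core-hours used today, 14 days ago and 28 days ago count as
# 100 + 50 + 25 under this model.
print(decayed_usage([(0, 100), (14, 100), (28, 100)]))
```

Under this model, heavy use a month ago has only a quarter of the effect on priority that the same use today would have.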
Monitoring and Arbitration
The Digital Research Services Team is responsible for monitoring use of the system and should be contacted via firstname.lastname@example.org to resolve any perceived scheduling or prioritisation issues.
The Digital Research Services Team will monitor the requested resource and actual resource use of jobs and be on hand to advise users where there is opportunity to streamline jobs and make better use of Maxwell. Users will not be permitted to continually overestimate job requirements at the expense of other cluster users.
The following storage is available for users:
- 1TB (total) scratch space – this data is not backed up. It can be written to by the owning user and is to be used to store large volumes of data to be processed.
- 1TB runtime scratch – this is the fastest available storage. Data will only last as long as a job lasts and will be automatically deleted.
- 3.5TB (total) For local user home directories – The local user home directories are backed up on a nightly basis. Users are given a quota of 50GB home file space.
Availability and maintenance
Maxwell is designed to ensure maximum availability and continued run time even in the event that some cores or nodes stop functioning correctly. When this happens, these issues will be resolved, as far as possible, without any additional disruption to the cluster.
A maximum one day’s maintenance per quarter will be set aside for regular cluster maintenance (patching, upgrades etc.) to ensure the longevity of the system and limit the need for longer periods of unplanned down time. Planned maintenance will be communicated in advance to all users and will be scheduled to cause minimum disruption.
Emergency maintenance or unplanned disruptions
Every effort will be made to ensure there are no unplanned disruptions to the service. Where events, either internal or external to the HPC, do cause disruption, every effort will be made by IT Services and support partners to restore service as quickly as possible.
Applying for Resources
To apply for an account and have a specific project set up on Maxwell, please email email@example.com with an outline of your requirements. Applicants should ensure they include ‘HPC’ or ‘Maxwell’ in the subject line.
- External Resources
Is your cutting edge/novel research problem beyond the capabilities of the Maxwell HPC and you need more computing power? We can provide access to more powerful and more specialised HPC e.g. GPU based HPC. They will have that edge you need to achieve your research goals. We will facilitate your access to these Tier 2 (Regional level) and Tier 1 (National and International level) HPC resources. They all have their own access costs and conditions.
Access to the following Tier 1 and Tier 2 HPC:
- Archer (Tier 1) - National HPC mainly funded by EPSRC - https://www.archer.ac.uk/access
- Cirrus (Tier 2) - hosted by EPCC in Edinburgh – Access is free for EPSRC funded projects. https://epsrc.ukri.org/research/facilities/hpc/tier2
- Jade (Tier 2) – a specialised GPU HPC – Access is very restricted.
With their abundance of memory, these HPCs allow you to run memory-intensive multi-core and single-core problems simultaneously. Their user-friendly ‘batch’ environment lets users queue computing jobs that then run without further interaction.
If you think your computational needs are likely to bring Maxwell to its knees, then please contact us and we can see about getting you access to external HPC resources.
Is High Performance Computing of use to me?
If your research requires the processing of large amounts of data and/or your current data processing takes weeks or months on a personal computer, there is a high possibility you would benefit from using the new institutional HPC service.
Who can use Maxwell?
All staff and researchers with a requirement to use the HPC service will have the opportunity to do so.
Postgraduate research students should present business cases via their research advisor in the first instance.
Who provides support for Maxwell?
Our dedicated Digital Research Infrastructure Support team will advise researchers on the use of the cluster, the scheduling of jobs and any software relating to their requirements. Contact the Digital Research Services Team via firstname.lastname@example.org.
The 24/7 monitoring and support of hardware and management of software and Maxwell users is provided by a third-party provider experienced in the installation and maintenance of HPC systems.
How do I access Maxwell?
Once a request has been accepted and users have been added to Maxwell, they will be able to access it via a range of SSH, Remote Desktop and Portal options. See our User Guide for more.
Can I access Maxwell off campus?
Yes. Portal access is also available off-campus from any PC with an internet connection, including personal desktop or laptop PCs. See our User Guide for more.
Can I access the HPCC from my personal device?
Yes, any personal desktop or laptop can be used to access the cluster via the portal or through the VPN.
Can I change my password?
Passwords can be changed via the standard route for changing IT account passwords. Users are reminded that passwords for Maxwell are not specific to the cluster.
What software is provided?
We provide a wide range of commercial and free HPC-optimized software - see current list for details.
We also provide a range of software for Galaxy - see current list for details.
Can I add software to the cluster?
Yes - any requirement for specific software packages to be run on Maxwell will be reviewed internally and, if approved, the software will be added by the support contractor. We anticipate that the majority of software requirements can be installed on the cluster. However, users will be responsible for the licensing of any commercial software required.
Users are encouraged to ensure that software required is loaded onto the shared repository for re-use. It can also be shared on a nationwide database for use by other institutions. The Digital Research Services Team will advise you on how to do this.
How do I acknowledge the use of Maxwell?
We suggest the following:
"The authors would like to acknowledge the support of the Maxwell Compute Cluster funded by the University of Aberdeen."
Where can I store my research data when it is not being used on the HPCC and is it secure?
IT Services provides and manages a resilient networked data storage solution for the University which is replicated continuously to a Disaster Recovery site and is backed up nightly. This system is tiered and provides short-term and long-term storage facilities to the University, ensuring data is readily accessible.
Data is stored within secure shared drives, typically set up for teams, research grants etc. Each shared drive has a senior individual within the team nominated as the shared drive owner. This individual has responsibility for authorising access rights to the shared drive, as well as ensuring data is curated in compliance with University policies.
How much does it cost to store my research data?
The costs of data storage and additional file sharing services are detailed in the IT Service Catalogue.