Introductory Site Guide
Table of Contents
- 1. Introduction
- 1.1. Purpose of this document
- 1.2. About the Navy DSRC
- 1.3. Who our services are for
- 1.4. How to get an account
- 1.5. Visiting the Navy DSRC
- 2. Policies
- 2.1. Baseline Configuration (BC) policies
- 2.2. Login node abuse policy
- 2.3. File space management policy
- 2.4. Maximum session lifetime policy
- 2.5. Batch use policy
- 2.6. Special request policy
- 2.7. Account removal policy
- 2.8. Communications policy
- 2.9. System availability policy
- 2.10. Data import and export policy
- 2.10.1. Network file transfers
- 2.10.2. Reading/Writing media
- 2.11. Account sharing policy
- 3. Available resources
- 3.1. HPC systems
- 3.2. Data storage
- 3.2.1. File systems
- 3.2.2. Archive system
- 3.3. Computing environment
- 3.3.1. Software
- 3.3.2. Bring your own code
- 3.3.3. Batch schedulers
- 3.3.4. Advance Reservation Service (ARS)
- 3.4. HPC Portal
- 3.5. Secure Remote Desktop (SRD)
- 3.6. Network connectivity
- 4. How to access our systems
- 5. How to get help
- 5.1. Productivity Enhancement and Training (PET)
- 5.2. User Advocacy Group (UAG)
- 5.3. Baseline Configuration Team (BCT)
- 5.4. Computational Research and Engineering Acquisition Tools and Environments (CREATE)
- 5.5. Data Analysis and Assessment Center (DAAC)
1. Introduction
1.1. Purpose of this document
This document introduces users to the Navy DoD Supercomputing Resource Center (DSRC). It provides an overview of available resources, links to important documentation, important policies governing the use of our systems, and other information to help you make efficient and effective use of your allocated hours.
1.2. About the Navy DSRC
The Navy DSRC is one of five DSRCs managed by the DoD High Performance Computing Modernization Program (HPCMP). The DSRCs deliver a range of compute-intensive and data-intensive capabilities to the DoD science and technology, test and evaluation, and acquisition engineering communities. Each DSRC operates and maintains major High Performance Computing (HPC) systems and associated infrastructure, such as data storage, in both unclassified and classified environments. The HPCMP provides user support through a centralized help desk and data analysis/visualization group.
The Navy DSRC is operated by the Commander, Naval Meteorology and Oceanography Command (COMNAVMETOCCOM) and is located at John C. Stennis Space Center in Mississippi. COMNAVMETOCCOM provides atmospheric and oceanographic support to the Department of Defense through a wide range of modeling, prediction and data collection techniques.
The Navy DSRC, formerly the NAVO MSRC, was the second of the four major shared DoD High Performance Computing (HPC) centers to be formed under the auspices of the DoD HPC Modernization Program.
1.3. Who our services are for
The HPCMP's services are available to researchers in the Research, Development, Test, and Evaluation (RDT&E) and acquisition engineering communities of the DoD and its respective Services and Agencies, DoD contractors, and University staff working on a DoD research grant.
For more details, see the HPCMP presentation "Who may run on HPCMP Resources?"
1.4. How to get an account
Anyone meeting the above criteria may request an HPCMP account. A Help Desk video is available to guide you through the process of getting an account. To begin the account application process, visit HPC Centers: Obtaining an Account, and follow the instructions presented there.
1.5. Visiting the Navy DSRC
If you need to travel to the Navy DSRC, there are security procedures that must be completed BEFORE planning your trip. Please visit our Planning a Visit page and coordinate with your Service/Agency Approval Authority (S/AAA) to ensure that all requirements are met.
2. Policies
2.1. Baseline Configuration (BC) policies
The Baseline Configuration Team sets policies that apply to all HPCMP HPC systems. The BC Policy Compliance Matrix provides an index of all BC policies and compliance status of systems at each DSRC.
2.2. Login node abuse policy
The login nodes provide login access to the systems and support such activities as compiling, editing and general interactive use by all users. Consequently, memory- or CPU-intensive programs running on the login nodes can significantly affect all users of the system. Therefore, only small serial applications requiring less than 15 minutes of compute time and less than 8 GB of memory are allowed on the login nodes. Any jobs running on the login nodes that exceed these limits will be terminated.
2.3. File space management policy
Close management of the space in the /p/work1 file system is a high priority. Files in the /p/work1 file system that have not been accessed in 21 days are subject to the purge cycle. If available space becomes critically low, a manual purge may be run, and all files in the /p/work1 file system are eligible for deletion. Using the touch command (or similar commands) to prevent files from being purged is discouraged. Users are expected to keep up with file archival and removal within the normal purge cycles.
Note: If it is determined as part of the normal purge cycle that files in your $WORKDIR directory must be deleted, you WILL NOT be notified prior to deletion. You are responsible for monitoring your workspace to prevent data loss.
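As a rough aid for monitoring your workspace, the following Python sketch lists files under a directory whose last access time is older than a given number of days. It is an illustration only: the 21-day default comes from the policy above, the function name `find_stale_files` is hypothetical, and the access times reported by the file system may not match exactly what the purge process evaluates.

```python
import os
import time


def find_stale_files(root, max_idle_days=21, now=None):
    """Return sorted paths under root not accessed within max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 24 * 60 * 60
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # $WORKDIR is the scratch directory variable described in this guide.
    workdir = os.environ.get("WORKDIR")
    if workdir:
        for path in find_stale_files(workdir):
            print(path)
    else:
        print("WORKDIR is not set")
```

Files reported by such a check are candidates for archival or removal, not for having their timestamps refreshed.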
2.4. Maximum session lifetime policy
To provide users with a more secure high performance computing environment, the Navy DSRC has implemented a limit on the lifetime of all terminal/window sessions. Any idle terminal or window session connections to the Navy DSRC shall be terminated after 4 hours. Regardless of activity, any terminal or window session connections to the Navy DSRC shall be terminated after 24 hours.
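The two limits interact: a session ends at whichever comes first, 4 hours after the last activity or 24 hours after the session started. The short Python sketch below works through that arithmetic; the function name `session_deadline` is hypothetical, and the sketch assumes the limits are applied exactly as stated above.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(hours=4)    # idle sessions are terminated after 4 hours
TOTAL_LIMIT = timedelta(hours=24)  # all sessions are terminated after 24 hours


def session_deadline(started, last_activity):
    """Return the earliest time at which the session is subject to termination."""
    return min(last_activity + IDLE_LIMIT, started + TOTAL_LIMIT)


if __name__ == "__main__":
    started = datetime(2024, 1, 1, 8, 0)
    last_activity = datetime(2024, 1, 1, 9, 0)
    print(session_deadline(started, last_activity))
```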
2.5. Batch use policy
Batch queue environments are available on all of the HPC systems. The batch environment is the primary environment for most user work. All of the systems at the Navy DSRC use the PBS batch queue system.
The batch queue environments allow users to submit, monitor and terminate their own batch jobs. This capability is intended for jobs requiring large amounts of memory and/or CPU time that generally run for many hours.
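To illustrate what a batch job submission might look like, the following Python sketch assembles a minimal PBS script. The `#PBS` directives shown (`-N`, `-A`, `-q`, `-l walltime`, `-l select`) are standard PBS Professional options, but the function name `make_pbs_script`, the project ID, the per-node core count, and the application name are placeholders. The system user guides describe the directives actually required on each system.

```python
def make_pbs_script(job_name, project, queue, walltime, nodes, cores_per_node, command):
    """Build the text of a simple PBS batch script."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",          # job name
        f"#PBS -A {project}",           # project (allocation) to charge
        f"#PBS -q {queue}",             # target queue
        f"#PBS -l walltime={walltime}", # requested wall clock time, HH:MM:SS
        f"#PBS -l select={nodes}:ncpus={cores_per_node}:mpiprocs={cores_per_node}",
        "",
        "cd $PBS_O_WORKDIR",            # directory the job was submitted from
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All argument values below are hypothetical examples.
    print(make_pbs_script("test_job", "PROJECT_ID", "debug", "00:30:00", 1, 48, "./my_app"))
```

The resulting text can be saved to a file and submitted with `qsub`.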
All HPC systems have identical queue names: urgent, frontier, high, debug, standard and background; however, each queue has different properties, as specified in the tables below. Each queue is assigned a priority factor within the batch system, and the queues are listed from highest to lowest priority.
|Priority||Queue Name||Max Wall Clock Time||Max Cores Per Job||Description|
|Highest||urgent||24 Hours||768||Designated urgent projects by DoD HPCMP|
| ||frontier||168 Hours||19,200||Frontier projects only|
| ||high||168 Hours||15,840||Designated high-priority projects by Service/Agency|
| ||HIE||24 Hours||384||Rapid response for interactive work|
| ||debug||30 Minutes||2,400||User diagnostic jobs|
| ||standard||24 Hours||8,168||Normal priority user jobs|
| ||Serial||168 Hours||1||Serial user jobs|
| ||gpu||24 Hours||48||GPU-accelerated jobs|
| ||transfer||48 Hours||N/A||Data transfer jobs|
| ||bigmem||96 Hours||768||Large-memory jobs|
|Lowest||background||4 Hours||1,200||User jobs that will not be charged against the project allocation|
|Priority||Queue Name||Max Wall Clock Time||Max Cores Per Job||Description|
|Highest||urgent||24 Hours||16,384||Designated urgent projects by DoD HPCMP|
| ||frontier||168 Hours||32,768||Designated frontier projects by DoD HPCMP|
| ||high||168 Hours||16,384||Designated high-priority projects by Service/Agency|
| ||debug||30 Minutes||8,192||User diagnostic jobs|
| ||HIE||24 Hours||1,024||Rapid response for interactive work|
| ||viz||24 Hours||128||Visualization jobs|
| ||standard||168 Hours||16,384||Normal priority user jobs|
| ||mla||24 Hours||128||Machine Learning Accelerated jobs|
| ||smla||24 Hours||128||Machine Learning Accelerated jobs|
| ||dmla||24 Hours||128||Machine Learning Accelerated jobs|
| ||bigmem||24 Hours||224||Large-memory jobs|
| ||transfer||24 Hours||N/A||Data transfer jobs|
|Lowest||background||4 Hours||1,024||User jobs that will not be charged against the project allocation|
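Before submitting, it can help to check a planned request against the queue limits. The Python sketch below encodes the first of the two tables above, reading the time and numeric values in each row as the maximum wall clock time and the maximum number of cores per job (an interpretation, not something the tables state explicitly). The names `QUEUE_LIMITS` and `fits_queue` are hypothetical, and the scheduler remains the authority on what a queue accepts.

```python
# Limits copied from the first queue table: (max wall time in hours, max cores).
# None means the table lists the core limit as N/A.
QUEUE_LIMITS = {
    "urgent": (24, 768),
    "frontier": (168, 19200),
    "high": (168, 15840),
    "HIE": (24, 384),
    "debug": (0.5, 2400),
    "standard": (24, 8168),
    "Serial": (168, 1),
    "gpu": (24, 48),
    "transfer": (48, None),
    "bigmem": (96, 768),
    "background": (4, 1200),
}


def fits_queue(queue, walltime_hours, cores):
    """Return True if the request is within the listed limits for the queue."""
    max_hours, max_cores = QUEUE_LIMITS[queue]
    if walltime_hours > max_hours:
        return False
    return max_cores is None or cores <= max_cores


if __name__ == "__main__":
    print(fits_queue("standard", 12, 4800))
```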
2.6. Special request policy
All special requests for allocated HPC resources, including increased priority within queues, increased queue parameters for maximum number of cores and Wall Time, and dedicated use should be directed to the HPC Help Desk. Additional documentation of the requirement and associated justification will be reviewed for approval by the Navy DSRC Management.
2.7. Account removal policy
This policy covers the disposition or removal of user data when the user is no longer eligible for a given HPCMP account on any one or more systems in the HPCMP. At the time a user becomes ineligible for an HPCMP user account, the user’s access to that account will be disabled.
The user and the Principal Investigator (PI) are responsible for arranging for the disposition of the data prior to account deactivation. The user may request special assistance or specific exemptions or extensions, based on such criteria as availability of resources, technical difficulties or other special needs. If the user does not require any special assistance, then the respective Center will promptly contact the user, the PI of the project and the responsible S/AAA to determine the proposed disposition of the user’s data. All data disposition actions will be performed as specified in the HPCMP’s Data Protection Policy. If the Center is unable to reach the aforementioned individuals, or if the contacted persons do not respond before the account is deactivated, the user’s data stored on the systems or home directories will be moved to archive storage, and one of the following two cases will apply:
- User has an account at another HPCMP Center. Then, the user, the PI of the project or the responsible S/AAA, as appropriate, has one year to arrange to move the data from the archive to the HPCMP Center where they have an active account. After this time period has expired, the Center may delete the user’s data.
- User does not have an account at another HPCMP Center. Then, the user, the PI of the project, or the responsible S/AAA, as appropriate, has one year to arrange to retrieve the data from the HPCMP resources. After this time period has expired, the Center may delete the user’s data.
In special cases, such as, but not limited to, security incidents or HPCMP resource abuse, access to a user account and/or user data may be immediately prohibited or deleted as appropriate for the circumstances as judged by the Center or HPCMP.
Note: Exceptions to this general data disposition policy can and will be made as necessary, within the ability of the Center to fulfill such requests, given reasonable justification as judged by the Center.
2.8. Communications policy
The Navy DSRC uses the following methods to communicate announcements and important information to users about the HPC systems and the environment:
- Mass e-mails are sent to all users or those assigned to a particular HPC system.
- Maintenance notices are posted on the Navy DSRC public site at: https://www.navydsrc.hpc.mil.
- Maintenance notices are posted on the HPC Centers public site at: https://centers.hpc.mil.
- System login messages are posted to the appropriate HPC systems.
Effective communication also depends on users being good citizens of the Navy DSRC. We ask the following of users:
- Please keep the Navy DSRC apprised of your current e-mail address so that we can ensure vital information about the Center reaches you. Contact your S/AAA to have your e-mail address updated.
- Please check the website regularly for up-to-date news and information on topics such as HPC resource availability, upcoming training opportunities, and updates to the user guides and policies and procedures documentation.
2.9. System availability policy
Planned outages at the DSRCs that affect HPCMP Compute resources follow the BC policy FY06-11 (Announcing and Logging Changes). Each HPCMP allocated compute system has differing system availability requirements per the awarded contract between the Government and the HPC vendor. These policies affect both scheduled and unscheduled downtime.
2.10. Data import and export policy
2.10.1. Network file transfers
The preferred transfer method is over the network using the encrypted (Kerberos) file transfer programs: rcp, scp, sftp, or mpscp. In cases of large numbers of files (> 1000) and/or large amounts of data (> 100 GB), users should consider using the Scalable Copy Accelerated by MPI (SCAMPI) utility. For information on using SCAMPI, see the SCAMPI User Guide. Users can also contact the HPC Help Desk for assistance in the process. Depending on the nature of the transfer, transfer time may be improved by reordering the data retrieval from tapes, taking advantage of available bandwidth to/from the Center, or dividing the transfer into smaller parts; the Navy DSRC staff will assist users to the extent that they are able. A physical media transfer may also be an option. Limitations such as available resources and network problems outside the Center can be expected, and the user should allow sufficient time to do the transfers.
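The thresholds above can be expressed as a simple rule of thumb. The Python sketch below is only an illustration of that guidance; the function name `suggest_transfer_tool` is hypothetical, and the Help Desk can advise on transfers that fall near the thresholds.

```python
def suggest_transfer_tool(file_count, total_gb):
    """Suggest a transfer approach based on the thresholds in this policy."""
    # More than 1000 files or more than 100 GB: consider SCAMPI.
    if file_count > 1000 or total_gb > 100:
        return "SCAMPI"
    return "rcp, scp, sftp, or mpscp"


if __name__ == "__main__":
    print(suggest_transfer_tool(file_count=25000, total_gb=40))
```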
2.10.2. Reading/Writing media
The Navy DSRC currently has the ability to import or export user data with the resources of the mass storage/archival system. This is considered a special request and should be directed to the HPC Help Desk. Additional documentation of the requirement and associated justification will be reviewed for approval by the Navy DSRC Management.
2.11. Account sharing policy
Users are responsible for all passwords, accounts, YubiKeys, and associated PINs issued to them. Users are not to share their passwords, accounts, YubiKeys, or PINs with any other individual for any reason. Doing so is a violation of the contract that users are required to sign in order to obtain access to DoD High Performance Computing Modernization Program (HPCMP) computational resources.
Upon discovery/notification of a violation of the above policy, the following actions will be taken:
- The user account will be disabled. No further logins will be permitted.
- All account assets will be frozen. File and directory permissions will be set such that no other users can access the account assets.
- Any queued and executing jobs in the batch queues will be deleted.
- The Service/Agency Approval Authority (S/AAA) who authorized the account will be notified of the policy violation and the actions taken.
Upon the first occurrence of a violation of the above policy, the S/AAA has the authority to request that the account be re-enabled. Upon the occurrence of a second or subsequent violation of the above policy, the account will only be re-enabled if the user's supervisory chain of command, S/AAA, and the High Performance Computing Modernization Office (HPCMO) all agree that the account should be re-enabled.
The disposition of account assets will be determined by the S/AAA. The S/AAA can:
- Request that account assets be transferred to another account.
- Request that account assets be returned to the user.
- Request that account assets be deleted and the account closed.
If there are associate investigators who need access to Navy DSRC computer resources, we encourage them to apply for an account. Separate account holders may access common project data as authorized by the project principal investigator (PI).
3. Available resources
3.1. HPC systems
The Navy DSRC unclassified HPC systems are accessible through the Defense Research and Engineering Network (DREN) to all active users. Our current HPC systems include:
Gaffney is an HPE SGI 8600 system. It has 704 standard compute nodes, 16 large-memory compute nodes, and 32 GPU compute nodes (a total of 752 compute nodes or 36,096 compute cores). It is rated at 3.05 peak PFLOPS. For more information about Gaffney, visit our hardware page.
Koehr is an HPE SGI 8600 system. It has 704 standard compute nodes, 16 large-memory compute nodes, and 32 GPU compute nodes (a total of 752 compute nodes or 36,096 compute cores). It is rated at 3.05 peak PFLOPS. For more information about Koehr, visit our hardware page.
Narwhal is an HPE Cray EX system. It has 2,176 standard compute nodes, 12 large-memory nodes, 16 visualization accelerated nodes, 32 1-MLA accelerated nodes, and 32 2-MLA accelerated nodes (a total of 2,268 compute nodes or 290,304 compute cores). It has 590 TB of memory and is rated at 12.8 peak PFLOPS. For more information about Narwhal, visit our hardware page.
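As a quick consistency check on the figures above, the following Python sketch sums the node counts for each system and divides the total core count by the total node count. The per-node result assumes every compute node has the same number of cores, which the descriptions above do not state. The names `SYSTEMS` and `cores_per_node` are hypothetical.

```python
# Node counts and totals copied from the system descriptions above.
SYSTEMS = {
    "Gaffney": {"nodes": [704, 16, 32], "total_nodes": 752, "total_cores": 36096},
    "Koehr": {"nodes": [704, 16, 32], "total_nodes": 752, "total_cores": 36096},
    "Narwhal": {"nodes": [2176, 12, 16, 32, 32], "total_nodes": 2268, "total_cores": 290304},
}


def cores_per_node(system):
    """Return cores per node, assuming all compute nodes are identical in core count."""
    info = SYSTEMS[system]
    if sum(info["nodes"]) != info["total_nodes"]:
        raise ValueError(f"node counts for {system} do not add up")
    cores, remainder = divmod(info["total_cores"], info["total_nodes"])
    if remainder:
        raise ValueError(f"cores for {system} do not divide evenly across nodes")
    return cores


if __name__ == "__main__":
    for name in SYSTEMS:
        print(name, cores_per_node(name))
```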
For information on restricted systems, see the Restricted Systems page (PKI required).
3.2. Data storage
3.2.1. File systems
Each HPC system has several file systems available for storing user data. Your personal directories on these file systems are commonly referenced via the $HOME, $WORKDIR, $CENTER, and $ARCHIVE_HOME environment variables. Other file systems may be available as well.
|$HOME||Your home directory on the system|
|$WORKDIR||Your temporary work directory on a high-capacity, high-speed scratch file system used by running jobs|
|$CENTER||Your short-term (120-day) storage directory on the Center-Wide File System (CWFS)|
|$ARCHIVE_HOME||Your archival directory on the archive server|
For details about the specific file systems on each system, see the system user guides on the documentation page.
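The short Python sketch below shows one way to look up these locations from within a script. The variable names are the ones listed above; the function name `storage_locations` is hypothetical, and a variable that is not defined in the current environment is reported as unset rather than guessed.

```python
import os

# Environment variables named in this guide (without the leading "$").
STORAGE_VARIABLES = ("HOME", "WORKDIR", "CENTER", "ARCHIVE_HOME")


def storage_locations(environ=None):
    """Map each storage variable to its path, or None if it is not set."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in STORAGE_VARIABLES}


if __name__ == "__main__":
    for name, path in storage_locations().items():
        print(f"${name}: {path if path else '(not set)'}")
```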
3.2.2. Archive system
All of our HPC systems have access to an online archival system, st-vsm1, which provides long-term storage for users' files on a petascale robotic tape library system. A 2-PB disk cache front-ends the tape file system and temporarily holds files while they are being transferred to or from tape.
For information on using the archive server, see the Archive User Guide.
3.3. Computing environment
To ensure a consistent computing environment and user experience on all HPCMP HPC systems, all systems follow a standard configuration baseline. For more information on the policies defining the baseline configuration, see the Baseline Configuration Compliance Matrix. All systems run variants of the Linux operating system, but the computing environment varies by vendor and architecture due to vendor-specific enhancements.
3.3.1. Software
Each HPC system hosts a large variety of compiler environments, math libraries, programming tools, and third-party analysis applications which are available via loadable software modules. A list of software is available on the software page, or for more up-to-date software information, use the module commands on the HPC systems. Specific details of the computing environment on each HPC system are discussed in the system user guides, available on the documentation page.
To request additional software or to request access to restricted software, please contact the HPC Help Desk.
3.3.2. Bring your own code
While all HPCMP HPC systems offer a diversity of open source, commercial and government software, there are times when we don't support the application codes and tools needed for specific projects. The following information describes a convenient way to utilize your own software on our systems.
Our HPC systems provide you with adequate file space to store your codes. Data stored in your home directory ($HOME) will be backed up on a periodic basis. If you need more home directory space, you may submit a request to the HPC Help Desk. For more details on home directories, see the Baseline Configuration (BC) policy FY12-01 (Minimum Home Directory Size and Backup Schedule).
If you need to share an application among multiple users, BC policy FY10-07 (Common Location to Maintain Codes) explains how to create a common location on the $PROJECTS_HOME file system, to place applications and codes without using home directories or scrubbed scratch space. To request a new "project directory," please provide the following information to the HPC Help Desk:
- Desired DSRC system where a project directory is being requested.
- POC Information: Name of the sponsor of the project directory, user name, and contact information.
- Short Description of Project: Short summary of the project describing the need for a project directory.
- Desired Directory Name: This will be the name of the directory created under $PROJECTS_HOME.
- Is the code/data in the project directory restricted (e.g. ITAR, etc.)?
- Desired Directory Owner: The user name to be assigned ownership of the directory.
- Desired Directory Group: The group name to be assigned to the directory. (New group names must be 8 characters or fewer.)
- Additional users to be added to the group.
If the POC for the project directory ceases being an account holder on the system, project directories will be handled according to the user data retention policies of the center.
Once the project directory is created, you can install software (custom or open source) in this directory. Then, depending on requirements, you can set file and/or directory permissions to allow any combination of group read, write, and execute privileges. Since this directory is fully owned by the POC, he or she can even make use of different groups within subdirectories to provide finer granularity of permissions.
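As an illustration of setting group privileges, the Python sketch below replaces the group permission bits on a path with a chosen combination of read, write, and execute, leaving owner and other bits unchanged. The function name `set_group_access` is hypothetical and the example path is a placeholder. Assigning the directory to a different group is a separate step not shown here.

```python
import os
import stat


def set_group_access(path, read=True, write=False, execute=True):
    """Set the group permission bits on path and return the resulting mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    mode &= ~stat.S_IRWXG  # clear the existing group bits
    if read:
        mode |= stat.S_IRGRP
    if write:
        mode |= stat.S_IWGRP
    if execute:
        mode |= stat.S_IXGRP
    os.chmod(path, mode)
    return mode


if __name__ == "__main__":
    # Hypothetical usage: give the group read and execute access to a
    # subdirectory of the project directory.
    target = os.path.join(os.environ.get("PROJECTS_HOME", "."), "my_project")
    if os.path.isdir(target):
        print(oct(set_group_access(target, read=True, write=False, execute=True)))
```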
Users are expected to ensure that any software or data that is placed on HPCMP systems is protected according to any external restrictions on the data. Users are also responsible for ensuring no unauthorized or malicious software is introduced to the HPCMP environment.
For installations involving restricted software, it is your responsibility to set up group permissions on the directories and to protect the data. It is crucially important to note that there are users on the HPCMP systems who are not authorized to access restricted data. You may not run servers or use software that communicates to a remote system without prior authorization.
If you need help porting or installing your code, the HPC Help Desk provides a "Code Assist" team that specializes in helping users with installation and configuration issues for user supplied codes. To get help, simply contact the HPC Help Desk and open a ticket.
Please contact the HPC Help Desk to discuss any special requirements.
3.3.3. Batch schedulers
Our HPC systems use various batch schedulers to manage user jobs and system resources. Basic instructions and examples for using the scheduler on each system can be found in the system user guides. More extensive information can be found in the Scheduler Guides. These documents are available on the documentation page.
Schedulers place user jobs into different queues based on the project associated with the user account. Most users only have access to the debug, standard, transfer, HIE, and background queues, but other queues may be available to you depending on your project. For more information about the queues on a system, see the Scheduler Guides.
3.3.4. Advance Reservation Service (ARS)
Another way to schedule jobs is through the Advance Reservation Service. This service allows users to reserve resources for use at specific times and for specific durations. The ARS works in tandem with the batch scheduler to ensure that your job runs at the scheduled time, and that all required resources (e.g., nodes and licenses) are available when your job begins. For information on using the ARS, see the ARS User Guide.
3.4. HPC Portal
The HPC Portal provides a suite of custom web applications, allowing you to access a command line, manage files, and submit and manage jobs from a browser. It also supports pre/post-processing and data visualization by making DSRC-hosted desktop applications accessible over the web. For more information about the HPC Portal, see the HPC Portal page on the HPC Centers website.
3.5. Secure Remote Desktop (SRD)
The Secure Remote Desktop enables users to launch a GNOME desktop on an HPC system via a downloadable Java interface client. This desktop is then piped to the user's local workstation (Linux, Mac, or Windows) for display. Once the desktop is launched, a user may run any software application installed on the HPC system. For information on using SRD, or to download the client, see the Secure Remote Desktop page on the DAAC website.
3.6. Network connectivity
The Navy DSRC is a primary node on the Defense Research and Engineering Network (DREN), which provides up to 40-Gb/sec service to DoD HPCMP centers nationwide across a 100-Gb/sec backbone. We connect to the DREN via a 10-Gb/sec circuit linking us to the DREN backbone.
The DSRC's local network consists of a 40-Gb/sec fault-tolerant backbone, equaling 80-Gb/sec across the enclave, with 10-Gb/sec connections to the HPC systems and 40-Gb/sec connections to the archive systems.
4. How to access our systems
The HPCMP uses a network authentication protocol called Kerberos to authenticate user access to our HPC systems. Before you can log in, you must download and install an HPCMP Kerberos client kit on your local system. For information about downloading and using these kits, visit HPC Centers: Kerberos & Authentication, and click on the tab for your platform. There you will find instructions for downloading and installing the kit, getting a ticket, and logging in.
After installing and configuring a Kerberos client kit, you can access our HPC systems via standard Kerberized commands, such as ssh. File transfers between local and remote systems can be accomplished via the scp, mpscp, or scampi commands. For additional information on using the Kerberos tools, see the Kerberos User Guide or review the tutorial video on Logging into an HPC System. Instructions for logging into each system can be found in the system user guides on the documentation page.
Another way to access the HPC systems is through the HPC Portal. For information on using the portal, visit HPC Centers: HPC Portal. You may also wish to review the HPC Portal demonstration videos. To log into the portal, click on the link for the center where your account is located.
For information on accessing restricted systems, see the system user guides on the Restricted Systems page (PKI required).
5. How to get help
For almost any issue, the first place you should turn for help is the HPC Help Desk. You can contact the Help Desk via email, phone, fax, DSN, or even traditional mail. Full contact information for the Help Desk is available on HPC Centers: Technical and Customer Support. The Help Desk can assist with a wide array of technical issues related to your account and your use of our systems. The Help Desk can also assist in connecting you with various special-purpose groups to address your particular need.
5.1. Productivity Enhancement and Training (PET)
The PET initiative gives users access to computational experts in many HPC technology areas. These HPC application experts help HPC users become more productive using HPCMP supercomputers. The PET initiative also leverages the expertise of academia and industry experts in new technologies and provides training on HPC-related topics. Help in specific computational technology areas is available providing a wide range of expertise including algorithm development and implementation, code porting and development, performance analysis, application and I/O optimization, accelerator programming, preprocessing and grid generation, workflows, in-situ visualization, and data analytics.
5.2. User Advocacy Group (UAG)
The UAG provides a forum for users of HPCMP resources to influence policies and practices of the Program; to facilitate the exchange of information between the user community and the HPCMP; to serve as an advocate for HPCMP users; and to advise the HPC Modernization Program Office on policy and operational matters related to the HPCMP.
5.3. Baseline Configuration Team (BCT)
The BCT is tasked to define a common set of capabilities and functions so that users can work more productively and collaboratively when using the HPC resources at multiple computing centers. To accomplish this, the BCT passes policies which collectively create a configuration baseline for all HPC systems.
5.4. Computational Research and Engineering Acquisition Tools and Environments (CREATE)
The CREATE program provides tools to enhance the productivity of the DoD acquisition engineering workforce by providing high fidelity design and analysis tools with capabilities greater than today's tools, reducing the acquisition development and test process cycle. CREATE projects provide enhanced engineering design tools for the DoD HPC community.
5.5. Data Analysis and Assessment Center (DAAC)
The DAAC serves the needs of DoD HPCMP scientists to analyze an ever-increasing volume and complexity of data. Its mission is to put visualization and analysis tools and services into the hands of every user.