MARCC Contingency Plan

Type: High Performance and Data Intensive Computing resources

Location:  5400 E. Lombard St, Baltimore, MD 21224 (Bayview campus)

Physical Access:  Restricted to authorized card users

Purpose: MARCC provides high performance computing resources to the Johns Hopkins University and University of Maryland research community. These resources take the form of CPU cycles to analyze data, large amounts of storage and backups, and network connectivity to ensure remote access to the core facility and to transfer data between sites.

Designation: Critical

Power:  BGE is the main power provider. MARCC also has a couple of generators for backup power.

In the event of an emergency, most resources at MARCC can be accessed remotely via the internet.

Systems group: Most of the work on the systems side can also be done remotely, although it may be slower and require additional effort. In a few instances, physical presence is required at the site. Parts to be replaced or returned for replacement are delivered daily. The file systems (where all the data is housed) need almost daily attention because hard drives must be replaced. MARCC has developed scripts that report how many and which hard drives need, or will soon need, attention. This task can be scheduled on a two-day rotation for the systems team and may involve just a few hours per visit.
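
As an illustration only, the two-day on-site rotation described above could be laid out with a short script such as the sketch below. It is not one of MARCC's existing scripts; the function name, the team member names, and the start date are hypothetical placeholders.

```python
from datetime import date, timedelta
from itertools import cycle


def rotation_schedule(team, start, visits, interval_days=2):
    """Return (visit_date, member) pairs, one on-site visit every interval_days.

    team: list of names (hypothetical placeholders, not the real roster)
    start: date of the first on-site visit
    visits: number of visits to schedule
    """
    members = cycle(team)  # assign team members in round-robin order
    return [
        (start + timedelta(days=i * interval_days), next(members))
        for i in range(visits)
    ]


if __name__ == "__main__":
    # Placeholder names and start date, for illustration only.
    for visit_date, member in rotation_schedule(
        ["admin_a", "admin_b", "admin_c"], date(2020, 3, 16), 6
    ):
        print(visit_date.isoformat(), member)
```

Round-robin assignment keeps the number of visits per person even; a real schedule would also need to account for absences and for the drive counts reported by the monitoring scripts.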

In the event of a full shutdown, the file systems can be shut down gracefully. However, this is not advisable, as it may be difficult to bring the file systems back up and data sets could be lost.

User/Application/Integration support: MARCC communicates with the user community via a ticketing system housed at MARCC with a backup at Bloomberg 156. This system will be the primary means of communication between MARCC and all researchers. Internally, we use Slack for immediate communication among all group members. MARCC also uses its website (www.marcc.jhu.edu) to post updates about current events, downtimes, and general issues.

Facility: Most components are monitored by RUNOS, the software used to monitor the facility. The team can respond quickly to emergency calls or signals.

Essential personnel:  Joseph Biondo and Jaime Combariza

Emergency contacts:  MARCC keeps a list of emergency contacts on its Confluence site. The MARCC director also keeps a copy of this list.