# CLASS-HPC-GCP
Internet2 CLASS Capstone Project, HPC-GCP Team
An HPC cluster can be deployed conveniently on GCP using Terraform scripts. The scripts live in the slurm-gcp repository; you need to edit two of them, basic.tfvars and main.tf. For a basic deployment it is sufficient to redefine a few variables, such as the project name and cluster name. Of course, you can fine-tune the variables and scripts to fit your needs.
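For a rough idea of what that edit looks like, the key overrides in basic.tfvars (set in the steps below) might read as follows. This is only a sketch: the variable names are taken from the slurm-gcp basic example and the values are placeholders, so check your copy of basic.tfvars.example for the exact names your release expects.

```sh
# Sketch: show the key overrides after editing basic.tfvars.
# Variable names follow the slurm-gcp basic example; values are placeholders.
grep -E '^(cluster_name|project|zone)' basic.tfvars
# cluster_name = "capstone"
# project      = "class-capstone"
# zone         = "us-central1-a"
```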
* Create a GCP project (for example, class-capstone).
* Go to APIs & Services and enable the APIs for Compute Engine and Deployment Manager.
* Start Cloud Shell. (If you have set up gcloud or the Cloud SDK on your laptop, you can use your local environment in place of Cloud Shell.)
* Clone the slurm-gcp repo: `git clone https://github.com/SchedMD/slurm-gcp.git`
* Go to the directory slurm-gcp/tf/examples.
* Make a copy of the basic example: `cp basic.tfvars.example basic.tfvars`
* Edit the basic.tfvars file and set `project = "class-capstone"` (use the ID of the project you created in the first step).
* Open main.tf and make sure that the module `source` argument refers to the correct path.
* Initialize Terraform: `terraform init`
* Start the HPC cluster: `terraform apply -var-file=basic.tfvars`
* Go to your GCP dashboard and check Compute Engine. You should see the controller and the login node up and running.
* SSH into the login node and check the Slurm status (`sinfo`) or run some test jobs. (A consolidated command sketch of these steps follows below.)
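For convenience, the steps above condense into a single shell sequence. Treat it as a sketch rather than a verbatim recipe: the project ID is the example name used above, the example directory layout can differ between slurm-gcp releases, and enabling the APIs with `gcloud services enable` is assumed to be equivalent to enabling them in the console.

```sh
# Sketch of the full sequence, assuming a project named "class-capstone";
# adjust names and paths to match your setup and slurm-gcp release.
gcloud config set project class-capstone

# Enable the required APIs (same effect as enabling them in the console).
gcloud services enable compute.googleapis.com deploymentmanager.googleapis.com

# Fetch the Terraform scripts and switch to the basic example.
# (The basic example may live in a subdirectory such as basic/ in newer releases.)
git clone https://github.com/SchedMD/slurm-gcp.git
cd slurm-gcp/tf/examples
cp basic.tfvars.example basic.tfvars
# ... edit basic.tfvars and main.tf as described above ...

# Deploy the cluster.
terraform init
terraform apply -var-file=basic.tfvars
```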
Note that the minimum disk size is 20 GB to accommodate the VM image; larger disks are fine.
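If you want to double-check the provisioned disk sizes after the deployment, one option is to list the cluster's disks with gcloud; the filter below assumes the disk names contain the cluster name chosen in basic.tfvars.

```sh
# Sketch: list the boot disks created for the cluster and their sizes in GB.
# Assumes the disk names contain the cluster name (here "capstone").
gcloud compute disks list \
  --filter="name ~ capstone" \
  --format="table(name, sizeGb, zone.basename())"
```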
The above steps are explained in more detail in the Google Cloud documentation: https://cloud.google.com/architecture/deploying-slurm-cluster-compute-engine