From 3f2b4742f1b965563b300bbd0c94b84ccf05f46f Mon Sep 17 00:00:00 2001 From: Timothy Middelkoop Date: Mon, 10 Jan 2022 22:29:01 +0000 Subject: [PATCH] Add research context and exercise for GCP. --- content/GCP/06_running_analysis.ipynb | 47 ++++++++++++++++++++------- 1 file changed, 35 insertions(+), 12 deletions(-) diff --git a/content/GCP/06_running_analysis.ipynb b/content/GCP/06_running_analysis.ipynb index 052ffe8..730eef5 100644 --- a/content/GCP/06_running_analysis.ipynb +++ b/content/GCP/06_running_analysis.ipynb @@ -22,12 +22,19 @@ "* Understand the basics of Identity and Access Management (IAM)\n", "* Add collaborators to a Bucket with appropriate permissions.\n", "\n", - "\n", - "\n", - "\n", "```" ] }, + { + "cell_type": "markdown", + "id": "06ef4791-862e-4c05-afc8-98d440c47f73", + "metadata": {}, + "source": [ + "## A Research Computational and Data Workflow - Drew's story\n", + "\n", + "Drew needs to do some analysis on the data. They need data (satellite images stored in the cloud), computational resources (a virtual machine), some software (we will supply this), and a place to store the results (Cloud Storage). We will assemble all these parts in the cloud.\n" + ] + }, { "cell_type": "markdown", "id": "6291edee-c2df-4941-9b8e-de42649640f9", @@ -35,14 +42,18 @@ "source": [ "## Create a VM\n", "\n", - "Since we only create resources as we need them in the cloud. As an exercise you will now create a VM for our analysis. In this case will give the VM **Full** access to **Storage**. \n", + "Since we only create resources as we need them in the cloud, we will now create a new virtual machine (VM) for Drew to use for their analysis.\n", + "\n", + "We will do this as an exercise to give you practice in creating resources. Since the virtual machine will need access to storage on your behalf, you will need to change the **access scope** to give the virtual machine **Full** access to the **Storage** API.
\n", + "\n", + "### Exercise\n", "\n", "Using the console navigate to the \"Compute Engine\" service and create a new VM with the following properties.\n", " * Call the VM \"essentials\"\n", - " * Allow the VM \"Full\" access to \"Storage\". This can be found under \"Identity and API\" and then selecting \"Set access for each API\" and change \"Storage\" to \"Full\". **This will allow the VM to create, read, write, and delete all storage buckets in the project\"**\n", - " * Feel free to select a larger VM by changing the machine type to something larger, for example an \"e2-standard-2\".\n", - " \n", - "When you are done connect to the machine as described below." + " * Allow the VM \"Full\" access to \"Storage\". This setting is found under \"Identity and API\" on the \"create an instance\" page: select \"Set access for each API\" and change \"Storage\" to \"Full\". **This will allow the VM to create, read, write, and delete all storage buckets in the project.**\n", + " * Feel free to select a slightly larger VM by changing the machine type, for example to an \"e2-standard-2\".\n", + "\n", + "*When you are done, feel free to connect to the virtual machine on your own for additional practice. Once everyone has created their VM, we will connect to the machine as described below.*" ] }, { @@ -52,7 +63,7 @@ "source": [ "## Connect to the VM\n", "\n", - "First login to the instance from the Cloud Shell by running the following command:\n", + "Now log in to the new virtual machine instance by opening the Cloud Shell and running the following command:\n", "```\n", "gcloud compute ssh essentials\n", "```\n", @@ -246,7 +257,9 @@ "source": [ "## Access the bucket\n", "\n", - "First test that our tools are working and that we can access the public bucket that we will be using." + "Now we need to verify that Drew has access to the analysis data.
\n", + "\n", + "We do this by testing that our tools are working and that we can access the public bucket that we will be using." ] }, { @@ -278,14 +291,24 @@ "gsutil ls gs://gcp-public-data-landsat/" ] }, + { + "cell_type": "markdown", + "id": "9e16e8b5-a178-492a-aa80-5affe721b6ca", + "metadata": {}, + "source": [ + "## Getting the data" + ] + }, { "cell_type": "markdown", "id": "89acffba-cbce-436c-98dd-05467b6675a6", "metadata": {}, "source": [ - "The index file is a list of all the files in the bucket and we can use it to search and filter files.\n", + "Since the Landsat data is *huge*, we do not, and cannot, download everything to the virtual machine. We will analyze only a subset of the data.\n", + "\n", + "We will use the `index.csv.gz` file, which lists all the files in the bucket along with additional metadata, to search and filter the data.\n", "\n", - "We will get the index and uncompress the file placing it in the `data/` directory (this is ignored by git). This should take around 2 min with a `e2-medium` instance in the `us-west2` region." + "We will first get the index and uncompress it, placing the file in the `data/` directory (which is ignored by git). This should take around 2 minutes with an `e2-medium` instance in the `us-west2` region." ] }, {