diff --git a/Build.md b/Build.md index 852be0a..3942f65 100644 --- a/Build.md +++ b/Build.md @@ -39,3 +39,7 @@ export GOOGLE_CLOUD_PROJECT=just-armor-301114 export DEVSHELL_PROJECT_ID=$GOOGLE_CLOUD_PROJECT gcloud config set project $GOOGLE_CLOUD_PROJECT ``` + +## AWS + +The `aws` command-line tool is expected to be installed locally. An SSH key named 'learner' is required to access the account. diff --git a/content/GCP/.gitignore b/content/GCP/.gitignore deleted file mode 100644 index f71307d..0000000 --- a/content/GCP/.gitignore +++ /dev/null @@ -1 +0,0 @@ -CLASS-Examples/ diff --git a/content/GCP/05_intro_to_cloud_storage.ipynb b/content/GCP/03_intro_to_cloud_storage.ipynb similarity index 100% rename from content/GCP/05_intro_to_cloud_storage.ipynb rename to content/GCP/03_intro_to_cloud_storage.ipynb diff --git a/content/GCP/05_cli_storage.ipynb b/content/GCP/05_cli_storage.ipynb new file mode 100644 index 0000000..02ca086 --- /dev/null +++ b/content/GCP/05_cli_storage.ipynb @@ -0,0 +1,509 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c73369d2-8d8c-4764-bcab-cf8a0ec71e57", + "metadata": {}, + "source": [ + "# Managing Cloud Storage from the Command Line\n", + "\n", + "Learner Questions\n", + " * How do I store data in a Bucket?\n", + "\n", + "Learning Objectives\n", + " * Manage buckets using the command line.\n", + " * Create a bucket with a sensible configuration. \n" + ] + }, + { + "cell_type": "markdown", + "id": "3bcdd299-5d83-4bb6-b2bb-daa85b392a19", + "metadata": {}, + "source": [ + "Now that Drew understands how to create buckets using the web console and how to use the Google command-line tools, they are going to explore managing buckets from the command line." + ] + }, + { + "cell_type": "markdown", + "id": "86da8dfa-44d6-45e7-8b43-36ef24626955", + "metadata": {}, + "source": [ + "## Configuration\n", + "\n", + "It is important to verify the Account and Project information every time you start interacting with the cloud. 
We also use this opportunity to set the configuration environment variables (`PROJECT`) for the Episode.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "8e3ef8dc-e41b-45bb-8f5c-cba30add845b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Your active configuration is: [cloudshell-31923]\n", + "student31@class.internet2.edu\n" + ] + } + ], + "source": [ + "gcloud config get-value account" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "9db0f3ca-2402-4002-b9d3-e634580e8f7f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Your active configuration is: [cloudshell-31923]\n", + "just-armor-301114\n" + ] + } + ], + "source": [ + "gcloud config get-value project" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "36fcc48b-8a5b-4158-9704-f1ea7c9951eb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "just-armor-301114\n" + ] + } + ], + "source": [ + "PROJECT=$(gcloud config list --format='value(core.project)')\n", + "echo $PROJECT" + ] + }, + { + "cell_type": "markdown", + "id": "a8a166be-e959-497c-8ba0-44014442310e", + "metadata": {}, + "source": [ + "## Create a Bucket\n", + "\n", + "We will first make a new bucket (mb) with mostly default values, explore the newly created bucket, and then immediately destroy it. This is a typical pattern when learning a new service because it reduces the chances of making a mistake (errors can be hard to understand). In most cases resources should not be created with all default values, so we will re-create the bucket in the next section with more sensible values. " + ] + }, + { + "cell_type": "markdown", + "id": "f020bfcf-2e0e-4b8f-97f3-6b4753d95146", + "metadata": {}, + "source": [ + "We first generate a bucket name and store it in an environment variable for future use. 
Bucket names are globally unique, so here we combine \"essentials\", your Cloud Shell username (the username part of the Google Account), and the date to give the bucket a name that is unique and whose origin is easy to understand." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "255eb914-a06a-4bfd-adbb-7ca2f94678e7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "essentials-student31-2021-10-26\n" + ] + } + ], + "source": [ + "BUCKET=\"essentials-${USER}-$(date +%F)\"\n", + "echo $BUCKET" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "b0661b4c-b56b-4d6d-ba3c-e1c5f38699c6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Creating gs://essentials-student31-2021-10-26/...\n" + ] + } + ], + "source": [ + "gsutil mb \"gs://$BUCKET\"" + ] + }, + { + "cell_type": "markdown", + "id": "42584633-f77f-43b4-bb23-49ae4364af07", + "metadata": {}, + "source": [ + "If the bucket creation fails with a \"ServiceException: 409 A Cloud Storage bucket named ... already exists.\" it means that the name is not unique or you have already created the bucket. You will need to either delete the bucket or choose another name.\n", + "\n", + "*Advanced Note: Bucket creation may fail if you are not using the Cloud Shell and the machine you are using has a generic account name such as `pi`, `admin`, or another standard username. If this is the case just set `BUCKET` to some unique value.*" + ] + }, + { + "cell_type": "markdown", + "id": "7cc6dd9a-bae4-4e59-8e7c-48007aaccd28", + "metadata": {}, + "source": [ + "## Show the Bucket\n", + "\n", + "We will now list (enumerate) the buckets using the `gsutil ls` command to verify that our bucket has been created." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8ae2988a-fca8-4601-a7e5-458627fe68e3", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "gs://essentials-student31-2021-10-26/\n" + ] + } + ], + "source": [ + "gsutil ls" + ] + }, + { + "cell_type": "markdown", + "id": "7d50532e-1c0f-488d-8710-439a62617529", + "metadata": {}, + "source": [ + "In this case we see that there is only one bucket and it is the one we just created. Your project may contain other buckets." + ] + }, + { + "cell_type": "markdown", + "id": "00973c8c-b3bb-41ba-b3aa-9a425284c70a", + "metadata": {}, + "source": [ + "Just like the Unix `ls` command, the `gsutil ls` command has many (similar) options. Let's get detailed information about the bucket. Note that we must specify the bucket using the `-b` option. Type `gsutil ls --help` for more information on the command and its options." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "ecc0e4fe-c9d0-4edd-9a0c-56b5b9ec7c54", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "gs://essentials-student31-2021-10-26/ :\n", + "\tStorage class:\t\t\tSTANDARD\n", + "\tLocation type:\t\t\tmulti-region\n", + "\tLocation constraint:\t\tUS\n", + "\tVersioning enabled:\t\tNone\n", + "\tLogging configuration:\t\tNone\n", + "\tWebsite configuration:\t\tNone\n", + "\tCORS configuration: \t\tNone\n", + "\tLifecycle configuration:\tNone\n", + "\tRequester Pays enabled:\t\tNone\n", + "\tLabels:\t\t\t\tNone\n", + "\tDefault KMS key:\t\tNone\n", + "\tTime created:\t\t\tTue, 26 Oct 2021 20:47:06 GMT\n", + "\tTime updated:\t\t\tTue, 26 Oct 2021 20:47:06 GMT\n", + "\tMetageneration:\t\t\t1\n", + "\tBucket Policy Only enabled:\tFalse\n", + "\tPublic access prevention:\tunspecified\n", + "\tRPO:\t\t\t\tDEFAULT\n", + "\tACL:\t\t\t\t\n", + "\t [\n", + "\t {\n", + "\t \"entity\": \"project-owners-1002111293252\",\n", + "\t 
\"projectTeam\": {\n", + "\t \"projectNumber\": \"1002111293252\",\n", + "\t \"team\": \"owners\"\n", + "\t },\n", + "\t \"role\": \"OWNER\"\n", + "\t },\n", + "\t {\n", + "\t \"entity\": \"project-editors-1002111293252\",\n", + "\t \"projectTeam\": {\n", + "\t \"projectNumber\": \"1002111293252\",\n", + "\t \"team\": \"editors\"\n", + "\t },\n", + "\t \"role\": \"OWNER\"\n", + "\t },\n", + "\t {\n", + "\t \"entity\": \"project-viewers-1002111293252\",\n", + "\t \"projectTeam\": {\n", + "\t \"projectNumber\": \"1002111293252\",\n", + "\t \"team\": \"viewers\"\n", + "\t },\n", + "\t \"role\": \"READER\"\n", + "\t }\n", + "\t ]\n", + "\tDefault ACL:\t\t\t\n", + "\t [\n", + "\t {\n", + "\t \"entity\": \"project-owners-1002111293252\",\n", + "\t \"projectTeam\": {\n", + "\t \"projectNumber\": \"1002111293252\",\n", + "\t \"team\": \"owners\"\n", + "\t },\n", + "\t \"role\": \"OWNER\"\n", + "\t },\n", + "\t {\n", + "\t \"entity\": \"project-editors-1002111293252\",\n", + "\t \"projectTeam\": {\n", + "\t \"projectNumber\": \"1002111293252\",\n", + "\t \"team\": \"editors\"\n", + "\t },\n", + "\t \"role\": \"OWNER\"\n", + "\t },\n", + "\t {\n", + "\t \"entity\": \"project-viewers-1002111293252\",\n", + "\t \"projectTeam\": {\n", + "\t \"projectNumber\": \"1002111293252\",\n", + "\t \"team\": \"viewers\"\n", + "\t },\n", + "\t \"role\": \"READER\"\n", + "\t }\n", + "\t ]\n" + ] + } + ], + "source": [ + "gsutil ls -L -b \"gs://$BUCKET\"" + ] + }, + { + "cell_type": "markdown", + "id": "1e7fceeb-661e-4a42-8fa4-5c2c50503ea6", + "metadata": {}, + "source": [ + "You can see that the bucket is \"multi-region\" and uses the \"standard\" storage class (this is the type of storage). The standard storage class is best suited for frequently accessed buckets. 
Other storage classes are designed for less frequent use and are less expensive, but they come with more restrictions and more complex access costs, so they should only be used after careful consideration.\n", + "\n", + "Ignore the \"ACL\" section for now.\n", + "\n", + "You may also want to verify that you can see the newly created bucket in the web console dashboard or \"Cloud Storage\" page and explore its properties there." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "8fc85600-a793-413b-b139-642e4cf08a8e", + "metadata": {}, + "outputs": [], + "source": [ + "sleep 2 # used to ensure that the next command runs correctly in Jupyter" + ] + }, + { + "cell_type": "markdown", + "id": "c9480ea5-41b3-4bde-ad01-3f290792bc0a", + "metadata": {}, + "source": [ + "## Bucket Activity\n", + "Next we will check the Activity log to ensure that the bucket was created. This command assumes that there was no other activity in the account; you may need to increase the `--limit` value to something larger to find the activity. \n", + "\n", + "Activity logs are used to track important project-level activity (such as bucket creation and deletion), can be used for security and resource tracking, and cannot be deleted. 
These logs can be used to debug problems when something goes wrong, and they support security audits and investigations.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "bb8e7270-2f64-4fbd-b312-b18a7960700f", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "---\n", + "insertId: -z8eaicef61t3\n", + "logName: projects/just-armor-301114/logs/cloudaudit.googleapis.com%2Factivity\n", + "protoPayload:\n", + " '@type': type.googleapis.com/google.cloud.audit.AuditLog\n", + " authenticationInfo:\n", + " principalEmail: student31@class.internet2.edu\n", + " authorizationInfo:\n", + " - granted: true\n", + " permission: storage.buckets.create\n", + " resource: projects/_/buckets/essentials-student31-2021-10-26\n", + " resourceAttributes: {}\n", + " methodName: storage.buckets.create\n", + " request:\n", + " defaultObjectAcl:\n", + " '@type': type.googleapis.com/google.iam.v1.Policy\n", + " bindings:\n", + " - members:\n", + " - projectViewer:just-armor-301114\n", + " role: roles/storage.legacyObjectReader\n", + " - members:\n", + " - projectOwner:just-armor-301114\n", + " - projectEditor:just-armor-301114\n", + " role: roles/storage.legacyObjectOwner\n", + " requestMetadata:\n", + " callerIp: 35.239.199.87\n", + " callerSuppliedUserAgent: apitools Python/3.7.3 gsutil/5.4 (linux) analytics/disabled\n", + " interactive/True command/mb google-cloud-sdk/361.0.0,gzip(gfe)\n", + " destinationAttributes: {}\n", + " requestAttributes:\n", + " auth: {}\n", + " time: '2021-10-26T20:47:05.790066575Z'\n", + " resourceLocation:\n", + " currentLocations:\n", + " - us\n", + " resourceName: projects/_/buckets/essentials-student31-2021-10-26\n", + " serviceData:\n", + " '@type': type.googleapis.com/google.iam.v1.logging.AuditData\n", + " policyDelta:\n", + " bindingDeltas:\n", + " - action: ADD\n", + " member: projectEditor:just-armor-301114\n", + " role: roles/storage.legacyBucketOwner\n", + " - action: ADD\n", + 
" member: projectOwner:just-armor-301114\n", + " role: roles/storage.legacyBucketOwner\n", + " - action: ADD\n", + " member: projectViewer:just-armor-301114\n", + " role: roles/storage.legacyBucketReader\n", + " serviceName: storage.googleapis.com\n", + " status: {}\n", + "receiveTimestamp: '2021-10-26T20:47:06.793562424Z'\n", + "resource:\n", + " labels:\n", + " bucket_name: essentials-student31-2021-10-26\n", + " location: us\n", + " project_id: just-armor-301114\n", + " type: gcs_bucket\n", + "severity: NOTICE\n", + "timestamp: '2021-10-26T20:47:05.785456012Z'\n" + ] + } + ], + "source": [ + "gcloud logging read --limit 1" + ] + }, + { + "cell_type": "markdown", + "id": "a322d55d-5f1c-4ca9-8aec-ec6a4579cea1", + "metadata": {}, + "source": [ + "## Remove a Bucket\n", + "Since we are done exploring the bucket, we will remove the bucket (rb). Removing resources once we are done with them is a common pattern in cloud computing; otherwise they just sit around incurring costs." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "7b32764d-e7b5-42d6-8f88-bc6936b2024a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "gs://essentials-student31-2021-10-26/\n" + ] + } + ], + "source": [ + "gsutil ls" + ] + }, + { + "cell_type": "markdown", + "id": "b3ff49c2-01b5-4c1b-a3e9-100f55c2f774", + "metadata": {}, + "source": [ + "We first verify that our environment variable is set. This is a useful pattern for catching simple bugs early and aiding debugging." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "82487bd8-14f4-4811-8f5a-dd346042bd0f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Bucket: essentials-student31-2021-10-26\n" + ] + } + ], + "source": [ + "echo \"Bucket: $BUCKET\"" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "8635db94-2f3c-46d2-acc2-76a176fb37e7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Removing gs://essentials-student31-2021-10-26/...\n" + ] + } + ], + "source": [ + "gsutil rb \"gs://$BUCKET\"" + ] + }, + { + "cell_type": "markdown", + "id": "9002604e-6908-4a86-ab45-b9dd4cb11f75", + "metadata": {}, + "source": [ + "We verify that the bucket has been removed. In this example there is no output since there are no more buckets." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "ed6f0204-0de2-4d20-ab00-3b7a1023fd86", + "metadata": {}, + "outputs": [], + "source": [ + "gsutil ls" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Bash", + "language": "bash", + "name": "bash" + }, + "language_info": { + "codemirror_mode": "shell", + "file_extension": ".sh", + "mimetype": "text/x-sh", + "name": "bash" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/content/GCP/06_running_analysis.ipynb b/content/GCP/06_running_analysis.ipynb index 29291a2..0ba3926 100644 --- a/content/GCP/06_running_analysis.ipynb +++ b/content/GCP/06_running_analysis.ipynb @@ -30,7 +30,7 @@ "source": [ "## Connect to the VM\n", "\n", - "First login to the instance from the Cloud Shell\n", + "First log in to the instance from the Cloud Shell by running the following command:\n", "```\n", "gcloud compute ssh instance-1\n", "```\n", @@ -91,11 +91,8 @@ "0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.\n", "Need to get 0 B/5633 kB of archives.\n", "After this operation, 36.2 MB of additional disk space will be used.\n", - 
"debconf: unable to initialize frontend: Dialog\n", - "debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)\n", - "debconf: falling back to frontend: Readline\n", "Selecting previously unselected package git.\n", - "(Reading database ... 55447 files and directories currently installed.)\n", + "(Reading database ... 56121 files and directories currently installed.)\n", "Preparing to unpack .../git_1%3a2.20.1-2+deb10u3_amd64.deb ...\n", "Unpacking git (1:2.20.1-2+deb10u3) ...\n", "Setting up git (1:2.20.1-2+deb10u3) ...\n" @@ -109,6 +106,16 @@ { "cell_type": "code", "execution_count": 2, + "id": "96db6a66-3fbf-419a-b8c8-dbb27639e990", + "metadata": {}, + "outputs": [], + "source": [ + "cd ~" + ] + }, + { + "cell_type": "code", + "execution_count": 3, "id": "36554c99-ba08-4733-8ef2-e68d42d0d2b7", "metadata": {}, "outputs": [ @@ -117,11 +124,11 @@ "output_type": "stream", "text": [ "Cloning into 'CLASS-Examples'...\n", - "remote: Enumerating objects: 16, done. \n", - "remote: Counting objects: 100% (16/16), done. \n", - "remote: Compressing objects: 100% (13/13), done. 
\n", - "remote: Total 41 (delta 4), reused 15 (delta 3), pack-reused 25 \n", - "Unpacking objects: 100% (41/41), done.\n" + "remote: Enumerating objects: 23, done.\u001b[K\n", + "remote: Counting objects: 100% (23/23), done.\u001b[K\n", + "remote: Compressing objects: 100% (18/18), done.\u001b[K\n", + "remote: Total 48 (delta 8), reused 20 (delta 5), pack-reused 25\u001b[K\n", + "Unpacking objects: 100% (48/48), done.\n" ] } ], @@ -131,17 +138,17 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "id": "90c1cda7-60d4-44bb-84f8-e776a77a94ab", "metadata": {}, "outputs": [], "source": [ - "cd CLASS-Examples/landsat/" + "cd ~/CLASS-Examples/landsat/" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "id": "55b628d5-6e5c-45a5-9cd3-c129db9cdcd2", "metadata": {}, "outputs": [ @@ -150,12 +157,12 @@ "output_type": "stream", "text": [ "total 24\n", - "-rw-r--r-- 1 tmiddelkoop tmiddelkoop 841 Nov 9 22:31 ReadMe.md\n", - "-rw-r--r-- 1 tmiddelkoop tmiddelkoop 72 Nov 9 22:31 clean.sh\n", - "-rw-r--r-- 1 tmiddelkoop tmiddelkoop 256 Nov 9 22:31 download.sh\n", - "-rw-r--r-- 1 tmiddelkoop tmiddelkoop 314 Nov 9 22:31 get-index.sh\n", - "-rw-r--r-- 1 tmiddelkoop tmiddelkoop 110 Nov 9 22:31 search.json\n", - "-rw-r--r-- 1 tmiddelkoop tmiddelkoop 1447 Nov 9 22:31 search.py\n" + "-rw-r--r-- 1 learner learner 862 Nov 10 22:31 ReadMe.md\n", + "-rw-r--r-- 1 learner learner 72 Nov 10 22:31 clean.sh\n", + "-rw-r--r-- 1 learner learner 280 Nov 10 22:31 download.sh\n", + "-rw-r--r-- 1 learner learner 314 Nov 10 22:31 get-index.sh\n", + "-rw-r--r-- 1 learner learner 76 Nov 10 22:31 search.json\n", + "-rw-r--r-- 1 learner learner 783 Nov 10 22:31 search.py\n" ] } ], @@ -175,7 +182,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "id": "e56ab74a-ae6d-4602-a26b-4a2656bd40cd", "metadata": {}, "outputs": [ @@ -212,7 +219,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "id": 
"bbe85b75-c7cd-40ed-a3b0-37cbd0a5f52e", "metadata": {}, "outputs": [ @@ -230,7 +237,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "id": "18a9b71c-5871-4ce2-a202-b48ad04e8d38", "metadata": {}, "outputs": [ @@ -239,12 +246,7 @@ "output_type": "stream", "text": [ "Copying gs://gcp-public-data-landsat/index.csv.gz...\n", - "==> NOTE: You are downloading one or more large file(s), which would \n", - "run significantly faster if you enabled sliced object downloads. This\n", - "feature is enabled by default but requires that compiled crcmod be\n", - "installed (see \"gsutil help crcmod\").\n", - "\n", - "/ [1 files][757.2 MiB/757.2 MiB] \n", + "- [1 files][757.2 MiB/757.2 MiB] 54.0 MiB/s \n", "Operation completed over 1 objects/757.2 MiB. \n" ] } @@ -255,7 +257,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "id": "2cdaf24c-c4aa-4e80-9236-939e7c982916", "metadata": {}, "outputs": [], @@ -265,7 +267,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 10, "id": "b005876c-f9af-43d6-80c6-f04295413b9b", "metadata": {}, "outputs": [ @@ -274,13 +276,226 @@ "output_type": "stream", "text": [ "total 2.6G\n", - "-rw-r--r-- 1 tmiddelkoop tmiddelkoop 2.6G Nov 9 22:31 index.csv\n" + "-rw-r--r-- 1 learner learner 2.6G Nov 10 22:32 index.csv\n" ] } ], "source": [ "ls -lh data" ] + }, + { + "cell_type": "markdown", + "id": "fcde8334-f58d-4c3d-995a-2491be0f95ea", + "metadata": {}, + "source": [ + "We will now explore the data" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "ffe969db-d207-44fe-8957-8d129c76ee8f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SCENE_ID,PRODUCT_ID,SPACECRAFT_ID,SENSOR_ID,DATE_ACQUIRED,COLLECTION_NUMBER,COLLECTION_CATEGORY,SENSING_TIME,DATA_TYPE,WRS_PATH,WRS_ROW,CLOUD_COVER,NORTH_LAT,SOUTH_LAT,WEST_LON,EAST_LON,TOTAL_SIZE,BASE_URL\n", + 
"LM41170311983272FFF03,LM04_L1TP_117031_19830929_20180412_01_T2,LANDSAT_4,MSS,1983-09-29,01,T2,1983-09-29T01:45:39.0520000Z,L1TP,117,31,2.0,42.79515,40.7823,124.88634,127.85668,27769529,gs://gcp-public-data-landsat/LM04/01/117/031/LM04_L1TP_117031_19830929_20180412_01_T2\n", + "LM10890151972214AAA05,LM01_L1GS_089015_19720801_20180428_01_T2,LANDSAT_1,MSS,1972-08-01,01,T2,1972-08-01T22:10:17.7940000Z,L1GS,89,15,0.0,65.211,62.9963,-170.33714,-165.11701,16228538,gs://gcp-public-data-landsat/LM01/01/089/015/LM01_L1GS_089015_19720801_20180428_01_T2\n", + "LC80660912015026LGN02,LC08_L1GT_066091_20150126_20180202_01_T2,LANDSAT_8,OLI_TIRS,2015-01-26,01,T2,2015-01-26T21:24:43.3704780Z,L1GT,66,91,94.98,-43.51716,-45.68406,-177.72298,-174.66884,1075234161,gs://gcp-public-data-landsat/LC08/01/066/091/LC08_L1GT_066091_20150126_20180202_01_T2\n" + ] + } + ], + "source": [ + "head --lines=4 data/index.csv" + ] + }, + { + "cell_type": "markdown", + "id": "532e6da3-302a-4e8a-8570-752995f30f1d", + "metadata": {}, + "source": [ + "## Search for Data\n", + "\n", + "We can see the data is well formed and matches what we expect. We will now use this data to download Landsat 8 imagery covering a specific point. The following script does a simple filter." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "c5e300c3-e1f3-4cd4-9679-77725e61c4db", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "#!/usr/bin/python3\n", + "import json\n", + "import csv\n", + "import sys\n", + "\n", + "# Example: Burr Oak Tree\n", + "# 38.899313,-92.464562 (Lat north+, Long west-) ; Landsat Path 025, Row 033\n", + "config=json.load(open(\"search.json\"))\n", + "lat,lon=config['lat'],config['lon']\n", + "landsat=config['landsat']\n", + "\n", + "reader=csv.reader(sys.stdin)\n", + "header=next(reader) # skip header\n", + "for l in reader:\n", + " SCENE_ID,PRODUCT_ID,SPACECRAFT_ID,SENSOR_ID,DATE_ACQUIRED,COLLECTION_NUMBER,COLLECTION_CATEGORY,SENSING_TIME,DATA_TYPE,WRS_PATH,WRS_ROW,CLOUD_COVER,NORTH_LAT,SOUTH_LAT,WEST_LON,EAST_LON,TOTAL_SIZE,BASE_URL=l\n", + " west,east=float(WEST_LON),float(EAST_LON)\n", + " north,south=float(NORTH_LAT),float(SOUTH_LAT)\n", + " if SPACECRAFT_ID==landsat and north >= lat and south <= lat and west <= lon and east >= lon:\n", + " print(BASE_URL) # output BASE_URL\n" + ] + } + ], + "source": [ + "cat search.py" + ] + }, + { + "cell_type": "markdown", + "id": "4aa3de47-3dd4-4a0f-9f07-f2f004de7054", + "metadata": {}, + "source": [ + "We can see that the actual search data comes from the file `search.json`. The program reads the data from the standard input and iterates over all rows in the CSV file. It filters for the rows whose image contains the point and prints out their bucket URLs. We are interested in all products that contain the Burr Oak Tree." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "c9872510-4265-4b0e-aeb5-5a829ff69b24", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"lat\": 38.899313,\n", + " \"lon\": -92.464562,\n", + " \"landsat\": \"LANDSAT_8\"\n", + "}\n" + ] + } + ], + "source": [ + "cat search.json" + ] + }, + { + "cell_type": "markdown", + "id": "cbb27235-6bc4-4eb6-b668-5c30427a28b8", + "metadata": {}, + "source": [ + "Now let's test this on a subset of the data." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "6912a9ec-0f9b-4500-ba20-d4280592b323", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1\n", + "gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1\n" + ] + } + ], + "source": [ + "head --lines=100000 data/index.csv | python3 search.py" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3572c518-df83-4906-bfa6-a37bde2a5063", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "#!/bin/bash\n", + "\n", + "# Read space separated URL from STDIN and download \n", + "while read -r URL ; do\n", + " echo \"+++ $URL\"\n", + " # -m parallel\n", + " # -n no-clobber (do not re-download data)\n", + " # -r recursive (download all the data in the specified URL)\n", + " gsutil -m cp -n -r \"${URL}/\" data/\n", + "done\n" + ] + } + ], + "source": [ + "cat download.sh" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "cccec3e1-0dcd-4e3b-a059-a884f5219b66", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+++ gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1\n", + "Copying 
gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_ANG.txt...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B1.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B11.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B10.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B2.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B8.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B9.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_BQA.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B3.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B4.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B6.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B5.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_MTL.txt...\n", + "Copying 
gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20160521_20170223_01_T1/LC08_L1TP_025033_20160521_20170223_01_T1_B7.TIF...\n", + "- [14/14 files][ 1021 MiB/ 1021 MiB] 100% Done \n", + "Operation completed over 14 objects/1021.8 MiB. \n", + "+++ gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_ANG.txt...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B10.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B1.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B2.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B11.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B8.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B4.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_MTL.txt...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B9.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B3.TIF...\n", + "Copying 
gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B7.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B6.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_BQA.TIF...\n", + "Copying gs://gcp-public-data-landsat/LC08/01/025/033/LC08_L1TP_025033_20171218_20171224_01_T1/LC08_L1TP_025033_20171218_20171224_01_T1_B5.TIF...\n", + "- [14/14 files][ 1.0 GiB/ 1.0 GiB] 100% Done \n", + "Operation completed over 14 objects/1.0 GiB. \n" + ] + } + ], + "source": [ + "head --lines=100000 data/index.csv | python3 search.py | bash download.sh" + ] + } ], "metadata": { diff --git a/content/GCP/clean.sh b/content/GCP/clean.sh deleted file mode 100755 index 454a326..0000000 --- a/content/GCP/clean.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -echo "=== clean.sh" - -## Cleanup for 06_running_analysis -sudo apt remove git -rm -rf ./CLASS-Examples diff --git a/content/GCP/intro_to_GCP_Essentials.ipynb b/content/GCP/intro_to_GCP_Essentials.ipynb index 4e9c5ab..2bd2d17 100644 --- a/content/GCP/intro_to_GCP_Essentials.ipynb +++ b/content/GCP/intro_to_GCP_Essentials.ipynb @@ -11,11 +11,12 @@ "\n", "1. [Introduction to the GCP Cloud Console](./01_intro_to_cloud_console)\n", "2. [Introduction to Cloud Compute](./02_intro_to_compute)\n", - "3. [Introduction to the Cloud CLI](./04_intro_to_cli)\n", - "4. [Introduction to Cloud Storage](./05_intro_to_cloud_storage)\n", - "5. [Running Analysis on the Cloud](./06_running_analysis)\n", - "6. [Monitoring Costs](./07_monitoring_costs)\n", - "7. [Cleaning up Resources and Best Practices](./08_cleaning_up_resources)\n" + "3. [Introduction to Cloud Storage](./03_intro_to_cloud_storage)\n", + "4. [Introduction to the Cloud CLI](./04_intro_to_cli)\n", + "5. 
[Using the Cloud Storage CLI](./05_cli_storage)\n", + "6. [Running Analysis on the Cloud](./06_running_analysis)\n", + "7. [Monitoring Costs](./07_monitoring_costs)\n", + "8. [Cleaning up Resources and Best Practices](./08_cleaning_up_resources)\n" ] } ], diff --git a/content/_toc.yml b/content/_toc.yml index e771863..38a8fcb 100644 --- a/content/_toc.yml +++ b/content/_toc.yml @@ -37,13 +37,14 @@ parts: sections: - file: GCP/01_intro_to_cloud_console - file: GCP/02_intro_to_compute + - file: GCP/03_intro_to_cloud_storage - file: GCP/04_intro_to_cli - - file: GCP/05_intro_to_cloud_storage + - file: GCP/05_cli_storage - file: GCP/06_running_analysis - file: GCP/07_monitoring_costs - file: GCP/08_cleaning_up_resources - file: GCP/glossary - + - caption: Extra Learning Materials chapters: - file: ELM/01_bash_shell diff --git a/scripts/aws-create.sh b/scripts/aws-create.sh new file mode 100755 index 0000000..43e7941 --- /dev/null +++ b/scripts/aws-create.sh @@ -0,0 +1,75 @@ +#!/bin/bash + +# Options +BRANCH="${1:-aws-dev}" # checkout branch $1 + +# Static Config - update aws-*.sh files +NAME=learner +VM=essentials +PROJECT=CLASS-Essentials +GITHUB=github.internet2.edu +REPO="git@${GITHUB}:CLASS/${PROJECT}.git" + +echo "=== aws-create.sh $PROJECT $BRANCH" + +VPC=$(aws ec2 describe-vpcs --filter "Name=tag:Name,Values=${VM}" --query "Vpcs[].VpcId" --output text) +SUBNET=$(aws ec2 describe-subnets --filter "Name=tag:Name,Values=${VM}" --query "Subnets[].SubnetId" --output text) +SG=$(aws ec2 describe-security-groups --filters "Name=group-name,Values=${VM}" --query "SecurityGroups[].GroupId" --output text) + +echo "+++ networking: $VM $VPC $SUBNET $SG" +if [ -z "${VPC}" -o -z "${SUBNET}" -o -z "${SG}" ] ; then + echo "--- '${VM}' networking does not exist. 
Use 'aws-vpc-create.sh' to create" + exit 1 +fi + +IP=$(aws ec2 describe-instances --filters 'Name=instance-state-name,Values=running' 'Name=tag:Name,Values=essentials' --query "Reservations[*].Instances[*].PublicIpAddress" --output text --no-cli-pager) +if [ -z "${IP}" ] ; then + echo "+++ creating VM" + aws ec2 run-instances \ + --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$VM}]" \ + --image-id resolve:ssm:/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 \ + --instance-type m6i.large \ + --subnet-id $SUBNET \ + --security-group-ids $SG \ + --key-name $NAME \ + --no-cli-pager +fi + +while [ -z ${IP:=$(aws ec2 describe-instances --filters 'Name=instance-state-name,Values=running' "Name=tag:Name,Values=${VM}" --query 'Reservations[*].Instances[*].PublicIpAddress' --output text --no-cli-pager)} ] ; do + echo "+++ waiting for IP" + sleep 1 +done + +echo "+++ wait for boot and cloud-init ${VM} ${IP}" +ssh-keygen -R $IP +while ! ssh ec2-user@$IP sudo cloud-init status --wait ; do + sleep 1 +done + +echo "+++ configuring VM" + +ssh ec2-user@$IP -A < .ssh/known_hosts +git config --global color.ui auto +git config --global push.default simple +git config --global pull.ff only +git config --global user.name "$(git config user.name)" +git config --global user.email "$(git config user.name)" +git clone --branch $BRANCH $REPO +EOF + +echo "+++ configure ~/.ssh/$VM.config" +cat > ~/.ssh/$VM.config < .ssh/known_hosts +gcloud compute ssh --zone=$ZONE $NAME@$VM --ssh-flag='-A' < .ssh/known_hosts git config --global color.ui auto git config --global push.default simple git config --global pull.ff only +git config --global user.name "$(git config user.name)" +git config --global user.email "$(git config user.name)" git clone --branch $BRANCH $REPO -cd $PROJECT -git config user.name "$(git config user.name)" -git config user.email "$(git config user.name)" EOF echo "+++ configure local ssh" gcloud compute config-ssh echo "+++ starting Jypter" -gcloud 
compute ssh --zone=$ZONE $VM --ssh-flag='-t -L 8080:localhost:8080 -L 8081:localhost:8081' --command="cd $PROJECT ; ./scripts/jupyter-lab.sh" +gcloud compute ssh --zone=$ZONE $NAME@$VM --ssh-flag='-t -L 8080:localhost:8080 -L 8081:localhost:8081' --command="cd $PROJECT ; ./scripts/jupyter-lab.sh" diff --git a/scripts/gcp-stop.sh b/scripts/gcp-delete.sh similarity index 100% rename from scripts/gcp-stop.sh rename to scripts/gcp-delete.sh diff --git a/scripts/gcp-vm-clean.sh b/scripts/gcp-vm-clean.sh new file mode 100755 index 0000000..e4ec2bb --- /dev/null +++ b/scripts/gcp-vm-clean.sh @@ -0,0 +1,10 @@ +#!/bin/bash + +## Clean VM in prep for (re)running VM work + +echo "=== gcp-clean-vm.sh" + +## Cleanup for 06_running_analysis +sudo apt remove git +rm -rf ~/CLASS-Examples + diff --git a/scripts/gcp-vpc-create-default.sh b/scripts/gcp-vpc-create-default.sh new file mode 100755 index 0000000..b6a8485 --- /dev/null +++ b/scripts/gcp-vpc-create-default.sh @@ -0,0 +1,9 @@ +#!/bin/bash + +## Create a default VPC similar to the GCP default. +echo "=== gcp-vpc-create-default.sh" + +echo "+++ creating the default VCP network allowing internal traffic and external ssh and ICMP access" +gcloud compute networks create default +gcloud compute firewall-rules create default-allow-internal --network default --allow all --source-ranges=10.128.0.0/9 +gcloud compute firewall-rules create default-allow-external --network default --allow tcp:22,icmp diff --git a/scripts/jupyter-lab.sh b/scripts/jupyter-lab.sh index 36ea3a0..5f131d6 100755 --- a/scripts/jupyter-lab.sh +++ b/scripts/jupyter-lab.sh @@ -4,7 +4,7 @@ if [ -r ./local.sh ] ; then . ./local.sh fi -echo "=== Install and run JupyterLab locally" +echo "=== jupyterlab.sh Install and run JupyterLab locally" echo "+++ installing jupyter" python3 -m venv .venv @@ -18,4 +18,5 @@ python3 -m bash_kernel.install python3 -m pip install --upgrade jupyterlab-spellchecker +echo "+++ run jupyter" jupyter-lab --port=8081