Managing Cloud Storage from the Command Line#

Overview

Teaching: 20 min

Exercises: 2 min

Questions:

  • How do I store data in a Bucket?

Objectives:

  • Manage buckets using the command line.

  • Create a bucket with a sensible configuration.

  • Learn to manage data (add, list, retrieve, delete) in a bucket using the command line.

Now that Drew understands how to create buckets using the web console and how to use the Google command line tools, they are going to explore managing buckets from the command line.

Configuration#

We will use what we learned in the previous episode to set the configuration by entering the commands below. Remember, it is important to verify the Account and Project information every time you start interacting with the cloud.

PROJECT=$(gcloud config get-value project)
ACCOUNT=$(gcloud config get-value account)
BUCKET="essentials-$USER-$(date +%F)"
echo "Account: $ACCOUNT Project: $PROJECT Bucket: $BUCKET"
Account: 998995103712-compute@developer.gserviceaccount.com Project: essentials-learner Bucket: essentials-learner-2022-05-02
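Because every later command depends on these variables, it can help to stop early when one of them is empty. The following is a minimal sketch of such a check; require_set is a hypothetical helper name introduced here for illustration and is not part of gcloud or gsutil.

```shell
# Hypothetical helper (not part of gcloud/gsutil): report every named
# variable that is empty or unset, and return non-zero if any is missing.
require_set() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "ERROR: $name is empty or unset" >&2
      missing=1
    fi
  done
  return $missing
}

# Usage, assuming PROJECT, ACCOUNT and BUCKET were set as shown above:
#   require_set PROJECT ACCOUNT BUCKET && echo "configuration looks complete"
```

The function only defines the check; nothing is contacted in the cloud, so it is safe to paste into any shell.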

Tip

If you need to set up a new Cloud Shell, you can copy and paste the above commands into the new shell.

Create a Bucket#

We will first make a new bucket (mb) with the default values, explore it, and then immediately destroy it. This is a typical pattern when learning a new product: starting from the defaults reduces the chance of making a mistake (errors can be hard to understand). In most cases, however, resources should not be created with all default values, so we will re-create the bucket in the next section with more sensible values.

We first verify that $BUCKET is set correctly.

echo $BUCKET
essentials-learner-2022-05-02
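Beyond checking the value by eye, the name can be tested against the basic bucket naming rules before any bucket is created. The sketch below covers only a simplified subset of those rules (3 to 63 characters; lowercase letters, digits, dashes, underscores and dots; a letter or digit at both ends). valid_bucket_name is a hypothetical helper name, and names that pass this check can still be rejected for other reasons.

```shell
# Hypothetical helper: check a simplified subset of the bucket naming rules.
# LC_ALL=C keeps the a-z range limited to ASCII lowercase letters.
valid_bucket_name() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'
}

# Usage, assuming BUCKET was set as shown above:
#   valid_bucket_name "$BUCKET" || echo "unexpected bucket name: $BUCKET"
```

A check like this is useful here because the name is built from $USER, which may contain characters (for example uppercase letters) that are not allowed in bucket names.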

And now we create the bucket. Note that in most cases we must use the gs:// prefix to indicate that the argument is a bucket.

gsutil mb gs://$BUCKET
Creating gs://essentials-learner-2022-02-03/...

If the bucket creation fails with a “ServiceException: 409 A Cloud Storage bucket named … already exists.” it means that the name is not unique or you have already created the bucket. You will need to either delete the bucket or choose another name.
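One way to avoid the 409 error when you have already created the bucket yourself is to look for it first. The following is a rough sketch under that assumption; ensure_bucket is a hypothetical helper name. It does not help when the name is taken by someone else, because such a bucket is not visible to your account and the creation will still fail.

```shell
# Hypothetical helper: create gs://NAME only if it is not already visible
# to the current account.
ensure_bucket() {
  if gsutil ls -b "gs://$1" >/dev/null 2>&1; then
    echo "gs://$1 already exists and is visible to this account"
  else
    gsutil mb "gs://$1"
  fi
}

# Usage, assuming BUCKET was set as shown above:
#   ensure_bucket "$BUCKET"
```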

Show the Bucket#

We will now list all the buckets (resources) to verify that the bucket has been created by using the gsutil ls command.

gsutil ls
gs://essentials-learner-2022-02-03/

In this case we see that there is only one bucket and it is the one we just created. Your project may contain other buckets.

Just like the Linux ls command, the gsutil ls command has a lot of (similar) options. Let’s get detailed information about the bucket. Note that we use the -b option so that the command reports on the bucket itself rather than on its contents. Type gsutil ls --help for more information on the command and options.

gsutil ls -L -b gs://$BUCKET | head -18
gs://essentials-learner-2022-02-03/ :
	Storage class:			STANDARD
	Location type:			multi-region
	Location constraint:		US
	Versioning enabled:		None
	Logging configuration:		None
	Website configuration:		None
	CORS configuration: 		None
	Lifecycle configuration:	None
	Requester Pays enabled:		None
	Labels:				None
	Default KMS key:		None
	Time created:			Thu, 03 Feb 2022 22:19:56 GMT
	Time updated:			Thu, 03 Feb 2022 22:19:56 GMT
	Metageneration:			1
	Bucket Policy Only enabled:	False
	Public access prevention:	inherited
	RPO:				DEFAULT

You can see that the bucket is “multi-region” and uses the “standard” storage class (this is the type of storage). The standard storage class is best for frequently accessed data. Other storage classes are designed for less frequent access and are less expensive, but they come with more restrictions and more complex access costs, so they should only be used after careful consideration.

We piped the output through head -18 to limit the response to the first few lines (we ignore the ACL for now).
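When only one property is of interest, the listing can be filtered by label instead of being cut off by line count. This is a small sketch that assumes the "label: value" layout shown in the output above; bucket_field is a hypothetical helper name, not a gsutil feature.

```shell
# Hypothetical helper: read `gsutil ls -L -b` output on stdin and print the
# value of the field whose label is given as $1 (e.g. "Storage class").
bucket_field() {
  awk -F':' -v key="$1" '
    {
      label = $1
      gsub(/^[ \t]+|[ \t]+$/, "", label)           # trim the label
    }
    label == key {
      value = substr($0, index($0, ":") + 1)       # text after the first colon
      gsub(/^[ \t]+|[ \t]+$/, "", value)           # trim the value
      print value
    }'
}

# Usage, assuming BUCKET was set as shown above:
#   gsutil ls -L -b "gs://$BUCKET" | bucket_field "Storage class"
```

Taking everything after the first colon keeps values that contain colons themselves, such as the creation time, intact.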

You may also want to verify that you can see the newly created bucket in the web console dashboard or “Cloud Storage” page and explore the properties there.

Bucket Activity#

Next we will check the Activity log to ensure that the bucket was created. This command assumes that there was no other activity in the account; if there was, you may need to increase the --limit value to find the relevant entry.

Activity logs track important project-level activity (such as bucket creation and deletion) and cannot be deleted. They can be used to track resources, to debug when something goes wrong, and to support a security audit or investigation.

gcloud logging read --limit 1 | head
---
insertId: -oywycvf1xe8xy
logName: projects/essentials-learner/logs/cloudaudit.googleapis.com%2Factivity
protoPayload:
  '@type': type.googleapis.com/google.cloud.audit.AuditLog
  authenticationInfo:
    principalEmail: learner@class.internet2.edu
  authorizationInfo:
  - granted: true
    permission: storage.buckets.create

Note

You will probably need to enable the Logging API to continue. Press y to accept when prompted, as shown below:

API [logging.googleapis.com] not enabled on project [998995103712]. Would you 
like to enable and retry (this will take a few minutes)? (y/N)?
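If the project has other activity, raising --limit is one option; another is to pass a filter so that only bucket creation entries are returned. The sketch below assumes that filtering on the permission field seen in the sample entry above (storage.buckets.create) selects the entries of interest. BUCKET_CREATE_FILTER and bucket_creation_log are hypothetical names used only for this illustration.

```shell
# Assumed filter: match audit entries whose authorizationInfo lists the
# storage.buckets.create permission, as in the sample entry above.
BUCKET_CREATE_FILTER='protoPayload.authorizationInfo.permission="storage.buckets.create"'

# Hypothetical helper: show the most recent matching entries.
# $1 is the maximum number of entries to read (default 5).
bucket_creation_log() {
  gcloud logging read "$BUCKET_CREATE_FILTER" --limit "${1:-5}"
}

# Usage:
#   bucket_creation_log 1 | head
```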

Use the Bucket#

Once the bucket is created, we can store objects in it and work with them. It is possible to copy objects (files, strings, etc.) in and out of the bucket using the gsutil cp command. The bucket can take objects from the command line or from local files.

First create a simple file called one.txt with the contents “test one”.

echo "test one" > one.txt

Display the contents of the file by using the cat command.

cat one.txt
test one

Copy the file one.txt into the bucket as object “1”.

gsutil cp one.txt gs://$BUCKET/1
Copying file://one.txt [Content-Type=text/plain]...
/ [1 files][    9.0 B/    9.0 B]                                                
Operation completed over 1 objects/9.0 B.                                        
gsutil ls gs://$BUCKET
gs://essentials-learner-2022-02-03/1

Copy the object to standard output (this displays it on the screen).

gsutil cat gs://$BUCKET/1
test one

Remove the original file to make sure we are actually seeing the object in the bucket, not the local file.

rm -v one.txt
removed 'one.txt'
gsutil cat gs://$BUCKET/1
test one

We can also copy an object to a local file.

gsutil cp gs://$BUCKET/1 one.txt
Copying gs://essentials-learner-2022-02-03/1...
/ [1 files][    9.0 B/    9.0 B]                                                
Operation completed over 1 objects/9.0 B.                                        
cat one.txt
test one
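Comparing the contents by eye works for a one-line file; for larger files the comparison can be done mechanically. The following is a minimal sketch, with object_matches_file as a hypothetical helper name, that streams the object with gsutil cat and compares it byte for byte with a local file.

```shell
# Hypothetical helper: succeed only if the object's bytes equal the local
# file's bytes. $1 = object URL (e.g. gs://$BUCKET/1), $2 = local file.
# If gsutil cat fails it produces no data, so the comparison fails too
# (unless the local file is itself empty).
object_matches_file() {
  gsutil cat "$1" | cmp -s - "$2"
}

# Usage, assuming BUCKET was set as shown above:
#   object_matches_file "gs://$BUCKET/1" one.txt && echo "contents match"
```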

Remove the local copy.

rm -v one.txt
removed 'one.txt'

We can also copy the output of a command directly into an object in a bucket. Since the output of the date command changes every time it runs, we can be sure that we are not seeing old data in the object.

date | gsutil cp - gs://$BUCKET/2
Copying from <STDIN>...
/ [1 files][    0.0 B/    0.0 B]                                                
Operation completed over 1 objects.                                              

List all the objects in the bucket and verify the contents of object “2”.

gsutil ls gs://$BUCKET
gs://essentials-learner-2022-02-03/1
gs://essentials-learner-2022-02-03/2

Exercise

  • Display the object with the date in the bucket.

gsutil cat gs://$BUCKET/2
Thu 03 Feb 2022 10:20:26 PM UTC

Objects can also be removed from buckets. We will remove object “1” from the bucket.

gsutil rm gs://$BUCKET/1
Removing gs://essentials-learner-2022-02-03/1...
/ [1 objects]                                                                   
Operation completed over 1 objects.                                              
gsutil ls gs://$BUCKET
gs://essentials-learner-2022-02-03/2

Try to remove the bucket.

gsutil rb gs://$BUCKET
/bin/true # ignore expected error in Jupyter
Removing gs://essentials-learner-2022-02-03/...
NotEmptyException: 409 BucketNotEmpty (essentials-learner-2022-02-03)

Buckets must be empty before they can be deleted (just like subdirectories). (The /bin/true ignores the expected error in Jupyter so that Jupyter does not stop processing the entire notebook.)
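A common one-line alternative for tolerating an expected failure is to append || true to the command. The small sketch below uses the standard false command as a stand-in for a command that is expected to fail, so it can be tried without touching any bucket.

```shell
# `false` stands in here for a command that is expected to fail,
# such as removing a bucket that still contains objects.
false || true
status=$?
echo "exit status: $status"   # prints "exit status: 0"
```

Because the overall status is 0, a notebook or a script running with set -e carries on, while the error message of the failing command is still displayed.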

gsutil rm gs://$BUCKET/2
Removing gs://essentials-learner-2022-02-03/2...
/ [1 objects]                                                                   
Operation completed over 1 objects.                                              
gsutil ls
gs://essentials-learner-2022-02-03/

Remove the Bucket#

Since we are done exploring the bucket, we will remove the bucket (rb). Removing resources once we are done with them is a common pattern in cloud computing; otherwise they will just sit around incurring costs.

gsutil ls
gs://essentials-learner-2022-02-03/

We first verify that our environment variable is set. This is a useful pattern to catch simple bugs faster and to aid in debugging.

echo "Bucket: $BUCKET"
Bucket: essentials-learner-2022-02-03

Check to see if the bucket is empty.

gsutil ls gs://$BUCKET

Remove the bucket.

gsutil rb gs://$BUCKET
Removing gs://essentials-learner-2022-02-03/...

We verify that the bucket has been removed.

gsutil ls

In this case the empty response indicates that there are no buckets in the project.
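The removal steps in this section (check the variable, check that the bucket is empty, remove it) can be combined into one guard. This is a sketch rather than a definitive implementation; remove_bucket_if_empty is a hypothetical helper name, and it relies on gsutil ls printing nothing for an empty bucket, as seen above.

```shell
# Hypothetical helper: remove gs://NAME only when a name was given and the
# bucket contains no objects.
remove_bucket_if_empty() {
  if [ -z "$1" ]; then
    echo "ERROR: no bucket name given" >&2
    return 1
  fi
  # gsutil ls prints one line per object; no output means the bucket is empty
  contents=$(gsutil ls "gs://$1") || return 1
  if [ -n "$contents" ]; then
    echo "gs://$1 still contains objects; not removing" >&2
    return 1
  fi
  gsutil rb "gs://$1"
}

# Usage, assuming BUCKET was set as shown above:
#   remove_bucket_if_empty "$BUCKET"
```

Refusing to act on an empty name guards against the case where $BUCKET was never set in the current shell.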