# Ansible Deployment for InCommon COmanage Registry Training

This repository contains the necessary Ansible and other files for
deploying the InCommon COmanage Registry Training environment.

When run, the primary Ansible playbook creates:

* an AWS Virtual Private Cloud (VPC) with the name `comanage_training`.
All infrastructure is created within the VPC and can be deprovisioned by
deleting the VPC.

* an internet gateway (IG) to connect the VPC to the internet.

* public and private subnets within the VPC.

* NATs to allow virtual machines in the private subnets to open
connections to the internet (e.g. to execute `yum update`).

* appropriate security groups.

* SSH bastion hosts (one per public subnet).

* a host for a Shibboleth IdP. The IdP is deployed using the TAP image
and a Docker Swarm service stack (compose) file, and includes an LDAP server
pre-populated with user accounts for SAML authentication.

* N hosts for trainees. Each host is a single-node Docker Swarm
pre-populated with most details necessary for deploying COmanage Registry
using the TAP image.

* target groups and an application load balancer (ALB) that terminates
TLS and is configured to route web traffic to the IdP and the COmanage
Registry hosts.

* Route53 DNS configurations so that the IdP and the training nodes can
all be easily reached.

## Secrets

There are no unencrypted secrets in this repository. All secrets,
including SAML keys, are encrypted using the Ansible vault tooling.
Refer to the Ansible documentation for details on how to manage the
encrypted files and strings.

## Prerequisites

You will need to have an AWS access key and AWS secret access key provisioned
by an administrator for the internet2-training AWS account.

You will need to have the Ansible vault password used with this Ansible
deployment.

You will need to have the AWS-Trng-1.pem (or other approved key) used
for the initial login access to virtual machines.

You will need to use the AWS Console to access the Certificate Manager
and provision (or renew) an X.509 wildcard certificate for the domain
`*.comanage.incommon.training`.

## Set up Environment

To set up the environment for Ansible the first time:

```
git clone https://github.internet2.edu/skoranda/comanage-registry-training-ansible.git comanage-registry-training-deployment
cd comanage-registry-training-deployment
python3 -m venv .
source bin/activate
pip install --upgrade pip
pip install ansible==2.10.7
pip install boto
pip install boto3
ansible-galaxy collection install amazon.aws
ansible-galaxy collection install community.aws
ansible-galaxy collection install community.docker
cp /path/to/AWS-Trng-1.pem .
```

Some Ansible files are encrypted using `ansible-vault`. When running
a playbook, Ansible needs to be able to find the password for the
vault.

Create a file to hold the vault password, e.g.

```
touch ./.vault_pass.txt
chmod 600 ./.vault_pass.txt
```

Find the vault password and enter it into the file you just created.
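
Ansible reads the password from this file on every run, so it can help to confirm that the file is non-empty and still has mode 600 before going further. The following is a minimal sketch under that assumption; `vault_pass_file_ok` is a hypothetical helper name and is not part of this repository.

```shell
# Hypothetical helper: succeed only if the given vault password file
# exists, is non-empty, and has mode 600 (owner read/write only).
vault_pass_file_ok() {
    local file="$1" mode
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    # GNU stat takes -c, BSD/macOS stat takes -f; try one, then the other.
    mode=$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file")
    if [ "$mode" != "600" ]; then
        echo "mode is $mode, expected 600: $file" >&2
        return 1
    fi
    return 0
}
```

For example, `vault_pass_file_ok ./.vault_pass.txt && echo ok` prints `ok` only when both checks pass.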

## Initialization Before Running Playbooks

Run the following in each new shell session before running any Ansible
commands or playbooks:

```
cd comanage-registry-training-deployment
source bin/activate

export ANSIBLE_CONFIG=`pwd`/ansible.cfg
export ANSIBLE_INVENTORY=`pwd`/aws_ec2.yml
export ANSIBLE_SSH_ARGS="-F `pwd`/ssh_config -C -o ControlMaster=auto -o ControlPersist=3600s"
export ANSIBLE_VAULT_PASSWORD_FILE=`pwd`/.vault_pass.txt

export AWS_ACCESS_KEY_ID='XXXXXXXX'
export AWS_SECRET_ACCESS_KEY='XXXXXXXX'
export AWS_REGION=us-west-2

ssh-add ./AWS-Trng-1.pem
```
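
Because these variables must be set again in every new shell, a quick preflight check can catch a missing one before a playbook fails partway through. This is only a sketch; `check_ansible_env` is a hypothetical function name, and the list of variables is taken from the exports above.

```shell
# Hypothetical preflight check: report every variable from the
# initialization block that is unset or empty in the current shell.
check_ansible_env() {
    local name value missing=0
    for name in ANSIBLE_CONFIG ANSIBLE_INVENTORY ANSIBLE_SSH_ARGS \
                ANSIBLE_VAULT_PASSWORD_FILE AWS_ACCESS_KEY_ID \
                AWS_SECRET_ACCESS_KEY AWS_REGION; do
        # Look up the variable by name; the names are fixed above,
        # so eval is not handling untrusted input here.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_ansible_env && ansible-inventory --list` would then list the inventory only when every variable is present.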

## Configuration

Most of the configurable details, including the number of training nodes to
deploy, are set in the file

```
group_vars/all.yml
```

Review that file before running the playbook.

## Changing Training Password

The password used by trainees for SSH, for authenticating to the IdP,
and for configuring the COmanage LDAP Provisioner is also set in the file
`group_vars/all.yml`.

Once you have chosen the password, use the following command to
generate the encrypted version to paste into that file:

```
ansible-vault encrypt_string 'PASSWORD' --name comanage_training_password
```

## Provision the COmanage Training Infrastructure

To provision the infrastructure execute the playbook:

```
ansible-playbook comanage_registry_training.yml
```

**Note: After increasing the number of training nodes, you must restart
the IdP service by running the following on the IdP node:**

```
docker service update --force idp_shibboleth-idp
```

To reconfigure only the training nodes once they have already
been provisioned:

```
ansible-playbook training_nodes.yml --tags training_nodes
```

To get a list of inventory after provisioning (helpful to obtain mappings
to use with `--limit` to target specific nodes):

```
ansible-inventory --list
```

To reconfigure one specific node, for example:

```
ansible-playbook \
    training_nodes.yml \
    --tags training_nodes \
    --limit tag_public_fqdn_registry4_comanage_incommon_training
```
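
The `--limit` value in the example above appears to be the node's public FQDN with the prefix `tag_public_fqdn_` and every character that is not valid in an Ansible group name replaced by an underscore. Assuming that pattern holds for all nodes (confirm against the output of `ansible-inventory --list`), a small helper can build the group name; `limit_group_for_fqdn` is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: derive the inventory group name for a node from
# its public FQDN, assuming the naming pattern seen in the example above.
limit_group_for_fqdn() {
    # Replace every character that is not a letter, digit, or underscore.
    printf 'tag_public_fqdn_%s\n' \
        "$(printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_')"
}
```

The result can be passed straight to the playbook, for example `--limit "$(limit_group_for_fqdn registry4.comanage.incommon.training)"`.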

To force an update of the COmanage Registry services stack file:

```
ansible-playbook \
    training_nodes.yml \
    --tags update_stack_file \
    -e force_update_stack_file=yes
```

To force an update of the LDIF used by training node LDAP:

```
ansible-playbook \
    training_nodes.yml \
    --tags update_structure_ldif_file \
    -e force_update_structure_ldif_file=yes
```

## SSH Access

Trainers may use their provisioned SSH keys to access all nodes. Each trainer
has a dedicated account on each node.

Trainees may SSH using the account `training` and the provisioned password.

Begin by logging into the bastion node, e.g.

```
$ ssh training@ssh.comanage.incommon.training
training@ssh.comanage.incommon.training's password:
Last login: Thu Nov  7 15:12:40 2019 from some.host
[training@ssh ~]$
```

From there each trainee may SSH into their assigned host:

```
[training@ssh ~]$ ssh registry1-private
training@registry1-private's password:
Last login: Thu Nov  7 17:43:27 2019 from ip-192-168-10-10.us-west-2.compute.internal
[training@registry1-private ~]$
```

Only trainers may SSH into the IdP node:

```
skoranda@paprika:~$ ssh -A ssh.comanage.incommon.training
Last login: Thu Nov  7 15:01:48 2019 from some.host
[skoranda@ssh ~]$ ssh login-private
Last login: Thu Nov  7 17:43:56 2019 from ip-192-168-10-10.us-west-2.compute.internal
```

## Deploying the IdP

The Ansible tooling does not automatically start the IdP service stack.
To start the stack, log into the IdP node and execute:

```
docker stack deploy --compose-file /opt/shibboleth-idp-stack.yml idp
```

**Note: After increasing the number of training nodes, you must restart
the IdP service by running:**

```
docker service update --force idp_shibboleth-idp
```

Useful Docker Swarm commands for the IdP node are:

```
docker stack ls

docker service ls

docker service ps idp_shibboleth-idp

docker service ps idp_ldap

docker service logs -f idp_shibboleth-idp

docker service logs -f idp_ldap

docker stack rm idp
```

## Deploying COmanage Registry

Each trainee is expected to SSH to the bastion host and then to their
assigned node. In the home directory for the `training` user the trainee
will find the Docker Swarm services stack (compose) file for deploying
COmanage Registry, a MariaDB database, and an LDAP server.

Before deploying the service stack the trainee must first, as an exercise,
create some Docker Swarm secrets (see the training materials for details).
Most secrets have been pre-populated using Ansible to save time, but the
trainee is expected to create a few secrets.

Once successfully deployed, COmanage Registry is available at the URL

```
https://registry1.comanage.incommon.training
```

for node 1, and

```
https://registry2.comanage.incommon.training
```

for node 2, and so on.
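
When checking many nodes at once, the URL pattern above can be generated rather than typed by hand. The sketch below assumes the numbering starts at 1 and runs without gaps up to the number of training nodes configured in `group_vars/all.yml`; `registry_urls` is a hypothetical function name.

```shell
# Hypothetical helper: print the COmanage Registry URL for each of the
# first N training nodes, following the pattern described above.
registry_urls() {
    local count="$1" i=1
    while [ "$i" -le "$count" ]; do
        echo "https://registry${i}.comanage.incommon.training"
        i=$((i + 1))
    done
}
```

For example, `registry_urls 3` prints the URLs for nodes 1 through 3, one per line, which can then be fed to a tool such as `curl` to check each node in turn.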

## Fixing a bad bootstrap

Trainees who do not follow the instructions closely may bootstrap
COmanage Registry with a configuration that does not allow them
to authenticate as the platform administrator. When that happens,
run these commands on the training node (`sudo` runs each one as root):

1. `sudo docker stack rm comanage`
1. `sudo sh -c 'rm -rf /srv/docker/var/lib/mysql/*'`
1. `sudo sh -c 'rm -rf /srv/docker/srv/comanage-registry/local/Config/*'`

Then ask the trainee to correct the error in the stack file
and try again.

## Interference from existing SSH agent

If you find that your existing SSH agent is interfering with the SSH connections
used by Ansible, it might help to start with a fresh agent when you begin your
work for the day:

```
cd comanage-registry-training-deployment
rm ./ssh_mux_*
kill $SSH_AGENT_PID
unset SSH_AUTH_SOCK
eval `ssh-agent -s`
ssh-add ./AWS-Trng-1.pem
```