Commit 99368f9, committed by Tom Scavo on Oct 30, 2016 (merge of parents 6198803 and 907deba). README.md changed: 123 additions, 40 deletions.

XSLT transformations of SAML metadata

## Installation

The scripts in this repository depend on a [Bash Library](https://github.internet2.edu/InCommon/bash-library) of basic scripts. Download and install the latter before continuing.

Download the source, change directory to the source directory, and install the executables and library files as follows:

```Shell
$ export BIN_DIR=$HOME/bin
$ export LIB_DIR=$HOME/lib
$ ./install.sh $BIN_DIR $LIB_DIR
```

An installation directory will be created if one doesn't already exist. In any case, the following files will be installed:

```Shell
$ ls -1 $BIN_DIR
http_xsltproc.sh

$ ls -1 $LIB_DIR
list_all_IdP_DisplayNames_csv.xsl
list_all_IdPs_csv.xsl
list_all_RandS_IdPs_csv.xsl
list_all_RandS_SPs_csv.xsl
list_all_SPs_csv.xsl
```

## Overview

Bash script ``http_xsltproc.sh`` is a wrapper around the ``xsltproc`` command-line tool. Unlike ``xsltproc``, the ``http_xsltproc.sh`` script fetches the target XML document from an HTTP server using HTTP Conditional GET [RFC 7232]. If the server responds with 200 OK, the script caches the resource and returns the response body; if the server responds with 304 Not Modified, the script returns the cached resource instead. See the inline help for details:

```Shell
$ $BIN_DIR/http_xsltproc.sh -h
```
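Conceptually, the conditional GET logic described above resembles the following sketch using ``curl``. This is an illustrative, hypothetical helper, not the actual internals of ``http_xsltproc.sh``; the cache file layout and the ``http_conditional_get`` name are assumptions.

```Shell
# Hypothetical sketch of HTTP Conditional GET caching (not the script's internals).
# Fetch a URL, revalidating against a cached ETag; print the (possibly cached) body.
http_conditional_get() {
  local url=$1 cache_file=$2
  local etag_file="$cache_file.etag" status etag

  mkdir -p "$(dirname "$cache_file")"
  if [ -s "$etag_file" ]; then
    etag=$(cat "$etag_file")
    status=$(curl -s -D "$cache_file.hdr" -o "$cache_file.tmp" \
      -w '%{http_code}' -H "If-None-Match: $etag" "$url")
  else
    status=$(curl -s -D "$cache_file.hdr" -o "$cache_file.tmp" \
      -w '%{http_code}' "$url")
  fi

  case "$status" in
    200)  # fresh resource: cache the body and its ETag
          mv "$cache_file.tmp" "$cache_file"
          sed -n 's/^[Ee][Tt]ag: *//p' "$cache_file.hdr" | tr -d '\r' > "$etag_file"
          ;;
    304)  # not modified: fall back to the cached copy
          rm -f "$cache_file.tmp"
          ;;
    *)    rm -f "$cache_file.tmp" "$cache_file.hdr"; return 2 ;;
  esac
  rm -f "$cache_file.hdr"
  cat "$cache_file"
}
```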

The ``http_xsltproc.sh`` script requires two environment variables: ``CACHE_DIR`` is the absolute path to the cache directory (which need not exist yet), and ``LIB_DIR`` is the directory containing the XSLT library files installed in the previous section.

For example, let's use the library installed in the previous section and specify the cache as follows:

```Shell
$ export CACHE_DIR=/tmp/cache
```

The following examples show how to use the script to create some cron jobs on incommon.org.
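An hourly cron job driving such a process might look like the following crontab entry. The script name ``update_resources.sh`` and the paths are hypothetical; the actual jobs on incommon.org may be configured differently.

```Shell
# Hypothetical crontab entry: run the transform at the top of every hour.
# update_resources.sh would contain a pipeline like the examples in this README.
0 * * * * BIN_DIR=$HOME/bin LIB_DIR=$HOME/lib CACHE_DIR=/tmp/cache $HOME/bin/update_resources.sh
```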

### Example #1

The goal is to transform InCommon metadata into the following CSV file:

* https://incommon.org/federation/metadata/all_IdP_DisplayNames.csv

The above resource is used to construct a [List of IdP Display Names](https://spaces.internet2.edu/x/2IDmBQ) in the spaces wiki.

Suppose there is an automated process that transforms the main InCommon metadata aggregate into the CSV file at the above URL. Specifically, let's suppose the following process runs every hour on incommon.org:

```Shell
# determine the metadata location
xml_location=http://md.incommon.org/InCommon/InCommon-metadata.xml

# create the resource
xsl_file=$LIB_DIR/list_all_IdP_DisplayNames_csv.xsl
resource_file=/tmp/all_IdP_DisplayNames.csv
$BIN_DIR/http_xsltproc.sh -F -o "$resource_file" "$xsl_file" "$xml_location"
exit_code=$?
[ $exit_code -eq 1 ] && exit 0 # short-circuit if 304 response
if [ $exit_code -gt 1 ]; then
echo "ERROR: http_xsltproc.sh failed with status code: $exit_code" >&2
exit $exit_code
fi

# move the resource to the web directory
resource_dir=/home/htdocs/www.incommonfederation.org/federation/metadata/
mv $resource_file $resource_dir
exit 0
```

Observe that the command ``http_xsltproc.sh -F`` forces a fresh SAML metadata file. If the server responds with ``304 Not Modified``, the process terminates without updating the resource file.
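The exit-code convention used above (0 for a fresh resource, 1 for 304 Not Modified, greater than 1 for an error) could be factored into a small helper. The ``handle_exit_code`` function below is a sketch under that assumption, not part of this repository:

```Shell
# Hypothetical helper encoding the exit-code convention of http_xsltproc.sh:
#   0 = fresh resource written, 1 = 304 Not Modified, >1 = error
handle_exit_code() {
  local exit_code=$1
  if [ "$exit_code" -eq 1 ]; then
    echo "not modified"        # caller should exit 0 without updating the resource
  elif [ "$exit_code" -gt 1 ]; then
    echo "error" >&2           # caller should propagate the failure
    return "$exit_code"
  else
    echo "fresh"               # caller should publish the new resource
  fi
}
```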

### Example #2

The goal is to transform InCommon metadata into the following pair of CSV files:

* https://incommon.org/federation/metadata/all_RandS_IdPs.csv
* https://incommon.org/federation/metadata/all_RandS_SPs.csv

The above resources are used to construct the [List of Research and Scholarship Entities](https://spaces.internet2.edu/x/ZoUABg) in the spaces wiki.

Suppose there is an automated process that transforms the main InCommon metadata aggregate into the CSV files at the above URLs. Specifically, let's suppose the following process runs every hour on incommon.org:

```Shell
# determine the metadata location
xml_location=http://md.incommon.org/InCommon/InCommon-metadata.xml

# create the first resource
xsl_file=$LIB_DIR/list_all_RandS_IdPs_csv.xsl
resource1_file=/tmp/all_RandS_IdPs.csv
$BIN_DIR/http_xsltproc.sh -F -o "$resource1_file" "$xsl_file" "$xml_location"
exit_code=$?
[ $exit_code -eq 1 ] && exit 0 # short-circuit if 304 response
if [ $exit_code -gt 1 ]; then
echo "ERROR: http_xsltproc.sh failed with status code: $exit_code" >&2
exit $exit_code
fi

# create the second resource
xsl_file=$LIB_DIR/list_all_RandS_SPs_csv.xsl
resource2_file=/tmp/all_RandS_SPs.csv
$BIN_DIR/http_xsltproc.sh -C -o "$resource2_file" "$xsl_file" "$xml_location"
exit_code=$?
[ $exit_code -eq 1 ] && exit 0 # short-circuit if not cached
if [ $exit_code -gt 1 ]; then
echo "ERROR: http_xsltproc.sh failed with status code: $exit_code" >&2
exit $exit_code
fi

# move the resources to the web directory
resource_dir=/home/htdocs/www.incommonfederation.org/federation/metadata/
mv $resource1_file $resource2_file $resource_dir
```

Observe the commands ``http_xsltproc.sh -F`` and ``http_xsltproc.sh -C``. The former forces a fresh SAML metadata file, as in the previous example; the latter goes directly to the cache. If the file is not in the cache (which is highly unlikely, since the preceding ``-F`` invocation just cached it), the process terminates without updating any resource files.
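A cache-only lookup like the one ``-C`` performs can be sketched as follows. The ``lookup_cache`` function is hypothetical and illustrative, not the script's actual implementation:

```Shell
# Hypothetical sketch of a cache-only (-C) lookup: print the cached
# resource if present, otherwise signal the caller to short-circuit.
lookup_cache() {
  local cache_file=$1
  if [ -f "$cache_file" ]; then
    cat "$cache_file"
    return 0
  fi
  return 1   # not cached; caller exits without updating any resources
}
```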

### Example #3

The goal is to transform InCommon metadata into the following pair of CSV files:

* https://incommon.org/federation/metadata/all_exported_IdPs.csv
* https://incommon.org/federation/metadata/all_exported_SPs.csv

The above resources are used to construct the [List of Exported Entities](https://spaces.internet2.edu/x/DYD4BQ) in the spaces wiki.

Suppose there is an automated process that transforms the InCommon export aggregate into the CSV files at the above URLs. Specifically, let's suppose the following process runs every hour on incommon.org:

```Shell
# determine the metadata location
xml_location=http://md.incommon.org/InCommon/InCommon-metadata-export.xml

# create the first resource
xsl_file=$LIB_DIR/list_all_IdPs_csv.xsl
resource1_file=/tmp/all_exported_IdPs.csv
$BIN_DIR/http_xsltproc.sh -F -o "$resource1_file" "$xsl_file" "$xml_location"
exit_code=$?
[ $exit_code -eq 1 ] && exit 0 # short-circuit if 304 response
if [ $exit_code -gt 1 ]; then
echo "ERROR: http_xsltproc.sh failed with status code: $exit_code" >&2
exit $exit_code
fi

# create the second resource
xsl_file=$LIB_DIR/list_all_SPs_csv.xsl
resource2_file=/tmp/all_exported_SPs.csv
$BIN_DIR/http_xsltproc.sh -C -o "$resource2_file" "$xsl_file" "$xml_location"
exit_code=$?
[ $exit_code -eq 1 ] && exit 0 # short-circuit if not cached
if [ $exit_code -gt 1 ]; then
echo "ERROR: http_xsltproc.sh failed with status code: $exit_code" >&2
exit $exit_code
fi

# move the resources to the web directory
resource_dir=/home/htdocs/www.incommonfederation.org/federation/metadata/
mv $resource1_file $resource2_file $resource_dir
```

The commands ``http_xsltproc.sh -F`` and ``http_xsltproc.sh -C`` behave exactly as described in the previous example.

## Compatibility

The executable scripts are compatible with GNU/Linux and Mac OS. The library files are written in XSLT 1.0.