If you have any experience of supporting large scale infrastructures, whether they are based on ‘old school’ tin and wires, virtual machines or cloud based technologies you will know that it is important to be able to create consistently repeatable platform builds. This includes ensuring that the network infrastructure, ‘server hardware’, operating systems and applications are installed and configured the same way each time.
Historically this would have been achieved through the use of the same hardware, scripted operating system installs and, in the Windows application world of my past, application packagers and installers such as Microsoft Systems Management Server.
With the advent of cloud computing the requirements for consistency are still present and just as relevant. However the methods and tools used to create cloud infrastructures are now much more akin to application code than the shell script / batch job methods of the past (although some of those skills are still needed). The skills needed to support this are really a mix of both development and sys-ops and have led to the creation of Dev-Ops as a role in its own right.
Recently along with one of my colleagues I was asked to carry out some work to create a new AWS based environment for one of our customers. The requirements for the environment were that it needed to be:
- Consistent
- Repeatable and quick to provision
- Scalable (the same base architecture needed to be used for development, test and production just with differing numbers of server instances)
- Running Centos 6.3
- Running Fuse ESB and MySQL
To create the environment we decided to use a combination of AWS CloudFormation to provision the infrastructure and Opscode Chef to carry out the installation of application software. I focussed primarily on the CloudFormation templates while my colleague pulled together the required Chef recipes.
Fortunately we had recently had a CloudFormation training day delivered by our AWS Partner Solutions Architect so I wasn’t entering the creation of the scripts cold, as at first the JSON syntax and number of things you can do with CloudFormation can be a little daunting.
To help with script creation and understanding I would recommend the following:
For the environment we were creating the infrastructure requirements were:
- VPC based
- 5 subnets
  - Public Web – To hold web server instances
  - Public Secure – To hold bastion instances for admin access
  - Public Access – To hold any NAT instances needed for private subnets
  - Private App – To hold application instances
  - Private Data – To hold database instances
- ELB
  - External – Web server balancing
  - Internal – Application server balancing
- Security
  - Port restrictions between all subnets (e.g. Public Secure can only see SSH on app servers)
To provision this I decided that rather than one large CloudFormation template I would split the environment into a number of smaller templates:
- VPC Template – This created the VPC, Subnets, NAT and Bastion instances
- Security Template – This created the Security Groups between the subnets
- Instance Templates – These created the required instance types and numbers in each subnet
This then allowed us to swap out different Instance Templates depending on the environment we were creating (e.g. development could have single instances in each subnet, whereas test could have ELB balanced pairs and production could use features such as auto-scaling).
I won’t go into the details of the VPC and Security Templates here, suffice it to say that with the multiple template approach the outputs from the creation of one stack were used as the inputs to the next.
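As a rough illustration of that chaining, the sketch below turns the outputs reported for one stack into the parameter list for the next. It assumes the outputs arrive as a list of `OutputKey` / `OutputValue` pairs and that parameters are supplied as `ParameterKey` / `ParameterValue` pairs; the output names, parameter names and IDs are hypothetical and not taken from our templates.

```python
def outputs_to_parameters(outputs, wanted):
    """Turn one stack's outputs into the parameter list for the next stack.

    outputs: list of {"OutputKey": ..., "OutputValue": ...} dicts.
    wanted:  mapping of output key -> parameter name in the next template.
    """
    by_key = {o["OutputKey"]: o["OutputValue"] for o in outputs}
    missing = sorted(key for key in wanted if key not in by_key)
    if missing:
        # Fail early rather than creating the next stack with gaps.
        raise KeyError("stack is missing outputs: " + ", ".join(missing))
    return [
        {"ParameterKey": param, "ParameterValue": by_key[out]}
        for out, param in wanted.items()
    ]


# Hypothetical outputs from the VPC stack (names and IDs are made up).
vpc_outputs = [
    {"OutputKey": "VpcId", "OutputValue": "vpc-11111111"},
    {"OutputKey": "PrivateAppSubnetId", "OutputValue": "subnet-22222222"},
]

# Hypothetical parameter names expected by the Security Template.
security_params = outputs_to_parameters(
    vpc_outputs,
    {"VpcId": "ParVpcId", "PrivateAppSubnetId": "ParAppSubnetId"},
)
```

Failing when an expected output is missing keeps a typo in one template from surfacing only halfway through the creation of the next stack.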
For the Instance Templates the requirement was that the instances would be running Centos 6.3 and that we would use Chef to deploy the required application components onto them. When I started looking into how we would set the instances up to do this I found that the examples available for Centos and CloudFormation were extremely limited compared to Ubuntu or Windows. Given this, I would recommend working from a combination of the Opscode guide to installing Chef on Centos and AWS’s documentation on Integrating AWS with Opscode Chef.
Along the way to producing the finished script there were a number of lessons, which I will share to help with your own installation. The first of these was the need to use a Centos.org AMI from the AWS Marketplace. After identifying the required AMI I tried running up a test template to see what would happen before signing up for it in the Marketplace. In CloudFormation this failed with an error of ‘AccessDenied. User doesn’t have permission to call ec2::RunInstances’, which was slightly misleading. Once I’d signed our account up for the AMI the error went away.
The next problem I encountered was really one of my own making / understanding. When looking at AMIs to use I made sure that we had picked one that was Cloud-Init enabled. In my simplistic view I thought that this meant that commands such as cfn-init, which are used within CloudFormation to carry out CloudFormation specific tasks, would already be present. This wasn’t the case, as the cfn- commands are part of a separate bootstrap installer that needs to be installed from the UserData section of the template (see below):
"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
  "#!/bin/bash -v\n",
  "function error_exit\n",
  "{\n",
  " cfn-signal -e 1 -r \"$1\" '", { "Ref" : "ResFuseClientWaitHandle" }, "'\n",
  " exit 1\n",
  "}\n",
  "# Install the CloudFormation tools and call init\n",
  "# Note do not remove this bit\n",
  "easy_install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz\n",
  "cfn-init --region ", { "Ref" : "AWS::Region" },
  " -s ", { "Ref" : "AWS::StackName" }, " -r ResInstanceFuse ",
  " --access-key ", { "Ref" : "ResAccessKey" },
  " --secret-key ", { "Fn::GetAtt" : [ "ResAccessKey", "SecretAccessKey" ]},
  " -c set1",
  " || error_exit 'Failed to run cfn-init'\n",
  "# End of CloudFormation Install and init\n",
  "# Make the Chef log folder\n",
  "mkdir /etc/chef/logs\n",
  "# Try starting the Chef client\n",
  "chef-client -j /etc/chef/roles.json --logfile /etc/chef/logs/chef.log > /tmp/initialize_chef_client.log 2>&1 || error_exit 'Failed to initialise chef client' \n",
  "# Signal success\n",
  "cfn-signal -e $? -r 'Fuse Server configuration' '", { "Ref" : "ResFuseClientWaitHandle" }, "'\n"
]]}}
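Because the script is assembled from fragments with Fn::Join, it can be hard to see what will actually run on the instance. The sketch below is a small local helper, not part of CloudFormation, that resolves only the intrinsics used in the UserData block so the script can be read before a stack is launched. The region and stack name values are made up for the example.

```python
def render(node, values):
    """Resolve Ref, Fn::GetAtt, Fn::Join and Fn::Base64 into plain text.

    values maps names to stand-in strings, e.g. "AWS::Region" or
    "ResAccessKey.SecretAccessKey" for a Fn::GetAtt lookup.
    """
    if isinstance(node, str):
        return node
    if isinstance(node, dict) and len(node) == 1:
        name, arg = next(iter(node.items()))
        if name == "Ref":
            return values[arg]
        if name == "Fn::GetAtt":
            return values[".".join(arg)]
        if name == "Fn::Join":
            separator, parts = arg
            return separator.join(render(part, values) for part in parts)
        if name == "Fn::Base64":
            # Left unencoded on purpose so the script stays readable.
            return render(arg, values)
    raise ValueError("unsupported template node: %r" % (node,))


# A cut-down version of the UserData block, for illustration only.
user_data = {"Fn::Base64": {"Fn::Join": ["", [
    "#!/bin/bash -v\n",
    "cfn-init --region ", {"Ref": "AWS::Region"},
    " -s ", {"Ref": "AWS::StackName"}, " -r ResInstanceFuse",
    " -c set1\n",
]]}}

# Stand-in values; a real stack supplies these itself.
script = render(user_data, {"AWS::Region": "eu-west-1",
                            "AWS::StackName": "dev-fuse"})
print(script, end="")
```

Reading the rendered script makes missing spaces or newlines between joined fragments much easier to spot than reading the JSON.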
cfn-signal, which comes as part of the bootstrap installer, is used to send messages to any wait condition handles defined in the template. If the cfn- tools are not present no signal is ever sent, and the stack simply waits until the wait condition times out, which can lead to long breaks at the coffee machine before any feedback is received.
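For reference, the handle that cfn-signal posts to is paired with a wait condition resource in the template. The sketch below builds that pair as a Python dictionary and prints it as JSON. ResFuseClientWaitHandle and ResInstanceFuse are the names used in the UserData block; the wait condition's name and the 1800 second timeout are assumptions chosen for the example.

```python
import json


def wait_resources(handle_name, condition_name, instance_name, timeout_seconds):
    """Build a wait handle and the wait condition that listens on it."""
    return {
        handle_name: {"Type": "AWS::CloudFormation::WaitConditionHandle"},
        condition_name: {
            "Type": "AWS::CloudFormation::WaitCondition",
            # Start the clock only once the instance has been created.
            "DependsOn": instance_name,
            "Properties": {
                "Handle": {"Ref": handle_name},
                # Timeout is given in seconds, as a string.
                "Timeout": str(timeout_seconds),
            },
        },
    }


resources = wait_resources(
    "ResFuseClientWaitHandle",
    "ResFuseClientWaitCondition",  # hypothetical resource name
    "ResInstanceFuse",
    1800,                          # hypothetical timeout
)
print(json.dumps(resources, indent=2))
```

The timeout has to cover everything that runs before the final cfn-signal, including the Chef run.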
The final lesson was how to deploy the Chef client and configuration to the instances. Chef is distributed as a RubyGems package, so Ruby, RubyGems and their supporting packages need to be present on the instance before it can be installed. Within CloudFormation, packages can be installed via the packages configuration section of AWS::CloudFormation::Init, which for Linux supports, among others, rpm, yum and rubygems installers. Unfortunately, for the AMI we chose, the available repositories didn’t contain all the packages necessary for our build. To get around this I had to install the rbel repository definitions via rpm before using a combination of yum and rubygems to install Chef:
"packages" : {
  "rpm" : {
    "rbel" : "http://rbel.frameos.org/rbel6"
  },
  "yum" : {
    "ruby" : [],
    "ruby-devel" : [],
    "ruby-ri" : [],
    "ruby-rdoc" : [],
    "gcc" : [],
    "gcc-c++" : [],
    "automake" : [],
    "autoconf" : [],
    "make" : [],
    "curl" : [],
    "dmidecode" : [],
    "rubygems" : []
  },
  "rubygems" : {
    "chef" : []
  }
}
Once Chef was installed the next job was to create the Chef configuration files and validation key on the instance. This was carried out using the “files” section of AWS::CloudFormation::Init:
"files" : {
  "/etc/chef/client.rb" : {
    "content" : { "Fn::Join" : ["", [
      "log_level :info", "\n", "log_location STDOUT", "\n",
      "chef_server_url '", { "Ref" : "ParChefServerUrl" }, "'", "\n",
      "validation_key \"/etc/chef/chef-validator.pem\"\n",
      "validation_client_name '", { "Ref" : "ParChefValidatorName" }, "'", "\n"
    ]]},
    "mode" : "000644",
    "owner" : "root",
    "group" : "root"
  },
  "/etc/chef/roles.json" : {
    "content" : {
      "run_list" : [ "role[esb]" ]
    },
    "mode" : "000644",
    "owner" : "root",
    "group" : "root"
  },
  "/etc/chef/chef-validator.pem" : {
    "source" : { "Fn::Join" : ["", [{ "Ref" : "ParChefKeyBucket" }, { "Ref" : "ParChefValidatorName" }, ".pem"]]},
    "mode" : "000644",
    "owner" : "root",
    "group" : "root",
    "authentication" : "S3Access"
  }
}
The hardest part of this was the validation key. As we had multiple instances wanting to use the same key we decided to place it in an S3 bucket and pull the key down from there. During the script creation I tried multiple ways of doing this, such as using s3cmd (which needed another repository and set of configuration to run), but found that using the files section worked best.
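The "authentication" : "S3Access" entry in the files section refers to a named credentials block that sits in the instance's Metadata alongside AWS::CloudFormation::Init. The sketch below shows roughly what such a block looks like, built as a Python dictionary and printed as JSON. ResAccessKey is the access key resource used in the UserData block; the bucket name parameter ParChefKeyBucketName is hypothetical, as our actual block isn't reproduced here.

```python
import json


def s3_authentication(auth_name, access_key_resource, bucket):
    """Build the Metadata entry that lets cfn-init fetch files from S3."""
    return {
        "AWS::CloudFormation::Authentication": {
            auth_name: {
                "type": "S3",
                "accessKeyId": {"Ref": access_key_resource},
                "secretKey": {
                    "Fn::GetAtt": [access_key_resource, "SecretAccessKey"]
                },
                # Buckets these credentials may be used against.
                "buckets": [bucket],
            }
        }
    }


metadata = s3_authentication(
    "S3Access",
    "ResAccessKey",
    {"Ref": "ParChefKeyBucketName"},  # hypothetical parameter
)
print(json.dumps(metadata, indent=2))
```

The name given to the block ("S3Access" here) is what the "authentication" key on each file has to match.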
Once Chef was installed the client was started from the UserData section (basically a shell script), which handed control of what additional software and configuration is installed on the instance to the Chef server. How much Chef does at this stage is a bit of a balancing act, as the wait condition within the template will fail the stack creation if its timeout period is exceeded.
As you can probably tell if you have got this far, the creation of the templates took quite a few iterations to get right as I learnt more about CloudFormation. When debugging, it is worth remembering to always set the stack to not roll back on failure. This allows you to access the instances that were created and find out where they got to within the install. As the UserData section is basically a shell script with some CloudFormation hooks, more often than not the faults are the same as you would see on a standard non-AWS Linux install. Also, for a Centos install, remember that the contents of /var/log are your friend, as both cloud-init and cfn-init create log files there for debugging purposes.
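If stacks are created from a script rather than the console, the no-rollback setting is just another field on the create request. The sketch below only assembles the request fields (StackName, TemplateBody, Parameters and DisableRollback, as named in the CloudFormation CreateStack API) and leaves the actual call to whichever SDK or tool you use; the stack name, template body and parameter values are placeholders.

```python
def debug_stack_request(stack_name, template_body, parameters):
    """Assemble CreateStack fields with rollback switched off for debugging."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": parameters,
        # Keep failed resources around so the instance logs can be inspected.
        "DisableRollback": True,
    }


request = debug_stack_request(
    "dev-fuse",                      # placeholder stack name
    '{"Resources": {}}',             # placeholder template body
    [{"ParameterKey": "ParChefServerUrl",
      "ParameterValue": "https://chef.example.com"}],  # placeholder value
)
```

Remember to delete failed stacks by hand afterwards, as the instances left behind keep running.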
After watching Werner Vogels’ keynote speech from AWS re:Invent it’s clear that treating infrastructure as a programmable resource (i.e. using technologies such as CloudFormation and Chef) is where organisations need to be heading. Based on my experience so far I will be recommending this approach for all future AWS environments we get involved with, even the small ones.
Whilst there is a bit of a learning curve, the benefits of repeatable builds, known configuration and the ability to source control infrastructure far outweigh any shortcomings, such as the lack of granular template validation, which I’m sure will come with time.
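Until finer grained validation arrives, some simple checks can be run locally before a stack is ever launched. The sketch below is one such check, written for illustration rather than taken from our build: it walks a parsed template and reports any Ref that names neither a parameter, a resource nor one of the AWS pseudo parameters. The sample template and its deliberate typo are made up.

```python
PSEUDO_PARAMETERS = {
    "AWS::AccountId", "AWS::NotificationARNs", "AWS::NoValue",
    "AWS::Partition", "AWS::Region", "AWS::StackId",
    "AWS::StackName", "AWS::URLSuffix",
}


def iter_refs(node):
    """Yield the target of every {"Ref": "..."} found anywhere in node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def undefined_refs(template):
    """Return the sorted Ref targets that the template never defines."""
    defined = (set(template.get("Parameters", {}))
               | set(template.get("Resources", {}))
               | PSEUDO_PARAMETERS)
    return sorted({ref for ref in iter_refs(template) if ref not in defined})


# Made-up template with a deliberate typo: ParChefServerURL vs ParChefServerUrl.
template = {
    "Parameters": {"ParChefServerUrl": {"Type": "String"}},
    "Resources": {
        "ResFuseClientWaitHandle": {
            "Type": "AWS::CloudFormation::WaitConditionHandle"
        },
        "ResInstanceFuse": {
            "Type": "AWS::EC2::Instance",
            "Properties": {"UserData": {"Fn::Join": ["", [
                "cfn-init --region ", {"Ref": "AWS::Region"},
                " ", {"Ref": "ParChefServerURL"},
                " ", {"Ref": "ResFuseClientWaitHandle"},
            ]]}},
        },
    },
}
problems = undefined_refs(template)
print(problems)
```

This only covers Ref; the same walk could be extended to check the resource named in each Fn::GetAtt.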
If you have any comments or want to know more please get in touch.