Infrastructure as Code and code quality

In January, Inawisdom was accredited with the DevOps competency. This was a massive acknowledgement of our DevOps ability and of how we uniquely apply it to the world of Machine Learning. The challenge with DevOps and the developer experience on AWS is that it is one of the fastest-changing areas, with lots of announcements every week. One area of DevOps that is moving particularly fast is Infrastructure as Code (IaC), and this blog focuses on the AWS Cloud Development Kit (CDK), Amazon’s next iteration of IaC. The AWS CDK has been around for the last 2 years and I have been meaning to give it a go.

Background

I remember when I first fell in love with CloudFormation: it was amazing how quickly and repeatably I could spin up infrastructure and solutions from a script. However, as time progressed, I wanted to standardise some components into reusable templates to provide consistency and to accelerate development. This is possible in CloudFormation using nested stacks or import/export, but it is overly complex, fiddly and seems limited. Being from a development background, what I really wanted was for CloudFormation to be object-oriented and to let me use other programming constructs. Then, last December, I went to re:Invent 2019 and saw a few sessions on the CDK. I was impressed, so I moved it higher up my list of things to look at and play with. However, I did not really have the time or need to use it during my 9-5 working hours. Therefore, I decided to take a look in my own time and work on a community project that I had in mind.

Project

The community project I had in mind was to build a complete deployment of SonarQube. SonarQube is a static code analysis tool that I have used a lot in my past roles as a Java Developer and Development Manager. I wanted to see if SonarQube would be any good for Python, how I could integrate it into our DevOps processes and how I could run it. In addition, I did not want to stand up any EC2 instances (I am a serverless kind of guy), so I decided to use ECS Fargate and run SonarQube in Docker.

Docker Image

The first step was to take the public sonarqube image and run it on my MacBook, create a project and then run the client over my Python code. To start a SonarQube container locally, run:

docker run -d --name sonarqube -p 9000:9000 sonarqube:8.2-community
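
SonarQube takes a little while to start inside the container, so the web page may not respond straight away. As a rough sketch, the readiness check can be scripted against SonarQube's /api/system/status endpoint using only the Python standard library. The helper names, the retry count and the delay are assumptions made for illustration.

```python
import json
import time
import urllib.request


def is_up(status_body: str) -> bool:
    """Return True when the status JSON reports that SonarQube is ready."""
    return json.loads(status_body).get("status") == "UP"


def wait_for_sonarqube(base_url: str = "http://localhost:9000",
                       attempts: int = 60, delay: float = 5.0) -> bool:
    """Poll the status endpoint until SonarQube is up or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(base_url + "/api/system/status",
                                        timeout=5) as resp:
                if is_up(resp.read().decode("utf-8")):
                    return True
        except (OSError, ValueError):
            pass  # Not listening yet, or the response was not valid JSON
        time.sleep(delay)
    return False
```

Calling wait_for_sonarqube() after the docker run command blocks until the server reports that it is up, or returns False once the attempts are exhausted.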

Using your browser of choice, connect to http://localhost:9000/. The admin credentials are listed on the image's Docker web page; please change the password when you first log in. Once logged in, create a project. I called mine ‘aioboto3’ (another open source project I am working on); I suggest you name yours something meaningful to you.

Next, follow the inbuilt guide and run SonarQube over your code.

The results will show you how great your code is or the areas you need to improve.

CDK Project

The next step, following the initial validation of the Docker container, was to follow the AWS CDK Getting Started guide and set up a CDK project. I called mine ‘sonar’ and selected Python 3 as my language of choice. Please note, however, that the libraries for all the other languages are generated from the TypeScript source. You may wish to use Node.js to smooth the autocompletion in your IDE. I also decided to create the whole solution in a single project, including the VPC, ECS and RDS.

VPC

The VPC was the easiest thing to set up. To be honest, this was the fastest VPC I have ever created, compared with doing it in the console or with AWS CloudFormation. Here is the CDK code:

ec2.Vpc(stack, "SonarVpc", max_azs=3)

Once deployed, the following was created:

  • VPC with CIDR of 10.0.0.0/16
  • 2 /18 public subnets
  • 2 /18 private subnets
  • 2 public routing tables with shared IGW
  • 2 private routing tables with a NAT in each AZ
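
The /18 layout listed above follows from splitting the /16 evenly across four subnets (one public and one private in each of two Availability Zones). As a quick check, the arithmetic can be reproduced with Python's standard ipaddress module. This is only a sketch of the address maths and not a call into the CDK; which block the CDK assigns to which subnet is not shown here.

```python
import ipaddress

# The CIDR block of the VPC described above.
vpc_cidr = ipaddress.ip_network("10.0.0.0/16")

# Two public plus two private subnets means four equal slices of the /16;
# adding two prefix bits (16 -> 18) yields exactly four blocks.
subnets = list(vpc_cidr.subnets(prefixlen_diff=2))

for net in subnets:
    print(net, "-", net.num_addresses, "addresses")
```

Each /18 block holds 16,384 addresses, which is far more than a single SonarQube deployment needs but is what the defaults produce.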

ECS

The next step was to create the ECS cluster. I proceeded to set one up and it took me about 10 minutes. However, later, when starting the SonarQube Docker container, I found out that it required ulimits to be set on the host and this is not supported by Fargate. This meant that I had to resort to an EC2-backed cluster. The CDK code I used to create the ECS cluster was:

    # Create ECS cluster
    cluster = ecs.Cluster(stack, "SonarCluster",
                            capacity=ecs.AddCapacityOptions(
                                instance_type=ec2.InstanceType('m5.large')),
                            vpc=vpc)

    asg = cluster.autoscaling_group
    user_data = asg.user_data
    user_data.add_commands('sysctl -qw vm.max_map_count=262144')
    user_data.add_commands('sysctl -w fs.file-max=65536')
    user_data.add_commands('ulimit -n 65536')
    user_data.add_commands('ulimit -u 4096')

Once deployed, the following was created:

  • An ECS Cluster with an AutoScaling group across both private subnets
  • An EC2 ‘m5.large’ instance using the latest ECS-optimised AMI inside the AutoScaling group
  • The EC2 ‘m5.large’ instance was registered with the ECS cluster
  • The ulimits were applied to the EC2 ‘m5.large’ instance
  • A security group for the EC2 ‘m5.large’
  • An Instance Profile and IAM Role for the EC2 ‘m5.large’ instance

RDS

The next step was to save the results of the analysis that SonarQube produces beyond the lifetime of a single container. To do this, SonarQube supports a limited number of databases: two commercial ones and PostgreSQL. Therefore, I proceeded to create a PostgreSQL RDS cluster using the CDK code below:

    # Create DB cluster
    pgroup = rds.ClusterParameterGroup.from_parameter_group_name(
        stack, "SonarDBParamGroup",
        parameter_group_name='default.aurora-postgresql11'
    )

    selection = ec2.SubnetSelection(
        subnet_type=ec2.SubnetType.PRIVATE
    )

    return rds.DatabaseCluster(
        stack, 'Database',
        engine=rds.DatabaseClusterEngine.AURORA_POSTGRESQL,
        default_database_name='sonarqube',
        engine_version='11.6',
        parameter_group=pgroup,
        master_user=rds.Login(
            username=user,
            password=core.SecretValue.plain_text(pwd),
        ),
        instance_props=rds.InstanceProps(
            instance_type=ec2.InstanceType.of(
                ec2.InstanceClass.BURSTABLE3,
                ec2.InstanceSize.MEDIUM
            ),
            vpc_subnets=selection,
            vpc=vpc
        )
    )

Once deployed, the following was created:

  • PostgreSQL 11.6 parameter group
  • A DB subnet group of the two private subnets
  • A PostgreSQL 11.6 RDS cluster that included a write node in one AZ and a read replica in another AZ
  • A security group for the RDS instance with ports 5432 and 3306 allowed from the EC2 instances (it is a known CDK issue that PostgreSQL instances are created on port 3306; you can override this)

Task

The final stage was to deploy the Docker container into the ECS cluster and provide access to it through an Application Load Balancer. To help with this, the CDK has ECS Patterns, which are blueprints for the most common ECS service setups. For SonarQube I used the ApplicationLoadBalancedEc2Service pattern with the following considerations:

  • The version of SonarQube had to be ‘8.2-community’ – The latest open source edition
  • SonarQube can run on multiple nodes as a cluster, however clustering is problematic as each node needs to discover the other members within the cluster. Therefore, I decided to use a single Docker container and set number of tasks to one.
  • SonarQube is Java based and uses a high amount of memory, so I decided to allocate 2GB.
  • As mentioned earlier, SonarQube requires some specific ulimits as it runs an embedded Elasticsearch and there is no option to use an external Elasticsearch.

The CDK code I used to create the task was:

    url = 'jdbc:postgresql://{}/sonarqube'.format(db.cluster_endpoint.socket_address)

    task = ecs_patterns.ApplicationLoadBalancedEc2Service(self, "SonarService",
        cluster=cluster,            # Required
        cpu=512,                    # Default is 256
        desired_count=1,            # Default is 1
        task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
            image=ecs.ContainerImage.from_registry("sonarqube:8.2-community"),
            container_port=9000,
            environment={
                'sonar.jdbc.password': pwd,
                'sonar.jdbc.url': url,
                'sonar.jdbc.username': user
            },
        ),
        memory_limit_mib=2048,
        public_load_balancer=True)

    container = task.task_definition.default_container
    container.add_ulimits(
        ecs.Ulimit(
            name=ecs.UlimitName.NOFILE,
            soft_limit=65536,
            hard_limit=65536
        )
    )

Once deployed, the following was created:

  • An ECS service and service role
  • An ECS task definition and task role
  • A public Application Load Balancer in the public subnets and a Target Group for the ECS task
  • Application Load Balancer security group with inbound port 80 access from the internet
  • Updated the security group for the EC2 ‘m5.large’ instances to allow the ephemeral port range from the Application Load Balancer.
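
To make the url line in the task code more concrete: socket_address is the cluster endpoint as a host:port pair, which is only known once the database has been deployed. The sketch below shows the resulting string using a made-up endpoint. The function name and the hostname are hypothetical.

```python
def sonar_jdbc_url(socket_address: str, database: str = "sonarqube") -> str:
    """Build the PostgreSQL JDBC URL from a 'host:port' string."""
    return "jdbc:postgresql://{}/{}".format(socket_address, database)


# Hypothetical endpoint for illustration; the real value comes from the deployed cluster.
example = sonar_jdbc_url("sonar-db.cluster-example.eu-west-1.rds.amazonaws.com:5432")
print(example)
```

Because the port is part of the socket address, whichever port the cluster was created on ends up in the URL that SonarQube connects to.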

The problem with secrets

Initially, for handling the DB credentials, I went with best practice and used AWS Secrets Manager. The master username and password for the RDS instance were stored as a secret in AWS Secrets Manager. The secret's ARN was then passed to the running tasks using the secrets feature of ECS. However, upon investigation, it was only the ARN string that was passed and not the contents of the secret. So I then looked at the valueOf approach, but this is not supported by the CDK.

Lastly, in order to get this up and running quickly, I took the quick and dirty approach of setting the RDS master username and password manually and sending them in plain text as environment variables to the ECS task. I would not recommend you do this and, when I get some more time, I will update the solution to use a bash script to get the secret and set the Java parameters at runtime inside the container.

The good news, however, is that when I ran SonarQube over the CDK code it did highlight the plain-text credentials.
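
As a rough idea of what resolving the secret at runtime could involve, here is a sketch in Python rather than bash that turns the JSON content of a database secret into the three sonar.jdbc.* settings. The field names (username, password, host, port, dbname) are assumed to follow the layout Secrets Manager commonly uses for database credentials, and the helper name is hypothetical. Fetching the secret string itself is left out so that the sketch runs without AWS access.

```python
import json


def jdbc_settings_from_secret(secret_string: str) -> dict:
    """Map a JSON database secret onto the sonar.jdbc.* settings.

    Assumes the secret holds username, password, host, port and dbname keys.
    """
    secret = json.loads(secret_string)
    url = "jdbc:postgresql://{host}:{port}/{dbname}".format(**secret)
    return {
        "sonar.jdbc.username": secret["username"],
        "sonar.jdbc.password": secret["password"],
        "sonar.jdbc.url": url,
    }


# Made-up secret content, for illustration only.
sample = json.dumps({
    "username": "sonar",
    "password": "not-a-real-password",
    "host": "db.example.internal",
    "port": 5432,
    "dbname": "sonarqube",
})
settings = jdbc_settings_from_secret(sample)
print(settings["sonar.jdbc.url"])
```

With something along these lines running inside the container at start-up, the credentials would no longer need to appear in the CDK code or in the task definition.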

The Repository

I have created a project on GitHub with my code in it, so if you would like to take a look at the final CDK code and deploy your own SonarQube, please fork and download the code. The repository is https://github.com/philbasford/sonar. Please make sure you change the master database username and password.

Conclusion

The CDK pushes Infrastructure as Code to the next level as it accomplishes a few things:

  • You are able to code infrastructure using programming constructs such as if and loops. This allows you to make your infrastructure more adaptable.
  • You can use functions and classes to put better structure into your IaC, with more reuse and polymorphism.
  • The AWS CDK has lots of best practice built in with sensible defaults.
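
As a small illustration of the first two points, here is a sketch in plain Python in which a dataclass stands in for the values that would be handed to CDK constructs, so it runs without the CDK installed. The environment names, the instance types chosen per environment and the cluster_settings helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterSettings:
    """Stand-in for the values that would be passed to CDK constructs."""
    name: str
    instance_type: str


def cluster_settings(env: str) -> ClusterSettings:
    # A conditional picks a larger instance for production only.
    if env == "prod":
        return ClusterSettings("sonar-prod", "m5.large")
    return ClusterSettings("sonar-" + env, "t3.medium")


# A loop stamps out one definition per environment from the same function.
stacks = [cluster_settings(env) for env in ("dev", "test", "prod")]
for settings in stacks:
    print(settings)
```

Doing the same thing in a CloudFormation template would mean conditions, mappings or duplicated resources, which is where ordinary functions and loops start to pay off.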

I did, however, find a fair number of gotchas along the way and, like any SDK, it takes a little while to learn to navigate it. However, on balance it did allow me to get a solution up and running incredibly fast. So I would recommend that anyone give it a go, as I managed to do my project with 103 lines of CDK code compared with 1700 lines of AWS CloudFormation JSON.

Phil Basford
phil@inawisdom.com