Paul Iway

Senior Developer

Wednesday, 26 September 2018 13:11

How we migrated our GKE cluster to another region

We decided, for a number of business reasons, to move one of our existing Kubernetes clusters to a new geographic region.

The thing we were most worried about was that we had persistent volumes attached to the MySQL instances for our test environments running in k8s.

There isn't a straightforward way to do this. One common approach is to take a snapshot of etcd, but we're on GKE so that's out of the question. Luckily, we found Ark.

Ark is a disaster recovery tool for Kubernetes clusters. It can back up the whole cluster and restore it with a single command, and it can even run on a schedule. Persistent volumes are also taken care of. It has good documentation, so setting it up would have been a breeze if not for a bug with RBAC in GKE.

Download

A simple git clone git@github.com:heptio/ark.git was all I did to download Ark. Its master branch is frequently updated and is not stable, so the maintainers recommend checking out the latest tagged version. At the time of writing, the latest release is v0.9.5.
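
Put together, the download boils down to a few commands (a minimal sketch, using the v0.9.5 tag mentioned above):

git clone git@github.com:heptio/ark.git
cd ark
git checkout v0.9.5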

Setting it up

Ark works by creating custom resources in k8s for its operations, conveniently defined in a single yaml file.
I had to kubectl apply that yaml file to both the American cluster and the shiny new European cluster we're moving into.
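
A rough sketch of what that looks like, assuming the prereqs file name from the Ark repo's examples directory for v0.9.x and placeholder context names:

# apply Ark's custom resource definitions, namespace, etc. to the old cluster...
kubectl config use-context <us-cluster-context>
kubectl apply -f examples/common/00-prereqs.yaml

# ...and to the new one
kubectl config use-context <eu-cluster-context>
kubectl apply -f examples/common/00-prereqs.yaml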

This is where the RBAC bug on GKE appears:
User "<your-google-account-email>" cannot create clusterrolebindings.rbac.authorization.k8s.io at the cluster scope: No policy matched.

To work around this I had to grant my Google account the cluster-admin role in both clusters:
kubectl create clusterrolebinding paul-cluster-admin-binding --clusterrole=cluster-admin --user=<your-google-account-email>

Ironically, it spits out the same error unless you're using an account with the Owner IAM role.

Apparently, this is a known issue on GKE:

Because of the way Container Engine checks permissions when you create a Role or ClusterRole, you must first create a RoleBinding that grants you all of the permissions included in the role you want to create. An example workaround is to create a RoleBinding that gives your Google identity a cluster-admin role before attempting to create additional Role or ClusterRole permissions. This is a known issue in the Beta release of Role-Based Access Control in Kubernetes and Container Engine version 1.6.

Cloud Storage Bucket

Apart from persistent volumes, which are backed up as disk snapshots, Ark stores its backups in a cloud storage bucket. This bucket should be exclusive to Ark because each backup is stored in its own subdirectory in the bucket's root. A service account is needed to authorize Ark to upload files into the bucket.

Service account

I created a service account just for Ark to use. It needs read and write access to the bucket. In GKE, persistent volumes are just disks attached to the nodes, so I had to give it permissions for those too. These are the permissions given to the service account (a sketch of the gcloud setup follows the list):

     compute.disks.get
     compute.disks.create
     compute.disks.createSnapshot
     compute.snapshots.get
     compute.snapshots.create
     compute.snapshots.useReadOnly
     compute.snapshots.delete
     compute.projects.get
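
Here's a rough sketch of that setup; the service account and role names are illustrative, and the exact commands may differ from the Ark GCP docs for your version:

PROJECT_ID=$(gcloud config get-value project)

# the service account Ark will use
gcloud iam service-accounts create heptio-ark \
    --display-name "Heptio Ark service account"
SERVICE_ACCOUNT_EMAIL="heptio-ark@${PROJECT_ID}.iam.gserviceaccount.com"

# a custom role carrying the compute permissions listed above
gcloud iam roles create heptio_ark_server --project $PROJECT_ID \
    --permissions compute.disks.get,compute.disks.create,compute.disks.createSnapshot,compute.snapshots.get,compute.snapshots.create,compute.snapshots.useReadOnly,compute.snapshots.delete,compute.projects.get

# bind the custom role to the service account
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/heptio_ark_server

# read/write access to the backup bucket
gsutil iam ch serviceAccount:${SERVICE_ACCOUNT_EMAIL}:objectAdmin gs://neso-cluster-backup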

The Ark server config

At this point the bucket has been created and Ark has been allowed to upload to it. Now Ark needs to know which bucket to use, which we set in the Ark Config (a custom resource defined by Ark):

# examples/gcp/00-ark-config.yaml
 ...
backupStorageProvider:
  name: gcp
  bucket: neso-cluster-backup
 ...
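
Since the whole point was carrying the persistent volumes over, it's worth noting that the same Config also declares a volume snapshot provider. In the v0.9.x GCP example it sits right next to the backup storage provider (a sketch; check the example file shipped with your version):

# examples/gcp/00-ark-config.yaml
 ...
persistentVolumeProvider:
  name: gcp
 ...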

The Ark server Deployment

To hand the service account off to Ark, a k8s secret named cloud-credentials containing the service account key has to be created.

# download service account key
gcloud iam service-accounts keys create ark-svc-account \
     --iam-account $SERVICE_ACCOUNT_EMAIL

# create secret
kubectl create secret generic cloud-credentials \
    --namespace heptio-ark \
    --from-file cloud=ark-svc-account

In the Ark Deployment yaml file, there wasn’t anything that needed to be changed. All that’s left to start the server is to kubectl apply the Config and the Deployment.
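
For completeness, that boils down to something like this (a sketch; the example file names are from the Ark repo and may differ between releases):

kubectl apply -f examples/gcp/00-ark-config.yaml
kubectl apply -f examples/gcp/10-deployment.yaml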

Generating a backup

After everything was set up on both clusters and the Ark client was installed locally, it was time to put Ark to the test. Making sure kubectl's context was set to the US cluster, we crossed our fingers and generated the backup:

$ ark backup create us-cluster --exclude-namespaces kube-system,kube-public,heptio-ark

We gave it a few minutes and then:

$ ark backup get
NAME                           STATUS      CREATED                         EXPIRES   SELECTOR
us-cluster                     Completed   2018-09-21 15:59:35 +0800 +08   30d       <none>

Restoring the backup

The backup includes all the resources, from pods to ingresses. We wanted to keep the IP addresses we used in the old cluster, so to free them up we took down the ingresses in the old cluster first.
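
That part is a one-liner per namespace, something along these lines (assuming the ingresses live in a dedicated application namespace):

kubectl delete ingress --all --namespace <app-namespace>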

Next, we set the kubectl context to the new cluster in Europe. It took a while for the cluster to see the backup, but it did appear eventually:

$ ark backup get
NAME                           STATUS      CREATED                         EXPIRES   SELECTOR
us-cluster                     Completed   2018-09-21 15:59:35 +0800 +08   30d       <none>

$ ark restore create --from-backup us-cluster

Ark was able to restore everything except for the persistent volumes: our applications could not connect to the databases. Taking a closer look, it appears that Ark created the disks, but in the region where the backups were created. The maintainers are aware of this issue and have slated the fix for v0.10.0.

We couldn't wait for that release, though, so we had no choice but to move the databases out of k8s. We ultimately decided to spin up a Cloud SQL instance and put our test environments' databases there.

Conclusion

Ark is an awesome tool. Although the migration did not go as smoothly as it should have, some good came out of it. It forced us to move our databases outside of Kubernetes, where they should not have been running in the first place. Also, we now have regular backups of our new cluster.

A continuous integration and delivery (CI/CD) pipeline is essential to a software project. Not only does it help us ship code faster, but it also leaves less room for mistakes. This tutorial will teach you how to set up a simple CI/CD pipeline for your GKE cluster with Codeship.

WHY CODESHIP

We no longer had the time to maintain our self-hosted instance of Jenkins. Codeship works well for us because it has native Docker support. It has been extremely reliable, and their support is always quick to respond. As a bonus, adding Slack notifications is easy. It is also great for personal projects because it offers 100 free builds per month without requiring a credit card.

THE BIG PICTURE

For this tutorial, we will be setting up a pipeline for a demo application that you can find at https://gitlab.com/neso-io/cat-api. It is a simple API that returns a random image of a cat.

At Neso, the build starts when a pull/merge request is accepted. On top of Codeship, we also make use of Gitlab CI so that we only merge the PR if the automated tests pass. We will be making something similar. A git push or PR merge to the master branch will trigger the chain of events in our pipeline.

Codeship will be responsible for running the automated tests, as well as building and pushing our Docker image to Google Container Registry. It then lets Kubernetes (k8s) know that there is a new image to be rolled out. At this point, we will let Kubernetes do its magic.

At Neso, a Kubernetes namespace serves as an environment. We'll be following that convention here. The master branch goes into our staging environment/namespace; git tags matching the Semantic Versioning Specification will go into our production environment/namespace.

[Diagram: environments as Kubernetes namespaces]

PROJECT SET UP

Similar to other managed CI/CD platforms, a bit of configuration is required. At the time of writing, Codeship only supports Github, Bitbucket, and Gitlab. When you add a new project, you'll need to pick which of the three hosts your code. I've selected Gitlab for this tutorial since that's where the code is hosted.

When you're done selecting your SCM provider, it's time to connect Codeship to your repository. You'll need elevated privileges in your repository before you can add it to Codeship.

[Screenshot: selecting the SCM provider]

After you add your project on Codeship, you'll need a few files to get started:

  1. codeship-services.yml - Defines the Docker images required for the pipeline

  2. codeship-steps.yml - Configuration for the commands that will be executed in the Docker images specified in the services configuration

  3. codeship.aes - Every project on Codeship Pro has an AES key found in the General Project Settings. Go ahead and download it to the project's root directory. You'll need this key to encrypt the credentials required to access resources in your project on Google Cloud Platform (GCP).

  4. secrets.env - A plain text file that contains the GCP service account credentials, the GCP project ID, and so on. Everything specified here will be accessible via environment variables. More on this in the next section.

  5. secrets.env.encrypted - The encrypted version of the secrets.env file, which I will show you how to generate later.

The codeship.aes and secrets.env files should never be committed, so go ahead and add them to your .gitignore file.

SECRETS

We'll be setting up Codeship to run the unit tests and build the Docker image. The built Docker image will need to be pushed somewhere that our Kubernetes cluster has access to. There are many registries available like Gitlab and DockerHub. Since we're already on GKE, we might as well use Google Container Registry (GCR).  

GCR isn't going to let just anyone push to the registry; pushes have to be authenticated. For that, we're going to need to create a service account on GCP. The service account will need the following roles:

  1. Storage Admin - required for read-write access to GCR
  2. Kubernetes Engine Admin - required for read-write access to GKE

Download the service account's JSON key and add it to the secrets.env file. For a more detailed guide to encrypting sensitive data, Codeship provides excellent documentation at https://documentation.codeship.com/pro/builds-and-configuration/environment-variables/#encrypted-environment-variables.

Before encryption, my secrets file looked like:

[Screenshot: secrets.env with the Google credentials, before encryption]

SECRETS ENCRYPTION

I mentioned earlier that every project on Codeship is provided with an AES key that will be used for encrypting anything sensitive that will be needed in our pipeline. The key can be found in the project's General Project Settings page.

[Screenshot: the project's AES key in the General Project Settings]

If you haven't already downloaded the project's AES key, go ahead and add it to the root of the project as codeship.aes.

[Screenshot: codeship.aes in the project root]

Before you can encrypt the secrets file, you'll need to install a CLI tool created by Codeship called Jet. The Jet CLI is a valuable tool: not only can it encrypt the secrets files, it can also help you test your Codeship configuration by running the build steps on your own machine.

Once you've installed Jet we'll be able to encrypt the secrets file with a single command:

$ jet encrypt secrets.env secrets.env.encrypted

There should now be a file named secrets.env.encrypted containing the encrypted contents of the secrets.env file. With all of that done, we can now start designing our pipeline.

STEP 1 - Test runner

Automated tests are a requirement in continuous integration. For our test runner, we'll use Neso's image with PHP and Composer built in. Before the build starts, Codeship pulls the code from the repo; we will need to mount this code into the image used for our test runner.

This image doesn't need much setup on our end, so we only need to run two commands for our test runner: installing the dependencies and running the tests themselves.

This should execute no matter what branch we push to.
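
In Codeship Pro terms that translates into a service plus a step. A minimal sketch, where the image name, the /app mount path, and PHPUnit as the test command are assumptions rather than the exact files we use:

# codeship-services.yml
app:
  image: nesoio/php-composer        # hypothetical: Neso's PHP + Composer image
  volumes:
    - ./:/app                       # mount the checked-out repo into the container

# codeship-steps.yml
- name: run-test-suite
  service: app
  command: bash -c "cd /app && composer install && vendor/bin/phpunit"
  # no "tag" filter, so this step runs on every branch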

STEP 2 - Docker Image builder

We won't have anything to run in our cluster without a Docker image, so the next step we'll define is how to build it. Thankfully, as I mentioned earlier, Codeship Pro has native Docker support. One important thing to take note of is the name of the image. If you're using Google Container Registry like us, you'll need to follow the naming convention for the image: [HOSTNAME]/[PROJECT-ID]/[IMAGE].

The step is pretty straightforward too. We only need to reference the name of the service we defined.

The Docker image should only be built when we push to the master branch as indicated by the tag field.
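
A sketch of how that can look, with PROJECT-ID as a placeholder for your GCP project:

# codeship-services.yml
cat-api:
  build:
    image: gcr.io/PROJECT-ID/cat-api   # follows the [HOSTNAME]/[PROJECT-ID]/[IMAGE] convention

# codeship-steps.yml
- name: build-image
  service: cat-api
  command: "true"                      # referencing the service is enough to force the build
  tag: master                          # only run on the master branch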

STEP 3 - Push the image to the registry

Before we push the Docker image, we have to decrypt the sensitive data needed for authentication. Fortunately, Codeship has a Docker image for that; it only needs to know the name of the encrypted file.

The pushed image will have two tags: the branch that triggered the build (in this case, master) and the first 8 characters of the git commit hash.

The image last tagged as master will serve as the release candidate in this pipeline. The tag for the commit hash will be used when we update the image to run in our k8s deployment.
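
A sketch of the push configuration, assuming Codeship's GCR credential helper image and leaving out the exact mechanism for truncating the commit hash to 8 characters:

# codeship-services.yml
gcr_dockercfg:
  image: codeship/gcr-dockercfg-generator   # Codeship's helper for GCR authentication
  add_docker: true
  encrypted_env_file: secrets.env.encrypted # the encrypted service account credentials

# codeship-steps.yml
- name: push-master
  service: cat-api
  type: push
  image_name: gcr.io/PROJECT-ID/cat-api
  image_tag: "{{ .Branch }}"                # the release candidate tag (master)
  registry: https://gcr.io
  dockercfg_service: gcr_dockercfg
  tag: master

- name: push-commit-hash
  service: cat-api
  type: push
  image_name: gcr.io/PROJECT-ID/cat-api
  image_tag: "{{ .CommitID }}"              # the post truncates this to the first 8 characters
  registry: https://gcr.io
  dockercfg_service: gcr_dockercfg
  tag: master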

STEP 4 - Deploying the image

The final step in our pipeline is rolling out the new Docker image. We need the Google Cloud SDK for this, and since it involves a couple of commands, we'll write a bash script.

The script first authenticates using the Google service account JSON we encrypted; that is the only part of the script that is Codeship specific. It then runs the same gcloud configuration commands you probably ran after installing the Cloud SDK on your machine, including fetching the cluster credentials so that kubectl commands work. A small conditional lets us specify which environment to roll the image out to, and if we change anything in the k8s manifests, a kubectl apply picks that up as part of the pipeline.

The last bit of code in the script sets the image to be rolled out to the environment, or k8s namespace. For the staging environment, we'll use the Docker image tagged with the current commit hash; only images tagged following the Semantic Versioning specification will be rolled out to the production environment.
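
A hedged reconstruction of what such a deploy script can look like; the variable names, the secrets.env key holding the service account JSON, the k8s/ manifest directory, and the deployment/container names are all assumptions:

#!/bin/bash
# deploy.sh <environment>
set -e

# authenticate with the Google service account JSON we encrypted (the Codeship-specific part);
# GOOGLE_AUTH_JSON is a hypothetical variable name from secrets.env
echo "$GOOGLE_AUTH_JSON" > /tmp/service-account.json
gcloud auth activate-service-account --key-file /tmp/service-account.json

# the usual gcloud setup you'd run on a workstation
gcloud config set project "$GOOGLE_PROJECT_ID"
gcloud config set compute/zone "$GOOGLE_COMPUTE_ZONE"
gcloud container clusters get-credentials "$CLUSTER_NAME"   # lets us run kubectl commands

# pick the environment (k8s namespace) and the image tag to roll out
ENVIRONMENT=$1
if [ "$ENVIRONMENT" = "production" ]; then
    IMAGE_TAG=$CI_BRANCH            # for tag builds this holds the semver git tag
else
    IMAGE_TAG=${CI_COMMIT_ID:0:8}   # first 8 characters of the commit hash
fi

# apply any manifest changes as part of the pipeline
kubectl apply -f k8s/ --namespace "$ENVIRONMENT"

# roll out the new image to the environment/namespace
kubectl set image deployment/cat-api cat-api=gcr.io/$GOOGLE_PROJECT_ID/cat-api:$IMAGE_TAG \
    --namespace "$ENVIRONMENT"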

Since we'll need to authenticate Codeship to be able to execute commands on our k8s cluster, a Docker image with the Google Cloud SDK is required. Codeship is there to save us once again with their own image built for Google Cloud deployments. Just as with the image we used for authenticating to GCR, we'll pass it the name of the file containing the encrypted data.

The image will need a copy of the deploy script, so let's mount our code into it. Now the only thing left to do is to execute the script.

This step will only execute during a push or PR merge to the master branch. The command tells it to roll out the image to the staging environment.
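
Put together, the deploy service and step can look something like this (the mount path and script location are assumptions):

# codeship-services.yml
gcloud_deployment:
  image: codeship/google-cloud-deployment   # Codeship's image with the Google Cloud SDK
  encrypted_env_file: secrets.env.encrypted
  volumes:
    - ./:/deploy                            # gives the container a copy of deploy.sh

# codeship-steps.yml
- name: deploy-staging
  service: gcloud_deployment
  command: bash /deploy/deploy.sh staging
  tag: master                               # only on pushes/merges to master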

When we trigger the build, we should get all green.

[Screenshot: all build steps passing]

STEP 5 - Promote the master image for release

At Neso, we create a Git tag following the Semantic Versioning specification from the staging branch to trigger a release. We can copy that procedure by adding a service using the Docker image we tagged as master.

The idea here is that Codeship will pull the image we last tagged as master, tag it again with the semver tag, and push it to the registry.

The last step in our pipeline is rolling out the image to production.

This looks pretty much the same as the step for deploying to staging, except it will only be triggered by a semver tag. It still runs the test suite, then promotes and rolls out the Docker image from the staging environment.
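
A sketch of how the release steps can be expressed, reusing the services defined earlier; the semver regex in the tag filter and the ability of a push step to re-push a pulled image are assumptions worth verifying against the Codeship docs:

# codeship-services.yml
release-candidate:
  image: gcr.io/PROJECT-ID/cat-api:master   # the image last pushed from master
  dockercfg_service: gcr_dockercfg          # needed to pull from the private registry

# codeship-steps.yml
- name: promote-release
  service: release-candidate
  type: push
  image_name: gcr.io/PROJECT-ID/cat-api
  image_tag: "{{ .Branch }}"                # for tag builds this holds the git tag, e.g. 1.0.0
  registry: https://gcr.io
  dockercfg_service: gcr_dockercfg
  tag: ^[0-9]+\.[0-9]+\.[0-9]+$             # only run for semver-style git tags

- name: deploy-production
  service: gcloud_deployment
  command: bash /deploy/deploy.sh production
  tag: ^[0-9]+\.[0-9]+\.[0-9]+$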

Now, if I tag the master branch as 1.0.0, our pipeline looks like this:

[Screenshot: pipeline run for the 1.0.0 release]

CONCLUSION

Congratulations! You now have a simple CI/CD pipeline. There's still plenty of room for improvement: for example, running the test suite and building the image concurrently. But I think that will serve best as an exercise for you, dear reader.