Kubernetes running on Google Cloud, also known as Google Kubernetes Engine (formerly Google Container Engine, or GKE for short), is our go-to deployment architecture. We have help ...
We decided, for a number of business reasons, to move one of our existing Kubernetes clusters to a new geographic region.
The thing we were most worried about is that we had persistent volumes attached to MySQL instances for our test environments running in k8s.
There isn’t a straightforward way to do this. One common way is to create a snapshot of etcd, but we’re on GKE, where the control plane is managed for us, so that’s out of the question. Luckily we found Ark.
Ark is a disaster recovery tool for Kubernetes clusters. It can take backups of the whole cluster and restore them with a single command. We can even have it run on a schedule. Persistent volumes are also taken care of. It has good documentation, so setting it up would have been a breeze if not for a bug with RBAC in GKE.
Ark works by creating custom resources in k8s for its operations, conveniently defined in a single yaml file. I had to kubectl apply the yaml file to both the American cluster and the shiny new European cluster we’re moving to.
This is where the RBAC bug on GKE appears:
To work around this, I had to have my Google account granted the cluster-admin role in both clusters:
kubectl create clusterrolebinding paul-cluster-admin-binding --clusterrole=cluster-admin --user=
Ironically, it spits out the same error unless your account has the Owner IAM role.
Apparently, this is a known issue on GKE:
Because of the way Container Engine checks permissions when you create a Role or ClusterRole, you must first create a RoleBinding that grants you all of the permissions included in the role you want to create. An example workaround is to create a RoleBinding that gives your Google identity a cluster-admin role before attempting to create additional Role or ClusterRole permissions. This is a known issue in the Beta release of Role-Based Access Control in Kubernetes and Container Engine version 1.6.
Apart from persistent volumes, Ark stores its backups in a cloud storage bucket. This bucket should be exclusive to Ark because each backup is stored in its own subdirectory in the bucket’s root. A service account will be needed to authorize Ark to upload files into the bucket.
I created a service account just for Ark to use. It will need read and write access to the bucket. In GKE, persistent volumes are just disks attached to the nodes, so I had to give it permissions for those too. These are the permissions given to the service account:
compute.disks.get
compute.disks.create
compute.disks.createSnapshot
compute.snapshots.get
compute.snapshots.create
compute.snapshots.useReadOnly
compute.snapshots.delete
compute.projects.get
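One way to bundle the permissions above is a custom IAM role. The following is only a sketch: the role ID `ark.server`, the title, and the project ID `my-project-id` are placeholders I made up, not values from our setup. It prints the `gcloud iam roles create` command instead of running it, so you can review it first.

```bash
# Permissions listed above, one per line.
ARK_PERMISSIONS="compute.disks.get
compute.disks.create
compute.disks.createSnapshot
compute.snapshots.get
compute.snapshots.create
compute.snapshots.useReadOnly
compute.snapshots.delete
compute.projects.get"

# Join a newline-separated list with commas, the format --permissions expects.
join_permissions() {
  printf '%s\n' "$1" | paste -sd, -
}

# Print (do not run) the command that would create the custom role.
# The role ID and project ID arguments are hypothetical placeholders.
print_create_role_cmd() {
  role_id=$1
  project_id=$2
  printf 'gcloud iam roles create %s --project %s --title "Ark Server" --permissions %s\n' \
    "$role_id" "$project_id" "$(join_permissions "$ARK_PERMISSIONS")"
}

print_create_role_cmd ark.server my-project-id
```

The role would still have to be bound to the service account, and the bucket access is granted separately.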
At this point the bucket has been created and Ark has been allowed to upload to it. Now Ark needs to know which bucket to use, which we set in the Ark Config (a custom resource defined by Ark):
# examples/gcp/00-ark-config.yaml
...
backupStorageProvider:
  name: gcp
  bucket: neso-cluster-backup
...
To hand off the service account to Ark, a k8s secret named cloud-credentials containing the service account key will have to be created.
# download service account key
gcloud iam service-accounts keys create ark-svc-account \
  --iam-account $SERVICE_ACCOUNT_EMAIL

# create secret
kubectl create secret generic cloud-credentials \
  --namespace heptio-ark \
  --from-file cloud=ark-svc-account
In the Ark Deployment yaml file, there wasn’t anything that needed to be changed. All that’s left to start the server is to kubectl apply the Config and the Deployment.
After everything had been set up on both clusters and the Ark client installed locally, it was time to put Ark to the test. After making sure kubectl's context was set to the US cluster, and with fingers crossed, we generated the backup:
$ ark backup create us-cluster --exclude-namespaces kube-system,kube-public,heptio-ark
Gave it a few minutes and then:
$ ark backup get
NAME         STATUS      CREATED                         EXPIRES   SELECTOR
us-cluster   Completed   2018-09-21 15:59:35 +0800 +08   30d       <none>
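If you would rather script the "give it a few minutes" part, the status column can be pulled out of that table. This is a small sketch, not part of Ark: `backup_status` is a helper name I made up, and it only assumes the column layout shown above. The sample text stands in for real `ark backup get` output.

```bash
# Read `ark backup get` output on stdin and print the STATUS column
# for the backup with the given name (prints nothing if it is not listed).
backup_status() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}

# Sample output copied from above, used here in place of a live cluster.
sample_output='NAME         STATUS      CREATED                         EXPIRES   SELECTOR
us-cluster   Completed   2018-09-21 15:59:35 +0800 +08   30d       <none>'

printf '%s\n' "$sample_output" | backup_status us-cluster
```

Against a real cluster the call would be `ark backup get | backup_status us-cluster`, which you could poll in a loop until it prints Completed.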
The backup includes all the resources from pods to ingresses. We wanted to keep the IP addresses we used in the old cluster. To free up the IP addresses, down go the ingresses in the old cluster.
Next, we set the kubectl context to the new cluster in Europe. It took a while for the cluster to see the backup, but it did appear eventually:
$ ark backup get
NAME         STATUS      CREATED                         EXPIRES   SELECTOR
us-cluster   Completed   2018-09-21 15:59:35 +0800 +08   30d       <none>
$ ark restore create --from-backup us-cluster
Ark was able to restore everything except for the persistent volumes. Our applications could not connect to the databases. Taking a closer look, it appears that Ark created the disks but they were in the region where the backups were created. The maintainers are aware of this issue and added the fix for this in
We couldn’t wait for that release, though. We had no choice but to move the databases out of k8s. We ultimately decided to spin up a CloudSQL instance and stick our test environments’ databases there.
Ark is an awesome tool. Although the migration did not go as smoothly as it should have, some good came out of it. It forced us to move our databases out of Kubernetes, where we shouldn’t have been running them in the first place. Also, we now have regular backups of our new cluster.
A continuous integration and delivery (CI/CD) pipeline is essential to a software project. Not only does it help us ship code faster, but it also leaves less room for mistakes. This tutorial will teach you how to set up a simple CI/CD pipeline for your GKE cluster with Codeship.
We no longer had any time to maintain our self-hosted instance of Jenkins. Codeship works well for us because it has native Docker support. It has been extremely reliable, and their support is always quick to respond. As a bonus, adding Slack notifications is easy. It is also great for personal projects because it offers 100 free builds per month without requiring a credit card.
For this tutorial, we will be setting up a pipeline for a demo application that you can find at https://gitlab.com/neso-io/cat-api. It is a simple API that returns a random image of a cat.
At Neso, the build starts when a pull/merge request is accepted. On top of Codeship, we also make use of Gitlab CI so that we only merge the PR if the automated tests pass. We will be making something similar. A git push or PR merge to the master branch will trigger the chain of events in our pipeline.
Codeship will be responsible for running the automated tests, as well as building and pushing our Docker image to Google Container Registry. It then lets Kubernetes (k8s) know that there is a new image to be rolled out. At this point, we will let Kubernetes do its magic.
At Neso, a Kubernetes namespace serves as an environment. We'll be following that convention here. The master branch goes into our staging environment/namespace; git tags matching the Semantic Versioning Specification will go into our production environment/namespace.
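To make the convention concrete, here is a rough sketch of the mapping from a Git ref to a namespace. The function name and the namespace names `staging` and `production` are my own assumptions for illustration, and the pattern only accepts plain MAJOR.MINOR.PATCH tags, not the pre-release or build metadata that the full Semantic Versioning specification allows.

```bash
# Map a Git ref name to the namespace it should deploy to.
# master -> staging, a MAJOR.MINOR.PATCH tag -> production.
# Returns non-zero (and prints nothing) for any other ref.
target_namespace() {
  ref=$1
  if [ "$ref" = "master" ]; then
    echo staging
  elif printf '%s\n' "$ref" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo production
  else
    return 1
  fi
}

target_namespace master
target_namespace 1.0.0
target_namespace feature/cats || echo "feature/cats is not deployed"
```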
Similar to other managed CI/CD platforms, a bit of configuration is required. At the time of writing, Codeship only supports Github, Bitbucket, and Gitlab. When you add a new project, you'll need to pick which of the three hosts your code. I've selected Gitlab for this tutorial since that's where the code is hosted.
When you're done selecting your SCM provider, it's time to connect Codeship to your repository. You'll need elevated privileges in your repository before you can add it to Codeship.
After you add your project on Codeship, you'll need a few files to get started:
Of these, only codeship.aes and secrets.env shouldn't be committed, so go ahead and add those to your .gitignore file.
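If you want to script that, something along these lines should do. It is a minimal sketch with a made-up function name; it appends each entry only when it is not already listed, so running it twice is harmless.

```bash
# Add codeship.aes and secrets.env to a .gitignore file if they are missing.
# The optional argument is the path to the file (defaults to ./.gitignore).
ignore_secret_files() {
  gitignore=${1:-.gitignore}
  touch "$gitignore"
  # Make sure the file ends with a newline before appending to it.
  if [ -n "$(tail -c1 "$gitignore")" ]; then
    echo >> "$gitignore"
  fi
  for f in codeship.aes secrets.env; do
    # -x matches the whole line, -F treats the name as a literal string.
    grep -qxF "$f" "$gitignore" || printf '%s\n' "$f" >> "$gitignore"
  done
}
```

Run `ignore_secret_files` from the root of the repository, then commit the updated .gitignore.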
We'll be setting up Codeship to run the unit tests and build the Docker image. The built Docker image will need to be pushed somewhere that our Kubernetes cluster has access to. There are many registries available like Gitlab and DockerHub. Since we're already on GKE, we might as well use Google Container Registry (GCR).
GCR won't let anyone push to the registry unless they're authenticated. For that, we're going to need to create a service account on GCP. The service account will need the roles:
Download the service account's JSON key and add it to the secrets.env file. For a more detailed guide on encrypting sensitive data, Codeship provides excellent documentation at https://documentation.codeship.com/pro/builds-and-configuration/environment-variables/#encrypted-environment-variables.
Before encryption, my secrets file looked like:
I mentioned earlier that every project on Codeship is provided with an AES key that will be used for encrypting anything sensitive that will be needed in our pipeline. The key can be found in the project's General Project Settings page.
If you haven't already, download the project's AES key and add it to the root of the project as codeship.aes.
Before you can encrypt the secrets file, you'll need to install a CLI tool created by Codeship called Jet. Jet is a valuable tool: it can not only encrypt the secrets file, but also help you test your Codeship configuration by running the steps on your machine.
Once you've installed Jet we'll be able to encrypt the secrets file with a single command:
$ jet encrypt secrets.env secrets.env.encrypted
There should now be a file named secrets.env.encrypted containing the encrypted contents of the secrets.env file. With all of that done, we can now start designing our pipeline.
Automated tests are a requirement in Continuous Integration. For our test runner, we'll use Neso's image with PHP and composer built in. Before the build starts, Codeship is going to pull the code from the repo. We will need to mount this code to the image used in our test runner.
This image doesn't need much setup on our end, so we'll only need to run two commands for our test runner: installing the dependencies and running the tests.
This should execute no matter what branch we push to.
We won't have anything to run in our cluster without a Docker image. The next step we'll define is how to build the Docker image. Thankfully, as I mentioned earlier, Codeship Pro has native Docker support. One important thing to take note of is the name of the image. If you're using Google Container Registry like us, then you'll need to follow the naming convention for the image: [HOSTNAME]/[PROJECT-ID]/[IMAGE].
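As a quick illustration of the [HOSTNAME]/[PROJECT-ID]/[IMAGE] convention, here is a tiny helper. The function name and the project ID `my-project-id` are placeholders of mine; `gcr.io` and `eu.gcr.io` are two of the registry hostnames Google offers.

```bash
# Build an image name following [HOSTNAME]/[PROJECT-ID]/[IMAGE].
# The hostname is optional and defaults to gcr.io.
gcr_image_name() {
  project_id=$1
  image=$2
  hostname=${3:-gcr.io}
  printf '%s/%s/%s\n' "$hostname" "$project_id" "$image"
}

gcr_image_name my-project-id cat-api            # prints gcr.io/my-project-id/cat-api
gcr_image_name my-project-id cat-api eu.gcr.io  # prints eu.gcr.io/my-project-id/cat-api
```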
The step is pretty straightforward too. We only need to reference the name of the service we defined.
The Docker image should only be built when we push to the master branch as indicated by the tag field.
Before we push the Docker image, we have to decrypt the sensitive data needed to be authenticated. Fortunately, Codeship has a Docker image for that; it will only need to know the name of the encrypted file.
The pushed image will have two tags: the branch that triggered the build (in this case: master) and the first 8 characters of the git commit hash.
The image last tagged as master will serve as the release candidate in this pipeline. The tag for the commit hash will be used when we update the image to run in our k8s deployment.
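The two tags can be derived like this. Treat it as a sketch: the function names are mine, and on Codeship I am assuming the branch and commit hash would come from the `CI_BRANCH` and `CI_COMMIT_ID` environment variables, so check the names against your build environment.

```bash
# First 8 characters of a commit hash, used as the per-commit image tag.
short_commit_tag() {
  printf '%.8s\n' "$1"
}

# Print both tags for a build, one per line: the branch name, then the short hash.
image_tags() {
  branch=$1
  commit=$2
  printf '%s\n' "$branch"
  short_commit_tag "$commit"
}

# Example with a made-up commit hash; in a build you might pass
# "$CI_BRANCH" and "$CI_COMMIT_ID" instead (assumed variable names).
image_tags master 0123456789abcdef0123456789abcdef01234567
```

The branch tag moves with every build, while the short hash stays fixed to one commit, which is why the hash is the one to use for rollouts.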
The final step in our pipeline will be rolling out the new Docker image. We need Google Cloud SDK for this. Since it involves a couple of commands, we'll write a bash script.
The command in line 5 is where the authentication happens using the Google service account JSON we encrypted. That is the only part of the script that is Codeship specific. You probably remember running something similar to the commands in lines 8 to 11 after you installed the Gcloud SDK on your machine. The one on line 11 allows you to run kubectl commands from your terminal. Lines 13 to 17 allow us to specify which environment to roll out the image to. If we change anything in the k8s manifest, Line 21 applies that change as part of the pipeline.
The last bit of code in the script sets the image to be rolled out to the environment or k8s namespace. For the staging environment, we'll be using the Docker image tagged with the current commit hash. Only tags matching the Semantic Versioning specification will be rolled out to the production environment.
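The rule in that last part of the script can be sketched as follows. This is not the deploy script itself: the deployment and container name `cat-api`, the namespace names, and the image name are assumptions for illustration, and the function only prints the `kubectl set image` command so it is safe to try locally.

```bash
# True if the argument is a plain MAJOR.MINOR.PATCH version (simplified semver check).
is_semver() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Print the kubectl command that would roll out image:tag to an environment.
# Refuses to target production with anything other than a semver tag.
rollout_cmd() {
  environment=$1   # namespace, e.g. staging or production (assumed names)
  image=$2         # e.g. gcr.io/my-project-id/cat-api (placeholder)
  tag=$3           # short commit hash for staging, semver tag for production
  if [ "$environment" = "production" ] && ! is_semver "$tag"; then
    echo "refusing to roll out non-semver tag '$tag' to production" >&2
    return 1
  fi
  printf 'kubectl set image deployment/cat-api cat-api=%s:%s --namespace=%s\n' \
    "$image" "$tag" "$environment"
}

rollout_cmd staging gcr.io/my-project-id/cat-api 01234567
rollout_cmd production gcr.io/my-project-id/cat-api 1.0.0
```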
Since we'll need to authenticate Codeship to be able to execute commands on our k8s cluster, a Docker image with the Gcloud SDK will be required. Codeship is there to save us once again with their own image built for Google Cloud deployments. As with the image we used for authenticating, we’ll pass it the name of the file with the encrypted data.
The image will need a copy of the deploy script so let’s mount our code to it (lines 6 to 7). Now the only thing left to do is to execute the script.
This step will only execute during a push or PR merge to the master branch. The command tells it to roll out the image to the staging environment.
When we trigger the build, we should get all green.
At Neso, we create a Git tag following the Semantic Versioning specification from the staging branch to trigger a release. We can copy that procedure by adding a service using the Docker image we tagged as master.
The idea here is that Codeship will pull the image we last tagged as master, tag it again with the semver tag, and push it to the registry.
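Done by hand, the promotion boils down to three Docker commands. The sketch below only prints them; the function name and the image name are placeholders, and in the pipeline Codeship does the equivalent for us through its service and step configuration.

```bash
# Print the docker commands that promote the image tagged master to a release version.
promote_cmds() {
  image=$1     # e.g. gcr.io/my-project-id/cat-api (placeholder)
  version=$2   # the semver Git tag, e.g. 1.0.0
  printf 'docker pull %s:master\n' "$image"
  printf 'docker tag %s:master %s:%s\n' "$image" "$image" "$version"
  printf 'docker push %s:%s\n' "$image" "$version"
}

promote_cmds gcr.io/my-project-id/cat-api 1.0.0
```

Because the release reuses the image that already ran in staging, what reaches production is the same build that was tested there.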
The last step in our pipeline is rolling out the image to production.
This looks pretty much the same as the step for deploying to staging except it will only be triggered with a semver tag. It should still run the test suite then promote and roll out the Docker image from the staging environment.
Now, if I tag the master branch as 1.0.0, our pipeline will look like:
Congratulations! You now have a simple CI/CD pipeline. There's still plenty of room for improving this pipeline, such as running the test suite and building the image concurrently. But I think that will serve best as an exercise for you, dear reader.