Sync Kubernetes volume with S3 bucket

In the cloud era, topics like backup have become somewhat less important. One reason is probably the possibility of taking automated snapshots of your volumes; another may be the built-in redundancy many providers offer. However, there are still use cases where you need a more traditional way to back up your data. In this article we will discuss syncing files from a Kubernetes volume to S3-compatible storage.

If you know the concept of Persistent Volumes in Kubernetes, then you also know the three supported access modes: ReadWriteOnce, ReadOnlyMany and ReadWriteMany. The first mode allows a volume to be mounted read-write by a single node, meaning it is usually already in use by some pod and the data is accessible only through that pod. The other two modes allow a volume to be mounted multiple times, so multiple pods can access the same data: with ReadOnlyMany all pods can only read the data, while with ReadWriteMany all pods can both read and write it. In this article we are going to focus on volumes that can be mounted multiple times.
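To make this concrete, here is a minimal sketch of a PersistentVolumeClaim requesting a ReadWriteMany volume. The claim name matches the one used later in this article; the storage class name is an assumption, so substitute whatever ReadWriteMany-capable class your cluster provides.

```yaml
# Minimal PVC requesting a volume that many pods can mount read-write
# at the same time. storageClassName is hypothetical; use a class in
# your cluster that supports ReadWriteMany (e.g. an NFS-backed one).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client   # assumption: an NFS-backed class
  resources:
    requests:
      storage: 10Gi
```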

At the time of writing, the following volume plugins support the ReadWriteMany mode: AzureFile, CephFS, Glusterfs, Quobyte, NFS and PortworxVolume. As you may guess from my article written a few months ago, in our example we will use an NFS volume. Now let's see the manifest:

apiVersion: batch/v1  # use batch/v1beta1 only on clusters older than 1.21
kind: CronJob
metadata:
  name: volume-backup
spec:
  schedule: "20 3 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: aws-cli
              image: amazon/aws-cli
              env:
                - name: AWS_ACCESS_KEY_ID
                  value: {your_aws_access_key_ID}
                - name: AWS_SECRET_ACCESS_KEY
                  value: {your_aws_secret_access_key}
                - name: AWS_REGION
                  value: eu-central-1
              args:
                - s3
                - sync
                - /data
                - s3://your_bucket_name
                - --no-progress
                - --delete
              volumeMounts:
                - name: backup
                  mountPath: /data
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: {your_pvc_name}
          restartPolicy: OnFailure
      ttlSecondsAfterFinished: 172800

As you can see, this manifest creates a Kubernetes CronJob object which triggers a job every night at 3:20. No concurrency is allowed, meaning only one job of this kind may run at a time. We keep the history of the last two successful and the last two failed jobs, and ttlSecondsAfterFinished deletes finished jobs after 48 hours. The job template has just one container, which uses the official amazon/aws-cli image from Docker Hub. As shown in the volumes section, we use an existing PVC and mount it under the /data path in our container. The entrypoint of this image is aws-cli (aws) itself, so we pass just the arguments. Note that the --no-progress and --delete flags belong to the s3 sync subcommand, so they must come after it, not before.
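One caveat: storing credentials as plain values in the manifest is fine for a quick demo, but in practice you would keep them in a Kubernetes Secret and reference it from the container. A sketch of that approach, assuming a Secret named aws-credentials with keys access-key-id and secret-access-key (both names are my own choice, not anything mandated by Kubernetes):

```yaml
# Hypothetical Secret holding the AWS credentials.
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
type: Opaque
stringData:
  access-key-id: {your_aws_access_key_ID}
  secret-access-key: {your_aws_secret_access_key}
---
# In the CronJob's container spec, replace the plain env values with
# references to the Secret:
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: secret-access-key
```

This keeps secrets out of the manifest you commit to version control, and lets you rotate credentials without touching the CronJob.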

Yes, it is as easy as that. With just one manifest, our volume files get synced to an S3 bucket. Of course, instead of the sync subcommand you can use any other available aws-cli subcommand. Finally, a note for MinIO users: pass `--endpoint-url=https://minio.your-domain.com:9000` as the first argument to aws-cli.
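Applied to the manifest above, the container's args section for a MinIO target might look like the sketch below. The endpoint URL is the placeholder from the note above; --endpoint-url is a global aws-cli option, which is why it goes before the s3 subcommand, while the sync-specific flags go after it.

```yaml
args:
  - --endpoint-url=https://minio.your-domain.com:9000
  - s3
  - sync
  - /data
  - s3://your_bucket_name
  - --no-progress
  - --delete
```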