Set Up a TDengine Cluster on Kubernetes

Service

Create the Service config taosd-service.yaml, declaring each port we will use. Note that metadata.name (set to "taosd") will be referenced in the next step:

---
apiVersion: v1
kind: Service
metadata:
  name: "taosd"
  labels:
    app: "tdengine"
spec:
  # Headless service (clusterIP: None): per-pod DNS records such as
  # tdengine-0.taosd.<namespace>.svc.cluster.local only resolve for a
  # headless service, and the TAOS_FQDN setting below depends on them.
  clusterIP: None
  ports:
  - name: tcp6030
    protocol: "TCP"
    port: 6030
  - name: tcp6041
    protocol: "TCP"
    port: 6041
  selector:
    app: "tdengine"

StatefulSet

We use the StatefulSet config tdengine.yaml to deploy TDengine:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: "tdengine"
  labels:
    app: "tdengine"
spec:
  serviceName: "taosd"
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: "tdengine"
  template:
    metadata:
      name: "tdengine"
      labels:
        app: "tdengine"
    spec:
      containers:
        - name: "tdengine"
          image: "tdengine/tdengine:3.0.7.1"
          imagePullPolicy: "IfNotPresent"
          ports:
            - name: tcp6030
              protocol: "TCP"
              containerPort: 6030
            - name: tcp6041
              protocol: "TCP"
              containerPort: 6041
          env:
            # POD_NAME for FQDN config
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # SERVICE_NAME and STS_NAMESPACE for FQDN resolution
            - name: SERVICE_NAME
              value: "taosd"
            - name: STS_NAME
              value: "tdengine"
            - name: STS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # TZ for timezone settings; we recommend always setting it.
            - name: TZ
              value: "Asia/Shanghai"
            # Variables with the TAOS_ prefix are mapped into taos.cfg:
            # strip the prefix and convert to camelCase
            # (e.g. TAOS_SERVER_PORT -> serverPort).
            - name: TAOS_SERVER_PORT
              value: "6030"
            # Must be set to form a cluster.
            - name: TAOS_FIRST_EP
              value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
            # TAOS_FQDN should always be set in a k8s environment.
            - name: TAOS_FQDN
              value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
          volumeMounts:
            - name: taosdata
              mountPath: /var/lib/taos
          startupProbe:
            exec:
              command:
                - taos-check
            failureThreshold: 360
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - taos-check
            initialDelaySeconds: 5
            timeoutSeconds: 5000
          livenessProbe:
            exec:
              command:
                - taos-check
            initialDelaySeconds: 15
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: taosdata
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "standard"
        resources:
          requests:
            storage: "5Gi"

Start the cluster

kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml

These two commands create a three-node TDengine cluster on Kubernetes.

Execute show dnodes in the taos shell on each pod:

kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-2 -- taos -s "show dnodes"

The current dnode list shows:

Welcome to the TDengine shell from Linux, Client Version:3.0.0.0
Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:29:49.049 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:11.895 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:33.007 |                                |
Query OK, 3 rows affected (0.004610s)
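
For scripted checks, a rough sketch that counts ready dnodes by matching the literal word "ready" in the shell output (fragile if the output format changes; note the -t flag is omitted so no TTY control characters pollute the stream):

kubectl exec -i tdengine-0 -- taos -s "show dnodes" | grep -cw ready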

Scale Up

TDengine on Kubernetes can be scaled up with a single command; the new dnode joins the cluster automatically:

kubectl scale statefulsets tdengine --replicas=4

Check if scale-up works:

kubectl get pods -l app=tdengine 

Results:

NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          2m9s
tdengine-1   1/1     Running   0          108s
tdengine-2   1/1     Running   0          86s
tdengine-3   1/1     Running   0          22s
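
To block until the new pod is actually ready (as driven by the readinessProbe above), you can also use:

kubectl wait --for=condition=Ready pod/tdengine-3 --timeout=300s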

Check TDengine dnodes:

kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"

Results:

Welcome to the TDengine shell from Linux, Client Version:3.0.0.0
Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:29:49.049 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:11.895 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:33.007 |                                |
      4 | tdengine-3.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:31:36.204 |                                |
Query OK, 4 rows affected (0.009594s)

Scale Down

Let's try scaling down from 4 to 3.

To scale down correctly, we should first drop the last dnode in the taos shell:

kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 4"

Then scale down to 3.

kubectl scale statefulsets tdengine --replicas=3

The extra pod will be terminated, leaving 3 pods.
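
If you prefer to script the drop-then-scale sequence, here is a rough sketch that looks up the dnode id by pod name rather than assuming id == ordinal + 1 (as the next section shows, dropped ids are not reused):

# LAST_POD is the highest-ordinal pod being removed.
LAST_POD="tdengine-3"
# Extract the id column of the matching row from the shell output.
DNODE_ID=$(kubectl exec -i tdengine-0 -- taos -s "show dnodes" \
  | grep "${LAST_POD}\." | awk -F'|' '{print $1}' | tr -d ' ')
kubectl exec -i tdengine-0 -- taos -s "drop dnode ${DNODE_ID}"
kubectl scale statefulsets tdengine --replicas=3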

Run kubectl get pods -l app=tdengine to check the pods:

NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          4m17s
tdengine-1   1/1     Running   0          3m56s
tdengine-2   1/1     Running   0          3m34s

You also need to remove the PVC; otherwise, the next scale-up will fail.
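
If you are unsure of the claim name, list the claims first (this assumes the PVCs carry the app=tdengine label, which the Clean Up section below also relies on):

kubectl get pvc -l app=tdengine

Then delete the claim left over from the removed pod: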

kubectl delete pvc taosdata-tdengine-3

Now your TDengine cluster is safe.

Scaling up again will now work:

kubectl scale statefulsets tdengine --replicas=4

show dnodes results (note that the re-added node joins as a new dnode with id 5; dropped dnode ids are not reused):

   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:29:49.049 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:11.895 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:33.007 |                                |
      5 | tdengine-3.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:34:35.520 |                                |

Let's do something BAD: Case 1

Scale it up to 4 and then scale it directly down to 2. The dnodes on the deleted pods are now offline:

Welcome to the TDengine shell from Linux, Client Version:3.0.0.0
Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:29:49.049 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:11.895 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | offline    | 2022-06-22 15:30:33.007 | status msg timeout             |
      5 | tdengine-3.taosd.default.sv... |      0 |            256 | offline    | 2022-06-22 15:34:35.520 | status msg timeout             |
Query OK, 4 row(s) in set (0.004293s)

But we can't drop the offline dnodes; the dnode will get stuck in dropping status (if you call drop dnode 'fqdn:6030').
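
The way out of this state is to scale the StatefulSet back up, so that the pods are recreated and their dnodes come back online:

kubectl scale statefulsets tdengine --replicas=4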

Let's do something BAD: Case 2

Note that if the number of remaining dnodes is less than a database's replica setting, all operations on that database will fail until you scale up again.

Create a database with replica 2, and insert data into a table:

kubectl exec -i -t tdengine-0 -- \
  taos -s \
  "create database if not exists test replica 2;
   use test; 
   create table if not exists t1(ts timestamp, n int);
   insert into t1 values(now, 1)(now+1s, 2);"
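
You can confirm the rows landed before breaking things:

kubectl exec -i -t tdengine-0 -- taos -s "select * from test.t1"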

Scale the StatefulSet down to 1 pod (bad behavior):

kubectl scale statefulsets tdengine --replicas=1

Now, in the taos shell, all operations on database test fail.
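
For example, the same query that worked above now returns an error (the exact message varies by version):

kubectl exec -i -t tdengine-0 -- taos -s "select * from test.t1"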

So before scaling down, check the maximum replica value among all databases, and be sure to perform the drop dnode step for each node you remove.

Clean Up TDengine StatefulSet

To completely remove the TDengine StatefulSet, run:

kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
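
A quick check that everything is gone:

kubectl get pods,svc,pvc -l app=tdengine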