一步一步创建 TDengine 集群

Service 服务

创建一个 service 配置文件:taosd-service.yaml,服务名称 metadata.name (此处为 "taosd") 将在下一步中使用到。添加 TDengine 所用到的所有端口:

---
apiVersion: v1
kind: Service
metadata:
  name: "taosd"
  labels:
    app: "tdengine"
spec:
  ports:
  - name: tcp6030
    protocol: "TCP"
    port: 6030
  - name: tcp6041
    protocol: "TCP"
    port: 6041
  selector:
    app: "tdengine"

StatefulSet 有状态服务

根据 Kubernetes 对各类部署的说明,我们将使用 StatefulSet 作为 TDengine 的服务类型,创建文件 tdengine.yaml

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: "tdengine"
  labels:
    app: "tdengine"
spec:
  serviceName: "taosd"
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: "tdengine"
  template:
    metadata:
      name: "tdengine"
      labels:
        app: "tdengine"
    spec:
      containers:
        - name: "tdengine"
          image: "tdengine/tdengine:3.0.7.1"
          imagePullPolicy: "IfNotPresent"
          ports:
            - name: tcp6030
              protocol: "TCP"
              containerPort: 6030
            - name: tcp6041
              protocol: "TCP"
              containerPort: 6041
          env:
            # POD_NAME for FQDN config
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # SERVICE_NAME and NAMESPACE for fqdn resolve
            - name: SERVICE_NAME
              value: "taosd"
            - name: STS_NAME
              value: "tdengine"
            - name: STS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # TZ for timezone settings, we recommend to always set it.
            - name: TZ
              value: "Asia/Shanghai"
            # TAOS_ prefix will configured in taos.cfg, strip prefix and camelCase.
            - name: TAOS_SERVER_PORT
              value: "6030"
            # Must set if you want a cluster.
            - name: TAOS_FIRST_EP
              value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
            # TAOS_FQND should always be set in k8s env.
            - name: TAOS_FQDN
              value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
          volumeMounts:
            - name: taosdata
              mountPath: /var/lib/taos
          startupProbe:
            exec:
              command:
                - taos-check
            failureThreshold: 360
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - taos-check
            initialDelaySeconds: 5
            timeoutSeconds: 5000
          livenessProbe:
            exec:
              command:
                - taos-check
            initialDelaySeconds: 15
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: taosdata
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "standard"
        resources:
          requests:
            storage: "5Gi"

启动集群

kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml

上面的配置将生成一个三节点的 TDengine 集群,dnode 是自动配置的,可以使用 show dnodes 命令查看当前集群的节点:

kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-2 -- taos -s "show dnodes"

一个三节点集群,应输出如下:

Welcome to the TDengine shell from Linux, Client Version:3.0.0.0
Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:29:49.049 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:11.895 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:33.007 |                                |
Query OK, 3 rows affected (0.004610s)

扩容

TDengine 支持自动扩容:

kubectl scale statefulsets tdengine --replicas=4

检查一下是否生效,首先看下 POD 状态:

kubectl get pods -l app=tdengine 

Results:

NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          2m9s
tdengine-1   1/1     Running   0          108s
tdengine-2   1/1     Running   0          86s
tdengine-3   1/1     Running   0          22s

TDengine Dnode 状态需要等 POD ready 后才能看到:

kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"

扩容后的四节点 TDengine 集群的 dnode 列表:

Welcome to the TDengine shell from Linux, Client Version:3.0.0.0
Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:29:49.049 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:11.895 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:33.007 |                                |
      4 | tdengine-3.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:31:36.204 |                                |
Query OK, 4 rows affected (0.009594s)

缩容

TDengine 的缩容并没有自动化,我们尝试将一个四节点集群缩容到三节点。

想要安全的缩容,首先需要将节点从 dnode 列表中移除:

kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 4"

确认移除成功后(使用 kubectl exec -i -t tdengine-0 -- taos -s "show dnodes" 查看和确认 dnode 列表),使用 kubectl 命令移除 POD:

kubectl scale statefulsets tdengine --replicas=3

最后一个 POD 将会被删除。使用命令 kubectl get pods -l app=tdengine 查看POD状态:

NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          4m17s
tdengine-1   1/1     Running   0          3m56s
tdengine-2   1/1     Running   0          3m34s

POD删除后,需要手动删除PVC,否则下次扩容时会继续使用以前的数据导致无法正常加入集群。

kubectl delete pvc taosdata-tdengine-3

此时TDengine集群才是安全的。之后还可以正常扩容:

kubectl scale statefulsets tdengine --replicas=4

kubectl exec -i -t tdengine-0 -- taos -s "show dnodes" 结果如下:

   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:29:49.049 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:11.895 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:33.007 |                                |
      5 | tdengine-3.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:34:35.520 |                                |

错误行为 1

扩容到四节点之后缩容到两节点,删除的 POD 会进入 offline 状态:

Welcome to the TDengine shell from Linux, Client Version:2.1.1.0
Copyright (c) 2020 by TAOS Data, Inc. All rights reserved.

taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:29:49.049 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-06-22 15:30:11.895 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | offline    | 2022-06-22 15:30:33.007 | status msg timeout             |
      5 | tdengine-3.taosd.default.sv... |      0 |            256 | offline    | 2022-06-22 15:34:35.520 | status msg timeout             ||
Query OK, 4 row(s) in set (0.004293s)

drop dnode 行为将不会按照预期执行,且下次集群重启后,所有的 dnode 节点将无法启动 dropping 状态无法退出。

错误行为 2

TDengine集群会持有 replica 参数,如果缩容后的节点数小于这个值,集群将无法使用:

创建一个库使用 replica 参数为 3,插入部分数据:

kubectl exec -i -t tdengine-0 -- \
  taos -s \
  "create database if not exists test replica 3;
   use test; 
   create table if not exists t1(ts timestamp, n int);
   insert into t1 values(now, 1)(now+1s, 2);"

缩容到单节点:

kubectl scale statefulsets tdengine --replicas=1

在 taos shell 中的所有数据库操作将无法成功。

清理 TDengine 集群

完整移除 TDengine 集群,需要分别清理 statefulset、svc、pvc。

kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine

在下一节,我们将使用 Helm 来提供更灵活便捷的操作方式。