Set Up a TDengine Cluster on Kubernetes
Service
Create the Service config taosd-service.yaml for the ports we will use. Note that the metadata.name (set to "taosd") will be used in the next step:
---
apiVersion: v1
kind: Service
metadata:
  name: "taosd"
  labels:
    app: "tdengine"
spec:
  ports:
    - name: tcp6030
      protocol: "TCP"
      port: 6030
    - name: tcp6041
      protocol: "TCP"
      port: 6041
  selector:
    app: "tdengine"
StatefulSet
We use the StatefulSet config tdengine.yaml for TDengine.
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: "tdengine"
  labels:
    app: "tdengine"
spec:
  serviceName: "taosd"
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: "tdengine"
  template:
    metadata:
      name: "tdengine"
      labels:
        app: "tdengine"
    spec:
      containers:
        - name: "tdengine"
          image: "tdengine/tdengine:3.0.7.1"
          imagePullPolicy: "IfNotPresent"
          ports:
            - name: tcp6030
              protocol: "TCP"
              containerPort: 6030
            - name: tcp6041
              protocol: "TCP"
              containerPort: 6041
          env:
            # POD_NAME for FQDN config
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # SERVICE_NAME and NAMESPACE for FQDN resolution
            - name: SERVICE_NAME
              value: "taosd"
            - name: STS_NAME
              value: "tdengine"
            - name: STS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # TZ for timezone settings, we recommend always setting it.
            - name: TZ
              value: "Asia/Shanghai"
            # Env vars with the TAOS_ prefix are written to taos.cfg: strip the prefix
            # and convert to camelCase (see the mapping sketch after this manifest).
            - name: TAOS_SERVER_PORT
              value: "6030"
            # Must be set if you want a cluster.
            - name: TAOS_FIRST_EP
              value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
            # TAOS_FQDN should always be set in a k8s environment.
            - name: TAOS_FQDN
              value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
          volumeMounts:
            - name: taosdata
              mountPath: /var/lib/taos
          startupProbe:
            exec:
              command:
                - taos-check
            failureThreshold: 360
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - taos-check
            initialDelaySeconds: 5
            timeoutSeconds: 5000
          livenessProbe:
            exec:
              command:
                - taos-check
            initialDelaySeconds: 15
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: taosdata
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "standard"
        resources:
          requests:
            storage: "5Gi"
Start the cluster
kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml
The above configuration will create a three-node TDengine cluster on Kubernetes.
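Before checking the dnodes, you may want to wait until all pods are ready, for example with standard kubectl commands (assuming the default namespace):
kubectl rollout status statefulset/tdengine
kubectl get pods -l app=tdengine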
Execute show dnodes in the taos shell:
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-2 -- taos -s "show dnodes"
The current dnode list shows:
Welcome to the TDengine shell from Linux, Client Version:3.0.0.0
Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:29:49.049 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:30:11.895 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:30:33.007 | |
Query OK, 3 rows affected (0.004610s)
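Besides the dnodes, you can also check the mnode status in the same way; this is an optional check using the standard show mnodes command:
kubectl exec -i -t tdengine-0 -- taos -s "show mnodes"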
Scale Up
A TDengine cluster on Kubernetes can be scaled up automatically with:
kubectl scale statefulsets tdengine --replicas=4
Check if scale-up works:
kubectl get pods -l app=tdengine
Results:
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 2m9s
tdengine-1 1/1 Running 0 108s
tdengine-2 1/1 Running 0 86s
tdengine-3 1/1 Running 0 22s
Check TDengine dnodes:
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
Results:
Welcome to the TDengine shell from Linux, Client Version:3.0.0.0
Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:29:49.049 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:30:11.895 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:30:33.007 | |
4 | tdengine-3.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:31:36.204 | |
Query OK, 4 rows affected (0.009594s)
Scale Down
Let's try scaling down from 4 nodes to 3.
To scale down correctly, first drop the last dnode in the taos shell:
kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 4"
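You can confirm that the dnode has been removed before touching the StatefulSet, for example:
kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"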
Then scale down to 3.
kubectl scale statefulsets tdengine --replicas=3
The extra replica pod will be terminated, leaving 3 pods.
Type kubectl get pods -l app=tdengine to check pods.
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 4m17s
tdengine-1 1/1 Running 0 3m56s
tdengine-2 1/1 Running 0 3m34s
You also need to remove the PVC; otherwise, the next scale-up will fail:
kubectl delete pvc taosdata-tdengine-3
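If you want to double-check, list the remaining PVCs; only taosdata-tdengine-0 through taosdata-tdengine-2 should be left:
kubectl get pvc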
Now your TDengine cluster is safe.
Scaling up again will work:
kubectl scale statefulsets tdengine --replicas=4
show dnodes results:
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:29:49.049 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:30:11.895 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:30:33.007 | |
5 | tdengine-3.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:34:35.520 | |
Let's do something BAD - Case 1
Scale up to 4 and then scale down to 2 directly. The dnodes of the deleted pods are now offline:
Welcome to the TDengine shell from Linux, Client Version:3.0.0.0
Copyright (c) 2022 by TAOS Data, Inc. All rights reserved.
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | note |
============================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:29:49.049 | |
2 | tdengine-1.taosd.default.sv... | 0 | 256 | ready | 2022-06-22 15:30:11.895 | |
3 | tdengine-2.taosd.default.sv... | 0 | 256 | offline | 2022-06-22 15:30:33.007 | status msg timeout |
5 | tdengine-3.taosd.default.sv... | 0 | 256 | offline | 2022-06-22 15:34:35.520 | status msg timeout |
Query OK, 4 row(s) in set (0.004293s)
But we can't drop the offline dnodes; the dnode will get stuck in dropping status (if you call drop dnode 'fqdn:6030').
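One possible way to recover from this state, assuming the PVCs of the deleted pods still exist, is simply to scale the StatefulSet back up so the pods rejoin with their original FQDNs and data:
kubectl scale statefulsets tdengine --replicas=4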
Let's do something BAD - Case 2
Note that if the number of remaining dnodes is less than the database replica count, all operations on that database will fail until you scale up again.
Create a database with replica 2, and insert data into a table:
kubectl exec -i -t tdengine-0 -- \
taos -s \
"create database if not exists test replica 2;
use test;
create table if not exists t1(ts timestamp, n int);
insert into t1 values(now, 1)(now+1s, 2);"
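You can verify the data was written before breaking things, for example:
kubectl exec -i -t tdengine-0 -- taos -s "select * from test.t1"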
Scale the StatefulSet down to 1 replica (bad behavior):
kubectl scale statefulsets tdengine --replicas=1
Now, in the taos shell, all operations on database test fail.
So, before scaling down, check the maximum replica value among all databases, and be sure to perform the drop dnode step first.
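To check the replica setting of each database before scaling down, one simple option is to inspect the show databases output, which includes a replica column in TDengine 3.0:
kubectl exec -i -t tdengine-0 -- taos -s "show databases"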
Clean Up TDengine StatefulSet
To completely remove the TDengine StatefulSet, type:
kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
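Afterwards you can verify that nothing is left behind (depending on how the PVCs are labeled, you may need to list them without a label selector):
kubectl get pods -l app=tdengine
kubectl get pvc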