Skip to content

02-Aggregate

Agg: all
1. Aggregate function:
- stddev_pop
- varpop
- avg
- sum
- leastsquares
- statecount
- max
- min
2. Check error
3. Check aggregate function with null

Agg function bugs
1. Verify bug TD-21561 (count fun with group by error)

Agg-basic: group_concat
Test the GROUP_CONCAT function

Agg-basic: group_concat regression
GROUP_CONCAT extended regression: separator in merge-aligned operators,
NULL handling, multi-column, nchar/varchar mix, separator variants,
partition grouping, and edge cases.

Agg-basic: last
Test the LAST function

Agg: last/last_row with tag
description: verify the behavior of selecting last/last_row with tag column outside.
For example: select last(ts), tag1, tag2 from stable group by tbname.
In this case, we should read cache data to get the tag column value.

Agg-basic: last with pk
Test the LAST function with composite key outside.
For example: select last(ts), pk from stb group by tbname.

Agg-basic: Leastsquares
Test the LeastSquares function, including time windows, filtering on ordinary data columns, filtering on tag columns.

Agg-basic: smoking cases
Smoking the aggregate functions

Fun: stddev_samp()
1. create normal table with timestamp,int columns
2. insert 5 rows with int values 1,2,3,4,5
3. query stddev_samp(int column) and check the result

Fun: variance()
1. create normal table with timestamp,int columns
2. insert 5 rows with int values 1,2,3,4,5
3. query variance(int column) and check the result

Fun: var_pop()
same with variance()

Fun: var_samp()
1. create normal table with timestamp,int columns
2. insert 5 rows with int values 1,2,3,4,5
3. query var_samp(int column) and check the result

Fun: group_concat()
1. create normal table with timestamp,int columns
2. insert 10 rows with int values 1,2,3,4,5,6,7,8,9,10
3. query group_concat(int column) and check the result

Fun: apercentile()
1. Sim case including time windows, t-digest input, null value
2. Query on super/child/normal table
3. Support types
4. Error cases
5. Query on distribute

Fun: avg()
1. Sim case including time windows, filtering on ordinary data columns, filtering on tag columns, GROUP BY, and PARTITION BY.
2. Support types
3. Error cases
4. Boundary values
5. Basic query
6. AVG with filter conditions
7. AVG with unsigned types

Fun: distribute avg()
1. prepare data for distribute aggregate test
2. check distribute data
3. check avg function work status
4. distribute aggregate query test

Fun: count()
1. Sim case including time windows, filtering on ordinary data columns, filtering on tag columns, GROUP BY, and PARTITION BY.
2. Basic query
3. Error check
4. Query on stable/normal table
5. Query with interval clause
6. Query after restart taosd
7. Query on null data
8. Query on partition by clause
9. Query on distributed

Fun: elapsed()
1. Query on super/child/normal table
2. Query with nested
3. Query with join
4. Query with union
5. Query with other function
6. Query with filter
7. Query with tags
8. Error cases

Fun: histogram()
1. Query on super/child/normal table
2. Query with bin_type parameter as user_input/linear_bin/log_bin
3. Query with bin_description parameter
4. Query with different data type
5. Query with filter
6. Error cases
7. Check again after flush database

Fun: hyperloglog()
1. Query on super/child/normal table
2. Query with group by
3. Query with having
4. Query with different data type
5. Query with filter
6. Query with join
7. Error cases
8. Check again after flush database

leastsquares on a supertable whose child tables share identical timestamps.
Background
----------
leastsquares(col, start, step) fits y = a*x + b where x is a global
sequence number (start, start+step, start+2*step, …) assigned to every
row in the result set in time order. When rows from different child
tables share the same timestamp, the relative row order within that
timestamp is NOT defined by the SQL standard and TDengine makes no
guarantee about it.
As a result, querying leastsquares over a whole supertable that has
duplicate timestamps across child tables produces an undefined (though
deterministic for a given build) numeric result. This behaviour is
neither forbidden nor promoted — it is simply undefined.
What this test guarantees
-------------------------
1. No crash / no error — the query completes and returns one row.
2. Per-child-table results (direct query or PARTITION BY tbname) are
well-defined: each child has unique timestamps, so the fit is exact.
3. The supertable (no-partition) result is logged for observability but
is NOT asserted against a fixed value, because the merge order of
duplicate-timestamp rows is implementation-defined.
Observed behaviour (v3.4.x)
---------------------------
With two child tables tb1/tb2 both having ts=(T1,T2,T3), c1=(1,2,3):
* tb1 direct / tb2 direct / PARTITION BY tbname:
{slop:1.000000, intercept:0.000000} (perfect linear fit)
* stb (no partition, 6 rows merged):
{slop:0.228571, intercept:1.200000}
Explanation: the merge scan emits tb1's three rows first, then tb2's
three rows, giving y-sequence (1,2,3,1,2,3) with x=(1..6).
Least-squares on that sequence yields slope 24/105 ≈ 0.228571.
This differs from the per-child result because the x-axis treats
the 6 rows as one continuous sequence rather than two independent
3-row sequences.

Fun: leastsquares()
1. Query on different data types
2. Query on super/child/normal table
3. Error cases
4. Query with partition by
5. Query with group by and having
6. Query with union

Fun: percentile()
1. Sim case including time windows, filtering on ordinary data columns, filtering on tag columns, GROUP BY, and PARTITION BY.
2. Query on super/child/normal table
3. Support types
4. Error cases
5. Check tags

Fun: spread()
1. Sim case including time windows, filtering on ordinary data columns, filtering on tag columns, GROUP BY, and PARTITION BY.
2. Query on super/child/normal table
3. Support types
4. Error cases
5. Query with filter conditions
6. Query with group by
7. Query with distribute aggregate
8. Check function work status

Fun: stddev()
1. Sim case including time windows, filtering on ordinary data columns, filtering on tag columns, GROUP BY, and PARTITION BY.
2. Query on super/child/normal table
3. Support types
4. Error cases
5. Query with filter conditions
6. Query with group by
7. Query with distribute aggregate

Fun: stddev_pop()
same with stddev()

Fun: sum()
1. Sim case including time windows, filtering on ordinary data columns, filtering on tag columns, GROUP BY, and PARTITION BY.
2. Query on super/child/normal table
3. Support types
4. Error cases
5. Query with filter conditions
6. Query with group by
7. Query with distribute aggregate