DorisDB vs. TiDB/TiFlash vs. ClickHouse: performance comparison on a single-table monitoring workload
-- 2021-02-03, Liu Chunlei
As the number of database clusters grows rapidly, analyzing the monitoring data becomes increasingly important, for example:
- quickly analyzing the monitoring metrics of a given cluster over a given time range;
- quickly computing per-metric Top 10 rankings across all clusters for a given time window, e.g. the Top 10 clusters by writes or by reads over the last day.
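As a sketch, the second requirement could be expressed against the mysql_status table used throughout this post (illustrative only; the exact dialect differs slightly between the three databases):

```sql
-- Top 10 clusters by inserts over the last day. Com_insert is a
-- cumulative MySQL status counter, so we diff max and min within
-- the window instead of summing raw samples (ignores restarts).
SELECT cluster_id,
       MAX(Com_insert) - MIN(Com_insert) AS inserts_last_day
FROM mysql_status
WHERE monitorTime >= NOW() - INTERVAL 1 DAY
GROUP BY cluster_id
ORDER BY inserts_last_day DESC
LIMIT 10;
```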
Needs like these call for an analytical database, for example the three hottest options in the industry right now:
- DorisDB
- ClickHouse
- TiDB/TiFlash
The monitoring data currently lives in a TiDB cluster, 800M+ rows. This post benchmarks only these three databases, and along the way also looks at:
- the effect of parallelism and bucket count on SQL execution in a DorisDB cluster;
- ClickHouse single instance vs. cluster;
- TiDB version, and SSD vs. flash-card disks.
I have not worked with these systems for long, so some of this may be inaccurate; treat it as a reference.
1. Summary
1.1. Conclusions
- DorisDB had the shortest total execution time
- TiDB/TiFlash had the longest total execution time
- Most TiDB execution plans go to TiKV, leading to long execution times; the more data, the longer
- TiDB is usually faster through TiFlash, but in 4.0.10 most plans do not choose it
- TiDB on flash cards is faster than on SSD, but some 4.0.10 plans are suboptimal, reducing efficiency
- DorisDB was the fastest in the largest number of test cases
- DorisDB's degree of parallelism and bucket count strongly affect SQL execution time: higher is faster
- The ClickHouse cluster executes SQL faster than the single instance
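The per-query parallelism mentioned above is, in DorisDB, the session variable parallel_fragment_exec_instance_num (variable name from the Doris/DorisDB documentation; the best value depends on BE core count, so treat the number below as an example):

```sql
-- Inspect and raise per-BE fragment parallelism for this session.
SHOW VARIABLES LIKE 'parallel_fragment_exec_instance_num';
SET parallel_fragment_exec_instance_num = 8;  -- e.g. about half the BE CPU cores
```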
1.2. SQL execution results
Note: green marks the shortest time, red the longest; all times are in seconds.
2. Environment
2.1. Database information
2.2. Test table
Table name: mysql_status
Contents: monitoring metrics
Row count: 872,208,435 (~870 million)
3. Preparing data in DorisDB
3.1. DorisDB table DDL
CREATE TABLE mysql_status (
  cluster_id varchar(50) NULL COMMENT "",
  inip varchar(50) NULL COMMENT "",
  monitorTime datetime NULL COMMENT "",
  id bigint(20) NULL COMMENT "",
  port int(11) NULL COMMENT "",
  iRole varchar(20) NOT NULL DEFAULT "OTHER" COMMENT "",
  insert_ok int(11) NOT NULL COMMENT "",
  alive int(11) NOT NULL COMMENT "",
  Binlog_count int(11) NOT NULL COMMENT "",
  Binlog_total_size bigint(20) NOT NULL COMMENT "",
  Bytes_received bigint(20) NOT NULL COMMENT "",
  Bytes_sent bigint(20) NOT NULL COMMENT "",
  Com_delete bigint(20) NOT NULL COMMENT "",
  Com_insert bigint(20) NOT NULL COMMENT "",
  Com_replace bigint(20) NOT NULL COMMENT "",
  Com_select bigint(20) NOT NULL COMMENT "",
  Com_update bigint(20) NOT NULL COMMENT "",
  Queries bigint(20) NOT NULL COMMENT "",
  Questions bigint(20) NOT NULL COMMENT "",
  Slow_queries bigint(20) NOT NULL COMMENT "",
  Created_tmp_disk_tables bigint(20) NOT NULL COMMENT "",
  Threads_cached int(11) NOT NULL COMMENT "",
  Threads_connected int(11) NOT NULL COMMENT "",
  Threads_created int(11) NOT NULL COMMENT "",
  Threads_running int(11) NOT NULL COMMENT "",
  Uptime int(11) NOT NULL COMMENT "",
  createTime datetime NOT NULL COMMENT "data write time",
  monitorDay date NOT NULL COMMENT "date the monitoring sample was taken"
) ENGINE=OLAP
DUPLICATE KEY(cluster_id, inip, monitorTime)
COMMENT "OLAP"
PARTITION BY RANGE(monitorDay) (
  PARTITION p1  VALUES [('0000-01-01'), ('2020-01-31')),
  PARTITION p2  VALUES [('2020-01-31'), ('2020-02-29')),
  PARTITION p3  VALUES [('2020-02-29'), ('2020-03-31')),
  PARTITION p4  VALUES [('2020-03-31'), ('2020-04-30')),
  PARTITION p5  VALUES [('2020-04-30'), ('2020-05-31')),
  PARTITION p6  VALUES [('2020-05-31'), ('2020-06-30')),
  PARTITION p7  VALUES [('2020-06-30'), ('2020-07-31')),
  PARTITION p8  VALUES [('2020-07-31'), ('2020-08-31')),
  PARTITION p9  VALUES [('2020-08-31'), ('2020-09-30')),
  PARTITION p10 VALUES [('2020-09-30'), ('2020-10-31')),
  PARTITION p11 VALUES [('2020-10-31'), ('2020-11-30')),
  PARTITION p12 VALUES [('2020-11-30'), ('2020-12-31')),
  PARTITION p13 VALUES [('2020-12-31'), ('2021-01-31')),
  PARTITION p14 VALUES [('2021-01-31'), ('2021-02-28')),
  PARTITION p15 VALUES [('2021-02-28'), ('2021-03-31')),
  PARTITION p16 VALUES [('2021-03-31'), ('2021-04-30')),
  PARTITION p17 VALUES [('2021-04-30'), ('2021-05-31'))
)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
  "replication_num" = "1",
  "in_memory" = "false",
  "storage_format" = "DEFAULT"
);
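Range partitioning by monitorDay lets DorisDB prune partitions for time-bounded queries; a quick way to see this (SHOW PARTITIONS is standard Doris syntax):

```sql
-- List the table's partitions with their ranges and sizes.
SHOW PARTITIONS FROM mysql_status;

-- A query bounded to January 2021 only needs to scan partition p13,
-- whose range is [2020-12-31, 2021-01-31).
SELECT COUNT(*) FROM mysql_status
WHERE monitorDay >= '2020-12-31' AND monitorDay < '2021-01-31';
```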
3.2. Loading data into DorisDB
The monitoring data was exported from the SSD TiDB cluster and then loaded in.
Export to CSV with dumpling:
./dumpling -uxxx -pxxx -hxxx -Pxxx -o dumping/ --filetype csv --csv-delimiter '' --no-header -F 256MiB --threads 4 --sql 'select cluster_id,inip,monitorTime,id,port,iRole,insert_ok,alive,Binlog_count,Binlog_total_size,Bytes_received,Bytes_sent,Com_delete,Com_insert,Com_replace,Com_select,Com_update,Queries,Questions,Slow_queries,Created_tmp_disk_tables,Threads_cached,Threads_connected,Threads_created,Threads_running,Uptime,createTime,monitorDay from mysql_status'
Load with Stream Load:
for i in {000..208};do echo "-----------$i-----------";curl --location-trusted -u root:xxx -T result.000000$i.csv -H "label:r$i" -H "column_separator:," http://127.0.0.1:xxx/api/lcl/mysql_status/_stream_load;done
Row count: 872,208,435 (~870 million)
Example load speed for one file:
{
    "TxnId": 569,
    "Label": "r166",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 1781588,
    "NumberLoadedRows": 1781588,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 268435506,
    "LoadTimeMs": 6951,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 1,
    "ReadDataTimeMs": 4935,
    "WriteDataTimeMs": 6938,
    "CommitAndPublishTimeMs": 11
}
That is roughly 1.78 million rows (256 MiB) in about 7 s, i.e. around 256,000 rows/s per file.
4. Preparing data in TiDB
4.1. Table DDL
Identical on the SSD and flash-card clusters:
CREATE TABLE mysql_status (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  cluster_id varchar(50) NOT NULL,
  inip varchar(50) NOT NULL,
  port int(11) NOT NULL,
  iRole varchar(20) NOT NULL,
  insert_ok int(11) NOT NULL DEFAULT '0',
  alive int(11) NOT NULL DEFAULT '0',
  Binlog_count int(11) NOT NULL DEFAULT '0',
  Binlog_total_size bigint(20) NOT NULL DEFAULT '0',
  Bytes_received bigint(20) NOT NULL DEFAULT '0',
  Bytes_sent bigint(20) NOT NULL DEFAULT '0',
  Com_delete bigint(20) NOT NULL DEFAULT '0',
  Com_insert bigint(20) NOT NULL DEFAULT '0',
  Com_replace bigint(20) NOT NULL DEFAULT '0',
  Com_select bigint(20) NOT NULL DEFAULT '0',
  Com_update bigint(20) NOT NULL DEFAULT '0',
  Queries bigint(20) NOT NULL DEFAULT '0',
  Questions bigint(20) NOT NULL DEFAULT '0',
  Slow_queries bigint(20) NOT NULL DEFAULT '0',
  Created_tmp_disk_tables bigint(20) NOT NULL DEFAULT '0',
  Threads_cached int(11) NOT NULL DEFAULT '0',
  Threads_connected int(11) NOT NULL DEFAULT '0',
  Threads_created int(11) NOT NULL DEFAULT '0',
  Threads_running int(11) NOT NULL DEFAULT '0',
  Uptime int(11) NOT NULL DEFAULT '0',
  monitorTime datetime NOT NULL,
  monitorDay date NOT NULL,
  createTime datetime NOT NULL,
  PRIMARY KEY (id),
  KEY idx_inIP_port (inip, port, monitorTime),
  KEY idx_clusterid (cluster_id, monitorTime),
  KEY idx_monitorTime (monitorTime),
  KEY idx_timeclusterinip (monitorTime, cluster_id, inip)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=872252609 COMMENT='xxx'
4.2. Loading data
Export: reuse the CSV files exported above.
Import with LOAD DATA:
for i in {000..114};do echo "----------$i-------------";mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE 'result.000000$i.csv' INTO TABLE mysql_status FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (cluster_id,inip,monitorTime,id,port,iRole,insert_ok,alive,Binlog_count,Binlog_total_size,Bytes_received,Bytes_sent,Com_delete,Com_insert,Com_replace,Com_select,Com_update,Queries,Questions,Slow_queries,Created_tmp_disk_tables,Threads_cached,Threads_connected,Threads_created,Threads_running,Uptime,createTime,monitorDay)";done
Row count on the flash-card cluster: 872,208,435
Row count on the SSD cluster: 884,370,397
Add a TiFlash replica for the table:
alter table mysql_status SET TIFLASH REPLICA 1;
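Replica creation is asynchronous; sync progress can be checked in information_schema, and a session can be forced to read through TiFlash when comparing plans (both are documented TiDB features):

```sql
-- AVAILABLE = 1 and PROGRESS = 1 once the replica is fully synced.
SELECT TABLE_NAME, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 'mysql_status';

-- Restrict this session to the TiFlash engine (tidb is kept so that
-- system-table reads still work).
SET SESSION tidb_isolation_read_engines = 'tidb,tiflash';
```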
5. Preparing data in ClickHouse
5.1. Single instance
clickhouse-client --user=default --host=127.0.0.1
CREATE TABLE mysql_status ENGINE = MergeTree ORDER BY id AS SELECT * FROM mysql('xxx:xx', 'xxx', 'mysql_status', 'xxx', 'mima');
Progress: 120.00 million rows, 24.24 GB (261.34 thousand rows/s., 52.80 MB/s.)
5.2. Cluster
Shard by primary key id modulo 3, splitting the data into 3 parts that land evenly on the 3 nodes.
First, sync all the data onto one of the shards:
clickhouse-client --user=default --host=127.0.0.1
CREATE TABLE mysql_status ENGINE = MergeTree ORDER BY id AS SELECT * FROM mysql('10.0.0.1:xxx', 'db_name', 'mysql_status', 'xxx', 'mima');
Progress: 120.00 million rows, 24.24 GB (261.34 thousand rows/s., 52.80 MB/s.)
Create the distributed table on all 3 nodes:
CREATE TABLE mysql_status_all AS mysql_status ENGINE = Distributed(perftest_3shards_1replicas, default, mysql_status, rand());
【3个均创建shard表】:
CREATE TABLE default. mysql_status_shard (id
Int64, cluster_id
String, inip
String, port
Int32, iRole
String, insert_ok
Int32, alive
Int32, Binlog_count
Int32, Binlog_total_size
Int64, Bytes_received
Int64, Bytes_sent
Int64, Com_delete
Int64, Com_insert
Int64, Com_replace
Int64, Com_select
Int64, Com_update
Int64, Queries
Int64, Questions
Int64, Slow_queries
Int64, Created_tmp_disk_tables
Int64, Threads_cached
Int32, Threads_connected
Int32, Threads_created
Int32, Threads_running
Int32, Uptime
Int32, monitorTime
DateTime, monitorDay
Date, createTime
DateTime) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192
Load the data, running one statement per node:
On node 1:
insert into mysql_status_shard select * from mysql_status_all where id%3=0;
On node 2:
insert into mysql_status_shard select * from mysql_status_all where id%3=1;
On node 3:
insert into mysql_status_shard select * from mysql_status_all where id%3=2;
Create the final distributed table on all 3 nodes:
CREATE TABLE mysql_status_shard_all AS mysql_status_shard ENGINE = Distributed(perftest_3shards_1replicas, default, mysql_status_shard, rand());
Row count of each shard's local table:
select count(*) from mysql_status_shard;
┌───count()─┐
│ 294362289 │
└───────────┘
┌───count()─┐
│ 294362291 │
└───────────┘
┌───count()─┐
│ 294362289 │
└───────────┘
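The three local counts above sum to 883,086,869, which is what the distributed table should report as the cross-shard total; a quick sanity check:

```sql
-- The Distributed table fans the query out to every shard's local
-- table and sums the partial counts.
SELECT count(*) FROM mysql_status_shard_all;
```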
6. SQL test comparison
Note: green marks the fastest.