DorisDB vs. TiDB/TiFlash vs. ClickHouse: Performance Comparison on a Single-Table Monitoring Workload

                                                -- 2021-02-03, 刘春雷

As the number of database clusters grows rapidly, analysis of monitoring data becomes increasingly important, for example:

  • Quickly analyzing the monitoring metrics of a given cluster over a given time range

  • Quickly computing per-metric Top 10 rankings across all clusters for a given period, e.g. the write Top 10 or read Top 10 over the last day
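As a concrete illustration of the second need, the "write Top 10 over the last day" analysis might look like the sketch below. The table and column names come from the mysql_status table used throughout this post; the query itself is hypothetical, and it assumes Com_insert is a cumulative counter, so the per-cluster delta over the window approximates the day's writes.

```sql
-- Hypothetical sketch: write Top 10 across all clusters for the last day.
-- Assumes Com_insert is a cumulative status counter, so MAX - MIN over
-- the window approximates the number of inserts during that day.
SELECT cluster_id,
       MAX(Com_insert) - MIN(Com_insert) AS writes_last_day
FROM mysql_status
WHERE monitorTime >= NOW() - INTERVAL 1 DAY
GROUP BY cluster_id
ORDER BY writes_last_day DESC
LIMIT 10;
```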

Needs like these call for an analytical database. This post benchmarks three of the most popular options:

  • DorisDB

  • ClickHouse

  • TiDB/TiFlash

The monitoring data currently lives in a TiDB database, with 800M+ rows. This round of testing covers only these three databases, and additionally examines:

  • how DorisDB's parallelism and bucket count affect SQL execution
  • how a ClickHouse single instance vs. cluster affects SQL execution
  • how the TiDB version and SSD vs. flash-card disks affect SQL execution

I have only worked with these systems for a short while, so there may be inaccuracies; please treat the results as a reference.

1、Summary

1.1、Conclusions

  • DorisDB had the shortest total execution time
  • TiDB/TiFlash had the longest total execution time
  • Most TiDB execution plans read from TiKV, which makes queries slow; the more data involved, the longer they take
  • Most queries would be faster via TiFlash, but in version 4.0.10 the optimizer usually does not choose it
  • TiDB on flash cards is faster than on SSD, but some 4.0.10 execution plans are suboptimal, hurting efficiency
  • DorisDB was the fastest in the largest number of test cases
  • DorisDB's parallelism and bucket count strongly affect SQL execution time: higher is faster
  • The ClickHouse cluster executed SQL faster than the single instance
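Two of the knobs mentioned above can be adjusted per session. The variable names below are the ones DorisDB and TiDB document for these purposes; the values are illustrative.

```sql
-- DorisDB: raise per-node scan parallelism for the current session.
SET parallel_fragment_exec_instance_num = 16;

-- TiDB: restrict the optimizer's readable engines so plans prefer
-- TiFlash instead of falling back to TiKV.
SET @@session.tidb_isolation_read_engines = 'tiflash,tidb';
```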

1.2、SQL execution results

Note: green marks the shortest time and red the longest; all times are in seconds. (The results table is an image in the original post and is not reproduced here.)

2、Environment

2.1、Database information

2.2、Test table

Table name: mysql_status

Contents: MySQL monitoring metrics

Row count: 872,208,435 (~870 million)

3、Preparing data in DorisDB

3.1、DorisDB table creation

CREATE TABLE mysql_status (
cluster_id varchar(50) NULL COMMENT "",
inip varchar(50) NULL COMMENT "",
monitorTime datetime NULL COMMENT "",
id bigint(20) NULL COMMENT "",
port int(11) NULL COMMENT "",
iRole varchar(20) NOT NULL DEFAULT "OTHER" COMMENT "",
insert_ok int(11) NOT NULL COMMENT "",
alive int(11) NOT NULL COMMENT "",
Binlog_count int(11) NOT NULL COMMENT "",
Binlog_total_size bigint(20) NOT NULL COMMENT "",
Bytes_received bigint(20) NOT NULL COMMENT "",
Bytes_sent bigint(20) NOT NULL COMMENT "",
Com_delete bigint(20) NOT NULL COMMENT "",
Com_insert bigint(20) NOT NULL COMMENT "",
Com_replace bigint(20) NOT NULL COMMENT "",
Com_select bigint(20) NOT NULL COMMENT "",
Com_update bigint(20) NOT NULL COMMENT "",
Queries bigint(20) NOT NULL COMMENT "",
Questions bigint(20) NOT NULL COMMENT "",
Slow_queries bigint(20) NOT NULL COMMENT "",
Created_tmp_disk_tables bigint(20) NOT NULL COMMENT "",
Threads_cached int(11) NOT NULL COMMENT "",
Threads_connected int(11) NOT NULL COMMENT "",
Threads_created int(11) NOT NULL COMMENT "",
Threads_running int(11) NOT NULL COMMENT "",
Uptime int(11) NOT NULL COMMENT "",
createTime datetime NOT NULL COMMENT "data write time",
monitorDay date NOT NULL COMMENT "monitoring data collection date"
) ENGINE=OLAP
DUPLICATE KEY(cluster_id, inip, monitorTime)
COMMENT "OLAP"
PARTITION BY RANGE(monitorDay)
(PARTITION p1 VALUES [('0000-01-01'), ('2020-01-31')),
PARTITION p2 VALUES [('2020-01-31'), ('2020-02-29')),
PARTITION p3 VALUES [('2020-02-29'), ('2020-03-31')),
PARTITION p4 VALUES [('2020-03-31'), ('2020-04-30')),
PARTITION p5 VALUES [('2020-04-30'), ('2020-05-31')),
PARTITION p6 VALUES [('2020-05-31'), ('2020-06-30')),
PARTITION p7 VALUES [('2020-06-30'), ('2020-07-31')),
PARTITION p8 VALUES [('2020-07-31'), ('2020-08-31')),
PARTITION p9 VALUES [('2020-08-31'), ('2020-09-30')),
PARTITION p10 VALUES [('2020-09-30'), ('2020-10-31')),
PARTITION p11 VALUES [('2020-10-31'), ('2020-11-30')),
PARTITION p12 VALUES [('2020-11-30'), ('2020-12-31')),
PARTITION p13 VALUES [('2020-12-31'), ('2021-01-31')),
PARTITION p14 VALUES [('2021-01-31'), ('2021-02-28')),
PARTITION p15 VALUES [('2021-02-28'), ('2021-03-31')),
PARTITION p16 VALUES [('2021-03-31'), ('2021-04-30')),
PARTITION p17 VALUES [('2021-04-30'), ('2021-05-31')))
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

3.2、Importing data into DorisDB

The monitoring data is exported from the SSD TiDB cluster and then loaded in.

【Export to CSV with dumpling】:

./dumpling -uxxx -pxxx -hxxx -Pxxx -o dumping/ --filetype csv --csv-delimiter '' --no-header -F 256MiB --threads 4 --sql 'select cluster_id,inip,monitorTime,id,port,iRole,insert_ok,alive,Binlog_count,Binlog_total_size,Bytes_received,Bytes_sent,Com_delete,Com_insert,Com_replace,Com_select,Com_update,Queries,Questions,Slow_queries,Created_tmp_disk_tables,Threads_cached,Threads_connected,Threads_created,Threads_running,Uptime,createTime,monitorDay from mysql_status'

【Import via Stream Load】:

for i in {000..208};do echo "-----------$i-----------";curl --location-trusted -u root:xxx -T result.000000$i.csv -H "label:r$i" -H "column_separator:," http://127.0.0.1:xxx/api/lcl/mysql_status/_stream_load;done

Row count: 872,208,435 (~870 million)

Example load speed (one 256 MiB file, ~1.78 M rows in ~7 s, i.e. roughly 256 k rows/s):

{
    "TxnId": 569,
    "Label": "r166",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 1781588,
    "NumberLoadedRows": 1781588,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 268435506,
    "LoadTimeMs": 6951,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 1,
    "ReadDataTimeMs": 4935,
    "WriteDataTimeMs": 6938,
    "CommitAndPublishTimeMs": 11
}

4、Preparing data in TiDB

4.1、Table creation

Identical on the SSD and flash-card clusters:

CREATE TABLE mysql_status (
id bigint(20) NOT NULL AUTO_INCREMENT,
cluster_id varchar(50) NOT NULL,
inip varchar(50) NOT NULL,
port int(11) NOT NULL,
iRole varchar(20) NOT NULL,
insert_ok int(11) NOT NULL DEFAULT '0',
alive int(11) NOT NULL DEFAULT '0',
Binlog_count int(11) NOT NULL DEFAULT '0',
Binlog_total_size bigint(20) NOT NULL DEFAULT '0',
Bytes_received bigint(20) NOT NULL DEFAULT '0',
Bytes_sent bigint(20) NOT NULL DEFAULT '0',
Com_delete bigint(20) NOT NULL DEFAULT '0',
Com_insert bigint(20) NOT NULL DEFAULT '0',
Com_replace bigint(20) NOT NULL DEFAULT '0',
Com_select bigint(20) NOT NULL DEFAULT '0',
Com_update bigint(20) NOT NULL DEFAULT '0',
Queries bigint(20) NOT NULL DEFAULT '0',
Questions bigint(20) NOT NULL DEFAULT '0',
Slow_queries bigint(20) NOT NULL DEFAULT '0',
Created_tmp_disk_tables bigint(20) NOT NULL DEFAULT '0',
Threads_cached int(11) NOT NULL DEFAULT '0',
Threads_connected int(11) NOT NULL DEFAULT '0',
Threads_created int(11) NOT NULL DEFAULT '0',
Threads_running int(11) NOT NULL DEFAULT '0',
Uptime int(11) NOT NULL DEFAULT '0',
monitorTime datetime NOT NULL,
monitorDay date NOT NULL,
createTime datetime NOT NULL,
PRIMARY KEY (id),
KEY idx_inIP_port (inip,port,monitorTime),
KEY idx_clusterid (cluster_id,monitorTime),
KEY idx_monitorTime (monitorTime),
KEY idx_timeclusterinip (monitorTime,cluster_id,inip)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=872252609 COMMENT='xxx'

4.2、Importing data

【Export: reuse the CSV files exported above】

【Import via LOAD DATA】:

for i in {000..114};do echo "----------$i-------------";mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e "LOAD DATA LOCAL INFILE 'result.000000$i.csv' INTO TABLE mysql_status FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (cluster_id,inip,monitorTime,id,port,iRole,insert_ok,alive,Binlog_count,Binlog_total_size,Bytes_received,Bytes_sent,Com_delete,Com_insert,Com_replace,Com_select,Com_update,Queries,Questions,Slow_queries,Created_tmp_disk_tables,Threads_cached,Threads_connected,Threads_created,Threads_running,Uptime,createTime,monitorDay)";done

Row count on the flash-card cluster: 872,208,435
Row count on the SSD cluster: 884,370,397

【Add a TiFlash replica for the table】:

alter table mysql_status SET TIFLASH REPLICA 1;
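Before querying through TiFlash it is worth confirming that the replica has finished syncing. TiDB exposes this in the information_schema.tiflash_replica table:

```sql
-- AVAILABLE becomes 1 and PROGRESS reaches 1 once the TiFlash
-- replica of the table is fully synced and usable by the optimizer.
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 'mysql_status';
```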

5、Preparing data in ClickHouse

5.1、Single instance

clickhouse-client --user=default --host=127.0.0.1

CREATE TABLE mysql_status ENGINE = MergeTree ORDER BY id AS SELECT * FROM mysql('xxx:xx', 'xxx', 'mysql_status', 'xxx', 'mima');

Progress: 120.00 million rows, 24.24 GB (261.34 thousand rows/s., 52.80 MB/s.)

5.2、Cluster

The data is split into 3 parts by id modulo 3, so it lands evenly on the 3 nodes.

【Pick one shard and sync all the data into it】

clickhouse-client --user=default --host=127.0.0.1

CREATE TABLE mysql_status ENGINE = MergeTree ORDER BY id AS SELECT * FROM mysql('10.0.0.1:xxx', 'db_name', 'mysql_status', 'xxx', 'mima');

Progress: 120.00 million rows, 24.24 GB (261.34 thousand rows/s., 52.80 MB/s.)

【Create the distributed table on all 3 nodes】

CREATE TABLE mysql_status_all AS mysql_status ENGINE = Distributed(perftest_3shards_1replicas, default, mysql_status, rand());

【Create the shard table on all 3 nodes】:

CREATE TABLE default.mysql_status_shard (id Int64, cluster_id String, inip String, port Int32, iRole String, insert_ok Int32, alive Int32, Binlog_count Int32, Binlog_total_size Int64, Bytes_received Int64, Bytes_sent Int64, Com_delete Int64, Com_insert Int64, Com_replace Int64, Com_select Int64, Com_update Int64, Queries Int64, Questions Int64, Slow_queries Int64, Created_tmp_disk_tables Int64, Threads_cached Int32, Threads_connected Int32, Threads_created Int32, Threads_running Int32, Uptime Int32, monitorTime DateTime, monitorDay Date, createTime DateTime) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192

【Import the data】:

Run on node 1:
insert into mysql_status_shard select * from mysql_status_all where id%3=0;
Run on node 2:
insert into mysql_status_shard select * from mysql_status_all where id%3=1;
Run on node 3:
insert into mysql_status_shard select * from mysql_status_all where id%3=2;

【Create the final distributed table on all 3 nodes】

CREATE TABLE mysql_status_shard_all AS mysql_status_shard ENGINE = Distributed(perftest_3shards_1replicas, default, mysql_status_shard, rand());

【Row count of each shard's local table】:

select count(*) from mysql_status_shard;

┌───count()─┐
│ 294362289 │
└───────────┘

┌───count()─┐
│ 294362291 │
└───────────┘

┌───count()─┐
│ 294362289 │
└───────────┘
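The per-shard counts can be cross-checked through the distributed table, which fans the aggregation out to every shard and returns the combined total:

```sql
-- Runs count() on each shard's local table and sums the partial results;
-- the total should equal the sum of the three counts above.
SELECT count(*) FROM mysql_status_shard_all;
```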

6、SQL test comparison

Note: the fastest results are marked in green.



