DorisDB、TiDB/TiFlash、ClickHouse性能对比- SSB测试 - 单多表场景
--2021-02-05 刘春雷
因分析型业务猛烈增长,本文将对3种OLAP列存数据库进行性能对比,功能及易运维等方面此次先不讨论,
涉及:
- DorisDB集群
- ClickHouse单实例,集群
- TiDB/TiFlash集群
Star schema benchmark(以下简称SSB)是学术界和工业界广泛使用的一个星型模型测试集 , 通过这个测试集合可以方便的对比各种OLAP产品的基础性能指标
因个人刚刚接触不久,如有错误的地方,大家参考着看哈~
1、汇总
1.1、结论
- 单表/多表查询,DorisDB总体时间均最短
- 单表查询:DorisDB最快次数最多,ClickHouse次之
- 多表查询:DorisDB所有执行均最快
- TiDB/TiFlash总体时间单表/多表查询均最长
- TiDB执行计划多数走TiKV,导致执行时间长,且数据量越多,执行时间越长
- TiDB强制走TiFlash ,单表多数提速多,多表多数变慢,但4.0.10 版本的执行计划多数不走
- ClickHouse多表查询需要更改SQL,使类型一致才可以,且字段名、表名区分大小写
- ClickHouse单机性能强悍,性价比较高
- 多表join场景比单表查询时间长
- ClickHouse大单表查询方式效率好,多表关联效率降低明显
- DorisDB多表关联效率好
- ClickHouse分布式表Join 比 全本地单表Join 的查询快
- ClickHouse小表不推荐使用分布式表Join,只大表进行分布式表Join,执行时间均变短
1.2、单表查询
注:绿色最快,红色最慢
1.3、多表Join
注:绿色最快,红色最慢
2、信息
2.1、数据库信息
2.2、表信息
3、DorisDB模拟数据
3.1、构建数据
【安装测试工具】
https://www.dorisdb.com/zh-CN/blog/1.8
wget http://dorisdb-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.zip
tar xvf ssb-poc.tar.gz
cd ssb-poc
make && make install
【构建数据】
cd output
bin/gen-ssb.sh 100 data_dir
【修改配置】:
conf/doris.conf
【执行脚本建表】:
bin/create_db_table.sh ddl_100
sql: ssb_create success
sql: ssb_flat_create success
【使用Stream load导入单表数据】
bin/stream_load.sh data_dir
stream load start. table: lineorder, path: data_dir/lineorder.tbl.1
…
stream load success. table: lineorder, path: data_dir/lineorder.tbl.96
stream load start. table: customer, path: data_dir/customer.tbl
stream load success. table: customer, path: data_dir/customer.tbl
stream load start. table: dates, path: data_dir/dates.tbl
stream load success. table: dates, path: data_dir/dates.tbl
stream load start. table: part, path: data_dir/part.tbl
stream load success. table: part, path: data_dir/part.tbl
stream load start. table: supplier, path: data_dir/supplier.tbl
stream load success. table: supplier, path: data_dir/supplier.tbl
【插入数据到宽表lineorder_flat】
bin/flat_insert.sh
sql: ssb_flat_insert start
sql: ssb_flat_insert success
3.2、测试SQL
首先在客户端执行命令,修改Doris的并行度(类似clickhouse set max_threads= 8)
mysql -h127.0.0.1 -uroot -pxxx -P9030
set global parallel_fragment_exec_instance_num = 8;
【多表】:
bin/benchmark.sh -p -d ssb
------ dataset: ssb, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 326.0
q2 214.0
q3 199.0
q4 2316.0
q5 1474.0
q6 1158.0
q7 4683.0
q8 1878.0
q9 1418.0
q10 277.0
q11 4675.0
q12 1346.0
q13 736.0
【单表】:
bin/benchmark.sh -p -d ssb-flat
------ dataset: ssb-flat, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 294.0
q2 169.0
q3 172.0
q4 307.0
q5 366.0
q6 172.0
q7 584.0
q8 241.0
q9 134.0
q10 136.0
q11 391.0
q12 433.0
q13 270.0
4、ClickHouse模拟数据
4.1、单实例构建数据
【下载】:
https://clickhouse.tech/docs/en/getting-started/example-datasets/star-schema/
【参考文档执行】
https://github.com/vadimtk/ssb-dbgen
【生成数据】
cd ssb-dbgen-master
./dbgen -s 100 -T c
./dbgen -s 100 -T l
./dbgen -s 100 -T p
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000
./dbgen -s 100 -T s
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000
./dbgen -s 100 -T d
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000
【创建表】:
clickhouse-client --user=default --host=127.0.0.1
use lcl
CREATE TABLE customer( C_CUSTKEY UInt32, C_NAME String, C_ADDRESS String, C_CITY LowCardinality(String), C_NATION LowCardinality(String), C_REGION LowCardinality(String), C_PHONE String, C_MKTSEGMENT LowCardinality(String))ENGINE = MergeTree ORDER BY (C_CUSTKEY);
CREATE TABLE lineorder( LO_ORDERKEY UInt32, LO_LINENUMBER UInt8, LO_CUSTKEY UInt32, LO_PARTKEY UInt32, LO_SUPPKEY UInt32, LO_ORDERDATE Date, LO_ORDERPRIORITY LowCardinality(String), LO_SHIPPRIORITY UInt8, LO_QUANTITY UInt8, LO_EXTENDEDPRICE UInt32, LO_ORDTOTALPRICE UInt32, LO_DISCOUNT UInt8, LO_REVENUE UInt32, LO_SUPPLYCOST UInt32, LO_TAX UInt8, LO_COMMITDATE Date, LO_SHIPMODE LowCardinality(String))ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
CREATE TABLE part( P_PARTKEY UInt32, P_NAME String, P_MFGR LowCardinality(String), P_CATEGORY LowCardinality(String), P_BRAND LowCardinality(String), P_COLOR LowCardinality(String), P_TYPE LowCardinality(String), P_SIZE UInt8, P_CONTAINER LowCardinality(String))ENGINE = MergeTree ORDER BY P_PARTKEY;
CREATE TABLE supplier( S_SUPPKEY UInt32, S_NAME String, S_ADDRESS String, S_CITY LowCardinality(String), S_NATION LowCardinality(String), S_REGION LowCardinality(String), S_PHONE String)ENGINE = MergeTree ORDER BY S_SUPPKEY;
【导入数据】:
clickhouse-client --user=default --host=127.0.0.1 --query “INSERT INTO customer FORMAT CSV” < customer.tbl
clickhouse-client --user=default --host=127.0.0.1 --query “INSERT INTO part FORMAT CSV” < part.tbl
clickhouse-client --user=default --host=127.0.0.1 --query “INSERT INTO supplier FORMAT CSV” < supplier.tbl
clickhouse-client --user=default --host=127.0.0.1 --query “INSERT INTO lineorder FORMAT CSV” < lineorder.tbl
【创建大单表及灌入数据】:
CREATE TABLE lineorder_flat ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS SELECT l.LO_ORDERKEY AS LO_ORDERKEY, l.LO_LINENUMBER AS LO_LINENUMBER, l.LO_CUSTKEY AS LO_CUSTKEY, l.LO_PARTKEY AS LO_PARTKEY, l.LO_SUPPKEY AS LO_SUPPKEY, l.LO_ORDERDATE AS LO_ORDERDATE, l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY, l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY, l.LO_QUANTITY AS LO_QUANTITY, l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE, l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE, l.LO_DISCOUNT AS LO_DISCOUNT, l.LO_REVENUE AS LO_REVENUE, l.LO_SUPPLYCOST AS LO_SUPPLYCOST, l.LO_TAX AS LO_TAX, l.LO_COMMITDATE AS LO_COMMITDATE, l.LO_SHIPMODE AS LO_SHIPMODE, c.C_NAME AS C_NAME, c.C_ADDRESS AS C_ADDRESS, c.C_CITY AS C_CITY, c.C_NATION AS C_NATION, c.C_REGION AS C_REGION, c.C_PHONE AS C_PHONE, c.C_MKTSEGMENT AS C_MKTSEGMENT, s.S_NAME AS S_NAME, s.S_ADDRESS AS S_ADDRESS, s.S_CITY AS S_CITY, s.S_NATION AS S_NATION, s.S_REGION AS S_REGION, s.S_PHONE AS S_PHONE, p.P_NAME AS P_NAME, p.P_MFGR AS P_MFGR, p.P_CATEGORY AS P_CATEGORY, p.P_BRAND AS P_BRAND, p.P_COLOR AS P_COLOR, p.P_TYPE AS P_TYPE, p.P_SIZE AS P_SIZE, p.P_CONTAINER AS P_CONTAINER FROM lineorder AS l INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;
速度:
Progress: 38.35 million rows, 1.77 GB ( 263.42 thousand rows/s., 12.14 MB/s.)
4.2、 集群构建数据
集群3个实例,每个实例占1/3的数据
CREATE TABLE default.lineorder (LO_ORDERKEY
UInt32, LO_LINENUMBER
UInt8, LO_CUSTKEY
UInt32, LO_PARTKEY
UInt32, LO_SUPPKEY
UInt32, LO_ORDERDATE
Date, LO_ORDERPRIORITY
LowCardinality(String), LO_SHIPPRIORITY
UInt8, LO_QUANTITY
UInt8, LO_EXTENDEDPRICE
UInt32, LO_ORDTOTALPRICE
UInt32, LO_DISCOUNT
UInt8, LO_REVENUE
UInt32, LO_SUPPLYCOST
UInt32, LO_TAX
UInt8, LO_COMMITDATE
Date, LO_SHIPMODE
LowCardinality(String)) ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) SETTINGS index_granularity = 8192
CREATE TABLE default.lineorder_shard (LO_ORDERKEY
UInt32, LO_LINENUMBER
UInt8, LO_CUSTKEY
UInt32, LO_PARTKEY
UInt32, LO_SUPPKEY
UInt32, LO_ORDERDATE
Date, LO_ORDERPRIORITY
LowCardinality(String), LO_SHIPPRIORITY
UInt8, LO_QUANTITY
UInt8, LO_EXTENDEDPRICE
UInt32, LO_ORDTOTALPRICE
UInt32, LO_DISCOUNT
UInt8, LO_REVENUE
UInt32, LO_SUPPLYCOST
UInt32, LO_TAX
UInt8, LO_COMMITDATE
Date, LO_SHIPMODE
LowCardinality(String)) ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) SETTINGS index_granularity = 8192;
CREATE TABLE default.lineorder_shard_all AS lineorder ENGINE = Distributed(perftest_3shards_1replicas, default, lineorder, rand())
insert into lineorder_shard select * from lineorder_shard_all where LO_ORDERKEY%3=0;
insert into lineorder_shard select * from lineorder_shard_all where LO_ORDERKEY%3=1;
insert into lineorder_shard select * from lineorder_shard_all where LO_ORDERKEY%3=2;
drop table lineorder_shard_all;
CREATE TABLE lineorder_shard_all AS lineorder_shard ENGINE = Distributed(perftest_3shards_1replicas, default, lineorder_shard, rand())
select count(*) from lineorder_shard ;
┌───count()─┐
│ 200025041 │
└──── ────┘
┌───count()─┐
│ 200018447 │
└───── ───┘
┌───count()─┐
│ 199994414 │
└────── ──┘
4.3、 ClickHouse-单表测试SQL
4.4、 ClickHouse-多表测试SQL
更改SQL:
- 字段更改成大写,表名小写
- SQL更改:LO_ORDERDATE 改成:toInt32(toYYYYMMDD(LO_ORDERDATE))
5、TiDB模拟数据
5.1、构建数据
创建相关表
【导入数据】:
for i in {1…100};do echo “----------$i-------------”;mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘lineorder.tbl.$i’ INTO TABLE lineorder FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”;done
mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘/opt/soft/data_dir/dates.tbl’ INTO TABLE dates FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”
mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘/opt/soft/data_dir/customer.tbl’ INTO TABLE customer FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”
mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘/opt/soft/data_dir/supplier.tbl’ INTO TABLE supplier FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”
mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘/opt/soft/data_dir/part.tbl’ INTO TABLE part FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”
【添加至TiFlash】:
alter table lineorder SET TIFLASH REPLICA 1;
alter table customer SET TIFLASH REPLICA 1;
alter table dates SET TIFLASH REPLICA 1;
alter table part SET TIFLASH REPLICA 1;
alter table supplier SET TIFLASH REPLICA 1;
【查看】:
SELECT * FROM information_schema.tiflash_replica ;
【汇聚数据到大单表】:
导出:
time ./dumpling -uroot -pxxx -h127.0.0.1 -Pxxx -o dumping_lineorder_flat/ --filetype sql --no-header -F 256MiB --threads 4 --sql 'SELECT LO_ORDERKEY
, LO_ORDERDATE
, LO_LINENUMBER
, LO_CUSTKEY
, LO_PARTKEY
, LO_SUPPKEY
, LO_ORDERPRIORITY
, LO_SHIPPRIORITY
, LO_QUANTITY
, LO_EXTENDEDPRICE
, LO_ORDTOTALPRICE
, LO_DISCOUNT
, LO_REVENUE
, LO_SUPPLYCOST
, LO_TAX
, LO_COMMITDATE
, LO_SHIPMODE
, C_NAME
, C_ADDRESS
, C_CITY
, C_NATION
, C_REGION
, C_PHONE
, C_MKTSEGMENT
, S_NAME
, S_ADDRESS
, S_CITY
, S_NATION
, S_REGION
, S_PHONE
, P_NAME
, P_MFGR
, P_CATEGORY
, P_BRAND
, P_COLOR
, P_TYPE
, P_SIZE
, P_CONTAINER
FROM lcl.lineorder l INNER JOIN lcl.customer c ON (c.C_CUSTKEY = l.LO_CUSTKEY) INNER JOIN lcl.supplier s ON (s.S_SUPPKEY = l.LO_SUPPKEY) INNER JOIN lcl.part p ON (p.P_PARTKEY = l.LO_PARTKEY) ’
【导入到大单表】:
【模拟loader 相关文件-创建文件】:
lcl.lineorder_flat-schema.sql
lcl-schema-create.sql
【修改文件名】:
for i in {000…885};do echo “------- $i -----”;mv result.000000$i.sql lcl.lineorder_flat.000000$i.sql;done
【替换文件内容】:
for i in {000…885};do echo “------- $i -----”;sed -i ‘s/INSERT INTO ``/INSERT INTO lineorder_flat
/g’ lcl.lineorder_flat.000000$i.sql;done
【导入至大单表】:
cd /opt/soft/tidb-enterprise-tools-nightly-linux-amd64
./bin/loader -c=loader.toml
5.2、测试SQL
参考上面结果: