DorisDB、TiDB/TiFlash、ClickHouse性能对比- SSB测试-单多表场景

DorisDB、TiDB/TiFlash、ClickHouse性能对比- SSB测试 - 单多表场景

                                            --2021-02-05 刘春雷

因分析型业务猛烈增长,本文将对3种OLAP列存数据库进行性能对比,功能及易运维等方面此次先不讨论,

涉及:

  • DorisDB集群
  • ClickHouse单实例,集群
  • TiDB/TiFlash集群

Star schema benchmark(以下简称SSB)是学术界和工业界广泛使用的一个星型模型测试集 , 通过这个测试集合可以方便的对比各种OLAP产品的基础性能指标

因个人刚刚接触不久,如有错误的地方,大家参考着看哈~

1、汇总

1.1、结论

  • 单表/多表查询,DorisDB总体时间均最短
  • 单表查询:DorisDB最快次数最多,ClickHouse次之
  • 多表查询:DorisDB所有执行均最快
  • TiDB/TiFlash总体时间单表/多表查询均最长
  • TiDB执行计划多数走TiKV,导致执行时间长,且数据量越多,执行时间越长
  • TiDB强制走TiFlash ,单表多数提速多,多表多数变慢,但4.0.10 版本的执行计划多数不走
  • ClickHouse多表查询需要更改SQL,使类型一致才可以,且字段名、表名区分大小写
  • ClickHouse单机性能强悍,性价比较高​
  • 多表join场景比单表查询时间长​
  • ClickHouse大单表查询方式效率好,多表关联效率降低明显
  • DorisDB多表关联效率好​
  • ClickHouse分布式表Join 比 全本地单表Join 的查询快
  • ClickHouse小表不推荐使用分布式表Join,只大表进行分布式表Join,执行时间均变短

1.2、单表查询

注:绿色最快,红色最慢

1.3、多表Join

注:绿色最快,红色最慢

2、信息

2.1、数据库信息


2.2、表信息

image

3、DorisDB模拟数据

3.1、构建数据

【安装测试工具】

https://www.dorisdb.com/zh-CN/blog/1.8

wget http://dorisdb-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.zip

tar xvf ssb-poc.tar.gz
cd ssb-poc
make && make install

【构建数据】

cd output
bin/gen-ssb.sh 100 data_dir

【修改配置】:

conf/doris.conf

【执行脚本建表】:

bin/create_db_table.sh ddl_100

sql: ssb_create success
sql: ssb_flat_create success

【使用Stream load导入单表数据】

bin/stream_load.sh data_dir

stream load start. table: lineorder, path: data_dir/lineorder.tbl.1

stream load success. table: lineorder, path: data_dir/lineorder.tbl.96
stream load start. table: customer, path: data_dir/customer.tbl
stream load success. table: customer, path: data_dir/customer.tbl
stream load start. table: dates, path: data_dir/dates.tbl
stream load success. table: dates, path: data_dir/dates.tbl
stream load start. table: part, path: data_dir/part.tbl
stream load success. table: part, path: data_dir/part.tbl
stream load start. table: supplier, path: data_dir/supplier.tbl
stream load success. table: supplier, path: data_dir/supplier.tbl

【插入数据到宽表lineorder_flat】

bin/flat_insert.sh

sql: ssb_flat_insert start
sql: ssb_flat_insert success

3.2、测试SQL

首先在客户端执行命令,修改Doris的并行度(类似clickhouse set max_threads= 8)

mysql -h127.0.0.1 -uroot -pxxx -P9030

set global parallel_fragment_exec_instance_num = 8;

【多表】:

bin/benchmark.sh -p -d ssb

------ dataset: ssb, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 326.0
q2 214.0
q3 199.0
q4 2316.0
q5 1474.0
q6 1158.0
q7 4683.0
q8 1878.0
q9 1418.0
q10 277.0
q11 4675.0
q12 1346.0
q13 736.0

【单表】:

bin/benchmark.sh -p -d ssb-flat

------ dataset: ssb-flat, concurrency: 1 ------
sql\time(ms)\parallel_num 1
q1 294.0
q2 169.0
q3 172.0
q4 307.0
q5 366.0
q6 172.0
q7 584.0
q8 241.0
q9 134.0
q10 136.0
q11 391.0
q12 433.0
q13 270.0

4、ClickHouse模拟数据

4.1、单实例构建数据

【下载】:

https://clickhouse.tech/docs/en/getting-started/example-datasets/star-schema/

【参考文档执行】

https://github.com/vadimtk/ssb-dbgen

【生成数据】

cd ssb-dbgen-master

./dbgen -s 100 -T c

./dbgen -s 100 -T l

./dbgen -s 100 -T p
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000

./dbgen -s 100 -T s
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000

./dbgen -s 100 -T d
SSBM (Star Schema Benchmark) Population Generator (Version 1.0.0)
Copyright Transaction Processing Performance Council 1994 - 2000

【创建表】:

clickhouse-client --user=default --host=127.0.0.1
use lcl

CREATE TABLE customer( C_CUSTKEY UInt32, C_NAME String, C_ADDRESS String, C_CITY LowCardinality(String), C_NATION LowCardinality(String), C_REGION LowCardinality(String), C_PHONE String, C_MKTSEGMENT LowCardinality(String))ENGINE = MergeTree ORDER BY (C_CUSTKEY);
CREATE TABLE lineorder( LO_ORDERKEY UInt32, LO_LINENUMBER UInt8, LO_CUSTKEY UInt32, LO_PARTKEY UInt32, LO_SUPPKEY UInt32, LO_ORDERDATE Date, LO_ORDERPRIORITY LowCardinality(String), LO_SHIPPRIORITY UInt8, LO_QUANTITY UInt8, LO_EXTENDEDPRICE UInt32, LO_ORDTOTALPRICE UInt32, LO_DISCOUNT UInt8, LO_REVENUE UInt32, LO_SUPPLYCOST UInt32, LO_TAX UInt8, LO_COMMITDATE Date, LO_SHIPMODE LowCardinality(String))ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
CREATE TABLE part( P_PARTKEY UInt32, P_NAME String, P_MFGR LowCardinality(String), P_CATEGORY LowCardinality(String), P_BRAND LowCardinality(String), P_COLOR LowCardinality(String), P_TYPE LowCardinality(String), P_SIZE UInt8, P_CONTAINER LowCardinality(String))ENGINE = MergeTree ORDER BY P_PARTKEY;
CREATE TABLE supplier( S_SUPPKEY UInt32, S_NAME String, S_ADDRESS String, S_CITY LowCardinality(String), S_NATION LowCardinality(String), S_REGION LowCardinality(String), S_PHONE String)ENGINE = MergeTree ORDER BY S_SUPPKEY;

【导入数据】:

clickhouse-client --user=default --host=127.0.0.1 --query “INSERT INTO customer FORMAT CSV” < customer.tbl
clickhouse-client --user=default --host=127.0.0.1 --query “INSERT INTO part FORMAT CSV” < part.tbl
clickhouse-client --user=default --host=127.0.0.1 --query “INSERT INTO supplier FORMAT CSV” < supplier.tbl
clickhouse-client --user=default --host=127.0.0.1 --query “INSERT INTO lineorder FORMAT CSV” < lineorder.tbl

【创建大单表及灌入数据】:

CREATE TABLE lineorder_flat ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS SELECT l.LO_ORDERKEY AS LO_ORDERKEY, l.LO_LINENUMBER AS LO_LINENUMBER, l.LO_CUSTKEY AS LO_CUSTKEY, l.LO_PARTKEY AS LO_PARTKEY, l.LO_SUPPKEY AS LO_SUPPKEY, l.LO_ORDERDATE AS LO_ORDERDATE, l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY, l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY, l.LO_QUANTITY AS LO_QUANTITY, l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE, l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE, l.LO_DISCOUNT AS LO_DISCOUNT, l.LO_REVENUE AS LO_REVENUE, l.LO_SUPPLYCOST AS LO_SUPPLYCOST, l.LO_TAX AS LO_TAX, l.LO_COMMITDATE AS LO_COMMITDATE, l.LO_SHIPMODE AS LO_SHIPMODE, c.C_NAME AS C_NAME, c.C_ADDRESS AS C_ADDRESS, c.C_CITY AS C_CITY, c.C_NATION AS C_NATION, c.C_REGION AS C_REGION, c.C_PHONE AS C_PHONE, c.C_MKTSEGMENT AS C_MKTSEGMENT, s.S_NAME AS S_NAME, s.S_ADDRESS AS S_ADDRESS, s.S_CITY AS S_CITY, s.S_NATION AS S_NATION, s.S_REGION AS S_REGION, s.S_PHONE AS S_PHONE, p.P_NAME AS P_NAME, p.P_MFGR AS P_MFGR, p.P_CATEGORY AS P_CATEGORY, p.P_BRAND AS P_BRAND, p.P_COLOR AS P_COLOR, p.P_TYPE AS P_TYPE, p.P_SIZE AS P_SIZE, p.P_CONTAINER AS P_CONTAINER FROM lineorder AS l INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;

速度:

Progress: 38.35 million rows, 1.77 GB ( 263.42 thousand rows/s., 12.14 MB/s.)

4.2、 集群构建数据

集群3个实例,每个实例占1/3的数据

CREATE TABLE default.lineorder (LO_ORDERKEY UInt32, LO_LINENUMBER UInt8, LO_CUSTKEY UInt32, LO_PARTKEY UInt32, LO_SUPPKEY UInt32, LO_ORDERDATE Date, LO_ORDERPRIORITY LowCardinality(String), LO_SHIPPRIORITY UInt8, LO_QUANTITY UInt8, LO_EXTENDEDPRICE UInt32, LO_ORDTOTALPRICE UInt32, LO_DISCOUNT UInt8, LO_REVENUE UInt32, LO_SUPPLYCOST UInt32, LO_TAX UInt8, LO_COMMITDATE Date, LO_SHIPMODE LowCardinality(String)) ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) SETTINGS index_granularity = 8192
CREATE TABLE default.lineorder_shard (LO_ORDERKEY UInt32, LO_LINENUMBER UInt8, LO_CUSTKEY UInt32, LO_PARTKEY UInt32, LO_SUPPKEY UInt32, LO_ORDERDATE Date, LO_ORDERPRIORITY LowCardinality(String), LO_SHIPPRIORITY UInt8, LO_QUANTITY UInt8, LO_EXTENDEDPRICE UInt32, LO_ORDTOTALPRICE UInt32, LO_DISCOUNT UInt8, LO_REVENUE UInt32, LO_SUPPLYCOST UInt32, LO_TAX UInt8, LO_COMMITDATE Date, LO_SHIPMODE LowCardinality(String)) ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY) SETTINGS index_granularity = 8192;
CREATE TABLE default.lineorder_shard_all AS lineorder ENGINE = Distributed(perftest_3shards_1replicas, default, lineorder, rand())
insert into lineorder_shard select * from lineorder_shard_all where LO_ORDERKEY%3=0;
insert into lineorder_shard select * from lineorder_shard_all where LO_ORDERKEY%3=1;
insert into lineorder_shard select * from lineorder_shard_all where LO_ORDERKEY%3=2;
drop table lineorder_shard_all;
CREATE TABLE lineorder_shard_all AS lineorder_shard ENGINE = Distributed(perftest_3shards_1replicas, default, lineorder_shard, rand())

select count(*) from lineorder_shard ;

┌───count()─┐
│ 200025041 │
└──── ────┘

┌───count()─┐
│ 200018447 │
└───── ───┘

┌───count()─┐
│ 199994414 │
└────── ──┘

4.3、 ClickHouse-单表测试SQL



4.4、 ClickHouse-多表测试SQL

更改SQL:

  • 字段更改成大写,表名小写
  • SQL更改:LO_ORDERDATE 改成:toInt32(toYYYYMMDD(LO_ORDERDATE))





5、TiDB模拟数据

5.1、构建数据

创建相关表

【导入数据】:

for i in {1…100};do echo “----------$i-------------”;mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘lineorder.tbl.$i’ INTO TABLE lineorder FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”;done

mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘/opt/soft/data_dir/dates.tbl’ INTO TABLE dates FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”

mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘/opt/soft/data_dir/customer.tbl’ INTO TABLE customer FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”

mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘/opt/soft/data_dir/supplier.tbl’ INTO TABLE supplier FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”

mysql -uroot -pxxx -h127.0.0.1 -Pxxx lcl -e “LOAD DATA LOCAL INFILE ‘/opt/soft/data_dir/part.tbl’ INTO TABLE part FIELDS TERMINATED BY ‘|’ LINES TERMINATED BY ‘\n’”

【添加至TiFlash】:

alter table lineorder SET TIFLASH REPLICA 1;

alter table customer SET TIFLASH REPLICA 1;

alter table dates SET TIFLASH REPLICA 1;

alter table part SET TIFLASH REPLICA 1;
alter table supplier SET TIFLASH REPLICA 1;

【查看】:

SELECT * FROM information_schema.tiflash_replica ;

【汇聚数据到大单表】:

导出:

time ./dumpling -uroot -pxxx -h127.0.0.1 -Pxxx -o dumping_lineorder_flat/ --filetype sql --no-header -F 256MiB --threads 4 --sql 'SELECT LO_ORDERKEY , LO_ORDERDATE , LO_LINENUMBER , LO_CUSTKEY , LO_PARTKEY , LO_SUPPKEY , LO_ORDERPRIORITY , LO_SHIPPRIORITY , LO_QUANTITY , LO_EXTENDEDPRICE , LO_ORDTOTALPRICE , LO_DISCOUNT , LO_REVENUE , LO_SUPPLYCOST , LO_TAX , LO_COMMITDATE , LO_SHIPMODE , C_NAME , C_ADDRESS , C_CITY , C_NATION , C_REGION , C_PHONE , C_MKTSEGMENT , S_NAME , S_ADDRESS , S_CITY , S_NATION , S_REGION , S_PHONE , P_NAME , P_MFGR , P_CATEGORY , P_BRAND , P_COLOR , P_TYPE , P_SIZE , P_CONTAINER FROM lcl.lineorder l INNER JOIN lcl.customer c ON (c.C_CUSTKEY = l.LO_CUSTKEY) INNER JOIN lcl.supplier s ON (s.S_SUPPKEY = l.LO_SUPPKEY) INNER JOIN lcl.part p ON (p.P_PARTKEY = l.LO_PARTKEY) ’

【导入到大单表】:

【模拟loader 相关文件-创建文件】:

lcl.lineorder_flat-schema.sql

lcl-schema-create.sql

【修改文件名】:

for i in {000…885};do echo “------- $i -----”;mv result.000000$i.sql lcl.lineorder_flat.000000$i.sql;done

【替换文件内容】:

for i in {000…885};do echo “------- $i -----”;sed -i ‘s/INSERT INTO ``/INSERT INTO lineorder_flat/g’ lcl.lineorder_flat.000000$i.sql;done

【导入至大单表】:

cd /opt/soft/tidb-enterprise-tools-nightly-linux-amd64

./bin/loader -c=loader.toml

5.2、测试SQL

参考上面结果: