ClickHouse中MySQL(TiDB)的映射表的使用优化

Updated on with 0 views and 0 comments

此篇作为ClickHouse的SQL优化的衍生篇,主要对MySQL/TiDB等映射表的使用角度进行初步探索。

基于下述的整体逻辑有点绕,特别是第三种意外发现的场景,简直就是不可以常理推导,存在一个非常大的不确定性。于是可对映射表在CH的SQL使用的做以下的最简洁的规范

  • 所有映射表作为驱动表时无须用子查询封装,但作为被驱动表时一律用子查询进行封装(简化规范,保持一致性)
  • 在SELECT中只提取需要用到的字段(减少字段数来降低的网络IO,及内存使用)
  • 在WHERE语句中尽可能的提前过滤字段,但不能对字段做CH的函数处理(减少记录数来降低的网络IO,及内存使用)
SELECT A.a1, B.b1,toDate(A.a4) as dat
FROM A 
[GLOBAL] [LEFT/INNER/RIGHT] JOIN B on A.a2 = B.b2
WHERE A.a3 = 'xxxx' and B.b3 = 'yyyy' and dat = '2021-11-21'

A表为驱动表,B表为被驱动表,ClickHouse对于驱动表与非驱动表的下推逻辑不同,总结下来有以下几点

  1. SELECT字段下推逻辑
    • 自动推导驱动表A在SQL中使用过的字段,并下推到映射表的查询,即发送给MySQL/TiDB的查询中只包含这些字段
    • 被驱动表B的字段下推逻辑
      • 当使用GLOBAL ... JOIN时,不会推导B表在SQL中使用过的字段,即B表的所有字段都会发给MySQL/TiDB的查询中
      • 当使用.... JOIN时,自动推导B表在SQL中使用过的字段,并下推到映射表的查询,即发送给MySQL/TiDB的查询中只包含这些字段
  2. WHERE语句中的字段下推逻辑
    • 驱动表A在WHERE语句中的过滤字段(不包含被函数处理过的字段)会下推到MySQL/TiDB的查询中(即A.a3= 'xxxx'会下推, toDate(A.a4)='2021-11-21'不会下推)
    • 被驱动表B在WHERE语句中的任何过滤字段不会MySQL/TiDB的查询中(即B.b3 = 'yyyy'不会下推)
SELECT A.a1, toDate(A.a4) as dat
FROM A
WHERE A.a3 = 'xxxx' and A.a5 >= '2021-11-21 00:00:00' and A.a5 <= '2021-11-21 23:59:59' [and dat = '2021-11-21']

在以上的单独子查询中,ClickHouse对映射表A的下推逻辑,总结下来有以下几点

  1. SELECT字段下推逻辑
    • 在SQL中使用过的字段,都会自动下推到MySQL/TiDB的查询中,即A.a1, A.a4会下推
  2. WHERE语句中的字段下推逻辑
    • 当WHERE语句中未使用函数部分的过滤字段会成功下推到MySQL/TiDB的查询中,即A.a3 =
SELECT A.
FROM (
    SELECT a1, a2, a3, toDate(A.a4) as dat
    FROM A
    WHERE a3 = 'xxxx' and a5 >= '2021-11-21 00:00:00' and a5 <= '2021-11-21 23:59:59'
      [and dat = '2021-11-21']  -- flag1
) A 
LEFT JOIN B on A.a2 = B.b2
WHERE B.b3 = 'yyyy' 
    [and A.a3 = 'xxxx']  -- flag2

上述的代码案例中的逻辑最妖,具体表现如下

  1. 当flag1 & flag2都存在时,则A子查询中的WHERE的过滤字段不会下推到TiDB中,反而只推导子查询外的 A.a3 = 'xxxx'到TiDB中
  2. 当flag1存在,flag2不存在时,则A子查询中的WHERE的过滤字段会成功下推到TiDB中
  3. 当flag1不存在,flag2存在时,则A子查询中的WHERE的过滤字段也会成功下推到TiDB中

以下为大致的探索过程,由于使用的是CH,TiDB的慢查询记录,估为了构造慢查询,造成SQL不是很一致,以及大家在复现过程中可能需要多试几次

MySQL(TiDB)的映射表作为驱动表时的规则探索

测试语句

select 
    if(vendor_create_time is null, toDate(create_time),toDate(vendor_create_time)) as dat
  , ac.vendor_id                  as  vendor_id
  , ac.vendor_account_id          as  vendor_account_id
  , eva.vendor_account_name       as  vendor_account_name
  , eva.vendor_advertiser_id      as  vendor_advertiser_id
  , eva.subordinate_account_id    as  subordinate_account_id
  , ac.project_id                 as  project_id
  , count(distinct ac.id)                        as  1_in_creative 
  from makepolo.ad_creative ac
  left join makepolo.entity_vendor_account eva on toInt64(eva.id)= toInt64(ac.vendor_account_id) 
  where ac.create_time >= '2021-11-01 00:00:00'
  and ac.create_time <= '2021-11-22 23:59:59'
  and  dat >= '2021-11-01' and dat <= '2021-11-22'
  and eva.status=1
  and ac.company_id=20
  group by 
  dat
   ,vendor_id
   ,vendor_account_id
   ,vendor_account_name
   ,vendor_advertiser_id
   ,subordinate_account_id
   ,project_id
  1. 从CH的system.query_log中的columns字段中可以看出,此时无论驱动表还是被驱动表,都自动推导了需要使用的字段
['makepolo.ad_creative.company_id','makepolo.ad_creative.create_time','makepolo.ad_creative.id','makepolo.ad_creative.project_id','makepolo.ad_creative.vendor_account_id','makepolo.ad_creative.vendor_create_time','makepolo.ad_creative.vendor_id','makepolo.entity_vendor_account.id','makepolo.entity_vendor_account.status','makepolo.entity_vendor_account.subordinate_account_id','makepolo.entity_vendor_account.vendor_account_name','makepolo.entity_vendor_account.vendor_advertiser_id']
  1. 从TiDB的INFORMATION_SCHEMA.slow_query的Query字段中可以看出
    • 驱动表的WHERE语句中未使用函数的过滤字段都成功完成下推到TiDB
    • SELECT中的字段也如上述所示,完成了到TiDB下推。
SELECT `id`, `vendor_id`, `vendor_account_id`, `company_id`, `project_id`, `create_time`, `vendor_create_time` FROM `makepolo`.`ad_creative` WHERE (`create_time` >= '2021-11-01 00:00:00') AND (`create_time` <= '2021-11-22 23:59:59') AND (`company_id` = 20);

MySQL(TiDB)的映射表作为被驱动表时的规则探索

测试语句

select
        toDate(cr.date)                 as dat
        , vaq.company_id                as vaq_company_id
        , vaq.subordinate_user_id       as vaq_subordinate_user_id
        , qcp.marketing_goal            as qcp_marketing_goal

        , sum(cr.stat_cost)/10000                                                                                   as cost_n
        , sum(if(qctv.create_source = 1, cr.stat_cost ,0))/10000                                                    as api_cost
        , sum(if(qcp.marketing_goal = 'VIDEO_PROM_GOODS', cr.stat_cost, 0))/10000                                   as video_cost
        , sum(if(qctv.create_source = 1 and qcp.marketing_goal = 'VIDEO_PROM_GOODS', cr.stat_cost, 0))/10000        as video_api_cost
        , sum(if(qcp.marketing_goal = 'LIVE_PROM_GOODS', cr.stat_cost, 0))/10000                                    as live_cost
        , sum(if(qctv.create_source = 1 and qcp.marketing_goal = 'LIVE_PROM_GOODS', cr.stat_cost, 0))/10000         as live_api_cost
        , 0   as creative_count
        , 0   as api_creative_count
        , 0   as ad_count
        , 0   as api_ad_count

        from makebenz.vendor_creative_report cr final
        global join tidb_makebenz.vendor_account_qianchuan vaq on cr.advertiser_id = vaq.advertiser_id
        left join tidb_makebenz.qc_vendor_creative qctv on cr.creative_id = qctv.creative_id
        left join  tidb_makebenz.qc_campaign qcp on cr.campaign_id = qcp.campaign_id
        where
            cr.date >= today() - 3 and cr.date <= today()
            and cr.date >= '2021-11-12' and cr.date <=  '2021-11-22'
            and qctv.create_time >= '2021-11-01 00:00:00' and qctv.create_time <= '2021-11-22 23:59:59'
        group by dat, vaq_company_id, vaq_subordinate_user_id, qcp_marketing_goal
  1. 从CH的system.query_log中的columns字段中可以看出,
    • 未带GLOBAL关键词的tidb_makebenz.qc_vendor_creativetidb_makebenz.qc_campaign的字段都完成了下推
    • 带GLOBAL关键词的tidb_makebenz.vendor_account_qianchuan的所有字段都被提取出来,并且有个临时表,这与GLOBAL的逻辑有关,这里不细述
['_temporary_and_external_tables.`_tmp_0333ae46-cd1d-4285-8333-ae46cd1d0285`','_temporary_and_external_tables.`_tmp_70cc555b-adf2-465d-b0cc-555badf2c65d`','_temporary_and_external_tables.`_tmp_a59a3198-3857-4042-a59a-31983857c042`','_temporary_and_external_tables.`_tmp_b704469b-50d1-409d-b704-469b50d1409d`','_temporary_and_external_tables.`_tmp_bcdf62a5-84ef-41e0-bcdf-62a584efb1e0`','_temporary_and_external_tables.`_tmp_e3041940-3899-4f2c-a304-19403899ff2c`','makebenz.vendor_creative_report','makebenz.vendor_creative_report_local','tidb_makebenz.qc_campaign','tidb_makebenz.qc_vendor_creative','tidb_makebenz.vendor_account_qianchuan']	['_temporary_and_external_tables.`_tmp_0333ae46-cd1d-4285-8333-ae46cd1d0285`.advertiser_id','_temporary_and_external_tables.`_tmp_0333ae46-cd1d-4285-8333-ae46cd1d0285`.company_id','_temporary_and_external_tables.`_tmp_0333ae46-cd1d-4285-8333-ae46cd1d0285`.subordinate_user_id','_temporary_and_external_tables.`_tmp_70cc555b-adf2-465d-b0cc-555badf2c65d`.advertiser_id','_temporary_and_external_tables.`_tmp_70cc555b-adf2-465d-b0cc-555badf2c65d`.company_id','_temporary_and_external_tables.`_tmp_70cc555b-adf2-465d-b0cc-555badf2c65d`.subordinate_user_id','_temporary_and_external_tables.`_tmp_a59a3198-3857-4042-a59a-31983857c042`.advertiser_id','_temporary_and_external_tables.`_tmp_a59a3198-3857-4042-a59a-31983857c042`.company_id','_temporary_and_external_tables.`_tmp_a59a3198-3857-4042-a59a-31983857c042`.subordinate_user_id','_temporary_and_external_tables.`_tmp_b704469b-50d1-409d-b704-469b50d1409d`.advertiser_id','_temporary_and_external_tables.`_tmp_b704469b-50d1-409d-b704-469b50d1409d`.company_id','_temporary_and_external_tables.`_tmp_b704469b-50d1-409d-b704-469b50d1409d`.subordinate_user_id','_temporary_and_external_tables.`_tmp_bcdf62a5-84ef-41e0-bcdf-62a584efb1e0`.advertiser_id','_temporary_and_external_tables.`_tmp_bcdf62a5-84ef-41e0-bcdf-62a584efb1e0`.company_id','_temporary_and_external_tables.`_tmp_bcdf62a5-84ef-41e0-bcdf-62a584efb1e0`.subordinate_user_id','_temporary_and_external_tables.`_tmp_e3041940-3899-4f2c-a304-19403899ff2c`.advertiser_id','_temporary_and_external_tables.`_tmp_e3041940-3899-4f2c-a304-19403899ff2c`.company_id','_temporary_and_external_tables.`_tmp_e3041940-3899-4f2c-a304-19403899ff2c`.subordinate_user_id','makebenz.vendor_creative_report.advertiser_id','makebenz.vendor_creative_report.campaign_id','makebenz.vendor_creative_report.creative_id','makebenz.vendor_creative_report.date','makebenz.vendor_creative_report.stat_cost','makebenz.vendor_creative_report_local.advertiser_id','makebenz.vendor_creative_report_local.campaign_id','makebenz.vendor_creative_report_local.creative_id','makebenz.vendor_creative_report_local.date','makebenz.vendor_creative_report_local.stat_cost','tidb_makebenz.qc_campaign.campaign_id','tidb_makebenz.qc_campaign.marketing_goal','tidb_makebenz.qc_vendor_creative.create_source','tidb_makebenz.qc_vendor_creative.create_time','tidb_makebenz.qc_vendor_creative.creative_id','tidb_makebenz.vendor_account_qianchuan.access_token','tidb_makebenz.vendor_account_qianchuan.access_token_expire','tidb_makebenz.vendor_account_qianchuan.account_type','tidb_makebenz.vendor_account_qianchuan.admin_advertiser_id','tidb_makebenz.vendor_account_qianchuan.admin_advertiser_name','tidb_makebenz.vendor_account_qianchuan.advertiser_id','tidb_makebenz.vendor_account_qianchuan.app_id','tidb_makebenz.vendor_account_qianchuan.authorizer_data','tidb_makebenz.vendor_account_qianchuan.company_id','tidb_makebenz.vendor_account_qianchuan.create_time','tidb_makebenz.vendor_account_qianchuan.id','tidb_makebenz.vendor_account_qianchuan.name','tidb_makebenz.vendor_account_qianchuan.name_as','tidb_makebenz.vendor_account_qianchuan.refresh_token','tidb_makebenz.vendor_account_qianchuan.refresh_token_expire','tidb_makebenz.vendor_account_qianchuan.remark','tidb_makebenz.vendor_account_qianchuan.status','tidb_makebenz.vendor_account_qianchuan.subordinate_user_id','tidb_makebenz.vendor_account_qianchuan.update_time','tidb_makebenz.vendor_account_qianchuan.user_id']
  1. 从TiDB的INFORMATION_SCHEMA.slow_query的Query字段中可以看出
    • 被驱动表的WHERE语句中的过滤字段并未完成到TiDB的下推
    • 被驱动表的SELECT字段如上所示,确实完成了到TiDB的下推
SELECT `creative_id`, `create_source`, `create_time` FROM `makebenz`.`qc_vendor_creative`

MySQL(TiDB)的独立子查询的规则探索

select id, company_id, vendor_id, vendor_account_id, project_id
    , if(vendor_create_time is null, toDate(create_time), toDate(vendor_create_time)) as dat
from makepolo.ad_creative a 
where a.create_time >= '2021-11-01 00:00:00'
  and a.create_time <= '2021-11-01 23:59:59' 
  and dat >= '2021-11-01' and dat <= '2021-11-01' 
  and company_id=20 
  1. 从TiDB的INFORMATION_SCHEMA.slow_query的Query字段中可以看出
    • SELECT与WHERE的过滤字段都成功下推到TiDB中
SELECT `id`, `vendor_id`, `vendor_account_id`, `company_id`, `project_id`, `create_time`, `vendor_create_time` FROM `makepolo`.`ad_creative` WHERE (`create_time` >= '2021-11-01 00:00:00') AND (`create_time` <= '2021-11-01 23:59:59') AND (`company_id` = 20);

最骚SQL的意外发现

-- 运行40S
select dat
      , ac.vendor_id                  as  vendor_id
      , ac.vendor_account_id          as  vendor_account_id
      , eva.vendor_account_name       as  vendor_account_name
      , eva.vendor_advertiser_id      as  vendor_advertiser_id
      , eva.subordinate_account_id    as  subordinate_account_id
      , ac.project_id                 as  project_id
      , count(distinct ac.id)                        as  1_in_creative 
from(
    select id, company_id, vendor_id, vendor_account_id, project_id
        , if(vendor_create_time is null, toDate(create_time), toDate(vendor_create_time)) as dat
    from makepolo.ad_creative a 
    where a.create_time >= '2021-11-01 00:00:00'
      and a.create_time <= '2021-11-01 23:59:59' 
      and dat >= '2021-11-01' and dat <= '2021-11-01' 
      and company_id=20 
)ac
left join makepolo.entity_vendor_account eva on toInt64(eva.id)= toInt64(ac.vendor_account_id)
where  eva.status = 1  and ac.company_id = 20
group by dat
    , vendor_id
    , vendor_account_id
    , vendor_account_name
    , vendor_advertiser_id
    , subordinate_account_id
    , project_id;
  1. 从TiDB的INFORMATION_SCHEMA.slow_query的Query字段中可以看出
    • makepolo.ad_creative的子查询中WHERE的过滤字段未成功下推到TiDB中
    • 反而是子查询外面的ac.company_id = 20完成了下推
SELECT `id`, `vendor_id`, `vendor_account_id`, `company_id`, `project_id`, `create_time`, `vendor_create_time` FROM `makepolo`.`ad_creative` WHERE `company_id` = 20;
-- 运行3S
select dat
      , ac.vendor_id                  as  vendor_id
      , ac.vendor_account_id          as  vendor_account_id
      , eva.vendor_account_name       as  vendor_account_name
      , eva.vendor_advertiser_id      as  vendor_advertiser_id
      , eva.subordinate_account_id    as  subordinate_account_id
      , ac.project_id                 as  project_id
      , count(distinct ac.id)                        as  1_in_creative 
from(
    select id, company_id, vendor_id, vendor_account_id, project_id
        , if(vendor_create_time is null, toDate(create_time), toDate(vendor_create_time)) as dat
    from makepolo.ad_creative a 
    where a.create_time >= '2021-11-01 00:00:00'
      and a.create_time <= '2021-11-01 23:59:59' 
      and dat >= '2021-11-01' and dat <= '2021-11-01' 
      and company_id=20 
)ac
left join makepolo.entity_vendor_account eva on toInt64(eva.id)= toInt64(ac.vendor_account_id)
where  eva.status = 1  -- and ac.company_id = 20
group by dat
    , vendor_id
    , vendor_account_id
    , vendor_account_name
    , vendor_advertiser_id
    , subordinate_account_id
    , project_id;
  1. 相较于第一条SQL就只是把外面的and ac.company_id = 20给注释掉,但从TiDB的INFORMATION_SCHEMA.slow_query的Query字段中可以看出
    • makepolo.ad_creative的子查询中WHERE的过滤字段都成功下推到TiDB中
SELECT `id`, `vendor_id`, `vendor_account_id`, `company_id`, `project_id`, `create_time`, `vendor_create_time` FROM `makepolo`.`ad_creative` WHERE (`create_time` >= '2021-11-01 00:00:00') AND (`create_time` <= '2021-11-01 23:59:59') AND (`company_id` = 20);
-- 运行3S
select dat
      , ac.vendor_id                  as  vendor_id
      , ac.vendor_account_id          as  vendor_account_id
      , eva.vendor_account_name       as  vendor_account_name
      , eva.vendor_advertiser_id      as  vendor_advertiser_id
      , eva.subordinate_account_id    as  subordinate_account_id
      , ac.project_id                 as  project_id
      , count(distinct ac.id)                        as  1_in_creative 
from(
    select id, company_id, vendor_id, vendor_account_id, project_id
        , if(vendor_create_time is null, toDate(create_time), toDate(vendor_create_time)) as dat
    from makepolo.ad_creative a 
    where a.create_time >= '2021-11-01 00:00:00'
      and a.create_time <= '2021-11-01 23:59:59' 
      and company_id=20 
)ac
left join makepolo.entity_vendor_account eva on toInt64(eva.id)= toInt64(ac.vendor_account_id)
where  eva.status = 1   and ac.company_id = 20 and  dat >= '2021-11-01' and dat <= '2021-11-01' 
group by dat
    , vendor_id
    , vendor_account_id
    , vendor_account_name
    , vendor_advertiser_id
    , subordinate_account_id
    , project_id;
  1. 相较于第一条SQL就只是把子查询中被函数处理过的dat的过滤移到了外面,但从TiDB的INFORMATION_SCHEMA.slow_query的Query字段中可以看出
    • makepolo.ad_creative的子查询中WHERE的过滤字段都成功下推到TiDB中
SELECT `id`, `vendor_id`, `vendor_account_id`, `company_id`, `project_id`, `create_time`, `vendor_create_time` FROM `makepolo`.`ad_creative` WHERE (`create_time` >= '2021-11-01 00:00:00') AND (`create_time` <= '2021-11-01 23:59:59') AND (`company_id` = 20);

标题:ClickHouse中MySQL(TiDB)的映射表的使用优化
作者:wuqingbo
地址:http://solo.fancydigital.com.cn/articles/2021/11/22/1637584514025.html