ClickHouse的JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN都是不支持在ON中都只支持等式判定,无法支持不等式判定。因此提供了一个叫ASOF JOIN的功能,其支持在ON写不等式,但依旧有以下限定
table1 | event | ev_time | user_id | table2 | event | ev_time | user_id |
---|---|---|---|---|---|---|---|
| | event_1_1 | 2021-09-09 12:00:00 | 42 | | | event_2_1 | 2021-09-09 11:59:00 | 42 |
| | event_1_2 | 2021-09-09 13:00:00 | 42 | | | event_2_2 | 2021-09-09 12:30:00 | 42 |
| | | | event_2_3 | 2021-09-09 13:00:00 | 42 |
结果
user_id | event | ev_time | event2 | ev_time2 |
---|---|---|---|---|
42 | event_1_2 | 2021-09-09 05:00:00 | event_2_3 | 2021-09-09 05:00:00 |
42 | event_1_1 | 2021-09-09 04:00:00 | event_2_1 | 2021-09-09 03:59:00 |
select table1.user_id, table1.event, table1.ev_time, table2.event as event2, table2.ev_time as ev_time2
from
(
select 'event_1_1' as event, toDateTime('2021-09-09 12:00:00') as ev_time, 42 as user_id
union all
select 'event_1_2' as event, toDateTime('2021-09-09 13:00:00') as ev_time, 42 as user_id
) table1
asof left join
(
select 'event_2_1' as event, toDateTime('2021-09-09 11:59:00') as ev_time, 42 as user_id
union all
select 'event_2_2' as event, toDateTime('2021-09-09 12:30:00') as ev_time, 42 as user_id
union all
select 'event_2_3' as event, toDateTime('2021-09-09 13:00:00') as ev_time, 42 as user_id
) table2
on table1.user_id = table2.user_id and table1.ev_time >= table2.ev_time
FancyDSP允许一个投放计划同时映射多个AMO订单,同时有个前提,即这些AMO订单的执行周期是无交集的。
举个栗子
基于以上的场景,汇总底表数据中只有分天,分投放计划等执行系统信息,外加一个投放计划于AMO订单的映射表,此时如何准确用汇总数据及映射表信息关联出AMO订单信息呢?
分析可得,此问题最大的难点
举个栗子来说明就是
在上述栗子中,最终实现的数据需要满足以下几点
我们先从几个错误的尝试开始
select dat, data.c_id
, caom.order_id as caom_order_id
, ao.order_id as ao_order_id
from
(
select toDate('2021-09-03') as dat, 1 as c_id
union all
select toDate('2021-09-04') as dat, 1 as c_id
union all
select toDate('2021-09-05') as dat, 1 as c_id
union all
select toDate('2021-09-06') as dat, 1 as c_id
) data
left join (
select 1 as c_id, 1 as order_id
union all
select 1 as c_id, 2 as order_id
) caom on data.c_id = caom.c_id
left join (
select 1 as order_id, toDate('2021-09-03') as start_date, toDate('2021-09-04') as end_date
union all
select 2 as order_id, toDate('2021-09-05') as start_date, toDate('2021-09-05') as end_date
) ao on caom.order_id = ao.order_id
where data.dat >= ao.start_date and data.dat <= ao.end_date
order by dat, ao_order_id
dat | c_id | caom_order_id | ao_order_id |
---|---|---|---|
2021-09-03 | 1 | 1 | 1 |
2021-09-04 | 1 | 1 | 1 |
2021-09-05 | 1 | 2 | 2 |
dat | c_id | caom_order_id | ao_order_id |
---|---|---|---|
2021-09-03 | 1 | 1 | 1 |
2021-09-03 | 1 | 2 | 2 |
2021-09-04 | 1 | 1 | 1 |
2021-09-04 | 1 | 2 | 2 |
2021-09-05 | 1 | 1 | 1 |
2021-09-05 | 1 | 2 | 2 |
2021-09-06 | 1 | 1 | 1 |
2021-09-06 | 1 | 2 | 2 |
在第一个left join时,data数据就已经被放大,然后后面两个asof left join只会在此基础上确保数据不放大,已经无力回天。
select dat, data.c_id
, caom.order_id as caom_order_id
, ao1.order_id as ao1_order_id
, ao.order_id as ao_order_id
from
(
select toDate('2021-09-03') as dat, 1 as c_id
union all
select toDate('2021-09-04') as dat, 1 as c_id
union all
select toDate('2021-09-05') as dat, 1 as c_id
union all
select toDate('2021-09-06') as dat, 1 as c_id
) data
left join (
select 1 as c_id, 1 as order_id
union all
select 1 as c_id, 2 as order_id
) caom on data.c_id = caom.c_id
asof left join (
select 1 as order_id, toDate('2021-09-03') as start_date
union all
select 2 as order_id, toDate('2021-09-05') as start_date
) ao1 on caom.order_id = ao1.order_id and data.dat >= ao1.start_date
asof left join (
select 1 as order_id, toDate('2021-09-04') as end_date
union all
select 2 as order_id, toDate('2021-09-05') as end_date
) ao on ao1.order_id = ao.order_id and data.dat <= ao.end_date
order by dat, caom_order_id
dat | c_id | caom_order_id | ao1_order_id | ao_order_id |
---|---|---|---|---|
2021-09-03 | 1 | 1 | 1 | 1 |
2021-09-03 | 1 | 2 | 0 | 0 |
2021-09-04 | 1 | 1 | 1 | 1 |
2021-09-04 | 1 | 2 | 0 | 0 |
2021-09-05 | 1 | 1 | 1 | 0 |
2021-09-05 | 1 | 2 | 2 | 2 |
2021-09-06 | 1 | 1 | 1 | 0 |
2021-09-06 | 1 | 2 | 2 | 0 |
select dat, data.c_id
, ao1.order_id as ao1_order_id
, ao1.start_date as start_date
, ao.order_id as ao_order_id
, ao.end_date as end_date
from
(
select toDate('2021-09-03') as dat, 1 as c_id
union all
select toDate('2021-09-04') as dat, 1 as c_id
union all
select toDate('2021-09-05') as dat, 1 as c_id
union all
select toDate('2021-09-06') as dat, 1 as c_id
) data
asof left join (
select c_id, caom.order_id as order_id, ao.start_date as start_date
from (
select 1 as c_id, 1 as order_id
union all
select 1 as c_id, 2 as order_id
) caom
join (
select 1 as order_id, toDate('2021-09-03') as start_date
union all
select 2 as order_id, toDate('2021-09-05') as start_date
) ao on caom.order_id = ao.order_id
) ao1 on data.c_id = ao1.c_id and data.dat >= ao1.start_date
asof left join (
select ao.order_id as order_id, ao.end_date as end_date
from (
select 1 as order_id, toDate('2021-09-04') as end_date
union all
select 2 as order_id, toDate('2021-09-05') as end_date
) ao
) ao on ao1.order_id = ao.order_id and data.dat <= ao.end_date
order by dat, ao_order_id
dat | c_id | ao1_order_id | start_date | ao_order_id | end_date |
---|---|---|---|---|---|
2021-09-03 | 1 | 1 | 2021-09-03 | 1 | 2021-09-04 |
2021-09-04 | 1 | 1 | 2021-09-03 | 1 | 2021-09-04 |
2021-09-05 | 1 | 2 | 2021-09-05 | 2 | 2021-09-05 |
2021-09-06 | 1 | 2 | 2021-09-05 | 0 | 1970-01-01 |