背景
有一张表:
date | store_id | sku | sales |
---|---|---|---|
2023-01-01 | CK005 | 03045 | 50 |
date 代表交易日期,store_id代表门店编号,sku代表商品,sales代表销量。
问题
求出2022年每家店销量前3的sku。
生成的表为4列,第一列为store_id,第二至第四列依次为销量排名前三的sku,如果某家门店只卖两个sku,即没有销量排名第三的sku,那么那个单元格用null值代替
sql解法
select `store_id`,
max(`SKU_1`) as `TOP_SKU_1`,
max(`SKU_2`) AS `TOP_SKU_2`,
max(`SKU_3`) AS `TOP_SKU_3` from (
select `store_id`,
IF(`rank_num` = 1, `sku`, NULL) AS `SKU_1`,
IF(`rank_num` = 2, `sku`, NULL) AS `SKU_2`,
IF(`rank_num` = 3, `sku`, NULL) AS `SKU_3`
from
(select *, row_number() over (partition by `store_id` order by `sales` desc) as `rank_num`
from (
SELECT `store_id`, `sku`, sum(`qty`) as `sales` FROM input
where year(`date`) = 2022
group by `store_id`, `sku`))
where `rank_num` <= 3 )
group by `store_id`
结果校验:
其中:
排序然后case或者if之后是这样的:
select *
from
(select *, row_number() over (partition by `store_id` order by `sales` desc) as `rank_num`
from (
SELECT `store_id`, `sku`, sum(`qty`) as `sales` FROM input
where year(`date`) = 2022
group by `store_id`, `sku`))
where `rank_num` <= 3
然后再加一个group by的操作,用聚合函数max取出每一组的值。
Python解法
- 求出销量前三的sku
df = hist_month.query('year == 2022').groupby(['year','stock_id','sku'],as_index = False).agg({'y':sum})
df['y_rank'] = df.groupby("stock_id")["y"].rank( ascending=False)
top3_df = df.query('y_rank <= 3')
top3_df
temp = top3_df.set_index(['year','stock_id','y_rank'])['sku'].unstack()
temp = temp.rename_axis(columns=None).reset_index()