典型题目:APP 某页面推出了数据分析系列直播课程介绍,用户可以选择报名任意一场或多场直播课。请统计每个科目最大同时在线人数(按 course_id 排序)
数据源表
-- 课程表
drop table if exists course_tb;
CREATE TABLE course_tb(
course_id int(10) NOT NULL,
course_name char(10) NOT NULL,
course_datetime char(30) NOT NULL);
INSERT INTO course_tb VALUES(1, 'Python', '2021-12-1 19:00-
21:00');
INSERT INTO course_tb VALUES(2, 'SQL', '2021-12-2 19:00-21:00');
INSERT INTO course_tb VALUES(3, 'R', '2021-12-3 19:00-21:00');
-- 上课情况表
drop table if exists attend_tb;
CREATE TABLE attend_tb(
user_id int(10) NOT NULL,
course_id int(10) NOT NULL,
in_datetime datetime NOT NULL,
out_datetime datetime NOT NULL
);
INSERT INTO attend_tb VALUES(100, 1, '2021-12-1 19:00:00', '2021-
12-1 19:28:00');
INSERT INTO attend_tb VALUES(100, 1, '2021-12-1 19:30:00', '2021-
12-1 19:53:00');
INSERT INTO attend_tb VALUES(101, 1, '2021-12-1 19:00:00', '2021-
12-1 20:55:00');
INSERT INTO attend_tb VALUES(102, 1, '2021-12-1 19:00:00', '2021-
12-1 19:05:00');
INSERT INTO attend_tb VALUES(104, 1, '2021-12-1 19:00:00', '2021-
12-1 20:59:00');
INSERT INTO attend_tb VALUES(101, 2, '2021-12-2 19:05:00', '2021-
12-2 20:58:00');
INSERT INTO attend_tb VALUES(102, 2, '2021-12-2 18:55:00', '2021-
12-2 21:00:00');
INSERT INTO attend_tb VALUES(104, 2, '2021-12-2 18:57:00', '2021-
12-2 20:56:00');
INSERT INTO attend_tb VALUES(107, 2, '2021-12-2 19:10:00', '2021-
12-2 19:18:00');
INSERT INTO attend_tb VALUES(100, 3, '2021-12-3 19:01:00', '2021-
12-3 21:00:00');
INSERT INTO attend_tb VALUES(102, 3, '2021-12-3 18:58:00', '2021-
12-3 19:05:00');
INSERT INTO attend_tb VALUES(108, 3, '2021-12-3 19:01:00', '2021-
12-3 19:56:00');
- 上课情况表(attend_tb)

- 课程表(course_tb)

解题思路
- 我们需要统计最大在线,涉及到进入和离开直播间,因此是一个累加的过程,如果要用到累加,我们就需要使用到窗口函数sum()
- 我们规定,进入时间代表该时刻直播间人数加1,因此增加字段uv,此时为1;推出时间代表该时刻直播间人数减1,因此增加字段uv,此时为-1
-- 合并后实现累加的前提条件
WITH t1 AS (
SELECT user_id
,course_id
,in_datetime AS dt
,1 AS uv
FROM attend_tb
UNION ALL
SELECT user_id
,course_id
,out_datetime AS dt
,-1 AS uv
FROM attend_tb
)
- 利用窗口函数sum后取max提取出最大在线人数
完整代码为:
-- SELECT * FROM sqlpartice.attend_tb;
WITH t1 AS (
SELECT user_id
,course_id
,in_datetime AS dt
,1 AS uv
FROM attend_tb
UNION ALL
SELECT user_id
,course_id
,out_datetime AS dt
,-1 AS uv
FROM attend_tb
)
SELECT
a.course_id
,course_name
,MAX(cnt)
FROM(
SELECT
user_id
,course_id
,dt
-- 这里对uv取降序处理,是因为如果同一时间有进有出,应该要先增加再减少的逻辑
,SUM(uv) OVER(PARTITION BY course_id ORDER BY dt ,uv DESC) AS cnt
FROM t1
)a
JOIN course_tb ON a.course_id = course_tb.course_id
GROUP BY course_id,course_name
ORDER BY course_id
额外变式:统计直播中前15分钟和最后五分钟直播间的同时在线人数,并且找出峰值点
- 变式的难点主要是要对时间字段进行处理,可以用最简单的截取字符串并转换为datetime,还有对时间字段进行筛选
-- 解析课程时间段为开始和结束时间
WITH course_time_parsed AS (
SELECT
course_id,
course_name,
STR_TO_DATE(CONCAT('2021-12-',
SUBSTRING_INDEX(course_datetime, ' ', 1),
' ',
SUBSTRING_INDEX(SUBSTRING_INDEX(course_datetime, '-', 1), ' ', -1)),
'%Y-%m-%d %H:%i') AS start_time,
STR_TO_DATE(CONCAT('2021-12-',
SUBSTRING_INDEX(course_datetime, ' ', 1),
' ',
SUBSTRING_INDEX(course_datetime, '-', -1)),
'%Y-%m-%d %H:%i') AS end_time
FROM course_tb
),
-- 计算核心时段(课前15分钟至课后5分钟)
course_window AS (
SELECT
course_id,
course_name,
DATE_SUB(start_time, INTERVAL 15 MINUTE) AS core_start,
DATE_ADD(end_time, INTERVAL 5 MINUTE) AS core_end
FROM course_time_parsed
),
-- 合并进出事件并过滤核心时段
attendance_events AS (
SELECT
user_id,
course_id,
in_datetime AS event_time,
1 AS delta
FROM attend_tb
JOIN course_window USING(course_id)
WHERE in_datetime BETWEEN core_start AND core_end
UNION ALL
SELECT
user_id,
course_id,
STR_TO_DATE(out_datetime, '%Y-%m-%d %H:%i:%s') AS event_time,
-1 AS delta
FROM attend_tb
JOIN course_window USING(course_id)
WHERE STR_TO_DATE(out_datetime, '%Y-%m-%d %H:%i:%s') BETWEEN core_start AND core_end
),
-- 计算实时在线人数
realtime_online AS (
SELECT
course_id,
event_time,
SUM(delta) OVER (
PARTITION BY course_id
ORDER BY event_time, delta DESC
) AS concurrent_users
FROM attendance_events
),
-- 标记峰值时间点
peak_markers AS (
SELECT
course_id,
event_time,
concurrent_users,
ROW_NUMBER() OVER (
PARTITION BY course_id
ORDER BY concurrent_users DESC, event_time
) AS peak_rank
FROM realtime_online
)
-- 最终结果
SELECT
c.course_id,
c.course_name,
p.concurrent_users AS max_concurrent_users,
p.event_time AS peak_time
FROM peak_markers p
JOIN course_window c USING(course_id)
WHERE p.peak_rank = 1
ORDER BY c.course_id;

1813

被折叠的 条评论
为什么被折叠?



