SQL练习-用户留存问题

典型题目:APP 某页面推出了数据分析系列直播课程介绍,用户可以选择报名任意一场或多场直播课。请统计每个科目最大同时在线人数(按 course_id 排序)

数据源表

-- 课程表
drop table if exists course_tb;
CREATE TABLE course_tb(
	course_id int(10) NOT NULL,
	course_name char(10) NOT NULL,
	course_datetime char(30) NOT NULL);
INSERT INTO course_tb VALUES(1, 'Python', '2021-12-1 19:00-
21:00');
INSERT INTO course_tb VALUES(2, 'SQL', '2021-12-2 19:00-21:00');
INSERT INTO course_tb VALUES(3, 'R', '2021-12-3 19:00-21:00');

-- 上课情况表
drop table if exists attend_tb;
CREATE TABLE attend_tb(
	user_id int(10) NOT NULL,
	course_id int(10) NOT NULL,
	in_datetime datetime NOT NULL,
	out_datetime datetime NOT NULL
);
INSERT INTO attend_tb VALUES(100, 1, '2021-12-1 19:00:00', '2021-
12-1 19:28:00');
INSERT INTO attend_tb VALUES(100, 1, '2021-12-1 19:30:00', '2021-
12-1 19:53:00');
INSERT INTO attend_tb VALUES(101, 1, '2021-12-1 19:00:00', '2021-
12-1 20:55:00');
INSERT INTO attend_tb VALUES(102, 1, '2021-12-1 19:00:00', '2021-
12-1 19:05:00');
INSERT INTO attend_tb VALUES(104, 1, '2021-12-1 19:00:00', '2021-
12-1 20:59:00');
INSERT INTO attend_tb VALUES(101, 2, '2021-12-2 19:05:00', '2021-
12-2 20:58:00');
INSERT INTO attend_tb VALUES(102, 2, '2021-12-2 18:55:00', '2021-
12-2 21:00:00');
INSERT INTO attend_tb VALUES(104, 2, '2021-12-2 18:57:00', '2021-
12-2 20:56:00');
INSERT INTO attend_tb VALUES(107, 2, '2021-12-2 19:10:00', '2021-
12-2 19:18:00');
INSERT INTO attend_tb VALUES(100, 3, '2021-12-3 19:01:00', '2021-
12-3 21:00:00');
INSERT INTO attend_tb VALUES(102, 3, '2021-12-3 18:58:00', '2021-
12-3 19:05:00');
INSERT INTO attend_tb VALUES(108, 3, '2021-12-3 19:01:00', '2021-
12-3 19:56:00');
  • 上课情况表(attend_tb)
    在这里插入图片描述
  • 课程表(course_tb)
    在这里插入图片描述

解题思路

  • 我们需要统计最大在线,涉及到进入和离开直播间,因此是一个累加的过程,如果要用到累加,我们就需要使用到窗口函数sum()
  1. 我们规定,进入时间代表该时刻直播间人数加1,因此增加字段uv,此时为1;推出时间代表该时刻直播间人数减1,因此增加字段uv,此时为-1
-- 合并后实现累加的前提条件
WITH t1 AS (
SELECT user_id
	,course_id
    ,in_datetime AS dt
    ,1 AS uv
FROM attend_tb
UNION ALL
SELECT user_id
	,course_id
    ,out_datetime AS dt
    ,-1 AS uv
FROM attend_tb
)
  1. 利用窗口函数sum后取max提取出最大在线人数

完整代码为:

-- SELECT * FROM sqlpartice.attend_tb;

WITH t1 AS (
SELECT user_id
	,course_id
    ,in_datetime AS dt
    ,1 AS uv
FROM attend_tb
UNION ALL
SELECT user_id
	,course_id
    ,out_datetime AS dt
    ,-1 AS uv
FROM attend_tb
)

SELECT
	a.course_id
    ,course_name
    ,MAX(cnt)
FROM(
	SELECT
		user_id
		,course_id
		,dt
		-- 这里对uv取降序处理,是因为如果同一时间有进有出,应该要先增加再减少的逻辑
		,SUM(uv) OVER(PARTITION BY course_id ORDER BY dt ,uv DESC) AS cnt
	FROM t1
)a
JOIN course_tb ON a.course_id = course_tb.course_id
GROUP BY course_id,course_name
ORDER BY course_id

额外变式:统计直播中前15分钟和最后五分钟直播间的同时在线人数,并且找出峰值点

  • 变式的难点主要是要对时间字段进行处理,可以用最简单的截取字符串并转换为datetime,还有对时间字段进行筛选
-- 解析课程时间段为开始和结束时间
WITH course_time_parsed AS (
    SELECT 
        course_id,
        course_name,
        STR_TO_DATE(CONCAT('2021-12-', 
                  SUBSTRING_INDEX(course_datetime, ' ', 1), 
                  ' ', 
                  SUBSTRING_INDEX(SUBSTRING_INDEX(course_datetime, '-', 1), ' ', -1)), 
                  '%Y-%m-%d %H:%i') AS start_time,
        STR_TO_DATE(CONCAT('2021-12-', 
                  SUBSTRING_INDEX(course_datetime, ' ', 1), 
                  ' ', 
                  SUBSTRING_INDEX(course_datetime, '-', -1)), 
                  '%Y-%m-%d %H:%i') AS end_time
    FROM course_tb
),

-- 计算核心时段(课前15分钟至课后5分钟)
course_window AS (
    SELECT 
        course_id,
        course_name,
        DATE_SUB(start_time, INTERVAL 15 MINUTE) AS core_start,
        DATE_ADD(end_time, INTERVAL 5 MINUTE) AS core_end
    FROM course_time_parsed
),

-- 合并进出事件并过滤核心时段
attendance_events AS (
    SELECT 
        user_id,
        course_id,
        in_datetime AS event_time,
        1 AS delta
    FROM attend_tb
    JOIN course_window USING(course_id)
    WHERE in_datetime BETWEEN core_start AND core_end
    
    UNION ALL
    
    SELECT 
        user_id,
        course_id,
        STR_TO_DATE(out_datetime, '%Y-%m-%d %H:%i:%s') AS event_time,
        -1 AS delta
    FROM attend_tb
    JOIN course_window USING(course_id)
    WHERE STR_TO_DATE(out_datetime, '%Y-%m-%d %H:%i:%s') BETWEEN core_start AND core_end
),

-- 计算实时在线人数
realtime_online AS (
    SELECT
        course_id,
        event_time,
        SUM(delta) OVER (
            PARTITION BY course_id 
            ORDER BY event_time, delta DESC
        ) AS concurrent_users
    FROM attendance_events
),

-- 标记峰值时间点
peak_markers AS (
    SELECT
        course_id,
        event_time,
        concurrent_users,
        ROW_NUMBER() OVER (
            PARTITION BY course_id 
            ORDER BY concurrent_users DESC, event_time
        ) AS peak_rank
    FROM realtime_online
)

-- 最终结果
SELECT
    c.course_id,
    c.course_name,
    p.concurrent_users AS max_concurrent_users,
    p.event_time AS peak_time
FROM peak_markers p
JOIN course_window c USING(course_id)
WHERE p.peak_rank = 1
ORDER BY c.course_id;

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值