ADS(Application Data Store)层是数据仓库的最顶层,直接面向业务应用,为数据分析、报表展示和业务决策提供数据支持。它是数据仓库与业务系统之间的桥梁,将经过清洗、转换和汇总的数据以业务友好的方式呈现给最终用户。
一、ADS层建设思路
1、设计原则
业务导向:ADS层设计应以业务需求为核心,确保数据能够直接支持业务决策和运营分析。
性能优化:针对高频查询场景进行优化,包括合理的分区策略、索引设计和物化视图。
数据一致性:确保数据口径统一,避免不同报表之间的数据不一致。
可扩展性:设计时考虑未来业务扩展需求,预留足够的扩展空间。
易用性:提供简单直观的数据结构,降低业务人员使用门槛。
2、数据模型设计
ADS层通常采用星型模型或雪花模型,主要包含以下几类表:
汇总事实表:按不同维度(时间、地区、产品等)汇总的事实数据。
维度表:包含业务实体的属性信息,如客户、产品、地区等。
指标表:存储预计算的业务指标,如转化率、留存率等。
报表表:直接面向报表展示的宽表,包含多个维度的指标。
3、数据更新策略
增量更新:对于大表,采用增量更新策略,只处理新增或变更的数据。
全量刷新:对于小表或需要保证数据一致性的场景,采用全量刷新策略。
定时调度:根据业务需求设置合理的调度周期,如每日、每周或每月。
二、ADS层应用场景
业务报表:为管理层提供决策支持的各类报表。
运营分析:支持运营人员进行用户行为分析和营销效果评估。
风险控制:提供风险监控和预警数据。
客户服务:支持客服人员进行客户画像分析和精准服务。
产品优化:为产品团队提供用户反馈和使用数据,指导产品迭代。
三、实战案例:用户价值分析报表
1、业务背景
电商平台需要对用户进行价值分层,以便进行精准营销和个性化服务。基于RFM模型(Recency、Frequency、Monetary)对用户进行价值评估,并计算用户的生命周期价值,为运营决策提供数据支持。
2、数据来源
DWS层:用户交易宽表(dws_trade_user_order_td)和用户登录宽表(dws_user_user_login_td)
ADS层:历史用户价值分析数据(用于计算价值发展趋势)
3、实现方案
1)建表
-- 用户价值分析表 CREATE TABLE IF NOT EXISTS ads.ads_user_value_analysis ( dt DATE COMMENT '统计日期', user_id BIGINT COMMENT '用户ID', order_count_td BIGINT COMMENT '累计下单次数', order_amount_td DECIMAL(20,2) COMMENT '累计下单金额', order_last_date DATE COMMENT '最近下单日期', order_first_date DATE COMMENT '首次下单日期', login_count_td BIGINT COMMENT '累计登录次数', login_last_date DATE COMMENT '最近登录日期', average_order_amount DECIMAL(20,2) COMMENT '平均客单价', purchase_cycle_days INT COMMENT '平均购买周期(天)', account_days INT COMMENT '账号存续天数', life_time_value DECIMAL(20,2) COMMENT '生命周期价值(LTV)', recency_score INT COMMENT '最近活跃度评分(R)', frequency_score INT COMMENT '活动频次评分(F)', monetary_score INT COMMENT '消费金额评分(M)', rfm_score INT COMMENT 'RFM总分', user_value_level STRING COMMENT '用户价值分层', active_status STRING COMMENT '活跃状态', life_cycle_status STRING COMMENT '生命周期状态', shopping_preference STRING COMMENT '购物偏好', growth_trend STRING COMMENT '价值发展趋势' ) COMMENT '用户价值分层分析报表,基于RFM模型计算用户价值' PARTITIONED BY (dt STRING) STORED AS ORC TBLPROPERTIES ('orc.compress'='SNAPPY');
2)核心指标计算
RFM模型评分
Recency(最近购买时间):根据最近一次购买距今的天数评分(1-5分) Frequency(购买频率):根据累计购买次数评分(1-5分) Monetary(消费金额):根据累计消费金额评分(1-5分)
用户价值分层
高价值:RFM总分≥13分 中高价值:RFM总分10-12分 中价值:RFM总分7-9分 低价值:RFM总分4-6分 流失风险:RFM总分≤3分
生命周期价值(LTV)
计算公式:平均客单价 × 年购买频率 × 预期客户生命周期(年) 年购买频率:总购买次数×365/账号存续天数
用户状态分类
活跃状态:基于最近交易和登录时间 生命周期状态:基于交易行为和订单历史 购物偏好:基于购买频率和金额 价值发展趋势:比较当前RFM评分与30天前的评分
3)具体逻辑实现
INSERT INTO ads.ads_user_value_analysis (dt, user_id, order_count_td, order_amount_td, order_last_date, order_first_date, login_count_td, login_last_date, average_order_amount, purchase_cycle_days, account_days, life_time_value, recency_score, frequency_score, monetary_score, rfm_score, user_value_level, active_status, life_cycle_status, shopping_preference, growth_trend) SELECT -- 基础日期统计 date('${pdate}') AS dt, -- 统计日期,使用调度日期参数 t1.user_id AS user_id, -- 用户ID t1.order_count_td, -- 累计下单次数,来自交易宽表 t1.total_amount_td, -- 累计下单金额,来自交易宽表 -- 格式化日期为yyyy-MM-dd格式 date_format(t1.order_last_date, '%Y-%m-%d') AS order_last_date, -- 最近下单日期 date_format(t1.order_first_date, '%Y-%m-%d') AS order_first_date, -- 首次下单日期 t2.login_count_td, -- 累计登录次数,来自用户登录宽表 date_format(t2.login_last_date, '%Y-%m-%d') AS login_last_date, -- 最近登录日期 -- 计算衍生指标 -- 计算平均客单价 = 总金额/订单数 CASE WHEN t1.order_count_td > 0 THEN t1.total_amount_td/t1.order_count_td ELSE 0 END AS average_order_amount, -- 计算平均购买周期(天) = (最后订单日期-首次订单日期)/(订单数-1) CASE WHEN t1.order_count_td > 1 THEN datediff(t1.order_last_date, t1.order_first_date)/(t1.order_count_td-1) ELSE NULL END AS purchase_cycle_days, -- 计算账号存续天数 = 当前日期-注册日期 datediff(current_date(), t2.register_date) AS account_days, -- 计算生命周期价值(LTV) = 平均客单价 * 年购买频率 * 预期客户生命周期(年) -- 年购买频率计算方式: 总购买次数*365/账号存续天数,即年化购买频率 -- 预期客户生命周期取3年作为默认预估 CASE WHEN t1.order_count_td > 0 AND datediff(current_date(), t2.register_date) > 0 THEN (t1.total_amount_td/t1.order_count_td) * (t1.order_count_td*365/datediff(current_date(), t2.register_date)) * 3 ELSE 0 END AS life_time_value, -- RFM模型计算 - 为每个维度打分(1-5分) -- Recency(最近购买时间)评分: 越近分数越高 CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 -- 30天内 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 -- 31-60天 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 -- 61-90天 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 -- 91-180天 ELSE 1 -- 180天以上 END AS recency_score, -- Frequency(购买频率)评分: 购买次数越多分数越高 CASE WHEN t1.order_count_td >= 20 THEN 5 -- 20次及以上 WHEN t1.order_count_td >= 10 THEN 4 -- 10-19次 WHEN t1.order_count_td >= 5 THEN 3 -- 5-9次 WHEN t1.order_count_td >= 2 THEN 2 -- 2-4次 ELSE 1 -- 1次 END AS frequency_score, -- Monetary(消费金额)评分: 总消费金额越高分数越高 CASE WHEN t1.total_amount_td >= 10000 THEN 5 -- 1万元及以上 WHEN t1.total_amount_td >= 5000 THEN 4 -- 5千-1万元 WHEN t1.total_amount_td >= 2000 THEN 3 -- 2千-5千元 WHEN t1.total_amount_td >= 500 THEN 2 -- 500-2千元 ELSE 1 -- 500元以下 END AS monetary_score, -- 计算RFM总分 = R分 + F分 + M分 (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END) + (CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END) + (CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) AS rfm_score, -- 用户价值分层: 根据RFM总分(3-15分)进行分层 CASE WHEN (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END + CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END + CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) >= 13 THEN '高价值' -- 13-15分 WHEN (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END + CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END + CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) >= 10 THEN '中高价值' -- 10-12分 WHEN (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END + CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END + CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) >= 7 THEN '中价值' -- 7-9分 WHEN (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END + CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END + CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) >= 4 THEN '低价值' -- 4-6分 ELSE '流失风险' -- 3分 END AS user_value_level, -- 活跃状态: 基于最近交易和登录时间 CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 OR datediff(current_date(), t2.login_last_date) <= 7 THEN '活跃' -- 30天内有交易或7天内有登录 WHEN datediff(current_date(), t1.order_last_date) <= 90 OR datediff(current_date(), t2.login_last_date) <= 30 THEN '沉默' -- 90天内有交易或30天内有登录 ELSE '流失' -- 超过90天未交易且超过30天未登录 END AS active_status, -- 生命周期状态: 基于交易行为和订单历史 CASE WHEN datediff(current_date(), t1.order_first_date) <= 30 AND t1.order_count_td <= 2 THEN '新用户' -- 30天内首次购买且购买次数<=2次 WHEN t1.order_count_td >= 3 AND (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END) >= 4 THEN '成长期' -- 购买>=3次且近期活跃(60天内) WHEN t1.order_count_td >= 5 AND (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END) >= 3 THEN '成熟期' -- 购买>=5次且90天内有购买 WHEN (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END) <= 2 AND (CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END) >= 3 THEN '衰退期' -- 购买次数>=5但超过90天未购买 WHEN (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END) >= 3 AND datediff(t1.order_last_date, t0.lag_order_date) > 90 THEN '回流' -- 最近90天内有购买但之前超过90天未购买 ELSE '新用户' -- 默认为新用户 END AS life_cycle_status, -- 购物偏好: 基于购买频率和金额 CASE WHEN (CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END) >= 4 AND (CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) <= 3 THEN '高频低额' -- 高频率低金额: 购买频繁但单价较低 WHEN (CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END) <= 3 AND (CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) >= 4 THEN '低频高额' -- 低频率高金额: 购买较少但大额消费 WHEN (CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END) >= 4 AND (CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) >= 4 THEN '高频高额' -- 高频率高金额: 高价值客户,频繁且大额 ELSE '低频低额' -- 低频率低金额: 低价值客户 END AS shopping_preference, -- 价值发展趋势: 比较当前RFM评分与30天前的评分 CASE WHEN (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END + CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END + CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) > COALESCE(t3.previous_rfm_score, 0) THEN '上升' -- 当前分数高于30天前,趋势上升 WHEN (CASE WHEN datediff(current_date(), t1.order_last_date) <= 30 THEN 5 WHEN datediff(current_date(), t1.order_last_date) <= 60 THEN 4 WHEN datediff(current_date(), t1.order_last_date) <= 90 THEN 3 WHEN datediff(current_date(), t1.order_last_date) <= 180 THEN 2 ELSE 1 END + CASE WHEN t1.order_count_td >= 20 THEN 5 WHEN t1.order_count_td >= 10 THEN 4 WHEN t1.order_count_td >= 5 THEN 3 WHEN t1.order_count_td >= 2 THEN 2 ELSE 1 END + CASE WHEN t1.total_amount_td >= 10000 THEN 5 WHEN t1.total_amount_td >= 5000 THEN 4 WHEN t1.total_amount_td >= 2000 THEN 3 WHEN t1.total_amount_td >= 500 THEN 2 ELSE 1 END) < COALESCE(t3.previous_rfm_score, 0) THEN '下降' -- 当前分数低于30天前,趋势下降 ELSE '稳定' -- 分数相等,趋势稳定 END AS growth_trend FROM ( -- 订单数据: 获取用户交易相关信息 SELECT user_id, k1, order_date_last, LAG(order_date_last, 1, NULL) OVER(PARTITION BY user_id ORDER BY k1) AS lag_order_date -- 获取上一次最近下单日期,用于计算回流状态 FROM dws.dws_trade_user_order_td WHERE k1 = date('${pdate}') -- 取当天分区数据 ) t0 JOIN ( -- 订单数据: 获取用户交易相关信息 SELECT user_id, SUM(order_count_td) AS order_count_td, -- 累计下单次数 SUM(total_amount_td) AS total_amount_td, -- 累计下单金额 MAX(order_date_last) AS order_last_date, -- 最近下单日期 MIN(order_date_first) AS order_first_date -- 首次下单日期 FROM dws.dws_trade_user_order_td WHERE k1 = date('${pdate}') -- 取当天分区数据 GROUP BY user_id ) t1 ON t0.user_id = t1.user_id JOIN ( -- 登录数据: 获取用户登录相关信息 SELECT user_id, SUM(login_count_td) AS login_count_td, -- 累计登录次数 MAX(login_date_last) AS login_last_date, -- 最近登录日期 date('2020-01-01') AS register_date -- 注册日期,使用默认值 FROM dws.dws_user_user_login_td WHERE k1 = date('${pdate}') -- 取当天分区数据 GROUP BY user_id ) t2 ON t1.user_id = t2.user_id LEFT JOIN ( -- 上月RFM评分数据: 用于计算价值发展趋势 SELECT user_id, recency_score + frequency_score + monetary_score AS previous_rfm_score -- 30天前的RFM总分 FROM ads.ads_user_value_analysis WHERE dt = date_sub(date('${pdate}'), 30) -- 取30天前的数据 ) t3 ON t1.user_id = t3.user_id;
ADS(Application Data Store)层是数据仓库的最顶层,直接面向业务应用。其建设核心是以业务需求为导向,将经过清洗、转换和汇总的数据以业务友好的方式呈现。ADS层设计应遵循业务导向、性能优化、数据一致性、可扩展性和易用性五大原则。主要包含汇总事实表、维度表、指标表和报表表等数据模型。数据更新策略包括增量更新和全量刷新,需根据业务场景选择。ADS层实现需要高性能存储引擎(如Doris)和计算框架,并建立完善的数据质量控制机制。通过ADS层,企业可以实现精准营销、会员运营、风险控制、客户服务和产品优化等业务目标,为决策提供数据支持。
还没有评论,来说两句吧...