时间:2023-03-15 23:00:01 | 来源:电子商务
时间:2023-03-15 23:00:01 来源:电子商务
目录推荐系统中用户对物品的反馈分为显式和隐式反馈,显式反馈 (如评分、评级) 或单一的隐式反馈 (如浏览、点击、加入购物车)。 隐式反馈推荐是推荐系统通过对内容和用户行为的分析,建立适当的模型,帮助用户从海量的数据中找到自己感兴趣的内容。推荐系统中用户的行为反馈包括显式反馈和隐式反馈,隐式反馈信息在推荐系统算法中被广泛应用。隐式反馈体现着用户的兴趣爱好,对隐式反馈信息的挖掘有助于提高推荐系统的效果,以更好地设计推荐系统。2. 项目分析目的
select count(*), count(UserID), count(Product_ID), count(Product_Category_ID), count(Behavior),count(Timestamp)from Userbehavior;
我们在当初定义表的时候,选择了not null选项,现在在用count计算时,会发现,我们时没有缺失值的。delete from 表名 where 列名 is null;
第二种,是填充缺失值select coalesce(列名, 要填充的值) from 表名;
3. 将时间戳转换和拆分为日期和时间alter table Userbehavior add 日期 varchar(255);update Userbehaviorset 日期=from_unixtime(时间戳,'%Y-%m-%d');
修改后结果如图alter table Userbehavior add Time varchar(255);update Userbehaviorset Time =from_unixtime(时间戳,'%H:%i:%s')
修改结果如图DELETE FROM userbehavior WHERE 日期 not BETWEEN '2017-11-25' AND '2017-12-03';
这样就删除了,不在我们需要的日期范围内的数据啦SELECT product_category_ID,COUNT(product_category_ID)AS Num FROM userbehaviorGROUP BY product_category_IDORDER BY Num DESCLIMIT 10;
我们选取了热销产品的TOP10来看,但是这里有一个问题,这个榜是包含了用户所有行为的表格,包括加购,收藏,点击和购买,我们拆开来看看,每个环节的情况。select Product_Category_ID,COUNT(product_category_id)AS Num,(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'pv') as pv,(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'cart') as cart,(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'fav') as fav,(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'buy') as buyFrom userbehavior u1Group by u1.product_category_idOrder By Num DescLIMIT 10;
我们可以看到,毫不意外,用户主要的行为还是在点击量上。select Product_Category_ID,COUNT(product_category_id)AS Sum,(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'pv') as pv,(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'cart') as cart,(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'fav') as fav,(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'buy') as buy,CONCAT(ROUND(((Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'buy')/(Select COUNT(u2.product_category_id) from userbehavior u2 WHERE u1.product_category_id = u2.Product_Category_ID AND Behavior = 'pv')*100),2),'%') AS changesFrom userbehavior u1Group by u1.product_category_idOrder By sum DescLIMIT 10;
我们发现,行为总数最多的商品种类3607361,成交率是0.那么很显然,产品的高点击量本质上毫无意义,因为不能给店铺带来收益。在高点击量的情况下,我们能够看出,该产品的获客能力是很强的,证明公司的推广策略本身很有效,但是用户点击进来后的,加购,收藏,和购买行为都在我们上面展示的10条数据里面,并不算是偏高的。由此说明,用户点击进来后,没有找到自己想要的,或者说,用户点击来发现产品不是自己期望的。该产品就需要在这种方面进行改善。SELECT CONCAT(ROUND(((SELECT COUNT(UserID) FROM userbehavior WHERE Behavior = 'pv')-(SELECT COUNT(UserID)FROM userbehavior WHERE Behavior = 'buy'))/COUNT(*)*100,2),'%') AS 加购流失率FROM userbehavior;
我们看到这个用户流失率高达88.07%,我们先来看看剩下几个路径,再来一起对比。SELECT CONCAT(ROUND(((SELECT COUNT(UserID) FROM userbehavior WHERE Behavior = 'pv')-(SELECT COUNT(UserID)FROM userbehavior WHERE Behavior = 'cart'))/COUNT(*)*100,2),'%') AS 加购流失率,CONCAT(ROUND(((SELECT COUNT(UserID) FROM userbehavior WHERE Behavior = 'cart')-(SELECT COUNT(UserID)FROM userbehavior WHERE Behavior = 'buy'))/COUNT(*)*100,2),'%') AS 购买流失率FROM userbehavior;
加购流失率:84.48% 加购后购买流失率 - 3.58%SELECT CONCAT(ROUND(((SELECT COUNT(UserID) FROM userbehavior WHERE Behavior = 'pv')-(SELECT COUNT(UserID)FROM userbehavior WHERE Behavior = 'fav'))/COUNT(*)*100,2),'%') AS 收藏流失率,CONCAT(ROUND(((SELECT COUNT(UserID) FROM userbehavior WHERE Behavior = 'fav')-(SELECT COUNT(UserID)FROM userbehavior WHERE Behavior = 'buy'))/COUNT(*)*100,2),'%') AS 购买流失率FROM userbehavior;
收藏流失率: 86.87%, 购买流失率: 1.19%SELECT CONCAT(ROUND(((SELECT COUNT(UserID) FROM userbehavior WHERE Behavior = 'pv')-(SELECT COUNT(UserID)FROM userbehavior WHERE Behavior = 'cart'))/COUNT(*)*100,2),'%') AS 加购流失率,CONCAT(ROUND(((SELECT COUNT(UserID) FROM userbehavior WHERE Behavior = 'cart')-(SELECT COUNT(UserID)FROM userbehavior WHERE Behavior = 'fav'))/COUNT(*)*100,2),'%') AS 收藏流失率,CONCAT(ROUND(((SELECT COUNT(UserID) FROM userbehavior WHERE Behavior = 'fav')-(SELECT COUNT(UserID)FROM userbehavior WHERE Behavior = 'buy'))/COUNT(*)*100,2),'%') AS 购买流失率FROM userbehavior;
加购流失率: 84.48%, 收藏流失率: 2.39%, 购买流失率 : 1.19%SELECT date, COUNT(behavior) as pv, COUNT(DISTINCT UserID) AS uvFROM userbehaviorWHERE Behavior = 'pv'GROUP BY DateORDER BY Date;
我们从上述数据可以看出,在1202和1203这两天,用户无论是点击量还是独立用户访问数,都略高于平时。我们对这两天访问量高做假设检验的分析SELECT UserID,DATEDIFF('2017-12-3',max(date)) as R ,count(Behavior) as Ffrom userbehaviorwhere Behavior ='buy'group by UserIDORDER BY R DESC,F DESC;
随后,我们指定自己的打分规则SELECT a.*,(case when R<=2 then 4when R between 3 and 4 then 3when R between 5 and 6 then 2when R between 7 and 8 then 1 end) as Rscore,(case when F between 1 and 3 then 1when F between 4 and 6 then 2when F between 7 and 9 then 3when F>=10 then 4 end) as Fscorefrom (SELECT UserID,DATEDIFF('2017-12-3',max(date)) as R ,count(Behavior) as Ffrom userbehaviorwhere Behavior ='buy'group by UserID)a;
关键词:数据,分析,用户,平台