Time: 2023-06-18 13:45:02 | Source: Website Operations
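"Liangshan Marsh" Internet Co., Ltd.: The Internal Social Network of the 108 Heroes

About the author: Xu Lin (徐麟) currently works at the Product Technology Center of Vipshop (唯品会) in Hangzhou. A self-described "data dog" with a statistics background from Columbia University, he works on data mining and analysis and enjoys using R and Python to play with unusual data.
Personal WeChat official account: 数据森麟 (ID: shujusenlin). He also writes the column of the same name, 数据森麟, on this site.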
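import pandas as pd

# Read the full text of Water Margin (the file is GB18030-encoded)
with open("水浒传全文.txt", encoding='gb18030') as file:
    shuihu = file.read()

# Strip newlines, split the text into paragraphs on spaces, drop empty pieces
shuihu = shuihu.replace('\n', '')
shuihu_set = shuihu.split(' ')
shuihu_set = [k for k in shuihu_set if k != '']

# Paragraphs that mention 宋江 (Song Jiang)
songjiang_set = [k for k in shuihu_set if '宋江' in k]

# Load the hero table and add a paragraph-count column (出场段落), initialised to 0
haohan = pd.read_excel('水浒人物.xlsx')
haohan['出场段落'] = 0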
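The code above initialises 出场段落 (number of paragraphs a hero appears in) to 0, but the step that fills in the counts is not shown. A minimal sketch of one way to do it, assuming the count is simply the number of paragraphs that contain the hero's 使用名 as a substring, which is the same matching rule the pairwise weights use. The helper name `count_paragraphs` and the sample paragraphs are hypothetical, made up for illustration.

```python
def count_paragraphs(paragraphs, names):
    """Return {name: number of paragraphs containing name as a substring}."""
    return {name: sum(1 for p in paragraphs if name in p) for name in names}

# Toy data, made up for illustration only
sample_paragraphs = ["宋江与李逵饮酒", "李逵大怒", "宋江、吴用商议", "武松打虎"]
sample_names = ["宋江", "李逵", "吴用", "林冲"]

counts = count_paragraphs(sample_paragraphs, sample_names)
print(counts)
```

With the real data, the result could be attached to the table with something like `haohan['出场段落'] = haohan['使用名'].map(count_paragraphs(shuihu_set, haohan['使用名']))`.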
from pyecharts import Bar  # pyecharts 0.5.x API

# Top 10 heroes by paragraph count, charted as "Liangshan annual income TOP10"
haohan.sort_values('出场段落', ascending=False, inplace=True)
attr = haohan['姓名'][0:10]
v1 = haohan['出场段落'][0:10]
bar = Bar("水泊梁山年收入TOP10")
bar.add("年收入(万)", attr, v1, is_stack=True, is_label_show=True)
bar.render('水泊梁山年收入TOP10.html')

# Bottom 10 heroes by paragraph count
haohan.sort_values('出场段落', ascending=True, inplace=True)
attr = haohan['姓名'][0:10]
v1 = haohan['出场段落'][0:10]
bar = Bar("水泊梁山年收入BOTTOM10")
bar.add("年收入(万)", attr, v1, is_stack=True, is_label_show=True)
bar.render('水泊梁山年收入BOTTOM10.html')
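# Build the edge list of the social network: one row per pair of heroes.
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
rows = []
for i in range(0, 107):
    for j in range(i + 1, 108):
        # Weight = number of paragraphs that mention both heroes
        this_weight = len([k for k in shuihu_set
                           if haohan['使用名'][i] in k and haohan['使用名'][j] in k])
        rows.append({'Source': haohan['姓名'][i], 'Target': haohan['姓名'][j],
                     'Weight': this_weight,
                     'Source_Ratio': this_weight / haohan['出场段落'][i],
                     'Target_Ratio': this_weight / haohan['出场段落'][j]})
        print(str(i) + ':' + str(j))  # progress indicator
net_df = pd.DataFrame(rows, columns=['Source', 'Target', 'Weight', 'Source_Ratio', 'Target_Ratio'])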
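To make the edge columns concrete: Weight is the number of paragraphs that mention both heroes, and Source_Ratio and Target_Ratio divide that number by each hero's own paragraph count. The sketch below reproduces the same quantities on made-up paragraphs. It is not the author's code: the function name `cooccurrence_edges` is hypothetical, it precomputes a set of paragraph indices per name rather than rescanning the text for every pair, and it assumes names that never appear should be skipped to avoid dividing by zero.

```python
from itertools import combinations

def cooccurrence_edges(paragraphs, names):
    """Build one edge dict per pair of names, skipping names that never appear."""
    # Indices of the paragraphs in which each name occurs (substring match)
    hits = {n: {i for i, p in enumerate(paragraphs) if n in p} for n in names}
    edges = []
    for a, b in combinations(names, 2):
        if not hits[a] or not hits[b]:
            continue  # assumption: skip absent names to avoid division by zero
        weight = len(hits[a] & hits[b])  # paragraphs mentioning both names
        edges.append({'Source': a, 'Target': b, 'Weight': weight,
                      'Source_Ratio': weight / len(hits[a]),
                      'Target_Ratio': weight / len(hits[b])})
    return edges

# Toy data, made up for illustration only
sample_paragraphs = ["宋江与李逵饮酒", "李逵大怒", "宋江、吴用商议", "宋江、李逵、吴用下山"]
edges = cooccurrence_edges(sample_paragraphs, ["宋江", "李逵", "吴用"])
for e in edges:
    print(e)
```

Because each name's paragraph set is computed once, the pair loop only intersects sets, which scales better than scanning every paragraph for each of the several thousand hero pairs.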
Keywords: network, social, internal, Liangshan