Generating a Word Cloud from WeChat Chat History with Python

It started as a passing idea, so I decided to give it a try in Python.

Getting the data

For getting the data I followed https://blog.csdn.net/Kevinxgl/article/details/109992360

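Install an Android emulator (BlueStacks) on the PC.
Use desktop WeChat's backup feature to transfer the chat history you need to the emulator.
Get root access in the emulator and go to the database directory /data/data/com.tencent.mm/MicroMsg. Under this path there are two folders whose names are strings of digits and letters. One of them (usually the larger one) contains the database EnMicroMsg.db. Copy it to the PC for later use.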
Getting the database password:

The device IMEI: download an IMEI app directly in the emulator.
auth_uin: the value in auth_info_key_prefs.xml under /data/data/com.tencent.mm/shared_prefs
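Instead of copying the auth_uin by hand, you can read it from the XML file with Python's built-in `xml.etree.ElementTree`. This is a minimal sketch. It assumes the file uses the usual Android SharedPreferences layout, a `<map>` root holding entries like `<int name="_auth_uin" value="..." />`. The entry name `_auth_uin` and the helper names are assumptions, so check them against your own copy of the file.

```python
import xml.etree.ElementTree as ET


def _value_of(root: ET.Element, key: str) -> str:
    """Return the value of the child entry whose name attribute equals `key`."""
    for entry in root:
        if entry.get("name") == key:
            # <int>/<long> entries keep the value in an attribute,
            # <string> entries keep it as element text
            value = entry.get("value")
            return value if value is not None else (entry.text or "").strip()
    raise KeyError(f"{key!r} not found in shared_prefs XML")


def read_auth_uin(xml_text: str, key: str = "_auth_uin") -> str:
    """Parse SharedPreferences XML passed in as a string."""
    return _value_of(ET.fromstring(xml_text), key)


def read_auth_uin_file(path: str, key: str = "_auth_uin") -> str:
    """Parse auth_info_key_prefs.xml after it has been copied to the PC."""
    return _value_of(ET.parse(path).getroot(), key)


if __name__ == "__main__":
    # Inline sample only; for the real file use read_auth_uin_file(path)
    sample = '<map><int name="_auth_uin" value="123456789" /></map>'
    print(read_auth_uin(sample))
```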

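Computing the password: concatenate the IMEI and the auth_uin, hash the result with an online MD5 tool, and choose 32-character lowercase output. The database password is the first 7 characters.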
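The same calculation can also be done locally with Python's `hashlib`, so you don't need an online tool. This is a minimal sketch. It assumes the input is the IMEI string followed directly by the auth_uin string with no separator, as the step above describes. `wechat_db_key` is a hypothetical helper name.

```python
import hashlib


def wechat_db_key(imei: str, auth_uin: str) -> str:
    """First 7 characters of the 32-character lowercase MD5 hex digest
    of IMEI + auth_uin (plain concatenation, no separator)."""
    digest = hashlib.md5((imei + auth_uin).encode("utf-8")).hexdigest()
    return digest[:7]  # hexdigest() is already lowercase


if __name__ == "__main__":
    # Placeholder values; substitute the IMEI and auth_uin from your emulator
    print(wechat_db_key("123456789012345", "123456789"))
```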

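The key part is getting the database. The database software, SQLite Browser, can be found with a Baidu search. When exporting, look carefully to find the message table.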
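If you have a decrypted copy of the database, you can also script the export with Python's built-in `sqlite3` and write a CSV with a `content` column, which is the column the commented-out `pd.read_csv` line in the main script reads. Note that the standard `sqlite3` module cannot open the encrypted EnMicroMsg.db directly. This is a minimal sketch under several assumptions to verify in SQLite Browser first: the `message` table has `type`, `talker`, `content` and `createTime` columns, and `type = 1` marks plain-text messages. All helper names are hypothetical.

```python
import csv
import sqlite3
from contextlib import closing
from typing import List, Optional


def fetch_text_messages(conn: sqlite3.Connection,
                        talker: Optional[str] = None) -> List[str]:
    """Return non-empty text messages ordered by createTime.

    Assumes type = 1 marks plain-text messages in the message table.
    """
    sql = "SELECT content FROM message WHERE type = 1"
    params: tuple = ()
    if talker is not None:  # optionally keep only one chat (talker = contact id)
        sql += " AND talker = ?"
        params = (talker,)
    sql += " ORDER BY createTime"
    return [row[0] for row in conn.execute(sql, params) if row[0]]


def write_content_csv(messages: List[str], path: str) -> None:
    """Write one message per row under a single 'content' header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["content"])
        writer.writerows([m] for m in messages)


def export_db_to_csv(db_path: str, csv_path: str,
                     talker: Optional[str] = None) -> None:
    """Hypothetical end-to-end helper: decrypted .db file -> content CSV."""
    with closing(sqlite3.connect(db_path)) as conn:
        write_content_csv(fetch_text_messages(conn, talker), csv_path)
```

For example, `export_db_to_csv("decrypted.db", "wx3.csv")`, where `decrypted.db` is a hypothetical name for your decrypted copy.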

Processing the data with Python

IDE: PyCharm. The main libraries used are jieba, pandas, and WordCloud, among others.
WordCloud has to be installed manually, because installing it through PyCharm failed.

For installing WordCloud, see https://blog.csdn.net/qq_31673689/article/details/78745155
Stopword list
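Sometimes an install appears to succeed but the import still fails in PyCharm. One common cause is that the project uses a different interpreter from the one `pip` installed into. The quick check below, run from inside PyCharm, prints the interpreter path and lists which of the modules used in this article cannot be found. The names are import names, not pip package names: Pillow imports as `PIL`. `missing_modules` is a hypothetical helper.

```python
import importlib.util
import sys
from typing import Iterable, List

# Import names of the libraries used in this article (Pillow imports as PIL)
REQUIRED = ["jieba", "pandas", "wordcloud", "matplotlib", "PIL", "numpy"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)  # the Python that is actually running
    print("Missing:", missing_modules(REQUIRED) or "none")
```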

The code, with comments, is below:

import re

import jieba
import numpy as np
import pandas as pd  # only needed for the CSV-to-txt step below
from matplotlib import pyplot as plt
from PIL import Image
from wordcloud import WordCloud

jieba.load_userdict("dict.txt")  # load a custom dictionary so domain-specific words are segmented correctly

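# Read the exported CSV into a DataFrame, keeping only the message text (run once to create wx.txt)
# datas = pd.read_csv('wx3.csv', usecols=['content'], encoding='utf-8', engine='python')
# pf = pd.DataFrame(datas)
# print(pf)
# pf.to_csv("wx.txt", encoding='utf_8_sig')  # write the data to a txt file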

# Read the txt file ("utf-8-sig" also strips the BOM that encoding='utf_8_sig' writes)
with open("wx.txt", encoding="utf-8-sig") as f:
    text = f.read()
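# Regular expression: remove letters, digits, punctuation and whitespace
# (raw string, so "\s" is passed to re unchanged; renamed from str to avoid shadowing the built-in)
clean_text = re.sub(r'[a-zA-Z0-9’!"#$%&\'()*+,\-./:;<=>?@，。?★、…【】《》？“”‘’！\[\\\]^_`{|}~\s]+', "", text)
# Segment the text into words; jieba.lcut() uses accurate mode and returns a list
wordlist = jieba.lcut(clean_text)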
# Read the stopword list (one word per line) to filter out uninformative words
with open('stop.txt', 'r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}  # a set makes each lookup O(1)

# Filter out stopwords and whitespace-only tokens
outstr2 = [word for word in wordlist if word.strip() and word not in stopwords]

counts = {}
for word in outstr2:
    if len(word) == 1:  # skip single-character words
        continue
    counts[word] = counts.get(word, 0) + 1  # add 1 each time the word appears

print("Number of distinct words:", len(counts))
items = sorted(counts.items(), key=lambda x: x[1], reverse=True)  # sort words by frequency, highest first

for word, count in items[:10]:  # top 10; slicing avoids an IndexError when there are fewer than 10 words
    print("{0:<5}{1:>5}".format(word, count))

# Join the list elements with spaces so WordCloud can split them back into words
jieba_txt = " ".join(outstr2)

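# Generate the word cloud
background_image = np.array(Image.open('wx.png'))  # mask: words are drawn only where the image is not white
wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\msyh.ttc',  # system font (Microsoft YaHei) that can render Chinese
                      background_color='white',  # background colour
                      max_words=400,  # maximum number of words shown
                      max_font_size=60,  # font size of the most frequent word
                      mask=background_image  # shape of the cloud
                      ).generate(jieba_txt)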

plt.imshow(wordcloud, interpolation='bilinear')  # display the cloud
plt.axis('off')
plt.show()
wordcloud.to_file("词云图片.jpg")  # save the image to disk
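The filtering and counting loops above can also be written with `collections.Counter`. This is a minimal standard-library sketch. It assumes its input is the token list that `jieba.lcut` has already produced, so jieba is not needed here. `top_words` is a hypothetical helper name. Like the original loops, it skips stopwords, whitespace and single-character words.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_words(tokens: Iterable[str], stopwords: Set[str],
              n: int = 10) -> List[Tuple[str, int]]:
    """Count multi-character tokens not in `stopwords`; return the n most common."""
    counts = Counter(
        t for t in (tok.strip() for tok in tokens)
        if len(t) > 1 and t not in stopwords
    )
    return counts.most_common(n)


if __name__ == "__main__":
    tokens = ["今天", "的", "天气", "今天", "很", "好", "天气", "今天"]
    for word, count in top_words(tokens, stopwords={"的", "很"}, n=2):
        print(f"{word:<5}{count:>5}")
```

To make the cloud use exactly these counts, you could pass `dict(Counter(...))` to `WordCloud.generate_from_frequencies` instead of calling `generate` on the joined text.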