
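I have been bulk-scraping WeChat official account content since 2014. The original goal was to build an HTML5 spam-content site; at the time, official account content scraped onto such sites spread easily inside official accounts. Bulk scraping was very easy back then, and the entry point was the official account's message history page. That entry point is still the same today, but scraping it has become harder and harder, and the scraping method has gone through many versions. In 2015 I dropped the HTML5 spam site and retargeted the scraper at local news official accounts, with an app as the front end. The result was a news app that scrapes official account content automatically. I used to worry that one day a WeChat upgrade would make scraping impossible and my news app would stop working. But as WeChat kept upgrading, the scraping method kept upgrading with it, which has actually made me more confident: as long as the message history page exists, content can be scraped in bulk. So today I decided to organize the method and write it down. My method owes a lot to the sharing spirit of many peers, so I will carry on that spirit and share my results.&p&&b&This article will be updated continuously; what you see is guaranteed to work at the time you read it.&/b&&/p&&p&First, let's look at the URL of a WeChat official account message history page:&/p&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span&http://mp.weixin.qq.com/mp/getmasssendmsg?__biz=MjM5MzczNjY2NA==#wechat_webview_type=1&wechat_redirect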
&/code&&/pre&&/div&&p&=========Update=========&/p&&br&&p&Depending on the personal WeChat account, two different message history page URLs now appear. Below is the second kind of URL; links of the first kind show up in AnyProxy as a 302 redirect:&/p&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span&https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=MzA3NDk5MjYzNg==&scene=124#wechat_redirect
&/code&&/pre&&/div&&br&&p&Page style of the first kind of URL:&/p&&p&&figure&&img src=&https://pic4.zhimg.com/v2-f63c8dcbab824_b.jpg& data-rawwidth=&750& data-rawheight=&1334& class=&origin_image zh-lightbox-thumb& width=&750& data-original=&https://pic4.zhimg.com/v2-f63c8dcbab824_r.jpg&&&/figure&Page style of the second kind of URL:&/p&&p&&figure&&img src=&https://pic2.zhimg.com/v2-a14983b45aad17a068dc636b3556f99f_b.jpg& data-rawwidth=&640& data-rawheight=&1136& class=&origin_image zh-lightbox-thumb& width=&640& data-original=&https://pic2.zhimg.com/v2-a14983b45aad17a068dc636b3556f99f_r.jpg&&&/figure&From what is known so far, the two page forms appear across WeChat accounts with no discernible pattern: some accounts always get the first page form and others always get the second.&/p&&p&The link above is a real URL of an official account message history page, but if we enter it in a browser the page says: please open this from the WeChat client. That is because the URL needs several more parameters before it displays content. Let's look at what a complete URL that displays content looks like:&br&&/p&&div class=&highlight&&&pre&&code class=&language-js&&&span&&/span&&span class=&c1&&//First kind of URL&/span&
&span class=&nx&&http&/span&&span class=&o&&:&/span&&span class=&c1&&//mp.weixin.qq.com/mp/getmasssendmsg?__biz=MjM5NTM1NjczMw==&uin=NzM4MTk1ODgx&key=a226a081696afed0d9dfa6e5c78ad4e9a2b94aeaad6ac4dd87de3e56fe9cc2052f68aca6e99fd8e4c29abe4a049d1a71eeb2be5&devicetype=android-17&version=2605033c&lang=zh_CN&nettype=WIFI&ascene=1&pass_ticket=zbA7PswOPKySRpyEYI5kDCjRiljxcpzdbTuVMauFGemgdp8R1DY1uQY49srehWab&wx_header=1&/span&
&span class=&c1&&//Second kind of URL&/span&
&span class=&nx&&http&/span&&span class=&o&&:&/span&&span class=&c1&&//mp.weixin.qq.com/mp/profile_ext?action=home&__biz=MzA3NDk5MjYzNg==&scene=124&uin=NzM4MTk1ODgx&key=2a0324183dbd55a2680d11ccbaa34cdb349ee9be58f5b666092ddb17adf8a88dccfd511c9e118aa324a38903f79cff940cf749ecd5a&devicetype=android-17&version=2605033c&lang=zh_CN&nettype=WIFI&a8scene=3&pass_ticket=Fo3zjtJcbPfijNHKUIQbV%2BeHsAqhbjJCwzTfV48u%2FCZRRGTmI8oqmHDxxfEL8ke%2B&wx_header=1&/span&
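&/code&&/pre&&/div&&p&&b&This URL was captured with the proxy server software introduced later, after opening the message history page in the WeChat client.&/b& It contains several parameters:&/p&&p&action=;__biz=;uin=;key=;devicetype=;version=;lang=;nettype=;scene=;pass_ticket=;wx_header=;&/p&&p&The important ones are these 4 parameters: __biz=;uin=;key=;pass_ticket=;&/p&&p&__biz is an id-like parameter of the official account. Each official account has one biz assigned by WeChat; at present there is only a very small chance that an account's biz changes.&/p&&p&The remaining 3 parameters relate to the user's id and token or ticket. &b&Their values are generated by the WeChat client and appended to the URL automatically.&/b& So to scrape official accounts we have to go through a WeChat client app. In earlier WeChat versions these 3 parameters could be captured once and reused across multiple official accounts while they remained valid. In the current version the values change every time an official account is visited.&/p&&p&The method I use now only needs to care about the __biz parameter.&/p&&br&My scraping system consists of the following parts:&p&1. A WeChat client: either a phone with the WeChat app installed or an Android emulator on a computer. In my tests the iOS WeChat client crashed more often than the Android one during bulk scraping. To keep costs down I use an Android emulator.&/p&&figure&&img src=&https://pic4.zhimg.com/v2-414d43c63db7b1d52c74c64b31282fe7_b.jpg& data-rawwidth=&431& data-rawheight=&653& class=&origin_image zh-lightbox-thumb& width=&431& data-original=&https://pic4.zhimg.com/v2-414d43c63db7b1d52c74c64b31282fe7_r.jpg&&&/figure&&br&&p&2. A personal WeChat account: scraping needs not only a WeChat client but also a personal WeChat account dedicated to scraping, because that account cannot be used for anything else.&/p&&p&3. A local proxy server: the current method uses the AnyProxy proxy server to send the article list from the official account's message history page to your own server. Installation and setup are described in detail later.&/p&&p&4. An article list parsing and storage system: mine is written in PHP. Later sections explain how to parse the article list and build a scraping queue for bulk scraping.&/p&&p&Steps&/p&&p&I. Install an emulator, or install the WeChat client app on a phone, then register a personal WeChat account and log in to the app. I will not go into this; everyone knows how.&/p&&p&II. Installing the proxy server&/p&&p&I currently use AnyProxy, &a href=&https://link.zhihu.com/?target=http%3A//anyproxy.io& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&AnyProxy&/a& . Its key feature is that it can read the content of HTTPS connections; WeChat official accounts and articles started using HTTPS links in early 2016. AnyProxy can also inject script code into official account pages by modifying its rule configuration. Installation and configuration are described below.&/p&&p&1. Install &a href=&https://link.zhihu.com/?target=http%3A//nodejs.org/& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&NodeJS&/a&&/p&&p&2. In a command prompt or terminal run npm install -g anyproxy; on macOS prefix it with sudo.&/p&&p&3. Generate the root CA, which HTTPS requires: run sudo anyproxy --root (Windows may not need sudo).&/p&&p&4. Start AnyProxy with the command sudo anyproxy -i; the -i flag means parse HTTPS.&/p&&br&&p&5. Install the certificate on the phone or in the Android emulator:&/p&&ul&&li&Method 1: start AnyProxy and open in a browser &a href=&https://link.zhihu.com/?target=http%3A//localhost%3A8002/fetchCrtFile& class=& external& target=&_blank& rel=&nofollow 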
noreferrer&&&span class=&invisible&&http://&/span&&span class=&visible&&localhost:8002/fetchCrt&/span&&span class=&invisible&&File&/span&&span class=&ellipsis&&&/span&&/a& , which provides the rootCA.crt file&br&&/li&&li&Method 2: start AnyProxy; &a href=&https://link.zhihu.com/?target=http%3A//localhost%3A8002/qr_root& class=& external& target=&_blank& rel=&nofollow noreferrer&&&span class=&invisible&&http://&/span&&span class=&visible&&localhost:8002/qr_root&/span&&span class=&invisible&&&/span&&/a& shows a QR code of the certificate path, which is more convenient when installing on a mobile device&br&&/li&&li&Installing the certificate on the phone via the QR code is recommended.&/li&&/ul&&br&&p&6. Set the proxy: in the Android emulator the proxy server address is the gateway of the Wi-Fi connection. You can see the gateway address by switching DHCP to static; remember to switch it back to automatic afterwards. On a phone, the proxy server address is the IP address of the computer running AnyProxy. The default proxy port is 8001.&/p&&figure&&img src=&https://pic2.zhimg.com/v2-f8d269f0567efeee6b1f83_b.jpg& data-rawwidth=&431& data-rawheight=&653& class=&origin_image zh-lightbox-thumb& width=&431& data-original=&https://pic2.zhimg.com/v2-f8d269f0567efeee6b1f83_r.jpg&&&/figure&&p&Now open WeChat and tap into any official account's message history or article; you should see the corresponding output scrolling in the terminal. If nothing appears, check that the phone's proxy settings are correct.&figure&&img src=&https://pic4.zhimg.com/v2-ddc05be32f5dc14dbc827_b.jpg& data-rawwidth=&1193& data-rawheight=&604& class=&origin_image zh-lightbox-thumb& width=&1193& data-original=&https://pic4.zhimg.com/v2-ddc05be32f5dc14dbc827_r.jpg&&&/figure&&/p&&p&Now open the browser at &a href=&https://link.zhihu.com/?target=http%3A//localhost%3A8002& class=& external& target=&_blank& rel=&nofollow noreferrer&&&span class=&invisible&&http://&/span&&span class=&visible&&localhost:8002&/span&&span class=&invisible&&&/span&&/a& to see the AnyProxy web interface. Open a message history page in WeChat, then look at the web interface again: the URL of the message history page will scroll into view.&/p&&p&&figure&&img src=&https://pic2.zhimg.com/v2-45bcbcfad12d8_b.jpg& data-rawwidth=&1165& data-rawheight=&686& class=&origin_image zh-lightbox-thumb& width=&1165& 
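data-original=&https://pic2.zhimg.com/v2-45bcbcfad12d8_r.jpg&&&/figure&URLs beginning with /mp/getmasssendmsg are WeChat message history pages. The small padlock on the left means the page is HTTPS-encrypted. Now click on that row.&br&&/p&&p&=========Update=========&br&&/p&&p&For some WeChat accounts, URLs beginning with /mp/getmasssendmsg return a 302 redirect to a URL beginning with /mp/profile_ext?action=home, so you have to click that URL to see the content.&/p&&br&&p&&figure&&img src=&https://pic2.zhimg.com/v2-290cede650af43ba98f6f2f5ae81d06b_b.jpg& data-rawwidth=&1165& data-rawheight=&686& class=&origin_image zh-lightbox-thumb& width=&1165& data-original=&https://pic2.zhimg.com/v2-290cede650af43ba98f6f2f5ae81d06b_r.jpg&&&/figure&If the HTML content of the page appears on the right, decryption succeeded. If there is no content, check that AnyProxy was started with the -i flag, that the CA certificate was generated, and that the certificate is installed correctly on the phone.&/p&&p&At this point all traffic from the phone passes through the proxy server in plain text. Next we modify the proxy server configuration so that official account content is captured.&/p&&p&I. Locate the configuration file:&/p&&p&On macOS the configuration file is in /usr/local/lib/node_modules/anyproxy/lib/. I am sorry that I do not know the location on Windows for now; you should be able to find the directory by analogy with the macOS path.&/p&&p&II. Edit the file rule_default.js&/p&&p&Find the function replaceServerResDataAsync: function(req,res,serverResData,callback)&/p&&p&Modify the function body (read the comments carefully; this only illustrates the principle, so adapt it to your own setup once you understand it):&/p&&p&=========Update=========&br&&/p&&br&&p&Two page forms now exist, and a given WeChat account always shows the same one. To stay compatible with both, the code below keeps the checks for both page forms; you can also remove the check you do not need, depending on your own page form.&/p&&div class=&highlight&&&pre&&code class=&language-js&&&span&&/span&&span class=&nx&&replaceServerResDataAsync&/span&&span class=&o&&:&/span& &span class=&kd&&function&/span&&span class=&p&&(&/span&&span class=&nx&&req&/span&&span class=&p&&,&/span&&span class=&nx&&res&/span&&span class=&p&&,&/span&&span class=&nx&&serverResData&/span&&span class=&p&&,&/span&&span class=&nx&&callback&/span&&span class=&p&&){&/span&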
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&sr&&/mp\/getmasssendmsg/i&/span&&span class=&p&&.&/span&&span class=&nx&&test&/span&&span class=&p&&(&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&)){&/span&&span class=&c1&&//when the URL is an official account message history page (first page form)&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&.&/span&&span class=&nx&&toString&/span&&span class=&p&&()&/span& &span class=&o&&!==&/span& &span class=&s2&&&&&/span&&span class=&p&&){&/span&
&span class=&k&&try&/span& &span class=&p&&{&/span&&span class=&c1&&//prevent an exception from crashing the program&/span&
&span class=&kd&&var&/span& &span class=&nx&&reg&/span& &span class=&o&&=&/span& &span class=&sr&&/msgList = (.*?);/&/span&&span class=&p&&;&/span&&span class=&c1&&//regex that matches the message history list&/span&
&span class=&kd&&var&/span& &span class=&nx&&ret&/span& &span class=&o&&=&/span& &span class=&nx&&reg&/span&&span class=&p&&.&/span&&span class=&nx&&exec&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&.&/span&&span class=&nx&&toString&/span&&span class=&p&&());&/span&&span class=&c1&&//convert the response to a string before matching&/span&
&span class=&nx&&HttpPost&/span&&span class=&p&&(&/span&&span class=&nx&&ret&/span&&span class=&p&&[&/span&&span class=&mi&&1&/span&&span class=&p&&],&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&,&/span&&span class=&s2&&&getMsgJson.php&&/span&&span class=&p&&);&/span&&span class=&c1&&//HttpPost is defined later in this article; it sends the matched message history JSON to your own server&/span&
&span class=&kd&&var&/span& &span class=&nx&&http&/span& &span class=&o&&=&/span& &span class=&nx&&require&/span&&span class=&p&&(&/span&&span class=&s1&&'http'&/span&&span class=&p&&);&/span&
&span class=&nx&&http&/span&&span class=&p&&.&/span&&span class=&nx&&get&/span&&span class=&p&&(&/span&&span class=&s1&&'http://xxx.com/getWxHis.php'&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span&&span class=&p&&(&/span&&span class=&nx&&res&/span&&span class=&p&&)&/span& &span class=&p&&{&/span&&span class=&c1&&//this URL is a program on your own server; it returns the next URL to visit, wrapped in a JS snippet that makes the page jump to it automatically. How getWxHis.php works is explained later.&/span&
&span class=&nx&&res&/span&&span class=&p&&.&/span&&span class=&nx&&on&/span&&span class=&p&&(&/span&&span class=&s1&&'data'&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span&&span class=&p&&(&/span&&span class=&nx&&chunk&/span&&span class=&p&&){&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&chunk&/span&&span class=&o&&+&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&&span class=&c1&&//prepend the returned code to the message history page and return it for display&/span&
&span class=&p&&})&/span&
&span class=&p&&});&/span&
&span class=&p&&}&/span&&span class=&k&&catch&/span&&span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&){&/span&&span class=&c1&&//if the regex above did not match, the content may be the second page loaded when scrolling down the message history: the first page is HTML, the second page is JSON.&/span&
&span class=&k&&try&/span& &span class=&p&&{&/span&
&span class=&kd&&var&/span& &span class=&nx&&json&/span& &span class=&o&&=&/span& &span class=&nx&&JSON&/span&&span class=&p&&.&/span&&span class=&nx&&parse&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&.&/span&&span class=&nx&&toString&/span&&span class=&p&&());&/span&
&span class=&k&&if&/span& &span class=&p&&(&/span&&span class=&nx&&json&/span&&span class=&p&&.&/span&&span class=&nx&&general_msg_list&/span& &span class=&o&&!=&/span& &span class=&p&&[])&/span& &span class=&p&&{&/span&
&span class=&nx&&HttpPost&/span&&span class=&p&&(&/span&&span class=&nx&&json&/span&&span class=&p&&.&/span&&span class=&nx&&general_msg_list&/span&&span class=&p&&,&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&,&/span&&span class=&s2&&&getMsgJson.php&&/span&&span class=&p&&);&/span&&span class=&c1&&//same function as above, defined later; sends the JSON of the second page of message history to your own server&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&&span class=&k&&catch&/span&&span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&){&/span&
&span class=&nx&&console&/span&&span class=&p&&.&/span&&span class=&nx&&log&/span&&span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&);&/span&&span class=&c1&&//log the error&/span&
&span class=&p&&}&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&&span class=&c1&&//return the second-page JSON unchanged&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&&span class=&k&&else&/span& &span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&sr&&/mp\/profile_ext\?action=home/i&/span&&span class=&p&&.&/span&&span class=&nx&&test&/span&&span class=&p&&(&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&)){&/span&&span class=&c1&&//when the URL is an official account message history page (second page form)&/span&
&span class=&k&&try&/span& &span class=&p&&{&/span&
&span class=&kd&&var&/span& &span class=&nx&&reg&/span& &span class=&o&&=&/span& &span class=&sr&&/var msgList = \'(.*?)\';/&/span&&span class=&p&&;&/span&&span class=&c1&&//regex that matches the message history list (differs from the one for the first page form)&/span&
&span class=&kd&&var&/span& &span class=&nx&&ret&/span& &span class=&o&&=&/span& &span class=&nx&&reg&/span&&span class=&p&&.&/span&&span class=&nx&&exec&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&.&/span&&span class=&nx&&toString&/span&&span class=&p&&());&/span&&span class=&c1&&//convert the response to a string before matching&/span&
&span class=&nx&&HttpPost&/span&&span class=&p&&(&/span&&span class=&nx&&ret&/span&&span class=&p&&[&/span&&span class=&mi&&1&/span&&span class=&p&&],&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&,&/span&&span class=&s2&&&getMsgJson.php&&/span&&span class=&p&&);&/span&&span class=&c1&&//HttpPost is defined later in this article; it sends the matched message history JSON to your own server&/span&
&span class=&kd&&var&/span& &span class=&nx&&http&/span& &span class=&o&&=&/span& &span class=&nx&&require&/span&&span class=&p&&(&/span&&span class=&s1&&'http'&/span&&span class=&p&&);&/span&
&span class=&nx&&http&/span&&span class=&p&&.&/span&&span class=&nx&&get&/span&&span class=&p&&(&/span&&span class=&s1&&'http://xxx.com/getWxHis.php'&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span&&span class=&p&&(&/span&&span class=&nx&&res&/span&&span class=&p&&)&/span& &span class=&p&&{&/span&&span class=&c1&&//this URL is a program on your own server; it returns the next URL to visit, wrapped in a JS snippet that makes the page jump to it automatically. How getWxHis.php works is explained later.&/span&
&span class=&nx&&res&/span&&span class=&p&&.&/span&&span class=&nx&&on&/span&&span class=&p&&(&/span&&span class=&s1&&'data'&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span&&span class=&p&&(&/span&&span class=&nx&&chunk&/span&&span class=&p&&){&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&chunk&/span&&span class=&o&&+&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&&span class=&c1&&//prepend the returned code to the message history page and return it for display&/span&
&span class=&p&&})&/span&
&span class=&p&&});&/span&
&span class=&p&&}&/span&&span class=&k&&catch&/span&&span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&){&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&&span class=&k&&else&/span& &span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&sr&&/mp\/profile_ext\?action=getmsg/i&/span&&span class=&p&&.&/span&&span class=&nx&&test&/span&&span class=&p&&(&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&)){&/span&&span class=&c1&&//JSON returned when scrolling down in the second page form&/span&
&span class=&k&&try&/span& &span class=&p&&{&/span&
&span class=&kd&&var&/span& &span class=&nx&&json&/span& &span class=&o&&=&/span& &span class=&nx&&JSON&/span&&span class=&p&&.&/span&&span class=&nx&&parse&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&.&/span&&span class=&nx&&toString&/span&&span class=&p&&());&/span&
&span class=&k&&if&/span& &span class=&p&&(&/span&&span class=&nx&&json&/span&&span class=&p&&.&/span&&span class=&nx&&general_msg_list&/span& &span class=&o&&!=&/span& &span class=&p&&[])&/span& &span class=&p&&{&/span&
&span class=&nx&&HttpPost&/span&&span class=&p&&(&/span&&span class=&nx&&json&/span&&span class=&p&&.&/span&&span class=&nx&&general_msg_list&/span&&span class=&p&&,&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&,&/span&&span class=&s2&&&getMsgJson.php&&/span&&span class=&p&&);&/span&&span class=&c1&&//same function as above, defined later; sends the JSON of the second page of message history to your own server&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&&span class=&k&&catch&/span&&span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&){&/span&
&span class=&nx&&console&/span&&span class=&p&&.&/span&&span class=&nx&&log&/span&&span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&);&/span&
&span class=&p&&}&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&
&span class=&p&&}&/span&&span class=&k&&else&/span& &span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&sr&&/mp\/getappmsgext/i&/span&&span class=&p&&.&/span&&span class=&nx&&test&/span&&span class=&p&&(&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&)){&/span&&span class=&c1&&//when the URL is the read count and like count of an official account article&/span&
&span class=&k&&try&/span& &span class=&p&&{&/span&
&span class=&nx&&HttpPost&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&,&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&,&/span&&span class=&s2&&&getMsgExt.php&&/span&&span class=&p&&);&/span&&span class=&c1&&//function defined later; sends the JSON with the article's read and like counts to the server&/span&
&span class=&p&&}&/span&&span class=&k&&catch&/span&&span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&){&/span&
&span class=&p&&}&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&
&span class=&p&&}&/span&&span class=&k&&else&/span& &span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&sr&&/s\?__biz/i&/span&&span class=&p&&.&/span&&span class=&nx&&test&/span&&span class=&p&&(&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&)&/span& &span class=&o&&||&/span& &span class=&sr&&/mp\/rumor/i&/span&&span class=&p&&.&/span&&span class=&nx&&test&/span&&span class=&p&&(&/span&&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&url&/span&&span class=&p&&)){&/span&&span class=&c1&&//when the URL is an official account article (the rumor URL means the article has been flagged as a debunked rumor)&/span&
&span class=&k&&try&/span& &span class=&p&&{&/span&
&span class=&kd&&var&/span& &span class=&nx&&http&/span& &span class=&o&&=&/span& &span class=&nx&&require&/span&&span class=&p&&(&/span&&span class=&s1&&'http'&/span&&span class=&p&&);&/span&
&span class=&nx&&http&/span&&span class=&p&&.&/span&&span class=&nx&&get&/span&&span class=&p&&(&/span&&span class=&s1&&'http://xxx.com/getWxPost.php'&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span&&span class=&p&&(&/span&&span class=&nx&&res&/span&&span class=&p&&)&/span& &span class=&p&&{&/span&&span class=&c1&&//this URL is another program on your own server; it returns the next URL to visit, wrapped in a JS snippet that makes the page jump to it automatically. How getWxPost.php works is explained later.&/span&
&span class=&nx&&res&/span&&span class=&p&&.&/span&&span class=&nx&&on&/span&&span class=&p&&(&/span&&span class=&s1&&'data'&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span&&span class=&p&&(&/span&&span class=&nx&&chunk&/span&&span class=&p&&){&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&chunk&/span&&span class=&o&&+&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&
&span class=&p&&})&/span&
&span class=&p&&});&/span&
&span class=&p&&}&/span&&span class=&k&&catch&/span&&span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&){&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&&span class=&k&&else&/span&&span class=&p&&{&/span&
&span class=&nx&&callback&/span&&span class=&p&&(&/span&&span class=&nx&&serverResData&/span&&span class=&p&&);&/span&
&span class=&p&&}&/span&
&span class=&p&&},&/span&
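&/code&&/pre&&/div&&br&&p&The code above uses AnyProxy's ability to modify response content to inject a script into the page and to send the page content to a server. This is the principle used to bulk-scrape official account content and read counts. The script relies on one custom function, described in detail below:&/p&&p&Append the following code to the end of rule_default.js:&/p&&div class=&highlight&&&pre&&code class=&language-js&&&span&&/span&&span class=&kd&&function&/span& &span class=&nx&&HttpPost&/span&&span class=&p&&(&/span&&span class=&nx&&str&/span&&span class=&p&&,&/span&&span class=&nx&&url&/span&&span class=&p&&,&/span&&span class=&nx&&path&/span&&span class=&p&&)&/span& &span class=&p&&{&/span&&span class=&c1&&//send JSON to the server: str is the JSON content, url is the message history page URL, path is the path and file name of the receiving program&/span&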
&span class=&kd&&var&/span& &span class=&nx&&http&/span& &span class=&o&&=&/span& &span class=&nx&&require&/span&&span class=&p&&(&/span&&span class=&s1&&'http'&/span&&span class=&p&&);&/span&
&span class=&kd&&var&/span& &span class=&nx&&data&/span& &span class=&o&&=&/span& &span class=&p&&{&/span&
&span class=&nx&&str&/span&&span class=&o&&:&/span& &span class=&nb&&encodeURIComponent&/span&&span class=&p&&(&/span&&span class=&nx&&str&/span&&span class=&p&&),&/span&
&span class=&nx&&url&/span&&span class=&o&&:&/span& &span class=&nb&&encodeURIComponent&/span&&span class=&p&&(&/span&&span class=&nx&&url&/span&&span class=&p&&)&/span&
&span class=&p&&};&/span&
&span class=&kd&&var&/span& &span class=&nx&&content&/span& &span class=&o&&=&/span& &span class=&nx&&require&/span&&span class=&p&&(&/span&&span class=&s1&&'querystring'&/span&&span class=&p&&).&/span&&span class=&nx&&stringify&/span&&span class=&p&&(&/span&&span class=&nx&&data&/span&&span class=&p&&);&/span&
&span class=&kd&&var&/span& &span class=&nx&&options&/span& &span class=&o&&=&/span& &span class=&p&&{&/span&
&span class=&nx&&method&/span&&span class=&o&&:&/span& &span class=&s2&&&POST&&/span&&span class=&p&&,&/span&
&span class=&nx&&host&/span&&span class=&o&&:&/span& &span class=&s2&&&www.xxx.com&&/span&&span class=&p&&,&/span&&span class=&c1&&//note: no http:// prefix; this is your server's domain name.&/span&
&span class=&nx&&port&/span&&span class=&o&&:&/span& &span class=&mi&&80&/span&&span class=&p&&,&/span&
&span class=&nx&&path&/span&&span class=&o&&:&/span& &span class=&nx&&path&/span&&span class=&p&&,&/span&&span class=&c1&&//path and file name of the receiving program&/span&
&span class=&nx&&headers&/span&&span class=&o&&:&/span& &span class=&p&&{&/span&
&span class=&s1&&'Content-Type'&/span&&span class=&o&&:&/span& &span class=&s1&&'application/x-www-form-urlencoded; charset=UTF-8'&/span&&span class=&p&&,&/span&
&span class=&s2&&&Content-Length&&/span&&span class=&o&&:&/span& &span class=&nx&&content&/span&&span class=&p&&.&/span&&span class=&nx&&length&/span&
&span class=&p&&}&/span&
&span class=&p&&};&/span&
&span class=&kd&&var&/span& &span class=&nx&&req&/span& &span class=&o&&=&/span& &span class=&nx&&http&/span&&span class=&p&&.&/span&&span class=&nx&&request&/span&&span class=&p&&(&/span&&span class=&nx&&options&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span& &span class=&p&&(&/span&&span class=&nx&&res&/span&&span class=&p&&)&/span& &span class=&p&&{&/span&
&span class=&nx&&res&/span&&span class=&p&&.&/span&&span class=&nx&&setEncoding&/span&&span class=&p&&(&/span&&span class=&s1&&'utf8'&/span&&span class=&p&&);&/span&
&span class=&nx&&res&/span&&span class=&p&&.&/span&&span class=&nx&&on&/span&&span class=&p&&(&/span&&span class=&s1&&'data'&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span& &span class=&p&&(&/span&&span class=&nx&&chunk&/span&&span class=&p&&)&/span& &span class=&p&&{&/span&
&span class=&nx&&console&/span&&span class=&p&&.&/span&&span class=&nx&&log&/span&&span class=&p&&(&/span&&span class=&s1&&'BODY: '&/span& &span class=&o&&+&/span& &span class=&nx&&chunk&/span&&span class=&p&&);&/span&
&span class=&p&&});&/span&
&span class=&p&&});&/span&
&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&on&/span&&span class=&p&&(&/span&&span class=&s1&&'error'&/span&&span class=&p&&,&/span& &span class=&kd&&function&/span& &span class=&p&&(&/span&&span class=&nx&&e&/span&&span class=&p&&)&/span& &span class=&p&&{&/span&
&span class=&nx&&console&/span&&span class=&p&&.&/span&&span class=&nx&&log&/span&&span class=&p&&(&/span&&span class=&s1&&'problem with request: '&/span& &span class=&o&&+&/span& &span class=&nx&&e&/span&&span class=&p&&.&/span&&span class=&nx&&message&/span&&span class=&p&&);&/span&
&span class=&p&&});&/span&
&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&write&/span&&span class=&p&&(&/span&&span class=&nx&&content&/span&&span class=&p&&);&/span&
&span class=&nx&&req&/span&&span class=&p&&.&/span&&span class=&nx&&end&/span&&span class=&p&&();&/span&
&span class=&p&&}&/span&
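// Hypothetical helper, not part of the original article; a minimal sketch only.
// Assumption: a Node.js response may deliver its body in several 'data' events,
// so calling callback() inside a 'data' handler, as the rule above does, could
// fire more than once for a single page. collectBody (an illustrative name)
// buffers every chunk and calls onDone exactly once, when 'end' fires.
function collectBody(res, onDone) {
    var chunks = [];
    res.on('data', function (chunk) {
        // chunk may be a Buffer or a string; normalise it to a Buffer
        chunks.push(Buffer.from(chunk));
    });
    res.on('end', function () {
        onDone(Buffer.concat(chunks));
    });
}
// Possible use inside the rule, assuming serverResData is a Buffer:
// http.get('http://xxx.com/getWxHis.php', function (r) {
//     collectBody(r, function (body) {
//         callback(Buffer.concat([body, serverResData]));
//     });
// }).on('error', function () { callback(serverResData); });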
&/code&&/pre&&/div&&p&That is the main part of the rule changes: the JSON content has to be sent to your own server, and the URL of the next page to jump to has to be fetched from the server. This involves four PHP files: getMsgJson.php, getMsgExt.php, getWxHis.php and getWxPost.php&/p&&br&&p&Before describing these 4 PHP files in detail, we can make one more change to improve the scraper's performance and reduce crashes:&/p&&p&The Android emulator frequently requests &a href=&https://link.zhihu.com/?target=http%3A//google.com& class=& external& target=&_blank& rel=&nofollow noreferrer&&&span class=&invisible&&http://&/span&&span class=&visible&&google.com&/span&&span class=&invisible&&&/span&&/a& addresses, which can make AnyProxy hang. Find the function replaceRequestOption : function(req,option) and change its body:&/p&&div class=&highlight&&&pre&&code class=&language-js&&&span&&/span&&span class=&nx&&replaceRequestOption&/span& &span class=&o&&:&/span& &span class=&kd&&function&/span&&span class=&p&&(&/span&&span class=&nx&&req&/span&&span class=&p&&,&/span&&span class=&nx&&option&/span&&span class=&p&&){&/span&
&span class=&kd&&var&/span& &span class=&nx&&newOption&/span& &span class=&o&&=&/span& &span class=&nx&&option&/span&&span class=&p&&;&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&sr&&/google/i&/span&&span class=&p&&.&/span&&span class=&nx&&test&/span&&span class=&p&&(&/span&&span class=&nx&&newOption&/span&&span class=&p&&.&/span&&span class=&nx&&headers&/span&&span class=&p&&.&/span&&span class=&nx&&host&/span&&span class=&p&&)){&/span&
&span class=&nx&&newOption&/span&&span class=&p&&.&/span&&span class=&nx&&hostname&/span& &span class=&o&&=&/span& &span class=&s2&&&www.baidu.com&&/span&&span class=&p&&;&/span&
&span class=&nx&&newOption&/span&&span class=&p&&.&/span&&span class=&nx&&port&/span& &span class=&o&&=&/span& &span class=&s2&&&80&&/span&&span class=&p&&;&/span&
&span class=&p&&}&/span&
&span class=&k&&return&/span& &span class=&nx&&newOption&/span&&span class=&p&&;&/span&
&span class=&p&&},&/span&
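&/code&&/pre&&/div&&p&Those are the changes to AnyProxy's rule file. Once the configuration is changed, restart AnyProxy. On macOS press control+c to stop the program, then run sudo anyproxy -i to start it again. If startup fails with an error, the program may not have exited cleanly and the port may still be in use. In that case run ps -a to find the pid of the process holding it, then run kill -9 pid, replacing pid with the number you found. Once the process is killed, AnyProxy can be started. As before, I am sorry that I am not familiar with the Windows commands.&/p&&br&&p&Next, the design of the receiving programs on the server is described in detail:&/p&&p&(The code below cannot be used as-is; it only illustrates the principle, and parts of it must be written for your own server, database and framework.)&/p&&p&1. getMsgJson.php: this program receives the message history JSON, parses it and stores it in the database&/p&&div class=&highlight&&&pre&&code class=&language-php&&&span&&/span&&span class=&cp&&&?&/span&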
&span class=&nv&&$str&/span& &span class=&o&&=&/span& &span class=&nv&&$_POST&/span&&span class=&p&&[&/span&&span class=&s1&&'str'&/span&&span class=&p&&];&/span&
&span class=&nv&&$url&/span& &span class=&o&&=&/span& &span class=&nv&&$_POST&/span&&span class=&p&&[&/span&&span class=&s1&&'url'&/span&&span class=&p&&];&/span&&span class=&c1&&//read the two POST variables first&/span&
&span class=&c1&&//first, process the url parameter&/span&
&span class=&nb&&parse_str&/span&&span class=&p&&(&/span&&span class=&nb&&parse_url&/span&&span class=&p&&(&/span&&span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nb&&urldecode&/span&&span class=&p&&(&/span&&span class=&nv&&$url&/span&&span class=&p&&)),&/span&&span class=&nx&&PHP_URL_QUERY&/span& &span class=&p&&),&/span&&span class=&nv&&$query&/span&&span class=&p&&);&/span&&span class=&c1&&//parse the URL&/span&
&span class=&nv&&$biz&/span& &span class=&o&&=&/span& &span class=&nv&&$query&/span&&span class=&p&&[&/span&&span class=&s1&&'__biz'&/span&&span class=&p&&];&/span&&span class=&c1&&//get the official account's biz&/span&
&span class=&c1&&//next, do the following&/span&
&span class=&c1&&//look up the biz in the database; if it does not exist, insert it, which means a new target official account has been added.&/span&
&span class=&c1&&//then parse the str variable&/span&
&span class=&nv&&$json&/span& &span class=&o&&=&/span& &span class=&nb&&json_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$str&/span&&span class=&p&&,&/span&&span class=&k&&true&/span&&span class=&p&&);&/span&&span class=&c1&&//try json_decode first&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&o&&!&/span&&span class=&nv&&$json&/span&&span class=&p&&){&/span&
&span class=&nv&&$json&/span& &span class=&o&&=&/span& &span class=&nb&&json_decode&/span&&span class=&p&&(&/span&&span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$str&/span&&span class=&p&&),&/span&&span class=&k&&true&/span&&span class=&p&&);&/span&&span class=&c1&&//if that fails, add an htmlspecialchars_decode step&/span&
&span class=&p&&}&/span&
&span class=&k&&foreach&/span&&span class=&p&&(&/span&&span class=&nv&&$json&/span&&span class=&p&&[&/span&&span class=&s1&&'list'&/span&&span class=&p&&]&/span& &span class=&k&&as&/span& &span class=&nv&&$k&/span&&span class=&o&&=&&/span&&span class=&nv&&$v&/span&&span class=&p&&){&/span&
&span class=&nv&&$type&/span& &span class=&o&&=&/span& &span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'comm_msg_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'type'&/span&&span class=&p&&];&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&nv&&$type&/span&&span class=&o&&==&/span&&span class=&mi&&49&/span&&span class=&p&&){&/span&&span class=&c1&&//type=49 means an article (image-and-text) message&/span&
&span class=&nv&&$content_url&/span& &span class=&o&&=&/span& &span class=&nb&&str_replace&/span&&span class=&p&&(&/span&&span class=&s2&&&&/span&&span class=&se&&\\&/span&&span class=&s2&&&&/span&&span class=&p&&,&/span& &span class=&s2&&&&&/span&&span class=&p&&,&/span& &span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'app_msg_ext_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'content_url'&/span&&span class=&p&&]));&/span&&span class=&c1&&//URL of the article message&/span&
&span class=&nv&&$is_multi&/span& &span class=&o&&=&/span& &span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'app_msg_ext_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'is_multi'&/span&&span class=&p&&];&/span&&span class=&c1&&//whether this is a multi-article message&/span&
&span class=&nv&&$datetime&/span& &span class=&o&&=&/span& &span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'comm_msg_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'datetime'&/span&&span class=&p&&];&/span&&span class=&c1&&//time the message was sent&/span&
&span class=&c1&&//insert the article URL into the scraping queue table here (the queue table is described later; it builds a bulk scraping queue from which another program picks the next official account or article to scrape)&/span&
&span class=&c1&&//check the database here for a duplicate of $content_url&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&s1&&'no identical $content_url exists in the database'&/span&&span class=&p&&)&/span& &span class=&p&&{&/span&
&span class=&nv&&$fileid&/span& &span class=&o&&=&/span& &span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'app_msg_ext_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'fileid'&/span&&span class=&p&&];&/span&&span class=&c1&&//an id assigned by WeChat&/span&
&span class=&nv&&$title&/span& &span class=&o&&=&/span& &span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'app_msg_ext_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'title'&/span&&span class=&p&&];&/span&&span class=&c1&&//article title&/span&
&span class=&nv&&$title_encode&/span& &span class=&o&&=&/span& &span class=&nb&&urlencode&/span&&span class=&p&&(&/span&&span class=&nb&&str_replace&/span&&span class=&p&&(&/span&&span class=&s2&&&&&&/span&&span class=&p&&,&/span& &span class=&s2&&&&&/span&&span class=&p&&,&/span& &span class=&nv&&$title&/span&&span class=&p&&));&/span&&span class=&c1&&//encoding the title is recommended so that emoji and other special symbols can be stored&/span&
&span class=&nv&&$digest&/span& &span class=&o&&=&/span& &span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'app_msg_ext_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'digest'&/span&&span class=&p&&];&/span&&span class=&c1&&//article digest&/span&
&span class=&nv&&$source_url&/span& &span class=&o&&=&/span& &span class=&nb&&str_replace&/span&&span class=&p&&(&/span&&span class=&s2&&&&/span&&span class=&se&&\\&/span&&span class=&s2&&&&/span&&span class=&p&&,&/span& &span class=&s2&&&&&/span&&span class=&p&&,&/span& &span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'app_msg_ext_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'source_url'&/span&&span class=&p&&]));&/span&&span class=&c1&&//URL of the original-article link&/span&
&span class=&nv&&$cover&/span& &span class=&o&&=&/span& &span class=&nb&&str_replace&/span&&span class=&p&&(&/span&&span class=&s2&&&&/span&&span class=&se&&\\&/span&&span class=&s2&&&&/span&&span class=&p&&,&/span& &span class=&s2&&&&&/span&&span class=&p&&,&/span& &span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'app_msg_ext_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'cover'&/span&&span class=&p&&]));&/span&&span class=&c1&&//cover image&/span&
&span class=&nv&&$is_top&/span& &span class=&o&&=&/span& &span class=&mi&&1&/span&&span class=&p&&;&/span&&span class=&c1&&//mark this as the headline article&/span&
&span class=&c1&&//now store it in the database&/span&
&span class=&k&&echo&/span& &span class=&s2&&&Headline title: &&/span&&span class=&o&&.&/span&&span class=&nv&&$title&/span&&span class=&o&&.&/span&&span class=&nv&&$lastId&/span&&span class=&o&&.&/span&&span class=&s2&&&&/span&&span class=&se&&\n&/span&&span class=&s2&&&&/span&&span class=&p&&;&/span&&span class=&c1&&//this echo shows up in the AnyProxy terminal&/span&
&span class=&p&&}&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&nv&&$is_multi&/span&&span class=&o&&==&/span&&span class=&mi&&1&/span&&span class=&p&&){&/span&&span class=&c1&&//if this is a multi-article message&/span&
&span class=&k&&foreach&/span&&span class=&p&&(&/span&&span class=&nv&&$v&/span&&span class=&p&&[&/span&&span class=&s1&&'app_msg_ext_info'&/span&&span class=&p&&][&/span&&span class=&s1&&'multi_app_msg_item_list'&/span&&span class=&p&&]&/span& &span class=&k&&as&/span& &span class=&nv&&$kk&/span&&span class=&o&&=&&/span&&span class=&nv&&$vv&/span&&span class=&p&&){&/span&&span class=&c1&&//loop over the remaining articles&/span&
&span class=&nv&&$content_url&/span& &span class=&o&&=&/span& &span class=&nb&&str_replace&/span&&span class=&p&&(&/span&&span class=&s2&&&&/span&&span class=&se&&\\&/span&&span class=&s2&&&&/span&&span class=&p&&,&/span&&span class=&s2&&&&&/span&&span class=&p&&,&/span&&span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$vv&/span&&span class=&p&&[&/span&&span class=&s1&&'content_url'&/span&&span class=&p&&]));&/span&&span class=&c1&&//URL of the article message&/span&
&span class=&c1&&//check the database again for a duplicate $content_url to avoid errors&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&s1&&'no row with the same $content_url exists in the database'&/span&&span class=&p&&){&/span&
&span class=&c1&&//Insert the article URL into the collection queue table here (the queue table is described later; it holds the batch-collection queue from which another program picks the next official account or article to collect)&/span&
&span class=&nv&&$title&/span& &span class=&o&&=&/span& &span class=&nv&&$vv&/span&&span class=&p&&[&/span&&span class=&s1&&'title'&/span&&span class=&p&&];&/span&&span class=&c1&&//article title&/span&
&span class=&nv&&$fileid&/span& &span class=&o&&=&/span& &span class=&nv&&$vv&/span&&span class=&p&&[&/span&&span class=&s1&&'fileid'&/span&&span class=&p&&];&/span&&span class=&c1&&//an id assigned by WeChat&/span&
&span class=&nv&&$title_encode&/span& &span class=&o&&=&/span& &span class=&nb&&urlencode&/span&&span class=&p&&(&/span&&span class=&nb&&str_replace&/span&&span class=&p&&(&/span&&span class=&s2&&&&&&/span&&span class=&p&&,&/span&&span class=&s2&&&&&/span&&span class=&p&&,&/span&&span class=&nv&&$title&/span&&span class=&p&&));&/span&&span class=&c1&&//encoding the title is recommended so that emoji and other special symbols can be stored&/span&
&span class=&nv&&$digest&/span& &span class=&o&&=&/span& &span class=&nb&&htmlspecialchars&/span&&span class=&p&&(&/span&&span class=&nv&&$vv&/span&&span class=&p&&[&/span&&span class=&s1&&'digest'&/span&&span class=&p&&]);&/span&&span class=&c1&&//article digest&/span&
&span class=&nv&&$source_url&/span& &span class=&o&&=&/span& &span class=&nb&&str_replace&/span&&span class=&p&&(&/span&&span class=&s2&&&&/span&&span class=&se&&\\&/span&&span class=&s2&&&&/span&&span class=&p&&,&/span&&span class=&s2&&&&&/span&&span class=&p&&,&/span&&span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$vv&/span&&span class=&p&&[&/span&&span class=&s1&&'source_url'&/span&&span class=&p&&]));&/span&&span class=&c1&&//URL of the original-source link (阅读原文)&/span&
&span class=&c1&&//$cover = getCover(str_replace(&\\&,&&,htmlspecialchars_decode($vv['cover'])));&/span&
&span class=&nv&&$cover&/span& &span class=&o&&=&/span& &span class=&nb&&str_replace&/span&&span class=&p&&(&/span&&span class=&s2&&&&/span&&span class=&se&&\\&/span&&span class=&s2&&&&/span&&span class=&p&&,&/span&&span class=&s2&&&&&/span&&span class=&p&&,&/span&&span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$vv&/span&&span class=&p&&[&/span&&span class=&s1&&'cover'&/span&&span class=&p&&]));&/span&&span class=&c1&&//cover image&/span&
&span class=&c1&&//now save the record to the database&/span&
&span class=&k&&echo&/span& &span class=&s2&&&标题:&&/span&&span class=&o&&.&/span&&span class=&nv&&$title&/span&&span class=&o&&.&/span&&span class=&nv&&$lastId&/span&&span class=&o&&.&/span&&span class=&s2&&&&/span&&span class=&se&&\n&/span&&span class=&s2&&&&/span&&span class=&p&&;&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&
&span class=&p&&}&/span&
&span class=&cp&&?&&/span&&span class=&x&&&/span&
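&/code&&/pre&&/div&&p&To stress this again: the code only illustrates the principle, and the parts described in comments have to be written yourself.&/p&&p&2. getMsgExt.php: the program that receives an article's read count and like count&/p&&div class=&highlight&&&pre&&code class=&language-php&&&span&&/span&&span class=&cp&&&?&/span&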
&span class=&nv&&$str&/span& &span class=&o&&=&/span& &span class=&nv&&$_POST&/span&&span class=&p&&[&/span&&span class=&s1&&'str'&/span&&span class=&p&&];&/span&
&span class=&nv&&$url&/span& &span class=&o&&=&/span& &span class=&nv&&$_POST&/span&&span class=&p&&[&/span&&span class=&s1&&'url'&/span&&span class=&p&&];&/span&&span class=&c1&&//first read the two POST variables&/span&
&span class=&c1&&//handle the url parameter first&/span&
&span class=&nb&&parse_str&/span&&span class=&p&&(&/span&&span class=&nb&&parse_url&/span&&span class=&p&&(&/span&&span class=&nb&&htmlspecialchars_decode&/span&&span class=&p&&(&/span&&span class=&nb&&urldecode&/span&&span class=&p&&(&/span&&span class=&nv&&$url&/span&&span class=&p&&)),&/span&&span class=&nx&&PHP_URL_QUERY&/span& &span class=&p&&),&/span&&span class=&nv&&$query&/span&&span class=&p&&);&/span&&span class=&c1&&//parse the URL&/span&
&span class=&nv&&$biz&/span& &span class=&o&&=&/span& &span class=&nv&&$query&/span&&span class=&p&&[&/span&&span class=&s1&&'__biz'&/span&&span class=&p&&];&/span&&span class=&c1&&//biz of the official account&/span&
&span class=&nv&&$sn&/span& &span class=&o&&=&/span& &span class=&nv&&$query&/span&&span class=&p&&[&/span&&span class=&s1&&'sn'&/span&&span class=&p&&];&/span&
&span class=&c1&&//then parse the str variable&/span&
&span class=&nv&&$json&/span& &span class=&o&&=&/span& &span class=&nb&&json_decode&/span&&span class=&p&&(&/span&&span class=&nv&&$str&/span&&span class=&p&&,&/span&&span class=&k&&true&/span&&span class=&p&&);&/span&&span class=&c1&&//decode the JSON&/span&
&span class=&c1&&//$sql = &select * from `文章表` where `biz`='&.$biz.&' and `content_url` like '%&.$sn.&%' limit 0,1&;&/span&
&span class=&c1&&//Look up the matching article by biz and sn (bind these values with a prepared statement in real code)&/span&
&span class=&nv&&$read_num&/span& &span class=&o&&=&/span& &span class=&nv&&$json&/span&&span class=&p&&[&/span&&span class=&s1&&'appmsgstat'&/span&&span class=&p&&][&/span&&span class=&s1&&'read_num'&/span&&span class=&p&&];&/span&&span class=&c1&&//read count&/span&
&span class=&nv&&$like_num&/span& &span class=&o&&=&/span& &span class=&nv&&$json&/span&&span class=&p&&[&/span&&span class=&s1&&'appmsgstat'&/span&&span class=&p&&][&/span&&span class=&s1&&'like_num'&/span&&span class=&p&&];&/span&&span class=&c1&&//like count&/span&
&span class=&c1&&//Likewise, delete the matching article from the collection queue table by sn here; the article can now leave the queue&/span&
&span class=&c1&&//$sql = &delete from `队列表` where `content_url` like '%&.$sn.&%'& &/span&
&span class=&c1&&//Then write the read count and like count to the article table.&/span&
&span class=&nv&&$msg&/span& &span class=&o&&=&/span& &span class=&k&&array&/span&&span class=&p&&(&/span&&span class=&s1&&'read_num'&/span& &span class=&o&&=&&/span& &span class=&nv&&$read_num&/span&&span class=&p&&,&/span& &span class=&s1&&'like_num'&/span& &span class=&o&&=&&/span& &span class=&nv&&$like_num&/span&&span class=&p&&);&/span&&span class=&c1&&//$msg was never defined in the original; reporting the two counters here is an assumption&/span&
&span class=&k&&exit&/span&&span class=&p&&(&/span&&span class=&nb&&json_encode&/span&&span class=&p&&(&/span&&span class=&nv&&$msg&/span&&span class=&p&&));&/span&&span class=&c1&&//this output can be shown in the anyproxy terminal&/span&
&span class=&cp&&?&&/span&&span class=&x&&&/span&
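&/code&&/pre&&/div&&p&3. getWxHis.php and getWxPost.php are similar, so they are covered together.&/p&&p&==========日更新==========&/p&&p&Because there are now two forms of the history page, the code that builds the history-page URL ought to change as well. In practice, though, testing so far shows that even when the WeChat client displays the second form, sending it a URL of the first form still works.&/p&&div class=&highlight&&&pre&&code class=&language-php&&&span&&/span&&span class=&cp&&&?&/span&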
&span class=&c1&&//getWxHis.php: called when the current page is an official account's history page&/span&
&span class=&c1&&//The collection queue table has a load field; a value of 1 means the row is currently being read&/span&
&span class=&c1&&//First delete the rows with load=1 from the collection queue table&/span&
&span class=&c1&&//Then select any one row from the queue table&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&s1&&'queue table is empty'&/span&&span class=&p&&){&/span&
&span class=&c1&&//If the queue table is empty, take a biz from the table of official accounts. That table has a collection-time field (called time here, named collect in the weixin table definition); sorting by it in ascending order yields the account with the smallest timestamp, whose biz is used&/span&
&span class=&nv&&$url&/span& &span class=&o&&=&/span& &span class=&s2&&&http://mp.weixin.qq.com/mp/getmasssendmsg?__biz=&&/span&&span class=&o&&.&/span&&span class=&nv&&$biz&/span&&span class=&o&&.&/span&&span class=&s2&&&#wechat_webview_type=1&wechat_redirect&&/span&&span class=&p&&;&/span&&span class=&c1&&//build the history-page URL of the official account (first page form)&/span&
&span class=&nv&&$url&/span& &span class=&o&&=&/span& &span class=&s2&&&https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=&&/span&&span class=&o&&.&/span&&span class=&nv&&$biz&/span&&span class=&o&&.&/span&&span class=&s2&&&&scene=124#wechat_redirect&&/span&&span class=&p&&;&/span&&span class=&c1&&//build the history-page URL (second page form); this assignment overrides the line above, so keep only the form you need&/span&
&span class=&c1&&//Update the collection-time field of that official-account row to the current timestamp.&/span&
&span class=&p&&}&/span&&span class=&k&&else&/span&&span class=&p&&{&/span&
&span class=&c1&&//take the content_url field of the selected row&/span&
&span class=&nv&&$url&/span& &span class=&o&&=&/span& &span class=&nv&&$content_url&/span&&span class=&p&&;&/span&
&span class=&c1&&//update the load field to 1&/span&
&span class=&p&&}&/span&
&span class=&k&&echo&/span& &span class=&s2&&&&script&setTimeout(function(){window.location.href='&&/span&&span class=&o&&.&/span&&span class=&nv&&$url&/span&&span class=&o&&.&/span&&span class=&s2&&&';},2000);&/script&&&/span&&span class=&p&&;&/span&&span class=&c1&&//turn the next $url into a JS snippet, which anyproxy injects into the WeChat page.&/span&
&span class=&cp&&?&&/span&&span class=&x&&&/span&
&/code&&/pre&&/div&&div class=&highlight&&&pre&&code class=&language-php&&&span&&/span&&span class=&cp&&&?&/span&
&span class=&c1&&//getWxPost.php: called when the current page is an article page&/span&
&span class=&c1&&//First delete the rows with load=1 from the collection queue table&/span&
&span class=&c1&&//Then select multiple rows from the queue table with “order by id asc” (note that this line differs from the program above)&/span&
&span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&o&&!&/span&&span class=&k&&empty&/span&&span class=&p&&(&/span&&span class=&s1&&'queue table'&/span&&span class=&p&&)&/span& &span class=&o&&&&&/span& &span class=&nb&&count&/span&&span class=&p&&(&/span&&span class=&s1&&'number of rows in the queue table'&/span&&span class=&p&&)&/span&&span class=&o&&&&/span&&span class=&mi&&1&/span&&span class=&p&&){&/span&&span class=&c1&&//(note that this line differs from the program above)&/span&
&span class=&c1&&//take the content_url field of row 0&/span&
&span class=&nv&&$url&/span& &span class=&o&&=&/span& &span class=&nv&&$content_url&/span&&span class=&p&&;&/span&
&span class=&c1&&//update the load field of row 0 to 1&/span&
&span class=&p&&}&/span&&span class=&k&&else&/span&&span class=&p&&{&/span&
&span class=&c1&&//When only the last row is left in the queue table, take a biz from the table of official accounts. That table has a collection-time field (called time here, named collect in the weixin table definition); sorting by it in ascending order yields the account with the smallest timestamp, whose biz is used&/span&
&span class=&nv&&$url&/span& &span class=&o&&=&/span& &span class=&s2&&&http://mp.weixin.qq.com/mp/getmasssendmsg?__biz=&&/span&&span class=&o&&.&/span&&span class=&nv&&$biz&/span&&span class=&o&&.&/span&&span class=&s2&&&#wechat_webview_type=1&wechat_redirect&&/span&&span class=&p&&;&/span&&span class=&c1&&//build the history-page URL of the official account (first page form)&/span&
&span class=&nv&&$url&/span& &span class=&o&&=&/span& &span class=&s2&&&https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=&&/span&&span class=&o&&.&/span&&span class=&nv&&$biz&/span&&span class=&o&&.&/span&&span class=&s2&&&&scene=124#wechat_redirect&&/span&&span class=&p&&;&/span&&span class=&c1&&//build the history-page URL (second page form); this assignment overrides the line above, so keep only the form you need&/span&
&span class=&c1&&//Update the collection-time field of that official-account row to the current timestamp.&/span&
&span class=&p&&}&/span&
&span class=&k&&echo&/span& &span class=&s2&&&&script&setTimeout(function(){window.location.href='&&/span&&span class=&o&&.&/span&&span class=&nv&&$url&/span&&span class=&o&&.&/span&&span class=&s2&&&';},2000);&/script&&&/span&&span class=&p&&;&/span&&span class=&c1&&//turn the next $url into a JS snippet, which anyproxy injects into the WeChat page.&/span&
&span class=&cp&&?&&/span&&span class=&x&&&/span&
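&/code&&/pre&&/div&&p&What these two programs do: they read the next item to collect from the queue table. If it is a history page, the biz is spliced into the URL (note: some readers in the comments assumed that key and pass_ticket also have to be spliced in; they do not) and the URL is written into the page as JavaScript. If the next item is an article, the article URL taken from the history-list JSON is output as JavaScript directly. Likewise, the article URL does not contain parameters such as uin and key; the client adds them automatically.&/p&&p&The small difference between the two programs exists because, when an official account's history page is loaded, anyproxy does two things at once: first, it &b&sends the history-message JSON to the server&/b&; second, it &b&fetches the URL of the next page&/b&. There is a &b&time lag&/b& between these two operations. The first request for the next-page URL should return the first article URL of &b&the current official account&/b&, but at that moment the history JSON has not yet reached the server, so all it can return is the history page of &b&the second official account&/b&. After the history page of &b&the second official account&/b& has been loaded, the next-page URL obtained is the first article of &b&the first official account&/b&. That is why the URL of &b&the next official account&/b& has to be fetched while one record is still left in the queue: if it were fetched only once the queue was empty, the situation described for the first request would recur, and the history lists and articles of two official accounts would end up being collected in an interleaved way.&/p&&p&The four PHP programs above refer to several database tables; their design is described next. Only the main fields are covered here; in a real application you would add whatever other fields your own program needs.&/p&&p&1. WeChat official account table&/p&&div class=&highlight&&&pre&&code class=&language-sql&&&span&&/span&&span class=&k&&CREATE&/span& &span class=&k&&TABLE&/span& &span class=&o&&`&/span&&span class=&n&&weixin&/span&&span class=&o&&`&/span& &span class=&p&&(&/span&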
&span class=&o&&`&/span&&span class=&n&&id&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&n&&AUTO_INCREMENT&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&biz&/span&&span class=&o&&`&/span& &span class=&nb&&varchar&/span&&span class=&p&&(&/span&&span class=&mi&&255&/span&&span class=&p&&)&/span& &span class=&k&&DEFAULT&/span& &span class=&s1&&''&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'biz, the unique identifier of the official account'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&collect&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&DEFAULT&/span& &span class=&s1&&'1'&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'timestamp of the last collection'&/span&&span class=&p&&,&/span&
&span class=&k&&PRIMARY&/span& &span class=&k&&KEY&/span& &span class=&p&&(&/span&&span class=&o&&`&/span&&span class=&n&&id&/span&&span class=&o&&`&/span&&span class=&p&&)&/span&
&span class=&p&&)&/span& &span class=&p&&;&/span&
&/code&&/pre&&/div&&p&2. WeChat article table&/p&&div class=&highlight&&&pre&&code class=&language-sql&&&span&&/span&&span class=&k&&CREATE&/span& &span class=&k&&TABLE&/span& &span class=&o&&`&/span&&span class=&n&&post&/span&&span class=&o&&`&/span& &span class=&p&&(&/span&
&span class=&o&&`&/span&&span class=&n&&id&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&n&&AUTO_INCREMENT&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&biz&/span&&span class=&o&&`&/span& &span class=&nb&&varchar&/span&&span class=&p&&(&/span&&span class=&mi&&255&/span&&span class=&p&&)&/span& &span class=&nb&&CHARACTER&/span& &span class=&k&&SET&/span& &span class=&n&&utf8&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'biz of the official account the article belongs to'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&field_id&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'an id defined by WeChat (fileid in the JSON), unique per article'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&title&/span&&span class=&o&&`&/span& &span class=&nb&&varchar&/span&&span class=&p&&(&/span&&span class=&mi&&255&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&DEFAULT&/span& &span class=&s1&&''&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'article title'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&title_encode&/span&&span class=&o&&`&/span& &span class=&nb&&text&/span& &span class=&nb&&CHARACTER&/span& &span class=&k&&SET&/span& &span class=&n&&utf8&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'URL-encoded title, so that emoji in the title can be stored'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&digest&/span&&span class=&o&&`&/span& &span class=&nb&&varchar&/span&&span class=&p&&(&/span&&span class=&mi&&500&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&DEFAULT&/span& &span class=&s1&&''&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'article digest'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&content_url&/span&&span class=&o&&`&/span& &span class=&nb&&varchar&/span&&span class=&p&&(&/span&&span class=&mi&&500&/span&&span class=&p&&)&/span& &span class=&nb&&CHARACTER&/span& &span class=&k&&SET&/span& &span class=&n&&utf8&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'article URL'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&source_url&/span&&span class=&o&&`&/span& &span class=&nb&&varchar&/span&&span class=&p&&(&/span&&span class=&mi&&500&/span&&span class=&p&&)&/span& &span class=&nb&&CHARACTER&/span& &span class=&k&&SET&/span& &span class=&n&&utf8&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'URL of the original-source link'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&cover&/span&&span class=&o&&`&/span& &span class=&nb&&varchar&/span&&span class=&p&&(&/span&&span class=&mi&&500&/span&&span class=&p&&)&/span& &span class=&nb&&CHARACTER&/span& &span class=&k&&SET&/span& &span class=&n&&utf8&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'cover image'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&is_multi&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'whether this is a multi-article message'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&is_top&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'whether this is the headline item'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&datetime&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'article timestamp'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&readNum&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&DEFAULT&/span& &span class=&s1&&'1'&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'article read count'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&likeNum&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&k&&DEFAULT&/span& &span class=&s1&&'0'&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'article like count'&/span&&span class=&p&&,&/span&
&span class=&k&&PRIMARY&/span& &span class=&k&&KEY&/span& &span class=&p&&(&/span&&span class=&o&&`&/span&&span class=&n&&id&/span&&span class=&o&&`&/span&&span class=&p&&)&/span&
&span class=&p&&)&/span& &span class=&p&&;&/span&
&/code&&/pre&&/div&&p&3. Collection queue table&/p&&div class=&highlight&&&pre&&code class=&language-sql&&&span&&/span&&span class=&k&&CREATE&/span& &span class=&k&&TABLE&/span& &span class=&o&&`&/span&&span class=&n&&tmplist&/span&&span class=&o&&`&/span& &span class=&p&&(&/span&
&span class=&o&&`&/span&&span class=&n&&id&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&n&&unsigned&/span& &span class=&k&&NOT&/span& &span class=&k&&NULL&/span& &span class=&n&&AUTO_INCREMENT&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&n&&content_url&/span&&span class=&o&&`&/span& &span class=&nb&&varchar&/span&&span class=&p&&(&/span&&span class=&mi&&255&/span&&span class=&p&&)&/span& &span class=&k&&DEFAULT&/span& &span class=&k&&NULL&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'article URL'&/span&&span class=&p&&,&/span&
&span class=&o&&`&/span&&span class=&k&&load&/span&&span class=&o&&`&/span& &span class=&nb&&int&/span&&span class=&p&&(&/span&&span class=&mi&&11&/span&&span class=&p&&)&/span& &span class=&k&&DEFAULT&/span& &span class=&s1&&'0'&/span& &span class=&k&&COMMENT&/span& &span class=&s1&&'flag: row is currently being read'&/span&&span class=&p&&,&/span&
&span class=&k&&PRIMARY&/span& &span class=&k&&KEY&/span& &span class=&p&&(&/span&&span class=&o&&`&/span&&span class=&n&&id&/span&&span class=&o&&`&/span&&span class=&p&&),&/span&
&span class=&k&&UNIQUE&/span& &span class=&k&&KEY&/span& &span class=&o&&`&/span&&span class=&n&&content_url&/span&&span class=&o&&`&/span& &span class=&p&&(&/span&&span class=&o&&`&/span&&span class=&n&&content_url&/span&&span class=&o&&`&/span&&span class=&p&&)&/span&
&span class=&p&&)&/span& &span class=&p&&;&/span&
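&/code&&/pre&&/div&&p&That completes the system for automatic batch collection of WeChat official account articles, made up of the WeChat client, a WeChat account, the anyproxy proxy server, the PHP programs and a MySQL database.&/p&&p&Later articles will go into more detail on how to save article content, how to make the collection system more stable, and other lessons learned from running my system.&/p&&p&Comments and discussion are very welcome; feel free to contact me on WeChat at cuijin.&/p&&p&&a href=&https://zhuanlan.zhihu.com/p/& class=&internal&&持续更新,微信公众号文章批量采集系统的构建&/a&&br&&/p&&p&&a href=&https://zhuanlan.zhihu.com/p/& class=&internal&&微信公众号文章采集的入口--历史消息页详解&/a&&br&&/p&&p&&a href=&https://zhuanlan.zhihu.com/p/& class=&internal&&微信公众号文章页的分析与采集&/a&&/p&&p&&a href=&https://zhuanlan.zhihu.com/p//edit& class=&internal&&提高微信公众号文章采集效率,anyproxy进阶使用方法&/a&&/p&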
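&p&The URL handling and the queue rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names extractBizAndSn, buildHistoryUrl and nextUrl are hypothetical, and a plain PHP array stands in for the tmplist queue table, whereas the original programs read from MySQL.&/p&

```php
<?php
// Illustrative sketch only. Function names and the array-based queue are
// assumptions made for this example; they are not part of the original programs.

/**
 * Extract __biz and sn from an article URL, following the same
 * urldecode / htmlspecialchars_decode / parse_url / parse_str chain as getMsgExt.php.
 */
function extractBizAndSn(string $url): array
{
    $queryString = parse_url(htmlspecialchars_decode(urldecode($url)), PHP_URL_QUERY);
    $query = [];
    if (is_string($queryString)) { // parse_url yields null or false when there is no usable query
        parse_str($queryString, $query);
    }
    return [
        'biz' => $query['__biz'] ?? null,
        'sn'  => $query['sn'] ?? null,
    ];
}

/** Build a history-page URL from a biz (second URL form shown in the article). */
function buildHistoryUrl(string $biz): string
{
    return 'https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=' . $biz . '&scene=124#wechat_redirect';
}

/**
 * Choose the URL that the injected script should jump to next.
 *
 * $queue        pending article URLs, ordered by id ascending
 * $nextBiz      biz of the official account with the oldest collection time
 * $minRemaining 0 for getWxHis.php (switch accounts when the queue is empty),
 *               1 for getWxPost.php (switch while one row is still left)
 */
function nextUrl(array $queue, string $nextBiz, int $minRemaining): string
{
    if (count($queue) > $minRemaining) {
        return $queue[0];
    }
    return buildHistoryUrl($nextBiz);
}
```

&p&With this sketch, getWxHis.php would correspond to nextUrl($queue, $biz, 0) and getWxPost.php to nextUrl($queue, $biz, 1). Keeping the threshold as a parameter makes the one-row difference between the two programs explicit.&/p&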
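&p& Self-media is the way things are heading, and ways of monetizing it will follow the same trend. Since the question is about making money from self-media, here are 15 monetization models for reference. &/p&&br&&figure&&img src=&https://pic3.zhimg.com/v2-a4eb067767daf3a2a51dbfa6_b.jpg& data-rawwidth=&524& data-rawheight=&291& class=&origin_image zh-lightbox-thumb& width=&524& data-original=&https://pic3.zhimg.com/v2-a4eb067767daf3a2a51dbfa6_r.jpg&&&/figure&&p&&b&1. Self-media platforms&/b&&/p&&p&Many large internet companies have built self-media platforms to attract creators and share advertising revenue with them. On the well-known &a href=&//link.zhihu.com/?target=http%3A//lusongsong.com/tags/baidu.html& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&Baidu&i class=&icon-external&&&/i&&/a& Baijia, for example, some creators were earning more than 10,000 yuan within half a month of its launch. This route suits creators who write well; it takes a certain command of language.&/p&&p&&b&2. Advertising revenue&/b&&/p&&p&Since the word self-media contains the word media, it naturally has the properties of media. How did television stations, newspapers and magazines make money in traditional media? The largest share was surely advertising, and self-media is no exception: for many creators the first income is probably an advertising fee.&/p&&p&There are two kinds of advertising: hard ads and advertorials. A hard ad is openly an advertisement and easily annoys followers; an advertorial is much gentler, and a well-written one may not even feel like an ad. Creators generally price ads according to factors such as account activity and follower count. Ads taken through 云堆新媒 can bring a creator a few hundred to over a thousand yuan in fees.&/p&&p&&b&3. Public relations&/b&&/p&&p&PR is mostly advertorial work. Quite a few creators now write exclusive interviews to set themselves apart from ordinary advertorials, and an interview comes across as considerably more upmarket. For small companies that cannot get any media outlet to interview them, such creators clearly play an important role. When the piece is distributed, the creator may receive promotion fees on top of the writing fee.&/p&&p&&b&4. Brand placement&/b&&/p&&p&Strictly speaking, brand placement is also a form of advertising, but a subtler one. A simple example shows the difference between hard ads, &a href=&//link.zhihu.com/?target=http%3A//lusongsong.com/info/post/239.html& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&advertorials and brand product&i class=&icon-external&&&/i&&/a&/service placement.&/p&&p&For example, Transformers 3 contains many soft ads that leave viewers with a direct impression. 舒化奶 (Shuhua milk) is one soft example; a scene that shows a place and names the location outright is a hard ad; and the cars in the film amount to brand placement.&/p&&p&Note: placements for well-known brands are generally open only to the biggest names in self-media; most creators can only look on.&/p&&p&&b&5. E-commerce&/b&&/p&&p&Self-media creators are not necessarily people who only write articles; they may be experts in some field, for instance people who share their experience of buying all sorts of things. Because they have distinctive insight, people interested in the topic follow them, and when they later recommend a product many followers will buy it.&/p&&p&The creators who earn the most are usually opinion leaders who influence many people. Most importantly, those people buy directly, and that is how the creator's self-media value is monetized.&/p&&p&&b&6. Selling products&/b&&/p&&p&This means selling products under your own brand, not running ads to sell other people's products.&/p&&p&Quite a few creators also sell services. For example, the WeChat official account 和谐电商 runs an online operations-outsourcing business; it first had the 和谐设计 brand and then added other related services. One thing stands out: in the &a href=&//link.zhihu.com/?target=http%3A//lusongsong.com/tags/hulianwang.html& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&internet&i class=&icon-external&&&/i&&/a& era, as long as you have users, you can sell anything!&/p&&p&&b&7. Consulting services&/b&&/p&&p&This kind of income is generally hard for non-professionals to earn. If you do well in a field and your peers recognize it, people come to you with questions about their projects, and charging for that is entirely reasonable. For most self-media creators this is relatively difficult; exclusive interviews may be a more dependable option!&/p&&p&&b&8. Speaking and training&/b&&/p&&p&Creators can develop in several directions: start by writing well, then gradually begin to give talks, and eventually move into speaking and training. A commercial speaking engagement typically pays, depending on your influence in the industry, anywhere from a few thousand to tens of thousands of yuan.&/p&&p&A creator who does well usually attracts learners, such as followers or companies that need in-house training (the results are generally better than with ordinary outside lecturers). The price is not low either, from 5,000 to tens of thousands of yuan per session, and with a few sessions a month the income is quite good!&/p&&p&&b&9. Membership&/b&&/p&&p&A paid membership scheme is in fact a thankless task. With many paying members you can hire people to maintain it, but with few members you cannot afford staff, and serving members single-handedly for one or two years is harder than you might imagine.&/p&&p&For example, 罗辑思维 uses a membership model and collected 9.6 million yuan in fees from 20,000 members. Some other creators also use memberships, whose members get benefits such as free access to certain sections, free popular books and consulting services.&/p&&p&&b&10. News clients&/b&&/p&&p&News clients such as Sina, Sohu and NetEase pay fees; Sohu's can reach 500-1000 yuan, so a creator who produces 20 articles a month can earn as much as 20,000 yuan. Often a piece cannot be published as an exclusive only. Of course, if you concentrate on one platform and write eight or ten pieces, that is still several thousand to ten thousand yuan, and where a client has new advertising and revenue-sharing plans there may be some extra income as well.&/p&&p&&b&11. Publishing books&/b&&/p&&p&Creators with stronger writing skills who find the platforms' ad revenue share too small can publish their own books, though this demands considerably higher writing ability. There is also the low-cost e-book model; many Weibo and WeChat official accounts sell e-books.&/p&&p&&b&12. Being bankrolled or changing jobs&/b&&/p&&p&This describes the current relationship between many companies and self-media: once a self-media outlet has built up some brand value, larger organizations may invest in it or acquire it, and it is common for creators who do well to be poached.&/p&&p&&b&13. Selling the account outright&/b&&/p&&p&Some people now simply sell off their WeChat official account or Weibo account,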
