SegmentFault: learn skills and solve problems.
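With rotateX(-90deg) on .ball_1, the element is not displayed in IE; if I delete that transform, the image shows up. But isn't rotate supported from IE10 onward? Could someone take a look and tell me whether my code is wrong?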
<div class="ui_base u_p3d">
    <div class="ball_c">Download</div>
    <div class="base u_p3d">
        <div class="pan"></div>
        <div class="ball_base u_p3d ball_1">
            <!--<img src="images/html.png" style="width:90 height:90 background:"/>
            <img src="../a.jpg" style="width:90 height:90 background:"/>
            <img src="../images/084bddfdca086e6abvuoE_fw658.jpg"
            style="width:90 height:90"/>-->
            <a href="http://php.net/" class="ball"></a>
        </div>
        <div class="ball_base u_p3d ball_2">
            <a href="https://www.java.com/zh_CN/" class="ball"></a>
        </div>
        <div class="ball_base u_p3d ball_3">
            <a href="http://www.w3school.com.cn/index.html" class="ball"></a>
        </div>
        <div class="ball_base u_p3d ball_4">
            <a href="http://www.netocr.com/register.do" class="ball"></a>
        </div>
        <div class="ball_base u_p3d ball_5">
            <a href="http://php.net/" class="ball"></a>
        </div>
        <div class="ball_base u_p3d ball_6">
            <a href="http://www.runoob.com/" class="ball"></a>
        </div>
        <div class="ball_base u_p3d ball_7">
            <a href="http://www.baidu.com" class="ball"></a>
        </div>
    </div>
</div>
@keyframes cir1 {
    from { transform: rotateY(0deg) rotateZ(10deg); }
    to { transform: rotateY(-350deg) rotateZ(10deg); }
}
@keyframes cir2 {
    from { transform: rotateY(-50deg) rotateZ(10deg); }
    to { transform: rotateY(-400deg) rotateZ(10deg); }
}
@keyframes cir3 {
    from { transform: rotateY(-100deg) rotateZ(10deg); }
    to { transform: rotateY(-450deg) rotateZ(10deg); }
}
@keyframes cir4 {
    from { transform: rotateY(-150deg) rotateZ(10deg); }
    to { transform: rotateY(-500deg) rotateZ(10deg); }
}
@keyframes cir5 {
    from { transform: rotateY(-200deg) rotateZ(10deg); }
    to { transform: rotateY(-550deg) rotateZ(10deg); }
}
@keyframes cir6 {
    from { transform: rotateY(-250deg) rotateZ(10deg); }
    to { transform: rotateY(-600deg) rotateZ(10deg); }
}
@keyframes cir7 {
    from { transform: rotateY(-300deg) rotateZ(10deg); }
    to { transform: rotateY(-650deg) rotateZ(10deg); }
}
@keyframes cir {
    from { transform: rotateX(80deg) rotateY(-10deg) rotateZ(0deg); }
    to { transform: rotateX(80deg) rotateY(-10deg) rotateZ(-360deg); }
}
@keyframes cir_p {
    from { transform: rotateZ(0deg); }
    to { transform: rotateZ(-360deg); }
}
.u_p3d {
    -webkit-transform-style: preserve-3d !important;
    -ms-transform-style: preserve-3d !important;
    transform-style: preserve-3d !important;
}
.ui_base {
width: 400
height:300
margin:100
-ms-perspective: 1000
-ms-perspective-origin: 50% 0%;
-webkit-perspective: 1000
-webkit-perspective-origin: 50% 0%;
perspective: 1000
perspective-origin: 50% 0%;
-webkit-transform: rotateX(80deg) rotateY(-10deg);
-ms-transform: rotateX(80deg) rotateY(-10deg);
transform: rotateX(80deg) rotateY(-10deg);
width: 350
height: 350
-webkit-backface-visibility:
-ms-backface-visibility:
backface-visibility:
animation: cir 15s linear 0
.ball_base {
-webkit-transform-origin: 225px 0
-ms-transform-origin: 225px 0
transform-origin: 225px 0
width: 225
height: 127
/*transition:all 2s ease-out 0*/
transition:all 2s 0
transform-origin: 50% 50%;
height: 90
line-height: 90
text-align:
/*background-image:url(../images/round.png);*/
background-size: 100% 100%;
font-size: 12
opacity: 1;
.ball_1 a{background-image:url(../images/PHP1.png);}
.ball_2 a{background-image:url(../images/java.png);}
.ball_3 a{background-image:url(../images/Android.png);}
.ball_4 a{background-image:url(../images/ios.png);}
.ball_5 a{background-image:url(../images/c.png);}
.ball_6 a{background-image:url(../images/html.png);}
.ball_7 a{background-image:url(../images/python1.png);}
transform-origin: 50% 50%;
width: 157
height: 157
line-height: 157
text-align:
background-image:url(../images/round.png);
background-size:100% 100%;
font-size: 24
opacity: 0.9;
width: 100%;
height: 100%;
background-image: url("../images/c5.png");
background-size: 100% 100%;
-webkit-animation: cir_p 5s linear 0
-ms-animation: cir_p 5s linear 0
animation: cir_p 5s linear 0
.ball_1 .ball {
animation: cir1 15s linear 0
-ms-animation: cir1 15s linear 0
transition-delay: 1300ms !
.ball_2 .ball {
-ms-animation: cir2 15s linear 0
animation: cir2 15s linear 0
transition-delay: 1100ms !
.ball_3 .ball {
-ms-animation: cir3 15s linear 0
animation: cir3 15s linear 0
transition-delay: 900ms !
.ball_4 .ball {
-ms-animation: cir4 15s linear 0
animation: cir4 15s linear 0
transition-delay: 700ms !
.ball_5 .ball {
-ms-animation: cir5 15s linear 0
animation: cir5 15s linear 0
transition-delay: 500ms !
.ball_6 .ball {
-ms-animation: cir6 15s linear 0
animation: cir6 15s linear 0
transition-delay: 300ms !
.ball_7 .ball {
-ms-animation: cir7 15s linear 0
animation: cir7 15s linear 0
transition-delay: 100ms !
.ball_1 { transform: rotateX(-90deg) rotateY(0deg) translateY(-70px); }
.ball_2 { transform: rotateX(-90deg) rotateY(50deg) translateY(-70px); }
.ball_3 { transform: rotateX(-90deg) rotateY(100deg) translateY(-70px); }
.ball_4 { transform: rotateX(-90deg) rotateY(150deg) translateY(-70px); }
.ball_5 { transform: rotateX(-90deg) rotateY(200deg) translateY(-70px); }
.ball_6 { transform: rotateX(-90deg) rotateY(250deg) translateY(-70px); }
.ball_7 { transform: rotateX(-90deg) rotateY(300deg) translateY(-70px); }
Many of the properties added in CSS3 are poorly supported across browsers. transform works from IE9 onward, though you may need to add the -ms- prefix.
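rotateX(-90deg) rotates the element 90 degrees about the X axis.
Think of the element as an object in three-dimensional space; what we see on screen is only its projection. After a 90-degree rotation it stands perpendicular to the screen, so without perspective its projection on the screen is empty. Only with perspective can it leave a visible projection. IE 10 and 11, however, do not support transform-style: preserve-3d, so a perspective set with the CSS perspective property on an ancestor does not reach nested 3D-transformed elements, and nothing is visible.
The practical workaround in IE 10/11 is to write the perspective inside the transform itself, as shown here.
.element {
    transform: perspective(233px) translateY(50px) rotateX(90deg);
}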
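To see numerically why an edge-on element vanishes without perspective, here is a minimal Python sketch of the projection arithmetic. It is an illustration only: the function names and the 100px element height are assumptions, and it models a single element using the usual rotateX and perspective(d) math (screen coordinates scaled by d / (d - z)), not the full page from the question.

```python
import math


def project_y(y_local, angle_deg, translate_y=0.0, perspective=None):
    """Screen-space y of a point at height y_local on a flat element.

    Models `translateY(translate_y) rotateX(angle_deg)`, optionally preceded
    by `perspective(perspective)`, for a point that starts at z = 0.
    """
    theta = math.radians(angle_deg)
    y = y_local * math.cos(theta) + translate_y  # rotate about X, then shift
    z = y_local * math.sin(theta)                # depth gained by the rotation
    if perspective is None:
        return y                                 # orthographic: z is ignored
    return y * perspective / (perspective - z)   # perspective divide


def projected_height(height, angle_deg, translate_y=0.0, perspective=None):
    """Visible height on screen of an element `height` pixels tall."""
    top = project_y(-height / 2, angle_deg, translate_y, perspective)
    bottom = project_y(height / 2, angle_deg, translate_y, perspective)
    return abs(bottom - top)


if __name__ == "__main__":
    # Hypothetical 100px-tall element; 50px and 233px come from the snippet above.
    print(projected_height(100, 90, translate_y=50))
    print(projected_height(100, 90, translate_y=50, perspective=233))
```

The first call prints a value that is zero up to floating-point rounding, while the second prints roughly 22.5, because the perspective divide scales the near and far edges of the element differently.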
Hi, these are my crawler notes (https://zhangslob.github.io/)
Updated on an ongoing basis (?o . o?)

I have a habit of keeping my notes in Youdao Cloud Notes, and this is a cleaned-up collection of them. It will keep growing, because I am a bug-making machine.

Parsing

Extracting the text of all nodes with XPath

<div id="test3">我左青龙,<span id="tiger">右白虎,<ul>上朱雀,<li>下玄武。</li></ul>老牛在当中,</span>龙头在胸口。</div>

Use XPath's string(.):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from scrapy.selector import Selector

text = '<div id="test3">我左青龙,<span id="tiger">右白虎,<ul>上朱雀,<li>下玄武。</li></ul>老牛在当中,</span>龙头在胸口。</div>'
s = Selector(text=text)
data = s.xpath('//div[@id="test3"]')
# string(.) concatenates the text of the node and all of its descendants
info = data.xpath('string(.)').extract()[0]
print(info)
# output: 我左青龙,右白虎,上朱雀,下玄武。老牛在当中,龙头在胸口。

Handling detail pages whose elements change

The problem arises like this: on many desktop sites, Lianjia for example, one page has field A, but on the next page field A is gone and field B takes its place, so the XPath locator stops matching. This is very common, and the general approach is:

1. Create a dict containing every field: data = {}.fromkeys(('url', 'price', 'address'))
2. Fill in values depending on whether the page has the field: if 'url' is present take its value, otherwise leave it empty.
3. This neatly solves the problem of incomplete matches.

Scrapy

Organizing files

Keep the spider logic and the parsing code in separate files: parsing modules live in utils/parse/ and spiders live in spiders/. This makes later changes easier.

Garbled Chinese output in Scrapy

In the settings file (settings.py), set: FEED_EXPORT_ENCODING = 'utf-8'

Using MongoDB with Scrapy

pipelines.py

1. Read the database host, port, and database name from the settings file.
2. Connect to the database using that information.
3. Write the data to the database (update with a unique key specified).
4. Close the database connection.

Note: opening and closing each happen only once, while the write runs once per item.
Redis does not need to be closed.

import pymongo


class MongoDBPipeline(object):
    """Pipeline that writes items to MongoDB."""

    def __init__(self, mongourl, mongoport, mongodb):
        """
        Store the MongoDB host, port, and database name.
        :param mongourl:
        :param mongoport:
        :param mongodb:
        """
        self.mongourl = mongourl
        self.mongoport = mongoport
        self.mongodb = mongodb

    @classmethod
    def from_crawler(cls, crawler):
        """
        Read the MongoDB host, port, and database name from settings.
        :param crawler:
        :return:
        """
        return cls(
            mongourl=crawler.settings.get("MONGO_URL"),
            mongoport=crawler.settings.get("MONGO_PORT"),
            mongodb=crawler.settings.get("MONGO_DB")
        )

    def open_spider(self, spider):
        """Connect to MongoDB when the spider opens."""
        self.client = pymongo.MongoClient(self.mongourl, self.mongoport)
        self.db = self.client[self.mongodb]

    def process_item(self, item, spider):
        """Write the item to the database."""
        name = item.__class__.__name__
        # self.db[name].insert(dict(item))
        # Upsert keyed on url_token so a re-crawled item updates in place.
        # update_one is used because PyMongo 4 removed Collection.update.
        self.db['user'].update_one({'url_token': item['url_token']}, {'$set': dict(item)}, upsert=True)
        return item

    def close_spider(self, spider):
        """Close the database connection."""
        self.client.close()

Downloading images with Scrapy

import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem


class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # One download request per image URL on the item
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # Keep only the paths of images that downloaded successfully
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item

Pausing a Scrapy crawl

scrapy crawl somespider -s JOBDIR=crawls/somespider-1

Distributed crawling with scrapy_redis

The shared queue and the duplicate filter are enough to meet distributed-crawling needs. One thing to watch is the Redis data type: the default is list, and setting REDIS_START_URLS_AS_SET = True in settings.py switches to the Redis set type, which deduplicates the seed URLs.

Installation

Timeouts

Set a custom timeout:
sudo pip3 --default-timeout=100 install -U scrapy
Or use another package index:
sudo pip3 install scrapy -i https://pypi.tuna.tsinghua.edu.cn/simple

Permission errors

Installing a module fails with: PermissionError: [WinError 5] 拒绝访问。: 'c:\\program files\\python35\\Lib\\site-packages\\lxml' (access denied)
Simplest fix: pip install --user lxml

PyCharm

.gitignore files

Install the plugin: Preferences > Plugins > Browse repositories... > Search for ".ignore" > Install Plugin
After that it is easy to add files to .gitignore.

Showing functions

Click Show Members in the project view, and each file lists its classes and functions.

Activation codes

1. http://idea.liyang.io
2. http://xidea.online

Do not upgrade to the latest version.

Data

MongoDB export commands

λ mongoexport -d test -c set --type=csv -f name,age -o set.csv
λ mongoexport -h 10.10.10.11 -d test -c test --type=csv -f url,id,title -o data.csv

Miscellaneous

requirements.txt

Tip: pigar (https://github.com/damnever/pigar) can generate a requirements.txt file in one step.
Installation: pip install pigar
Usage: pigar

That's all for today; I'll add more later.
Feel free to get in touch through my WeChat official account (zhangslob).
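
The field-defaulting idea from the parsing notes (a dict built with fromkeys, then filled only with the fields a page actually has) can be sketched as follows. This is an illustrative sketch under assumptions: FIELDS, normalize, and the sample records are made up for the example, and a real spider would fill the record from XPath results rather than literals.

```python
FIELDS = ('url', 'price', 'address')  # assumed field names, as in the notes


def normalize(record):
    """Return a dict with every expected field, None where the page lacked it."""
    data = dict.fromkeys(FIELDS)      # every key starts out as None
    for key in FIELDS:
        if key in record:             # copy only the fields the page provided
            data[key] = record[key]
    return data


if __name__ == "__main__":
    page_a = {'url': 'https://example.com/1', 'price': '100'}
    page_b = {'url': 'https://example.com/2', 'address': 'somewhere'}
    print(normalize(page_a))
    print(normalize(page_b))
```

Because every record ends up with the same keys, downstream code such as a database pipeline can rely on a fixed schema even when individual pages omit fields.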
Some time ago, following someone else's code, I used Python to build a 12306-style command-line train ticket lookup tool, and it was fun. So I built a similar one myself: a Ctrip flight ticket lookup tool.

A query on the Ctrip website looks like this:
[Image: https://pic1.zhimg.com/v2-89f157c0a8bd7d891fa44_b.jpg]

The same query in the Python command-line interface looks like this:
[Image: https://pic3.zhimg.com/v2-b8fd8a32f0dfab58b37e386_b.jpg]

Enter the departure city, destination, and flight date to see the available flights, airports, departure and arrival times, fares, and other details.

Video demo: https://www.zhihu.com/video/767936

The source code consists of two files:
1. air_stations.py
2. airline_ticket.py

# 1. air_stations.py
import re
import os
import json
import requests
from pprint import pprint

url = 'http://webresource.c-ctrip.com/code/cquery/resource/address/flight/flight_new_poi_gb2312.js?CR__00_00_00'
response = requests.get(url, verify=False)
# Pair each Chinese city name with the uppercase code in parentheses after it
station = re.findall(r'([\u4e00-\u9fa5]+)\(([A-Z]+)\)', response.text)
stations = dict(station)
pprint(stations, indent=4)

# 2. airline_ticket.py
# Queries Ctrip flight tickets for a given departure date, departure city, and
# destination city (modeled on a 12306 train ticket lookup program).
import requests, json, os
from docopt import docopt
from prettytable import PrettyTable
from colorama import init, Fore
from air_stations import stations

init()  # enable ANSI colors on Windows terminals

fromCity = input('Please input the city you want leave :')
toCity = input('Please input the city you will arrive :')
tripDate = input('Please input the date(Example:) :')


class TrainsCollection:
    header = '航空公司 航班 机场 时间 机票价格 机场建设费'.split()

    def __init__(self, airline_tickets):
        self.airline_tickets = airline_tickets

    @property
    def plains(self):
        # No complete airline table was found, so the common carriers are collected in this dict for now.
        # A code missing from the dict raises KeyError, and the airline's code is shown instead.
        air_company = {"G5": "华夏航空", "9C": "春秋航空", "MU": "东方航空", "NS": "河北航空", "HU": "海南航空", "HO": "吉祥航空", "CZ": "南方航空", "FM": "上海航空", "ZH": "深圳航空", "MF": "厦门航空", "CA": "中国国航", "KN": "中国联航"}
        for item in self.airline_tickets:
            try:
                strs = air_company[item['alc']]
            except KeyError:
                strs = item['alc']
            airline_data = [
                Fore.BLUE + strs + Fore.RESET,
                Fore.BLUE + item['fn'] + Fore.RESET,
                '\n'.join([Fore.YELLOW + item['dpbn'] + Fore.RESET,
                           Fore.CYAN + item['apbn'] + Fore.RESET]),
                '\n'.join([Fore.YELLOW + item['dt'] + Fore.RESET,
                           Fore.CYAN + item['at'] + Fore.RESET]),
                item['lp'],
                item['tax'],
            ]
            yield airline_data

    def pretty_print(self):
        # Print the flight table to the terminal row by row with PrettyTable
        pt = PrettyTable()
        pt.field_names = self.header
        for airline_data in self.plains:
            pt.add_row(airline_data)
        print(pt)


def doit():
    headers = {
        "Cookie": "custom",      # fill in your own value
        "User-Agent": "custom",  # fill in your own value
    }
    arguments = {
        'from': fromCity,
        'to': toCity,
        'date': tripDate
    }
    DCity1 = stations[arguments['from']]
    ACity1 = stations[arguments['to']]
    DDate1 = arguments['date']
    url = ("http://flights.ctrip.com/domesticsearch/search/SearchFirstRouteFlights?DCity1={}&ACity1={}&SearchType=S&DDate1={}").format(DCity1, ACity1, DDate1)
    try:
        r = requests.get(url, headers=headers, verify=False)
    except Exception as e:
        print(repr(e))
        print(url)
        return  # nothing to parse if the request failed
    airline_tickets = r.json()['fis']
    TrainsCollection(airline_tickets).pretty_print()


if __name__ == '__main__':
    doit()

This little program can be extended further: for example, saving query records locally (as a txt file or in a database), scheduling automatic queries, sending an email alert when a match turns up, or wrapping it in a desktop application with a Python GUI library.

Learning a bit of programming pays off in many ways.
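
The regular expression in air_stations.py does the real work of turning Ctrip's station list into a lookup table, so here is a small offline sketch of that one step. The sample string, STATION_PATTERN, and parse_stations are assumptions made up for illustration; the actual format of the downloaded file is not shown in the article.

```python
import re

# A run of Chinese characters followed by an uppercase code in parentheses.
STATION_PATTERN = re.compile(r'([\u4e00-\u9fa5]+)\(([A-Z]+)\)')


def parse_stations(text):
    """Map each city name found in text to the code that follows it."""
    # findall returns (name, code) tuples, which dict() accepts directly.
    return dict(STATION_PATTERN.findall(text))


if __name__ == "__main__":
    sample = '北京(BJS)|上海(SHA)|广州(CAN)'  # made-up input for illustration
    print(parse_stations(sample))
```

Running the parser on a local sample like this makes it easy to check the pattern without issuing a network request each time.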
As someone learning to program, I can't go hunting for resources the way ordinary people do, opening Baidu and searching blindly. This is the moment to shout "Python is the way." I recently stumbled on a good site and decided to put it to use, so I wrote a command-line tool for fetching magnet links.

Development environment

Windows 10 + Python 3

Installation

Install with pip

$ pip install torrent-cli

Install from source

$ git clone https://github.com/chenjiandongx/torrent-cli.git
$ cd torrent-cli
$ pip install -r requirements.txt
$ python setup.py install

Usage

C:\Users\chenjiandongx>torrent-cli
usage: torrent-cli [-h] [-k KEYWORD] [-n NUM] [-s SORT_BY] [-o OUTPUT] [-p]
Magnets-Getter CLI Tools.
optional arguments:
-h, --help
show this help message and exit
-k KEYWORD, --keyword KEYWORD
magnet keyword.
-n NUM, --num NUM
magnet number.(default 20)
-s SORT_BY, --sort-by SORT_BY
0: Sort by date,1: Sort by size.(default 0)
-o OUTPUT, --output OUTPUT
output file path, supports csv and json format.
-p, --pretty-oneline
show magnets info with one line.
-v, --version
version information.

A simple demo

C:\Users\chenjiandongx>torrent-cli -k 战狼2
Crawling data for you.....
磁链: magnet:?xt=urn:btih:7ccaddac5d1cceee047
名称: 战狼2.Wolf.Warriors.2.p.WEB-DL.X264.AAC-国语中字-RARBT
大小: 2.85 GB
磁链: magnet:?xt=urn:btih:b441bedeb5d64b5843
名称: 【百度搜:爱诺影视】战狼2.2017.HD1080P.国语中字
大小: 2.1 GB
磁链: magnet:?xt=urn:btih:621cbac6d305a625ecbc81
名称: 战狼2.新战狼.2017.HD2160P.X264.AAC.国语中字
大小: 5.43 GB
磁链: magnet:?xt=urn:btih:a8eccdababa8616c21
名称: 战狼2.Wolf.Warriors.2.p.WEB-DL.X264.AAC-bbs.homefei.me
大小: 3.5 GB
磁链: magnet:?xt=urn:btih:bf62ed12
名称: 战狼2.新战狼.2017.HD1080P.X264.AAC.国语中字
大小: 2.61 GB

Show each result on one line, sorted by size

C:\Users\chenjiandongx>torrent-cli -k 战狼2 -p -s 1
Crawling data for you.....
magnet:?xt=urn:btih:bd6fef7ca28 7.75 GB
magnet:?xt=urn:btih:8cdba300f4cdd.49 GB
magnet:?xt=urn:btih:7e364fe1efc2cdc34df90fe 5.44 GB
magnet:?xt=urn:btih:fad291b24cbef6fa6ce127ae GB
magnet:?xt=urn:btih:de42bc281cf39f0f489b64f06cc83 5.44 GB
magnet:?xt=urn:btih:621cbac6d305a625ecbc81 5.43 GB
magnet:?xt=urn:btih:bf2f9dd23e3fbfca 3.56 GB
magnet:?xt=urn:btih:a8eccdababa.5 GB
magnet:?xt=urn:btih:f26ceda5aa89d6db067d751c0d3f 2.76 GB
magnet:?xt=urn:btih:244d7f5b281b0ccbddf4 2.72 GB
magnet:?xt=urn:btih:6bb32d1fd3ff23f163c8af21c27d52 2.69 GB
magnet:?xt=urn:btih:c60e1fce8d3bfdfd4f9 2.69 GB
magnet:?xt=urn:btih:bf62ed12 2.61 GB
magnet:?xt=urn:btih:770ad4fbdf7d7aa97d2dd8afa691b5 2.55 GB
magnet:?xt=urn:btih:5e6bce50844bfb85 2.32 GB
magnet:?xt=urn:btih:34f0c102d57a9ae77a93a22de7aaa 2.32 GB
magnet:?xt=urn:btih:2fb0b1aa149c0a2f7d 2.32 GB
magnet:?xt=urn:btih:9aabcd27ca36d8c15ae56ccf5b0b 2.32 GB
magnet:?xt=urn:btih:afbda9a99b38f4fb974931fac6bf0 2.32 GB
magnet:?xt=urn:btih:b2489aed91b9a154bcb3ca1707bd 720.82 MB

Results can also be saved as a csv or json file

C:\Users\chenjiandongx>torrent-cli -k 战狼2 -o movie.csv

Project: chenjiandongx/torrent-cli (https://github.com/chenjiandongx/torrent-cli)
Star first, then hop on!

Update

The project seems to have made it onto Python Trending???
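
For readers curious how a help screen like the one above is produced, here is a minimal argparse sketch that mirrors the listed options. It is not the project's actual source: build_parser and the choices restriction are assumptions, the help strings are copied from the usage output, and the -v/--version flag is left out because the version string is not given here.

```python
import argparse


def build_parser():
    """Build a parser with the options shown in the help text (sketch only)."""
    parser = argparse.ArgumentParser(
        prog='torrent-cli', description='Magnets-Getter CLI Tools.')
    parser.add_argument('-k', '--keyword', help='magnet keyword.')
    parser.add_argument('-n', '--num', type=int, default=20,
                        help='magnet number.(default 20)')
    parser.add_argument('-s', '--sort-by', type=int, default=0, choices=(0, 1),
                        help='0: Sort by date,1: Sort by size.(default 0)')
    parser.add_argument('-o', '--output',
                        help='output file path, supports csv and json format.')
    parser.add_argument('-p', '--pretty-oneline', action='store_true',
                        help='show magnets info with one line.')
    return parser


if __name__ == "__main__":
    # Same flags as the one-line, sorted-by-size demo, with a placeholder keyword.
    args = build_parser().parse_args(['-k', 'demo', '-p', '-s', '1'])
    print(args.keyword, args.num, args.sort_by, args.pretty_oneline)
```

argparse derives the attribute names from the long options, so --sort-by becomes args.sort_by and --pretty-oneline becomes args.pretty_oneline.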
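&figure&&img src=&https://pic1.zhimg.com/v2-d3158bac6f_b.jpg& data-rawwidth=&2396& data-rawheight=&1404& class=&origin_image zh-lightbox-thumb& width=&2396& data-original=&https://pic1.zhimg.com/v2-d3158bac6f_r.jpg&&&/figure&&p&&b&Preface: this article touches on database reads and writes, Python basics, and the browser's developer tools. It is aimed at readers with some programming background who already know a little Python.&/b&&/p&&p&&b&Environment: PyCharm + Chrome + MongoDB
Windows 10&/b&&/p&&p&A crawler fetches data in much the same way an ordinary user opens a web page. Viewing a friend's Qzone in a browser requires logging in first, and only then can the posts (说说) be read, so the login step is the first thing to solve.&/p&&p&&b&1. Simulating a Qzone login&/b&&/p&&p&
I wanted to watch the whole login process, so instead of selenium + phantomjs I drive a real Chrome window. Besides selenium and Chrome you also need ChromeDriver. The official site offers no win64 build, but the win32 build runs fine on 64-bit Windows. I used ChromeDriver 2.30 with Chrome 61.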
&/p&&p&ChromeDriver download page: &a href=&https://link.zhihu.com/?target=https%3A//chromedriver.storage.googleapis.com/index.html& class=& external& target=&_blank& rel=&nofollow noreferrer&&&span class=&invisible&&https://&/span&&span class=&visible&&chromedriver.storage.googleapis.com&/span&&span class=&invisible&&/index.html&/span&&span class=&ellipsis&&&/span&&/a&&/p&&div class=&highlight&&&pre&&code class=&language-python&&&span&&/span&&span class=&kn&&from&/span& &span class=&nn&&selenium&/span& &span class=&kn&&import&/span& &span class=&n&&webdriver&/span&
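&span class=&kn&&from&/span& &span class=&nn&&selenium.webdriver.common.by&/span& &span class=&kn&&import&/span& &span class=&n&&By&/span&
&span class=&kn&&from&/span& &span class=&nn&&selenium.webdriver.support.ui&/span& &span class=&kn&&import&/span& &span class=&n&&WebDriverWait&/span&
&span class=&kn&&from&/span& &span class=&nn&&selenium.webdriver.support&/span& &span class=&kn&&import&/span& &span class=&n&&expected_conditions&/span& &span class=&k&&as&/span& &span class=&n&&EC&/span&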
&span class=&k&&def&/span& &span class=&nf&&Start_Login&/span&&span class=&p&&():&/span&
    &span class=&n&&driver&/span& &span class=&o&&=&/span& &span class=&n&&webdriver&/span&&span class=&o&&.&/span&&span class=&n&&Chrome&/span&&span class=&p&&(&/span&&span class=&n&&executable_path&/span&&span class=&o&&=&/span&&span class=&s1&&'D:&/span&&span class=&se&&\\&/span&&span class=&s1&&phantomjs-2.1.1&/span&&span class=&se&&\\&/span&&span class=&s1&&bin&/span&&span class=&se&&\\&/span&&span class=&s1&&chromedriver.exe'&/span&&span class=&p&&)&/span& &span class=&c1&&# path to chromedriver&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&get&/span&&span class=&p&&(&/span&&span class=&s1&&'https://qzone.qq.com/'&/span&&span class=&p&&)&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&switch_to&/span&&span class=&o&&.&/span&&span class=&n&&frame&/span&&span class=&p&&(&/span&&span class=&s1&&'login_frame'&/span&&span class=&p&&)&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&find_element_by_id&/span&&span class=&p&&(&/span&&span class=&s1&&'switcher_plogin'&/span&&span class=&p&&)&/span&&span class=&o&&.&/span&&span class=&n&&click&/span&&span class=&p&&()&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&find_element_by_id&/span&&span class=&p&&(&/span&&span class=&s1&&'u'&/span&&span class=&p&&)&/span&&span class=&o&&.&/span&&span class=&n&&clear&/span&&span class=&p&&()&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&find_element_by_id&/span&&span class=&p&&(&/span&&span class=&s1&&'u'&/span&&span class=&p&&)&/span&&span class=&o&&.&/span&&span class=&n&&send_keys&/span&&span class=&p&&(&/span&&span class=&s1&&'yourQQCode'&/span&&span class=&p&&)&/span&
    &span class=&c1&&# enter your QQ number here&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&find_element_by_id&/span&&span class=&p&&(&/span&&span class=&s1&&'p'&/span&&span class=&p&&)&/span&&span class=&o&&.&/span&&span class=&n&&clear&/span&&span class=&p&&()&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&find_element_by_id&/span&&span class=&p&&(&/span&&span class=&s1&&'p'&/span&&span class=&p&&)&/span&&span class=&o&&.&/span&&span class=&n&&send_keys&/span&&span class=&p&&(&/span&&span class=&s1&&'yourPasswords'&/span&&span class=&p&&)&/span&
    &span class=&c1&&# enter your QQ password here&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&find_element_by_id&/span&&span class=&p&&(&/span&&span class=&s1&&'login_button'&/span&&span class=&p&&)&/span&&span class=&o&&.&/span&&span class=&n&&click&/span&&span class=&p&&()&/span&
    &span class=&c1&&# Note: my Yellow Diamond membership has expired, so a dialog pops up on every login and has to be dismissed&/span&
    &span class=&n&&WebDriverWait&/span&&span class=&p&&(&/span&&span class=&n&&driver&/span&&span class=&p&&,&/span& &span class=&mi&&20&/span&&span class=&p&&)&/span&&span class=&o&&.&/span&&span class=&n&&until&/span&&span class=&p&&(&/span&&span class=&n&&EC&/span&&span class=&o&&.&/span&&span class=&n&&presence_of_element_located&/span&&span class=&p&&((&/span&&span class=&n&&By&/span&&span class=&o&&.&/span&&span class=&n&&ID&/span&&span class=&p&&,&/span& &span class=&s1&&'qz_dialog_instance_qzDialog1'&/span&&span class=&p&&)))&/span&
    &span class=&n&&driver&/span&&span class=&o&&.&/span&&span class=&n&&find_element_by_id&/span&&span class=&p&&(&/span&&span class=&s1&&'dialog_button_1'&/span&&span class=&p&&)&/span&&span class=&o&&.&/span&&span class=&n&&click&/span&&span class=&p&&()&/span&
&/code&&/pre&&/div&&p&&br&&/p&&p&&b&2. Tracing the data source with the browser's developer tools&/b&&/p&&p&Before opening the posts page, open the developer tools, click Network and filter by XHR. Several URLs appear, and inspecting each Response shows which one carries the post data.&/p&&figure&&img src=&https://pic3.zhimg.com/v2-32fdb28c027a925e08bba_b.jpg& data-size=&normal& data-rawwidth=&465& data-rawheight=&375& class=&origin_image zh-lightbox-thumb& width=&465& data-original=&https://pic3.zhimg.com/v2-32fdb28c027a925e08bba_r.jpg&&&figcaption&URLs shown after filtering by XHR&/figcaption&&/figure&&figure&&img src=&https://pic3.zhimg.com/v2-dbdceacb167c6_b.jpg& data-size=&normal& data-rawwidth=&1164& data-rawheight=&654& class=&origin_image zh-lightbox-thumb& width=&1164& data-original=&https://pic3.zhimg.com/v2-dbdceacb167c6_r.jpg&&&figcaption&Parameters required to fetch the data&/figcaption&&/figure&&p&Repeated requests show that the g_tk value keeps changing. It is computed by a hash-style algorithm, which I found after searching online:&/p&&div class=&highlight&&&pre&&code class=&language-python&&&span&&/span&&span class=&c1&&# Computes Tencent's g_tk value from the login cookies&/span&
&span class=&k&&def&/span& &span class=&nf&&get_g_tk&/span&&span class=&p&&(&/span&&span class=&n&&cookie&/span&&span class=&p&&):&/span&
    &span class=&n&&hashes&/span& &span class=&o&&=&/span& &span class=&mi&&5381&/span&
    &span class=&k&&for&/span& &span class=&n&&letter&/span& &span class=&ow&&in&/span& &span class=&n&&cookie&/span&&span class=&p&&[&/span&&span class=&s1&&'p_skey'&/span&&span class=&p&&]:&/span&
        &span class=&n&&hashes&/span& &span class=&o&&+=&/span& &span class=&p&&(&/span&&span class=&n&&hashes&/span& &span class=&o&&<<&/span& &span class=&mi&&5&/span&&span class=&p&&)&/span& &span class=&o&&+&/span& &span class=&nb&&ord&/span&&span class=&p&&(&/span&&span class=&n&&letter&/span&&span class=&p&&)&/span&
        &span class=&c1&&# ord() returns the character's code point&/span&
    &span class=&k&&return&/span& &span class=&n&&hashes&/span& &span class=&o&&&&/span& &span class=&mh&&0x7fffffff&/span&
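&/code&&/pre&&/div&&p&&b&3. Fetching the data and storing it in a database&/b&&/p&&p&
Now that we know where the data comes from, we can store what we fetch in a database. This time we use MongoDB, which stores data as BSON, a format similar to JSON. Two things need handling while fetching: first, whether you have permission to view the space at all; second, when access is allowed, the crawl cannot go on forever, so you need to detect when all of that space's posts have been fetched. QQ numbers that cannot be accessed are collected in a list during the crawl and printed at the end of the run. After a long wait and a lot of back-and-forth with the server, all the data is in the database, and the next step is processing it!! (Exciting, isn't it?)&/p&&p&
&b& Note: the friends' QQ numbers come from a CSV file exported from the QQ Mail address book. If you only want to crawl your own posts, set the QQCode value in param to your own number, and the outer loop over the list of QQ numbers is no longer needed.&/b&&/p&&div class=&highlight&&&pre&&code class=&language-python&&&span&&/span&&span class=&k&&def&/span& &span class=&nf&&Start_Spider&/span&&span class=&p&&():&/span&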
    &span class=&n&&sessions&/span& &span class=&o&&=&/span& &span class=&n&&requests&/span&&span class=&o&&.&/span&&span class=&n&&session&/span&&span class=&p&&()&/span&
    &span class=&n&&namelist&/span& &span class=&o&&=&/span& &span class=&p&&[]&/span& &span class=&c1&&# user names&/span&
    &span class=&n&&codelist&/span& &span class=&o&&=&/span& &span class=&p&&[]&/span& &span class=&c1&&# QQ numbers&/span&
    &span class=&n&&black_list&/span& &span class=&o&&=&/span& &span class=&p&&[]&/span& &span class=&c1&&# QQ numbers whose space we cannot access&/span&
    &span class=&n&&header_list&/span& &span class=&o&&=&/span& &span class=&p&&[&/span&&span class=&n&&headers1&/span&&span class=&p&&,&/span& &span class=&n&&headers2&/span&&span class=&p&&,&/span& &span class=&n&&headers&/span&&span class=&p&&]&/span&
    &span class=&c1&&# headersList&/span&
    &span class=&n&&info&/span& &span class=&o&&=&/span& &span class=&nb&&dict&/span&&span class=&p&&()&/span&
    &span class=&n&&i&/span& &span class=&o&&=&/span& &span class=&mi&&0&/span&
    &span class=&n&&client&/span& &span class=&o&&=&/span& &span class=&n&&pymongo&/span&&span class=&o&&.&/span&&span class=&n&&MongoClient&/span&&span class=&p&&(&/span&&span class=&s1&&'localhost'&/span&&span class=&p&&,&/span& &span class=&mi&&27017&/span&&span class=&p&&)&/span&
    &span class=&n&&db_name&/span& &span class=&o&&=&/span& &span class=&s1&&'QQZone'&/span&
    &span class=&n&&db&/span& &span class=&o&&=&/span& &span class=&n&&client&/span&&span class=&p&&[&/span&&span class=&n&&db_name&/span&&span class=&p&&]&/span&
    &span class=&n&&collection&/span& &span class=&o&&=&/span& &span class=&n&&db&/span&&span class=&p&&[&/span&&span class=&s1&&'QQ_moodfirst'&/span&&span class=&p&&]&/span&
    &span class=&n&&namelist&/span&&span class=&p&&,&/span& &span class=&n&&codelist&/span&&span class=&p&&,&/span& &span class=&n&&info&/span& &span class=&o&&=&/span& &span class=&n&&read_csv&/span&&span class=&p&&()&/span&
    &span class=&n&&cookies&/span&&span class=&p&&,&/span& &span class=&n&&g_tk&/span&&span class=&p&&,&/span& &span class=&n&&g_qzonetoken&/span& &span class=&o&&=&/span& &span class=&n&&Start_Login&/span&&span class=&p&&()&/span&
    &span class=&k&&for&/span& &span class=&n&&QQCode&/span& &span class=&ow&&in&/span& &span class=&n&&codelist&/span&&span class=&p&&:&/span&
        &span class=&k&&for&/span& &span class=&n&&i&/span& &span class=&ow&&in&/span& &span class=&nb&&range&/span&&span class=&p&&(&/span&&span class=&mi&&4000&/span&&span class=&p&&):&/span&
            &span class=&n&&pos&/span& &span class=&o&&=&/span& &span class=&n&&i&/span&&span class=&o&&*&/span&&span class=&mi&&20&/span&
            &span class=&n&&param&/span& &span class=&o&&=&/span& &span class=&p&&{&/span&
                &span class=&s1&&'uin'&/span&&span class=&p&&:&/span& &span class=&n&&QQCode&/span&&span class=&p&&,&/span&
                &span class=&s1&&'ftype'&/span&&span class=&p&&:&/span& &span class=&s1&&'0'&/span&&span class=&p&&,&/span&
                &span class=&s1&&'sort'&/span&&span class=&p&&:&/span& &span class=&s1&&'0'&/span&&span class=&p&&,&/span&
                &span class=&s1&&'pos'&/span&&span class=&p&&:&/span& &span class=&n&&pos&/span&&span class=&p&&,&/span&
                &span class=&s1&&'num'&/span&&span class=&p&&:&/span& &span class=&s1&&'20'&/span&&span class=&p&&,&/span&
                &span class=&s1&&'replynum'&/span&&span class=&p&&:&/span& &span class=&s1&&'100'&/span&&span class=&p&&,&/span&
                &span class=&s1&&'g_tk'&/span&&span class=&p&&:&/span& &span class=&p&&[&/span&&span class=&n&&g_tk&/span&&span class=&p&&,&/span& &span class=&n&&g_tk&/span&&span class=&p&&],&/span&
                &span class=&s1&&'callback'&/span&&span class=&p&&:&/span& &span class=&s1&&'_preloadCallback'&/span&&span class=&p&&,&/span&
                &span class=&s1&&'code_version'&/span&&span class=&p&&:&/span& &span class=&s1&&'1'&/span&&span class=&p&&,&/span&
                &span class=&s1&&'format'&/span&&span class=&p&&:&/span& &span class=&s1&&'jsonp'&/span&&span class=&p&&,&/span&
                &span class=&s1&&'need_private_comment'&/span&&span class=&p&&:&/span& &span class=&s1&&'1'&/span&&span class=&p&&,&/span&
                &span class=&s1&&'qzonetoken'&/span&&span class=&p&&:&/span& &span class=&n&&g_qzonetoken&/span&
            &span class=&p&&}&/span&
            &span class=&n&&respond&/span& &span class=&o&&=&/span& &span class=&n&&sessions&/span&&span class=&o&&.&/span&&span class=&n&&get&/span&&span class=&p&&(&/span&&span class=&s1&&'https://h5.qzone.qq.com/proxy/domain/taotao.qq.com/cgi-bin/emotion_cgi_msglist_v6'&/span&
                &span class=&p&&,&/span& &span class=&n&&params&/span&&span class=&o&&=&/span&&span class=&n&&param&/span&&span class=&p&&,&/span& &span class=&n&&headers&/span&&span class=&o&&=&/span&&span class=&n&&headers&/span&&span class=&p&&,&/span& &span class=&n&&cookies&/span&&span class=&o&&=&/span&&span class=&n&&cookies&/span&&span class=&p&&)&/span&
            &span class=&n&&r&/span& &span class=&o&&=&/span& &span class=&n&&re&/span&&span class=&o&&.&/span&&span class=&n&&sub&/span&&span class=&p&&(&/span&&span class=&s2&&"_preloadCallback"&/span&&span class=&p&&,&/span& &span class=&s2&&""&/span&&span class=&p&&,&/span& &span class=&n&&respond&/span&&span class=&o&&.&/span&&span class=&n&&text&/span&&span class=&p&&)&/span&
            &span class=&n&&test&/span& &span class=&o&&=&/span& &span class=&n&&r&/span&&span class=&p&&[&/span&&span class=&mi&&1&/span&&span class=&p&&:&/span&&span class=&o&&-&/span&&span class=&mi&&2&/span&&span class=&p&&]&/span&
            &span class=&n&&Data&/span& &span class=&o&&=&/span& &span class=&n&&json&/span&&span class=&o&&.&/span&&span class=&n&&loads&/span&&span class=&p&&(&/span&&span class=&n&&test&/span&&span class=&p&&)&/span&
            &span class=&k&&if&/span&&span class=&p&&(&/span&&span class=&n&&Data&/span&&span class=&p&&[&/span&&span class=&s1&&'message'&/span&&span class=&p&&]&/span& &span class=&o&&==&/span&&span class=&s1&&'对不起,主人设置了保密,您没有权限查看'&/span&&span class=&p&&):&/span&
                &span class=&k&&print&/span&&span class=&p&&(&/span&&span class=&s1&&'Sorry, &/span&&span class=&si&&%s&/span&&span class=&s1&& has blocked you'&/span& &span class=&o&&%&/span& &span class=&n&&QQCode&/span&&span class=&p&&)&/span&
                &span class=&n&&black_list&/span&&span class=&o&&.&/span&&span class=&n&&append&/span&&span class=&p&&(&/span&&span class=&n&&QQCode&/span&&span class=&p&&)&/span&
                &span class=&k&&break&/span&
            &span class=&k&&else&/span&&span class=&p&&:&/span&
                &span class=&k&&if&/span& &span class=&ow&&not&/span& &span class=&n&&re&/span&&span class=&o&&.&/span&&span class=&n&&search&/span&&span class=&p&&(&/span&&span class=&s1&&'lbs'&/span&&span class=&p&&,&/span& &span class=&n&&test&/span&&span class=&p&&):&/span&
                    &span class=&k&&print&/span&&span class=&p&&(&/span&&span class=&s1&&'Finished downloading posts for &/span&&span class=&si&&%s&/span&&span class=&s1&&'&/span& &span class=&o&&%&/span& &span class=&n&&QQCode&/span&&span class=&p&&)&/span&
                    &span class=&k&&break&/span&
                &span class=&k&&else&/span&&span class=&p&&:&/span&
                    &span class=&c1&&# print(Data['msglist'][0]['content'])&/span&
                    &span class=&n&&dictlist&/span& &span class=&o&&=&/span& &span class=&n&&handle_list&/span&&span class=&p&&(&/span&&span class=&n&&Data&/span&&span class=&p&&[&/span&&span class=&s1&&'msglist'&/span&&span class=&p&&])&/span&
                    &span class=&n&&time&/span&&span class=&o&&.&/span&&span class=&n&&sleep&/span&&span class=&p&&(&/span&&span class=&mi&&1&/span&&span class=&p&&)&/span&
                    &span class=&k&&try&/span&&span class=&p&&:&/span&
                        &span class=&n&&collection&/span&&span class=&o&&.&/span&&span class=&n&&insert&/span&&span class=&p&&(&/span&&span class=&n&&dictlist&/span&&span class=&p&&)&/span&
                        &span class=&k&&print&/span&&span class=&p&&(&/span&&span class=&s1&&'Insert succeeded!'&/span&&span class=&p&&)&/span&
                    &span class=&k&&except&/span& &span class=&n&&pymongo&/span&&span class=&o&&.&/span&&span class=&n&&errors&/span&&span class=&o&&.&/span&&span class=&n&&DuplicateKeyError&/span&&span class=&p&&:&/span&
                        &span class=&k&&print&/span&&span class=&p&&(&/span&&span class=&s1&&'DuplicateKey'&/span&&span class=&p&&)&/span&
                    &span class=&k&&except&/span& &span class=&ne&&Exception&/span& &span class=&k&&as&/span& &span class=&n&&e&/span&&span class=&p&&:&/span&
                        &span class=&k&&print&/span&&span class=&p&&(&/span&&span class=&n&&e&/span&&span class=&p&&)&/span&
                    &span class=&n&&i&/span&&span class=&o&&=&/span&&span class=&n&&i&/span&&span class=&o&&+&/span&&span class=&mi&&1&/span&
                    &span class=&n&&time&/span&&span class=&o&&.&/span&&span class=&n&&sleep&/span&&span class=&p&&(&/span&&span class=&mi&&3&/span&&span class=&p&&)&/span&
    &span class=&k&&print&/span&&span class=&p&&(&/span&&span class=&n&&black_list&/span&&span class=&p&&)&/span&
&/code&&/pre&&/div&&p&&b&4. Processing the data&/b&&/p&&p&&b&
The crawl collected roughly 110,000 records&/b&&/p&&p&&br&&/p&&figure&&img src=&https://pic1.zhimg.com/v2-29b2a4dbe1aff_b.jpg& data-caption=&& data-size=&normal& data-rawwidth=&816& data-rawheight=&241& class=&origin_image zh-lightbox-thumb& width=&816& data-original=&https://pic1.zhimg.com/v2-29b2a4dbe1aff_r.jpg&&&/figure&&p&&b&
Plotting the location data on a map shows dense clusters of red dots, which are probably also the places people most like to visit when traveling&/b&&/p&&figure&&img src=&https://pic3.zhimg.com/v2-4cc65c42a84c4fcadde9083_b.jpg& data-caption=&& data-size=&normal& data-rawwidth=&2396& data-rawheight=&1404& class=&origin_image zh-lightbox-thumb& width=&2396& data-original=&https://pic3.zhimg.com/v2-4cc65c42a84c4fcadde9083_r.jpg&&&/figure&&figure&&img src=&https://pic2.zhimg.com/v2-be0acca0c21311_b.jpg& data-caption=&& data-size=&normal& data-rawwidth=&3170& data-rawheight=&2007& class=&origin_image zh-lightbox-thumb& width=&3170& data-original=&https://pic2.zhimg.com/v2-be0acca0c21311_r.jpg&&&/figure&&p&&b&Sending-device information recovered from some of the posts&/b&&/p&&figure&&img src=&https://pic3.zhimg.com/v2-4c11de38a7e2b3c41d81e9_b.jpg& data-caption=&& data-size=&normal& data-rawwidth=&1934& data-rawheight=&1460& class=&origin_image zh-lightbox-thumb& width=&1934& data-original=&https://pic3.zhimg.com/v2-4c11de38a7e2b3c41d81e9_r.jpg&&&/figure&&figure&&img src=&https://pic1.zhimg.com/v2-1ad34d610d209b0bbb38_b.jpg& data-caption=&& data-size=&normal& data-rawwidth=&1085& data-rawheight=&637& class=&origin_image zh-lightbox-thumb& width=&1085& data-original=&https://pic1.zhimg.com/v2-1ad34d610d209b0bbb38_r.jpg&&&/figure&&figure&&img src=&https://pic1.zhimg.com/v2-65f90e0af24db74b9dc06_b.jpg& data-caption=&& data-size=&normal& data-rawwidth=&1082& data-rawheight=&635& class=&origin_image zh-lightbox-thumb& width=&1082& data-original=&https://pic1.zhimg.com/v2-65f90e0af24db74b9dc06_r.jpg&&&/figure&&figure&&img src=&https://pic4.zhimg.com/v2-d42eec791d36d35_b.jpg& data-caption=&& data-size=&normal& data-rawwidth=&3262& data-rawheight=&1724& class=&origin_image zh-lightbox-thumb& width=&3262& data-original=&https://pic4.zhimg.com/v2-d42eec791d36d35_r.jpg&&&/figure&&p&&br&&/p&&p&&b&Author: Tecmry&/b& &/p&
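Two small steps from the crawl, computing g_tk and unwrapping the JSONP response, can be tried without a browser or network access. The sketch below restates the g_tk hash from step 2 as a function of the `p_skey` string and swaps the article's `re.sub` plus slicing for a regular expression that extracts the payload between the parentheses. `strip_jsonp` is a hypothetical helper name, and the sample response body is made up for illustration.

```python
import json
import re


def get_g_tk(p_skey):
    """Hash the p_skey cookie value into the g_tk request parameter."""
    hashes = 5381
    for letter in p_skey:
        hashes += (hashes << 5) + ord(letter)
    return hashes & 0x7FFFFFFF


def strip_jsonp(text, callback="_preloadCallback"):
    """Extract and parse the JSON payload from a 'callback(...);' response."""
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in %s(...)" % callback)
    return json.loads(match.group(1))


if __name__ == "__main__":
    # "abc" is a made-up cookie value, not a real p_skey.
    print(get_g_tk("abc"))
    # Made-up response body in the callback(...) shape the crawler expects.
    body = '_preloadCallback({"code": 0, "msglist": []});'
    print(strip_jsonp(body))
```

Matching the wrapper explicitly fails loudly when the server returns something unexpected, whereas fixed slicing such as `r[1:-2]` silently produces invalid JSON if the trailing characters differ.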
&figure&&img src=&https://pic4.zhimg.com/v2-c909a7afa5b988e1b95b1e_b.jpg& data-rawwidth=&890& data-rawheight=&496& class=&origin_image zh-lightbox-thumb& width=&890& data-original=&https://pic4.zhimg.com/v2-c909a7afa5b988e1b95b1e_r.jpg&&&/figure&I recently came across a Python crawler that downloads Baidu image search results, found it interesting, and went one step further by converting the .py file into an .exe. The steps are as follows:&p&Environment: Python 3.5&br&&/p&&p&1. Write the crawler.&/p&&p&First, the source code. The source and a detailed tutorial are at &a href=&https://link.zhihu.com/?target=http%3A//lovenight.github.io//Python-3-%25E5%25A4%259A%25E7%25BA%25BF%25E7%25A8%258B%25E4%25B8%258B%25E8%25BD%25BD%25E7%2599%25BE%25E5%25BA%25A6%25E5%259B%25BE%25E7%E6%E7%25B4%25A2%25E7%25BB%%259E%259C/& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&Python 3: multithreaded download of Baidu image search results&/a&&/p&&br&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span&#coding:utf-8
import itertools
import json
import os
import re
import sys
import urllib.parse

import requests

# Multi-character tokens used in Baidu's obfuscated objURL values
str_table = {
    '_z2C$q': ':',
    '_z&e3B': '.',
    'AzdH3F': '/'
}

# Single-character substitution table (its entries are missing from this copy)
char_table = {
}
# Convert to the {code point: code point} form that str.translate() expects
char_table = {ord(key): ord(value) for key, value in char_table.items()}


def decode(url):
    # Replace the multi-character tokens first, then map single characters
    for key, value in str_table.items():
        url = url.replace(key, value)
    return url.translate(char_table)


def buildUrls(word):
    # Endless generator of result-page URLs: pn = 0, 60, 120, ...
    word = urllib.parse.quote(word)
    url = r"http://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=&fp=result&queryWord={word}&cl=2&lm=-1&ie=utf-8&oe=utf-8&st=-1&ic=0&word={word}&face=0&istype=2nc=1&pn={pn}&rn=60"
    urls = (url.format(word=word, pn=x) for x in itertools.count(start=0, step=60))
    return urls


re_url = re.compile(r'"objURL":"(.*?)"')


def resolveImgUrl(html):
    imgUrls = [decode(x) for x in re_url.findall(html)]
    return imgUrls


def downImg(imgUrl, dirpath, imgName):
    filename = os.path.join(dirpath, imgName)
    try:
        res = requests.get(imgUrl, timeout=15)
        if str(res.status_code)[0] == '4':
            print(str(res.status_code), ":", imgUrl)
            return False
    except Exception as e:
        print('Exception raised:', imgUrl)
        return False
    # imgName already ends in .jpg, so no extension is appended here
    with open(filename, 'wb') as f:
        f.write(res.content)
    return True


def mkDir(dirName):
    dirpath = os.path.join(sys.path[0], dirName)
    if not os.path.exists(dirpath):
        os.mkdir(dirpath)
    return dirpath


if __name__ == '__main__':
    print("Welcome to the Baidu image download script!\nOnly a single keyword is supported for now.")
    print("Downloads are saved in the img folder next to the script.")
    print("=" * 50)
    word = input("Enter the keyword for the images you want to download:\n")
    dirpath = mkDir("img")
    urls = buildUrls(word)
    index = 0
    for url in urls:
        print("Requesting:", url)
        html = requests.get(url, timeout=10).content.decode('utf-8')
        imgUrls = resolveImgUrl(html)
        if len(imgUrls) == 0:
            # No more images: stop
            break
        for imgUrl in imgUrls:
            if downImg(imgUrl, dirpath, str(index) + ".jpg"):
                index += 1
                print("Downloaded %s images" % index)
&/code&&/pre&&/div&&p&Let's test it. For example, to download pictures of 本兮, just type “本兮” and press Enter.&/p&&p&Press Ctrl+C to stop the program.&/p&&p&&figure&&img src=&https://pic4.zhimg.com/v2-b69fd07b5e8ba89b051b3a0cbd4c97e4_b.jpg& data-rawwidth=&675& data-rawheight=&437& class=&origin_image zh-lightbox-thumb& width=&675& data-original=&https://pic4.zhimg.com/v2-b69fd07b5e8ba89b051b3a0cbd4c97e4_r.jpg&&&/figure&&figure&&img src=&https://pic4.zhimg.com/v2-97a8a3aede7ccb0a17f3b67d4404da34_b.jpg& data-rawwidth=&832& data-rawheight=&613& class=&origin_image zh-lightbox-thumb& width=&832& data-original=&https://pic4.zhimg.com/v2-97a8a3aede7ccb0a17f3b67d4404da34_r.jpg&&&/figure&&br&2. Convert the .py file into an .exe file.&/p&&p&There are several ways to do this; the simplest is PyInstaller: &a href=&https://link.zhihu.com/?target=http%3A//blog.csdn.net/wws563/article/details/& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&Converting a .py file to an .exe with Python 3.5 (by PyInstaller)&/a&&/p&&p&Install it with pip:&/p&&br&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span&pip install pyinstaller
&/code&&/pre&&/div&&p&Usage: in the folder containing the .py file, hold Shift, right-click and choose “Open command window here”, then run:&/p&&br&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span&pyinstaller -F baiduimg.py
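&/code&&/pre&&/div&&p&The dist folder under the current directory now contains baiduimg.exe; double-click it to run.&/p&&p&That completes the whole process.&/p&&br&&p&-------------------------------------&/p&&br&&p&Author: 张世润&br&&/p&&p&Blog column: &a href=&https://link.zhihu.com/?target=https%3A//ask.hellobi.com/blog/linjichu& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&崔斯特的博客专栏&/a&&/p&&p&Zhihu column: &a href=&https://zhuanlan.zhihu.com/linjichu& class=&internal&&Python练习题介绍&/a&&/p&&br&&p&&b&Many people have been sending me questions by private message lately, and I do not see Zhihu comments very often. If I do not reply in time, you can also add the editor on WeChat (tszhihu) to join the Zhihu big-data analysis and mining group and talk with the instructors there. Thanks&/b&.&/p&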
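The script's `buildUrls` pages through results with an endless generator, and the main loop stops only when a page comes back empty. A minimal sketch of that pattern, under stated assumptions: `BASE_URL` is a placeholder on example.com rather than Baidu's endpoint, and `build_urls` is a hypothetical name for this illustration.

```python
import itertools
import urllib.parse

# Placeholder endpoint for illustration only; this is not Baidu's real URL.
BASE_URL = "http://example.com/search?word={word}&pn={pn}&rn={rn}"


def build_urls(word, page_size=60):
    """Yield result-page URLs indefinitely: pn = 0, 60, 120, ..."""
    quoted = urllib.parse.quote(word)
    for offset in itertools.count(start=0, step=page_size):
        yield BASE_URL.format(word=quoted, pn=offset, rn=page_size)


if __name__ == "__main__":
    # islice bounds the infinite generator; the crawler instead breaks out of
    # its loop when a page yields no image URLs.
    for url in itertools.islice(build_urls("cat"), 3):
        print(url)
```

Because the generator is lazy, no URL is built until the loop asks for it, so the caller decides when to stop without knowing the total number of pages in advance.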
&figure&&img src=&https://pic1.zhimg.com/v2-9ba6a68c324d7078f3acf21b588f142b_b.jpg& data-rawwidth=&600& data-rawheight=&300& class=&origin_image zh-lightbox-thumb& width=&600& data-original=&https://pic1.zhimg.com/v2-9ba6a68c324d7078f3acf21b588f142b_r.jpg&&&/figure&&p&Earlier posts covered how to use Python's urllib library: “Basic usage of the urllib library for Python web data collection” and “Advanced urllib usage in Python”.&/p&&p&Today we look at how to use the Requests library in Python.&/p&&h3&&strong&Installing Requests&/strong&&/h3&&p&Install it with pip. If you have pip (a Python package manager; look it up if it is new to you) or a bundled distribution such as Python(x,y) or Anaconda, you can install Python libraries with pip directly.&/p&&p&$ pip install requests&/p&&p&Once it is installed, here are the basic methods:&/p&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span&
# Send a GET request
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
# Print the status code of the GET request
>>> r.status_code
200
# Check the content type of the response: JSON, UTF-8 encoded
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
# Print the response body
>>> r.text
u'{"type":"User"...'
# Parse the body as JSON
>>> r.json()
{u'private_gists': 419, u'total_private_repos': 77, ...}
&/code&&/pre&&/div&&p&Here is a small example:&/p&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span># Small example
import requests
r = requests.get('http://www.baidu.com')
print type(r)
print r.status_code
print r.encoding
print r.text
print r.cookies
'''This requests Baidu's home page, then prints the type of the returned object,
the status code, the encoding, the body and the cookies. The output includes:
<class 'requests.models.Response'>
<RequestsCookieJar[]>
'''
&/code&&/pre&&/div&&h3&&strong&Basic HTTP requests&/strong&&/h3&&p&The requests library supports every basic HTTP request method. For example:&/p&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span&r = requests.post("http://httpbin.org/post")
r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")
&/code&&/pre&&/div&&h3&&strong&Basic GET requests&/strong&&/h3&&div class=&highlight&&&pre&&code class=&language-text&&&span&&/span&r = requests.get("http://httpbin.org/get")
# To add query parameters, use the params argument:
import requests
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://httpbin.org/get", params=payload)
print r.url
# Output: http://httpbin.org/get?key2=value2&key1=value1
&/code&&/pre&&/div&&p&To request a JSON file, parse the response with the json() method. For example, write a JSON file named a.json with the following content:&/p&&blockquote&&p&["foo", "bar", {&br&
"foo": "bar"&br&}]&br&# Request and parse it with the following program:&br&import requests&br&r = requests.get("a.json")&br&print r.text&br&print r.json()&br&'''The output is shown below. The first is the raw body; the second is the result of parsing&br&it with json(). Note the difference:'''&br&["foo", "bar", {&br& "foo": "bar"&br& }]&br& [u'foo', u'bar', {u'foo': u'bar'}]&/p&&/blockquote&&p&To get the raw socket response from the server, use r.raw. This requires setting stream=True on the initial request.&/p&&blockquote&&p&r = requests.get('https://github.com/timeline.json', stream=True)&br&r.raw&br&# Output&br&<requests.packages.urllib3.response.HTTPResponse object at 0x>&br&r.raw.read(10)&br&'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'&/p&&/blockquote&&p&This returns the raw socket content of the page.&/p&&p&To add request headers, pass the headers parameter:&/p&&blockquote&&p&import requests&br&&br&payload = {'key1': 'value1', 'key2': 'value2'}&br&headers = {'content-type': 'application/json'}&br&r = requests.get("http://httpbin.org/get", params=payload, headers=headers)&br&print r.url&br&# The headers parameter adds entries to the request headers&/p&&/blockquote&&h3&&strong&Basic POST requests&/strong&&/h3&&p&A POST request usually needs to carry some parameters. The most basic way to pass them is the data parameter.&/p&&blockquote&&p&import requests&br&&br&payload = {'key1': 'value1', 'key2': 'value2'}&br&r = requests.post("http://httpbin.org/post", data=payload)&br&print r.text&br&# Output:&br&{&br&
"args": {}, &br&
"data": "", &br&
"files": {}, &br&
"form": {&br&
"key1": "value1", &br&
"key2": "value2"&br&
}, &br&
"headers": {&br&
"Accept": "*/*", &br&
"Accept-Encoding": "gzip, deflate", &br&
"Content-Length": "23", &br&
"Content-Type": "application/x-www-form-urlencoded", &br&
"Host": "httpbin.org", &br&
"User-Agent": "python-requests/2.9.1"&br&
}, &br&
"json": null, &br&
"url": "http://httpbin.org/post"&br&}&/p&&/blockquote&&p&The parameters were passed successfully, and the server echoed back the data we sent.&/p&&p&Sometimes the data to send is not form-encoded and must be sent as JSON instead. In that case, json.dumps() can serialize the payload.&/p&&blockquote&&p&import json&br&import requests&br&&br&url = 'http://httpbin.org/post'&br&payload = {'some': 'data'}&br&r = requests.post(url, data=json.dumps(payload))&br&print r.text&br&&br&# Output:&br&{&br&
"args": {}, &br&
"data": "{\"some\": \"data\"}", &br&
"files": {}, &br&
"form": {}, &br&
"headers": {&br&
"Accept": "*/*", &br&
"Accept-Encoding": "gzip, deflate", &br&
"Content-Length": "16", &br&
"Host": "httpbin.org", &br&
"User-Agent": "python-requests/2.9.1"&b
