About chardet: what to do when the return value is None


Source: web crawler (WebSpider) | Time: 2017-10-23 02:12 | Tag: what a return value means

Port of python's chardet (/chardet/chardet).
How To Use It
npm install jschardet
var jschardet = require("jschardet")
// "àíàçã" in UTF-8
jschardet.detect("\xc3\xa0\xc3\xad\xc3\xa0\xc3\xa7\xc3\xa3")
// { encoding: "UTF-8", confidence: 0.9690625 }
// "次常用國字標準字體表" in Big5
jschardet.detect("\xa6\xb8\xb1\x60\xa5\xce\xb0\xea\xa6\x72\xbc\xd0\xb7\xc7\xa6\x72\xc5\xe9\xaa\xed")
// { encoding: "Big5", confidence: 0.99 }
Copy and include
in your web page.
This library is also available in
Set the debug flag to see all information related to the confidence levels of each encoding. This is useful for finding out why you're not getting the expected encoding.
jschardet.Constants._debug = true;
The default minimum accepted confidence level is 0.20, but sometimes this is not enough, especially when dealing with files that contain mostly numbers. Change it to 0 to always get a result, or to any other value that works for you.
jschardet.Constants.MINIMUM_THRESHOLD = 0;
Supported Charsets
Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
EUC-KR and ISO-2022-KR (Korean)
KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
ISO-8859-2 and windows-1250 (Hungarian)
ISO-8859-5 and windows-1251 (Bulgarian)
windows-1252
ISO-8859-7 and windows-1253 (Greek)
ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
TIS-620 (Thai)
UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
UTF-16 BE or LE (with a BOM)
UTF-8 (with or without a BOM)
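The BOM-based entries in this list are the easiest ones to recognize. As an illustration only (this is not jschardet's code), here is a minimal Python 3 sketch of BOM sniffing built on the standard codecs constants. The helper name sniff_bom is hypothetical.

```python
import codecs

# Longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(raw):
    """Return an encoding name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


print(sniff_bom(codecs.BOM_UTF8 + b"hello"))
print(sniff_bom(b"no bom here"))
```

Without a BOM the function returns None, which is where statistical detection has to take over.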
Technical Information
I haven't been able to create tests to correctly detect:
ISO-2022-CN
windows-1250 in Hungarian
windows-1251 in Bulgarian
windows-1253 in Greek
Development
Use npm run dist to update the distribution files. They're available at .
Ported from python to JavaScript by António Afonso (/aadsm/jschardet)
Transformed into an npm package by Markus Ast (/brainafk)
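
Code analysis: Chinese encoding problems in the Python requests library | 峰云就她了
This is a summary of problems I ran into while using Python requests on proxy crawler nodes that fetch sites in many different character sets. In short, it is the garbled-Chinese problem. If you only crawl Weibo, WeChat, or e-commerce sites, the charset is easy to pin down, and you can even hard-code the encoding. A public-opinion monitoring system, however, crawls hundreds of thousands of different sites every day, so the charset has to be determined more reliably to avoid garbled Chinese.
This article is a bit disorganized, and criticism is welcome! It is also still being updated, so check the original post for updates.
Let's start with this example. You will notice some interesting things.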
#blog: xiaorui.cc
In [9]: r = requests.get('http://cn.python-requests.org/en/latest/')
In [10]: r.encoding
Out[10]: 'ISO-8859-1'
In [11]: type(r.text)
Out[11]: unicode
In [12]: type(r.content)
Out[12]: str
In [13]: r.apparent_encoding
Out[13]: 'utf-8'
In [14]: chardet.detect(r.content)
Out[14]: {'confidence': 0.99, 'encoding': 'utf-8'}
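The first question: why does an encoding like ISO-8859-1 show up?
What is ISO-8859-1? It is also known as Latin-1, or "Western European". To my mind this is a bug in requests, and the issue on the requests GitHub shows that it isn't only Chinese users who have reported it. The official reply, though, is that the behavior follows the HTTP RFC by design.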
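To see what a wrong ISO-8859-1 guess does to Chinese text, here is a small Python 3 sketch that uses only built-in codecs. The sample string is made up for the demonstration.

```python
raw = "中文编码".encode("utf-8")         # the bytes a UTF-8 page would send

# ISO-8859-1 maps every byte to a code point, so this never raises;
# it silently produces mojibake instead.
wrong = raw.decode("iso-8859-1")
right = raw.decode("utf-8")

# Because the mapping is one byte to one character, the damage is reversible.
recovered = wrong.encode("iso-8859-1").decode("utf-8")

print(repr(wrong))
print(right)
print(recovered == right)
```

The missing exception is what makes this default awkward to spot: the decode succeeds and only the output looks wrong.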
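Let's look at the requests source to see how this comes about.
requests takes the charset from the Content-Type header of the server's response. If Content-Type carries a charset parameter, requests identifies the encoding correctly. Otherwise, for text types, it falls back to the default ISO-8859-1. Non-compliant pages are typically the ones with this problem.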
In [52]: r.headers
Out[52]: {'content-length': '16907', 'via': 'BJ-H-NX-116(EXPIRED), http/1.1 BJ-UNI-1-JCS-116 ( [cHs f ])', 'ser': '3.81', 'content-encoding': 'gzip', 'age': '23', 'expires': 'Fri, 19 Feb :25 GMT', 'vary': 'Accept-Encoding', 'server': 'JDWS', 'last-modified': 'Fri, 19 Feb :25 GMT', 'connection': 'keep-alive', 'cache-control': 'max-age=60', 'date': 'Fri, 19 Feb :31 GMT', 'content-type': 'text/'}
File: requests/utils.py
#blog: xiaorui.cc
import cgi

def get_encoding_from_headers(headers):
    """Return the encoding declared in the given headers dict."""
    content_type = headers.get('content-type')

    if not content_type:
        return None

    content_type, params = cgi.parse_header(content_type)

    if 'charset' in params:
        return params['charset'].strip("'\"")

    if 'text' in content_type:
        return 'ISO-8859-1'
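The excerpt relies on the cgi module, which was removed in Python 3.13. As a rough stand-in, the same header logic can be sketched with the standard email.message module. The function name encoding_from_content_type is hypothetical, and this is an approximation of the logic shown here, not the code requests ships.

```python
from email.message import Message


def encoding_from_content_type(content_type):
    """Mimic the header logic: explicit charset, else ISO-8859-1 for text/*."""
    if not content_type:
        return None

    msg = Message()
    msg["Content-Type"] = content_type

    charset = msg.get_param("charset")
    if isinstance(charset, tuple):
        # RFC 2231 encoded parameters come back as (charset, language, value).
        charset = charset[2]
    if charset:
        return charset.strip("'\"")

    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"
    return None


print(encoding_from_content_type("text/html; charset=gbk"))
print(encoding_from_content_type("text/html"))
print(encoding_from_content_type("application/json"))
```

A text type without a charset parameter lands on ISO-8859-1, which is the case the article complains about.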
The second question: how do we get the correct encoding?
The response object has an apparent_encoding property, which calls chardet.detect() to identify the text encoding. Be aware that this costs some CPU.
To see why, read chardet's source.
#blog: xiaorui.cc
@property
def apparent_encoding(self):
    """Use chardet to work out the encoding."""
    return chardet.detect(self.content)['encoding']
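The 'encoding' entry in the detector's result can be None, for example when the input is empty or the confidence is too low, which is the situation in this page's title. Below is a hedged Python 3 sketch of guarding against that. It takes a result dict shaped like the ones shown on this page, so it runs without chardet installed. The helper name decode_with_guess and the 0.5 confidence cut-off are assumptions for illustration.

```python
def decode_with_guess(raw, detected, fallback="utf-8", min_confidence=0.5):
    """Decode raw bytes using a detector result, tolerating encoding=None."""
    encoding = detected.get("encoding")
    confidence = detected.get("confidence") or 0.0

    # Treat a missing or low-confidence guess as "no answer".
    if not encoding or confidence < min_confidence:
        encoding = fallback

    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:
        # The detector named a codec this Python build does not know.
        return raw.decode(fallback, errors="replace")


print(decode_with_guess(b"abc", {"encoding": None, "confidence": 0.0}))
print(decode_with_guess("你好".encode("gbk"),
                        {"encoding": "GB2312", "confidence": 0.99}))
```

Checking for None before decoding avoids the TypeError that passing None as the codec name would raise.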
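The third question: what is the difference between text and content in requests?
After requests fetches a resource, there are two ways to look at the body: r.text and r.content. How do they differ?
Reading the requests source shows that r.text returns the data decoded to Unicode, while r.content returns the raw bytes. In other words, r.content is cheaper than r.text: it hands back the bytes as they are, whereas r.text has to decode them. If no encoding could be derived from the headers (r.encoding is None), text calls chardet to guess the charset, which burns CPU again.
Let's read the requests code to see the difference between text and content.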
File: requests/models.py
@property
def apparent_encoding(self):
    """The apparent encoding, provided by the chardet library"""
    return chardet.detect(self.content)['encoding']

@property
def content(self):
    """Content of the response, in bytes."""

    if self._content is False:
        # Read the contents.
        try:
            if self._content_consumed:
                raise RuntimeError(
                    'The content for this response was already consumed')

            if self.status_code == 0:
                self._content = None
            else:
                self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

        except AttributeError:
            self._content = None

    self._content_consumed = True
    # don't need to release the connection; that's been handled by urllib3
    # since we exhausted the data.
    return self._content

@property
def text(self):
    """Content of the response, in unicode.

    If Response.encoding is None, encoding will be guessed using
    ``chardet``.

    The encoding of the response content is determined based solely on HTTP
    headers, following RFC 2616 to the letter. If you can take advantage of
    non-HTTP knowledge to make a better guess at the encoding, you should
    set ``r.encoding`` appropriately before accessing this property.
    """

    # Try charset from content-type
    content = None
    encoding = self.encoding

    if not self.content:
        return str('')

    # When no encoding is set, fall back to a chardet guess.
    if self.encoding is None:
        encoding = self.apparent_encoding

    # Decode unicode from given encoding.
    try:
        content = str(self.content, encoding, errors='replace')
    except (LookupError, TypeError):
        # A LookupError is raised if the encoding was not found which could
        # indicate a misspelling or similar mistake.
        #
        # A TypeError can be raised if encoding is None
        #
        # So we try blindly encoding.
        content = str(self.content, errors='replace')

    return content
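There are several ways to fix garbled Chinese text in requests.
Because content holds the raw bytes of the HTTP response, you can decode it to unicode using the charset from the headers, provided that charset is not the ISO-8859-1 default.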
In [96]: r.encoding
Out[96]: 'gbk'
In [98]: print r.content.decode(r.encoding)[200:300]
="keywords" content="Python数据分析与挖掘实战,,机械工业出版社,5,,在线购买,折扣,打折"/>
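A blunter approach is to decode with whatever chardet reports and re-encode the result as UTF-8. Note that GB2312 is a subset of GBK, so a strict decode with a detected 'GB2312' can fail on GBK content. Passing 'replace' as the error handler avoids the exception at the cost of losing the offending characters.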
#http://xiaorui.cc
In [22]: r = requests.get('/.html')
In [23]: print r.content
KeyboardInterrupt
In [23]: r.apparent_encoding
Out[23]: 'GB2312'
In [24]: r.encoding
Out[24]: 'gbk'
In [25]: r.content.decode(r.apparent_encoding).encode('utf-8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-25-918324cdc053> in <module>()
----> 1 r.content.decode(r.apparent_encoding).encode('utf-8')
UnicodeDecodeError: 'gb2312' codec can't decode bytes in position : illegal multibyte sequence
In [27]: r.content.decode(r.apparent_encoding,'replace').encode('utf-8')
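The UnicodeDecodeError in this session is what happens when GBK bytes are decoded with the narrower GB2312 codec. A small Python 3 sketch reproduces it offline. The sample text is made up, and it relies on the character "镕" being present in GBK but absent from GB2312.

```python
raw = "中文镕".encode("gbk")   # "镕" is encodable in GBK but not in GB2312

try:
    raw.decode("gb2312")      # strict decoding with the narrower codec
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
lossy = raw.decode("gb2312", errors="replace")

# GB18030 is a superset of GBK and GB2312, so nothing is lost.
exact = raw.decode("gb18030")

print(strict_ok)
print(lossy)
print(exact)
```

When a detector reports GB2312 for Chinese content, decoding with gbk or gb18030 instead is often the safer choice, since both accept everything GB2312 does.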
If you have settled on using text and already know the site's charset, set r.encoding = 'xxx'. Once you specify the encoding, requests uses it when it decodes text.
>>> import requests
>>> r = requests.get('https://up.xiaorui.cc')
>>> r.text
>>> r.encoding
'gbk'
>>> r.encoding = 'utf-8'
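Setting r.encoding works because text is decoded at access time with whatever encoding is set at that moment. The Python 3 sketch below imitates that idea with a hypothetical stand-in class, MiniResponse. It is not the requests Response class and makes no network call.

```python
class MiniResponse:
    """Hypothetical stand-in for a response: raw bytes plus a mutable encoding."""

    def __init__(self, content, encoding=None):
        self.content = content
        self.encoding = encoding

    @property
    def text(self):
        # Decoding happens on every access, using the encoding set right now.
        return self.content.decode(self.encoding or "utf-8", errors="replace")


resp = MiniResponse("中文乱码".encode("gbk"), encoding="ISO-8859-1")
before = resp.text           # mojibake: GBK bytes read as Latin-1

resp.encoding = "gbk"        # override the header-derived guess
after = resp.text

print(repr(before))
print(after)
```

The override has to happen before text is read, since nothing re-decodes a string you have already pulled out.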
In my experience crawling hundreds of thousands of sites, most of them are well-behaved: if the headers carry no charset, extract it from the meta tag in the HTML.
In [78]: s
Out[78]: '<meta http-equiv="Content-Type" content="text/ charset=gbk"'
In [79]: b = re.compile("<meta.*content=.*charset=(?P<charset>[^;\s]+)", flags=re.I)
In [80]: b.search(s).group(1)
Out[80]: 'gbk"'
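Note that the pattern in this session captures the closing quote along with the charset name ('gbk"'). One possible tightening, written for Python 3 and offered only as a sketch, restricts the capture to characters that can appear in a charset label. The names META_CHARSET and charset_from_html are hypothetical.

```python
import re

# Capture only word characters and hyphens, so quotes and '>' are excluded.
META_CHARSET = re.compile(r'<meta[^>]*charset=["\']?([\w-]+)', flags=re.I)


def charset_from_html(html):
    """Return the first meta-declared charset in html (lowercased), or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None


print(charset_from_html(
    '<meta http-equiv="Content-Type" content="text/html; charset=gbk">'))
print(charset_from_html('<meta charset="UTF-8">'))
print(charset_from_html('<p>no meta tag</p>'))
```

Lowercasing the result makes later comparisons against names such as 'gbk' or 'utf-8' simpler.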
The utils.py module in requests already ships a fairly complete function for pulling the meta charset out of HTML. In the end it is just a handful of regular expressions.
In [32]: requests.utils.get_encodings_from_content(r.content)
Out[32]: ['gbk']
File: utils.py
import re

def get_encodings_from_content(content):
    charset_re = re.compile(r'<meta.*?charset=["\']*(.+?)["\'>]', flags=re.I)
    pragma_re = re.compile(r'<meta.*?content=["\']*;?charset=(.+?)["\'>]', flags=re.I)
    xml_re = re.compile(r'^<\?xml.*?encoding=["\']*(.+?)["\'>]')

    return (charset_re.findall(content) +
            pragma_re.findall(content) +
            xml_re.findall(content))
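Finally, a summary of the garbled-Chinese problem in requests:
Unify the encoding: either convert everything to UTF-8, or use unicode as the intermediate form.
Sites in China generally use utf-8, gbk, or gb2312. When requests reports one of these as the encoding, you can decode straight to unicode.
But when the encoding comes back as ISO-8859-1, combine the regex-based meta lookup with chardet to work out the real encoding. This logic can be wrapped in a patch and pulled in: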
import chardet
import requests

def monkey_patch():
    prop = requests.models.Response.content

    def content(self):
        _content = prop.fget(self)
        if self.encoding == 'ISO-8859-1':
            encodings = requests.utils.get_encodings_from_content(_content)
            if encodings:
                encoding = encodings[0]
            else:
                # Call chardet on the raw bytes directly: self.apparent_encoding
                # reads self.content and would re-enter this patched property.
                encoding = chardet.detect(_content)['encoding'] or 'utf-8'
            _content = _content.decode(encoding, 'replace').encode('utf8', 'replace')
            self._content = _content
            # The body is now UTF-8, so keep r.text consistent with it.
            self.encoding = 'utf-8'
        return _content

    requests.models.Response.content = property(content)

monkey_patch()
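A less invasive alternative to patching Response.content is a plain function that applies the same order of preference: a non-default header charset first, then a meta tag in the body, then a statistical guess. The Python 3 sketch below assumes that order. The names resolve_encoding and guess are hypothetical, and guess is any callable that maps bytes to an encoding name or None (for instance a thin wrapper around chardet).

```python
import re

# Bytes pattern, because the body has not been decoded yet.
META_CHARSET = re.compile(rb'<meta[^>]*charset=["\']?([\w-]+)', flags=re.I)


def resolve_encoding(header_encoding, body, guess=None):
    """Pick an encoding: trusted header, then <meta>, then guess, then fallback."""
    # 1. Trust the header unless it is the ISO-8859-1 default.
    if header_encoding and header_encoding.upper() != "ISO-8859-1":
        return header_encoding

    # 2. Look for a declaration near the top of the document.
    match = META_CHARSET.search(body[:4096])
    if match:
        return match.group(1).decode("ascii")

    # 3. Ask the detector, which may legitimately answer None.
    if guess is not None:
        guessed = guess(body)
        if guessed:
            return guessed

    # 4. Nothing better is known.
    return header_encoding or "utf-8"


print(resolve_encoding("gbk", b""))
print(resolve_encoding("ISO-8859-1", b'<meta charset="gb2312">'))
print(resolve_encoding(None, b"plain text", guess=lambda data: None))
```

Keeping this as a function leaves the library untouched, and the caller can assign the result to r.encoding before reading r.text.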
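Python 3.x removes much of this pain by separating bytes from str. If you are still on Python 2.6 or 2.7, you need the methods above to deal with garbled Chinese.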

chardet.detect() cannot determine a string's encoding [python bar] - Baidu Tieba
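I have a string that prints as complete garbage on the console. I wanted to use this function to get its encoding and then convert it to UTF-8 for output. But when I call the function, the result comes back empty. The message mentioned ascii, so I guess the original string might be ascii or something like that. In this situation, how can I get Chinese output, or at least avoid the garbling? Thanks, everyone!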
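Bumping my own thread. I hope someone sees it and can help.
Bumping once more!
Post the output here so we can take a look.
Lesson 1: Encoding...
Where did the original string come from?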
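
Detecting character encodings in Python with chardet
Author: 小五义
Type: repost
This article introduces how to detect character encodings in Python with chardet, with a fairly detailed look at what chardet does, how to install it, and how to use it.
The examples here show how to use chardet in Python to determine a character encoding. The details follow.
In Python, chardet is a module for detecting the encoding of strings and files.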
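1. Downloading and installing chardet
Download: http://pypi.python.org/pypi/chardet
After downloading chardet, unpack the archive and put the chardet folder in your application's directory. You can then use it right away with import chardet. You can also copy chardet into Python's system directory, so that every Python program can use import chardet. Alternatively, install it from the unpacked directory:
python setup.py install
In use, chardet.detect() returns a dictionary in which confidence is the detection confidence and encoding is the detected encoding.
(1) Detecting a web page's encoding: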
>>> import urllib
>>> rawdata = urllib.urlopen('/').read()
>>> import chardet
>>> chardet.detect(rawdata)
{'confidence': 0.99999, 'encoding': 'GB2312'}
(2) Detecting a file's encoding
import chardet
tt = open('c:\\111.txt', 'rb')
ff = tt.readline()
# read(5) also works here, but readlines() raises an error: it returns a list, not a byte string
enc = chardet.detect(ff)
print enc['encoding']
tt.close()
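One caveat with readline(): if the first line happens to be pure ASCII, a detector only ever sees ASCII bytes, whatever the rest of the file contains. The Python 3 sketch below shows the difference between a one-line sample and a bounded larger sample using a temporary file. The helper name read_sample and the 64 KiB limit are assumptions for illustration.

```python
import os
import tempfile


def read_sample(path, limit=64 * 1024):
    """Read up to limit bytes in binary mode, suitable for passing to a detector."""
    with open(path, "rb") as fh:
        return fh.read(limit)


# Build a demo file: an ASCII header line followed by a GBK-encoded line.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as fh:
    fh.write(b"id,name\n")
    fh.write("1,编码\n".encode("gbk"))

with open(demo_path, "rb") as fh:
    first_line = fh.readline()

sample = read_sample(demo_path)
os.remove(demo_path)

print(first_line.isascii())   # the first line alone looks like plain ASCII
print(sample.isascii())       # the larger sample reveals non-ASCII bytes
```

Feeding the detector a larger sample like this gives it a chance to see the bytes that actually distinguish one encoding from another.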
I hope this article helps with your Python programming.