Exception Handling with urllib in Python Web Scrapers

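Exception handling in urllib

When a scraper is given a faulty URL, it cannot fetch the content we want. To deal with this, we use the exception classes that urllib provides.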

Parts of a URL

A URL has six parts. For example:

https://www.baidu.com/s?wd=易烊千玺

Protocol (http/https)
Host (www.baidu.com)
Port (80/443)
Path (s)
Parameters (wd=易烊千玺)
Anchor

Common port numbers:

http (80) https (443) mysql (3306) oracle (1521) redis (6379) mongodb (27017)
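The six parts can be checked with the standard library's urllib.parse module. The snippet below is only a sketch: the describe_url helper and the DEFAULT_PORTS table are names made up for this illustration, and the fallback to 80/443 is an assumption about how one might fill in a port that the URL does not spell out.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical helper table: default ports for the two web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Split a URL into the six parts listed above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # parts.port is None when the URL does not spell out a port,
        # so fall back to the default for the scheme.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "params": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


info = describe_url("https://www.baidu.com/s?wd=易烊千玺")
for name, value in info.items():
    print(name, "=", value)
```

Note that urlsplit reports the path with its leading slash (/s) and returns an empty string for the anchor when the URL has none.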

URLError

A URLError usually points to a problem with the host part of the URL, for example a host name that cannot be resolved:

Example:

url = 'https://www.baidu.com1/'

Output:

urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
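A minimal sketch of catching this exception and reading its reason attribute follows. The try_open helper and its timeout default are assumptions made for this illustration. To keep the demo runnable without a network, it triggers URLError with an unsupported scheme instead of a bad host; a URL such as 'https://www.baidu.com1/' would end up in the same except branch.

```python
import urllib.error
import urllib.request


def try_open(url, timeout=5):
    """Return (ok, detail): the final URL on success, the failure reason otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, response.geturl()
    except urllib.error.URLError as e:
        # e.reason holds the underlying cause, e.g. a socket error
        # for a host name that cannot be resolved.
        return False, str(e.reason)


# An unsupported scheme fails before any network traffic happens,
# so this call also works offline.
ok, detail = try_open("nosuchscheme://example")
print(ok, detail)
```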

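HTTPError

This exception usually points to a wrong path or parameter in the URL: the server is reached, but it answers with an error status.

Example:

url = 'https://www.jianshu.com/p/3388cf148dba1'

Output:

urllib.error.HTTPError: HTTP Error 404: Not Found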
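The message above can be reproduced offline by building the exception by hand. This is only a sketch: in a real scraper the exception comes from urlopen(), and passing None for the headers and the response body is a shortcut taken here so that no server is needed.

```python
import urllib.error

# Build the exception by hand so the example runs without a network;
# urlopen() raises the same class when a server answers with 404.
err = urllib.error.HTTPError(
    url="https://www.jianshu.com/p/3388cf148dba1",
    code=404,
    msg="Not Found",
    hdrs=None,
    fp=None,
)

try:
    raise err
except urllib.error.HTTPError as e:
    message = str(e)
    print(message)           # same text as in the traceback above
    print(e.code, e.reason)  # status code and reason phrase
```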

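Overview

The HTTPError class is a subclass of the URLError class.
Classes to import: urllib.error.HTTPError / urllib.error.URLError
HTTP errors: an HTTP error is the status a server returns when it cannot fulfil a request. It tells the visitor what went wrong with the page.
A request sent through urllib may fail. To make your code more robust, catch the exception with try-except.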

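The urllib.error module

The urllib.error module defines the exception classes for exceptions raised by urllib.request. The base exception class is URLError.

urllib.error provides two main exception classes (not methods): URLError and HTTPError.

URLError is a subclass of OSError. Handlers raise this exception (or one derived from it) when they run into a problem.

HTTPError is a subclass of URLError that covers special HTTP errors, such as authentication requests. Its code attribute is the HTTP status code, reason explains why the exception was raised, and headers holds the HTTP response headers of the request that caused the HTTPError.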
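The class relationships and the headers attribute described above can be verified directly. The snippet is a sketch that builds an HTTPError by hand; the example.invalid URL and the header value are placeholders chosen for this illustration.

```python
import urllib.error
from email.message import Message

# The hierarchy described above: HTTPError -> URLError -> OSError.
print(issubclass(urllib.error.URLError, OSError))
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))

# Placeholder response headers for a hand-built exception.
headers = Message()
headers["Content-Type"] = "text/html"

err = urllib.error.HTTPError(
    "http://example.invalid/", 401, "Unauthorized", headers, None
)
print(err.code, err.reason, err.headers["Content-Type"])
```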

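Exception handling

A try-except block catches and handles exceptions. Its basic syntax is:

try:
    code that may raise an exception
except [(Error1, Error2, …) [as e]]:
    handler block 1
except [(Error3, Error4, …) [as e]]:
    handler block 2
except [Exception]:
    handle any other exception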
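As a concrete instance of this structure, here is a small sketch unrelated to networking. The parse_ratio function and its return strings are made up for this illustration; the point is only how the except clauses are tried from top to bottom.

```python
def parse_ratio(numerator, denominator):
    """Divide two values given as text, reporting what went wrong if anything."""
    try:
        result = int(numerator) / int(denominator)
    except (ValueError, TypeError) as e:
        # One clause can handle several exception types at once.
        return f"bad input: {type(e).__name__}"
    except ZeroDivisionError:
        return "division by zero"
    except Exception as e:
        # Catch-all for anything the clauses above did not match.
        return f"unexpected: {e}"
    return f"ok: {result}"


print(parse_ratio("6", "3"))
print(parse_ratio("x", "3"))
print(parse_ratio("1", "0"))
```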

Example:

Original URL: 'https://www.jianshu.com/p/3388cf148dba'

Source code:

import urllib.request
import urllib.error

url = 'https://www.jianshu.com1/p/3388cf148dba'
# Parts of a URL, e.g. https://www.baidu.com/s?wd=易烊千玺
# 1. protocol (http/https) 2. host (www.baidu.com) 3. port (80/443) 4. path (s) 5. parameters (wd=易烊千玺) 6. anchor
# Common port numbers
# http (80) https (443) mysql (3306) oracle (1521) redis (6379) mongodb (27017)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
}
try:
    # Build the request with a browser-like User-Agent header
    request = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf8')
    print(content)
except urllib.error.HTTPError:
    # HTTPError is a subclass of URLError, so it must be caught first
    print('HTTP error, please try again later!')
except urllib.error.URLError:
    print('URL error, please try again later!')

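1. URLError

url = 'https://www.jianshu.com1/p/3388cf148dba'

Output: the URLError branch runs and prints its message.

2. HTTPError

url = 'https://www.jianshu.com/p/3388cf148dba111'

Output: the HTTPError branch runs and prints its message.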

Because HTTPError is a subclass of URLError, the HTTPError handler must be written first; otherwise every such error is treated as a URLError. That's all for urllib exception handling. I hope your code never runs into one!
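The effect of the ordering can be demonstrated without a network. In this sketch the two classify_* functions and the hand-built exceptions are hypothetical, written only to compare the two orders of except clauses.

```python
import urllib.error


def classify_specific_first(exc):
    """Correct order: the subclass is tested before its base class."""
    try:
        raise exc
    except urllib.error.HTTPError:
        return "HTTPError"
    except urllib.error.URLError:
        return "URLError"


def classify_general_first(exc):
    """Wrong order: URLError also matches HTTPError instances."""
    try:
        raise exc
    except urllib.error.URLError:
        return "URLError"
    except urllib.error.HTTPError:  # never reached
        return "HTTPError"


# Hand-built exceptions standing in for what urlopen() would raise.
not_found = urllib.error.HTTPError(
    "http://example.invalid/", 404, "Not Found", None, None
)
bad_host = urllib.error.URLError("getaddrinfo failed")

print(classify_specific_first(not_found))
print(classify_general_first(not_found))
print(classify_specific_first(bad_host))
```

With the general clause first, the 404 is reported as a plain URLError and the HTTPError branch never runs.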
