Python crawler for 车主之家: configuration parameters for all car models (including discontinued ones)

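This article is for learning and discussion only and will be removed immediately on request if it infringes any rights. The demo download link is at the end.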


First, a preview of the result.

Environment:

Windows 10, CentOS 7.4

Python 3.9.4

PyCharm 2021

retrying==1.3.3

requests==2.22.0

fake_useragent

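Packet capture analysis

The 车主之家 Android app crashes when the configuration page is opened after selecting a model, so capturing traffic from the app was abandoned:

One pitfall: the model configuration page in the 车主之家 app crashes as soon as it opens. At first I assumed it was a device compatibility problem, but it crashed on every one of the several phones I tried, so it is most likely a bug in the app. This cost a lot of time.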

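Web page capture:

The web page has no obvious data API either. Initial analysis suggests the data is loaded dynamically through JavaScript (the same approach as 汽车之家; for details see the article "汽车之家车型参数爬虫").

Sure enough, it follows the same pattern as 汽车之家, and there is no font obfuscation, which makes things much simpler.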

Fetching all brand data

API endpoints:

# All brand information

https://****.****.com/?&extra=getBrandStyle

# All models for a given brand ID

model_url = f'http://****.com/app.php?&type=allStyle&brandId'

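    # Needs at module level: from jsonpath import jsonpath
    def get_brand(self, brand_url, model_url):
        """Step 1: yield one record per model, across all brands."""
        # Fetch the full brand list
        brand_res = self._parse_url(url=brand_url)
        # jsonpath returns False when nothing matches, hence the `or []`
        brand_groups = jsonpath(brand_res.json(), '$..list') or []
        for brand_group in brand_groups:
            for brand in brand_group:
                print(f'Brand: {brand["title"]} - fetching data')
                alpha = brand['alpha']  # initial letter
                title = brand['title']  # brand name
                brand_id = brand['brandId']  # brand ID
                origin = brand['origin']  # place of origin
                # Fetch all models for this brand. The original passed model_url
                # unchanged for every brand; appending "=<brand_id>" is an
                # assumption based on the URL ending in "&brandId".
                model_res = self._parse_url(url=f'{model_url}={brand_id}')
                # Extract the model list
                style_matches = jsonpath(model_res.json(), '$..style')
                styles = style_matches[0] if style_matches else []
                for style in styles:
                    model_id = style.get('id')  # model ID
                    model_name = style.get('name')  # model name
                    img = style.get('img')  # model image
                    yield alpha, title, brand_id, origin, model_name, model_id, img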
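The `_parse_url` helper called above is not shown in the article. Since the environment lists `retrying`, `requests` and `fake_useragent`, it presumably wraps a GET request with retries and a randomized User-Agent. The sketch below illustrates only the retry idea, using the standard library instead of those packages. The `retry` decorator, the `fetch` function and the fixed User-Agent string are hypothetical stand-ins, not the author's code, and `fetch` returns raw bytes rather than a `requests` response object.

```python
import functools
import time
import urllib.request


def retry(max_attempts=3, delay=0.0):
    """Re-run the wrapped function until it succeeds or attempts run out."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


@retry(max_attempts=3, delay=1.0)
def fetch(url, timeout=10):
    """GET a URL and return the raw response body as bytes (hypothetical helper)."""
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Demonstration without network access: fail twice, then succeed.
calls = []


@retry(max_attempts=3)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError('temporary failure')
    return 'ok'


result = flaky()
print(result, len(calls))
```

With the `retrying` package, the same effect would normally come from decorating the request method with `@retry(stop_max_attempt_number=3)`.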

Fetching the model configuration JSON

Endpoint: https://www.****.com/{model_id}/options/

    # Needs at module level: import os, import re
    def parameter_configuration_html(self, model_id, file_name):
        """Step 2: fetch the model's configuration page and save its embedded JSON."""
        # Request the model's configuration page
        url = f'https://www.****.com/{model_id}/options/'
        response = self._parse_url(url)
        text = response.content.decode('utf-8')
        configuration = '车型参数json'  # output directory ("model parameter JSON")
        os.makedirs(configuration, exist_ok=True)
        # Pull the two JavaScript assignments that hold the data out of the page
        json_data = ""
        json_config = re.search('var json_config = (.*?)};', text)
        if json_config:
            json_data += json_config.group(0)
        json_car = re.search('var json_car = (.*?)}];', text)
        if json_car:
            json_data += json_car.group(0)
        # Save the raw assignments to a file
        with open(f'{configuration}/{file_name}', 'w', encoding='utf-8') as f:
            f.write(json_data)
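The method above stores the raw JavaScript assignments (`var json_config = {...};var json_car = [...];`) rather than parsed data. Here is a minimal sketch of how the saved text could later be turned into Python objects, assuming the values are strict JSON (double-quoted keys, no trailing commas). The function name `parse_js_assignments` and the sample string are hypothetical, and the real payload structure is not shown in the article.

```python
import json
import re


def parse_js_assignments(text):
    """Return {variable_name: parsed_value} for each `var name = <json>` in text."""
    decoder = json.JSONDecoder()
    result = {}
    for match in re.finditer(r'var\s+(\w+)\s*=\s*', text):
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it (here, the trailing ';').
            value, _ = decoder.raw_decode(text, match.end())
        except json.JSONDecodeError:
            continue  # not valid JSON: skip this assignment
        result[match.group(1)] = value
    return result


# Made-up sample in the same shape as the saved file
sample = (
    'var json_config = {"result": {"items": [{"name": "Engine"}]}};'
    'var json_car = [{"id": 1, "name": "Demo 1.5T"}];'
)
parsed = parse_js_assignments(sample)
print(sorted(parsed))
```

Letting the JSON decoder find the end of each value avoids relying on the non-greedy `(.*?)};` pattern, which stops at the first `};` it meets, including one inside a string value.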

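Data storage

    def save_xls(self):
        """Step 4: write the collected data to the worksheet."""
        # Setup is omitted in this excerpt: workbook and worksheet (write_string
        # and close match the xlsxwriter API), Header (item name -> column
        # index), carItem (item name -> list of values), startRow (first data
        # row) and count (running record counter).
        # Write the header row (code omitted): startRow row, cols column, co title
        # Compute the first and last row numbers
        endRowNum = startRow + len(carItem['车型ID'])  # number of trim records
        for row in range(startRow, endRowNum):
            for col in carItem:
                try:
                    context = str(carItem[col][row - startRow])
                    colNum = Header[col]  # look up the column index by item name
                except (KeyError, IndexError):
                    continue
                if not context:
                    context = '-'
                # Write the cell: row index, column index, content
                worksheet.write_string(row, colNum, context)
            print(f'Record {count} inserted successfully')
            count += 1
        startRow = endRowNum
        workbook.close()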
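The loop above implies a column-oriented layout: `carItem` maps each item name to a list of values (one per trim), and `Header` maps item names to column indices. As a sketch of that mapping on its own, without the spreadsheet library, the hypothetical generator `iter_cells` below yields the `(row, col, text)` triples that would be passed to `worksheet.write_string`. All sample keys and values except `车型ID` are made up.

```python
def iter_cells(car_item, header, start_row=0):
    """Yield (row, col, text) for each value, mirroring the nested loop above."""
    row_count = len(car_item['车型ID'])
    for offset in range(row_count):
        for name, values in car_item.items():
            if name not in header or offset >= len(values):
                continue  # unknown column or missing value: skip the cell
            text = str(values[offset]) or '-'  # empty strings become '-'
            yield start_row + offset, header[name], text


# Made-up sample data: two trims, one item with no column in the header
header = {'车型ID': 0, '车型名称': 1}
car_item = {
    '车型ID': [101, 102],
    '车型名称': ['Demo 1.5T', ''],
    '未知项': ['x', 'y'],
}
cells = list(iter_cells(car_item, header, start_row=1))
for cell in cells:
    print(cell)
```

Separating the cell computation from the write calls makes the row and column arithmetic easy to check before it touches a workbook.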

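Entry point

    @run_time
    def run(self):
        # Step 1: iterate over every model. brand_url and model_url are the two
        # endpoints listed earlier; the original call omitted these arguments.
        for alpha, title, brand_id, origin, model_name, model_id, img in self.get_brand(brand_url, model_url):
            # initial letter, brand, brand ID, origin, model name, model ID, model image
            print(alpha, title, brand_id, origin, model_name, model_id, img)
            # Skip models that have already been fetched (the leftover debug
            # exit() that stopped the loop here has been removed)
            if self.keep_records(model_id=model_id, vali=True):
                print('Already fetched, skipping.')
                continue
            file_name = f'{alpha}-{title}-{brand_id}-{model_name}-{model_id}'
            file_name = file_name.replace('/', ' ')
            # Step 2: fetch and save the model's configuration page data
            self.parameter_configuration_html(model_id=model_id, file_name=file_name)
            # Step 3: record that this model has been fetched
            self.keep_records(model_id=model_id)
            # time.sleep(random.randint(1, 3))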
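`keep_records` is called in two ways above: with `vali=True` to ask whether a model was already fetched, and without it to record a finished model. Its implementation is not shown, so the following is only one possible sketch, assuming a plain text file with one model ID per line. The standalone function signature, the `record_path` parameter and the file name are hypothetical.

```python
import os
import tempfile


def keep_records(model_id, record_path, vali=False):
    """With vali=True, report whether model_id was recorded; otherwise record it."""
    seen = set()
    if os.path.exists(record_path):
        with open(record_path, encoding='utf-8') as f:
            seen = {line.strip() for line in f if line.strip()}
    if vali:
        return str(model_id) in seen
    if str(model_id) not in seen:
        # Append so that earlier records survive a restart
        with open(record_path, 'a', encoding='utf-8') as f:
            f.write(f'{model_id}\n')
    return True


# Usage on a throwaway file
record_path = os.path.join(tempfile.mkdtemp(), 'fetched_ids.txt')
before = keep_records(1234, record_path, vali=True)
keep_records(1234, record_path)
after = keep_records(1234, record_path, vali=True)
print(before, after)
```

Persisting the IDs on disk is what lets an interrupted crawl resume without requesting pages it has already saved.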

Result

Demo download

https://download.csdn.net/download/qq_38154948/85001346

