
Last edited by 章一锋 Jun 04, 2021

taobao_shop_info_amp

Basic information

Data name (Chinese)

店铺详情 (shop details)

Data name (English)

taobao_shop_info_amp

Crawl site (entry URL)

https://world.taobao.com/dianpu-amp/{shop_id}.htm
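A minimal sketch of filling in the entry URL. It assumes the `{shop_id}` placeholder takes the task's `platform_shop_id` value; this page does not say which ID is used, and `build_shop_url` is a hypothetical helper, not code from TaobaoShopInfoAmp.py.

```python
import re

# Entry URL template copied from this page.
SHOP_URL_TEMPLATE = "https://world.taobao.com/dianpu-amp/{shop_id}.htm"


def build_shop_url(shop_id) -> str:
    """Return the AMP shop page URL for one shop.

    Assumption: shop_id is the task's platform_shop_id (numeric in the samples).
    """
    shop_id = str(shop_id).strip()
    # Accept ASCII digits only, matching the IDs seen in the samples
    if not re.fullmatch(r"[0-9]+", shop_id):
        raise ValueError(f"unexpected shop_id: {shop_id!r}")
    return SHOP_URL_TEMPLATE.format(shop_id=shop_id)


print(build_shop_url("151630279"))
```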

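Crawl frequency and strategy

Full refresh strategy

One full refresh round per month.
With 30 processes and 100 concurrent requests, a round finishes within one day.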
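A back-of-envelope check of the figures above, using Little's law (throughput = requests in flight / average latency). It assumes 100 is the per-process concurrency and takes the sample result's spider_used_time_ms (1207 ms) as a typical latency; both are assumptions, and the result is an idealized ceiling, not a measured rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_request_ceiling(processes: int, concurrency: int, avg_latency_s: float) -> float:
    """Idealized upper bound on requests per day.

    Little's law: throughput = requests in flight / average latency.
    Ignores retries, download delays, proxy limits and anti-crawling throttling,
    so real throughput will be lower.
    """
    if avg_latency_s <= 0:
        raise ValueError("avg_latency_s must be positive")
    in_flight = processes * concurrency
    return in_flight / avg_latency_s * SECONDS_PER_DAY


# Assumptions: 100 is per-process concurrency; 1.207 s is the sample's latency.
ceiling = daily_request_ceiling(processes=30, concurrency=100, avg_latency_s=1.207)
print(f"{ceiling:,.0f} requests/day at most")
```

If 100 is instead the total across all processes, call it with processes=1 and concurrency=100; the ceiling drops 30-fold.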

Incremental crawl strategy

None yet

Spider

Owner

章一锋

Spider name

Taobao AMP shop-details spider

Code location

http://tech.pingansec.com/granite/project-ec/-/blob/develop_udms_20210113/scrapy/ec/ec/spiders/taobao/TaobaoShopInfoAmp.py

Queue name and address

* redis host: redis://:utn@0818@bdp-mq-001.redis.rds.aliyuncs.com:6379/7
* redis port: 6379
* redis db:   7
* redis key:  taobao_shop_info_amp:10
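The address above writes a password containing `@` directly into the URL. RFC 3986 uses `@` to end the userinfo part, so a strict parser may split such a URL in the wrong place; percent-encoding the password (`@` becomes `%40`) avoids that. Python's `urllib.parse.urlsplit` happens to split at the last `@`, so it tolerates the unencoded form, but other clients may not. A minimal sketch with hypothetical helper names and a placeholder password:

```python
from urllib.parse import quote, unquote, urlsplit


def build_redis_url(host: str, port: int, db: int, password: str) -> str:
    """Build a redis:// URL with the password percent-encoded ('@' -> '%40', ':' -> '%3A')."""
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"


def describe_redis_url(url: str) -> dict:
    """Split a redis:// URL into host, port, db number and decoded password."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "db": int(parts.path.lstrip("/") or 0),
        "password": unquote(parts.password or ""),
    }


# "p@ss:word" is a placeholder password for illustration only.
url = build_redis_url("bdp-mq-001.redis.rds.aliyuncs.com", 6379, 7, "p@ss:word")
print(url)
print(describe_redis_url(url))
```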

Priority queue notes

Task source

user: shuidi
password: utn@0818
host: bdp-rds-009.mysql.rds.aliyuncs.com
port: 3306
database: utn_ec
table: tb_ebusiness_shop

Task import config file: http://tech.pingansec.com/granite/project-ec/-/blob/develop_udms_20210113/app_taobao/data_pump/spider_job.yml
Scheme name: taobao_shop_info_amp_to_redis
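The import itself is done by the internal data_pump tool with the scheme above, whose config format is not shown here. As a standalone illustration of the same MySQL-to-Redis flow (not data_pump itself), here is a sketch using pymysql and redis-py. Assumptions: tb_ebusiness_shop has columns named like the task fields, the queue is a Redis list, and tasks are appended with RPUSH.

```python
import json
from typing import Iterable, List

# Field names copied from the task sample on this page.
TASK_FIELDS = ("shop_name", "platform_shop_id", "contactor_id", "platform_name")
# Queue key from this page; treating it as a Redis list is an assumption.
QUEUE_KEY = "taobao_shop_info_amp:10"


def rows_to_messages(rows: Iterable[dict]) -> List[str]:
    """Convert shop rows into JSON task messages, skipping rows without a shop ID.

    Assumption: tb_ebusiness_shop has columns named like the task fields.
    """
    messages = []
    for row in rows:
        if not row.get("platform_shop_id"):
            continue
        task = {field: str(row.get(field) or "") for field in TASK_FIELDS}
        # ensure_ascii=False keeps Chinese shop names readable in Redis
        messages.append(json.dumps(task, ensure_ascii=False))
    return messages


def pump(mysql_kwargs: dict, redis_url: str, batch_size: int = 1000) -> int:
    """Stream every shop row into the Redis queue; return the number of tasks pushed.

    Requires the third-party pymysql and redis packages (imported lazily so
    rows_to_messages can be used without them).
    """
    import pymysql
    import pymysql.cursors
    import redis

    conn = pymysql.connect(
        cursorclass=pymysql.cursors.SSDictCursor,  # server-side cursor: rows are streamed
        charset="utf8mb4",
        **mysql_kwargs,  # host, port, user, password, database
    )
    queue = redis.Redis.from_url(redis_url)
    pushed = 0
    try:
        with conn.cursor() as cur:
            cur.execute(f"SELECT {', '.join(TASK_FIELDS)} FROM tb_ebusiness_shop")
            while True:
                rows = cur.fetchmany(batch_size)
                if not rows:
                    break
                messages = rows_to_messages(rows)
                if messages:
                    # RPUSH appends at the tail; FIFO only if the spider pops from the head
                    queue.rpush(QUEUE_KEY, *messages)
                    pushed += len(messages)
    finally:
        conn.close()
    return pushed
```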

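Task input parameters (sample)

Task sample

{
  "shop_name": "拔萃视觉服务旗舰店",
  "platform_shop_id": "151630279",
  "contactor_id": "2821181738",
  "platform_name": "天猫"
}

Task parameter description

  • shop_name: shop name
  • platform_shop_id: shop ID on the platform
  • platform_name: platform name (天猫 = Tmall, 淘宝 = Taobao)

data_type values

detail: shop details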

Sample spider result (with metadata)

{'data': {'goods_list': [{'goods_id': '553018618334',
                          'goods_price': '¥100',
                          'goods_sales': '0人付款',
                          'goods_title': '婚紗攝影旅拍三亞廣西南寧麗江大理北海潿洲島海景拍婚紗照團購',
                          'goods_url': 'https://world.taobao.com/item-amp/553018618334.htm?spm=a21wu.11787984-tw.3.1'},
                         {'goods_id': '544302853351',
                          'goods_price': '¥5599',
                          'goods_sales': '0人付款',
                          'goods_title': '廣西北海南寧潿洲島旅拍婚紗攝影婚紗照團購定製套餐高端影樓',
                          'goods_url': 'https://world.taobao.com/item-amp/544302853351.htm?spm=a21wu.11787984-tw.3.2'},
                         {'goods_id': '530591828031',
                          'goods_price': '¥10299',
                          'goods_sales': '0人付款',
                          'goods_title': '南寧古攝影北海潿洲島三亞麗江大理九寨溝旅拍婚紗攝影婚紗照團購',
                          'goods_url': 'https://world.taobao.com/item-amp/530591828031.htm?spm=a21wu.11787984-tw.3.3'},
                         {'goods_id': '531040658410',
                          'goods_price': '¥4999',
                          'goods_sales': '0人付款',
                          'goods_title': '古攝影北海南寧潿洲島旅拍婚紗攝影婚紗照團購定製套餐高端影樓',
                          'goods_url': 'https://world.taobao.com/item-amp/531040658410.htm?spm=a21wu.11787984-tw.3.4'},
                         {'goods_id': '544843139056',
                          'goods_price': '¥1299',
                          'goods_sales': '1人付款',
                          'goods_title': '個人寫真藝術照 閨蜜照情侶照私房照時尚古裝藝術攝影',
                          'goods_url': 'https://world.taobao.com/item-amp/544843139056.htm?spm=a21wu.11787984-tw.3.5'},
                         {'goods_id': '553024914066',
                          'goods_price': '¥6599',
                          'goods_sales': '0人付款',
                          'goods_title': '古攝影旅拍婚紗攝影三亞南寧貴陽北海潿洲島海景遊艇夜景拍團購',
                          'goods_url': 'https://world.taobao.com/item-amp/553024914066.htm?spm=a21wu.11787984-tw.3.6'},
                         {'goods_id': '553118903391',
                          'goods_price': '¥11299',
                          'goods_sales': '0人付款',
                          'goods_title': '古攝影旅拍婚紗攝影三亞南寧柳州北海潿洲島海景遊艇夜景拍婚紗照',
                          'goods_url': 'https://world.taobao.com/item-amp/553118903391.htm?spm=a21wu.11787984-tw.3.7'},
                         {'goods_id': '556208388436',
                          'goods_price': '¥9299',
                          'goods_sales': '0人付款',
                          'goods_title': '南寧古攝影旅拍婚紗攝影雲南大理蒼山遠景洱海風光外景婚紗照',
                          'goods_url': 'https://world.taobao.com/item-amp/556208388436.htm?spm=a21wu.11787984-tw.3.8'},
                         {'goods_id': '561469582439',
                          'goods_price': '¥4599',
                          'goods_sales': '0人付款',
                          'goods_title': '南寧古攝影旅拍婚紗攝影廣西南寧柳州三亞北海潿洲島套餐定金尾款',
                          'goods_url': 'https://world.taobao.com/item-amp/561469582439.htm?spm=a21wu.11787984-tw.3.9'},
                         {'goods_id': '562173466308',
                          'goods_price': '¥3299',
                          'goods_sales': '0人付款',
                          'goods_title': '北海視覺海岸婚紗攝影旅拍婚紗照海景婚紗攝影照北海潿洲島',
                          'goods_url': 'https://world.taobao.com/item-amp/562173466308.htm?spm=a21wu.11787984-tw.3.10'},
                         {'goods_id': '556208744605',
                          'goods_price': '¥4599',
                          'goods_sales': '3人付款',
                          'goods_title': '南寧古攝影旅拍婚紗攝影海南三亞外景婚紗照 贈五星級酒店住宿',
                          'goods_url': 'https://world.taobao.com/item-amp/556208744605.htm?spm=a21wu.11787984-tw.3.11'}],
          'title': '拔萃視覺服務旗艦店'},
 'data_type': 'detail',
 'error_msg': '',
 'http_code': 200,
 'platform_name': '淘宝',
 'spider_end_time': '2021-06-04 10:16:47.689',
 'spider_ip': '192.168.38.1',
 'spider_name': 'taobao_shop_info_amp',
 'spider_start_time': '2021-06-04 10:16:46.482',
 'spider_used_time_ms': 1207,
 'task_params': {'contactor_id': '2821181738',
                 'platform_name': '淘宝',
                 'platform_shop_id': '151630279',
                 'shop_name': '拔萃视觉服务旗舰店'},
 'task_result': 1000}
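In the sample, spider_used_time_ms is simply spider_end_time minus spider_start_time (1207 ms), while goods_price and goods_sales are display strings such as '¥5599' and '3人付款'. A minimal sketch of recomputing the duration and turning those strings into numbers for downstream cleaning; the helper names are hypothetical, and the handling of '万' and '+' in sales counts is an assumption since neither appears in the sample.

```python
import re
from datetime import datetime
from decimal import Decimal

# Timestamp format used by spider_start_time / spider_end_time in the sample.
TIME_FMT = "%Y-%m-%d %H:%M:%S.%f"


def used_time_ms(start: str, end: str) -> int:
    """Recompute spider_used_time_ms from the start and end timestamps."""
    delta = datetime.strptime(end, TIME_FMT) - datetime.strptime(start, TIME_FMT)
    return round(delta.total_seconds() * 1000)


def parse_price(text: str) -> Decimal:
    """Extract the number from a price string such as '¥5599'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price in {text!r}")
    return Decimal(match.group().replace(",", ""))


def parse_sales(text: str) -> int:
    """Extract the buyer count from a string such as '3人付款'.

    Assumption: larger counts may appear as '1.5万人付款' (万 = 10,000) or with a
    trailing '+', which is ignored; neither form occurs in the sample above.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)?)(万)?", text)
    if match is None:
        raise ValueError(f"no sales count in {text!r}")
    value = Decimal(match.group(1))
    if match.group(2):
        value *= 10000
    return int(value)


goods = [
    {"goods_price": "¥1299", "goods_sales": "1人付款"},
    {"goods_price": "¥4599", "goods_sales": "3人付款"},
]
print(used_time_ms("2021-06-04 10:16:46.482", "2021-06-04 10:16:47.689"))
print([(parse_price(g["goods_price"]), parse_sales(g["goods_sales"])) for g in goods])
```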

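Data structure of the actual spider result

Runtime environment

scrapy

Deployment

Spider host: 10.8.6.69
Processes: 30
Project name: ec
Task submission host: 10.8.6.64
Task submission method: crontab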

Taskhub config

http://tech.pingansec.com/granite/project-taskhub/-/blob/master/taskhub/config/ec/config.d/taobao.yaml

Taskhub scheduling rules

Tasks are filtered out when task_result is one of:
    - 1000
    - 1101
    - 1102
    - 2001
    - 7000
    - 9300
Tasks with any other task_result value are put into the queue.
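The rule above can be mirrored in a few lines. This is a sketch for illustration, not Taskhub's code (the real rule lives in taobao.yaml), and treating a missing or non-numeric task_result as "any other value" is an assumption.

```python
# task_result values that Taskhub filters out, copied from the rule above.
# What each code means is not documented on this page.
FILTERED_TASK_RESULTS = frozenset({1000, 1101, 1102, 2001, 7000, 9300})


def should_requeue(result: dict) -> bool:
    """Return True if a result's task would be put into the queue."""
    try:
        code = int(result["task_result"])
    except (KeyError, TypeError, ValueError):
        # Assumption: a missing or non-numeric code counts as "any other value"
        return True
    return code not in FILTERED_TASK_RESULTS


sample = {"spider_name": "taobao_shop_info_amp", "task_result": 1000}
print(should_requeue(sample))
```

The sample result earlier on this page has task_result 1000, so under this rule it is filtered out rather than queued.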

Spider monitoring metrics

Spider result directory (pending collection)


/data/ec_spider_data/taobao_shop_info_amp

Data aggregation

Owner

Aggregation method

  • Spider writes directly to Kafka

  • Spider writes files, collected by Logstash
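For the file-based option, one JSON object per line suits a line-oriented reader such as Logstash's file input with a json codec. A minimal sketch of appending results under the result directory listed earlier; `append_result` and the per-day, per-process file naming are hypothetical, since this page leaves the Logstash config name and type empty.

```python
import json
import os
import tempfile
from datetime import date
from typing import Optional

# Result directory from this page.
RESULT_DIR = "/data/ec_spider_data/taobao_shop_info_amp"


def append_result(result: dict, base_dir: str = RESULT_DIR, day: Optional[date] = None) -> str:
    """Append one spider result as a single JSON line; return the file path.

    Hypothetical naming: one file per day and per process, e.g.
    2021-06-04.12345.jsonl, so that the 30 spider processes never write to the
    same file. Align the pattern with the Logstash file input's path setting.
    """
    day = day or date.today()
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, f"{day.isoformat()}.{os.getpid()}.jsonl")
    # ensure_ascii=False keeps Chinese text as UTF-8 instead of \u escapes
    line = json.dumps(result, ensure_ascii=False, separators=(",", ":"))
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return path


if __name__ == "__main__":
    # Demo in a temporary directory instead of RESULT_DIR
    with tempfile.TemporaryDirectory() as tmp:
        print(append_result({"spider_name": "taobao_shop_info_amp", "task_result": 1000}, tmp))
```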

Directory after aggregation

Logstash config file name

Logstash file input type

Aggregation topic

ec-spider-taobao-data

ES log index and filter conditions

ec-spider-data-*
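The page gives the index pattern but no filter conditions. As an illustration, here is a query DSL body that selects this spider's documents from the last day and counts them by task_result, which could feed the monitoring dashboard. Assumptions: string fields use Elasticsearch's default dynamic mapping (hence `spider_name.keyword`) and documents carry Logstash's `@timestamp`; `spider_log_query` is a hypothetical helper.

```python
import json

# Index pattern from this page.
INDEX_PATTERN = "ec-spider-data-*"


def spider_log_query(spider_name: str, since: str = "now-1d",
                     name_field: str = "spider_name.keyword") -> dict:
    """Build a query DSL body: one spider's results since `since`, counted by task_result.

    Assumptions: strings use the default dynamic mapping (hence the .keyword
    sub-field) and documents carry Logstash's @timestamp field.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {name_field: spider_name}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {"by_task_result": {"terms": {"field": "task_result"}}},
    }


print(f"GET {INDEX_PATTERN}/_search")
print(json.dumps(spider_log_query("taobao_shop_info_amp"), indent=2))
```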

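Monitoring dashboard

Data retention policy

Data cleaning

Owner

Code location

Deployment location

Deployment method and notes

  • crontab + data_pump
  • supervisor + data_pump
  • supervisor + consumer

Data source

Storage table

  • Database address:
  • Table name: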