Last edited by 蒋家升 Jun 14, 2023
This is an old version of this page. You can view the most recent version or browse the history.

qcc

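Basic information

Equity penetration QCC spider
equity_penetration_qcc, deployed via Scrapy
Project name: project-gravel
Branch: develop_equity_penetration

Data name (Chinese)

股权穿透QCC爬虫

Data name (English)

equity_penetration_qcc

Target website (crawl entry point)

Official website, desktop (PC) entry:
https://www.qcc.com

Crawl output file path:
/data/gravel_spiders/equity_penetration_qcc

Crawl frequency and strategy

Full update strategy

Currently, a full update makes one complete pass over all regions and companies.

Incremental crawl strategy


Spider

Equity penetration QCC spider equity_penetration_qcc

Owner

蒋家升

Spider name

equity_penetration_qcc

Code location

Project URL: http://tech.pingansec.com/granite/project-gravel/-/tree/develop_equity_penetration

Queue name and address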

  • redis host: redis://:utn@0818@bdp-mq-001.redis.rds.aliyuncs.com:6379/7
  • redis port: 6379
  • redis db: 7
  • redis key:
    • qcc

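Priority queue notes

  • equity_penetration supports queue priorities

Task source

Task input parameters (samples)

# Region list task
{"area_code": "AH_340100", "page": "1"}

# Search list task
{"search_key": "北京出国邦出入境服务有限公司"}

# Detail page task
{"fid": "0727d5d1a4f95d791ff4b7ce5d6e975a"}

Task samples

Task parameter descriptions

  • area_code: province or city code, e.g. Anhui (AH); Hefei (AH_340100)
  • page: page number
  • search_key: text entered in the search box
  • fid: QCC company ID
  • pid: QCC person ID (appears in the task_params of the person detail result)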
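
The task samples can be generated programmatically. The following is a minimal sketch, not the project's actual task producer: `build_task` and `ALLOWED_PARAM_SETS` are hypothetical names, and which parameter names are allowed together is an assumption drawn from the samples on this page (the `pid` shape comes from the `task_params` of the person detail result).

```python
import json

# Parameter names come from the samples on this page; which names form a
# valid task together is an assumption made for this sketch.
ALLOWED_PARAM_SETS = (
    {"area_code", "page"},  # region list task
    {"search_key"},         # search list task
    {"fid"},                # company detail task
    {"pid"},                # person detail task (seen in the result sample)
)


def build_task(**params):
    """Serialize task parameters to a JSON string, rejecting unknown shapes."""
    if set(params) not in ALLOWED_PARAM_SETS:
        raise ValueError(f"unsupported parameter set: {sorted(params)}")
    # The samples pass every value as a string (e.g. "page": "1").
    payload = {key: str(value) for key, value in params.items()}
    # ensure_ascii=False keeps Chinese search keys human-readable.
    return json.dumps(payload, ensure_ascii=False)


region_task = build_task(area_code="AH_340100", page=1)
search_task = build_task(search_key="北京出国邦出入境服务有限公司")
detail_task = build_task(fid="0727d5d1a4f95d791ff4b7ce5d6e975a")
```

The resulting strings could then be pushed to the queue described earlier; that step is left out because the queue client and message format are not documented here.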

data_type descriptions

  • list_region: region list
  • list_search: search list
  • detail_company: company detail page information
  • detail_person: person detail page information
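
As an illustration of how the four data_type values relate to task parameters, here is a small sketch. The function name `infer_data_type` is hypothetical. The mappings for `area_code`, `fid`, and `pid` match the result samples on this page; the mapping from `search_key` to `list_search` is an assumption based on naming only.

```python
def infer_data_type(task_params):
    """Guess the data_type a task will produce from its parameter names."""
    if "area_code" in task_params:
        return "list_region"
    if "search_key" in task_params:
        return "list_search"  # assumption: no search result sample is shown
    if "fid" in task_params:
        return "detail_company"
    if "pid" in task_params:
        return "detail_person"
    raise ValueError(f"cannot infer data_type from {sorted(task_params)}")


kind = infer_data_type({"area_code": "CQ_500104", "page": "5"})
```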

Spider result metadata

Same as the actual spider results below.

Data structure of actual spider results

  • Region list task result
{
  "data":
  [
    {
      "fid": "13df1591b2302573e518c410acd7b2b4",
      "qcc_url": "https://www.qcc.com/firm/13df1591b2302573e518c410acd7b2b4.html",
      "company_name": "大渡口区玖贰辉荟服装经营部"
    },
    {
      "fid": "b028024bb8010add7d668bed6e8b0079",
      "qcc_url": "https://www.qcc.com/firm/b028024bb8010add7d668bed6e8b0079.html",
      "company_name": "重庆心揽科技发展有限公司"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list_region",
  "spider_start_time": "2021-11-24 22:41:29.584",
  "spider_end_time": "2021-11-24 22:41:29",
  "task_params": {"area_code": "CQ_500104","page": "5"},
  "metadata": {"area_code": "CQ_500104","page": "5"},
  "spider_name": "equity_penetration_qcc",
  "spider_ip": "10.8.6.51"
}
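
A region list result like the one above can be turned into follow-up company detail tasks. This is a minimal sketch under the assumption that each `fid` in a list result feeds a `{"fid": ...}` detail task; the page does not state how the real pipeline chains tasks, and `detail_tasks_from_list` is a hypothetical helper.

```python
def detail_tasks_from_list(result):
    """Return one detail task per company found in a successful list result."""
    # 1000 is the "success" task_result code used throughout this page.
    if result.get("task_result") != 1000:
        return []
    return [{"fid": row["fid"]} for row in result.get("data", []) if row.get("fid")]


# Trimmed copy of the region list result sample.
sample = {
    "data": [
        {"fid": "13df1591b2302573e518c410acd7b2b4",
         "company_name": "大渡口区玖贰辉荟服装经营部"},
        {"fid": "b028024bb8010add7d668bed6e8b0079",
         "company_name": "重庆心揽科技发展有限公司"},
    ],
    "task_result": 1000,
    "data_type": "list_region",
}
tasks = detail_tasks_from_list(sample)
```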
  • Company detail page result
{
  "data":
  {
    "business_license":
    {
      "登记状态": "存续(在营、开业、在册)",
      "成立日期": "2005-08-12",
      "人员规模": "300-399人",
      "曾用名": "-",
      "进出口企业代码": "-",
      "统一社会信用代码": "91310110779301025N",
      "企业名称": "上海宽娱数码科技有限公司",
      "注册资本": "50000万元人民币",
      "实缴资本": "1680万元人民币",
      "核准日期": "2021-11-16",
      "组织机构代码": "77930102-5",
      "工商注册号": "310110000371080",
      "纳税人识别号": "91310110779301025N",
      "企业类型": "有限责任公司(自然人独资)",
      "营业期限": "2005-08-12至无固定期限",
      "纳税人资质": "-",
      "所属行业": "科技推广和应用服务业",
      "所属地区": "上海市",
      "登记机关": "杨浦区市场监督管理局",
      "最新年报地址": "上海市杨浦区国定路335号2号楼1905室(2020年报)",
      "经营范围": "许可项目:第一类增值电信业务;第二类增值电信业务;基础电信业务;出版物批发;出版物零售;餐饮服务;信息网络传播视听节目;网络文化经营;广播电视节目作经营;营业性演出。(依法须经批准的项目,经相关部门批准后方可开展经营活动,具体经营项目以相关部门批准文件或许可证件为准)一般项目:数码科技、计算机软硬件科技领域内的技术咨询、技术转让、技术开发、技术服务,广告发布(非广播电台、电视台、报刊出版单位),票务代理,计算机软硬件、日用百货、办公用品、工艺美术品(象牙及其品除外)、服装服饰、鞋帽、针纺织品、玩具、文化体育用品、家居用品、电子产品、通讯设备、宠物用品、化妆品、卫生洁具、家用电器、文化用品、皮革品、包装材料、珠宝首饰的销售。(除依法须经批准的项目外,凭营业执照依法自主开展经营活动)",
      "法定代表人":
      {
        "legal_person": "陈睿",
        "pid": "pdc2d22e33cabf11add23ddbc90fd62f"
      },
      "参保人数": "380",
      "英文名": "ShanghaiKuanyuDigitalTechnologyCo.,Ltd.",
      "注册地址": "上海市杨浦区政立路489号801室"
    },
    "main_members":
    [
      {
        "职务": "执行董事,法定代表人",
        "持股比例": "100%持股详情>",
        "最终受益股份": "100%股权链>",
        "姓名":
        {
          "member": "陈睿",
          "pid": "pdc2d22e33cabf11add23ddbc90fd62f",
          "tags": ["实际控制人","最终受益人","有股权出质","大股东"]
        }
      },
      {
        "职务": "监事",
        "持股比例": "-",
        "最终受益股份": "-",
        "姓名":
        {
          "member": "李旎",
          "pid": "p18ae8dbf5cfd395eb02eb536dd1e58a"
        }
      }
    ],
    "shareholders":
    [
      {
        "持股比例": "100%持股详情>",
        "认缴出资额(万元)": "50000",
        "认缴出资日期": "2041-08-20",
        "参股日期": "2014-08-06",
        "实缴出资额(万元)": "1680",
        "实缴出资日期": "2009-10-19",
        "股东及出资信息":
        {
          "shareholder": "陈睿",
          "pid": "pdc2d22e33cabf11add23ddbc90fd62f",
          "tags": ["大股东","实际控制人","最终受益人","有股权出质"]
        }
      }
    ],
    "touzilist":
    [
      {
        "注册资本": "1000万元人民币",
        "成立日期": "2019-03-01",
        "状态": "存续",
        "持股比例": "100%",
        "认缴出资额": "1000万元人民币",
        "融资轮次": "-",
        "融资日期": "-",
        "关联产品/机构": "哔哩哔哩bilibili",
        "被投资企业名称":
        {
          "invested_company": "海南红红火火信息科技有限公司",
          "fid": "25ebe2f0466fffd9ce82df1705986658"
        },
        "法定代表人":
        {
          "legal_person": "郑彬炜",
          "pid": "p71e12c94c13c44b208971209d3da792"
        }
      }
    ],
    "company_pv": "18万+"
  },
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "detail_company",
  "spider_start_time": "2021-12-03 19:49:16.811",
  "spider_end_time": "2021-12-03 19:49:26",
  "task_params": {"fid": "78045ae17d1d9487163b97233b7477d2"},
  "metadata": {"fid": "78045ae17d1d9487163b97233b7477d2"},
  "spider_name": "equity_penetration_qcc",
  "spider_ip": "10.8.1.30"
}
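
For equity penetration, the `shareholders` array is the part of a company detail result that carries ownership edges. The sketch below extracts them and strips the link text that trails the percentage (for example `100%持股详情>`). `parse_percent` and `shareholder_edges` are hypothetical helpers, and the fallback to a `fid` key for corporate shareholders is an assumption, since the sample only shows a person shareholder with a `pid`.

```python
import re

# Matches a leading percentage such as "100%" or "15.45%".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)%")


def parse_percent(text):
    """Return the leading percentage as a float, or None for values like "-"."""
    match = PERCENT.match(text)
    return float(match.group(1)) if match else None


def shareholder_edges(company_result):
    """List (company, shareholder, percent) edges from a company detail result."""
    data = company_result["data"]
    company = data["business_license"]["企业名称"]
    edges = []
    for row in data.get("shareholders", []):
        holder = row["股东及出资信息"]
        edges.append({
            "company": company,
            "shareholder": holder["shareholder"],
            # Assumption: corporate shareholders would carry "fid" instead of "pid".
            "shareholder_id": holder.get("pid") or holder.get("fid"),
            "percent": parse_percent(row["持股比例"]),
        })
    return edges


# Trimmed copy of the company detail result sample.
sample = {"data": {
    "business_license": {"企业名称": "上海宽娱数码科技有限公司"},
    "shareholders": [{
        "持股比例": "100%持股详情>",
        "股东及出资信息": {"shareholder": "陈睿",
                    "pid": "pdc2d22e33cabf11add23ddbc90fd62f"},
    }],
}}
edges = shareholder_edges(sample)
```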
  • Person detail page result
{
  "data":
  {
    "legallist":
    [
      {
        "KeyNo": "bf7a36cf53f8208141a5d9a2c68c3488",
        "Name": "北京新东方大愚文化传播有限公司",
        "OperName": "俞敏洪",
        "OperPersonId": "p1b99d0e8a749a1a32c1e17c2d41d686",
        "OperType": 1,
        "RelatedCount": 83,
        "RegCap": "2000万元人民币",
        "ImageUrl": "https://image.qcc.com/logo/bf7a36cf53f8208141a5d9a2c68c3488.jpg?x-oss-process=style/logo_200",
        "Date": 1053014400,
        "Status": "存续",
        "CoyCode": "",
        "Relation":
        [
          {
            "Type": "0",
            "TypeDesc": "法定代表人",
            "Value": "俞敏洪",
            "StartDate": -1,
            "EndDate": 0
          },
          {
            "Type": "2",
            "TypeDesc": "任职",
            "Value": "总经理,执行董事",
            "StartDate": -1,
            "EndDate": 0
          }
        ],
        "Area":
        {
          "Province": "北京市",
          "City": "北京市",
          "County": "海淀区"
        },
        "Industry":
        {
          "IndustryCode": "R",
          "Industry": "文化、体育和娱乐业",
          "SubIndustryCode": "87",
          "SubIndustry": "广播、电视、电影和录音制作业",
          "MiddleCategoryCode": null,
          "MiddleCategory": null,
          "SmallCategoryCode": null,
          "SmallCategory": null
        },
        "RegistCapiAmt": 2000,
        "SXCount": 0,
        "ZXCount": 0
      }
    ],
    "allcompanylist":
    [
      {
        "KeyNo": "effab7edf99cd329486b6237266dd5cd",
        "Name": "北京汇智博纳教育科技有限公司",
        "OperName": "金利",
        "OperPersonId": "p128d7ba6adfe5015ecfdabca188b802",
        "OperType": 1,
        "RelatedCount": 3,
        "RegCap": "1000万元人民币",
        "ImageUrl": "https://image.qcc.com/logo/effab7edf99cd329486b6237266dd5cd.jpg?x-oss-process=style/logo_200",
        "Date": 1304611200,
        "Status": "存续",
        "CoyCode": "",
        "Relation":
        [
          {
            "Type": "1",
            "TypeDesc": "股东",
            "Value": "70.00%",
            "StartDate": -1,
            "EndDate": 0
          },
          {
            "Type": "2",
            "TypeDesc": "任职",
            "Value": "监事",
            "StartDate": -1,
            "EndDate": 0
          }
        ],
        "Area":
        {
          "Province": "北京市",
          "City": "北京市",
          "County": "海淀区"
        },
        "Industry":
        {
          "IndustryCode": "M",
          "Industry": "科学研究和技术服务业",
          "SubIndustryCode": "75",
          "SubIndustry": "科技推广和应用服务业",
          "MiddleCategoryCode": "759",
          "MiddleCategory": "其他科技推广服务业",
          "SmallCategoryCode": "7590",
          "SmallCategory": "其他科技推广服务业"
        },
        "RegistCapiAmt": 1000,
        "SXCount": 0,
        "ZXCount": 0
      }
    ],
    "investlist":
    [
      {
        "KeyNo": "7a71aee12bf18701d3b1da8fa1a4bf5f",
        "Name": "北京合力惠东投资中心(有限合伙)",
        "OperName": "湖州恒益股权投资管理有限公司",
        "OperPersonId": "39c826a638deececf9ac5f9097a1410c",
        "OperType": 2,
        "RelatedCount": 3,
        "RegCap": "3236.319954万元人民币",
        "ImageUrl": "https://image.qcc.com/auto/7a71aee12bf18701d3b1da8fa1a4bf5f.jpg?x-oss-process=style/logo_200",
        "Date": 1328803200,
        "Status": "存续",
        "CoyCode": "",
        "Relation":
        [
          {
            "Type": "1",
            "TypeDesc": "股东",
            "Value": "15.45%",
            "StartDate": 1355414400,
            "EndDate": 0
          }
        ],
        "Area":
        {
          "Province": "北京市",
          "City": "北京市",
          "County": "海淀区"
        },
        "Industry":
        {
          "IndustryCode": "L",
          "Industry": "租赁和商务服务业",
          "SubIndustryCode": "72",
          "SubIndustry": "商务服务业",
          "MiddleCategoryCode": "721",
          "MiddleCategory": "组织管理服务",
          "SmallCategoryCode": "7212",
          "SmallCategory": "投资与资产管理"
        },
        "RegistCapiAmt": 3236,
        "SXCount": 0,
        "ZXCount": 0
      }
    ],
    "postofficelist":
    [
      {
        "KeyNo": "4bf81171baf9db38f0768c7e36cbe683",
        "Name": "北京洪泰企业管理集团有限公司",
        "OperName": "盛希泰",
        "OperPersonId": "p84ef64a59e0ed69364c0e8732ea9c2d",
        "OperType": 1,
        "RelatedCount": 91,
        "RegCap": "30000万元人民币",
        "ImageUrl": "https://image.qcc.com/auto/4bf81171baf9db38f0768c7e36cbe683.jpg?x-oss-process=style/logo_200",
        "Date": 1583769600,
        "Status": "存续",
        "CoyCode": "",
        "Relation":
        [
          {
            "Type": "1",
            "TypeDesc": "股东",
            "Value": "11.11%",
            "StartDate": 1583769600,
            "EndDate": 0
          },
          {
            "Type": "2",
            "TypeDesc": "任职",
            "Value": "监事",
            "StartDate": 1583769600,
            "EndDate": 0
          }
        ],
        "Area":
        {
          "Province": "北京市",
          "City": "北京市",
          "County": "通州区"
        },
        "Industry":
        {
          "IndustryCode": "L",
          "Industry": "租赁和商务服务业",
          "SubIndustryCode": "72",
          "SubIndustry": "商务服务业",
          "MiddleCategoryCode": null,
          "MiddleCategory": null,
          "SmallCategoryCode": null,
          "SmallCategory": null
        },
        "RegistCapiAmt": 30000,
        "SXCount": 0,
        "ZXCount": 0
      }
    ]
  },
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "detail_person",
  "spider_start_time": "2021-12-03 19:10:56.001",
  "spider_end_time": "2021-12-03 19:11:30",
  "task_params": {"pid": "p1b99d0e8a749a1a32c1e17c2d41d686"},
  "metadata": {"pid": "p1b99d0e8a749a1a32c1e17c2d41d686"},
  "spider_name": "equity_penetration_qcc",
  "spider_ip": "10.8.1.30"
}
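
The person detail entries store dates as Unix timestamps, with `-1` or `0` where no date is known. The sketch below flattens one entry into a readable summary. It rests on two assumptions not stated on this page: that the timestamps represent midnight in China Standard Time (UTC+8), and that `KeyNo` is the same company ID used as `fid` elsewhere. `to_date` and `summarize_entry` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are midnight in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def to_date(timestamp):
    """Convert a Unix timestamp to an ISO date; -1, 0 and None mean "unknown"."""
    if timestamp is None or timestamp <= 0:
        return None
    return datetime.fromtimestamp(timestamp, tz=CST).date().isoformat()


def summarize_entry(entry):
    """Flatten one legallist/investlist/... entry into a small summary dict."""
    return {
        "fid": entry["KeyNo"],  # assumption: KeyNo is the company fid
        "name": entry["Name"],
        "established": to_date(entry.get("Date")),
        "relations": [
            {"role": rel["TypeDesc"], "value": rel["Value"],
             "since": to_date(rel.get("StartDate"))}
            for rel in entry.get("Relation", [])
        ],
    }


# Trimmed copy of the first legallist entry from the sample.
entry = {
    "KeyNo": "bf7a36cf53f8208141a5d9a2c68c3488",
    "Name": "北京新东方大愚文化传播有限公司",
    "Date": 1053014400,
    "Relation": [{"Type": "0", "TypeDesc": "法定代表人", "Value": "俞敏洪",
                  "StartDate": -1, "EndDate": 0}],
}
summary = summarize_entry(entry)
```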

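Spider runtime environment

scrapy

Spider deployment info

target: node_51
project: equity_penetration
spider_name: equity_penetration_qcc

Taskhub address

Task submission URL: 
Code authoring URL: 

Taskhub scheduling rules

task_result=1000    # Detail task fetched successfully
task_result=1101    # No result information
task_result=9101    # Timeout error; must be retried, currently up to 5 times
task_result=8000    # Parameter error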
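
The scheduling rules above can be read as a small decision table. The sketch below encodes them. It is an illustration only, not Taskhub's implementation: `next_action`, the returned action labels, and the meaning of `retries_done` (retries already attempted) are assumptions.

```python
MAX_RETRIES = 5  # "currently up to 5 times" in the rules above


def next_action(task_result, retries_done):
    """Map a task_result code to what a scheduler might do next."""
    if task_result == 1000:
        return "done"            # result fetched successfully
    if task_result == 1101:
        return "no_result"       # nothing to collect, do not retry
    if task_result == 9101:
        # Timeout: retry until the retry budget is used up.
        return "retry" if retries_done < MAX_RETRIES else "give_up"
    if task_result == 8000:
        return "invalid_params"  # retrying would not help
    return "unknown"


action = next_action(9101, retries_done=2)
```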

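Spider monitoring metrics design

(Observe first; to be completed)
Index: 
Monitoring frequency: 
Monitoring start and end time: 
Alert condition: 
Alert group:  
Alert content: 

Data aggregation

Owner

Data aggregation method

  • Spider writes directly to Kafka

  • Spider writes files that Logstash collects

Spider result directory

Crawl output file path:
/data/gravel_spiders/equity_penetration_qcc

Directory after aggregation

/data2_227/grvael_spider_result/equity_penetration_qcc

Logstash config file name

Logstash file collection type

equity_penetration_qcc

Topic for aggregated data

general-taxpayer

ES log index and filter conditions

gravel-spider-data-*

Monitoring dashboard

Data retention policy


Data cleaning

Owner

Code location

Deployment location

Deployment method and notes

  • crontab + data_pump
  • supervisor + data_pump
  • supervisor + consumer

Data input source

Data storage table location

  • Database address:
  • Table name: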