Last edited by 蒋家升 Nov 09, 2021

environmental_protection_grade

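Basic information

Environmental protection grade

Data name (Chinese)

环保等级

Data name (English)

environmental_protection_grade

Source websites (collection entry points)

Official website entry points (PC):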
Jiangsu: https://hblp.jsshbt.cn/shencai-envfacial-web/service/envFacial/hblp/tEnvBasKeylistModel/queryKeyListBusinessNumber
Zhejiang: http://223.4.71.96/portal/data/api/auto
Fujian: http://220.160.52.213:20071/api/template/page/p_list_eval_credit
Sichuan: http://103.203.219.138:18081/data/w/evaluateResults/list4Public
Hunan: http://218.76.24.162:5014/hnxypjqyd/xxgk/queryXxgsQy
Henan: http://222.143.24.250:8127/credit_publicService/system/company/systemcompanyinfo/getPublicResultListPage.do
Hubei: http://113.57.151.5:8030/HBHB/companyInfo.action
Guangdong: https://www-app.gdeei.cn/gdeepub/data/industry
Guizhou: http://202.98.194.198:6661/wwgs/xypj/public/credit/ratingpublicity/primaryPublicity.jsp
Guangxi: http://202.103.233.156:9081/xypjgx/pages/xypj/wzgs/qypjjgList.jsp
Hebei: http://110.249.223.66:8099/xypjww/xypj/listEntEvaluate
Liaoning: http://221.180.204.224:8080/LiaoNingQiYeXinYongPingJia/display/getEnterpriseInfo.do
Shandong: http://103.239.155.242:7002/xypjgzd/business/xypj/xypjcontroller/
Jilin: http://125.32.96.149:8081/was5/web/search
Anhui: http://112.27.211.29:8082/wznrfb/getQueryList

Storage path for collected files:
/data/gravel_spiders/environmental_protection

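Collection frequency and strategy

Full-refresh strategy for existing data

One full-refresh pass is currently run.

Incremental collection strategy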


Spider

Environmental protection grade spider: environmental_protection

Owner

蒋家升

Spider name

environmental_protection

Code location

Project URL: http://tech.pingansec.com/granite/project-gravel/-/tree/develop_environmental_protection_grade/scrapy_spiders

Queue name and address

  • redis host: redis://:utn@0818@bdp-mq-001.redis.rds.aliyuncs.com:6379/7
  • redis port: 6379
  • redis db: 7
  • redis key:
    • environmental_protection

Priority queue notes

  • environmental_protection supports queue priorities

Task source

Task import config file path: http://tech.pingansec.com/granite/project-gravel/-/blob/develop_environmental_protection_grade/app_environmental_protection_grade/data_pump/normal_list_task.yml

Task input parameters (samples)

Task samples

{"province": "jiangsu", "step": "start"}
{"province": "zhejiang", "step": "start"}
{"province": "fujian", "step": "start"}
{"province": "sichuan", "step": "start"}
{"province": "hunan", "step": "start"}
{"province": "henan", "step": "start", "index": 0, "city": "郑州市"}
{"province": "hubei", "step": "start"}
{"province": "guangdong", "step": "start"}
{"province": "guizhou", "step": "start", "index": 0, "year": 2019}
{"province": "guangxi", "step": "start"}
{"province": "hebei"}
{"province": "liaoning"}
{"province": "shandong", "company": "山东金穗木业股份有限公司"}
{"province": "jilin", "step": "start"}
{"province": "anhui", "step": "start", "index": 0, "year": 2019}

Task parameter notes

{"province": "henan", "step": "start", "index": 0, "city": "郑州市"}
{"province": "guizhou", "step": "start", "index": 0, "year": 2019}
{"province": "shandong", "company": "山东金穗木业股份有限公司"}
{"province": "jilin", "step": "start", "index": 0, "level": "blue"}
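  • Main parameters
    • province: province name in pinyin
    • index: page number for pagination (not needed by some provinces)
  • Optional parameters
    • step: crawl step
  • Special parameters
    • city: prefecture-level city; Henan only
    • year: year; Guizhou and Anhui only
    • company: company name or unified social credit code; Shandong only
    • level: grade; Jilin only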
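The parameter rules above can be checked mechanically before tasks are queued. The following is a minimal sketch, not the project's own code: `SPECIAL_PARAMS` and `check_task` are hypothetical names, and the table only mirrors what the notes on this page state.

```python
# Special parameters and the provinces documented as using them.
# Assumption: this table mirrors the parameter notes on this page and nothing more.
SPECIAL_PARAMS = {
    "city": {"henan"},
    "year": {"guizhou", "anhui"},
    "company": {"shandong"},
    "level": {"jilin"},
}


def check_task(task):
    """Return a list of human-readable problems found in one task dict."""
    problems = []
    province = task.get("province")
    if not province:
        problems.append("missing required parameter: province")
    for name, provinces in SPECIAL_PARAMS.items():
        # A special parameter is flagged when its province is not in the documented set.
        if name in task and province not in provinces:
            problems.append(f"{name} is not documented for province {province!r}")
    if "index" in task and not isinstance(task["index"], int):
        problems.append("index should be an integer page number")
    return problems


print(check_task({"province": "henan", "step": "start", "index": 0, "city": "郑州市"}))
print(check_task({"province": "hubei", "city": "武汉市"}))
```

Note that the result samples for Guangdong and Liaoning on this page also carry `year` inside `task_params`, so the table may need extending if such follow-up tasks are validated as well.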

data_type notes

list: list-page data
detail: detail-page data; currently only Shandong uses detail

Spider result metadata

Actual data structure of spider results

Jiangsu:

{
  "data":
  [
    {
      "creditDataID": "202110242303108479a164c13b45caa51598e28aac5b69",
      "creditLevel": 4,
      "creditName": "一般守信",
      "efResultID": "13574254755210144971",
      "fullName": "宿迁市-宿豫区-宿豫经济开发区",
      "manageLevel": 2,
      "manageName": "二星",
      "spCode": "142332828000",
      "spName": "宿迁市罐头食品有限责任公司",
      "time": "2021-10-25"
    },
    {
      "creditDataID": "202110242305284f9a3b85fc214f13b2cb5944fbec7e36",
      "creditLevel": 4,
      "creditName": "一般守信",
      "efResultID": "13574254755210175105",
      "fullName": "常州市-武进区",
      "manageLevel": 5,
      "manageName": "五星",
      "spCode": "3101120200000099",
      "spName": "江苏绿浥农业科技股份有限公司",
      "time": "2021-10-25"
    },
    {
      "creditDataID": "202110242308191eaf0c8cb8174469bc192039d249540c",
      "creditLevel": 4,
      "creditName": "一般守信",
      "efResultID": "13574254755210175285",
      "fullName": "苏州市-姑苏区",
      "manageLevel": 5,
      "manageName": "五星",
      "spCode": "3200000200000537",
      "spName": "中建三局第一建设工程有限责任公司",
      "time": "2021-10-25"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-25 17:17:52.098",
  "spider_end_time": "2021-10-25 17:17:54",
  "task_params":{
    "province": "jiangsu",
    "step": "start",
    "index": 1
  },
  "metadata":{
    "province": "jiangsu",
    "index": 1
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.18"
}
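Every province shares the same outer envelope (`data`, `http_code`, `task_result`, `task_params`, `metadata`), so a consumer can unwrap results uniformly. A minimal sketch follows; `extract_records` is a hypothetical helper, and treating `http_code` 200 with `task_result` 1000 as success is an assumption based only on the samples on this page.

```python
def extract_records(result, ok_task_result=1000):
    """Return the records from one spider result, or [] if it looks failed.

    Assumption: http_code 200 together with task_result 1000 marks a
    successful task; this page shows these values but does not define them.
    """
    if result.get("http_code") != 200:
        return []
    if result.get("task_result") != ok_task_result:
        return []
    province = result.get("task_params", {}).get("province")
    # Tag each record with its province so mixed streams stay distinguishable.
    return [dict(record, _province=province) for record in result.get("data", [])]


sample = {
    "data": [{"spName": "宿迁市罐头食品有限责任公司", "creditName": "一般守信"}],
    "http_code": 200,
    "task_result": 1000,
    "task_params": {"province": "jiangsu", "step": "start", "index": 1},
}
for row in extract_records(sample):
    print(row["_province"], row["spName"], row["creditName"])
```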

Zhejiang:

{
  "data":
  [
    {
      "city": "舟山市",      # city
      "level_title": "优秀",
      "district": "普陀区",  # county / district
      "level_code": "A",    # credit grade
      "social_credit_code": "91330903336897819J", # unified social credit code
      "score_code": "8ca03c8e-634c-49cb-b35c-bac95ce46910",
      "ent_name": "舟山丰瑞海洋生物制品有限公司",
      "region_code": "330903",
      "release_time": 1635264000000  # update time
    },
    {
      "city": "舟山市",
      "level_title": "优秀",
      "district": "普陀区",
      "level_code": "A",
      "social_credit_code": "913309031487170831",
      "score_code": "8ca03c8e-634c-49cb-b35c-bac95ce46910",
      "ent_name": "中石化浙江舟山石油有限公司",
      "region_code": "330903",
      "release_time": 1635264000000
    },
    {
      "city": "舟山市",
      "level_title": "优秀",
      "district": "普陀区",
      "level_code": "A",
      "social_credit_code": "9133090307868393X1",
      "score_code": "8ca03c8e-634c-49cb-b35c-bac95ce46910",
      "ent_name": "浙江荣生海洋生物制品有限公司",
      "region_code": "330903",
      "release_time": 1635264000000
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-27 11:28:51.445",
  "spider_end_time": "2021-10-27 11:28:58",
  "task_params":
  {
    "province": "zhejiang",
    "step": "start",
    "index": 2
  },
  "metadata":
  {
    "province": "zhejiang",
    "index": 2
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.18"
}
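The Zhejiang `release_time` value is an epoch timestamp in milliseconds. A minimal sketch of converting it to a calendar date, assuming the source publishes times in Beijing time (UTC+8); `release_date` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the site reports times in Beijing time (UTC+8).
CHINA_TZ = timezone(timedelta(hours=8))


def release_date(release_time_ms):
    """Convert an epoch-milliseconds value to an ISO date string in UTC+8."""
    moment = datetime.fromtimestamp(release_time_ms / 1000, tz=CHINA_TZ)
    return moment.date().isoformat()


print(release_date(1635264000000))  # 2021-10-27
```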

Fujian:

{
  "data":
  [
    {
      "id": 11795,
      "social_credit_code": "91350504MA2Y13F352",
      "credit_year_batch": "2021年第二批",
      "ent_name": "泉州佰份佰卫生用品有限公司",  # enterprise name
      "county": "洛江区",  # county / district
      "city": "泉州市",  # prefecture-level city
      "deptName": "泉州市洛江生态环境局",  # evaluating authority
      "createTime": "2021-06-03",  # evaluation date
      "credit": "79",
      "credit_type": "环保良好企业"  # credit grade
    },
    {
      "id": 11794,
      "social_credit_code": "91350504MA346T4N1T",
      "credit_year_batch": "2021年第二批",
      "ent_name": "泉州洛江凤栖石材厂",
      "county": "洛江区",
      "city": "泉州市",
      "deptName": "泉州市洛江生态环境局",
      "createTime": "2021-06-03",
      "credit": "79",
      "credit_type": "环保良好企业"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list_of_normal",
  "spider_start_time": "2021-10-27 14:50:13.824",
  "spider_end_time": "2021-10-27 14:50:14",
  "task_params":
  {
    "province": "fujian",
    "step": "start",
    "index": 2020
  },
  "metadata":
  {
    "province": "fujian",
    "index": 2020
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.6.51"
}

Sichuan: # see the field reference

{
  "data":
  [
    {
      "id": 994222,
      "enterprise":
      {
        "id": 354497052,
        "name": "阆中市枣碧大梁山页岩机砖厂",  # see the field reference
        "creditCode": "92511381MA695TYH83",
        "orgCode": "MA695TYH8",
        "enterpriseType": "PRIVATE_OWNED",
        "industry": "OTHER",
        "polluteType": "EXHAUST_GAS",
        "controlType": "CITY_PREFECTURE",
        "enterpriseAttr": "PRODUCTION_ENTERPRISE",
        "legalName": "庄兆雄",
        "legalTel": "15881493231",
        "productScale": "7.5万吨/年",
        "businessScope": "页岩砖生产、销售",
        "headOfEPA": "庄剑平",
        "regTime": "2008-05-10T00:00:00+08:00",
        "enterpriseState": "PRODUCTING",
        "regionCity":
        {
          "code": "511300000",
          "name": "南充市",
          "parent":
          {
            "code": "510000000",
            "name": "四川省"
          }
        },
        "regionDistrict":
        {
          "code": "511381000",
          "name": "阆中市",
          "parent":
          {
            "code": "511300000",
            "name": "南充市"
          }
        },
        "longitude": 105.2579,
        "latitude": 31.0116,
        "isSSGS": false,
        "enable": true,
        "lastModifiedDate": "2021-03-31T09:34:40.045+08:00",
        "address": "阆中市枣碧乡大梁山村"
      },
      "evaluateScore": 84,
      "selfScore": 98,
      "countyScore": 88,
      "cityScore": 84,
      "evaluateResult": "HBLHQY",
      "evaluateYear": 2020,
      "last": true,
      "publishOrg": "南充市生态环境局",
      "evaluateState": "GSZ",
      "veto": false,
      "dataFrom": "NormalCreditEvaluation"
    },
    {
      "id": 994221,
      "enterprise":
      {
        "id": 354496929,
        "name": "阆中市金福旺页岩机砖厂",
        "creditCode": "92511381MA62HXHX0D",
        "orgCode": "MA62HXHX-0",
        "industry": "OTHER",
        "controlType": "OTHER",
        "enterpriseAttr": "PRODUCTION_ENTERPRISE",
        "legalName": "金跃伟",
        "legalTel": "",
        "headOfEPA": "金光禄",
        "regionCity":
        {
          "code": "511300000",
          "name": "南充市",
          "parent":
          {
            "code": "510000000",
            "name": "四川省"
          }
        },
        "regionDistrict":
        {
          "code": "511381000",
          "name": "阆中市",
          "parent":
          {
            "code": "511300000",
            "name": "南充市"
          }
        },
        "isSSGS": false,
        "enable": true,
        "lastModifiedDate": "2020-07-10T12:29:10.841+08:00",
        "address": "阆中市江南镇瓦房沟村十四社"
      },
      "evaluateScore": 95,
      "selfScore": 104,
      "countyScore": 97,
      "cityScore": 95,
      "evaluateResult": "HBLHQY",
      "evaluateYear": 2020,
      "last": true,
      "publishOrg": "南充市生态环境局",
      "evaluateState": "GSZ",
      "veto": false,
      "dataFrom": "NormalCreditEvaluation"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list_of_normal",
  "spider_start_time": "2021-10-27 14:54:17.216",
  "spider_end_time": "2021-10-27 14:54:17",
  "task_params":
  {
    "province": "sichuan",
    "step": "start",
    "index": 1
  },
  "metadata":
  {
    "province": "sichuan",
    "index": 1
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.18"
}

Field reference
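Sichuan records nest the enterprise and its region objects, unlike the flat records of most other provinces. A minimal sketch of flattening one record into a single-level dict; `flatten_sichuan` and the output key names are hypothetical, chosen only for illustration.

```python
def flatten_sichuan(record):
    """Pull the commonly needed fields out of one nested Sichuan record."""
    enterprise = record.get("enterprise", {})
    city = enterprise.get("regionCity", {})
    district = enterprise.get("regionDistrict", {})
    return {
        "ent_name": enterprise.get("name"),
        "credit_code": enterprise.get("creditCode"),
        "city": city.get("name"),
        "district": district.get("name"),
        "evaluate_year": record.get("evaluateYear"),
        "evaluate_result": record.get("evaluateResult"),
        "evaluate_score": record.get("evaluateScore"),
    }


record = {
    "enterprise": {
        "name": "阆中市金福旺页岩机砖厂",
        "creditCode": "92511381MA62HXHX0D",
        "regionCity": {"code": "511300000", "name": "南充市"},
        "regionDistrict": {"code": "511381000", "name": "阆中市"},
    },
    "evaluateScore": 95,
    "evaluateResult": "HBLHQY",
    "evaluateYear": 2020,
}
print(flatten_sichuan(record))
```

Missing optional fields (for example `enterpriseType` in the second sample record) simply come back as `None` rather than raising.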

Hunan:

{
  "data":
  [
    {
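      "SSQX": "雨花区",  # county / district
      "GSDJ": "环保合格单位", # credit grade (for the year given in the ND field)
      "ND": "2020", # year
      "QYMC": "长沙博大环保科技有限公司",  # name of the enterprise or public institution
      "SSDS": "长沙市",  # prefecture-level city or autonomous prefecture
      "GXSJ": "2021年09月27日",  # update time
      "TYSHXYDM": "91430111344823182Y",  # unified social credit code
      "CPDJ": 2,  # grade under evaluation
      "ZXDJ": "环保合格单位"   # current credit grade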
    },
    {
      "SSQX": "保靖县",
      "GSDJ": "环保合格单位",
      "ND": "2020",
      "QYMC": "保靖县人民医院",
      "SSDS": "湘西土家族苗族自治州",
      "GXSJ": "2021年09月27日",
      "TYSHXYDM": "12433125448636058Q",
      "CPDJ": 2,
      "ZXDJ": "环保合格单位"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-29 16:03:08.162",
  "spider_end_time": "2021-10-29 16:03:10",
  "task_params": {"province": "hunan","step": "start","index": 1},
  "metadata": {"province": "hunan","index": 1},
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.10"
}
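Hunan's `GXSJ` update time uses a Chinese date format, which usually needs normalising before storage. A minimal sketch using only the standard library; `parse_gxsj` is a hypothetical helper name.

```python
from datetime import datetime


def parse_gxsj(text):
    """Convert a date such as '2021年09月27日' to ISO format."""
    return datetime.strptime(text, "%Y年%m月%d日").date().isoformat()


print(parse_gxsj("2021年09月27日"))  # 2021-09-27
```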

Henan:

{
  "data":
  [
    {
      "belongsBasin": "1",
      "companyAddress": "郑州市中原区桐柏南路158号",
      "companyLevel": "2",
      "companyName": "河南(郑州)中汇心血管病医院",      # name of the enterprise or public institution
      "contactAddress": "郑州市中原区桐柏南路158号",
      "contactNumber": "",
      "contactPerson": "",
      "contactTelphone": "15290405687",
      "contactUser": "周毅鹏",
      "contactWechat": "zyp15290405687",
      "controlLimit": "",
      "createTime": "2021-06-09 14:16:43",
      "createUser": "d857c4afdcc047f1a1fa70417df5f5f7",
      "emissionLimits": "",
      "emissionsTo": "",
      "enabledStatus": "1",
      "evaluateDate": "2021-09-24",                   # rating date
      "evaluation": "1",
      "exhaustType": "01,02,04",
      "finalResult": "警示",                          # grade
      "hasInit": false,
      "id": "0351cdb38249462f8589d065f8a207af",
      "industry": "",
      "industryInvolved": "Q8415",
      "isUsed": "1",
      "legalRepresentative": "毛慧娟",                # legal representative
      "officePhone": "",
      "orgCode": "ceaa1f4652ae4d73a2bf82d36eb2e325",
      "orgName": "中原区生态环境局",                    # rating authority
      "organizationCode": "52410100MJF72040XB",
      "outletName": "",
      "pKName": "id",
      "parentCode": "410100",
      "pollutantName": "",
      "postcode": "450000",
      "pregion": "郑州市",                            # city
      "productionDate": "2009-11-05",
      "region": "中原区",                             # county / district
      "regionCode": "410102",
      "regionName": "",
      "registeredAddress": "",
      "remarks": "",                                 # remarks
      "scores": "75.0",
      "unicode": "52410100MJF72040XB",               # unified social credit code
      "updateTime": "2021-07-08 15:57:38",
      "updateUser": "cd83c18f52a847bdb3466dacebc80515"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-28 15:31:48.890",
  "spider_end_time": "2021-10-28 15:33:17",
  "task_params":
  {
    "province": "henan",
    "step": "start",
    "city": "郑州市",
    "index": 1
  },
  "metadata":
  {
    "province": "henan",
    "city": "郑州市",
    "index": 1
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.18"
}

Hubei:

{
  "data":
  [
    {
      "序号": "1",
      "行政区域": "天门市",
      "单位名称": "天门市岳口新兴猪场",
      "单位地址": "天门市岳口镇新兴垸村三组",
      "法定代表人": "罗远松",
      "当前得牌": "蓝标",
      "时间": "2020-01-19 13:35:33.0"
    },
    {
      "序号": "2",
      "行政区域": "天门市",
      "单位名称": "天门市新跃养殖场",
      "单位地址": "天门市岳口镇新兴垸村",
      "法定代表人": "罗良洲",
      "当前得牌": "蓝标",
      "时间": "2020-01-19 13:35:33.0"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-29 10:45:05.281",
  "spider_end_time": "2021-10-29 10:45:09",
  "task_params":
  {
    "province": "hubei",
    "step": "start",
    "index": 1
  },
  "metadata":
  {
    "province": "hubei",
    "index": 1
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.10"
}

Guangdong:

{
  "data":
  [
    {
      "序号": "1",
      "地区": "广州市",
      "组织机构代码/社会统一信用代码": "",
      "企业名称": "广东南方碱业股份有限公司",
      "年度评定结果":
      {
        "2019": "蓝牌",
        "2018": "蓝牌",
        "2017": "红牌"
      }
    },
    {
      "序号": "2",
      "地区": "清远市",
      "组织机构代码/社会统一信用代码": "",
      "企业名称": "清远市广业环保有限公司(源潭污水处理厂)",
      "年度评定结果":
      {
        "2019": "绿牌",
        "2018": "绿牌",
        "2017": "绿牌"
      }
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-11-08 14:35:56.383",
  "spider_end_time": "2021-11-08 14:36:01",
  "task_params":
  {
    "province": "guangdong",
    "step": "start",
    "year": "2007",
    "index": 6
  },
  "metadata":
  {
    "province": "guangdong",
    "year": "2007",
    "index": 6
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.6"
}
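Guangdong packs several years of ratings into the nested `年度评定结果` object. A minimal sketch of expanding one record into one row per year; `yearly_rows` and the output key names are hypothetical.

```python
def yearly_rows(record):
    """Expand one Guangdong record into one row per rated year, oldest first."""
    results = record.get("年度评定结果", {})
    return [
        {
            "ent_name": record.get("企业名称"),
            "region": record.get("地区"),
            "year": int(year),
            "grade": grade,
        }
        for year, grade in sorted(results.items())
    ]


record = {
    "序号": "1",
    "地区": "广州市",
    "企业名称": "广东南方碱业股份有限公司",
    "年度评定结果": {"2019": "蓝牌", "2018": "蓝牌", "2017": "红牌"},
}
for row in yearly_rows(record):
    print(row["year"], row["grade"])
```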

Guizhou:

{
  "data":
  [
    {
      "企业名称": "遵义市红花岗区口腔医院",
      "污染源地址": "遵义市红花岗区子尹路157号",
      "参评年度": "2015",
      "系统评分": "87",
      "系统评级结果": "B++"
    },
    {
      "企业名称": "贵州昊龙胜境建材有限责任公司",
      "污染源地址": "",
      "参评年度": "2015",
      "系统评分": "87",
      "系统评级结果": "B++"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-29 14:08:28.227",
  "spider_end_time": "2021-10-29 14:08:29",
  "task_params":
  {
    "province": "guizhou",
    "step": "start",
    "year": 2015,
    "index": 3
  },
  "metadata":
  {
    "province": "guizhou",
    "year": 2015,
    "index": 3
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.10"
}

Guangxi:

{
  "data":
  [
    {
      "统一社会信用代码": "91451281340369170J",
      "企业名称": "广西万屹工贸有限公司",
      "行业分类": "化工",
      "行政区划": "河池市 宜州市",
      "法人代表": "韦泽达",
      "是否参评": "已参评",
      "最近评价时间": "2021-07-12",
      "信用等级": "普通",
      "是否有效": "有效"
    },
    {
      "统一社会信用代码": "9145128168 51842278",
      "企业名称": "广西鑫华源水务有限公司",
      "行业分类": "污水处理",
      "行政区划": "河池市 宜州市",
      "法人代表": "霍军",
      "是否参评": "已参评",
      "最近评价时间": "2021-07-12",
      "信用等级": "守信",
      "是否有效": "有效"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-29 15:13:45.615",
  "spider_end_time": "2021-10-29 15:13:48",
  "task_params":
  {
    "province": "guangxi",
    "step": "start",
    "index": 4
  },
  "metadata":
  {
    "province": "guangxi",
    "index": 4
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.10"
}
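The Guangxi sample shows a credit code containing a stray space and a `行政区划` value that joins city and county with a space. A minimal clean-up sketch, assuming whitespace inside the code is never meaningful; `normalize_guangxi` and the output key names are hypothetical.

```python
def normalize_guangxi(record):
    """Strip whitespace from the credit code and split the region field."""
    # Assumption: any whitespace inside the code is scraping noise.
    code = "".join(record.get("统一社会信用代码", "").split())
    parts = record.get("行政区划", "").split()
    return {
        "credit_code": code,
        "city": parts[0] if parts else None,
        "county": parts[1] if len(parts) > 1 else None,
        "ent_name": record.get("企业名称"),
        "grade": record.get("信用等级"),
    }


record = {
    "统一社会信用代码": "9145128168 51842278",
    "企业名称": "广西鑫华源水务有限公司",
    "行政区划": "河池市 宜州市",
    "信用等级": "守信",
}
print(normalize_guangxi(record))
```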

Hebei:

{
  "data":
  [
    {
      "qyid": 16622,
      "qymc": "邯郸新兴发电有限责任公司",   # enterprise name
      "orgcode": "911304817840855238",  # enterprise code
      "qyxzname": "重点排污单位",
      "id": 16153,
      "stime": "1585908614273",
      "etime": null,
      "sfhmd": 0,
      "ljpf": 96,                 # score
      "qybz": 1,                  # enterprise class: 1 = class A, 2 = class B, 3 = class C, 4 = class D, 5 = class E
      "sfcp": null,
      "hstime": null,
      "hetime": null,
      "gstime": "1599548668767",  # update timestamp? Not confirmed that this is the right field
      "zq_stime": "1585908614273",
      "zq_etime": null,
      "sfww": null,
      "hmdtime": null,
      "qybzname": null,
      "ljpfs": null,
      "xqcount": 0
    },
    {
      "qyid": 2511,
      "qymc": "中普(邯郸)钢铁有限公司",
      "orgcode": "75025136X",
      "qyxzname": "重点排污单位",
      "id": 17534,
      "stime": null,
      "etime": null,
      "sfhmd": 0,
      "ljpf": 89,
      "qybz": 1,
      "sfcp": null,
      "hstime": null,
      "hetime": null,
      "gstime": "1599548496797",
      "zq_stime": null,
      "zq_etime": null,
      "sfww": null,
      "hmdtime": null,
      "qybzname": null,
      "ljpfs": null,
      "xqcount": 0
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-30 10:16:49.339",
  "spider_end_time": "2021-10-30 10:16:54.067",
  "task_params": {"province": "hebei"},
  "metadata": {"index": "36"},
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.38"
}
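The `qybz` code in Hebei records maps to an enterprise class letter, as noted in the field comments above. A minimal lookup sketch; `QYBZ_CLASS` and `enterprise_class` are hypothetical names.

```python
# Mapping taken from the qybz comment in the Hebei sample.
QYBZ_CLASS = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}


def enterprise_class(record):
    """Return the class letter for a Hebei record, or None if unknown."""
    return QYBZ_CLASS.get(record.get("qybz"))


print(enterprise_class({"qymc": "邯郸新兴发电有限责任公司", "qybz": 1}))  # A
```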

Liaoning:

{
  "data":
  [
    {
      "year": 2019, # year
      "enterpriseName": "大连金海润废物综合利用有限公司",  # enterprise name
      "city": "大连市",  # city
      "area": "金普新区", # county / district
      "enterpriseCode": "91210213MA0YT5EY7C", # unified social credit code
      "industry": "危险废物治理", # industry category
      "pollutionGroup": "重点排污单位", # pollution-source enterprise category
      "scoreNum": 11, # see the field reference
      "isStop": "0"   # see the field reference
    },
    {
      "year": 2019,
      "enterpriseName": "沈阳广宇供热有限公司总参热源厂",
      "city": "沈阳市",
      "area": "和平区",
      "enterpriseCode": "91210106738671337Y",
      "industry": "热力生产和供应",
      "pollutionGroup": "重点排污单位",
      "scoreNum": 11,
      "isStop": "0"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-11-01 15:12:39.342",
  "spider_end_time": "2021-11-01 15:12:41",
  "task_params":
  {
    "province": "liaoning",
    "step": "start",
    "year": 2019,
    "index": 1
  },
  "metadata":
  {
    "province": "liaoning",
    "year": 2019,
    "index": 1
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.10"
}

Field reference

Shandong:

{
  "data":
  [
    {
      "XH": "20211029230149891b2f3dcb6a41fbba72a2315794219a",
      "QYGXJGMC": "",
      "DF": 7,
      "YSBS": "#ffff33",
      "XTXH": "1635519677032004730880",
      "QYMC": "利津誉鑫新型建材有限责任公司",
      "QYBH": "8efdb9506bea47a9f78017a328a6b981",
      "QYDZ": "山东省东营市利津县经济开发区S316路南(利津力能热电对过)",
      "TYSHXYDM": "91370522MA3CFETM8A",
      "FRDB": "刘彬",
      "PJRQ": "2021-10-29",
      "DJFLMC": "黄色等级"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-10-30 16:26:25.305",
  "spider_end_time": "2021-10-30 16:26:29",
  "task_params": {"province": "shandong","company": "利津誉鑫新型建材有限责任公司"},
  "metadata": {"province": "shandong","company": "利津誉鑫新型建材有限责任公司"},
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.14"
}

Jilin:

{
  "data":
  [
    {
      "address": "吉林省吉林市永吉县岔路河镇一委",  # enterprise address
      "capital": "100",  # registered capital (10,000 CNY)
      "code": "91220221MA0Y3JCJ8A",  # social credit code / business licence number
      "id": "427",
      "lawer": "许笑维",  # legal representative or person in charge
      "name": "永吉县仁合供热有限公司",  # enterprise name
      "score": "3",  # total penalty points
      "wasid": "218230",
      "level": "blue"  # current environmental credit status, derived from score: blue for 1 <= score <= 6, yellow for 7 <= score <= 11, red for score >= 12
    },
    {
      "wasid": "218230",
      "id": "596",
      "code": "91220112310008964E",
      "name": "长春市泓利供热有限公司",
      "capital": "500",
      "address": "长春市双阳区山河街道泓利港湾小区东侧",
      "score": "2",
      "lawer": "李建",
      "level": "blue"
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-11-01 10:54:59.762",
  "spider_end_time": "2021-11-01 10:55:01",
  "task_params":
  {
    "province": "jilin",
    "step": "start",
    "level": "blue",
    "index": 3
  },
  "metadata":
  {
    "province": "jilin",
    "level": "blue",
    "index": 3
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.10"
}
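The Jilin `level` field follows from `score` according to the thresholds in the field comment above. A minimal sketch of that rule; `level_from_score` is a hypothetical helper, and returning `None` for scores below 1 is an assumption, since the documented rule does not cover them.

```python
def level_from_score(score):
    """Map a Jilin penalty score to its level label."""
    score = int(score)  # the samples carry score as a string, e.g. "3"
    if 1 <= score <= 6:
        return "blue"
    if 7 <= score <= 11:
        return "yellow"
    if score >= 12:
        return "red"
    return None  # scores below 1 are not covered by the documented rule


print(level_from_score("3"))  # blue
```

This can serve as a consistency check on scraped records by comparing the computed value with the `level` field returned by the site.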

Anhui:

{
  "data":
  [
    {
      "area_id": "繁昌县",  # area
      "b_id": "9297d507725146d08b7d5f9d8cf0f9c6",
      "com_id": "ebc07434060e4670827fd462df69ceeb",
      "com_name": "芜湖海螺水泥有限公司",  # enterprise name
      "cplx": "省级",  # evaluation type
      "id": "a36e410a3bfc4b0091682f815db93b41",
      "public_level": "环保诚信企业",  # annual rating result
      "row_number": 21,
      "year": 2016  # year
    },
    {
      "area_id": "经济技术开发区",
      "b_id": "8d0e5dd6986949d198cf69af39eec955",
      "com_id": "5a581012297445e5852afbb2d6a32afc",
      "com_name": "安徽楚江科技新材料股份有限公司",
      "cplx": "省级",
      "id": "15850528a44541c2813782b6dcbcaa0b",
      "public_level": "环保诚信企业",
      "row_number": 22,
      "year": 2016
    }
  ],
  "http_code": 200,
  "error_msg": "",
  "task_result": 1000,
  "data_type": "list",
  "spider_start_time": "2021-11-01 13:43:36.944",
  "spider_end_time": "2021-11-01 13:43:37",
  "task_params":
  {
    "province": "anhui",
    "step": "start",
    "year": 2016,
    "index": 3
  },
  "metadata":
  {
    "province": "anhui",
    "year": 2016,
    "index": 3
  },
  "spider_name": "environmental_protection",
  "spider_ip": "10.8.1.10"
}

Spider runtime environment

scrapy

Spider deployment info

target: node_51,
spider_name: environmental_protection
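10 processes

Taskhub address

Task submission URL: 
Code authoring URL: 

Taskhub scheduling rules

Spider monitoring metrics design

(Under observation for now; to be filled in.)
Index: 
Monitoring frequency: 
Monitoring start and end time: 
Alert condition: 
Alert group: 
Alert content: 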

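Data aggregation

Owner

Aggregation method

  • Spider writes directly to Kafka

  • Spider writes files that Logstash collects

Spider result directory

Storage path for collected files:
/data/gravel_spiders/environmental_protection

Directory after aggregation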

10.8.6.228:
/data2_227/grvael_spider_result/environmental_protection

Logstash config file name

Logstash file collection type

Aggregation topic

general-taxpayer

ES log index and filter conditions

gravel-spider-data-*

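Monitoring dashboard

Data retention policy


Data cleaning

Owner

Code location

Deployment location

Deployment method and notes

  • crontab + data_pump
  • supervisor + data_pump
  • supervisor + consumer

Data input source

Data storage table location

  • Database address:
  • Table name: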