... | ... | @@ -24,8 +24,6 @@ social_organ |
Official website (PC) entry point:
https://datasearch.chinanpo.gov.cn/gsxt/newList
Path for collected files:
/data/enterprise_spider_data/enterprise_captcha_spider
Log file path:
/data/gravel_spiders/social_organ
```
... | ... | @@ -64,7 +62,7 @@ db_password: |
## Owners
```buildoutcfg
郭本江
蒋家升
```

## Spider name
... | ... | @@ -247,17 +245,19 @@ scrapy |
project: social_organ_spiders,
spider: social_organ
Spider host: 10.8.6.51
Process count: 15
Process count: 5
```

## Taskhub address
```buildoutcfg
Task submission URL: http://10.8.6.222:18518/task/
Code authoring URL:
```
## Taskhub
### Task submission
> Task submission URL: http://10.8.6.222:18518/task/

> Task submission example: `curl -L -X POST 'http://10.8.6.222:8526/task/' -H 'Content-Type: application/json' --data-raw '{"spider_name": "social_organ","company_name": "北京市东城区混沌创新学校","credit_no": "52110101400789098K"}'`<br>
This is equivalent to the task_params with `"spider_name": "social_organ"` added.
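
The same submission could be scripted from Python. The block below is a minimal sketch, not the project's own client: the helper names `build_task_payload` and `build_request` are hypothetical, the endpoint is copied from the curl example (which uses port 8526, while the submission URL listed above uses 18518, so confirm which one is live), and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: endpoint taken from the curl example; verify the port before use.
TASKHUB_URL = "http://10.8.6.222:8526/task/"


def build_task_payload(task_params: dict, spider_name: str = "social_organ") -> bytes:
    """Return the JSON body: the task_params with spider_name added."""
    payload = {**task_params, "spider_name": spider_name}
    # ensure_ascii=False keeps Chinese company names readable in the body.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def build_request(task_params: dict, url: str = TASKHUB_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for a single task."""
    return urllib.request.Request(
        url,
        data=build_task_payload(task_params),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request({
        "company_name": "北京市东城区混沌创新学校",
        "credit_no": "52110101400789098K",
    })
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit: urllib.request.urlopen(req, timeout=10)
```

Serializing with `json.dumps` avoids hand-written JSON mistakes such as trailing commas, which strict parsers reject.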
## Taskhub scheduling rules
## Taskhub retry scheduling rules
```buildoutcfg
task_result=1000 # Detail retrieved successfully
task_result=1101 # No result information
... | ... | |