... | ... | @@ -216,17 +216,19 @@ task_result=8000 # parameter error |
|
|
|
|
|
- [ ] Crawler writes directly to Kafka
|
|
|
|
|
|
- [ ] Crawler writes files, collected by Logstash
|
|
|
- [ x ] Crawler writes files, collected by Logstash
|
|
|
|
|
|
## Crawler result directory
|
|
|
```html
|
|
|
Storage path for collected files:
|
|
|
/data/gravel_spiders/equity_penetration_qcc
|
|
|
/data/gravel_spiders/equity_penetration_qcc_login
|
|
|
```
|
|
|
|
|
|
## Storage directory after aggregation
|
|
|
```html
|
|
|
/data2_227/grvael_spider_result/equity_penetration_qcc
|
|
|
/data2_227/grvael_spider_result/equity_penetration_qcc_login
|
|
|
```
|
|
|
|
|
|
## Logstash configuration file name
|
... | ... | @@ -236,11 +238,12 @@ task_result=8000 # parameter error |
|
|
## Logstash file collection type
|
|
|
```html
|
|
|
equity_penetration_qcc
|
|
|
equity_penetration_qcc_login
|
|
|
```
|
|
|
|
|
|
## Topic for data aggregation
|
|
|
```html
|
|
|
general-taxpayer
|
|
|
qcc_spider
|
|
|
```
|
|
|
|
|
|
## ES log index and filter conditions
|
... | ... | @@ -256,6 +259,11 @@ gravel-spider-data-* |
|
|
|
|
|
# **Data cleaning**
|
|
|
|
|
|
## Consumer group to use when cleaning data directly from the topic
|
|
|
```
|
|
|
qcc_spider_etl
|
|
|
```
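
As a rough illustration of how the group above might be wired into a cleaning job, here is a minimal sketch that only assembles consumer settings. The topic `qcc_spider` and group `qcc_spider_etl` come from this page; the helper name `build_consumer_settings`, the broker address, and the offset and commit choices are assumptions made for the example, not the project's actual configuration.

```python
# Sketch only: the helper name and broker address below are hypothetical.
TOPIC = "qcc_spider"         # topic listed on this page for data aggregation
GROUP_ID = "qcc_spider_etl"  # consumer group listed on this page for cleaning


def build_consumer_settings(bootstrap_servers: str) -> dict:
    """Return consumer settings using librdkafka-style configuration keys."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": GROUP_ID,
        # Assumption: start from the oldest retained message on first run.
        "auto.offset.reset": "earliest",
        # Assumption: commit offsets manually, after a record has been cleaned.
        "enable.auto.commit": False,
    }


if __name__ == "__main__":
    settings = build_consumer_settings("localhost:9092")  # placeholder broker
    print(TOPIC, settings["group.id"])
```

A dictionary in this shape could be passed to a librdkafka-based client; keeping the group name in one constant helps every cleaning job share the same committed offsets.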
|
|
|
|
|
|
## Owner
|
|
|
|
|
|
|
... | ... | |