# Process Documentation

## Flowchart

```plantuml
@startuml

file "ic_crawler_bson_10.8.6.227" as bson_aliyun
file "ic_crawler_bson_10.8.6.84" as bson_office
queue "Kafka topic ic_spider_all" as ic_spider_all
database ic_ar
database ic_base
database ic_biz
file update_data_json
queue table_update_redis
file update_data_json_province

bson_aliyun --> bson_office: shared drive /data_227/
bson_office --> ic_spider_all: all data types, bson_reader, kafka_writer
ic_spider_all --> ic_ar: ar-related data_types, kafka_reader, sync_mysql_filter, redis_writer
ic_spider_all --> ic_base: base-related data_types, kafka_reader, sync_mysql_filter, redis_writer
ic_spider_all --> ic_biz: biz-related data_types, kafka_reader, sync_mysql_filter, redis_writer
ic_ar --> table_update_redis: on write, push primary keys of changed records to Redis
ic_base --> table_update_redis: on write, push primary keys of changed records to Redis
ic_biz --> table_update_redis: on write, push primary keys of changed records to Redis
table_update_redis --> update_data_json: redis_reader, udm_filter, file_writer
update_data_json --> update_data_json_province: split into per-province directories

@enduml
```
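The diagram fans the single ic_spider_all topic out to ic_ar, ic_base and ic_biz by data_type. The Python sketch below illustrates only that routing step. The data_type names and the `DATA_TYPE_TO_DB` mapping are hypothetical placeholders: the real data_type list lives in the data_pump configuration and is not given in this document.

```python
from collections import defaultdict

# Hypothetical data_type -> target database mapping (illustration only;
# the real data_types are defined in the data_pump configuration).
DATA_TYPE_TO_DB = {
    "ar_annual_report": "ic_ar",
    "base_company_info": "ic_base",
    "biz_change_record": "ic_biz",
}


def route(messages):
    """Group messages read from ic_spider_all by target database.

    Messages whose data_type is not in the mapping are dropped; this is
    one possible policy (an assumption, not documented behavior).
    """
    routed = defaultdict(list)
    for msg in messages:
        db = DATA_TYPE_TO_DB.get(msg.get("data_type"))
        if db is not None:
            routed[db].append(msg)
    return dict(routed)


if __name__ == "__main__":
    sample = [
        {"data_type": "base_company_info", "id": 1},
        {"data_type": "ar_annual_report", "id": 2},
        {"data_type": "unknown_type", "id": 3},
    ]
    for db, msgs in route(sample).items():
        print(db, [m["id"] for m in msgs])
```

An explicit mapping keeps the routing auditable; a prefix rule (for example everything starting with `ar`) would be shorter but is equally an assumption about how data_types are named.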

## Syncing crawler results to MySQL

```
1. Write all BSON files, read separately per data_type, to the same Kafka topic:

Deployed on: 10.8.6.84

data_pump config file:
/home/collie/product/app_online_lake/data_pump/new_online/all_spider_update_lake.yml

supervisor config (29 processes):
/home/collie/product/app_online_lake/supervisor/ic_spider_sync_lake.conf

2. Consume from Kafka, update MySQL and write to Redis:

Deployed on: 10.8.6.84

data_pump config file:
/home/collie/product/app_online_lake/data_pump/new_online/collie_all_spider_update_lake_kafka.yml

supervisor config (72 processes):
/home/collie/product/app_online_lake/supervisor/ic_spider_update_lake_kafka.conf
```
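Step 2 only notifies Redis about records that actually changed. The following is a minimal sketch of one way the sync_mysql_filter and redis_writer stages could behave, assuming: sqlite3 stands in for MySQL, a Python set stands in for the Redis key behind table_update_redis, and each Kafka message is a dict with hypothetical `table`, `id` and `data` fields. Comparing a serialized payload is an illustrative change test, not the documented one.

```python
import json
import sqlite3


def apply_message(conn, changed_keys, msg):
    """Upsert one message into its table; record the primary key if the row changed.

    Assumed message shape (hypothetical): {"table": str, "id": int, "data": dict}.
    The table name is interpolated into SQL, so it must come from a trusted whitelist.
    """
    table, pk = msg["table"], msg["id"]
    payload = json.dumps(msg["data"], sort_keys=True, ensure_ascii=False)
    row = conn.execute(f"SELECT payload FROM {table} WHERE id = ?", (pk,)).fetchone()
    if row is not None and row[0] == payload:
        return False  # unchanged: no write, no Redis notification
    conn.execute(
        f"INSERT OR REPLACE INTO {table} (id, payload) VALUES (?, ?)", (pk, payload)
    )
    conn.commit()
    changed_keys.add(f"{table}:{pk}")  # stand-in for something like a Redis SADD
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, payload TEXT)")
    keys = set()
    msg = {"table": "company", "id": 1, "data": {"name": "A"}}
    print(apply_message(conn, keys, msg), keys)  # first write: row is new, key recorded
    keys.clear()
    print(apply_message(conn, keys, msg), keys)  # same payload again: nothing recorded
```

Skipping unchanged rows keeps table_update_redis small, so the downstream export only re-reads records that really moved.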

| data_type | topic | consumer group | num_procs |
| --------- | ------------- | ---------------------- | --------- |
| all types | ic_spider_all | ic_spider_all_to_mysql | 72 |

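
The 72 processes in group ic_spider_all_to_mysql can only all be busy if ic_spider_all has at least 72 partitions: within one consumer group each partition is assigned to at most one consumer, so surplus consumers stay idle. The partition count of ic_spider_all is not recorded here, so the numbers below are assumptions. The sketch uses a simple round-robin assignment; real Kafka clients use configurable assignors (range, round-robin, sticky) that may place partitions differently, but the idle-consumer rule is the same.

```python
def assign_round_robin(num_partitions, num_consumers):
    """Spread partition ids over consumers; each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


def idle_consumers(num_partitions, num_consumers):
    """Number of consumers left without any partition."""
    return max(0, num_consumers - num_partitions)


if __name__ == "__main__":
    NUM_PROCS = 72  # num_procs from the table above
    for partitions in (48, 72, 144):  # hypothetical partition counts
        a = assign_round_robin(partitions, NUM_PROCS)
        busiest = max(len(ps) for ps in a.values())
        print(partitions, "partitions ->", idle_consumers(partitions, NUM_PROCS),
              "idle, max", busiest, "per consumer")
```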

## Fetching incremental data (read keys from Redis, query MySQL)

```
Deployed on: 10.8.6.84

data_pump config file:
/home/collie/product/app_online_lake/data_pump/new_online/collie_all_spider_update_lake_redis.yml

supervisor config (37 processes):
/home/collie/product/app_online_lake/supervisor/ic_spider_update_lake_redis.conf

redis: bdp-mq-001.redis.rds.aliyuncs.com
db: 1
```
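The incremental export reads the changed keys from Redis (db 1 on bdp-mq-001.redis.rds.aliyuncs.com), looks each record up in MySQL, converts it with udm_filter and writes it out per province. A minimal sketch, assuming: a list of `"table:id"` strings stands in for the keys read from Redis, sqlite3 stands in for MySQL, each row has a `province` column, `to_udm()` is a hypothetical placeholder for udm_filter, and the output is JSON Lines in a file named `update_data.json` under one directory per province. The diagram's separate update_data_json and update_data_json_province steps are collapsed into one here.

```python
import json
import os
import sqlite3
import tempfile


def to_udm(row):
    """Hypothetical placeholder for udm_filter: turn a DB row into an output record."""
    pk, province, name = row
    return {"id": pk, "province": province, "name": name}


def export_changed(conn, changed_keys, out_dir):
    """Look up each "table:id" key and append the record to <out_dir>/<province>/update_data.json.

    Returns the number of records written; keys whose row no longer exists are skipped.
    Table names are interpolated into SQL, so they must come from a trusted source.
    """
    written = 0
    for key in changed_keys:
        table, pk = key.split(":", 1)
        row = conn.execute(
            f"SELECT id, province, name FROM {table} WHERE id = ?", (int(pk),)
        ).fetchone()
        if row is None:
            continue  # row deleted after its key was queued
        record = to_udm(row)
        province_dir = os.path.join(out_dir, record["province"])
        os.makedirs(province_dir, exist_ok=True)
        path = os.path.join(province_dir, "update_data.json")
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, province TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO company VALUES (?, ?, ?)", [(1, "beijing", "A"), (2, "shanghai", "B")]
    )
    with tempfile.TemporaryDirectory() as out_dir:
        n = export_changed(conn, ["company:1", "company:2", "company:3"], out_dir)
        print(n, sorted(os.listdir(out_dir)))
```

Re-querying MySQL by key, rather than carrying full rows through Redis, means the export always reflects the latest committed state of each record.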