Process documentation
Flow diagram
Syncing spider results to the MySQL database
1. All BSON files, read per data_type, are written to the same Kafka topic:
Deployment host: 10.8.6.84
data_pump config file:
/home/collie/product/app_online_lake/data_pump/new_online/all_spider_update_lake.yml
supervisor config (29 processes):
/home/collie/product/app_online_lake/supervisor/ic_spider_sync_lake.conf
2. Consume from Kafka, update MySQL, and write to Redis:
Deployment host: 10.8.6.84
data_pump config file:
/home/collie/product/app_online_lake/data_pump/new_online/collie_all_spider_update_lake_kafka.yml
supervisor config (72 processes):
/home/collie/product/app_online_lake/supervisor/ic_spider_update_lake_kafka.conf
data_type | topic | group | num_procs
all types | ic_spider_all | ic_spider_all_to_mysql | 72
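A rough sketch of what each of the 72 consumer processes in group ic_spider_all_to_mysql does, assuming an upsert-style MySQL write. All table and column names are illustrative; the real record-to-table mapping is defined in collie_all_spider_update_lake_kafka.yml.

```python
from typing import Any, Dict, Tuple

def upsert_sql(table: str, row: Dict[str, Any]) -> Tuple[str, tuple]:
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE statement for one record."""
    cols = list(row)
    col_list = ", ".join(cols)
    placeholders = ", ".join(["%s"] * len(cols))
    updates = ", ".join(f"{c} = VALUES({c})" for c in cols)
    sql = (f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
           f"ON DUPLICATE KEY UPDATE {updates}")
    return sql, tuple(row.values())

def run_consumer(bootstrap: str = "10.8.6.84:9092") -> None:
    """Hypothetical consumer loop; requires kafka-python, pymysql, redis, and
    pymongo's bson package. Connection details other than the topic, group,
    and Redis host/db are assumptions."""
    import bson
    import pymysql
    import redis
    from kafka import KafkaConsumer

    consumer = KafkaConsumer("ic_spider_all",
                             group_id="ic_spider_all_to_mysql",
                             bootstrap_servers=bootstrap)
    r = redis.Redis(host="bdp-mq-001.redis.rds.aliyuncs.com", db=1)
    conn = pymysql.connect(host="127.0.0.1", user="collie",
                           password="...", database="lake")  # placeholders
    for msg in consumer:
        row = bson.decode(msg.value)          # one spider record per message
        sql, params = upsert_sql("company", row)
        with conn.cursor() as cur:
            cur.execute(sql, params)
        conn.commit()
        # Record the updated key in Redis so the incremental reader (next
        # section) can pick it up; the queue key name is an assumption.
        r.rpush("ic_spider_update_queue", str(row.get("id")))
```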
Incremental data: read IDs from Redis, then query MySQL to fetch them
Deployment host: 10.8.6.84
data_pump config file:
/home/collie/product/app_online_lake/data_pump/new_online/collie_all_spider_update_lake_redis.yml
supervisor config (37 processes):
/home/collie/product/app_online_lake/supervisor/ic_spider_update_lake_redis.conf
redis: bdp-mq-001.redis.rds.aliyuncs.com
db: 1
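The incremental path above (read Redis, then query MySQL) can be sketched like this. The Redis host and db come from this section; the queue key, table, column names, and MySQL credentials are assumptions for illustration only.

```python
from typing import Iterable, Iterator, List, Tuple

def chunked(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Split incremental IDs into batches so each MySQL IN (...) query stays small."""
    batch: List[str] = []
    for i in ids:
        batch.append(i)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def select_sql(table: str, key_col: str, batch: List[str]) -> Tuple[str, tuple]:
    """Build one parameterized SELECT for a batch of IDs."""
    placeholders = ", ".join(["%s"] * len(batch))
    return f"SELECT * FROM {table} WHERE {key_col} IN ({placeholders})", tuple(batch)

def fetch_incremental(queue_key: str = "ic_spider_update_queue") -> None:
    """Hypothetical loop; requires redis and pymysql packages."""
    import pymysql
    import redis

    r = redis.Redis(host="bdp-mq-001.redis.rds.aliyuncs.com", db=1)
    # LPOP with a count argument needs Redis >= 6.2 / redis-py >= 4.0;
    # otherwise pop in a loop.
    raw = r.lpop(queue_key, 1000) or []
    ids = [b.decode() for b in raw]
    conn = pymysql.connect(host="127.0.0.1", user="collie",
                           password="...", database="lake")  # placeholders
    with conn.cursor() as cur:
        for batch in chunked(ids):
            sql, params = select_sql("company", "company_id", batch)
            cur.execute(sql, params)
            rows = cur.fetchall()
            # hand rows to downstream processing here
```

The 37 supervisor processes would each run a loop like `fetch_incremental`, draining the shared Redis queue concurrently.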