... | ... | @@ -1670,26 +1670,46 @@ ec-spider-others-data -> 12 partitions |
# **Data Cleaning**

## Owner

```buildoutcfg
李子健
```

## Code Location

```buildoutcfg
http://tech.pingansec.com/granite/project-ec/app_conv_ebusiness/udms/conv_pdd_kw_goods/__init__.py
```

## Deployment Location

<!-- Machine and production code path -->

```buildoutcfg
Deployment machine: 10.8.6.47
Deployment directory: /home/collie/product/app_conv_ebusiness
```

## Deployment Method and Notes

<!-- How to run and the run command, supervisor configuration, supervisor program, etc. -->

- [ ] crontab + data_pump
- [ ] supervisor + data_pump
- [X] supervisor + data_pump
- [ ] supervisor + consumer

```buildoutcfg
Config file: /etc/supervisord.d/collie/pdd_data_kafka_to_mysql.conf
Process count: 12
Deployment machine: 10.8.6.47
group: pdd_find_goods
```
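
The process count of 12 matches the 12 partitions noted for `ec-spider-others-data` in the hunk header. Assuming all 12 supervisor processes join the same Kafka consumer group, each partition is read by at most one process, so 12 processes give one partition each and any further processes would sit idle. The sketch below illustrates that arithmetic with a simple round-robin mapping. `assign_partitions` and `idle_consumers` are hypothetical helper names, and this is not Kafka's actual assignor.

```python
def assign_partitions(num_partitions, num_consumers):
    """Spread partition ids over consumers round-robin (illustrative only)."""
    if num_consumers <= 0:
        raise ValueError("num_consumers must be positive")
    assignment = {consumer: [] for consumer in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


def idle_consumers(assignment):
    """Return the consumers that received no partition."""
    return [consumer for consumer, parts in assignment.items() if not parts]


if __name__ == "__main__":
    # 12 partitions, 12 processes: one partition per process.
    print(assign_partitions(num_partitions=12, num_consumers=12))
    # 16 processes would leave 4 of them without work.
    print(idle_consumers(assign_partitions(num_partitions=12, num_consumers=16)))
```

Under this model, raising the process count above 12 would add no throughput unless the topic's partition count is raised as well.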

## Data Source

<!-- Does the data come from Kafka or from collected files? Which topic and consumer group? -->

```buildoutcfg
kafka:
topic: ec-spider-others-data
consumer_group: pinduoduo_goods_search_etl
```
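
As a rough illustration of how a process might attach to this topic and group, the sketch below builds consumer settings and decodes one message. Only the topic and consumer group come from this document. The kafka-python client, the broker address passed in by the caller, the offset and commit settings, and the UTF-8 JSON payload format are all assumptions, and the helper names are hypothetical.

```python
import json

TOPIC = "ec-spider-others-data"            # from this document
GROUP_ID = "pinduoduo_goods_search_etl"    # from this document


def consumer_settings(bootstrap_servers):
    """Keyword arguments for kafka-python's KafkaConsumer (values are assumptions)."""
    return {
        "bootstrap_servers": bootstrap_servers,
        "group_id": GROUP_ID,
        "enable_auto_commit": False,       # assumption: commit only after the write succeeds
        "auto_offset_reset": "earliest",   # assumption: start from the oldest offset
    }


def decode_message(raw):
    """Decode one message value, assuming UTF-8 JSON; return None if malformed."""
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None
    return payload if isinstance(payload, dict) else None


def open_consumer(bootstrap_servers):
    """Create the consumer; requires the kafka-python package (assumed dependency)."""
    from kafka import KafkaConsumer
    return KafkaConsumer(TOPIC, **consumer_settings(bootstrap_servers))
```

Returning `None` for malformed payloads lets the caller skip or log bad records instead of crashing the process, which matters when supervisor would otherwise restart it in a loop.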

## Storage Table Location

* Database host:
* Table name:
\ No newline at end of file
* Database host: bdp-ec.rwlb.rds.aliyuncs.com
* Database name: bdp_pdd
* Table name: pdd_found_goods
\ No newline at end of file
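
A minimal sketch of how a cleaned record might be written to `bdp_pdd.pdd_found_goods` follows. The table schema is not given here, so the column names (`goods_id`, `goods_name`) and the choice of `goods_id` as the unique key are hypothetical. The `%s` placeholders assume a MySQL driver such as PyMySQL.

```python
TABLE = "bdp_pdd.pdd_found_goods"  # database and table names from this document


def build_upsert(row, key_columns=("goods_id",)):
    """Build a parameterized MySQL upsert for one row.

    Column names must come from a fixed, trusted list, since they are
    interpolated into the SQL text; only the values are parameterized.
    """
    if not row:
        raise ValueError("row must contain at least one column")
    columns = sorted(row)
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(
        f"`{c}` = VALUES(`{c}`)" for c in columns if c not in key_columns
    )
    sql = f"INSERT INTO {TABLE} ({col_list}) VALUES ({placeholders})"
    if updates:
        sql += f" ON DUPLICATE KEY UPDATE {updates}"
    return sql, tuple(row[c] for c in columns)


if __name__ == "__main__":
    sql, params = build_upsert({"goods_id": 1, "goods_name": "example"})
    print(sql)
    print(params)
```

An upsert keeps the write idempotent, so a Kafka message that is redelivered after a restart updates the existing row rather than failing or duplicating it.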