Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
K
kb
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 2
    • Issues 2
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
  • Merge requests 0
    • Merge requests 0
  • Operations
    • Operations
    • Incidents
  • Analytics
    • Analytics
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Members
    • Members
  • Activity
  • Graph
  • Create a new issue
  • Commits
  • Issue Boards
Collapse sidebar
  • granite
  • kb
  • Wiki
    • Business_doc
  • get_update_data

get_update_data · Changes

Page history
update: 增量数据提取 authored Sep 28, 2021 by 李子健's avatar 李子健
Hide whitespace changes
Inline Side-by-side
Showing with 254 additions and 0 deletions
+254 -0
  • business_doc/get_update_data.md business_doc/get_update_data.md +254 -0
  • No files found.
business_doc/get_update_data.md 0 → 100644
View page @ 76f9e010
# 增量数据提取
## 方案流程
```plantuml
@startuml
file maxwell
queue topic_maxwell_test
database 监控的库表
file 原始数据消费组
file 处理数据消费组
file 原始数据落地
file 处理后数据落地
监控的库表 --> maxwell: 所有库表的binlog数据,统一采用producer为kafka
maxwell --> topic_maxwell_test: binlog数据统一入kafka的topic:maxwell_test
topic_maxwell_test --> 原始数据消费组: 对于binlog数据不做处理直接持久化到磁盘
topic_maxwell_test --> 处理数据消费组: 根据需求对数据进行清洗转换后持久化到指定目录
原始数据消费组 --> 原始数据落地: 持久化到10.8.6.227:/data4/data_sync/maxwell_binlog
处理数据消费组 --> 处理后数据落地: 持久化到10.8.6.227:/data/data_sync/lake_ic
@enduml
```
```
由maxwell监控业务需求的mysql库表,将binlog数据发送到kafka,以tablename的hash值作为partiti_key,通过同一个表的数据落在同一个分区中以保证binlog数据的时序性。由data_pump消费kafka数据,分两个方面,一方面对数据不做处理,将原始binlog文件持久化到磁盘做原始数据存储;一方面根据业务需求增加自定义udm模块对数据进行清洗转换,后持久化到指定目录。
```
## 监控表分布详情
| 服务器 | database | table | 数量 |
| -------------------------------------- | --------------- | ------------------------------- | ---- |
| **bdp-rds-001.mysql.rds.aliyuncs.com** | utn_ng_risk | risk_environment_punish | 12 |
| | | risk_tax_punish | 12 |
| | | risk_xianxiao | |
| | | risk_zhixing | |
| | | risk_zhongben | |
| | | risk_lawsuit | |
| | | risk_lawsuit_list | |
| | | risk_court_announcement | |
| | | risk_court_announcement_list | |
| | | risk_court_notice | |
| | | risk_court_notice_list | |
| | | company_own_tax_info | |
| **bdp-rds-005.mysql.rds.aliyuncs.com** | utn_ng_tm | sys_cat | 28 |
| | | ip_trademark_flow_list | 5 |
| | | ip_trademark_category_list | |
| | | ip_trademark_applicant_list | |
| | | ip_trademark | |
| | utn_ng_ip | ip_copyright_works_list | 7 |
| | | ip_copyright_works | |
| | | ip_copyright_software_list | |
| | | ip_copyright_software | |
| | | ip_patent_legal_status | |
| | | ip_patent | |
| | | ip_patent_applicant_list | |
| | utn_ng_emp | company_employment | 1 |
| | utn_stock | stock_financial_analysis | 11 |
| | | stock_executive_new | |
| | | stock_base_new | |
| | | stock_announcement | |
| | | stock_allotment | |
| | | stock_bonus | |
| | | stock_issue | |
| | | stock_holder | |
| | | tradable_stock_holder | |
| | | stock_holding | |
| | | stock_equity_change | |
| | utn_ng_qmp_info | qimingpian_history_rongzi | 4 |
| | | qimingpian_jingpin_map | |
| | | qimingpian_product | |
| | | qimingpian_team_member | |
| **bdp-rds-003.mysql.rds.aliyuncs.com** | utn_data | tb_icp_base_info | 13 |
| | utn_ng_bid | company_bid_main | 1 |
| | | company_bid_entity_list_new | 2 |
| | utn_ng_biz_info | tb_general_taxpayer | 6 |
| | | tax_credit | |
| | | land_result_announcement | |
| | | tb_certificate | |
| | | wechat_public_num | |
| | | company_logo | |
| | utn_risk | tb_judicial_sale | 3 |
| | | risk_shixin | |
| | | tb_executed_person | |
| | utn_ng_biz | company_customs_business_credit | 1 |
| **10.8.6.87** | utn_special | special_enterprise | 5 |
| | | special_foundation | 5 |
| | | special_law_office | |
| | | special_social_organ | |
| | | special_trade_union | |
| **bdp-rds-007.mysql.rds.aliyuncs.com** | utn_ic | tb_company_sme | 2 |
| | | tb_tag_company_qualifications | 2 |
| **bdp-rds-801.mysql.rds.aliyuncs.com** | bdp_company | company_contact_details | 1 |
| **bdp-rds-t02.mysql.rds.aliyuncs.com** | ic_ar | annual_report_asset | 37 |
| | | annual_report_base | 9 |
| | | annual_report_change | |
| | | annual_report_equity_change | |
| | | annual_report_guarantee | |
| | | annual_report_invest | |
| | | annual_report_partner | |
| | | annual_report_social_security | |
| | | annual_report_website | |
| | ic_base | company_base | 9 |
| | | company_branch | |
| | | company_change | |
| | | company_employee | |
| | | company_history_name | |
| | | company_industry | |
| | | company_partner | |
| | | company_partner_pay | |
| | | company_partner_realpay | |
| | ic_biz | company_abnormal | 19 |
| | | company_allow | |
| | | company_chattel | |
| | | company_chattel_change | |
| | | company_chattel_mortgage | |
| | | company_chattel_pawn | |
| | | company_check | |
| | | company_clear | |
| | | company_illegal | |
| | | company_intellectual | |
| | | company_intellectual_change | |
| | | company_justice | |
| | | company_justice_equity_change | |
| | | company_justice_freeze | |
| | | company_justice_invalid | |
| | | company_justice_keep_freeze | |
| | | company_justice_unfreeze | |
| | | company_pledge | |
| | | company_punish | |
## 配置文件分布
| 类型 | 文件名 | database | table数量 |
| ---------- | ---------------------- | --------------- | --------- |
| 原始数据 | | | |
| | maxwell_update_file_01 | | 19 |
| | | ic_biz | 19 |
| | maxwell_update_file_02 | | 14 |
| | | utn_ng_risk | 12 |
| | | utn_ic | 2 |
| | maxwell_update_file_03 | | 9 |
| | | ic_ar | 9 |
| | maxwell_update_file_04 | | 14 |
| | | ic_base | 9 |
| | | utn_ng_tm | 5 |
| | maxwell_update_file_05 | | 14 |
| | | utn_stock | 11 |
| | | utn_risk | 3 |
| | maxwell_update_file_06 | | 14 |
| | | utn_ng_ip | 7 |
| | | utn_ng_biz_info | 6 |
| | | bdp_company | 1 |
| | maxwell_update_file_07 | | 14 |
| | | utn_ng_emp | 1 |
| | | utn_ng_qmp_info | 4 |
| | | utn_data | 1 |
| | | utn_ng_bid | 2 |
| | | utn_ng_biz | 1 |
| | | utn_special | 5 |
| 处理的数据 | | | |
| | conv_binlog_file_01 | | 19 |
| | | ic_biz | 19 |
| | conv_binlog_file_02 | | 14 |
| | | utn_ng_risk | 12 |
| | | utn_ic | 2 |
| | conv_binlog_file_03 | | 9 |
| | | ic_ar | 9 |
| | conv_binlog_file_04 | | 9 |
| | | ic_base | 9 |
| | conv_binlog_file_05 | | 19 |
| | | utn_stock | 11 |
| | | utn_risk | 3 |
| | | utn_ng_tm | 5 |
| | conv_binlog_file_06 | | 14 |
| | | utn_ng_ip | 7 |
| | | utn_ng_biz_info | 6 |
| | | bdp_company | 1 |
| | conv_binlog_file_07 | | 14 |
| | | utn_ng_emp | 1 |
| | | utn_ng_qmp_info | 4 |
| | | utn_data | 1 |
| | | utn_ng_bid | 2 |
| | | utn_ng_biz | 1 |
| | | utn_special | 5 |
## kafka
```shell
topic:maxwell_test
consumer_group:
maxwell_conv_file_01
maxwell_conv_file_02
maxwell_conv_file_03
maxwell_conv_file_04
maxwell_conv_file_05
maxwell_conv_file_06
maxwell_conv_file_07
maxwell_update_file_01
maxwell_update_file_02
maxwell_update_file_03
maxwell_update_file_04
maxwell_update_file_05
maxwell_update_file_06
maxwell_update_file_07
```
## maxwell
* [maxwell](best_practice/maxwell)
```shell
服务器:10.8.6.74
目录/opt/maxwell/
配置文件/opt/maxwell/conf/
-rw-r--r-- 1 root root 12233 Sep 27 09:56 001_config.properties
-rw-r--r-- 1 root root 12281 Sep 27 09:56 003_config.properties
-rw-r--r-- 1 root root 12874 Sep 27 09:56 005_config.properties
-rw-r--r-- 1 root root 11821 Sep 27 09:56 007_config.properties
-rw-r--r-- 1 root root 11788 Sep 27 09:56 801_config.properties
-rw-r--r-- 1 root root 11929 Sep 27 09:56 87_config.properties
-rw-r--r-- 1 root root 11770 Sep 27 09:56 t02_config.properties
7个实例同时运行
/opt/maxwell/bin/maxwell --config /opt/maxwell/conf/001_config.properties --daemon
将监控到的binlog信息写入kafka的toupic:maxwell_test
```
## data_pump
```shell
服务器:10.8.6.47
app_maxwell:/home/collie/product
读取kafka转存原始binlog:
supervisor文件:/etc/supervisord.d/collie/maxwell_update_file.conf
7个进程读取kafka后转存到/data4_227/data_sync/maxwell_binlog
读取kafka转换数据格式转存:
supervisor文件:/etc/supervisord.d/collie/conv_binlog_file.conf
7个进程读取kafka后转存到/data4_227/data_sync/maxwell_data
```
## 数据存放目录
```shell
10.8.6.227
原始binlog:
/data4/data_sync/maxwell_binlog
转换数据格式后:
/data/data_sync/lake_ic
```
Clone repository
  • README
  • basic_guidelines
  • basic_guidelines
    • basic_guidelines
    • dev_guide
    • project_build
    • 开发流程
  • best_practice
  • best_practice
    • AlterTable
    • RDS
    • azkaban
    • create_table
    • design
    • elasticsearch
    • elasticsearch
      • ES运维
    • logstash
View All Pages