Last edited by fanzx Dec 24, 2021

es

Writes data to ES (Elasticsearch).

Set the class parameter to es.EsDocWriter.

init parameters

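| Parameter | Default | Description |
| --- | --- | --- |
| index | | Name of the ES index |
| doc_type | | Document type |
| hosts | | ES hosts |
| http_auth | | ES HTTP authentication credentials, e.g. [user, password] |
| doc_id | None | Unique key (the ES _id) used to insert, update, and delete data. When None, ES generates an _id itself, e.g. {"_id": "Ru5QZXYBsqJa7tOBqcqR"}. When doc_id is not configured, every row is treated as an insert, i.e. action=index |
| action | update | One of the 4 operation types provided by the ES data update API |
| index_fields | | When defined, only the fields listed in index_fields are kept in the data written to ES |
| add_timestamp | True | Records the time of ingestion into ES by adding an "@timestamp" field to the data. Enabled by default |
| clear_null_field | False | When True, fields whose value is empty, None, NULL, etc. are removed from the data |
| update_on_create_fileds | | Fields that are set on creation only and never updated. Once set, their values do not change in later updates |
| append_array_fields | | Array-type fields. Values from later updates are appended to the end of the array. Only the last 10 values are kept |
| set_fields | | Array-type fields treated as sets. Values from later updates are added only if not already present, so the array holds no duplicates |
| timeout | 10 | Timeout in seconds |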
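
As a rough illustration of how index_fields, clear_null_field, and add_timestamp could act on a single row, here is a minimal sketch. The function name prepare_doc, the order of the steps, and the exact set of values counted as "empty" are assumptions, not the actual es.EsDocWriter code. Whether the doc_id field is exempt from index_fields filtering is not stated on this page, so the sketch does not special-case it.

```python
from datetime import datetime, timezone


def prepare_doc(row, index_fields=None, add_timestamp=True, clear_null_field=False):
    """Hypothetical pre-processing of one row before it is written to ES."""
    doc = dict(row)  # work on a copy so the caller's row is not modified
    if index_fields:
        # Keep only the fields listed in index_fields.
        doc = {k: v for k, v in doc.items() if k in index_fields}
    if clear_null_field:
        # Assumption: "empty" means None, "" or the literal string "NULL".
        doc = {k: v for k, v in doc.items() if v not in (None, "", "NULL")}
    if add_timestamp:
        # Assumption: the timestamp is the current UTC time in ISO 8601 form.
        doc["@timestamp"] = datetime.now(timezone.utc).isoformat()
    return doc
```

With the defaults, the row keeps all of its fields and gains "@timestamp"; passing add_timestamp=False and clear_null_field=True mirrors a stricter configuration.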

Sample configuration:

  es_test_nest_year:
    class: es.EsDocWriter
    init:
      hosts: es-cn-4591blu580004eavf.elasticsearch.aliyuncs.com:9200
      http_auth: [ '{user}', '{passwd}' ]
      index: test_nest
      doc_type: doc
      doc_id: company_name_digest
      add_timestamp: False
      timeout: 20
      set_fields: ['annual_report_years', 'no_ar_submitted']
      index_fields:
        - annual_report_years
        - no_ar_submitted
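
How the writer talks to ES is not shown on this page. One plausible mapping, sketched below, turns each row into an action dict of the shape accepted by elasticsearch.helpers.bulk() in the Python client. The function build_action is hypothetical; it only illustrates how doc_id, action, index, and doc_type from the configuration might end up in the request metadata.

```python
def build_action(row, index, doc_type=None, doc_id=None, action="update"):
    """Build one bulk action dict from a row (hypothetical mapping)."""
    meta = {"_index": index}
    if doc_type is not None:
        # Mapping types exist only in older ES versions.
        meta["_type"] = doc_type
    if doc_id is None:
        # No doc_id configured: treat the row as an insert and let ES
        # generate the _id, as described in the parameter table.
        meta["_op_type"] = "index"
        meta["_source"] = dict(row)
        return meta
    # doc_id names the field whose value becomes the ES _id.
    meta["_id"] = row[doc_id]
    meta["_op_type"] = action
    if action == "update":
        meta["doc"] = dict(row)  # partial document for an update
    elif action in ("index", "create"):
        meta["_source"] = dict(row)
    # A "delete" action carries no document body.
    return meta
```

For the sample configuration, build_action(row, "test_nest", "doc", "company_name_digest") yields an update action whose _id is the row's company_name_digest value.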

Working with set and array data in ES

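"sync_condition": {"operation": "remove"}
"sync_condition": {"operation": "add"}
The data itself specifies the operation on the set. Only two operations exist, remove and add (sync_condition may be omitted, in which case the default is add).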
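
The defaulting rule can be sketched as a small helper that separates the operation from the rest of the row. The name split_sync_condition is hypothetical, and stripping sync_condition from the payload is an assumption based on the ES results on this page, which never contain that key.

```python
def split_sync_condition(row):
    """Return (operation, payload) for one input row.

    The operation defaults to "add" when sync_condition is absent.
    """
    payload = dict(row)  # copy so the caller's row is untouched
    condition = payload.pop("sync_condition", None) or {}
    operation = condition.get("operation", "add")
    if operation not in ("add", "remove"):
        raise ValueError("unsupported operation: %r" % (operation,))
    return operation, payload
```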

1. With the following configuration:
      set_fields: ['annual_report_years', 'no_ar_submitted']
      index_fields:
        - annual_report_years
        - no_ar_submitted

Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": [2019]}
Input again: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": [2019]}
Note: set_fields ensures that the values in the no_ar_submitted field in ES are de-duplicated.

Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "annual_report_years": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "annual_report_years": [2019]}
Note: with set_fields, the annual_report_years field in ES holds a single 2019.

Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "annual_report_years": 2020}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "annual_report_years": [2019, 2020]}

Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019, "sync_condition": {"operation": "remove"}}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": []}
Note: when sync_condition.operation=remove, the element 2019 is removed from no_ar_submitted.
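
The set behaviour shown in these examples can be reproduced with a few lines of plain Python. This is only a model of the observed results, assuming the stored value is a list; the function name apply_set_update is hypothetical and the real update presumably runs inside ES.

```python
def apply_set_update(current, value, operation="add"):
    """Model of a set_fields update on one field.

    current is the list stored in ES (or None if the field is new).
    """
    values = list(current or [])
    if operation == "add":
        # Add only when the value is not present yet, so no duplicates arise.
        if value not in values:
            values.append(value)
    elif operation == "remove":
        values = [v for v in values if v != value]
    else:
        raise ValueError("unsupported operation: %r" % (operation,))
    return values


years = apply_set_update(None, 2019)    # [2019]
years = apply_set_update(years, 2019)   # still [2019]
years = apply_set_update(years, 2020)   # [2019, 2020]
```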

2. With the following configuration:
      append_array_fields: ['annual_report_years', 'no_ar_submitted']
      index_fields:
        - annual_report_years
        - no_ar_submitted

Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": [2019]}
Input again: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": [2019, 2019]}
Note: with append_array_fields, every input appends a new element to the no_ar_submitted field in ES. es.EsDocWriter limits append_array_fields to at most 10 elements; once the limit is exceeded, new elements displace the oldest ones.

Input: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": 2019, "sync_condition": {"operation": "remove"}}
ES result: {"company_name_digest": "0000789d52a0f30c27bb7c7a6e78557d", "no_ar_submitted": []}
Note: when sync_condition.operation=remove, every element equal to 2019 is removed from no_ar_submitted.
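
The append behaviour, including the 10-element cap, can be modelled the same way. Again this is a sketch of the observed results rather than the writer's code; apply_append_update and its max_len parameter are hypothetical names.

```python
def apply_append_update(current, value, operation="add", max_len=10):
    """Model of an append_array_fields update on one field."""
    values = list(current or [])
    if operation == "add":
        # Always append, duplicates included, then keep only the newest
        # max_len elements so the oldest ones drop off the front.
        values.append(value)
        values = values[-max_len:]
    elif operation == "remove":
        # Remove every element equal to value.
        values = [v for v in values if v != value]
    else:
        raise ValueError("unsupported operation: %r" % (operation,))
    return values


history = []
for n in range(1, 13):          # append 12 values: 1..12
    history = apply_append_update(history, n)
# history now holds the last 10 values, 3 through 12
```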
