Last edited by 宋志鹏 Apr 29, 2020
This is an old version of this page.

file

File input: file

Reads documents from files.

Set the class parameter to file.FileDocReader.

Example:

company_name:  # name (user-defined)
    class: file.FileDocReader
    init:
      path: "hdfs://hdp-nn-001:8020/user/data/digest_company_name/"
      formater: company_name_digest
      pattern: "*.gz"
      offset:
        path: "/home/collie/product/offset_store"
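  • path: File path. HDFS, FTP, and local files are supported.
  • formater: The formatting processor applied to the data that is read.
  • pattern: File-name matching pattern. For example, *.py matches files ending in .py, and *.gz matches gzip-compressed .gz files.
  • offset: The offset at which reading starts. When reading local files, set offset to True; the offset file is then stored under the directory given by path (in init). When reading files on a cluster, specify the directory in which the offset file is stored, as in the example. When reading a single file, you must specify the directory for the offset file. The offset file is named '.{offset}.db'.format(offset=name), so in the example the offset file path is /home/collie/product/offset_store/.company_name.db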
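
The pattern and offset rules above can be sketched in Python. This is only an illustration, not the reader's actual implementation: it assumes that pattern uses shell-style wildcards (handled here with the standard fnmatch module), and the helper names matches_pattern and offset_file_path, as well as the sample file names, are hypothetical. The offset naming expression is the one given in the description of offset.

```python
import fnmatch
import posixpath


def matches_pattern(filename, pattern):
    """Return True if filename matches a shell-style pattern such as '*.gz'.

    Assumption: the reader's pattern option behaves like shell globbing.
    """
    return fnmatch.fnmatchcase(filename, pattern)


def offset_file_path(offset_dir, name):
    """Build the offset file path using the documented rule '.{offset}.db'."""
    return posixpath.join(offset_dir, ".{offset}.db".format(offset=name))


# Hypothetical directory listing, filtered the way pattern: "*.gz" would be.
files = ["part-0001.gz", "part-0002.gz", "_SUCCESS", "notes.py"]
selected = [f for f in files if matches_pattern(f, "*.gz")]
print(selected)

# Offset file for the reader named company_name in the example configuration.
print(offset_file_path("/home/collie/product/offset_store", "company_name"))
```

With the values from the example configuration, the second call yields the offset file path stated above, /home/collie/product/offset_store/.company_name.db.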