• Integrating Dremio with lakeFS


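    The integration relies on the S3-compatible API that lakeFS exposes, so there is little technical difficulty. Because lakeFS also supports Git-like versioning, it gives us a better way to manage data.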

    Environment setup

    • docker-compose
    version: '3'
    services:
      lakefs:
        image: "treeverse/lakefs:${VERSION:-latest}"
        ports:
          - "8000:8000"
        depends_on:
          - "postgres"
        environment:
          - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=${LAKEFS_AUTH_ENCRYPT_SECRET_KEY:-some random secret string}
          - LAKEFS_DATABASE_CONNECTION_STRING=${LAKEFS_DATABASE_CONNECTION_STRING:-postgres://lakefs:lakefs@postgres/postgres?sslmode=disable}
          - LAKEFS_BLOCKSTORE_TYPE=${LAKEFS_BLOCKSTORE_TYPE:-local}
          - LAKEFS_BLOCKSTORE_LOCAL_PATH=${LAKEFS_BLOCKSTORE_LOCAL_PATH:-/home/lakefs}
          - LAKEFS_GATEWAYS_S3_DOMAIN_NAME=${LAKEFS_GATEWAYS_S3_DOMAIN_NAME:-s3.local.lakefs.io:8000}
          - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-}
          - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=${AWS_SECRET_ACCESS_KEY:-}
          - LAKEFS_LOGGING_LEVEL=${LAKEFS_LOGGING_LEVEL:-INFO}
          - LAKEFS_STATS_ENABLED
          - LAKEFS_BLOCKSTORE_S3_ENDPOINT
          - LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE
          - LAKEFS_COMMITTED_LOCAL_CACHE_DIR=${LAKEFS_COMMITTED_LOCAL_CACHE_DIR:-/home/lakefs/.local_tier}
        entrypoint: ["/app/wait-for", "postgres:5432", "--", "/app/lakefs", "run"]
      postgres:
        image: "postgres:${PG_VERSION:-11}"
        command: "-c log_min_messages=FATAL"
        environment:
          POSTGRES_USER: lakefs
          POSTGRES_PASSWORD: lakefs
        logging:
          driver: none
      dremio:
        image: dremio/dremio-oss:20.0.0
        ports:
        - "9047:9047"
        - "31010:31010"

    Usage

    • Start the services
    docker-compose up -d 
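
After docker-compose up -d returns, the containers may still be initializing. The following is a minimal sketch, assuming the port mappings from the compose file above (8000 for lakeFS, 9047 for the Dremio UI). The helper name wait_for_port is hypothetical. It polls until a TCP port accepts connections, which shows only that something is listening, not that the service has finished starting.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper: an open port only means a process is listening,
    not that lakeFS or Dremio has completed its startup.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("127.0.0.1", 8000) and wait_for_port("127.0.0.1", 9047) could be called before opening the lakeFS and Dremio web UIs.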

    A demo project (a lakeFS repository) is used in the steps below.

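    • Connecting
      Because lakeFS exposes a standard S3 API, we can use the client tools provided by MinIO and AWS.
      With mc, where xxxx and xxxxxxxx stand for the lakeFS access key ID and secret access key (newer mc releases use mc alias set in place of mc config host add, with the same arguments):

    mc config host add lakefs http://127.0.0.1:8000 xxxx xxxxxxxx
    mc ls lakefs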

    To connect Dremio, add an S3 source. A few connection properties need attention:

    fs.s3a.path.style.access  true
    fs.s3a.endpoint http://lakefs:8000
    fs.s3a.connection.ssl.enabled false
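
The three properties above depend only on the lakeFS endpoint, so they can be derived from it. The following is a minimal sketch with a hypothetical helper named dremio_s3_properties, which is not part of Dremio or lakeFS. It assumes the endpoint is given with its scheme, as in the list above, and that SSL should be enabled exactly when that scheme is https.

```python
from urllib.parse import urlparse


def dremio_s3_properties(endpoint):
    """Build the S3 source connection properties for a lakeFS endpoint.

    Hypothetical helper: the property names come from the list above;
    the function itself is only an illustration.
    """
    scheme = urlparse(endpoint).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"endpoint must start with http:// or https://: {endpoint!r}")
    return {
        # Path-style access puts the bucket (repository) in the URL path
        # rather than in the host name.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.endpoint": endpoint,
        # Assumption: use TLS only when the endpoint scheme is https.
        "fs.s3a.connection.ssl.enabled": "true" if scheme == "https" else "false",
    }
```

Calling dremio_s3_properties("http://lakefs:8000") reproduces the three values listed above. The host name lakefs resolves only inside the compose network, which is where the Dremio container runs.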

    • Result

    lakeFS exposes each project (repository) as an S3 bucket, with each branch appearing as a folder inside it.
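
The bucket and folder layout can be illustrated with a small path-mapping sketch. The helper name lakefs_s3_uri and the example names (demo, main, data/users.parquet) are hypothetical and used only to show how a repository, a branch, and an object path combine into one S3-style location.

```python
def lakefs_s3_uri(repository, branch, path=""):
    """Map a lakeFS repository/branch/object path to an S3-style URI.

    Hypothetical helper illustrating the layout described above:
    repository -> bucket, branch -> top-level folder,
    object path -> remainder of the key.
    """
    if not repository or not branch:
        raise ValueError("repository and branch are required")
    # Join branch and object path, skipping an empty path.
    key = "/".join(part for part in (branch, path.lstrip("/")) if part)
    return f"s3://{repository}/{key}"
```

Under these assumptions, lakefs_s3_uri("demo", "main", "data/users.parquet") gives s3://demo/main/data/users.parquet. Pointing at a different branch changes only the first folder of the key.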

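    Notes

    The integration itself is not difficult, but building on lakeFS's strong version-management capabilities, we can conveniently manage versions of our data.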

    References

    https://docs.lakefs.io/quickstart/installing.html
    https://docs.dremio.com/data-sources/s3/
    https://github.com/rongfengliang/lakefs-dremio

  • Original article: https://www.cnblogs.com/rongfengliang/p/15874553.html