• dremio 的job 分析参考示例


    dremio 官方专门提供了一个章节,讲解如何进行分析,一些是一个总结,以及学习

    预备

    下载profiles,以及上传到dremio 中(个人默认space中)因为profiles 默认是json 格式的,我们直接使用dremio 查询分析数据

    需要关注的内容:

    sql 查询:sql 查询是否符合预期
    query 源: 数据是否在需要的数据源中
    计划时间: 时间是否符合预期
    非默认参数:查看是否有非模式参数使用了
    计划数据: 查看是否使用了rows ,row count 会造成比较高昂的广播,构建,构建会将数据加载到内存中
    加速数据:包含了反射的选择,系统活跃情况
    我们需要重点关注的是线程状态,通过此了解系统的等待,阻塞情况
    错误处理:dremio nodes 情况,oom
    性能考虑: 计划时间以及执行实现,线程数,row count 以及row,阻塞以及睡眠,准备以及等待时间

    operator_type 数据准备

    • 准备操作类型映射
      operator_type.json 内容如下
     
    {"NAME":"SINGLE_SENDER","ID":0}
    {"NAME":"BROADCAST_SENDER","ID":1}
    {"NAME":"FILTER","ID":2}
    {"NAME":"HASH_AGGREGATE","ID":3}
    {"NAME":"HASH_JOIN","ID":4}
    {"NAME":"MERGE_JOIN","ID":5}
    {"NAME":"HASH_PARTITION_SENDER","ID":6}
    {"NAME":"LIMIT","ID":7}
    {"NAME":"MERGING_RECEIVER","ID":8}
    {"NAME":"ORDERED_PARTITION_SENDER","ID":9}
    {"NAME":"PROJECT","ID":10}
    {"NAME":"UNORDERED_RECEIVER","ID":11}
    {"NAME":"RANGE_SENDER","ID":12}
    {"NAME":"SCREEN","ID":13}
    {"NAME":"SELECTION_VECTOR_REMOVER","ID":14}
    {"NAME":"STREAMING_AGGREGATE","ID":15}
    {"NAME":"TOP_N_SORT","ID":16}
    {"NAME":"EXTERNAL_SORT","ID":17}
    {"NAME":"TRACE","ID":18}
    {"NAME":"UNION","ID":19}
    {"NAME":"OLD_SORT","ID":20}
    {"NAME":"PARQUET_ROW_GROUP_SCAN","ID":21}
    {"NAME":"HIVE_SUB_SCAN","ID":22}
    {"NAME":"SYSTEM_TABLE_SCAN","ID":23}
    {"NAME":"MOCK_SUB_SCAN","ID":24}
    {"NAME":"PARQUET_WRITER","ID":25}
    {"NAME":"DIRECT_SUB_SCAN","ID":26}
    {"NAME":"TEXT_WRITER","ID":27}
    {"NAME":"TEXT_SUB_SCAN","ID":28}
    {"NAME":"JSON_SUB_SCAN","ID":29}
    {"NAME":"INFO_SCHEMA_SUB_SCAN","ID":30}
    {"NAME":"COMPLEX_TO_JSON","ID":31}
    {"NAME":"PRODUCER_CONSUMER","ID":32}
    {"NAME":"HBASE_SUB_SCAN","ID":33}
    {"NAME":"WINDOW","ID":34}
    {"NAME":"NESTED_LOOP_JOIN","ID":35}
    {"NAME":"AVRO_SUB_SCAN","ID":36}
    {"NAME":"MONGO_SUB_SCAN","ID":37}
    {"NAME":"ELASTICSEARCH_SUB_SCAN","ID":38}
    {"NAME":"ELASTICSEARCH_AGGREGATOR_SUB_SCAN","ID":39}
    {"NAME":"FLATTEN","ID":40}
    {"NAME":"EXCEL_SUB_SCAN","ID":41}
    {"NAME":"ARROW_SUB_SCAN","ID":42}
    {"NAME":"ARROW_WRITER","ID":43}
    {"NAME":"JSON_WRITER","ID":44}
    {"NAME":"VALUES_READER","ID":45}
    {"NAME":"CONVERT_FROM_JSON","ID":46}
    {"NAME":"JDBC_SUB_SCAN","ID":47}
    {"NAME":"DICTIONARY_LOOKUP","ID":48}
    {"NAME":"WRITER_COMMITTER","ID":49}
    {"NAME":"ROUND_ROBIN_SENDER","ID":50}

    效果

    参考查询

    • 查询行数分析
    SELECT majorFragmentId, operatorId, op_type.name, sum(records) records
    -- minorFragmentId, nested_4.minorFragmentProfile.maxMemoryUsed AS maxMemoryUsed, nested_4.minorFragmentProfile.memoryUsed AS memoryUsed, nested_4.endpoint.address AS address, nested_4.endpoint.maxDirectMemory AS maxDirectMemory
    FROM (
      SELECT minorFragmentProfile, nested_3.minorFragmentProfile.endpoint AS endpoint, minorFragmentId, operatorType, operatorId, nested_3.inputProfile.records AS records, majorFragmentId
      FROM (
        SELECT minorFragmentProfile, nested_2.minorFragmentProfile.minorFragmentId AS minorFragmentId, operatorProfile, nested_2.operatorProfile.operatorType AS operatorType, nested_2.operatorProfile.operatorId AS operatorId, flatten(nested_2.operatorProfile.inputProfile) AS inputProfile, majorFragmentId
        FROM (
          SELECT fragmentProfile, minorFragmentProfile, flatten(nested_1.minorFragmentProfile.operatorProfile) AS operatorProfile, majorFragmentId
          FROM (
            SELECT fragmentProfile, flatten(nested_0.fragmentProfile.minorFragmentProfile) AS minorFragmentProfile, nested_0.fragmentProfile.majorFragmentId AS majorFragmentId
            FROM (
              SELECT flatten(fragmentProfile) AS fragmentProfile
              FROM "@dalong"."profile_attempt_0" AS "profile_attempt_0"
            ) nested_0
          ) nested_1
        ) nested_2
      ) nested_3
    ) nested_4, "@dalong".operator_type op_type
    where nested_4.operatorType = op_type.id
    group by majorFragmentId, operatorId, op_type.name
    order by majorFragmentId, operatorId

    效果

    • 查询内存情况
    SELECT nested_2.endpoint.address AS address, sum(nested_2.minorFragmentProfile.maxMemoryUsed)/1024/1024 AS maxMemoryUsed_MB
    -- nested_2.minorFragmentProfile.memoryUsed AS memoryUsed, nested_2.minorFragmentProfile.maxMemoryUsed AS maxMemoryUsed, nested_2.endpoint.address AS address, nested_2.endpoint.maxDirectMemory AS maxDirectMemory, nested_2.endpoint.provisionId AS provisionId, majorFragmentId
    FROM (
     SELECT fragmentProfile, minorFragmentProfile, nested_1.minorFragmentProfile.endpoint AS endpoint, majorFragmentId
     FROM (
       SELECT fragmentProfile, flatten(nested_0.fragmentProfile.minorFragmentProfile) AS minorFragmentProfile, nested_0.fragmentProfile.majorFragmentId AS majorFragmentId
       FROM (
         SELECT flatten(fragmentProfile) AS fragmentProfile
         FROM "@dalong"."profile_attempt_0" AS "profile_attempt_0"
       ) nested_0
     ) nested_1
    ) nested_2
    group by nested_2.endpoint.address
    order by nested_2.endpoint.address

    效果

    说明

    对于job 的分析可以结合apache drill 学习,dremio 文档对于这部分介绍还是太简单,推荐结合drill 学习,同时默认32k 数据太小,需要调整下
    参考命令

     
    alter system set limits.single_field_size_bytes = 99000;

    参考资料

    https://docs.dremio.com/software/jobs/job-profile/
    https://docs.dremio.com/software/jobs/analyzing-profiles/
    https://docs.dremio.com/software/jobs/analyzing-profiles-attempt-0/
    https://docs.dremio.com/software/jobs/analyzing-profiles-mappings-for-examples/
    https://drill.apache.org/docs/query-profiles/
    https://drill.apache.org/docs/query-profile-column-descriptions/

  • 相关阅读:
    Tomcat在服务开始时去调用某个指定的类
    不能上网!???!!!
    (转载)攻击方式学习之(1) 跨站式脚本(CrossSite Scripting)
    Oracle中password file的作用及说明
    Oracle实例参数文件
    使用struts1.x标签<logic:iterate>来输出map,list
    转 remote_login_passwordfile参数和命令文件的关系
    9个重要的视图
    Oracle 关于杀掉数据库session
    Connection reset by peer: socket write error错误分析及解决
  • 原文地址:https://www.cnblogs.com/rongfengliang/p/16327142.html
Copyright © 2020-2023  润新知