• [Original] Big Data Basics: Kudu (4) Reading and Writing Kudu from Spark


    Spark 2.4.3 + Kudu 1.9

    1 Batch read

    val df = spark.read.format("kudu")
          .options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test_db.test_table"))
          .load
    df.createOrReplaceTempView("tmp_table")
    spark.sql("select * from tmp_table limit 10").show()
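    
    The same DataFrame can also be narrowed with the regular DataFrame API instead of SQL. The lines below are a minimal sketch, assuming the spark-shell session and the df loaded above, and assuming the table has a string column named "id" (the column name is borrowed from section 3). Simple comparisons like this one are the kind of predicate the Kudu connector can usually push down to the tablet servers, but that is worth confirming with explain() on your own table.
    
    ```scala
    import org.apache.spark.sql.functions.col
    
    // Assumption: df is the DataFrame loaded from Kudu above,
    // and "id" is a string column of that table.
    val filtered = df
      .select("id")                      // project only the columns you need
      .filter(col("id") === "testid")    // simple equality comparison
    
    filtered.explain()                   // inspect the plan to see how the filter is handled
    filtered.show()
    ```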

    2 Batch write

    import org.apache.kudu.spark.kudu.{KuduContext, KuduWriteOptions}
    
    val kuduMaster = "master:7051"
    val table = "impala::test_db.test_table"
    
    val kuduContext = new KuduContext(kuduMaster, sc)
    
    // KuduWriteOptions(ignoreDuplicateRowErrors = false, ignoreNull = true)
    kuduContext.upsertRows(df, table, new KuduWriteOptions(false, true))
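    
    Besides upsertRows, KuduContext also provides insertRows, updateRows and deleteRows, and a DataFrame can be written through the standard DataFrameWriter. The following is a sketch of the writer form, assuming the same df, master address and table name as above; with the Kudu data source only the "append" save mode is expected to work, and it behaves as an upsert by default.
    
    ```scala
    // Assumption: df holds rows whose schema matches the target Kudu table.
    df.write
      .options(Map(
        "kudu.master" -> "master:7051",
        "kudu.table"  -> "impala::test_db.test_table"))
      .mode("append")      // the Kudu data source is written to in append mode
      .format("kudu")
      .save
    ```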

    3 Single-row read / conditional read

    cd $SPARK_HOME
    bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0
    
    import org.apache.kudu.client.{KuduPredicate, RowResult}
    import org.apache.kudu.spark.kudu.KuduContext
    
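    val kuduMaster = "master:7051"
    val tableName = "impala::test_db.test_table"
    
    val kuduContext = new KuduContext(kuduMaster, sc)
    // Use distinct names for the table name and the opened table;
    // `val table = ...openTable(table)` would be a recursive definition
    val kuduTable = kuduContext.syncClient.openTable(tableName)
    // Match rows whose "id" column equals "testid"
    val predicate = KuduPredicate.newComparisonPredicate(kuduTable.getSchema().getColumn("id"), KuduPredicate.ComparisonOp.EQUAL, "testid")
    val scanner = kuduContext.syncClient.newScannerBuilder(kuduTable).addPredicate(predicate).build()
    
    // Walk through every batch and every row the scanner returns
    while (scanner.hasMoreRows) {
      val rows = scanner.nextRows
      while (rows.hasNext) {
        val row = rows.next
        println(row.getString(0))
      }
    }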

    4 Single-row write

    cd $SPARK_HOME
    bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0
    
    import org.apache.kudu.client.{KuduPredicate, RowResult}
    import org.apache.kudu.spark.kudu.KuduContext
    import org.apache.kudu.client.SessionConfiguration
    
    val kuduMaster = "172.26.192.219:7051"
    
    val kuduContext = new KuduContext(kuduMaster, sc)
    val kuduClient = kuduContext.syncClient
    val kuduTable = kuduClient.openTable("impala::dataone_xishaoye.tbl_order_union")
    val kuduSession = kuduClient.newSession()
    
    // Available flush modes: AUTO_FLUSH_BACKGROUND, AUTO_FLUSH_SYNC, MANUAL_FLUSH
    kuduSession.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC)
    kuduSession.setMutationBufferSpace(1000)
    
    val insert = kuduTable.newInsert()
    val row = insert.getRow()
    row.addString(0, "hello")
    kuduSession.apply(insert)
    //kuduSession.flush

    Other operations: newInsert / newUpdate / newDelete / newUpsert

    5 Locating errors
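    
    The other operations follow the same pattern as the insert. A minimal sketch, assuming the kuduTable and kuduSession from the snippet above, and assuming (hypothetically) that column 0 is the table's only primary key column and that every other column is nullable or has a default:
    
    ```scala
    // Upsert: inserts the row, or updates it if the key already exists
    val upsert = kuduTable.newUpsert()
    upsert.getRow().addString(0, "hello")
    kuduSession.apply(upsert)
    
    // Delete: only the primary key columns need to be set
    val delete = kuduTable.newDelete()
    delete.getRow().addString(0, "hello")
    kuduSession.apply(delete)
    
    // Flush anything still buffered and release the session
    kuduSession.close()
    ```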

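    If a change has not taken effect after apply, and you have confirmed that it was flushed, the operation may have failed with a row error (no exception is thrown). In that case, print the error from the OperationResponse: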

    val opResponse = session.apply(op)
    if (opResponse != null && opResponse.hasRowError) println(opResponse.getRowError.toString)

    Note that this requires FlushMode.AUTO_FLUSH_SYNC: in the other flush modes apply returns null, as the source code shows:

    org.apache.kudu.client.KuduSession

        public OperationResponse apply(Operation operation) throws KuduException {
            while(true) {
                try {
                    Deferred<OperationResponse> d = this.session.apply(operation);
                    if(this.getFlushMode() == FlushMode.AUTO_FLUSH_SYNC) {
                        return (OperationResponse)d.join();
                    }
    
                    return null;
                } catch (PleaseThrottleException var5) {
                    PleaseThrottleException ex = var5;
    
                    try {
                        ex.getDeferred().join();
                    } catch (Exception var4) {
                        LOG.error("Previous batch had this exception", var4);
                    }
                } catch (Exception var6) {
                    throw KuduException.transformException(var6);
                }
            }
        }
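    
    Since apply returns null outside AUTO_FLUSH_SYNC, the responses in MANUAL_FLUSH mode have to be collected from flush() instead. The following is a sketch under that assumption, reusing the kuduTable and kuduSession names from section 4 and assuming no operations are still buffered when the flush mode is changed. For AUTO_FLUSH_BACKGROUND, the session's getPendingErrors method is the place to look instead.
    
    ```scala
    import org.apache.kudu.client.SessionConfiguration
    import scala.collection.JavaConverters._
    
    // Assumption: kuduTable and kuduSession were created as in section 4
    kuduSession.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH)
    
    val insert = kuduTable.newInsert()
    insert.getRow().addString(0, "hello")
    kuduSession.apply(insert)            // returns null in MANUAL_FLUSH mode
    
    // flush() returns one OperationResponse per buffered operation
    val responses = kuduSession.flush().asScala
    responses
      .filter(_.hasRowError)
      .foreach(r => println(r.getRowError.toString))
    ```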

    References:

    https://kudu.apache.org/docs/developing.html

  • Original article: https://www.cnblogs.com/barneywill/p/10868038.html