• The Half-Useful JdbcRDD


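          Today I set out to move some MySQL data into an RDD. I had known about JdbcRDD for a long time and wanted to try it, only to find that it is of limited real use.
          First, let's look at how JdbcRDD is defined: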
    /**
     * An RDD that executes an SQL query on a JDBC connection and reads results.
     * For usage example, see test case JdbcRDDSuite.
     *
     * @param getConnection a function that returns an open Connection.
     *   The RDD takes care of closing the connection.
     * @param sql the text of the query.
     *   The query must contain two ? placeholders for parameters used to partition the results.
     *   E.g. "select title, author from books where ? <= id and id <= ?"
     * @param lowerBound the minimum value of the first placeholder
     * @param upperBound the maximum value of the second placeholder
     *   The lower and upper bounds are inclusive.
     * @param numPartitions the number of partitions.
     *   Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2,
     *   the query would be executed twice, once with (1, 10) and once with (11, 20)
     * @param mapRow a function from a ResultSet to a single row of the desired result type(s).
     *   This should only call getInt, getString, etc; the RDD takes care of calling next.
     *   The default maps a ResultSet to an array of Object.
     */
    class JdbcRDD[T: ClassTag](
        sc: SparkContext,
        getConnection: () => Connection,
        sql: String,
        lowerBound: Long,
        upperBound: Long,
        numPartitions: Int,
        mapRow: (ResultSet) => T = JdbcRDD.resultSetToObjectArray _)
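    To make the numPartitions rule in the Scaladoc concrete, here is a minimal sketch of how the inclusive [lowerBound, upperBound] range can be cut into one inclusive range per partition. It is modelled on the arithmetic in Spark's JdbcRDD.getPartitions as I understand it (the exact code varies between Spark versions); the object JdbcPartitions and its split method are hypothetical names, not part of Spark, and only the Scala standard library is used.

```scala
// Hypothetical sketch (not Spark code): splits JdbcRDD-style inclusive bounds
// into one inclusive (start, end) range per partition.
object JdbcPartitions {
  def split(lowerBound: Long, upperBound: Long, numPartitions: Int): List[(Long, Long)] = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(lowerBound <= upperBound, "lowerBound must not exceed upperBound")
    // BigInt avoids Long overflow when the bounds span most of the Long range.
    val lower = BigInt(lowerBound)
    val n = BigInt(numPartitions)
    // Both bounds are inclusive, so the range holds (upper - lower + 1) values.
    val length = BigInt(upperBound) - lower + BigInt(1)
    (0 until numPartitions).map { i =>
      val start = lower + (length * BigInt(i)) / n
      val end = lower + (length * BigInt(i + 1)) / n - BigInt(1)
      (start.toLong, end.toLong)
    }.toList
  }

  def main(args: Array[String]): Unit = {
    // The Scaladoc example: bounds 1..20 split into 2 partitions.
    println(split(1, 20, 2)) // List((1,10), (11,20))
    // The bounds used in the sample program below: 3..5 in 1 partition.
    println(split(3, 5, 1)) // List((3,5))
  }
}
```

    With the sample program's bounds (3 to 5) and numPartitions = 2, this split gives (3, 3) and (4, 5), and the query runs once per range. Because the ranges are cut by key value rather than by row count, gaps or skew in the ids can leave partitions with very different numbers of rows.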

    Here is an example:
    package test

    import java.sql.{Connection, DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD
    import org.apache.spark.{SparkConf, SparkContext}

    object spark_mysql {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("spark_mysql").setMaster("local")
        val sc = new SparkContext(conf)

        // Opens a JDBC connection; JdbcRDD closes it once the partition has been read.
        def createConnection(): Connection = {
          Class.forName("com.mysql.jdbc.Driver") // registers the MySQL JDBC driver
          DriverManager.getConnection("jdbc:mysql://192.168.0.15:3306/wsmall", "root", "passwd")
        }

        // Maps the current row only; JdbcRDD itself advances the ResultSet with next().
        def extractValues(r: ResultSet): (String, String) =
          (r.getString(1), r.getString(2))

        // The two ? placeholders receive each partition's lower and upper bound.
        val data = new JdbcRDD(
          sc,
          () => createConnection(),
          "SELECT id, aa FROM bbb WHERE ? <= id AND id <= ?",
          lowerBound = 3,
          upperBound = 5,
          numPartitions = 1,
          mapRow = extractValues _)

        println(data.collect().toList)
        sc.stop()
      }
    }


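    The MySQL table used contains the following data:

    [Figure: contents of the MySQL table]

    Running the program produces:

    [Figure: program output]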
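        As you can see, the sql parameter of JdbcRDD must contain two ? placeholders, which lowerBound and upperBound fill in to bound the WHERE clause. If that were all, it would be acceptable. Unfortunately, lowerBound and upperBound are both of type Long, and I wonder how many of the columns used as keys or query conditions nowadays are actually of type long. That said, by following the JdbcRDD source code, users can still write a JdbcRDD of their own that fits their needs, which is the saving grace here.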
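    One way to act on that last remark is to generate the per-partition WHERE conditions yourself instead of relying on Long bounds. The sketch below is a hypothetical helper, not a Spark API: assuming a DATE column (the name created_at is made up for illustration), it splits a date range into contiguous half-open intervals and returns one predicate string per partition. Strings like these could then serve as the per-partition condition in a hand-written RDD modelled on JdbcRDD, or be passed to the predicates overload of Spark SQL's DataFrameReader.jdbc.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical helper (not part of Spark): one WHERE predicate per partition
// over a DATE column, as an alternative to JdbcRDD's Long-only bounds.
object DatePredicates {
  /** Splits [fromDate, untilDate) into at most numPartitions half-open day ranges. */
  def split(column: String, fromDate: LocalDate, untilDate: LocalDate, numPartitions: Int): List[String] = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(fromDate.isBefore(untilDate), "fromDate must be before untilDate")
    val totalDays = ChronoUnit.DAYS.between(fromDate, untilDate)
    // Each boundary is computed from fromDate, so rounding never accumulates;
    // distinct drops empty ranges when there are fewer days than partitions.
    val boundaries = (0 to numPartitions)
      .map(i => fromDate.plusDays(totalDays * i / numPartitions))
      .distinct
      .toList
    boundaries.zip(boundaries.tail).map { case (start, end) =>
      // Half-open ranges: every row falls into exactly one partition.
      s"$column >= '$start' AND $column < '$end'"
    }
  }

  def main(args: Array[String]): Unit =
    split("created_at", LocalDate.of(2018, 1, 1), LocalDate.of(2018, 1, 31), 3).foreach(println)
}
```

    Half-open intervals are used instead of JdbcRDD's inclusive pairs because dates (and especially timestamps) have no natural "next value" to subtract 1 from; with >= and < no row is read twice or skipped at a boundary.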


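        Lately I have been busy with the Spark course at 炼数成金 (Dataguru), so I have had little time to tidy up this blog.

    For anyone who wants to dig deeper into Spark, I especially recommend a friend's blog, http://www.cnblogs.com/cenyuhai/ . It contains many posts walking through the source code, which are helpful for understanding Spark's internals.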




  • Original article: https://www.cnblogs.com/llguanli/p/8512267.html