• The Underwhelming JdbcRDD


          Today I set out to load MySQL data into an RDD. I had known about JdbcRDD for a long time, so I decided to give it a try, only to find it rather underwhelming.
          First, take a look at the definition of JdbcRDD:
    /**
     * An RDD that executes an SQL query on a JDBC connection and reads results.
     * For usage example, see test case JdbcRDDSuite.
     *
     * @param getConnection a function that returns an open Connection.
     *   The RDD takes care of closing the connection.
     * @param sql the text of the query.
     *   The query must contain two ? placeholders for parameters used to partition the results.
     *   E.g. "select title, author from books where ? <= id and id <= ?"
     * @param lowerBound the minimum value of the first placeholder
     * @param upperBound the maximum value of the second placeholder
     *   The lower and upper bounds are inclusive.
     * @param numPartitions the number of partitions.
     *   Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2,
     *   the query would be executed twice, once with (1, 10) and once with (11, 20)
     * @param mapRow a function from a ResultSet to a single row of the desired result type(s).
     *   This should only call getInt, getString, etc; the RDD takes care of calling next.
     *   The default maps a ResultSet to an array of Object.
     */
    class JdbcRDD[T: ClassTag](
        sc: SparkContext,
        getConnection: () => Connection,
        sql: String,
        lowerBound: Long,
        upperBound: Long,
        numPartitions: Int,
        mapRow: (ResultSet) => T = JdbcRDD.resultSetToObjectArray _)
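    The partition-splitting rule described in the doc comment can be sketched as a small standalone helper. This is my own illustrative function mirroring the split logic JdbcRDD's source describes, not part of Spark's public API:

```scala
// A minimal sketch (not Spark's API) of how JdbcRDD divides
// [lowerBound, upperBound] across numPartitions, per the doc comment above.
// Each partition gets an inclusive (start, end) range bound to the two ? placeholders.
def partitionRanges(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
  val length = BigInt(1) + upper - lower  // total number of ids, inclusive
  (0 until numPartitions).map { i =>
    val start = lower + (BigInt(i) * length / numPartitions).toLong
    val end   = lower + (BigInt(i + 1) * length / numPartitions).toLong - 1
    (start, end)
  }
}

// e.g. partitionRanges(1, 20, 2) yields Seq((1,10), (11,20)),
// matching the (1, 10) / (11, 20) example in the doc comment.
```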

    Here is an example:
    package test
    
    import java.sql.{Connection, DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD
    import org.apache.spark.{SparkConf, SparkContext}
    
    object spark_mysql {
      def main(args: Array[String]) {
        //val conf = new SparkConf().setAppName("spark_mysql").setMaster("local")
        val sc = new SparkContext("local","spark_mysql")
    
        def createConnection() = {
          Class.forName("com.mysql.jdbc.Driver").newInstance()
          DriverManager.getConnection("jdbc:mysql://192.168.0.15:3306/wsmall", "root", "passwd")
        }
    
        def extractValues(r: ResultSet) = {
          (r.getString(1), r.getString(2))
        }
    
        val data = new JdbcRDD(sc, createConnection,
          "SELECT id, aa FROM bbb WHERE ? <= id AND id <= ?",
          lowerBound = 3, upperBound = 5, numPartitions = 1, mapRow = extractValues)

        println(data.collect().toList)

        sc.stop()
      }
    }


    The data in the MySQL table looks like this:

    (table screenshot not preserved)

    The execution result looks like this:

    (result screenshot not preserved)
        As you can see, JdbcRDD's sql parameter must contain two ? placeholders, which the lowerBound and upperBound parameters fill in to define the boundaries of the WHERE clause. If that were all, it would be acceptable; the sad part is that both lowerBound and upperBound must be of type Long. How many tables actually use a Long-typed column as the key or filter field these days? Still, by referring to the JdbcRDD source code, users can write a JdbcRDD variant that fits their own needs, which is some small consolation.
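    As a sketch of that workaround, here is a hypothetical helper that partitions by a date range instead of Long bounds, producing one WHERE fragment per partition that a custom JdbcRDD-style RDD could execute. The column name created_at and the even day-split are my own assumptions for illustration:

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical helper (not JdbcRDD's API): split a date range into one
// inclusive WHERE fragment per partition, so a non-Long key column can
// drive partitioning. "created_at" is an assumed column name.
def dateClauses(start: LocalDate, end: LocalDate, numPartitions: Int): Seq[String] = {
  val totalDays = ChronoUnit.DAYS.between(start, end) + 1  // inclusive day count
  (0 until numPartitions).map { i =>
    val lo = start.plusDays(i * totalDays / numPartitions)
    val hi = start.plusDays((i + 1) * totalDays / numPartitions - 1)
    s"created_at BETWEEN '$lo' AND '$hi'"
  }
}
```

    Each fragment could then be appended to a base SELECT inside a custom compute() method, one statement per partition, sidestepping the Long-only bounds entirely.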


        Lately I have been busy with the 炼数成金 (Dataguru) Spark course and have had little time to maintain this blog.

    For readers who want to dig deeper into Spark, I recommend a friend's blog: http://www.cnblogs.com/cenyuhai/ . It contains many source-code walkthroughs that help with understanding Spark's internals.




  • Original article: https://www.cnblogs.com/llguanli/p/8512267.html