• [Spark][Python] A Small Spark Join Example


    [training@localhost ~]$ hdfs dfs -cat people.json

    {"name":"Alice","pcode":"94304"}
    {"name":"Brayden","age":30,"pcode":"94304"}
    {"name":"Carla","age":19,"pcoe":"10036"}
    {"name":"Diana","age":46}
    {"name":"Etienne","pcode":"94104"}
    [training@localhost ~]$

    [training@localhost ~]$ hdfs dfs -cat pcodes.json

    {"pcode":"10036","city":"New York","state":"NY"}
    {"pcode:"87501","city":"Santa Fe","state":"NM"}
    {"pcode":"94304","city":"Palo Alto","state":"CA"}
    {"pcode":"94104","city":"San Francisco","state":"CA"}

    from pyspark.sql import HiveContext

    # One HiveContext built on the shell's SparkContext (sc) serves both files
    sqlContext = HiveContext(sc)
    peopleDF = sqlContext.read.json("people.json")
    pcodesDF = sqlContext.read.json("pcodes.json")
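Both files are line-delimited JSON (one object per line), and both contain a defect: Carla's record spells the key `pcoe`, and the second line of pcodes.json is malformed (`"pcode:"87501"`). The following plain-Python sketch only approximates what `read.json` does with such input; it is not Spark's implementation, and the helper name `read_json_lines` is hypothetical. It assumes that the inferred columns are the union of all keys sorted by name (which matches the column order in the join output), that missing fields become null (`None` here), and that an unparseable line is kept whole in a `_corrupt_record` column, as Spark does in its default PERMISSIVE parse mode.

```python
import json

# The two sample files, copied verbatim from the hdfs dfs -cat output
PEOPLE_LINES = [
    '{"name":"Alice","pcode":"94304"}',
    '{"name":"Brayden","age":30,"pcode":"94304"}',
    '{"name":"Carla","age":19,"pcoe":"10036"}',
    '{"name":"Diana","age":46}',
    '{"name":"Etienne","pcode":"94104"}',
]
PCODES_LINES = [
    '{"pcode":"10036","city":"New York","state":"NY"}',
    '{"pcode:"87501","city":"Santa Fe","state":"NM"}',
    '{"pcode":"94304","city":"Palo Alto","state":"CA"}',
    '{"pcode":"94104","city":"San Francisco","state":"CA"}',
]

CORRUPT = "_corrupt_record"


def read_json_lines(lines):
    """Hypothetical helper that roughly mimics Spark's JSON reader.

    Returns (columns, rows): columns is the sorted union of all keys seen,
    and every row has every column, with None standing in for null.
    A line that fails to parse is kept whole under _corrupt_record.
    """
    parsed = []
    for line in lines:
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            # Malformed line: keep the raw text, all other fields stay null
            parsed.append({CORRUPT: line})
    columns = sorted({key for record in parsed for key in record})
    rows = [{col: record.get(col) for col in columns} for record in parsed]
    return columns, rows
```

With this input, the people columns come out as `age, name, pcode, pcoe` and the pcodes columns as `_corrupt_record, city, pcode, state`, the same names that appear in the joined result.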

    # Inner join (the default join type) on the column both DataFrames share
    mydf001 = peopleDF.join(pcodesDF, "pcode")

    mydf001.limit(5).show()

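    +-----+----+-------+----+---------------+-------------+-----+
    |pcode| age|   name|pcoe|_corrupt_record|         city|state|
    +-----+----+-------+----+---------------+-------------+-----+
    |94304|null|  Alice|null|           null|    Palo Alto|   CA|
    |94304|  30|Brayden|null|           null|    Palo Alto|   CA|
    |94104|null|Etienne|null|           null|San Francisco|   CA|
    +-----+----+-------+----+---------------+-------------+-----+

    Only three rows come back because this is an inner join on pcode. Carla's
    record spells the key "pcoe", so her pcode is null, and Diana has no pcode
    at all, so neither matches a row in pcodes.json. The misspelled key also
    shows up as its own pcoe column, and the malformed second line of
    pcodes.json ("pcode:"87501"...) is what produces the _corrupt_record column.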
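The result can be reproduced without Spark by a hash join, one of the strategies Spark can use for an equi-join. This is a plain-Python sketch of the idea, not Spark's implementation; `hash_inner_join` is a hypothetical helper, and the input records are the two files' contents, with the malformed pcodes.json line reduced to its `_corrupt_record`. The build phase maps each `pcode` value to its right-side rows; the probe phase looks up each left row and drops rows whose key is missing or unmatched, which is why Carla and Diana disappear. Spark gives no ordering guarantee for join output; this sketch simply keeps the order of the left input.

```python
# Records as they look after parsing (absent fields are simply omitted).
# The malformed pcodes.json line has only _corrupt_record, so no pcode.
people = [
    {"name": "Alice", "pcode": "94304"},
    {"name": "Brayden", "age": 30, "pcode": "94304"},
    {"name": "Carla", "age": 19, "pcoe": "10036"},
    {"name": "Diana", "age": 46},
    {"name": "Etienne", "pcode": "94104"},
]
pcodes = [
    {"pcode": "10036", "city": "New York", "state": "NY"},
    {"_corrupt_record": '{"pcode:"87501","city":"Santa Fe","state":"NM"}'},
    {"pcode": "94304", "city": "Palo Alto", "state": "CA"},
    {"pcode": "94104", "city": "San Francisco", "state": "CA"},
]


def hash_inner_join(left, right, key):
    """Hypothetical helper: inner equi-join of two lists of dicts on one key."""
    # Build phase: index the right side by key, skipping null keys,
    # since null never equals anything in an SQL join condition
    index = {}
    for row in right:
        k = row.get(key)
        if k is not None:
            index.setdefault(k, []).append(row)

    # Probe phase: left rows with a null or unmatched key are dropped
    joined = []
    for lrow in left:
        k = lrow.get(key)
        if k is None:
            continue
        for rrow in index.get(k, []):
            # The key column appears once, as with join(other, "pcode").
            # If both sides shared another column name, this sketch would
            # keep only the right-hand value.
            merged = {key: k}
            merged.update({c: v for c, v in lrow.items() if c != key})
            merged.update({c: v for c, v in rrow.items() if c != key})
            joined.append(merged)
    return joined


# Prints one line each: Alice Palo Alto, Brayden Palo Alto, Etienne San Francisco
for row in hash_inner_join(people, pcodes, "pcode"):
    print(row["name"], row["city"])
```

Building the index on the smaller side keeps the lookup table small; here pcodes plays that role, much like a lookup table of postal codes would in practice.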

  • Original article: https://www.cnblogs.com/gaojian/p/7630003.html