    Spark flight project

    • First, convert the CSV file to UTF-8
    scala> val flights=sc.textFile("/data/USA_Flight")
    scala> flights.take(3)
    scala> val df = spark.read.format("csv").option("header",true).load("/data/USA_Flight")
    Give the origin airport ID column an English name
    scala> val df1=df.withColumn("origin_id",col("起飞机场编号"))
    Number of flights per origin airport ID
    scala> df1.groupBy("origin_id").agg(count("*")).show(1)
    Flight counts for 10 origin airport IDs (count column aliased as cnt)
    scala> df1.groupBy("origin_id").agg(count("*").as("cnt")).show(10)
    Origin airport IDs ranked by flight count
    scala> df1.groupBy("origin_id").agg(count("*").as("cnt")).sort(desc("cnt")).show(10)
    

    RDD

    • Origin airport IDs ranked by flight count
    scala> val df1=df.withColumn("origin_id",col("起飞机场编号"))
    scala> val rdd=df1.select("origin_id").rdd
    scala> rdd.map(row=>(row.get(0),1)).reduceByKey(_+_).sortBy(_._2,false).collect
    RDD: ratio of delayed flights to total flights
    scala> val rdd3=sc.textFile("/data/USA_Flight")
    scala> val rdd3_1=rdd3.mapPartitionsWithIndex((idx,it)=>{if(idx==0) it.drop(1) else it})
    scala> val rdd3_2=rdd3_1.map(line=>line.split(",")).map(arr=>(arr(2),arr(11)))
    Number of delayed flights / total number of flights, per key
    scala> rdd3_2.groupByKey().map(comp=>(comp._1,comp._2.count(x=>x.toInt>0).toDouble/comp._2.size)).take(3)
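
The ratio computed above can be sketched on a small in-memory sample with plain Scala collections, so it runs without a cluster. This is only an illustration: the object name `DelayRatio`, the method `ratios`, and the sample values are made up, and it assumes (as the Spark line does) that a record counts as delayed when its delay value is greater than 0.

```scala
object DelayRatio {
  // For each key, return (records with delay > 0) / (all records for that key).
  // `flights` holds (key, delay) pairs, mirroring (arr(2), arr(11)) in the
  // Spark version after the delay string has been parsed to Int.
  def ratios(flights: Seq[(String, Int)]): Map[String, Double] =
    flights
      .groupBy { case (key, _) => key }
      .map { case (key, rows) =>
        val delayed = rows.count { case (_, delay) => delay > 0 }
        key -> (delayed.toDouble / rows.size)
      }

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only.
    val sample = Seq(("AA", 5), ("AA", -2), ("AA", 0), ("UA", 10))
    ratios(sample).foreach { case (key, r) => println(s"$key $r") }
  }
}
```

On a real cluster, `groupByKey` ships every value for a key to one place; counting delayed and total records with `reduceByKey` or `aggregateByKey` is a common way to reduce that shuffle.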
    
    
    scala> import org.apache.spark.graphx._
    
    scala> val rdd3=sc.textFile("/data/USA_Flight")
    scala> val rdd3_1=rdd3.mapPartitionsWithIndex((idx,it)=>{if(idx==0) it.drop(1) else it})
    
    scala> val airports=rdd3_1.map(line=>line.split(",")).map(arr=>(arr(5),arr(6),arr(7),arr(8))).flatMap(x=>Array((x._1.toLong,x._2),(x._3.toLong,x._4))).distinct
    
    scala> val airlines=rdd3_1.map(line=>line.split(",")).map(arr=>Edge(arr(5).toLong,arr(7).toLong,arr(16).toLong)).distinct
    scala> val graph=Graph(airports,airlines)
    scala> graph.vertices.collect
    

    Number of airports / number of routes

    Number of vertices:
    scala> graph.numVertices
    Number of edges:
    scala> graph.numEdges
    

    Find the longest flight route

    • The edge with the largest attribute
    Method 1: scala> graph.edges.sortBy(edge=>edge.attr,false).take(1)
    Method 2: scala> graph.triplets.sortBy(triplet=>triplet.attr,false).take(1)
    

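    Find the busiest airports

    • Which airport has the most arrivals
    Compute vertex in-degrees and sort (the edges were deduplicated with distinct, so this counts distinct inbound routes, not individual flights)
    scala> graph.inDegrees.take(10)
    scala> graph.inDegrees.sortBy(_._2,false).take(10)
    Compute out-degrees and sort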
    scala> graph.outDegrees.sortBy(_._2,false).take(1)
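
To make the degree ranking concrete, here is a minimal sketch with plain Scala collections instead of GraphX. The names `Degrees`, `inDegrees`, `outDegrees`, and `busiest` are hypothetical helpers, and the vertex IDs in the usage note are made up. It also shows why deduplicating edges matters: repeated flights on the same route count once after `distinct`.

```scala
object Degrees {
  // In-degree: number of (src, dst) edges arriving at each vertex.
  def inDegrees(edges: Seq[(Long, Long)]): Map[Long, Int] =
    edges.groupBy(_._2).map { case (v, es) => v -> es.size }

  // Out-degree: number of (src, dst) edges leaving each vertex.
  def outDegrees(edges: Seq[(Long, Long)]): Map[Long, Int] =
    edges.groupBy(_._1).map { case (v, es) => v -> es.size }

  // Top-n vertices by degree, highest first; ties are broken by vertex ID
  // so the result is deterministic.
  def busiest(degrees: Map[Long, Int], n: Int): Seq[(Long, Int)] =
    degrees.toSeq.sortBy { case (v, d) => (-d, v) }.take(n)
}
```

For example, with flights `Seq((1L, 3L), (1L, 3L), (2L, 3L), (3L, 1L))`, `Degrees.inDegrees` gives vertex 3 an in-degree of 3, while calling it on the `.distinct` version of the same list gives 2.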
    

    Find the most important airports

    • PageRank
    scala> graph.pageRank(0.001).vertices.sortBy(_._2,false).take(3)
    res19: Array[(org.apache.spark.graphx.VertexId, Double)] = Array((10397,11.804830496200681), (13930,11.559339731504148), (11298,11.415597402337278))
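
In `graph.pageRank(0.001)` the argument is the convergence tolerance. The idea behind the ranking can be sketched with a textbook iteration in plain Scala. This is not GraphX's implementation: `PageRankSketch` and its parameters are illustrative names, the update rule `resetProb + (1 - resetProb) * sum(rank(src) / outDegree(src))` with `resetProb = 0.15` is an assumption chosen to resemble the unnormalized form, and the ranks it produces are not meant to match the numbers above.

```scala
object PageRankSketch {
  // Iterate until no rank moves by more than `tol`, or `maxIter` is reached.
  // `edges` are (src, dst) pairs; every vertex starts with rank 1.0.
  def pageRank(edges: Seq[(Long, Long)],
               resetProb: Double = 0.15,
               tol: Double = 1e-6,
               maxIter: Int = 100): Map[Long, Double] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val outDeg = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    var ranks = vertices.map(v => v -> 1.0).toMap
    var delta = Double.MaxValue
    var iter = 0
    while (delta > tol && iter < maxIter) {
      // Each edge passes rank(src) / outDegree(src) to its destination.
      val contrib = edges
        .map { case (s, d) => d -> (ranks(s) / outDeg(s)) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }
      val next = vertices.map { v =>
        v -> (resetProb + (1 - resetProb) * contrib.getOrElse(v, 0.0))
      }.toMap
      // Largest change of any vertex in this round.
      delta = vertices
        .map(v => math.abs(next(v) - ranks(v)))
        .foldLeft(0.0)((a, b) => math.max(a, b))
      ranks = next
      iter += 1
    }
    ranks
  }
}
```

A vertex scores high when many well-ranked vertices point to it, which is why hub airports come out on top.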
    

    Pregel

    sampleRDD

    scala> val sampleRDD=sc.makeRDD(1 to 10)
    scala> sampleRDD.sample(false,0.1,10)
    

    Find the cheapest flight routes

    • Pick a source vertex
    scala> val sample=graph.vertices.sample(false,0.4,100)
    scala> val first=sample.first
    first: (org.apache.spark.graphx.VertexId, String) = (10397,ATL)
    
    • Initialize the source vertex (0)
    Set the source vertex to 0 and every other vertex to infinity
    scala> val initGraph=graph.mapVertices((vid,_)=>{if(vid==first._1) 0 else Double.PositiveInfinity})
    scala> initGraph.vertices.take(3)
    
    scala> val initGraph=graph.mapVertices((vid,_)=>{if(vid==first._1) 0 else Double.PositiveInfinity}).mapEdges(edge=>180+edge.attr*0.5)
    scala> initGraph.edges.take(3)
    
    scala> val pregel=initGraph.pregel(Double.PositiveInfinity)(vprog=(vid,price,new_price)=>{math.min(price,new_price)},sendMsg=(triplet)=>{if (triplet.srcAttr+triplet.attr<triplet.dstAttr) Iterator((triplet.dstId,triplet.srcAttr+triplet.attr)) else Iterator.empty},mergeMsg=(a,b)=>{math.min(a,b)})
    scala> pregel.vertices.take(3)
    
    scala> pregel.vertices.foreach(x=>{println(first._1+" -> "+x._1+" price is : "+x._2)})
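
The Pregel call above repeatedly relaxes edges until no vertex gets a cheaper price. A minimal sketch of the same logic with plain Scala collections follows. The names `CheapestRoutes`, `price`, and `cheapest` are hypothetical; the price formula `180 + distance * 0.5` is taken from the `mapEdges` step, and the sketch assumes all edge prices are positive so the loop terminates.

```scala
object CheapestRoutes {
  // Ticket price for one edge, as in mapEdges(edge => 180 + edge.attr * 0.5).
  def price(distance: Long): Double = 180 + distance * 0.5

  // edges: (src, dst, price). Returns the cheapest total price from `source`
  // to every vertex; unreachable vertices keep Double.PositiveInfinity.
  def cheapest(edges: Seq[(Long, Long, Double)], source: Long): Map[Long, Double] = {
    val vertices = edges.flatMap { case (s, d, _) => Seq(s, d) }.distinct
    var best = vertices.map { v =>
      v -> (if (v == source) 0.0 else Double.PositiveInfinity)
    }.toMap
    var changed = true
    while (changed) {
      // Like sendMsg: an edge offers best(src) + price only if that
      // beats the current best(dst).
      val offers = edges.collect {
        case (s, d, p) if best(s) + p < best(d) => d -> (best(s) + p)
      }
      // Like mergeMsg and vprog: keep the smallest offer per vertex.
      val improved = offers.groupBy(_._1).map { case (v, os) => v -> os.map(_._2).min }
      changed = improved.nonEmpty
      best = best ++ improved
    }
    best
  }
}
```

For example, with edges `Seq((1L, 2L, 10.0), (2L, 3L, 5.0), (1L, 3L, 20.0), (4L, 1L, 1.0))` and source 1, vertex 3 ends at 15.0 (via vertex 2) and vertex 4 stays at infinity because nothing leads to it.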
    
  • Original article: https://www.cnblogs.com/tudousiya/p/11333459.html