Running the GraphX test programs on Spark 0.9.1 in cluster mode (LiveJournalPageRank, plus Connected Components)


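A new version of Spark has been released. Earlier releases already bundled GraphX; this one also fixes some bugs.
I ran a few simple tests, but there is very little material online about running Spark in cluster mode. The only guide I found covers EC2 (see reference 1), and it is quite old: many of the commands have changed since. It is really annoying when installation posts do not state the versions of the software they use. That should be common sense!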

My platform configuration:

Spark: 0.9.1

Scala: 2.10.4

Hadoop: 1.0.4

JDK: 1.7.0

Master nodes: 1

Worker nodes: 16

1. Deploying Spark 0.9.1

See my earlier blog post.

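2. Downloading the input data set for the GraphX test programs (click to download: soc-LiveJournal1.txt.gz)

If the link no longer works, leave a comment and I will send you the file.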
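The commands in the later steps read an uncompressed soc-LiveJournal1.txt from the HDFS root, so the archive has to be unpacked and uploaded first. Here is a minimal sketch of that preparation, assuming the `hadoop` CLI is on the PATH and the archive sits in the current directory; the helper name `prepare_input` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack the downloaded edge list and copy it into HDFS.
# Assumes the `hadoop` CLI is on PATH and that the later steps read the file
# from the HDFS root directory.
prepare_input() {
  local archive="${1:-soc-LiveJournal1.txt.gz}"  # downloaded archive
  local target_dir="${2:-/}"                      # HDFS directory (assumption: root)
  local plain="${archive%.gz}"                    # file name without the .gz suffix

  # Decompress into a new file and keep the original archive.
  gzip -dc "$archive" > "$plain" || return 1

  # Upload the plain-text edge list to HDFS.
  hadoop fs -put "$plain" "$target_dir"
}
```

Calling `prepare_input soc-LiveJournal1.txt.gz /` and then `hadoop fs -ls /` should list the uploaded file.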

3. Running the GraphX PageRank test program

./bin/run-example org.apache.spark.examples.graphx.LiveJournalPageRank spark://$MASTERIP:7077 hdfs://$HDFSIP:9000/soc-LiveJournal1.txt --numEPart=192 --output=pagerank_out

The parameters are explained in the usage text; read it for yourself:

Usage: LiveJournalPageRank <master> <edge_list_file>
[--tol=<tolerance>]
The tolerance allowed at convergence (smaller => more accurate). Default is 0.001.
[--output=<output_file>]
If specified, the file to write the ranks to.
[--numEPart=<num_edge_partitions>]
The number of partitions for the graph's edge RDD. Default is 4.
[--partStrategy=RandomVertexCut | EdgePartition1D | EdgePartition2D | CanonicalRandomVertexCut]
The way edges are assigned to edge partitions. Default is RandomVertexCut.
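The optional flags from the usage text can be combined with the PageRank command. The sketch below only prints the resulting command line, so it can be checked before it is submitted. The addresses are placeholders, the helper name `pagerank_cmd` is made up, and the tolerance and partition strategy in the example call are arbitrary illustrations, not settings I benchmarked.

```shell
#!/usr/bin/env bash
# Placeholder addresses (assumptions); replace with your own master and NameNode.
MASTERIP="${MASTERIP:-10.0.0.1}"
HDFSIP="${HDFSIP:-10.0.0.1}"

# Hypothetical helper: print the PageRank command line with optional flags.
pagerank_cmd() {
  local tol="${1:-0.001}"                 # --tol, default taken from the usage text
  local strategy="${2:-RandomVertexCut}"  # --partStrategy, default from the usage text
  echo "./bin/run-example org.apache.spark.examples.graphx.LiveJournalPageRank" \
       "spark://${MASTERIP}:7077" \
       "hdfs://${HDFSIP}:9000/soc-LiveJournal1.txt" \
       "--numEPart=192 --tol=${tol} --partStrategy=${strategy}" \
       "--output=pagerank_out"
}

# Print the command for a tighter tolerance and 2D edge partitioning.
pagerank_cmd 0.0001 EdgePartition2D
```

Run the printed line from the Spark home directory once the addresses look right.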

4. Running the GraphX Connected Components test program

This benchmark can use the same input as PageRank. The command is as follows:

./bin/run-example org.apache.spark.graphx.lib.Analytics spark://$MASTERIP:7077 cc hdfs://$HDFSIP:8020/soc-LiveJournal1.txt --numIter=20 --numEPart=192

References:

1. https://github.com/amplab/graphx/wiki/Launch-a-benchmarking-cluster

2. http://blog.csdn.net/qianlong4526888/article/details/21441131

3. http://spark.apache.org/docs/latest/graphx-programming-guide.html#pagerank
Original article: https://www.cnblogs.com/wzzkaifa/p/6876556.html
