• Running Spark 0.9.0 in standalone mode to compute over data in HDFS


    1. Read through http://spark.incubator.apache.org/docs/latest/spark-standalone.html

    2. Install Spark to /opt/spark on every machine.

    3. Start the Spark master on the first machine:

    [root@jfp3-1 latest]# ./sbin/start-master.sh

    Check the log in the logs directory:

    [root@jfp3-1 latest]# tail -100f logs/spark-root-org.apache.spark.deploy.master.Master-1-jfp3-1.out
    Spark Command: /usr/java/default/bin/java -cp :/opt/spark/spark-0.9.0-incubating-bin-hadoop2/conf:/opt/spark/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip jfp3-1 --port 7077 --webui-port 8080
    ========================================

    log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    14/02/21 04:59:50 INFO Master: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    14/02/21 04:59:50 INFO Master: Starting Spark master at spark://jfp3-1:7077
    14/02/21 04:59:51 INFO MasterWebUI: Started Master web UI at http://jfp3-1:8080
    14/02/21 04:59:51 INFO Master: I have been elected leader! New state: ALIVE

    Open http://jfp3-1:8080 to view the cluster status.

    4. Start Spark workers on machines 2, 3, and 4.

    [root@jfp3-2 latest]# ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.0.71:7077
    log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    14/02/21 05:05:09 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    14/02/21 05:05:09 INFO Worker: Starting Spark worker jfp3-2:53344 with 32 cores, 61.9 GB RAM
    14/02/21 05:05:09 INFO Worker: Spark home: /opt/spark/latest
    14/02/21 05:05:09 INFO WorkerWebUI: Started Worker web UI at http://jfp3-2:8081
    14/02/21 05:05:09 INFO Worker: Connecting to master spark://192.168.0.71:7077...
    14/02/21 05:05:30 INFO Worker: Connecting to master spark://192.168.0.71:7077...
    14/02/21 05:05:50 INFO Worker: Connecting to master spark://192.168.0.71:7077...
    14/02/21 05:06:10 ERROR Worker: All masters are unresponsive! Giving up.

    At the same time, matching errors show up in the master's log:

    14/02/21 05:06:23 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@jfp3-1:7077] -> [akka.tcp://sparkWorker@jfp3-3:53721]: Error [Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]] [
    akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]
    Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: jfp3-3/192.168.0.73:53721
    ]
    14/02/21 05:06:23 INFO Master: akka.tcp://sparkWorker@jfp3-3:53721 got disassociated, removing it.
    14/02/21 05:06:23 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@jfp3-1:7077] -> [akka.tcp://sparkWorker@jfp3-3:53721]: Error [Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]] [
    akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@jfp3-3:53721]
    Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: jfp3-3/192.168.0.73:53721
    ]

    Connecting to the Spark master by IP does not work: the worker has to use exactly the master URL the master advertises (spark://jfp3-1:7077, as printed in the master log), so switch from the IP to the hostname:

    [root@jfp3-2 latest]# ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://jfp3-1:7077
    log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    14/02/21 05:08:41 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    14/02/21 05:08:41 INFO Worker: Starting Spark worker jfp3-2:60198 with 32 cores, 61.9 GB RAM
    14/02/21 05:08:41 INFO Worker: Spark home: /opt/spark/latest
    14/02/21 05:08:41 INFO WorkerWebUI: Started Worker web UI at http://jfp3-2:8081
    14/02/21 05:08:41 INFO Worker: Connecting to master spark://jfp3-1:7077...
    14/02/21 05:08:41 INFO Worker: Successfully registered with master spark://jfp3-1:7077

    5. Check the cluster status in the Spark master web UI again; the 3 workers now show up.

    6. Start the HDFS cluster.

    7. Launch spark-shell against the standalone cluster:

    [root@jfp3-1 latest]# MASTER=spark://jfp3-1:7077 ./bin/spark-shell
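
    Before reading anything from HDFS, a quick sanity check (just a sketch with arbitrary numbers, not part of the original session) can be pasted into the shell to confirm the executors are reachable; it only relies on the sc instance that spark-shell provides:

    // Distribute 1..1000 across 11 partitions and count the multiples of 7 on the cluster.
    val nums = sc.parallelize(1 to 1000, 11)
    nums.filter(_ % 7 == 0).count()   // expected result: 142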

    Count the lines of a file on HDFS that contain the string 2144:

    scala> val textFile = sc.textFile("hdfs://192.168.0.71/user/shaochen/apsh/20111201/20111201/44-ABIS-APSH-1G-20111201")
    14/02/21 10:16:18 INFO MemoryStore: ensureFreeSpace(146579) called with curMem=0, maxMem=308713881
    14/02/21 10:16:18 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 143.1 KB, free 294.3 MB)
    textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

    scala> val targetRows = textFile.filter(line => line.contains("2144"))
    targetRows: org.apache.spark.rdd.RDD[String] = FilteredRDD[2] at filter at <console>:14
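
    Note that filter is a lazy transformation: nothing is read from HDFS until an action such as count() runs. If targetRows were going to be reused by several actions, it could optionally be cached first; a minimal sketch using only the names defined in this session (the cache() call is an addition, not part of the original run):

    targetRows.cache()               // mark the filtered rows for in-memory caching
    val hits = targetRows.count()    // the first action reads HDFS and populates the cache
    val again = targetRows.count()   // later actions reuse the cached partitions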

    scala> targetRows.count()
    14/02/21 10:18:27 INFO FileInputFormat: Total input paths to process : 1
    14/02/21 10:18:27 INFO SparkContext: Starting job: count at <console>:17
    14/02/21 10:18:27 INFO DAGScheduler: Got job 0 (count at <console>:17) with 11 output partitions (allowLocal=false)
    14/02/21 10:18:27 INFO DAGScheduler: Final stage: Stage 0 (count at <console>:17)
    14/02/21 10:18:27 INFO DAGScheduler: Parents of final stage: List()
    14/02/21 10:18:27 INFO DAGScheduler: Missing parents: List()
    14/02/21 10:18:27 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at <console>:14), which has no missing parents
    14/02/21 10:18:27 INFO DAGScheduler: Submitting 11 missing tasks from Stage 0 (FilteredRDD[2] at filter at <console>:14)
    14/02/21 10:18:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 11 tasks
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 2: jfp3-3 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:0 as 1716 bytes in 5 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 1: jfp3-2 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:1 as 1716 bytes in 1 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor 0: jfp3-4 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:2 as 1716 bytes in 0 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:3 as TID 3 on executor 2: jfp3-3 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:3 as 1716 bytes in 0 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:4 as TID 4 on executor 1: jfp3-2 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:4 as 1716 bytes in 1 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor 0: jfp3-4 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:5 as 1716 bytes in 0 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:6 as TID 6 on executor 2: jfp3-3 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:6 as 1716 bytes in 0 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:7 as TID 7 on executor 1: jfp3-2 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:7 as 1716 bytes in 0 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:8 as TID 8 on executor 0: jfp3-4 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:8 as 1716 bytes in 0 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:9 as TID 9 on executor 2: jfp3-3 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:9 as 1716 bytes in 1 ms
    14/02/21 10:18:27 INFO TaskSetManager: Starting task 0.0:10 as TID 10 on executor 1: jfp3-2 (NODE_LOCAL)
    14/02/21 10:18:27 INFO TaskSetManager: Serialized task 0.0:10 as 1716 bytes in 1 ms
    14/02/21 10:18:30 INFO TaskSetManager: Finished TID 10 in 2850 ms on jfp3-2 (progress: 0/11)
    14/02/21 10:18:30 INFO DAGScheduler: Completed ResultTask(0, 10)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 5 in 3188 ms on jfp3-4 (progress: 1/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 5)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 8 in 3188 ms on jfp3-4 (progress: 2/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 8)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 1 in 3237 ms on jfp3-2 (progress: 3/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 1)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 7 in 3234 ms on jfp3-2 (progress: 4/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 7)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 2 in 3269 ms on jfp3-4 (progress: 5/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 2)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 9 in 3300 ms on jfp3-3 (progress: 6/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 9)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 4 in 3362 ms on jfp3-2 (progress: 7/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 4)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 3 in 3423 ms on jfp3-3 (progress: 8/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 3)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 6 in 3439 ms on jfp3-3 (progress: 9/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 6)
    14/02/21 10:18:31 INFO TaskSetManager: Finished TID 0 in 3458 ms on jfp3-3 (progress: 10/11)
    14/02/21 10:18:31 INFO DAGScheduler: Completed ResultTask(0, 0)
    14/02/21 10:18:31 INFO TaskSchedulerImpl: Remove TaskSet 0.0 from pool
    14/02/21 10:18:31 INFO DAGScheduler: Stage 0 (count at <console>:17) finished in 3.466 s
    14/02/21 10:18:31 INFO SparkContext: Job finished: count at <console>:17, took 3.593541623 s
    res0: Long = 12129 
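
    The same computation can also be packaged as a standalone Scala application instead of being typed into spark-shell. The sketch below uses the Spark 0.9.0 API with the same master URL, HDFS path, and search string as above; the object name CountMatchingLines is just an illustrative choice:

    import org.apache.spark.{SparkConf, SparkContext}

    object CountMatchingLines {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("spark://jfp3-1:7077")   // same standalone master as the shell session
          .setAppName("CountMatchingLines")
        val sc = new SparkContext(conf)

        // Same file and filter as in the spark-shell example above.
        val textFile = sc.textFile("hdfs://192.168.0.71/user/shaochen/apsh/20111201/20111201/44-ABIS-APSH-1G-20111201")
        val targetRows = textFile.filter(line => line.contains("2144"))
        println("matching lines: " + targetRows.count())

        sc.stop()
      }
    }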

    Appendix:

    Command scripts:

    Start the master:

    /opt/spark/latest/sbin/start-master.sh

    Start a worker:

    /opt/spark/latest/bin/spark-class org.apache.spark.deploy.worker.Worker spark://jfp3-1:7077
