• Problems running pyspark on Hadoop YARN


    hduser@master:~$ pyspark --master local[4]
    Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    18/08/16 09:13:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
          /_/
    
    Using Python version 2.7.12 (default, Dec  4 2017 14:50:18)
    SparkSession available as 'spark'.
    >>> sc.master
    u'local[4]'
    >>> textFile=sc.textFile("file:/usr/local/spark/README.md")
    >>> textFile.count()
    103                                                                             
    >>> textFile =sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
    >>> textFile.count()
    1594
    >>> exit()
    hduser@master:~$ HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
    Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    18/08/16 09:16:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/08/16 09:16:10 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
          /_/
    
    Using Python version 2.7.12 (default, Dec  4 2017 14:50:18)
    SparkSession available as 'spark'.
    >>> 18/08/16 09:16:46 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
    18/08/16 09:16:47 ERROR TransportClient: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
    java.nio.channels.ClosedChannelException
    	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
    18/08/16 09:16:47 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
    java.io.IOException: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
    	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: java.nio.channels.ClosedChannelException
    	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
    18/08/16 09:16:47 ERROR Utils: Uncaught exception in thread Yarn application state monitor
    org.apache.spark.SparkException: Exception thrown in awaitResult: 
    	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:567)
    	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
    	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
    	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
    	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1755)
    	at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1931)
    	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
    	at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
    	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
    Caused by: java.io.IOException: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
    	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
    	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
    	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
    	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: java.nio.channels.ClosedChannelException
    	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
    
    >>> sc.master
    u'yarn'
    >>> textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/spark/python/pyspark/context.py", line 546, in textFile
        minPartitions = minPartitions or min(self.defaultParallelism, 2)
      File "/usr/local/spark/python/pyspark/context.py", line 400, in defaultParallelism
        return self._jsc.sc().defaultParallelism()
      File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
      File "/usr/local/spark/python/pyspark/sql/utils.py", line 63, in deco
        return f(*a, **kw)
      File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o30.defaultParallelism.
    : java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
    This stopped SparkContext was created at:
    
    org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    py4j.Gateway.invoke(Gateway.java:238)
    py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    py4j.GatewayConnection.run(GatewayConnection.java:238)
    java.lang.Thread.run(Thread.java:748)
    
    The currently active SparkContext was created at:
    
    (No active SparkContext.)
             
    	at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:99)
    	at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:2332)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:282)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:238)
    	at java.lang.Thread.run(Thread.java:748)
    
    >>> textFile.count()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'textFile' is not defined
    >>> 
    

  Running pyspark queries locally works without any problem, but running pyspark on Hadoop YARN produces the errors above. If any experts see this, please point me in the right direction. Many thanks~~~
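
  The error "Yarn application has already exited with state FINISHED!" means the ApplicationMaster container was shut down on the YARN side before the driver ever got to use it, so the real cause usually shows up in the YARN application log rather than in the pyspark console. A sketch of how to pull that log, assuming log aggregation is enabled (the application ID below is a placeholder; substitute the one shown in the ResourceManager web UI, by default http://master:8088):

    # List recently ended applications to find the suspect ID
    yarn application -list -appStates FINISHED,FAILED,KILLED

    # Dump the aggregated container logs for that application
    yarn logs -applicationId application_1534381234567_0001

  If log aggregation is not enabled, the same information sits in the NodeManager's local container log directory. In client mode the AM log normally ends with the reason the container was killed, for example a memory limit being exceeded.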

    My yarn-site.xml settings are attached below:

    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8050</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    </configuration>
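
  One note on the config above: it disables YARN's virtual-memory check, but the physical-memory check (yarn.nodemanager.pmem-check-enabled, on by default) can still kill the ApplicationMaster container on small virtual machines, and nothing here tells YARN how much memory each NodeManager actually has. A sketch of extra properties that often matter on memory-constrained nodes; the values are illustrative guesses, not settings taken from my cluster:

    <property>
        <!-- Skip the physical-memory check as well (default true) -->
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <!-- Total memory this NodeManager may hand out, in MB -->
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <!-- Smallest container YARN will allocate, in MB -->
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>256</value>
    </property>

  On the Spark side, relaunching with smaller explicit requests (for example pyspark --master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --num-executors 1) can keep the containers inside the NodeManager's budget while testing.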
    

      
