• Running a Spark jar with java -jar: Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs


    A Spark problem I ran into today that took quite a while to resolve.

    My Spark cluster was deployed from the official binary package

    spark-1.0.2-bin-hadoop2.tgz

    on top of an existing Hadoop cluster.

    I launched the application jar with the command

    java -jar chinahadoop-1.0-SNAPSHOT.jar  chinahadoop-1.0-SNAPSHOT.jar  hdfs://node1:8020/user/ning/data.txt /user/ning/output

    which failed with the following error:

    14/08/23 23:18:55 INFO AppClient$ClientActor: Executor updated: app-20140823231852-0000/1 is now RUNNING
    before count:MappedRDD[1] at textFile at Analysis.scala:35
    Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
     at scala.Option.getOrElse(Option.scala:120)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
     at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
     at scala.Option.getOrElse(Option.scala:120)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
     at org.apache.spark.rdd.RDD.count(RDD.scala:861)
     at cn.chinahadoop.spark.Analysis$.main(Analysis.scala:39)
     at cn.chinahadoop.spark.Analysis.main(Analysis.scala)

    I searched online for a long time without finding an answer. In the end, adding the following transformer to my Maven pom.xml made the job run successfully:

            <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
            </transformer>
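Why this works: hadoop-common and hadoop-hdfs each ship their own copy of the service file META-INF/services/org.apache.hadoop.fs.FileSystem, and ServiceLoader uses that file to map URL schemes such as hdfs:// to implementation classes. When the shade plugin builds the fat jar without a transformer, one copy simply overwrites the other, so the entry for DistributedFileSystem can be lost. The sketch below simulates the mechanism with plain files (the file contents are simplified stand-ins, not the full service files, and this is not the plugin itself):

```shell
# Stand-ins for the service file shipped by each Hadoop artifact.
mkdir -p demo/common demo/hdfs demo/merged
printf 'org.apache.hadoop.fs.LocalFileSystem\n' \
    > demo/common/org.apache.hadoop.fs.FileSystem
printf 'org.apache.hadoop.hdfs.DistributedFileSystem\n' \
    > demo/hdfs/org.apache.hadoop.fs.FileSystem

# Default merge behavior: the last copy wins, so the hdfs entry disappears.
cp demo/hdfs/org.apache.hadoop.fs.FileSystem demo/merged/
cp demo/common/org.apache.hadoop.fs.FileSystem demo/merged/
echo "--- without AppendingTransformer ---"
cat demo/merged/org.apache.hadoop.fs.FileSystem

# AppendingTransformer instead concatenates every copy it encounters,
# so both implementations survive in the shaded jar.
cat demo/common/org.apache.hadoop.fs.FileSystem \
    demo/hdfs/org.apache.hadoop.fs.FileSystem \
    > demo/merged/org.apache.hadoop.fs.FileSystem
echo "--- with AppendingTransformer ---"
cat demo/merged/org.apache.hadoop.fs.FileSystem
```

With the appended file in place, ServiceLoader can find DistributedFileSystem at runtime and the hdfs:// scheme resolves again.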
    

    The full Maven pom.xml is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>chinahadoop</groupId>
        <artifactId>chinahadoop</artifactId>
        <version>1.0-SNAPSHOT</version>
    
        <repositories>
            <repository>
                <id>Akka repository</id>
                <url>http://repo.akka.io/releases</url>
            </repository>
        </repositories>
    
        <build>
            <sourceDirectory>src/main/scala/</sourceDirectory>
            <testSourceDirectory>src/test/scala/</testSourceDirectory>
    
            <plugins>
                <plugin>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                        </execution>
                    </executions>
                    <configuration>
                        <scalaVersion>2.10.3</scalaVersion>
                    </configuration>
                </plugin>
    
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>2.2</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <filters>
                                    <filter>
                                        <artifact>*:*</artifact>
                                        <excludes>
                                            <exclude>META-INF/*.SF</exclude>
                                            <exclude>META-INF/*.DSA</exclude>
                                            <exclude>META-INF/*.RSA</exclude>
                                        </excludes>
                                    </filter>
                                </filters>
                                <transformers>
                                    <transformer
                                            implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                        <resource>reference.conf</resource>
                                    </transformer>
    
                                    <transformer
                                            implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <manifestEntries>
                                            <Main-Class>cn.chinahadoop.spark.Analysis</Main-Class>
                                        </manifestEntries>
                                    </transformer>
                                    <transformer
                                            implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                        <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.10</artifactId>
                <version>1.0.2</version>
            </dependency>
    
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.4.1</version>
            </dependency>
    
    
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-streaming_2.10</artifactId>
                <version>1.0.2</version>
            </dependency>
    
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>2.4.1</version>
            </dependency>
    
    
        </dependencies>
    
    </project>
  • Original post: https://www.cnblogs.com/ningbj/p/3932456.html