• Configuring Windows Eclipse to directly access a remote Linux Hadoop development environment (the approach used in real-world development)


    For setting up CDH 5.x, see "CentOS 7 Offline Installation of CDH 5.16.1: A Complete Guide (Including Error Handling)".

    If you are using the Cloudera QuickStart VM, you can only submit jobs from Eclipse on the Linux server itself and cannot access it remotely (mainly because the QuickStart VM binds all of its services to localhost), so it is better to set up your own single-node Hadoop environment.

    Downloading the packages

    hadoop-2.6.5.tar.gz (ideally the same version as the server, to avoid interface mismatches caused by version differences)

    Extract it.

    hadoop.dll-and-winutils.exe-for-hadoop2.7.3-on-windows_X64-master.zip

    Extract it and copy the contents into the hadoop/bin directory.

    Copy hadoop.dll to the C:\Windows\System32 directory.

    hadoop-eclipse-plugin-2.6.0.jar

    Copy it to the eclipse/plugins directory.

    Eclipse development environment configuration

    Copy log4j.properties and core-site.xml from the hadoop-2.6.5/etc/hadoop directory into the project's resources directory.

    The contents of core-site.xml are as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://192.168.223.150:8020</value>
        </property>
    </configuration>

    In fact, all you need to add is the HDFS location. Note: if the server address is hard-coded in your program, this configuration file is optional.
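
    As a minimal sketch of the hard-coded alternative (assuming the same 192.168.223.150:8020 NameNode address used throughout this article; the class name HdfsSmokeTest is just illustrative), the address can be set directly on the Configuration object and the connection verified by listing a directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    
    public class HdfsSmokeTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Equivalent to the fs.defaultFS entry in core-site.xml above.
            conf.set("fs.defaultFS", "hdfs://192.168.223.150:8020");
    
            // Quick connectivity check: list /user on the remote HDFS.
            try (FileSystem fs = FileSystem.get(conf)) {
                for (FileStatus status : fs.listStatus(new Path("/user"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }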

    Specify the Hadoop jars as external libraries for the project.

    Maven dependency configuration:

            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.6.0</version>
            </dependency>

    Parquet dependencies:

            <dependency>
                <groupId>org.apache.parquet</groupId>
                <artifactId>parquet</artifactId>
                <version>1.8.1</version>
                <type>pom</type>
            </dependency>
            <dependency>
                <groupId>org.apache.parquet</groupId>
                <artifactId>parquet-common</artifactId>
                <version>1.8.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.parquet</groupId>
                <artifactId>parquet-encoding</artifactId>
                <version>1.8.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.parquet</groupId>
                <artifactId>parquet-column</artifactId>
                <version>1.8.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.parquet</groupId>
                <artifactId>parquet-hadoop</artifactId>
                <version>1.8.1</version>
            </dependency>

    Hadoop location configuration:

    Once the above configuration is complete, you can develop Hadoop programs locally and run them directly against the remote HDFS, for example:

    package hadoop;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.parquet.example.data.Group;
    import org.apache.parquet.example.data.simple.SimpleGroupFactory;
    import org.apache.parquet.hadoop.ParquetOutputFormat;
    import org.apache.parquet.hadoop.example.GroupWriteSupport;
    import java.io.IOException;
    import java.util.Random;
    import java.util.StringTokenizer;
    import java.util.UUID;
    /**
     * <p>Title: ParquetNewMR</p>
     * <p>Description: word count job that writes its output in Parquet format</p>
     * @author zjhua
     * @date April 7, 2019
     */
    public class ParquetNewMR {
     
        public static class WordCountMap extends
                Mapper<LongWritable, Text, Text, IntWritable> {
     
            private final IntWritable one = new IntWritable(1);
            private Text word = new Text();
            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer token = new StringTokenizer(line);
                while (token.hasMoreTokens()) {
                    word.set(token.nextToken());
                    context.write(word, one);
                }
            }
        }
     
        public static class WordCountReduce extends
                Reducer<Text, IntWritable, Void, Group> {
            private SimpleGroupFactory factory;
            @Override
            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                Group group = factory.newGroup()
                        .append("name",  key.toString())
                        .append("age", sum);
                context.write(null,group);
            }
     
            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                super.setup(context);
                factory = new SimpleGroupFactory(GroupWriteSupport.getSchema(context.getConfiguration()));
     
            }
        }
     
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Parquet write schema: a required binary (string) field and a required int32 field.
            String writeSchema = "message example {\n" +
                    "required binary name;\n" +
                    "required int32 age;\n" +
                    "}";
            conf.set("parquet.example.schema",writeSchema);
    //        conf.set("dfs.client.use.datanode.hostname", "true");
     
            Job job = Job.getInstance(conf);  // Job(Configuration) is deprecated in Hadoop 2.x
            job.setJarByClass(ParquetNewMR.class);
            job.setJobName("parquet");
     
            String in = "hdfs://192.168.223.150:8020/user/hadoop1/wordcount/input";
            String out = "hdfs://192.168.223.150:8020/user/hadoop1/pq_out_" + UUID.randomUUID().toString();
     
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
     
            job.setOutputValueClass(Group.class);
     
            job.setMapperClass(WordCountMap.class);
            job.setReducerClass(WordCountReduce.class);
     
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(ParquetOutputFormat.class);
     
            FileInputFormat.addInputPath(job, new Path(in));
            ParquetOutputFormat.setOutputPath(job, new Path(out));
            ParquetOutputFormat.setWriteSupportClass(job, GroupWriteSupport.class);
     
            job.waitForCompletion(true);
        }
    }

    The output is as follows:

    19/04/20 13:15:12 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    19/04/20 13:15:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    19/04/20 13:15:13 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/04/20 13:15:13 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
    19/04/20 13:15:13 INFO input.FileInputFormat: Total input paths to process : 3
    19/04/20 13:15:13 INFO mapreduce.JobSubmitter: number of splits:3
    19/04/20 13:15:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local496876089_0001
    19/04/20 13:15:13 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    19/04/20 13:15:13 INFO mapreduce.Job: Running job: job_local496876089_0001
    19/04/20 13:15:13 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    19/04/20 13:15:13 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.parquet.hadoop.ParquetOutputCommitter
    19/04/20 13:15:13 INFO mapred.LocalJobRunner: Waiting for map tasks
    19/04/20 13:15:13 INFO mapred.LocalJobRunner: Starting task: attempt_local496876089_0001_m_000000_0
    19/04/20 13:15:13 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
    19/04/20 13:15:13 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@6d0fe6eb
    19/04/20 13:15:13 INFO mapred.MapTask: Processing split: hdfs://192.168.223.150:8020/user/hadoop1/wordcount/input/file2:0+34
    19/04/20 13:15:13 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/04/20 13:15:13 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/04/20 13:15:13 INFO mapred.MapTask: soft limit at 83886080
    19/04/20 13:15:13 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/04/20 13:15:13 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/04/20 13:15:13 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: 
    19/04/20 13:15:14 INFO mapred.MapTask: Starting flush of map output
    19/04/20 13:15:14 INFO mapred.MapTask: Spilling map output
    19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufend = 62; bufvoid = 104857600
    19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
    19/04/20 13:15:14 INFO mapred.MapTask: Finished spill 0
    19/04/20 13:15:14 INFO mapred.Task: Task:attempt_local496876089_0001_m_000000_0 is done. And is in the process of committing
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: map
    19/04/20 13:15:14 INFO mapred.Task: Task 'attempt_local496876089_0001_m_000000_0' done.
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local496876089_0001_m_000000_0
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: Starting task: attempt_local496876089_0001_m_000001_0
    19/04/20 13:15:14 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
    19/04/20 13:15:14 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@49728985
    19/04/20 13:15:14 INFO mapred.MapTask: Processing split: hdfs://192.168.223.150:8020/user/hadoop1/wordcount/input/file1:0+30
    19/04/20 13:15:14 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/04/20 13:15:14 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/04/20 13:15:14 INFO mapred.MapTask: soft limit at 83886080
    19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/04/20 13:15:14 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: 
    19/04/20 13:15:14 INFO mapred.MapTask: Starting flush of map output
    19/04/20 13:15:14 INFO mapred.MapTask: Spilling map output
    19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufend = 58; bufvoid = 104857600
    19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
    19/04/20 13:15:14 INFO mapred.MapTask: Finished spill 0
    19/04/20 13:15:14 INFO mapred.Task: Task:attempt_local496876089_0001_m_000001_0 is done. And is in the process of committing
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: map
    19/04/20 13:15:14 INFO mapred.Task: Task 'attempt_local496876089_0001_m_000001_0' done.
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local496876089_0001_m_000001_0
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: Starting task: attempt_local496876089_0001_m_000002_0
    19/04/20 13:15:14 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
    19/04/20 13:15:14 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@66abe49f
    19/04/20 13:15:14 INFO mapred.MapTask: Processing split: hdfs://192.168.223.150:8020/user/hadoop1/wordcount/input/file0:0+22
    19/04/20 13:15:14 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/04/20 13:15:14 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/04/20 13:15:14 INFO mapred.MapTask: soft limit at 83886080
    19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/04/20 13:15:14 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: 
    19/04/20 13:15:14 INFO mapred.MapTask: Starting flush of map output
    19/04/20 13:15:14 INFO mapred.MapTask: Spilling map output
    19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufend = 38; bufvoid = 104857600
    19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
    19/04/20 13:15:14 INFO mapreduce.Job: Job job_local496876089_0001 running in uber mode : false
    19/04/20 13:15:14 INFO mapreduce.Job:  map 67% reduce 0%
    19/04/20 13:15:14 INFO mapred.MapTask: Finished spill 0
    19/04/20 13:15:14 INFO mapred.Task: Task:attempt_local496876089_0001_m_000002_0 is done. And is in the process of committing
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: map
    19/04/20 13:15:14 INFO mapred.Task: Task 'attempt_local496876089_0001_m_000002_0' done.
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local496876089_0001_m_000002_0
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: map task executor complete.
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: Waiting for reduce tasks
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: Starting task: attempt_local496876089_0001_r_000000_0
    19/04/20 13:15:14 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
    19/04/20 13:15:14 INFO mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@51a03c64
    19/04/20 13:15:14 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@3f9676f4
    19/04/20 13:15:14 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
    19/04/20 13:15:14 INFO reduce.EventFetcher: attempt_local496876089_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
    19/04/20 13:15:14 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local496876089_0001_m_000002_0 decomp: 48 len: 52 to MEMORY
    19/04/20 13:15:14 INFO reduce.InMemoryMapOutput: Read 48 bytes from map-output for attempt_local496876089_0001_m_000002_0
    19/04/20 13:15:14 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 48, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->48
    19/04/20 13:15:14 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local496876089_0001_m_000000_0 decomp: 78 len: 82 to MEMORY
    19/04/20 13:15:14 INFO reduce.InMemoryMapOutput: Read 78 bytes from map-output for attempt_local496876089_0001_m_000000_0
    19/04/20 13:15:14 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 78, inMemoryMapOutputs.size() -> 2, commitMemory -> 48, usedMemory ->126
    19/04/20 13:15:14 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local496876089_0001_m_000001_0 decomp: 74 len: 78 to MEMORY
    19/04/20 13:15:14 INFO reduce.InMemoryMapOutput: Read 74 bytes from map-output for attempt_local496876089_0001_m_000001_0
    19/04/20 13:15:14 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 74, inMemoryMapOutputs.size() -> 3, commitMemory -> 126, usedMemory ->200
    19/04/20 13:15:14 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: 3 / 3 copied.
    19/04/20 13:15:14 INFO reduce.MergeManagerImpl: finalMerge called with 3 in-memory map-outputs and 0 on-disk map-outputs
    19/04/20 13:15:14 INFO mapred.Merger: Merging 3 sorted segments
    19/04/20 13:15:14 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 173 bytes
    19/04/20 13:15:14 INFO reduce.MergeManagerImpl: Merged 3 segments, 200 bytes to disk to satisfy reduce memory limit
    19/04/20 13:15:14 INFO reduce.MergeManagerImpl: Merging 1 files, 200 bytes from disk
    19/04/20 13:15:14 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
    19/04/20 13:15:14 INFO mapred.Merger: Merging 1 sorted segments
    19/04/20 13:15:14 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 187 bytes
    19/04/20 13:15:14 INFO mapred.LocalJobRunner: 3 / 3 copied.
    19/04/20 13:15:14 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    19/04/20 13:15:15 INFO mapred.Task: Task:attempt_local496876089_0001_r_000000_0 is done. And is in the process of committing
    19/04/20 13:15:15 INFO mapred.LocalJobRunner: 3 / 3 copied.
    19/04/20 13:15:15 INFO mapred.Task: Task attempt_local496876089_0001_r_000000_0 is allowed to commit now
    19/04/20 13:15:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local496876089_0001_r_000000_0' to hdfs://192.168.223.150:8020/user/hadoop1/pq_out_d05c6a75-3bbd-4f34-98ff-2f7b7a231de4/_temporary/0/task_local496876089_0001_r_000000
    19/04/20 13:15:15 INFO mapred.LocalJobRunner: reduce > reduce
    19/04/20 13:15:15 INFO mapred.Task: Task 'attempt_local496876089_0001_r_000000_0' done.
    19/04/20 13:15:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local496876089_0001_r_000000_0
    19/04/20 13:15:15 INFO mapred.LocalJobRunner: reduce task executor complete.
    19/04/20 13:15:15 INFO mapreduce.Job:  map 100% reduce 100%
    19/04/20 13:15:15 INFO mapreduce.Job: Job job_local496876089_0001 completed successfully
    19/04/20 13:15:15 INFO mapreduce.Job: Counters: 38
    	File System Counters
    		FILE: Number of bytes read=4340
    		FILE: Number of bytes written=1010726
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=270
    		HDFS: Number of bytes written=429
    		HDFS: Number of read operations=37
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=6
    	Map-Reduce Framework
    		Map input records=3
    		Map output records=18
    		Map output bytes=158
    		Map output materialized bytes=212
    		Input split bytes=381
    		Combine input records=0
    		Combine output records=0
    		Reduce input groups=12
    		Reduce shuffle bytes=212
    		Reduce input records=18
    		Reduce output records=12
    		Spilled Records=36
    		Shuffled Maps =3
    		Failed Shuffles=0
    		Merged Map outputs=3
    		GC time elapsed (ms)=34
    		CPU time spent (ms)=0
    		Physical memory (bytes) snapshot=0
    		Virtual memory (bytes) snapshot=0
    		Total committed heap usage (bytes)=1267204096
    	Shuffle Errors
    		BAD_ID=0
    		CONNECTION=0
    		IO_ERROR=0
    		WRONG_LENGTH=0
    		WRONG_MAP=0
    		WRONG_REDUCE=0
    	File Input Format Counters 
    		Bytes Read=86
    	File Output Format Counters 
    		Bytes Written=429
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression set to false
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 0 bytes
    2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 200
    2019-4-20 13:15:15 INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 131B for [name] BINARY: 12 values, 92B raw, 92B comp, 1 pages, encodings: [BIT_PACKED, PLAIN]
    2019-4-20 13:15:15 INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 39B for [age] INT32: 12 values, 6B raw, 6B comp, 1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 3 entries, 12B raw, 3B comp}
    2019-4-20 13:15:15 INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
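
    Note that the log shows LocalJobRunner: the job itself runs inside the local JVM and only reads from and writes to the remote HDFS. The following is a sketch of the extra settings typically needed to submit the job to the remote YARN cluster instead (the ResourceManager hostname and the jar path are assumptions that must match your environment); it would be applied to the job before waitForCompletion:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    
    public class YarnSubmitSettings {
        /** Extra settings usually needed when submitting from Windows to a remote YARN cluster. */
        public static void apply(Job job, String jobJarPath) {
            Configuration conf = job.getConfiguration();
            // Run on YARN instead of the LocalJobRunner seen in the log above.
            conf.set("mapreduce.framework.name", "yarn");
            conf.set("yarn.resourcemanager.hostname", "192.168.223.150");
            // Needed when the client is Windows and the cluster is Linux.
            conf.set("mapreduce.app-submission.cross-platform", "true");
            // Build the job jar and reference it explicitly, otherwise the
            // "No job jar file set" warning above turns into ClassNotFoundException on the cluster.
            job.setJar(jobJarPath);
        }
    }

    The output directory on HDFS can then be listed to check the result: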

    [hdfs@hadoop1 ~]$ hadoop fs -ls /user/hadoop1/pq_out*
    Found 4 items
    -rw-r--r-- 3 hdfs supergroup 0 2019-04-20 12:32 /user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/_SUCCESS
    -rw-r--r-- 3 hdfs supergroup 129 2019-04-20 12:32 /user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/_common_metadata
    -rw-r--r-- 3 hdfs supergroup 278 2019-04-20 12:32 /user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/_metadata
    -rw-r--r-- 3 hdfs supergroup 429 2019-04-20 12:32 /user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/part-r-00000.parquet
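
    To verify the contents, the Parquet output can be read back with the example API from parquet-hadoop, which is already on the classpath. A minimal sketch (the path below is the part-r-00000.parquet file from the listing above; the UUID suffix differs on every run):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.example.data.Group;
    import org.apache.parquet.hadoop.ParquetReader;
    import org.apache.parquet.hadoop.example.GroupReadSupport;
    
    public class ReadParquetResult {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.223.150:8020");
            Path file = new Path(
                    "/user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/part-r-00000.parquet");
    
            try (ParquetReader<Group> reader =
                         ParquetReader.builder(new GroupReadSupport(), file).withConf(conf).build()) {
                Group group;
                while ((group = reader.read()) != null) {
                    // Fields follow the write schema: required binary name; required int32 age.
                    System.out.println(group.getString("name", 0) + "\t" + group.getInteger("age", 0));
                }
            }
        }
    }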

    FAQ

    1. Deleting HDFS files via the hadoop eclipse-plugin fails with an error similar to:

    Unable to delete file 

    ....

    org.apache.hadoop.security.AccessControlException: Permission denied: user=test, access=WRITE, inode="pokes":hadoop:supergroup:rwxr-xr-x

    Solution 1: add a HADOOP_USER_NAME environment variable pointing to a user that has the required permissions, e.g. hdfs (see the sketch after this item).

    Solution 2: grant the user the necessary permissions, e.g. hadoop fs -chmod 777 /user/xxx

    There is also a fix circulating online: open the plugin's "Map/Reduce Locations" view, select a Location, open the "Advanced parameters" tab, find "hadoop.job.ugi" (in my case it was set to "test,Tardis"), change it to "hadoop,Tardis" and save. However, I could not find this parameter.

    Restart Eclipse.
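
    For Solution 1, besides an OS-level environment variable, the user can also be set from code via a JVM system property before the first HDFS access. A sketch assuming a non-Kerberos cluster (the /user/hadoop1/pokes path is only illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    
    public class DeleteAsHdfsUser {
        public static void main(String[] args) throws Exception {
            // Must be set before any FileSystem/Job code runs; ignored on Kerberos-secured clusters.
            System.setProperty("HADOOP_USER_NAME", "hdfs");
    
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.223.150:8020");
            try (FileSystem fs = FileSystem.get(conf)) {
                // The recursive delete now runs as user "hdfs" instead of the local Windows user.
                fs.delete(new Path("/user/hadoop1/pokes"), true);
            }
        }
    }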

    2. Problems such as org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z and "Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries"

    Solution: download the Windows build of winutils (ideally the version in hadoop.dll-and-winutils.exe-for-hadoop2.7.3-on-windows_X64-master should be higher than the local Hadoop version on Windows), and do not forget to copy hadoop.dll to the C:\Windows\System32 directory.
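
    Related to this, if HADOOP_HOME is not configured system-wide, the location of the unpacked Hadoop distribution can also be handed to the JVM before any Hadoop class is loaded. A sketch (the D:\hadoop-2.6.5 path is an assumed local install location; its bin directory must contain winutils.exe and hadoop.dll):

    public class WinutilsSetup {
        public static void main(String[] args) throws Exception {
            // Hadoop's Shell utility falls back to this property when the
            // HADOOP_HOME environment variable is not set.
            System.setProperty("hadoop.home.dir", "D:\\hadoop-2.6.5");
            // ... then run the HDFS/MapReduce code as usual.
        }
    }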
