Hadoop Basics: A Summary


    I. What is Hadoop?

      Hadoop is an open-source platform for distributed storage and distributed computing.


    II. Hadoop has two core components:

      1. HDFS: a distributed file system that stores massive amounts of data

        a. Basic concepts

          - Block

            An HDFS file is split into blocks for storage; the default block size is 64 MB in Hadoop 1.x (Hadoop 2.x raised the default to 128 MB).

            The block is the logical unit of file storage and processing.

          - NameNode

            The master (management) node. It holds the file system metadata, including:

              (1) the mapping from files to data blocks

              (2) the mapping from data blocks to DataNodes


          - DataNode

            The HDFS worker node; it stores the actual data blocks. (The sketch below shows how to inspect both mappings from the command line.)
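
            On a running cluster, the file-to-block and block-to-DataNode mappings can be inspected with hdfs fsck. A quick sketch, assuming the file /test/mk.txt (used later in this post) exists:

    # Show which blocks make up the file and which DataNodes hold each block
    hdfs fsck /test/mk.txt -files -blocks -locations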

        b. Data management strategies

          1. Block replication

             Each data block is kept as three replicas, placed on three nodes across two racks, so that data survives node or rack failures.

          2. Heartbeat detection

            Every DataNode periodically sends a heartbeat to the NameNode; a node that stops reporting is treated as dead and its blocks are re-replicated elsewhere.

          3. Secondary NameNode

             The Secondary NameNode periodically merges the metadata image file (fsimage) with the edit log; if the NameNode fails, this checkpointed metadata can be used to bring a replacement NameNode online.

          4. HDFS file read flow

             The client asks the NameNode for the locations of a file's blocks, then reads each block directly from a nearby DataNode and reassembles the file.

          5. HDFS file write flow

             The client splits the file into blocks and asks the NameNode for target DataNodes; each block is written to the first DataNode, which pipelines it to the replica nodes, and the NameNode records the updated metadata.

          6. HDFS characteristics

             Data redundancy gives fault tolerance on commodity hardware.

             Streaming data access: write once, read many. Once written, a file cannot be modified in place; to change it, you delete and rewrite it.

             Designed for large files; large numbers of small files put heavy pressure on the NameNode, which tracks every block's metadata.

          7. Suitability and limitations

             Well suited to batch reads and writes, with high throughput.

             Not suited to interactive applications; low latency is hard to achieve.

             Well suited to write-once, read-many, sequential access.

             Does not support multiple users writing the same file concurrently.


      2. MapReduce: a parallel processing framework that handles task decomposition and scheduling

        a. How MapReduce works

          Divide and conquer: a large job is split into many small sub-tasks (map), which run in parallel on many nodes; their results are then merged (reduce). A minimal sketch follows.
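
          The following illustrates this idea in plain Python; it is a teaching toy rather than Hadoop itself, and the sample input lines are invented:

    # Word count expressed as explicit map, shuffle, and reduce phases.
    from itertools import groupby
    from operator import itemgetter

    lines = ["hello world", "hello hadoop"]

    # Map: emit a (word, 1) pair for every word in the input.
    mapped = [(word, 1) for line in lines for word in line.split()]

    # Shuffle: sort by key so identical words become adjacent.
    mapped.sort(key=itemgetter(0))

    # Reduce: sum the counts within each group of identical keys.
    for word, pairs in groupby(mapped, key=itemgetter(0)):
        print(word, sum(count for _, count in pairs))
    # Prints: hadoop 1, then hello 2, then world 1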

        b. MapReduce execution flow

          1. Basic concepts

            - Job & Task

              A job is decomposed into tasks: map tasks and reduce tasks.

            - JobTracker

              Schedules jobs

              Assigns tasks and monitors their progress

              Monitors TaskTracker status

            - TaskTracker

              Executes tasks

              Reports task status

          2. Job execution process

             The client submits a job to the JobTracker, which assigns map and reduce tasks to TaskTrackers; map output is shuffled and sorted, fed to the reducers, and the final result is written to HDFS.

          3. MapReduce fault-tolerance mechanisms (a tuning sketch follows)

             Re-execution: a failed map or reduce task is retried on another node, up to a retry limit.

             Speculative execution: if one task runs much slower than its peers, a duplicate copy is launched elsewhere, and whichever attempt finishes first wins.
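
            Both mechanisms can be tuned per job. A hedged sketch using the MRv2 property names (job.jar is a placeholder, and the job must accept generic options for -D to take effect):

    # mapreduce.map.maxattempts / mapreduce.reduce.maxattempts: retry limit (default 4)
    # mapreduce.map.speculative / mapreduce.reduce.speculative: speculative execution (default true)
    hadoop jar job.jar -D mapreduce.map.maxattempts=4 -D mapreduce.map.speculative=true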

    III. What can Hadoop be used for?

      Building large data warehouses: storing, processing, analyzing, and reporting on PB-scale data.

      Examples: search engines, business intelligence, log analysis, data mining.


    IV. Hadoop's advantages

      1. High scalability

        Capacity and performance grow simply by adding commodity hardware.

      2. Low cost

        Runs on ordinary PCs stacked into a cluster; reliability comes from software-level fault tolerance rather than expensive hardware.

      3. A mature ecosystem

        For example: Hive, HBase


    V. HDFS operations

      1. Shell commands

        Common HDFS shell commands (a sample session follows):

          Linux-like: ls, cat, mkdir, rm, chmod, chown

          Moving files in and out of HDFS: copyFromLocal, copyToLocal, get (download), put (upload)
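
        A short sample session; the paths are illustrative and assume a running HDFS:

    hdfs dfs -ls /                        # list the root directory
    hdfs dfs -mkdir -p /test              # create a directory
    hdfs dfs -put mk.txt /test/mk.txt     # upload a local file (like copyFromLocal)
    hdfs dfs -cat /test/mk.txt            # print the file's contents
    hdfs dfs -get /test/mk.txt ./mk.txt   # download to the local filesystem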

    VI. The Hadoop ecosystem

    VII. MapReduce in practice

      The example below reads a document and counts how many times each word occurs.

      First, create hdfs_map.py to read the input and emit (word, 1) pairs:

    # hdfs_map.py
    import sys


    def read_input(file):
        # Yield the list of words on each input line.
        for line in file:
            yield line.split()


    def main():
        data = read_input(sys.stdin)

        # Emit one "word<TAB>1" record per word; the tab is the
        # default key/value separator in Hadoop Streaming.
        for words in data:
            for word in words:
                print('{}\t1'.format(word))


    if __name__ == '__main__':
        main()

      Then create hdfs_reduce.py to sum the counts for each word:

    # hdfs_reduce.py

    import sys
    from operator import itemgetter
    from itertools import groupby


    def read_mapper_output(file, separator='\t'):
        # Split each "word<TAB>count" record into a (word, count) pair.
        for line in file:
            yield line.rstrip().split(separator, 1)


    def main():
        data = read_mapper_output(sys.stdin)

        # groupby only merges adjacent keys, which is safe here because
        # Hadoop sorts map output by key before it reaches the reducer.
        for current_word, group in groupby(data, itemgetter(0)):
            total_count = sum(int(count) for _, count in group)

            print('{} {}'.format(current_word, total_count))


    if __name__ == '__main__':
        main()
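
      Before submitting to the cluster, the two scripts can be sanity-checked locally. In this sketch, sort stands in for Hadoop's shuffle phase, which sorts map output by key:

    cat mk.txt | python3 hdfs_map.py | sort | python3 hdfs_reduce.py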

      Create a document mk.txt with some content, put it into HDFS, and then run the MapReduce job from the command line:

    hadoop jar /opt/hadoop-2.9.1/share/hadoop/tools/lib/hadoop-streaming-2.9.1.jar \
        -files '/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py,/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py' \
        -input /test/mk.txt \
        -output /output/wordcount \
        -mapper 'python3 hdfs_map.py' \
        -reducer 'python3 hdfs_reduce.py'

      The run produces output like the following:

    ➜  Documents hadoop jar /opt/hadoop-2.9.1/share/hadoop/tools/lib/hadoop-streaming-2.9.1.jar -files '/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py,/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py' -input /test/mk.txt -output /output/wordcount -mapper 'python3 hdfs_map.py' -reducer 'python3 hdfs_reduce.py'
    # output
    18/06/26 16:22:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    18/06/26 16:22:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    18/06/26 16:22:45 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    18/06/26 16:22:46 INFO mapred.FileInputFormat: Total input files to process : 1
    18/06/26 16:22:46 INFO mapreduce.JobSubmitter: number of splits:1
    18/06/26 16:22:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local49685846_0001
    18/06/26 16:22:46 INFO mapred.LocalDistributedCacheManager: Creating symlink: /home/zzf/hadoop_tmp/mapred/local/1530001366609/hdfs_map.py <- /home/zzf/Documents/hdfs_map.py
    18/06/26 16:22:46 INFO mapred.LocalDistributedCacheManager: Localized file:/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py as file:/home/zzf/hadoop_tmp/mapred/local/1530001366609/hdfs_map.py
    18/06/26 16:22:47 INFO mapred.LocalDistributedCacheManager: Creating symlink: /home/zzf/hadoop_tmp/mapred/local/1530001366610/hdfs_reduce.py <- /home/zzf/Documents/hdfs_reduce.py
    18/06/26 16:22:47 INFO mapred.LocalDistributedCacheManager: Localized file:/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py as file:/home/zzf/hadoop_tmp/mapred/local/1530001366610/hdfs_reduce.py
    18/06/26 16:22:47 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    18/06/26 16:22:47 INFO mapreduce.Job: Running job: job_local49685846_0001
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
    18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: Waiting for map tasks
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: Starting task: attempt_local49685846_0001_m_000000_0
    18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
    18/06/26 16:22:47 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    18/06/26 16:22:47 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/test/mk.txt:0+2267
    18/06/26 16:22:47 INFO mapred.MapTask: numReduceTasks: 1
    18/06/26 16:22:47 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    18/06/26 16:22:47 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    18/06/26 16:22:47 INFO mapred.MapTask: soft limit at 83886080
    18/06/26 16:22:47 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    18/06/26 16:22:47 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    18/06/26 16:22:47 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    18/06/26 16:22:47 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python3, hdfs_map.py]
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
    18/06/26 16:22:47 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
    18/06/26 16:22:47 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    18/06/26 16:22:47 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    18/06/26 16:22:47 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
    18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
    18/06/26 16:22:47 INFO streaming.PipeMapRed: Records R/W=34/1
    18/06/26 16:22:47 INFO streaming.PipeMapRed: MRErrorThread done
    18/06/26 16:22:47 INFO streaming.PipeMapRed: mapRedFinished
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: 
    18/06/26 16:22:47 INFO mapred.MapTask: Starting flush of map output
    18/06/26 16:22:47 INFO mapred.MapTask: Spilling map output
    18/06/26 16:22:47 INFO mapred.MapTask: bufstart = 0; bufend = 3013; bufvoid = 104857600
    18/06/26 16:22:47 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26212876(104851504); length = 1521/6553600
    18/06/26 16:22:47 INFO mapred.MapTask: Finished spill 0
    18/06/26 16:22:47 INFO mapred.Task: Task:attempt_local49685846_0001_m_000000_0 is done. And is in the process of committing
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: Records R/W=34/1
    18/06/26 16:22:47 INFO mapred.Task: Task 'attempt_local49685846_0001_m_000000_0' done.
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: Finishing task: attempt_local49685846_0001_m_000000_0
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: map task executor complete.
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: Waiting for reduce tasks
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: Starting task: attempt_local49685846_0001_r_000000_0
    18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
    18/06/26 16:22:47 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    18/06/26 16:22:47 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@257adccd
    18/06/26 16:22:47 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
    18/06/26 16:22:47 INFO reduce.EventFetcher: attempt_local49685846_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
    18/06/26 16:22:47 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local49685846_0001_m_000000_0 decomp: 3777 len: 3781 to MEMORY
    18/06/26 16:22:47 INFO reduce.InMemoryMapOutput: Read 3777 bytes from map-output for attempt_local49685846_0001_m_000000_0
    18/06/26 16:22:47 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 3777, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->3777
    18/06/26 16:22:47 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
    18/06/26 16:22:47 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
    18/06/26 16:22:47 INFO mapred.Merger: Merging 1 sorted segments
    18/06/26 16:22:47 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 3769 bytes
    18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merged 1 segments, 3777 bytes to disk to satisfy reduce memory limit
    18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merging 1 files, 3781 bytes from disk
    18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
    18/06/26 16:22:47 INFO mapred.Merger: Merging 1 sorted segments
    18/06/26 16:22:47 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 3769 bytes
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
    18/06/26 16:22:47 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python3, hdfs_reduce.py]
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    18/06/26 16:22:47 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
    18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
    18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
    18/06/26 16:22:47 INFO streaming.PipeMapRed: Records R/W=381/1
    18/06/26 16:22:47 INFO streaming.PipeMapRed: MRErrorThread done
    18/06/26 16:22:47 INFO streaming.PipeMapRed: mapRedFinished
    18/06/26 16:22:47 INFO mapred.Task: Task:attempt_local49685846_0001_r_000000_0 is done. And is in the process of committing
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
    18/06/26 16:22:47 INFO mapred.Task: Task attempt_local49685846_0001_r_000000_0 is allowed to commit now
    18/06/26 16:22:47 INFO output.FileOutputCommitter: Saved output of task 'attempt_local49685846_0001_r_000000_0' to hdfs://localhost:9000/output/wordcount/_temporary/0/task_local49685846_0001_r_000000
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: Records R/W=381/1 > reduce
    18/06/26 16:22:47 INFO mapred.Task: Task 'attempt_local49685846_0001_r_000000_0' done.
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: Finishing task: attempt_local49685846_0001_r_000000_0
    18/06/26 16:22:47 INFO mapred.LocalJobRunner: reduce task executor complete.
    18/06/26 16:22:48 INFO mapreduce.Job: Job job_local49685846_0001 running in uber mode : false
    18/06/26 16:22:48 INFO mapreduce.Job:  map 100% reduce 100%
    18/06/26 16:22:48 INFO mapreduce.Job: Job job_local49685846_0001 completed successfully
    18/06/26 16:22:48 INFO mapreduce.Job: Counters: 35
        File System Counters
            FILE: Number of bytes read=279474
            FILE: Number of bytes written=1220325
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=4534
            HDFS: Number of bytes written=2287
            HDFS: Number of read operations=13
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=4
        Map-Reduce Framework
            Map input records=34
            Map output records=381
            Map output bytes=3013
            Map output materialized bytes=3781
            Input split bytes=85
            Combine input records=0
            Combine output records=0
            Reduce input groups=236
            Reduce shuffle bytes=3781
            Reduce input records=381
            Reduce output records=236
            Spilled Records=762
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=0
            Total committed heap usage (bytes)=536870912
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=2267
        File Output Format Counters 
            Bytes Written=2287
    18/06/26 16:22:48 INFO streaming.StreamJob: Output directory: /output/wordcount

      View the result:

    ➜  Documents hdfs dfs -cat /output/wordcount/part-00000
    # output
    "Even 1
    "My 1
    "We 1
    (16ft) 1
    11 1
    16, 1
    17-member 1
    25-year-old 1
    5m 1
    AFP. 1
    BBC's 1
    Bangkok 1
    But 1
    Chiang 1
    Constant 1
    Deputy 1
    Desperate 1
    Head, 1
    How 1
    I'm 1
    Jonathan 1
    June 1
    Luang 2
    Minister 1
    Myanmar, 1
    Nang 2
    Navy 2
    Non 2
    October. 1
    PM 1
    Post, 1
    Prawit 1
    Prime 1
    Rai 1
    Rescue 2
    Royal 1
    Saturday 2
    Saturday. 1
    Thai 1
    Thailand's 2
    Tham 2
    The 6
    They 2
    Tuesday 1
    Tuesday. 2
    Wongsuwon 1
    a 8
    able 1
    according 2
    after 2
    afternoon. 1
    aged 1
    alive, 1
    alive," 1
    all 1
    along 1
    and 6
    anything 1
    are 5
    areas 1
    as 1
    at 2
    attraction 1
    authorities 1
    be 1
    been 2
    began 1
    believed 1
    between 1
    bicycles 1
    border 1
    boys 1
    boys, 1
    briefly 1
    bring 1
    but 1
    by 1
    camping 1
    can 1
    case 1
    cave 9
    cave, 3
    cave. 1
    cave.According 1
    ceremony 1
    chamber 1
    child, 1
    coach 3
    completely 1
    complex, 1
    correspondent. 1
    cross 1
    crying 1
    day. 1
    deputy 1
    dive 1
    divers 2
    down. 1
    drink."The 1
    drones, 1
    during 1
    early 1
    eat, 1
    efforts 1
    efforts, 2
    enter 1
    entered 2
    enters 1
    equipment 1
    extensive 1
    flood 1
    floods. 1
    footballers 1
    footprints 1
    for 4
    found 1
    fresh 1
    from 2
    gear, 1
    get 1
    group 1
    group's 1
    had 2
    halted 2
    hampered 1
    hampering 1
    has 1
    have 6
    he 1
    here 1
    holding 1
    hopes 1
    if 1
    in 3
    inaccessible 1
    include 1
    inside 3
    into 1
    is 4
    it 1
    kilometres 1
    levels 1
    lies 1
    local 1
    making 1
    many 1
    may 1
    missing. 1
    must 1
    navy 1
    near 1
    network. 1
    night 1
    not 1
    now," 1
    of 4
    officials. 1
    on 5
    one 1
    optimistic 2
    our 1
    out 2
    outside 2
    parent 1
    pools 1
    poor 1
    prayer 1
    preparing 1
    province 1
    pumping 1
    rainfall 1
    rainy 1
    raising 1
    re-enter 1
    relatives 1
    reported 1
    reportedly 1
    rescue 1
    resumed 1
    return. 1
    rising 2
    runs 2
    safe 1
    safety. 1
    said 3
    said, 1
    says 1
    scene, 1
    scuba 1
    search 3
    search. 1
    searching 1
    season, 1
    seen 1
    sent 1
    should 1
    small 1
    sports 1
    started 1
    still 2
    stream 2
    submerged, 1
    team 3
    teams 1
    the 23
    their 5
    them 1
    these 1
    they 5
    third 1
    though 1
    thought 1
    through 1
    to 17
    tourist 1
    train 1
    trapped 1
    trapped? 1
    try 1
    underground. 1
    underwater 1
    unit 1
    up 1
    use 1
    visibility 1
    visitors 1
    was 2
    water 2
    waters 2
    were 4
    which 5
    who 1
    with 2
    workers 1
    you 1
    young 1

    VIII. Question 1: How can small files be stored efficiently in Hadoop?

      1. Let the application manage them itself (e.g., batch small files together before writing)

      2. Hadoop Archive (HAR; see the sketch after this list)

      3. SequenceFile / MapFile

      4. CombineFileInputFormat

      5. Merge small files, as HBase does with its compactions
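
      For option 2, the hadoop archive tool packs many small files into a single HAR so the NameNode tracks far fewer objects. A sketch with illustrative paths:

    # Archive everything under /test/smallfiles into /test/archived/small.har
    hadoop archive -archiveName small.har -p /test/smallfiles /test/archived

    # The archived files remain readable through the har:// scheme
    hdfs dfs -ls har:///test/archived/small.har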

      Question 2: When a node fails, how does the Hadoop cluster continue to provide service, and how are reads and writes handled?

      Question 3: What factors affect MapReduce performance?
