I. What is Hadoop?
Hadoop is an open-source platform for distributed storage and distributed computing.
II. Hadoop's two core components
1. HDFS: a distributed file system that stores massive amounts of data
a. Basic concepts (see the inspection commands after this list)
- Block
HDFS files are split into blocks for storage; the default block size is 64 MB in Hadoop 1.x (128 MB from Hadoop 2.x on).
A block is the logical unit of file storage and processing.
- NameNode
The management node; it stores the file system metadata, including:
(1) the mapping from files to data blocks
(2) the mapping from data blocks to DataNodes
- DataNode
The worker node of HDFS; it stores the actual data blocks.
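These concepts can be checked on a live cluster; a brief sketch, assuming HDFS is running and a file such as /test/mk.txt exists (both are assumptions):

hdfs getconf -confKey dfs.blocksize                # print the configured block size in bytes
hdfs fsck /test/mk.txt -files -blocks -locations   # show the file's blocks and the DataNodes holding each replica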
b. Data management strategy
1) Block replication
Each data block is stored as three replicas by default, placed on three nodes across two racks, so that data survives node or rack failures (see the setrep example after this list).
2) Heartbeat detection
Each DataNode periodically sends heartbeat messages to the NameNode; a node that stops reporting is treated as dead and its blocks are re-replicated elsewhere.
3) Secondary NameNode
The Secondary NameNode periodically merges the metadata image file (fsimage) with the edit log; if the NameNode fails, this checkpoint is used to bring a replacement online.
4) HDFS file read flow
The client asks the NameNode for the file's block locations, reads each block directly from a DataNode, and reassembles the file.
5) HDFS file write flow
The client splits the file into blocks and asks the NameNode where to place them; each block is written to the first assigned DataNode, which pipelines it to the replica nodes. The DataNodes report completion, and the NameNode updates the metadata.
6) Characteristics of HDFS
Data redundancy provides tolerance of hardware failures.
Streaming data access: write once, read many. Once written, a file cannot be modified; to change it, you must delete it and rewrite it.
Designed for large files; large numbers of small files put heavy pressure on the NameNode.
7) Suitability and limitations
Suited to batch reads and writes; throughput is high.
Not suited to interactive applications; low latency is hard to achieve.
Suited to write-once, read-many, sequential access.
Does not support multiple users writing to the same file concurrently.
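Replication and heartbeat behavior can also be observed directly; a hedged sketch (the path is illustrative):

hdfs dfs -setrep -w 2 /test/mk.txt   # change a file's replication factor and wait for it to take effect
hdfs dfsadmin -report                # list live/dead DataNodes as determined by heartbeats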
2. MapReduce: a parallel processing framework that handles task decomposition and scheduling
a. How MapReduce works
Divide and conquer: a large task is split into many small subtasks (map) that run in parallel on multiple nodes, and their results are then merged (reduce). A minimal local sketch of this idea follows.
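The same idea can be simulated in a few lines of plain Python, independent of Hadoop; this is only an illustrative sketch (all names are invented for the example):

# Divide and conquer, simulated locally
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    # "map": each node counts the words in its own slice of the data
    return Counter(chunk.split())

def merge_counts(a, b):
    # "reduce": merge the partial results from the map phase
    return a + b

# Pretend each string lives on a different node
chunks = ["hello hadoop", "hello mapreduce hadoop"]
partials = [map_chunk(c) for c in chunks]
total = reduce(merge_counts, partials, Counter())
print(total)  # Counter({'hello': 2, 'hadoop': 2, 'mapreduce': 1})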
b. MapReduce execution flow
1) Basic concepts
- Job & Task
A job is decomposed into tasks: job → tasks (map tasks and reduce tasks)
- JobTracker
Schedules jobs;
assigns tasks and monitors their execution progress;
monitors the status of TaskTrackers.
(In Hadoop 2.x, these responsibilities moved to YARN's ResourceManager and per-job ApplicationMaster.)
- TaskTracker
Executes tasks;
reports task status to the JobTracker.
2) Job execution flow
The input is split, map tasks process the splits in parallel, the intermediate key-value pairs are shuffled and sorted by key, and reduce tasks merge them into the final output on HDFS.
3) MapReduce fault-tolerance mechanisms (a toy sketch of the retry idea follows this list)
- Re-execution: a failed map or reduce task is retried, by default up to four attempts, before the whole job is declared failed.
- Speculative execution: if one task runs much more slowly than its peers, a duplicate is launched on another node, and whichever copy finishes first wins.
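A loose illustration of the re-execution idea only, not Hadoop's actual scheduler code (all names here are invented):

import random

MAX_ATTEMPTS = 4  # matches Hadoop's default per-task attempt limit

def flaky_task(data):
    # Stand-in for a map or reduce task that sometimes fails
    if random.random() < 0.5:
        raise RuntimeError("task failed")
    return sum(data)

def run_with_retries(task, data, max_attempts=MAX_ATTEMPTS):
    for attempt in range(1, max_attempts + 1):
        try:
            return task(data)
        except RuntimeError:
            print("attempt {} failed, rescheduling".format(attempt))
    raise RuntimeError("job failed after {} attempts".format(max_attempts))

print(run_with_retries(flaky_task, [1, 2, 3]))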
III. What can Hadoop be used for?
Building large data warehouses: storage, processing, analysis, and statistics over PB-scale data.
Examples: search engines, business intelligence, log analysis, data mining.
IV. Hadoop's advantages
1. High scalability
Performance and capacity can be increased simply by adding hardware.
2. Low cost
Runs on ordinary PCs; the system is built by stacking commodity machines, with reliability ensured through software-level fault tolerance.
3. A mature ecosystem
e.g., Hive, HBase
V. HDFS operations
1. Shell commands
Common HDFS shell commands (examples after this list):
Linux-like commands: ls, cat, mkdir, rm, chmod, chown, etc.
Moving files between the local file system and HDFS: copyFromLocal, copyToLocal, get (download), put (upload)
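A few illustrative invocations (the paths are placeholders, not from the original cluster):

hdfs dfs -ls /                                          # list the HDFS root directory
hdfs dfs -mkdir -p /test                                # create a directory in HDFS
hdfs dfs -cat /test/mk.txt                              # print a file stored in HDFS
hdfs dfs -get /output/wordcount/part-00000 result.txt   # download a file to the local disk

`hdfs dfs` is the modern entry point; the older `hadoop fs` form works the same way.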
VI. The Hadoop ecosystem
Beyond the HDFS and MapReduce core, the ecosystem includes higher-level projects such as Hive and HBase, mentioned above.
VII. MapReduce in practice
This example reads a text document and counts how many times each word occurs in it.
First, create hdfs_map.py to read the input data:
# hdfs_map.py
import sys

def read_input(file):
    # Yield the whitespace-separated words on each input line
    for line in file:
        yield line.split()

def main():
    data = read_input(sys.stdin)
    for words in data:
        for word in words:
            # Emit "word 1" for every occurrence; Hadoop Streaming
            # sorts these pairs by key before the reduce phase
            print('{} 1'.format(word))

if __name__ == '__main__':
    main()
Then create hdfs_reduce.py to total the counts for each word:
# hdfs_reduce.py
import sys
from operator import itemgetter
from itertools import groupby

def read_mapper_output(file, separator=' '):
    # Split each "word count" line into a (word, count) pair
    for line in file:
        yield line.rstrip().split(separator, 1)

def main():
    data = read_mapper_output(sys.stdin)
    # Input arrives sorted by word, so groupby gathers all
    # counts for one word into a single group
    for current_word, group in groupby(data, itemgetter(0)):
        total_count = sum(int(count) for current_word, count in group)
        print('{} {}'.format(current_word, total_count))

if __name__ == '__main__':
    main()
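Before submitting the job, the two scripts can be sanity-checked locally; the sort command stands in for Hadoop's shuffle-and-sort phase, since hdfs_reduce.py expects its input grouped by word:

cat mk.txt | python3 hdfs_map.py | sort | python3 hdfs_reduce.py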
Prepare a document mk.txt with some text content, then upload it into HDFS.
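For example, assuming mk.txt is in the current local directory (the HDFS path matches the -input argument used below):

hdfs dfs -put mk.txt /test/mk.txt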
Run the MapReduce job from the command line:
hadoop jar /opt/hadoop-2.9.1/share/hadoop/tools/lib/hadoop-streaming-2.9.1.jar -files '/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py,/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py' -input /test/mk.txt -output /output/wordcount -mapper 'python3 hdfs_map.py' -reducer 'python3 hdfs_reduce.py'
The run proceeds as follows:
➜ Documents hadoop jar /opt/hadoop-2.9.1/share/hadoop/tools/lib/hadoop-streaming-2.9.1.jar -files '/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py,/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py' -input /test/mk.txt -output /output/wordcount -mapper 'python3 hdfs_map.py' -reducer 'python3 hdfs_reduce.py'
# Output
18/06/26 16:22:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/06/26 16:22:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/06/26 16:22:45 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
18/06/26 16:22:46 INFO mapred.FileInputFormat: Total input files to process : 1
18/06/26 16:22:46 INFO mapreduce.JobSubmitter: number of splits:1
18/06/26 16:22:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local49685846_0001
18/06/26 16:22:46 INFO mapred.LocalDistributedCacheManager: Creating symlink: /home/zzf/hadoop_tmp/mapred/local/1530001366609/hdfs_map.py <- /home/zzf/Documents/hdfs_map.py
18/06/26 16:22:46 INFO mapred.LocalDistributedCacheManager: Localized file:/home/zzf/Git/Data_analysis/Hadoop/hdfs_map.py as file:/home/zzf/hadoop_tmp/mapred/local/1530001366609/hdfs_map.py
18/06/26 16:22:47 INFO mapred.LocalDistributedCacheManager: Creating symlink: /home/zzf/hadoop_tmp/mapred/local/1530001366610/hdfs_reduce.py <- /home/zzf/Documents/hdfs_reduce.py
18/06/26 16:22:47 INFO mapred.LocalDistributedCacheManager: Localized file:/home/zzf/Git/Data_analysis/Hadoop/hdfs_reduce.py as file:/home/zzf/hadoop_tmp/mapred/local/1530001366610/hdfs_reduce.py
18/06/26 16:22:47 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/06/26 16:22:47 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/06/26 16:22:47 INFO mapreduce.Job: Running job: job_local49685846_0001
18/06/26 16:22:47 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/06/26 16:22:47 INFO mapred.LocalJobRunner: Waiting for map tasks
18/06/26 16:22:47 INFO mapred.LocalJobRunner: Starting task: attempt_local49685846_0001_m_000000_0
18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/06/26 16:22:47 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/06/26 16:22:47 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/test/mk.txt:0+2267
18/06/26 16:22:47 INFO mapred.MapTask: numReduceTasks: 1
18/06/26 16:22:47 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/06/26 16:22:47 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/06/26 16:22:47 INFO mapred.MapTask: soft limit at 83886080
18/06/26 16:22:47 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/06/26 16:22:47 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/06/26 16:22:47 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/06/26 16:22:47 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python3, hdfs_map.py]
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
18/06/26 16:22:47 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
18/06/26 16:22:47 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/06/26 16:22:47 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/06/26 16:22:47 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
18/06/26 16:22:47 INFO streaming.PipeMapRed: Records R/W=34/1
18/06/26 16:22:47 INFO streaming.PipeMapRed: MRErrorThread done
18/06/26 16:22:47 INFO streaming.PipeMapRed: mapRedFinished
18/06/26 16:22:47 INFO mapred.LocalJobRunner:
18/06/26 16:22:47 INFO mapred.MapTask: Starting flush of map output
18/06/26 16:22:47 INFO mapred.MapTask: Spilling map output
18/06/26 16:22:47 INFO mapred.MapTask: bufstart = 0; bufend = 3013; bufvoid = 104857600
18/06/26 16:22:47 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26212876(104851504); length = 1521/6553600
18/06/26 16:22:47 INFO mapred.MapTask: Finished spill 0
18/06/26 16:22:47 INFO mapred.Task: Task:attempt_local49685846_0001_m_000000_0 is done. And is in the process of committing
18/06/26 16:22:47 INFO mapred.LocalJobRunner: Records R/W=34/1
18/06/26 16:22:47 INFO mapred.Task: Task 'attempt_local49685846_0001_m_000000_0' done.
18/06/26 16:22:47 INFO mapred.LocalJobRunner: Finishing task: attempt_local49685846_0001_m_000000_0
18/06/26 16:22:47 INFO mapred.LocalJobRunner: map task executor complete.
18/06/26 16:22:47 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/06/26 16:22:47 INFO mapred.LocalJobRunner: Starting task: attempt_local49685846_0001_r_000000_0
18/06/26 16:22:47 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/06/26 16:22:47 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/06/26 16:22:47 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/06/26 16:22:47 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@257adccd
18/06/26 16:22:47 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/06/26 16:22:47 INFO reduce.EventFetcher: attempt_local49685846_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/06/26 16:22:47 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local49685846_0001_m_000000_0 decomp: 3777 len: 3781 to MEMORY
18/06/26 16:22:47 INFO reduce.InMemoryMapOutput: Read 3777 bytes from map-output for attempt_local49685846_0001_m_000000_0
18/06/26 16:22:47 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 3777, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->3777
18/06/26 16:22:47 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/26 16:22:47 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/06/26 16:22:47 INFO mapred.Merger: Merging 1 sorted segments
18/06/26 16:22:47 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 3769 bytes
18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merged 1 segments, 3777 bytes to disk to satisfy reduce memory limit
18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merging 1 files, 3781 bytes from disk
18/06/26 16:22:47 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/06/26 16:22:47 INFO mapred.Merger: Merging 1 sorted segments
18/06/26 16:22:47 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 3769 bytes
18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/26 16:22:47 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python3, hdfs_reduce.py]
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
18/06/26 16:22:47 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
18/06/26 16:22:47 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
18/06/26 16:22:47 INFO streaming.PipeMapRed: Records R/W=381/1
18/06/26 16:22:47 INFO streaming.PipeMapRed: MRErrorThread done
18/06/26 16:22:47 INFO streaming.PipeMapRed: mapRedFinished
18/06/26 16:22:47 INFO mapred.Task: Task:attempt_local49685846_0001_r_000000_0 is done. And is in the process of committing
18/06/26 16:22:47 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/06/26 16:22:47 INFO mapred.Task: Task attempt_local49685846_0001_r_000000_0 is allowed to commit now
18/06/26 16:22:47 INFO output.FileOutputCommitter: Saved output of task 'attempt_local49685846_0001_r_000000_0' to hdfs://localhost:9000/output/wordcount/_temporary/0/task_local49685846_0001_r_000000
18/06/26 16:22:47 INFO mapred.LocalJobRunner: Records R/W=381/1 > reduce
18/06/26 16:22:47 INFO mapred.Task: Task 'attempt_local49685846_0001_r_000000_0' done.
18/06/26 16:22:47 INFO mapred.LocalJobRunner: Finishing task: attempt_local49685846_0001_r_000000_0
18/06/26 16:22:47 INFO mapred.LocalJobRunner: reduce task executor complete.
18/06/26 16:22:48 INFO mapreduce.Job: Job job_local49685846_0001 running in uber mode : false
18/06/26 16:22:48 INFO mapreduce.Job: map 100% reduce 100%
18/06/26 16:22:48 INFO mapreduce.Job: Job job_local49685846_0001 completed successfully
18/06/26 16:22:48 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=279474
        FILE: Number of bytes written=1220325
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=4534
        HDFS: Number of bytes written=2287
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=34
        Map output records=381
        Map output bytes=3013
        Map output materialized bytes=3781
        Input split bytes=85
        Combine input records=0
        Combine output records=0
        Reduce input groups=236
        Reduce shuffle bytes=3781
        Reduce input records=381
        Reduce output records=236
        Spilled Records=762
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=536870912
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=2267
    File Output Format Counters
        Bytes Written=2287
18/06/26 16:22:48 INFO streaming.StreamJob: Output directory: /output/wordcount
View the results:
➜ Documents hdfs dfs -cat /output/wordcount/part-00000
# Output
"Even 1
"My 1
"We 1
(16ft) 1
11 1
16, 1
17-member 1
25-year-old 1
5m 1
AFP. 1
BBC's 1
Bangkok 1
But 1
Chiang 1
Constant 1
Deputy 1
Desperate 1
Head, 1
How 1
I'm 1
Jonathan 1
June 1
Luang 2
Minister 1
Myanmar, 1
Nang 2
Navy 2
Non 2
October. 1
PM 1
Post, 1
Prawit 1
Prime 1
Rai 1
Rescue 2
Royal 1
Saturday 2
Saturday. 1
Thai 1
Thailand's 2
Tham 2
The 6
They 2
Tuesday 1
Tuesday. 2
Wongsuwon 1
a 8
able 1
according 2
after 2
afternoon. 1
aged 1
alive, 1
alive," 1
all 1
along 1
and 6
anything 1
are 5
areas 1
as 1
at 2
attraction 1
authorities 1
be 1
been 2
began 1
believed 1
between 1
bicycles 1
border 1
boys 1
boys, 1
briefly 1
bring 1
but 1
by 1
camping 1
can 1
case 1
cave 9
cave, 3
cave. 1
cave.According 1
ceremony 1
chamber 1
child, 1
coach 3
completely 1
complex, 1
correspondent. 1
cross 1
crying 1
day. 1
deputy 1
dive 1
divers 2
down. 1
drink."The 1
drones, 1
during 1
early 1
eat, 1
efforts 1
efforts, 2
enter 1
entered 2
enters 1
equipment 1
extensive 1
flood 1
floods. 1
footballers 1
footprints 1
for 4
found 1
fresh 1
from 2
gear, 1
get 1
group 1
group's 1
had 2
halted 2
hampered 1
hampering 1
has 1
have 6
he 1
here 1
holding 1
hopes 1
if 1
in 3
inaccessible 1
include 1
inside 3
into 1
is 4
it 1
kilometres 1
levels 1
lies 1
local 1
making 1
many 1
may 1
missing. 1
must 1
navy 1
near 1
network. 1
night 1
not 1
now," 1
of 4
officials. 1
on 5
one 1
optimistic 2
our 1
out 2
outside 2
parent 1
pools 1
poor 1
prayer 1
preparing 1
province 1
pumping 1
rainfall 1
rainy 1
raising 1
re-enter 1
relatives 1
reported 1
reportedly 1
rescue 1
resumed 1
return. 1
rising 2
runs 2
safe 1
safety. 1
said 3
said, 1
says 1
scene, 1
scuba 1
search 3
search. 1
searching 1
season, 1
seen 1
sent 1
should 1
small 1
sports 1
started 1
still 2
stream 2
submerged, 1
team 3
teams 1
the 23
their 5
them 1
these 1
they 5
third 1
though 1
thought 1
through 1
to 17
tourist 1
train 1
trapped 1
trapped? 1
try 1
underground. 1
underwater 1
unit 1
up 1
use 1
visibility 1
visitors 1
was 2
water 2
waters 2
were 4
which 5
who 1
with 2
workers 1
you 1
young 1
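Note that "cave", "cave," and "cave." are counted as different words, because the mapper splits only on whitespace. If normalized counts are wanted, the mapper could lowercase each word and strip surrounding punctuation before emitting it; a minimal sketch of such a tweak (an assumption about the desired behavior, not part of the original job):

# Possible addition to hdfs_map.py
import string

def normalize(word):
    # Lowercase and strip leading/trailing punctuation, so "Cave." -> "cave"
    return word.lower().strip(string.punctuation)

Empty strings returned by normalize (e.g., for pure-punctuation tokens) should be skipped before printing.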
VIII. Question 1: How can small files be stored in Hadoop?
1. Let the application manage them itself
2. Hadoop Archives (the hadoop archive tool; see the example after this list)
3. SequenceFile / MapFile
4. CombineFileInputFormat
5. Merge small files, as HBase does in its compaction step
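For option 2, the archive tool packs many small files into a single HAR file; a hedged sketch (the paths are illustrative):

# Pack everything under /test into small_files.har, stored in /archives
hadoop archive -archiveName small_files.har -p /test /archives
# The archived files are then read through the har:// scheme
hdfs dfs -ls har:///archives/small_files.har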
Question 2: When a node fails, how does the Hadoop cluster continue to provide service? How do reads and writes proceed?
Question 3: What factors affect MapReduce performance?