Notice: author: 龚细军; date: 17-08-01; type: notes. When reposting, please credit the source and include the link. Link: http://www.cnblogs.com/gongxijun/p/5726024.html
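Everything recorded in these notes comes from hands-on operation. The Hadoop version used is hadoop-2.7.2 and the operating system is kylin-linux.
Assumption: a JDK is already installed, and Hadoop has already been downloaded and extracted.
1. After downloading and extracting Hadoop
Enter the installation directory. It contains the following folders and files: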
/hadoop-2.7.2$ ls
bin include lib LICENSE.txt NOTICE.txt README.txt share
etc input libexec logs output sbin wc-in
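A brief overview:
bin directory: holds the Hadoop commands, for example hadoop, hdfs, yarn, and mapred. This directory is important.
The commands can be used like this (Hadoop 2.x prints a deprecation notice for "hadoop dfs" and recommends "hdfs dfs" instead):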
/hadoop-2.7.2$ bin/hadoop dfs -cat output/* |more
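include directory: header files for C/C++ development
lib directory: libraries, including those for C/C++ development
etc directory: the configuration files; some other versions (for example Hadoop 1.x) use a conf directory instead. Inside etc/hadoop you will see: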
/hadoop-2.7.2/etc/hadoop$ ls | grep .xml
capacity-scheduler.xml
core-site.xml
hadoop-policy.xml
hdfs-site.xml
hdfs-site.xml~
httpfs-site.xml
kms-acls.xml
kms-site.xml
mapred-queues.xml.template
mapred-site.xml.template
ssl-client.xml.example
ssl-server.xml.example
yarn-site.xml
How to configure pseudo-distributed mode
1. Configure core-site.xml
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
2. Configure hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/gongxijun/HDFS/fileinput</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/gongxijun/HDFS/fileoutput</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
        <description>
            If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
        </description>
    </property>
</configuration>
3. Configure mapred-site.xml: first copy mapred-site.xml.template to mapred-site.xml, then edit mapred-site.xml as follows
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
After that, start Hadoop by running ./start-all.sh from the sbin directory.
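The three site files above share one layout: a configuration element holding property elements, each with a name and a value. As a rough illustration of that layout (not of how Hadoop itself loads its configuration), the following sketch reads such a document with the JDK's built-in XML parser and collects the name/value pairs. The class `SiteXmlSketch` and the method `parseProperties` are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only. Hadoop's own Configuration class
// does much more (default files, deprecated-key mapping, variable expansion).
final class SiteXmlSketch {

    // Returns the name -> value pairs of every <property> element, in document order.
    static Map<String, String> parseProperties(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or a property without <name>/<value> ends up here.
            throw new IllegalArgumentException("not a valid site file", e);
        }
    }

    public static void main(String[] args) {
        String coreSite =
                "<configuration>"
              + "<property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>"
              + "</configuration>";
        // Prints the single property taken from the core-site.xml content shown above.
        System.out.println(parseProperties(coreSite));
    }
}
```

Running a check like this against each edited file is a quick way to catch a missing closing tag before starting the daemons.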
pom.xml configuration for the program
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
    <scope>compile</scope>
    <exclusions>
        <exclusion>
            <artifactId>zookeeper</artifactId>
            <groupId>org.apache.zookeeper</groupId>
        </exclusion>
        <exclusion>
            <artifactId>slf4j-log4j12</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jsp-api</artifactId>
            <groupId>javax.servlet.jsp</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jasper-runtime</artifactId>
            <groupId>tomcat</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jasper-compiler</artifactId>
            <groupId>tomcat</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jersey-server</artifactId>
            <groupId>com.sun.jersey</groupId>
        </exclusion>
        <exclusion>
            <artifactId>asm</artifactId>
            <groupId>asm</groupId>
        </exclusion>
    </exclusions>
</dependency>
The program is as follows:
package com.qunar.mapReduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

/**
 * Author: XiJun.Gong
 * Date: 2016-07-29 14:59
 * Version: default 1.0.0
 * Class description:
 */
public class MapReduceDemo {

    // Mapper: emits (word, 1) for every whitespace-separated token of a line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        Scanner reader = new Scanner(System.in);
        // Standard input supplies pairs of tokens: an input path, then an output path.
        while (reader.hasNext()) {
            String inputPath = reader.next();
            if (!reader.hasNext()) {
                break; // an input path without an output path: nothing to run
            }
            String outputPath = reader.next();

            // A Job can be submitted only once, so create a new one for each pair.
            // Job.getInstance replaces the deprecated "new Job(...)" constructor.
            Job job = Job.getInstance(configuration, "wordCount");
            job.setJarByClass(MapReduceDemo.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(inputPath));
            FileOutputFormat.setOutputPath(job, new Path(outputPath));
            job.waitForCompletion(true);
        }
    }
}
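To see what the job computes without a cluster, the map, shuffle, and reduce steps can be imitated in plain Java. This is only a sketch of the data flow over in-memory lines; `WordCountSketch`, `mapLine`, and `wordCount` are hypothetical names, and none of this is Hadoop API. A `TreeMap` stands in for the shuffle, which groups the emitted values by key and hands the keys to the reducer in sorted order.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java imitation of the word count data flow; not Hadoop code.
final class WordCountSketch {

    // Map step: emit a (word, 1) pair for each whitespace-separated token,
    // the same tokenization the Mapper performs with StringTokenizer.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // Shuffle step: group all emitted values under their key, keys kept sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : mapLine(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Reduce step: sum the values of each group, as the Reducer does.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Java redis Java", "redis Guava Java");
        // Print one "word<TAB>count" line per key, like TextOutputFormat does.
        for (Map.Entry<String, Integer> entry : wordCount(lines).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Because tokens are split on whitespace only, punctuation stays attached to the words, which is why keys such as "Client:" appear in the real output.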
Running the program:
Connected to the target VM, address: '127.0.0.1:51980', transport: 'socket'
12:41:05.404 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.441 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.442 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[GetGroups], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.444 [main] DEBUG o.a.h.m.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
12:41:05.871 [main] DEBUG o.a.h.s.a.util.KerberosName - Kerberos krb5 configuration not found, setting default realm to empty
12:41:05.883 [main] DEBUG org.apache.hadoop.security.Groups - Creating new Groups object
12:41:05.895 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
12:41:05.896 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
12:41:05.896 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - java.library.path=/home/gongxijun/Qunar/idea-IU-139.1117.1/bin::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12:41:05.897 [main] WARN o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12:41:05.900 [main] DEBUG o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based
12:41:05.905 [main] DEBUG o.a.h.s.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
12:41:05.957 [main] DEBUG org.apache.hadoop.util.Shell - setsid exited with exit code 0
12:41:05.957 [main] DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
12:41:05.961 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login
12:41:05.962 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login commit
12:41:05.968 [main] DEBUG o.a.h.security.UserGroupInformation - using local user:UnixPrincipal: gongxijun
12:41:05.969 [main] DEBUG o.a.h.security.UserGroupInformation - Using user: "UnixPrincipal: gongxijun" with name gongxijun
12:41:05.969 [main] DEBUG o.a.h.security.UserGroupInformation - User entry: "gongxijun"
12:41:05.970 [main] DEBUG o.a.h.security.UserGroupInformation - UGI loginUser:gongxijun (auth:SIMPLE)
Input typed on standard input (the input path, then the output path):
/home/gongxijun/web进阶.txt
/home/gongxijun/a.txt
Output:
12:44:36.992 [main] INFO org.apache.hadoop.mapreduce.Job - Counters: 33
    File System Counters
        FILE: Number of bytes read=6316
        FILE: Number of bytes written=518809
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=84
        Map output records=85
        Map output bytes=1476
        Map output materialized bytes=1652
        Input split bytes=99
        Combine input records=0
        Combine output records=0
        Reduce input groups=82
        Reduce shuffle bytes=1652
        Reduce input records=85
        Reduce output records=82
        Spilled Records=170
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=9
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=459276288
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1335
    File Output Format Counters
        Bytes Written=1311
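Several of these counters can be related to one another. No combiner is configured (Combine input records=0), so every record the mapper emits reaches the reducer: Map output records and Reduce input records are both 85. Reduce input groups is the number of distinct keys (82), and since the reducer writes one record per key, Reduce output records is 82 as well. The sketch below recomputes the same three quantities for a small in-memory input. It is a plain-Java imitation with hypothetical names (`CounterSketch`, `count`), not Hadoop's counter API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

// Plain-Java imitation of three framework counters; not Hadoop code.
final class CounterSketch {

    // Returns {map input records, map output records, reduce input groups}.
    static long[] count(List<String> lines) {
        long mapInputRecords = 0;   // one record per input line
        long mapOutputRecords = 0;  // one record per emitted token
        Set<String> distinctKeys = new HashSet<>();
        for (String line : lines) {
            mapInputRecords++;
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                distinctKeys.add(tokenizer.nextToken());
                mapOutputRecords++;
            }
        }
        // Each distinct key forms one reduce group and yields one output record.
        return new long[] {mapInputRecords, mapOutputRecords, distinctKeys.size()};
    }

    public static void main(String[] args) {
        long[] c = count(Arrays.asList("redis Java", "", "Java"));
        System.out.println("Map input records=" + c[0]);
        System.out.println("Map output records=" + c[1]);
        System.out.println("Reduce input groups=" + c[2]);
    }
}
```

In the real run, the gap between 85 emitted records and 82 groups means only a few tokens occurred more than once, which matches the result listing, where almost every count is 1.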
The results are in the a.txt directory:
(kafuka卡夫卡) 1
(缺陷: 1
(需要重点学习) 1
---去查看QMQ--message---->broker 1
/ 1
1. 2
1.判断线程安全的两个机准: 1
2. 3
3. 1
Apache 1
Cache 1
Client: 1
ConCurrentHashMap 1
Dubbo 1
Executor 1
Futrue/CountDownLatch 1
Guava 1
HTTP: 1
HashMap 1
Hession 1
HttpComponents 1
Java 1
Json 1
Key-Value 1
Kryo(重点) 1
LRU 1
Protobuf 1
QMQ/AMQ/rabbitimq 1
ReadWriterLock 1
ReentrantLock 1
async-http-client 1
c3p0 1
client实现 1
dbpc 1
redis 1
seriialization 1
servlet 1
snchronized 1
spymemcached 1
tomcat-jdbc 1
xmemcached 1
一致性Hash 1
一: 1
三: 1
乐观锁: 1
二: 1
互斥 1
共享数据 1
分布式锁? 1
分布式: 1
前端轮询,后端异步: 1
单例的 1
参数回调 1
可复用资源,创建代价大 1
可扩展性,服务降级,负载均衡,灰度 1
可重入锁 1
可靠性 1
回顾 1
场景: 1
对象池: 1
将对象的状态信息转换为可以存储或传输形式的过程. 1
尽量不要使用本地缓存 1
并发修改 1
序列化: 1
建议: 1
异步调用 1
异步: 1
形成环) 1
性能 1
方式: 1
本地缓存太大,可以使用对象池 1
概念: 1
池化技术 1
消息队列: 1
类型: 1
线程池 1
缓存--本地 1
读写锁: 1
连接池: 1
(分段锁) 1
(推荐使用) 1
, 1
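The order of the keys in this listing comes from the shuffle, which sorts `Text` keys by comparing their UTF-8 encoded bytes as unsigned values. ASCII punctuation, digits, and Latin letters therefore come first, CJK ideographs follow, and full-width punctuation such as the full-width parenthesis and comma comes last. A minimal sketch of such a byte-wise comparison in plain Java follows; the names `Utf8OrderSketch` and `sortLikeShuffle` are hypothetical, and this is not Hadoop's own comparator code. The non-ASCII sample keys are written as Unicode escapes so the source stays ASCII-only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of ordering strings by their UTF-8 bytes.
final class Utf8OrderSketch {

    // Lexicographic comparison of UTF-8 bytes, each byte treated as unsigned.
    static final Comparator<String> UTF8_BYTE_ORDER = (a, b) -> {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int shared = Math.min(x.length, y.length);
        for (int i = 0; i < shared; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return x.length - y.length;
    };

    static List<String> sortLikeShuffle(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(UTF8_BYTE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "\uFF08\u5206\u6BB5\u9501\uFF09", // key wrapped in full-width parentheses (U+FF08, U+FF09)
                "redis",
                "\u4E00\u81F4\u6027Hash",         // key starting with the CJK ideograph U+4E00
                "Apache",
                "1.");
        for (String key : sortLikeShuffle(keys)) {
            System.out.println(key);
        }
    }
}
```

For these five keys the sketch yields the same relative order as the listing: "1.", "Apache", "redis", the key starting with a CJK ideograph, then the key starting with a full-width parenthesis.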