• HDFS basic configuration notes


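    Chapter 5: HDFS
    I. Working with HDFS
    1. Web Console: port 50070
    2. Command line: two types of commands (operation commands such as hdfs dfs, and administration commands such as hdfs dfsadmin)
    3. Java API

    II. How HDFS transfers data (draw a diagram): fairly important
    1. How data upload works (the process)
    2. How data download works (the process)

    Memory used to cache metadata: 1000 MB (default)
    	Directory: /root/training/hadoop-2.7.3/etc/hadoop
    	File: hadoop-env.sh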
    	# The maximum amount of heap to use, in MB. Default is 1000.
    	#export HADOOP_HEAPSIZE=
    	#export HADOOP_NAMENODE_INIT_HEAPSIZE=""	
    

    III. Advanced HDFS features
    1. Trash (recycle bin)
    -rmr: deletes a directory, including its subdirectories (deprecated in favor of -rm -r)
    hdfs dfs -rmr /bbb
    Log:
    17/12/08 20:32:10 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /bbb

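    	(*) By default, the HDFS trash is disabled (Deletion interval = 0 minutes in the log above)
    	(*) Enabling the trash: set the fs.trash.interval parameter in core-site.xml (the value is in minutes; 1440 = 24 hours)
    		 In essence: with the trash enabled, deleting data is really a cut (Ctrl+X) operation; the data is moved, not destroyed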
    	
    		<property>
    		   <name>fs.trash.interval</name>
    		   <value>1440</value>
    		</property>
    		
    		Log:
    		hdfs dfs -rmr /folder1
    		rmr: DEPRECATED: Please use 'rm -r' instead.
    		17/12/11 21:05:57 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
    		Moved: 'hdfs://bigdata11:9000/folder1' to trash at: hdfs://bigdata11:9000/user/root/.Trash/Current			
    	(*) Restore: really just a cp (copy) out of the trash
    	     hdfs dfs -cp /user/root/.Trash/Current/input/data.txt /input
    		 
    		 Empty the trash: hdfs dfs -expunge
    		 
    	(*) Side note: Oracle Database also has a recycle bin
    			SQL> select * from tab;
    
    			TNAME                          TABTYPE  CLUSTERID
    			------------------------------ ------- ----------
    			BIN$WBSNMvxJpWvgUAB/AQBygg==$0 TABLE
    			BONUS                          TABLE
    			DEPT                           TABLE
    			EMP                            TABLE
    			RESULT                         TABLE
    			SALGRADE                       TABLE
    
    			6 rows selected.
    
    			SQL> -- drop table mydemo1;
    			SQL> show recyclebin;
    			ORIGINAL NAME    RECYCLEBIN NAME                OBJECT TYPE  DROP TIME
    			---------------- ------------------------------ ------------ -------------------
    			MYDEMO1          BIN$WBSNMvxJpWvgUAB/AQBygg==$0 TABLE        2017-09-01:06:56:15
    			SQL> select count(*) from mydemo1;
    			select count(*) from mydemo1
    								 *
    			ERROR at line 1:
    			ORA-00942: table or view does not exist
    
    
    			SQL> select count(*) from BIN$WBSNMvxJpWvgUAB/AQBygg==$0;
    			select count(*) from BIN$WBSNMvxJpWvgUAB/AQBygg==$0
    													*
    			ERROR at line 1:
    			ORA-00933: SQL command not properly ended
    
    
    			SQL> select count(*) from "BIN$WBSNMvxJpWvgUAB/AQBygg==$0";
    
    			  COUNT(*)
    			----------
    					30
    
    			SQL> flashback table mydemo1 to before drop;
    
    			Flashback complete.
    
    			SQL> show recyclebin;
    			SQL> select count(*) from mydemo1;
    
    			  COUNT(*)
    			----------
    					30
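
Returning to the HDFS trash: the two HDFS logs earlier in this section suggest that a deleted path is moved under /user/<user>/.Trash/Current with its original path appended, which is why the restore command copies from /user/root/.Trash/Current/input/data.txt. Below is a minimal Python sketch of that path mapping and of the fs.trash.interval arithmetic. The function names are hypothetical, nothing here talks to a real cluster, and the purge time assumes a simple fixed retention (the real emptier works on checkpoints, so treat it as an approximation).

```python
from datetime import datetime, timedelta

# Trash layout as it appears in the log and restore command (assumed general).
TRASH_ROOT = "/user/{user}/.Trash/Current"


def trash_path(user, deleted_path):
    """Return where a deleted absolute HDFS path would sit in the user's trash."""
    if not deleted_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return TRASH_ROOT.format(user=user) + deleted_path


def trash_enabled(interval_minutes):
    """fs.trash.interval = 0 means the trash is off and deletes are permanent."""
    return interval_minutes > 0


def earliest_purge(deleted_at, interval_minutes):
    """Earliest purge time under the simplified fixed-retention assumption."""
    return deleted_at + timedelta(minutes=interval_minutes)


print(trash_path("root", "/input/data.txt"))
print(trash_enabled(0), trash_enabled(1440))
print(earliest_purge(datetime(2017, 12, 11, 21, 5, 57), 1440))
```

With fs.trash.interval set to 1440, a file deleted at the time shown in the second log would be kept for roughly one day before it becomes eligible for removal.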
    	
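    2. Snapshots: backup  ---> in general, using snapshots is not recommended
    
    	(*) By default, HDFS snapshots are disabled
    	(*) Step 1: an administrator enables snapshots on a directory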
    		[-allowSnapshot <snapshotDir>]
    		[-disallowSnapshot <snapshotDir>]	
    
    		hdfs dfsadmin -allowSnapshot /mydir1
    	
    	(*) Step 2: create the snapshot with the HDFS operation commands
    		[-createSnapshot <snapshotDir> [<snapshotName>]]
    		[-deleteSnapshot <snapshotDir> <snapshotName>]	
    		[-renameSnapshot <snapshotDir> <oldName> <newName>]	
    		
    		hdfs dfs -createSnapshot /mydir1 mydir1_backup_01
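    		Log: Created snapshot /mydir1/.snapshot/mydir1_backup_01
    		In essence: the snapshot appears under a hidden .snapshot directory inside the snapshotted directory; HDFS records metadata (block list and file sizes) there rather than copying the data blocks
    		
    	(*) Continuing the experiment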
    		hdfs dfs -put student02.txt /mydir1
    		hdfs dfs -createSnapshot /mydir1 mydir1_backup_02
    		
    		Compare snapshots: hdfs snapshotDiff /mydir1 mydir1_backup_01 mydir1_backup_02
    		Difference between snapshot mydir1_backup_01 and snapshot mydir1_backup_02 under directory /mydir1:
    		M       .
    		+       ./student02.txt
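
The diff output above can be mimicked with a toy model: treat each snapshot as the set of file names visible at that moment and compare the two sets. The Python sketch below is only an illustration. The snapshot_diff helper and the student01.txt file name are hypothetical, and the model ignores renames and modified files, which the real tool also reports.

```python
def snapshot_diff(older, newer):
    """List entries created (+) or deleted (-) between two snapshot listings."""
    lines = []
    if older != newer:
        lines.append("M\t.")  # the directory itself changed
    lines += ["+\t./" + name for name in sorted(newer - older)]
    lines += ["-\t./" + name for name in sorted(older - newer)]
    return lines


backup_01 = {"student01.txt"}                   # hypothetical earlier content
backup_02 = {"student01.txt", "student02.txt"}  # after the -put shown above

for line in snapshot_diff(backup_01, backup_02):
    print(line)
```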
    		
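    3. Quotas: (1) Name quota: limits the number of files (and directories) that can be stored under a directory
                                 Effective count: N-1, because the directory itself counts toward the quota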
    				[-setQuota <quota> <dirname>...<dirname>]
    				[-clrQuota <dirname>...<dirname>]
    
    				hdfs dfs -mkdir /quota1
    				Set the name quota of this directory to 3
    				hdfs dfsadmin -setQuota 3 /quota1
    				
    				When we try to put the third file:
    				hdfs dfs -put data.txt /quota1
    				put: The NameSpace quota (directories and files) of directory /quota1 is exceeded: quota=3 file count=4
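
The N-1 rule can be checked with a few lines of arithmetic. This Python sketch is a simplified model, not the NameNode's actual accounting: it assumes the directory contains only plain files and that the directory itself counts as one entry, as the error message above implies. The function names are hypothetical.

```python
def entries_after_add(current_files):
    """Entries counted after adding one file: directory + existing files + new file."""
    return 1 + current_files + 1


def can_add_file(quota, current_files):
    """True if one more file still fits under the name quota."""
    return entries_after_add(current_files) <= quota


quota = 3
for existing in range(3):
    print(existing, "existing file(s) -> put allowed:", can_add_file(quota, existing))
```

With a quota of 3, the third put would bring the count to 4, which matches the "quota=3 file count=4" message.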
    				
    				
                  (2) Space quota: limits the total size of the files under a directory
    				[-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
    				[-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
    				
    				hdfs dfs -mkdir /quota2
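    				Set the space quota of this directory to 10M
    				hdfs dfsadmin -setSpaceQuota 10M /quota2
    				
    				The correct approach: hdfs dfsadmin -setSpaceQuota 130M /quota2
    				
    				With the 10M quota, putting a file smaller than 10M fails:
    				Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.DSQuotaExceededException): The DiskSpace quota of /quota2 is exceeded: quota = 10485760 B = 10 MB but diskspace consumed = 134217728 B = 128 MB
    				
    				Note: even though the file is smaller than 128M, the quota check charges a full 128M block when the block is allocated
    				Remember: never set a space quota below the block size (128M here); each replica also counts toward the quota, so with replication the minimum is the block size times the replication factor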
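
The numbers in the error above can be reproduced with a small calculation. The Python sketch below assumes a 128 MB block size and a replication factor of 1 (inferred from the "128 MB" consumed figure; a typical cluster uses 3). The function names are hypothetical and the check is a simplification of what the NameNode does.

```python
MB = 1024 * 1024


def space_needed_for_new_block(block_size=128 * MB, replication=1):
    """Bytes charged against the space quota when a new block is allocated."""
    return block_size * replication


def write_allowed(quota, consumed=0, block_size=128 * MB, replication=1):
    """True if allocating one more full block stays within the space quota."""
    return consumed + space_needed_for_new_block(block_size, replication) <= quota


print("10M quota:", write_allowed(10 * MB))
print("130M quota:", write_allowed(130 * MB))
print("130M quota, replication 3:", write_allowed(130 * MB, replication=3))
```

Under these assumptions the 10M quota rejects even a tiny file, 130M is enough with one replica, and a replication factor of 3 would need at least 384M.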
    
    	
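    4. HDFS safe mode: safemode  ---> HDFS is read-only
        Command: hdfs dfsadmin -safemode get|wait|leave|enter
    	Purpose: check the replication ratio of the data blocks; blocks that do not meet the requirement are re-replicated across DataNodes (horizontal replication)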
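
As a rough illustration of the replication check, the sketch below models the exit condition as "the fraction of blocks with enough reported replicas reaches a threshold". The 0.999 default is an assumption meant to mirror the dfs.namenode.safemode.threshold-pct setting, and the function name is hypothetical; this is a model of the idea, not the NameNode's code.

```python
def can_leave_safemode(safe_blocks, total_blocks, threshold=0.999):
    """True once enough blocks have reached their minimum replication."""
    if total_blocks == 0:
        return True  # an empty namespace has nothing to wait for
    return safe_blocks / total_blocks >= threshold


print(can_leave_safemode(990, 1000))
print(can_leave_safemode(1000, 1000))
```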
    
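    5. HDFS clusters: a brief introduction
    		Two main capabilities of a cluster: load balancing and high availability (failover)

                   (1) NameNode Federation ----> HDFS
    			   
                   (2) HA: HDFS, Yarn, HBase, Storm, Spark ---> all of them need ZooKeeper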
  • Original article: https://www.cnblogs.com/notes-study/p/8435683.html