Purpose
This document is a starting point for users working with Hadoop Distributed File System (HDFS) either as a part of a Hadoop cluster or as a stand-alone general purpose distributed file system. While HDFS is designed to “just work” in many environments, a working knowledge of HDFS helps greatly with configuration improvements and diagnostics on a specific cluster.
目的
本文档对于使用HDFS的用户来说是一个起点,不管是作为Hadoop集群的一部分还是一个独立的通用的分布式文件系统。虽然HDFS被设计在很多环境下工作,但是HDFS工作原理的支持将极大的帮助配置的调高和特定集群的故障检测。
Overview
HDFS is the primary distributed storage used by Hadoop applications. A HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data. The HDFS Architecture Guide describes HDFS in detail. This user guide primarily deals with the interaction of users and administrators with HDFS clusters. The HDFS architecture diagram depicts basic interactions among NameNode, the DataNodes, and the clients. Clients contact NameNode for file metadata or file modifications and perform actual file I/O directly with the DataNodes.
The following are some of the salient features that could be of interest to many users.
Hadoop, including HDFS, is well suited for distributed storage and distributed processing using commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. MapReduce, well known for its simplicity and applicability for large set of distributed applications, is an integral part of Hadoop.
HDFS is highly configurable with a default configuration well suited for many installations. Most of the time, configuration needs to be tuned only for very large clusters.
Hadoop is written in Java and is supported on all major platforms.
Hadoop supports shell-like commands to interact with HDFS directly.
The NameNode and Datanodes have built in web servers that makes it easy to check current status of the cluster.
New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
o File permissions and authentication.
o Rack awareness: to take a node’s physical location into account while scheduling tasks and allocating storage.
o Safemode: an administrative mode for maintenance.
o fsck: a utility to diagnose health of the file system, to find missing files or blocks.
o fetchdt: a utility to fetch DelegationToken and store it in a file on the local system.
o Balancer: tool to balance the cluster when the data is unevenly distributed among DataNodes.
o Upgrade and rollback: after a software upgrade, it is possible to rollback to HDFS’ state before the upgrade in case of unexpected problems.
o Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of file containing log of HDFS modifications within certain limits at the NameNode.
o Checkpoint node: performs periodic checkpoints of the namespace and helps minimize the size of the log stored at the NameNode containing changes to the HDFS. Replaces the role previously filled by the Secondary NameNode, though is not yet battle hardened. The NameNode allows multiple Checkpoint nodes simultaneously, as long as there are no Backup nodes registered with the system.
o Backup node: An extension to the Checkpoint node. In addition to checkpointing it also receives a stream of edits from the NameNode and maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode namespace state. Only one Backup node may be registered with the NameNode at once.
概览
HDFS是Hadoop应用程序使用的主要的分布式存储系统。一个HDFS集群主要包括一个NameNode和多个DataNode,NameNode管理文件系统元数据,DataNode存储真正的数据。HDFS的架构指南详细地描述了HDFS。本用户指南主要讲述用户和管理员与HDFS系统的交互。HDFS架构图描绘了NameNode,DataNode和client之间的基本的交互。Client连接NameNode取得文件元数据或者文件修改信息,然后直接与DataNode执行真正的文件I/O操作。
下面是一些可能引起很多用户兴趣的特性:
1. Hadoop,包括HDFS,非常适合用标准硬件进行分布式存储和分布式处理。它具有容错,可伸缩和及其简单的扩容等特性。因其对于大量的分布式应用程序的简单和高适用性而出名的MapReduce是Hadoop的一部分。
2. HDFS的默认配置适合大部分设备。大多数情况下,配置只在非常大的集群中时需要调优。
3. Hadoop用java编写,支持所有主流的平台。
4. Hadoop支持类shell的命令来与HDFS直接交互。
5. NameNode和DataNode内置了web服务器,使其更容易的检查集群当前的状态。
6. 新的特性和改进通常实现在HDFS中,下面是HDFS中的部分有用的特性:
1> 文件权限和认证
2> 机架感知:在调度任务和申请存储的时候考虑到节点的物理位置。
3> 安全模式:维护时的管理模式
4> fsck:一个检测集群文件系统健康状况的工具,可以找出丢失的文件和Block。
5> fetchdt:一个获取DelegationToken,然后存储到本地文件系统的工具。
6> Rebalancer:当数据不平均的分布在DataNode时平衡集群数据的工具。
7> Upgrade和rollback:在软件升级之后,再遇到不可预测的问题的情况下,回滚回HDFS升级之前的状态是可以的。
8> Secondary NameNode:周期性地执行namespace的检查点,帮助保持NameNode中存储HDFS的修改日志的文件的大小不超过某个范围。
9> Checkpoint Node:周期性地执行namespace的检查点操作,帮助减少存储在NameNode的包含HDFS变化的日志的大小。取代之前Secondary NameNode的角色,尽管还不是必须的。NameNode允许同时存在多个Checkpoint节点,只要系统中没有BackUp节点的存在。
10> BackUp节点:Checkpoint节点的扩展。除了checkpoint,它还从NameNode接收一个edit文件流,在内存中维护他自己的namespace的copy,这个copy总是与NameNode节点上namespace的状态同步。NameNode一次只能注册一个BackUp节点。
Prerequisites
The following documents describe how to install and set up a Hadoop cluster:
· Single Node Setup for first-time users.
· Cluster Setup for large, distributed clusters.
The rest of this document assumes the user is able to set up and run a HDFS with at least one DataNode. For the purpose of this document, both the NameNode and DataNode could be running on the same physical machine.
基本条件
下面的文档描述了如何安装和启动一个Hadoop集群:
Single Node Setup for first-time users.
Cluster Setup for large, distributed clusters.
本文档剩下的部分假设用户已能够建立和运行至少有一个DataNode节点的HDFS集群。为了实现本文档的目的,NameNode和DataNode节点可以运行在一台物理机器上。
Web Interface
NameNode and DataNode each run an internal web server in order to display basic information about the current status of the cluster. With the default configuration, the NameNode front page is at http://namenode-name:50070/. It lists the DataNodes in the cluster and basic statistics of the cluster. The web interface can also be used to browse the file system (using “Browse the file system” link on the NameNode front page).
Web接口
每一个NameNode和DataNode都运行一个内部的web服务器,以展示关于集群当前的状态的基本的信息。默认配置下,NameNode主页面在http://namenode-name:50070/。它列出了集群中所有的数据节点和集群中基本统计信息。这个web接口可以被用来浏览文件系统(在NameNode的主页上有“Browse the file system”的链接)。
Shell Commands
Hadoop includes various shell-like commands that directly interact with HDFS and other file systems that Hadoop supports. The command bin/hdfs dfs -help lists the commands supported by Hadoop shell. Furthermore, the command bin/hdfs dfs -help command-name displays more detailed help for a command. These commands support most of the normal files system operations like copying files, changing file permissions, etc. It also supports a few HDFS specific operations like changing replication of files. For more information see File System Shell Guide.
DFSAdmin Command
The bin/hdfs dfsadmin command supports a few HDFS administration related operations. The bin/hdfs dfsadmin -help command lists all the commands currently supported. For e.g.:
-report: reports basic statistics of HDFS. Some of this information is also available on the NameNode front page.
-safemode: though usually not required, an administrator can manually enter or leave Safemode.
-finalizeUpgrade: removes previous backup of the cluster made during last upgrade.
-refreshNodes: Updates the namenode with the set of datanodes allowed to connect to the namenode. Namenodes re-read datanode hostnames in the file defined by dfs.hosts, dfs.hosts.exclude Hosts defined in dfs.hosts are the datanodes that are part of the cluster. If there are entries in dfs.hosts, only the hosts in it are allowed to register with the namenode. Entries in dfs.hosts.exclude are datanodes that need to be decommissioned. Datanodes complete decommissioning when all the replicas from them are replicated to other datanodes. Decommissioned nodes are not automatically shutdown and are not chosen for writing for new replicas.
-printTopology : Print the topology of the cluster. Display a tree of racks and datanodes attached to the tracks as viewed by the NameNode.
For command usage, see dfsadmin.
Shell 命令
Hadoop拥有各种类shell命令,能够直接与HDFS和其他Hadoop支持的文件系统进行交互。bin/hdfsdfs –help命令可以列出Hadoop shell支持的命令。而且bin/hdfs dfs -help command-name会列出一个命令更多的细节。这些命令支持大多数标准文件系统操作,像复制文件,修改文件权限等。它也支持一些HDFS特有的操作,像改变文件副本个数。更多信息尽在 File System Shell Guide。
DFSAdmin Command
bin/hadoop dfsadmin命令支持一些HDFS管理相关的操作。bin/hadoopdfsadmin –help命令列出了当前支持的所有的命令,例如:
1. –report:报告HDFS基本的统计信息。这些信息中一些亦可以在NameNode主页中查看。
2. –safemode:最然通常不需要,但是一个管理员可以手工的进入或者离开安全模式。
3. –finalizeUpgrade:移除最近的一次升级时先前集群的备份。
4. –refreshNodes:更新namenode和多个连接到此NameNode的DataNode。NameNode重新读取定义在dfs.hosts, dfs.hosts.exclude文件中的DataNode的hostname。定义在dfs.hosts文件中的主机是集群中的datanode的部分。如果dfs.hosts中有条目,只有其中出现的主机才被允许注册到NameNode。dfs.hosts.exclude中出现的条目是需要退役的DataNode。当这些节点上的副本在其他数据节点副本完成,这些DataNode完成退役。退役的节点不会自动关机,新的副本不会在选择这些节点写入。
5. printTopology:打印机群的拓扑。展示一个树形的机架和依附于机架上的DataNode,就像在NameNode中看到的那样。
更多命令的用法,看 dfsadmin。
Secondary NameNode
The NameNode stores modifications to the file system as a log appended to a native file system file, edits. When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file. It then writes new HDFS state to the fsimage and starts normal operation with an empty edits file. Since NameNode merges fsimage and edits files only during start up, the edits log file could get very large over time on a busy cluster. Another side effect of a larger edits file is that next restart of NameNode takes longer.
The secondary NameNode merges the fsimage and the edits log files periodically and keeps edits log size within a limit. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode.
The start of the checkpoint process on the secondary NameNode is controlled by two configuration parameters.
dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
The secondary NameNode stores the latest checkpoint in a directory which is structured the same way as the primary NameNode’s directory. So that the check pointed image is always ready to be read by the primary NameNode if necessary.
For command usage, see secondarynamenode.
Secondary NameNode
NameNode存储文件系统的变化,添加这些信息到本地文件系统中的日志文件的末尾,这个日志文件时edits。当一个NameNode启动,它从一个镜像文件,fsimage中读取HDFS的状态,然后应用edits日志文件中的edits。然后将一个新的HDFS状态写到fsimage中,用一个新的空的edit文件存储正常的操作。因为NameNode只在启动时合并fsimage和edit文件,随着时间推移,edit的日志文件可能在一个忙碌的集群中变得非常大。大edit日志文件的另一个副作用是下一次NameNode的启动将会花费更长的时间。
Secondary NameNode周期性地合并fsimage和edit日志文件,保持日志文件的大小在一个范围内。它通常运行在一个不同于NameNode的机器上,因为它的内存需求跟NameNode一样。
Checkpoint进程在Secondary NameNode上的启动被两个配置参数管理:
1. dfs.namenode.checkpoint.period:默认设置为1小时,这个参数指定两次连续的Checkpoint操作的最大间隔。
2. dfs.namenode.checkpoint.txns:默认设置为1百万,这个参数定义了NameNode中没有Checkpoint的事务的个数,如果超过这个个数,即使没有到Checkpoint的时间,也会强制Checkpoint。
Secondary NameNode将最近的Checkpoint存储到跟NameNode中一样结构的目录中。所以如果必要,被Checkpoint的image总是准备被NameNode读取。
更多命令用法,看secondarynamenode。
Checkpoint Node
NameNode persists its namespace using two files: fsimage, which is the latest checkpoint of the namespace and edits, a journal (log) of changes to the namespace since the checkpoint. When a NameNode starts up, it merges the fsimage and edits journal to provide an up-to-date view of the file system metadata. The NameNode then overwrites fsimage with the new HDFS state and begins a new edits journal.
The Checkpoint node periodically creates checkpoints of the namespace. It downloads fsimage and edits from the active NameNode, merges them locally, and uploads the new image back to the active NameNode. The Checkpoint node usually runs on a different machine than the NameNode since its memory requirements are on the same order as the NameNode. The Checkpoint node is started by bin/hdfs namenode -checkpoint on the node specified in the configuration file.
The location of the Checkpoint (or Backup) node and its accompanying web interface are configured viathe dfs.namenode.backup.address and dfs.namenode.backup.http-address configuration variables.
The start of the checkpoint process on the Checkpoint node is controlled by two configuration parameters.
dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints
dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
The Checkpoint node stores the latest checkpoint in a directory that is structured the same as the NameNode’s directory. This allows the checkpointed image to be always available for reading by the NameNode if necessary. See Import checkpoint.
Multiple checkpoint nodes may be specified in the cluster configuration file.
For command usage, see namenode.
Checkpoint Node
NameNode用两个文件持久化它的namespace:fsimage和edits,fsimage是namespace的最近一次的Checkpoint,edits文件是自Checkpoint后namespace的变化日志。当NameNode启动时,它合并fsimage和edits日志文件以提供一个文件系统元数据的最新的视图。然后NameNode用新的HDFS状态覆盖fsimage,开起一个新的edits文件。
Checkpoint节点周期性的创建namespace的Checkpoint。它从active的NameNode下载fsimage和edits文件,本地合并它们生成新的image,然后将新的image上传回activeNameNode上。Checkpoint节点通常运行在不同于NameNode的机器上,因为它的内存需求跟NameNode一样。Checkpoint节点在配置文件中指定的节点上用bin/hdfs namenode –checkpoint启动。
Checkpoint(或Backup)节点和它通过dfs.namenode.backup.address 和dfs.namenode.backup.http-address配置的附带的web接口。
Checkpoint节点上的checkpoint进程被两个配置参数管理:
1. dfs.namenode.checkpoint.period:默认设置为1小时,这个参数指定两次连续的Checkpoint操作的最大间隔。
2. dfs.namenode.checkpoint.txns:默认设置为1百万,这个参数定义了NameNode中没有Checkpoint的事务的个数,如果超过这个个数,即使没有到Checkpoint的时间,也会强制Checkpoint。
Checkpoint节点将最近的Checkpoint存储到跟NameNode中一样结构的目录中。这使得必要时被checkpoint的image是可读的。查看Import checkpoint。
一个集群中可以配置多个Checkpoint节点。
更多的命令用法,看 namenode。
Backup Node
The Backup node provides the same checkpointing functionality as the Checkpoint node, as well as maintaining an in-memory, up-to-date copy of the file system namespace that is always synchronized with the active NameNode state. Along with accepting a journal stream of file system edits from the NameNode and persisting this to disk, the Backup node also applies those edits into its own copy of the namespace in memory, thus creating a backup of the namespace.
The Backup node does not need to download fsimage and edits files from the active NameNode in order to create a checkpoint, as would be required with a Checkpoint node or Secondary NameNode, since it already has an up-to-date state of the namespace state in memory. The Backup node checkpoint process is more efficient as it only needs to save the namespace into the local fsimage file and reset edits.
As the Backup node maintains a copy of the namespace in memory, its RAM requirements are the same as the NameNode.
The NameNode supports one Backup node at a time. No Checkpoint nodes may be registered if a Backup node is in use. Using multiple Backup nodes concurrently will be supported in the future.
The Backup node is configured in the same manner as the Checkpoint node. It is started with bin/hdfs namenode -backup.
The location of the Backup (or Checkpoint) node and its accompanying web interface are configured via the dfs.namenode.backup.address and dfs.namenode.backup.http-address configuration variables.
Use of a Backup node provides the option of running the NameNode with no persistent storage, delegating all responsibility for persisting the state of the namespace to the Backup node. To do this, start the NameNode with the -importCheckpoint option, along with specifying no persistent storage directories of type edits dfs.namenode.edits.dir for the NameNode configuration.
For a complete discussion of the motivation behind the creation of the Backup node and Checkpoint node, see HADOOP-4539. For command usage, see namenode.
Backup节点
跟Checkpoint node差不多。
Import Checkpoint
The latest checkpoint can be imported to the NameNode if all other copies of the image and the edits files are lost. In order to do that one should:
Create an empty directory specified in the dfs.namenode.name.dir configuration variable;
Specify the location of the checkpoint directory in the configuration variable dfs.namenode.checkpoint.dir;
and start the NameNode with -importCheckpoint option.
The NameNode will upload the checkpoint from the dfs.namenode.checkpoint.dir directory and then save it to the NameNode directory(s) set in dfs.namenode.name.dir. The NameNode will fail if a legal image is contained in dfs.namenode.name.dir. The NameNode verifies that the image in dfs.namenode.checkpoint.dir is consistent, but does not modify it in any way.
For command usage, see namenode.
Import Checkpoint
如果NameNode中所有其他的image和edits文件的copy都丢失了,最近的Checkpoint可以被import到NameNode中。为了可以import,你应该:
1. 在dfs.namenode.name.dir配置指定的path创建一个空的目录。
2. 用dfs.namenode.checkpoint.dir指定Checkpoint目录。
3. 用-importCheckpoint 选项启动NameNode。
NameNode将从dfs.namenode.checkpoint.dir目录中上传Checkpoint,然后将它保存到 dfs.namenode.name.dir设置的NameNode的目录。如果在dfs.namenode.name.dir目录中有一个合法的image,NameNode将会失败。NameNode检验dfs.namenode.checkpoint.dir 的image是否一致,但是任何情况下都不会修改它。
更多命令用法,查看namenode。
Balancer
HDFS data might not always be be placed uniformly across the DataNode. One common reason is addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:
Policy to keep one of the replicas of a block on the same node as the node that is writing the block.
Need to spread different replicas of a block across the racks so that cluster can survive loss of whole rack.
One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.
Spread HDFS data uniformly across the DataNodes in the cluster.
Due to multiple competing considerations, data might not be uniformly placed across the DataNodes. HDFS provides a tool for administrators that analyzes block placement and rebalanaces data across the DataNode. A brief administrator’s guide for balancer is available at HADOOP-1652.
For command usage, see balancer.
Rebalancer
HDFS数据可能不总是一致的被存放在DataNode中。一个常见的原因是新DataNode节点的增加。当存放新的Block(一个文件的数据被存放为一些列的Block)时,NameNode考虑很多的参数在选择接收这些Block的DataNode时。下面是一些考虑的因素:
1. 保持一个Block的多个副本中的一个与正在写入的Block在一个节点上。
2. 需要将副本跨机架传播,这样集群可以在整个机架沦陷时幸存。
3. 多个副本中的一个通常存放在跟正在写入的文件相同的机架上,这样可以减少跨机架的网络I/O。
4. 一致的的在集群中的DataNode之间传播HDFS数据。
考虑到多个相互矛盾的因素,数据可能不一致的存放在DataNode中。HDFS提供了一个分析数据块的位置和重新平衡DataNode中的数据的工具。HADOOP-1652中是一个简短的rebalancer的管理员指南,pdf格式。
更多命令用法,看balancer。
Rack Awareness
Typically large Hadoop clusters are arranged in racks and network traffic between different nodes with in the same rack is much more desirable than network traffic across the racks. In addition NameNode tries to place replicas of block on multiple racks for improved fault tolerance. Hadoop lets the cluster administrators decide which rack a node belongs to through configuration variable net.topology.script.file.name. When this script is configured, each node runs the script to determine its rack id. A default installation assumes all the nodes belong to the same rack. This feature and configuration is further described in PDF attached to HADOOP-692.
Rack Awareness
通常一个大的Hadoop集群分布在多个机架上,同一个机架上的不同节点间的网络流量比跨机架的节点间的网络流量更令人满意。NameNode试图将Block的副本放到多个机架上以提高容错。通过配置net.topology.script.file.name,Hadoop让集群管理员自己决定一个节点属于哪一个机架。当此脚本配置,每一个节点运行这个脚本来决定它属于哪一个机架。默认设置是假设所有的节点属于同一个机架。此特性和配置更进一步的描述在 HADOOP-692。
Safemode
During start up the NameNode loads the file system state from the fsimage and the edits log file. It then waits for DataNodes to report their blocks so that it does not prematurely start replicating the blocks though enough replicas already exist in the cluster. During this time NameNode stays in Safemode. Safemode for the NameNode is essentially a read-only mode for the HDFS cluster, where it does not allow any modifications to file system or blocks. Normally the NameNode leaves Safemode automatically after the DataNodes have reported that most file system blocks are available. If required, HDFS could be placed in Safemode explicitly using bin/hdfs dfsadmin -safemode command. NameNode front page shows whether Safemode is on or off. A more detailed description and configuration is maintained as JavaDoc for setSafeMode().
Safemode
NameNode启动是从fsimage和edits日志文件中加载文件系统状态。然后等待DataNode报告它们的Block,所以NameNode不过早的复制Block,可能集群中有足够的副本。在这段时间内,NameNode在safemode状态。NameNode的Safemode本质上来说就是HDFS集群的只读模式,它不允许文件系统或Block的任何修改。正常情况下,在DataNode报告它的大多数文件系统的Block available之后,NameNode会自动的离开safemode模式。如果有必要,HDFS可以用bin/hadoop dfsadmin –safemode明确的进入safemode。
NameNode主页展示了safemode开关状态。更详细的描述和配置在setSafeMode()的java doc中。
fsck
HDFS supports the fsck command to check for various inconsistencies. It it is designed for reporting problems with various files, for example, missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures. By default fsck ignores open files but provides an option to select all files during reporting. The HDFS fsck command is not a Hadoop shell command. It can be run as bin/hdfs fsck. For command usage, see fsck. fsck can be run on the whole file system or on a subset of files.
Fsck
HDFS支持fsck命令来检查各种不一致状态。它被设置用来报告各种文件的各种问题,例如,丢失一个文件的某个Block或者正在复制的Block。不像针对本地文件系统的传统的fsck工具,这个命令不更正它检测到的错误。正常情况下,NameNode自动更正大部分可恢复的失效。默认,fsck忽略打开的文件但是提供一个选项在报告时选择所有的文件。HDFS fsck命令不是Hadoop shell命令。它可以用bin/hadoop fsck运行。更多命令的用法,看fsck。Fsck可以运行在整个文件系统或者所有文件的子集。
fetchdt
HDFS supports the fetchdt command to fetch Delegation Token and store it in a file on the local system. This token can be later used to access secure server (NameNode for example) from a non secure client. Utility uses either RPC or HTTPS (over Kerberos) to get the token, and thus requires kerberos tickets to be present before the run (run kinit to get the tickets). The HDFS fetchdt command is not a Hadoop shell command. It can be run as bin/hdfs fetchdt DTfile. After you got the token you can run an HDFS command without having Kerberos tickets, by pointing HADOOP_TOKEN_FILE_LOCATION environmental variable to the delegation token file. For command usage, see fetchdt command.
Fetchdt
HDFS支持fetchdt命令来获取 Delegation Token和将其存储到本地文件系统中。这个token之后可以被用来从一个不安全的客户端访问安全的服务(例如NameNode)。此工具使用RPC或者HTTPS(在Kerberos之上)来获取token,因此需要ticket才能运行(运行kinit 命令可以得到ticket)。HDFSfetchdt命令不是Hadoop shell命令。它可以用bin/hadoop fetchdt DTfile运行。在你取得token之后,通过指定 HADOOP_TOKEN_FILE_LOCATION环境变量你可以不需要Kerberosticket就运行HDFS命令, HADOOP_TOKEN_FILE_LOCATION指定delegationtoken文件的位置。更多命令用法,查看 fetchdt命令。
Recovery Mode
Typically, you will configure multiple metadata storage locations. Then, if one storage location is corrupt, you can read the metadata from one of the other storage locations.
However, what can you do if the only storage locations available are corrupt? In this case, there is a special NameNode startup mode called Recovery mode that may allow you to recover most of your data.
You can start the NameNode in recovery mode like so: namenode -recover
When in recovery mode, the NameNode will interactively prompt you at the command line about possible courses of action you can take to recover your data.
If you don’t want to be prompted, you can give the -force option. This option will force recovery mode to always select the first choice. Normally, this will be the most reasonable choice.
Because Recovery mode can cause you to lose data, you should always back up your edit log and fsimage before using it.
Recovery Mode
通常情况下,你需要配置多个元数据的存储位置。然后,若果一个存储位置崩溃,你可以从另一个其他的位置读取元数据。
但是,如果仅有的存储崩溃,你能做啥呢?在这种情况下,有一个特殊的NameNode启动模式,Recovery Mode,它允许你恢复你的大多数数据。
你可以用namenode –recover以recovery mode启动NameNode。
在Recovery Mode时,NameNode将在命令行交互性地提示你可以恢复数据的可能的行动步骤。
如果你不希望被提示,你可以给 -force选项。这个选项将强制RecoveryMode总是选择第一个选项。正常情况下,这将是最合理的选择。
因为Recovery Mode可能使你丢失信息,在使用它之前,你应该总是备份你的edit日志和fsimage。
Upgrade and Rollback
When Hadoop is upgraded on an existing cluster, as with any software upgrade, it is possible there are new bugs or incompatible changes that affect existing applications and were not discovered earlier. In any non-trivial HDFS installation, it is not an option to loose any data, let alone to restart HDFS from scratch. HDFS allows administrators to go back to earlier version of Hadoop and rollback the cluster to the state it was in before the upgrade. HDFS upgrade is described in more detail in Hadoop Upgrade Wiki page. HDFS can have one such backup at a time. Before upgrading, administrators need to remove existing backup using bin/hadoop dfsadmin -finalizeUpgrade command. The following briefly describes the typical upgrade procedure:
Before upgrading Hadoop software, finalize if there an existing backup. dfsadmin -upgradeProgress status can tell if the cluster needs to be finalized.
Stop the cluster and distribute new version of Hadoop.
Run the new version with -upgrade option (bin/start-dfs.sh -upgrade).
Most of the time, cluster works just fine. Once the new HDFS is considered working well (may be after a few days of operation), finalize the upgrade. Note that until the cluster is finalized, deleting the files that existed before the upgrade does not free up real disk space on the DataNodes.
If there is a need to move back to the old version,
o stop the cluster and distribute earlier version of Hadoop.
o run the rollback command on the namenode (bin/hdfs namenode -rollback).
o start the cluster with rollback option. (sbin/start-dfs.sh -rollback).
When upgrading to a new version of HDFS, it is necessary to rename or delete any paths that are reserved in the new version of HDFS. If the NameNode encounters a reserved path during upgrade, it will print an error like the following:
/.reserved is a reserved path and .snapshot is a reserved path component in this version of HDFS. Please rollback and delete or rename this path, or upgrade with the -renameReserved [key-value pairs] option to automatically rename these paths during upgrade.
Specifying -upgrade -renameReserved [optional key-value pairs] causes the NameNode to automatically rename any reserved paths found during startup. For example, to rename all paths named .snapshot to .my-snapshot and .reserved to .my-reserved, a user would specify -upgrade -renameReserved .snapshot=.my-snapshot,.reserved=.my-reserved.
If no key-value pairs are specified with -renameReserved, the NameNode will then suffix reserved paths with .<LAYOUT-VERSION>.UPGRADE_RENAMED, e.g. .snapshot.-51.UPGRADE_RENAMED.
There are some caveats to this renaming process. It’s recommended, if possible, to first hdfs dfsadmin -saveNamespace before upgrading. This is because data inconsistency can result if an edit log operation refers to the destination of an automatically renamed file.
Upgrade 和 Rollback
当Hadoop在一个已存在的集群上被升级的时候,就像任何的软件升级一样,它可能有一些新的bug或者不兼容的变化,这些bug和变化可能会影响已存在的应用程序,并且不能过早的发现。在任何重要的HDFS安装中,丢失任何数据都是不允许的,更不用说HDFS重新启动。HDFS允许管理员回滚回Hadoop升级之前的版本,将集群回滚回升级之前的状态。HDFS升级更多的细节在Hadoop Upgrade。这个时候,HDFS可以有这样一个备份。在升级之前,管理员需要 用bin/hadoop dfsadmin -finalizeUpgrade命令移除已经存在的备份。下面是对一个典型的升级过程简短的描述:
1. 在升级Hadoop软件之前,如果有一个已经存在的备份,finalize掉。dfsadmin -upgradeProgress状态可以告诉我们集群是否需要被finalize。
2. 停止集群,分发新版本的Hadoop。
3. 用bin/start-dfs.sh-upgrade运行新版本的hadoop
4. 大多数情况下,集群会很好的工作。一旦新HDFS被认为工作良好(可能是很多天的操作之后得出),finalize掉这个Upgrade。注意,直到集群被finalize,删除升级之前存在的文件不会释放DataNode上真正的存储空间。
5. 如果有需要回滚回旧版本,
a) 停掉集群,分发hadoop旧版本。
b) 用rollback选项启动集群,bin/start-dfs.sh –rollback。
当升级到一个新版本的HDFS,有必要更改或删除任何存储在新版本中的路径。如果升级期间,NameNode遇到一个存在的路径,它将会打印像下面这样的错误:
/.reserved is a reserved path and .snapshot is areserved path component in this version of HDFS. Please rollback and delete orrename this path, or upgrade with the -renameReserved [key-value pairs] optionto automatically rename these paths during upgrade.
指定 -upgrade -renameReserved[optional key-value pairs]会使NameNode自动更改启动过程中发现任何保存的路径。例如,更改所有的.snapshot命名的路径为.my-snapshot,更改所有的.reserved路径为.my-reserved,用户也可以指定-upgrade -renameReserved.snapshot=.my-snapshot,.reserved=.my-reserved。
如果没有key-value对用-renameReserved被指定,NameNode将添加后缀 .<LAYOUT-VERSION>.UPGRADE_RENAMED,例如, .snapshot.-51.UPGRADE_RENAMED。
Rename进程会有一些警告。建议,如果可能,升级之前先运行hdfs dfsadmin -saveNamespace。这是因为如果edit日志操作涉及到自动修改过的文件的话,数据会出现不一致的情况。
DataNode Hot Swap Drive
Datanode supports hot swappable drives. The user can add or replace HDFS data volumes without shutting down the DataNode. The following briefly describes the typical hot swapping drive procedure:
If there are new storage directories, the user should format them and mount them appropriately.
The user updates the DataNode configuration dfs.datanode.data.dir to reflect the data volume directories that will be actively in use.
The user runs dfsadmin -reconfig datanode HOST:PORT start to start the reconfiguration process. The user can use dfsadmin -reconfig datanode HOST:PORT status to query the running status of the reconfiguration task.
Once the reconfiguration task has completed, the user can safely umount the removed data volume directories and physically remove the disks.
File Permissions and Security
The file permissions are designed to be similar to file permissions on other familiar platforms like Linux. Currently, security is limited to simple file permissions. The user that starts NameNode is treated as the superuser for HDFS. Future versions of HDFS will support network authentication protocols like Kerberos for user authentication and encryption of data transfers. The details are discussed in the Permissions Guide.
文件权限和安全
文件权限的设计跟其他常见的平台像linux是相似的。目前,安全仅限于简单的文件权限。启动NameNode的用户被认为是HDFS的超级用户。将来的HDFS版本将支持网络认证协议像Kerberos来支持用户认证和数据传输加密。更详细的讨论在权限指南。
Scalability
Hadoop currently runs on clusters with thousands of nodes. The PoweredBy Wiki page lists some of the organizations that deploy Hadoop on large clusters. HDFS has one NameNode for each cluster. Currently the total memory available on NameNode is the primary scalability limitation. On very large clusters, increasing average size of files stored in HDFS helps with increasing cluster size without increasing memory requirements on NameNode. The default configuration may not suite very large clusters. The FAQ Wiki page lists suggested configuration improvements for large Hadoop clusters.
Scalability
Hadoop目前可以运行在几千个节点的集群上。PoweredByWiki页面上列出了一些部署hadoop大规模集群的组织。HDFS在每个集群中有一个NameNode。目前NameNode上总的内存是主要的扩展限制。在每一个大集群上,增加存储在HDFS中的文件的大小有助于在不增加NameNode内存的情况下增加集群存储能力。默认的配置可能不适合非常大的集群。FAQ Wiki页面列出了对于大规模hadoop集群的建议的配置提高。
Related Documentation
This user guide is a good starting point for working with HDFS. While the user guide continues to improve, there is a large wealth of documentation about Hadoop and HDFS. The following list is a starting point for further exploration:
· Hadoop Site: The home page for the Apache Hadoop site.
· Hadoop Wiki: The home page (FrontPage) for the Hadoop Wiki. Unlike the released documentation, which is part of Hadoop source tree, Hadoop Wiki is regularly edited by Hadoop Community.
· FAQ: The FAQ Wiki page.
· Hadoop User Mailing List: user[at]hadoop.apache.org.
· Explore聽hdfs-default.xml. It includes brief description of most of the configuration variables available.
· HDFS Commands Guide: HDFS commands usage.
相关的文档
本用户指南对于用HDFS工作来说是一个好的起点。当用户指南继续改进,将会有一个很大的关于hadoop和HDFS的文档。下面列出了对于更进一步的探索的起点:
Hadoop Site: The home page for the Apache Hadoop site.
Hadoop Wiki: The home page (FrontPage) for the Hadoop Wiki. Unlike the released documentation, which is part of Hadoop source tree, Hadoop Wiki is regularly edited by Hadoop Community.
FAQ: The FAQ Wiki page.
Hadoop User Mailing List: user[at]hadoop.apache.org.
Explore hdfs-default.xml. It includes brief description of most of the configuration variables available.
Hadoop Commands Guide: Hadoop commands usage.