The hdfscli command line
# hdfscli --help
HdfsCLI: a command line interface for HDFS.

Usage:
  hdfscli [interactive] [-a ALIAS] [-v...]
  hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH
  hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH
  hdfscli -L | -V | -h

Commands:
  download                      Download a file or folder from HDFS. If a
                                single file is downloaded, - can be specified
                                as LOCAL_PATH to stream it to standard out.
  interactive                   Start the client and expose it via the python
                                interpreter (using iPython if available).
  upload                        Upload a file or folder to HDFS. - can be
                                specified as LOCAL_PATH to read from standard
                                in.

Arguments:
  HDFS_PATH                     Remote HDFS path.
  LOCAL_PATH                    Path to local file or directory.

Options:
  -A --append                   Append data to an existing file. Only supported
                                if uploading a single file or from standard in.
  -L --log                      Show path to current log file and exit.
  -V --version                  Show version and exit.
  -a ALIAS --alias=ALIAS        Alias of namenode to connect to.
  -f --force                    Allow overwriting any existing files.
  -s --silent                   Don't display progress status.
  -t THREADS --threads=THREADS  Number of threads to use for parallelization.
                                0 allocates a thread per file. [default: 0]
  -v --verbose                  Enable log output. Can be specified up to three
                                times (increasing verbosity each time).

Examples:
  hdfscli -a prod /user/foo
  hdfscli download features.avro dat/
  hdfscli download logs/1987-03-23 - >>logs
  hdfscli upload -f - data/weights.tsv <weights.tsv

HdfsCLI exits with return status 1 if an error occurred and 0 otherwise.
To use hdfscli, first set up its default configuration file:
# cat ~/.hdfscli.cfg
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root
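Several clusters can be configured side by side, each under its own ALIAS.alias section, and selected with -a/--alias. A minimal sketch, assuming a hypothetical second cluster reachable at hadoop-prod:50070:

# cat ~/.hdfscli.cfg
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root

[prod.alias]
# hypothetical second cluster; adjust url and user to your environment
url = http://hadoop-prod:50070
user = root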
Client classes available from Python (a short construction sketch follows this list):
InsecureClient (the default)
TokenClient
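Both classes take the WebHDFS URL as their first argument; InsecureClient authenticates with a plain user name, while TokenClient uses a Hadoop delegation token. A minimal sketch (the URL and token value below are placeholders, not taken from this cluster):

>>> from hdfs import InsecureClient, TokenClient
>>> client = InsecureClient("http://hadoop:50070", user="root")  # no security, just a user name
>>> client = TokenClient("http://hadoop:50070", token="<delegation-token>")  # pre-acquired token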
Uploading and downloading files
Upload a file or folder with hdfscli (here the hadoop folder is uploaded to /hdfs):
# hdfscli upload --alias=dev -f /hadoop-2.4.1/etc/hadoop/ /hdfs
Download the HDFS directory /logs to the local directory /root/test:
# hdfscli download /logs /root/test/
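The same transfers can be done from Python with client.upload() and client.download(), where client is an instance created as shown in the Python API section below; the overwrite and n_threads keywords mirror the CLI's -f and -t flags. A minimal sketch:

>>> remote_path = client.upload("/hdfs", "/hadoop-2.4.1/etc/hadoop/", overwrite=True)
>>> local_path = client.download("/logs", "/root/test/", overwrite=True, n_threads=0)  # 0 = one thread per file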
hdfscli interactive mode
[root@hadoop ~]# hdfscli --alias=dev

Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.

>>> CLIENT.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
>>> CLIENT.status("/Demo")
{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0, u'accessTime': 0, u'pathSuffix': u'', u'modificationTime': 1495123035501L, u'replication': 0, u'length': 0, u'childrenNum': 1, u'owner': u'root', u'type': u'DIRECTORY', u'fileId': 16389}
>>> CLIENT.delete("logs/install.log")
False
>>> CLIENT.delete("/logs/install.log")
True
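The first delete() above returns False because relative paths are resolved against the connecting user's HDFS home directory (/user/root here), where no such file exists; delete() reports a missing target with False instead of raising. For a similar non-raising existence check, status() accepts strict=False and returns None when the path is absent, as in this sketch:

>>> CLIENT.status("/logs/install.log", strict=False) is None  # the file was deleted above
True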
Bindings to the Python API
Initializing a client
1. Import a client class and call its constructor directly:
>>> from hdfs import InsecureClient
>>> client = InsecureClient("http://172.10.236.21:50070", user='ann')
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
2. Import the Config class, load an existing configuration file, and create a client from one of its aliases. By default the configuration is read from ~/.hdfscli.cfg:
>>> from hdfs import Config
>>> client = Config().get_client("dev")
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
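Config() also accepts an explicit path to the configuration file, and get_client() called without an argument falls back to the default.alias entry in the [global] section. A sketch (the path below is hypothetical):

>>> from hdfs import Config
>>> client = Config("/etc/hadoop/conf/hdfscli.cfg").get_client()  # hypothetical path; alias from default.alias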
Reading files
The read() method reads a file from HDFS. It must be used inside a with block, which guarantees the connection is closed properly every time:
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8") as reader:
...     features = reader.read()
...
>>> print features
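read() also takes offset and length keywords for partial reads, which avoids pulling a whole large file across the wire just to inspect its beginning. A sketch reading only the first kilobyte:

>>> with client.read("/logs/yarn-env.sh", length=1024) as reader:
...     head = reader.read()  # at most the first 1024 bytes
...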
Passing chunk_size makes the context manager return a generator instead of a file-like object, streaming the file's contents chunk_size bytes at a time:
>>> with client.read("/logs/yarn-env.sh", chunk_size=1024) as reader:
...     for chunk in reader:
...         print chunk
...
The delimiter parameter likewise returns a generator, yielding the file's contents split on the given delimiter (encoding must be set when using it):
>>> import time
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8", delimiter="\n") as reader:
...     for line in reader:
...         time.sleep(1)
...         print line
Writing files
The write() method writes a file to HDFS. The example below copies the local file kong.txt into /logs/kongtest.txt on HDFS, keeping only the lines that start with "-":
>>> with open("/root/test/kong.txt") as reader, client.write("/logs/kongtest.txt") as writer:
...     for line in reader:
...         if line.startswith("-"):
...             writer.write(line)
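When the data is already in memory, write() can be given it directly through the data argument, with no with block needed; the overwrite and append keywords control what happens if the file already exists (appending also requires the cluster to support appends). A sketch adding one line to the file created above:

>>> client.write("/logs/kongtest.txt", data="# one more line\n", encoding="utf-8", append=True)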