• HDFS Operations Manual


    The hdfscli command line

    # hdfscli --help
    HdfsCLI: a command line interface for HDFS.
    
    Usage:
      hdfscli [interactive] [-a ALIAS] [-v...]
      hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH
      hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH
      hdfscli -L | -V | -h
    
    Commands:
      download                      Download a file or folder from HDFS. If a
                                    single file is downloaded, - can be
                                    specified as LOCAL_PATH to stream it to
                                    standard out.
      interactive                   Start the client and expose it via the python
                                    interpreter (using iPython if available).
      upload                        Upload a file or folder to HDFS. - can be
                                    specified as LOCAL_PATH to read from standard
                                    in.
    
    Arguments:
      HDFS_PATH                     Remote HDFS path.
      LOCAL_PATH                    Path to local file or directory.
    
    Options:
      -A --append                   Append data to an existing file. Only supported
                                    if uploading a single file or from standard in.
      -L --log                      Show path to current log file and exit.
      -V --version                  Show version and exit.
      -a ALIAS --alias=ALIAS        Alias of namenode to connect to.
      -f --force                    Allow overwriting any existing files.
      -s --silent                   Don't display progress status.
      -t THREADS --threads=THREADS  Number of threads to use for parallelization.
                                    0 allocates a thread per file. [default: 0]
      -v --verbose                  Enable log output. Can be specified up to three
                                    times (increasing verbosity each time).
    
    Examples:
      hdfscli -a prod /user/foo
      hdfscli download features.avro dat/
      hdfscli download logs/1987-03-23 - >>logs
      hdfscli upload -f - data/weights.tsv <weights.tsv
    
    HdfsCLI exits with return status 1 if an error occurred and 0 otherwise.
    

      

    To use hdfscli, first set up its default configuration file:

    # cat ~/.hdfscli.cfg 
    [global]
    default.alias = dev
    
    [dev.alias]
    url = http://hadoop:50070
    user = root
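
    Multiple aliases can be defined in the same file, each in its own <name>.alias section. For example, the prod alias referenced in the help examples above could be added alongside dev like this (the hostname here is a placeholder):

    [prod.alias]
    url = http://prod-namenode:50070
    user = root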
    

      Client classes available from Python:

        InsecureClient (default)

        TokenClient
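
      InsecureClient is the default used by the CLI and by Config, and it authenticates with a plain user name; TokenClient authenticates with a Hadoop delegation token instead. A minimal sketch (the URL and token value are placeholders):

    >>> from hdfs import TokenClient
    >>> client = TokenClient("http://hadoop:50070", token="<delegation-token>")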

     Uploading and downloading files

    Use hdfscli to upload a file or folder (here, uploading the hadoop configuration directory to /hdfs):

      # hdfscli upload --alias=dev -f /hadoop-2.4.1/etc/hadoop/ /hdfs

    Use hdfscli to download the /logs directory into the local /root/test directory:

      # hdfscli download /logs /root/test/
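
    The options from the help text combine with both commands; for example (paths are illustrative), -t splits a download across several threads and -A appends a single local file to an existing HDFS file:

      # hdfscli download --alias=dev -t 4 /logs /root/test/
      # hdfscli upload --alias=dev -A /root/extra.log /logs/app.log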

    hdfscli interactive mode

    [root@hadoop ~]# hdfscli --alias=dev
    
    Welcome to the interactive HDFS python shell.
    The HDFS client is available as `CLIENT`.
    
    >>> CLIENT.list("/")
    [u'Demo', u'hdfs', u'logs', u'logss']
    >>> CLIENT.status("/Demo")  
    {u'group': u'supergroup', u'permission': u'755', u'blockSize': 0,
     u'accessTime': 0, u'pathSuffix': u'', u'modificationTime': 1495123035501L, 
     u'replication': 0, u'length': 0, u'childrenNum': 1, u'owner': u'root', 
     u'type': u'DIRECTORY', u'fileId': 16389}
    >>> CLIENT.delete("logs/install.log")
    False
    >>> CLIENT.delete("/logs/install.log")         
    True
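
    Note that the first delete() call above returned False because relative paths are resolved against the connecting user's home directory (here /user/root), where logs/install.log did not exist; delete() returns False when the target is absent. Any other method of the client class can be called on CLIENT the same way, for example (paths are illustrative):

    >>> CLIENT.makedirs("/tmp/scratch")
    >>> CLIENT.rename("/logss", "/logs-old")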
    

      

    Python API bindings

      Initializing a client

      1. Import a client class and call its constructor directly:

    >>> from hdfs import InsecureClient
    >>> client = InsecureClient("http://172.10.236.21:50070",user='ann')
    >>> client.list("/")
    [u'Demo', u'hdfs', u'logs', u'logss']
    

      2. Import the Config class to load an existing configuration file and create a client from one of its aliases; by default the configuration is read from ~/.hdfscli.cfg (as set up above):

    >>> from hdfs import Config
    >>> client=Config().get_client("dev")
    >>> client.list("/")   
    [u'Demo', u'hdfs', u'logs', u'logss']
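
      get_client() can also be called with no argument, in which case the default.alias entry from the [global] section of the configuration file is used:

    >>> from hdfs import Config
    >>> client = Config().get_client()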
    

      

      Reading files

      The read() method reads a file from HDFS. It must be used inside a with block, which guarantees the connection is closed properly every time:

    >>> with client.read("/logs/yarn-env.sh",encoding="utf-8") as reader:
    ...   features=reader.read()
    ... 
    >>> print features
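
      To copy a whole file or folder to the local filesystem, the client also exposes download(), the programmatic counterpart of hdfscli download (paths here are illustrative):

    >>> client.download("/logs/yarn-env.sh", "/root/test/yarn-env.sh", n_threads=1)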
    

      Passing a chunk_size makes read() return a generator, which streams the file's contents chunk by chunk:

    >>> with client.read("/logs/yarn-env.sh",chunk_size=1024) as reader:
    ...   for chunk in reader:
    ...      print chunk
    ... 
    

      The delimiter parameter likewise returns a generator, yielding the file's contents split on the given delimiter (an encoding must be specified along with it):

    >>> with client.read("/logs/yarn-env.sh", encoding="utf-8", delimiter="
    ") as reader:
    ...   for line in reader:
    ...     time.sleep(1)
    ...     print line
    

      Writing files

    The write() method writes data to a file on HDFS (here, writing the lines of the local file kong.txt that start with "-" into /logs/kongtest.txt on HDFS):

    >>> with open("/root/test/kong.txt") as reader, client.write("/logs/kongtest.txt") as writer:
    ...   for line in reader:
    ...     if line.startswith("-"):
    ...       writer.write(line)
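
      write() can also be given the data directly instead of a writer block, and append=True adds to an existing file rather than overwriting it (path and content are illustrative):

    >>> client.write("/logs/kongtest.txt", data="one more line\n", encoding="utf-8", append=True)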
    

      

  • Original article: https://www.cnblogs.com/kongzhagen/p/6877472.html