• Accessing HDFS from Hadoop through the C API


    Hadoop provides a C API for accessing HDFS. Below is a brief introduction.

    Environment: Ubuntu 14.04, Hadoop 1.0.1, JDK 1.7.0_51

    The functions for accessing HDFS are declared mainly in the header file hdfs.h, which lives under hadoop-1.0.1/src/c++/libhdfs/. The corresponding library is libhdfs.so, located in hadoop-1.0.1/c++/Linux-amd64-64/lib/. Accessing HDFS also depends on the JDK's APIs: the header directories are jdk1.7.0_51/include/ and jdk1.7.0_51/include/linux/, and the library is libjvm.so under jdk1.7.0_51/jre/lib/amd64/server/. All of these include directories and libraries must be supplied at compile and link time. Below is a simple source program, main.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "hdfs.h"

    int main(int argc, char **argv)
    {
        /*
         * Connect to hdfs.
         */
        hdfsFS fs = hdfsConnect("127.0.0.1", 9000);
        if(!fs)
        {
            fprintf(stderr, "Failed to connect to hdfs.\n");
            exit(-1);
        }

        /*
         * Create and open a file in hdfs.
         */
        const char* writePath = "/user/root/output/testfile.txt";
        hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
        if(!writeFile)
        {
            fprintf(stderr, "Failed to open %s for writing!\n", writePath);
            exit(-1);
        }

        /*
         * Write data to the file (the trailing '\0' is written too).
         */
        const char* buffer = "Hello, World!";
        tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);

        /*
         * Flush the write buffer.
         */
        if (hdfsFlush(fs, writeFile))
        {
            fprintf(stderr, "Failed to 'flush' %s\n", writePath);
            exit(-1);
        }

        /*
         * Close the file.
         */
        hdfsCloseFile(fs, writeFile);

        /*
         * Open the same file for reading.
         */
        unsigned bufferSize = 1024;
        const char* readPath = "/user/root/output/testfile.txt";
        hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, bufferSize, 0, 0);
        if (!readFile) {
            fprintf(stderr, "couldn't open file %s for reading\n", readPath);
            exit(-2);
        }

        // buffer for the data read back from the file
        char* rbuffer = (char*)malloc(sizeof(char) * (bufferSize + 1));
        if(rbuffer == NULL) {
            return -2;
        }

        // read from the file; a short read means end-of-file
        tSize curSize = bufferSize;
        for (; curSize == (tSize)bufferSize;) {
            curSize = hdfsRead(fs, readFile, (void*)rbuffer, bufferSize);
            if (curSize < 0) {
                fprintf(stderr, "failed to read from %s\n", readPath);
                break;
            }
            rbuffer[curSize] = '\0';  // NUL-terminate before printing
            fprintf(stdout, "read '%s' from file!\n", rbuffer);
        }

        free(rbuffer);
        hdfsCloseFile(fs, readFile);

        /*
         * Disconnect from hdfs.
         */
        hdfsDisconnect(fs);

        return 0;
    }

    The program is fairly simple, and the important places are commented, so it will not be explained line by line. What it does: create a file named testfile.txt under the HDFS directory /user/root/output/, write Hello, World! into it, then read Hello, World! back from the file and print it. If your HDFS does not have a /user/root/output/ directory, you need to create one first (see the sketch below) or change the path to one that already exists.
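
    If you prefer to create the directory from C rather than with the hadoop shell, libhdfs also declares hdfsCreateDirectory and hdfsExists in hdfs.h. The following is a minimal sketch, assuming the Hadoop 1.x behavior (hdfsExists returns 0 when the path exists; hdfsCreateDirectory creates missing parents and returns 0 on success); it could be placed before the hdfsOpenFile call in the program above:

        /* Sketch: ensure the output directory exists before opening the file. */
        const char* outputDir = "/user/root/output";
        if (hdfsExists(fs, outputDir) != 0 && hdfsCreateDirectory(fs, outputDir) != 0)
        {
            fprintf(stderr, "Failed to create directory %s\n", outputDir);
            exit(-1);
        }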

    Below is the compile-and-link command on my system:

    g++ main.c -I /root/hadoop-1.0.1/src/c++/libhdfs/ -I /usr/java/jdk1.7.0_51/include/ -I /usr/java/jdk1.7.0_51/include/linux/ -L /root/hadoop-1.0.1/c++/Linux-amd64-64/lib/ -lhdfs -L /usr/java/jdk1.7.0_51/jre/lib/amd64/server/ -ljvm -o hdfs-test

    Here, g++ is the compiler command, the paths after -I are the header include paths, the paths after -L are the library search paths, and -lhdfs and -ljvm are the names of the specific libraries to link. Replace the paths with the corresponding ones on your system. At this point compilation should succeed, but at runtime the program will report that libhdfs.so.0 and libjvm.so cannot be found. The fix is to append the directories containing these libraries to /etc/ld.so.conf and then run the ldconfig command; this effectively registers the libraries with the dynamic loader, so they can be found at runtime.
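
    For example, something like the following (run as root; the paths are the ones from the compile command above and must match your system, and some distributions use /etc/ld.so.conf.d/ instead):

    echo "/root/hadoop-1.0.1/c++/Linux-amd64-64/lib" >> /etc/ld.so.conf
    echo "/usr/java/jdk1.7.0_51/jre/lib/amd64/server" >> /etc/ld.so.conf
    ldconfig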

• Original article: https://www.cnblogs.com/caoyingjie/p/3794250.html