• Linux Ops: Data Processing with Hadoop, Hive and Flume


    Environment

    Host Name        IP               Role
    Hadoop-Data01    192.168.0.194    Hadoop-Master / Hive / MySQL / Flume-Agent
    Hadoop-Data02    192.168.0.195    Hadoop-Slave

    Software versions:
    CentOS release 6.6 (Final)
    jdk-8u131-linux-x64
    Hadoop-2.7.3
    Hive-2.1.1
    apache-flume-1.7.0-bin
    Download the JDK, Hadoop, Hive and Flume:
    [root@Hadoop-Data01 soft]# wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
    [root@Hadoop-Data01 soft]# wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
    [root@Hadoop-Data01 soft]# wget http://apache.fayea.com/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz

    Hadoop Deployment

    Set the hostnames and edit /etc/hosts so that every host resolves the others correctly:
    [root@Hadoop-Data01 ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.0.194   Hadoop-Data01
    192.168.0.195   Hadoop-Data02
    192.168.0.196   Hadoop-Data03
    Note: the slave servers get the same /etc/hosts content.

      Configure passwordless SSH login between the Hadoop-Master and Hadoop-Slave hosts:

    [root@Hadoop-Data01 ~]# vim /etc/ssh/sshd_config
    RSAAuthentication yes
    PubkeyAuthentication yes
    Note: the same change can be made with sed: sed -i '47,48s/^#//g' /etc/ssh/sshd_config. Restart sshd afterwards (service sshd restart).
    [root@Hadoop-Data01 ~]# ssh-keygen -t rsa
    [root@Hadoop-Data01 .ssh]# cat authorized_keys 
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2JGjCEwc+H3/5Y939DHSkhHYAO7qPjO86gyaqvlN2j1ZMUhdKhXUmTH0pBBwXIqp9jooTXxtIu55cuBvOeBD6eUKN5mH9rydRIXm8HEvb9nQzOvVghP1E9lBTGsGXkUWDo0KPkFYOhb2NguYibzVUgpUpAt0NY5iqdenXNqvDOWGhWqDsg/C6VnUzsxskiT9x2EROhddWQnYsObXxjOasgdGPngzZsJZPchRboS+HfvVF0uSyUjljtKsQqYOX2Nt0plO4t6VlcnZXvjDXKezJCNwGToFvvoiIHnjVu/akgtv/bpd8HZp1dZEj7cYnSFkqN5xdodg7TmtjAjobutU5Q== root@Hadoop-Data01
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAvQ3JZOtdfFvrsM/m6YwQQuGkOCpNt0+tw87tS4p1gB98ZAn+zaUnFMw5Gvo0i1KvHVaxmb0s1gqDjGDNVLQM5MB60emyVFHLs6DZBI5f4c0BiA17KfDRzlsfuTmuLdymmoj54OhPbEcH+mwo/N1UK9V0gqxAB9abC6UFT00MXXXJN1+qBkV9mUuFbXhn4m5/DCoEbIxvMlWghAsSrDtMaMtJYRumRvd7MLwwefdCYyQd8dZASE1Z8VP0K/BDRntWXCeKGCVMb4uJAnSdhN6ZcRme/Qlx0YCkPpQir3jgcblVW5RODNUyaIc+vUMp9UYagvK7nKKfWAGa/MPdyfu2nw== root@Hadoop-Data02
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA1pC5Py1aqbojVetakak3WmxJf4DgmTe1ci60tn9Hyq84kdAhw7z1lAQN544uPDDvl4XPki36Y13Hjl0P+S3g11iOi42FRugkBDmokqADZrUfp5tqWX8K9QvYMePoyiuQlnrGAyCpOiMmEAykBR6lVkNHgPAWThjU9eggt6dalMPiy/dDKZNemlWGHy8wdS5PyjVsIuDGgTtNLADn6OOaYcO/UWq78gqc1Nkq4mNxKSTYorh7taki9SKw4cq0NeggDFz7cZEewtgJdRla0W2ZKz8bgfuUSSntbN55/uCVUSgK+kurqRmklQ3sA3c9687BH1Lse5luDFJRaYo2wa5nlQ== root@Hadoop-Data03
    Note: merge /root/.ssh/id_rsa.pub from all three servers into authorized_keys.
    [root@Hadoop-Data01 .ssh]# scp authorized_keys root@192.168.0.195:/root/.ssh/
    [root@Hadoop-Data01 .ssh]# scp authorized_keys root@192.168.0.196:/root/.ssh/

      Install the JDK on every host

    [root@Hadoop-Data01 soft]# tar -xf jdk-8u131-linux-x64.tar.gz
    [root@Hadoop-Data01 soft]# cp -r jdk1.8.0_131 /usr/local/
    [root@Hadoop-Data01 soft]# cd /usr/local/
    [root@Hadoop-Data01 local]# ln -s jdk1.8.0_131 jdk
    [root@Hadoop-Data01 ~]# vim /etc/profile
    >>>>>
    ulimit -n 10240
    export JAVA_HOME=/usr/local/jdk
    export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
    [root@Hadoop-Data01 ~]# source /etc/profile
    [root@Hadoop-Data03 ~]# java -version
    java version "1.8.0_131"
    Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

     Install Hadoop

    Extract Hadoop, then edit /usr/local/hadoop/etc/hadoop/core-site.xml:
    [root@Hadoop-Data01 soft]# tar -xf hadoop-2.7.3.tar.gz
    [root@Hadoop-Data01 soft]# mv hadoop-2.7.3 /usr/local/
    [root@Hadoop-Data01 soft]# cd /usr/local/
    [root@Hadoop-Data01 local]# ln -s hadoop-2.7.3 hadoop
    [root@Hadoop-Data01 hadoop]# vim core-site.xml
    >>>>>
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://192.168.0.194:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/usr/local/hadoop/tmp</value>
        </property>
        <property>
            <name>io.file.buffer.size</name>
            <value>1024</value>
        </property>
    </configuration>
    Notes:
    <fs.defaultFS>: the name of the default file system, in URI form. The URI scheme selects the file-system implementation class (fs.SCHEME.impl); the authority part carries the host and port. The default is the local file system. HDFS clients need this parameter to reach HDFS, e.g. hdfs://192.168.0.194:9000 (with HA you would put the service name here instead).
    <hadoop.tmp.dir>: Hadoop's temporary directory on the local disk; many other paths are derived from it. Only one value may be set. Put it somewhere with enough space rather than the default /tmp. This is a server-side parameter; changing it requires a restart.
    <io.file.buffer.size>: the buffer size used when reading and writing files. It should be a multiple of the memory page size; around 1 MB is suggested, so the 1024 bytes configured above is on the small side.
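    As a quick sanity check of the page-size rule above, the shell sketch below verifies that a candidate buffer size is a multiple of the kernel page size. The value 131072 (128 KB) is only an assumed example, not the value configured in this deployment:

```shell
# Check that a candidate io.file.buffer.size is a multiple of the
# kernel page size reported by getconf. 131072 is an example value.
page=$(getconf PAGESIZE)
buf=131072
if [ $((buf % page)) -eq 0 ]; then
    echo "$buf is a multiple of the page size ($page)"
else
    echo "$buf is NOT a multiple of the page size ($page)"
fi
```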
    
    ----------
    
    Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml:
    [root@Hadoop-Data01 hadoop]# vim hdfs-site.xml
    >>>>>
    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/usr/local/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/usr/local/hadoop/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>192.168.0.194:9001</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
    </configuration>
    Notes:
    <dfs.namenode.name.dir>: local directory where the NameNode stores the fsimage. A comma-separated list may be given, in which case the fsimage is written to every directory for redundancy, ideally on separate disks; a single failed disk is skipped and does not bring the system down. With HA one directory is usually enough; set two if you are particularly cautious about safety.
    <dfs.datanode.data.dir>: local directories where HDFS stores its blocks. A comma-separated list may be given (typically one directory per disk); the directories are used round-robin, one block to each in turn, and each block is stored only once per machine. The directories must already exist; missing ones are silently ignored.
    <dfs.replication>: the default number of block replicas, used when no replica count is specified at file-creation time; clients can set it per file and it can also be changed later from the command line, so different files may have different replica counts.
    <dfs.namenode.secondary.http-address>: the HTTP address of the SecondaryNameNode; a port of 0 makes the service pick a random free port. With HA the SecondaryNameNode is no longer used.
    <dfs.webhdfs.enabled>: enables WebHDFS (the REST API) on the NameNode and DataNodes.
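    Since dfs.datanode.data.dir entries that do not exist are silently ignored, it is worth pre-creating the directories named in hdfs-site.xml before formatting. A sketch, run here against a scratch prefix so it is safe to try anywhere; on the real hosts you would set prefix to /usr/local/hadoop:

```shell
# Pre-create the NameNode/DataNode/tmp directories from the config above.
# A scratch prefix stands in for /usr/local/hadoop on the cluster.
prefix=$(mktemp -d)
mkdir -p "$prefix/dfs/name" "$prefix/dfs/data" "$prefix/tmp"
ls -d "$prefix"/dfs/*
```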
    
    ----------
    
    Edit /usr/local/hadoop/etc/hadoop/mapred-site.xml:
    [root@Hadoop-Data01 hadoop]# cp mapred-site.xml.template mapred-site.xml
    [root@Hadoop-Data01 hadoop]# vim mapred-site.xml
    >>>>>
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>192.168.0.194:10020</value>
        </property>
        <property>
           <name>mapreduce.jobhistory.webapp.address</name>
            <value>192.168.0.194:19888</value>
        </property>
    </configuration>
    
    Edit /usr/local/hadoop/etc/hadoop/yarn-site.xml:
    [root@Hadoop-Data01 hadoop]# vim yarn-site.xml
    >>>>>
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>192.168.0.194:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>192.168.0.194:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>192.168.0.194:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>192.168.0.194:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>192.168.0.194:8088</value>
        </property>
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>8192</value>
        </property>
    </configuration>
    Notes:
    <mapreduce.framework.name>: depending on this setting and the job, MapReduce offers two execution modes. ① Local mode (implemented by LocalJobRunner): with the value local, jobs run on the local node without allocating resources from the YARN cluster; they cannot exploit the cluster and do not appear in the web UI. ② YARN mode (implemented by YARNRunner): with the value yarn, the client talks to the server through YARNRunner, whose real work is done via ClientRMProtocol against the ResourceManager, covering application submission, status queries, and so on.
    <mapreduce.jobhistory.address> and <mapreduce.jobhistory.webapp.address>: Hadoop ships a history server through which you can inspect completed MapReduce jobs: how many map and reduce tasks were used, submission time, start time, completion time, and so on. It is not started by start-all.sh; start it separately with sbin/mr-jobhistory-daemon.sh start historyserver.
    
    ----------
    
    Configure the Hadoop environment variables:
    [root@Hadoop-Data01 hadoop]# vim /etc/profile
    >>>>>
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$HADOOP_HOME/bin:$PATH
    
    [root@Hadoop-Data01 hadoop]# vim hadoop-env.sh
    >>>>>
    export JAVA_HOME=/usr/local/jdk
    
    Add the slave node's IP to the slaves file:
    [root@Hadoop-Data01 hadoop]# echo 192.168.0.195 > slaves
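    If more workers are added later (for example a Hadoop-Data03 at 192.168.0.196, as hinted at by /etc/hosts above), the slaves file simply lists one worker IP or hostname per line. A sketch on a scratch file; on the master the real file is /usr/local/hadoop/etc/hadoop/slaves:

```shell
# One worker per line; a scratch file stands in for
# /usr/local/hadoop/etc/hadoop/slaves.
slaves=$(mktemp)
printf '%s\n' 192.168.0.195 192.168.0.196 > "$slaves"
cat "$slaves"
```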
    
    Copy the Hadoop directory to the slave host:
    [root@Hadoop-Data01 local]# scp -r hadoop-2.7.3 root@192.168.0.195:/usr/local/
    
    Go to the Hadoop directory and start the services on the Hadoop-Master host:
    ① Format the NameNode:
    [root@Hadoop-Data01 bin]# sh /usr/local/hadoop/bin/hdfs namenode -format
    ② Start the services:
    [root@Hadoop-Data01 sbin]# sh /usr/local/hadoop/sbin/start-all.sh
    ③ Stop the services:
    [root@Hadoop-Data01 sbin]# sh /usr/local/hadoop/sbin/stop-all.sh
    ④ Check the running components:
    [root@Hadoop-Data01 sbin]# jps
    6517 SecondaryNameNode
    6326 NameNode
    6682 ResourceManager
    6958 Jps

     Verify web access

    Browse to http://192.168.0.194:8088/ (the YARN ResourceManager UI)
    

    Browse to http://192.168.0.194:50070/ (the HDFS NameNode UI)
    

     

    Deploy Hive

     Extract Hive and configure its environment:

    [root@Hadoop-Data01 soft]# tar -xf  apache-hive-2.1.1-bin.tar.gz
    [root@Hadoop-Data01 soft]# mv apache-hive-2.1.1-bin /usr/local/
    [root@Hadoop-Data01 soft]# cd /usr/local/
    [root@Hadoop-Data01 local]# ln -s apache-hive-2.1.1-bin hive
    [root@Hadoop-Data01 conf]# cp hive-env.sh.template hive-env.sh
    [root@Hadoop-Data01 conf]# vim hive-env.sh
    >>>>>
    HADOOP_HOME=/usr/local/hadoop
    export HIVE_CONF_DIR=/usr/local/hive/conf
    export HIVE_AUX_JARS_PATH=/usr/local/hive/lib

     Install and set up MySQL

    [root@Hadoop-Data01 conf]# yum install httpd php mysql mysql-server php-mysql -y
    [root@Hadoop-Data01 conf]# service mysqld start
    [root@Hadoop-Data01 conf]# /usr/bin/mysqladmin -u root password 'hadoopmysql'
    [root@Hadoop-Data01 conf]# /usr/bin/mysqladmin -u root -h192.168.0.194 password 'hadoopmysql'
    [root@Hadoop-Data01 conf]# mysql -uroot -phadoopmysql
    mysql> create user 'hive' identified by 'hive';
    mysql> grant all privileges on *.* to 'hive'@'localhost' identified by 'hive';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> grant all privileges on *.* to 'hive'@'%' identified by 'hive';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> create database hive;
    Query OK, 1 row affected (0.00 sec)

     Edit the Hive configuration file:

    [root@Hadoop-Data01 conf]# cp hive-default.xml.template hive-site.xml
    [root@Hadoop-Data01 conf]# vim hive-site.xml
    Line 44: >>>>>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive/iotmp</value>
    Batch replace (use # as the delimiter, since the replacement path contains slashes): :%s#${system:java.io.tmpdir}#/usr/local/hive/iotmp#g
    Line 486: >>>>>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    Line 501: >>>>>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    Line 686: >>>>>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    Line 933: >>>>>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    Line 957: >>>>>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
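    The same ${system:java.io.tmpdir} replacement can be scripted with sed instead of vim. A sketch on a scratch file standing in for hive-site.xml; on the server, point f at /usr/local/hive/conf/hive-site.xml. As with the vim command, the # delimiter avoids clashing with the slashes in the replacement path:

```shell
# Replace every ${system:java.io.tmpdir} with /usr/local/hive/iotmp,
# demonstrated on a scratch file containing one typical value.
f=$(mktemp)
printf '<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>\n' > "$f"
sed -i 's#${system:java.io.tmpdir}#/usr/local/hive/iotmp#g' "$f"
cat "$f"
```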
    
    Copy the MySQL JDBC driver into Hive's lib directory:
    [root@Hadoop-Data01 mysql-connector-java-5.1.42]# cp mysql-connector-java-5.1.42-bin.jar  /usr/local/hive/lib/
    
    A minimal hive-site.xml:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- Metastore database connection URL -->
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
        </property>
        <!-- JDBC driver class -->
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
        <!-- Database user -->
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value>
        </property>
        <!-- Database password -->
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hive</value>
        </property>
        <!-- Hive's data warehouse directory; defaults to /user/hive/warehouse on HDFS -->
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
        </property>
        <!-- Hive's temporary-file directory; defaults to /tmp/hive on HDFS -->
        <property>
            <name>hive.exec.scratchdir</name>
            <value>/tmp/hive</value>
        </property>
    </configuration>

     Initialize the metastore schema in MySQL

    [root@Hadoop-Data01 bin]# schematool -initSchema -dbType mysql      # after initialization, the hive database in MySQL is populated with the metastore tables
    which: no hbase in (/usr/local/hive/bin:/usr/local/hive/conf:/usr/local/hadoop/bin:/usr/local/jdk//bin:/usr/local/jdk//jre/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:    jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
    Metastore Connection Driver :    com.mysql.jdbc.Driver
    Metastore connection User:   hive
    Starting metastore schema initialization to 2.1.0
    Initialization script hive-schema-2.1.0.mysql.sql
    Initialization script completed
    schemaTool completed

     Start Hive

    [root@Hadoop-Data01 bin]# ./hive
    Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties
    hive>
    hive> show functions;   -- list Hive's built-in functions
    hive> desc function day;    -- show details for the day function
    OK
    day(param) - Returns the day of the month of date/timestamp, or day component of interval
    Time taken: 0.039 seconds, Fetched: 1 row(s)

    Deploy Flume

    I. Overview

    1. Flume is a distributed log-collection system that forwards the data it gathers to a destination.
    2. Flume's core concept is the agent: a Java process that runs on a log-collection node.
    3. An agent contains three core components: source, channel, and sink. The source collects logs and can handle many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, legacy, and custom sources. Data collected by the source is staged in the channel, the agent's temporary store, which can be backed by memory, jdbc, file, or a custom store; an event is removed from the channel only after the sink has delivered it successfully. The sink ships the data to its destination: hdfs, logger, avro, thrift, ipc, file, null, hbase, solr, or a custom sink.
    4. What flows through the whole pipeline is the event; transactional guarantees are made at the event level.
    5. Flume supports multi-level agent chains, including fan-in and fan-out topologies.
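    Before wiring up the real topology below, the source → channel → sink model can be tried with a minimal single-agent config: a netcat source echoing into a logger sink. This is a sketch for experimentation only, not part of the deployment; the names a1/r1/c1/k1 are arbitrary.

```properties
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Netcat source: each line sent to localhost:44444 becomes one event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# In-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Logger sink: prints each event to the agent's log
a1.sinks.k1.type = logger

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

    Run it with flume-ng agent -c conf -f netcat.conf -n a1 -Dflume.root.logger=INFO,console and type into telnet localhost 44444; each line appears as an event in the console.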

    II. Installation

     Extract Flume and move it to /usr/local/ (installed on the Hadoop server):

    [root@Hadoop-Data01 soft]# cp -r apache-flume-1.7.0-bin /usr/local/
    [root@Hadoop-Data01 soft]# cd /usr/local/
    [root@Hadoop-Data01 local]# ln -s apache-flume-1.7.0-bin flume
    [root@Hadoop-Data01 conf]# cp flume-env.sh.template flume-env.sh
    [root@Hadoop-Data01 conf]# vim flume-env.sh
    >>>>>
    export JAVA_HOME=/usr/local/jdk
    [root@Hadoop-Data01 bin]# ./flume-ng version
    Flume 1.7.0
    Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
    Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
    Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
    From source with checksum 0d21b3ffdc55a07e1d08875872c00523

     Install Flume on the server whose logs are to be collected (a Windows machine here), then configure conf/flume-conf.properties:

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = spooldir
    a1.sources.r1.channels = c1
    # Collect the files that appear in this directory
    a1.sources.r1.spoolDir = D:\flume\log
    a1.sources.r1.fileHeader = true
    a1.sources.r1.basenameHeader = true
    a1.sources.r1.basenameHeaderKey = fileName
    a1.sources.r1.ignorePattern = ^(.)*\.tmp$
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = timestamp
    
    # Avro sink pointing at the receiving (Linux) agent
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = 192.168.0.194
    a1.sinks.k1.port = 19949
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type=memory  
    a1.channels.c1.capacity=10000  
    a1.channels.c1.transactionCapacity=1000  
    a1.channels.c1.keep-alive=30  
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

     Start the collection-side service on the Windows machine:

    D:\apache-flume-1.7.0-bin\bin>flume-ng.cmd agent --conf ..\conf --conf-file ..\conf\flume-conf.properties --name a1

     Configure the Linux side; the agent's conf/flume-conf.properties:

    tier1.sources=source1
    tier1.channels=channel1
    tier1.sinks=sink1
    
    tier1.sources.source1.type=avro
    # Address this receiving agent listens on
    tier1.sources.source1.bind=192.168.0.194
    tier1.sources.source1.port=19949
    tier1.sources.source1.channels=channel1
    
    
    tier1.channels.channel1.type=memory
    tier1.channels.channel1.capacity=10000
    tier1.channels.channel1.transactionCapacity=1000
    tier1.channels.channel1.keep-alive=30
    
    tier1.sinks.sink1.channel=channel1
    
    tier1.sources.source1.interceptors=e1 e2
    tier1.sources.source1.interceptors.e1.type=com.huawei.flume.InterceptorsCommons$Builder
    tier1.sources.source1.interceptors.e2.type=com.huawei.flume.InterceptorsFlows$Builder
    
    tier1.sinks.sink1.type = hdfs
    # Write into the Hive warehouse; %{table_name} is taken from an event
    # header set by the interceptors above
    tier1.sinks.sink1.hdfs.path=hdfs://192.168.0.194:9000/user/hive/warehouse/%{table_name}/inputdate=%Y-%m-%d
    tier1.sinks.sink1.hdfs.writeFormat = Text
    tier1.sinks.sink1.hdfs.fileType = DataStream
    tier1.sinks.sink1.hdfs.fileSuffix = .log
    tier1.sinks.sink1.hdfs.rollInterval = 0
    tier1.sinks.sink1.hdfs.rollCount = 0
    # Roll on file size only: 125829120 bytes (120 MB)
    tier1.sinks.sink1.hdfs.rollSize = 125829120
    tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
    tier1.sinks.sink1.hdfs.idleTimeout = 60
    tier1.sinks.sink1.hdfs.minBlockReplicas = 1

     Start the agent service on the Linux side:

    [root@Hadoop-Data01 conf]# flume-ng agent -c /usr/local/flume/conf/ -f /usr/local/flume/conf/flume-conf.properties -n tier1 -Dflume.root.logger=DEBUG,console
     

    This article is from cnblogs (博客园), author: 白日梦想家Zz. When reposting, please credit the original link: https://www.cnblogs.com/zzlain/p/6895346.html
