Big Data Training

Training Contents

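This training focuses on fundamental big data knowledge, combined with current mainstream frameworks (the Spring Boot framework plus front-end HTML, CSS, and JS).
a. Environment setup
b. How the Spark framework processes data, the main topic of this training:
1. Offline (batch) data processing
2. Real-time data processing
c. Integrating the main Java back-end framework, Spring Boot, with ECharts:
1. Basic use of the Spring Boot framework
2. Data visualization
d. Project packaging and running:
1. Maven as the project management tool for this training: dependency (jar) management, packaging, and deployment
2. Project presentation and defense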

Environment Setup

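a. Hadoop cluster setup
b. Flume environment setup
1. Download the installation package from the official website
2. Configure the Flume environment variables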

Extract the Flume package:
[root@master ~]# tar -zxvf /soft/apache-flume-1.7.0-bin.tar.gz -C /opt

Configure the Flume environment variables:
[root@master ~]# vi /etc/profile

# Flume environment variables
export FLUME_HOME=/opt/apache-flume-1.7.0-bin
export PATH=$JAVA_HOME/bin:$FLUME_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH:$HOME/bin

Apply the environment variables and check the installed version:
[root@master ~]# source /etc/profile
[root@master ~]# flume-ng version
Flume 1.7.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
From source with checksum 0d21b3ffdc55a07e1d08875872c00523
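Editing /etc/profile by hand with vi works, but repeating the setup can leave duplicate export lines. A minimal sketch of an idempotent alternative follows; the helper name append_flume_env is hypothetical, it assumes the install path used above, and writing to /etc/profile requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flume): append the Flume environment
# variables to a profile file only if they are not already there.
append_flume_env() {
  local profile="$1"     # e.g. /etc/profile (needs root) or ~/.bashrc
  local flume_home="$2"  # e.g. /opt/apache-flume-1.7.0-bin

  # Do nothing if a FLUME_HOME export already exists in the file
  if grep -q '^export FLUME_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi

  # Single quotes keep $FLUME_HOME and $PATH literal, so they are expanded
  # later, when the profile is sourced
  {
    echo '# Flume environment variables'
    echo "export FLUME_HOME=${flume_home}"
    echo 'export PATH=$FLUME_HOME/bin:$PATH'
  } >> "$profile"
}
```

Usage: run append_flume_env /etc/profile /opt/apache-flume-1.7.0-bin, then source /etc/profile and check the result with flume-ng version as above. Running it a second time leaves the file unchanged.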

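Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting massive amounts of log data. It can be thought of as an agent made up of three parts, a source, a channel, and a sink: the source takes in data from the data source, the channel acts as the pipe that buffers it, and the sink delivers ("sinks") it to the destination.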
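The three-part structure maps directly onto an agent's configuration: the agent declares its components, then each source lists the channel(s) it writes to (the plural key channels) and each sink names the single channel it reads from (the singular key channel). As an illustration only, the hypothetical function flume_wiring below prints that wiring for given component names; the component types (netcat, memory, logger, and so on) still have to be added by hand, as in the configurations that follow.

```shell
#!/usr/bin/env bash
# Hypothetical generator: print the declaration and wiring lines of a
# single-source, single-channel, single-sink Flume agent.
# Arguments: agent name, source name, channel name, sink name.
flume_wiring() {
  local agent="$1" src="$2" ch="$3" sink="$4"
  cat <<EOF
${agent}.sources = ${src}
${agent}.channels = ${ch}
${agent}.sinks = ${sink}
# A source may feed several channels, hence the plural key "channels"
${agent}.sources.${src}.channels = ${ch}
# A sink drains exactly one channel, hence the singular key "channel"
${agent}.sinks.${sink}.channel = ${ch}
EOF
}
```

For example, flume_wiring a1 r1 c1 k1 prints the same component and binding lines used in the sample configuration below.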

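Netcat Example

Netcat works directly on sockets, reading and writing raw data over network connections.

First, install netcat:
[root@master ~]# yum install nc -y     # yum downloads and installs the package; -y answers yes to every prompt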

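Basic nc usage:
On the server, listen on port 44444:
[root@master ~]# nc -kl 44444          # now listening on port 44444; -k keeps the listener open for further connections

On the client, connect to port 44444:
[root@master ~]# nc master 44444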
hello world

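Text typed on either side now appears on the other, so the server and the client can talk to each other directly.
A simple example (the single-node sample configuration from the official Flume User Guide):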
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
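A common mistake in these files is a binding that points to a channel name that was never declared, for instance through a typo. A rough, hypothetical checker based on awk follows; check_bindings is not a Flume tool, it only understands the simple "key = value" lines used here, and it is no substitute for actually starting the agent.

```shell
#!/usr/bin/env bash
# Hypothetical checker (not a Flume tool): report channel bindings that refer
# to channels missing from "<agent>.channels" in a Flume properties file.
# Usage: check_bindings <file> <agent>; returns non-zero if problems are found.
check_bindings() {
  local file="$1" agent="$2"
  awk -v a="$agent" '
    # Skip comment lines and blank lines
    /^[ \t]*#/ || NF == 0 { next }
    {
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/[ \t]/, "", key)            # keys contain no spaces
      gsub(/[ \t]+/, " ", val)          # collapse runs of whitespace
      gsub(/^ | $/, "", val)            # trim leading/trailing space
      if (key == (a ".channels")) {
        n = split(val, names, " ")
        for (i = 1; i <= n; i++) declared[names[i]] = 1
      } else if (key ~ ("^" a "\\.sources\\.[^.]+\\.channels$") ||
                 key ~ ("^" a "\\.sinks\\.[^.]+\\.channel$")) {
        refs[++nrefs] = key "=" val     # remember the binding for the END check
      }
    }
    END {
      bad = 0
      for (r = 1; r <= nrefs; r++) {
        split(refs[r], kv, "=")
        n = split(kv[2], names, " ")
        for (i = 1; i <= n; i++) {
          if (!(names[i] in declared)) {
            print "undeclared channel \"" names[i] "\" in " kv[1]
            bad = 1
          }
        }
      }
      exit bad
    }
  ' "$file"
}
```

Because the bindings are checked only after the whole file has been read, the order of the lines does not matter. Running check_bindings on the example above with agent a1 should print nothing and return 0.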

Hands-on example:
Create a flume-conf directory dedicated to Flume configuration files:
[root@master ~]# mkdir ~/flume-conf

Create the agent configuration file (written with vi, then displayed with cat):
[root@master ~]# vi ~/flume-conf/nc-flume-logger.properties
[root@master ~]# cat ~/flume-conf/nc-flume-logger.properties 
# A Flume job runs as an agent; a1 is the agent's name
# Define the sources
a1.sources = r1
# Define the sinks
a1.sinks = k1
# Define the channels
a1.channels = c1

# Describe/configure the source; binding to master lets clients connect with nc master 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = master
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Channel configuration: a memory channel that buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source to the channel
a1.sources.r1.channels = c1
# Bind the sink to the channel
a1.sinks.k1.channel = c1

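Start the Flume agent to collect data (-c: Flume's configuration directory; -n: the agent name, which must match the a1 prefix in the file; -f: the agent configuration file; -Dflume.root.logger=INFO,console: log to the console, where the logger sink prints each event):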
[root@master ~]# flume-ng agent -c /opt/apache-flume-1.7.0-bin/conf/ -n a1 -f ~/flume-conf/nc-flume-logger.properties -Dflume.root.logger=INFO,console

On the client (the netcat source replies OK for each line it accepts as an event):
[root@master ~]# nc master 44444
hello flume
OK
hello hadoop
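Typing lines by hand is fine for a first test. To feed the agent a steady stream instead, a small generator can be piped into nc. The function name gen_lines and the "test event N" line format are hypothetical choices for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical test-data generator: print <count> numbered lines and pause
# <interval> seconds after each one (default 0, meaning no pause).
gen_lines() {
  local count="$1" interval="${2:-0}" i
  for ((i = 1; i <= count; i++)); do
    echo "test event ${i}"
    if [ "$interval" != "0" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, gen_lines 100 1 | nc master 44444 sends 100 lines, one per second, and each line should appear as one event in the agent's console. Whether nc exits on its own once the input ends depends on the nc implementation installed, so it may still need Ctrl+C.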

To stop the Flume agent, press Ctrl+C in the terminal where it is running.
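Ctrl+C is enough for an interactive test. When the agent should keep running after the terminal closes, one option is to start it with nohup in the background and record its PID. The sketch below does this with two hypothetical helpers, start_agent and stop_agent; the pid and log file locations are up to the caller, the FLUME_HOME fallback is an assumption taken from the install path above, and FLUME_NG exists only so the launcher can be swapped out (for example for a dry run).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for running a Flume agent in the background.
# FLUME_NG may be overridden; by default the flume-ng launcher on PATH is used.
FLUME_NG="${FLUME_NG:-flume-ng}"

# start_agent <agent-name> <conf-file> <pid-file> <log-file>
start_agent() {
  local name="$1" conf="$2" pidfile="$3" log="$4"
  # Assumed fallback if FLUME_HOME is not set in this shell
  local home="${FLUME_HOME:-/opt/apache-flume-1.7.0-bin}"
  # nohup keeps the agent alive after the terminal closes; all output,
  # including the logger sink's events, goes to the log file
  nohup "$FLUME_NG" agent -c "${home}/conf" -n "$name" -f "$conf" \
    -Dflume.root.logger=INFO,console >"$log" 2>&1 &
  echo "$!" >"$pidfile"   # PID of the background process
}

# stop_agent <pid-file>: send SIGTERM (kill's default signal) and clean up
stop_agent() {
  local pidfile="$1" pid
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  kill "$pid" 2>/dev/null || echo "process $pid was not running" >&2
  rm -f "$pidfile"
}
```

For example, start_agent a1 ~/flume-conf/nc-flume-logger.properties /tmp/flume-a1.pid /tmp/flume-a1.log starts the agent from the example above, tail -f /tmp/flume-a1.log shows its events, and stop_agent /tmp/flume-a1.pid stops it. Ctrl+C sends SIGINT instead of SIGTERM, but both ask the JVM to shut down.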

Original post: https://www.cnblogs.com/thx2199/p/16416310.html