• [Original] Big Data Basics: Logstash (1) Introduction, Installation, and Usage


    Logstash 6.6.2

    Official site: https://www.elastic.co/products/logstash

    I Introduction

    Centralize, Transform & Stash Your Data

    Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.” (Ours is Elasticsearch, naturally.)

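    Centralize, transform, and store your data: Logstash is an open source server-side data processing pipeline that receives data from many sources at once, transforms it, and sends it on to your data store.

    Structure

    A Logstash pipeline has two required elements, input and output, and one optional element, filter.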
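
    As a minimal sketch of that structure (not taken from the original post), the pipeline below reads lines typed on standard input and prints each resulting event to standard output. The stdin and stdout plugins ship with Logstash; the empty filter block is only there to show where the optional stage goes.

    ```conf
    # Required: where events come from.
    input {
        stdin {}
    }
    # Optional: how events are parsed or transformed (empty here).
    filter {
    }
    # Required: where events are sent.
    output {
        stdout {}
    }
    ```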

    1 INPUTS

    Ingest Data of All Shapes, Sizes, and Sources

    Data is often scattered or siloed across many systems in many formats. Logstash supports a variety of inputs that pull in events from a multitude of common sources, all at the same time. Easily ingest from your logs, metrics, web applications, data stores, and various AWS services, all in continuous, streaming fashion.

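    Ingest data of any shape, size, and source: data is usually scattered across many systems in many formats. Logstash supports many input types that pull data from all of these sources at once, including logs, metrics, web applications, and data stores.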

    2 FILTERS

    Parse & Transform Your Data On the Fly

    As data travels from source to store, Logstash filters parse each event, identify named fields to build structure, and transform them to converge on a common format for easier, accelerated analysis and business value.

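    Parse and transform your data on the fly: as data moves from source to store, Logstash filters parse each event, identify named fields to give it structure, and convert it to a common format so that later analysis is simpler and faster.

    The most commonly used filters are grok (regular-expression matching) and ruby (arbitrary code); mutate, date, json, and kv are also common. Together they can parse most data formats.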
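
    To make that list concrete, here is a hedged sketch of a filter block that chains several of these plugins. The field names (payload, query, elapsed_ms, elapsed_s) are hypothetical and chosen only for illustration; the options shown (source, field_split, convert, code) are standard ones, but check each plugin's documentation for your version.

    ```conf
    filter {
        # Parse a JSON string held in the hypothetical "payload" field
        # and add its keys to the event.
        json {
            source => "payload"
        }
        # Split a hypothetical "query" field such as "a=1&b=2" into fields a and b.
        kv {
            source => "query"
            field_split => "&"
        }
        # Convert a hypothetical numeric string field to an integer.
        mutate {
            convert => { "elapsed_ms" => "integer" }
        }
        # Arbitrary logic in Ruby: derive seconds from milliseconds when present.
        ruby {
            code => "ms = event.get('elapsed_ms'); event.set('elapsed_s', ms / 1000.0) unless ms.nil?"
        }
    }
    ```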

    3 OUTPUTS

    Choose Your Stash, Transport Your Data

    While Elasticsearch is our go-to output that opens up a world of search and analytics possibilities, it’s not the only one available.

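    Choose your data store and ship your data: Elasticsearch opens up many search and analytics possibilities, but it is not the only available output.

    II Installation

    1 Installing with Ambari

    See: https://www.cnblogs.com/barneywill/p/10281678.html

    2 Installing with Docker

    See: https://www.cnblogs.com/barneywill/p/10367297.html

    3 Manual installation from the tar archive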
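
    For example, a single output section can send every event to Elasticsearch and also echo it to the console. This is a sketch under assumptions: the host localhost:9200 and the index name are placeholders, not values from the original post.

    ```conf
    output {
        # Assumed local Elasticsearch node; replace with your own hosts.
        elasticsearch {
            hosts => ["localhost:9200"]
            # Hypothetical daily index name built from the event timestamp.
            index => "logstash-demo-%{+YYYY.MM.dd}"
        }
        # Also print each event, which helps while testing.
        stdout {
            codec => rubydebug
        }
    }
    ```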

    $ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.6.2.tar.gz
    $ tar xvf logstash-6.6.2.tar.gz
    $ cd logstash-6.6.2

    Logstash plugin directory

    $LOGSTASH_HOME/vendor/bundle/jruby/2.3.0/gems/

    This directory contains every installed plugin together with its version. To list and install plugins manually:

    $LOGSTASH_HOME/bin/logstash-plugin list
    $LOGSTASH_HOME/bin/logstash-plugin install logstash-input-jdbc

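    Plugin source code: https://github.com/logstash-plugins

    4 Manual installation with yum

    Import the Elastic signing key, make sure the Elastic yum repository is defined in a .repo file under /etc/yum.repos.d/, then install the package:

    # rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
    # yum install logstash

    Register the service: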

    $ sudo /usr/share/logstash/bin/system-install /etc/logstash/startup.options systemd

    III Usage

    1 Debugging filters

    Debugging grok

    http://grokdebug.herokuapp.com/

    Built-in grok patterns

    https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns

    Debugging ruby

    https://ruby.github.io/TryRuby/

    2 Testing nginx log parsing: file -> grok -> stdout

    Default nginx log format:

        #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
        #                  '$status $body_bytes_sent "$http_referer" '
        #                  '"$http_user_agent" "$http_x_forwarded_for"';

    Sample nginx log line:

    1.119.132.168 - - [18/Mar/2019:09:13:50 +0000] "POST /cmf/services/6/healthStatusBar?timestamp=1552900429484&currentMode=true HTTP/1.1" 200 929 "http://some.server/cmf/services/6/instances" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36" "-"

    Configuration file

    test.conf

    input {
        file {
            path => [ "/tmp/test.log" ]
            # Read the file from the beginning the first time it is seen.
            start_position => "beginning"
            ignore_older => 0
        }
    }
    filter {
        grok {
            # The literal [ ] and " characters of the log line are escaped in the pattern.
            match => { "message" => "%{IPORHOST:client_ip} (%{USER:ident}|-) (%{USER:auth}|-) \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} (%{URIPATHPARAM:request}|-)(?: HTTP/%{NUMBER:http_version})?|-)\" (%{NUMBER:response}|-) (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{QS:x_forward_for}" }
        }
    }
    output {
        stdout {}
    }
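
    Because the nginx main format is the Apache combined format followed by one extra quoted field, the same line can probably be matched with the built-in COMBINEDAPACHELOG pattern. This is an alternative sketch, not the author's configuration. Note that the built-in pattern uses its own field names (for example clientip and httpversion in Logstash 6.x), which differ from the ones defined in test.conf.

    ```conf
    filter {
        grok {
            # COMBINEDAPACHELOG covers everything up to the user agent;
            # the trailing quoted string is the X-Forwarded-For value.
            match => { "message" => "%{COMBINEDAPACHELOG} %{QS:x_forward_for}" }
        }
    }
    ```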

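    Start

    $LOGSTASH_HOME/bin/logstash -f /path/to/test.conf --path.data=/path/to/data --verbose --debug

    Note: when running several Logstash instances on one machine, give each its own directory with --path.data. The --verbose and --debug flags raise the log level so that more detail appears in the console; newer releases document --log.level=debug as the preferred form.

    Test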

    $ head -5 /var/log/nginx/access.log >> /tmp/test.log
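
    Once the grok fields look right, a common next step is to type the numeric fields and to use the request time as the event time. The sketch below assumes the field names produced by the grok pattern in test.conf (timestamp, response, bytes). It can be appended to test.conf as a second filter section, since Logstash runs multiple filter sections in order.

    ```conf
    filter {
        # Parse e.g. "18/Mar/2019:09:13:50 +0000" and store it in @timestamp.
        date {
            match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
        }
        # grok captures are strings by default; convert the numeric ones.
        mutate {
            convert => {
                "response" => "integer"
                "bytes" => "integer"
            }
        }
    }
    ```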

    For more inputs, see

    https://www.elastic.co/guide/en/logstash/current/input-plugins.html

    For more filters, see

    https://www.elastic.co/guide/en/logstash/current/filter-plugins.html

    Most commonly used filter: grok

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

    Most commonly used filter: ruby

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html

    Commonly used filter: mutate

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html

    Commonly used filter: date

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html

    Commonly used filter: json

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html

    Commonly used filter: kv

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html

    For more outputs, see

    https://www.elastic.co/guide/en/logstash/current/output-plugins.html

  • Original article: https://www.cnblogs.com/barneywill/p/10311928.html