• Flume source and sink types: translated from the official documentation


    Exec Source
    Exec source runs a given Unix command on start-up and expects that process to continuously produce data on standard out (stderr is simply discarded, unless the property logStdErr is set to true). If the process exits for any reason, the source also exits and will produce no further data. This means configurations such as cat [named pipe] or tail -F [file] are going to produce the desired results, whereas date will probably not: the former two commands produce streams of data, whereas the latter produces a single event and exits.
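
    To make the behavior above concrete, here is a minimal Python sketch of the idea behind an exec-style source. It is not Flume's implementation. `run_exec_source` and its `put` callback are hypothetical names. Each stdout line is assumed to become one event, and "logging" stderr is approximated by letting the child inherit this process's stderr.

```python
import subprocess
import sys
from typing import Callable, List


def run_exec_source(cmd: List[str], put: Callable[[bytes], None],
                    log_stderr: bool = False) -> int:
    """Toy model of an exec-style source (hypothetical helper, not Flume code).

    Every line the command writes to stdout becomes one event passed to `put`.
    stderr is discarded unless log_stderr is True, in which case the child
    simply inherits this process's stderr as a stand-in for logging it.
    When the command exits, stdout reaches EOF and no further events appear.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=None if log_stderr else subprocess.DEVNULL,
    )
    assert proc.stdout is not None
    with proc.stdout:
        for line in proc.stdout:  # blocks until the next line or EOF
            put(line.rstrip(b"\r\n"))
    return proc.wait()


if __name__ == "__main__":
    events: List[bytes] = []
    # A command that prints two lines and exits: two events, then nothing more.
    code = run_exec_source([sys.executable, "-c", "print('a'); print('b')"],
                           events.append)
    print(code, events)
```

    Passing a long-running command such as `tail -F app.log` keeps the loop reading indefinitely. A command like `date` yields one event, and the function returns as soon as the command exits.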

    Warning
    The problem with ExecSource and other asynchronous sources is that the source cannot guarantee that, if it fails to put an event into the Channel, the client knows about it. In such cases, the data will be lost. For instance, one of the most commonly requested features is the tail -F [file]-like use case, where an application writes to a log file on disk and Flume tails the file, sending each line as an event. While this is possible, there is an obvious problem: what happens if the channel fills up and Flume can't send an event? Flume has no way of telling the application writing the log file that it needs to retain the log, or that the event hasn't been sent for some reason. If this doesn't make sense, you need only know this: your application can never guarantee that data has been received when using a unidirectional asynchronous interface such as ExecSource! As an extension of this warning, and to be completely clear, there is absolutely zero guarantee of event delivery when using this source. For stronger reliability guarantees, consider the Spooling Directory Source, the Taildir Source, or direct integration with Flume via the SDK.
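
    The core of the problem is that information flows only one way. The following toy Python model uses a hypothetical `forward_lines` helper, with a bounded `queue.Queue` standing in for the channel. It drops a line when the channel is full. It deliberately does not model how ExecSource itself reacts to a full channel. What it shows is that, whatever happens at the channel, the application that wrote the lines receives no signal.

```python
import queue
from typing import Iterable, List, Tuple


def forward_lines(lines: Iterable[str],
                  channel: "queue.Queue[str]") -> Tuple[int, List[str]]:
    """Toy model of a one-way, fire-and-forget source (hypothetical helper).

    Each line is offered to a bounded channel without blocking. A line that
    does not fit is dropped, and nothing flows back to whoever wrote the lines:
    the writing application already appended them to a file and moved on.
    Returns (number delivered, list of dropped lines).
    """
    delivered = 0
    dropped: List[str] = []
    for line in lines:
        try:
            channel.put_nowait(line)
            delivered += 1
        except queue.Full:
            dropped.append(line)  # lost: there is no path back to the writer
    return delivered, dropped


if __name__ == "__main__":
    # A channel with room for only two events.
    channel: "queue.Queue[str]" = queue.Queue(maxsize=2)
    print(forward_lines(["line 1", "line 2", "line 3"], channel))
```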

      


    Spooling Directory Source

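    This source lets you ingest data by placing the files to be ingested into a "spooling" directory on disk. It watches the specified directory for new files and parses events out of each new file as it appears. The event parsing logic is pluggable.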

    After a given file has been fully read into the channel, it is renamed to indicate completion (or, optionally, deleted).
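
    A single polling pass of such a source can be sketched in Python as follows. The helper `process_spool_dir` is hypothetical and parses one event per line, whereas Flume's actual parsing is pluggable. The `.COMPLETED` suffix mirrors the default of Flume's `fileSuffix` property, and `delete=True` stands in for the optional deletion.

```python
import os
import tempfile
from typing import Callable, List

# Flume's spooling directory source renames finished files with the suffix
# given by its fileSuffix property, whose default is ".COMPLETED".
COMPLETED_SUFFIX = ".COMPLETED"


def process_spool_dir(spool_dir: str, put: Callable[[str], None],
                      delete: bool = False) -> int:
    """One polling pass over a spooling directory (hypothetical helper).

    Each not-yet-processed regular file is read line by line, every line is
    handed to `put` as an event, and the file is then renamed with
    COMPLETED_SUFFIX (or deleted when delete=True). Returns files processed.
    """
    processed = 0
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if name.endswith(COMPLETED_SUFFIX) or not os.path.isfile(path):
            continue  # already finished, or not a regular file
        with open(path, encoding="utf-8") as f:
            for line in f:
                put(line.rstrip("\n"))
        # Mark completion only after the whole file has been handed off.
        if delete:
            os.remove(path)
        else:
            os.rename(path, path + COMPLETED_SUFFIX)
        processed += 1
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        with open(os.path.join(spool, "app.log"), "w", encoding="utf-8") as f:
            f.write("first\nsecond\n")
        events: List[str] = []
        print(process_spool_dir(spool, events.append), events, os.listdir(spool))
        print(process_spool_dir(spool, events.append))  # nothing new to read
```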

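    Unlike the Exec source, this source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable, uniquely named files may be dropped into the spooling directory.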

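    Flume tries to detect violations of these conditions and fails loudly if they occur:

    1. If a file is written to after being placed into the spooling directory, Flume prints an error to its log file and stops processing.
    2. If a file name is reused at a later time, Flume prints an error to its log file and stops processing.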
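
    The two checks can be approximated as below. This is a hypothetical sketch, not Flume's detection code. It assumes that comparing a file's size and modification time, recorded when the file is picked up, is enough to notice a late write. It also assumes that a set of already-used names is enough to notice reuse. Raising `SpoolViolation` plays the role of "print an error and stop processing".

```python
import os
import tempfile
from typing import Set, Tuple


class SpoolViolation(Exception):
    """Raised when a spooled file breaks the immutable/unique-name contract."""


def snapshot(path: str) -> Tuple[int, int]:
    """Size and modification time, recorded when the file is first picked up."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


def check_unchanged(path: str, before: Tuple[int, int]) -> None:
    """Rule 1: the file must not have been written to after being placed."""
    if snapshot(path) != before:
        raise SpoolViolation(f"{path} was modified after entering the spool directory")


def check_name_unused(name: str, seen: Set[str]) -> None:
    """Rule 2: a file name may be used only once."""
    if name in seen:
        raise SpoolViolation(f"file name {name!r} was reused")
    seen.add(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        path = os.path.join(spool, "app.log")
        with open(path, "w", encoding="utf-8") as f:
            f.write("first\n")
        before = snapshot(path)
        with open(path, "a", encoding="utf-8") as f:
            f.write("written too late\n")  # violates rule 1
        try:
            check_unchanged(path, before)
        except SpoolViolation as err:
            print("stop processing:", err)

    seen: Set[str] = set()
    check_name_unused("app.log", seen)
    try:
        check_name_unused("app.log", seen)  # violates rule 2
    except SpoolViolation as err:
        print("stop processing:", err)
```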

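    To avoid the above issues, it can be useful to add a unique identifier (such as a timestamp) to log file names when moving them into the spooling directory.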
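
    The following Python sketch shows one way to follow this advice. It assumes a staging directory exists on the same filesystem as the spooling directory. The file is finished in staging and then moved in with a single rename, under a name that carries a timestamp plus a short random suffix. `publish_to_spool` and the naming scheme are illustrative choices, not something Flume requires.

```python
import os
import tempfile
import time
import uuid


def publish_to_spool(src_path: str, spool_dir: str) -> str:
    """Move a finished file into the spool directory under a unique name.

    Hypothetical helper. The name gets a timestamp plus a short random suffix,
    so two files with the same base name never collide. os.replace is atomic
    only when src_path and spool_dir are on the same filesystem, so the file
    should be fully written in a staging directory on that filesystem first.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    unique = f"{stamp}-{uuid.uuid4().hex[:8]}"
    dest = os.path.join(spool_dir, f"{os.path.basename(src_path)}.{unique}")
    os.replace(src_path, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")  # same filesystem as spool
        spool = os.path.join(root, "spool")
        os.mkdir(staging)
        os.mkdir(spool)
        src = os.path.join(staging, "app.log")
        with open(src, "w", encoding="utf-8") as f:
            f.write("complete contents\n")  # finish writing before publishing
        print(publish_to_spool(src, spool))
```

    Because the rename happens only after the file is complete, the source never sees a half-written file. The random suffix keeps two files created in the same second from sharing a name.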

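    Despite this source's reliability guarantees, there are still cases in which events may be duplicated if certain downstream failures occur. This is consistent with the guarantees offered by other Flume components.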
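
    A common way such duplicates arise is a retry after a lost acknowledgement. The generic at-least-once loop below is a hypothetical `send_with_retry`, not Flume's transaction code. It resends a whole batch whenever no acknowledgement comes back. If the data arrived but the acknowledgement was lost, the receiver ends up with the batch twice.

```python
from typing import Callable, List


def send_with_retry(batch: List[str], transmit: Callable[[List[str]], bool],
                    max_attempts: int = 3) -> int:
    """Resend the whole batch until `transmit` reports an acknowledgement.

    Hypothetical helper. Returns the number of attempts used. If the data
    arrives but the acknowledgement is lost, the retry delivers it again:
    this is how at-least-once delivery produces duplicates.
    """
    for attempt in range(1, max_attempts + 1):
        if transmit(batch):
            return attempt
    raise RuntimeError(f"batch not acknowledged after {max_attempts} attempts")


if __name__ == "__main__":
    received: List[str] = []
    acks = iter([False, True])  # the first acknowledgement is lost in transit

    def flaky_transmit(batch: List[str]) -> bool:
        received.extend(batch)  # the data itself arrives every time
        return next(acks)

    print(send_with_retry(["e1", "e2"], flaky_transmit), received)
```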

  • Original article: https://www.cnblogs.com/huiandong/p/9449786.html
Copyright © 2020-2023  润新知