Structured Streaming notes

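- Based on micro-batch processing; since Spark 2.3, continuous processing is also supported (experimental)
- Built on the Spark SQL engine
- A streaming computation is expressed the same way as a standard batch query on a static table; Spark executes it as an incremental query over an unbounded input table.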
- unbounded input table
  - Each incoming record appears as a new row appended to the table
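- result table
  - Updated each time a new batch of input has been processed. Three output modes determine what is written out:
  - complete mode: the entire result table is written out every time
  - append mode: only rows newly appended to the result table are written out. This fits queries where the result of a new batch neither depends on nor changes earlier output.
  - update mode: only the rows whose values were changed by the new batch are written out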
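- event time
  - The time at which the event was generated, carried in the data itself. It differs from the time at which Spark receives the data.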
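- fault-tolerance semantics
  - end-to-end exactly-once
    
    Failures are caught and the affected processing is retried.
    
    Built on checkpointing and write-ahead logs (WAL), so a restarted query resumes from where it stopped. The guarantee also assumes replayable sources and idempotent sinks.
  - In contrast to:
    
    at-most-once
    
    Each record is written at most once, so data can be lost on failure. A weak guarantee.
    
    at-least-once
    
    Each record is written at least once, so nothing is lost but duplicates are possible. Stronger than at-most-once, weaker than exactly-once.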
- The API is based on Dataset and DataFrame
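
To make the three output modes concrete, here is a minimal sketch in plain Python rather than Spark. It simulates a running word count over micro-batches and records what each mode would emit after every batch. The function names `run_word_count` and `run_append_filter` are hypothetical and only illustrate the idea; append mode is shown on a stateless filter, where earlier output never changes.

```python
from collections import Counter


def run_word_count(batches, mode):
    """Return what a running word count would emit after each micro-batch."""
    result_table = Counter()  # full result table, kept across batches
    emitted = []
    for batch in batches:
        changed = set()
        for word in batch:  # every input record is one new row of the input table
            result_table[word] += 1
            changed.add(word)
        if mode == "complete":
            emitted.append(dict(result_table))  # the whole table, every time
        elif mode == "update":
            emitted.append({w: result_table[w] for w in changed})  # changed rows only
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return emitted


def run_append_filter(batches, keep):
    """Append mode for a stateless query: each batch only adds new result rows."""
    return [[row for row in batch if keep(row)] for batch in batches]


batches = [["cat", "dog"], ["cat", "owl"]]
complete_out = run_word_count(batches, "complete")
update_out = run_word_count(batches, "update")
append_out = run_append_filter(batches, lambda w: w != "dog")
```

After the second batch, complete mode emits all three words, while update mode emits only `cat` and `owl`, the two rows the batch touched.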
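
The difference between event time and arrival time can be shown with a small plain-Python sketch, not Spark code. It counts records per 10-second tumbling window, once keyed by a hypothetical `event_time` field and once by a hypothetical `arrival_time` field; the field names, the helper `count_per_window`, and the sample values are all assumptions made for illustration.

```python
from collections import Counter


def count_per_window(events, time_field, window_seconds):
    """Count events per tumbling window, keyed by the window start time."""
    counts = Counter()
    for event in events:
        window_start = (event[time_field] // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


events = [
    {"event_time": 5, "arrival_time": 6},
    {"event_time": 12, "arrival_time": 13},
    {"event_time": 8, "arrival_time": 21},  # generated early, delivered late
]
by_event_time = count_per_window(events, "event_time", 10)
by_arrival_time = count_per_window(events, "arrival_time", 10)
```

Grouping by event time puts the late record back into the window it belongs to, whereas grouping by arrival time places it in a later window.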
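
The exactly-once idea noted above (retry after failure, resume from a checkpoint) can be sketched as a toy model in plain Python. This is a simplification under stated assumptions, not Spark's implementation: the source is a replayable list, the checkpoint is a dict holding an offset and a batch id, and the sink ignores a batch id it has already stored. `IdempotentSink`, `run_query`, and `crash_before_commit_of` are hypothetical names.

```python
class IdempotentSink:
    """Stores each batch under its id; rewriting the same batch is a no-op."""

    def __init__(self):
        self.batches = {}

    def write(self, batch_id, rows):
        self.batches.setdefault(batch_id, list(rows))

    def all_rows(self):
        return [row for batch_id in sorted(self.batches) for row in self.batches[batch_id]]


def run_query(source, sink, checkpoint, batch_size, crash_before_commit_of=None):
    """Process the source in batches, committing progress after each sink write."""
    while checkpoint["offset"] < len(source):
        start = checkpoint["offset"]
        batch_id = checkpoint["batch_id"]
        rows = source[start:start + batch_size]  # replayable: same offset, same rows
        sink.write(batch_id, rows)
        if batch_id == crash_before_commit_of:
            # Failure after the write but before the checkpoint is updated.
            raise RuntimeError("simulated crash")
        checkpoint["offset"] = start + len(rows)
        checkpoint["batch_id"] = batch_id + 1


source = ["a", "b", "c", "d", "e"]
sink = IdempotentSink()
checkpoint = {"offset": 0, "batch_id": 0}

try:
    run_query(source, sink, checkpoint, batch_size=2, crash_before_commit_of=1)
except RuntimeError:
    pass  # restart below, resuming from the last committed checkpoint

run_query(source, sink, checkpoint, batch_size=2)
final_rows = sink.all_rows()
```

Batch 1 is processed twice, which is at-least-once behaviour on the processing side; the idempotent write is what keeps the output free of duplicates.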
Original post: https://www.cnblogs.com/PigeonNoir/p/10630975.html