Overview
Kinesis Streams
Streaming data and video in real-time
Kinesis Data Firehose
Data analytics with BI tools
Kinesis Data Analytics
Real-time data analytics with SQL
Kinesis Streams
- Has Producers and Consumers
- Has Shards
- Has Retention period
Kinesis Shard
The data capacity of a stream is determined by its number of shards. If the data rate increases, you can increase the stream's capacity by increasing the number of shards.
- Kinesis streams are made up of shards
- Each shard is a sequence of one or more data records and provides a fixed unit of capacity
- Reads: up to 5 read transactions per second per shard. Max total read rate is 2 MB per second
- Writes: up to 1,000 records per second per shard. Max total write rate is 1 MB per second
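The per-shard limits above can be turned into a rough shard-count estimate. The following is a minimal sketch, not an official sizing formula: the function name `shards_needed` and its parameters are hypothetical, and it only takes the three limits listed in these notes into account.

```python
import math

# Per-shard limits as listed in the notes above.
WRITE_MB_PER_SEC = 1.0        # max total write rate per shard
WRITE_RECORDS_PER_SEC = 1000  # max records written per second per shard
READ_MB_PER_SEC = 2.0         # max total read rate per shard


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Estimate the smallest shard count that satisfies every limit.

    Hypothetical helper for illustration: each limit yields its own
    minimum, and the stream needs the largest of them (at least 1).
    """
    by_write_bytes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC)
    by_write_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / READ_MB_PER_SEC)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    # 3.5 MB/s in, 2,500 records/s, 7 MB/s out
    print(shards_needed(3.5, 2500, 7.0))
```

Whichever limit is hit first drives the result, so a stream of many small records can need more shards than its byte rate alone would suggest.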
Kinesis Firehose
- No shards
- No consumers
- Using existing BI tools
- Store data
Kinesis Data Analytics
- Real-time
- SQL
Kinesis Client Library
- The KCL ensures that for every shard there is a record processor.
- If you have only one consumer, the KCL will create all the record processors on that single consumer.
- If you have two consumers, it will load balance and create half the processors on one instance and half on the other.
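The balancing behaviour described above can be pictured with a toy round-robin assignment. This is only an illustration of the outcome, assuming a hypothetical `assign_shards` helper; it is not how the KCL is implemented (the KCL coordinates workers through leases, not a central function like this).

```python
def assign_shards(shard_ids, worker_ids):
    """Spread shards over workers round-robin (illustrative only).

    Every shard gets exactly one owner, which mirrors the rule that
    each shard has one record processor.
    """
    if not worker_ids:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in worker_ids}
    for index, shard in enumerate(shard_ids):
        owner = worker_ids[index % len(worker_ids)]
        assignment[owner].append(shard)
    return assignment


if __name__ == "__main__":
    shards = ["shard-0", "shard-1", "shard-2", "shard-3"]
    print(assign_shards(shards, ["consumer-a"]))                # one consumer owns all four
    print(assign_shards(shards, ["consumer-a", "consumer-b"]))  # two shards each
```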
Scaling Out Consumers
- With the KCL, make sure the number of instances does not exceed the number of shards (apart from standby instances kept for failover)
- You never need multiple instances to handle the processing load of one shard.
- However, one worker can process multiple shards.
- It's fine for the number of shards to exceed the number of instances.
- Resharding does not mean you need more instances.
- Instead, CPU utilisation is what should drive the number of consumer instances you have, NOT the number of shards in your Kinesis stream.
- Use an Auto Scaling group, and base scaling decisions on the CPU load of your consumers.
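The scaling rules above can be sketched as a small decision function. This is a hedged illustration only: `desired_instances` and the 70% / 30% CPU thresholds are assumptions made for the example, not AWS defaults, and a real setup would express the same idea as an Auto Scaling policy rather than application code.

```python
def desired_instances(current, avg_cpu_percent, shard_count,
                      scale_out_at=70.0, scale_in_at=30.0):
    """Pick the next consumer instance count from CPU load.

    Thresholds are hypothetical. CPU drives the decision; the shard
    count only acts as an upper bound, since extra instances beyond
    one per shard would have nothing to process.
    """
    if avg_cpu_percent > scale_out_at:
        target = current + 1
    elif avg_cpu_percent < scale_in_at:
        target = current - 1
    else:
        target = current
    # Keep at least one consumer and never more than one per shard.
    return max(1, min(target, shard_count))


if __name__ == "__main__":
    print(desired_instances(current=2, avg_cpu_percent=85, shard_count=4))  # busy: add one
    print(desired_instances(current=4, avg_cpu_percent=85, shard_count=4))  # capped by shards
    print(desired_instances(current=2, avg_cpu_percent=10, shard_count=4))  # idle: remove one
```

Note that resharding changes only the cap in this sketch; the instance count moves solely in response to CPU load.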