Using Cassandra: A Weather Station Example

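Use case:

Cassandra is well suited to storing time-series data. This article walks through a weather station example in which the station stores one temperature reading per minute.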

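I. Approach 1: one row per device

This approach creates one row for each data source, so here all temperature readings from one weather station share a single row. A reading is taken every minute; the timestamp of each reading serves as the column name and the temperature is the column value. In CQL terms, weatherstation_id is the partition key and event_time is the clustering column.

(1) The table is created as follows: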

CREATE TABLE temperature (

  weatherstation_id text,

  event_time timestamp,

  temperature text,

  PRIMARY KEY (weatherstation_id,event_time) );

(2) Then insert the following data.

INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2019-08-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2019-08-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2019-08-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2019-08-03 07:04:00','74F');

(3) To query all data for this weather station:

SELECT event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';

(4) To query data within a time range (here, everything after 07:01:00):

SELECT temperature FROM temperature WHERE weatherstation_id='1234ABCD' AND event_time > '2019-08-03 07:01:00';
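
The layout used by this approach can be pictured as one list of (event_time, temperature) pairs per weather station, kept sorted by event_time. The following is a minimal Python sketch of that mental model, not Cassandra code; the `Partition` class, its methods, and the `ts` helper are hypothetical names introduced only for illustration. It suggests why a time-range query on a single station is cheap: the matching readings form one contiguous slice of the sorted list.

```python
import bisect
from datetime import datetime


class Partition:
    """Toy model of one weather station's row: readings sorted by event_time."""

    def __init__(self):
        self._times = []   # sorted event times (the clustering column)
        self._values = []  # temperature stored at the same index

    def insert(self, event_time, temperature):
        # Writing the same event_time twice overwrites the value (an upsert).
        i = bisect.bisect_left(self._times, event_time)
        if i < len(self._times) and self._times[i] == event_time:
            self._values[i] = temperature
        else:
            self._times.insert(i, event_time)
            self._values.insert(i, temperature)

    def after(self, lower):
        # Similar in spirit to: WHERE event_time > lower
        i = bisect.bisect_right(self._times, lower)
        return list(zip(self._times[i:], self._values[i:]))


def ts(text):
    """Parse a timestamp written like '2019-08-03 07:01:00'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


station = Partition()
# Insert order does not matter; the partition keeps readings sorted.
for when, temp in [("2019-08-03 07:03:00", "73F"),
                   ("2019-08-03 07:01:00", "72F"),
                   ("2019-08-03 07:04:00", "74F"),
                   ("2019-08-03 07:02:00", "73F")]:
    station.insert(ts(when), temp)

result = station.after(ts("2019-08-03 07:01:00"))
print([temp for _, temp in result])  # ['73F', '73F', '74F']
```

The sketch returns the readings from 07:02 through 07:04 in time order, which matches what the range query in step (4) would be expected to return for the sample data.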

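II. Approach 2: one row per device per day

Sometimes storing all of a device's data in one row is impractical, for example because it no longer fits (which should be rare). In that case the previous approach can be split up by adding one more component to the row key. Here the key is extended with a date, so each device's data for each day goes into its own row and the size of any single row stays bounded.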
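
To get a feel for how quickly a row grows, here is a rough back-of-the-envelope calculation in Python. It assumes one column per reading, one reading per minute, and a 365-day year, and it uses the limit of 2 billion columns per row quoted at the end of this article; the constant names are made up for this sketch.

```python
READINGS_PER_DAY = 24 * 60                  # one reading per minute
READINGS_PER_YEAR = READINGS_PER_DAY * 365  # ignoring leap years
MAX_COLUMNS_PER_ROW = 2_000_000_000         # limit quoted at the end of this article

# Whole years a single-row-per-device layout could run before hitting the limit.
years_until_full = MAX_COLUMNS_PER_ROW // READINGS_PER_YEAR

print(READINGS_PER_DAY)   # 1440
print(READINGS_PER_YEAR)  # 525600
print(years_until_full)   # 3805
```

Under these assumptions a per-day row holds only 1,440 readings, while a single row per device would take thousands of years to reach the hard limit. This is consistent with the remark that running out of room should be rare; splitting by day is mainly about keeping each row small and predictable.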

(1) Create the table

CREATE TABLE temperature_by_day (

  weatherstation_id text,

  date text,

  event_time timestamp,

  temperature text,

PRIMARY KEY ((weatherstation_id,date),event_time) );

(2) Insert data

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234ABCD','2019-08-03','2019-08-03 07:01:00','72F');

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234ABCD','2019-08-03','2019-08-03 07:02:00','73F');

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234ABCD','2019-08-04','2019-08-04 07:01:00','73F');

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234ABCD','2019-08-04','2019-08-04 07:02:00','74F');

(3) Query one device's data for one day

SELECT * FROM temperature_by_day WHERE weatherstation_id='1234ABCD' AND date='2019-08-03';
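
With this layout the client has to supply the date value itself, both when writing and when reading. The sketch below shows one way this might be done in Python; `day_bucket` and `buckets_between` are hypothetical helper names, and the code assumes that event times and day boundaries use the same time zone.

```python
from datetime import datetime, timedelta


def day_bucket(event_time):
    """Return the value for the `date` column, e.g. '2019-08-03'."""
    return event_time.strftime("%Y-%m-%d")


def buckets_between(start, end):
    """List every day bucket touched by the inclusive range [start, end]."""
    day = start.date()
    buckets = []
    while day <= end.date():
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


reading_time = datetime(2019, 8, 3, 7, 1, 0)
print(day_bucket(reading_time))  # 2019-08-03

# A range spanning several days touches one row per day.
span = buckets_between(datetime(2019, 8, 3, 22, 0), datetime(2019, 8, 5, 6, 0))
print(span)  # ['2019-08-03', '2019-08-04', '2019-08-05']
```

The trade-off of this approach is visible here: a query that spans several days has to address one row per day, for example by issuing one query per bucket and combining the results on the client.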

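III. Approach 3: data with a limited lifetime that is deleted automatically on expiry

Another typical use of time-series data is rolling storage. Imagine a dashboard that shows the 10 most recent temperature readings: older data is no longer useful and can be ignored. With other databases, this often means setting up a background job that periodically purges historical data. Cassandra instead offers a feature called expiring columns: once the specified time to live (TTL, given in seconds) has elapsed, the column disappears automatically.

(1) Create the table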

CREATE TABLE latest_temperatures (

  weatherstation_id text,

  event_time timestamp,

  temperature text,

PRIMARY KEY (weatherstation_id,event_time)

) WITH CLUSTERING ORDER BY (event_time DESC);

(2) Insert data

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2019-08-03 07:03:00','72F') USING TTL 20;

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2019-08-03 07:02:00','73F') USING TTL 20;

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2019-08-03 07:01:00','73F') USING TTL 20;

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2019-08-03 07:04:00','74F') USING TTL 20;

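(3) Observe

After inserting the data, run a query repeatedly to watch it. Each row was written with a TTL of 20 seconds, so the rows disappear one by one until none are left.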
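
The rows vanish one at a time rather than all at once because each row's 20-second countdown starts at its own insert time. The following Python sketch simulates that behavior with a toy in-memory store. It is only a model of the visible effect, not of how Cassandra implements expiry, and `ExpiringStore` and the 5-second spacing between inserts are assumptions made for the example.

```python
class ExpiringStore:
    """Toy model of TTL: each entry remembers when it stops being visible."""

    def __init__(self):
        self._entries = []  # (event_time, temperature, expires_at)

    def insert(self, event_time, temperature, ttl_seconds, now):
        self._entries.append((event_time, temperature, now + ttl_seconds))

    def select(self, now):
        # Expired entries are filtered out; the newest event_time comes first,
        # mirroring CLUSTERING ORDER BY (event_time DESC).
        live = [(t, v) for t, v, expires_at in self._entries if now < expires_at]
        return sorted(live, reverse=True)


store = ExpiringStore()
# Four inserts issued 5 seconds apart (assumed), each with a 20-second TTL.
readings = [("07:03:00", "72F"), ("07:02:00", "73F"),
            ("07:01:00", "73F"), ("07:04:00", "74F")]
for i, (event_time, temp) in enumerate(readings):
    store.insert(event_time, temp, ttl_seconds=20, now=i * 5)

# Query at increasing points in time (seconds since the first insert).
for now in (15, 22, 27, 32, 36):
    print(now, store.select(now))
```

In this simulation all four readings are visible at second 15, and then one fewer is returned at each later check until the result is empty at second 36. Note that the order of disappearance follows insert time, not event_time.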

Time series is one of Cassandra's most competitive data models.

Excerpt from the original:

  Cassandra can store up to 2 billion columns per row

References:

  https://academy.datastax.com/resources/getting-started-time-series-data-modeling

  http://www.rubyscale.com/post/143067470585/basic-time-series-with-cassandra

  http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
Original article: https://www.cnblogs.com/Soy-technology/p/11310005.html
