如何在 BigQuery 中实现对 array<struct<a string, b string>> 的 group by 操作

如何在 BigQuery 中实现对 array<struct<a string, b string>> 的 group by 操作
最近做的需求又开始贴近 SQL 了，感觉有点手生。毕竟最近半年切换上下文有点频繁，做的东西有点杂。

之前比较少对复合字段进行操作，涉及到数组操作和结构体操作， SQL 竟也提供了一大套完整的操作函数。越发觉得现在 SQL 真是强大啊。。。

谈论 group by array<struct<a string, b string>> 这种结构之前我们可以先看下如果是简单的 struct<a string, b string> 效果会如何
```
raw_event_1_0_1    RECORD    NULLABLE            
data STRING    REQUIRED            
name STRING    REQUIRED    
```
我们有如下字段 raw_event_1_0_1, no repeated 所以它是 strcut<data string, name string> 结构。 BigQuery 支持直接对单层 struct 里的字段行 group by 操作。

like this
```
 select 
   count(raw_event),
   ANY_VALUE(load_tstamp)
 from snowplow_zee.events where load_tstamp >= '2022-10-28' and load_tstamp < '2022-11-01'
   group by raw_event.name
```
那么对 array<struct<a string, b string>> 我们就没法儿直接操作了，思路还是我们得尝试把这个数组打开进行操作。通过查询文档我找到了 SQL 语句对 from clause 子部分支持的操作
```
FROM from_clause[, ...]

from_clause:
    from_item
    [ { pivot_operator | unpivot_operator } ]
    [ tablesample_operator ]

from_item:
    {
      table_name [ as_alias ] [ FOR SYSTEM_TIME AS OF timestamp_expression ] 
      | { join_operation | ( join_operation ) }
      | ( query_expr ) [ as_alias ]
      | field_path
      | unnest_operator
      | cte_name [ as_alias ]
    }

as_alias:
    [ AS ] alias
```
在 from_item 里面我们可以看到该部分。持 unnest_operator 操作
```
unnest_operator:
    {
      UNNEST( array_expression )
      | UNNEST( array_path )
      | array_path
    }
    [ as_alias ]
    [ WITH OFFSET [ as_alias ] ]

as_alias:
    [AS] alias
```
由此可知我们可以将 array 用 UNNEST 展开形成一个新的表类似于 array => table，通常情况下会生成一列字段我们可以直接用 as 指定。而这一列的内容就是数组里的每个元素。
```
mapping_data    RECORD    REPEATED            
mapped_event    STRING    NULLABLE            
event_category    STRING    NULLABLE    
```
```
select 
event_id,
user_id,
mapped_event,
event_category
from ee.flat_v_events as e,
  UNNEST(e.mapping_data) as p

where load_tstamp >= '2022-10-26' and load_tstamp < '2022-10-28' and p.event_category != "" limit 100
```
results:

这个操作有点类似于将列转成行来进行展示,实际上是进行了一次 cross join。所以上面语句还可以改写成
```
select 
event_id,
user_id,
mapped_event,
event_category
from ee.flat_v_events as e cross join
  UNNEST(e.mapping_data) as p

where load_tstamp >= '2022-10-26' and load_tstamp < '2022-10-28' and p.event_category != "" limit 100
```
最后需要注意的是，如果 cross join 结果需要去重，记得要把去重做一下。

Reference:

https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#unnest_operator

https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#comma_cross_join

https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#query_structs_in_an_array
相关阅读:
power desinger 学习笔记<五>
power desinger 学习笔记<四>
power desinger 学习笔记<八>
kill session真的能杀掉进程吗
 转: Oracle AWR 报告每天自动生成并发送邮箱
 Bootstrap 图片
 Bootstrap历练实例：禁用的按钮
 Bootstrap历练实例：点击激活的按钮
 Bootstrap历练实例：块级按钮
 Bootstrap历练实例：超小的按钮
原文地址：https://www.cnblogs.com/piperck/p/16855428.html