Used to split a string column into multiple parts based on a delimiter; the result is returned as a list (an array column):
from pyspark.sql import functions as F

# Assumes an existing SparkSession named `spark`.
df_b = spark.createDataFrame([('1', 'ABC-07-DEF')], ["ID", "col1"])
# Split col1 on "-" into an array column.
df_b = df_b.withColumn('post_split', F.split(F.col('col1'), "-"))
df_b.show()
+---+----------+--------------+
| ID|      col1|    post_split|
+---+----------+--------------+
|  1|ABC-07-DEF|[ABC, 07, DEF]|
+---+----------+--------------+
In addition, getItem() can be used to extract individual elements of that array column into separate columns, as shown below:
# Parentheses allow the chained calls to span several lines.
df_b = (
    df_b.withColumn('split_col1', F.col('post_split').getItem(0))
        .withColumn('split_col2', F.col('post_split').getItem(1))
        .withColumn('split_col3', F.col('post_split').getItem(2))
)
df_b.show()
+---+----------+--------------+----------+----------+----------+
| ID|      col1|    post_split|split_col1|split_col2|split_col3|
+---+----------+--------------+----------+----------+----------+
|  1|ABC-07-DEF|[ABC, 07, DEF]|       ABC|        07|       DEF|
+---+----------+--------------+----------+----------+----------+
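
As a rough per-row mental model of the two steps above, the same logic can be sketched in plain Python without a Spark session. This is only an analogy: `split_to_columns` is a hypothetical helper, not part of PySpark, and Python's `re` module stands in for the Java regular expressions that `F.split` uses, so behavior may differ for complex patterns. The sketch mainly illustrates that the delimiter is treated as a regular expression, which is why characters such as `.` or `|` need escaping.

```python
import re


def split_to_columns(value, pattern, prefix="split_col"):
    """Hypothetical helper (not a PySpark API): split `value` on a regex
    `pattern` and name the pieces prefix1, prefix2, ...
    This mirrors split() followed by getItem(0), getItem(1), ..."""
    parts = re.split(pattern, value)  # list of pieces, like the array column
    return {f"{prefix}{i + 1}": part for i, part in enumerate(parts)}


row = split_to_columns("ABC-07-DEF", "-")
print(row)  # {'split_col1': 'ABC', 'split_col2': '07', 'split_col3': 'DEF'}

# The pattern is a regex, so a literal dot has to be escaped.
dotted = split_to_columns("ABC.07.DEF", r"\.")
print(dotted)  # {'split_col1': 'ABC', 'split_col2': '07', 'split_col3': 'DEF'}
```

In PySpark itself the equivalent of the second call would pass an escaped pattern such as "\\." to F.split, under the same assumption that the pattern is interpreted as a regular expression.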