1. Create the DataFrame

scala> val df = sc.parallelize(Seq(
     |   (0, "cat26", 30.9),
     |   (1, "cat67", 28.5),
     |   (2, "cat56", 39.6),
     |   (3, "cat8", 35.6))).toDF("Hour", "Category", "Value")
df: org.apache.spark.sql.DataFrame = [Hour: int, Category: string ... 1 more field]

scala> df.show()
+----+--------+-----+
|Hour|Category|Value|
+----+--------+-----+
|   0|   cat26| 30.9|
|   1|   cat67| 28.5|
|   2|   cat56| 39.6|
|   3|    cat8| 35.6|
+----+--------+-----+

2. Method 1 (the ! negates the filter condition): keep only the columns whose name does not contain "Val".

scala> var df1 = df.select(df.columns.filter(x => !x.contains("Val")).map(df(_)) : _*)
df1: org.apache.spark.sql.DataFrame = [Hour: int, Category: string]

scala> df1.show()
+----+--------+
|Hour|Category|
+----+--------+
|   0|   cat26|
|   1|   cat67|
|   2|   cat56|
|   3|    cat8|
+----+--------+

3. Method 2: select column names with a regular expression. The pattern ^((?!Va).)*$ uses a negative lookahead, so it matches only names that do not contain the substring "Va".

scala> val regex = """^((?!Va).)*$""".r
regex: scala.util.matching.Regex = ^((?!Va).)*$

scala> val selection = df.columns.filter(s => regex.findFirstIn(s).isDefined)
selection: Array[String] = Array(Hour, Category)

scala> var newdf = df.select(selection.head, selection.tail : _*)
newdf: org.apache.spark.sql.DataFrame = [Hour: int, Category: string]

scala> newdf.show()
+----+--------+
|Hour|Category|
+----+--------+
|   0|   cat26|
|   1|   cat67|
|   2|   cat56|
|   3|    cat8|
+----+--------+

I haven't looked into regular expressions in much depth; for reference:
https://www.runoob.com/scala/scala-regular-expressions.html
https://stackoverflow.com/questions/59065137/select-columns-in-spark-dataframe-based-on-column-name-pattern
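For completeness, here is a minimal, self-contained sketch of both methods as a standalone program rather than shell input, assuming Spark 2.x+ where SparkSession is available; the object name ColumnPruning is made up for illustration. It also shows drop(), which is the simplest alternative when the exact names of the columns to remove are already known.

import org.apache.spark.sql.SparkSession

// ColumnPruning is a hypothetical name chosen for this sketch.
object ColumnPruning {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ColumnPruning").getOrCreate()
    import spark.implicits._

    val df = Seq(
      (0, "cat26", 30.9),
      (1, "cat67", 28.5),
      (2, "cat56", 39.6),
      (3, "cat8", 35.6)
    ).toDF("Hour", "Category", "Value")

    // Method 1: keep every column whose name does not contain "Val".
    val kept = df.columns.filter(!_.contains("Val")).map(df(_))
    df.select(kept: _*).show()

    // Method 2: keep every column whose name the regex matches
    // (the negative lookahead rejects any name containing "Va").
    val regex = """^((?!Va).)*$""".r
    val selection = df.columns.filter(c => regex.findFirstIn(c).isDefined)
    df.select(selection.head, selection.tail: _*).show()

    // Alternative: drop() removes columns by name directly.
    df.drop("Value").show()

    spark.stop()
  }
}

All three calls print the same two-column result (Hour, Category); drop() is usually preferable when the names are fixed, while the filter and regex approaches are useful when columns must be selected by pattern.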