I'm trying to save a DataFrame into a Hive table.
In Spark 1.6 this worked, but after migrating to 2.2.0 it no longer does.
Here's the code:
blocs
.toDF()
.repartition($"col1", $"col2", $"col3", $"col4")
.write
.format("parquet")
.mode(saveMode)
.partitionBy("col1", "col2", "col3", "col4")
.saveAsTable("db.tbl")
I get the following error:
The format of the existing table project_bsc_dhr.bloc_views is HiveFileFormat. It doesn't match the specified format ParquetFileFormat.;
org.apache.spark.sql.AnalysisException: The format of the existing table project_bsc_dhr.bloc_views is HiveFileFormat. It doesn't match the specified format ParquetFileFormat.;
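Roughly, this error comes from a check Spark 2.x runs when `saveAsTable` appends to a table that already exists (so presumably `saveMode` is `SaveMode.Append` here): the data source of the existing table has to match the one passed to `.format(...)`. A table created through Hive DDL has the provider `hive`, which Spark resolves to `HiveFileFormat`, so `parquet` is rejected even if the underlying files are Parquet, while `hive` passes. The plain-Scala model below only mimics that comparison; `FormatCheck`, `resolve` and `checkAppend` are hypothetical names, not Spark API, and the lookup table covers just the two formats in this question.

```scala
// Simplified, hypothetical model of the provider check Spark 2.x performs
// before appending to an existing table with saveAsTable.
// Spark itself resolves each provider to a data source class and compares those.
object FormatCheck {
  // Short format names mapped to the class names that show up in the error message.
  private val providers: Map[String, String] = Map(
    "hive"    -> "HiveFileFormat",
    "parquet" -> "ParquetFileFormat"
  )

  // Unknown names are passed through unchanged.
  def resolve(shortName: String): String =
    providers.getOrElse(shortName.toLowerCase, shortName)

  // Left(message) when the formats differ, Right(()) when the append may proceed.
  def checkAppend(table: String, existing: String, specified: String): Either[String, Unit] = {
    val existingProvider  = resolve(existing)
    val specifiedProvider = resolve(specified)
    if (existingProvider != specifiedProvider)
      Left(
        s"The format of the existing table $table is $existingProvider. " +
          s"It doesn't match the specified format $specifiedProvider.")
    else Right(())
  }
}
```

For example, `FormatCheck.checkAppend("project_bsc_dhr.bloc_views", "hive", "parquet")` returns a `Left` with the message above, whereas passing `"hive"` as the specified format returns `Right(())`.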
-
Have you found a solution? I am facing the same issue. Can you please let me know what the workaround is? – BigD Feb 8 '19 at 11:42
-
Yes, I used insertInto instead of saveAsTable and removed partitionBy. The code: blocs.toDF().repartition($"col1", $"col2", $"col3", $"col4").write.format("parquet").insertInto("db.tbl") – youssef grati Feb 9 '19 at 12:07
-
I am using Spark 2.3.0. Does repartition work on the latest Spark? – BigD Feb 9 '19 at 15:34
After getting the error, I tried passing .format("hive") to saveAsTable instead, and it worked.
I also would not recommend using insertInto, as suggested by the author: it is not type-safe (as far as that term applies to the SQL API), and it is error-prone because it ignores column names and resolves columns by position.
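To make the position-based resolution concrete, the plain-Scala model below (no Spark involved; `ColumnResolution`, `byPosition` and `byName` are hypothetical names) assigns a row's values to a table's columns either by position, which is how insertInto lines columns up, or by name. When the DataFrame's column order differs from the table's, the positional version can silently put values into the wrong columns instead of failing.

```scala
// Hypothetical model of two ways to match DataFrame columns to table columns.
object ColumnResolution {
  // (column name, value) pairs in the DataFrame's column order.
  type Row = Seq[(String, Any)]

  // Position-based: the i-th value goes to the i-th table column; names are ignored.
  def byPosition(tableColumns: Seq[String], row: Row): Map[String, Any] =
    tableColumns.zip(row.map(_._2)).toMap

  // Name-based: each table column takes the value carrying the same name.
  def byName(tableColumns: Seq[String], row: Row): Map[String, Any] = {
    val values = row.toMap
    tableColumns.map(c => c -> values(c)).toMap
  }
}
```

With table columns `Seq("id", "name", "col1")` and a row `Seq("name" -> "a", "id" -> 1, "col1" -> "x")`, `byPosition` yields `id -> "a"` and `name -> 1`, while `byName` keeps `id -> 1` and `name -> "a"`. If you do use insertInto, one common safeguard (not from this thread) is to reorder the DataFrame to the table's column order first, e.g. `df.select(spark.table("db.tbl").columns.map(col): _*)` with `org.apache.spark.sql.functions.col` imported.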