Typically, evolving the schema of an application's state happens because of some change in the business logic: adding or dropping fields, or changing data types. In all cases, the schema is determined by the state's serializer, and the process can be compared to altering a table in a database. When a state variable is first introduced, it is like running a CREATE TABLE statement: there is a lot of freedom in how it is defined. However, once there is data in that table (registered rows), developers are limited in what they can change and must follow certain rules when making updates, as with an ALTER TABLE statement. Schema migration in Apache Flink follows a similar principle, since the framework is essentially running an ALTER TABLE statement across savepoints.
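To make the analogy concrete, here is a minimal sketch of the "CREATE TABLE" moment in Flink: registering keyed state. The Account POJO and the TrackAccount function are hypothetical; the point is that the type handed to the state descriptor determines the serializer, and therefore the schema that ends up in savepoints.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class AccountStateExample {

    // Hypothetical POJO kept in keyed state; its public fields and no-arg
    // constructor make Flink treat it as a POJO type, which determines the
    // serializer -- and therefore the schema -- stored in savepoints.
    public static class Account {
        public String owner;
        public long balance;
        public Account() {}
    }

    public static class TrackAccount extends RichFlatMapFunction<Account, String> {

        private transient ValueState<Account> accountState;

        @Override
        public void open(Configuration parameters) {
            // Registering the state is the "CREATE TABLE" moment: the type
            // passed here fixes the schema written into future savepoints.
            accountState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("account", Account.class));
        }

        @Override
        public void flatMap(Account value, Collector<String> out) throws Exception {
            accountState.update(value);
            out.collect(value.owner);
        }
    }
}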
Flink 1.8 comes with built-in support for Apache Avro (specifically the 1.7.7 specification) and evolves state schemas according to Avro's schema resolution rules, for example by adding or removing fields, or even by swapping between generic and specific Avro record types.
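As a sketch of the Avro rules that Flink builds on (this is plain Avro, not Flink-specific code), the snippet below runs Avro's own compatibility check on two versions of a hypothetical Account record. The default value on the new field is what makes state written with version 1 readable by an application using version 2.

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class AvroEvolutionSketch {
    public static void main(String[] args) {
        // Version 1 of a hypothetical state record.
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Account\",\"fields\":["
          + "{\"name\":\"owner\",\"type\":\"string\"}]}");

        // Version 2 adds a field; the default value is what makes the change
        // backwards compatible under Avro's schema resolution rules.
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Account\",\"fields\":["
          + "{\"name\":\"owner\",\"type\":\"string\"},"
          + "{\"name\":\"balance\",\"type\":\"long\",\"default\":0}]}");

        // Check that data written with v1 can be read with v2 -- the kind of
        // compatibility Flink relies on when restoring state from a savepoint.
        System.out.println(SchemaCompatibility
            .checkReaderWriterCompatibility(v2, v1)
            .getType()); // prints COMPATIBLE
    }
}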
In Flink 1.9 the community added support for schema evolution for POJOs, including the ability to remove existing fields from POJO types or add new fields. POJO schema evolution tends to be less flexible than Avro's, since it is not possible to change either the declared field types or the class name of a POJO type, including its namespace.
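The contrast is easiest to see on a concrete POJO. The Customer class below is hypothetical; comments mark which of its changes from an earlier version Flink's POJO serializer can and cannot migrate.

// Version 2 of a hypothetical POJO kept in Flink state. Version 1 had a
// "visits" field and no "email" field.
public class Customer {

    public String name;    // unchanged from version 1

    // public int visits;  // removed field: allowed, its data is dropped
                           // when restoring from an old savepoint

    public String email;   // added field: allowed, initialized to null
                           // (the Java default) on restore

    // Not allowed without a custom migration path:
    //   - changing a declared field type, e.g. int visits -> long visits
    //   - renaming the class or moving it to another package (namespace)

    public Customer() {}   // Flink's POJO rules require a no-arg constructor
}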
With the community’s efforts related to schema evolution, Flink developers can now expect out-of-the-box support for both Avro and POJO formats, with backwards compatibility for all Flink state backends. Future work revolves around adding support for Scala case classes, tuples, and other formats. Make sure to subscribe to the Flink mailing list to contribute and stay on top of any upcoming additions in this space.