• Flink SQL Source Code Reading: Schema Management


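    In Flink SQL, metadata is organized in three levels: catalog -> database -> table.
    Flink SQL relies on the Calcite framework to build, validate, and optimize the SQL execution tree, so this article describes how Flink SQL works with Calcite to manage metadata.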

    Interfaces exposed by Calcite

    public interface Schema {
        Table getTable(String name);
    
        Schema getSubSchema(String name);
    
        ....
    }
    

    As the interface shows, a Schema can return a table by table name and a sub-schema by schema name.

    public interface Table {
        RelDataType getRowType(RelDataTypeFactory typeFactory);
        ....
    }
    

    The main job of the Table interface is to return the table's RelDataType, that is, its row type.
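
    To make the nesting concrete, here is a minimal sketch of how a qualified name could be resolved by walking getSubSchema and then calling getTable. The Schema and Table interfaces below are simplified stand-ins rather than Calcite's real ones, and MapSchema, resolve, and sampleRoot are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaLookupSketch {

    /** Simplified stand-in for Calcite's Table: it only describes its row type. */
    interface Table {
        String getRowType();
    }

    /** Simplified stand-in for Calcite's Schema. */
    interface Schema {
        Table getTable(String name);

        Schema getSubSchema(String name);
    }

    /** Hypothetical in-memory schema that keeps tables and sub-schemas in maps. */
    static class MapSchema implements Schema {
        final Map<String, Table> tables = new HashMap<>();
        final Map<String, Schema> subSchemas = new HashMap<>();

        @Override
        public Table getTable(String name) {
            return tables.get(name);
        }

        @Override
        public Schema getSubSchema(String name) {
            return subSchemas.get(name);
        }
    }

    /** Treats every path element except the last as a sub-schema, the last as a table. */
    static Table resolve(Schema root, List<String> path) {
        if (path.isEmpty()) {
            return null;
        }
        Schema current = root;
        for (String name : path.subList(0, path.size() - 1)) {
            current = current.getSubSchema(name);
            if (current == null) {
                return null; // unknown schema somewhere along the path
            }
        }
        return current.getTable(path.get(path.size() - 1));
    }

    /** Builds root -> cat1 -> db1 -> orders for the demo. */
    static Schema sampleRoot() {
        MapSchema db = new MapSchema();
        db.tables.put("orders", () -> "ROW<id BIGINT, amount DOUBLE>");
        MapSchema catalog = new MapSchema();
        catalog.subSchemas.put("db1", db);
        MapSchema root = new MapSchema();
        root.subSchemas.put("cat1", catalog);
        return root;
    }

    public static void main(String[] args) {
        Table t = resolve(sampleRoot(), Arrays.asList("cat1", "db1", "orders"));
        System.out.println(t == null ? "not found" : t.getRowType());
    }
}
```

    Returning null for an unknown schema or table follows the convention visible in the interfaces above, where a lookup miss is not an exception.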

    Flink's implementations

    Next, let's look at how Flink implements these interfaces:

    public class CatalogManagerCalciteSchema extends FlinkSchema {
    	// Root level: each sub-schema is a catalog.
    	@Override
    	public Schema getSubSchema(String schemaName) {
    		if (catalogManager.schemaExists(schemaName)) {
    			return new CatalogCalciteSchema(schemaName, catalogManager, isStreamingMode);
    		} else {
    			return null;
    		}
    	}
    }
    
    
    public class CatalogCalciteSchema extends FlinkSchema {
        // Catalog level: each sub-schema is a database of this catalog.
        @Override
        public Schema getSubSchema(String schemaName) {
            if (catalogManager.schemaExists(catalogName, schemaName)) {
                return new DatabaseCalciteSchema(schemaName, catalogName, catalogManager, isStreamingMode);
            } else {
                return null;
            }
        }
    }
    
    public class DatabaseCalciteSchema extends FlinkSchema {
        private final String databaseName;
        private final String catalogName;
        private final CatalogManager catalogManager;
    
        @Override
        public Table getTable(String tableName) {
    		ObjectIdentifier identifier = ObjectIdentifier.of(catalogName, databaseName, tableName);
    		return catalogManager.getTable(identifier)
    			.map(result -> {
    				CatalogBaseTable table = result.getTable();
    				FlinkStatistic statistic = getStatistic(result.isTemporary(), table, identifier);
    				return new CatalogSchemaTable(identifier,
    					table,
    					statistic,
    					catalogManager.getCatalog(catalogName)
    						.flatMap(Catalog::getTableFactory)
    						.orElse(null),
    					isStreamingMode,
    					result.isTemporary());
    			})
    			.orElse(null);
        }
    
        @Override
        public Schema getSubSchema(String name) {
            return null;
        }
    }
    

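    It is easy to see that CatalogManagerCalciteSchema returns a CatalogCalciteSchema, CatalogCalciteSchema returns a DatabaseCalciteSchema, and DatabaseCalciteSchema returns a Table,
    which makes Flink's three-level structure easy to understand. Note also that the schema classes store nothing themselves: the actual metadata all lives in catalogManager.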
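
    The point that the schema objects are thin views over catalogManager can be sketched as follows. This is a toy model under stated assumptions, not Flink's code: ToyCatalogManager, RootSchema, CatalogSchema, and DatabaseSchema are hypothetical names, and a table is reduced to a string describing its row type.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreeLevelSchemaSketch {

    /** Hypothetical stand-in for CatalogManager: the only place that stores metadata. */
    static class ToyCatalogManager {
        // catalog name -> database name -> table name -> row type description
        private final Map<String, Map<String, Map<String, String>>> catalogs = new HashMap<>();

        void registerTable(String catalog, String database, String table, String rowType) {
            catalogs.computeIfAbsent(catalog, k -> new HashMap<>())
                    .computeIfAbsent(database, k -> new HashMap<>())
                    .put(table, rowType);
        }

        boolean catalogExists(String catalog) {
            return catalogs.containsKey(catalog);
        }

        boolean databaseExists(String catalog, String database) {
            return catalogExists(catalog) && catalogs.get(catalog).containsKey(database);
        }

        String getTable(String catalog, String database, String table) {
            return databaseExists(catalog, database)
                    ? catalogs.get(catalog).get(database).get(table)
                    : null;
        }
    }

    /** Root level: returns a catalog-level view if the catalog exists. */
    static class RootSchema {
        private final ToyCatalogManager manager;

        RootSchema(ToyCatalogManager manager) {
            this.manager = manager;
        }

        CatalogSchema getSubSchema(String catalog) {
            return manager.catalogExists(catalog) ? new CatalogSchema(catalog, manager) : null;
        }
    }

    /** Catalog level: returns a database-level view if the database exists. */
    static class CatalogSchema {
        private final String catalog;
        private final ToyCatalogManager manager;

        CatalogSchema(String catalog, ToyCatalogManager manager) {
            this.catalog = catalog;
            this.manager = manager;
        }

        DatabaseSchema getSubSchema(String database) {
            return manager.databaseExists(catalog, database)
                    ? new DatabaseSchema(catalog, database, manager)
                    : null;
        }
    }

    /** Database level: every table lookup is delegated to the manager. */
    static class DatabaseSchema {
        private final String catalog;
        private final String database;
        private final ToyCatalogManager manager;

        DatabaseSchema(String catalog, String database, ToyCatalogManager manager) {
            this.catalog = catalog;
            this.database = database;
            this.manager = manager;
        }

        String getTable(String table) {
            return manager.getTable(catalog, database, table);
        }
    }

    public static void main(String[] args) {
        ToyCatalogManager manager = new ToyCatalogManager();
        manager.registerTable("cat1", "db1", "orders", "ROW<id BIGINT>");
        DatabaseSchema db = new RootSchema(manager).getSubSchema("cat1").getSubSchema("db1");
        // Registered after the schema view was created, yet visible through it.
        manager.registerTable("cat1", "db1", "users", "ROW<name STRING>");
        System.out.println(db.getTable("users"));
    }
}
```

    Because each level only carries names plus a reference to the manager, the views can be created on demand and never go stale.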

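    The Table returned by DatabaseCalciteSchema is a CatalogSchemaTable; let's look at how it is structured.
    As mentioned above, the main method of the Table interface is getRowType, which returns a table's type information.
    TableSchema is the class Flink uses internally to hold the type of each field; conversion functions turn it into Calcite's type representation.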

    public class CatalogSchemaTable extends AbstractTable implements TemporalTable {
        
    	private final ObjectIdentifier tableIdentifier;
    	private final CatalogBaseTable catalogBaseTable;
    	private final FlinkStatistic statistic;
    	private final boolean isStreamingMode;
    	private final boolean isTemporary;
        ...
    	private static RelDataType getRowType(RelDataTypeFactory typeFactory,
    			CatalogBaseTable catalogBaseTable,
    			boolean isStreamingMode) {
    		final FlinkTypeFactory flinkTypeFactory = (FlinkTypeFactory) typeFactory;
    		TableSchema tableSchema = catalogBaseTable.getSchema();
    		final DataType[] fieldDataTypes = tableSchema.getFieldDataTypes();
    		if (!isStreamingMode
    			&& catalogBaseTable instanceof ConnectorCatalogTable
    			&& ((ConnectorCatalogTable) catalogBaseTable).getTableSource().isPresent()) {
    			// If the table source is bounded, materialize the time attributes to normal TIMESTAMP type.
    			// Now for ConnectorCatalogTable, there is no way to
    			// deduce if it is bounded in the table environment, so the data types in TableSchema
    			// always patched with TimeAttribute.
    			// See ConnectorCatalogTable#calculateSourceSchema
    			// for details.
    
    			// Remove the patched time attributes type to let the TableSourceTable handle it.
    			// We should remove this logic if the isBatch flag in ConnectorCatalogTable is fixed.
    			// TODO: Fix FLINK-14844.
    			for (int i = 0; i < fieldDataTypes.length; i++) {
    				LogicalType lt = fieldDataTypes[i].getLogicalType();
    				if (lt instanceof TimestampType
    					&& (((TimestampType) lt).getKind() == TimestampKind.PROCTIME
    					|| ((TimestampType) lt).getKind() == TimestampKind.ROWTIME)) {
    					int precision = ((TimestampType) lt).getPrecision();
    					fieldDataTypes[i] = DataTypes.TIMESTAMP(precision);
    				}
    			}
    		}
    		return TableSourceUtil.getSourceRowType(flinkTypeFactory,
    			tableSchema,
    			scala.Option.empty(),
    			isStreamingMode);
    	}
    }
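
    The loop in getRowType replaces time-attribute timestamps with plain timestamps of the same precision when the planner is not in streaming mode. Below is a minimal sketch of that one rule with simplified types. TimestampField and materialize are hypothetical names, and the sketch leaves out the extra ConnectorCatalogTable condition from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimeAttributeSketch {

    /** Mirrors the three timestamp kinds referenced in the snippet above. */
    enum TimestampKind { REGULAR, ROWTIME, PROCTIME }

    /** Hypothetical, simplified timestamp field: a name, a precision and a kind. */
    static final class TimestampField {
        final String name;
        final int precision;
        final TimestampKind kind;

        TimestampField(String name, int precision, TimestampKind kind) {
            this.name = name;
            this.precision = precision;
            this.kind = kind;
        }
    }

    /**
     * In batch mode, turns ROWTIME/PROCTIME fields into REGULAR timestamps and keeps
     * their precision. In streaming mode the fields are returned unchanged.
     */
    static List<TimestampField> materialize(List<TimestampField> fields, boolean isStreamingMode) {
        List<TimestampField> result = new ArrayList<>();
        for (TimestampField f : fields) {
            boolean isTimeAttribute =
                    f.kind == TimestampKind.ROWTIME || f.kind == TimestampKind.PROCTIME;
            if (!isStreamingMode && isTimeAttribute) {
                result.add(new TimestampField(f.name, f.precision, TimestampKind.REGULAR));
            } else {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TimestampField> fields = Arrays.asList(
                new TimestampField("ts", 3, TimestampKind.ROWTIME),
                new TimestampField("created", 6, TimestampKind.REGULAR));
        for (TimestampField f : materialize(fields, false)) {
            System.out.println(f.name + " TIMESTAMP(" + f.precision + ") " + f.kind);
        }
    }
}
```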
    

    The CatalogBaseTable interface is defined as follows. All of a Flink table's parameters (schema parameters and connector parameters) can ultimately be represented as a map.

    public interface CatalogBaseTable {
    	/**
    	 * Get the properties of the table.
    	 *
    	 * @return property map of the table/view
    	 */
    	Map<String, String> getProperties();
    
    	/**
    	 * Get the schema of the table.
    	 *
    	 * @return schema of the table/view.
    	 */
    	TableSchema getSchema();
    
    	/**
    	 * Get comment of the table or view.
    	 *
    	 * @return comment of the table/view.
    	 */
    	String getComment();
    
    	/**
    	 * Get a deep copy of the CatalogBaseTable instance.
    	 *
    	 * @return a copy of the CatalogBaseTable instance
    	 */
    	CatalogBaseTable copy();
    
    	/**
    	 * Get a brief description of the table or view.
    	 *
    	 * @return an optional short description of the table/view
    	 */
    	Optional<String> getDescription();
    
    	/**
    	 * Get a detailed description of the table or view.
    	 *
    	 * @return an optional long description of the table/view
    	 */
    	Optional<String> getDetailedDescription();
    }
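
    To illustrate how a schema and connector options can be flattened into the Map<String, String> that getProperties() returns, here is a small sketch. The key layout used below (schema.N.name, schema.N.data-type, a connector. prefix) is an assumption chosen for illustration and may not match the exact keys of a given Flink version; toProperties is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TablePropertiesSketch {

    /**
     * Flattens field names/types and connector options into one string map.
     * The key layout is illustrative only, not necessarily Flink's.
     */
    static Map<String, String> toProperties(
            String[] fieldNames,
            String[] fieldTypes,
            Map<String, String> connectorOptions) {
        if (fieldNames.length != fieldTypes.length) {
            throw new IllegalArgumentException("Each field needs exactly one type");
        }
        // LinkedHashMap keeps insertion order so the printed result is easy to read.
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            props.put("schema." + i + ".name", fieldNames[i]);
            props.put("schema." + i + ".data-type", fieldTypes[i]);
        }
        for (Map.Entry<String, String> e : connectorOptions.entrySet()) {
            props.put("connector." + e.getKey(), e.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> connector = new LinkedHashMap<>();
        connector.put("type", "kafka");
        connector.put("topic", "orders");
        Map<String, String> props = toProperties(
                new String[] {"id", "amount"},
                new String[] {"BIGINT", "DOUBLE"},
                connector);
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

    A flat string map is easy to persist in any external catalog, which is one plausible reason for this representation.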
    
    

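    How FlinkSchema is used

    The classes above are all implementations that Flink uses to adapt its metadata to the Calcite framework.
    So where exactly are these classes called, and when?
    In Calcite, the schema is used mainly during validation, to obtain the field information of each table and the return type of each function,
    so that the field names and field types in the SQL can be checked for correctness.
    The class dependencies are: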
    validator ---> schemaReader ---> schema
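
    The dependency chain above can be sketched with three toy classes: the validator asks the reader for a table, and the reader asks the schema. This is a simplified illustration and not Calcite's API. ToySchema, ToyReader, ToyValidator, and sampleValidator are hypothetical names, and "validation" here only checks that the referenced table and columns exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ValidationChainSketch {

    /** Toy schema: table name -> (column name -> type name). */
    static class ToySchema {
        private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();

        void addTable(String name, Map<String, String> columns) {
            tables.put(name, columns);
        }

        Map<String, String> getTable(String name) {
            return tables.get(name);
        }
    }

    /** Toy reader: the only component that talks to the schema. */
    static class ToyReader {
        private final ToySchema schema;

        ToyReader(ToySchema schema) {
            this.schema = schema;
        }

        Map<String, String> getTable(String name) {
            return schema.getTable(name);
        }
    }

    /** Toy validator: checks names through the reader and returns the column types. */
    static class ToyValidator {
        private final ToyReader reader;

        ToyValidator(ToyReader reader) {
            this.reader = reader;
        }

        List<String> validate(String table, List<String> columns) {
            Map<String, String> rowType = reader.getTable(table);
            if (rowType == null) {
                throw new IllegalArgumentException("Table not found: " + table);
            }
            List<String> types = new ArrayList<>();
            for (String column : columns) {
                String type = rowType.get(column);
                if (type == null) {
                    throw new IllegalArgumentException("Column not found: " + table + "." + column);
                }
                types.add(type);
            }
            return types;
        }
    }

    /** Builds a validator over a schema with a single table, orders(id, amount). */
    static ToyValidator sampleValidator() {
        Map<String, String> orders = new LinkedHashMap<>();
        orders.put("id", "BIGINT");
        orders.put("amount", "DOUBLE");
        ToySchema schema = new ToySchema();
        schema.addTable("orders", orders);
        return new ToyValidator(new ToyReader(schema));
    }

    public static void main(String[] args) {
        // Roughly what validating "SELECT id, amount FROM orders" needs to look up.
        System.out.println(sampleValidator().validate("orders", Arrays.asList("id", "amount")));
    }
}
```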

    In FlinkPlannerImpl.scala:

      private def createSqlValidator(catalogReader: CatalogReader) = {
        val validator = new FlinkCalciteSqlValidator(
          operatorTable,
          catalogReader,
          typeFactory)
        validator.setIdentifierExpansion(true)
        // Disable implicit type coercion for now.
        validator.setEnableTypeCoercion(false)
        validator
      }
    

    PlanningConfigurationBuilder.java

    	private CatalogReader createCatalogReader(
    			boolean lenientCaseSensitivity,
    			String currentCatalog,
    			String currentDatabase) {
    		SqlParser.Config sqlParserConfig = getSqlParserConfig();
    		final boolean caseSensitive;
    		if (lenientCaseSensitivity) {
    			caseSensitive = false;
    		} else {
    			caseSensitive = sqlParserConfig.caseSensitive();
    		}
    
    		SqlParser.Config parserConfig = SqlParser.configBuilder(sqlParserConfig)
    			.setCaseSensitive(caseSensitive)
    			.build();
    
    		return new CatalogReader(
    			rootSchema,
    			asList(
    				asList(currentCatalog, currentDatabase),
    				singletonList(currentCatalog)
    			),
    			typeFactory,
    			CalciteConfig.connectionConfig(parserConfig));
    	}
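
    The two schema paths passed to CatalogReader, [currentCatalog, currentDatabase] and [currentCatalog], are what let a query use partially qualified table names. The sketch below shows one way such a search could work: each path is tried as a prefix of the name, in order. It assumes that an empty root path is tried last, which the snippet above does not show; searchPaths and resolve are hypothetical helpers, not Calcite or Flink methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SchemaPathSketch {

    /** Search order: [catalog, database], then [catalog], then the root (assumed). */
    static List<List<String>> searchPaths(String currentCatalog, String currentDatabase) {
        List<List<String>> paths = new ArrayList<>();
        paths.add(Arrays.asList(currentCatalog, currentDatabase));
        paths.add(Collections.singletonList(currentCatalog));
        paths.add(Collections.<String>emptyList());
        return paths;
    }

    /** Returns the first fully qualified name that exists, or null if none does. */
    static List<String> resolve(
            Set<List<String>> knownTables,
            List<List<String>> schemaPaths,
            List<String> name) {
        for (List<String> prefix : schemaPaths) {
            List<String> candidate = new ArrayList<>(prefix);
            candidate.addAll(name);
            if (knownTables.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<List<String>> known = new HashSet<>();
        known.add(Arrays.asList("cat1", "db1", "orders"));
        known.add(Arrays.asList("cat1", "db2", "users"));
        List<List<String>> paths = searchPaths("cat1", "db1");

        System.out.println(resolve(known, paths, Arrays.asList("orders")));
        System.out.println(resolve(known, paths, Arrays.asList("db2", "users")));
        System.out.println(resolve(known, paths, Arrays.asList("cat1", "db1", "orders")));
    }
}
```

    With the current catalog set to cat1 and the current database set to db1, all three spellings in main resolve to a fully qualified three-part name.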
    

    In summary, we now know how Flink uses Calcite's schema to manage its table information.

  • Original article: https://www.cnblogs.com/0x12345678/p/13174976.html