• PostgreSQL查询优化简介


    简介

    PostgreSQL查询优化器执行过程

    1. 语法分析:生成查询树
    2. 语义检查:对SQL表达的语义进行检查
    3. 查询优化
      1. 视图重写
      2. 逻辑优化:子查询优化,条件化简,等价谓词重写,连接消除,得到逻辑计划
      3. 物理优化:基于代价优化,得到物理计划。PostgreSQL主要采用动态规划和遗传算法
      4. 非SPJ优化:主要针对分组,排序,去重等操作
    4. 查询计划执行

    在PostgreSQL中,语法树并不是一棵树状结构的,把关系平面化到一个链表里面。因为,PostgreSQL认为,在这个阶段不清楚表之间如何链接。

    重要数据结构

    查询语法树

    typedef struct Query
    {
        //上面还有节点类型,是否存在相关子句
        List	   *cteList;		/* WITH 子句 */
        List	   *rtable;			/* list of range table entries */
        FromExpr   *jointree;       /* table join tree (FROM and WHERE clauses) */
        List	   *targetList;		/* target list (of TargetEntry) */
        List	   *returningList;	/* return-values list (of TargetEntry) */
        List	   *groupClause;	/* a list of SortGroupClause's */
        Node	   *havingQual;		/* qualifications applied to groups */
        List	   *windowClause;	/* 窗口函数子句链表 */
        List	   *distinctClause; /* a list of SortGroupClause's */
        List	   *sortClause;		/* a list of SortGroupClause's */
        
        Node	   *limitOffset;	/* limit的offset子句 */
        Node	   *limitCount;		/* limit的个数*/
        Node	   *setOperations;	/* 是否为多个SQL UNION/INTERSECT/EXCEPT query */
    } Query;
    

    范围表(优化前)

    表示被查询的对象,可以是一张表,一个From子句中的子查询,一个连接操作的结果

    typedef struct RangeTblEntry
    {
        //普通表
        Oid			relid;			/* OID of the relation */
        char		relkind;		/* relation kind (see pg_class.relkind) */
        struct TableSampleClause *tablesample;		/* sampling info, or NULL */
        
        //子查询
        Query	   *subquery;		/* the sub-query */
        bool		security_barrier;		/* is from security_barrier view?如果是视图展开的子查询,PostgreSQL不做优化 */
        
        //连接类型
        JoinType	jointype;		/* type of join */
        List	   *joinaliasvars;	/* list of alias-var expansions */
    } RangeTblEntry;
    

    关系优化信息(优化过程中)

    对应PlannerInfo结构体的两个成员(simple_rel_array和join_rel_list),是优化阶段的操作对象,具有查询优化的相关信息

    typedef struct RelOptInfo
    {
        /* all relations included in this RelOptInfo */
        Relids		relids;			/* set of base relids (rangetable indexes) */
        
        /* 估算结果的行数 */
        double		rows;
        
        /* materialization information */
        List	   *pathlist;		/* 存放所有可能的路径 */
        List	   *ppilist;		/* ParamPathInfos used in pathlist */
        List	   *partial_pathlist;		/* partial Paths */
        
        /* 局部最优不一定后面最优,上一层的3个可能的最优结果 */
        struct Path *cheapest_startup_path;
        struct Path *cheapest_total_path;
        struct Path *cheapest_unique_path;
        List	   *cheapest_parameterized_paths;
        
        //本关系为单表或者join
        
        /* used by various scans and joins: */
        List	   *baserestrictinfo;		/* RestrictInfo structures (if base
        									 * rel) */
        QualCost	baserestrictcost;		/* cost of evaluating the above */
        List	   *joininfo;		/* RestrictInfo structures for join clauses
        							 * involving this rel */
        bool		has_eclass_joins;		/* T means joininfo is incomplete */
        } RelOptInfo;
    

    计划节点信息

    全局查询优化计划的相关信息,存放在PlannerInfo结构体

    typedef struct PlannerInfo
    {
        Query	   *parse;			/* 开始时的查询计划树 */
        PlannerGlobal *glob;		/* global info for current planner run */
        Index		query_level;	/* 本计划所处的层数*/
        
        struct PlannerInfo *parent_root;	/* NULL at outermost Query */
        
        struct RelOptInfo **simple_rel_array;		/* 所有基本表信息 */
        int			simple_rel_array_size;	/* allocated size of array */
        RangeTblEntry **simple_rte_array;	/* rangetable as an array */
        
        //考虑过连接后生成的新关系
        List	   *join_rel_list;	/* list of join-relation RelOptInfos */
        struct HTAB *join_rel_hash; /* optional hashtable for join relations */
        
        List	  **join_rel_level; /*结果关系*/
        int			join_cur_level; /* index of list being extended */
        
    } PlannerInfo;
    

    计划节点

    代表根据最有路径,生成的物理计划(Plan)

    typedef struct Plan
    {
        /*
         * estimated execution costs for plan (see costsize.c for more info)
         */
        Cost		startup_cost;	/* cost expended before fetching any tuples */
        Cost		total_cost;		/* total cost (assuming all tuples fetched) */
        
        /*
         * 估计的元组数和元组宽度
         */
        double		plan_rows;
        int			plan_width;	
        
        /*
         * Common structural data for all Plan types.
         */
        int			plan_node_id;	/* unique across entire final plan tree */
        List	   *targetlist;		/* target list to be computed at this node */
        List	   *qual;			/* implicitly-ANDed qual conditions */
        struct Plan *lefttree;		/* input plan tree(s) */
        struct Plan *righttree;
        List	   *initPlan;		/* Init Plan nodes (un-correlated expr
        							 * subselects) */
    } Plan;
    

    PlannedStmt

    优化器结果,保存查询执行计划,范围表相关信息

    SelectStmt

    语法分析结果

    typedef struct SelectStmt
    {
        List	   *distinctClause; /* distinct子句*/
        IntoClause *intoClause;		/* target for SELECT INTO */
        List	   *targetList;		/* 投影列子句 */
        List	   *fromClause;		/* From子句,包括join */
        Node	   *whereClause;	/* where子句 */
        List	   *groupClause;	/* GROUP BY clauses */
        Node	   *havingClause;	/* HAVING conditional-expression */
        List	   *windowClause;	/* WINDOW window_name AS (...), ... */
        
        List	   *sortClause;		/* sort clause (a list of SortBy's) */
        Node	   *limitOffset;	/* # of result tuples to skip */
        Node	   *limitCount;		/* # of result tuples to return */
        List	   *lockingClause;	/* FOR UPDATE (list of LockingClause's) */
    } SelectStmt;
    

    结构体关系

    1. PlannerInfo是逻辑优化的主要产物,拥有查询树(Query),关系优化信息,约束条件
    2. 路径是物理优化阶段的主要产物,拥有排序键和连接节点
    3. PlannerInfo和路径混杂在一起
    4. 查询执行计划Plan是所有路径的最小代价生成的

    代码入口

    planner:主入口函数

    PlannedStmt *
    planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
    {
        PlannedStmt *result;
        
        if (planner_hook)
        	result = (*planner_hook) (parse, cursorOptions, boundParams);
        else
        	result = standard_planner(parse, cursorOptions, boundParams);
        return result;
    }
    

    被主函数调用

    standard_planner——标准的查询优化器入口

    standard_planner只是查询优化器的外壳,通过调用subquery_planner完成查询优化,通过调用set_plan_references完成清理辅助工作

    PlannedStmt *
    standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
    {
        PlannedStmt *result;//结果
        PlannerGlobal *glob;//查询优化一些所有子查询需要的公共信息
        double		tuple_fraction;//
        PlannerInfo *root;
        RelOptInfo *final_rel;
        Path	   *best_path;
        Plan	   *top_plan;
        ListCell   *lp,
        		   *lr;
    
    ···    
        
        /* primary planning entry point (may recurse for subqueries) */
        root = subquery_planner(glob, parse, NULL,
        						false, tuple_fraction);
        
        /* Select best Path and turn it into a Plan */
        final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
        best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
        
        top_plan = create_plan(root, best_path);
    
    ···        
       
        /* final cleanup of the plan */
        Assert(glob->finalrtable == NIL);
        Assert(glob->finalrowmarks == NIL);
        Assert(glob->resultRelations == NIL);
        top_plan = set_plan_references(root, top_plan);
        /* ... and the subplans (both regular subplans and initplans) */
        Assert(list_length(glob->subplans) == list_length(glob->subroots));
        forboth(lp, glob->subplans, lr, glob->subroots)
        {
        	Plan	   *subplan = (Plan *) lfirst(lp);
        	PlannerInfo *subroot = (PlannerInfo *) lfirst(lr);
        
        	lfirst(lp) = set_plan_references(subroot, subplan);
        }
    
    ···        
        /* build the PlannedStmt result */
        result = makeNode(PlannedStmt);
        
        result->commandType = parse->commandType;
        result->queryId = parse->queryId;   //parse结果
        result->planTree = top_plan;    //查询计划
        result->rtable = glob->finalrtable; //范围表
        result->resultRelations = glob->resultRelations;    //
        
        return result;
    }
    
    1. subquery_planner返回逻辑优化和物理优化结果root(PlannerInfo *)
    2. create_plan根据最优路径,和PlannerInfo生成物理执行计划Plan
    3. set_plan_references对执行计划部分调整和清理

    subquery_planner生成(子)查询执行计划的函数

    subquery_planner分为两步。第一步是逻辑优化;第二步是物理优化

    //传入glob,parse,typle_fraction,parent_root最开始为NULL
    PlannerInfo *
    subquery_planner(PlannerGlobal *glob, Query *parse,
    				 PlannerInfo *parent_root,
    				 bool hasRecursion, double tuple_fraction)
    {
        PlannerInfo *root;
        List	   *newWithCheckOptions;
        List	   *newHaving;
        bool		hasOuterJoins;
        RelOptInfo *final_rel;
        ListCell   *l;
        
        /* 为当前子查询创建PlannerInfo */
        root = makeNode(PlannerInfo);
        root->parse = parse;
        root->glob = glob;
        root->query_level = parent_root ? parent_root->query_level + 1 : 1;
        root->parent_root = parent_root;
        root->hasRecursion = hasRecursion;
        if (hasRecursion)
        	root->wt_param_id = SS_assign_special_param(root);
        else
        	root->wt_param_id = -1;
        root->non_recursive_path = NULL;
        
        /*
         * Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
         * to transform them into joins.  Note that this step does not descend
         * into subqueries; if we pull up any subqueries below, their SubLinks are
         * processed just before pulling them up.
         *子连接:在where和join子句含有ANY和EXISTS
         */
        if (parse->hasSubLinks)
        	pull_up_sublinks(root);
        
        //上拉子查询
        pull_up_subqueries(root);
        
        //子查询合并
        if (parse->setOperations)
        	flatten_simple_union_all(root);
        
        //上拉子查询后处理继承关系
        preprocess_rowmarks(root);
        expand_inherited_tables(root);
    
        //条件化简
        preprocess_expression
    
        //合并having子句到where子句,如果having里面含有聚集函数,易失函数,子查询不能合并    
    
        //消除外连接
        if (hasOuterJoins)
        	reduce_outer_joins(root);
        
        /*
         * Do the main planning.  If we have an inherited target relation, that
         * needs special processing, else go straight to grouping_planner.
         */
        if (parse->resultRelation &&
        	rt_fetch(parse->resultRelation, parse->rtable)->inh)
        	//含有继承关系的物理优化
        	inheritance_planner(root);
        else
            //物理优化
        	grouping_planner(root, false, tuple_fraction);
        
    
        set_cheapest(final_rel);
        
        return root;
    }
    
    1. 逻辑优化
      1. 处理CTE表达式(ss_process_ctes)
      2. 上拉子连接
      3. 上拉子查询
      4. Union all处理:flatten_simple_union_all
      5. 处理for update(row lock):preprocess_rowmark
      6. 继承表处理(expand_inherited_tables)
      7. 处理目标列(prepocess_expression)
      8. 处理withCheckOptions:prepocess_expression
      9. 处理return 表达式,window子句,limit off子句:prepocess_expression
      10. 合并having到where子句
      11. 消除外连接
    2. 物理优化:生成本成查询PlanInfo的三条最优路径,返回给上层

    整体流程

    • subquery_planner
      • 处理CTE表达式
      • 上拉子链接(去除in, some ,exist)
      • 上拉子查询
      • 预处理表达式(and/or, 计算明显的结果, 处理不能上拉的子连接)
      • 消除外连接
      • grouping_planer
        • 处理集合(生成子查询)
        • 处理非集合,调整order, group, target list之句中的顺序
        • query_planner
          • 构建基本表的RelOptInfo
          • 选择下推
          • 投影下推
          • 推导隐含表达式
          • 生成pathkey
          • make_one_rel(通过数据直方图计算选择率)
            • 处理单表最优的查询方式
            • 处理两表join最优的方式(判定是否special join)
            • 动态规划或者遗传算法构建多表Join
        • 获取cheatest_path,再加上其他子句
  • 相关阅读:
    设为首页 和 收藏本页
    软件开发:需求分析的20条法则
    常用的50条网站推广方法
    IT人士在离职时可以做的14件事情
    需求文档的编写
    无法清除DNS缓存
    输入法没有了 输入法不见了
    CMS
    用户输入的防注入总结 简介和第一步
    winmail无法给新浪发送邮件
  • 原文地址:https://www.cnblogs.com/biterror/p/6913267.html
Copyright © 2020-2023  润新知