• NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server


    NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server

    NOT IN

    1
    2
    3
    4
    5
    6
    7
    SELECT  l.id, l.value
    FROM    [20090915_anti].t_left l
    WHERE   l.value NOT IN
    (
    SELECT  value
    FROM    [20090915_anti].t_right r
    )

    View query results, details and execution plan

    As we can see, this query uses Merge Anti Semi Join which is extremely efficient if there is a cheap way to obtain two ordered resultsets (like in example above). Since value is indexed in both tables, the indexes serve as such resulsets.

    Merge Join means that the server iterates both resultsets from lower values to higher ones, keeping a pointer to the current value and advancing it in both resultsets.

    Anti Semi Join above means that as soon as the engine meets a match in t_right it just skips all matching values in both t_left and t_right. Since values from t_right are pregrouped using Stream Aggregate (making the right resultset 100 times as small), the values are only skipped in t_left (10 at once).

    The whole query takes as little as 0.271 s.

    NOT EXISTS

    1
    2
    3
    4
    5
    6
    7
    8
    SELECT  l.id, l.value
    FROM    [20090915_anti].t_left l
    WHERE   NOT EXISTS
    (
    SELECT  NULL
    FROM    [20090915_anti].t_right r
    WHERE   r.value = l.value
    )

    View query results, details and execution plan

    Exactly same plan and exactly same execution time as above.

    In SQL Server, NOT IN and NOT EXISTS are complete synonyms in terms of the query plans and execution times (as long as both columns are NOT NULL).

    LEFT JOIN / IS NULL

    1
    2
    3
    4
    5
    6
    SELECT  l.id, l.value
    FROM    [20090915_anti].t_left l
    LEFT JOIN
    [20090915_anti].t_right r
    ON      r.value = l.value
    WHERE   r.value IS NULL

    View query results, details and execution plan

    Here, the results are the same but performance details are very different.

    SQL Server's optimizer cannot discern an ANTI JOIN in a LEFT JOIN / IS NULL construct.

    That's why it just build the complete resultset (as with a common LEFT JOIN) and filters out the matching values.

    Since we have lots of values to filter in this case (almost 10,000,000), it's a hard job to filter such a tremendous lot of values. This operation is performed using quite an efficient Hash Match which can be and is parallelized, but filtering the values out still takes the most time.

    That's why the LEFT JOIN / IS NULL query takes 810 ms, or 3 times as much as the NOT EXISTS / NOT IN query.

    Summary

    In SQL Server, NOT EXISTS and NOT IN predicates are the best way to search for missing values, as long as both columns in question are NOT NULL. They produce the safe efficient plans with some kind of an Anti Join.

    LEFT JOIN / IS NULL is less efficient, since it makes no attempt to skip the already matched values in the right table, returning all results and filtering them out instead.

    EXISTS vs JOIN and use of EXISTS clause

    回答1

    EXISTS is used to return a boolean value, JOIN returns a whole other table

    EXISTS is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN is used to extend a result set by combining it with additional fields from another table to which there is a relation.

    In your example, the queries are semantically equivalent.

    In general, use EXISTS when:

    • You don't need to return data from the related table
    • You have dupes in the related table (JOIN can cause duplicate rows if values are repeated)
    • You want to check existence (use instead of LEFT OUTER JOIN...NULL condition)

    If you have proper indexes, most of the time the EXISTS will perform identically to the JOIN. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS.

    If your JOIN key is not indexed, it may be quicker to use EXISTS but you will need to test for your specific circumstance.

    JOIN syntax is easier to read and clearer normally as well.

    回答2

    • EXISTS is a semi-join
    • JOIN is a join

    So with 3 rows and 5 rows matching

    • JOIN gives 15 rows
    • EXISTS gives 3 rows

    The result is the "short circuit" effect mentioned by others and no need to use DISTINCT with a JOIN. EXISTS is almost always quicker when looking for existence of rows on the n side of a 1:n relationship.

    SQL performance on LEFT OUTER JOIN vs NOT EXISTS

    Joe's link is a good starting point. Quassnoi covers this too.

    In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. have a lots of rows EXIST in the subquery) NOT EXISTS will perform better.

    EXISTS and NOT EXISTS both short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.

    LEFT JOIN will join ALL RECORDS regardless of whether they match or not, then filter out all non-matching records. If your tables are large and/or you have multiple JOIN criteria, this can be very very resource intensive资源密集型.

    I normally try to use NOT EXISTS and EXISTS where possible. For SQL Server, IN and NOT IN are semantically equivalent and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.

  • 相关阅读:
    WindowsPhone7 经典3D游戏《刺客信条》评测
    WPF案例 — 展厅触摸屏展示系统
    Silverlight三维柱状图3D饼图的Silverlight图表组件案例
    应聘Silverlight讲师(全职或兼职均可)
    WPF案例之生产线控制器管理系统
    Silverlight 5 Beta 版发布日期确定
    《银光志Silverlight 3.0开发详解与最佳实践》发行第三版总销量过万册
    微软Silverlight5发布会提供线上注册
    Silverlight WebOS案例2.0版本(基于Silverlight4开发的Web操作系统)
    长年承接WP7游戏和WP7软件外包
  • 原文地址:https://www.cnblogs.com/chucklu/p/12972943.html
Copyright © 2020-2023  润新知