• SQL Server: Difference between PARTITION BY and GROUP BY


    SQL Server: Difference between PARTITION BY and GROUP BY

    问题

    I've been using GROUP BY for all types of aggregate queries over the years. Recently, I've been reverse-engineering some code that uses PARTITION BY to perform aggregations. In reading through all the documentation I can find about PARTITION BY, it sounds a lot like GROUP BY, maybe with a little extra functionality added in? Are they two versions of the same general functionality, or are they something different entirely?

    回答1

    They're used in different places. group by modifies the entire query, like:

    select customerId, count(*) as orderCount
    from Orders
    group by customerId

    But partition by just works on a window function, like row_number:

    select row_number() over (partition by customerId order by orderId)
        as OrderNumberForThisCustomer
    from Orders

    A group by normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row. partition by does not affect the number of rows returned, but it changes how a window function's result is calculated.

    回答2

    We can take a simple example.

    Consider a table named TableA with the following values:

    id  firstname                   lastname                    Mark
    -------------------------------------------------------------------
    1   arun                        prasanth                    40
    2   ann                         antony                      45
    3   sruthy                      abc                         41
    6   new                         abc                         47
    1   arun                        prasanth                    45
    1   arun                        prasanth                    49
    2   ann                         antony                      49

    GROUP BY

    The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.

    In more simple words GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.

    Syntax:

    SELECT expression1, expression2, ... expression_n, 
           aggregate_function (aggregate_expression)
    FROM tables
    WHERE conditions
    GROUP BY expression1, expression2, ... expression_n;

    We can apply GROUP BY in our table:

    select SUM(Mark)marksum,firstname from TableA
    group by id,firstName

    Results:

    marksum  firstname
    ----------------
    94      ann                      
    134     arun                     
    47      new                      
    41      sruthy   

    In our real table we have 7 rows and when we apply GROUP BY id, the server group the results based on id:

    In simple words:

    here GROUP BY normally reduces the number of rows returned by rolling them up and calculating Sum() for each row.

    PARTITION BY

    Before going to PARTITION BY, let us look at the OVER clause:

    According to the MSDN definition:

    OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

    PARTITION BY will not reduce the number of rows returned.

    We can apply PARTITION BY in our example table:

    SELECT SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname FROM TableA

    Result:

    marksum firstname 
    -------------------
    134     arun                     
    134     arun                     
    134     arun                     
    94      ann                      
    94      ann                      
    41      sruthy                   
    47      new  

    Look at the results - it will partition the rows and returns all rows, unlike GROUP BY.

    回答3

    partition by doesn't actually roll up the data. It allows you to reset something on a per group basis. For example, you can get an ordinal column within a group by partitioning on the grouping field and using rownum() over the rows within that group. This gives you something that behaves a bit like an identity column that resets at the beginning of each group.

  • 相关阅读:
    failed to create pid file /var/run/rsyncd.pid: File exists报错
    Ansible系列之roles使用说明
    设置build.gradle打包时自动加时间
    Oracle客户端连接数据库配置
    Gradle实现自动打包,签名,自定义apk文件名
    linux定时任务执行没结果,手动执行有结果问题总结
    实现TableLayout布局下循环取出TableRow控件中的文字内容到list集合
    ImageView的src和background的区别
    关于Merge的整理--Merge的使用方法和注意事项的Demo
    关于Merge的整理--AndroidScreenSlidePager开源库中用到的
  • 原文地址:https://www.cnblogs.com/chucklu/p/16401837.html
Copyright © 2020-2023  润新知