• R Programming Week 3: Loop Functions


    Looping on the Command Line

    Writing for and while loops is useful when programming, but not particularly easy when working interactively on the command line. R provides several functions that implement looping to make life easier:

    lapply: Loop over a list and evaluate a function on each element

    sapply: Same as lapply but try to simplify the result

    apply: Apply a function over the margins of an array

    tapply: Apply a function over subsets of a vector

    mapply: Multivariate version of lapply

    An auxiliary function, split, is also useful, particularly in conjunction with lapply.

    lapply

    lapply takes three arguments: (1) a list X; (2) a function (or the name of a function) FUN; (3) other arguments via its ... argument. If X is not a list, it will be coerced to a list using as.list.
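    As a small sketch of the three arguments (the vector x, its values and the name res are made up for illustration), the call below passes an atomic vector as X, names FUN by a string, and forwards times = 2 to rep() through the ... argument:

```r
# Hypothetical example data: a named numeric vector, not a list.
x <- c(first = 1, second = 2, third = 3)

# X is an atomic (non-list) vector; lapply handles it element by element,
# as if it had been converted with as.list(x).
# FUN is given as the string "rep"; times = 2 is forwarded to rep() via ...
res <- lapply(x, "rep", times = 2)
print(res)
```

    The names of x carry over to the result, and res$second is c(2, 2).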

    > lapply
    function (X, FUN, ...)
    {
        FUN <- match.fun(FUN)
        if (!is.vector(X) || is.object(X))
            X <- as.list(X)
        .Internal(lapply(X, FUN))
    }
    <bytecode: 0x7ff7a1951c00>
    <environment: namespace:base>

    The actual looping is done internally in C code.

    lapply always returns a list, regardless of the class of the input.
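    To see the return type in action, here is a sketch with a small made-up data frame (df and out are illustrative names). lapply treats the data frame as a list of columns and still hands back a plain list:

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

out <- lapply(df, sum)    # a data frame is a list of columns: one call per column
class(out)                # "list", not "data.frame"
str(out)

class(lapply(1:3, sqrt))  # an atomic vector as input still gives a list
```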

    > x <- list(a = 1:5, b = rnorm(10))
    > lapply(x, mean)

    > x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
    > lapply(x, mean)

    > x <- 1:4
    > lapply(x, runif)
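    In lapply(x, runif), each element of 1:4 is passed to runif as its first argument n, so the result holds 1, 2, 3 and 4 uniform draws. Extra arguments for runif, such as min and max, can be supplied through lapply's ... argument. The seed and the name draws are arbitrary choices for this sketch:

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible
x <- 1:4

# Each element of x becomes runif's first argument, n (the number of draws);
# min and max are forwarded to every runif call through lapply's ... argument.
draws <- lapply(x, runif, min = 0, max = 10)
lengths(draws)  # 1 2 3 4
```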

    lapply and its relatives make heavy use of anonymous functions.

    > x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
    > x
    $a
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4

    $b
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6

    An anonymous function for extracting the first column of each matrix.

    > lapply(x, function(elt) elt[,1])
    $a
    [1] 1 2

    $b
    [1] 1 2 3

    sapply

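    sapply will try to simplify the result of lapply if possible:

    If the result is a list where every element has length 1, a vector is returned.

    If the result is a list where every element is a vector of the same length (> 1), a matrix is returned.

    If it cannot figure things out, a list is returned.

    > x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

    > lapply(x, mean)

    > sapply(x, mean)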
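    The three simplification rules can be seen with small, made-up inputs whose results are easy to check by hand (lst, v, m and l are illustrative names):

```r
lst <- list(a = 1:4, b = 5:8)

# Every result has length 1 -> a named vector: a = 2.5, b = 6.5
v <- sapply(lst, mean)

# Every result has the same length (> 1) -> a matrix, one column per element
m <- sapply(lst, range)

# Results of different lengths -> sapply falls back to a list
l <- sapply(list(a = 1:2, b = 1:3), function(z) z * 2)
```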

    apply

    apply is used to evaluate a function (often an anonymous one) over the margins of an array.

    It is most often used to apply a function to the rows or columns of a matrix.

    It can be used with general arrays, e.g. taking the average of an array of matrices.

    It is not really faster than writing a loop, but it works in one line!
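    A sketch of the array-of-matrices case, using a made-up 2 x 2 x 10 array: keeping margins 1 and 2 averages each cell over the third dimension. rowMeans with its dims argument is shown as an equivalent shortcut.

```r
# Ten 2 x 2 matrices stacked along the third dimension (illustrative data).
a <- array(1:40, dim = c(2, 2, 10))

# Retaining margins 1 and 2 (rows and columns) collapses dimension 3,
# so each cell of avg is the mean of that cell across the 10 matrices.
avg <- apply(a, c(1, 2), mean)
avg  # 19 21 in the first row, 20 22 in the second

# rowMeans with dims = 2 averages over the remaining dimension(s) as well.
rowMeans(a, dims = 2)
```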

    > str(apply)

    function (X, MARGIN, FUN, ...)

    X is an array

    MARGIN is an integer vector indicating which margins should be “retained”.

    FUN is a function to be applied

    ... is for other arguments to be passed to FUN
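    A tiny, made-up matrix makes the meaning of MARGIN easy to check: 1 keeps the rows, 2 keeps the columns.

```r
m <- matrix(1:6, nrow = 2)  # 2 rows, 3 columns, filled column by column: 1 3 5 / 2 4 6

apply(m, 1, sum)  # MARGIN = 1 retains rows:    9 12
apply(m, 2, sum)  # MARGIN = 2 retains columns: 3 7 11
```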

    > x <- matrix(rnorm(200), 20, 10)
    > apply(x, 2, mean)
     [1]  0.04868268  0.35743615 -0.09104379
     [4] -0.05381370 -0.16552070 -0.18192493
     [7]  0.10285727  0.36519270  0.14898850
    [10]  0.26767260

    col/row sums and means

    For sums and means of matrix dimensions, we have some shortcuts:

    rowSums(x) is equivalent to apply(x, 1, sum)

    rowMeans(x) is equivalent to apply(x, 1, mean)

    colSums(x) is equivalent to apply(x, 2, sum)

    colMeans(x) is equivalent to apply(x, 2, mean)

    The shortcut functions are much faster, but you won’t notice unless you’re using a large matrix.
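    The equivalences above can be checked numerically, and system.time gives a rough feel for the speed difference. The seed and matrix sizes are arbitrary choices for this sketch, and the timings will vary from machine to machine:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- matrix(rnorm(200), 20, 10)

# Each shortcut should agree with the corresponding apply() call
# (all.equal allows for tiny floating-point differences).
all.equal(rowSums(x),  apply(x, 1, sum))
all.equal(rowMeans(x), apply(x, 1, mean))
all.equal(colSums(x),  apply(x, 2, sum))
all.equal(colMeans(x), apply(x, 2, mean))

# Rough timing on a larger matrix; results depend on the machine.
big <- matrix(rnorm(1e6), 1000, 1000)
system.time(apply(big, 2, mean))
system.time(colMeans(big))
```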

    Other Ways to Apply

    Quantiles of the rows of a matrix.

    > x <- matrix(rnorm(200), 20, 10)

    > apply(x, 1, quantile, probs = c(0.25, 0.75))
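    With a fixed matrix the shape of this result is easier to see. Because quantile returns two numbers per row, apply returns a matrix with one column per row of the input and one row per requested probability (the matrix m and the name q are made up for illustration):

```r
m <- matrix(1:12, nrow = 3, byrow = TRUE)  # rows: 1:4, 5:8, 9:12

# probs is passed on to quantile() through apply's ... argument.
q <- apply(m, 1, quantile, probs = c(0.25, 0.75))
q  # 2 x 3: rows "25%" and "75%", one column per row of m
```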

    mapply

    mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments.

    > str(mapply)

    function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

    FUN is a function to apply

    ... contains arguments to apply over

    MoreArgs is a list of other arguments to FUN

    SIMPLIFY indicates whether the result should be simplified
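    A short sketch of MoreArgs and SIMPLIFY, with made-up inputs (tags, v and l are illustrative names): MoreArgs holds arguments that stay fixed across every call, and SIMPLIFY = FALSE keeps the result as a list.

```r
# MoreArgs: sep = "-" is passed unchanged to every call, so this evaluates
# paste("a", 1, sep = "-"), paste("b", 2, sep = "-"), paste("c", 3, sep = "-").
tags <- mapply(paste, c("a", "b", "c"), 1:3, MoreArgs = list(sep = "-"))
tags  # "a-1" "b-2" "c-3", named by USE.NAMES after c("a", "b", "c")

# Every call returns a single number, so by default mapply simplifies to a vector...
v <- mapply(function(x, y) x + y, 1:3, 4:6)

# ...while SIMPLIFY = FALSE keeps the raw list of results.
l <- mapply(function(x, y) x + y, 1:3, 4:6, SIMPLIFY = FALSE)
```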

    The following is tedious to type:

    list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

    Instead we can do:

    > mapply(rep, 1:4, 4:1)

    Vectorizing a Function

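    > noise <- function(n, mean, sd) {
    +     rnorm(n, mean, sd)
    + }
    > noise(5, 1, 2)
    [1]  2.4831198  2.4790100  0.4855190 -1.2117759
    [5] -0.2743532
    > noise(1:5, 1:5, 2)
    [1] -4.2128648 -0.3989266  4.2507057  1.1572738
    [5]  3.7413584

    This is not what we want: when length(n) > 1, rnorm takes length(n) as the number of draws, so noise(1:5, 1:5, 2) returns 5 values in total rather than 1 draw with mean 1, 2 draws with mean 2, and so on.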

    Instant Vectorization

    > mapply(noise, 1:5, 1:5, 2)

    Which is the same as

    list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2), noise(4, 4, 2), noise(5, 5, 2))
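    The equivalence can be checked by resetting the random seed before each version, so both draw the same random numbers in the same order. The seed value and the names a and b are arbitrary choices for this sketch:

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

set.seed(10)  # arbitrary seed
a <- mapply(noise, 1:5, 1:5, 2)  # sd = 2 is recycled for every call

set.seed(10)  # same seed, so the same random numbers come out in the same order
b <- list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2), noise(4, 4, 2), noise(5, 5, 2))

identical(a, b)  # TRUE
lengths(a)       # 1 2 3 4 5
```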

    tapply

    tapply is used to apply a function over subsets of a vector. I don’t know why it’s called tapply.

    > str(tapply)

    function (X, INDEX, FUN = NULL, ..., simplify = TRUE)

    X is a vector

    INDEX is a factor or a list of factors (or else they are coerced to factors)

    FUN is a function to be applied

    ... contains other arguments to be passed to FUN

    simplify indicates whether the result should be simplified

    Take group means.

    > x <- c(rnorm(10), runif(10), rnorm(10, 1))
    > f <- gl(3, 10)
    > f
     [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
    [24] 3 3 3 3 3 3 3
    Levels: 1 2 3
    > tapply(x, f, mean)
            1         2         3
    0.1144464 0.5163468 1.2463678
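    With small, made-up data (x, f, g and r are illustrative names) the group means and ranges can be verified by hand:

```r
x <- c(1, 2, 3, 10, 20, 30)
f <- gl(2, 3)               # factor with levels 1, 2: 1 1 1 2 2 2

g <- tapply(x, f, mean)     # one mean per group: 2 and 20
r <- tapply(x, f, range)    # range() returns 2 values per group, so the result is a list
g
r
```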

    Take group means without simplification.

    > tapply(x, f, mean, simplify = FALSE)
    $`1`
    [1] 0.1144464

    $`2`
    [1] 0.5163468

    $`3`
    [1] 1.246368

    Find group ranges.

    > tapply(x, f, range)
    $`1`
    [1] -1.097309  2.694970

    $`2`
    [1] 0.09479023 0.79107293

    $`3`
    [1] 0.4717443 2.5887025

    split

    split takes a vector or other object and splits it into groups determined by a factor or list of factors.

    > str(split)

    function (x, f, drop = FALSE, ...)

    x is a vector (or list) or data frame

    f is a factor (or coerced to one) or a list of factors

    drop indicates whether empty factor levels should be dropped

    A common idiom is split followed by an lapply.

    > lapply(split(x, f), mean)
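    On the same kind of made-up data (x, f and s are illustrative names), split followed by lapply or sapply produces the same group means as tapply:

```r
x <- c(1, 2, 3, 10, 20, 30)
f <- gl(2, 3)

s <- split(x, f)      # list with elements "1" = c(1, 2, 3) and "2" = c(10, 20, 30)
lapply(s, mean)       # list of group means
sapply(s, mean)       # same means, simplified to a named vector
tapply(x, f, mean)    # same numbers again
```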

    Splitting a Data Frame

    > library(datasets)

    > head(airquality)

    > s <- split(airquality, airquality$Month)

    > lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))

    > sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))

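    Ozone and Solar.R contain missing values, so some of these column means come out as NA. Passing na.rm = TRUE through to colMeans removes the missing values first:

    > sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))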
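    As a quick check of the shape of the sapply result: it has one row per selected column and one column per month. This relies only on the airquality data set that ships with R's datasets package; the names cols, raw and clean are illustrative.

```r
library(datasets)

# One data frame per month; the list names are the months "5" to "9".
s <- split(airquality, airquality$Month)

cols  <- c("Ozone", "Solar.R", "Wind")
raw   <- sapply(s, function(x) colMeans(x[, cols]))
clean <- sapply(s, function(x) colMeans(x[, cols], na.rm = TRUE))

dim(clean)     # 3 5: one row per variable, one column per month
anyNA(raw)     # TRUE: missing values propagate into the means
anyNA(clean)   # FALSE
```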

    Splitting on More than One Level

    > x <- rnorm(10)

    > f1 <- gl(2, 5)

    > f2 <- gl(5, 2)

    Interactions can create empty levels.

    > str(split(x, list(f1, f2)))


    Empty levels can be dropped

    > str(split(x, list(f1, f2), drop = TRUE))
    List of 6
     $ 1.1: num [1:2] -0.378 0.445
     $ 1.2: num [1:2] 1.4066 0.0166
     $ 1.3: num -0.355
     $ 2.3: num 0.315
     $ 2.4: num [1:2] -0.907 0.723
     $ 2.5: num [1:2] 0.732 0.360
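    Counting the groups makes the effect of drop visible: 2 levels of f1 times 5 levels of f2 give 10 combinations, of which only 6 actually occur in the data (all_groups and kept are illustrative names):

```r
x <- rnorm(10)
f1 <- gl(2, 5)   # 1 1 1 1 1 2 2 2 2 2
f2 <- gl(5, 2)   # 1 1 2 2 3 3 4 4 5 5

all_groups <- split(x, list(f1, f2))               # every level combination is kept
kept       <- split(x, list(f1, f2), drop = TRUE)  # empty combinations are dropped

length(all_groups)             # 10
sum(lengths(all_groups) == 0)  # 4 empty groups
length(kept)                   # 6
names(kept)
```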

  • Original article: https://www.cnblogs.com/jpld/p/4446804.html
Copyright © 2020-2023  润新知