R Functional Programming Study Notes
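1. lapply function
Applies the given function to each element of a list or vector in turn; the result is always a list.
2. sapply function
Works the same way as lapply, but simplifies the result to a vector (or matrix) whenever possible.
3. vapply
Like sapply, but you must supply an extra argument (FUN.VALUE) that declares the type and length of each result, which makes the output type predictable.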
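A quick side-by-side comparison of the three functions on the same input. The named list `x` is a made-up example, not course data; the point is only the shape of each result.

```r
# Hypothetical input: a small named list of numeric vectors
x <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# lapply(): always returns a list with one element per input element
res_l <- lapply(x, sum)
str(res_l)

# sapply(): same computation, simplified to a named numeric vector
res_s <- sapply(x, sum)
print(res_s)

# vapply(): same result, but each value is declared up front as one number
res_v <- vapply(x, sum, FUN.VALUE = numeric(1))
print(res_v)
```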
Use vapply
Before you get your hands dirty with the third and last apply function that you'll learn about in this intermediate R course, let's take a look at its syntax. The function is called vapply(), and it has the following syntax:
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
The function FUN is applied to each element of X. The FUN.VALUE argument expects a template for the return value of FUN. USE.NAMES is TRUE by default; in this case vapply() tries to generate a named array, if possible.
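To see what the FUN.VALUE template controls, here is a minimal sketch with a hypothetical list `x` whose function returns two values per element. The template `c(min = 0, max = 0)` says "two numbers", so the result is a 2-row matrix, and because the template is named, its names become the row names. The second call shows USE.NAMES at work on a character input (the strings are illustrative only).

```r
# Hypothetical input
x <- list(a = c(3, 1, 2), b = c(10, 4, 7))

# FUN returns two numbers per element, so the template is a length-2 numeric
# vector; because the template is named, its names become the row names
rng <- vapply(x, function(v) c(min = min(v), max = max(v)),
              FUN.VALUE = c(min = 0, max = 0))
print(rng)

# With a character X and the default USE.NAMES = TRUE,
# the input strings are used as names of the result
first_chars <- vapply(c("apple", "kiwi"), function(s) substr(s, 1, 1), character(1))
print(first_chars)
```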
An example makes the difference between the two results clear:
# temp is already prepared for you in the workspace
# Definition of below_zero()
below_zero <- function(x) {
  return(x[x < 0])
}
# Apply below_zero over temp using sapply(): freezing_s
freezing_s <- sapply(temp, below_zero)
# Apply below_zero over temp using lapply(): freezing_l
freezing_l <- lapply(temp, below_zero)
# Are freezing_s and freezing_l identical?
identical(freezing_s, freezing_l)
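Since `temp` itself is not shown in these notes, the sketch below uses a small made-up stand-in to illustrate why the comparison can come out TRUE: below_zero() returns vectors of different lengths, so sapply() has nothing to simplify into and falls back to returning the same list that lapply() returns.

```r
# Hypothetical stand-in for temp: made-up daily temperature vectors
temp <- list(c(3, 7, 9, 6, -1), c(6, 9, 12, 13, 5), c(4, 8, 3, -1, -3))

below_zero <- function(x) {
  return(x[x < 0])
}

freezing_s <- sapply(temp, below_zero)
freezing_l <- lapply(temp, below_zero)

# The results have lengths 1, 0 and 2, so sapply() cannot build a vector
# or matrix from them and returns the list unchanged
print(lengths(freezing_l))
print(identical(freezing_s, freezing_l))
```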
The sapply() calls can be rewritten with vapply() as follows:
# temp is already defined in the workspace
# Convert to vapply() expression
sapply(temp, max)
vapply(temp, max, numeric(1))
# Convert to vapply() expression
sapply(temp, function(x, y) { mean(x) > y }, y = 5)
vapply(temp, function(x, y) { mean(x) > y }, logical(1), y = 5)
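What the stricter template buys you, sketched with a hypothetical two-element `temp` (made-up values): when FUN.VALUE matches, vapply() returns the expected vector; when it does not, vapply() stops with an error instead of quietly returning a different shape, as sapply() would.

```r
# Hypothetical stand-in for temp (two elements are enough here)
temp <- list(c(3, 7, 9, 6, -1), c(6, 9, 12, 13, 5))

maxes  <- vapply(temp, max, numeric(1))
above5 <- vapply(temp, function(x, y) { mean(x) > y }, logical(1), y = 5)
print(maxes)    # 9 13
print(above5)   # FALSE TRUE  (the means are 4.8 and 9)

# A template that does not match what FUN returns is an error:
# range() returns two numbers per element, not one
res <- tryCatch(vapply(temp, range, numeric(1)),
                error = function(e) conditionMessage(e))
print(res)
```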
Besides the functions above, there are also:
tapply
mapply
...
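A minimal sketch of the two functions just listed, on made-up data: tapply() applies a function to groups of a vector defined by a grouping variable, and mapply() walks over several vectors in parallel. The variable names `temps` and `city` are hypothetical.

```r
# tapply(): apply a function to each group of a vector
temps <- c(3, 7, -1, 12, 5, -3)                 # hypothetical values
city  <- c("A", "A", "B", "B", "C", "C")        # hypothetical grouping
mean_by_city <- tapply(temps, city, mean)
print(mean_by_city)                             # A: 5, B: 5.5, C: 1

# mapply(): call the function with the i-th element of every vector
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
print(sums)                                     # 5 7 9

# Results of different lengths cannot be simplified, so a list comes back:
# rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)
str(reps)
```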
Below are some commonly used simple functions:
seq():
Generate sequences, by specifying the from, to, and by arguments.
rep():
Replicate elements of vectors and lists.
sort():
Sort a vector in ascending order. Works on numerics, but also on character strings and logicals.
rev():
Reverse the elements in a data structure for which reversal is defined.
str():
Display the structure of any R object.
append():
Merge vectors or lists.
is.*():
Check for the class of an R object.
as.*():
Convert an R object from one class to another.
unlist():
Flatten (possibly embedded) lists to produce a vector.
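A short tour of the helpers listed above on a small made-up vector `v` (seq() and rep() are covered in the console session that follows). The commented values are what each call returns.

```r
v <- c(5, 2, 9, 1)

sort(v)                       # 1 2 5 9
sort(v, decreasing = TRUE)    # 9 5 2 1
rev(v)                        # 1 9 2 5
append(v, c(7, 8))            # 5 2 9 1 7 8
append(v, 0, after = 2)       # 5 2 0 9 1
str(v)                        # num [1:4] 5 2 9 1

is.numeric(v)                 # TRUE
is.character(v)               # FALSE
as.character(v)               # "5" "2" "9" "1"

# unlist() flattens nested lists into a single vector
nested <- list(1, list(2, 3), c(4, 5))
unlist(nested)                # 1 2 3 4 5
```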
> rep(seq(1, 7, by = 2), times = 7)
[1] 1 3 5 7 1 3 5 7 1 3 5 7 1 3 5 7 1 3 5 7 1 3 5 7 1 3 5 7
> rep(seq(1, 7, by = 2), each = 7)
[1] 1 1 1 1 1 1 1 3 3 3 3 3 3 3 5 5 5 5 5 5 5 7 7 7 7 7 7 7
> a<-rep(2008:2018, times = 11)
> a
[1] 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2008 2009 2010 2011
[16] 2012 2013 2014 2015 2016 2017 2018 2008 2009 2010 2011 2012 2013 2014 2015
[31] 2016 2017 2018 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2008
[46] 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2008 2009 2010 2011 2012
[61] 2013 2014 2015 2016 2017 2018 2008 2009 2010 2011 2012 2013 2014 2015 2016
[76] 2017 2018 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2008 2009
[91] 2010 2011 2012 2013 2014 2015 2016 2017 2018 2008 2009 2010 2011 2012 2013
[106] 2014 2015 2016 2017 2018 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
[121] 2018
>
> # Create first sequence: seq1
> seq1<-seq(1,500,by=3)
>
> # Create second sequence: seq2
> seq2<-seq(1200,900,by=-7)
>
> # Calculate total sum of the sequences
> sum(seq1,seq2)
[1] 87029
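The total can be cross-checked by hand: both sequences are arithmetic series, and an arithmetic series sums to n * (first + last) / 2. The helper `series_sum()` below is a hypothetical name introduced only for this check.

```r
seq1 <- seq(1, 500, by = 3)      # 1, 4, ..., 499
seq2 <- seq(1200, 900, by = -7)  # 1200, 1193, ..., 906

# Hypothetical helper: sum of an arithmetic series from its length and end points
series_sum <- function(s) length(s) * (s[1] + s[length(s)]) / 2

print(length(seq1))   # 167 terms
print(length(seq2))   # 43 terms
print(series_sum(seq1) + series_sum(seq2))   # 41750 + 45279 = 87029
```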
grepl & grep
These two functions are used all the time; they check whether a pattern matches.
In their most basic form, regular expressions can be used to see whether a pattern exists inside a character string or a vector of character strings. For this purpose, you can use:
grepl(), which returns TRUE when a pattern is found in the corresponding character string.
grep(), which returns a vector of indices of the character strings that contain the pattern.
Both functions need a pattern and an x argument, where pattern is the regular expression you want to match for, and the x argument is the character vector from which matches should be sought.
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
regexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
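A few of the optional arguments from the signatures above, sketched on the same `emails` vector used in the demo that follows: `value`, `invert` and `ignore.case`, plus regexpr(), which reports where the first match starts.

```r
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# value = TRUE: return the matching strings instead of their indices
edu_strings <- grep("edu", emails, value = TRUE)

# invert = TRUE: return the indices of the strings that do NOT match
no_edu <- grep("edu", emails, invert = TRUE)     # 3 6

# ignore.case = TRUE: case-insensitive matching
edu_ci <- grepl("EDU", emails, ignore.case = TRUE)

# regexpr(): position of the first match in each string, -1 when there is none
at_pos <- regexpr("@", emails)
print(as.integer(at_pos))                        # 9 10 11 -1 6 15
```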
demo
> # The emails vector has already been defined for you
> emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
"invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
>
> # Use grepl() to match for "edu"
>
> grepl(pattern="edu",emails)
[1] TRUE TRUE FALSE TRUE TRUE FALSE
> # Use grep() to match for "edu", save result to hits
> hits<-grep(pattern="edu",emails)
>
> # Subset emails using hits
> emails[hits]
[1] "john.doe@ivyleague.edu" "education@world.gov"
[3] "invalid.edu" "quant@bigdatacollege.edu"
You can use the caret, ^, and the dollar sign, $, to match content located at the start and end of a string, respectively. This brings us one step closer to a correct pattern for matching only the ".edu" email addresses in our list of emails. But more can be added to make the pattern robust:
@, because a valid email must contain an at-sign.
.*, which matches any character (.) zero or more times (*). Both the dot and the asterisk are metacharacters. You can use them to match any characters between the at-sign and the ".edu" portion of an email address.
\\.edu$, to match the ".edu" part of the email at the end of the string. The \\ part escapes the dot: it tells R that you want to use the . as an actual character.
Special characters such as . must be escaped with a backslash, otherwise they are interpreted as metacharacters. Inside an R string the backslash itself must be doubled, which is why the pattern is written "\\.edu".
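A minimal sketch of what escaping changes, using two made-up strings: an unescaped dot matches any character, an escaped dot matches only a literal dot, and `fixed = TRUE` skips regular-expression interpretation altogether.

```r
x <- c("invalid.edu", "invalidXedu")

grepl(".edu", x)                 # TRUE TRUE : the bare dot matches the "X" too
grepl("\\.edu", x)               # TRUE FALSE: "\\." matches only a literal dot
grepl(".edu", x, fixed = TRUE)   # TRUE FALSE: the pattern is taken as plain text

# The R string "\\." holds two characters, a backslash and a dot;
# writing "\." instead is a syntax error ("unrecognized escape")
nchar("\\.")                     # 2
```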
> # The emails vector has already been defined for you
> emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
"invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
>
> # Use grepl() to match for .edu addresses more robustly
> grepl(pattern = "@.*\\.edu$", emails)
[1] TRUE FALSE FALSE FALSE TRUE FALSE
>
> # Use grep() to match for .edu addresses more robustly, save result to hits
> hits <- grep(pattern = "@.*\\.edu$", emails)
>
>
> # Subset emails using hits
> emails[hits]
[1] "john.doe@ivyleague.edu" "quant@bigdatacollege.edu"
sub & gsub
While grep() and grepl() simply check whether a regular expression matches the elements of a character vector, sub() and gsub() take it one step further: you can specify a replacement argument. Wherever the regular expression pattern is found inside an element of the character vector x, the matched part of that element is replaced with replacement. sub() only replaces the first match in each element, whereas gsub() replaces all matches.
Suppose that emails vector you've been working with is an excerpt of DataCamp's email database. Why not offer the owners of the .edu email addresses a new email address on the datacamp.edu domain? This could be quite a powerful marketing stunt: Online education is taking over traditional learning institutions! Convert your email and be a part of the new generation!
These two functions perform substitution: they replace the matched characters with the text you specify.
> # The emails vector has already been defined for you
> emails <- c("john.doe@ivyleague.edu", "education@world.gov", "global@peace.org",
"invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
>
> # Use sub() to convert the email domains to datacamp.edu
> sub("@.*\\.edu$", "@datacamp.edu", emails)
[1] "john.doe@datacamp.edu" "education@world.gov"
[3] "global@peace.org" "invalid.edu"
[5] "quant@datacamp.edu" "cookie.monster@sesame.tv"
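The difference between sub() and gsub() only shows up when a string contains more than one match, which the email example does not. A sketch on two made-up strings:

```r
s <- c("a.b.c", "no dots")

sub(".", "-", s, fixed = TRUE)    # "a-b.c" "no dots": only the first match per string
gsub(".", "-", s, fixed = TRUE)   # "a-b-c" "no dots": every match per string

# Without fixed = TRUE the dot is a metacharacter, so every character is replaced
gsub(".", "-", s)                 # "-----" "-------"
```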
Dates
> # Get the current date: today
> today <- Sys.Date()
>
> # See what today looks like under the hood
> unclass(today)
[1] 18238
>
> # Get the current time: now
> now <- Sys.time()
>
> # See what now looks like under the hood
> unclass(now)
[1] 1575802501
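The numbers printed above are counts from 1970-01-01: days for a Date and seconds (UTC) for a POSIXct date-time. They reflect the moment these notes were written, so your values will differ. A sketch that converts them back; passing `origin` explicitly is done here so the calls also work on older R versions.

```r
# Dates are stored as the number of days since 1970-01-01
unclass(as.Date("1970-01-02"))                       # 1
as.Date(18238, origin = "1970-01-01")                # "2019-12-08"

# POSIXct date-times are stored as seconds since 1970-01-01 00:00:00 UTC
now_then <- as.POSIXct(1575802501, origin = "1970-01-01", tz = "UTC")
format(now_then, "%Y-%m-%d %H:%M:%S")                # "2019-12-08 10:55:01"
```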
Common formats
Create and format dates
To create a Date object from a simple character string in R, you can use the as.Date() function. The character string has to obey a format that can be defined using a set of symbols (the examples correspond to 13 January, 1982):
%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)
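The symbols above applied to the same example date, 13 January 1982. The numeric codes give the same output everywhere; the name codes (%A, %a, %B, %b) depend on the system locale, so the English output in the comment is an assumption about your locale.

```r
d <- as.Date("1982-01-13")

format(d, "%Y")          # "1982"
format(d, "%y")          # "82"
format(d, "%m")          # "01"
format(d, "%d")          # "13"
format(d, "%d/%m/%Y")    # "13/01/1982"

# Locale-dependent names; in an English locale: "Wednesday, January 13, 1982"
format(d, "%A, %B %d, %Y")

# The same codes work in the other direction, for parsing strings
parsed <- as.Date("13/01/1982", format = "%d/%m/%Y")
parsed == d              # TRUE
```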
demo
> # Definition of character strings representing dates
> str1 <- "May 23, '96"
> str2 <- "2012-03-15"
> str3 <- "30/January/2006"
>
> # Convert the strings to dates: date1, date2, date3
> date1 <- as.Date(str1, format = "%b %d, '%y")
> date2 <- as.Date(str2)
> date3 <- as.Date(str3, format = "%d/%B/%Y")
>
> # Convert dates to formatted strings
> format(date1, "%A")
[1] "Thursday"
> format(date2, "%d")
[1] "15"
> format(date3, "%b %Y")
[1] "Jan 2006"