• Matching and replacement functions in R: sub(), gsub() and related functions


    Description

    grep, grepl, regexpr, gregexpr and regexec search for matches to pattern within each element of a character vector: they differ in the format of and amount of detail in the results.

    sub and gsub perform replacement of the first and all matches respectively.
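
    A minimal sketch of that difference, using a made-up input vector x that is not part of the original help page:

    ```r
    # Hypothetical sample data for illustration
    x <- c("banana", "a cat sat")

    # sub() replaces only the first match in each element
    first_only <- sub("a", "A", x)

    # gsub() replaces every match in each element
    all_matches <- gsub("a", "A", x)

    print(first_only)   # "bAnana"    "A cat sat"
    print(all_matches)  # "bAnAnA"    "A cAt sAt"
    ```

    Both functions are vectorised over x, so each element is processed independently.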

    Usage

    pattern is the target string to match, replacement is the string to substitute, and x is the object (character vector) to search.

    
    sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)  # replaces the first match only
    
    gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
         fixed = FALSE, useBytes = FALSE)  # replaces all matches
    
    grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
         fixed = FALSE, useBytes = FALSE, invert = FALSE)
    
    grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
          fixed = FALSE, useBytes = FALSE)
    
    regexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
            fixed = FALSE, useBytes = FALSE)
    
    gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
             fixed = FALSE, useBytes = FALSE)
    
    regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
            fixed = FALSE, useBytes = FALSE)
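
    The search functions above differ mainly in what they return. The following sketch compares them on one made-up vector; the variable names are chosen for illustration only:

    ```r
    # Hypothetical sample data for illustration
    x <- c("apple", "banana", "cherry")

    idx <- grep("an", x)    # integer indices of matching elements
    hit <- grepl("an", x)   # one logical value per element

    m <- regexpr("an", x)   # position of the first match, -1 if none
    first_pos <- as.integer(m)
    first_len <- as.integer(attr(m, "match.length"))

    g <- gregexpr("an", x)  # list with all match positions per element
    all_pos <- as.integer(g[[2]])
    matched <- regmatches(x, g)[[2]]  # extract the matched substrings

    r <- regexec("b(an)", x)  # list: full match plus parenthesized submatches
    sub_pos <- as.integer(r[[2]])

    print(idx)        # 2
    print(hit)        # FALSE  TRUE FALSE
    print(first_pos)  # -1  2 -1
    print(all_pos)    # 2 4
    print(matched)    # "an" "an"
    print(sub_pos)    # 1 2
    ```

    regmatches() is the base R helper usually paired with regexpr(), gregexpr() and regexec() to turn match positions into substrings.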
    

    Arguments

    pattern

    character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are allowed except for regexpr, gregexpr and regexec.

    x, text

    a character vector where matches are sought, or an object which can be coerced by as.character to a character vector. Long vectors are supported.

    ignore.case

    logical. If FALSE, pattern matching is case sensitive; if TRUE, case is ignored during matching.
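
    A small sketch of the effect, with an invented input vector:

    ```r
    # Hypothetical sample data for illustration
    x <- c("Hello", "hello", "HELLO")

    case_sensitive <- grepl("hello", x)                        # default
    case_ignored   <- grepl("hello", x, ignore.case = TRUE)

    print(case_sensitive)  # FALSE  TRUE FALSE
    print(case_ignored)    # TRUE TRUE TRUE
    ```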

    perl

    logical. Should Perl-compatible regexps (Perl rules) be used?

    value

    if FALSE, a vector containing the (integer) indices of the matches determined by grep is returned, and if TRUE, a vector containing the matching elements themselves is returned.
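
    For instance, on a made-up vector the two settings of value could be compared like this:

    ```r
    # Hypothetical sample data for illustration
    x <- c("apple", "banana", "cherry")

    positions <- grep("an", x)                # indices of matching elements
    elements  <- grep("an", x, value = TRUE)  # the matching elements themselves

    print(positions)  # 2
    print(elements)   # "banana"
    ```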

    fixed

    logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.
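
    This matters most when the pattern contains regex metacharacters. A sketch with an invented string, where "." should be treated literally:

    ```r
    # Hypothetical sample data for illustration
    x <- "a.b.c"

    as_regex <- gsub(".", "-", x)                # "." matches any character
    as_fixed <- gsub(".", "-", x, fixed = TRUE)  # "." matches a literal dot
    escaped  <- gsub("\\.", "-", x)              # regex with the dot escaped

    print(as_regex)  # "-----"
    print(as_fixed)  # "a-b-c"
    print(escaped)   # "a-b-c"
    ```

    With fixed = TRUE there is no need to escape metacharacters, which tends to be simpler when the search text is a plain literal.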

    useBytes

    logical. If TRUE the matching is done byte-by-byte rather than character-by-character. See ‘Details’.

    invert

    logical. If TRUE return indices or values for elements that do not match.
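
    A brief sketch of invert, again on a made-up vector, combined with value:

    ```r
    # Hypothetical sample data for illustration
    x <- c("apple", "banana", "cherry")

    with_e        <- grep("e", x)                               # elements containing "e"
    without_e     <- grep("e", x, invert = TRUE)                # elements not containing "e"
    without_e_val <- grep("e", x, invert = TRUE, value = TRUE)  # same, as values

    print(with_e)         # 1 3
    print(without_e)      # 2
    print(without_e_val)  # "banana"
    ```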

    replacement

    a replacement for matched pattern in sub and gsub. Coerced to character if possible. For fixed = FALSE this can include backreferences "\1" to "\9" to parenthesized subexpressions of pattern. For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion. If a character vector of length 2 or more is supplied, the first element is used with a warning. If NA, all elements in the result corresponding to matches will be set to NA.
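
    The sketch below illustrates backreferences and case conversion on an invented string. Note that inside an R string literal each backslash has to be doubled, so "\1" is written as "\\1":

    ```r
    # Hypothetical sample data for illustration
    x <- "john smith"

    # Swap the two words using backreferences to the parenthesized groups
    swapped <- sub("([a-z]+) ([a-z]+)", "\\2, \\1", x)

    # perl = TRUE only: upper-case the first letter of each word,
    # then switch back to lower case for the rest
    capitalised <- gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", x, perl = TRUE)

    # perl = TRUE only: \\E ends the case conversion started by \\U
    first_upper <- sub("([a-z]+) ([a-z]+)", "\\U\\1\\E \\2", x, perl = TRUE)

    print(swapped)      # "smith, john"
    print(capitalised)  # "John Smith"
    print(first_upper)  # "JOHN smith"
    ```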

    Details

    Arguments which should be character strings or character vectors are coerced to character if possible.

    Each of these functions operates in one of three modes:

    fixed = TRUE: use exact matching.

    perl = TRUE: use Perl-style regular expressions.

    fixed = FALSE, perl = FALSE: use POSIX 1003.2 extended regular expressions (the default).

    See the help page on regular expressions (?regex) for details of the different types of regular expressions.
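
    As a rough illustration of the modes, the sketch below uses a POSIX character class in the default mode and a lookahead, which requires perl = TRUE. The input values are made up:

    ```r
    # Hypothetical sample data for illustration
    ids    <- c("123", "12a")
    prices <- c("price: 100 USD", "price: 200 EUR")

    # Default mode: extended regular expressions with a POSIX character class
    all_digits <- grepl("^[[:digit:]]+$", ids)

    # Perl mode: a lookahead keeps " USD" out of the matched text
    usd_amount <- regmatches(prices, regexpr("[0-9]+(?= USD)", prices, perl = TRUE))

    # Fixed mode: "1+1" is matched literally, not as a regex
    literal_hit <- grepl("1+1", "1+1=2", fixed = TRUE)

    print(all_digits)   # TRUE FALSE
    print(usd_amount)   # "100"
    print(literal_hit)  # TRUE
    ```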

    The two *sub functions differ only in that sub replaces only the first occurrence of a pattern whereas gsub replaces all occurrences. If replacement contains backreferences which are not defined in pattern the result is undefined (but most often the backreference is taken to be "").

    For regexpr, gregexpr and regexec it is an error for pattern to be NA, otherwise NA is permitted and gives an NA match.

    Both grep and grepl take missing values in x as not matching a non-missing pattern.
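
    A short sketch of how missing values in x behave, using an invented vector. The sub() line is included for comparison and shows that an NA element stays NA:

    ```r
    # Hypothetical sample data for illustration
    x <- c("apple", NA, "banana")

    na_idx <- grep("an", x)       # NA in x counts as "no match"
    na_hit <- grepl("an", x)
    na_sub <- sub("an", "AN", x)  # NA elements remain NA

    print(na_idx)  # 3
    print(na_hit)  # FALSE FALSE  TRUE
    print(na_sub)  # "apple"  NA       "bANana"
    ```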

    The main effect of useBytes = TRUE is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales, but for regexpr it changes the interpretation of the output. It inhibits the conversion of inputs with marked encodings, and is forced if any input is found which is marked as "bytes" (see Encoding).

    Caseless matching does not make much sense for bytes in a multibyte locale, and you should expect it only to work for ASCII characters if useBytes = TRUE.

    regexpr and gregexpr with perl = TRUE allow Python-style named captures, but not for long vector inputs.
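
    A minimal sketch of named captures, assuming a made-up date string. The group names year and month are chosen for illustration; the capture positions are read from the attributes that regexpr() attaches to its result:

    ```r
    # Hypothetical sample data for illustration
    x <- "2020-06-02"

    m <- regexpr("(?<year>[0-9]{4})-(?<month>[0-9]{2})", x, perl = TRUE)

    cap_names <- attr(m, "capture.names")
    starts    <- as.vector(attr(m, "capture.start"))
    lengths   <- as.vector(attr(m, "capture.length"))

    # Cut the captured substrings out of x and label them by group name
    parts <- setNames(substring(x, starts, starts + lengths - 1), cap_names)

    print(cap_names)  # "year"  "month"
    print(parts)      # year "2020", month "06"
    ```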

    Invalid inputs in the current locale are warned about up to 5 times.

    Caseless matching with perl = TRUE for non-ASCII characters depends on the PCRE library being compiled with ‘Unicode property support’, which PCRE2 is by default.

  • Original article: https://www.cnblogs.com/impw/p/13029395.html