with(), within() and transform(): a simple comparison


A simple comparison of `with()`, `within()` and `transform()`

When you first learn about data frames in R, you will notice that building a new column means typing the data frame's name over and over, as shown below:

library(MASS)
anorexia$wtDiff <- anorexia$Postwt - anorexia$Prewt  # the data frame name is typed again and again

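In fact, whenever you see the same chunk of code repeated again and again, ask yourself whether it should be rewritten and whether something more concise could replace it. So if the situation above has been bothering you, stumbling across attach() will feel like a godsend. Once you start using it, however, you will find its side effects a real headache that often cost you more time debugging than they save.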

attach(anorexia)
anorexia$wtDiff <- Postwt - Prew   # typo in a variable name: Prew should be Prewt
detach(anorexia)

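In the snippet above, the misspelled variable name makes the second line fail, so (when the script is run with source(), which stops at the first error) the code after it, including detach(), never runs. After you fix the typo and rerun the code, anorexia is attached to the search path a second time. Because detach() runs only once, one copy of anorexia stays on the search path. This easily causes problems later, such as variables silently masking one another, and such problems can be hard to spot.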
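To see the search-path problem concretely, here is a minimal sketch that simulates the rerun by calling attach() twice on the anorexia data from MASS and then counts the copies with search(). The names n_attached_before and n_attached_after and the final cleanup loop are illustrative additions, not part of the original example.

```r
library(MASS)  # provides the anorexia data frame

attach(anorexia)
attach(anorexia)  # simulates rerunning the script after fixing the typo
n_attached_before <- sum(search() == "anorexia")

detach(anorexia)  # the single detach() call in the script
n_attached_after <- sum(search() == "anorexia")

print(c(n_attached_before, n_attached_after))  # 2 1: one copy is left behind

# Clean up: detach every remaining copy of anorexia
while ("anorexia" %in% search()) detach("anorexia", character.only = TRUE)
```

The second attach() also prints a message that objects are masked, which is often the first visible hint that something is attached twice.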

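Given these problems, this article focuses on with(), within() and transform(). All three make it easy to work with data frames, for example to add a new column or overwrite an existing one.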

# A single change to a data frame: the with() version is fairly concise
anorexia$wtDiff <- with(anorexia, Postwt - Prewt)
anorexia <- within(anorexia, wtDiff2 <- Postwt - Prewt)
anorexia <- transform(anorexia, wtDiff3 = Postwt - Prewt)

# Several changes to a data frame: the with() version gets verbose and awkward, so within() or transform() is recommended
fahrenheit_to_celcius <- function(f) (f - 32) / 1.8

airquality[c("cTemp", "logOzone", "MonthName")] <- with(airquality, list(
  fahrenheit_to_celcius(Temp),
  log(Ozone),
  month.abb[Month]
))

airquality <- within(airquality,
{
  cTemp2     <- fahrenheit_to_celcius(Temp)
  logOzone2  <- log(Ozone)
  MonthName2 <- month.abb[Month]
})

airquality <- transform(airquality,
  cTemp3     = fahrenheit_to_celcius(Temp),
  logOzone3  = log(Ozone),
  MonthName3 = month.abb[Month]
)
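Beyond length, within() and transform() differ in one practical respect: the statements inside within() run in order, so a later line can use a column created by an earlier one, whereas transform() evaluates every argument against the original data. The sketch below assumes a hypothetical toy data frame df; the column names doubled and quadrupled are illustrative only.

```r
# A hypothetical one-column data frame used only for illustration
df <- data.frame(a = 1:3)

# within(): the braced statements run in order, so a later line
# can use a column created by an earlier one
w <- within(df, {
  doubled    <- a * 2
  quadrupled <- doubled * 2
})
print(w$quadrupled)   # 4 8 12

# transform(): all arguments are evaluated against the ORIGINAL data,
# so 'doubled' is not visible yet when 'quadrupled' is computed
t_res <- tryCatch(
  transform(df, doubled = a * 2, quadrupled = doubled * 2),
  error = function(e) conditionMessage(e)
)
print(t_res)          # an error message: object 'doubled' not found
```

So when one new column depends on another, within() (or two successive transform() calls) is the simpler choice.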

A simple comparison of `with()` and `within()`

Purpose: evaluate an R expression in an environment built from the items (variables) of a list or data frame.

with(data, expr, ...)
within(data, expr, ...)

# data  usually a list or a data frame; for with() it can also be an environment or an integer as in sys.call
# expr  one or more expressions to evaluate using the contents of data. Multiple expressions must be wrapped in braces.

?sys.call  ### Functions to Access the Function Call Stack

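with() is a generic function that evaluates an R expression (command) in a local environment constructed from data. That environment has the caller's environment as its parent, which is useful for simplifying calls to modeling functions. (Note: if data is already an environment, it is used as-is, together with its existing parent.)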

Note that assignments in expr take place only in the constructed environment, not in the user's workspace.
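A small sketch of this scoping rule, using a hypothetical data frame df and environment e (names chosen for illustration): assignments inside with() on a data frame are discarded, while with() on an environment evaluates directly in that environment, so assignments there do stick.

```r
# A hypothetical toy data frame; the names are for illustration only
df <- data.frame(x = 1:3)

# Assigning to x inside with() changes only the temporary environment
invisible(with(df, x <- x * 100))
print(df$x)                       # still 1 2 3

# A variable created inside with() does not appear in the workspace
invisible(with(df, created_inside <- sum(x)))
print(exists("created_inside"))   # FALSE, assuming no such object existed before

# When data is itself an environment, with() evaluates directly in it,
# so the assignment DOES modify that environment
e <- new.env()
e$x <- 1
invisible(with(e, x <- x + 1))
print(e$x)                        # 2
```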

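within() is similar to with(). The difference is that, after evaluating the expression, within() examines the environment, applies the corresponding changes to a copy of data, and returns that copy with the changes. within() can therefore be used as an alternative to transform().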

Return values:

with(): the value of the evaluated expression

within(): the modified object (a modified copy of data)
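The difference in return values, sketched with a hypothetical toy data frame (df, a, b and total are illustrative names):

```r
# Hypothetical toy data frame used only for illustration
df <- data.frame(a = 1:3, b = c(10, 20, 30))

# with(): returns the value of the last expression in the braces,
# here a plain numeric vector; 'total' is discarded afterwards
v <- with(df, {
  total <- a + b
  total * 2
})

# within(): returns a modified COPY of df; df itself is unchanged
df2 <- within(df, total <- a + b)

print(v)            # 22 44 66
print(names(df2))   # "a" "b" "total"
print(names(df))    # "a" "b"
```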

> install.packages("openintro")
> library(openintro)
> data(marioKart)
> names(marioKart)
 [1] "ID"         "duration"   "nBids"      "cond"      
 [5] "startPr"    "shipPr"     "totalPr"    "shipSp"    
 [9] "sellerRate" "stockPhoto" "wheels"     "title"     
> dim(marioKart)
[1] 143  12
> 
> # remove the two outliers
> mk0 <- marioKart[marioKart$totalPr < 100,]
> 
> 
> # create the plot
> with(mk0, {
+            boxplot(totalPr ~ wheels)
+            points(wheels+1.1, totalPr, col=4)
+           })
> 
> 
> # remove a column
> mk2 <- within(mk0, rm(title))
> names(mk2)
 [1] "ID"         "duration"   "nBids"      "cond"      
 [5] "startPr"    "shipPr"     "totalPr"    "shipSp"    
 [9] "sellerRate" "stockPhoto" "wheels"    
> 
> 
> # change values
> mk0$totalPr[50]
[1] 59.88
> mk0$startPr[25]
[1] 0.01
> mk3 <- within(mk0, { # Would not typically do...
+                      # this is just an example
+                     totalPr[50] <- 88.59
+                     startPr[25] <- 85.00
+                    })
> mk3$totalPr[50]
[1] 88.59
> mk3$startPr[25]
[1] 85
> 
> 
> # create a new column
> mk4 <- within(mk0, endPrice <- totalPr - shipPr)
> all.equal(mk4$totalPr - mk4$shipPr, mk4$endPrice)
[1] TRUE
> names(mk4)
 [1] "ID"         "duration"   "nBids"      "cond"      
 [5] "startPr"    "shipPr"     "totalPr"    "shipSp"    
 [9] "sellerRate" "stockPhoto" "wheels"     "title"     
[13] "endPrice"

Appendix: descriptions from the R help pages

with(data, expr, ...) --->> Evaluate an R expression in an environment constructed from data. (The help page shares its description with within(); with() itself returns the value of expr and does not modify data.)

within(data, expr, ...) --->> Evaluate an R expression in an environment constructed from data, possibly modifying (a copy of) the original data.

transform(`_data`, ...) --->> transform is a generic function, which, at least currently, only does anything useful with data frames. transform.default converts its first argument to a data frame if possible and calls transform.data.frame.
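As a quick check of the transform.default behaviour described above, this sketch passes a plain list (a hypothetical lst) to transform(); the list is converted to a data frame before the new column is added.

```r
# A plain list, not a data frame; the names are illustrative
lst <- list(x = 1:3, y = c(2, 4, 6))

# No transform method exists for lists, so transform.default() runs:
# it converts lst to a data frame, then adds the column z
out <- transform(lst, z = x + y)

print(class(out))  # "data.frame"
print(out$z)       # 3 6 9
```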

head(mtcars)
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars$mpg[mtcars$cyl == 8  &  mtcars$disp > 350]
# the same thing, written more concisely
with(mtcars, mpg[cyl == 8  &  disp > 350])
# [1] 18.7 14.3 10.4 10.4 14.7 19.2 15.8


# examples from glm:
with(data.frame(u = c(5,10,15,20,30,40,60,80,100),
                lot1 = c(118,58,42,35,27,25,21,19,18),
                lot2 = c(69,35,26,21,18,16,13,12,12)),
    list(summary(glm(lot1 ~ log(u), family = Gamma)),
         summary(glm(lot2 ~ log(u), family = Gamma))))


head(airquality)
#   Ozone Solar.R Wind Temp Month Day
# 1    41     190  7.4   67     5   1
# 2    36     118  8.0   72     5   2
# 3    12     149 12.6   74     5   3
# 4    18     313 11.5   62     5   4
# 5    NA      NA 14.3   56     5   5
# 6    28      NA 14.9   66     5   6

aq <- within(airquality, {     # several variables can be changed at once
    lOzone <- log(Ozone)
    Month <- factor(month.abb[Month])
    cTemp <- round((Temp - 32) * 5/9, 1) # convert Fahrenheit to Celsius
    S.cT <- Solar.R / cTemp  # uses a variable created just above
    rm(Day, Temp)
})
head(aq)
#   Ozone Solar.R Wind Month      S.cT cTemp   lOzone
# 1    41     190  7.4   May  9.793814  19.4 3.713572
# 2    36     118  8.0   May  5.315315  22.2 3.583519
# 3    12     149 12.6   May  6.394850  23.3 2.484907
# 4    18     313 11.5   May 18.742515  16.7 2.890372
# 5    NA      NA 14.3   May        NA  13.3       NA
# 6    28      NA 14.9   May        NA  18.9 3.332205

# example from boxplot:
head(ToothGrowth)
#    len supp dose
# 1  4.2   VC  0.5
# 2 11.5   VC  0.5
# 3  7.3   VC  0.5
# 4  5.8   VC  0.5
# 5  6.4   VC  0.5
# 6 10.0   VC  0.5

with(ToothGrowth, {
    boxplot(len ~ dose, boxwex = 0.25, at = 1:3 - 0.2,
            subset = (supp == "VC"), col = "yellow",
            main = "Guinea Pigs' Tooth Growth",
            xlab = "Vitamin C dose mg",
            ylab = "tooth length", ylim = c(0, 35))
    boxplot(len ~ dose, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
            subset = supp == "OJ", col = "orange")
    legend(2, 9, c("Ascorbic acid", "Orange juice"),
           fill = c("yellow", "orange"))
})


# An alternative form that avoids the subset argument:
with(subset(ToothGrowth, supp == "VC"),
     boxplot(len ~ dose, boxwex = 0.25, at = 1:3 - 0.2,
             col = "yellow", main = "Guinea Pigs' Tooth Growth",
             xlab = "Vitamin C dose mg",
             ylab = "tooth length", ylim = c(0, 35)))
with(subset(ToothGrowth,  supp == "OJ"),
     boxplot(len ~ dose, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
             col = "orange"))
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))

A common use of `with()`: split-apply-combine

> (frogger_scores <- data.frame(
+   player = rep(c("Tom", "Dick", "Harry"), times = c(2, 5, 3)),
+   score = round(rlnorm(10, 8), -1)))
   player score
1     Tom  3530
2     Tom  2800
3    Dick 18110
4    Dick  6180
5    Dick  5550
6    Dick 19770
7    Dick  4800
8   Harry   910
9   Harry  8120
10  Harry   620

> # compute each player's mean score
> ## the three-step approach: split-apply-combine
> ### first, split the data set by player
> (scores_by_player <- with(
+   frogger_scores,
+   split(score, player)))
$Dick
[1] 18110  6180  5550 19770  4800

$Harry
[1]  910 8120  620

$Tom
[1] 3530 2800

> ### then, apply the mean function to each element
> (list_of_means_by_player <- lapply(scores_by_player, mean))
$Dick
[1] 10882

$Harry
[1] 3216.667

$Tom
[1] 3165

> ### finally, combine the results into a single vector
> (mean_by_player <- unlist(list_of_means_by_player))
     Dick     Harry       Tom 
10882.000  3216.667  3165.000 
> 
> ## or do split-apply-combine in a single step
> with(frogger_scores, tapply(score, player, mean))
     Dick     Harry       Tom 
10882.000  3216.667  3165.000
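To confirm that the three-step and one-step versions agree, the sketch below replaces rlnorm() with fixed scores copied from the split() output above, so the result no longer depends on the random draw. The object names three_step and one_step are illustrative.

```r
# Fixed scores (taken from the split() output above) instead of rlnorm(),
# so the result is reproducible
frogger_scores <- data.frame(
  player = rep(c("Tom", "Dick", "Harry"), times = c(2, 5, 3)),
  score  = c(3530, 2800, 18110, 6180, 5550, 19770, 4800, 910, 8120, 620)
)

# Three steps: split by player, apply mean, combine with unlist()
three_step <- unlist(lapply(with(frogger_scores, split(score, player)), mean))

# One step: tapply() splits, applies and combines at once
one_step <- with(frogger_scores, tapply(score, player, mean))

print(three_step)
# Same values and same names (Dick, Harry, Tom) either way
print(isTRUE(all.equal(unname(three_step), as.vector(one_step))))  # TRUE
print(identical(names(three_step), names(one_step)))               # TRUE
```

tapply() returns a one-dimensional array rather than a plain named vector, which is why the comparison strips attributes before checking the values.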

link1: http://rfunction.com/archives/2182

link2: https://www.r-bloggers.com/friday-function-triple-bill-with-vs-within-vs-transform/
