A simple comparison of with(), within(), and transform()
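When you first learn about data.frame in R, you will notice that building a single new column requires typing the name of the data frame several times, as shown below: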
library(MASS)
anorexia$wtDiff <- anorexia$Postwt - anorexia$Prewt # the data frame name is typed three times
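In fact, whenever you see the same code repeated over and over, you should ask whether it needs to be rewritten and whether something more concise could replace it. So when this repetition starts to annoy you, stumbling upon attach() feels like a blessing. After using attach(), however, you will find that its side effects are a real headache: you often end up spending more time debugging.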
attach(anorexia)
anorexia$wtDiff <- Postwt - Prew # typo in the variable name (should be Prewt)
detach(anorexia)
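In the snippet above, the typo makes the second line fail. When the code is run as a unit (for example with source()), execution stops there, so detach() is never called. After fixing the typo and rerunning the code, anorexia appears on the search path twice. The problem is that detach() runs only once, so one copy of anorexia remains on the search path. This can easily lead to variables being masked or stale data being picked up later, and such problems are sometimes hard to spot.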
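The double-attach problem can be reproduced on a toy data frame. The following is a minimal sketch; `toy` and its columns are hypothetical names used only for illustration and are not part of the original example.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(a = 1:3, b = 4:6)

attach(toy)
attach(toy)   # second attach, as happens when a failed script is rerun;
              # R prints a message that a and b are masked

# Count how many copies of toy are on the search path
n_attached <- sum(search() == "toy")

detach(toy)   # a single detach() removes only one copy
n_left <- sum(search() == "toy")

detach(toy)   # clean up the remaining copy
n_final <- sum(search() == "toy")

c(n_attached, n_left, n_final)
```

Checking search() after a failed run is a quick way to see whether a stale copy is still attached.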
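To avoid these problems, this article focuses on with(), within(), and transform(). All three make it convenient to work with a data frame, for example to add a column to it or to overwrite an existing one.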
# A single change to the data frame: the with() version is relatively concise
anorexia$wtDiff <- with(anorexia, Postwt - Prewt)
anorexia <- within(anorexia, wtDiff2 <- Postwt - Prewt)
anorexia <- transform(anorexia, wtDiff3 = Postwt - Prewt)
# Several changes to the data frame: the with() version becomes verbose,
# so within() and transform() are recommended
fahrenheit_to_celcius <- function(f) (f - 32) / 1.8
airquality[c("cTemp", "logOzone", "MonthName")] <- with(airquality, list(
  fahrenheit_to_celcius(Temp),
  log(Ozone),
  month.abb[Month]
))
airquality <- within(airquality,
{
  cTemp2 <- fahrenheit_to_celcius(Temp)
  logOzone2 <- log(Ozone)
  MonthName2 <- month.abb[Month]
})
airquality <- transform(airquality,
  cTemp3 = fahrenheit_to_celcius(Temp),
  logOzone3 = log(Ozone),
  MonthName3 = month.abb[Month]
)
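As a quick sanity check, the three approaches can be compared on the same computed column. This is a small sketch on a copy of the built-in airquality data; the column names `d1`, `d2`, and `d3` and the expression `Temp - Wind` are arbitrary choices made for illustration.

```r
# Work on a copy so the built-in dataset stays untouched
aq_check <- airquality

# with(): returns the computed vector, which is then assigned by hand
aq_check$d1 <- with(aq_check, Temp - Wind)

# within(): returns a modified copy of the data frame
aq_check <- within(aq_check, d2 <- Temp - Wind)

# transform(): returns a modified copy, columns given as name = value
aq_check <- transform(aq_check, d3 = Temp - Wind)

# All three columns hold the same values
same_12 <- identical(aq_check$d1, aq_check$d2)
same_23 <- identical(aq_check$d2, aq_check$d3)
c(same_12, same_23)
```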
A simple comparison of with() and within()
Usage: evaluate an R expression using the items (variables) of a list or data frame.
with(data, expr, ...)
within(data, expr, ...)
# data: usually a list or a data frame. For with(), it may also be an environment, or an integer as in sys.call.
# expr: one or more expressions to evaluate using the contents of data. Multiple expressions must be wrapped in curly braces.
?sys.call # help page: Functions to Access the Function Call Stack
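The data argument does not have to be a data frame. The sketch below uses a plain list and an environment; the names `loan`, `settings`, `rate`, and `years` are hypothetical and chosen only for illustration.

```r
# A plain list works as data for with()
loan <- list(rate = 0.05, years = 10)
growth <- with(loan, (1 + rate)^years)

# An environment also works as data for with()
settings <- new.env()
assign("rate", 0.1, envir = settings)
doubled <- with(settings, rate * 2)

# within() has a method for lists too and returns a modified copy
loan2 <- within(loan, total <- (1 + rate)^years)
names(loan2)
```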
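with() is a generic function that evaluates an R expression in a local environment constructed from data. The caller's environment is the parent of that environment. This is useful for simplifying calls to modeling functions. (Note: if data is already an environment, it is used together with its existing parent.)
Note that assignments inside expr take effect only in the constructed environment, not in the user's workspace.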
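The point about the workspace can be illustrated with a short sketch. The data frame `scratch` and the variable `tmp_total` are hypothetical names; the example assumes no object called `tmp_total` already exists in the session.

```r
scratch <- data.frame(x = 1:3)

# tmp_total is created inside the environment built from scratch,
# so it disappears once with() returns
result <- with(scratch, {
  tmp_total <- sum(x)
  tmp_total * 2
})

leaked <- exists("tmp_total")   # was anything left in the workspace?
cols_after <- names(scratch)    # the data frame itself is unchanged

result
```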
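within() is similar to with(). The difference is that within() examines the environment after the expression has been evaluated, makes the corresponding changes to a copy of data, and returns the new object with those changes. within() can be used as an alternative to transform().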
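One practical difference between the two is worth a sketch: inside within(), a later expression can use a variable created by an earlier one, whereas transform() evaluates all of its arguments against the original data. The names `doubled_x` and `plus_one` are hypothetical, and the example assumes no object called `doubled_x` exists in the calling environment.

```r
chain <- data.frame(x = 1:3)

# within(): the second assignment can see the first
res_within <- within(chain, {
  doubled_x <- x * 2
  plus_one  <- doubled_x + 1
})

# transform(): doubled_x is not yet a column when plus_one is computed,
# so the call fails; the error message is captured instead of stopping
res_transform <- tryCatch(
  transform(chain, doubled_x = x * 2, plus_one = doubled_x + 1),
  error = function(e) conditionMessage(e)
)

res_within$plus_one
is.character(res_transform)
```

With transform(), the same result needs two successive calls, one per dependent column.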
Return values:
with(): the value of the evaluated R expression.
within(): the modified object.
> install.packages("openintro")
> library(openintro)
> data(marioKart)
> names(marioKart)
[1] "ID" "duration" "nBids" "cond"
[5] "startPr" "shipPr" "totalPr" "shipSp"
[9] "sellerRate" "stockPhoto" "wheels" "title"
> dim(marioKart)
[1] 143 12
>
> # Remove two outliers
> mk0 <- marioKart[marioKart$totalPr < 100,]
>
>
> # Create a plot
> with(mk0, {
+ boxplot(totalPr ~ wheels)
+ points(wheels+1.1, totalPr, col=4)
+ })
>
>
> # Remove a column
> mk2 <- within(mk0, rm(title))
> names(mk2)
[1] "ID" "duration" "nBids" "cond"
[5] "startPr" "shipPr" "totalPr" "shipSp"
[9] "sellerRate" "stockPhoto" "wheels"
>
>
> # Change values
> mk0$totalPr[50]
[1] 59.88
> mk0$startPr[25]
[1] 0.01
> mk3 <- within(mk0, { # Would not typically do...
+ # this is just an example
+ totalPr[50] <- 88.59
+ startPr[25] <- 85.00
+ })
> mk3$totalPr[50]
[1] 88.59
> mk3$startPr[25]
[1] 85
>
>
> # Create a column
> mk4 <- within(mk0, endPrice <- totalPr - shipPr)
> all.equal(mk4$totalPr - mk4$shipPr, mk4$endPrice)
[1] TRUE
> names(mk4)
[1] "ID" "duration" "nBids" "cond"
[5] "startPr" "shipPr" "totalPr" "shipSp"
[9] "sellerRate" "stockPhoto" "wheels" "title"
[13] "endPrice"
Appendix:
with(data, expr, ...) --->> Evaluate an R expression in an environment constructed from data and return the value of the expression; data itself is not modified.
within(data, expr, ...) --->> Evaluate an R expression in an environment constructed from data, then return a modified copy of the original data.
transform(`_data`, ...) --->> transform is a generic function, which—at least currently—only does anything useful with data frames. transform.default converts its first argument to a data frame if possible and calls transform.data.frame.
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars$mpg[mtcars$cyl == 8 & mtcars$disp > 350]
# A more concise way to write it
with(mtcars, mpg[cyl == 8 & disp > 350])
# [1] 18.7 14.3 10.4 10.4 14.7 19.2 15.8
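The long form and the with() form select the same values, which can be checked directly. The sketch below also adds a subset() variant for comparison; that third variant is an addition for illustration and does not appear in the examples above.

```r
# Long form: the data frame name is repeated for every column
long_form <- mtcars$mpg[mtcars$cyl == 8 & mtcars$disp > 350]

# with() form: columns are referred to by name only
with_form <- with(mtcars, mpg[cyl == 8 & disp > 350])

# subset() form: filter the rows first, then pick the column
subset_form <- subset(mtcars, cyl == 8 & disp > 350)$mpg

identical(long_form, with_form)
identical(with_form, subset_form)
```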
# examples from glm:
with(data.frame(u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12)),
list(summary(glm(lot1 ~ log(u), family = Gamma)),
summary(glm(lot2 ~ log(u), family = Gamma))))
head(airquality)
# Ozone Solar.R Wind Temp Month Day
# 1 41 190 7.4 67 5 1
# 2 36 118 8.0 72 5 2
# 3 12 149 12.6 74 5 3
# 4 18 313 11.5 62 5 4
# 5 NA NA 14.3 56 5 5
# 6 28 NA 14.9 66 5 6
aq <- within(airquality, { # several variables can be changed at once
  lOzone <- log(Ozone)
  Month <- factor(month.abb[Month])
  cTemp <- round((Temp - 32) * 5/9, 1) # convert Fahrenheit to Celsius
  S.cT <- Solar.R / cTemp # uses the newly created variable
  rm(Day, Temp)
})
head(aq)
# Ozone Solar.R Wind Month S.cT cTemp lOzone
# 1 41 190 7.4 May 9.793814 19.4 3.713572
# 2 36 118 8.0 May 5.315315 22.2 3.583519
# 3 12 149 12.6 May 6.394850 23.3 2.484907
# 4 18 313 11.5 May 18.742515 16.7 2.890372
# 5 NA NA 14.3 May NA 13.3 NA
# 6 28 NA 14.9 May NA 18.9 3.332205
# example from boxplot:
head(ToothGrowth)
# len supp dose
# 1 4.2 VC 0.5
# 2 11.5 VC 0.5
# 3 7.3 VC 0.5
# 4 5.8 VC 0.5
# 5 6.4 VC 0.5
# 6 10.0 VC 0.5
with(ToothGrowth, {
boxplot(len ~ dose, boxwex = 0.25, at = 1:3 - 0.2,
subset = (supp == "VC"), col = "yellow",
main = "Guinea Pigs' Tooth Growth",
xlab = "Vitamin C dose mg",
ylab = "tooth length", ylim = c(0, 35))
boxplot(len ~ dose, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"),
fill = c("yellow", "orange"))
})
# An alternative form that avoids the subset argument:
with(subset(ToothGrowth, supp == "VC"),
boxplot(len ~ dose, boxwex = 0.25, at = 1:3 - 0.2,
col = "yellow", main = "Guinea Pigs' Tooth Growth",
xlab = "Vitamin C dose mg",
ylab = "tooth length", ylim = c(0, 35)))
with(subset(ToothGrowth, supp == "OJ"),
boxplot(len ~ dose, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2,
col = "orange"))
legend(2, 9, c("Ascorbic acid", "Orange juice"),
fill = c("yellow", "orange"))
A common use of with(): split-apply-combine
> (frogger_scores <- data.frame(
+ player = rep(c("Tom", "Dick", "Harry"), times = c(2, 5, 3)),
+ score = round(rlnorm(10, 8), -1)))
player score
1 Tom 4600
2 Tom 2810
3 Dick 16430
4 Dick 1860
5 Dick 1510
6 Dick 2150
7 Dick 1550
8 Harry 15040
9 Harry 1330
10 Harry 12590
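> # Compute each player's mean score
> ## Three-step approach: split-apply-combine
> ## (the scores are random and no seed is set, so the output below comes from a different run than the table above)
> ### First, split the data by player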
> (scores_by_player <- with(
+ frogger_scores,
+ split(score, player)))
$`Dick`
[1] 18110 6180 5550 19770 4800
$Harry
[1] 910 8120 620
$Tom
[1] 3530 2800
> ### Then, apply the mean() function to each element
> (list_of_means_by_player <- lapply(scores_by_player, mean))
$`Dick`
[1] 10882
$Harry
[1] 3216.667
$Tom
[1] 3165
> ### Finally, combine the results into a single vector
> (mean_by_player <- unlist(list_of_means_by_player))
Dick Harry Tom
10882.000 3216.667 3165.000
>
> ## Or use the one-step approach to split-apply-combine
> with(frogger_scores, tapply(score, player, mean))
Dick Harry Tom
10882.000 3216.667 3165.000
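The three-step and one-step approaches can be checked against each other in a reproducible way. The sketch below fixes a random seed (an assumption added here; the session above sets none) and uses a hypothetical data frame name, `scores_demo`.

```r
set.seed(1)  # assumption: a fixed seed so the run is reproducible

scores_demo <- data.frame(
  player = rep(c("Tom", "Dick", "Harry"), times = c(2, 5, 3)),
  score  = round(rlnorm(10, 8), -1)
)

# Three steps: split by player, apply mean(), combine with unlist()
by_player  <- with(scores_demo, split(score, player))
three_step <- unlist(lapply(by_player, mean))

# One step: tapply() splits, applies, and combines in a single call
one_step <- with(scores_demo, tapply(score, player, mean))

# Both approaches give the same means in the same player order
same_values <- isTRUE(all.equal(unname(three_step), as.vector(one_step)))
same_names  <- identical(names(three_step), names(one_step))
c(same_values, same_names)
```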
link1: http://rfunction.com/archives/2182
link2: https://www.r-bloggers.com/friday-function-triple-bill-with-vs-within-vs-transform/