R - Lag doesn't see the effects of mutate on previous rows
I seem to have stumbled upon a mutate/lag/ifelse behaviour that I cannot explain. I have the following (simplified) data frame:
    test <- data.frame(type = c("start", "end", "start", "start", "start", "start", "start", "end"),
                       stringsAsFactors = FALSE)
    > test
       type
    1 start
    2   end
    3 start
    4 start
    5 start
    6 start
    7 start
    8   end

I would like to modify the column type so that it contains a sequence of alternating start, end pairs (note that in the test data frame runs of start are possible, while end is never repeated):
    > desired
       type
    1 start
    2   end
    3 start
    4   end
    5 start
    6   end
    7 start
    8   end

I thought I could achieve my goal with the following code:
    library(dplyr)

    test %>%
      mutate(type = ifelse(type == "start" &
                             dplyr::lag(type, n = 1, default = "end") == "start" &
                             dplyr::lead(type, n = 1, default = "end") == "start",
                           "end", type))

The code should detect rows in which start is preceded by start and followed by start; in that case the type value is changed to end. After that change, the following start (row number 5 of test) should not be matched, since its previous type value is now end. Unfortunately, the output of the command is the following:
       type
    1 start
    2   end
    3 start
    4   end
    5   end
    6   end
    7 start
    8   end

It looks as if the value seen by lag is not affected by mutate. Is this how it is supposed to work? Is there a way to write the code so that lag sees the effects of mutate on the previous row?
Versions: R version 3.2.3 (2015-12-10), dplyr_0.4.3
Update: the reason why the above code doesn't work is explained by Paul Rougieux below: lead and lag are fixed and do not take further modifications into account. I guess the correct answer is "it cannot be done straightforwardly using dplyr".
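For illustration, here is a minimal base R sketch of the sequential behaviour the question is after: an explicit loop in which each row looks at the already updated value of the previous row. The helper name fix_sequential is hypothetical, and the input is the eight-row vector printed in the question; this is one possible approach under those assumptions, not something dplyr provides.

```r
# Hypothetical helper: walks the vector top to bottom so that each row
# sees the *updated* value of the previous row.
fix_sequential <- function(type, default = "end") {
  n <- length(type)
  out <- type
  for (i in seq_len(n)) {
    prev_type <- if (i > 1) out[i - 1] else default   # already modified value
    next_type <- if (i < n) type[i + 1] else default  # not yet visited, so original
    if (type[i] == "start" && prev_type == "start" && next_type == "start") {
      out[i] <- "end"
    }
  }
  out
}

type <- c("start", "end", "start", "start", "start", "start", "start", "end")
fix_sequential(type)
# [1] "start" "end"   "start" "end"   "start" "end"   "start" "end"
```

Because row 4 is rewritten to end before row 5 is inspected, row 5 no longer matches the condition, which is exactly what the vectorised ifelse() call cannot express.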
Defining the lag and lead variables separately in mutate() shows that your call to ifelse(type == "start" & lag == "start" & lead == "start", "end", type) is not going to work:
    library(dplyr)

    test <- data.frame(type = c("start", "end", "start", "start", "start", "start", "end"),
                       stringsAsFactors = FALSE)

    # Materialise lag and lead as columns to see what ifelse() actually compares
    test %>%
      mutate(lag = dplyr::lag(type, n = 1, default = "end"),
             lead = dplyr::lead(type, n = 1, default = "end"),
             type2 = ifelse(type == "start" & lag == "start" & lead == "start",
                            "end", type))
    #    type   lag  lead type2
    # 1 start   end   end start
    # 2   end start start   end
    # 3 start   end start start
    # 4 start start start   end
    # 5 start start start   end
    # 6 start start   end start
    # 7   end start   end   end

dplyr::mutate() modifies the vector as a whole. lead and lag are fixed and do not take further modifications of the type vector into account. You want the Reduce() function in this case. Check help(Reduce).
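A minimal sketch of how Reduce() could be used here, assuming the rule from the question (a start becomes end when the updated previous value and the original next value are both start). The function name alternate_with_reduce is hypothetical, and the example uses the eight-row vector printed in the question rather than the seven-row one above.

```r
# Hypothetical helper: Reduce() with accumulate = TRUE carries the updated
# previous value forward, so each step sees the result of the step before it.
alternate_with_reduce <- function(type, default = "end") {
  n <- length(type)
  lead_type <- c(type[-1], default)  # original next value, default after the last row

  # prev_out: updated value of the previous row; i: index of the current row
  step <- function(prev_out, i) {
    if (type[i] == "start" && prev_out == "start" && lead_type[i] == "start") {
      "end"
    } else {
      type[i]
    }
  }

  out <- Reduce(step, seq_len(n), init = default, accumulate = TRUE)
  out[-1]  # drop the initial value that accumulate = TRUE places first
}

type <- c("start", "end", "start", "start", "start", "start", "start", "end")
alternate_with_reduce(type)
# [1] "start" "end"   "start" "end"   "start" "end"   "start" "end"
```

The result can be written back with test$type <- alternate_with_reduce(test$type), or passed through mutate(type = alternate_with_reduce(type)), since the helper takes and returns a plain character vector.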