How can I avoid complex for loops?

I am currently working with a series of large datasets and I'm trying to improve how I write scripts in R. I mostly use for loops, which I know can be cumbersome and slow, especially with very large datasets.
I have heard a lot of people recommending the apply() family to avoid complex for loops, but I am struggling to get my head around using them to apply multiple functions in one go.
Here is some simple example data:
A <- data.frame('Area' = c(4, 6, 5),
                'flow' = c(1, 1, 1))
B <- data.frame('Area' = c(6, 8, 4),
                'flow' = c(1, 2, 1))
files <- list(A, B)
frames <- list('A', 'B')
What I want to do is sort the data by the 'flow' variable, then add columns for the portion of total 'flow' and 'area' each data point represents, before finally adding a further two columns of the cumulative percentage of each variable.
Currently I use this for loop:
sort_files <- list()
n <- 1
for(i in files) {
  name <- frames[n]
  nom <- paste(name,'_sorted', sep = '')
  data <- i[order(-i$flow),]
  area <- sum(i$Area)
  total <- sum(i$flow)
  data$area_portion <- (data$Area/area)*100
  data$flow_portion <- (data$flow/total)*100
  data$cum_area <- cumsum(data$area_portion)
  data$cum_flow <- cumsum(data$flow_portion)
  assign(nom, data)
  df <- get(paste(name,'_sorted', sep = ''))
  sort_files[[nom]] <- df
  n <- n + 1
}
Which works, but seems overly complex and ugly, and I'm sure it will run far slower than better scripts.
How can I simplify and neaten up the above code?
This is the expected output:
sort_files
$A_sorted
  Area flow area_portion flow_portion  cum_area  cum_flow
1    4    1     26.66667     33.33333  26.66667  33.33333
2    6    1     40.00000     33.33333  66.66667  66.66667
3    5    1     33.33333     33.33333 100.00000 100.00000
$B_sorted
  Area flow area_portion flow_portion  cum_area cum_flow
2    8    2     44.44444           50  44.44444       50
1    6    1     33.33333           25  77.77778       75
3    4    1     22.22222           25 100.00000      100
r for-loop
2
You didn't define pos_files in the script above. Also names is a function, so you'd better not define an object with that name.
– markus
Jan 31 at 8:59
1
av_portion is also missing, although I understand it's the mean of Area. files is also an R function.
– patL
Jan 31 at 9:03
2
@tom91: can you add the expected output too?
– Tung
Jan 31 at 9:09
@markus and patL Sorry! I just realised I copied over the script with the actual variable names and not the test one. I have updated it now.
– tom91
Jan 31 at 9:12
@Tung Expected output has been added to the bottom
– tom91
Jan 31 at 9:14
asked Jan 31 at 8:55 by tom91, edited Jan 31 at 11:59 by double-beep
2 Answers
Using lapply to loop over files and dplyr mutate to add new columns
library(dplyr)
setNames(lapply(files, function(x)
  x %>%
    arrange(desc(flow)) %>%
    mutate(area_portion = Area / sum(Area) * 100,
           flow_portion = flow / sum(flow) * 100,
           cum_area = cumsum(area_portion),
           cum_flow = cumsum(flow_portion))
), paste0(frames, "_sorted"))
#$A_sorted
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000
#$B_sorted
#  Area flow area_portion flow_portion  cum_area cum_flow
#1    8    2     44.44444           50  44.44444       50
#2    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100
Or, going the tidyverse way completely, we can replace lapply with map and setNames with set_names:
library(tidyverse)
map(set_names(files, str_c(frames, "_sorted")),
    . %>% arrange(desc(flow)) %>%
      mutate(area_portion = Area / sum(Area) * 100,
             flow_portion = flow / sum(flow) * 100,
             cum_area = cumsum(area_portion),
             cum_flow = cumsum(flow_portion)))
Updated the tidyverse approach following some pointers from @Moody_Mudskipper.
This is excellent, and exactly the kind of thing I was after. Out of interest what are the benefits of going the tidyverse route?
– tom91
Jan 31 at 9:34
1
@tom91 In this case not much benefit, I would say. But some people find tidyverse more readable and easier to understand.
– Ronak Shah
Jan 31 at 9:37
Some very minor points, forgive me for scratching that itch: (1) if you really want to go full tidyverse you can use str_c (it's almost the same but has a few differences: stackoverflow.com/questions/53118271/… ). (2) you don't need to unlist frames. (3) To avoid these embedded parentheses over several lines you could put the set_names after a pipe at the end OR (and this is what I'd do) rename files instead so you get the naming done ASAP. (4) function(x) x %>% can be replaced by a functional chain . %>%.
– Moody_Mudskipper
Jan 31 at 12:34
You would end up with something starting with map(set_names(files, str_c(frames, "_sorted")), . %>% arrange(...
– Moody_Mudskipper
Jan 31 at 12:35
1
@Moody_Mudskipper cool..Thanks. Updated the answer. Hope I did cover all the points you mentioned and in the right way :)
– Ronak Shah
Jan 31 at 13:04
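Point (3) in the comments above, naming the list before iterating, also works without any packages, because lapply keeps the names of its input. A minimal base-R sketch, assuming the question's files and frames objects; add_portions is a hypothetical helper name used only for illustration, not something from the answer:

```r
# Example data from the question
A <- data.frame(Area = c(4, 6, 5), flow = c(1, 1, 1))
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))
files  <- list(A, B)
frames <- list("A", "B")

# Name the list up front; lapply() carries the names through to its result
names(files) <- paste0(unlist(frames), "_sorted")

# Hypothetical helper: sort by flow (descending), then add the four columns
add_portions <- function(d) {
  d <- d[order(d$flow, decreasing = TRUE), ]
  d$area_portion <- d$Area / sum(d$Area) * 100
  d$flow_portion <- d$flow / sum(d$flow) * 100
  d$cum_area <- cumsum(d$area_portion)
  d$cum_flow <- cumsum(d$flow_portion)
  d
}

sort_files <- lapply(files, add_portions)
names(sort_files)
```

With the names attached first, no setNames wrapper is needed around the lapply call.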
You could also define a function first ..
f <- function(data) {
  # sort data by flow (descending)
  data <- data[order(data[['flow']], decreasing = TRUE), ]
  # apply your functions
  data["area_portion"] <- data['Area'] / sum(data['Area']) * 100
  data["flow_portion"] <- data['flow'] / sum(data['flow']) * 100
  data["cum_area"] <- cumsum(data['area_portion'])
  data["cum_flow"] <- cumsum(data['flow_portion'])
  data
}
.. and use lapply to, ahhm, apply f to your list
out <- lapply(files, f)
out
#[[1]]
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000
#[[2]]
#  Area flow area_portion flow_portion  cum_area cum_flow
#2    8    2     44.44444           50  44.44444       50
#1    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100
If you want to change the names of out you can use setNames
out <- setNames(lapply(files, f), paste0(c("A", "B"), "_sorted"))
# or
# out <- setNames(lapply(files, f), paste0(unlist(frames), "_sorted"))
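On the speed concern raised in the question: lapply still iterates over the list element by element, so the gain from this approach is likely to be readability more than raw speed. A minimal sketch, assuming f is defined as above (rewritten here with $ column access so the block is self-contained), showing that a plain preallocated for loop and lapply produce the same result:

```r
# Example data from the question
A <- data.frame(Area = c(4, 6, 5), flow = c(1, 1, 1))
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))
files <- list(A, B)

# Same steps as f in the answer, using $ to access columns
f <- function(data) {
  data <- data[order(data$flow, decreasing = TRUE), ]
  data$area_portion <- data$Area / sum(data$Area) * 100
  data$flow_portion <- data$flow / sum(data$flow) * 100
  data$cum_area <- cumsum(data$area_portion)
  data$cum_flow <- cumsum(data$flow_portion)
  data
}

# The loop reduced to its essentials: preallocate, then fill by index
loop_out <- vector("list", length(files))
for (i in seq_along(files)) {
  loop_out[[i]] <- f(files[[i]])
}

# The lapply version
apply_out <- lapply(files, f)

# Both approaches should give the same list of data frames
identical(loop_out, apply_out)
```

Either way, moving the per-data-frame work into one function removes the need for the counter n and the assign()/get() pair in the original loop.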
2
Create a function, of course! I should have thought of that, far simpler than a complex for loop! Thanks!
– tom91
Jan 31 at 9:36
First answer (lapply with dplyr): answered Jan 31 at 9:21 by Ronak Shah, edited Jan 31 at 13:03
Second answer (helper function f with lapply): answered Jan 31 at 9:25 by markus, edited Jan 31 at 9:56