How can I avoid complex for loops?

I am currently working with a series of large datasets and I'm trying to improve how I write scripts in R. I tend to rely mostly on for loops, which I know can be cumbersome and slow, especially with very large datasets.

I have heard a lot of people recommending the apply() family to avoid complex for loops, but I am struggling to get my head around using them to apply multiple functions in one go.

Here is some simple example data:

A <- data.frame('Area' = c(4, 6, 5),
 'flow' = c(1, 1, 1))
B <- data.frame('Area' = c(6, 8, 4),
 'flow' = c(1, 2, 1))
files <- list(A, B)
frames <- list('A', 'B')

What I want to do is sort the data by the 'flow' variable, then add columns for the portion of total 'flow' and 'area' each data point represents, before finally adding a further two columns of the cumulative percentage of each variable.

Currently I use this for loop:

sort_files <- list()
n <- 1
for(i in files) {
  name <- frames[n]
  nom <- paste(name,'_sorted', sep = '')
  data <- i[order(-i$flow),]
  area <- sum(i$Area)
  total <- sum(i$flow)
  data$area_portion <- (data$Area/area)*100
  data$flow_portion <- (data$flow/total)*100
  data$cum_area <- cumsum(data$area_portion)
  data$cum_flow <- cumsum(data$flow_portion)
  assign(nom, data)
  df <- get(paste(name,'_sorted', sep = ''))
  sort_files[[nom]] <- df
  n <- n + 1
}

Which works, but seems overly complex and ugly, and I'm sure it will run far slower than better scripts.

How can I simplify and neaten up the above code?

This is the expected output:

sort_files

$`A_sorted`
  Area flow area_portion flow_portion  cum_area  cum_flow
1    4    1     26.66667     33.33333  26.66667  33.33333
2    6    1     40.00000     33.33333  66.66667  66.66667
3    5    1     33.33333     33.33333 100.00000 100.00000

$B_sorted
  Area flow area_portion flow_portion  cum_area cum_flow
2    8    2     44.44444           50  44.44444       50
1    6    1     33.33333           25  77.77778       75
3    4    1     22.22222           25 100.00000      100

edited Jan 31 at 11:59 by double-beep

asked Jan 31 at 8:55 by tom91

You didn't define pos_files in the script above. Also, names is a function, so you'd better not define an object with that name.

– markus
Jan 31 at 8:59

av_portion is also missing, although I understand it's the mean of Area. files is also an R function.

– patL
Jan 31 at 9:03

@tom91: can you add the expected output too?

– Tung
Jan 31 at 9:09

@markus and patL Sorry! I just realised I copied over the script with the actual variable names and not the test ones. I have updated it now.

– tom91
Jan 31 at 9:12

@Tung Expected output has been added to the bottom

– tom91
Jan 31 at 9:14

r for-loop

2 Answers

Use lapply to loop over files and dplyr's mutate to add the new columns:

library(dplyr)

setNames(lapply(files, function(x) 
 x %>%
 arrange(desc(flow)) %>%
 mutate(area_portion = Area/sum(Area)*100, 
 flow_portion = flow/sum(flow) * 100, 
 cum_area = cumsum(area_portion),
 cum_flow = cumsum(flow_portion))
),paste0(frames, "_sorted"))


#$A_sorted
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000

#$B_sorted
#  Area flow area_portion flow_portion  cum_area cum_flow
#1    8    2     44.44444           50  44.44444       50
#2    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100

Or, going fully the tidyverse way, we can replace lapply with map and setNames with set_names:

library(tidyverse)

map(set_names(files, str_c(frames, "_sorted")), 
 . %>% arrange(desc(flow)) %>%
 mutate(area_portion = Area/sum(Area)*100, 
 flow_portion = flow/sum(flow) * 100, 
 cum_area = cumsum(area_portion),
 cum_flow = cumsum(flow_portion)))

Updated the tidyverse approach following some pointers from @Moody_Mudskipper.
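
For readers without the tidyverse installed, the same "name the input first, then map over it" idea can be sketched in base R. This is only an illustration on the example data from the question; add_portions is a hypothetical helper name that appears in neither answer.

```r
# Example data from the question
A <- data.frame(Area = c(4, 6, 5), flow = c(1, 1, 1))
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))
files <- list(A, B)
frames <- list("A", "B")

# Hypothetical helper (not from the answers): sort by flow, largest first,
# then add percentage and cumulative-percentage columns
add_portions <- function(d) {
  d <- d[order(d$flow, decreasing = TRUE), ]
  d$area_portion <- d$Area / sum(d$Area) * 100
  d$flow_portion <- d$flow / sum(d$flow) * 100
  d$cum_area <- cumsum(d$area_portion)
  d$cum_flow <- cumsum(d$flow_portion)
  d
}

# Name the input once; lapply() carries the names over to its result
names(files) <- paste0(unlist(frames), "_sorted")
sorted <- lapply(files, add_portions)

names(sorted)
# "A_sorted" "B_sorted"
```

Because lapply keeps the names of its input, no setNames wrapper is needed around the call.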

answered Jan 31 at 9:21 by Ronak Shah (edited Jan 31 at 13:03)

This is excellent, and exactly the kind of thing I was after. Out of interest, what are the benefits of going the tidyverse route?

– tom91
Jan 31 at 9:34

@tom91 In this case not much benefit, I would say. But some people find tidyverse code more readable and easier to understand.

– Ronak Shah
Jan 31 at 9:37

Some very minor points, forgive me for scratching that itch: (1) if you really want to go full tidyverse you can use str_c (it's almost the same but has a few differences: stackoverflow.com/questions/53118271/… ). (2) You don't need to unlist frames. (3) To avoid these embedded parentheses spread over several lines, you could put the set_names after a pipe at the end OR (and this is what I'd do) rename files first so you get the naming done ASAP. (4) function(x) x %>% can be replaced by a functional chain, . %>%.

– Moody_Mudskipper
Jan 31 at 12:34

You would end up with something starting with map(set_names(files, str_c(frames, "_sorted")), . %>% arrange(...

– Moody_Mudskipper
Jan 31 at 12:35

@Moody_Mudskipper Cool, thanks. Updated the answer. I hope I covered all the points you mentioned, and in the right way :)

– Ronak Shah
Jan 31 at 13:04


You could also define a function first ..

f <- function(data) {

  # sort data by flow, largest first
  data <- data[order(data[["flow"]], decreasing = TRUE), ]

  # add each row's share of the totals, then the running totals
  data[["area_portion"]] <- data[["Area"]] / sum(data[["Area"]]) * 100
  data[["flow_portion"]] <- data[["flow"]] / sum(data[["flow"]]) * 100
  data[["cum_area"]] <- cumsum(data[["area_portion"]])
  data[["cum_flow"]] <- cumsum(data[["flow_portion"]])
  data
}

.. and use lapply to, ahhm, apply f to your list

out <- lapply(files, f)
out
#[[1]]
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000

#[[2]]
#  Area flow area_portion flow_portion  cum_area cum_flow
#2    8    2     44.44444           50  44.44444       50
#1    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100

If you want to change the names of out you can use setNames

out <- setNames(lapply(files, f), paste0(c("A", "B"), "_sorted"))
# or
# out <- setNames(lapply(files, f), paste0(unlist(frames), "_sorted"))
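
As a small illustration of why both naming variants give the same names here (a sketch that assumes frames is the list of single strings from the question):

```r
frames <- list("A", "B")

# paste0() coerces its arguments with as.character(), which turns a list of
# single strings into a plain character vector
from_list   <- paste0(frames, "_sorted")
from_vector <- paste0(unlist(frames), "_sorted")

identical(from_list, from_vector)
# TRUE

# Caveat: a list element holding more than one value is deparsed into a
# single string rather than expanded, so unlist() is the safer habit
paste0(list(c("A", "B")), "_sorted")
```

If the labels might ever hold more than one value per element, calling unlist() first keeps the result predictable.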

answered Jan 31 at 9:25 by markus (edited Jan 31 at 9:56)

Create a function, of course! I should have thought of that, far simpler than a complex for loop! Thanks!

– tom91
Jan 31 at 9:36
