Speed up Flatten[] of a large nested list

I have a large jagged list, that is, each sub-list has a different length. I would like to Flatten this list for Histogram purposes, but it takes an inordinate amount of time and memory.

jaggedList=Table[RandomReal[1,RandomSample[Range[400000,800000],1]],{n,100}];

Just to illustrate, here are the lengths of the elements of the main list:

ListPlot[Length/@jaggedList]

[Figure: list lengths]

A full Flatten takes a long time. My real data is several times larger, so it gets painfully slow:

fullFlatten=Flatten@jaggedList;//AbsoluteTiming
{10.0055,Null}

I noticed that flattening each (non-jagged) sub-list on its own is not a problem:

partialFlatten=Flatten/@jaggedList;//AbsoluteTiming
{0.289219,Null}

Memory usage of the fully flattened result is huge, even though the number of elements is the same:

ByteCount/@{fullFlatten,partialFlatten,jaggedList}
{1460378864,486808224,486808224}

I would really appreciate any tips on what I can change to make this faster and more memory-compact!

asked 1 hour ago

Anatoly

306

list-manipulation

2 Answers

accepted

Applying Join is much faster than Flatten:

SeedRandom[1]
jaggedList = Table[RandomReal[1, RandomSample[Range[400000, 800000], 1]], {n, 100}];

fullFlatten = Flatten@jaggedList; // AbsoluteTiming // First

8.2375848

fullFlatten2 = Join @@ jaggedList; // AbsoluteTiming // First

0.29729

fullFlatten2 == fullFlatten

True

ByteCount /@ {fullFlatten, fullFlatten2, jaggedList}

{1462957016, 487652456, 487667608}
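
As a rough sanity check on those byte counts (not part of the original answer): the ratio 1462957016 / 487652456 is almost exactly 3, which is consistent with a packed array of machine reals costing about 8 bytes per element and an unpacked list costing about 24 on a 64-bit system. The sketch below uses a much smaller stand-in list (the names small, joined, and unpacked are illustrative) to compare the two representations directly and to show that Developer`ToPackedArray can restore the compact form:

```wolfram
(* Small stand-in for jaggedList: 100 sub-lists of varying length *)
SeedRandom[1];
small = Table[RandomReal[1, RandomInteger[{400, 800}]], {100}];

joined = Join @@ small;                         (* packed result *)
unpacked = Developer`FromPackedArray[joined];   (* force the unpacked form *)

(* Bytes per element for the packed and the unpacked representation *)
N[(ByteCount /@ {joined, unpacked})/Length[joined]]

(* Repacking an unpacked list of machine reals restores the compact form *)
repacked = Developer`ToPackedArray[unpacked];
Developer`PackedArrayQ[repacked]
```

If some other step in a pipeline hands back an unpacked list of machine reals, Developer`ToPackedArray is one way to recover the memory before passing the data on.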

answered 1 hour ago

kglr

167k

Super appreciate it, that's exactly what I needed! Now just to speed up the Histogram!
– Anatoly
1 hour ago

The difference between using Flatten and using Join, as in @kglr's answer, is that Flatten unpacks the packed sub-arrays. Here is a smaller example:

SeedRandom[1]
list = Table[RandomReal[1, RandomSample[2;;5, 1]], 3]

{{0.269558, 0.445678, 0.158104, 0.751213, 0.965444}, {0.0518202, 0.675946,
  0.698472}, {0.344389, 0.830322, 0.556863}}

Turn on packing messages:

On["Packing"]

Then, using Flatten:

Flatten[list]

Developer`FromPackedArray::unpack: Unpacking array in call to HoldForm.

Developer`FromPackedArray::punpack: Unpacking array with dimensions {5} in call to Flatten.

Developer`FromPackedArray::unpack: Unpacking array in call to HoldForm.

Developer`FromPackedArray::punpack: Unpacking array with dimensions {3} in call to Flatten.

Developer`FromPackedArray::unpack: Unpacking array in call to HoldForm.

General::stop: Further output of Developer`FromPackedArray::unpack will be suppressed during this calculation.

Developer`FromPackedArray::punpack: Unpacking array with dimensions {3} in call to Flatten.

General::stop: Further output of Developer`FromPackedArray::punpack will be suppressed during this calculation.

{0.269558, 0.445678, 0.158104, 0.751213, 0.965444, 0.0518202, 0.675946,
 0.698472, 0.344389, 0.830322, 0.556863}

and using Join:

Join @@ list

{0.269558, 0.445678, 0.158104, 0.751213, 0.965444, 0.0518202, 0.675946,
 0.698472, 0.344389, 0.830322, 0.556863}

As you can see, Join generates no unpacking messages: the sub-arrays stay packed, which is why it is much faster.
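
To confirm this without reading through the messages, one can test packedness directly. This is a minimal sketch, assuming sub-list lengths drawn with RandomInteger rather than the RandomSample call above, so the numbers will differ from the output shown:

```wolfram
(* Small jagged example, rebuilt so this snippet runs on its own *)
SeedRandom[1];
list = Table[RandomReal[1, RandomInteger[{2, 5}]], 3];

(* Each sub-list produced by RandomReal is a packed array of machine reals *)
Developer`PackedArrayQ /@ list

(* Join concatenates packed arrays of one type into a packed result *)
joined = Join @@ list;
Developer`PackedArrayQ[joined]

(* Switch the diagnostic messages off again when done *)
Off["Packing"]
```

Developer`PackedArrayQ returns True for a packed array. Running the same check on Flatten[list] shows whether your version of Flatten returns an unpacked result.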

answered 1 hour ago

Carl Woll

62.1k
