How to construct a list of lengths efficiently
Say I have a sorted list of integers
RandomInteger[{1, 100000}, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], {m, 10000}]
This is terribly inefficient, but for some reason I cannot come up with a better approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none. I am probably missing a key word...
list-manipulation performance-tuning filtering
16 votes · asked 22 hours ago by AccidentalFourierTransform · edited 11 hours ago by Alexey Popkov
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want? – Thies Heidecke, 10 hours ago
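A minimal sketch of the comment's suggestion, assuming the goal really is a table of cumulative counts (i.e. the empirical CDF scaled back to counts by the sample size):

s = Sort@RandomInteger[{1, 100000}, 10000];
(* CDF of an EmpiricalDistribution gives the fraction of entries <= m;
   multiplying by Length[s] turns fractions back into counts *)
counts = Length[s] CDF[EmpiricalDistribution[s], Range[10000]];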
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an
EmpiricalDistribution
or BinCounts
already can accomplish what you want?– Thies Heidecke
10 hours ago
add a comment |
4 Answers
Answer by Carl Woll (accepted, 16 votes; answered 22 hours ago, edited 19 hours ago)
You can use the usual UnitStep + Total tricks:
r1 = Table[Total[UnitStep[m - s]], {m, 10000}]; // AbsoluteTiming
r2 = Table[Length@Select[s, LessEqualThan[m]], {m, 10000}]; // AbsoluteTiming
r1 === r2
{0.435358, Null}
{41.4357, Null}
True
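The reason this counts correctly: UnitStep[m - s] is 1 exactly at the positions where s[[k]] <= m (note UnitStep[0] == 1), so Total simply counts those positions. A tiny check of that identity on made-up data:

With[{s = {3, 7, 7, 10}, m = 7},
 {UnitStep[m - s], Total[UnitStep[m - s]]}]
(* {{1, 1, 1, 0}, 3} *)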
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:
mincounts[s_] := With[
  {
   unique = DeleteDuplicates@Nearest[s -> "Element", s][[All, -1]],
   counts = Prepend[0]@DeleteDuplicates@Nearest[s -> "Index", s][[All, -1]]
   },
  With[{near = Nearest[unique -> "Index", Range@Length@s][[All, 1]]},
   counts[[1 + near - UnitStep[unique[[near]] - Range@Length@s - 1]]]
   ]
  ]
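As a quick sanity check (not part of the original answer), the Nearest-based version agrees with the direct UnitStep count on a small list with duplicates:

With[{t = {2, 2, 5}},
 mincounts[t] === Table[Total[UnitStep[m - t]], {m, Length[t]}]]
(* True *)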
Comparison:
SeedRandom[1];
s = RandomInteger[{1, 100000}, 10000] // Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m - s]], {m, 10000}]; // AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
    Join[
     Table[0, s[[1]] - 1],
     Table[Table[i, Differences[s][[i]]], {i, Length[Select[s, # <= 10000 &]]}]
     ]
    ][[;; 10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; // AbsoluteTiming
r1 === r2 === r3
{0.432897, Null}
{0.122198, Null}
{0.025923, Null}
True
Ah, great answer, as usual. – AccidentalFourierTransform, 22 hours ago
Can you please check my answer because my laptop is very slow. – J42161217, 20 hours ago
Answer by J42161217 (11 votes; answered 20 hours ago, edited 10 hours ago)
I think this is at least 3x faster than Mr. Carl Woll's answer. Can anybody compare my timing?
r3 = Flatten[Join[Table[0, s[[1]] - 1],
     Table[Table[i, Differences[s][[i]]], {i,
       Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; // AbsoluteTiming
{0.157123, Null}
Using MapThread, the same code is way faster:
r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
     MapThread[
      Table, {Range[t = Length[Select[s, # <= 10000 &]]],
       Differences[s][[1 ;; t]]}]]][[;; 10000]]; // AbsoluteTiming
r6===r3
{0.008387, Null}
True
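The speedup comes from MapThread pairing each value with its run length: Table[i, d] repeats i a total of d times, so each {value, run length} pair expands into one run. A small illustration of just that step:

MapThread[Table, {{1, 2, 3}, {2, 1, 3}}]
(* {{1, 1}, {2}, {3, 3, 3}} *)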
These are the timings on my laptop: r1 = {0.444354, Null}, r2 = {39.456, Null}, r3 = {0.157123, Null}, True – Okkes Dulgerci, 19 hours ago
Hey, thanks for checking. I will use your timing. Thanks for the confirmation. – J42161217, 19 hours ago
Answer by kglr (10 votes; answered 17 hours ago, edited 13 hours ago by Henrik Schumacher)
A BinCounts and Accumulate combination is faster than all the methods posted so far:
r4 = Accumulate@BinCounts[s, {1, 1 + 10000, 1}]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts, and J42161217's Differences-based method:
r5 = Accumulate[
    mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
    ]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[Table[0, s[[1]] - 1],
     Table[Table[i, Differences[s][[i]]], {i,
       Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
   RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
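Why this works: BinCounts[s, {1, 1 + 10000, 1}] tallies the entries in each unit bin [m, m + 1), i.e. the entries equal to m, and Accumulate then turns those per-value tallies into cumulative counts. A tiny check on made-up data:

Accumulate@BinCounts[{2, 2, 5}, {1, 6, 1}]
(* {0, 2, 2, 2, 3}: no entries <= 1, two entries <= 2, 3, or 4, three <= 5 *)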
Beat me to it - BinCounts is the way... +1 – ciao, 16 hours ago
Hey @ciao, you are back?!! – kglr, 16 hours ago
Sorry @Henrik; thanks for the edit. – kglr, 13 hours ago
Short and fast. No sorting needed. Excellent!! +1 – Okkes Dulgerci, 8 hours ago
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time. – ciao, 53 mins ago
Answer by Henrik Schumacher (3 votes; answered 17 hours ago, edited 24 mins ago)
s = Sort[RandomInteger[{1, 100000}, 10000]];
Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense), and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly. This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
 If[(Head[rules] === Rule) && (rules[[1]] === {}),
  rules[[2]],
  With[{spopt = SystemOptions["SparseArrayOptions"]},
   Internal`WithLocalSettings[
    SetSystemOptions[
     "SparseArrayOptions" -> {"TreatRepeatedEntries" -> f}],
    SparseArray[rules, dims, background],
    SetSystemOptions[spopt]]
   ]
  ]
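The point of the "TreatRepeatedEntries" -> Total setting is that coincident positions are summed rather than overwritten, so duplicate values in s produce jumps of height greater than 1. A small illustration on made-up input:

Normal@mySparseArray[{{2}, {2}, {4}} -> 1, 5, Total, 0]
(* {0, 2, 0, 1, 0} *)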
So, in total, r can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
     Join[Table[0, s[[1]] - 1],
      Table[Table[i, Differences[s][[i]]], {i,
        Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
   RepeatedTiming // First

0.116

r3 == r4

True