How to construct a list of lengths efficiently
Say I have a sorted list of integers
RandomInteger[{1, 100000}, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], {m, 10000}]
This is terribly inefficient, but for some reason I cannot come up with a better approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none. I am probably missing a key word...
list-manipulation performance-tuning filtering
16 votes · asked 22 hours ago by AccidentalFourierTransform · edited 11 hours ago by Alexey Popkov
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want? – Thies Heidecke, 10 hours ago
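A minimal sketch of the comment's suggestion, assuming the goal really is a table of cumulative counts (i.e. the empirical CDF scaled back to counts by the sample size):

s = Sort@RandomInteger[{1, 100000}, 10000];
(* CDF of an EmpiricalDistribution gives the fraction of entries <= m;
   multiplying by Length[s] turns fractions back into counts *)
counts = Length[s] CDF[EmpiricalDistribution[s], Range[10000]];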
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an
EmpiricalDistribution
or BinCounts
already can accomplish what you want?– Thies Heidecke
10 hours ago
add a comment |
4 Answers
Answer by Carl Woll (accepted, 16 votes; answered 22 hours ago, edited 19 hours ago)
You can use the usual UnitStep + Total tricks:
r1 = Table[Total[UnitStep[m - s]], {m, 10000}]; // AbsoluteTiming
r2 = Table[Length@Select[s, LessEqualThan[m]], {m, 10000}]; // AbsoluteTiming
r1 === r2
{0.435358, Null}
{41.4357, Null}
True
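The reason this counts correctly: UnitStep[m - s] is 1 exactly at the positions where s[[k]] <= m (note UnitStep[0] == 1), so Total simply counts those positions. A tiny check of that identity on made-up data:

With[{s = {3, 7, 7, 10}, m = 7},
 {UnitStep[m - s], Total[UnitStep[m - s]]}]
(* {{1, 1, 1, 0}, 3} *)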
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:
mincounts[s_] := With[
  {
   unique = DeleteDuplicates@Nearest[s -> "Element", s][[All, -1]],
   counts = Prepend[0]@DeleteDuplicates@Nearest[s -> "Index", s][[All, -1]]
   },
  With[{near = Nearest[unique -> "Index", Range@Length@s][[All, 1]]},
   counts[[1 + near - UnitStep[unique[[near]] - Range@Length@s - 1]]]
   ]
  ]
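As a quick sanity check (not part of the original answer), the Nearest-based version agrees with the direct UnitStep count on a small list with duplicates:

With[{t = {2, 2, 5}},
 mincounts[t] === Table[Total[UnitStep[m - t]], {m, Length[t]}]]
(* True *)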
Comparison:
SeedRandom[1];
s = RandomInteger[{1, 100000}, 10000] // Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m - s]], {m, 10000}]; // AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
    Join[
     Table[0, s[[1]] - 1],
     Table[Table[i, Differences[s][[i]]], {i, Length[Select[s, # <= 10000 &]]}]
     ]
    ][[;; 10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; // AbsoluteTiming
r1 === r2 === r3
{0.432897, Null}
{0.122198, Null}
{0.025923, Null}
True
Ah, great answer, as usual. – AccidentalFourierTransform, 22 hours ago
Can you please check my answer because my laptop is very slow. – J42161217, 20 hours ago
Answer by J42161217 (11 votes; answered 20 hours ago, edited 10 hours ago)
I think this is at least 3x faster than Mr. Carl Woll's answer. Can anybody compare my timing?
r3 = Flatten[Join[Table[0, s[[1]] - 1],
     Table[Table[i, Differences[s][[i]]], {i,
       Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; // AbsoluteTiming
{0.157123, Null}
Using MapThread, the same code is way faster:
r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
     MapThread[
      Table, {Range[t = Length[Select[s, # <= 10000 &]]],
       Differences[s][[1 ;; t]]}]]][[;; 10000]]; // AbsoluteTiming
r6===r3
{0.008387, Null}
True
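The speedup comes from MapThread pairing each value with its run length: Table[i, d] repeats i a total of d times, so each {value, run length} pair expands into one run. A small illustration of just that step:

MapThread[Table, {{1, 2, 3}, {2, 1, 3}}]
(* {{1, 1}, {2}, {3, 3, 3}} *)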
These are the timings on my laptop: r1 = {0.444354, Null}, r2 = {39.456, Null}, r3 = {0.157123, Null}, True – Okkes Dulgerci, 19 hours ago
Hey, thanks for checking. I will use your timing. Thanks for the confirmation. – J42161217, 19 hours ago
Answer by kglr (10 votes; answered 17 hours ago, edited 13 hours ago by Henrik Schumacher)
A BinCounts and Accumulate combination is faster than all the methods posted so far:
r4 = Accumulate@BinCounts[s, {1, 1 + 10000, 1}]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts, and J42161217's Differences-based method:
r5 = Accumulate[
    mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
    ]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[Table[0, s[[1]] - 1],
     Table[Table[i, Differences[s][[i]]], {i,
       Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
   RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
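Why this works: BinCounts[s, {1, 1 + 10000, 1}] tallies the entries in each unit bin [m, m + 1), i.e. the entries equal to m, and Accumulate then turns those per-value tallies into cumulative counts. A tiny check on made-up data:

Accumulate@BinCounts[{2, 2, 5}, {1, 6, 1}]
(* {0, 2, 2, 2, 3}: no entries <= 1, two entries <= 2, 3, or 4, three <= 5 *)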
Beat me to it - BinCounts is the way... +1 – ciao, 16 hours ago
Hey @ciao, you are back?!! – kglr, 16 hours ago
Sorry @Henrik; thanks for the edit. – kglr, 13 hours ago
Short and fast. No sorting needed. Excellent!! +1 – Okkes Dulgerci, 8 hours ago
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time. – ciao, 53 mins ago
Answer by Henrik Schumacher (3 votes; answered 17 hours ago, edited 24 mins ago)
s = Sort[RandomInteger[{1, 100000}, 10000]];
Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense), and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly. This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
 If[(Head[rules] === Rule) && (rules[[1]] === {}),
  rules[[2]],
  With[{spopt = SystemOptions["SparseArrayOptions"]},
   Internal`WithLocalSettings[
    SetSystemOptions[
     "SparseArrayOptions" -> {"TreatRepeatedEntries" -> f}],
    SparseArray[rules, dims, background],
    SetSystemOptions[spopt]]
   ]
  ]
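The point of the "TreatRepeatedEntries" -> Total setting is that coincident positions are summed rather than overwritten, so duplicate values in s produce jumps of height greater than 1. A small illustration on made-up input:

Normal@mySparseArray[{{2}, {2}, {4}} -> 1, 5, Total, 0]
(* {0, 2, 0, 1, 0} *)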
So, in total, r can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
     Join[Table[0, s[[1]] - 1],
      Table[Table[i, Differences[s][[i]]], {i,
        Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
   RepeatedTiming // First

0.116

r3 == r4

True