How to construct a list of lengths efficiently

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
16
down vote

favorite
6












Say I have a sorted list of integers



RandomInteger[1, 100000, 10000] // Sort // Short


I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:



Table[Length@Select[%, LessEqualThan[m]], m, 10000]


This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...










share|improve this question



















  • 1




    What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
    – Thies Heidecke
    10 hours ago














up vote
16
down vote

favorite
6












Say I have a sorted list of integers



RandomInteger[1, 100000, 10000] // Sort // Short


I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:



Table[Length@Select[%, LessEqualThan[m]], m, 10000]


This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...










share|improve this question



















  • 1




    What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
    – Thies Heidecke
    10 hours ago












up vote
16
down vote

favorite
6









up vote
16
down vote

favorite
6






6





Say I have a sorted list of integers



RandomInteger[1, 100000, 10000] // Sort // Short


I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:



Table[Length@Select[%, LessEqualThan[m]], m, 10000]


This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...










share|improve this question















Say I have a sorted list of integers



RandomInteger[1, 100000, 10000] // Sort // Short


I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:



Table[Length@Select[%, LessEqualThan[m]], m, 10000]


This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...







list-manipulation performance-tuning filtering






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 11 hours ago









Alexey Popkov

38k4104260




38k4104260










asked 22 hours ago









AccidentalFourierTransform

4,9411941




4,9411941







  • 1




    What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
    – Thies Heidecke
    10 hours ago












  • 1




    What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
    – Thies Heidecke
    10 hours ago







1




1




What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
– Thies Heidecke
10 hours ago




What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an EmpiricalDistribution or BinCounts already can accomplish what you want?
– Thies Heidecke
10 hours ago










4 Answers
4






active

oldest

votes

















up vote
16
down vote



accepted










You can use the usual UnitStep + Total tricks:



r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

r1 === r2



0.435358, Null



41.4357, Null



True




Update



As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



mincounts[s_] := With[

unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
,
With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]


Comparison:



SeedRandom[1];
s=RandomInteger[1,100000,10000]//Sort;

(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming

r1 === r2 === r3



0.432897, Null



0.122198, Null



0.025923, Null



True







share|improve this answer






















  • Ah, great answer, as usual.
    – AccidentalFourierTransform
    22 hours ago










  • Can you please check my answer because my laptop is very slow
    – J42161217
    20 hours ago

















up vote
11
down vote













I think this is at least x3 faster than Mr. Carl Woll's answer

Can anybody compare my timing?



r3 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



0.157123, Null




Using MapThread the same code is way faster



r6 = Flatten[
Join[Table[0, s[[1]] - 1],
MapThread[
Table, Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

r6===r3



0.008387, Null



True







share|improve this answer


















  • 1




    These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
    – Okkes Dulgerci
    19 hours ago











  • Hey, thanks for checking. I will use your timing. Thanks for the confirmation
    – J42161217
    19 hours ago

















up vote
10
down vote













BinCounts and Accumulate combination is faster than all the methods posted so far:



r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



0.00069




versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First



 0.00081




r3 = mincounts[s]; // RepeatedTiming // First



0.016




r2 = Flatten[Join[Table[0, s[[1]] - 1], 
Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First



0.149




r2 == r3 == r4 == r5



True







share|improve this answer






















  • Beat me to it - BinCounts is the way... +1
    – ciao
    16 hours ago










  • Hey @ciao, you are back?!!
    – kglr
    16 hours ago










  • Sorry @Henrik; thanks for the edit.
    – kglr
    13 hours ago










  • Short and fast. No need sorting.. Excellent!!... +1
    – Okkes Dulgerci
    8 hours ago










  • @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
    – ciao
    53 mins ago

















up vote
3
down vote













s = Sort[RandomInteger[1, 100000, 10000]];


Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



This can be done with this function:



mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
If[(Head[rules] === Rule) && (rules[[1]] === ),
rules[[2]],
With[spopt = SystemOptions["SparseArrayOptions"],
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]


So, in total, r can be obtained as follows:



r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First



0.00055




For comparison:



r3 = Flatten[
Join[Table[0, s[[1]] - 1],
Table[Table[i, Differences[s][[i]]], i,
Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4



0.116



True







share|improve this answer






















    Your Answer





    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "387"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f185874%2fhow-to-construct-a-list-of-lengths-efficiently%23new-answer', 'question_page');

    );

    Post as a guest






























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    16
    down vote



    accepted










    You can use the usual UnitStep + Total tricks:



    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

    r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

    r1 === r2



    0.435358, Null



    41.4357, Null



    True




    Update



    As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



    mincounts[s_] := With[

    unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
    counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
    ,
    With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
    counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
    ]
    ]


    Comparison:



    SeedRandom[1];
    s=RandomInteger[1,100000,10000]//Sort;

    (* my first answer *)
    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
    (* J42161217's answer *)
    r2 = Flatten[
    Join[
    Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
    ]
    ][[;;10000]]; // AbsoluteTiming
    (* using Nearest *)
    r3 = mincounts[s]; //AbsoluteTiming

    r1 === r2 === r3



    0.432897, Null



    0.122198, Null



    0.025923, Null



    True







    share|improve this answer






















    • Ah, great answer, as usual.
      – AccidentalFourierTransform
      22 hours ago










    • Can you please check my answer because my laptop is very slow
      – J42161217
      20 hours ago














    up vote
    16
    down vote



    accepted










    You can use the usual UnitStep + Total tricks:



    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

    r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

    r1 === r2



    0.435358, Null



    41.4357, Null



    True




    Update



    As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



    mincounts[s_] := With[

    unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
    counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
    ,
    With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
    counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
    ]
    ]


    Comparison:



    SeedRandom[1];
    s=RandomInteger[1,100000,10000]//Sort;

    (* my first answer *)
    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
    (* J42161217's answer *)
    r2 = Flatten[
    Join[
    Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
    ]
    ][[;;10000]]; // AbsoluteTiming
    (* using Nearest *)
    r3 = mincounts[s]; //AbsoluteTiming

    r1 === r2 === r3



    0.432897, Null



    0.122198, Null



    0.025923, Null



    True







    share|improve this answer






















    • Ah, great answer, as usual.
      – AccidentalFourierTransform
      22 hours ago










    • Can you please check my answer because my laptop is very slow
      – J42161217
      20 hours ago












    up vote
    16
    down vote



    accepted







    up vote
    16
    down vote



    accepted






    You can use the usual UnitStep + Total tricks:



    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

    r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

    r1 === r2



    0.435358, Null



    41.4357, Null



    True




    Update



    As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



    mincounts[s_] := With[

    unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
    counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
    ,
    With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
    counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
    ]
    ]


    Comparison:



    SeedRandom[1];
    s=RandomInteger[1,100000,10000]//Sort;

    (* my first answer *)
    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
    (* J42161217's answer *)
    r2 = Flatten[
    Join[
    Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
    ]
    ][[;;10000]]; // AbsoluteTiming
    (* using Nearest *)
    r3 = mincounts[s]; //AbsoluteTiming

    r1 === r2 === r3



    0.432897, Null



    0.122198, Null



    0.025923, Null



    True







    share|improve this answer














    You can use the usual UnitStep + Total tricks:



    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming

    r2 = Table[Length@Select[s,LessEqualThan[m]],m,10000];//AbsoluteTiming

    r1 === r2



    0.435358, Null



    41.4357, Null



    True




    Update



    As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences. Here is a version that uses Nearest instead:



    mincounts[s_] := With[

    unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
    counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
    ,
    With[near = Nearest[unique->"Index", Range @ Length @ s][[All,1]],
    counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
    ]
    ]


    Comparison:



    SeedRandom[1];
    s=RandomInteger[1,100000,10000]//Sort;

    (* my first answer *)
    r1 = Table[Total[UnitStep[m-s]], m,10000]; //AbsoluteTiming
    (* J42161217's answer *)
    r2 = Flatten[
    Join[
    Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i, Length[Select[s, # <= 10000 &]]]
    ]
    ][[;;10000]]; // AbsoluteTiming
    (* using Nearest *)
    r3 = mincounts[s]; //AbsoluteTiming

    r1 === r2 === r3



    0.432897, Null



    0.122198, Null



    0.025923, Null



    True








    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 19 hours ago

























    answered 22 hours ago









    Carl Woll

    64.8k285169




    64.8k285169











    • Ah, great answer, as usual.
      – AccidentalFourierTransform
      22 hours ago










    • Can you please check my answer because my laptop is very slow
      – J42161217
      20 hours ago
















    • Ah, great answer, as usual.
      – AccidentalFourierTransform
      22 hours ago










    • Can you please check my answer because my laptop is very slow
      – J42161217
      20 hours ago















    Ah, great answer, as usual.
    – AccidentalFourierTransform
    22 hours ago




    Ah, great answer, as usual.
    – AccidentalFourierTransform
    22 hours ago












    Can you please check my answer because my laptop is very slow
    – J42161217
    20 hours ago




    Can you please check my answer because my laptop is very slow
    – J42161217
    20 hours ago










    up vote
    11
    down vote













    I think this is at least x3 faster than Mr. Carl Woll's answer

    Can anybody compare my timing?



    r3 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



    0.157123, Null




    Using MapThread the same code is way faster



    r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
    MapThread[
    Table, Range[t = Length[Select[s, # <= 10000 &]]],
    Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

    r6===r3



    0.008387, Null



    True







    share|improve this answer


















    • 1




      These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
      – Okkes Dulgerci
      19 hours ago











    • Hey, thanks for checking. I will use your timing. Thanks for the confirmation
      – J42161217
      19 hours ago














    up vote
    11
    down vote













    I think this is at least x3 faster than Mr. Carl Woll's answer

    Can anybody compare my timing?



    r3 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



    0.157123, Null




    Using MapThread the same code is way faster



    r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
    MapThread[
    Table, Range[t = Length[Select[s, # <= 10000 &]]],
    Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

    r6===r3



    0.008387, Null



    True







    share|improve this answer


















    • 1




      These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
      – Okkes Dulgerci
      19 hours ago











    • Hey, thanks for checking. I will use your timing. Thanks for the confirmation
      – J42161217
      19 hours ago












    up vote
    11
    down vote










    up vote
    11
    down vote









    I think this is at least x3 faster than Mr. Carl Woll's answer

    Can anybody compare my timing?



    r3 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



    0.157123, Null




    Using MapThread the same code is way faster



    r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
    MapThread[
    Table, Range[t = Length[Select[s, # <= 10000 &]]],
    Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

    r6===r3



    0.008387, Null



    True







    share|improve this answer














    I think this is at least x3 faster than Mr. Carl Woll's answer

    Can anybody compare my timing?



    r3 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;;10000]]; // AbsoluteTiming



    0.157123, Null




    Using MapThread the same code is way faster



    r6 = Flatten[
    Join[Table[0, s[[1]] - 1],
    MapThread[
    Table, Range[t = Length[Select[s, # <= 10000 &]]],
    Differences[s][[1 ;; t]]]]][[;; 10000]]; // AbsoluteTiming

    r6===r3



    0.008387, Null



    True








    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 10 hours ago

























    answered 20 hours ago









    J42161217

    2,873219




    2,873219







    • 1




      These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
      – Okkes Dulgerci
      19 hours ago











    • Hey, thanks for checking. I will use your timing. Thanks for the confirmation
      – J42161217
      19 hours ago












    • 1




      These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
      – Okkes Dulgerci
      19 hours ago











    • Hey, thanks for checking. I will use your timing. Thanks for the confirmation
      – J42161217
      19 hours ago







    1




    1




    These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
    – Okkes Dulgerci
    19 hours ago





    These are the timing on my laptop r1=0.444354, Null,r2=39.456, Null,r3=0.157123, Null True
    – Okkes Dulgerci
    19 hours ago













    Hey, thanks for checking. I will use your timing. Thanks for the confirmation
    – J42161217
    19 hours ago




    Hey, thanks for checking. I will use your timing. Thanks for the confirmation
    – J42161217
    19 hours ago










    up vote
    10
    down vote













    BinCounts and Accumulate combination is faster than all the methods posted so far:



    r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



    0.00069




    versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



    r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
    1 ;; Length[s]]]
    ]; // RepeatedTiming // First



     0.00081




    r3 = mincounts[s]; // RepeatedTiming // First



    0.016




    r2 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First



    0.149




    r2 == r3 == r4 == r5



    True







    share|improve this answer






















    • Beat me to it - BinCounts is the way... +1
      – ciao
      16 hours ago










    • Hey @ciao, you are back?!!
      – kglr
      16 hours ago










    • Sorry @Henrik; thanks for the edit.
      – kglr
      13 hours ago










    • Short and fast. No need sorting.. Excellent!!... +1
      – Okkes Dulgerci
      8 hours ago










    • @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
      – ciao
      53 mins ago














    up vote
    10
    down vote













    BinCounts and Accumulate combination is faster than all the methods posted so far:



    r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



    0.00069




    versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



    r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
    1 ;; Length[s]]]
    ]; // RepeatedTiming // First



     0.00081




    r3 = mincounts[s]; // RepeatedTiming // First



    0.016




    r2 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First



    0.149




    r2 == r3 == r4 == r5



    True







    share|improve this answer






















    • Beat me to it - BinCounts is the way... +1
      – ciao
      16 hours ago










    • Hey @ciao, you are back?!!
      – kglr
      16 hours ago










    • Sorry @Henrik; thanks for the edit.
      – kglr
      13 hours ago










    • Short and fast. No need sorting.. Excellent!!... +1
      – Okkes Dulgerci
      8 hours ago










    • @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
      – ciao
      53 mins ago












    up vote
    10
    down vote










    up vote
    10
    down vote









    BinCounts and Accumulate combination is faster than all the methods posted so far:



    r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



    0.00069




    versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



    r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
    1 ;; Length[s]]]
    ]; // RepeatedTiming // First



     0.00081




    r3 = mincounts[s]; // RepeatedTiming // First



    0.016




    r2 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First



    0.149




    r2 == r3 == r4 == r5



    True







    share|improve this answer














    BinCounts and Accumulate combination is faster than all the methods posted so far:



    r4 = Accumulate @ BinCounts[s, 1, 1 + 10000, 1]; // RepeatedTiming // First 



    0.00069




    versus Henrik Schumacher's mySparseArray, Carl Woll's mincounts and J42161217's Differences-based method:



    r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[
    1 ;; Length[s]]]
    ]; // RepeatedTiming // First



     0.00081




    r3 = mincounts[s]; // RepeatedTiming // First



    0.016




    r2 = Flatten[Join[Table[0, s[[1]] - 1], 
    Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First



    0.149




    r2 == r3 == r4 == r5



    True








    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 13 hours ago









    Henrik Schumacher

    44.8k265130




    44.8k265130










    answered 17 hours ago









    kglr

    170k8193398




    170k8193398











    • Beat me to it - BinCounts is the way... +1
      – ciao
      16 hours ago










    • Hey @ciao, you are back?!!
      – kglr
      16 hours ago










    • Sorry @Henrik; thanks for the edit.
      – kglr
      13 hours ago










    • Short and fast. No need sorting.. Excellent!!... +1
      – Okkes Dulgerci
      8 hours ago










    • @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
      – ciao
      53 mins ago
















    • Beat me to it - BinCounts is the way... +1
      – ciao
      16 hours ago










    • Hey @ciao, you are back?!!
      – kglr
      16 hours ago










    • Sorry @Henrik; thanks for the edit.
      – kglr
      13 hours ago










    • Short and fast. No need sorting.. Excellent!!... +1
      – Okkes Dulgerci
      8 hours ago










    • @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
      – ciao
      53 mins ago















    Beat me to it - BinCounts is the way... +1
    – ciao
    16 hours ago




    Beat me to it - BinCounts is the way... +1
    – ciao
    16 hours ago












    Hey @ciao, you are back?!!
    – kglr
    16 hours ago




    Hey @ciao, you are back?!!
    – kglr
    16 hours ago












    Sorry @Henrik; thanks for the edit.
    – kglr
    13 hours ago




    Sorry @Henrik; thanks for the edit.
    – kglr
    13 hours ago












    Short and fast. No need sorting.. Excellent!!... +1
    – Okkes Dulgerci
    8 hours ago




    Short and fast. No need sorting.. Excellent!!... +1
    – Okkes Dulgerci
    8 hours ago












    @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
    – ciao
    53 mins ago




    @kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
    – ciao
    53 mins ago










    up vote
    3
    down vote













    s = Sort[RandomInteger[1, 100000, 10000]];


    Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



    This can be done with this function:



    mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
    If[(Head[rules] === Rule) && (rules[[1]] === ),
    rules[[2]],
    With[spopt = SystemOptions["SparseArrayOptions"],
    Internal`WithLocalSettings[
    SetSystemOptions[
    "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
    SparseArray[rules, dims, background],
    SetSystemOptions[spopt]]
    ]
    ]


    So, in total, r can be obtained as follows:



    r4 = Accumulate[
    mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
    ]; // RepeatedTiming // First



    0.00055




    For comparison:



    r3 = Flatten[
    Join[Table[0, s[[1]] - 1],
    Table[Table[i, Differences[s][[i]]], i,
    Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
    RepeatedTiming // First
    r3 == r4



    0.116



    True







    share|improve this answer


























      up vote
      3
      down vote













      s = Sort[RandomInteger[1, 100000, 10000]];


      Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



      This can be done with this function:



      mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
      If[(Head[rules] === Rule) && (rules[[1]] === ),
      rules[[2]],
      With[spopt = SystemOptions["SparseArrayOptions"],
      Internal`WithLocalSettings[
      SetSystemOptions[
      "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
      SparseArray[rules, dims, background],
      SetSystemOptions[spopt]]
      ]
      ]


      So, in total, r can be obtained as follows:



      r4 = Accumulate[
      mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
      ]; // RepeatedTiming // First



      0.00055




      For comparison:



      r3 = Flatten[
      Join[Table[0, s[[1]] - 1],
      Table[Table[i, Differences[s][[i]]], i,
      Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
      RepeatedTiming // First
      r3 == r4



      0.116



      True







      share|improve this answer
























        up vote
        3
        down vote










        up vote
        3
        down vote









        s = Sort[RandomInteger[1, 100000, 10000]];


        Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



        This can be done with this function:



        mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
        If[(Head[rules] === Rule) && (rules[[1]] === ),
        rules[[2]],
        With[spopt = SystemOptions["SparseArrayOptions"],
        Internal`WithLocalSettings[
        SetSystemOptions[
        "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
        SparseArray[rules, dims, background],
        SetSystemOptions[spopt]]
        ]
        ]


        So, in total, r can be obtained as follows:



        r4 = Accumulate[
        mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
        ]; // RepeatedTiming // First



        0.00055




        For comparison:



        r3 = Flatten[
        Join[Table[0, s[[1]] - 1],
        Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
        RepeatedTiming // First
        r3 == r4



        0.116



        True







        share|improve this answer














        s = Sort[RandomInteger[1, 100000, 10000]];


        Let us just imagine for the moment that the target list r is supposed to have length 100000 (we can truncate it afterwards). Then for each entry i in the list s, the list r needs to have a jump of height 1 at position i. The jumps are the "derivative" of r (in a discrete sense) and the antiderivative can be obtained with Accumulate. In order to get the list of jumps, we need additive matrix assembly.



        This can be done with this function:



        mySparseArray[rules_, dims_, f_: Total, background_: 0.] := 
        If[(Head[rules] === Rule) && (rules[[1]] === ),
        rules[[2]],
        With[spopt = SystemOptions["SparseArrayOptions"],
        Internal`WithLocalSettings[
        SetSystemOptions[
        "SparseArrayOptions" -> "TreatRepeatedEntries" -> f],
        SparseArray[rules, dims, background],
        SetSystemOptions[spopt]]
        ]
        ]


        So, in total, r can be obtained as follows:



        r4 = Accumulate[
        mySparseArray[Partition[s, 1] -> 1, s[[-1]], Total, 0][[1 ;; Length[s]]]
        ]; // RepeatedTiming // First



        0.00055




        For comparison:



        r3 = Flatten[
        Join[Table[0, s[[1]] - 1],
        Table[Table[i, Differences[s][[i]]], i,
        Length[Select[s, # <= 10000 &]]]]][[;; 10000]]; //
        RepeatedTiming // First
        r3 == r4



        0.116



        True








        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 24 mins ago

























        answered 17 hours ago









        Henrik Schumacher

        44.8k265130




        44.8k265130



























             

            draft saved


            draft discarded















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f185874%2fhow-to-construct-a-list-of-lengths-efficiently%23new-answer', 'question_page');

            );

            Post as a guest













































































            Popular posts from this blog

            How to check contact read email or not when send email to Individual?

            Displaying single band from multi-band raster using QGIS

            How many registers does an x86_64 CPU actually have?