Median Calculation of List of Integers without using heap

Multi tool use

Clash Royale CLAN TAG#URR8PPP

up vote
7
down vote

favorite

References

Using those references:

rolling median implementations benchmark

FAST RUNNING MEDIAN USING AN INDEXABLE SKIPLIST

Calculate the median of a billion numbers

Find running median from a stream of integers

Code

A code was written to calculate the Median with the SortedContainers library:

from itertools import islice
from sortedcontainers import SortedList
import random
import time

start_time = time.time()


class Median(object):

 def __init__(self, iterable):
 self._iterable = islice(iterable, None)
 self._sortedlist = SortedList(self._iterable)

 def __iter__(self):
 self_sortedlist = self._sortedlist
 # print(self_sortedlist)
 length = len(self_sortedlist)
 half = length // 2
 if length % 2 == 0:
 yield (self_sortedlist[half] + self_sortedlist[half - 1]) // 2
 elif length % 2 == 1:
 yield self_sortedlist[half]


def main():
 m, n = 1000, 1500000
 data = [random.randrange(m) for i in range(n)]
 # print("Random Data: ", data)
 result = list(Median(data))
 print("Result: ", result)


if __name__ == "__main__":

 main()
 print("--- %s seconds ---" % (time.time() - start_time))

Explanation

Random Number Generator

The following code generates data within the Range m and quantity n.

m, n = 1000, 15000000
data = [random.randrange(m) for i in range(n)]

Median

The Median class sorts the list of numbers and if n is odd, returns the middle item with yield self_sortedlist[half]. Or if n is even, returns the mean of two middle itens of the list with yield (self_sortedlist[half] + self_sortedlist[half - 1]) // 2

Question

How do I improve the code performance? Because for a large list (100 milion), it takes --- 186.7168517112732 seconds--- on my computer.

asked Aug 15 at 13:08

danieltakeshi

1585

sorting has time complexity O(n log n). You may want to look into Medians of Medians (scroll down for a python implementation) which has linear time complexity.
â€“Â Andre Holzner
Aug 15 at 19:20

Why do you use integer division for a list with even length? Shouldn't it be a regular division of the two center values in the sorted list?
â€“Â Daniel Lenz
Aug 15 at 21:34

1

I think there is a misunderstanding; the use of heaps and such clever data structures is needed to maintain a rolling median, i.e. you have to return the current median after every number in the input. If you want to call it only once on the final list, then a simple quickselect is enough - that gets you to O(n) time. (If you were to use this algorithm for the rolling mean case, you would need O(1+2+..+n) = O(n^2) time, which is why heaps and other structures are used.)
â€“Â Ant
Aug 15 at 22:39

1

Timsort is a sorting algorithm, and it takes O(n log n) time. Quickselect is not a sorting algorithm, and it takes O(n) time. Quickselect is much faster than timsort for this use case; the key observation is that to get the median you don't need to sort the whole vector, just as if you want to compute the minimum you don't need to sort the whole vector, you can just do one pass over the elements. With the median (and in general with order statistics) it's more complicated, so you use quickselect, but it still has a O(n) performance, much better than your sorting approach which is O(n log n)
â€“Â Ant
Aug 16 at 11:58

1

@DanielLenz Oh, It was to reuse the number as Integer. Just as the data type of the input. However, if the precision of the program is needed for decimal numbers, then it can be done.
â€“Â danieltakeshi
Aug 16 at 16:33

Â |Â
show 2 more comments

up vote
7
down vote

favorite

References

Using those references:

rolling median implementations benchmark

FAST RUNNING MEDIAN USING AN INDEXABLE SKIPLIST

Calculate the median of a billion numbers

Find running median from a stream of integers

Code

A code was written to calculate the Median with the SortedContainers library:

from itertools import islice
from sortedcontainers import SortedList
import random
import time

start_time = time.time()


class Median(object):

 def __init__(self, iterable):
 self._iterable = islice(iterable, None)
 self._sortedlist = SortedList(self._iterable)

 def __iter__(self):
 self_sortedlist = self._sortedlist
 # print(self_sortedlist)
 length = len(self_sortedlist)
 half = length // 2
 if length % 2 == 0:
 yield (self_sortedlist[half] + self_sortedlist[half - 1]) // 2
 elif length % 2 == 1:
 yield self_sortedlist[half]


def main():
 m, n = 1000, 1500000
 data = [random.randrange(m) for i in range(n)]
 # print("Random Data: ", data)
 result = list(Median(data))
 print("Result: ", result)


if __name__ == "__main__":

 main()
 print("--- %s seconds ---" % (time.time() - start_time))

Explanation

Random Number Generator

The following code generates data within the Range m and quantity n.

m, n = 1000, 15000000
data = [random.randrange(m) for i in range(n)]

Median

Question

How do I improve the code performance? Because for a large list (100 milion), it takes --- 186.7168517112732 seconds--- on my computer.

asked Aug 15 at 13:08

danieltakeshi

1585

sorting has time complexity O(n log n). You may want to look into Medians of Medians (scroll down for a python implementation) which has linear time complexity.
â€“Â Andre Holzner
Aug 15 at 19:20

Why do you use integer division for a list with even length? Shouldn't it be a regular division of the two center values in the sorted list?
â€“Â Daniel Lenz
Aug 15 at 21:34

1

I think there is a misunderstanding; the use of heaps and such clever data structures is needed to maintain a rolling median, i.e. you have to return the current median after every number in the input. If you want to call it only once on the final list, then a simple quickselect is enough - that gets you to O(n) time. (If you were to use this algorithm for the rolling mean case, you would need O(1+2+..+n) = O(n^2) time, which is why heaps and other structures are used.)
â€“Â Ant
Aug 15 at 22:39

1

Timsort is a sorting algorithm, and it takes O(n log n) time. Quickselect is not a sorting algorithm, and it takes O(n) time. Quickselect is much faster than timsort for this use case; the key observation is that to get the median you don't need to sort the whole vector, just as if you want to compute the minimum you don't need to sort the whole vector, you can just do one pass over the elements. With the median (and in general with order statistics) it's more complicated, so you use quickselect, but it still has a O(n) performance, much better than your sorting approach which is O(n log n)
â€“Â Ant
Aug 16 at 11:58

1

@DanielLenz Oh, It was to reuse the number as Integer. Just as the data type of the input. However, if the precision of the program is needed for decimal numbers, then it can be done.
â€“Â danieltakeshi
Aug 16 at 16:33

Â |Â
show 2 more comments

up vote
7
down vote

favorite

References

Using those references:

rolling median implementations benchmark

FAST RUNNING MEDIAN USING AN INDEXABLE SKIPLIST

Calculate the median of a billion numbers

Find running median from a stream of integers

Code

A code was written to calculate the Median with the SortedContainers library:

from itertools import islice
from sortedcontainers import SortedList
import random
import time

start_time = time.time()


class Median(object):

 def __init__(self, iterable):
 self._iterable = islice(iterable, None)
 self._sortedlist = SortedList(self._iterable)

 def __iter__(self):
 self_sortedlist = self._sortedlist
 # print(self_sortedlist)
 length = len(self_sortedlist)
 half = length // 2
 if length % 2 == 0:
 yield (self_sortedlist[half] + self_sortedlist[half - 1]) // 2
 elif length % 2 == 1:
 yield self_sortedlist[half]


def main():
 m, n = 1000, 1500000
 data = [random.randrange(m) for i in range(n)]
 # print("Random Data: ", data)
 result = list(Median(data))
 print("Result: ", result)


if __name__ == "__main__":

 main()
 print("--- %s seconds ---" % (time.time() - start_time))

Explanation

Random Number Generator

The following code generates data within the Range m and quantity n.

m, n = 1000, 15000000
data = [random.randrange(m) for i in range(n)]

Median

Question

How do I improve the code performance? Because for a large list (100 milion), it takes --- 186.7168517112732 seconds--- on my computer.

asked Aug 15 at 13:08

danieltakeshi

1585

References

Using those references:

rolling median implementations benchmark

FAST RUNNING MEDIAN USING AN INDEXABLE SKIPLIST

Calculate the median of a billion numbers

Find running median from a stream of integers

Code

A code was written to calculate the Median with the SortedContainers library:

from itertools import islice
from sortedcontainers import SortedList
import random
import time

start_time = time.time()


class Median(object):

 def __init__(self, iterable):
 self._iterable = islice(iterable, None)
 self._sortedlist = SortedList(self._iterable)

 def __iter__(self):
 self_sortedlist = self._sortedlist
 # print(self_sortedlist)
 length = len(self_sortedlist)
 half = length // 2
 if length % 2 == 0:
 yield (self_sortedlist[half] + self_sortedlist[half - 1]) // 2
 elif length % 2 == 1:
 yield self_sortedlist[half]


def main():
 m, n = 1000, 1500000
 data = [random.randrange(m) for i in range(n)]
 # print("Random Data: ", data)
 result = list(Median(data))
 print("Result: ", result)


if __name__ == "__main__":

 main()
 print("--- %s seconds ---" % (time.time() - start_time))

Explanation

Random Number Generator

The following code generates data within the Range m and quantity n.

m, n = 1000, 15000000
data = [random.randrange(m) for i in range(n)]

Median

Question

How do I improve the code performance? Because for a large list (100 milion), it takes --- 186.7168517112732 seconds--- on my computer.

python python-3.x statistics

asked Aug 15 at 13:08

danieltakeshi

1585

asked Aug 15 at 13:08

danieltakeshi

1585

asked Aug 15 at 13:08

danieltakeshi

1585

asked Aug 15 at 13:08

danieltakeshi

1585

asked Aug 15 at 13:08

danieltakeshi

1585

sorting has time complexity O(n log n). You may want to look into Medians of Medians (scroll down for a python implementation) which has linear time complexity.
â€“Â Andre Holzner
Aug 15 at 19:20

Why do you use integer division for a list with even length? Shouldn't it be a regular division of the two center values in the sorted list?
â€“Â Daniel Lenz
Aug 15 at 21:34

1

I think there is a misunderstanding; the use of heaps and such clever data structures is needed to maintain a rolling median, i.e. you have to return the current median after every number in the input. If you want to call it only once on the final list, then a simple quickselect is enough - that gets you to O(n) time. (If you were to use this algorithm for the rolling mean case, you would need O(1+2+..+n) = O(n^2) time, which is why heaps and other structures are used.)
â€“Â Ant
Aug 15 at 22:39

1

Timsort is a sorting algorithm, and it takes O(n log n) time. Quickselect is not a sorting algorithm, and it takes O(n) time. Quickselect is much faster than timsort for this use case; the key observation is that to get the median you don't need to sort the whole vector, just as if you want to compute the minimum you don't need to sort the whole vector, you can just do one pass over the elements. With the median (and in general with order statistics) it's more complicated, so you use quickselect, but it still has a O(n) performance, much better than your sorting approach which is O(n log n)
â€“Â Ant
Aug 16 at 11:58

1

@DanielLenz Oh, It was to reuse the number as Integer. Just as the data type of the input. However, if the precision of the program is needed for decimal numbers, then it can be done.
â€“Â danieltakeshi
Aug 16 at 16:33

Â |Â
show 2 more comments

sorting has time complexity O(n log n). You may want to look into Medians of Medians (scroll down for a python implementation) which has linear time complexity.
â€“Â Andre Holzner
Aug 15 at 19:20

Why do you use integer division for a list with even length? Shouldn't it be a regular division of the two center values in the sorted list?
â€“Â Daniel Lenz
Aug 15 at 21:34

1

I think there is a misunderstanding; the use of heaps and such clever data structures is needed to maintain a rolling median, i.e. you have to return the current median after every number in the input. If you want to call it only once on the final list, then a simple quickselect is enough - that gets you to O(n) time. (If you were to use this algorithm for the rolling mean case, you would need O(1+2+..+n) = O(n^2) time, which is why heaps and other structures are used.)
â€“Â Ant
Aug 15 at 22:39

1

Timsort is a sorting algorithm, and it takes O(n log n) time. Quickselect is not a sorting algorithm, and it takes O(n) time. Quickselect is much faster than timsort for this use case; the key observation is that to get the median you don't need to sort the whole vector, just as if you want to compute the minimum you don't need to sort the whole vector, you can just do one pass over the elements. With the median (and in general with order statistics) it's more complicated, so you use quickselect, but it still has a O(n) performance, much better than your sorting approach which is O(n log n)
â€“Â Ant
Aug 16 at 11:58

1

@DanielLenz Oh, It was to reuse the number as Integer. Just as the data type of the input. However, if the precision of the program is needed for decimal numbers, then it can be done.
â€“Â danieltakeshi
Aug 16 at 16:33

sorting has time complexity O(n log n). You may want to look into Medians of Medians (scroll down for a python implementation) which has linear time complexity.
â€“Â Andre Holzner
Aug 15 at 19:20

Why do you use integer division for a list with even length? Shouldn't it be a regular division of the two center values in the sorted list?
â€“Â Daniel Lenz
Aug 15 at 21:34

I think there is a misunderstanding; the use of heaps and such clever data structures is needed to maintain a rolling median, i.e. you have to return the current median after every number in the input. If you want to call it only once on the final list, then a simple quickselect is enough - that gets you to O(n) time. (If you were to use this algorithm for the rolling mean case, you would need O(1+2+..+n) = O(n^2) time, which is why heaps and other structures are used.)
â€“Â Ant
Aug 15 at 22:39

Timsort is a sorting algorithm, and it takes O(n log n) time. Quickselect is not a sorting algorithm, and it takes O(n) time. Quickselect is much faster than timsort for this use case; the key observation is that to get the median you don't need to sort the whole vector, just as if you want to compute the minimum you don't need to sort the whole vector, you can just do one pass over the elements. With the median (and in general with order statistics) it's more complicated, so you use quickselect, but it still has a O(n) performance, much better than your sorting approach which is O(n log n)
â€“Â Ant
Aug 16 at 11:58

@DanielLenz Oh, It was to reuse the number as Integer. Just as the data type of the input. However, if the precision of the program is needed for decimal numbers, then it can be done.
â€“Â danieltakeshi
Aug 16 at 16:33

Â |Â
show 2 more comments

4 Answers
4

active

oldest

votes

up vote
5
down vote

accepted

Performance

I instrumented your code differently:

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = SortedList(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = list(Median(data))
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

For me, the results were:

--- 7.407598257064819 seconds to generate
--- 4.535749673843384 seconds to sort
Result: [500]
--- 5.01109504699707 seconds ---

This shows that the generation of the random input takes longer than finding the median, and that 90% of Median() is spent sorting (probably most of the rest is caused by converting between lists and iterators). You're unlikely to get large gains through modifications to your own part of the code.

I got better results by using Python's built-in sorted(). We don't need any of the extra functionality of SortedList (maintaining the invariant through adds and deletes), and a single sort (mostly in native code) gives these results:

def median(values):
 sortedlist = sorted(values)
 length = len(sortedlist)
 half = length // 2
 # return not yield; see "General review"
 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = sorted(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

--- 7.638948202133179 seconds to generate
--- 3.118924617767334 seconds to sort
Result: 500
--- 3.3397886753082275 seconds ---

General review

I don't see why you yield a single value, rather than simply returning. Is there a need for Median() to be iterable?

Given that length is an integer, if length % 2 is not 0, it must be 1 - so the elif could easily be else.

edited Aug 15 at 14:19

answered Aug 15 at 13:39

Toby Speight

19k43697

You are right on the General Review, yield is only necessary for the running median, so i changed to return and the result to result = Median(data). Else was used instead of elif. Is there any algorithm that generates the list on an improved way, faster?
â€“Â danieltakeshi
Aug 15 at 13:52

Yes, use Python's built-in sort for a one-time sorting of a list - see edit.
â€“Â Toby Speight
Aug 15 at 14:20

add a commentÂ |Â

up vote
3
down vote

If you want to time something, you should time just that. When I ran your original code, I got 6.023478031158447 seconds. When I instead did

start_time = time.time()
result = list(Median(data))
end_time = time.time()
print("Result: ", result, end_time-start_time)

I got 1.8368241786956787. Note that calling time.time() before the print statement means that the execution of the print statement isn't included in your calculation of elapsed time.

I also tried

import statistics
start_time = time.time()
result = statistics.median(data)
end_time = time.time()
print("Result: ", result, end_time-start_time)

and got 0.7475762367248535.

What do mean by "not using heap", and why do you want that? What makes you think SortedList doesn't use a heap? If you're going to use pre-made functions, why not just use the pre-made median function?

answered Aug 15 at 19:47

Acccumulation

1,01915

Just taking some challenges to learn Python. It was without heap, because there are many answers and duplicates. Thanks for the tip for measuring the time.
â€“Â danieltakeshi
Aug 15 at 19:56

add a commentÂ |Â

up vote
2
down vote

Thanks to Toby now the program takes:

--- 7.169409990310669 seconds to finish the program--- instead of --- 186.7168517112732 seconds--- for 100 milion numbers.

One other improvement I made was to add an extra optimization on the random generating algorithm, using numpy. So numpy.sort() was used to sort the numpy array at the start of median(), because it is faster for numpy arrays.

Code

import time
import numpy

start_time0 = time.time()


def median(values):
 sortedlist = numpy.sort(values)
 length = len(sortedlist)
 half = length // 2

 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]


def main():

 m, n = 1000, 100000000
 # Random Number Generator
 start_time = time.time()
 data = numpy.random.randint(1, m, n)
 print("--- %s seconds to numpy random---" % (time.time() - start_time))
 # Median
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds to find the Median---" % (time.time() - start_time))


if __name__ == "__main__":

 main()
 print("--- %s seconds to finish the program---" % (time.time() - start_time0))

edited Aug 15 at 19:50

SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€

6,86361645

answered Aug 15 at 17:34

danieltakeshi

1585

In this case, you might want to consider posting another question if you want another review? As it is, your newly refactored code isn't more helpful than the already accepted answer :)
â€“Â IEatBagels
Aug 15 at 19:44

3

Oh forgive us - it was difficult to see the implicit change. Hopefully my edit makes it easier for others to see
â€“Â SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€
Aug 15 at 19:47

add a commentÂ |Â

up vote
0
down vote

just a note :
you can enhance just a tidbit by defining the name in the list comprehension

from

data = [random.randrange(m) for i in range(n)]

data = [random.randrange(m) for num in range(n)]

answered Aug 16 at 19:51

Abdur-Rahmaan Janhangeer

21811

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f201727%2fmedian-calculation-of-list-of-integers-without-using-heap%23new-answer', 'question_page');

);

Post as a guest

Name

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

up vote
5
down vote

accepted

Performance

I instrumented your code differently:

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = SortedList(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = list(Median(data))
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

For me, the results were:

--- 7.407598257064819 seconds to generate
--- 4.535749673843384 seconds to sort
Result: [500]
--- 5.01109504699707 seconds ---

def median(values):
 sortedlist = sorted(values)
 length = len(sortedlist)
 half = length // 2
 # return not yield; see "General review"
 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = sorted(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

--- 7.638948202133179 seconds to generate
--- 3.118924617767334 seconds to sort
Result: 500
--- 3.3397886753082275 seconds ---

General review

I don't see why you yield a single value, rather than simply returning. Is there a need for Median() to be iterable?

Given that length is an integer, if length % 2 is not 0, it must be 1 - so the elif could easily be else.

edited Aug 15 at 14:19

answered Aug 15 at 13:39

Toby Speight

19k43697

You are right on the General Review, yield is only necessary for the running median, so i changed to return and the result to result = Median(data). Else was used instead of elif. Is there any algorithm that generates the list on an improved way, faster?
â€“Â danieltakeshi
Aug 15 at 13:52

Yes, use Python's built-in sort for a one-time sorting of a list - see edit.
â€“Â Toby Speight
Aug 15 at 14:20

add a commentÂ |Â

up vote
5
down vote

accepted

Performance

I instrumented your code differently:

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = SortedList(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = list(Median(data))
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

For me, the results were:

--- 7.407598257064819 seconds to generate
--- 4.535749673843384 seconds to sort
Result: [500]
--- 5.01109504699707 seconds ---

def median(values):
 sortedlist = sorted(values)
 length = len(sortedlist)
 half = length // 2
 # return not yield; see "General review"
 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = sorted(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

--- 7.638948202133179 seconds to generate
--- 3.118924617767334 seconds to sort
Result: 500
--- 3.3397886753082275 seconds ---

General review

I don't see why you yield a single value, rather than simply returning. Is there a need for Median() to be iterable?

Given that length is an integer, if length % 2 is not 0, it must be 1 - so the elif could easily be else.

edited Aug 15 at 14:19

answered Aug 15 at 13:39

Toby Speight

19k43697

You are right on the General Review, yield is only necessary for the running median, so i changed to return and the result to result = Median(data). Else was used instead of elif. Is there any algorithm that generates the list on an improved way, faster?
â€“Â danieltakeshi
Aug 15 at 13:52

Yes, use Python's built-in sort for a one-time sorting of a list - see edit.
â€“Â Toby Speight
Aug 15 at 14:20

add a commentÂ |Â

up vote
5
down vote

accepted

Performance

I instrumented your code differently:

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = SortedList(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = list(Median(data))
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

For me, the results were:

--- 7.407598257064819 seconds to generate
--- 4.535749673843384 seconds to sort
Result: [500]
--- 5.01109504699707 seconds ---

def median(values):
 sortedlist = sorted(values)
 length = len(sortedlist)
 half = length // 2
 # return not yield; see "General review"
 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = sorted(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

--- 7.638948202133179 seconds to generate
--- 3.118924617767334 seconds to sort
Result: 500
--- 3.3397886753082275 seconds ---

General review

I don't see why you yield a single value, rather than simply returning. Is there a need for Median() to be iterable?

Given that length is an integer, if length % 2 is not 0, it must be 1 - so the elif could easily be else.

edited Aug 15 at 14:19

answered Aug 15 at 13:39

Toby Speight

19k43697

Performance

I instrumented your code differently:

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = SortedList(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = list(Median(data))
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

For me, the results were:

--- 7.407598257064819 seconds to generate
--- 4.535749673843384 seconds to sort
Result: [500]
--- 5.01109504699707 seconds ---

def median(values):
 sortedlist = sorted(values)
 length = len(sortedlist)
 half = length // 2
 # return not yield; see "General review"
 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]

def main():
 m, n = 1000, 15000000
 start_time = time.time()
 data = [random.randrange(m) for i in range(n)]
 print("--- %s seconds to generate" % (time.time() - start_time))
 # print("Random Data: ", data)
 start_time = time.time()
 result = sorted(data)
 print("--- %s seconds to sort" % (time.time() - start_time))
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds ---" % (time.time() - start_time))

--- 7.638948202133179 seconds to generate
--- 3.118924617767334 seconds to sort
Result: 500
--- 3.3397886753082275 seconds ---

General review

I don't see why you yield a single value, rather than simply returning. Is there a need for Median() to be iterable?

Given that length is an integer, if length % 2 is not 0, it must be 1 - so the elif could easily be else.

edited Aug 15 at 14:19

answered Aug 15 at 13:39

Toby Speight

19k43697

edited Aug 15 at 14:19

answered Aug 15 at 13:39

Toby Speight

19k43697

answered Aug 15 at 13:39

Toby Speight

19k43697

answered Aug 15 at 13:39

Toby Speight

19k43697

You are right on the General Review, yield is only necessary for the running median, so i changed to return and the result to result = Median(data). Else was used instead of elif. Is there any algorithm that generates the list on an improved way, faster?
â€“Â danieltakeshi
Aug 15 at 13:52

Yes, use Python's built-in sort for a one-time sorting of a list - see edit.
â€“Â Toby Speight
Aug 15 at 14:20

add a commentÂ |Â

You are right on the General Review, yield is only necessary for the running median, so i changed to return and the result to result = Median(data). Else was used instead of elif. Is there any algorithm that generates the list on an improved way, faster?
â€“Â danieltakeshi
Aug 15 at 13:52

Yes, use Python's built-in sort for a one-time sorting of a list - see edit.
â€“Â Toby Speight
Aug 15 at 14:20

You are right on the General Review, yield is only necessary for the running median, so i changed to return and the result to result = Median(data). Else was used instead of elif. Is there any algorithm that generates the list on an improved way, faster?
â€“Â danieltakeshi
Aug 15 at 13:52

Yes, use Python's built-in sort for a one-time sorting of a list - see edit.
â€“Â Toby Speight
Aug 15 at 14:20

add a commentÂ |Â

up vote
3
down vote

If you want to time something, you should time just that. When I ran your original code, I got 6.023478031158447 seconds. When I instead did

start_time = time.time()
result = list(Median(data))
end_time = time.time()
print("Result: ", result, end_time-start_time)

I got 1.8368241786956787. Note that calling time.time() before the print statement means that the execution of the print statement isn't included in your calculation of elapsed time.

I also tried

import statistics
start_time = time.time()
result = statistics.median(data)
end_time = time.time()
print("Result: ", result, end_time-start_time)

and got 0.7475762367248535.

answered Aug 15 at 19:47

Acccumulation

1,01915

Just taking some challenges to learn Python. It was without heap, because there are many answers and duplicates. Thanks for the tip for measuring the time.
â€“Â danieltakeshi
Aug 15 at 19:56

add a commentÂ |Â

up vote
3
down vote

If you want to time something, you should time just that. When I ran your original code, I got 6.023478031158447 seconds. When I instead did

start_time = time.time()
result = list(Median(data))
end_time = time.time()
print("Result: ", result, end_time-start_time)

I got 1.8368241786956787. Note that calling time.time() before the print statement means that the execution of the print statement isn't included in your calculation of elapsed time.

I also tried

import statistics
start_time = time.time()
result = statistics.median(data)
end_time = time.time()
print("Result: ", result, end_time-start_time)

and got 0.7475762367248535.

answered Aug 15 at 19:47

Acccumulation

1,01915

Just taking some challenges to learn Python. It was without heap, because there are many answers and duplicates. Thanks for the tip for measuring the time.
â€“Â danieltakeshi
Aug 15 at 19:56

add a commentÂ |Â

up vote
3
down vote

If you want to time something, you should time just that. When I ran your original code, I got 6.023478031158447 seconds. When I instead did

start_time = time.time()
result = list(Median(data))
end_time = time.time()
print("Result: ", result, end_time-start_time)

I got 1.8368241786956787. Note that calling time.time() before the print statement means that the execution of the print statement isn't included in your calculation of elapsed time.

I also tried

import statistics
start_time = time.time()
result = statistics.median(data)
end_time = time.time()
print("Result: ", result, end_time-start_time)

and got 0.7475762367248535.

answered Aug 15 at 19:47

Acccumulation

1,01915

If you want to time something, you should time just that. When I ran your original code, I got 6.023478031158447 seconds. When I instead did

start_time = time.time()
result = list(Median(data))
end_time = time.time()
print("Result: ", result, end_time-start_time)

I got 1.8368241786956787. Note that calling time.time() before the print statement means that the execution of the print statement isn't included in your calculation of elapsed time.

I also tried

import statistics
start_time = time.time()
result = statistics.median(data)
end_time = time.time()
print("Result: ", result, end_time-start_time)

and got 0.7475762367248535.

answered Aug 15 at 19:47

Acccumulation

1,01915

answered Aug 15 at 19:47

Acccumulation

1,01915

answered Aug 15 at 19:47

Acccumulation

1,01915

answered Aug 15 at 19:47

Acccumulation

1,01915

Just taking some challenges to learn Python. It was without heap, because there are many answers and duplicates. Thanks for the tip for measuring the time.
â€“Â danieltakeshi
Aug 15 at 19:56

add a commentÂ |Â

Just taking some challenges to learn Python. It was without heap, because there are many answers and duplicates. Thanks for the tip for measuring the time.
â€“Â danieltakeshi
Aug 15 at 19:56

Just taking some challenges to learn Python. It was without heap, because there are many answers and duplicates. Thanks for the tip for measuring the time.
â€“Â danieltakeshi
Aug 15 at 19:56

add a commentÂ |Â

up vote
2
down vote

Thanks to Toby now the program takes:

--- 7.169409990310669 seconds to finish the program--- instead of --- 186.7168517112732 seconds--- for 100 milion numbers.

Code

import time
import numpy

start_time0 = time.time()


def median(values):
 sortedlist = numpy.sort(values)
 length = len(sortedlist)
 half = length // 2

 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]


def main():

 m, n = 1000, 100000000
 # Random Number Generator
 start_time = time.time()
 data = numpy.random.randint(1, m, n)
 print("--- %s seconds to numpy random---" % (time.time() - start_time))
 # Median
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds to find the Median---" % (time.time() - start_time))


if __name__ == "__main__":

 main()
 print("--- %s seconds to finish the program---" % (time.time() - start_time0))

edited Aug 15 at 19:50

SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€

6,86361645

answered Aug 15 at 17:34

danieltakeshi

1585

In this case, you might want to consider posting another question if you want another review? As it is, your newly refactored code isn't more helpful than the already accepted answer :)
â€“Â IEatBagels
Aug 15 at 19:44

3

Oh forgive us - it was difficult to see the implicit change. Hopefully my edit makes it easier for others to see
â€“Â SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€
Aug 15 at 19:47

add a commentÂ |Â

up vote
2
down vote

Thanks to Toby now the program takes:

--- 7.169409990310669 seconds to finish the program--- instead of --- 186.7168517112732 seconds--- for 100 milion numbers.

Code

import time
import numpy

start_time0 = time.time()


def median(values):
 sortedlist = numpy.sort(values)
 length = len(sortedlist)
 half = length // 2

 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]


def main():

 m, n = 1000, 100000000
 # Random Number Generator
 start_time = time.time()
 data = numpy.random.randint(1, m, n)
 print("--- %s seconds to numpy random---" % (time.time() - start_time))
 # Median
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds to find the Median---" % (time.time() - start_time))


if __name__ == "__main__":

 main()
 print("--- %s seconds to finish the program---" % (time.time() - start_time0))

edited Aug 15 at 19:50

SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€

6,86361645

answered Aug 15 at 17:34

danieltakeshi

1585

In this case, you might want to consider posting another question if you want another review? As it is, your newly refactored code isn't more helpful than the already accepted answer :)
â€“Â IEatBagels
Aug 15 at 19:44

3

Oh forgive us - it was difficult to see the implicit change. Hopefully my edit makes it easier for others to see
â€“Â SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€
Aug 15 at 19:47

add a commentÂ |Â

up vote
2
down vote

Thanks to Toby now the program takes:

--- 7.169409990310669 seconds to finish the program--- instead of --- 186.7168517112732 seconds--- for 100 milion numbers.

Code

import time
import numpy

start_time0 = time.time()


def median(values):
 sortedlist = numpy.sort(values)
 length = len(sortedlist)
 half = length // 2

 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]


def main():

 m, n = 1000, 100000000
 # Random Number Generator
 start_time = time.time()
 data = numpy.random.randint(1, m, n)
 print("--- %s seconds to numpy random---" % (time.time() - start_time))
 # Median
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds to find the Median---" % (time.time() - start_time))


if __name__ == "__main__":

 main()
 print("--- %s seconds to finish the program---" % (time.time() - start_time0))

edited Aug 15 at 19:50

SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€

6,86361645

answered Aug 15 at 17:34

danieltakeshi

1585

Thanks to Toby now the program takes:

--- 7.169409990310669 seconds to finish the program--- instead of --- 186.7168517112732 seconds--- for 100 milion numbers.

Code

import time
import numpy

start_time0 = time.time()


def median(values):
 sortedlist = numpy.sort(values)
 length = len(sortedlist)
 half = length // 2

 if length % 2 == 0:
 return (sortedlist[half] + sortedlist[half - 1]) // 2
 else:
 return sortedlist[half]


def main():

 m, n = 1000, 100000000
 # Random Number Generator
 start_time = time.time()
 data = numpy.random.randint(1, m, n)
 print("--- %s seconds to numpy random---" % (time.time() - start_time))
 # Median
 start_time = time.time()
 result = median(data)
 print("Result: ", result)
 print("--- %s seconds to find the Median---" % (time.time() - start_time))


if __name__ == "__main__":

 main()
 print("--- %s seconds to finish the program---" % (time.time() - start_time0))

edited Aug 15 at 19:50

SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€

6,86361645

answered Aug 15 at 17:34

danieltakeshi

1585

edited Aug 15 at 19:50

SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€

6,86361645

edited Aug 15 at 19:50

SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€

6,86361645

edited Aug 15 at 19:50

SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€

6,86361645

answered Aug 15 at 17:34

danieltakeshi

1585

answered Aug 15 at 17:34

danieltakeshi

1585

answered Aug 15 at 17:34

danieltakeshi

1585

In this case, you might want to consider posting another question if you want another review? As it is, your newly refactored code isn't more helpful than the already accepted answer :)
â€“Â IEatBagels
Aug 15 at 19:44

3

Oh forgive us - it was difficult to see the implicit change. Hopefully my edit makes it easier for others to see
â€“Â SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€
Aug 15 at 19:47

add a commentÂ |Â

In this case, you might want to consider posting another question if you want another review? As it is, your newly refactored code isn't more helpful than the already accepted answer :)
â€“Â IEatBagels
Aug 15 at 19:44

3

Oh forgive us - it was difficult to see the implicit change. Hopefully my edit makes it easier for others to see
â€“Â SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€
Aug 15 at 19:47

In this case, you might want to consider posting another question if you want another review? As it is, your newly refactored code isn't more helpful than the already accepted answer :)
â€“Â IEatBagels
Aug 15 at 19:44

Oh forgive us - it was difficult to see the implicit change. Hopefully my edit makes it easier for others to see
â€“Â SÃ¡Â´Â€Ã¡Â´Â OnÃ¡Â´Â‡Ã¡Â´ÂŒÃ¡Â´Â€
Aug 15 at 19:47

add a commentÂ |Â

up vote
0
down vote

just a note :
you can enhance just a tidbit by defining the name in the list comprehension

from

data = [random.randrange(m) for i in range(n)]

data = [random.randrange(m) for num in range(n)]

answered Aug 16 at 19:51

Abdur-Rahmaan Janhangeer

21811

add a commentÂ |Â

up vote
0
down vote

just a note :
you can enhance just a tidbit by defining the name in the list comprehension

from

data = [random.randrange(m) for i in range(n)]

data = [random.randrange(m) for num in range(n)]

answered Aug 16 at 19:51

Abdur-Rahmaan Janhangeer

21811

add a commentÂ |Â

up vote
0
down vote

just a note :
you can enhance just a tidbit by defining the name in the list comprehension

from

data = [random.randrange(m) for i in range(n)]

data = [random.randrange(m) for num in range(n)]

answered Aug 16 at 19:51

Abdur-Rahmaan Janhangeer

21811

just a note :
you can enhance just a tidbit by defining the name in the list comprehension

from

data = [random.randrange(m) for i in range(n)]

data = [random.randrange(m) for num in range(n)]

answered Aug 16 at 19:51

Abdur-Rahmaan Janhangeer

21811

answered Aug 16 at 19:51

Abdur-Rahmaan Janhangeer

21811

answered Aug 16 at 19:51

Abdur-Rahmaan Janhangeer

21811

answered Aug 16 at 19:51

Abdur-Rahmaan Janhangeer

21811

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Nnf 7BT3 BMaqMC,SU kT3eeS1,zcIwcwLz9MH5nW