Pythonic way of collapsing/grouping a list to aggregating max/min
Clash Royale CLAN TAG#URR8PPP
up vote
6
down vote
favorite
Lets say I have the following list in python. It is ordered first by Equip, then by Date:
my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]
What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:
list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]
A couple of things to note:
A-2
onJob 1
is only there for a single day, thus itsFirst
andLast
Date should be the same.- A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.
- As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)
For point 3, the list
my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]
should yield
list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
]
Currently I am doing so in a simple loop/non-pythonic way:
list_by_job =
last_entry = None
for entry in my_list:
if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
else:
list_by_job[-1]['Last'] = entry['Date']
last_entry = entry
Is there a more pythonic way to do this using Python's list comprehension, etc?
python list aggregate list-comprehension
add a comment |Â
up vote
6
down vote
favorite
Lets say I have the following list in python. It is ordered first by Equip, then by Date:
my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]
What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:
list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]
A couple of things to note:
A-2
onJob 1
is only there for a single day, thus itsFirst
andLast
Date should be the same.- A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.
- As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)
For point 3, the list
my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]
should yield
list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
]
Currently I am doing so in a simple loop/non-pythonic way:
list_by_job =
last_entry = None
for entry in my_list:
if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
else:
list_by_job[-1]['Last'] = entry['Date']
last_entry = entry
Is there a more pythonic way to do this using Python's list comprehension, etc?
python list aggregate list-comprehension
1
Looks more like a job for pandas.
â Willem Van Onsem
2 hours ago
add a comment |Â
up vote
6
down vote
favorite
up vote
6
down vote
favorite
Lets say I have the following list in python. It is ordered first by Equip, then by Date:
my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]
What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:
list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]
A couple of things to note:
A-2
onJob 1
is only there for a single day, thus itsFirst
andLast
Date should be the same.- A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.
- As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)
For point 3, the list
my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]
should yield
list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
]
Currently I am doing so in a simple loop/non-pythonic way:
list_by_job =
last_entry = None
for entry in my_list:
if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
else:
list_by_job[-1]['Last'] = entry['Date']
last_entry = entry
Is there a more pythonic way to do this using Python's list comprehension, etc?
python list aggregate list-comprehension
Lets say I have the following list in python. It is ordered first by Equip, then by Date:
my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]
What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:
list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]
A couple of things to note:
A-2
onJob 1
is only there for a single day, thus itsFirst
andLast
Date should be the same.- A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.
- As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)
For point 3, the list
my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]
should yield
list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
]
Currently I am doing so in a simple loop/non-pythonic way:
list_by_job =
last_entry = None
for entry in my_list:
if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
else:
list_by_job[-1]['Last'] = entry['Date']
last_entry = entry
Is there a more pythonic way to do this using Python's list comprehension, etc?
python list aggregate list-comprehension
python list aggregate list-comprehension
edited 1 hour ago
asked 2 hours ago
MarkD
2,47611833
2,47611833
1
Looks more like a job for pandas.
â Willem Van Onsem
2 hours ago
add a comment |Â
1
Looks more like a job for pandas.
â Willem Van Onsem
2 hours ago
1
1
Looks more like a job for pandas.
â Willem Van Onsem
2 hours ago
Looks more like a job for pandas.
â Willem Van Onsem
2 hours ago
add a comment |Â
3 Answers
3
active
oldest
votes
up vote
6
down vote
accepted
You can use itertools.groupby
:
import itertools
def _key(d):
return (d['Equip'], d['Job'])
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
Edit:
Using data as suggested in your comment:
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
I thinknew_data
should be sorted by date too third (and not justEquip
andJob
).itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â slider
2 hours ago
_key = itemgetter('Equip', 'Job')
would be the built-in way to create that key function.
â schwobaseggl
1 hour ago
Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the secondA-1
Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â MarkD
1 hour ago
@MarkD Please see my recent edit.
â Ajax1234
1 hour ago
add a comment |Â
up vote
3
down vote
I suggest using pandas
for this.
itertools.groupby
is cool but IMO a bit harder to comprehend.
>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>>
Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>>
['Equip': 'A-1',
'First': Timestamp('2018-01-01 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-1',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 2',
'Last': Timestamp('2018-01-05 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-03 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 3',
'Last': Timestamp('2018-01-05 00:00:00')]
I suggest keeping the dates as time stamps.
add a comment |Â
up vote
1
down vote
You can use pandas here, which is some sort of "database interface" for data:
import pandas as pd
df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
for the given sample input, this gives:
>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']
In case the date format is not '%Y-%m-%d'
, then one first needs to convert it with pd.to_datetime(..)
like:
import pandas as pd
df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â MarkD
1 hour ago
@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â Willem Van Onsem
1 hour ago
Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â timgeb
1 hour ago
@timgeb: if the strings are formatted'%Y-%m-%d'
, then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â Willem Van Onsem
1 hour ago
Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â timgeb
1 hour ago
 |Â
show 2 more comments
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
6
down vote
accepted
You can use itertools.groupby
:
import itertools
def _key(d):
return (d['Equip'], d['Job'])
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
Edit:
Using data as suggested in your comment:
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
I thinknew_data
should be sorted by date too third (and not justEquip
andJob
).itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â slider
2 hours ago
_key = itemgetter('Equip', 'Job')
would be the built-in way to create that key function.
â schwobaseggl
1 hour ago
Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the secondA-1
Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â MarkD
1 hour ago
@MarkD Please see my recent edit.
â Ajax1234
1 hour ago
add a comment |Â
up vote
6
down vote
accepted
You can use itertools.groupby
:
import itertools
def _key(d):
return (d['Equip'], d['Job'])
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
Edit:
Using data as suggested in your comment:
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
I thinknew_data
should be sorted by date too third (and not justEquip
andJob
).itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â slider
2 hours ago
_key = itemgetter('Equip', 'Job')
would be the built-in way to create that key function.
â schwobaseggl
1 hour ago
Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the secondA-1
Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â MarkD
1 hour ago
@MarkD Please see my recent edit.
â Ajax1234
1 hour ago
add a comment |Â
up vote
6
down vote
accepted
up vote
6
down vote
accepted
You can use itertools.groupby
:
import itertools
def _key(d):
return (d['Equip'], d['Job'])
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
Edit:
Using data as suggested in your comment:
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
You can use itertools.groupby
:
import itertools
def _key(d):
return (d['Equip'], d['Job'])
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
Edit:
Using data as suggested in your comment:
my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
Output:
['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']
edited 1 hour ago
answered 2 hours ago
Ajax1234
37.8k42249
37.8k42249
I thinknew_data
should be sorted by date too third (and not justEquip
andJob
).itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â slider
2 hours ago
_key = itemgetter('Equip', 'Job')
would be the built-in way to create that key function.
â schwobaseggl
1 hour ago
Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the secondA-1
Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â MarkD
1 hour ago
@MarkD Please see my recent edit.
â Ajax1234
1 hour ago
add a comment |Â
I thinknew_data
should be sorted by date too third (and not justEquip
andJob
).itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â slider
2 hours ago
_key = itemgetter('Equip', 'Job')
would be the built-in way to create that key function.
â schwobaseggl
1 hour ago
Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the secondA-1
Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â MarkD
1 hour ago
@MarkD Please see my recent edit.
â Ajax1234
1 hour ago
I think
new_data
should be sorted by date too third (and not just Equip
and Job
). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â slider
2 hours ago
I think
new_data
should be sorted by date too third (and not just Equip
and Job
). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â slider
2 hours ago
_key = itemgetter('Equip', 'Job')
would be the built-in way to create that key function.â schwobaseggl
1 hour ago
_key = itemgetter('Equip', 'Job')
would be the built-in way to create that key function.â schwobaseggl
1 hour ago
Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second
A-1
Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.â MarkD
1 hour ago
Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second
A-1
Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.â MarkD
1 hour ago
@MarkD Please see my recent edit.
â Ajax1234
1 hour ago
@MarkD Please see my recent edit.
â Ajax1234
1 hour ago
add a comment |Â
up vote
3
down vote
I suggest using pandas
for this.
itertools.groupby
is cool but IMO a bit harder to comprehend.
>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>>
Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>>
['Equip': 'A-1',
'First': Timestamp('2018-01-01 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-1',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 2',
'Last': Timestamp('2018-01-05 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-03 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 3',
'Last': Timestamp('2018-01-05 00:00:00')]
I suggest keeping the dates as time stamps.
add a comment |Â
up vote
3
down vote
I suggest using pandas
for this.
itertools.groupby
is cool but IMO a bit harder to comprehend.
>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>>
Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>>
['Equip': 'A-1',
'First': Timestamp('2018-01-01 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-1',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 2',
'Last': Timestamp('2018-01-05 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-03 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 3',
'Last': Timestamp('2018-01-05 00:00:00')]
I suggest keeping the dates as time stamps.
add a comment |Â
up vote
3
down vote
up vote
3
down vote
I suggest using pandas
for this.
itertools.groupby
is cool but IMO a bit harder to comprehend.
>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>>
Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>>
['Equip': 'A-1',
'First': Timestamp('2018-01-01 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-1',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 2',
'Last': Timestamp('2018-01-05 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-03 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 3',
'Last': Timestamp('2018-01-05 00:00:00')]
I suggest keeping the dates as time stamps.
I suggest using pandas
for this.
itertools.groupby
is cool but IMO a bit harder to comprehend.
>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>>
Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>>
['Equip': 'A-1',
'First': Timestamp('2018-01-01 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-1',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 2',
'Last': Timestamp('2018-01-05 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-03 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 3',
'Last': Timestamp('2018-01-05 00:00:00')]
I suggest keeping the dates as time stamps.
answered 1 hour ago
timgeb
41.6k105581
41.6k105581
add a comment |Â
add a comment |Â
up vote
1
down vote
You can use pandas here, which is some sort of "database interface" for data:
import pandas as pd
df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
for the given sample input, this gives:
>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']
In case the date format is not '%Y-%m-%d'
, then one first needs to convert it with pd.to_datetime(..)
like:
import pandas as pd
df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â MarkD
1 hour ago
@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â Willem Van Onsem
1 hour ago
Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â timgeb
1 hour ago
@timgeb: if the strings are formatted'%Y-%m-%d'
, then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â Willem Van Onsem
1 hour ago
Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â timgeb
1 hour ago
 |Â
show 2 more comments
up vote
1
down vote
You can use pandas here, which is some sort of "database interface" for data:
import pandas as pd
df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
for the given sample input, this gives:
>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']
In case the date format is not '%Y-%m-%d'
, then one first needs to convert it with pd.to_datetime(..)
like:
import pandas as pd
df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â MarkD
1 hour ago
@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â Willem Van Onsem
1 hour ago
Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â timgeb
1 hour ago
@timgeb: if the strings are formatted'%Y-%m-%d'
, then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â Willem Van Onsem
1 hour ago
Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â timgeb
1 hour ago
 |Â
show 2 more comments
up vote
1
down vote
up vote
1
down vote
You can use pandas here, which is some sort of "database interface" for data:
import pandas as pd
df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
for the given sample input, this gives:
>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']
In case the date format is not '%Y-%m-%d'
, then one first needs to convert it with pd.to_datetime(..)
like:
import pandas as pd
df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
You can use pandas here, which is some sort of "database interface" for data:
import pandas as pd
df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
for the given sample input, this gives:
>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']
In case the date format is not '%Y-%m-%d'
, then one first needs to convert it with pd.to_datetime(..)
like:
import pandas as pd
df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')
edited 1 hour ago
answered 1 hour ago
Willem Van Onsem
136k16128218
136k16128218
Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â MarkD
1 hour ago
@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â Willem Van Onsem
1 hour ago
Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â timgeb
1 hour ago
@timgeb: if the strings are formatted'%Y-%m-%d'
, then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â Willem Van Onsem
1 hour ago
Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â timgeb
1 hour ago
 |Â
show 2 more comments
Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â MarkD
1 hour ago
@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â Willem Van Onsem
1 hour ago
Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â timgeb
1 hour ago
@timgeb: if the strings are formatted'%Y-%m-%d'
, then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â Willem Van Onsem
1 hour ago
Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â timgeb
1 hour ago
Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â MarkD
1 hour ago
Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â MarkD
1 hour ago
@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â Willem Van Onsem
1 hour ago
@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â Willem Van Onsem
1 hour ago
Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â timgeb
1 hour ago
Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â timgeb
1 hour ago
@timgeb: if the strings are formatted
'%Y-%m-%d'
, then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.â Willem Van Onsem
1 hour ago
@timgeb: if the strings are formatted
'%Y-%m-%d'
, then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.â Willem Van Onsem
1 hour ago
Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â timgeb
1 hour ago
Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â timgeb
1 hour ago
 |Â
show 2 more comments
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53144837%2fpythonic-way-of-collapsing-grouping-a-list-to-aggregating-max-min%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
Looks more like a job for pandas.
â Willem Van Onsem
2 hours ago