What data structure container can be sorted by date - python

Is there any data structure that can be sorted by date in Python 3?
('2015-08-01', 10,10)
('2015-08-03', 11,11)
.. and so on ..
I know I can use a pandas DataFrame, but I'd like to know if there are more lightweight alternatives.

Since the date is a string in YYYY-MM-DD format, it's already sortable in the way you'd expect. And since the dates are the first item, you don't even need to provide a key function.
data = [('2015-08-03', 11,11), ('2015-08-01', 10,10)]
data.sort()
print(data)
Result:
[('2015-08-01', 10, 10), ('2015-08-03', 11, 11)]
If the date wasn't the first item, you could do this:
import operator
data = [('a', '2015-08-03', 11,11), ('b', '2015-08-01', 10,10)]
data.sort(key=operator.itemgetter(1))
print(data)
Result:
[('b', '2015-08-01', 10, 10), ('a', '2015-08-03', 11, 11)]
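If the dates were stored in a format that does not sort correctly as plain strings, a key function could parse them first. The sketch below assumes a day-first DD/MM/YYYY format and made-up rows; neither comes from the question.

```python
from datetime import datetime

# Hypothetical rows with day-first dates, which do not sort correctly as strings.
rows = [('03/08/2015', 11, 11), ('01/09/2015', 12, 12), ('01/08/2015', 10, 10)]

def parse_date(row):
    """Return the first column parsed as a datetime, for use as a sort key."""
    return datetime.strptime(row[0], '%d/%m/%Y')

# sorted() returns a new list and leaves the original untouched.
rows_sorted = sorted(rows, key=parse_date)
print(rows_sorted)
```

A plain rows.sort() would put 01/09/2015 before 03/08/2015 here, which is why the parsing step matters for non-ISO formats.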

Related

Python group a list of years, months to remove duplication of years

I have a table of orders that contains a date column. I am fetching the distinct years and months from that column so that I can use them in a filter on the front end.
I have managed to get this data back; however, it is not formatted the way I would like.
Python
years = purchase_orders.objects.filter(user_id=info.context.user.id).annotate(year=ExtractYear('date'), month=ExtractMonth('date'),).order_by().values_list('year', 'month').order_by('year', 'month').distinct()
Data Returned
<QuerySet [(2020, 3), (2021, 4), (2022, 1), (2022, 2), (2022, 3), (2022, 4), (2022, 5)]>
Ideal Format
<QuerySet [(2020, (3)), (2021, (4)), (2022, (1, 2, 3, 4, 5))]>
You can work with the groupby(…) function of the itertools module:
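from itertools import groupby
from operator import itemgetter
from django.db.models.functions import ExtractMonth, ExtractYear
# Fetch the distinct (year, month) pairs, ordered so that equal years are adjacent.
years = purchase_orders.objects.filter(
    user=info.context.user
).values(
    year=ExtractYear('date'),
    month=ExtractMonth('date')
).order_by('year', 'month').distinct()
# Collapse the rows into (year, [months]) 2-tuples.
years = [
    (y, [ym['month'] for ym in yms])
    for y, yms in groupby(years, itemgetter('year'))
]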
We thus first fetch the data from the database, and then post-process this by constructing a list of 2-tuples where the first item contains the year, and the second is a list of months for that year.
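The post-processing step can be tried without a database. This is a minimal sketch that uses a plain list of tuples as a stand-in for the query result (the variable names are illustrative), and builds tuples of months to match the "Ideal Format" from the question:

```python
from itertools import groupby
from operator import itemgetter

# Stand-in for the (year, month) pairs the query returns, already ordered by year.
pairs = [(2020, 3), (2021, 4), (2022, 1), (2022, 2), (2022, 3), (2022, 4), (2022, 5)]

# groupby only merges adjacent items with the same key, so the input must be sorted.
grouped = [
    (year, tuple(month for _, month in group))
    for year, group in groupby(pairs, key=itemgetter(0))
]
print(grouped)
```

Note that the result is an ordinary list, not a QuerySet, so it can no longer be filtered or chained on the database side.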

Applying the counter from collection to a column in a dataframe

I have a DataFrame column in which each row is a list of strings. I want to count the elements across the entire column, not just the distinct rows, which is what value_counts() in pandas gives me.
I want to apply Counter() from the collections module, but that only runs on a list. My column in the DataFrame looks like this:
[['FollowFriday', 'Awesome'],
['Covid_19', 'corona', 'Notagain'],
['Awesome'],
['FollowFriday', 'Awesome'],
[],
['corona', 'Notagain'],
....]
I want to get the counts, such as
[('FollowFriday', 2),
('Awesome', 3),
('corona', 2),
('Covid_19', 1),
('Notagain', 2),
.....]
The basic command that I am using is:
from collections import Counter
Counter(df['column'])
OR
from collections import Counter
Counter(" ".join(df['column']).split()).most_common()
Any help would be greatly appreciated!
IIUC, your comparison to pandas was only to explain your goal and you want to work with lists?
You can use:
l = [['FollowFriday', 'Awesome'],
['Covid_19', 'corona', 'Notagain'],
['Awesome'],
['FollowFriday', 'Awesome'],
[],
['corona', 'Notagain'],
]
from collections import Counter
from itertools import chain
out = Counter(chain.from_iterable(l))
or if you have a Series of lists, use explode:
out = Counter(df['column'].explode())
# OR
out = df['column'].explode().value_counts()
output:
Counter({'FollowFriday': 2,
'Awesome': 3,
'Covid_19': 1,
'corona': 2,
'Notagain': 2})
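To get the list of (tag, count) tuples asked for in the question, most_common() can be called on the resulting Counter. A small sketch using the same sample data (the variable names are illustrative):

```python
from collections import Counter
from itertools import chain

tags = [
    ['FollowFriday', 'Awesome'],
    ['Covid_19', 'corona', 'Notagain'],
    ['Awesome'],
    ['FollowFriday', 'Awesome'],
    [],
    ['corona', 'Notagain'],
]

# Flatten the nested lists lazily and count every tag once per occurrence.
counts = Counter(chain.from_iterable(tags))

# most_common() returns (element, count) tuples, highest count first.
pairs = counts.most_common()
print(pairs)
```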

Is there a way to sort a dictionary from the outside in

I'm trying to create an event manager in which a dictionary stores the events like this
my_dict = {'2020':
{'9': {'8': ['School ']},
'11': {'13': ['Doctors ']},
'8': {'31': ['Interview']}
},
'2021': {}}
Here the outer key is the year, the middle key is the month, and the innermost key is the day, which maps to a list of events.
I'm trying to sort it first so that the months are in order, and then again so that the days are in order. Thanks in advance.
Use-case
DevOrangeCrush wishes to sort on keys in a nested dictionary where the nesting occurs on multiple levels
Solution
Normalize the data so that the dates match ISO 8601 format, for easier sorting.
In plain English, this means always using two digits for the month and day, and four digits for the year.
Restructure the original nested dictionary into a single list of dictionaries, where each dictionary represents a row and the list represents the containing table.
This is known as an Array of Hashes in Perl-speak.
This is known as a list of objects in JSON-speak.
Once your data is restructured, you are solving a much better-known, well-documented, and more obvious problem: how to sort a simple list of dictionaries (which is already documented in the See also section of this answer).
Example
import pprint
## original data is formatted as a nested dictionary, which is clumsy
my_dict = {
    '2020': {
        '9': {'8': ['School ']},
        '11': {'13': ['Doctors ']},
        '8': {'31': ['Interview']},
    },
    '2021': {},
}
## we want the data formatted as a standard table (aka list of dictionary)
## this is the most common format for this kind of data as you would see in
## databases and spreadsheets
mydata_table = []
ddtemp = dict()
for year in my_dict:
    for month in my_dict[year].keys():
        ddtemp['month'] = '{0:02d}'.format(int(month))
        ddtemp['year'] = year
        for day in my_dict[year][month].keys():
            ddtemp['day'] = '{0:02d}'.format(int(day))
            mydata_row = dict()
            mydata_row['year'] = '{year}'.format(**ddtemp)
            mydata_row['month'] = '{month}'.format(**ddtemp)
            mydata_row['day'] = '{day}'.format(**ddtemp)
            mydata_row['task_list'] = my_dict[year][month][day]
            mydata_row['date'] = '{year}-{month}-{day}'.format(**ddtemp)
            mydata_table.append(mydata_row)
## output result is now easily sorted and there is no data loss
## you will have to modify this if you want to deal with years that
## do not have any associated task_list data
pprint.pprint(mydata_table)
'''
## now we have something that can be sorted using well-known python idioms
## and easily manipulated using data-table semantics
## (search, sort, filter-by, group-by, select, project ... etc)
[
    {'date': '2020-09-08','day': '08',
     'month': '09','task_list': ['School '],'year': '2020'},
    {'date': '2020-11-13','day': '13',
     'month': '11','task_list': ['Doctors '],'year': '2020'},
    {'date': '2020-08-31','day': '31',
     'month': '08','task_list': ['Interview'],'year': '2020'},
]
'''
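The table above is still in insertion order. Once the rows carry a zero-padded ISO 8601 date string, the actual sort could be a single call; a minimal sketch, using a trimmed copy of the rows (only the date and task_list keys are kept here for brevity):

```python
from operator import itemgetter

# Trimmed copy of the rows built above, still in insertion order.
rows = [
    {'date': '2020-09-08', 'task_list': ['School ']},
    {'date': '2020-11-13', 'task_list': ['Doctors ']},
    {'date': '2020-08-31', 'task_list': ['Interview']},
]

# Zero-padded YYYY-MM-DD strings sort chronologically as plain strings.
rows_by_date = sorted(rows, key=itemgetter('date'))
for row in rows_by_date:
    print(row['date'], row['task_list'])
```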
See also
How to sort a python list-of-dictionary
How to sort objects by multiple keys
Why you should use ISO8601 date format
ISO8601 vs timestamp
To get sorted events data, you can do something like this:
from collections import OrderedDict
def sort_events(my_dict):
    new_events_data = dict()
    for year, month_data in my_dict.items():
        new_month_data = dict()
        for month, day_data in month_data.items():
            # Sort the days numerically, not as strings.
            sorted_day_data = sorted(day_data.items(), key=lambda kv: int(kv[0]))
            new_month_data[month] = OrderedDict(sorted_day_data)
        # Sort the months numerically once all their days are in order.
        sorted_months_data = sorted(new_month_data.items(), key=lambda kv: int(kv[0]))
        new_events_data[year] = OrderedDict(sorted_months_data)
    return new_events_data
Output:
{'2020': OrderedDict([('8', OrderedDict([('31', ['Interview'])])),
('9', OrderedDict([('8', ['School '])])),
('11', OrderedDict([('13', ['Doctors '])]))]),
'2021': OrderedDict()}
A plain dict is not sorted by key (it only preserves insertion order, which is guaranteed from Python 3.7 on). You could build an OrderedDict, but if you simply need the entries in sorted order while iterating, do it like this:
for year in sorted(map(int, my_dict)):
    year_dict = my_dict[str(year)]
    for month in sorted(map(int, year_dict)):
        month_dict = year_dict[str(month)]
        for day in sorted(map(int, month_dict)):
            events = month_dict[str(day)]
            for event in events:
                print(year, month, day, event)
The conversion to int ensures the numbers are ordered correctly; without it you'd get 1, 10, 11, ..., 2, 20, 21.
A dictionary in Python is not sorted by key; at most it remembers insertion order (guaranteed for plain dicts from Python 3.7 on). You might want to try the OrderedDict class from the collections module, which remembers the order of insertion.
Of course you would have to sort and reinsert the elements whenever you insert a new element which should be placed before any of the existing elements.
If you care about order maybe a different data structure works better. For example a list of lists.
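One way to read the "different data structure" suggestion is a flat list of (date, event) tuples that is kept sorted on every insert. This is only a sketch under that assumption; add_event is a hypothetical helper, and bisect.insort from the standard library does the positioning:

```python
import bisect
from datetime import date

# Flat list of (date, event name) tuples, kept sorted at all times.
events = []

def add_event(when, name):
    """Insert the event at the position that keeps the list sorted."""
    bisect.insort(events, (when, name))

add_event(date(2020, 9, 8), 'School')
add_event(date(2020, 11, 13), 'Doctors')
add_event(date(2020, 8, 31), 'Interview')

for when, name in events:
    print(when.isoformat(), name)
```

Because tuples compare element by element, events on the same date end up ordered by name, and no separate re-sort is needed after each insertion.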

Ways to categorise data?

I want to allocate downloaded data (CSV) into, for simplicity, say 3 categories. Has anyone got any tips, similar projects I could look at, or Python tools I should look at?
The 3 categories are:
Shares: includes a, b, c
Bonds: includes d, e, f
Cash: g
My downloaded data may have any combination of the above investments with any value.
https://docs.google.com/spreadsheets/d/1GU7jVLA-YzqRTxyLMdbymdJ6b1RtB09bpOjIDX6eJok/edit?usp=sharing
Those are 2 basic examples of what the data will be downloaded as and what I want it converted to.
The real data will have 10-15 investments and approx. 4 categories. I just want to know whether it is possible to sort like this. It gets tricky as we have longer investment names, and some are similar but sorted into different categories.
If someone could point me in the right direction, i.e. whether I need a dictionary, some basic framework, or some code to look at, that would be awesome.
Keen to learn but don't know where to start, cheers. This is my first proper coding project.
I'm not too fussed about the formatting of the output; as long as it clearly categorises the info and sums each category, I'm happy :)
You don't need a framework, just the builtins will do (as usual in Python).
from collections import defaultdict
# Input data "rows". These would probably be loaded from a file.
raw_data = [
    ('a', 1000.00),
    ('b', 2000.00),
    ('d', 3000.00),
    ('e', 4000.00),
    ('g', 5000.00),
    ('g', 10000.00),
    ('c', 5000.00),
    ('d', 2000.00),
    ('a', 4000.00),
    ('e', 5000.00),
]
# Category definitions, mapping a category name to the row "types" (first column).
categories = {
    'Shares': {'a', 'b', 'c'},
    'Bonds': {'d', 'e', 'f'},
    'Cash': {'g'},
}
# Build an inverse map that makes lookups faster later.
# This will look like e.g. {"a": "Shares", "b": "Shares", ...}
category_map = {}
for category, members in categories.items():
    for member in members:
        category_map[member] = category
# Initialize an empty defaultdict to group the rows with.
rows_per_category = defaultdict(list)
# Iterate through the raw data...
for row in raw_data:
    row_type = row[0]  # Grab the first column per row,
    category = category_map[row_type]  # map it through the category map (this raises KeyError if the type is in no category),
    rows_per_category[category].append(row)  # and put it in the defaultdict.
# Iterate through the now collated rows in sorted-by-category order:
for category, rows in sorted(rows_per_category.items()):
    # Sum the second column (value) for the total.
    total = sum(row[1] for row in rows)
    # Print header.
    print("###", category)
    # Print each row.
    for row in rows:
        print(row)
    # Print the total and an empty line.
    print("=== Total", total)
    print()
This will output something like
### Bonds
('d', 3000.0)
('e', 4000.0)
('d', 2000.0)
('e', 5000.0)
=== Total 14000.0
### Cash
('g', 5000.0)
('g', 10000.0)
=== Total 15000.0
### Shares
('a', 1000.0)
('b', 2000.0)
('c', 5000.0)
('a', 4000.0)
=== Total 12000.0
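Loading the rows from a CSV file could look roughly like the sketch below. The column names (investment, value), the sample text, and the 'Uncategorised' fallback are assumptions for illustration; the real export may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export: one investment name and one value per line.
csv_text = """investment,value
a,1000.00
d,3000.00
g,5000.00
x,250.00
a,4000.00
"""

# Same inverse map as in the answer above, written out directly.
category_map = {
    'a': 'Shares', 'b': 'Shares', 'c': 'Shares',
    'd': 'Bonds', 'e': 'Bonds', 'f': 'Bonds',
    'g': 'Cash',
}

totals = defaultdict(float)
# io.StringIO stands in for an open file; with a real file, pass the file object instead.
for record in csv.DictReader(io.StringIO(csv_text)):
    # Unknown investments go into a catch-all bucket instead of raising KeyError.
    category = category_map.get(record['investment'], 'Uncategorised')
    totals[category] += float(record['value'])

for category, total in sorted(totals.items()):
    print(category, total)
```

Using .get() with a fallback is a design choice: it keeps the script running when a new investment name appears, at the cost of having to check the catch-all bucket afterwards.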

Django orm - how to get distinct items of field within query of distinct dates

I have the following models:
class Event(models.Model):
    date = models.DateTimeField()
    event_type = models.ForeignKey('EventType')
class EventType(models.Model):
    name = models.CharField(unique=True)
I am trying to get a list of all dates, and what event types are available on that date.
Each item in the list would be a dictionary with two fields: date and event_types which would be a list of distinct event types available on that date.
Currently I have come up with a query to get me a list of all distinct dates, but this is only half of what I want to do:
query = Event.objects.all().select_related('event_type')
results = query.distinct('date').order_by('date').values_list('date', flat=True)
Now I can change this slightly to get me a list of all distinct date + event_type combinations:
query = Event.objects.all().select_related('event_type')
results = query.order_by('date').distinct('date', 'event_type').values_list('date', 'event_type__name')
But this will have an entry for each event type within a given date. I need to aggregate a list within each date.
Is there a way I can construct a queryset to do this? If not, how would I do this some other way to get to the same result?
You can perform such an aggregation with the groupby function of itertools. It requires that the elements appear in "chunks" with respect to the grouping criterion, that is, that rows with the same key are adjacent. That is the case here, since you use order_by.
We can thus write it like:
from itertools import groupby
from operator import itemgetter
# Fetch distinct (date, event type name) pairs, ordered so equal dates are adjacent.
query = (Event.objects.all().select_related('event_type')
         .order_by('date', 'event_type')
         .distinct('date', 'event_type')
         .values_list('date', 'event_type__name'))
# Collapse the pairs into one dictionary per date.
result = [
    {'date': k, 'datetypes': [v[1] for v in vs]}
    for k, vs in groupby(query, itemgetter(0))
]
You should also include 'event_type' in the order_by criterion, so that the ordering matches the fields passed to distinct(…).
This will result in something like:
[{'date': datetime.date(2018, 5, 19), 'datetypes': ['Famous person died',
'Royal wedding']},
{'date': datetime.date(2018, 5, 24), 'datetypes': ['Famous person died']},
{'date': datetime.date(2018, 5, 25), 'datetypes': ['Important law enforced',
'Referendum']}]
(based on quick Wikipedia scan of the last days in May).
The groupby works in linear time with the number of rows returned.
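The "chunks" requirement is easy to see with plain tuples. The following sketch uses made-up rows in place of the query result and shows that groupby splits a date into several groups when the input is not sorted:

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Made-up (date, event type name) rows, deliberately not sorted by date.
rows = [
    (date(2018, 5, 19), 'Royal wedding'),
    (date(2018, 5, 25), 'Referendum'),
    (date(2018, 5, 19), 'Famous person died'),
]

# On unsorted input, 19 May shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, key=itemgetter(0))]

# Sorting first (the role order_by plays in the query) yields one group per date.
rows.sort()
result = [
    {'date': k, 'datetypes': [name for _, name in group]}
    for k, group in groupby(rows, key=itemgetter(0))
]
print(len(unsorted_keys), len(result))
```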
