Searching a list of lists? - python

I have a list of lists called people in Python:
people = [['10000', '2018-02-04', 'Park', 'Chan'], ['10047', '2018-05-09', 'Tuckwell', 'Luke'], ['10207', '2018-05-06', 'Trentham', 'Sam'], ['10207', '2018-05-06', 'Smith', 'Tristin'], ['10511', '2018-02-07', 'Cotton', 'Marco'], ['10763', '2018-03-07', 'Wideman', 'Jocelyn'], ['10804', '2018-05-09', 'Hamm', 'Megan']]
Each inner list contains an ID, an expiry date, a last name, and a first name.
What is the best way to define a function that searches people for a given ID and returns whether that person's expiry date has passed today's date or not?
Thanks!

If you can, it is better to convert your list of lists to a dictionary keyed by ID; that way, searches by ID will be more efficient.
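A minimal sketch of that idea (not from the original answer): index the rows by ID once, then each lookup and expiry check is constant time.
from datetime import date

# build the index once: {'10000': ['2018-02-04', 'Park', 'Chan'], ...}
by_id = {row[0]: row[1:] for row in people}

def is_expired(person_id):
    # compare the stored expiry date against today's date
    return date.fromisoformat(by_id[person_id][0]) < date.today()

print(is_expired('10000'))  # True for any date after 2018-02-04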

One way is to use a dictionary to structure your data, mapping ID to details. Another improvement is to store your dates as datetime objects.
from datetime import datetime

people = [['10000', '2018-02-04', 'Park', 'Chan'], ['10047', '2018-05-09', 'Tuckwell', 'Luke'],
          ['10207', '2018-05-06', 'Trentham', 'Sam'], ['10207', '2018-05-06', 'Smith', 'Tristin'],
          ['10511', '2018-02-07', 'Cotton', 'Marco'], ['10763', '2018-03-07', 'Wideman', 'Jocelyn'],
          ['10804', '2018-05-09', 'Hamm', 'Megan']]

# create a dictionary mapping, convert dates to datetime objects
d = {k: [datetime.strptime(v[0], '%Y-%m-%d'), v[1], v[2]] for k, *v in people}

# function to calculate whether date has passed for given id
def return_date_passed(d, i):
    return d[i][0] < datetime.now()

res = return_date_passed(d, '10000')  # True
res = return_date_passed(d, '10207')  # False

Related

python convert text rows to dictionary based on conditional match

I have the string below and need help writing an if condition in a for loop that checks whether row.startswith('name') and, if so, takes the value and stores it in a variable called name. Similarly for dob.
Once the for loop completes, the output should be a dictionary as below, which I can convert to a pandas DataFrame.
'name john\n \n\nDOB\n12/08/1984\n\ncurrent company\ngoogle\n'
This is what I have tried so far, but I do not know how to get the values into a dictionary:
for row in lines.split('\n'):
    if row.startswith('name'):
        name = row.split()[-1]
Final output:
data = {"name":"john", "dob": "12/08/1984"}
Try using a list comprehension and split:
s = '''name
john

dob
12/08/1984

current company
google'''
d = dict([i.splitlines() for i in s.split('\n\n')])
print(d)
Output:
{'name': 'john', 'dob': '12/08/1984', 'current company': 'google'}
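If you need to start from the raw string shown in the question (with its stray blank lines), the loop the asker started could be completed along these lines. This is only a sketch and assumes the name sits on the same line as its label while the DOB sits on the line after its label:
lines = 'name john\n \n\nDOB\n12/08/1984\n\ncurrent company\ngoogle\n'

data = {}
rows = [row.strip() for row in lines.split('\n') if row.strip()]
for i, row in enumerate(rows):
    if row.lower().startswith('name'):
        data['name'] = row.split()[-1]      # 'name john' -> 'john'
    elif row.lower().startswith('dob'):
        data['dob'] = rows[i + 1]           # the value sits on the following line

print(data)  # {'name': 'john', 'dob': '12/08/1984'}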

Is there a way to sort a dictionary from the outside in

I'm trying to create an event manager in which a dictionary stores the events like this
my_dict = {'2020':
               {'9': {'8': ['School ']},
                '11': {'13': ['Doctors ']},
                '8': {'31': ['Interview']}
               },
           '2021': {}}
The outer key is the year, the middle key is a month, and the innermost key is a day, which leads to a list of events.
I'm trying to first sort it so that the months are in order, then sort it again so that the days are in order. Thanks in advance.
Use-case
DevOrangeCrush wishes to sort on keys in a nested dictionary where the nesting occurs on multiple levels
Solution
Normalize the data so that the dates match ISO8601 format, for easier sorting
In plain English, this means always using two digits for the month and day, and four digits for the year
Re-normalize the original dictionary data structure into a single list of dictionaries, where each dictionary represents a row, and the list represents an outer containing table
this is known as an Array of Hashes in perl-speak
this is known as a list of objects in JSON-speak
Once your data is restructured you are solving a much more well-known, well-documented, and more obvious problem, how to sort a simple list of dictionaries (which is already documented in the See also section of this answer).
Example
import pprint

## original data is formatted as a nested dictionary, which is clumsy
my_dict = {'2020': {'9': {'8': ['School ']},
                    '11': {'13': ['Doctors ']},
                    '8': {'31': ['Interview']}},
           '2021': {}}

## we want the data formatted as a standard table (aka list of dictionary)
## this is the most common format for this kind of data as you would see in
## databases and spreadsheets
mydata_table = []
ddtemp = dict()
for year in my_dict:
    for month in my_dict[year].keys():
        ddtemp['month'] = '{0:02d}'.format(*[int(month)])
        ddtemp['year'] = year
        for day in my_dict[year][month].keys():
            ddtemp['day'] = '{0:02d}'.format(*[int(day)])
            mydata_row = dict()
            mydata_row['year'] = '{year}'.format(**ddtemp)
            mydata_row['month'] = '{month}'.format(**ddtemp)
            mydata_row['day'] = '{day}'.format(**ddtemp)
            mydata_row['task_list'] = my_dict[year][month][day]
            mydata_row['date'] = '{year}-{month}-{day}'.format(**ddtemp)
            mydata_table.append(mydata_row)
            pass
        pass
    pass

## output result is now easily sorted and there is no data loss
## you will have to modify this if you want to deal with years that
## do not have any associated task_list data
pprint.pprint(mydata_table)
'''
## now we have something that can be sorted using well-known python idioms
## and easily manipulated using data-table semantics
## (search, sort, filter-by, group-by, select, project ... etc)
[
{'date': '2020-09-08','day': '08',
'month': '09','task_list': ['School '],'year': '2020'},
{'date': '2020-11-13','day': '13',
'month': '11','task_list': ['Doctors '],'year': '2020'},
{'date': '2020-08-31','day': '31',
'month': '08','task_list': ['Interview'],'year': '2020'},
]
'''
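Once the rows are in this table form, sorting them is the standard list-of-dictionaries idiom referenced in the See also section below. A short sketch using the mydata_table built above:
from operator import itemgetter

# sort the normalized rows chronologically; ISO8601 date strings sort lexicographically
for row in sorted(mydata_table, key=itemgetter('date')):
    print(row['date'], row['task_list'])
# 2020-08-31 ['Interview']
# 2020-09-08 ['School ']
# 2020-11-13 ['Doctors ']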
See also
How to sort a python list-of-dictionary
How to sort objects by multiple keys
Why you should use ISO8601 date format
ISO8601 vs timestamp
To get sorted events data, you can do something like this:
from collections import OrderedDict

def sort_events(my_dict):
    new_events_data = dict()
    for year, month_data in my_dict.items():
        new_month_data = dict()
        for month, day_data in month_data.items():
            sorted_day_data = sorted(day_data.items(), key=lambda kv: int(kv[0]))
            new_month_data[month] = OrderedDict(sorted_day_data)
        sorted_months_data = sorted(new_month_data.items(), key=lambda kv: int(kv[0]))
        new_events_data[year] = OrderedDict(sorted_months_data)
    return new_events_data
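A usage sketch (pprint is used only for readable display; the dictionary is the one from the question):
from pprint import pprint

pprint(sort_events(my_dict))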
Output:
{'2020': OrderedDict([('8', OrderedDict([('31', ['Interview'])])),
                      ('9', OrderedDict([('8', ['School '])])),
                      ('11', OrderedDict([('13', ['Doctors '])]))]),
 '2021': OrderedDict()}
A plain dict doesn't keep its keys sorted; you could use an OrderedDict, but if you simply need to iterate over it in sorted order, do it like this:
for year in sorted(map(int, my_dict)):
    year_dict = my_dict[str(year)]
    for month in sorted(map(int, year_dict)):
        month_dict = year_dict[str(month)]
        for day in sorted(map(int, month_dict)):
            events = month_dict[str(day)]
            for event in events:
                print(year, month, day, event)
Online Demo
The conversion to int ensures correct numeric ordering; without it, you'd get string ordering like 1, 10, 11, ..., 2, 20, 21.
A dictionary in Python does not have an order; you might want to try the OrderedDict class from the collections module, which remembers the order of insertion.
Of course, you would have to sort and reinsert the elements whenever you insert a new element that should be placed before any of the existing ones.
If you care about order, maybe a different data structure works better, for example a list of lists (as in the sketch below).
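A minimal sketch of that alternative (not from the original answer, using the dates from the question): store each event as a [date, event] pair and sort the flat list directly.
from datetime import date

# flat list-of-lists structure: one [date, event] pair per entry
events = [
    [date(2020, 9, 8), 'School'],
    [date(2020, 11, 13), 'Doctors'],
    [date(2020, 8, 31), 'Interview'],
]
events.sort(key=lambda pair: pair[0])   # chronological order
for event_date, name in events:
    print(event_date, name)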

Add keys from dicts (in column) to new column

I have a DataFrame with a 'budgetYearMap' column, which has 1-3 key-value pairs for each record. I'm a bit stuck as to how I'm supposed to make a new column containing only the keys of the "budgetYearMap" column.
Sample data below:
df_sample = pd.DataFrame({
    'identifier': ['BBI-2016-D02', 'BBI-2016-D03', 'BBI-2016-D04', 'BBI-2016-D05', 'BBI-2016-D06'],
    'callIdentifier': ['H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016'],
    'budgetYearMap': [{'0': 188650000}, {'2017': 188650000}, {'2015': 188650000}, {'2014': 188650000},
                      {'2020': 188650000, '2014': 188650000, '2012': 188650000}]
})
First I tried to extract the keys by position, then make a list out of them and add the list to the dataframe. As some records contained multiple keys (I then found out), this approach failed.
all_keys = [i for s in [list(d.keys()) for d in df_sample.budgetYearMap] for i in s]
df_TD_selected['budgetYear'] = all_keys
My problem is that extracting the keys by "name" wouldn't work either, given that the names of the keys are variable, and I do not know the set of years in advance. The data set will keep growing. It can be either 0 or a year within the 2000 range now, but in the future more years will be added.
My desired output would be:
df_output = pd.DataFrame({
    'identifier': ['BBI-2016-D02', 'BBI-2016-D03', 'BBI-2016-D04', 'BBI-2016-D05', 'BBI-2016-D06'],
    'callIdentifier': ['H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016'],
    'Year': ['0', '2017', '2015', '2014', '2020, 2014, 2012']
})
Any idea how I should approach this?
Perfect pipeline use-case.
df = (
    df_sample
    .assign(Year=df_sample['budgetYearMap'].apply(lambda s: list(s.keys())))
    .drop(columns=['budgetYearMap'])
)
.assign creates a new column by taking the 'budgetYearMap' Series and applying the lambda function to it, which returns each dictionary's keys as a list. If you prefer a string (as in your desired output), simply replace the lambda function with:
lambda s: ', '.join(list(s.keys()))
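Putting the string variant together with the df_sample defined above, the whole pipeline could look like this (a sketch; the printed Year values match the desired output, relying on dict keys keeping insertion order in Python 3.7+):
df = (
    df_sample
    .assign(Year=df_sample['budgetYearMap'].apply(lambda s: ', '.join(list(s.keys()))))
    .drop(columns=['budgetYearMap'])
)
print(df['Year'].tolist())
# ['0', '2017', '2015', '2014', '2020, 2014, 2012']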

Can we refer to a dictionary to get a value from a key while replacing in Python?

I have a flat file with terms and sentences. If any term is found in a sentence, I need to append its id to the term (term|id). The pattern match should be case-insensitive, and we need to retain the same case as in the sentence. Is it possible to refer to a dictionary to get the value using its key in a replace call?
from pandas import DataFrame
import re
df = {'id': [11, 12, 13, 14, 15, 16],
      'term': ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars'],
      'sentence': ['F-FORD FORD/FORD is less expensive than Mercedes Benz.',
                   'toyota, hyundai mileage is good compared to ford',
                   'tesla is an electric-car',
                   'toyota too has electric cars',
                   'CARS',
                   'CArs are expensive.']
      }
#Dataframe creation
df = DataFrame(df, columns=['id', 'term', 'sentence'])
#Dictionary creation
dict = {}
l_term = list(df['term'])
l_id = list(df['id'])
for i, j in zip(l_term, l_id):
    dict[str(i)] = j
#Building patterns to replace
pattern = r'(?i)(?<!-)(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(df["term"], key=len, reverse=True))))
#Replace
df["sentence"].replace(pattern, r"\g<0>|present", inplace=True, regex=True)
Instead of |present I need to refer to the dictionary, something like |dict.get(\g<0>), or is there any other approach to achieve this? Also, if cars were found twice (for 16 and 17), we can append either one.
The expected outcome is
F-FORD FORD|11/FORD|11 is less expensive|12 than Mercedes Benz|14.
toyota|13, hyundai mileage is good compared to ford|11
tesla is an electric|15-car
toyota|13 too has electric|15 cars|16
CARS|16
CArs|16 are expensive|12.
You may use a slight modification of the current code:
from pandas import DataFrame
import re
df = {'id': [11, 12, 13, 14, 15, 16],
      'term': ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars'],
      'sentence': ['F-FORD FORD/FORD is less expensive than Mercedes Benz.',
                   'toyota, hyundai mileage is good compared to ford',
                   'tesla is an electric-car',
                   'toyota too has electric cars',
                   'CARS',
                   'CArs are expensive.']
      }
#Dataframe creation
df = DataFrame(df, columns=['id', 'term', 'sentence'])
#Dictionary creation
dct = {}
l_term = list(df['term'])
l_id = list(df['id'])
for i, j in zip(l_term, l_id):
    dct[str(i).upper()] = j
#Building patterns to replace
pattern = r'(?i)(?<!-)(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(df["term"], key=len, reverse=True))))
#Replace (regex=True is required for a callable replacement in current pandas)
df["sentence"] = df["sentence"].str.replace(pattern, lambda x: "{}|{}".format(x.group(), dct[x.group().upper()]), regex=True)
NOTES:
dict is the name of a built-in type; do not shadow it by naming a variable dict, use dct instead
dct[str(i).upper()] = j - the uppercased key is added to the dictionary to enable a case-insensitive lookup by key
df["sentence"] = df["sentence"].str.replace(pattern, lambda x: "{}|{}".format(x.group(), dct[x.group().upper()]), regex=True) is the main (last) line: it uses Series.str.replace, which allows a callable as the replacement argument; once the pattern matches, the match is passed to the lambda expression as the Match object x, where the whole match is accessed with x.group() and the value is retrieved with dct[x.group().upper()].

Django orm - how to get distinct items of field within query of distinct dates

I have the following models:
class Event(models.Model):
    date = models.DateTimeField()
    event_type = models.ForeignKey('EventType')

class EventType(models.Model):
    name = models.CharField(unique=True)
I am trying to get a list of all dates, and what event types are available on that date.
Each item in the list would be a dictionary with two fields: date and event_types which would be a list of distinct event types available on that date.
Currently I have come up with a query to get me a list of all distinct dates, but this is only half of what I want to do:
query = Event.objects.all().select_related('event_type')
results = query.distinct('date').order_by('date').values_list('date', flat=True)
Now I can change this slightly to get me a list of all distinct date + event_type combinations:
query = Event.objects.all().select_related('event_type')
results = query.order_by('date').distinct('date', 'event_type').values_list('date', 'event_type__name')
But this will have an entry for each event type within a given date. I need to aggregate a list within each date.
Is there a way I can construct a queryset to do this? If not, how would I do this some other way to get to the same result?
You can perform such an aggregate with the groupby function of itertools. It requires that the elements appear in "chunks" with respect to the "grouper criteria", but this is the case here, since you use order_by.
We can thus write it like:
from itertools import groupby
from operator import itemgetter
query = (Event.objects.all().select_related('event_type')
                            .order_by('date', 'event_type')
                            .distinct('date', 'event_type')
                            .values_list('date', 'event_type__name'))

result = [
    {'date': k, 'datetypes': [v[1] for v in vs]}
    for k, vs in groupby(query, itemgetter(0))
]
It is also better to include 'event_type' in the order_by criterion.
This will result in something like:
[{'date': datetime.date(2018, 5, 19),
  'datetypes': ['Famous person died', 'Royal wedding']},
 {'date': datetime.date(2018, 5, 24), 'datetypes': ['Famous person died']},
 {'date': datetime.date(2011, 5, 25),
  'datetypes': ['Important law enforced', 'Referendum']}]
(based on a quick Wikipedia scan of the last days in May).
The groupby runs in linear time in the number of rows returned.
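As a standalone illustration of the same pattern (hypothetical plain tuples instead of a queryset, using the dates from the example output above):
from itertools import groupby
from operator import itemgetter

# the rows must already be sorted by the grouping key, as order_by('date', ...) guarantees
rows = [
    ('2018-05-19', 'Famous person died'),
    ('2018-05-19', 'Royal wedding'),
    ('2018-05-24', 'Famous person died'),
]
result = [
    {'date': date, 'datetypes': [name for _, name in items]}
    for date, items in groupby(rows, itemgetter(0))
]
print(result)
# [{'date': '2018-05-19', 'datetypes': ['Famous person died', 'Royal wedding']},
#  {'date': '2018-05-24', 'datetypes': ['Famous person died']}]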
