sorting a nested dictionary by 3 criteria - python

I have a nested dictionary and I want to sort it by three fields: first by points, then by wins (if points are equal), and finally alphabetically if both points and wins are equal.
My dictionary:
{'Iran': {'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1},
'Morocco': {'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1},
'Portugal': {'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1},
'Spain': {'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1}}
My code:
sort_data=sorted(dic.keys(),key=lambda x:(-dic[x]["points"],dic[x]["wins"]))
for items in sort_data:
    x=dic[items]
    print(items," ",str(x).replace("'", "").replace("{","").replace("}", ""))
My code does not work in this case: it does not sort alphabetically when all the other values are equal. Could you help me?

I suggest using dic.items() rather than dic.keys(), although that's mostly a matter of taste.
def sorted_dict(dic):
    def key(x):
        k,v = x
        # Negate points and wins so that higher values sort first; names stay ascending.
        return (-v['points'], -v['wins'], k)
    return dict(sorted(dic.items(), key=key))
dic1 = {'Iran': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}, 'Morocco': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}, 'Portugal': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}, 'Spain': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}}
dic2 = sorted_dict(dic1)
print(dic2)
# {'Iran': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1},
# 'Morocco': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1},
# 'Portugal': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1},
# 'Spain': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}}
Note that in your dictionary, all countries have the same number of points and wins, and the countries are already sorted in alphabetical order, so this is a terrible example to test whether the sort worked correctly or not.
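To check the ordering on data where the tie-breakers actually matter, here is a minimal sketch. The standings below are invented purely for illustration (they are not from the question), and the sketch assumes Python 3.7+ so that dict keeps insertion order.

```python
# Hypothetical standings, invented so that every tie-breaker is exercised.
standings = {
    'Spain':    {'points': 5, 'wins': 1},
    'Portugal': {'points': 5, 'wins': 1},
    'Morocco':  {'points': 1, 'wins': 0},
    'Iran':     {'points': 5, 'wins': 2},
}

def sort_key(item):
    name, stats = item
    # Negated numbers put higher points/wins first; the name breaks ties A to Z.
    return (-stats['points'], -stats['wins'], name)

ranked = dict(sorted(standings.items(), key=sort_key))
print(list(ranked))
# ['Iran', 'Portugal', 'Spain', 'Morocco']
```

Iran comes first on wins, Portugal and Spain are tied on both numbers and fall back to alphabetical order, and Morocco is last on points.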

You could use dict.items with a lambda instead:
import json # For pretty printing only.
table = {
'Iran': {
'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1
},
'Morocco': {
'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1
},
'Portugal': {
'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1
},
'Spain': {
'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1
}
}
sorted_table = dict(
sorted(table.items(),
key=lambda kvp: (-kvp[1]['points'], -kvp[1]['wins'], kvp[0])))
print(json.dumps(sorted_table, indent=4))
Output:
{
"Iran": {
"draws": 1,
"goal difference": 0,
"loses": 1,
"points": 4,
"wins": 1
},
"Morocco": {
"draws": 1,
"goal difference": 0,
"loses": 1,
"points": 4,
"wins": 1
},
"Portugal": {
"draws": 1,
"goal difference": 0,
"loses": 1,
"points": 4,
"wins": 1
},
"Spain": {
"draws": 1,
"goal difference": 0,
"loses": 1,
"points": 4,
"wins": 1
}
}
Note: the output is identical to the input because the input is already sorted.

Related

Pandas, get first and last column index for row value

I have the following dataframe:
import pandas as pd
columns = pd.date_range(start="2022-05-21", end="2022-06-30")
data = [
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
]
df = pd.DataFrame(data, columns=columns)
2022-05-21 2022-05-22 2022-05-23 ... 2022-06-28 2022-06-29 2022-06-30
0 0 0 0 ... 5 5 5
1 5 5 5 ... 1 1 1
2 5 5 5 ... 5 5 5
I have to take the first and last column index for every distinct value in the order they are. The correct output for this dataframe will be:
[
[
{'value': 0, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}
],
[
{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 1, 'start': '2022-06-20', 'end': '2022-06-30'}
],
[
{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}
]
]
My best approach for the moment is:
series_set = df.apply(frozenset, axis=1)
container = []
for index in range(len(df.index)):
    row = df.iloc[[index]]
    values = series_set.iloc[[index]]
    inner_container = []
    for value in values[index]:
        single_value_series = row[row.columns[row.isin([value]).all()]]
        dates = single_value_series.columns
        result = dict(value=value, start=dates[0].strftime("%Y-%m-%d"), end=dates[-1].strftime("%Y-%m-%d"))
        inner_container.append(result)
    container.append(inner_container)
The result is:
[
[
{'value': 0, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}
],
[
{'value': 1, 'start': '2022-06-20', 'end': '2022-06-30'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'}
],
[
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-05-21', 'end': '2022-06-30'}
]
]
It has several problems; only the first array is correct :)
When I convert each row to a frozenset, the original order is lost, and a value that appears more than once is kept only once.
I would appreciate any ideas and guidance. What I want to avoid is iterating over the dataframe.
Thank you!
You can first transpose the DataFrame with DataFrame.T, then aggregate the minimum and maximum index of each group, convert those dates to strings with Series.dt.strftime, and finally convert the result to dictionaries with DataFrame.to_dict.
To get the groups of consecutive equal values, each value is compared with the shifted (previous) value, and the result is accumulated with Series.cumsum.
df1 = df.T.reset_index()
L = [df1.groupby(df1[x].ne(df1[x].shift()).cumsum())
        .agg(value=(x, 'first'),
             start=('index', 'min'),
             end=('index', 'max'))
        .assign(start=lambda x: x['start'].dt.strftime('%Y-%m-%d'),
                end=lambda x: x['end'].dt.strftime('%Y-%m-%d'))
        .to_dict(orient='records') for x in df1.columns.drop('index')]
print (L)
[[{'value': 0, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}],
[{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 1, 'start': '2022-06-20', 'end': '2022-06-30'}],
[{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}]]
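The grouping key in the answer above can be hard to read inside the one-liner, so here is a small sketch of just that step on a toy Series (the values are made up for illustration). It shows how comparing with the shifted values and taking a cumulative sum labels each run of consecutive equal values, so a value that reappears later gets a new label.

```python
import pandas as pd

# Toy data: the value 5 occurs in two separate runs.
s = pd.Series([5, 5, 2, 2, 2, 5])

# True wherever a value differs from the previous one (the first row is always True).
starts = s.ne(s.shift())

# The running total of run starts gives every consecutive run its own label.
run_id = starts.cumsum()
print(run_id.tolist())
# [1, 1, 2, 2, 2, 3]

# Grouping by that label keeps the two runs of 5 apart.
print(s.groupby(run_id).first().tolist())
# [5, 2, 5]
```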

Iterate through the list

The transaction CSV looks like this, and I add the rows to a list as shown below.
Bread Milk
Bread Diapers Beer Eggs Beer
[{'Bread': 1, 'Milk': 1, '': 7}, {'Bread': 1, 'Diapers': 1, 'Beer': 6, 'Eggs': 1}, {'Milk': 1, 'Diapers': 1, 'Beer': 6, 'Cola': 1}, {'Bread': 1, 'Milk': 1, 'Diapers': 1, 'Beer': 6}, {'Bread': 1, 'Milk': 1, 'Diapers': 2, 'Cola': 1, 'Chips': 2, 'Beer': 1, '': 1}, {'Bread': 1, 'Milk': 1, '': 7}, {'Bread': 1, 'Cola': 1, 'Beer': 3, 'Milk': 1, 'Chips': 1, 'Diapers': 3, '': 1}, {'Milk': 1, 'Bread': 1, 'Beer': 4, 'Cola': 1, 'Diapers': 1, 'Chips': 1}, {'Bread': 1, 'Milk': 2, 'Diapers': 2, 'Beer': 2, 'Chips': 2}, {'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1}]
I would like to keep only the transactions in which the count of Diapers is 3.
I would expect only the transactions shown below to be returned:
{'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1}
{'Bread': 1, 'Cola': 1, 'Beer': 3, 'Milk': 1, 'Chips': 1, 'Diapers': 3, '': 1}
{'Bread', 'Beer', 'Diapers', 'Milk'}
{'Bread', 'Cola', 'Beer', 'Milk', 'Chips', 'Diapers', ''}
The code I have is:
import csv
from collections import Counter
def M():
    li = []
    # Open the csv file
    with open('transaction.csv') as fp:
        DataCaptured = csv.reader(fp, delimiter=',')
        # Iterate through each word in csv and add its counter to the row
        for row in DataCaptured:
            li.append(dict(Counter(row)))
            if li['Diaper']==3: ---> I am missing this logic not sure how to get it.
    # Return the list of counters
    return li
print(M())
li=[{'Bread': 1, 'Milk': 1, '': 7}, {'Bread': 1, 'Diapers': 1, 'Beer': 6, 'Eggs': 1}, {'Milk': 1, 'Diapers': 1, 'Beer': 6, 'Cola': 1}, {'Bread': 1, 'Milk': 1, 'Diapers': 1, 'Beer': 6}, {'Bread': 1, 'Milk': 1, 'Diapers': 2, 'Cola': 1, 'Chips': 2, 'Beer': 1, '': 1}, {'Bread': 1, 'Milk': 1, '': 7}, {'Bread': 1, 'Cola': 1, 'Beer': 3, 'Milk': 1, 'Chips': 1, 'Diapers': 3, '': 1}, {'Milk': 1, 'Bread': 1, 'Beer': 4, 'Cola': 1, 'Diapers': 1, 'Chips': 1}, {'Bread': 1, 'Milk': 2, 'Diapers': 2, 'Beer': 2, 'Chips': 2}, {'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1}]
for d in li:
    if 'Diapers' in d and d['Diapers']==3:
        print(d)
OUTPUT:
{'Bread': 1, 'Cola': 1, 'Beer': 3, 'Milk': 1, 'Chips': 1, 'Diapers': 3, '': 1}
{'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1}
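The same filter can also be written as a list comprehension, and the item-name sets the question asks for can be derived from the matches. This is only a sketch: the list below is a shortened, hypothetical stand-in for the full data, and the names matches and item_sets are introduced here for illustration.

```python
# Shortened stand-in for the transaction list in the question.
li = [
    {'Bread': 1, 'Milk': 1},
    {'Bread': 1, 'Cola': 1, 'Beer': 3, 'Diapers': 3},
    {'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1},
]

# dict.get returns None for a missing key, so no separate membership test is needed.
matches = [d for d in li if d.get('Diapers') == 3]

# Iterating a dict yields its keys, so set(d) is the set of item names.
item_sets = [set(d) for d in matches]

for d in matches:
    print(d)
```

Inside the original function, the equivalent change would be to append a row's counter only when its 'Diapers' count equals 3.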

pandas - pd.replace and TypeError

I have all_data dataframe. I want to replace some categorical values in certain columns with numerical values. I'm trying to use this nested dictionary notation (I've checked that the brackets and curly brackets are in place, I don't think that's the issue):
all_data = all_data.replace({'Street': {'Pave': 1, 'Grvl': 0}},
{'LotShape': {'IR3': 1, 'IR2': 2, 'IR1': 3, 'Reg': 4}},
{'Utilities': {'ELO': 0, 'NoSeWa': 0, 'NoSewr': 0, 'AllPub': 1}},
{'LandSlope': {'Sev': 1, 'Mod': 2, 'Gtl': 3}},
{'ExterQual': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}},
{'ExterCond': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}},
{'BsmtQual': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4,'Ex': 5}},
{'BsmtCond': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4,'Ex': 5}},
{'BsmtExposure': {'NA': 0, 'No': 1, 'Mn': 2, 'Av': 3, 'Gd': 4}},
{'BsmtFinType1': {'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6}},
{'BsmtFinType2': {'NA': 0, 'Unf': 1,'LwQ': 2,'Rec': 3, 'BLQ': 4,'ALQ': 5, 'GLQ': 6}},
{'HeatingQC': {'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'CentralAir': {'No': 0,'Yes': 1}},
{'KitchenQual': {'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'Functional': {'Sal': -7,'Sev': -6,'Maj1': -5,'Maj2': -4,'Mod': -3,'Min2': -2,'Min1': -1,
'Typ': 0}},
{'FireplaceQu': {'NA': 0,'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'GarageFinish': {'NA': 0,'Unf': 1,'RFn': 2, 'Fin': 3}},
{'GarageQual': {'NA': 0, 'Po': 1,'Fa': 2, 'TA': 3,'Gd': 4, 'Ex': 5}},
{'GarageCond': {'NA': 0,'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'PavedDrive': {'N': 0,'P': 0, 'Y': 1}},
{'Fence': {'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4}},
{'SaleCondition': {'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
'Partial': 0}}
)
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-f9c9c28b7237> in <module>()
22 {'Fence': {'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4}},
23 {'SaleCondition': {'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
---> 24 'Partial': 0}}
25 )
TypeError: replace() takes from 1 to 8 positional arguments but 23 were given
If I remove the 'SaleCondition' row from the above code, the error is again there but this time referring to 'Fence', and so on, for each line of code from bottom up. I've googled but have no idea what this means. Help MUCH appreciated.
You should do something like:
df.replace({'Fence':{'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4},'SaleCondition':{'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
'Partial': 0}})
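The format should be .replace({'col1':{},'col2':{}}), not .replace({'col1':{}},{'col2':{}}): pass a single dictionary with one key per column, rather than one dictionary per column as separate positional arguments.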
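A minimal sketch of the single-dictionary form on a toy frame follows. The two-row DataFrame is invented for illustration; only the column names and category codes are taken from the question.

```python
import pandas as pd

# Toy frame; dtype=object keeps the sketch independent of the default string dtype.
df = pd.DataFrame(
    {'Street': ['Pave', 'Grvl'], 'CentralAir': ['Yes', 'No']},
    dtype=object,
)

# One outer dictionary: column name -> {old value: new value}.
mapping = {
    'Street': {'Pave': 1, 'Grvl': 0},
    'CentralAir': {'No': 0, 'Yes': 1},
}

encoded = df.replace(mapping)
print(encoded['Street'].tolist(), encoded['CentralAir'].tolist())
```

Because everything lives in one dictionary, new columns can be added to mapping without changing the replace call.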

Python sort multi dimensional dict

input={11: {'perc': 0, 'name': u'B test', 'cid': 11, 'total': 0, 'pending': 0, 'complete': 0}, 10: {'perc': 0, 'name': u'C test', 'cid': 10, 'total': 0, 'pending': 0,'complete': 0}, 3: {'perc': 9, 'name': u'Atest Pre-requisites', 'cid': 3, 'total': 11, 'pending': 10, 'complete': 1}}
I want to sort this dict based on the name field. I'm new to Python; can anyone please help me?
First, you should avoid using the names of built-in functions (such as input) as variable names: input is not a reserved word, but after the assignment the name no longer refers to the built-in function input().
Also, a dictionary cannot be sorted in place. If you don't need the keys, you can transform the dictionary into a list, and then sort it. The code would be like this:
input_dict = {11: {'perc': 0, 'name': u'B test', 'cid': 11, 'total': 0, 'pending': 0, 'complete': 0}, 10: {'perc': 0, 'name': u'C test', 'cid': 10, 'total': 0, 'pending': 0,'complete': 0}, 3: {'perc': 9, 'name': u'Atest Pre-requisites', 'cid': 3, 'total': 11, 'pending': 10, 'complete': 1}}
input_list = sorted(input_dict.values(), key=lambda x: x['name'])
print(input_list)
# prints [{'perc': 9, 'complete': 1, 'cid': 3, 'total': 11, 'pending': 10, 'name': u'Atest Pre-requisites'}, {'perc': 0, 'complete': 0, 'cid': 11, 'total': 0, 'pending': 0, 'name': u'B test'}, {'perc': 0, 'complete': 0, 'cid': 10, 'total': 0, 'pending': 0, 'name': u'C test'}]
EDIT
If you wish to keep the keys and use iteritems() as you said in the comments, use this code instead (iteritems() exists only in Python 2; in Python 3 use items()):
input_dict = {11: {'perc': 0, 'name': u'B test', 'cid': 11, 'total': 0, 'pending': 0, 'complete': 0}, 10: {'perc': 0, 'name': u'C test', 'cid': 10, 'total': 0, 'pending': 0,'complete': 0}, 3: {'perc': 9, 'name': u'Atest Pre-requisites', 'cid': 3, 'total': 11, 'pending': 10, 'complete': 1}}
input_list = sorted(input_dict.iteritems(), key=lambda x: x[1]['name'])
print(input_list)
# prints [(3, {'perc': 9, 'complete': 1, 'cid': 3, 'total': 11, 'pending': 10, 'name': u'Atest Pre-requisites'}), (11, {'perc': 0, 'complete': 0, 'cid': 11, 'total': 0, 'pending': 0, 'name': u'B test'}), (10, {'perc': 0, 'complete': 0, 'cid': 10, 'total': 0, 'pending': 0, 'name': u'C test'})]
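For Python 3.7 or later, where dict preserves insertion order, the sorted pairs can be turned back into a dictionary. The sketch below assumes such a version and uses a trimmed-down copy of the data (only the name and cid fields) under the hypothetical name records.

```python
# Trimmed-down copy of the question's data (assumed names: records, by_name).
records = {
    11: {'name': 'B test', 'cid': 11},
    10: {'name': 'C test', 'cid': 10},
    3: {'name': 'Atest Pre-requisites', 'cid': 3},
}

# Sort the (key, value) pairs by the nested name, then rebuild a dict in that order.
by_name = dict(sorted(records.items(), key=lambda kv: kv[1]['name']))
print(list(by_name))
# [3, 11, 10]
```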

Specific Sort of elements to add in new list Python/Django

I have this :
[
[{ 'position': 1, 'user_id': 2, 'value': 4, 'points': 100}],
[{ 'position': 2, 'user_id': 6, 'value': 3, 'points': 88}],
[{ 'position': 3, 'user_id': 5, 'value': 2, 'points': 77}],
[{ 'position': 4, 'user_id': 7, 'value': 1, 'points': 66}],
[{ 'position': 5, 'user_id': 3, 'value': 1, 'points': 9}],
[{ 'position': 6, 'user_id': 11, 'value': 0, 'points': 9}],
[{ 'position': 7, 'user_id': 1, 'value': 0, 'points': 3}],
[{ 'position': 8, 'user_id': 10, 'value': 0, 'points': 3}],
[{ 'position': 9, 'user_id': 4, 'value': 0, 'points': 2}],
[{ 'position': 10, 'user_id': 8, 'value': 0, 'points': 2}]
]
It is ordered by points.
The idea is to choose the user_id and generate a new list with the selected 5 users.
Example:
user_id=3:
[{ 'position': 3, 'user_id': 5, 'value': 2, 'points': 77}],
[{ 'position': 4, 'user_id': 7, 'value': 1, 'points': 66}],
[{ 'position': 5, 'user_id': 3, 'value': 1, 'points': 9}],
[{ 'position': 6, 'user_id': 11, 'value': 0, 'points': 9}],
[{ 'position': 7, 'user_id': 1, 'value': 0, 'points': 3}]
It returns user_id 3 in the middle, with 2 users higher and 2 users lower.
user_id=2
[{ 'position': 1, 'user_id': 2, 'value': 4, 'points': 100}],
[{ 'position': 2, 'user_id': 6, 'value': 3, 'points': 88}],
[{ 'position': 3, 'user_id': 5, 'value': 2, 'points': 77}],
[{ 'position': 4, 'user_id': 7, 'value': 1, 'points': 66}],
[{ 'position': 5, 'user_id': 3, 'value': 1, 'points': 9}],
As this user_id has no higher users, it returns the 4 lower users. So the logic is always the same.
user_id=9:
[{ 'position': 6, 'user_id': 11, 'value': 0, 'points': 9}],
[{ 'position': 7, 'user_id': 1, 'value': 0, 'points': 3}],
[{ 'position': 8, 'user_id': 10, 'value': 0, 'points': 3}],
[{ 'position': 9, 'user_id': 4, 'value': 0, 'points': 2}],
[{ 'position': 10, 'user_id': 8, 'value': 0, 'points': 2}]
For user_id=9 we only have 1 lower user, so we add 3 higher users.
If, for example, we have just 2 users in the list, it should return those 2 users.
Main rules:
If we have 5 users or more, it has to return 5 users.
If we have 4 users, it has to return 4 users.
What is a good way to do it?
Thanks
This is basically only an update of my answer to your original question.
a = [
[{ 'position': 1, 'user_id': 2, 'value': 4, 'points': 100}],
[{ 'position': 2, 'user_id': 6, 'value': 3, 'points': 88}],
[{ 'position': 3, 'user_id': 5, 'value': 2, 'points': 77}],
[{ 'position': 4, 'user_id': 7, 'value': 1, 'points': 66}],
[{ 'position': 5, 'user_id': 3, 'value': 1, 'points': 9}],
[{ 'position': 6, 'user_id': 11, 'value': 0, 'points': 9}],
[{ 'position': 7, 'user_id': 1, 'value': 0, 'points': 3}],
[{ 'position': 8, 'user_id': 10, 'value': 0, 'points': 3}],
[{ 'position': 9, 'user_id': 4, 'value': 0, 'points': 2}],
[{ 'position': 10, 'user_id': 8, 'value': 0, 'points': 2}]
]
# Sort it if not already sorted
# a.sort(key=lambda x: x[0]['position'])
def find_index(l, user_id):
    i = 0
    while l[i][0]['user_id'] != user_id:
        i += 1
    return i
def get_subset(l, i):
    return l[:(i + 1 + max(2, 4 - i))][-5:]
get_subset(a, find_index(a, 3))
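To see how the slice behaves at the edges of the list, here is a hedged sketch that reuses the same slicing expression on a reduced copy of the data (only position and user_id are kept). The helper window_positions and the None return for an unknown user are additions for illustration, not part of the answer above.

```python
def find_index(rows, user_id):
    # Variant that returns None for an unknown user instead of raising IndexError.
    for i, row in enumerate(rows):
        if row[0]['user_id'] == user_id:
            return i
    return None

def get_subset(rows, i):
    # Take at least 2 rows below the user (more near the top), then keep the last 5.
    return rows[:(i + 1 + max(2, 4 - i))][-5:]

def window_positions(rows, user_id):
    # Hypothetical helper: positions of the rows shown for the given user.
    i = find_index(rows, user_id)
    if i is None:
        return []
    return [row[0]['position'] for row in get_subset(rows, i)]

# Same ranking order of user ids as in the question, without value and points.
user_ids = [2, 6, 5, 7, 3, 11, 1, 10, 4, 8]
rows = [[{'position': p, 'user_id': u}] for p, u in enumerate(user_ids, start=1)]

print(window_positions(rows, 3))      # user in the middle of the list
print(window_positions(rows, 2))      # first place: only lower users exist
print(window_positions(rows, 8))      # last place: only higher users exist
print(window_positions(rows[:2], 6))  # list shorter than 5 users
```

The slice never fails on short lists because Python slicing clamps out-of-range bounds, which is why a list with fewer than 5 users is simply returned whole.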
