Renaming key and sub key in python nested dictionary with iterations - python

I am trying to rename the keys and sub keys in a Python nested dictionary. However, I haven't got the result that I expected yet. Below is the original nested dictionary that I have.
nested_dict = {
0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
}
I am trying to change the keys and sub keys to other values, like this.
nested_dict = {
4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56},
5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21},
6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93},
7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}
}
What I have in mind is renaming the key using a list. I have tried to replace the key and subkey with a list below:
new_key = []
for i in range(4,8):
    new_key.append(i)
However, I still haven't got it. Another idea is using pandas DataFrame to rename both key and subkey. I am not sure whether using lists or pandas is suitable for the given problem.

Code for renaming a single key:
mydict[new_key] = mydict.pop(old_key)
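Applied to both levels of the nested dictionary, that pop-based idiom could look like the minimal sketch below. The name `offset` and the shortened sample data are illustrative assumptions, and the approach presumes that the new keys never overlap the old ones. Iterating over a `list(...)` snapshot of the keys avoids mutating a dictionary while looping over it.

```python
# Shortened sample data (illustrative, not the full dictionary from the question).
nested_dict = {
    0: {0: 33.97, 1: 55.32},
    1: {0: 27.31, 1: 23.32},
}
offset = 4  # hypothetical shift; assumes new keys never collide with old ones

# Rename the sub keys first, in place.
for inner in nested_dict.values():
    for old_key in list(inner):  # snapshot of the keys, so mutating inner is safe
        inner[old_key + offset] = inner.pop(old_key)

# Then rename the outer keys the same way.
for old_key in list(nested_dict):
    nested_dict[old_key + offset] = nested_dict.pop(old_key)

print(nested_dict)
```

Unlike a comprehension, this modifies the existing dictionary object, which matters if other code holds a reference to it.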

You could use a (nested) dict comprehension (see PEP 274 -- Dict Comprehensions). Note that it generates a new dictionary (but you can assign it to the old variable):
>>> from pprint import pprint as pp
>>>
>>> nested_dict = {
... 0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
... 1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
... 2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
... 3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
... }
>>>
>>> pp(nested_dict)
{0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}}
>>>
>>> modified_nested_dict = {k0 + 4: {k1 + 4: v1 for k1, v1 in v0.items()} for k0, v0 in nested_dict.items()}
>>>
>>> pp(modified_nested_dict)
{4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56},
5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21},
6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93},
7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}}
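If the new keys come from a list rather than from a fixed offset, the same comprehension can look them up in a mapping. The following is only a sketch: `rename_nested_keys` is a hypothetical helper, and it assumes that every sub key also appears as an outer key, as in the question's data.

```python
def rename_nested_keys(nested, new_keys):
    """Return a new two-level dict with outer and inner keys renamed.

    Old keys are paired with new_keys in sorted order. Assumes every
    inner key also occurs as an outer key (true for the question's data).
    """
    old_keys = sorted(nested)
    if len(old_keys) != len(new_keys):
        raise ValueError("need exactly one new key per old key")
    mapping = dict(zip(old_keys, new_keys))
    return {
        mapping[k0]: {mapping[k1]: v1 for k1, v1 in inner.items()}
        for k0, inner in nested.items()
    }


# Shortened sample data (illustrative only).
sample = {0: {0: 33.97, 1: 55.32}, 1: {0: 27.31, 1: 23.32}}
renamed = rename_nested_keys(sample, list(range(4, 6)))
print(renamed)
```

Because the result is a new dictionary, the new keys may overlap the old ones without any risk of overwriting entries during the rename.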

You can use a pandas DataFrame for the desired task, as follows:
import pandas as pd
nested_dict = {
0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
}
print("Dictionary before renaming: ", nested_dict)
# Convert nested dictionary to Pandas Dataframe
my_dataframe = pd.DataFrame.from_dict(nested_dict)
new_keys = list(range(4, 8)) # List of new keys
my_dataframe.columns = new_keys # Set columns to the new keys
my_dataframe.set_index([new_keys], inplace=True) # Set index to the new keys
nested_dict = my_dataframe.to_dict() # Convert back to nested dictionary
print("Dictionary after renaming: ", nested_dict)
This gives you the following expected output:
Dictionary before renaming: {0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56}, 1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21}, 2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93}, 3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}}
Dictionary after renaming: {4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56}, 5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21}, 6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93}, 7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}}

Related

How to Fix Code to Avoid Stubnames Error (Python Pandas)?

dt = {'id': {0: 'x1', 1: 'x2', 2: 'x3', 3: 'x4', 4: 'x5', 5: 'x6', 6: 'x7', 7: 'x8', 8: 'x9', 9: 'x10'}, 'trt': {0: 'cnt', 1: 'cnt', 2: 'tr', 3: 'tr', 4: 'tr', 5: 'cnt', 6: 'tr', 7: 'tr', 8: 'cnt', 9: 'cnt'}, 'work.T1': {0: 0.6516556669957936, 1: 0.567737752571702, 2: 0.1135089821182191, 3: 0.5959253052715212, 4: 0.3580499750096351, 5: 0.4288094183430075, 6: 0.0519033221062272, 7: 0.2641776674427092, 8: 0.3987907308619469, 9: 0.8361341434065253}, 'play.T1': {0: 0.8647212258074433, 1: 0.6153524168767035, 2: 0.7751098964363337, 3: 0.3555686913896352, 4: 0.4058499720413238, 5: 0.7066469138953835, 6: 0.8382876652758569, 7: 0.2395891312044114, 8: 0.7707715332508087, 9: 0.3558977444190532}, 'talk.T1': {0: 0.5355970377568156, 1: 0.0930881295353174, 2: 0.169803041499108, 3: 0.8998324507847428, 4: 0.4226376069709658, 5: 0.7477464678231627, 6: 0.8226525799836963, 7: 0.9546536463312804, 8: 0.6854445093777031, 9: 0.5005032296758145}, 'work.T2': {0: 0.2754838624969125, 1: 0.2289039448369294, 2: 0.0144339059479534, 3: 0.7289645625278354, 4: 0.2498804717324674, 5: 0.1611832766793668, 6: 0.0170426501426845, 7: 0.4861003451514989, 8: 0.1029001718852669, 9: 0.8015470046084374}, 'play.T2': {0: 0.3543280649464577, 1: 0.9364325392525644, 2: 0.2458663922734558, 3: 0.4731414613779634, 4: 0.191560871200636, 5: 0.5832219698932022, 6: 0.4594731898978352, 7: 0.467434047954157, 8: 0.3998325555585325, 9: 0.5052855962421745}, 'talk.T2': {0: 0.0318881559651345, 1: 0.1144675880204886, 2: 0.468935475917533, 3: 0.3969867376144975, 4: 0.8336191941052675, 5: 0.7611217433586717, 6: 0.5733564489055425, 7: 0.447508045937866, 8: 0.0838020080700516, 9: 0.2191385473124683}}
mydt = pd.DataFrame(dt, columns = ['id', 'trt', 'work.T1', '', 'play.T1', 'talk.T1','work.T2', '', 'play.T2', 'talk.T2'])
So I have the above dataset and need to tidy it up. I have used the following code but it returns "ValueError: stubname can't be identical to a column name." How can I fix the code to avoid this problem?
names = ['play', 'talk', 'work']
activities = pd.wide_to_long(dt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
activities
Note: I am trying to get the dataframe to look like the following.
I changed:
activities = pd.wide_to_long(activities, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
to:
activities = pd.wide_to_long(mydt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
and then it works.
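For reference, here is a small self-contained sketch of the same `pd.wide_to_long` call. The two-row frame and its values are made up for illustration and only stand in for the dataset above. The suffix is written as a raw string so that `\d` reaches the regex engine unchanged.

```python
import pandas as pd

# Trimmed, made-up stand-in for the dataset in the question (two ids only).
wide = pd.DataFrame({
    "id": ["x1", "x2"],
    "trt": ["cnt", "tr"],
    "work.T1": [0.1, 0.2],
    "play.T1": [0.3, 0.4],
    "work.T2": [0.5, 0.6],
    "play.T2": [0.7, 0.8],
})

# Stubnames are the prefixes before the separator. None of them may equal
# an existing column name, otherwise wide_to_long raises ValueError.
names = ["play", "work"]

long_df = (
    pd.wide_to_long(wide, stubnames=names, i="id", j="time",
                    sep=".", suffix=r"T\d")  # raw string for the regex
    .sort_index()
    .reset_index()
)
print(long_df)
```

Each id now has one row per time suffix, with one column per stubname.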

Join pandas dataframes according to common index value only

I have the following dataframes (this is just test data). In my real data, some index values are repeated a few times inside dataframe 1 and dataframe 2, and this causes repeated/duplicate rows in the final dataframe.
DataFrame 1:
pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
'first_name': {0: 'Jennee',
1: 'Dagny',
2: 'Correy',
3: 'Pall',
4: 'Julie',
5: 'Janene',
6: 'Lemmy',
7: 'Coleman',
8: 'Beck',
9: 'Che'},
'last_name': {0: 'Strelitzki',
1: 'Dunsire',
2: 'Wickrath',
3: 'Jopp',
4: 'Gheeraert',
5: 'Gawith',
6: 'Farrow',
7: 'Legging',
8: 'Beckwith',
9: 'Burgoin'},
'email': {0: 'jstrelitzki0#google.de',
1: 'ddunsire1#geocities.com',
2: 'cwickrath2#github.com',
3: 'pjopp3#infoseek.co.jp',
4: 'jgheeraert4#theatlantic.com',
5: 'jgawith5#sciencedirect.com',
6: 'lfarrow6#wikimedia.org',
7: 'clegging7#businessinsider.com',
8: 'bbeckwith8#zdnet.com',
9: 'cburgoin9#reference.com'},
'gender': {0: 'Male',
1: 'Female',
2: 'Female',
3: 'Female',
4: 'Female',
5: 'Female',
6: 'Male',
7: 'Female',
8: 'Polygender',
9: 'Male'},
'ip_address': {0: '8.99.68.120',
1: '188.238.129.48',
2: '87.159.243.249',
3: '66.37.174.94',
4: '233.77.128.104',
5: '190.202.131.98',
6: '84.175.231.196',
7: '140.178.100.5',
8: '81.211.179.167',
9: '31.219.69.206'},
'Boolean': {0: False,
1: False,
2: True,
3: True,
4: False,
5: True,
6: True,
7: False,
8: False,
9: False}})
DataFrame 2:
pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
'Model': {0: 2005,
1: 2007,
2: 2011,
3: 2003,
4: 1998,
5: 1992,
6: 1992,
7: 1992,
8: 2008,
9: 1996},
'Make': {0: 'Cadillac',
1: 'Lexus',
2: 'Dodge',
3: 'Dodge',
4: 'Oldsmobile',
5: 'Volkswagen',
6: 'Chevrolet',
7: 'Suzuki',
8: 'Ford',
9: 'Mazda'},
'Colour': {0: 'Red',
1: 'Red',
2: 'Crimson',
3: 'Red',
4: 'Purple',
5: 'Crimson',
6: 'Red',
7: 'Aquamarine',
8: 'Puce',
9: 'Maroon'}})
The two dataframes should be joined only on the Index values that are common to both. That means any index values that don't match across the two dataframes should not appear in the final combined/merged dataframe.
I want to ensure that the final dataframe is unique, and only captures combinations of columns, based on unique Index values.
When I try using the following code, the output is supposed to 'inner join' based on the unique index found in both dataframes.
final = pd.merge(df1, df2, left_index=True, right_index=True)
However, when I apply the above merge to my larger (other) pandas dataframes, many rows are repeated/duplicated multiple times. When the merge happens a few times with more dataframes, rows with the same Index value get repeated very frequently.
I am expecting to see one Index value returned per row (with all the column combinations from each dataframe).
I am not sure why this happens. I can confirm that there is nothing wrong with the datasets.
Is there a better technique for merging those two dataframes based only on common index values, while ensuring that I don't repeat any rows (with the same index) in my final dataframe? This merging often creates a giant final CSV file of around 20GB, too. The source files are only around 15MB in total.
Any help is much appreciated.
My end output should look like this (please copy and use this as Pandas DF):
pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
'first_name': {0: 'Jennee',
1: 'Dagny',
2: 'Correy',
3: 'Pall',
4: 'Julie',
5: 'Janene',
6: 'Lemmy',
7: 'Coleman',
8: 'Beck',
9: 'Che'},
'last_name': {0: 'Strelitzki',
1: 'Dunsire',
2: 'Wickrath',
3: 'Jopp',
4: 'Gheeraert',
5: 'Gawith',
6: 'Farrow',
7: 'Legging',
8: 'Beckwith',
9: 'Burgoin'},
'email': {0: 'jstrelitzki0#google.de',
1: 'ddunsire1#geocities.com',
2: 'cwickrath2#github.com',
3: 'pjopp3#infoseek.co.jp',
4: 'jgheeraert4#theatlantic.com',
5: 'jgawith5#sciencedirect.com',
6: 'lfarrow6#wikimedia.org',
7: 'clegging7#businessinsider.com',
8: 'bbeckwith8#zdnet.com',
9: 'cburgoin9#reference.com'},
'gender': {0: 'Male',
1: 'Female',
2: 'Female',
3: 'Female',
4: 'Female',
5: 'Female',
6: 'Male',
7: 'Female',
8: 'Polygender',
9: 'Male'},
'ip_address': {0: '8.99.68.120',
1: '188.238.129.48',
2: '87.159.243.249',
3: '66.37.174.94',
4: '233.77.128.104',
5: '190.202.131.98',
6: '84.175.231.196',
7: '140.178.100.5',
8: '81.211.179.167',
9: '31.219.69.206'},
'Boolean': {0: False,
1: False,
2: True,
3: True,
4: False,
5: True,
6: True,
7: False,
8: False,
9: False},
'Model': {0: 2005,
1: 2007,
2: 2011,
3: 2003,
4: 1998,
5: 1992,
6: 1992,
7: 1992,
8: 2008,
9: 1996},
'Make': {0: 'Cadillac',
1: 'Lexus',
2: 'Dodge',
3: 'Dodge',
4: 'Oldsmobile',
5: 'Volkswagen',
6: 'Chevrolet',
7: 'Suzuki',
8: 'Ford',
9: 'Mazda'},
'Colour': {0: 'Red',
1: 'Red',
2: 'Crimson',
3: 'Red',
4: 'Purple',
5: 'Crimson',
6: 'Red',
7: 'Aquamarine',
8: 'Puce',
9: 'Maroon'}})
This is expected behavior with non-unique index values. Since you have 3 ID1 rows in one df and 2 ID1 rows in the other, you end up with 6 ID1 rows in your merged df. If you add validate="one_to_one" to pd.merge(), you will get this error:
MergeError: Merge keys are not unique in either left or right dataset; not a one-to-one merge
All other validations fail except for many-to-many.
If it makes sense for your data, you can use the left_on, and right_on parameters to find unique combinations and give you a one-to-one if that's what you're after.
Edit after your new data:
Now that you have unique ids, this should work for you. Notice it doesn't throw a validation error.
final = pd.merge(df1, df2, left_on=['id'], right_on=['id'], validate='one_to_one')
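The row multiplication and the validation error can be reproduced with a small sketch. The frames `left` and `right` and their values are made up for illustration. Keeping only the first row per id with `drop_duplicates` is an assumption about what is acceptable for the data, not a general fix.

```python
import pandas as pd

# Made-up frames: id 1 appears three times on the left and twice on the right.
left = pd.DataFrame({"id": [1, 1, 1, 2], "a": ["p", "q", "r", "s"]})
right = pd.DataFrame({"id": [1, 1, 2], "b": ["x", "y", "z"]})

# An inner merge pairs every left row with every matching right row:
# 3 * 2 rows for id 1 plus 1 * 1 row for id 2.
merged = pd.merge(left, right, on="id")
print(len(merged))

# validate="one_to_one" turns the silent blow-up into an explicit error.
try:
    pd.merge(left, right, on="id", validate="one_to_one")
    validation_failed = False
except pd.errors.MergeError as exc:
    validation_failed = True
    print("MergeError:", exc)

# One possible remedy (assumption: the first row per id is the one to keep).
deduped = pd.merge(
    left.drop_duplicates("id"),
    right.drop_duplicates("id"),
    on="id",
    validate="one_to_one",
)
print(deduped)
```

Repeating such a merge across several dataframes multiplies the row counts again each time, which is consistent with a few megabytes of input growing into a very large output file.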

Pandas Loc string giving KeyError

I am trying to pass into a url a date in the format 2015-12-20, search the pandas dataframe and do a model.predict on it.
The problem is that I am trying to convert working code from JupyterLab into a .py file in order to run everything on a Flask server, and the following lookup does not carry over.
The following code only works if the 'Date' column is converted to datetime. If it is in object format, the following code also doesn't work.
data.loc[2015-12-06]
The above works but the following gives an error:
data.loc['2015-12-06']
KeyError: '2015-12-06'
How do I pass in the 2015-12-06 not as string for the .loc to work?
print(data.head(5).to_dict())
{'Date': {0: '2015-12-27', 1: '2015-12-20', 2: '2015-12-13', 3: '2015-12-06', 4: '2015-11-29'}, 'Total Volume': {0: 64236.62, 1: 54876.98, 2: 118220.22, 3: 78992.15, 4: 51039.6}, '4046': {0: 1036.74, 1: 674.28, 2: 794.7, 3: 1132.0, 4: 941.48}, '4225': {0: 54454.85, 1: 44638.81, 2: 109149.67, 3: 71976.41, 4: 43838.39}, '4770': {0: 48.16, 1: 58.33, 2: 130.5, 3: 72.58, 4: 75.78}, 'Total Bags': {0: 8696.87, 1: 9505.56, 2: 8145.35, 3: 5811.16, 4: 6183.95}, 'Small Bags': {0: 8603.62, 1: 9408.07, 2: 8042.21, 3: 5677.4, 4: 5986.26}, 'Large Bags': {0: 93.25, 1: 97.49, 2: 103.14, 3: 133.76, 4: 197.69}, 'XLarge Bags': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'type': {0: 'conventional', 1: 'conventional', 2: 'conventional', 3: 'conventional', 4: 'conventional'}, 'year': {0: 2015, 1: 2015, 2: 2015, 3: 2015, 4: 2015}, 'region': {0: 'Albany', 1: 'Albany', 2: 'Albany', 3: 'Albany', 4: 'Albany'}}
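Judging from the to_dict() output above, 'Date' is an ordinary column and the index is the default 0, 1, 2, ...; since .loc looks labels up in the index, that would explain the KeyError. The sketch below shows two possible ways around it, using only two of the columns from the question. It assumes the dates stay as strings; if the column is converted to datetime, the lookup value would need converting as well.

```python
import pandas as pd

# Two columns copied from the question's sample; the rest are omitted.
data = pd.DataFrame({
    "Date": ["2015-12-27", "2015-12-20", "2015-12-13", "2015-12-06", "2015-11-29"],
    "Total Volume": [64236.62, 54876.98, 118220.22, 78992.15, 51039.6],
})

# Option 1: filter on the column with a boolean mask.
matches = data.loc[data["Date"] == "2015-12-06"]

# Option 2: make the dates the index, then look the label up directly.
by_date = data.set_index("Date")
row = by_date.loc["2015-12-06"]

print(matches)
print(row["Total Volume"])
```

The boolean mask always returns a DataFrame, even for a single match, which may be more convenient as input to model.predict.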

Pandas export Excel to CSV - merged cells

I would like to save excel to CSV but make up the cells
My code:
import pandas as pd
import openpyxl
in_xls = 'excel01.xlsx'
sheet = 'Arkusz1'
with pd.ExcelFile(in_xls, engine="openpyxl") as ex:
    excel = pd.read_excel(ex, sheet, header=None)
excel.to_csv('excel_out.csv', index=False)
My Excel file, as a dictionary (load it with pd.DataFrame(d)):
d = {0: {0: 'TEST\nPpp666', 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 1: {0: 39191012, 1: 39191012, 2: 39191012, 3: 39191012, 4: 39191012, 5: 39191012}, 2: {0: 5906194003016, 1: 5906194003023, 2: 5906194003030, 3: 5906194003054, 4: 5906194003085, 5: 5906194003115}, 3: {0: 'DN-113H181-0019018', 1: 'DN-113H182-0019018', 2: 'DN-113H183-0019018', 3: 'DN-113H185-0019018', 4: 'DN-113H188-0019018', 5: 'DN-113H18T-K019018'}, 4: {0: 'Pierwszy, drugi\nTrzeci\nCzwarty', 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 5: {0: 'czarny', 1: 'czerwony', 2: 'niebieski', 3: 'biały ', 4: 'żółty', 5: 'Tęcza 5 kolorów'}, 6: {0: 100, 1: 100, 2: 100, 3: 100, 4: 100, 5: 20}, 7: {0: '19mmx18m', 1: '19mmx18m', 2: '19mmx18m', 3: '19mmx18m', 4: '19mmx18m', 5: '19mmx18m'}, 8: {0: '5,80', 1: '5,80', 2: '5,80', 3: '5,80', 4: '5,80', 5: '29,40'}}
My output file csv:
0,1,2,3,4,5,6,7,8
"TEST
Ppp666",39191012,5906194003016,DN-113H181-0019018,"Pierwszy, drugi
Trzeci
Czwarty",czarny,100,19mmx18m,5.8
,39191012,5906194003023,DN-113H182-0019018,,czerwony,100,19mmx18m,5.8
,39191012,5906194003030,DN-113H183-0019018,,niebieski,100,19mmx18m,5.8
,39191012,5906194003054,DN-113H185-0019018,,biały ,100,19mmx18m,5.8
,39191012,5906194003085,DN-113H188-0019018,,żółty,100,19mmx18m,5.8
,39191012,5906194003115,DN-113H18T-K019018,,Tęcza 5 kolorów,20,19mmx18m,29.4
I would like to get a CSV file

Python - How can I plot a line graph properly with a dictionary?

I am trying to plot a line graph to show the trends of each key of a dictionary in Jupyter Notebook with Python. This is what I have in the k_rmse_values variable as shown below:
k_rmse_values = {'bore': {1: 8423.759328233446,
3: 6501.928933614838,
5: 6807.187615513473,
7: 6900.29659028346,
9: 7134.8868708101645},
'city-mpg': {1: 4265.365592771621,
3: 3865.0178306330113,
5: 3720.409335758634,
7: 3819.183283405616,
9: 4219.677972675927},
'compression-rate': {1: 7016.906657495168,
3: 7319.354017489066,
5: 6301.624922763969,
7: 6133.006310754547,
9: 6417.253959732598},
'curb-weight': {1: 3950.9888180049306,
3: 4201.343428000144,
5: 4047.052502155118,
7: 3842.0974736649846,
9: 3943.9478256384205},
'engine-size': {1: 2853.7338453331627,
3: 2793.6254775629623,
5: 3123.320055069605,
7: 2941.73029681235,
9: 2931.996240628853},
'height': {1: 6330.178232877807,
3: 7049.500497198366,
5: 6869.570862695864,
7: 6738.641089739572,
9: 6344.062937760911},
'highway-mpg': {1: 4826.0580187146525,
3: 3510.253629329685,
5: 3379.2250123364083,
7: 4044.271135312068,
9: 4462.027046251678},
'horsepower': {1: 3623.6389886411143,
3: 4294.825669466819,
5: 4778.254807521257,
7: 4730.538701514935,
9: 4662.8601512508885},
'length': {1: 4952.798701744297,
3: 5403.624431188139,
5: 5500.731909846179,
7: 5103.4515274528885,
9: 4471.077661709427},
'normalized-losses': {1: 9604.929081466453,
3: 7494.820436511842,
5: 6391.912634697067,
7: 6699.853883298577,
9: 6861.6389834002875},
'peak-rpm': {1: 8041.2366213164005,
3: 7502.080095843049,
5: 6521.863037752326,
7: 6869.602542315512,
9: 6884.533017667794},
'stroke': {1: 10330.231237489314,
3: 8947.585146097614,
5: 6973.912792744113,
7: 7266.333478250421,
9: 7026.017456146411},
'wheel-base': {1: 2797.4144312203725,
3: 3392.8627620671928,
5: 4238.25624378706,
7: 4456.687059524217,
9: 4426.032222634904},
'width': {1: 2849.2691940215127,
3: 4076.59327053035,
5: 3979.9751617315405,
7: 3845.3326184519606,
9: 3687.926625900343}}
When I used this code to plot
for k,v in k_rmse_values.items():
    x = list(v.keys())
    y = list(v.values())
    plt.plot(x,y)
    plt.xlabel('k value')
    plt.ylabel('RMSE')
it doesn't plot from 1 to 9 in order; it plots in this k-value order: 1, 3, 9, 5, 7.
I have spent hours on this problem and still can't figure out a way to do it. Your help with this would be greatly appreciated.
One solution is to sort the keys and get the matching values:
for k, v in k_rmse_values.items():
    xs = sorted(v.keys())  # sorted() returns a new list; list.sort() returns None
    ys = [v[x] for x in xs]
    plt.plot(xs, ys)
    # Note: I renamed these lists, so any later uses should be changed accordingly
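The sorting step can be checked on its own, without any plotting. In this sketch `sorted_xy` is a hypothetical helper, and the sample values are the 'bore' entries rounded to two decimals and deliberately inserted in the 1, 3, 9, 5, 7 order described in the question.

```python
def sorted_xy(series):
    """Return the keys of `series` in ascending order and the matching values."""
    xs = sorted(series)               # new, ascending list of keys
    ys = [series[x] for x in xs]      # values in the same order as xs
    return xs, ys


# Rounded 'bore' values, inserted out of order on purpose.
sample = {1: 8423.76, 3: 6501.93, 9: 7134.89, 5: 6807.19, 7: 6900.30}

xs, ys = sorted_xy(sample)
print(xs)  # [1, 3, 5, 7, 9]
print(ys)
```

Passing `xs` and `ys` to plt.plot then draws each line from left to right, because the points arrive in ascending k order.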
