Python - How can I plot a line graph properly with a dictionary? - python

I am trying to plot a line graph to show the trends of each key of a dictionary in Jupyter Notebook with Python. This is what I have in the k_rmse_values variable as shown below:
k_rmse_values =
{'bore': {1: 8423.759328233446,
3: 6501.928933614838,
5: 6807.187615513473,
7: 6900.29659028346,
9: 7134.8868708101645},
'city-mpg': {1: 4265.365592771621,
3: 3865.0178306330113,
5: 3720.409335758634,
7: 3819.183283405616,
9: 4219.677972675927},
'compression-rate': {1: 7016.906657495168,
3: 7319.354017489066,
5: 6301.624922763969,
7: 6133.006310754547,
9: 6417.253959732598},
'curb-weight': {1: 3950.9888180049306,
3: 4201.343428000144,
5: 4047.052502155118,
7: 3842.0974736649846,
9: 3943.9478256384205},
'engine-size': {1: 2853.7338453331627,
3: 2793.6254775629623,
5: 3123.320055069605,
7: 2941.73029681235,
9: 2931.996240628853},
'height': {1: 6330.178232877807,
3: 7049.500497198366,
5: 6869.570862695864,
7: 6738.641089739572,
9: 6344.062937760911},
'highway-mpg': {1: 4826.0580187146525,
3: 3510.253629329685,
5: 3379.2250123364083,
7: 4044.271135312068,
9: 4462.027046251678},
'horsepower': {1: 3623.6389886411143,
3: 4294.825669466819,
5: 4778.254807521257,
7: 4730.538701514935,
9: 4662.8601512508885},
'length': {1: 4952.798701744297,
3: 5403.624431188139,
5: 5500.731909846179,
7: 5103.4515274528885,
9: 4471.077661709427},
'normalized-losses': {1: 9604.929081466453,
3: 7494.820436511842,
5: 6391.912634697067,
7: 6699.853883298577,
9: 6861.6389834002875},
'peak-rpm': {1: 8041.2366213164005,
3: 7502.080095843049,
5: 6521.863037752326,
7: 6869.602542315512,
9: 6884.533017667794},
'stroke': {1: 10330.231237489314,
3: 8947.585146097614,
5: 6973.912792744113,
7: 7266.333478250421,
9: 7026.017456146411},
'wheel-base': {1: 2797.4144312203725,
3: 3392.8627620671928,
5: 4238.25624378706,
7: 4456.687059524217,
9: 4426.032222634904},
'width': {1: 2849.2691940215127,
3: 4076.59327053035,
5: 3979.9751617315405,
7: 3845.3326184519606,
9: 3687.926625900343}}
When I used this code to plot:
for k,v in k_rmse_values.items():
    x = list(v.keys())
    y = list(v.values())
    plt.plot(x,y)
    plt.xlabel('k value')
    plt.ylabel('RMSE')
the k values are not plotted in order from 1 to 9. Instead, the graph connects the points in this k-value order: 1, 3, 9, 5, 7.
I have spent hours on this problem and still can't figure out a way to do it. Your help with this would be greatly appreciated.

One solution is to sort the keys and look up the matching values:
for k,v in k_rmse_values.items():
    xs = sorted(v.keys())  # list.sort() sorts in place and returns None, so use sorted()
    ys = [v[x] for x in xs]
    # Note: I renamed these lists, so the later uses (e.g. plt.plot) should be changed accordingly
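The sorting step can also be written so that each key stays paired with its value. The sketch below is only an illustration: it uses a shortened stand-in for k_rmse_values with rounded values and deliberately unordered inner keys, and the helper name sorted_xy is hypothetical, not part of the original answer.

```python
# Shortened stand-in for k_rmse_values (values rounded, inner keys deliberately unordered).
k_rmse_values = {
    'bore': {1: 8423.76, 3: 6501.93, 9: 7134.89, 5: 6807.19, 7: 6900.30},
    'width': {9: 3687.93, 1: 2849.27, 7: 3845.33, 3: 4076.59, 5: 3979.98},
}


def sorted_xy(series):
    """Return (xs, ys) with xs in ascending order and ys matched to xs."""
    pairs = sorted(series.items())  # sort the (k, rmse) pairs by k
    xs = [k for k, _ in pairs]
    ys = [rmse for _, rmse in pairs]
    return xs, ys


# One (xs, ys) pair per feature, ready to be plotted.
curves = {name: sorted_xy(series) for name, series in k_rmse_values.items()}
for name, (xs, ys) in curves.items():
    print(name, xs, ys)
```

Each (xs, ys) pair can then be passed to plt.plot(xs, ys, label=name) in place of the unsorted lists.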

Related

How to Fix Code to Avoid Stubnames Error (Python Pandas)?

dt = {'id': {0: 'x1', 1: 'x2', 2: 'x3', 3: 'x4', 4: 'x5', 5: 'x6', 6: 'x7', 7: 'x8', 8: 'x9', 9: 'x10'}, 'trt': {0: 'cnt', 1: 'cnt', 2: 'tr', 3: 'tr', 4: 'tr', 5: 'cnt', 6: 'tr', 7: 'tr', 8: 'cnt', 9: 'cnt'}, 'work.T1': {0: 0.6516556669957936, 1: 0.567737752571702, 2: 0.1135089821182191, 3: 0.5959253052715212, 4: 0.3580499750096351, 5: 0.4288094183430075, 6: 0.0519033221062272, 7: 0.2641776674427092, 8: 0.3987907308619469, 9: 0.8361341434065253}, 'play.T1': {0: 0.8647212258074433, 1: 0.6153524168767035, 2: 0.7751098964363337, 3: 0.3555686913896352, 4: 0.4058499720413238, 5: 0.7066469138953835, 6: 0.8382876652758569, 7: 0.2395891312044114, 8: 0.7707715332508087, 9: 0.3558977444190532}, 'talk.T1': {0: 0.5355970377568156, 1: 0.0930881295353174, 2: 0.169803041499108, 3: 0.8998324507847428, 4: 0.4226376069709658, 5: 0.7477464678231627, 6: 0.8226525799836963, 7: 0.9546536463312804, 8: 0.6854445093777031, 9: 0.5005032296758145}, 'work.T2': {0: 0.2754838624969125, 1: 0.2289039448369294, 2: 0.0144339059479534, 3: 0.7289645625278354, 4: 0.2498804717324674, 5: 0.1611832766793668, 6: 0.0170426501426845, 7: 0.4861003451514989, 8: 0.1029001718852669, 9: 0.8015470046084374}, 'play.T2': {0: 0.3543280649464577, 1: 0.9364325392525644, 2: 0.2458663922734558, 3: 0.4731414613779634, 4: 0.191560871200636, 5: 0.5832219698932022, 6: 0.4594731898978352, 7: 0.467434047954157, 8: 0.3998325555585325, 9: 0.5052855962421745}, 'talk.T2': {0: 0.0318881559651345, 1: 0.1144675880204886, 2: 0.468935475917533, 3: 0.3969867376144975, 4: 0.8336191941052675, 5: 0.7611217433586717, 6: 0.5733564489055425, 7: 0.447508045937866, 8: 0.0838020080700516, 9: 0.2191385473124683}}
mydt = pd.DataFrame(dt, columns = ['id', 'trt', 'work.T1', '', 'play.T1', 'talk.T1','work.T2', '', 'play.T2', 'talk.T2'])
So I have the above dataset and need to tidy it up. I have used the following code but it returns "ValueError: stubname can't be identical to a column name." How can I fix the code to avoid this problem?
names = ['play', 'talk', 'work']
activities = pd.wide_to_long(dt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
activities
Note: I am trying to get the dataframe to look like the following.
Pass the original wide DataFrame mydt to pd.wide_to_long. I changed:
activities = pd.wide_to_long(activities, stubnames=names, i='id', j='time', sep='.', suffix=r'T\d').sort_index().reset_index()
to:
activities = pd.wide_to_long(mydt, stubnames=names, i='id', j='time', sep='.', suffix=r'T\d').sort_index().reset_index()
and then it works.
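The behaviour can be reproduced on a small frame. The sketch below is an illustration under assumptions: wide is a hypothetical, shortened stand-in for mydt with rounded values, and tidy is a name chosen here. It reshapes the wide frame once, then calls pd.wide_to_long again on the already reshaped result to show when the stubname error appears.

```python
import pandas as pd

# Hypothetical, shortened stand-in for mydt: two ids, two activities, two time points.
wide = pd.DataFrame({
    'id': ['x1', 'x2'],
    'trt': ['cnt', 'tr'],
    'work.T1': [0.65, 0.57],
    'play.T1': [0.86, 0.62],
    'work.T2': [0.28, 0.23],
    'play.T2': [0.35, 0.94],
})
names = ['play', 'work']

# First call: the columns are still 'work.T1', 'play.T1', ..., so no stubname
# is identical to a column name and the reshape succeeds.
tidy = pd.wide_to_long(wide, stubnames=names, i='id', j='time',
                       sep='.', suffix=r'T\d').sort_index().reset_index()
print(tidy)

# Second call on the reshaped frame: 'play' and 'work' are now real column
# names, which is the situation that raises the ValueError.
try:
    pd.wide_to_long(tidy, stubnames=names, i='id', j='time',
                    sep='.', suffix=r'T\d')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

In a notebook this situation can arise when a cell that overwrites a variable with the reshaped result is run a second time.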

Multiple plotting from multi-index dataframe

I want to plot by Step Typ: Traction and Stribeck. Each load stage should have its own plot. Within each load level, the line plots should be broken down by temperature. The y-axis is Traction (-) and the x-axis is the respective counterpart: SRR (%) for Traction and Rolling Speed (mm/s) for Stribeck. In the end, I should have four different plots.
An example of how it should look:
My attempt so far, which leads to an empty plot:
import pandas as pd
import matplotlib.pyplot as plt
data = {'Step 1': {'Step Typ': 'Traction', 'SRR (%)': {1: 8.384, 2: 9.815, 3: 7.531, 4: 10.209, 5: 7.989, 6: 7.331, 7: 5.008, 8: 2.716, 9: 9.6, 10: 7.911}, 'Traction (-)': {1: 5.602, 2: 6.04, 3: 2.631, 4: 2.952, 5: 8.162, 6: 9.312, 7: 4.994, 8: 2.959, 9: 10.075, 10: 5.498}, 'Temperature': 30, 'Load': 40}, 'Step 3': {'Step Typ': 'Traction', 'SRR (%)': {1: 2.909, 2: 5.552, 3: 5.656, 4: 9.043, 5: 3.424, 6: 7.382, 7: 3.916, 8: 2.665, 9: 4.832, 10: 3.993}, 'Traction (-)': {1: 9.158, 2: 6.721, 3: 7.787, 4: 7.491, 5: 8.267, 6: 2.985, 7: 5.882, 8: 3.591, 9: 6.334, 10: 10.43}, 'Temperature': 80, 'Load': 40}, 'Step 5': {'Step Typ': 'Traction', 'SRR (%)': {1: 4.765, 2: 9.293, 3: 7.608, 4: 7.371, 5: 4.87, 6: 4.832, 7: 6.244, 8: 6.488, 9: 5.04, 10: 2.962}, 'Traction (-)': {1: 6.656, 2: 7.872, 3: 8.799, 4: 7.9, 5: 4.22, 6: 6.288, 7: 7.439, 8: 7.77, 9: 5.977, 10: 9.395}, 'Temperature': 30, 'Load': 70}, 'Step 7': {'Step Typ': 'Traction', 'SRR (%)': {1: 9.46, 2: 2.83, 3: 3.249, 4: 9.273, 5: 8.792, 6: 9.673, 7: 6.784, 8: 3.838, 9: 8.779, 10: 4.82}, 'Traction (-)': {1: 5.245, 2: 8.491, 3: 10.088, 4: 9.988, 5: 4.886, 6: 4.168, 7: 8.628, 8: 5.038, 9: 7.712, 10: 3.961}, 'Temperature': 80, 'Load': 70}, 'Step 2': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.862, 2: 4.71, 3: 4.537, 4: 6.35, 5: 6.691, 6: 5.337, 7: 8.419, 8: 10.303, 9: 5.018, 10: 10.195}, 'Traction (-)': {1: 6.674, 2: 10.137, 3: 2.822, 4: 5.494, 5: 9.986, 6: 9.095, 7: 3.53, 8: 6.96, 9: 8.251, 10: 7.836}, 'Temperature': 30, 'Load': 40}, 'Step 4': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.04, 2: 8.288, 3: 3.731, 4: 10.137, 5: 5.32, 6: 8.504, 7: 5.917, 8: 9.677, 9: 8.641, 10: 7.685}, 'Traction (-)': {1: 9.522, 2: 4.749, 3: 3.46, 4: 3.21, 5: 5.005, 6: 9.886, 7: 8.023, 8: 5.935, 9: 8.74, 10: 5.117}, 'Temperature': 80, 'Load': 40}, 'Step 6': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 6.244, 2: 7.015, 3: 5.998, 4: 4.894, 5: 6.117, 6: 6.644, 7: 7.619, 8: 10.477, 9: 9.61, 10: 2.958}, 'Traction (-)': 
{1: 7.353, 2: 7.98, 3: 6.675, 4: 8.853, 5: 7.537, 6: 5.256, 7: 4.923, 8: 10.293, 9: 2.873, 10: 10.407}, 'Temperature': 30, 'Load': 70}, 'Step 8': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 3.475, 2: 2.756, 3: 7.809, 4: 9.449, 5: 2.72, 6: 4.133, 7: 10.139, 8: 10.0, 9: 3.71, 10: 8.267}, 'Traction (-)': {1: 6.307, 2: 2.83, 3: 9.258, 4: 3.405, 5: 9.659, 6: 6.662, 7: 6.413, 8: 6.488, 9: 7.972, 10: 6.288}, 'Temperature': 80, 'Load': 70} }
df = pd.DataFrame(data)
items = list()
series = list()
for item, d in data.items():
    items.append(item)
    series.append(pd.DataFrame.from_dict(d))
df = pd.concat(series, keys=items)
df.set_index(['Step Typ', 'Load', 'Temperature'], inplace=True)
df.loc[('Stribeck')]
for force, _ in df.groupby(level=1):
    plt.figure(figsize=(15, 12))
    for i, row in df.loc[('Traction'), force].iterrows():
        plt.ylim(0, 0.1)
        plt.ylabel('Traction Coeff (-)')
        plt.xlabel('Rolling Speed (mm/s)')
        plt.title('Title comes later', loc='left')
        plt.plot(row['Rolling Speed (mm/s)'], row['Traction (-)'], label=f"{i} - {force}")
        print(f"{i} - {force}")
    plt.show()
I have changed your plotting loop. The code below generates two plots for Traction (one for each Load value), each with two curves (one for each temperature). I have commented out the line that sets ylim(a, b), because it can lead to an empty plot if the data fall outside the (a, b) range.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Step 1': {'Step Typ': 'Traction', 'SRR (%)': {1: 8.384, 2: 9.815, 3: 7.531, 4: 10.209, 5: 7.989, 6: 7.331, 7: 5.008, 8: 2.716, 9: 9.6, 10: 7.911}, 'Traction (-)': {1: 5.602, 2: 6.04, 3: 2.631, 4: 2.952, 5: 8.162, 6: 9.312, 7: 4.994, 8: 2.959, 9: 10.075, 10: 5.498}, 'Temperature': 30, 'Load': 40}, 'Step 3': {'Step Typ': 'Traction', 'SRR (%)': {1: 2.909, 2: 5.552, 3: 5.656, 4: 9.043, 5: 3.424, 6: 7.382, 7: 3.916, 8: 2.665, 9: 4.832, 10: 3.993}, 'Traction (-)': {1: 9.158, 2: 6.721, 3: 7.787, 4: 7.491, 5: 8.267, 6: 2.985, 7: 5.882, 8: 3.591, 9: 6.334, 10: 10.43}, 'Temperature': 80, 'Load': 40}, 'Step 5': {'Step Typ': 'Traction', 'SRR (%)': {1: 4.765, 2: 9.293, 3: 7.608, 4: 7.371, 5: 4.87, 6: 4.832, 7: 6.244, 8: 6.488, 9: 5.04, 10: 2.962}, 'Traction (-)': {1: 6.656, 2: 7.872, 3: 8.799, 4: 7.9, 5: 4.22, 6: 6.288, 7: 7.439, 8: 7.77, 9: 5.977, 10: 9.395}, 'Temperature': 30, 'Load': 70}, 'Step 7': {'Step Typ': 'Traction', 'SRR (%)': {1: 9.46, 2: 2.83, 3: 3.249, 4: 9.273, 5: 8.792, 6: 9.673, 7: 6.784, 8: 3.838, 9: 8.779, 10: 4.82}, 'Traction (-)': {1: 5.245, 2: 8.491, 3: 10.088, 4: 9.988, 5: 4.886, 6: 4.168, 7: 8.628, 8: 5.038, 9: 7.712, 10: 3.961}, 'Temperature': 80, 'Load': 70}, 'Step 2': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.862, 2: 4.71, 3: 4.537, 4: 6.35, 5: 6.691, 6: 5.337, 7: 8.419, 8: 10.303, 9: 5.018, 10: 10.195}, 'Traction (-)': {1: 6.674, 2: 10.137, 3: 2.822, 4: 5.494, 5: 9.986, 6: 9.095, 7: 3.53, 8: 6.96, 9: 8.251, 10: 7.836}, 'Temperature': 30, 'Load': 40}, 'Step 4': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.04, 2: 8.288, 3: 3.731, 4: 10.137, 5: 5.32, 6: 8.504, 7: 5.917, 8: 9.677, 9: 8.641, 10: 7.685}, 'Traction (-)': {1: 9.522, 2: 4.749, 3: 3.46, 4: 3.21, 5: 5.005, 6: 9.886, 7: 8.023, 8: 5.935, 9: 8.74, 10: 5.117}, 'Temperature': 80, 'Load': 40}, 'Step 6': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 6.244, 2: 7.015, 3: 5.998, 4: 4.894, 5: 6.117, 6: 6.644, 7: 7.619, 8: 10.477, 9: 9.61, 10: 2.958}, 'Traction (-)': 
{1: 7.353, 2: 7.98, 3: 6.675, 4: 8.853, 5: 7.537, 6: 5.256, 7: 4.923, 8: 10.293, 9: 2.873, 10: 10.407}, 'Temperature': 30, 'Load': 70}, 'Step 8': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 3.475, 2: 2.756, 3: 7.809, 4: 9.449, 5: 2.72, 6: 4.133, 7: 10.139, 8: 10.0, 9: 3.71, 10: 8.267}, 'Traction (-)': {1: 6.307, 2: 2.83, 3: 9.258, 4: 3.405, 5: 9.659, 6: 6.662, 7: 6.413, 8: 6.488, 9: 7.972, 10: 6.288}, 'Temperature': 80, 'Load': 70} }
df = pd.DataFrame(data)
items = list()
series = list()
for item, d in data.items():
    items.append(item)
    series.append(pd.DataFrame.from_dict(d))
df = pd.concat(series, keys=items)
df.set_index(['Step Typ', 'Load', 'Temperature'], inplace=True)
for force, _ in df.groupby(level=1):
    fig, ax = plt.subplots(figsize=(8, 6))
    df_step = df.loc[('Traction'), force]
    for temperature in df_step.index.unique():
        df_temp = df_step.loc[temperature].sort_values('SRR (%)')
        # ax.set_ylim(0, 0.1)
        ax.set_ylabel('Traction Coeff (-)')
        ax.set_xlabel('SRR (%)')
        ax.set_title('Title comes later', loc='left')
        ax.plot(df_temp['SRR (%)'], df_temp['Traction (-)'], label = f'T = {df_temp.index.unique().values[0]}°C - Load = {force}')
        ax.legend(frameon = True)
    plt.show()

How to highlight or increase visibility of a single line in a pandas multiple line plot?

I have the following df, weekly spend in a number of shops:
shop1 shop2 shop3 shop4 shop5 shop6 shop7 \
date_week
2 4328.85 5058.17 3028.68 2513.28 4204.10 1898.26 2209.75
3 5472.00 5085.59 3874.51 1951.60 2984.71 1416.40 1199.42
4 4665.53 4264.05 2781.70 2958.25 4593.46 2365.88 2079.73
5 5769.36 3460.79 3072.47 1866.19 3803.12 2166.84 1716.71
6 6267.00 4033.58 4053.70 2215.04 3991.31 2382.02 1974.92
7 5436.83 4402.83 3225.98 1761.87 4202.22 2430.71 3091.33
8 4850.43 4900.68 3176.00 3280.95 3483.53 4115.09 2594.01
9 6782.88 3800.03 3865.65 2221.43 4116.28 2638.28 2321.55
10 6248.18 4096.60 5186.52 3224.96 3614.24 2541.00 2708.36
11 4505.18 2889.33 2937.74 2418.34 5565.57 1570.55 1371.54
12 3115.26 1216.82 1759.49 2559.81 1403.61 1550.77 478.34
13 4561.82 827.16 4661.51 3197.90 1515.63 1688.57 247.25
shop8 shop9
date_week
2 3578.81 3134.39
3 4625.10 2676.20
4 3417.16 3870.00
5 3980.78 3439.60
6 3899.42 4192.41
7 4190.60 3989.00
8 4786.40 3484.51
9 6433.02 3474.66
10 4414.19 3809.20
11 3590.10 3414.50
12 4297.57 2094.00
13 3963.27 871.25
If I plot these as a line plot (a "spaghetti plot"), it works fine.
The goal is to look at the trend in weekly sales over the last three months in 9 stores.
I'd like to highlight some lines in this plot to make them stand out. For example, I can see that shop1 and shop3 (blue and green) have increased towards the end, so I want them to be more visible.
newgraph.plot()
I am trying to replicate the method here to highlight lines on plots, but it's not working.
Apparently you can just re-plot the single lines you are interested in after plotting all lines.
df.plot()
plt.plot(df.index, df['shop1'], marker='', color='orange', linewidth=4, alpha=0.7)
plt.plot(df.index, df['shop3'], marker='', color='orange', linewidth=4, alpha=0.7)
But it doesn't change the plot the way the tutorial shows. The plot is exactly the same.
I've tried all sorts of line widths and alphas. I've also tried to extract the series first, then plot it:
df.plot()
extraline = df['shop1']
plt.plot(extraline.index, extraline.values, marker='', color='orange', linewidth=4, alpha=0.7)
But this doesn't work either.
What am I doing wrong?
My df, if somebody wants to test it out with pd.DataFrame.from_dict():
{'shop1': {2: 4328.849999999999,
3: 5472.0,
4: 4665.530000000001,
5: 5769.36,
6: 6267.0,
7: 5436.83,
8: 4850.43,
9: 6782.879999999999,
10: 6248.18,
11: 4505.18,
12: 3115.26,
13: 4561.82},
'shop2': {2: 5058.169999999993,
3: 5085.589999999996,
4: 4264.049999999997,
5: 3460.7899999999977,
6: 4033.579999999998,
7: 4402.829999999999,
8: 4900.679999999997,
9: 3800.0299999999997,
10: 4096.5999999999985,
11: 2889.3300000000004,
12: 1216.8200000000002,
13: 827.16},
'shop3': {2: 3028.679999999997,
3: 3874.5099999999984,
4: 2781.6999999999994,
5: 3072.4699999999984,
6: 4053.6999999999966,
7: 3225.9799999999987,
8: 3175.9999999999973,
9: 3865.6499999999974,
10: 5186.519999999996,
11: 2937.74,
12: 1759.49,
13: 4661.509999999998},
'shop4': {2: 2513.2799999999997,
3: 1951.6000000000001,
4: 2958.25,
5: 1866.1900000000003,
6: 2215.04,
7: 1761.8700000000001,
8: 3280.9499999999994,
9: 2221.43,
10: 3224.9600000000005,
11: 2418.3399999999997,
12: 2559.8099999999995,
13: 3197.9},
'shop5': {2: 4204.0999999999985,
3: 2984.71,
4: 4593.459999999999,
5: 3803.12,
6: 3991.31,
7: 4202.219999999999,
8: 3483.529999999999,
9: 4116.279999999999,
10: 3614.24,
11: 5565.569999999997,
12: 1403.6100000000001,
13: 1515.63},
'shop6': {2: 1898.260000000001,
3: 1416.4000000000005,
4: 2365.8799999999997,
5: 2166.84,
6: 2382.019999999999,
7: 2430.71,
8: 4115.0899999999965,
9: 2638.2800000000007,
10: 2541.0,
11: 1570.5500000000004,
12: 1550.7700000000002,
13: 1688.5700000000004},
'shop7': {2: 2209.75,
3: 1199.42,
4: 2079.7300000000005,
5: 1716.7100000000005,
6: 1974.9200000000005,
7: 3091.329999999999,
8: 2594.0099999999993,
9: 2321.5499999999997,
10: 2708.3599999999983,
11: 1371.5400000000004,
12: 478.34,
13: 247.25000000000003},
'shop8': {2: 3578.8100000000004,
3: 4625.1,
4: 3417.1599999999994,
5: 3980.7799999999997,
6: 3899.4200000000005,
7: 4190.600000000001,
8: 4786.4,
9: 6433.019999999998,
10: 4414.1900000000005,
11: 3590.1,
12: 4297.57,
13: 3963.27},
'shop9': {2: 3134.3900000000003,
3: 2676.2,
4: 3870.0,
5: 3439.6,
6: 4192.41,
7: 3989.0,
8: 3484.51,
9: 3474.66,
10: 3809.2,
11: 3414.5,
12: 2094.0,
13: 871.25}}
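The re-plotting idea described in the question can be sketched as follows. This is an illustration under assumptions, not a confirmed diagnosis of the problem: it uses a shortened frame with three shops and three weeks (values taken from the table above), draws every line on one explicitly created Axes, and the names highlight and ax are chosen here. One thing worth checking in a notebook is that all plotting calls run in the same cell, since the inline backend renders a separate figure for each cell.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Shortened stand-in for the weekly spend data (three shops, weeks 2 to 4).
df = pd.DataFrame({
    'shop1': {2: 4328.85, 3: 5472.00, 4: 4665.53},
    'shop2': {2: 5058.17, 3: 5085.59, 4: 4264.05},
    'shop3': {2: 3028.68, 3: 3874.51, 4: 2781.70},
})
highlight = ['shop1', 'shop3']  # lines that should stand out

fig, ax = plt.subplots()
for name in df.columns:
    if name in highlight:
        # Thick line drawn above the others, listed in the legend.
        ax.plot(df.index, df[name], linewidth=4, alpha=0.7, label=name, zorder=3)
    else:
        # Thin gray background line, left out of the legend.
        ax.plot(df.index, df[name], color='lightgray', linewidth=1, zorder=1)

ax.set_xlabel('date_week')
ax.set_ylabel('weekly spend')
ax.legend()
```

Drawing the background lines in a muted color is a design choice of this sketch; keeping their default colors and only thickening the highlighted lines would follow the same pattern.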

KeyError: 0 when trying to plot multiple histograms

I am having trouble plotting multiple histograms. I get the error message:
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()
KeyError: 0
This is the code I wrote:
xaxes = ['price','bedrooms','sqft_living','sqft_lot','floors','waterfront',
         'view','condition','grade','sqft_above','sqft_basement','yr_built',
         'yr_renovated','zipcode','lat','long','sqft_living15','sqft_loft15']
a,b = plt.subplots(4,5)
b = b.ravel()
for idx,ax in enumerate(b):
    ax.hist(file[idx])
    ax.set_title(titles[idx])
    ax.set_xlabel(xaxes[id])
plt.tight_layout()
Here is a sample of my data:
{'bathrooms': {0: 1.0,
1: 2.25,
2: 1.0,
3: 3.0,
4: 2.0,
5: 4.5,
6: 2.25,
7: 1.5,
8: 1.0,
9: 2.5},
'bedrooms': {0: 3, 1: 3, 2: 2, 3: 4, 4: 3, 5: 4, 6: 3, 7: 3, 8: 3, 9: 3},
'condition': {0: 3, 1: 3, 2: 3, 3: 5, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3, 9: 3},
'date': {0: '20141013T000000',
1: '20141209T000000',
2: '20150225T000000',
3: '20141209T000000',
4: '20150218T000000',
5: '20140512T000000',
6: '20140627T000000',
7: '20150115T000000',
8: '20150415T000000',
9: '20150312T000000'},
'floors': {0: 1.0,
1: 2.0,
2: 1.0,
3: 1.0,
4: 1.0,
5: 1.0,
6: 2.0,
7: 1.0,
8: 1.0,
9: 2.0},
'grade': {0: 7, 1: 7, 2: 6, 3: 7, 4: 8, 5: 11, 6: 7, 7: 7, 8: 7, 9: 7},
'id': {0: 7129300520,
1: 6414100192,
2: 5631500400,
3: 2487200875,
4: 1954400510,
5: 7237550310,
6: 1321400060,
7: 2008000270,
8: 2414600126,
9: 3793500160},
'lat': {0: 47.511200000000002,
1: 47.721000000000004,
2: 47.737900000000003,
3: 47.520800000000001,
4: 47.616799999999998,
5: 47.656100000000002,
6: 47.309699999999999,
7: 47.409500000000001,
8: 47.512300000000003,
9: 47.368400000000001},
'long': {0: -122.25700000000001,
1: -122.319,
2: -122.23299999999999,
3: -122.39299999999999,
4: -122.045,
5: -122.005,
6: -122.32700000000001,
7: -122.315,
8: -122.337,
9: -122.03100000000001},
'price': {0: 221900.0,
1: 538000.0,
2: 180000.0,
3: 604000.0,
4: 510000.0,
5: 1230000.0,
6: 257500.0,
7: 291850.0,
8: 229500.0,
9: 323000.0},
'sqft_above': {0: 1180,
1: 2170,
2: 770,
3: 1050,
4: 1680,
5: 3890,
6: 1715,
7: 1060,
8: 1050,
9: 1890},
'sqft_basement': {0: 0,
1: 400,
2: 0,
3: 910,
4: 0,
5: 1530,
6: 0,
7: 0,
8: 730,
9: 0},
'sqft_living': {0: 1180,
1: 2570,
2: 770,
3: 1960,
4: 1680,
5: 5420,
6: 1715,
7: 1060,
8: 1780,
9: 1890},
'sqft_living15': {0: 1340,
1: 1690,
2: 2720,
3: 1360,
4: 1800,
5: 4760,
6: 2238,
7: 1650,
8: 1780,
9: 2390},
'sqft_lot': {0: 5650,
1: 7242,
2: 10000,
3: 5000,
4: 8080,
5: 101930,
6: 6819,
7: 9711,
8: 7470,
9: 6560},
'sqft_lot15': {0: 5650,
1: 7639,
2: 8062,
3: 5000,
4: 7503,
5: 101930,
6: 6819,
7: 9711,
8: 8113,
9: 7570},
'view': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0},
'waterfront': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0},
'yr_built': {0: 1955,
1: 1951,
2: 1933,
3: 1965,
4: 1987,
5: 2001,
6: 1995,
7: 1963,
8: 1960,
9: 2003},
'yr_renovated': {0: 0,
1: 1991,
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0,
8: 0,
9: 0},
'zipcode': {0: 98178,
1: 98125,
2: 98028,
3: 98136,
4: 98074,
5: 98053,
6: 98003,
7: 98198,
8: 98146,
9: 98038}}
If I understand you correctly and the variable file holds your data in a pandas DataFrame, then you have simply run into a problem with how that DataFrame is indexed.
file[idx] asks for the column whose label is idx (0, 1, 2, ...). Your columns are labelled with names such as 'price', so no column 0 exists and pandas raises KeyError: 0. Select the column by position with file.iloc[:, idx], or by name with file[xaxes[idx]].
See the pandas documentation on indexing and selecting data for more details.
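The difference between label-based and position-based column selection can be checked on a small frame. The sketch below is an illustration, assuming file is a DataFrame with string column names; it uses only two columns and three rows from the sample data, and the variable names by_position, by_name and caught are chosen here.

```python
import pandas as pd

# Two columns and three rows taken from the sample data in the question.
file = pd.DataFrame({
    'price': [221900.0, 538000.0, 180000.0],
    'bedrooms': [3, 3, 2],
})

# file[0] looks for a column labelled 0; the labels are strings, so it fails.
try:
    file[0]
    caught = False
except KeyError as exc:
    caught = True
    print('KeyError:', exc)

# Positional selection: first column, whatever its label is.
by_position = file.iloc[:, 0]

# Label selection: the column named 'price'.
by_name = file['price']

print(by_position.equals(by_name))
```

Looping over the column names directly (for example with zip over the axes and the names) avoids the integer index altogether.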

Renaming key and sub key in python nested dictionary with iterations

I am trying to rename the keys and subkeys in a Python nested dictionary. However, I haven't got the result that I expected yet. Below is the original nested dictionary that I have.
nested_dict = {
0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
}
I am trying to change the keys and subkeys to other values, like this:
nested_dict = {
4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56},
5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21},
6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93},
7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}
}
What I have in mind is renaming the keys using a list. I have tried to build the list of replacement keys below:
new_key = []
for i in range(4,8):
    new_key.append(i)
However, I still haven't got it. Another idea is using pandas DataFrame to rename both key and subkey. I am not sure whether using lists or pandas is suitable for the given problem.
Code for renaming a key from here:
mydict[new_key] = mydict.pop(old_key)
You could use a (nested) dict comprehension ([Python]: PEP 274 -- Dict Comprehensions). Note that it generates a new dictionary (but you can assign it to the old variable):
>>> from pprint import pprint as pp
>>>
>>> nested_dict = {
... 0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
... 1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
... 2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
... 3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
... }
>>>
>>> pp(nested_dict)
{0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}}
>>>
>>> modified_nested_dict = {k0 + 4: {k1 + 4: v1 for k1, v1 in v0.items()} for k0, v0 in nested_dict.items()}
>>>
>>> pp(modified_nested_dict)
{4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56},
5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21},
6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93},
7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}}
You can use Pandas Dataframe for the desired task, as follows:
import pandas as pd
nested_dict = {
0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
}
print("Dictionary before renaming: ", nested_dict)
# Convert nested dictionary to Pandas Dataframe
my_dataframe = pd.DataFrame.from_dict(nested_dict)
new_keys = list(range(4, 8)) # List of new keys
my_dataframe.columns = new_keys # Set columns to the new keys
my_dataframe.set_index([new_keys], inplace=True) # Set index to the new keys
nested_dict = my_dataframe.to_dict() # Convert back to nested dictionary
print("Dictionary after renaming: ", nested_dict)
This gives you the following expected output:
Dictionary before renaming: {0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56}, 1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21}, 2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93}, 3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}}
Dictionary after renaming: {4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56}, 5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21}, 6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93}, 7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}}
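When the new keys come from a list rather than a fixed offset, the same nested comprehension can be driven by an explicit old-to-new mapping. The sketch below is one possible variant under that assumption; the helper name rename_keys and the variable mapping are hypothetical and not part of the answers above.

```python
nested_dict = {
    0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
    1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
    2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
    3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35},
}


def rename_keys(nested, mapping):
    """Return a new nested dict with outer and inner keys replaced via mapping."""
    return {
        mapping[outer]: {mapping[inner]: value for inner, value in row.items()}
        for outer, row in nested.items()
    }


new_keys = list(range(4, 8))                          # replacement keys as a list
mapping = dict(zip(sorted(nested_dict), new_keys))    # {0: 4, 1: 5, 2: 6, 3: 7}

renamed = rename_keys(nested_dict, mapping)
print(renamed)
```

The mapping pairs the sorted old keys with the new keys by position, so the two sequences must have the same length; a key missing from the mapping raises a KeyError instead of being silently dropped.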
