I want to plot by Step Typ: Traction and Stribeck. Each load stage should have its own plot. Within each load level, the line plots should be broken down by temperature. The y-axis is Traction (-) and the x-axis is the respective counterpart: SRR (%) for Traction and Rolling speed (mm/s) for Stribeck. In the end, I should have four different plots.
Example of how it should look:
My attempt so far, which leads to an empty plot.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Step 1': {'Step Typ': 'Traction', 'SRR (%)': {1: 8.384, 2: 9.815, 3: 7.531, 4: 10.209, 5: 7.989, 6: 7.331, 7: 5.008, 8: 2.716, 9: 9.6, 10: 7.911}, 'Traction (-)': {1: 5.602, 2: 6.04, 3: 2.631, 4: 2.952, 5: 8.162, 6: 9.312, 7: 4.994, 8: 2.959, 9: 10.075, 10: 5.498}, 'Temperature': 30, 'Load': 40}, 'Step 3': {'Step Typ': 'Traction', 'SRR (%)': {1: 2.909, 2: 5.552, 3: 5.656, 4: 9.043, 5: 3.424, 6: 7.382, 7: 3.916, 8: 2.665, 9: 4.832, 10: 3.993}, 'Traction (-)': {1: 9.158, 2: 6.721, 3: 7.787, 4: 7.491, 5: 8.267, 6: 2.985, 7: 5.882, 8: 3.591, 9: 6.334, 10: 10.43}, 'Temperature': 80, 'Load': 40}, 'Step 5': {'Step Typ': 'Traction', 'SRR (%)': {1: 4.765, 2: 9.293, 3: 7.608, 4: 7.371, 5: 4.87, 6: 4.832, 7: 6.244, 8: 6.488, 9: 5.04, 10: 2.962}, 'Traction (-)': {1: 6.656, 2: 7.872, 3: 8.799, 4: 7.9, 5: 4.22, 6: 6.288, 7: 7.439, 8: 7.77, 9: 5.977, 10: 9.395}, 'Temperature': 30, 'Load': 70}, 'Step 7': {'Step Typ': 'Traction', 'SRR (%)': {1: 9.46, 2: 2.83, 3: 3.249, 4: 9.273, 5: 8.792, 6: 9.673, 7: 6.784, 8: 3.838, 9: 8.779, 10: 4.82}, 'Traction (-)': {1: 5.245, 2: 8.491, 3: 10.088, 4: 9.988, 5: 4.886, 6: 4.168, 7: 8.628, 8: 5.038, 9: 7.712, 10: 3.961}, 'Temperature': 80, 'Load': 70}, 'Step 2': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.862, 2: 4.71, 3: 4.537, 4: 6.35, 5: 6.691, 6: 5.337, 7: 8.419, 8: 10.303, 9: 5.018, 10: 10.195}, 'Traction (-)': {1: 6.674, 2: 10.137, 3: 2.822, 4: 5.494, 5: 9.986, 6: 9.095, 7: 3.53, 8: 6.96, 9: 8.251, 10: 7.836}, 'Temperature': 30, 'Load': 40}, 'Step 4': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.04, 2: 8.288, 3: 3.731, 4: 10.137, 5: 5.32, 6: 8.504, 7: 5.917, 8: 9.677, 9: 8.641, 10: 7.685}, 'Traction (-)': {1: 9.522, 2: 4.749, 3: 3.46, 4: 3.21, 5: 5.005, 6: 9.886, 7: 8.023, 8: 5.935, 9: 8.74, 10: 5.117}, 'Temperature': 80, 'Load': 40}, 'Step 6': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 6.244, 2: 7.015, 3: 5.998, 4: 4.894, 5: 6.117, 6: 6.644, 7: 7.619, 8: 10.477, 9: 9.61, 10: 2.958}, 'Traction (-)': 
{1: 7.353, 2: 7.98, 3: 6.675, 4: 8.853, 5: 7.537, 6: 5.256, 7: 4.923, 8: 10.293, 9: 2.873, 10: 10.407}, 'Temperature': 30, 'Load': 70}, 'Step 8': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 3.475, 2: 2.756, 3: 7.809, 4: 9.449, 5: 2.72, 6: 4.133, 7: 10.139, 8: 10.0, 9: 3.71, 10: 8.267}, 'Traction (-)': {1: 6.307, 2: 2.83, 3: 9.258, 4: 3.405, 5: 9.659, 6: 6.662, 7: 6.413, 8: 6.488, 9: 7.972, 10: 6.288}, 'Temperature': 80, 'Load': 70} }
df = pd.DataFrame(data)
items = list()
series = list()
for item, d in data.items():
    items.append(item)
    series.append(pd.DataFrame.from_dict(d))
df = pd.concat(series, keys=items)
df.set_index(['Step Typ', 'Load', 'Temperature'], inplace=True)
df.loc[('Stribeck')]
for force, _ in df.groupby(level=1):
    plt.figure(figsize=(15, 12))
    for i, row in df.loc[('Traction'), force].iterrows():
        plt.ylim(0, 0.1)
        plt.ylabel('Traction Coeff (-)')
        plt.xlabel('Rolling Speed (mm/s)')
        plt.title('Title comes later', loc='left')
        plt.plot(row['Rolling Speed (mm/s)'], row['Traction (-)'], label=f"{i} - {force}")
        print(f"{i} - {force}")
    plt.show()
I have changed your plotting loop. The code below generates two plots for Traction (one per Load value), each with two curves (one per temperature). I have commented out the line that sets ylim(a, b), because it produces an empty plot whenever the data fall outside the (a, b) range.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Step 1': {'Step Typ': 'Traction', 'SRR (%)': {1: 8.384, 2: 9.815, 3: 7.531, 4: 10.209, 5: 7.989, 6: 7.331, 7: 5.008, 8: 2.716, 9: 9.6, 10: 7.911}, 'Traction (-)': {1: 5.602, 2: 6.04, 3: 2.631, 4: 2.952, 5: 8.162, 6: 9.312, 7: 4.994, 8: 2.959, 9: 10.075, 10: 5.498}, 'Temperature': 30, 'Load': 40}, 'Step 3': {'Step Typ': 'Traction', 'SRR (%)': {1: 2.909, 2: 5.552, 3: 5.656, 4: 9.043, 5: 3.424, 6: 7.382, 7: 3.916, 8: 2.665, 9: 4.832, 10: 3.993}, 'Traction (-)': {1: 9.158, 2: 6.721, 3: 7.787, 4: 7.491, 5: 8.267, 6: 2.985, 7: 5.882, 8: 3.591, 9: 6.334, 10: 10.43}, 'Temperature': 80, 'Load': 40}, 'Step 5': {'Step Typ': 'Traction', 'SRR (%)': {1: 4.765, 2: 9.293, 3: 7.608, 4: 7.371, 5: 4.87, 6: 4.832, 7: 6.244, 8: 6.488, 9: 5.04, 10: 2.962}, 'Traction (-)': {1: 6.656, 2: 7.872, 3: 8.799, 4: 7.9, 5: 4.22, 6: 6.288, 7: 7.439, 8: 7.77, 9: 5.977, 10: 9.395}, 'Temperature': 30, 'Load': 70}, 'Step 7': {'Step Typ': 'Traction', 'SRR (%)': {1: 9.46, 2: 2.83, 3: 3.249, 4: 9.273, 5: 8.792, 6: 9.673, 7: 6.784, 8: 3.838, 9: 8.779, 10: 4.82}, 'Traction (-)': {1: 5.245, 2: 8.491, 3: 10.088, 4: 9.988, 5: 4.886, 6: 4.168, 7: 8.628, 8: 5.038, 9: 7.712, 10: 3.961}, 'Temperature': 80, 'Load': 70}, 'Step 2': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.862, 2: 4.71, 3: 4.537, 4: 6.35, 5: 6.691, 6: 5.337, 7: 8.419, 8: 10.303, 9: 5.018, 10: 10.195}, 'Traction (-)': {1: 6.674, 2: 10.137, 3: 2.822, 4: 5.494, 5: 9.986, 6: 9.095, 7: 3.53, 8: 6.96, 9: 8.251, 10: 7.836}, 'Temperature': 30, 'Load': 40}, 'Step 4': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.04, 2: 8.288, 3: 3.731, 4: 10.137, 5: 5.32, 6: 8.504, 7: 5.917, 8: 9.677, 9: 8.641, 10: 7.685}, 'Traction (-)': {1: 9.522, 2: 4.749, 3: 3.46, 4: 3.21, 5: 5.005, 6: 9.886, 7: 8.023, 8: 5.935, 9: 8.74, 10: 5.117}, 'Temperature': 80, 'Load': 40}, 'Step 6': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 6.244, 2: 7.015, 3: 5.998, 4: 4.894, 5: 6.117, 6: 6.644, 7: 7.619, 8: 10.477, 9: 9.61, 10: 2.958}, 'Traction (-)': 
{1: 7.353, 2: 7.98, 3: 6.675, 4: 8.853, 5: 7.537, 6: 5.256, 7: 4.923, 8: 10.293, 9: 2.873, 10: 10.407}, 'Temperature': 30, 'Load': 70}, 'Step 8': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 3.475, 2: 2.756, 3: 7.809, 4: 9.449, 5: 2.72, 6: 4.133, 7: 10.139, 8: 10.0, 9: 3.71, 10: 8.267}, 'Traction (-)': {1: 6.307, 2: 2.83, 3: 9.258, 4: 3.405, 5: 9.659, 6: 6.662, 7: 6.413, 8: 6.488, 9: 7.972, 10: 6.288}, 'Temperature': 80, 'Load': 70} }
df = pd.DataFrame(data)
items = list()
series = list()
for item, d in data.items():
    items.append(item)
    series.append(pd.DataFrame.from_dict(d))
df = pd.concat(series, keys=items)
df.set_index(['Step Typ', 'Load', 'Temperature'], inplace=True)
for force, _ in df.groupby(level=1):
    fig, ax = plt.subplots(figsize=(8, 6))
    df_step = df.loc[('Traction'), force]
    for temperature in df_step.index.unique():
        df_temp = df_step.loc[temperature].sort_values('SRR (%)')
        # ax.set_ylim(0, 0.1)
        ax.set_ylabel('Traction Coeff (-)')
        ax.set_xlabel('SRR (%)')
        ax.set_title('Title comes later', loc='left')
        ax.plot(df_temp['SRR (%)'], df_temp['Traction (-)'], label = f'T = {df_temp.index.unique().values[0]}°C - Load = {force}')
    ax.legend(frameon = True)
    plt.show()
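The loop above covers only the Traction steps. To get all four plots the question asks for, one option is to group by step type and load together and pick the x-axis column per step type. This is only a sketch: the helper name plot_steps, the X_COLUMN mapping and the tiny sample dictionary are made up for illustration, while the column names come from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Which column goes on the x-axis for each step type (taken from the question).
X_COLUMN = {"Traction": "SRR (%)", "Stribeck": "Rolling Speed (mm/s)"}


def plot_steps(data):
    """Draw one figure per (step type, load), with one curve per temperature."""
    # Each step is a dict of scalars plus per-point dicts; pandas broadcasts the scalars.
    df = pd.concat([pd.DataFrame(step) for step in data.values()], ignore_index=True)
    figures = {}
    for (step_type, load), group in df.groupby(["Step Typ", "Load"]):
        x_col = X_COLUMN[step_type]
        fig, ax = plt.subplots(figsize=(8, 6))
        for temperature, curve in group.groupby("Temperature"):
            curve = curve.sort_values(x_col)  # draw the line from left to right
            ax.plot(curve[x_col], curve["Traction (-)"], label=f"T = {temperature} °C")
        ax.set_xlabel(x_col)
        ax.set_ylabel("Traction (-)")
        ax.set_title(f"{step_type}, load = {load}", loc="left")
        ax.legend(frameon=True)
        figures[(step_type, load)] = fig
    return figures


# Hypothetical miniature of the question's data, just to exercise the function.
sample = {
    "Step 1": {"Step Typ": "Traction", "SRR (%)": {1: 2.0, 2: 1.0},
               "Traction (-)": {1: 0.5, 2: 0.4}, "Temperature": 30, "Load": 40},
    "Step 3": {"Step Typ": "Traction", "SRR (%)": {1: 1.5, 2: 3.0},
               "Traction (-)": {1: 0.3, 2: 0.6}, "Temperature": 80, "Load": 40},
    "Step 2": {"Step Typ": "Stribeck", "Rolling Speed (mm/s)": {1: 5.0, 2: 9.0},
               "Traction (-)": {1: 0.3, 2: 0.2}, "Temperature": 30, "Load": 40},
}
figures = plot_steps(sample)
```

With the full data dictionary from the question, plot_steps(data) should return four figures (Traction and Stribeck, each at Load 40 and 70); calling plt.show() afterwards displays them.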
In a table built from my dataset, I need to show in bold the rows that contain "All" in the columns Building, Floor or Team:
My code:
headerColor = 'darkgrey'
rowEvenColor = 'lightgrey'
rowOddColor = 'white'
fig_occ_fl_team = go.Figure(data=[go.Table(
    header=dict(
        values=list(final_table_occ_fl_team.columns),
        line_color='black',
        fill_color=headerColor,
        align=['left','left','left','left','left','left','left','left','left','left'],
        font=dict(color='black', size=9)
    ),
    cells=dict(
        values=[final_table_occ_fl_team['Building'],
                final_table_occ_fl_team['Floor'],
                final_table_occ_fl_team['Team'],
                final_table_occ_fl_team['Number of Desks'],
                final_table_occ_fl_team['Avg Occu (#)'],
                final_table_occ_fl_team['Avg Occu (%)'],
                final_table_occ_fl_team['Avg Occu 10-4 (#)'],
                final_table_occ_fl_team['Avg Occu 10-4 (%)'],
                final_table_occ_fl_team['Max Occu (#)'],
                final_table_occ_fl_team['Max Occu (%)'],
                ],
        line_color='black',
        # 2-D list of colors for alternating rows
        fill_color = [[rowOddColor,rowEvenColor]*56],
        align = ['left','left','left','left','left','left','left','left','left','left'],
        font = dict(color = 'black', size = 7)
    ))
])
fig_occ_fl_team.show()
Dataset head:
data = {'Building': {0: 'All',
1: '1LWP',
2: '1LWP',
3: '1LWP',
4: '1LWP',
5: '1LWP',
6: '1LWP',
7: '1LWP',
8: '1LWP',
9: '1LWP'},
'Floor': {0: 'All',
1: 'All',
2: '2nd',
3: '2nd',
4: '2nd',
5: '2nd',
6: '2nd',
7: '2nd',
8: '2nd',
9: '2nd'},
'Team': {0: 'All',
1: 'All',
2: 'All',
3: 'Anderson/Money',
4: 'Banking & Treasury',
5: 'Charities',
6: 'Client Management',
7: 'Compliance, Legal & Risk',
8: 'DFM',
9: 'Emmerson'},
'Number of Desks': {0: 2297,
1: 2008,
2: 381,
3: 22,
4: 8,
5: 19,
6: 9,
7: 41,
8: 20,
9: 33},
'Avg Occu (#)': {0: 1261,
1: 1126,
2: 195,
3: 14,
4: 4,
5: 9,
6: 5,
7: 21,
8: 13,
9: 18},
'Avg Occu (%)': {0: '55%',
1: '56%',
2: '51%',
3: '64%',
4: '50%',
5: '48%',
6: '56%',
7: '52%',
8: '65%',
9: '55%'},
'Avg Occu 10-4 (#)': {0: 851,
1: 759,
2: 132,
3: 8,
4: 3,
5: 6,
6: 3,
7: 14,
8: 9,
9: 12},
'Avg Occu 10-4 (%)': {0: '37%',
1: '38%',
2: '35%',
3: '37%',
4: '38%',
5: '32%',
6: '34%',
7: '35%',
8: '45%',
9: '37%'},
'Max Occu (#)': {0: 1901,
1: 1680,
2: 274,
3: 22,
4: 6,
5: 13,
6: 7,
7: 27,
8: 17,
9: 25},
'Max Occu (%)': {0: '83%',
1: '84%',
2: '72%',
3: '100%',
4: '75%',
5: '69%',
6: '78%',
7: '66%',
8: '85%',
9: '76%'}}
You can add the bold style to your dataframe prior to creating the table as follows:
import pandas as pd
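df = pd.DataFrame.from_dict(data)
# index labels of the rows where Building, Floor and Team are all "All"
indices = df.index[(df[["Building","Floor","Team"]] == "All").all(axis=1)]
for i in indices:
    for j in range(len(df.columns)):
        # wrap every cell of the matching row in HTML bold tags
        df.iloc[i,j] = "<b>{}</b>".format(df.iloc[i,j])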
You can now create the table. I increased the font size to 12:
import plotly.graph_objects as go
headerColor = 'darkgrey'
rowEvenColor = 'lightgrey'
rowOddColor = 'white'
fig_occ_fl_team = go.Figure(data=[go.Table(
    header=dict(
        values=list(df.columns),
        line_color='black',
        fill_color=headerColor,
        align=['left','left','left','left','left','left','left','left','left','left'],
        font=dict(color='black', size=9)
    ),
    cells=dict(
        values=[df['Building'],
                df['Floor'],
                df['Team'],
                df['Number of Desks'],
                df['Avg Occu (#)'],
                df['Avg Occu (%)'],
                df['Avg Occu 10-4 (#)'],
                df['Avg Occu 10-4 (%)'],
                df['Max Occu (#)'],
                df['Max Occu (%)'],
                ],
        line_color='black',
        # 2-D list of colors for alternating rows
        fill_color = [[rowOddColor,rowEvenColor]*56],
        align = ['left','left','left','left','left','left','left','left','left','left'],
        font = dict(color = 'black', size = 12)
    ))
])
fig_occ_fl_team.show()
Output:
You will notice that the first and fourth columns are bold. If you want to keep the original dataframe unchanged, apply the bold tags to a copy instead, e.g. df2 = df1.copy().
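As a possible alternative to the nested loop, the bold tags can be applied with a boolean mask on a string copy of the dataframe, which also leaves the original untouched. This is a minimal sketch: the function name bold_rows and its require_all switch are hypothetical, and the three-row sample only mimics the head of the dataset. Set require_all=False if a row should be bold when any of the three columns contains "All".

```python
import pandas as pd


def bold_rows(df, columns, value="All", require_all=True):
    """Return a string copy of df with the matching rows wrapped in <b> tags."""
    hits = df[columns] == value
    mask = hits.all(axis=1) if require_all else hits.any(axis=1)
    out = df.astype(str)  # copy; numbers become text so the tags can be added
    out.loc[mask] = "<b>" + out.loc[mask] + "</b>"
    return out


# Hypothetical miniature of the question's table.
sample = pd.DataFrame({
    "Building": ["All", "1LWP", "1LWP"],
    "Floor": ["All", "All", "2nd"],
    "Team": ["All", "All", "All"],
    "Number of Desks": [2297, 2008, 381],
})
styled = bold_rows(sample, ["Building", "Floor", "Team"])
```

The styled frame can then be passed to go.Table in place of df, while sample keeps its original values and dtypes.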
I'm new to network graphs, so I was hoping for a little guidance.
I'm trying to create a network graph of users collaborating with each other. My problem is that I cannot figure out how to add multiple dimensions to the network.
At a high level I want to show:
User-to-User interactions
Add some sort of size indication of users who are collaborating more (via the edges, the more interactions the thicker the line between the two users).
Add color to the edges/lines indicating the project they worked on together
Add color to node based on user license type
So something like:
So far I have the following:
import pandas as pd
import networkx as nx
from pyvis.network import Network
df_dict = {'PROJECT': {0: 'Finance Project', 1: 'Finance Project', 2: 'Finance Project', 3: 'Finance Project', 4: 'Finance Project', 5: 'Finance Project', 6: 'Finance Project', 7: 'Finance Project', 8: 'Finance Project', 9: 'Finance Project', 10: 'Finance Project', 11: 'Finance Project', 12: 'HR Project', 13: 'Finance Project', 14: 'HR Project', 15: 'Finance Project'},
'PLAN': {0: 'COMPANY', 1: 'COMPANY', 2: 'COMPANY', 3: 'COMPANY', 4: 'COMPANY', 5: 'COMPANY', 6: 'COMPANY', 7: 'COMPANY', 8: 'COMPANY', 9: 'COMPANY', 10: 'COMPANY', 11: 'COMPANY', 12: 'COMPANY', 13: 'COMPANY', 14: 'COMPANY', 15: 'COMPANY'},
'USER_ONE': {0: 'Mike Jones', 1: 'Eminem', 2: 'Mike Jones', 3: 'Mike Jones', 4: 'Michael Jordan', 5: 'Eminem', 6: 'Michael Jordan', 7: 'Michael Jordan', 8: 'Mike Jones', 9: 'Kobe Bryant', 10: 'Eminem', 11: 'Elon Musk', 12: 'Bill Gates', 13: 'Elon Musk', 14: 'Mark Zuckerberg', 15: 'Elon Musk'},
'USER_ONE_LICENSE': {0: 'FULL', 1: 'FULL', 2: 'FULL', 3: 'FULL', 4: 'FULL', 5: 'FULL', 6: 'FULL', 7: 'FULL', 8: 'FULL', 9: 'OCCASIONAL', 10: 'FULL', 11: 'FULL', 12: 'FULL', 13: 'FULL', 14: 'FULL', 15: 'FULL'},
'USER_ONE_LICENSE_COLOR': {0: 'lightgreen', 1: 'lightgreen', 2: 'lightgreen', 3: 'lightgreen', 4: 'lightgreen', 5: 'lightgreen', 6: 'lightgreen', 7: 'lightgreen', 8: 'lightgreen', 9: 'gray', 10: 'lightgreen', 11: 'lightgreen', 12: 'lightgreen', 13: 'lightgreen', 14: 'lightgreen', 15: 'lightgreen'},
'USER_ONE_DAYS_COLLAB': {0: 88, 1: 55, 2: 67, 3: 1, 4: 70, 5: 54, 6: 2, 7: 114, 8: 4, 9: 1, 10: 10, 11: 19, 12: 5, 13: 11, 14: 100, 15: 13},
'USER_TWO': {0: 'Michael Jordan', 1: 'Mike Jones', 2: 'Eminem', 3: 'Kobe Bryant', 4: 'Eminem', 5: 'Michael Jordan', 6: 'Elon Musk', 7: 'Mike Jones', 8: 'Elon Musk', 9: 'Mike Jones', 10: 'Elon Musk', 11: 'Eminem', 12: 'Mark Zuckerberg', 13: 'Michael Jordan', 14: 'Bill Gates', 15: 'Mike Jones'},
'USER_TWO_LICENSE': {0: 'FULL', 1: 'FULL', 2: 'FULL', 3: 'OCCASIONAL', 4: 'FULL', 5: 'FULL', 6: 'FULL', 7: 'FULL', 8: 'FULL', 9: 'FULL', 10: 'FULL', 11: 'FULL', 12: 'FULL', 13: 'FULL', 14: 'FULL', 15: 'FULL'},
'USER_TWO_LICENSE_COLOR': {0: 'lightgreen', 1: 'lightgreen', 2: 'lightgreen', 3: 'gray', 4: 'lightgreen', 5: 'lightgreen', 6: 'lightgreen', 7: 'lightgreen', 8: 'lightgreen', 9: 'lightgreen', 10: 'lightgreen', 11: 'lightgreen', 12: 'lightgreen', 13: 'lightgreen', 14: 'lightgreen', 15: 'lightgreen'},
'USER_TWO_DAYS_COLLAB': {0: 114, 1: 67, 2: 55, 3: 1, 4: 54, 5: 70, 6: 11, 7: 88, 8: 13, 9: 1, 10: 19, 11: 10, 12: 100, 13: 2, 14: 5, 15: 4}
, 'TOTAL_COLLABS': {0: 202, 1: 122, 2: 122, 3: 2, 4: 124, 5: 124, 6: 13, 7: 202, 8: 17, 9: 2, 10: 29, 11: 29, 12: 105, 13: 13, 14: 105, 15: 17}}
df = pd.DataFrame(df_dict)
# where do I add all the other attributes?
#i.e. license type, project, # of interactions (I'm assuming this can be something like weights?)
#In my case I believe my source + target needs to be Project + User?
G = nx.from_pandas_edgelist(
    df,
    source='USER_ONE',
    target='USER_TWO',  # I tried ['PROJECT','USER_TWO']
)
net = Network(notebook=True)
net.from_nx(G)
net.show_buttons(filter_=True)
net.show('example4.html')
All the examples I've seen have only one source and one target; mine needs user + project for both source and target. Is there a way to do this without creating one field that combines both?
I haven't found a clear way to color nodes. The example provided bases the color on the node value, but in my case the node is just text and I want another dimension to dictate the color.
I haven't found a clear way to build a case statement for edge width either. Concretely:
if count of interactions <= 1 then "small width"
if count of interactions > 1 and <=5 then "medium width"
etc...
Any direction or resources would be greatly appreciated -- everything I come across seems to be different than my setup leaving me unsure how to proceed.
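One possible direction, as a rough sketch rather than a definitive answer: prepare the edge and node attributes in pandas first, then hand them to the graph library. The code below collapses A-B and B-A rows into one edge per user pair and project, buckets TOTAL_COLLABS into a width, and maps each project to an edge color. The thresholds above 5, the width values 1/3/6, the project_colors palette and the helper names are all assumptions made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd


def edge_width(counts):
    """Bucket interaction counts into line widths: <=1 small, <=5 medium, else large."""
    bins = [0, 1, 5, np.inf]
    return pd.cut(counts, bins=bins, labels=[1, 3, 6], include_lowest=True).astype(int)


def build_edges(df, project_colors):
    """One row per (user pair, project) with width and color attributes."""
    # Sort each pair alphabetically so A-B and B-A end up as the same edge.
    pairs = np.sort(df[["USER_ONE", "USER_TWO"]].to_numpy(dtype=str), axis=1)
    edges = pd.DataFrame({
        "USER_A": pairs[:, 0],
        "USER_B": pairs[:, 1],
        "PROJECT": df["PROJECT"].to_numpy(),
        "TOTAL_COLLABS": df["TOTAL_COLLABS"].to_numpy(),
    }).drop_duplicates(["USER_A", "USER_B", "PROJECT"]).reset_index(drop=True)
    edges["width"] = edge_width(edges["TOTAL_COLLABS"])
    edges["color"] = edges["PROJECT"].map(project_colors)
    return edges


def node_colors(df):
    """Map every user to the license color given in the table."""
    colors = dict(zip(df["USER_ONE"], df["USER_ONE_LICENSE_COLOR"]))
    colors.update(zip(df["USER_TWO"], df["USER_TWO_LICENSE_COLOR"]))
    return colors


# Hypothetical miniature of the question's table.
sample = pd.DataFrame({
    "PROJECT": ["Finance Project", "Finance Project", "Finance Project", "HR Project"],
    "USER_ONE": ["Mike Jones", "Michael Jordan", "Mike Jones", "Bill Gates"],
    "USER_ONE_LICENSE_COLOR": ["lightgreen", "lightgreen", "lightgreen", "lightgreen"],
    "USER_TWO": ["Michael Jordan", "Mike Jones", "Kobe Bryant", "Mark Zuckerberg"],
    "USER_TWO_LICENSE_COLOR": ["lightgreen", "lightgreen", "gray", "lightgreen"],
    "TOTAL_COLLABS": [202, 202, 2, 105],
})
palette = {"Finance Project": "steelblue", "HR Project": "orange"}  # assumed colors
edges = build_edges(sample, palette)
colors = node_colors(sample)
```

The edges table could then go to nx.from_pandas_edgelist(edges, source='USER_A', target='USER_B', edge_attr=['PROJECT', 'width', 'color']), and the colors dictionary to nx.set_node_attributes(G, colors, 'color'). If the same two users share several projects, a plain Graph keeps only one edge per pair, so create_using=nx.MultiGraph would probably be needed. Which attribute names pyvis picks up in net.from_nx is worth checking against the installed pyvis version.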
My table looks something like this, for reference:
dt = {'id': {0: 'x1', 1: 'x2', 2: 'x3', 3: 'x4', 4: 'x5', 5: 'x6', 6: 'x7', 7: 'x8', 8: 'x9', 9: 'x10'}, 'trt': {0: 'cnt', 1: 'cnt', 2: 'tr', 3: 'tr', 4: 'tr', 5: 'cnt', 6: 'tr', 7: 'tr', 8: 'cnt', 9: 'cnt'}, 'work.T1': {0: 0.6516556669957936, 1: 0.567737752571702, 2: 0.1135089821182191, 3: 0.5959253052715212, 4: 0.3580499750096351, 5: 0.4288094183430075, 6: 0.0519033221062272, 7: 0.2641776674427092, 8: 0.3987907308619469, 9: 0.8361341434065253}, 'play.T1': {0: 0.8647212258074433, 1: 0.6153524168767035, 2: 0.7751098964363337, 3: 0.3555686913896352, 4: 0.4058499720413238, 5: 0.7066469138953835, 6: 0.8382876652758569, 7: 0.2395891312044114, 8: 0.7707715332508087, 9: 0.3558977444190532}, 'talk.T1': {0: 0.5355970377568156, 1: 0.0930881295353174, 2: 0.169803041499108, 3: 0.8998324507847428, 4: 0.4226376069709658, 5: 0.7477464678231627, 6: 0.8226525799836963, 7: 0.9546536463312804, 8: 0.6854445093777031, 9: 0.5005032296758145}, 'work.T2': {0: 0.2754838624969125, 1: 0.2289039448369294, 2: 0.0144339059479534, 3: 0.7289645625278354, 4: 0.2498804717324674, 5: 0.1611832766793668, 6: 0.0170426501426845, 7: 0.4861003451514989, 8: 0.1029001718852669, 9: 0.8015470046084374}, 'play.T2': {0: 0.3543280649464577, 1: 0.9364325392525644, 2: 0.2458663922734558, 3: 0.4731414613779634, 4: 0.191560871200636, 5: 0.5832219698932022, 6: 0.4594731898978352, 7: 0.467434047954157, 8: 0.3998325555585325, 9: 0.5052855962421745}, 'talk.T2': {0: 0.0318881559651345, 1: 0.1144675880204886, 2: 0.468935475917533, 3: 0.3969867376144975, 4: 0.8336191941052675, 5: 0.7611217433586717, 6: 0.5733564489055425, 7: 0.447508045937866, 8: 0.0838020080700516, 9: 0.2191385473124683}}
mydt = pd.DataFrame(dt, columns = ['id', 'trt', 'work.T1', '', 'play.T1', 'talk.T1','work.T2', '', 'play.T2', 'talk.T2'])
So I have the above dataset and need to tidy it up. I have used the following code but it returns "ValueError: stubname can't be identical to a column name." How can I fix the code to avoid this problem?
names = ['play', 'talk', 'work']
activities = pd.wide_to_long(dt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
activities
Note: I am trying to get the dataframe to look like the following.
Change:
activities = pd.wide_to_long(activities, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
to:
activities = pd.wide_to_long(mydt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
and then it works.
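For reference, here is a small self-contained version of the same reshaping. It is only a sketch on a made-up two-row frame (the name wide and the rounded values are illustrative), and it writes the suffix as a raw string so that the backslash in the regular expression is not treated as a string escape.

```python
import pandas as pd

# Hypothetical miniature of the question's table: one column per activity and time point.
wide = pd.DataFrame({
    "id": ["x1", "x2"],
    "trt": ["cnt", "tr"],
    "work.T1": [0.65, 0.57],
    "play.T1": [0.86, 0.62],
    "work.T2": [0.28, 0.23],
    "play.T2": [0.35, 0.94],
})

tidy = (
    pd.wide_to_long(
        wide,
        stubnames=["play", "work"],  # column prefixes before the separator
        i="id",                      # identifier column
        j="time",                    # name of the new column holding T1, T2
        sep=".",
        suffix=r"T\d",               # regex for the part after the separator
    )
    .sort_index()
    .reset_index()
)
```

Each id now appears once per time point, with one column per activity and the trt column carried along.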
Here is the df:
{'Type 1': {1: 123.0,
2: 123.0,
3: 123.0,
4: 123.0,
5: 123.0,
6: 45.0,
7: 45.0,
8: 45.0,
9: 45.0,
10: 9.5,
11: 9.5,
12: 9.5,
13: 2.34,
14: 2.34,
15: 2.34},
'Type 2': {1: 0,
2: 0,
3: -90,
4: -90,
5: -90,
6: -90,
7: -90,
8: -270,
9: -270,
10: -270,
11: -270,
12: 180,
13: 180,
14: 181,
15: 181},
'Type 3': {1: 0,
2: 0,
3: 0,
4: 0,
5: 55,
6: 55,
7: 55,
8: 55,
9: 55,
10: 9,
11: 9,
12: 3,
13: 3,
14: 3,
15: 3},
'Type 4': {1: 5.0,
2: 5.0,
3: 5.0,
4: 5.0,
5: 10.0,
6: 123.0,
7: 12.0,
8: 23.0,
9: 16.0,
10: 3.14,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 18.0},
'Type 5': {1: 65536,
2: 65536,
3: 65536,
4: 65536,
5: 78888888,
6: 665,
7: 665,
8: 665,
9: 665,
10: 665,
11: 665,
12: 665,
13: 665,
14: 665,
15: 665},
'Type 6': {1: 3.4124,
2: 3.4124,
3: 3.4124,
4: 3.4124,
5: 3.4124,
6: 3.4124,
7: 3.4124,
8: 3.4124,
9: 3.4124,
10: 3.4124,
11: 3.4124,
12: 3.4124,
13: 3.4124,
14: 3.4124,
15: 3.4124},
'Type 7': {1: 0,
2: 0,
3: 2,
4: 2,
5: 2,
6: 1,
7: 1,
8: 1,
9: 1,
10: 10,
11: 10,
12: 9,
13: 9,
14: -5,
15: -5},
'Type 8': {1: 'convert the string to 0 and non-zero value to 1',
2: 'convert the string to 0 and non-zero value to 1',
3: 'convert the string to 0 and non-zero value to 1',
4: 'convert the string to 0 and non-zero value to 1',
5: 'convert the string to 0 and non-zero value to 1',
6: 'convert the string to 0 and non-zero value to 1',
7: 'convert the string to 0 and non-zero value to 1',
8: 'convert the string to 0 and non-zero value to 1',
9: 'convert the string to 0 and non-zero value to 1',
10: 'convert the string to 0 and non-zero value to 1',
11: 'convert the string to 0 and non-zero value to 1',
12: 'convert the string to 0 and non-zero value to 1',
13: 'convert the string to 0 and non-zero value to 1',
14: 'convert the string to 0 and non-zero value to 1',
15: 'convert the string to 0 and non-zero value to 1'},
'Type 9': {1: 0,
2: 0,
3: 0,
4: 0,
5: 0,
6: 1,
7: 1,
8: 0,
9: 0,
10: 8,
11: 8,
12: 0,
13: 0,
14: 45,
15: 45}}
Each column in the dataframe has a lower and an upper limit, as given in the lists below,
e.g.:
lower_limit = [3,-90,0,0,0,1,0,0,0] #Type 1 lower limit is 3...
upper_limit = [100,90,50,100,65535,3,1,1,1] #Type 1 upper limit is 100...
lower_limit = pd.Series(lower_limit)
upper_limit = pd.Series(upper_limit)
df.clip(lower_limit, upper_limit, axis = 1)
But this returns every element as NaN,
whereas the expected result is each column clipped to the lower and upper limits given in the lists.
Using a for loop I was able to make the necessary change, but it was extremely slow when the df is huge.
I understand that clip is the faster way to make these changes, but it doesn't work as expected. What mistake am I making, and is there any other fast way of clipping the columns?
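According to the documentation, lower and upper should be floats or array-likes. A Series is aligned on its index labels, and the default index 0..8 matches none of the column names 'Type 1'..'Type 9', which is why every value comes back as NaN. Plain lists are applied by position instead.
You could do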
lower_limit = [3,-90,0,0,0,1,0,'',0] #Type 1 lower limit is 3...
upper_limit = [100,90,50,100,65535,3,1,'',1] #Type 1 upper limit is 100...
df.clip(lower_limit, upper_limit, axis = 1)
but column Type 8 holds strings, so clip would give you an empty column for it. You can fix that with
lower_limit = [3,-90,0,0,0,1,0,df['Type 8'].min(),0]
upper_limit = [100,90,50,100,65535,3,1,df['Type 8'].max(),1]
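Another way to sidestep both problems, sketched here under the assumption that only the numeric columns need clipping: index the limits by column name, so that a Series lines up with the columns, and leave the string column out of the call. The three-column sample and the variable names are illustrative, not the full dataframe from the question.

```python
import pandas as pd

# Hypothetical miniature of the question's dataframe, including a string column.
df = pd.DataFrame({
    "Type 1": [123.0, 45.0, 2.34],
    "Type 2": [0, -270, 181],
    "Type 8": ["text", "text", "text"],
})

# Limits keyed by column name, so they align with the columns when axis=1.
lower = pd.Series({"Type 1": 3, "Type 2": -90})
upper = pd.Series({"Type 1": 100, "Type 2": 90})

numeric = list(lower.index)  # only these columns are clipped
clipped = df.copy()
clipped[numeric] = df[numeric].clip(lower, upper, axis=1)
```

The string column passes through unchanged, and no placeholder limits are needed for it.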
I am having trouble plotting multiple histograms. I get the error message:
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()
KeyError: 0
This is the code I wrote:
xaxes = ['price','bedrooms','sqft_living','sqft_lot','floors','waterfront',
'view','condition','grade','sqft_above','sqft_basement','yr_built',
'yr_renovated','zipcode','lat','long','sqft_living15','sqft_loft15']
a,b = plt.subplots(4,5)
b = b.ravel()
for idx,ax in enumerate(b):
    ax.hist(file[idx])
    ax.set_title(titles[idx])
    ax.set_xlabel(xaxes[id])
plt.tight_layout()
Here is a sample of my data:
{'bathrooms': {0: 1.0,
1: 2.25,
2: 1.0,
3: 3.0,
4: 2.0,
5: 4.5,
6: 2.25,
7: 1.5,
8: 1.0,
9: 2.5},
'bedrooms': {0: 3, 1: 3, 2: 2, 3: 4, 4: 3, 5: 4, 6: 3, 7: 3, 8: 3, 9: 3},
'condition': {0: 3, 1: 3, 2: 3, 3: 5, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3, 9: 3},
'date': {0: '20141013T000000',
1: '20141209T000000',
2: '20150225T000000',
3: '20141209T000000',
4: '20150218T000000',
5: '20140512T000000',
6: '20140627T000000',
7: '20150115T000000',
8: '20150415T000000',
9: '20150312T000000'},
'floors': {0: 1.0,
1: 2.0,
2: 1.0,
3: 1.0,
4: 1.0,
5: 1.0,
6: 2.0,
7: 1.0,
8: 1.0,
9: 2.0},
'grade': {0: 7, 1: 7, 2: 6, 3: 7, 4: 8, 5: 11, 6: 7, 7: 7, 8: 7, 9: 7},
'id': {0: 7129300520,
1: 6414100192,
2: 5631500400,
3: 2487200875,
4: 1954400510,
5: 7237550310,
6: 1321400060,
7: 2008000270,
8: 2414600126,
9: 3793500160},
'lat': {0: 47.511200000000002,
1: 47.721000000000004,
2: 47.737900000000003,
3: 47.520800000000001,
4: 47.616799999999998,
5: 47.656100000000002,
6: 47.309699999999999,
7: 47.409500000000001,
8: 47.512300000000003,
9: 47.368400000000001},
'long': {0: -122.25700000000001,
1: -122.319,
2: -122.23299999999999,
3: -122.39299999999999,
4: -122.045,
5: -122.005,
6: -122.32700000000001,
7: -122.315,
8: -122.337,
9: -122.03100000000001},
'price': {0: 221900.0,
1: 538000.0,
2: 180000.0,
3: 604000.0,
4: 510000.0,
5: 1230000.0,
6: 257500.0,
7: 291850.0,
8: 229500.0,
9: 323000.0},
'sqft_above': {0: 1180,
1: 2170,
2: 770,
3: 1050,
4: 1680,
5: 3890,
6: 1715,
7: 1060,
8: 1050,
9: 1890},
'sqft_basement': {0: 0,
1: 400,
2: 0,
3: 910,
4: 0,
5: 1530,
6: 0,
7: 0,
8: 730,
9: 0},
'sqft_living': {0: 1180,
1: 2570,
2: 770,
3: 1960,
4: 1680,
5: 5420,
6: 1715,
7: 1060,
8: 1780,
9: 1890},
'sqft_living15': {0: 1340,
1: 1690,
2: 2720,
3: 1360,
4: 1800,
5: 4760,
6: 2238,
7: 1650,
8: 1780,
9: 2390},
'sqft_lot': {0: 5650,
1: 7242,
2: 10000,
3: 5000,
4: 8080,
5: 101930,
6: 6819,
7: 9711,
8: 7470,
9: 6560},
'sqft_lot15': {0: 5650,
1: 7639,
2: 8062,
3: 5000,
4: 7503,
5: 101930,
6: 6819,
7: 9711,
8: 8113,
9: 7570},
'view': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0},
'waterfront': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0},
'yr_built': {0: 1955,
1: 1951,
2: 1933,
3: 1965,
4: 1987,
5: 2001,
6: 1995,
7: 1963,
8: 1960,
9: 2003},
'yr_renovated': {0: 0,
1: 1991,
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0,
8: 0,
9: 0},
'zipcode': {0: 98178,
1: 98125,
2: 98028,
3: 98136,
4: 98074,
5: 98053,
6: 98003,
7: 98198,
8: 98146,
9: 98038}}
If I understand you correctly and the variable file holds your data as a pandas DataFrame, then the problem is simply how you index that DataFrame.
file[idx] asks for the column labelled idx. Your columns are labelled with strings such as 'price', so file[0] raises KeyError: 0. Select the column by name with file[xaxes[idx]], or by position with file.iloc[:, idx].
See the pandas documentation on indexing and selecting data for more details.
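Put together, the loop could look roughly like the sketch below. It pairs each axis with a column name through zip, so the loop stops at the last name even when the grid has more panels than columns (the question has 18 names for a 4x5 grid), and it hides the unused panels. The three-column sample and the 2x2 grid are made up to keep the example short.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the housing data from the question.
file = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0, 604000.0],
    "bedrooms": [3, 3, 2, 4],
    "floors": [1.0, 2.0, 1.0, 1.0],
})
columns = ["price", "bedrooms", "floors"]

fig, axes = plt.subplots(2, 2)
axes = axes.ravel()

# zip stops at the shorter sequence, so extra axes never index past the names.
for ax, name in zip(axes, columns):
    ax.hist(file[name])  # select the column by its label
    ax.set_title(name)
    ax.set_xlabel(name)

# Hide the panels that did not receive a histogram.
for ax in axes[len(columns):]:
    ax.set_visible(False)

fig.tight_layout()
```

Calling plt.show() afterwards displays the grid.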