networkx referring to multi-dimensions and displaying data - python

I'm new to network graphs, so I was hoping for a little guidance...
I'm trying to create a network graph of users collaborating with each other; my problem is that I cannot figure out how to add multiple dimensions to the network.
At a high level I want to show:
User-to-User interactions
Add some sort of size indication of users who are collaborating more (via the edges, the more interactions the thicker the line between the two users).
Add color to the edges/lines indicating the project they worked on together
Add color to node based on user license type
So something like:
So far I have the following:
import pandas as pd
import networkx as nx
from pyvis.network import Network
df_dict = {'PROJECT': {0: 'Finance Project', 1: 'Finance Project', 2: 'Finance Project', 3: 'Finance Project', 4: 'Finance Project', 5: 'Finance Project', 6: 'Finance Project', 7: 'Finance Project', 8: 'Finance Project', 9: 'Finance Project', 10: 'Finance Project', 11: 'Finance Project', 12: 'HR Project', 13: 'Finance Project', 14: 'HR Project', 15: 'Finance Project'},
'PLAN': {0: 'COMPANY', 1: 'COMPANY', 2: 'COMPANY', 3: 'COMPANY', 4: 'COMPANY', 5: 'COMPANY', 6: 'COMPANY', 7: 'COMPANY', 8: 'COMPANY', 9: 'COMPANY', 10: 'COMPANY', 11: 'COMPANY', 12: 'COMPANY', 13: 'COMPANY', 14: 'COMPANY', 15: 'COMPANY'},
'USER_ONE': {0: 'Mike Jones', 1: 'Eminem', 2: 'Mike Jones', 3: 'Mike Jones', 4: 'Michael Jordan', 5: 'Eminem', 6: 'Michael Jordan', 7: 'Michael Jordan', 8: 'Mike Jones', 9: 'Kobe Bryant', 10: 'Eminem', 11: 'Elon Musk', 12: 'Bill Gates', 13: 'Elon Musk', 14: 'Mark Zuckerberg', 15: 'Elon Musk'},
'USER_ONE_LICENSE': {0: 'FULL', 1: 'FULL', 2: 'FULL', 3: 'FULL', 4: 'FULL', 5: 'FULL', 6: 'FULL', 7: 'FULL', 8: 'FULL', 9: 'OCCASIONAL', 10: 'FULL', 11: 'FULL', 12: 'FULL', 13: 'FULL', 14: 'FULL', 15: 'FULL'},
'USER_ONE_LICENSE_COLOR': {0: 'lightgreen', 1: 'lightgreen', 2: 'lightgreen', 3: 'lightgreen', 4: 'lightgreen', 5: 'lightgreen', 6: 'lightgreen', 7: 'lightgreen', 8: 'lightgreen', 9: 'gray', 10: 'lightgreen', 11: 'lightgreen', 12: 'lightgreen', 13: 'lightgreen', 14: 'lightgreen', 15: 'lightgreen'},
'USER_ONE_DAYS_COLLAB': {0: 88, 1: 55, 2: 67, 3: 1, 4: 70, 5: 54, 6: 2, 7: 114, 8: 4, 9: 1, 10: 10, 11: 19, 12: 5, 13: 11, 14: 100, 15: 13},
'USER_TWO': {0: 'Michael Jordan', 1: 'Mike Jones', 2: 'Eminem', 3: 'Kobe Bryant', 4: 'Eminem', 5: 'Michael Jordan', 6: 'Elon Musk', 7: 'Mike Jones', 8: 'Elon Musk', 9: 'Mike Jones', 10: 'Elon Musk', 11: 'Eminem', 12: 'Mark Zuckerberg', 13: 'Michael Jordan', 14: 'Bill Gates', 15: 'Mike Jones'},
'USER_TWO_LICENSE': {0: 'FULL', 1: 'FULL', 2: 'FULL', 3: 'OCCASIONAL', 4: 'FULL', 5: 'FULL', 6: 'FULL', 7: 'FULL', 8: 'FULL', 9: 'FULL', 10: 'FULL', 11: 'FULL', 12: 'FULL', 13: 'FULL', 14: 'FULL', 15: 'FULL'},
'USER_TWO_LICENSE_COLOR': {0: 'lightgreen', 1: 'lightgreen', 2: 'lightgreen', 3: 'gray', 4: 'lightgreen', 5: 'lightgreen', 6: 'lightgreen', 7: 'lightgreen', 8: 'lightgreen', 9: 'lightgreen', 10: 'lightgreen', 11: 'lightgreen', 12: 'lightgreen', 13: 'lightgreen', 14: 'lightgreen', 15: 'lightgreen'},
'USER_TWO_DAYS_COLLAB': {0: 114, 1: 67, 2: 55, 3: 1, 4: 54, 5: 70, 6: 11, 7: 88, 8: 13, 9: 1, 10: 19, 11: 10, 12: 100, 13: 2, 14: 5, 15: 4}
, 'TOTAL_COLLABS': {0: 202, 1: 122, 2: 122, 3: 2, 4: 124, 5: 124, 6: 13, 7: 202, 8: 17, 9: 2, 10: 29, 11: 29, 12: 105, 13: 13, 14: 105, 15: 17}}
df = pd.DataFrame(df_dict)
# where do I add all the other attributes?
#i.e. license type, project, # of interactions (I'm assuming this can be something like weights?)
#In my case I believe my source + target needs to be Project + User?
G = nx.from_pandas_edgelist(df
,source='USER_ONE'
,target='USER_TWO' #I tried ['PROJECT','USER_TWO']
)
net = Network(notebook=True)
net.from_nx(G)
net.show_buttons(filter_=True)
net.show('example4.html')
All the examples I've seen have only one source and one target; mine needs user + project for both source and target. Is there a way to do this without creating one field that combines both?
I haven't found a clear way to color nodes. The example provided shows a case on the node value (in my case the node is just text; I have another dimension I want to refer to in order to dictate the color).
I haven't found a clear way to build a case statement on edge width either. Concretely:
if count of interactions <= 1 then "small width"
if count of interactions > 1 and <=5 then "medium width"
etc...
Any direction or resources would be greatly appreciated -- everything I come across seems to be different than my setup leaving me unsure how to proceed.
my table looks something like this for reference:
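One possible way to attach the three dimensions listed above (project, license type, number of interactions) is to store them as node and edge attributes. The following is only a minimal sketch with networkx: the shortened DataFrame, the `PROJECT_COLORS` mapping, the `edge_width` helper, and the width threshold above 5 are all assumptions made for illustration. It uses a `MultiGraph` keyed by project, so the same pair of users can have one edge per project without combining user and project into a single field.

```python
import networkx as nx
import pandas as pd

# Shortened, hypothetical stand-in for the DataFrame in the question
df = pd.DataFrame({
    "PROJECT": ["Finance Project", "Finance Project", "HR Project"],
    "USER_ONE": ["Mike Jones", "Mike Jones", "Bill Gates"],
    "USER_ONE_LICENSE_COLOR": ["lightgreen", "lightgreen", "lightgreen"],
    "USER_TWO": ["Michael Jordan", "Kobe Bryant", "Mark Zuckerberg"],
    "USER_TWO_LICENSE_COLOR": ["lightgreen", "gray", "lightgreen"],
    "TOTAL_COLLABS": [202, 2, 105],
})

# Assumed colour per project (not part of the original data)
PROJECT_COLORS = {"Finance Project": "steelblue", "HR Project": "tomato"}


def edge_width(count):
    """The 'case statement' on interaction count; cut-offs above 5 are assumed."""
    if count <= 1:
        return 1   # small width
    if count <= 5:
        return 3   # medium width
    return 6       # large width


G = nx.MultiGraph()
for row in df.itertuples(index=False):
    # Node colour comes from the license colour columns
    G.add_node(row.USER_ONE, color=row.USER_ONE_LICENSE_COLOR)
    G.add_node(row.USER_TWO, color=row.USER_TWO_LICENSE_COLOR)
    # One edge per (user pair, project); the project is the edge key
    G.add_edge(
        row.USER_ONE,
        row.USER_TWO,
        key=row.PROJECT,
        color=PROJECT_COLORS[row.PROJECT],
        width=edge_width(row.TOTAL_COLLABS),
        title=row.PROJECT,
    )
```

Whether a renderer such as pyvis picks up the `color`, `width`, and `title` attributes from `net.from_nx(G)` has not been verified here, so treat the attribute names as a starting point.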

Related

How to Fix Code to Avoid Stubnames Error (Python Pandas)?

dt = {'id': {0: 'x1', 1: 'x2', 2: 'x3', 3: 'x4', 4: 'x5', 5: 'x6', 6: 'x7', 7: 'x8', 8: 'x9', 9: 'x10'}, 'trt': {0: 'cnt', 1: 'cnt', 2: 'tr', 3: 'tr', 4: 'tr', 5: 'cnt', 6: 'tr', 7: 'tr', 8: 'cnt', 9: 'cnt'}, 'work.T1': {0: 0.6516556669957936, 1: 0.567737752571702, 2: 0.1135089821182191, 3: 0.5959253052715212, 4: 0.3580499750096351, 5: 0.4288094183430075, 6: 0.0519033221062272, 7: 0.2641776674427092, 8: 0.3987907308619469, 9: 0.8361341434065253}, 'play.T1': {0: 0.8647212258074433, 1: 0.6153524168767035, 2: 0.7751098964363337, 3: 0.3555686913896352, 4: 0.4058499720413238, 5: 0.7066469138953835, 6: 0.8382876652758569, 7: 0.2395891312044114, 8: 0.7707715332508087, 9: 0.3558977444190532}, 'talk.T1': {0: 0.5355970377568156, 1: 0.0930881295353174, 2: 0.169803041499108, 3: 0.8998324507847428, 4: 0.4226376069709658, 5: 0.7477464678231627, 6: 0.8226525799836963, 7: 0.9546536463312804, 8: 0.6854445093777031, 9: 0.5005032296758145}, 'work.T2': {0: 0.2754838624969125, 1: 0.2289039448369294, 2: 0.0144339059479534, 3: 0.7289645625278354, 4: 0.2498804717324674, 5: 0.1611832766793668, 6: 0.0170426501426845, 7: 0.4861003451514989, 8: 0.1029001718852669, 9: 0.8015470046084374}, 'play.T2': {0: 0.3543280649464577, 1: 0.9364325392525644, 2: 0.2458663922734558, 3: 0.4731414613779634, 4: 0.191560871200636, 5: 0.5832219698932022, 6: 0.4594731898978352, 7: 0.467434047954157, 8: 0.3998325555585325, 9: 0.5052855962421745}, 'talk.T2': {0: 0.0318881559651345, 1: 0.1144675880204886, 2: 0.468935475917533, 3: 0.3969867376144975, 4: 0.8336191941052675, 5: 0.7611217433586717, 6: 0.5733564489055425, 7: 0.447508045937866, 8: 0.0838020080700516, 9: 0.2191385473124683}}
mydt = pd.DataFrame(dt, columns = ['id', 'trt', 'work.T1', '', 'play.T1', 'talk.T1','work.T2', '', 'play.T2', 'talk.T2'])
So I have the above dataset and need to tidy it up. I have used the following code but it returns "ValueError: stubname can't be identical to a column name." How can I fix the code to avoid this problem?
names = ['play', 'talk', 'work']
activities = pd.wide_to_long(dt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
activities
Note: I am trying to get the dataframe to look like the following.
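This error is raised when the DataFrame passed to pd.wide_to_long already has columns named like the stubnames, which is the case for the already reshaped activities. I changed:
activities = pd.wide_to_long(activities, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
To:
activities = pd.wide_to_long(mydt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
and then it works.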
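To illustrate the fix, here is a small sketch on a shortened, made-up version of the data (the `wide` frame and its values are assumptions). It reshapes the wide frame once, then shows that feeding the reshaped result back into `pd.wide_to_long` triggers the stubname error.

```python
import pandas as pd

# Shortened, hypothetical stand-in for the wide DataFrame
wide = pd.DataFrame({
    "id": ["x1", "x2"],
    "trt": ["cnt", "tr"],
    "work.T1": [0.65, 0.57],
    "play.T1": [0.86, 0.62],
    "work.T2": [0.28, 0.23],
    "play.T2": [0.35, 0.94],
})
names = ["play", "work"]

# Reshape the wide frame: one row per (id, time)
activities = (
    pd.wide_to_long(wide, stubnames=names, i="id", j="time",
                    sep=".", suffix=r"T\d")
    .sort_index()
    .reset_index()
)

# Passing the already reshaped frame back in fails, because it now
# has columns literally named 'play' and 'work'
try:
    pd.wide_to_long(activities, stubnames=names, i="id", j="time",
                    sep=".", suffix=r"T\d")
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

The raw string `r"T\d"` is used here only to avoid an invalid escape sequence warning on recent Python versions; the pattern itself is unchanged.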

Multiple plotting from multi-index dataframe

I want to plot by Step Typ: Traction and Stribeck. Each load stage should have its own plot. Within each load level, the line plots should be broken down by temperature. The y-axis is Traction (-) and the x-axis is the respective counterpart: SRR (%) for Traction and Rolling Speed (mm/s) for Stribeck. In the end, I should have four different plots.
Example of how it should look:
My attempt so far, which leads to an empty plot:
import pandas as pd
import matplotlib.pyplot as plt
data = {'Step 1': {'Step Typ': 'Traction', 'SRR (%)': {1: 8.384, 2: 9.815, 3: 7.531, 4: 10.209, 5: 7.989, 6: 7.331, 7: 5.008, 8: 2.716, 9: 9.6, 10: 7.911}, 'Traction (-)': {1: 5.602, 2: 6.04, 3: 2.631, 4: 2.952, 5: 8.162, 6: 9.312, 7: 4.994, 8: 2.959, 9: 10.075, 10: 5.498}, 'Temperature': 30, 'Load': 40}, 'Step 3': {'Step Typ': 'Traction', 'SRR (%)': {1: 2.909, 2: 5.552, 3: 5.656, 4: 9.043, 5: 3.424, 6: 7.382, 7: 3.916, 8: 2.665, 9: 4.832, 10: 3.993}, 'Traction (-)': {1: 9.158, 2: 6.721, 3: 7.787, 4: 7.491, 5: 8.267, 6: 2.985, 7: 5.882, 8: 3.591, 9: 6.334, 10: 10.43}, 'Temperature': 80, 'Load': 40}, 'Step 5': {'Step Typ': 'Traction', 'SRR (%)': {1: 4.765, 2: 9.293, 3: 7.608, 4: 7.371, 5: 4.87, 6: 4.832, 7: 6.244, 8: 6.488, 9: 5.04, 10: 2.962}, 'Traction (-)': {1: 6.656, 2: 7.872, 3: 8.799, 4: 7.9, 5: 4.22, 6: 6.288, 7: 7.439, 8: 7.77, 9: 5.977, 10: 9.395}, 'Temperature': 30, 'Load': 70}, 'Step 7': {'Step Typ': 'Traction', 'SRR (%)': {1: 9.46, 2: 2.83, 3: 3.249, 4: 9.273, 5: 8.792, 6: 9.673, 7: 6.784, 8: 3.838, 9: 8.779, 10: 4.82}, 'Traction (-)': {1: 5.245, 2: 8.491, 3: 10.088, 4: 9.988, 5: 4.886, 6: 4.168, 7: 8.628, 8: 5.038, 9: 7.712, 10: 3.961}, 'Temperature': 80, 'Load': 70}, 'Step 2': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.862, 2: 4.71, 3: 4.537, 4: 6.35, 5: 6.691, 6: 5.337, 7: 8.419, 8: 10.303, 9: 5.018, 10: 10.195}, 'Traction (-)': {1: 6.674, 2: 10.137, 3: 2.822, 4: 5.494, 5: 9.986, 6: 9.095, 7: 3.53, 8: 6.96, 9: 8.251, 10: 7.836}, 'Temperature': 30, 'Load': 40}, 'Step 4': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.04, 2: 8.288, 3: 3.731, 4: 10.137, 5: 5.32, 6: 8.504, 7: 5.917, 8: 9.677, 9: 8.641, 10: 7.685}, 'Traction (-)': {1: 9.522, 2: 4.749, 3: 3.46, 4: 3.21, 5: 5.005, 6: 9.886, 7: 8.023, 8: 5.935, 9: 8.74, 10: 5.117}, 'Temperature': 80, 'Load': 40}, 'Step 6': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 6.244, 2: 7.015, 3: 5.998, 4: 4.894, 5: 6.117, 6: 6.644, 7: 7.619, 8: 10.477, 9: 9.61, 10: 2.958}, 'Traction (-)': 
{1: 7.353, 2: 7.98, 3: 6.675, 4: 8.853, 5: 7.537, 6: 5.256, 7: 4.923, 8: 10.293, 9: 2.873, 10: 10.407}, 'Temperature': 30, 'Load': 70}, 'Step 8': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 3.475, 2: 2.756, 3: 7.809, 4: 9.449, 5: 2.72, 6: 4.133, 7: 10.139, 8: 10.0, 9: 3.71, 10: 8.267}, 'Traction (-)': {1: 6.307, 2: 2.83, 3: 9.258, 4: 3.405, 5: 9.659, 6: 6.662, 7: 6.413, 8: 6.488, 9: 7.972, 10: 6.288}, 'Temperature': 80, 'Load': 70} }
df = pd.DataFrame(data)
items = list()
series = list()
for item, d in data.items():
    items.append(item)
    series.append(pd.DataFrame.from_dict(d))
df = pd.concat(series, keys=items)
df.set_index(['Step Typ', 'Load', 'Temperature'], inplace=True)
df.loc[('Stribeck')]
for force, _ in df.groupby(level=1):
    plt.figure(figsize=(15, 12))
    for i, row in df.loc[('Traction'), force].iterrows():
        plt.ylim(0, 0.1)
        plt.ylabel('Traction Coeff (-)')
        plt.xlabel('Rolling Speed (mm/s)')
        plt.title('Title comes later', loc='left')
        plt.plot(row['Rolling Speed (mm/s)'], row['Traction (-)'], label=f"{i} - {force}")
        print(f"{i} - {force}")
    plt.show()
I have changed your plotting loop. The code below will generate two plots for Traction (one for each Load value), each with two curves (one for each temperature). I plot against SRR (%), since the Traction steps have no Rolling Speed (mm/s) values. I have also commented out the line where you set ylim(a, b), because it can lead to an empty plot if the data fall outside the (a, b) range.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Step 1': {'Step Typ': 'Traction', 'SRR (%)': {1: 8.384, 2: 9.815, 3: 7.531, 4: 10.209, 5: 7.989, 6: 7.331, 7: 5.008, 8: 2.716, 9: 9.6, 10: 7.911}, 'Traction (-)': {1: 5.602, 2: 6.04, 3: 2.631, 4: 2.952, 5: 8.162, 6: 9.312, 7: 4.994, 8: 2.959, 9: 10.075, 10: 5.498}, 'Temperature': 30, 'Load': 40}, 'Step 3': {'Step Typ': 'Traction', 'SRR (%)': {1: 2.909, 2: 5.552, 3: 5.656, 4: 9.043, 5: 3.424, 6: 7.382, 7: 3.916, 8: 2.665, 9: 4.832, 10: 3.993}, 'Traction (-)': {1: 9.158, 2: 6.721, 3: 7.787, 4: 7.491, 5: 8.267, 6: 2.985, 7: 5.882, 8: 3.591, 9: 6.334, 10: 10.43}, 'Temperature': 80, 'Load': 40}, 'Step 5': {'Step Typ': 'Traction', 'SRR (%)': {1: 4.765, 2: 9.293, 3: 7.608, 4: 7.371, 5: 4.87, 6: 4.832, 7: 6.244, 8: 6.488, 9: 5.04, 10: 2.962}, 'Traction (-)': {1: 6.656, 2: 7.872, 3: 8.799, 4: 7.9, 5: 4.22, 6: 6.288, 7: 7.439, 8: 7.77, 9: 5.977, 10: 9.395}, 'Temperature': 30, 'Load': 70}, 'Step 7': {'Step Typ': 'Traction', 'SRR (%)': {1: 9.46, 2: 2.83, 3: 3.249, 4: 9.273, 5: 8.792, 6: 9.673, 7: 6.784, 8: 3.838, 9: 8.779, 10: 4.82}, 'Traction (-)': {1: 5.245, 2: 8.491, 3: 10.088, 4: 9.988, 5: 4.886, 6: 4.168, 7: 8.628, 8: 5.038, 9: 7.712, 10: 3.961}, 'Temperature': 80, 'Load': 70}, 'Step 2': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.862, 2: 4.71, 3: 4.537, 4: 6.35, 5: 6.691, 6: 5.337, 7: 8.419, 8: 10.303, 9: 5.018, 10: 10.195}, 'Traction (-)': {1: 6.674, 2: 10.137, 3: 2.822, 4: 5.494, 5: 9.986, 6: 9.095, 7: 3.53, 8: 6.96, 9: 8.251, 10: 7.836}, 'Temperature': 30, 'Load': 40}, 'Step 4': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 4.04, 2: 8.288, 3: 3.731, 4: 10.137, 5: 5.32, 6: 8.504, 7: 5.917, 8: 9.677, 9: 8.641, 10: 7.685}, 'Traction (-)': {1: 9.522, 2: 4.749, 3: 3.46, 4: 3.21, 5: 5.005, 6: 9.886, 7: 8.023, 8: 5.935, 9: 8.74, 10: 5.117}, 'Temperature': 80, 'Load': 40}, 'Step 6': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 6.244, 2: 7.015, 3: 5.998, 4: 4.894, 5: 6.117, 6: 6.644, 7: 7.619, 8: 10.477, 9: 9.61, 10: 2.958}, 'Traction (-)': 
{1: 7.353, 2: 7.98, 3: 6.675, 4: 8.853, 5: 7.537, 6: 5.256, 7: 4.923, 8: 10.293, 9: 2.873, 10: 10.407}, 'Temperature': 30, 'Load': 70}, 'Step 8': {'Step Typ': 'Stribeck', 'Rolling Speed (mm/s)': {1: 3.475, 2: 2.756, 3: 7.809, 4: 9.449, 5: 2.72, 6: 4.133, 7: 10.139, 8: 10.0, 9: 3.71, 10: 8.267}, 'Traction (-)': {1: 6.307, 2: 2.83, 3: 9.258, 4: 3.405, 5: 9.659, 6: 6.662, 7: 6.413, 8: 6.488, 9: 7.972, 10: 6.288}, 'Temperature': 80, 'Load': 70} }
df = pd.DataFrame(data)
items = list()
series = list()
for item, d in data.items():
    items.append(item)
    series.append(pd.DataFrame.from_dict(d))
df = pd.concat(series, keys=items)
df.set_index(['Step Typ', 'Load', 'Temperature'], inplace=True)
for force, _ in df.groupby(level=1):
    fig, ax = plt.subplots(figsize=(8, 6))
    df_step = df.loc[('Traction'), force]
    for temperature in df_step.index.unique():
        df_temp = df_step.loc[temperature].sort_values('SRR (%)')
        # ax.set_ylim(0, 0.1)
        ax.set_ylabel('Traction Coeff (-)')
        ax.set_xlabel('SRR (%)')
        ax.set_title('Title comes later', loc='left')
        ax.plot(df_temp['SRR (%)'], df_temp['Traction (-)'], label = f'T = {df_temp.index.unique().values[0]}°C - Load = {force}')
    ax.legend(frameon = True)
    plt.show()
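To cover both step types (and so reach the four plots the question asks for), the same idea can be sketched with a flat DataFrame and a lookup of the x-axis column per step type. This is an assumption-laden variant, not the code above: the shortened `data` dict, the `x_columns` mapping, and the `figures` dict are introduced only for illustration, and the non-interactive Agg backend is selected so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Shortened, hypothetical stand-in for the `data` dict
data = {
    "Step 1": {"Step Typ": "Traction", "SRR (%)": {1: 8.4, 2: 9.8},
               "Traction (-)": {1: 5.6, 2: 6.0}, "Temperature": 30, "Load": 40},
    "Step 3": {"Step Typ": "Traction", "SRR (%)": {1: 2.9, 2: 5.6},
               "Traction (-)": {1: 9.2, 2: 6.7}, "Temperature": 80, "Load": 40},
    "Step 2": {"Step Typ": "Stribeck", "Rolling Speed (mm/s)": {1: 4.9, 2: 4.7},
               "Traction (-)": {1: 6.7, 2: 10.1}, "Temperature": 30, "Load": 40},
}

# One flat frame; columns missing for a step type become NaN
df = pd.concat([pd.DataFrame(d) for d in data.values()], ignore_index=True)

# Which column goes on the x-axis for each step type
x_columns = {"Traction": "SRR (%)", "Stribeck": "Rolling Speed (mm/s)"}

figures = {}
for (step_type, load), group in df.groupby(["Step Typ", "Load"]):
    x_col = x_columns[step_type]
    fig, ax = plt.subplots(figsize=(8, 6))
    # One curve per temperature within this step type and load
    for temperature, curve in group.groupby("Temperature"):
        curve = curve.sort_values(x_col)
        ax.plot(curve[x_col], curve["Traction (-)"], label=f"T = {temperature}°C")
    ax.set_xlabel(x_col)
    ax.set_ylabel("Traction (-)")
    ax.set_title(f"{step_type}, Load = {load}", loc="left")
    ax.legend(frameon=True)
    figures[(step_type, load)] = fig
```

With the full data set this loop would produce one figure per (Step Typ, Load) combination; call `plt.show()` in an interactive session to display them.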

Python: looping through 2 dataframes having thresholds and calculating revenue, stuck

I am trying to solve a business problem using Python but am having difficulty coming up with a script to solve it. I have tried to loop through the dataframe using df.iterrows(), but I am stuck because I don't know how to proceed.
We process volumes in production orders of one type of resource that we need to process FIFO (first in, first out). Each lot has a certain volume and price; after using up a lot we start with the next lot (FIFO).
Question: How can I automate the calculation of the column Revenu? Can you come up with some Python code that I can use to automate this process? Would you use a while or for loop, and would you iterate through the dataframe?
Below I posted a screenshot of the solution: on the left the production orders, and on the right the volume and price per lot.
Below the image I posted 2 dictionaries containing the data of the screenshot.
I would really appreciate your help...
{'Productionorder': {0: 'Productionorder 1',
1: 'Productionorder 2',
2: 'Productionorder 3',
3: 'Productionorder 4',
4: 'Productionorder 5',
5: 'Productionorder 6',
6: 'Productionorder 7',
7: 'Productionorder 8',
8: 'Productionorder 9',
9: 'Productionorder 10',
10: 'Productionorder 11',
11: 'Productionorder 12',
12: 'Productionorder 13',
13: 'Productionorder 14',
14: 'Productionorder 15',
15: 'Productionorder 16',
16: 'Productionorder 17',
17: 'Productionorder 18',
18: 'Productionorder 19',
19: 'Productionorder 20',
20: 'Productionorder 21',
21: 'Productionorder 22'},
'Processed volume': {0: 810,
1: 3240,
2: 3177,
3: 1620,
4: 6480,
5: 5120,
6: 10880,
7: 13770,
8: 21060,
9: 4860,
10: 810,
11: 1620,
12: 15390,
13: 15390,
14: 6800,
15: 4480,
16: 10200,
17: 16650,
18: 2550,
19: 9050,
20: 9900,
21: 3200},
'Lotno.': {0: 1,
1: 1,
2: 1,
3: 1,
4: 2,
5: 2,
6: 2,
7: 2,
8: 2,
9: 2,
10: 2,
11: 2,
12: 2,
13: 3,
14: 3,
15: 3,
16: 3,
17: 3,
18: 3,
19: 3,
20: 4,
21: 4},
'Left of Lotno.': {0: 8490,
1: 5250,
2: 2073,
3: 453,
4: 75973,
5: 70853,
6: 59973,
7: 46203,
8: 25143,
9: 20283,
10: 19473,
11: 17853,
12: 2463,
13: 52073,
14: 45273,
15: 40793,
16: 30593,
17: 13943,
18: 11393,
19: 2343,
20: 38443,
21: 35243},
'Revenu': {0: 1741.5,
1: 6966.0,
2: 6830.549999999999,
3: 3483.0,
4: 10315.800000000001,
5: 7936.0,
6: 16864.0,
7: 21343.5,
8: 32643.0,
9: 7533.0,
10: 1255.5,
11: 2511.0,
12: 23854.5,
13: 20622.750000000004,
14: 8840.0,
15: 5824.0,
16: 13260.0,
17: 21645.0,
18: 3315.0,
19: 11765.0,
20: 12492.15,
21: 4000.0}}
{'Date': {0: Timestamp('2021-01-01 00:00:00'),
1: Timestamp('2021-01-02 00:00:00'),
2: Timestamp('2021-01-03 00:00:00'),
3: Timestamp('2021-01-04 00:00:00')},
'Lotno.': {0: 1, 1: 2, 2: 3, 3: 4},
'Volume': {0: 9300, 1: 82000, 2: 65000, 3: 46000},
'Price': {0: 2.15, 1: 1.55, 2: 1.3, 3: 1.25}}
Assuming you have two dataframes:
One for the Production Orders
And another for the Lot Details
The following function should allow you to calculate the Revenues (Along with the 'Lotno.' and 'Left of Lotno.' intermediary columns)
Requirements for each dataframe:
The Production Orders DataFrame must:
contain a column with the title 'Processed volume'
have an index of consecutive integers starting at 0
The Lot Details DataFrame must:
contain the columns ['Lotno.', 'Volume', 'Price']
have at least one row
have its rows ordered in the order of expected depletion
If all lots are depleted, no additional revenue will be generated.
def fill_revenue(df1_orig, df2):
    """
    df1_orig is the Production Orders DataFrame
    df2 is the Lot Details DataFrame
    The returned DataFrame is based on a copy of the df1_orig
    """
    df1 = df1_orig.copy()
    # Create Empty Columns for calculated fields
    df1['Lotno.'] = None
    df1['Left of Lotno.'] = None
    df1['Revenu'] = None

    def recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict=None):
        """A function used to update the new values of a row"""
        if return_dict is None:
            return_dict = {'Revenu': 0}
        return_dict.update({'Lotno.': current_lot, 'Left of Lotno.': current_lot_quantity})
        lot_info = df2.loc[df2['Lotno.'] == current_lot].iloc[0]
        # start calculation
        if current_lot_quantity > order_volume:
            return_dict['Revenu'] += order_volume * lot_info['Price']
            current_lot_quantity -= order_volume
            order_volume = 0
            return_dict['Left of Lotno.'] = current_lot_quantity
        else:
            return_dict['Revenu'] += current_lot_quantity * lot_info['Price']
            order_volume -= current_lot_quantity
            try:
                lot_info = df2.iloc[df2.index.get_loc(lot_info.name) + 1]
            except IndexError:
                return_dict['Left of Lotno.'] = 0
                return return_dict
            current_lot = lot_info['Lotno.']
            current_lot_quantity = lot_info['Volume']
            recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict)
        return return_dict

    # updating each row of the Production Orders DataFrame
    for idx, row in df1.iterrows():
        order_volume = row['Processed volume']
        current_lot = df2.iloc[0]['Lotno.'] if idx == 0 else df1.iloc[idx - 1]['Lotno.']
        current_lot_quantity = df2.iloc[0]['Volume'] if idx == 0 else df1.iloc[idx - 1]['Left of Lotno.']
        update_dict = recursive_revenu_calc(order_volume, current_lot, current_lot_quantity)
        for key, value in update_dict.items():
            df1.loc[idx, key] = value
    return df1
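For comparison, the same FIFO depletion can be sketched without recursion or DataFrames, using a plain while loop over the lots. This is an alternative written for illustration, not the function above; the name `fifo_revenue` and the list-based inputs are assumptions. The sample numbers are the first five orders and the first two lots from the question.

```python
def fifo_revenue(order_volumes, lots):
    """Return the revenue per order.

    order_volumes: processed volume per production order, in order.
    lots: list of (volume, price) tuples in the order they are depleted.
    """
    revenues = []
    lot_index = 0
    remaining = lots[0][0] if lots else 0  # volume left in the current lot
    for volume in order_volumes:
        revenue = 0.0
        # Keep drawing from lots until the order is filled or lots run out
        while volume > 0 and lot_index < len(lots):
            taken = min(volume, remaining)
            revenue += taken * lots[lot_index][1]
            volume -= taken
            remaining -= taken
            if remaining == 0:
                # Current lot is used up: move on to the next one (FIFO)
                lot_index += 1
                if lot_index < len(lots):
                    remaining = lots[lot_index][0]
        revenues.append(revenue)
    return revenues


orders = [810, 3240, 3177, 1620, 6480]
lots = [(9300, 2.15), (82000, 1.55)]
revenues = fifo_revenue(orders, lots)
```

The fifth order spans two lots: 453 units come from lot 1 and the remaining 6027 from lot 2, which is why its revenue mixes both prices. As in the function above, once every lot is depleted no further revenue is added.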

Change size of scatterplot marker based on column value - Python 3.6.x

I have a dataset that looks like this:
{'ScoreDate': {0: '12/1/2019',
1: '1/1/2020',
2: '2/1/2020',
3: '3/1/2020',
4: '4/1/2020',
5: '5/1/2020',
6: '6/1/2020',
7: '7/1/2020',
8: '7/1/2020',
9: '7/1/2020',
10: '7/1/2020',
11: '7/1/2020',
12: '7/1/2020',
13: '8/1/2020',
14: '8/1/2020',
15: '8/1/2020',
16: '8/1/2020',
17: '8/1/2020',
18: '9/1/2020'},
'CustomerID': {0: 4554,
1: 4554,
2: 4554,
3: 4554,
4: 4554,
5: 4554,
6: 4554,
7: 4554,
8: 4554,
9: 4554,
10: 4554,
11: 4554,
12: 4554,
13: 4554,
14: 4554,
15: 4554,
16: 4554,
17: 4554,
18: 4554},
'Supplier_Name': {0: 'ABC Company',
1: 'ABC Company',
2: 'ABC Company',
3: 'ABC Company',
4: 'ABC Company',
5: 'ABC Company',
6: 'ABC Company',
7: 'ABC Company',
8: 'ABC Company',
9: 'ABC Company',
10: 'ABC Company',
11: 'ABC Company',
12: 'ABC Company',
13: 'ABC Company',
14: 'ABC Company',
15: 'ABC Company',
16: 'ABC Company',
17: 'ABC Company',
18: 'ABC Company'},
'Score': {0: 90,
1: 90,
2: 90,
3: 75,
4: 75,
5: 75,
6: 90,
7: 90,
8: 90,
9: 90,
10: 90,
11: 90,
12: 90,
13: 90,
14: 90,
15: 90,
16: 90,
17: 90,
18: 90},
'EDate': {0: nan,
1: nan,
2: nan,
3: nan,
4: '4/1/2020',
5: nan,
6: '6/1/2020',
7: '7/1/2020',
8: '7/1/2020',
9: '7/1/2020',
10: '7/1/2020',
11: '7/1/2020',
12: '7/1/2020',
13: '8/1/2020',
14: '8/1/2020',
15: '8/1/2020',
16: '8/1/2020',
17: '8/1/2020',
18: nan}}
And some code to produce a line plot of the Score with markers for each EDate:
size = 15
params = {'legend.fontsize': 'large',
'figure.figsize': (20,8),
'axes.labelsize': size,
'axes.titlesize': size,
'xtick.labelsize': size*0.75,
'ytick.labelsize': size*0.75,
'axes.titlepad': 25}
plt.figure(figsize=(10,5))
sns.set(style="darkgrid")
plt.rcParams.update(params)
sns.lineplot(data=df, x='ScoreDate', y='Score', ci=None,
linewidth=2, palette="deep").set(title="Score")
sns.scatterplot(data=df, x='EDate', y='Score', color='orange')
Which produces:
I am looking to accomplish:
Setting the marker size equal to how many EDates (events) occurred for that date
I have successfully grouped the data using:
c_df = df.groupby(['ScoreDate', 'Score'])['EDate'].count().reset_index(name='count')
size = 15
params = {'legend.fontsize': 'large',
'figure.figsize': (20,8),
'axes.labelsize': size,
'axes.titlesize': size,
'xtick.labelsize': size*0.75,
'ytick.labelsize': size*0.75,
'axes.titlepad': 25}
plt.figure(figsize=(10,5))
sns.set(style="darkgrid")
plt.rcParams.update(params)
sns.lineplot(data=c_df, x='ScoreDate', y='Score', ci=None,
linewidth=2, palette="deep").set(title="Score")
sns.scatterplot(data=c_df, x='ScoreDate', y='count', color='orange')
Which produces:
Which is clearly not what I am looking for. How can I accomplish my three objectives?
I believe you're looking for the size parameter:
sns.lineplot(data=df, x='ScoreDate', y='Score', ci=None,
linewidth=2, palette="deep").set(title="Score")
sns.scatterplot(data=c_df, x='ScoreDate', y='Score', size='count', color='orange')
Output:
Note: You can also pass the sizes parameter (e.g. sizes=[0,30,60,90]) to manually set the desired size for each count group. So, for example:
Notice that the marker sizes are different (the zeros, for example, do not show at all). Alternatively, you can filter them out of c_df with c_df.query('count>0') before plotting.
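The count-then-filter step can also be sketched with plain matplotlib, where the `s` argument of `scatter` takes the marker areas directly. This swaps seaborn's `size` parameter for matplotlib's `s` purely for illustration; the shortened DataFrame and the scaling factor of 40 are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Shortened, hypothetical stand-in for the dataset
df = pd.DataFrame({
    "ScoreDate": ["5/1/2020", "6/1/2020", "7/1/2020", "7/1/2020", "7/1/2020"],
    "Score": [75, 90, 90, 90, 90],
    "EDate": [None, "6/1/2020", "7/1/2020", "7/1/2020", "7/1/2020"],
})

# Number of events (non-null EDate values) per date and score
c_df = df.groupby(["ScoreDate", "Score"])["EDate"].count().reset_index(name="count")

# Drop dates without events so they get no marker at all
events = c_df[c_df["count"] > 0]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(c_df["ScoreDate"], c_df["Score"], linewidth=2)
# Marker area grows with the event count; 40 is an arbitrary scale factor
ax.scatter(events["ScoreDate"], events["Score"],
           s=events["count"] * 40, color="orange", zorder=3)
ax.set_title("Score")
```

The dates are plotted as plain strings here, so they appear in sorted string order rather than chronological order; converting the column with `pd.to_datetime` first would be the usual remedy.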

How to remove duplicates based on lower frequency [duplicate]

This question already has answers here:
Get the row(s) which have the max value in groups using groupby
I have a table that looks like this
I want to keep only the id with the highest freq for each brand. For example, in the case of audi both ids have the same frequency, so keep only one. In the case of mercedes-benz, keep the latter one since it has frequency 7.
This is my dataframe:
{'Brand':
{0: 'audi',
1: 'audi',
2: 'bmw',
3: 'dacia',
4: 'fiat',
5: 'ford',
6: 'ford',
7: 'honda',
8: 'honda',
9: 'hyundai',
10: 'kia',
11: 'mercedes-benz',
12: 'mercedes-benz',
13: 'nissan',
14: 'nissan',
15: 'opel',
16: 'renault',
17: 'renault',
18: 'renault',
19: 'renault',
20: 'toyota',
21: 'toyota',
22: 'volvo',
23: 'vw',
24: 'vw',
25: 'vw',
26: 'vw'},
'id':
{0: 'audi_a4_dynamic_2016_otomatik',
1: 'audi_a6_standart_2015_otomatik',
2: 'bmw_5 series_executive_2016_otomatik',
3: 'dacia_duster_laureate_2017_manuel',
4: 'fiat_egea_easy_2017_manuel',
5: 'ford_focus_trend x_2015_manuel',
6: 'ford_focus_trend x_2015_otomatik',
7: 'honda_civic_eco elegance_2017_otomatik',
8: 'honda_cr-v_executive_2018_otomatik',
9: 'hyundai_tucson_elite plus_2017_otomatik',
10: 'kia_sportage_concept plus_2015_otomatik',
11: 'mercedes-benz_c-class_amg_2016_otomatik',
12: 'mercedes-benz_e-class_edition e_2015_otomatik',
13: 'nissan_qashqai_black edition_2014_manuel',
14: 'nissan_qashqai_sky pack_2015_otomatik',
15: 'opel_astra_edition_2016_manuel',
16: 'renault_clio_joy_2016_manuel',
17: 'renault_kadjar_icon_2015_otomatik',
18: 'renault_kadjar_icon_2016_otomatik',
19: 'renault_mégane_touch_2017_otomatik',
20: 'toyota_corolla_touch_2015_otomatik',
21: 'toyota_corolla_touch_2016_otomatik',
22: 'volvo_s60_advance_2018_otomatik',
23: 'vw_jetta_comfortline_2013_otomatik',
24: 'vw_passat_highline_2017_otomatik',
25: 'vw_tiguan_sport&style_2012_manuel',
26: 'vw_tiguan_sport&style_2013_manuel'},
'freq': {0: 4,
1: 4,
2: 7,
3: 4,
4: 4,
5: 4,
6: 4,
7: 4,
8: 4,
9: 4,
10: 4,
11: 4,
12: 7,
13: 4,
14: 4,
15: 4,
16: 4,
17: 4,
18: 4,
19: 4,
20: 4,
21: 4,
22: 4,
23: 4,
24: 7,
25: 4,
26: 4}}
Edit: tried one of the answers and got an extra level of header
You need to group by Brand with pandas groupby and then aggregate with respect to the maximal frequency.
Something like this should work (note that it returns only the maximal freq per brand, not the matching id):
df.groupby('Brand')[['id', 'freq']].agg({'freq': 'max'})
To get your result, run:
result = df.groupby('Brand', as_index=False).apply(
    lambda grp: grp[grp.freq == grp.freq.max()].iloc[0])
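Another common way to get one row per brand with the highest freq is `idxmax`, which avoids `apply` and the extra header level. A minimal sketch on a shortened, hypothetical version of the dataframe:

```python
import pandas as pd

# Shortened, hypothetical stand-in for the dataframe
df = pd.DataFrame({
    "Brand": ["audi", "audi", "bmw", "mercedes-benz", "mercedes-benz"],
    "id": [
        "audi_a4_dynamic_2016_otomatik",
        "audi_a6_standart_2015_otomatik",
        "bmw_5 series_executive_2016_otomatik",
        "mercedes-benz_c-class_amg_2016_otomatik",
        "mercedes-benz_e-class_edition e_2015_otomatik",
    ],
    "freq": [4, 4, 7, 4, 7],
})

# idxmax gives the row label of the first maximal freq within each brand;
# .loc then selects those complete rows
result = df.loc[df.groupby("Brand")["freq"].idxmax()].reset_index(drop=True)
```

On ties, such as audi here, `idxmax` keeps the first occurrence, which matches the "keep only one" requirement.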
