Here is my pandas code:
def calcObj(row):
    d = dict(calc1 = iferror(row.Hours1.sum(), row.Hours2.sum(), 0, '+'))
    if row.Process == 'A': # this doesn't work
        d['ProcessKey'] = 700
    else:
        d['ProcessKey'] = 500
    return pd.Series(d)

df.groupby(['MainProcess']).apply(calcObj)
I am trying to check whether a process name is A and, if it is, return a different value.
Unfortunately it doesn't work and I get the following error:
AttributeError: 'DataFrame' object has no attribute 'Process '
I assume it's because I am grouping only by MainProcess and not by Process.
Is there any way to get access to this item within the apply function? Any other workaround would also be very helpful.
Here is my example dataframe. Bg, MainProcess, CoreProcess and Process are index levels; Hours1 and Hours2 are columns:
Bg         MainProcess    CoreProcess    Process      Hours1  Hours2
Building1  MainProcess-1  CoreProcess-1  S-Process-1  150     250
                                         S-Process-2  150     250
                          CoreProcess-2  S-Process-3  150     250
                                         S-Process-1  150     250
                                         S-Process-2  150     250
Building2  MainProcess-2  CoreProcess-3  S-Process-1  150     250
                                         S-Process-2  150     250
           MainProcess-3  CoreProcess-4  S-Process-1  150     250
                                         S-Process-2  150     250
                                         S-Process-3  150     250
Beware: the levels of the index are not columns of the DataFrame!
In your example, Process is a level of the (multi-)index, so df['Process'] raises a KeyError regardless of the groupby. You must move that level out of the index into a regular column before you can use it as one. For example, you could reset it before the groupby:
df.reset_index(level='Process').groupby(['MainProcess']).apply(calcObj)
But beware: calcObj will not receive rows here, but the sub-DataFrames whose rows share the same MainProcess value...
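As a minimal sketch of what that means in practice (the tiny frame, the process name "A" and the helper calc_obj below are made up for illustration, and a plain sum stands in for the asker's iferror helper), the function has to reduce the whole Process column of each group, for example with .any():

```python
import pandas as pd

# Hypothetical miniature of the asker's frame: four index levels, two columns.
index = pd.MultiIndex.from_tuples(
    [
        ("Building1", "MainProcess-1", "CoreProcess-1", "A"),
        ("Building1", "MainProcess-1", "CoreProcess-1", "B"),
        ("Building2", "MainProcess-2", "CoreProcess-3", "B"),
    ],
    names=["Bg", "MainProcess", "CoreProcess", "Process"],
)
df = pd.DataFrame({"Hours1": [150, 150, 150], "Hours2": [250, 250, 250]}, index=index)


def calc_obj(group):
    # `group` is a sub-DataFrame, so Process is a whole column here, not a scalar.
    total = group["Hours1"].sum() + group["Hours2"].sum()
    has_a = (group["Process"] == "A").any()
    return pd.Series({"calc1": total, "ProcessKey": 700 if has_a else 500})


# Move Process out of the index, then group on the MainProcess index level.
result = df.reset_index(level="Process").groupby("MainProcess").apply(calc_obj)
print(result)
```

Whether "any row in the group is A" is the right rule depends on what ProcessKey should mean per MainProcess; .all() or grouping by Process as well are equally plausible choices.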
I am trying to create levels of managers within a dataset I have. It looks similar to this:
EID  ReportingManager  ManagerLevel1  ManagerLevel2  ManagerLevel3
123  201               101            201            301
124  101               101            204            306
125  401               101            206            304
EID is the employee's ID, ReportingManager is the ID of the person they report to, and the manager levels run from 1 (the highest-level manager) down to 3 (a lower-level manager). What I want is to create another column that ranks the level of the reporting manager's ID.
Something like this:
EID  ReportingManager  ManagerLevel1  ManagerLevel2  ManagerLevel3  ReportingManagerLevel
123  201               101            201            301            1
124  101               101            204            306            0
125  401               101            206            304            3
The idea is to see how far the reporting manager is from the top level. If the reporting manager is the top manager, the value is 0, and everyone who reports to him would be a 1. If the EID reports to the level 2 manager, then that manager is 1 away from the top manager and all of those EIDs would be 2 away from the top. So far I have only been working on getting the managers' levels figured out, but I run into the issue that every manager ends up with a manager level of 3.
My code looks like this:
manager_level = []
num = list(df['ID'])
for num in df['ReportingManager']:
    if num is df['ManagerLevel1']:
        manager_level.append('0')
    elif num is df['ManagerLevel2']:
        manager_level.append('1')
    elif num is df['ManagerLevel3']:
        manager_level.append('2')
    else:
        manager_level.append('3')
df['Manager_Level'] = manager_level
Note: df['postitonNum'] contains the IDs of all the managers and employees.
I reproduced your df with this:
import pandas as pd

data = {
    "EID": [123, 124, 125],
    "ReportingManager": [201, 101, 401],
    "ManagerLevel1": [101, 101, 101],
    "ManagerLevel2": [201, 204, 206],
    "ManagerLevel3": [301, 306, 304],
}
df = pd.DataFrame(data=data)
I suggest leveraging the ID numbers themselves: 101 = 0, 201 = 1 and so on. Assuming you use pandas (based on the df variable and the dataframe tag), you can use the apply method like this:
import math
df["ReportingManagerLevel"] = df["ReportingManager"].apply(lambda x: math.floor(x/100)) -1
This divides each Reporting Manager value by 100, rounds down to get the leading digit of a three-digit ID, and then subtracts 1. It means that a manager with the ID 502 would get the value 4. If that is something you would like to avoid, you could always use the modulo operator.
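The same idea can be sketched without apply or math.floor by using integer (floor) division on the whole column at once. This assumes, as the answer above does, that the hundreds digit of a three-digit ID encodes the level:

```python
import pandas as pd

# Same sample IDs as in the question.
df = pd.DataFrame({
    "EID": [123, 124, 125],
    "ReportingManager": [201, 101, 401],
})

# 101 // 100 - 1 == 0, 201 // 100 - 1 == 1, 401 // 100 - 1 == 3
df["ReportingManagerLevel"] = df["ReportingManager"] // 100 - 1
print(df)
```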
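Instead of is, you need to use the equality operator == to compare the values in the columns. is tests object identity, and your loop compares a single value against a whole column, so none of the branches ever match and every row falls through to the else.
You can try this code, which compares the values row by row: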
manager_level = []
for i, row in df.iterrows():
    if row['ReportingManager'] == row['ManagerLevel1']:
        manager_level.append(0)
    elif row['ReportingManager'] == row['ManagerLevel2']:
        manager_level.append(1)
    elif row['ReportingManager'] == row['ManagerLevel3']:
        manager_level.append(2)
    else:
        manager_level.append(3)
df['ReportingManagerLevel'] = manager_level
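For larger frames, the same row-wise comparison can be sketched without a Python loop. This is an alternative to the iterrows version, not what the answer used: numpy.select picks the first matching condition per row and falls back to the default otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "EID": [123, 124, 125],
    "ReportingManager": [201, 101, 401],
    "ManagerLevel1": [101, 101, 101],
    "ManagerLevel2": [201, 204, 206],
    "ManagerLevel3": [301, 306, 304],
})

# One boolean column-wise comparison per level, checked in order.
conditions = [
    df["ReportingManager"] == df["ManagerLevel1"],
    df["ReportingManager"] == df["ManagerLevel2"],
    df["ReportingManager"] == df["ManagerLevel3"],
]
df["ReportingManagerLevel"] = np.select(conditions, [0, 1, 2], default=3)
print(df)
```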
I am trying to sum duplicate rows in the amount column, as shown in the screenshot:
So if report_name, line_item and column_item are the same, I want to sum the values in the amount column and create one row instead of two, but without losing the structure of the dataframe.
But I don't want to sum duplicates if they have column_item 50 or 30.
This is my data frame:
entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0050;0010;UNDEFINED;n/a;40409261.0100539;22/03/2022
456;test;456;C_74_00_a;0040;0010;UNDEFINED;n/a;46860662.1948734;22/03/2022
456;test;456;C_74_00_a;0060;0010;UNDEFINED;n/a;1783648.53838003;22/03/2022
456;test;456;C_74_00_a;0070;0010;UNDEFINED;n/a;7847645.76582712;22/03/2022
456;test;456;C_73_00_a;0310;0010;UNDEFINED;n/a;48100909.2077918;22/03/2022
456;test;456;C_74_00_a;0201;0010;UNDEFINED;n/a;45652287.0078367;22/03/2022
456;test;456;C_72_00_a;0590;0010;UNDEFINED;n/a;19988230.281333;22/03/2022
456;test;456;C_73_00_a;0480;0010;UNDEFINED;n/a;28243908.6235795;22/03/2022
456;test;456;C_73_00_a;0490;0010;UNDEFINED;n/a;12655653.8647408;22/03/2022
456;test;456;C_73_00_a;0530;0010;UNDEFINED;n/a;27792100.4510517;22/03/2022
456;test;456;C_73_00_a;0570;0010;UNDEFINED;n/a;20768476.5051213;22/03/2022
456;test;456;C_73_00_a;0480;0010;UNDEFINED;n/a;28601515.4535418;22/03/2022
456;test;456;C_73_00_a;0490;0010;UNDEFINED;n/a;17269663.9202129;22/03/2022
456;test;456;C_73_00_a;0530;0010;UNDEFINED;n/a;21250486.2477187;22/03/2022
456;test;456;C_73_00_a;0570;0010;UNDEFINED;n/a;12924566.8399212;22/03/2022
456;test;456;C_73_00_a;0110;0010;UNDEFINED;n/a;17299383.641137;22/03/2022
456;test;456;C_73_00_a;0035;0010;UNDEFINED;n/a;19054145.8837998;22/03/2022
456;test;456;C_72_00_a;0280;0010;UNDEFINED;n/a;294348.91379545;22/03/2022
456;test;456;C_73_00_a;0340;0010;UNDEFINED;n/a;40803729.9712868;22/03/2022
456;test;456;C_74_00_a;0240;0010;UNDEFINED;n/a;25387904.3875074;22/03/2022
456;test;456;C_73_00_a;0340;0010;UNDEFINED;n/a;6951075.43742419;22/03/2022
456;test;456;C_74_00_a;0240;0010;UNDEFINED;n/a;12298844.1430509;22/03/2022
456;test;456;C_72_00_a;0040;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0050;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0060;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0070;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0090;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0110;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0240;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0260;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0080;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0100;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0120;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0130;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0140;0030;UNDEFINED;n/a;0.95;22/03/2022
456;test;456;C_72_00_a;0150;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0170;0030;UNDEFINED;n/a;0.8;22/03/2022
456;test;456;C_72_00_a;0190;0030;UNDEFINED;n/a;0.93;22/03/2022
456;test;456;C_72_00_a;0200;0030;UNDEFINED;n/a;0.88;22/03/2022
456;test;456;C_72_00_a;0250;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0270;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0280;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0290;0030;UNDEFINED;n/a;0.8;22/03/2022
456;test;456;C_72_00_a;0320;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0330;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0340;0030;UNDEFINED;n/a;0.7;22/03/2022
456;test;456;C_72_00_a;0350;0030;UNDEFINED;n/a;0.65;22/03/2022
456;test;456;C_72_00_a;0360;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0370;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0380;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0390;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0400;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0410;0030;UNDEFINED;n/a;0.7;22/03/2022
456;test;456;C_72_00_a;0420;0030;UNDEFINED;n/a;0.65;22/03/2022
456;test;456;C_72_00_a;0430;0030;UNDEFINED;n/a;0.6;22/03/2022
456;test;456;C_72_00_a;0440;0030;UNDEFINED;n/a;0.45;22/03/2022
456;test;456;C_72_00_a;0450;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0460;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0040;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0070;0050;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_73_00_a;0090;0050;UNDEFINED;n/a;0.03;22/03/2022
456;test;456;C_73_00_a;0110;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0260;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0310;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0480;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0490;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0530;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0570;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0590;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0080;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0140;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0150;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0170;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0190;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0200;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0250;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0280;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0290;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0360;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0370;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0380;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0390;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0400;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0420;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0430;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0450;0050;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_73_00_a;0035;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0180;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0204;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0206;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0207;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0220;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0230;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0300;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0510;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0520;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0540;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0560;0050;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0600;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;0610;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0630;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0640;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0660;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0670;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;0680;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0700;0050;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0710;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0890;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0900;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0913;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0914;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0915;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0916;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0917;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0918;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0940;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0950;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0960;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0970;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0980;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0990;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1000;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1010;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1030;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1040;0050;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_73_00_a;1050;0050;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_73_00_a;1060;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;1070;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;1080;0050;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_73_00_a;1090;0050;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_73_00_a;1100;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0040;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0060;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0070;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0090;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0201;0080;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_74_00_a;0260;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0080;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0130;0080;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_74_00_a;0150;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0170;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0190;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0180;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0230;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0160;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0210;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0269;0080;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_74_00_a;0273;0080;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_74_00_a;0277;0080;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_74_00_a;0281;0080;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_74_00_a;0285;0080;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_74_00_a;0289;0080;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_74_00_a;0293;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0301;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0303;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0309;0080;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_74_00_a;0313;0080;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_74_00_a;0317;0080;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_74_00_a;0321;0080;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_74_00_a;0325;0080;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_74_00_a;0329;0080;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_74_00_a;0333;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0341;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0343;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0345;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0070;0010;UNDEFINED;n/a;5198630.14;22/03/2022
456;test;456;C_72_00_a;0190;0010;UNDEFINED;n/a;835892217.0;22/03/2022
456;test;456;C_72_00_a;0260;0010;UNDEFINED;n/a;4745984333.0;22/03/2022
456;test;456;C_73_00_a;0035;0010;UNDEFINED;n/a;25424822307.28;22/03/2022
456;test;456;C_73_00_a;0070;0010;UNDEFINED;n/a;-33216232069.67;22/03/2022
456;test;456;C_73_00_a;0080;0010;UNDEFINED;n/a;-20966122130.53;22/03/2022
456;test;456;C_73_00_a;0110;0010;UNDEFINED;n/a;-9384698955.8;22/03/2022
456;test;456;C_73_00_a;0230;0010;UNDEFINED;n/a;2193605666.84;22/03/2022
456;test;456;C_73_00_a;0250;0010;UNDEFINED;n/a;-573769151.28;22/03/2022
456;test;456;C_73_00_a;0260;0010;UNDEFINED;n/a;3333715453.55;22/03/2022
456;test;456;C_73_00_a;0918;0010;UNDEFINED;n/a;124366.0;22/03/2022
456;test;456;C_74_00_a;0160;0010;UNDEFINED;n/a;-54345799619.07;22/03/2022
456;test;456;C_74_00_a;0260;0010;UNDEFINED;n/a;150348.16;22/03/2022
456;test;456;C_73_00_a;1100;0010;UNDEFINED;n/a;-37633449687.15;22/03/2022
456;test;456;C_73_00_a;1100;0020;UNDEFINED;n/a;-3764349687.15;22/03/2022
456;test;456;C_73_00_a;1040;0040;UNDEFINED;n/a;33764349687.15;22/03/2022
456;test;456;C_73_00_a;1045;0040;UNDEFINED;n/a;33764349687.15;22/03/2022
456;test;456;C_73_00_a;1045;0030;UNDEFINED;n/a;335098209.05;22/03/2022
456;test;456;C_73_00_a;1040;0010;UNDEFINED;n/a;7449687.15;22/03/2022
456;test;456;C_73_00_a;1045;0010;UNDEFINED;n/a;76449687.15;22/03/2022
456;test;456;C_72_00_a;0050;0010;UNDEFINED;n/a;40409261.0100539;22/03/2022
456;test;456;C_74_00_a;0040;0010;UNDEFINED;n/a;46860662.1948734;22/03/2022
456;test;456;C_74_00_a;0060;0010;UNDEFINED;n/a;1783648.53838003;22/03/2022
456;test;456;C_74_00_a;0070;0010;UNDEFINED;n/a;7847645.76582712;22/03/2022
456;test;456;C_73_00_a;0310;0010;UNDEFINED;n/a;48100909.2077918;22/03/2022
456;test;456;C_74_00_a;0201;0010;UNDEFINED;n/a;45652287.0078367;22/03/2022
456;test;456;C_72_00_a;0590;0010;UNDEFINED;n/a;19988230.281333;22/03/2022
456;test;456;C_73_00_a;0480;0010;UNDEFINED;n/a;28243908.6235795;22/03/2022
456;test;456;C_73_00_a;0490;0010;UNDEFINED;n/a;12655653.8647408;22/03/2022
456;test;456;C_73_00_a;0530;0010;UNDEFINED;n/a;27792100.4510517;22/03/2022
456;test;456;C_73_00_a;0570;0010;UNDEFINED;n/a;20768476.5051213;22/03/2022
456;test;456;C_73_00_a;0480;0010;UNDEFINED;n/a;28601515.4535418;22/03/2022
456;test;456;C_73_00_a;0490;0010;UNDEFINED;n/a;17269663.9202129;22/03/2022
456;test;456;C_73_00_a;0530;0010;UNDEFINED;n/a;21250486.2477187;22/03/2022
456;test;456;C_73_00_a;0570;0010;UNDEFINED;n/a;12924566.8399212;22/03/2022
456;test;456;C_73_00_a;0110;0010;UNDEFINED;n/a;17299383.641137;22/03/2022
456;test;456;C_73_00_a;0035;0010;UNDEFINED;n/a;19054145.8837998;22/03/2022
456;test;456;C_72_00_a;0280;0010;UNDEFINED;n/a;294348.91379545;22/03/2022
456;test;456;C_73_00_a;0340;0010;UNDEFINED;n/a;40803729.9712868;22/03/2022
456;test;456;C_74_00_a;0240;0010;UNDEFINED;n/a;25387904.3875074;22/03/2022
456;test;456;C_73_00_a;0340;0010;UNDEFINED;n/a;6951075.43742419;22/03/2022
456;test;456;C_74_00_a;0240;0010;UNDEFINED;n/a;12298844.1430509;22/03/2022
456;test;456;C_72_00_a;0040;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0050;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0060;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0070;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0090;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0110;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0240;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0260;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0080;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0100;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0120;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0130;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0140;0030;UNDEFINED;n/a;0.95;22/03/2022
456;test;456;C_72_00_a;0150;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0170;0030;UNDEFINED;n/a;0.8;22/03/2022
456;test;456;C_72_00_a;0190;0030;UNDEFINED;n/a;0.93;22/03/2022
456;test;456;C_72_00_a;0200;0030;UNDEFINED;n/a;0.88;22/03/2022
456;test;456;C_72_00_a;0250;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0270;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0280;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0290;0030;UNDEFINED;n/a;0.8;22/03/2022
456;test;456;C_72_00_a;0320;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0330;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0340;0030;UNDEFINED;n/a;0.7;22/03/2022
456;test;456;C_72_00_a;0350;0030;UNDEFINED;n/a;0.65;22/03/2022
456;test;456;C_72_00_a;0360;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0370;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0380;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0390;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0400;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0410;0030;UNDEFINED;n/a;0.7;22/03/2022
456;test;456;C_72_00_a;0420;0030;UNDEFINED;n/a;0.65;22/03/2022
456;test;456;C_72_00_a;0430;0030;UNDEFINED;n/a;0.6;22/03/2022
456;test;456;C_72_00_a;0440;0030;UNDEFINED;n/a;0.45;22/03/2022
456;test;456;C_72_00_a;0450;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0460;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0040;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0070;0050;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_73_00_a;0090;0050;UNDEFINED;n/a;0.03;22/03/2022
456;test;456;C_73_00_a;0110;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0260;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0310;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0480;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0490;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0530;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0570;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0590;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0080;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0140;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0150;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0170;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0190;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0200;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0250;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0280;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0290;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0360;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0370;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0380;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0390;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0400;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0420;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0430;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0450;0050;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_73_00_a;0035;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0180;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0204;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0206;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0207;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0220;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0230;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0300;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0510;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0520;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0540;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0560;0050;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0600;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;0610;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0630;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0640;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0660;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0670;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;0680;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0700;0050;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0710;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0890;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0900;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0913;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0914;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0915;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0916;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0917;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0918;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0940;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0950;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0960;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0970;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0980;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0990;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1000;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1010;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1030;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1040;0050;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_73_00_a;1050;0050;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_73_00_a;1060;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;1070;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;1080;0050;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_73_00_a;1090;0050;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_73_00_a;1100;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0040;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0060;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0070;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0090;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0201;0080;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_74_00_a;0260;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0080;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0130;0080;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_74_00_a;0150;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0170;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0190;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0180;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0230;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0160;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0210;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0269;0080;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_74_00_a;0273;0080;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_74_00_a;0277;0080;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_74_00_a;0281;0080;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_74_00_a;0285;0080;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_74_00_a;0289;0080;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_74_00_a;0293;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0301;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0303;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0309;0080;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_74_00_a;0313;0080;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_74_00_a;0317;0080;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_74_00_a;0321;0080;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_74_00_a;0325;0080;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_74_00_a;0329;0080;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_74_00_a;0333;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0341;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0343;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0345;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0070;0010;UNDEFINED;n/a;5198630.14;22/03/2022
456;test;456;C_72_00_a;0190;0010;UNDEFINED;n/a;835892217.0;22/03/2022
456;test;456;C_72_00_a;0260;0010;UNDEFINED;n/a;4745984333.0;22/03/2022
456;test;456;C_73_00_a;0035;0010;UNDEFINED;n/a;25424822307.28;22/03/2022
456;test;456;C_73_00_a;0070;0010;UNDEFINED;n/a;-33216232069.67;22/03/2022
456;test;456;C_73_00_a;0080;0010;UNDEFINED;n/a;-20966122130.53;22/03/2022
456;test;456;C_73_00_a;0110;0010;UNDEFINED;n/a;-9384698955.8;22/03/2022
456;test;456;C_73_00_a;0230;0010;UNDEFINED;n/a;2193605666.84;22/03/2022
456;test;456;C_73_00_a;0250;0010;UNDEFINED;n/a;-573769151.28;22/03/2022
456;test;456;C_73_00_a;0260;0010;UNDEFINED;n/a;3333715453.55;22/03/2022
456;test;456;C_73_00_a;0918;0010;UNDEFINED;n/a;124366.0;22/03/2022
456;test;456;C_74_00_a;0160;0010;UNDEFINED;n/a;-54345799619.07;22/03/2022
456;test;456;C_74_00_a;0260;0010;UNDEFINED;n/a;150348.16;22/03/2022
456;test;456;C_73_00_a;1100;0010;UNDEFINED;n/a;-37633449687.15;22/03/2022
456;test;456;C_73_00_a;1100;0020;UNDEFINED;n/a;-3764349687.15;22/03/2022
456;test;456;C_73_00_a;1040;0040;UNDEFINED;n/a;33764349687.15;22/03/2022
456;test;456;C_73_00_a;1045;0040;UNDEFINED;n/a;33764349687.15;22/03/2022
456;test;456;C_73_00_a;1045;0030;UNDEFINED;n/a;335098209.05;22/03/2022
456;test;456;C_73_00_a;1040;0010;UNDEFINED;n/a;7449687.15;22/03/2022
456;test;456;C_73_00_a;1045;0010;UNDEFINED;n/a;76449687.15;22/03/2022
I hope you can lead me in the right direction.
Because some rows must be left out of the sum, first filter out the rows that match the condition, sum the remaining rows and drop their duplicates, and then add the excluded rows back:
m = df['column_item'].isin([30, 50])
df1 = df[~m].copy()
df1['amount'] = df1.groupby(['report_name', 'line_item', 'column_item'])['amount'].transform('sum')
df1 = df1.drop_duplicates(['report_name', 'line_item', 'column_item'])
df = pd.concat([df1, df[m]])
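A minimal, self-contained run of those steps on a few made-up rows (the amounts below are illustrative, not taken from the question's data) might look like this. Note that read_csv parses 0010 as the integer 10, which is why the mask tests for 30 and 50:

```python
import io
import pandas as pd

# Hypothetical sample in the same semicolon-separated layout as the question.
csv_text = """report_name;line_item;column_item;amount
C_73_00_a;1100;0010;100.5
C_73_00_a;1100;0010;200.25
C_73_00_a;0040;0050;1.0
C_73_00_a;0040;0050;1.0
C_72_00_a;0040;0030;1.0
"""
df = pd.read_csv(io.StringIO(csv_text), sep=";")

keys = ["report_name", "line_item", "column_item"]
m = df["column_item"].isin([30, 50])           # rows that must NOT be summed

df1 = df[~m].copy()
df1["amount"] = df1.groupby(keys)["amount"].transform("sum")
df1 = df1.drop_duplicates(keys)                # keep one row per summed group

result = pd.concat([df1, df[m]], ignore_index=True)
print(result)
```

The two column_item 10 rows collapse into one row with the summed amount, while the column_item 30 and 50 rows pass through unchanged, duplicates included.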
If you need to get just duplicated rows and sum over them, you can do something like:
(df[(df[["report_name", "line_item","column_item"]].duplicated(keep=False)) & (~df['column_item'].isin([30, 50]))]
    .groupby(["report_name", "line_item","column_item"])["amount"]
    .sum())
This will result in something like:
report_name  line_item  column_item
C_72_00_a    50         10             4.040926e+07
             70         10             5.198630e+06
             190        10             8.358922e+08
             260        10             4.745984e+09
             280        10             2.943489e+05
                                           ...
C_74_00_a    329        80             3.500000e-01
             333        80             5.000000e-01
             341        80             5.000000e-01
             343        80             1.000000e+00
             345        80             1.000000e+00
Name: amount, Length: 67, dtype: float64
To make sure that you are getting the correct values, let's check the example you have shown in your question (the one with C_73_00_a, 1100 and 10):
dfResult = (df[(df[["report_name", "line_item","column_item"]].duplicated(keep=False)) & (~df['column_item'].isin([30, 50]))]
    .groupby(["report_name", "line_item","column_item"])["amount"]
    .sum())
dfResult[('C_73_00_a', 1100, 10)]
This will output:
-75266899374.3
Which is the result of -37633449687.15 + -37633449687.15 (as shown in your question).
Following up on my previous question
I have a list of records, shown below, taken from this table:
itemImage    name    nameFontSize  nameW  nameH  conutry  countryFont  countryW  countryH  code  codeFontSize  codeW  codeH
sample.jpg   Apple   142           1200   200    US       132          1200      400       1564  82            1300   600
sample2.jpg  Orange  142           1200   200    UK       132          1200      400       1562  82            1300   600
sample3.jpg  Lemon   142           1200   200    FR       132          1200      400       1563  82            1300   600
Right now, I have one function setText which takes all the elements of a row from this table.
I only have name, country and code for now but will be adding other stuff in the future.
I want to make this code more future-proof and dynamic. For example, if I add four new columns to my data following the same pattern, how do I make Python adjust to that automatically, instead of me declaring new variables in my code every time?
Basically, I want to send each group of 4 columns, starting from name, to a function, and continue until no columns are left. Once that's done, go to the next row and continue the loop.
Thanks to #Samwise who helped me clean up the code a bit.
import os
from PIL import Image, ImageFont, ImageDraw, features
import pandas as pd

path = './'
files = []
for (dirpath, dirnames, filenames) in os.walk(path):
    files.extend(filenames)

df = pd.read_excel(r'./data.xlsx')
records = list(df.to_records(index=False))

def setText(itemImage, name, nameFontSize, nameW, nameH,
            conutry, countryFontSize, countryW, countryH,
            code, codeFontSize, codeW, codeH):
    font1 = ImageFont.truetype(r'./font.ttf', nameFontSize)
    font2 = ImageFont.truetype(r'./font.ttf', countryFontSize)
    font3 = ImageFont.truetype(r'./font.ttf', codeFontSize)
    file = Image.open(f"./{itemImage}")
    draw = ImageDraw.Draw(file)
    draw.text((nameW, nameH), name, font=font1, fill='#ff0000',
              align="right", anchor="rm")
    draw.text((countryW, countryH), conutry, font=font2, fill='#ff0000',
              align="right", anchor="rm")
    draw.text((codeW, codeH), str(code), font=font3, fill='#ff0000',
              align="right", anchor="rm")
    file.save(f'done {itemImage}')

for i in records:
    setText(*i)
Sounds like df.columns might help. It is an attribute (not a method) holding an Index of the column labels, which you can iterate over to handle whatever columns are present.
for col in df.columns:
    ...  # handle each column here
The answers in this thread should help dial you in:
How to iterate over columns of pandas dataframe to run regression
It sounds like you also want row-wise results, so you could nest this within df.iterrows() or vice versa, though going cell by cell is generally not desirable and could end up being quite slow as your df grows.
So it may be worth thinking about how you could use your function with df.apply().
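As a minimal sketch of the "every 4 columns" idea from the question, the row can be split into fixed-size groups after the filename. This assumes each group follows the pattern text, font size, x, y; field_groups and GROUP_SIZE are hypothetical names introduced here, and the row is written as a plain tuple so the example runs without the spreadsheet.

```python
from typing import Iterator, Sequence, Tuple

GROUP_SIZE = 4  # assumed pattern per group: text, font size, x position, y position


def field_groups(row: Sequence, group_size: int = GROUP_SIZE) -> Iterator[Tuple]:
    """Yield consecutive groups of group_size values from row.

    The first value (the image filename) is skipped. field_groups is a
    hypothetical helper, not part of pandas or Pillow.
    """
    fields = row[1:]
    if len(fields) % group_size != 0:
        raise ValueError(
            f"expected a multiple of {group_size} values, got {len(fields)}"
        )
    for start in range(0, len(fields), group_size):
        yield tuple(fields[start:start + group_size])


# First data row from the question's table.
row = ("sample.jpg", "Apple", 142, 1200, 200,
       "US", 132, 1200, 400,
       1564, 82, 1300, 600)

groups = list(field_groups(row))
for text, font_size, x, y in groups:
    print(text, font_size, x, y)
```

Inside setText, each group could then drive one draw.text call with the font loaded from font_size, so adding four more columns to the spreadsheet would need no code change. Rows in this plain-tuple form can be obtained with df.itertuples(index=False, name=None).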
I have what I would call a moderately sized dataframe (~500k rows and 200 columns) and 8 GB of memory.
My problem is that when I go to slice my data, even down to a very small subset of 6k rows and 200 columns, it just hangs for 10-15+ minutes. If I then hit the STOP button in Python interactive and retry, the same operation finishes in 2-3 seconds.
I don't know why the row slicing doesn't take these 2-3 seconds in the first place. It makes it impossible to run programs, as everything hangs and has to be manually stopped before it works.
I am following the approach laid out on the h2o webpage:
import h2o
h2o.init()
# Import the iris with headers dataset
path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv"
df = h2o.import_file(path=path)
# Slice 1 row by index
c1 = df[15,:]
c1.describe
# Slice a range of rows
c1_1 = df[range(25,50,1),:]
c1_1.describe
# Slice using a boolean mask. The output dataset will include rows with a sepal length
# less than 4.6.
mask = df["sepal_len"] < 4.6
cols = df[mask,:]
cols.describe
# Filter out rows that contain missing values in a column. Note the use of '~' to
# perform a logical not.
mask = df["sepal_len"].isna()
cols = df[~mask,:]
cols.describe
The error message from the console is as follows. The same message is repeated several times:
/opt/anaconda3/lib/python3.7/site-packages/h2o/expr.py in <listcomp>(.0)
149 return self._cache._id # Data already computed under ID, but not cached
150 assert isinstance(self._children,tuple)
--> 151 exec_str = "({} {})".format(self._op, " ".join([ExprNode._arg_to_expr(ast) for ast in self._children]))
152 gc_ref_cnt = len(gc.get_referrers(self))
153 if top or gc_ref_cnt >= ExprNode.MAGIC_REF_COUNT:
~/opt/anaconda3/lib/python3.7/site-packages/h2o/expr.py in _arg_to_expr(arg)
161 return "[]" # empty list
162 if isinstance(arg, ExprNode):
--> 163 return arg._get_ast_str(False)
164 if isinstance(arg, ASTId):
TLDR: The df.query() tool doesn't seem to work if the df's columns are tuples or even tuples converted into strings. How can I work around this to get the slice I'm aiming for?
Long Version: I have a pandas dataframe that looks like this (although there are a lot more columns and rows...):
> dosage_df
Score  ("A_dose","Super")  ("A_dose","Light")  ("B_dose","Regular")
28     1                   40                  130
11     2                   40                  130
72     3                   40                  130
67     1                   90                  130
74     2                   90                  130
89     3                   90                  130
43     1                   40                  700
61     2                   40                  700
5      3                   40                  700
Along with my data frame, I also have a Python dictionary with the relevant ranges for each feature. The keys are the feature names, and the values are the different settings each feature can take:
# Original Version
dosage_df.columns = ['First Score', 'Last Score', ("A_dose","Super"), ("A_dose","Light"), ("B_dose","Regular")]
dict_of_dose_ranges = {("A_dose","Super"):[1,2,3],
("A_dose","Light"):[40,70,90],
("B_dose","Regular"):[130,200,500,700]}
For my purposes, I need to generate a particular combination (say A_dose = 1, B_dose = 90, and C_dose = 700), and based on those settings take the relevant slice out of my dataframe, and do relevant calculations from that smaller subset, and save the results somewhere.
I'm doing this by implementing the following:
from itertools import product
for dosage_comb in product(*dict_of_dose_ranges.values()):
    dosage_items = zip(dict_of_dose_ranges.keys(), dosage_comb)
    query_str = ' & '.join('{} == {}'.format(*x) for x in dosage_items)
    sub_df = dosage_df.query(query_str)
    ...
The problem is that it gets hung up on the query step, which returns the following error message:
TypeError: argument of type 'int' is not iterable
In this case, the query generated looks like this:
query_str = '("A_dose","Light") == 40 & ("A_dose","Super") == 1 & ("B_dose","Regular") == 130'
Troubleshooting Attempts:
I've confirmed that this approach does work for a dataframe with plain string columns, as found here. I've also tried "tricking" the tool by converting the columns and the dictionary keys into strings with the following code, but that returned the same error.
# String Version
dosage_df.columns = ['First Score', 'Last Score', '("A_dose","Super")', '("A_dose","Light")', '("B_dose","Regular")']
dict_of_dose_ranges = {
'("A_dose","Super")':[1,2,3],
'("A_dose","Light")':[40,70,90],
'("B_dose","Regular")':[130,200,500,700]}
Is there an alternate tool in python that can take tuples as inputs or a different way for me to trick it into working?
You can build a list of conditions and logically condense them with np.all instead of using query:
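import numpy as np
from itertools import product
for dosage_comb in product(*dict_of_dose_ranges.values()):
    dosage_items = zip(dict_of_dose_ranges.keys(), dosage_comb)
    # one boolean Series per column, combined with an element-wise AND
    condition = np.all([dosage_df[col] == dose for col, dose in dosage_items], axis=0)
    sub_df = dosage_df[condition]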
This method seems to be a bit more flexible than query, but when filtering across many columns I've found that query often performs better. I don't know if this is true in general though.
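To try the mask-based filtering in isolation, here is a minimal sketch using the rows from the question's table. select_rows is a hypothetical helper introduced for illustration, and it assumes at least one column/value pair is passed in.

```python
import numpy as np
import pandas as pd

# Column labels are real tuples, as in the question.
super_col = ("A_dose", "Super")
light_col = ("A_dose", "Light")
regular_col = ("B_dose", "Regular")

# Rows copied from the question's table.
dosage_df = pd.DataFrame(
    [
        [28, 1, 40, 130],
        [11, 2, 40, 130],
        [72, 3, 40, 130],
        [67, 1, 90, 130],
        [74, 2, 90, 130],
        [89, 3, 90, 130],
        [43, 1, 40, 700],
        [61, 2, 40, 700],
        [5, 3, 40, 700],
    ]
)
dosage_df.columns = ["Score", super_col, light_col, regular_col]


def select_rows(df, settings):
    """Return the rows of df where every column in settings equals its value.

    Hypothetical helper; settings must contain at least one entry.
    """
    conditions = [df[col] == value for col, value in settings.items()]
    mask = np.all(conditions, axis=0)  # element-wise AND across all conditions
    return df[mask]


sub_df = select_rows(dosage_df, {super_col: 1, light_col: 40, regular_col: 130})
print(sub_df)
```

Because the columns are looked up with df[col] and never parsed from a string, tuple labels need no special quoting here.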