I am trying to sum duplicate rows in the amount column, as shown in the screenshot:
So if report_name, line_item and column_item are the same, I want to sum the values in the amount column and end up with one row instead of two, without losing the structure of the dataframe.
But I don't want to sum duplicates if they have column_item 50 or 30.
This is my data frame:
entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0050;0010;UNDEFINED;n/a;40409261.0100539;22/03/2022
456;test;456;C_74_00_a;0040;0010;UNDEFINED;n/a;46860662.1948734;22/03/2022
456;test;456;C_74_00_a;0060;0010;UNDEFINED;n/a;1783648.53838003;22/03/2022
456;test;456;C_74_00_a;0070;0010;UNDEFINED;n/a;7847645.76582712;22/03/2022
456;test;456;C_73_00_a;0310;0010;UNDEFINED;n/a;48100909.2077918;22/03/2022
456;test;456;C_74_00_a;0201;0010;UNDEFINED;n/a;45652287.0078367;22/03/2022
456;test;456;C_72_00_a;0590;0010;UNDEFINED;n/a;19988230.281333;22/03/2022
456;test;456;C_73_00_a;0480;0010;UNDEFINED;n/a;28243908.6235795;22/03/2022
456;test;456;C_73_00_a;0490;0010;UNDEFINED;n/a;12655653.8647408;22/03/2022
456;test;456;C_73_00_a;0530;0010;UNDEFINED;n/a;27792100.4510517;22/03/2022
456;test;456;C_73_00_a;0570;0010;UNDEFINED;n/a;20768476.5051213;22/03/2022
456;test;456;C_73_00_a;0480;0010;UNDEFINED;n/a;28601515.4535418;22/03/2022
456;test;456;C_73_00_a;0490;0010;UNDEFINED;n/a;17269663.9202129;22/03/2022
456;test;456;C_73_00_a;0530;0010;UNDEFINED;n/a;21250486.2477187;22/03/2022
456;test;456;C_73_00_a;0570;0010;UNDEFINED;n/a;12924566.8399212;22/03/2022
456;test;456;C_73_00_a;0110;0010;UNDEFINED;n/a;17299383.641137;22/03/2022
456;test;456;C_73_00_a;0035;0010;UNDEFINED;n/a;19054145.8837998;22/03/2022
456;test;456;C_72_00_a;0280;0010;UNDEFINED;n/a;294348.91379545;22/03/2022
456;test;456;C_73_00_a;0340;0010;UNDEFINED;n/a;40803729.9712868;22/03/2022
456;test;456;C_74_00_a;0240;0010;UNDEFINED;n/a;25387904.3875074;22/03/2022
456;test;456;C_73_00_a;0340;0010;UNDEFINED;n/a;6951075.43742419;22/03/2022
456;test;456;C_74_00_a;0240;0010;UNDEFINED;n/a;12298844.1430509;22/03/2022
456;test;456;C_72_00_a;0040;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0050;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0060;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0070;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0090;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0110;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0240;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0260;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0080;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0100;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0120;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0130;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0140;0030;UNDEFINED;n/a;0.95;22/03/2022
456;test;456;C_72_00_a;0150;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0170;0030;UNDEFINED;n/a;0.8;22/03/2022
456;test;456;C_72_00_a;0190;0030;UNDEFINED;n/a;0.93;22/03/2022
456;test;456;C_72_00_a;0200;0030;UNDEFINED;n/a;0.88;22/03/2022
456;test;456;C_72_00_a;0250;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0270;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0280;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0290;0030;UNDEFINED;n/a;0.8;22/03/2022
456;test;456;C_72_00_a;0320;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0330;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0340;0030;UNDEFINED;n/a;0.7;22/03/2022
456;test;456;C_72_00_a;0350;0030;UNDEFINED;n/a;0.65;22/03/2022
456;test;456;C_72_00_a;0360;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0370;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0380;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0390;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0400;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0410;0030;UNDEFINED;n/a;0.7;22/03/2022
456;test;456;C_72_00_a;0420;0030;UNDEFINED;n/a;0.65;22/03/2022
456;test;456;C_72_00_a;0430;0030;UNDEFINED;n/a;0.6;22/03/2022
456;test;456;C_72_00_a;0440;0030;UNDEFINED;n/a;0.45;22/03/2022
456;test;456;C_72_00_a;0450;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0460;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0040;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0070;0050;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_73_00_a;0090;0050;UNDEFINED;n/a;0.03;22/03/2022
456;test;456;C_73_00_a;0110;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0260;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0310;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0480;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0490;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0530;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0570;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0590;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0080;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0140;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0150;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0170;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0190;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0200;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0250;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0280;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0290;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0360;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0370;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0380;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0390;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0400;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0420;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0430;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0450;0050;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_73_00_a;0035;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0180;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0204;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0206;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0207;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0220;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0230;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0300;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0510;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0520;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0540;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0560;0050;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0600;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;0610;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0630;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0640;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0660;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0670;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;0680;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0700;0050;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0710;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0890;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0900;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0913;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0914;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0915;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0916;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0917;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0918;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0940;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0950;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0960;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0970;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0980;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0990;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1000;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1010;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1030;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1040;0050;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_73_00_a;1050;0050;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_73_00_a;1060;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;1070;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;1080;0050;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_73_00_a;1090;0050;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_73_00_a;1100;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0040;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0060;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0070;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0090;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0201;0080;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_74_00_a;0260;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0080;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0130;0080;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_74_00_a;0150;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0170;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0190;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0180;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0230;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0160;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0210;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0269;0080;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_74_00_a;0273;0080;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_74_00_a;0277;0080;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_74_00_a;0281;0080;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_74_00_a;0285;0080;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_74_00_a;0289;0080;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_74_00_a;0293;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0301;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0303;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0309;0080;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_74_00_a;0313;0080;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_74_00_a;0317;0080;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_74_00_a;0321;0080;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_74_00_a;0325;0080;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_74_00_a;0329;0080;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_74_00_a;0333;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0341;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0343;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0345;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0070;0010;UNDEFINED;n/a;5198630.14;22/03/2022
456;test;456;C_72_00_a;0190;0010;UNDEFINED;n/a;835892217.0;22/03/2022
456;test;456;C_72_00_a;0260;0010;UNDEFINED;n/a;4745984333.0;22/03/2022
456;test;456;C_73_00_a;0035;0010;UNDEFINED;n/a;25424822307.28;22/03/2022
456;test;456;C_73_00_a;0070;0010;UNDEFINED;n/a;-33216232069.67;22/03/2022
456;test;456;C_73_00_a;0080;0010;UNDEFINED;n/a;-20966122130.53;22/03/2022
456;test;456;C_73_00_a;0110;0010;UNDEFINED;n/a;-9384698955.8;22/03/2022
456;test;456;C_73_00_a;0230;0010;UNDEFINED;n/a;2193605666.84;22/03/2022
456;test;456;C_73_00_a;0250;0010;UNDEFINED;n/a;-573769151.28;22/03/2022
456;test;456;C_73_00_a;0260;0010;UNDEFINED;n/a;3333715453.55;22/03/2022
456;test;456;C_73_00_a;0918;0010;UNDEFINED;n/a;124366.0;22/03/2022
456;test;456;C_74_00_a;0160;0010;UNDEFINED;n/a;-54345799619.07;22/03/2022
456;test;456;C_74_00_a;0260;0010;UNDEFINED;n/a;150348.16;22/03/2022
456;test;456;C_73_00_a;1100;0010;UNDEFINED;n/a;-37633449687.15;22/03/2022
456;test;456;C_73_00_a;1100;0020;UNDEFINED;n/a;-3764349687.15;22/03/2022
456;test;456;C_73_00_a;1040;0040;UNDEFINED;n/a;33764349687.15;22/03/2022
456;test;456;C_73_00_a;1045;0040;UNDEFINED;n/a;33764349687.15;22/03/2022
456;test;456;C_73_00_a;1045;0030;UNDEFINED;n/a;335098209.05;22/03/2022
456;test;456;C_73_00_a;1040;0010;UNDEFINED;n/a;7449687.15;22/03/2022
456;test;456;C_73_00_a;1045;0010;UNDEFINED;n/a;76449687.15;22/03/2022
456;test;456;C_72_00_a;0050;0010;UNDEFINED;n/a;40409261.0100539;22/03/2022
456;test;456;C_74_00_a;0040;0010;UNDEFINED;n/a;46860662.1948734;22/03/2022
456;test;456;C_74_00_a;0060;0010;UNDEFINED;n/a;1783648.53838003;22/03/2022
456;test;456;C_74_00_a;0070;0010;UNDEFINED;n/a;7847645.76582712;22/03/2022
456;test;456;C_73_00_a;0310;0010;UNDEFINED;n/a;48100909.2077918;22/03/2022
456;test;456;C_74_00_a;0201;0010;UNDEFINED;n/a;45652287.0078367;22/03/2022
456;test;456;C_72_00_a;0590;0010;UNDEFINED;n/a;19988230.281333;22/03/2022
456;test;456;C_73_00_a;0480;0010;UNDEFINED;n/a;28243908.6235795;22/03/2022
456;test;456;C_73_00_a;0490;0010;UNDEFINED;n/a;12655653.8647408;22/03/2022
456;test;456;C_73_00_a;0530;0010;UNDEFINED;n/a;27792100.4510517;22/03/2022
456;test;456;C_73_00_a;0570;0010;UNDEFINED;n/a;20768476.5051213;22/03/2022
456;test;456;C_73_00_a;0480;0010;UNDEFINED;n/a;28601515.4535418;22/03/2022
456;test;456;C_73_00_a;0490;0010;UNDEFINED;n/a;17269663.9202129;22/03/2022
456;test;456;C_73_00_a;0530;0010;UNDEFINED;n/a;21250486.2477187;22/03/2022
456;test;456;C_73_00_a;0570;0010;UNDEFINED;n/a;12924566.8399212;22/03/2022
456;test;456;C_73_00_a;0110;0010;UNDEFINED;n/a;17299383.641137;22/03/2022
456;test;456;C_73_00_a;0035;0010;UNDEFINED;n/a;19054145.8837998;22/03/2022
456;test;456;C_72_00_a;0280;0010;UNDEFINED;n/a;294348.91379545;22/03/2022
456;test;456;C_73_00_a;0340;0010;UNDEFINED;n/a;40803729.9712868;22/03/2022
456;test;456;C_74_00_a;0240;0010;UNDEFINED;n/a;25387904.3875074;22/03/2022
456;test;456;C_73_00_a;0340;0010;UNDEFINED;n/a;6951075.43742419;22/03/2022
456;test;456;C_74_00_a;0240;0010;UNDEFINED;n/a;12298844.1430509;22/03/2022
456;test;456;C_72_00_a;0040;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0050;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0060;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0070;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0090;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0110;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0240;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0260;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0080;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0100;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0120;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0130;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0140;0030;UNDEFINED;n/a;0.95;22/03/2022
456;test;456;C_72_00_a;0150;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0170;0030;UNDEFINED;n/a;0.8;22/03/2022
456;test;456;C_72_00_a;0190;0030;UNDEFINED;n/a;0.93;22/03/2022
456;test;456;C_72_00_a;0200;0030;UNDEFINED;n/a;0.88;22/03/2022
456;test;456;C_72_00_a;0250;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0270;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0280;0030;UNDEFINED;n/a;0.85;22/03/2022
456;test;456;C_72_00_a;0290;0030;UNDEFINED;n/a;0.8;22/03/2022
456;test;456;C_72_00_a;0320;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0330;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0340;0030;UNDEFINED;n/a;0.7;22/03/2022
456;test;456;C_72_00_a;0350;0030;UNDEFINED;n/a;0.65;22/03/2022
456;test;456;C_72_00_a;0360;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0370;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0380;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0390;0030;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_72_00_a;0400;0030;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0410;0030;UNDEFINED;n/a;0.7;22/03/2022
456;test;456;C_72_00_a;0420;0030;UNDEFINED;n/a;0.65;22/03/2022
456;test;456;C_72_00_a;0430;0030;UNDEFINED;n/a;0.6;22/03/2022
456;test;456;C_72_00_a;0440;0030;UNDEFINED;n/a;0.45;22/03/2022
456;test;456;C_72_00_a;0450;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_72_00_a;0460;0030;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0040;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0070;0050;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_73_00_a;0090;0050;UNDEFINED;n/a;0.03;22/03/2022
456;test;456;C_73_00_a;0110;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0260;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0310;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0480;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0490;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0530;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0570;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0590;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0080;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0140;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0150;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0170;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0190;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0200;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;0250;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0280;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0290;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0360;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0370;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0380;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0390;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0400;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0420;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0430;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0450;0050;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_73_00_a;0035;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0180;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0204;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0206;0050;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_73_00_a;0207;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0220;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0230;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0300;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0510;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0520;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0540;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0560;0050;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0600;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;0610;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0630;0050;UNDEFINED;n/a;0.1;22/03/2022
456;test;456;C_73_00_a;0640;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0660;0050;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_73_00_a;0670;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;0680;0050;UNDEFINED;n/a;0.4;22/03/2022
456;test;456;C_73_00_a;0700;0050;UNDEFINED;n/a;0.75;22/03/2022
456;test;456;C_73_00_a;0710;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0890;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0900;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0913;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0914;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0915;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0916;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0917;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0918;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_73_00_a;0940;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0950;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0960;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0970;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0980;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;0990;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1000;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1010;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1030;0050;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_73_00_a;1040;0050;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_73_00_a;1050;0050;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_73_00_a;1060;0050;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_73_00_a;1070;0050;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_73_00_a;1080;0050;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_73_00_a;1090;0050;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_73_00_a;1100;0050;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0040;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0060;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0070;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0090;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0201;0080;UNDEFINED;n/a;0.2;22/03/2022
456;test;456;C_74_00_a;0260;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0080;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0130;0080;UNDEFINED;n/a;0.05;22/03/2022
456;test;456;C_74_00_a;0150;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0170;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0190;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0180;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0230;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0160;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0210;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0269;0080;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_74_00_a;0273;0080;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_74_00_a;0277;0080;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_74_00_a;0281;0080;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_74_00_a;0285;0080;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_74_00_a;0289;0080;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_74_00_a;0293;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0301;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0303;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0309;0080;UNDEFINED;n/a;0.0;22/03/2022
456;test;456;C_74_00_a;0313;0080;UNDEFINED;n/a;0.07;22/03/2022
456;test;456;C_74_00_a;0317;0080;UNDEFINED;n/a;0.15;22/03/2022
456;test;456;C_74_00_a;0321;0080;UNDEFINED;n/a;0.25;22/03/2022
456;test;456;C_74_00_a;0325;0080;UNDEFINED;n/a;0.3;22/03/2022
456;test;456;C_74_00_a;0329;0080;UNDEFINED;n/a;0.35;22/03/2022
456;test;456;C_74_00_a;0333;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0341;0080;UNDEFINED;n/a;0.5;22/03/2022
456;test;456;C_74_00_a;0343;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_74_00_a;0345;0080;UNDEFINED;n/a;1.0;22/03/2022
456;test;456;C_72_00_a;0070;0010;UNDEFINED;n/a;5198630.14;22/03/2022
456;test;456;C_72_00_a;0190;0010;UNDEFINED;n/a;835892217.0;22/03/2022
456;test;456;C_72_00_a;0260;0010;UNDEFINED;n/a;4745984333.0;22/03/2022
456;test;456;C_73_00_a;0035;0010;UNDEFINED;n/a;25424822307.28;22/03/2022
456;test;456;C_73_00_a;0070;0010;UNDEFINED;n/a;-33216232069.67;22/03/2022
456;test;456;C_73_00_a;0080;0010;UNDEFINED;n/a;-20966122130.53;22/03/2022
456;test;456;C_73_00_a;0110;0010;UNDEFINED;n/a;-9384698955.8;22/03/2022
456;test;456;C_73_00_a;0230;0010;UNDEFINED;n/a;2193605666.84;22/03/2022
456;test;456;C_73_00_a;0250;0010;UNDEFINED;n/a;-573769151.28;22/03/2022
456;test;456;C_73_00_a;0260;0010;UNDEFINED;n/a;3333715453.55;22/03/2022
456;test;456;C_73_00_a;0918;0010;UNDEFINED;n/a;124366.0;22/03/2022
456;test;456;C_74_00_a;0160;0010;UNDEFINED;n/a;-54345799619.07;22/03/2022
456;test;456;C_74_00_a;0260;0010;UNDEFINED;n/a;150348.16;22/03/2022
456;test;456;C_73_00_a;1100;0010;UNDEFINED;n/a;-37633449687.15;22/03/2022
456;test;456;C_73_00_a;1100;0020;UNDEFINED;n/a;-3764349687.15;22/03/2022
456;test;456;C_73_00_a;1040;0040;UNDEFINED;n/a;33764349687.15;22/03/2022
456;test;456;C_73_00_a;1045;0040;UNDEFINED;n/a;33764349687.15;22/03/2022
456;test;456;C_73_00_a;1045;0030;UNDEFINED;n/a;335098209.05;22/03/2022
456;test;456;C_73_00_a;1040;0010;UNDEFINED;n/a;7449687.15;22/03/2022
456;test;456;C_73_00_a;1045;0010;UNDEFINED;n/a;76449687.15;22/03/2022
I hope you can lead me in the right direction.
Because the sum has to skip rows that match a condition, first filter out the rows that match it, sum the remaining rows while removing duplicates, and then add the filtered-out rows back:
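import pandas as pd

# rows with column_item 30 or 50 are excluded from the summing
m = df['column_item'].isin([30, 50])
df1 = df[~m].copy()
# write the group total into every row of the group, then keep one row per group
df1['amount'] = df1.groupby(['report_name', 'line_item', 'column_item'])['amount'].transform('sum')
df1 = df1.drop_duplicates(['report_name', 'line_item', 'column_item'])
# append the excluded rows unchanged
df = pd.concat([df1, df[m]])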
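The same steps can be tried on a small frame. This is only a sketch: the five rows below are made up for illustration and are not taken from the question's data, and the final `sort_index()` is an optional extra that puts the rows back in their original order.

```python
import pandas as pd

# Toy data (hypothetical values) with the three key columns and an amount.
df = pd.DataFrame({
    "report_name": ["A", "A", "A", "A", "B"],
    "line_item":   [10, 10, 20, 20, 10],
    "column_item": [10, 10, 30, 30, 10],
    "amount":      [1.0, 2.0, 0.5, 0.5, 4.0],
})

keys = ["report_name", "line_item", "column_item"]

# Rows whose column_item is 30 or 50 must be left untouched.
m = df["column_item"].isin([30, 50])

# Sum the amounts within each key group, then keep one row per group.
df1 = df[~m].copy()
df1["amount"] = df1.groupby(keys)["amount"].transform("sum")
df1 = df1.drop_duplicates(keys)

# Put the excluded rows back; sort_index restores the original row order.
result = pd.concat([df1, df[m]]).sort_index()
print(result)
```

Here the two `A / 10 / 10` rows collapse into one row with amount 3.0, while both `column_item == 30` rows survive unchanged.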
If you need to get just duplicated rows and sum over them, you can do something like:
(df[(df[["report_name", "line_item","column_item"]].duplicated(keep=False)) & (~df['column_item'].isin([30, 50]))]
.groupby(["report_name", "line_item","column_item"])["amount"]
.sum())
This will result in something like:
report_name  line_item  column_item
C_72_00_a    50         10             4.040926e+07
             70         10             5.198630e+06
             190        10             8.358922e+08
             260        10             4.745984e+09
             280        10             2.943489e+05
                                           ...
C_74_00_a    329        80             3.500000e-01
             333        80             5.000000e-01
             341        80             5.000000e-01
             343        80             1.000000e+00
             345        80             1.000000e+00
Name: amount, Length: 67, dtype: float64
To make sure that you are getting the correct values, let's check the example you have shown in your question (the one with C_73_00_a, 1100 and 10):
dfResult = (df[(df[["report_name", "line_item","column_item"]].duplicated(keep=False)) & (~df['column_item'].isin([30, 50]))]
.groupby(["report_name", "line_item","column_item"])["amount"]
.sum())
dfResult[('C_73_00_a', 1100, 10)]
This will output:
-75266899374.3
Which is the result of -37633449687.15 + -37633449687.15 (as shown in your question).
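To see what `duplicated(keep=False)` contributes in this approach, here is a minimal sketch on made-up data (the values are hypothetical). It marks every member of a repeated key group, so rows that occur only once drop out before the sum, and the `isin([30, 50])` mask removes the excluded column items.

```python
import pandas as pd

# Hypothetical rows: one duplicated pair to sum, one singleton,
# one duplicated pair with column_item 50 (excluded), one more singleton.
df = pd.DataFrame({
    "report_name": ["A", "A", "A", "B", "B", "B"],
    "line_item":   [10, 10, 20, 10, 10, 30],
    "column_item": [10, 10, 10, 50, 50, 10],
    "amount":      [1.5, 2.5, 7.0, 1.0, 1.0, 9.0],
})
keys = ["report_name", "line_item", "column_item"]

# True for every row whose key combination occurs more than once.
dup = df[keys].duplicated(keep=False)
# True for rows that must never be summed.
excluded = df["column_item"].isin([30, 50])

sums = df[dup & ~excluded].groupby(keys)["amount"].sum()
print(sums)
```

Only the `A / 10 / 10` group remains in `sums`. Note that this returns the sums alone; it does not rebuild the full dataframe.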
I have a dataframe whose format is shown in the image below.
In every row, each group of three columns represents one type of data. In the given example there is one column for the ticker, the next three columns hold the first type of data, and columns 5-7 hold the second type of data.
Now I want to transform this so that each type of data becomes a single column, with the groups appended one after another.
Expected output is:
Is there any way to do this transformation in pandas using an existing API? At the moment I am doing it in a very basic way: creating a new dataframe for each group and then appending it.
Here is one way to do it.
Use pd.melt to unpivot the table, then split what used to be the column names (now row values) on "/" to separate them into two columns (txt, year).
Create the new row label by combining ticker and year, then use pivot to get the desired result set.
df2=df.melt(id_vars='ticker', var_name='col') # line missed in earlier solution,updated
df2[['txt','year']] = df.melt(id_vars='ticker', var_name='col')['col'].str.split('/', expand=True)
df2.assign(ticker2=df2['ticker'] + '/' + df2['year']).pivot(index='ticker2', columns='txt', values='value').reset_index()
Result set
txt ticker2 data1 data2
0 AAPL/2020 0.824676 0.616524
1 AAPL/2021 0.018540 0.046365
2 AAPL/2022 0.222349 0.729845
3 AMZ/2020 0.122288 0.087217
4 AMZ/2021 0.012168 0.734674
5 AMZ/2022 0.923501 0.437676
6 APPL/2020 0.886927 0.520650
7 APPL/2021 0.725515 0.543404
8 APPL/2022 0.211378 0.464898
9 GGL/2020 0.777676 0.052658
10 GGL/2021 0.297292 0.213876
11 GGL/2022 0.894150 0.185207
12 MICO/2020 0.898251 0.882252
13 MICO/2021 0.141342 0.105316
14 MICO/2022 0.440459 0.811005
This is based on the code that you posted in the comment. Unfortunately I missed a line when posting the solution; it has been added now.
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randint(0,100,size=(2, 6)),
                   columns=["data1/2020","data1/2021", "data1/2022", "data2/2020", "data2/2021", "data2/2022"])
ticker = ['APPL', 'MICO']
df2.insert(loc=0, column='ticker', value=ticker)
df2.head()
df3=df2.melt(id_vars='ticker', var_name='col') # missed line in earlier posting
df3[['txt','year']] = df2.melt(id_vars='ticker', var_name='col')['col'].str.split('/', expand=True)
df3.head()
df3.assign(ticker2=df3['ticker'] + '/' + df3['year']).pivot(index='ticker2', columns='txt', values='value').reset_index()
txt ticker2 data1 data2
0 APPL/2020 26 9
1 APPL/2021 75 59
2 APPL/2022 20 44
3 MICO/2020 79 90
4 MICO/2021 63 30
5 MICO/2022 73 91
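As an alternative to melt plus pivot, `pd.wide_to_long` may fit this column layout directly, since the column names share a stub (`data1`, `data2`) followed by a `/` and a numeric year. This is a sketch under that assumption, using a small made-up frame rather than the original data:

```python
import pandas as pd

# Hypothetical wide frame in the shape described in the question.
wide = pd.DataFrame({
    "ticker": ["APPL", "MICO"],
    "data1/2020": [1, 2],
    "data1/2021": [3, 4],
    "data2/2020": [5, 6],
    "data2/2021": [7, 8],
})

# Each stub becomes one column; the part after "/" becomes the "year" level.
long = (
    pd.wide_to_long(wide, stubnames=["data1", "data2"], i="ticker", j="year", sep="/")
    .sort_index()      # order rows by ticker, then year
    .reset_index()
)

# Rebuild the combined "ticker/year" label used in the answer above.
long["ticker2"] = long["ticker"] + "/" + long["year"].astype(str)
result = long[["ticker2", "data1", "data2"]]
print(result)
```

`wide_to_long` requires the `i` column to identify each row uniquely, so a frame with repeated tickers would need an extra id column first.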
:Edit: fixed a misunderstanding on my part - I am getting a nested list, not an array.
I'm working with a function in a for loop, bootstrapping some model predictions.
The code looks like this:
def revenue(predict):
    # predict: the model predictions passed in from the loop
    revenue = predict * 4500
    profit = revenue - 500000
    return profit
And the loop I am feeding it into looks like this:
# set up a loop to select 500 random samples and train our region 2 data set
model = LinearRegression(fit_intercept = True, normalize = False)
features = r2_test.drop(['product'],axis=1)
values = []
for i in range(1000):
    subsample = r2_test.sample(500,replace=False)
    features = subsample.drop(['product'],axis=1)
    predict = model2.predict(features)
    result = (revenue(predict))
    values.append(result)
So I am running 1000 iterations of predictions, each on 500 samples from this dataframe:
id f0 f1 f2 product
0 74613 -15.001348 -8.276000 -0.005876 3.179103
1 9753 14.272088 -3.475083 0.999183 26.953261
2 93502 6.263187 -5.948386 5.001160 134.766305
3 33405 -13.081196 -11.506057 4.999415 137.945408
4 16486 12.702195 -8.147433 5.004363 134.766305
5 27901 -3.327590 -2.205276 3.003647 84.038886
6 69620 -11.142655 -10.133399 4.002382 110.992147
7 78940 4.234715 -0.001354 2.004588 53.906522
8 56159 13.355129 -0.332068 4.998647 134.766305
9 73142 1.069227 -11.025667 4.997844 137.945408
10 12663 11.777049 -5.334084 2.003033 53.906522
11 39849 16.320755 -0.562946 -0.001783 0.000000
12 61800 7.736313 -6.093374 3.982531 107.813044
13 72213 6.695604 -0.749449 -0.007630 0.000000
14 5479 -10.985487 -5.605994 2.991130 84.038886
15 6297 -0.347599 -6.275884 -0.003448 3.179103
16 88123 12.300570 2.944454 2.005541 53.906522
17 68352 8.900460 -5.632857 4.994324 134.766305
18 99029 -13.412826 -4.729495 2.998590 84.038886
19 64238 -4.373526 -8.590017 2.995379 84.038886
Now, once I have my output, I want to select the top 200 predictions from each iteration. I'm using this loop:
# calculate the max value of each of the 500 iterations, then total them for the total profit
top_200 = []
for i in range(0,500):
    profits = values.nlargest(200,[i],keep = 'all')
    top_200.append(profits)
The problem I am running into is that when I feed values into the top_200 loop, I end up with an array of the selected 200 by column:
[ 0 1 2 3 \
628 125790.297387 -10140.964686 -361625.210913 -243132.040492
32 125429.134599 -368765.455544 -249361.525792 -497190.522207
815 124522.095794 -1793.660411 -11410.126264 114928.508488
645 123891.732231 115946.193531 104048.117460 -246350.752024
119 123063.545808 -124032.987348 -367200.191889 -131237.863430
.. ... ... ... ...
But I'd like to turn it into a dataframe. However, I haven't figured out how to do that while preserving the structure where 0 has its 200 values, 1 has its 200 values, etc.
I thought I could do something like:
top_200 = pd.DataFrame(top_200,columns= range(0,500))
and it gives me 500 columns, but only column 0 has anything in it, and I end up with a [500,500] dataframe instead of the anticipated 200 rows by 500 columns.
I'm fairly sure there is a good way to do this, but my searching thus far has not turned anything up. I am also not sure what the thing I am looking for is called, so I'm not sure exactly what to search for.
Any input would be appreciated! Thanks in advance.
:Further editing:
So now that I know I'm getting a list of lists, not an array, I thought I'd try to write to a dataframe instead:
# calculate the top 200 values of each of the 500 iterations
top_200 = pd.DataFrame(columns=['profits'])
for i in range(0,500):
    top_200.loc[i] = i
    profits = values.nlargest(200,[i],keep = 'all')
    top_200.append(profits)
top_200.head()
But I've futzed something up here, as my results are:
profits
0 0
1 1
2 2
3 3
4 4
My expected results would be something like:
col 1 col2 col3
0 first n_largest first n_largest first n_largest
1 second n_largest second n_largest second n_largest
3 third n_largest third n_largest third n_largest
After doing some research based on the question @CygnusX recommended, I figured out that I had been laboring under the impression that my output was an array. Of course top_200 = [] is a list, which, when combined with nlargest, gives me a list of lists.
Now that I understood the problem better, I converted the list of lists into a dataframe and then transposed the data, which gave me the results I was looking for.
# calculate the max value of each of the 500 iterations, then total them for the total profit
top_200 = []
for i in range(0,500):
    profits = (values.nlargest(200,[i],keep = 'all')).mean()
    top_200.append(profits)
test = pd.DataFrame(top_200)
test = test.transpose()
Output (a screenshot, because it has 500 columns):
There is probably a more elegant way to accomplish this, such as building a dataframe directly instead of a list, but I couldn't get .append to work the way I wanted on a dataframe, since I wanted to preserve the list of 200 nlargest values, not just a sum or a mean (for which the append worked great!).
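One possible way to get the 200 rows by 500 columns layout directly is sketched below. It assumes `values` has one column per iteration; the random data here is only a stand-in for it, and the default `keep='first'` is used so that ties cannot produce more than 200 rows per column.

```python
import numpy as np
import pandas as pd

# Stand-in for the `values` DataFrame from the question
# (assumed shape: 1000 predictions x 500 iterations).
rng = np.random.default_rng(0)
values = pd.DataFrame(rng.normal(size=(1000, 500)))

# For each column, take its 200 largest values and drop the original
# row labels so that every column lines up on ranks 0..199.
top_200 = pd.DataFrame(
    {col: values[col].nlargest(200).to_numpy() for col in values.columns}
)

print(top_200.shape)  # (200, 500)
```

Each column of `top_200` is sorted from largest to smallest, so `top_200.sum()` or `top_200.mean()` then gives one figure per iteration.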
I have this table with models df['model'] and
pd.value_counts(df2['model'].values, sort=True)
returns this:
'''
MONSTER 331
MULTISTRADA 134
HYPERMOTARD 69
SCRAMBLER 63
SUPERSPORT 31
...
900 1
T-MAX 1
FC 1
GTS 1
SCOUT 1
Length: 75, dtype: int64
'''
I want to rename all the values in df2['model'] that have count <5 into 'OTHER'.
Please can anyone help me, how to go about this?
First, get the list of categories you want to change to 'OTHER' with the first line of code. It takes your value_counts call and keeps only the entries that meet your condition (in this case, fewer than 5 occurrences).
Then select the rows of the dataframe whose model value is in that list of categories and change the value to 'OTHER'.
counts = data['model'].value_counts()
other_classes = counts[counts < 5].index
# use .loc with a row mask and column label to avoid chained assignment
data.loc[data['model'].isin(other_classes), 'model'] = 'OTHER'
Hope it helps
I suspect it is not at all elegant or pythonic, but this worked in the end:
df_pooled_other = df_final.assign(freq=df_final.groupby('model name')['model name'].transform('count'))\
.sort_values(by=['freq','model name', 'Age in months_x_x'],ascending=[False,True, True])
df_pooled_other['model name'] = np.where(df_pooled_other['freq'] <= 5, 'Other', df_pooled_other['model name'])
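The same idea can also be written without a helper column. This is a minimal sketch on a made-up `df2`, using the "fewer than 5" threshold from the question; the sample values are illustrative only.

```python
import pandas as pd

# Tiny made-up sample: one frequent model and three rare ones.
df2 = pd.DataFrame({"model": ["MONSTER"] * 6 + ["SCOUT", "GTS", "FC"]})

# Look up each row's model frequency, then keep the name only where
# the frequency is at least 5; everything else becomes 'OTHER'.
counts = df2["model"].map(df2["model"].value_counts())
df2["model"] = df2["model"].where(counts >= 5, "OTHER")

print(df2["model"].value_counts())
```

`Series.where` keeps values where the condition is true and substitutes the second argument elsewhere, so the row order and the index stay unchanged.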
Using Python, how do I break a text file into data frames where every 84 rows is a new, different dataframe? The first column x_ft is the same value every 84 rows then increments up by 5 ft for the next 84 rows. I need each identical x_ft value and corresponding values in the row for the other two columns (depth_ft and vel_ft_s) to be in the new dataframe too.
My text file is formatted like this:
x_ft depth_ft vel_ft_s
0 270 3535.755 551.735107
1 270 3534.555 551.735107
2 270 3533.355 551.735107
3 270 3532.155 551.735107
4 270 3530.955 551.735107
.
.
33848 2280 3471.334 1093.897339
33849 2280 3470.134 1102.685547
33850 2280 3468.934 1113.144287
33851 2280 3467.734 1123.937134
I have tried many, many different ways but keep running into errors and would really appreciate some help.
I suggest looking into pandas.read_table, which returns a DataFrame directly. Once you have it, you can isolate the rows for each x_ft value (each block of 84 rows) with something like this:
import pandas as pd
# "data.txt" is a placeholder for your file; sep=r"\s+" assumes whitespace-separated columns
df = pd.read_table("data.txt", sep=r"\s+")
# All x_ft values in the dataset: 270, 275, ..., 2280 (403 values)
arr = [270 + 5 * x for x in range(0, 403)]
# Write one CSV per x_ft value, containing its rows with the corresponding columns (depth_ft and vel_ft_s)
for x_value in arr:
    tempdf = df[df['x_ft'] == x_value]
    tempdf.to_csv(f"df{x_value}.csv")
You can get indexes to split your data:
rows = 84
datasets = round(len(data)/rows) # total datasets
index_list = []
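for index in data.index:
    x = index % rows
    if x == 0:
        index_list.append(index)
print(index_list)
Then split the original dataset at those indexes (this assumes the default 0..n-1 index; len(data) closes the last chunk so that it keeps all of its rows):
l_mod = index_list + [len(data)]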
dfs_list = [data.iloc[l_mod[n]:l_mod[n+1]] for n in range(len(l_mod)-1)]
print(len(dfs_list))
Outputs
print(type(dfs_list[1]))
# pandas.core.frame.DataFrame
print(len(dfs_list[0]))
# 84
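Since every block of 84 rows shares one x_ft value, grouping on that column is another option that does not depend on the row count. A minimal sketch, with a small made-up frame standing in for the real file:

```python
import numpy as np
import pandas as pd

# Made-up stand-in: 3 distinct x_ft values with 84 rows each.
rows = 84
df = pd.DataFrame({
    "x_ft": np.repeat([270, 275, 280], rows),
    "depth_ft": np.arange(3 * rows, dtype=float),
    "vel_ft_s": 551.7,
})

# One DataFrame per distinct x_ft value, keyed by that value.
frames = {int(x): grp.reset_index(drop=True) for x, grp in df.groupby("x_ft")}

print(len(frames))       # 3
print(len(frames[270]))  # 84
```

Each entry in `frames` keeps all three columns, and `frames[275]` is the block for x_ft = 275.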
Let's say I have some data like this:
timestamp
patient_id
99 10
99 100
3014 20
3014 200
How exactly would one in pandas be able to find the largest, smallest, and average range of timestamps per patient id?
What I'm looking for is to be able to report this:
shortest range = 90 (100 - 10)
longest range = 180 (200 - 20)
average range = (180 + 90) / 2 = 135
The Setup
Create dummy DataFrame:
import pandas as pd
data = '''99 10
99 100
3014 20
3014 200'''.split('\n')
Using two nested list comprehensions, split the rows, then the columns and convert all elements to int. Then import into a DataFrame.
data = [[int(n) for n in item.split()] for item in data]
DF = pd.DataFrame(data, columns=['pid', 'timestamp'])
As a learning exercise, loop through each group (this works for an arbitrary number of timestamps per pid, not just two). This is not the solution; it only demonstrates how groupby works:
for pid, grp in DF.groupby('pid'):
    print(pid, grp.timestamp.min(), grp.timestamp.max())
# Prints (Python 3):
# 99 10 100
# 3014 20 200
The Solution
The solution is more efficient: get the per-patient minimums and maximums as vectors, subtract them to get the ranges, and then take the min, max, and average of the ranges. The strength of Pandas is that it operates on a whole column of the DataFrame as a unit, which makes calculations on arrays very simple, like this:
mins = DF.groupby('pid').timestamp.min()
maxs = DF.groupby('pid').timestamp.max()
ranges = maxs - mins
shortest_range = ranges.min()
longest_range = ranges.max()
average_range = ranges.mean()
print(shortest_range, longest_range, average_range)
# Prints (Python 3): 90 180 135.0
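The same calculation can be condensed into a single groupby pass. This is only a sketch of an alternative on the same dummy data; the `summary` name is introduced here for illustration.

```python
import pandas as pd

DF = pd.DataFrame({"pid": [99, 99, 3014, 3014],
                   "timestamp": [10, 100, 20, 200]})

# Range (max - min) of timestamps for each patient id.
ranges = DF.groupby("pid")["timestamp"].agg(lambda s: s.max() - s.min())

# Shortest, longest, and average range across patients.
summary = ranges.agg(["min", "max", "mean"])
print(summary)
```

`summary["min"]`, `summary["max"]`, and `summary["mean"]` correspond to the shortest, longest, and average range.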