I just worked on creating some columns using .transform() to count some entries.
I used this reference.
For example:
userID deviceName POWER_DOWN USER LOW_RSSI NONE CMD_SUCCESS
0 24 IR_00 85 0 39 0 0
1 24 IR_00 85 0 39 0 0
2 24 IR_00 85 0 39 0 0
3 24 IR_00 85 0 39 0 0
4 25 BED_08 0 109 78 0 0
5 25 BED_08 0 109 78 0 0
6 25 BED_08 0 109 78 0 0
7 24 IR_00 85 0 39 0 0
8 23 IR_09 2 0 0 0 0
9 23 V33_17 3 0 2 0 134
10 23 V33_17 3 0 2 0 134
11 23 V33_17 3 0 2 0 134
12 23 V33_17 3 0 2 0 134
How can I group them by userID and deviceName?
So that it would look like:
userID deviceName POWER_DOWN USER LOW_RSSI NONE CMD_SUCCESS
0 23 IR_09 2 0 0 0 0
1 V33_17 3 0 2 0 134
2 24 IR_00 85 0 39 0 0
3 25 BED_08 0 109 78 0 0
I also want them to be sorted by userID and maybe make userID and deviceName as multi-index.
I tried df = df.groupby(['userID', 'deviceName']), but it returned a
<pandas.core.groupby.DataFrameGroupBy object at 0x00000249BBB13DD8>
instead of a dataframe.
By the way, I'm sorry, I don't know how to copy a Jupyter notebook's In and Out cells.
I believe you need drop_duplicates with sort_values:
df1 = df.drop_duplicates(['userID', 'deviceName']).sort_values('userID')
print (df1)
userID deviceName POWER_DOWN USER LOW_RSSI NONE CMD_SUCCESS
8 23 IR_09 2 0 0 0 0
9 23 V33_17 3 0 2 0 134
0 24 IR_00 85 0 39 0 0
4 25 BED_08 0 109 78 0 0
If you want to create a MultiIndex, add set_index:
df1 = (df.drop_duplicates(['userID', 'deviceName'])
.sort_values('userID')
.set_index(['userID', 'deviceName']))
print (df1)
POWER_DOWN USER LOW_RSSI NONE CMD_SUCCESS
userID deviceName
23 IR_09 2 0 0 0 0
V33_17 3 0 2 0 134
24 IR_00 85 0 39 0 0
25 BED_08 0 109 78 0 0
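Since every row within a (userID, deviceName) group carries the same counts, an aggregation over the groupby gives the same result, and an aggregation is exactly what turns the lazy DataFrameGroupBy object back into a DataFrame. A runnable sketch with a trimmed-down version of the frame:

```python
import pandas as pd

# Minimal frame mirroring the data above; the count columns repeat
# within each (userID, deviceName) group.
df = pd.DataFrame({
    "userID": [24, 24, 25, 23, 23],
    "deviceName": ["IR_00", "IR_00", "BED_08", "IR_09", "V33_17"],
    "POWER_DOWN": [85, 85, 0, 2, 3],
    "CMD_SUCCESS": [0, 0, 0, 0, 134],
})

# groupby alone returns a lazy GroupBy object; an aggregation such as
# .first() materializes it back into a DataFrame with a MultiIndex,
# already sorted by the group keys.
df1 = df.groupby(["userID", "deviceName"]).first()
print(df1)
```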
I have a .bson file.
Inside the .bson file there is a PDF stored as bytes.
I need to extract the PDF from inside the .bson file into a readable format.
I need help with the steps I have to do in between.
Note: I already saved the content to a PDF file, but it says the file is damaged.
My code:
with open('LOL.bson') as myfile:
    content = myfile.read()
print(content)
{"_id":{"$oid":"59d3522618206812388e35f1"},"files_id":{"$oid":"59d3522618206812388e35f0"},"n":0,"data":{"$binary":"JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhwdC1QVCkgL1N0cnVjdFRyZWVSb290IDUzIDAgUi9NYXJrSW5mbzw8L01hcmtlZCB0cnVlPj4+Pg0KZW5kb2JqDQoyIDAgb2JqDQo8PC9UeXBlL1BhZ2VzL0NvdW50IDEvS2lkc1sgMyAwIFJdID4+DQplbmRvYmoNCjMgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJlbnQgMiAwIFIvUmVzb3VyY2VzPDwvRXh0R1N0YXRlPDwvR1M1IDUgMCBSL0dTOCA4IDA....
Type of data
read_content = bson.json_util.loads(content)
print(read_content['data'])
b'%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n1 0 obj\r\n<</Type/Catalog/Pages 2 0 R/Lang(pt-PT) /StructTreeRoot 130 0 R/MarkInfo<</Marked true>>>>\r\nendobj\r\n2 0 obj\r\n<</Type/Pages/Count 1/Kids[ 3 0 R] >>\r\nendobj\r\n3 0 obj\r\n<</Type/Page/Parent 2 0 R/Resources<</ExtGState<</GS5 5 0 R/GS8 8 0 R>>/Font<</F1 6 0 R/F2 29 0 R>>/XObject<</Image9 9 0 R/Image11 11 0 R/Image13 13 0 R/Image15 15 0 R/Image17 17 0 R/Image19 19 0 R/Image21 21 0 R/Image23 23 0 R/Image25 25 0 R/Image27 27 0 R/Image32 32 0 R/Image34 34 0 R/Image35 35 0 R/Image37 37 0 R/Image39 39 0 R/Image41 41 0 R/Image43 43 0 R/Image45 45 0 R/Image47 47 0 R/Image49 49 0 R/Image51 51 0 R/Image53 53 0 R/Image55 55 0 R/Image57 57 0 R/Image59 59 0 R/Image61 61 0 R/Image63 63 0 R/Image65 65 0 R/Image67 67 0 R/Image69 69 0 R/Image71 71 0 R/Image73 73 0 R/Image75 75 0 R/Image77 77 0 R/Image79 79 0 R/Image81 81 0 R/Image83 83 0 R/Image85 85 0 R/Image87 87 0 R/Image89 89 0 R/Image91 91 0 R/Image93 93 0 R/Image95 95 0 R/Image97 97 0 R/Image99 99 0 R/Image101 101 0 R/Image103 103 0 R/Image105 105 0 R/Image107 107 0 R/Image109 109 0 R/Image111 111 0 R/Image113 113 0 R/Image115 115 0 R/Image117 117 0 R/Image119 119 0 R/Image121 121 0 R/Image123 123 0 R/Image125 125 0 R/Image127 127 0 R>>/Pattern<</P31 31 0 R/P33 33 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 960 540] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>\r\nendobj\r\n4 0 obj\r\n<</Filter/FlateDecode/Length 4008>>\r\nstream\r\nx\x9c\xbd[\xcb\x8e\x1d\xb7\x11\xdd\x0f0\xff\xd0K\xc9\x80Z|?\x00\xc3\x0b?"\xd8\x88\x11\'V\x90\x85\xe1\x850\x91\x15\x07\x1a\t\x91\x8c
read_content = bson.json_util.loads(content)
print(type(read_content['data']))
<class 'bytes'>
How do I save the .bson content in a readable format (PDF)?
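For reference, a minimal sketch of the missing step: since read_content['data'] is already bytes, it only needs to be written through a binary-mode handle; writing it (or its repr) in text mode is the usual cause of a "damaged" PDF. The toy payload below is hypothetical, standing in for the decoded $binary data:

```python
# Hypothetical payload standing in for read_content['data'] above,
# which bson.json_util.loads already returns as raw bytes.
pdf_bytes = b"%PDF-1.5\r\n% toy stand-in for the real document\r\n"

# Write in binary mode ("wb"); a text-mode write of the bytes' repr
# yields a file that PDF readers report as damaged.
with open("out.pdf", "wb") as f:
    f.write(pdf_bytes)
```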
I have two data frames
df2022fl is one with 24 rows.
df is one with a single row of values.
There are 1759 columns in each df.
I want to compare every row of the 24-row dataframe against the one-row df, to count how many columns are above the corresponding column in the one-row df.
I used the code below, but I keep getting the error shown after it.
( df2022fl > df.T[df2022fl.columns].values ).sum(axis=1)
KeyError: "None of [Index(['id', 'table_position', 'performance_rank', 'risk', 'competition_id',\n 'suspended_matches', 'homeAttackAdvantage', 'homeDefenceAdvantage',\n 'homeOverallAdvantage', 'seasonGoals_overall',\n ...\n 'freekicks_total_over275_away', 'freekicks_total_over285_overall',\n 'freekicks_total_over285_home', 'freekicks_total_over285_away',\n 'freekicks_total_over295_overall', 'freekicks_total_over295_home',\n 'freekicks_total_over295_away', 'freekicks_total_over305_overall',\n 'freekicks_total_over305_home', 'freekicks_total_over305_away'],\n dtype='object', length=1759)] are in the [columns]"
I have no idea why this is happening, as I also removed all object-dtype columns so that only float64 dtypes remain.
Any ideas to help, please?
df in text format (this is the dataframe with one row):
234 5 5 42 32 0 4 -33 -2 54 30 84 55 29 54 31 19 30 20 10 35 31 34 56 49 58 74 71 71 3 4 -4 16 8 7 13 5 6 7 3 4 38 19 19 4 3 3 1 13 5 5 28 26 21 22 10 9 48 50 39 10 23 9 13 2 3 19 50 42 42 9 10 18 10 6 47 42 32 6 2 2 13 9 9 1 1 1 2 2 1 1 1 1 0 0 0 35 35 30 27 26 25 18 13 21 10 6 2 21 26 8 8 8 17 35 33 39 2 3 8 9 16 17 51 26 17 1 1 0 0 0 0 0 0 0 0 0 0 20 12 7 16 7 5 37 19 14 -8 -2 -9 0 3 5 14 27 34 0 8 13 37 60 81 96 85 67 44 26 4 37 35 32 21 11 2 0 8 25 48 67 79 0 2 5 11 16 18 92 78 65 37 16 0 18 16 14 7 3 0 0 0 0 11 50 83 0 0 0 2 11 16 92 83 67 48 21 4 19 19 16 11 5 1 1 8 24 3 17 52 4 25 48 1 6 11 0 9 57 0 2 12 38 19 19 25 25 22 15 14 9 5 3 2 66 64 49 39 36 19 13 8 4 12 12 9 6 5 3 1 1 1 63 63 47 32 26 16 5 5 5 13 13 12 9 9 4 3 2 0 68 63 50 46 39 17 13 9 0 31 24 19 13 8 5 2 82 63 50 34 21 13 5 16 14 11 6 3 2 1 84 74 56 32 16 11 4 15 10 8 7 5 2 1 78 53 42 37 26 9 5 26 21 15 10 5 3 2 57 47 32 21 11 6 4 12 9 3 2 1 0 0 52 41 14 9 5 0 0 14 12 9 6 2 1 0 61 52 38 25 8 4 0 37 34 25 18 12 4 2 0 0 94 81 57 46 28 8 4 0 0 18 15 11 8 5 1 1 0 0 88 74 46 39 21 4 4 0 0 19 19 14 10 5 3 1 0 0 96 83 63 42 26 13 4 0 0 29 19 7 2 0 0 0 75 40 15 4 0 0 0 12 8 3 2 0 0 0 63 33 13 8 0 0 0 17 10 4 0 0 0 0 83 42 17 0 0 0 0 33 21 11 5 0 0 0 77 55 25 11 0 0 0 17 10 4 2 0 0 0 71 46 17 8 0 0 0 16 11 4 1 0 0 0 83 57 21 4 0 0 0 5 6 2 7 7 14 30 29 30 176 91 85 66 27 35 8 8 9 4 4 4 161 63 94 3 3 3 10 0 4 0 1 1 1 374 229 145 9 12 7 177 88 75 197 127 70 4 4 3 5 6 3 48 51 46 9 9 8 377 182 195 151 66 79 62 28 31 32 16 16 3 2 3 1 1 1 32 31 27 19 12 6 4 96 78 59 41 26 13 9 16 15 12 7 5 1 1 96 78 52 30 22 4 4 16 16 14 10 7 4 1 96 75 57 46 28 17 4 30 18 4 1 0 0 0 78 38 9 2 0 0 0 16 9 0 0 0 0 0 78 39 0 0 0 0 0 14 9 4 1 0 0 0 75 38 17 4 0 0 0 8 4 3 17 17 13 38 19 19 6 4 1 13 17 5 7 3 3 15 13 13 1 1 0 3 5 0 1 1 0 0 0 0 0 0 0 46 31 15 14 9 5 32 19 10 4 10 26 11 26 67 3 7 14 13 37 70 0 3 12 0 16 63 57 33 24 1 1 1 14 8 5 30 35 22 14 5 9 33 26 39 5 2 3 13 11 13 0 2 0 6 4 2 16 21 9 6 2 2 13 9 9 18 9 9 39 39 39 17 6 
10 38 26 42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 3 3 1 3 4 4 3 6 3 5 1 5 4 8 2 5 5 5 7 7 10 7 10 11 9 9 16 13 12 16 3 4 5 7 9 11 5 3 6 5 3 5 1 1 0 2 3 2 4 3 4 1 1 2 4 5 6 0 0 0 1 0 2 0 2 1 2 1 2 2 1 3 1 2 0 4 6 7 5 6 5 3 2 9 9 9 9 0 0 0 0 0 2 0 1 2 0 0 3 2 1 3 1 2 0 2 1 1 0 0 1 2 1 2 2 1 1 2 2 1 3 1 3 2 2 6 3 3 7 4 3 7 42 0 0 0 0 0 0 -2 -1 -1 -2 -1 -1 1 5 17 28 2 11 37 67 1 3 9 15 4 13 39 65 0 1 6 12 0 5 28 56 0 1 5 24 0 3 13 63 0 1 5 13 0 4 21 54 0 0 0 10 0 0 0 53 37 18 19 38 19 19 44 21 21 92 44 48 19 8 7 40 19 20 22 11 11 47 23 22 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 2 2 2 50 52 48 23 22 25 0 0 0 27 27 21 40 28 39 15 9 18 14 11 9 56 52 61 34 36 25 50 43 56 66 27 35 73 38 34 1 1 1 1 1 1 139 66 73 3 2 3 4 1 1 3 1 1 3 1 1 0 0 0 38 19 19 10 11 9 3 5 3 2 3 6 4 1 12 8 7 4 8 5 5 3 4 2 2 1 31 21 16 9 42 25 23 14 17 9 9 4 19 16 13 5 4 3 11 9 4 42 36 28 23 18 14 57 47 21 19 13 11 8 4 4 13 11 9 7 7 7 6 2 2 1 1 1 50 34 28 21 11 11 61 56 47 37 37 37 32 11 9 5 5 5 28 18 12 6 11 7 5 2 13 7 3 1 62 40 31 13 50 32 23 9 68 37 16 5 1 0 1 32 15 17 1 0 0 2 0 4 78 74 74 2 0 0 4 1 2 0 0 0 1 0 0 8 4 11 0 0 0 2 0 0 32 16 16 350 172 178 10 10 11 23 27 20 6 9 21 13 8 13 26 17 4 2 5 6 7 2 2 0 0 0 1 1 3 2 0 0 0 1 4 3 1 0 0 0 0 38 19 19 10 8 1 1 1 0 26 35 5 2 4 0 0 0 0 3 1 0 26 11 13 6 4 1 7 3 3 10 7 3 12 3 5 15 13 13 22 13 25 13 21 0 0 0 7 4 0 57 48 54 15 17 4 30 6 6 6 2 2 3 9 10 7 18 12 6 0 0 0 0 0 0 41 52 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 2 7 2 3 17 8 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
df2022fl below is the 24-row dataframe.
(As for the KeyError: df.T has df's original row labels as its columns, so indexing it with df2022fl.columns fails.)
Compare both dataframes using:
df2022fl.ge(df.iloc[0]).sum()
This gives us the number of values in df2022fl which are greater than or equal to the corresponding value in df.
Output :
id 24
table_position 20
performance_rank 20
risk 23
competition_id 24
..
freekicks_total_over295_home 24
freekicks_total_over295_away 24
freekicks_total_over305_overall 24
freekicks_total_over305_home 24
freekicks_total_over305_away 24
Length: 1759, dtype: int64
To get, for each row, the number of columns that came out greater than the values in dataframe df, you can use axis=1.
df2022fl['stats'] = df2022fl.ge(df.iloc[0]).sum(axis=1)
This gives you the expected output:
id table_position ... freekicks_total_over305_away stats
1 234.0 6.0 ... 0.0 1688
2 235.0 18.0 ... 0.0 1529
3 236.0 16.0 ... 0.0 1565
4 237.0 24.0 ... 0.0 1409
5 242.0 3.0 ... 0.0 1566
6 244.0 4.0 ... 0.0 1681
7 246.0 23.0 ... 0.0 1607
8 247.0 5.0 ... 0.0 1642
9 248.0 14.0 ... 0.0 1603
10 253.0 15.0 ... 0.0 1575
11 254.0 12.0 ... 0.0 1554
12 255.0 13.0 ... 0.0 1593
13 257.0 20.0 ... 0.0 1533
14 258.0 21.0 ... 0.0 1537
15 259.0 9.0 ... 0.0 1585
16 262.0 17.0 ... 0.0 1488
17 265.0 11.0 ... 0.0 1647
18 267.0 7.0 ... 0.0 1628
19 268.0 2.0 ... 0.0 1615
20 1020.0 1.0 ... 0.0 1601
21 1827.0 8.0 ... 0.0 1603
22 1833.0 22.0 ... 0.0 1587
23 3124.0 19.0 ... 0.0 1594
24 3141.0 10.0 ... 0.0 1623
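The same comparison on a toy pair of frames (hypothetical values, just to show the mechanics of .ge with axis=1):

```python
import pandas as pd

# Toy stand-ins: a 3-row frame and a single-row baseline frame.
df2022fl = pd.DataFrame({"a": [1, 5, 9], "b": [4, 2, 8]})
df = pd.DataFrame({"a": [3], "b": [3]})

# df.iloc[0] is a Series indexed by column name, so .ge aligns on
# columns; sum(axis=1) counts, per row, how many columns pass.
counts = df2022fl.ge(df.iloc[0]).sum(axis=1)
print(counts.tolist())  # [1, 1, 2]
```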
I have a CSV data file. Some of its data sits between semicolons; among those semicolon-separated fields there are specific ID numbers, and I need to replace each ID with the corresponding location name.
Available data
24CFA4A-12L - GF Electrical corridor
The line whose semicolon-delimited ID number needs replacing:
1;1;35;;1/2/1/37 24CFA4A;;;0;;;
Files with data - https://gofile.io/d/bQDppz
Thank you to anyone who has a solution.
Supposing you have dataframes:
df1 = pd.read_excel("ID_list.xlsx", header=None)
df2 = pd.read_excel("location.xlsx", header=None)
df1:
0
0 1;1;27;;1/2/1/29 25BAB3D;;;0;;;
1 1;1;27;1;;;;0;;;
2 1;1;28;;1/2/1/30 290E6D2;;;0;;;
3 1;1;28;1;;;;0;;;
4 1;1;29;;1/2/1/31 28BA737;;;0;;;
5 1;1;29;1;;;;0;;;
6 1;1;30;;1/2/1/32 2717823;;;0;;;
7 1;1;30;1;;;;0;;;
8 1;1;31;;1/2/1/33 254DEAA;;;0;;;
9 1;1;31;1;;;;0;;;
10 1;1;32;;1/2/1/34 28AE041;;;0;;;
11 1;1;32;1;;;;0;;;
12 1;1;33;;1/2/1/35 254DE82;;;0;;;
13 1;1;33;1;;;;0;;;
14 1;1;34;;1/2/1/36 2539D70;;;0;;;
15 1;1;34;1;;;;0;;;
16 1;1;35;;1/2/1/37 24CFA4A;;;0;;;
17 1;1;35;1;;;;0;;;
18 1;1;36;;1/2/1/39 28F023E;;;0;;;
19 1;1;36;1;;;;0;;;
20 1;1;37;;1/2/1/40 2717831;;;0;;;
21 1;1;37;1;;;;0;;;
22 1;1;38;;1/2/1/41 2397D75;;;0;;;
23 1;1;38;1;;;;0;;;
24 1;1;39;;1/2/1/42 287844C;;;0;;;
25 1;1;39;1;;;;0;;;
26 1;1;40;;1/2/1/43 28784F0;;;0;;;
27 1;1;40;1;;;;0;;;
28 1;1;41;;1/2/1/44 2865B67;;;0;;;
29 1;1;41;1;;;;0;;;
30 1;1;42;;1/2/1/45 2865998;;;0;;;
31 1;1;42;1;;;;0;;;
32 1;1;43;;1/2/1/46 287852F;;;0;;;
33 1;1;43;1;;;;0;;;
34 1;1;44;;1/2/1/47 287AC43;;;0;;;
35 1;1;44;1;;;;0;;;
36 1;1;45;;1/2/1/48 287ACF8;;;0;;;
37 1;1;45;1;;;;0;;;
38 1;1;46;;1/2/1/49 2878586;;;0;;;
39 1;1;46;1;;;;0;;;
40 1;1;47;;1/2/1/50 2878474;;;0;;;
41 1;1;47;1;;;;0;;;
42 1;1;48;;1/2/1/51 2846315;;;0;;;
df2:
0 1
0 GF General Dining TC 254DEAA-02L
1 GF General Dining TC 2717823-26L
2 GF General Dining FC 28BA737-50L
3 GF Preparation FC 25BAB3D-10L
4 GF Preparation TC 290E6D2-01M
5 GF Hospital Kitchen FC 25BAB2F-10L
6 GF Hospital Kitchen TC 2906F5C-01M
7 GF Food Preparation FC 25F5723-10L
8 GF Food Preparation TC 29070D6-01M
9 GF KITCHEN Corridor 254DF5D-02L
Then:
df1 = df1[0].str.split(";", expand=True)
df1[4] = df1[4].apply(lambda x: v[-1] if (v := x.split()) else "")
df2[1] = df2[1].apply(lambda x: x.split("-")[0])
df1:
0 1 2 3 4 5 6 7 8 9 10
0 1 1 27 25BAB3D 0
1 1 1 27 1 0
2 1 1 28 290E6D2 0
3 1 1 28 1 0
4 1 1 29 28BA737 0
5 1 1 29 1 0
6 1 1 30 2717823 0
7 1 1 30 1 0
8 1 1 31 254DEAA 0
9 1 1 31 1 0
10 1 1 32 28AE041 0
11 1 1 32 1 0
12 1 1 33 254DE82 0
13 1 1 33 1 0
14 1 1 34 2539D70 0
15 1 1 34 1 0
16 1 1 35 24CFA4A 0
17 1 1 35 1 0
18 1 1 36 28F023E 0
19 1 1 36 1 0
20 1 1 37 2717831 0
21 1 1 37 1 0
22 1 1 38 2397D75 0
23 1 1 38 1 0
24 1 1 39 287844C 0
25 1 1 39 1 0
26 1 1 40 28784F0 0
27 1 1 40 1 0
28 1 1 41 2865B67 0
29 1 1 41 1 0
30 1 1 42 2865998 0
31 1 1 42 1 0
32 1 1 43 287852F 0
33 1 1 43 1 0
34 1 1 44 287AC43 0
35 1 1 44 1 0
36 1 1 45 287ACF8 0
37 1 1 45 1 0
38 1 1 46 2878586 0
39 1 1 46 1 0
40 1 1 47 2878474 0
41 1 1 47 1 0
42 1 1 48 2846315 0
df2:
0 1
0 GF General Dining TC 254DEAA
1 GF General Dining TC 2717823
2 GF General Dining FC 28BA737
3 GF Preparation FC 25BAB3D
4 GF Preparation TC 290E6D2
5 GF Hospital Kitchen FC 25BAB2F
6 GF Hospital Kitchen TC 2906F5C
7 GF Food Preparation FC 25F5723
8 GF Food Preparation TC 29070D6
9 GF KITCHEN Corridor 254DF5D
To replace the values:
m = dict(zip(df2[1], df2[0]))
df1[4] = df1[4].replace(m)
df1:
0 1 2 3 4 5 6 7 8 9 10
0 1 1 27 GF Preparation FC 0
1 1 1 27 1 0
2 1 1 28 GF Preparation TC 0
3 1 1 28 1 0
4 1 1 29 GF General Dining FC 0
5 1 1 29 1 0
6 1 1 30 GF General Dining TC 0
7 1 1 30 1 0
8 1 1 31 GF General Dining TC 0
9 1 1 31 1 0
10 1 1 32 28AE041 0
11 1 1 32 1 0
12 1 1 33 254DE82 0
13 1 1 33 1 0
14 1 1 34 2539D70 0
15 1 1 34 1 0
16 1 1 35 24CFA4A 0
17 1 1 35 1 0
18 1 1 36 28F023E 0
19 1 1 36 1 0
20 1 1 37 2717831 0
21 1 1 37 1 0
22 1 1 38 2397D75 0
23 1 1 38 1 0
24 1 1 39 287844C 0
25 1 1 39 1 0
26 1 1 40 28784F0 0
27 1 1 40 1 0
28 1 1 41 2865B67 0
29 1 1 41 1 0
30 1 1 42 2865998 0
31 1 1 42 1 0
32 1 1 43 287852F 0
33 1 1 43 1 0
34 1 1 44 287AC43 0
35 1 1 44 1 0
36 1 1 45 287ACF8 0
37 1 1 45 1 0
38 1 1 46 2878586 0
39 1 1 46 1 0
40 1 1 47 2878474 0
41 1 1 47 1 0
42 1 1 48 2846315 0
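If the goal is to write the result back in the original semicolon-separated layout (an assumption, since the question shows raw lines rather than a target file), the split columns can be re-joined. A sketch with a hypothetical one-row frame:

```python
import pandas as pd

# Hypothetical single-row frame standing in for the split df1 above,
# after the ID has been replaced with its location name.
df1 = pd.DataFrame([["1", "1", "35", "", "GF Electrical corridor",
                     "", "", "0", "", "", ""]])

# Re-join the columns with ";" to recover the original line layout.
lines = df1.astype(str).agg(";".join, axis=1)
print(lines[0])  # 1;1;35;;GF Electrical corridor;;;0;;;
```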
I have a data set in this format in a dataframe:
0--Parade/0_Parade_marchingband_1_849.jpg
2
449 330 122 149 0 0 0 0 0 0
0--Parade/0_Parade_Parade_0_904.jpg
1
361 98 263 339 0 0 0 0 0 0
0--Parade/0_Parade_marchingband_1_799.jpg
45
78 221 7 8 0 0 0 0 0
78 238 14 17 2 0 0 0 0 0
3 232 11 15 2 0 0 0 2 0
20 215 12 16 2 0 0 0 2 0
0--Parade/0_Parade_marchingband_1_117.jpg
23
69 359 50 36 1 0 0 0 0 1
227 382 56 43 1 0 1 0 0 1
296 305 44 26 1 0 0 0 0 1
353 280 40 36 2 0 0 0 2 1
885 377 63 41 1 0 0 0 0 1
819 391 34 43 2 0 0 0 1 0
727 342 37 31 2 0 0 0 0 1
598 246 33 29 2 0 0 0 0 1
740 308 45 33 1 0 0 0 2 1
0--Parade/0_Parade_marchingband_1_778.jpg
35
27 226 33 36 1 0 0 0 2 0
63 95 16 19 2 0 0 0 0 0
64 63 17 18 2 0 0 0 0 0
88 13 16 15 2 0 0 0 1 0
231 1 13 13 2 0 0 0 1 0
263 122 14 20 2 0 0 0 0 0
367 68 15 23 2 0 0 0 0 0
198 98 15 18 2 0 0 0 0 0
293 161 52 59 1 0 0 0 1 0
412 36 14 20 2 0 0 0 1 0
Can anyone tell me how to put these in a dataframe where the first column contains all the .jpg paths and the next columns contain the coordinates, with every coordinate row kept in correspondence with its .jpg path?
eg.
column1 column2 column3
0--Parade/0_Parade_marchingband_1_849.jpg | 2 | 449 330 122 149 0 0 0 0 0 0
0--Parade/0_Parade_Parade_0_904.jpg | 1 | 361 98 263 339 0 0 0 0 0 0
0--Parade/0_Parade_marchingband_1_799.jpg | 45 | 78 221 7 8 0 0 0 0 0
| | 78 238 14 17 2 0 0 0 0 0
| | 3 232 11 15 2 0 0 0 2 0
| | 20 215 12 16 2 0 0 0 2 0
I have tried this
count1 = 0
count2 = 0
dict1 = {}
dict2 = {}
dict3 = {}
for i in data[0]:
    if i.find('.jpg') == -1:
        dict1[count1] = i
        count1 += 1
    else:
        dict2[count2] = i
        count2 += 1
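One way to parse this layout directly (a sketch, assuming each path line is followed by a count line and then that many coordinate lines; the column names are made up):

```python
import pandas as pd

# Hypothetical sample lines in the layout described above.
lines = [
    "0--Parade/0_Parade_marchingband_1_849.jpg",
    "2",
    "449 330 122 149 0 0 0 0 0 0",
    "100 100 10 10 0 0 0 0 0 0",
    "0--Parade/0_Parade_Parade_0_904.jpg",
    "1",
    "361 98 263 339 0 0 0 0 0 0",
]

rows = []
i = 0
while i < len(lines):
    path = lines[i]                # the .jpg path
    n = int(lines[i + 1])          # how many coordinate rows follow
    for coords in lines[i + 2:i + 2 + n]:
        rows.append({"path": path, "count": n, "coords": coords})
    i += 2 + n                     # jump to the next .jpg path

df = pd.DataFrame(rows)
print(len(df))  # 3
```

Each coordinate row keeps its path, so grouping or filtering by image stays straightforward.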
I want to update the column with 0 or 1, marking for each empid the row where the month is minimum and the salhike is maximum.
I have written the code to find the min month and max salhike for each employee:
df.sort_values(['salhike','month'], ascending=[False, True]).groupby('empid').head(1)
How can I set the YES_or_NO column to 1 for those rows?
Input DF:
empid age salhike month YES_or_NO
123 23 12 1 0
123 23 24 2 0
123 23 87 3 0
123 23 35 4 0
111 23 87 1 0
111 23 35 2 0
111 23 14 3 0
111 23 12 4 0
I am trying to get output table is:
empid age salhike month YES_or_NO
123 23 12 1 0
123 23 24 2 0
123 23 87 3 1
123 23 35 4 0
111 23 87 1 1
111 23 35 2 0
111 23 14 3 0
111 23 12 4 0
Try sort_values, then duplicated with subset='empid'; convert the boolean series to integer and assign it back to the column in the dataframe:
df.assign(YES_or_NO = (~df.sort_values(['empid','salhike'])
.duplicated(subset='empid', keep='last')).astype(int))
df.assign(YES_or_NO = (~df.sort_values(['salhike','month'],
                                       ascending=[True, False])
                         .duplicated(subset='empid', keep='last')).astype(int))
Output:
empid age salhike month YES_or_NO
0 123 23 12 1 0
1 123 23 24 2 0
2 123 23 87 3 1
3 123 23 35 4 0
4 111 23 87 1 1
5 111 23 35 2 0
6 111 23 14 3 0
7 111 23 12 4 0
Using groupby transform max
df['YES_or_NO']=df.salhike.eq(df.groupby('empid')['salhike'].transform('max')).astype(int)
df
Out[380]:
empid age salhike month YES_or_NO
0 123 23 12 1 0
1 123 23 24 2 0
2 123 23 87 3 1
3 123 23 35 4 0
4 111 23 87 1 1
5 111 23 35 2 0
6 111 23 14 3 0
7 111 23 12 4 0
Update
df['YES_or_NO']=0
df.loc[df.groupby('empid')['salhike'].idxmax(),'YES_or_NO']=1
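Putting the idxmax approach together with the question's input frame as a runnable check:

```python
import pandas as pd

# Input frame reconstructed from the question.
df = pd.DataFrame({
    "empid":   [123, 123, 123, 123, 111, 111, 111, 111],
    "age":     [23] * 8,
    "salhike": [12, 24, 87, 35, 87, 35, 14, 12],
    "month":   [1, 2, 3, 4, 1, 2, 3, 4],
})

# Flag the row holding each empid's maximum salhike; idxmax returns
# one row label per group, which .loc uses to set the flag.
df["YES_or_NO"] = 0
df.loc[df.groupby("empid")["salhike"].idxmax(), "YES_or_NO"] = 1
print(df["YES_or_NO"].tolist())  # [0, 0, 1, 0, 1, 0, 0, 0]
```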