Here is a test dataframe. I want to use the relationship between EmpID and MgrID to map each manager's own manager into a new column.
Test_df = pd.DataFrame({'EmpID':['1','2','3','4','5','6','7','8','9','10'],
'MgrID':['4','4','4','6','8','8','10','10','10','12']})
Test_df
If I create a dictionary for the initial relationship, I will be able to create the first link of the chain, but I'm afraid I need to loop through each of the new columns to create the next one.
ID_Dict = {'1':'4',
'2':'4',
'3':'4',
'4':'6',
'5':'8',
'6':'8',
'7':'10',
'8':'10',
'9':'10',
'10':'12'}
Test_df['MgrID_L2'] = Test_df['MgrID'].map(ID_Dict)
Test_df
What is the most efficient way to do this?
Thank you!
Here's a way with a simple while loop. Note that I renamed MgrID to MgrID_1.
Test_df = pd.DataFrame({'EmpID':['1','2','3','4','5','6','7','8','9','10'],
'MgrID_1':['4','4','4','6','8','8','10','10','10','12']})
d = Test_df.set_index('EmpID').MgrID_1.to_dict()
s = 2
while s:
    Test_df['MgrID_'+str(s)] = Test_df['MgrID_'+str(s-1)].map(d)
    if Test_df['MgrID_'+str(s)].isnull().all():
        Test_df = Test_df.drop(columns='MgrID_'+str(s))
        s = 0
    else:
        s += 1
Output: Test_df
EmpID MgrID_1 MgrID_2 MgrID_3 MgrID_4 MgrID_5
0 1 4 6 8 10 12
1 2 4 6 8 10 12
2 3 4 6 8 10 12
3 4 6 8 10 12 NaN
4 5 8 10 12 NaN NaN
5 6 8 10 12 NaN NaN
6 7 10 12 NaN NaN NaN
7 8 10 12 NaN NaN NaN
8 9 10 12 NaN NaN NaN
9 10 12 NaN NaN NaN NaN
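If this expansion is needed more than once, the same loop can be wrapped in a small helper. This is only a sketch; the function name walk_up and its keyword arguments are mine, not part of the answer above.
def walk_up(df, id_col='EmpID', mgr_col='MgrID_1'):
    # Build the employee -> manager dictionary once, then keep mapping
    # each level through it until an entire level comes back empty.
    d = df.set_index(id_col)[mgr_col].to_dict()
    level = 2
    while True:
        prev, cur = f'MgrID_{level - 1}', f'MgrID_{level}'
        df[cur] = df[prev].map(d)
        if df[cur].isnull().all():
            return df.drop(columns=cur)
        level += 1

Test_df = walk_up(Test_df)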
I have the following dataframe, in which the values should be increasing. Originally the dataframe has some unknown values.
index  value
0      1
1
2
3      2
4
5
6
7      4
8
9
10     3
11     3
12
13
14
15     5
Based on the assumption that the values should be increasing, I would like to remove the values at indices 10 and 11. This would be the desired dataframe:
index  value
0      1
1
2
3      2
4
5
6
7      4
8
9
12
13
14
15     5
Thank you very much
Assuming NaN in the empty cells (if not, temporarily replace them with NaN), use boolean indexing:
# if not NaNs uncomment below
# and use s in place of df['value'] afterwards
# s = pd.to_numeric(df['value'], errors='coerce')
# is the cell empty?
m1 = df['value'].isna()
# is the value at least the running maximum (i.e., not decreasing)?
m2 = df['value'].ge(df['value'].cummax())
out = df[m1|m2]
Output:
index value
0 0 1.0
1 1 NaN
2 2 NaN
3 3 2.0
4 4 NaN
5 5 NaN
6 6 NaN
7 7 4.0
8 8 NaN
9 9 NaN
12 12 NaN
13 13 NaN
14 14 NaN
15 15 5.0
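For the case mentioned in the comments above, where the blanks are empty strings rather than NaN, a minimal variant of the same idea (still assuming the column is named value) might look like:
# coerce to numbers; empty strings become NaN
s = pd.to_numeric(df['value'], errors='coerce')
m1 = s.isna()           # is the cell empty?
m2 = s.ge(s.cummax())   # is the value at least the running maximum?
out = df[m1 | m2]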
Try this:
def del_df(df):
    df_no_na = df.dropna().reset_index(drop=True)
    num_tmp = df_no_na['value'][0]  # first value which is not NaN
    del_index_list = []  # indices to delete
    for row_index in range(1, len(df_no_na)):
        if df_no_na['value'][row_index] > num_tmp:  # increasing
            num_tmp = df_no_na['value'][row_index]  # compare the following values against this one
        else:  # not increasing (same or decreasing)
            del_index_list.append(df_no_na['index'][row_index])  # index to delete
    df_goal = df.drop([df.index[i] for i in del_index_list])
    return df_goal
Output:
index value
0 0 1.0
1 1 NaN
2 2 NaN
3 3 2.0
4 4 NaN
5 5 NaN
6 6 NaN
7 7 4.0
8 8 NaN
9 9 NaN
12 12 NaN
13 13 NaN
14 14 NaN
15 15 5.0
I'm trying to split a UFC record column into multiple columns and am having trouble. The data looks like this:
record
1 22–8–1
2 18–7–1
3 12–4
4 8–2 (1 NC)
5 23–9–1
6 23–12
7 19–4–1
8 18–5–1 (1 NC)
The first number is wins and the second is losses. If there is a third number, it is the draws, and if there is a number in parentheses, it is the "no contests". I want to split it up and have it look like this.
wins loses draws no_contests
1 22 8 1 NaN
2 18 7 1 NaN
3 12 4 NaN NaN
4 8 2 NaN 1
5 23 9 1 NaN
6 23 12 NaN NaN
7 19 4 1 NaN
8 18 5 1 1
I tried using .str.split("-"), which just made things more complicated for me. Then I tried making a for loop with a bunch of if statements to try and filter out some of the more complicated records, but failed miserably at that. Does anyone have any ideas as to what I could do? Thanks so much!
# So you can copy and paste the data in
import pandas as pd
data = {'record': ['22–8–1', '18–7–1', '12–4', '8–2 (1 NC)', '23–9–1', '23–12', '19–4–1', '18–5–1 (1 NC)']}
df = pd.DataFrame(data)
This is a job for pandas.Series.str.extract():
# Fix em-dashes
df['record'] = df['record'].str.replace('–', '-')
new_df = df['record'].str.extract(r'^(?P<wins>\d+)-(?P<loses>\d+)(?:-(?P<draws>\d+))?\s*(?:\((?P<no_contests>\d+) NC\))?')
Output:
>>> new_df
wins loses draws no_contests
0 22 8 1 NaN
1 18 7 1 NaN
2 12 4 NaN NaN
3 8 2 NaN 1
4 23 9 1 NaN
5 23 12 NaN NaN
6 19 4 1 NaN
7 18 5 1 1
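The extracted columns are strings. If numeric dtypes are wanted, one option (a sketch using pandas' nullable integer type so the missing cells stay missing) is:
# convert each extracted column to numbers, keeping missing values as <NA>
new_df = new_df.apply(pd.to_numeric).astype('Int64')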
I have a one column dataframe which looks like this:
Neive Bayes
0 8.322087e-07
1 3.213342e-24
2 4.474122e-28
3 2.230054e-16
4 3.957606e-29
5 9.999992e-01
6 3.254807e-13
7 8.836033e-18
8 1.222642e-09
9 6.825381e-03
10 5.275194e-07
11 2.224289e-06
12 2.259303e-09
13 2.014053e-09
14 1.755933e-05
15 1.889681e-04
16 9.929193e-01
17 4.599619e-05
18 6.944654e-01
19 5.377576e-05
I want to pivot it to wide format, but in specific intervals: the first 9 rows should make up the 9 columns of the first row, and this pattern should continue until the final table has 9 columns and 9 times fewer rows than now. How would I achieve this?
Using pivot_table:
df.pivot_table(columns=df.index % 9, index=df.index // 9, values='Neive Bayes')
0 1 2 3 4 \
0 8.322087e-07 3.213342e-24 4.474122e-28 2.230054e-16 3.957606e-29
1 6.825381e-03 5.275194e-07 2.224289e-06 2.259303e-09 2.014053e-09
2 6.944654e-01 5.377576e-05 NaN NaN NaN
5 6 7 8
0 0.999999 3.254807e-13 8.836033e-18 1.222642e-09
1 0.000018 1.889681e-04 9.929193e-01 4.599619e-05
2 NaN NaN NaN NaN
Construct a MultiIndex, then set_index and unstack:
import numpy as np

iix = pd.MultiIndex.from_arrays([np.arange(df.shape[0]) // 9,
                                 np.arange(df.shape[0]) % 9])
df_wide = df.set_index(iix)['Neive Bayes'].unstack()
Out[204]:
0 1 2 3 4 \
0 8.322087e-07 3.213342e-24 4.474122e-28 2.230054e-16 3.957606e-29
1 6.825381e-03 5.275194e-07 2.224289e-06 2.259303e-09 2.014053e-09
2 6.944654e-01 5.377576e-05 NaN NaN NaN
5 6 7 8
0 0.999999 3.254807e-13 8.836033e-18 1.222642e-09
1 0.000018 1.889681e-04 9.929193e-01 4.599619e-05
2 NaN NaN NaN NaN
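A plain NumPy reshape is another way to get the same layout; this sketch (the names vals, pad and wide are mine) pads the column with NaN so its length is a multiple of 9 before reshaping:
import numpy as np

vals = df['Neive Bayes'].to_numpy()
pad = -len(vals) % 9  # number of NaNs needed to reach a multiple of 9
wide = pd.DataFrame(np.append(vals, [np.nan] * pad).reshape(-1, 9))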
I want to add a list as a column to the df dataframe. The list is a different length than the dataframe.
df =
A  B  C
1  2  3
5  6  9
4
6     6
8     4
2     3
4
6     6
8     4
D = [11,17,18]
I want the following output
df =
A  B  C  D
1  2  3  11
5  6  9  17
4        18
6     6
8     4
2     3
4
6     6
8     4
I am doing the following to extend the list to the length of the dataframe by adding "nan":
# number of nan value require for the list to match the size of the column
extend_length = df.shape[0]-len(D)
# extend the list
D.extend(extend_length * ['nan'])
# add to the dataframe
df["D"] = D
A  B  C  D
1  2  3  11
5  6  9  17
4        18
6     6  nan
8     4  nan
2     3  nan
4        nan
6     6  nan
8     4  nan
Where "nan" is treated like string but I want it to be empty ot "nan", thus, if I search for number of valid cell in D column it will provide output of 3.
Adding the list as a Series will handle this directly.
D = [11,17,18]
df.loc[:, 'D'] = pd.Series(D)
A simple pd.concat of df and a Series built from D works as follows:
pd.concat([df, pd.Series(D, name='D')], axis=1)
or
df.assign(D=pd.Series(D))
Out[654]:
A B C D
0 1 2.0 3.0 11.0
1 5 6.0 9.0 17.0
2 4 NaN NaN 18.0
3 6 NaN 6.0 NaN
4 8 NaN 4.0 NaN
5 2 NaN 3.0 NaN
6 4 NaN NaN NaN
7 6 NaN 6.0 NaN
8 8 NaN 4.0 NaN
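Either way the padded cells are real NaN rather than the string 'nan', so counting the valid cells in D gives 3, as requested:
df['D'].count()        # 3 -- count() ignores NaN
df['D'].notna().sum()  # also 3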
I would like to set a value in a pandas dataframe based on the values of another column. In a nutshell, for example, if I wanted to set values of a column my_column of a pandas dataframe pd where another column, my_interesting_column, is between 10 and 30, I would like to do something like:
start_index=pd.find_closest_index_where_pd["my_interesting_column"].is_closest_to(10)
end_index=pd.find_closest_index_where_pd["my_interesting_column"].is_closest_to(30)
pd["my_column"].between(star_index, end_index)= some_value
As a simple illustration, suppose I have the following dataframe
df = pd.DataFrame(np.arange(10, 20), columns=list('A'))
df["B"]=np.nan
>>> df
A B
0 10 NaN
1 11 NaN
2 12 NaN
3 13 NaN
4 14 NaN
5 15 NaN
6 16 NaN
7 17 NaN
8 18 NaN
9 19 NaN
How can I do something like
df.where(df["A"].is_between(13,16))= 5
So that the end result looks like
>>> df
A B
0 10 NaN
1 11 NaN
2 12 NaN
3 13 5
4 14 5
5 15 5
6 16 5
7 17 NaN
8 18 NaN
9 19 NaN
pd.loc[start_idx:end_idx, 'my_column'] = some_value
I think this is what you are looking for:
df.loc[(df['A'] >= 13) & (df['A'] <= 16), 'B'] = 5
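Series.between expresses the same mask a bit more compactly (it is inclusive on both ends by default):
df.loc[df['A'].between(13, 16), 'B'] = 5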