Here is a test dataframe. I want to use the relationship between EmpID and MgrID to map each manager's own manager into a new column.
Test_df = pd.DataFrame({'EmpID':['1','2','3','4','5','6','7','8','9','10'],
'MgrID':['4','4','4','6','8','8','10','10','10','12']})
Test_df
If I create a dictionary for the initial relationship, I will be able to create the first link of the chain, but I'm afraid I need to loop through each of the new columns to create the next one.
ID_Dict = {'1':'4',
'2':'4',
'3':'4',
'4':'6',
'5':'8',
'6':'8',
'7':'10',
'8':'10',
'9':'10',
'10':'12'}
Test_df['MgrID_L2'] = Test_df['MgrID'].map(ID_Dict)
Test_df
What is the most efficient way to do this?
Thank you!
Here's a way with a simple while loop. Note that I renamed MgrID to MgrID_1.
Test_df = pd.DataFrame({'EmpID':['1','2','3','4','5','6','7','8','9','10'],
'MgrID_1':['4','4','4','6','8','8','10','10','10','12']})
d = Test_df.set_index('EmpID').MgrID_1.to_dict()
s = 2
while s:
    Test_df['MgrID_'+str(s)] = Test_df['MgrID_'+str(s-1)].map(d)
    if Test_df['MgrID_'+str(s)].isnull().all():
        Test_df = Test_df.drop(columns='MgrID_'+str(s))
        s = 0
    else:
        s += 1
Output: Test_df
EmpID MgrID_1 MgrID_2 MgrID_3 MgrID_4 MgrID_5
0 1 4 6 8 10 12
1 2 4 6 8 10 12
2 3 4 6 8 10 12
3 4 6 8 10 12 NaN
4 5 8 10 12 NaN NaN
5 6 8 10 12 NaN NaN
6 7 10 12 NaN NaN NaN
7 8 10 12 NaN NaN NaN
8 9 10 12 NaN NaN NaN
9 10 12 NaN NaN NaN NaN
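If this expansion is needed more than once, the same loop can be wrapped in a small helper. This is only a sketch; the function name walk_up and its keyword arguments are mine, not part of the answer above.
def walk_up(df, id_col='EmpID', mgr_col='MgrID_1'):
    # Build the employee -> manager dictionary once, then keep mapping
    # each level through it until an entire level comes back empty.
    d = df.set_index(id_col)[mgr_col].to_dict()
    level = 2
    while True:
        prev, cur = f'MgrID_{level - 1}', f'MgrID_{level}'
        df[cur] = df[prev].map(d)
        if df[cur].isnull().all():
            return df.drop(columns=cur)
        level += 1

Test_df = walk_up(Test_df)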
I have the following dataframe, in which the values should be increasing. Originally the dataframe has some unknown values.
index  value
0      1
1
2
3      2
4
5
6
7      4
8
9
10     3
11     3
12
13
14
15     5
Based on the assumption that the values should be increasing, I would like to remove the values at indices 10 and 11. This would be the desired dataframe:
index  value
0      1
1
2
3      2
4
5
6
7      4
8
9
12
13
14
15     5
Thank you very much
Assuming NaN in the empty cells (if not, temporarily replace them with NaN), use boolean indexing:
# if not NaNs uncomment below
# and use s in place of df['value'] afterwards
# s = pd.to_numeric(df['value'], errors='coerce')
# is the cell empty?
m1 = df['value'].isna()
# is the value at least the running maximum (i.e., not decreasing)?
m2 = df['value'].ge(df['value'].cummax())
out = df[m1|m2]
Output:
index value
0 0 1.0
1 1 NaN
2 2 NaN
3 3 2.0
4 4 NaN
5 5 NaN
6 6 NaN
7 7 4.0
8 8 NaN
9 9 NaN
12 12 NaN
13 13 NaN
14 14 NaN
15 15 5.0
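For the case mentioned in the comments above, where the blanks are empty strings rather than NaN, a minimal variant of the same idea (still assuming the column is named value) might look like:
# coerce to numbers; empty strings become NaN
s = pd.to_numeric(df['value'], errors='coerce')
m1 = s.isna()           # is the cell empty?
m2 = s.ge(s.cummax())   # is the value at least the running maximum?
out = df[m1 | m2]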
Try this:
def del_df(df):
    df_no_na = df.dropna().reset_index(drop=True)
    num_tmp = df_no_na['value'][0]  # first value which is not NaN
    del_index_list = []  # indices to delete
    for row_index in range(1, len(df_no_na)):
        if df_no_na['value'][row_index] > num_tmp:  # increasing
            num_tmp = df_no_na['value'][row_index]  # compare the following values against this one
        else:  # not increasing (same or decreasing)
            del_index_list.append(df_no_na['index'][row_index])  # index to delete
    df_goal = df.drop([df.index[i] for i in del_index_list])
    return df_goal
Output:
index value
0 0 1.0
1 1 NaN
2 2 NaN
3 3 2.0
4 4 NaN
5 5 NaN
6 6 NaN
7 7 4.0
8 8 NaN
9 9 NaN
12 12 NaN
13 13 NaN
14 14 NaN
15 15 5.0
I'm trying to split a UFC record column into multiple columns and am having trouble. The data looks like this:
record
1 22–8–1
2 18–7–1
3 12–4
4 8–2 (1 NC)
5 23–9–1
6 23–12
7 19–4–1
8 18–5–1 (1 NC)
The first number is wins and the second is losses. If there is a third number, it is the draws, and if there is a number in parentheses, it is the "no contests". I want to split it up and have it look like this.
wins loses draws no_contests
1 22 8 1 NaN
2 18 7 1 NaN
3 12 4 NaN NaN
4 8 2 NaN 1
5 23 9 1 NaN
6 23 12 NaN NaN
7 19 4 1 NaN
8 18 5 1 1
I tried using .str.split("-"), which just made things more complicated for me. Then I tried making a for loop with a bunch of if statements to try and filter out some of the more complicated records, but failed miserably at that. Does anyone have any ideas as to what I could do? Thanks so much!
# So you can copy and paste the data in
import pandas as pd
data = {'record': ['22–8–1', '18–7–1', '12–4', '8–2 (1 NC)', '23–9–1', '23–12', '19–4–1', '18–5–1 (1 NC)']}
df = pd.DataFrame(data)
This is a job for pandas.Series.str.extract():
# Fix em-dashes
df['record'] = df['record'].str.replace('–', '-')
new_df = df['record'].str.extract(r'^(?P<wins>\d+)-(?P<loses>\d+)(?:-(?P<draws>\d+))?\s*(?:\((?P<no_contests>\d+) NC\))?')
Output:
>>> new_df
wins loses draws no_contests
0 22 8 1 NaN
1 18 7 1 NaN
2 12 4 NaN NaN
3 8 2 NaN 1
4 23 9 1 NaN
5 23 12 NaN NaN
6 19 4 1 NaN
7 18 5 1 1
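The extracted columns are strings. If numeric dtypes are wanted, one option (a sketch using pandas' nullable integer type so the missing cells stay missing) is:
# convert each extracted column to numbers, keeping missing values as <NA>
new_df = new_df.apply(pd.to_numeric).astype('Int64')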
I have a one column dataframe which looks like this:
Neive Bayes
0 8.322087e-07
1 3.213342e-24
2 4.474122e-28
3 2.230054e-16
4 3.957606e-29
5 9.999992e-01
6 3.254807e-13
7 8.836033e-18
8 1.222642e-09
9 6.825381e-03
10 5.275194e-07
11 2.224289e-06
12 2.259303e-09
13 2.014053e-09
14 1.755933e-05
15 1.889681e-04
16 9.929193e-01
17 4.599619e-05
18 6.944654e-01
19 5.377576e-05
I want to pivot it to wide format, but in specific intervals: the first 9 rows should make up the 9 columns of the first row, and this pattern should continue until the final table has 9 columns and 9 times fewer rows than now. How would I achieve this?
Using pivot_table:
df.pivot_table(columns=df.index % 9, index=df.index // 9, values='Neive Bayes')
0 1 2 3 4 \
0 8.322087e-07 3.213342e-24 4.474122e-28 2.230054e-16 3.957606e-29
1 6.825381e-03 5.275194e-07 2.224289e-06 2.259303e-09 2.014053e-09
2 6.944654e-01 5.377576e-05 NaN NaN NaN
5 6 7 8
0 0.999999 3.254807e-13 8.836033e-18 1.222642e-09
1 0.000018 1.889681e-04 9.929193e-01 4.599619e-05
2 NaN NaN NaN NaN
Construct a MultiIndex, then set_index and unstack:
import numpy as np

iix = pd.MultiIndex.from_arrays([np.arange(df.shape[0]) // 9,
                                 np.arange(df.shape[0]) % 9])
df_wide = df.set_index(iix)['Neive Bayes'].unstack()
Out[204]:
0 1 2 3 4 \
0 8.322087e-07 3.213342e-24 4.474122e-28 2.230054e-16 3.957606e-29
1 6.825381e-03 5.275194e-07 2.224289e-06 2.259303e-09 2.014053e-09
2 6.944654e-01 5.377576e-05 NaN NaN NaN
5 6 7 8
0 0.999999 3.254807e-13 8.836033e-18 1.222642e-09
1 0.000018 1.889681e-04 9.929193e-01 4.599619e-05
2 NaN NaN NaN NaN
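A plain NumPy reshape is another way to get the same layout; this sketch (the names vals, pad and wide are mine) pads the column with NaN so its length is a multiple of 9 before reshaping:
import numpy as np

vals = df['Neive Bayes'].to_numpy()
pad = -len(vals) % 9  # number of NaNs needed to reach a multiple of 9
wide = pd.DataFrame(np.append(vals, [np.nan] * pad).reshape(-1, 9))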
I want to add a list as a column to the df dataframe. The list is a different length than the dataframe.
df =
A  B  C
1  2  3
5  6  9
4
6     6
8     4
2     3
4
6     6
8     4
D = [11,17,18]
I want the following output
df =
A  B  C  D
1  2  3  11
5  6  9  17
4        18
6     6
8     4
2     3
4
6     6
8     4
I am doing the following to extend the list to the length of the dataframe by adding "nan":
# number of nan value require for the list to match the size of the column
extend_length = df.shape[0]-len(D)
# extend the list
D.extend(extend_length * ['nan'])
# add to the dataframe
df["D"] = D
A  B  C  D
1  2  3  11
5  6  9  17
4        18
6     6  nan
8     4  nan
2     3  nan
4        nan
6     6  nan
8     4  nan
Where "nan" is treated like string but I want it to be empty ot "nan", thus, if I search for number of valid cell in D column it will provide output of 3.
Adding the list as a Series will handle this directly.
D = [11,17,18]
df.loc[:, 'D'] = pd.Series(D)
A simple pd.concat of df and a Series built from D works as follows:
pd.concat([df, pd.Series(D, name='D')], axis=1)
or
df.assign(D=pd.Series(D))
Out[654]:
A B C D
0 1 2.0 3.0 11.0
1 5 6.0 9.0 17.0
2 4 NaN NaN 18.0
3 6 NaN 6.0 NaN
4 8 NaN 4.0 NaN
5 2 NaN 3.0 NaN
6 4 NaN NaN NaN
7 6 NaN 6.0 NaN
8 8 NaN 4.0 NaN
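Either way the padded cells are real NaN rather than the string 'nan', so counting the valid cells in D gives 3, as requested:
df['D'].count()        # 3 -- count() ignores NaN
df['D'].notna().sum()  # also 3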
I would like to set a value in a pandas dataframe based on the values of another column. In a nutshell, for example, if I wanted to set values of a column my_column of a pandas dataframe pd where another column, my_interesting_column, is between 10 and 30, I would like to do something like:
start_index=pd.find_closest_index_where_pd["my_interesting_column"].is_closest_to(10)
end_index=pd.find_closest_index_where_pd["my_interesting_column"].is_closest_to(30)
pd["my_column"].between(star_index, end_index)= some_value
As a simple illustration, suppose I have the following dataframe
df = pd.DataFrame(np.arange(10, 20), columns=list('A'))
df["B"]=np.nan
>>> df
A B
0 10 NaN
1 11 NaN
2 12 NaN
3 13 NaN
4 14 NaN
5 15 NaN
6 16 NaN
7 17 NaN
8 18 NaN
9 19 NaN
How can I do something like
df.where(df["A"].is_between(13,16))= 5
So that the end result looks like
>>> df
A B
0 10 NaN
1 11 NaN
2 12 NaN
3 13 5
4 14 5
5 15 5
6 16 5
7 17 NaN
8 18 NaN
9 19 NaN
pd.loc[start_idx:end_idx, 'my_column'] = some_value
I think this is what you are looking for:
df.loc[(df['A'] >= 13) & (df['A'] <= 16), 'B'] = 5
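Series.between expresses the same mask a bit more compactly (it is inclusive on both ends by default):
df.loc[df['A'].between(13, 16), 'B'] = 5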