This question already has answers here:
How to pass another entire column as argument to pandas fillna()
(7 answers)
Closed 3 years ago.
I would like my code to look at column one first. If it has a valid number for that row, take that value as the COL 3 value. If not, the second option would be to take the value of COL 2 as the value in COL 3. Is there a Function that can do this?
COL1 COL2 COL3
0 1 2
1 nan 4
2 3 nan
3 4 8
4 nan 10
5 6 nan
COL3
0 1
1 4
2 3
3 4
4 10
5 6
try this
df['col 3'] = np.where(df['col 1'].isnull(),df['col 2'],df['col 1'])
IIUC:
df['COL3'] = df.bfill(axis=1)['COL1']
gives:
COL1 COL2 COL3
0 1.0 2.0 1.0
1 NaN 4.0 4.0
2 3.0 NaN 3.0
3 4.0 8.0 4.0
4 NaN 10.0 10.0
5 6.0 NaN 6.0
Related
I expect to describe well want I need. I have a data frame with the same columns name and another column that works as an index. The data frame looks as follows:
df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],'X':[1,2,3,4,5,2,3,4,1,3,4,5],'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})
df
Out[21]:
ID X Y
0 1 1 1
1 1 2 2
2 1 3 3
3 1 4 4
4 1 5 5
5 2 2 2
6 2 3 3
7 2 4 4
8 3 1 5
9 3 3 4
10 3 4 3
11 3 5 2
My intention is to copy X as an index or one column (it doesn't matter) and append Y columns from each 'ID' in the following way:
You can try
out = pd.concat([group.rename(columns={'Y': f'Y{name}'}) for name, group in df.groupby('ID')])
out.columns = out.columns.str.replace(r'\d+$', '', regex=True)
print(out)
ID X Y Y Y
0 1 1 1.0 NaN NaN
1 1 2 2.0 NaN NaN
2 1 3 3.0 NaN NaN
3 1 4 4.0 NaN NaN
4 1 5 5.0 NaN NaN
5 2 2 NaN 2.0 NaN
6 2 3 NaN 3.0 NaN
7 2 4 NaN 4.0 NaN
8 3 1 NaN NaN 5.0
9 3 3 NaN NaN 4.0
10 3 4 NaN NaN 3.0
11 3 5 NaN NaN 2.0
Here's another way to do it:
df_org = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],'X':[1,2,3,4,5,2,3,4,1,3,4,5]})
df = df_org.copy()
for i in set(df_org['ID']):
df1 = df_org[df_org['ID']==i]
col = 'Y'+str(i)
df1.columns = ['ID', col]
df = pd.concat([ df, df1[[col]] ], axis=1)
df.columns = df.columns.str.replace(r'\d+$', '', regex=True)
print(df)
Output:
ID X Y Y Y
0 1 1 1.0 NaN NaN
1 1 2 2.0 NaN NaN
2 1 3 3.0 NaN NaN
3 1 4 4.0 NaN NaN
4 1 5 5.0 NaN NaN
5 2 2 NaN 2.0 NaN
6 2 3 NaN 3.0 NaN
7 2 4 NaN 4.0 NaN
8 3 1 NaN NaN 1.0
9 3 3 NaN NaN 3.0
10 3 4 NaN NaN 4.0
11 3 5 NaN NaN 5.0
Another solution could be as follow.
Get unique values for column ID (stored in array s).
Use np.transpose to repeat column ID n times (n == len(s)) and evaluate the array's matches with s.
Use np.where to replace True with values from df.Y and False with NaN.
Finally, drop the orignal df.Y and rename the new columns as required.
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],
'X':[1,2,3,4,5,2,3,4,1,3,4,5],
'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})
s = df.ID.unique()
df[s] = np.where((np.transpose([df.ID]*len(s))==s),
np.transpose([df.Y]*len(s)),
np.nan)
df.drop('Y', axis=1, inplace=True)
df.rename(columns={k:'Y' for k in s}, inplace=True)
print(df)
ID X Y Y Y
0 1 1 1.0 NaN NaN
1 1 2 2.0 NaN NaN
2 1 3 3.0 NaN NaN
3 1 4 4.0 NaN NaN
4 1 5 5.0 NaN NaN
5 2 2 NaN 2.0 NaN
6 2 3 NaN 3.0 NaN
7 2 4 NaN 4.0 NaN
8 3 1 NaN NaN 5.0
9 3 3 NaN NaN 4.0
10 3 4 NaN NaN 3.0
11 3 5 NaN NaN 2.0
If performance is an issue, this method should be faster than this answer, especially when the number of unique values for ID increases.
How can I add a field that returns 1/0 if the value in any specified column in not NaN?
Example:
df = pd.DataFrame({'id': [1,2,3,4,5,6,7,8,9,10],
'val1': [2,2,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,2],
'val2': [7,0.2,5,8,np.nan,1,0,np.nan,1,1],
})
display(df)
mycols = ['val1', 'val2']
# if entry in mycols != np.nan, then df[row, 'countif'] =1; else 0
Desired output dataframe:
We do not need countif logic in pandas , try notna + any
df['out'] = df[['val1','val2']].notna().any(1).astype(int)
df
Out[381]:
id val1 val2 out
0 1 2.0 7.0 1
1 2 2.0 0.2 1
2 3 NaN 5.0 1
3 4 NaN 8.0 1
4 5 NaN NaN 0
5 6 1.0 1.0 1
6 7 NaN 0.0 1
7 8 NaN NaN 0
8 9 NaN 1.0 1
9 10 2.0 1.0 1
Using iloc accessor filtre last two columns. Check if the sum of not NaNs in each row is more than zero. Convert resulting Boolean to integers.
df['countif']=df.iloc[:,1:].notna().sum(1).gt(0).astype(int)
id val1 val2 countif
0 1 2.0 7.0 1
1 2 2.0 0.2 1
2 3 NaN 5.0 1
3 4 NaN 8.0 1
4 5 NaN NaN 0
5 6 1.0 1.0 1
6 7 NaN 0.0 1
7 8 NaN NaN 0
8 9 NaN 1.0 1
9 10 2.0 1.0 1
I am a newbie at python and programming in general. I hope the following question is well explained.
I have a big dataset, with 80+ columns and some of these columns have only data on a weekly basis. I would like transform these columns to have values on a daily basis by simply dividing the weekly value by 7 and attributing the result to the value itself and the 6 other days of that week.
This is what my input dataset looks like:
date col1 col2 col3
02-09-2019 14 NaN 1
09-09-2019 NaN NaN 2
16-09-2019 NaN 7 3
23-09-2019 NaN NaN 4
30-09-2019 NaN NaN 5
07-10-2019 NaN NaN 6
14-10-2019 NaN NaN 7
21-10-2019 21 NaN 8
28-10-2019 NaN NaN 9
04-11-2019 NaN 14 10
11-11-2019 NaN NaN 11
..
This is what the output should look like:
date col1 col2 col3
02-09-2019 2 NaN 1
09-09-2019 2 NaN 2
16-09-2019 2 1 3
23-09-2019 2 1 4
30-09-2019 2 1 5
07-10-2019 2 1 6
14-10-2019 2 1 7
21-10-2019 3 1 8
28-10-2019 3 1 9
04-11-2019 3 2 10
11-11-2019 3 2 11
..
I can´t come up with a solution, but here is what I thought might work:
def convert_to_daily(df):
for column in df.columns.tolist():
if column.isna(): # if true
for line in range(len(df[column])):
# check if value is not empty and
succeeded by an 6 empty values or some
better logic
# I don´t know how to do that.
I believe you need select columns contains at least one missing value, forward filling missing values and divide by 7:
m = df.isna().any()
df.loc[:, m] = df.loc[:, m].ffill(limit=7).div(7)
print (df)
date col1 col2 col3
0 02-09-2019 2.0 NaN 1
1 09-09-2019 2.0 NaN 2
2 16-09-2019 2.0 1.0 3
3 23-09-2019 2.0 1.0 4
4 30-09-2019 2.0 1.0 5
5 07-10-2019 2.0 1.0 6
6 14-10-2019 2.0 1.0 7
7 21-10-2019 3.0 1.0 8
8 28-10-2019 3.0 1.0 9
9 04-11-2019 3.0 2.0 10
10 11-11-2019 3.0 2.0 11
This question already has answers here:
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 4 years ago.
I have a list of NaN values in my dataframe and I want to replace NaN values with an empty string.
What I've tried so far, which isn't working:
df_conbid_N_1 = pd.read_csv("test-2019.csv",dtype=str, sep=';', encoding='utf-8')
df_conbid_N_1['Excep_Test'] = df_conbid_N_1['Excep_Test'].replace("NaN","")
Use fillna (docs):
An example -
df = pd.DataFrame({'no': [1, 2, 3],
'Col1':['State','City','Town'],
'Col2':['abc', np.NaN, 'defg'],
'Col3':['Madhya Pradesh', 'VBI', 'KJI']})
df
no Col1 Col2 Col3
0 1 State abc Madhya Pradesh
1 2 City NaN VBI
2 3 Town defg KJI
df.Col2.fillna('', inplace=True)
df
no Col1 Col2 Col3
0 1 State abc Madhya Pradesh
1 2 City VBI
2 3 Town defg KJI
Simple! you can do this way
df_conbid_N_1 = pd.read_csv("test-2019.csv",dtype=str, sep=';',encoding='utf-8').fillna("")
We have pandas' fillna to fill missing values.
Let's go through some uses cases with a sample dataframe:
df = pd.DataFrame({'col1':['John', np.nan, 'Anne'], 'col2':[np.nan, 3, 4]})
col1 col2
0 John NaN
1 NaN 3.0
2 Anne 4.0
As mentioned in the docs, fillna accepts the following as fill values:
values: scalar, dict, Series, or DataFrame
So we can replace with a constant value, such as an empty string with:
df.fillna('')
col1 col2
0 John
1 3
2 Anne 4
1
You can also replace with a dictionary mapping column_name:replace_value:
df.fillna({'col1':'Alex', 'col2':2})
col1 col2
0 John 2.0
1 Alex 3.0
2 Anne 4.0
Or you can also replace with another pd.Series or pd.DataFrame:
df_other = pd.DataFrame({'col1':['John', 'Franc', 'Anne'], 'col2':[5, 3, 4]})
df.fillna(df_other)
col1 col2
0 John 5.0
1 Franc 3.0
2 Anne 4.0
This is very useful since it allows you to fill missing values on the dataframes' columns using some extracted statistic from the columns, such as the mean or mode. Say we have:
df = pd.DataFrame(np.random.choice(np.r_[np.nan, np.arange(3)], (3,5)))
print(df)
0 1 2 3 4
0 NaN NaN 0.0 1.0 2.0
1 NaN 2.0 NaN 2.0 1.0
2 1.0 1.0 2.0 NaN NaN
Then we can easilty do:
df.fillna(df.mean())
0 1 2 3 4
0 1.0 1.5 0.0 1.0 2.0
1 1.0 2.0 1.0 2.0 1.0
2 1.0 1.0 2.0 1.5 1.5
I currently have a dataframe which looks like this:
col1 col2 col3
1 2 3
2 3 NaN
3 4 NaN
2 NaN NaN
0 2 NaN
What I want to do is apply some condition to the column values and return the final result in a new column.
The condition is to assign values based on this order of priority where 2 being the first priority: [2,1,3,0,4]
I tried to define a function to append the final results but wasnt really getting anywhere...any thoughts?
The desired outcome would look something like:
col1 col2 col3 col4
1 2 3 2
2 3 NaN 2
3 4 NaN 3
2 NaN NaN 2
0 2 NaN 2
where col4 is the new column created.
Thanks
first you may want to get ride of the NaNs:
df.fillna(5)
and then apply a function to every row to find your value:
def func(x,l=[2,1,3,0,4,5]):
for j in l:
if(j in x):
return j
df['new'] = df.apply(lambda x: func(list(x)),axis =1)
Output:
col1 col2 col3 new
0 1 2 3 2
1 2 3 5 2
2 3 4 5 3
3 2 5 5 2
4 0 2 5 2
maybe a little later.
import numpy as np
def f(x):
for i in [2,1,3,0,4]:
if i in x.tolist():
return i
return np.nan
df["col4"] = df.apply(f, axis=1)
and the Output:
col1 col2 col3 col4
0 1 2.0 3.0 2
1 2 3.0 NaN 2
2 3 4.0 NaN 3
3 2 NaN NaN 2
4 0 2.0 NaN 2