Concatenate three float64 variables into one variable (Python)

I have a df with many variables, and I need to concatenate only 3 of its float variables:
v1 v2 v3
0 2.0 NaN 1.0
1 1.0 1.0 1.0
2 NaN NaN 2.0
3 NaN NaN NaN
4 NaN NaN 2.0
df.dtypes
v1 float64
v2 float64
v3 float64
dtype: object
I need to concatenate all 3 variables into df['concatenated'] and get this result:
v1 v2 v3 concatenated
0 2.0 NaN 1.0 2.0_NaN_1.0
1 1.0 1.0 1.0 1.0_1.0_1.0
2 NaN NaN 2.0 NaN_NaN_2.0
3 NaN NaN NaN NaN_NaN_NaN
4 NaN NaN 2.0 NaN_NaN_2.0

If the capitalization of your NaNs doesn't matter, this would be sufficient:
df['concatenated'] = df.astype(str).apply('_'.join, axis=1)
>>> df
v1 v2 v3 concatenated
0 2.0 NaN 1.0 2.0_nan_1.0
1 1.0 1.0 1.0 1.0_1.0_1.0
2 NaN NaN 2.0 nan_nan_2.0
3 NaN NaN NaN nan_nan_nan
4 NaN NaN 2.0 nan_nan_2.0
If the capitalization matters, then you have to use replace beforehand:
df['concatenated'] = df.astype(str).replace('nan', 'NaN').apply('_'.join, axis=1)
>>> df
v1 v2 v3 concatenated
0 2.0 NaN 1.0 2.0_NaN_1.0
1 1.0 1.0 1.0 1.0_1.0_1.0
2 NaN NaN 2.0 NaN_NaN_2.0
3 NaN NaN NaN NaN_NaN_NaN
4 NaN NaN 2.0 NaN_NaN_2.0
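If the DataFrame also has columns that should stay out of the result, the same idea works on just the three columns. Below is a minimal, self-contained sketch (the column names v1/v2/v3 come from the question; agg is used instead of apply, which gives the same joined strings):
import pandas as pd
import numpy as np

df = pd.DataFrame({'v1': [2.0, 1.0, np.nan, np.nan, np.nan],
                   'v2': [np.nan, 1.0, np.nan, np.nan, np.nan],
                   'v3': [1.0, 1.0, 2.0, np.nan, 2.0]})

cols = ['v1', 'v2', 'v3']
# Cast the selected columns to strings, restore the 'NaN' capitalization,
# then join the three values of each row with '_'
df['concatenated'] = (df[cols].astype(str)
                              .replace('nan', 'NaN')
                              .agg('_'.join, axis=1))
Restricting to df[cols] keeps any other columns of the DataFrame out of the concatenated string.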

Dataframe compare, combine and merge for rectangular meshgrid

I have two dataframes shown below:
df_1 =
Lon Lat N
0 2 1 1
1 2 2 3
2 2 3 1
3 3 2 2
and
df_2 =
Lon Lat N
0 1.0 1.0 NaN
1 2.0 1.0 NaN
2 3.0 1.0 NaN
3 4.0 1.0 NaN
4 1.0 2.0 NaN
5 2.0 2.0 NaN
6 3.0 2.0 NaN
7 4.0 2.0 NaN
8 1.0 3.0 NaN
9 2.0 3.0 NaN
10 3.0 3.0 NaN
11 4.0 3.0 NaN
What I want to do is compare these two dfs and merge them according to Lon and Lat. That is to say, a NaN in df_2 should be replaced with the value from df_1 where the corresponding Lon and Lat are identical. The ideal output should be:
Lon Lat N
0 1.0 1.0 NaN
1 2.0 1.0 1
2 3.0 1.0 NaN
3 4.0 1.0 NaN
4 1.0 2.0 NaN
5 2.0 2.0 3
6 3.0 2.0 2
7 4.0 2.0 NaN
8 1.0 3.0 NaN
9 2.0 3.0 1
10 3.0 3.0 NaN
11 4.0 3.0 NaN
The reason I want to do this is that df_1's Lat/Lon coordinates form a non-rectangular (unstructured) grid, and I need to fill in some NaN values to get a rectangular meshgrid and make contourf applicable. It would be highly appreciated if you can provide a better way to make the contour plot.
I have tried df_2.combine_first(df_1), but it doesn't work.
Thanks!
df_2.drop(columns='N').merge(df_1, on=['Lon', 'Lat'], how='left')
Lon Lat N
0 1.0 1.0 NaN
1 2.0 1.0 1.0
2 3.0 1.0 NaN
3 4.0 1.0 NaN
4 1.0 2.0 NaN
5 2.0 2.0 3.0
6 3.0 2.0 2.0
7 4.0 2.0 NaN
8 1.0 3.0 NaN
9 2.0 3.0 1.0
10 3.0 3.0 NaN
11 4.0 3.0 NaN
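Note that merge returns a new DataFrame rather than modifying df_2, so assign the result back if you want to keep the filled column, for example:
df_2 = df_2.drop(columns='N').merge(df_1, on=['Lon', 'Lat'], how='left')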
If you first create df_2 with all the needed values, you can update it from the second DataFrame by using pandas.DataFrame.update.
For this you need to first set the correct index by using pandas.DataFrame.set_index.
Have a look at this post for more information.
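A minimal sketch of that update-based approach, assuming the df_1 and df_2 from the question (coordinates written as floats so the two indexes align; update fills df_2 in place from matching rows of df_1):
import pandas as pd
import numpy as np

df_1 = pd.DataFrame({'Lon': [2.0, 2.0, 2.0, 3.0],
                     'Lat': [1.0, 2.0, 3.0, 2.0],
                     'N':   [1, 3, 1, 2]})
df_2 = pd.DataFrame({'Lon': [1.0, 2.0, 3.0, 4.0] * 3,
                     'Lat': [1.0] * 4 + [2.0] * 4 + [3.0] * 4,
                     'N':   np.nan})

# Align both frames on (Lon, Lat) and pull df_1's N values into df_2
df_2 = df_2.set_index(['Lon', 'Lat'])
df_2.update(df_1.set_index(['Lon', 'Lat']))
df_2 = df_2.reset_index()
update only takes non-NaN values from the other frame, so grid points that have no match in df_1 keep their NaN.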

Convert two pandas rows into one

I want to convert the dataframe below,
ID TYPE A B
0 1 MISSING 0.0 0.0
1 2 1T 1.0 2.0
2 2 2T 3.0 4.0
3 3 MISSING 0.0 0.0
4 4 2T 10.0 4.0
5 5 CBN 15.0 20.0
6 5 DSV 25.0 35.0
to:
ID MISSING_A MISSING_B 1T_A 1T_B 2T_A 2T_B CBN_A CBN_B DSV_A DSV_B
0 1 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN
1 2 NaN NaN 1.0 2.0 3.0 4.0 NaN NaN NaN NaN
3 3 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN
4 4 NaN NaN NaN NaN 10.0 4.0 NaN NaN NaN NaN
5 5 NaN NaN NaN NaN NaN NaN 15.0 20.0 25.0 35.0
For IDs with multiple types, the multiple rows for A and B should merge into one row, as shown above.
You are looking for a pivot, which will end up giving you a MultiIndex on the columns. You'll need to join those column levels to get the suffixes you are looking for.
df = df.pivot(index='ID', columns='TYPE', values=['A', 'B'])
df.columns = ['_'.join(reversed(col)).strip() for col in df.columns.values]
df = df.reset_index()
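A self-contained run of those steps with the question's data; a minimal sketch assuming each (ID, TYPE) pair occurs at most once (pivot raises a ValueError on duplicates, in which case pivot_table with an aggregation function would be needed):
import pandas as pd

df = pd.DataFrame({'ID':   [1, 2, 2, 3, 4, 5, 5],
                   'TYPE': ['MISSING', '1T', '2T', 'MISSING', '2T', 'CBN', 'DSV'],
                   'A':    [0.0, 1.0, 3.0, 0.0, 10.0, 15.0, 25.0],
                   'B':    [0.0, 2.0, 4.0, 0.0, 4.0, 20.0, 35.0]})

wide = df.pivot(index='ID', columns='TYPE', values=['A', 'B'])
# Flatten the (value, TYPE) MultiIndex, e.g. ('A', '1T') -> '1T_A'
wide.columns = ['_'.join(reversed(col)) for col in wide.columns]
wide = wide.reset_index()
The result has one row per ID and one <TYPE>_A / <TYPE>_B column per type, with NaN wherever an ID has no row of that type.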

Forward Fill Pandas Dataframe Horizontally (along rows) without forward filling last value in each row

I have a Pandas dataframe that I want to forward fill HORIZONTALLY, but I don't want to forward fill past the last entry in each row. This is time-series pricing data on products, some of which have been discontinued, so I don't want the last recorded value to be forward filled to the present.
FWDFILL.apply(lambda series: series[:series.last_valid_index()].ffill())
^The code I have included does what I want but it does it VERTICALLY. This could maybe help people as a starting point.
>>> print(FWDFILL)
1 1 NaN NaN 2 NaN
2 NaN 1 NaN 5 NaN
3 NaN 3 1 NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN 5 NaN NaN 1
Desired Output:
1 1 1 1 2 NaN
2 NaN 1 1 5 NaN
3 NaN 3 1 NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN 5 5 5 1
IIUC, you need to apply with axis=1, so you are applying to dataframe rows instead of dataframe columns.
df.apply(lambda x: x[:x.last_valid_index()].ffill(), axis=1)
Output:
1 2 3 4 5
0
1 1.0 1.0 1.0 2.0 NaN
2 NaN 1.0 1.0 5.0 NaN
3 NaN 3.0 1.0 NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN 5.0 5.0 5.0 1.0
Usage of bfill and ffill: a position should get the forward-filled value only if it has a valid value somewhere both before and after it in the row, i.e. where the row-wise ffill and bfill are both non-null:
s1 = df.ffill(axis=1)
s2 = df.bfill(axis=1)
df = df.mask(s1.notnull() & s2.notnull(), s1)
df
Out[222]:
1 2 3 4 5
1 1.0 1.0 1.0 2.0 NaN
2 NaN 1.0 1.0 5.0 NaN
3 NaN 3.0 1.0 NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN 5.0 5.0 5.0 1.0
Or just using interpolate with limit_area='inside', which only fills NaNs that have valid values on both sides, so the trailing NaNs stay out of the mask:
df.mask(df.interpolate(axis=1, limit_area='inside').notnull(), df.ffill(axis=1))
Out[226]:
1 2 3 4 5
1 1.0 1.0 1.0 2.0 NaN
2 NaN 1.0 1.0 5.0 NaN
3 NaN 3.0 1.0 NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN 5.0 5.0 5.0 1.0
You can use numpy to find the last valid indices and mask your ffill. This allows you to use the vectorized ffill and then a vectorized mask.
u = df.values
# Column index of each row's last valid (non-NaN) entry
m = (~np.isnan(u)).cumsum(1).argmax(1)
# Forward fill along the rows, then blank out everything past the last valid entry
df.ffill(axis=1).mask(np.arange(df.shape[1]) > m[:, None])
0 1 2 3 4
0 1.0 1.0 1.0 2.0 NaN
1 NaN 1.0 1.0 5.0 NaN
2 NaN 3.0 1.0 NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN 5.0 5.0 5.0 1.0
Info: each row of the mask below is True for every column past that row's last valid value, so those positions are set back to NaN after the forward fill.
>>> np.arange(df.shape[1]) > m[:, None]
array([[False, False, False, False, True],
[False, False, False, False, True],
[False, False, False, True, True],
[False, True, True, True, True],
[False, False, False, False, False]])
A small modification to the solution from "Most efficient way to forward-fill NaN values in numpy array" solves it here -
def ffillrows_stoplast(arr):
# Identical to earlier solution of forward-filling
mask = np.isnan(arr)
idx = np.where(~mask,np.arange(mask.shape[1]),0)
idx_acc = np.maximum.accumulate(idx,axis=1)
out = arr[np.arange(idx.shape[0])[:,None], idx_acc]
# Perform flipped index accumulation to get trailing NaNs mask and
# accordingly assign NaNs there
out[np.maximum.accumulate(idx[:,::-1],axis=1)[:,::-1]==0] = np.nan
return out
Sample run -
In [121]: df
Out[121]:
A B C D E
1 1.0 NaN NaN 2.0 NaN
2 NaN 1.0 NaN 5.0 NaN
3 NaN 3.0 1.0 NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN 5.0 NaN NaN 1.0
In [122]: out = ffillrows_stoplast(df.to_numpy())
In [123]: pd.DataFrame(out,columns=df.columns,index=df.index)
Out[123]:
A B C D E
1 1.0 1.0 1.0 2.0 NaN
2 NaN 1.0 1.0 5.0 NaN
3 NaN 3.0 1.0 NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN 5.0 5.0 5.0 1.0
My idea is to use where on the ffill result to flip back to NaN the positions that bfill leaves as NaN (the trailing ones):
df.ffill(axis=1).where(df.bfill(axis=1).notna())
Out[1623]:
a b c d e
1 1.0 1.0 1.0 2.0 NaN
2 NaN 1.0 1.0 5.0 NaN
3 NaN 3.0 1.0 NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN 5.0 5.0 5.0 1.0

Manipulating value in a column based on a rule

I have 3 columns, A, B and C, in a pandas dataframe. What I want to do is: wherever A is not null AND either B or C is not null, that row's A should be set to null.
if(dffinal['A'].loc[dffinal['A'].notnull()] &
(dffinal['B'].loc[dffinal['B'].notnull()] |
dffinal['C'].loc[dffinal['C'].notnull()])):
dffinal['A'] = np.nan
This is the error I'm getting: cannot do a non-empty take from an empty axes.
Use df.loc[]:
df.loc[df.A.notna() & (df.B.notna()|df.C.notna()),'A']=np.nan
Here the first condition is not necessary, so the solution can be simplified:
dffinal = pd.DataFrame({
'A':[np.nan,np.nan,4,5,5,np.nan],
'B':[7,np.nan,np.nan,4,np.nan,np.nan],
'C':[1,3,5,7,np.nan,np.nan],
})
print (dffinal)
A B C
0 NaN 7.0 1.0
1 NaN NaN 3.0
2 4.0 NaN 5.0
3 5.0 4.0 7.0
4 5.0 NaN NaN
5 NaN NaN NaN
mask = (dffinal['B'].notnull() | dffinal['C'].notnull())
dffinal.loc[mask, 'A'] = np.nan
print (dffinal)
A B C
0 NaN 7.0 1.0
1 NaN NaN 3.0
2 NaN NaN 5.0
3 NaN 4.0 7.0
4 5.0 NaN NaN
5 NaN NaN NaN
Same output as with the first condition:
mask = dffinal['A'].notnull() & (dffinal['B'].notnull() | dffinal['C'].notnull())
dffinal.loc[mask, 'A'] = np.nan
print (dffinal)
A B C
0 NaN 7.0 1.0
1 NaN NaN 3.0
2 NaN NaN 5.0
3 NaN 4.0 7.0
4 5.0 NaN NaN
5 NaN NaN NaN

replace nan in pandas dataframe

given the dataframe df
df = pd.DataFrame(data=[[np.nan,1],
[np.nan,np.nan],
[1,2],
[2,3],
[np.nan,np.nan],
[np.nan,np.nan],
[3,4],
[4,5],
[np.nan,np.nan],
[np.nan,np.nan]],columns=['A','B'])
df
Out[16]:
A B
0 NaN 1.0
1 NaN NaN
2 1.0 2.0
3 2.0 3.0
4 NaN NaN
5 NaN NaN
6 3.0 4.0
7 4.0 5.0
8 NaN NaN
9 NaN NaN
I would need to replace the NaN using the following rules:
1) if the NaN is at the beginning, replace it with the first value after the NaN
2) if the NaN is in the middle, between two values, replace it with the average of those values
3) if the NaN is at the end, replace it with the last value
df
Out[16]:
A B
0 1.0 1.0
1 1.0 1.5
2 1.0 2.0
3 2.0 3.0
4 2.5 3.5
5 2.5 3.5
6 3.0 4.0
7 4.0 5.0
8 4.0 5.0
9 4.0 5.0
Add the forward-filled and backfilled values, divide by 2, and finally fill the remaining leading and trailing NaNs:
df = df.bfill().add(df.ffill()).div(2).ffill().bfill()
print (df)
A B
0 1.0 1.0
1 1.0 1.5
2 1.0 2.0
3 2.0 3.0
4 2.5 3.5
5 2.5 3.5
6 3.0 4.0
7 4.0 5.0
8 4.0 5.0
9 4.0 5.0
Detail:
print (df.bfill().add(df.ffill()))
A B
0 NaN 2.0
1 NaN 3.0
2 2.0 4.0
3 4.0 6.0
4 5.0 7.0
5 5.0 7.0
6 6.0 8.0
7 8.0 10.0
8 NaN NaN
9 NaN NaN
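Continuing the detail, a short sketch of the remaining steps (same df as above; halfway and result are just illustrative names): dividing by 2 turns the interior sums into averages, and the final ffill()/bfill() then handles the NaNs left at the ends:
halfway = df.bfill().add(df.ffill()).div(2)
# Interior NaNs are now the average of their neighbours; only the leading
# NaNs in column A and the trailing NaNs in both columns are still missing
result = halfway.ffill().bfill()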
