Imagine you have the following two dfs:
lines
line amount#1 line amount#2
0 18.20 0.82
1 NaN NaN
2 40.00 259.00
3 388.00 NaN
4 17.41 NaN
btws
btw-amount#1 btw-amount#2
0 0.0 0.14
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
I want to subtract these two dfs such that there is a new df that is like the following:
line amount#1 line amount#2
0 18.20 0.68
1 NaN NaN
2 40.00 259.00
3 388.00 NaN
4 17.41 NaN
I've tried:
lines.subtract(btws, axis =0)
However, everything turns into NaN.
Please help!
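# The column labels differ, so subtract by position; treat missing btws values as 0.
result = lines.to_numpy() - btws.fillna(0).to_numpy()
result = pd.DataFrame(result, columns=lines.columns, index=lines.index)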
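A likely reason for the all-NaN result is label alignment: subtract matches columns by name, and 'line amount#1' never matches 'btw-amount#1'. Here is a minimal sketch of that, with the two frames rebuilt by hand from the question (the float dtypes are an assumption). It also shows one way to subtract by position while treating missing btws values as 0.

```python
import numpy as np
import pandas as pd

lines = pd.DataFrame({
    "line amount#1": [18.20, np.nan, 40.00, 388.00, 17.41],
    "line amount#2": [0.82, np.nan, 259.00, np.nan, np.nan],
})
btws = pd.DataFrame({
    "btw-amount#1": [0.0, np.nan, np.nan, np.nan, np.nan],
    "btw-amount#2": [0.14, np.nan, np.nan, np.nan, np.nan],
})

# Label-aligned subtraction: the frames share no column names,
# so the result has four columns and every cell is NaN.
aligned = lines.subtract(btws, axis=0)

# Positional subtraction: drop the labels on the right-hand side and
# treat a missing btw amount as 0. NaN in `lines` stays NaN.
result = lines - btws.fillna(0).to_numpy()
print(result)
```

Subtracting a plain NumPy array keeps the index and column labels of lines, so no separate DataFrame constructor call is needed.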
Related
I'm trying to replace an entire column with a single value, but I want to leave the NaNs in place. How do I go about doing that? Let's say for column 'Q1' I would like to replace every value with '1' but leave every row that has NaN untouched. In the end, every row of 'Q1' that has a numeric value would have the value '1', and every row that has NaN would remain NaN.
Q1 Q2 Q3 Q4
0 NaN NaN 1.33 NaN
1 NaN NaN NaN 1.35
2 0.93 NaN NaN NaN
3 NaN 1.08 NaN NaN
4 NaN NaN 1.28 NaN
...
In [13]: df
Out[13]:
Q1 Q2
0 NaN 1.0
1 NaN 2.0
2 0.93 NaN
3 NaN 3.0
4 NaN 4.0
In [14]: df.loc[~df.Q1.isna(), 'Q1'] = 1
In [15]: df
Out[15]:
Q1 Q2
0 NaN 1.0
1 NaN 2.0
2 1.0 NaN
3 NaN 3.0
4 NaN 4.0
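An alternative that avoids the boolean .loc assignment is Series.where, which keeps a value where the condition is True and substitutes the other value elsewhere. This is only a sketch using the two-column frame from the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Q1": [np.nan, np.nan, 0.93, np.nan, np.nan],
    "Q2": [1.0, 2.0, np.nan, 3.0, 4.0],
})

# Keep the cell where Q1 is NaN; everywhere else substitute 1.
df["Q1"] = df["Q1"].where(df["Q1"].isna(), 1)
print(df)
```

Both approaches should give the same frame; where returns a new Series, which can be handier inside a method chain.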
For each row I would like to set all values to NaN after the appearance of the first NaN. E.g.:
a b c
1 2 3 4
2 nan 2 nan
3 3 nan 23
Should become this:
a b c
1 2 3 4
2 nan nan nan
3 3 nan nan
So far I only know how to do this with an apply with a for loop over each column per row - it's very slow!
Check with cumprod
df=df.where(df.notna().cumprod(axis=1).eq(1))
a b c
1 2.0 3.0 4.0
2 NaN NaN NaN
3 3.0 NaN NaN
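To see why the cumprod trick works, it can help to look at the intermediate steps. The sketch below rebuilds the example frame by hand and casts the mask to int before taking the running product. The cast is my own addition to keep the dtype explicit and is not part of the one-liner above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [2, np.nan, 3], "b": [3, 2, np.nan], "c": [4, np.nan, 23]},
    index=[1, 2, 3],
)

# 1 where a value is present, 0 where it is NaN.
present = df.notna().astype(int)

# Running product along each row: it stays 1 until the first 0,
# and from then on it is 0 for the rest of the row.
keep = present.cumprod(axis=1).eq(1)

# Keep only the cells before the first NaN in each row.
out = df.where(keep)
print(out)
```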
If I have a pandas dataframe like this:
2 3 4 NaN NaN NaN
1 NaN NaN NaN NaN NaN
5 6 7 2 3 NaN
4 3 NaN NaN NaN NaN
and an array for the number I would like to shift:
array = [2, 4, 0, 3]
How do I iterate through each row to shift the columns by the number in my array to get something like this:
NaN NaN 2 3 4 NaN
NaN NaN NaN NaN 1 NaN
5 6 7 2 3 NaN
NaN NaN NaN 4 3 NaN
I was trying to do something like this but had no luck.
df = pd.DataFrame(values)
for rows in df.iterrows():
    df[rows] = df.shift[change_in_bins[rows]]
Use for loop with loc and shift:
for index, value in enumerate([2, 4, 0, 3]):
    df.loc[index, :] = df.loc[index, :].shift(value)
print(df)
0 1 2 3 4 5
0 NaN NaN 2.0 3.0 4.0 NaN
1 NaN NaN NaN NaN 1.0 NaN
2 5.0 6.0 7.0 2.0 3.0 NaN
3 NaN NaN NaN 4.0 3.0 NaN
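If the frame is large, the Python loop can get slow. Here is a rough NumPy sketch of the same per-row shift without a loop. It assumes every shift is non-negative and that values pushed past the right edge are dropped, as Series.shift does. The name shifts is mine, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [2, 3, 4, np.nan, np.nan, np.nan],
    [1, np.nan, np.nan, np.nan, np.nan, np.nan],
    [5, 6, 7, 2, 3, np.nan],
    [4, 3, np.nan, np.nan, np.nan, np.nan],
])
shifts = np.array([2, 4, 0, 3])  # one non-negative shift per row

values = df.to_numpy(dtype=float)
n_cols = values.shape[1]

# For every target cell, the source column is (target column - row shift).
src = np.arange(n_cols) - shifts[:, None]

# Gather from the source columns; negative sources are clipped to 0 so the
# gather is valid, then masked back to NaN.
gathered = np.take_along_axis(values, src.clip(min=0), axis=1)
shifted = np.where(src >= 0, gathered, np.nan)

result = pd.DataFrame(shifted, index=df.index, columns=df.columns)
print(result)
```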
So I have two dataframes
eqdf
symbol qty
0 DABIND 1
1 INFTEC 6
2 DISHTV 8
3 HINDAL 40
4 NATMIN 5
5 POWGRI 40
6 CHEPET 6
premdf
share strike lprice premperc d_strike
0 HINDAL 250.0 237.90 1.975620 5.086171
1 RELIND 1280.0 1254.30 1.642350 2.048952
2 POWGRI 205.0 201.15 1.118568 1.913995
I want to compare premdf['share'] with eqdf['symbol'] and, wherever there is a match, append the premperc, d_strike and strike values to the end of the matching eqdf row.
I have tried
eqdf.loc[eqdf['symbol']==premdf['share'],eqdf['premperc'] == premdf['premperc']]
I keep getting errors
ValueError: Can only compare identically-labeled Series objects
Expected Output:
eqdf
symbol qty premperc d_strike strike
0 DABIND 1 NaN NaN NaN
1 INFTEC 6 NaN NaN NaN
2 DISHTV 8 NaN NaN NaN
3 HINDAL 40 1.975620 5.086171 250.0
4 NATMIN 5 NaN NaN NaN
5 POWGRI 40 1.118568 1.913995 205.0
6 CHEPET 6 NaN NaN NaN
What is the correct way to do this?
Thanks
rename and merge
eqdf.merge(premdf.rename(columns={'share': 'symbol'}), how='left')
symbol qty strike lprice premperc d_strike
0 DABIND 1 NaN NaN NaN NaN
1 INFTEC 6 NaN NaN NaN NaN
2 DISHTV 8 NaN NaN NaN NaN
3 HINDAL 40 250.0 237.90 1.975620 5.086171
4 NATMIN 5 NaN NaN NaN NaN
5 POWGRI 40 205.0 201.15 1.118568 1.913995
6 CHEPET 6 NaN NaN NaN NaN
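The merge above also brings along lprice, which is not in the expected output. Here is a sketch of one way to get exactly the requested columns, using left_on/right_on instead of renaming. The frames are rebuilt by hand from the question, and the wanted list is just an illustrative name.

```python
import pandas as pd

eqdf = pd.DataFrame({
    "symbol": ["DABIND", "INFTEC", "DISHTV", "HINDAL", "NATMIN", "POWGRI", "CHEPET"],
    "qty": [1, 6, 8, 40, 5, 40, 6],
})
premdf = pd.DataFrame({
    "share": ["HINDAL", "RELIND", "POWGRI"],
    "strike": [250.0, 1280.0, 205.0],
    "lprice": [237.90, 1254.30, 201.15],
    "premperc": [1.975620, 1.642350, 1.118568],
    "d_strike": [5.086171, 2.048952, 1.913995],
})

# Only carry over the key plus the columns asked for, in the requested order.
wanted = ["share", "premperc", "d_strike", "strike"]

out = (
    eqdf.merge(premdf[wanted], how="left", left_on="symbol", right_on="share")
        .drop(columns="share")  # the key is already present as 'symbol'
)
print(out)
```

A left merge keeps every row of eqdf in its original order, so symbols without a match (and RELIND, which is only in premdf) simply end up as NaN or are left out.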
I have a data frame in pandas:
d1_a d2_a d3_a group
BI59 NaN 0.023333 NaN 2
BI71 NaN 0.173333 NaN 2
BI52 NaN NaN NaN 1
BI44 0.450000 NaN NaN 1
BI36 NaN 0.286667 NaN 2
BI29 NaN 0.030000 NaN 2
BI50 NaN 0.633333 NaN 2
BI63 NaN 0.110000 NaN 2
BI64 NaN 0.320000 NaN 2
BI65 0.206667 NaN NaN 1
BI67 NaN 0.216667 NaN 2
BI68 NaN 0.473333 NaN 2
BI71 NaN 0.053333 NaN 2
BI72 NaN 0.006667 NaN 2
BI75 NaN 0.430000 NaN 2
BI76 NaN 0.260000 NaN 2
BI78 NaN 0.250000 NaN 2
BI81 NaN 0.006667 NaN 2
BI83 NaN 0.603333 NaN 2
BI84 NaN NaN 0.196667 3
BI86 NaN NaN 0.046667 3
BI89 NaN 0.110000 NaN 2
BI91 NaN NaN 0.213333 3
BI93 NaN 0.443333 NaN 2
BI97 0.586667 NaN NaN 1
BI98 0.380000 NaN NaN 1
BI99 0.016667 NaN NaN 1
BI11 NaN 0.206667 NaN 2
BI12 NaN 0.500000 NaN 2
BI17 0.626667 NaN NaN 1
The BI## values are the index, and the group column denotes which group each row belongs to, so d1_a is group 1, d2_a is group 2 and d3_a is group 3. The numbers in the index should be the x axis. How do I create a scatter plot with each group represented by a different color? When I try plotting, I get empty plots.
If I try something like subset_d1_a = df['d1_a'].dropna() and do something similar for each group then I can remove the NaNs but now the arrays are of different lengths and I cannot plot them all on the same graph.
Preferably I'd like to do this in seaborn but any method in python will do.
So far, this is what I'm doing; I'm not sure if I'm going down the right path:
subset = pd.concat([df.d1_a, df.d2_a, df.d3_a], axis=1)
subset = subset.sum(axis=1)
subset = pd.concat([subset,df.group], axis=1)
subset = subset.dropna()
g = subset.groupby('group')
It is not clear what a scatter chart would look like given your data, but you could do something like this:
colors = {1: 'red', 2: 'green', 3: 'blue'}
df.iloc[:, :3].sum(axis=1).plot(kind='bar', color=df.group.map(colors).tolist())
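For an actual scatter plot, one option is to first collapse the three value columns into a single long-form table, which is the shape seaborn expects. The sketch below only does the reshaping, on a handful of rows copied from the question. It assumes each row holds at most one non-NaN measurement, and the names sample, value and long are mine.

```python
import numpy as np
import pandas as pd

# A few rows copied from the question; the full frame works the same way.
df = pd.DataFrame(
    {
        "d1_a": [np.nan, np.nan, 0.45, np.nan, 0.206667],
        "d2_a": [0.023333, np.nan, np.nan, np.nan, np.nan],
        "d3_a": [np.nan, np.nan, np.nan, 0.196667, np.nan],
        "group": [2, 1, 1, 3, 1],
    },
    index=["BI59", "BI52", "BI44", "BI84", "BI65"],
)

# Back-fill across columns so the first column holds the row's only value
# (or NaN when the row has no measurement at all).
value = df[["d1_a", "d2_a", "d3_a"]].bfill(axis=1).iloc[:, 0]

long = pd.DataFrame({
    "sample": df.index,
    "value": value.to_numpy(),
    "group": df["group"].to_numpy(),
}).dropna(subset=["value"])  # rows such as BI52 have nothing to plot
print(long)

# With seaborn installed, this table could then be plotted with something like:
# seaborn.scatterplot(data=long, x="sample", y="value", hue="group")
```

Because every remaining row has exactly one value and one group label, all groups have matching lengths and can share the same axes.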