Let's say we have
In [0]: df = pd.DataFrame(data={'col1': [1, 2, 3], 'col2': [3, 4, 5]})
In [1]: df
Out[2]:
col1 col2
0 1 3
1 2 4
2 3 5
What I need is to divide df[1:] by df[:-1] element-wise and get a dataframe as a result, like this:
Out[3]:
col1 col2
0 2.0 1.3333333333333333
1 1.5 1.25
But of course I'm getting
Out[3]:
col1 col2
0 NaN NaN
1 1.0 1.0
2 NaN NaN
I've tried using iloc for slicing, but got the same result. I'm aware of df.values, but I need a dataframe as a result. Thank you so much.
You can divide the NumPy arrays returned by values and pass the result to the DataFrame constructor:
df1 = pd.DataFrame(df[1:].values / df[:-1].values, columns=df.columns)
print (df1)
col1 col2
0 2.0 1.333333
1 1.5 1.250000
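Or give both DataFrames the same index so that the division aligns row by row. Any one of these three variants works:
# Variant 1: reset both indices to 0..n-1 (the result is indexed 0, 1)
df1 = df[1:].reset_index(drop=True).div(df[:-1].reset_index(drop=True))
# Variant 2: copy the index of the first slice onto the second (the result is indexed 1, 2)
a = df[1:]
b = df[:-1]
b.index = a.index
df1 = a / b
# Variant 3: same idea with set_index (the result is indexed 1, 2)
df2 = df[1:]
df1 = df2.div(df[:-1].set_index(df2.index))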
print (df1)
col1 col2
1 2.0 1.333333
2 1.5 1.250000
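The NaN rows in the question come from index alignment: pandas matches rows by label, and label 1 is the only one present in both slices. As a further option that is not taken from the answers above, the same ratios can be sketched with DataFrame.shift, which keeps the labels aligned. The names previous and ratios are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5]})

# shift(1) moves every row down by one label, so label i now holds row i-1.
previous = df.shift(1)

# Division aligns on labels: row i is divided by the original row i-1.
# The first row has no predecessor and becomes NaN, so it is dropped.
ratios = df.div(previous).dropna().reset_index(drop=True)
print(ratios)
```

Leaving out reset_index(drop=True) keeps the labels 1 and 2, which matches the output of the set_index variant.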
There are two dataframes, one dataframe might have less columns than another one. For instance,
import pandas as pd
import numpy as np
df = pd.DataFrame({
'col1': ['A', 'B'],
'col2': [2, 9],
'col3': [0, 1]
})
df1 = pd.DataFrame({
'col1': ['G'],
'col2': [3]
})
The df and df1 are shown as follows.
df:
col1 col2 col3
0 A 2 0
1 B 9 1
df1:
col1 col2
0 G 3
I would like to combine these two dataframes, with the missing values set to some given value, like -100. How can I perform this kind of combination?
You could reindex the DataFrames first to "preserve" the dtypes; then concatenate:
cols = df.columns.union(df1.columns)
out = pd.concat([d.reindex(columns=cols, fill_value=-100) for d in [df, df1]],
ignore_index=True)
Output:
col1 col2 col3
0 A 2 0
1 B 9 1
2 G 3 -100
Use concat with DataFrame.fillna:
df = pd.concat([df, df1], ignore_index=True).fillna(-100)
print (df)
col1 col2 col3
0 A 2 0.0
1 B 9 1.0
2 G 3 -100.0
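If you need the original dtypes, add DataFrame.astype. Series.append was removed in pandas 2.0, so the dtypes are combined with pd.concat:
# df and df1 are the original input frames
d = pd.concat([df.dtypes, df1.dtypes]).to_dict()
df = pd.concat([df, df1], ignore_index=True).fillna(-100).astype(d)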
print (df)
col1 col2 col3
0 A 2 0
1 B 9 1
2 G 3 -100
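To make the dtype difference between the two approaches visible, here is a small side-by-side sketch. It reuses the frames from the question; the names via_fillna and via_reindex are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B'], 'col2': [2, 9], 'col3': [0, 1]})
df1 = pd.DataFrame({'col1': ['G'], 'col2': [3]})

# concat first: col3 gets NaN for the new row, which forces a float column.
via_fillna = pd.concat([df, df1], ignore_index=True).fillna(-100)

# reindex first: col3 is created with the integer fill value, so it stays integer.
cols = df.columns.union(df1.columns)
via_reindex = pd.concat(
    [d.reindex(columns=cols, fill_value=-100) for d in (df, df1)],
    ignore_index=True,
)

print(via_fillna.dtypes)
print(via_reindex.dtypes)
```

Both results hold the same values; only the dtype of col3 differs, which is why the fillna route needs the extra astype step.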
I have a dataframe with many rows, and some values are NaN.
For example -
index col1 col2 col3
0 1.0 NaN 3.0
1 NaN 4.0 NaN
3 1.0 5.0 NaN
I would like to filter the DF and return only the rows with at least 2 non-NaN values.
The number should be configurable.
The resulting DF would be -
index col1 col2 col3
0 1.0 NaN 3.0
3 1.0 5.0 NaN
Any idea how I can achieve this result? I've tried creating a new column, but it doesn't seem like the right way.
Thanks!
Code to create the DF:
d = {'col1': [1, None, 1], 'col2': [None, 4, 5], 'col3': [3, None, None]}
df = pd.DataFrame(data=d)
df
You can use dropna(): set thresh=2 to keep only the rows with at least 2 non-NaN values, and axis=0 to apply the check to rows:
res = df.dropna(thresh=2,axis=0)
res
col1 col2 col3
0 1.00 NaN 3.00
2 1.00 5.00 NaN
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
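Since the question asks for a configurable number, the threshold can be wrapped in a small function. The sketch below uses an explicit non-NaN count instead of dropna, which should give the same rows; keep_rows_with_min_values is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def keep_rows_with_min_values(frame, min_values):
    """Keep the rows that hold at least `min_values` non-NaN entries."""
    non_null_per_row = frame.notna().sum(axis=1)
    return frame[non_null_per_row >= min_values]

df = pd.DataFrame({'col1': [1, None, 1], 'col2': [None, 4, 5], 'col3': [3, None, None]})
filtered = keep_rows_with_min_values(df, 2)
print(filtered)
```

The boolean mask form is handy when the condition later needs to change, for example to "exactly 2 values" with == instead of >=.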
You can delete the 2nd row by using the drop() method.
ax = df.drop([1])
print(ax)
I have a pandas dataframe:
Col1 Col2 Col3
0 1 2 3
1 2 3 4
And I want to add a new row summing over two columns [Col1,Col2] like:
Col1 Col2 Col3
0 1 2 3
1 2 3 4
Total 3 5 NaN
Ignoring Col3. What should I do? Thanks in advance.
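You can use pandas.concat and pandas.DataFrame.sum (DataFrame.append was removed in pandas 2.0):
# Col3 is cast to float first so that it can hold NaN
df2 = pd.concat([df, df.sum().to_frame().T], ignore_index=True).astype({'Col3': float})
df2.iloc[-1, df2.columns.get_loc('Col3')] = np.nan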
You can use pd.DataFrame.loc. Note the final column will be converted to float since NaN is considered float:
import numpy as np
df.loc['Total'] = [df['Col1'].sum(), df['Col2'].sum(), np.nan]
df[['Col1', 'Col2']] = df[['Col1', 'Col2']].astype(int)
print(df)
Col1 Col2 Col3
0 1 2 3.0
1 2 3 4.0
Total 3 5 NaN
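If the 'Total' label should be kept and only some columns summed, one possible sketch is to build the total as a one-row frame and concatenate it. add_total_row and its parameters are hypothetical names chosen for this example.

```python
import pandas as pd

def add_total_row(frame, columns, label='Total'):
    """Append one row labelled `label` holding the sums of `columns`."""
    # Sum only the requested columns and turn the result into a one-row frame.
    total = frame[columns].sum().to_frame().T
    total.index = [label]
    # Columns missing from `total` (Col3 here) are filled with NaN by concat.
    return pd.concat([frame, total])

df = pd.DataFrame({'Col1': [1, 2], 'Col2': [2, 3], 'Col3': [3, 4]})
with_total = add_total_row(df, ['Col1', 'Col2'])
print(with_total)
```

Because the summed columns never receive a NaN, they should stay integer without a separate astype call; Col3 still becomes float.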
I want python to perform updating of values next to a value found in both dataframes (somewhat similar to VLOOKUP in MS Excel). So, for
import pandas as pd
df1 = pd.DataFrame(data = {'col1':['a', 'b', 'd'], 'col2': [1, 2, 4], 'col3': [2, 3, 4]})
df2 = pd.DataFrame(data = {'col1':['a', 'f', 'c', 'd']})
In [3]: df1
Out[3]:
col1 col2 col3
0 a 1 2
1 b 2 3
2 d 4 4
In [4]: df2
Out[4]:
col1
0 a
1 f
2 c
3 d
Outcome must be the following:
In [6]: df3 = *somecode*
df3
Out[6]:
col1 col2 col3
0 a 1 2
1 f
2 c
3 d 4 4
The main part is that I want some sort of "for loop" to do this.
So, for instance, Python searches for the first value of col1 in df2, finds it in df1, updates col2 and col3 respectively, and then moves on.
First, a for loop in pandas is best avoided if a vectorized solution exists.
I think a merge with a left join is what you need; the on parameter can be omitted if col1 is the only column shared by both DataFrames:
df3 = df2.merge(df1, how='left')
print (df3)
col1 col2 col3
0 a 1.0 2.0
1 f NaN NaN
2 c NaN NaN
3 d 4.0 4.0
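To see which keys of df2 were actually found in df1, the merge can be given indicator=True. This is a small sketch on the frames from the question; unmatched is just an illustrative variable name.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'd'], 'col2': [1, 2, 4], 'col3': [2, 3, 4]})
df2 = pd.DataFrame({'col1': ['a', 'f', 'c', 'd']})

# indicator=True adds a `_merge` column: 'both' for matched keys,
# 'left_only' for keys that do not occur in df1.
df3 = df2.merge(df1, how='left', on='col1', indicator=True)
print(df3)

# Keys from df2 that had no counterpart in df1.
unmatched = df3.loc[df3['_merge'] == 'left_only', 'col1'].tolist()
print(unmatched)
```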
Try this: a simple left join will solve your problem.
pd.merge(df2,df1,how='left',on=['col1'])
col1 col2 col3
0 a 1.0 2.0
1 f NaN NaN
2 c NaN NaN
3 d 4.0 4.0
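When only a single column has to be looked up, which is the closest analogue to VLOOKUP, Series.map with a lookup Series is another vectorized option. The sketch assumes the keys in df1['col1'] are unique; lookup is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'd'], 'col2': [1, 2, 4], 'col3': [2, 3, 4]})
df2 = pd.DataFrame({'col1': ['a', 'f', 'c', 'd']})

# Build a lookup Series: the index is the key column, the values are the column to fetch.
lookup = df1.set_index('col1')['col2']

# map() looks each key up in `lookup`; keys that are absent give NaN.
df3 = df2.copy()
df3['col2'] = df3['col1'].map(lookup)
print(df3)
```

For several columns at once, the left merge shown in the answers is usually the simpler choice.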
The pivot code:
result = pandas.pivot_table(result, values=['value'], index=['index'], columns=['columns'], fill_value=0)
The result:
value value value
columns col1 col2 col3
index
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
I tried:
result.columns = result.columns.get_level_values(1)
Then I got this:
columns col1 col2 col3
index
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
Actually what I would like is this one:
index col1 col2 col3
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
Is there any way to achieve this? Help is really appreciated. Thank you in advance.
You need to remove the index name with rename_axis (new in pandas 0.18.0):
df = df.rename_axis(None)
If you also need to remove the columns name, use:
df = df.rename_axis(None, axis=1)
If you use an older version of pandas, use:
df.columns.name = None
df.index.name = None
Sample (if you remove the [] from the pivot_table arguments, you also remove the MultiIndex from the columns):
print (result)
index columns value
0 1 Toys 5
1 2 Toys 6
2 2 Cars 7
3 1 Toys 2
4 1 Cars 9
print (pd.pivot_table(result, index='index',columns='columns',values='value', fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
Cars Toys
1 9 3.5
2 7 6.0
If you use [], you get:
result = (pd.pivot_table(result, values=['value'], index=['index'], columns=['columns'], fill_value=0)
.rename_axis(None)
.rename_axis((None, None), axis=1))
print (result)
value
Cars Toys
1 9 3.5
2 7 6.0
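For reference, the first sample can be reproduced end to end as follows. This is a sketch that rebuilds the long-format data shown above; long_df and wide are illustrative names.

```python
import pandas as pd

long_df = pd.DataFrame({
    'index': [1, 2, 2, 1, 1],
    'columns': ['Toys', 'Toys', 'Cars', 'Toys', 'Cars'],
    'value': [5, 6, 7, 2, 9],
})

# Scalar arguments (no []) give flat columns; the default aggregation is the mean.
wide = (
    pd.pivot_table(long_df, index='index', columns='columns', values='value', fill_value=0)
    .rename_axis(None)           # drop the index name
    .rename_axis(None, axis=1)   # drop the columns name
)
print(wide)
```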
Consider this dataframe:
results = pd.DataFrame(
[
[14, 1, 1],
[2, 0, 1],
[6, 0, 0]
],
pd.Index(['idx1', 'idx2', 'idx3'], name='index'),
pd.MultiIndex.from_product([['value'], ['col1', 'col2', 'col3']], names=[None, 'columns'])
)
print(results)
value
columns col1 col2 col3
index
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
Then all you need is:
print(results.value.rename_axis(None, axis=1)) # <---- Solution
col1 col2 col3
index
idx1 14 1 1
idx2 2 0 1
idx3 6 0 0
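An alternative sketch that does not depend on the name of the outer column level uses DataFrame.droplevel and then clears both axis names, which gives the layout the question asks for. flat is an illustrative name.

```python
import pandas as pd

results = pd.DataFrame(
    [[14, 1, 1], [2, 0, 1], [6, 0, 0]],
    index=pd.Index(['idx1', 'idx2', 'idx3'], name='index'),
    columns=pd.MultiIndex.from_product(
        [['value'], ['col1', 'col2', 'col3']], names=[None, 'columns']
    ),
)

# droplevel(0, axis=1) removes the outer 'value' level from the columns;
# the two rename_axis calls then clear the remaining axis names.
flat = (
    results.droplevel(0, axis=1)
    .rename_axis(None)
    .rename_axis(None, axis=1)
)
print(flat)
```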