I have a dataframe that I need to filter and then process further (pivot, ...).
Sometimes the result is an empty dataframe and the call to pivot fails.
How can I deal with this?
The filtering is done like this:
df_sparen = df[(df['INCOME_EXPENSES'] == "Transaktion abbuchen") & (df['CATEGORY'] == "Trade Republic")]
Then comes the pivot table call:
table_sparen = df_sparen.pivot_table(values='AMOUNT', index=['INCOME_EXPENSES'],
                                     columns=['MONTHYEAR'], aggfunc=np.sum, margins=True)
This breaks when df_sparen is empty, with the error:
ValueError: No objects to concatenate
Any advice on how to deal with this is very much appreciated.
You can use df.empty:
table_sparen = (
    df_sparen.pivot_table('AMOUNT', 'INCOME_EXPENSES', 'MONTHYEAR',
                          aggfunc=np.sum, margins=True)
    if not df_sparen.empty else pd.DataFrame({'All': {'All': 0}})
)
Just check whether the dataframe is empty first:
df_sparen = df[(df['INCOME_EXPENSES'] == "Transaktion abbuchen") & (df['CATEGORY'] == "Trade Republic")]
if len(df_sparen) > 0:
    table_sparen = df_sparen.pivot_table(values='AMOUNT', index=['INCOME_EXPENSES'], columns=['MONTHYEAR'], aggfunc=np.sum, margins=True)
Or use a try/except clause:
try:
    df_sparen = df[(df['INCOME_EXPENSES'] == "Transaktion abbuchen") & (df['CATEGORY'] == "Trade Republic")]
    table_sparen = df_sparen.pivot_table(values='AMOUNT', index=['INCOME_EXPENSES'], columns=['MONTHYEAR'], aggfunc=np.sum, margins=True)
except ValueError:
    print(f'Empty DataFrame for {"Transaktion abbuchen"} and {"Trade Republic"}')
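If the same guard is needed in several places, it can be wrapped in a small helper. This is only a minimal sketch: the column names come from the question, while the helper name pivot_or_empty and the sample rows are made up for illustration, and aggfunc="sum" is used in place of np.sum.

```python
import pandas as pd


def pivot_or_empty(frame):
    """Pivot AMOUNT by MONTHYEAR, or return an empty DataFrame when there are no rows."""
    if frame.empty:
        # Nothing to aggregate, so skip pivot_table entirely.
        return pd.DataFrame()
    return frame.pivot_table(values="AMOUNT", index="INCOME_EXPENSES",
                             columns="MONTHYEAR", aggfunc="sum", margins=True)


# Hypothetical sample data, only for demonstration.
df = pd.DataFrame({
    "INCOME_EXPENSES": ["Transaktion abbuchen", "Transaktion abbuchen", "Gehalt"],
    "CATEGORY": ["Trade Republic", "Trade Republic", "Bank"],
    "MONTHYEAR": ["2023-01", "2023-02", "2023-01"],
    "AMOUNT": [100, 50, 2000],
})

mask = (df["INCOME_EXPENSES"] == "Transaktion abbuchen") & (df["CATEGORY"] == "Trade Republic")
table_sparen = pivot_or_empty(df[mask])

# A filter that matches nothing no longer breaks the pivot step.
table_none = pivot_or_empty(df[df["CATEGORY"] == "No such category"])
```

Whether an empty DataFrame or a zero-filled placeholder is the better fallback depends on what the later code expects.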
df["load_weight"] = df.loc[(df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")].fillna(1000, inplace=True)
I want to change the NaN values in the "load_weight" column, but only for the rows that contain "HORNSBY BEND" and "BRUSH". The code above set the whole "load_weight" column to None instead. What did I do wrong?
I would use a mask for boolean indexing:
m = (df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")
df.loc[m, "load_weight"] = df.loc[m, 'load_weight'].fillna(1000)
NB. You can't keep inplace=True when you assign the output. Methods called with inplace=True return None, which is why your column was replaced with None.
Alternative with only boolean indexing:
m1 = (df["dropoff_site"] == "HORNSBY BEND") & (df['load_type'] == "BRUSH")
m2 = df['load_weight'].isna()
df.loc[m1&m2, "load_weight"] = 1000
Instead of fillna, you can use df.loc directly to do the required imputation:
df.loc[((df['dropoff_site'] == 'HORNSBY BEND') & (df['load_type'] == 'BRUSH')
        & (df['load_weight'].isnull())), 'load_weight'] = 1000
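Another way to write the same idea is Series.mask, which replaces values wherever a condition is True. The sketch below is self-contained so it can be run as-is; the three sample rows (including the "OTHER SITE" value) are invented for illustration and are not from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, only for demonstration.
df = pd.DataFrame({
    "dropoff_site": ["HORNSBY BEND", "HORNSBY BEND", "OTHER SITE"],
    "load_type": ["BRUSH", "BRUSH", "BRUSH"],
    "load_weight": [np.nan, 500.0, np.nan],
})

# Rows that match both text conditions.
m = (df["dropoff_site"] == "HORNSBY BEND") & (df["load_type"] == "BRUSH")

# Replace the value only where the row matches AND the weight is missing.
df["load_weight"] = df["load_weight"].mask(m & df["load_weight"].isna(), 1000)
```

The existing weight of 500 is kept, and the NaN in the non-matching row is left untouched.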
I would like to print the rows from an Excel sheet where data either exists or does not exist in a specific column. Whenever I run the code, I get this:
Series([], dtype: int64)
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:15: FutureWarning:
Automatic reindexing on DataFrame vs Series comparisons is deprecated and will
raise ValueError in a future version. Do `left, right = left.align(right,
axis=1, copy=False)` before e.g. `left == right`
My snippet is:
at5 = input("Erkély igen?: ")  # Hungarian prompt: "Balcony, yes?"
if at5 == 'igen':  # 'igen' means 'yes'
    erkely = tables2[~tables2['balcony'].isnull()]
else:
    erkely = tables2[~tables2['balcony'].notnull()]
#bt = tables2[(tables2['lakas_tipus'] ==at1) & (tables2['nm2'] >= at2) &
(tables2['nm2'] < at3 ) & (tables2['room'] == at4 ) & (tables2['balcony'] == erkely
)]
Any idea how to approach this problem? I'm not getting the output I want.
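The warning most likely comes from the comparison tables2['balcony'] == erkely: erkely is a whole filtered DataFrame, not a value or a boolean mask, so pandas tries to align a Series with a DataFrame. One possible approach is to build a boolean mask for the balcony condition and combine it with the other conditions using &. This is only a sketch: the sample table and the values of at1 to at5 are invented here, since the original data is not shown.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
tables2 = pd.DataFrame({
    "lakas_tipus": ["tegla", "tegla", "panel"],
    "nm2": [50, 60, 55],
    "room": [2, 2, 2],
    "balcony": [4.0, None, 3.0],
})

# Hypothetical user inputs (at5 would come from input() in the question).
at1, at2, at3, at4, at5 = "tegla", 40, 70, 2, "igen"

# Boolean mask: True where a balcony value exists (or is missing, if the answer is not 'igen').
if at5 == "igen":
    balcony_mask = tables2["balcony"].notna()
else:
    balcony_mask = tables2["balcony"].isna()

# Every condition is a boolean Series with the same index, so they combine with &.
bt = tables2[
    (tables2["lakas_tipus"] == at1)
    & (tables2["nm2"] >= at2)
    & (tables2["nm2"] < at3)
    & (tables2["room"] == at4)
    & balcony_mask
]
print(bt)
```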
I have 4 dataframes for 4 newspapers (newspaper1, newspaper2, newspaper3, newspaper4),
each of which has a single column for the author name.
Now I'd like to merge these 4 dataframes into one with 5 columns: author, plus newspaper1, newspaper2, newspaper3 and newspaper4, which contain 1/0 values (1 if the author writes for that newspaper).
import pandas as pd
listOfMedia =[newspaper1,newspaper2,newspaper3,newspaper4]
merged = pd.DataFrame(columns=['author','newspaper1','newspaper2', 'newspaper3', 'newspaper4'])
While this loop does what I intended (fills the author column of the merged df with the names):
for item in listOfMedia:
    merged.author = item.author
I can't figure out how to fill the newspapers columns with the 1/0 values...
for item in listOfMedia:
    if item == newspaper1:
        merged['newspaper1'] = '1'
    elif item == newspaper2:
        merged['newspaper2'] = '1'
    elif item == newspaper3:
        merged['newspaper3'] = '1'
    else:
        merged['newspaper4'] = '1'
I keep getting this error:
During handling of the above exception, another exception occurred:
TypeError: attrib() got an unexpected keyword argument 'convert'
I tried to google that error, but it didn't help me identify the problem.
What am I missing here? I also think there must be a smarter way to fill the newspaper/author matrix, but I can't even figure out this simple way. I am using Jupyter Notebook.
Actually, you are setting all rows to 1, so use:
for col in merged.columns:
    merged[col].values[:] = 1
I've taken a guess at what I think your dataframes look like.
newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})
Firstly we will copy the dataframes so we don't affect the originals:
newspaper1_temp = newspaper1.copy()
newspaper2_temp = newspaper2.copy()
newspaper3_temp = newspaper3.copy()
newspaper4_temp = newspaper4.copy()
Next we replace the index of each dataframe with the author name:
newspaper1_temp.index = newspaper1['author']
newspaper2_temp.index = newspaper2['author']
newspaper3_temp.index = newspaper3['author']
newspaper4_temp.index = newspaper4['author']
Then we concatenate these dataframes (matching them together by the index we set):
merged = pd.concat([newspaper1_temp, newspaper2_temp, newspaper3_temp, newspaper4_temp], axis =1)
merged.columns = ['newspaper1', 'newspaper2', 'newspaper3', 'newspaper4']
And finally we replace NaNs with 0 and then set the non-zero entries (which still hold the author names) to 1:
merged = merged.fillna(0)
merged[merged != 0] = 1
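A more compact route to the same kind of matrix is to stack the frames into one long table with a label column and then count with pd.crosstab. The sketch below reuses the guessed sample frames; the names frames, long_table and matrix are just illustrative choices.

```python
import pandas as pd

# Same guessed sample data as above.
newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})

frames = {
    'newspaper1': newspaper1,
    'newspaper2': newspaper2,
    'newspaper3': newspaper3,
    'newspaper4': newspaper4,
}

# One long table with a column that says which newspaper each row came from.
long_table = pd.concat(
    [frame.assign(newspaper=name) for name, frame in frames.items()],
    ignore_index=True,
)

# Count author/newspaper pairs; clip so repeated authors still give 1, not 2+.
matrix = pd.crosstab(long_table['author'], long_table['newspaper']).clip(upper=1)

# Turn the author index back into a regular column.
merged = matrix.reset_index()
print(merged)
```

This avoids copying and re-indexing each frame by hand, and it keeps working if more newspapers are added to the dictionary.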
I want to run multiple filters on different columns like 'Frequency', 'Decile' and 'Audience' as 'all' and 'Dimension' = 'campaign' and KPI name='honda_2018...' from an excel sheet imported in pandas. I am running the following code:
def filter_df(df, *args):
    for 'Frequency', 'All' in args:
        df = df[df['Frequency'] == 'All']
    return df
It gives me the error SyntaxError: can't assign to literal. Please help.
You can try .loc
Sample Data:
my_frame = pd.DataFrame(data={'name' : ['alex5','martha1','collin4','cynthia9'],
'simulation1':[71,4.8,65,4.7],
'simulation2':[71,4.8,69,4.7],
'simulation3':[70,3.8,68,4.9],
'experiment':[70.3,3.5,65,4.4]})
my_frame
Running the code below returns the single row where simulation1 equals 4.8 (index 1, martha1):
my_frame.loc[(my_frame["simulation1"] == 4.8)]
If you want to filter on more columns, combine the conditions with &. With this sample data no row satisfies both conditions, so the code below returns an empty DataFrame:
my_frame.loc[(my_frame["simulation1"] == 4.8) & \
(my_frame["simulation2"] == 69)
]
Rinse and repeat until you're satisfied.
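As far as I know, you can also combine the conditions directly. Note that each condition needs its own parentheses and that you must use & instead of and (and raises a ValueError on a Series):
df = df[(df['Frequency'] == 'All') & (df['Something'] == 'Something else')]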
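The filter_df idea from the question can also be made to work by passing the column/value pairs as a dictionary and combining one boolean mask per pair. This is a rough sketch: the dictionary-based signature and the sample rows are assumptions for illustration, not the asker's actual sheet.

```python
import pandas as pd


def filter_df(df, conditions):
    """Keep only the rows where every column equals its required value."""
    mask = pd.Series(True, index=df.index)
    for column, value in conditions.items():
        # AND the running mask with the condition for this column.
        mask &= df[column] == value
    return df[mask]


# Hypothetical sample data, only for demonstration.
df = pd.DataFrame({
    'Frequency': ['All', 'All', 'Weekly'],
    'Audience': ['All', 'Young', 'All'],
    'Dimension': ['campaign', 'campaign', 'campaign'],
})

result = filter_df(df, {'Frequency': 'All', 'Audience': 'All', 'Dimension': 'campaign'})
print(result)
```

A dictionary is used instead of keyword arguments so that column names containing spaces, such as 'KPI name', can be passed as keys.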
I want to compare two Data Frames and print out my differences in a selective way. Here is what I want to accomplish in pictures:
Dataframe 1
Dataframe 2
Desired Output - Dataframe 3
What I have tried so far?
import pandas as pd
import numpy as np
df1 = pd.read_excel("01.xlsx")
df2 = pd.read_excel("02.xlsx")
def diff_pd(df1, df2):
    """Identify differences between two pandas DataFrames"""
    assert (df1.columns == df2.columns).all(), \
        "DataFrame column names are different"
    if any(df1.dtypes != df2.dtypes):
        # Data types are different, trying to convert
        df2 = df2.astype(df1.dtypes)
    if df1.equals(df2):
        return None
    else:  # need to account for np.nan != np.nan returning True
        diff_mask = (df1 != df2) & ~(df1.isnull() & df2.isnull())
        ne_stacked = diff_mask.stack()
        changed = ne_stacked[ne_stacked]
        changed.index.names = ['id', 'Naziv usluge']
        difference_locations = np.where(diff_mask)
        changed_from = df1.values[difference_locations]
        changed_to = df2.values[difference_locations]
        return pd.DataFrame({'Service Previous': changed_from, 'Service Current': changed_to},
                            index=changed.index)
df3 = diff_pd(df1, df2)
df3 = df3.fillna(0)
df3 = df3.reset_index()
print(df3)
To be fair, I found that code in another thread. It gets the job done, but I still have some issues:
My dataframes are not equal, what do I do?
I don't fully understand the code I provided.
Thank you!
How about something easier to start with ...
Try this
import pandas as pd
data1={'Name':['Tom','Bob','Mary'],'Age':[20,30,40],'Pay':[10,10,20]}
data2={'Name':['Tom','Bob','Mary'],'Age':[40,30,20]}
df1=pd.DataFrame.from_records(data1)
df2=pd.DataFrame.from_records(data2)
# Checking Columns
for col in df1.columns:
    if col not in df2.columns:
        print(f"DF2 Missing Col {col}")
# Check Col Values
for col in df1.columns:
    if col in df2.columns:
        # Ok we have the same column
        if list(df1[col]) == list(df2[col]):
            print(f"Columns {col} are the same")
        else:
            print(f"Columns {col} have differences")
It should output
DF2 Missing Col Pay
Columns Age have differences
Columns Name are the same
The f-strings require Python 3.6 or later; on older versions, change them to str.format.
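To see which rows differ rather than only which columns, one option is to merge the two frames on a shared key and compare the suffixed columns. This is a sketch under the assumption that Name identifies a row in both frames; the suffixes and the variable names merged and changed are illustrative choices.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Tom', 'Bob', 'Mary'], 'Age': [20, 30, 40], 'Pay': [10, 10, 20]})
df2 = pd.DataFrame({'Name': ['Tom', 'Bob', 'Mary'], 'Age': [40, 30, 20]})

# Outer merge keeps rows that exist in only one frame; the _merge column
# (added by indicator=True) says where each row came from.
merged = df1.merge(df2, on='Name', how='outer',
                   suffixes=('_previous', '_current'), indicator=True)

# Rows whose Age differs between the two frames. A row missing from one
# frame has NaN on that side, and NaN != NaN is True, so it is flagged too.
changed = merged[merged['Age_previous'] != merged['Age_current']]
print(changed[['Name', 'Age_previous', 'Age_current', '_merge']])
```

Because the frames are matched on the key rather than on position, this also works when the two frames have different numbers of rows or columns.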