I am trying to fill in the missing values of the county column based on its add_suburb value. I tried the following two snippets, neither of which works:
for index, row in fileco.iterrows():
    df.loc[df['add_suburb'].str.contains(str(row['place'])) & (df['county'].str == ''), 'county'] = str('County ' + row['county']).title()
for index, row in fileco.iterrows():
    df.loc[df['add_suburb'].str.contains(str(row['place'])) & (df['county'].str is None), 'county'] = str('County ' + row['county']).title()
But the following code works if I do not check for None or =='':
for index, row in fileco.iterrows():
    df.loc[df['add_suburb'].str.contains(str(row['place'])), 'county'] = str('County ' + row['county']).title()
What's the correct way to fill in only the missing column values? How should I correct the condition after the &?
I don't exactly understand what you're trying to do in the loop (what are you looping over?), but I think it should work if you enclose each condition in parentheses like this:
df.loc[(condition1) & (condition2)] = "replacement"
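As for the specific conditions: df['county'].str == '' compares the .str accessor object itself rather than the cell values, and df['county'].str is None is always False, so the combined mask never matches anything. Checking for missing values with isna() (and for empty strings with eq('')) should work. A minimal sketch with made-up data mirroring the question's column names:

import pandas as pd

# made-up data using the question's column names
df = pd.DataFrame({'add_suburb': ['Salthill, Galway', 'Blackrock, Dublin'],
                   'county': ['', 'County Dublin']})
fileco = pd.DataFrame({'place': ['Salthill'], 'county': ['galway']})

for index, row in fileco.iterrows():
    missing = df['county'].isna() | df['county'].eq('')   # treat NaN and '' as missing
    matched = df['add_suburb'].str.contains(str(row['place']))
    df.loc[matched & missing, 'county'] = ('County ' + row['county']).title()

print(df)  # row 0 becomes 'County Galway'; row 1 is left untouched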
I don't know why, but the code to calculate rows with missing values doesn't work.
Can somebody please help?
[screenshot: Excel file showing the data]
[screenshot: code in the IDE]
In Excel, the rows that have missing values total 156, but I can't get this number in Python using the code below:
(kidney_df.isna().sum(axis=1) > 0).sum()

count = 0
for i in kidney_df.isnull().sum(axis=1):
    if i > 0:
        count = count + 1

kidney_df.isna().sum().sum()
kidney_df is a whole dataframe; do you want to count each empty cell, or just the empty cells in one column? Based on the formula in your image, it seems you are interested only in column 'Z'. You can specify that by using .iloc[] (index location) or by specifying the column name (not visible in your image) like so:
kidney_df.iloc[:, 26].isnull().sum()
Explanation:
.iloc[]  # index location
:        # all rows (a full slice from the first row to the last)
26       # the column index of column 'Z' in Excel
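As for the row count itself, the question's first snippet is already the right idea; any(axis=1) is an equivalent, slightly clearer spelling. A small self-contained demo with toy data (the real file isn't shown) contrasting the three counts:

import numpy as np
import pandas as pd

kidney_df = pd.DataFrame({'A': [1, 2, 3],
                          'B': [np.nan, 5, 6],
                          'Z': [7, 8, np.nan]})

rows_with_missing = kidney_df.isna().any(axis=1).sum()  # rows containing >= 1 NaN -> 2
total_missing_cells = kidney_df.isna().sum().sum()      # every empty cell -> 2
missing_in_one_col = kidney_df['Z'].isna().sum()        # a single column -> 1
print(rows_with_missing, total_missing_cells, missing_in_one_col)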
I have a dataframe that might look like this:
print(df_selection_names)
name
0 fatty red meat, like prime rib
0 grilled
I have another dataframe, df_everything, with columns called name, suggestion and a lot of other columns. I want to find all the rows in df_everything with a name value matching the name values from df_selection_names so that I can print the values for each name and suggestion pair, e.g., "suggestion1 is suggested for name1", "suggestion2 is suggested for name2", etc.
I've tried several ways to get cell values from a dataframe and to search for values within a row, including:
# number of items in df_selection_names = df_selection_names.shape[0]
# so, in other words, we are looping through all the items the user selected
for i in range(df_selection_names.shape[0]):
    # get the cell value in the 'name' column of row i using the at() function
    sel = df_selection_names.at[i, 'name']
    # this line finds the row 'sel' in df_everything
    row = df_everything[df_everything['name'] == sel]
but everything I tried gives me ValueErrors. This post leads me to think I may be
way off, but I'm feeling pretty confused about everything at this point!
https://pandas.pydata.org/docs/reference/api/pandas.Series.isin.html?highlight=isin#pandas.Series.isin
df_everything[df_everything['name'].isin(df_selection_names["name"])]
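To then print the name/suggestion pairs the question asks for, you can iterate over the filtered rows. A sketch with invented toy data (only the 'name' and 'suggestion' columns are assumed):

import pandas as pd

# invented stand-ins for the question's dataframes
df_selection_names = pd.DataFrame({'name': ['fatty red meat, like prime rib', 'grilled']})
df_everything = pd.DataFrame({'name': ['grilled', 'steamed', 'fatty red meat, like prime rib'],
                              'suggestion': ['charcoal', 'bamboo basket', 'horseradish']})

matches = df_everything[df_everything['name'].isin(df_selection_names['name'])]
for _, row in matches.iterrows():
    print(f"{row['suggestion']} is suggested for {row['name']}")

This also sidesteps the ValueError: the printed df_selection_names has a duplicate index label 0, which is a likely source of the error when using .at, whereas isin never indexes by label at all.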
I am trying to get the first non-null value of the list inside each row of the Emails column and write it to Emails_final1. Then I want to get the next value of the list inside each row of Emails, if there is one, and write it to Emails_final2; otherwise, write the Emails 2 value to Emails_final2 if it is not blank and doesn't equal the email already chosen, and leave Emails_final2 blank if neither applies. Lastly, if a value from Emails 2 was written to Emails_final1, make Emails_final2 None. I have tried many different ways to achieve this to no avail; here is what I have so far, including pseudo-code:
My Current Code:
import pandas as pd

df = pd.DataFrame({'Emails': [['jjj@gmail.com', 'jp@gmail.com', 'jc@gmail.com'], [None, 'www@gmail.com'], [None, None, None]],
                   'Emails 2': ['sss@gmail.com', 'zzz@gmail.com', 'ccc@gmail.com'],
                   'num_specimen_seen': [10, 2, 3]},
                  index=['falcon', 'dog', 'cat'])
df['Emails_final1'] = df['Emails'].explode().groupby(level=0).first()
# pseudo-code: I know next() doesn't exist, but I want something that gets the next
# value of 'Emails' before falling back to the 'Emails 2' value
df['Emails_final2'] = df['Emails'].explode().groupby(level=0).next()
Desired Output:

        Emails_final1  Emails_final2
falcon  jjj@gmail.com  jp@gmail.com
dog     www@gmail.com  zzz@gmail.com
cat     ccc@gmail.com  None
Any direction on how to approach a problem like this would be appreciated.
It looks a bit messy, but it works. Basically, we keep a boolean mask from the first step of filling "Emails_final1" and use it in the second step to fill "Emails_final2".
To fill the second column, the idea is to use groupby + nth to get the second element of each list. If it doesn't match the previously selected email, keep it (as in the first row); otherwise take the value from the "Emails 2" column, unless that value was already selected before (as in the 3rd row):
exp_g = df['Emails'].explode().groupby(level=0)

# first non-null element of each list; NaN when the list is all None
df['Emails_final1'] = exp_g.first()
# remember which rows were filled from 'Emails' before falling back to 'Emails 2'
msk = df['Emails_final1'].notna()
df['Emails_final1'] = df['Emails_final1'].fillna(df['Emails 2'])

# element at position 1 of each list, NaN when there is none
df['Emails_final2'] = exp_g.nth(1)
# swap in 'Emails 2' when the value is missing or duplicates Emails_final1,
# but only for rows whose Emails_final1 did not already come from 'Emails 2'
df['Emails_final2'] = df['Emails_final2'].mask(lambda x: ((x == df['Emails_final1']) | x.isna()) & msk, df['Emails 2'])
The relevant columns are:

        Emails_final1  Emails_final2
falcon  jjj@gmail.com  jp@gmail.com
dog     www@gmail.com  zzz@gmail.com
cat     ccc@gmail.com  None
I know that one can compare a whole column of a dataframe and make a list out of all rows that contain a certain value with:
values = parsedData[parsedData['column'] == valueToCompare]
But is there a way to make a list out of all rows by comparing two columns with values, like:
values = parsedData[parsedData['column01'] == valueToCompare01 and parsedData['column02'] == valueToCompare02]
Thank you!
It is completely possible, but and cannot be used to mask a dataframe: it raises a ValueError because the truth value of a Series is ambiguous. The element-wise & operator is what you need here. Note that the parentheses around each condition are required, since & binds more tightly than ==:
values = parsedData[(parsedData['column01'] == valueToCompare01) & (parsedData['column02'] == valueToCompare02)]
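A minimal runnable sketch of this (column names and values invented):

import pandas as pd

parsedData = pd.DataFrame({'column01': [1, 1, 2],
                           'column02': ['a', 'b', 'a']})

# element-wise AND; plain 'and' would raise "truth value of a Series is ambiguous"
values = parsedData[(parsedData['column01'] == 1) & (parsedData['column02'] == 'a')]
print(values)  # only the first row satisfies both conditions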
Basically, I am trying to take the previous row for the combination of ['dealer','State','city']. If I have multiple values in this combination, I will get the shifted value of this combination.
df['ShiftBY_D_S_C']= df.groupby(['dealer','State','city'])['dealer'].shift(1)
I then take this ShiftBY_D_S_C column and compute the count for the ['ShiftBY_D_S_C','State','city'] combination.
df['NewColumn'] = (df.groupby(['ShiftBY_D_S_C','State','city'])['ShiftBY_D_S_C'].transform("count"))+1
The table below shows what I am trying to do, and it works well. But when all the rows in the ShiftBY_D_S_C column are null, it does not work, since groupby drops the all-null keys. Any suggestions?
I am trying to get the NewColumn values like below when all the values in ShiftBY_D_S_C are NaN.
You could simply handle the special case that you describe with an if/else (filling in the groupby expression from your question):
if df['ShiftBY_D_S_C'].isna().all():
    df['NewColumn'] = 1
else:
    df['NewColumn'] = df.groupby(['ShiftBY_D_S_C', 'State', 'city'])['ShiftBY_D_S_C'].transform('count') + 1
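A quick self-contained demo of the guard, using invented toy data for the failing case:

import numpy as np
import pandas as pd

# the failing case: every ShiftBY_D_S_C value is NaN
df = pd.DataFrame({'dealer': ['d1', 'd2'],
                   'State': ['TX', 'TX'],
                   'city': ['Austin', 'Dallas'],
                   'ShiftBY_D_S_C': [np.nan, np.nan]})

if df['ShiftBY_D_S_C'].isna().all():
    df['NewColumn'] = 1  # groupby would drop the all-NaN keys, leaving NaN counts
else:
    df['NewColumn'] = df.groupby(['ShiftBY_D_S_C', 'State', 'city'])['ShiftBY_D_S_C'].transform('count') + 1
print(df)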