Weird SettingWithCopyWarning in pandas - python

I have read a lot of posts about this warning in pandas. I tried writing the code in different ways, but nothing helps and it's still unclear to me.
I have a dataframe, df1, with three columns:
SampleIdInt, Username, Signature
When I try to add a column filled with 1 to df1, I get a SettingWithCopyWarning.
Code:
df1['PolName'] = 1
In another script of mine this code didn't raise the warning, so why do I get it in this case? I have only just declared the dataframe and the dictionary.
Next, when I try to translate values with a dictionary, it raises the warning again. Code:
df1.loc[:,'PolName'] = df1['SampleIdInt'].apply(lambda y:slownik[y] if y in slownik.keys() else 'None')
I tried loc, iloc, and different syntax. Every time I get the warning. What's weird is that sometimes I get the warning and the code modifies df anyway, and sometimes I get the warning and df stays unchanged, even though I don't change anything in the code.
Can someone explain what the problem is, using exactly the example above?
Important parts of the code that could be useful:
baza = pd.read_csv('Zeszyt1.csv', sep = ';')
snps = pd.read_csv('snps.csv', header = None, low_memory = False, sep = ',')
vals = snps.loc[snps.duplicated(['SampleIdInt',5]), 'SampleIdInt'].unique()
mask = snps['SampleIdInt'].isin(vals)
df1 = snps[~mask]

What you can do is:
df1['PolName'] = df1['SampleIdInt'].map(slownik).fillna('None')
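You can also add copy() to the df1 assignment, so that df1 is an independent DataFrame rather than a filtered slice that pandas may treat as a view of snps: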
df1 = snps[~mask].copy()
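A minimal, self-contained sketch of both suggestions together is below. The snps data and the slownik dictionary are hypothetical stand-ins for the CSV files in the question, and the duplicate check is simplified to SampleIdInt only (the question also checks column 5). Because df1 is created with .copy(), adding the PolName column only touches df1.

```python
import pandas as pd

# Hypothetical stand-ins for the CSV data and the slownik dictionary
snps = pd.DataFrame({
    "SampleIdInt": [1, 1, 2, 3],
    "Username": ["a", "a", "b", "c"],
    "Signature": ["x", "x", "y", "z"],
})
slownik = {2: "PolA"}  # SampleIdInt -> PolName; 3 is deliberately missing

# Collect every SampleIdInt that appears in a duplicated row
# (simplified: only SampleIdInt is checked, not column 5 as in the question)
vals = snps.loc[snps.duplicated(["SampleIdInt"]), "SampleIdInt"].unique()
mask = snps["SampleIdInt"].isin(vals)

# .copy() gives df1 its own data, independent of snps
df1 = snps[~mask].copy()

# map() looks each id up in the dict (ids missing from it become NaN);
# fillna() then replaces those NaN values with the string 'None'
df1["PolName"] = df1["SampleIdInt"].map(slownik).fillna("None")
print(df1)
```

Using map with fillna also replaces the lambda with its `if y in slownik.keys()` check, since ids that are absent from the dictionary simply come out as missing values.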

Related

How to delete specific values from a column in a dataset (Python)?

I have a data set as below:
I want to remove 'Undecided' from my ['Grad Intention'] column. For this, I created a copy of the DataFrame and used the following code:
df_copy=df_copy.drop([df_copy['Grad Intention'] =='Undecided'], axis=1)
However, this is giving me an error.
How can I remove the row with 'Undecided'? Also, what's wrong with my code?
You could simply use:
df = df[df['Grad Intention'] != 'Undecided']
or
df.drop(df[df['Grad Intention'] == 'Undecided'].index, inplace = True)
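Below is a short runnable comparison of the two approaches, using a made-up DataFrame in place of the asker's data. As for what went wrong in the original code: drop() expects row or column labels, but it received a list wrapping a boolean Series, and axis=1 tells it to look among the columns rather than the rows.

```python
import pandas as pd

# Hypothetical data standing in for the asker's dataset
df = pd.DataFrame({
    "Student": ["A", "B", "C", "D"],
    "Grad Intention": ["Yes", "Undecided", "No", "Undecided"],
})

# Option 1: keep only the rows whose value is not 'Undecided'
kept = df[df["Grad Intention"] != "Undecided"]

# Option 2: look up the index labels of the unwanted rows and drop them;
# drop() works on row labels by default (axis=0)
to_drop = df.index[df["Grad Intention"] == "Undecided"]
dropped = df.drop(to_drop)  # returns a new DataFrame; df itself is unchanged

print(kept)
print(dropped)
```

Both produce the same rows; the boolean filter is usually the simpler choice, while drop() with inplace=True modifies the existing DataFrame instead of returning a new one.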

pandas df.apply() not working with html.unescape()

I'm trying to decode html characters within a pandas dataframe.
I don't know why but my apply function won't work.
# requirements
import html
import pandas as pd
# This code works fine.
df = df.apply(lambda x: x + "TESTSTRING")
print(df) # "TESTSTRING" is appended to all values.
# This code also works fine. html.unescape() is working well.
fn = lambda x: html.unescape(x)
str = "Someting wrong with <b>E&S</b>"
print(fn(str)) # returns "Something wrong with <b>E&S</b>"
# However, the code below doesn't work. The "&" within the values dont' get decoded.
df2 = df.apply(fn)
print(df2) # The html characters aren't decoded!
It's really frustrating that apply() and html.unescape() work well separately, but I don't know why they don't work together.
I've also tried axis=1.
I'd really appreciate your help. Thanks in advance.
The problem is that html.unescape() is not vectorized: it accepts only a single string, while df.apply() passes it a whole column (a Series) at a time.
If your df is not very large, using applymap, which calls the function on every element, should still be sufficiently fast:
df2 = df.applymap(lambda x: html.unescape(x))
print(df2)
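A likely reason there is no error at all: html.unescape() first checks whether "&" is in its argument, and for a Series the `in` operator tests the index labels rather than the values, so the column is returned unchanged. The sketch below uses hypothetical data and applies html.unescape once per cell via Series.map on each column. On pandas 2.1 and later, applymap is deprecated in favour of the equivalent DataFrame.map; the per-column Series.map form shown here also works on older versions.

```python
import html
import pandas as pd

# Hypothetical data with HTML entities in two text columns
df = pd.DataFrame({
    "title": ["E&amp;S", "R&amp;D"],
    "body": ["&lt;b&gt;bold&lt;/b&gt;", "plain text"],
})

# df.apply hands each column (a whole Series) to the lambda;
# Series.map then calls html.unescape once per string value
decoded = df.apply(lambda col: col.map(html.unescape))
print(decoded)
```

If some cells may hold non-string values such as NaN, html.unescape would fail on them, so a guard like `lambda v: html.unescape(v) if isinstance(v, str) else v` may be needed.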

Changing column values for a value in an adjacent column in the same dataframe using Python

I am quite new to Python programming.
I am working with the following dataframe:
Before
Note that the "FBgn" column contains a mix of FBgn and FBtr string values. I would like to replace the FBtr values with the FBgn values provided in the adjacent column, "## FlyBase_FBgn", while keeping the existing FBgn values in the "FBgn" column. Keep in mind that I am showing only a portion of the dataframe (it actually has 1432 rows). How would I do that? I tried the replace() method from pandas, but it did not work.
This is actually what I would like to have:
After
Thanks a lot!
With Pandas, you could try:
df.loc[df["FBgn"].str.contains("FBtr"), "FBgn"] = df["## FlyBase_FBgn"]
Welcome to Stack Overflow. Next time, please provide more information, including your code; it always helps.
See the code below; I think you need something similar:
import pandas as pd
# ignore dict1; it just recreates your df
dict1 = {"FBgn": ['FBtr389394949', 'FBgn3093840', 'FBtr000025'], "FBtr": ['FBgn546466646', '', 'FBgn15565555']}
df = pd.DataFrame(dict1)  # recreating your dataframe
# print df
print(df)
# function to replace the values
def replace_values(df):
    for i in range(len(df)):  # one iteration per row
        if 'FBtr' in df.loc[i, 'FBgn']:
            # .loc avoids chained assignment such as df['FBgn'][i] = ...,
            # which raises SettingWithCopyWarning and may not update df
            df.loc[i, 'FBgn'] = df.loc[i, 'FBtr']
    return df
df = replace_values(df)
# print new df
print(df)

Cannot load data from spreadsheet properly

I have a spreadsheet looking like this:
I'm trying to read it into a dataframe:
def loading_nasdaq_info_from_spreadsheet():
    excel_file = 'nasdaq.xlsx'
    nasdaq_info_dataframe = pandas.read_excel(excel_file, index_col=0)
    # data cleaning
    nasdaq_info_dataframe.dropna()
    return nasdaq_info_dataframe

if __name__ == '__main__':
    df = loading_nasdaq_info_from_spreadsheet()
    print(df.loc['symbol'])
I constantly get
"raise KeyError(key) from err KeyError: 'Symbol'"
It doesn't matter which key I want to print or use; it is always the same error. What's even worse, even when I manually set everything to text in Excel, when I try
nasdaq_info_dataframe.applymap(lambda text: text.strip())
I get
'float' doesn't have strip()
I have been fighting with this for a few hours now, so please help me.
EDIT:
Printing
print(df.loc)
gives
<pandas.core.indexing._LocIndexer object at 0x1160e8778>
Printing
print(df.columns)
gives
Index(['Name', 'Sector', 'Industry'], dtype='object')
Furthermore, if I remove "index_col=0" (so that the first column is no longer used as the index), I still get the same KeyError when I print df.loc['Symbol'].
Printing df.head() gives
The problem is in df.loc['symbol'].
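Use df.loc[:, 'Symbol'] or df['Symbol'] instead: df.loc['Symbol'] looks up a row label, not a column.
Your df.columns output does not contain 'Symbol', so with index_col=0 it is presumably the index; in that case, call df = df.reset_index() first.
You can find more detail in the pandas user guide section "Indexing and selecting data".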
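The sketch below illustrates this on a small hypothetical frame shaped like the asker's output (Symbol as the index, Name and Sector as columns; the values are invented). It also touches the two side issues in the question: empty spreadsheet cells are read as NaN, which is a float and has no strip(), and dropna() returns a new DataFrame, so calling it without assigning the result (as in the question's function) has no effect.

```python
import pandas as pd

# Hypothetical frame shaped like the asker's: index_col=0 turned the first
# spreadsheet column ("Symbol") into the index, so it is not a column
df = pd.DataFrame(
    {"Name": [" Apple Inc. ", None], "Sector": ["Technology", "Technology"]},
    index=pd.Index(["AAPL", "MSFT"], name="Symbol"),
)
print(df.columns.tolist())  # the symbols are not among the columns
print(df.loc["AAPL"])       # loc with one label selects a ROW by its index label

try:
    df.loc["Symbol"]        # there is no row labelled 'Symbol'
except KeyError as err:
    print("KeyError:", err)

# Move the index back into an ordinary column
df = df.reset_index()
print(df["Symbol"].tolist())  # df.loc[:, "Symbol"] works too

# Empty cells arrive as NaN (a float), which has no .strip();
# the .str accessor leaves missing values missing instead of raising
df["Name"] = df["Name"].str.strip()

# dropna() returns a new DataFrame, so its result must be assigned
df = df.dropna()
print(df)
```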

Pandas Dataframe subset not working as expected

This seemingly simple exercise is throwing me off track; I'm sure it's something simple that I'm missing.
Let's say I have a dataframe
datas = pd.DataFrame({'age': [10, 20, 30],
                      'name': ['John', 'Mark', 'Lisa']})
I now want to subset the dataframe by the name 'Mark' so I did:
if (datas['name'] == 'Mark').any():
    datas.loc[datas['name'] == 'Mark']
else:
    print('no')
Expected result is
age name
20 Mark
but I get the original dataframe back. Please help.
I've looked at several posts but none seems to help.
Posts example I looked at: Check if string is in a pandas dataframe
I think you need to assign the result back if you want to overwrite the original DataFrame with the subset:
datas = datas.loc[datas['name'] == 'Mark']
Or assign it to a new variable, e.g. df1:
df1 = datas.loc[datas['name'] == 'Mark']
If you then process the data further and assign the output to a new variable like df1, use DataFrame.copy to prevent SettingWithCopyWarning:
df1 = datas.loc[datas['name'] == 'Mark'].copy()
Without .copy(), if you modify values in df1 later, you will find that the modifications do not propagate back to the original data (datas), and that pandas warns about it.
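Did you mean to print the subset? Right now your code doesn't change anything: datas.loc[...] returns a new DataFrame, which is then discarded because it is neither printed nor assigned.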
if (datas['name'] == 'Mark').any():
    print(datas.loc[datas['name'] == 'Mark'])
else:
    print('no')
You can even change your dataset in one line:
datas = datas[datas['name']=='Mark']
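A small sketch contrasting a discarded filter with an assigned copy, reusing the question's datas frame. The age change on the copy is only there to show that the original data is unaffected.

```python
import pandas as pd

datas = pd.DataFrame({'age': [10, 20, 30],
                      'name': ['John', 'Mark', 'Lisa']})

# A filter expression builds a NEW DataFrame and leaves datas alone;
# as a bare statement its result is simply thrown away
datas.loc[datas['name'] == 'Mark']

# Assign the result to keep it; .copy() makes it independent of datas
subset = datas.loc[datas['name'] == 'Mark'].copy()
subset['age'] = subset['age'] + 1  # changes subset only, with no warning

print(subset)
print(datas)  # still all three rows, ages unchanged
```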
