Cannot load data from spreadsheet properly - python

I have a spreadsheet looking like this:
I'm trying to read it into a dataframe:
import pandas

def loading_nasdaq_info_from_spreadsheet():
    excel_file = 'nasdaq.xlsx'
    nasdaq_info_dataframe = pandas.read_excel(excel_file, index_col=0)
    # data cleaning
    nasdaq_info_dataframe.dropna()
    return nasdaq_info_dataframe

if __name__ == '__main__':
    df = loading_nasdaq_info_from_spreadsheet()
    print(df.loc['symbol'])
I constantly get
raise KeyError(key) from err
KeyError: 'Symbol'
It doesn't matter which key I want to print or use; it is always the same error. What's even worse, even if I manually (in Excel) set everything to text, when I try
nasdaq_info_dataframe.applymap(lambda text: text.strip())
I get
'float' doesn't have strip()
I've been fighting with this for a few hours now, so please help me.
EDIT:
Printing
print(df.loc)
gives
<pandas.core.indexing._LocIndexer object at 0x1160e8778>
Printing
print(df.columns)
gives
Index(['Name', 'Sector', 'Industry'], dtype='object')
Furthermore, if I remove the multiindex by removing "index_col=0", I still get the same KeyError when printing df.loc['Symbol'].
Printing df.head() gives

The problem is in df.loc['symbol'].
Use df.loc[:, 'Symbol'] or df['Symbol'] instead.
If Symbol is the df's index, then apply df = df.reset_index() first.
You can find more detail in the pandas official guide, Indexing and selecting data.
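For illustration, here is a minimal sketch of the difference; the frame below is made up to mimic the question's layout, with 'Symbol' read in as the index:
import pandas as pd

# Hypothetical data standing in for nasdaq.xlsx read with index_col=0,
# so 'Symbol' ends up as the index rather than a regular column.
df = pd.DataFrame(
    {'Name': ['Apple Inc.', 'Microsoft'],
     'Sector': ['Technology', 'Technology'],
     'Industry': ['Hardware', 'Software']},
    index=pd.Index(['AAPL', 'MSFT'], name='Symbol'))

# df.loc['Symbol'] raises KeyError: a single label in .loc selects a ROW.
# Move the index back into a column, then select the column:
df = df.reset_index()
print(df['Symbol'])         # column selection
print(df.loc[:, 'Symbol'])  # equivalent, explicit row/column form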

Related

Cannot drop index column from DataFrame when convert to html

I'm trying to get rid of the index column when converting a DataFrame into HTML, but even though I reset the index or set index=False in to_html, it is still there, although with no values.
df = df.set_index(['ID','Name','PM', 'Theme'])['Score'].unstack()
df = df.reset_index()
df_HTML = df.to_html(table_id = "table_score", index=False, escape=False)
Any idea how to get rid of that, please?
Try this:
df = df.set_index(['ID','Name','PM', 'Theme'])['Score'].unstack()
df = df.reset_index(drop=True).drop('Theme',axis=1)
df_HTML = df.to_html(table_id = "table_score", index=False, escape=False)
The issue was caused because your Theme column seems to be your old index, and since you didn't drop it in the reset_index call, it stayed there.
If this doesn't work, just drop 'Theme'.
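For reference, a small sketch (with made-up column names) of how reset_index() differs from reset_index(drop=True):
import pandas as pd

df = pd.DataFrame({'Theme': ['A', 'B'], 'Score': [1, 2]}).set_index('Theme')

# reset_index() moves the old index back in as a regular column ...
print(df.reset_index().columns.tolist())           # ['Theme', 'Score']

# ... while reset_index(drop=True) discards it entirely.
print(df.reset_index(drop=True).columns.tolist())  # ['Score']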

Changing column values for a value in an adjacent column in the same dataframe using Python

I am quite new to Python programming.
I am working with the following dataframe:
Before
Note that in column "FBgn", there is a mix of FBgn and FBtr string values. I would like to replace the FBtr-containing values with the FBgn values provided in the adjacent column called "## FlyBase_FBgn". However, I want to keep the existing FBgn values in column "FBgn". Keep in mind that I am showing only a portion of the dataframe (in reality it has 1432 rows). How would I do that? I tried the replace() method from Pandas, but it did not work.
This is actually what I would like to have:
After
Thanks a lot!
With Pandas, you could try:
df.loc[df["FBgn"].str.contains("FBtr"), "FBgn"] = df["## FlyBase_FBgn"]
Welcome to Stack Overflow. Please provide more info next time, including your code; it is always helpful.
Please see the code below, I think you need something similar:
import pandas as pd

# ignore dict1, I just wanted to recreate your df
dict1 = {"FBgn": ['FBtr389394949', 'FBgn3093840', 'FBtr000025'],
         "FBtr": ['FBgn546466646', '', 'FBgn15565555']}
df = pd.DataFrame(dict1)  # recreating your dataframe

# print df
print(df)

# function to replace the values
def replace_values(df):
    for i in range(0, (df.size // 2)):
        if 'tr' in df['FBgn'][i]:
            df['FBgn'][i] = df['FBtr'][i]
    return df

df = replace_values(df)

# print new df
print(df)

Pandas Dataframe subset not working as expected

This seemingly simple exercise is throwing me off track; I'm sure it's something simple escaping my eye.
Let's say I have a dataframe
import pandas as pd

datas = pd.DataFrame({'age': [10, 20, 30],
                      'name': ['John', 'Mark', 'Lisa']})
I now want to subset the dataframe by the name 'Mark' so I did:
if (datas['name'] == 'Mark').any():
    datas.loc[datas['name'] == 'Mark']
else:
    print('no')
Expected result is
age name
20 Mark
but I get the original dataframe back again, please assist.
I've looked at several posts but none seems to help.
Posts example I looked at: Check if string is in a pandas dataframe
I think you need to assign back to the original DataFrame if you want to overwrite the original DataFrame with the subset:
datas = datas.loc[datas['name'] == 'Mark']
Or assign to new variable, e.g. df1:
df1 = datas.loc[datas['name'] == 'Mark']
Next, if the data are processed further and the output is assigned to a new variable like df1, it is necessary to use DataFrame.copy to prevent SettingWithCopyWarning:
df1 = datas.loc[datas['name'] == 'Mark'].copy()
If you modify values in df1 later, you will find that the modifications do not propagate back to the original data, and that pandas raises a warning.
Did you mean to print the subset? Right now your code doesn't change anything.
if (datas['name'] == 'Mark').any():
    print(datas.loc[datas['name'] == 'Mark'])
else:
    print('no')
You can even change your dataset in one line:
datas = datas[datas['name']=='Mark']

Weird SettingWithCopyWarning in pandas

I have read a lot of posts and information about that error in pandas. I tried to write the code in different ways, but nothing helps and it's still unclear to me.
I have a dataframe, df1, with three columns:
SampleIdInt, Username, Signature
When I try to add a column filled with 1 to the df, I get a SettingWithCopyWarning.
Code:
df1['PolName'] = 1
In another script that code didn't throw the warning, so why do I get it in this case? I have only just declared the dataframe and a dictionary.
Next, when I try to translate values using a dictionary, it again throws the warning. Code:
df1.loc[:,'PolName'] = df1['SampleIdInt'].apply(lambda y:slownik[y] if y in slownik.keys() else 'None')
I tried with loc, iloc, and different code syntax. Every time I got the warning. What is weird: sometimes I got the warning and the code modified the df anyway, and sometimes I got the warning and the df stayed unchanged, even though I don't change anything in that code.
Can someone explain me what is a problem, exactly on above example?
Important parts of the code that could be useful:
baza = pd.read_csv('Zeszyt1.csv', sep = ';')
snps = pd.read_csv('snps.csv', header = None, low_memory = False, sep = ',')
vals = snps.loc[snps.duplicated(['SampleIdInt',5]), 'SampleIdInt'].unique()
mask = snps['SampleIdInt'].isin(vals)
df1 = snps[~mask]
What you can do is:
df1['PolName'] = df1['SampleIdInt'].map(slownik).fillna('None')
Also, you can add copy() to the df1 assignment:
df1 = snps[~mask].copy()
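As a rough illustration (column and variable names taken from the question), the warning appears because df1 may be a view of snps, and an explicit copy() removes the ambiguity:
import pandas as pd

snps = pd.DataFrame({'SampleIdInt': [1, 2, 2], 'Username': ['a', 'b', 'b']})
mask = snps['SampleIdInt'].isin([2])

df1 = snps[~mask]         # may be a view of snps
# df1['PolName'] = 1      # -> SettingWithCopyWarning

df1 = snps[~mask].copy()  # independent frame, so assignment is unambiguous
df1['PolName'] = 1
print(df1)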

Unable to drop column, object has no attribute error

I have a csv file with the column titles: name, mfr, type, calories, protein, fat, sodium, fiber, carbo, sugars, vitamins, rating. When I try to drop the sodium column, I don't understand why I'm getting a "'NoneType' object has no attribute 'drop'" error.
I've tried
df.drop(['sodium'],axis=1)
df = df.drop(['sodium'],axis=1)
df = df.drop (['sodium'], 1, inplace=True)
Here's your problem:
df = df.drop (['sodium'], 1, inplace=True)
This returns None (documentation) due to the inplace flag, and so you no longer have a reference to your dataframe. df is now None and None has no drop attribute.
My expectation is that you have done this (or something like it, perhaps dropping another column?) at some prior point in your code.
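For completeness, a small sketch (with a made-up frame using the question's column names) of the two patterns that do work:
import pandas as pd

df = pd.DataFrame({'name': ['Cheerios'], 'sodium': [290], 'rating': [50.8]})

# Option 1: drop returns a new frame, so keep the reference.
df = df.drop(['sodium'], axis=1)

# Option 2: modify in place and do NOT reassign; inplace=True returns None.
# df.drop(['sodium'], axis=1, inplace=True)

print(df.columns.tolist())  # ['name', 'rating']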
There is a similar question you should have a look at:
Delete column from pandas DataFrame using del df.column_name
According to the answer,
`df = df.drop (['sodium'], 1, inplace=True)`
should rather be
df.drop (['sodium'], 1, inplace=True)
Although the first code,
df = df.drop(['sodium'],axis=1)
should work fine, if there is an error, try
print(df.columns)
to make sure that the columns were actually read from the csv file.
Use pd.read_csv(r'File_Path_with_name') and this should be sorted out, as there may be an issue with reading the csv file.
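For example, a hedged sketch of that suggestion (the path below is made up):
import pandas as pd

# Raw string avoids backslashes being treated as escape sequences on Windows.
df = pd.read_csv(r'C:\data\cereal.csv')  # hypothetical path
df = df.drop(['sodium'], axis=1)
print(df.columns)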
