Dropping a column using the pandas drop() function not working - Python

I have a file with time series data. From this file I want to remove the first column (containing the dates).
However, the following code:
from pandas import read_csv
dataset = read_csv('USrealGDPGrowthPred_Quarterly.txt', header=0)
dataset.drop('DATE', axis=1)
results in this error message:
ValueError: labels ['DATE'] not contained in axis
But the label is present in the file, as the screenshot of the data shows.
What is going on here? How can I get rid of that column?
UPDATE:
the following code:
dataset = read_csv('USrealGDPGrowthPred_Quarterly.txt', header=0, sep='\t')
dataset.drop('DATE', axis=1)
print(dataset.head(5))
does not result in an error message but doesn't drop the column either. The data looks like nothing happened.

So there are two problems here:
First, change the separator to tab, because read_csv defaults to sep=',', as cᴏʟᴅsᴘᴇᴇᴅ commented:
df = read_csv('USrealGDPGrowthPred_Quarterly.txt', header=0, sep='\t')
Or use read_table, which defaults to sep='\t':
df = read_table('USrealGDPGrowthPred_Quarterly.txt', header=0)
Second, drop() returns a new DataFrame, so assign the output back or use inplace=True:
dataset = dataset.drop('DATE', axis=1)
Or:
dataset.drop('DATE', axis=1, inplace=True)
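Both fixes can be combined in a minimal runnable sketch (using io.StringIO with hypothetical sample data in place of the original file):

```python
import pandas as pd
from io import StringIO

# Hypothetical tab-separated sample standing in for the original file
data = "DATE\tGDP\n1947-04-01\t1.5\n1947-07-01\t-0.8\n"

dataset = pd.read_csv(StringIO(data), header=0, sep='\t')

# drop() returns a new DataFrame, so assign the result back
dataset = dataset.drop('DATE', axis=1)
print(list(dataset.columns))  # ['GDP']
```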

I had a similar issue using df.drop(columns=['column']).
Adding inplace=True, i.e. df.drop(columns=['column'], inplace=True), fixed it for me, thank you!

Related

Cannot drop index column from DataFrame when convert to html

I'm trying to get rid of the index column when converting a DataFrame to HTML, but even though I reset the index or set index=False in to_html, it is still there, just with no values.
df = df.set_index(['ID','Name','PM', 'Theme'])['Score'].unstack()
df = df.reset_index()
df_HTML = df.to_html(table_id = "table_score", index=False, escape=False)
Any idea how to get rid of that, please?
Try this:
df = df.set_index(['ID','Name','PM', 'Theme'])['Score'].unstack()
df = df.reset_index(drop=True).drop('Theme',axis=1)
df_HTML = df.to_html(table_id = "table_score", index=False, escape=False)
The problem was caused because your Theme column seems to be part of your old index, and since you didn't drop it in the reset_index call, it stayed there.
If this alone doesn't work, just drop 'Theme' as well.
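A small sketch of the idea with made-up data ('ID' and 'Name' here are simplified stand-ins for the old index columns):

```python
import pandas as pd

# Made-up data: 'ID' and 'Name' stand in for the old index columns
df = pd.DataFrame({'ID': [1, 2], 'Name': ['a', 'b'], 'Score': [10, 20]})
df = df.set_index(['ID', 'Name'])

# reset_index(drop=True) discards the old index instead of re-inserting
# it as columns, so it cannot reappear as an empty column in the HTML
df = df.reset_index(drop=True)
html = df.to_html(table_id="table_score", index=False, escape=False)

print('Score' in html)  # True
print('ID' in html)     # False: the old index columns are gone
```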

Cannot load data from spreadsheet properly

I have a spreadsheet looking like this:
I'm trying to read it into dataframe:
def loading_nasdaq_info_from_spreadsheet():
    excel_file = 'nasdaq.xlsx'
    nasdaq_info_dataframe = pandas.read_excel(excel_file, index_col=0)
    # data cleaning
    nasdaq_info_dataframe.dropna()
    return nasdaq_info_dataframe

if __name__ == '__main__':
    df = loading_nasdaq_info_from_spreadsheet()
    print(df.loc['symbol'])
I constantly get
"raise KeyError(key) from err KeyError: 'Symbol'"
It doesn't matter which key I want to print or use; it is always the same error. What's even worse, even after I manually (in Excel) set everything to text, when I try
nasdaq_info_dataframe.applymap(lambda text: text.strip())
I get
AttributeError: 'float' object has no attribute 'strip'
I've been fighting with this for a few hours now, so please help me.
EDIT:
Printing
print(df.loc)
gives
<pandas.core.indexing._LocIndexer object at 0x1160e8778>
Printing
print(df.columns)
gives
Index(['Name', 'Sector', 'Industry'], dtype='object')
Furthermore, if I remove the multi-index by removing index_col=0, I still get the same KeyError when printing df.loc['Symbol'].
Printing df.head() gives
The problem is in df.loc['symbol'].
Use df.loc[:, 'Symbol'] or df['Symbol'] instead.
If 'Symbol' is the DataFrame's index, then apply df = df.reset_index() first.
You can find more detail in the pandas official guide, Indexing and selecting data.
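A short sketch of the difference, with made-up data:

```python
import pandas as pd

# Made-up data resembling the spreadsheet
df = pd.DataFrame({'Symbol': ['AAPL', 'MSFT'], 'Name': ['Apple', 'Microsoft']})

# df.loc['Symbol'] looks 'Symbol' up among the ROW labels, hence the KeyError
try:
    df.loc['Symbol']
except KeyError:
    print("KeyError: 'Symbol' is not a row label")

# Column access works either way:
print(df['Symbol'].tolist())         # ['AAPL', 'MSFT']
print(df.loc[:, 'Symbol'].tolist())  # ['AAPL', 'MSFT']

# If 'Symbol' became the index (e.g. via index_col=0), move it back first
df2 = df.set_index('Symbol').reset_index()
print('Symbol' in df2.columns)       # True
```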

When trying to drop a column from my dataset using pandas, I get the error "['churn'] not found in axis"

I want x to be all the columns except the "churn" column.
But when I do the below I get the "['churn'] not found in axis" error, even though I can see the column name in the output of print(list(df.columns)).
Here is my code:
import pandas as pd
import numpy as np
df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0)
print(df.head())
print(df.columns)
print(len(df.columns))
x = df.drop(["churn"], axis=1) ## this is the part it gives the error
I am adding a snippet of my dataset as well:
account_length;area_code;international_plan;voice_mail_plan;number_vmail_messages;total_day_minutes;total_day_calls;total_day_charge;total_eve_minutes;total_eve_calls;total_eve_charge;total_night_minutes;total_night_calls;total_night_charge;total_intl_minutes;total_intl_calls;total_intl_charge;number_customer_service_calls;churn;
1;KS;128;area_code_415;no;yes;25;265.1;110;45.07;197.4;99;16.78;244.7;91;11.01;10;3;2.7;1;no
2;OH;107;area_code_415;no;yes;26;161.6;123;27.47;195.5;103;16.62;254.4;103;11.45;13.7;3;3.7;1;no
3;NJ;137;area_code_415;no;no;0;243.4;114;41.38;121.2;110;10.3;162.6;104;7.32;12.2;5;3.29;0;no
I see that your data snippet is separated with ';' (semicolon). If that is what your actual data looks like, then your CSV is probably being read wrong. Try adding sep=';' to the read_csv call:
df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0, sep=';')
I also suggest printing df.columns again and checking whether the 'churn' column name has leading or trailing whitespace.
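A runnable sketch of both suggestions, using a two-column stand-in for the real file:

```python
import pandas as pd
from io import StringIO

# Hypothetical two-column sample in the same semicolon-separated layout
data = "account_length;churn\n128;no\n107;yes\n"

# With the default sep=',', the whole header becomes ONE column name,
# so 'churn' is not a label anywhere
wrong = pd.read_csv(StringIO(data), header=0)
print('churn' in wrong.columns)  # False

right = pd.read_csv(StringIO(data), header=0, sep=';')
right.columns = right.columns.str.strip()  # guard against stray whitespace
x = right.drop(['churn'], axis=1)
print(list(x.columns))  # ['account_length']
```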

DataFrame values converted to NaN after applying df.iloc()

I ran into a problem after running pd.DataFrame(): the whole DataFrame became NaN (empty), and I could not reverse this. I also assigned the DataFrame column names, but their values disappeared as well:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('PuntaCapi.csv', header=None, sep='\n')
df = df[0].str.split(',', expand=True)
df.to_csv("PuntaCapi.tab",sep="\t",header=None, index=False)
print(df)
Akim =df.iloc[:,0:1]
A= pd.DataFrame(data =Akim ,columns=['Akim'])
veriler2 = pd.DataFrame(data = df, columns=['Akim','Kuvvet','Zaman','Soguma','Yaklasma','Baski','SacKalinliği','PuntaCapi'])
print(veriler2)
Please view the results of the above code in the screenshot.
There is no NaN value in the CSV file, but after .iloc[], the entire DataFrame became NaN. I have tried to solve the problem but could not. I need help solving it.
I do not fully understand your question.
You read the data using pd.read_csv('PuntaCapi.csv', header=None, sep='\n') and save it as df, but then you replace df with df[0].str.split(',', expand=True), which directly affects the result.
Try this code.
df = pd.read_csv('PuntaCapi.csv', header=None, sep='\n')
veriler2 = pd.DataFrame(data = df.values, columns=['Akim','Kuvvet','Zaman','Soguma','Yaklasma','Baski','SacKalinliği','PuntaCapi'])
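The reason the values turned into NaN: pd.DataFrame(data=df, columns=...) aligns on df's existing column labels (the integers produced by split), and none of the new names match. Passing df.values avoids that alignment. A sketch with made-up rows:

```python
import pandas as pd

# Made-up stand-in for the split result: integer column labels 0..3
lines = pd.Series(["1,2,3,4", "9,10,11,12"])
df = lines.str.split(',', expand=True)

# Passing the DataFrame itself aligns on the old labels -> all NaN
nan_df = pd.DataFrame(data=df, columns=['Akim', 'Kuvvet'])
print(nan_df.isna().all().all())  # True

# Passing df.values drops the labels, so the new names apply cleanly
ok = pd.DataFrame(data=df.values[:, :2], columns=['Akim', 'Kuvvet'])
print(ok['Akim'].tolist())  # ['1', '9']
```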

Sum all columns in each column in csv using Pandas

The program I have written has generally done what I wanted it to do, for the most part: adding up the totals of each column. My DataFrame comes from a CSV file. My code is below:
import pandas as pd
import matplotlib.pyplot
class ColumnCalculation:
    """This houses the functions for all the column manipulation calculations"""

    def max_electricity(self):
        df.set_index('Date', inplace=True)
        df.loc['Total'] = df.sum()
        print(df)

df = pd.read_csv("2011-onwards-city-elec-consumption.csv")
ColumnCalculation.max_electricity(df)
Also my dataset (I didn't know how to format it properly)
The code nicely adds up all totals into a total column at the bottom of each column, except when it comes to the last column(2017)(image below):
I am not sure why the program does this; I've tried different indexing options like .iloc or .ix but it doesn't seem to make a difference. I have also tried adding each column individually (below):
def max_electricity(self):
    df.set_index('Date', inplace=True)
    df.loc['Total', '2011'] = df['2011'].sum()
    df.loc['Total', '2012'] = df['2012'].sum()
    df.loc['Total', '2013'] = df['2013'].sum()
    df.loc['Total', '2014'] = df['2014'].sum()
    df.loc['Total', '2015'] = df['2015'].sum()
    df.loc['Total', '2016'] = df['2016'].sum()
    df.loc['Total', '2017'] = df['2017'].sum()
    print(df)
But I receive an error, and I assume this approach is excessive anyway. I've tried to figure this out for a good hour and a bit.
Your last column isn't being parsed as floats, but as strings.
To fix this, try casting to numeric before summing:
import locale
locale.setlocale(locale.LC_NUMERIC, '')
df['2017'] = df['2017'].map(locale.atoi)
Better still, try reading in the data as numeric data. For example:
df = pd.read_csv('file.csv', sep='\t', thousands=',')
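A runnable sketch with made-up numbers showing the effect of thousands=',':

```python
import pandas as pd
from io import StringIO

# Made-up sample: the 2017 column contains thousands separators
data = "Date\t2016\t2017\nJan\t1000\t1,250\nFeb\t2000\t2,750\n"

# thousands=',' makes '1,250' parse as the number 1250 instead of a string
df = pd.read_csv(StringIO(data), sep='\t', thousands=',')
df.set_index('Date', inplace=True)
df.loc['Total'] = df.sum()
print(df.loc['Total', '2017'])  # 4000
```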
