I am reading a CSV file using this piece of code:
import pandas as pd
import os
#Open document (.csv format)
path=os.getcwd()
database=pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0)
# Date in the required format
size=len(database.Date)
I get the following error: 'DataFrame' object has no attribute 'Date'
As you can see in the image, the first column of mydocument.csv is called Date. This is strange because I have used this same procedure with this document before, and it worked.
Try using delimiter=','. It must be a comma.
(Can't post comments yet, but think I can help).
The explanation for your problem is simple: you don't have a column called Date. Has pandas interpreted Date as an index column?
If your Date is spelled correctly (no trailing whitespace or anything else that might confuse things), then try this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=False)
or, if that fails, this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=None)
In my experience, I've had pandas unexpectedly infer an index_col.
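If you want to confirm what pandas actually parsed, a quick check of the columns and the index (reusing the same mydocument.csv path from the question) might look like this:
import os
import pandas as pd

path = os.getcwd()
database = pd.read_csv(path + "/mydocument.csv", delimiter=';', header=0)

# Inspect what pandas actually produced
print(database.columns.tolist())  # is 'Date' (or ' Date') listed here?
print(database.index.name)        # or did Date end up as the index?

# If Date was pulled in as the index, move it back to a regular column
if database.index.name == "Date":
    database = database.reset_index()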
I have a comma-separated string. When I export it to a CSV file and then load the resulting CSV in pandas, I get my desired result. But when I try to load the string directly into a pandas dataframe, I get the error "Filename too big". Please help me solve it.
I found the cause. It was showing that behaviour for large strings.
I used
import io
df = pd.read_csv(io.StringIO(str), sep=',', engine='python')
It solved the issue. (Here str is the name of the string variable.)
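For completeness, a minimal self-contained sketch of that approach (the string content here is made up for illustration):
import io
import pandas as pd

# A comma-separated string (hypothetical data)
csv_text = "name,score\nalice,10\nbob,12"

# Wrap the string in a file-like object so read_csv can consume it
df = pd.read_csv(io.StringIO(csv_text), sep=',', engine='python')
print(df)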
I have been trying to rename a column in a CSV file that I have been working on in Google Colab. The same line of code works for one column name but does not work for the other.
import pandas as pd
import numpy as np
data = pd.read_csv("Daily Bike Sharing.csv",
                   index_col="dteday",
                   parse_dates=True)
dataset = data.loc[:, ["cnt", "holiday", "workingday", "weathersit",
                       "temp", "atemp", "hum", "windspeed"]]
dataset = dataset.rename(columns={'cnt' : 'y'})
dataset = dataset.rename(columns={"dteday" : 'ds'})
dataset.head(1)
The image below is the dataframe called data.
The image below is dataset.
This image is the final output I get when I try to rename the dataframe.
The column name "dtedate" is not getting renamed but "cnt" is getting replaced "y" by the same code. Can someone help me out, I have been racking my brain on this for sometime now.
That's because you're setting dteday as your index when reading in the csv, whereas cnt is simply a column. Avoid the index_col argument in read_csv and instead perform dataset = dataset.set_index('ds') after renaming.
An alternative in which only your penultimate line (trying to rename the index) would need to be changed:
dataset.index.names = ['ds']
You can remove the index_col in the read statement, include 'dteday' in your dataset, and then change the column name. You can make that column the index later using df.set_index.
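Putting those suggestions together, a hedged sketch (assuming the same Daily Bike Sharing.csv file and columns) might look like this:
import pandas as pd

# Read without index_col so dteday stays an ordinary column
data = pd.read_csv("Daily Bike Sharing.csv", parse_dates=["dteday"])

dataset = data.loc[:, ["dteday", "cnt", "holiday", "workingday", "weathersit",
                       "temp", "atemp", "hum", "windspeed"]]

# Rename the columns, then promote the renamed date column to the index
dataset = dataset.rename(columns={"cnt": "y", "dteday": "ds"})
dataset = dataset.set_index("ds")
dataset.head(1)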
When I run the following code
import glob
import os
import pandas as pd

dirpath = os.getcwd()
inputdirectory = dirpath
for xls_file in glob.glob(os.path.join(inputdirectory, "*.xls*")):
    data_xls = pd.read_excel(xls_file, sheet_name=0, index_col=None)
    csv_file = os.path.splitext(xls_file)[0] + ".csv"
    data_xls.to_csv(csv_file, encoding='utf-8', index=False)
it converts all xls files in the folder into CSVs, as I want.
However, in doing so, any dates such as 20/12/2018 are converted to 20/12/2018 00:00:00, which is causing major issues with later data processing.
What is going wrong with this?
Nothing is "going wrong" per se. You simply need to provide a custom date_format to df.to_csv:
date_format : string, default None
Format string for datetime objects
In your case that would be
data_xls.to_csv(csv_file, encoding='utf-8', index=False, date_format='%d/%m/%Y')
This will fix the way the raw data is saved to the file. If you open the file in Excel you may still see the full datetime format, because Excel tries to guess cell formats based on their content. You will need to right-click the column and select another cell format; there is nothing pandas or Python can do about that (as long as you are using to_csv and not to_excel).
If the above answers still don't work, try this:
xls_data['date'] = pd.to_datetime(xls_data['date'], format="%d/%m/%Y")
xls_data['date'] = xls_data['date'].dt.date  # keep only the date part
The original xls file is actually storing these fields as datetimes.
When you open it with Excel, you see them formatted the way Excel thinks you want to see them, based on your settings / OS locale / etc.
When Python reads the file, the date cells become Python datetime objects.
CSV files are basically just text; they cannot hold datetime objects.
When Python needs to write a datetime object to a text file, it writes out the full text.
So you have 2 options:
Change the date column in the original file to a text type,
or, the better option:
Use Python to iterate over these fields and convert them to the text format you would like to see in the csv (a sketch follows below).
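A minimal sketch of that second option, assuming a column named date like the one in the examples (the file names here are hypothetical), could convert the column to formatted text with dt.strftime before writing:
import pandas as pd

data_xls = pd.read_excel("input.xls", sheet_name=0, index_col=None)  # hypothetical file

# Turn the datetime column into plain text in the desired format
data_xls["date"] = pd.to_datetime(data_xls["date"]).dt.strftime("%d/%m/%Y")

data_xls.to_csv("output.csv", encoding="utf-8", index=False)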
I just tried to reproduce your issue with no success:
>>>import pandas as pd
>>>xls_data = pd.read_excel('test.xls', sheet_name=0, index_col=None)
>>>xls_data
name date
0 walla 1988-12-10
1 cool 1999-12-10
>>>xls_data.to_csv(encoding='utf-8', index=False)
'name,date\nwalla,1988-12-10\ncool,1999-12-10\n'
P.S. Any time you deal with datetime objects you should test the result to see if anything changes based on your PC's locale settings.
I'm trying to loop through a folder of csv's and put them into a dataframe, changing certain columns into integers, before passing them through a Django model. Here is my code:
import glob
import pandas as pd

path = 'DIV1FCS_2017/*/*'
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    df['Number'].apply(pd.to_numeric)
I am receiving the following error: ValueError: Unable to parse string
Does anybody know if I can convert a column of strings into integers using pd.to_numeric from within a loop? Outside of the loop it seems to work properly.
I think you probably have some non-numeric data stored in your dataframe, and that's what's causing the error.
You can examine your data and make sure everything's fine. In the meantime, you can also pass errors="ignore" to pd.to_numeric to ignore the errors for now.
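As a sketch of how that could look inside the loop from the question (using errors='coerce' instead of errors='ignore', so failing values become NaN and can be inspected):
import glob
import pandas as pd

path = 'DIV1FCS_2017/*/*'
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    # Assign the result back; apply() alone does not change the column
    df['Number'] = pd.to_numeric(df['Number'], errors='coerce')
    # Rows that failed to parse show up as NaN and can be inspected
    print(df[df['Number'].isna()])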
I want to read this csv file in pandas as a DataFrame. Then I would like to split the resulting strings from colons.
I import using:
df_r = pd.read_csv("report.csv", sep=";|,", engine="python")
Then split using:
for c in df_r:
    if df_r[c].dtype == "object":
        df_r[c] = df_r[c].str.split(':')
But I get the following error:
ValueError: could not convert string to float: '"\x001\x000\x00'
Any idea what I am doing wrong?
Edit:
The error actually shows when I try to convert one of the strings to a float
print(float(df_r["Laptime"].iloc[0][2]))
I ran your code and everything works fine. You can try catching the error, printing the row that shows that strange behaviour, and manually inspecting it.
Is that the entire dump you are using? I saw that you are assigning the csv to the variable a but using df_r afterwards, so I think you are doing something else in between.
If the csv file is complete, be aware that the last line is empty and creates a row full of NaNs. You may want to read the csv with skipfooter=1.
a = pd.read_csv('report.csv', sep=";|,", engine="python", skipfooter=1)
Edit:
You can convert it to float like this:
print(float(df_r["Laptime"].iloc[0][2].replace(b'\x00',b'')))
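If the null bytes appear throughout the column, one hedged sketch (assuming the affected columns are ordinary strings) is to strip them right after reading, before splitting:
import pandas as pd

df_r = pd.read_csv("report.csv", sep=";|,", engine="python", skipfooter=1)

for c in df_r:
    if df_r[c].dtype == "object":
        # Remove embedded null bytes, then split on ':'
        df_r[c] = df_r[c].str.replace('\x00', '', regex=False).str.split(':')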