I am using pd.to_datetime, and my code looks like this:
rate = pd.read_csv('P2training.csv', header=0)
rate['Date'] = pd.to_datetime(rate['Date'], format='%Y-%m-%d')
rate.set_index('Date', inplace=True, drop=True)
rate.tail(10)
print(rate)
In P2training.csv the first column is 'Date', and this code ran fine when I first downloaded the P2training dataset. However, after I opened the CSV file and saved it without changing anything, the code started to raise the errors below. If I replace the saved file with the originally downloaded one, the code runs properly again.
C:\Users\yaojia\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
Traceback (most recent call last):
File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 444, in _convert_listlike
values, tz = tslib.datetime_to_datetime64(arg)
File "pandas_libs\tslib.pyx", line 1810, in pandas._libs.tslib.datetime_to_datetime64 (pandas_libs\tslib.c:33275)
TypeError: Unrecognized value type:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/yaojia/.PyCharmEdu4.0/config/scratches/scratch_7.py", line 23, in
rate['Date'] = pd.to_datetime(rate['Date'], format='%Y-%m-%d')
File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 509, in to_datetime
values = _convert_listlike(arg._values, False, format)
File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 447, in _convert_listlike
raise e
File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 435, in _convert_listlike
require_iso8601=require_iso8601
File "pandas_libs\tslib.pyx", line 2355, in pandas._libs.tslib.array_to_datetime (pandas_libs\tslib.c:46617)
File "pandas_libs\tslib.pyx", line 2484, in pandas._libs.tslib.array_to_datetime (pandas_libs\tslib.c:44616)
ValueError: time data '12/31/1979' doesn't match format specified
Process finished with exit code 1
Could anyone give any hint what's going wrong?
I guess you opened the CSV with Excel? If so, Excel recognized that the 'Date' column contains dates, converted the column to its own date format (in your case 'month/day/year', as the '12/31/1979' in the error shows), and saved it that way, while your code expects 'year-month-day'.
I suggest opening and saving your CSV with a text editor, or changing the default Excel date format.
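If re-saving the file in a text editor is not an option, another approach is to parse the dates in the format the file now contains. This is only a minimal sketch, assuming the re-saved column looks like the '12/31/1979' value in the error message (month/day/year). The inline CSV text and the Rate column are made up for illustration and stand in for the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the re-saved P2training.csv (invented rows)
csv_text = "Date,Rate\n12/31/1979,1.5\n01/02/1980,1.6\n"

rate = pd.read_csv(io.StringIO(csv_text), header=0)

# Use the format that is actually in the file: month/day/year
rate["Date"] = pd.to_datetime(rate["Date"], format="%m/%d/%Y")
rate = rate.set_index("Date")
print(rate)
```

If the file may arrive in either format, checking a few raw values of the 'Date' column before choosing the format string is probably safer than guessing.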
gss = pd.read_hdf('gss.hdf5', 'gs')
This is the code I ran in VS Code, and I got this:
Traceback (most recent call last):
File "d:\pthon_txt\t.py", line 4, in <module>
gss = pd.read_hdf('gss.hdf5', 'gs')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Mohammed\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\pytables.py", line 442, in read_hdf
return store.select(
^^^^^^^^^^^^^
File "C:\Users\Mohammed\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\pytables.py", line 847, in select
raise KeyError(f"No object named {key} in the file")
KeyError: 'No object named gs in the file'
PS D:\pthon_txt>
I want to load this HDF file into a pandas DataFrame.
To find out which keys are stored in your HDF store, use the following code:
with pd.HDFStore('gss.hdf5') as store:
    print(store.keys())
After that, you will be able to load your data with the correct key:
gss = pd.read_hdf('gss.hdf5', <KEY>)
The error is saying that the key gs doesn't exist in the file. If there's only one key, you can use read_hdf without the key parameter, e.g.:
df = pd.read_hdf('gss.hdf5')
I'm new to coding (about 2 weeks in) and I just ran into a problem. I was following a tutorial, like most of us do in the beginning. The task was to add a new column called "Month". To do that, they suggest taking the first two characters of the column called "Order Date". I copied the code from the tutorial to the letter; the only difference is that I'm using PyCharm and they use Jupyter Notebook. I like PyCharm, so maybe someone knows how to solve this.
The code is the following:
import pandas as pd
import os
files = [file for file in os.listdir("./Files")]
allmonths = pd.DataFrame()
for file in files:
    df = pd.read_csv("./Files/" + file)
    allmonths = pd.concat([allmonths, df])
alldata = pd.read_csv("allmonths.csv")
### Month Column addition
alldata["Month"] = alldata["Order Date"].str[0:2]
allmonths['Month']
print(alldata.head())
The Traceback:
Traceback (most recent call last):
File "D:\Coding\Sales_Data\venv\lib\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Order Date'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\Coding\Sales_Data\Sales Anal.py", line 11, in
alldata["Month"] = alldata["Order Date"].str[0:2]
File "D:\Coding\Sales_Data\venv\lib\site-packages\pandas\core\frame.py", line 3505, in getitem
indexer = self.columns.get_loc(key)
File "D:\Coding\Sales_Data\venv\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'Order Date'
I know the problem has something to do with the column names, and maybe PyCharm can't read them from the CSV file. But I don't know how to solve it.
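One way to narrow this down is to print the column names pandas actually read, since a KeyError here means no column is named exactly "Order Date". The sketch below assumes the header has stray whitespace around the names, which is only one possible cause; the inline CSV is invented for illustration. Note too that the script as posted never writes allmonths to allmonths.csv, so the file read there may be out of date or have different columns.

```python
import io

import pandas as pd

# Invented sample data; the header deliberately has spaces around "Order Date"
csv_text = (
    "Order ID, Order Date ,Product\n"
    "1,04/19/19 08:46,USB-C Cable\n"
    "2,12/07/19 22:30,Monitor\n"
)
alldata = pd.read_csv(io.StringIO(csv_text))

# Show the exact column names, including any hidden whitespace
print(alldata.columns.tolist())

# Strip surrounding whitespace so "Order Date" can be looked up
alldata.columns = alldata.columns.str.strip()
alldata["Month"] = alldata["Order Date"].str[0:2]
print(alldata.head())
```

If the printed list does not contain anything resembling "Order Date", the CSV being read is probably not the file the tutorial expects.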
Could someone help me figure out why my file doesn't open?
import pandas as pd
file = "C://Dev//20211103_logfile Box 2.8.xlsx"
temp=pd.read_excel(file)
Here is the full error!
PS C:\Dev> & C:/Users/keyur/AppData/Local/Programs/Python/Python39/python.exe c:/Dev/test_excel.py
C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\workbook.py:88:
UserWarning: File contains an invalid specification for 20211103_logfile. This will be removed
warn(msg)
Traceback (most recent call last):
File "c:\Dev\test_excel.py", line 6, in <module>
temp=pd.read_excel(file)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 372, in read_excel
data = io.parse(
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 1272, in parse
return self._reader.parse(
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 537, in parse
sheet = self.get_sheet_by_index(asheetname)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_openpyxl.py", line 546, in get_sheet_by_index
self.raise_if_bad_sheet_by_index(index)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 468, in raise_if_bad_sheet_by_index
raise ValueError(
ValueError: Worksheet index 0 is invalid, 0 worksheets found
PS C:\Dev>
There is a problem with your Excel file.
Try creating a new Excel file and copying and pasting all the data into it, then try again. This method worked for me.
I am getting the following error when trying to export a pandas DataFrame to csv.
Traceback (most recent call last):
File "C:/Users/riley/PycharmProjects/EarlyPaidLoanReport/EarlyPaidOff.py", line 91, in <module>
LastTransactionDate.to_csv(LastTransactionDate, 'example.csv')
File "C:\Users\riley\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1344, in to_csv
formatter.save()
File "C:\Users\riley\Anaconda3\lib\site-packages\pandas\formats\format.py", line 1526, in save
compression=self.compression)
File "C:\Users\riley\Anaconda3\lib\site-packages\pandas\io\common.py", line 424, in _get_handle
f = open(path, mode, errors='replace')
TypeError: invalid file: AutoNumber LoanAgreementID \
I'm not sure why I am getting this error. I've been writing to csv using pandas many times in the past. Could someone please help to fix this error?
LastTransactionDate.to_csv(LastTransactionDate, 'example.csv')
Your syntax is wrong. Unless I am missing something, just do this:
LastTransactionDate.to_csv('example.csv')
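to_csv is called on the DataFrame itself, and its first argument is the output path, so passing the DataFrame again makes pandas try to open the frame as if it were a file name. A small sketch with made-up data: the column names are borrowed from the error message, while the values and the temporary output directory are assumptions for illustration:

```python
import os
import tempfile

import pandas as pd

# Made-up frame; only the column names come from the error message
LastTransactionDate = pd.DataFrame(
    {"AutoNumber": [1, 2], "LoanAgreementID": [101, 102]}
)

# Write to a temporary directory so the example leaves no stray files behind
out_path = os.path.join(tempfile.mkdtemp(), "example.csv")
LastTransactionDate.to_csv(out_path, index=False)

# Read the file back to confirm what was written
round_trip = pd.read_csv(out_path)
print(round_trip)
```

index=False is optional here; it just keeps the row index out of the file.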
I created a file by using:
store = pd.HDFStore('/home/.../data.h5')
and stored some tables using:
store['firstSet'] = df1
store.close()
I closed down python and reopened in a fresh environment.
How do I reopen this file?
When I go:
store = pd.HDFStore('/home/.../data.h5')
I get the following error.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/io/pytables.py", line 207, in __init__
self.open(mode=mode, warn=False)
File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/io/pytables.py", line 302, in open
self.handle = _tables().openFile(self.path, self.mode)
File "/apps/linux/python-2.6.1/lib/python2.6/site-packages/tables/file.py", line 230, in openFile
return File(filename, mode, title, rootUEP, filters, **kwargs)
File "/apps/linux/python-2.6.1/lib/python2.6/site-packages/tables/file.py", line 495, in __init__
self._g_new(filename, mode, **params)
File "hdf5Extension.pyx", line 317, in tables.hdf5Extension.File._g_new (tables/hdf5Extension.c:3039)
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "H5F.c", line 1582, in H5Fopen
unable to open file
File "H5F.c", line 1373, in H5F_open
unable to read superblock
File "H5Fsuper.c", line 334, in H5F_super_read
unable to find file signature
File "H5Fsuper.c", line 155, in H5F_locate_signature
unable to find a valid file signature
End of HDF5 error back trace
Unable to open/create file '/home/.../data.h5'
What am I doing wrong here? Thank you.
In my hands, the following approach works best:
df = pd.DataFrame(...)
# write
with pd.HDFStore('test.h5', mode='w') as store:
    store.append('df', df, data_columns=df.columns, format='table')
# read
with pd.HDFStore('test.h5', mode='r') as newstore:
    df_restored = newstore.select('df')
You could try doing instead:
store = pd.io.pytables.HDFStore('/home/.../data.h5')
df1 = store['firstSet']
or use the read method directly:
df1 = pd.read_hdf('/home/.../data.h5', 'firstSet')
Either way, you should have pandas 0.12.0 or higher...
I had the same problem and finally fixed it by installing the pytables module (next to the pandas modules which I was using):
conda install pytables
which got me numexpr-2.4.3 and pytables-3.2.0
After that it worked. I am using pandas 0.16.2 under python 2.7.9