File exists but python says 'does not exist' - python

I'm running this code that reads 2 csv files (one of them is train.csv). The code gives an error saying 'file not does not exist'. However, the file does exists in the same location as the .py file. Can someone please help me on this. Thanks!
Reading dataset...
Traceback (most recent call last):
File "c:\Project_1\regression_2.py", line 163, in <module>
main(**args)
File "c:\Project_1\regression_2.py", line 80, in main
train_data = pd.read_csv(train)
File "c:\Python27\lib\site-packages\pandas\io\parsers.py", line 401, in parser_f
return _read(filepath_or_buffer, kwds)
File "c:\Python27\lib\site-packages\pandas\io\parsers.py", line 209, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "c:\Python27\lib\site-packages\pandas\io\parsers.py", line 509, in __init__
self._make_engine(self.engine)
File "c:\Python27\lib\site-packages\pandas\io\parsers.py", line 611, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "c:\Python27\lib\site-packages\pandas\io\parsers.py", line 893, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 312, in pandas._parser.TextReader.__cinit__
(pandas\src\parser.c:2846)
File "parser.pyx", line 512, in pandas._parser.TextReader._setup_parser_source
(pandas\src\parser.c:4893)
IOError: File train.csv does not exist
The variable is being referred to as ->
def main(train='train.csv', test='test.csv', submit='logistic_pred.csv'):
print "Reading dataset..."
train_data = pd.read_csv(train)
test_data = pd.read_csv(test)

You are opening a relative path, but your working directory is not what you think it is.
Use an absolute path instead:
train = os.path.join('c:/Documents and Settings', train)
Without an absolute path, Python uses the current working directory. What that directory is depends on how you ran your script, and is not something you should rely on.

Related

Can't open some .csv file using read_csv()

I am writing a Python function to open two .csv files and make changes to the data inside. I am using pandas and pd.read_csv('text') to open the files. Everything works well and the function works for one .csv file. However, when I try it on a different smaller .csv file the file cannot even open.
This is part of the error I am getting when I try to open the .csv file.
Traceback (most recent call last):
File "C:\Users\...\Downloads\test\test.py", line 3, in <module>
df = pd.read_csv('data2.csv')
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 611, in _read
return parser.read(nrows)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 230, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 8836, saw 5
This is the code I am using to access the .csv files.
import pandas as pd
df = pd.read_csv('test.csv')
All the files are in the correct folders and the file paths are all correct. Any help is appreciated, thanks

BadZipFile: file is not a zip file in reading excel file using pandas

I'm trying to execute the following code and I'm constantly experiencing this issue.
import pandas as pd
df = pd.read_excel('First_Run.xlsx', engine='openpyxl')
print(df.head())
I've make sure that excel file is there at the respective path. Have tried multiple ways to resolve the issue but failed to find the desired solution.
Here's the output of the code block.
Traceback (most recent call last):
File "c:\Users\fharookshaik\Desktop\Gmail Bot\temp.py", line 7, in <module>
df = pd.read_excel('First_Run.xlsx',engine='openpyxl')
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_base.py", line 1131, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_openpyxl.py", line 475, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_base.py", line 391, in __init__
self.book = self.load_workbook(self.handles.handle)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_openpyxl.py", line 486, in load_workbook
return load_workbook(
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\zipfile.py", line 1269, in __init__
self._RealGetContents()
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\zipfile.py", line 1336, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Hope, this will be answered soon by the brilliant minds on this dynamic community.
Thanks in well Advance. šŸ™‚

Python shell csv problem FileNotFoundError: [Errno 2] No such file or directory: 'iris.csv'

iris_df = pd.read_csv("iris.csv")
Traceback (most recent call last):
File "<pyshell#12>", line 1, in <module>
iris_df = pd.read_csv("iris.csv")
File "C:\Python 3.8.1\lib\site-packages\pandas\io\parsers.py", line 610, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Python 3.8.1\lib\site-packages\pandas\io\parsers.py", line 462, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python 3.8.1\lib\site-packages\pandas\io\parsers.py", line 819, in __init__
self._engine = self._make_engine(self.engine)
File "C:\Python 3.8.1\lib\site-packages\pandas\io\parsers.py", line 1050, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "C:\Python 3.8.1\lib\site-packages\pandas\io\parsers.py", line 1867, in __init__
self._open_handles(src, kwds)
File "C:\Python 3.8.1\lib\site-packages\pandas\io\parsers.py", line 1362, in _open_handles
self.handles = get_handle(
File "C:\Python 3.8.1\lib\site-packages\pandas\io\common.py", line 642, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'iris.csv'
What should i do? I imported pandas as pd but it gives me error.Please help
That's probably because you don't have the file in the exact folder as your code file. Try keeping them both in the same folder and try this:
import pandas as pd
iris_df = pd.read_csv('./iris.csv')
Or you can also try copying the exact path to your iris.csv file and then load it using:
iris_df = pd.read_csv(' #path to your iris.csv file')
For relative path the ./iris.csv tells the function to look in the same folder as that of the code file for the iris.csv file. This however won't work if your iris.csv file is in another folder.
In that case:
If you are on a Linux based system your exact path would be something like:
home/usr/Desktop/FolderName/iris.csv
For a Windows system the absolute path would be something like:
C:/Desktop/FolderName/iris.csv

ValueError: Missing column provided to 'parse_dates': 'CRASH_DATE, CRASH_TIME'

I made a streamlit app. It works fine when I run it locally.
But, after I push it to heroku, I got this value error on the parse_dates:
ValueError: Missing column provided to 'parse_dates': 'CRASH_DATE, CRASH_TIME'
Traceback:
File "/app/.heroku/python/lib/python3.6/site-packages/streamlit/script_runner.py", line 324, in _run_script
exec(code, module.__dict__)
File "/app/app.py", line 35, in <module>
data = load_data(100000)
File "/app/.heroku/python/lib/python3.6/site-packages/streamlit/caching.py", line 591, in wrapped_func
return get_or_create_cached_value()
File "/app/.heroku/python/lib/python3.6/site-packages/streamlit/caching.py", line 575, in get_or_create_cached_value
return_value = func(*args, **kwargs)
File "/app/app.py", line 27, in load_data
data = pd.read_csv(DATA_URL, nrows= rows, parse_dates=[['CRASH_DATE', 'CRASH_TIME']])
File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
self._make_engine(self.engine)
File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 2068, in __init__
self._validate_parse_dates_presence(self.names)
File "/app/.heroku/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 1546, in _validate_parse_dates_presence
f"Missing column provided to 'parse_dates': '{missing_cols}'"
This is the code to read the csv:
DATA_URL = ("https://github.com/chairielazizi/streamlit-collision/blob/master/Motor_Vehicle_Collisions_-_Crashes.csv")
#st.cache(persist=True)
def load_data(rows):
data = pd.read_csv(DATA_URL, nrows= rows, parse_dates=[['CRASH_DATE', 'CRASH_TIME']])
# data.seek(0)
data.dropna(subset =['LATITUDE', 'LONGITUDE'], inplace=True)
lowercase = lambda x: str(x).lower()
data.rename(lowercase,axis='columns',inplace=True)
data.rename(columns={'crash_date_crash_time':'date/time'},inplace=True)
return data
data = load_data(100000)
I tried changing the streamlit and pandas version but still got error. Error only happen when I set the DATA_URL to the csv file store in GitHub, but it works fine if I set it to my local file.
The code and the csv file for the project is here:
https://github.com/chairielazizi/streamlit-collision
When you reference a file on GitHub, you need to make sure you are accessing the "raw" version, not the version from the GitHub interface. Adding the ?raw=true parameter to your url should work:
https://github.com/chairielazizi/streamlit-collision/blob/master/Motor_Vehicle_Collisions_-_Crashes.csv?raw=true

Encoding with pandas.read_csv when file name has accents

I'm trying to load a CSV with pandas, but am running into a problem if the file name has accents. It's clearly an encoding problem, but although read_csv lets you set encoding for text within the file, I can't figure out how to encode the file name properly.
input_file = r'C:\...\Datasets\%s\Provinces\Points\%s.csv' % (country, province)
self.locs = pandas.read_csv(input_file,sep=',',skipinitialspace=True)
The CSV file is AnzoƔtegui.csv. When I'm getting errors,
input_file = 'C:\\...\Datasets\Venezuela\Provinces\Points\AnzoƔtegui.csv
Error code:
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
So maybe it's converting my string to bytes? I tried using io.StringIO(input_file) as well, which puts the correct file name as a column header on an empty DataFrame:
Empty DataFrame
Columns: [C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\AnzoƔtegui.csv]
Index: []
Any ideas on how to get this file to load? Unfortunately I can't just strip out accents, as I have to interface with software that requires the proper name, and I have a ton of files to format (not just the one). Thanks!
Edit: Full error
Traceback (most recent call last):
File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
result = eval(compiled, updated_globals, frame.f_locals)
File "<string>", line 1, in <module>
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
self._make_engine(self.engine)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
Ok folks, I got a little lost in dependency hell, but it turns out that this issue was fixed in pandas 0.14.0. Install the updated version to get files named with accents to import correctly.
Comments at github.
Thanks for the input!

Categories

Resources