Pandas can't find the relevant file - python

I'm trying to run a t-test in Python using the pandas module. However, the same error keeps occurring: my target file cannot be found. In this case, the target file is brain_size.csv, where the separators are semicolons and blank values are represented by a period.
Here's what I have keyed in:
import pandas as pd
data = pd.read_csv('This PC\Desktop\brain_size.csv', sep=';', na_values='.')
Here's the error message. It's a long string
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\Tina Gnali\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3427)
File "pandas\parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:6861)
OSError: File b'This PC\\Desktop\x08rain_size.csv' does not exist
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
I want to ask:
What am I doing wrong? Why can't I retrieve the target file?
Why does my error produce such a long error message?
What does the "parser" module do?

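The problem is the backslashes ("\"). In a regular Python string literal the backslash introduces an escape sequence, such as "\n" for a newline, so the "\b" in your path is read as a single backspace character rather than a backslash followed by "b". Use doubled backslashes "\\", forward slashes "/", or a raw string literal in your read_csv() call:
"C:\\Users\\blabla\\"
or
"C:/Users/blabla/"
or
r"C:\Users\blabla"
(A raw string literal cannot end in a single backslash, so the trailing separator is left off in the raw form.)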
To identify the error, look for the line that names the exception in the error message. It is here:
OSError: File b'This PC\\Desktop\x08rain_size.csv' does not exist
This tells you that Python is looking for a path that ends in '\x08rain_size.csv' rather than '\brain_size.csv', and obviously you don't have such a file. But what is \x08rain? Could it be that \b is replaced with \x08 when a backslash is placed in front of the b? Let's ask Python:
In [247]: '\b'
Out[247]: '\x08'
There we go!
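The same check can be run on all of the spellings suggested above. This is a minimal sketch using a made-up path (C:\temp\brain_size.csv is only an illustration, not the asker's real location); it was picked because both "\t" and "\b" are valid escape sequences.

```python
# Hypothetical path, chosen only to illustrate the escape-sequence problem.
broken = 'C:\temp\brain_size.csv'     # '\t' becomes a tab, '\b' a backspace
doubled = 'C:\\temp\\brain_size.csv'  # each '\\' is one literal backslash
forward = 'C:/temp/brain_size.csv'    # forward slashes need no escaping
raw = r'C:\temp\brain_size.csv'       # raw literal: backslashes kept as typed

# repr() makes the invisible control characters visible.
print(repr(broken))
print(repr(raw))

# The doubled and raw spellings are the same string; the broken one is not.
print(doubled == raw)
print(broken == raw)
```

The broken string is two characters shorter than the others, because each backslash-letter pair collapsed into one control character.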

Sometimes you may not be able to use
"C:\\Users\\blabla\\" or "C:/Users/blabla/"
Solution 1. Another option is to start the notebook from the folder that contains the file:
Open Anaconda Prompt or cmd, then change the path. Let's assume you are on drive C and your folder is on drive E. After opening cmd, type "e:" and press Enter; the prompt will change to "E:\>". Then type "cd E:\Users\blabla\desired_folder". After that, type "jupyter notebook" and run it. Jupyter will open in that folder, so a new notebook created there sits in the same folder as your file.
Solution 2. Another simple solution: after opening Jupyter Notebook, go to File, use the folder icon, and choose the right folder.
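Both solutions work by changing the folder that relative file names are resolved against. A small sketch for checking that folder from inside Python follows; the helper name locate is made up for this example, and brain_size.csv is the file name from the question.

```python
from pathlib import Path

def locate(filename):
    """Return the absolute path a relative filename resolves to, and whether it exists."""
    candidate = Path.cwd() / filename
    return candidate, candidate.exists()

# The working directory is wherever Python (or Jupyter) was started.
path, found = locate('brain_size.csv')
print('Working directory:', Path.cwd())
print('Looking for:', path, '(found)' if found else '(missing)')
```

If the file is reported missing, either move it into the printed working directory, start the session from the file's folder as described above, or call os.chdir() with the right folder before reading.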

Maybe the separator is different; it may be ",". Try that, and if it still doesn't work, try removing sep and na_values. Also try keeping the file in the same directory as the program, or give the full path.
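Rather than guessing the separator by hand, the standard library can guess it from the first few lines. This is only a sketch: the helper name guess_delimiter and the sample rows are invented for the example, and the sniffer can be fooled by unusual files.

```python
import csv

def guess_delimiter(sample, candidates=';,\t|'):
    """Guess the column separator from a text sample of the file."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter

# Inline stand-in for the first lines of a semicolon-separated file;
# the column names and values are made up, with '.' marking a blank value.
sample = 'id;weight;height\n1;118;64.5\n2;.;72.5\n3;143;73.3\n'
sep = guess_delimiter(sample)
print(repr(sep))
```

The guessed value can then be passed on, for example as pd.read_csv(path, sep=sep, na_values='.'). pandas can also do the sniffing itself when called with sep=None and engine='python'.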

Related

How to export .csv file from python and using pandas DataFrame

I am trying to export some filtered data from Python using Pandas DF to .csv file (Personal Learning project)
Code : df5.to_csv(r'/C:/Users/j/Downloads/data1/export.csv')
Error:
Traceback (most recent call last):
File "C:\Users\jansa\PycharmProjects\bbb\main.py", line 62, in <module>
df5.to_csv(r'/C:/Users/jansa/Downloads/data1/export.csv')
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\core\generic.py", line 3551, in to_csv
return DataFrameRenderer(formatter).to_csv(
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\io\formats\format.py", line 1180, in to_csv
csv_formatter.save()
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\io\formats\csvs.py", line 241, in save
with get_handle(
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\io\common.py", line 697, in get_handle
check_parent_directory(str(handle))
File "C:\Users\jansa\PycharmProjects\bbb\venv\lib\site-packages\pandas\io\common.py", line 571, in check_parent_directory
raise OSError(rf"Cannot save file into a non-existent directory: '{parent}'")
OSError: Cannot save file into a non-existent directory: '\C:\Users\jansa\Downloads\data1'
I am researching, but cannot pinpoint the error.
Try
df.to_csv(r'C:\path\to\directory\filename.csv')
Generally, on Linux/Mac the path separator is '/', but on Windows it is '\'. Also, an absolute path starts with '/' on Linux/Mac, while on Windows it starts with a drive letter such as 'C:'. So passing r'C:\Users\j\Downloads\data1\export.csv' to to_csv, without the leading '/', should resolve your issue.
In addition, if you want to avoid such situations, you can do this:
import os
path = os.path.join('.', 'export.csv') #will save the file in current directory
Also, this returns the os path separator:
print(os.sep)
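To see why the leading '/' matters, pathlib can parse both spellings without touching the disk. The sketch below also shows one way to make sure the target folder exists before writing; the helper name ensure_parent is made up, and a plain text write stands in for df5.to_csv so the example runs without pandas.

```python
from pathlib import Path, PureWindowsPath
import tempfile

# Parsing only, so this part behaves the same on any operating system.
with_slash = PureWindowsPath('/C:/Users/j/Downloads/data1/export.csv')
without_slash = PureWindowsPath('C:/Users/j/Downloads/data1/export.csv')
# With the leading slash, no drive letter is recognised.
print(repr(with_slash.drive), repr(without_slash.drive))

def ensure_parent(path):
    """Create the parent directory of path if it is missing, then return path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

# Demonstrated in a temporary folder so the sketch does not touch real data.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent(Path(tmp) / 'data1' / 'export.csv')
    target.write_text('a,b\n1,2\n')  # df5.to_csv(target) would go here
    created = target.exists()
print(created)
```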

Can't run any scripts, Traceback, FileNotFoundError: [Errno 2]

Earlier this week I put together my first ever working python script. I am trying to create another one in a different directory, but when I try to run any script at all (using Notepad++) I get an error that shows Python seems to be trying to access the old directory and not finding it, even though I haven't told it to look in the old directory. Now the original script doesn't work either. This is what the error message looks like, no matter what I try to run:
python "C:\Users\me\Documents\oldDirectory\oldScript.py"
Process started (PID=12884) >>>
Traceback (most recent call last):
File "C:\Users\me\Documents\oldDirectory\oldScript.py", line 13, in <module>
month = pd.read_csv(sheet)
File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 610, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 462, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 819, in __init__
self._engine = self._make_engine(self.engine)
File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 1050, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 1867, in __init__
self._open_handles(src, kwds)
File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 1362, in _open_handles
self.handles = get_handle(
File "C:\Users\me\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\common.py", line 642, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\me\Documents\oldDirectory\Table.csv'
<<< Process finished (PID=12884). (Exit code 1)
As you might guess from the error message, my original script used pd.read_csv() and accessed Table.csv. No matter what I try to run, I get this error. What's going on?
Check the path of the file you are reading again. If it is not wrong, try without the '.csv' at the end of the file path.
The problem was with how I was using Notepad++ and not the code I was running. To run a script in that environment you press F6 and a dialog pops up asking what you want to execute. This does NOT default to whichever .py you have open at the moment, so when I hit Ctrl+F6 to skip the dialog, it kept trying to run my old script.

'File does not exist' when passing a non-ASCII path to xarray.open_dataset

I have a problem when attempting to open a .nc file. For my college work I need to work with some data stored in .nc files, so I decided to give the 'xarray' library a go. The files are located on OneDrive. When passing the 'open_dataset' function a path that contains non-ASCII characters, the following error occurs:
import xarray as xr
path1 = (r'C:\Users\myname\OneDrive - Prirodoslovno-matematički fakultet'
'\DACCIWA\DATA\Sodar\Save_KIT_CM_20160702.nc')
ds = xr.open_dataset(path1)
Traceback (most recent call last):
File "C:\Users\petar\Desktop\Geofizika\5. Godina\KB - Research opportunity\test.py", line 9, in <module>
ds = xr.open_dataset(path1)
File "C:\Users\petar\Anaconda3\lib\site-packages\xarray\backends\api.py", line 499, in open_dataset
filename_or_obj, group=group, lock=lock, **backend_kwargs
File "C:\Users\petar\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py", line 389, in open
return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
File "C:\Users\petar\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py", line 335, in __init__
self.format = self.ds.data_model
File "C:\Users\petar\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py", line 398, in ds
return self._acquire()
File "C:\Users\petar\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py", line 392, in _acquire
with self._manager.acquire_context(needs_lock) as root:
File "C:\Users\petar\Anaconda3\lib\contextlib.py", line 112, in __enter__
return next(self.gen)
File "C:\Users\petar\Anaconda3\lib\site-packages\xarray\backends\file_manager.py", line 183, in acquire_context
file, cached = self._acquire_with_cache_info(needs_lock)
File "C:\Users\petar\Anaconda3\lib\site-packages\xarray\backends\file_manager.py", line 201, in _acquire_with_cache_info
file = self._opener(*self._args, **kwargs)
File "netCDF4\_netCDF4.pyx", line 2135, in netCDF4._netCDF4.Dataset.__init__
File "netCDF4\_netCDF4.pyx", line 1752, in netCDF4._netCDF4._ensure_nc_success
FileNotFoundError: [Errno 2] No such file or directory: b'C:\\Users\\petar\\OneDrive - Prirodoslovno-matemati\xc4\x8dki fakultet\\DACCIWA\\DATA\\Sodar\\Save_KIT_CM_20160702.nc'
I am confused since the file definitely is there (In the code above I replaced my name in the path with "myname", which does not contain non-ASCII characters). At first I thought this had to do something with OneDrive, but I created a folder on it with a path that does not contain non-ASCII characters, and it opens those no problem.
What I tried (although this was really just shotgunning, not familiar with encodings and such):
- input string as raw string (as you do to escape the slashes)
I noticed in the last line that the path string is preceded by the letter "b". Apparently this means that the string is a "bytes literal" and can only contain ASCII characters. In that case, why does xarray convert/interpret the string as a bytes literal? How would I go about opening the file?
Thanks for help!
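On the "b" prefix: the \xc4\x8d in the last traceback line is what the letter č looks like after UTF-8 encoding. A short sketch with a shortened, hypothetical copy of the path shows this. It assumes the lower layer encodes the path as UTF-8 before opening it, which matches the bytes in the traceback but is not confirmed here, and it does not by itself explain why the file is not found.

```python
# Shortened, hypothetical stand-in for the path in the question.
path = r'C:\Users\myname\OneDrive - Prirodoslovno-matematički fakultet'

encoded = path.encode('utf-8')
print(encoded)

# A bytes object can hold non-ASCII data; its repr just shows those
# bytes as \x escapes. Decoding gives back the original string unchanged.
has_escaped_letter = b'matemati\xc4\x8dki' in encoded
round_trips = encoded.decode('utf-8') == path
print(has_escaped_letter, round_trips)
```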

Pandas giving file not found error while trying to access csv via anaconda prompt

Beginner here. I'm trying to load this table via Python so I can figure out how to manipulate it and gain some insight, with the eventual intention of calculating the WOE and/or running a regression.
The command ran fine on a test db of two rows I created, so it must be something to do with the format of the CSV I'm trying to use. It's a file with 8000 customers and 50 associated variables, including some dates and then counts, sums and averages for 30, 60 and 90 day windows of a number of different factors. Could any of this be the reason I get the error message at the bottom?
(* are just redactions)
data = pd.read_csv("C:\Users\******\Desktop\*******.csv")
>>> data = pd.read_csv(r"C:\Users\******\Desktop\**************")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\******\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\******\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\******\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "C:\Users\******\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\******\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'C:\\Users\\******\\Desktop\\**************' does not exist: b'C:\\Users\\******\\Desktop\\**************'
....
Add r (raw string) before the opening quote:
data = pd.read_csv(r"C:\Users\******\Desktop\*******.csv")
You should replace each single backslash with a double backslash, like so
data = pd.read_csv("C:\\Users\\******\\Desktop\\*******.csv")
or prefix the path with r
data = pd.read_csv(r"C:\Users\******\Desktop\*******.csv")
See the Python documentation on string literals for a full description of which characters need escaping in Python strings.
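For paths under C:\Users there is an extra catch: in a regular string literal, "\U" starts a Unicode escape, so the unprefixed form does not even compile. The sketch below compiles the three spellings as source text to show the difference; the user name "me" and the file name are placeholders.

```python
# The source text of three assignments, held as data so each one can be
# compiled separately. The outer r'''...''' keeps the backslashes as typed.
snippets = {
    'plain': r'''p = "C:\Users\me\Desktop\file.csv"''',
    'doubled': r'''p = "C:\\Users\\me\\Desktop\\file.csv"''',
    'raw': r'''p = r"C:\Users\me\Desktop\file.csv"''',
}

results = {}
for name, source in snippets.items():
    try:
        namespace = {}
        exec(compile(source, '<demo>', 'exec'), namespace)
        results[name] = namespace['p']
    except SyntaxError as exc:
        # The plain spelling fails here because of the "\U" escape.
        results[name] = 'SyntaxError: ' + str(exc.msg)

for name, outcome in results.items():
    print(name, '->', outcome)
```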
It's better to create a separate folder and keep both your code and your CSV file in it.
Then read the file by its name only. Try pressing Tab while you are inside the parentheses,
because it will show suggestions where you can see whether the file is available or not.
df = pd.read_csv('filename.csv')

How to write on top of pandas HDF5 'read-only mode' files?

I am storing data using pandas built-in HDF5 methods.
Somehow, these HDF5 files were turned into 'read-only' files, and I am getting a lot of Opening xxx in read-only mode messages when I open those files in write mode and I can't write them, which is something I really need to do.
The thing I really don't understand so far is how come those files turned into read-only, as I am not aware of a piece of code that I wrote that may result in that behavior. (I have tried to check if the data stored in the HDF5 is corrupt, but I am able to read it and manipulate it, so it seems to be working just fine)
I have 2 questions:
How can I append data to those 'read-only mode' HDF5 files? (Can I convert them back to write mode or any other clever solution?)
Is there any pandas method that would change the HDF5 file to a 'read-only mode' by default so I can avoid turning those files into read-only in the first place?
Code:
The piece of code that raises this issue is the one I use to save the output I generated:
with pd.HDFStore('data/observer/' + self._currency + '_' + str(ts)) as hdf:
    hdf.append(key='observers', value=df, format='table', data_columns=True)
I also use this piece of code to manipulate the outputs that were generated previously:
for the_file in list_dir:
    if currency in the_file:
        temp_df = pd.read_hdf(folder + the_file)
        ...
I use some select commands as well to get specific columns from the data files:
with pd.HDFStore('data/observer/' + self.currency + '_' + timestamp) as hdf:
    df = hdf.select(key='observers', columns=[x, y])
Error Traceback:
File ".../data_processing/observer_data.py", line 52, in save_obs_to_pandas
hdf.append(key='observers', value=df, format='table', data_columns=True)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 963, in append
**kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 1341, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3930, in write
self.set_info()
File ".../venv/lib/python3.5/site-packages/pandas/io/pytables.py", line 3163, in set_info
self.attrs.info = self.info
File ".../venv/lib/python3.5/site-packages/tables/attributeset.py", line 464, in __setattr__
nodefile._check_writable()
File ".../venv/lib/python3.5/site-packages/tables/file.py", line 2119, in _check_writable
raise FileModeError("the file is not writable")
tables.exceptions.FileModeError: the file is not writable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".../general_manager.py", line 144, in <module>
gm.run()
File ".../general_manager.py", line 114, in run
list_of_observer_managers = self.load_all_observer_managers()
File ".../general_manager.py", line 64, in load_all_observer_managers
observer = currency_pool.map(self.load_observer_manager, list_of_currencies)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
tables.exceptions.FileModeError: the file is not writable
The issue at hand was that I had messed up the OS file permissions. The files I was trying to access belonged to root (as I had run the code that generated those files as root) and I was trying to access them with a user account.
I am running Debian, and the following command (as root) solved my issues:
chown -R user.user folder
This command recursively changes the owner and group of all files inside that folder to user.
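Before changing ownership, it can help to confirm from Python that the current user really lacks write access. A minimal sketch, assuming the symptom is purely a permissions problem; the helper name describe_access is made up, and a temporary file stands in for one of the HDF5 files under data/observer/.

```python
import os
import tempfile

def describe_access(path):
    """Report whether the current user can read and write path."""
    return {
        'exists': os.path.exists(path),
        'readable': os.access(path, os.R_OK),
        'writable': os.access(path, os.W_OK),
    }

# Demonstrated on a temporary file created by the current user.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
report = describe_access(demo_path)
os.remove(demo_path)
print(report)
```

A file that shows up as readable but not writable would be consistent with being able to read the data while appends fail.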
