Why can't I access the excel file in python? - python

I'm trying to use some data that I have in an excel file. However, I'm getting an error saying that it doesn't find the file. I've looked up and the directory and the file name are correct, What am I doing wrong?
Here is the code:
import os
import pandas as pd
print(os.getcwd())
df = pd.read_excel(r'C:/Users/Eder/Desktop/TFG/Data/Interpolation_sample.xlsx',
index_col =0,parse_dates=True, sheet_name='sheet3')
And the answer from the console:
runcell(0, 'C:/Users/Eder/untitled0.py')
C:\Users\Eder\Desktop\TFG\Data
Traceback (most recent call last):
File "C:\Users\Eder\untitled0.py", line 14, in <module>
index_col =0,parse_dates=True, sheet_name='sheet3')
File "E:\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "E:\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "E:\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 1072, in __init__
content=path_or_buffer, storage_options=storage_options
File "E:\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 950, in inspect_excel_format
content_or_path, "rb", storage_options=storage_options, is_text=False
File "E:\Anaconda3\lib\site-packages\pandas\io\common.py", line 651, in get_handle
handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Eder\\Desktop\\TFG\\Data\\Interpolation_sample.xlsx'

I've figured out a way to solve the problem. I just changed the name of the file from 'Interpolation_sample' to 'Interpolation sample'. I don't know why, but the underscore in the file name is what was causing this error.

Related

ValueError: Worksheet index 0 is invalid, 0 worksheets found.Cannot open xlsx with pandas in python

Could someone help me figure out why my files dont open.
import pandas as pd
file = "C://Dev//20211103_logfile Box 2.8.xlsx"
temp=pd.read_excel(file)
Here is the full error!
PS C:\Dev> & C:/Users/keyur/AppData/Local/Programs/Python/Python39/python.exe c:/Dev/test_excel.py
C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\reader\workbook.py:88:
UserWarning: File contains an invalid specification for 20211103_logfile. This will be removed
warn(msg)
Traceback (most recent call last):
File "c:\Dev\test_excel.py", line 6, in <module>
temp=pd.read_excel(file)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 372, in read_excel
data = io.parse(
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 1272, in parse
return self._reader.parse(
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 537, in parse
sheet = self.get_sheet_by_index(asheetname)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_openpyxl.py", line 546, in get_sheet_by_index
self.raise_if_bad_sheet_by_index(index)
File "C:\Users\keyur\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 468, in raise_if_bad_sheet_by_index
raise ValueError(
ValueError: Worksheet index 0 is invalid, 0 worksheets found
PS C:\Dev>
There are problem with your excel,
try make a new excel and copy pase all data ,then try again ,this method works for me.

Python Pandas Errno 13 while saving dataframe to xlsx

I am slowly loosing my mind. I'm trying to save pandas dataframe to xlsx.
save_path = self.SAVE_L['text']+ f"\df_{part}_.xlsx"
writer = pandas.ExcelWriter(save_path , engine='xlsxwriter',date_format='DD/MM/YYYY',datetime_format='DD/MM/YYYY')
df.to_excel(writer , index=False, sheet_name='df', engine='xlsxwriter')
writer.save()
However, for bigger df (but still in range of excel files - around 550 000 lines) I'm getting this error:
//NETWORK_DISK/save_path\df_part1_.xlsx
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Program Files\Python37\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "PATH to script/Prog 0.31.py", line 460, in Process
scr_t=self.scr_list)
File "PATH to script/Prog 0.31.py", line 829, in script_PROC
writer.save()
File "C:\Program Files\Python37\lib\site-packages\pandas\io\excel.py", line 1952, in save
return self.book.close()
File "C:\Program Files\Python37\lib\site-packages\xlsxwriter\workbook.py", line 310, in close
self._store_workbook()
File "C:\Program Files\Python37\lib\site-packages\xlsxwriter\workbook.py", line 636, in _store_workbook
xlsx_file.write(os_filename, xml_filename)
File "C:\Program Files\Python37\lib\zipfile.py", line 1746, in write
with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Pandriej\\AppData\\Local\\Temp\\tmpfi53pba3'
I can save it to .csv without any problem. I also can divide it into smaller chunks and save them individually. The problem is I need the entire output to be saved as xlsx.
Is there any way I can fix this? Make Pandas actually save the file?

PermissionError while using pandas .to_csv function

I have this code to read .txt file from s3 and convert this file to .csv using pandas:
file = pd.read_csv(f's3://{bucket_name}/{bucket_key}', sep=':', error_bad_lines=False)
file.to_csv(f's3://{bucket_name}/file_name.csv')
I have provided read write permission to IAM role but still this errors comes for the .to_csv function:
Anonymous access is forbidden for this operation: PermissionError
update: full error in ec2 logs is:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/s3fs/core.py", line 446, in _mkdir
await self.s3.create_bucket(**params)
File "/usr/local/lib/python3.6/dist-packages/aiobotocore/client.py", line 134, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateBucket operation: Anonymous access is forbidden for this operation
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "convert_file_instance.py", line 92, in <module>
main()
File "convert_file_instance.py", line 36, in main
raise e
File "convert_file_instance.py", line 30, in main
file.to_csv(f's3://{bucket_name}/file_name.csv')
File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 3165, in to_csv
decimal=decimal,
File "/usr/local/lib/python3.6/dist-packages/pandas/io/formats/csvs.py", line 67, in __init__
path_or_buf, encoding=encoding, compression=compression, mode=mode
File "/usr/local/lib/python3.6/dist-packages/pandas/io/common.py", line 233, in get_filepath_or_buffer
filepath_or_buffer, mode=mode or "rb", **(storage_options or {})
File "/usr/local/lib/python3.6/dist-packages/fsspec/core.py", line 399, in open
**kwargs
File "/usr/local/lib/python3.6/dist-packages/fsspec/core.py", line 254, in open_files
[fs.makedirs(parent, exist_ok=True) for parent in parents]
File "/usr/local/lib/python3.6/dist-packages/fsspec/core.py", line 254, in <listcomp>
[fs.makedirs(parent, exist_ok=True) for parent in parents]
File "/usr/local/lib/python3.6/dist-packages/s3fs/core.py", line 460, in makedirs
self.mkdir(path, create_parents=True)
File "/usr/local/lib/python3.6/dist-packages/fsspec/asyn.py", line 100, in wrapper
return maybe_sync(func, self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/fsspec/asyn.py", line 80, in maybe_sync
return sync(loop, func, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/fsspec/asyn.py", line 51, in sync
raise exc.with_traceback(tb)
File "/usr/local/lib/python3.6/dist-packages/fsspec/asyn.py", line 35, in f
result[0] = await future
File "/usr/local/lib/python3.6/dist-packages/s3fs/core.py", line 450, in _mkdir
raise translate_boto_error(e) from e
PermissionError: Anonymous access is forbidden for this operation
I don't know why is it trying to create bucket?
and I have provided full access of s3 to lambda role
Can someone please tell me what i'm missing here?
Thank you.
I think this is due to an incompatibility between the pandas, boto3 and s3fs libraries.
Try this setup:
pandas==1.0.3
boto3==1.13.11
s3fs==0.4.2
I also tried with pandas version 1.1.1, it works too (I work with Python 3.7)

Reading in csv file as dataframe from hdfs

I'm using pydoop to read in a file from hdfs, and when I use:
import pydoop.hdfs as hd
with hd.open("/home/file.csv") as f:
print f.read()
It shows me the file in stdout.
Is there any way for me to read in this file as dataframe? I've tried using pandas' read_csv("/home/file.csv"), but it tells me that the file cannot be found. The exact code and error is:
>>> import pandas as pd
>>> pd.read_csv("/home/file.csv")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 275, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 590, in __init__
self._make_engine(self.engine)
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 731, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 1103, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 353, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3246)
File "pandas/parser.pyx", line 591, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6111)
IOError: File /home/file.csv does not exist
I know next to nothing about hdfs, but I wonder if the following might work:
with hd.open("/home/file.csv") as f:
df = pd.read_csv(f)
I assume read_csv works with a file handle, or in fact any iterable that will feed it lines. I know the numpy csv readers do.
pd.read_csv("/home/file.csv") would work if the regular Python file open works - i.e. it reads the file a regular local file.
with open("/home/file.csv") as f:
print f.read()
But evidently hd.open is using some other location or protocol, so the file is not local. If my suggestion doesn't work, then you (or we) need to dig more into the hdfs documentation.
you can use the following code to read csv from hdfs
import pandas as pd
import pyarrow as pa
hdfs_config = {
"host" : "XXX.XXX.XXX.XXX",
"port" : 8020,
"user" : "user"
}
fs = pa.hdfs.connect(hdfs_config['host'], hdfs_config['port'],
user=hdfs_config['user'])
df=pd.read_csv(fs.open("/home/file.csv"))
Use read instead open, it works
with hd.read("/home/file.csv") as f:
df = pd.read_csv(f)

Python Error when reading data from .xls file

I need to read a few xls files into Python.The sample data file can be found through Link:data.file. I tried:
import pandas as pd
pd.read_excel('data.xls',sheet=1)
But it gives an error message:
ERROR *** codepage 21010 -> encoding 'unknown_codepage_21010' ->
LookupError: unknown encoding: unknown_codepage_21010 Traceback (most
recent call last):
File "", line 1, in
pd.read_excel('data.xls',sheet=1)
File "C:\Anaconda3\lib\site-packages\pandas\io\excel.py", line 113,
in read_excel
return ExcelFile(io, engine=engine).parse(sheetname=sheetname, **kwds)
File "C:\Anaconda3\lib\site-packages\pandas\io\excel.py", line 150,
in init
self.book = xlrd.open_workbook(io)
File "C:\Anaconda3\lib\site-packages\xlrd__init__.py", line 435, in
open_workbook
ragged_rows=ragged_rows,
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 116, in
open_workbook_xls
bk.parse_globals()
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 1170, in
parse_globals
self.handle_codepage(data)
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 794, in
handle_codepage
self.derive_encoding()
File "C:\Anaconda3\lib\site-packages\xlrd\book.py", line 775, in
derive_encoding
_unused = unicode(b'trial', self.encoding)
File "C:\Anaconda3\lib\site-packages\xlrd\timemachine.py", line 30,
in
unicode = lambda b, enc: b.decode(enc)
LookupError: unknown encoding: unknown_codepage_21010
Anyone could help with this problem?
PS: I know if I open the file in windows excel, and resave it, the code could work, but I am looking for a solution without manual adjustment.
using the ExcelFile class, I was successfully able to read the file into python.
let me know if this helps!
import xlrd
import pandas as pd
xls = pd.ExcelFile(’C:\data.xls’)
xls.parse(’Index Constituents Data’, index_col=None, na_values=[’NA’])
The below worked for me.
import xlrd
my_xls = xlrd.open_workbook('//myshareddrive/something/test.xls',encoding_override="gb2312")

Categories

Resources