Cannot convert the pdf file into the CSV file

I am new to Python and am struggling to convert a PDF file into a CSV file using Spyder.
Input
import tabula
dfs = tabula.read_pdf(r'C:\Users\home\Desktop\RN(G)_GazetteList.pdf.pdf', pages='all')
tabula.convert_into(r'\C:\Users\home\Desktop\RN(G)_GazetteList.pdf.pdf', "output.csv", output_format="csv", pages='all')
tabula.convert_into_by_batch("input_directory", output_format='csv', pages='all')
Output
The output file is empty.
Traceback (most recent call last):
File "C:\Users\home\.spyder-py3\temp.py", line 8, in <module>
tabula.convert_into(r'\C:\Users\home\Desktop\RN(G)_GazetteList.pdf.pdf', "output.csv", output_format="csv", pages='all')
File "C:\Users\home\anaconda3\lib\site-packages\tabula\wrapper.py", line 273, in convert_into
raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)
FileNotFoundError: [Errno 2] No such file or directory: '\\C:\\Users\\home\\Desktop\\RN(G)_GazetteList.pdf.pdf'
Thank you so much

As a possible alternative, you could use pdftables_api:
import pdftables_api
conversion = pdftables_api.Client('key')
conversion.csv('pdf_path','output_path')
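
For reference, the traceback in the question points at the path itself: the string passed to tabula.convert_into starts with a backslash before the drive letter ('\\C:\\...'), while the read_pdf call one line earlier does not. Below is a minimal sketch of a fix, assuming the PDF really is at that Desktop location. The helper names clean_windows_path and pdf_to_csv are hypothetical and not part of tabula; only tabula.convert_into is taken from the question.

```python
from pathlib import Path


def clean_windows_path(raw):
    """Drop a stray slash or backslash in front of a drive letter.

    '\\C:\\Users\\x.pdf' becomes 'C:\\Users\\x.pdf'; other paths are returned unchanged.
    """
    if len(raw) >= 3 and raw[0] in "\\/" and raw[1].isalpha() and raw[2] == ":":
        return raw[1:]
    return raw


def pdf_to_csv(pdf_path, csv_path="output.csv"):
    """Convert every page of pdf_path to one CSV file (hypothetical wrapper)."""
    # Imported here so the path helper above can be used without tabula-py installed.
    import tabula

    pdf = Path(clean_windows_path(pdf_path))
    if not pdf.is_file():
        # Fail early with the path that was actually checked.
        raise FileNotFoundError(f"PDF not found: {pdf}")
    # Same call as in the question, with the corrected path.
    tabula.convert_into(str(pdf), csv_path, output_format="csv", pages="all")
    return csv_path
```

Usage would look like pdf_to_csv(r'C:\Users\home\Desktop\RN(G)_GazetteList.pdf.pdf'). It is also worth checking whether the file name really ends in .pdf.pdf.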

Related

Invalid argument error when using f string in path of DataFrame.to_csv()

I want to write a pandas DataFrame to a CSV file every 10 seconds. The CSV file name includes the current timestamp. Here is part of the code:
import pandas as pd
import time
while True:
    df = pd.read_sql_query('select * from dbo.tbl_tag_values', cnxn)
    t = time.localtime()
    current_time = time.strftime('%Y-%m-%dT%H:%M:%S', t)
    csv_path = f'C:/Users/00_Projects/App/data-{current_time}.csv'
    df.to_csv(csv_path)
    time.sleep(10)
With a static file name instead of the f-string, the script works fine, but with the f-string I get this error:
Traceback (most recent call last):
File "c:\Users\00_Projects\App\script.py", line 24, in <module>
df.to_csv(csv_path)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3466, in to_csv
return DataFrameRenderer(formatter).to_csv(
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\format.py", line 1105, in to_csv
csv_formatter.save()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 237, in save
with get_handle(
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\common.py", line 702, in get_handle
handle = open(
OSError: [Errno 22] Invalid argument: 'C:/Users/00_Projects/App/data-2023-01-02T15:33:19.csv'
I read the post "How to use f string in a path location" and tried Path from pathlib, but got the same error.
My OS is Windows.
Thanks for any help!
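
A note on the likely cause: Windows does not allow a colon in a file name, and the timestamp format '%H:%M:%S' puts two of them into the name shown in the OSError. The f-string itself is not the problem. A minimal sketch that builds the same name with hyphens instead of colons; timestamped_csv_path is a hypothetical helper, and the directory is the one from the question:

```python
import time


def timestamped_csv_path(directory, t=None):
    """Return '<directory>/data-<timestamp>.csv' with no colons in the timestamp."""
    if t is None:
        t = time.localtime()
    # Hyphens replace the colons of '%H:%M:%S', which Windows rejects in file names.
    stamp = time.strftime('%Y-%m-%dT%H-%M-%S', t)
    return f'{directory}/data-{stamp}.csv'
```

In the loop from the question this would replace the two lines that build csv_path, for example csv_path = timestamped_csv_path('C:/Users/00_Projects/App').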

I'm trying to read in a csv file in pycharm that I downloaded on my mac and when I run my code it says no such file exists

#import necessary modules
import csv
with open ("C:/iCloud Drive/Desktop/Python/o.csv") as f:
    data = csv.reader(f)
    for row in data:
        print(row)
Here's the error message:
Traceback (most recent call last):
File "/Users/noahhenninger/PycharmProjects/Refactoring/main.py", line 3, in <module>
with open ("C:/iCloud Drive/Desktop/Python/o.csv") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/iCloud Drive/Desktop/Python/o.csv'
I tried adding more specific info about the location of the file but that didn't seem to work.
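
One thing that stands out: the path starts with C:/, which is a Windows drive prefix, while the traceback shows the script running from /Users/noahhenninger/..., so this is a macOS file system with no drive letters. A minimal sketch, assuming the file sits in a Python folder on the current user's Desktop; that location is a guess, and read_rows is a hypothetical helper:

```python
import csv
from pathlib import Path


def read_rows(csv_path):
    """Read all rows of a CSV file, expanding '~' to the user's home folder."""
    path = Path(csv_path).expanduser()
    if not path.is_file():
        # Show the fully expanded path so it can be compared with Finder.
        raise FileNotFoundError(f"No such file: {path}")
    with path.open(newline="") as f:
        return list(csv.reader(f))
```

It could be called as read_rows("~/Desktop/Python/o.csv"). If the error persists, dragging the file from Finder into a Terminal window prints its real path.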

Pandas read_pickle from s3 bucket

I am working on a Jupyter notebook from AWS EMR.
I am able to do this:
pd.read_csv("s3:\\mypath\\xyz.csv").
However, if I try to open a pickle file like this, pd.read_pickle("s3:\\mypath\\xyz.pkl"),
I am getting this error:
[Errno 2] No such file or directory: 's3://pvarma1/users/users/candidate_users.pkl'
Traceback (most recent call last):
File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 179, in read_pickle
return try_read(path)
File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 177, in try_read
lambda f: pc.load(f, encoding=encoding, compat=True))
File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 146, in read_wrapper
is_text=False)
File "/usr/local/lib64/python2.7/site-packages/pandas/io/common.py", line 421, in _get_handle
f = open(path_or_buf, mode)
IOError: [Errno 2] No such file or d
However, I can see both xyz.csv and xyz.pkl in the same path! Can anyone help?

In the pandas version shown in the traceback, read_pickle supports only local paths, unlike read_csv, so you should copy the pickle file to your machine before reading it with pandas.
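
A minimal sketch of that copy-then-read approach, assuming boto3 is installed and AWS credentials are already configured in the environment. split_s3_url and read_pickle_from_s3 are hypothetical helper names, not pandas or boto3 functions:

```python
import os
import tempfile
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/key.pkl' into ('bucket', 'some/key.pkl')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_pickle_from_s3(url):
    """Download the object to a temporary local file, then unpickle it with pandas."""
    # Imported here so split_s3_url can be used without boto3 or pandas installed.
    import boto3
    import pandas as pd

    bucket, key = split_s3_url(url)
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, os.path.basename(key))
        boto3.client("s3").download_file(bucket, key, local_path)
        # read_pickle now receives an ordinary local path.
        return pd.read_pickle(local_path)
```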

Since read_pickle does not support this, you can use smart_open:
import pandas as pd
from smart_open import open
s3_file_name = "s3://bucket/key"
with open(s3_file_name, 'rb') as f:
    df = pd.read_pickle(f)

Unable to open xlsx file with xlrd

I am getting an error saying the file is not supported in xlrd-0.7.1.
The file is saved in xlsx format.
Traceback (most recent call last):
File "C:\Users\jawed\workspace\test\Excelproject.py", line 8, in <module>
workbook=xlrd.open_workbook(file_location)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 425, in open_workbook
on_demand=on_demand,
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 878, in biff2_8_load
f = open(filename, open_mode)
IOError: [Errno 2] No such file or directory: 'C:\\Users\\jawed\\workspace\\IAMarks.xls'

The file doesn't exist.
Check the location of the file before calling the function:
import os
import xlrd
if os.path.isfile(file_location):
    workbook = xlrd.open_workbook(file_location)
else:
    # tell the user they've done something wrong
    print("File not found: " + file_location)
A possibly more Pythonic way to do it (see EAFP) is in a try/except block:
try:
    workbook = xlrd.open_workbook(file_location)
except IOError as error:
    print(error)
    # tell the user they've done something wrong
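
Note that the traceback asks for IAMarks.xls while the question says the file was saved as xlsx, so the extension in file_location may simply be wrong. A minimal sketch for checking that, written for Python 3 (the question uses Python 2.7); same_name_other_extension is a hypothetical helper:

```python
from pathlib import Path


def same_name_other_extension(file_location):
    """List files in the same folder that share the base name but differ in extension."""
    path = Path(file_location)
    if not path.parent.is_dir():
        return []
    return sorted(
        p for p in path.parent.iterdir()
        if p.is_file() and p.stem == path.stem and p != path
    )
```

If this returns something like IAMarks.xlsx, the path passed to xlrd.open_workbook needs that extension instead.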

Saving Workbook in Python using xlwt gives error

This problem has been happening to me a lot lately.
When I run this code:
import xlwt
wb = xlwt.Workbook()
sheet = wb.add_sheet("Random")
sheet.write(0,0,"hello!")
wb.save("test.xls")
It gives me the error:
Traceback (most recent call last):
File "<module2>", line 16, in <module>
File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 662, in save
doc.save(filename, self.get_biff_data())
File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 261, in save
f = open(file_name_or_filelike_obj, 'w+b')
IOError: [Errno 13] Permission denied: 'test.xls'
I've searched for the answer, but I couldn't find it.
Any help would be greatly appreciated!
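
For reference, Errno 13 on save commonly means that test.xls is currently open in another program such as Excel, or that the working directory is not writable. A minimal sketch, written for Python 3 (the question uses Python 2.7, where the exception is IOError), that retries under a different name so the data is not lost; save_with_fallback and the "-copy" suffix are hypothetical choices:

```python
import os


def save_with_fallback(save, filename):
    """Call save(filename); on PermissionError retry once with '-copy' added to the name.

    Returns the file name that was actually written.
    """
    try:
        save(filename)
        return filename
    except PermissionError:
        root, ext = os.path.splitext(filename)
        fallback = root + "-copy" + ext
        # A second PermissionError here propagates to the caller.
        save(fallback)
        return fallback
```

With the workbook from the question this would be called as save_with_fallback(wb.save, "test.xls"). Closing the file in Excel first is the simpler fix if that is the cause.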
