File writing is not working with pyPdf?


I am new to Python. I am trying to open PDF files and write their contents into new text files, with each text file named after its PDF. What I have tried so far does not give what I expect. How can I achieve this?
import glob, os
import pyPdf

os.chdir("pdf/")
for file in glob.glob("*.pdf"):
    filena = file
    filename = "c:/documents/"+filena+".txt"
    target = open(filename,'w')
    pdf = pyPdf.PdfFileReader(open(filena,"rb"))
    for page in pdf.pages:
        target.write(page.extractText())
    target.close()
This results in the error:
File "c:/documents/atpkinase.pdf.txt",line 7, in <module>
target = open(filename,'w')
IOError: [Errno 2] No such file or directory: "c:/documents/atpkinase.pdf.txt"

It looks like the directory "c:/documents/" does not exist. To write a file into it you must create the directory first. To check whether the directory exists (and create it if needed) you can use:
import os

out_dir = "c:/documents"
if not os.path.exists(out_dir):
    os.makedirs(out_dir)
Also, filena contains the file name with its extension, so when you build filename you only need the old file name without the extension (for example os.path.splitext(filena)[0]).
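A minimal sketch of both fixes from this answer, using only os calls that exist in Python 2 and 3. The helper name txt_path_for is hypothetical; it does not touch pyPdf, so the question's extraction loop stays as it is.

```python
import os


def txt_path_for(pdf_name, out_dir):
    """Return out_dir/<pdf name without extension>.txt, creating out_dir if needed."""
    # open(..., 'w') creates the file but never its parent directory,
    # so make sure the directory exists first (same check as the answer).
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    # splitext drops only the final extension: "atpkinase.pdf" -> "atpkinase".
    base = os.path.splitext(os.path.basename(pdf_name))[0]
    return os.path.join(out_dir, base + ".txt")
```

In the question's loop, target = open(txt_path_for(file, "c:/documents"), 'w') would then write atpkinase.txt inside c:/documents instead of atpkinase.pdf.txt.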

Related

How can I open a file which is inside a zip file?

I want to open an HTML file that is inside a zip file (both have the same name), and I'm trying to open that HTML file with this code:
old_file = input("DRAG:") #dir C:\Users\GG\PycharmProjects\pythonProject\f1dbef77-342b-4026-85d8-7f30fe691a63_f.zip
file_parts = old_file.split(".") #[C:\Users\GG\PycharmProjects\pythonProject\f1dbef77-342b-4026-85d8-7f30fe691a63_f] [zip]
first= file_parts[0]
direcs = first.split("\\")
file_itself = direcs[-1] # the file name that i need to use
last = file_parts[1]
file = open(f'{first}.zip\\{file_itself}.html', encoding="UTF-8").read()

You should first unzip the archive into a temporary folder, then open the file from there, and when everything is done you can delete the folder into which you extracted your data.
You can use Python's zipfile module (the ZipFile class) and its extract() method to unzip your HTML file.
See the zipfile documentation.
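A sketch of the extract-to-a-temporary-folder approach described above, assuming the HTML member sits at the top level of the archive and is named like the zip itself. The function name read_html_from_zip is hypothetical. It uses os.path.splitext rather than split(".") so that dots elsewhere in the path do not break the name.

```python
import os
import tempfile
import zipfile


def read_html_from_zip(zip_path, encoding="utf-8"):
    """Return the text of <stem>.html stored inside <stem>.zip."""
    # Assumption: the member is named after the archive, e.g. abc.zip -> abc.html.
    stem = os.path.splitext(os.path.basename(zip_path))[0]
    member = stem + ".html"
    with zipfile.ZipFile(zip_path) as archive:
        # The temporary folder and everything extracted into it are
        # deleted automatically when this with-block exits.
        with tempfile.TemporaryDirectory() as tmp_dir:
            extracted = archive.extract(member, path=tmp_dir)
            with open(extracted, encoding=encoding) as fh:
                return fh.read()
```

If a temporary folder isn't needed at all, ZipFile.read(member) returns the member's bytes directly without writing anything to disk.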

Getting FileNotFoundError when trying to open a file for reading in Python 3

I am using the os module to find files and open one for reading, but I'm getting a FileNotFoundError.
I am trying to:
1. find all the files in a given subdirectory whose names contain the word "mda";
2. for each of those files, grab the string in the file name just after two underscores ("__"), which is a specific code called an SIC;
3. open that file for reading;
4. write to a master file for some MapReduce processing later.
When I try to do the opening, I get the following error:
File "parse_mda_SIC.py", line 16, in <module>
f = open(file, 'r')
FileNotFoundError: [Errno 2] No such file or directory:
'mda_3357_2017-03-08_1000230_000143774917004005__3357.txt'
I suspect the issue is either with the "file" variable or with the fact that the file is one directory down, but I am confused why this would happen when I am using os to address that lower directory.
I have the following code:
import os
import re

working_dir = "data/"
for file in os.listdir(working_dir):
    if (file.find("mda") != -1):
        SIC = re.findall(r"__(\d+)", file)
        f = open(file, 'r')
I would expect to be able to open the file without issue and then create my list from the data. Thanks for your help.

os.listdir returns bare file names, not paths, so open(file) looks for that name in the current working directory (usually the directory you launched the script from), not in data/. Join the directory onto the name:
for file in os.listdir(working_dir):
    if (file.find("mda") != -1):
        SIC = re.findall(r"__(\d+)", file)
        f = open(os.path.join(working_dir, file), 'r')
It is also good practice to open files with a with statement (a context manager), because it closes the file when it is no longer needed:
for file in os.listdir(working_dir):
    if (file.find("mda") != -1):
        SIC = re.findall(r"__(\d+)", file)
        with open(os.path.join(working_dir, file), 'r') as f:
            data = f.read()  # do stuff with f here

You need to prepend the directory, like this (note that 'r' goes to open(), not to os.path.join()):
f = open(os.path.join(working_dir, file), 'r')
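A sketch that puts the answers together, assuming the file-name layout shown in the question. mda_files_with_sic is a hypothetical name; it yields the SIC and the joined path instead of opening files itself, and uses "mda" in name in place of find() != -1.

```python
import os
import re

# Digits right after a double underscore, e.g. "...__3357.txt" -> "3357".
SIC_PATTERN = re.compile(r"__(\d+)")


def mda_files_with_sic(working_dir):
    """Yield (sic, full_path) for every file in working_dir whose name contains "mda"."""
    for name in sorted(os.listdir(working_dir)):
        if "mda" not in name:
            continue
        match = SIC_PATTERN.search(name)
        sic = match.group(1) if match else None
        # os.listdir gives bare names; join them with the directory before opening.
        yield sic, os.path.join(working_dir, name)
```

Each path it yields can be passed straight to open(), for example inside a with block, wherever the master-file processing happens.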

Reading all the files in a directory with a specific extension using glob in Python

I have a directory with subdirectories, each containing files with a specific extension. I can get the names of all the files using the glob function:
for name in glob.glob('*/*[a-b]*'):
    print(os.path.basename(name))
That prints the names of the files in all the subdirectories:
PF44_aa
PF52_aa
PF95_aa
PF38_aa
PF45_aa
PF63_aa
PF68_aa
PF39_aa
However, if I pass these file names as arguments to open the files and read the contents:
for name in glob.glob('*/*[a-b]*'):
    filename = os.path.basename(name)
    with open('%s' %filename) as fn:
        content = fn.readlines()
I get the following error:
File "<ipython-input-194-147f38fc2684>", line 1, in <module>
with open('%s' %filename) as fn:
FileNotFoundError: [Errno 2] No such file or directory: 'PF44_aa'
I also tried giving the filename directly as input instead of %s:
for name in glob.glob('*/*[a-b]*'):
    filename = os.path.basename(name)
    with open(filename) as fn:
        content = fn.readlines()
But still got the same error:
File "<ipython-input-193-fb125b5aa813>", line 1, in <module>
with open(filename) as fn:
FileNotFoundError: [Errno 2] No such file or directory: 'PF44_aa'
What am I doing wrong and how can I fix this?

You have to open the file by its path, not just its name. A bare file name only works if the file is in the current working directory (not necessarily the directory of your Python file). A small change to your script makes it work:
for name in glob.glob('*/*[a-b]*'):
    with open(name) as fn:
        content = fn.readlines()
Here filename is replaced by name, which is the path glob found (relative to the current working directory, e.g. the subdirectory plus the file name).
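A sketch of this fix as a reusable function. The "*_aa" pattern is an assumption based on the file names printed in the question (the original '*/*[a-b]*' pattern works the same way), and read_matching_files is a hypothetical name. Like '*/...', it only looks one subdirectory level deep.

```python
import glob
import os


def read_matching_files(root, pattern="*_aa"):
    """Return {path: lines} for files matching pattern one level below root."""
    contents = {}
    for path in sorted(glob.glob(os.path.join(root, "*", pattern))):
        # glob returns root/<subdir>/<name>; open that path, not its basename.
        with open(path) as fn:
            contents[path] = fn.readlines()
    return contents
```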

An alternative method. Start by importing os:
import os
Then get the directory listing as a list:
file_list = os.listdir('C:/filepath')
Now pick out the files you want:
files = [x for x in file_list if "_aa" in x]
Now you can open and read the files in the files list using your method, but prefix each name with the directory path:
filepath = 'C:/filepath/'
for filename in files:
    with open(filepath + filename) as fn:
        content = fn.readlines()
Currently you're trying to open the file by its name alone; you need to include the full file path. For example, filepath + 'PF44_aa' gives:
"C:/filepath/PF44_aa"
Note that os.listdir only lists the one directory it is given, so for files spread across subdirectories you would list each subdirectory in turn.

How to open a folder and loop through the files within that folder in Python

This question builds on an earlier question.
I am trying to create a Python script that loops through all the text files in a specified folder. The text files contain paths to files that will be moved to a different specified folder. When looping through a text file, the script takes the file from the path on each line of that text file.
The end goal is to have all the files which are referenced in the text file to move into one specified folder (\1855).
import shutil

dst = r"C:/Users/Aydan/Desktop/1855"
with open(r'C:\Users\Aydan\Desktop\RTHPython\Years') as my_folder:
    for filename in my_folder:
        text_file_name = filename.strip()
        with open(text_file_name) as my_file:
            for filename in my_file:
                file_name = filename.strip()
                src = r'C:\Users\Aydan\Desktop' + file_name
                shutil.move(src, dst)
One text file (1855.txt) contains:
/data01/BL/ER/D11/fmp000005578/BL_ER_D11_fmp000005578_0001_1.txt
/data01/BL/ER/D11/fmp000005578/BL_ER_D11_fmp000005578_0002_1.txt
/data01/BL/ER/D11/fmp000005578/BL_ER_D11_fmp000005578_0003_1.txt
and another text file (1856.txt) contains:
/data01/BL/ER/D11/fmp000005578/BL_ER_D11_fmp000005578_0004_1.txt
/data01/BL/ER/D11/fmp000005578/BL_ER_D11_fmp000005578_0005_1.txt
/data01/BL/ER/D11/fmp000005578/BL_ER_D11_fmp000005578_0006_1.txt
This is the error I get when I run the above script:
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
with open(r'C:\Users\Aydan\Desktop\RTHPython\Years') as my_folder:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Aydan\\Desktop\\RTHPython\\Years'
The script does not move the files listed here to the C:/Users/Aydan/Desktop/1855 destination. I am trying to follow the same logic as iterating through each line of a text file, but applied to a folder instead of to the lines inside a text file.
Any help to find a solution would be brilliant! If you need any more info about the files just ask.
Thanks!
Aydan.

You can't open a whole folder with open() (on Windows that is what produces the PermissionError above), but you can cycle through every .txt file in that folder like this:
import shutil
import glob

dst = r"C:/Users/Aydan/Desktop/1855"
for filename in glob.glob(r"C:\Users\Aydan\Desktop\RTHPython\Years\*.txt"):
    text_file_name = filename.strip()
    with open(text_file_name) as my_file:
        for filename in my_file:
            file_name = filename.strip()
            src = r'C:\Users\Aydan\Desktop' + file_name
            shutil.move(src, dst)
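A sketch of the answer's loop with two safeguards, assuming the list files and paths look like the 1855.txt example: blank lines are skipped, and the destination folder is created if it doesn't exist. move_listed_files is a hypothetical name. Note that os.path.join discards everything before an absolute component such as "/data01/...", which is why the leading slash is stripped before joining; the answer's plain string concatenation sidesteps that in a different way.

```python
import glob
import os
import shutil


def move_listed_files(list_dir, base_dir, dst):
    """Move every file named in the *.txt lists inside list_dir into dst."""
    if not os.path.exists(dst):
        os.makedirs(dst)
    moved = []
    for list_path in sorted(glob.glob(os.path.join(list_dir, "*.txt"))):
        with open(list_path) as list_file:
            for line in list_file:
                rel = line.strip()
                if not rel:
                    continue  # skip blank lines
                # "/data01/BL/..." -> ["data01", "BL", ...]; without the leading
                # slash, os.path.join keeps base_dir as the prefix.
                parts = rel.lstrip("/\\").split("/")
                src = os.path.join(base_dir, *parts)
                shutil.move(src, dst)
                moved.append(os.path.basename(src))
    return moved
```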

I want to process every file inside a folder line by line and get a particular matching string

I am trying to process every file inside a folder line by line. I need to check each line for a particular string and write the results into an Excel sheet. My code works if I explicitly give the file name, but if I try to process all the files it throws an IOError. The code I wrote is below.
import os

def test_extract_programid():
    folder = 'C://Work//Scripts//CMDC_Analysis//logs'
    for filename in os.listdir(folder):
        print(filename)
        with open(filename, 'r') as fo:
            strings = ("/uri")
            # <conditions>
            for line in fo:
                if strings in line:
                    pass  # <conditions>
I think the error may be that the file is already open when the for loop starts, but I am not sure. Printing the file name prints it correctly.
The error shown is IOError: [Errno 2] No such file or directory:

If your working directory is not the same as folder, you need to give open the path to the file as well:
with open(folder + '/' + filename, 'r') as fo:
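Alternatively, you can use glob, which returns each name with the folder prefix already attached, so the result can be passed straight to open():
import glob
for filename in glob.glob(folder + '/*'):
    print(filename)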

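open() can't find the file because filename is only the bare name, not a path. Build the full path instead (os.sep is a string, not a function):
for filename in os.listdir(folder):
    print(folder + os.sep + filename)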
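A minimal sketch of the search part only, assuming plain-text log files. find_lines is a hypothetical name, and the Excel step from the question is left out. It joins each name with folder before opening and skips subdirectories, which os.listdir also returns. (As an aside, ("/uri") in the question is just the string "/uri"; the parentheses do not make a tuple.)

```python
import os


def find_lines(folder, needle="/uri"):
    """Return (file name, line number, line) for every line containing needle."""
    hits = []
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, "r") as fo:
            for lineno, line in enumerate(fo, start=1):
                if needle in line:
                    hits.append((filename, lineno, line.rstrip("\n")))
    return hits
```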
