Python: Files stop being opened at a certain point - python

I've written the following program in Python:
import re
import os
import string
folder = 'C:\Users\Jodie\Documents\Uni\Final Year Project\All Data'
folderlisting = os.listdir(folder)
for eachfolder in folderlisting:
    print eachfolder
    if os.path.isdir(folder + '\\' + eachfolder):
        filelisting = os.listdir('C:\Users\Jodie\Documents\Uni\Final Year Project\All Data\\' + eachfolder)
        print filelisting
        for eachfile in filelisting:
            if re.search('.genbank.txt$', eachfile):
                genbankfile = open(eachfile, 'r')
                print genbankfile
            if re.search('.alleles.txt$', eachfile):
                allelesfile = open(eachfile, 'r')
                print allelesfile
It looks through a lot of folders, and prints the following:
The name of each folder, without the path
A list of all files in each folder
Two specific files in each folder (Any files containing ".genbank.txt" and ".alleles.txt").
The code works until it reaches a certain directory, and then fails with the following error:
Traceback (most recent call last):
File "C:\Users\Jodie\Documents\Uni\Final Year Project\Altering Frequency Practice\Change_freq_data.py", line 16, in <module>
genbankfile = open(eachfile, 'r')
IOError: [Errno 2] No such file or directory: 'ABP1.genbank.txt'
The problem is:
That file most definitely exists, since the program lists it before it tries to open the file.
Even if I take that directory out of the original group of directories, the program throws up the same error for the next folder it iterates to. And the next, if that one is removed. And so on.
This makes me think that it's not the folder or any files in it, but some limitation of Python? I have no idea. It has stumped me.
Any help would be appreciated.

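You should use os.walk(): http://docs.python.org/library/os.html#os.walk
The error occurs because os.listdir() returns only file names, so open(eachfile) looks in the current working directory rather than in the subfolder. os.walk() gives you the containing directory, which you can join with the file name.
Also, you need to read the contents of the file; printing the file object does not show its contents. And you need to close the file when you're done, or use a context manager to close it for you.
It would look something like: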
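for root, dirs, files in os.walk(folder):
    for file_name in files:
        # Escape the dots so they match a literal '.' rather than any character.
        if re.search(r'\.genbank\.txt$', file_name) or \
           re.search(r'\.alleles\.txt$', file_name):
            # Join with root so the path does not depend on the working directory.
            with open(os.path.join(root, file_name), 'r') as f:
                print f.read()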
Keep in mind this is not exactly what you're doing: it walks the entire tree, whereas you may only want to walk a single level, as you already do.
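If only the immediate subfolders matter, as in the question, a single-level version might look like the sketch below. It is written for Python 3 (the question uses Python 2 print statements), and find_gene_files is a made-up helper name, not something from the original code.

```python
import os


def find_gene_files(folder, suffixes=('.genbank.txt', '.alleles.txt')):
    """Return full paths of matching files exactly one level below folder.

    find_gene_files is a hypothetical helper name used for illustration.
    """
    matches = []
    for entry in sorted(os.listdir(folder)):
        subfolder = os.path.join(folder, entry)
        if not os.path.isdir(subfolder):
            continue  # skip plain files sitting directly in folder
        for name in sorted(os.listdir(subfolder)):
            # str.endswith accepts a tuple of suffixes.
            if name.endswith(suffixes):
                # Join with the subfolder so open() does not depend on the cwd.
                matches.append(os.path.join(subfolder, name))
    return matches
```

Each returned path can be passed straight to open(), because it already includes the subfolder.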

Related

Renaming files with python in specific format

I'm trying to rename files in one folder, in the pattern 0001, 0002, 0010, 0100 etc. I'm very very new to python, so sorry for asking something so basic.
I've searched around, and most of the code I come across will rename files (not how I want it) or strip out certain characters. I've also come across code which uses extra modules (glob) which only take me further down the rabbit hole. Most of what I see just makes my head spin; at the moment my skills don't go beyond simple functions, if, when, for, while statements and so on.
I've cobbled together some code, that I (somewhat) understand, but it doesn't work.
import os
dir = os.listdir("D:\\temp\\Wallpapers")
i = 0
for item in dir:
    dst ="000" + str(i) + ".jpg"
    src = item
    dst = item + dst
    # rename() function will
    # rename all the files
    os.rename(src, dst)
    i += 1
This is the error I get:
Traceback (most recent call last):
File "rename.py", line 14, in <module>
os.rename(src, dst)
FileNotFoundError: [WinError 2] The system cannot find the file specified: '00-Pyatna.jpg' -> '0000.jpg'
It doesn't work because the script is probably not running in that directory: os.listdir() returns bare file names, and os.rename() then looks for them in the current working directory. Build full paths instead, as in the following code:
import os
base_path = "D:/temp/Wallpapers"
files = os.listdir(base_path)
for i, fp in enumerate(files):
    dst = os.path.join(base_path, "{0:04d}.jpg".format(i))
    src = os.path.join(base_path, fp)
    os.rename(src, dst)
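As a variation on the code above, it can help to build the list of renames first and look at it before touching any files. The sketch below does only that. plan_renames and its width parameter are hypothetical names, and sorting the names and keeping each file's own extension are my assumptions rather than part of the answer.

```python
import os


def plan_renames(base_path, width=4):
    """Return (src, dst) full-path pairs without renaming anything.

    plan_renames is a hypothetical helper; width is the zero-padding length.
    """
    pairs = []
    # Sort so the numbering is repeatable; os.listdir order is arbitrary.
    names = sorted(os.listdir(base_path))
    for i, name in enumerate(names):
        ext = os.path.splitext(name)[1]  # keep each file's own extension
        new_name = "{0:0{1}d}{2}".format(i, width, ext)
        pairs.append((os.path.join(base_path, name),
                      os.path.join(base_path, new_name)))
    return pairs
```

Once the pairs look right, calling os.rename(src, dst) on each one performs the actual rename. If a target name already exists in the folder, check for that first, since os.rename treats existing targets differently on Windows and Unix.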
Firstly, you can retrieve the largest number already present in your folder with the following function:
import os
import re
def max_counter_in_files(folder):
    # Collect every run of digits in every file name and compare them as integers.
    numbers = [int(n) for name in os.listdir(folder) for n in re.findall(r"\d+", name)]
    # Return 0 when the folder is empty or no file name contains digits.
    return max(numbers) if numbers else 0
For example, if your folder contains the files
file001.txt
file002.txt
file003.txt
then max_counter_in_files('path/to/your/files') would return 3.
Secondly, you can use that function to pick the next filename when adding new files. For example,
counter = max_counter_in_files(dest_path)
filename = f"filename{counter+1:04d}.txt"
filename would then be "filename0004.txt".

Getting FileNotFoundError when trying to open a file for reading in Python 3

I am using the OS module to open a file for reading, but I'm getting a FileNotFoundError.
I am trying to
find all the files in a given sub-directory that contain the word "mda"
for each of those files, grab the string in the filename just after two "_"s (indicates a specific code called an SIC)
open that file for reading
will write to a master file for some Mapreduce processing later
When I try to do the opening, I get the following error:
File "parse_mda_SIC.py", line 16, in <module>
f = open(file, 'r')
FileNotFoundError: [Errno 2] No such file or directory:
'mda_3357_2017-03-08_1000230_000143774917004005__3357.txt'
I am suspicious the issue is either with the "file" variable or the fact that it is one directory down, but confused why this would occur when I am using OS to address that lower directory.
I have the following code :
working_dir = "data/"
for file in os.listdir(working_dir):
    if (file.find("mda") != -1):
        SIC = re.findall("__(\d+)", file)
        f = open(file, 'r')
I would expect to be able to open the file without issue and then create my list from the data. Thanks for your help.
This should work for you. You need to prepend the directory: os.listdir() returns only the file name, so open() looks for that name in the current working directory (where you ran the script from), not in working_dir.
for file in os.listdir(working_dir):
    if (file.find("mda") != -1):
        SIC = re.findall("__(\d+)", file)
        f = open(os.path.join(working_dir, file), 'r')
Also, it's good practice to open files with a context manager (the with statement), as it closes the file for you when the block ends:
for file in os.listdir(working_dir):
    if (file.find("mda") != -1):
        SIC = re.findall("__(\d+)", file)
        with open(os.path.join(working_dir, file), 'r') as f:
            # do stuff with f here
You need to prepend the directory, like this:
f = open(os.path.join(working_dir, file), 'r')
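To see why the join is needed, here is a small self-contained Python 3 check. It uses a temporary directory in place of data/, and the file name mda_example.txt is made up for the demonstration.

```python
import os
import tempfile


def demo_listdir_names():
    """Show that os.listdir returns bare names, not paths."""
    with tempfile.TemporaryDirectory() as working_dir:
        # mda_example.txt is a made-up file name for this demonstration.
        path = os.path.join(working_dir, "mda_example.txt")
        with open(path, "w") as f:
            f.write("hello")
        name = os.listdir(working_dir)[0]  # just the name, no directory part
        # A bare name is resolved against os.getcwd(), not working_dir.
        bare_exists = os.path.exists(name)
        joined_exists = os.path.exists(os.path.join(working_dir, name))
        return name, bare_exists, joined_exists
```

The joined path always exists here. The bare name normally does not, unless the current working directory happens to contain a file with the same name.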

I want to process every file inside a folder line by line and get a particular matching string

I am trying to process every file inside a folder line by line. I need to check for a particular string and write it into an Excel sheet. If I explicitly give the file name, my code works. If I try to go through all the files, it throws an IOError. The code I wrote is below.
import os
def test_extract_programid():
    folder = 'C://Work//Scripts//CMDC_Analysis//logs'
    for filename in os.listdir(folder):
        print filename
        with open(filename, 'r') as fo:
            strings = ("/uri")
            <conditions>
            for line in fo:
                if strings in line:
                    <conditions>
I think the error is that the file is already open when the for loop starts, but I am not sure. Printing the file name shows the correct name.
The error shown is IOError: [Errno 2] No such file or directory:
If your working directory is not the same as folder, then you need to give open() the path to the file as well:
with open(folder+'/'+filename, 'r') as fo:
Alternatively, you can use glob, which returns each name with the folder already prepended:
import glob
for filename in glob.glob(folder+'/*'):
    print filename
It can't open the file because the name has no path. You should do
for filename in os.listdir(folder):
    print folder + os.sep + filename
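Putting the suggestions together, a rough Python 3 sketch of the whole loop could look like this. find_matching_lines is a hypothetical name, and returning (path, line) pairs is my assumption about what would be useful before writing anything to a spreadsheet.

```python
import glob
import os


def find_matching_lines(folder, needle):
    """Return (path, line) pairs for every line that contains needle.

    find_matching_lines is a hypothetical helper name used for illustration.
    """
    hits = []
    # glob returns paths that already include folder, so open() finds them.
    for path in sorted(glob.glob(os.path.join(folder, '*'))):
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, 'r') as fo:
            for line in fo:
                if needle in line:
                    hits.append((path, line.rstrip('\n')))
    return hits
```

For the question's case the call would be something like find_matching_lines(folder, "/uri"). Note that the '*' pattern does not match file names starting with a dot.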

Python - File does not exist error

I'm trying to do a couple things here with the script below (it is incomplete). The first thing is to loop through some subdirectories. I was able to do that successfully. The second thing was to open a specific file (it is the same name in each subdirectory) and find the minimum and maximum value in each column EXCEPT the first.
Right now I'm stuck on finding the max value in a single column because the files I'm reading have two rows which I want to ignore. Unfortunately, I'm getting the following error when attempting to run the code:
Traceback (most recent call last):
File "test_script.py", line 22, in <module>
with open(file) as f:
IOError: [Errno 2] No such file or directory: 'tc.out'
Here is the current state of my code:
import scipy as sp
import os
rootdir = 'mydir'; #mydir has been changed from the actual directory path
data = []
for root, dirs, files in os.walk(rootdir):
    for file in files:
        if file == "tc.out":
            with open(file) as f:
                for line in itertools.islice(f,3,None):
                    for line in file:
                        fields = line.split()
                        rowdata = map(float, fields)
                        data.extend(rowdata)
print 'Maximum: ', max(data)
To open a file you need to specify its full path. You need to change the line
with open(file) as f:
to
with open(os.path.join(root, file)) as f:
When you write open(file), Python tries to find the file tc.out in the directory you started the interpreter from. You should use the full path to that file in open:
with open(os.path.join(root, file)) as f:
Let me illustrate with an example:
I have a file named 'somefile.txt' in the directory /tmp/sto/deep/ (this is a Unix system, so I use forward slashes). And then I have this simple script which resides in the directory /tmp:
oliver@armstrong:/tmp$ cat myscript.py
import os
rootdir = '/tmp'
for root, dirs, files in os.walk(rootdir):
    for fname in files:
        if fname == 'somefile.txt':
            with open(os.path.join(root, fname)) as f:
                print('Filename: %s' % fname)
                print('directory: %s' % root)
                print(f.read())
When I execute this script from the /tmp directory, you can see that fname is just the file name; the path leading to it is omitted. That's why you need to join it with root, the first value in each tuple that os.walk yields.
oliver@armstrong:/tmp$ python myscript.py
Filename: somefile.txt
directory: /tmp/sto/deep
contents
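For the original question, the same idea can be folded into a small Python 3 function. This is only a sketch under assumptions: max_in_files is a made-up name, the files are taken to be whitespace-separated numbers, and the number of header rows and leading columns to skip are parameters because the question's text and code disagree on how many rows to skip.

```python
import itertools
import os


def max_in_files(rootdir, target='tc.out', skip_rows=2, skip_cols=1):
    """Return the largest value found in every file named target under rootdir.

    Hypothetical helper: skip_rows header lines and the first skip_cols
    columns of each file are ignored. Returns None if no values are found.
    """
    values = []
    for root, dirs, files in os.walk(rootdir):
        if target in files:
            # Join with root so the file is found wherever the script is run.
            with open(os.path.join(root, target)) as f:
                for line in itertools.islice(f, skip_rows, None):
                    fields = line.split()
                    values.extend(float(x) for x in fields[skip_cols:])
    return max(values) if values else None
```

This returns one overall maximum, like the max(data) call in the question. A per-column maximum would need the values kept per column instead of in one flat list.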

Python - IOError when running through folder tree

I am trying to read a series of DICOM files in a folder tree, using the code below to walk the tree and read each file as I go. The problem is that I get IOErrors for files that definitely exist. I have checked file permissions and other SO threads, such as "Python: IOError: [Errno 2] No such file or directory", but I haven't managed to get rid of the IOErrors yet. Does anyone have any ideas?
for root, dirs, files in os.walk(path):
    for fname in files:
        name = os.path.basename(os.path.abspath(fname))
        if name.startswith('.') == True:
            pass
        else:
            try:
                plan=dicom.read_file(fname)
                ds=dicom.read_file(fname, stop_before_pixels = True)
                kVp = TagChecker([0x18,0x60]) #KVP
                Target = TagChecker([0x18,0x1191]) #ANODE
                Filter = TagChecker([0x18,0x7050]) #
                write_results.writerow([Survey_Number, Patient_ID, View_Protocol, int(kVp), Target, Filter, Thickness, mAs_Exposure, LPad_Yes_No, autoorman, AECMode, AECDset, Patient_Age, Comment, Compression_Force])
                #print(fname)
            except IOError:
                print "IOError: ", "//" + os.path.join(root, fname) + "//"
            except InvalidDicomError:
                # This exception line prints an error message to the command line, checks to see if an error log
                # has been generated for this session, writes a new one if not and then writes the error to the log file
                print "Invalid Dicom File: ", fname
Usually a method that takes a filename, like dicom.read_file(fname), will take an absolute filename (or assume that the filename is relative to the dir that your main python program is running in, the cwd()). Can I suggest that you put this line in front of the first read_file() call:
print "reading: %s" % os.path.abspath(fname)
Then you'll see the filename that you're actually trying to read. I'm guessing it's not the file (or droids) you think you're looking for.
To fix your problem, join the directory (root in your loop) and the file name before you read, e.g.
full_fname = os.path.join(root, fname)
dicom.read_file(full_fname)
In other words, I think you're reading files with relative paths and you want to be using absolute paths.
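One way to keep the path handling in a single place is a small generator that yields ready-to-open paths and skips hidden files, as the question's loop does. The sketch below is Python 3, and iter_visible_files is a hypothetical name, not part of the dicom library.

```python
import os


def iter_visible_files(path):
    """Yield the full path of every non-hidden file under path.

    iter_visible_files is a hypothetical helper name used for illustration.
    """
    for root, dirs, files in os.walk(path):
        for fname in files:
            if fname.startswith('.'):
                continue  # skip hidden files such as .DS_Store
            # root is the directory that contains fname.
            yield os.path.join(root, fname)
```

The reading loop then becomes "for full_fname in iter_visible_files(path):" with dicom.read_file(full_fname) inside the try block.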
