UnboundLocalError in multiprocessing in python

I have a long calculation script that reads parameters from a .txt file and, after the calculations, creates a new .txt file and saves the data there. Here, as an example, I show only the first small part of it, where I build the expected file name, look for it in the directory, open it, and read the parameters.
import multiprocessing
import os
def target_func(k):
    dir_path = os.getcwd()
    for file in os.listdir(dir_path):
        if file.endswith(".txt") and 'Focuses{}'.format(k) in file:
            print(os.path.join(dir_path, file))
            print(file)
            param_file = os.path.join(dir_path, file)
    with open(param_file, 'r') as f:
        param_temp = f.readlines()
    param_list = []
    for i in range(len(param_temp) - 1):
        temp = param_temp[i].replace(',','').replace(';','').split()
        param_list.append([float(x) for x in temp])
if __name__ == '__main__':
    numList = []
    for k in range(1,7):
        p = multiprocessing.Process(target=target_func, args=(k, ))
        p.start()
        print('Process number {} is running'.format(k))
        numList.append(p)
    for p in numList:
        p.join()
    print('End of calculations')
When I test target_func() in the parent Python interpreter, it works. But when I run this code from cmd (Windows), it fails with the error
UnboundLocalError: local variable 'param_file' referenced before assignment
I have no idea how to declare the variables inside the process. I searched for a solution for a few days, but all I could find was information of two kinds:
discussions about message exchange between processes and shared memory or data;
lots of identical, useless toy examples that print the process id and number.
My task is to run 6 independent processes that do not interact in any way. Each of them should read one of 6 different .txt parameter files, make some calculations, and save the results to one of 6 new .txt result files. Nothing more. Could someone give me any advice on how to do this?

You'll need to pre-define the param_file variable outside the for loop (as None, for example), so that in case it doesn't get defined inside the for loop (happens when the if block within the for loop never gets executed), the variable will still be defined.
That way, you will be able to use the param_file variable after the for loop with an if statement that checks if the variable is None or not.
def target_func(k):
    dir_path = os.getcwd()
    param_file = None
    for file in os.listdir(dir_path):
        if file.endswith(".txt") and 'Focuses{}'.format(k) in file:
            print(os.path.join(dir_path, file))
            print(file)
            param_file = os.path.join(dir_path, file)
    if param_file is None:
        print("File not found")
    else:
        with open(param_file, 'r') as f:
            param_temp = f.readlines()
        param_list = []
        for i in range(len(param_temp) - 1):
            temp = param_temp[i].replace(',','').replace(';','').split()
            param_list.append([float(x) for x in temp])
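One possible way to make the "not found" case explicit is to move the lookup into its own helper that returns None when nothing matches. The sketch below is only an illustration: find_param_file is a hypothetical name, the demo runs in a temporary directory rather than the real working directory, and unlike the loop above it returns the first match (in sorted order) instead of the last.

```python
import os
import tempfile


def find_param_file(dir_path, k):
    """Return the path of the first .txt file whose name contains 'Focuses<k>', or None."""
    for name in sorted(os.listdir(dir_path)):
        if name.endswith(".txt") and "Focuses{}".format(k) in name:
            return os.path.join(dir_path, name)
    return None  # nothing matched, so the caller can decide what to do


# Demo in a throwaway directory that contains a single parameter file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "Focuses1.txt"), "w").close()
    found = find_param_file(tmp, 1)      # a file exists for k = 1
    missing = find_param_file(tmp, 2)    # no file exists for k = 2

print(os.path.basename(found))
print(missing)
```

Inside target_func, the caller would then check the return value against None before opening the file, which is the same check the answer performs.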

Related

Python script iterates over whole folder but skips files in the folder

I tried running the following code. It should read HDF5 files from a directory and create, for every HDF5 file, a .png and a .txt file with the same name (by the way, I need them as input for the YOLO CNN).
The code does what I described, but only for 20 images! I added print(i) to check that the for loop works properly, and it does: it prints every file in the directory (over 200 files). But it creates only 20 .png and 20 .txt files.
def process_fpath(path1, path2):
    sensor_dim = (101, 101)
    onlyfiles = [f for f in os.listdir(path1) if isfile(join(path1, f))]
    for i in onlyfiles:
        if i.endswith(".hdf"):
            print(i)
            #cut ".hdf"
            name = str(i[0:-5])
            # create png
            im = h5py.File(path1 + str(i), 'r')
            labels_im = im['labels']
            image = im['image']
            plt.imsave(path2 + name + '.png', image)
            # create txt
            exp = np.column_stack((np.zeros(np.size(labels_im,0)) , labels_im[:,0]/sensor_dim[0], labels_im[:,1]/sensor_dim[1], labels_im[:,3]/sensor_dim[0], labels_im[:,3]/sensor_dim[0]))
            np.savetxt(path2 + name + '.txt', exp, delimiter = ' ', fmt=['%d', '%8f', '%8f', '%8f', '%8f'])
            continue
        else:
            continue
This is my first post so if something isn't proper please let me know.

Maybe it's because of the name variable? You remove 5 characters but you want to remove only 4: name = str(i[0:-4])
Unrelated to your question: the last 3 lines are useless, so you can remove them.
            continue
        else:
            continue
Try running the code on a single file that is not working to understand what the problem is, instead of looping over all of them.
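As a possible alternative to counting characters by hand, os.path.splitext separates the extension regardless of its length. This is a minimal sketch; the file name is made up for the example.

```python
import os

filename = "scan_001.hdf"  # hypothetical file name

# splitext splits at the last dot and keeps the dot with the extension.
name, ext = os.path.splitext(filename)

print(name)
print(ext)
```

With this approach, several names can no longer collapse into the same output name because one character too many was cut off.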

Rewriting file and calling script recurrently

In my current project, I receive a list of values (for now I'm fine with receiving them as a list of strings, as that makes part of the code easier) and the name of a file. I iterate through the values, changing the value in the file each time, in order to submit many calls to the terminal at once. The problem, I believe, is that although I'm changing the values, the submitted files are no different from the initial file, because I'm not rewriting the file correctly.
1st Part - Input
if __name__ == '__main__':
    if len(sys.argv) < 1:
        print "Specify the input"
        exit(1)
    f = sys.argv[1]
    list = ast.literal_eval(sys.argv[3])
2nd Part - Rewriting
while i < len(list):
    with open(f, 'r+') as file:
        programFile = file.read()
        for l in range(len(node)):
            if i==0:
                valuesDic.update({"initialValue":list[i]})
            else:
                valuesDic.update({list[i-1]:list[i]})
            multiValuesChange(programFile, valuesDic)
        out_file = open(f, "w")
        out_file.write(programFile)
        out_file.close()
        call(["qsub","-l","h=node10",f])
    i=i+1
3rd Part - multiValuesChange
def multiValuesChange(programFile, valuesDic):
    rc = re.compile('|'.join(map(re.escape, valuesDic)))
    def translate(match):
        return valuesDic[match.group(0)]
    return rc.sub(translate, programFile)
Thank you.

Your programFile is a string, and strings are immutable: multiValuesChange() returns a new string and leaves the original untouched. To update programFile, you have to assign the result back to it, so do:
programFile = multiValuesChange(programFile, valuesDic)
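The difference between discarding and keeping the return value can be seen in a small standalone sketch. The function below mirrors the one in the question under a hypothetical name (multi_values_change), and the sample text and mapping are invented for the example.

```python
import re


def multi_values_change(text, values):
    """Return a copy of text with every key of values replaced by its value."""
    pattern = re.compile("|".join(map(re.escape, values)))
    return pattern.sub(lambda match: values[match.group(0)], text)


program = "x = initialValue"

# Result discarded: the original string is not modified.
multi_values_change(program, {"initialValue": "5"})
unchanged = program

# Result assigned back: the name now refers to the new string.
program = multi_values_change(program, {"initialValue": "5"})

print(unchanged)
print(program)
```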

Merging PDF's with PyPDF2 with inputs based on file iterator

I have two folders with PDFs of identical file names. I want to iterate through the first folder, get the first 3 characters of the filename, make that the 'current' page name, then use that value to grab the 2 corresponding PDFs from both folders, merge them, and write them to a third folder.
The script below works as expected for the first iteration, but after that, the subsequent merged PDFs include all the previous ones (ballooning quickly to 72 pages within 8 iterations).
Some of this could be due to poor code, but I can't figure out where that is, or how to clear the inputs/outputs that could be causing the failure to write only 2 pages per iteration:
import os
from PyPDF2 import PdfFileMerger
merger = PdfFileMerger()
rootdir = 'D:/Python/Scatterplots/BoundaryEnrollmentPatternMap'
for subdir, dirs, files in os.walk(rootdir):
    for currentPDF in files:
        #print os.path.join(file[0:3])
        pagename = os.path.join(currentPDF[0:3])
        print "pagename is: " + pagename
        print "File is: " + pagename + ".pdf"
        input1temp = 'D:/Python/Scatterplots/BoundaryEnrollmentPatternMap/' + pagename + '.pdf'
        input2temp = 'D:/Python/Scatterplots/TraditionalScatter/' + pagename + '.pdf'
        input1 = open(input1temp, "rb")
        input2 = open(input2temp, "rb")
        merger.append(fileobj=input1, pages=(0,1))
        merger.append(fileobj=input2, pages=(0,1))
        outputfile = 'D:/Python/Scatterplots/CombinedMaps/Sch_' + pagename + '.pdf'
        print merger.inputs
        output = open(outputfile, "wb")
        merger.write(output)
        output.close()
        #clear all inputs - necessary?
        outputfile = []
        output = []
        merger.inputs = []
        input1temp = []
        input2temp = []
        input1 = []
        input2 = []
print "done"
My code / work is based on this sample:
https://github.com/mstamy2/PyPDF2/blob/master/Sample_Code/basic_merging.py

I think the error is that merger is initialized before the loop, so it accumulates all the documents. Try moving the line merger = PdfFileMerger() into the loop body. Setting merger.inputs = [] doesn't seem to help in this case.
A few notes about your code:
input1 = [] doesn't close the file. It leaves many files open in the program. You should call input1.close() instead.
[] means an empty list. It is better to use None if a variable should not contain any meaningful value.
To remove a variable (e.g. output), use del output.
That said, clearing all the variables is not necessary. They will be freed by the garbage collector.
Use os.path.join to create input1temp and input2temp.
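The accumulation effect can be reproduced without PyPDF2 at all. In the sketch below, FakeMerger is a hypothetical stand-in that only remembers what was appended to it; it is not the PyPDF2 API, and the page names are invented. It compares one object shared across iterations with a fresh object per iteration.

```python
class FakeMerger:
    """Hypothetical stand-in for a merger: it only remembers appended pages."""

    def __init__(self):
        self.pages = []

    def append(self, page):
        self.pages.append(page)


names = ["001", "002", "003"]

# One merger created before the loop: earlier pages are carried along.
shared = FakeMerger()
shared_counts = []
for name in names:
    shared.append(name + "-map")
    shared.append(name + "-scatter")
    shared_counts.append(len(shared.pages))

# A fresh merger inside the loop: every output holds exactly two pages.
fresh_counts = []
for name in names:
    merger = FakeMerger()
    merger.append(name + "-map")
    merger.append(name + "-scatter")
    fresh_counts.append(len(merger.pages))

print(shared_counts)  # [2, 4, 6]
print(fresh_counts)   # [2, 2, 2]
```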

Iterate over 2 files in each folder and compare them

I compare two text files and print the results to a third file. I am trying to make the script I'm running iterate over all of the folders in its CWD that have two text files in them.
What I have so far:
import os
import glob
path = './'
for infile in glob.glob( os.path.join(path, '*.*') ):
    print('current file is: ' + infile)
    with open (f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:
Would this be a good way to start the iteration process?
It's not the clearest code, but it gets the job done. However, I'm pretty sure I need to take the logic out of the read/write methods, but I'm not sure where to start.
What I'm basically trying to do is have a script iterate over all of the folders in its CWD, open each folder, compare the two text files inside, write a third text file to the same folder, then move on to the next.
Another method I have tried is as follows:
import os
rootDir = 'C:\\Python27\\test'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
And this outputs the following (to give you a better example of the file structure):
Found directory: C:\Python27\test
    test.py
Found directory: C:\Python27\test\asdd
    asd1.txt
    asd2.txt
Found directory: C:\Python27\test\chro
    ch1.txt
    ch2.txt
Found directory: C:\Python27\test\hway
    hw1.txt
    hw2.txt
Would it be wise to put the compare logic under the for fname in fileList loop? How do I make sure it compares the two text files inside the specific folder and not with other fnames in the fileList?
This is the full code that I am trying to add this functionality to. I apologize for the Frankenstein nature of it; I am still working on a refined version, but it does not work yet.
from collections import defaultdict
from operator import itemgetter
from itertools import groupby
from collections import deque
import os
class avs_auto:
    def load_and_compare(self, input_file1, input_file2, output_file1, output_file2, result_file):
        self.load(input_file1, input_file2, output_file1, output_file2)
        self.compare(output_file1, output_file2)
        self.final(result_file)
    def load(self, fileIn1, fileIn2, fileOut1, fileOut2):
        with open(fileIn1+'.txt') as fin1, open(fileIn2+'.txt') as fin2:
            frame_rects = defaultdict(list)
            for row in (map(str, line.split()) for line in fin1):
                id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
                frame_rects[frame].append(id)
                frame_rects[frame].append(rect)
            frame_rects2 = defaultdict(list)
            for row in (map(str, line.split()) for line in fin2):
                id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
                frame_rects2[frame].append(id)
                frame_rects2[frame].append(rect)
        with open(fileOut1+'.txt', 'w') as fout1, open(fileOut2+'.txt', 'w') as fout2:
            for frame, rects in sorted(frame_rects.iteritems()):
                fout1.write('{{{}:{}}}\n'.format(frame, rects))
            for frame, rects in sorted(frame_rects2.iteritems()):
                fout2.write('{{{}:{}}}\n'.format(frame, rects))
    def compare(self, fileOut1, fileOut2):
        with open(fileOut1+'.txt', 'r') as fin1:
            with open(fileOut2+'.txt', 'r') as fin2:
                lines1 = fin1.readlines()
                lines2 = fin2.readlines()
                diff_lines = [l.strip() for l in lines1 if l not in lines2]
                diffs = defaultdict(list)
                with open(fileOut1+'x'+fileOut2+'.txt', 'w') as result_file:
                    for line in diff_lines:
                        d = eval(line)
                        for k in d:
                            list_ids = d[k]
                            for i in range(0, len(d[k]), 2):
                                diffs[d[k][i]].append(k)
                    for id_ in diffs:
                        diffs[id_].sort()
                        for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
                            group = map(itemgetter(1), g)
                            result_file.write('{0} {1} {2}\n'.format(id_, group[0], group[-1]))
    def final(self, result_file):
        with open(result_file+'.txt', 'r') as fin:
            lines = (line.split() for line in fin)
            for k, g in groupby(lines, itemgetter(0)):
                fst = next(g)
                lst = next(iter(deque(g, 1)), fst)
                with open('final/{}.avs'.format(k), 'w') as fout:
                    fout.write('video0=ImageSource("old\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
                    fout.write('video1=ImageSource("new\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
                    fout.write('video0=BilinearResize(video0,640,480)\n')
                    fout.write('video1=BilinearResize(video1,640,480)\n')
                    fout.write('StackHorizontal(video0,video1)\n')
                    fout.write('Subtitle("ID: {}", font="arial", size=30, align=8)'.format(k))
Using the load_and_compare() function, I define two input text files, two output text files, a file for the comparison results, and a final phase that writes many files for all of the differences.
What I am trying to do is have this whole class run on the current working directory and go through every subfolder, compare the two text files, and write everything into the same folder, specifically the final() results.

You can indeed use os.walk(), since that already separates the directories from the files. You only need the directories it returns, because that's where you're looking for your 2 specific files.
You could also use os.listdir(), but that returns directories as well as files in the same list, so you would have to check for directories yourself.
Either way, once you have the directories, you iterate over them (for subdir in dirnames) and join the various path components you have: The dirpath, the subdir name that you got from iterating over the list and your filename.
Assuming there are also some directories that don't have the specific 2 files, it's a good idea to wrap the open() calls in a try..except block and thus ignore the directories where one of the files (or both of them) doesn't exist.
Finally, if you used os.walk(), you can easily choose if you only want to go into directories one level deep or walk the whole depth of the tree. In the former case, you just clear the dirnames list by dirnames[:] = []. Note that dirnames = [] wouldn't work, since that would just create a new empty list and put that reference into the variable instead of clearing the old list.
Replace the print("do something ...") with your program logic.
#!/usr/bin/env python
import errno
import os
f1 = "test1"
f2 = "test2"
path = "."
for dirpath, dirnames, _ in os.walk(path):
    for subdir in dirnames:
        # the parentheses around (f1, f2) keep this valid in Python 3 as well
        filepath1, filepath2 = [os.path.join(dirpath, subdir, f + ".txt") for f in (f1, f2)]
        try:
            with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
                print("do something with " + str(fin1) + " and " + str(fin2))
        except IOError as e:
            # ignore directories that don't contain the 2 files
            if e.errno != errno.ENOENT:
                # reraise exception if different from "file or directory doesn't exist"
                raise
    # comment the next line out if you want to traverse all subsubdirectories
    dirnames[:] = []
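The difference between dirnames[:] = [] and dirnames = [] can be checked with plain lists, independent of os.walk(). In this minimal sketch, the two helper functions and the sample lists are hypothetical and exist only for the demonstration.

```python
def clear_in_place(names):
    # Slice assignment empties the very list object the caller holds.
    names[:] = []


def rebind(names):
    # Plain assignment only points the local name at a new, empty list.
    names = []


a = ["asdd", "chro"]
b = ["asdd", "chro"]
clear_in_place(a)
rebind(b)

print(a)
print(b)
```

os.walk() keeps its own reference to the list it hands out, which is why only the in-place form stops it from descending further.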
Edit:
Based on your comments, I hope I understand your question better now.
Try the following code snippet instead. The overall structure stays the same, only now I'm using the filenames returned by os.walk(). Unfortunately, that would also make it harder to do something like "go only into the subdirectories 1 level deep", so I hope walking the tree recursively is fine with you. If not, I'll have to add a little code later.
#!/usr/bin/env python
import fnmatch
import os
filter_pattern = "*.txt"
path = "."
for dirpath, dirnames, filenames in os.walk(path):
    # comment this out if you don't want to filter
    filenames = [fn for fn in filenames if fnmatch.fnmatch(fn, filter_pattern)]
    if len(filenames) == 2:
        # comment this out if you don't want the 2 filenames to be sorted
        filenames.sort(key=str.lower)
        filepath1, filepath2 = [os.path.join(dirpath, fn) for fn in filenames]
        with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
            print("do something with " + str(fin1) + " and " + str(fin2))
I'm still not really sure what your program logic does, so you will have to interface the two yourself.
However, I noticed that you're adding the ".txt" extension to the file name explicitly all over your code, so depending on how you are going to use the snippet, you might or might not need to remove the ".txt" extension first before handing the filenames over. That would be achieved by inserting the following line after or before the sort:
filenames = [os.path.splitext(fn)[0] for fn in filenames]
Also, I still don't understand why you're using eval(). Do the text files contain python code? In any case, eval() should be avoided and be replaced by code that's more specific to the task at hand.
If it's a list of comma separated strings, use line.split(",") instead.
If there might be whitespace before or after the comma, use [word.strip() for word in line.split(",")] instead.
If it's a list of comma separated integers, use [int(num) for num in line.split(",")] instead - for floats it works analogously.
etc.
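The split-based alternatives listed above can be tried on a single sample line. This is only a sketch with an invented input line; the real file format may differ.

```python
line = "10, 20 ,30\n"  # hypothetical line of comma separated values

# Plain split keeps the surrounding whitespace and the newline.
words = line.split(",")

# Stripping each piece removes the whitespace around the commas.
stripped = [word.strip() for word in line.split(",")]

# int() and float() tolerate surrounding whitespace themselves.
integers = [int(num) for num in line.split(",")]
floats = [float(num) for num in line.split(",")]

print(words)
print(stripped)
print(integers)
print(floats)
```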

FOR loop range to process certain number of files at a time

I have a for loop that runs through a directory and processes the files there, but I'd like to only process a certain number of the files at one time. For example, I have a directory with 1000 files, but I can only process 250 of them a day, so the first time I run the script, it processes the first 250. then the next 250, and so on and so forth.
First, I'm checking the file names against an XML file that records the name of files that have already been synced, so that I don't process them a second time. Then I would like to process the next n files, where I have a variable synclimit = n
I thought about adding the in range statement to the for loop like this:
tree = ET.parse("sync_list.xml")
root = tree.getroot()
synced = [elt.text for elt in root.findall('synced/sfile')]
for filename in os.listdir(filepath) and in range (0, synclimit) :
    if fnmatch.fnmatch(filename, '*.txt') and filename not in synced:
        filename = os.path.join(filepath, filename)
        result = plistlib.readPlist(filename)
But, I'm pretty sure this will only check the first n number of files in the directory each time. Should I add the range statement to the if statement? like:
tree = ET.parse("sync_list.xml")
root = tree.getroot()
synced = [elt.text for elt in root.findall('synced/sfile')]
for filename in os.listdir(filepath):
    if fnmatch.fnmatch(filename, '*.txt') and filename not in synced and in range (0, synclimit):
        filename = os.path.join(filepath, filename)
        result = plistlib.readPlist(filename)
or is there an easier way to do this? Thank you.

Just keep a separate counter and increment that, then test if it has reached synclimit. Simple as that. There is no need to get too clever here:
processed = 0
for filename in os.listdir(filepath):
    if not filename.endswith('.txt') or filename in synced:
        continue
    # process
    processed += 1
    if processed >= synclimit:
        break  # done for today.
Alternatively, since os.listdir() returns a list, you could filter it if you have your list of already synced filenames in a set, then slice it down to your maximum size:
synced = set(elt.text for elt in root.findall('synced/sfile'))
to_process = [f for f in os.listdir(filepath) if f.endswith('.txt') and f not in synced]
for filename in to_process[:synclimit]:
    # process
Note that I just test for .endswith('.txt') instead of using your simple filematcher; the test comes down to the same thing.
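The filter-and-slice variant can be exercised on plain lists across two simulated runs. In this sketch, next_batch and the sample file names are hypothetical, and sorting the names is an added assumption so that the batches are reproducible, since os.listdir() returns entries in arbitrary order.

```python
def next_batch(filenames, synced, limit):
    """Return up to `limit` .txt names that are not yet in `synced`."""
    pending = [f for f in sorted(filenames) if f.endswith(".txt") and f not in synced]
    return pending[:limit]


all_files = ["a.txt", "b.txt", "c.txt", "d.txt", "notes.md"]
synced = set()

# First run: take two files and record them as synced.
first = next_batch(all_files, synced, 2)
synced.update(first)

# Second run: the files handled before are skipped.
second = next_batch(all_files, synced, 2)

print(first)
print(second)
```

In the real script, the synced set would be loaded from and written back to the XML file between runs.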
