I have a thread that I would like to loop through all of the .txt files in a certain directory (C:\files\). All I need is help reading anything from that directory that is a .txt file; I can't seem to figure it out. Here is my current code, which looks for a specific file:
def file_Read(self):
    if self.is_connected:
        threading.Timer(5, self.file_Read).start()
        print '~~~~~~~~~~~~Thread test~~~~~~~~~~~~~~~'
        try:
            with open('C:\\files\\test.txt', 'r') as content_file:
                content = content_file.read()
            Num, Message = content.strip().split(';')
            print Num
            print Message
            print Num
            self.send_message(Num + ',' + Message)
            os.remove('C:\\files\\test.txt')
        except Exception as e:
            print 'no file ', e
            time.sleep(10)
Does anyone have a simple fix for this? I have found a lot of threads using methods like:
directory = os.path.join("c:\\files\\", "path")
threading.Timer(5, self.file_Read).start()
print '~~~~~~~~~~~~Thread test~~~~~~~~~~~~~~~'
try:
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".txt"):
                content_file = open(file, 'r')
but this doesn't seem to be working.
Any help would be appreciated. Thanks in advance...
I would do something like this, using glob:
import glob
import os

txtpattern = os.path.join("c:\\files\\", "*.txt")
files = glob.glob(txtpattern)

for f in files:
    print "Filename : %s" % f
    # Do what you want with the file
This method only finds .txt files in the directory itself, not in its potential subdirectories.
Take a look at the manual entries for os.walk if you need to recurse into sub-directories, or glob.glob if you are only interested in a single directory.
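For example, a recursive variant with os.walk might look like this (a rough sketch, assuming the C:\files\ directory from the question; note the bare filename has to be joined with root before it can be opened):

import os

for root, dirs, files in os.walk("c:\\files\\"):
    for name in files:
        if name.endswith(".txt"):
            # join the directory and the bare filename; open(name) alone
            # would look in the current working directory instead
            path = os.path.join(root, name)
            print path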
The main problem is that the first thing you do in the function you want to run in the thread is create a new thread running that same function.
Since every thread will start a new thread, you get an increasing number of threads starting new threads, which also seems to be what is happening.
If you want to do some work on all the files, and you want to do that in parallel on a multi-core machine (which is what I'm guessing) take a look at the multiprocessing module, and the Queue class. But get the file handling code working first before you try to parallelize it.
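A minimal sketch of that idea (an illustration, not the poster's code): it uses a Pool rather than a hand-rolled Queue for brevity, the C:\files\ directory from the question, and a hypothetical process_file worker.

import glob
import os
from multiprocessing import Pool

def process_file(path):
    # stand-in for the real per-file work
    with open(path) as f:
        return path, len(f.read())

if __name__ == '__main__':
    paths = glob.glob(os.path.join("c:\\files\\", "*.txt"))
    pool = Pool()  # defaults to one worker process per CPU core
    try:
        results = pool.map(process_file, paths)
    finally:
        pool.close()
        pool.join()
    for path, size in results:
        print path, size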
Related
I need to run the following script for each txt file located in all subfolders.
The main folder is "simulations", in which there are different subfolders (named "year-month-day"). In each subfolder there is a txt file, "diagno.inp". I have to run this script for each "diagno.inp" file in order to get a list with the following data (a row for each day):
"year-month-day", "W_int", "W_dir"
Here's the code, which works for only one subfolder. Can you help me create a loop?
fid = open('/Users/silviamassaro/weather/simulations/20180105/diagno.inp', "r")
subfolder = "20180105"
n = fid.read().splitlines()[51:]
for element in n:
    "do something"  # here code to calculate W_dir and W_int for each day
print(subfolder, W_int, W_dir)
Here's what I usually do when I need to loop over a directory and its children recursively:
import os

main_folder = '/path/to/the/main/folder'
files_to_process = [os.path.join(main_folder, child) for child in os.listdir(main_folder)]

while files_to_process:
    child = files_to_process.pop()
    if os.path.isdir(child):
        files_to_process.extend(os.path.join(child, sub_child) for sub_child in os.listdir(child))
    else:
        # We have a file here, we can do what we want with it
        pass
It's short, but has pretty strong assumptions:
You don't care about the order in which the files are treated.
You only have either directories or regular files in the children of your entry point.
Edit: added another possible solution using glob, thanks to @jacques-gaudin's comment.
This solution has the advantage that you are sure to get only .inp files, but you are still not sure of their order.
import glob

main_folder = '/path/to/the/main/folder'
files_to_process = glob.glob('%s/**/*.inp' % main_folder, recursive=True)

for found_file in files_to_process:
    # We have a file here, we can do what we want with it
    pass
Hope this helps!
With pathlib you can do something like this:
from pathlib import Path

sim_folder = Path("path/to/simulations/folder")

for inp_file in sim_folder.rglob('*.inp'):
    subfolder = inp_file.parent.name
    with open(inp_file, 'r') as fid:
        n = fid.read().splitlines()[51:]
    for element in n:
        "do something"  # here code to calculate W_dir and W_int for each day
    print(subfolder, W_int, W_dir)
Note this is recursively traversing all subfolders to look for .inp files.
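Since each diagno.inp sits exactly one level down, a non-recursive variant is also possible (a sketch, reusing the hypothetical folder path from above):

from pathlib import Path

sim_folder = Path("path/to/simulations/folder")

# a single '*' matches only the immediate year-month-day subfolders
for inp_file in sim_folder.glob('*/diagno.inp'):
    print(inp_file.parent.name, inp_file)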
I have some Python code that performs operations on a text file. I need to run this code over 200+ text files that are all stored in the same folder.
I want the code to open one text file at a time, perform the operations and then start over with the next text file.
Can you give me some pointers regarding how I can do this?
My code is like this:
def main():
    text_file = open("filename.txt", "r")
    #operations
    text_file.close()

main()
Use listdir to iterate through files.
import os

def main():
    for filename in os.listdir(somedir):
        filepath = os.path.join(somedir, filename)
        if os.path.isfile(filepath):  # Is filepath really a file, not a directory?
            text_file = open(filepath, "r")
            #operations
            text_file.close()

main()
As noted in the comments, it's better to use with.
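With that change, the loop might look like this (a sketch; somedir is a placeholder, as above):

import os

somedir = "/path/to/folder"  # hypothetical directory

def main():
    for filename in os.listdir(somedir):
        filepath = os.path.join(somedir, filename)
        if os.path.isfile(filepath):
            with open(filepath, "r") as text_file:
                pass  # operations; the file is closed automatically, even on error

main()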
Not sure how many files you have, but you could also look at
http://www.dabeaz.com/generators-uk/genfind.py and then send its output to http://www.dabeaz.com/generators-uk/genopen.py
This is an ideal solution for processing a lot of data.
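The idea in those two files is roughly the following (an illustrative sketch, not the exact code from the site):

import os
import fnmatch

def gen_find(pattern, top):
    # lazily yield the paths of matching files under top
    for dirpath, dirnames, filenames in os.walk(top):
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)

def gen_open(paths):
    # lazily yield opened file objects, one at a time; each file is
    # closed when the consumer advances to the next one
    for path in paths:
        with open(path) as f:
            yield f

for text_file in gen_open(gen_find("*.txt", "/path/to/folder")):
    pass  # operations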
Thanks Dave ;-)
I am currently reading in 200 dicom images manually using the code:
ds1 = dicom.read_file('1.dcm')
So far this has worked, but I am trying to make my code shorter and easier to use by creating a loop to read in the files:
for filename in os.listdir(dirName):
    dicom_file = os.path.join("/", dirName, filename)
    exists = os.path.isfile(dicom_file)
    print filename
    ds = dicom.read_file(dicom_file)
This code is not currently working and I am receiving the error:
"raise InvalidDicomError("File is missing 'DICM' marker. "
dicom.errors.InvalidDicomError: File is missing 'DICM' marker. Use
force=True to force reading
Could anyone advise me on where I am going wrong, please?
I think the line:
dicom_file = os.path.join("/",dirName,filename)
might be an issue. It will join all three parts to form a path rooted at '/'. For example:
os.path.join("/","directory","file")
will give you "/directory/file" (an absolute path), while:
os.path.join("directory","file")
will give you "directory/file" (a relative path)
If you know that all the files you want are "*.dcm"
you can try the glob module:
import glob
files_with_dcm = glob.glob("*.dcm")
This will also work with full paths:
import glob
files_with_dcm = glob.glob("/full/path/to/files/*.dcm")
But os.listdir(dirName) will include everything in the directory, including other directories, dot files, and whatnot.
Your exists = os.path.isfile(dicom_file) line will filter out all the non-files if you add an "if exists:" check before reading.
I would recommend the glob approach, if you know the pattern, otherwise:
if exists:
    try:
        ds = dicom.read_file(dicom_file)
    except InvalidDicomError as exc:
        print "something wrong with", dicom_file
If you do a try/except, the if exists: is a bit redundant, but doesn't hurt...
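Putting the two together, the whole loop might look like this (a sketch; it assumes the dicom module from the question and the InvalidDicomError from its traceback):

import glob
import dicom
from dicom.errors import InvalidDicomError

for dicom_file in glob.glob("/full/path/to/files/*.dcm"):
    try:
        ds = dicom.read_file(dicom_file)
    except InvalidDicomError:
        print "something wrong with", dicom_file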
Try adding:
dicom_file = os.path.join("/", dirName, filename)
if not dicom_file.endswith('.dcm'):
    continue
Currently I am trying to upload a set of files via an API call. The files have sequential names: part0.xml, part1.xml, etc. It loops through all the files and uploads them properly, but it seems it doesn't break the loop, and after it uploads the last available file in the directory I get an error:
No such file or directory.
And I don't really understand how to make it stop as soon as the last file in the directory is uploaded. It's probably a very dumb question, but I am really lost. How do I stop it from looping over non-existent files?
The code:
part = 0
while True:
    with open('part%d.xml' % part, 'rb') as xml:
        #here goes the API call code
        pass
    part += 1
I also tried something like this:
import glob

part = 0
for fname in glob.glob('*.xml'):
    with open('part%d.xml' % part, 'rb') as xml:
        #here goes the API call code
        pass
    part += 1
Edit: Thank you all for the answers, learned a lot. Still lots to learn. :)
You almost had it. This is your code with some stuff removed:
import glob

for fname in glob.glob('part*.xml'):
    with open(fname, 'rb') as xml:
        #here goes the API call code
        pass
It is possible to make the glob more specific, but as it is it solves the "foo.xml" problem. The key is to not use counters in Python; the idiomatic iteration is for x in y: and you don't need a counter.
glob will return the filenames in alphabetical order, so you don't even have to worry about that; however, remember that ['part1', 'part10', 'part2'] sorts in that order. There are a few ways to cope with that, but it would be a separate question; a possible sketch follows.
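One possible natural-sort key (an illustrative sketch, not part of the original answer):

import re

def natural_key(name):
    # split 'part10.xml' into ['part', 10, '.xml'] so the numeric runs
    # compare as integers instead of strings
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r'(\d+)', name)]

names = ['part1.xml', 'part10.xml', 'part2.xml']
print(sorted(names, key=natural_key))  # ['part1.xml', 'part2.xml', 'part10.xml']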
Alternatively, you can simply use a regex.
import os, re

files = [f for f in os.listdir() if re.search(r'part[\d]+\.xml$', f)]
for f in files:
    #process..
    pass
This will be really useful if you require more advanced filtering.
Note: you can do similar filtering on the list returned by glob.glob() (see the sketch after these links).
If you are not familiar with list comprehensions and regexes, I would recommend you refer to:
Regex - howto
List Comprehensions
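As noted above, the same filtering can be applied to glob's output (a short sketch):

import glob
import re

files = [f for f in glob.glob('*.xml') if re.search(r'part\d+\.xml$', f)]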
Your for loop is saying "for every file that ends with .xml"; if you have any file that ends with .xml that isn't a sequential part%d.xml, you're going to get an error. Imagine you have part0.xml and foo.xml. The for loop is going to loop twice; on the second loop, it's going to try to open part1.xml, which doesn't exist.
Since you know the filenames already, you don't even need to use glob.glob(); just check if each file exists before opening it, until you find one that doesn't exist.
import os
from itertools import count

filenames = ('part%d.xml' % part_num for part_num in count())
for filename in filenames:
    if os.path.exists(filename):
        with open(filename, 'rb') as xmlfile:
            do_stuff(xmlfile)
            # here goes the API call code
    else:
        break
If for any reason you're worried about files disappearing between os.path.exists(filename) and open(filename, 'rb'), this code is more robust:
import os
from itertools import count

filenames = ('part%d.xml' % part_num for part_num in count())
for filename in filenames:
    try:
        xmlfile = open(filename, 'rb')
    except IOError:
        break
    else:
        with xmlfile:
            do_stuff(xmlfile)
            # here goes the API call code
Consider what happens if there are other files that match the '*.xml' pattern.
Suppose that you have 11 files, "part0.xml"..."part10.xml", but also a file called "foo.xml".
Then the for loop will iterate 12 times (since there are 12 matches for the glob). On the 12th iteration, you are trying to open "part11.xml" which doesn't exist.
One approach is to dump the glob and just handle the exception:
part = 0
while True:
    try:
        with open('part%d.xml' % part, 'rb') as xml:
            #here goes the API call code
            part += 1
    except IOError:
        break
When you use a counter, you need to test if the file exists:
import os
from itertools import count

for part in count():
    filename = 'part%d.xml' % part
    if not os.path.exists(filename):
        break
    with open(filename) as inp:
        # do something
        pass
You are doing it wrong.
Suppose the folder has 3 files: part0.xml, part1.xml and foo.xml. The loop will iterate 3 times and fail on the third iteration, when it tries to open part2.xml, which is not present.
Don't loop through all files with the extension .xml.
Loop only through files which start with 'part', have a digit in the name before the extension, and have the extension .xml.
So your code will look like this:
import glob

for fname in glob.glob('part*[0-9].xml'):
    with open(fname, 'rb') as xml:
        #here goes the API call code
        pass
Read: glob – Filename pattern matching
If you want the files to be uploaded in sequential order, then read: String Natural Sort
I can't save this output; maybe someone has the solution. I'm listing a directory and some single files, but when I save the output it only catches the files from the directory, not the single files.
My code:
import os

tosave = open('/tmp/list', 'ab')
thesource = ["/etc/ssh", "/var/log/syslog", "/etc/hosts"]

for f in thesource:
    print f
    for top, dirs, files in os.walk(f):
        for nm in files:
            print os.path.join(top, nm)
            try:
                tosave.write(top+nm+'\n')
            finally:
                tosave.close
I see all the files and directories in the console, but the saved list has just the ssh files. Why didn't it save syslog and hosts too?
Thank you !!
In case you missed the () at tosave.close while pasting (otherwise check harsh's answer):
The finally is wrong here. The code in finally will be executed after the try block, so after the first execution of tosave.write(top+nm+'\n') the file will be closed because of tosave.close().
Possibly you intended to use except:
# snip
try:
    tosave.write(top+nm+'\n')
except:
    tosave.close()
Edit: To answer your comment, you want the last line to be the same as the print statement:
tosave.write(os.path.join(top, nm) + '\n')
You could try adding tosave.flush() at the end. Buffering does cause problems sometimes; a flush call is sometimes required to empty the contents of the buffer into the file.
Check if this works for you:

import os

tosave = open('/tmp/list', 'ab')
thesource = ["/etc/ssh", "/var/log/syslog", "/etc/hosts"]

for f in thesource:
    if os.path.isdir(f):
        for top, dirs, files in os.walk(f):
            for nm in files:
                tosave.write(top+nm+'\n')
    if os.path.isfile(f):
        tosave.write(f+'\n')

tosave.close()
With all your help I found a solution and it's working. I'll share it:
import os

tosave = open('/tmp/list', 'ab')
thesource = ["/etc/ssh", "/var/log/syslog", "/etc/hosts"]

for f in thesource:
    if os.path.isfile(f):
        print f
        tosave.write(f+'\n')
    else:
        for top, dirs, files in os.walk(f):
            for nm in files:
                print os.path.join(top, nm)
                tosave.write(top+nm+'\n')

tosave.close()
Thank you all for your help !!!
Maybe this is because you're opening the file in append mode (the 'a') and then looking at its beginning? Look at its end - you might see your new files listed there.
In append mode, each time the script runs it appends its output to the file. Usually what one wants is just the write mode ('w' instead of 'a'), which overwrites the file each time.
Put the close() at the end of the script.
Otherwise it will close the file after the first step in the loop, making the file unwritable.
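A variant that sidesteps the manual close() entirely (a sketch based on the poster's solution, using a with block and write mode):

import os

thesource = ["/etc/ssh", "/var/log/syslog", "/etc/hosts"]

with open('/tmp/list', 'w') as tosave:  # 'w' truncates the file on each run
    for f in thesource:
        if os.path.isfile(f):
            tosave.write(f + '\n')
        else:
            for top, dirs, files in os.walk(f):
                for nm in files:
                    tosave.write(os.path.join(top, nm) + '\n')
# the file is closed automatically when the with block exits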