Breaking the loop properly in Python

Currently I am trying to upload a set of files via an API call. The files have sequential names: part0.xml, part1.xml, etc. It loops through all the files and uploads them properly, but it seems it doesn't break the loop, and after it uploads the last available file in the directory I get an error:
No such file or directory.
And I don't really understand how to make it stop as soon as the last file in the directory is uploaded. It's probably a very dumb question, but I am really lost. How do I stop it from looping through non-existent files?
The code:
part = 0
while True:
    with open('part%d.xml' % part, 'rb') as xml:
        # here goes the API call code
        part += 1
I also tried something like this:
import glob

part = 0
for fname in glob.glob('*.xml'):
    with open('part%d.xml' % part, 'rb') as xml:
        # here goes the API call code
        part += 1
Edit: Thank you all for the answers, learned a lot. Still lots to learn. :)

You almost had it. This is your code with some stuff removed:
import glob

for fname in glob.glob('part*.xml'):
    with open(fname, 'rb') as xml:
        pass  # here goes the API call code
It is possible to make the glob more specific, but as it is, it solves the "foo.xml" problem. The key is not to use counters in Python; the idiomatic iteration is for x in y: and you don't need a counter.
Note that glob does not guarantee any particular order, so sort the result if the upload order matters; remember, though, that ['part1', 'part10', 'part2'] sort in that (lexicographic) order. There are a few ways to cope with that, as sketched below.
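If upload order matters, a natural-sort key is one way to cope; here is a minimal sketch (natural_key is a hypothetical helper that assumes names like part123.xml):
import glob
import re

def natural_key(name):
    # compare the numeric part of the name as an integer, so part2 sorts before part10
    match = re.search(r'\d+', name)
    return int(match.group()) if match else -1

for fname in sorted(glob.glob('part*.xml'), key=natural_key):
    with open(fname, 'rb') as xml:
        pass  # here goes the API call code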

Alternatively, you can simply use a regex.
import os
import re

files = [f for f in os.listdir() if re.search(r'part\d+\.xml$', f)]
for f in files:
    pass  # process each file here
This is really useful when you require more advanced filtering.
Note: you can apply the same filtering to the list returned by glob.glob().
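For instance, a minimal sketch of that combination (the regex is an assumption based on the filenames in the question):
import glob
import re

# narrow the glob results down with a regex
pattern = re.compile(r'part\d+\.xml$')
files = [f for f in glob.glob('*.xml') if pattern.search(f)]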
If you are not familiar with list comprehensions and regular expressions, I recommend reading:
Regex - howto
List Comprehensions

Your for loop is saying "for every file that ends with .xml"; if you have any file that ends with .xml that isn't a sequential part%d.xml, you're going to get an error. Imagine you have part0.xml and foo.xml. The for loop is going to loop twice; on the second loop, it's going to try to open part1.xml, which doesn't exist.
Since you know the filenames already, you don't even need to use glob.glob(); just check if each file exists before opening it, until you find one that doesn't exist.
import os
from itertools import count

filenames = ('part%d.xml' % part_num for part_num in count())
for filename in filenames:
    if os.path.exists(filename):
        with open(filename, 'rb') as xmlfile:
            do_stuff(xmlfile)
            # here goes the API call code
    else:
        break
If for any reason you're worried about files disappearing between os.path.exists(filename) and open(filename, 'rb'), this code is more robust:
import os
from itertools import count

filenames = ('part%d.xml' % part_num for part_num in count())
for filename in filenames:
    try:
        xmlfile = open(filename, 'rb')
    except IOError:
        break
    else:
        with xmlfile:
            do_stuff(xmlfile)
            # here goes the API call code

Consider what happens if there are other files that match the '*.xml' pattern.
Suppose that you have 11 files, "part0.xml"..."part10.xml", but also a file called "foo.xml".
Then the for loop will iterate 12 times (since there are 12 matches for the glob). On the 12th iteration, you are trying to open "part11.xml", which doesn't exist.
One approach is to dump the glob and just handle the exception.
part = 0
while True:
    try:
        with open('part%d.xml' % part, 'rb') as xml:
            # here goes the API call code
            part += 1
    except IOError:
        break

When you use a counter, you need to test whether the file exists:
import os
from itertools import count

for part in count():
    filename = 'part%d.xml' % part
    if not os.path.exists(filename):
        break
    with open(filename) as inp:
        pass  # do something

You are doing it wrong.
Suppose the folder has 3 files: part0.xml, part1.xml and foo.xml. The loop will iterate 3 times, and on the third iteration it will give an error when it tries to open part2.xml, which is not present.
Don't loop through all files with the extension .xml.
Loop only through files which start with 'part', have a digit in the name before the extension, and have the extension .xml.
So your code will look like this:
import glob

for fname in glob.glob('part*[0-9].xml'):
    with open(fname, 'rb') as xml:
        pass  # here goes the API call code
Read - glob – Filename pattern matching
If you want the files to be uploaded in sequential order, then read: String Natural Sort

Related

How can I read specific files in a folder (files within a range) in Python

For example, I have some 43000 txt files in my folder; however, I don't want to read all the files, just some of them within a range, like from 1.txt till 14400.txt. How can I achieve this in Python? For now, I'm reading all the files in a directory like
for each in glob.glob("data/*.txt"):
    with open(each, 'r') as file:
        content = file.readlines()
    with open('{}.csv'.format(each[0:-4]), 'w') as file:
        file.writelines(content)
Any way I can achieve the desired results?
Since glob.glob() returns a list, you can simply iterate over a section of it using a slice, something like:
import glob

for each in glob.glob("*")[:5]:
    print(each)
Just use variable list boundaries and I think this achieves the results you are looking for.
Edit: Note that slicing past the end of a list is safe in Python: an out-of-range slice simply yields a shorter (possibly empty) list, so no bounds check is needed.
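A minimal sketch of the variable-boundary idea (start and stop are hypothetical values; note that the slice selects by position in the list, not by the number in the filename, so sort first):
import glob

start, stop = 0, 14400  # hypothetical boundaries
for each in sorted(glob.glob("data/*.txt"))[start:stop]:
    print(each)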
If the files have numerically consecutive names starting with 1.txt, you can use range() to help construct the filenames:
for num in range(1, 14401):  # the stop value is exclusive, so 14401 includes 14400.txt
    filename = "data/%d.txt" % num
I found a solution here: How to extract numbers from a string in Python?
import os
import re

filepath = './'
for filename in os.listdir(filepath):
    numbers_in_name = re.findall(r'\d+', filename)
    if numbers_in_name != [] and int(numbers_in_name[0]) < 5:
        print(os.path.join(filepath, filename))
        # do other stuff with the filenames
You can use re to get the numbers in the filename. This prints all filenames whose first number is smaller than 5, for example.

opening several txt files using while loop

I have some txt files whose names follow the pattern:
arc.1.txt, arc.2.txt, ..., arc.100.txt, ..., arc.500.txt, ..., arc.838.txt
I know that we can write a program using a for loop to open the files one by one, if we know the total number of files. I want to know: is it possible to use a while loop to open them, without counting the number of files?
import glob

for each_file in glob.glob("arc.[0-9]*.txt"):  # glob patterns are not regexes
    print(each_file)
It is definitely possible to use a while loop assuming that the files are numbered in sequential order:
i = 0
while True:
    i += 1
    filename = 'arc.%d.txt' % i
    try:
        with open(filename, 'r') as file_handle:
            ...
    except IOError:
        break
Though this becomes pretty ugly with all the nesting. You're probably better off getting the list of filenames using something like glob.glob.
from glob import glob

filenames = glob('arc.*.txt')
for filename in filenames:
    with open(filename) as file_handle:
        ...
There are some race conditions associated with this second approach -- If the file somehow gets deleted between when glob found it and when it actually is time to process the file then your program could have a bad day.
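If that risk matters, one defensive pattern is to catch the error per file instead of aborting; a minimal sketch:
from glob import glob

for filename in glob('arc.*.txt'):
    try:
        f = open(filename)
    except IOError:
        continue  # the file vanished between glob and open; skip it
    with f:
        pass  # process the file here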
If you add them all to a list, you can remove them one by one and make the while condition while len(file_list) > 0:, opening the next file on each pass; a sketch of that idea follows (file_list is a hypothetical name).
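from glob import glob

file_list = glob('arc.*.txt')
while len(file_list) > 0:
    filename = file_list.pop()
    with open(filename) as file_handle:
        pass  # process the file here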

Reading all .txt files in C:\\Files\\

I have a thread in which I would like to loop through all of the .txt files in a certain directory (C:\files\). All I need is help reading anything from that directory that is a .txt file; I can't seem to figure it out. Here is my current code, which looks for specific files:
def file_Read(self):
    if self.is_connected:
        threading.Timer(5, self.file_Read).start()
    print '~~~~~~~~~~~~Thread test~~~~~~~~~~~~~~~'
    try:
        with open('C:\\files\\test.txt', 'r') as content_file:
            content = content_file.read()
        Num, Message = content.strip().split(';')
        print Num
        print Message
        print Num
        self.send_message(Num, Message)
        os.remove('C:\\files\\test.txt')
    except Exception as e:
        print 'no file ', e
        time.sleep(10)
Does anyone have a simple fix for this? I have found a lot of threads using methods like:
directory = os.path.join("c:\\files\\", "path")
threading.Timer(5, self.file_Read).start()
print '~~~~~~~~~~~~Thread test~~~~~~~~~~~~~~~'
try:
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".txt"):
                content_file = open(file, 'r')
but this doesn't seem to be working.
Any help would be appreciated. Thanks in advance...
I would do something like this, using glob:
import glob
import os

txtpattern = os.path.join("c:\\files\\", "*.txt")
files = glob.glob(txtpattern)
for f in files:
    print "Filename : %s" % f
    # Do what you want with the file
This method works only if you want to read the .txt files in the directory itself, and not in its potential subdirectories.
Take a look at the manual entries for os.walk if you need to recurse into sub-directories, or glob.glob if you are only interested in a single directory.
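A minimal sketch of the os.walk variant (note that you must join root with the filename, which is also what the snippet in the question is missing):
import os

for root, dirs, files in os.walk("c:\\files\\"):
    for name in files:
        if name.endswith(".txt"):
            path = os.path.join(root, name)  # the bare name is relative to each subdirectory
            # process the file at path here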
The main problem is that the first thing the function you start in the thread does is create a new thread running that same function.
Since every thread starts a new thread, you get an ever-increasing number of threads starting new threads, which also seems to be what happens.
If you want to do some work on all the files, and you want to do that in parallel on a multi-core machine (which is what I'm guessing), take a look at the multiprocessing module and its Queue class. But get the file handling code working first before you try to parallelize it.
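A minimal sketch of the parallel version (process_file is a hypothetical worker; multiprocessing.Pool hides the queue plumbing for you):
import glob
import os
from multiprocessing import Pool

def process_file(path):
    # hypothetical worker: read one file and return its contents
    with open(path, 'r') as f:
        return f.read()

if __name__ == '__main__':
    txt_files = glob.glob(os.path.join('c:\\files\\', '*.txt'))
    pool = Pool()  # defaults to one worker per CPU core
    results = pool.map(process_file, txt_files)
    pool.close()
    pool.join()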

Create for loop for naming output file Python

So I'm importing a list of names
e.g.
Textfile would include:
Eleen
Josh
Robert
Nastaran
Miles
my_list = ['Eleen','Josh','Robert','Nastaran','Miles']
Then I'm assigning each name to a list, and I want to write a new Excel file for each name in that list.
1. Is there any way I can create a for loop for the line:
temp = os.path.join(dir, '...' + '.xls')
def high_throughput(names):
    import os
    import re
    # Reading file
    in_file = open(names, 'r')
    dir, file = os.path.split(names)
    temp = os.path.join(dir, '***this is where i want to put a for loop for each name in the input list of names***.xls')
    out_file = open(temp, 'w')
    data = []
    for line in in_file:
        data.append(line)
    in_file.close()
I'm still not sure what you're trying to do (and by "not sure", I mean "completely baffled"), but I think I can explain some of what you're doing wrong, and how to do it right:
in_file = open(names, 'r')
dir, file = os.path.split(names)
temp = os.path.join(dir, '***this is where i want to put a for loop for each name in the input list of names***.xls')
At this point, you don't have the input list of names. That's what you're reading from in_file, and you haven't read it yet. Later on, you read those names into data, after which you can use them. So:
in_file = open(names, 'r')
dir, file = os.path.split(names)
data = []
for line in in_file:
    data.append(line.strip())  # strip the newline so it doesn't end up in the filename
in_file.close()

for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    out_file = open(temp, 'w')
Note that I put the for loop outside the function call, because you have to do that. And that's a good thing, because you presumably want to open each path (and do stuff to each file) inside that loop, not open a single path made out of a loop of files.
But if you don't insist on using a for loop, there is something that may be closer to what you were looking for: a list comprehension. You have a list of names. You can use that to build a list of paths. And then you can use that to build a list of open files. Like this:
paths = [os.path.join(dir, '{}.xls'.format(name)) for name in data]
out_files = [open(path, 'w') for path in paths]
Then, later, after you've built up the string you want to write to all the files, you can do this:
for out_file in out_files:
    out_file.write(stuff)
However, this is kind of an odd design, mainly because you have to close each file. They may get closed automatically by the garbage collector, and even if they don't, they may get flushed… but unless you get lucky, all that data you wrote is just sitting around in buffers in memory and never gets written to disk. Normally you don't want to write programs that depend on getting lucky, so you want to close your files. With this design, you'd have to do something like:
for out_file in out_files:
    out_file.close()
It's probably a lot simpler to go back to the one big loop I suggested in the first place, so you can do this:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    out_file = open(temp, 'w')
    out_file.write(stuff)
    out_file.close()
Or, even better:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    with open(temp, 'w') as out_file:
        out_file.write(stuff)
A few more comments, while we're here…
First, you really shouldn't be trying to generate .xls files manually out of strings. You can use a library like openpyxl. Or you can create .csv files instead; they're easy to create with the csv module that comes built in with Python, and Excel can handle them just as easily as .xls files. Or you can use win32com or pywinauto to take control of Excel and make it create your files. Really, anything is better than trying to generate them by hand.
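For instance, a minimal csv sketch (the header and value columns are made up for illustration; on Python 3, pass newline='' to open()):
import csv

names = ['Eleen', 'Josh', 'Robert', 'Nastaran', 'Miles']
for name in names:
    with open('{}.csv'.format(name), 'w') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'value'])  # hypothetical header row
        writer.writerow([name, 0])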
Second, the fact that you can write for line in in_file: means that in_file is some kind of sequence of lines. So, if all you want to do is convert it to a list of lines, you can do that in one step:
data = list(in_file)
But really, the only reason you want this list in the first place is so you can loop around it later, creating the output files, right? So why not just hold off, and loop over the lines in the file in the first place?
Whatever you do to generate the output stuff, do that first. Then loop over the file with the list of filenames and write stuff. Like this:
stuff = # whatever you were doing later, in the code you haven't shown
dir = os.path.dirname(names)
with open(names, 'r') as in_file:
    for line in in_file:
        temp = os.path.join(dir, '{}.xls'.format(line.strip()))  # strip the newline
        with open(temp, 'w') as out_file:
            out_file.write(stuff)
That replaces all of the code in your sample (except for that function named high_throughput that imports some modules locally and then does nothing).
Take a look at openpyxl, especially if you need to create .xlsx files. The example below assumes the Excel workbooks are created blank.
from openpyxl import Workbook

names = ['Eleen', 'Josh', 'Robert', 'Nastaran', 'Miles']
for name in names:
    wb = Workbook()
    wb.save('{0}.xlsx'.format(name))
Try this:
in_file = open(names, 'r')
dir, file = os.path.split(names)
for name in in_file:
    temp = os.path.join(dir, name.strip() + '.xls')  # strip the trailing newline
    with open(temp, 'w') as out_file:
        pass  # write data to out_file

Python - Strip all drive letters from csv file and replace with Z:

Here is the code example. Basically, output.csv needs to remove any drive letter A: through Y: and replace it with Z:. I tried to do this with a list (not complete yet), but it generates the error: TypeError: expected a character buffer object.
#!/usr/bin/python
import os.path
import os
import shutil
import csv
import re

# Create the videos directory in the current directory
# If the directory exists ignore it.
#
# Moves all files with the .wmv extension to the
# videos folder for file structure
#
# Crawl the videos directory then change to videos directory
# create the videos.csv file in the videos directory
# replace any drive letter A:-Y: with Z:

def createCSV():
    directory = "videos"
    if not os.path.isdir("." + directory + "/"):
        os.mkdir("./" + directory + "/")
    for file in os.listdir("./"):
        if os.path.splitext(file)[1] == ".wmv":
            shutil.move(file, os.path.join("videos", file))
    listDirectory = os.listdir("videos")
    os.chdir(directory)
    f = open("videos.csv", "w")
    f.writelines(os.path.join(os.getcwd(), f + '\n') for f in listDirectory)
    f = open('videos.csv', 'r')
    w = open('output.csv', 'w')
    f_cont = f.readlines()
    for line in f_cont:
        regex = re.compile("\b[GHI]:")
        re.sub(regex, "Z:", line)
        w.write(line)
    f.close()

createCSV()
EDIT:
I think my flow/logic is wrong; the output.csv file that gets created still has G: in it, and nothing was renamed to Z: by the re.sub line.
I can see you use some pythonic snippets, with smart uses of path.join and commented code. This can get even better; let's rewrite a few things so we can solve your drive-letter issue and end up with more pythonic code along the way:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

# Firstly, modules can be documented using a docstring, so drop the comments
"""
Create the videos directory in the current directory
If the directory exists ignore it.
Moves all files with the .wmv extension to the
videos folder for file structure
Crawl the videos directory then change to videos directory
create the videos.csv file in the videos directory
create output.csv replace any drive letter A:-Y: with Z:
"""

# no need to import both os and os.path, as the second is contained in the first
import os
import shutil
import csv
# import glob, it will be handy
import glob
import ntpath  # this is to split the drive

# don't really need to use a function
# Here, don't bother checking if the directory exists,
# and you don't need to add any slash either
directory = "videos"
ext = "*.wmv"

try:
    os.mkdir(directory)
except OSError:
    pass

listDirectory = []  # keep a list of the files so there is no need to list the dir twice
for file in glob.glob(ext):  # much easier this way, isn't it?
    shutil.move(file, os.path.join(directory, file))  # good catch for shutil :-)
    listDirectory.append(file)

os.chdir(directory)

# you've smartly imported the csv module, so let's use it!
f = open("videos.csv", "w")
vid_csv = csv.writer(f)
w = open('output.csv', 'w')
out_csv = csv.writer(w)

# let's do everything in one loop
for file in listDirectory:
    file_path = os.path.abspath(file)
    # Python includes functions to deal with drive letters :-D
    # I use ntpath because I am under linux, but you can use the
    # normal os.path functions on windows with the same names
    file_path_with_new_letter = ntpath.join("Z:", ntpath.splitdrive(file_path)[1])
    # let's write the csv rows, using tuples
    vid_csv.writerow((file_path,))
    out_csv.writerow((file_path_with_new_letter,))
It seems like the problem is in the loop at the bottom of your code. The string's replace method doesn't take a list as its first argument, but another string. You need to loop through your removeDrives list and call line.replace with every item in that list.
You could use
for driveletter in removedrives:
    line = line.replace(driveletter, 'Z:')
thereby iterating over your list and replacing one of the possible drive letters after the other. As abyx wrote, replace expects a string, not a list, so you need this extra step.
Or use a regular expression like
import re

regex = re.compile(r"\b[FGH]:")
line = re.sub(regex, "Z:", line)  # keep the return value; re.sub does not modify line in place
Additional bonus: Regex can check that it's really a drive letter and not, for example, a part of something bigger like OH: hydrogen group.
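Putting it together, a minimal end-to-end sketch using the question's file names ([A-Y] covers every drive letter that should become Z:):
import re

drive_re = re.compile(r"\b[A-Y]:")
with open("videos.csv") as src, open("output.csv", "w") as dst:
    for line in src:
        dst.write(drive_re.sub("Z:", line))  # sub returns the new string; the original is untouched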
Apart from that, I suggest you use os.path's own path manipulation functions instead of trying to implement them yourself.
And of course, if you do anything further with the CSV file, take a look at the csv module.
A commenter above has already mentioned that you should close all the files you've opened, or use the with statement:
with open("videos.csv", "w") as f:
    do_stuff()
