I have a python script that checks a certain folder for new files and then copies the new files to another directory. The files are in such a format 1234.txt and 1234_status.txt. It should only move 1234.txt and leave the 1234_status.txt unattended.
Here's a little piece of my code in python
while 1:
    # retrieves listdir
    after = dict([(f, None) for f in os.listdir(path_to_watch)])
    # if after has more files than before, then it adds the new files to an array "added"
    added = [f for f in after if f not in before]
My idea is that after it fills added, it checks it for values that have status in them and pops them from the array. I couldn't find a way to do this, though.
If I understand your problem correctly:
while 1:
    for f in os.listdir(path_to_watch):
        if 'status' not in f: # or a more appropriate condition
            move_file_to_another_directory(f)
    # wait
Alternatively, if you are on Linux, look at pyinotify to avoid useless polling.
added = [f for f in after if not f in before and '_status' not in f]
I do, however, recommend avoiding long one-line statements, as they make the code much harder to read:
files_in_directory = [filename for filename in os.listdir(directory_name)]
files_to_move = filter(lambda filename: '_status' not in filename, files_in_directory)
You can use set logic since order doesn't matter here:
from itertools import filterfalse

def is_status_file(filename):
    return filename.endswith('_status.txt')

# ...
added = set(after) - set(before)
without_status = filterfalse(is_status_file, added)
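Putting the pieces above together, a minimal polling sketch might look like the following. The helper names new_data_files and copy_new_files are hypothetical, and shutil.copy2 is only an assumption for the copy step, since the question does not show how the files are copied.

```python
import os
import shutil


def is_status_file(filename):
    # Matches names such as "1234_status.txt"
    return filename.endswith('_status.txt')


def new_data_files(before, after):
    """Return names present in `after` but not in `before`, minus status files."""
    added = set(after) - set(before)
    return sorted(f for f in added if not is_status_file(f))


def copy_new_files(path_to_watch, destination, before):
    """Copy new non-status files; return the current listing (hypothetical helper)."""
    after = os.listdir(path_to_watch)
    for name in new_data_files(before, after):
        # shutil.copy2 is an assumed choice for the copy step
        shutil.copy2(os.path.join(path_to_watch, name), destination)
    return after
```

In a watcher loop this would be called as before = copy_new_files(src, dst, before), followed by a time.sleep() between polls.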
After I execute a python script from a particular directory, I get many output files but apart from 5-6 files I want to delete the rest from that directory. What I have done is, I have taken those 5-6 useful files inside a list and deleted all the other files which are not there in that list. Below is my code:
list1=['prog_1.py', 'prog_2.py', 'prog_3.py'] #Extend
import os
dir = '/home/dev/codes' #Change accordingly
for f in os.listdir(dir):
    if f not in list1:
        os.remove(os.path.join(dir, f))
Now here I just want to add one more thing, if the output files start with output_of_final, then I don't want them to be deleted. How can I do it? Should I use regex?
You could use a regex, but that's overkill here. Just use the str.startswith method.
Also, it's bad practice to use the names of built-in types and functions as variable names, because doing so shadows the built-in. I have renamed dir to directory. (https://docs.python.org/3/library/functions.html#dir)
list1 = ['prog_1.py', 'prog_2.py', 'prog_3.py'] # Extend
import os
directory = '/home/dev/codes' # Change accordingly
for f in os.listdir(directory):
    if f not in list1 and not f.startswith('output_of_final'):
        os.remove(os.path.join(directory, f))
Yes, a regex works here, but there are easier options, such as the startswith method for strings:
list1=['prog_1.py', 'prog_2.py', 'prog_3.py'] #Extend
import os
dir = '/home/dev/codes' #Change accordingly
for f in os.listdir(dir):
    if (f not in list1) and (not f.startswith('output_of_final')):
        os.remove(os.path.join(dir, f))
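Because deleting files is hard to undo, it may help to preview what would be removed first. The following is only a sketch: the function names should_keep and remove_unwanted and the dry_run flag are hypothetical, and the os.path.isfile check is an added assumption so that subdirectories are skipped (os.remove fails on directories).

```python
import os


def should_keep(filename, keep_names, keep_prefix='output_of_final'):
    # Keep files that are listed explicitly or that start with the prefix
    return filename in keep_names or filename.startswith(keep_prefix)


def remove_unwanted(directory, keep_names, keep_prefix='output_of_final', dry_run=True):
    """Return the names that are (or would be) removed from `directory`."""
    removed = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Skip subdirectories and anything we were asked to keep
        if os.path.isfile(full_path) and not should_keep(name, keep_names, keep_prefix):
            if not dry_run:
                os.remove(full_path)
            removed.append(name)
    return sorted(removed)
```

Calling remove_unwanted(directory, list1) with the default dry_run=True only reports the candidates; pass dry_run=False to actually delete them.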
I'm new to Python and got stuck on a problem I encountered while studying loops and folder navigation.
The task is simple: loop through a folder and count all '.txt' files.
I believe there may be some modules to tackle this task easily and I would appreciate it if you can share them. But since this is just a random question I encountered while learning python, it would be nice if this can be solved using the tools I just acquired, like for/while loops.
I used for and while clauses to loop through a folder. However, I'm unable to loop through a folder entirely.
Here is the code I used:
import os
count=0 # set count default
path = 'E:\\' # set path
while os.path.isdir(path):
    for file in os.listdir(path): # loop through the folder
        print(file) # print text to keep track the process
        if file.endswith('.txt'):
            count+=1
            print('+1') #
        elif os.path.isdir(os.path.join(path,file)): #if it is a subfolder
            print(os.path.join(path,file))
            path=os.path.join(path,file)
            print('is dir')
            break
        else:
            path=os.path.join(path,file)
Since the number of files and subfolders in a folder is unknown, I think a while loop is appropriate here. However, my code has many errors or pitfalls I don't know how to fix. For example, if multiple subfolders exist, this code will only loop through the first subfolder and ignore the rest.
Your problem is that you quickly end up trying to look at non-existent files. Imagine a directory structure where a non-directory named A (E:\A) is seen first, then a file b (E:\b).
On your first loop, you get A, detect that it does not end in .txt and that it is not a directory, so the else branch changes path to E:\A.
On your second iteration, you get b (meaning E:\b), but all your tests (aside from the .txt extension test) and operations concatenate it with the new path, so you test relative to E:\A\b, not E:\b.
Similarly, if E:\A is a directory, you break the inner loop immediately, so even if E:\c.txt exists, if it occurs after A in the iteration order, you never even see it.
Directory tree traversal code must involve a stack of some sort, either explicitly (by appending to and popping from a list of directories for eventual processing), or implicitly (via recursion, which uses the call stack to achieve the same purpose).
In any event, your specific case should really just be handled with os.walk:
for root, dirs, files in os.walk(path):
    print(root) # print text to keep track the process
    count += sum(1 for f in files if f.endswith('.txt'))
    # This second line matches your existing behavior, but might not be intended
    # Remove it if directories ending in .txt should not be included in the count
    count += sum(1 for d in dirs if d.endswith('.txt'))
Just for illustration, the explicit stack approach to your code would be something like:
import os
count = 0 # set count default
paths = ['E:\\'] # Make stack of paths to process
while paths:
    # paths.pop() gets top of directory stack to process
    # os.scandir is easier and more efficient than os.listdir,
    # though it must be closed (but with statement does this for us)
    with os.scandir(paths.pop()) as entries:
        for entry in entries: # loop through the folder
            print(entry.name) # print text to keep track the process
            if entry.name.endswith('.txt'):
                count += 1
                print('+1')
            elif entry.is_dir(): #if it is a subfolder
                print(entry.path, 'is dir')
                # Add to paths stack to get to it eventually
                paths.append(entry.path)
You probably want to apply recursion to this problem. In short, you will need a function to handle directories that will call itself when it encounters a sub-directory.
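As a rough illustration of that recursive idea, here is a minimal sketch. The function name count_txt_files is hypothetical, and it counts only regular entries whose names end in .txt (directories are descended into rather than counted, and symlinked directories are deliberately not followed).

```python
import os


def count_txt_files(directory):
    """Recursively count entries ending in '.txt' below `directory`."""
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # The function calls itself for each sub-directory
                count += count_txt_files(entry.path)
            elif entry.name.endswith('.txt'):
                count += 1
    return count
```

For very deep trees the explicit-stack or os.walk versions avoid Python's recursion limit, but for ordinary folder hierarchies the recursive form is usually fine.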
This might be more than you need, but it will allow you to list all the files within the directory that are .txt files but you can also add criteria to the search within the files as well. Here is the function:
def file_search(root,extension,search,search_type):
    import pandas as pd
    import os
    col1 = []
    col2 = []
    rootdir = root
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            if "." + extension in file.lower():
                try:
                    with open(os.path.join(subdir, file)) as f:
                        contents = f.read()
                    if search_type == 'any':
                        if any(word.lower() in contents.lower() for word in search):
                            col1.append(subdir)
                            col2.append(file)
                    elif search_type == 'all':
                        if all(word.lower() in contents.lower() for word in search):
                            col1.append(subdir)
                            col2.append(file)
                except (OSError, UnicodeDecodeError):
                    # skip files that cannot be opened or decoded as text
                    pass
    df = pd.DataFrame({'Folder':col1,
                       'File':col2})[['Folder','File']]
    return df
Here is an example of how to use the function:
search_df = file_search(root = 'E:\\',
                        search=['foo','bar'], #words to search for
                        extension = 'txt', #could change this to 'csv' or 'sql' etc.
                        search_type = 'all') #use any or all
search_df
The analysis of your code has already been addressed quite well by @ShadowRanger's answer.
I will try to address this part of your question:
there may be some modules to tackle this task easily
For this kind of task, there actually exists the glob module, which implements Unix-style pathname pattern expansion.
To count the number of .txt files in a directory and all its subdirectories, one may simply use the following:
import os
from glob import iglob, glob
dirpath = '.' # for example
# getting all matching elements in a list and computing its length
len(glob(os.path.join(dirpath, '**/*.txt'), recursive=True))
# 772
# or iterating through all matching elements and summing 1 each time a new item is found
# (this approach is more memory-efficient)
sum(1 for _ in iglob(os.path.join(dirpath, '**/*.txt'), recursive=True))
# 772
Basically glob.iglob() is the iterator version of glob.glob().
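To make the difference concrete, the sketch below counts matches both ways for a given directory. The function name count_txt is hypothetical. Note that, as far as the standard glob behaviour goes, names starting with a dot are skipped by default, and a directory whose name ends in .txt would also match the pattern.

```python
import os
from glob import glob, iglob


def count_txt(dirpath):
    """Return (count via glob list, count via iglob iterator) for '*.txt' matches."""
    pattern = os.path.join(dirpath, '**', '*.txt')
    as_list = glob(pattern, recursive=True)   # builds the full list in memory
    lazily = iglob(pattern, recursive=True)   # yields one path at a time
    return len(as_list), sum(1 for _ in lazily)
```

Both numbers should agree; the iterator version simply avoids holding every path in memory at once.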
For nested directories, it's easier to use a function like os.walk. Take this for example:
subfiles = []
for dirpath, subdirs, files in os.walk(path):
    for x in files:
        if x.endswith(".txt"):
            subfiles.append(os.path.join(dirpath, x))
This builds a list of all .txt files. Otherwise, you'll need to use recursion for a task like this.
I've got a bunch of files and a few folders. I'm trying to append the zips to a list so I can extract those files in other part of the code. It never finds the zips.
for file in os.listdir(path):
    print(file)
    if file.split(".")[1] == 'zip':
        reg_zips.append(file)
The path is fine or it wouldn't print out anything. It picks up the same files each time but will not pick up any others. It picks up about 1/5th of the files in the directory.
At a complete loss. I've made sure that some weird race condition with the file availability isn't the problem by putting a time.sleep(3) in the code. Didn't solve it.
It's possible your files have more than one period in them. Try using str.endswith:
reg_zips = []
for file in os.listdir(path):
    if file.endswith('.zip'):
        reg_zips.append(file)
Another good idea (thanks, Jean-François Fabre!) is to use os.path.splitext, which handles the extension quite nicely:
if os.path.splitext(file)[-1] == '.zip':
...
An even better solution, which I recommend, is the glob.glob function (note that this pattern is matched relative to the current working directory):
import glob
reg_zips = glob.glob('*.zip')
reg_zips = [z for z in os.listdir(path) if z.endswith("zip")]
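To see why the split-based test in the question misses files, compare it with os.path.splitext on a few made-up names. The helper names second_dot_field and is_zip are hypothetical, and the case-insensitive comparison is an added assumption.

```python
import os


def second_dot_field(name):
    """Mimic file.split(".")[1] from the question; None when there is no dot."""
    parts = name.split(".")
    return parts[1] if len(parts) > 1 else None


def is_zip(name):
    """Compare only the final extension, ignoring case (an assumption here)."""
    return os.path.splitext(name)[1].lower() == ".zip"


for name in ["data.zip", "backup.2020.zip", "README"]:
    print(name, second_dot_field(name), is_zip(name))
# data.zip zip True
# backup.2020.zip 2020 True
# README None False
```

Any name with an extra period puts something other than the extension at index 1, and a name with no period at all would make the original expression raise IndexError.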
I am writing a method that takes a filename and a path to a directory and returns the next available filename in the directory or None if there are no files with names that would sort after the file.
There are plenty of questions about how to list all the files in a directory or iterate over them, but I am not sure if the best solution to finding a single next filename is to use the list that one of the previous answers generated and then find the location of the current file in the list and choose the next element (or None if we're already on the last one).
EDIT: here's my current file-picking code. It's reused from a different part of the project, where it is used to pick a random image from a potentially nested series of directories.
import os
import random

# picks a file from a directory
# if the file is also a directory, pick a file from the new directory
# this might choke up if it encounters a directory only containing invalid files
def pickNestedFile(directory, bad_files):
    file=None
    while file is None or file in bad_files:
        file=random.choice(os.listdir(directory))
    #file=directory+file # use the full path name
    print("Trying "+file)
    if os.path.isdir(os.path.join(directory, file)):
        print("It's a directory!")
        return pickNestedFile(directory+"/"+file, bad_files)
    else:
        return directory+"/"+file
The program I am using this in now is to take a folder of chatlogs, pick a random log, starting position, and length. These will then be processed into a MOTD-like series of (typically) short log snippets. What I need the next-file picking ability for is when the length is unusually long or the starting line is at the end of the file, so that it continues at the top of the next file (a.k.a. wrap around midnight).
I am open to the idea of using a different method to choose the file, since the above method does not give a discrete filename and directory, and I'd have to use a listdir and match to get an index anyway.
You should probably consider rewriting your program to not have to use this. But this would be how you could do it:
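import os

def nextFile(filename, directory):
    # os.listdir returns names in arbitrary order, so sort them first
    fileList = sorted(os.listdir(directory))
    try:
        nextIndex = fileList.index(filename) + 1
    except ValueError:  # filename is not in the directory
        return None
    if nextIndex == len(fileList):
        return None
    return fileList[nextIndex]

print(nextFile("mail","test"))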
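If the goal is the name that would sort after the current file, one possible sketch sorts the listing and uses the standard bisect module, which also gives a sensible answer when the current file is no longer in the directory. The function names next_sorted_name and next_file_sorted are hypothetical.

```python
import bisect
import os


def next_sorted_name(filename, names):
    """Return the first name sorting strictly after `filename`, or None."""
    ordered = sorted(names)
    # bisect_right gives the position just past any entry equal to filename
    index = bisect.bisect_right(ordered, filename)
    return ordered[index] if index < len(ordered) else None


def next_file_sorted(filename, directory):
    """Apply next_sorted_name to the current contents of `directory`."""
    return next_sorted_name(filename, os.listdir(directory))
```

Keeping the pure list logic in next_sorted_name separate from the os.listdir call makes it easy to try the behaviour on plain lists before pointing it at a real folder.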
I tweaked the accepted answer to allow new files to be added to the directory on the fly and for it to work if a file is deleted or changed or doesn't exist. There are better ways to work with filenames/paths, but the example below keeps it simple. Maybe it's helpful:
import os

def next_file_in_dir(directory, current_file=None):
    file_list = os.listdir(directory)
    next_index = 0
    if current_file in file_list:
        next_index = file_list.index(current_file) + 1
    if next_index >= len(file_list):
        next_index = 0
    return file_list[next_index]

file_name = None
directory = "videos"
user_advanced_to_next = True
while user_advanced_to_next:
    file_name = next_file_in_dir(directory=directory, current_file=file_name)
    user_advanced_to_next = play_video("{}/{}".format(directory, file_name))
finish_and_clean_up()
Currently I am trying to upload a set of files via API call. The files have sequential names: part0.xml, part1.xml, etc. It loops through all the files and uploads them properly, but it seems it doesn't break the loop and after it uploads the last available file in the directory I am getting an error:
No such file or directory.
And I don't really understand how to make it stop as soon as the last file in the directory is uploaded. It's probably a very dumb question, but I am really lost. How do I stop it from looping through non-existent files?
The code:
part = 0
with open('part%d.xml' % part, 'rb') as xml:
    #here goes the API call code
part +=1
I also tried something like this:
import glob
part = 0
for fname in glob.glob('*.xml'):
    with open('part%d.xml' % part, 'rb') as xml:
        #here goes the API call code
    part += 1
Edit: Thank you all for the answers, learned a lot. Still lots to learn. :)
You almost had it. This is your code with some stuff removed:
import glob
for fname in glob.glob('part*.xml'):
    with open(fname, 'rb') as xml:
        # here goes the API call code
It is possible to make the glob more specific, but as it is it solves the "foo.xml" problem. The key is to not use counters in Python; the idiomatic iteration is for x in y: and you don't need a counter.
Note that glob does not guarantee any particular order (it depends on the file system), so wrap the call in sorted() if the order matters. Even then, remember that ['part1', 'part10', 'part2'] sort in that order. There are a few ways to cope with that, but it would be a separate question.
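One common way to cope with names such as part1, part10, part2 is a natural-sort key that compares the digit runs as integers. This is only a sketch, and the helper name natural_key is hypothetical.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 'part2' sorts before 'part10'."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so text and number chunks always alternate in the same positions
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r'(\d+)', name)]


names = ['part1.xml', 'part10.xml', 'part2.xml']
print(sorted(names, key=natural_key))
# ['part1.xml', 'part2.xml', 'part10.xml']
```

The same key can be passed to sorted() around a glob.glob() call to upload the parts in numeric order.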
Alternatively, you can simply use a regex.
import os, re
files = [f for f in os.listdir() if re.search(r'part[\d]+\.xml$', f)]
for f in files:
    #process..
This will be really useful if you need more advanced filtering.
Note: you can apply similar filtering to the list returned by glob.glob().
If you are not familiar with list comprehensions and regexes, I would recommend referring to:
Regex - howto
List Comprehensions
Your for loop is saying "for every file that ends with .xml"; if you have any file that ends with .xml that isn't a sequential part%d.xml, you're going to get an error. Imagine you have part0.xml and foo.xml. The for loop is going to loop twice; on the second loop, it's going to try to open part1.xml, which doesn't exist.
Since you know the filenames already, you don't even need to use glob.glob(); just check if each file exists before opening it, until you find one that doesn't exist.
import os
from itertools import count

filenames = ('part%d.xml' % part_num for part_num in count())
for filename in filenames:
    if os.path.exists(filename):
        with open(filename, 'rb') as xmlfile:
            do_stuff(xmlfile)
            # here goes the API call code
    else:
        break
If for any reason you're worried about files disappearing between os.path.exists(filename) and open(filename, 'rb'), this code is more robust:
import os
from itertools import count

filenames = ('part%d.xml' % part_num for part_num in count())
for filename in filenames:
    try:
        xmlfile = open(filename, 'rb')
    except IOError:
        break
    else:
        with xmlfile:
            do_stuff(xmlfile)
            # here goes the API call code
Consider what happens if there are other files that match '*.xml'.
Suppose that you have 11 files "part0.xml"..."part10.xml" but also a file called "foo.xml".
Then the for loop will iterate 12 times (since there are 12 matches for the glob). On the 12th iteration, you are trying to open "part11.xml", which doesn't exist.
One approach is to drop the glob and just handle the exception.
part = 0
while True:
    try:
        with open('part%d.xml' % part, 'rb') as xml:
            #here goes the API call code
        part += 1
    except IOError:
        break
When you use a counter, you need to test whether the file exists:
import os
from itertools import count

for part in count():
    filename = 'part%d.xml' % part
    if not os.path.exists(filename):
        break
    with open(filename) as inp:
        # do something
You are doing it wrong.
Suppose the folder has 3 files: part0.xml, part1.xml and foo.xml. The loop will iterate 3 times, and the third iteration will raise an error because it tries to open part2.xml, which is not present.
Don't loop through all files with the extension .xml.
Only loop through files that start with 'part', have a digit in the name before the extension, and have the extension .xml.
So your code will look like this:
import glob
for fname in glob.glob('part*[0-9].xml'):
    with open(fname, 'rb') as xml:
        #here goes the API call code
Read - glob – Filename pattern matching
If you want the files to be uploaded in sequential order, then read: String Natural Sort