Generating a list of files

I have this code here; it generates a list of the files and folders in a directory:
import os
text = ""
# A raw string keeps "\U" in the path from being read as an escape sequence
for x in os.listdir(r"C:\Users"):
    text = text + "\n" + x
f = open("List.txt", "w")
f.write(text)
f.close()
But how can I get it to do two things?
First, it should read what's in the folders and keep going down until it reaches the innermost level.
Second, each time it goes down a level, it should add a tab, like this:
Level 1 (parent)
    Level 2 (child)
How can I get it to add that tab for any number of levels?

Use os.walk() instead:
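import os
start = r'C:\Users'
entries = []
# os.walk() yields (dirpath, dirnames, filenames), in that order
for path, dirnames, filenames in os.walk(start):
    relative = os.path.relpath(path, start)
    # relpath() gives '.' for start itself (depth 0); otherwise count the path components
    depth = 0 if relative == os.curdir else len(relative.split(os.sep))
    entries.extend([('\t' * depth) + f for f in filenames])
with open("List.txt", "w") as output:
    output.write('\n'.join(entries))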
In each iteration, path is the full path to the directory that directly contains filenames and dirnames. os.path.relpath() extracts the path relative to start; the number of components in that relative path is the depth, which sets how many tabs are prepended to each filename.
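The walk above lists only file names. If you also want each folder's name on its own line, with its contents indented one tab deeper (the Level 1 / Level 2 layout from the question), a minimal sketch could look like this. The function name build_tree is hypothetical, and the sorting is only there to make the output order predictable.

```python
import os


def build_tree(start):
    """Return lines describing the tree under start, one tab per level."""
    lines = []
    for path, dirnames, filenames in os.walk(start):
        # Sorting dirnames in place makes os.walk() visit subfolders alphabetically
        dirnames.sort()
        relative = os.path.relpath(path, start)
        # relpath() returns '.' for start itself, which sits at depth 0
        depth = 0 if relative == os.curdir else len(relative.split(os.sep))
        if depth > 0:
            # A folder's name goes one level shallower than its own contents
            lines.append('\t' * (depth - 1) + os.path.basename(path))
        for name in sorted(filenames):
            lines.append('\t' * depth + name)
    return lines
```

Writing '\n'.join(build_tree(r'C:\Users')) to List.txt produces the same kind of file as before, but with the folder names included as headings.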

Related

Script fails to move the rest of the file due to a file being used by another process

I'm currently trying to make a script that takes a zip file inside a directory, checks whether the zip file contains a specific name, and if so, moves the zip file to another directory.
Running the following code does move the first file. However, after it moves the first file, it fails to go through the rest of the files and gives me this error:
"WindowsError: [Error 32] The process cannot access the file because it is being used by another process: (here shows the location of the file)"
What could be causing this error?
import os
import shutil
from zipfile import ZipFile
items = os.listdir(location)
Asset_list = os.listdir(drive_location)
def get_list():
    for each in items:
        new_location = drive_location + "\\" + each
        if ".zip" in each:
            selected_zip = location + "\\" + each
            with ZipFile(str(selected_zip)) as zip:
                list_of_files = zip.namelist()
                for each in list_of_files:
                    if Asset_list[5] in each:
                        shutil.move(selected_zip, new_location)

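You cannot keep checking the files inside the zip file and move it to another location at the same time: while you are inside the with block, the archive is still open, and Windows refuses to move a file that another handle still has open. Either change shutil.move to shutil.copy, or exit the with block before attempting to move the file. You can simply set a flag inside the block and then move the file based on the value of that flag once the block has closed: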
import os
import shutil
from zipfile import ZipFile
items = os.listdir(location)
Asset_list = os.listdir(drive_location)
def get_list():
    for each in items:
        new_location = os.path.join(drive_location, each)
        if ".zip" in each:
            to_move = False
            selected_zip = os.path.join(location, each)
            with ZipFile(selected_zip) as archive:
                list_of_files = archive.namelist()
                for each in list_of_files:
                    if Asset_list[5] in each:
                        to_move = True
                        break  # one match is enough
            # The with block has closed the archive, so it can be moved now
            if to_move:
                shutil.move(selected_zip, new_location)
That said, I'm not entirely sure what the line if Asset_list[5] in each: is meant to do. Are you checking whether one string is contained in the other, or do you want to compare the values for equality? If it's the latter, you need to change it to if Asset_list[5] == each:
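To make the "decide first, move afterwards" order harder to get wrong, the check can live in its own function that returns only after the archive is closed. This is a sketch: archive_contains and move_matching_zips are hypothetical helper names, and the needle argument plays the role of Asset_list[5] (using the substring test from the question).

```python
import os
import shutil
from zipfile import ZipFile


def archive_contains(zip_path, needle):
    """Return True if any member name of the archive contains needle."""
    with ZipFile(zip_path) as archive:
        names = archive.namelist()
    # The with block has closed the archive by the time we get here
    return any(needle in name for name in names)


def move_matching_zips(src_dir, dst_dir, needle):
    """Move every .zip in src_dir whose members mention needle; return moved names."""
    moved = []
    for entry in sorted(os.listdir(src_dir)):
        if not entry.lower().endswith('.zip'):
            continue
        src = os.path.join(src_dir, entry)
        if archive_contains(src, needle):
            shutil.move(src, os.path.join(dst_dir, entry))
            moved.append(entry)
    return moved
```

Because archive_contains() only returns a boolean, no open handle can leak into the code that moves the file.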

How can I confirm and remove the original files after sorting and copying them into several folders?

I'm a newbie and I'm trying to make office work a little less tedious. I currently have a little program that sorts and copies .pdf files from a folder into several folders, based on who these files need to be sent to later.
It works great. The only issue is that I keep double-checking whether it did its job. So I added a bit that counts the copied files to make checking easier.
Now I've been trying to figure out if I could make the program compare the list of files in the original folder with a list of files from all the other destination folders and then delete the originals if the files are indeed copied.
I've also resorted to having the program print the resulting file paths, but it's ugly and still requires me to manually compare.
Here's my code:
import os
import shutil
import pathlib
import pprint
dir = ('[path to original folder]')
files = os.listdir(dir)
user_data = [
    ('Karl H. Preusse', [Path to Karl]),
    ('Rom', [Path to Rom]),
    ('Hochschule', [Path to Hochschule]),
    ('Kiefer', [Path to Kiefer]),
    ('Penny', [Path to Penny]),
    ('Steigenberger', [Path to Steigenberger]),
    ('Penzkofer', [Path to Penzkofer]),
    ('Stoffel', [Path to Stoffel]),
    ('Cavertitzer', [Path to Cavertitzer])
]
for pattern, dest_dir in user_data:
    matching_files = [f for f in files if pattern in f]
    for filename in matching_files:
        full_filename = os.path.join(dir, filename)
        if os.path.isfile(full_filename):
            if not os.path.exists(dest_dir):
                os.makedirs(dest_dir)
            # shutil.copy() returns the destination path, so copy once and print it
            pprint.pprint(shutil.copy(full_filename, dest_dir))
stetje_datotek = sum(len(files) for _, _, files in os.walk([Path to directory that holds the copy folders]))  # counts the files in the copy folders
# Prints "Number of files in the Posiljanje folder: N", i.e. how many files are in the target folders
print('Stevilo datotek v mapi Posiljanje je: {}'.format(stetje_datotek))
Below are my attempts at getting things automated.
#I commented this function out as I couldn't figure out how to get the data out of it.
#def sub_files(folder):
#    relpath = os.path.relpath
#    join = os.path.join
#    for path, _, files in os.walk([Path to directory that holds the copy folders]):
#        relative = relpath(path, [Path to directory that holds the copy folders])
#        for file in files:
#            yield join(relative, file)
#print(sub_files)
Here I thought to use inputs to individually check each folder:
#print(os.listdir([Path to directory that holds the copy folders]))
#if input() == 'Penzkofer':
#    pprint.pprint(os.listdir([Path to Penzkofer folder]))
And here I tried to compare lists, but I get a TypeError: unhashable type: 'list' error
prvotne_datoteke = set(os.listdir(dir))
kopirane_datoteke = set(os.walk([Path to directory that holds the copy folders]))
set(prvotne_datoteke).intersection(kopirane_datoteke)
Any help is appreciated. Thank you.

One approach is to print, for each copied file, the names of its recipients and how many there are, then delete the original file once all intended recipients are included.
to_be_copied = set()  # holds the names of all files being copied
for pattern, dest_dir in user_data:
    matching_files = [f for f in files if pattern in f]
    for filename in matching_files:
        full_filename = os.path.join(dir, filename)
        if os.path.isfile(full_filename):
            to_be_copied.add(filename)  # records the file name
            if not os.path.exists(dest_dir):
                os.makedirs(dest_dir)
            # copy once and print the destination path that shutil.copy() returns
            pprint.pprint(shutil.copy(full_filename, dest_dir))
# Iterates through copied files
for original_file in to_be_copied:
    count = 0
    recipients = []
    # Iterates through potential recipients
    for pattern, dest_dir in user_data:
        complete_name = os.path.join(dest_dir, original_file)
        if os.path.isfile(complete_name):
            count += 1
            recipients.append(pattern)
    print(original_file + ' sent to ' + str(count) + ' people:')
    print(recipients)
    # Quick manual check, could be changed to checking if count/recipients is correct
    print('Delete original file? (Y or N): ')
    delete = input()
    if delete == 'Y':
        os.remove(os.path.join(dir, original_file))
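As the comment in that code suggests, the manual Y/N prompt can be replaced by an automatic check. The sketch below assumes the same user_data list of (pattern, dest_dir) pairs. For every original file it works out which folders should have received a copy, confirms with filecmp.cmp(..., shallow=False) that each copy matches byte for byte, and only then deletes the original. The function name remove_verified_originals and the dry_run switch are hypothetical; with dry_run=True it only reports what it would delete.

```python
import filecmp
import os


def remove_verified_originals(src_dir, user_data, dry_run=True):
    """Delete originals whose copies exist and match in every expected folder.

    user_data is a list of (pattern, dest_dir) pairs, as in the question.
    Returns the names of files that were (or, with dry_run, would be) removed.
    """
    removed = []
    for filename in sorted(os.listdir(src_dir)):
        original = os.path.join(src_dir, filename)
        if not os.path.isfile(original):
            continue
        # Folders that should hold a copy, by the same matching rule as the copy step
        targets = [dest for pattern, dest in user_data if pattern in filename]
        if not targets:
            continue  # never meant to be copied anywhere, so keep it
        copies_ok = all(
            os.path.isfile(os.path.join(dest, filename))
            and filecmp.cmp(original, os.path.join(dest, filename), shallow=False)
            for dest in targets
        )
        if copies_ok:
            if not dry_run:
                os.remove(original)
            removed.append(filename)
        else:
            print('Not removing {}: a copy is missing or differs'.format(filename))
    return removed
```

Running it once with the default dry_run=True and reading the returned list is a cheap way to gain confidence before letting it delete anything.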

Concatenating files with matching string in middle of filename

My goal is to concatenate files in a folder based on a string in the middle of the filename, ideally using python or bash. To simplify the question, here is an example:
P16C-X128-22MB-LL_merged_trimmed.fastq
P16C-X128-27MB-LR_merged_trimmed.fastq
P16C-X1324-14DL-UL_merged_trimmed.fastq
P16C-X1324-21DL-LL_merged_trimmed.fastq
I would like to concatenate based on the value after the first dash but before the second (e.g. X128 or X1324), so that I am left with (in this example), two additional files that contain the concatenated contents of the individual files:
P16C-X128-Concat.fastq (concat of 2 files with X128)
P16C-X1324-Concat.fastq (concat of 2 files with X1324)
Any help would be appreciated.

For simple string manipulations, I prefer to avoid regular expressions; str.split() is enough in this case. Besides, for simple file name matching, the fnmatch module provides enough functionality.
import fnmatch
import os
from itertools import groupby
path = '/full/path/to/files/'
ext = ".fastq"
files = fnmatch.filter(os.listdir(path), '*' + ext)
def by(fname): return fname.split('-')[1]  # e.g. X128
# You said:
# I would like to concatenate based on the value after the first dash
# but before the second (e.g. X128 or X1324)
# If you want to keep both parts together, uncomment the following:
# def by(fname): return '-'.join(fname.split('-')[:2])  # e.g. P16C-X128
for k, g in groupby(sorted(files, key=by), key=by):
    dst = str(k) + '-Concat' + ext
    with open(os.path.join(path, dst), 'w') as dstf:
        for fname in g:
            with open(os.path.join(path, fname), 'r') as srcf:
                dstf.write(srcf.read())
Instead of reading and writing in Python, you could also delegate the concatenation to the OS. In bash you would normally use a command like this:
cat *-X128-*.fastq > X128.fastq
Using the subprocess module:
import subprocess
for k, g in groupby(sorted(files, key=by), key=by):
    dst = str(k) + '-Concat' + ext
    with open(os.path.join(path, dst), 'w') as dstf:
        command = ['cat']  # +++
        for fname in g:
            command.append(os.path.join(path, fname))  # +++
        subprocess.run(command, stdout=dstf)  # +++
Also, for a batch job like this one, you should consider placing the concatenated files in a separate directory, so that a later run does not pick them up as input (they end in .fastq too). That is easily done by changing the dst filename.
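FASTQ files are often large, so reading each one fully into memory with srcf.read() may be costly. The sketch below streams the data in chunks with shutil.copyfileobj() instead. It groups by the first two dash-separated fields so that the outputs are named like P16C-X128-Concat.fastq, and it writes them into a separate output directory. The function name concat_by_prefix and the out_dir argument are hypothetical.

```python
import os
import shutil
from collections import defaultdict


def concat_by_prefix(src_dir, out_dir, ext='.fastq'):
    """Concatenate files sharing the first two '-' fields, e.g. P16C-X128.

    Returns a dict mapping each output path to the inputs it was built from.
    """
    os.makedirs(out_dir, exist_ok=True)
    groups = defaultdict(list)
    for fname in sorted(os.listdir(src_dir)):
        # Skip other extensions and names lacking the expected dashes
        if fname.endswith(ext) and fname.count('-') >= 2:
            prefix = '-'.join(fname.split('-')[:2])  # e.g. 'P16C-X128'
            groups[prefix].append(fname)
    written = {}
    for prefix, members in groups.items():
        dst = os.path.join(out_dir, prefix + '-Concat' + ext)
        # Binary mode plus copyfileobj() copies in chunks, not whole files at once
        with open(dst, 'wb') as dstf:
            for fname in members:
                with open(os.path.join(src_dir, fname), 'rb') as srcf:
                    shutil.copyfileobj(srcf, dstf)
        written[dst] = members
    return written
```

The returned dict makes it easy to log which inputs went into each concatenated file.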

You can use open to read and write (create) files, os.listdir to get all files (and directories) in a certain directory and re to match file name as needed.
Use a dictionary to store the contents by file name prefix (the part of the file name before the second hyphen -) and concatenate the contents together.
import os
import re
contents = {}
file_extension = "fastq"
# Get all files and directories that are in the current working directory
for file_name in os.listdir('./'):
    # Match the full ".fastq" extension, not just any name ending in "fastq"
    if file_name.endswith('.' + file_extension):
        # Match the first 2 hyphen-separated values from the file name
        prefix_match = re.match(r"^([^-]+-[^-]+)", file_name)
        if prefix_match is None:
            continue  # skip names without a hyphen
        file_prefix = prefix_match.group(1)
        # Read the file and concatenate its contents with the previous contents
        contents[file_prefix] = contents.get(file_prefix, '')
        with open(file_name, 'r') as the_file:
            text = the_file.read()
        # Add a newline only when missing, so no blank lines appear between records
        if text and not text.endswith('\n'):
            text += '\n'
        contents[file_prefix] += text
# Create a new file for each file prefix and write its contents to it
for file_prefix in contents:
    file_contents = contents[file_prefix]
    with open(file_prefix + '-Concat.' + file_extension, 'w') as the_file:
        the_file.write(file_contents)

Iterate over 2 files in each folder and compare them

I compare two text files and write the results to a third file. I am trying to make the script iterate over all of the folders in its CWD that contain two text files.
What I have so far:
import os
import glob
path = './'
for infile in glob.glob(os.path.join(path, '*.*')):
    print('current file is: ' + infile)
    with open(f1 + '.txt', 'r') as fin1, open(f2 + '.txt', 'r') as fin2:
        pass  # comparison logic goes here
Would this be a good way to start the iteration process?
It's not the clearest code, but it gets the job done. However, I'm pretty sure I need to take the logic out of the read/write methods, but I'm not sure where to start.
What I'm basically trying to do is have a script iterate over all of the folders in its CWD, open each folder, compare the two text files inside, write a third text file to the same folder, and then move on to the next.
Another method I have tried is as follows:
import os
rootDir = 'C:\\Python27\\test'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
And this outputs the following (to give you a better idea of the file structure):
Found directory: C:\Python27\test
    test.py
Found directory: C:\Python27\test\asdd
    asd1.txt
    asd2.txt
Found directory: C:\Python27\test\chro
    ch1.txt
    ch2.txt
Found directory: C:\Python27\test\hway
    hw1.txt
    hw2.txt
Would it be wise to put the compare logic under for fname in fileList? How do I make sure it compares the two text files inside a specific folder and not other fnames in the fileList?
This is the full code that I am trying to add this functionality to. I apologize for its Frankenstein nature; I am still working on a refined version, but it does not work yet.
from collections import defaultdict
from operator import itemgetter
from itertools import groupby
from collections import deque
import os
class avs_auto:
    def load_and_compare(self, input_file1, input_file2, output_file1, output_file2, result_file):
        self.load(input_file1, input_file2, output_file1, output_file2)
        self.compare(output_file1, output_file2)
        self.final(result_file)
    def load(self, fileIn1, fileIn2, fileOut1, fileOut2):
        with open(fileIn1+'.txt') as fin1, open(fileIn2+'.txt') as fin2:
            frame_rects = defaultdict(list)
            for row in (line.split() for line in fin1):
                id, frame, rect = row[0], row[2], [row[3], row[4], row[5], row[6]]
                frame_rects[frame].append(id)
                frame_rects[frame].append(rect)
            frame_rects2 = defaultdict(list)
            for row in (line.split() for line in fin2):
                id, frame, rect = row[0], row[2], [row[3], row[4], row[5], row[6]]
                frame_rects2[frame].append(id)
                frame_rects2[frame].append(rect)
        with open(fileOut1+'.txt', 'w') as fout1, open(fileOut2+'.txt', 'w') as fout2:
            # .items() works on both Python 2 and 3 (.iteritems() is Python 2 only)
            for frame, rects in sorted(frame_rects.items()):
                fout1.write('{{{}:{}}}\n'.format(frame, rects))
            for frame, rects in sorted(frame_rects2.items()):
                fout2.write('{{{}:{}}}\n'.format(frame, rects))
    def compare(self, fileOut1, fileOut2):
        with open(fileOut1+'.txt', 'r') as fin1:
            with open(fileOut2+'.txt', 'r') as fin2:
                lines1 = fin1.readlines()
                lines2 = fin2.readlines()
        diff_lines = [l.strip() for l in lines1 if l not in lines2]
        diffs = defaultdict(list)
        with open(fileOut1+'x'+fileOut2+'.txt', 'w') as result_file:
            for line in diff_lines:
                d = eval(line)
                for k in d:
                    list_ids = d[k]
                    for i in range(0, len(d[k]), 2):
                        diffs[d[k][i]].append(k)
            for id_ in diffs:
                diffs[id_].sort()
                # group consecutive frame numbers: index minus value is constant within a run
                for k, g in groupby(enumerate(diffs[id_]), lambda ix: ix[0] - ix[1]):
                    group = list(map(itemgetter(1), g))
                    result_file.write('{0} {1} {2}\n'.format(id_, group[0], group[-1]))
    def final(self, result_file):
        with open(result_file+'.txt', 'r') as fin:
            lines = (line.split() for line in fin)
            for k, g in groupby(lines, itemgetter(0)):
                fst = next(g)
                lst = next(iter(deque(g, 1)), fst)
                with open('final/{}.avs'.format(k), 'w') as fout:
                    fout.write('video0=ImageSource("old\\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
                    fout.write('video1=ImageSource("new\\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
                    fout.write('video0=BilinearResize(video0,640,480)\n')
                    fout.write('video1=BilinearResize(video1,640,480)\n')
                    fout.write('StackHorizontal(video0,video1)\n')
                    fout.write('Subtitle("ID: {}", font="arial", size=30, align=8)'.format(k))
Using the load_and_compare() method, I define two input text files, two output text files, and a file for the comparison results; a final phase then writes a separate file for each differing ID.
What I am trying to do is have this whole class run on the current working directory and go through every subfolder, compare the two text files, and write everything into the same folder, specifically the final() results.

You can indeed use os.walk(), since that already separates the directories from the files. You only need the directories it returns, because that's where you're looking for your 2 specific files.
You could also use os.listdir(), but that returns directories as well as files in the same list, so you would have to check for directories yourself.
Either way, once you have the directories, you iterate over them (for subdir in dirnames) and join the path components you have: the dirpath, the subdir name you got from iterating over the list, and your filename.
Assuming there are also some directories that don't have the specific 2 files, it's a good idea to wrap the open() calls in a try..except block and thus ignore the directories where one of the files (or both of them) doesn't exist.
Finally, if you used os.walk(), you can easily choose if you only want to go into directories one level deep or walk the whole depth of the tree. In the former case, you just clear the dirnames list by dirnames[:] = []. Note that dirnames = [] wouldn't work, since that would just create a new empty list and put that reference into the variable instead of clearing the old list.
Replace the print("do something ...") with your program logic.
#!/usr/bin/env python
import errno
import os
f1 = "test1"
f2 = "test2"
path = "."
for dirpath, dirnames, _ in os.walk(path):
    for subdir in dirnames:
        # (f1, f2) needs parentheses; "for f in f1, f2" is a syntax error in Python 3
        filepath1, filepath2 = [os.path.join(dirpath, subdir, f + ".txt") for f in (f1, f2)]
        try:
            with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
                print("do something with " + str(fin1) + " and " + str(fin2))
        except IOError as e:
            # ignore directories that don't contain the 2 files
            if e.errno != errno.ENOENT:
                # re-raise the exception if it is not "file or directory doesn't exist"
                raise
    # comment the next line out if you want to traverse all subsubdirectories
    dirnames[:] = []
Edit:
Based on your comments, I hope I understand your question better now.
Try the following code snippet instead. The overall structure stays the same; only now I'm using the filenames returned by os.walk(). Unfortunately, that also makes it harder to do something like "go only into the subdirectories 1 level deep", so I hope walking the tree recursively is fine with you. If not, I'll have to add a little more code later.
#!/usr/bin/env python
import fnmatch
import os
filter_pattern = "*.txt"
path = "."
for dirpath, dirnames, filenames in os.walk(path):
    # comment this out if you don't want to filter
    filenames = [fn for fn in filenames if fnmatch.fnmatch(fn, filter_pattern)]
    if len(filenames) == 2:
        # comment this out if you don't want the 2 filenames to be sorted
        filenames.sort(key=str.lower)
        filepath1, filepath2 = [os.path.join(dirpath, fn) for fn in filenames]
        with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
            print("do something with " + str(fin1) + " and " + str(fin2))
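If the comparison code should stay separate from the directory walking, the same logic can be wrapped in a generator that yields one (folder, first_file, second_file) tuple per qualifying folder. This is a sketch: the name text_file_pairs is hypothetical, and it keeps the rule above of exactly two matching files per folder.

```python
import fnmatch
import os


def text_file_pairs(root, pattern='*.txt'):
    """Yield (dirpath, path1, path2) for each folder with exactly two matches."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a predictable order
        matches = sorted(fnmatch.filter(filenames, pattern), key=str.lower)
        if len(matches) == 2:
            yield (dirpath,) + tuple(os.path.join(dirpath, fn) for fn in matches)
```

A loop such as for folder, first, second in text_file_pairs('.'): can then call the comparison and write its result file into folder.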
I'm still not really sure what your program logic does, so you will have to interface the two yourself.
However, I noticed that you're adding the ".txt" extension to the file name explicitly all over your code, so depending on how you are going to use the snippet, you might or might not need to remove the ".txt" extension first before handing the filenames over. That would be achieved by inserting the following line after or before the sort:
filenames = [os.path.splitext(fn)[0] for fn in filenames]
Also, I still don't understand why you're using eval(). Do the text files contain python code? In any case, eval() should be avoided and be replaced by code that's more specific to the task at hand.
If it's a list of comma-separated strings, use line.split(",") instead.
If there might be whitespace before or after the commas, use [word.strip() for word in line.split(",")] instead.
If it's a list of comma-separated integers, use [int(num) for num in line.split(",")] instead; floats work analogously.
etc.
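For the lines written by load() in the question, which look like {12:['7', ['10', '20', '30', '40']]}, ast.literal_eval() is a safer replacement for eval(): it only accepts Python literals (dicts, lists, strings, numbers and so on) and raises ValueError for anything else, such as a function call. A minimal sketch with a hypothetical parse_line helper:

```python
import ast


def parse_line(line):
    """Parse one line such as "{12:['7', ['10', '20', '30', '40']]}" safely."""
    # literal_eval() only builds literal values; it never runs code found in the file
    return ast.literal_eval(line.strip())
```

In compare(), d = parse_line(line) could then take the place of d = eval(line).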

FOR loop range to process certain number of files at a time

I have a for loop that runs through a directory and processes the files there, but I'd like to process only a certain number of files at a time. For example, I have a directory with 1000 files, but I can only process 250 of them a day, so the first time I run the script it should process the first 250, then the next 250, and so on.
First, I'm checking the file names against an XML file that records the names of files that have already been synced, so that I don't process them a second time. Then I would like to process the next n files, where I have a variable synclimit = n.
I thought about adding the in range statement to the for loop like this:
tree = ET.parse("sync_list.xml")
root = tree.getroot()
synced = [elt.text for elt in root.findall('synced/sfile')]
for filename in os.listdir(filepath) and in range (0, synclimit) :
    if fnmatch.fnmatch(filename, '*.txt') and filename not in synced:
        filename = os.path.join(filepath, filename)
        result = plistlib.readPlist(filename)
But I'm pretty sure this will only check the first n files in the directory each time. Should I add the range statement to the if statement instead, like this:
tree = ET.parse("sync_list.xml")
root = tree.getroot()
synced = [elt.text for elt in root.findall('synced/sfile')]
for filename in os.listdir(filepath):
    if fnmatch.fnmatch(filename, '*.txt') and filename not in synced and in range (0, synclimit):
        filename = os.path.join(filepath, filename)
        result = plistlib.readPlist(filename)
Or is there an easier way to do this? Thank you.

Just keep a separate counter and increment that, then test if it has reached synclimit. Simple as that. There is no need to get too clever here:
processed = 0
for filename in os.listdir(filepath):
    if not filename.endswith('.txt') or filename in synced:
        continue
    # process
    processed += 1
    if processed >= synclimit:
        break  # done for today.
Alternatively, since os.listdir() returns a list, you could filter it, provided you have your already-synced filenames in a set, and then slice it down to your maximum size:
synced = set(elt.text for elt in root.findall('synced/sfile'))
to_process = [f for f in os.listdir(filepath) if f.endswith('.txt') and f not in synced]
for filename in to_process[:synclimit]:
    pass  # process os.path.join(filepath, filename) here
Note that I just test for .endswith('.txt') instead of using fnmatch; for a simple '*.txt' pattern, the test comes down to essentially the same thing.
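os.listdir() returns names in arbitrary order, so for day-by-day batches it may help to sort the listing first; that way each run continues from a predictable place. The variant below uses itertools.islice() on a generator, so filtering stops as soon as synclimit names have been collected. The name next_batch is hypothetical; synced is assumed to be the set of already-processed names read from the XML file.

```python
import itertools


def next_batch(filenames, synced, synclimit):
    """Return up to synclimit unsynced .txt names, in sorted order."""
    pending = (f for f in sorted(filenames)
               if f.endswith('.txt') and f not in synced)
    # islice() stops pulling from the generator after synclimit items
    return list(itertools.islice(pending, synclimit))
```

Usage would be for filename in next_batch(os.listdir(filepath), synced, synclimit):, followed by the processing and by recording each name in the XML file.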
