Compare file to list and copy accordingly - python

I have a script that I am using to compare and sort the files in two directories. I am currently trying to compare all of the files in one directory to a list of files in the other, and then copy those files into a "match" or "unique" directory.
I've managed to match the file name against the list and then copy the file, but I can't quite get it to copy that file into a target directory while keeping the name.
Here is what I have:
import os
import shutil

input2_only = ["file1.mp3", "file2.mp3"]  # etc.
for root, dirs, files in os.walk("input2", topdown=False):
    for filename in files:
        print(filename)
        if filename in input2_only:
            print('yay')
            shutil.copy(os.path.join(root, filename), "outputs")
I think there is something that I can change in the shutil line to make this work, but every tweak I've tried so far has led to heartache. Just to be clear, in this snippet I want it to copy the file being compared against the list to a directory called "outputs". Once I can do that I'm reasonably confident I can fill in the rest of the logic.
Thanks!

OK, I figured this out and am posting it in case it is helpful to someone else. The key is building the full output path before calling shutil.copy. The correct snippet is:
for root, dirs, files in os.walk("input2", topdown=False):
    for filename in files:
        print(filename)
        if filename in input2_only:
            print('yay')
            out = os.path.join("outputs", filename)
            shutil.copy(os.path.join(root, filename), out)
The "out" line sets the copy target to the file's own name inside the "outputs" folder. The "outputs" folder has to exist already; otherwise shutil.copy raises an error.
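As a rough sketch of the same idea, the loop can be wrapped in a function that creates the target folder first. The function name copy_listed_files and its parameters are made up for illustration and are not part of the original post; it also assumes that files with the same name in different subfolders may overwrite each other in the target.

```python
import os
import shutil


def copy_listed_files(src_root, wanted_names, dest_dir):
    """Copy every file under src_root whose name is in wanted_names into dest_dir."""
    wanted = set(wanted_names)            # set lookups are faster than list lookups
    os.makedirs(dest_dir, exist_ok=True)  # create the target folder if it is missing
    copied = []
    for root, _dirs, files in os.walk(src_root):
        for filename in files:
            if filename in wanted:
                target = os.path.join(dest_dir, filename)  # keeps the original name
                shutil.copy(os.path.join(root, filename), target)
                copied.append(target)
    return copied
```

With the names from the question, this would be called as copy_listed_files("input2", input2_only, "outputs"). The same function could be called a second time with a different list and target folder for the "match" or "unique" case.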

Related

Python: Finding files in directory but ignoring folders and their contents

So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os
# This is to get the directory that the program is currently running in
dir_path = os.path.dirname(os.path.realpath(__file__))
# for loop is meant to scan through the current directory the program is in
for root, dirs, files in os.walk(dir_path):
    for file in files:
        # Check if file ends with .log, if so print file name
        if file.endswith('.log'):
            print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files out it also takes into account the .log files in the extra_file folder. But I do not want this. I only want it to print out sample_1.log and sample_2.log. How would I approach this?
Try this:
import os
files = os.listdir()
for file in files:
    if file.endswith('.log'):
        print(file)
The problem in your code is that os.walk traverses the whole directory tree, not just the current directory. os.listdir returns the names of all entries in a single directory (the current working directory by default), which is what you are looking for.
os.walk documentation
os.listdir documentation
By default, os.walk traverses the tree top-down, so the first tuple it yields describes the root directory itself. So, just ask for that first one. And since you don't really care about root or dirs, use _ as the "don't care" variable name:
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
    # Check if file ends with .log, if so print file name
    if file.endswith('.log'):
        print(file)
It's also common to use glob:
import os
from glob import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
    print(file)
This runs the risk of matching a directory whose name ends in ".log", so you could also add a test using os.path.isfile(file).
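A minimal sketch of the glob approach with the os.path.isfile test added might look like this. The function name list_log_files is hypothetical, and it returns bare file names rather than full paths, which is an assumption about what the question wants printed.

```python
import glob
import os


def list_log_files(dir_path):
    """Return the names of regular *.log files directly inside dir_path."""
    # glob.escape protects against special characters in the directory name
    pattern = os.path.join(glob.escape(dir_path), "*.log")
    return sorted(
        os.path.basename(path)
        for path in glob.glob(pattern)   # "*" does not descend into subfolders
        if os.path.isfile(path)          # skip folders whose names end in ".log"
    )
```

Called with the script's own directory, this should list only the .log files next to the script and ignore anything inside extra_file.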

Select files from specific directories

I am trying to loop through a list of subdirectories, and perform two related operations:
Only select subdirectories that match a certain pattern, and save part of that name
Read a file in that subdirectory
I have tried adapting the answers in this question but am having trouble opening only certain subdirectories. I know I can do this recursively, where I loop through every file, and pull its parent directory using Path.parent, but this would also go into the directories I am not interested in.
My file structure looks like:
002normal
|- names.txt
|- test.txt
002custom
|- names.txt
|- test.txt
I would like only the directories ending in "normal". I'll then read the file named "names.txt" in that directory. I have tried something like the below, without luck.
import os
root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    for f in files:
        print(subdir)
You can modify the dirs list in-place to filter out any subdirectories with names not ending with 'normal' so that os.walk won't traverse into them:
for subdir, dirs, files in os.walk(root_dir):
    dirs[:] = (name for name in dirs if name.endswith('normal'))
    if 'names.txt' in files:
        with open(os.path.join(subdir, 'names.txt')) as file:
            print(os.path.basename(subdir), file.read())
Excerpt from the documentation of os.walk:
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; this can be
used to prune the search, impose a specific order of visiting, or even
to inform walk() about directories the caller creates or renames
before it resumes walk() again.
import os
root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    if subdir.endswith("normal"):
        for file in files:
            if file.startswith("names"):
                print(os.path.basename(subdir), file)
                # subdir already includes root_dir, so join only subdir and file
                with open(os.path.join(subdir, file), "r") as f:
                    print(f.read())
That's how you can do it with your file structure. First you check whether the current subdir ends with "normal", and if it does you look for the file and read its content. You also have to build the path to the file with os.path.join so that you can open it.
os.walk already descends into nested subdirectories of any depth, so this works as long as the directory that contains names.txt has a name ending in "normal".
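For the "save part of that name" requirement, a pathlib-based sketch could look like the following. It assumes the matching directories sit directly under the root folder, and the function name read_normal_names and its parameters are invented for this example.

```python
from pathlib import Path


def read_normal_names(root_dir, suffix="normal", filename="names.txt"):
    """Map each matching subdirectory's prefix to the text of its names.txt."""
    results = {}
    for subdir in sorted(Path(root_dir).iterdir()):
        # Only look at direct subdirectories whose names end with the suffix
        if not (subdir.is_dir() and subdir.name.endswith(suffix)):
            continue
        target = subdir / filename
        if target.is_file():
            prefix = subdir.name[: -len(suffix)]  # "002normal" -> "002"
            results[prefix] = target.read_text()
    return results
```

Because only the top level is listed, directories such as 002custom are never opened at all.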

How to delete all files inside a main folder with many subfolders?

I want to delete only the files, not the folder and its subfolders.
I tried this, but I don't want to rely on specific characters in the condition.
for i in glob('path' + '**/*', recursive=True):
    if '.' in i:
        os.remove(i)
I don't like this because some folder names have '.' in the name. Also, there are many types of files there, so making a list of them and checking against it would not be efficient. What would you suggest?
You can use os.walk:
import os
for root, _, files in os.walk('path'):
    for file in files:
        os.remove(os.path.join(root, file))
Try something like this to collect the file paths, then pass each one to os.remove:
import os

def get_file_paths(folder_path):
    paths = []
    for root, directories, filenames in os.walk(folder_path):
        for filename in filenames:
            paths.append(os.path.join(root, filename))
    return paths
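Putting the two answers together, a small sketch that deletes every file but leaves the folder tree in place might be written as below. The function name delete_files_keep_folders is hypothetical. It relies on os.walk separating file names from directory names, so a '.' in a folder name does not matter.

```python
import os


def delete_files_keep_folders(folder_path):
    """Remove every file below folder_path but leave all directories in place."""
    removed = 0
    for root, _dirs, files in os.walk(folder_path):
        for filename in files:  # "files" never contains directory names
            os.remove(os.path.join(root, filename))
            removed += 1
    return removed
```

The return value is just a count of removed files, which can be handy for a quick sanity check before running it on real data.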

Iterate a directory and find only files whose names start with a certain string

I have a directory path, and in this path there are several folders. I am trying to build a script that finds all the XML files whose names start with "report". So far I have been able to iterate over all the directories, but I do not know how to proceed from there. Here is my code:
def search_xml_report(rootdir):
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            print(os.path.join(subdir, file))  # print statement just for testing
You can use str.startswith:
def search_xml_report(rootdir):
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            if file.startswith('report'):
                yield subdir, file
Use str.startswith together with os.path.splitext.
os.path.splitext: Split the extension from a pathname. Extension is everything from the last dot to the end, ignoring leading dots. Returns "(root, ext)"; ext may be empty.
if file.startswith('report') and os.path.splitext(file)[-1] == '.xml':
    return file
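Combining both answers, a complete generator might look like the sketch below. The prefix and extension parameters are additions for illustration, and comparing the extension case-insensitively is an assumption, not something the question asks for.

```python
import os


def search_xml_report(rootdir, prefix="report", extension=".xml"):
    """Yield full paths of files named <prefix>*<extension> anywhere below rootdir."""
    for subdir, _dirs, files in os.walk(rootdir):
        for file in files:
            _stem, ext = os.path.splitext(file)  # ext includes the leading dot
            if file.startswith(prefix) and ext.lower() == extension:
                yield os.path.join(subdir, file)
```

Since it is a generator, the results can be looped over directly or collected with list(search_xml_report(rootdir)).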

How to copy folder structure under another directory?

I have some questions related to copying a folder structure. In fact, I need to do a conversion of pdf files to text files. Hence I have such a folder structure for the place where I import the pdf:
D:/f/subfolder1/subfolder2/a.pdf
And I would like to create the exact folder structure under "D:/g/subfolder1/subfolder2/" but without the pdf file since I need to put at this place the converted text file. So after the conversion function it gives me
D:/g/subfolder1/subfolder2/a.txt
I would also like to add an if check to make sure that the same folder structure does not already exist under "D:/g/" before creating it.
Here is my current code. So how can I create the same folder structure without the file?
Thank you!
import converter as c
import os
inputpath = 'D:/f/'
outputpath = 'D:/g/'
for root, dirs, files in os.walk(inputpath, topdown=False):
    for name in files:
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
For me the following works fine:
Iterate over existing folders
Build the structure for the new folders based on existing ones
Check, if the new folder structure does not exist
If so, create new folder without files
Code:
import os
inputpath = 'D:/f/'
outputpath = 'D:/g/'
for dirpath, dirnames, filenames in os.walk(inputpath):
    structure = os.path.join(outputpath, dirpath[len(inputpath):])
    if not os.path.isdir(structure):
        os.mkdir(structure)
    else:
        print("Folder already exists!")
Documentation:
os.walk
os.mkdir
os.path.isdir
How about using shutil.copytree()?
import os
import shutil

def ig_f(dir, files):
    # Ignore every regular file so that only the directories are copied
    return [f for f in files if os.path.isfile(os.path.join(dir, f))]

shutil.copytree(inputpath, outputpath, ignore=ig_f)
The directory you want to create should not exist before calling this function. You can add a check for that.
Taken from shutil.copytree without files
A minor tweak to your code for skipping pdf files:
for root, dirs, files in os.walk('.', topdown=False):
    for name in files:
        if name.find(".pdf") >= 0:
            continue
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
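One way to fill in the missing target path is to compute each folder's path relative to the input root and recreate it under the output root. The sketch below is only an illustration under a few assumptions: the function name convert_tree is made up, and the convert parameter stands in for whatever conversion function is used (for example c.convert_pdf_to_txt from the question), assumed to take a PDF path and return text.

```python
import os


def convert_tree(inputpath, outputpath, convert):
    """Mirror inputpath's folders under outputpath and write one .txt per .pdf."""
    written = []
    for root, _dirs, files in os.walk(inputpath):
        # Same relative location under the output root
        relative = os.path.relpath(root, inputpath)
        target_dir = os.path.normpath(os.path.join(outputpath, relative))
        os.makedirs(target_dir, exist_ok=True)  # no error if it already exists
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".pdf":
                continue  # only PDF files are converted
            target = os.path.join(target_dir, stem + ".txt")
            with open(target, mode="w") as newfile:
                newfile.write(convert(os.path.join(root, name)))
            written.append(target)
    return written
```

Using os.makedirs with exist_ok=True covers the "only create it if it does not exist yet" requirement without a separate if check.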
