Python to go through multiple folders and process files inside them

I have multiple folders that contain about 5-10 files each. What I am trying to do is move on to the next folder once the files in the previous folder have been processed, and start working on the new files. I have this code:
for root, dirs, files in os.walk("Training Sets"): #Path that contains folders
    for i in dirs: #if I don't have this, an error is shown in line 4 that path needs to be str and not list
        for file in i: #indexing files inside the folders
            path = os.path.join(i, files) #join path of the files
            dataset = pd.read_csv(path, sep='\t', header = None) #reading the files
            trainSet = dataset.values.tolist() #some more code
            editedSet = dataset.values.tolist() #some more code
            #rest of the code...
The problem is that it doesn't do anything. Not even printing if I add prints for debugging.

First off, be sure that you are in the correct top-level directory (i.e. the one containing "Training Sets"). You can check this with os.path.abspath(os.curdir). Otherwise, the code does nothing, because os.walk does not find the directory to walk and silently yields nothing.
os.walk does the directory walking for you. The key is understanding root (the path to the current directory), dirs (a list of subdirectories) and files (a list of files in the current directory). You don't actually need dirs.
So your code is two loops:
>>> for root, dirs, files in os.walk("New Folder1"): #Path that contains folders
...     for file in files: #indexing files inside the folders
...         path = os.path.join(root, file) #join path of the files
...         print(path) # Your code here
...
New Folder1\New folder1a\New Text Document.txt
New Folder1\New folder1b\New Text Document2.txt
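Applied to the layout in the question, the two loops could be sketched roughly as follows. This is only an illustration: it assumes the files are tab-separated with no header row, and it reads them with the standard csv module instead of pandas so that it runs without extra packages. The function name load_training_sets is made up for this example.

```python
import csv
import os


def load_training_sets(top):
    """Return {file path: list of rows} for every file found under top."""
    data = {}
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit subfolders in a predictable order
        for name in sorted(files):
            # Join with root (the folder being visited), not the bare name
            path = os.path.join(root, name)
            with open(path, newline="") as handle:
                # Assumption: tab-separated values with no header row
                data[path] = list(csv.reader(handle, delimiter="\t"))
    return data
```

With pandas, the body of the inner loop would instead call pd.read_csv(path, sep='\t', header=None) as in the question.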

Related

Python: Finding files in directory but ignoring folders and their contents

So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os
# This is to get the directory that the program is currently running in
dir_path = os.path.dirname(os.path.realpath(__file__))
# for loop is meant to scan through the current directory the program is in
for root, dirs, files in os.walk(dir_path):
    for file in files:
        # Check if file ends with .log, if so print file name
        if file.endswith('.log'):
            print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files out it also takes into account the .log files in the extra_file folder. But I do not want this. I only want it to print out sample_1.log and sample_2.log. How would I approach this?

Try this:
import os
files = os.listdir()
for file in files:
    if file.endswith('.log'):
        print(file)
The problem in your code is that os.walk traverses the whole directory tree, not just the top directory. os.listdir returns the names of all entries in a single directory (files and folders alike). With no argument it lists the current working directory, so pass dir_path if the script may be run from somewhere else.
os.walk documentation
os.listdir documentation

By default, os.walk does a top-down traversal of the tree, so the first tuple it yields describes the starting directory itself. So, just ask for the first one. And since you don't really care about root or dirs, use _ as the "don't care" variable name:
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
    # Check if file ends with .log, if so print file name
    if file.endswith('.log'):
        print(file)
It's also common to use glob:
import os
from glob import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
    print(file)
This runs the risk of matching a directory whose name ends in ".log", so you could also add a check with os.path.isfile(file). Note that glob returns full paths here rather than bare file names.
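The same caveat about folders named like log files applies to os.listdir. A minimal sketch that combines a single-directory listing with the os.path.isfile check might look like this (the helper name top_level_logs is hypothetical):

```python
import os


def top_level_logs(directory):
    """Return sorted names of regular files in directory ending in .log."""
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # isfile() filters out folders that happen to be named *.log
        if name.endswith(".log") and os.path.isfile(full_path):
            names.append(name)
    return sorted(names)
```

Calling top_level_logs(dir_path) never looks inside subfolders such as extra_file, because os.listdir only reports the entries of the one directory it is given.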

Unzipping a file with subfolders into the same directory without creating an extra folder

I hope I don't duplicate here, but I didn't find a solution until now since the answers don't include subfolders. I have a zipfile that contains a folder which contains files and subfolders.
I want to extract the files within the folder (my_folder) and its subfolders to a specific path: Users/myuser/Desktop/another. I want only the files and subfolders to end up in the another dir. With my current code, a directory my_folder is created and my files and subfolders are placed inside it, but I don't want that directory created. This is what I am doing:
with zipfile.ZipFile("Users/myuser/Desktop/another/my_file.zip", "r") as zip_ref:
    zip_ref.extractall("Users/myuser/Desktop/another")
I tried listing all the files within the zip file and extracting them one by one:
with ZipFile('Users/myuser/Desktop/another/myfile.zip', 'r') as zipObj:
    # Get a list of all archived file names from the zip
    listOfFileNames = zipObj.namelist()
    for fileName in listOfFileNames:
        print(fileName)
        zipObj.extract(fileName, 'Users/myuser/Desktop/another/')
This yields the same result. I then tried creating a new list, stripping the names so that they no longer include the name of the folder, but then it tells me that there is no item named xyz in the archive.
Finally I leveraged those two questions/code (extract zip file without folder python and Extract files from zip without keeping the structure using python ZipFile?) and this works, but only if there are no subfolders involved. If there are subfolders it throws me the error FileNotFoundError: [Errno 2] No such file or directory: ''. What I want though is that the files in the subdirectory get extracted to the subdirectory.
I can only use this code if I skip all directories:
my_zip = "Users/myuser/Desktop/another/myfile.zip"
my_dir = "Users/myuser/Desktop/another/"
with zipfile.ZipFile(my_zip, 'r') as zip_file:
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        print(filename)
        # skip directories
        if not filename:
            continue
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)
So I am looking for a way to do this which would also extract subdirs to their respective dir. That means I want a structure within /Users/myuser/Desktop/another:
-file1
-file2
-file3
...
- subfolder
    -file1
    -file2
    -file3
    ...
I have the feeling this must be doable with shutil, but I don't really know how.
Is there a way I can do this? Thanks so much for any help. Very much appreciated.
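One possible approach, given only as a sketch: strip the first path component from each member name, create the remaining subdirectories on demand, and copy the file contents. It assumes that everything in the archive sits under a single top-level folder (members at the archive root are skipped). The function name extract_without_top_folder is hypothetical.

```python
import os
import shutil
import zipfile


def extract_without_top_folder(zip_path, target_dir):
    """Extract zip_path into target_dir, dropping the first path component."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Member names inside a zip archive always use "/" as separator
            parts = info.filename.split("/")[1:]
            if info.is_dir() or not any(parts):
                continue  # folders are created on demand below
            if ".." in parts:
                continue  # ignore names that would escape target_dir
            destination = os.path.join(target_dir, *parts)
            # "or '.'" avoids calling makedirs('') when target_dir is empty
            os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
            with archive.open(info) as source, open(destination, "wb") as target:
                shutil.copyfileobj(source, target)
```

For the layout described above, this would be called as extract_without_top_folder(my_zip, my_dir). Because the relative path below my_folder is kept, files from the subfolder land in a subfolder of the same name.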

Python Iterate over Folders and combine csv files inside

Windows OS - I've got several hundred subdirectories and each subdirectory contains 1 or more .csv files. All the files are identical in structure. I'm trying to loop through each folder and concat all the files in each subdirectory into a new file combining all the .csv files in that subdirectory.
example:
folder1 -> file1.csv, file2.csv, file3.csv -->> file1.csv, file2.csv, file3.csv, combined.csv
folder2 -> file1.csv, file2.csv -->> file1.csv, file2.csv, combined.csv
Very new to coding and getting lost in this. Tried using os.walk but completely failed.

The generator produced by os.walk yields three items on each iteration: the path of the current directory in the walk, a list of the names of its subdirectories (which will be traversed next), and a list of the names of the files in the current directory.
If for whatever reason you don't want to walk certain paths, remove their entries in place from what I called sub below (the list of subdirectories contained in root). This will prevent os.walk from traversing any directories you removed.
My code does not prune the walk. Be sure to update this if you don't want to traverse an entire file subtree.
The following outline should work for this although I haven't been able to test this on Windows. I have no reason to think it'll behave differently.
import os
import sys
def write_files(sources, combined):
    # Copy the first file whole so that its header row is kept
    with open(sources[0], 'r') as first:
        combined.write(first.read())
    for i in range(1, len(sources)):
        with open(sources[i], 'r') as s:
            # Ignore the rest of the headers
            next(s, None)
            for line in s:
                combined.write(line)
def concatenate_csvs(root_path):
    for root, sub, files in os.walk(root_path):
        # Leave out combined.csv so a rerun does not read its own output
        filenames = [os.path.join(root, filename) for filename in sorted(files)
                     if filename.endswith('.csv') and filename != 'combined.csv']
        if not filenames:
            continue  # no .csv files here, so there is nothing to combine
        combined_path = os.path.join(root, 'combined.csv')
        with open(combined_path, 'w') as combined:
            write_files(filenames, combined)
if __name__ == '__main__':
    path = sys.argv[1]
    concatenate_csvs(path)

Skip a certain folder when using os.walk

Here is my code:
rootdir_path_without_slash = '/home/winpc/Downloads/Prageeth/backups/Final/node-wen-app'
rootdir_path_with_slash= '/home/winpc/Downloads/Prageeth/backups/Final/node-wen-app/'
dir_src = (rootdir_path_with_slash)
for subdir, dirs, files in os.walk(rootdir_path_without_slash):
    for file in files:
        file_name=os.path.join(subdir, file)
        if file_name.endswith('.html'):
            print(file_name)
This code walks all the subdirectories of the given source directory, searching for .html files. I need it to skip any node_modules folder it finds. Please help me.

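You'll need to add an if condition on the current root path so that node_modules and all of its descendants are skipped. Note that os.walk still descends into node_modules with this approach; the check only skips processing those directories. You'll want:
for subdir, dirs, files in os.walk(rootdir_path_without_slash):
    if 'node_modules' in subdir:
        continue
    ... # rest of your code
Also, subdir here is a misnomer: the first item os.walk yields is the path of the directory currently being visited, starting with the root.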
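If the goal is to keep os.walk from entering node_modules at all, the list of subdirectory names can be edited in place before the walk continues. The following is a minimal sketch of that idea; the function name find_html_files and its skip parameter are made up for illustration.

```python
import os


def find_html_files(top, skip=("node_modules",)):
    """Return paths of .html files under top, never entering skipped folders."""
    matches = []
    for root, dirs, files in os.walk(top):
        # Slice assignment edits the very list os.walk uses to decide where
        # to go next, so removed folders are never visited. This relies on
        # the default topdown=True.
        dirs[:] = [d for d in dirs if d not in skip]
        for name in files:
            if name.endswith(".html"):
                matches.append(os.path.join(root, name))
    return matches
```

Unlike a substring test on the path, this compares whole folder names, so a directory such as my_node_modules_notes would still be searched.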

Finding correct path to files in subfolders with os.walk with python?

I am trying to create a program that copies files with certain file extension to the given folder. When files are located in subfolders instead of the root folder the program fails to get correct path. In its current state the program works perfectly for the files in the root folder, but it crashes when it finds matching items in subfolders. The program tries to use rootfolder as directory instead of the correct subfolder.
My code is as follows:
# Selective_copy.py walks through file tree and copies files with
# certain extension to a given folder
import shutil
import os
import re
# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"
def Selective_copy(source_folder):
    # create regex to identify file extensions
    mo = re.compile(r"(\w+).(\w+)") # Group(2) represents the file extension
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for i in filenames:
            if mo.search(i).group(2) == extension:
                file_path = os.path.abspath(i)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)
Selective_copy(source_folder)

dirpath is one of the things provided by walk for a reason: it gives the path to the directory that the items in filenames are located in. You can use that to determine the subfolder you should be using.

file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames holds a list of base file names. At this point it is just a list of strings, and (technically) they are not associated with files in the filesystem at all.
os.path.abspath performs string-only operations and joins the file name onto the current working directory. As a result, the resulting path points to a file that does not exist.
What should be done instead is to join the directory path and the base file name (both values come from os.walk):
file_path = os.path.join(dirpath, i)
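A fuller sketch of the copy routine, building each path from dirpath and the file name, might look like the following. It swaps the regular expression for os.path.splitext, which is a substitution on my part rather than part of the original program, and it assumes the destination folder lies outside the source tree. Files with the same name in different subfolders would overwrite each other in the destination.

```python
import os
import shutil


def selective_copy(source_folder, destination_folder, extension=".zip"):
    """Copy every file under source_folder that has the given extension."""
    os.makedirs(destination_folder, exist_ok=True)
    copied = []
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for name in filenames:
            # splitext returns (stem, ".ext"); compare case-insensitively
            if os.path.splitext(name)[1].lower() == extension:
                # dirpath is the folder the file actually lives in
                file_path = os.path.join(dirpath, name)
                shutil.copy(file_path, destination_folder)
                copied.append(file_path)
    return copied
```

The returned list makes it easy to print or log what was copied, in place of the print call inside the loop.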
