Having trouble understanding directory navigation with os.walk

I'm relatively new to Python and I'm trying my hand at a weekend project. I want to navigate through my music directories, get the artist name of each music file, and export that to a CSV so that I can upgrade my music collection (a lot of it is from when I was younger and didn't care about quality).
Anyway, I'm trying to get the path of each music file in its respective directory, so I can pass it to an ID3 tag-reading module to get the artist name.
Here is what I'm trying:
import os

def main():
    for subdir, dirs, files in os.walk(dir):
        for file in files:
            if file.endswith(".mp3") or file.endswith(".m4a"):
                print(os.path.abspath(file))
However, .abspath() doesn't do what I think it should. If I have a directory like this:
music
--1.mp3
--2.mp3
--folder
----a.mp3
----b.mp3
----c.mp3
----d.m4a
----e.m4a
and I run my code, I get this output:
C:\Users\User\Documents\python_music\1.mp3
C:\Users\User\Documents\python_music\2.mp3
C:\Users\User\Documents\python_music\a.mp3
C:\Users\User\Documents\python_music\b.mp3
C:\Users\User\Documents\python_music\c.mp3
C:\Users\User\Documents\python_music\d.m4a
C:\Users\User\Documents\python_music\e.m4a
I'm confused about why it doesn't show the five files as being inside the folder.
Aside from that, am I even going about this in the easiest or best way? Again, I'm new to Python, so any help is appreciated.

You are passing just the filename to os.path.abspath(), which has no context but your current working directory.
Join the path with the subdir parameter:
print(os.path.join(subdir, file))
From the os.path.abspath() documentation:
On most platforms, this is equivalent to calling the function normpath() as follows: normpath(join(os.getcwd(), path)).
so if your current working directory is C:\Users\User\Documents\python_music all your files are joined relative to that.
But os.walk gives you the correct location to base filenames off instead; from the documentation:
For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
dirpath is a string, the path to the directory. [...] filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
Emphasis mine.
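To tie that together, here is a minimal sketch of the join approach applied to the question's music tree. The function name find_music_files and the constant MUSIC_EXTENSIONS are made up for illustration; only os.walk, os.path.join and str.endswith come from the standard library.

```python
import os

# Hypothetical constant: the two extensions from the question.
MUSIC_EXTENSIONS = (".mp3", ".m4a")


def find_music_files(top):
    """Return the path of every music file found under top."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # str.endswith accepts a tuple of suffixes.
            if name.lower().endswith(MUSIC_EXTENSIONS):
                # name has no directory part, so join it with dirpath.
                paths.append(os.path.join(dirpath, name))
    return paths
```

Each returned path starts with whatever was passed as top, so passing an absolute directory gives absolute paths that can be handed straight to a tag reader.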

Related

Different File Paths in Python ZipFile Depending on .write() vs .writestr()

I just wanted to ask quickly if the behavior I'm seeing in Python's zipfile module is expected... I wanted to put together a zip archive. For reasons I don't think I need to get into, I was adding some files using zipfile.writestr() and others using .write(). I was writing some files to zip subdirectory called /scripts and others to a zip subdirectory called /data.
For /data, I originally did this:
for root, _, filenames in os.walk(tmpdirname):
    for root_name in filenames:
        print(f"Handle zip of {root_name}")
        name = os.path.join(root, root_name)
        name = os.path.normpath(name)
        zipFile.write(name, f'/data/{root_name}')
This worked fine and produced a working archive that I could extract. So far, so good. To write text files to the /script subdirectory, I used:
zipFile.writestr(f'/script/{scriptname}', fileBytes)
Again, so far so good.
Now it gets odd... I wanted to extract files in /data/. So I looked for paths in zipFile.namelist() starting with /data. My code kept missing the files in /data/, however. Doing some more digging, I noticed that the files written using .writestr had a slash at the start of the zipfile path like this: "/scripts/myscript.py". The files written using .write did not have a slash at the start of the path, so the data file paths looked like this: "data/mydata.pickle".
I changed my code to use .writestr() for the data files:
for root, _, filenames in os.walk(tmpdirname):
    for root_name in filenames:
        print(f"Handle zip of {root_name}")
        name = os.path.join(root, root_name)
        name = os.path.normpath(name)
        with open(name, mode='rb') as extracted_file:
            zipFile.writestr(f'/data/{root_name}', extracted_file.read())
Voila, the data files now have slashes at the start of the path. I'm not sure why, however, as I'm providing the same file path either way, and I wouldn't expect using one method versus another would change the paths.
Is this supposed to work this way? Am I missing something obvious here?
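For what it's worth, the difference can be reproduced in a few lines. As far as I can tell from CPython's zipfile module, write() normalizes the archive name and strips leading path separators before storing it, while writestr() given a plain string stores the name unchanged. The sketch below is only a demonstration under that assumption; the function name archive_names and the file names are made up.

```python
import io
import os
import tempfile
import zipfile


def archive_names():
    """Add one member with write() and one with writestr(); return namelist()."""
    buffer = io.BytesIO()  # in-memory archive, nothing is left on disk
    with tempfile.TemporaryDirectory() as tmpdirname:
        data_path = os.path.join(tmpdirname, "mydata.pickle")
        with open(data_path, "wb") as f:
            f.write(b"payload")
        with zipfile.ZipFile(buffer, "w") as zf:
            # Both archive names are given with a leading slash.
            zf.write(data_path, "/data/mydata.pickle")
            zf.writestr("/script/myscript.py", b"print('hi')")
            return zf.namelist()


if __name__ == "__main__":
    print(archive_names())
```

If that holds on your Python version too, the simplest way to get consistent names is probably to leave the leading slash off in both calls, for example 'data/...' and 'script/...', and to search namelist() for the same prefix.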

First Practice Project in Automate the Boring Stuff with Python, Ch. 9

So my friend and I have been having a problem with the first practice project of the above chapter of Automate the Boring Stuff with Python. The prompt goes: "Write a program that walks through a folder tree and searches for files with a certain file extension (such as .pdf or .jpg). Copy these files from whatever location they are in to a new folder."
To simplify, we are trying to write a program that copies all of the .jpg files out of My Pictures to another directory. Here's our code:
#! python3
# moveFileType looks in My Pictures and copies .jpg files to my Python folder
import os, shutil

def moveFileType(folder):
    for folderName, subfolders, filenames in os.walk(folder):
        for subfolder in subfolders:
            for filename in filenames:
                if filename.endswith('.jpg'):
                    shutil.copy(folder + filename, '<destination>')

moveFileType('<source>')
We keep getting an error along the lines of "FileNotFoundError: [Errno 2] No such file or directory".
Edit: I added a "\" to the end of my source path (I'm not sure if that is what you meant, @Jacob H), and was able to copy all of the .jpg files in that directory, but received an error when it tried to copy a file within a subfolder of that directory. I added a for loop for subfolder in subfolders and I no longer get any errors, but it doesn't actually look in the subfolders for .jpg files.

There is a more fundamental problem with your code. When you use os.walk() it will already loop through every directory for you, so looping manually through the subfolders is going to produce the same results multiple times.
The other, and more immediate, problem is that os.walk() yields bare file names with no directory component, so you need to glue each name back onto the directory it was found in. As written, the code omits that directory and looks directly under the top folder for files that os.walk() actually found further down in a subdirectory.
Here's a quick attempt at fixing your code:
def moveFileType(folder):
    for folderName, subfolders, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith('.jpg'):
                shutil.copy(os.path.join(folderName, filename), '<destination>')
Making the function accept a destination parameter as a second argument, instead of hardcoding <destination>, would make it a lot more useful for the future.
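A rough sketch of that suggestion follows. The name copy_file_type and the extension parameter are my own additions for illustration, and the sketch assumes the destination folder lives outside the source tree so that the walk never runs into the copies it has just made.

```python
import os
import shutil


def copy_file_type(source, destination, extension=".jpg"):
    """Copy every file under source whose name ends with extension into destination."""
    os.makedirs(destination, exist_ok=True)
    copied = []
    for folder_name, subfolders, filenames in os.walk(source):
        for filename in filenames:
            if filename.lower().endswith(extension):
                # Join the bare file name with the folder os.walk found it in.
                source_path = os.path.join(folder_name, filename)
                # shutil.copy returns the path of the newly created file.
                copied.append(shutil.copy(source_path, destination))
    return copied
```

Note that two files with the same name in different subfolders end up at the same destination path, so the later copy overwrites the earlier one.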

Make sure you type the source path correctly, including the trailing slash. When I tested your code with
moveFileType('/home/anum/Pictures')
I got this error:
IOError: [Errno 2] No such file or directory:
But when I wrote
moveFileType('/home/anum/Pictures/')
the code worked perfectly.
Try that; I hope it does the job. I'm using Python 2.7.
Here is the redefined code for walking into subfolders and copying .jpg files from there as well.
import os, shutil

def moveFileType(folder):
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file.endswith('.jpg'):
                image_path = os.path.join(root, file)  # full path of each .jpg image
                print('location: ' + image_path)
                shutil.copy(image_path, '/home/anum/Documents/Stackoverflow questions')

moveFileType('/home/anum/Pictures/')

Renaming files in Python

I'm doing a Python course on Udacity, and there is a class called Rename Troubles that presents the following incomplete code:
import os

def rename_files():
    # raw string, so that "\t" in the path is not read as a tab character
    file_list = os.listdir(r"C:\Users\Nick\Desktop\temp")
    print (file_list)
    for file_name in file_list:
        os.rename(file_name, file_name.translate(None, "0123456789"))

rename_files()
As the instructor explains it, this will return an error because Python is not attempting to rename files in the right folder. He then proceeds to check the "current working directory", and then goes on to specify to Python which directory to rename files in.
This makes no sense to me. We are using the for loop to specifically tell Python that we want to rename the contents of file_list, which we have just pointed to the directory we need, in rename_files(). So why does it not attempt to rename in that folder? Why do we still need to figure out cwd and then change it?? The code looks entirely logical without any of that.

Look closely at what os.listdir() gives you. It returns only a list of names, not full paths.
You'll then proceed to os.rename one of those names, which will be interpreted as a relative path, relative to whatever your current working directory is.
Instead of messing with the current working directory, you can os.path.join() the path that you're searching to the front of both arguments to os.rename().

I think your code needs some formatting help.
The basic issue is that os.listdir() returns names relative to the directory specified (in this case an absolute path). But your script can be running from any directory. Manipulate the file names passed to os.rename() to account for this.

Look into relative and absolute paths. listdir returns names relative to the path provided to it (in this case an absolute path). os.rename is then given this relative name, and unless the app's current working directory (usually the directory you launched the app from) is the same as the one provided to listdir, this will fail.
There are a couple of ways of handling this. One is to change the current working directory:
os.chdir(r"C:\Users\Nick\Desktop\temp")  # raw string keeps "\t" from becoming a tab
for file_name in os.listdir(os.getcwd()):
    os.rename(file_name, file_name.translate(None, "0123456789"))
Or use absolute paths:
directory = r"C:\Users\Nick\Desktop\temp"  # raw string keeps "\t" from becoming a tab
for file_name in os.listdir(directory):
    old_file_path = os.path.join(directory, file_name)
    new_file_path = os.path.join(directory, file_name.translate(None, "0123456789"))
    os.rename(old_file_path, new_file_path)
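The snippets in this thread use the Python 2 form str.translate(None, ...), which does not work on Python 3 strings. Here is a hedged Python 3 sketch of the absolute-path variant; the names strip_digits_from_names and DIGITS_REMOVED are invented for the example.

```python
import os

# Translation table that deletes every decimal digit (Python 3 idiom).
DIGITS_REMOVED = str.maketrans("", "", "0123456789")


def strip_digits_from_names(directory):
    """Rename each entry in directory so its name no longer contains digits."""
    for file_name in os.listdir(directory):
        new_name = file_name.translate(DIGITS_REMOVED)
        # Skip names that would not change or that would become empty.
        if not new_name or new_name == file_name:
            continue
        # Join both names with the directory so the current
        # working directory does not matter.
        os.rename(
            os.path.join(directory, file_name),
            os.path.join(directory, new_name),
        )
```

This does not guard against two names colliding after the digits are removed, so treat it as a starting point rather than a finished tool.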

You can get a file list from ANY existing directory, i.e.
os.listdir(r"C:\Users\Nick\Desktop\temp")
or
os.listdir(r"C:\Users\Nick\Desktop")
or
os.listdir(r"C:\Users\Nick")
etc.
The instance of the Python interpreter that you're using to run your code is being executed in a directory that is independent of any directory for which you're trying to get information. So, in order to rename the correct file, you need to specify the full path to that file (or the relative path from wherever you're running your Python interpreter).

os.walk() returning '/' instead of actual folder name

I know there are a lot of questions related to this, but I can't seem to find an answer that helps me solve the problem.
I'm using os.walk() to loop through subfolders in my main folder, which contains both folders and files.
Main Folder
    Pass Folder
        files.txt
    Fail Folder
        files.txt
    file.txt
    file2.txt
So I'm using this code to create a new text file based on the subfolder names. However this returns folder/.txt, which means that dirs is returning '/' and files is returning ['file.txt', 'file2.txt'].
for root, dirs, files in os.walk(path):
    for dirs in root:
        new_txt = 'folder%s.txt' % (dirs)
How do I fix it so that dirs returns ['Main Folder/Pass Folder', 'Main Folder/Fail Folder'] and files returns the files in each folder?

I used something similar to this in my code recently (which, if I recall correctly, I also found on SO). Mine went something like this:
for (dirpath, subdirs, filelist) in os.walk(folder):
    # join directories in here
    for subdir in subdirs:
        print(os.path.join(dirpath, subdir))
From the documentation:
dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
I'm not sure os.walk() does quite what you expect. I would suggest joining the directories together using os.path.join() to get what you want.
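One detail worth spelling out: in the question's inner loop, for dirs in root iterates over the root string itself, one character at a time, which is where the lone '/' comes from. Looping over the dirs list and joining each name with root gives the paths the question asks for. A minimal sketch, with subfolder_paths as a made-up name:

```python
import os


def subfolder_paths(top):
    """Return the path of every subfolder under top, joined with its parent."""
    result = []
    for root, dirs, files in os.walk(top):
        # dirs is a list of bare names; root is the folder that contains them.
        for name in dirs:
            result.append(os.path.join(root, name))
    return result
```

Called on the question's Main Folder, this should yield the paths of Pass Folder and Fail Folder, each prefixed with the top folder that was passed in.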

recursive script to rename folders ending with a space or period

We just switched over our storage server to a new file system. The old file system allowed users to name folders with a period or space at the end. The new system considers this an illegal character. How can I write a Python script to recursively loop through all directories and rename any folder that has a period or space at the end?

Use os.walk. Give it a root directory path and it will recursively iterate over it. Do something like
for root, dirs, files in os.walk('root path'):
    for dir in dirs:
        if dir.endswith(' ') or dir.endswith('.'):
            os.rename(...)
EDIT:
We should actually rename the leaf directories first - here is the workaround:
alldirs = []
for root, dirs, files in os.walk('root path'):
    for dir in dirs:
        alldirs.append(os.path.join(root, dir))

# the following two lines make sure that leaf directories are renamed first
alldirs.sort()
alldirs.reverse()
for dir in alldirs:
    if ...:
        os.rename(...)
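As an alternative to collecting and reverse-sorting the paths, os.walk accepts topdown=False, which visits a directory's subdirectories before the directory itself, so leaves are handled first. The sketch below swaps in that technique; the function name strip_trailing_dots_and_spaces is hypothetical, and skipping name collisions is my own assumption about what is safest.

```python
import os


def strip_trailing_dots_and_spaces(top):
    """Rename folders under top whose names end with a space or a period."""
    renamed = []
    # topdown=False yields children before their parents, so a parent
    # is only renamed after everything beneath it has been processed.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in dirs:
            new_name = name.rstrip(" .")
            if new_name == name or not new_name:
                continue  # nothing to strip, or nothing would be left
            old_path = os.path.join(root, name)
            new_path = os.path.join(root, new_name)
            if os.path.exists(new_path):
                continue  # assumption: leave collisions for manual review
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

The returned list of (old, new) pairs can be written to a log so the renames can be reviewed afterwards.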

You can use os.listdir to list the folders and files on some path. This returns a list that you can iterate through. For each list entry, use os.path.join to combine the file/folder name with the parent path and then use os.path.isdir to check if it is a folder. If it is a folder then check the last character's validity and, if it is invalid, change the folder name using os.rename. Once the folder name has been corrected, you can repeat the whole process with that folder's full path as the base path. I would put the whole process into a recursive function.
