Python os.walk from current directory

How can I edit this script so that it runs from the current directory? If I run the script as it is now, I get an error that it cannot find the files I have specified. My feeling is that os.walk is not searching in the subfolders of the current directory. I do not want to specify the path name, since I want to run this script in different directories.
To sum up: please help me change this script so that it runs from the current directory and finds the files that are in the subfolders of the current directory. Thanks!
import os
import csv
from itertools import chain
from collections import defaultdict
for root, dirs, files in os.walk('.'):
    d1 = {}
    with open(os.path.join(root, 'genes.gff.genespercontig.csv'), 'r') as f1:
        for line in f1:
            ta = line.split()
            d1[ta[1]] = int(ta[0])
    d2 = {}
    with open(os.path.join(root, 'hmmer.analyze.txt.result.txt'), 'r') as f2:
        for line in f2:
            tb = line.split()
            d2[tb[1]] = int(tb[0])
    d3 = defaultdict(list)
    for k, v in chain(d1.items(), d2.items()):
        d3[k].append(v)
    with open(os.path.join(root, 'output_contigsvsgenes.csv'), 'w+') as fnew:
        writer = csv.writer(fnew)
        for k, v in d3.items():
            writer.writerow([k] + v)

import os
os.getcwd()  # returns the current working directory
So in your case the loop changes to:
for root, dirs, files in os.walk(os.getcwd()):
You might also have to check whether the file exists:
if os.path.isfile(os.path.join(root, 'genes.gff.genespercontig.csv')):
    with open(os.path.join(root, 'genes.gff.genespercontig.csv'), 'r') as f1:
        for line in f1:
            ta = line.split()
            d1[ta[1]] = int(ta[0])
Do the same for all the other with statements.

I don't think the issue is working from the current directory, I think the issue is with the way you're using os.walk. You should check that the files exist before you start playing with them, and I think the error might occur because the first root folder is the current working directory. We can rearrange it into a function though, as follows:
import os
import csv
from itertools import chain
from collections import defaultdict
def get_file_values(find_files, output_name):
    for root, dirs, files in os.walk(os.getcwd()):
        if all(x in files for x in find_files):
            outputs = []
            for f in find_files:
                d = {}
                with open(os.path.join(root, f), 'r') as f1:
                    for line in f1:
                        ta = line.split()
                        d[ta[1]] = int(ta[0])
                outputs.append(d)
            d3 = defaultdict(list)
            for k, v in chain(*(d.items() for d in outputs)):
                d3[k].append(v)
            with open(os.path.join(root, output_name), 'w+') as fnew:
                writer = csv.writer(fnew)
                for k, v in d3.items():
                    writer.writerow([k] + v)
get_file_values(['genes.gff.genespercontig.csv', 'hmmer.analyze.txt.result.txt'], 'output_contigsvsgenes.csv')
Not having your data I have been unable to test this, though I think it should work.
EDIT
To get the folder included in each row of the output csv files, we can just change our call to writer.writerow a little, to:
writer.writerow([root, k] + v)
Thus, the first column of each csv file created contains the name of the folder the values were obtained from.

You could use os.getcwd() to get the current directory (the one you're in when calling your script), but it would be better to pass the target directory as an argument.
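A minimal sketch of that idea, assuming the script takes the directory as an optional first argument and falls back to the current directory. The function names parse_target_dir and dirs_containing are made up for illustration; they are not part of the question's script.

```python
import argparse
import os


def parse_target_dir(argv=None):
    """Return the directory to walk: the first CLI argument, else the cwd."""
    parser = argparse.ArgumentParser(description="Walk a directory tree.")
    parser.add_argument("target", nargs="?", default=os.getcwd(),
                        help="directory to walk (default: current directory)")
    return os.path.abspath(parser.parse_args(argv).target)


def dirs_containing(target, required):
    """Yield every directory under target that holds all required files."""
    for root, dirs, files in os.walk(target):
        if all(name in files for name in required):
            yield root
```

Calling parse_target_dir() without arguments reads the command line, so running the script with no argument walks the directory it was started from, while passing a path walks that path instead.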

Within a Python script there are many options for introspection that tell you about the environment in which the script is running. The current directory is available via
os.getcwd()
You suggested in the comments that the files to work on are not in the current directory but in its subdirectories. In that case, adjust your script like this (move the entire body of your loop one level deeper, into for dir in dirs:, and adjust os.path.join() accordingly):
for root, dirs, files in os.walk(os.getcwd()):
    for dir in dirs:
        print(os.path.join(root, dir, 'genes.gff.genespercontig.csv'))
Just for the fun of it, below is a short overview of some other useful insights into the environment a Python script runs in:
import __future__
import os, sys
print( "Executable running THIS script : { " + sys.executable + " }" )
print( "Full path file name of THIS script: { " + os.path.realpath(__file__) + " }" )
print( "Full path directory to THIS script: { " + os.path.dirname(os.path.abspath(__file__)) + " }" )
print( "Current working directory : { " + os.getcwd() + " }" )
print( "Has THIS file started Python? : { " + { True: "Yes", False: "No" }[(__name__ == "__main__")] + " }" )
print( "Which Python version is running? : { " + sys.version.replace("\n", "") + " }" )
print( "Which operating system is there? : { " + sys.platform + " }" )

Related

Python code to merge multiple .wav files from multiple folders gets hung up

I have a bunch of wave files from an outdoor bird recorder that are broken up into 1 hour segments. Each day's worth of audio is in a single folder, and I have 30 days' worth of folders. I am trying to iterate through the folders, merge each day's audio into one file, and export it with the folder name. Each time I try to run it, either the print statements indicate that each for loop runs to completion before the merge function can be called, or it runs properly and the merge function throws a write error.
import wave
import os
#creates an empty object for the first folder name
rootfiles= ""
#sets the path for the starting location
path = "I:\SwiftOne_000"
#lists all folders in the directory "path"
dir_list = os.listdir(path)
print("Files and directories in '", path, "' :")
#iterates through folders in path
for i in dir_list:
    #adds file name to original path
    rootfiles = ( path + "\\" + i)
    prefix = i
    # define outfiles for waves
    out_name = prefix
print("first loop completed")
for x in rootfiles:
    myfiles= []
    paths = rootfiles
    ext = (".wav")
    #print(paths)
    dir_lists = os.listdir(paths)
    #print(dir_lists)
    #print("Files and directories in '", paths, "' :")
print("second loop completed")
for x in dir_lists:
    myfiles.append( paths + "\\" + x)
    #print (myfiles)
    outfile= "D:\SwiftD\prefix" + prefix + ".wav"
    wav_files = myfiles
print("third loop completed")
from contextlib import closing
with closing(wave.open(outfile, 'wb')) as output:
    # find sample rate from first file
    with closing(wave.open(wav_files[0])) as w:
        output.setparams(w.getparams())
    # write each file to output
    for infile in wav_files:
        with closing(wave.open(infile)) as w:
            output.writeframes(w.readframes(w.getnframes()))

I think you want something like this, assuming your folder structure is:
- Swift (directory)
    - Day1 (directory)
        - File1
        - File2
        - File3
import os, wave
src = r'I:\SwiftOne_000'
output_folder = r'I:\OutputFolder'
input_data = {}
for d_name, d_path in [(d, path) for d in os.listdir(src) if os.path.isdir(path := os.path.join(src, d))]:
    input_data[d_name] = [path for f in os.listdir(d_path) if f.lower().endswith('.wav') and os.path.isfile(path := os.path.join(d_path, f))]
print(input_data)
for d_name, paths in input_data.items():
    with wave.open(os.path.join(output_folder, f'{d_name}.wav'), 'wb') as output:
        params_written = False
        for path in paths:
            with wave.open(path, 'rb') as data:
                if not params_written:
                    output.setparams(data.getparams())
                    params_written = True
                output.writeframes(data.readframes(data.getnframes()))
There are a few issues with your code. It is better to use os.path.join to concatenate paths rather than constructing the string yourself, as it makes the code platform independent (although you probably don't care). os.listdir returns both files and folders, so you should check the type with os.path.isfile or os.path.isdir to be sure. The file extension isn't always in lower case, so your extension check might not work; using .lower() means you can always check for .wav.
I'm pretty sure you don't need contextlib.closing, as the with block will already take care of this for you.
You are using the outfile variable to write to the file, however, you overwrite this each time you loop around the third loop, so you will only ever get one file corresponding to the last directory.
Without seeing the stack trace, I'm not sure what the write error is likely to be.
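The merge step on its own can be sketched as a small function, which makes it easier to test separately from the folder traversal. This is only a sketch: the name merge_wavs is made up, and it assumes the input list is not empty and that all inputs share the same channel count, sample width and frame rate.

```python
import wave


def merge_wavs(input_paths, output_path):
    """Concatenate WAV files that share the same format into output_path."""
    with wave.open(output_path, "wb") as output:
        for index, path in enumerate(input_paths):
            with wave.open(path, "rb") as data:
                if index == 0:
                    # Copy channels, sample width and frame rate from the first file.
                    output.setparams(data.getparams())
                # Append all audio frames of this file to the output.
                output.writeframes(data.readframes(data.getnframes()))
```

Passing sorted(paths) keeps the hourly segments in name order, since os.listdir does not guarantee any ordering.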

Defining the directory to save outputs in Python for text files

I would like to save my outputs (which are text files) from a Python script to a different directory. Following is my code to define the directory for my outputs:
headers = list(uniq)
output_headers = open(save_path, "headers_dash1" + "_" + str(count) + ".raw", "w")
for item in headers:
    output_headers.write("%s\n" % item)
How can I generate these outputs to a different fixed directory like:
D:\Test\"The file name"

For instance, you can use the join function from os.path:
import os
output_dir = "Test"
files = ["file1", "file2", "file3"]
for f in files:
    # this combines the path into Test/file1, etc.
    with open(os.path.join(output_dir, f), "w") as out:
        pass  # write etc here

You can use os.path.join(path, filename) to create the desired path.
os.path.join('path/to','file.jpg')
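Applied to the question's header file, this could look like the sketch below. The function name write_headers is invented for illustration, and creating the directory with os.makedirs is an added assumption, not something the question asked for.

```python
import os


def write_headers(headers, output_dir, count):
    """Write one header per line to a file in output_dir and return its path."""
    os.makedirs(output_dir, exist_ok=True)  # create the directory if it is missing
    file_name = "headers_dash1_" + str(count) + ".raw"
    out_path = os.path.join(output_dir, file_name)
    with open(out_path, "w") as output_headers:
        for item in headers:
            output_headers.write("%s\n" % item)
    return out_path
```

A call such as write_headers(list(uniq), r"D:\Test", count) would then place the file under D:\Test.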

Run multiple find and replaces in Python (on every file in the folder and sub-folder)

I have a folder (courses) with sub-folders and a random number of files. I want to run multiple search and replaces on those random files. Is it possible to do a wild card search for .html and have the replaces run on every html file ?
Search and replaces:
1) "</b>" to "</strong>"
2) "</a>" to "</h>"
3) "<p>" to "</p>"
Also all these replaces have to be run on every file in the folder and sub-folders.
Thank you so much

Try this:
import os
mydict = {"</b>":"</strong>", "</a>":"</h>", "<p>":"</p>"}
for (path, dirs, files) in os.walk('./'):
    for f in files:
        if f.endswith('.html'):
            filepath = os.path.join(path, f)
            with open(filepath) as src:
                s = src.read()
            for k, v in mydict.items():
                s = s.replace(k, v)
            with open(filepath, 'w') as dst:
                dst.write(s)
You can change os.walk('./') to os.walk('/anyFolder/').

Use the glob module to get a list of *.html files.
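A sketch of the glob approach, assuming Python 3.5 or later (for recursive=True) and UTF-8 encoded files. The function name replace_in_html_files and the REPLACEMENTS default are illustrative only; the single pair shown is taken from the question.

```python
import glob
import os

REPLACEMENTS = {"</b>": "</strong>"}  # extend with further pairs as needed


def replace_in_html_files(folder, replacements=REPLACEMENTS):
    """Apply every replacement to each .html file under folder; return changed paths."""
    changed = []
    # '**' with recursive=True matches the folder itself and all subfolders.
    pattern = os.path.join(folder, "**", "*.html")
    for filepath in glob.glob(pattern, recursive=True):
        with open(filepath, encoding="utf-8") as src:
            original = src.read()
        updated = original
        for old, new in replacements.items():
            updated = updated.replace(old, new)
        if updated != original:
            # Only rewrite files whose content actually changed.
            with open(filepath, "w", encoding="utf-8") as dst:
                dst.write(updated)
            changed.append(filepath)
    return changed
```

Returning the list of changed files makes it easy to review what was touched before committing to the result.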

Moving files by starting letter in powershell, python or other scripting language running windows

I need a script that can recursively traverse c:\somedir\ and move files to c:\someotherdir\x\ - where x is the starting letter of the file.
Can anyone help?
Ended up with this one:
import os
from shutil import copy2
import uuid
import random
SOURCE = ".\\pictures\\"
DEST = ".\\pictures_ordered\\"
for path, dirs, files in os.walk(SOURCE):
    for f in files:
        print(f)
        starting_letter = f[0].upper()
        source_path = os.path.join(path, f)
        dest_path = os.path.join(DEST, starting_letter)
        if not os.path.isdir(dest_path):
            os.makedirs(dest_path)
        dest_fullfile = os.path.join(dest_path, f)
        if os.path.exists(dest_fullfile):
            periodIndex = source_path.rfind(".")
            renamed_soruce_path = source_path[:periodIndex] + "_" + str(random.randint(100000, 999999)) + source_path[periodIndex:]
            os.rename(source_path, renamed_soruce_path)
            copy2(renamed_soruce_path, dest_path)
            os.remove(renamed_soruce_path)
        else:
            copy2(source_path, dest_path)
            os.remove(source_path)

Here's a simple script that does what you want. It doesn't tell you anything about what it's doing, and will just overwrite the old file if there are two files with the same name.
import os
from shutil import copy2
SOURCE = "c:\\source\\"
DEST = "c:\\dest\\"
# Iterate recursively through all files and folders under the source directory
for path, dirs, files in os.walk(SOURCE):
    # For each directory iterate over the files
    for f in files:
        # Grab the first letter of the filename
        starting_letter = f[0].upper()
        # Construct the full path of the current source file
        source_path = os.path.join(path, f)
        # Construct the destination path using the first letter of the
        # filename as the folder
        dest_path = os.path.join(DEST, starting_letter)
        # Create the destination folder if it doesn't exist
        if not os.path.isdir(dest_path):
            os.makedirs(dest_path)
        # Copy the file to the destination path + starting_letter
        copy2(source_path, dest_path)

I suspect this will work in PowerShell.
gci -path c:\somedir -filter * -recurse |
    where { -not ($_.PSIsContainer) } |
    foreach { move-item -path $_.FullName -destination $_.Name.Substring(0, 1) }

ls c:\somedir\* -recurse | ? { -not ($_.PSIsContainer)} | mv -destination "C:\someotherdir\$($_.Name.substring(0,1))" } ... -whatif :P

Here's an answer in Python. Note the warning message: you may want to deal with overwrites differently. Also, save this to a file in the root directory and run it from there; otherwise you have to change the argument to os.walk and also how the paths are joined together.
import os
import sys
try:
    letter = sys.argv[1]
except IndexError:
    print('Specify a starting letter')
    sys.exit(1)
try:
    os.makedirs(letter)
except OSError:
    pass # already exists
for dirpath, dirnames, filenames in os.walk('.'):
    for filename in filenames:
        if filename.startswith(letter):
            src = os.path.join(dirpath, filename)
            dst = os.path.join(letter, filename)
            if os.path.exists(dst):
                print('warning, existing', dst, 'being overwritten')
            os.rename(src, dst)

Sure, I'll help: look at os.walk, which is available in both Python 2 and Python 3. (The older os.path.walk exists only in Python 2 and was removed in Python 3.)
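For a Python 3 version that actually moves the files and does not overwrite on a name clash, something like the following sketch could serve. The function name move_by_first_letter and the _1, _2 suffix scheme are assumptions made for illustration, and the sketch assumes the destination is not inside the source tree.

```python
import os
import shutil


def move_by_first_letter(source, dest):
    """Move every file under source into dest/<FIRST LETTER>/; return the new paths."""
    # Collect paths before moving anything; assumes dest is not inside source.
    to_move = [os.path.join(path, name)
               for path, dirs, files in os.walk(source)
               for name in files]
    moved = []
    for source_path in to_move:
        name = os.path.basename(source_path)
        letter_dir = os.path.join(dest, name[0].upper())
        os.makedirs(letter_dir, exist_ok=True)
        stem, ext = os.path.splitext(name)
        target = os.path.join(letter_dir, name)
        counter = 1
        # On a name clash, append _1, _2, ... instead of overwriting.
        while os.path.exists(target):
            target = os.path.join(letter_dir, "%s_%d%s" % (stem, counter, ext))
            counter += 1
        moved.append(shutil.move(source_path, target))
    return moved
```

For the question's layout the call would be move_by_first_letter(r"c:\somedir", r"c:\someotherdir").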

Python recursive folder read

I have a C++/Obj-C background and I am just discovering Python (been writing it for about an hour).
I am writing a script to recursively read the contents of text files in a folder structure.
The problem I have is the code I have written will only work for one folder deep. I can see why in the code (see #hardcoded path), I just don't know how I can move forward with Python since my experience with it is only brand new.
Python Code:
import os
import sys
rootdir = sys.argv[1]
for root, subFolders, files in os.walk(rootdir):
    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName
        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()
        folderOut.close()

Make sure you understand the three return values of os.walk:
for root, subdirs, files in os.walk(rootdir):
has the following meaning:
root: Current path which is "walked through"
subdirs: Files in root of type directory
files: Files in root (not in subdirs) of type other than directory
And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file - you must concatenate the currently "walked" folder instead of the topmost folder. So that must be filePath = os.path.join(root, file). BTW "file" is a builtin, so you don't normally use it as variable name.
Another problem is your loops, which should look like this, for example:
import os
import sys
walk_dir = sys.argv[1]
print('walk_dir = ' + walk_dir)
# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))
for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)
    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)
        for filename in files:
            file_path = os.path.join(root, filename)
            print('\t- file %s (full path: %s)' % (filename, file_path))
            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')
If you didn't know, the with statement for files is a shorthand:
with open('filename', 'rb') as f:
    dosomething()
# is effectively the same as
f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()

If you are using Python 3.5 or above, you can get this done in 1 line.
import glob
# root_dir needs a trailing slash (i.e. /root/dir/)
for filename in glob.iglob(root_dir + '**/*.txt', recursive=True):
    print(filename)
As mentioned in the documentation
If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.
If you want every file, you can use
import glob
for filename in glob.iglob(root_dir + '**/**', recursive=True):
    print(filename)

Agree with Dave Webb, os.walk will yield an item for each directory in the tree. Fact is, you just don't have to care about subFolders.
Code like this should work:
import os
import sys
rootdir = sys.argv[1]
for folder, subs, files in os.walk(rootdir):
    with open(os.path.join(folder, 'python-outfile.txt'), 'w') as dest:
        for filename in files:
            with open(os.path.join(folder, filename), 'r') as src:
                dest.write(src.read())

TL;DR: This is the equivalent to find -type f to go over all files in all folders below and including the current one:
for currentpath, folders, files in os.walk('.'):
    for file in files:
        print(os.path.join(currentpath, file))
As already mentioned in other answers, os.walk() is the answer, but it could be explained better. It's quite simple! Let's walk through this tree:
docs/
└── doc1.odt
pics/
todo.txt
With this code:
for currentpath, folders, files in os.walk('.'):
    print(currentpath)
The currentpath is the current folder it is looking at. This will output:
.
./docs
./pics
So it loops three times, because there are three folders: the current one, docs, and pics. In every loop, it fills the variables folders and files with all folders and files. Let's show them:
for currentpath, folders, files in os.walk('.'):
    print(currentpath, folders, files)
This shows us:
# currentpath  folders           files
.              ['pics', 'docs']  ['todo.txt']
./pics         []                []
./docs         []                ['doc1.odt']
So in the first line, we see that we are in folder ., that it contains two folders namely pics and docs, and that there is one file, namely todo.txt. You don't have to do anything to recurse into those folders, because as you see, it recurses automatically and just gives you the files in any subfolders. And any subfolders of that (though we don't have those in the example).
If you just want to loop through all files, the equivalent of find -type f, you can do this:
for currentpath, folders, files in os.walk('.'):
    for file in files:
        print(os.path.join(currentpath, file))
This outputs:
./todo.txt
./docs/doc1.odt
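To try the walk-through above without touching real files, the example tree can be built in a temporary directory. This is just a sketch; the helper names walk_rows and build_example_tree are made up, and the names are sorted only to make the order predictable (os.walk itself promises no particular order).

```python
import os
import tempfile


def walk_rows(top):
    """Return one (relative path, folders, files) row per directory under top."""
    rows = []
    for currentpath, folders, files in os.walk(top):
        folders.sort()  # sorting in place also fixes the order of descent
        rows.append((os.path.relpath(currentpath, top), list(folders), sorted(files)))
    return rows


def build_example_tree(top):
    """Create docs/doc1.odt, pics/ and todo.txt below top."""
    os.makedirs(os.path.join(top, "docs"))
    os.makedirs(os.path.join(top, "pics"))
    for relative in (os.path.join("docs", "doc1.odt"), "todo.txt"):
        open(os.path.join(top, relative), "w").close()


with tempfile.TemporaryDirectory() as top:
    build_example_tree(top)
    for row in walk_rows(top):
        print(row)
```

Each printed row corresponds to one iteration of the loop, that is, one directory of the tree.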

The pathlib library is really great for working with files. You can do a recursive glob on a Path object like so.
from pathlib import Path
for elem in Path('/path/to/my/files').rglob('*.*'):
    print(elem)

import glob
import os
root_dir = <root_dir_here>
for filename in glob.iglob(root_dir + '**/**', recursive=True):
    if os.path.isfile(filename):
        with open(filename,'r') as file:
            print(file.read())
**/** matches all files and directories recursively.
The os.path.isfile(filename) check tells files from directories; only files are opened and read.
Here I am printing each file's contents.

If you want a flat list of all paths under a given dir (like find . in the shell):
files = [
    os.path.join(parent, name)
    for (parent, subdirs, files) in os.walk(YOUR_DIRECTORY)
    for name in files + subdirs
]
To only include full paths to files under the base dir, leave out + subdirs.

I've found the following to be the easiest
from glob import glob
import os
files = [f for f in glob('rootdir/**', recursive=True) if os.path.isfile(f)]
Using glob('some/path/**', recursive=True) gets all files, but also includes directory names. Adding the if os.path.isfile(f) condition filters this list to existing files only

For my taste os.walk() is a little too complicated and verbose. You can write the accepted answer more cleanly:
import pathlib
all_files = [str(f) for f in pathlib.Path(dir_path).glob("**/*") if f.is_file()]
with open(outfile, 'wb') as fout:
    for f in all_files:
        with open(f, 'rb') as fin:
            fout.write(fin.read())
            fout.write(b'\n')

Use os.path.join() to construct your paths. It's neater:
import os
import sys
rootdir = sys.argv[1]
for root, subFolders, files in os.walk(rootdir):
    for folder in subFolders:
        outfileName = os.path.join(root, folder, "py-outfile.txt")
        folderOut = open(outfileName, 'w')
        print("outfileName is " + outfileName)
        for file in files:
            filePath = os.path.join(root, file)
            toWrite = open(filePath).read()
            print("Writing '" + toWrite + "' to" + filePath)
            folderOut.write(toWrite)
        folderOut.close()

os.walk does recursive walk by default. For each dir, starting from root it yields a 3-tuple (dirpath, dirnames, filenames)
from os import walk
from os.path import splitext, join
def select_files(root, files):
    """
    simple logic here to filter out interesting files
    .py files in this example
    """
    selected_files = []
    for file in files:
        #do concatenation here to get full path
        full_path = join(root, file)
        ext = splitext(file)[1]
        if ext == ".py":
            selected_files.append(full_path)
    return selected_files
def build_recursive_dir_tree(path):
    """
    path - where to begin folder scan
    """
    selected_files = []
    for root, dirs, files in walk(path):
        selected_files += select_files(root, files)
    return selected_files

I think the problem is that you're not processing the output of os.walk correctly.
Firstly, change:
filePath = rootdir + '/' + file
to:
filePath = root + '/' + file
rootdir is your fixed starting directory; root is a directory returned by os.walk.
Secondly, you don't need to indent your file processing loop, as it makes no sense to run this for each subdirectory. You'll get root set to each subdirectory. You don't need to process the subdirectories by hand unless you want to do something with the directories themselves.
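One case where the directory list is worth touching is skipping parts of the tree: with the default top-down walk, editing the list in place controls where os.walk descends. A sketch, with walk_files and the default skip_dirs names chosen only for illustration:

```python
import os


def walk_files(top, skip_dirs=(".git", "__pycache__")):
    """Yield the path of every file under top, skipping the named directories."""
    for root, dirs, files in os.walk(top):
        # Assigning to the slice edits the list os.walk itself consults,
        # so the skipped directories are never descended into.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            yield os.path.join(root, name)
```

Rebinding the name (dirs = [...]) instead of assigning to the slice would have no effect on the walk, because os.walk keeps its own reference to the original list.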

Try this:
import os
import sys
for root, subdirs, files in os.walk(path):
    for file in os.listdir(root):
        filePath = os.path.join(root, file)
        if os.path.isdir(filePath):
            pass
        else:
            f = open (filePath, 'r')
            # Do Stuff
If you prefer an (almost) Oneliner:
from pathlib import Path
lookuppath = '.' #use your path
filelist = [str(item) for item in Path(lookuppath).glob("**/*") if Path(item).is_file()]
In this case you will get a list with just the paths of all files located recursively under lookuppath.
Without str() you will get PosixPath() added to each path.

This worked for me:
import glob
root_dir = "C:\\Users\\Scott\\" # Don't forget trailing (last) slashes
for filename in glob.iglob(root_dir + '**/*.jpg', recursive=True):
    print(filename)
    # do stuff

If just the file names are not enough, it's easy to implement a Depth-first search on top of os.scandir():
import os
stack = ['.']
files = []
total_size = 0
while stack:
    dirname = stack.pop()
    with os.scandir(dirname) as it:
        for e in it:
            if e.is_dir():
                stack.append(e.path)
            else:
                size = e.stat().st_size
                files.append((e.path, size))
                total_size += size
The docs have this to say:
The scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.
