Python - Need to loop through directories looking for TXT files


I am a total Python Newb
I need to loop through a directory looking for .txt files, and then read and process them individually. I would like to set this up so that whatever directory the script is in is treated as the root of this action. For example if the script is in /bsepath/workDir, then it would loop over all of the files in workDir and its children.
What I have so far is:
#!/usr/bin/env python
import os
scrptPth = os.path.realpath(__file__)
for file in os.listdir(scrptPth)
    with open(file) as f:
        head,sub,auth = [f.readline().strip() for i in range(3)]
        data=f.read()
        #data.encode('utf-8')
pth = os.getcwd()
print head,sub,auth,data,pth
This code gives me an invalid syntax error, and I suspect that is because os.listdir does not like file paths in standard string format. Also, I don't think I am doing the looped action right. How do I reference a specific file in the looped action? Is it packaged as a variable?
Any help is appreciated.

import os, fnmatch
def findFiles(path, filter):
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, filter):
            yield os.path.join(root, file)
Use it like this, and it will find all text files somewhere within the given path (recursively):
for textFile in findFiles(r'C:\Users\poke\Documents', '*.txt'):
    print(textFile)
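For comparison, here is a minimal sketch of the same idea using pathlib from the Python 3 standard library. The name find_files is hypothetical and chosen only for this example; it is not part of the answer above.

```python
from pathlib import Path


def find_files(path, pattern):
    """Yield every file under path whose name matches the glob pattern."""
    for candidate in Path(path).rglob(pattern):
        # rglob can also match directories, so keep regular files only.
        if candidate.is_file():
            yield candidate
```

It would be called the same way, for example find_files(r'C:\Users\poke\Documents', '*.txt'), and yields Path objects rather than strings.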

os.listdir expects a directory as input. So, to get the directory in which the script resides use:
scrptPth = os.path.dirname(os.path.realpath(__file__))
Also, os.listdir returns just the filenames, not the full path.
So open(file) will not work unless the current working directory happens to be the directory where the script resides. To fix this, use os.path.join:
import os
scrptPth = os.path.dirname(os.path.realpath(__file__))
for file in os.listdir(scrptPth):
    with open(os.path.join(scrptPth, file)) as f:
        pass  # read and process f here
Finally, if you want to recurse through subdirectories, use os.walk:
import os
scrptPth = os.path.dirname(os.path.realpath(__file__))
for root, dirs, files in os.walk(scrptPth):
    for filename in files:
        filename = os.path.join(root, filename)
        with open(filename, 'r') as f:
            head,sub,auth = [f.readline().strip() for i in range(3)]
            data=f.read()
            #data.encode('utf-8')
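Note that the loop above opens every file it finds, not only .txt files. Putting the pieces together, a minimal Python 3 sketch might look like the following. The function name read_txt_files, the case-insensitive extension check, and the utf-8 encoding are assumptions made for illustration; the three header lines follow the head/sub/auth layout from the question.

```python
import os


def read_txt_files(root_dir):
    """Yield (path, head, sub, auth, data) for every .txt file under root_dir."""
    for root, dirs, files in os.walk(root_dir):
        for name in files:
            # Skip anything that is not a .txt file (case-insensitive).
            if not name.lower().endswith(".txt"):
                continue
            path = os.path.join(root, name)
            # Assumption: the files are UTF-8 encoded text.
            with open(path, "r", encoding="utf-8") as f:
                # The first three lines are the header fields from the question.
                head, sub, auth = [f.readline().strip() for _ in range(3)]
                data = f.read()
            yield path, head, sub, auth, data
```

To treat the script's own directory as the root, pass os.path.dirname(os.path.realpath(__file__)) as root_dir.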

Related

For folder in dir, enter it and delete files if condition is met

My dir looks something like this:
dir
|_folder1
    |_file1.py
    |_file2.png
|_folder2
    |_file1.py
    |_file2.png
|_etc..
I want to enter each folder and delete all files that don't have .py in their name. The part of the problem I don't know how to solve is how to tell whether an entry is a folder, and how to enter it if it is.
I tried listdir() and checked the type of each element in the returned list, but they were all strings, probably because it's just a list of names.

You should spend time to make this function more efficient. However, it will do what you want.
import os
def deleteNonPyFiles(parent_dir):
    no_delete_kw = '.py'
    for (dirpath, dirnames, filenames) in os.walk(parent_dir):
        for file in filenames:
            if no_delete_kw not in file:
                os.remove(f'{dirpath}/{file}')

deleteNonPyFiles('C:/User/mydirpath')

os.walk(...) conveniently steps through all folders and sub-folders under the supplied folder and returns a list of all files within each folder. You can then reconstruct the full path to the file and ignore any that end in .py.
You can try:
import os
for dir_path, _, file_names in os.walk('/path/to/your/parent/directory'):
    for delete_me in [os.path.join(dir_path, fname) for fname in file_names if not fname.endswith('.py')]:
        print(f'REMOVING: {delete_me}')
        os.remove(delete_me)

Take a look:
from os import walk, remove
def get_filenames(path):
    filenames = next(walk(path), (None, None, []))[2]
    return filenames

def delete_files_without_key(path, key):
    child_dirs = next(walk(path))[1]
    for dir in child_dirs:
        files = get_filenames(f"{path}/{dir}")
        for file in files:
            if key not in file:
                remove(f"{path}/{dir}/{file}")

delete_files_without_key('/path/to/parent/directory', key=".py")
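The answers above differ in how they test for .py: a substring test such as '.py' not in file also keeps names like file.pyc or notes.py.bak, while endswith('.py') keeps only real .py files. Because deletion cannot be undone, a cautious variant could compare the extension exactly and default to a dry run. The sketch below is one way to do that; the name delete_non_py_files and the dry_run parameter are hypothetical and not taken from any of the answers.

```python
import os


def delete_non_py_files(parent_dir, dry_run=True):
    """Remove files under parent_dir whose extension is not exactly .py.

    Returns the list of paths that were (or, in a dry run, would be) removed.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(parent_dir):
        for name in filenames:
            # splitext compares the real extension, so "a.pyc" is not kept.
            if os.path.splitext(name)[1] == ".py":
                continue
            path = os.path.join(dirpath, name)
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Calling delete_non_py_files('/path/to/parent/directory') only reports what would be deleted; pass dry_run=False once the list looks right.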

Python: linecache.getline not working as intended

I have a directory with numerous subdirectories.
At the bottom of the directories there are some .txt files I need to extract line 2 from.
import os
import os.path
import linecache
for dirpath, dirnames, filenames in os.walk("."):
    for filename in [f for f in filenames if f.endswith(".txt")]:
        #print os.path.join(dirpath, filename)
        #print filename
        print linecache.getline(filename, 2)
I am able to successfully parse all the directories and find every text file. But linecache.getline simply returns newline (where there should be data from that line of the files). Using
print linecache.getline(filename, 2).rstrip('\n')
Does not solve this either.
I am able to correctly print out just the filenames in each directory, but passing these to linecache seems to potentially be the issue. I am able to use linecache.getline(file, lineno.) successfully if I just run the script on 1 .txt file in the current directory.

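linecache.getline resolves a relative filename against the current working directory, and os.walk yields bare filenames without their directory. Join each filename with dirpath before passing it to linecache: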
import os
import os.path
import linecache
for dirpath, dirnames, filenames in os.walk("."):
    for filename in [f for f in filenames if f.endswith(".txt")]:
        direc = os.path.join(dirpath, filename)
        print linecache.getline(direc, 2)
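The snippets in this thread use the Python 2 print statement. A Python 3 sketch of the same fix, wrapped in a function so the result can be reused, might look like this. The name second_lines is hypothetical and used only for illustration.

```python
import linecache
import os


def second_lines(top):
    """Return {path: line 2} for every .txt file under top."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            if filename.endswith(".txt"):
                # Join with dirpath so linecache gets a path it can open.
                path = os.path.join(dirpath, filename)
                # getline returns "" if the file has no such line.
                result[path] = linecache.getline(path, 2).rstrip("\n")
    return result
```

Because linecache.getline returns an empty string instead of raising an error when a file cannot be read, a wrong path shows up as blank output, which matches the symptom described in the question.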

searching and moving files using python

I have been trying to write some Python code to read each line from a .txt file, search for a file with that name in a folder and its subfolders, and then move that file to a preset destination folder.
I have tried the following code, which was posted on Stack Overflow, but it doesn't seem to work and I am unable to figure out the problem. Any help would be highly appreciated:
import os
import shutil
def main():
    destination = '/Users/jorjis/Desktop/new'
    with open('/Users/jorjis/Desktop/articles.txt', 'r') as lines:
        filenames_to_copy = set(line.rstrip() for line in lines)
    for root, _, filenames in os.walk('/Users/jorjis/Desktop/folder/'):
        for filename in filenames:
            if filename in filenames_to_copy:
                shutil.copy(os.path.join(root, filename), destination)

Without any debugging output (which you have now obtained) I can only guess a common pitfall of os.walk: the filenames returned in filenames are just that, filenames without any path. If your file contains filenames with paths they will never match. Use this instead:
if os.path.join(root, filename) in filenames_to_copy:
    shutil.copy(os.path.join(root, filename), destination)
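The question asks to move the files, while the code shown copies them. A minimal sketch that moves instead, and that matches on the bare file name so list entries work with or without a directory part, could look like the following. The function name move_listed_files is hypothetical; the sketch assumes that destination is an existing folder outside search_root and that it holds no file with the same name yet (shutil.move raises an error in that case).

```python
import os
import shutil


def move_listed_files(list_file, search_root, destination):
    """Move files under search_root whose names appear in list_file."""
    with open(list_file, "r") as lines:
        # Compare by bare file name and ignore blank lines.
        wanted = {os.path.basename(line.strip()) for line in lines if line.strip()}
    moved = []
    for root, _, filenames in os.walk(search_root):
        for filename in filenames:
            if filename in wanted:
                # shutil.move returns the path of the moved file.
                moved.append(shutil.move(os.path.join(root, filename), destination))
    return moved
```

The returned list doubles as the debugging output the answer above asks for: if it is empty, no name from the list matched any file under search_root.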

Open a file without specifying the subdirectory python

Let's say my Python script is in a folder "/main". I have a bunch of text files inside subfolders of main. I want to be able to open a file just by specifying its name, not the subdirectory it's in.
So open_file('test1.csv') should open test1.csv even if its full path is /main/test/test1.csv.
I don't have duplicated file names, so it should not be a problem.
I am using Windows.

You could use os.walk to find your filename in a subfolder structure:
import os
def find_and_open(filename):
    for root_f, folders, files in os.walk('.'):
        if filename in files:
            # here you can either open the file
            # or just return the full path and process file
            # somewhere else
            with open(root_f + '/' + filename) as f:
                f.read()
                # do something
If you have a very deep folder structure, you might want to limit the depth of the search.
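One way to limit the depth is to empty the dirs list that os.walk yields once a chosen level is reached, which stops the walk from descending further. The sketch below illustrates that; the name find_file and the max_depth parameter are hypothetical, not part of the answer above.

```python
import os


def find_file(filename, top=".", max_depth=2):
    """Return the path of the first match for filename, or None.

    max_depth=0 searches only top itself; each extra level adds one
    layer of subfolders.
    """
    for root, dirs, files in os.walk(top):
        if filename in files:
            return os.path.join(root, filename)
        # Depth of root below top: 0 for top itself, 1 for its children, ...
        rel = os.path.relpath(root, top)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying dirs in place stops os.walk from descending further.
            dirs[:] = []
    return None
```

This relies on the default top-down order of os.walk; with topdown=False the dirs list cannot be used to prune the search.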

import os
def get_file_path(file):
    for (root, dirs, files) in os.walk('.'):
        if file in files:
            return os.path.join(root, file)
This should work. It returns the path (or None if the file is not found), so you handle opening the file in your own code.

import os
def open_file(filename):
    f = open(os.path.join('/path/to/main/', filename))
    return f

Browse files and subfolders in Python

I'd like to browse through the current folder and all its subfolders and get all the files with .htm|.html extensions. I have found out that it is possible to find out whether an object is a dir or file like this:
import os
dirList = os.listdir("./") # current directory
for dir in dirList:
    if os.path.isdir(dir) == True:
        # I don't know how to get into this dir and do the same thing here
    else:
        # I got file and i can regexp if it is .htm|html
and in the end, I would like to have all the files and their paths in an array. Is something like that possible?

You can use os.walk() to recursively iterate through a directory and all its subdirectories:
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith((".html", ".htm")):
            # whatever
To build a list of these names, you can use a list comprehension:
htmlfiles = [os.path.join(root, name)
             for root, dirs, files in os.walk(path)
             for name in files
             if name.endswith((".html", ".htm"))]

I had a similar thing to work on, and this is how I did it.
import os
rootdir = os.getcwd()
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        #print os.path.join(subdir, file)
        filepath = subdir + os.sep + file
        if filepath.endswith(".html"):
            print (filepath)
Hope this helps.

In Python 3 you can use os.scandir():
import os
def dir_scan(path):
    for i in os.scandir(path):
        if i.is_file():
            print('File: ' + i.path)
        elif i.is_dir():
            print('Folder: ' + i.path)
            dir_scan(i.path)

Use newDirName = os.path.abspath(dir) to create a full directory path name for the subdirectory and then list its contents as you have done with the parent (i.e. newDirList = os.listdir(newDirName)).
You can create a separate method of your code snippet and call it recursively through the subdirectory structure. The first parameter is the directory pathname. This will change for each subdirectory.
This answer is based on the 3.1.1 version documentation of the Python Library. There is a good model example of this in action on page 228 of the Python 3.1.1 Library Reference (Chapter 10 - File and Directory Access).
Good Luck!
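The recursive approach described above can be sketched as follows. The function name list_html_files is hypothetical. One deliberate change from the description: the sketch builds each path with os.path.join(directory, name) rather than os.path.abspath(name), because abspath resolves a bare name against the current working directory, which is wrong below the first level.

```python
import os


def list_html_files(directory):
    """Return the paths of all .htm/.html files under directory."""
    found = []
    for name in os.listdir(directory):
        # listdir returns bare names, so build the full path before testing it.
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            # Recurse into the subfolder with its own path as the new parameter.
            found.extend(list_html_files(path))
        elif name.endswith((".htm", ".html")):
            found.append(path)
    return found
```

Calling list_html_files("./") returns the list of paths the question asks for.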

Slightly altered version of Sven Marnach's solution..
import os

def create_file_list(path):
    return_list = []
    # os.walk yields (root, dirs, files) for each folder
    for root, dirs, files in os.walk(path):
        for file_name in files:
            if file_name.endswith(".txt"):
                return_list.append(file_name)
    return return_list

# call the function only after it has been defined
folder_location = r'C:\SomeFolderName'
file_list = create_file_list(folder_location)

There are two ways that work for me.
1. Use the `os` module and `__file__` to build the path relative to the directory where the script is located:
import os
script_dir = os.path.dirname(__file__)
path = 'subdirectory/test.txt'
file = os.path.join(script_dir, path)
fileread = open(file,'r')
2. Use '\\' as the separator to read or write a file in a subfolder (Windows only; the path is resolved relative to the current working directory):
fileread = open('subdirectory\\test.txt','r')

from tkinter import Tk, filedialog  # filedialog is a submodule and must be imported explicitly
import os
root = Tk()
file = filedialog.askdirectory()
changed_dir = os.listdir(file)
print(changed_dir)
root.mainloop()
