I have a directory that contains files with different names. I also have a second directory that contains sub-directories, each named after one of the files in the first directory (without the extension).
What I want to do is check, by name, whether each file has a matching directory in the directory of directories.
While working on this I ended up with several nested for-loops, which got a bit confusing. Is there a simpler way to do this in Python?
Here's what I have done so far:
import os
directory_of_files_path = '/home/user/directory_of_files'
directory_of_directories_path = '/home/user/directory_of_directories'
i = 0
for root_pairs, dirs_pairs, files_pairs in os.walk(directory_of_files_path):
    for root_aligned, dirs_aligned, files_aligned in os.walk(directory_of_directories_path):
        for file in files_pairs:
            for directory in dirs_aligned:
                filename, file_extension = os.path.splitext(file)
                if filename == directory:
                    i = i + 1
As you can see, the code above counts how many files have a matching directory (by name) in the directory of directories. But I couldn't figure out how to find the files that are not included in the directory of directories.
Thanks.
I'm not sure I understood what you want to do, but here is my guess. It assumes the files in dir_files are .txt files whose names (without the extension) match directory names in dir_dir:
import os
dir_files = 'dir_files'
dir_dir = 'dir_dir'
i = 0
for rf, df, ff in os.walk(dir_files):
    ff2 = list(ff)  # a copy: ff2 = ff would only alias the same list
    for rd, dd, fd in os.walk(dir_dir):
        for dirs in dd:
            if dirs + ".txt" in ff2:
                i += 1
                ff2.remove(dirs + ".txt")  # whatever is left in ff2 has no matching dir
print("There are", i, "dirs that match the files")
print("No dir found corresponding to {}".format(ff2))
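A simpler way to answer the original question is to compare two sets of names instead of nesting loops. This is a minimal sketch, assuming only the top level of each directory matters (unlike os.walk, it does not recurse) and that a file matches a directory when its name without the extension equals the directory name; match_files_to_dirs is a hypothetical helper name.

```python
import os

def match_files_to_dirs(files_dir, dirs_dir):
    """Return (matched, missing): file names (without extension) that do / don't
    have a sub-directory of the same name in dirs_dir."""
    # base names, without extension, of the regular files in files_dir
    stems = {os.path.splitext(name)[0]
             for name in os.listdir(files_dir)
             if os.path.isfile(os.path.join(files_dir, name))}
    # names of the immediate sub-directories of dirs_dir
    subdirs = {name for name in os.listdir(dirs_dir)
               if os.path.isdir(os.path.join(dirs_dir, name))}
    matched = stems & subdirs  # files that have a matching directory
    missing = stems - subdirs  # files with no matching directory
    return sorted(matched), sorted(missing)
```

For example, `matched, missing = match_files_to_dirs('/home/user/directory_of_files', '/home/user/directory_of_directories')` gives both lists in one pass; `missing` holds the files that have no directory.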
I would like some help looping through some directories and subdirectories and extracting data. I have a directory with three levels, where the third level contains several .csv.gz files. The structure is like this:
I need to access level 2 (where the subfolders are) of each folder and check whether a specific folder exists (in my example, this will be subfolder 3; I left the other folders empty for this example, but in real cases they will have data). If the check returns True, I want to rename the files within the target subfolder3 and move all of them to another folder.
Below is my code. It is quite cumbersome and there are probably better ways of doing it. I tried using os.walk(), and this is the closest I got to a solution, but it won't move the files.
import os
import shutil
def organizer(parent_dir, target_dir, destination_dir):
    for root, dirs, files in os.walk(parent_dir):
        if root.endswith(target_dir):
            target = root
            for files in os.listdir(target):
                if not files.startswith("."):
                    # this is to change the name of the file
                    fullname = files.split(".")
                    just_name = fullname[0]
                    csv_extension = fullname[1]
                    gz_extension = fullname[2]
                    subject_id = target
                    #make a new name
                    origin = subject_id + "/"+ just_name + "." + csv_extension + "." + gz_extension
                    #make a path based on this new name
                    new_name = os.path.join(destination_dir, origin)
                    #move file from origin folder to destination folder and rename the file
                    shutil.move(origin, new_name)
Any suggestions on how to make this work and/or make it more efficient?
Simply enough, you can use the built-in os module: os.walk(path) yields a (root, dirs, files) tuple for every directory in the tree:
import os
for root, _, files in os.walk(path):
    pass  # your code here
For your problem, do this:
import os
for root, dirs, files in os.walk(parent_directory):
    for file in files:
        pass  # extract the data from the "file" here
See the os.walk() documentation for more information.
If you want to get the name of a file from its path, you can use os.path.basename(path).
You can even check for just the gzipped CSV files you're looking for using the built-in fnmatch module:
import fnmatch
import os
def find_csv_files(path):
    result = []
    for root, _, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, "*.csv.gz"):  # shell-style wildcard pattern, not a regex
                result.append(os.path.join(root, name))
    return result  # os.walk visits each file once, so there are no duplicates to remove
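The standard glob module can do the same recursive match in a single call. This is a sketch assuming Python 3.5+ (for recursive=True); find_csv_files_glob is a hypothetical name. Note that, unlike the fnmatch version, glob skips hidden files whose names start with ".".

```python
import glob
import os

def find_csv_files_glob(path):
    # with recursive=True, "**" matches path itself and every sub-directory below it
    pattern = os.path.join(path, "**", "*.csv.gz")
    return glob.glob(pattern, recursive=True)
```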
OK, guys, I was finally able to find a solution. Here it is. It's not the cleanest, but it works in my case. Thanks for the help.
import os
import shutil
def organizer(parent_dir, target_dir, destination_dir):
    for root, dirs, files in os.walk(parent_dir):
        if root.endswith(target_dir):
            target = root
            for files in os.listdir(target):
                #this one because I have several .DS_Store files in the folder which I don't want to extract
                if not files.startswith("."):
                    fullname = files.split(".")
                    just_name = fullname[0]
                    csv_extension = fullname[1]
                    gz_extension = fullname[2]
                    origin = target + "/" + files
                    full_folder_name = origin.split("/")
                    #make a new name (index 5 depends on how deep parent_dir is on disk)
                    new_name = full_folder_name[5] + "_"+ just_name + "." + csv_extension + "." + gz_extension
                    #make a path based on this new name
                    new_path = os.path.join(destination_dir, new_name)
                    #move file from origin folder to destination folder and rename the file
                    shutil.move(origin, new_path)
I guess the problem was that I was passing a variable holding the renamed file (in my example, I wrongly called this variable origin) as the source path to shutil.move(). Since that path did not exist, the files weren't moved.
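A sketch of the same idea that avoids the hard-coded full_folder_name[5] index: it takes the prefix from the folder directly above the target folder, whatever the depth on disk. This assumes that folder (e.g. folder1, the subject folder) is the prefix you want and that destination_dir lies outside parent_dir; organize_target_files is a hypothetical name.

```python
import os
import shutil

def organize_target_files(parent_dir, target_dir, destination_dir):
    """Move files from every folder named target_dir into destination_dir,
    prefixing each file name with the name of the folder one level above."""
    os.makedirs(destination_dir, exist_ok=True)
    moved = []
    for root, dirs, files in os.walk(parent_dir):
        # exact name match, so e.g. "old_subfolder3" is not picked up
        if os.path.basename(root) != target_dir:
            continue
        # parent_dir/folder1/subfolder3 -> prefix "folder1"
        prefix = os.path.basename(os.path.dirname(root))
        for name in files:
            if name.startswith("."):  # skip hidden files such as .DS_Store
                continue
            src = os.path.join(root, name)
            dst = os.path.join(destination_dir, prefix + "_" + name)
            shutil.move(src, dst)
            moved.append(dst)
    return moved
```

Comparing os.path.basename(root) with target_dir, rather than using root.endswith(target_dir), is a deliberate choice: it only matches folders whose whole name is target_dir.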
I have the following directory structure within my zip file:
myzip.zip
- directory 1
- subdirectory 1
- imageA.jpg
- imageB.jpg
- directory 2
- subdirectory 2
- imageA.jpg
- imageB.jpg
My goal is to rename the .jpg files after the main directory name, like so:
myzip.zip
- directory 1
- subdirectory 1
- directory 1-1.jpg
- directory 1-2.jpg
- directory 2
- subdirectory 2
- directory 2-1.jpg
- directory 2-2.jpg
Take into account that a subdirectory can contain multiple .jpg files, so an incremental number starting from 1 is appended to each renamed .jpg file (hence the new filename directory 1-1.jpg).
Lastly, I would like to write these changes to a new zip file, keeping the same structure, with the only difference being the new names of the .jpg files.
My idea in code:
import zipfile
source = zipfile.ZipFile("myzip.zip", 'r')
target = zipfile.ZipFile(source.filename+"_renamed"+".zip", 'w', zipfile.ZIP_DEFLATED)
for file in source.infolist():
    filename = file.filename #directory 1/subdirectory 1/imageA.jpg
    rootname, image_name = filename.split("/subdirectory")
    # rootname results in: directory 1
    # image_name results in /subdirectory/image_name.jpg
    new_image = image_name.replace(image_name, "/subdirectory/"+rootname+image_name[4:])
    target.write(rootname+new_image)
I thought (haven't really tested it) about using zipfile.ZipFile with something like the code above, but to be honest I don't really have an idea how to solve this.
Any ideas or examples?
Here's some pseudocode representing how you could implement this:
unzip myzip.zip
for directory in unzipped:
    for subdirectory in directory:
        i = 1
        for file in subdirectory:
            file.rename(f"{directory.name}-{i}.jpg")
            i += 1
zip unzipped
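The same idea can also be done without extracting anything to disk, by reading each entry from the old archive and writing it under its new name into the new one. This is a minimal sketch, assuming the images sit exactly at top/sub/*.jpg and that numbering follows the order of entries in the archive; rename_images_in_zip is a hypothetical name.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

def rename_images_in_zip(src_file, dst_file):
    """Copy src_file to dst_file, renaming top/sub/*.jpg to top/sub/<top>-<n>.jpg."""
    counters = defaultdict(int)  # one running number per subdirectory
    with zipfile.ZipFile(src_file) as src, \
         zipfile.ZipFile(dst_file, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            path = PurePosixPath(info.filename)  # zip entry names always use "/"
            is_image = (not info.is_dir() and path.suffix.lower() == ".jpg"
                        and len(path.parts) >= 3)
            if not is_image:
                dst.writestr(info, src.read(info))  # copy everything else unchanged
                continue
            counters[path.parent] += 1
            top = path.parts[0]  # e.g. "directory 1"
            new_name = path.parent / f"{top}-{counters[path.parent]}.jpg"
            dst.writestr(str(new_name), src.read(info))
```

Usage: `rename_images_in_zip("myzip.zip", "myzip_renamed.zip")`. zipfile.ZipFile also accepts file-like objects, so both arguments may be io.BytesIO buffers.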
Based on #bobtho'-''s pseudocode, I've created the following program:
import os
import zipfile
import sys
import shutil
root = sys.path[0]  # directory of the running script
unzipped = os.path.join(root, "unzipped")  # a folder to extract/unzip your content to
if not os.path.exists(unzipped):
    print("Create new unzipped directory")
    os.makedirs(unzipped)
elif len(os.listdir(unzipped)) != 0:
    shutil.rmtree(unzipped)  # remove the old folder and its contents
    os.makedirs(unzipped)
filename = "myfile.zip"
source_path = os.path.join(root, filename)
with zipfile.ZipFile(source_path) as source:
    source.extractall(unzipped)
# write a new archive instead of overwriting the original one
target_path = os.path.join(root, os.path.splitext(filename)[0] + "_renamed.zip")
with zipfile.ZipFile(target_path, 'w', zipfile.ZIP_DEFLATED) as target:
    for dir_name in os.listdir(unzipped):
        directory = os.path.join(unzipped, dir_name)
        for sub_dir in os.listdir(directory):
            sub_dir_path = os.path.join(directory, sub_dir)
            i = 1  # numbering starts at 1, e.g. "directory 1-1.jpg"
            for file in sorted(os.listdir(sub_dir_path)):
                old_file_path = os.path.join(sub_dir_path, file)
                renamed_file = "{directory}-{i}.jpg".format(directory=dir_name, i=i)
                new_file_path = os.path.join(sub_dir_path, renamed_file)
                os.rename(old_file_path, new_file_path)
                # the archive name keeps the directory/subdirectory structure
                target.write(new_file_path, os.path.join(dir_name, sub_dir, renamed_file))
                i += 1
shutil.rmtree(unzipped)
I don't think it's the best (or fastest) solution, but it fits my goals. Hopefully it helps someone.
I'd like to browse through the current folder and all its subfolders and get all the files with .htm or .html extensions. I have found out that it is possible to find out whether an object is a directory or a file like this:
import os
dirList = os.listdir("./") # current directory
for dir in dirList:
    if os.path.isdir(dir) == True:
        # I don't know how to get into this dir and do the same thing here
        pass
    else:
        # I got a file and I can regexp if it is .htm|html
        pass
In the end, I would like to have all the files and their paths in an array. Is something like that possible?
You can use os.walk() to recursively iterate through a directory and all its subdirectories:
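import os
path = "."  # directory to search
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith((".html", ".htm")):
            print(os.path.join(root, name))  # or whatever you need to do with the file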
To build a list of these names, you can use a list comprehension:
htmlfiles = [os.path.join(root, name)
             for root, dirs, files in os.walk(path)
             for name in files
             if name.endswith((".html", ".htm"))]
I had a similar thing to work on, and this is how I did it.
import os
rootdir = os.getcwd()
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        #print os.path.join(subdir, file)
        filepath = subdir + os.sep + file
        if filepath.endswith(".html"):
            print(filepath)
Hope this helps.
In Python 3 you can use os.scandir():
import os
def dir_scan(path):
    for i in os.scandir(path):
        if i.is_file():
            print('File: ' + i.path)
        elif i.is_dir():
            print('Folder: ' + i.path)
            dir_scan(i.path)
Use newDirName = os.path.join(parent_dir, dir), where parent_dir is the directory you just listed, to build the full path of the subdirectory (os.path.abspath(dir) only gives the right result for entries of the current working directory), and then list its contents as you have done with the parent (i.e. newDirList = os.listdir(newDirName)).
You can put your code snippet in a separate function and call it recursively on each subdirectory. Its first parameter is the directory path name, which changes for each subdirectory.
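The recursive approach described above can be sketched as follows. It keeps the question's os.listdir/os.path.isdir style and returns the list the question asks for; list_html_files is a hypothetical name, and the case-insensitive extension check is an assumption.

```python
import os

def list_html_files(dir_name):
    """Recursively collect paths of .htm/.html files below dir_name."""
    result = []
    for entry in os.listdir(dir_name):
        full_path = os.path.join(dir_name, entry)  # full path, valid at any depth
        if os.path.isdir(full_path):
            result.extend(list_html_files(full_path))  # same work on the subdirectory
        elif entry.lower().endswith((".htm", ".html")):
            result.append(full_path)
    return result
```

Usage: `htmlfiles = list_html_files(".")` starts from the current directory.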
This answer is based on the 3.1.1 version documentation of the Python Library. There is a good model example of this in action on page 228 of the Python 3.1.1 Library Reference (Chapter 10 - File and Directory Access).
Good Luck!
A slightly altered version of Sven Marnach's solution:
import os
def create_file_list(path):
    return_list = []
    for root, dirs, files in os.walk(path):
        for file_name in files:
            if file_name.endswith(".txt"):
                return_list.append(os.path.join(root, file_name))
    return return_list
folder_location = r'C:\SomeFolderName'  # raw string, so backslashes are kept as-is
file_list = create_file_list(folder_location)
There are two ways that work for me.
1. Use the `os` module and `__file__` to get the directory the script is located in, and build the path from there:
import os
script_dir = os.path.dirname(__file__)
path = 'subdirectory/test.txt'
file = os.path.join(script_dir, path)
fileread = open(file,'r')
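2. On Windows, use '\\' as the separator to read or write the file in the subfolder (this path is relative to the current working directory, not to the script):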
fileread = open('subdirectory\\test.txt','r')
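A cross-platform variant of the first approach, sketched with pathlib (a swapped-in module, not used in the answer above); resolve_from is a hypothetical helper name. Building the path from separate parts avoids hard-coding '\\' or '/' separators.

```python
from pathlib import Path

def resolve_from(base_file, *parts):
    """Build a path relative to the directory that contains base_file (e.g. __file__)."""
    return Path(base_file).resolve().parent.joinpath(*parts)

# usage inside a script:
# with open(resolve_from(__file__, "subdirectory", "test.txt")) as fileread:
#     data = fileread.read()
```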
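You can also let the user pick the folder with a Tkinter dialog and list its contents:
from tkinter import Tk, filedialog  # filedialog is a submodule; "import *" does not provide it
import os
root = Tk()
file = filedialog.askdirectory()
changed_dir = os.listdir(file)
print(changed_dir)
root.mainloop()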
I'm trying to write my first script.
I have been reading about Python, but I am stuck.
I'm trying to write a script that will rename all the files in a specific folder.
This is what I have so far:
import os
files = os.listdir('files_to_Change')
print (files)
This gets all the file names from the folder. Next, I remove the Mac invisible file:
for i in files:
    if i == ".DS_Store":
        p = files.index(".DS_Store")
        del files[p]
This deletes the Mac invisible file from the list if it exists (maybe there is a mistake here). Then I rename the files:
for i in files:
    oldName = i
    fileName, fileExtension = os.path.splitext(i)
    print (oldName)
    print (fileName)
    os.rename(oldName,fileName)
This is where I'm stuck; I get this error:
Output:
FileNotFoundError: [Errno 2] No such file or directory: 'File.1'
In the part above I'm just removing the file extension, but that is only the beginning.
I'm also trying to replace every dot with a space and capitalize the first letter of every word.
Can anyone point me in the right direction?
Thanks so much
In your example, when you get a list of files in a files_to_Change directory, you get file names without the directory name:
>>> files = os.listdir('test_folder')
>>> print(files[0])
.com.apple.timemachine.supported
So in order to get the full path to that file, from wherever you are in your directory tree, you should join the directory name (files_to_Change) with the file name:
import os
join = os.path.join
src = 'files_to_Change'
files = os.listdir(src)
for i in files:
    old = i
    new, ext = os.path.splitext(old)
    os.rename(join(src, old), join(src, new))  # full paths on both sides
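For the remaining goals (replacing dots with spaces and capitalizing each word), here is a minimal sketch. It assumes the extension should be kept and that hidden files such as .DS_Store should be skipped; tidy_name and rename_all are hypothetical helper names.

```python
import os

def tidy_name(filename):
    """'my.holiday.photo.jpg' -> 'My Holiday Photo.jpg'"""
    stem, ext = os.path.splitext(filename)
    words = stem.replace(".", " ").split()
    # capitalize only the first letter of each word and leave the rest untouched
    return " ".join(w[:1].upper() + w[1:] for w in words) + ext

def rename_all(folder):
    for old in os.listdir(folder):
        if old.startswith("."):  # skip hidden files such as .DS_Store
            continue
        new = tidy_name(old)
        if new != old:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
```

str.title() was avoided on purpose because it lowercases every letter after the first one in each word. Note that os.rename may overwrite (POSIX) or fail (Windows) if a file with the new name already exists.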