Glob pattern: Exclude file only from top directory - python

Take the example below as my folder structure:
|---GCD.txt
|---Azure.png
|---AWS.txt
|---foo/
| |--- app.txt
| |--- bar.txt
GCD.txt, Azure.png and AWS.txt are in the root folder. I don't know the root folder's name (it is a new GUID every time).
Now I want to write a glob pattern that skips text files (*.txt) in the root folder only, not in sub-folders. The expected result is:
|---Azure.png
|---foo/
| |--- app.txt
| |--- bar.txt
GCD.txt and AWS.txt should be skipped.
My attempts:
.*.txt
./*.txt
*.txt
None of the above patterns helped. Am I missing something?

I don't think there's an expression that can be passed to glob that will do this. Here's how it could be done though:
from os import walk
from os.path import join
from pathlib import Path
files_of_interest = []
base_directory = '.'
def istxt(filename):
    return Path(filename).suffix.lower() in {'.txt'}
for root, _, files in walk(base_directory):
    if root == base_directory:
        for file in files:
            if not istxt(file):
                files_of_interest.append(join(root, file))
    else:
        for file in files:
            if istxt(file):
                files_of_interest.append(join(root, file))
print(files_of_interest)
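The same idea can be sketched with pathlib. This is a minimal sketch, not the answer's exact behaviour: the function name select_files and its skipped_suffix parameter are made up for illustration, and unlike the loop above it keeps every file in sub-folders (not only .txt files), which is what the expected listing in the question shows.

```python
from pathlib import Path


def select_files(base_directory, skipped_suffix=".txt"):
    """Return all files under base_directory, except files that sit directly
    in base_directory and have skipped_suffix (compared case-insensitively)."""
    base = Path(base_directory)
    selected = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue  # ignore directories
        at_top_level = path.parent == base
        if at_top_level and path.suffix.lower() == skipped_suffix:
            continue  # skip e.g. GCD.txt and AWS.txt in the root folder
        selected.append(path)
    return selected
```

Calling select_files('.') from inside the GUID-named folder avoids having to know the folder's name.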

Related

calling a python module that reads a file

My program imports a utils module that reads a file located in the same directory as that module. However, this utils function can be called from files in different directories.
Project
|
|-module_1:
|   __init__.py
|   file.py <--- calls utils.load_file()
|-module_2:
|   __init__.py
|   utils.py <---- load_file() uses the path 'file.txt'
|   file.txt
What is this called? I couldn't even search for it. I tried "package management", "expanding path", etc.
__file__ contains the path to the current file. Check it with print(__file__).
pathlib from Python's standard library can be used to construct an absolute path to the data file.
import pathlib
print(pathlib.Path(__file__))
print(pathlib.Path(__file__).parent)
print(pathlib.Path(__file__).parent / 'file.txt')
You can now open your file like this:
filepath = pathlib.Path(__file__).parent / 'file.txt'
with open(filepath) as f:
    for line in f:
        print(line)
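Put together, utils.py could look roughly like the sketch below. The helper name path_beside and the module_file parameter are assumptions made for illustration (the question only names load_file and file.txt); passing the module path in explicitly keeps the function easy to try out.

```python
from pathlib import Path


def path_beside(module_file, filename):
    """Build the path of `filename` in the directory that holds `module_file`."""
    return Path(module_file).resolve().parent / filename


def load_file(module_file, filename="file.txt"):
    """Read a data file stored next to the given module and return its text."""
    return path_beside(module_file, filename).read_text(encoding="utf-8")
```

Inside utils.py the call would be load_file(__file__), which resolves file.txt next to utils.py no matter which directory the calling script runs from.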

Search in directory and subdirectories for missing files

I am trying to search through a directory and associated subdirectories to see if these listed jpg files are missing. I have got it looping through one directory but cannot extend the search into any subdirectories.
I have tried using os.walk but it just loops through all files and repeats that all files are missing even if they are not. So I am not sure how to proceed.
This is the code that I have so far.
from os.path import isfile
source = 'path_to_file'
paths = ['Hello', 'Hi', 'Howdy']
for index, item in enumerate(paths):
    paths[index] = (source + '\\' + paths[index]+'.jpg')
mp = [path for path in paths if not isfile(path)]
for nl in mp:
    print(f'{nl}... is missing')
Since you said that os.walk did not give you the desired output, here is a solution.
I use os.walk to search the whole directory tree and append every file name to a list called emty_list. Then I check whether each item in the list file_name is in emty_list.
import os
source = r'path'
emty_list=[]
file_name= ['hello.jpg', 'Hi.jpg', 'Howdy.jpg']
for root, dirs, files in os.walk(source, topdown=False):  # walk the directory tree
    for name in files:
        emty_list.append(name)
for check in file_name:
    if check not in emty_list:
        print(f"File Not Found Error : File Name: {check}")
Note: The comparison is case-sensitive, so check whether the file on your system is named, for example, Hello.jpg rather than hello.jpg.
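If the case of the file names should not matter, the names can be lower-cased before comparing. The following is a minimal sketch under that assumption; the function name find_missing is hypothetical, and a set is used instead of a list so each lookup stays cheap.

```python
import os


def find_missing(source, wanted_names):
    """Return the names from wanted_names that appear nowhere under source.

    Names are compared case-insensitively.
    """
    found = set()
    for _root, _dirs, files in os.walk(source):
        found.update(name.lower() for name in files)
    return [name for name in wanted_names if name.lower() not in found]
```

For example, find_missing(source, ['hello.jpg', 'Hi.jpg', 'Howdy.jpg']) returns only the names that exist in neither the directory nor any of its subdirectories.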
You can use glob with its recursive parameter to do that in Python:
import glob
source='./'
paths=['file1','file2','file3']
for path in paths:
    print(f"looking for {path} with {source+'**/'+path+'.jpg'}")
    print(glob.glob(source+"**/"+path+".jpg",recursive=True))
mp=[path for path in paths if not glob.glob(source+"**/"+path+".jpg",recursive=True)]
for nl in mp:
    print(f'{nl}... is missing')
(You can remove the first for loop with the two print calls; it is only there to show how glob works. The list comprehension alone is enough.)
With the following folder:
.
├── file1.jpg
├── search.py
└── subfolder
    └── file3.jpg
It returns:
looking for file1 with ./**/file1.jpg
['./file1.jpg']
looking for file2 with ./**/file2.jpg
[]
looking for file3 with ./**/file3.jpg
['./subfolder/file3.jpg']
file2... is missing

Zipping files to the same folder level

This thread advises using shutil to zip files:
import shutil
shutil.make_archive(output_filename, 'zip', dir_name)
This zips everything in dir_name and maintains the folder structure in it. Is it possible to use this same library to remove all sub-folders and just zip all files in dir_name into the same level? Or must I introduce a separate code chunk to first consolidate the files? For example, this is a hypothetical folder structure:
\dir_name
    \dir1
        \cat1
            file1.txt
            file2.txt
        \cat2
            file3.txt
    \dir2
        \cat3
            file4.txt
Output zip should just contain:
file1.txt
file2.txt
file3.txt
file4.txt
shutil.make_archive does not have a way to do what you want without copying files to another directory, which is inefficient. Instead you can use a compression library directly similar to the linked answer you provided. Note this doesn't handle name collisions!
import zipfile
import os
with zipfile.ZipFile('output.zip','w',zipfile.ZIP_DEFLATED,compresslevel=9) as z:
    for path,dirs,files in os.walk('dir_name'):
        for file in files:
            full = os.path.join(path,file)
            z.write(full,file) # write the file, but with just the file's name not full path
# print the files in the zipfile
with zipfile.ZipFile('output.zip') as z:
    for name in z.namelist():
        print(name)
Given:
dir_name
├───dir1
│   ├───cat1
│   │       file1.txt
│   │       file2.txt
│   │
│   └───cat2
│           file3.txt
│
└───dir2
    └───cat3
            file4.txt
Output:
file1.txt
file2.txt
file3.txt
file4.txt
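The name-collision caveat can be handled in several ways. One possibility, sketched below, is to give later duplicates a numeric suffix. The function name zip_flat and the file_1.txt naming scheme are assumptions made for this sketch, and it also assumes the archive is written outside dir_name so that it does not try to add itself.

```python
import os
import zipfile


def zip_flat(dir_name, output_path):
    """Zip every file under dir_name at the top level of the archive.

    If two files share a name, later ones get a numeric suffix
    (file.txt, file_1.txt, file_2.txt, ...).
    Returns the sorted list of names stored in the archive.
    """
    used = set()
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, dirs, files in os.walk(dir_name):
            dirs.sort()  # visit sub-folders in a predictable order
            for file in sorted(files):
                stem, ext = os.path.splitext(file)
                arcname, counter = file, 0
                while arcname in used:  # pick the first free name
                    counter += 1
                    arcname = f"{stem}_{counter}{ext}"
                used.add(arcname)
                archive.write(os.path.join(path, file), arcname)
    return sorted(used)
```

Which of two colliding files keeps the plain name depends on the traversal order, so the suffixes say nothing about where a file came from.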
Alternatively, copy the files into a temporary directory first and zip that directory:
import glob
import os
import shutil
import tempfile
# The root directory to search
path = r'dir_name/'
# List all *.txt files in the root directory and its sub-directories
file_paths = [file_path
              for root_path, _, _ in os.walk(path)
              for file_path in glob.glob(os.path.join(root_path, '*.txt'))]
# Create a temporary directory to copy your files into
with tempfile.TemporaryDirectory() as tmp:
    for file_path in file_paths:
        # Get the basename of the file
        basename = os.path.basename(file_path)
        # Copy the file to the temporary directory
        shutil.copyfile(file_path, os.path.join(tmp, basename))
    # Zip the temporary directory to the working directory
    shutil.make_archive('output', 'zip', tmp)
This will create an output.zip file in the current working directory. The temporary directory is deleted when the context manager exits.

Get files from specific folders in python

I have the following directory structure with the following files:
Folder_One
├─file1.txt
├─file1.doc
└─file2.txt
Folder_Two
├─file2.txt
├─file2.doc
└─file3.txt
I would like to get only the .txt files from each folder listed. Example:
Folder_One-> file1.txt and file2.txt
Folder_Two-> file2.txt and file3.txt
Note: This entire directory is inside a folder called dataset. My code looks like this, but I believe something is missing. Can someone help me?
import glob
import os
path_dataset = "./dataset/"
filedataset = os.listdir(path_dataset)
for i in filedataset:
    pasta = ''
    pasta = pasta.join(i)
    for file in glob.glob(path_dataset+"*.txt"):
        print(file)
Using pathlib
from pathlib import Path
for path in Path('dataset').rglob('*.txt'):
    print(path.name)
Using glob
import glob
for x in glob.glob('dataset/**/*.txt', recursive=True):
    print(x)
You can use the re module to check that a filename ends with .txt.
import re
import os
path_dataset = "./dataset/"
l = os.listdir(path_dataset)
for e in l:
    if os.path.isdir("./dataset/" + e):
        ll = os.listdir(path_dataset + e)
        for file in ll:
            if re.match(r".*\.txt$", file):
                print(e + '->' + file)
Another option is to find the files with os.walk from the os module (an advantage if you already use this module):
import os
# get the current directory; you may also provide an absolute path
path = os.getcwd()
# walk recursively through all folders and gather information
for root, dirs, files in os.walk(path):
    # keep only the files whose name ends with .txt
    check = [f for f in files if f.endswith(".txt")]
    if check:
        print(root, check)
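To get the result in the "Folder_One-> file1.txt and file2.txt" shape the question asks for, the files can be grouped per folder. A minimal sketch with pathlib follows; the function name txt_files_by_folder is made up for illustration, and it only looks one level below dataset, as in the question's layout.

```python
from pathlib import Path


def txt_files_by_folder(dataset_dir):
    """Map each immediate subfolder of dataset_dir to its sorted .txt file names."""
    result = {}
    for folder in sorted(Path(dataset_dir).iterdir()):
        if folder.is_dir():
            result[folder.name] = sorted(
                p.name for p in folder.glob("*.txt") if p.is_file()
            )
    return result
```

Printing the mapping in the question's format is then a matter of looping over txt_files_by_folder('dataset').items() and joining each list with ' and '.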

Zip directories from listed ones

I am relatively new to Python. I am trying to write a script that zips only the subfolders I define in a list, together with their content. I tried modifying various code from Stack Overflow and whatever I found on the internet, but either I zip all subfolders, or I zip the subfolders that I want but without their content.
Should the list hold full paths to the subfolders, or can I define the path to the root folder and then specify the subfolders?
This is the idea: there are 3 subfolders 1, 2, 3 and I want to zip only subfolders 1 and 3. I added the last code that I was modifying, but I just can't get the list into the function.
Folder
|- SubFolder1
| |- file1.txt
| |- file2.txt
|- SubFolder2
| |- file1.txt
| |- file2.txt
|- SubFolder3
| |- file1.txt
| |- file2.txt
The code:
import os
import zipfile
list=["SubFolder1", "SubFolder3"]
def zipdir(path, ziph):
    # ziph is zipfile handle
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file))
if __name__ == '__main__':
    zipf = zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED)
    zipdir(list, zipf)
    zipf.close()
I modified your code and think this way it should work as expected if I understand your request correctly. Please check out this thread too, I think it's pretty much what you are looking for ;-)
1) os.walk goes through each of your subfolders, too. So just wait until each subfolder becomes the "root" (here is an example).
2) Check whether the folder is contained in the list. If it is, zip all files within this folder
3) Recreate the subfolder structure relative to your original root-folder
import os
import zipfile
rootpath = r'C:\path\to\your\rootfolder'
list=["SubFolder1", "SubFolder3"]
def zipdir(path, ziph, dirlist):
    # ziph is zipfile handle
    for root, dirs, files in os.walk(path):
        if os.path.basename(root) in dirlist:
            for file in files:
                file_path = os.path.join(root, file)
                relative_path = os.path.relpath(file_path, path)
                ziph.write(file_path, relative_path)
if __name__ == '__main__':
    zipf = zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED)
    zipdir(rootpath, zipf, list)
    zipf.close()
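Note that the basename check above matches a listed name at any depth and adds only the files directly inside a matching folder. If the listed names are meant to be direct children of the root folder and nested content should be included too, a variant could look like the sketch below. The function name zip_subfolders is hypothetical, names that are not folders are silently skipped (a choice made for this sketch), and the archive is assumed to be written outside the selected folders.

```python
import zipfile
from pathlib import Path


def zip_subfolders(root_path, subfolders, output_path):
    """Zip only the named direct subfolders of root_path, with all their contents."""
    root = Path(root_path)
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in subfolders:
            folder = root / name
            if not folder.is_dir():
                continue  # skip names that do not exist as folders under root
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    # Store paths relative to root, e.g. SubFolder1/file1.txt
                    archive.write(path, path.relative_to(root).as_posix())
```

With the layout from the question, zip_subfolders(rootpath, ["SubFolder1", "SubFolder3"], 'Python.zip') would leave SubFolder2 out of the archive.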
