Delete file by non standard extension - python

I know how to delete files by extension, but what if my files look like this:
update_24-08-2022_14-54.zip.001
where the last three digits can be anything from 001 to 029?
Here is the code that I'm using for standard zip files:
import os
files_in_directory = os.listdir(directory)
filtered_files = [file for file in files_in_directory if file.endswith(".zip")]
for file in filtered_files:
    path_to_file = os.path.join(directory, file)
    os.remove(path_to_file)

Assuming the double extensions are of the form .zip.xyz, with xyz being triple digits, you can use globbing:
import glob
import os
for path in glob.glob('*.zip.[0-9][0-9][0-9]'):
    os.remove(path)
(As a usual precaution, check first, by replacing os.remove with print).
If you have a specific directory, its name stored in directory, you can use:
import glob
import os
for path in glob.glob(os.path.join(directory, '*.zip.[0-9][0-9][0-9]')):
    os.remove(path)
There is no need to join the directory and path inside the for loop (as is the case in the question): path itself will already contain the directory name.
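The glob pattern above accepts any three digits. If only parts 001 to 029 should be removed, as stated in the question, one option is to check the number explicitly. The following is a minimal sketch, not a definitive implementation: the helper names find_zip_parts and delete_zip_parts, and the dry_run flag, are made up for this illustration.

```python
import re
from pathlib import Path

# Matches names ending in ".zip.NNN", where NNN is exactly three ASCII digits.
PART_PATTERN = re.compile(r"\.zip\.([0-9]{3})$")


def find_zip_parts(directory, low=1, high=29):
    """Return the files in directory named *.zip.NNN with low <= NNN <= high."""
    matches = []
    for path in Path(directory).iterdir():
        m = PART_PATTERN.search(path.name)
        if m and path.is_file() and low <= int(m.group(1)) <= high:
            matches.append(path)
    return sorted(matches)


def delete_zip_parts(directory, dry_run=True):
    """Delete the matching parts; with dry_run=True only print them."""
    parts = find_zip_parts(directory)
    for path in parts:
        if dry_run:
            print("would delete", path)
        else:
            path.unlink()
    return parts
```

Calling delete_zip_parts(directory) first with the default dry_run=True follows the same precaution as replacing os.remove with print.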

Related

Recursively find and copy files from many folders

I have some files in an array that I want to recursively search from many folders
An example of the filename array is ['A_010720_X.txt','B_120720_Y.txt']
Example of folder structure is as below which I can also provide as an array e.g ['A','B'] and ['2020-07-01','2020-07-12']. The "DL" remains the same for all.
C:\A\2020-07-01\DL
C:\B\2020-07-12\DL
etc
I have tried shutil, but it does not fit my requirement well, because I can only pass in a full file name and not a wildcard. The code below works, but only with an absolute path and exact file names, so it only gives me A_010720_X.txt.
I believe the way to go is glob or pathlib, which I have not used before, and I cannot find good examples similar to my use case.
import os
import shutil
filenames_i_want = ['A_010720_X.txt','B_120720_Y.txt']
RootDir1 = r'C:\A\2020-07-01\DL'
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'
for root, dirs, files in os.walk(os.path.normpath(RootDir1), topdown=False):
    for name in files:
        if name in filenames_i_want:
            print("Found")
            SourceFolder = os.path.join(root, name)
            shutil.copy2(SourceFolder, TargetFolder)
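I think this should do what you need, assuming they are all .txt files.
import glob
import os
import shutil
filenames_i_want = ['A_010720_X.txt','B_120720_Y.txt']
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'
all_files = []
for directory in ['A', 'B']:
    # Raw string, so the backslashes are not treated as escape sequences.
    files = glob.glob(r'C:\{}\*\DL\*.txt'.format(directory))
    # extend, not append: glob.glob returns a list of paths.
    all_files.extend(files)
for file in all_files:
    # glob returns full paths, so compare only the file name.
    if os.path.basename(file) in filenames_i_want:
        shutil.copy2(file, TargetFolder)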
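Since the question also asks about pathlib and wildcards, here is a hedged sketch of that route. The function name copy_matching_files is hypothetical, and the patterns are matched against file names with fnmatch, so both exact names and wildcards such as 'A_*_X.txt' are accepted.

```python
import fnmatch
import shutil
from pathlib import Path


def copy_matching_files(root_dirs, patterns, target_dir):
    """Copy every file below root_dirs whose name matches one of patterns."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for root in root_dirs:
        # rglob("*") walks the whole tree below root.
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            if any(fnmatch.fnmatch(path.name, pattern) for pattern in patterns):
                shutil.copy2(path, target / path.name)
                copied.append(path.name)
    return sorted(copied)
```

With the layout from the question, a call could look like copy_matching_files([r'C:\A', r'C:\B'], ['A_010720_X.txt', 'B_*_Y.txt'], r'C:\ELK\LOGS\ATH\DEST'). Files with the same name in different folders would overwrite each other in the target; this sketch does not handle that case.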

Python3 create list of image in a folder

I have an array of images in Python3 like this...
images = [
"/var/www/html/myfolder/images/1.jpg",
"/var/www/html/myfolder/images/441.jpg",
"/var/www/html/myfolder/images/15.jpg",
"/var/www/html/myfolder/images/78.jpg",
]
Instead of listing each image like this, I would like to pass an absolute path and have Python build the images list from the .jpg files in that path.
What is my best approach?
You can make use of glob.
glob.glob(pathname, *, recursive=False)
Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
If recursive is true, the pattern "**" will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep, only directories and subdirectories match.
Let's say your current directory is myfolder, so the pattern is relative to it:
import glob
images = glob.glob('images/*.jpg')
https://docs.python.org/3/library/glob.html#glob.glob
The pathlib module in Python 3 makes this easy:
from pathlib import Path
images = Path("/var/www/html/myfolder/images").glob("*.jpg")
Want all jpg images recursively under that directory instead? Use .glob("**/*.jpg").
Note that this produces Path objects (glob returns a generator, not a list). If you want strings, just convert them:
image_strings = [str(p) for p in images]
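Putting the pathlib pieces together, a small helper could look like the sketch below. The name list_jpgs and the recursive flag are assumptions for this example; rglob("*.jpg") is used as the recursive counterpart of glob("*.jpg").

```python
from pathlib import Path


def list_jpgs(folder, recursive=False):
    """Return the .jpg files in folder as a sorted list of path strings."""
    folder = Path(folder)
    # rglob searches all subdirectories; glob only looks at the top level.
    paths = folder.rglob("*.jpg") if recursive else folder.glob("*.jpg")
    return sorted(str(p) for p in paths if p.is_file())
```

For the folder in the question, images = list_jpgs("/var/www/html/myfolder/images") would give a plain list of strings. The match is case-sensitive on most systems, so files ending in .JPG are not picked up by this pattern.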
If you specify the path there are a number of ways to find all the files in that directory. Once you have that list you can simply iterate through it and create the images.
See: How do I list all files of a directory?
A good way to do it is using os.listdir:
import os
# specify the img directory path
path = "path/to/img/folder/"
# list files in img directory
files = os.listdir(path)
for file in files:
    # make sure file is an image
    if file.endswith(('.jpg', '.png', '.jpeg')):
        img_path = os.path.join(path, file)
        # load file as image...
To scan just the top level:
import os
path = "path/to/img/folder/"
jpgs = [os.path.join(path, file)
        for file in os.listdir(path)
        if file.endswith('.jpg')]
To scan recursively, replace the list comprehension with:
jpgs = [os.path.join(root, file)
        for root, dirs, files in os.walk(path)
        for file in files
        if file.endswith('.jpg')]
import os
images = []
def getFiles(path):
    for file in os.listdir(path):
        if file.endswith(".jpg"):
            images.append(os.path.join(path, file))
Build the images list:
filesPath = "/var/www/html/myfolder/images"
getFiles(filesPath)
print(images)
The modern method is to use pathlib, which treats paths as objects, not strings. As objects, all paths then have attributes and methods to access the various components of the path (e.g. .suffix, .stem).
pathlib also has:
a built-in .glob method
an .open method (e.g. Path.open(mode='r'))
See: Python 3's pathlib Module: Taming the File System
Code:
from pathlib import Path
jpg_files = Path('/some_path').glob('*.jpg')
for file in jpg_files:
    # jpg files are binary, so open them in 'rb' mode
    with file.open(mode='rb') as f:
        ...  # do some stuff

Loop through binary files that has no extension

I was looking for ways to loop over files in directory with python, and I found this question:
Loop through all CSV files in a folder
The point is that the files I have are binary files, with no file extension at the end.
What I want my program to do is to iterate through all the files that have no extension.
Is there any way to do this using wildcards (or any other way)?
You can use os.path.splitext to check whether a file has an extension.
See these examples:
import os
os.path.splitext("foo.ext")
=> ('foo', '.ext')
os.path.splitext("foo")
=> ('foo', '')
So you can do this:
import os
path = "path/to/files"
names = os.listdir(path)
for name in names:
    if not os.path.splitext(name)[1]:
        print(name)
But beware of "hidden" files whose names start with a dot, e.g. ".bashrc": splitext treats the leading dot as part of the name, so they count as having no extension.
You can also check for the existence of a dot in the filename:
for name in names:
    if "." not in name:
        print(name)
Sounds like what you are interested in is
[f for f in next(os.walk(folder))[2] if '.' not in f]
I would suggest using os.listdir() and then checking whether each filename has an extension (that is, whether there is a dot in the filename). Once you have all the filenames without dots, just check that each one is not actually a directory name, and that's it.
You could use the glob module and filter out any files with extensions:
import glob
for filename in (filename for filename in glob.iglob('*') if '.' not in filename):
    print(filename)
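The suggestions above can be combined into one helper that also skips directories and, optionally, hidden files. This is only a sketch: the function name files_without_extension and the include_hidden flag are invented for the example, and it relies on pathlib's suffix attribute, which is empty for names such as "foo" and ".bashrc".

```python
from pathlib import Path


def files_without_extension(folder, include_hidden=False):
    """Return the names of regular files in folder that have no extension."""
    result = []
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue  # skip directories and other non-files
        if path.name.startswith(".") and not include_hidden:
            continue  # skip hidden files such as ".bashrc"
        if path.suffix == "":
            result.append(path.name)
    return sorted(result)
```

Each returned name can then be opened in binary mode, for example with open(os.path.join(folder, name), 'rb').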

Creating subdirectories and sorting files based on filename PYTHON

I have a large directory with many part files and their revisions. I want to recursively create a new folder for each part and then move all of the related files into that folder. I am trying to do this by isolating a 7-digit number that identifies the part; all the related filenames also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
    for fpath in files:
        print(fpath)
        if fpath[0:6].isdigit():
            matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
            partnum = str(fpath[0:6])
            pattern = str(partnum)
            filematch = fnmatch(files, pattern)
            print(filematch)
            shutil.move(filematch, matchdir)
This is what I have so far. Basically, I'm not sure how to get the original filename and use it as the matching pattern for the rest of the files. The original filename I want to use for this matching pattern is just a 7-digit number, and all of the related files may have other characters in their names ("REV-2", for example).
Don't overthink it
I think you're getting confused about what os.walk() gives you - recheck the docs. dirs and files are just a list of names of the directories / files, not the full paths.
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
    1234567abc.txt
directory2
    1234567abc.txt
    1234567bcd.txt
    2234567abc.txt
    not-interesting.txt
And want to end with something like:
directory1
    1234567
        abc.txt
directory2
    1234567
        abc.txt
        bcd.txt
    2234567
        abc.txt
    not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
# dirname is the top-level directory to process (as in the question).
for root, dirs, files in os.walk(dirname):
    for fname in files:
        # Match a string starting with 7 digits followed by everything else.
        # Capture each part in a group so we can access them later.
        match_object = re.match(r'([0-9]{7})(.*)$', fname)
        if match_object is None:
            # The regular expression did not match, ignore the file.
            continue
        # Form the new directory path using the number from the regular expression and the current root.
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        new_file_path = os.path.join(new_dir, match_object.group(2))
        # Or, if you don't want to change the filename, use this instead:
        # new_file_path = os.path.join(new_dir, fname)
        old_file_path = os.path.join(root, fname)
        shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition: we continue the loop immediately if the file is not interesting. This is a useful pattern for making sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.
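For comparison, the same idea can be written with pathlib for a single folder (no recursion), keeping the original file names. This is a sketch under assumptions rather than the answer's own code: the function name group_by_part_number is hypothetical, and a file whose whole name is just the 7-digit number would clash with the directory of the same name, which is not handled here.

```python
import re
import shutil
from pathlib import Path

# A part number is assumed to be 7 digits at the start of the file name.
PART_NUMBER = re.compile(r"([0-9]{7})")


def group_by_part_number(folder):
    """Move each file starting with a 7-digit number into folder/<number>/."""
    folder = Path(folder)
    moved = {}
    # Take a snapshot of the listing, because the loop creates directories.
    for path in list(folder.iterdir()):
        if not path.is_file():
            continue
        match = PART_NUMBER.match(path.name)
        if match is None:
            continue  # not a part file, leave it where it is
        part_dir = folder / match.group(1)
        part_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(part_dir / path.name))
        moved.setdefault(match.group(1), []).append(path.name)
    return moved
```

The returned dictionary maps each part number to the file names that were moved, which makes it easy to check the result before running it on the real directory.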

how to get name of a file in directory using python

There is an mkv file in a folder named "export". What I want to do is to make a python script which fetches the file name from that export folder.
Let's say the folder is at "C:\Users\UserName\Desktop\New_folder\export".
How do I fetch the name?
I tried using this os.path.basename and os.path.splitext .. well.. didn't work out like I expected.
os.path implements some useful functions on pathnames, but it doesn't have access to the contents of the path. For that purpose, you can use os.listdir.
The following command will give you a list of the contents of the given path (note the raw string, so that the backslashes are not treated as escape sequences):
os.listdir(r"C:\Users\UserName\Desktop\New_folder\export")
Now, if you just want .mkv files, you can use the fnmatch module (which provides support for Unix shell-style wildcards) to get your expected file names:
import fnmatch
import os
print([f for f in os.listdir(r"C:\Users\UserName\Desktop\New_folder\export") if fnmatch.fnmatch(f, '*.mkv')])
Also, as @Padraic Cunningham mentioned, a more Pythonic way of dealing with file names is the glob module (with pth being the folder path, including its trailing separator):
import glob
from os import path
map(path.basename, glob.iglob(pth + "*.mkv"))
You can use glob:
from glob import glob
pth = "C:/Users/UserName/Desktop/New_folder/export/"
print(glob(pth + "*.mkv"))
pth + "*.mkv" will match all the files ending with .mkv.
To get just the basenames, you can use map or a list comprehension with iglob:
from glob import iglob
from os import path
print(list(map(path.basename, iglob(pth + "*.mkv"))))
print([path.basename(f) for f in iglob(pth + "*.mkv")])
iglob returns an iterator, so you don't build a list for no reason.
I assume you're basically asking how to list files in a given directory. What you want is:
import os
print(os.listdir(r"C:\Users\UserName\Desktop\New_folder\export"))
If there are multiple files and you want the ones that end in .mkv, you could do:
import os
files = os.listdir(r"C:\Users\UserName\Desktop\New_folder\export")
mkv_files = [f for f in files if f.endswith(".mkv")]
print(mkv_files)
If you need a recursive search, you can get the file names using os.walk. The code below also gives you each file's directory (path), so you can build the full path if needed.
import os, fnmatch
for path, dirs, files in os.walk(os.path.abspath(r"C:/Users/UserName/Desktop/New_folder/export/")):
    for filename in fnmatch.filter(files, "*.mkv"):
        print(filename)
You can use glob:
import glob
import os
for file in glob.glob(r'C:\Users\UserName\Desktop\New_folder\export\*.mkv'):
    print(os.path.basename(file))
This will list all the files with the extension .mkv, e.g. file.mkv, file2.mkv and so on.
From os.walk you can read the file names as a list:
files = [file_names for _, _, file_names in os.walk(DIRECTORY_PATH)]
for file_name in files[0]:  # files is a list of lists; files[0] is the top-level directory
    print(file_name)
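A pathlib version of the same lookup is sketched below. The helper name mkv_names is an assumption for this example; it returns only the file names, not the full paths.

```python
from pathlib import Path


def mkv_names(folder):
    """Return the names of the .mkv files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.mkv") if p.is_file())
```

For the folder in the question, this could be called as mkv_names(r"C:\Users\UserName\Desktop\New_folder\export"). If exactly one mkv file is expected, the first element of the returned list is its name.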
