Python3 create list of images in a folder - python

I have an array of images in Python3 like this...
images = [
    "/var/www/html/myfolder/images/1.jpg",
    "/var/www/html/myfolder/images/441.jpg",
    "/var/www/html/myfolder/images/15.jpg",
    "/var/www/html/myfolder/images/78.jpg",
]
Instead of specifying the images like this, I would like to pass an absolute path and have Python build the images list from the .jpg images in that path.
What is my best approach?

You can make use of glob.
glob.glob(pathname, *, recursive=False)
Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
If recursive is true, the pattern “**” will match any files and zero or more directories and subdirectories. If the pattern is
followed by an os.sep, only directories and subdirectories match.
Let's say your absolute path is /var/www/html/myfolder/images:
import glob

images = glob.glob('/var/www/html/myfolder/images/*.jpg')
https://docs.python.org/3/library/glob.html#glob.glob
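If the images are also nested in subdirectories, a minimal sketch using the recursive form (the root directory is taken from the question):
import glob

# "**" matches the top level and any depth of subdirectories when recursive=True
images = glob.glob('/var/www/html/myfolder/images/**/*.jpg', recursive=True)
print(images)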

The pathlib module in Python 3 makes this easy:
from pathlib import Path
images = Path("/var/www/html/myfolder/images").glob("*.jpg")
Want all jpg images recursively under that directory instead? Use .glob("**/*.jpg") (or the shortcut .rglob("*.jpg")).
Note that Path.glob() returns a generator of Path objects, not a list. If you want a list of strings, just convert them:
image_strings = [str(p) for p in images]
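For example, a minimal sketch that collects a sorted list of string paths (the directory is taken from the question):
from pathlib import Path

image_dir = Path("/var/www/html/myfolder/images")
# sorted() consumes the generator and returns a plain list
images = sorted(str(p) for p in image_dir.glob("*.jpg"))
print(images)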

If you specify the path, there are a number of ways to find all the files in that directory. Once you have that list, you can simply iterate through it and build the image paths.
See: How do I list all files of a directory?
A good way to do it is using os.listdir:
import os

# specify the img directory path
path = "path/to/img/folder/"

# list files in img directory
files = os.listdir(path)

for file in files:
    # make sure file is an image
    if file.endswith(('.jpg', '.png', '.jpeg')):
        img_path = path + file
        # load file as image...
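For example, a minimal sketch of that last step, assuming the third-party Pillow package is installed (PIL is not part of the standard library):
import os
from PIL import Image  # assumes: pip install Pillow

path = "path/to/img/folder/"
images = []
for file in os.listdir(path):
    if file.endswith(('.jpg', '.png', '.jpeg')):
        img_path = os.path.join(path, file)
        images.append(Image.open(img_path))  # load the file as an image object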

To scan just the top level
import os
path = "path/to/img/folder/"
jpgs = [os.path.join(path, file)
        for file in os.listdir(path)
        if file.endswith('.jpg')]
To scan recursively, replace the last line with
jpgs = [os.path.join(root, file)
        for root, dirs, files in os.walk(path)
        for file in files
        if file.endswith('.jpg')]
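Note that str.endswith() is case-sensitive; a sketch of the recursive variant that also picks up upper-case .JPG names:
import os

path = "path/to/img/folder/"
jpgs = [os.path.join(root, file)
        for root, dirs, files in os.walk(path)
        for file in files
        if file.lower().endswith('.jpg')]  # lower-case the name so .JPG also matches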

import os
images = []
def getFiles(path):
    for file in os.listdir(path):
        if file.endswith(".jpg"):
            images.append(os.path.join(path, file))
Then build the images list:
filesPath = "/var/www/html/myfolder/images"
getFiles(filesPath)
print(images)
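A variant of the same idea that returns a new list instead of appending to a module-level one (just a sketch; the function name is illustrative):
import os

def get_jpg_files(path):
    # build and return the list locally instead of mutating a global
    return [os.path.join(path, file)
            for file in os.listdir(path)
            if file.endswith(".jpg")]

images = get_jpg_files("/var/www/html/myfolder/images")
print(images)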

The modern method is to use pathlib, which treats paths as objects, not strings. As an object, every path then has methods to access the various components of the path (e.g. .suffix, .stem).
pathlib also has:
a built-in .glob method
a .open method (e.g. Path.open(mode='r'))
Python 3's pathlib Module: Taming the File System
Code:
from pathlib import Path
jpg_files = Path('/some_path').glob('*.jpg')
for file in jpg_files:
    with file.open(mode='rb') as f:  # binary mode, since .jpg files are not text
        ...  # do some stuff
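Since .suffix and .stem are mentioned above, a quick sketch of what those accessors return (the path is illustrative):
from pathlib import Path

p = Path('/var/www/html/myfolder/images/441.jpg')
print(p.suffix)  # '.jpg'
print(p.stem)    # '441'
print(p.name)    # '441.jpg'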

Related

Delete file by non-standard extension

I know how to delete files by extension, but what if my files look like this:
update_24-08-2022_14-54.zip.001
where the last 3 digits can be between 001 and 029.
Here is the code I'm using for standard zip files:
files_in_directory = os.listdir(directory)
filtered_files = [file for file in files_in_directory if file.endswith(".zip")]
for file in filtered_files:
    path_to_file = os.path.join(directory, file)
    os.remove(path_to_file)
Assuming the double extensions are of the form .zip.xyz, with xyz being triple digits, you can use globbing:
import glob
import os
for path in glob.glob('*.zip.[0-9][0-9][0-9]'):
    os.remove(path)
(As a usual precaution, check first, by replacing os.remove with print).
If you have a specific directory, with its name stored in directory, you can use:
import glob
import os
for path in glob.glob(os.path.join(directory, '*.zip.[0-9][0-9][0-9]')):
    os.remove(path)
There is no need to join the directory and path inside the for loop (as is the case in the question): path itself will already contain the directory name.
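If you prefer pathlib, an equivalent sketch (same pattern, same precaution about printing before deleting):
from pathlib import Path

directory = Path('path/to/zips')  # illustrative location
for path in directory.glob('*.zip.[0-9][0-9][0-9]'):
    path.unlink()  # deletes the file; print(path) first to double-check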

Recursively find and copy files from many folders

I have an array of file names that I want to search for recursively in many folders.
An example of the filename array is ['A_010720_X.txt','B_120720_Y.txt']
An example of the folder structure is below; I can also provide its parts as arrays, e.g. ['A','B'] and ['2020-07-01','2020-07-12']. The "DL" part remains the same for all.
C:\A\2020-07-01\DL
C:\B\2020-07-12\DL
etc
I have tried to use shutil, but it doesn't work well for my requirement, as I can only pass in a full file name and not a wildcard. The code below works, but only with an absolute file name and path and no wildcards, e.g. it will only give me A_010720_X.txt.
I believe the way to go would be using glob or pathlib, which I have not used before and cannot find good examples for that are similar to my use case.
import os
import shutil

filenames_i_want = ['A_010720_X.txt', 'B_120720_Y.txt']
RootDir1 = r'C:\A\2020-07-01\DL'
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'
for root, dirs, files in os.walk(os.path.normpath(RootDir1), topdown=False):
    for name in files:
        if name in filenames_i_want:
            print("Found")
            SourceFolder = os.path.join(root, name)
            shutil.copy2(SourceFolder, TargetFolder)
I think this should do what you need assuming they are all .txt files.
import glob
import os
import shutil

filenames_i_want = ['A_010720_X.txt', 'B_120720_Y.txt']
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'

all_files = []
for directory in ['A', 'B']:
    files = glob.glob(r'C:\{}\*\DL\*.txt'.format(directory))
    all_files.extend(files)  # extend, so all_files is a flat list of paths

for file in all_files:
    # compare only the base name against the wanted file names
    if os.path.basename(file) in filenames_i_want:
        shutil.copy2(file, TargetFolder)
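A pathlib-based sketch of the same idea, using rglob so the date folders don't have to be listed explicitly (paths taken from the question):
import shutil
from pathlib import Path

filenames_i_want = {'A_010720_X.txt', 'B_120720_Y.txt'}
target_folder = Path(r'C:\ELK\LOGS\ATH\DEST')

for top in ['A', 'B']:
    # search every DL folder under C:\A and C:\B recursively
    for path in Path('C:\\' + top).rglob('DL/*.txt'):
        if path.name in filenames_i_want:
            shutil.copy2(str(path), str(target_folder))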

Ignoring folders when organizing files

I am fairly new to Python and am trying to write a program that organizes files based on their extensions.
import os
import shutil

newpath1 = r'C:\Users\User1\Documents\Downloads\Images'
if not os.path.exists(newpath1):  # check to see if they already exist
    os.makedirs(newpath1)
newpath2 = r'C:\Users\User1\Documents\Downloads\Documents'
if not os.path.exists(newpath2):
    os.makedirs(newpath2)
newpath3 = r'C:\Users\User1\Documents\Downloads\Else'
if not os.path.exists(newpath3):
    os.makedirs(newpath3)

source_folder = r"C:\Users\User1\Documents\Downloads"  # the location of the files we want to move
files = os.listdir(source_folder)

for file in files:
    if file.endswith(('.JPG', '.png', '.jpg')):
        shutil.move(os.path.join(source_folder, file), os.path.join(newpath1, file))
    elif file.endswith(('.pdf', '.pptx')):
        shutil.move(os.path.join(source_folder, file), os.path.join(newpath2, file))
    # elif file is a folder:
    #     do nothing
    else:
        shutil.move(os.path.join(source_folder, file), os.path.join(newpath3, file))
I want it to move files based on their extensions. However, I am trying to figure out how to stop the folders from moving. Any help would be greatly appreciated.
Also, for some reason, not every file is being moved, even though they have the same extension.
As with most path operations, I recommend using the pathlib module. pathlib has been available since Python 3.4 and provides a portable (multi-platform), high-level API for file system operations.
I recommend using the following methods on Path objects, to determine their type:
Path.is_file()
Path.is_dir()
import shutil
from pathlib import Path

# Using a class for nicer grouping of target directories
# Note that pathlib.Path enables Unix-like path construction, even on Windows
class TargetPaths:
    IMAGES = Path.home().joinpath("Documents/Downloads/Images")
    DOCUMENTS = Path.home().joinpath("Documents/Downloads/Documents")
    OTHER = Path.home().joinpath("Documents/Downloads/Else")
    __ALL__ = (IMAGES, DOCUMENTS, OTHER)

for target_dir in TargetPaths.__ALL__:
    if not target_dir.is_dir():
        target_dir.mkdir(exist_ok=True)

# the location of the files we want to move
source_folder = Path.home().joinpath("Documents/Downloads")

# Get absolute paths to the files in source_folder
# files is a generator (only usable once)
files = (path.absolute() for path in source_folder.iterdir() if path.is_file())

def move(source_path, target_dir):
    shutil.move(str(source_path), str(target_dir.joinpath(source_path.name)))

for path in files:
    if path.suffix in ('.JPG', '.png', '.jpg'):
        move(path, TargetPaths.IMAGES)
    elif path.suffix in ('.pdf', '.pptx'):
        move(path, TargetPaths.DOCUMENTS)
    else:
        move(path, TargetPaths.OTHER)
See the os.walk documentation.
In particular, os.walk yields a 3-tuple of (dirpath, dirnames, filenames).
In your case, you could use [x[0] for x in os.walk(dirname)] to get the list of directories.
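A different minimal way to skip folders in the original loop is to check os.path.isdir before moving (a sketch using the same source_folder as in the question):
import os
import shutil

source_folder = r"C:\Users\User1\Documents\Downloads"

for file in os.listdir(source_folder):
    full_path = os.path.join(source_folder, file)
    if os.path.isdir(full_path):
        continue  # skip folders so they are never moved
    # ... extension checks and shutil.move() as in the question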

Loop through binary files that have no extension

I was looking for ways to loop over files in directory with python, and I found this question:
Loop through all CSV files in a folder
The point is that the files I have are binary files, with no file extension at the end.
What I want my program to do is to iterate through all the files that have no extension.
Is there any way to do this using wildcards? (Or any other way?)
You can use os.path.splitext to check if a file has an extension or not.
See these examples:
import os
os.path.splitext("foo.ext")
=> ('foo', '.ext')
os.path.splitext("foo")
=> ('foo', '')
So, you can do this:
import os

path = "path/to/files"
dirs = os.listdir(path)
for path in dirs:
    if not os.path.splitext(path)[1]:
        print(path)
But beware of "hidden" files whose names start with a dot, e.g. ".bashrc".
You can also check for the existence of a dot in the filename:
for path in dirs:
    if "." not in path:
        print(path)
Sounds like what you are interested in is
[f for f in next(os.walk(folder))[2] if '.' not in f]
I would suggest using os.listdir() and then checking whether each filename has an extension (check if there is a dot in the filename). Once you get all filenames without dots (that is, without an extension), just be sure to check that the filename isn't actually a directory name, and that's it.
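A minimal sketch of that suggestion (the directory path is illustrative):
import os

path = "path/to/files"
no_extension = [name for name in os.listdir(path)
                if "." not in name
                and os.path.isfile(os.path.join(path, name))]  # skip directories
print(no_extension)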
You could use the glob module and filter out any files with extensions:
import glob
for filename in (filename for filename in glob.iglob('*') if '.' not in filename):
    print(filename)

Finding the correct path to files in subfolders with os.walk in Python?

I am trying to create a program that copies files with a certain file extension to a given folder. When the files are located in subfolders instead of the root folder, the program fails to get the correct path. In its current state the program works perfectly for the files in the root folder, but it crashes when it finds matching items in subfolders, because it tries to use the root folder as the directory instead of the correct subfolder.
My code is as follows
# Selective_copy.py walks through file tree and copies files with
# certain extension to give folder
import shutil
import os
import re

# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"

def Selective_copy(source_folder):
    # create regex to identify file extensions
    mo = re.compile(r"(\w+).(\w+)")  # Group(2) represents the file extension
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for i in filenames:
            if mo.search(i).group(2) == extension:
                file_path = os.path.abspath(i)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)

Selective_copy(source_folder)
dirpath is one of the things provided by walk for a reason: it gives the path to the directory that the items in filenames are located in. You can use that to determine the subfolder you should be using.
file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames holds a list of base file names. At this point it's just a list of strings and (technically) they are not associated at all with files in the filesystem.
os.path.abspath does string-only operations: it merely merges the file name with the current working directory. As a result, the merged filename points to a file that does not exist.
What should be done instead is a join between the root directory and the base file name (both values yielded by os.walk):
file_path = os.path.join(dirpath, i)
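Putting that together, a sketch of the corrected walk loop (same names as in the question; the regex is swapped for a simpler endswith check to avoid errors on file names without a dot):
import os
import shutil

extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"

for dirpath, dirnames, filenames in os.walk(source_folder):
    for i in filenames:
        if i.endswith("." + extension):
            file_path = os.path.join(dirpath, i)  # build the path from the directory os.walk is currently in
            print("Copying from " + file_path + " to " + destination_folder)
            shutil.copy(file_path, destination_folder)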
