I have a folder named A, which contains some sub-folders whose names start with the letter A.
These sub-folders contain images in different formats (.png, .jpeg, .gif and .webp) with different names
such as item1.png, item1.jpeg, item2.png, item3.png, etc. From these sub-folders I want to get a list of the paths of the images whose names end with 1.
In addition, I only want a single image file format, for example only .jpeg. In some sub-folders the image names end with 1.png, 1.jpeg, 1.gif, and so on.
I only want one image from every sub-folder whose name ends with 1 (in any image format). I am sharing the code which returns the image paths of items (ending with 1) for all image formats.
CODE:
Here is code that should solve your problem:
import os

img_paths = []
for top, dirs, files in os.walk("your_path_goes_here"):
    for pics in files:
        # keep the first file in this folder whose base name ends with 1, regardless of format
        if os.path.splitext(pics)[0].endswith('1'):
            img_paths.append(os.path.join(top, pics))
            break
img_paths will have the list that you need.
It will contain the first image from each sub-folder whose name ends with 1, in whatever format it happens to be.
In case you want to restrict it to specific formats:
import os

img_paths = []
for top, dirs, files in os.walk("your_path_goes_here"):
    for pics in files:
        # keep the first matching file only if its extension is in the allowed list
        if os.path.splitext(pics)[0].endswith('1') and os.path.splitext(pics)[1][1:] in ['png', 'jpg', 'tif']:
            img_paths.append(os.path.join(top, pics))
            break
Thanks to S3DEV for making it more optimized.
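If you want at most one image per sub-folder but would also like to prefer a particular format when several matches exist (e.g. take the .jpeg if there is one, otherwise fall back to .png), something along these lines could work. This is only a sketch, and the preference order below is an assumption you would adapt:

import os

# hypothetical preference order: the first extension found in this list wins
preferred = ['.jpeg', '.jpg', '.png', '.gif', '.webp']

img_paths = []
for top, dirs, files in os.walk("your_path_goes_here"):
    # collect every file in this folder whose base name ends with 1
    matches = [f for f in files if os.path.splitext(f)[0].endswith('1')]
    if not matches:
        continue
    # sort matches by the position of their extension in the preference list
    matches.sort(key=lambda f: preferred.index(os.path.splitext(f)[1].lower())
                 if os.path.splitext(f)[1].lower() in preferred else len(preferred))
    img_paths.append(os.path.join(top, matches[0]))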
I have a dataset of images and corresponding CSV files (converted to a dataframe) containing the names and other information about these images. The actual number of images is about 7000, but after pre-processing the dataframe I am left with just 3000 image names in it. Now I want to load only those images that are listed in the dataframe.
The image names in the dataframe look like this:
| images |
1_IM-0001-4001.dcm.png
2_IM-0001-4001.dcm.png
3_IM-0001-4001.dcm.png
but the full path of these images also includes the directory path (i.e. the absolute path), like below:
/content/ChestXR/images/images_normalized/1004_IM-0005-1001.dcm.png
Now I want to run a loop that reads only the images listed in the dataframe column; for that I need the directory path plus the image names from the dataframe column:
for images in os.listdir(path):
    if (images.endswith(".png") or images.endswith(".jpg") or images.endswith(".jpeg")):
        image_path = path + {df["images"]}
where the image directory path is:
path = "/content/drive/MyDrive/IU-Xray/images/images_normalized"
and the respective dataframe column is:
df["images"]
but the line below does not work in my loop and generates the error "TypeError: unhashable type: 'Series'":
image_path = path + {df["images"]}
This may not be the fullest answer, but I think it will get you close. (The braces in {df["images"]} create a set literal, and a pandas Series is not hashable, which is where the "unhashable type: 'Series'" error comes from.)
path = "/content/drive/MyDrive/IU-Xray/images/images_normalized"

filestodownload = []
for images in df["images"]:
    if (images.endswith(".png") or images.endswith(".jpg") or images.endswith(".jpeg")):
        filestodownload.append(path + '/' + images)
Then you'll have the list of images you need to load.
You may have to check that df["images"] can be iterated over like this; you can turn that column into a list as well if that's easier, as shown below.
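For example, a minimal sketch of the list-based variant (same path and column name as above):

path = "/content/drive/MyDrive/IU-Xray/images/images_normalized"

filestodownload = []
# turn the dataframe column into a plain Python list before looping
for images in df["images"].tolist():
    if images.endswith((".png", ".jpg", ".jpeg")):
        filestodownload.append(path + '/' + images)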
As you are using pandas, you can do something like this:
path = "/content/drive/MyDrive/IU-Xray/images/images_normalized/"
mask = df['images'].str.contains(r'\.(?:png|jpg|jpeg)$')
full_path = path + df.images[mask]
print(full_path[1])
# /content/drive/MyDrive/IU-Xray/images/images_normalized/2_IM-0001-4001.dcm.png
I have the following problem.
I have a folder structure like this:
vol1/
    chap1/
        01.jpg
        02.JPG
        03.JPG
    chap2/
        04.JPG
        05.jpg
        06.jpg
    chap3/
        07.JPG
        08.jpg
        09.JPG
vol2/
    chap4/
        01.JPG
        02.jpg
        03.jpg
    chap5/
        04.jpg
        05.JPG
        06.jpg
    chap6/
        07.jpg
        08.JPG
        09.jpg
Inside a single vol folder the chapters are in increasing order, and the same is true for the jpg files inside each chap folder.
Now I would like to obtain one pdf per vol folder, maintaining the ordering of the pictures. Think of it as a comic or manga volume that has been split up and needs to be put back into a single file.
How could I do it in bash or Python?
I do not know in advance how many volumes I have, how many chapters are in a single volume, or how many jpg files are in a single chapter. In other words, it needs to work for any number of volumes/chapters/jpgs.
A bonus would be handling heterogeneous picture files, maybe having both jpg and png in a single chapter, but that's a plus.
I guess this should work as intended! Tell me if you encounter issues.
import os
from PIL import Image

def merge_into_pdf(paths, name):
    list_image = []
    # Create list of images from list of paths
    for i in paths:
        list_image.append(Image.open(i).convert("RGB"))
    # merge into one pdf
    if len(list_image) == 0:
        return
    # get first element of list and pop it from the list
    img1 = list_image[0]
    list_image.pop(0)
    # append all remaining images and save as pdf
    img1.save(f"{name}.pdf", save_all=True, append_images=list_image)

def main():
    # List the current directory
    directory = os.listdir(".")
    for i in directory:
        # if the directory name contains 'vol', iterate through it
        if i.find("vol") != -1:
            path = []
            # sort chapters so their increasing order is preserved
            sub_dir = sorted(os.listdir(i))
            # for each chapter subdirectory
            for j in sub_dir:
                # sort files so the page order is preserved
                files = sorted(os.listdir(f"{i}/{j}"))
                for k in files:
                    # if the file ends with jpg or png, append it to the list
                    if k.endswith((".jpg", ".JPG", ".png", ".PNG")):
                        path.append(f"{i}/{j}/{k}")
            # merge the list into one pdf named after the volume
            merge_into_pdf(path, i)

if __name__ == "__main__":
    main()
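One caveat in case your real chapter folders go past chap9: a plain lexicographic sort puts "chap10" before "chap2". If that applies, a numeric-aware sort key is a small addition; the regex-based key below is a sketch assuming names made of letters followed by digits, not part of the original answer:

import re

def numeric_key(name):
    # split the name into digit and non-digit runs, so "chap10" sorts after "chap2"
    return [int(part) if part.isdigit() else part for part in re.split(r'(\d+)', name)]

# usage: sub_dir = sorted(os.listdir(i), key=numeric_key)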
I am currently converting a single pdf file into images with the following code:

from pdf2image import convert_from_path

images = convert_from_path('path.pdf', poppler_path=r"E:/software/poppler-0.67.0/bin")
for i in range(len(images)):
    images[i].save('image_name' + str(i) + '.jpg', 'JPEG')
But now I want to convert more than 100 pdf files into images.
Is there any way?
Thanks in advance.
You can use glob to 'glob' the file names into a list: the Python glob module is documented here: https://docs.python.org/3/library/glob.html - 'globbing' is a general term for wildcard expansion in (*nix) filesystems [https://en.wikipedia.org/wiki/Glob_(programming)]. I assume it works under Windows :)
Then you just loop over the files. Hey presto!
import glob
from pdf2image import convert_from_path

poppler_path = r"E:/software/poppler-0.67.0/bin"
pdf_filenames = glob.glob('/path/to/image_dir/*.pdf')

for pdf_filename in pdf_filenames:
    images = convert_from_path(pdf_filename, poppler_path=poppler_path)
    for i in range(len(images)):
        images[i].save(f"{pdf_filename}{i}.jpg", 'JPEG')
TIP: f"{pdf_filename}{i}.jpg" is a Python f-string, which gives the reader a better idea of what the resulting string will look like. You might want to zero-pad the integer there, because at some point you may want to 'glob' or sort those output files. There are lots of ways to achieve that - see How to pad zeroes to a string? for example, or the one-liner below.
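For instance, a minimal zero-padded variant of the save line (the width of 3 is an arbitrary assumption):

# pad the page index to 3 digits so files sort correctly: 000, 001, ..., 010, ...
images[i].save(f"{pdf_filename}{i:03d}.jpg", 'JPEG')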
You will possibly need to use the os module.
First step:
Use the os.listdir function like this:
os.listdir("path to folder containing pdf files")
to get a list of the entries within that folder.
More specifically, use os.path.isfile() to check whether the current path is a file or a folder.
Perform the conversion if the path leads to a file, like this:
images = convert_from_path('path.pdf', poppler_path=r"E:/software/poppler-0.67.0/bin")
for i in range(len(images)):
    images[i].save('image_name' + str(i) + '.jpg', 'JPEG')
Otherwise, use recursion to traverse the folder tree further; a rough sketch is given after the link below.
Here's a link to a repo where I recursively resized images in a folder. It could be useful to digest this idea:
Link to a recursive resizing of images in a given path.
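As a rough sketch of the recursive approach described above (the folder and poppler paths are placeholders, and the output naming is an assumption):

import os
from pdf2image import convert_from_path

poppler_path = r"E:/software/poppler-0.67.0/bin"

def convert_folder(folder):
    for entry in os.listdir(folder):
        full_path = os.path.join(folder, entry)
        if os.path.isfile(full_path) and entry.lower().endswith(".pdf"):
            # convert this pdf and save one jpg per page next to it
            images = convert_from_path(full_path, poppler_path=poppler_path)
            for i, page in enumerate(images):
                page.save(f"{full_path}_{i}.jpg", 'JPEG')
        elif os.path.isdir(full_path):
            # recurse into subfolders
            convert_folder(full_path)

convert_folder("path/to/folder/containing/pdf/files")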
I am trying to create a separate array on each pass of the for loop, in order to store the values of 'signal' returned by the wavfile.read function.
Some background as to how the code works / how I'd like it to work:
I have the following file structure:
Root directory
    Labeled directory
        Irrelevant multiple directories
            Multiple .wav files stored in these subdirectories
    Labeled directory
        Irrelevant multiple directories
            Multiple .wav files stored in these subdirectories
Now, for each labeled folder, I'd like to create an array that holds the values of all the .wav files contained in its respective subdirectories.
This is what I attempted:
import os
from scipy.io import wavfile  # assuming scipy.io.wavfile, given the (rate, signal) return values

count = 0
for label in df.index:
    for path, directories, files in os.walk('voxceleb1/wav_dev_files/' + label):
        for file in files:
            if file.endswith('.wav'):
                count = count + 1
                rate, signal = wavfile.read(os.path.join(path, file))
    print(count)
Above is a snapshot of dataframe df
Ultimately, the reason for these arrays is that I would like to calculate the mean duration of the wav files contained in each labeled subdirectory and add this as a column to the dataframe.
Note that the index of the dataframe corresponds to the directory names. I appreciate any and all help!
The code snippet you've posted can be simplified and modernized a bit. Here's what I came up with:
I've got the following directory structure:
I'm using text files instead of wav files in my example, because I don't have any wav files on hand.
In my root, I have A and B (these are supposed to be your "labeled directories"). A has two text files. B has one immediate text file and one subfolder with another text file inside (this is meant to simulate your "irrelevant multiple directories").
The code:
def main():
    from pathlib import Path

    root_path = Path("./root/")
    labeled_directories = [path for path in root_path.iterdir() if path.is_dir()]

    txt_path_lists = []
    # Generate lists of txt paths
    for labeled_directory in labeled_directories:
        txt_path_list = list(labeled_directory.glob("**/*.txt"))
        txt_path_lists.append(txt_path_list)

    # Print the lists of txt paths
    for txt_path_list in txt_path_lists:
        print(txt_path_list)

    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
The output:
[WindowsPath('root/A/a_one.txt'), WindowsPath('root/A/a_two.txt')]
[WindowsPath('root/B/b_one.txt'), WindowsPath('root/B/asdasdasd/b_two.txt')]
As you can see, we generated two lists of text file paths, one for each labeled directory. The glob pattern I used (**/*.txt) handles multiple nested directories, and recursively finds all text files. All you have to do is change the extension in the glob pattern to have it find .wav files instead.
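To get one step closer to what you actually want (the mean duration per labeled directory as a new dataframe column), a rough sketch could look like this; the dataframe df, the root path, and the use of scipy.io.wavfile are assumptions based on your question:

from pathlib import Path
from scipy.io import wavfile

root_path = Path("voxceleb1/wav_dev_files")

mean_durations = []
for label in df.index:
    durations = []
    # recursively collect every .wav file under this labeled directory
    for wav_path in (root_path / label).glob("**/*.wav"):
        rate, signal = wavfile.read(str(wav_path))
        durations.append(len(signal) / rate)  # duration in seconds
    mean_durations.append(sum(durations) / len(durations) if durations else 0.0)

# add the per-label mean duration as a new column
df["mean_duration"] = mean_durations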
I have a folder with numerous images (about 300). I am going to write a Python script that splits each image into its red, green and blue channels and saves them as _red, _green and _blue, appended to the original image name, in a different folder. For example, if the image is named "image 001", the images obtained after the split are "image 001_red", "image 001_green" and "image 001_blue". Now, is there a way I can obtain the images one after the other using the os library? (I appreciate any answer whatsoever, because this is my first question on this site.)
You are asking how to read a list of image files from a directory in Python. Here is how:
from os import walk

# Get file list
def getImageList(path):
    for (dirpath, dirnames, filenames) in walk(path):
        return filenames

# Demo printing file names
fileList = getImageList("path/to/image/dir")
for fileName in fileList:
    print(fileName)
The getImageList(path) function returns all the files (not directories) directly inside a given path. Place all your images inside a directory and use the function to get the file list.
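For the splitting step itself, here is a minimal sketch using Pillow; the library choice, the output folder name and the file naming are assumptions, since the question does not specify them:

import os
from PIL import Image

input_dir = "path/to/image/dir"     # folder containing the original images
output_dir = "path/to/output/dir"   # hypothetical folder for the split channels
os.makedirs(output_dir, exist_ok=True)

for fileName in getImageList(input_dir):
    name, ext = os.path.splitext(fileName)
    img = Image.open(os.path.join(input_dir, fileName)).convert("RGB")
    # split() returns the red, green and blue channels as separate grayscale images
    red, green, blue = img.split()
    red.save(os.path.join(output_dir, f"{name}_red{ext}"))
    green.save(os.path.join(output_dir, f"{name}_green{ext}"))
    blue.save(os.path.join(output_dir, f"{name}_blue{ext}"))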
Hope this helped.