Python Pillow Library opening and editing images ending with specific names - python

Currently I am using Python Pillow Library to edit images. Since I am dealing with large data-sets and need to edit some images with only specific name endings (say only image names that end with cropped or images of specific file type like png or bmp), is there a way to write code in such a way that allows me to open and edit these images? If so please give me hints or suggestions! Thanks!
Also Pillow version is 5.0.0 and Python version is 3.6.

If you only want to know whether you can write code that edits only image files of a specific type or with specific name endings, the answer is YES. You can do it with Python.
A Sample Code:
import os
from PIL import Image  # Pillow
directory = "images_folder"
for filename in os.listdir(directory):
    if filename.endswith((".png", ".bmp")) or "cropped" in filename:
        # Do the editing using Pillow
        # img = Image.open(os.path.join(directory, filename))
        continue

Certainly you can do this in python, but the specific way of doing this obviously depends on the specifics of the problem. Are all your images stored in one directory or many? Will you be running the script from the same directory as the images or from some other directory? Etc.
To get you started, take a look at the os module here.
In this module, there is a listdir function that returns a list of the names of all entries in a directory. You can iterate through that list and find all the filenames that end with a specific set of characters by using the built-in endswith method on strings. For example:
import os
fileslist = [f for f in os.listdir(path) if f.endswith('.jpg')]
Now that you have a list of all the files in the directory whose names end with certain characters, you can use Pillow to open the images from that list.
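Putting the two answers together, a minimal sketch might look like the following. The helper name select_images and its parameters are made up for illustration, and it assumes that "ends with cropped" refers to the file name before its extension.

```python
import os


def select_images(directory, name_ending="cropped", extensions=(".png", ".bmp")):
    """Return sorted paths of files whose name (without extension) ends with
    name_ending, or whose extension is one of extensions.

    Hypothetical helper for illustration, not part of Pillow.
    """
    selected = []
    for filename in os.listdir(directory):
        stem, ext = os.path.splitext(filename)
        # Compare extensions case-insensitively so "photo.BMP" also matches.
        if stem.endswith(name_ending) or ext.lower() in extensions:
            selected.append(os.path.join(directory, filename))
    return sorted(selected)
```

Each returned path already includes the directory, so it can be handed straight to Image.open from Pillow, regardless of where the script is run from.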

Related

How do I convert multiple PDFs into images from the same folder in Python?

from pdf2image import convert_from_path
images = convert_from_path('path.pdf',poppler_path=r"E:/software/poppler-0.67.0/bin")
for i in range(len(images)):
    images[i].save('image_name'+ str(i) +'.jpg', 'JPEG')
But now I want to convert more than 100 pdf files into images.
Is there any way?
Thanks in advance.
You can use glob to collect the file names into a list. The Python glob module is documented at https://docs.python.org/3/library/glob.html; 'glob' is the general term for wildcard expansion in the (*nix) filesystem [https://en.wikipedia.org/wiki/Glob_(programming)]. The Python module works under Windows as well.
Then you just loop over the files. Hey presto!
import glob
from pdf2image import convert_from_path
poppler_path = r"E:/software/poppler-0.67.0/bin"
pdf_filenames = glob.glob('/path/to/image_dir/*.pdf')
for pdf_filename in pdf_filenames:
    images = convert_from_path(pdf_filename, poppler_path=poppler_path)
    for i in range(len(images)):
        images[i].save(f"{pdf_filename}{i}.jpg", 'JPEG')
!TIP: f"{pdf_filename}{i}.jpg" is a Python f-string, which gives the reader a better idea of what the string will eventually look like. You might want to zero-pad the integers there, because at some point you might want to glob or sort those files. There are lots of ways to achieve that; see How to pad zeroes to a string? for example.
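As one possible way to do the zero padding, here is a small sketch. The helper name page_image_name, the underscore separator, and the default width of 3 are assumptions chosen for illustration; it also drops the .pdf extension from the output name, which the loop above does not do.

```python
import os


def page_image_name(pdf_filename, page_index, width=3):
    """Build an output name such as 'report_007.jpg' for one PDF page.

    Hypothetical helper: the separator and padding width are arbitrary choices.
    """
    stem, _ = os.path.splitext(pdf_filename)
    # The nested {width} sets how many digits the page number is padded to.
    return f"{stem}_{page_index:0{width}d}.jpg"
```

With padded numbers, an alphabetical sort of the file names matches the page order.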
You will probably need the os module.
First step:
Use the os.listdir function like this
os.listdir(path_to_pdf_folder)
to get a list of the names of the entries in that folder.
Then use os.path.isfile() on each entry (joined with the folder path) to check whether it is a file or a folder.
If the path leads to a file, perform the conversion like this.
images = convert_from_path('path.pdf',poppler_path=r"E:/software/poppler-0.67.0/bin")
for i in range(len(images)):
    images[i].save('image_name'+ str(i) +'.jpg', 'JPEG')
Otherwise, the path is a folder, so use recursion to traverse it further.
Here's a link to a repo where I recursively resized images in a folder. It could be useful for digesting this idea.
Link to a recursive resizing of images in a given path.
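If you would rather not write the recursion by hand, the standard library's os.walk visits every subfolder for you. This is a sketch of that alternative rather than the approach from the linked repo; the function name find_pdfs is made up for illustration.

```python
import os


def find_pdfs(root):
    """Yield the paths of all .pdf files under root, including subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            # Match the extension case-insensitively (.pdf, .PDF, ...).
            if filename.lower().endswith(".pdf"):
                yield os.path.join(dirpath, filename)
```

Each yielded path could then be passed to convert_from_path in place of 'path.pdf'.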

Split image/pdf based on specific text with Python

I want to split a pdf (or image if needed) based on text in it. I want to split it to get each question with its options in the pdf/image, separately like a screenshot of just that question and its options.
Sample PDF link:https://drive.google.com/file/d/1UtMropzRdfJwQjaRf9kZa1UpAzrKlH-K/view?usp=sharing
Is it even possible? If yes what is the code needed to accomplish this. I am a newbie to python so some explanation might help. I've got almost 100 of these PDFs and just wanted to automate the process of getting individual question and its options.
Step 1: Install pdftotext and put the .exe in your working directory.
Step 2: Copy the code below and save it as a .py file in the same directory.
Step 3: Keep in mind that the PDF files should also be in the same directory.
Step 4: Run the .py file.
Complete code that worked for me:
import os
import glob
import subprocess
files=[]
# remember to put your pdftotext.exe in the folder with your pdf files
for filename in glob.glob(os.getcwd() + '\\*.pdf'):
    files.append(filename[0:-4]+".txt")
    # convert each PDF to a .txt file with the same base name
    subprocess.call([os.getcwd() + '\\pdftotext', filename, filename[0:-4]+".txt"])
all_files=[]
for i in range(len(files)):
    with open(files[i],'r') as f:
        text=f.read()
    # keep only the part between the section header and the footer
    text=text.split('carry one mark each')[1].split('WWW.UNITOPERATION.COM')[0]
    text_ls=text.splitlines()
    ques=[]
    counter=1
    for j in range(len(text_ls)):
        # a question starts on a line beginning with "1.", "2.", ...
        if text_ls[j].startswith(str(counter)+'.'):
            # note: this joins everything from this line to the end of the text,
            # not just up to the next question
            ques.append(''.join(text_ls[j:]).split('\n'[0]))
            counter+=1
    all_files.append(ques)
# Now you have list of all_files in which ques list is added
# You simply need take one by one element out from all_files and write it in a .txt file
# and that will be your task
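To get each question with only its own options, the lines can be grouped until the next question number appears. The following is a minimal sketch under the same assumption as the code above, namely that every question starts on a line beginning with the next number followed by a period; split_questions is a hypothetical helper name.

```python
def split_questions(text):
    """Split text into questions, each with the lines that follow it.

    A question is assumed to start on a line beginning with '1.', '2.', ...
    in increasing order. Lines before the first question are ignored.
    """
    questions = []
    counter = 1
    current = None  # lines of the question being collected
    for line in text.splitlines():
        if line.startswith(f"{counter}."):
            if current is not None:
                questions.append("\n".join(current))
            current = [line]
            counter += 1
        elif current is not None:
            current.append(line)
    if current is not None:
        questions.append("\n".join(current))
    return questions
```

Each element of the returned list can then be written to its own .txt file. This works on the extracted text only; producing a screenshot-style image of each question would need a different tool.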

prevent getfiles from seeing .DS and other hidden files

I am currently working on a python project on my macintosh, and from time to time I get unexpected errors, because .DS or other files, which are not visible to the "not-root-user" are found in folders. I am using the following command
filenames = getfiles.getfiles(file_directory)
to retrieve information about the number and names of the files in a folder. So I was wondering whether it is possible to prevent the getfiles command from seeing these types of files, for example by limiting its rights or the extensions it can see (all files are of .txt format).
Many thanks in advance!
I would recommend you switch to the Python standard library module glob.
If all files are of .txt format and located in the directory /sample/directory/, you can use the following script to get the list of files you want.
import glob
filenames = glob.glob("/sample/directory/*.txt")
glob matches files with shell-style wildcard patterns (*, ?, [seq]), which are not full regular expressions but are usually enough to select the files you need and leave out the rest. More details can be found here.
If you later need more complicated matching than the above example, you can filter the resulting list with regular expressions from the re module.
Another good example of using glob to match multiple extensions can be found here.
If you only want to get the basenames of those files, you can always use standard library os to extract basenames from the full paths.
import os
file_basenames = [os.path.basename(full_path) for full_path in filenames]
There isn't an option to filter within getfiles, but you could filter the list after.
Most-likely you will want to skip all "dot files" ("system files", those with a leading .), which you can accomplish with code like the following.
filenames = [f for f in ['./.a', './b'] if not os.path.basename(f).startswith('.')]
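If getfiles is only used to list a folder, a standard-library replacement that skips dot files could look roughly like this. The function name list_visible_files is hypothetical, and the sketch assumes you only want regular files, not subfolders.

```python
import os


def list_visible_files(directory):
    """Return the sorted names of regular files in directory, skipping
    anything whose name starts with a dot (such as macOS metadata files)."""
    with os.scandir(directory) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file() and not entry.name.startswith(".")
        )
```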
Welcome to Stackoverflow.
You might find the glob module useful. The glob.glob function takes a path including wildcards and returns a list of the filenames that match.
This would allow you to either select the files you want, like
filenames = glob.glob(os.path.join(file_directory, "*.txt"))
Alternatively, select the files you don't want, and ignore them:
exclude_files = glob.glob(os.path.join(file_directory, ".*"))
for filename in getfiles.getfiles(file_directory):
    if filename in exclude_files:
        continue
    # process the file
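One detail worth knowing: a glob pattern that starts with * does not match names beginning with a dot, so the "*.txt" pattern already leaves hidden files out. A small sketch to show this; the function name visible_txt_files is made up for illustration.

```python
import glob
import os


def visible_txt_files(file_directory):
    """Return the sorted paths of .txt files in file_directory.

    The '*' wildcard does not match a leading dot, so names such as
    '.hidden.txt' are not returned.
    """
    return sorted(glob.glob(os.path.join(file_directory, "*.txt")))
```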

How to rapidly switch from one directory to another Python

I have a huge list of image in one directory and another corresponding list of annotations in the other (.txt files).
I need to perform an operation on each image following the matching image annotations and save it into another directory. Is there an elegant way not to chdir three times at each step?
Maybe using cPickle or whatever library used for fast files management ?
import os
import glob
from PIL import Image
os.chdir('path_images')
list_im=glob.glob('*.jpg')
list_im.sort()
list_im=[path_images+name for name in list_im]
os.chdir('path_txt')
list_annot=glob.glob('*.txt')
list_annot.sort()
list_annot=[path_txt+name for name in list_annot]
for i in range(0,len(list_im)):
    # Joel pointed out that the os operations are not mandatory if you include the path in the name
    #os.chdir('path_images')
    im=Image.open(list_im[i])
    #os.chdir('path_text')
    action_on_image(im,list_annot[i])
    #os.chdir('path_to_save_image')
    im.save(path_to_save+nom_image)
I am a true beginner in Python but I am confident that my code is super inefficient and can be improved.
You don't have to chdir (and FWIW you really don't want to depend on the current working directory). Use absolute paths everywhere in your code and you'll be fine.
import os
import glob
from PIL import Image
abs_images_path = "<absolute path to your images directory here>"
abs_txt_path = "<absolute path to your txt directory here>"
abs_dest_path = "<absolute path to where you want to save your images>"
list_im=sorted(glob.glob(os.path.join(abs_images_path, '*.jpg')))
list_annot=sorted(glob.glob(os.path.join(abs_txt_path, '*.txt')))
for im_path, txt_path in zip(list_im, list_annot):
    im = Image.open(im_path)
    action_on_image(im, txt_path)
    # assumption: save under the original file name in the destination directory
    im.save(os.path.join(abs_dest_path, os.path.basename(im_path)))
Note that if your paths are relative to where your script is installed, you can get the script's directory path with os.path.dirname(os.path.abspath(__file__))
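Sorting both lists and zipping them only works if every image has exactly one annotation and the names sort the same way. If that is not guaranteed, one option is to pair the files by name instead. This is a sketch that assumes an image and its annotation share the same base name (for example img1.jpg and img1.txt); pair_by_stem is a hypothetical helper.

```python
from pathlib import Path


def pair_by_stem(images_dir, annotations_dir, image_ext=".jpg", annot_ext=".txt"):
    """Return (image_path, annotation_path) pairs whose file names match
    apart from the extension. Images without an annotation are skipped."""
    annotations = {p.stem: p for p in Path(annotations_dir).glob(f"*{annot_ext}")}
    pairs = []
    for image_path in sorted(Path(images_dir).glob(f"*{image_ext}")):
        annot_path = annotations.get(image_path.stem)
        if annot_path is not None:
            pairs.append((image_path, annot_path))
    return pairs
```

No chdir is needed, because every returned path already contains its directory.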

How to open multiple images using Wand and Python

I'm trying to use imagemagick to take several JPGs and lay them out on a pdf
I'm at the beginning of the project and I do not understand how to open the multiple images and do this.
My research indicated that this is possible on the command line by simply passing multiple files to the convert command. Experience is telling me that this isn't how it's done with Wand, but I can't figure out how!
Any advice is appreciated!
My example code is below.
Before you execute this code, insert the directory path.
import os
from wand.image import Image
from wand.display import display
path = "____absolute_dir_path____ (ex. /home/kim/work/)"
dirList=os.listdir(path)
for fname in dirList:
    print(fname)
    # open each file in the directory and print its (width, height)
    with Image(filename=os.path.join(path, fname)) as img:
        print(img.size)