Reading images while maintaining folder structure - Python

I have to rewrite a MATLAB script in Python, as apparently what I want to achieve can be done much more efficiently in Python.
So the first task is to read all the images into Python using OpenCV while maintaining the folder structure. For example, if the parent folder has 50 subfolders and each subfolder has 10 images, then the images variable in Python should mirror that structure, very much like a cell array in MATLAB. I have read that Python lists can give this cell-like behaviour without importing anything, so that's good, I guess.
For example, below is how I coded it in Matlab:
path = '/home/university/Matlab/att_faces';
subjects = dir(path);
subjects = subjects(~strncmpi('.', {subjects.name}, 1)); % remove the '.' and '..' subfolders
img = cell(numel(subjects), 1); % initialize the cell array equal to the number of subjects
for i = 1:numel(subjects)
    path_now = fullfile(path, subjects(i).name);
    contents = dir([path_now, '/*.pgm']);
    for j = 1:numel(contents)
        img{i}{j} = imread(fullfile(path_now, contents(j).name));
        disp([i, j]);
    end
end
The above img will have 50 cells, and each cell will hold 10 images: img{1} contains all images belonging to subject 1, and so on.
I'm trying to replicate this in Python but am failing. This is what I have got so far:
import cv2
import os
import glob
path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
images = []
for n in sub_f:
    path_now = os.path.join(path, sub_f[n], '*.pgm')
    images[n] = [cv2.imread(file) for file in glob.glob(path_now)]
It's not exactly what I am looking for; some help would be appreciated. Please ignore silly mistakes, as it is my first day writing Python.
Thanks
edit: directory structure:

The first problem is that n isn't a number or index; it is a string containing the folder name. To get the index, you can use enumerate, which yields index, value pairs.
Second, unlike MATLAB, Python doesn't let you assign to indices that don't exist. You need to pre-allocate your images list or, better yet, append to it.
Third, it is better not to use the variable name file, since in Python 2 it is a built-in type, so it can confuse people.
So with preallocating, this should work:
images = [None] * len(sub_f)
for n, cursub in enumerate(sub_f):
    path_now = os.path.join(path, cursub, '*.pgm')
    images[n] = [cv2.imread(fname) for fname in glob.glob(path_now)]
Using append, this should work:
images = []
for cursub in sub_f:
    path_now = os.path.join(path, cursub, '*.pgm')
    images.append([cv2.imread(fname) for fname in glob.glob(path_now)])
That being said, there is an easier way to do this: the pathlib module simplifies it considerably.
So something like this should work:
from pathlib import Path

mypath = Path('/home/university/Matlab/att_faces')
images = []
for subdir in mypath.iterdir():
    images.append([cv2.imread(str(curfile)) for curfile in subdir.glob('*.pgm')])
This loops over the subdirectories, then globs each one.
This can even be done in a nested list comprehension:
images = [[cv2.imread(str(curfile)) for curfile in subdir.glob('*.pgm')]
          for subdir in mypath.iterdir()]
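If you also want to remember which subfolder each group of images came from, a dict keyed by the folder name is a small variation on the pathlib version above. This is only a sketch; the sorted() calls and the is_dir() filter are additions for deterministic ordering and to skip stray files:
import cv2
from pathlib import Path

mypath = Path('/home/university/Matlab/att_faces')

# map each subject folder name to the list of its images, e.g. images_by_subject['s1']
images_by_subject = {
    subdir.name: [cv2.imread(str(curfile)) for curfile in sorted(subdir.glob('*.pgm'))]
    for subdir in sorted(mypath.iterdir()) if subdir.is_dir()
}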

It should be the following:
import cv2
import os

path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
print(sub_f)  # --- this will print all the files present in this directory ---

# --- this is a list to which you will append all the images ---
images = []

# --- iterate through every file in the directory and read those that end with the .pgm extension ---
# --- after reading each one, append it to the list ---
for n in sub_f:
    if n.endswith('.pgm'):
        path_now = os.path.join(path, n)
        print(path_now)
        images.append(cv2.imread(path_now, 1))

import cv2
import os
import glob

path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
images = []

# read the images
for folder in sub_f:
    path_now = os.path.join(path, folder, '*.pgm')
    images.append([cv2.imread(file) for file in glob.glob(path_now)])

# display the images
for folder in images:
    for image in folder:
        cv2.imshow('image', image)
        cv2.waitKey(0)
cv2.destroyAllWindows()

Related

batch copy files, perform operation and copy more files

I want to copy files from a directory in batch mode, perform an operation on the copied files and then copy more files. To do this I have come up with this code:
import os
import sys
from shutil import copy2
_, _, filenames = next(os.walk("src/"))
print(filenames)
number_of_files = len(filenames)
batch_number = 2
i = 0
while i < number_of_files:
    i += 1
    j = i + batch_number
    print(filenames[i:j])
and its output is
['file_02', 'file_03']
['file_03', 'file_04']
['file_04', 'file_010']
['file_010', 'file_01']
['file_01', 'file_06']
['file_06', 'file_08']
['file_08', 'file_09']
['file_09', 'file_07']
['file_07']
[]
What I want is:
['file_01', 'file_02']
['file_03', 'file_04']
['file_05', 'file_06']
['file_07', 'file_08']
['file_09', 'file_10']
What would be the best way to go about doing this?
Be careful: os.walk does not return the file names in numerical order.
You can use the list's sort() method with a key function that sorts the contents numerically, e.g.
your_file_list.sort(key=int)
Since your names look like file_XX rather than bare numbers, strip the file_ prefix (and any extension) inside the key before converting to int.
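Putting both points together, a minimal sketch of the sorted, batched slicing (the names are taken from the question's output; adjust the key if your real names carry an extension):
filenames = ['file_02', 'file_03', 'file_04', 'file_010', 'file_01',
             'file_06', 'file_08', 'file_09', 'file_07']

# sort numerically on the part after 'file_'
filenames.sort(key=lambda name: int(name.split('_')[1]))

batch_number = 2
# step through the sorted list batch_number files at a time
for i in range(0, len(filenames), batch_number):
    batch = filenames[i:i + batch_number]
    print(batch)
    # copy this batch, run the operation on it, then move on to the next batch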

Opening files from directory in specific order

I have a folder that contains around 500 images that I am rotating by a random angle from 0 to 360. The files are named 00i.jpeg, where i starts at 0 and counts up; for example I have an image named 009.jpeg, one named 0052.jpeg and another named 00333.jpeg. My code below works, in that it does rotate the images, but the files are not being read in the right order.
I would think I need some sort of stepping code that starts at 0 and adds one each time, but I'm not sure where I would put it. os.listdir doesn't allow me to do that because (from my understanding) it just lists the files. I tried using os.walk, but then I cannot use cv2.imread; I get SystemError: <built-in function imread> returned NULL without setting an error.
Any suggestions?
import cv2
import imutils
from random import randrange
import os
os.chdir("C:\\Users\\name\\Desktop\\training\\JPEG")
j = 0
for infile in os.listdir("C:\\Users\\name\\Desktop\\training\\JPEG"):
    filename = 'testing' + str(j) + '.jpeg'
    i = randrange(360)
    image = cv2.imread(infile)
    rotation_output = imutils.rotate_bound(image, angle=i)
    os.chdir("C:\\Users\\name\\Desktop\\rotate_test")
    cv2.imwrite("C:\\Users\\name\\Desktop\\rotate_test\\" + filename, rotation_output)
    os.chdir("C:\\Users\\name\\Desktop\\training\\JPEG")
    j = j + 1
print(infile)
000.jpeg
001.jpeg
0010.jpeg
00100.jpeg
...
Needs to be:
print(infile)
000.jpeg
001.jpeg
002.jpeg
003.jpeg
...
Get a list of files first, then use sort with key where the key is an integer version of the file name without extension.
files = os.listdir("C:\\Users\\name\\Desktop\\training\\JPEG")
files.sort(key=lambda x: int(x.split('.')[0]))
for infile in files:
    ...
Practical example:
files = ['003.jpeg','000.jpeg','001.jpeg','0010.jpeg','00100.jpeg','002.jpeg']
files.sort(key=lambda x:int(x.split('.')[0]))
print(files)
Output
['000.jpeg', '001.jpeg', '002.jpeg', '003.jpeg', '0010.jpeg', '00100.jpeg']
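For completeness, here is a rough sketch of how that sort could be folded into the question's rotation loop, using full paths with os.path.join instead of os.chdir (the directories are the ones from the question, and the rest follows the original code):
import os
import cv2
import imutils
from random import randrange

src_dir = "C:\\Users\\name\\Desktop\\training\\JPEG"
dst_dir = "C:\\Users\\name\\Desktop\\rotate_test"

files = os.listdir(src_dir)
files.sort(key=lambda x: int(x.split('.')[0]))  # numeric order: 000, 001, 002, ...

for j, infile in enumerate(files):
    image = cv2.imread(os.path.join(src_dir, infile))
    rotation_output = imutils.rotate_bound(image, angle=randrange(360))
    cv2.imwrite(os.path.join(dst_dir, 'testing' + str(j) + '.jpeg'), rotation_output)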

Sorting a set of matrices

I have many images (about 10,000). My goal is to run a binary search over the set of two-dimensional matrices to find duplicate images and delete them. But does the concept of one matrix being "greater than" another even exist? How can I solve this? The alternative is a sequential search, but that is very inefficient.
@Miki's suggestion seemed like a fun exercise, so I created an implementation that you can use.
More on hashing here
import hashlib, os, cv2

# location of images
path = '.'
# create list that will hold the hashes
all_hashes = []
# get and iterate over all image paths
all_files = os.listdir(path)
for f in all_files:
    # check the image extension
    name, ext = os.path.splitext(f)
    if ext == '.jpg':
        # open image
        img = cv2.imread(f)
        # hash the image and get the hex representation
        hash = hashlib.md5(img).hexdigest()
        # check if the hash already exists; if not, add it to the list
        if hash in all_hashes:
            print('Already exists: ' + f)
        else:
            all_hashes.append(hash)
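If the duplicates should actually be deleted, as the question asks, the same idea only needs a small extension. A sketch along those lines, still assuming .jpg files in the current directory, with a set instead of a list for constant-time lookups and a guard for files OpenCV cannot read (note that identical MD5 hashes only catch byte-for-byte identical images, not near-duplicates):
import hashlib, os, cv2

path = '.'
seen = set()  # hashes of the images kept so far
for f in sorted(os.listdir(path)):
    name, ext = os.path.splitext(f)
    if ext.lower() != '.jpg':
        continue
    img = cv2.imread(os.path.join(path, f))
    if img is None:  # unreadable file, skip it
        continue
    digest = hashlib.md5(img.tobytes()).hexdigest()
    if digest in seen:
        print('Duplicate, deleting: ' + f)
        os.remove(os.path.join(path, f))
    else:
        seen.add(digest)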

Iteratively open image with increasing ID number as a file name in Python

I've got an image database with a set of images named [frame01.png, frame02.png, ..., frameN.png].
My directory path is ./img, and I'd like to read the images one by one, doing some image processing on each until I reach the last one. Since I'm not familiar with string concatenation in Python, what's the easiest way to do it?
file_names = os.listdir('path_to_folder/')
should give you a list of all your files.
To read them you can have:
for file_name in file_names:
    read_and_process_image('path_to_folder/' + file_name)
Then inside read_and_process_image:
import matplotlib.image

def read_and_process_image(path):
    read_img = matplotlib.image.imread(path)  # or whatever you use to read the image
    # process read_img
Alternatively, you could have:
import glob

for image_path in glob.glob("path_to_your_image*.png"):
    image = matplotlib.image.imread(image_path)  # or whatever you use to read the image
    # process your image
If you are just looking for a quick way to create the list with this particular names:
[ 'frame' + "%02d" % (i,) + '.png' for i in range(1, MAX_NUM)]
If your last image is 20, replace MAX_NUM with 20 + 1; the same applies for any other number x: use x + 1.
How/what you use to read the files depends on you. You can use matplotlib.image as in the examples or whatever works for you.
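Putting these pieces together, here is a small sketch that reads frame01.png, frame02.png, ... from the ./img directory mentioned in the question, in numeric order (the numeric sort key only matters once the counter outgrows the zero padding, e.g. frame100.png, but it does no harm before that):
import glob
import os
import matplotlib.image

frame_paths = glob.glob(os.path.join('img', 'frame*.png'))
# sort by the number after 'frame' in the file name
frame_paths.sort(key=lambda p: int(os.path.splitext(os.path.basename(p))[0][len('frame'):]))

for image_path in frame_paths:
    read_img = matplotlib.image.imread(image_path)  # or whatever you use to read the image
    # process read_img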

Pulling random files out of a folder for sampling

I needed a way to pull 10% of the files in a folder, at random, for sampling after every "run." Luckily, my current files are numbered numerically and sequentially, so my current method is to list the file names, parse the numerical portion, pull the max and min values, count the files and multiply by 0.1, then use random.sample to get a "random 10% sample." I also write these names to a .txt file and then use shutil.copy to move the actual files.
Obviously, this does not work if I have an outlier, e.g. a file 345.txt among files numbered 513.txt - 678.txt. I was wondering if there is a direct way to simply pull a number of files from a folder at random? I have looked it up and cannot find a better method.
Thanks.
Using numpy.random.choice(array, N) you can select N items at random from an array.
import numpy as np
import os

# list all files in the current dir
files = [f for f in os.listdir('.') if os.path.isfile(f)]

# select 10% of the files randomly
# (note: np.random.choice samples with replacement by default; pass replace=False if the same file must not be picked twice)
random_files = np.random.choice(files, int(len(files)*.1))
I was unable to get the other methods to work easily with my code, but I came up with this.
import os, shutil
from random import choice

output_folder = 'C:/path/to/folder'
# 'files' and 'subdir' come from the surrounding code: the candidate file names and their directory
for x in range(int(len(files) * .1)):
    to_copy = choice(files)
    shutil.copy(os.path.join(subdir, to_copy), output_folder)
This will give you the list of names in the folder, with mypath being the path to the folder.
from os import listdir
from os.path import isfile, join
from random import shuffle

onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
shuffle(onlyfiles)  # shuffle() works in place and returns None
small_list = onlyfiles[:len(onlyfiles) // 10]
This should work
You can use the following strategy (a sketch of it is shown below):
1. Use files = os.listdir(path) to get all the files in the directory as a list of names.
2. Next, count the files with n = len(files).
3. Using that count, pick a random index with random_position = random.randrange(0, n).
4. Repeat step 3, saving the values in a list, until you have enough distinct positions (n/10 in your case).
5. After that you can get the required file names with files[random_position].
6. Use a for loop for the iteration.
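A compact sketch of that idea, using random.sample from the standard library to pick the distinct positions in one call and shutil.copy for the copying step mentioned in the question (the folder paths are placeholders):
import os
import random
import shutil

src_folder = 'C:/path/to/source'   # placeholder paths
output_folder = 'C:/path/to/sample'

files = [f for f in os.listdir(src_folder)
         if os.path.isfile(os.path.join(src_folder, f))]

# random.sample returns k distinct items, so no file is picked twice
sample_size = max(1, len(files) // 10)
for name in random.sample(files, sample_size):
    shutil.copy(os.path.join(src_folder, name), output_folder)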
Hope this helps!
Based on Karl's solution (which did not work for me under Win 10, Python 3.x), I came up with this:
import numpy as np
import os
# List all files in dir
files = os.listdir("C:/Users/.../Myfiles")
# Select 0.5 of the files randomly
random_files = np.random.choice(files, int(len(files)*.5))
# Get the remaining files
other_files = [x for x in files if x not in random_files]
# Do something with the files
for x in random_files:
    print(x)
