I am working with the Chest X-Ray14 dataset. The data contains about 112,200 images grouped in 12 folders (i.e. images1 to images12). The image labels are in a CSV file called Data_Entry_2017.csv. I want to split the images, based on the CSV labels (attribute "Finding Labels"), into their respective train and test folders.
Can anyone help me with Python or Jupyter-notebook split code? I will be grateful.
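import pandas as pd
df = pd.read_csv("Data_Entry_2017.csv")
infiltration_df = df[df["Finding Labels"] == "Infiltration"]
# Assumes the image file names are stored in the "Image Index" column of this CSV
list_infiltration = infiltration_df["Image Index"].tolist()  # This will be a list of image names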
Then you can go through each folder and check whether each image name is in the list of Infiltration images; if it is, put that image in a separate folder.
To read all image filenames in a folder, you can use os.listdir:
from os import listdir
from os.path import isfile, join
imagefiles = [f for f in listdir(image_folder_name) if isfile(join(image_folder_name, f))]
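Putting the two steps together, a minimal sketch might look like the following. The function name copy_matching_images and the folder names in the usage comment are assumptions made for illustration; the sketch copies rather than moves, so the original folders stay intact.

```python
import shutil
from pathlib import Path


def copy_matching_images(source_folders, wanted_names, dest_folder):
    """Copy every file whose name is in wanted_names into dest_folder.

    Returns the number of files copied.
    """
    wanted = set(wanted_names)  # set membership tests are fast
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for folder in source_folders:
        for path in Path(folder).iterdir():
            if path.is_file() and path.name in wanted:
                shutil.copy2(path, dest / path.name)
                copied += 1
    return copied


# Hypothetical usage, assuming the folders images1 ... images12 sit in the
# current directory and list_infiltration holds the image file names:
# folders = [f"images{i}" for i in range(1, 13)]
# copy_matching_images(folders, list_infiltration, "Infiltration")
```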
For train test split you can refer here
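If you only need a plain random split of the file names, a small standard-library sketch could look like this. The function name, the 20% test fraction and the seed are assumptions chosen for illustration; scikit-learn's train_test_split does the same job if you have it installed. If the same patient appears in several images, you may prefer to split by patient rather than by image.

```python
import random


def train_test_split_names(names, test_fraction=0.2, seed=42):
    """Shuffle names reproducibly and split them into (train, test) lists."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = sorted(names)               # sort first so input order does not matter
    random.Random(seed).shuffle(shuffled)  # local RNG, global random state untouched
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

The two returned lists can then be copied into train and test folders with shutil.copy2.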
Right now I have an image dataset that is very coarse: the "content" and the "label" are separated into different files/folders (the image dataset has files like 00001, while a separate CSV file has rows like 00001,class a, etc.). If I want to use the Keras image loading function, the dataset should have the structure shown below, so that I can then split it into 'X' and 'y'. I tried to "combine" content and label. Can you guys point me in the right direction? Thanks!
training_data/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg
I found a function based on the shutil module, which can conditionally move files to different folders, but because of some compatibility issue I cannot install the shutil module (I tried updating Python).
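import os
import shutil

def match_label(source, dest):
    # Move every file whose numeric name (e.g. 00001.jpg -> 1) appears in `label`
    files = os.listdir(source)
    for file in files:
        num = int(file.split('.')[0])
        if num in label:  # `label` must be defined beforehand as a collection of IDs
            shutil.move(os.path.join(source, file), os.path.join(dest, file))

match_label(train_dir, homogeneous_dir)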
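Note that shutil ships with Python's standard library, so it does not need to be installed separately. As a rough sketch of how the class folders could be built, the function below reads the label file and moves each image into a folder named after its class. It assumes rows of the form 00001,class a with no header and image names that match the first column; the function name and the choice to replace spaces with underscores in folder names are illustrative, not taken from the question.

```python
import csv
import shutil
from pathlib import Path


def sort_into_class_folders(image_dir, labels_csv, dest_root):
    """Move each image into dest_root/<class name>/ according to labels_csv.

    Assumes rows like "00001,class a" (no header) and image files whose
    name without extension equals the first column.
    Returns the names of the files that had no label.
    """
    with open(labels_csv, newline="") as handle:
        labels = {
            row[0].strip(): row[1].strip()
            for row in csv.reader(handle)
            if len(row) >= 2
        }

    unlabeled = []
    for path in sorted(Path(image_dir).iterdir()):
        if not path.is_file():
            continue
        class_name = labels.get(path.stem)
        if class_name is None:
            unlabeled.append(path.name)
            continue
        # "class a" becomes the folder "class_a" (an illustrative choice)
        class_dir = Path(dest_root) / class_name.replace(" ", "_")
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(class_dir / path.name))
    return unlabeled
```

The resulting dest_root then has one subfolder per class, which is the layout shown above.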
During one of my projects, I faced this challenge: there is a folder named Project, and inside it there are multiple images (say 100 images), each named sequentially: the first image is img_0, the second is img_1, ..., and the last is img_99.
Now, based on some conditions, I need to separate out some images, say img_5, img_10, img_30, img_88 and img_61. My question is: is there any way to filter out these images, make a folder named "the odd ones" inside the folder Project, and store the specified images there?
One more thing would help in my case. Suppose I have hundreds of such project folders named sequentially, Projects_1, Projects_2, Projects_3, ..., Projects_99, and each contains hundreds of pictures. Is it possible to separate all the specified photos and store them in a separate folder inside each Projects_n folder, assuming the photos to be separated out are the same for each Projects_n folder?
Please help me with this. Thank you!
For the first problem, you can look at the pseudo-code below (you have to specify the target function). For the second problem, you should provide more details.
from glob import glob
import itertools
import shutil
import os

# Names (without extension) of the files to move; example values
# taken from the question, replace them with your own condition:
target_names = {"img_5", "img_10", "img_30", "img_88", "img_61"}

# Creating a function to check if filename
# is a target file which has to be moved:
def is_target(filename):
    return os.path.splitext(filename)[0] in target_names

dirname = "some/path/to/project"

# Creating a list of all files in dir which
# could be moved based on type extension:
types = ('*.png', '*.jpeg')
filepaths = list(itertools.chain(*[glob(os.path.join(dirname, t)) for t in types]))

# Finding the files to move:
filepaths_to_move = []
for filepath in filepaths:
    if is_target(os.path.basename(filepath)):
        filepaths_to_move.append(filepath)

# Creating the new subfolder:
new_folder_name = "odd_images"
new_dir = os.path.join(dirname, new_folder_name)
if not os.path.exists(new_dir):
    os.makedirs(new_dir)

# Moving files into subfolder:
for filepath in filepaths_to_move:
    basename = os.path.basename(filepath)
    shutil.move(filepath, os.path.join(new_dir, basename))
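For the second part of the question, the same steps can be wrapped in a loop over the project folders. This is only a sketch under stated assumptions: all Projects_n folders share one parent directory, the target names are identical in every folder, and the function name separate_in_all_projects is made up for illustration.

```python
import shutil
from pathlib import Path

# Names (without extension) of the images to separate; taken from the question.
TARGET_STEMS = {"img_5", "img_10", "img_30", "img_88", "img_61"}


def separate_in_all_projects(root, targets=TARGET_STEMS, subfolder="the odd ones"):
    """For each Projects_* folder under root, move the target images into a subfolder.

    Returns a dict mapping each project folder name to the number of files moved.
    """
    moved = {}
    for project in sorted(Path(root).glob("Projects_*")):
        if not project.is_dir():
            continue
        # Collect the candidates first, then move them
        candidates = [
            p for p in project.iterdir() if p.is_file() and p.stem in targets
        ]
        if candidates:
            dest = project / subfolder
            dest.mkdir(exist_ok=True)
            for path in candidates:
                shutil.move(str(path), str(dest / path.name))
        moved[project.name] = len(candidates)
    return moved
```

The returned counts make it easy to spot a project folder in which some expected images were missing.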
Here is the logic. Make the necessary improvements for your use case.
import os
import shutil

project_dir = "project_dir"
move_to_dir = os.path.join(project_dir, "move_to_dir")
files = [os.path.join(project_dir, file) for file in os.listdir(project_dir)]
# Use a list of names; `in` on one comma-separated string would also match substrings
filenames_to_filter = ["test1.txt", "test2.txt"]

if not os.path.exists(move_to_dir):
    os.makedirs(move_to_dir)

for file in files:
    if os.path.basename(file) in filenames_to_filter:
        shutil.move(file, move_to_dir)
I have a folder of 100 images of human eyes: 50 files named retina and 50 files named mask. I need to read all the images named retina1, retina2, ..., retina50 and store them in an object called retina, and do the same for the mask images.
I can read all the files in a folder with the code below, but I am not sure how to read them based on their filenames, i.e. how to read the retina and mask images separately. I need this because I want to implement image segmentation and a CNN classifier later.
for i in os.listdir():
    f = open(i,"r")
    f.read()
    f.close()
I would use the glob module to get the path to the correct filenames.
Reference glob
import glob
retina_images = glob.glob(r"C:\Users\Fabian\Desktop\stack\images\*retina*")
mask_images = glob.glob(r"C:\Users\Fabian\Desktop\stack\images\*mask*")
print(retina_images)
print(mask_images)
Now you can use the path list to read in the correct files.
In my case, my images are located under C:\Users\Fabian\Desktop\stack\images\. You can use * as a wildcard.
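One thing to keep in mind is that glob returns paths in arbitrary order, and a plain sort would put retina10 before retina2. For the segmentation step you will probably want each retina image paired with its mask, so a sketch like the one below may help. It assumes that retinaN belongs to maskN; the helper names natural_key and pair_by_order are made up for illustration.

```python
import re


def natural_key(path):
    """Sort key that orders 'retina2' before 'retina10'."""
    # re.split with a capture group alternates text (even index) and digits (odd index)
    parts = re.split(r"(\d+)", path)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


def pair_by_order(retina_paths, mask_paths):
    """Return (retina, mask) pairs after sorting both lists naturally."""
    if len(retina_paths) != len(mask_paths):
        raise ValueError("expected the same number of retina and mask images")
    return list(zip(sorted(retina_paths, key=natural_key),
                    sorted(mask_paths, key=natural_key)))
```

With the lists from above, pair_by_order(retina_images, mask_images) gives you matching path pairs that you can pass to whichever image reader you use.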
EDIT:
import glob
images = {}
patterns = ["retina", "mask"]
for pattern in patterns:
    images[pattern] = glob.glob(r"C:\Users\fabia\Desktop\stack\images\*{}*".format(pattern))
print(images)
Generating a dict from your search patterns could be helpful.
You can limit the loop to match certain filenames by combining the loop with a generator expression.
for i in (j for j in os.listdir() if 'retina' in j):
    # Images are binary files, so open them in "rb" mode
    with open(i, "rb") as f:
        data = f.read()
I want to read images of multiple datatypes from multiple subdirectories in Python using the glob function.
I have successfully read images of JPG type from the subdirectories. I want to know how I can read images of multiple datatypes. Below is the code I have tried so far:
########### READ IMAGES OF MULTIPLE DATATYPES FROM A SINGLE FOLDER ######
import cv2
from glob import glob
from os.path import join

files = []
for ext in ('*.jpg', '*.jpeg'):
    files.extend(glob(join("C:\\Python35\\target_non_target\\test_images", ext)))

for i in range(len(files)):
    image = cv2.imread(files[i])
    print(image)
###### READ MULTIPLE JPG IMAGES FROM MULTIPLE SUBDIRECTORIES #########
from glob import glob

folders = glob("C:\\Python36\\videos\\videos_new\\*")
img_list = []
for folder in folders:
    for f in glob(folder + "/*.jpg"):
        img_list.append(f)

for i in range(len(img_list)):
    print(img_list[i])
Both code snippets work perfectly, but I am confused about how to read images of multiple datatypes from multiple subdirectories. I have a directory with multiple subdirectories that contain images of multiple datatypes such as JPG, PNG, JPEG, etc. I want to read all those images and use them in my code.
Try this:
import glob
my_root = r'C:\Python36\videos\videos_new'
my_exts = ['*.jpg', '*.jpeg', '*.png']
# Flatten the per-extension results into a single list of paths
files = [f for x in my_exts for f in glob.glob(my_root + '/**/' + x, recursive=True)]
Check this out for various ways to recursively search folders.
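One such way, shown here only as a rough sketch, is pathlib's rglob combined with a suffix check. Comparing the lower-cased suffix also picks up files such as photo.JPG, which case-sensitive patterns like *.jpg can miss on Linux. The function name find_images and the suffix set are assumptions for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # extend as needed


def find_images(root, suffixes=IMAGE_SUFFIXES):
    """Return sorted paths of all files under root whose suffix is in suffixes.

    The comparison is case-insensitive, so 'a.JPG' and 'b.Png' are included.
    """
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

The returned list of strings can be passed to cv2.imread one path at a time, as in the question.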
On another note, instead of this:
for i in range(len(img_list)):
    print(img_list[i])
do this for a simpler, more readable for loop in Python:
for img in img_list:
    print(img)
Is there a way to use the pandas library to simply load the images (as pixelated data) into a single array?
Let's say you have a folder that only contains JPEG images.
First, import everything you'll need
from os import listdir
from os.path import isfile, join
import imageio
import numpy as np  # needed for np.array below
Then, set the location of the folder that contains ONLY IMAGES. With this folder location, we will generate the list of full filenames for each and every image.
image_folder_path = "D:\\temp\\images"
onlyfiles = [f for f in listdir(image_folder_path) if isfile(join(image_folder_path, f))]
full_filenames = [join(image_folder_path,this_image) for this_image in onlyfiles]
Then, you can start an empty list, start opening one file at a time and appending them to your list.
image_list = []
for this_filename in full_filenames:
    image_rgb_values = imageio.imread(this_filename)
    image_list.append(image_rgb_values.copy())
image_list = np.array(image_list)
Now, the variable image_list has stored all the images.
This works if all images have identical dimensions (width x height). If the dimensions differ, np.array cannot build a regular array, and recent NumPy versions raise an error unless you pass dtype=object.
Hope it helps! =)