Increasing dataset size using imgaug - python

I am merging two different datasets containing images into one dataset. One of the datasets contains 600 images in the training set. The other dataset contains only 90-100 images. I want to increase the size of the latter dataset by using the imgaug library. The images are stored in folders under the name of their class. So the path for a "cake" image in the training set would be ..//images//Cake//cake_0001. I'm trying to use this code to augment the images in this dataset:
path = 'C:\\Users\\User\\Documents\\Dataset\\freiburg_groceries_dataset\\images'
ia.seed(6)
seq = iaa.Sequential([
    iaa.Fliplr(0.5),
    iaa.Crop(percent=(0, 0.1)),
    iaa.Affine(rotate=(-25, 25))
], random_order=True)
for folder in os.listdir(path):
    try:
        for i in os.listdir(folder):
            img = imageio.imread(i)
            img_aug = seq(images=img)
            iaa.imshow(img_aug)
            print(img_aug)
    except:
        pass
Right now there's no output, even if I add print(img) or imshow(img). How do I make sure I actually get more images for this dataset? Also, where is the best place in the pipeline to augment the images? Where do the augmented images get stored, and how do I see how many new images were generated?

The question was not entirely clear, so this answer addresses two issues: the errors that stop imshow() from showing anything, and saving the augmented files.
First: in the inner loop code block
img = imageio.imread(i)
img_aug = seq(images=img)
iaa.imshow(img_aug)
print(img_aug)
The first error: i is not the full file path. To fix it, replace imageio.imread(i) with imageio.imread(path+'/'+folder+'/'+i).
The second error: iaa has no imshow() attribute. Replace iaa.imshow(img_aug) with iaa.imgaug.imshow(img_aug). This fixes the visualization and lets the loop run to completion.
Second: if you run into issues saving the images, use PIL, e.g.:
from PIL import Image
im = Image.fromarray(img_aug)
im.save('img_aug.png')

It's because folder is not the path to the directory you are looking for.
You should change for i in os.listdir(folder): to for i in os.listdir(path+'\\'+folder):. Then it looks inside the path\folder directory for files.
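Putting both answers together, here is a minimal sketch of the whole loop with the path fixes applied and the augmented copies written back to disk, so you actually end up with more images on disk. The "aug_" filename prefix and the number of copies per image are my own assumptions, not part of the original question:
import os
import imageio
import imgaug as ia
import imgaug.augmenters as iaa

path = 'C:\\Users\\User\\Documents\\Dataset\\freiburg_groceries_dataset\\images'
ia.seed(6)
seq = iaa.Sequential([
    iaa.Fliplr(0.5),
    iaa.Crop(percent=(0, 0.1)),
    iaa.Affine(rotate=(-25, 25))
], random_order=True)

n_copies = 2  # assumed: how many augmented versions to create per original image
for folder in os.listdir(path):
    folder_path = os.path.join(path, folder)
    for name in os.listdir(folder_path):
        img = imageio.imread(os.path.join(folder_path, name))
        for k in range(n_copies):
            img_aug = seq(image=img)  # augment a single image
            # save the augmented copy next to the original, inside the same class folder
            imageio.imwrite(os.path.join(folder_path, f'aug_{k}_{name}'), img_aug)
Counting the files in each class folder before and after the run tells you how many new images were generated.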

Related

How to create a list of DICOM files and convert it to a single numpy array .npy?

I have a problem and don't know how to solve it:
I'm learning how to analyze DICOM files with Python.
I have an exam from a single patient: 200 DICOM files, each 512x512, each file representing a different slice of the scan. I want to turn them into a single .npy archive so I can use it in another tutorial that I found online.
Many tutorials convert them to jpg or png with OpenCV first, but I don't want that: I'm not interested in a viewable image right now, I need the array, and that step also ruins the image quality.
I already know that using:
medical_image = pydicom.read_file(file_path)
image = medical_image.pixel_array
I can grab the path and turn one slice into a pixel array and then use it, but the thing is, it doesn't work in a for loop.
The for loop I tried was basically this:
image = []  # create an empty list
for f in glob.iglob('file_path'):
    img = pydicom.dcmread(f)
    image.append(img)
It results in a list with all the files. Up to here it goes well, but it doesn't seem to be the right way: I can't find the supposed next steps anywhere, nor answers to the errors that I get in this part, so I concluded it was wrong.
The following code snippet reads the DICOM files from a folder dir_path and stores them in a list. The list does not contain the raw DICOM files but NumPy arrays of Hounsfield units (obtained with the apply_modality_lut function).
import os
from pathlib import Path
import pydicom
from pydicom.pixel_data_handlers import apply_modality_lut

dir_path = r"path\to\dicom\files"
dicom_set = []
for root, _, filenames in os.walk(dir_path):
    for filename in filenames:
        dcm_path = Path(root, filename)
        if dcm_path.suffix == ".dcm":
            try:
                dicom = pydicom.dcmread(dcm_path, force=True)
            except IOError as e:
                print(f"Can't import {dcm_path.stem}")
            else:
                hu = apply_modality_lut(dicom.pixel_array, dicom)
                dicom_set.append(hu)
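To go from that list to the single .npy file asked about, one option (my own addition, assuming all slices share the same 512x512 shape) is to stack the arrays and save them:
import numpy as np

volume = np.stack(dicom_set, axis=0)  # shape: (number_of_slices, 512, 512)
np.save(Path(dir_path, "volume.npy"), volume)  # Path and dir_path come from the snippet above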
You were well on your way. You just have to build up a volume from the individual slices that you read in. This code snippet will create a pixelVolume of dimension 512x512x200 if your data is as advertised.
import glob
import numpy
import pydicom

images = []  # create an empty list
# Read all of the DICOM images from file_path into the list "images"
for f in glob.iglob('file_path'):
    image = pydicom.dcmread(f)
    images.append(image)
# Use the first image to determine the number of rows and columns
repImage = images[0]
rows = int(repImage.Rows)
cols = int(repImage.Columns)
slices = len(images)
# This tuple represents the dimensions of the pixel volume
volumeDims = (rows, cols, slices)
# Allocate storage for the pixel volume
pixelVolume = numpy.zeros(volumeDims, dtype=repImage.pixel_array.dtype)
# Fill in the pixel volume one slice at a time
for i, image in enumerate(images):
    pixelVolume[:, :, i] = image.pixel_array
# Use pixelVolume to do something interesting
I don't know if you are a DICOM expert or a DICOM novice, but I am just accepting your claim that your 200 images make sense when interpreted as a volume. There are many ways that this may fail. The slices may not be in expected order. There may be multiple series in your study. But I am guessing you have a "nice" DICOM dataset, maybe used for tutorials, and that this code will help you take a step forward.
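As a small follow-up to the caveats above: if the files do carry an InstanceNumber tag (an assumption on my part), you can sort the slices before the filling loop, and a single numpy.save call then produces the .npy file the question asks for:
# run this right after reading the files, before the loop that fills pixelVolume
images.sort(key=lambda ds: int(ds.InstanceNumber))
# once pixelVolume is filled, save it as a single .npy archive
numpy.save('patient_volume.npy', pixelVolume)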

How to create a Data generator for an image segmentation model?

I am working on an image segmentation problem. There are 3 types of images in my dataset (Drishti_GS). One is the raw fundus image and the other two are soft map images, namely optic cup seg and optic disc seg. I am trying to make a data generator that I can train my model on. I am attaching a screenshot of the image names I got after using the following code.
import os
for root, dirs, files in os.walk("."):
    for filename in files:
        if filename.endswith(".png"):
            print(filename)
I need to load these images. I hope that someone can help me with some concrete code or some useful materials.
You can load images by:
for image_name in os.listdir("data_path"):
    image = keras.preprocessing.image.load_img('data_path/' + image_name, color_mode='rgb', target_size=(128, 128))
    names = image_name.split('_')
    if names[-1] == "cupsegSoftmap.png":
        label = ...
    elif names[-1] == "ODsegSoftmap.png":
        label = ...
You can manually set what the label value should be, as an array.
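If you need an actual generator rather than a plain loading loop, below is a minimal sketch built on keras.utils.Sequence. The pairing rule (each fundus image path matched with its mask path), the batch size and the target size are assumptions you would adapt to the Drishti_GS folder layout:
import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.image import load_img, img_to_array

class SegmentationGenerator(keras.utils.Sequence):
    def __init__(self, image_paths, mask_paths, batch_size=8, size=(128, 128)):
        # image_paths[i] is a raw fundus image, mask_paths[i] is its soft map (assumed pairing)
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.batch_size = batch_size
        self.size = size

    def __len__(self):
        return int(np.ceil(len(self.image_paths) / self.batch_size))

    def __getitem__(self, idx):
        imgs = self.image_paths[idx * self.batch_size:(idx + 1) * self.batch_size]
        masks = self.mask_paths[idx * self.batch_size:(idx + 1) * self.batch_size]
        # load and normalize one batch of images and masks
        x = np.array([img_to_array(load_img(p, color_mode='rgb', target_size=self.size)) for p in imgs]) / 255.0
        y = np.array([img_to_array(load_img(p, color_mode='grayscale', target_size=self.size)) for p in masks]) / 255.0
        return x, y

# usage sketch: model.fit(SegmentationGenerator(image_paths, mask_paths), epochs=10)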

How do I remove unknown, extra, data values from large file?

I am working on a Python/TensorFlow image classification model. In my training images I have 12,611 images, but in my training labels I have 12,613. (Each image has a number as its title, and this number corresponds to the same number in a CSV file with the accompanying information for that image.)
From here, what I need to do is simply remove those 2 extra data points for which I don't have pictures. How can I write code to help with this?
(If the code tells me which data points are the extras, I can manually remove them from the CSV file.)
Thanks for the help.
Well, it's very straightforward; you can try something like this (as I don't know exactly how and where you have saved your images, you might have to update the code to meet your use case):
import os
import pandas as pd

dir_path = r'/path/to/folder/of/images'
csv_path = r'/path/to/csv/file'
images = []
# Collect the numeric id from every image filename
for filename in os.listdir(dir_path):
    images.append(int(filename.split('.')[0]))
# Read the CSV
df = pd.read_csv(csv_path)
# Print which labels are extra
for i in df['<COLUMN_NAME>'].tolist():
    if i not in images:
        print(i)
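Continuing from the snippet above, a small sketch of removing those rows automatically instead of editing the CSV by hand (the '<COLUMN_NAME>' placeholder and the output filename are assumptions):
# keep only the rows whose id has a matching image file
df_clean = df[df['<COLUMN_NAME>'].isin(images)]
df_clean.to_csv('labels_clean.csv', index=False)
print('Removed', len(df) - len(df_clean), 'rows without a matching image')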

Having trouble reading the images using open cv

I have a folder named "seg_train" in which I have 6 labeled folders: building, tree, street, glacier, forest, sea and mountain. I am trying to read all the images in these folders using OpenCV, and for that I have written a function, but I don't know what I am doing wrong. There are approximately 14000 files in these 6 folders altogether, but the function I wrote reads only about 2300 from one folder. Could you please help?
Here's the Python code and the result it gives:
Shape of Images: (2382, 150, 150, 3)
Shape of Labels: (2382,)
I was expecting (14000, 150, 150, 3).
for image_file in os.listdir(r'C:/Users/dhvan/Desktop/intel-image-classification/seg_train/seg_train/' + labels):
    image = cv2.imread(r'C:/Users/dhvan/Desktop/intel-image-classification/seg_train/seg_train/' + labels + r'/' + image_file)
    image = cv2.resize(image, (150, 150))
    Images.append(image)
    Labels.append(label)
return shuffle(Images, Labels, random_state=812490023)
I suggest the code below. Also, I think if you try to read in 14000 images of size (150, 150, 3) you will get a resource exhaustion error, because this will use a very large amount of memory. If you are building a CNN classifier, I recommend you read the images in batches using the Keras ImageDataGenerator.flow_from_directory; documentation is here:
https://keras.io/preprocessing/image/
import os
import cv2

dir = r'C:/Users/dhvan/Desktop/intel-image-classification/seg_train/seg_train'
# Walk every class folder, then every image inside it, building the full path with os.path.join
for label in os.listdir(dir):
    path_to_label = os.path.join(dir, label)
    for image_name in os.listdir(path_to_label):
        path_to_image = os.path.join(path_to_label, image_name)
        image = cv2.imread(path_to_image)
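For the batched approach mentioned above, here is a minimal sketch of flow_from_directory; the batch size and rescaling factor are assumptions:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory(
    r'C:/Users/dhvan/Desktop/intel-image-classification/seg_train/seg_train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical')
# train_gen yields (images, labels) batches and infers the class labels from the folder names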

Read Own Multiple Images from folder and Save as a Dataset for Training

I am working on training a model on my own images, read from my folders. I would be thankful if you could help me with this.
I successfully read all my images from the folders and create my own one-hot encoded labels. However, each time I run my code it takes a long time to read all the images from the folders. Therefore, I want to create a dataset from these images and save it, like MNIST, so it loads faster and I don't have to read all the images again. Could you please help me with this?
The code is:
path = "D:/cleandata/train_data/"
loadedImages = []
labels = []
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for i in range(len(os.listdir(path))):
imagesList = listdir(path+os.listdir(path)[i])
for image in imagesList:
image_raw_data_jpg tf.gfile.FastGFile(path+os.listdir(path)
[i]+'/'+image, 'rb').read()
raw_image =tf.image.decode_png(image_raw_data_jpg,3)
gray_resize=tf.image.resize_images(raw_image, [28, 28])
image_data =
sess.run(tf.image.rgb_to_grayscale(gray_resize))
loadedImages.append(image_data)
Here is a tutorial on how to use a TFRecords file. It shows how to create the file (containing images and labels) and read from it.
http://www.machinelearninguru.com/deep_learning/tensorflow/basics/tfrecord/tfrecord.html
Or you could just use zipfile, and include the label in the image file name, thus keeping them together (that is what I did)
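For reference, here is a minimal sketch of writing such a TFRecords file with the TF 1.x API used in the question. It assumes loadedImages holds uint8 NumPy arrays and that you also build a parallel list of integer class ids called int_labels (both names are assumptions, not from the original post):
import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

with tf.python_io.TFRecordWriter('train.tfrecords') as writer:
    for image, label in zip(loadedImages, int_labels):
        image = np.asarray(image, dtype=np.uint8)
        # store the raw pixel bytes together with the shape and the label
        example = tf.train.Example(features=tf.train.Features(feature={
            'image_raw': _bytes_feature(image.tostring()),
            'height': _int64_feature(image.shape[0]),
            'width': _int64_feature(image.shape[1]),
            'label': _int64_feature(int(label)),
        }))
        writer.write(example.SerializeToString())
# reading this file back later avoids re-decoding every image from the original folders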
