Data preprocessing for CNN tips - python

I’m fairly new to deep learning and learning as I go, so sorry if this is very basic. I’m working on a model for detecting invasive coconut rhinoceros beetles destroying palm trees, using drone photography. The 1080p photos I’m given were taken at 250 ft AGL and cropped into smaller images of equal size; some contain one or more palm trees and some contain none. I’m using Label Studio to generate the XML annotation files, which point to the path of their JPG counterparts.
My current problem is converting the XML annotations into a CSV file for training and validation in Keras. The cropped images in every folder share the same names, for example:
```
Drone_img1/
    11.jpg
    12.jpg
    13.jpg
    …
    46.jpg
Drone_img2/
    11.jpg
    12.jpg
    13.jpg
    …
    46.jpg
Drone_img1000/
    11.jpg
    12.jpg
    13.jpg
    …
    46.jpg
```
I’m using a Python script written by a previous student that is supposed to split the data into separate training and validation directories, create the CSV file, and build the model. But when I run it, it appears to have a problem with the cropped images sharing the same names. My test and validation directories now look like this:
Test dir & validation dir:
```
11.jpg
11(1).jpg
11(2).jpg
12.jpg
13.jpg
13(1).jpg
152.jpg
…
999.jpg
999(1).jpg
1000.jpg
```
Note: the cropped images all had the same names but were in separate directories. When the script copies them into the test and validation directories, the names collide, so it treats them as duplicates and adds a number in parentheses.
My question: Is there a better way to convert image data with XML annotations into a CSV file without having to rename the 1000 images manually? Keep in mind that each XML annotation also stores the path of its JPG, so if I change the JPG names I’d have to change the XML annotations too.
The only thing I can think of is to write a new cropping script that ensures that the names are all different for the next time I get image data, but I would prefer to not go backward with the current data.
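If renaming is the route you take, it does not have to be manual. The sketch below assumes the layout described above (annotations in <xml folder>/<subfolder>/<name>.xml mirroring <jpg folder>/<subfolder>/<name>.jpg) and Pascal VOC-style XML with <filename> and <path> tags; flatten_with_unique_names is a hypothetical helper, not part of Label Studio. It copies each image and its XML into one flat folder under a name prefixed with the parent folder (for example Drone_img1_11.jpg) and rewrites the two tags in the copied XML, leaving the original data untouched.

```python
import os
import shutil
from xml.etree import ElementTree as ET


def flatten_with_unique_names(xml_root_dir, jpg_root_dir, out_dir):
    """Copy <subfolder>/<name>.jpg and its XML into out_dir as
    <subfolder>_<name>.jpg and <subfolder>_<name>.xml.

    Assumes xml_root_dir/<subfolder>/<name>.xml annotates
    jpg_root_dir/<subfolder>/<name>.jpg (Pascal VOC-style XML).
    out_dir must be outside xml_root_dir; the originals are not modified.
    Returns the new base names.
    """
    os.makedirs(out_dir, exist_ok=True)
    copied = []
    for root, _dirs, files in os.walk(xml_root_dir):
        folder = os.path.basename(root)  # e.g. Drone_img1
        for file in sorted(files):
            if not file.endswith(".xml"):
                continue
            stem = os.path.splitext(file)[0]  # e.g. 11
            src_jpg = os.path.join(jpg_root_dir, folder, stem + ".jpg")
            if not os.path.isfile(src_jpg):
                print(f"No image found for {os.path.join(root, file)}, skipped")
                continue
            new_stem = f"{folder}_{stem}"  # e.g. Drone_img1_11
            dst_jpg = os.path.join(out_dir, new_stem + ".jpg")
            shutil.copy2(src_jpg, dst_jpg)

            # Point the copied annotation at the renamed image
            tree = ET.parse(os.path.join(root, file))
            for tag, value in (("filename", new_stem + ".jpg"),
                               ("path", os.path.abspath(dst_jpg))):
                element = tree.getroot().find(tag)
                if element is not None:  # leave missing tags alone
                    element.text = value
            tree.write(os.path.join(out_dir, new_stem + ".xml"))
            copied.append(new_stem)
    return copied
```

Because every new name carries its folder name, a split script that copies files from this flat folder into train and validation directories can no longer produce 11(1).jpg-style duplicates.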
Edit:
Update: Looks like I need to make sure the path slashes are consistent.
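One way to make the slashes consistent, assuming the paths were written on Windows, is to convert every path to forward slashes with pathlib. Windows file APIs generally accept forward slashes too, so the converted paths should work on both Windows and Linux. to_forward_slashes is a hypothetical helper name, and the sketch assumes no file name contains a literal backslash.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path):
    """Return path with every separator written as '/'.

    PureWindowsPath treats both '\\' and '/' as separators, so mixed
    paths are normalized too. Assumes no file name contains a backslash.
    """
    return PureWindowsPath(path).as_posix()


print(to_forward_slashes(r"C:\data\Drone_img1\11.jpg"))  # C:/data/Drone_img1/11.jpg
```

In the XML-editing script, the same call could be applied to jpg_path just before it is written into the <path> tag.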
Here is a picture of the Cropped Img Directories.
This is an image of the training and validation sets that were created
Here is an image of the csv files generated.
Script I created (mostly with GPT) to edit the XML <path> tags:
```python
import os
import tkinter as tk
from tkinter import filedialog
from xml.etree import ElementTree as ET


def browse_directory():
    root = tk.Tk()
    root.withdraw()
    xml_directory = filedialog.askdirectory(parent=root, title='Choose the directory of the XML files')
    jpg_directory = filedialog.askdirectory(parent=root, title='Choose the directory of the JPG files')
    batch_edit_xml(xml_directory, jpg_directory)


def headless_mode():
    xml_directory = input("Enter the path of the XML folder: ")
    jpg_directory = input("Enter the path of the JPG folder: ")
    batch_edit_xml(xml_directory, jpg_directory)


def batch_edit_xml(xml_directory, jpg_directory):
    for root, dirs, files in os.walk(xml_directory):
        xml_files = [f for f in files if f.endswith(".xml")]
        # count restarts at 1 in every folder and only counts XML files
        for count, file in enumerate(xml_files, start=1):
            file_path = os.path.join(root, file)  # full path of the XML file
            xml_tree = ET.parse(file_path)  # parse the XML file
            xml_root = xml_tree.getroot()  # root element of the XML file
            filename = os.path.splitext(file)[0]  # file name without the extension
            # JPG path = <jpg_directory>/<same subfolder name>/<same base name>.jpg
            jpg_path = os.path.join(jpg_directory, os.path.basename(root), filename + '.jpg')
            path_element = xml_root.find('./path')
            if path_element is None:  # skip annotations without a <path> tag
                print(f"No <path> tag in {file_path}, skipped")
                continue
            path_element.text = jpg_path  # update <path> with the new JPG path
            xml_tree.write(file_path)  # write the changes back to the XML file
            print(f"{count} of {len(xml_files)}: {file_path}")  # progress within this folder
    print("Edit Complete")  # indicate that the edit is complete


if __name__ == "__main__":
    mode = input("Enter 1 for headless mode or 2 for desktop mode: ")
    if mode == '1':
        headless_mode()
    elif mode == '2':
        browse_directory()
    else:
        print("Invalid input. Please enter 1 or 2.")
```
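After running a script like the one above, a quick sanity check is to list every annotation whose <path> is missing or does not point to an existing file; a wrong base folder, or Windows-style backslashes on Linux, then shows up immediately. A minimal sketch, where find_broken_paths is a hypothetical helper:

```python
import os
from xml.etree import ElementTree as ET


def find_broken_paths(xml_directory):
    """Return (xml_file, path_text) pairs whose <path> tag is missing or
    does not point to an existing file on this machine."""
    broken = []
    for root, _dirs, files in os.walk(xml_directory):
        for file in sorted(files):
            if not file.endswith(".xml"):
                continue
            xml_file = os.path.join(root, file)
            path_text = ET.parse(xml_file).getroot().findtext("path")
            # None (no tag), "" (empty tag) or a non-existent file all count as broken
            if not path_text or not os.path.isfile(path_text):
                broken.append((xml_file, path_text))
    return broken
```

For example, printing find_broken_paths("annotations") (a placeholder folder name) should give an empty list once every <path> resolves.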

It is not hard to write another Python script that reads all the images in the test directory and saves their paths to a CSV file. Sample Python code:
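```python
import os
import pandas as pd

test_dir = "path/to/test_dir"  # the folder that holds all test images

images = []
for path, subdirs, files in os.walk(test_dir):
    for image_name in files:
        images.append(os.path.join(path, image_name))

# avoid naming the variable `dict`, which shadows the built-in type
data = {'image name': images}
df = pd.DataFrame(data)
df.to_csv('your.csv', index=False)  # index=False drops the unnamed index column
```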
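The code above only lists image paths. If the goal is a CSV of the annotations themselves, one option is to read the full image path from each XML, so files named 11.jpg in different folders never collide and nothing has to be renamed. This sketch assumes Pascal VOC-style XML with <path>, <size> and <object>/<bndbox> tags (Pascal VOC is one of Label Studio's XML export formats); voc_dir_to_csv and the column names are illustrative choices, not a format Keras requires.

```python
import csv
import os
from xml.etree import ElementTree as ET

COLUMNS = ["image_path", "width", "height", "label", "xmin", "ymin", "xmax", "ymax"]


def voc_dir_to_csv(xml_root_dir, csv_path):
    """Write one CSV row per bounding box in the Pascal VOC-style XML files
    under xml_root_dir and return the number of data rows written."""
    rows = []
    for root, _dirs, files in os.walk(xml_root_dir):
        for file in sorted(files):
            if not file.endswith(".xml"):
                continue
            ann = ET.parse(os.path.join(root, file)).getroot()
            # The full image path keeps Drone_img1/11.jpg and Drone_img2/11.jpg apart
            image = [ann.findtext("path", default=""),
                     ann.findtext("size/width", default=""),
                     ann.findtext("size/height", default="")]
            objects = ann.findall("object")
            if not objects:  # crop without any palm tree
                rows.append(image + [""] * 5)
            for obj in objects:
                box = [obj.findtext(f"bndbox/{k}", default="")
                       for k in ("xmin", "ymin", "xmax", "ymax")]
                rows.append(image + [obj.findtext("name", default="")] + box)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)
```

Crops with no palm tree get one row with empty label and box columns, so they can still serve as negatives; drop those rows if the training code expects a box on every line.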

Related

Creating subfolder and storing specified files/images in those

During one of my projects, I faced this challenge: there is a folder named Project containing multiple images (say 100), named sequentially: the first image is img_0, the second is img_1, …, and the last is img_99.
Now, based on some conditions, I need to separate out some images say img_5, img_10, img_30, img_88, img_61. My question will be, is there any way to filter out these images and make a folder inside the folder Project named "the odd ones" and store those specified images?
One extra help will be in my case. Suppose I have hundreds of such Projects folders in a sequential way Projects_1, Projects_2, Projects_3,....., Projects_99, and each contains hundreds of pictures. Can it be possible to separate all the specified photos and store them inside a separate folder inside each Projects_n folder, assuming the photos we have to separate out and store differently will be the same for each Projects_n folder?
Please help me with this. Thank you!
For the first problem, you can start from the code below (you have to define the target condition in is_target). For the second problem, you should provide more details.
```python
from glob import glob
import itertools
import shutil
import os

# Decide whether a file name is a target that has to be moved.
# Example rule only: replace TARGETS with your own condition.
TARGETS = {"img_5", "img_10", "img_30", "img_88", "img_61"}


def is_target(filename):
    return os.path.splitext(filename)[0] in TARGETS


dirname = "some/path/to/project"

# List all files in dirname that could be moved, based on their extension
# (the extensions have no "*." here, because the pattern below adds it):
types = ('png', 'jpeg', 'jpg')
filepaths = list(itertools.chain(*[glob(os.path.join(dirname, f"*.{t}")) for t in types]))

# Find the files to move:
filepaths_to_move = [fp for fp in filepaths if is_target(os.path.basename(fp))]

# Create the new subfolder:
new_folder_name = "odd_images"
new_dir = os.path.join(dirname, new_folder_name)
os.makedirs(new_dir, exist_ok=True)

# Move the files into the subfolder:
for filepath in filepaths_to_move:
    basename = os.path.basename(filepath)
    shutil.move(filepath, os.path.join(new_dir, basename))
```
Here is the logic; make the necessary changes for your use case:
```python
import os
import shutil

project_dir = "project_dir"
move_to_dir = os.path.join(project_dir, "move_to_dir")
files = [os.path.join(project_dir, file) for file in os.listdir(project_dir)]
# A set of exact names; a comma-separated string would also match substrings
filenames_to_filter = {"test1.txt", "test2.txt"}
os.makedirs(move_to_dir, exist_ok=True)
for file in files:
    if os.path.basename(file) in filenames_to_filter:
        shutil.move(file, move_to_dir)
```
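Neither answer covers the second part of the question (many Projects_n folders with the same files to separate in each). A minimal sketch, assuming the file names to move are identical in every folder and are given with their extension; move_selected and the example names are hypothetical:

```python
import glob
import os
import shutil


def move_selected(project_glob, names_to_move, subfolder="the odd ones"):
    """For every folder matching project_glob, move the listed files into
    <folder>/<subfolder>. Returns the destination paths of moved files."""
    moved = []
    for project in sorted(glob.glob(project_glob)):
        if not os.path.isdir(project):
            continue
        target_dir = os.path.join(project, subfolder)
        os.makedirs(target_dir, exist_ok=True)
        for name in names_to_move:
            src = os.path.join(project, name)
            if os.path.isfile(src):  # names missing from this folder are skipped
                dst = os.path.join(target_dir, name)
                shutil.move(src, dst)
                moved.append(dst)
    return moved
```

For example, move_selected("Projects_*", ["img_5.png", "img_10.png"]) would handle Projects_1 through Projects_99 in one call (the .png extension is an assumption).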

How to loop over a folder, and recreate them in another folder with same structure?

I want to process a folder and blur the images and recreate them in another folder while preserving the structure.
My source folder has the following structure
Data/
test1/
test2/
6.png
4.png
5.jpeg
1.jpg/
2.jpg
3.jpeg
I wanted to blur all these images and save them in another folder
```python
src = r'C:\Users\shakhansho\Downloads\Data'  # folder with images
dst = r'C:\Users\shakhansho\Downloads\Output'  # folder for output
```
Let's say I have a function blur(path_to_img) that takes the path to an image, blurs it, and saves it in the same directory.
How can I loop over the files in src, blur them, and save them in dst while preserving the structure? I would like the dst folder to contain the same folder names and image names, but with the images blurred.
I would advise using glob.glob (or glob.iglob) for this. It can recursively find all files under a directory. Then, we can simply open the images in some way, transform them, find the output file and folder, optionally create that folder, and write out transformed images. The code contains comments to elaborate these steps slightly.
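```python
import glob
import os

from PIL import Image, ImageFilter  # Pillow: one option for reading and blurring images

src = "Data"
dst = "Output"

# Recursively find all files under Data/
for filepath in glob.iglob(os.path.join(src, "**", "*.*"), recursive=True):
    # Ignore folders and non-images (the extension check is case-insensitive)
    if not os.path.isfile(filepath) or not filepath.lower().endswith((".png", ".jpg", ".jpeg")):
        continue
    # Open the image and blur it (replace this with your own blur if you have one);
    # images are binary, so never read or write them in text mode ("r"/"w")
    with Image.open(filepath) as image:
        # palette images cannot be filtered directly, so convert them first
        if image.mode == "P":
            image = image.convert("RGBA")
        blurred = image.filter(ImageFilter.GaussianBlur(radius=5))
    # Build the output path from the path relative to src, so only the
    # top-level folder changes (str.replace would also rewrite "Data"
    # if it appeared elsewhere in the path)
    output_filepath = os.path.join(dst, os.path.relpath(filepath, src))
    output_dir = os.path.dirname(output_filepath)
    # Ensure the folder exists
    os.makedirs(output_dir, exist_ok=True)
    # Write the blurred image (the format is taken from the file extension)
    blurred.save(output_filepath)
```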
I recreated your file structure, and my program was able to re-create the exact file structure, but under Output instead.
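The same idea can be written with pathlib and a callback, which fits the blur(path) function from the question once it is adapted to take an input and an output path. A sketch; process_tree is a hypothetical name, and the callback is assumed to receive (input_path, output_path):

```python
from pathlib import Path


def process_tree(src_root, dst_root, func, suffixes=(".png", ".jpg", ".jpeg")):
    """Call func(in_path, out_path) for every image under src_root, where
    out_path mirrors the file's position under dst_root.
    Returns the list of output paths."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    outputs = []
    # sorted() collects all inputs before anything is written
    for in_path in sorted(src_root.rglob("*")):
        if not in_path.is_file() or in_path.suffix.lower() not in suffixes:
            continue
        out_path = dst_root / in_path.relative_to(src_root)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        func(in_path, out_path)
        outputs.append(out_path)
    return outputs
```

For example, process_tree("Data", "Output", shutil.copyfile) mirrors the tree without changing the files; with Pillow, a callback such as lambda i, o: Image.open(i).filter(ImageFilter.GaussianBlur(5)).save(o) would blur them. Keeping dst outside src avoids picking up earlier outputs as inputs on a later run.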

Want to get desired Image path using os.walk in python

I have a folder named A, which contains some subfolders whose names start with the letter A.
These subfolders contain images in different formats (for example .png, .jpeg, .gif and .webp) with different names, like item1.png, item1.jpeg, item2.png, item3.png, etc. From these subfolders I want to get a list of the paths of the images whose names end with 1.
Along with that, I only want one image format, for example only jpeg: in some subfolders there are several images whose names end with 1 (1.png, 1.jpeg, 1.gif, etc.).
From every subfolder I only want one image whose name ends with 1 (in any image format). The code I am sharing returns the paths of all items ending with 1, in every image format.
CODE:
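Here is code that can solve your problem:
```python
import os

img_paths = []
for top, dirs, files in os.walk("your_path_goes_here"):
    for pics in files:
        # names like item1.png: the part before the extension ends with '1'
        if os.path.splitext(pics)[0].endswith('1'):
            img_paths.append(os.path.join(top, pics))
            break  # keep only the first match in each folder
```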
img_paths will contain the list you need: the first image in each subfolder whose name (without the extension) ends with 1, in any format.
In case you only want specific formats:
```python
import os

img_paths = []
for top, dirs, files in os.walk("your_path_goes_here"):
    for pics in files:
        stem, ext = os.path.splitext(pics)
        # ext[1:] drops the leading dot; lower() also accepts .PNG, .JPG, ...
        if stem.endswith('1') and ext[1:].lower() in ['png', 'jpg', 'tif']:
            img_paths.append(os.path.join(top, pics))
            break
```
Thanks to S3DEV for optimizing it.
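For the "only one format" part of the question, one approach is to collect the candidates in each folder and pick one by a preference order. A sketch, assuming the order jpeg, jpg, png, webp, gif (adjust as needed); pick_one_per_folder is a hypothetical helper:

```python
import os


def pick_one_per_folder(top, preferred=("jpeg", "jpg", "png", "webp", "gif")):
    """Return at most one path per folder under top: an image whose name
    (without extension) ends with '1', choosing the first extension in
    `preferred` that the folder has."""
    picked = []
    for folder, _dirs, files in os.walk(top):
        candidates = {}  # extension -> first matching path
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            ext = ext[1:].lower()  # '.JPEG' -> 'jpeg'
            if stem.endswith("1") and ext in preferred:
                candidates.setdefault(ext, os.path.join(folder, name))
        for ext in preferred:  # the first preferred format wins
            if ext in candidates:
                picked.append(candidates[ext])
                break
    return picked
```

Folders without any matching image simply contribute nothing to the result.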

Folders of pictures into a single PDF

I have the following problem.
I have a folder structure like this:
```
vol1/
    chap1/
        01.jpg
        02.JPG
        03.JPG
    chap2/
        04.JPG
        05.jpg
        06.jpg
    chap3/
        07.JPG
        08.jpg
        09.JPG
vol2/
    chap4/
        01.JPG
        02.jpg
        03.jpg
    chap5/
        04.jpg
        05.JPG
        06.jpg
    chap6/
        07.jpg
        08.JPG
        09.jpg
```
Inside a single vol folder, the chapters have an increasing order, and the same happens for the jpg files inside each chap folder.
Now, I would like, for each vol folder to obtain a pdf, maintaining the ordering of the pictures. Think about it as a divided comics or manga volume to be put back into a single file.
How could I do it in bash or python?
I do not know how many volumes I have, how many chapters are in a single volume, or how many jpg files are in a single chapter. In other words, I need it to work for any number of volumes, chapters, and jpgs.
An addition would be considering heterogeneous picture files, maybe having both jpg and png in a single chapter, but that's a plus.
This should work as intended. Tell me if you encounter issues.
```python
import os
from PIL import Image


def merge_into_pdf(paths, name):
    # Open every image and convert it to RGB, which the PDF writer accepts
    list_image = [Image.open(p).convert("RGB") for p in paths]
    if not list_image:
        return
    # The first image starts the PDF; the rest are appended as extra pages
    img1 = list_image.pop(0)
    img1.save(f"{name}.pdf", save_all=True, append_images=list_image)


def main():
    # sorted() keeps the vol/chap/page order; os.listdir returns an arbitrary order
    for vol in sorted(os.listdir(".")):
        # only directories whose name starts with 'vol'
        if not (vol.startswith("vol") and os.path.isdir(vol)):
            continue
        paths = []
        for chap in sorted(os.listdir(vol)):
            chap_dir = os.path.join(vol, chap)
            if not os.path.isdir(chap_dir):
                continue
            for page in sorted(os.listdir(chap_dir)):
                # accept jpg, jpeg and png in any letter case
                if page.lower().endswith((".jpg", ".jpeg", ".png")):
                    paths.append(os.path.join(chap_dir, page))
        # merge the whole volume into <vol>.pdf
        merge_into_pdf(paths, vol)


if __name__ == "__main__":
    main()
```
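sorted() compares names character by character, so chap10 would come before chap2. Since the question says the number of chapters and pages is unknown, a natural sort key may be needed whenever the names are not zero-padded. A sketch; natural_key is a hypothetical helper:

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers: 'chap10' -> ['chap', 10, '']."""
    parts = re.split(r"(\d+)", name)
    # Odd indexes hold the captured digit runs, even indexes the text between them
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


print(sorted(["chap10", "chap2", "chap1"], key=natural_key))  # ['chap1', 'chap2', 'chap10']
```

In main(), each sorted(os.listdir(...)) call could then become sorted(os.listdir(...), key=natural_key).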

How to create a CSV file from image folders?

I have two image folders for skin cancer, benign and malignant. In Python, I want to create a CSV file whose first column is the path of the image and whose second column is the label of the image. How can I do that?
Paths of the dataset:
'../input/skin-cancer-malignant-vs-benign/train/benign'
'../input/skin-cancer-malignant-vs-benign/train/malignant'
Check out the glob module:
```python
benign = glob.glob('{path to benign folder}/*.png')
malignant = glob.glob('{path to malignant folder}/*.png')
```
The * here means: take the file path of every .png file in this folder. Of course, change .png to whatever image format you are using.
Then it's just a matter of writing the data:
```python
import glob

benign = glob.glob('../input/skin-cancer-malignant-vs-benign/train/benign/*.png')
malignant = glob.glob('../input/skin-cancer-malignant-vs-benign/train/malignant/*.png')

CSV_FILE_NAME = 'my_file.csv'

with open(CSV_FILE_NAME, 'w') as f:
    for path in benign:
        f.write(path)  # write the path in the first column
        f.write(',')  # separate first and second item by a comma
        f.write('benign')  # write the label in the second column
        f.write('\n')  # start a new line
    for path in malignant:
        f.write(path)
        f.write(',')
        f.write('malignant')
        f.write('\n')
```
You can definitely write this more succinctly, but this version is a bit more readable.
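A more succinct version, using the csv module so that paths containing commas are quoted correctly. This sketch assumes each label is the name of a subfolder of the same root, and it adds a header row (path, label) that the answer above does not write; folders_to_csv is a hypothetical name:

```python
import csv
import glob
import os


def folders_to_csv(root, labels, csv_path, pattern="*.png"):
    """Write a (path, label) row for every file matching pattern in
    root/<label>/, using the folder name as the label. Returns the row count."""
    count = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])
        for label in labels:
            # glob.escape protects folder names that contain *, ? or [
            folder = glob.escape(os.path.join(root, label))
            for path in sorted(glob.glob(os.path.join(folder, pattern))):
                writer.writerow([path, label])
                count += 1
    return count
```

For the dataset in the question this would be called as folders_to_csv('../input/skin-cancer-malignant-vs-benign/train', ['benign', 'malignant'], 'my_file.csv').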
