How do I train two sets of data given both files separately? - python

I am doing a project in which I need to estimate the age of an individual from an X-ray of their hand. I am given a testing set, which contains a large collection of numbered images (in a folder on my computer), and a CSV file that maps each image number to two pieces of information: the age (in months) and whether the individual is male (given as "true" or "false"). I believe I have successfully imported both into Python (the image folder as well as the CSV file).
I have looked at many TensorFlow tutorials, but I am struggling to figure out how to associate the image numbers with the labels and how to train on the data set. Any help would be greatly appreciated!
I have attached blocks of my code, as well as how the data is presented to me, up until this point.
import pandas as pd
import numpy as np
import os
import tensorflow as tf
import cv2
from tensorflow import keras
from tensorflow.keras.layers import Dense, Input, InputLayer, Flatten
from tensorflow.keras.models import Sequential, Model
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
import random
%matplotlib inline
import matplotlib.pyplot as plt
--This simply imports libraries that I use, or anticipate using later on.
plt.figure(figsize=(20,20))
train_images=r'/Users/FOLDER/downloads/Boneage_competition/training_dataset/boneage-training-dataset'
for i in range(5):
    file = random.choice(os.listdir(train_images))
    image_path= os.path.join(train_images, file)
    img=mpimg.imread(image_path)
    ax=plt.subplot(1,5,i+1)
    ax.title.set_text(file)
    plt.imshow(img)
-- This successfully imports the image folder, as well as prints 5 random images to test if the importing worked.
This screenshot provides an example of how the pictures are depicted
IMG_WIDTH=200
IMG_HEIGHT=200
img_folder=r'/Users/FOLDER/downloads/Boneage_competition/training_dataset/'
-- I believe this resizes all the images to the specified dimensions
label_file = '/Users/FOLDER/downloads/train.csv'
train_labels = pd.read_csv (r'/Users/FOLDER/downloads/train.csv')
print (train_labels)
-- This successfully imports the data from the CSV file, and prints it, to make sure it worked.
If you have any ideas on how to connect these two datasets and train the data, I would greatly appreciate it.
Thank you!

The approach is simple: create a map between the image data and the labels. After that, you can create two lists (or NumPy arrays) and use them to pass the training data and the labels to your model. The following code should help with that.
import os
import glob
import pandas as pd
import matplotlib.image as mpimg

dic = {}
# assuming the files are in .png format; otherwise change the pattern in the glob call
train_images = '/Users/FOLDER/downloads/Boneage_competition/training_dataset/boneage-training-dataset'
for file in glob.glob(train_images + '/*.png'):
    # file name without extension, e.g. '1377' (a string)
    b_name = os.path.basename(file).split('.')[0]
    dic[b_name] = mpimg.imread(file)

dic_label_match = {}
label_file = '/Users/FOLDER/downloads/train.csv'
train_labels = pd.read_csv(label_file)
for i in range(len(train_labels)):
    # assuming the first column is the age and the image numbers start from 1;
    # the key is a string so that it matches the file-name keys in dic
    dic_label_match[str(i + 1)] = str(train_labels.iloc[i, 0])
    # you can use the column name too, if the column is called 'age'
    # dic_label_match[str(i + 1)] = str(train_labels.iloc[i]['age'])

# now you have dicts with keys and values
# create two lists / arrays that you can pass to the Keras model
train_x = []
label_ = []
for val in dic:
    if val in dic_label_match:
        train_x.append(dic[val])
        label_.append(dic_label_match[val])
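An alternative to relying on row order is to match each image to its label through an explicit ID column. The sketch below is only an illustration under stated assumptions: the column names `id`, `boneage`, and `male`, the `.png` extension, and the helper name `attach_image_paths` are all hypothetical and should be adapted to the actual CSV.

```python
import os
import tempfile

import pandas as pd


def attach_image_paths(labels, image_dir, id_column="id", extension=".png"):
    """Add a 'path' column and keep only rows whose image file exists.

    `id_column` and `extension` are assumptions about the data layout.
    """
    labels = labels.copy()
    labels["path"] = [
        os.path.join(image_dir, f"{image_id}{extension}")
        for image_id in labels[id_column]
    ]
    exists = labels["path"].map(os.path.exists)
    return labels[exists].reset_index(drop=True)


# Tiny self-contained demo with made-up values (not the real data set).
with tempfile.TemporaryDirectory() as image_dir:
    for name in ("1.png", "2.png"):
        # create empty placeholder files standing in for the X-ray images
        open(os.path.join(image_dir, name), "wb").close()
    demo_labels = pd.DataFrame(
        {"id": [1, 2, 3], "boneage": [120, 84, 150], "male": [True, False, True]}
    )
    matched = attach_image_paths(demo_labels, image_dir)
    print(matched[["id", "boneage", "male"]])
```

Each remaining row then carries the file path and its labels together, so the images can be loaded in the same order as the label column and neither list can drift out of step with the other.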

Related

How do I access image files

I was trying to open image files in Google Colab. My target is to apply an FFT for image processing. Here I have the files represented as array values. How exactly would I access an image instead of the numerical values?
!nvidia-smi
!pip install tensorflow-gpu
!pip install tensorflow_hub
from __future__ import absolute_import, division, print_function, unicode_literals
import matplotlib.pylab as plt
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import pandas as pd
import os
import cv2
from google.colab import drive
drive.mount('/content/drive')
data_root='/content/drive/My Drive/ML/AlpDatabase/DATASET'
import numpy as np
from matplotlib import pyplot as plt
data = tf.keras.utils.image_dataset_from_directory('/content/drive/My Drive/ML/AlpDatabase/DATASET')
data_iterator = data.as_numpy_iterator()
batch = data_iterator.next()
batch[0].shape
fig, ax = plt.subplots(ncols=4, figsize=(20,20))
for idx, img in enumerate(batch[0][:4]):
    ax[idx].imshow(img.astype(int))
    ax[idx].title.set_text(batch[1][idx])
So if I wanted to display a particular image from the data set, how would I do that? My goal is to read the image files from my Drive folder while keeping their classes, send them to my FFT algorithm, and then create a new data set from the processed images.
My Drive folder has a DATASET folder containing folders A, B, C, D, and so on, each with many images. Basically, it is an image data set of alphabet letters.
If I understood correctly, you basically want to read image files from the Drive folder while keeping their classes.
The code mentioned above works fine for that; it did so when I tried replicating the issue. Please check this gist for your reference.
You can also use the code below to display images after fetching them from the directory, where class_names will be the names of the folders inside your DATASET folder (A, B, C, D, ...).
data = tf.keras.utils.image_dataset_from_directory('/content/drive/My Drive/MY WORK/dataset/flowers')
class_names = data.class_names
print(class_names)
Output:
['daisy', 'dandelion', 'rose', 'sunflower', 'tulip'] #in your case ['A', 'B', 'C', 'D'...
To display images:
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
for images, labels in data.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
Output: a 3×3 grid of sample images, each titled with its class name (image not preserved).
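Since the stated goal is to pass the images to an FFT, here is a minimal sketch of that step using NumPy only. It assumes each image is an array of shape (height, width, channels), like one element of `batch[0]` in the question; the random arrays and the helper name `magnitude_spectrum` are stand-ins, not part of the original code.

```python
import numpy as np


def magnitude_spectrum(image):
    """Return the log-scaled, centred 2-D FFT magnitude of one image."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 3:
        # collapse the colour channels to one grayscale channel by averaging
        image = image.mean(axis=-1)
    # move the zero-frequency component to the centre of the spectrum
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.log1p(np.abs(spectrum))


# Stand-in for `batch[0]` and `batch[1]`: 4 random images with integer labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 64, 64, 3))
labels = np.array([0, 1, 2, 0])

# One spectrum per image; `labels` keeps the class of each processed image.
spectra = np.stack([magnitude_spectrum(img) for img in images])
print(spectra.shape, labels.shape)
```

Because the processed array keeps the same order as `labels`, the pair could be used to build the new data set while maintaining the classes.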

merging train and test datasets into one using tensorflow

I am working with the classic Titanic dataset and trying to apply NNs. My data comes already split into train and dev sets. However, I want to merge the datasets together for several purposes (for example, doing my own splitting).
Is there a way I can merge both datasets?
I have looked around and only found information about how to split a dataset, but I was unable to find how to merge them back together.
Any help?
A MWE is provided below!
from __future__ import absolute_import,division,print_function,unicode_literals
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import clear_output
from six.moves import urllib
import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf
import seaborn as sns
# URL address of data
TRAIN_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/train.csv"
TEST_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/eval.csv"
# Downloading data
train_file_path = tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
test_file_path = tf.keras.utils.get_file("eval.csv", TEST_DATA_URL)
# Reading data
data_train = pd.read_csv(train_file_path)
data_test = pd.read_csv(test_file_path)
MY_DATA= MERGE HERE????? # merge(data_train,data_test)??
I assume data_train and data_test have the same number of columns and the same column names. Then just concatenate them row-wise:
merged_df= pd.concat([data_train, data_test], axis=0)
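One detail worth knowing: with `axis=0` alone, the merged frame keeps the original row labels, so index values repeat. A small sketch with made-up rows (not the real Titanic data) shows `ignore_index=True` and one possible way to re-split afterwards; the 2/3 split ratio and the seed are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up stand-ins for data_train and data_test with identical columns.
data_train = pd.DataFrame(
    {"survived": [0, 1, 1, 0], "age": [22.0, 38.0, 26.0, 35.0]}
)
data_test = pd.DataFrame({"survived": [0, 1], "age": [28.0, 54.0]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, ...
merged_df = pd.concat([data_train, data_test], axis=0, ignore_index=True)

# Shuffle reproducibly, then cut at two thirds for a custom split.
shuffled = merged_df.sample(frac=1.0, random_state=0).reset_index(drop=True)
split_at = len(shuffled) * 2 // 3
new_train = shuffled.iloc[:split_at]
new_test = shuffled.iloc[split_at:]
print(len(new_train), len(new_test))
```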

CV2 Returning NoneType for Image?

I am currently trying to compute feature arrays for different images. The code below uses cv2 to read each image and then hog.compute to calculate its features. However, the issue is that I am getting a NoneType as output. I know that the absolute file path is needed, which is why I used os.path.abspath(file). I also know that the file is being found, as I have printed the file name and it is the file that is in the directory.
The files are located within a folder called image_dataset, which has 3 subfolders called bikes, cars and people. I'm pretty sure the first file is being read as well, but I have no clue why I get a NoneType returned when I try hog.compute(im). Any clue as to why?
import os
import numpy as np
import cv2
import glob
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
def obtain_dataset(folder_name):
    # assuming 128x128 size images and HoGDescriptor length of 34020
    hog_feature_len=34020
    hog = cv2.HOGDescriptor()
    image_dict = {'bikes':1, 'cars':2, 'people':3}
    y = []
    X = []
    #code for obtaining hog feature for one image file name
    for subdir, dirs, files in os.walk(folder_name):
        for file in files:
            if file.lower().endswith(('.png')):
                location = (os.path.abspath(file))
                im = cv2.imread(location)
                h = hog.compute(im)
    # use this to read all images in the three directories and obtain the set of features X and train labels Y
    # you can assume there are three different classes in the image dataset
    return (X,y)
train_folder_name='image_dataset'
(X_train, Y_train) = obtain_dataset(train_folder_name)
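In your case, the files sit in subdirectories, but os.walk() yields bare file names, and os.path.abspath(file) resolves a bare name against the current working directory rather than against the subdirectory being walked. The resulting path does not exist, so cv2.imread() returns None instead of raising an error. Instead, use os.path.join() to join the file name with the path of the directory that contains it: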
location = os.path.join(subdir, file)
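To see the path handling in isolation, here is a small sketch that uses only the standard library and a temporary folder with empty placeholder files. The helper name `find_png_files` and the folder contents are made up for the demonstration; the class name is taken from the containing folder, which assumes one folder per class as in the question.

```python
import os
import tempfile


def find_png_files(folder_name):
    """Return sorted (path, class_name) pairs for every .png under folder_name."""
    found = []
    for subdir, dirs, files in os.walk(folder_name):
        for file in files:
            if file.lower().endswith(".png"):
                # join with the directory currently being walked
                location = os.path.join(subdir, file)
                found.append((location, os.path.basename(subdir)))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    for class_name in ("bikes", "cars"):
        os.makedirs(os.path.join(root, class_name))
        # empty placeholder standing in for a real image
        open(os.path.join(root, class_name, "img0.png"), "wb").close()
    pairs = find_png_files(root)
    all_exist = all(os.path.exists(path) for path, _ in pairs)
    classes = [class_name for _, class_name in pairs]
    print(classes, all_exist)
```

In the original function it may also help to check `im is None` right after `cv2.imread(location)` and skip or report that file, so a bad path fails loudly before `hog.compute(im)` is reached.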

TensorFlow, what does the tensorflow_datasets.load() exactly return?

I am following a tutorial of TensorFlow ML and I am new to Python. I come from a background of languages like Java. Here is the link to the tutorial.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import tensorflow_hub as hub
import tensorflow_datasets as tfds
from tensorflow.keras import layers
# Download the Flowers Dataset using TensorFlow Datasets
(training_set, validation_set), dataset_info = tfds.load(
    'tf_flowers',
    split=['train[:70%]', 'train[70%:]'],
    with_info=True,
    as_supervised=True,
)
# the counter must be initialised before it is incremented
num_training_examples = 0
for example in training_set:
    num_training_examples += 1
# Reformat Images and Create Batches
IMAGE_RES = 224
def format_image(image, label):
    image = tf.image.resize(image, (IMAGE_RES, IMAGE_RES))/255.0
    return image, label
BATCH_SIZE = 32
train_batches = training_set.shuffle(num_training_examples//4).map(format_image).batch(BATCH_SIZE).prefetch(1)
validation_batches = validation_set.map(format_image).batch(BATCH_SIZE).prefetch(1)
I don't understand how this code operates: (training_set, validation_set), dataset_info = tfds.load. The function tfds.load downloads images of flowers. How come that training_set is iterable like some sort of array, when it should be a folder perhaps?
for example in training_set:
    num_training_examples += 1
Also how come each element in it is used in the following line as two arguments to the function format_image(image, label) in this line:
train_batches = training_set.shuffle(num_training_examples//4).map(format_image).batch(BATCH_SIZE).prefetch(1)
What is training_set exactly? Why is it not a folder that contains the following structure:
flowers_a
    file1, file2, file3 ... etc
flowers_b
    file1, file2, file3 ... etc
flowers_c
    file1, file2, file3 ... etc
etc ...
Instead, it's some sort of an array with each element containing an image and its label? For a beginner in Python such as myself, the documentation does not make clear what is happening.
As the name suggests, TensorFlow exists to "make the tensors flow". It is an entire ecosystem with data loading, preprocessing, and machine learning capabilities, so it is not built as an intuitive library that deals with NumPy arrays. TensorFlow doesn't keep everything in memory, so what TFDS returns is literally a TensorFlow dataset (a tf.data.Dataset), and you need to manipulate it as such. This means that basic information, like the count, is not always available as intuitively as with an array; one way to get it is to iterate through the whole thing. For instance, these lines you gave:
for example in training_set:
    num_training_examples += 1
It's passing all the samples and counting them. For this part:
(training_set, validation_set), dataset_info = tfds.load...
It loads the TensorFlow dataset as supervised, meaning that each element is a 2-tuple of data and label, (image, label). If you remove as_supervised=True, each element will be a dictionary instead, and you can access its parts with example['image'] and example['label'] while iterating.
Let me know if you want me to explain anything else.
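For readers coming from Java, a rough plain-Python analogy may help. This is not TensorFlow code and not how tf.data is implemented; the class `TinyDataset` and the sample values are invented purely to illustrate two ideas from the answer: an object can be iterable without being an array or a folder, and `map` hands each (image, label) tuple to the function as two separate arguments.

```python
class TinyDataset:
    """Toy stand-in for a dataset: iterable, yields (image, label) pairs lazily."""

    def __init__(self, make_examples):
        # a callable that returns a fresh iterator each time it is called
        self._make_examples = make_examples

    def __iter__(self):
        return iter(self._make_examples())

    def map(self, fn):
        # each tuple element is unpacked into separate arguments of fn
        return TinyDataset(lambda: (fn(*example) for example in self))


def examples():
    # two made-up 2x2 "images" with integer labels
    yield [[0, 255], [128, 64]], 0
    yield [[255, 255], [0, 0]], 1


def format_image(image, label):
    scaled = [[pixel / 255.0 for pixel in row] for row in image]
    return scaled, label


training_set = TinyDataset(examples)

# Counting requires a full pass, because nothing is stored as a list.
num_training_examples = 0
for example in training_set:
    num_training_examples += 1

formatted = list(training_set.map(format_image))
print(num_training_examples, formatted[0][1], formatted[1][1])
```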

How to save trajectories of tracked objects with trackpy?

I am testing http://soft-matter.github.io/trackpy/stable/
You can access my image data here: http://goo.gl/fMv5oE
My code for tracking objects in subsequent video images is:
import matplotlib.pyplot as plt
plt.rcParams['image.cmap'] = 'gray' # Set grayscale images as default.
import trackpy as tp
import pims
v = pims.ImageSequence('F:/*.png')
f = tp.batch(v[:100],diameter=21,threshold=25)
t = tp.link_df(f, 5)
How can I save t? (I am new to Python)
As a rule of thumb, you can serialize objects using pickle.
import pickle
with open("filename.pck", "wb") as fh:
    pickle.dump(t, fh)
Also, looking at the trackpy documentation, you can find some ways to store the data as a pandas DataFrame.
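If `t` is a pandas DataFrame, as the trackpy documentation describes for linked trajectories, it can also be written with pandas' own methods. The sketch below uses a small made-up frame in place of the real `t`; the column names and values are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the linked trajectories `t`.
t = pd.DataFrame(
    {
        "x": [10.5, 11.0, 40.25],
        "y": [20.0, 20.5, 5.75],
        "frame": [0, 1, 0],
        "particle": [0, 0, 1],
    }
)

with tempfile.TemporaryDirectory() as out_dir:
    csv_path = os.path.join(out_dir, "trajectories.csv")
    # index=False avoids writing the row labels as an extra column
    t.to_csv(csv_path, index=False)
    restored = pd.read_csv(csv_path)

print(restored.shape)
```

A CSV file stays readable in other tools, while `DataFrame.to_pickle` is an option when only Python will read the file back.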
