Convert a CSV file that has RGB pixel values to an image - python

I have this dataset on Kaggle which contains the pixel values of an RGB image of retina vessels as a CSV file.
I want to convert this CSV file into an image to use it in a CNN in Python. After that, I want to store the images in one npy file (X.npy) and the labels in another npy file (Y.npy), to be used with the CNN in Colab.
This is what I have been able to do so far:
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from matplotlib import pyplot as plt

# list every file in the dataset folder
for dirname, _, filenames in os.walk('C:\\Users\\Desktop\\DRIVE_Modified'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# one row per pixel; usecols ignores the order of the list,
# so select the columns again to guarantee R, G, B order
data = pd.read_csv('img1.csv', usecols=['R', 'G', 'B'])[['R', 'G', 'B']]
print(data.shape)
print(data.head(5))

# reshape the rows into a 584 x 565 RGB image and save it
image = data.to_numpy().reshape(584, 565, 3).astype(np.uint8)
plt.imshow(image)
plt.imsave('img1.png', image)
Any suggestions would be much appreciated.
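One possible way to get from per-image pixel tables to X.npy and Y.npy, as a minimal sketch rather than a definitive answer. It assumes each CSV lists its pixels in row-major order (left to right, top to bottom), that every image has the same size, and that the labels are available as a plain list. `rows_to_image`, `build_dataset`, the synthetic tables, and the labels are hypothetical names and values introduced here for illustration.

```python
import os
import tempfile

import numpy as np


def rows_to_image(rgb_rows, height, width):
    """Turn a (height*width, 3) table of R, G, B values into a uint8 image."""
    rgb_rows = np.asarray(rgb_rows)
    if rgb_rows.shape != (height * width, 3):
        raise ValueError(f"expected {(height * width, 3)}, got {rgb_rows.shape}")
    # Assumption: rows are ordered left to right, top to bottom (row-major).
    return rgb_rows.reshape(height, width, 3).astype(np.uint8)


def build_dataset(tables, labels, height, width):
    """Stack one image per table into X and put the labels into Y."""
    X = np.stack([rows_to_image(t, height, width) for t in tables])
    Y = np.asarray(labels)
    return X, Y


# Synthetic stand-in for three CSV files describing a 4 x 5 image each.
# With real data, each table could come from
# pd.read_csv(path)[['R', 'G', 'B']].to_numpy().
rng = np.random.default_rng(0)
tables = [rng.integers(0, 256, size=(4 * 5, 3)) for _ in range(3)]
X, Y = build_dataset(tables, labels=[0, 1, 0], height=4, width=5)

# Save both arrays, then load them back the way a training script would.
out_dir = tempfile.mkdtemp()
np.save(os.path.join(out_dir, "X.npy"), X)
np.save(os.path.join(out_dir, "Y.npy"), Y)
X_loaded = np.load(os.path.join(out_dir, "X.npy"))
Y_loaded = np.load(os.path.join(out_dir, "Y.npy"))
print(X_loaded.shape, Y_loaded.shape)
```

For the image in the question, the call would use height=584 and width=565; the order of the labels has to match the order of the tables.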

Related

How do I access image files

I was trying to open image files in Google Colab. My target is to apply an FFT for image processing. Here I have files labelled as array values. How exactly would I access an image instead of the numerical values?
!nvidia-smi
!pip install tensorflow-gpu
!pip install tensorflow_hub
from __future__ import absolute_import, division, print_function, unicode_literals
import matplotlib.pylab as plt
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import pandas as pd
import os
import cv2
from google.colab import drive
drive.mount('/content/drive')
data_root='/content/drive/My Drive/ML/AlpDatabase/DATASET'
import numpy as np
from matplotlib import pyplot as plt
data = tf.keras.utils.image_dataset_from_directory('/content/drive/My Drive/ML/AlpDatabase/DATASET')
data_iterator = data.as_numpy_iterator()
batch = data_iterator.next()
batch[0].shape
fig, ax = plt.subplots(ncols=4, figsize=(20,20))
for idx, img in enumerate(batch[0][:4]):
    ax[idx].imshow(img.astype(int))
    ax[idx].title.set_text(batch[1][idx])
So if I had to print a certain image from the dataset, how would I accomplish that? My goal is to send the image files to my FFT algorithm and then create a new dataset after processing. Basically, I want to read image files from my Drive folder while maintaining the classes.
My Drive folder has a DATASET folder containing A, B, C, D, ... etc. folders, each with many images. Basically an image dataset of alphabet letters.
If I understood correctly, you want to read image files from the Drive folder while maintaining the classes.
The code mentioned above worked fine for that when I tried to replicate the issue. Please check this gist for your reference.
You can also use the code below to display images after fetching them from the directory, where class_names will be the names of the folders inside your DATASET folder (A, B, C, D, ...).
data = tf.keras.utils.image_dataset_from_directory('/content/drive/My Drive/MY WORK/dataset/flowers')
class_names = data.class_names
print(class_names)
Output:
['daisy', 'dandelion', 'rose', 'sunflower', 'tulip'] #in your case ['A', 'B', 'C', 'D'...
To display images:
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
for images, labels in data.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
Output: (a 3 x 3 grid of sample images, each titled with its class name; image not reproduced here)
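Since the stated goal is to pass each image to an FFT routine, here is a rough sketch of that step using NumPy. It assumes a batch shaped (N, height, width, channels), like batch[0] in the question, and averages the channels as a crude grayscale conversion. `log_magnitude_spectrum` is a hypothetical helper, and the random batch is only a stand-in for real images.

```python
import numpy as np


def log_magnitude_spectrum(image):
    """Return the centred log-magnitude 2-D FFT spectrum of one image."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 3:
        # Assumption: averaging the channels is an acceptable grayscale.
        image = image.mean(axis=-1)
    # fftshift moves the zero-frequency term to the centre of the array.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.log1p(np.abs(spectrum))


# Stand-in for one batch of four 32 x 32 RGB images.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 32, 32, 3))

# One spectrum per image; the class labels can be kept alongside unchanged.
spectra = np.stack([log_magnitude_spectrum(img) for img in batch])
print(spectra.shape)
```

With a tf.data dataset, the images of a batch could be converted with .numpy() first and the resulting spectra saved per class folder to form the new dataset.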

How do I train two sets of data given both files separately?

I am doing a project in which I need to estimate the age of an individual, given an X-ray of their hand. I am given a testing set, which contains a large collection of images (in a folder on my computer), all numbered, and I am also given a CSV file that maps each image number to 2 pieces of information: the age (in months), and whether the individual is male (given as "true" or "false"). I believe I have successfully imported both of these into Python (the image folder as well as the CSV file).
I have looked at many TensorFlow tutorials, but I am struggling to figure out how to associate the image numbers with the labels and how to train on the dataset. Any help would be greatly appreciated!
I have attached blocks of my code, as well as how the data is presented to me, up to this point.
import pandas as pd
import numpy as np
import os
import tensorflow as tf
import cv2
from tensorflow import keras
from tensorflow.keras.layers import Dense, Input, InputLayer, Flatten
from tensorflow.keras.models import Sequential, Model
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
import random
%matplotlib inline
import matplotlib.pyplot as plt
--This simply imports libraries that I use, or anticipate using later on.
plt.figure(figsize=(20,20))
train_images=r'/Users/FOLDER/downloads/Boneage_competition/training_dataset/boneage-training-dataset'
for i in range(5):
    file = random.choice(os.listdir(train_images))
    image_path = os.path.join(train_images, file)
    img = mpimg.imread(image_path)
    ax = plt.subplot(1, 5, i + 1)
    ax.title.set_text(file)
    plt.imshow(img)
-- This successfully imports the image folder, as well as prints 5 random images to test if the importing worked.
This screenshot provides an example of how the pictures are depicted
IMG_WIDTH=200
IMG_HEIGHT=200
img_folder=r'/Users/FOLDER/downloads/Boneage_competition/training_dataset/'
-- I believe this resizes all the images to the specified dimensions
label_file = '/Users/FOLDER/downloads/train.csv'
train_labels = pd.read_csv (r'/Users/FOLDER/downloads/train.csv')
print (train_labels)
-- This successfully imports the data from the CSV file, and prints it, to make sure it worked.
If you have any ideas on how to connect these two datasets and train the data, I would greatly appreciate it.
Thank you!
The approach is simple: create a map between the image data and the labels. After that, you can create two lists/np.arrays and use them to pass the training images and labels to your model. The following code should help with that.
import glob
import os

import matplotlib.image as mpimg
import pandas as pd

dic = {}
# assuming the files are in .png format; otherwise change the pattern in the glob call
train_images = '/Users/FOLDER/downloads/Boneage_competition/training_dataset/boneage-training-dataset'
for file in glob.glob(train_images + '/*.png'):
    b_name = os.path.basename(file).split('.')[0]
    dic[b_name] = mpimg.imread(file)

dic_label_match = {}
label_file = '/Users/FOLDER/downloads/train.csv'
train_labels = pd.read_csv(label_file)
for i in range(len(train_labels)):
    # assuming the first column is the age and image numbers start from 1;
    # the key is a string so that it matches the file-name keys of dic
    dic_label_match[str(i + 1)] = str(train_labels.iloc[i, 0])
    # you can use the line below too, if the column is named 'age'
    # dic_label_match[str(i + 1)] = str(train_labels.iloc[i]['age'])

# now you have two dicts keyed by image number
# create two lists / arrays that you can pass to the Keras model
train_x = []
label_ = []
for val in dic:
    if val in dic_label_match:
        train_x.append(dic[val])
        label_.append(dic_label_match[val])
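If the CSV carries the image number in a column of its own, matching on that column avoids relying on row order. The following is only a sketch of that idea using the standard library; the column names id, boneage, and male, the sample rows, and the helpers `load_labels` and `pair_files_with_labels` are assumptions made for illustration and are not taken from the actual file.

```python
import csv
import io
import os


def load_labels(csv_file):
    """Map each image id (as a string) to (age in months, is_male)."""
    labels = {}
    for row in csv.DictReader(csv_file):
        is_male = row["male"].strip().lower() == "true"
        labels[row["id"]] = (float(row["boneage"]), is_male)
    return labels


def pair_files_with_labels(file_names, labels):
    """Return (file name, age, is_male) for every file that has a label."""
    pairs = []
    for name in sorted(file_names):
        # "1377.png" -> "1377", which is the key used in the label dict
        image_id = os.path.splitext(os.path.basename(name))[0]
        if image_id in labels:
            pairs.append((name, *labels[image_id]))
    return pairs


# Hypothetical CSV content; a real file would be opened with open(path, newline="").
csv_text = "id,boneage,male\n1377,180,False\n1378,12,True\n"
labels = load_labels(io.StringIO(csv_text))

# Files without a label (9999.png here) are skipped.
pairs = pair_files_with_labels(["1378.png", "1377.png", "9999.png"], labels)
for name, age, is_male in pairs:
    print(name, age, is_male)
```

The file names in the resulting pairs can then be loaded and resized in the same order as the ages, so that images and labels stay aligned when passed to a model.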

Saving a multidimensional array as tiff image in python

I am working on a hyperspectral image (HSI) of Indian Pines. Initially the data was stored in a .mat file, and I read it into an array using the loadmat function. The array dimensions are (145,145,200). When I try to save this array as a TIFF (.tif) image, things do not work well. I am using the tifffile package and its imwrite function to save the image, but when I open the image in QGIS it shows only one band instead of 200 bands.
I am attaching the code below:
import tifffile
import numpy as np
from scipy.io import loadmat
def read_HSI():
    X = loadmat('Indian_pines_corrected.mat')['indian_pines_corrected']  # dataset
    y = loadmat('Indian_pines_gt.mat')['indian_pines_gt']  # ground truth
    print(f"X shape: {X.shape}\ny shape: {y.shape}")
    return X, y
X, y = read_HSI()
tifffile.imwrite('IndianPines(inputX).tif', X)
If there is any other way to save a .mat file in .tif format, please let me know.
Thank you in advance.
I was able to save it using scikit-image:
import numpy as np
import skimage.data
import skimage.io
import matplotlib.pyplot as plt
# stack two RGB images into a 6-channel array, then move the channel axis to the front
data = np.dstack([skimage.data.astronaut(), skimage.data.astronaut()])
data = data.swapaxes(0, 2)
data = data.swapaxes(1, 2)
plt.imshow(data[4])
Under the hood it uses the tifffile plugin:
skimage.io.imsave('test2.tiff', data, photometric='minisblack')
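The two swapaxes calls in the answer move the channel (band) axis from the last position to the first. A small NumPy-only sketch of that reordering follows, using a tiny stand-in cube rather than the real (145, 145, 200) data. Whether a particular TIFF reader then shows each leading-axis slice as a separate band depends on the writer options and the reader, which this sketch does not cover.

```python
import numpy as np

# Small stand-in for a (rows, cols, bands) hyperspectral cube.
cube = np.arange(4 * 5 * 6).reshape(4, 5, 6)

# The answer's approach: two axis swaps, (4, 5, 6) -> (6, 5, 4) -> (6, 4, 5).
via_swaps = cube.swapaxes(0, 2).swapaxes(1, 2)

# The same reordering in one call: move the last axis to the front.
via_moveaxis = np.moveaxis(cube, -1, 0)

print(via_swaps.shape, via_moveaxis.shape)
print(np.array_equal(via_swaps, via_moveaxis))
```

After the move, index k along the first axis is band k of the original cube, i.e. via_moveaxis[k] holds the same values as cube[:, :, k].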

Converting image folder to numpy array is consuming the entire RAM

I am trying to convert the images folder of the celebA dataset (https://www.kaggle.com/jessicali9530/celeba-dataset) into a numpy array, to be converted later into a .pkl file (so the data can be used as simply as mnist or cifar).
I am looking for a better way of converting, since this method consumes the whole RAM.
from PIL import Image
import pickle
from glob import glob
import numpy as np
TARGET_IMAGES = "img_align_celeba/*.jpg"
def generate_dataset(glob_files):
    dataset = []
    for file_name in sorted(glob(glob_files)):
        img = Image.open(file_name)
        pixels = list(img.getdata())
        dataset.append(pixels)
    return np.array(dataset)
celebAdata = generate_dataset(TARGET_IMAGES)
I am rather curious how the mnist authors did this themselves, but any approach that works is welcome.
You can transform any kind of data on the fly in Keras and load one batch at a time into memory during training.
See the documentation; search for 'Example of using .flow_from_directory(directory)'.
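A different technique from the Keras generator suggested above, offered only as a sketch: if a single array file is still wanted, the images can be streamed into a memory-mapped .npy file as uint8 instead of being collected in Python lists (list(img.getdata()) creates one Python object per pixel, which takes far more memory than one byte per channel). `write_images_to_npy` and `fake_images` are hypothetical helpers; with real files, each item could be np.asarray(Image.open(file_name)), assuming all images have the same size.

```python
import os
import tempfile

import numpy as np


def write_images_to_npy(images, count, image_shape, path):
    """Stream `count` uint8 images of shape `image_shape` into a .npy file."""
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.uint8, shape=(count, *image_shape)
    )
    for i, img in enumerate(images):
        out[i] = img  # images are written one at a time
    out.flush()


def fake_images(count, image_shape):
    """Hypothetical stand-in for reading files: image i is filled with value i."""
    for i in range(count):
        yield np.full(image_shape, i, dtype=np.uint8)


path = os.path.join(tempfile.mkdtemp(), "images.npy")
write_images_to_npy(fake_images(5, (8, 8, 3)), 5, (8, 8, 3), path)

# mmap_mode="r" reads slices lazily instead of loading the whole file.
dataset = np.load(path, mmap_mode="r")
print(dataset.shape, dataset.dtype)
```

The resulting file can also be loaded with a plain np.load(path) on a machine with enough RAM, so it behaves like a regular saved array.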

support vector machines for classifying images

I am trying to use SVMs to classify a set of images I have on my computer into 3 categories.
My problem is how to load the data: in the following example, the author uses a dataset that is already saved.
http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html
I have all the images in PNG format saved in a folder on my PC.
You can load data as numpy arrays using Pillow, in this way:
from PIL import Image
import numpy as np
data = np.array(Image.open('yourimg.png')) # .astype(float) if necessary
Couple it with os.listdir to read multiple files, e.g.
import os
for file in os.listdir('your_dir/'):
    img = Image.open(os.path.join('your_dir/', file))
    data = np.array(img)
    your_model.train(data)
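Classifiers such as an SVM usually expect one 2-D array with a row per image and a matching label array, rather than one call per file. A minimal sketch of that preparation step follows, assuming all images have the same size. `images_to_feature_matrix`, the random images, and the labels are hypothetical stand-ins for arrays loaded with Pillow as shown above.

```python
import numpy as np


def images_to_feature_matrix(images):
    """Flatten equally sized images into rows of an (n_samples, n_features) array."""
    return np.stack([np.asarray(img, dtype=np.float64).ravel() for img in images])


# Synthetic stand-ins for six 8 x 8 grayscale PNGs in 3 categories.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(8, 8)) for _ in range(6)]
labels = np.array([0, 0, 1, 1, 2, 2])

X = images_to_feature_matrix(images) / 255.0  # scale pixel values to [0, 1]
y = labels
print(X.shape, y.shape)
```

With scikit-learn installed, these arrays can be passed to an estimator in a single call, for example sklearn.svm.SVC().fit(X, y). The linked digits example flattens its images in the same way before fitting.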
