I am trying to remove EXIF data from images in a dataset (which I will use in transfer learning). However, it does not seem to be working. Below is my code:
import os
from PIL import Image
import piexif
import imghdr
from tqdm import tqdm
import warnings
Folder = 'drive/My Drive/PetImages'
labels = ['Dog', 'Cat']

for label in labels:
    imageFolder = os.path.join(Folder, label)
    listImages = os.listdir(imageFolder)
    for img in tqdm(listImages):
        imgPath = os.path.join(imageFolder, img)
        try:
            img = Image.open(imgPath)
            data = list(img.getdata())
            image_without_exif = Image.new(img.mode, img.size)
            image_without_exif.putdata(data)
            image_without_exif.save(img)
            print("done")
        except:
            print("except")
I tried saving the image using PIL (as per a previously asked question, Python: Remove Exif info from images), but the output consists purely of "except"s.
I tried again using the piexif module, as below:
# Same imports as above
Folder = 'drive/My Drive/PetImages'
labels = ['Dog', 'Cat']

for label in labels:
    imageFolder = os.path.join(Folder, label)
    listImages = os.listdir(imageFolder)
    for img in tqdm(listImages):
        imgPath = os.path.join(imageFolder, img)
        try:
            ImageType = img.format
            # warnings.filterwarnings("error")
            if ImageType in ["JPEG", "TIF", "WAV"]:
                exif_data = img._getexif()
                print(exif_data)
                piexif.remove(img)
                print("done")
        except:
            print("except")
In the code above, I check the image type first to make sure the method _getexif() actually exists, then I remove the data after saving it in the exif_data variable. The output consisted of "except"s and the occasional EXIF data (in the form of a dictionary), or "None" if it doesn't exist, but never the word "done". Why doesn't it reach that part?
For anyone stumbling upon this through Google, there is a simple solution using PIL:
from PIL import Image
im = Image.open('some-image.jpg')
# this clears all exif data
im.getexif().clear()
im.save('some-image-without-exif.jpg')
I thought that getexif() only allows read access as the name might imply, but it turns out that this is not the case.
Edit: In my case, it even worked to just load and save the file, without im.getexif().clear(). I don't know how reliable that is, though.
That command definitely removes exif-data from the image-object, though. This can be simply tested in a Python shell:
>>> from PIL import Image
>>> im = Image.open('some-image.jpg')
>>> print(im.getexif())
{296: 2, 282: 72.0, 283: 72.0 ..... }
>>> im.getexif().clear()
>>> print(im.getexif())
{}
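For completeness, here is how that could be applied to the folder loop from the original question. This is a minimal sketch, assuming the same Google Drive paths and that overwriting the files in place is acceptable:

import os
from PIL import Image
from tqdm import tqdm

Folder = 'drive/My Drive/PetImages'
for label in ['Dog', 'Cat']:
    imageFolder = os.path.join(Folder, label)
    for name in tqdm(os.listdir(imageFolder)):
        imgPath = os.path.join(imageFolder, name)
        try:
            im = Image.open(imgPath)
            im.getexif().clear()
            im.save(imgPath)  # overwrite in place; keep a backup if unsure
        except OSError as e:
            print(imgPath, e)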
I am working on an image classification problem using Keras/TensorFlow. Since I am using an IDE like PyCharm (I also use Jupyter Notebook), I am curious to know whether there is any way to load the dataset from the directory only once, so that when I re-run the whole .py file I can just use the images from the already loaded data.
import os
import cv2
import numpy as np

labels = ['rugby', 'soccer']
img_size = 224

def get_data(data_dir):
    data = []
    for label in labels:
        path = os.path.join(data_dir, label)
        class_num = labels.index(label)
        for img in os.listdir(path):
            try:
                img_arr = cv2.imread(os.path.join(path, img))[..., ::-1]  # convert BGR to RGB format
                resized_arr = cv2.resize(img_arr, (img_size, img_size))  # resize images to the preferred size
                data.append([resized_arr, class_num])
            except Exception as e:
                print(e)
    return np.array(data)
Now we can easily fetch our train and validation data.
train = get_data('../input/traintestsports/Main/train')
val = get_data('../input/traintestsports/Main/test')
Every time get_data is called, it takes additional time to load the entire dataset.
You can read in each image using the cv2.imread() method, put all the images into a single array, and use the np.save() method to write that array to a binary file in .npy format:
import cv2
import numpy as np
imgs = ['image1.png', 'image2.png', 'image3.png', 'image4.png']
# Map each str to cv2.imread, convert map object to list, and convert list to array
arr = np.array(list(map(cv2.imread, imgs)))
np.save('data.npy', arr)
When you want to access the data, you can use the np.load() method:
import numpy as np
arr = np.load('data.npy')
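As a minimal sketch of how this could be combined with the question's get_data (the cache filename is an assumption, and allow_pickle=True is needed because get_data returns an object array):

import os
import numpy as np

cache = 'train_cache.npy'  # hypothetical cache file
if os.path.exists(cache):
    train = np.load(cache, allow_pickle=True)
else:
    train = get_data('../input/traintestsports/Main/train')
    np.save(cache, train)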
You can install cv2 (OpenCV) from the command prompt with:
pip install opencv-python
and numpy with
pip install numpy
If you have a more complex data type, you can use the pickle.dump() method to save your data serialized into a file:
import pickle
data = {"data": ['test', 1, 2, 3]} # Replace this with your dataset
with open("data.pickle", "wb") as f:
pickle.dump(data, f)
When you want to access the data, you can use the pickle.load() method:
import pickle
with open("data.pickle", "rb") as f:
data = pickle.load(f)
print(data)
Output:
{'data': ['test', 1, 2, 3]}
The pickle module is built into Python.
I created a model in Blender. From there I took 2D slices through the y-plane of that model, leading to the following:

600 .png files, each corresponding to a y-location, i.e. y=0, y=0.1, etc.

Each .png file has a resolution of 500 x 600.

I am now trying to merge the 600 .png files into an .h5 file using Python, before loading the .h5 into some software. Each individual .png file is read fine and looks great. However, when I look at the final 3D image there is some stretching, and I'm not sure how it is being introduced.

The images are resized (from 600x600 to 500x600, but I have checked and this is not the cause of the stretching). I would like to know why I am introducing such stretching in other planes (not the y-plane).

Here is my code. Please note that there is some work in progress here, which is why I append the dataset to a list (this is to be used by later code).
from PIL import Image
import sys
import os
import fnmatch
import h5py
import numpy as np
import cv2
from datetime import datetime

dir_path = os.path.dirname(os.path.realpath(__file__))
sys.path.append(dir_path + '//..//..')

Xlen = 500
Ylen = 600
Zlen = 600

directory = dir_path + "/LowPolyA21/"
for filename in os.listdir(directory):
    if fnmatch.fnmatch(filename, '*.png'):
        image = Image.open(directory + filename)
        new_image = image.resize((Zlen, Xlen))
        new_image.save(directory + filename)

dataset = np.zeros((Xlen, Zlen, Ylen), float)

# traverse all the pictures under the specified directory
cnt_num = 0
img_list = sorted(os.listdir(directory))
os.chdir(directory)
for img in img_list:
    if img.endswith(".png"):
        gray_img = cv2.imread(img, 0)
        dataset[:, :, cnt_num] = gray_img
        cnt_num += 1

dataset[dataset == 0] = -1
dataset = dataset.swapaxes(1, 2)

datasetlist = []
datasetlist.append(dataset)

dz_dy_dz = (float(0.001), float(0.001), float(0.001))

# 'i' indexes datasetlist and is defined in later (omitted) code
for j in range(Xlen):
    for k in range(Ylen):
        for l in range(Zlen):
            if datasetlist[i][j, k, l] > 1:
                datasetlist[i][j, k, l] = 1

now = datetime.now()
timestamp = now.strftime("%d%m%Y_%H%M%S%f")
out_h5_path = 'voxelA_' + timestamp + '_flipped'
out_h5_path2 = 'voxelA_' + timestamp + '_flipped.h5'

with h5py.File(out_h5_path2, 'w') as f:
    f.attrs['dx_dy_dz'] = dz_dy_dz
    f['data'] = datasetlist[i]  # write the data under the file's key "data"
Example of image without stretching (in y-plane)
Example of image with stretching (in x-plane)
I have a folder of DICOM images, and I stored these images in an array; I would like to write them out to a different folder.
I cannot find a method that will write out each of the images the way cv2.imwrite does.
import pydicom
import skimage, os
import numpy as np

FolderPathName = r'FolderPathName'
slices = [pydicom.read_file(FolderPathName + imagename) for imagename in os.listdir(FolderPathName)]
# Sort the dicom slices in their respective order
slices.sort(key=lambda x: int(x.InstanceNumber))
for x in range(len(slices)):
    # write the images in a new folder
Method 1:
In your case, the answer is:
import pydicom
import skimage, os
import numpy as np
import cv2

FolderPathName = r'FolderPathName'
slices = [pydicom.read_file(FolderPathName + imagename) for imagename in os.listdir(FolderPathName)]
# Sort the dicom slices in their respective order
slices.sort(key=lambda x: int(x.InstanceNumber))

jpg_folder = ''  # Set your jpg folder
for idx in range(len(slices)):
    # write the images into the new folder
    jpg_filepath = os.path.join(jpg_folder, "pic-{}.jpg".format(idx))
    np_pixel_array = slices[idx].pixel_array
    cv2.imwrite(jpg_filepath, np_pixel_array)
Method 2:
But there is a better way to process DICOM files:
import pydicom
import os
import numpy as np
import cv2

dicom_folder = ''  # Set the folder of your dicom files that include images
jpg_folder = ''  # Set the output folder for jpg files

# Step 1. Prepare your input (.dcm) and output (.jpg) file paths
dcm_jpg_map = {}
for dicom_f in os.listdir(dicom_folder):
    dicom_filepath = os.path.join(dicom_folder, dicom_f)
    jpg_f = dicom_f.replace('.dcm', '.jpg')
    jpg_filepath = os.path.join(jpg_folder, jpg_f)
    dcm_jpg_map[dicom_filepath] = jpg_filepath

# Now dcm_jpg_map holds key/value pairs of input dcm filepath and output jpg filepath

# Step 2. Process your images using the input/output information
for dicom_filepath, jpg_filepath in dcm_jpg_map.items():
    # convert dicom file into jpg file
    dicom = pydicom.read_file(dicom_filepath)
    np_pixel_array = dicom.pixel_array
    cv2.imwrite(jpg_filepath, np_pixel_array)
In the code above, Step 1 focuses on file path handling, which makes it easy to port the code to a different environment. Step 2 is the main part, which handles whatever image processing you need.
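One caveat: many DICOM files store 16-bit pixel data, which JPEG cannot represent, so cv2.imwrite may clip or darken the output. A hedged sketch of a common workaround, rescaling to 8-bit first:

import numpy as np

def to_uint8(arr):
    # Linearly rescale arbitrary-range pixel data into 0..255
    arr = arr.astype(np.float64)
    arr -= arr.min()
    if arr.max() > 0:
        arr *= 255.0 / arr.max()
    return arr.astype(np.uint8)

# e.g. cv2.imwrite(jpg_filepath, to_uint8(dicom.pixel_array))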
I'm working on the Japanese Female Facial Expression (JAFFE) Database. You can find the database at this link: http://www.kasrl.org/jaffe.html.
When I downloaded the database I got a list of pictures. I would like to convert these image files into a CSV file, but I'm still new to deep learning and I don't know how. Someone proposed that I work with OpenCV. What should I do?
I have a simple example; I hope this helps you.
from PIL import Image
import numpy as np
import sys
import os
import csv

def createFileList(myDir, format='.jpg'):
    fileList = []
    print(myDir)
    for root, dirs, files in os.walk(myDir, topdown=False):
        for name in files:
            if name.endswith(format):
                fullName = os.path.join(root, name)
                fileList.append(fullName)
    return fileList

# load the original images
myFileList = createFileList('path/to/directory/')

for file in myFileList:
    print(file)
    img_file = Image.open(file)

    # get original image parameters...
    width, height = img_file.size
    format = img_file.format
    mode = img_file.mode

    # Make image greyscale
    img_grey = img_file.convert('L')

    # Flatten the pixel values into a single row
    value = np.asarray(img_grey.getdata(), dtype=int).reshape((img_grey.size[1], img_grey.size[0]))
    value = value.flatten()
    print(value)

    # Append the row to the CSV (one row per image)
    with open("img_pixels.csv", 'a') as f:
        writer = csv.writer(f)
        writer.writerow(value)
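To read the rows back into image arrays, a hedged sketch (it assumes every image has the same dimensions; JAFFE images are 256x256):

import numpy as np

rows = np.genfromtxt("img_pixels.csv", delimiter=",")
images = rows.reshape((-1, 256, 256))  # 256x256 assumed for JAFFE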
Install pillow, numpy, and pandas.
Convert the image to RGB.
Put the RGB values along with the x, y coordinates in a pandas DataFrame.
Save the DataFrame as CSV.
Sample working code below:
from PIL import Image
from numpy import array, moveaxis, indices, dstack
from pandas import DataFrame
image = Image.open("data.tiff")
pixels = image.convert("RGB")
rgbArray = array(pixels.getdata()).reshape(image.size + (3,))
indicesArray = moveaxis(indices(image.size), 0, 2)
allArray = dstack((indicesArray, rgbArray)).reshape((-1, 5))
df = DataFrame(allArray, columns=["y", "x", "red","green","blue"])
print(df.head())
df.to_csv("data.csv",index=False)
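To verify the export, the CSV can be read back with pandas:

import pandas as pd

df = pd.read_csv("data.csv")
print(df.shape)  # one row per pixel: y, x, red, green, blue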
You don't need to write any code; you can just use vips on the command line on macOS, Linux or Windows.
So, in Terminal (or Command Prompt, if on Windows):
vips im_vips2csv TM.AN1.190.tiff result.csv
will convert the 256x256 greyscale image TM.AN1.190.tiff into a 256-line CSV with 256 entries per line. Simples!
If you want to replace the tab separators with commas, you can do:
tr '\t' , < result.csv > NewFile.csv
I am trying to open a set of images in Python, but I am a bit puzzled about how I should do that. I know how to do it with one image, but I don't have a clue how to handle several hundred images.
I have a file folder with a few hundred .jpg images. I want to load them into a Python program to do machine learning on them. How can I do this properly?
I don't have any code yet since I am already struggling with this.
But my idea in pseudocode was:
dataset = load(images)
do some manipulations on it
How I have done it before:
from sklearn.svm import LinearSVC
from numpy import genfromtxt,savetxt
load = lambda x: genfromtxt(open(x,"r"),delimiter = ",",dtype = "f8")[1:]
dataset = load("train.csv")
train = [x[1:] for x in dataset]
target = [x[0] for x in dataset]
test = load("test.csv")
linear = LinearSVC()
linear.fit(train,target)
savetxt("digit2.csv",linear.predict(test),delimiter = ",", fmt = "%d")
This worked fine because of the format: all the data was in one file.
If you want to process each image individually (assuming you're using PIL or Pillow) then do so sequentially:
import os
from glob import glob

try:
    # PIL
    import Image
except ImportError:
    # Pillow
    from PIL import Image

def process_image(img_path):
    print("Processing image: %s" % img_path)
    # Open the image
    img = Image.open(img_path)
    # Do your processing here
    print(img.info)
    # Not strictly necessary, but let's be explicit:
    # Close the image
    del img

images_dir = "/home/user/images"

if __name__ == "__main__":
    # List all JPEG files in your directory (glob returns the full paths)
    images_list = glob(os.path.join(images_dir, "*.jpg"))
    for img_path in images_list:
        process_image(img_path)
Read the documentation on the Python glob module, and process each of the images in turn in a loop.
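If the goal is the dataset = load(images) pseudocode above, i.e. a single array for machine learning, a minimal sketch (assuming Pillow, equally sized images, and a placeholder directory) could be:

import os
from glob import glob

import numpy as np
from PIL import Image

# Stack every JPEG in the folder into one (N, H, W, 3) array
paths = sorted(glob(os.path.join("/home/user/images", "*.jpg")))
dataset = np.stack([np.asarray(Image.open(p).convert("RGB")) for p in paths])
print(dataset.shape)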