Saving an Image file using binary Files - pyspark


How can I save an image file (JPG format) to my local file system? I used binaryFiles to load the pictures into Spark, converted them into arrays, and processed them. Below is the code:
from io import BytesIO
from PIL import Image
import numpy as np
import math
images = sc.binaryFiles("path/car*")
# Each record is (path, raw file bytes); decode the bytes into a NumPy array
imagerdd = images.map(lambda kv: (kv[0], np.asarray(Image.open(BytesIO(kv[1])))))
# (image processing omitted; its result is imagelapRDD, where the key is the path and the value is the image array)
imageOutuint = imagelapRDD.map(lambda kv: (kv[0], kv[1].astype(np.uint8)))
imageOutIMG = imageOutuint.map(lambda kv: (kv[0], Image.fromarray(kv[1])))
How can I save the images to the local file system or HDFS? I see no option pertaining to it.

If you want to save data to the local file system, collect the RDD as a local iterator and use standard tools to save the files record by record:
for x, img in imageOutIMG.toLocalIterator():
    path = ... # Some path .jpg (based on x?)
    img.save(path)
Just be sure to cache imageOutIMG to avoid recomputation.
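One way to fill in the "path based on x" step is sketched below. The helper names local_path_for and save_all, and the naming scheme itself, are assumptions for illustration and are not part of Spark or Pillow. The sketch assumes the keys are URIs such as those produced by binaryFiles and that the values are Pillow images; it does not handle two keys that share the same base name.

```python
import os
from urllib.parse import urlparse


def local_path_for(key, out_dir, ext=".jpg"):
    """Map a key such as 'file:/data/car1.png' to '<out_dir>/car1.jpg'.

    Hypothetical helper: the naming scheme is an assumption.
    """
    # Keys are URIs; keep only the base name of the path component.
    name = os.path.basename(urlparse(key).path)
    stem, _ = os.path.splitext(name)
    return os.path.join(out_dir, stem + ext)


def save_all(records, out_dir):
    """Save (key, image) pairs from any iterable, e.g. rdd.toLocalIterator()."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for key, img in records:
        path = local_path_for(key, out_dir)
        img.save(path)  # Pillow picks the format from the file extension
        saved.append(path)
    return saved
```

With the RDD from the question, this might be called as save_all(imageOutIMG.toLocalIterator(), "out"), where "out" is a placeholder directory.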

Related

How to load images from memory to numpy using file system

I want to store my image directory in memory, then load the images into a numpy array.
The usual way to load images that are not in memory is as follows:
import PIL.Image
import numpy as np
image = PIL.Image.open("./image_dir/my_image_1.jpg")
image = np.array(image)
However, I am not sure how to do this when the images are in memory. So far, I have been able to set up the following starter code:
import fs
import fs.copy
import fs.memoryfs
import fs.osfs
image_dir = "./image_dir"
mem_fs = fs.memoryfs.MemoryFS()
drv_fs = fs.osfs.OSFS(image_dir)
fs.copy.copy_fs(drv_fs, mem_fs)
print(mem_fs.listdir('.'))
Returns:
['my_image_1.jpg', 'my_image_2.jpg']
How do I load images that are in memory into numpy?
I am also open to alternatives to the fs package.

As per the documentation, Pillow's Image.open accepts a file object instead of a file name, so as long as your in-memory file package provides Python file objects (which it most likely does), you can just use them. If it doesn't, you could even just wrap them in a class that provides the required methods. Assuming you are using PyFilesystem, according to its documentation you should be fine.
So, you want something like:
import numpy as np
import PIL.Image
import fs.memoryfs
import fs.osfs
import fs.copy
mem_fs = fs.memoryfs.MemoryFS()
drv_fs = fs.osfs.OSFS("./image_dir")
fs.copy.copy_file(drv_fs, './my_image_1.jpg', mem_fs, 'test.jpg')
with mem_fs.openbin('test.jpg') as f:
    image = PIL.Image.open(f)
    # Convert while the file is still open, because Pillow reads pixel data lazily
    image = np.array(image)
(Note: I used copy_file only because I tested with a single file. You can use copy_fs if you need to copy the entire tree; the principle is the same.)
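Since the question is open to alternatives to the fs package, here is a minimal sketch that uses only the standard library's io.BytesIO as the in-memory file object. The dictionary in_memory_files and the helper load_image_from_bytes are hypothetical names, and a generated PNG stands in for real files.

```python
import io

import numpy as np
from PIL import Image


def load_image_from_bytes(data):
    """Decode encoded image bytes (JPEG, PNG, ...) into a NumPy array."""
    with Image.open(io.BytesIO(data)) as img:
        # np.array forces Pillow to decode the pixels before the image is closed.
        return np.array(img)


# Hypothetical in-memory "directory": file name -> encoded bytes.
# A generated red PNG (6 px wide, 4 px high) stands in for files read from disk.
buffer = io.BytesIO()
Image.new("RGB", (6, 4), color=(255, 0, 0)).save(buffer, format="PNG")
in_memory_files = {"my_image_1.png": buffer.getvalue()}

arrays = {name: load_image_from_bytes(data)
          for name, data in in_memory_files.items()}
print(arrays["my_image_1.png"].shape)
```

The resulting array has shape (height, width, channels), so the width and height passed to Image.new appear in swapped order.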

Writing Images to s3fs.S3FileSystem after preprocessing image

I am currently accessing an S3 bucket from my school's system.
To connect, I used the following:
import numpy as np
import s3fs
from skimage import exposure
from PIL import Image, ImageStat
s3 = s3fs.S3FileSystem(client_kwargs={'endpoint_url': 'XXX'},
                       key='XXX',
                       secret='XXX')
I can retrieve an image from the S3 bucket defined above and preprocess it using
infile = s3.open('test.jpg',"rb")
image = Image.open(infile)
img = np.asarray(image) #numpy.ndarray
img_eq = exposure.equalize_adapthist(img,clip_limit=0.03) #CLAHE
image_eq = Image.fromarray((img_eq * 255).astype(np.uint8)) #Convert back to image
To save the resulting image (image_eq) locally, I would just call
image_eq.save("hello.jpg")
However, how do I save/write the resulting image into the s3fs filesystem instead?

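Pillow's save accepts a file object as well as a file name. You could do:
with s3.open('s3://bucket/file.png', 'wb') as f:
    image_eq.save(f, 'PNG')
You have to open the file in binary mode. Pillow may not be able to infer the format from a file object, so I think it works best to pass the file type explicitly, in this case PNG.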
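The same idea can be tried without S3 access. The sketch below encodes the image into an in-memory buffer first; the helper encode_image and the stand-in gradient image are assumptions for illustration. The resulting bytes can then be written through any binary file object.

```python
import io

import numpy as np
from PIL import Image


def encode_image(img, fmt="PNG"):
    """Return the image encoded in the given format as bytes."""
    buffer = io.BytesIO()
    # No file name is available here, so the format is passed explicitly.
    img.save(buffer, format=fmt)
    return buffer.getvalue()


# Stand-in for image_eq: an 8x8 grayscale gradient.
pixels = np.arange(64, dtype=np.uint8).reshape(8, 8)
image_eq = Image.fromarray(pixels)
payload = encode_image(image_eq, "PNG")

# With the s3 object from the question, the bytes could then be written with
# something like the following (the bucket and key are placeholders):
#     with s3.open("bucket/file.png", "wb") as f:
#         f.write(payload)
```

Encoding to bytes first also makes it easy to check the payload size before uploading.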

How to join two TIFF files populated using memory-mapped IO

I'm trying to write a Python function that combines multiple TIFF files and outputs a single TIFF file. I have a folder with a large number of TIFF files, and I'm trying to join them all into a single file. I have to load the data as NumPy arrays, and the output should be populated using memory-mapped I/O.

Untested example that should give you an idea:
from pathlib import Path
import numpy as np
import tifffile
my_path = Path(r'path/to/tiffs')
output_path = Path('output.tiff')
tiffs = sorted(my_path.glob('*.tiff'))
x, y = (512, 512) # either hardcode or read from first tiff
# Keep the array under its own name so it does not shadow the output path
stack = np.zeros((len(tiffs), x, y))
for i, image in enumerate(tiffs):
    a = tifffile.imread(str(image))
    stack[i, :, :] = a
tifffile.imwrite(str(output_path), stack)
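The example above holds the whole stack in RAM. As a rough sketch of the memory-mapped part of the question, the stack can be preallocated with numpy.memmap so that each frame is written through to a file on disk. Note that this swaps in a raw binary file rather than a TIFF, and that the helper name stack_into_memmap, the frame size, and the generated frames are assumptions standing in for arrays read with tifffile.imread.

```python
import os
import tempfile

import numpy as np


def stack_into_memmap(frames, n_frames, frame_shape, out_file, dtype=np.uint8):
    """Copy equally sized 2-D frames into a disk-backed array, one at a time."""
    stack = np.memmap(out_file, dtype=dtype, mode="w+",
                      shape=(n_frames,) + tuple(frame_shape))
    for i, frame in enumerate(frames):
        stack[i] = frame  # stored in the mapped file, not in a separate RAM copy
    stack.flush()  # make sure pending changes reach the file
    return stack


# Stand-in data: a generator of three 4x5 frames, produced lazily.
frames = (np.full((4, 5), value, dtype=np.uint8) for value in (1, 2, 3))
out_file = os.path.join(tempfile.mkdtemp(), "stack.dat")
stack = stack_into_memmap(frames, 3, (4, 5), out_file)
print(stack.shape, stack[:, 0, 0])
```

Because the frames come from a generator, only one frame needs to be in memory at a time while the stack is filled.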

How to load images from LMDB in python without Caffe?

I want to load my image and label data from an LMDB database I created. I assign a unique key to each image-label pair and add them to the LMDB (e.g. image-000000001, label-000000001). When saving the images, I convert the NumPy array of the image to a string using image.tostring(). When loading from the LMDB, I can get the labels simply by passing the keys I generated, but the image data comes back in encoded form. Calling numpy.fromstring(lmdb_cursor.get('image-000000001')) doesn't work.
I see in the second answer here, by @Ghilas BELHADJ, that one has to use Caffe datum objects to first load the data and then fetch the image using datum.data. But I don't have such a structure, where the image and label are organised using the 'data' and 'label' tags. How does one correctly read the data back as a NumPy image from such an LMDB in Python?
In Lua, this can be achieved as follows:
local imgBin -- this is the object returned from cursor:get(image-id)
local imageByteLen = string.len(imgBin)
local imageBytes = torch.ByteTensor(imageByteLen):fill(0)
imageBytes:storage():string(imgBin)
local img = Image.decompress(imageBytes, 3, 'byte')
img = Image.rgb2y(img)
img = Image.scale(img, imgW, imgH)
I don't know how to do this in Python.

import lmdb
import cv2
import numpy as np
with lmdb.open(lmdb_dir, readonly=True).begin(write=False) as txn:
    for idx, (key, val) in enumerate(txn.cursor()):
        # Decode encoded image bytes (e.g. JPEG or PNG) into a 3-channel BGR array
        img = cv2.imdecode(np.frombuffer(val, dtype=np.uint8), 1)
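cv2.imdecode applies when the stored values are encoded images such as JPEG or PNG. If the values were written with image.tostring() as described in the question, they are raw pixel bytes, and a sketch like the following may be closer to what is needed. The shape and dtype here are assumptions: raw bytes do not record them, so they have to be known or stored separately. The helper name decode_raw_image is hypothetical.

```python
import numpy as np

# Stand-in for the stored image: a 2x3 RGB array of unsigned bytes.
original = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)
raw = original.tobytes()  # produces the same bytes as the older tostring()


def decode_raw_image(raw_bytes, shape, dtype=np.uint8):
    """Rebuild an array from raw bytes written with tobytes() or tostring()."""
    # Without an explicit dtype NumPy assumes floats, which misreads uint8 data.
    return np.frombuffer(raw_bytes, dtype=dtype).reshape(shape)


restored = decode_raw_image(raw, shape=(2, 3, 3))
print(np.array_equal(restored, original))
```

In the LMDB loop, raw would be the value returned by the cursor for an image key.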

support vector machines for classifying images

I am trying to use SVMs to classify a set of images I have on my computer into 3 categories.
My problem is how to load the data. The following example uses a data set that is already saved:
http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html
I have all my images saved in PNG format in a folder on my PC.

You can load data as numpy arrays using Pillow, in this way:
from PIL import Image
import numpy as np
data = np.array(Image.open('yourimg.png')) # .astype(float) if necessary
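Couple it with os.listdir to read multiple files, e.g.
import os
samples = []
for file in sorted(os.listdir('your_dir/')):
    img = Image.open(os.path.join('your_dir/', file))
    samples.append(np.array(img).ravel())  # one flat feature vector per image
data = np.stack(samples)  # shape (n_images, n_features); requires equally sized images
your_model.train(data)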
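A classifier also needs a label for every image. The sketch below assumes a hypothetical folder layout with one subfolder per category (root/<class_name>/<image>.png) and builds a feature matrix and label vector from it. The function name load_dataset, the 8x8 target size, and the generated test images are all assumptions for illustration.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def load_dataset(root, size=(8, 8)):
    """Return (X, y, class_names) for images stored as root/<class_name>/<file>."""
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    features, labels = [], []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            with Image.open(os.path.join(class_dir, file_name)) as img:
                # Grayscale and a fixed size give every image the same feature count.
                pixels = np.asarray(img.convert("L").resize(size), dtype=np.float64)
            features.append(pixels.ravel() / 255.0)  # scale to the range [0, 1]
            labels.append(label)
    return np.stack(features), np.array(labels), class_names


# Tiny generated data set standing in for real PNG folders.
root = tempfile.mkdtemp()
for class_name, gray in (("cats", 0), ("dogs", 128), ("birds", 255)):
    os.makedirs(os.path.join(root, class_name))
    for i in range(2):
        Image.new("L", (10, 12), color=gray).save(
            os.path.join(root, class_name, f"{i}.png"))

X, y, class_names = load_dataset(root)
print(X.shape, y, class_names)
```

X and y have the (n_samples, n_features) and (n_samples,) shapes that scikit-learn estimators expect, so they could be passed to something like sklearn.svm.SVC().fit(X, y).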
