I am trying to rotate some images, whose width is greater than their height, by 90 degrees about the top-left corner. I have written this:
from PIL import Image

src_im = Image.open("Test.png")
print(src_im.size[0], ',', src_im.size[1])
# ROTATE_90 followed by both flips is equivalent to a 270-degree rotation
src_im = src_im.transpose(Image.ROTATE_90)
src_im = src_im.transpose(Image.FLIP_LEFT_RIGHT)
src_im = src_im.transpose(Image.FLIP_TOP_BOTTOM)
src_im.save("TestResult.png")
print(src_im.size[0], ',', src_im.size[1])
The output is as I expect, but there is a huge change in file size. Any ideas where I might be going wrong?
It's the same pixel information being stored, just rotated, so why should the file size change?
e.g.
(936 x 312) 155KB
(312 x 936) 342KB
Edit:
OK, so I tried rotating the images with the built-in Windows image viewer, and there is an increase in that case as well, so it's not really specific to Python per se; it's about compression. I'm still not clear on why the image would be less compressible after rotation. This is happening for all the images I try, not just this particular one. Updating the tags accordingly.
PNG compresses the image by "filtering" each line: it tries to predict the value of each pixel as a function of its "past" neighbours (previous row and/or column), and then compresses the prediction error with ZLIB (Deflate).
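For intuition, here is a toy numpy sketch (made-up 4x4 data) of the simplest of PNG's five filter types, "Sub", which predicts each byte from its left neighbour; the real encoder chooses among None, Sub, Up, Average and Paeth on a per-row basis:
import numpy as np

# A toy 8-bit grayscale image; PNG filters operate on the raw bytes of each row.
img = np.array([[10, 12, 11, 13],
                [10, 11, 12, 12],
                [ 9, 11, 11, 13],
                [10, 12, 12, 14]], dtype=np.uint8)

# "Sub" filter: predict each byte from its left neighbour; the first column
# has no left neighbour, so it is predicted as 0.
predicted = np.zeros_like(img)
predicted[:, 1:] = img[:, :-1]
residual = img - predicted    # uint8 subtraction wraps mod 256, just as PNG does
print(residual)               # small, repetitive residuals compress well in Deflate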
The issue here seems to be this: the vertical image has almost-vertical stripes. When scanned along the rows, it has a fairly predictable medium-range pattern (about 8 similar colours followed by a short burst of lighter colour). This suggests that, while the short-range prediction will not be very successful, the prediction error will have a highly repetitive pattern that is relatively easy to compress. This does not happen when the image is rotated.
I verified that the different horizontal/vertical sizes were not the problem: I made a bigger square (900x900) by tiling the original image 9 times. The PNG with quasi-vertical stripes is roughly half the size of the other one.
Another experiment that confirms the above: save both images as grayscale BMP (an uncompressed format that stores one byte per pixel, along the rows). You get two files of 293,110 bytes each. Compress both with a standard ZIP compressor (the same family as ZLIB's Deflate). The vertical image, again, comes out at about half the size of the other one.
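If you want to reproduce this yourself, here is a rough sketch (the file name is a placeholder) that runs the same Deflate compressor directly over the raw grayscale bytes of both orientations, instead of going through BMP + ZIP:
import zlib
from PIL import Image

im = Image.open("Test.png").convert("L")     # grayscale, one byte per pixel
rotated = im.transpose(Image.ROTATE_90)

for label, image in (("original", im), ("rotated", rotated)):
    raw = image.tobytes()                    # uncompressed pixels, row by row
    packed = zlib.compress(raw, 9)
    print(label, len(raw), "->", len(packed), "bytes")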
Related
I want to reduce the file size of a PNG file using Python. I have gone through a lot of material on the internet, but I could not find anything about reducing the file size of an image without changing its dimensions, i.e. height/width. What I have found is how to change the dimensions of the image using PIL or some other Python library.
How can I reduce the image file size while keeping its dimensions constant?
PNG is a lossless format, but how aggressively it is (losslessly) compressed can still vary. Pillow's Image class comes with a PNG writer accepting a compress_level parameter.
It can easily be demonstrated that it works:
Step 1: Load the generic RGB Lena test image in .png format:
import os
import requests
from PIL import Image
from io import BytesIO
img_url = 'http://pngnq.sourceforge.net/testimages/lena.png'
response = requests.get(img_url)
img = Image.open(BytesIO(response.content))
Step 2: Write png with compress_level=0:
path_uncompressed = './lena_uncompressed.png'
img.save(path_uncompressed, compress_level=0)
print(os.path.getsize(path_uncompressed))
> 691968
Step 3: Write png with compress_level=9:
path_compressed = './lena_compressed.png'
img.save(path_compressed, compress_level=9)
print(os.path.getsize(path_compressed))
> 406889
which in this case gives us a respectable ~40% reduction in file size (691,968 to 406,889 bytes) with no image quality degradation whatsoever, as is guaranteed for a lossless compression algorithm.
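If you need to squeeze out a little more, Pillow's PNG writer also accepts an optimize flag, which tells it to spend extra effort finding the smallest encoder settings; for example (same img and imports as above):
path_optimized = './lena_optimized.png'
img.save(path_optimized, optimize=True)   # extra processing to minimize file size
print(os.path.getsize(path_optimized))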
PNG is a lossless format, so it will naturally consume more space.
If you only care about the dimensions, you can convert to a lossy format such as JPG.
https://whatis.techtarget.com/definition/lossless-and-lossy-compression
The dimensions after conversion stay the same, but the quality depends on how much compression is applied.
Snippet to convert PNG to JPG:
from PIL import Image

im = Image.open("Picture2.png")
rgb_im = im.convert('RGB')   # JPEG has no alpha channel, so convert to plain RGB first
rgb_im.save('Picture2.jpg')
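If you can tolerate some extra loss, Pillow's JPEG writer also takes a quality parameter (default 75); for example:
rgb_im.save('Picture2.jpg', quality=85)   # 1-95; lower values mean smaller files and more artefacts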
By default, most PNG writers already use a fairly high compression setting, so trying to raise it further will not help much. Uncompressed PNGs exist, but they make little sense, and I don't think you'll run into many of them.
Thus, the only ways of making a PNG significantly smaller are to reduce the number of pixels it stores or to reduce the number of colors it stores. You specifically said you don't want to reduce the number of pixels, so the remaining option is to reduce the number of colors.
Most PNG files are in "true color" mode (typically 24 bits per pixel, with one byte each for the red, green and blue components of each pixel). However, it is also possible to make indexed PNG files. These store a color map (a.k.a. palette) plus a single value per pixel: the index into the color map. If you, for example, pick a color map of 64 entries, then each pixel needs 6 bits to encode its index. You'd store 64*3 bytes for the palette plus 3/4 of a byte per pixel (both of which compress further, of course). I found a web site comparing a few example images, showing what they look like and how big the files end up when the number of colors is reduced.
This other question shows how to use PIL to convert an RGB image to an indexed image:
img.convert("P", palette=Image.ADAPTIVE)
This seems to generate an 8-bit color map, though; PIL has no support for smaller color maps. The PyPNG module would allow you to write PNG files with any number of colors in the color map.
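As a rough sketch of that approach with PIL alone (file names are placeholders; the colors argument of convert shrinks the adaptive palette, although each index is still written as 8 bits):
from PIL import Image

img = Image.open("input.png")
# Quantize to a 64-entry adaptive palette; fewer distinct colors usually
# helps the Deflate stage even though each index still occupies a full byte.
small = img.convert("P", palette=Image.ADAPTIVE, colors=64)
small.save("output_64colors.png", optimize=True)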
I've got a GeoTIFF image that I need to blur by applying a smoothing filter. The image contains metadata that needs to be preserved. It has a bit depth of 8 and uses a color table with 256 32-bit RGBA values to look up a color for each pixel, but for the result to look smooth it will probably need a bit depth of 24 or 32 and no color table, or alternatively JPEG compression. What may complicate this further is that the image is 23,899x18,330 pixels, almost five times larger than the largest file PIL wants to open by default.
How can I create the blurry version of this image in Python 3?
I have also tried using PIL to just open and save it again:
from PIL import Image
Image.MAX_IMAGE_PIXELS = 1000000000
im = Image.open(file_in)
im.save(file_out)
This code doesn't crash, and I get a new .tif file approximately as large as the original, but when I try to open it in Windows Photo Viewer the application says it is corrupt, and it cannot be re-opened by PIL either.
I have also tried using GDAL. When I try this code, I get an output image that is 835 MB, which corresponds to an uncompressed image with a bit depth of 16 (which is also what the file metadata says when I right-click it and choose "Properties"; I'm using Windows 10). However, the resulting image is monochrome and very dark, and the colors look like they have been jumbled up, which makes me believe the code interprets the pixel values as intensity values rather than as color-table keys.
So in order to make this method work, I need to figure out how to apply the color table (some sort of container for tuples, of type osgeo.gdal.ColorTable) to the raster band (whatever a raster band is), which is a numpy array of shape (18330, 23899), to get a new numpy array of shape (18330, 23899, 4) or (4, 18330, 23899) (I don't know which is the correct shape). Then I need to insert this back into the loaded image, remove the color table (or create a new one with the same metadata), and finally save the modified image with compression enabled, so that I get closer to the original file size (11.9 MB) rather than the 835 MB I get now. How can I do that?
pyvips can process huge images quickly using just a small amount of memory, and it supports palette TIFF images.
Unfortunately it won't preserve the extra GeoTIFF tags, since libtiff won't handle unknown tag types. You'd need to copy that metadata over in some other way.
Anyway, if you can do that, pyvips should work on your image. I tried this example:
import sys
import pyvips
# the 'sequential' hint tells libvips that we want to stream the image
# and don't need full random access to pixels ... in this mode,
# libvips can read, process and write in parallel, and without needing
# to hold the whole image in memory
image = pyvips.Image.new_from_file(sys.argv[1], access='sequential')
image = image.gaussblur(2)
image.write_to_file(sys.argv[2])
On an image of the type and size you have, generating a JPEG-compressed TIFF:
$ tiffinfo x2.tif
TIFF Directory at offset 0x1a1c65c6 (438068678)
Image Width: 23899 Image Length: 18330
Resolution: 45118.5, 45118.5 pixels/cm
Bits/Sample: 8
Compression Scheme: None
Photometric Interpretation: palette color (RGB from colormap)
...
$ /usr/bin/time -f %M:%e python3 ~/try/blur.py x2.tif x3.tif[compression=jpeg]
137500:2.42
So: 140MB of memory and 2.5 seconds. The output image looks correct and is 24MB, so not too far off yours.
A raster band is just the name given to each "layer" of the image; in your case, once the color table has been expanded, they would be the red, green, blue, and alpha values. These are what you want to blur. You can open the image and read each band into a separate array by using data.GetRasterBand(i) to get the ith band (with 1-indexing, not 0-indexing) of the image you opened with GDAL.
You can then use SciPy's scipy.ndimage.gaussian_filter to do the blurring. It wants an array of shape (x, y), so you'll have to apply it to each raster band individually. You should then be able to save your data as another GeoTIFF using GDAL.
If the color table you are working with means that your data is stored in each raster band in some odd format that isn't just values for each of R, G, B, and A, then consider using scipy.ndimage.generic_filter, although without knowing how your data is stored it's hard to give specifics.
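Putting those pieces together, a rough sketch of the whole pipeline might look like the following. It is untested against your file: the file names, sigma, and creation options are assumptions to adapt; at 23,899x18,330 the intermediate RGBA arrays are on the order of 1.7 GB each, so you may need to process in tiles; and any extra metadata domains beyond the geotransform and projection would still need copying separately.
import numpy as np
from osgeo import gdal
from scipy.ndimage import gaussian_filter

src = gdal.Open("input.tif")
band = src.GetRasterBand(1)
indices = band.ReadAsArray()                 # shape (18330, 23899), palette indices

# Expand the color table into a (256, 4) lookup array, then apply it with
# numpy fancy indexing to get an (rows, cols, 4) RGBA image.
ct = band.GetColorTable()
lut = np.array([ct.GetColorEntry(i) for i in range(ct.GetCount())], dtype=np.uint8)
rgba = lut[indices]

# Blur each channel separately; gaussian_filter works on one 2D array at a time.
blurred = np.stack([gaussian_filter(rgba[..., c], sigma=2) for c in range(4)],
                   axis=-1)

# Write a compressed 4-band GeoTIFF, copying the georeferencing over.
driver = gdal.GetDriverByName("GTiff")
dst = driver.Create("blurred.tif", src.RasterXSize, src.RasterYSize, 4,
                    gdal.GDT_Byte, options=["COMPRESS=DEFLATE", "BIGTIFF=IF_SAFER"])
dst.SetGeoTransform(src.GetGeoTransform())
dst.SetProjection(src.GetProjection())
for c in range(4):
    dst.GetRasterBand(c + 1).WriteArray(blurred[..., c])
dst.FlushCache()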
I am trying to learn Python by doing some small projects. One of them is as follows:
I want to write code that converts an image into straight lines. I am actually into string art, and I want to receive output that I can easily use to build the string art itself.
So, I'll try to explain what I'm trying to do:
1. Import an image, get the pixel coordinates, and save them into an array.
2. Get the brightness value for each pixel of the image.
3. Choose a number of lines; this is the "quality" of my output, since in reality these are the strings I use to create the art.
4. Draw a random number of lines through the darkest pixel of the image, compare each line's total brightness to the others, choose the darkest one (the line hitting the darkest pixels), and remove that line's pixels from the pixel array.
5. Save the two x,y coordinates where each line intersects the image borders (my canvas), so I can use this later to know where to start and finish my strings.
6. Repeat steps 4 and 5 for the chosen number of lines.
7. Print or save the x,y coordinates of the line intersections with the image borders.
I plan on using PIL and numpy for this. Now, my questions are:
a. Do you think there are easier or better ways to achieve my goal?
b. What is the best way to get a clean array of pixels from any given digital image?
You can see the kind of image I'm trying to produce at linify.me.
Thanks.
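For (b), a minimal sketch of steps 1 and 2, assuming PIL/Pillow and numpy and a placeholder file name: convert the image to grayscale so each pixel is a single brightness value, then view the result as a numpy array indexed by (row, column):
import numpy as np
from PIL import Image

# Steps 1-2: load the image and get one brightness value per pixel.
img = Image.open("input.jpg").convert("L")   # "L" = 8-bit grayscale (0 dark, 255 bright)
brightness = np.asarray(img, dtype=np.float64)

print(brightness.shape)                # (height, width); brightness[y, x] is pixel (x, y)
print(brightness.min(), brightness.max())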
I'm trying to develop a system which can convert a seven-segment display on an old analog pressure output system to text, such that the data can be processed by LabVIEW. I've been working on image processing to get Tesseract (using v3.02) to recognize the numbers correctly, but have been hitting some roadblocks and don't quite know how to proceed. This is what I've got so far:
The image needs to be between 50 and 100 pixels tall for Tesseract to read it correctly; I've found the best results at a height of 50.
The image needs to be cropped so that there is only one line of text.
The image should be in black and white.
The image should be relatively level from left to right.
I've been using the seven-segment training data 'letsgodigital'. This is the code for the image manipulation I've been doing so far:
ret, i = video.read()                      # grab a frame from the video capture
h, width, channels = i.shape               # get frame dimensions
g = cv2.cvtColor(i, cv2.COLOR_BGR2GRAY)
histeq = cv2.equalizeHist(g)               # spread pixel values across the full range
_, t = cv2.threshold(histeq, 150, 255, cv2.THRESH_BINARY)  # binarize; maxval 255 gives a true black-and-white image
cropped = t[int(0.4*h):int(0.6*h), int(0.1*width):int(0.9*width)]
rotated = imutils.rotate_bound(cropped, angle)             # 'angle' is set elsewhere to level the text
resized = imutils.resize(rotated, height=resizing_height)
Some numbers work better than others; for example, '1' seems to cause a lot of trouble. The digits after the '+' or '-' often don't show up, and the '+' often reads as a '-'. I've also played around with the threshold values a bit.
The last three steps are there because the video sample I've been drawing from was slightly askew. I could try capturing some better data to work with, and I could also try building my own training data instead of the standard 'letsgodigital' language. I feel like I'm not doing the image processing in the best way, though, and would appreciate some guidance.
I plan to use some degree of edge detection to auto-crop to the display, but for now I've been keeping it simple and manually getting the results I want. I've uploaded sample images with various degrees of processing applied at http://imgur.com/a/vnqgP. It's difficult because sometimes I get exactly the right answer from Tesseract and other times nothing, even though the camera and light levels haven't really changed, which makes me think it's a problem with my training data. Any suggestions or direction on where I should go would be much appreciated! Thank you.
For reading seven-segment digits, normal OCR programs like Tesseract don't usually work well because of the gaps between individual segments. You should try ssocr, which was made specifically for reading seven-segment digits. However, your preprocessing will need to be better, as ssocr expects the input to be a single row of seven-segment digits.
References - https://www.unix-ag.uni-kl.de/~auerswal/ssocr/
Usage example - http://www.instructables.com/id/Raspberry-Pi-Reading-7-Segment-Displays/
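If you drive it from Python, a rough sketch might look like this. It assumes the frame has already been cropped and thresholded to a single row of digits and saved as display.png; the -d -1 option is, per the ssocr manual, supposed to auto-detect the number of digits, so check ssocr --help for the options your build supports:
import subprocess

# Run ssocr on a preprocessed frame containing one row of seven-segment digits.
result = subprocess.run(["ssocr", "-d", "-1", "display.png"],
                        capture_output=True, text=True)
print(result.stdout.strip())   # the recognized reading as plain text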
I have a large 2D array (4000x3000) saved as a numpy array, which I would like to display and save while keeping the ability to inspect each individual pixel.
For the display part, I currently use matplotlib's imshow() function, which works very well.
For the saving part, it is not clear to me how I can save this figure while preserving the information contained in all 12M pixels. I tried adjusting the figure size and the resolution (dpi) of the saved image, but it is not obvious which figsize/dpi settings match the resolution of the large 2D matrix being displayed. Here is an example of what I'm doing (arr is a numpy array of shape (3000, 4000)):
import pylab

fig = pylab.figure(figsize=(16, 12))
pylab.imshow(arr, interpolation='nearest')
fig.savefig("image.png", dpi=500)
One option would be to increase the resolution of the saved image substantially, to be sure every pixel is properly recorded, but this has the significant drawback of creating an extremely large file (much larger than the 4000x3000-pixel image, which is all I really need). It also has the disadvantage that not all pixels end up exactly the same size.
I also had a look at the Python Imaging Library, but it is not clear to me how it could be used for this purpose, if at all.
Any help on the subject would be much appreciated!
I think I found a solution that works fairly well. I use figimage to plot the numpy array without resampling. If you're careful about the size of the figure you create, you can keep the full resolution of your matrix, whatever its size.
I figured out that figimage plots a single matrix cell with a size of 0.01 inch. This follows from the default figure dpi of 100: figimage maps one array element to one figure pixel, and at 100 dpi one pixel is 1/100 of an inch, so the number may differ if your defaults differ. The following code will, for example, save the matrix at full resolution (arr is a numpy array of shape (3000, 4000)):
import pylab
from matplotlib import cm

rows = 3000
columns = 4000
fig = pylab.figure(figsize=(columns * 0.01, rows * 0.01))
pylab.figimage(arr, cmap=cm.jet, origin='lower')
fig.savefig("image.png")
Two issues I still have with this option:
1. there are no markers indicating column/row numbers, so it is hard to tell which pixel is which except near the edges
2. if you look at the image interactively, it is not possible to zoom in/out
A solution that also addresses these two issues would be terrific, if one exists.
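For what it's worth, one sketch that should help with both points (untested on your data; arr here is stand-in random data): keep imshow, so the interactive window still offers zooming and a cursor coordinate readout in the toolbar, but size the figure and dpi so that one matrix cell maps to exactly one saved pixel:
import numpy as np
import matplotlib.pyplot as plt

arr = np.random.rand(3000, 4000)       # stand-in for the real matrix
dpi = 100
rows, columns = arr.shape
fig = plt.figure(figsize=(columns / dpi, rows / dpi), dpi=dpi)
ax = fig.add_axes([0, 0, 1, 1])        # axes filling the whole figure, no borders
ax.set_axis_off()
ax.imshow(arr, interpolation='nearest')
fig.savefig("image.png", dpi=dpi)      # exactly 4000x3000 pixels
The saved file still has no tick marks, but interactively the toolbar shows the row/column under the cursor, and the zoom tools keep working.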
The OpenCV library was designed for scientific analysis of images. Consequently, it doesn't "resample" images without your explicitly asking for it. To save an image:
import cv2
cv2.imwrite('image.png', arr)
where arr is your numpy array. The saved image will be the same size as your array arr. Note that for 3-channel color arrays, OpenCV assumes BGR channel order rather than RGB.
You didn't mention the color model you are using. PNGs, like JPEGs, are usually 8 bits per color channel, but OpenCV will write up to 16 bits per channel if you ask for it.
Documentation on OpenCV's imwrite is here.
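For example, a minimal 16-bit sketch (with stand-in data; handing imwrite a uint16 array is what triggers the 16-bit PNG path):
import cv2
import numpy as np

arr16 = (np.random.rand(3000, 4000) * 65535).astype(np.uint16)  # stand-in data
cv2.imwrite('image16.png', arr16)   # uint16 input -> 16-bit grayscale PNG, values preserved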