Increase resolution with word-cloud and remove empty border - python

I am using word cloud with some txt files. How would I change this example to 1) increase the resolution and 2) remove the empty border?
#!/usr/bin/env python2
"""
Minimal Example
===============
Generating a square wordcloud from the US constitution using default arguments.
"""
from os import path
import matplotlib.pyplot as plt
from wordcloud import WordCloud
d = path.dirname(__file__)
# Read the whole text.
text = open(path.join(d, 'constitution.txt')).read()
wordcloud = WordCloud().generate(text)
# Open a plot of the generated image.
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

You can't increase the resolution of the image in plt.show() since that is determined by your screen, but you can increase the size. This allows it to scale, zoom, etc. without blurring. To do this pass dimensions to WordCloud, e.g.
wordcloud = WordCloud(width=800, height=400).generate(text)
However, this only determines the size of the image created by WordCloud. When you display it using matplotlib, it is scaled to the size of the plot canvas, which by default is only 640x480 (6.4x4.8 inches at 100 dpi), and you again lose quality. To fix this, specify the size of the figure before you call imshow, e.g.
plt.figure( figsize=(20,10) )
plt.imshow(wordcloud)
By doing this I can successfully create a 2000x1000 high resolution word cloud.
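The 2000x1000 figure comes from multiplying the figure size in inches by the dpi. Here is a minimal sketch to check that, assuming 100 dpi (set explicitly here rather than read from your rcParams) and the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt

# figsize is in inches; dpi is set explicitly rather than relying on rcParams.
fig = plt.figure(figsize=(20, 10), dpi=100)

# Pixel size of the canvas = inches * dpi.
width_px, height_px = fig.canvas.get_width_height()
print(width_px, height_px)

plt.close(fig)
```

If your own setup uses a different dpi, the pixel size changes accordingly even though figsize stays the same.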
For your second question (removing the border) first we could set the border to black, so it is less apparent, e.g.
plt.figure( figsize=(20,10), facecolor='k' )
You can also shrink the size of the border by using tight_layout, e.g.
plt.tight_layout(pad=0)
The final code:
# Read the whole text.
text = open(path.join(d, 'constitution.txt')).read()
wordcloud = WordCloud(width=1600, height=800).generate(text)
# Open a plot of the generated image.
plt.figure( figsize=(20,10), facecolor='k')
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()
By replacing the last two lines with the following you can get the final output shown below:
plt.savefig('wordcloud.png', facecolor='k', bbox_inches='tight')

If you are trying to use an image as a mask, make sure to use a big image to get better image quality. I spent hours figuring this out.
Here's an example of a code snippet I used:
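import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator
# x is assumed to be a dict mapping words to their frequencies.
# With a mask, the output takes the mask's shape, so a bigger mask image gives a sharper cloud.
mask = np.array(Image.open('path_to_your_image'))
image_colors = ImageColorGenerator(mask)
wordcloud = WordCloud(width=1600, height=800, background_color="rgba(255, 255, 255, 0)", mask=mask,
                      color_func=image_colors).generate_from_frequencies(x)
# Display the generated image:
plt.figure(figsize=(20, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")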

It is very simple: plt.tight_layout(pad=0) does the job. It removes the excess padding around the image, which reduces the empty background space.

You can use the method to_svg and get a resolution however high you want.
with open("Output.svg", "w") as text_file:
    text_file.write(wc.to_svg())
Try an example by appending these two lines to this file, and the result is gorgeous.
(Other answers have addressed the border problem, and the example does not have a border anyway.)

If improving the resolution makes your application slower, e.g. in a web application, the WordCloud documentation advises using the scale parameter together with the canvas width and height parameters to find a resolution and response time that work for your use case.
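To make the trade-off concrete, here is a small sketch of the arithmetic. The helper name is made up, and it assumes the final image is simply the canvas width and height multiplied by scale:

```python
def rendered_size(width, height, scale=1):
    """Hypothetical helper: pixel size of the final image for a given scale.

    Assumes the layout is computed on a width x height canvas and the result
    is drawn at `scale` times that size.
    """
    return int(width * scale), int(height * scale)

# Same output size, but the second lays words out on a much smaller canvas.
print(rendered_size(1600, 800))          # large canvas, scale=1
print(rendered_size(400, 200, scale=4))  # small canvas, scale=4
```

With the library itself, the second case would be WordCloud(width=400, height=200, scale=4). The documentation notes that a larger scale is faster than a larger canvas but can give a coarser fit for the words.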

Blurry wordclouds: I've been wrestling with this. For my use, I found that too large a difference between the counts of the most frequent words and those with few occurrences left the lower-count words unreadable. When I scaled down the higher counts to reduce the difference, all the lower-frequency words were much more readable.
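The answer does not say how the counts were scaled. One possible way is a power transform, sketched below. The function name, the exponent, and the sample counts are all made up for illustration:

```python
def compress_frequencies(frequencies, power=0.5):
    """Raise each count to `power` (0 < power <= 1) to shrink the spread."""
    return {word: count ** power for word, count in frequencies.items()}

counts = {"the": 10000, "constitution": 400, "amendment": 25}
compressed = compress_frequencies(counts)

# Ratio between the most and least frequent word, before and after.
print(max(counts.values()) / min(counts.values()))
print(max(compressed.values()) / min(compressed.values()))
```

The resulting dict can be passed to generate_from_frequencies in place of the raw counts. The ranking of the words is unchanged; only the gap between them shrinks.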

Related

Why are there Horizontal Stripes on my Palettized Image?

I am trying to make a palettized version of my height image data (using Python/Matplotlib) and for some reason...it is giving me quite weird horizontal lines which I know are not actually present in the dataset.
Both images (mine and the "better" one).
Is this something weird with how Matplotlib normalizes the data? I just don't quite understand how this could happen, so I am at a loss for where to start. I have provided my code below (sorry if there is a typo, I slightly changed it to make sense outside of the code).
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
def getheightprofile(fileloc, color_palette='jet'):
    # read data from file
    data = pd.read_csv(fileloc, skiprows=0)
    # generate colormap (I'm using the jet colormap rn)
    colormap = plt.get_cmap(color_palette)
    # normalize the height data to the range [0, 1]
    norm = (data - np.min(data)) / (np.max(data) - np.min(data))
    # convert the height data to RGB values using the palette
    palettized_data = (colormap(norm)*255).astype(np.uint8)
    # save the file as a png (to check quality)
    saveloc = r'C:\Users\...\palletized_height_profile.png'
    plt.imsave(saveloc, palettized_data)
    # return the nice numbers for later analysis
    return palettized_data
# file location of the raw data
fileloc = r'C:\Users\...\raw_height_profile.csv'
# generate height profile map (called after the function is defined)
palettized_image = getheightprofile(fileloc)
But instead of returning the nice image that I think I should get, it returns a very strange image with lines across it. Note: I know these images don't use quite the same palette, but I think you can see the issue.
Does anyone understand how, why, etc.? I have also attached a link to the dataset, because maybe that is helpful...but I am quite sure there is nothing wrong with the data.

Savefig extremely long (or wide) image in matplotlib

I have a numpy array of shape (4000000, 200, 3), where the first dimension is the image height and the second is the width.
I'm confused about how to save this image as a png (or any other format) at high resolution, because when I set dpi = 5000 I get a memory error.
Here is my code:
fig, ax = plt.subplots()
im = ax.imshow(final_image_train)
ax.axis('off')
plt.savefig('final.png', dpi = 5000, bbox_inches = 'tight')
Any suggestions are appreciated.
Are you using the default figsize? This parameter fixes the amount of space available to the elements inside the figure, including tick labels.
Then, if you know which pixel size is needed, for example (1200, 600), you need to choose a matching combination of figure size and dpi. Some example combinations would be:
figsize=(12,6) , dpi=100
figsize=( 8,4) , dpi=150
figsize=( 6,3) , dpi=200
There is more about this in other Stack Overflow posts like this one. Your dpi seems to be extremely high; you may need to work out the dpi and figsize more carefully.
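The combinations listed above all follow from dividing the target pixel size by the dpi. A minimal sketch, with a made-up helper name:

```python
def figsize_for_pixels(width_px, height_px, dpi):
    """Return the (width, height) in inches that yields the given pixel size."""
    return width_px / dpi, height_px / dpi

# Reproduce the three combinations for a 1200x600 pixel target.
for dpi in (100, 150, 200):
    print(dpi, figsize_for_pixels(1200, 600, dpi))
```

By the same arithmetic, and assuming the default 6.4x4.8 inch figure, dpi=5000 asks for a 32000x24000 pixel canvas before any cropping, which goes some way to explaining the memory error.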
The rest of this answer is just a recommendation. Are matplotlib and .png mandatory? If not, have a look at the plotly library, which lets you create interactive plots (.html format) that are really good if you need to explore a lot of data. There is an offline version of the library, if you are interested. Also, here you have subplots examples.

Image not updating in python plot during animation

The Problem:
I'm trying to simulate a live video by cycling through a series of still images I have saved in a directory, but when I add the animation and update functions my plot is displayed empty.
Background on why I'm doing this:
I believe it's important for me to do it this way rather than a complete change of approach, say turning the images into a video first then displaying that, because what I really want to test is the image analysis I will be adding and then overlaying on each frame. The final application will be receiving frames one by one from a camera and will need to do some processing, display the image + annotations + output the data as .csv etc... I'm simulating this for now because I do not have any of the hardware to generate the images and will not have it for several months, during which time I need to get the image processing set up, but I do have access to some sets of stills that are approximately what will be produced. In case it's relevant, my simulation images are 1680x1220, 1.88 MB TIFFs, though I could convert and compress them if needed, and in the final form the resolution will be a bit higher and the image format could probably be adjusted if needed.
What I have tried:
I followed an example to list all files in a folder, and an example to update a plot. However, the plot displays blank when I run the code.
I added a line to print the current file name, and I can see this cycling as expected.
I also made sure the images will display in the plot if I just create a plot and add one image, and they do. But, when combined with the animation function the plot is blank and I'm not sure what I've done wrong/failed to include.
I also tried adding a plt.pause() in the update, but again this didn't work.
I increased the interval up to 2000 to give it more time, but that didn't work. I believe 2000 is extreme, I'm expecting it should work with more like 20-30fps. Going to 0.5fps tells me the code is wrong or incomplete, rather than it just being a question of needing time to read the image file.
I appreciate no one else has my images, but they are nothing special. I'm using 60 images but I guess it could be tested with any 2 random images and setting range(60) to range(2) instead?
The example I copied originally demonstrated the animation function by making a random array, and if I do that it will show a plot that updates with random squares as expected.
Replacing:
A = np.random.randn(10,10)
im.set_array(A)
...with my image instead...
im = cv2.imread(files[i],0)
...and the plot remains empty/blank. I get a window shown called "Figure1" (like when using the random array), but unlike with the array there is nothing in this window.
Full code:
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import os
import cv2
def update(i):
    im = cv2.imread(files[i],0)
    print(files[i])
    #plt.pause(0.1)
    return im
path = 'C:\\Test Images\\'
files = []
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.TIFF' in file:
            files.append(os.path.join(r, file))
ani = FuncAnimation(plt.gcf(), update, frames=range(60), interval=50, blit=False)
plt.show()
I'm a python and a programming novice so have relied on adjusting examples others have given online but I have only a simplistic understanding of how they are working and end up with a lot of trial and error on the syntax. I just can't figure out anything to make this one work though.
Cheers for any help!
The main reason nothing is showing up is because you never add the images to the plot. I've provided some code below to do what you want, be sure to look up anything you are curious about or don't understand!
import glob
import os
from matplotlib import animation
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
IMG_DIRPATH = 'C:\\Test Images\\' # the folder with your images (be careful about
# putting spaces in directory names!)
IMG_EXT = '.TIFF' # the file extension of your images
# Create a figure, and set to the desired size.
fig = plt.figure(figsize=[5, 5])
# Create axes for the current figure so that images can be sized appropriately.
# Passing in [0, 0, 1, 1] makes the axes fill the whole figure.
# frame_on=False means we won't have a bounding box, and setting xticks=[] and
# yticks=[] means that we won't have pesky tick marks along our image.
ax_props = {'frame_on': False, 'xticks': [], 'yticks': []}
ax = plt.axes([0, 0, 1, 1], **ax_props)
# Get all image filenames.
img_filepaths = glob.glob(os.path.join(IMG_DIRPATH, '*' + IMG_EXT))
def update_image(img_filepath):
    # Remove all existing images on the axes, and restore our settings.
    ax.clear()
    ax.update(ax_props)
    # Read the current image.
    img = mpimg.imread(img_filepath)
    # Add the current image to the plot axes.
    ax.imshow(img)
anim = animation.FuncAnimation(fig, update_image, frames=img_filepaths, interval=250)
plt.show()
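A different approach from the one above, closer to the set_array example the question started from, is to add one image to the axes up front and then swap its pixel data on each frame. This is only a sketch: the frames are random arrays standing in for the TIFF files, and the Agg backend is used so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for a live window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Stand-in frames (assumption): 5 random grayscale images instead of files on disk.
rng = np.random.default_rng(0)
frames = [rng.random((122, 168)) for _ in range(5)]

fig, ax = plt.subplots()
ax.axis("off")
# Add the first frame to the axes once; this is the step the question was missing.
im = ax.imshow(frames[0], cmap="gray", vmin=0.0, vmax=1.0)

def update(i):
    # Swap the pixel data of the existing image instead of creating a new one.
    im.set_data(frames[i])
    return (im,)

anim = FuncAnimation(fig, update, frames=len(frames), interval=50, blit=False)
# plt.show() would start the animation in an interactive session.
```

With real files, frames[i] would become something like cv2.imread(files[i], 0), and vmin/vmax would need to match the image's value range (0 to 255 for 8-bit images).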

plt.savefig output image quality

I am trying to save a plot into a file using plt.savefig, however I am dissatisfied with the output picture quality. Changing dpi option doesn't help.
plt.savefig('filename.png', dpi=1200, format='png', bbox_inches='tight')
I tried saving to 'svg' and 'eps' - makes no difference. I wonder if the problem is with something else, like version of some library or OS or something alike. It also looks like the problem is not with resolution but the way lines and symbols are drawn - too bold.
plt.show() shows significantly better picture, and I can save it to png with satisfying quality - and surprisingly file size is about 8 times smaller (because of compressing, I suppose, which is fine.)
Part of the picture saved using savefig()
The same part of the picture saved from plt.show()
Figsize option did the trick for me.
The idea is that default parameters for saving to file and for displaying the chart are different for different devices. That's why representation was different in my case.
It's possible to adjust settings manually (as Piotrek suggests), but for me it was enough just to increase figure size - this setting is shared and allows python to auto-adjust visualization.
More details are on the page Piotrek mentioned, answered by doug and Karmel.
I have several subplots, so I used it like this:
fig, ax = plt.subplots(nrows=4, ncols=1, figsize=(20, 10))
For a single plot, the command looks like this:
plt.figure(figsize=(20,10))
P.S. figsize parameters are in inches, not pixels.
Have a look here: Styles and Futurile
In short, you can experiment with the following options to edit the line, ticks etc.
plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.serif'] = 'Ubuntu'
plt.rcParams['font.monospace'] = 'Ubuntu Mono'
plt.rcParams['font.size'] = 10
plt.rcParams['axes.labelsize'] = 10
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['axes.titlesize'] = 10
plt.rcParams['xtick.labelsize'] = 8
plt.rcParams['ytick.labelsize'] = 8
plt.rcParams['legend.fontsize'] = 10
plt.rcParams['figure.titlesize'] = 12
Also have a look at this topic:
matplotlib savefig() plots different from show()

Concatenated images are badly degraded

I am trying to display several pictures in my Jupyter notebook. However, the pixels look really rough, as shown below.
The original picture is sharp. How can I fix this?
This is one step in a process that classifies whether a picture shows a dog or a cat. I have many pictures of dogs and cats in a folder in the same directory and just loaded them from there. The picture is what I get when I simply try to show them in the Jupyter notebook using matplotlib.
Thank you in advance.
To force the resolution of the matplotlib inline images:
import matplotlib.pyplot as plt
dpi = 300 # Recommended to set between 150-300 for quality image preview
plt.rcParams['figure.dpi'] = dpi
I think it uses a very low setting around 80 dpi by default.
The image quality seems to be degraded in the example picture simply because you are trying to show a 64 pixel large image on 400 pixels or so on screen. Each original pixel thus comprises several pixels on screen.
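What happens when a small image is stretched over more screen pixels can be mimicked with NumPy. This is only an illustration with an integer factor and nearest-neighbour enlargement. The real display uses whatever interpolation matplotlib is configured with, and going from 64 to roughly 400 pixels is a non-integer factor of about 6.

```python
import numpy as np

# A tiny 2x2 "image" with four distinct pixel values.
small = np.array([[0, 1],
                  [2, 3]])

# Enlarge by an integer factor: each source pixel becomes a factor x factor block.
factor = 3
large = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

print(large.shape)
print(large)
```

No new detail appears in the enlarged array; each original value is just repeated, which is why the picture looks blocky rather than sharper.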
It seems you do not necessarily want to use matplotlib at all if the aim is to simply show the image in its original size on screen.
%matplotlib inline
import numpy as np
from IPython import display
from PIL import Image
a = np.random.rand(64,64,3)
b = np.random.rand(64,64,3)
c = (np.concatenate((a,b), axis=1)*255).astype(np.uint8)
display.display(Image.fromarray(c))
To achieve a similar result with matplotlib, you need to crop the margin around the axes and make sure the figure size is exactly the size of the array to show.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
a = np.random.rand(64,64,3)
b = np.random.rand(64,64,3)
c = np.concatenate((a,b), axis=1)
fig, ax = plt.subplots(figsize=(c.shape[1]/100.,c.shape[0]/100.), dpi=100)
fig.subplots_adjust(0,0,1,1)
ax.axis("off")
_ = ax.imshow(c)
