I'm using Python with matplotlib to create plots out of data, and I'd like to save these plots to a PDF file (though I could also use a more specific format).
I'm basically using these instructions:
plt.plot(data)
figname = ''.join([filename, '_', label, '.pdf'])
plt.savefig(figname)
But what this does is create an image of the plot with the zoom in which it's displayed; I would like to create a copy that shows all points (>10000) that I'm plotting so I would be able to zoom to any level.
Which is the correct way to do that?
EDIT: is there a format (such as '.fig' for Matlab) that directly opens the Matplotlib viewer with the data I saved?
Maybe it's possible to create a .py script that saves the points and that I can call to quickly re-display them? I think this is what the Matlab .fig file does.
I don't know of any native Matplotlib file format which includes your data; in fact, I'm not sure the Matplotlib objects even have a write function defined.
What I do instead to simulate the Matlab .fig concept is to save the processed data (as a numpy array, or pickled) and run a separate .py script to recreate the Matplotlib plots.
So in steps:
Process your data and make some pretty plots until you are fully content
Save/pickle your processed data as close to the plot commands as possible (you might even want to store the data going into a histogram if making the histogram takes a long time)
Write a new script in which you import the data and copy/paste the plotting commands from the original script
It is a bit clumsy, but it works. If you really want, you could embed the pickled data as a string in your plotting script (Embed pickle (or arbitrary) data in python script). This gives you the benefit of working with a single python script containing both the data as well as the plotting code.
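The embedding idea can be sketched roughly as follows. This is only an illustration, assuming the data is small enough to paste into a script; the helper names `encode_for_embedding` and `decode_embedded` and the placeholder dictionary are made up for this example.

```python
import base64
import pickle


def encode_for_embedding(obj):
    """Return an ASCII string that can be pasted into a script as a literal."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_embedded(text):
    """Rebuild the original object from the embedded string."""
    return pickle.loads(base64.b64decode(text))


# In the processing script: produce the string and paste it into the
# plotting script as a constant.
processed = {"x": [0.0, 0.5, 1.0], "y": [1.0, 4.0, 9.0]}  # placeholder data
EMBEDDED_DATA = encode_for_embedding(processed)

# In the plotting script: recover the data before the plot commands run.
restored = decode_embedded(EMBEDDED_DATA)
```

Since unpickling can execute arbitrary code, this is only reasonable for strings you generated yourself.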
Edit
You can check for the existence of your stored processed data file and skip the processing steps if this file exists. So:
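import os
# processed_data_file is the path of the saved data; the three functions
# below are placeholders for your own processing, loading and plotting code
if not os.path.exists(processed_data_file):
    my_data = process_raw_data()
else:
    my_data = read_data_from_file(processed_data_file)
plot(my_data)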
In this way, you can have one script for both creating the graph in the first place, and re-plotting the graph using pre-processed data.
You might want to add a runtime argument for forcing a re-processing of the data in case you change something in the processing script and don't want to manually remove your processed data file.
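A runnable sketch of this caching scheme, including the force option, could look like the following. The function names, the `--cache` and `--force` options, and the random stand-in data are assumptions made for illustration, not part of the original scripts.

```python
import argparse
import os

import numpy as np


def process_raw_data():
    """Stand-in for an expensive processing step (hypothetical)."""
    rng = np.random.default_rng(0)
    return rng.random((2, 10000))


def load_or_process(cache_path, force=False):
    """Return processed data, recomputing only when needed or forced.

    cache_path should end in '.npy', because np.save appends that
    extension when it is missing.
    """
    if force or not os.path.exists(cache_path):
        data = process_raw_data()
        np.save(cache_path, data)
    else:
        data = np.load(cache_path)
    return data


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--cache", default="processed_data.npy")
    parser.add_argument("--force", action="store_true",
                        help="re-process even if the cache file exists")
    args = parser.parse_args(argv)
    return load_or_process(args.cache, force=args.force)
```

Calling `main()` from the script and passing the result to the plotting commands gives a single entry point; running the script with `--force` rebuilds the cache.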
Use plt.xlim and plt.ylim to set the domain and range.
Set figsize to indirectly control the pixel resolution of the final image. (figsize sets the size of the figure in inches; the default dpi is 100.)
You can also control the dpi in the call to plt.savefig.
With figsize = (10, 10) and dpi = 100, the image will have resolution 1000x1000.
For example,
import matplotlib.pyplot as plt
import numpy as np
x, y = np.random.random((2,10000))
plt.plot(x, y, ',')
figname = '/tmp/test.pdf'
xmin, xmax = 0, 1
ymin, ymax = 0, 1
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)
plt.savefig(figname)
Your pdf viewer should be able to zoom in any region so individual points can be distinguished.
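For raster output, the relation between figsize, dpi and pixel size described above can be checked with a small sketch like this one. The helper `saved_png_shape` is a name invented for the example, and the Agg backend is selected only so that it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np


def saved_png_shape(figsize, dpi):
    """Save a small plot as an in-memory PNG; return (height, width) in pixels."""
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(np.arange(10))
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    plt.close(fig)
    buf.seek(0)
    return mpimg.imread(buf, format="png").shape[:2]


# 10 in x 10 in at 100 dpi
height, width = saved_png_shape(figsize=(10, 10), dpi=100)
```

For a PDF the dpi value does not set a pixel size in this way, since lines and markers are stored as vector data.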
Here is a figure produced by Python Pandas + Matplotlib
The problem is obvious: The x-axis labels are too large relative to the overall figure size.
There are two ways to solve this:
Increase the overall figure size, keeping the label font size the same
Reduce the label font size while keeping the figure size the same
I am saving the output as pdf. I would ideally like to use the first option, as when I open this file on my computer, the actual screen rendered size is about 400 pixels wide, which isn't very large. But this may not be possible when saving as pdf?
The code relevant is just two lines. data_age created from a Pandas dataframe.
# data is a Pandas dataframe, one of the columns is `'age'`.
data_age = data['age'].value_counts().sort_index()
plot = data_age.plot.bar()
pplt.savefig('age.pdf')
I searched around to find a solution to what I would have assumed would be a commonly encountered problem. I then went and read the documentation for matplotlib. There was an option dpi but this doesn't seem to have any effect when writing to a pdf file - which isn't surprising since pdf isn't a rasterized format.
pplt is obtained from import matplotlib.pyplot as pplt.
You should use the ax keyword argument paired with a matplotlib figure. Usage is as follows:
# create a matplotlib figure and set figsize
my_wider_figure, my_ax = plt.subplots(figsize=(15,10))
data_age.plot.bar(ax=my_ax)
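Put together with the saving step, a self-contained version might look like this. The small dataframe is a made-up stand-in for the question's data, the tick label size of 8 is an arbitrary choice illustrating the second option, and the PDF is written to an in-memory buffer only to keep the sketch free of side effects.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's dataframe with an 'age' column.
data = pd.DataFrame({"age": [23, 35, 23, 41, 35, 23, 52, 41, 29, 35]})
data_age = data["age"].value_counts().sort_index()

# Option 1: a larger figure with the font size unchanged.
fig, ax = plt.subplots(figsize=(15, 10))
data_age.plot.bar(ax=ax)

# Option 2 (can be combined with option 1): smaller x tick labels.
ax.tick_params(axis="x", labelsize=8)

fig.tight_layout()
pdf_buffer = io.BytesIO()
fig.savefig(pdf_buffer, format="pdf")  # or fig.savefig('age.pdf')
```

Because figsize is given in inches, it sets the page size of the PDF, so the labels become smaller relative to the figure even though their point size stays the same.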
In matplotlib, I am using LineCollection to draw and color the countries, where the boundaries of the counties are given. When I am saving the figure as a pdf file:
fig.savefig('filename.pdf',dpi=300)
the file size is quite large. However, on saving them as png files:
fig.savefig('filename.png',dpi=300)
and then converting them to pdf using the Linux convert command, the files are small. I tried reducing the dpi; however, that does not change the pdf file size. Is there a way to save the figures directly as smaller pdf files from matplotlib?
The PDF is larger, since it contains all the vector information. By saving a PNG, you produce a rasterized image. It seems that in your case, you can produce a smaller PDF by rasterizing the plot directly:
plt.plot(x, y, 'r-', rasterized=True)
Here, x, y are some plot coordinates. You basically have to use the additional keyword argument rasterized to achieve the effect.
I think using "rasterized = True" effectively saves the image similarly to png format. When you zoom in, you will see blurring pixels.
If you want the figures to be high quality, my suggestion is to sample from the data and plot the sample. The pdf file size is roughly proportional to the number of data points it needs to store.
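To see how the two suggestions affect file size, one could compare the PDFs directly, as in this rough sketch. It uses point markers rather than the 'r-' line from the answer (an assumption made so that every point is drawn individually), random data, and an invented helper `pdf_size_bytes`; the actual numbers will depend on your data and Matplotlib version.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np


def pdf_size_bytes(x, y, rasterized=False, dpi=100):
    """Plot the points as markers and return the size of the resulting PDF."""
    fig, ax = plt.subplots()
    ax.plot(x, y, "r.", rasterized=rasterized)
    buf = io.BytesIO()
    # dpi only matters for the rasterized artist; the axes stay vector.
    fig.savefig(buf, format="pdf", dpi=dpi)
    plt.close(fig)
    return buf.getbuffer().nbytes


rng = np.random.default_rng(0)
x, y = rng.random((2, 50000))

sizes = {
    "vector": pdf_size_bytes(x, y),
    "rasterized": pdf_size_bytes(x, y, rasterized=True),
    "subsampled": pdf_size_bytes(x[::10], y[::10]),  # every 10th point
}
```

Printing `sizes` shows the trade-off: the rasterized version blurs when zoomed, while the subsampled version stays sharp but drops points.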
When I use Matplotlib's plt.show() I get a nice plot which can be zoomed to very high precision (practically infinite). But when I save it as an image, it loses all this information and retains only the detail that the resolution allows.
Is there any way I can save the plot with all of its information, i.e. like those interactive plots which can be rescaled at any time?
P.S- I know I can set dpi to get high quality images. This is not what I want. I want image similar to Plot which python shows when I run the program. What format is that? Or is it just very high resolution image?
Note- I am plotting .csv files which includes data varying from 10^(-10) to 100's. Thus when I save the plot as .png file I lose all the information/kinks of graph at very small scales and only retain features from 1-100.
Maybe the interactive graphics library bokeh is an option for you. Its API is just a little different from what you know from matplotlib.
Bokeh creates plots as html files that you can view in your browser. For each graphic you can select wheel zoom to zoom interactively into your graphic. You can interactively change the range that you want to be plotted. Therefore you don't lose information in your graphic.
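If staying within Matplotlib is preferable, an alternative not mentioned in the answer above is to pickle the figure object itself, which keeps the plotted data. The following is only a sketch: the data is a placeholder, and pickles are generally only safe to reload with the same Matplotlib version that wrote them.

```python
import pickle

import matplotlib
matplotlib.use("Agg")  # use an interactive backend instead to zoom with plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.logspace(-10, 2, 2000)  # placeholder data spanning many decades
fig, ax = plt.subplots()
ax.plot(x, np.sqrt(x))

# Serialize the whole figure, including the plotted data.
blob = pickle.dumps(fig)  # or pickle.dump(fig, f) with f opened in 'wb' mode

# Later: restore the figure and inspect or re-display it.
restored = pickle.loads(blob)
restored_line = restored.axes[0].lines[0]
```

With an interactive backend, calling `plt.show()` after loading brings back the zoomable window.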
I have a problem with Matplotlib. I usually make big plots with many data points and then, after zooming or setting limits, I save only a specific subset of the original plot as pdf. The problem comes when I open this file: matplotlib saves all the data into the pdf, including the points outside the visible range, which are merely hidden. This makes it almost impossible to open those plots afterwards or to import them into LaTeX.
Any idea of how I could solve this problem is really welcome.
Thanks a lot
If you don't have a requirement to use PDF figures, you can save the matplotlib figures as .png; this format just contains the data on the screen, e.g. I tried saving a large scatter plot as PDF, its size was 198M; as png it came out as 270K; plus I've never had any problems using png inside latex.
I have not tested that this will work, but it might be worth rasterizing some of the artists:
fig, ax = plt.subplots()
ax.imshow(..., rasterized=True)
fig.savefig('test.pdf', dpi=600)
which will rasterize the artist when saving to vector formats. If you use a high enough dpi this should give you reasonable quality.
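Another option, not taken from the answers here, is to drop the points outside the axis limits before plotting, so that they cannot end up in the file at all. The sketch below uses random data, point markers and an invented helper `save_pdf_bytes`; how much this saves in practice depends on the artist type and the Matplotlib version, so treat it as something to verify on your own plots.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np


def save_pdf_bytes(x, y, xlim, ylim):
    """Plot markers with the given limits and return the PDF as bytes."""
    fig, ax = plt.subplots()
    ax.plot(x, y, ".")
    ax.set_xlim(*xlim)
    ax.set_ylim(*ylim)
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf")
    plt.close(fig)
    return buf.getvalue()


rng = np.random.default_rng(1)
x, y = rng.random((2, 50000))
xlim, ylim = (0.4, 0.5), (0.4, 0.5)

# Keep only the points inside the window that will be shown.
visible = (x >= xlim[0]) & (x <= xlim[1]) & (y >= ylim[0]) & (y <= ylim[1])

full_pdf = save_pdf_bytes(x, y, xlim, ylim)
cropped_pdf = save_pdf_bytes(x[visible], y[visible], xlim, ylim)
```

For line plots, filtering can remove the segments that cross the plot border, so it may be worth keeping a small margin of points around the window.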
How can I save Python plots at very high quality?
That is, when I keep zooming in on the object saved in a PDF file, there should be no blurring.
Also, what would be the best mode to save it in?
png, eps? Or some other? I can't use pdf, because it contains a hidden number that messes with the Latexmk compilation.
If you are using Matplotlib and are trying to get good figures in a LaTeX document, save as an EPS. Specifically, try something like this after running the commands to plot the image:
plt.savefig('destination_path.eps', format='eps')
I have found that EPS files work best and the dpi parameter is what really makes them look good in a document.
To specify the orientation of the figure before saving, simply call the following before the plt.savefig call, but after creating the plot (assuming you have plotted using an axes with the name ax):
ax.view_init(elev=elevation_angle, azim=azimuthal_angle)
Where elevation_angle is a number (in degrees) specifying the elevation of the viewpoint above the x-y plane (not the polar angle measured down from the vertical z axis), and azimuthal_angle specifies the azimuthal angle (around the z axis).
I find that it is easiest to determine these values by first plotting the image and then rotating it and watching the current values of the angles appear towards the bottom of the window just below the actual plot. Keep in mind that the x, y, z, positions appear by default, but they are replaced with the two angles when you start to click+drag+rotate the image.
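A minimal sketch of fixing the view before saving, assuming a 3D axes and some placeholder curve; the angles 30 and 45 and the output file name are arbitrary choices for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

t = np.linspace(0, 4 * np.pi, 200)
ax.plot(np.cos(t), np.sin(t), t)  # placeholder 3D curve

# Fix the camera before saving; both angles are in degrees.
ax.view_init(elev=30, azim=45)

out_path = os.path.join(tempfile.mkdtemp(), "helix.eps")
fig.savefig(out_path, format="eps")
```

The values read off the interactive window while rotating can be passed straight to `view_init`.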
Just to add my results, also using Matplotlib.
.eps made all my text bold and removed transparency. .svg gave me high-resolution pictures that actually looked like my graph.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Do the plot code
fig.savefig('myimage.svg', format='svg', dpi=1200)
I used 1200 dpi because a lot of scientific journals require images in 1200 / 600 / 300 dpi, depending on what the image is of. Convert to desired dpi and format in GIMP or Inkscape.
Obviously the dpi doesn't matter since .svg are vector graphics and have "infinite resolution".
You can save a figure that is 1920x1080 pixels (1080p) at the default dpi of 100 using:
fig = plt.figure(figsize=(19.20,10.80))
You can also go much higher or lower. The above solutions work well for printing, but these days you want the created image to go into a PNG/JPG or appear in a wide screen format.
Okay, I found spencerlyon2's answer working. However, in case anybody would find himself/herself not knowing what to do with that one line, I had to do it this way:
beingsaved = plt.figure()
# Some scatter plots
plt.scatter(X_1_x, X_1_y)
plt.scatter(X_2_x, X_2_y)
beingsaved.savefig('destination_path.eps', format='eps', dpi=1000)
In case you are working with seaborn plots, instead of Matplotlib, you can save a .png image like this:
Let's suppose you have a matrix object (either Pandas or NumPy), and you want to draw a heatmap:
import seaborn as sb
image = sb.heatmap(matrix) # This gets you the heatmap
image.figure.savefig("C:/Your/Path/ ... /your_image.png") # This saves it
This code is compatible with the latest version of Seaborn. Other code around Stack Overflow worked only for previous versions.
Another way I like is this. I set the size of the next image as follows:
plt.subplots(figsize=(15,15))
And then later I plot the output in the console, from which I can copy-paste it where I want. (Since Seaborn is built on top of Matplotlib, there will not be any problem.)