How to change offsets of matplotlib LineCollection after creation - python

I would like to create a stack of line plots using a LineCollection. The following code draws two identical sine curves offset from one another by (0, 0.2):
import matplotlib.pyplot as plt
import matplotlib.collections
import numpy as np
x=np.arange(1000)
y=np.sin(x/50.)
l=list(zip(x,y))  # list() so the pairs can be reused for both curves (zip is a one-shot iterator in Python 3)
f=plt.figure()
a=f.add_subplot(111)
lines=matplotlib.collections.LineCollection((l,l), offsets=(0,0.2))
a.add_collection(lines)
a.autoscale_view(True, True, True)
plt.show()
So far so good. The problem is that I'd like to be able to adjust that offset after creation. Using set_offsets doesn't seem to behave as I expect it to. The following, for instance, has no effect on the graph
a.collections[0].set_offsets((0, 0.5))
BTW, the other set commands (e.g. set_color) work as I expect. How do I change the spacing between curves after they have been created?

I think you found a bug in matplotlib, but I have a couple of workarounds. It looks like lines._paths gets generated in LineCollection().__init__ using the offsets you provide. lines._paths is not properly updated when you call lines.set_offsets(). In your simple example, you can regenerate the paths, since you still have the originals lying around.
lines.set_offsets( (0., 0.2))
lines.set_segments( (l,l) )
You can also manually apply your offsets. Remember that you're modifying points that have already been offset, so you add only the difference. To get an offset of 0.2, you add 0.1 to your pre-existing offset of 0.1.
lines._paths[1].vertices[:,1] += 0.1  # shift only the y values of the second curve
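The first workaround can be wrapped in a small helper. This is only a minimal sketch, not a fix for matplotlib itself: it assumes you still have the unshifted curve, and the names `restack` and `spacing` are hypothetical. It bakes the shift into the segment data through `set_segments` instead of relying on `set_offsets`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

x = np.arange(1000)
y = np.sin(x / 50.0)
base = np.column_stack((x, y))  # one unshifted curve, shape (1000, 2)

def restack(collection, curve, n_curves, spacing):
    """Rebuild the segments so that curve k is shifted by k * spacing = k * (dx, dy)."""
    shift = np.asarray(spacing, dtype=float)
    collection.set_segments([curve + k * shift for k in range(n_curves)])

fig, ax = plt.subplots()
lines = LineCollection([base, base])
ax.add_collection(lines)
restack(lines, base, 2, (0, 0.2))  # initial spacing
restack(lines, base, 2, (0, 0.5))  # change the spacing after creation
ax.set_xlim(0, 1000)               # set limits by hand to fit the shifted curves
ax.set_ylim(-1.1, 1.6)
fig.canvas.draw()
```

Because every call starts again from the unshifted curve, repeated calls do not accumulate shifts.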

Thanks @matt for your suggestion. Based on that I've hacked together the following, which shifts the curves according to new offset values but takes the old offset values into account. This means I don't have to retain the original curve data. Something similar might be done to correct the set_offsets method of LineCollection, but I don't understand the details of the class well enough to risk it.
def set_offsets(newoffsets, ax=None, c_num=0):
    '''
    Modifies the offsets between curves of a LineCollection
    '''
    if ax is None:
        ax=plt.gca()
    lcoll=ax.collections[c_num]
    oldoffsets=lcoll.get_offsets()
    if len(newoffsets)==1:
        newoffsets=[i*np.array(newoffsets[0]) for\
                    (i,j) in enumerate(lcoll.get_paths())]
    if len(oldoffsets)==1:
        oldoffsets=[i*oldoffsets[0] for (i,j) in enumerate(newoffsets)]
    verts=[path.vertices for path in lcoll.get_paths()]
    for (oset, nset, vert) in zip(oldoffsets, newoffsets, verts):
        vert[:,0]+=(-oset[0]+nset[0])
        vert[:,1]+=(-oset[1]+nset[1])
    lcoll.set_offsets(newoffsets)
    lcoll.set_paths(verts)

Related

How to normalise plotted points and get a circle?

Given 2000 random points in a unit circle (using numpy.random.normal(0,1)), I want to normalize them such that the output is a circle, how do I do that?
I was requested to show my efforts. This is part of a larger question: Write a program that samples 2000 points uniformly from the circumference of a unit circle. Plot and show it is indeed picked from the circumference. To generate a point (x,y) from the circumference, sample (x,y) from std normal distribution and normalise them.
I'm almost certain my code isn't correct, but this is where I am up to. Any advice would be helpful.
This is the new updated code, but it still doesn't seem to be working.
import numpy as np
import matplotlib.pyplot as plot
def plot():
    xy = np.random.normal(0,1,(2000,2))
    for i in range(2000):
        s=np.linalg.norm(xy[i,])
        xy[i,]=xy[i,]/s
    plot.plot(xy)
    plot.show()
I think the problem is in
plot.plot(xy)
even if I use
plot.plot(xy[:,0],xy[:,1])
it doesn't work.
Connected lines are not a good visualization here. You essentially connect random points on the circle. Since you do this quite often, you will get a filled circle. Try drawing points instead.
Also avoid namespace clashes. You import matplotlib.pyplot as plot and also name your function plot. The function definition then shadows the module, so plot.plot(...) no longer refers to pyplot.
import numpy as np
import matplotlib.pyplot as plt
def plot():
    xy = np.random.normal(0,1,(2000,2))
    for i in range(2000):
        s=np.linalg.norm(xy[i,])
        xy[i,]=xy[i,]/s
    fig, ax = plt.subplots(figsize=(5,5))
    # scatter draws dots instead of lines
    ax.scatter(xy[:,0], xy[:,1])
    plt.show()
If you use dots instead, you will see that your points indeed lie on the unit circle.
Your code has many problems:
Why use np.random.normal (a Gaussian distribution) when the problem text is about uniform (flat) sampling?
To pick points on a circle you need to correlate x and y; i.e. sampling x and y independently will not give a point on the circle, as x**2+y**2 must be 1 (for example, for the unit circle centered at (x=0, y=0)).
A couple of ways to get such a point are either to "project" a random point from [-1...1]x[-1...1] onto the unit circle or to pick the angle uniformly and compute the point at that angle on the circle.
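The second option mentioned above (pick the angle uniformly, then compute the point) can be sketched as follows. The function name `sample_circle_by_angle` and the fixed seed are illustrative choices, not part of the original problem.

```python
import numpy as np

def sample_circle_by_angle(n, rng=None):
    """Draw n points on the unit circle by sampling the angle uniformly in [0, 2*pi)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # Each row is one point (cos(theta), sin(theta)), so x**2 + y**2 == 1 by construction.
    return np.column_stack((np.cos(theta), np.sin(theta)))

points = sample_circle_by_angle(2000, np.random.default_rng(0))
radii = np.hypot(points[:, 0], points[:, 1])
print(radii.min(), radii.max())  # both are 1 up to floating-point rounding
```

The resulting array can be passed to a scatter plot as points[:, 0] and points[:, 1].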
First of all, if you look at the documentation for numpy.random.normal (and, by the way, you could just use numpy.random.randn), it takes an optional size parameter, which lets you create as large of an array as you'd like. You can use this to get a large number of values at once. For example: xy = numpy.random.normal(0,1,(2000,2)) will give you all the values that you need.
At that point, you need to normalize them such that xy[:,0]**2 + xy[:,1]**2 == 1. This should be relatively trivial after computing what xy[:,0]**2 + xy[:,1]**2 is. Simply using norm on each dimension separately isn't going to work.
Usual boilerplate
import numpy as np
import matplotlib.pyplot as plt
generate the random sample with two rows, so that it's more convenient to refer to x's and y's
xy = np.random.normal(0,1,(2,2000))
normalize the random sample using a library function to compute the norm. axis=0 means the norm is taken along the first array index, i.e. over each (x, y) column. The result is a (2000,)-shaped array that broadcasts against xy in the in-place division, so every point ends up with unit norm and hence lies on the unit circle
xy /= np.linalg.norm(xy, axis=0)
Finally, the plot. The key here is the add_subplot() method, and in particular the keyword argument aspect='equal', which requires that the scale from user units to output units is the same for both axes
plt.figure().add_subplot(111, aspect='equal').scatter(xy[0], xy[1])
plt.show()

Update tripcolor graph in matplotlib animation

I have been trying to create an animation in matplotlib from the graph tripcolor. Let's say I have
field = ax.tripcolor(tri, C)
How do I change the value of C after each iteration of the animation?
Many thanks,
field is guaranteed to be an instance of the matplotlib.collections.Collection base class, which helpfully defines a set_array() method for just such occasions.
In each iteration of your animation, simply pass the new value of C to the field.set_array() method. Assuming you use the FuncAnimation class for animations, as you probably want to, this reduces to:
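import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
fig = plt.figure()
ax = plt.subplot(111)
field = ax.tripcolor(tri, C)
def update_tripcolor(frame_number):
    global C  # the in-place update below rebinds the name C, so it must be declared global
    # Do something here to update "C"!
    C **= frame_number # ...just not this.
    # Update the face colors of the previously plotted triangle mesh.
    field.set_array(C)
# To triangular infinity and beyond! (Wherever that is. It's probably scary.)
# Keep a reference to the animation so that it is not garbage-collected.
anim = FuncAnimation(fig, update_tripcolor, frames=10)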
Updating tri, on the other hand, is considerably more difficult. While this question doesn't attempt to do so, the perspicacious reader may be curious to learn that you basically have to remove, recreate, and re-add the entire triangle mesh (i.e., field) onto this figure's axes. This is both inefficient and painful, of course. (Welcome to Matplotlib. Population: you.)
May the field.set_array() be with you.
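For completeness, here is a self-contained sketch of the same idea with made-up data. The random triangulation, the sine-based color values, and the choice to pass one value per triangle through facecolors are all assumptions for illustration. With flat shading the collection stores one color value per triangle, so the array given to set_array() should have that length.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.tri as mtri
import numpy as np
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(0)
tri = mtri.Triangulation(rng.random(30), rng.random(30))  # Delaunay mesh of random points
n_triangles = len(tri.triangles)

fig, ax = plt.subplots()
# One color value per triangle; vmin/vmax fix the color scale for all frames.
field = ax.tripcolor(tri, facecolors=np.zeros(n_triangles), vmin=-1, vmax=1)

def update_tripcolor(frame_number):
    """Compute new per-triangle values and push them into the existing mesh."""
    values = np.sin(0.3 * frame_number + np.arange(n_triangles))
    field.set_array(values)
    return (field,)

anim = FuncAnimation(fig, update_tripcolor, frames=10)
update_tripcolor(3)  # the update can also be called directly, e.g. for a quick check
```

In an interactive session you would call plt.show() (or anim.save(...)) instead of invoking the update function by hand.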

Plotting millions of data points in Python?

I have written some complicated code. The code produces a set of numbers that I want to plot. The problem is that I cannot put those numbers in a list, since there are 2 700 000 000 of them.
So I need to plot one point and then produce the second point (the first point is replaced by the second, so the first is erased because I cannot store them). These numbers are generated in different sections of the code, so I need to hold the figure (as with hold in MATLAB).
To make this clearer, I have written a simple example here, and I would like you to show me how to plot it.
import matplotlib.pyplot as plt
i=0
j=10
while i<2700000000:
    plt.stem(i, j, '-')
    i = i + 1
    j = j + 2
plt.show()
Suppose I have billions of i and j!
Hmm I'm not sure if I understood you correctly but this:
import matplotlib.pyplot as plt
i=0
j=10
fig=plt.figure()
ax=fig.gca()
while i<10000: # Fewer points for speed.
    ax.stem([i], [j]) # Need to provide iterable arguments to ax.stem
    i = i + 1
    j = j + 2
fig.show()
generates the following figure:
Isn't this what you're trying to achieve? After all the input numbers aren't stored anywhere, just added to the figure as soon as they are generated. You don't really need Matlab's hold equivalent, the figure won't be shown until you call fig.show() or plt.show() to show the current figure.
Or are you trying to overcome the problem that you can't hold the matplotlib.figure in your RAM? In that case my answer doesn't answer your question. Then you either have to save partial figures (only parts of the data) as pictures and combine them, as suggested in the comments, or think about an alternative way to show the data, as suggested in the other answer.
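One possible "alternative way to show the data" is to never keep the points at all and only accumulate counts on a fixed grid as they are produced. The following is a rough sketch under assumptions: `point_chunks` is a hypothetical stand-in for the real computation, the value ranges are known in advance, and a 200 x 200 grid is fine enough for the final picture.

```python
import numpy as np

def point_chunks(n_points, chunk_size):
    """Hypothetical stand-in for the real computation: yields (i, j) arrays chunk by chunk."""
    for start in range(0, n_points, chunk_size):
        i = np.arange(start, min(start + chunk_size, n_points))
        yield i, 10 + 2 * i

n_points = 1_000_000  # far fewer than 2.7e9, to keep the sketch quick
x_edges = np.linspace(0, n_points, 201)
y_edges = np.linspace(10, 10 + 2 * n_points, 201)
counts = np.zeros((200, 200))  # fixed memory, however many points are streamed

for i, j in point_chunks(n_points, 100_000):
    # Bin the current chunk and add it to the running total; the chunk is then discarded.
    chunk_counts, _, _ = np.histogram2d(i, j, bins=(x_edges, y_edges))
    counts += chunk_counts

print(counts.shape, int(counts.sum()))
```

The accumulated grid can then be drawn once, for example with plt.pcolormesh(x_edges, y_edges, counts.T), so memory use depends on the grid size rather than on the number of points.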

Opacity misleading when plotting two histograms at the same time with matplotlib

Let's say I have two histograms and I set the opacity using the parameter of hist: 'alpha=0.5'
I have plotted two histograms yet I get three colors! I understand this makes sense from an opacity point of view.
But! It makes it very confusing to show someone a graph of two things with three colors. Can I just somehow set the smallest bar for each bin to be in front with no opacity?
Example graph
The usual way this issue is handled is to have the plots with some small separation. This is done by default when plt.hist is given multiple sets of data:
import pylab as plt
x = 200 + 25*plt.randn(1000)
y = 150 + 25*plt.randn(1000)
n, bins, patches = plt.hist([x, y])
You instead wish to stack them (this could be done above using the argument histtype='barstacked'), but notice that the ordering is incorrect.
This can be fixed by individually checking each pair of points to see which is larger and then using zorder to set which one comes first. For simplicity I am using the output of the code above (i.e. n is two stacked arrays of the number of points in each bin for x and y):
n_x = n[0]
n_y = n[1]
for i in range(len(n[0])):
    if n_x[i] > n_y[i]:
        zorder=1
    else:
        zorder=0
    plt.bar(bins[:-1][i], n_x[i], width=10)
    plt.bar(bins[:-1][i], n_y[i], width=10, color="g", zorder=zorder)
Here is the resulting image:
Changing the ordering like this makes the image look very weird indeed, which is probably why it is not implemented and needs a hack. I would stick with the small separation method; anyone used to these plots assumes the bars share the same x-value.
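If you do go with the zorder hack, the per-bin decision can be computed in one step with numpy. This is a sketch only: `bar_zorders` is a hypothetical helper, and the shared bin edges are an assumption made so that both histograms use identical bins.

```python
import numpy as np

def bar_zorders(counts_a, counts_b):
    """Return per-bin zorder arrays so that the shorter bar of each pair is drawn in front."""
    counts_a = np.asarray(counts_a)
    counts_b = np.asarray(counts_b)
    a_is_shorter = counts_a <= counts_b
    z_a = np.where(a_is_shorter, 2, 1)  # higher zorder means drawn later, i.e. in front
    z_b = np.where(a_is_shorter, 1, 2)
    return z_a, z_b

rng = np.random.default_rng(0)
x = 200 + 25 * rng.standard_normal(1000)
y = 150 + 25 * rng.standard_normal(1000)
edges = np.histogram_bin_edges(np.concatenate((x, y)), bins=10)  # same bins for both
n_x, _ = np.histogram(x, bins=edges)
n_y, _ = np.histogram(y, bins=edges)
z_x, z_y = bar_zorders(n_x, n_y)
```

Each bar k can then be drawn with plt.bar(edges[k], n_x[k], width=edges[k+1]-edges[k], align="edge", zorder=z_x[k]), and likewise for n_y with z_y.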

Manipulating the numpy.random.exponential distribution in Python

I am trying to create an array of random numbers using Numpy's random exponential distribution. I've got this working fine, however I have one extra requirement for my project and that is the ability to specify precisely how many array elements have a certain value.
Let me explain (code is below, but I'll have a go at explaining it here): I generate my random exponential distribution and plot a histogram of the data, producing a nice exponential curve. What I really want to be able to do is use a variable to specify the y-intercept of this curve (point where curve meets the y-axis). I can achieve this in a basic way by changing the number of bins in my histogram, but this only changes the plot and not the original data.
I have inserted the bones of my code here. To give some context, I am trying to create the exponential disc of a galaxy, hence the random array I want to generate is an array of radii and the variable I want to be able to specify is the number density in the centre of the galaxy:
import numpy as N
import matplotlib.pyplot as P
n = 1000
scale_radius = 2
central_surface_density = 100 #I would like this to be the controlling variable, even if its specification had knock-on effects on n.
radius_array = N.random.exponential(scale_radius,(n,1))
P.figure()
nbins = 100
number_density, radii = N.histogram(radius_array, bins=nbins, density=False)  # density replaces normed, which recent NumPy no longer accepts
P.plot(radii[0:-1], number_density)
P.xlabel('$R$')
P.ylabel(r'$\Sigma$')
P.ylim(0, central_surface_density)
P.legend()
P.show()
This code creates the following histogram:
So, to summarise, I would like to be able to specify where this plot intercepts the y-axis by controlling how I've generated the data, not by changing how the histogram has been plotted.
Any help or requests for further clarification would be very much appreciated.
According to the docs for numpy.random.exponential, the input parameter (the scale, beta) is 1/lambda for the definition of the exponential described in wikipedia.
What you want is the density evaluated at x=0, which is f(0)=lambda=1/beta. Therefore, in a normed distribution, your y-intercept should just be the inverse of the scale you pass to the numpy function:
import numpy as np
import pylab as plt
target = 250
beta = 1.0/target
Y = np.random.exponential(beta, 5000)
plt.hist(Y, density=True, bins=200,lw=0,alpha=.8)  # density=True replaces normed=True, which newer matplotlib no longer accepts
plt.plot([0,max(Y)],[target,target],'r--')
plt.ylim(0,target*1.1)
plt.show()
Yes, the y-intercept of the histogram will change with different bin sizes, but this doesn't mean anything. The only thing that you can reasonably talk about here is the underlying probability distribution (hence the density=True).
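If you do want to control the raw count in the first bin, as in the question, you can work backwards from the distribution. For n samples with scale beta and bin width w, the expected count in the first bin is n*(1 - exp(-w/beta)), which is roughly n*w/beta for narrow bins. The sketch below solves this for n. The helper name `sample_size_for_central_count` and the bin width of 0.2 are assumptions for illustration, and the actual count will still fluctuate around the target from run to run.

```python
import math
import numpy as np

def sample_size_for_central_count(central_count, scale, bin_width):
    """Number of samples whose expected first-bin count equals central_count."""
    p_first_bin = -math.expm1(-bin_width / scale)  # 1 - exp(-w/beta)
    return int(round(central_count / p_first_bin))

scale_radius = 2.0
bin_width = 0.2
n = sample_size_for_central_count(100, scale_radius, bin_width)

rng = np.random.default_rng(0)
radii = rng.exponential(scale_radius, n)
edges = np.arange(0.0, radii.max() + bin_width, bin_width)  # fixed-width bins from 0
counts, _ = np.histogram(radii, bins=edges)
print(n, counts[0])  # counts[0] scatters around the requested 100
```

Note that this ties the intercept to a particular bin width, so the same bin edges have to be used when plotting.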
