How to add a condition in Numpy 2D random list - python

I need to create a set of 100 random 2D points with two requirements.
A: the points must be inside a rectangle with specific dimensions.
B: the points must satisfy a condition; for example, given the coordinates x and y of a certain generated point, x+y<2.
I'm able to generate a set of points inside a rectangle:
xyMin = [xMin, yMin]
xyMax = [xMax, yMax]
data = np.random.uniform(low=xyMin, high=xyMax, size=(100,2))
How can I add the second condition? I could use a while loop, generating one point per iteration and checking the condition. If the condition is satisfied, increase the counter and go to the next point until the index is equal to 100. If not, try again in the next iteration without increasing the index.
Is it possible to achieve the same result using a list comprehension?

Here's a faster way than generating pairs one at a time. It just re-generates all pairs which fail the second condition until there are no failures left.
xyMin = 0.9
xyMax = 1.1
data = np.random.uniform(low=xyMin, high=xyMax, size=(100,2))
while True:
    failures = data.sum(axis=1)>=2
    n = failures.sum()
    if n>0:
        data[failures] = np.random.uniform(low=xyMin, high=xyMax, size=(n,2))
    else:
        break
That said, study this question from Mathematics Stack Exchange; there is a much better way. You can generate points directly in the triangle bounded by the axes and x+y<2 like this:
A = np.array([0,0])
B = np.array([2,0])
C = np.array([0,2])
r1,r2 = np.random.random(size=(2,100,1))
points = (1-np.sqrt(r1))*A + (np.sqrt(r1)*(1-r2))*B + r2*np.sqrt(r1)*C

Here is a sample answer using some guess values
import numpy as np
import matplotlib.pyplot as plt
xyMin = [0, 0]
xyMax = [3, 3]
data = np.random.uniform(low=xyMin, high=xyMax, size=(10000,2))
mask = (np.sum(data, axis=1)<2)
data = data[mask]
plt.scatter(data[:, 0], data[:,1])
plt.show()
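The masking approach above keeps however many points happen to pass, not exactly 100. A minimal sketch of how it could be combined with oversampling to return a fixed number of points follows. The function name sample_points and its parameters are hypothetical, and the sketch assumes the condition is satisfiable somewhere inside the rectangle (otherwise the loop never ends).

```python
import numpy as np

def sample_points(n, xy_min, xy_max, condition, rng=None, batch=None):
    """Draw n uniform points in a rectangle that also satisfy `condition`.

    `condition` takes an (m, 2) array and returns a boolean mask of length m.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = max(2 * n, 16) if batch is None else batch
    accepted = []
    count = 0
    while count < n:
        # Generate a whole batch at once and keep only the rows that pass
        candidates = rng.uniform(low=xy_min, high=xy_max, size=(batch, 2))
        good = candidates[condition(candidates)]
        accepted.append(good)
        count += len(good)
    # Surplus points are dropped so that exactly n remain
    return np.concatenate(accepted)[:n]

points = sample_points(100, [0, 0], [3, 3], lambda p: p.sum(axis=1) < 2,
                       rng=np.random.default_rng(0))
```

Because every candidate is drawn independently, truncating to the first n accepted points should not bias the sample.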

Related

Build a coupled map lattice using 2D array

So I'm trying to build a coupled map lattice on my computer.
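A coupled map lattice (CML) is given by this equation:
$x_{n+1}(i) = (1-\varepsilon)\, f(x_n(i)) + \frac{\varepsilon}{2}\left[ f(x_n(i-1)) + f(x_n(i+1)) \right]$
where the function f(x_n) is a logistic map:
$f(x_n) = r\, x_n (1 - x_n)$
with x values between 0 and 1, and r=4 for this CML.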
Note: 'n' can be thought of as time, and 'i' as space
I have spent a lot of time understanding the iterations and I came up with the code below; however, I'm not sure whether it iterates this equation correctly.
Note: I have used 2d numpy arrays, where rows are 'n' and columns are 'i' as obvious from the code.
So basically, I want to develop a code to simulate this equation, and here is my take on that
Don't jump to the code directly, you won't understand what's happening without bothering to look at the equations first.
import numpy as np
import matplotlib.pyplot as plt
'''The 4 definitions created below are actually similar and only vary in their indexings. These 4
have been created only because of the if conditions I have put in the for loop '''
def logInit(r,x):
    y[n,0]=r*x[n,0]*(1-x[n,0])
    return y[n,0]
def logPresent(r,x):
    y[n,i]=r*x[n,i]*(1-x[n,i])
    return y[n,i]
def logLast(r,x):
    y[n,L-1]=r*x[n,L-1]*(1-x[n,L-1])
    return y[n,L-1]
def logNext(r,x):
    y[n,i+1]=r*x[n,i+1]*(1-x[n,i+1])
    return y[n,i+1]
def logPrev(r,x):
    y[n,i-1]=r*x[n,i-1]*(1-x[n,i-1])
    return y[n,i-1]
# 2d array with 4 row, 3 col. I created this because I want to store the evaluated values of log
# function into this y[n,i] array
y=np.ones(12).reshape(4,3)
# creating an array of random numbers between 0-1 with 4 rows 3 columns
np.random.seed(0)
x=np.random.random((4,3))
L=3
r=4
eps=0.5
for n in range(3):
    for i in range(L):
        if i==0:
            x[n+1,i]=(1-eps)*logPresent(r,x) + 0.5*eps*(logLast(r,x)+logNext(r,x))
        elif i==L-1:
            x[n+1,i]=(1-eps)*logPresent(r,x) + 0.5*eps*(logPrev(r,x) + logInit(r,x))
        elif i > 0 and i < L - 1:
            x[n+1,i]=(1-eps)*logPresent(r,x) + 0.5*eps*(logPrev(r,x) +logNext(r,x))
            print(x)
This does give an output. Here it is:
[[0.5488135 0.71518937 0.60276338]
[0.94538775 0.82547604 0.64589411]
[0.43758721 0.891773 0.96366276]
[0.38344152 0.79172504 0.52889492]]
[[0.5488135 0.71518937 0.60276338]
[0.94538775 0.82547604 0.92306303]
[0.2449672 0.49731638 0.96366276]
[0.38344152 0.79172504 0.52889492]]
[[0.5488135 0.71518937 0.60276338]
[0.94538775 0.82547604 0.92306303]
[0.2449672 0.49731638 0.29789622]
[0.75613708 0.93368134 0.52889492]]
But I'm very sure this is not what I'm looking for.
Could you please help me figure out the correct way to iterate and loop over the CML equation in code, and suggest the changes I have to make? Thank you very much!
You'll have to think about the iterations and looping to be made to simulate this equation. It might be tedious, but that's the only way you can suggest me some changes in my code.
Your calculations seem fine to me. You could improve the speed by using vectorization along the space dimension and by reusing your intermediate results y. I restructured your program a little, but in essence it does the same thing as before. To me the results look plausible. The image shows the random initial vector in the first row, and as time goes on (top to bottom) the coupling comes into play and little islands and patterns form.
import numpy as np
import matplotlib.pyplot as plt
L = 128 # grid size
N = 128 # time steps
r = 4
eps = 0.5
# Create random values for the initial time step
np.random.seed(0)
x = np.zeros((N+1, L))
x[0, :] = np.random.random(L)
# Create a helper matrix to save and reuse part of the calculations
y = np.zeros((N, L))
# Indices for previous, present, next position for every point on the grid
idx_present = np.arange(L) # 0, 1, ..., L-2, L-1
idx_next = (idx_present + 1) % L # 1, 2, ..., L-1, 0
idx_prev = (idx_present - 1) % L # L-1, 0, ..., L-3, L-2
def log_vector(rr, xx):
    return rr * xx * (1 - xx)
# Loop over the time steps
for n in range(N):
    # Compute y once for the whole time step and reuse it
    # to build the next time step with coupling the neighbours
    y[n, :] = log_vector(rr=r, xx=x[n, :])
    x[n+1, :] = (1-eps)*y[n,idx_present] + 0.5*eps*(y[n,idx_prev]+y[n,idx_next])
# Plot the results
plt.imshow(x)
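As an alternative to the index arrays, the periodic neighbour lookup can also be written with np.roll. The following is only a sketch of a single time step under the same assumptions as the answer above (logistic map, periodic boundaries); the function name cml_step is hypothetical.

```python
import numpy as np

def cml_step(x, r=4.0, eps=0.5):
    """Advance one time step of the coupled map lattice with periodic boundaries."""
    y = r * x * (1 - x)          # logistic map applied at every site
    left = np.roll(y, 1)         # left[i] = y[i-1], wrapping around at i = 0
    right = np.roll(y, -1)       # right[i] = y[i+1], wrapping around at i = L-1
    return (1 - eps) * y + 0.5 * eps * (left + right)

rng = np.random.default_rng(0)
x0 = rng.random(8)               # random initial state on a small lattice
x1 = cml_step(x0)
```

Calling cml_step repeatedly and stacking the results row by row should give the same kind of space-time image as the loop above.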

Fastest way to convert a set of 3D points into image of heights in python

I am trying to convert a set of 3D points into a heightmap (a 2d image that shows the largest displacements of the points from the floor)
The only way I can come up with is writing a for loop that iterates through all points and updates the heightmap; this method is quite slow.
import numpy as np
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
heightmap = np.zeros((int(np.max(points[:,1])/heightmap_resolution) + 1,
                      int(np.max(points[:,0])/heightmap_resolution) + 1))
for point in points:
    y = int(point[1]/heightmap_resolution)
    x = int(point[0]/heightmap_resolution)
    if point[2] > heightmap[y][x]:
        heightmap[y][x] = point[2]
I wonder if there is a better way of doing this. Any improvement is greatly appreciated!
The intuition:
If you find yourself using a for loop with numpy, you probably need to check again whether numpy has an operation for it. I saw you wanted to compare items to get the max, and I wasn't sure if the structure was important, so I changed it.
The second point is that heightmap pre-allocates a lot of memory you aren't going to use. Try using a dictionary with a tuple (x,y) as the key, or this (a dataframe):
import numpy as np
import pandas as pd
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
points_df = pd.DataFrame(points, columns = ['x','y','z'])
#didn't know if you wanted to keep the x and y columns so I made new ones.
points_df['x_normalized'] = (points_df['x']/heightmap_resolution).astype(int)
points_df['y_normalized'] = (points_df['y']/heightmap_resolution).astype(int)
points_df.groupby(['x_normalized','y_normalized'])['z'].max()
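If the dense 2D array from the question is still wanted, the per-cell maximum can also be computed in NumPy alone with np.maximum.at. This is a minimal sketch, not a benchmarked solution; the function name build_heightmap is hypothetical, and it assumes non-negative x and y coordinates, as in the question.

```python
import numpy as np

def build_heightmap(points, resolution):
    """Return a 2D array holding the largest z value that falls in each (y, x) cell."""
    ix = (points[:, 0] / resolution).astype(int)   # column index from x
    iy = (points[:, 1] / resolution).astype(int)   # row index from y
    heightmap = np.zeros((iy.max() + 1, ix.max() + 1))
    # Unbuffered in-place maximum: repeated (iy, ix) pairs are all taken into account
    np.maximum.at(heightmap, (iy, ix), points[:, 2])
    return heightmap

rng = np.random.default_rng(0)
points = rng.uniform(0, 2, size=(1000, 3))
heightmap = build_heightmap(points, 0.02)
```

A plain fancy-indexed assignment would only keep one value per repeated index, which is why the unbuffered ufunc method is used here.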

Code of plotting a function in an interval (graph result)

I need your help with coding a graph result - plotting a function in an interval.
The question which I got is:
"Plot the following composite function. You probably want to use 'if' statements and a loop to 'build' it. Plot the function in the interval [-3, 5]."
f(x) = { |x|      x < 0      }
       { -1       0 <= x < 1 }
       { +1       1 <= x < 2 }
       { ln(x)    2 <= x     }
Can anyone please write code that shows a graph of the above function, without continuity in the graph's line between the pieces?
Thank you very much in advance!
Using if statement would be a more involved way. You can directly make use of NumPy indexing and masking to get the task done. Below is how I would do it.
Explanation: First you create a mesh of x data points in the interval [-3, 5]. Then you initialize a y-array of zeros with the same length. Next, you use the conditions on x to build boolean masks. For example, mask1 = ((x>=0) & (x<1)) defines a condition, and y[mask1] = -1 assigns the y-value at every position where that condition is True. You do this for all 4 conditions. I only created named masks for the middle two conditions and wrote the other two inline; you can also use 4 mask variables to do the same thing. It's a matter of personal taste.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-3, 5, 100)
y = np.zeros(len(x))
mask1 = ((x>=0) & (x<1))
mask2 = ((x>=1) & (x<2))
y[x<0] = np.abs(x[x<0])
y[mask1] = -1
y[mask2] = 1
y[x>=2] = np.log(x[x>=2])
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel(r'$f(x)$')
plt.show()
Usually, simple composite functions can easily be written like any other function by multiplying each piece by its respective condition(s). The only place one needs to be careful is the logarithm, which is not defined over the complete interval. This problem is circumvented by taking the absolute value here, because the logarithm is only relevant in the range x >= 2 anyway.
import numpy as np
import matplotlib.pyplot as plt
f = lambda x: np.abs(x)*(x<0) - ((0<=x) & (x < 1)) + ((1<=x) & (x < 2)) + np.log(np.abs(x))*(2<=x)
x = np.linspace(-3,5,200)
plt.plot(x,f(x))
plt.show()
According to a comment below the answer, one can also evaluate the function in each of the intervals separately,
intervals = [(-3, -1e-6), (0,1-1e-6), (1, 2-1e-6), (2,5)]
for (s,e) in intervals:
    x = np.linspace(s,e,100)
    plt.plot(x,f(x), color="C0")
Thank you very much for your help, it is really useful :)
In addition, I would like to know how I can eliminate the lines that connect each step of the interval to the next one.
I need to show only 4 separate pieces on the graph, one per step, without the "continuity" of the lines that connect them.
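One way to avoid the connecting lines is to evaluate the function once and then split the samples at the breakpoints, so that each piece can be drawn with its own plot call. The sketch below uses np.piecewise; the helper name segments is hypothetical, and plotting is left out so that the block depends only on NumPy.

```python
import numpy as np

def f(x):
    """Piecewise function from the question, evaluated with np.piecewise."""
    x = np.asarray(x, dtype=float)
    conditions = [x < 0, (x >= 0) & (x < 1), (x >= 1) & (x < 2), x >= 2]
    # Callables are applied only to the elements selected by their condition,
    # so np.log never sees values below 2
    return np.piecewise(x, conditions, [np.abs, -1.0, 1.0, np.log])

def segments(x):
    """Split x at the breakpoints 0, 1, 2 and return one (x, y) pair per non-empty piece."""
    masks = [x < 0, (x >= 0) & (x < 1), (x >= 1) & (x < 2), x >= 2]
    return [(x[m], f(x[m])) for m in masks if m.any()]

x = np.linspace(-3, 5, 801)
pieces = segments(x)
```

Each pair in pieces can then be passed to plt.plot(px, py, color="C0") in a loop, which draws four separate lines with no joins between them.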

plot a huge amount of data points

I have encountered a strange problem: I store a huge number of data points from a nonlinear equation in 3 arrays (x, y, and z) and then try to plot them in a 2D graph (a theta-phi plot, hence it is 2D).
I tried to reduce the number of points to plot by sampling one point out of every 20 data points, since the z-data is approximately periodic. I picked those points with a z value just above zero to make sure I picked one point for every period.
The problem arises when I try to do the above. I get only a very limited number of points on the graph, approximately 152 points, regardless of how I change my initial number of data points (as long as it surpasses a certain number, of course).
I suspect that I am using some command wrongly or that the capacity of the array is smaller than I expected (which seems unlikely). Could anyone help me find out where the problem is?
def drawstaticplot(m,n, d_n, n_o):
    counter=0
    for i in range(0,m):
        n=vector.rungekutta1(n, d_n)
        d_n=vector.rungekutta2(n, d_n, i)
        x1 = n[0]
        y1 = n[1]
        z1 = n[2]
        if i%20==0:
            xarray.append(x1)
            yarray.append(y1)
            zarray.append(z1)
    for j in range(0,(m//20)-20):
        if (((zarray[j]-n_o)>0) and ((zarray[j+1]-n_o)<0)):
            counter= counter +1
            print(zarray[j]-n_o,counter)
            plotthetaphi(xarray[j],yarray[j],zarray[j])
def plotthetaphi(x,y,z):
    phi= math.acos(z/math.sqrt(x**2+y**2+z**2))
    theta = math.acos(x/math.sqrt(x**2 + y**2))
    plot(theta, phi,'.',color='red')
Besides, I tried to apply the code in the following SO question to my code, I want a very similar result except that my data points are not randomly generated.
Shiuan,
I am still investigating your problem, however, a few notes:
Instead of looping and appending to an array, you could select every nth element:
# inside IPython console:
In [2]: a=np.arange(0,10)
In [3]: a[::2] # here we select every 2nd element.
Out[3]: array([0, 2, 4, 6, 8])
So instead of calculating Runge-Kutta on all elements of m:
new_m = m[::20] # select every 20th element of m.
Now call your function like this:
def drawstaticplot(new_m,n, d_n, n_o):
    n=vector.rungekutta1(n, d_n)
    d_n=vector.rungekutta2(n, d_n, i)
    x1 = n[0]
    y1 = n[1]
    z1 = n[2]
    xarray.append(x1)
    yarray.append(y1)
    zarray.append(z1)
    ...
About appending, and iterating over large data sets:
appending to a NumPy array (np.append) is slow in general, because it copies the whole array and then
stacks the new element. Instead, you already know the size of n, so you could do:
def drawstaticplot(new_m,n, d_n, n_o):
    # create the storage based on n,
    # notice I assumed that rungekutta returns n the size of new_m,
    # but you can change it.
    xarray,yarray,zarray = np.zeros(n.shape[0]),np.zeros(n.shape[0]), np.zeros(n.shape[0])
    counter = 0
    for idx, item in enumerate(new_m): # notice the function enumerate, make it your friend!
        n=vector.rungekutta1(n, d_n)
        d_n=vector.rungekutta2(n, d_n, item)
        #if i%20==0: # we don't need to check for the 20th element, m is already filtered...
        xarray[idx] = n[0]
        yarray[idx] = n[1]
        zarray[idx] = n[2]
        # is the second loop necessary? compare with the previous sample instead
        if idx > 0 and ((zarray[idx-1]-n_o)>0) and ((zarray[idx]-n_o)<0):
            counter += 1
            print(zarray[idx-1]-n_o,counter)
            plotthetaphi(xarray[idx-1],yarray[idx-1],zarray[idx-1])
You can use the approach suggested here:
Efficiently create a density plot for high-density regions, points for sparse regions
e.g. histogram where you have too many points and points where the density is low.
Or also you can use rasterized flag for matplotlib, which speeds up matplotlib.
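The sign-change test from the question (z[j]-n_o > 0 followed by z[j+1]-n_o < 0) and the angle computation can also be done on whole arrays at once, which allows a single plot call instead of one per point. This is only a sketch: the function names are hypothetical, and the cosine/sine signals merely stand in for the Runge-Kutta output, which is not available here.

```python
import numpy as np

def downward_crossings(z, n_o):
    """Indices j where z[j] - n_o > 0 and z[j+1] - n_o < 0."""
    d = np.asarray(z, dtype=float) - n_o
    return np.flatnonzero((d[:-1] > 0) & (d[1:] < 0))

def theta_phi(x, y, z):
    """Angles as defined in plotthetaphi, computed for whole arrays at once."""
    # np.clip guards against ratios marginally outside [-1, 1] due to rounding
    phi = np.arccos(np.clip(z / np.sqrt(x**2 + y**2 + z**2), -1.0, 1.0))
    theta = np.arccos(np.clip(x / np.sqrt(x**2 + y**2), -1.0, 1.0))
    return theta, phi

# Hypothetical periodic test signal standing in for the Runge-Kutta output
t = np.linspace(0, 10 * np.pi, 2000)
x, y, z = np.cos(3 * t), np.sin(3 * t), np.cos(t)
idx = downward_crossings(z, 0.0)
theta, phi = theta_phi(x[idx], y[idx], z[idx])
```

The resulting theta and phi arrays could then be drawn with one call such as plot(theta, phi, '.', color='red').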

Remove data points below a curve with python

I need to compare some theoretical data with real data in python.
The theoretical data comes from resolving an equation.
To improve the comparison I would like to remove data points that fall far from the theoretical curve. That is, I want to remove the points below and above the red dashed lines in the figure (made with matplotlib).
Both the theoretical curves and the data points are arrays of different length.
I can try to remove the points roughly by eye. For example, the first upper point can be detected using:
data2[(data2.redshift<0.4)&(data2.dmodulus>1)]
rec.array([('1997o', 0.374, 1.0203223485103787, 0.44354759972859786)], dtype=[('SN_name', '|S10'), ('redshift', '<f8'), ('dmodulus', '<f8'), ('dmodulus_error', '<f8')])
But I would like to use a less rough method.
So, can anyone help me finding an easy way of removing the problematic points?
Thank you!
This might be overkill and is based on your comment
Both the theoretical curves and the data points are arrays of
different length.
I would do the following:
1. Truncate the data set so that its x values lie within the max and min values of the theoretical set.
2. Interpolate the theoretical curve using scipy.interpolate.interp1d and the above truncated data x values. The reason for step (1) is to satisfy the constraints of interp1d.
3. Use numpy.where to find data y values that are outside the range of acceptable theory values.
4. DON'T discard these values, as was suggested in comments and other answers. If you want clarity, point them out by plotting the 'inliers' in one color and the 'outliers' in another color.
Here's a script that is close to what you are looking for, I think. It hopefully will help you accomplish what you want:
import numpy as np
import scipy.interpolate as interpolate
import matplotlib.pyplot as plt
# make up data
def makeUpData():
    '''Make many more data points (x,y,yerr) than theory (x,y),
    with theory yerr corresponding to a constant "sigma" in y,
    about x,y value'''
    NX= 150
    dataX = (np.random.rand(NX)*1.1)**2
    dataY = (1.5*dataX+np.random.rand(NX)**2)*dataX
    dataErr = np.random.rand(NX)*dataX*1.3
    theoryX = np.arange(0,1,0.1)
    theoryY = theoryX*theoryX*1.5
    theoryErr = 0.5
    return dataX,dataY,dataErr,theoryX,theoryY,theoryErr
def makeSameXrange(theoryX,dataX,dataY):
    '''
    Truncate the dataX and dataY ranges so that dataX min and max are within
    the max and min of theoryX.
    '''
    minT,maxT = theoryX.min(),theoryX.max()
    goodIdxMax = np.where(dataX<maxT)
    goodIdxMin = np.where(dataX[goodIdxMax]>minT)
    return (dataX[goodIdxMax])[goodIdxMin],(dataY[goodIdxMax])[goodIdxMin]
# take 'theory' and get values at every 'data' x point
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])
# collect valid points
def findInlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where theoryY-theoryErr < dataY < theoryY+theoryErr and return
    the inlier points.'''
    withinUpper = np.where(dataY<(interpTheoryY+theoryErr))
    withinLower = np.where(dataY[withinUpper]
                           >(interpTheoryY[withinUpper]-theoryErr))
    return (dataX[withinUpper])[withinLower],(dataY[withinUpper])[withinLower]
def findOutlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where dataY > theoryY+theoryErr or dataY < theoryY-theoryErr and
    return the outlier points above and below the band.'''
    aboveUpper = np.where(dataY>(interpTheoryY+theoryErr))
    belowLower = np.where(dataY<(interpTheoryY-theoryErr))
    return (dataX[aboveUpper],dataY[aboveUpper],
            dataX[belowLower],dataY[belowLower])
if __name__ == "__main__":
    dataX,dataY,dataErr,theoryX,theoryY,theoryErr = makeUpData()
    TruncDataX,TruncDataY = makeSameXrange(theoryX,dataX,dataY)
    interpTheoryY = theoryYatDataX(theoryX,theoryY,TruncDataX)
    inDataX,inDataY = findInlierSet(TruncDataX,TruncDataY,interpTheoryY,
                                    theoryErr)
    outUpX,outUpY,outDownX,outDownY = findOutlierSet(TruncDataX,
                                                     TruncDataY,
                                                     interpTheoryY,
                                                     theoryErr)
    #print inlierIndex
    fig = plt.figure()
    ax = fig.add_subplot(211)
    ax.errorbar(dataX,dataY,dataErr,fmt='.',color='k')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    ax = fig.add_subplot(212)
    ax.plot(inDataX,inDataY,'ko')
    ax.plot(outUpX,outUpY,'bo')
    ax.plot(outDownX,outDownY,'ro')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    fig.savefig('findInliers.png')
This figure is the result:
In the end I used some of Yann's code:
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])
def findOutlierSet(data,interpTheoryY,theoryErr):
    '''Find where dmodulus lies outside theoryY-theoryErr .. theoryY+theoryErr
    and return the inlier and outlier records.'''
    up = np.where(data.dmodulus > (interpTheoryY+theoryErr))
    low = np.where(data.dmodulus < (interpTheoryY-theoryErr))
    # join all the index together in a flat array
    out = np.hstack([up,low]).ravel()
    index = np.array(np.ones(len(data),dtype=bool))
    index[out]=False
    datain = data[index]
    dataout = data[out]
    return datain, dataout
def selectdata(data,theoryX,theoryY):
    """
    Data selection: z<1 and +-0.5 LFLRW separation
    """
    # Select data with redshift z<1
    data1 = data[data.redshift < 1]
    # From modulus to light distance:
    data1.dmodulus, data1.dmodulus_error = modulus2distance(data1.dmodulus,data1.dmodulus_error)
    # redshift data order
    data1.sort(order='redshift')
    # Outliers: distance to LFLRW curve bigger than +-0.5
    theoryErr = 0.5
    # Theory curve Interpolation to get the same points as data
    interpy = theoryYatDataX(theoryX,theoryY,data1.redshift)
    datain, dataout = findOutlierSet(data1,interpy,theoryErr)
    return datain, dataout
Using those functions I can finally obtain:
Thank you all for your help.
Just look at the difference between the red curve and the points; if it is bigger than the difference between the red curve and the dashed red curve, remove the point.
diff=np.abs(points-red_curve)
index= (diff<=(dashed_curve-red_curve))  # True for the points to keep
filtered=points[index]
But please take the comment from NickLH seriously. Your data looks pretty good without any filtering; your "outliers" all have a very big error and won't affect the fit much.
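The difference test above needs the curve and the points on the same x grid, which the question says is not the case. A minimal sketch of one way to handle that with np.interp follows; the function name split_by_band and the small arrays are made up for illustration, and theory_x is assumed to be sorted in increasing order.

```python
import numpy as np

def split_by_band(data_x, data_y, theory_x, theory_y, half_width):
    """Return a boolean mask that is True where data lies within the band around the theory curve."""
    # np.interp requires theory_x to be increasing; outside its range it holds the end values
    theory_at_data = np.interp(data_x, theory_x, theory_y)
    return np.abs(data_y - theory_at_data) <= half_width

# Made-up example: 11 theory samples, 5 data points at different x positions
theory_x = np.linspace(0.0, 1.0, 11)
theory_y = 1.5 * theory_x**2
data_x = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
data_y = np.array([0.00, 1.00, 0.40, 0.10, 1.40])
inside = split_by_band(data_x, data_y, theory_x, theory_y, half_width=0.5)
inliers = data_y[inside]
outliers = data_y[~inside]
```

Keeping the mask rather than deleting rows makes it easy to plot inliers and outliers in different colors instead of discarding anything.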
Either you could use the numpy.where() to identify which xy pairs meet your plotting criteria, or perhaps enumerate to do pretty much the same thing. Example:
x_list = [ 1, 2, 3, 4, 5, 6 ]
y_list = ['f','o','o','b','a','r']
result = [y_list[i] for i, x in enumerate(x_list) if 2 <= x < 5]
print(result)
I'm sure you could change the conditions so that '2' and '5' in the above example are the functions of your curves
