This is in astronomy, but I think my question is probably very elementary - I'm not very experienced, I apologise.
I am plotting the relationship between the colour of a star-forming galaxy (y axis) and its redshift (x axis). The plot is a line that rises from around 0 up to maybe 9, then decays again to about -2. The peak (~9 in colour) is at a redshift of around 4, and I want to find where the peak is more exactly. The dependence on redshift is given by quite a confusing function, and I can't figure out how to differentiate it, or else I would just do that.
Could I maybe differentiate the complicated redshift (z) function? If so, how?
If not, how could I estimate a peak graphically/numerically?
Sorry for the very basic question and thank you very much in advance. My code is below.
import numpy as np
import matplotlib.pyplot as plt
import IGM
import scipy.integrate as integrate
SF = np.load('StarForming.npy')
lam = SF[0]
SED = SF[1]
filters = ['f435w','f606w','f814w','f105w','f125w','f140w','f160w']
filters_wl = {'f435w':0.435,'f606w':0.606,'f814w':0.814,'f105w':1.05,'f125w':1.25,'f140w':1.40,'f160w':1.60} # filter dictionary to give wavelengths of filters in microns
fT = {} # this is a dictionary
for f in filters:
    data = np.loadtxt(f+'.txt').T
    fT[f] = data
fluxes = {}
for f in filters: fluxes[f] = [] # make empty list for each
redshifts = np.arange(0.0,10.0,0.1) # redshifts going from 0 to 10
for z in redshifts:
    lamz = lam * (1. + z)
    obsSED = SED * IGM.madau(lamz, z)
    for f in filters:
        newT = np.interp(lamz,fT[f][0],fT[f][1]) # for each filter, refer back
        bb_flux = integrate.trapz((1./lamz)*obsSED*newT,x=lamz)/integrate.trapz((1./lamz)*newT,x=lamz)
        # 1st bit integrates, 2nd bit divides by area under filter to normalise filter
        # loops over all z, for all z it creates a new SED, redshift wl grid
        fluxes[f].append(bb_flux)
for f in filters: fluxes[f] = np.array(fluxes[f])
colour = -2.5*np.log10(fluxes['f435w']/fluxes['f606w'])
plt.plot(redshifts,colour)
plt.xlabel('Redshift')
plt.ylabel('Colour')
plt.show()
I do not have high enough reputation to comment, but this may solve your problem, so I guess it's an answer. Store all your y-coordinates in a list, then use max(list) to find the maximum. If you want an ordered pair, store your coordinates as (y, x) tuples and use max(list); tuples are compared by their first element first, so the result is the pair with the largest y.
lst = [(3,2), (4,1), (1, 200)]
max(lst)
yields (4,1)
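Since the colour values in the question are already a NumPy array, the same idea can be sketched with np.argmax, optionally refined by fitting a parabola through the three samples around the maximum. This is only a minimal sketch: the colour curve below is a synthetic stand-in (an assumption, not the real galaxy data), and the refinement assumes the maximum is not at either end of the array. If the real colour array contains NaN or inf values, np.nanargmax or masking may be needed first.

```python
import numpy as np

# Synthetic stand-in for the colour curve (assumption): a bump peaking at z = 4.03
redshifts = np.arange(0.0, 10.0, 0.1)
colour = 9.0 - (redshifts - 4.03) ** 2

# Coarse estimate: the grid point with the largest colour
i = np.argmax(colour)
z_peak_grid = redshifts[i]

# Refinement: fit a parabola through the maximum and its two neighbours,
# then take the vertex -b / (2a) of a*z**2 + b*z + c
a, b, c = np.polyfit(redshifts[i - 1:i + 2], colour[i - 1:i + 2], 2)
z_peak_fit = -b / (2 * a)

print(z_peak_grid, z_peak_fit)
```

The grid estimate can only be as fine as the redshift step (0.1 here), while the parabola vertex can land between grid points. Evaluating the colour on a finer redshift grid near the peak is another option.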
I am trying to convert a set of 3D points into a heightmap (a 2d image that shows the largest displacements of the points from the floor)
The only way I can come up with is writing a for loop that iterates through all points and updates the heightmap, but this method is quite slow.
import numpy as np
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
heightmap = np.zeros((int(np.max(points[:,1])/heightmap_resolution) + 1,
int(np.max(points[:,0])/heightmap_resolution) + 1))
for point in points:
    y = int(point[1]/heightmap_resolution)
    x = int(point[0]/heightmap_resolution)
    if point[2] > heightmap[y][x]:
        heightmap[y][x] = point[2]
I wonder if there is a better way of doing this. Any improvement is greatly appreciated!
The intuition:
If you find yourself using a for loop with numpy, you probably need to check again whether numpy has an operation for it. I saw you wanted to compare items to get the max, and I wasn't sure if the structure was important, so I changed it.
The second point is that heightmap pre-allocates a lot of memory you aren't going to use. Try using a dictionary with an (x, y) tuple as the key, or a DataFrame like this:
import numpy as np
import pandas as pd
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
points_df = pd.DataFrame(points, columns = ['x','y','z'])
#didn't know if you wanted to keep the x and y columns so I made new ones.
points_df['x_normalized'] = (points_df['x']/heightmap_resolution).astype(int)
points_df['y_normalized'] = (points_df['y']/heightmap_resolution).astype(int)
points_df.groupby(['x_normalized','y_normalized'])['z'].max()
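If the dense 2D array from the question is still wanted, one NumPy-only alternative is np.maximum.at, which applies an unbuffered element-wise maximum at repeated indices. The following is a minimal sketch with a smaller random point set than in the question (an assumption made to keep it quick); the names rows and cols are introduced here for illustration.

```python
import numpy as np

heightmap_resolution = 0.02

# Smaller random point cloud than in the question (assumption, for speed)
rng = np.random.default_rng(0)
points = rng.uniform(0, 2, size=(10_000, 3))

# Integer cell indices for every point, computed in one vectorised step
cols = (points[:, 0] / heightmap_resolution).astype(int)
rows = (points[:, 1] / heightmap_resolution).astype(int)

heightmap = np.zeros((rows.max() + 1, cols.max() + 1))

# For each (row, col) pair keep the largest z seen; unlike
# heightmap[rows, cols] = z, this handles repeated indices correctly
np.maximum.at(heightmap, (rows, cols), points[:, 2])
```

As in the original loop, cells that receive no points stay at 0, so this assumes heights are non-negative.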
I need your help with coding a graph result - plotting a function in an interval.
The question which I got is:
"Plot the following composite function. You probably want to use 'if' statements and a loop to 'build' it. Plot the function in the interval from [-3, 5].
f(x) = { |x|     x < 0      }
       { -1      0 <= x < 1 }
       { +1      1 <= x < 2 }
       { ln(x)   2 <= x     }"
Can anyone please write code for me that plots the above function as a graph, without lines connecting the separate pieces of the function?
Thank you very much in advance!
Using if statements would be the more involved way. You can directly make use of NumPy indexing and masking to get the task done. Below is how I would do it.
Explanation: First you create a mesh of x data points in the interval [-3, 5]. Then you initialize a zero-filled y array of the same length. Next, you use the conditions on x to select elements of the x array. This is done with boolean masks: mask1 = ((x>=0) & (x<1)) defines a condition, and y[mask1] = -1 assigns -1 to y at exactly those positions where the condition is True. You do this for all four conditions. I only created named masks for the middle two conditions; you could also use four mask variables to do the same thing. It's a matter of personal taste.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-3, 5, 100)
y = np.zeros(len(x))
mask1 = ((x>=0) & (x<1))
mask2 = ((x>=1) & (x<2))
y[x<0] = np.abs(x[x<0])
y[mask1] = -1
y[mask2] = 1
y[x>=2] = np.log(x[x>=2])
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel(r'$f(x)$')
plt.show()
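The same masking idea can also be written with np.piecewise, which takes a list of conditions and a matching list of functions or constants. This is a sketch of an alternative, not the method used above; the helper name f is introduced here for illustration. Each function is only evaluated on the elements where its condition holds, so np.log never sees values below 2.

```python
import numpy as np

def f(x):
    """Piecewise function from the question, evaluated on a float array."""
    x = np.asarray(x, dtype=float)
    conditions = [x < 0, (x >= 0) & (x < 1), (x >= 1) & (x < 2), x >= 2]
    # Scalars in the second list are treated as constant functions
    pieces = [np.abs, -1.0, 1.0, np.log]
    return np.piecewise(x, conditions, pieces)

x = np.linspace(-3, 5, 100)
y = f(x)
```

The resulting x and y can be passed to plt.plot exactly as in the code above.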
Usually, simple composite functions can easily be written like any other function by multiplying each piece by its respective condition(s). The only place one needs to be careful is the logarithm, which is not defined over the complete interval. This problem is circumvented here by taking the absolute value, because the logarithm is only relevant in the range x >= 2 anyway.
import numpy as np
import matplotlib.pyplot as plt
f = lambda x: np.abs(x)*(x<0) - ((0<=x) & (x < 1)) + ((1<=x) & (x < 2)) + np.log(np.abs(x))*(2<=x)
x = np.linspace(-3,5,200)
plt.plot(x,f(x))
plt.show()
According to a comment below the answer, one can also evaluate the function in each of the intervals separately,
intervals = [(-3, -1e-6), (0,1-1e-6), (1, 2-1e-6), (2,5)]
for (s,e) in intervals:
    x = np.linspace(s,e,100)
    plt.plot(x,f(x), color="C0")
Thank you very much for your help, it is really useful :)
In addition, I would like to know how I can eliminate the lines that connect each piece of the function to the next one.
I need to show only 4 separate curves on the graph, one per interval, without the "continuity" lines that connect them.
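One way to get rid of the connecting lines while still calling plt.plot only once is to insert NaN values at the discontinuities, since matplotlib leaves a gap in a line wherever the data is NaN. The sketch below is an assumption-laden illustration: the helper f, the grid of 801 points, and the jump threshold of 0.2 are all choices made here, and the threshold only works because the jumps of this particular function are much larger than the step-to-step change of its smooth pieces.

```python
import numpy as np

def f(x):
    """Piecewise function from the question."""
    conditions = [x < 0, (x >= 0) & (x < 1), (x >= 1) & (x < 2), x >= 2]
    return np.piecewise(x, conditions, [np.abs, -1.0, 1.0, np.log])

x = np.linspace(-3, 5, 801)
y = f(x)

# Positions where consecutive samples differ by more than the threshold
jumps = np.flatnonzero(np.abs(np.diff(y)) > 0.2) + 1

# Insert a NaN before each of those positions in both arrays
x_broken = np.insert(x, jumps, np.nan)
y_broken = np.insert(y, jumps, np.nan)
```

Plotting x_broken against y_broken with plt.plot should then draw separate curves with no vertical connectors between them.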
I want to find the x value for a given y (I want to know at what t, X, the conversion, reaches 0.9). There are questions like this all over SO and they say use np.interp but I did that in two ways and both were wrong. The code is:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
# Create time domain
t = np.linspace(0,4000,100)
# Parameters
A = 1.5*10**(-3) # Arrhenius constant
T = 300 # Temperature [K]
R = 8.31 # Ideal gas constant [J/molK]
E_a= 1000 # Activation energy [J/mol]
V = 5 # Reactor volume [m3]
# Initial condition
C_A0 = 0.1 # Initial concentration [mol/m3]
def dNdt(C_A,t):
    r_A = (-k*C_A)/V
    dNdt = r_A*V
    return dNdt
k=A*np.exp(-E_a/(R*T))
C_A = odeint(dNdt,C_A0,t)
N_A0 = C_A0*V
N_A = C_A*V
X = (N_A0 - N_A)/N_A0
# Plot
plt.figure()
plt.plot(t,X,'b-',label='Conversion')
plt.plot(t,C_A,'r--',label='Concentration')
plt.legend(loc='best')
plt.grid(True)
plt.xlabel('Time [s]')
plt.ylabel('Conversion')
Looking at the graph, at roughly t=2300, the conversion is 0.9.
Method 1:
I wrote this function so I can ask for any given point and get the x-value:
def find(x_val,f):
    f = np.reshape(f,len(f))
    global t
    t = np.reshape(t,len(t))
    return np.interp(x_val,t,f)
print('Conversion of 0.9 is reached at: ',int(find(0.9,X)),'s')
When I call the function at 0.9 I get 0.0008858 which gets rounded to 0 which is wrong. I thought maybe something is going wrong when I declare global t??
Method 2:
When I do it outside the function; so I manually reshape X and t and use np.interp(0.9,t,X), the output is 0.9.
X = np.reshape(X,len(X))
t = np.reshape(t,len(t))
print(np.interp(0.9,t,X))
I thought I made a mistake in the order of the variables so I did np.interp(0.9,X,t), and again it surprised me with 0.9.
I'm unsure as to where I'm going wrong. Any help would be appreciated. Many thanks :)
On your plot, t is horizontal and X is vertical. You want to find the horizontal coordinate where the vertical one is 0.9. That is, find t for a given X. Saying
find x value for a given y
is bound to lead to confusion, as it did here.
The problem is solved with
print(np.interp(0.9, X.ravel(), t)) # prints 2292.765497278863
(It's better to use ravel for flattening, instead of the reshape as you did). There is no need to reshape t, which is already one-dimensional.
I did np.interp(0.9,X,t), and again it surprised me with 0.9.
That sounds unlikely, you probably mistyped. This was the correct order.
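To make the argument order concrete, here is a small sketch that replaces the odeint solution with the closed-form solution of the same first-order decay, X(t) = 1 - exp(-k t). That substitution is an assumption made to keep the example self-contained. np.interp(x, xp, fp) looks up x among the xp values and returns the matching fp value, and xp must be increasing, which X is here.

```python
import numpy as np

# Same parameters as in the question
A, T, R, E_a = 1.5e-3, 300, 8.31, 1000
k = A * np.exp(-E_a / (R * T))

t = np.linspace(0, 4000, 100)
X = 1.0 - np.exp(-k * t)  # analytic conversion, standing in for the odeint result

# Correct order: look up X = 0.9 among the X values and return the matching t
t_90 = np.interp(0.9, X, t)

# Order used in Method 1: this evaluates X at t = 0.9 s instead,
# which is why a value of about 0.00089 came out
wrong = np.interp(0.9, t, X)

print(t_90, wrong)
```

The exact answer for this model is ln(10)/k, so t_90 can be checked against that value; the small difference comes from linear interpolation on a grid of 100 points.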
I need to generate a HEALPix map (using healpy) from random $a_{\ell m}$, for a spin-2 function.
Schematically, this should look like this:
import healpy as hp
nside = 16 # for example
for el in range(1, L+1): #loop over ell mode
    for m in range(-el,el): #for each ell mode loop over m
        ind = hp.sphtfunc.Alm.getidx(nside, el, m)
        if m == 0:
            a_lm[ind] = np.random.randn()
        else:
            a_lm[ind] = np.random.randn() + 1j * np.random.randn()
a_tmp = hp.sphtfunc.alm2map(a_lm, nside, pol=True)
My two questions are:
1) how do I initialise a_lm ? Specifically, what would be its dimension, using
a_lm = np.zeros(???)
2) if I understood correctly, the output a_tmp is a 1 dimensional list. How do I reshape it into a two-dimensional list (the map) for plotting?
1) What properties do you want your alm to have? You could also just assume a certain power spectrum (C_ell) and use hp.synalm() or hp.synfast().
For the initialization, you've already implemented that m goes from $-\ell$ to $+\ell$, so you have a one-dimensional array of length $\sum_{\ell=0}^{\ell_{\max}} (2\ell+1)$. Doing the math should give you the length you need.
2) For the plotting, you could just directly generate a random map and then use e.g. hp.mollview(), which takes the 1-dimensional HEALPix map.
Alternatively, you can use hp.alm2map() to convert your alm to a map.
I also suggest you check out the tutorial for the plotting.
Usually we can follow the following steps to get the length of a_lm.
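import healpy as hp
import numpy as np
nside = 16
# Get the maximum multipole with the current nside
lmax = 3*nside - 1 # This can vary according to the use. In cosmology, the common value is 2*nside
alm_len = hp.Alm.getsize(lmax)
# alm coefficients are complex, so allocate a complex array (to be filled in afterwards)
a_lm = np.empty(alm_len, dtype=complex)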
I think the tutorial linked in @Daniel's answer is a good resource for plotting HEALPix maps.
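The two lengths discussed in these answers can be checked by simply counting $(\ell, m)$ pairs in plain Python. The function names below are hypothetical helpers introduced for illustration. As far as I know, healpy stores only the $m \geq 0$ coefficients, so the smaller count is the one I would expect hp.Alm.getsize(lmax) to return; treat that as an assumption to verify against the healpy documentation.

```python
def count_alm_full(lmax):
    """Number of (l, m) pairs with -l <= m <= l for l = 0..lmax."""
    return sum(2 * l + 1 for l in range(lmax + 1))

def count_alm_nonneg(lmax):
    """Number of (l, m) pairs with 0 <= m <= l for l = 0..lmax."""
    return sum(l + 1 for l in range(lmax + 1))

nside = 16
lmax = 3 * nside - 1

# Closed forms: (lmax + 1)**2 and (lmax + 1) * (lmax + 2) // 2
print(count_alm_full(lmax), count_alm_nonneg(lmax))
```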
I need to compare some theoretical data with real data in python.
The theoretical data comes from resolving an equation.
To improve the comparison I would like to remove data points that fall far from the theoretical curve. I mean, I want to remove the points below and above the red dashed lines in the figure (made with matplotlib).
Both the theoretical curves and the data points are arrays of different length.
I can try to remove the points roughly by eye. For example, the first upper point can be detected using:
data2[(data2.redshift<0.4)&(data2.dmodulus>1)]
rec.array([('1997o', 0.374, 1.0203223485103787, 0.44354759972859786)], dtype=[('SN_name', '|S10'), ('redshift', '<f8'), ('dmodulus', '<f8'), ('dmodulus_error', '<f8')])
But I would like to use a more systematic way than judging by eye.
So, can anyone help me finding an easy way of removing the problematic points?
Thank you!
This might be overkill and is based on your comment
Both the theoretical curves and the data points are arrays of
different length.
I would do the following:
1. Truncate the data set so that its x values lie within the max and min values of the theoretical set.
2. Interpolate the theoretical curve using scipy.interpolate.interp1d and the above truncated data x values. The reason for step (1) is to satisfy the constraints of interp1d.
3. Use numpy.where to find data y values that are outside the range of acceptable theory values.
4. DON'T discard these values, as was suggested in comments and other answers. If you want, for clarity, point them out by plotting the 'inliers' in one color and the 'outliers' in another color.
Here's a script that is close to what you are looking for, I think. It hopefully will help you accomplish what you want:
import numpy as np
import scipy.interpolate as interpolate
import matplotlib.pyplot as plt
# make up data
def makeUpData():
    '''Make many more data points (x,y,yerr) than theory (x,y),
    with theory yerr corresponding to a constant "sigma" in y,
    about x,y value'''
    NX = 150
    dataX = (np.random.rand(NX)*1.1)**2
    dataY = (1.5*dataX+np.random.rand(NX)**2)*dataX
    dataErr = np.random.rand(NX)*dataX*1.3
    theoryX = np.arange(0,1,0.1)
    theoryY = theoryX*theoryX*1.5
    theoryErr = 0.5
    return dataX,dataY,dataErr,theoryX,theoryY,theoryErr
def makeSameXrange(theoryX,dataX,dataY):
    '''
    Truncate the dataX and dataY ranges so that dataX min and max are within
    the max and min of theoryX.
    '''
    minT,maxT = theoryX.min(),theoryX.max()
    goodIdxMax = np.where(dataX<maxT)
    goodIdxMin = np.where(dataX[goodIdxMax]>minT)
    return (dataX[goodIdxMax])[goodIdxMin],(dataY[goodIdxMax])[goodIdxMin]
# take 'theory' and get values at every 'data' x point
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])
# collect valid points
def findInlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where theoryY-theoryErr < dataY < theoryY+theoryErr and return
    the x and y values of those inliers.'''
    withinUpper = np.where(dataY<(interpTheoryY+theoryErr))
    withinLower = np.where(dataY[withinUpper]
                           >(interpTheoryY[withinUpper]-theoryErr))
    return (dataX[withinUpper])[withinLower],(dataY[withinUpper])[withinLower]
def findOutlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where dataY > theoryY+theoryErr or dataY < theoryY-theoryErr and
    return the x and y values of the outliers above and below the band.'''
    aboveUpper = np.where(dataY>(interpTheoryY+theoryErr))
    belowLower = np.where(dataY<(interpTheoryY-theoryErr))
    return (dataX[aboveUpper],dataY[aboveUpper],
            dataX[belowLower],dataY[belowLower])
if __name__ == "__main__":
    dataX,dataY,dataErr,theoryX,theoryY,theoryErr = makeUpData()
    TruncDataX,TruncDataY = makeSameXrange(theoryX,dataX,dataY)
    interpTheoryY = theoryYatDataX(theoryX,theoryY,TruncDataX)
    inDataX,inDataY = findInlierSet(TruncDataX,TruncDataY,interpTheoryY,
                                    theoryErr)
    outUpX,outUpY,outDownX,outDownY = findOutlierSet(TruncDataX,
                                                     TruncDataY,
                                                     interpTheoryY,
                                                     theoryErr)
    #print inlierIndex
    fig = plt.figure()
    ax = fig.add_subplot(211)
    ax.errorbar(dataX,dataY,dataErr,fmt='.',color='k')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    ax = fig.add_subplot(212)
    ax.plot(inDataX,inDataY,'ko')
    ax.plot(outUpX,outUpY,'bo')
    ax.plot(outDownX,outDownY,'ro')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    fig.savefig('findInliers.png')
This figure is the result:
In the end I used some of Yann's code:
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])
def findOutlierSet(data,interpTheoryY,theoryErr):
    '''Find where dataY is outside theoryY-theoryErr < dataY < theoryY+theoryErr
    and return the inlier and outlier records separately.'''
    up = np.where(data.dmodulus > (interpTheoryY+theoryErr))
    low = np.where(data.dmodulus < (interpTheoryY-theoryErr))
    # join all the index together in a flat array
    out = np.hstack([up,low]).ravel()
    index = np.array(np.ones(len(data),dtype=bool))
    index[out]=False
    datain = data[index]
    dataout = data[out]
    return datain, dataout
def selectdata(data,theoryX,theoryY):
    """
    Data selection: z<1 and +-0.5 LFLRW separation
    """
    # Select data with redshift z<1
    data1 = data[data.redshift < 1]
    # From modulus to light distance:
    data1.dmodulus, data1.dmodulus_error = modulus2distance(data1.dmodulus,data1.dmodulus_error)
    # redshift data order
    data1.sort(order='redshift')
    # Outliers: distance to LFLRW curve bigger than +-0.5
    theoryErr = 0.5
    # Theory curve Interpolation to get the same points as data
    interpy = theoryYatDataX(theoryX,theoryY,data1.redshift)
    datain, dataout = findOutlierSet(data1,interpy,theoryErr)
    return datain, dataout
Using those functions I can finally obtain:
Thank you all for your help.
Just look at the difference between the red curve and the points; if it is bigger than the difference between the red curve and the dashed red curve, remove the point.
diff = np.abs(points - red_curve)
# keep only the points that lie within the dashed band
index = diff <= (dashed_curve - red_curve)
filtered = points[index]
But please take the comment from NickLH seriously. Your data looks pretty good without any filtering; your "outliers" all have a very big error and won't affect the fit much.
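Because the theory curve and the data have different lengths, the subtraction above only works once the curve has been evaluated at the data's x positions. Here is a minimal NumPy-only sketch of that step using np.interp and a boolean mask. All values are made up for illustration (they are not the supernova data), and the names theory_x, theory_y, data_x, data_y and tolerance are hypothetical.

```python
import numpy as np

# Made-up theory curve on a coarse grid (assumption)
theory_x = np.linspace(0.0, 1.0, 11)
theory_y = 1.5 * theory_x ** 2
tolerance = 0.5  # half-width of the dashed band

# Made-up data points at different x positions than the theory grid
data_x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
data_y = np.array([0.0, 1.0, 0.4, 0.1, 1.3])

# Evaluate the theory curve at every data x by linear interpolation
theory_at_data = np.interp(data_x, theory_x, theory_y)

# True where a point lies inside the band around the theory curve
inlier = np.abs(data_y - theory_at_data) <= tolerance

print(data_x[inlier], data_x[~inlier])
```

Keeping the mask rather than deleting rows makes it easy to plot inliers and outliers in different colours. Note that np.interp holds the end values constant outside the range of theory_x, so data beyond the theory grid should be handled separately.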
Either you could use the numpy.where() to identify which xy pairs meet your plotting criteria, or perhaps enumerate to do pretty much the same thing. Example:
x_list = [ 1, 2, 3, 4, 5, 6 ]
y_list = ['f','o','o','b','a','r']
result = [y_list[i] for i, x in enumerate(x_list) if 2 <= x < 5]
print(result)
I'm sure you could change the conditions so that '2' and '5' in the above example are the functions of your curves