I have created a graph in Python, but I now need to take a section of the graph and expand it using a small range of the original data. I don't know how to find the row numbers of the results that form that range, or how to create a graph using just those results from the file. This is the code I have for the graph:
import numpy as np
import matplotlib.pyplot as plt
#variable for data to plot
spec_to_plot = "SN2012fr_20121129.42_wifes_BR.dat"
#tells python where to look for the file
spec_directory = '/home/fh1u16/Documents/spectra/'
data = np.loadtxt(spec_directory + spec_to_plot, dtype=float)
x = data[:,0]
y = data[:,1]
plt.plot(x, y)
plt.xlabel("Wavelength")
plt.ylabel("Flux")
plt.title(spec_to_plot)
plt.show()
Edit: the data in the first column runs from 3.5e+3 to 9.9e+3. I need to plot another graph using just the rows whose first-column value lies between 5.5e+3 and 6e+3. Hope this makes a bit more sense?
Python version 2.7
If I understand you correctly, you could do it this way:
my_slice = slice(np.argwhere(x > 5.5e3)[0, 0], np.argwhere(x > 6e3)[0, 0])
x = data[my_slice,0]
y = data[my_slice,1]
np.argwhere(x > 5.5e3)[0, 0] is the index of the first occurrence of x > 5.5e3, and likewise for the end of the slice (assuming your data is sorted).
A more general way, which works even if your data is not sorted:
mask = (x>5.5e3) & (x<6e3)
x = data[mask, 0]
y = data[mask, 1]
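Putting the mask together with the plotting code from the question, a minimal sketch of the zoomed-in plot (reusing the file path from the question) might look like this:
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('/home/fh1u16/Documents/spectra/SN2012fr_20121129.42_wifes_BR.dat')

# keep only the rows whose first-column (wavelength) value falls in the 5.5e3-6e3 window
mask = (data[:, 0] > 5.5e3) & (data[:, 0] < 6e3)

plt.plot(data[mask, 0], data[mask, 1])
plt.xlabel("Wavelength")
plt.ylabel("Flux")
plt.show()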
Solved by using
plt.axis([5500, 6000, 0, 8e-15])
Thanks for the help.
I am trying to convert a set of 3D points into a heightmap (a 2D image that shows the largest displacement of the points from the floor).
The only way I can come up with is writing a for loop that iterates through all points and updates the heightmap, but this method is quite slow.
import numpy as np
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
heightmap = np.zeros((int(np.max(points[:,1])/heightmap_resolution) + 1,
                      int(np.max(points[:,0])/heightmap_resolution) + 1))
for point in points:
    y = int(point[1]/heightmap_resolution)
    x = int(point[0]/heightmap_resolution)
    if point[2] > heightmap[y][x]:
        heightmap[y][x] = point[2]
I wonder if there is a better way of doing this. Any improvement is greatly appreciated!
The intuition:
If you find yourself using a for loop with numpy, you probably need to check again whether numpy has an operation for it. I saw you wanted to compare items to get the max, and I wasn't sure if the structure was important, so I changed it.
The second point is that heightmap pre-allocates a lot of memory you aren't going to use. Try using a dictionary with a tuple (x, y) as the key, or a DataFrame as below:
import numpy as np
import pandas as pd
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
points_df = pd.DataFrame(points, columns = ['x','y','z'])
#didn't know if you wanted to keep the x and y columns so I made new ones.
points_df['x_normalized'] = (points_df['x']/heightmap_resolution).astype(int)
points_df['y_normalized'] = (points_df['y']/heightmap_resolution).astype(int)
points_df.groupby(['x_normalized','y_normalized'])['z'].max()
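If a dense 2D array is still wanted at the end (as in the original loop), the grouped maxima can be unstacked back into one; this is only a sketch, and it assumes cells with no points should simply stay at zero:
# group with y first so rows correspond to y and columns to x, as in the original heightmap
grouped = points_df.groupby(['y_normalized', 'x_normalized'])['z'].max()
heightmap = grouped.unstack(fill_value=0).to_numpy()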
I need to create a histogram of a very large data set in python 3. However, I cannot use a list to create a histogram because the list would be too large given my data. I need a way to update a histogram as each data point is created. That way my computer is only ever dealing with a single point and updating the plot.
I've been using matplotlib. I tried plt.draw() but couldn't get it to work (see the code below).
#Proof of concept code
import matplotlib.pyplot as plt

l = [1, 2, 3, 2, 3, 2]
n = 0
p = False
for x in range(0, 6):
    n = l[x]
    if p == False:
        fig = plt.hist(n)
        p = True
    else:
        plt.draw()
I need a plot that looks like plt.hist(l), but I have only been getting the first point plotted.
Are you familiar with Numpy? Numpy handles large arrays pretty well.
Here's an example using a random integer set from 1 to 3 (inclusive).
import matplotlib.pyplot as plt
import numpy as np
arr_random = np.random.randint(1,4,10000)
plt.hist(arr_random)
plt.show()
It's very efficient to compute plt.hist() with Numpy arrays.
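If the data really is too large to hold in memory at once (the original question), one workaround is to fix the bin edges up front and keep only running counts with np.histogram, then draw the final result with plt.bar. This is just a sketch of that idea, not part of the answer above, and stream_of_values() is a stand-in for however the points are actually produced:
import matplotlib.pyplot as plt
import numpy as np

def stream_of_values():
    # placeholder generator for the real data source
    for _ in range(100000):
        yield np.random.randint(1, 4)

bins = np.linspace(0, 4, 9)            # fixed bin edges chosen in advance
counts = np.zeros(len(bins) - 1)

for value in stream_of_values():
    counts += np.histogram([value], bins)[0]   # update the counts one point at a time

plt.bar(bins[:-1], counts, width=np.diff(bins), align='edge')
plt.show()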
Is there a way to extract the data from an array which corresponds to a line of a contour plot in Python? I.e. I have the following code:
n = 100
x, y = np.mgrid[0:1:n*1j, 0:1:n*1j]
plt.contour(x,y,values)
where values is a 2D array with data (I stored the data in a file, but it seems it is not possible to upload it here). The picture below shows the corresponding contour plot. My question is whether it is possible to get exactly the data from values which corresponds, e.g., to the left contour line in the plot.
Worth noting here, since this post was the top hit when I had the same question, that this can be done with scikit-image much more simply than with matplotlib. I'd encourage you to check out skimage.measure.find_contours. A snippet of their example:
import numpy as np
from skimage import measure
x, y = np.ogrid[-np.pi:np.pi:100j, -np.pi:np.pi:100j]
r = np.sin(np.exp((np.sin(x)**3 + np.cos(y)**2)))
contours = measure.find_contours(r, 0.8)
which can then be plotted/manipulated as you need. I like this more because you don't have to get into the deep weeds of matplotlib.
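To pull the coordinates out and draw them, the contours returned by find_contours are just arrays of (row, column) pairs, so a short follow-up along the lines of the scikit-image documentation example could be:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.imshow(r, cmap=plt.cm.gray)
for contour in contours:
    # find_contours returns (row, col) pairs, so plot col as x and row as y
    ax.plot(contour[:, 1], contour[:, 0], linewidth=2)
plt.show()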
plt.contour returns a QuadContourSet. From that, we can access the individual lines using:
cs.collections[0].get_paths()
This returns all the individual paths. To access the actual x, y locations, we need to look at the vertices attribute of each path. The first contour drawn should be accessible using:
X, Y = cs.collections[0].get_paths()[0].vertices.T
See the example below to see how to access any of the given lines. In the example I only access the first one:
import matplotlib.pyplot as plt
import numpy as np
n = 100
x, y = np.mgrid[0:1:n*1j, 0:1:n*1j]
values = x**0.5 * y**0.5
fig1, ax1 = plt.subplots(1)
cs = plt.contour(x, y, values)
lines = []
for line in cs.collections[0].get_paths():
    lines.append(line.vertices)
fig1.savefig('contours1.png')
fig2, ax2 = plt.subplots(1)
ax2.plot(lines[0][:, 0], lines[0][:, 1])
fig2.savefig('contours2.png')
contours1.png shows the full contour plot; contours2.png shows the first extracted contour line.
plt.contour returns a QuadContourSet which holds the data you're after.
See Get coordinates from the contour in matplotlib? (which this question is probably a duplicate of...)
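As an aside, the same vertex data can also be read from the allsegs attribute of the returned contour set (a list with one entry per contour level, each entry being a list of (N, 2) vertex arrays), which avoids going through collections; a small sketch:
cs = plt.contour(x, y, values)
# allsegs[i][j] is the (N, 2) array of x, y vertices for the j-th line of the i-th level
first_line = cs.allsegs[0][0]
X, Y = first_line[:, 0], first_line[:, 1]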
Hi, I have a 3D list (I realise this may not be the best representation of my data, so any advice here is appreciated), as such:
y_data = [
[[a,0],[b,1],[c,None],[d,6],[e,7]],
[[a,5],[b,2],[c,1],[d,None],[e,1]],
[[a,3],[b,None],[c,4],[d,9],[e,None]],
]
The y-axis data is such that each sublist is a list of values for one hour. The hours are the x-axis data. Each sublist of this has the following format:
[label,value]
So essentially:
line a is [0,5,3] on the y-axis
line b is [1,2,None] on the y-axis etc.
My x-data is:
x_data = [0,1,2,3,4]
Now when I plot this list directly, i.e.
for i in range(0,5):
    ax.plot(x_data, [row[i][1] for row in y_data], label=y_data[0][i][0])
I get a line graph; however, where the value is None the point is not drawn and the line is not connected.
What I would like to do is have a graph which will plot my data in its current format, but ignore missing points and draw a line between the point before the missing data and the point after (i.e. interpolating the missing point).
I tried doing it like this https://stackoverflow.com/a/14399830/1800665 but I couldn't work out how to do this for a 3D list.
Thanks for any help!
The general approach that you linked to will work fine here; it looks like the question you're asking is how to apply that approach to your data. I'd like to suggest that by factoring out the data you're plotting, you'll see more clearly how to do it.
import numpy as np
import matplotlib.pyplot as plt

# placeholder labels so the example runs; in the real data these are the line labels
a, b, c, d, e = 'a', 'b', 'c', 'd', 'e'

y_data = [
    [[a, 0], [b, 1], [c, None], [d, 6], [e, 7]],
    [[a, 5], [b, 2], [c, 1], [d, None], [e, 1]],
    [[a, 3], [b, None], [c, 4], [d, 9], [e, None]],
]
x_data = [0, 1, 2, 3, 4]

fig, ax = plt.subplots()
for i in range(5):
    xv = []
    yv = []
    for j, v in enumerate(row[i][1] for row in y_data):
        if v is not None:
            xv.append(j)
            yv.append(v)
    ax.plot(xv, yv, label=y_data[0][i][0])
Here instead of using a mask like in the linked question/answer, I've explicitly built up the lists of valid data points that are to be plotted.
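An alternative closer in spirit to the linked masking answer is to convert each line's values to a float array (None becomes nan) and index with a validity mask; this is only a sketch of that variant, not part of the answer above:
import numpy as np

for i in range(5):
    y_arr = np.array([row[i][1] for row in y_data], dtype=float)  # None becomes nan
    x_arr = np.arange(len(y_arr))
    valid = ~np.isnan(y_arr)
    ax.plot(x_arr[valid], y_arr[valid], label=y_data[0][i][0])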
I'm really new to Python programming, and I was just wondering if you can create a regular grid of 0.5 by 0.5 m resolution using LiDAR points.
My data are in LAS format (read with from liblas import file as lasfile) and they have the following format: X, Y, Z, where X and Y are coordinates.
The points are randomly positioned, some pixels are empty (NaN value), and in some pixels there is more than one point. Where there is more than one point, I wish to obtain a mean value. In the end I need to save the data in TIF or ASCII format.
I am studying the osgeo module and GDAL, but to be honest I don't know if the osgeo module is the best solution.
I would really be glad for help with some code that I can study and implement.
Thanks in advance for the help, I really need it.
I don't know the best way to get a grid with these parameters.
It's a bit late but maybe this answer will be useful for others, if not for you...
I have done this with Numpy and Pandas, and it's pretty fast. I was using TLS data and could do this with several million data points without any trouble on a decent 2009-vintage laptop. The key is 'binning' by rounding the data, and then using Pandas' GroupBy methods to do the aggregating and calculate the means.
If you need to round to a power of 10 you can use np.round, otherwise you can round to an arbitrary value by making a function to do so, which I have done by modifying this SO answer.
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
    return np.round( np.array(a, dtype=float) / round_val) * round_val
# load data
data = np.load( 'shape of ndata, 3')
n_d = data.shape[0]
# round the data
d_round = np.empty( [n_d, 5] )
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val( d_round[:,0], 0.5)
d_round[:,4] = round_to_val( d_round[:,1], 0.5)
# sorting data
ind = np.lexsort( (d_round[:,4], d_round[:,3]) )
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame( d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculating the mean, write to csv, which saves the file with:
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array( [x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmax, xmin = 640000.5, 637000
ymax, ymin = 6070000.5, 6067000
grid_x, grid_y = np.mgrid[xmin:xmax:0.5, ymin:ymax:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
When I've done this, I have saved the output as a .npy and converted to a tiff with the Image library, and then georeferenced in ArcMap. There is probably a way to do that with osgeo but I haven't used it.
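A rough sketch of what the GeoTIFF step could look like with osgeo/GDAL (untested here; the EPSG code is a placeholder for your coordinate system, and depending on how the grid was built the array may need transposing or flipping so that rows run from north to south):
from osgeo import gdal, osr

nrows, ncols = data_grid.shape
driver = gdal.GetDriverByName('GTiff')
out = driver.Create('data_grid.tif', ncols, nrows, 1, gdal.GDT_Float32)
# geotransform: (top-left x, pixel width, 0, top-left y, 0, -pixel height)
out.SetGeoTransform((xmin, 0.5, 0, ymax, 0, -0.5))
srs = osr.SpatialReference()
srs.ImportFromEPSG(32633)              # placeholder EPSG code -- replace with your CRS
out.SetProjection(srs.ExportToWkt())
out.GetRasterBand(1).WriteArray(data_grid)
out.FlushCache()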
Hope this helps someone at least...
You can use the histogram function in Numpy to do binning, for instance:
import numpy as np
points = np.random.random(1000)   # coordinates to bin
values = np.random.random(1000)   # placeholder for the values to average within each bin
# create bin edges from 0 to 1
bins = np.linspace(0, 1, 10)
means = (np.histogram(points, bins, weights=values)[0] /
         np.histogram(points, bins)[0])
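Since the original question is about an X, Y grid, the same trick extends to two dimensions with np.histogram2d; this is a sketch, not part of the answer above, with x, y, z as placeholders for the LiDAR coordinates and heights:
import numpy as np

# placeholders for the LiDAR point coordinates and heights
x = np.random.uniform(0, 10, 10000)
y = np.random.uniform(0, 10, 10000)
z = np.random.uniform(0, 2, 10000)

# 0.5 m bin edges covering the data extent
x_edges = np.arange(x.min(), x.max() + 0.5, 0.5)
y_edges = np.arange(y.min(), y.max() + 0.5, 0.5)

sums, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=z)
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
with np.errstate(invalid='ignore'):
    mean_grid = sums / counts          # cells with no points become nan (0/0)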
Try LAStools, particularly lasgrid or las2dem.