Appending an index to a laspy file (.las) - Python
I have two files: one an Esri shapefile (.shp), the other a point cloud (.las).
Using the laspy and shapefile modules I've managed to find which points of the .las file fall within specific polygons of the shapefile. What I now wish to do is add an index number that links the two datasets, so that e.g. all points that fall within polygon 231 get the number 231.
The problem is that so far I'm unable to append anything to the list of points when writing the .las file. The piece of code where I'm trying to do this is:
outFile1 = laspy.file.File("laswrite2.las", mode = "w",header = inFile.header)
outFile1.points = truepoints
outFile1.points.append(indexfromshp)
outFile1.close()
The error I'm getting now is: AttributeError: 'numpy.ndarray' object has no attribute 'append'. I've tried multiple things already including np.append but I'm really at a loss here as to how to add anything to the las file.
Any help is much appreciated!
There are several ways to do this.
LAS files have a classification field; you could store the indexes in this field:
las_file = laspy.file.File("las.las", mode="rw")
las_file.classification = indexfromshp
However, if the LAS file has version <= 1.2, the classification field can only store values in the range [0, 31] (it is a 5-bit field), but you can use the 'user_data' field, which can hold values in the range [0, 255].
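For example, reusing the variables from the question (a minimal sketch, assuming every index fits in a single byte):
las_file = laspy.file.File("las.las", mode="rw")
# user_data is an unsigned 8-bit field, so values must be in [0, 255]
las_file.user_data = indexfromshp
las_file.close()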
Alternatively, if you need to store values higher than 255, or if you want a separate field, you can define a new dimension (see laspy's documentation on how to add extra dimensions).
Your code should then be close to something like this:
outFile1 = laspy.file.File("laswrite2.las", mode="w", header=inFile.header)

# define the extra dimension before writing any point data
outFile1.define_new_dimension(
    name="index_from_shape",
    data_type=7,  # 7 = unsigned 64-bit integer
    description="Index of corresponding polygon from shape file"
)

# copy the existing fields from the input file
for dimension in inFile.point_format:
    dat = inFile.reader.get_dimension(dimension.name)
    outFile1.writer.set_dimension(dimension.name, dat)

# fill the new dimension with the polygon indexes
outFile1.index_from_shape = indexfromshp
outFile1.close()
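As a quick sanity check (a sketch, still assuming the laspy 1.x API used above), you can read the file back and confirm the new dimension is there:
check = laspy.file.File("laswrite2.las", mode="r")
for spec in check.point_format:  # 'index_from_shape' should be listed among the dimensions
    print(spec.name)
print(check.index_from_shape[:10])  # values should match indexfromshp
check.close()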
Related
Creating a Subway Station Network in Python using Graph, Vertex and Edge data structures
I'm fairly new to Python and I have a college assignment that requires using the graph, vertex and edge concepts, but I'm struggling with how to implement them properly. Below is the description of the required work:
a) In Vertex, create a Station subclass to contain the station id and the position, in decimal degrees, of the location of the object on the earth's surface.
b) Create a list of Station objects from the dataset in lisbon.stations.csv.
c) Under Edge, create an Edge_line subclass to contain the line that the connection is part of.
d) Create a list of Edge_line objects from the same dataset.
e) Create a Graph structure.
f) Create a procedure/method to view the network. Example:
import matplotlib.pyplot as plt
fig = plt.figure()
plt.plot(x, y, c=color)
To view the connections, x and y are lists with the map positions of the end stations of an Edge; run this as many times as there are connections. Encode the colour as shown in the table in lisbon.lines.csv.
Below is my code at the moment. I think I managed to get to point c), but would appreciate help reviewing the entire code: https://pastebin.com/5t9A8dfC
Also, please find below the CSV files required for this work.
lisbon.lines.csv
"line","name","colour"
1,"Vermelha","r"
2,"Azul","b"
3,"Verde","g"
4,"Amarela","y"
lisbon.stations.csv
"id","latitude","longitude","name"
1,38.771746,-9.1306207,"Aeroporto" 2,38.7758674,-9.1177189,"Encarnação" 3,38.775383,-9.1048967,"Moscavide" 4,38.770726,-9.1014767,"Oriente" 5,38.764569,-9.1064637,"Cabo Ruivo" 6,38.761624,-9.112163,"Olivais" 7,38.755357,-9.113954,"Chelas" 8,38.747860,-9.118439,"Bela Vista" 9,38.739911,-9.123824,"Olaias" 10,38.737158,-9.133888,"Alameda" 11,38.735426,-9.145261,"Saldanha" 12,38.734815,-9.154220,"São Sebastião" 13,38.713814,-9.122570,"Santa Apolónia" 14,38.707175,-9.133352,"Terreiro do Paço" 15,38.710700,-9.139274,"Chiado" 16,38.715169,-9.141632,"Restauradores" 17,38.720109,-9.145883,"Avenida" 18,38.725357,-9.150099,"Marquês de Pombal" 19,38.729726,-9.150335,"Parque" 20,38.737849,-9.158458,"Praça de Espanha" 21,38.742029,-9.168977,"Jardim Zoologico" 22,38.748530,-9.172462,"Laranjeiras" 24,38.749609,-9.179994,"Alto dos Moinhos" 25,38.752836,-9.189426,"Colégio Militar" 26,38.759139,-9.192718,"Carnide" 27,38.762243,-9.196849,"Pontinha" 28,38.760578,-9.204724,"Alfornelos" 29,38.758411,-9.219122,"Amadora Este" 30,38.752186,-9.224132,"Reboleira" 31,38.706116,-9.145123,"Cais do Sodré" 32,38.713994,-9.138450,"Rossio" 33,38.716256,-9.136219,"Martim Moniz" 34,38.722266,-9.135371,"Intendente" 35,38.726275,-9.134942,"Anjos" 36,38.733639,-9.134167,"Arroios" 37,38.748653,-9.141581,"Roma" 38,38.753129,-9.144016,"Alvalade" 39,38.760212,-9.157874,"Campo Grande" 40,38.760296,-9.166135,"Telheiras" 41,38.793193,-9.173379,"Odivelas" 42,38.785867,-9.172124,"Senhor Roubado" 43,38.779645,-9.159721,"Ameixoeira" 44,38.772903,-9.159753,"Lumiar" 45,38.767022,-9.155333,"Quinta das Conchas" 46,38.751812,-9.159045,"Cidade Universitária" 47,38.747001,-9.147973,"Entre Campos" 48,38.741545,-9.146739,"Campo Pequeno" 49,38.730925,-9.146653,"Picoas" 50,38.720094,-9.154077,"Rato" 23,38.742541,-9.133747,"Areeiro"
lisbon.connections.csv
"station1","station2","line"
1,2,1 2,3,1 3,4,1 4,5,1 5,6,1 6,7,1 7,8,1 8,9,1 9,10,1 10,11,1 11,12,1 13,14,2 14,15,2 15,16,2 16,17,2 17,18,2 18,19,2 19,12,2 12,20,2 20,21,2 21,22,2 22,24,2 24,25,2 25,26,2 26,27,2 27,28,2 28,29,2 29,30,2 31,15,3 15,32,3 32,33,3 33,34,3 34,35,3 35,36,3 36,10,3 10,23,3 23,37,3 37,38,3 38,39,3 39,40,3 41,42,4 42,43,4 43,44,4 44,45,4 45,39,4 39,46,4 46,47,4 47,48,4 48,11,4 11,49,4 49,18,4 18,50,4
PS: Let me know if there is any other way to attach the actual csv files to this thread.
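A rough sketch of points a) to c) (illustrative only; it assumes hypothetical minimal Vertex and Edge base classes, since the ones from the course material are not shown here):
import csv

class Vertex:
    def __init__(self, key):
        self.key = key

class Station(Vertex):
    # a) A vertex that also stores the station position in decimal degrees.
    def __init__(self, station_id, latitude, longitude, name):
        Vertex.__init__(self, station_id)
        self.latitude = latitude
        self.longitude = longitude
        self.name = name

class Edge:
    def __init__(self, start, end):
        self.start = start
        self.end = end

class Edge_line(Edge):
    # c) An edge that also stores which metro line the connection belongs to.
    def __init__(self, start, end, line):
        Edge.__init__(self, start, end)
        self.line = line

def load_stations(path="lisbon.stations.csv"):
    # b) Build Station objects, keyed by id, from the CSV shown above.
    stations = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            sid = int(row["id"])
            stations[sid] = Station(sid, float(row["latitude"]),
                                    float(row["longitude"]), row["name"])
    return stations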
Converting pixels into wavelength using 2 FITS files
I am new to Python and FITS image files, and as such I am running into issues. I have two FITS files; the first FITS file is pixels/counts and the second FITS file (calibration file) is pixels/wavelength. I need to convert pixels/counts into wavelength/counts. Once this is done, I need to output wavelength/counts as a new FITS file for further analysis. So far I have managed to put the required data into arrays, as shown in the code below.
import numpy as np
from astropy.io import fits

# read the images
image_file = ("run_1.fits")
image_calibration = ("cali_1.fits")

hdr = fits.getheader(image_file)
hdr_c = fits.getheader(image_calibration)

# print headers
sp = fits.open(image_file)
print('\n\nHeader of the spectrum :\n\n', sp[0].header, '\n\n')
sp_c = fits.open(image_calibration)
print('\n\nHeader of the spectrum :\n\n', sp_c[0].header, '\n\n')

# generation of arrays with the wavelengths and counts
count = np.array(sp[0].data)
wave = np.array(sp_c[0].data)
I do not understand how to save two separate arrays into one FITS file. I tried an alternative approach by creating a list, as shown in this code:
file_list = fits.open(image_file)
calibration_list = fits.open(image_calibration)

image_data = file_list[0].data
calibration_data = calibration_list[0].data

# make a list to hold images
img_list = []
img_list.append(image_data)
img_list.append(calibration_data)

# list to numpy array
img_array = np.array(img_list)

# save the array as fits - image cube
fits.writeto('mycube.fits', img_array)
However I could only save it as a cube, which is not correct because I just need the wavelength and counts data. Also, I lost all the headers in the newly created FITS file. To say I am lost is an understatement! Could someone point me in the right direction please? Thank you.
I am still working on this problem. I have now managed (I think) to produce a FITS file containing the wavelength and counts using this website: https://www.mubdirahman.com/assets/lecture-3---numerical-manipulation-ii.pdf
This is my code:
# Making a Primary HDU (required):
primaryhdu = fits.PrimaryHDU(flux)  # Makes a header
# or if you have a header that you've created:
primaryhdu = fits.PrimaryHDU(arr1, header=head1)

# If you have additional extensions:
secondhdu = fits.ImageHDU(wave)

# Making a new HDU List:
hdulist1 = fits.HDUList([primaryhdu, secondhdu])

# Writing the file:
hdulist1.writeto("filename.fits", overwrite=True)

image = ("filename.fits")
hdr = fits.open(image)
image_data = hdr[0].data
wave_data = hdr[1].data
I am sure this is not the correct format for wavelength/counts. I need both wavelength and counts to be contained in hdr[0].data
If you are working with spectral data, it might be useful to look into specutils which is designed for common tasks associated with reading/writing/manipulating spectra. It's common to store spectral data in FITS files using tables, rather than images. For example you can create a table containing wavelength, flux, and counts columns, and include the associated units in the column metadata. The docs include an example on how to create a generic "FITS table" writer with wavelength and flux columns. You could start from this example and modify it to suit your exact needs (which can vary quite a bit from case to case, which is probably why a "generic" FITS writer is not built-in). You might also be able to use the fits-wcs1d format. If you prefer not to use specutils, that example still might be useful as it demonstrates how to create an Astropy Table from your data and output it to a well-formatted FITS file.
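As a rough illustration of the table-based approach (a sketch only: the array values below are placeholders standing in for the wave and count arrays from the question):
import numpy as np
import astropy.units as u
from astropy.table import Table

# placeholder data standing in for the calibrated wavelength axis and the counts
wave = np.linspace(4000.0, 7000.0, 1024) * u.AA
counts = np.ones(1024) * u.count

# build a table with units attached; the units end up as TUNITn keywords in the FITS header
spec = Table([wave, counts], names=("wavelength", "counts"))
spec.write("spectrum_table.fits", format="fits", overwrite=True)

# reading it back gives both columns from a single binary-table extension
spec_back = Table.read("spectrum_table.fits")
print(spec_back["wavelength"][:5], spec_back["counts"][:5])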
How to specify burn values by attribute using GDAL rasterize (python API)?
I'm using gdal.RasterizeLayer() to convert a shapefile to a GeoTiff, using a GeoTiff template while burning the output values by ATTRIBUTE. What I want to output is a .tif where the burn value corresponds to the value of a given attribute. What I find is that gdal.RasterizeLayer() is burning to strange values that do not correspond to the values in my attribute field. Here's what I have currently:
gdalformat = 'GTiff'
datatype = gdal.GDT_Byte

# Open Shapefile
shapefile = ogr.Open(self.filename)
shapefile_layer = shapefile.GetLayer()

# Get projection info from reference image
image = gdal.Open(ref_image, gdal.GA_ReadOnly)

output = gdal.GetDriverByName(gdalformat).Create(output_tif, image.RasterXSize, image.RasterYSize, 1, datatype, options=['COMPRESS=DEFLATE'])
output.SetProjection(image.GetProjectionRef())
output.SetGeoTransform(image.GetGeoTransform())

# Write data to band 1
band = output.GetRasterBand(1)
band.SetNoDataValue(0)
gdal.RasterizeLayer(output, [1], shapefile_layer, options=['ATTRIBUTE=FCode'])

# Close datasets
band = None
output = None
image = None
shapefile = None

# Build image overviews
subprocess.call("gdaladdo --config COMPRESS_OVERVIEW DEFLATE " + output_tif + " 2 4 8 16 32 64", shell=True)
What occurs is that the output .tif correctly assigns different burn values for each attribute, but the value does not correspond to the attribute value. For example, the input attribute value FCode=46006 turns into a burn value of 182 (and it's not clear why!). I tried adding and removing the 'COMPRESS=DEFLATE' option, and adding and removing the '3D' option for gdal.RasterizeLayer(). Neither affects the output burn values.
You can see the input shapefile and attribute values here: input .shp
And the output, with the incorrect values, here: output raster
I fixed this myself by changing the data type to gdal.GDT_Int32. The raster was being created with gdal.GDT_Byte, which can only hold values from 0 to 255, so larger attribute values were truncated to 8 bits (46006 mod 256 = 182, exactly the burn value observed); a 32-bit integer band is wide enough to hold the FCode values.
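In terms of the code from the question, the only change needed should be the band data type (a sketch keeping the question's variable names):
datatype = gdal.GDT_Int32  # was gdal.GDT_Byte, which cannot hold values above 255
output = gdal.GetDriverByName(gdalformat).Create(output_tif, image.RasterXSize, image.RasterYSize, 1, datatype, options=['COMPRESS=DEFLATE'])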
Error in reading HDF5 using h5py
I have saved my dataset in the form shown in the following image (HDF5 format). So I have different groups, i.e. 4, 2, 40, etc., and for each group I have two datasets, Annotation and Features. I saved them successfully, but I am unable to load them back. The strange thing is that the error occurs only when I try to read Annotation; reading works fine when I try to read Features. I am using the following code:
dataSet = np.array([])
annotation = np.array([])

hdf5Object = readHDF5File('abc.hdf5','r')
w = 2
myGroup = hdf5Object[str(w)]

dataSet = np.array(myGroup['Features'])
annotation = np.array(myGroup['Annotation'])
Please enlighten me here, as I have been struggling with this for a while now. Thanks.
EDIT 1
I am getting the following error when I read Annotation:
Traceback (most recent call last):
  File "xyz.py", line 76, in getAllData
    annotation = np.array(myGroup['Annotation'])
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/group.py", line 153, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5o.pyx", line 173, in h5py.h5o.open (h5py/h5o.c:3403)
KeyError: "unable to open object (Symbol table: Can't open object)"
EDIT 2
The hdf5 file was created in 2 steps. In the first step, Features were calculated as follows:
features = <numpy array of thousand rows and 100 columns contains only floating numbers>
w = 2
f = h5py.File('abc.hdf5', 'a')
myGroup = f[str(w)]
myGroup.create_dataset('Features', data=features)
The file was appended for different w, and features were calculated at different times. The same kind of procedure is used for Annotation; Annotation contains only floating-point values as well.
EDIT 3
The following image shows the content of the Annotation and Features datasets for one w. The left window is Annotation and the right one is Features.
I just figured out that I was trying to access the dataset using a plain string, while the dataset name had somehow been saved as unicode (utf-8). When I convert my dataset name to unicode it works fine.
How I figured out the key's data type:
myGroup = hdf5Object[str(w)]
childsIter = myGroup.iterkeys()
for child in childsIter:
    print type(child)
This gave me the clue that the data type of my dataset key is unicode and not just a plain string. So I converted my string to unicode as follows:
key = unicode('Annotation', "utf-8")
dS = np.array(myGroup[key])
or simply iterated over the keys directly:
myGroup = hdf5Object[str(w)]
childsIter = myGroup.iterkeys()
for child in childsIter:
    dS = np.array(myGroup[child])
Count number of points in multipolygon shapefile using Python
I have a polygon shapefile of the U.S. made up of individual states as their attribute values. In addition, I have arrays storing latitude and longitude values of point events that I am also interested in. Essentially, I would like to 'spatial join' the points and polygons (or perform a check to see which polygon [i.e., state] each point is in), then sum the number of points in each state to find out which state has the most number of 'events'. I believe the pseudocode would be something like:
Read in US.shp
Read in lat/lon points of events
Loop through each state in the shapefile and find number of points in each state
print 'Here is a list of the number of points in each state: '
Any libraries or syntax would be greatly appreciated. Based on what I can tell, the OGR library is what I need, but I am having trouble with the syntax:
dsPolygons = ogr.Open('US.shp')
polygonsLayer = dsPolygons.GetLayer()

#Iterating all the polygons
polygonFeature = polygonsLayer.GetNextFeature()
k = 0
while polygonFeature:
    k = k + 1
    print "processing " + polygonFeature.GetField("STATE") + "-" + str(k) + " of " + str(polygonsLayer.GetFeatureCount())
    geometry = polygonFeature.GetGeometryRef()

    #Read in some points?
    geomcol = ogr.Geometry(ogr.wkbGeometryCollection)
    point = ogr.Geometry(ogr.wkbPoint)
    point.AddPoint(-122.33,47.09)
    point.AddPoint(-110.11,33.33)
    #geomcol.AddGeometry(point)
    print point.ExportToWkt()
    print point

    numCounts = 0.0
    while pointFeature:
        if pointFeature.GetGeometryRef().Within(geometry):
            numCounts = numCounts + 1
        pointFeature = pointsLayer.GetNextFeature()

    polygonFeature = polygonsLayer.GetNextFeature()

#Loop through to see how many events in each state
I like the question. I doubt I can give you the best answer, and definitely can't help with OGR, but FWIW I'll tell you what I'm doing right now.
I use GeoPandas, a geospatial extension of pandas. I recommend it: it's high-level and does a lot, giving you everything in Shapely and fiona for free. It is in active development by twitter/#kajord and others.
Here's a version of my working code. It assumes you have everything in shapefiles, but it's easy to generate a geopandas.GeoDataFrame from a list.
import geopandas as gpd

# Read the data.
polygons = gpd.GeoDataFrame.from_file('polygons.shp')
points = gpd.GeoDataFrame.from_file('points.shp')

# Make a copy because I'm going to drop points as I
# assign them to polys, to speed up subsequent search.
pts = points.copy()

# We're going to keep a list of how many points we find.
pts_in_polys = []

# Loop over polygons with index i.
for i, poly in polygons.iterrows():

    # Keep a list of points in this poly
    pts_in_this_poly = []

    # Now loop over all points with index j.
    for j, pt in pts.iterrows():
        if poly.geometry.contains(pt.geometry):
            # Then it's a hit! Add it to the list,
            # and drop it so we have less hunting.
            pts_in_this_poly.append(pt.geometry)
            pts = pts.drop([j])

    # We could do all sorts, like grab a property of the
    # points, but let's just append the number of them.
    pts_in_polys.append(len(pts_in_this_poly))

# Add the number of points for each poly to the dataframe.
polygons['number of points'] = gpd.GeoSeries(pts_in_polys)
The developer tells me that spatial joins are 'new in the dev version', so if you feel like poking around in there, I'd love to hear how that goes! The main problem with my code is that it's slow.
import geopandas as gpd

# Read the data.
polygons = gpd.GeoDataFrame.from_file('polygons.shp')
points = gpd.GeoDataFrame.from_file('points.shp')

# Spatial join
pointsInPolygon = gpd.sjoin(points, polygons, how="inner", op='intersects')

# Add a field with 1 as a constant value
pointsInPolygon['const'] = 1

# Group by the column on which you want to aggregate the data
pointsInPolygon.groupby(['statename']).sum()
The column ['const'] will give you the count of points in your multipolygons.
# If you want to see other columns as well, do something like this:
pointsInPolygon = pointsInPolygon.groupby('statename').agg({'columnA':'first', 'columnB':'first', 'const':'sum'}).reset_index()
See the GeoPandas documentation on spatial joins (https://geopandas.org/docs/user_guide/mergingdata.html#spatial-joins) and the pandas DataFrame.groupby reference (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html).
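To get back to the original question of which state has the most events, a short follow-up (assuming, as above, that the state name lives in a 'statename' column):
counts = pointsInPolygon.groupby('statename').size()
print(counts.sort_values(ascending=False).head())  # the state with the most points comes first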