How do I extract longitude/latitude from a geojson column in my data - python
I have data in a csv file that contains zip codes in one column and what appears to be GeoJSON data in the other column. I loaded the data into a pandas DataFrame. How do I extract just the coordinates from the geojson column?
zips.head(2)
Out[14]:
postal_code geojson
0 85309 {"type":"MultiPolygon","coordinates":[[[[-112....
1 85310 {"type":"MultiPolygon","coordinates":[[[[-112....
zips.geojson[1]
zips.geojson.values[0]
'{"type":"MultiPolygon","coordinates":[[[[-112.363501,33.551312],[-112.363457,33.551312],[-112.36253,33.551309],[-112.361378,33.551311],[-112.360977,33.55131],[-112.358913,33.551305],[-112.358916,33.551104],[-112.358898,33.550758],[-112.358825,33.549401],[-112.358763,33.548056],[-112.358652,33.546016],[-112.358635,33.54554],[-112.358629,33.545429],[-112.358613,33.545143],[-112.358607,33.545039],[-112.358599,33.544897],[-112.358596,33.544838],[-112.358592,33.54478],[-112.358545,33.543923],[-112.358475,33.542427],[-112.358444,33.541913],[-112.35842,33.541399],[-112.358363,33.540373],[-112.358345,33.540104],[-112.35833,33.539878],[-112.35828,33.538863],[-112.358263,33.538352],[-112.358204,33.537335],[-112.358196,33.536892],[-112.358193,33.536444],[-112.358192,33.53631],[-112.358182,33.536031],[-112.358175,33.535797],[-112.358186,33.534197],[-112.358187,33.53324],[-112.358185,33.53278],[-112.358182,33.532218],[-112.358168,33.530732],[-112.358163,33.530174],[-112.35815,33.529797],[-112.359343,33.529819],[-112.359387,33.529812],[-112.359354,33.529716],[-112.360874,33.529732],[-112.370575,33.529805],[-112.375373,33.529907],[-112.37537,33.528961],[-112.375382,33.527693],[-112.375384,33.527033],[-112.375393,33.526355],[-112.374883,33.526353],[-112.371535,33.52634],[-112.366678,33.526323],[-112.366665,33.523201],[-112.366664,33.52285],[-112.366661,33.522734],[-112.366658,33.522596],[-112.366657,33.522553],[-112.366655,33.522502],[-112.366658,33.522388],[-112.368754,33.522441],[-112.370106,33.522618],[-112.370917,33.522624],[-112.371875,33.522633],[-112.371865,33.522389],[-112.371875,33.522162],[-112.37175,33.51916],[-112.375186,33.519096],[-112.375306,33.519094],[-112.375305,33.51971],[-112.375309,33.519728],[-112.375351,33.521607],[-112.375367,33.522304],[-112.375426,33.522419],[-112.375587,33.522423],[-112.375767,33.522426],[-112.382694,33.522547],[-112.382697,33.522654],[-112.382698,33.522709],[-112.382714,33.523282],[-112.382958,33.523283],[-112.383939,33.52329],[-112.383935,33.523153],[-112.386882,33.523097],[-112.38781,33.523781],[-112.38801,33.523609],[-112.388673,33.523001],[-112.388794,33.522895],[-112.388852,33.522844],[-112.389115,33.522837],[-112.389205,33.522761],[-112.389319,33.522661],[-112.392416,33.51994],[-112.392509,33.519195],[-112.392516,33.51914],[-112.401093,33.51914],[-112.401098,33.519779],[-112.401098,33.519838],[-112.401137,33.519885],[-112.401146,33.519903],[-112.40124,33.520001],[-112.401311,33.520066],[-112.401432,33.520158],[-112.401754,33.520412],[-112.402133,33.520685],[-112.402411,33.520892],[-112.402552,33.52098],[-112.402692,33.521087],[-112.402882,33.521256],[-112.402948,33.52133],[-112.403016,33.521428],[-112.403062,33.521517],[-112.4031,33.521621],[-112.40312,33.521715],[-112.403129,33.521822],[-112.403119,33.521937],[-112.403102,33.522011],[-112.403064,33.522109],[-112.403009,33.522208],[-112.402908,33.522336],[-112.402781,33.522475],[-112.402685,33.52257],[-112.402641,33.522613],[-112.402553,33.522692],[-112.401659,33.523488],[-112.401228,33.52388],[-112.401157,33.523961],[-112.401123,33.524028],[-112.401107,33.524102],[-112.401108,33.524213],[-112.401116,33.525097],[-112.401119,33.5263],[-112.401119,33.52634],[-112.401119,33.526441],[-112.399658,33.52646],[-112.399258,33.526743],[-112.395079,33.52973],[-112.394771,33.529977],[-112.39013,33.534207],[-112.388661,33.535533],[-112.385957,33.538011],[-112.384107,33.539698],[-112.384007,33.539732],[-112.383947,33.539786],[-112.38381,33.539862],[-112.384585,33.551063],[-112.384605,33.551372],[-112.384609,33.551434],[-112.384614,33.551508],[-112.384416,33.551505],[-112.38385,33.551499],[-112.38131,33.551461],[-112.380126,33.551454],[-112.378928,33.551432],[-112.376262,33.551405],[-112.373858,33.551381],[-112.372583,33.551378],[-112.370038,33.551354],[-112.368768,33.55135],[-112.367585,33.551339],[-112.36749,33.551338],[-112.363501,33.551312]]]]}'
I tried to access it the way I would access values inside a dictionary, but I am unable to do it.
This might help. It is untested, so it might not work, or might need to be adjusted slightly for your use case.
The important features of this program are:
Use json.loads() to convert a JSON string to a Python data structure
Decompose the data structure according to the GeoJSON standard.
Reference:
http://geojson.org/geojson-spec.html#multipolygon
# UNTESTED
import json

# For every zip code, print the X position of the first
# coordinate of the exterior of the multipolygon associated
# with that zip code
for zipcode, geo in zip(zips["postal_code"], zips["geojson"]):
    geo = json.loads(geo)
    assert geo["type"] == "MultiPolygon"
    # Coordinates of a MultiPolygon are an array of Polygon coordinate arrays
    array_of_polygons = geo["coordinates"]
    polygon0 = array_of_polygons[0]
    # Coordinates of a Polygon are an array of LinearRing coordinate arrays
    ring0 = polygon0[0]
    # A LinearRing is a closed LineString with 4 or more positions
    # A LineString is an array of positions
    vertex0 = ring0[0]
    # A position is represented by an array of numbers
    x0 = vertex0[0]
    print(zipcode, x0)
You can import json and apply json.loads to convert the string data in your geojson column to a dict. Then you can extract data from the dict directly, or use one of the many Python modules that deal with GIS data. I find shapely easy to use and helpful in many cases.
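For example, here is a rough sketch of that json.loads approach (the tiny MultiPolygon string and the exterior_coords helper are made up for illustration, not taken from your file):

```python
import json
import pandas as pd

# Toy frame standing in for zips; the geojson column holds JSON strings
# (the tiny MultiPolygon below is made up for illustration)
zips = pd.DataFrame({
    "postal_code": ["85309"],
    "geojson": ['{"type": "MultiPolygon", "coordinates":'
                ' [[[[-112.36, 33.55], [-112.35, 33.55],'
                ' [-112.35, 33.54], [-112.36, 33.55]]]]}'],
})

def exterior_coords(geojson_str):
    """Parse the JSON string and return the exterior ring of the first polygon."""
    geo = json.loads(geojson_str)
    assert geo["type"] == "MultiPolygon"
    return geo["coordinates"][0][0]  # first polygon, exterior ring

zips["coords"] = zips["geojson"].apply(exterior_coords)
lon0, lat0 = zips["coords"][0][0]  # first vertex of the ring
print(lon0, lat0)  # -112.36 33.55
```

Note the GeoJSON order: each position is [longitude, latitude], not the other way around.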
Related
Get address using Latitude and Longitude from two columns in DataFrame?
I have a dataframe with a longitude column and a latitude column. When I try to get the address using geolocator.reverse() I get the error ValueError: Must be a coordinate pair or Point. I can't for the life of me insert the lat and long into the reverse function without getting that error. I tried creating a tuple using list(zip(zips['Store_latitude'], zips['Store_longitude'])) but I get the same error.

Code:

import pandas as pd
from geopy.geocoders import Nominatim
from decimal import Decimal
from geopy.point import Point

zips = pd.read_excel("zips.xlsx")
geolocator = Nominatim(user_agent="geoapiExercises")
zips['Store_latitude'] = zips['Store_latitude'].astype(str)
zips['Store_longitude'] = zips['Store_longitude'].astype(str)
zips['Location'] = list(zip(zips['Store_latitude'], zips['Store_longitude']))
zips['Address'] = geolocator.reverse(zips['Location'])

What my DataFrame looks like:

Store_latitude    Store_longitude
34.2262225        -118.4508349
34.017667         -118.149135
I think you might try with a tuple or a geopy.point.Point before going to a list, to see whether the package works all right. I tested just now as follows (Python 3.9.13, command line style):

import geopy
p = geopy.point.Point(51.4, 3.45)
gl = geopy.geocoders.Nominatim(user_agent="my_test")  # without the user_agent it raises a ConfigurationError
gl.reverse(p)

output:

Location(Vlissingen, Zeeland, Nederland, (51.49433865, 3.415005767601362, 0.0))

This is as expected. Maybe you should cast your dataframe['Store_latitude'] and dataframe['Store_longitude'] before/after you convert to a list? Are they not strings? More information on your dataframe and its content would be required to assist further, I think. Good luck!

EDIT: added information after OP's comments below.

When you read your Excel file as zips = pd.read_excel("yourexcel.xlsx") you will get a pandas dataframe. The content of the dataframe is two columns (which will be of type Series), and each element will be a numpy.float64 (if your Excel file has real values as input and not strings!). You can check this using the type() command:

>>> type(zips)
<class 'pandas.core.frame.DataFrame'>
>>> type(zips['Lat'])
<class 'pandas.core.series.Series'>
>>> type(zips['Lat'][0])
<class 'numpy.float64'>

What you then do is convert these floats (= decimal numbers) to a string (= text) by performing zips[...] = zips[...].astype(str). There is no reason to do that, because your geolocator requires numbers, not text.

As shown in the comment by @Derek, you need to iterate over each row, and while doing so you can put the resulting Locations you receive from the geolocator in a new column. So in the next block, I first create a new (empty) list. Then I iterate over couples of lat, lon by combining your zips['Lat'] and zips['Lon'] using the zip command (so the naming of zips is a bit unlucky if you don't know the zip command; it thus may be confusing you). But don't worry, what it does is just combine the entries of each row into the variables lat and lon. Within the for-each loop, I append the result of the geolocator lookup. Note that the argument of the reverse command is a tuple (lat, lon), so the complete syntax is reverse((lat, lon)). Instead of (lat, lon), you could also have created a Point as in my original example, but that is not necessary imo. (Note: for brevity I just write 'Lat' and 'Lon' instead of your Store...). Finally, assign the result list as a new column in your zips pandas dataframe.

import geopy as gp

# instantiate a geolocator
gl = gp.geocoders.Nominatim(user_agent="my_test")

locations = []  # create empty list
# For loop over each couple of lat, lon
for lat, lon in zip(zips['Lat'], zips['Lon']):
    locations.append(gl.reverse((lat, lon)))

# Add extra column to your pandas table (address will be the column name)
zips = zips.assign(address=locations)

One thing you still may want is just the text string instead of the complete geopy.Location() object in your table. To get that, write the for loop with this small modification ([0] takes the first element, the address, of the Location object). Note that this won't work if the result of the lookup for a given row is empty (None); then the [0] will raise an error.

# For loop over each couple of lat, lon
for lat, lon in zip(zips['Lat'], zips['Lon']):
    locations.append(gl.reverse((lat, lon))[0])

I hope this gets you going!
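As an offline illustration of that loop-and-assign pattern: the lookup function below is a made-up stand-in for gl.reverse, just so the example runs without network access or Nominatim rate limits; everything else has the same shape as the real code.

```python
import pandas as pd

zips = pd.DataFrame({"Lat": [51.4, 34.0], "Lon": [3.45, -118.1]})

def fake_reverse(coords):
    # Stand-in for gl.reverse((lat, lon)); the real call returns a geopy Location
    lat, lon = coords
    return f"address near ({lat}, {lon})"

locations = []  # create empty list
for lat, lon in zip(zips["Lat"], zips["Lon"]):
    locations.append(fake_reverse((lat, lon)))

# One new column, one entry per row, in row order
zips = zips.assign(address=locations)
print(zips["address"].tolist())
```

With the real geolocator the loop body is identical; only fake_reverse becomes gl.reverse.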
Spatially joining a series of netCDF data to a polygon object in python
I have a set of Polygon instances (loaded via geopandas), which comprise historical severe-weather watches and warnings, and netCDF data at 15-minute intervals containing parameters such as tornado / hail forecast parameters and updraft helicity (these data are point data). Each polygon has a set start time and an end time. What I'm trying to do is, for each 15-minute interval within the polygon's time range (start time -> end time), spatially join in from the netCDF files the highest value of each forecast parameter within the polygon. I already have the code that pulls the range of time steps (time_list) required to analyze from the netCDF files for each polygon (TOR):

# Pull the warning issue time, locate the closest index in our subhour files
issue_time = datetime.strptime(str(TOR.ISSUED), "%Y%m%d%H%M")
expire_time = datetime.strptime(str(TOR.EXPIRED), "%Y%m%d%H%M")
closest_first = min(time_list, key=lambda d: abs(d - issue_time))
closest_last = min(time_list, key=lambda d: abs(d - expire_time))
# Pull the timesteps in between the issue time and the expire time
before = time_list.index(closest_first) if closest_first < issue_time else time_list.index(closest_first)-1
after = time_list.index(closest_last) if closest_last > issue_time else time_list.index(closest_last)+1

The TOR object is generated by slicing the geoDataFrame of the full polygon data.
Right now I am using itertuples on the below slice to get the TOR object:

TORNADOES_IN_WINDOW = df_slice_TO_WRN[(df_slice_TO_WRN.ISSUED > int(first.strftime("%Y%m%d%H%M"))) &
                                      (df_slice_TO_WRN.ISSUED < int(last.strftime("%Y%m%d%H%M")))]

Inside the loop, I then iterate over the list of netCDF files in the range of timesteps found (before -> after) and load these into geopandas as well, so that I can perform a spatial join. This is done with this block of code:

xr_i = xr.open_dataset(subhr_files[i])
ds_i = xr_i.to_dataframe()
ds_i = ds_i.reset_index()
geom = [Point(x, y) for x, y in zip(ds_i['XLONG'], ds_i['XLAT'])]
gdf = gpd.GeoDataFrame(ds_i, geometry=geom)

The problems I'm encountering right now are the following. When running itertuples on the GeoDataFrame (TORNADOES_IN_WINDOW), each iterable returns as a pandas object, not a GeoPandas object, and therefore I cannot run sjoin to join the attributes. Also, this is not a very efficient methodology, and I'm wondering if perhaps there is a better way to do it.
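For what it's worth, the "highest value within the polygon" step on its own does not need sjoin at all; a rough shapely-only sketch with toy data (the updraft_helicity column name is just a placeholder for whichever forecast parameter you need):

```python
import pandas as pd
from shapely.geometry import Point, Polygon

# Toy stand-ins: one warning polygon and a frame of netCDF-style point data
warning_poly = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
ds_i = pd.DataFrame({
    "XLONG": [1.0, 2.0, 9.0],
    "XLAT": [1.0, 3.0, 9.0],
    "updraft_helicity": [10.0, 25.0, 99.0],  # placeholder parameter
})

# Keep only the points inside the polygon, then take the max of the parameter
inside = ds_i.apply(
    lambda r: warning_poly.contains(Point(r["XLONG"], r["XLAT"])), axis=1
)
max_inside = ds_i.loc[inside, "updraft_helicity"].max()
print(max_inside)  # the 99.0 point lies outside the polygon and is ignored
```

For many polygons at once, a geopandas sjoin followed by a groupby-max would be the vectorized version of the same idea.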
Create list of dictionary items from lists
I am working on a project that involves going through two columns of latitude and longitude values. If the lat/long in one pair of columns is blank, then I need to figure out which pair of lat/long values in another two columns is (geographically) closest to the origin pair in that row. The dataframe looks like this:

origin_lat | origin_lon  | destination_lat | destination_lon
------------------------------------------------------------
20.291326  | -155.838488 | 25.145242       | -98.491404
25.611236  | -80.551706  | 25.646763       | -81.466360
26.897654  | -75.867564  | nan             | nan

I am trying to build two dictionaries, one with the origin lat and long, and the other with the destination lat and long, in this format:

tmplist = [{'origin_lat': 39.7612992, 'origin_lon': -86.1519681},
           {'origin_lat': 39.762241, 'origin_lon': -86.158436},
           {'origin_lat': 39.7622292, 'origin_lon': -86.1578917}]

What I want to do is, for every row where the destination lat/lon are blank, compare the origin lat/lon in the same row to a dictionary of all the non-nan destination lat/lon values, then print the geographically closest lat/lon from the destination dictionary in place of the nan values. I've been playing around with creating lists of dictionary objects but can't seem to build a dictionary in the correct format. Any help would be appreciated!
If df is your pandas.DataFrame, you can generate the requested dictionaries by iterating through the rows of df:

origin_dicts = [{'origin_lat': row['origin_lat'], 'origin_lon': row['origin_lon']}
                for _, row in df.iterrows()]

and analogously for destination_dicts.

Remark: if the only reason for creating the dictionaries is the calculation of values replacing the nan entries, it might be easier to do this directly on the data frame, e.g.

df['destination_lon'] = df.apply(find_closest_lon, axis=1)
df['destination_lat'] = df.apply(find_closest_lat, axis=1)

where find_closest_lon and find_closest_lat are functions receiving a data frame row as an argument and having access to the values of the origin columns of the data frame.
The format that you want is the built-in 'records' format:

df[['origin_lat', 'origin_lon']].to_dict(orient='records')

produces

[{'origin_lat': 20.291326, 'origin_lon': -155.83848799999998},
 {'origin_lat': 25.611235999999998, 'origin_lon': -80.55170600000001},
 {'origin_lat': 26.897654, 'origin_lon': -75.867564}]

and of course you can equally have

df[['destination_lat', 'destination_lon']].to_dict(orient='records')

But I agree with @ctenar that you do not need to generate dictionaries for your ultimate task; pandas provides enough functionality for that.
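For the nearest-destination step itself, here is a rough sketch using a plain NumPy haversine distance (toy data copied from the question; it assumes you want to fill each missing destination with the closest known destination, and the haversine helper is an illustration, not part of pandas):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "origin_lat": [20.291326, 25.611236, 26.897654],
    "origin_lon": [-155.838488, -80.551706, -75.867564],
    "destination_lat": [25.145242, 25.646763, np.nan],
    "destination_lon": [-98.491404, -81.466360, np.nan],
})

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between one point and arrays of points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Rows with a known destination serve as candidates
known = df.dropna(subset=["destination_lat", "destination_lon"])

# Fill each missing destination with the geographically closest known one
for i in df.index[df["destination_lat"].isna()]:
    d = haversine(df.at[i, "origin_lat"], df.at[i, "origin_lon"],
                  known["destination_lat"].to_numpy(),
                  known["destination_lon"].to_numpy())
    j = known.index[np.argmin(d)]
    df.loc[i, ["destination_lat", "destination_lon"]] = \
        known.loc[j, ["destination_lat", "destination_lon"]].to_numpy()
```

Here the third row ends up with the second row's destination, which is far closer than the first.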
Changing North/South latitude values in Python
In Python I'm dealing with a couple of large CSVs containing geographical data in different kinds of formats for latitude and longitude. I settled on converting them to decimal degrees. My issue is that some files are already formatted this way, but with a direction (N, S, E, W) attached at the end of each individual coordinate. Also, the south and west coordinates are not yet negative, and they should be when in decimal degrees. I was initially using regex to filter these directions out, but can't figure out a way to attach a negative sign to south and west coordinates before dropping the letters. I am using pandas to read the csv in. Example coordinates:

Latitude, Longitude
30.112342N, 10.678982W
20.443459S, 30.678997E

import pandas as pd

df = pd.read_csv("mydataset.csv")
if df['Latitude'].str.endswith('S'):
    df.Latitude = -float(df['Latitude'].str.strip('S'))
else:
    df.Latitude = float(df['Latitude'].str.strip('N'))

Depending on how I tweak it, I get different errors, the most common being:

AttributeError: 'Latitude' object has no attribute 'strip'

I've tried changing the dtype to string, among other methods, with no luck. I can filter out the directions with regular expressions, but can't discern what the direction was to change the sign to negative if necessary. Any help is appreciated.
Look into .apply(). The value df['Latitude'] is the full column, so you can't work on all of its values at once for this sort of operation. Instead, do something like this:

def fix_latitude(x):
    """Work on an individual latitude value x."""
    if x.endswith('S'):
        x = -float(x.strip('S'))
    else:
        x = float(x.strip('N'))
    return x

df['fixedLatitude'] = df.Latitude.apply(fix_latitude)
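A vectorized variant of the same fix, in case the files are large (a sketch assuming every value ends in exactly one of N or S):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Latitude": ["30.112342N", "20.443459S"]})

# Negative sign for south, positive for north, applied column-wide
sign = np.where(df["Latitude"].str.endswith("S"), -1.0, 1.0)
df["fixedLatitude"] = sign * df["Latitude"].str.rstrip("NS").astype(float)
print(df["fixedLatitude"].tolist())  # [30.112342, -20.443459]
```

The same pattern with "W"/"E" handles the longitude column.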
How do I save centroids in a shapefile, together with a geometry?
I calculate a centroid for some GeoDataFrame:

M["centroid"] = M.centroid

Now I'd like to save that column in a shp file, so I do:

M[["centroid", "geometry"]].to_file("mfile.shp")

But the driver complains that I cannot save a Point together with my geometry. I guess that's OK, but I'm wondering what's the best way to store centroid information alongside other geo info using geopandas.
Because GeoPandas will not allow you to save two geometry columns (when creating shapefiles), I would recommend saving the xy coordinates of each centroid into plain columns. From those you can always easily get a shapely.Point back.

for index, row in M.iterrows():
    centroid_coords = row.geometry.centroid.coords.xy
    M.loc[index, 'cen_x'] = centroid_coords[0][0]
    M.loc[index, 'cen_y'] = centroid_coords[1][0]

M[["cen_x", "cen_y", "geometry"]].to_file("mfile.shp")

EDIT based on the comment below (thanks, didn't realise it):

temp = M.centroid
M['cen_x'] = temp.x
M['cen_y'] = temp.y
M[["cen_x", "cen_y", "geometry"]].to_file("mfile.shp")
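If you really need the centroids as geometries rather than x/y columns, another option is writing them to a second point layer; a small sketch (the toy polygon and file names are just examples):

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Toy GeoDataFrame standing in for M
M = gpd.GeoDataFrame({"name": ["a"]},
                     geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])

# Polygons in one layer, centroids in a separate point layer
centroids = gpd.GeoDataFrame(M.drop(columns="geometry"), geometry=M.centroid)
# M.to_file("mfile.shp")
# centroids.to_file("mfile_centroids.shp")
print(centroids.geometry.iloc[0])  # POINT (1 1)
```

Each shapefile then has a single geometry type, which is what the driver requires.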