I downloaded the OpenStreetMap data for a single city. I want to get the maximum and minimum latitude and longitude values. How can I do that?
My approach is the following:
import osmread as osm
#...
def _parse_map(self):
    geo = [(entity.lat, entity.lon) for entity in osm.parse_file('map.osm') if isinstance(entity, osm.Node)]
    return max(geo[0]), min(geo[0]), max(geo[1]), min(geo[1])
But when I print these values, I don't think they are right. When I downloaded the area from the OpenStreetMap site, there was a white area indicating which region I was exporting, and for this area I also got the minimum and maximum values for latitude and longitude. Those values don't match the ones I get from my simple script.
What am I doing wrong?
return max(geo[0]), min(geo[0]), max(geo[1]), min(geo[1])
You are taking the extrema of the first and second elements of geo. But geo is a list of 2-tuples, so geo[0] is the 2-tuple (entity.lat, entity.lon) of the first node. You are therefore just choosing the min/max of latitude and longitude for one single node.
If you want to feed the first (or second) element of each tuple in the list to the aggregate functions, you have to select those elements explicitly, for example with a generator expression:
return max(x[0] for x in geo), min(x[0] for x in geo), max(x[1] for x in geo), min(x[1] for x in geo)
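An equivalent approach (a small sketch reusing the question's osmread setup; parse_map_bounds is just a stand-alone version of the method above) is to transpose the list once with zip, so each aggregate scans a flat tuple of floats:
import osmread as osm

def parse_map_bounds(path='map.osm'):
    # Collect a (lat, lon) pair for every node in the file
    geo = [(entity.lat, entity.lon)
           for entity in osm.parse_file(path)
           if isinstance(entity, osm.Node)]
    lats, lons = zip(*geo)  # transpose into one tuple of latitudes and one of longitudes
    return max(lats), min(lats), max(lons), min(lons)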
Related
I have a set of Polygon instances (loaded via geopandas) that represent historical severe-weather watches and warnings, and netCDF data at 15-minute intervals containing parameters such as tornado / hail forecast parameters and updraft helicity (these are point data). Each polygon has a set start time and end time. What I'm trying to do: for each 15-minute interval within the polygon's time range (start time -> end time), spatially join in, from the netCDF files, the highest value of each of these forecast parameters within the polygon.
I already have the code that pulls the range of time steps (time_list) required to analyze from the netCDF files for each polygon (TOR):
# Pull the warning issue time, locate the closest index in our subhour files
issue_time = datetime.strptime(str(TOR.ISSUED), "%Y%m%d%H%M")
expire_time = datetime.strptime(str(TOR.EXPIRED), "%Y%m%d%H%M")
closest_first = min(time_list, key=lambda d: abs(d - issue_time))
closest_last = min(time_list, key=lambda d: abs(d - expire_time))
# Pull the timesteps in between the issue time and the expire time
before = time_list.index(closest_first) if closest_first < issue_time else time_list.index(closest_first)-1
after = time_list.index(closest_last) if closest_last > issue_time else time_list.index(closest_last)+1
The TOR object is generated by slicing the GeoDataFrame of the full polygon data. Right now I am using itertuples on the below slice to get the TOR object:
TORNADOES_IN_WINDOW = df_slice_TO_WRN[(df_slice_TO_WRN.ISSUED > int(first.strftime("%Y%m%d%H%M")))
& (df_slice_TO_WRN.ISSUED < int(last.strftime("%Y%m%d%H%M")))]
Inside the loop, I then iterate over the list of netCDF files in the range of timesteps found (before -> after) and load these into geopandas as well, so that I can perform a spatial join. This is done with this block of code:
# Load one netCDF timestep and flatten it to a regular DataFrame
xr_i = xr.open_dataset(subhr_files[i])
ds_i = xr_i.to_dataframe()
ds_i = ds_i.reset_index()
# Build point geometries from the longitude/latitude columns and wrap them in a GeoDataFrame
geom = [Point(x, y) for x, y in zip(ds_i['XLONG'], ds_i['XLAT'])]
gdf = gpd.GeoDataFrame(ds_i, geometry=geom)
The problems I'm encountering right now are the following. When running itertuples on the GeoDataFrame (TORNADOES_IN_WINDOW), each item returned is a plain Pandas tuple, not a GeoPandas object, so I cannot run sjoin to join the attributes. Also, this is not a very efficient approach, and I'm wondering if there is a better way to do it.
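I'm not certain of the surrounding code, but one direction worth trying (a sketch only; UP_HELI_MAX is a placeholder for one of your forecast parameter columns, and the CRS handling is assumed): instead of iterating with itertuples, iterate over the index and slice with .loc[[idx]], which keeps each warning as a one-row GeoDataFrame so gpd.sjoin still sees geometry:
import geopandas as gpd

# gdf: point GeoDataFrame built from the netCDF file as in the question
# TORNADOES_IN_WINDOW: GeoDataFrame slice of warning polygons
gdf = gdf.set_crs(TORNADOES_IN_WINDOW.crs)       # assumes both datasets share the same CRS

for idx in TORNADOES_IN_WINDOW.index:
    tor_gdf = TORNADOES_IN_WINDOW.loc[[idx]]     # double brackets keep the GeoDataFrame type
    # points falling inside this warning polygon (use op="within" on older geopandas)
    joined = gpd.sjoin(gdf, tor_gdf, how="inner", predicate="within")
    if not joined.empty:
        max_val = joined["UP_HELI_MAX"].max()    # placeholder column name for a forecast parameter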
I am working on a project that involves going through two columns of latitude and longitude values. If the lat/long in one pair of columns are blank, then I need to figure out which pair of lat/long values in another two columns are (geographically) closest to those in the destination. The dataframe looks like this:
origin_lat | origin_lon | destination_lat | destination_lon
----------------------------------------------------------------
20.291326 -155.838488 25.145242 -98.491404
25.611236 -80.551706 25.646763 -81.466360
26.897654 -75.867564 nan nan
I am trying to build two dictionaries, one with the origin lat and long, and the other with the destination lat and long, in this format:
tmplist = [{'origin_lat': 39.7612992, 'origin_lon': -86.1519681},
{'origin_lat': 39.762241, 'origin_lon': -86.158436 },
{'origin_lat': 39.7622292, 'origin_lon': -86.1578917}]
What I want to do: for every row where the destination lat/lon are blank, compare the origin lat/lon in that row to a dictionary of all the non-nan destination lat/lon values, then write the geographically closest destination lat/lon from that dictionary into the row in place of the nan values. I've been playing around with creating lists of dictionary objects but can't seem to build a dictionary in the correct format. Any help would be appreciated!
If df is your pandas.DataFrame, you can generate the requested dictionaries by iterating through the rows of df:
origin_dicts = [{'origin_lat': row['origin_lat'], 'origin_lon': row['origin_lon']} for _, row in df.iterrows()]
and analogously for destination_dicts.
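For completeness, the destination version follows the same pattern (key names mirror the question's columns):
destination_dicts = [{'destination_lat': row['destination_lat'], 'destination_lon': row['destination_lon']}
                     for _, row in df.iterrows()]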
Remark: if the only reason for creating the dictionaries is to calculate values to replace the nan entries, it might be easier to do this directly on the data frame, e.g.
df['destination_lon'] = df.apply(find_closest_lon, axis=1)
df['destination_lat'] = df.apply(find_closest_lat, axis=1)
where find_closest_lon and find_closest_lat are functions receiving a data frame row as an argument and having access to the values of the origin columns of the data frame.
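For illustration, such a helper could look roughly like this (a sketch only: the haversine distance and the destinations keyword argument are my assumptions; the column names come from the question):
import numpy as np
import pandas as pd

def _haversine(lat1, lon1, lat2, lon2):
    # Great-circle distance in km between points given in decimal degrees
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def find_closest_lat(row, destinations=None):
    # Keep the row's own destination_lat if present, otherwise take the closest known one
    if not pd.isna(row['destination_lat']):
        return row['destination_lat']
    dists = _haversine(row['origin_lat'], row['origin_lon'],
                       destinations['destination_lat'], destinations['destination_lon'])
    return destinations.loc[dists.idxmin(), 'destination_lat']

# known destinations (rows without nan), passed through apply's keyword arguments
known = df.dropna(subset=['destination_lat', 'destination_lon'])
df['destination_lat'] = df.apply(find_closest_lat, destinations=known, axis=1)
find_closest_lon would be the analogous function returning 'destination_lon' instead.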
The format that you want is the built-in 'records' format:
df[['origin_lat','origin_lon']].to_dict(orient = 'records')
produces
[{'origin_lat': 20.291326, 'origin_lon': -155.83848799999998},
{'origin_lat': 25.611235999999998, 'origin_lon': -80.55170600000001},
{'origin_lat': 26.897654, 'origin_lon': -75.867564}]
and of course you can equally have
df[['destination_lat','destination_lon']].to_dict(orient = 'records')
But I agree with @ctenar that you do not need to generate dictionaries for your ultimate task; Pandas provides enough functionality for that.
In Python I'm dealing with a couple of large csv's containing geographical data in different kinds of formats for latitude and longitude. I settled on converting them to decimal degrees. My issue is that some files are already formatted this way, but with a direction (N,S,E,W) attached at the end of each individual coordinate. Also, the south and west coordinates are not yet negative, and they should be when in decimal degrees.
I was initially using regex to filter these directions out, but can't figure out a way to attach a negative to South and West coordinates before dropping them. I am using pandas to read the csv in.
Example coordinates:
Latitude, Longitude
30.112342N, 10.678982W
20.443459S, 30.678997E
import pandas as pd

df = pd.read_csv("mydataset.csv")

if df['Latitude'].str.endswith('S'):
    df.Latitude = -float(df['Latitude'].str.strip('S'))
else:
    df.Latitude = float(df['Latitude'].str.strip('N'))
Depending on how I tweak it, I get different errors, the most common being:
Attribute error: 'Latitude' object has no attribute 'strip'.
I've tried changing the dtype to string, among other methods, with no luck. I can filter out the directions with regular expressions, but can't discern what the direction was to change to negative if necessary. Any help is appreciated.
Look into .apply(). The value df['Latitude'] is the full column (a Series), so you can't treat it like a single string for this sort of operation.
Instead, do something like this:
def fix_latitude(x):
    """Work on individual latitude value X."""
    if x.endswith('S'):
        x = -float(x.strip('S'))
    else:
        x = float(x.strip('N'))  # convert the N case to float as well, so the column has one dtype
    return x
df['fixedLatitude'] = df.Latitude.apply(fix_latitude)
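The same idea extends to the longitude column; a small sketch (it assumes, as in the sample data, that W marks west and should become negative while E stays positive):
def fix_longitude(x):
    """Work on individual longitude value X."""
    if x.endswith('W'):
        return -float(x.strip('W'))
    return float(x.strip('E'))

df['fixedLongitude'] = df.Longitude.apply(fix_longitude)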
I have a dataset of locations and a value assigned to each location, like the following:
dataset picture.
I have fetched the geographic location of each area, saved it in a JSON file, and then showed the locations on the map using markers. But what I want is to show markers based on the total values, that is, to show as many markers as the value assigned to each location. How can I do that? Here is my code:
import folium
import pandas as pd

data = pd.read_json("Address.json")
lat = list(data["Latitude"])
lon = list(data["Longitude"])
total = list(data["Total"])
map = folium.Map(location =[23.6850,90.3563])
f_group = folium.FeatureGroup(name="Map")
for lt, ln, total in zip(lat, lon, total):
    f_group.add_child(folium.CircleMarker(location=[lt, ln], popup="Affected: "+str(total), radius=5, fill_color='red', color='grey', fill_opacity=1))
map.add_child(f_group)
map.save("Map1.html")
If you want to use just markers, try:
for lt, ln, total in zip(lat, lon, total):
    for i in range(total):
        f_group.add_child(folium.CircleMarker(location=[lt, ln], popup="Affected: "+str(total), radius=5, fill_color='red', color='grey', fill_opacity=1))
But watch out for how many add_child calls you make. That may slow down the map quite a bit, depending on how you plan to interact with it.
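If repeating markers gets too heavy, one alternative (a sketch, not part of the original answer; the radius scaling is arbitrary) is to keep a single marker per location and let its size reflect the total:
for lt, ln, tot in zip(lat, lon, total):
    f_group.add_child(folium.CircleMarker(
        location=[lt, ln],
        popup="Affected: " + str(tot),
        radius=5 + tot,          # arbitrary scaling: the bigger the value, the bigger the marker
        fill_color='red', color='grey', fill_opacity=1))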
I was following this tutorial on nearest neighbour analysis:
https://automating-gis-processes.github.io/2017/lessons/L3/nearest-neighbour.html
I get this error:
('id', 'occurred at index 0')
after I run this:
def nearest(row, geom_union, df1, df2, geom1_col='geometry', geom2_col='geometry', src_column=None):
    # Find the nearest point and return the corresponding value from the specified column.
    # Find the geometry that is closest
    nearest = df2[geom2_col] == nearest_points(row[geom1_col], geom_union)[1]
    # Get the corresponding value from df2 (matching is based on the geometry)
    value = df2[nearest][src_column].get_values()[0]
    return value

df1['nearest_id'] = df1.apply(nearest, geom_union=unary_union, df1=df1, df2=df2, geom1_col='centroid', src_column='id', axis=1)
I am using my own data for this. It is similar to the data in the example, but I have the addresses, geometry, latitude and longitude in a shapefile, so I am not using a .kml file. I can't figure out this error.
If you followed the code literally,
df1['nearest_id'] = df1.apply(nearest, geom_union=unary_union,
df1=df1, df2=df2, geom1_col='centroid',
src_column='id', axis=1)
then the problem is likely src_column: the function returns the value from the column whose name is given by the src_column argument, which the sample code sets to 'id'. If you get an error mentioning the column id, you most likely don't have a column with that name and should pass the name of an existing column in your dataset.
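For example (a sketch; your_id_column is just a placeholder for whatever identifier column your shapefile actually contains):
print(df2.columns)  # check which columns your dataset actually has

df1['nearest_id'] = df1.apply(nearest, geom_union=unary_union,
                              df1=df1, df2=df2, geom1_col='centroid',
                              src_column='your_id_column', axis=1)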