Changing North/South latitude values in Python

In Python I'm dealing with a couple of large CSVs containing geographical data in different formats for latitude and longitude. I settled on converting them to decimal degrees. My issue is that some files are already formatted this way, but with a direction (N, S, E, W) attached to the end of each individual coordinate. Also, the south and west coordinates are not yet negative, and they should be in decimal degrees.
I was initially using regex to filter these directions out, but I can't figure out a way to attach a negative sign to south and west coordinates before dropping the letters. I am using pandas to read the CSV in.
Example coordinates:
Latitude, Longitude
30.112342N, 10.678982W
20.443459S, 30.678997E
import pandas as pd
df = pd.read_csv("mydataset.csv")
if df['Latitude'].str.endswith('S'):
    df.Latitude = -float(df['Latitude'].str.strip('S'))
else:
    df.Latitude = float(df['Latitude'].str.strip('N'))
Depending on how I tweak it, I get different errors, the most common being:
AttributeError: 'Latitude' object has no attribute 'strip'.
I've tried changing the dtype to string, among other methods, with no luck. I can filter out the directions with regular expressions, but can't discern what the direction was to change to negative if necessary. Any help is appreciated.

Look into .apply(). The value df['Latitude'] is the full column (a pandas Series), so you can't process every row at once with plain Python operations like this.
Instead, do something like this:
def fix_latitude(x):
    """Fix an individual latitude value x."""
    if x.endswith('S'):
        x = -float(x.strip('S'))
    else:
        x = float(x.strip('N'))
    return x
df['fixedLatitude'] = df.Latitude.apply(fix_latitude)
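For large columns, a vectorized alternative can be faster than .apply(). A minimal sketch, assuming every value is a string ending in a single N or S letter as in the example:
import numpy as np

# Strip the trailing hemisphere letter, convert to float,
# and negate wherever the original value ended with 'S'
vals = df['Latitude'].str.rstrip('NS').astype(float)
df['fixedLatitude'] = np.where(df['Latitude'].str.endswith('S'), -vals, vals)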

Related

Spatially joining a series of netCDF data to a polygon object in python

I have a set of Polygon instances (loaded via geopandas) which consist of historical severe-weather watches and warnings, and netCDF data at 15-minute intervals containing parameters such as tornado / hail forecast parameters and updraft helicity (these are point data). Each polygon has a set start time and an end time. What I'm trying to do is, for each 15-minute interval within the polygon's time range (start time -> end time), spatially join in from the netCDF files the highest value of each of these forecast parameters within the polygon.
I already have the code that pulls the range of time steps (time_list) required to analyze from the netCDF files for each polygon (TOR):
# Pull the warning issue time, locate the closest index in our subhour files
issue_time = datetime.strptime(str(TOR.ISSUED), "%Y%m%d%H%M")
expire_time = datetime.strptime(str(TOR.EXPIRED), "%Y%m%d%H%M")
closest_first = min(time_list, key=lambda d: abs(d - issue_time))
closest_last = min(time_list, key=lambda d: abs(d - expire_time))
# Pull the timesteps in between the issue time and the expire time
before = time_list.index(closest_first) if closest_first < issue_time else time_list.index(closest_first)-1
after = time_list.index(closest_last) if closest_last > expire_time else time_list.index(closest_last)+1
The TOR object is generated by slicing the GeoDataFrame of the full polygon data. Right now I am using itertuples on the slice below to get the TOR object:
TORNADOES_IN_WINDOW = df_slice_TO_WRN[(df_slice_TO_WRN.ISSUED > int(first.strftime("%Y%m%d%H%M")))
& (df_slice_TO_WRN.ISSUED < int(last.strftime("%Y%m%d%H%M")))]
Inside the loop, I then iterate over the list of netCDF files in the range of timesteps found (before -> after) and load these into geopandas as well, so that I can perform a spatial join. This is done with this block of code:
import xarray as xr
import geopandas as gpd
from shapely.geometry import Point

xr_i = xr.open_dataset(subhr_files[i])
ds_i = xr_i.to_dataframe()
ds_i = ds_i.reset_index()
geom = [Point(x, y) for x, y in zip(ds_i['XLONG'], ds_i['XLAT'])]
gdf = gpd.GeoDataFrame(ds_i, geometry=geom)
The problems I'm encountering right now are as follows. When running itertuples on the GeoDataFrame (TORNADOES_IN_WINDOW), each iteration yields a plain pandas object, not a GeoPandas one, and therefore I cannot run sjoin to join the attributes. Also, this is not a very efficient methodology, and I'm wondering if there is a better way to do it.
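One way around the itertuples problem is to avoid converting rows to tuples at all: selecting a row with .loc and a list of labels keeps it as a one-row GeoDataFrame, which gpd.sjoin accepts. A minimal sketch under that assumption (UP_HELI_MAX is a hypothetical parameter column; older GeoPandas versions spell the predicate keyword op):
import geopandas as gpd

for TOR in TORNADOES_IN_WINDOW.itertuples():
    # .loc with a list of labels returns a one-row GeoDataFrame,
    # so the geometry column (and sjoin support) is preserved
    tor_gdf = TORNADOES_IN_WINDOW.loc[[TOR.Index]]
    # keep only the points falling inside this polygon, then aggregate
    inside = gpd.sjoin(gdf, tor_gdf, how='inner', predicate='within')
    peak = inside['UP_HELI_MAX'].max()  # hypothetical parameter column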

How do I save centroids in a shapefile, together with a geometry?

I calculate a centroid for some GeoDataFrame
M["centroid"] = M.centroid
now, I'd like to save that column in a shp file, and do
M[["centroid","geometry"]].to_file("mfile.shp")
But the driver complains that I cannot save a Point together with my geometry. I guess that's OK, but I'm wondering what's the best way to store centroid information alongside other geo info using geopandas.
Because GeoPandas will not allow you to save two geometry columns (when creating shapefiles), I would recommend saving the x/y coordinates into columns for each centroid. From those you can always easily recreate a shapely Point.
for index, row in M.iterrows():
    centroid_coords = row.geometry.centroid.coords.xy
    M.loc[index, 'cen_x'] = centroid_coords[0][0]
    M.loc[index, 'cen_y'] = centroid_coords[1][0]
M[["cen_x", "cen_y", "geometry"]].to_file("mfile.shp")
EDIT based on a comment below (thanks, I hadn't realised it):
temp = M.centroid
M['cen_x'] = temp.x
M['cen_y'] = temp.y
M[["cen_x", "cen_y", "geometry"]].to_file("mfile.shp")

Detecting and Removing GPS coordinates of "0.0000, 0.00000"(non fixed) data using python

I am doing a vehicle monitoring process with raw data files.
Some cleaning up was already done before this issue surfaced, but the data is inconsistent, which causes problems. The data includes "Model (v, w, x, y and z), Timestamp, Latitude, Longitude and Mode (0, 2, 4, 8)".
The objective of this process is to calculate distance and duration while cleaning the data.
I have successfully calculated duration using the timestamp with respect to both Model and Mode. I have also successfully calculated the distance between rows using the coordinates and the haversine formula. HERE COMES THE PROBLEM:
I can only calculate the distance between rows if both latitude and longitude are present and in the right format (e.g. 1.035436, 103.234623). Received data can have empty fields, which causes an error. I solved that by identifying the empty fields and removing those rows (without lat/long the data is useless):
mydataset = mydataset[mydataset['Mode'].notnull()] #for removing empty mode
mydataset = mydataset[mydataset['Latitude'].notnull()] #for removing empty latitude
But some rows arrive with lat/long of 0.00000000, 0.00000000, and I would like to remove those rows as well. The methods I've tried don't work. I've tried identifying the zeros and removing them using:
mydataset = mydataset[(mydataset[['Latitude','Longitude']] != 0).all(axis=1)]
and
mydataset = mydataset[(mydataset.Latitude != 0).any()]
Due to confidentiality of the data and code, I cannot provide much, but I would like to know why the above two methods do not work, and, if possible, how to tackle this problem.
Thank you! Much appreciated, and thank you for your time!
Some fake data are as shown below:
,Model,Timestamp,Longitude,Latitude,Mode
0,x,1970-01-19 01:29:17.058,103.235623,1.045436,0
1,x,1970-01-19 01:29:22.058,0.00000000,0.00000000,0 #Would like to remove this row
2,x,1970-01-19 01:29:27.058,103.234813,1.038436,2
3,x,1970-01-19 01:29:32.058,103.235623,1.039436,2
4,x,1970-01-19 01:29:38.058,103.234123,1.036436,0
5,x,1970-01-19 01:29:38.058,,,0 #removed via the code above
I am not sure if I understood correctly; do you need something like this?
Sample df
Lat,Long
55.6,22.06
0.00000000,0.00000000
56.056,22.10
df1 = df[df[['Lat','Long']] != 0].dropna(how='any').reset_index(drop= True)
print(df1)
Lat Long
0 55.600 22.06
1 56.056 22.10
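One caveat: the != 0 comparison only behaves as intended if Lat and Long were parsed as numbers. If the CSV left them as strings (e.g. "0.00000000"), a minimal sketch that coerces them first:
import pandas as pd

# Coerce to numeric; empty or unparseable fields become NaN and get dropped
df[['Lat', 'Long']] = df[['Lat', 'Long']].apply(pd.to_numeric, errors='coerce')
df1 = df[(df[['Lat', 'Long']] != 0).all(axis=1)].dropna().reset_index(drop=True)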

How do I extract longitude, latitude from geojson column in my data

I have the data in a CSV file which contains zip codes in one column and what appears to be GeoJSON data in the other. I loaded the data into a pandas DataFrame. How do I extract just the coordinates from the geojson column?
zips.head(2)
Out[14]:
postal_code geojson
0 85309 {"type":"MultiPolygon","coordinates":[[[[-112....
1 85310 {"type":"MultiPolygon","coordinates":[[[[-112....
zips.geojson[1]
zips.geojson.values[0]
'{"type":"MultiPolygon","coordinates":[[[[-112.363501,33.551312],[-112.363457,33.551312],[-112.36253,33.551309],[-112.361378,33.551311],[-112.360977,33.55131],[-112.358913,33.551305],[-112.358916,33.551104],[-112.358898,33.550758],[-112.358825,33.549401],[-112.358763,33.548056],[-112.358652,33.546016],[-112.358635,33.54554],[-112.358629,33.545429],[-112.358613,33.545143],[-112.358607,33.545039],[-112.358599,33.544897],[-112.358596,33.544838],[-112.358592,33.54478],[-112.358545,33.543923],[-112.358475,33.542427],[-112.358444,33.541913],[-112.35842,33.541399],[-112.358363,33.540373],[-112.358345,33.540104],[-112.35833,33.539878],[-112.35828,33.538863],[-112.358263,33.538352],[-112.358204,33.537335],[-112.358196,33.536892],[-112.358193,33.536444],[-112.358192,33.53631],[-112.358182,33.536031],[-112.358175,33.535797],[-112.358186,33.534197],[-112.358187,33.53324],[-112.358185,33.53278],[-112.358182,33.532218],[-112.358168,33.530732],[-112.358163,33.530174],[-112.35815,33.529797],[-112.359343,33.529819],[-112.359387,33.529812],[-112.359354,33.529716],[-112.360874,33.529732],[-112.370575,33.529805],[-112.375373,33.529907],[-112.37537,33.528961],[-112.375382,33.527693],[-112.375384,33.527033],[-112.375393,33.526355],[-112.374883,33.526353],[-112.371535,33.52634],[-112.366678,33.526323],[-112.366665,33.523201],[-112.366664,33.52285],[-112.366661,33.522734],[-112.366658,33.522596],[-112.366657,33.522553],[-112.366655,33.522502],[-112.366658,33.522388],[-112.368754,33.522441],[-112.370106,33.522618],[-112.370917,33.522624],[-112.371875,33.522633],[-112.371865,33.522389],[-112.371875,33.522162],[-112.37175,33.51916],[-112.375186,33.519096],[-112.375306,33.519094],[-112.375305,33.51971],[-112.375309,33.519728],[-112.375351,33.521607],[-112.375367,33.522304],[-112.375426,33.522419],[-112.375587,33.522423],[-112.375767,33.522426],[-112.382694,33.522547],[-112.382697,33.522654],[-112.382698,33.522709],[-112.382714,33.523282],[-112.382958,33.523283],[-112.383939,33.52329],[-112.383935,33.523153],[-112.386882,33.523097],[-112.38781,33.523781],[-112.38801,33.523609],[-112.388673,33.523001],[-112.388794,33.522895],[-112.388852,33.522844],[-112.389115,33.522837],[-112.389205,33.522761],[-112.389319,33.522661],[-112.392416,33.51994],[-112.392509,33.519195],[-112.392516,33.51914],[-112.401093,33.51914],[-112.401098,33.519779],[-112.401098,33.519838],[-112.401137,33.519885],[-112.401146,33.519903],[-112.40124,33.520001],[-112.401311,33.520066],[-112.401432,33.520158],[-112.401754,33.520412],[-112.402133,33.520685],[-112.402411,33.520892],[-112.402552,33.52098],[-112.402692,33.521087],[-112.402882,33.521256],[-112.402948,33.52133],[-112.403016,33.521428],[-112.403062,33.521517],[-112.4031,33.521621],[-112.40312,33.521715],[-112.403129,33.521822],[-112.403119,33.521937],[-112.403102,33.522011],[-112.403064,33.522109],[-112.403009,33.522208],[-112.402908,33.522336],[-112.402781,33.522475],[-112.402685,33.52257],[-112.402641,33.522613],[-112.402553,33.522692],[-112.401659,33.523488],[-112.401228,33.52388],[-112.401157,33.523961],[-112.401123,33.524028],[-112.401107,33.524102],[-112.401108,33.524213],[-112.401116,33.525097],[-112.401119,33.5263],[-112.401119,33.52634],[-112.401119,33.526441],[-112.399658,33.52646],[-112.399258,33.526743],[-112.395079,33.52973],[-112.394771,33.529977],[-112.39013,33.534207],[-112.388661,33.535533],[-112.385957,33.538011],[-112.384107,33.539698],[-112.384007,33.539732],[-112.383947,33.539786],[-112.38381,33.539862],[-112.384585,33.551063],[-112.384605,33.551372],[-112.384609,33.551434],
[-112.384614,33.551508],[-112.384416,33.551505],[-112.38385,33.551499],[-112.38131,33.551461],[-112.380126,33.551454],[-112.378928,33.551432],[-112.376262,33.551405],[-112.373858,33.551381],[-112.372583,33.551378],[-112.370038,33.551354],[-112.368768,33.55135],[-112.367585,33.551339],[-112.36749,33.551338],[-112.363501,33.551312]]]]}'
I tried to use it the way I would use values inside a dictionary, but I am unable to.
This might help. It is untested, so it might not work, or might need to be adjusted slightly for your use case.
The important features of this program are:
Use json.loads() to convert a JSON string to a Python data structure
Decompose the data structure according to the GeoJSON standard.
Reference:
http://geojson.org/geojson-spec.html#multipolygon
#UNTESTED
import json

# For every zipcode, print the X position of the first
# coordinate of the exterior of the multipolygon associated
# with that zip code
for zipcode, geo in zip(zips['postal_code'], zips['geojson']):
    geo = json.loads(geo)
    assert geo["type"] == "MultiPolygon"
    # Coordinates of a MultiPolygon are an array of Polygon coordinate arrays
    array_of_polygons = geo["coordinates"]
    polygon0 = array_of_polygons[0]
    # Coordinates of a Polygon are an array of LinearRing coordinate arrays
    ring0 = polygon0[0]
    # A LinearRing is a closed LineString with 4 or more positions;
    # a LineString is an array of positions
    vertex0 = ring0[0]
    # A position is represented by an array of numbers
    x0 = vertex0[0]
    print(zipcode, x0)
You can import json and apply json.loads to convert the string data in your geojson column to a dict. Then you can extract data from the dict directly, or use one of the many Python modules that deal with GIS data. I find shapely easy to use and helpful in many cases.
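For example, here is a minimal sketch of the shapely route, assuming the same zips DataFrame as above (shapely's shape() helper accepts a GeoJSON-like dict directly):
import json
from shapely.geometry import shape

geom = shape(json.loads(zips.geojson.values[0]))   # a MultiPolygon
print(geom.bounds)                                 # (minx, miny, maxx, maxy)
print(list(geom.geoms[0].exterior.coords)[:3])     # first three vertices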

Getting maximum and minimum of OpenStreetMap map

I downloaded the OpenStreetMap data for a single city. I want to get the maximum and minimum values of latitude and longitude. How can I do that?
My approach is the following:
import osmread as osm
#...
def _parse_map(self):
    geo = [(entity.lat, entity.lon) for entity in osm.parse_file('map.osm') if isinstance(entity, osm.Node)]
    return max(geo[0]), min(geo[0]), max(geo[1]), min(geo[1])
But when I print these values I don't think they are right. When I downloaded the area from the OpenStreetMap site, a white area indicated which region I was exporting, and for this area the site also showed the minimum and maximum values for latitude and longitude. These values don't match the ones I get from my simple script.
What am I doing wrong?
return max(geo[0]), min(geo[0]), max(geo[1]), min(geo[1])
You are taking the extrema of the first and second element of geo. But geo is a list of 2-tuples. So the first element geo[0] is a 2-tuple consisting of entity.lat and entity.lon for the first node. Therefore you are just choosing min/max of latitude and longitude for one node.
If you want to feed the first (second) element of each tuple in the list to the aggregate function, then you have to specifically choose these. For example with a generator:
return max(x[0] for x in geo), min(x[0] for x in geo), max(x[1] for x in geo), min(x[1] for x in geo)
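Equivalently, you can unzip the list into separate latitude and longitude sequences first; just a stylistic alternative:
lats, lons = zip(*geo)  # transpose the list of (lat, lon) tuples
return max(lats), min(lats), max(lons), min(lons)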
