I am trying to generate random coordinates for a country.
I used the Faker library:
import factory

def geo_point():
    """Make random coordinates."""
    faker = factory.Faker('local_latlng', country_code='IN')
    coords = faker.generate()
    # local_latlng yields (lat, lon, ...); return as (lon, lat)
    return (coords[1], coords[0])
But the problem with this is that it has a very limited set of coordinates (around 30-40), and we require at least 10,000 for testing.
I tried a simpler approach:
from random import uniform

def random_geo_cordinate():
    """Make random geo-coordinates."""
    x, y = uniform(-180, 180), uniform(-90, 90)
    return (y, x)
But then only 10-20 of the coordinates fall inside a specific country.
I found a lot of references saying that random coordinates can be generated from shapefiles, but in all of them only the geom parameters are available.
I did find a method through which I can check whether a coordinate lies in that country or not via the geom column.
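Roughly, that check looks like this (a sketch; the Country model and the geom field name are placeholders for my GeoDjango setup):

from django.contrib.gis.geos import Point

def point_in_country(lat, lon):
    """Return True if the (lat, lon) pair falls inside the country's geometry."""
    # Country is the shapefile-backed model holding the geom column (name is illustrative).
    pt = Point(lon, lat)  # GEOS points take (x=lon, y=lat)
    return Country.objects.filter(geom__contains=pt).exists()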
But I am still missing something when it comes to generating random coordinates for a country.
Is there any simple and direct approach?
I am using:
a PostGIS database
a GeoDjango server
Note: I used GDAL to get the shapefile for the country.
You could use the Overpass API, which queries the OSM database, so you get real coordinates.
For example, fetching all villages in India:
import requests
import json

overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];area[name="India"];(node[place="village"](area););out;
"""
response = requests.get(overpass_url, params={'data': overpass_query})

coords = []
if response.status_code == 200:
    data = response.json()
    places = data.get('elements', [])
    for place in places:
        coords.append((place['lat'], place['lon']))
    print("Got %s village coordinates!" % len(coords))
    print(coords[0])
else:
    print("Error")
Output:
Got 102420 village coordinates!
(9.9436615, 77.8978759)
Note: the Overpass API is rate limited, so you should save all the coordinates locally and extract your random set from there!
Additionally, you can play around with the place parameter, fetching just cities or towns, or fetch restaurant locations for a specific district, etc.
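For example, a minimal sketch of caching the result once and then drawing the 10,000-point random subset from it (the file name is arbitrary):

import json
import random

# Cache the Overpass result once (coords comes from the snippet above).
with open('india_villages.json', 'w') as f:
    json.dump(coords, f)

# Later: load the cache and draw a random subset without hitting the API again.
with open('india_villages.json') as f:
    cached = json.load(f)

sample = random.sample(cached, 10000)  # 10,000 unique (lat, lon) pairs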
https://3geonames.org/randomland.IN is a free API that returns random locations in any country of the world.
Related
I am using this Mapillary endpoint: https://tiles.mapillary.com/maps/vtp/mly1_public/2/{zoom_level}/{x}/{y}?access_token={} and getting such responses back (see photo). Also, here is the Mapillary documentation.
It is not quite clear to me what the nested coordinate lists in the response represent. By the looks of it, I initially thought they might have to do with pixel coordinates, but judging by the context (the API documentation) and the endpoint I am using, I would say that is not the case. Also, I am not sure whether the JSON response you see in the picture is valid GeoJSON; some online formatters did not accept it as valid.
I would like to find the bounding box of the "sequence". For context, that would be the minimal-area rectangle, defined by two (lat, lon) positions, that fully encompasses the geometry of the so-called "sequence"; a "sequence" is basically a series of photos taken during a vehicle or on-foot trip, together with the metadata associated with the photos (the metadata is available from another endpoint, but that is just for context).
My question is: is it possible to turn the coordinates you see in the pictures into (lat, lon)? Having those, it would be easy for me to find the bounding box of the sequence. And if so, how? Also, please note that some of the nested lists are of type LineString while others are MultiLineString (I read about the difference here: help.arcgis.com, hope this helps).
Minimal reproducible code snippet:
import json
import requests
import mercantile
import mapbox_vector_tile as mvt

ACCESS_TOKEN = 'XXX'  # can be provided from here: https://www.mapillary.com/dashboard/developers

z6_tiles = list(mercantile.tiles(  # us_west_coast_bbox
    west=-125.066423,
    south=42.042594,
    east=-119.837770,
    north=49.148042,
    zooms=6
))
# pprint(z6_tiles)

vector_tiles_url = 'https://tiles.mapillary.com/maps/vtp/mly1_public/2/{}/{}/{}?access_token={}'
for tile in z6_tiles:
    res = requests.get(vector_tiles_url.format(tile.z, tile.x, tile.y, ACCESS_TOKEN))
    res_json = mvt.decode(res.content)
    with open('idea.json', 'w+') as f:
        json.dump(res_json, f, indent=4)
I think this get_normalized_coordinates is the solution I was looking for. Please take it with a grain of salt, as I have not fully tested it yet; I will try to and then update my answer. Also, please be cautious, because for tiles closer to either the South or the North Pole the Z14_TILE_DMD_WIDTH constant will not be the one you see here, but something more like 0.0018958715374282065.
import mercantile

Z14_TILE_DMD_WIDTH = 0.02197265625
Z14_TILE_DMD_HEIGHT = 0.018241950298914844

def get_normalized_coordinates(bbox: mercantile.LngLatBbox,
                               target_lat: int,
                               target_lon: int,
                               extent: int = 4096):  # 4096 is Mapillary's default
    """
    Returns a (lon, lat) tuple representing the real position on the world map of a map feature.
    """
    min_lon, min_lat, _, _ = bbox
    return (min_lon + target_lon / extent * Z14_TILE_DMD_WIDTH,
            min_lat + target_lat / extent * Z14_TILE_DMD_HEIGHT)
And if you are wondering how I came up with the constants you see, I simply iterated over the list of tiles I am interested in and checked that they all have the same width/height (this might not have been the case, keeping in mind what I mentioned above about tiles closer to one of the poles; I think this is called "distortion", but I am not sure). Also, for context: the tiles I iterated over are within this bbox: (-125.024414, 31.128199, -108.896484, 49.152970) (min_lon, min_lat, max_lon, max_lat; the US west coast), which I believe is also why all the tiles have the same width/height.
set_test = set()
for tile in relevant_tiles_set:
    curr_bbox = mercantile.bounds(tile)
    dm_width_diff: float = curr_bbox.east - curr_bbox.west
    dm_height_diff: float = curr_bbox.north - curr_bbox.south
    set_test.add((dm_width_diff, dm_height_diff))
set_test
Output:
{(0.02197265625, 0.018241950298914844)}
UPDATE: I forgot to mention that you actually do not need to compute those WIDTH/HEIGHT constants. You can just replace them with (max_lon - min_lon) and (max_lat - min_lat) respectively. Computing the constants was only for testing purposes.
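For completeness, here is a sketch of the constant-free variant mentioned in the update (untested beyond the west-coast tiles above; the function name is mine):

import mercantile

def tile_to_lnglat(bbox: mercantile.LngLatBbox,
                   target_lat: float,
                   target_lon: float,
                   extent: int = 4096):
    """Map a tile-local (target_lon, target_lat) pair in [0, extent] onto the tile's own bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (min_lon + target_lon / extent * (max_lon - min_lon),
            min_lat + target_lat / extent * (max_lat - min_lat))

# Usage: pass the bbox of the z14 tile the decoded feature came from.
# lon, lat = tile_to_lnglat(mercantile.bounds(tile), y, x)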
As mentioned in the title, I have a BigQuery table with 18 million rows; nearly half of them are useless, and I am supposed to assign a topic/niche to each row based on an important column (which has details about a product on a website). I tested the NLP API on a sample of 10,000 rows and it did wonders, but my current approach iterates over newarr (the important-details column obtained by querying my BigQuery table), sending only one cell at a time, awaiting the response from the API, and appending it to the results array.
Ideally I want to do this operation on 18 million rows in the minimum time. My per-minute quota has been increased to 3,000 API requests, so that's the maximum I can make, but I can't figure out how to send a batch of 3,000 rows, one batch after another, each minute.
for x in newarr:
    i += 1
    results.append(sample_classify_text(x))
sample_classify_text is a function taken straight from the documentation:
# This function will return the category for the text.
from google.cloud import language_v1

def sample_classify_text(text_content):
    """
    Classifying Content in a String

    Args:
      text_content The text content to analyze. Must include at least 20 words.
    """
    client = language_v1.LanguageServiceClient()

    # text_content = 'That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows.'

    # Available types: PLAIN_TEXT, HTML
    type_ = language_v1.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language = "en"
    document = {"content": text_content, "type_": type_, "language": language}

    response = client.classify_text(request={'document': document})
    # return response.categories

    # Loop through classified categories returned from the API
    for category in response.categories:
        # Get the name of the category representing the document.
        # See the predefined taxonomy of categories:
        # https://cloud.google.com/natural-language/docs/categories
        x = format(category.name)
        return x
        # Get the confidence. Number representing how certain the classifier
        # is that this category represents the provided text.
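For reference, the per-minute batching I have in mind would look roughly like this (an untested sketch built on sample_classify_text above; the worker count and chunk size are assumptions based on my quota):

import time
from concurrent.futures import ThreadPoolExecutor

BATCH = 3000  # per-minute request quota (assumption from the quota mentioned above)
results = []

with ThreadPoolExecutor(max_workers=50) as pool:
    for start in range(0, len(newarr), BATCH):
        t0 = time.monotonic()
        batch = newarr[start:start + BATCH]
        # Fire the whole batch concurrently; results come back in order.
        results.extend(pool.map(sample_classify_text, batch))
        # Wait out the remainder of the minute before starting the next batch.
        time.sleep(max(0, 60 - (time.monotonic() - t0)))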
I have a polygon shapefile of the U.S. made up of individual states as its attribute values. In addition, I have arrays storing the latitude and longitude values of point events that I am also interested in. Essentially, I would like to 'spatial join' the points and polygons (or perform a check to see which polygon, i.e. which state, each point is in), then sum the number of points in each state to find out which state has the most 'events'.
I believe the pseudocode would be something like:
Read in US.shp
Read in lat/lon points of events
Loop through each state in the shapefile and find number of points in each state
print 'Here is a list of the number of points in each state: '
Any libraries or syntax would be greatly appreciated.
Based on what I can tell, the OGR library is what I need, but I am having trouble with the syntax:
dsPolygons = ogr.Open('US.shp')
polygonsLayer = dsPolygons.GetLayer()

#Iterating all the polygons
polygonFeature = polygonsLayer.GetNextFeature()
k = 0
while polygonFeature:
    k = k + 1
    print "processing " + polygonFeature.GetField("STATE") + "-" + str(k) + " of " + str(polygonsLayer.GetFeatureCount())
    geometry = polygonFeature.GetGeometryRef()

    #Read in some points?
    geomcol = ogr.Geometry(ogr.wkbGeometryCollection)
    point = ogr.Geometry(ogr.wkbPoint)
    point.AddPoint(-122.33, 47.09)
    point.AddPoint(-110.11, 33.33)
    #geomcol.AddGeometry(point)
    print point.ExportToWkt()
    print point

    numCounts = 0.0
    while pointFeature:
        if pointFeature.GetGeometryRef().Within(geometry):
            numCounts = numCounts + 1
        pointFeature = pointsLayer.GetNextFeature()
    polygonFeature = polygonsLayer.GetNextFeature()

#Loop through to see how many events in each state
I like the question. I doubt I can give you the best answer, and definitely can't help with OGR, but FWIW I'll tell you what I'm doing right now.
I use GeoPandas, a geospatial extension of pandas. I recommend it — it's high-level and does a lot, giving you everything in Shapely and fiona for free. It is in active development by twitter/#kajord and others.
Here's a version of my working code. It assumes you have everything in shapefiles, but it's easy to generate a geopandas.GeoDataFrame from a list.
import geopandas as gpd

# Read the data.
polygons = gpd.GeoDataFrame.from_file('polygons.shp')
points = gpd.GeoDataFrame.from_file('points.shp')

# Make a copy because I'm going to drop points as I
# assign them to polys, to speed up subsequent search.
pts = points.copy()

# We're going to keep a list of how many points we find.
pts_in_polys = []

# Loop over polygons with index i.
for i, poly in polygons.iterrows():

    # Keep a list of points in this poly
    pts_in_this_poly = []

    # Now loop over all points with index j.
    for j, pt in pts.iterrows():
        if poly.geometry.contains(pt.geometry):
            # Then it's a hit! Add it to the list,
            # and drop it so we have less hunting.
            pts_in_this_poly.append(pt.geometry)
            pts = pts.drop([j])

    # We could do all sorts, like grab a property of the
    # points, but let's just append the number of them.
    pts_in_polys.append(len(pts_in_this_poly))

# Add the number of points for each poly to the dataframe.
polygons['number of points'] = gpd.GeoSeries(pts_in_polys)
The developer tells me that spatial joins are 'new in the dev version', so if you feel like poking around in there, I'd love to hear how that goes! The main problem with my code is that it's slow.
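As an aside on the "from a list" point above: if your point events live in lat/lon arrays rather than a shapefile, building the points GeoDataFrame is roughly this (a sketch; the coordinates are just the two sample points from the question's OGR snippet):

import geopandas as gpd
from shapely.geometry import Point

# lats, lons stand for the arrays of point-event coordinates from the question.
lats = [47.09, 33.33]
lons = [-122.33, -110.11]

points = gpd.GeoDataFrame(geometry=[Point(lon, lat) for lat, lon in zip(lats, lons)])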
import geopandas as gpd
# Read the data.
polygons = gpd.GeoDataFrame.from_file('polygons.shp')
points = gpd.GeoDataFrame.from_file('points.shp')
# Spatial Joins
pointsInPolygon = gpd.sjoin(points, polygons, how="inner", op='intersects')
# Add a field with 1 as a constant value
pointsInPolygon['const']=1
# Group according to the column by which you want to aggregate data
pointsInPolygon.groupby(['statename']).sum()
**The column ['const'] will give you the count of points in each of your multipolygons.**
# If you want to see other columns as well, just type something like this:
pointsInPolygon = pointsInPolygon.groupby('statename').agg({'columnA':'first', 'columnB':'first', 'const':'sum'}).reset_index()
Docs:
https://geopandas.org/docs/user_guide/mergingdata.html#spatial-joins
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
I need to calculate distances between UK postcodes.
I don't want to use a web api.
Does a python module/ library exist for this?
Or do I have to put together something of my own using data from the Ordnance Survey?
Thanks,
1/ You can use any REST geolocation API, e.g. Google Maps, which will give you an accurate distance based on the postcodes.
2/ You can use any up-to-date database that has postcodes and latitude/longitude information, and use that to calculate the distance between the two points.
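For option 2, once you have the two latitude/longitude pairs, the great-circle distance is a few lines of haversine (a sketch, no web API needed):

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km = mean Earth radius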
Helpful links:
i) http://blog.acmultimedia.co.uk/2008/03/uk-post-code-distance-calculator-using-phpmysql/
ii) Django - how can I find the distance between two locations?
This question is a bit old now, but it may be worth pointing out the difference between polar coordinates (long/lat), used by Google and satnavs, and the eastings and northings provided by the Ordnance Survey, which show how far a given postcode centroid is from a datum point somewhere off to the south-west of the Scilly Isles.
If you just want to work out distances, it's a one-liner using Pythagoras, e.g. Stonehenge to the Houses of Parliament:
>>> # Coordinates of Stonehenge and Westminster in E/Northings
... s = (412183, 142346)
... p = (530268, 179545)
... d = ((s[0] - p[0]) ** 2 + (s[1] - p[1]) ** 2) ** 0.5
... print (d)
123805.62517914927
I was looking for a library to convert from polar to Cartesian coordinates, which is relatively tricky even on a perfect sphere, which the world most certainly isn't. I'm currently trying, and mostly failing, to get my head around this:
https://scipython.com/book/chapter-2-the-core-python-language-i/additional-problems/converting-between-an-os-grid-reference-and-longitudelatitude/
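If you do end up needing that grid-reference-to-long/lat conversion, pyproj can transform between the OS National Grid (EPSG:27700) and WGS84 (EPSG:4326); a minimal sketch of how I would approach it, assuming pyproj is an acceptable dependency:

from pyproj import Transformer

# OSGB36 National Grid eastings/northings -> WGS84 longitude/latitude
grid_to_wgs84 = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)

# Stonehenge eastings/northings from the example above
lon, lat = grid_to_wgs84.transform(412183, 142346)
print(lon, lat)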
I don't know of a directly usable module, but you could use GRASS or QGIS, both of which support python scripting so the functionality can be used as python modules. You would still need to figure out how to do it manually in either of these tools though, but that's not really very difficult.
See below for simple code using the postcodes module for Python along with the Google Maps Distance Matrix API.
from postcodes import PostCoder
import requests
import json
import pprint
pc = PostCoder()
origin_postcode = str.upper(raw_input("Enter origin PostCode:"))
dest_postcode = str.upper(raw_input("Enter destination PostCode:"))
origin = str(pc.get('%s' % origin_postcode)['geo']['lat'])+','+str(pc.get('%s' % origin_postcode)['geo']['lng'])
dest = str(pc.get('%s' % dest_postcode)['geo']['lat'])+','+str(pc.get('%s' % dest_postcode)['geo']['lng'])
data = requests.get('https://maps.googleapis.com/maps/api/distancematrix/json?units=imperial&origins=%s&destinations=%s&key=AIzaSyDfTip2PrdaRkF1muCLP8REAk3FsLjmBrU' % (origin, dest))
new = json.loads(data.text)
miles = str(new['rows'][0]['elements'][0]['distance']['text']).replace('mi','miles')
duration = str((new['rows'][0]['elements'][0]['duration']['text']))
print '\n\nThe distance from %s to %s is %s, taking approximately %s (driving).' % (origin_postcode, dest_postcode, miles, duration)
I'm attempting to do a bounding box fetch in GAE using geomodel in Python. It is my understanding that you define a box and the geomodel fetch will then return all results with coordinates that lie within that box. I am currently inputting a GPS latitude and longitude (55.497527, -3.114624), and then establishing a bounding box with N, S, E, W within a given range of this coordinate, like so:
latRange = 1.0
longRange = 0.10

provlat = float(self.request.get('latitude'))
provlon = float(self.request.get('longitude'))

logging.info("Doing proximity lookup")

theBox = geotypes.Box(provlat+latRange, provlon-longRange, provlat-latRange, provlon+longRange)
logging.info("Box created with N:%f E:%f S:%f, W:%f" % (theBox.north, theBox.east, theBox.south, theBox.west))

query = GeoVenue.all().filter('Country =', provcountry)
results = GeoVenue.bounding_box_fetch(query, theBox, max_results=10)

if (len(results) == 0):
    jsonencode = json.dumps([{"error":"no results"}])
    self.response.out.write(jsonencode)
    return
...
This always returns an empty result set, even though I know for a fact there are results within the range specified in the box logging output :
INFO 2011-07-19 20:45:41,129 main.py:117] Box created with N:56.497527 E:-3.214624 S:54.497527, W:-3.014624
The entries in my datastore include:
{"venueLat": 55.9570323, "venueCity": "Edinburgh", "venueZip": "EH1 3AA", "venueLong": -3.1850223, "venueName": "Edinburgh Playhouse", "venueState": "", "venueCountry": "UK"}
and
{"venueLat": 55.9466506, "venueCity": "Edinburgh", "venueZip": "EH8 9FT", "venueLong": -3.1863224, "venueName": "Festival Theatre Edinburgh", "venueState": "", "venueCountry": "UK"}
Both of which most definitely have positions that are within the bounding box defined above. I have turned debug on and the bounding box fetch does seem to search geocells since I get output along the lines of :
INFO 2011-07-19 20:47:09,487 geomodel.py:114] bbox query looked in 4 geocells
However, no results ever seem to get returned. I have ensured I ran update_location() for all models to make sure the underlying geocell data was correct. Does anyone have any ideas?
Thanks
Code to add an object to the database:
from google.appengine.ext import db
from models.place import Place

place = Place(location=db.GeoPt(LAT, LON))  # location is a required field
                                            # LAT, LON are floats
place.state = "New York"
place.zip_code = 10003
# ... set other fields
place.update_location()  # This is required even when you are creating the object,
                         # not just when you are changing it
place.put()
Code to search for nearby objects:
base_query = Place.all()  # apply appropriate filters if needed
center = geotypes.Point(float(40.658895), float(-74.042760))
max_results = 50
max_distance = 8000

results = Place.proximity_fetch(base_query, center, max_results=max_results,
                                max_distance=max_distance)
It should work with bounding box queries as well; just remember to call update_location before adding the object to the database.
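For the bounding-box case from the question, the call looks roughly the same (a sketch; the box coordinates are placeholders around the same centre point):

base_query = Place.all()

# geotypes.Box takes (north, east, south, west), as in the question's logging output.
the_box = geotypes.Box(40.70, -74.00, 40.60, -74.10)

results = Place.bounding_box_fetch(base_query, the_box, max_results=50)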