Append geoJSON feature with Python? - python

I have the following structure in a geoJSON file:
{"crs":
{"type": "name",
"properties":
{"name": "urn:ogc:def:crs:EPSG::4326"}
},
"type": "FeatureCollection",
"features": [
{"geometry":
{"type": "Polygon",
"coordinates": [[[10.914622377957983, 45.682007076150505],
[10.927456267537572, 45.68179119797432],
[10.927147329501077, 45.672795442796335],
[10.914315493899755, 45.67301125363092],
[10.914622377957983, 45.682007076150505]]]},
"type": "Feature",
"id": 0,
"properties": {"cellId": 38}
},
{"geometry":
{"type": "Polygon",
"coordinates":
... etc. ...
I want to read this geoJSON into Google Maps and have each cell colored based on a property I calculated in Python for each cell individually. So my main question is: how can I read the geoJSON in with Python, add another property to these Polygons (there are around 12,000 polygons, so adding them one by one is not an option), and then write the new file?
I think what I'm looking for is a Python library that can handle geoJSON, so I don't have to add these features via string manipulation.

There is a way with the Python geojson package.
With it, you can read the geojson as an object:
import geojson

with open("your_file.geojson") as f:   # or use geojson.loads() on a geojson string
    loaded = geojson.load(f)

for feature in loaded.features[0:50]:  # [0:50] to look at only the first 50 features
    print(feature)
There are Feature, FeatureCollection and custom classes to help you add your attributes, as sketched below.
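A rough sketch of the full round trip with that package (the file names and the compute_value() helper are placeholders, not part of the question):
import geojson

with open("cells.geojson") as f:
    collection = geojson.load(f)

for feature in collection.features:
    # compute_value() stands in for whatever you calculated per cell in Python
    feature.properties["myValue"] = compute_value(feature.properties["cellId"])

with open("cells_with_values.geojson", "w") as f:
    geojson.dump(collection, f)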

geoJSON is just a JSON doc (a simplification, but it is all you need for this purpose), and Python reads it as a dict object.
Since dicts are updated in place, we don't need to store a new variable for the geo objects.
import json

# read in the geojson as a plain dict
with open("/path/to/files") as f:
    geo_objects = json.load(f)

# add the calculated property to each feature in place
for d in geo_objects["features"]:
    d["properties"]["your_new_field"] = calculated_value

with open("/path/to/output/file", "w") as f:
    json.dump(geo_objects, f)
No string manipulation needed, and no need to load a new library!

Related

How To Create A Wind Barb Map In Python From Geojson Data?

I have gridded wind data in geojson, so all the geometries are points and each feature has wind speed and direction in its properties section - I would like to generate a wind barb map from this data.
I am totally new to Python, but I found an excellent sample to do this at the Unidata site:
https://unidata.github.io/python-training/gallery/500hpa_hght_winds/
The only thing I think I need to do is modify it so that it can use geojson as a data source instead of a netcdf file.
Being completely new to this, I googled for a while and didn't find much to help with the geojson bit directly, but I did find that I could use geopandas to convert the geojson to a shapefile and then use gdal to convert the shapefile to a netcdf file. I am currently trying to work out how to modify the sample to use this data structure, so I may encounter issues along the way.
I feel like I may have gone down a rabbit hole on this though, can anyone recommend a better way of doing this? Or alternatively, is this a valid method?
Here is a sample of the geojson:-
{
"type": "FeatureCollection",
"totalFeatures": 2124,
"features": [
{
"type": "Feature",
"id": "1bdffe7b-af88-4f33-a792-e3a2a53c08b3",
"geometry": {
"type": "Point",
"coordinates": [
-120,
40
]
},
"properties": {
"altitude": 38615,
"flightLevel": 400,
"temperature": -60.619,
"windSpeed": 18.9,
"windDirection": 352.7
}
},
...
And here is my code to do the two-stage conversion, but if I could go to the dataset straight from the geopandas dataframe that would be great!
import geopandas as gpd
import xarray as xr
df = gpd.read_file('./winds.json')
df[['altitude', 'flightLevel', 'temperature', 'windSpeed', 'windDirection', 'geometry']].to_file('./winds.shp')
from osgeo import gdal
inputfile = './winds.shp'
outputfile = './winds.nc'
# actually i haven't got this bit working yet, temp workaround is to use cmd line app (ogr2ogr -F netCDF './winds.nc' './winds.shp')
gdal.Translate(outputfile, inputfile, format='NetCDF')
ds = xr.open_dataset('./winds.nc')
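For reference, a rough sketch of what going straight from the GeoDataFrame to an xarray Dataset might look like, assuming the point geometries carry lon/lat and using the property names from the sample above (everything else is a placeholder):
import geopandas as gpd
import xarray as xr

gdf = gpd.read_file('./winds.json')

# build the Dataset directly from the GeoDataFrame columns,
# pulling lon/lat out of the point geometries
ds = xr.Dataset(
    data_vars={
        'windSpeed': ('points', gdf['windSpeed'].to_numpy()),
        'windDirection': ('points', gdf['windDirection'].to_numpy()),
    },
    coords={
        'lon': ('points', gdf.geometry.x.to_numpy()),
        'lat': ('points', gdf.geometry.y.to_numpy()),
    },
)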

Geodjango: how to load .shp file and convert to geojson with the right CRS?

I have multiple shapefiles (.shp) with their auxiliary files that I want to display on a Leaflet map. The shapefiles use different coordinate reference systems (CRS) and I struggle to grasp the most straightforward and reliable way to show things on the map. In the geodjango tutorial, DataSource is used to load a shapefile and then manipulate it. However, in their examples, they only retrieve the geometry of individual features, not of the entire shapefile. I have used PyShp and I am able to show a map using something like:
import json
import shapefile  # PyShp

sf = shapefile.Reader(filename)
shapes = sf.shapes()
geojson = shapes.__geo_interface__
geojson = json.dumps(geojson)
However, this fails when the CRS is not WGS84, and I don't see how to convert it.
Reading a bit more, this post complains about CRS support and pyshp, and suggests using ogr2ogr.
So after trying to understand the options, I see DataSource, pyshp, and ogr2ogr as possibilities, but I don't know which one really makes the most sense.
All I want is to convert a .shp file, using Django, into a geojson string in WGS84 so that I can include it on an HTML page that uses Leaflet.
Can anyone with more experience suggest a particular route?
There is no straightforward way to read an arbitrary shapefile using Django's DataSource and then translate it to EPSG:4326 (aka WGS84), so we need to build one step by step and solve the issues as they arise.
Let's begin the process:
Create a list of all the .shp file paths that you need to read in. That should look like this:
SHP_FILE_PATHS = [
    'full/path/to/shapefile_0.shp',
    'full/path/to/shapefile_1.shp',
    ...
    'full/path/to/shapefile_n.shp'
]
DataSource reads the shapefile into an object. The information is stored in the object's Layers (representing a multilayered shapefile), which are aware of their srs as a SpatialReference. That is important because we will transform the geometry to WGS84 later so that it can be displayed on the map.
From each Layer of each shapefile, we will use the get_geoms() method to extract a list of srs-aware OGRGeometry objects.
Each such geometry has a json method that:
Returns a string representation of this geometry in JSON format:
>>> OGRGeometry('POINT(1 2)').json
'{ "type": "Point", "coordinates": [ 1.000000, 2.000000 ] }'
That is very useful because it is half of the solution for creating a FeatureCollection-type geojson that can be displayed on the map.
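As a small illustration of those two pieces together, reprojecting a single geometry and serializing it might look like this (the path is a placeholder):
from django.contrib.gis.gdal import DataSource

ds = DataSource('full/path/to/shapefile_0.shp')
layer = ds[0]                                   # first Layer of the shapefile
geom = layer.get_geoms()[0]                     # an srs-aware OGRGeometry
wgs84_geom = geom.transform(4326, clone=True)   # reproject to EPSG:4326
print(wgs84_geom.json)                          # JSON string of the geometry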
A FeatureCollection geojson has a very specific format and therefore we will create the basis and fill it procedurally:
feature_collection = {
    'type': 'FeatureCollection',
    'crs': {
        'type': 'name',
        'properties': {'name': 'EPSG:4326'}
    },
    'features': []
}
Finally, we need to populate the features list with the extracted geometries in the following format:
{
    'type': 'Feature',
    'geometry': {
        'type': Geometry_String,
        'coordinates': coord_list
    },
    'properties': {
        'name': feature_name_string
    }
}
Let's put all the above together:
import json

from django.contrib.gis.gdal import DataSource

for shp_i, shp_path in enumerate(SHP_FILE_PATHS):
    ds = DataSource(shp_path)
    for n in range(ds.layer_count):
        layer = ds[n]
        # Transform the coordinates to epsg:4326
        features = map(lambda geom: geom.transform(4326, clone=True), layer.get_geoms())
        for feature_i, feature in enumerate(features):
            feature_collection['features'].append(
                {
                    'type': 'Feature',
                    'geometry': json.loads(feature.json),
                    'properties': {
                        'name': f'shapefile_{shp_i}_feature_{feature_i}'
                    }
                }
            )
Now the feature_collection dict will contain the extracted features transformed to epsg:4326, and you can create a json string from it (e.g. json.dumps(feature_collection)).
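If the goal is to hand this to Leaflet on an HTML page, a minimal sketch of a Django view serving it could be (the view name is a placeholder):
from django.http import JsonResponse

def shapefiles_geojson(request):
    # feature_collection is built by the loop above (wrap it in a helper or
    # run it at import time); JsonResponse serializes the dict for Leaflet
    return JsonResponse(feature_collection)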
NOTE: Although this will work, it seems a bit counterproductive and you may consider reading the shapefiles into a model permanently instead of loading them on the fly.

How to convert from TSV file to JSON file?

So I know this question might be a duplicate, but I just want to understand how you can convert a TSV file to JSON. I've searched everywhere and I can't find a clue or understand the code.
This is not Python code, but the TSV file that I want to convert to JSON:
title content difficulty
week01 python syntax very easy
week02 python data manipulation easy
week03 python files and requests intermediate
week04 python class and advanced concepts hard
And this is the JSON file that I want as an output.
[{
"title": "week 01",
"content": "python syntax",
"difficulty": "very easy"
},
{
"title": "week 02",
"content": "python data manipulation",
"difficulty": "easy"
},
{
"title": "week 03",
"content": "python files and requests",
"difficulty": "intermediate"
},
{
"title": "week 04",
"content": "python class and advanced concepts",
"difficulty": "hard"
}
]
The built-in modules you need for this are csv and json.
To read tab-separated data with the csv module, pass the delimiter="\t" parameter.
Even more conveniently, the CSV module has a DictReader that automatically reads the first row as column keys, and returns the remaining rows as dictionaries:
import csv
import json

with open('file.txt') as file:
    reader = csv.DictReader(file, delimiter="\t")
    data = list(reader)

json_string = json.dumps(data, indent=2)  # pretty-printed, like the desired output
The json module can also write directly to a file instead of returning a string.
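For example, a minimal sketch of writing the converted rows straight to a json file (file names are assumptions):
import csv
import json

with open('file.txt') as tsv_file:
    rows = list(csv.DictReader(tsv_file, delimiter="\t"))

with open('file.json', 'w') as json_file:
    json.dump(rows, json_file, indent=2)   # json.dump() writes to the file handle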
If you are using pandas, you can use the to_json method with the option orient="records" to obtain the list of entries you want.
my_data_frame.to_json(orient="records")
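A short end-to-end sketch of that approach (file names are assumptions):
import pandas as pd

# read the tab-separated file
df = pd.read_csv("file.txt", sep="\t")

# write it out as a JSON list of records
df.to_json("file.json", orient="records")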

Reading pretty print json files in Apache Spark

I have a lot of json files in my S3 bucket and I want to be able to read and query them. The problem is that they are pretty-printed. Each json file has just one massive dictionary, but it is not on a single line. As per this thread, a dictionary in a json file should be on one line, which is a limitation of Apache Spark, and my files are not structured that way.
My JSON schema looks like this -
{
"dataset": [
{
"key1": [
{
"range": "range1",
"value": 0.0
},
{
"range": "range2",
"value": 0.23
}
]
}, {..}, {..}
],
"last_refreshed_time": "2016/09/08 15:05:31"
}
Here are my questions -
Can I avoid converting these files to match the schema required by Apache Spark (one dictionary per line in a file) and still be able to read them?
If not, what's the best way to do it in Python? I have a bunch of these files for each day in the bucket, and the bucket is partitioned by day.
Is there any other tool better suited than Apache Spark to query these files? I'm on the AWS stack, so I can try out any suggested tool with a Zeppelin notebook.
You could use sc.wholeTextFiles(). Here is a related post.
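A rough sketch of that approach, assuming sc is an existing SparkContext and the S3 path is a placeholder:
import json

# each element is a (path, file_contents) pair, so the whole
# pretty-printed document is available as one string
raw = sc.wholeTextFiles("s3://my-bucket/day=2016-09-08/")

# parse each document and explode its "dataset" list into records
docs = raw.map(lambda kv: json.loads(kv[1]))
records = docs.flatMap(lambda doc: doc["dataset"])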
Alternatively, you could reformat your json using a simple function and load the generated file.
import json

def reformat_json(input_path, output_path):
    # rewrite a pretty-printed json array as one json object per line
    with open(input_path, 'r') as handle:
        jarr = json.load(handle)
    with open(output_path, 'w') as f:
        for entry in jarr:
            f.write(json.dumps(entry) + "\n")
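Usage could then look roughly like this (the file names are placeholders, and older Spark versions expose sqlContext.read.json rather than spark.read.json):
reformat_json("pretty.json", "one_per_line.json")
df = spark.read.json("one_per_line.json")   # assuming a SparkSession named spark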

Writing BSON to disk

I'm storing hierarchical data in a format similar to JSON:
{
"preferences": {
"is_latest": true,
"revision": 18,
// ...
},
"updates": [
{ "id": 1, "content": "..." },
// ...
]
}
I'm writing this data to disk and I'd like to store it efficiently. I assume that, towards this end, BSON would be more efficient as a storage format than raw JSON.
How can I read and write BSON trees to/from disk in Python?
I haven't used it, but it looks like there is a bson module on PyPI:
https://pypi.python.org/pypi/bson
The project is hosted on GitHub here:
https://github.com/martinkou/bson
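I haven't verified this against that exact package, but assuming it exposes the dumps/loads pair that standalone bson distributions document, a round trip to disk might look roughly like this:
import bson

data = {"preferences": {"is_latest": True, "revision": 18}}

# encode to BSON bytes and write them to disk (assumed dumps/loads API)
with open("data.bson", "wb") as f:
    f.write(bson.dumps(data))

# read the bytes back and decode them
with open("data.bson", "rb") as f:
    restored = bson.loads(f.read())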
