I am building a database of field data using Django, GeoDjango and PostgreSQL. The data includes lats and lons. One of my tasks is to ingest data that has already been collected. I would like to use .json files to define the metadata and write some code to batch process them.
What I have so far is a model:
class deployment(models.Model):
    '''
    @brief This is the abstract deployment class.
    '''
    startPosition = models.PointField()
    startTimeStamp = models.DateTimeField()
    endTimeStamp = models.DateTimeField()
    missionAim = models.TextField()
    minDepth = models.FloatField()  # It seems there is no double in Django
    maxDepth = models.FloatField()
class auvDeployment(deployment):
    '''
    @brief AUV meta data
    '''
    #==================================================#
    # startPosition   : <point>
    # distanceCovered : <double>
    # startTimeStamp  : <dateTime>
    # endTimeStamp    : <dateTime>
    # transectShape   : <>
    # missionAim      : <Text>
    # minDepth        : <double>
    # maxDepth        : <double>
    #--------------------------------------------------#
    # Maybe need to add unique AUV fields here later when
    # we have more deployments
    #==================================================#
    transectShape = models.PolygonField()
    distanceCovered = models.FloatField()
And a function I want to use to ingest the data:
@staticmethod
def importDeploymentFromFile(file):
    '''
    @brief This function reads in a metadata file that includes campaign information. The distinction between deployment types is made on the file name: <type>deployment.<supported extension>, e.g. auvdeployment.json
    @param file The file that holds the metadata. Formats include .json; todo: .xml, .yaml
    '''
    catamiWebPortal.logging.info("Importing metadata from " + file)
    fileName, fileExtension = os.path.splitext(file)
    if fileExtension == '.json':
        if os.path.basename(fileName.upper()) == 'AUVDEPLOYMENT':
            catamiWebPortal.logging.info("Found valid deployment file")
            data = json.load(open(file))
            Model = auvDeployment(**data)
            Model.save()
And the file I am trying to read in is this:
{
    "id": 1,
    "startTimeStamp": "2011-09-09 13:20:00",
    "endTimeStamp": "2011-10-19 14:23:54",
    "missionAim": "for fun times, call luke",
    "minDepth": 10.0,
    "maxDepth": 20.0,
    "startPosition": {"type": "PointField", "coordinates": [5.000000, 23.000000]},
    "distanceCovered": 20.0
}
The error that I am getting is this:
TypeError: cannot set auvDeployment GeometryProxy with value of type: <type 'dict'>
If I remove the geo types from the model and the file, it will read the file and populate the database table.
I would appreciate any advice on how to parse the data file with the geo types.
Thanks
Okay, the solution is as follows. The file format is not GeoJSON; it's the WKT format that GEOS understands. The .json file should be as follows.
{
    "id": 1,
    "startTimeStamp": "2011-10-19 10:23:54",
    "endTimeStamp": "2011-10-19 14:23:54",
    "missionAim": "for fun times, call luke",
    "minDepth": 10.0,
    "maxDepth": 20.0,
    "startPosition": "POINT(-23.15 113.12)",
    "distanceCovered": 20,
    "transectShape": "POLYGON((-23.15 113.12, -23.53 113.34, -23.67 112.9, -23.25 112.82, -23.15 113.12))"
}
Note that the startPosition syntax has changed.
A quick fix would be to use the GEOS API in GeoDjango to convert the startPosition field from GeoJSON format to a GEOSGeometry object before you save the model. This should allow it to pass validation.
Include the GEOSGeometry function from Django with:
from django.contrib.gis.geos import GEOSGeometry
...
Model = auvDeployment(**data)
Model.startPosition = GEOSGeometry(str(Model.startPosition))
Model.save()
The GEOS API can construct objects from GeoJSON, as long as you make it a string first. As it stands, you are loading it as a dictionary instead of a string.
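For illustration, a minimal sketch of the difference (assuming GDAL is available, which GeoDjango needs to parse GeoJSON input):

import json
from django.contrib.gis.geos import GEOSGeometry

geojson = {"type": "Point", "coordinates": [5.0, 23.0]}
point = GEOSGeometry(json.dumps(geojson))  # OK: GeoJSON passed as a string
# GEOSGeometry(geojson)                    # TypeError: a dict is not accepted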
I suggest you use the default command for loading fixtures: loaddata
python manage.py loaddata path/to/myfixture.json ...
The structure of your JSON would have to be slightly adjusted, but you can run a simple dumpdata to see what the structure should look like.
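As a rough sketch, a fixture usually looks like this (the app label "myapp" is an assumption, and geometry fields are typically serialized as (E)WKT strings; run dumpdata on your own app to see the exact structure):

[
    {
        "model": "myapp.auvdeployment",
        "pk": 1,
        "fields": {
            "startTimeStamp": "2011-10-19 10:23:54",
            "endTimeStamp": "2011-10-19 14:23:54",
            "missionAim": "for fun times, call luke",
            "minDepth": 10.0,
            "maxDepth": 20.0,
            "startPosition": "POINT(-23.15 113.12)",
            "distanceCovered": 20,
            "transectShape": "POLYGON((-23.15 113.12, -23.53 113.34, -23.67 112.9, -23.25 112.82, -23.15 113.12))"
        }
    }
]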
I'm trying to get the value of ["pooled_metrics"]["vmaf"]["harmonic_mean"] from a JSON file I want to parse using Python. This is the current state of my code:
for crf in crf_ranges:
    vmaf_output_crf_list_log = job['config']['output_dir'] + '/' + build_name(stream) + f'/vmaf_{crf}.json'
    # read the vmaf_output_crf_list_log file and get value from ["pooled_metrics"]["vmaf"]["harmonic_mean"]
    with open(vmaf_output_crf_list_log, 'r') as json_vmaf_file:
        # load the json_string["pooled_metrics"] into a python dictionary
        vm = json.loads(json_vmaf_file.read())
        vmaf_values.append((crf, vm["pooled_metrics"]["vmaf"]["harmonic_mean"]))
This will give me back the following error:
AttributeError: 'dict' object has no attribute 'loads'
I always get back the same AttributeError no matter if I use "load" or "loads".
I validated the contents of the JSON using various online validators and it is valid, but still I am not able to load it for further parsing operations.
I expect that I can load a file that contains valid JSON data. The content of the file looks like this:
{
    "frames": [
        {
            "frameNum": 0,
            "metrics": {
                "integer_vif_scale2": 0.997330,
            }
        },
    ],
    "pooled_metrics": {
        "vmaf": {
            "min": 89.617207,
            "harmonic_mean": 99.868023
        }
    },
    "aggregate_metrics": {
    }
}
Can somebody give me some advice on this behavior? Why does it seem so impossible to load this JSON file?
loads is a method of the json library, as the docs say: https://docs.python.org/3/library/json.html#json.loads. Since you are getting an AttributeError, you have probably created another variable named "json" somewhere; when you call json.loads you are referencing that variable, which does not have a loads method.
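A minimal sketch of how that shadowing produces exactly this error (the rebinding line is hypothetical; look for something similar in your own code):

import json

json = json.loads('{"a": 1}')  # rebinds the name `json` to a plain dict
json.loads('{"b": 2}')         # AttributeError: 'dict' object has no attribute 'loads'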
I want to load GCS files written in JSON format into a BQ Table through an Airflow DAG.
So, I used the GoogleCloudStorageToBigQueryOperator. Additionally, to avoid using the autodetect option, I created a JSON schema file, stored in the same GCS bucket as my raw JSON data files, to be used as the schema_object.
Below is the JSON Schema file:
[{"name": "id", "type": "INTEGER", "mode": "NULLABLE"},{"name": "description", "type": "INTEGER", "mode": "NULLABLE"}]
And my raw JSON data file looks like this (newline-delimited JSON):
{"description":"HR Department","id":9}
{"description":"Restaurant Department","id":10}
Here is what my operator looks like:
gcs_to_bq = GoogleCloudStorageToBigQueryOperator(
    task_id=table_name + "_gcs_to_bq",
    bucket=bucket_name,
    bigquery_conn_id="bigquery_default",
    google_cloud_storage_conn_id="google_cloud_storage_default",
    source_objects=[table_name + "/{{ ds_nodash }}/data_json/*.json"],
    schema_object=table_name + "/{{ ds_nodash }}/data_json/schema_file.json",
    allow_jagged_rows=True,
    ignore_unknown_values=True,
    source_format="NEWLINE_DELIMITED_JSON",
    destination_project_dataset_table=project_id + "." + data_set + "." + table_name,
    write_disposition="WRITE_TRUNCATE",
    create_disposition="CREATE_IF_NEEDED",
    dag=dag,
)
The error I got is:
google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: Failed to parse JSON: No object found when new array is started.; BeginArray returned false; Parser terminated before end of string File: schema_file.json
Could you please help me solve this issue?
Thanks in advance.
I see two problems:
Your BigQuery table schema is incorrect: the type of the description column is INTEGER instead of STRING. You have to set it to STRING (see the corrected schema file after this list).
You are using an old Airflow version. In recent versions, the schema object is by default retrieved from the bucket given in the bucket param; for old versions I am not sure about this behaviour. You can set the full path to check if it solves your issue, for example: schema_object='gs://test-bucket/schema.json'
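For the first point, a sketch of the corrected schema file (identical to yours except for the description type):

[
    {"name": "id", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "description", "type": "STRING", "mode": "NULLABLE"}
]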
I have an app which allows the user to upload a large data file, process its contents into a Python object (not a Django model), and then present a summary of the contents to the user. In other words, the big data is present in the view and summarised in the template.
The user then selects which of the content sections to save to the database, and submits the form to do so.
I'm wrestling with how to pass the Python object to the AJAX-called function without having to do all the processing again.
I've used AJAX in the past and have read the answers suggesting ways to avoid reloading pages etc., but none of them involve passing large objects from within a view.
# retrieve the file
storage = TemporaryParseStorage.objects.get(id=item_id)
# open the file from memory
f = open(storage.file.path, "r")
file_text = f.read()

# Parse the file:
parser = Parser()
# Process its contents to create the object - I want to store this
# object and call its member functions based on a button click in the template
objectIWantToKeep = parser.parseModel(file_text)

# Builds tree for preview
tree = build_tree_from_model(storage, model)

context = {
    'storage': storage,
    'model_name': model.name(),
    'tree': tree
}

return render(request, 'main/upload_check.html', context)
Observe the following Python file:
# configmanager.py
"""
ConfigManager controls the modification and validation of config files.
"""
import os

from ruamel import yaml
from voluptuous import Schema


class ConfigManager():
    """
    Controls all interaction with configuration files
    """
    def __init__(self):
        super().__init__()
        self.configvalidator = ConfigValidator()

    # The config directory inside users home directory.
    # Config files will be stored here.
    config_dir = os.path.expanduser('~') + '/.config/MyProject/'
    # The default config file
    config_file = config_dir + 'myproject.conf'
    # The default configuration
    default_config = {
        'key1': {},
        'key2': {}
    }

    def _get_config(self):
        """
        Get the config file and return it as python dictionary.
        Will create the config directory and default config file if they
        do not exist.
        """
        # Create config directory if it does not exist
        if not os.path.exists(self.config_dir):
            os.makedirs(self.config_dir)
        # Create default config file if it does not exist
        if not os.path.isfile(self.config_file):
            config_file = open(self.config_file, 'w')
            config_file.write(yaml.dump(self.default_config))
        # Open config file, and load from YAML
        config_file = open(self.config_file, 'r')
        config = yaml.safe_load(config_file)
        # Validate config
        self.configvalidator.validate(config)
        return config

    def _save_config(self, config):
        """
        Save the config file to disk as YAML
        """
        # Open current config file
        config_file = open(self.config_file, 'w')
        # Validate new config
        # THE ERROR IS HERE
        # If this runs then the config file gets saved as an empty file.
        self.configvalidator.validate(config)
        # This shows that the 'config' variable still holds the data
        print(config)
        # This shows that yaml.dump() is working correctly
        print(yaml.dump(config))
        config_file.write(yaml.dump(config))

    def edit_config(self):
        """
        Edits the configuration file
        """
        config = self._get_config()
        # Perform manipulation on config here
        # No actual changes to the config are necessary for the bug to occur
        self._save_config(config)


class ConfigValidator():
    def __init__(self):
        super().__init__()
        # Config file schema
        # Used for validating the config file with voluptuous
        self.schema = Schema({
            'key1': {},
            'key2': {},
        })

    def validate(self, config):
        """
        Validates the data against the defined schema.
        """
        self.schema(config)


app = ConfigManager()
app.edit_config()
# ~/.config/MyProject/myproject.conf
key1: {}
key2: {}
Description of my module
This is a module I am working on which is for modifying the config file for my project. It accesses the file in ~/.config/MyProject/myproject.conf, which is saved in YAML format, and stores various pieces of information that are used by my program. I have removed as much of the code as I can, leaving only that necessary for understanding the bug.
ConfigManager
ConfigManager is the class containing methods for manipulating my config file. Here it contains three methods: _get_config(), _save_config(), and edit_config(). When instantiated, it will get an instance of ConfigValidator (described below) and assign it to self.configvalidator.
_get_config
_get_config() simply opens the file defined by the class variables, specifically ~/.config/MyProject/myproject.conf, or creates the file with default values if it does not exist. The file is saved in YAML format, so this method loads it into a Python object using ruamel.yaml, validates it using self.configvalidator.validate(config), and returns it for use by other pieces of code.
_save_config
_save_config() is where the error occurs, which is described in detail below. Its purpose is to validate the given data and, if it is valid, save it to disk in YAML format.
edit_config
This is a generic function which, in my program, would make specific changes to my config file depending on the arguments given. In my example, this function simply gets the config file with self._get_config() and then saves it using self._save_config(), without making any changes.
ConfigValidator
This class is for validating my config file using voluptuous. When instantiated, it will create the schema that is to be used and assign it to self.schema. When the validate method is run, it validates the given data using voluptuous.
The error
Observe the line self.configvalidator.validate(config) in ConfigManager._save_config(). This will validate the given data against the schema, and raise an error if it does not pass validation.
But, in the following line config_file.write(yaml.dump(config)), which simply saves the given data to a file as YAML, it will instead save an empty file to disk. (Note: the file is empty, not deleted)
If I disable the validation, by commenting out self.configvalidator.validate(config), then the file is written correctly as YAML.
If self.configvalidator.validate(config) is run, then the config file is saved as an empty file.
My testing
As can be seen with the line print(config), the data in the variable config does not change after being used for validation, yet when being saved to disk, it would seem that config is an empty variable.
print(yaml.dump(config)) shows that ruamel.yaml does work correctly.
If I change edit_config to give invalid data to _save_config, then self.configvalidator.validate(config) will raise an error, as expected. self.configvalidator.validate(config) is running correctly.
End
If self.configvalidator.validate(config) is run, then config_file.write(yaml.dump(config)) saves the config file as an empty file, despite the data in the variable config not changing.
If self.configvalidator.validate(config) is not run, then config_file.write(yaml.dump(config)) saves the file correctly.
That is my error, and it makes absolutely no sense to me.
If you're keen to help, configmanager.py should run (with the error) on your machine, as long as it has access to ruamel.yaml and voluptuous. It will create ~/.config/MyProject/myproject.conf, then save it as an empty file. Save my example myproject.conf to see how it is then saved as empty when configmanager.py is run. If configmanager.py is run again when myproject.conf is empty, a validation error will be raised in _get_config, as expected.
I am so confused by this bug, so if you have any insight it would be greatly appreciated.
Cheers
I have a simple Flask function that renders a template with a valid GeoJSON string:
@app.route('/json', methods=['POST'])
def json():
    polygon = Polygon([[[0, 1], [1, 0], [0, 0], [0, 1]]])
    return render_template('json.html', string=polygon)
In my json.html file, I am attempting to render this GeoJSON with OpenLayers:
function init(){
    map = new OpenLayers.Map('map');
    layer = new OpenLayers.Layer.WMS("OpenLayers WMS",
        "http://vmap0.tiles.osgeo.org/wms/vmap0",
        {layers: 'basic'});
    map.addLayer(layer);
    map.setCenter(new OpenLayers.LonLat(lon, lat), zoom);

    var fc = {{string}}; // Here is the JSON string
    var geojson_format = new OpenLayers.Format.GeoJSON();
    var vector_layer = new OpenLayers.Layer.Vector();
    map.addLayer(vector_layer);
    vector_layer.addFeatures(geojson_format.read(fc));
}
But this fails, and the " characters come out HTML-escaped as &#34;. I have tried string formatting as seen in this question, but it didn't work.
EDIT:
I did forget to dump my JSON to an actual string. I'm using the geojson library, so adding dumps(polygon) takes care of that; however, I still can't parse the GeoJSON in OpenLayers, even though it is a valid string according to geojsonlint.com.
This is the JavaScript code that creates a variable from the string sent from Flask:
var geoJson = '{{string}}';
And here's what it looks like in the source page:
'{"type": "Polygon", "coordinates": [[[22.739485934746977, 39.26596659794341], [22.73902517923571, 39.266115931275074], [22.738329551588276, 39.26493626464484], [22.738796023230854, 39.26477459496181], [22.739485934746977, 39.26596659794341]]]}';
I am still having a problem rendering the quote characters.
Looks like you use Shapely, which has the shapely.geometry.mapping method (http://toblerity.org/shapely/shapely.geometry.html#shapely.geometry.mapping) to create a GeoJSON-like object.
To render JSON, use the tojson filter, which is marked safe (see the safe filter) in recent Flask versions, because Jinja2 in Flask by default escapes all dangerous symbols to protect against XSS.
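For illustration, a minimal sketch of the view under those suggestions (the route and variable names are assumptions; on older Flask versions you may need to chain |safe after |tojson):

from flask import Flask, render_template
from shapely.geometry import Polygon, mapping

app = Flask(__name__)

@app.route('/json', methods=['POST'])
def show_json():
    polygon = Polygon([(0, 1), (1, 0), (0, 0), (0, 1)])
    # mapping() converts the Shapely geometry into a GeoJSON-like dict
    return render_template('json.html', string=mapping(polygon))

Then in the template, var fc = {{ string|tojson }}; yields real JSON without the quote characters being HTML-escaped.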