What is the best way to validate JSON data in Django/Python?
Is it best to create a bunch of classes, like the Django FormMixin classes, that can validate the data/parameters being passed in?
What's the best DRY way of doing this? Are there existing apps I can leverage?
I'd like to take in JSON data and perform some actions/updates on my model instances as a result. The data I'm taking in is not user generated; it consists of IDs and flags (no text), so I don't want to use Forms.
I just instantiate a model object from the JSON data and call full_clean() on the model to validate: https://docs.djangoproject.com/en/dev/ref/models/instances/#django.db.models.Model.full_clean
m = myModel(**jsondata)
m.full_clean()
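If validation fails, full_clean() raises django.core.exceptions.ValidationError; a minimal sketch of handling that (handle_errors is a hypothetical placeholder):

from django.core.exceptions import ValidationError

m = myModel(**jsondata)
try:
    m.full_clean()
except ValidationError as e:
    # e.message_dict maps each field name to a list of error messages
    handle_errors(e.message_dict)  # hypothetical error handler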
validictory validates JSON against a JSON schema. It works. Of course, you then need to define your schema in JSON, which may be more than you want to take on, but it does have its place.
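A minimal sketch of using it (the schema here is illustrative, not taken from the question):

import validictory  # pip install validictory

data = {"id": 42, "active": True}
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "active": {"type": "boolean"},
    },
}

validictory.validate(data, schema)  # raises an exception if data doesn't match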
I would recommend a Python library named DictShield for this: https://github.com/j2labs/dictshield
DictShield is a database-agnostic modeling system. It provides a way to model, validate, and reshape data easily.
There is even a sample for doing JSON validation:
Validating User Input
Let's say we get this JSON string from a user.
{"bio": "Python, Erlang and guitars!", "secret": "e8b5d682452313a6142c10b045a9a135", "name": "J2D2"}
We might write some server code that looks like this:
json_string = request.get_arg('data')
user_input = json.loads(json_string)
user.validate(**user_input)
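For context, the user above is an instance of a DictShield document; a sketch of a matching model follows, with field classes as described in the DictShield README (treat the exact field names and classes as assumptions):

from dictshield.document import Document
from dictshield.fields import MD5Field, StringField

class User(Document):
    secret = MD5Field()
    name = StringField(required=True, max_length=50)
    bio = StringField(max_length=100)

user = User()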
I have a field in my model, test_data = models.TextField(...), and the model is called MyOrm.
This test_data field holds plain strings in some rows, JSON data in others, and references to blob URLs in the rest.
Now I'm trying to streamline my data, so I want to filter all the MyOrm objects whose test_data is not JSON.
I'm just storing (or trying to store) some metadata along with the URL; then I will convert them.
Can anyone suggest a way to do this?
pseudocode:
select all my-orm where is_not_json(my-orm.test-data)
There is no database function for this that I'm aware of. You can define your own, but then this is more database programming.
I think you have no other option than to enumerate over the MyOrm model objects and check whether you can JSON-decode each one, with:
import json

for item in MyOrm.objects.all():
    try:
        json.loads(item.test_data)
    except ValueError:
        # test_data is invalid JSON, process it here
        # …
        pass
or if memory might be a problem, you can work with .iterator(…) [Django-doc]:
import json

for item in MyOrm.objects.all().iterator():
    try:
        json.loads(item.test_data)
    except ValueError:
        # test_data is invalid JSON, process it here
        # …
        pass
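Either loop is easy to extend to collect the offending rows; a small sketch:

import json

bad_pks = []
for item in MyOrm.objects.all().iterator():
    try:
        json.loads(item.test_data)
    except ValueError:
        # test_data is not valid JSON; remember this row's primary key
        bad_pks.append(item.pk)

# a queryset of every MyOrm row whose test_data is not JSON
non_json = MyOrm.objects.filter(pk__in=bad_pks)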
I am making a program that scrapes data from a job page, and I get this data:
{"job":{"ciphertext":"~01142b81f148312a7c","rid":225177647,"uid":"1416152499115024384","type":2,"access":4,"title":"Need app developers to handle our app upgrades","status":1,"category":{"name":"Mobile Development","urlSlug":"mobile-development"
,"contractorTier":2,"description":"We have an app currently built, we are looking for someone to \n\n1) Manage the app for bugs etc \n2) Provide feature upgrades \n3) Overall Management and optimization \n\nPlease get in touch and i will share more details. ","questions":null,"qualifications":{"type":0,"location":null,"minOdeskHours":0,"groupRecno":0,"shouldHavePortfolio":false,"tests":null,"minHoursWeek":40,"group":null,"prefEnglishSkill":0,"minJobSuccessScore":0,"risingTalent":true,"locationCheckRequired":false,"countries":null,"regions":null,"states":null,"timezones":null,"localMarket":false,"onSiteType":null,"locations":null,"localDescription":null,"localFlexibilityDescription":null,"earnings":null,"languages":null
],"clientActivity":{"lastBuyerActivity":null,"totalApplicants":0,"totalHired":0,"totalInvitedToInterview":0,"unansweredInvites":0,"invitationsSent":0
,"buyer":{"isPaymentMethodVerified":false,"location":{"offsetFromUtcMillis":14400000,"countryTimezone":"United Arab Emirates (UTC+04:00)","city":"Dubai","country":"United Arab Emirates"
,"stats":{"totalAssignments":31,"activeAssignmentsCount":3,"feedbackCount":27,"score":4.9258937139,"totalJobsWithHires":30,"hoursCount":7.16666667,"totalCharges":{"currencyCode":"USD","amount":19695.83
,"jobs":{"postedCount":59,"openCount":2
,"avgHourlyJobsRate":{"amount":19.999534874418824
But the problem is that the only data I need is:
- Title
- Description
- Customer activity (lastBuyerActivity, totalApplicants, totalHired, totalInvitedToInterview, unansweredInvites, invitationsSent)
- Buyer (isPaymentMethodVerified, location (country))
- stats (all items)
- jobs (all items)
- avgHourlyJobsRate
These sorts of data are JSON; Python represents them with the dictionary data type.
Suppose you have your data stored in a string. You could use di = eval(myData) to convert the string to a dictionary (note that exec() will not work here, since it returns None). Then you can access the structured data like di["job"], which returns the job section of the data:
di = eval(myData)
print(di["job"])
However, this is just a hack and is not recommended: it's messy and unpythonic, eval on untrusted input is dangerous, and it chokes on JSON literals such as true, false, and null.
The appropriate way is to use the json library to convert the data to a dictionary. Take a look at the code snippet below to get an idea of the appropriate way:
import json

myData = "Put your data here"
res = json.loads(myData)
print(res["job"])
Convert the data to a dictionary using json.loads; then you can easily use the dictionary keys you want to look up, or filter the data.
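Once the data is parsed, pulling out just the fields listed above is plain dictionary access. A sketch follows; the nesting is inferred from the truncated snippet, so treat the exact paths as assumptions:

import json

data = json.loads(myData)
job = data["job"]
buyer = job["buyer"]

# keep only the fields the question asks for
wanted = {
    "title": job["title"],
    "description": job["description"],
    "clientActivity": job["clientActivity"],
    "isPaymentMethodVerified": buyer["isPaymentMethodVerified"],
    "country": buyer["location"]["country"],
    "stats": buyer["stats"],
    "jobs": buyer["jobs"],
    "avgHourlyJobsRate": buyer["avgHourlyJobsRate"],
}
print(wanted)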
This seems to be a dictionary, so you can extract something from it by doing dictionary["job"]["uid"], for example. If it is a JSON file, convert the data to a Python dictionary first.
I am a self-taught programmer, new to Python and Django, and I would like to optimize my code.
My problem is that I want to do a get_or_create with some loaded JSON data. Each dictionary entry maps directly to my model. Example:
data = json.load(file)
Person.objects.get_or_create(
    firstname=data['firstname'],
    surname=data['surname'],
    gender=data['gender'],
    datebirth=data['datebirth'],
)
Are there any ways to automatically link the json properties to my model fields instead of typing all my properties one by one?
What you might want to do is unpack your dictionary of arguments (see the Python docs on unpacking argument lists).
Say your model is Person:
p = Person(**data_dict)
p.save()
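The same unpacking works directly with get_or_create, assuming every key in data matches a Person field name exactly:

data = json.load(file)
person, created = Person.objects.get_or_create(**data)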
You need to write the following code in a Python shell:
import json

data = json.loads(source)  # source holds your JSON string
print(json.dumps(data, indent=2))  # pretty-print with a two-space indent
If my JSON object contains several attributes, should I write an assertion for each and every attribute, or is there a better way?
For example:
{'data':{'id' : 123, 'first_name' : 'bruce', 'last_name' : 'wayne', 'phone' : 12345, 'is_superhero' : 'yes', 'can_fly' : 'uses_tech', 'aka' : 'batman'}}
Now, I can write assertions as follows:
Approach 1:
assertEqual(response['data']['first_name'], 'bruce') and so on for all the attributes. Imagine if my JSON response had 100 fields; my test code would include 100 assertEqual calls.
Approach 2:
I can compare the whole JSON, but then how would that work when there are dynamic values (like id) in the JSON object that are bound to change every time?
If you have done API automation and made assertions on complex JSON objects, I'm looking forward to hearing the approach you followed.
I feel like there has to be a better approach. Any suggestions?
What I suggest is using JSON Schemas; read more about them here: https://spacetelescope.github.io/understanding-json-schema/.
Once you have the schema for your API, you can validate responses against it with a library like jsonschema: https://pypi.python.org/pypi/jsonschema
Furthermore, you can generate schemas from your JSON automatically with the GenSON library: https://github.com/wolverdude/GenSON. Take into account that this library has limitations, and you may need to tune the generated schema to fit your requirements.
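A sketch of how a schema sidesteps the dynamic-value problem: type-check the fields that change (like id) and pin the stable ones to exact values. The schema below is illustrative, written against the example response from the question:

from jsonschema import validate  # pip install jsonschema

response = {'data': {'id': 123, 'first_name': 'bruce', 'last_name': 'wayne', 'phone': 12345, 'is_superhero': 'yes', 'can_fly': 'uses_tech', 'aka': 'batman'}}

schema = {
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                # dynamic values are only type-checked ...
                "id": {"type": "integer"},
                "phone": {"type": "integer"},
                # ... while stable fields are pinned to exact values
                "first_name": {"const": "bruce"},
                "aka": {"const": "batman"},
            },
            "required": ["id", "first_name", "aka"],
        }
    },
}

validate(instance=response, schema=schema)  # raises ValidationError on mismatch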
Some background first: I have a few rather simple data structures which are persisted as JSON files on disk. These JSON files are shared between applications in different languages and different environments (like a web frontend and data manipulation tools).
For each of the files I want to create a Python "POPO" (Plain Old Python Object), and a corresponding data mapper class for each item should implement some simple CRUD-like behavior (e.g. save will serialize the class and store it as a JSON file on disk).
I think a simple mapper (which only knows about basic types) will work. However, I'm concerned about security: some of the JSON files will be generated by a web frontend, so there is a possible security risk if a user feeds me some bad JSON.
Finally, here is the simple mapping code (found at How to convert JSON data into a Python object):
class User(object):
    def __init__(self, name, username):
        self.name = name
        self.username = username

import json
j = json.loads(your_json)
u = User(**j)
What possible security issues do you see?
NB: I'm new to Python.
Edit: Thanks, all, for your comments. I've found that I have one JSON file containing two arrays, each holding a map. Unfortunately, this starts to get cumbersome as I get more of these.
I'm extending the question to mapping a JSON input to a recordtype. The original code is from here: https://stackoverflow.com/a/15882054/1708349.
Since I need mutable objects, I'd change it to use a namedlist instead of a namedtuple:
import json
from namedlist import namedlist

data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedlist('X', d.keys())(*d.values()))
print(x.name, x.hometown.name, x.hometown.id)
Is it still safe?
There's not much that can go wrong in the first case. You're limiting what arguments can be provided, and it's easy to add validation/conversion right after loading from JSON.
The second example is a bit worse. Packing things into records like this will not help you in any way. You don't inherit any methods, because each type you define is new. You can't compare values easily, because the dicts are not ordered. You don't know whether you have handled all arguments, or whether there is some extra data, which can lead to hidden problems later.
So in summary: with User(**data) you're pretty safe; with namedlist there's room for ambiguity, and you don't really gain anything compared to the bare, parsed JSON.
If you blindly accept a user's JSON input without a sanity check, you are at risk of becoming a JSON injection victim.
See a detailed explanation of the JSON injection attack here: https://www.acunetix.com/blog/web-security-zone/what-are-json-injections/
Besides the security vulnerability, parsing JSON into a Python object this way is not type safe.
With your example User class, I would assume you expect both fields, name and username, to be strings. What if the JSON input is like this:
{
    "name": "my name",
    "username": 1
}
j = json.loads(your_json)
u = User(**j)
type(u.username)  # int
You have ended up with an object of an unexpected type.
One solution to ensure type safety is to use a JSON schema to validate the input JSON. More about JSON Schema: https://json-schema.org/
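A sketch of that check for the User example above, using the jsonschema library (the schema contents are my assumption about what you would want to enforce):

import json
from jsonschema import validate, ValidationError  # pip install jsonschema

user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "username": {"type": "string"},
    },
    "required": ["name", "username"],
}

j = json.loads(your_json)
try:
    validate(instance=j, schema=user_schema)
except ValidationError:
    raise  # reject the input instead of building a User from it
u = User(**j)  # now both fields are guaranteed to be strings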