creating multiple python class instance from json - python

I am writing some tests to evaluate a rest service
my response is
[
{
"Title_Id": 1,
"Title": "Mr",
"TitleDescription": "Mr",
"TitleGender": "Male",
"Update_Date": "2012-07-21T18:43:04"
},
{
"Title_Id": 2,
"Title": "Mrs",
"TitleDescription": "Mrs",
"TitleGender": "Female",
"Update_Date": "2012-07-21T18:42:59"
},
{
"Title_Id": 3,
"Title": "Sir",
"TitleDescription": "Sir",
"TitleGender": "Male",
"Update_Date": null
}
]
and need to create multiple instance of the class
class TitleInfo:
def __init__(self, Title_Id, Title, TitleDescription, TitleGender, Update_Date ):
self.Title_Id = Title_Id
self.Title = Title
self.TitleDescription = TitleDescription
self.TitleGender = TitleGender
self.Update_Date = Update_Date
what I have done is
def GetTitle(self):
try:
response = *#......"The string shown above"*
if isinstance(response, str) :
Records = json.loads(response)
RecTitles = []
for num in range(0, len(Records)):
RecTitle =TitleInfo(Records[num]['Title_Id'],Records[num]['Title'],Records[num]['TitleDescription'],Records[num]['TitleGender'],Records[num]['Update_Date'])
RecTitles.append(RecTitle)
This is working fine ....I need to know is there more short and sweet way to do that?

You could just unpack each dict and give that as an argument to TitleInfo:
RecTitles = [TitleInfo(**x) for x in json.loads(response)]
Here's the explanation from Python tutorial:
In the same fashion, dictionaries can deliver keyword arguments with the **-operator:
>>> def parrot(voltage, state='a stiff', action='voom'):
... print("-- This parrot wouldn't", action, end=' ')
... print("if you put", voltage, "volts through it.", end=' ')
... print("E's", state, "!")
...
>>> d = {"voltage": "four million", "state": "bleedin' demised", "action": "VOOM"}
>>> parrot(**d)
-- This parrot wouldn't VOOM if you put four million volts through it. E's bleedin' demised !

As an aside, you generally want to avoid hand-coding validation code. Checkout an API documentation framework: swagger, RAML, API Blueprint. All of them have tooling for request/response validation.
The next step would be to use a testing framework like dredd.

Related

Marshmallow: dynamically choose Schema depending on data?

For googlers also that question in github.
I had users of 2 types. I need to validate user data by one of 2 schemas.
I prepare awesome scratch with code of my idea, of course its not working.
class ExtraType0(Schema):
nickname = fields.String()
class ExtraType1(Schema):
id = fields.Integer()
class UserSchema(Schema):
type = fields.String()
extra = fields.Method(deserialize="get_extra_schema_by_user_type")
def get_extra_schema_by_user_type(self, obj):
if obj == 0:
# call ExtraType0 scheme and its (de)serialization
return fields.Nested(ExtraType0())
else:
# call ExtraType01scheme and its (de)serialization
return fields.Nested(ExtraType1())
# correct data
result = UserSchema().load(
{
"type": 0,
"extra": {
"id": 0
}
})
# also correct data
result1 = UserSchema().load(
{
"type": 1,
"extra": {
"nickname": "user123"
}
})
How I can proper choose schema depends on loaded in type field data?

Populate in Mongoengine, Reference Field

Does mongoengine have a similar method to populate
db.collection.posts.findById(id).populate('user_id')
day goes by but still I didn't saw anything that there is an API documentation related to .populate() in mongoengine and I chose to make my own way.
mongoengine is an Object-Oriented driver, compare to mongoose there is already predefined function .populate() but if you know how OOP works it is just a piece of cake for you, here is a simple trick
# .schema.author.py
class Author(Document):
name = StringField(required=True)
# .schema.book.py
class Book(Document):
title = StringField()
author = ReferenceField(Author)
def tojson(self):
json = {
'title ': self.title ,
'author ': self.author,
}
json['author'] = self.author.name
return json
# .route.book.py
def get(self):
try:
books = []
for book in Book.objects:
books.push(book.tojson())
return {'status': True, 'response': books}
except Exception as e:
print(e)
and the expected result must be populated Author for each Book object
{
"status": true,
"response": [
{
"title": "<book-name>",
"author": {
"name": "<author-name>"
}
}
]
}
I am not an expert in MongoEngine, but you can query like this:
post = Posts.objects(id=id).first()
That returns the data from that Model and it's already populated. You can access the author of the post using:
post.author.name

How can I parse nested JSON to CSV

I have a new project where I obtain JSON data back from a REST API - I'm trying to parse this data to csv pipe delimited to import to our legacy software
I can't seem to get all the value pairs parsed properly - this is my first exposure to JSON and I've tried so many things but only getting a little right at a time
I have used Python and can get some items that I need but not the whole JSON tree - it comes across as a list and has some dictionaries and lists in it as well
I know my code is incomplete and just looking for someone to point me in the right direction on what tools in python can get the job done
import json
import csv
with open('tenants.json') as access_json:
read_content = json.load(access_json)
for rm_access in read_content:
rm_data = rm_access
print(rm_data)
contacts_data = rm_data['Contacts']
leases_data = rm_data['Leases']
udfs_data = rm_data['UserDefinedValues']
for contacts_access in contacts_data:
rm_contacts = contacts_access
UPDATED:
import pandas as pd
with open('tenants.json') as access_json:
read_content = json.load(access_json)
for rm_access in read_content:
rm_data = rm_access
pd.set_option('display.max_rows', 10000)
pd.set_option('display.max_columns', 150)
TenantID = []
TenantDisplayID = []
Name = []
FirstName = []
LastName = []
WebMessage = []
Comment = []
RentDueDay = []
RentPeriod = []
FirstContact = []
PropertyID = []
PostingStartDate = []
CreateDate = []
CreateUserID = []
UpdateDate = []
UpdateUserID = []
Contacts = []
for rm_access in read_content:
rm_data = rm_access
TenantID.append(rm_data["TenantID"])
TenantDisplayID.append(rm_data["TenantDisplayID"])
Name.append(rm_data["Name"])
FirstName.append(rm_data["FirstName"])
LastName.append(rm_data["LastName"])
WebMessage.append(rm_data["WebMessage"])
Comment.append(rm_data["Comment"])
RentDueDay.append(rm_data["RentDueDay"])
RentPeriod.append(rm_data["RentPeriod"])
# FirstContact.append(rm_data["FirstContact"])
PropertyID.append(rm_data["PropertyID"])
PostingStartDate.append(rm_data["PostingStartDate"])
CreateDate.append(rm_data["CreateDate"])
CreateUserID.append(rm_data["CreateUserID"])
UpdateUserID.append(rm_data["UpdateUserID"])
Contacts.append(rm_data["Contacts"])
df = pd.DataFrame({"TenantID":TenantID,"TenantDisplayID":TenantDisplayID, "Name"
: Name,"FirstName":FirstName, "LastName": LastName,"WebMessage": WebMessage,"Com
ment": Comment, "RentDueDay": RentDueDay, "RentPeriod": RentPeriod, "PropertyID"
: PropertyID, "PostingStartDate": PostingStartDate,"CreateDate": CreateDate, "Cr
eateUserID": CreateUserID,"UpdateUserID": UpdateUserID,"Contacts": Contacts})
print(df)
Here is sample of the file
[
{
"TenantID": 115,
"TenantDisplayID": 115,
"Name": "Jane Doe",
"FirstName": "Jane",
"LastName": "Doe",
"WebMessage": "",
"Comment": "",
"RentDueDay": 1,
"RentPeriod": "Monthly",
"FirstContact": "2015-11-01T15:30:00",
"PropertyID": 17,
"PostingStartDate": "2010-10-01T00:00:00",
"CreateDate": "2014-04-16T13:35:37",
"CreateUserID": 1,
"UpdateDate": "2017-03-22T11:31:48",
"UpdateUserID": 1,
"Contacts": [
{
"ContactID": 128,
"FirstName": "Jane",
"LastName": "Doe",
"MiddleName": "",
"IsPrimary": true,
"DateOfBirth": "1975-02-27T00:00:00",
"FederalTaxID": "111-11-1111",
"Comment": "",
"Email": "jane.doe#mail.com",
"License": "ZZT4532",
"Vehicle": "BMW 3 Series",
"IsShowOnBill": true,
"Employer": "REW",
"ApplicantType": "Applicant",
"CreateDate": "2014-04-16T13:35:37",
"CreateUserID": 1,
"UpdateDate": "2017-03-22T11:31:48",
"AnnualIncome": 0.0,
"UpdateUserID": 1,
"ParentID": 115,
"ParentType": "Tenant",
"PhoneNumbers": [
{
"PhoneNumberID": 286,
"PhoneNumberTypeID": 2,
"PhoneNumber": "703-555-5610",
"Extension": "",
"StrippedPhoneNumber": "7035555610",
"IsPrimary": true,
"ParentID": 128,
"ParentType": "Contact"
}
]
}
],
"UserDefinedValues": [
{
"UserDefinedValueID": 1,
"UserDefinedFieldID": 4,
"ParentID": 115,
"Name": "Emerg Contact Name",
"Value": "Terry Harper",
"UpdateDate": "2016-01-22T15:41:53",
"FieldType": "Text",
"UpdateUserID": 2,
"CreateUserID": 2
},
{
"UserDefinedValueID": 174,
"UserDefinedFieldID": 5,
"ParentID": 115,
"Name": "Emerg Contact Phone",
"Value": "703-555-3568",
"UpdateDate": "2016-01-22T15:42:03",
"FieldType": "Text",
"UpdateUserID": 2,
"CreateUserID": 2
}
],
"Leases": [
{
"LeaseID": 115,
"TenantID": 115,
"UnitID": 181,
"PropertyID": 17,
"MoveInDate": "2010-10-01T00:00:00",
"SortOrder": 1,
"CreateDate": "2014-04-16T13:35:37",
"UpdateDate": "2017-03-22T11:31:48",
"CreateUserID": 1,
"UpdateUserID": 1
}
],
"Addresses": [
{
"AddressID": 286,
"AddressTypeID": 1,
"Address": "14393 Montgomery Road Lot #102\r\nCincinnati, OH 45122",
"Street": "14393 Montgomery Road Lot #102",
"City": "Cincinnati",
"State": "OH",
"PostalCode": "45122",
"IsPrimary": true,
"ParentID": 115,
"ParentType": "Tenant"
}
],
"OpenReceivables": [],
"Status": "Current"
},
Not all tenants will have all elements which is also tricky
I need the data from the top where there is TenantID, TenantDisplayID, etc
I also need the data from the Contacts, PhoneNumbers, Leases, etc values
Each line should be static so if it doesn't have certain tags then I'd like a Null or None so it would look like
TentantID|TenantDisplayID|FirstName….etc so each line has same number of fields
Something like this should work:
import pandas as pd
pd.set_option('display.max_rows', 10000)
pd.set_option('display.max_columns', 100000)
TenantID = []
TenantDisplayID = []
Name = []
FirstName = []
LastName = []
WebMessage = []
Comment = []
RentDueDay = []
RentPeriod = []
FirstContact = []
PropertyID = []
PostingStartDate = []
CreateDate = []
CreateUserID = []
UpdateDate = []
UpdateUserID = []
Contacts = []
for rm_access in read_content:
rm_data = rm_access
print(rm_data)
TenantID.append(rm_data["TenantID"])
TenantDisplayID.append(rm_data["TenantDisplayID"])
Name.append(rm_data["Name"])
FirstName.append(rm_data["FirstName"])
LastName.append(rm_data["LastName"])
WebMessage.append(rm_data["WebMessage"])
Comment.append(rm_data["Comment"])
RentDueDay.append(rm_data["RentDueDay"])
RentPeriod.append(rm_data["RentPeriod"])
FirstContact.append(rm_data["FirstContact"])
PropertyID.append(rm_data["PropertyID"])
PostingStartDate.append(rm_data["PostingStartDate"])
CreateDate.append(rm_data["CreateDate"])
CreateUserID.append(rm_data["CreateUserID"])
UpdateUserID.append(rm_data["UpdateUserID"])
Contacts.append(rm_data["Contacts"])
df = pd.DataFrame({"TenantID":TenantID,"TenantDisplayID":TenantDisplayID, "Name": Name,
"FirstName":FirstName, "LastName": LastName,"WebMessage": WebMessage,
"Comment": Comment, "RentDueDay": RentDueDay, "RentPeriod": RentPeriod,
"FirstContact": FirstContact, "PropertyID": PropertyID, "PostingStartDate": PostingStartDate,
"CreateDate": CreateDate, "CreateUserID": CreateUserID,"UpdateUserID": UpdateUserID,
"Contacts": Contacts})
print(df)
The General Problem
The problem with this task (and other similar ones) is not just how to create an algorithm - I am sure you will theoretically be able to solve this with a (not so) nice amount of nested for-loops. The problem is to organise the code in a way that you don't get a headache - i.e. in a way that you can fix bugs easily, that you can write unittests, that you can understand the code easily from reading it (in six months from now) and that you can easily change your code in case you need to do so.
I do not know anybody who does not make mistakes when wrapping their head around a deeply nested structure. And chasing for bugs in a code which is heavily nested because it mirrors the nested structure of the data, can be quite frustrating.
The Quick (and most probably: Best) Solution
Rely on packages that are made for your exact usecase, such as
https://github.com/cwacek/python-jsonschema-objects
In case you have a formal definition of the API schema, you could use packages for that. If, for instance, your API has a Swagger schema definition, you cann use swagger-py (https://github.com/digium/swagger-py) to get your JSON response into Python objects.
The Principle Solution: Object Oriented Programming and Recursion
Even if there might be some libraries for your concrete use case, I would like to explain the principle of how to deal with "that kind" of tasks:
A good way to organise code for this kind of problem is using Object Oriented Programming. The nesting hassle can be laid out much clearer by making use of the principle of recursion. This also makes it easier to chabge the code, in case the JSON schema of your API response changes for any reasons (an update of the API, for instance). In your case I would suggest you create something like the following:
class JsonObject:
"""Parent Class for any Object that will be retrieved from the JSON
and potentially has nested JsonObjects inside.
This class takes care of parsing the json into python Objects and deals
with the recursion into the nested structures."""
primitives = []
json_objects = {
# For each class, this dict defines all the "embedded" classes which
# live directly "under" that class in the nested JSON. It will have the
# following structure:
# attribute_name : class
# In your case the JSON schema does not have any "single" objects
# in the nesting strcuture, but only lists of nested objects. I
# still , to demonstrate how you would do it in case, there would be
# single "embedded"
}
json_object_lists = {
# For each class, this dict defines all the "embedded" subclasses which
# are provided in a list "under" that class in the nested JSON.
# It will have the following structure:
# attribute_name : class
}
#classmethod
def from_dict(cls, d: dict) -> "JsonObject":
instance = cls()
for attribute in cls.primitives:
# Here we just parse all the primitives
instance.attribute = getattr(d, attribute, None)
for attribute, klass in cls.json_object_lists.items():
# Here we parse all lists of embedded JSON Objects
nested_objects = []
l = getattr(d, attribute, [])
for nested_dict in l:
nested_objects += klass.from_dict(nested_dict)
setattr(instance, attribute, nested_objects)
for attribute, klass in cls.json_objects.items():
# Here we parse all "single" embedded JSON Objects
setattr(
instance,
attribute,
klass.from_dict(getattr(d, attribute, None)
)
def to_csv(self) -> str:
pass
Since you didn't explain how exactly you want to create a csv from the JSON, I didn't implement that method and left this to you. It is also not necessary to explain the overall approach.
Now we have the general Parent class all our specific will inherit from, so that we can apply recursion to our problem. Now we only need to define these concrete structures, according to the JSON schema we want to parse. I got the following from your sample, but you can easily change the things you need to:
class Address(JsonObject):
primitives = [
"AddressID",
"AddressTypeID",
"Address",
"Street",
"City",
"State",
"PostalCode",
"IsPrimary",
"ParentID",
"ParentType",
]
json_objects = {}
json_object_lists = {}
class Lease(JsonObject):
primitives = [
"LeaseID",
"TenantID",
"UnitID",
"PropertyID",
"MoveInDate",
"SortOrder",
"CreateDate",
"UpdateDate",
"CreateUserID",
"UpdateUserID",
]
json_objects = {}
json_object_lists = {}
class UserDefinedValue(JsonObject):
primitives = [
"UserDefinedValueID",
"UserDefinedFieldID",
"ParentID",
"Name",
"Value",
"UpdateDate",
"FieldType",
"UpdateUserID",
"CreateUserID",
]
json_objects = {}
json_object_lists = {}
class PhoneNumber(JsonObject):
primitives = [
"PhoneNumberID",
"PhoneNumberTypeID",
"PhoneNumber",
"Extension",
"StrippedPhoneNumber",
"IsPrimary",
"ParentID",
"ParentType",
]
json_objects = {}
json_object_lists = {}
class Contact(JsonObject):
primitives = [
"ContactID",
"FirstName",
"LastName",
"MiddleName",
"IsPrimary",
"DateOfBirth",
"FederalTaxID",
"Comment",
"Email",
"License",
"Vehicle",
"IsShowOnBill",
"Employer",
"ApplicantType",
"CreateDate",
"CreateUserID",
"UpdateDate",
"AnnualIncome",
"UpdateUserID",
"ParentID",
"ParentType",
]
json_objects = {}
json_object_lists = {
"PhoneNumbers": PhoneNumber,
}
class Tenant(JsonObject):
primitives = [
"TenantID",
"TenantDisplayID",
"Name",
"FirstName",
"LastName",
"WebMessage",
"Comment",
"RentDueDay",
"RentPeriod",
"FirstContact",
"PropertyID",
"PostingStartDate",
"CreateDate",
"CreateUserID",
"UpdateDate",
"UpdateUserID",
"OpenReceivables", # Maybe this is also a nested Object? Not clear from your sample.
"Status",
]
json_object_lists = {
"Contacts": Contact,
"UserDefinedValues": UserDefinedValue,
"Leases": Lease,
"Addresses": Address,
}
json_objects = {}
You might imagine the "beauty" (at least: order) of that approach, which lies in the following: With this structure, we could tackle any level of nesting in the JSON response of your API without additional headache - our code would not deepen its indentation level, because we have separated the nasty nesting into the recursive definition of JsonObjects from_json method. That is why it is much easier now to identify bugs or apply changes to our code.
To finally parse the JSON now into our Objects, you would do something like the following:
import typing
import json
def tenants_from_json(json_string: str) -> typing.Iterable["Tenant"]:
tenants = [
Tenant.from_dict(tenant_dict)
for tenant_dict in json.loads(json_string)
]
return tenants
Important Final Side Note: This is just the basic Principle
My code example is just a very brief introduction into the idea of using objects and recursion to deal with an overwhelming (and nasty) nesting of a structure. The code has some flaws. For instance one should avoid define mutable class variables. And of course the whole code should validate the data it gets from the API. You also might want to add the type of each attribute and represent that correctly in the Python objects (Your sample has integers, datetimes and strings, for instance).
I really only wanted to show you the very principle of Object Oriented Programming here.
I didn't take the time to test my code. So there are probably bugs left. Again, I just wanted to demonstrate the principle.

Python Marshmallow: Dict validation Error

I'm quite new to marshmallow but my question refers to the issue of handling dict-like objects. There are no workable examples in the Marshmallow documentation. I came across with a simple example here in stack overflow Original question and this is the original code for the answer suppose this should be quite simple
from marshmallow import Schema, fields, post_load, pprint
class UserSchema(Schema):
name = fields.String()
email = fields.Email()
friends = fields.List(fields.String())
class AddressBookSchema(Schema):
contacts =fields.Dict(keys=fields.String(),values=fields.Nested(UserSchema))
#post_load
def trans_friends(self, item):
for name in item['contacts']:
item['contacts'][name]['friends'] = [item['contacts'][n] for n in item['contacts'][name]['friends']]
data = """
{"contacts": {
"Steve": {
"name": "Steve",
"email": "steve#example.com",
"friends": ["Mike"]
},
"Mike": {
"name": "Mike",
"email": "mike#example.com",
"friends": []
}
}
}
"""
deserialized_data = AddressBookSchema().loads(data)
pprint(deserialized_data)
However, when I run the code I get the following NoneType value
`None`
The input hasn't been marshalled.
I'm using the latest beta version of marshmallow 3.0.0b20. I can't find a way to make this work even it looks so simple. The information seems to indicate that nested dictionaries are being worked by the framework.
Currently I'm working in a cataloging application for flask where I'm receiving JSON messages where I can't really specify the schema beforehand. My specific problem is the following:
data = """
{"book": {
"title": {
"english": "Don Quixiote",
"spanish": "Don Quijote"
},
"author": {
"first_name": "Miguel",
"last_name": "Cervantes de Saavedra"
}
},
"book": {
"title": {
"english": "20000 Leagues Under The Sea",
"french": "20000 Lieues Sous Le Mer",
"japanese": "海の下で20000リーグ",
"spanish": "20000 Leguas Bajo El Mar",
"german": "20000 Meilen unter dem Meeresspiegel",
"russian": "20000 лиг под водой"
},
"author": {
"first_name": "Jules",
"last_name": "Verne"
}
}
}
This is just toy data but exemplifies that the keys in the dictionaries are not fixed, they change in number and text.
So the questions are why am I getting the validation error in a simple already worked example and if it's possible to use the marshmallow framework to validate my data,
Thanks
There are two issues in your code.
The first is the indentation of the post_load decorator. You introduced it when copying the code here, but you don't have it in the code you're running, otherwise you wouldn't get None.
The second is due to a documented change in marshmallow 3. pre/post_load/dump functions are expected to return the value rather than mutate it.
Here's a working version. I also reworked the decorator:
from marshmallow import Schema, fields, post_load, pprint
class UserSchema(Schema):
name = fields.String()
email = fields.Email()
friends = fields.List(fields.String())
class AddressBookSchema(Schema):
contacts = fields.Dict(keys=fields.String(),values=fields.Nested(UserSchema))
#post_load
def trans_friends(self, item):
for contact in item['contacts'].values():
contact['friends'] = [item['contacts'][n] for n in contact['friends']]
return item
data = """
{
"contacts": {
"Steve": {
"name": "Steve",
"email": "steve#example.com",
"friends": ["Mike"]
},
"Mike": {
"name": "Mike",
"email": "mike#example.com",
"friends": []
}
}
}
"""
deserialized_data = AddressBookSchema().loads(data)
pprint(deserialized_data)
And finally, the Dict in marshmallow 2 doesn't have key/value validation feature, so it will just silently ignore the keys and values argument and perform no validation.

Tastypie: How can I fill the resource without database?

I want to grab some information from Foursquare , add some fields and return it via django-tastypie.
UPDATE:
def obj_get_list(self, request=None, **kwargs):
near = ''
if 'near' in request.GET and request.GET['near']:
near = request.GET['near']
if 'q' in request.GET and request.GET['q']:
q = request.GET['q']
client = foursquare.Foursquare(client_id=settings.FSQ_CLIENT_ID, client_secret=settings.FSQ_CLIENT_SECRET)
a = client.venues.search(params={'query': q, 'near' : near, 'categoryId' : '4d4b7105d754a06374d81259' })
objects = []
for venue in a['venues']:
bundle = self.build_bundle(obj=venue, request=request)
bundle = self.full_dehydrate(bundle)
objects.append(bundle)
return objects
Now I am getting:
{
"meta": {
"limit": 20,
"next": "/api/v1/venue/?q=Borek&near=Kadikoy",
"offset": 0,
"previous": null,
"total_count": 30
},
"objects": [
{
"resource_uri": ""
},
{
"resource_uri": ""
}]
}
There are 2 empty objects. What should I do in order to fill this resource?
ModelResource is only suitable when you have ORM Model behind the resource. In other cases you should use Resource.
This subject is discussed in ModelResource description, mentioning when it is suitable and when it is not: http://django-tastypie.readthedocs.org/en/latest/resources.html#why-resource-vs-modelresource
Also there is a whole chapter in the documentation, aimed at providing the details on how to implement non-ORM data sources (in this case: external API): http://django-tastypie.readthedocs.org/en/latest/non_orm_data_sources.html

Categories

Resources