Split dictionary into multiple dicts based on List value - python

I'm working with a nested MongoDB backend and am mapping it to SQL database tables. A user can fill in forms, which are stored as follows:
{
"response": {
"question_1": "answer_1",
"question_2": [
"answer_a",
"answer_b"
]
},
"user": "user_1",
"form_id": "2344"
}
Questions can have multiple answers stored as an array.
I would like to flatten this into a long format (i.e. a single record per question-answer combination), like so:
[
{
"question_id": "question_1",
"answer": "answer_1",
"user": "user_1",
"form_id": "2344"
},
{
"question_id": "question_2",
"answer": "answer_a",
"user": "user_1",
"form_id": "2344"
},
{
"question_id": "question_2",
"answer": "answer_b",
"user": "user_1",
"form_id": "2344"
}
]
What would be the most efficient way to achieve this in Python?
I could brute-force it by looping over every response in the response dict but I have the feeling that's overkill...
Many thanks
EDIT:
A first working attempt uses the following function:
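from typing import Any, Dict, Iterator

def transform_responses_to_long(completed_form: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    # Copy every field except "response" so the caller's dict is not mutated.
    shared = {k: v for k, v in completed_form.items() if k != "response"}
    for key, values in completed_form["response"].items():
        # Wrap a single answer so scalar and multi-answer questions share one loop.
        if not isinstance(values, (list, dict)):
            values = [values]
        for value in values:
            result = dict(shared)
            result.update({"question_id": key, "answer": value})
            yield result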
This yields the correct dict for every question-answer combination.
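For comparison, the same flattening can be written as a nested list comprehension that returns a list instead of a generator. This is only a sketch: the name `flatten_form` is made up for illustration, and it assumes every answer is either a scalar or a list. It still visits every answer once, which is unavoidable since each answer becomes its own record.

```python
from typing import Any, Dict, List


def flatten_form(completed_form: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one record per question-answer pair (hypothetical helper name)."""
    # Fields shared by every output record (everything except the answers).
    shared = {k: v for k, v in completed_form.items() if k != "response"}
    return [
        {"question_id": question, "answer": answer, **shared}
        for question, answers in completed_form["response"].items()
        # Treat a scalar answer as a one-element list.
        for answer in (answers if isinstance(answers, list) else [answers])
    ]


form = {
    "response": {"question_1": "answer_1", "question_2": ["answer_a", "answer_b"]},
    "user": "user_1",
    "form_id": "2344",
}
records = flatten_form(form)
```

The input dict is left untouched, so the same form can be processed again later.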

Related

Python function to extract specific values from complex JSON logs data

I am trying to write a Python function (for use in a Google Cloud Function) that extracts specific values from JSON logs data. Ordinarily, I do this using the standard method of sorting through keys:
my_data['key1'], etc.
This JSON data, however, is quite different, since the data I need appears to sit in lists inside dictionaries. Here is a sample of the log data:
{
"insertId": "-mgv16adfcja",
"logName": "projects/my_project/logs/cloudaudit.googleapis.com%2Factivity",
"protoPayload": {
"@type": "type.googleapis.com/google.cloud.audit.AuditLog",
"authenticationInfo": {
"principalEmail": "email@email.com"
},
"authorizationInfo": [{
"granted": true,
"permission": "resourcemanager.projects.setIamPolicy",
"resource": "projects/my_project",
"resourceAttributes": {
"name": "projects/my_project",
"service": "cloudresourcemanager.googleapis.com",
"type": "cloudresourcemanager.googleapis.com/Project"
}
},
{
"granted": true,
"permission": "resourcemanager.projects.setIamPolicy",
"resource": "projects/my_project",
"resourceAttributes": {
"name": "projects/my_project",
"service": "cloudresourcemanager.googleapis.com",
"type": "cloudresourcemanager.googleapis.com/Project"
}
}
],
"methodName": "SetIamPolicy",
"request": {
"@type": "type.SetIamPolicyRequest",
"policy": {
"bindings": [{
"members": [
"serviceAccount:my-test-sa@my_project.iam.gserviceaccount.com"
],
"role": "projects/my_project/roles/PubBuckets"
},
{
"members": [
"serviceAccount:my-test-sa-2@my_project.iam.gserviceaccount.com"
],
"role": "roles/owner"
},
{
"members": [
"serviceAccount:my-test-sa-3@my_project.iam.gserviceaccount.com",
"serviceAccount:my-test-sa-4@my_project.iam.gserviceaccount.com"
]
}
My goal with this data is to extract the "role":"roles/editor" and the associated "members." So in this case, I would like to extract service accounts my-test-sa-3, 4, and 5, and print them.
When the JSON enters my cloud function I do the following:
pubsub_message = base64.b64decode(event['data']).decode('utf-8')
msg = json.loads(pubsub_message)
print(msg)
I can also reach other data that I need, e.g. the project ID:
proj_id = msg['resource']['labels']['project_id']
But I cannot get into the lists within the dictionaries effectively. The deepest I can currently get is to the 'bindings' key.
I have additionally tried restructuring and flattening output as a list:
policy_request = credentials.projects().getIamPolicy(resource=proj_id, body={})
policy_response = policy_request.execute()
my_bindings = policy_response['bindings']
flat_list = []
for element in my_bindings:
    if type(element) is list:
        for item in element:
            flat_list.append(item)
    else:
        flat_list.append(element)
print('Here is flat_list: ', flat_list)
I then use an if statement to search the list, which returns nothing. I can't use indices, because the output changes constantly, so I need a solution that extracts the values by key and value if at all possible.
Expected Output:
Role: roles/editor
Members:
sa-1@gcloud.com
sa2@gcloud.com
sa3@gcloud.com
and so on
Appreciate any help.
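One way to approach this, sketched under the assumption that "bindings" is a list of dicts that each carry an optional "role" and a "members" list (as in the sample above): loop over the bindings and match on the "role" value rather than on position. The helper name `members_for_role` and the sample policy below are made up for illustration.

```python
from typing import Any, Dict, List


def members_for_role(policy: Dict[str, Any], role: str) -> List[str]:
    """Collect the members of every binding whose "role" equals `role`."""
    members: List[str] = []
    for binding in policy.get("bindings", []):
        # Each binding is a dict; .get() tolerates bindings without a "role".
        if binding.get("role") == role:
            members.extend(binding.get("members", []))
    return members


# Made-up policy for illustration, shaped like the "bindings" list above.
sample_policy = {
    "bindings": [
        {"role": "roles/owner", "members": ["serviceAccount:sa-0@example.com"]},
        {"role": "roles/editor", "members": ["serviceAccount:sa-1@example.com",
                                             "serviceAccount:sa-2@example.com"]},
    ]
}

editors = members_for_role(sample_policy, "roles/editor")
print("Role: roles/editor")
print("Members:")
for member in editors:
    print(member)
```

Going by the nesting in the sample log, the policy would presumably be found at msg['protoPayload']['request']['policy'] in the decoded Pub/Sub message. The flattening attempt leaves the list unchanged because every element of 'bindings' is a dict, not a list, so the `else` branch runs each time.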

Dynamic JSON Schema Validation

In Python 3.8, I'm trying to mock up a validation JSON schema for the structure below:
{
# some other key/value pairs
"data_checks": {
"check_name": {
"sql": "SELECT col FROM blah",
"expectations": {
"expect_column_values_to_be_unique": {
"column": "col",
},
# additional items as required
}
},
# additional items as required
}
}
The requirements I'm trying to enforce include:
At least one item in data_checks that can have a dynamic name. Item keys should be unique.
sql and expectations keys must be present
sql should be a text string
At least one item in expectations. Item keys should be unique.
Within expectations, item keys must be equal to available methods provided by dir(class_name)
More advanced capability would include:
Enforcing expectations method items to only include kwargs for that method
I currently have the following JSON schema for the data_checks portion:
"data_checks": {
"description": "Data quality checks against provided sources.",
"minProperties": 1,
"type": "object",
"patternProperties": {
".+": {
"required": ["expectations", "sql"],
"sql": {
"description": "SQL for data quality check.",
"minLength": 1,
"type": "string",
},
"expectations": {
"description": "Great Expectations function name.",
"minProperties": 1,
"type": "object",
"anyOf": [
{
"type": "string",
"minLength": 1,
"pattern": [e for e in dir(SqlAlchemyDataset) if e.startswith("expect_")],
}
],
},
},
},
},
This JSON schema does not enforce expectations to have at least one item nor does it enforce valid method names for the nested keys as expected from [e for e in dir(SqlAlchemyDataset) if e.startswith("expect_")]. I haven't really looked into enforcing kwargs for the selected method (is that even possible?).
I don't know if this is related to things being nested, but how would I enforce the proper validation requirements?
Thanks!
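A possible direction, offered as a sketch rather than a verified fix: in JSON Schema, per-key sub-schemas only take effect when they sit under "properties", and the allowed key names of an object can be restricted with "propertyNames" (available from draft 6 onward) combined with "enum". The class `FakeDataset` and the function `build_data_checks_schema` below are hypothetical stand-ins, the former for `SqlAlchemyDataset`.

```python
from typing import Any, Dict


class FakeDataset:
    """Hypothetical stand-in for SqlAlchemyDataset, used only for this sketch."""

    def expect_column_values_to_be_unique(self, column): ...

    def expect_column_values_to_not_be_null(self, column): ...

    def helper(self): ...


def build_data_checks_schema(dataset_cls: type) -> Dict[str, Any]:
    """Build the "data_checks" schema with method names taken from dataset_cls."""
    expectation_names = sorted(
        name for name in dir(dataset_cls) if name.startswith("expect_")
    )
    return {
        "description": "Data quality checks against provided sources.",
        "type": "object",
        "minProperties": 1,  # at least one check, whatever its name
        "additionalProperties": {  # applies to every dynamically named check
            "type": "object",
            "required": ["expectations", "sql"],
            "properties": {
                "sql": {"type": "string", "minLength": 1},
                "expectations": {
                    "type": "object",
                    "minProperties": 1,  # at least one expectation
                    # Keys must be one of the discovered method names.
                    "propertyNames": {"enum": expectation_names},
                    # Each expectation maps to an object of keyword arguments.
                    "additionalProperties": {"type": "object"},
                },
            },
        },
    }


schema = build_data_checks_schema(FakeDataset)
```

If the `jsonschema` package is available, passing this dict to `jsonschema.validate(instance=..., schema=schema)` should reject an empty "expectations" object or an unknown method name. Key uniqueness is already guaranteed once the JSON has been parsed into a dict. Restricting kwargs per method would need one sub-schema per method name, which is not shown here.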

How to reach a particular key value pair inside a dictionary in python?

I have a JSON document as shown below. I would like to convert every "points" value to an integer. I'm familiar with how to typecast, but I'm not able to reach each individual key/value pair inside the key "score". Could you please help me understand the concept here?
{"playerID": "123",
"score": [{"date": "1/1/2019",
"points": "10",
"somekey": "somevalue"
},
{"date": "1/1/2018",
"points": "100",
"somekey": "valuexyz"
}]
}
I read the JSON data into a variable called "data".
data.get("score") returned a list of items.
item(0) gets me the entire record: date, points, somekey.
I'm not able to get to the specific key called points.
Should I once again convert my list to a dictionary and then iterate to get to points? Isn't there another way?
data = {"playerID": "123",
"score": [{"date": "1/1/2019",
"points": "10",
"somekey": "somevalue"
},
{"date": "1/1/2018",
"points": "100",
"somekey": "valuexyz"
}]
}
dlist = data.get("score")  # returns a list of dicts
for x in dlist:
    x['points'] = int(x['points'])

Return list of values from list inside json

I have a json and I'd like to get only specific values into a list. I can do this just fine iterating through, but I'm wondering if there's an easy one-liner list comprehension method to do this. Suppose I have a json:
{
"results": {
"types":
[
{
"ID": 1,
"field": [
{
"type": "date",
"field": "PrjDate"
},
{
"type": "date",
"field": "ComplDate"
}
]
}
]
}
}
I'd like to get all of the field values into a single list:
fieldsList = ['PrjDate', 'ComplDate']
I can do this easily with
fieldsList = []
for types in myjson['results']['types']:
    for fields in types['field']:
        fieldsList.append(fields['field'])
But that seems unnecessarily clunky. Is there an easy one-liner list comprehension I can use here?
You could try
myfields = [fields['field'] for types in myjson['results']['types'] for fields in types['field']]

Nested Dictionary objects json serialization

I am using Django REST framework to return a response in JSON format for jQuery.
I have a dictionary object which contains another dictionary object:
items = {
['name':'Chairs','options':{'type':'office','price':100}],
['name':'Tables','options':{'type':'office','price':45}],
}
response = Response( json.dumps(output_items) , status=status.HTTP_200_OK)
On JavaScript side I am using this code:
var array = JSON.parse(json);
This does not parse the JSON; it raises errors.
I want to create this json format:
{
"1": {
"name": "Chairs",
"description": "All chairs",
"options": {
"1":{"type": "Office", "price": 130 },
"2":{"type": "Home", "price": 75 },
"3":{"type": "Shop", "price": 100 }
}
},
"2": {
"name": "Tables",
"description": "table description",
"options": {
"1":{"type": "Office", "price": 240 },
"2":{"type": "Home", "price": 200 },
"3":{"type": "Shop", "price": 180 }
}
}
}
I stored all my data using Python dictionary and list objects. How can I create a JSON output string in this format from that data?
This is not valid JSON or a valid Python object. Python lists cannot hold key: value pairs; only a dictionary can. If you want a list, it has to contain dictionaries rather than bare key: value pairs. The items should look something like this:
List of dictionaries
items = [
{"name":"Chairs","options":{"type":"office","price":100}},
{"name":"Tables","options":{"type":"office","price":45}},
]
(or)
Dictionary of dictionaries
items = {
"first":{"name":"Chairs","options":{"type":"office","price":100}},
"second":{"name":"Tables","options":{"type":"office","price":45}}
}
You have an error in your object "items". Try this one:
items = [
{'name':'Chairs','options':{'type':'office','price':100}},
{'name':'Tables','options':{'type':'office','price':45}},
]
You have an error in the dict creation: ['name':'Chairs','options':{'type':'office','price':100}] is not a key: value pair.
You are not doing it correctly, as mentioned by others. You can test it in your browser's console; just type in:
x={'type':'office','price':100}
//Object {type: "office", price: 100}
y={'type':'office','price':45}
//Object {type: "office", price: 45}
opt1={'type':x}
//Object {type: Object}
opt2={'type':y}
//Object {type: Object}
val1={'name':'Chairs', 'options':opt1}
//Object {name: "Chairs", options: Object}
val2={name:'tables','options':opt2}
//Object {name: "tables", options: Object}
items={'1':val1,'2':val2}
You will then have the needed format for your data, and an idea of how to build it. Hope it helps.
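On the Python side, the numbered-key format from the question could be produced from ordinary lists with `enumerate`. This is a sketch under the assumption that the items and their options are kept as lists of dicts; the helper name `index_from_one` and the sample data are made up for illustration.

```python
import json
from typing import Any, Dict, List


def index_from_one(entries: List[Any]) -> Dict[str, Any]:
    """Turn a list into a dict keyed by "1", "2", ... (JSON keys are strings)."""
    return {str(i): entry for i, entry in enumerate(entries, start=1)}


# Illustrative data: a list of items, each with a list of options.
items = [
    {"name": "Chairs", "description": "All chairs",
     "options": [{"type": "Office", "price": 130}, {"type": "Home", "price": 75}]},
    {"name": "Tables", "description": "table description",
     "options": [{"type": "Office", "price": 240}]},
]

# Number the options inside each item first, then number the items themselves.
payload = index_from_one(
    [{**item, "options": index_from_one(item["options"])} for item in items]
)
json_text = json.dumps(payload)
```

With Django REST framework it is probably enough to pass `payload` itself to `Response`, since the framework renders the data to JSON; wrapping it in `json.dumps` first would likely encode it twice.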
