How to do a SQL-like INNER JOIN on multiple Python dictionaries

I am currently planning out a django app that allows users to not only build custom tables associated with models (e.g., a user could create a trivial custom "parking spot" table that is associated with the "employee" model without having to edit models.py), but to also build custom reports using those custom tables. The only way I can think to do this is by having a model that stores custom table data in a JSONField (I'm using Postgres as a backend so this actually works out great), and then have a reports model that allows users to build and save "SQL-like" queries that return joined datasets for their custom reports.
I've figured out how to store the custom tables and use them in my app, and I even have a loose concept on how to merge multiple JSON objects on pseudo foreign keys to be pulled into custom reports, but I have only gotten as far as creating one-to-one joins.
With the script below, if any of my dicts has multiple records for a single foreign key, only the last record is used. Does anyone have an idea how I can accomplish a one-to-many join of multiple Python dictionaries?
If I have these three datasets:
employees = [{"id": 1, "user_id": 303, "name": "Mike"},
             {"id": 2, "user_id": 304, "name": "James"},
             {"id": 3, "user_id": 305, "name": "David"},]
roles = [{"id": 1, "user_id": 303, "role": "Manager"},
         {"id": 2, "user_id": 304, "role": "Assistant"},
         {"id": 3, "user_id": 305, "role": "Assistant"},]
absences = [{"id": 1, "user_id": 303, "date": "2015-03-01"},
            {"id": 2, "user_id": 303, "date": "2015-03-02"},
            {"id": 3, "user_id": 303, "date": "2015-03-03"},
            {"id": 4, "user_id": 304, "date": "2015-03-15"},
            {"id": 5, "user_id": 305, "date": "2015-03-19"},]
My desired outcome on a straight join would be:
[{'date': '2015-03-01', 'role': 'Manager', 'user_id': 303, 'id': 1, 'name': 'Mike'},
{'date': '2015-03-02', 'role': 'Manager', 'user_id': 303, 'id': 1, 'name': 'Mike'},
{'date': '2015-03-03', 'role': 'Manager', 'user_id': 303, 'id': 1, 'name': 'Mike'},
{'date': '2015-03-15', 'role': 'Assistant', 'user_id': 304, 'id': 2, 'name': 'James'},
{'date': '2015-03-19', 'role': 'Assistant', 'user_id': 305, 'id': 3, 'name': 'David'}]
but since my script loops through my FROM dictionary first (in this case, employees), all I am able to get is this:
[{'date': '2015-03-03', 'role': 'Manager', 'user_id': 303, 'id': 1, 'name': 'Mike'},
{'date': '2015-03-15', 'role': 'Assistant', 'user_id': 304, 'id': 2, 'name': 'James'},
{'date': '2015-03-19', 'role': 'Assistant', 'user_id': 305, 'id': 3, 'name': 'David'}]
And here are the basics of my code:
def joiner(from_table, joins):
    report_data = []
    for row in from_table:
        new_row = row
        for table in joins:
            table_dict = table["table"]
            table_fk = table["fk"]
            for tdr in table_dict:
                if tdr[table_fk] == row[table_fk]:
                    for field in table["fields"]:
                        new_row[field] = tdr[field]
    report_data = from_table
    return report_data

join_tables = [{"table": roles, "fk": "user_id", "fields": ["role"]},
               {"table": absences, "fk": "user_id", "fields": ["date"]},
               ]
joiner(employees, join_tables)
The simplest fix I could think of was to start with the "absences" dict as the from_table instead of employees, but then that is a Many-to-One join, which would be very limiting for my purposes.
Also, if anyone has a better idea for building user created data schemas that can be merged in custom reports using django, I'm all ears. The only other solution I can think of would be to bypass django models entirely and just have all custom tables created, updated, and queried using straight SQL.

Here is a crude solution, as long as you put the longest list of dictionaries first when you call the merge (this can easily be modified):
def merge_lists(listdict1, listdict2, listdict3, joinkey):
    mergedlist = listdict1
    for i in range(len(listdict1)):
        for j in range(len(listdict2)):
            if listdict1[i][joinkey] == listdict2[j][joinkey]:
                for keys in listdict2[j].keys():
                    mergedlist[i][keys] = listdict2[j][keys]
        for k in range(len(listdict3)):
            if listdict1[i][joinkey] == listdict3[k][joinkey]:
                for keys in listdict3[k].keys():
                    mergedlist[i][keys] = listdict3[k][keys]
    return mergedlist

merge_lists(absences, employees, roles, "user_id")
[
    {"date": "2015-03-01", "id": 1, "name": "Mike", "role": "Manager", "user_id": 303},
    {"date": "2015-03-02", "id": 1, "name": "Mike", "role": "Manager", "user_id": 303},
    {"date": "2015-03-03", "id": 1, "name": "Mike", "role": "Manager", "user_id": 303},
    {"date": "2015-03-15", "id": 2, "name": "James", "role": "Assistant", "user_id": 304},
    {"date": "2015-03-19", "id": 3, "name": "David", "role": "Assistant", "user_id": 305}
]
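For a fully general one-to-many join, the key is to emit one output row per matching combination instead of overwriting fields on a single row. Here is a sketch of that idea (the function name `inner_join` is my own, not from the post); it reuses the question's join-spec format and builds an index per joined table so each pass is linear rather than quadratic:

```python
from collections import defaultdict

def inner_join(from_table, joins):
    """Join from_table against each join spec, emitting one output row
    per matching combination (a one-to-many INNER JOIN)."""
    rows = [dict(r) for r in from_table]  # copy so the inputs stay untouched
    for spec in joins:
        # index the joined table by its foreign key
        index = defaultdict(list)
        for record in spec["table"]:
            index[record[spec["fk"]]].append(record)
        joined = []
        for row in rows:
            # one new output row per matching record, not an overwrite
            for match in index.get(row[spec["fk"]], []):
                new_row = dict(row)
                for field in spec["fields"]:
                    new_row[field] = match[field]
                joined.append(new_row)
        rows = joined
    return rows
```

Called with the question's `employees`, `roles`, and `absences` and the same `join_tables` spec, this returns the five desired rows, including all three of Mike's absences.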


Dealing with scope in a recursive Python function

I have a JSON input file that looks like this:
{"nodes": [
    {"properties": {
         "id": "rootNode",
         "name": "Bertina Dunmore"},
     "nodes": [
        {"properties": {
             "id": 1,
             "name": "Gwenneth Rylett",
             "parent_id": "rootNode"},
         "nodes": [
            {"properties": {
                 "id": 11,
                 "name": "Joell Waye",
                 "parent_id": 1}},
            {"properties": {
                 "id": 12,
                 "name": "Stan Willcox",
                 "parent_id": 1}}]},
        {"properties": {
             "id": 2,
             "name": "Delbert Dukesbury",
             "parent_id": "rootNode"},
         "nodes": [
            {"properties": {
                 "id": 21,
                 "name": "Cecil McKeever",
                 "parent_id": 2}},
            {"properties": {
                 "id": 22,
                 "name": "Joy Obee",
                 "parent_id": 2}}]}]}]}
I want to get the nested properties dictionaries into a (flat) list of dictionaries. Writing a recursive function that reads these dictionaries is easy:
def get_node(nodes):
    for node in nodes:
        print(node['properties'])
        if 'nodes' in node.keys():
            get_node(node['nodes'])
Now, I'm struggling to append these to a single list:
def get_node(nodes):
    prop_list = []
    for node in nodes:
        print(node['properties'])
        prop_list.append(node['properties'])
        if 'nodes' in node.keys():
            get_node(node['nodes'])
    return prop_list
This returns [{'id': 'rootNode', 'name': 'Bertina Dunmore'}], even though all properties dictionaries are printed. I suspect that this is because I'm not handling the function scope properly.
Can someone please help me get my head around this?
Your problem is that every time you call get_node, the list you append to is initialized again. You can avoid this by passing the list into the recursive call.
Moreover, I think it would be nice to use a dataclass to deal with this problem:
from dataclasses import dataclass
from typing import Union

@dataclass
class Property:
    id: int
    name: str
    parent_id: Union[str, None] = None

def explore_json(data, properties: list = None):
    if properties is None:
        properties = []
    for key, val in data.items():
        if key == "nodes":
            for node in val:
                explore_json(node, properties)
        elif key == "properties":
            properties.append(Property(**val))
    return properties

explore_json(data)
output
[Property(id='rootNode', name='Bertina Dunmore', parent_id=None),
Property(id=1, name='Gwenneth Rylett', parent_id='rootNode'),
Property(id=11, name='Joell Waye', parent_id=1),
Property(id=12, name='Stan Willcox', parent_id=1),
Property(id=2, name='Delbert Dukesbury', parent_id='rootNode'),
Property(id=21, name='Cecil McKeever', parent_id=2),
Property(id=22, name='Joy Obee', parent_id=2)]
You need to combine the prop_list returned by the recursive call with the prop_list in the current scope. For example,
def get_node(nodes):
    prop_list = []
    for node in nodes:
        print(node['properties'])
        prop_list.append(node['properties'])
        if 'nodes' in node.keys():
            prop_list.extend(get_node(node['nodes']))
    return prop_list
With that:
def get_node(prop_list, nodes):
    for node in nodes:
        print(node['properties'])
        prop_list.append(node['properties'])
        if 'nodes' in node.keys():
            get_node(prop_list, node['nodes'])
You can just do:
prop_list = []
get_node(prop_list, <yourdictnodes>)
Should alter prop_list into:
{'id': 'rootNode', 'name': 'Bertina Dunmore'}
{'id': 1, 'name': 'Gwenneth Rylett', 'parent_id': 'rootNode'}
{'id': 11, 'name': 'Joell Waye', 'parent_id': 1}
{'id': 12, 'name': 'Stan Willcox', 'parent_id': 1}
{'id': 2, 'name': 'Delbert Dukesbury', 'parent_id': 'rootNode'}
{'id': 21, 'name': 'Cecil McKeever', 'parent_id': 2}
{'id': 22, 'name': 'Joy Obee', 'parent_id': 2}
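A generator sidesteps the accumulator question entirely, since each recursive level just yields its results upward. This is my own sketch, not from the answers above, shown on a trimmed-down version of the question's data:

```python
def iter_properties(nodes):
    """Yield each 'properties' dict, depth-first, with no accumulator to manage."""
    for node in nodes:
        yield node['properties']
        # recurse into child nodes, if any; an empty list ends the recursion
        yield from iter_properties(node.get('nodes', []))

data = {"nodes": [{"properties": {"id": "rootNode", "name": "Bertina Dunmore"},
                  "nodes": [{"properties": {"id": 1, "name": "Gwenneth Rylett",
                                            "parent_id": "rootNode"}}]}]}
prop_list = list(iter_properties(data['nodes']))
# prop_list holds both properties dicts, root first
```

Because nothing is mutated across calls, there is no scope problem to reason about at all.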

Python select value on same level in json if matches

I have an API call in my Python file that returns some data as JSON. I want to create a list in Python with all the different item ids and their matching descriptions. I made a simplified version of it below, which shows where I am currently stuck. Also, I am not sure how best to phrase this question, so if someone has a better title please edit it.
Right now my idea is to save the size of the items array and loop through it, but I can't figure out how to select the name value from the same level when desc_id matches. I'm not even sure if this is the right approach.
json
{
    'items': [
        {'id': 111, 'desc_id': 1},
        {'id': 222, 'desc_id': 2},
        {'id': 333, 'desc_id': 2}
    ],
    'desc': [
        {'desc_id': 1, 'name': 'test', ...},
        {'desc_id': 2, 'name': 'something else', ...}
    ]
}
Desired output in python:
[['111', 'test', ...], ['222', 'something else', ...], ['333', 'something else', ...]]
I'd create desc as a dictionary of desc_id: name pairs. Then when you iterate through items it's a simple lookup on item['desc_id']:
desc = {item['desc_id']: item['name'] for item in data['desc']}
items = [[item['id'], desc[item['desc_id']]] for item in data['items']]
items
[[111, 'test'], [222, 'something else'], [333, 'something else']]
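If some items might reference a desc_id that has no entry in desc, dict.get with a default avoids a KeyError. A small self-contained sketch (the sample data and the 'unknown' fallback are my own additions):

```python
data = {
    'items': [{'id': 111, 'desc_id': 1},
              {'id': 222, 'desc_id': 2},
              {'id': 333, 'desc_id': 99}],   # 99 has no matching description
    'desc': [{'desc_id': 1, 'name': 'test'},
             {'desc_id': 2, 'name': 'something else'}],
}

# build the lookup table once, then resolve each item with a fallback
desc = {d['desc_id']: d['name'] for d in data['desc']}
items = [[item['id'], desc.get(item['desc_id'], 'unknown')] for item in data['items']]
# items == [[111, 'test'], [222, 'something else'], [333, 'unknown']]
```

The lookup-table approach is O(n + m), versus O(n * m) for scanning the desc list once per item.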

Updating value in json object in python

I'm struggling with updating a value in a json object.
import json
userBoard = ''  # see example below; is loaded in a separate function

@app.get("/setItem")
def setItem():
    id = request.args.get('itemId')
    id = int(id[2:])  # is for instance 2
    for item in json.loads(session['userBoard']):
        if item['id'] == id:
            item['solved'] = 'true'
        else:
            print('Nothing found!')
    return('OK')
Example of the JSON:
[{"id": 1, "name": "t1", "solved": "false"}, {"id": 2, "name": "t2", "solved": "false"}, {"id": 3, "name": "t3"}]
However, when I check the printout of the userBoard, the value is still 'false'. Does anyone have an idea? Does this need to be serialized somehow? I tried many things but nothing worked out...
Many thanks!
The question is somewhat specific and lacking some information, so I am going to make some assumptions and propose a solution.
First, id and input are Python built-ins and should not be shadowed by variable names. I will deliberately use these names with a _ prefix, so that the built-ins remain usable.
import json
from typing import List
json_ex = '[{"id": 1, "name": "t1", "solved": "false"}, {"id": 2, "name": "t2", "solved": "false"}, {"id": 3, "name": "t3"}]'
_id = 2 # for now a constant for demonstration purposes
def setItem(_input: List[dict]):
    for item in _input:
        if (this_id := item['id']) == _id:  # requires python 3.8+, otherwise you can simplify this
            item['solved'] = 'true'
            print(f'Updated item id {this_id}')
        else:
            print('Nothing found!')
json_ex_parsed = json.loads(json_ex) # this is now a list of dictionaries
setItem(json_ex_parsed)
Output:
Nothing found!
Updated item id 2
Nothing found!
The contents of json_ex_parsed before applying setItem:
[{'id': 1, 'name': 't1', 'solved': 'false'},
{'id': 2, 'name': 't2', 'solved': 'false'},
{'id': 3, 'name': 't3'}]
and after:
[{'id': 1, 'name': 't1', 'solved': 'false'},
{'id': 2, 'name': 't2', 'solved': 'true'}, # note here has been updated
{'id': 3, 'name': 't3'}]
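Note that in the original route, json.loads builds a brand-new list on every call, and the modified list is never written back, so the session still holds the old string. A minimal sketch of the full round-trip, with a plain dict standing in for the web framework's session object (that substitution is my assumption):

```python
import json

# stand-in for a framework session that stores the board as a JSON string
session = {'userBoard': '[{"id": 1, "solved": "false"}, {"id": 2, "solved": "false"}]'}

def set_item(item_id):
    board = json.loads(session['userBoard'])   # parse once into Python objects
    for item in board:
        if item['id'] == item_id:
            item['solved'] = 'true'
    session['userBoard'] = json.dumps(board)   # serialize the change back

set_item(2)
# session['userBoard'] now contains "solved": "true" for id 2
```

Without the final json.dumps step, the mutation only affects a temporary list and is lost when the request ends.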

How to get distinct results when ordering by a related annotated field?

This is a Django (2.2) project using Python (3.7). Given the following models, how would I get distinct results in the query below?
class Profile(models.Model):
    user = models.ForeignKey(User, ...)

class Location(models.Model):
    profile = models.ForeignKey(Profile, ...)
    point = PointField()

class ProfileService(models.Model):
    profile = models.ForeignKey(Profile, ...)
    service = models.ForeignKey(Service, ...)
Here's the query I have so far which works but I end up with duplicate 'ProfileService' objects:
service = Service.objects.get(id=1)
qs = (
    ProfileService.objects
    .filter(service=service)
    .annotate(distance=Distance('profile__location__point', self.point))
    .order_by('distance')
)
If I add .distinct('profile') it obviously fails with SELECT DISTINCT ON expressions must match initial ORDER BY expressions.
I have a feeling that the solution lies in using __in but I need to keep the annotated distance field.
Further explanation
To help illustrate further, the lists below represent dummy data that will reproduce the issue:
services = [
    { 'id': 1, 'service': 'A', ... },
    { 'id': 2, 'service': 'B', ... },
]
users = [
    { 'id': 1, 'username': 'Jane Doe', 'email': 'jane@test.com', ... },
    { 'id': 2, 'username': 'John Doe', 'email': 'john@test.com', ... },
]
profiles = [
    { 'id': 1, 'user': 1, ... },
    { 'id': 2, 'user': 2, ... },
]
locations = [
    { 'id': 1, 'profile': 1, 'point': 'X', ... },
    { 'id': 2, 'profile': 1, 'point': 'Y', ... },
    { 'id': 3, 'profile': 2, 'point': 'Z', ... },
]
# 'point' would normally contain actual Point data.
# Letters (XYZ) are just intended to represent unique Point data.
profile_services = [
    { 'id': 1, 'profile': 1, 'service': 1 },
    { 'id': 2, 'profile': 1, 'service': 2 },
    { 'id': 3, 'profile': 2, 'service': 1 },
]
It is the 'Location' objects that cause the duplications in the 'qs' queryset above (if a 'Profile' has only 1 'Location' associated with it, there is no duplicate result in 'qs'); however, the user does need to retain the ability to provide multiple locations, and we just need the closest one.
Progress
Following the advice from 'Ivan Starostin', I have put together the following using subqueries:
locations = (
    Location.objects
    .filter(profile=OuterRef('profile'))
    .annotate(distance=Distance('point', self.point))
    .order_by('distance')
)
qs = (
    ProfileService.objects
    .filter(service=service)
    .filter(profile__id__in=Subquery(locations.values('profile_id')[:1]))
    .annotate(distance=Subquery(locations.values('distance')[:1]))
)
Now this solves the issue of duplicate results, but it loses the annotated 'distance' value, which should be annotated against the applicable ProfileService object. I'm not sure whether this is going in the right direction (any pointers would be greatly appreciated); I just want to avoid pulling the data into Python memory to get rid of the duplicates.
I have also been referring to the following post, but the accepted answer refuses to work in my queryset: Similar question

Append to a list of dictionaries

I am writing a Python program with restart capabilities. I want to store the state of the program execution in JSON format, so that during restart it can query the JSON and resume from the failed point.
The JSON will be something like this:
{
    "job_name": xxxxx,
    "job_start_time": xxxxx,
    "site": xxxxxx,
    "tasks": [
        {
            "id": <unique id to look-up on restart>
            "task_start_time":
            "task_end_time":
            "runtime":
            "successful": <true or false>
            "error_message": <if successful is false>
        }
    ]
}
When a stage completes successfully, it appends a task dict to the list of tasks.
My question is how to append the task dictionary, while the entire python object remains.
Is it possible in JSON?
My question is how to append the task dictionary, while the entire python object remains.
You can use the update method of dictionaries to modify the object. Here is an example:
d = {'inventory': [{'Color': 'Brown', 'Model': 'Camry', 'Year': 2018},
                   {'Model': 'Corolla', 'Year': 2017}],
     'name': 'Toyota'}
d['inventory'][0].update({'Doors': 4})
print(d)
{'inventory': [{'Color': 'Brown', 'Doors': 4, 'Model': 'Camry', 'Year': 2018},
               {'Model': 'Corolla', 'Year': 2017}],
 'name': 'Toyota'}
You can pass parameters in the input JSON to the next job, taken from the output of the current job. For example:
{
    "job_name": xxxxx_2,
    "job_start_time": xxxxx,
    "site": xxxxxx,
    "tasks": [
        {
            "id": <unique id to look-up on restart>
            "task_start_time":
            "task_end_time":
            "runtime":
            "successful": <true or false>
            "error_message": <if successful is false>,
            "parameters": {
                "--Dict": {dict_values}
            }
        }
    ]
}
Is it a dictionary? If so...
dictionary["tasks"].append(whatever)
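For the append to survive a restart, the whole state has to be written back to disk after each task completes; JSON itself has no in-place append, so you rewrite the file. A minimal sketch of that loop (the file name and helper function are my own, not from the question):

```python
import json

STATE_FILE = 'job_state.json'

def record_task(state, task):
    """Append a completed task and rewrite the state file so a
    restarted run can pick up from the last recorded task."""
    state['tasks'].append(task)
    with open(STATE_FILE, 'w') as f:
        json.dump(state, f, indent=2)

state = {'job_name': 'demo', 'tasks': []}
record_task(state, {'id': 1, 'successful': True})
# job_state.json now contains the job dict with one task recorded
```

On restart, json.load the file and skip any task id that already appears in state['tasks'].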
