Consolidating row data from DB into a list of dicts

I'm reading data from a SELECT statement in SQLite. The data comes in the following form:
ID|Phone|Email|Status|Role
Multiple rows may be returned for the same ID, Phone, or Email, and in any given row either Phone or Email can be empty/NULL. However, for a given ID, Status is always the same and so is Role. For example:
1|1234567892|a#email.com| active |typeA
2|3434567893|b#email.com| active |typeB
2|3434567893|c#email.com| active |typeB
3|5664567891|d#email.com|inactive|typeC
3|7942367891|d#email.com|inactive|typeC
4|5342234233| NULL | active |typeD
5| NULL |e#email.com| active |typeD
These rows are returned as a list by sqlite3; let's call it results. I need to go through them and reorganize the data into another list structure in Python. The final list consolidates the data for each ID, such that:
Each item of the final list is a dict, one for each unique ID in results. In other words, multiple rows for the same ID are merged.
Each dict contains these keys: 'id', 'phones', 'emails', 'types', 'role', 'status'.
'phones' and 'emails' are lists that contain zero or more items, but no duplicates.
'types' is also a list; it contains 'phone', 'email', or both, but no duplicates.
The order of dicts in the final list does not matter.
So far I have come up with this:
processed = {}
for r in results:
    if r['ID'] in processed:
        p_data = processed[r['ID']]
        if r['Phone']:
            p_data['phones'].add(r['Phone'])
            p_data['types'].add('phone')
        if r['Email']:
            p_data['emails'].add(r['Email'])
            p_data['types'].add('email')
    else:
        # start every new ID with empty sets so that later rows can always call .add()
        p_data = {'id': r['ID'], 'status': r['Status'], 'role': r['Role'],
                  'phones': set(), 'emails': set(), 'types': set()}
        if r['Phone']:
            p_data['phones'].add(r['Phone'])
            p_data['types'].add('phone')
        if r['Email']:
            p_data['emails'].add(r['Email'])
            p_data['types'].add('email')
        processed[r['ID']] = p_data
consolidated = list(processed.values())
I wonder if there is a faster and/or more concise way to do this.
EDIT:
A final detail: I would prefer 'phones', 'emails', and 'types' in each dict to be lists instead of sets. The reason is that I need to dump consolidated into JSON, and JSON has no set type.
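For anyone who wants to try this out, results can be reproduced with an in-memory database. This is only a sketch: the table name contacts and the inserted rows are assumptions made for illustration, and it relies on sqlite3.Row so that columns can be read by name, as in r['ID'].

```python
import sqlite3

# Hypothetical in-memory table; the name "contacts" is an assumption, not from the question.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be indexed by column name, e.g. r['ID']
conn.execute(
    "CREATE TABLE contacts (ID INTEGER, Phone TEXT, Email TEXT, Status TEXT, Role TEXT)"
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?)",
    [
        (1, "1234567892", "a#email.com", "active", "typeA"),
        (2, "3434567893", "b#email.com", "active", "typeB"),
        (2, "3434567893", "c#email.com", "active", "typeB"),
        (3, "5664567891", "d#email.com", "inactive", "typeC"),
        (3, "7942367891", "d#email.com", "inactive", "typeC"),
        (4, "5342234233", None, "active", "typeD"),  # NULL email
        (5, None, "e#email.com", "active", "typeD"),  # NULL phone
    ],
)
results = conn.execute(
    "SELECT ID, Phone, Email, Status, Role FROM contacts ORDER BY ID"
).fetchall()
```

With this setup a NULL column comes back as None, so checks such as if r['Phone']: skip it.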

When faced with something like this I usually use:
import collections

processed = collections.defaultdict(lambda: {'phone': set(), 'email': set(), 'status': None, 'type': set()})
and then something like:
for r in results:
    processed[r['ID']]['status'] = r['Status']  # same value for every row of an ID
    for field in ['Phone', 'Email']:
        if r[field]:
            processed[r['ID']][field.lower()].add(r[field])
            processed[r['ID']]['type'].add(field.lower())
Finally, you can dump it into a dictionary or a list:
a_list = list(processed.items())  # (ID, data) pairs; list() is needed on Python 3
a_dict = dict(a_list)
Regarding the JSON problem with sets, you can either convert the sets to lists right before serializing or write a custom encoder (very useful!). Here is an example of one I have for dates extended to handle sets:
import datetime
import json
import time

class JSONDateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            # datetimes become integer Unix timestamps (interpreted as local time)
            return int(time.mktime(obj.timetuple()))
        elif isinstance(obj, set):
            return list(obj)
        try:
            return json.JSONEncoder.default(self, obj)
        except TypeError:
            # anything else falls back to its string form
            return str(obj)
and to use it:
json.dumps(a_list, sort_keys=True, indent=2, cls=JSONDateTimeEncoder)
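The other option mentioned above, converting sets to lists right before serializing, can be done without a custom encoder class by passing a function as the default argument of json.dumps. A minimal sketch, assuming the sets only hold strings; the consolidated sample and the name sets_to_lists are made up for this example.

```python
import json

# Hypothetical consolidated data that still holds sets.
consolidated = [
    {'id': 1, 'phones': {'1234567892'}, 'emails': {'a#email.com'}, 'types': {'phone', 'email'}},
    {'id': 4, 'phones': {'5342234233'}, 'emails': set(), 'types': {'phone'}},
]

def sets_to_lists(obj):
    # json.dumps calls this only for objects it cannot serialize by itself.
    if isinstance(obj, set):
        return sorted(obj)  # sorted so the output order is stable
    raise TypeError("Cannot serialize %r" % (obj,))

payload = json.dumps(consolidated, default=sets_to_lists, sort_keys=True)
```

The original dicts keep their sets; only the JSON text contains lists.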

I assume results is a 2D list:
print(results)
#[['1', '1234567892', 'a#email.com', ' active ', 'typeA'],
#['2', '3434567893', 'b#email.com', ' active ', 'typeB'],
#['2', '3434567893', 'c#email.com', ' active ', 'typeB'],
#['3', '5664567891', 'd#email.com', 'inactive', 'typeC'],
#['3', '7942367891', 'd#email.com', 'inactive', 'typeC'],
#['4', '5342234233', ' NULL ', ' active ', 'typeD'],
#['5', ' NULL ', 'e#email.com', ' active ', 'typeD']]
Now we group this list by id:
from itertools import groupby
data_grouped = [(k, list(v)) for k, v in groupby(sorted(results, key=lambda x: x[0]), lambda x: x[0])]
# list of column names (must match the column order in results); these become the dict keys
names = ['id', 'phone', 'email', 'status', 'role']
ID_info = {g[0]: {names[i]: list(list(map(set, zip(*g[1])))[i]) for i in range(len(names))} for g in data_grouped}
Now for the types:
for k in ID_info:
    email = [i for i in ID_info[k]['email'] if i.strip() != 'NULL' and i != '']
    phone = [i for i in ID_info[k]['phone'] if i.strip() != 'NULL' and i != '']
    # keep only the real values, so NULL placeholders do not end up in the output
    ID_info[k]['email'] = email
    ID_info[k]['phone'] = phone
    if email and phone:
        ID_info[k]['types'] = ['phone', 'email']
    elif email and not phone:
        ID_info[k]['types'] = ['email']
    elif phone and not email:
        ID_info[k]['types'] = ['phone']
    else:
        ID_info[k]['types'] = []
    # project: these columns hold a single value per ID
    ID_info[k]['id'] = ID_info[k]['id'][0]
    ID_info[k]['role'] = ID_info[k]['role'][0]
    ID_info[k]['status'] = ID_info[k]['status'][0]
And what you asked for (a list of dicts) is returned by ID_info.values()
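For comparison, here is a minimal single-pass sketch that produces the shape asked for in the question (lists without duplicates, no sorting step). It assumes each row is a plain sequence in the column order ID, Phone, Email, Status, Role, and that a missing value arrives as None, an empty string, or the text NULL, possibly padded with spaces. The helper name consolidate is made up for this example.

```python
def consolidate(rows):
    by_id = {}
    for row_id, phone, email, status, role in rows:
        # First row of an ID creates the entry; later rows reuse it.
        entry = by_id.setdefault(row_id, {
            'id': row_id, 'phones': [], 'emails': [], 'types': [],
            'status': status.strip(), 'role': role,
        })
        for kind, value in (('phone', phone), ('email', email)):
            value = (value or '').strip()
            if not value or value == 'NULL':
                continue  # nothing to record for this column
            if value not in entry[kind + 's']:
                entry[kind + 's'].append(value)
            if kind not in entry['types']:
                entry['types'].append(kind)
    return list(by_id.values())

rows = [
    ['1', '1234567892', 'a#email.com', ' active ', 'typeA'],
    ['2', '3434567893', 'b#email.com', ' active ', 'typeB'],
    ['2', '3434567893', 'c#email.com', ' active ', 'typeB'],
    ['3', '5664567891', 'd#email.com', 'inactive', 'typeC'],
    ['3', '7942367891', 'd#email.com', 'inactive', 'typeC'],
    ['4', '5342234233', ' NULL ', ' active ', 'typeD'],
    ['5', ' NULL ', 'e#email.com', ' active ', 'typeD'],
]
consolidated = consolidate(rows)
```

Because the values are lists from the start, the result can be passed to json.dumps directly. The membership checks are linear, which is fine for a handful of values per ID.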

Related

Get the value of a specific record in dicts with lists in python

I have a dict like this:
contactos = dict([
"id", id,
"nombres", nombres,
"apellidos", apellidos,
"telefonos", telefonos,
"correos", correos
])
This works when I add a new record under every key, but my problem is: how can I get the record for only one contact?
I have a part where I can input a phone number and search for its position in the lists inside the dict; then I want to show only that contact's record, taken from every key.
I wrote this code, but it doesn't work.
telefo = input(Fore.LIGHTGREEN_EX + "TELEFONO CONTACTO: " + Fore.RESET)
for x in range(len(telefonos)):
    if(telefonos[x] == telefo):
        print(contactos["telefonos"][x])
    else:
        print("No encontrado")
I print only the telefono value because this is just my test code.

This should be your working script:
# I imagine your data looks something like this. If it doesn't, sorry:
id = 0
nombres = ['John', 'Anna', 'Robert']
apellidos = ['J.', 'A.', 'Rob.']
telefonos = ['333-444', '222-111', '555-888']
correos = ['john#email.com', 'anna#email.com', 'rob#email.com']
# This is the part where you went wrong.
# Dictionaries are created with {}
#
# [] creates a list, not a dictionary structure.
#
# Also, keys and values must be grouped as:
# "key": value
contactos = dict({
    "id": id,
    "nombres": nombres,
    "apellidos": apellidos,
    "telefonos": telefonos,
    "correos": correos
})
# Now, imagine this is the input from the user:
telefo = "333-444"
for x in range(len(telefonos)):
    if telefonos[x] == telefo:
        print(contactos["telefonos"][x])
        break
else:
    # for/else: this runs only when the loop finishes without hitting break
    print("No encontrado")
When testing the script, the output is 333-444.
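To show the whole record rather than just the phone, the position found in telefonos can be applied to every list in the dict. A small sketch, assuming every value in contactos is a list of the same length (so the scalar "id" entry is left out here); the function name buscar_por_telefono is made up for this example.

```python
contactos = {
    "nombres": ['John', 'Anna', 'Robert'],
    "apellidos": ['J.', 'A.', 'Rob.'],
    "telefonos": ['333-444', '222-111', '555-888'],
    "correos": ['john#email.com', 'anna#email.com', 'rob#email.com'],
}

def buscar_por_telefono(contactos, telefono):
    """Return the full record for one contact, or None if the phone is unknown."""
    try:
        pos = contactos["telefonos"].index(telefono)
    except ValueError:
        return None
    # Take the element at the same position from every list in the dict.
    return {campo: valores[pos] for campo, valores in contactos.items()}

registro = buscar_por_telefono(contactos, "222-111")
no_encontrado = buscar_por_telefono(contactos, "000-000")
```

list.index returns the first matching position, so duplicate phone numbers would resolve to the earliest contact.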

Odoo 13: How to write good filters in order to send data to an Odoo website page?

I've been asked to apply some filters before passing the data to the website.
I have four models that are linked through many2many fields. Let me add an image of the four models.
To print a model.a record, we need to check that it has a model.b linked to it, then that some model.c is linked to that model.b, and finally that some model.d is linked to that model.c. After all of that, the result is the same as this image.
To do that, I wrote this code:
@http.route(['/agenda'], auth="public", website=True)
def agenda(self):
    months = DATES_SELECT
    # all dictionary used in the implementation
    model_c_dict = {}
    model_b_dict = {}
    model_a_dict = {}
    model_a_key = []
    # filter the model.d according to certain condition
    # should I set registrations_left field as store=True for performance when using .search()
    model_d_ids = request.env['model.d'].search([('date_start', '>', dt.now().date()), ('state', '=', 'opened')], order="date_start").filtered(lambda k: k.registrations_left != 0)
    for session in model_d_ids:
        course_id = session.course_id_many[:1]
        if not course_id.state == 'validated':
            continue
        model_c_dict.setdefault(course_id.id, {'object': course_id, 'sessions': []})
        model_c_dict[course_id.id]['sessions'].append(session)
    for k, v in model_c_dict.items():
        category_id = v['object'].category_ids[:1]
        if not category_id:
            continue
        model_b_dict.setdefault(category_id.id, {'object': category_id, 'course': {}})
        model_b_dict[category_id.id]['course'].setdefault(k, v)
    for k, v in model_b_dict.items():
        catalogue_id = v['object'].catalogue_ids[:1]
        if not catalogue_id:
            continue
        model_a_dict.setdefault(catalogue_id.id, {'object': catalogue_id, 'category': {}})
        model_a_dict[catalogue_id.id]['category'].setdefault(k, v)
        if catalogue_id.id in model_a_dict:
            model_a_key.append(catalogue_id)
    # sort the model_a with model_a.sequence as key
    model_a_key = sorted(list(set(model_a_key)), key=lambda k: k.sequence)
    # pack key
    dict_key = {'model_a_key': model_a_key}
    values = {
        'months': months,
        'categs': model_a_dict,
        'dict_key': dict_key,
    }
    return request.render('website_custom.agenda', values)
It works as intended, but I don't know whether it has performance issues or whether it's bad coding.
So I'm asking for your opinion.
PS: I didn't design the models and its relations.

I loved the slice technique for avoiding an index-out-of-range error. It can also be very useful in a filtered function to check whether the record is connected
all the way up to A (the catalogue model): k.course_id_many[:1].category_ids[:1].catalogue_ids[:1]. But I prefer doing this in the domain:
@http.route(['/agenda'], auth="public", website=True)
def agenda(self):
    courses_dict = {}
    category_dict = {}
    catalogue_dict = {}
    # extract all records of model D connected all the way up to model A
    sessions = request.env['model.d'].search([('date_start', '>', dt.now().date()),
                                              ('state', '=', 'opened'),
                                              # this makes sure that the records retrieved are connected to the catalogue model (A)
                                              ('course_id_many.category_ids.catalogue_ids', '!=', False)], order="date_start") \
        .filtered(lambda k: k.registrations_left != 0)
    for session in sessions:
        # if you only want to treat the first record, you can slice the many2many with [:1],
        # but that skips the rest of the records in the many2many field.
        # If that is what you want, the extra loops are not needed at all: just do `course = session.course_id_many[0]`
        # and the same at each level, because the search domain already checked that the records are connected
        course = session.course_id_many[0]
        if not course.state == 'validated': continue  # skip courses that are not validated
        # add course to dict, and add the session to its list of sessions
        course_obj = courses_dict.setdefault(course.id, {'object': course, 'sessions': []})
        course_obj['sessions'].append(session)
        category = course.category_ids[0]
        # store category, and add course to its courses
        category_obj = category_dict.setdefault(category.id, {'object': category, 'course': {}})
        category_obj['course'][course.id] = course_obj
        catalogue = category.catalogue_ids[0]
        # store catalogue, and add category to its categories
        catalogue_dict.setdefault(catalogue.id, {'object': catalogue, 'category': {}})['category'][category.id] = category_obj
    # sort catalogue
    catalogue_keys = sorted(catalogue_dict.keys(), key=lambda k: catalogue_dict[k]['object'].sequence)
    values = {
        'months': DATES_SELECT,
        'categs': catalogue_dict,
        'dict_key': catalogue_keys,
    }
    return request.render('website_custom.agenda', values)
I hope this works. I did my best to check for syntax errors, so it should work.
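The nested setdefault pattern used for the three dictionaries can be tried outside Odoo. The sketch below uses plain tuples in place of recordsets; the sample names and the (session, course, category, catalogue) layout are assumptions for illustration only, not the real models.

```python
# Each tuple stands in for one session and the first record at each level above it.
links = [
    ("session1", "courseA", "category1", "catalogueX"),
    ("session2", "courseA", "category1", "catalogueX"),
    ("session3", "courseB", "category2", "catalogueX"),
    ("session4", "courseC", "category3", "catalogueY"),
]

courses, categories, catalogues = {}, {}, {}
for session, course, category, catalogue in links:
    # setdefault returns the existing entry, or stores and returns the new one
    course_obj = courses.setdefault(course, {'object': course, 'sessions': []})
    course_obj['sessions'].append(session)
    category_obj = categories.setdefault(category, {'object': category, 'course': {}})
    category_obj['course'][course] = course_obj
    catalogue_obj = catalogues.setdefault(catalogue, {'object': catalogue, 'category': {}})
    catalogue_obj['category'][category] = category_obj
```

Each level stores a reference to the entry below it, so a session appended later is visible from the catalogue level without rebuilding anything.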

How to reference a json array element by name using Python?

In this json array:
json_string = '[{"Id": "report","Value": "3001"},{"Id": "user","Value": "user123"}]'
How can I get back user123 if I pass in user?
When I try to do this:
content = json.loads(json_string)
content['user']
I get an error that says you have to use an integer to reference an element.
I am brand new to Python.
Thanks!

content is a list so you should get the element by index first:
>>> content[1]['Value']
'user123'
>>> for d in content:
...     if 'user' in d.values():
...         print(d['Value'])
...
user123
Assuming user is always mapped to Id:
>>> for d in content:
...     if d['Id'] == 'user':
...         print(d['Value'])
...
user123
One liner:
>>> [d['Value'] for d in content if d['Id'] == 'user'][0]
'user123'

Assuming you want to focus on the first occurrence of an element in the list with a given field (e.g. 'Id') with a certain value (e.g. 'user'):
def look_for(items, field, val):
    # return the 'Value' of the first element whose `field` equals `val`
    return next(el['Value'] for el in items if el[field] == val)
json_string = [{"Id": "report","Value": "3001"}, {"Id": "user","Value": "user123"}]
found_val = look_for(json_string, 'Id', 'user')
produces
'user123'
Obviously, the output field could also become a parameter instead of being hardcoded to Value.
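If several lookups are needed, the list can be turned into a dict keyed by Id once, after which each lookup is a plain dict access. A minimal sketch, assuming the input really is a JSON string and that each Id appears once; the names value_by_id, found, and missing are made up for this example.

```python
import json

json_string = '[{"Id": "report", "Value": "3001"}, {"Id": "user", "Value": "user123"}]'
content = json.loads(json_string)

# Build the index once; if an Id repeated, the last Value would win.
value_by_id = {item["Id"]: item["Value"] for item in content}

found = value_by_id["user"]
missing = value_by_id.get("nobody")  # .get avoids a KeyError for unknown ids
```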

Better way to do conditional django queries

Is there a better way to do the following:
if column_sort == 'size':
    if sort == 'desc':
        results = results.order_by('-size')
    else:
        results = results.order_by('size')
elif column_sort == 'modified':
    if sort == 'desc':
        results = results.order_by('-last_modified')
    else:
        results = results.order_by('last_modified')
else:
    if sort == 'desc':
        results = results.order_by('-path')
    else:
        results = results.order_by('path')
results = results[:100]
For example, a way in which to reverse the query order?

How about you just write the code once?
valid_column_sorts = ['size', 'last_modified', 'path']
if column_sort in valid_column_sorts:
    if sort == 'desc':
        column_sort = '-' + column_sort
    results = results.order_by(column_sort)
Oh yeah, I see that the incoming values differ from the sort field names, in which case you need a simple map, like results.order_by(col_map[column_sort])

A better way would be to use a dict to map the values of column_sort:
d = {'size': 'size',
     'modified': 'last_modified',
     # ... add more items as needed
     }
Now you just need to call get on the dict, with the second argument specifying a default if the value of column_sort isn't in the dictionary:
orderby = d.get(column_sort, "path")
For the sort variable, just add the '-' as necessary:
orderby = '-' + orderby if sort == "desc" else orderby
And now just execute it:
results = results.order_by(orderby)

This should work:
columns = dict(size='size', modified='last_modified')
column_sort = columns.get(column_sort, 'path')
order = lambda v: '-' if v == 'desc' else ''
results = results.order_by(order(sort) + column_sort)
It will translate user-specified column name to your schema and assume that the default is 'path'.
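The mapping and the direction prefix can also be combined in one small function that only builds the string, which makes the logic easy to check without a database. This is a sketch: COLUMN_MAP and build_order_by are names made up for this example, with the field names taken from the question.

```python
# Maps the names accepted from the request to model field names.
COLUMN_MAP = {'size': 'size', 'modified': 'last_modified'}

def build_order_by(column_sort, sort, default='path'):
    # Unknown column names fall back to the default field.
    field = COLUMN_MAP.get(column_sort, default)
    return '-' + field if sort == 'desc' else field

# Intended use (not run here, it needs a Django queryset):
# results = results.order_by(build_order_by(column_sort, sort))[:100]
```

Looking names up in a fixed map also means arbitrary user input never reaches order_by directly.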

How do I look up an associated value in a JSON variable using Python?

How do I look up the 'id' associated with a person's 'name' when the two are in a dictionary?
user = 'PersonA'
id = ? #How do I retrieve the 'id' from the user_stream json variable?
The JSON, stored in a variable named "user_stream":
[
    {
        'name': 'PersonA',
        'id': '135963'
    },
    {
        'name': 'PersonB',
        'id': '152265'
    },
]

You'll have to decode the JSON structure and loop through all the dictionaries until you find a match:
for person in json.loads(user_stream):
    if person['name'] == user:
        id = person['id']
        break
else:
    # The else branch is only ever reached if no match was found
    raise ValueError('No such person')
If you need to make multiple lookups, you probably want to transform this structure to a dict to ease lookups:
name_to_id = {p['name']: p['id'] for p in json.loads(user_stream)}
then look up the id directly:
id = name_to_id.get(user)  # if user is not found, id will be None
The above example assumes that names are unique; if they are not, use:
from collections import defaultdict
name_to_id = defaultdict(list)
for person in json.loads(user_stream):
    name_to_id[person['name']].append(person['id'])
# lookup
ids = name_to_id.get(user, [])  # list of ids, defaults to empty
This is as always a trade-off, you trade memory for speed.

Martijn Pieters's solution is correct, but if you intend to make many such look-ups it's better to load the json and iterate over it just once, and not for every look-up.
name_id = {}
for person in json.loads(user_stream):
    name = person['name']
    id = person['id']
    name_id[name] = id
user = 'PersonA'
print(name_id[user])

persons = json.loads(...)
results = list(filter(lambda p: p['name'] == 'avi', persons))  # list() so this also works on Python 3
if results:
    id = results[0]["id"]
Of course, results can contain more than one match.
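When only the first match matters, next() with a default avoids building the whole list. A minimal sketch, assuming user_stream is a valid JSON string (which needs double quotes, unlike the literal shown in the question); find_id is a name made up for this example.

```python
import json

user_stream = '[{"name": "PersonA", "id": "135963"}, {"name": "PersonB", "id": "152265"}]'
persons = json.loads(user_stream)

def find_id(persons, name):
    # next() stops at the first match; the second argument is returned when nothing matches.
    return next((p['id'] for p in persons if p['name'] == name), None)

person_a_id = find_id(persons, 'PersonA')
unknown_id = find_id(persons, 'avi')
```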
