I have this query that returns a list of student objects:
query = db.session.query(Student).filter(Student.is_deleted == false())
query = query.options(joinedload('project'))
query = query.options(joinedload('image'))
query = query.options(joinedload('student_locator_map'))
query = query.options(subqueryload('attached_addresses'))
query = query.options(subqueryload('student_meta'))
query = query.order_by(Student.student_last_name, Student.student_first_name,
                       Student.student_middle_name, Student.student_grade, Student.student_id)
query = query.filter(filter_column == field_value)
students = query.all()
The query itself does not take much time. The problem is converting all these objects (there can be 5000+) to Python dicts; it takes over a minute with that many objects. Currently the code loops through the objects and converts each one with to_dict(). I have also tried __dict__, which was much faster, but it does not seem to include the relational objects.
How can I convert all these Student objects and related objects quickly?
Maybe this will help you...
from collections import defaultdict
from sqlalchemy import inspect

def query_to_dict(student_results):
    # Build one list of values per mapped attribute, keyed by attribute name.
    result = defaultdict(list)
    for obj in student_results:
        instance = inspect(obj)
        for key, x in instance.attrs.items():
            result[key].append(x.value)
    return result

output = query_to_dict(students)
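query_to_dict returns column-oriented data (one list of values per attribute). If you want one dict per Student instead, a per-object variant along the same lines might look like this. This is only a sketch: to_dict_fast is an illustrative name, not an existing helper, and it assumes the relationships were already eagerly loaded by the query options above, so reading them does not hit the database.
from sqlalchemy import inspect

def to_dict_fast(obj, include_relationships=True):
    # Column values come straight from the mapper's column attributes;
    # relationships are read with getattr, which is cheap only because the
    # query eager-loaded them (joinedload/subqueryload).
    mapper = inspect(obj).mapper
    data = {c.key: getattr(obj, c.key) for c in mapper.column_attrs}
    if include_relationships:
        for rel in mapper.relationships:
            value = getattr(obj, rel.key)
            if value is None:
                data[rel.key] = None
            elif rel.uselist:
                data[rel.key] = [to_dict_fast(v, False) for v in value]
            else:
                data[rel.key] = to_dict_fast(value, False)
    return data

student_dicts = [to_dict_fast(s) for s in students]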
query = query.options(joinedload('attached_addresses').joinedload('address'))
By chaining address joinedload to attached_addresses I was able to significantly speed up the query.
My understanding of why this is the case:
The Address objects were not being loaded with the initial query, so every iteration through the loop hit the database to retrieve an Address. With the chained joined load, the Address objects are loaded as part of the initial query.
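Putting it together, the loader options might look like this (a sketch reusing the relationship names from the question, with the chained address load folded in):
from sqlalchemy import false
from sqlalchemy.orm import joinedload, subqueryload

query = (
    db.session.query(Student)
    .filter(Student.is_deleted == false())
    .options(
        joinedload('project'),
        joinedload('image'),
        joinedload('student_locator_map'),
        # chain the address load so iterating addresses never lazy-loads
        joinedload('attached_addresses').joinedload('address'),
        subqueryload('student_meta'),
    )
)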
Thanks to Corley Brigman for the help.
I'm currently dealing with a dictionary that starts with a dozen items and grows to a dozen million items after a few iterations. Fundamentally, each item is defined by several IDs, a value, and some characteristics. I build my dict from JSON data I gather from a SQL server.
The operations I execute are, for example:
get the SQL results as JSON
find the items whose 'id1' and/or 'id2' are identical
merge all items with the same 'id1' by summing float(value)
Here is an example of what my dict looks like:
[
{'id1':'01234-01234-01234',
'value':'10',
'category':'K'}
...
{'id1':'01234-01234-01234',
'value':'5',
'category':'K'}
...
]
What I would like to get looks like:
[
...
{'id1':'01234-01234-01234',
'value':'15',
'category':'K'}
...
]
I could use a dict of dicts instead:
{
'01234-01234-01234': {'value':'10',
'categorie':'K'}
...
'01234-01234-01234': {'value':'5',
'categorie':'K'}
...
}
and get:
{'01234-01234-01234': {'value':'15',
'categorie':'K'}
...
}
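Note that a real dict cannot hold the same key twice, so with a dict keyed by 'id1' the summing has to happen as the items arrive. A sketch of that accumulation, assuming the list-of-dicts input shown earlier (items is an illustrative name for that list):
from collections import defaultdict

totals = defaultdict(float)
categories = {}
for item in items:                       # items: the list of dicts from the SQL/JSON step
    totals[item['id1']] += float(item['value'])
    categories.setdefault(item['id1'], item['category'])

# values converted back to strings to match the original format
result = {key: {'value': str(total), 'category': categories[key]}
          for key, total in totals.items()}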
I have 4 GB of dedicated RAM and millions of dicts inside one dictionary on a 64-bit architecture, and I would like to optimise my code and my operations in both time and RAM. Are there tricks, or containers better suited than a dictionary of dictionaries, for this kind of operation? Is it better to create a new object that replaces the previous one on each iteration, or to modify the existing hashable object in place?
I'm using Python 3.4.
EDIT: simplified the question into a single question about the value.
The question is similar to How to sum dict elements or Fastest way to merge n-dictionaries and add values on 2.6, but in my case the values in my dicts are strings.
EDIT2: for the moment, the best performance I get is with this method:
from copy import deepcopy

def merge_similar_dict(input_list):
    i = 0
    # sort the list of dicts by id so duplicates end up adjacent
    try:
        merge_list = sorted(deepcopy(input_list), key=lambda k: k['id'])
        while (i + 1) <= (len(merge_list) - 1):
            while merge_list[i]['id'] == merge_list[i + 1]['id']:
                merge_list[i]['amount'] = str(float(merge_list[i]['amount'])
                                              + float(merge_list[i + 1]['amount']))
                merge_list.remove(merge_list[i + 1])
                if i + 1 >= len(merge_list):
                    break
            i += 1
    except Exception as error:
        print('The merge of similar dicts has failed')
        print(error)
        raise
    return merge_list
Once there are tens of thousands of dicts in the list, it starts to take a very long time (several minutes).
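Most of that time goes into the sort and the repeated list.remove() calls, each of which is O(n). A single pass that accumulates into a dict keyed by the id avoids both; a sketch using the same 'id'/'amount' keys as the code above:
from collections import defaultdict

def merge_similar_dict(input_list):
    totals = defaultdict(float)
    extras = {}
    for item in input_list:
        key = item['id']
        totals[key] += float(item['amount'])
        # keep the other fields from the first item seen for each id
        if key not in extras:
            extras[key] = {k: v for k, v in item.items() if k != 'amount'}
    merged = []
    for key, amount in totals.items():
        row = dict(extras[key])
        row['amount'] = str(amount)
        merged.append(row)
    return merged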
The list comprehension is a great construct for working with lists, in that list creation can be expressed elegantly. Is there a similar tool for managing dictionaries in Python?
I have the following functions:
# takes in 3 lists of lists and a column specification by which to group
def custom_groupby(atts, zmat, zmat2, col):
    result = dict()
    for i in range(0, len(atts)):
        val = atts[i][col]
        row = (atts[i], zmat[i], zmat2[i])
        try:
            result[val].append(row)
        except KeyError:
            result[val] = list()
            result[val].append(row)
    return result
# organises samples into dictionaries using the groupby
def organise_samples(attributes, z_matrix, original_z_matrix):
    strucdict = custom_groupby(attributes, z_matrix, original_z_matrix, 'SecStruc')

    strucfrontdict = dict()
    for k, v in strucdict.iteritems():
        strucfrontdict[k] = custom_groupby([x[0] for x in strucdict[k]],
            [x[1] for x in strucdict[k]], [x[2] for x in strucdict[k]], 'Front')

    samples = dict()
    for k in strucfrontdict:
        samples[k] = dict()
        for k2 in strucfrontdict[k]:
            samples[k][k2] = custom_groupby([x[0] for x in strucfrontdict[k][k2]],
                [x[1] for x in strucfrontdict[k][k2]], [x[2] for x in strucfrontdict[k][k2]], 'Back')
    return samples
It seems like this is unwieldy. There being elegant ways to do almost everything in Python, I'm inclined to think I'm using Python wrongly.
More importantly, I'd like to be able to generalise this function better so that I can specify how many "layers" should be in the dictionary (without using several lambdas and approaching the problem in a Lisp style). I would like a function:
# organises samples into a dictionary by specified columns
# number of layers could also be assumed by number of criterion
def organise_samples(number_layers, list_of_strings_for_column_ids)
Is this possible to do in Python?
Thank you! Even if there isn't a way to do it elegantly in Python, any suggestions towards making the above code more elegant would be really appreciated.
::EDIT::
For context, the attributes object, z_matrix, and original_zmatrix are all lists of Numpy arrays.
Attributes might look like this:
Type,Num,Phi,Psi,SecStruc,Front,Back
11,181,-123.815,65.4652,2,3,19
11,203,148.581,-89.9584,1,4,1
11,181,-123.815,65.4652,2,3,19
11,203,148.581,-89.9584,1,4,1
11,137,-20.2349,-129.396,2,0,1
11,163,-34.75,-59.1221,0,1,9
The Z-matrices might both look like this:
CA-1, CA-2, CA-CB-1, CA-CB-2, N-CA-CB-SG-1, N-CA-CB-SG-2
-16.801, 28.993, -1.189, -0.515, 118.093, 74.4629
-24.918, 27.398, -0.706, 0.989, 112.854, -175.458
-1.01, 37.855, 0.462, 1.442, 108.323, -72.2786
61.369, 113.576, 0.355, -1.127, 111.217, -69.8672
Samples is a dict {num => dict {num => dict {num => tuple(attributes, z_matrix)}}}, with each tuple holding one row of the z-matrix.
Have you tried using dictionary comprehensions?
See this great question about dictionary comprehensions.
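For the generalisation asked about above, one option is a recursive groupby that nests one dictionary level per column name and uses a dict comprehension at each level. A sketch, assuming the attribute records are indexable by column name exactly as in custom_groupby (organise_samples and group are illustrative names):
from collections import defaultdict

def organise_samples(columns, atts, zmat, zmat2):
    # Zip the three parallel lists into rows, then nest them one dict level
    # per column; the leaves are lists of (attributes, z, original_z) tuples,
    # mirroring what custom_groupby produces at each level.
    rows = list(zip(atts, zmat, zmat2))

    def group(rows, cols):
        if not cols:
            return rows
        grouped = defaultdict(list)
        for row in rows:
            grouped[row[0][cols[0]]].append(row)   # row[0] is the attributes record
        return {key: group(vals, cols[1:]) for key, vals in grouped.items()}

    return group(rows, columns)

# e.g. samples = organise_samples(['SecStruc', 'Front', 'Back'],
#                                 attributes, z_matrix, original_z_matrix)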
I want to cross-reference a dictionary and a django queryset to determine which elements have unique dictionary['name'] and djangoModel.name values, respectively. The way I'm doing this now is to:
Create a list of the dictionary['name'] values
Create a list of djangoModel.name values
Generate the list of unique values by checking for inclusion in those lists
This looks as follows:
alldbTests = dbp.test_set.exclude(end_date__isnull=False) #django queryset
vctestNames = [vctest['name'] for vctest in vcdict['tests']] #from dictionary
dbtestNames = [dbtest.name for dbtest in alldbTests] #from django model
# Compare tests in protocol in fortytwo's db with protocol from vc
obsoleteTests = [dbtest for dbtest in alldbTests if dbtest.name not in vctestNames]
newTests = [vctest for vctest in vcdict if vctest['name'] not in dbtestNames]
It feels unpythonic to have to generate the intermediate list of names (lines 2 and 3 above), just to be able to check for inclusion immediately after. Am I missing anything? I suppose I could put two list comprehensions in one line like this:
obsoleteTests = [dbtest for dbtest in alldbTests if dbtest.name not in [vctest['name'] for vctest in vcdict['tests']]]
But that seems harder to follow.
Edit:
Think of the initial state like this:
# alldbTests is a list of django models where the following are all true
alldbTests[0].name == 'test1'
alldbTests[1].name == 'test2'
alldbTests[2].name == 'test4'
dict1 = {'name':'test1', 'status':'pass'}
dict2 = {'name':'test2', 'status':'pass'}
dict3 = {'name':'test5', 'status':'fail'}
vcdict = [dict1, dict2, dict3]
I can't convert to sets and take the difference unless I strip things down to just the name string, but then I lose access to the rest of the model/dictionary, right? Sets only would work here if I had the same type of object in both cases.
vctestNames = dict((vctest['name'], vctest) for vctest in vcdict['tests'])
dbtestNames = dict((dbtest.name, dbtest) for dbtest in alldbTests)

# obsolete: in the db but no longer in the vc protocol
obsoleteTests = [dbtestNames[key]
                 for key in set(dbtestNames.keys()) - set(vctestNames.keys())]
# new: in the vc protocol but not yet in the db
newTests = [vctestNames[key]
            for key in set(vctestNames.keys()) - set(dbtestNames.keys())]
You're working with basic set operations here. You could convert your objects to sets and just find the intersection (think Venn Diagrams):
obsoleteTests = list(set([a.name for a in alldbTests]) - set(vctestNames))
Sets are really useful when comparing two lists of objects (pseudopython):
set(a) - set(b)              = [c for c in a if c not in b]
set(a) | set(b)              = [c for c in a or c in b]
set(a).intersection(set(b))  = [c for c in a if c in b]
The intersection and difference operations of sets should help you solve your problem more elegantly.
But since you're originally dealing with dicts, these examples and discussion may provide some inspiration: http://code.activestate.com/recipes/59875-finding-the-intersection-of-two-dicts
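For a concrete version of those operations against the example data from the edit (where vcdict is the list of dicts), something along these lines works while keeping access to the original objects:
db_names = {dbtest.name for dbtest in alldbTests}       # {'test1', 'test2', 'test4'}
vc_names = {vctest['name'] for vctest in vcdict}        # {'test1', 'test2', 'test5'}

obsolete_names = db_names - vc_names                    # {'test4'}: in the db, gone from vc
new_names = vc_names - db_names                         # {'test5'}: in vc, not yet in the db

obsoleteTests = [dbtest for dbtest in alldbTests if dbtest.name in obsolete_names]
newTests = [vctest for vctest in vcdict if vctest['name'] in new_names]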
Is there a way to convert the GqlQuery object to an array of keys, or is there a way to force the query to return an array of keys? For example:
items = db.GqlQuery("SELECT __key__ FROM Items")
returns an object containing the keys:
<google.appengine.ext.db.GqlQuery object at 0x0415E210>
I need to compare it to an array of keys that look like:
[datastore_types.Key.from_path(u'Item', 100L, _app_id_namespace=u'items'),
..., datastore_types.Key.from_path(u'Item', 105L, _app_id_namespace=u'fitems')]
Note: I can get around the problem by querying for the stored objects, and then calling .key(), but this seems wasteful.
items = db.GqlQuery("SELECT * FROM Items")
keyArray = []
for item in items:
    keyArray.append(item.key())
Certainly - you can fetch the results by calling .fetch(count) on the GqlQuery object. This is the recommended way, in fact - iterating fetches results in batches, and so is less efficient.
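A keys-only query plus fetch() gives a plain list of Key objects that can be compared directly; a sketch, where other_keys stands in for the key array you already have:
keys = db.GqlQuery("SELECT __key__ FROM Items").fetch(1000)
# keys is now a list of db.Key objects, e.g. Key.from_path(u'Item', 100L, ...)
common = set(keys) & set(other_keys)    # other_keys: the existing array of keys
missing = set(other_keys) - set(keys)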