How can I write this with less redundancy/copy-paste? - python

I'm relatively new to Python, so I usually don't write my code in the best way. Someone asked me to make changes to a Django app because the code doesn't look so nice.
Here's what it looks like:
@login_required
def submission_set_rank(request):
    r1_obj_id = request.GET.get('rank1', '')
    r2_obj_id = request.GET.get('rank2', '')
    r3_obj_id = request.GET.get('rank3', '')
    r4_obj_id = request.GET.get('rank4', '')
    r5_obj_id = request.GET.get('rank5', '')
    # rate the first BallotStats object
    ballot_1 = BallotStats.objects.get(object_id=r1_obj_id)
    ballot_2 = BallotStats.objects.get(object_id=r2_obj_id)
    ballot_3 = BallotStats.objects.get(object_id=r3_obj_id)
    ballot_4 = BallotStats.objects.get(object_id=r4_obj_id)
    ballot_5 = BallotStats.objects.get(object_id=r5_obj_id)
    ballot_1.score += 5
    ballot_2.score += 4
    ballot_3.score += 3
    ballot_4.score += 2
    ballot_5.score += 1
    ballot_1.save()
    ballot_2.save()
    ballot_3.save()
    ballot_4.save()
    ballot_5.save()
    return HttpResponseRedirect('/submissions/results/film/')
As it turns out, I realized that I've always been writing my Python code this way. Is there a way to make it look better, instead of taking up 21+ lines of code?

The biggest problem is not the style of the code - it is that you are making 10 queries: 5 for getting the objects and 5 for updating them.
Fetch all the objects in a single query using __in:
@login_required
def submission_set_rank(request):
    points = {'rank1': 5, 'rank2': 4, 'rank3': 3, 'rank4': 2, 'rank5': 1}
    # map each submitted object id to the score it should gain
    scores = {request.GET.get(key, ''): value for key, value in points.items()}
    for ballot in BallotStats.objects.filter(object_id__in=scores):
        ballot.score += scores[str(ballot.object_id)]  # GET values are strings
        ballot.save()
    return HttpResponseRedirect('/submissions/results/film/')
This will make 6 queries at most: 1 for getting the objects and 5 for updating them.
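If the project is on Django 2.2 or newer (an assumption - the original answer predates it), bulk_update can cut this to two queries, reusing the scores mapping from above:
ballots = list(BallotStats.objects.filter(object_id__in=scores))
for ballot in ballots:
    ballot.score += scores[str(ballot.object_id)]
# one query to fetch, one to write all changed rows back
BallotStats.objects.bulk_update(ballots, ['score'])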
Also, you can mark the view with the commit_manually decorator (commit_on_success would also work for you); it should speed things up significantly:
@login_required
@transaction.commit_manually
def submission_set_rank(request):
    points = {'rank1': 5, 'rank2': 4, 'rank3': 3, 'rank4': 2, 'rank5': 1}
    scores = {request.GET.get(key, ''): value for key, value in points.items()}
    for ballot in BallotStats.objects.filter(object_id__in=scores):
        ballot.score += scores[str(ballot.object_id)]
        ballot.save()
    transaction.commit()
    return HttpResponseRedirect('/submissions/results/film/')
And I have the strong feeling that you can do this in a single batch of updates - for example, by using connection.cursor() directly with the help of executemany():
@login_required
def submission_set_rank(request):
    points = {'rank1': 5, 'rank2': 4, 'rank3': 3, 'rank4': 2, 'rank5': 1}
    # one parameter dict per UPDATE: the score delta and the target object id
    ranks = [{'score': value, 'id': request.GET.get(key, '')}
             for key, value in points.items()]
    cursor = connection.cursor()
    cursor.executemany("""
        UPDATE ballot_stats
        SET score = score + %(score)s
        WHERE object_id = %(id)s
    """, ranks)
    return HttpResponseRedirect('/submissions/results/film/')
Make sure the field and table names are correct.
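If you would rather stay in the ORM, the same idea can be expressed as a single UPDATE with conditional expressions. A sketch, assuming Django 1.8+ (where Case/When were added) and an integer score field:
from django.db.models import Case, F, IntegerField, Value, When

@login_required
def submission_set_rank(request):
    points = {'rank1': 5, 'rank2': 4, 'rank3': 3, 'rank4': 2, 'rank5': 1}
    scores = {request.GET.get(key, ''): value for key, value in points.items()}
    # one UPDATE ... SET score = score + CASE WHEN ... END statement
    BallotStats.objects.filter(object_id__in=scores).update(
        score=F('score') + Case(
            *[When(object_id=obj_id, then=Value(value))
              for obj_id, value in scores.items()],
            output_field=IntegerField(),
        )
    )
    return HttpResponseRedirect('/submissions/results/film/')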

In your case, a little bit of looping wouldn't hurt at all. In fact, as a general rule, whenever you have to repeat something more than twice, try to make it a loop.
n = 5
for i in range(1, n + 1):
    obj_id = request.GET.get('rank' + str(i), '')
    ballot = BallotStats.objects.get(object_id=obj_id)
    ballot.score += n - i + 1
    ballot.save()

If we're talking about saving lines of code, you can combine the four lines per rank into one line by replacing your .save() with a .update() and using an F() expression to take care of the +=. Also, as discussed by @alecxe, this will cut your queries in half. It'd look like this:
@login_required
def submission_set_rank(request):
    BallotStats.objects.filter(object_id=request.GET.get('rank1', '')).update(score=F('score') + 5)
    BallotStats.objects.filter(object_id=request.GET.get('rank2', '')).update(score=F('score') + 4)
    BallotStats.objects.filter(object_id=request.GET.get('rank3', '')).update(score=F('score') + 3)
    BallotStats.objects.filter(object_id=request.GET.get('rank4', '')).update(score=F('score') + 2)
    BallotStats.objects.filter(object_id=request.GET.get('rank5', '')).update(score=F('score') + 1)
    return HttpResponseRedirect('/submissions/results/film/')
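Since the five lines differ only in the rank number and the points, they can also be folded into the loop idiom from the first answer - a sketch combining the two suggestions:
@login_required
def submission_set_rank(request):
    for i in range(1, 6):
        obj_id = request.GET.get('rank%d' % i, '')
        BallotStats.objects.filter(object_id=obj_id).update(score=F('score') + (6 - i))
    return HttpResponseRedirect('/submissions/results/film/')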

Related

Python function with repeated logic - how to pass an argument that is not a string

def bulk_save_coordinates():
    index = 0
    coordinates_list = []
    for element in range(count_indexes()):
        coordinates = Coordinates(latitude=all_location_coordinates[index], location_id=index + 1, longitude=all_location_coordinates[index])
        coordinates_list.append(coordinates)
        index += 1
    session.add_all(coordinates_list)
    session.commit()

def bulk_save_timezones():
    index = 0
    timezones_list = []
    for element in range(count_indexes()):
        timezones = Timezone(offset=all_location_coordinates[index], location_id=index + 1, description=all_location_coordinates[index])
        timezones_list.append(timezones)
        index += 1
    session.add_all(timezones_list)
    session.commit()
Those are my functions. I need to use bulk_save_something a lot.
I can see the logic repeats itself; it is the same pattern each time. I would like to pass something into the function's arguments that is not a string.
Does anyone have an idea how to change that?
You can create a more generic function by passing in the parts that change - here the model class (Coordinates/Timezone) and a small callable that builds its fields:
def bulk_save(obj_class, make_fields):
    # make_fields(index) returns the keyword arguments that differ per class
    obj_list = []
    for index in range(count_indexes()):
        obj_list.append(obj_class(location_id=index + 1, **make_fields(index)))
    session.add_all(obj_list)
    session.commit()
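Hypothetical usage mirroring the two original functions (field names copied from the question; that both fields read from all_location_coordinates is kept as-is from the asker's code):
bulk_save(Coordinates, lambda i: {'latitude': all_location_coordinates[i],
                                  'longitude': all_location_coordinates[i]})
bulk_save(Timezone, lambda i: {'offset': all_location_coordinates[i],
                               'description': all_location_coordinates[i]})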

Wrong list output compared to what was expected

So I have to iterate through this list and divide the even numbers by 2 and multiply the odd ones by 3, but when I join the list together to print, it gives me [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]. I printed each value inside the loop to check whether it was an arithmetic error, but it prints the correct value. I have found out that the lambda rewrites the whole list every time it is called, so I'm trying to find other ways to do this while still using the map function. The constraint for the code is that it needs to be done using a map function. Here is a snippet of the code:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data_list1 = []
i = 0
while i < len(data):
    if (data[i] % 2) == 0:
        data_list1 = list(map(lambda a: a / 2, data))
        print(data_list1[i])
        i += 1
    else:
        data_list1 = list(map(lambda a: a * 3, data))
        print(data_list1[i])
        i += 1
print(list(data_list1))
Edit: Error has been fixed.
The easiest way for me to do this is as follows:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data_list1 = []
for i in range(len(data)):
    if (data[i] % 2) == 0:
        data_list1 = data_list1 + [int(data[i] / 2)]
    elif (data[i] % 2) == 1:  # a plain else: would also do, but only if every entry in data is an int
        data_list1 = data_list1 + [data[i] * 3]
print(data_list1)
In your case a for loop makes the code much easier to read, but a while loop works just as well.
In your original code the issue is your map() call. If you look into the documentation for it, you will see that map() applies the function to every item of the iterable. You do not want this; you want to change only the entry at the current position.
Edit: If you want to use lambda for some reason, here's a (pretty useless) way to do it:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data_list1 = []
for i in range(len(data)):
    if (data[i] % 2) == 0:
        x = lambda a: a / 2
        data_list1.append(x(data[i]))
    else:
        y = lambda a: a * 3
        data_list1.append(y(data[i]))
print(data_list1)
If you have additional design constraints, please specify them in your question, so we can help.
Edit 2: And once more unto the breach: since you added your constraints, here's how to do it with a mapping function:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def changer(a):
    if a % 2 == 0:
        return a / 2
    else:
        return a * 3

print(list(map(changer, data)))
If you want the result in a new list, just use data_list1 = list(map(changer, data)).
Hope this is what you were looking for!
You can format the string for the output like this:
print(','.join(["%2.1f " % (.5*x if (x%2)==0 else 3*x) for x in data]))
From your latest comment I completely edited the answer below (old version can be found in the edit-history of this post).
From your update, I see that your constraint is to use map. So let's address how this works:
map is a function which exists in many languages, and it might be surprising at first because it takes a function as an argument. One analogy could be: you give a craftsman (the "map" function) pieces of metal (the list of values) and a tool (the function passed into "map"), and you tell him to use the tool on each piece of metal and give you back the modified pieces.
If you hand him a hammer as tool, each piece of metal will have a dent in it.
If you hand him a scriber, each piece of metal will have a scratch in it.
If you hand him a forge as tool, each piece of metal will be returned molten.
A very important thing to understand is that map takes a complete list/iterable and returns a new iterable all by itself. map takes care of the looping so you don't have to.
The core thing to understand here is that "map" will take any list (or more precisely an "iterable") and will apply whatever function you give it to each item, returning the modified list (again, the return value is not really a list but a new "iterable").
So for example (using strings):
def scribe(piece_of_metal):
    """
    This function takes a string and appends " with a scratch" at the end.
    """
    return "%s with a scratch" % piece_of_metal

def hammer(piece_of_metal):
    """
    This function takes a string and appends " with a dent" at the end.
    """
    return "%s with a dent" % piece_of_metal

def forge(piece_of_metal):
    """
    This function takes a string and prepends it with "molten".
    """
    return "molten %s" % piece_of_metal

metals = ["iron", "gold", "silver"]
scribed_metals = map(scribe, metals)
dented_metals = map(hammer, metals)
molten_metals = map(forge, metals)

for row in scribed_metals:
    print(row)
for row in dented_metals:
    print(row)
for row in molten_metals:
    print(row)
I have deliberately not responded to the core of your question, as it is homework, but I hope this post gives you a practical example of using map which helps with the exercise.
Another, more practical example, saving data to disk
The above example is deliberately contrived to keep it simple, but it's not very practical. Here is another example which could actually be useful: storing documents on disk. We assume that we have a function fetch_documents which returns a list of strings, where the strings are the text content of the documents. We want to store those into .txt files. As filenames we will use the MD5 hash of the contents. The reason MD5 is chosen is to keep things simple: this way we still only require one argument to the "mapped" function, and it is sufficiently unique to avoid overwrites:
from assume_we_have import fetch_documents
from hashlib import md5

def store_document(contents):
    """
    Store the contents into a unique filename and return the generated filename.
    """
    hash = md5(contents.encode('utf-8'))  # md5 needs bytes in Python 3
    filename = '%s.txt' % hash.hexdigest()
    with open(filename, 'w') as outfile:
        outfile.write(contents)
    return filename

documents = fetch_documents()
stored_filenames = map(store_document, documents)
The last line which is using map could be replaced with:
stored_filenames = []
for document in documents:
    filename = store_document(document)
    stored_filenames.append(filename)

Python: Concatenate similar objects in List

I have a list containing strings of the form 'Country-Points'.
For example:
lst = ['Albania-10', 'Albania-5', 'Andorra-0', 'Andorra-4', 'Andorra-8', ...other countries...]
I want to calculate the average for each country without creating a new list. So the output would be (in the case above):
lst = ['Albania-7.5', 'Andorra-4.25', ...other countries...]
Would really appreciate it if anyone can help me with this.
EDIT:
this is what I've got so far. "data" is actually a dictionary, where the keys are countries and the values are lists of 'Country-Points' strings giving other countries' points for that country (the one used as key). Again, I'm new to Python, so I don't really know all the built-in functions.
for key in self.data:
    lst = []
    index = 0
    score = 0
    cnt = 0
    s = str(self.data[key][0]).split("-")[0]
    for i in range(len(self.data[key])):
        if s in self.data[key][i]:
            a = str(self.data[key][i]).split("-")
            score += int(float(a[1]))
            cnt += 1
            index += 1
        if i + 1 != len(self.data[key]) and not s in self.data[key][i + 1]:
            lst.append(s + "-" + str(float(score / cnt)))
            s = str(self.data[key][index]).split("-")[0]
            score = 0
    self.data[key] = lst
itertools.groupby with a suitable key function can help:
import itertools

def get_country_name(item):
    return item.split('-', 1)[0]

def get_country_value(item):
    return float(item.split('-', 1)[1])

def country_avg_grouper(lst):
    for ctry, group in itertools.groupby(lst, key=get_country_name):
        values = [get_country_value(c) for c in group]
        avg = sum(values) / len(values)
        yield '{country}-{avg}'.format(country=ctry, avg=avg)

lst[:] = country_avg_grouper(lst)
The key here is that I wrote a function to do the change out of place and then I can easily make the substitution happen in place by using slice assignment.
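One caveat: groupby only merges consecutive items, which is fine here because the list is already sorted by country. If the input might not have all of a country's entries adjacent, sort it first with the same key function:
lst.sort(key=get_country_name)
lst[:] = country_avg_grouper(lst)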
I would probably do this with an intermediate dictionary.
def country(s):
    return s.split('-')[0]

def value(s):
    return float(s.split('-')[1])

def country_average(lst):
    country_map = {}
    for point in lst:
        c = country(point)
        v = value(point)
        old = country_map.get(c, (0, 0))
        country_map[c] = (old[0] + v, old[1] + 1)
    return ['%s-%f' % (ctry, total / count)
            for (ctry, (total, count)) in country_map.items()]
It traverses the original list only once, at the expense of quite a few tuple allocations.
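A quick check with the list from the question (my own example; note that %f renders six decimal places):
lst = ['Albania-10', 'Albania-5', 'Andorra-0', 'Andorra-4', 'Andorra-8']
print(country_average(lst))
# ['Albania-7.500000', 'Andorra-4.000000']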

Python: Track latest message sequence ID without gaps

I'm writing a networking app in Python that receives numbered messages from a server. The messages have sequence numbers in the range 1..N and may come out of order. I want to track the latest message received conditioned on there being no gaps in the messages so far.
So for instance,
if the messages were 1,3,2 I would mark 3 as the latest ungapped message received.
If the messages were 1,2,5,4 I would mark 2 as the latest ungapped message received since I haven't yet received 3.
Once 3 comes in, I would mark 5 as the latest message received.
What's the most efficient way to do this? Is there some data structure or programming idiom that implements an algorithm to solve this problem?
I looked around a bit and didn't find a great answer immediately.
Here's a stab I took at doing this via a small bookkeeping class. A single handle_new_index call could technically be O(N), but the amortized time per call is still constant.
I don't think time complexity will get much better since you have to do an insert on some kind of data structure no matter what you do.
With billions of requests and a really wide spread the non_contiguous set can have a moderate memory footprint.
import random

class gapHandler:
    def __init__(self):
        self.greatest = 0
        self.non_contiguous = set()

    def handle_new_index(self, message_index):
        """
        Called when a new numbered request is sent. Updates the current
        index representing the greatest contiguous request.
        """
        self.non_contiguous.add(message_index)
        if message_index == self.greatest + 1:
            self._update_greatest()

    def _update_greatest(self):
        done_updating = False
        while done_updating is False:
            next_index = self.greatest + 1
            if next_index in self.non_contiguous:
                self.greatest = next_index
                self.non_contiguous.remove(next_index)
            else:
                done_updating = True

def demo_gap_handler():
    """ Runs the gapHandler class through a mock trial. """
    gh = gapHandler()
    for block_id in range(20000):
        start = block_id * 500 + 1
        end = (block_id + 1) * 500 + 1
        indices = [x for x in range(start, end)]
        random.shuffle(indices)
        while indices:
            new_index = indices.pop()
            gh.handle_new_index(new_index)
            if new_index % 50 == 0:
                print(gh.greatest)

if __name__ == "__main__":
    demo_gap_handler()
Here are some basic tests:
import unittest
import gaps

class testGaps(unittest.TestCase):
    def test_update_greatest(self):
        gh = gaps.gapHandler()
        gh.non_contiguous = set((2, 3, 4, 6))
        gh._update_greatest()
        self.assertEqual(gh.greatest, 0)
        gh.greatest = 1
        gh._update_greatest()
        self.assertEqual(gh.greatest, 4)

    def test_handle_new_index(self):
        gh = gaps.gapHandler()
        gh.non_contiguous = set((2, 3, 4, 6, 2000))
        gh.handle_new_index(7)
        self.assertEqual(gh.greatest, 0)
        gh.handle_new_index(1)
        self.assertEqual(gh.greatest, 4)
        gh.handle_new_index(5)
        self.assertEqual(gh.greatest, 7)

if __name__ == "__main__":
    unittest.main()
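An alternative sketch of the same idea (my own, not from the original post) swaps the set for a min-heap, so the smallest outstanding index is always at the front; duplicates and stale indices are simply dropped:
import heapq

class HeapGapHandler:
    def __init__(self):
        self.greatest = 0
        self.pending = []  # min-heap of indices received out of order

    def handle_new_index(self, message_index):
        heapq.heappush(self.pending, message_index)
        # absorb every index that extends the contiguous prefix
        while self.pending and self.pending[0] <= self.greatest + 1:
            idx = heapq.heappop(self.pending)
            if idx == self.greatest + 1:
                self.greatest = idx
            # anything <= self.greatest is a duplicate and is dropped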

Is there a faster way to get subtrees from tree-like structures in Python than the standard recursive one?

Let's assume the following data structure with two numpy arrays (id, parent_id); the parent_id of the root element is -1:
import numpy as np

class MyStructure(object):
    def __init__(self):
        """
        Default structure for now:
          1
         / \
        2   3
           / \
          4   5
        """
        self.ids = np.array([1, 2, 3, 4, 5])
        self.parent_ids = np.array([-1, 1, 1, 3, 3])

    def id_successors(self, idOfInterest):
        """
        Return logical index.
        """
        return self.parent_ids == idOfInterest

    def subtree(self, newRootElement):
        """
        Return logical index pointing to elements of the subtree.
        """
        init_vector = np.zeros(len(self.ids), bool)
        init_vector[np.where(self.ids == newRootElement)[0]] = 1
        if sum(self.id_successors(newRootElement)) == 0:
            return init_vector
        else:
            subtree_vec = init_vector
            for sucs in self.ids[self.id_successors(newRootElement) == 1]:
                subtree_vec += self.subtree(sucs)
            return subtree_vec
This gets really slow for many ids (>1000). Is there a faster way to implement it?
Have you tried the psyco module, if you are using Python 2.6? It can sometimes dramatically speed up code.
Have you considered a recursive data structure: the list?
Your example as a standard list would be:
[1, 2, [3, [4], [5]]]
or
[1, [2, None, None], [3, [4, None, None], [5, None, None]]]
By my pretty printer:
[1,
 [2, None, None],
 [3,
  [4, None, None],
  [5, None, None]]]
Subtrees are ready there; it only costs you some time to insert the values into the right tree. It is also worthwhile to check whether the heapq module fits your needs.
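As a small illustration (my own sketch, not part of the original answer), extracting a subtree from the [value, left, right] representation above is a short recursion:
def find_in_nested(tree, target):
    """Return the [value, left, right] node whose value equals target, or None."""
    if tree is None:
        return None
    value, left, right = tree
    if value == target:
        return tree
    return find_in_nested(left, target) or find_in_nested(right, target)

tree = [1, [2, None, None], [3, [4, None, None], [5, None, None]]]
print(find_in_nested(tree, 3))  # [3, [4, None, None], [5, None, None]]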
Also, Guido himself gives some insight on traversals and trees in http://python.org/doc/essays/graphs.html; maybe you are already aware of it.
Here is some advanced-looking tree stuff: the blist module, actually proposed for Python as a replacement for the basic list type, but rejected in that role.
I think it's not the recursion as such that's hurting you, but the multitude of very wide operations (over all elements) for every step. Consider:
init_vector[np.where(self.ids==newRootElement)[0]] = 1
That runs a scan through all elements, calculates the index of every matching element, then uses only the index of the first one. This particular operation is available as the method index for lists, tuples, and arrays - and faster there. If IDs are unique, init_vector is simply ids==newRootElement anyway.
if sum(self.id_successors(newRootElement))==0:
Again a linear scan of every element, then a reduction on the whole array, just to check if any matches are there. Use any for this type of operation, but once again we don't even need to do the check on all elements - "if newRootElement not in self.parent_ids" does the job, but it's not necessary as it's perfectly valid to do a for loop over an empty list.
Finally there's the last loop:
for sucs in self.ids[self.id_successors(newRootElement)==1]:
This time, an id_successors call is repeated, and then the result is compared to 1 needlessly. Only after that comes the recursion, making sure all the above operations are repeated (for different newRootElement) for each branch.
The whole code is a reversed traversal of a unidirectional tree. We have parents and need children. If we're to do wide operations such as numpy is designed for, we'd best make them count - and thus the only operation we care about is building a list of children per parent. That's not very hard to do with one iteration:
import collections

children = collections.defaultdict(list)
for i, p in zip(ids, parent_ids):
    children[p].append(i)

def subtree(i):
    # a list comprehension so the result is materialized in Python 3, where map is lazy
    return i, [subtree(c) for c in children[i]]
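With the example arrays from the question, a quick check of this sketch (the nested tuple is just how this representation renders a subtree):
ids = [1, 2, 3, 4, 5]
parent_ids = [-1, 1, 1, 3, 3]
# after building the children mapping as above:
print(subtree(3))  # (3, [(4, []), (5, [])])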
The exact structure you need will depend on more factors, such as how often the tree changes, how large it is, how much it branches, and how large and many subtrees you need to request. The dictionary+list structure above isn't terribly memory efficient, for instance. Your example is also sorted, which could make the operation even easier.
In theory, every algorithm can be written iteratively as well as recursively, but in practice this is a fallacy (like Turing-completeness): walking an arbitrarily nested tree via iteration is generally not feasible. I doubt there is much to optimize (at least you're modifying subtree_vec in place). Doing x on thousands of elements is inherently expensive, no matter whether you do it iteratively or recursively. At most a few micro-optimizations are possible on the concrete implementation, which will yield less than a 5% improvement. Your best bet would be caching/memoization, if you need the same data several times. Maybe someone has a fancy O(log n) algorithm for your specific tree structure up their sleeve; I don't even know if one is possible (I'd assume not, but tree manipulation isn't my staff of life).
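A minimal sketch of the memoization idea, assuming the tree does not change between queries (functools.lru_cache exists from Python 3.2 on):
from functools import lru_cache

structure = MyStructure()

@lru_cache(maxsize=None)
def cached_subtree(new_root):
    # tuple() makes the cached value immutable so repeated calls can share it
    return tuple(structure.subtree(new_root))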
This is my answer (written without access to your class, so the interface is slightly different, but I'm attaching it as-is so that you can test whether it is fast enough):
=======================file graph_array.py==========================
import collections
import numpy

def find_subtree(pids, subtree_id):
    N = len(pids)
    assert 1 <= subtree_id <= N
    subtreeids = numpy.zeros(pids.shape, dtype=bool)
    todo = collections.deque([subtree_id])
    iter = 0
    while todo:
        id = todo.popleft()
        assert 1 <= id <= N
        subtreeids[id - 1] = True
        sons = (pids == id).nonzero()[0] + 1
        # print 'id={0} sons={1} todo={2}'.format(id, sons, todo)
        todo.extend(sons)
        iter = iter + 1
        if iter > N:
            raise ValueError()
    return subtreeids
=======================file graph_array_test.py==========================
import numpy
from graph_array import find_subtree

def _random_graph(n, maxsons):
    import random
    pids = numpy.zeros(n, dtype=int)
    sons = numpy.zeros(n, dtype=int)
    available = []
    for id in xrange(1, n + 1):
        if available:
            pid = random.choice(available)
            sons[pid - 1] += 1
            if sons[pid - 1] == maxsons:
                available.remove(pid)
        else:
            pid = -1
        pids[id - 1] = pid
        available.append(id)
    assert sons.max() <= maxsons
    return pids

def verify_subtree(pids, subtree_id, subtree):
    ids = set(subtree.nonzero()[0] + 1)
    sons = set(ids) - set([subtree_id])
    fathers = set(pids[id - 1] for id in sons)
    leafs = set(id for id in ids if not (pids == id).any())
    rest = set(xrange(1, pids.size + 1)) - fathers - leafs
    assert fathers & leafs == set()
    assert fathers | leafs == ids
    assert ids & rest == set()

def test_linear_graph_gen(n, genfunc, maxsons):
    assert maxsons == 1
    pids = genfunc(n, maxsons)
    last = -1
    seen = set()
    for _ in xrange(pids.size):
        id = int((pids == last).nonzero()[0]) + 1
        assert id not in seen
        seen.add(id)
        last = id
    assert seen == set(xrange(1, pids.size + 1))

def test_case1():
    """
      1
     / \
    2   4
    |
    3
    """
    pids = numpy.array([-1, 1, 2, 1])
    subtrees = {1: [True, True, True, True],
                2: [False, True, True, False],
                3: [False, False, True, False],
                4: [False, False, False, True]}
    for id in xrange(1, 5):
        sub = find_subtree(pids, id)
        assert (sub == numpy.array(subtrees[id])).all()
        verify_subtree(pids, id, sub)

def test_random(n, genfunc, maxsons):
    pids = genfunc(n, maxsons)
    for subtree_id in numpy.arange(1, n + 1):
        subtree = find_subtree(pids, subtree_id)
        verify_subtree(pids, subtree_id, subtree)

def test_timing(n, genfunc, maxsons):
    import time
    pids = genfunc(n, maxsons)
    t = time.time()
    for subtree_id in numpy.arange(1, n + 1):
        subtree = find_subtree(pids, subtree_id)
    t = time.time() - t
    print 't={0}s = {1:.2}ms/subtree = {2:.5}ms/subtree/node '.format(
        t, t / n * 1000, t / n ** 2 * 1000),

def pytest_generate_tests(metafunc):
    if 'case' in metafunc.function.__name__:
        return
    ns = [1, 2, 3, 4, 5, 10, 20, 50, 100, 1000]
    if 'timing' in metafunc.function.__name__:
        ns += [10000, 100000, 1000000]
    for n in ns:
        func = _random_graph
        for maxsons in sorted(set([1, 2, 3, 4, 5, 10, (n + 1) // 2, n])):
            metafunc.addcall(
                funcargs=dict(n=n, genfunc=func, maxsons=maxsons),
                id='n={0} {1.__name__}/{2}'.format(n, func, maxsons))
            if 'linear' in metafunc.function.__name__:
                break
===================py.test --tb=short -v -s test_graph_array.py============
...
test_graph_array.py:72: test_timing[n=1000 _random_graph/1] t=13.4850590229s = 13.0ms/subtree = 0.013485ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/2] t=0.318281888962s = 0.32ms/subtree = 0.00031828ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/3] t=0.265519142151s = 0.27ms/subtree = 0.00026552ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/4] t=0.24147105217s = 0.24ms/subtree = 0.00024147ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/5] t=0.211434841156s = 0.21ms/subtree = 0.00021143ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/10] t=0.178458213806s = 0.18ms/subtree = 0.00017846ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/500] t=0.209936141968s = 0.21ms/subtree = 0.00020994ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/1000] t=0.245707988739s = 0.25ms/subtree = 0.00024571ms/subtree/node PASS
...
Here every subtree of every tree is taken, and the interesting value is the mean time to extract a tree: ~0.2ms per subtree, except for strictly linear trees. I'm not sure what is happening here.
