Python recursive function variable scope [duplicate] - python

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
“Least Astonishment” in Python: The Mutable Default Argument
I'm using MailSnake in Python, a wrapper for the MailChimp API.
Now I'm getting some curious behaviour from a function I've written to pull the lists of subscribers we have. This is the code I'm using:
from mailsnake import MailSnake
from mailsnake.exceptions import *

ms = MailSnake('key here')

def return_members(status, list_id, members=[], start=0, limit=15000, done=0):
    temp_list = ms.listMembers(status=status, id=list_id, start=start, limit=limit, since='2000-01-01 01:01:01')
    for item in temp_list['data']:  # Add latest pulled data to our list
        members.append(item)
    done = limit + done
    if done < temp_list['total']:  # Continue if we have yet to pull the full list
        start = start + 1
        if limit > (temp_list['total'] - done):  # Restrict how many more results we get if we are on the penultimate page
            limit = temp_list['total'] - done
        print 'Making another API call to get complete list'
        return_members(status, list_id, members, start, limit, done)
    return members
for id in lists:
    unsubs = return_members('subscribed', id)
    for person in unsubs:
        print person['email']
print 'Finished getting information'
So this function runs recursively until we have pulled all members from a given list.
But what I've noticed is that the variable unsubs seems to just get bigger and bigger: when return_members is called with different list IDs, I get an amalgamation of the emails from every list I have called so far (rather than just the one list).
If I call return_members('subscribed', id, []), which explicitly gives it a fresh list, then it's fine. But I don't see why I need to do this: when I call the function with a different list ID it's not running recursively, and since I haven't specified the members variable, it should default to [].
I think this may be a quirk of python, or I've just missed something!

The infamous SO question linked above, by Martijn, will help you understand the underlying issue, but to get this sorted out you can change the following loop
for item in temp_list['data']:  # Add latest pulled data to our list
    members.append(item)
to a more pythonic version
members = members + temp_list['data'] # Add latest pulled data to our list
this small change would ensure that you are working with an instance different from the one passed as the parameter to return_members

Try replacing:
def return_members (status, list_id, members = [], start = 0, limit = 15000, done = 0):
with:
def return_members(status, list_id, members=None, start=0, limit=15000, done=0):
    if members is None:
        members = []
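A minimal, self-contained sketch of the underlying gotcha (independent of MailSnake): the default list is created once, at function definition time, and is shared by every call that omits the argument.

```python
def collect(item, bucket=[]):  # default list is created ONCE, when def runs
    bucket.append(item)
    return bucket

first = collect('a')
second = collect('b')   # same list object as `first`!
print(second)           # ['a', 'b'], not ['b']

def collect_fixed(item, bucket=None):
    if bucket is None:  # a fresh list is created on every call
        bucket = []
    bucket.append(item)
    return bucket

print(collect_fixed('b'))  # ['b']
```

Note the `is None` check rather than `if not bucket:` — the latter would also replace an empty list the caller passed in explicitly.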

Related

Iterate over Python list with clear code - rewriting functions

I've followed a tutorial to write a Flask REST API and have a question about some of the Python code.
The offered code is the following:
# data_list is where my objects are stored
def put_one(name):
    list_by_id = [list for list in data_list if list['name'] == name]
    list_by_id[0]['name'] = [new_name]
    print({'list_by_id': list_by_id[0]})
It works, which is nice, and even though I understand what line 2 is doing, I would like to rewrite it so that it's clear how the function iterates over the different lists. I already have an approach, but it returns KeyError: 0
def put(name):
    list_by_id = []
    list = []
    for list in data_list:
        if(list['name'] == name):
            list_by_id = list
    list_by_id[0]['name'] = request.json['name']
    return jsonify({'list_by_id': list_by_id[0]})
My goal with this is also to be able to put other elements, that don't necessarily have the type 'name'. If I get to rewrite the function in an other way I'll be more likely to adapt it to my needs.
I've looked for tools to convert one way of coding into the other and answers in forums before coming here and couldn't find it.
It may not be beautiful code, but it gets the job done:
def put(value):
    for i in range(len(data_list)):
        key_list = list(data_list[i].keys())
        if data_list[i][key_list[0]] == value:
            print(f"old value: {key_list[0], data_list[i][key_list[0]]}")
            data_list[i][key_list[0]] = request.json[test_key]
            print(f"new value: {key_list[0], data_list[i][key_list[0]]}")
            break
Now it doesn't matter what the key is: with this iteration the method only changes the value when it finds a match in data_list. Before, the code broke at every iteration because the keys were different and they played a role.
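For what it's worth, the KeyError: 0 in the question's version comes from assigning the matching dict itself to list_by_id, so list_by_id[0] becomes a dictionary lookup for the key 0 rather than list indexing. A minimal sketch (with made-up sample data, and plain arguments instead of Flask's request object) that keeps the explicit loop but appends matches to a list:

```python
data_list = [{'name': 'groceries'}, {'name': 'chores'}]  # sample data for illustration

def put(name, new_name):
    matches = []                   # collect matching dicts instead of overwriting
    for entry in data_list:
        if entry['name'] == name:
            matches.append(entry)  # matches[0] is now list indexing, not a dict lookup
    if matches:
        matches[0]['name'] = new_name
    return matches

print(put('chores', 'laundry'))  # [{'name': 'laundry'}]
```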

Project structures for making API calls in python

I am doing my first personal project and I don't know what is the best way to structure my project. It is to create a pennystock screener that screens for stocks that gap up overnight and then filter them based on market cap, premarket volume, price, public float etc.
Right now I've written separate functions that each does its own little thing. I have a function that connects to the REST endpoint to retrieve data, one that uses the data to return a list of stocks that gap up over 50%, and then ones that filter the list of gap ups and return a dictionary of filtered tickers (e.g. ticker name, its float, whether it satisfies the condition of the filter). Then I'd use the value in that dictionary to create another filter (like float rotations based on the values of float that I've stored in the float dictionary). Finally, I have a main function that get the final list of stock which satisfies all the conditions.
The problem is that I don't know how to structure the project better. What I have right now is very inefficient. Take the following snippet as an example: for ticker names, I need to call the connect_REST function, and then for top gainers, I'm calling the ticker names. Then for the float filter, I'm calling the top gainers. For the float-rotation filter, I need to call the float filter function. Finally, with the main function, I need to call everything... I'm making API calls in every function. It's making too many requests and I'd like to store the data after I make the requests.
I don’t know whether to use nested functions, or make global variables, or to write the functions in separate files and then import. Also, I am confused about whether I need to create classes.
def connect_REST(data):
    return data

def get_ticker_names(ticker_names, US_listed):
    data = []
    data = connect_REST(data)
    return ticker_names, US_listed

# filter by percent change since last close to get top gainers (>= 50% gap-up pre market)
def get_top_gainer(top_gainer_list):
    return top_gainer_list

# Condition 2: Float > 2M and < 30M
def filter_SharesFloat(floatData, backup_list2):
    top_gainer_list = []
    top_gainer_list = get_top_gainer(top_gainer_list)
    floatData = {}
    backup_list2 = []
    print(floatData)
    print("Float data not available: ", backup_list2)
    return floatData, backup_list2

def filter_float_rotation():
    top_gainer_list = []
    predicted_intra_volume = []
    predicted_intra_volume = get_predicted_intra_volume()
    top_gainer_list = get_top_gainer(top_gainer_list)
    floatData = {}
    floatData = filter_SharesFloat(floatData)
    for ticker in top_gainer_list:
        floatRotations = predictedVolume / floatData[ticker]['float']
        if floatRotations < 1:
            cond_5 = True
        else:
            cond_5 = False
    return floatRotationData

def main():
    # get ticker that satisfies condition 1, 2, 3, 4, 5
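One common way out of the repeated-calls problem (sketched here with made-up names and data, not the poster's actual endpoint) is to fetch the raw data once, cache it, and pass it down a pipeline of filter functions that do no network work themselves:

```python
class Screener:
    """Fetch market data once, then run every filter against the cached copy."""

    def __init__(self, fetch):
        self.fetch = fetch          # injected function that does the real REST call
        self._data = None

    @property
    def data(self):
        if self._data is None:      # only the first access hits the network
            self._data = self.fetch()
        return self._data

    def top_gainers(self, threshold=0.5):
        return [t for t in self.data if t['gap'] >= threshold]

    def float_filter(self, tickers, low=2e6, high=30e6):
        return [t for t in tickers if low < t['float'] < high]

def fake_fetch():  # stands in for the real REST endpoint
    return [{'name': 'ABC', 'gap': 0.6, 'float': 5e6},
            {'name': 'XYZ', 'gap': 0.1, 'float': 1e6}]

s = Screener(fake_fetch)
gainers = s.top_gainers()
print([t['name'] for t in s.float_filter(gainers)])  # ['ABC']
```

Each filter takes its input as a parameter and returns a plain list, so main() becomes a chain of calls over data that was downloaded exactly once.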

how to iterate array of dictionary without loop using django?

This is my scenario: I have 30 records in an array of dictionaries in Django. I tried to iterate over it and it works fine, but it takes around one minute. How do I reduce the iteration time? I tried the map function but it's not working. How do I fix this? I will share my example code.
Example Code
def find_places():
    data = [{'a':1},{'a':2},{'a':3},{'a':4},{'a':5},{'a':6},{'a':7},{'a':8}]
    places = []
    for p in range(1, len(data)):
        a = p.a
        try:
            s1 = sample.object.filter(a=a)
        except:
            s1 = sample(a=a)
            s1.save()
        plac = {id: s1.id,
                a: s1.a}
        places.append(plac)
    return places

find_places()
I need an efficient way to iterate the array of objects in python without a loop.
You can filter outside the loop and run get_or_create instead of reverting to an object creation if the filter doesn't match.
data_a = [d['a'] for d in data]
samples = sample.objects.filter(a__in=data_a)
places = []
for a in data_a:
    s1, created = samples.get_or_create(
        a=a
    )
    place = {'id': s1.id, 'a': s1.a}
    places.append(place)
You can try this: create a list, then save it all at once:
def find_places():
    data = [{'a':1},{'a':2},{'a':3},{'a':4},{'a':5},{'a':6},{'a':7},{'a':8}]
    places = []
    lst = []
    for p in data:
        a = p['a']
        lst.append(a)  # store it, to save all at once later
Then try to store it in the database. You can search: How to store a list into a Model in Django.
I only made changes to the loop part of the code; if the database side also fails, let me know.
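As a sketch of the "save at once" idea: collect the plain values first, with no database work inside the loop, then hand the whole batch to the database layer in a single call. (The bulk_create line is shown as a comment because it assumes a Django model named sample with an a field, as in the question.)

```python
data = [{'a': 1}, {'a': 2}, {'a': 3}]

# Collect the plain values first (no database work inside the loop)...
lst = [p['a'] for p in data]

# ...then insert every row with one query instead of one save() per iteration.
# With a real Django model this would be:
#     sample.objects.bulk_create([sample(a=v) for v in lst])
print(lst)  # [1, 2, 3]
```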

Adding multiple elements to list in dictionary

So I made this method to set parameters from a text file:
def set_params(self, params, previous_response=None):
    if len(params) > 0:
        param_value_list = params.split('&')
        self.params = {
            param_value.split()[0]: json.loads(previous_response.decode())[param_value.split()[1]] if
            param_value.split()[0] == 'o' and previous_response else param_value.split()[1]
            for param_value in param_value_list
        }
When I call this method, for example like this:
apiRequest.set_params("lim 5 & status active")
# now self.params = {"lim": 5, "status": "active"}
it works well. Now I want to be able to add the same parameter multiple times, and when that happens, set the param like a list:
apiRequest.set_params("lim 5 & status active & status = other")
# I want this: self.params = {"lim": 5, "status": ["active", "other"]}
How can I modify this method beautifully? All I can think of is kinda ugly... I am new with python
Just write it as simple and straightforward as you can. That is usually the best approach. In my code, below, I made one change to your requirements: all values are a list, some may have just one element in the list.
In this method I apply the following choices and techniques:
decode and parse the previous response only once, not every time it is referenced
start with an empty dictionary
split each string only once: this is faster because it avoids redundant operations and memory allocations, and (even more importantly) it is easier to read because the code is not repetitive
adjust the value according to the special-case
use setdefault() to obtain the current list of values, if present, or set a new empty list object if it is not present
append the new value to the list of values
def set_params(self, params, previous_response=None):
    if len(params) <= 0:
        return
    previous_data = json.loads(previous_response.decode()) if previous_response else None
    self.params = {}
    for param_value in params.split('&'):
        key, value = param_value.split()
        if key == 'o' and previous_response:
            value = previous_data[value]
        values = self.params.setdefault(key, [])
        values.append(value)
    # end set_params()
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
— Brian W. Kernighan and P. J. Plauger in The Elements of Programming Style.
Reference: http://quotes.cat-v.org/programming/
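A quick usage sketch of this approach, made standalone with a minimal stand-in class (and space-separated pairs such as "status other", since each key and value are split on whitespace):

```python
import json

class ApiRequest:  # minimal stand-in for the poster's request class
    def set_params(self, params, previous_response=None):
        if len(params) <= 0:
            return
        previous_data = json.loads(previous_response.decode()) if previous_response else None
        self.params = {}
        for param_value in params.split('&'):
            key, value = param_value.split()
            if key == 'o' and previous_response:
                value = previous_data[value]
            self.params.setdefault(key, []).append(value)

req = ApiRequest()
req.set_params("lim 5 & status active & status other")
print(req.params)  # {'lim': ['5'], 'status': ['active', 'other']}
```

Note that, per the answer's stated design choice, every value is a list, even when only one value was given for a key.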

sort/graph traversal order of a list of objects which "depend" on each other [duplicate]

I'm trying to work out if my problem is solvable using the builtin sorted() function or if I need to do it myself. Old school, using cmp, it would have been relatively easy.
My data-set looks like:
x = [
    ('business', Set('fleet','address'))
    ('device', Set('business','model','status','pack'))
    ('txn', Set('device','business','operator'))
    ....
The sort rule should be, basically: for all values of N and Y where Y > N, x[N][0] not in x[Y][1].
Although I'm using Python 2.6 where the cmp argument is still available I'm trying to make this Python 3 safe.
So, can this be done using some lambda magic and the key argument?
-== UPDATE ==-
Thanks Eli & Winston! I didn't really think using key would work, or if it could, I suspected it would be a shoehorn solution, which isn't ideal.
Because my problem was database table dependencies, I had to make a minor addition to Eli's code to remove an item from its own list of dependencies (in a well-designed database this wouldn't happen, but who lives in that magical perfect world?)
My Solution:
def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, set(names of dependencies))`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(set((name,)), emitted)  # <-- pop self from deps, req Py2.6
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name)  # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            raise ValueError("cyclic dependency detected: %s %r" % (name, (next_pending,)))
        pending = next_pending
        emitted = next_emitted
What you want is called a topological sort. While it's possible to implement using the builtin sort(), it's rather awkward, and it's better to implement a topological sort directly in python.
Why is it going to be awkward? If you study the two algorithms on the wiki page, they both rely on a running set of "marked nodes", a concept that's hard to contort into a form sort() can use, since key=xxx (or even cmp=xxx) works best with stateless comparison functions, particularly because timsort doesn't guarantee the order the elements will be examined in. I'm (pretty) sure any solution which does use sort() is going to end up redundantly calculating some information for each call to the key/cmp function, in order to get around the statelessness issue.
The following is the algorithm I've been using (to sort some javascript library dependencies):
edit: reworked this greatly, based on Winston Ewert's solution
def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, [list of dependencies])`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]  # copy deps so we can modify set in-place
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted)  # remove deps we emitted last pass
            if deps:  # still has deps? recheck during next pass
                next_pending.append(entry)
            else:  # no more deps? time to emit
                yield name
                emitted.append(name)  # <-- not required, but helps preserve original ordering
                next_emitted.append(name)  # remember what we emitted for difference_update() in next pass
        if not next_emitted:  # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted
Sidenote: it is possible to shoe-horn a cmp() function into key=xxx, as outlined in this python bug tracker message.
I do a topological sort something like this:
def topological_sort(items):
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if dependencies.issubset(provided):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:
            raise TopologicalSortFailure()
        items = remaining_items
I think it's a little more straightforward than Eli's version; I don't know about efficiency.
Overlooking the bad formatting and this strange Set type... (I've kept them as tuples and delimited the list items correctly...) ... and using the networkx library to make things convenient...
x = [
    ('business', ('fleet','address')),
    ('device', ('business','model','status','pack')),
    ('txn', ('device','business','operator'))
]

import networkx as nx

g = nx.DiGraph()
for key, vals in x:
    for val in vals:
        g.add_edge(key, val)
print nx.topological_sort(g)
This is Winston's suggestion, with a docstring and a tiny tweak, reversing dependencies.issubset(provided) with provided.issuperset(dependencies). That change permits you to pass the dependencies in each input pair as an arbitrary iterable rather than necessarily a set.
My use case involves a dict whose keys are the item strings, with the value for each key being a list of the item names on which that key depends. Once I've established that the dict is non-empty, I can pass its iteritems() to the modified algorithm.
Thanks again to Winston.
def topological_sort(items):
    """
    'items' is an iterable of (item, dependencies) pairs, where 'dependencies'
    is an iterable of the same type as 'items'.

    If 'items' is a generator rather than a data structure, it should not be
    empty. Passing an empty generator for 'items' (zero yields before return)
    will cause topological_sort() to raise TopologicalSortFailure.

    An empty iterable (e.g. list, tuple, set, ...) produces no items but
    raises no exception.
    """
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:
            raise TopologicalSortFailure()
        items = remaining_items
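A quick end-to-end check of Winston's variant on the question's table data, made standalone: the leaf tables (which the question elides) are filled in with empty dependency lists for illustration, and a plain ValueError stands in for the undefined TopologicalSortFailure.

```python
def topological_sort(items):
    # Winston's variant from above, with ValueError standing in for TopologicalSortFailure.
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:
            raise ValueError("cyclic or missing dependency: %r" % remaining_items)
        items = remaining_items

deps = {
    'business': ['fleet', 'address'],
    'device': ['business', 'model', 'status', 'pack'],
    'txn': ['device', 'business', 'operator'],
    # hypothetical leaf tables with no dependencies of their own
    'fleet': [], 'address': [], 'model': [], 'status': [],
    'pack': [], 'operator': [],
}
order = list(topological_sort(deps.items()))
print(order)  # every table appears after all of its dependencies
```

Because issuperset accepts any iterable, the dependencies here can stay as plain lists, which is exactly the tweak the last answer describes.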
