Set with expiration not working inside running Python script - python

I have this class called DecayingSet which is a deque with expiration
class DecayingSet:
def __init__(self, timeout): # timeout in seconds
from collections import deque
self.timeout = timeout
self.d = deque()
self.present = set()
def add(self, thing):
# Return True if `thing` not already in set,
# else return False.
result = thing not in self.present
if result:
self.present.add(thing)
self.d.append((time(), thing))
self.clean()
return result
def clean(self):
# forget stuff added >= `timeout` seconds ago
now = time()
d = self.d
while d and now - d[0][0] >= self.timeout:
_, thing = d.popleft()
self.present.remove(thing)
I'm trying to use it inside a running script, that connects to a streaming api.
The streaming api is returning urls that I am trying to put inside the deque to limit them from entering the next step of the program.
class CustomStreamListener(tweepy.StreamListener):
def on_status(self, status, include_entities=True):
longUrl = status.entities['urls'][0]['expanded_url']
limit = DecayingSet(86400)
l = limit.add(longUrl)
print l
if l == False:
pass
else:
r = requests.get("http://api.some.url/show?url=%s"% longUrl)
When i use this class in an interpreter, everything is good.
But when the script is running, and I repeatedly send in the same url, l returns True every time indicating that the url is not inside the set, when is supposed to be. What gives?

Copying my comment ;-) I think the indentation is screwed up, but it looks like you're creating a brand new limit object every time on_status() is called. Then of course it would always return True: you'd always be starting with an empty limit.
Regardless, change this:
l = limit.add(longUrl)
print l
if l == False:
pass
else:
r = requests.get("http://api.some.url/show?url=%s"% longUrl)
to this:
if limit.add(longUrl):
r = requests.get("http://api.some.url/show?url=%s"% longUrl)
Much easier to follow. It's usually the case that when you're comparing something to a literal True or False, the code can be made more readable.
Edit
i just saw in the interpreter the var assignment is the culprit.
How would I use the same obj?
You could, for example, create the limit object at the module level. Cut and paste ;-)

Related

multiprocessing a function with parameters that are iterated through

I'm trying to improve the speed of my program and I decided to use multiprocessing!
the problem is I can't seem to find any way to use the pool function (i think this is what i need) to use my function
here is the code that i am dealing with:
def dataLoading(output):
name = ""
link = ""
upCheck = ""
isSuccess = ""
for i in os.listdir():
with open(i) as currentFile:
data = json.loads(currentFile.read())
try:
name = data["name"]
link = data["link"]
upCheck = data["upCheck"]
isSuccess = data["isSuccess"]
except:
print("error in loading data from config: improper naming or formating used")
output[name] = [link, upCheck, isSuccess]
#working
def userCheck(link, user, isSuccess):
link = link.replace("<USERNAME>", user)
isSuccess = isSuccess.replace("<USERNAME>", user)
html = requests.get(link, headers=headers)
page_source = html.text
count = page_source.count(isSuccess)
if count > 0:
return True
else:
return False
I have a parent function to run these two together but I don't think i need to show the whole thing, just the part that gets the data iteratively:
for i in configData:
data = configData[i]
link = data[0]
print(link)
upCheck = data[1] #just for future use
isSuccess = data[2]
if userCheck(link, username, isSuccess) == True:
good.append(i)
you can see how I enter all of the data in there, how would I be able to use multiprocessing to do this when I am iterating through the dictionary to collect multiple parameters?
I like to use mp.Pool().map. I think it is easiest and most straight forward and handles most multiprocessing cases. So how does map work? For starts, we have to keep in mind that mp creates workers, each worker receives a copy of the namespace (ya the whole thing), then each worker works on what they are assigned and returns. Hence, doing something like "updating a global variable" while they work, doesn't work; since they are each going to receive a copy of the global variable and none of the workers are going to be communicating. (If you want communicating workers you need to use mp.Queue's and such, it gets complicated). Anyway, here is using map:
from multiprocessing import Pool
t = 'abcd'
def func(s):
return t[int(s)]
results = Pool().map(func,range(4))
Each worker received a copy of t, func, and the portion of range(4) they were assigned. They are then automatically tracked and everything is cleaned up in the end by Pool.
Something like your dataLoading won't work very well, we need to modify it. I also cleaned the code a little.
def loadfromfile(file):
data = json.loads(open(file).read())
items = [data.get(k,"") for k in ['name','link','upCheck','isSuccess']]
return items[0],items[1:]
output = dict(Pool().map(loadfromfile,os.listdir()))

How to slow down asynchrounous API calls to match API limits?

I have a list of ~300K URLs for an API i need to get data from.
The API limit is 100 calls per second.
I have made a class for the asynchronous but this is working to fast and I am hitting an error on the API.
How do I slow down the asynchronous, so that I can make 100 calls per second?
import grequests
lst = ['url.com','url2.com']
class Test:
def __init__(self):
self.urls = lst
def exception(self, request, exception):
print ("Problem: {}: {}".format(request.url, exception))
def async(self):
return grequests.map((grequests.get(u) for u in self.urls), exception_handler=self.exception, size=5)
def collate_responses(self, results):
return [x.text for x in results]
test = Test()
#here we collect the results returned by the async function
results = test.async()
response_text = test.collate_responses(results)
The first step that I took was to create an object who can distribute a maximum of n coins every t ms.
import time
class CoinsDistribution:
"""Object that distribute a maximum of maxCoins every timeLimit ms"""
def __init__(self, maxCoins, timeLimit):
self.maxCoins = maxCoins
self.timeLimit = timeLimit
self.coin = maxCoins
self.time = time.perf_counter()
def getCoin(self):
if self.coin <= 0 and not self.restock():
return False
self.coin -= 1
return True
def restock(self):
t = time.perf_counter()
if (t - self.time) * 1000 < self.timeLimit:
return False
self.coin = self.maxCoins
self.time = t
return True
Now we need a way of forcing function to only get called if they can get a coin.
To do that we can write a decorator function that we could use like that:
#limitCalls(callLimit=1, timeLimit=1000)
def uniqFunctionRequestingServer1():
return 'response from s1'
But sometimes, multiple functions are calling requesting the same server so we would want them to get coins from the the same CoinsDistribution object.
Therefor, another use of the decorator would be by supplying the CoinsDistribution object:
server_2_limit = CoinsDistribution(3, 1000)
#limitCalls(server_2_limit)
def sendRequestToServer2():
return 'it worked !!'
#limitCalls(server_2_limit)
def sendAnOtherRequestToServer2():
return 'it worked too !!'
We now have to create the decorator, it can take either a CoinsDistribution object or enough data to create a new one.
import functools
def limitCalls(obj=None, *, callLimit=100, timeLimit=1000):
if obj is None:
obj = CoinsDistribution(callLimit, timeLimit)
def limit_decorator(func):
#functools.wraps(func)
def limit_wrapper(*args, **kwargs):
if obj.getCoin():
return func(*args, **kwargs)
return 'limit reached, please wait'
return limit_wrapper
return limit_decorator
And it's done ! Now you can limit the number of calls any API that you use and you can build a dictionary to keep track of your CoinsDistribution objects if you have to manage a lot of them (to differrent API endpoints or to different APIs).
Note: Here I have choosen to return an error message if there are no coins available. You should adapt this behaviour to your needs.
You can just keep track of how much time has passed and decide if you want to do more requests or not.
This will print 100 numbers per second, for example:
from datetime import datetime
import time
start = datetime.now()
time.sleep(1);
counter = 0
while (True):
end = datetime.now()
s = (end-start).seconds
if (counter >= 100):
if (s <= 1):
time.sleep(1) # You can keep track of the time and sleep less, actually
start = datetime.now()
counter = 0
print(counter)
counter += 1
This other question in SO shows exactly how to do this. By the way, what you need is usually called throttling.

Conditional if in asynchronous python program with twisted

I'm creating a program that uses the Twisted module and callbacks.
However, I keep having problems because the asynchronous part goes wrecked.
I have learned (also from previous questions..) that the callbacks will be executed at a certain point, but this is unpredictable.
However, I have a certain program that goes like
j = calc(a)
i = calc2(b)
f = calc3(c)
if s:
combine(i, j, f)
Now the boolean s is set by a callback done by calc3. Obviously, this leads to an undefined error because the callback is not executed before the s is needed.
However, I'm unsure how you SHOULD do if statements with asynchronous programming using Twisted. I've been trying many different things, but can't find anything that works.
Is there some way to use conditionals that require callback values?
Also, I'm using VIFF for secure computations (which uses Twisted): VIFF
Maybe what you're looking for is twisted.internet.defer.gatherResults:
d = gatherResults([calc(a), calc2(b), calc3(c)])
def calculated((j, i, f)):
if s:
return combine(i, j, f)
d.addCallback(calculated)
However, this still has the problem that s is undefined. I can't quite tell how you expect s to be defined. If it is a local variable in calc3, then you need to return it so the caller can use it.
Perhaps calc3 looks something like this:
def calc3(argument):
s = bool(argument % 2)
return argument + 1
So, instead, consider making it look like this:
Calc3Result = namedtuple("Calc3Result", "condition value")
def calc3(argument):
s = bool(argument % 2)
return Calc3Result(s, argument + 1)
Now you can rewrite the calling code so it actually works:
It's sort of unclear what you're asking here. It sounds like you know what callbacks are, but if so then you should be able to arrive at this answer yourself:
d = gatherResults([calc(a), calc2(b), calc3(c)])
def calculated((j, i, calc3result)):
if calc3result.condition:
return combine(i, j, calc3result.value)
d.addCallback(calculated)
Or, based on your comment below, maybe calc3 looks more like this (this is the last guess I'm going to make, if it's wrong and you'd like more input, then please actually share the definition of calc3):
def _calc3Result(result, argument):
if result == "250":
# SMTP Success response, yay
return Calc3Result(True, argument)
# Anything else is bad
return Calc3Result(False, argument)
def calc3(argument):
d = emailObserver("The argument was %s" % (argument,))
d.addCallback(_calc3Result)
return d
Fortunately, this definition of calc3 will work just fine with the gatherResults / calculated code block immediately above.
You have to put if in the callback. You may use Deferred to structure your callback.
As stated in previous answer - the preocessing logic should be handled in callback chain, below is simple code demonstration how this could work. C{DelayedTask} is a dummy implementation of a task which happens in the future and fires supplied deferred.
So we first construct a special object - C{ConditionalTask} which takes care of storring the multiple results and servicing callbacks.
calc1, calc2 and calc3 returns the deferreds, which have their callbacks pointed to C{ConditionalTask}.x_callback.
Every C{ConditionalTask}.x_callback does a call to C{ConditionalTask}.process which checks if all of the results have been registered and fires on a full set.
Additionally - C{ConditionalTask}.c_callback sets a flag of wheather or not the data should be processed at all.
from twisted.internet import reactor, defer
class DelayedTask(object):
"""
Delayed async task dummy implementation
"""
def __init__(self,delay,deferred,retVal):
self.deferred = deferred
self.retVal = retVal
reactor.callLater(delay, self.on_completed)
def on_completed(self):
self.deferred.callback(self.retVal)
class ConditionalTask(object):
def __init__(self):
self.resultA=None
self.resultB=None
self.resultC=None
self.should_process=False
def a_callback(self,result):
self.resultA = result
self.process()
def b_callback(self,result):
self.resultB=result
self.process()
def c_callback(self,result):
self.resultC=result
"""
Here is an abstraction for your "s" boolean flag, obviously the logic
normally would go further than just setting the flag, you could
inspect the result variable and do other strange stuff
"""
self.should_process = True
self.process()
def process(self):
if None not in (self.resultA,self.resultB,self.resultC):
if self.should_process:
print 'We will now call the processor function and stop reactor'
reactor.stop()
def calc(a):
deferred = defer.Deferred()
DelayedTask(5,deferred,a)
return deferred
def calc2(a):
deferred = defer.Deferred()
DelayedTask(5,deferred,a*2)
return deferred
def calc3(a):
deferred = defer.Deferred()
DelayedTask(5,deferred,a*3)
return deferred
def main():
conditional_task = ConditionalTask()
dFA = calc(1)
dFB = calc2(2)
dFC = calc3(3)
dFA.addCallback(conditional_task.a_callback)
dFB.addCallback(conditional_task.b_callback)
dFC.addCallback(conditional_task.c_callback)
reactor.run()

How can I refer to a function not by name in its definition in python?

I am maintaining a little library of useful functions for interacting with my company's APIs and I have come across (what I think is) a neat question that I can't find the answer to.
I frequently have to request large amounts of data from an API, so I do something like:
class Client(object):
def __init__(self):
self.data = []
def get_data(self, offset = 0):
done = False
while not done:
data = get_more_starting_at(offset)
self.data.extend(data)
offset += 1
if not data:
done = True
This works fine and allows me to restart the retrieval where I left off if something goes horribly wrong. However, since python functions are just regular objects, we can do stuff like:
def yo():
yo.hi = "yo!"
return None
and then we can interrogate yo about its properties later, like:
yo.hi => "yo!"
my question is: Can I rewrite my class-based example to pin the data to the function itself, without referring to the function by name. I know I can do this by:
def get_data(offset=0):
done = False
get_data.data = []
while not done:
data = get_more_starting_from(offset)
get_data.data.extend(data)
offset += 1
if not data:
done = True
return get_data.data
but I would like to do something like:
def get_data(offset=0):
done = False
self.data = [] # <===== this is the bit I can't figure out
while not done:
data = get_more_starting_from(offset)
self.data.extend(data) # <====== also this!
offset += 1
if not data:
done = True
return self.data # <======== want to refer to the "current" object
Is it possible to refer to the "current" object by anything other than its name?
Something like "this", "self", or "memememe!" is what I'm looking for.
I don't understand why you want to do this, but it's what a fixed point combinator allows you to do:
import functools
def Y(f):
#functools.wraps(f)
def Yf(*args):
return inner(*args)
inner = f(Yf)
return Yf
#Y
def get_data(f):
def inner_get_data(*args):
# This is your real get data function
# define it as normal
# but just refer to it as 'f' inside itself
print 'setting get_data.foo to', args
f.foo = args
return inner_get_data
get_data(1, 2, 3)
print get_data.foo
So you call get_data as normal, and it "magically" knows that f means itself.
You could do this, but (a) the data is not per-function-invocation, but per function (b) it's much easier to achieve this sort of thing with a class.
If you had to do it, you might do something like this:
def ybother(a,b,c,yrselflambda = lambda: ybother):
yrself = yrselflambda()
#other stuff
The lambda is necessary, because you need to delay evaluation of the term ybother until something has been bound to it.
Alternatively, and increasingly pointlessly:
from functools import partial
def ybother(a,b,c,yrself=None):
#whatever
yrself.data = [] # this will blow up if the default argument is used
#more stuff
bothered = partial(ybother, yrself=ybother)
Or:
def unbothered(a,b,c):
def inbothered(yrself):
#whatever
yrself.data = []
return inbothered, inbothered(inbothered)
This last version gives you a different function object each time, which you might like.
There are almost certainly introspective tricks to do this, but they are even less worthwhile.
Not sure what doing it like this gains you, but what about using a decorator.
import functools
def add_self(f):
#functools.wraps(f)
def wrapper(*args,**kwargs):
if not getattr(f, 'content', None):
f.content = []
return f(f, *args, **kwargs)
return wrapper
#add_self
def example(self, arg1):
self.content.append(arg1)
print self.content
example(1)
example(2)
example(3)
OUTPUT
[1]
[1, 2]
[1, 2, 3]

Why is this recursive statement wrong?

This is a bank simulation that takes into account 20 different serving lines with a single queue, customers arrive following an exponential rate and they are served during a time that follows a normal probability distribution with mean 40 and standard deviation 20.
Things were working just fine till I decided to exclude the negative values given by the normal distribution using this method:
def getNormal(self):
normal = normalvariate(40,20)
if (normal>=1):
return normal
else:
getNormal(self)
Am I screwing up the recursive call? I don't get why it wouldn't work. I have changed the getNormal() method to:
def getNormal(self):
normal = normalvariate(40,20)
while (normal <=1):
normal = normalvariate (40,20)
return normal
But I'm curious on why the previous recursive statement gets busted.
This is the complete source code, in case you're interested.
""" bank21: One counter with impatient customers """
from SimPy.SimulationTrace import *
from random import *
## Model components ------------------------
class Source(Process):
""" Source generates customers randomly """
def generate(self,number):
for i in range(number):
c = Customer(name = "Customer%02d"%(i,))
activate(c,c.visit(tiempoDeUso=15.0))
validateTime=now()
if validateTime<=600:
interval = getLambda(self)
t = expovariate(interval)
yield hold,self,t #esta es la rata de generaciĆ³n
else:
detenerGeneracion=999
yield hold,self,detenerGeneracion
class Customer(Process):
""" Customer arrives, is served and leaves """
def visit(self,tiempoDeUso=0):
arrive = now() # arrival time
print "%8.3f %s: Here I am "%(now(),self.name)
yield (request,self,counter),(hold,self,maxWaitTime)
wait = now()-arrive # waiting time
if self.acquired(counter):
print "%8.3f %s: Waited %6.3f"%(now(),self.name,wait)
tiempoDeUso=getNormal(self)
yield hold,self,tiempoDeUso
yield release,self,counter
print "%8.3f %s: Completed"%(now(),self.name)
else:
print "%8.3f %s: Waited %6.3f. I am off"%(now(),self.name,wait)
## Experiment data -------------------------
maxTime = 60*10.5 # minutes
maxWaitTime = 12.0 # minutes. maximum time to wait
## Model ----------------------------------
def model():
global counter
#seed(98989)
counter = Resource(name="Las maquinas",capacity=20)
initialize()
source = Source('Source')
firstArrival= expovariate(20.0/60.0) #chequear el expovariate
activate(source,
source.generate(number=99999),at=firstArrival)
simulate(until=maxTime)
def getNormal(self):
normal = normalvariate(40,20)
if (normal>=1):
return normal
else:
getNormal(self)
def getLambda (self):
actualTime=now()
if (actualTime <=60):
return 20.0/60.0
if (actualTime>60)and (actualTime<=120):
return 25.0/60.0
if (actualTime>120)and (actualTime<=180):
return 40.0/60.0
if (actualTime>180)and (actualTime<=240):
return 30.0/60.0
if (actualTime>240)and (actualTime<=300):
return 35.0/60.0
if (actualTime>300)and (actualTime<=360):
return 42.0/60.0
if (actualTime>360)and (actualTime<=420):
return 50.0/60.0
if (actualTime>420)and (actualTime<=480):
return 55.0/60.0
if (actualTime>480)and (actualTime<=540):
return 45.0/60.0
if (actualTime>540)and (actualTime<=600):
return 10.0/60.0
## Experiment ----------------------------------
model()
I think you want
return getnormal(self)
instead of
getnormal(self)
If the function exits without hitting a return statement, then it returns the special value None, which is a NoneType object - that's why Python complains about a 'NoneType.' The abs() function wants a number, and it doesn't know what to do with a None.
Also, you could avoid recursion (and the cost of creating a new stack frame) by using
def getNormal(self):
normal = 0
while normal < 1:
normal = normalvariate(40,20)
return normal
I am not entirely sure, but I think you need to change your method to the following:
def getNormal(self):
normal = normalvariate(40,20)
if (normal>=1):
return normal
else:
return getNormal(self)
You need to have:
return getNormal(self)
instead of
getNormal(self)
Really though, there's no need for recursion:
def getNormal(self):
normal = 0
while normal < 1:
normal = normalvariate(40,20)
return normal

Categories

Resources