I am trying to simulate a situation where we have 5 machines arranged in a 1 -> 3 -> 1 configuration, i.e. the 3 in the middle operate in parallel to reduce the effective time they take.
I can easily simulate this by creating a SimPy resource with a capacity of three, like this:
simpy.Resource(env, capacity=3)
However, in my situation each of the three resources operates slightly differently, and sometimes I want to be able to use any of them (when I'm operating) or book a specific one (when I want to clean). Basically the three machines slowly foul up at different rates and operate more slowly as they do; I want to simulate this and also enable a clean to occur when one gets too dirty.
I have tried a few ways of simulating this but have come up with problems and issues every time.
The first was that, when it booked the resource, it also set one of three global flags for the machines (A, B, C) plus a flag telling the process which machine it was using. This works, but it isn't clean and makes it really difficult to understand what is occurring, with huge if statements everywhere.
The second was to model it as three separate resources and then try to wait and request one of the 3 machines with something like:
reqA = A.res.request()
reqB = B.res.request()
reqC = C.res.request()
unitnumber = yield reqA | reqB | reqC
yield env.process(batch_op(env, name, machineA, machineB, machineC, unitnumber))
But this doesn't work, and I can't work out the best way to yield on a choice of resources.
What would be the best way to simulate this scenario? For completeness, here is what I'm looking for:
Request any of 3 machines
Request a specific machine
Have each machine track its history
Have each machine's characteristics be different, i.e. one fouls up faster but works faster initially
Detect and schedule a clean based on the performance or indicator
This is what I have so far in my latest attempt at modelling each machine as a separate resource:
import simpy
import numpy as np

class Machine(object):
    def __init__(self, env, cycletime, cleantime, k1foul, k2foul):
        self.env = env
        self.res = simpy.Resource(env, 1)
        self.cycletime = cycletime
        self.cleantime = cleantime
        self.k1foul = k1foul
        self.k2foul = k2foul
        self.batchessinceclean = 0

    def operate(self):
        self.cycletime = self.cycletime + self.k2foul * np.log(self.k1foul * self.batchessinceclean + 1)
        self.batchessinceclean += 1
        yield self.env.timeout(self.cycletime)

    def clean(self):
        print('begin cleaning at %s' % self.env.now)
        self.batchessinceclean = 0
        yield self.env.timeout(self.cleantime)
        print('finished cleaning at %s' % self.env.now)
You should try (Filter)Store:
import simpy

def user(machine):
    # Request any machine
    m = yield machine.get()
    print(m)
    yield machine.put(m)

    # Request a specific machine by id
    m = yield machine.get(lambda m: m['id'] == 1)
    print(m)
    yield machine.put(m)

    # Request a machine by an attribute, e.g. health
    m = yield machine.get(lambda m: m['health'] > 98)
    print(m)
    yield machine.put(m)

env = simpy.Environment()
machine = simpy.FilterStore(env, 3)
machine.put({'id': 0, 'health': 100})
machine.put({'id': 1, 'health': 95})
machine.put({'id': 2, 'health': 97.2})

env.process(user(machine))
env.run()
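Building on that, here is a minimal sketch of how the Machine objects from the question could live in a FilterStore, so a batch can take any free machine while a cleaner waits for one specific machine once it is dirty enough. The batch/cleaner processes, the machine names, and the fouling threshold below are illustrative assumptions, not part of your model:

import simpy
import numpy as np

class Machine:
    def __init__(self, env, name, cycletime, cleantime, k1foul, k2foul):
        self.env = env
        self.name = name
        self.cycletime = cycletime
        self.cleantime = cleantime
        self.k1foul = k1foul
        self.k2foul = k2foul
        self.batchessinceclean = 0

    def operate(self):
        # Fouling slows the machine down a little more after each batch
        self.cycletime += self.k2foul * np.log(self.k1foul * self.batchessinceclean + 1)
        self.batchessinceclean += 1
        yield self.env.timeout(self.cycletime)

    def clean(self):
        self.batchessinceclean = 0
        yield self.env.timeout(self.cleantime)

def batch(env, store):
    # Take whichever machine is free, run one batch, put it back
    m = yield store.get()
    yield env.process(m.operate())
    yield store.put(m)

def cleaner(env, store, name, threshold=5):
    # Wait until the named machine is idle and dirty enough, then clean it
    while True:
        m = yield store.get(lambda m: m.name == name and m.batchessinceclean >= threshold)
        yield env.process(m.clean())
        yield store.put(m)

env = simpy.Environment()
store = simpy.FilterStore(env, capacity=3)
for name, ct in [('A', 1.0), ('B', 1.2), ('C', 0.9)]:
    store.put(Machine(env, name, cycletime=ct, cleantime=3.0, k1foul=0.5, k2foul=0.1))

for _ in range(20):
    env.process(batch(env, store))
for name in 'ABC':
    env.process(cleaner(env, store, name))
env.run(until=100)

Because the store re-evaluates the get filters whenever a machine is put back, the cleaner picks up its machine as soon as it is both idle and past the fouling threshold.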
Hey guys, I have a script that compares each possible pair of users and checks how similar their texts are:
dictionary = {
    t.id: (
        t.text,
        t.set,
        t.compare_string
    )
    for t in dataframe.itertuples()
}

highly_similar = []

for a, b in itertools.combinations(dictionary.items(), 2):
    if a[1][2] == b[1][2] and not a[1][1].isdisjoint(b[1][1]):
        similarity_score = fuzz.ratio(a[1][0], b[1][0])
        if (similarity_score >= 95 and len(a[1][0]) >= 10) or similarity_score == 100:
            highly_similar.append([a[0], b[0], a[1][0], b[1][0], similarity_score])
This script takes around 15 minutes to run. The dataframe contains 120k users, so comparing each possible combination takes quite a bit of time; if I just write pass in the for loop, it takes 2 minutes to loop through all values.
I tried using filter() and map() for the if statements and fuzzy score but the performance was worse. I tried improving the script as much as I could but I don't know how I can improve this further.
Would really appreciate some help!
It is slightly complicated to reason about the data since you have not attached it, but we can see multiple places that might provide an improvement:
First, let's rewrite the code in a way which is easier to reason about than using the indices:
dictionary = {
    t.id: (
        t.text,
        t.set,
        t.compare_string
    )
    for t in dataframe.itertuples()
}

highly_similar = []

for a, b in itertools.combinations(dictionary.items(), 2):
    a_id, (a_text, a_set, a_compare_string) = a
    b_id, (b_text, b_set, b_compare_string) = b
    if (a_compare_string == b_compare_string
            and not a_set.isdisjoint(b_set)):
        similarity_score = fuzz.ratio(a_text, b_text)
        if ((similarity_score >= 95 and len(a_text) >= 10)
                or similarity_score == 100):
            highly_similar.append(
                [a_id, b_id, a_text, b_text, similarity_score])
You seem to only care about pairs having the same compare_string value. Therefore, and assuming this is not something that all pairs share, we can key by that value to cover far fewer pairs.
To put some numbers on it, say you have 120K inputs and 1K items for each distinct compare_string value. Then instead of covering roughly 120K * 120K = 14 * 10^9 combinations, you would have 120 bins of size 1K (where in each bin we'd need to check all pairs), i.e. 120 * 1K * 1K = 120 * 10^6, which is about 1000 times faster. And it would be even faster if each bin has fewer than 1K elements.
import collections

# Create a dictionary from compare_string to all items
# with the same compare_string
items_by_compare_string = collections.defaultdict(list)
for item in dictionary.items():
    compare_string = item[1][2]
    items_by_compare_string[compare_string].append(item)

# Iterate over each group of items that have the same
# compare string
for item_group in items_by_compare_string.values():
    # Check pairs only within that group
    for a, b in itertools.combinations(item_group, 2):
        a_id, (a_text, a_set, _) = a
        b_id, (b_text, b_set, _) = b
        # No need to compare the compare_strings!
        if not a_set.isdisjoint(b_set):
            similarity_score = fuzz.ratio(a_text, b_text)
            if ((similarity_score >= 95 and len(a_text) >= 10)
                    or similarity_score == 100):
                highly_similar.append(
                    [a_id, b_id, a_text, b_text, similarity_score])
But, what if we want more speed? Let's look at the remaining operations:
We have a check to find if two sets share at least one item
This seems like an obvious candidate for optimization if we have any knowledge about these sets (to allow us to determine which pairs are even relevant to compare)
Without additional knowledge, and just looking at every two pairs and trying to speed this up, I doubt we can do much - this check is probably highly optimized using internal details of Python sets, so I don't think we can optimize it further
We have a fuzz.ratio computation, which is an external function that I'm going to assume is heavy
If you are using this from the FuzzyWuzzy package, make sure to install python-Levenshtein to get the speedups detailed here
We have some comparisons which we are unlikely to be able to speed up
We might be able to cache the length of a_text by nesting the two loops, but that's negligible
We have appends to a list, which runs on average ("amortized") constant time per operation, so we can't really speed that up
Therefore, I don't think we can reasonably suggest any more speedups without additional knowledge. If we know something about the sets that can help optimize which pairs are relevant we might be able to speed things up further, but I think this is about it.
EDIT: As pointed out in other answers, you can obviously run the code in multi-threading. I assumed you were looking for an algorithmic change that would possibly reduce the number of operations significantly, instead of just splitting these over more CPUs.
Essentially, from the Python programming side, I see two things that can improve your processing time:
Multi-threads and Vectorized operations
From the fuzzy score side, here is a list of tips you can use to improve your processing time (new anonymous tab to avoid paywall):
https://towardsdatascience.com/fuzzy-matching-at-scale-84f2bfd0c536
Using multithreading you can speed up your operation by up to N times, where N is the number of threads in your CPU. You can check it with:
import multiprocessing
multiprocessing.cpu_count()
Using vectorized operations you can parallel-process your operations at a low level with SIMD (single instruction / multiple data) operations, or with GPU tensor operations (like those in TensorFlow/PyTorch).
Here is a small comparison of results for each case:
import numpy as np
import time

A = [np.random.rand(512) for i in range(2000)]
B = [np.random.rand(512) for i in range(2000)]

high_similarity = []

def measure(i, j, a, b, high_similarity):
    d = ((a - b) ** 2).sum()
    if d > 12:
        high_similarity.append((i, j, d))

start_single_thread = time.time()
for i in range(len(A)):
    for j in range(len(B)):
        if i < j:
            measure(i, j, A[i], B[j], high_similarity)
finish_single_thread = time.time()

print("single thread time:", finish_single_thread - start_single_thread)

out[0] single thread time: 147.64517450332642
Running on multiple threads:
from threading import Thread

high_similarity = []

def measure(a=None, b=None, high_similarity=None):
    d = ((a - b) ** 2).sum()
    if d > 12:
        high_similarity.append(d)

start_multi_thread = time.time()
for i in range(len(A)):
    for j in range(len(B)):
        if i < j:
            thread = Thread(target=measure, kwargs={'a': A[i], 'b': B[j], 'high_similarity': high_similarity})
            thread.start()
            thread.join()
finish_multi_thread = time.time()

print("time to run on multi threads:", finish_multi_thread - start_multi_thread)

out[1] time to run on multi-threads: 11.946279764175415
A_array = np.array(A)
B_array = np.array(B)

start_vectorized = time.time()
for i in range(len(A_array)):
    # vectorized distance operation
    dists = (A_array - B_array) ** 2
    high_similarity += dists[dists > 12].tolist()
    aux = B_array[-1]
    np.delete(B_array, -1)
    np.insert(B_array, 0, aux)
finish_vectorized = time.time()

print("time to run vectorized operations:", finish_vectorized - start_vectorized)

out[2] time to run vectorized operations: 2.302949905395508
Note that you can't guarantee any order of execution, so you will also need to store the index of each result. The snippet of code is just to illustrate that you can use parallel processing, but I highly recommend using a pool of threads: divide your dataset into N subsets, one per worker, and join the final results (instead of creating a thread for each function call like I did), as in the sketch below.
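A minimal sketch of that pool-of-workers idea using the standard library's concurrent.futures; the chunking and the measure_chunk function are illustrative and operate on the toy arrays above, not your real data:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

A = np.random.rand(2000, 512)
B = np.random.rand(2000, 512)

def measure_chunk(i_range):
    # Each worker handles a contiguous range of i indices and
    # returns (i, j, distance) tuples, so result order doesn't matter.
    out = []
    for i in i_range:
        d = ((A[i] - B) ** 2).sum(axis=1)  # distance from A[i] to every row of B
        for j in np.nonzero(d > 12)[0]:
            if i < int(j):
                out.append((int(i), int(j), float(d[j])))
    return out

n_workers = 4
chunks = np.array_split(np.arange(len(A)), n_workers)

high_similarity = []
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    for part in pool.map(measure_chunk, chunks):
        high_similarity.extend(part)

Each worker returns its own list, so no locking is needed, and the indices are preserved regardless of which worker finishes first.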
Intro:
Hello. I am exploring the Python rxpy library for my use case, where I am building an execution pipeline using reactive programming concepts. This way I expect I would not have to manipulate too much state. Though my solution seems to be functional, I am having trouble trying to compose a new Observable from other Observables.
The problem is that the way I am composing my observables is causing some expensive calculations to be repeated twice. For performance, I really want to prevent triggering expensive calculations.
I am very new to reactive programming. I have been scratching my head and have looked through internet resources and reference documentation, which seem a little too terse for me to grasp. Please advise.
Following is a toy example which illustrates what I am doing:
import rx
from rx import operators as op
from rx.subject import Subject

root = Subject()

foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r)))
)

bar_foo = foo.pipe(
    op.map(lambda x: x * 2),
    op.do_action(lambda r: print("bar(foo(x)) = %s" % str(r)))
)

bar_foo.pipe(
    op.zip(foo),
    op.map(lambda i: i[0] + i[1]),
    op.do_action(lambda r: print("foo(x) + bar(foo(x)) = %s" % str(r)))
).subscribe()

print("-------------")
root.on_next(10)
print("-------------")
Output:
-------------
foo(x) = 11 (expensive)
bar(foo(x)) = 22
foo(x) = 11 (expensive)
foo(x) + bar(foo(x)) = 33
-------------
You could think of foo() and bar() as expensive and complex operations. I first build an observable foo, then compose a new observable bar_foo that incorporates foo. Later, both are zipped together to calculate the final result foo(x) + bar(foo(x)).
Question:
What can I do to prevent foo() from getting triggered more than once for a single input?
I have really strong reasons to keep foo() and bar() separate. Also, I do not want to explicitly memoize foo().
Could anyone with experience using rxpy in production share their experiences? Will using rxpy lead to better performance or to slowdowns compared to equivalent hand-crafted (but unmaintainable) code?
Adding op.share() right after the expensive calculation in the foo pipeline could be useful here. So changing the foo pipeline to:
foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r))),
    op.share()  # added to pipeline
)
will result in:
-------------
foo(x) = 11 (expensive)
bar(foo(x)) = 22
foo(x) + bar(foo(x)) = 33
-------------
I believe that .share() makes the emitted events of the expensive operation shared among downstream subscribers, so that the result of a single expensive calculation can be used multiple times.
Regarding your second question: I am new to RxPy as well, so I am interested in the answers of more experienced users. Until now, I've noticed that as a beginner you can easily create (bad) pipelines where messages and calculations are repeated in the background. .share() seems to reduce this to some extent, but I'm not sure about what is happening in the background.
I am using the threading library to accelerate calculating each point's neighborhood in a point cloud, by calling the function CalculateAllPointsNeighbors shown at the bottom of the post.
The function receives a search radius, a maximum number of neighbors, and a number of threads to split the work over. No changes are made to any of the points, and each point stores its data in its own np.ndarray cell, accessed by its own index.
The following function times how long it takes N number of threads to finish calculating all points neighborhoods:
def TimeFuncThreads(classObj, uptothreads):
    listTimers = []
    startNum = 1
    EndNum = uptothreads + 1
    for i in range(startNum, EndNum):
        print("Current Number of Threads to Test: ", i)
        tempT = time.time()
        classObj.CalculateAllPointsNeighbors(searchRadius=0.05, maxNN=25, maxThreads=i)
        tempT = time.time() - tempT
        listTimers.append(tempT)

    PlotXY(np.arange(startNum, EndNum), listTimers)
The problem is, I've been getting very different results in each run. Here are the plots from 5 consecutive runs of TimeFuncThreads (the X axis is the number of threads, Y is the runtime). First, they look totally random; second, there is no significant speedup.
I'm now confused about whether I'm using the threading library wrong, and what is causing this behavior.
The function that handles the threading and the function that is being called from each thread:
def CalculateAllPointsNeighbors(self, searchRadius=0.20, maxNN=50, maxThreads=8):
    threadsList = []
    pointsIndices = np.arange(self.numberOfPoints)
    splitIndices = np.array_split(pointsIndices, maxThreads)
    for i in range(maxThreads):
        threadsList.append(threading.Thread(target=self.GetPointsNeighborsByID,
                                            args=(splitIndices[i], searchRadius, maxNN)))
    [t.start() for t in threadsList]
    [t.join() for t in threadsList]

def GetPointsNeighborsByID(self, idx, searchRadius=0.05, maxNN=20):
    if isinstance(idx, int):
        idx = [idx]
    for currentPointIndex in idx:
        currentPoint = self.pointsOpen3D.points[currentPointIndex]
        pointNeighborhoodObject = self.GetPointNeighborsByCoordinates(currentPoint, searchRadius, maxNN)
        self.pointsNeighborsArray[currentPointIndex] = pointNeighborhoodObject
        self.__RotatePointNeighborhood(currentPointIndex)
It pains me to be the one to introduce you to the Python GIL. It is a very nice feature that makes parallelism using threads in Python a nightmare: the Global Interpreter Lock lets only one thread execute Python bytecode at a time, so CPU-bound work does not get faster with more threads.
If you really want to improve your code's speed, you should be looking at the multiprocessing module.
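A rough sketch of what that could look like with multiprocessing.Pool, under the assumption that the per-point work can be expressed as a plain function and that the point data is available in every worker process; the brute-force radius search below is a stand-in for your Open3D/KD-tree query, not your actual method:

import numpy as np
from multiprocessing import Pool

# Toy point cloud; in practice this would be your loaded point data,
# which each worker process must also be able to load or inherit.
POINTS = np.random.rand(10000, 3)

def neighbors_for_indices(args):
    # Worker: compute neighborhoods for a chunk of point indices and
    # return (index, neighbor_indices) pairs instead of mutating shared state.
    indices, search_radius, max_nn = args
    results = []
    for i in indices:
        d = np.linalg.norm(POINTS - POINTS[i], axis=1)
        order = np.argsort(d)
        neighbors = order[(d[order] <= search_radius) & (order != i)][:max_nn]
        results.append((int(i), neighbors))
    return results

def calculate_all_points_neighbors(search_radius=0.05, max_nn=25, workers=8):
    chunks = np.array_split(np.arange(len(POINTS)), workers)
    tasks = [(chunk, search_radius, max_nn) for chunk in chunks]
    all_neighbors = [None] * len(POINTS)
    with Pool(processes=workers) as pool:
        for chunk_result in pool.map(neighbors_for_indices, tasks):
            for i, neighbors in chunk_result:
                all_neighbors[i] = neighbors
    return all_neighbors

if __name__ == "__main__":
    all_neighbors = calculate_all_points_neighbors()

Unlike threads, worker processes cannot write into your object's pointsNeighborsArray directly, so each worker returns its chunk of results and the parent assembles them.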
I am trying to run gensim WMD similarity faster. Typically, this is what is in the docs:
Example corpus:
my_corpus = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

my_query = 'Human and artificial intelligence software programs'
my_tokenized_query = ['human', 'artificial', 'intelligence', 'software', 'programs']
model is a Word2Vec model trained on about 100,000 documents similar to my_corpus:

from gensim.models import Word2Vec
from gensim.similarities import WmdSimilarity

model = Word2Vec.load(word2vec_model)

def init_instance(my_corpus, model, num_best):
    instance = WmdSimilarity(my_corpus, model, num_best=1)
    return instance

instance[my_tokenized_query]
the best matched document is "Human machine interface for lab abc computer applications" which is great.
However, the instance query above takes an extremely long time. So I thought of breaking the corpus up into N parts, doing WMD on each with num_best = 1, and at the end taking the part with the max score as the most similar.
from multiprocessing import Process, Queue, Manager

def main(my_query, global_jobs, process_tmp):
    process_query = gensim.utils.simple_preprocess(my_query)

    def worker(num, process_query, return_dict):
        instance = init_instance(
            my_corpus[num * chunk + 1:num * chunk + chunk], model, 1)
        x = instance[process_query][0][0]
        y = instance[process_query][0][1]
        return_dict[x] = y

    manager = Manager()
    return_dict = manager.dict()
    for num in range(num_workers):
        process_tmp = Process(target=worker, args=(num, process_query, return_dict))
        global_jobs.append(process_tmp)
        process_tmp.start()
    for proc in global_jobs:
        proc.join()

    return_dict = dict(return_dict)
    ind = max(return_dict.iteritems(), key=operator.itemgetter(1))[0]
    print corpus[ind]
>>> "Graph minors A survey"
The problem I have with this is that, even though it outputs something, it doesn't give me a good similar document from my corpus, even though it takes the max similarity over all the parts.
Am I doing something wrong?
Comment: chunk is a static variable: e.g. chunk = 600 ...
If you define chunk statically, then you have to compute num_workers:
10001 / 600 = 16.67, rounded up to 17 num_workers
It's common to use no more processes than you have cores.
If you have 17 cores, that's ok.
Since the number of cores is fixed, you should instead do:
num_workers = os.cpu_count()
chunk = chunksize(my_corpus, num_workers)
This doesn't give the same result, so I changed it to:
#process_query = gensim.utils.simple_preprocess(my_query)
process_query = my_tokenized_query
All workers return result indexes in the range 0..n.
Therefore, return_dict[x] could be overwritten by a later worker that has the same index but a lower value. The index in return_dict is NOT the same as the index in my_corpus. Changed to:
#return_dict[x] = y
return_dict[ (num * chunk)+x ] = y
Using +1 when computing the chunk slice will skip the first document of each chunk.
I don't know how you compute chunk; consider this example:
def chunksize(iterable, num_workers):
    c_size, extra = divmod(len(iterable), num_workers)
    if extra:
        c_size += 1
    if len(iterable) == 0:
        c_size = 0
    return c_size

# Usage
chunk = chunksize(my_corpus, num_workers)
...
#my_corpus_chunk = my_corpus[num*chunk+1:num*chunk+chunk]
my_corpus_chunk = my_corpus[num * chunk:(num + 1) * chunk]
Results: 10 runs, tuple = (index from worker num=0, index from worker num=1)
With multiprocessing, with chunk=5:
02,09:(3, 8), 01,03:(3, 5):
System and human system engineering testing of EPS
04,06,07:(0, 8), 05,08:(0, 5), 10:(0, 7):
Human machine interface for lab abc computer applications
Without multiprocessing, with chunk=5:
01:(3, 6), 02:(3, 5), 05,08,10:(3, 7), 07,09:(3, 8):
System and human system engineering testing of EPS
03,04,06:(0, 5):
Human machine interface for lab abc computer applications
Without multiprocessing, without chunking:
01,02,03,04,06,07,08:(3, -1):
System and human system engineering testing of EPS
05,09,10:(0, -1):
Human machine interface for lab abc computer applications
Tested with Python: 3.4.2
Using Python 2.7:
I used threading instead of multi-processing.
In the WMD-Instance creation thread, I do something like this:
wmd_instances = []
if wmd_instance_count > len(wmd_corpus):
    wmd_instance_count = len(wmd_corpus)
chunk_size = int(len(wmd_corpus) / wmd_instance_count)
for i in range(0, wmd_instance_count):
    if i == wmd_instance_count - 1:
        wmd_instance = WmdSimilarity(wmd_corpus[i * chunk_size:], wmd_model, num_results)
    else:
        wmd_instance = WmdSimilarity(wmd_corpus[i * chunk_size:(i + 1) * chunk_size], wmd_model, num_results)
    wmd_instances.append(wmd_instance)
wmd_logic.setWMDInstances(wmd_instances, chunk_size)
'wmd_instance_count' is the number of threads to use for searching. I also remember the chunk size. Then, when I want to search for something, I start 'wmd_instance_count' threads, and they return the found sims:
def perform_query_for_job_on_instance(wmd_logic, wmd_instances, query, jobID, instance):
    wmd_instance = wmd_instances[instance]
    sims = wmd_instance[query]
    wmd_logic.set_mt_thread_result(jobID, instance, sims)
'wmd_logic' is the instance of a class that then does this:
def set_mt_thread_result(self, jobID, instance, sims):
    res = []
    #
    # We need to scale the found ids back to our complete corpus size...
    #
    for sim in sims:
        aSim = (int(sim[0] + (instance * self.chunk_size)), sim[1])
        res.append(aSim)
I know the code isn't nice, but it works. It uses 'wmd_instance_count' threads to find results; I aggregate them and then choose the top 10 or so.
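For completeness, here is a minimal sketch of the part only described in words above: starting one thread per WMD instance and picking the best hits afterwards. The helper names mirror the snippets above, but the get_results_for_job() accessor and the aggregation details are assumptions:

import threading

def query_all_instances(wmd_logic, wmd_instances, query, jobID, top_n=10):
    # One thread per WmdSimilarity instance; each thread writes its
    # (rescaled) sims into wmd_logic via set_mt_thread_result().
    threads = []
    for instance in range(len(wmd_instances)):
        t = threading.Thread(target=perform_query_for_job_on_instance,
                             args=(wmd_logic, wmd_instances, query, jobID, instance))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

    # Assumed helper: collect all (corpus_index, score) pairs stored for this job
    all_sims = wmd_logic.get_results_for_job(jobID)
    # Highest similarity first, keep the top_n hits
    return sorted(all_sims, key=lambda s: s[1], reverse=True)[:top_n]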
Hope this helps.
I have this code in Pylons that calculates the network usage of the Linux system on which the webapp runs. Basically, to calculate the network utilization, we need to read the file /proc/net/dev twice, which gives us the amount of transmitted data, and divide the difference between the values by the time elapsed between the two reads.
I don't want to do this calculation at regular intervals. There is JS code which periodically fetches this data, and the transfer rate is the average amount of transmitted bytes between two requests per time unit. In Pylons, I used pylons.app_globals to store the reading, which is subtracted from the next reading on the subsequent request. But apparently there is no app_globals in Pyramid, and I'm not sure whether using thread locals is the correct course of action. Also, although request.registry.settings is apparently shared across all requests, I'm reluctant to store my data there, as the name implies it should only store settings.
def netUsage():
    netusage = {'rx': 0, 'tx': 0, 'time': time.time()}
    rtn = {}
    net_file = open('/proc/net/dev')
    for line in net_file.readlines()[2:]:
        tmp = map(string.atof, re.compile('\d+').findall(line[line.find(':'):]))
        if line[:line.find(':')].strip() == "lo":
            continue
        netusage['rx'] += tmp[0]
        netusage['tx'] += tmp[8]
    net_file.close()

    rx = netusage['rx'] - app_globals.prevNetusage['rx'] if app_globals.prevNetusage['rx'] else 0
    tx = netusage['tx'] - app_globals.prevNetusage['tx'] if app_globals.prevNetusage['tx'] else 0
    elapsed = netusage['time'] - app_globals.prevNetusage['time']
    rtn['rx'] = humanReadable(rx / elapsed)
    rtn['tx'] = humanReadable(tx / elapsed)
    app_globals.prevNetusage = netusage
    return rtn

#memorize(duration = 3)
def getSysStat():
    memTotal, memUsed = getMemUsage()
    net = netUsage()
    loadavg = getLoadAverage()
    return {'cpu': getCPUUsage(),
            'mem': int((memUsed / memTotal) * 100),
            'load1': loadavg[0],
            'load5': loadavg[1],
            'load15': loadavg[2],
            'procNum': loadavg[3],
            'lastProc': loadavg[4],
            'rx': net['rx'],
            'tx': net['tx']
            }
Using request thread locals is considered bad design and should not be abused, according to the official Pyramid docs.
My advice is to use some simple key-value storage like memcached or redis if possible.
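For example, here is a minimal sketch of keeping the previous reading in Redis between requests. The key name and the read_net_dev() helper are made up for illustration, and it assumes the redis-py package and a running Redis server:

import json
import time
import redis

r = redis.Redis()  # assumes a local Redis server

def read_net_dev():
    # Hypothetical helper: total rx/tx bytes parsed from /proc/net/dev
    rx, tx = 0.0, 0.0
    with open('/proc/net/dev') as f:
        for line in f.readlines()[2:]:
            iface, _, data = line.partition(':')
            if iface.strip() == 'lo':
                continue
            fields = data.split()
            rx += float(fields[0])
            tx += float(fields[8])
    return {'rx': rx, 'tx': tx, 'time': time.time()}

def net_usage():
    current = read_net_dev()
    prev_raw = r.get('net:prev')  # previous reading, shared by all workers
    r.set('net:prev', json.dumps(current))
    if prev_raw is None:
        return {'rx': 0, 'tx': 0}
    prev = json.loads(prev_raw)
    elapsed = current['time'] - prev['time']
    return {'rx': (current['rx'] - prev['rx']) / elapsed,
            'tx': (current['tx'] - prev['tx']) / elapsed}

Because the previous reading lives outside the web process, this also keeps working if you run several worker processes behind the server, which thread locals or module globals would not.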