Python question on threads - python

I am trying to run two threads concurrently on two functions like the ones listed below:
import threading
def functionA():
    for i in range(5):
        print "Calling function A"
def functionB():
    for i in range(5):
        print "Calling function B"
t1 = threading.Thread(functionA())
t2 = threading.Thread(functionB())
t1.start()
t2.start()
Results :
Calling function A
Calling function A
Calling function A
Calling function A
Calling function A
Calling function B
Calling function B
Calling function B
Calling function B
Calling function B
But unfortunately, after several attempts, I am not able to get the result I want.
The desired results:
Calling function A
Calling function B
Calling function A
Calling function B
Calling function A
Calling function B
Calling function A
Calling function B
Calling function A
Can someone guide me so that the two threads run concurrently and produce the desired results? Thanks in advance.

You're calling the functions and passing the result to the Thread constructor instead of passing the function itself. Also, you must use the target argument (instead of the unused group argument, which comes first). Just use Thread(target=functionA) and Thread(target=functionB). Note the lack of parens after the function names.
Note that in CPython the Global Interpreter Lock still prevents the two threads from executing Python code truly in parallel, but that's a different question.
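A minimal sketch of the corrected version, written for Python 3. The `lines` list, the lock, and the `join()` calls are additions for illustration (they are not in the question); they make it easy to see that both threads ran to completion. The order of the collected lines is not guaranteed.

```python
import threading

lines = []                      # illustrative: collects output from both threads
lines_lock = threading.Lock()   # illustrative: guards the shared list


def functionA():
    for i in range(5):
        with lines_lock:
            lines.append("Calling function A")


def functionB():
    for i in range(5):
        with lines_lock:
            lines.append("Calling function B")


# Pass the function objects themselves (no parentheses) via target=.
t1 = threading.Thread(target=functionA)
t2 = threading.Thread(target=functionB)
t1.start()
t2.start()

# Wait for both threads to finish before reading the results.
t1.join()
t2.join()
print("\n".join(lines))
```

Writing `threading.Thread(functionA())` instead would run `functionA` immediately in the main thread and hand its return value (`None`) to the constructor as `group`.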

@delnan already answered how to use Thread correctly, so I'm going to focus on the output you are hoping for.
You will most likely NOT be able to get the desired output that you want. The timing of when threads execute is not guaranteed, especially in Python. The OS scheduling can impact when each thread gets to run. When running two threads like this, you're effectively saying "these two pieces of work do not depend on the order of each other and can be run at the same time".
You could get output like this:
a,a,b,b,a,a,b,b,a,b
Or:
a,b,b,b,b,b,a,a,a,a
It will change on every execution of your program. Do NOT rely on the order of thread execution!
Threading in Python is a dangerous beast. In CPython, no two threads ever execute Python bytecode at exactly the same time. Read up on the Global Interpreter Lock for more information.

You are creating new threads, and the operating system decides how threads share the processor. That's why the order isn't regular. You could use another variable to define which function has its turn, but that is still a bad idea.
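For illustration only, here is one way the "turn variable" idea could look in Python 3. The `state` dict, the `worker` helper, and the use of `threading.Condition` are assumptions made for this sketch, not something the answers above prescribe. It forces strict alternation, which also means the threads never do useful work at the same time; that is one reason it is usually a bad idea.

```python
import threading

state = {"turn": "A"}                 # hypothetical shared turn variable
turn_changed = threading.Condition()  # protects state and signals turn changes
lines = []


def worker(name, other, count):
    for _ in range(count):
        with turn_changed:
            # Block until the shared variable says it is this worker's turn.
            turn_changed.wait_for(lambda: state["turn"] == name)
            lines.append("Calling function " + name)
            # Hand the turn to the other worker and wake it up.
            state["turn"] = other
            turn_changed.notify_all()


t1 = threading.Thread(target=worker, args=("A", "B", 5))
t2 = threading.Thread(target=worker, args=("B", "A", 5))
t1.start()
t2.start()
t1.join()
t2.join()
print("\n".join(lines))
```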

It will be great when Python 3.2 is released: judging by the link below, it has built-in libraries that can help me achieve my goals.
http://docs.python.org/dev/library/concurrent.futures.html
Nevertheless, I will look into the alternatives provided by the other helpful members. Thanks once again for the help.

Related

Why is lambda function execution time different when passed as callable vs being passed as a string statement to timeit.repeat()?

I got different results from the following two Python timeit lines.
print(min(timeit.repeat(lambda: 1+1)))
print(min(timeit.repeat('lambda: 1+1')))
The output is something like:
0.13658121100002063
0.10372773000017332
Could you please help explain the difference between them?
On second sight, this is a really interesting question!
But first, please have another look at the docs:
The constructor takes a statement to be timed, an additional statement used for setup, and a timer function. Both statements default to 'pass'; the timer function is platform-dependent (see the module doc string).
[...]
The stmt and setup parameters can also take objects that are callable without arguments. This will embed calls to them in a timer function that will then be executed by timeit(). Note that the timing overhead is a little larger in this case because of the extra function calls.
Once you avoid the trap of attributing the observed difference to function call overhead, you notice that the first argument is either a callable that is called or a statement that is executed.
So, in your two lines of code you measure the performance of two different things.
In the first line you pass a callable that is being called and its execution time is measured:
timeit.repeat(lambda: 1+1)
Here you pass a statement that is being executed and its execution time is measured:
timeit.repeat('lambda: 1+1')
Note that in the second case you don't actually call the function, but measure the time it takes to create the lambda!
If you again wanted to measure the execution time of the function call, you should have written something like this:
timeit.repeat('test()', 'test=lambda: 1+1')
For comparison, look at this example:
import time
import timeit
print(min(timeit.repeat(lambda: time.sleep(1), number=1)))
print(min(timeit.repeat('lambda: time.sleep(1)', number=1)))
The output clearly shows the difference (first calls function, second creates function):
1.0009081270000024
5.370002327254042e-07
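The same point can be checked without looking at timings at all. The sketch below (Python 3.5+, because it uses the `globals` argument of `timeit.timeit`) counts how often a function actually runs. The `record` function and the `calls` list are hypothetical names introduced for this example.

```python
import timeit

calls = []


def record():
    calls.append(1)


# Callable: timeit calls record() once per loop iteration.
timeit.timeit(record, number=100)
calls_from_callable = len(calls)

# String containing a lambda expression: each iteration only builds a
# function object; the body of the lambda is never executed.
calls.clear()
timeit.timeit('lambda: record()', number=100, globals={'record': record})
calls_from_lambda_string = len(calls)

# String containing an actual call: the function runs on every iteration.
calls.clear()
timeit.timeit('record()', number=100, globals={'record': record})
calls_from_call_string = len(calls)

print(calls_from_callable, calls_from_lambda_string, calls_from_call_string)
```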

Accessing a parameter passed to one function in another function

I have two functions:
def f1(p1=raw_input("enter data")):
    ...do something
def f2(p2=raw_input("enter data")):
    ...do something else
p1 and p2 are the same data, so I want to avoid asking for the input twice. Is there a way I can pass the argument supplied to f1 to f2 without asking for it again? Ideally I would be able to use something like you would in a class. Like f1.p1. Is this possible?
EDIT: To add some clarity, I looked into using the ** operator to unpack arguments and I'm aware that using the main body of the program to access the arguments is cleaner. However, the former does not match what I'm trying to do, which is gain a better understanding of what is accessible in a function. I also looked at using the inspect and locals, but these are for inspecting arguments within the function, not outside.
Yes, depending on your needs. The best would be to ask in the main program, and simply pass that value to each function as you call it. Another possibility is to have one function call the other.
# Main program
user_input = raw_input("enter data")
f1(user_input)
f2(user_input)
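The second possibility, having one function call the other, might look like the sketch below. The function bodies and return values are placeholders invented for illustration; in the real program the value would be read once (with raw_input in Python 2, or input in Python 3) and passed to f1.

```python
def f2(p2):
    # Placeholder body: stands in for "...do something else".
    return "f2 got " + p2


def f1(p1):
    # Placeholder body: stands in for "...do something".
    first = "f1 got " + p1
    # f1 hands the same value on to f2, so it is only asked for once.
    second = f2(p1)
    return first, second


result = f1("example data")
print(result)
```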
Ideally I would be able to use something like you would in a class.
Like f1.p1 Is this possible?
That's an advanced technique, and generally dangerous practice. Yes, you can go into the call stack, get the function object, and grab the local variable -- but the function has to be active for this to have any semantic use.
That's not the case you presented. In your code, you have f1 and f2 independently called. Once you return from f1, the value of p1 is popped off the stack and lost.
If you have f1 call f2, then it's possible for f2 to reach back to its parent and access information. Don't go there. :-)
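Purely to show what "reaching back to the parent" means, here is a sketch using `inspect.currentframe()`. It relies on frame introspection, which is CPython-specific behavior, and it only works while f1 is still active on the call stack. The function bodies are placeholders, and this is a demonstration of the mechanism, not a recommendation.

```python
import inspect


def f2():
    # Look one level up the call stack, at the frame of whoever called f2.
    caller_frame = inspect.currentframe().f_back
    try:
        # Read the caller's local variable named p1.
        return caller_frame.f_locals["p1"]
    finally:
        # Drop the frame reference to avoid keeping the frame alive.
        del caller_frame


def f1(p1):
    # f1 is still running when f2 executes, so its locals still exist.
    return f2()


seen = f1("example data")
print(seen)
```

If f2 were called from anywhere that has no local named p1, this would raise a KeyError, which is one of the reasons the approach is fragile.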

What does the delayed() function do (when used with joblib in Python)

I've read through the documentation, but I don't understand what is meant by:
The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax.
I'm using it to iterate over the list I want to operate on (allImages) as follows:
def joblib_loop():
    Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages)
This returns my HOG features, like I want (and with the speed gain using all my 8 cores), but I'm just not sure what it is actually doing.
My Python knowledge is alright at best, and it's very possible that I'm missing something basic. Any pointers in the right direction would be most appreciated.
Perhaps things become clearer if we look at what would happen if instead we simply wrote
Parallel(n_jobs=8)(getHog(i) for i in allImages)
which, in this context, could be expressed more naturally as:
Create a Parallel instance with n_jobs=8
create a generator for the list [getHog(i) for i in allImages]
pass that generator to the Parallel instance
What's the problem? Each getHog(i) is an ordinary function call, so it runs in the main thread as the items are produced, and the Parallel object only ever receives finished results. There's nothing left for it to execute in parallel! All the work has already been done in the main thread, sequentially.
What we actually want is to tell Python what functions we want to call with what arguments, without actually calling them - in other words, we want to delay the execution.
This is what delayed conveniently allows us to do, with clear syntax. If we want to tell Python that we'd like to call foo(2, g=3) sometime later, we can simply write delayed(foo)(2, g=3). This returns the tuple (foo, (2,), {'g': 3}), containing:
a reference to the function we want to call, e.g. foo
all arguments (short "args") without a keyword, e.g. 2
all keyword arguments (short "kwargs"), e.g. g=3
So, by writing Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages), instead of the above sequence, now the following happens:
A Parallel instance with n_jobs=8 gets created
The list
[delayed(getHog)(i) for i in allImages]
gets created, evaluating to
[(getHog, (img1,), {}), (getHog, (img2,), {}), ... ]
That list is passed to the Parallel instance
The Parallel instance creates 8 workers and distributes the tuples from the list to them
Finally, each of those workers starts executing the tuples, i.e., it calls the first element with the second and third elements unpacked as arguments, tup[0](*tup[1], **tup[2]), turning the tuple back into the call we actually intended, e.g. getHog(img2).
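To make the mechanism concrete, here is a simplified stand-in for delayed that does not need joblib installed. It is a sketch of the idea described above, not joblib's actual source; the function foo is the hypothetical example from the text.

```python
def delayed(function):
    """Simplified stand-in: capture a call instead of performing it."""
    def capture(*args, **kwargs):
        # Nothing is executed here; we only remember what to call later.
        return function, args, kwargs
    return capture


def foo(x, g=0):
    return x + g


# Looks like a call to foo(2, g=3), but only produces a description of it.
task = delayed(foo)(2, g=3)

# Later, a worker turns the tuple back into the real call.
func, args, kwargs = task
result = func(*args, **kwargs)
print(task[1], task[2], result)
```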
We need a loop to test a list of different model configurations. This is the main function that drives the grid search process and will call the score_model() function for each model configuration. We can dramatically speed up the grid search process by evaluating model configurations in parallel. One way to do that is to use the Joblib library. We can define a Parallel object with the number of cores to use and set it to the number of cores detected in your hardware.
# define executor
executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
Then create a list of tasks to execute in parallel, which will be one call to the score_model() function for each model configuration we have.
Suppose score_model is defined as:
def score_model(data, n_test, cfg):
    ........................
# define list of tasks
tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
We can then use the Parallel object to execute the list of tasks in parallel.
scores = executor(tasks)
So what you want to be able to do is pile up a set of function calls and their arguments in such a way that you can pass them out efficiently to a scheduler/executor. Delayed is a decorator that takes in a function and its args and wraps them into an object that can be put in a list and popped out as needed. Dask has the same thing which it uses in part to feed into its graph scheduler.
From reference https://wiki.python.org/moin/ParallelProcessing
The Parallel object creates a multiprocessing pool that forks the Python interpreter in multiple processes to execute each of the items of the list. The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax.
Another thing I would like to suggest: instead of hard-coding the number of cores, we can detect it like this:
import multiprocessing
num_core=multiprocessing.cpu_count()

What happens when a Python function that takes arguments is called without defining parameters?

L=[5,10,4,2,8,7]
def compare(a,b):
    return cmp(b,a)
L.sort(compare)
print (L[-2])
L.sort()
print (L[2])
When this code is run, why is an exception not thrown, given that the function compare passed to sort has not been given exactly two arguments?
When the code is run, it gives this output:
4
5
You are not calling the function directly.
You are passing to the sort function a 'pointer' or 'reference' to the function, which will be used as a comparator.
If you need more debugging info, add a line that prints what is being compared, as follows:
L=[5,10,4,2,8,7]
def compare(a,b):
    print 'comparing ',a,b
    return cmp(b,a)
L.sort(compare)
You will also notice that the number of calls depends on how disordered L is.
There are several algorithms for sorting an array;
some of them are merge sort, binary sort, and others.
this is the link to the source code of python list object
You are not calling compare in L.sort(compare). You are just passing a reference to the function that will be called internally by sort.
You don't need any other arguments for that. Remember that in order to call a function you use (), so if you were actually calling it you would have written L.sort(compare())
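The code in this thread is Python 2 (cmp and L.sort(compare) no longer exist in Python 3). A rough Python 3 equivalent, offered as a sketch, wraps the comparator with functools.cmp_to_key; the `comparisons` list is an addition for illustration that records every pair sort passes in.

```python
from functools import cmp_to_key

comparisons = []  # illustrative: records each (a, b) pair that sort supplies


def compare(a, b):
    comparisons.append((a, b))
    # Equivalent of Python 2's cmp(b, a): negative, zero, or positive.
    return (b > a) - (b < a)


L = [5, 10, 4, 2, 8, 7]

# compare is passed as an object; sort calls it internally with two arguments.
L.sort(key=cmp_to_key(compare))
print(L[-2])

L.sort()
print(L[2])
```

Every entry in `comparisons` is a pair, which shows that sort itself supplies both arguments each time it calls compare.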

How to use AST to make specific statements commented out in Python?

I have a Python script but I don't want to change it.
I want to use another script to modify the original and run it with all the "print" or "time.sleep" statements commented out (not run).
I searched for this and found a method using AST, but I really have no idea how to use it.
Thank you very much!
You might be able to manipulate the AST to achieve that, but it would probably be easier to monkeypatch whatever objects it uses prior to running. In your specific example, to incapacitate print and time.sleep, you could do this:
import sys
import time

def insomniac(duration):
    pass  # don't sleep

_original_sleep = time.sleep
time.sleep = insomniac

def dont_write(stuff):
    pass  # don't write

_original_write = sys.stdout.write
sys.stdout.write = dont_write
To get the functionality back, you can set the relevant functions back to the stored originals. If you want to be truer to your original intention such that calls to these functions from the script in question are nullified but calls from other modules still work, you can inspect the stack to see what module the caller is in and selectively call the original or ignore the call.
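For completeness, the AST route could look roughly like the sketch below. It assumes Python 3 (where print is a function call) and only handles calls written literally as print(...) or time.sleep(...) used as standalone statements. The DropCalls class and the sample `source` string are hypothetical; for a real script the source would be read from its file instead.

```python
import ast


class DropCalls(ast.NodeTransformer):
    """Replace standalone print(...) and time.sleep(...) statements with pass."""

    def visit_Expr(self, node):
        call = node.value
        if isinstance(call, ast.Call):
            func = call.func
            is_print = isinstance(func, ast.Name) and func.id == "print"
            is_sleep = (
                isinstance(func, ast.Attribute)
                and func.attr == "sleep"
                and isinstance(func.value, ast.Name)
                and func.value.id == "time"
            )
            if is_print or is_sleep:
                # Use pass rather than deleting, so block bodies never become empty.
                return ast.copy_location(ast.Pass(), node)
        return self.generic_visit(node)


# Stand-in for the contents of the original, unmodified script.
source = (
    "import time\n"
    "result = []\n"
    "for i in range(3):\n"
    "    print('step', i)\n"
    "    time.sleep(10)\n"
    "    result.append(i)\n"
)

tree = DropCalls().visit(ast.parse(source))
ast.fix_missing_locations(tree)

# Run the rewritten program in its own namespace.
namespace = {}
exec(compile(tree, "<patched>", "exec"), namespace)
print(namespace["result"])
```

Calls reached through other names (for example `from time import sleep`) would slip through this filter, which is part of why monkeypatching is often the simpler option.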
