I have written some code for testing the performance of a database when users are simultaneously running queries against it. The objective is to understand how the elapsed time increases with number of users. The code contains a class User (shown below) whose objects are created by parsing XML files.
class User(object):
    def __init__(self, id, constr):
        self.id = id
        self.constr = constr
        self.queryid = list()
        self.queries = list()

    def openConn(self):
        self.cnxn = pyodbc.connect(self.constr)
        logDet.info("%s %s" % (self.id, "Open connection."))

    def closeConn(self):
        self.cnxn.close()
        logDet.info("%s %s" % (self.id, "Close connection."))

    def executeAll(self):
        self.openConn()
        for n, qry in enumerate(self.queries):
            try:
                cursor = self.cnxn.cursor()
                logTim.info("%s|%s|beg" % (self.id, self.queryid[n]))
                cursor.execute(qry)
                logTim.info("%s|%s|end" % (self.id, self.queryid[n]))
            except Exception:
                cursor.rollback()
                logDet.exception("Error while running query.")
        self.closeConn()
pyODBC is used for the connection to the database. Two logs are created -- one detailed (logDet) and one which has only the timings (logTim). The User objects are stored in a list. The queries for each user are also in a list (not in a thread-safe Queue).
To simulate parallel users, I have tried a couple of different approaches:
def worker(usr):
    usr.executeAll()
Option 1: multiprocessing.Pool
pool = Pool(processes=len(users))
pool.map(worker, users)
Option 2: threading.Thread
for usr in users:
    t = Thread(target=worker, args=(usr,))
    t.start()
Both approaches work. In my test, I have tried for #users = 2,6,..,60, and each user has 4 queries. Given how the query times are captured, there should be less than a second of delay between the end of a query and beginning of next query i.e. queries should be fired one after the other. That's exactly what happens with multiprocessing but with threading, a random delay is introduced before the next query. The delay can be over a minute (see below).
Using: python3.4.1, pyodbc3.0.7; clients running code Windows 7/RHEL 6.5
I would really prefer to get this to work with threading. Is this expected in the threading approach or is there a command that I am missing? Or how can that be re-written? Thx.
When you use the threading-based approach, you're starting one thread per user, all the way up to 60 threads. All of those threads must fight for access to the GIL in between their I/O operations. This introduces a ton of overhead. You would probably see better results if you used a ThreadPool limited to a smaller number of threads (maybe 2 * multiprocessing.cpu_count()?), even with a higher number of users:
from multiprocessing.pool import ThreadPool
from multiprocessing import cpu_count
pool = ThreadPool(processes=cpu_count()*2)
pool.map(worker, users)
You may want to limit the number of concurrent processes you run, too, for memory-usage reasons. Starting up sixty concurrent Python processes is pretty expensive.
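If you stay with multiprocessing, the same capping idea applies. A minimal sketch, reusing the worker and users from the question and limiting the pool rather than starting one process per user:

from multiprocessing import Pool, cpu_count

# Cap the pool instead of starting one process per user.
pool = Pool(processes=min(len(users), cpu_count() * 2))
pool.map(worker, users)
pool.close()
pool.join()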
Related
Say I have a very large list and I'm performing an operation like so:
for item in items:
    try:
        api.my_operation(item)
    except:
        print 'error with item'
My issue is two fold:
There are a lot of items
api.my_operation takes forever to return
I'd like to use multi-threading to spin up a bunch of api.my_operations at once so I can process maybe 5 or 10 or even 100 items at once.
If my_operation() returns an exception (because maybe I already processed that item) - that's OK. It won't break anything. The loop can continue to the next item.
Note: this is for Python 2.7.3
First, in Python, if your code is CPU-bound, multithreading won't help, because only one thread can hold the Global Interpreter Lock, and therefore run Python code, at a time. So, you need to use processes, not threads.
This is not true if your operation "takes forever to return" because it's IO-bound—that is, waiting on the network or disk copies or the like. I'll come back to that later.
Next, the way to process 5 or 10 or 100 items at once is to create a pool of 5 or 10 or 100 workers, and put the items into a queue that the workers service. Fortunately, the stdlib multiprocessing and concurrent.futures libraries both wrap up most of the details for you.
The former is more powerful and flexible for traditional programming; the latter is simpler if you need to compose future-waiting; for trivial cases, it really doesn't matter which you choose. (In this case, the most obvious implementation with each takes 3 lines with futures, 4 lines with multiprocessing.)
If you're using 2.6-2.7 or 3.0-3.1, futures isn't built in, but you can install it from PyPI (pip install futures).
Finally, it's usually a lot simpler to parallelize things if you can turn the entire loop iteration into a function call (something you could, e.g., pass to map), so let's do that first:
def try_my_operation(item):
    try:
        api.my_operation(item)
    except:
        print('error with item')
Putting it all together:
import concurrent.futures

executor = concurrent.futures.ProcessPoolExecutor(10)
futures = [executor.submit(try_my_operation, item) for item in items]
concurrent.futures.wait(futures)
If you have lots of relatively small jobs, the overhead of multiprocessing might swamp the gains. The way to solve that is to batch up the work into larger jobs. For example (using grouper from the itertools recipes, which you can copy and paste into your code, or get from the more-itertools project on PyPI):
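For reference, the grouper recipe in the Python 2.7 itertools docs looks roughly like this (note that the fillvalue padding means the last group may contain None entries, which the worker below will pass to api.my_operation unless you filter them out):

from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)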
def try_multiple_operations(items):
    for item in items:
        try:
            api.my_operation(item)
        except:
            print('error with item')

executor = concurrent.futures.ProcessPoolExecutor(10)
futures = [executor.submit(try_multiple_operations, group)
           for group in grouper(5, items)]
concurrent.futures.wait(futures)
Finally, what if your code is IO bound? Then threads are just as good as processes, and with less overhead (and fewer limitations, but those limitations usually won't affect you in cases like this). Sometimes that "less overhead" is enough to mean you don't need batching with threads, but you do with processes, which is a nice win.
So, how do you use threads instead of processes? Just change ProcessPoolExecutor to ThreadPoolExecutor.
If you're not sure whether your code is CPU-bound or IO-bound, just try it both ways.
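One quick way to compare the two is to time the same workload under each executor class. A sketch, reusing try_my_operation and items from above:

import time
import concurrent.futures

def timed_run(executor_cls, items, workers=10):
    # Run the same workload under the given executor type and return the wall time.
    start = time.time()
    with executor_cls(max_workers=workers) as executor:
        list(executor.map(try_my_operation, items))
    return time.time() - start

# print(timed_run(concurrent.futures.ThreadPoolExecutor, items))
# print(timed_run(concurrent.futures.ProcessPoolExecutor, items))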
Can I do this for multiple functions in my python script? For example, if I had another for loop elsewhere in the code that I wanted to parallelize. Is it possible to do two multi threaded functions in the same script?
Yes. In fact, there are two different ways to do it.
First, you can share the same (thread or process) executor and use it from multiple places with no problem. The whole point of tasks and futures is that they're self-contained; you don't care where they run, just that you queue them up and eventually get the answer back.
Alternatively, you can have two executors in the same program with no problem. This has a performance cost—if you're using both executors at the same time, you'll end up trying to run (for example) 16 busy threads on 8 cores, which means there's going to be some context switching. But sometimes it's worth doing because, say, the two executors are rarely busy at the same time, and it makes your code a lot simpler. Or maybe one executor is running very large tasks that can take a while to complete, and the other is running very small tasks that need to complete as quickly as possible, because responsiveness is more important than throughput for part of your program.
If you don't know which is appropriate for your program, usually it's the first.
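A sketch of the first option, assuming a second, hypothetical function other_operation and list other_things alongside the names from above:

import concurrent.futures

# One shared executor serving two different parallelized loops.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=10)

futures_a = [executor.submit(try_my_operation, item) for item in items]
futures_b = [executor.submit(other_operation, thing) for thing in other_things]

concurrent.futures.wait(futures_a + futures_b)
executor.shutdown()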
There's multiprocessing.pool, and the following sample illustrates how to use one of its pool classes:
from multiprocessing.pool import ThreadPool as Pool
# from multiprocessing import Pool

pool_size = 5  # your "parallelness"

# define worker function before a Pool is instantiated
def worker(item):
    try:
        api.my_operation(item)
    except:
        print('error with item')

pool = Pool(pool_size)

for item in items:
    pool.apply_async(worker, (item,))

pool.close()
pool.join()
Now if you indeed identify that your process is CPU bound, as @abarnert mentioned, change ThreadPool to the process pool implementation (commented under the ThreadPool import). You can find more details here: http://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
You can split the processing into a specified number of threads using an approach like this:
import threading

def process(items, start, end):
    for item in items[start:end]:
        try:
            api.my_operation(item)
        except Exception:
            print('error with item')

def split_processing(items, num_splits=4):
    split_size = len(items) // num_splits
    threads = []
    for i in range(num_splits):
        # determine the indices of the list this thread will handle
        start = i * split_size
        # special case on the last chunk to account for uneven splits
        end = None if i + 1 == num_splits else (i + 1) * split_size
        # create the thread
        threads.append(
            threading.Thread(target=process, args=(items, start, end)))
        threads[-1].start()  # start the thread we just created

    # wait for all threads to finish
    for t in threads:
        t.join()

split_processing(items)
import numpy as np
import threading

def threaded_process(items_chunk):
    """Your main process, which runs in a thread for each chunk."""
    for item in items_chunk:
        try:
            api.my_operation(item)
        except Exception:
            print('error with item')

n_threads = 20
# Split the items into chunks, one per thread
array_chunk = np.array_split(items, n_threads)

thread_list = []
for thr in range(n_threads):
    thread = threading.Thread(target=threaded_process, args=(array_chunk[thr],))
    thread_list.append(thread)
    thread_list[thr].start()

for thread in thread_list:
    thread.join()
I would like to add multithreading to my web crawler, but I can see that the way the spider schedules links to be crawled may be incompatible with multithreading. The crawler is only ever going to be active on a handful of news websites, but rather than starting a new thread per domain I would prefer to have multiple threads open on the same domain. My web crawling code is operated through the following function:
def crawl_links():
    links_to_crawl.append(domain[0])
    while len(links_to_crawl) > 0:
        link = links_to_crawl[0]
        if link in crawled_links or link in ignored_links:
            del links_to_crawl[0]
        else:
            print '\n', link
            try:
                html = get_html(link)
                GetLinks(html)
                SaveFile(html)
                crawled_links.append(links_to_crawl.pop(0))
            except (ValueError, urllib2.URLError, Timeout.Timeout, httplib.IncompleteRead):
                ignored_links.append(links_to_crawl.pop(0))
    print 'Spider finished!'
    print 'Ignored links:\n', ignored_links
    print 'Crawled links:\n', crawled_links
    print 'Relative links\n', relative_links
If my understanding of how threading works is correct, simply opening multiple threads on this process means they will all crawl the same links (potentially multiple times), or they will clash with each other. Without necessarily going into specifics, how would you advise restructuring the scheduling to make it compatible with multiple threads running at the same time?
I've given this some thought and the only workaround I could come up with is having the GetLinks() class appending links to multiple lists, with an individual list per thread... but this seems like quite a clumsy workaround.
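For what it's worth, one common restructuring is to make the frontier a thread-safe Queue.Queue and guard a single "seen" set with a lock, so every thread pulls from the same schedule. A rough sketch, assuming Python 2 and assuming GetLinks is changed to return the links it extracts rather than appending them to shared lists:

import threading
from Queue import Queue

links_to_crawl = Queue()   # thread-safe frontier shared by all threads
seen_links = set()         # everything already queued or crawled
seen_lock = threading.Lock()

def crawl_worker():
    while True:
        link = links_to_crawl.get()
        try:
            html = get_html(link)
            SaveFile(html)
            for new_link in GetLinks(html):       # assumed to return a list of links
                with seen_lock:
                    if new_link not in seen_links:
                        seen_links.add(new_link)
                        links_to_crawl.put(new_link)
        except Exception:
            pass                                   # record in ignored_links as before
        finally:
            links_to_crawl.task_done()

seen_links.add(domain[0])
links_to_crawl.put(domain[0])
for _ in range(4):                                 # a handful of threads on the same domain
    t = threading.Thread(target=crawl_worker)
    t.daemon = True
    t.start()
links_to_crawl.join()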
Here is a general scheme that I have used in order to run a multi-threaded application in Python.
The scheme takes a table of input arguments, and executes in parallel one thread for each row.
Each thread takes one row, and executes sequentially one thread for each item in the row.
Each item contains a fixed number of arguments which are passed to the executed thread.
Input Example:
table = \
[
[[12,32,34],[11,20,14],[33,67,56],[10,20,45]],
[[21,21,67],[44,34,74],[23,12,54],[31,23,13]],
[[31,67,56],[34,22,67],[87,74,52],[87,74,52]],
]
In this example we will have 3 threads running in parallel, each one executing 4 threads sequentially.
In order to keep your threads balanced, it is advisable to have the same number of items in each row.
Threading Scheme:
import threading
from MyClass import MyClass  # This is for you to implement

def RunThreads(outFileName, errFileName):
    # Create a shared object for saving the output of different threads
    outFile = CriticalSection(outFileName)
    # Create a shared object for saving the errors of different threads
    errFile = CriticalSection(errFileName)
    # Run in parallel one thread for each row in the input table
    RunParallelThreads(outFile, errFile)

def RunParallelThreads(outFile, errFile):
    # Create all the parallel threads
    threads = [threading.Thread(target=RunSequentialThreads, args=(outFile, errFile, row)) for row in table]
    # Start all the parallel threads
    for thread in threads: thread.start()
    # Wait for all the parallel threads to complete
    for thread in threads: thread.join()

def RunSequentialThreads(outFile, errFile, row):
    myObject = MyClass()
    for item in row:
        # Create a thread with the arguments given in the current item
        thread = threading.Thread(target=myObject.Run, args=(outFile, errFile, item[0], item[1], item[2]))
        # Start the thread
        thread.start()
        # Wait for the thread to complete, but only up to 600 seconds
        thread.join(600)
        # Terminate the thread if it hasn't completed up to this point
        if thread.isAlive():
            thread._Thread__stop()
            errFile.write('Timeout on arguments: %s %s %s\n' % (item[0], item[1], item[2]))
The class below implements an object which can be safely shared among different threads running in parallel. It provides a single interface method called write, which allows any thread to update the shared object in a safe manner (i.e., without writes from different threads interleaving).
import codecs

class CriticalSection:

    def __init__(self, fileName):
        self.mutex = threading.Lock()
        self.fileDesc = codecs.open(fileName, mode='w', encoding='utf-8')

    def __del__(self):
        del self.mutex
        self.fileDesc.close()

    def write(self, data):
        self.mutex.acquire()
        self.fileDesc.write(data)
        self.mutex.release()
The above scheme should allow you to control the level of "parallel-ness" and the level of "sequential-ness" within your application.
For example, you can use a single row for all the items, and have your application running in a complete sequential manner.
In contrast, you can place each item in a separate row, and have your application running in a complete parallel manner.
And of course, you can choose any combination of the above...
Note:
In MyClass, you will need to implement method Run, which will take the outFile and errFile objects, as well as the arguments that you have defined for each thread.
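A minimal sketch of what that might look like (the computation is a placeholder; the important part is that all output goes through the shared write method):

class MyClass:
    def Run(self, outFile, errFile, arg1, arg2, arg3):
        try:
            result = arg1 + arg2 + arg3            # placeholder for the real work
            outFile.write('%s %s %s -> %s\n' % (arg1, arg2, arg3, result))
        except Exception as e:
            errFile.write('Error on %s %s %s: %s\n' % (arg1, arg2, arg3, e))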
I am running through a csv file of about 800k rows. I need a threading solution that runs through each row and spawns 32 threads at a time into a worker. I want to do this without a queue. It looks like the current Python threading solution with a queue is eating up a lot of memory.
Basically want to read a csv file row and put into a worker thread. And only want 32 threads running at a time.
This is current script. It appears that it is reading the entire csv file into queue and doing a queue.join(). Is it correct that it is loading the entire csv into a queue then spawning the threads?
queue = Queue.Queue()

def worker():
    while True:
        task = queue.get()
        try:
            subprocess.call(['php {docRoot}/cli.php -u "api/email/ses" -r "{task}"'.format(
                docRoot=docRoot,
                task=task
            )], shell=True)
        except:
            pass
        with lock:
            stats['done'] += 1
            if int(time.time()) != stats.get('now'):
                stats.update(
                    now=int(time.time()),
                    percent=(stats.get('done')/stats.get('total'))*100,
                    ps=(stats.get('done')/(time.time()-stats.get('start')))
                )
                print("\r {percent:.1f}% [{progress:24}] {persec:.3f}/s ({done}/{total}) ETA {eta:<12}".format(
                    percent=stats.get('percent'),
                    progress=('='*int((23*stats.get('percent'))/100))+'>',
                    persec=stats.get('ps'),
                    done=int(stats.get('done')),
                    total=stats.get('total'),
                    eta=snippets.duration.time(int((stats.get('total')-stats.get('done'))/stats.get('ps')))
                ), end='')
        queue.task_done()

for i in range(32):
    workers = threading.Thread(target=worker)
    workers.daemon = True
    workers.start()

try:
    with open(csvFile, 'rb') as fh:
        try:
            dialect = csv.Sniffer().sniff(fh.readline(), [',', ';'])
            fh.seek(0)
            reader = csv.reader(fh, dialect)
            headers = reader.next()
        except csv.Error as e:
            print("\rERROR[CSV] {error}\n".format(error=e))
        else:
            while True:
                try:
                    data = reader.next()
                except csv.Error as e:
                    print("\rERROR[CSV] - Line {line}: {error}\n".format(line=reader.line_num, error=e))
                except StopIteration:
                    break
                else:
                    stats['total'] += 1
                    queue.put(urllib.urlencode(dict(zip(headers, data) + dict(campaign=row.get('Campaign')).items())))

queue.join()
32 threads is probably overkill unless you have some humongous hardware available.
The rule of thumb for optimum number of threads or processes is: (no. of cores * 2) - 1
which comes to either 7 or 15 on most hardware.
The simplest way would be to start 7 threads, passing each thread an "offset" as a parameter, i.e. a number from 0 to 6.
Each thread would then skip rows until it reached the "offset" number and process that row. Having processed the row, it can skip 6 rows and process the 7th -- repeat until no more rows.
This setup works for threads and multiple processes and is very efficient in I/O on most machines as all the threads should be reading roughly the same part of the file at any given time.
I should add that this method is particularly good for Python, as each thread is more or less independent once started and avoids the contention on the dreaded Python global interpreter lock that is common to other methods.
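A rough sketch of that layout (handle_row is a hypothetical per-row worker; each thread touches only the rows whose index matches its offset):

import csv
import threading

def process_rows(csv_path, offset, num_workers):
    # Each worker handles rows where row_number % num_workers == offset,
    # so the workers never overlap and no shared queue is needed.
    with open(csv_path, 'rb') as fh:
        reader = csv.reader(fh)
        headers = next(reader)
        for row_number, row in enumerate(reader):
            if row_number % num_workers != offset:
                continue
            handle_row(headers, row)

threads = [threading.Thread(target=process_rows, args=(csvFile, i, 7))
           for i in range(7)]
for t in threads:
    t.start()
for t in threads:
    t.join()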
I don't understand why you want to spawn 32 threads per row. However, processing data in parallel is a fairly common, embarrassingly parallel thing to do, and it is easily achievable with Python's multiprocessing library.
Example:
from multiprocessing import Pool

def job(args):
    pass  # do some work

inputs = [...]  # define your inputs

Pool().map(job, inputs)
I leave it up to you to fill in the blanks to meet your specific requirements.
See: https://bitbucket.org/ccaih/ccav/src/tip/bin/ for many examples of this pattern.
Other answers have explained how to use Pool without having to manage queues (it manages them for you) and that you do not want to set the number of processes to 32, but to your CPU count - 1. I would add two things. First, you may want to look at the pandas package, which can easily import your csv file into Python. The second is that the examples of using Pool in the other answers only pass it a function that takes a single argument. Unfortunately, you can only pass Pool a single object with all the inputs for your function, which makes it difficult to use functions that take multiple arguments. Here is code that allows you to call a previously defined function with multiple arguments using pool:
import multiprocessing
from multiprocessing import Pool

def multiplyxy(x, y):
    return x*y

def funkytuple(t):
    """
    Breaks a tuple into a function to be called and a tuple
    of arguments for that function. Changes that new tuple into
    a series of arguments and passes those arguments to the
    function.
    """
    f = t[0]
    t = t[1]
    return f(*t)

def processparallel(func, arglist):
    """
    Takes a function and a list of arguments for that function
    and processes them in parallel.
    """
    parallelarglist = []
    for entry in arglist:
        parallelarglist.append((func, tuple(entry)))
    cpu_count = int(multiprocessing.cpu_count() - 1)
    pool = Pool(processes=cpu_count)
    database = pool.map(funkytuple, parallelarglist)
    pool.close()
    return database

# Necessary on Windows
if __name__ == '__main__':
    x = [23, 23, 42, 3254, 32]
    y = [324, 234, 12, 425, 13]
    i = 0
    arglist = []
    while i < len(x):
        arglist.append([x[i], y[i]])
        i += 1
    database = processparallel(multiplyxy, arglist)
    print(database)
Your question is pretty unclear. Have you tried initializing your Queue to have a maximum size of, say, 64?
myq = Queue.Queue(maxsize=64)
Then a producer (one or more) trying to .put() new items on myq will block until consumers reduce the queue size to less than 64. This will correspondingly limit the amount of memory consumed by the queue. By default, queues are unbounded: if the producer(s) add items faster than consumers take them off, the queue can grow to consume all the RAM you have.
EDIT
This is current script. It appears that it is reading the
entire csv file into queue and doing a queue.join(). Is
it correct that it is loading the entire csv into a queue
then spawning the threads?
The indentation is messed up in your post, so I have to guess some, but:
The code obviously starts 32 threads before it opens the CSV file.
You didn't show the code that creates the queue. As already explained above, if it's a Queue.Queue, by default it's unbounded, and can grow to any size if your main loop puts items on it faster than your threads remove items from it. Since you haven't said anything about what worker() does (or shown its code), we don't have enough information to guess whether that's the case. But the fact that memory use is out of hand suggests that's the case.
And, as also explained, you can stop that easily by specifying a maximum size when you create the queue.
To get better answers, supply better info ;-)
ANOTHER EDIT
Well, the indentation is still messed up in spots, but it's better. Have you tried any suggestions? Looks like your worker threads each spawn a new process, so they'll take very much longer than it takes just to read another line from the csv file. So it's indeed very likely that you put items on the queue far faster than they're taken off. So, for the umpteenth time ;-), TRY initializing the queue with (say) maxsize=64. Then reveal what happens.
BTW, the bare except: clause in worker() is a Really Bad Idea. If anything goes wrong, you'll never know. If you have to ignore every possible exception (including even KeyboardInterrupt and SystemExit), at least log the exception info.
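For example, a minimal change inside the worker (run_task here is a stand-in for whatever the body actually does):

import logging

try:
    run_task(task)
except Exception:
    # At minimum, record what went wrong instead of silently swallowing it.
    logging.exception("worker failed on task %r", task)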
And note what @JamesAnderson said: unless you have extraordinary hardware resources, trying to run 32 processes at a time is almost certainly slower than running a number of processes that's no more than twice the number of available cores. Then again, that also depends a lot on what your PHP program does. If, for example, the PHP program uses disk I/O heavily, any multiprocessing may be slower than none.
I have a problem running multiple processes in Python 3.
My program does the following:
1. Takes entries from an SQLite database and passes them to an input_queue
2. Create multiple processes that take items off the input_queue, run it through a function and output the result to the output queue.
3. Create a thread that takes items off the output_queue and prints them (This thread is obviously started before the first 2 steps)
My problem is that currently the 'function' in step 2 is only run as many times as the number of processes set, so for example if you set the number of processes to 8, it only runs 8 times then stops. I assumed it would keep running until it took all items off the input_queue.
Do I need to rewrite the function that takes the entries out of the database (step 1) into another process and then pass its output queue as an input queue for step 2?
Edit:
Here is an example of the code, I used a list of numbers as a substitute for the database entries as it still performs the same way. I have 300 items on the list and I would like it to process all 300 items, but at the moment it just processes 10 (the number of processes I have assigned)
#!/usr/bin/python3
from multiprocessing import Process, Queue
import multiprocessing
from threading import Thread

## This is the class that would be passed to the multi_processing function
class Processor:

    def __init__(self, out_queue):
        self.out_queue = out_queue

    def __call__(self, in_queue):
        data_entry = in_queue.get()
        result = data_entry*2
        self.out_queue.put(result)

# Performs the multiprocessing
def perform_distributed_processing(dbList, threads, processor_factory, output_queue):
    input_queue = Queue()

    # Create the Data processors.
    for i in range(threads):
        processor = processor_factory(output_queue)
        data_proc = Process(target=processor,
                            args=(input_queue,))
        data_proc.start()

    # Push entries to the queue.
    for entry in dbList:
        input_queue.put(entry)

    # Push stop markers to the queue, one for each thread.
    for i in range(threads):
        input_queue.put(None)

    data_proc.join()
    output_queue.put(None)

if __name__ == '__main__':
    output_results = Queue()

    def output_results_reader(queue):
        while True:
            item = queue.get()
            if item is None:
                break
            print(item)

    # Establish results collecting thread.
    results_process = Thread(target=output_results_reader, args=(output_results,))
    results_process.start()

    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]

    # Perform multi processing
    perform_distributed_processing(dbList, 10, Processor, output_results)

    # Wait for it all to finish.
    results_process.join()
A collection of processes that service an input queue and write to an output queue is pretty much the definition of a process pool.
If you want to know how to build one from scratch, the best way to learn is to look at the source code for multiprocessing.Pool, which is pretty simple Python, and very nicely written. But, as you might expect, you can just use multiprocessing.Pool instead of re-implementing it. The examples in the docs are very nice.
But really, you could make this even simpler by using an executor instead of a pool. It's hard to explain the difference (again, read the docs for both modules), but basically, a future is a "smart" result object, which means instead of a pool with a variety of different ways to run jobs and get results, you just need a dumb thing that doesn't know how to do anything but return futures. (Of course in the most trivial cases, the code looks almost identical either way…)
from concurrent.futures import ProcessPoolExecutor

def Processor(data_entry):
    return data_entry*2

def perform_distributed_processing(dbList, threads, processor_factory):
    with ProcessPoolExecutor(max_workers=threads) as executor:
        yield from executor.map(processor_factory, dbList)

if __name__ == '__main__':
    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]
    for result in perform_distributed_processing(dbList, 8, Processor):
        print(result)
Or, if you want to handle them as they come instead of in order:
from concurrent.futures import Future, ProcessPoolExecutor, as_completed

def perform_distributed_processing(dbList, threads, processor_factory):
    with ProcessPoolExecutor(max_workers=threads) as executor:
        fs = (executor.submit(processor_factory, db) for db in dbList)
        yield from map(Future.result, as_completed(fs))
Notice that I also replaced your in-process queue and thread, because it wasn't doing anything but providing a way to interleave "wait for the next result" and "process the most recent result", and yield (or yield from, in this case) does that without all the complexity, overhead, and potential for getting things wrong.
Don't try to rewrite the whole multiprocessing library again. I think you can use any of the multiprocessing.Pool methods depending on your needs - if this is a batch job you can even use the synchronous multiprocessing.Pool.map() - only instead of pushing to an input queue, you write a generator that yields the inputs to the pool, as sketched below.
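A rough sketch of that shape (the table and column names are made up; Pool.imap consumes the generator lazily instead of materializing all rows first):

import sqlite3
from multiprocessing import Pool

def process_entry(value):
    return value * 2                     # stand-in for the real per-item work

def read_entries(db_path):
    # Generator that yields one database entry at a time.
    conn = sqlite3.connect(db_path)
    try:
        for (value,) in conn.execute('SELECT value FROM entries'):
            yield value
    finally:
        conn.close()

if __name__ == '__main__':
    pool = Pool(processes=8)
    for result in pool.imap(process_entry, read_entries('example.db')):
        print(result)
    pool.close()
    pool.join()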
I have a list of approximately 60,000 items - I would like to send queries to the database to check if they exist and, if they do, return some computed results. I ran an ordinary query, iterating through the list one by one, and the query has been running for the last 4 days. I thought I could use the threading module to improve on this. I did something like this:
if __name__ == '__main__':
    for ra, dec in candidates:
        t = threading.Thread(target=search_sl, args=(ra, dec, q))
        t.start()
    t.join()
I tested with only 10 items and it worked fine. When I submitted the whole list of 60k items, I ran into errors, i.e., "maximum number of sessions exceeded". What I want to do is create maybe 10 threads at a time. When the first bunch of threads has finished executing, I send another batch, and so on.
You could try using a process pool, which is available in the multiprocessing module. Here is the example from the python docs:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print result.get(timeout=1)           # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10))          # prints "[0, 1, 4,..., 81]"
http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers
Try increasing the number of processes until you reach the maximum your system can support.
Improve your queries before threading (premature optimization is the root of all evil!)
Your problem is having 60,000 different queries on a single database. Having a single query for each item means a lot of overhead for opening the connection and invoking a DB cursor session.
Threading those queries can speed up your process, but yields another set of problems like DB overload and max sessions allowed.
First approach: Load many item IDs into every query
Instead, try to improve your queries. Can you write a query that sends a long list of item IDs and returns the matches? Perhaps something like:
SELECT item_id, *
FROM items
WHERE item_id IN (id1, id2, id3, id4, id5, ....)
Python gives you convenient interfaces for this kind of query, so that the IN clause can use a pythonic list. This way you can break your long list of items into, say, 60 queries with 1,000 IDs each, as sketched below.
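A sketch of what that batching can look like in Python (item_ids and cursor are assumed to exist already; the '?' placeholder style depends on your DB driver, e.g. pyodbc and sqlite3 use '?', while others use '%s'):

def chunks(seq, size):
    # Yield successive fixed-size slices of the list.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

results = []
for batch in chunks(item_ids, 1000):
    placeholders = ','.join('?' * len(batch))                     # "?,?,?,..."
    sql = 'SELECT item_id FROM items WHERE item_id IN (%s)' % placeholders
    cursor.execute(sql, batch)
    results.extend(cursor.fetchall())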
Second approach: Use a temporary table
Another interesting approach is creating a temporary table on the database with your item IDs. Temporary tables last as long as the connection lives, so you won't have to worry about cleanups. Perhaps something like:
CREATE TEMPORARY TABLE
item_ids_list (id INT PRIMARY KEY); # Remember indexing!
Insert the ids using an appropriate Python library:
INSERT INTO item_ids_list ... # Insert your 60,000 items here
Get your results:
SELECT * FROM items WHERE items.id IN (SELECT id FROM item_ids_list);
First of all, you join only the last thread. There is no guarantee that it will be the last one to finish. You should use something like this:
from time import sleep

delay = 0.5
tlist = [threading.Thread(target=search_sl, args=(ra, dec, q)) for ra, dec in candidates]
map(lambda t: t.start(), tlist)
while any(map(lambda t: t.isAlive(), tlist)):
    sleep(delay)
The second issue is that running 60K threads at once requires really huge hardware resources :-) It's better to queue your tasks and then process them with workers. The number of worker threads must be limited. Like this (I haven't tested the code, but I hope the idea is clear):
from Queue import Queue, Empty
from threading import Thread
from time import sleep

tasks = Queue()
map(tasks.put, candidates)
maxthreads = 50
delay = 0.1

try:
    threads = [Thread(target=search_sl, args=tasks.get_nowait() + (q,))
               for i in xrange(0, maxthreads)]
except Empty:
    pass

map(lambda t: t.start(), threads)

while not tasks.empty():
    threads = filter(lambda t: t.isAlive(), threads)
    while len(threads) < maxthreads:
        try:
            t = Thread(target=search_sl, args=tasks.get_nowait() + (q,))
            t.start()
            threads.append(t)
        except Empty:
            break
    sleep(delay)

while any(map(lambda t: t.isAlive(), threads)):
    sleep(delay)
Since it's an I/O task, neither threads nor processes are a great fit for it. You use those if you need to parallelize computational tasks. So, be modern please ™, use something like gevent for parallel I/O-intensive tasks.
http://www.gevent.org/intro.html#example
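A sketch of what that can look like with gevent's own pool (this only helps if the database driver does its I/O through Python-level sockets that monkey-patching can make cooperative; many C drivers will still block):

from gevent import monkey
monkey.patch_all()                       # make blocking socket calls cooperative

from gevent.pool import Pool

def check_candidate(args):
    ra, dec = args
    return search_sl(ra, dec, q)         # the existing query function

pool = Pool(50)                          # at most 50 concurrent greenlets
for result in pool.imap_unordered(check_candidate, candidates):
    pass                                 # handle each result as it arrives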