I've been testing a cacheing system of my making. Its purpose is to speed up a Django web application. It stores everything in-memory. According to cProfile most of the time in my tests is spent inside QuerySet._clone() which turns out to be terribly inefficient (it's actually not that strange given the implementation).
I was having high hopes for using PyPy to speed things up. I've got a 64-bit machine. However after installing all the required libraries it turns out that PyPy compiled code runs about 2.5x slower than regular Python code, and I don't know what to make out of it. The code is CPU bound (there are absolutely no database queries, so IO-bounding is not an option). A single test runs for about 10 seconds, so I guess it should be enough for JIT to kick in. I'm using PyPy 1.5. One note - I didn't compile the sources myself, just downloaded a 64-bit linux version.
I'd like to know how frequent it is for a CPU intensive code to actually run slower under PyPy. Is there hopefully something wrong I could have done that would prevent PyPy from running at its best.
EDIT
Exact cPython output:
PyPy 1.5:
3439146 function calls (3218654 primitive calls) in 19.094 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
2/1 0.000 0.000 18.956 18.956 <string>:1(<module>)
2/1 0.000 0.000 18.956 18.956 /path/to/my/project/common/integrity/models/transactions.py:200(newfn)
2/1 0.000 0.000 18.956 18.956 /path/to/my/project/common/integrity/models/transactions.py:134(recur)
2/1 0.000 0.000 18.956 18.956 /usr/local/pypy/site-packages/django/db/transaction.py:210(inner)
2/1 0.172 0.086 18.899 18.899 /path/to/my/project/common/integrity/tests/optimization.py:369(func_cached)
9990 0.122 0.000 18.632 0.002 /usr/local/pypy/site-packages/django/db/models/manager.py:131(get)
9990 0.127 0.000 16.638 0.002 /path/to/my/project/common/integrity/models/cache.py:1068(get)
9990 0.073 0.000 12.478 0.001 /usr/local/pypy/site-packages/django/db/models/query.py:547(filter)
9990 0.263 0.000 12.405 0.001 /path/to/my/project/common/integrity/models/cache.py:1047(_filter_or_exclude)
9990 0.226 0.000 12.096 0.001 /usr/local/pypy/site-packages/django/db/models/query.py:561(_filter_or_exclude)
9990 0.187 0.000 8.383 0.001 /path/to/my/project/common/integrity/models/cache.py:765(_clone)
9990 0.212 0.000 7.662 0.001 /usr/local/pypy/site-packages/django/db/models/query.py:772(_clone)
9990 1.025 0.000 7.125 0.001 /usr/local/pypy/site-packages/django/db/models/sql/query.py:226(clone)
129942/49972 1.674 0.000 6.021 0.000 /usr/local/pypy/lib-python/2.7/copy.py:145(deepcopy)
140575/110605 0.120 0.000 4.066 0.000 {len}
9990 0.182 0.000 3.972 0.000 /usr/local/pypy/site-packages/django/db/models/query.py:74(__len__)
19980 0.260 0.000 3.777 0.000 /path/to/my/project/common/integrity/models/cache.py:1062(iterator)
9990 0.255 0.000 3.154 0.000 /usr/local/pypy/site-packages/django/db/models/sql/query.py:1149(add_q)
9990 0.210 0.000 3.073 0.000 /path/to/my/project/common/integrity/models/cache.py:973(_query)
9990 0.371 0.000 2.316 0.000 /usr/local/pypy/site-packages/django/db/models/sql/query.py:997(add_filter)
9990 0.364 0.000 2.168 0.000 /path/to/my/project/common/integrity/models/cache.py:892(_deduct)
29974/9994 0.448 0.000 2.078 0.000 /usr/local/pypy/lib-python/2.7/copy.py:234(_deepcopy_tuple)
19990 0.362 0.000 2.065 0.000 /path/to/my/project/common/integrity/models/cache.py:566(__init__)
10000 0.086 0.000 1.874 0.000 /path/to/my/project/common/integrity/models/cache.py:1090(get_query_set)
19990 0.269 0.000 1.703 0.000 /usr/local/pypy/site-packages/django/db/models/query.py:31(__init__)
9990 0.122 0.000 1.643 0.000 /path/to/my/project/common/integrity/models/cache.py:836(_deduct_recur)
19980 0.274 0.000 1.636 0.000 /usr/local/pypy/site-packages/django/utils/tree.py:55(__deepcopy__)
9990 0.607 0.000 1.458 0.000 /path/to/my/project/common/integrity/models/cache.py:789(_deduct_local)
10020 0.633 0.000 1.437 0.000 /usr/local/pypy/site-packages/django/db/models/sql/query.py:99(__init__)
129942 0.841 0.000 1.191 0.000 /usr/local/pypy/lib-python/2.7/copy.py:267(_keep_alive)
9994/9992 0.201 0.000 1.019 0.000 /usr/local/pypy/lib-python/2.7/copy.py:306(_reconstruct)
Python 2.7:
3326403 function calls (3206359 primitive calls) in 12.430 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 12.457 12.457 <string>:1(<module>)
1 0.000 0.000 12.457 12.457 /path/to/my/project/common/integrity/models/transactions.py:200(newfn)
1 0.000 0.000 12.457 12.457 /path/to/my/project/common/integrity/models/transactions.py:134(recur)
1 0.000 0.000 12.457 12.457 /usr/local/lib/python2.7/dist-packages/django/db/transaction.py:210(inner)
1 0.000 0.000 12.457 12.457 /path/to/my/project/common/integrity/models/transactions.py:165(recur2)
1 0.089 0.089 12.450 12.450 /path/to/my/project/common/integrity/tests/optimization.py:369(func_cached)
9990 0.198 0.000 12.269 0.001 /usr/local/lib/python2.7/dist-packages/django/db/models/manager.py:131(get)
9990 0.087 0.000 11.281 0.001 /path/to/my/project/common/integrity/models/cache.py:1068(get)
9990 0.040 0.000 8.161 0.001 /usr/local/lib/python2.7/dist-packages/django/db/models/query.py:547(filter)
9990 0.110 0.000 8.121 0.001 /path/to/my/project/common/integrity/models/cache.py:1047(_filter_or_exclude)
9990 0.127 0.000 7.983 0.001 /usr/local/lib/python2.7/dist-packages/django/db/models/query.py:561(_filter_or_exclude)
9990 0.100 0.000 5.593 0.001 /path/to/my/project/common/integrity/models/cache.py:765(_clone)
9990 0.122 0.000 5.125 0.001 /usr/local/lib/python2.7/dist-packages/django/db/models/query.py:772(_clone)
9990 0.405 0.000 4.899 0.000 /usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py:226(clone)
129942/49972 1.456 0.000 4.505 0.000 /usr/lib/python2.7/copy.py:145(deepcopy)
129899/99929 0.191 0.000 3.117 0.000 {len}
9990 0.111 0.000 2.968 0.000 /usr/local/lib/python2.7/dist-packages/django/db/models/query.py:74(__len__)
19980 0.070 0.000 2.843 0.000 /path/to/my/project/common/integrity/models/cache.py:1062(iterator)
9990 0.208 0.000 2.190 0.000 /path/to/my/project/common/integrity/models/cache.py:973(_query)
9990 0.182 0.000 2.114 0.000 /usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py:1149(add_q)
19984/9994 0.291 0.000 1.644 0.000 /usr/lib/python2.7/copy.py:234(_deepcopy_tuple)
9990 0.288 0.000 1.599 0.000 /usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py:997(add_filter)
9990 0.171 0.000 1.454 0.000 /path/to/my/project/common/integrity/models/cache.py:892(_deduct)
19980 0.177 0.000 1.208 0.000 /usr/local/lib/python2.7/dist-packages/django/utils/tree.py:55(__deepcopy__)
9990 0.099 0.000 1.199 0.000 /path/to/my/project/common/integrity/models/cache.py:836(_deduct_recur)
9990 0.349 0.000 1.040 0.000 /path/to/my/project/common/integrity/models/cache.py:789(_deduct_local)
Brushing aside the fact that PyPy might really be intrinsically slower for your case, there are some factors that could be making it unnecessarily slower:
Profiling is known to slow PyPy a lot more than CPython.
Some debugging/logging code can disable optimizations (by, e.g., forcing frames).
The server you're using can be a dominant factor in performance (think about how awful classic CGI would be with a JIT: it would never warm up). It can also simply influence results (different WSGI servers have shown various speed-ups).
Old-style classes are slower than new-style ones.
Even if everything is in memory, you could be hitting e.g. slow paths in PyPy's SQLite.
You can also check the JIT Friendliness wiki page for more hints about what can make PyPy slower. A nightly build will probably be faster too, as there are many improvements relative to 1.5.
A more detailed description of your stack (server, OS, DB) and setup (how did you benchmark? how many queries?) would allow us to give better answers.
Related
Profiling Django app to figure out slow functions.
I just added some middleware to track function calls, following this blog: http://agiliq.com/blog/2015/07/profiling-django-middlewares/ and I see that the entry of cProfile stats for {posix.write} is one of the longest.
Any idea what that is, and where that comes from?
Other functions are referenced by their name and package path, so I'm not sure what {posix.write} means.
the log looks like this:
204051 function calls (197141 primitive calls) in 0.997 seconds
Ordered by: internal time
List reduced from 1204 to 50 due to restriction <50>
ncalls tottime percall cumtime percall filename:lineno(function)
35 0.305 0.009 0.305 0.009 {posix.write}
95 0.206 0.002 0.207 0.002 {method 'execute' of 'psycopg2.extensions.cursor' objects}
73 0.088 0.001 0.088 0.001 {select.select}
898 0.023 0.000 0.047 0.000 /.venv/lib/python2.7/site-packages/django/db/models/base.py:388(__init__)
1642 0.012 0.000 0.371 0.000 /.venv/lib/python2.7/site-packages/django/template/base.py:806(_resolve_lookup)
1 0.010 0.010 0.011 0.011 {_sass.compile_filename}
1 0.009 0.009 0.009 0.009 {psycopg2._psycopg._connect}
34 0.009 0.000 0.009 0.000 {method 'recv' of '_socket.socket' objects}
39 0.007 0.000 0.007 0.000 {posix.read}
9641/6353 0.006 0.000 0.321 0.000 {getattr}
173 0.006 0.000 0.026 0.000 /.venv/lib/python2.7/site-packages/django/core/urlresolvers.py:425(_reverse_with_prefix)
25769 0.006 0.000 0.007 0.000 {isinstance}
EDIT:
I understand that posix.write is the write function of posix. That I need to understand I guess is what part of Django uses that a lot and why it is showing up as taking 300+ms.
How would I go about tracking this down?
Thanks
I am using Python 2.7 (Anaconda distribution) on Windows 8.1 Pro.
I have a database of articles with their respective topics.
I am building an application which queries textual phrases in my database and associates article topics to each queried phrase. The topics are assigned based on the relevance of the phrase for the article.
The bottleneck seems to be Python socket communication with the localhost.
Here are my cProfile outputs:
topics_fit (PhraseVectorizer_1_1.py:668)
function called 1 times
1930698 function calls (1929630 primitive calls) in 148.209 seconds
Ordered by: cumulative time, internal time, call count
List reduced from 286 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.224 1.224 148.209 148.209 PhraseVectorizer_1_1.py:668(topics_fit)
206272 0.193 0.000 146.780 0.001 cursor.py:1041(next)
601 0.189 0.000 146.455 0.244 cursor.py:944(_refresh)
534 0.030 0.000 146.263 0.274 cursor.py:796(__send_message)
534 0.009 0.000 141.532 0.265 mongo_client.py:725(_send_message_with_response)
534 0.002 0.000 141.484 0.265 mongo_client.py:768(_reset_on_error)
534 0.019 0.000 141.482 0.265 server.py:69(send_message_with_response)
534 0.002 0.000 141.364 0.265 pool.py:225(receive_message)
535 0.083 0.000 141.362 0.264 network.py:106(receive_message)
1070 1.202 0.001 141.278 0.132 network.py:127(_receive_data_on_socket)
3340 140.074 0.042 140.074 0.042 {method 'recv' of '_socket.socket' objects}
535 0.778 0.001 4.700 0.009 helpers.py:88(_unpack_response)
535 3.828 0.007 3.920 0.007 {bson._cbson.decode_all}
67 0.099 0.001 0.196 0.003 {method 'sort' of 'list' objects}
206187 0.096 0.000 0.096 0.000 PhraseVectorizer_1_1.py:705(<lambda>)
206187 0.096 0.000 0.096 0.000 database.py:339(_fix_outgoing)
206187 0.074 0.000 0.092 0.000 objectid.py:68(__init__)
1068 0.005 0.000 0.054 0.000 server.py:135(get_socket)
1068/534 0.010 0.000 0.041 0.000 contextlib.py:21(__exit__)
1068 0.004 0.000 0.041 0.000 pool.py:501(get_socket)
534 0.003 0.000 0.028 0.000 pool.py:208(send_message)
534 0.009 0.000 0.026 0.000 pool.py:573(return_socket)
567 0.001 0.000 0.026 0.000 socket.py:227(meth)
535 0.024 0.000 0.024 0.000 {method 'sendall' of '_socket.socket' objects}
534 0.003 0.000 0.023 0.000 topology.py:134(select_server)
206806 0.020 0.000 0.020 0.000 collection.py:249(database)
418997 0.019 0.000 0.019 0.000 {len}
449 0.001 0.000 0.018 0.000 topology.py:143(select_server_by_address)
534 0.005 0.000 0.018 0.000 topology.py:82(select_servers)
1068/534 0.001 0.000 0.018 0.000 contextlib.py:15(__enter__)
534 0.002 0.000 0.013 0.000 thread_util.py:83(release)
207307 0.010 0.000 0.011 0.000 {isinstance}
534 0.005 0.000 0.011 0.000 pool.py:538(_get_socket_no_auth)
534 0.004 0.000 0.011 0.000 thread_util.py:63(release)
534 0.001 0.000 0.011 0.000 mongo_client.py:673(_get_topology)
535 0.003 0.000 0.010 0.000 topology.py:57(open)
206187 0.008 0.000 0.008 0.000 {method 'popleft' of 'collections.deque' objects}
535 0.002 0.000 0.007 0.000 topology.py:327(_apply_selector)
536 0.003 0.000 0.007 0.000 topology.py:286(_ensure_opened)
1071 0.004 0.000 0.007 0.000 periodic_executor.py:50(open)
In particular: {method 'recv' of '_socket.socket' objects} seems to cause trouble.
According to suggestions found in What can I do to improve socket performance in Python 3?, I tried gevent.
I added this snippet at the beginning of my script (before importing anything):
from gevent import monkey
monkey.patch_all()
This resulted in even slower performance...
*** PROFILER RESULTS ***
topics_fit (PhraseVectorizer_1_1.py:671)
function called 1 times
1956879 function calls (1951292 primitive calls) in 158.260 seconds
Ordered by: cumulative time, internal time, call count
List reduced from 427 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 158.170 158.170 hub.py:358(run)
1 0.000 0.000 158.170 158.170 {method 'run' of 'gevent.core.loop' objects}
2/1 1.286 0.643 158.166 158.166 PhraseVectorizer_1_1.py:671(topics_fit)
206272 0.198 0.000 156.670 0.001 cursor.py:1041(next)
601 0.192 0.000 156.203 0.260 cursor.py:944(_refresh)
534 0.029 0.000 156.008 0.292 cursor.py:796(__send_message)
534 0.012 0.000 150.514 0.282 mongo_client.py:725(_send_message_with_response)
534 0.002 0.000 150.439 0.282 mongo_client.py:768(_reset_on_error)
534 0.017 0.000 150.437 0.282 server.py:69(send_message_with_response)
551/535 0.002 0.000 150.316 0.281 pool.py:225(receive_message)
552/536 0.079 0.000 150.314 0.280 network.py:106(receive_message)
1104/1072 0.815 0.001 150.234 0.140 network.py:127(_receive_data_on_socket)
2427/2395 0.019 0.000 149.418 0.062 socket.py:381(recv)
608/592 0.003 0.000 48.541 0.082 socket.py:284(_wait)
552 0.885 0.002 5.464 0.010 helpers.py:88(_unpack_response)
552 4.475 0.008 4.577 0.008 {bson._cbson.decode_all}
3033 2.021 0.001 2.021 0.001 {method 'recv' of '_socket.socket' objects}
7/4 0.000 0.000 0.221 0.055 hub.py:189(_import)
4 0.127 0.032 0.221 0.055 {__import__}
67 0.104 0.002 0.202 0.003 {method 'sort' of 'list' objects}
536/535 0.003 0.000 0.142 0.000 topology.py:57(open)
537/536 0.002 0.000 0.139 0.000 topology.py:286(_ensure_opened)
1072/1071 0.003 0.000 0.138 0.000 periodic_executor.py:50(open)
537/536 0.001 0.000 0.136 0.000 server.py:33(open)
537/536 0.001 0.000 0.135 0.000 monitor.py:69(open)
20/19 0.000 0.000 0.132 0.007 topology.py:342(_update_servers)
4 0.000 0.000 0.131 0.033 hub.py:418(_get_resolver)
1 0.000 0.000 0.122 0.122 resolver_thread.py:13(__init__)
1 0.000 0.000 0.122 0.122 hub.py:433(_get_threadpool)
206187 0.081 0.000 0.101 0.000 objectid.py:68(__init__)
206187 0.100 0.000 0.100 0.000 database.py:339(_fix_outgoing)
206187 0.098 0.000 0.098 0.000 PhraseVectorizer_1_1.py:708(<lambda>)
1 0.073 0.073 0.093 0.093 threadpool.py:2(<module>)
2037 0.003 0.000 0.092 0.000 hub.py:159(get_hub)
2 0.000 0.000 0.090 0.045 thread.py:39(start_new_thread)
2 0.000 0.000 0.090 0.045 greenlet.py:195(spawn)
2 0.000 0.000 0.090 0.045 greenlet.py:74(__init__)
1 0.000 0.000 0.090 0.090 hub.py:259(__init__)
1102 0.004 0.000 0.078 0.000 pool.py:501(get_socket)
1068 0.005 0.000 0.074 0.000 server.py:135(get_socket)
This performance is somewhat unacceptable for my application - I would like it to be much faster (this is timed and profiled for a subset of ~20 documents, and I need to process few tens of thousands).
Any ideas on how to speed it up?
Much appreciated.
Edit:
Code snippet that I profiled:
# also tried monkey patching all here, see profiler
from pymongo import MongoClient
def topics_fit(self):
client = MongoClient()
# tried motor for multithreading - also slow
#client = motor.motor_tornado.MotorClient()
# initialize DB cursors
db_wiki = client.wiki
# initialize topic feature dictionary
self.topics = OrderedDict()
self.topic_mapping = OrderedDict()
vocabulary_keys = self.vocabulary.keys()
num_categories = 0
for phrase in vocabulary_keys:
phrase_tokens = phrase.split()
if len(phrase_tokens) > 1:
# query for current phrase
AND_phrase = "\"" + phrase + "\""
cursor = db_wiki.categories.find({ "$text" : { "$search": AND_phrase } },{ "score": { "$meta": "textScore" } })
cursor = list(cursor)
if cursor:
cursor.sort(key=lambda k: k["score"], reverse = True)
added_categories = cursor[0]["category_ids"]
for added_category in added_categories:
if not (added_category in self.topics):
self.topics[added_category] = num_categories
if not (self.vocabulary[phrase] in self.topic_mapping):
self.topic_mapping[self.vocabulary[phrase]] = [num_categories, ]
else:
self.topic_mapping[self.vocabulary[phrase]].append(num_categories)
num_categories+=1
else:
if not (self.vocabulary[phrase] in self.topic_mapping):
self.topic_mapping[self.vocabulary[phrase]] = [self.topics[added_category], ]
else:
self.topic_mapping[self.vocabulary[phrase]].append(self.topics[added_category])
Edit 2: output of index_information():
{u'_id_':
{u'ns': u'wiki.categories', u'key': [(u'_id', 1)], u'v': 1},
u'article_title_text_article_body_text_category_names_text': {u'default_language': u'english', u'weights': SON([(u'article_body', 1), (u'article_title', 1), (u'category_names', 1)]), u'key': [(u'_fts', u'text'), (u'_ftsx', 1)], u'v': 1, u'language_override': u'language', u'ns': u'wiki.categories', u'textIndexVersion': 2}}
I'm implementing a RANSAC algorithm for circle detection in images. I profiled the execution and I get:
13699392 function calls in 799.981 seconds
Random listing order was used
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 {time.time}
579810 0.564 0.000 0.564 0.000 {getattr}
289905 2.343 0.000 8.661 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/blas.py:226(_get_funcs)
579810 0.124 0.000 0.124 0.000 {method 'get' of 'dict' objects}
289905 0.645 0.000 2.676 0.000 {map}
2954 0.005 0.000 0.005 0.000 {method 'transpose' of 'numpy.ndarray' objects}
2954 0.023 0.000 0.464 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/shape_base.py:179(vstack)
2954 2.373 0.001 2.373 0.001 {method 'read' of 'cv2.VideoCapture' objects}
579810 0.966 0.000 2.031 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/function_base.py:550(asarray_chkfinite)
289905 10.164 0.000 24.316 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/basic.py:456(lstsq)
2954 1.090 0.000 1.090 0.000 {normalize}
1455433 3.827 0.000 3.827 0.000 {numpy.core.multiarray.array}
579810 2.899 0.000 3.148 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numerictypes.py:949(_can_coerce_all)
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty}
2954 32.544 0.011 795.875 0.269 git/tra-python-processer/tra/ransac.py:31(image_search)
289905 0.714 0.000 38.644 0.000 git/tra-python-processer/tra/features.py:44(__init__)
289905 2.157 0.000 2.157 0.000 {method 'randint' of 'mtrand.RandomState' objects}
1 0.005 0.005 0.005 0.005 {VideoCapture}
289905 1.026 0.000 1.026 0.000 {method 'astype' of 'numpy.generic' objects}
2954 0.006 0.000 0.010 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py:495(transpose)
289905 11.303 0.000 37.930 0.000 git/tra-python-processer/tra/features.py:48(__gen)
3496584 0.343 0.000 0.343 0.000 {len}
2954 0.344 0.000 0.344 0.000 {numpy.core.multiarray.concatenate}
2954 3.214 0.001 3.214 0.001 {numpy.core.multiarray.where}
869715 0.575 0.000 0.575 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2514(size)
869715 0.778 0.000 2.278 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:394(asarray)
289905 716.946 0.002 716.946 0.002 git/tra-python-processer/tra/features.py:89(points_distance)
5908 0.015 0.000 0.031 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:464(asanyarray)
289905 0.275 0.000 0.275 0.000 {isinstance}
289905 0.342 0.000 9.003 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/lapack.py:255(get_lapack_funcs)
5908 0.058 0.000 0.097 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/shape_base.py:60(atleast_2d)
295813 0.089 0.000 0.089 0.000 {method 'append' of 'list' objects}
289905 0.645 0.000 3.793 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numerictypes.py:970(find_common_type)
2954 0.221 0.000 0.221 0.000 {threshold}
1 0.000 0.000 0.000 0.000 {method 'get' of 'cv2.VideoCapture' objects}
1 0.000 0.000 0.000 0.000 git/tra-python-processer/tra/ransac.py:24(__init__)
2954 0.009 0.000 0.009 0.000 {numpy.core.multiarray.zeros}
579810 0.143 0.000 0.143 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/misc.py:126(_datacopied)
1 0.201 0.201 799.981 799.981 git/tra-python-processer/tra/ransac.py:122(video_processing)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2954 1.528 0.001 1.528 0.001 {cvtColor}
289905 1.280 0.000 5.346 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/blas.py:182(find_best_blas_type)
289905 0.198 0.000 0.198 0.000 {method 'index' of 'list' objects}
It's first time I use profiler, however for what I can understand the function that is most heavy is features.py:89(points_distance) that comes out to be a very easy implementation:
def points_distance(self,points):
d = n.abs(\
n.sqrt(\
n.power(self.xc - points[:,0],2) + n.power(self.yc - points[:,1],2)
)\
- self.radius
)
return d
Any suggestions? Maybe cython?
Use scipy.spatial.distance.cdist for the distance calculation in points_distance.
First, optimize your code in pure Python and numpy. Then if necessary port the critical parts to Cython. Since a number of functions are called repeatedly a few ~100000 times, you should get some speed up from Cython for those parts. Unless, of course, the computational bottleneck is in the distance calculation, which will then limit the overall execution time.
By the way, you should sort your profiler results by tottime so they are easier to read.
Implement a function with signature find_chars(string1, string2) that
takes two strings and returns a string that contains only the
characters found in string1 and string two in the order that they are
found in string1. Implement a version of order N*N and one of order N.
(Source: http://thereq.com/q/138/python-software-interview-question/characters-in-strings)
Here are my solutions:
Order N*N:
def find_chars_slow(string1, string2):
res = []
for char in string1:
if char in string2:
res.append(char)
return ''.join(res)
So the for loop goes through N elements, and each char in string2 check is another N operations so this gives N*N.
Order N:
from collections import defaultdict
def find_char_fast(string1, string2):
d = defaultdict(int)
for char in string2:
d[char] += 1
res = []
for char in string1:
if char in d:
res.append(char)
return ''.join(res)
First store the characters of string2 as a dictionary (O(N)). Then scan string1 (O(N)) and check if it is in the dict (O(1)). This gives a total runtime of O(2N) = O(N).
Is the above correct? Is there a faster method?
Your solution is algorithmically correct (the first is O(n**2), and the second is O(n)) but you're doing some things that are going to be possible red flags to an interviewer.
The first function is basically okay. You might get bonus points for writing it like this:
''.join([c for c in string1 if c in string2])
..which does essentially the same thing.
My problem (if I'm wearing my interviewer pants) with how you've written the second function is that you use a defaultdict where you don't care at all about the count - you only care about membership. This is the classic case for when to use a set.
seen = set(string2)
''.join([c for c in string1 if c in seen])
The way I've written these functions are going to be slightly faster than what you wrote, since list comprehensions loop in native code rather than in python bytecode. They are algorithmically the same complexity.
Your method is sound, and there is no method with time complexity less than O(N), since you obviously need to go through each character at least once.
That is not to say that there's no method that runs faster. There's no need to actually increment the numbers in the dictionary. You could, for example, use a set. You could also make further use of python's features, such as list comprehensions/generators:
def find_char_fast2(string1, string2):
s= set(string2)
return "".join( (x for x in string1 if x in s) )
The algorithms you have used are perfectly fine. There are few improvements you can do
Since you are converting the second string to a dictionary, I would recommend using set, like this
d = set(string2)
Apart from that you can use list comprehension, as a filter, like this
return "".join([char for char in string1 if char in d])
If the order of the characters in the output doesn't matter, you can simply convert both the strings to sets and simply find set difference, like this
return "".join(set(string1) - set(string2))
Hi, I am trying to profiling the various solution given here:
In my snippet, I am using a module called faker to generate fake words so i can test on very long string more than 20k characters:
Snippet:
from faker import Faker
from timeit import Timer
from collections import defaultdict
def first(string1, string2):
sets = set(string2)
return ''.join((c for c in string1 if c in sets))
def second(s1, s2):
res = []
for char in string1:
if char in string2:
res.append(char)
return ''.join(res)
def third(s1, s2):
d = defaultdict(int)
for char in string2:
d[char] += 1
res = []
for char in string1:
if char in d:
res.append(char)
return ''.join(res)
f = Faker()
string1 = ''.join(f.paragraph(nb_sentences=10000).split())
string2 = ''.join(f.paragraph(nb_sentences=10000).split())
funcs = [first, second, third]
import cProfile
print 'Length of String1: ', len(string1)
print 'Length of String2: ', len(string2)
print 'Time taken to execute:'
for f in funcs:
t = Timer(lambda: f(string1, string2))
print f.__name__, cProfile.run('t.timeit(number=100)')
Output:
Length of String1: 525133
Length of String2: 501050
Time taken to execute:
first 52513711 function calls in 18.169 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
100 0.001 0.000 18.164 0.182 s.py:39(<lambda>)
100 1.723 0.017 18.163 0.182 s.py:5(first)
52513400 9.442 0.000 9.442 0.000 s.py:7(<genexpr>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 18.169 18.169 timeit.py:178(timeit)
1 0.005 0.005 18.169 18.169 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 6.998 0.070 16.440 0.164 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
second 52513611 function calls in 22.280 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 22.280 22.280 <string>:1(<module>)
100 0.121 0.001 22.275 0.223 s.py:39(<lambda>)
100 16.957 0.170 22.153 0.222 s.py:9(second)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 22.280 22.280 timeit.py:178(timeit)
1 0.005 0.005 22.280 22.280 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
52513300 4.018 0.000 4.018 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 1.179 0.012 1.179 0.012 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
third 52513611 function calls in 28.184 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 28.184 28.184 <string>:1(<module>)
100 22.847 0.228 28.059 0.281 s.py:16(third)
100 0.120 0.001 28.179 0.282 s.py:39(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 28.184 28.184 timeit.py:178(timeit)
1 0.005 0.005 28.184 28.184 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
52513300 4.032 0.000 4.032 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 1.180 0.012 1.180 0.012 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
Conclusion:
So, the first function with comprehension is the fastest.
But when you run on strings size around 25K characters second functions wins.
Length of String1: 22959
Length of String2: 452919
Time taken to execute:
first 2296311 function calls in 2.216 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
100 0.000 0.000 2.216 0.022 s.py:39(<lambda>)
100 1.530 0.015 2.216 0.022 s.py:5(first)
2296000 0.402 0.000 0.402 0.000 s.py:7(<genexpr>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 2.216 2.216 timeit.py:178(timeit)
1 0.000 0.000 2.216 2.216 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 0.284 0.003 0.686 0.007 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
second 2296211 function calls in 0.939 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.939 0.939 <string>:1(<module>)
100 0.003 0.000 0.939 0.009 s.py:39(<lambda>)
100 0.729 0.007 0.936 0.009 s.py:9(second)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 0.939 0.939 timeit.py:178(timeit)
1 0.000 0.000 0.939 0.939 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
2295900 0.165 0.000 0.165 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 0.042 0.000 0.042 0.000 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
third 2296211 function calls in 8.361 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 8.361 8.361 <string>:1(<module>)
100 8.145 0.081 8.357 0.084 s.py:16(third)
100 0.004 0.000 8.361 0.084 s.py:39(<lambda>)
1 0.000 0.000 0.000 0.000 timeit.py:143(setup)
1 0.000 0.000 8.361 8.361 timeit.py:178(timeit)
1 0.000 0.000 8.361 8.361 timeit.py:96(inner)
1 0.000 0.000 0.000 0.000 {gc.disable}
1 0.000 0.000 0.000 0.000 {gc.enable}
1 0.000 0.000 0.000 0.000 {gc.isenabled}
1 0.000 0.000 0.000 0.000 {globals}
2295900 0.169 0.000 0.169 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
100 0.043 0.000 0.043 0.000 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {time.time}
None
I have a CentOS guest running in virtualbox. It runs apache and django. All my django website source files are in a windows host directory. I mounted this directory in CentOS. The file system is vboxsf.
The problem is, when I access the guest Apache url in windows host browser, It loads very slow. I mean the browser waiting time is around 17 seconds before the page load.
To investigate this, I used python profiling and I'm not able to find the issue using this profiler data. Please find below the profiler data.
ncalls tottime percall cumtime percall filename:lineno(function)
578 4.300 0.007 7.650 0.013 /usr/local/python2.7/lib/python2.7/zipfile.py:755(_RealGetContents)
345837 1.146 0.000 1.520 0.000 /usr/local/python2.7/lib/python2.7/zipfile.py:277(__init__)
1383348 0.752 0.000 0.752 0.000 {method 'read' of 'cStringIO.StringI' objects}
578 0.560 0.001 9.182 0.016 build/bdist.linux-x86_64/egg/pkg_resources.py:1452(build_zipmanifest)
347095 0.417 0.000 0.417 0.000 {_struct.unpack}
575 0.285 0.000 9.738 0.017 build/bdist.linux-x86_64/egg/pkg_resources.py:887(resource_stream)
345837 0.273 0.000 0.273 0.000 /usr/local/python2.7/lib/python2.7/zipfile.py:368(_decodeExtra)
345837 0.258 0.000 0.401 0.000 /usr/local/python2.7/lib/python2.7/zipfile.py:854(getinfo)
769042 0.248 0.000 0.248 0.000 {method 'append' of 'list' objects}
345906 0.212 0.000 0.212 0.000 {method 'find' of 'str' objects}
345837 0.207 0.000 0.207 0.000 /usr/local/python2.7/lib/python2.7/zipfile.py:362(_decodeFilename)
346850 0.205 0.000 0.205 0.000 {method 'replace' of 'str' objects}
578 0.204 0.000 0.292 0.001 /usr/local/python2.7/lib/python2.7/zipfile.py:822(namelist)
2579/621 0.173 0.000 0.363 0.001 /usr/local/python2.7/lib/python2.7/sre_parse.py:379(_parse)
345957 0.162 0.000 0.162 0.000 {chr}
356098 0.153 0.000 0.153 0.000 {method 'get' of 'dict' objects}
22293 0.084 0.000 0.096 0.000 /usr/local/python2.7/lib/python2.7/sre_parse.py:182(__next)
600 0.080 0.000 0.080 0.000 {method 'get_data' of 'zipimport.zipimporter' objects}
3896/608 0.071 0.000 0.193 0.000 /usr/local/python2.7/lib/python2.7/sre_compile.py:32(_compile)
1 0.068 0.068 0.068 0.068 /usr/local/python2.7/lib/python2.7/site-packages/celery-3.0.16-py2.7.egg/celery/backends/base.py:15()
578 0.056 0.000 9.291 0.016 build/bdist.linux-x86_64/egg/pkg_resources.py:1490(__init__)
5054/1785 0.052 0.000 0.062 0.000 /usr/local/python2.7/lib/python2.7/sre_parse.py:140(getwidth)
894 0.052 0.000 0.806 0.001 /usr/local/python2.7/lib/python2.7/re.py:226(_compile)
608 0.052 0.000 0.143 0.000 /usr/local/python2.7/lib/python2.7/sre_compile.py:361(_compile_info)
1287 0.040 0.000 0.083 0.000 /usr/local/python2.7/lib/python2.7/sre_compile.py:207(_optimize_charset)
1 0.039 0.039 0.060 0.060 /usr/local/python2.7/lib/python2.7/site-packages/ZSI-2.1_a1-py2.7.egg/ZSI/wstools/WSDLTools.py:10()
37496 0.039 0.000 0.039 0.000 {isinstance}
383/164 0.038 0.000 11.982 0.073 {__import__}
1 0.037 0.037 0.190 0.190 /usr/local/python2.7/lib/python2.7/site-packages/ZSI-2.1_a1-py2.7.egg/ZSI/__init__.py:6()
575 0.036 0.000 9.841 0.017 /usr/local/python2.7/lib/python2.7/site-packages/pytz-2012h-py2.7.egg/pytz/__init__.py:84(open_resource)
5 0.032 0.006 0.032 0.006 {method 'commit' of '_mysql.connection' objects}
3 0.031 0.010 0.033 0.011 /usr/local/python2.7/lib/python2.7/site-packages/django/core/cache/backends/memcached.py:153(__init__)
I thought shared file system is causing the problem, so I just copied the entire code base to CentOS guest locally but, again I get same performance issue.
Any help would be appreciated. Thank you.
EDIT:
Guest spec
OS: CentOS 5.8
RAM: 2GB
STORAGE: 10GB Dynamically allocated.
The problem is Windows host directory.
When you request a file from Apache it will most likely fork a new UNIX process which need to load the whole Python + Django stack to memory. Doing this roundtrip file system reads over SMB networked file system from Windows partition which is very expensive.
My suggestion is to have all files inside the guest OS and that should bring up speed up a lot.
Alternative ditch Windows altogether and run your whole development environment inside the guest OS.