I have a CentOS guest running in VirtualBox. It runs Apache and Django. All my Django website source files are in a Windows host directory, which I mounted in the CentOS guest; the file system is vboxsf.
The problem is that when I access the guest's Apache URL from a browser on the Windows host, the page loads very slowly: the browser waits around 17 seconds before the page loads.
To investigate, I profiled the request with Python's profiler, but I can't find the issue from the profiler data. Please find the data below.
ncalls tottime percall cumtime percall filename:lineno(function)
578 4.300 0.007 7.650 0.013 /usr/local/python2.7/lib/python2.7/zipfile.py:755(_RealGetContents)
345837 1.146 0.000 1.520 0.000 /usr/local/python2.7/lib/python2.7/zipfile.py:277(__init__)
1383348 0.752 0.000 0.752 0.000 {method 'read' of 'cStringIO.StringI' objects}
578 0.560 0.001 9.182 0.016 build/bdist.linux-x86_64/egg/pkg_resources.py:1452(build_zipmanifest)
347095 0.417 0.000 0.417 0.000 {_struct.unpack}
575 0.285 0.000 9.738 0.017 build/bdist.linux-x86_64/egg/pkg_resources.py:887(resource_stream)
345837 0.273 0.000 0.273 0.000 /usr/local/python2.7/lib/python2.7/zipfile.py:368(_decodeExtra)
345837 0.258 0.000 0.401 0.000 /usr/local/python2.7/lib/python2.7/zipfile.py:854(getinfo)
769042 0.248 0.000 0.248 0.000 {method 'append' of 'list' objects}
345906 0.212 0.000 0.212 0.000 {method 'find' of 'str' objects}
345837 0.207 0.000 0.207 0.000 /usr/local/python2.7/lib/python2.7/zipfile.py:362(_decodeFilename)
346850 0.205 0.000 0.205 0.000 {method 'replace' of 'str' objects}
578 0.204 0.000 0.292 0.001 /usr/local/python2.7/lib/python2.7/zipfile.py:822(namelist)
2579/621 0.173 0.000 0.363 0.001 /usr/local/python2.7/lib/python2.7/sre_parse.py:379(_parse)
345957 0.162 0.000 0.162 0.000 {chr}
356098 0.153 0.000 0.153 0.000 {method 'get' of 'dict' objects}
22293 0.084 0.000 0.096 0.000 /usr/local/python2.7/lib/python2.7/sre_parse.py:182(__next)
600 0.080 0.000 0.080 0.000 {method 'get_data' of 'zipimport.zipimporter' objects}
3896/608 0.071 0.000 0.193 0.000 /usr/local/python2.7/lib/python2.7/sre_compile.py:32(_compile)
1 0.068 0.068 0.068 0.068 /usr/local/python2.7/lib/python2.7/site-packages/celery-3.0.16-py2.7.egg/celery/backends/base.py:15(<module>)
578 0.056 0.000 9.291 0.016 build/bdist.linux-x86_64/egg/pkg_resources.py:1490(__init__)
5054/1785 0.052 0.000 0.062 0.000 /usr/local/python2.7/lib/python2.7/sre_parse.py:140(getwidth)
894 0.052 0.000 0.806 0.001 /usr/local/python2.7/lib/python2.7/re.py:226(_compile)
608 0.052 0.000 0.143 0.000 /usr/local/python2.7/lib/python2.7/sre_compile.py:361(_compile_info)
1287 0.040 0.000 0.083 0.000 /usr/local/python2.7/lib/python2.7/sre_compile.py:207(_optimize_charset)
1 0.039 0.039 0.060 0.060 /usr/local/python2.7/lib/python2.7/site-packages/ZSI-2.1_a1-py2.7.egg/ZSI/wstools/WSDLTools.py:10(<module>)
37496 0.039 0.000 0.039 0.000 {isinstance}
383/164 0.038 0.000 11.982 0.073 {__import__}
1 0.037 0.037 0.190 0.190 /usr/local/python2.7/lib/python2.7/site-packages/ZSI-2.1_a1-py2.7.egg/ZSI/__init__.py:6(<module>)
575 0.036 0.000 9.841 0.017 /usr/local/python2.7/lib/python2.7/site-packages/pytz-2012h-py2.7.egg/pytz/__init__.py:84(open_resource)
5 0.032 0.006 0.032 0.006 {method 'commit' of '_mysql.connection' objects}
3 0.031 0.010 0.033 0.011 /usr/local/python2.7/lib/python2.7/site-packages/django/core/cache/backends/memcached.py:153(__init__)
I thought the shared file system was causing the problem, so I copied the entire code base locally into the CentOS guest, but I still get the same performance issue.
Any help would be appreciated. Thank you.
EDIT:
Guest spec
OS: CentOS 5.8
RAM: 2GB
STORAGE: 10GB Dynamically allocated.
The problem is the Windows host directory.
When you request a page, Apache will most likely fork a new UNIX process, which needs to load the whole Python + Django stack into memory. Doing this involves round-trip file system reads over the SMB-style networked file system (vboxsf) from the Windows partition, which is very expensive.
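One way to check that these per-request imports are really where the time goes (rather than Apache itself) is to time the heavy imports directly inside the guest, once from the shared directory and once from a local copy. A rough sketch, assuming the same interpreter your WSGI process uses and the zipped eggs seen in the profile (pytz etc.) on its path:

# time_imports.py - run with the interpreter your Apache/WSGI process uses
import time

start = time.time()
import django
import pytz
pytz.timezone("Europe/London")   # loading a zone goes through pkg_resources/zipfile
print("cold import + timezone load took %.2f s" % (time.time() - start))

If that number accounts for a large share of the 17 seconds, making those reads cheap (local files, as suggested below) is the fix.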
My suggestion is to keep all files inside the guest OS; that should speed things up a lot.
Alternatively, ditch Windows altogether and run your whole development environment inside the guest OS.
Currently I have this result:
That's not bad (I guess), but I'm wondering if I can speed things up a little bit.
I've looked at the penultimate query and don't really know how to speed it up; I guess I should get rid of the join, but I don't know how.
I'm already using prefetch_related in my viewset; here it is:
class GameViewSet(viewsets.ModelViewSet):
    queryset = Game.objects.prefetch_related(
        "timestamp",
        "fighters",
        "score",
        "coefs",
        "rounds",
        "rounds_view",
        "rounds_view_f",
        "finishes",
        "rounds_time",
        "round_time",
        "time_coef",
        "totals",
    ).all()
    serializer_class = GameSerializer
    permission_classes = [AllowAny]
    pagination_class = StandardResultsSetPagination

    @silk_profile(name="Get Games")
    def list(self, request):
        qs = self.get_queryset().order_by("-timestamp__ts")
        page = self.paginate_queryset(qs)
        if page is not None:
            serializer = GameSerializer(page, many=True)
            return self.get_paginated_response(serializer.data)
        serializer = self.get_serializer(qs, many=True)
        return Response(serializer.data)
Is the join happening because I'm ordering on a related field?
My models look like:
class Game(models.Model):
    id = models.AutoField(primary_key=True)
    ...

class Timestamp(models.Model):
    id = models.AutoField(primary_key=True)
    game = models.ForeignKey(Game, related_name="timestamp", on_delete=models.CASCADE)
    ts = models.DateTimeField(db_index=True)
    time_of_day = models.TimeField()
And my serializers:
class TimestampSerializer(serializers.Serializer):
    ts = serializers.DateTimeField(read_only=True)
    time_of_day = serializers.TimeField(read_only=True)

class GameSerializer(serializers.Serializer):
    id = serializers.IntegerField(read_only=True)
    timestamp = TimestampSerializer(many=True)
    fighters = FighterSerializer(many=True)
    score = ScoreSerializer(many=True)
    coefs = CoefsSerializer(many=True)
    rounds = RoundsSerializer(many=True)
    rounds_view = RoundsViewSerializer(many=True)
    rounds_view_f = RoundsViewFinishSerializer(many=True)
    finishes = FinishesSerializer(many=True)
    rounds_time = RoundTimesSerializer(many=True)
    round_time = RoundTimeSerializer(many=True)
    time_coef = TimeCoefsSerializer(many=True)
    totals = TotalsSerializer(many=True)
Also, the results of profiling:
166039 function calls (159016 primitive calls) in 3.226 seconds
Ordered by: internal time
List reduced from 677 to 100 due to restriction <100>
ncalls tottime percall cumtime percall filename:lineno(function)
20959/20958 0.206 0.000 0.283 0.000 {built-in method builtins.isinstance}
2700 0.123 0.000 0.359 0.000 fields.py:62(is_simple_callable)
390/30 0.113 0.000 1.211 0.040 serializers.py:493(to_representation)
8943/8473 0.098 0.000 0.307 0.000 {built-in method builtins.getattr}
2700 0.096 0.000 0.650 0.000 fields.py:85(get_attribute)
14 0.068 0.005 0.130 0.009 traceback.py:388(format)
7653 0.065 0.000 0.065 0.000 {method 'append' of 'list' objects}
28 0.064 0.002 0.072 0.003 {method 'execute' of 'psycopg2.extensions.cursor' objects}
390 0.062 0.000 0.153 0.000 base.py:406(__init__)
3090 0.060 0.000 0.257 0.000 serializers.py:359(_readable_fields)
3090 0.059 0.000 0.093 0.000 _collections_abc.py:760(__iter__)
6390 0.055 0.000 0.055 0.000 {built-in method builtins.hasattr}
1440 0.054 0.000 0.112 0.000 related.py:652(get_instance_value_for_fields)
2749 0.052 0.000 0.078 0.000 abc.py:96(__instancecheck__)
388 0.052 0.000 0.107 0.000 query.py:303(clone)
2700 0.049 0.000 0.072 0.000 inspect.py:158(isfunction)
2700 0.048 0.000 0.699 0.000 fields.py:451(get_attribute)
2702 0.048 0.000 0.071 0.000 inspect.py:80(ismethod)
2701 0.047 0.000 0.070 0.000 inspect.py:285(isbuiltin)
14 0.047 0.003 0.189 0.014 traceback.py:321(extract)
3786/3426 0.043 0.000 0.123 0.000 {built-in method builtins.setattr}
4445/247 0.038 0.000 1.936 0.008 {built-in method builtins.len}
360 0.035 0.000 0.374 0.001 related_descriptors.py:575(_apply_rel_filters)
2247 0.034 0.000 0.088 0.000 traceback.py:285(line)
3203 0.029 0.000 0.029 0.000 {method 'copy' of 'dict' objects}
12 0.028 0.002 1.836 0.153 query.py:1831(prefetch_one_level)
2749 0.026 0.000 0.026 0.000 {built-in method _abc._abc_instancecheck}
2700 0.024 0.000 0.024 0.000 serializer_helpers.py:154(__getitem__)
2780 0.024 0.000 0.024 0.000 {method 'get' of 'dict' objects}
360 0.023 0.000 0.069 0.000 related_lookups.py:26(get_normalized_value)
720 0.022 0.000 0.458 0.001 related_descriptors.py:615(get_queryset)
744 0.022 0.000 0.087 0.000 related_descriptors.py:523(__get__)
360 0.022 0.000 0.057 0.000 related_descriptors.py:203(__set__)
470 0.022 0.000 0.081 0.000 local.py:46(_get_context_id)
749 0.020 0.000 0.048 0.000 linecache.py:15(getline)
720 0.019 0.000 0.031 0.000 lookups.py:252(resolve_expression_parameter)
361/1 0.018 0.000 1.211 1.211 serializers.py:655(to_representation)
296/14 0.018 0.000 0.084 0.006 copy.py:128(deepcopy)
470 0.018 0.000 0.106 0.000 local.py:82(_get_storage)
732 0.017 0.000 0.043 0.000 related_descriptors.py:560(__init__)
720 0.017 0.000 0.040 0.000 related_descriptors.py:76(__set__)
1185/1151 0.017 0.000 0.032 0.000 {method 'join' of 'str' objects}
1501 0.017 0.000 0.017 0.000 {method 'format' of 'str' objects}
732 0.016 0.000 0.026 0.000 manager.py:26(__init__)
372 0.016 0.000 0.414 0.001 query.py:951(_filter_or_exclude)
14 0.016 0.001 0.028 0.002 traceback.py:369(from_list)
759 0.015 0.000 0.040 0.000 query.py:178(__init__)
387 0.015 0.000 0.143 0.000 query.py:1308(_clone)
732 0.015 0.000 0.022 0.000 manager.py:20(__new__)
1710 0.015 0.000 0.015 0.000 {built-in method __new__ of type object at 0x7fa87d9ad940}
720 0.015 0.000 0.034 0.000 __init__.py:1818(get_prep_value)
470 0.014 0.000 0.033 0.000 sync.py:469(get_current_task)
390 0.014 0.000 0.174 0.000 base.py:507(from_db)
1365 0.014 0.000 0.014 0.000 {method 'update' of 'dict' objects}
749 0.013 0.000 0.022 0.000 linecache.py:37(getlines)
744 0.013 0.000 0.044 0.000 lookups.py:266()
12 0.013 0.001 0.054 0.004 lookups.py:230(get_prep_lookup)
749 0.013 0.000 0.020 0.000 linecache.py:147(lazycache)
720 0.013 0.000 0.066 0.000 related.py:646(get_local_related_value)
720 0.013 0.000 0.071 0.000 related.py:649(get_foreign_related_value)
720 0.013 0.000 0.019 0.000 __init__.py:824(get_prep_value)
1506 0.013 0.000 0.013 0.000 {method 'strip' of 'str' objects}
372/12 0.013 0.000 0.026 0.002 query.py:1088(resolve_lookup_value)
638 0.013 0.000 0.018 0.000 threading.py:1306(current_thread)
732 0.013 0.000 0.021 0.000 reverse_related.py:200(get_cache_name)
720 0.013 0.000 0.018 0.000 base.py:573(_get_pk_val)
360 0.012 0.000 0.021 0.000 __init__.py:543(__hash__)
470 0.011 0.000 0.117 0.000 local.py:101(__getattr__)
28 0.011 0.000 0.089 0.003 compiler.py:199(get_select)
385 0.011 0.000 0.857 0.002 query.py:265(__iter__)
320 0.011 0.000 0.019 0.000 __init__.py:515(__eq__)
387 0.011 0.000 0.120 0.000 query.py:354(chain)
780 0.011 0.000 0.021 0.000 dispatcher.py:159(send)
360 0.010 0.000 0.031 0.000 related.py:976(get_prep_value)
387 0.010 0.000 0.157 0.000 query.py:1296(_chain)
372 0.010 0.000 0.014 0.000 query.py:151(__init__)
403 0.010 0.000 0.914 0.002 query.py:45(__iter__)
1204 0.010 0.000 0.010 0.000 {built-in method builtins.iter}
360 0.010 0.000 0.016 0.000 :1017(_handle_fromlist)
372 0.010 0.000 0.434 0.001 query.py:935(filter)
360 0.010 0.000 0.023 0.000 query.py:1124(check_query_object_type)
360 0.009 0.000 0.017 0.000 mixins.py:21(is_cached)
1152 0.009 0.000 0.009 0.000 query.py:194(query)
104 0.009 0.000 0.017 0.000 fields.py:323(__init__)
732 0.009 0.000 0.009 0.000 manager.py:120(_set_creation_counter)
470 0.009 0.000 0.013 0.000 :389(parent)
470 0.009 0.000 0.014 0.000 tasks.py:34(current_task)
459 0.009 0.000 0.013 0.000 deconstruct.py:14(__new__)
870 0.009 0.000 0.009 0.000 fields.py:810(to_representation)
322 0.009 0.000 0.019 0.000 linecache.py:53(checkcache)
318/266 0.009 0.000 0.145 0.001 compiler.py:434(compile)
763 0.008 0.000 0.008 0.000 traceback.py:292(walk_stack)
494 0.008 0.000 0.014 0.000 compiler.py:417(quote_name_unless_alias)
396 0.008 0.000 0.019 0.000 related.py:710(get_path_info)
374 0.008 0.000 0.011 0.000 utils.py:237(_route_db)
796 0.008 0.000 0.008 0.000 tree.py:21(__init__)
322 0.008 0.000 0.008 0.000 {built-in method posix.stat}
413/365 0.008 0.000 1.938 0.005 query.py:1322(_fetch_all)
12 0.008 0.001 1.244 0.104 related_descriptors.py:622(get_prefetch_queryset)
732 0.008 0.000 0.008 0.000 reverse_related.py:180(get_accessor_name)
And visual representation:
So, my question is: how can I speed it up?
On the database side, you could set up a materialized view for your query and trigger a refresh any time a new timestamp is added (or whatever else happens in your application that requires a refresh). That way the results are pre-calculated for your lookup. However, I suppose there could be edge cases where, if you update and look up at the same time, you end up with a stale, pre-refresh result. It'd be a trade-off, and only you know whether that'd be worth it...
In any case, I did not come up with this myself; check out e.g. https://hashrocket.com/blog/posts/materialized-view-strategies-using-postgresql
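A minimal sketch of what that could look like with Django on PostgreSQL, purely illustrative (the view, table and column names below are invented and would have to be adapted to your schema):

# Illustrative only: names are made up, adapt to your app/schema.
from django.db import connection, models

def create_game_feed_view():
    # one-off setup (PostgreSQL 9.5+): pre-compute the game/timestamp join
    with connection.cursor() as cursor:
        cursor.execute("""
            CREATE MATERIALIZED VIEW IF NOT EXISTS game_feed AS
            SELECT g.id AS game_id, t.ts
            FROM   games_game g
            JOIN   games_timestamp t ON t.game_id = g.id;
        """)

class GameFeed(models.Model):
    # read-only model mapped onto the view so it can be queried via the ORM
    game_id = models.IntegerField(primary_key=True)
    ts = models.DateTimeField()

    class Meta:
        managed = False          # Django never creates or migrates this
        db_table = "game_feed"

def refresh_game_feed():
    # call whenever a Timestamp is added, e.g. from a post_save signal
    with connection.cursor() as cursor:
        cursor.execute("REFRESH MATERIALIZED VIEW game_feed;")

The linked post compares several refresh strategies (eager vs. lazy), which is the main design choice here.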
Profiling Django app to figure out slow functions.
I just added some middleware to track function calls, following this blog post: http://agiliq.com/blog/2015/07/profiling-django-middlewares/ and I see that the {posix.write} entry in the cProfile stats is one of the longest.
Any idea what that is, and where that comes from?
Other functions are referenced by their name and package path, so I'm not sure what {posix.write} means.
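For context, the middleware is roughly along these lines (a minimal sketch of the approach from the agiliq post, not my exact code):

# old-style Django middleware that profiles a view when ?profile is in the URL
import cProfile
import pstats
import StringIO

class ProfileMiddleware(object):
    def process_view(self, request, view_func, view_args, view_kwargs):
        if 'profile' in request.GET:
            self.profiler = cProfile.Profile()
            return self.profiler.runcall(view_func, request, *view_args, **view_kwargs)

    def process_response(self, request, response):
        if 'profile' in request.GET:
            self.profiler.create_stats()
            out = StringIO.StringIO()
            pstats.Stats(self.profiler, stream=out).sort_stats('tottime').print_stats(50)
            response.content = '<pre>%s</pre>' % out.getvalue()
        return response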
The log looks like this:
204051 function calls (197141 primitive calls) in 0.997 seconds
Ordered by: internal time
List reduced from 1204 to 50 due to restriction <50>
ncalls tottime percall cumtime percall filename:lineno(function)
35 0.305 0.009 0.305 0.009 {posix.write}
95 0.206 0.002 0.207 0.002 {method 'execute' of 'psycopg2.extensions.cursor' objects}
73 0.088 0.001 0.088 0.001 {select.select}
898 0.023 0.000 0.047 0.000 /.venv/lib/python2.7/site-packages/django/db/models/base.py:388(__init__)
1642 0.012 0.000 0.371 0.000 /.venv/lib/python2.7/site-packages/django/template/base.py:806(_resolve_lookup)
1 0.010 0.010 0.011 0.011 {_sass.compile_filename}
1 0.009 0.009 0.009 0.009 {psycopg2._psycopg._connect}
34 0.009 0.000 0.009 0.000 {method 'recv' of '_socket.socket' objects}
39 0.007 0.000 0.007 0.000 {posix.read}
9641/6353 0.006 0.000 0.321 0.000 {getattr}
173 0.006 0.000 0.026 0.000 /.venv/lib/python2.7/site-packages/django/core/urlresolvers.py:425(_reverse_with_prefix)
25769 0.006 0.000 0.007 0.000 {isinstance}
EDIT:
I understand that posix.write is the POSIX write function. What I need to understand, I guess, is which part of Django uses it so heavily and why it shows up as taking 300+ ms.
How would I go about tracking this down?
Thanks
I am using Python 2.7 (Anaconda distribution) on Windows 8.1 Pro.
I have a database of articles with their respective topics.
I am building an application which queries textual phrases in my database and associates article topics to each queried phrase. The topics are assigned based on the relevance of the phrase for the article.
The bottleneck seems to be Python's socket communication with localhost.
Here are my cProfile outputs:
topics_fit (PhraseVectorizer_1_1.py:668)
function called 1 times
1930698 function calls (1929630 primitive calls) in 148.209 seconds
Ordered by: cumulative time, internal time, call count
List reduced from 286 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.224 1.224 148.209 148.209 PhraseVectorizer_1_1.py:668(topics_fit)
206272 0.193 0.000 146.780 0.001 cursor.py:1041(next)
601 0.189 0.000 146.455 0.244 cursor.py:944(_refresh)
534 0.030 0.000 146.263 0.274 cursor.py:796(__send_message)
534 0.009 0.000 141.532 0.265 mongo_client.py:725(_send_message_with_response)
534 0.002 0.000 141.484 0.265 mongo_client.py:768(_reset_on_error)
534 0.019 0.000 141.482 0.265 server.py:69(send_message_with_response)
534 0.002 0.000 141.364 0.265 pool.py:225(receive_message)
535 0.083 0.000 141.362 0.264 network.py:106(receive_message)
1070 1.202 0.001 141.278 0.132 network.py:127(_receive_data_on_socket)
3340 140.074 0.042 140.074 0.042 {method 'recv' of '_socket.socket' objects}
535 0.778 0.001 4.700 0.009 helpers.py:88(_unpack_response)
535 3.828 0.007 3.920 0.007 {bson._cbson.decode_all}
67 0.099 0.001 0.196 0.003 {method 'sort' of 'list' objects}
206187 0.096 0.000 0.096 0.000 PhraseVectorizer_1_1.py:705(<lambda>)
206187 0.096 0.000 0.096 0.000 database.py:339(_fix_outgoing)
206187 0.074 0.000 0.092 0.000 objectid.py:68(__init__)
1068 0.005 0.000 0.054 0.000 server.py:135(get_socket)
1068/534 0.010 0.000 0.041 0.000 contextlib.py:21(__exit__)
1068 0.004 0.000 0.041 0.000 pool.py:501(get_socket)
534 0.003 0.000 0.028 0.000 pool.py:208(send_message)
534 0.009 0.000 0.026 0.000 pool.py:573(return_socket)
567 0.001 0.000 0.026 0.000 socket.py:227(meth)
535 0.024 0.000 0.024 0.000 {method 'sendall' of '_socket.socket' objects}
534 0.003 0.000 0.023 0.000 topology.py:134(select_server)
206806 0.020 0.000 0.020 0.000 collection.py:249(database)
418997 0.019 0.000 0.019 0.000 {len}
449 0.001 0.000 0.018 0.000 topology.py:143(select_server_by_address)
534 0.005 0.000 0.018 0.000 topology.py:82(select_servers)
1068/534 0.001 0.000 0.018 0.000 contextlib.py:15(__enter__)
534 0.002 0.000 0.013 0.000 thread_util.py:83(release)
207307 0.010 0.000 0.011 0.000 {isinstance}
534 0.005 0.000 0.011 0.000 pool.py:538(_get_socket_no_auth)
534 0.004 0.000 0.011 0.000 thread_util.py:63(release)
534 0.001 0.000 0.011 0.000 mongo_client.py:673(_get_topology)
535 0.003 0.000 0.010 0.000 topology.py:57(open)
206187 0.008 0.000 0.008 0.000 {method 'popleft' of 'collections.deque' objects}
535 0.002 0.000 0.007 0.000 topology.py:327(_apply_selector)
536 0.003 0.000 0.007 0.000 topology.py:286(_ensure_opened)
1071 0.004 0.000 0.007 0.000 periodic_executor.py:50(open)
In particular: {method 'recv' of '_socket.socket' objects} seems to cause trouble.
According to the suggestions found in "What can I do to improve socket performance in Python 3?", I tried gevent.
I added this snippet at the beginning of my script (before importing anything):
from gevent import monkey
monkey.patch_all()
This resulted in even slower performance...
*** PROFILER RESULTS ***
topics_fit (PhraseVectorizer_1_1.py:671)
function called 1 times
1956879 function calls (1951292 primitive calls) in 158.260 seconds
Ordered by: cumulative time, internal time, call count
List reduced from 427 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 158.170 158.170 hub.py:358(run)
1 0.000 0.000 158.170 158.170 {method 'run' of 'gevent.core.loop' objects}
2/1 1.286 0.643 158.166 158.166 PhraseVectorizer_1_1.py:671(topics_fit)
206272 0.198 0.000 156.670 0.001 cursor.py:1041(next)
601 0.192 0.000 156.203 0.260 cursor.py:944(_refresh)
534 0.029 0.000 156.008 0.292 cursor.py:796(__send_message)
534 0.012 0.000 150.514 0.282 mongo_client.py:725(_send_message_with_response)
534 0.002 0.000 150.439 0.282 mongo_client.py:768(_reset_on_error)
534 0.017 0.000 150.437 0.282 server.py:69(send_message_with_response)
551/535 0.002 0.000 150.316 0.281 pool.py:225(receive_message)
552/536 0.079 0.000 150.314 0.280 network.py:106(receive_message)
1104/1072 0.815 0.001 150.234 0.140 network.py:127(_receive_data_on_socket)
2427/2395 0.019 0.000 149.418 0.062 socket.py:381(recv)
608/592 0.003 0.000 48.541 0.082 socket.py:284(_wait)
552 0.885 0.002 5.464 0.010 helpers.py:88(_unpack_response)
552 4.475 0.008 4.577 0.008 {bson._cbson.decode_all}
3033 2.021 0.001 2.021 0.001 {method 'recv' of '_socket.socket' objects}
7/4 0.000 0.000 0.221 0.055 hub.py:189(_import)
4 0.127 0.032 0.221 0.055 {__import__}
67 0.104 0.002 0.202 0.003 {method 'sort' of 'list' objects}
536/535 0.003 0.000 0.142 0.000 topology.py:57(open)
537/536 0.002 0.000 0.139 0.000 topology.py:286(_ensure_opened)
1072/1071 0.003 0.000 0.138 0.000 periodic_executor.py:50(open)
537/536 0.001 0.000 0.136 0.000 server.py:33(open)
537/536 0.001 0.000 0.135 0.000 monitor.py:69(open)
20/19 0.000 0.000 0.132 0.007 topology.py:342(_update_servers)
4 0.000 0.000 0.131 0.033 hub.py:418(_get_resolver)
1 0.000 0.000 0.122 0.122 resolver_thread.py:13(__init__)
1 0.000 0.000 0.122 0.122 hub.py:433(_get_threadpool)
206187 0.081 0.000 0.101 0.000 objectid.py:68(__init__)
206187 0.100 0.000 0.100 0.000 database.py:339(_fix_outgoing)
206187 0.098 0.000 0.098 0.000 PhraseVectorizer_1_1.py:708(<lambda>)
1 0.073 0.073 0.093 0.093 threadpool.py:2(<module>)
2037 0.003 0.000 0.092 0.000 hub.py:159(get_hub)
2 0.000 0.000 0.090 0.045 thread.py:39(start_new_thread)
2 0.000 0.000 0.090 0.045 greenlet.py:195(spawn)
2 0.000 0.000 0.090 0.045 greenlet.py:74(__init__)
1 0.000 0.000 0.090 0.090 hub.py:259(__init__)
1102 0.004 0.000 0.078 0.000 pool.py:501(get_socket)
1068 0.005 0.000 0.074 0.000 server.py:135(get_socket)
This performance is somewhat unacceptable for my application - I would like it to be much faster (this is timed and profiled for a subset of ~20 documents, and I need to process a few tens of thousands).
Any ideas on how to speed it up?
Much appreciated.
Edit:
Code snippet that I profiled:
# also tried monkey patching all here, see profiler
from collections import OrderedDict
from pymongo import MongoClient

def topics_fit(self):
    client = MongoClient()
    # tried motor for multithreading - also slow
    # client = motor.motor_tornado.MotorClient()
    # initialize DB cursors
    db_wiki = client.wiki
    # initialize topic feature dictionary
    self.topics = OrderedDict()
    self.topic_mapping = OrderedDict()
    vocabulary_keys = self.vocabulary.keys()
    num_categories = 0
    for phrase in vocabulary_keys:
        phrase_tokens = phrase.split()
        if len(phrase_tokens) > 1:
            # query for current phrase
            AND_phrase = "\"" + phrase + "\""
            cursor = db_wiki.categories.find({"$text": {"$search": AND_phrase}}, {"score": {"$meta": "textScore"}})
            cursor = list(cursor)
            if cursor:
                cursor.sort(key=lambda k: k["score"], reverse=True)
                added_categories = cursor[0]["category_ids"]
                for added_category in added_categories:
                    if not (added_category in self.topics):
                        self.topics[added_category] = num_categories
                        if not (self.vocabulary[phrase] in self.topic_mapping):
                            self.topic_mapping[self.vocabulary[phrase]] = [num_categories, ]
                        else:
                            self.topic_mapping[self.vocabulary[phrase]].append(num_categories)
                        num_categories += 1
                    else:
                        if not (self.vocabulary[phrase] in self.topic_mapping):
                            self.topic_mapping[self.vocabulary[phrase]] = [self.topics[added_category], ]
                        else:
                            self.topic_mapping[self.vocabulary[phrase]].append(self.topics[added_category])
Edit 2: output of index_information():
{u'_id_':
{u'ns': u'wiki.categories', u'key': [(u'_id', 1)], u'v': 1},
u'article_title_text_article_body_text_category_names_text': {u'default_language': u'english', u'weights': SON([(u'article_body', 1), (u'article_title', 1), (u'category_names', 1)]), u'key': [(u'_fts', u'text'), (u'_ftsx', 1)], u'v': 1, u'language_override': u'language', u'ns': u'wiki.categories', u'textIndexVersion': 2}}
I'm implementing a RANSAC algorithm for circle detection in images. I profiled the execution and I get:
13699392 function calls in 799.981 seconds
Random listing order was used
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 {time.time}
579810 0.564 0.000 0.564 0.000 {getattr}
289905 2.343 0.000 8.661 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/blas.py:226(_get_funcs)
579810 0.124 0.000 0.124 0.000 {method 'get' of 'dict' objects}
289905 0.645 0.000 2.676 0.000 {map}
2954 0.005 0.000 0.005 0.000 {method 'transpose' of 'numpy.ndarray' objects}
2954 0.023 0.000 0.464 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/shape_base.py:179(vstack)
2954 2.373 0.001 2.373 0.001 {method 'read' of 'cv2.VideoCapture' objects}
579810 0.966 0.000 2.031 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/function_base.py:550(asarray_chkfinite)
289905 10.164 0.000 24.316 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/basic.py:456(lstsq)
2954 1.090 0.000 1.090 0.000 {normalize}
1455433 3.827 0.000 3.827 0.000 {numpy.core.multiarray.array}
579810 2.899 0.000 3.148 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numerictypes.py:949(_can_coerce_all)
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty}
2954 32.544 0.011 795.875 0.269 git/tra-python-processer/tra/ransac.py:31(image_search)
289905 0.714 0.000 38.644 0.000 git/tra-python-processer/tra/features.py:44(__init__)
289905 2.157 0.000 2.157 0.000 {method 'randint' of 'mtrand.RandomState' objects}
1 0.005 0.005 0.005 0.005 {VideoCapture}
289905 1.026 0.000 1.026 0.000 {method 'astype' of 'numpy.generic' objects}
2954 0.006 0.000 0.010 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py:495(transpose)
289905 11.303 0.000 37.930 0.000 git/tra-python-processer/tra/features.py:48(__gen)
3496584 0.343 0.000 0.343 0.000 {len}
2954 0.344 0.000 0.344 0.000 {numpy.core.multiarray.concatenate}
2954 3.214 0.001 3.214 0.001 {numpy.core.multiarray.where}
869715 0.575 0.000 0.575 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2514(size)
869715 0.778 0.000 2.278 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:394(asarray)
289905 716.946 0.002 716.946 0.002 git/tra-python-processer/tra/features.py:89(points_distance)
5908 0.015 0.000 0.031 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:464(asanyarray)
289905 0.275 0.000 0.275 0.000 {isinstance}
289905 0.342 0.000 9.003 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/lapack.py:255(get_lapack_funcs)
5908 0.058 0.000 0.097 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/shape_base.py:60(atleast_2d)
295813 0.089 0.000 0.089 0.000 {method 'append' of 'list' objects}
289905 0.645 0.000 3.793 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numerictypes.py:970(find_common_type)
2954 0.221 0.000 0.221 0.000 {threshold}
1 0.000 0.000 0.000 0.000 {method 'get' of 'cv2.VideoCapture' objects}
1 0.000 0.000 0.000 0.000 git/tra-python-processer/tra/ransac.py:24(__init__)
2954 0.009 0.000 0.009 0.000 {numpy.core.multiarray.zeros}
579810 0.143 0.000 0.143 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/misc.py:126(_datacopied)
1 0.201 0.201 799.981 799.981 git/tra-python-processer/tra/ransac.py:122(video_processing)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2954 1.528 0.001 1.528 0.001 {cvtColor}
289905 1.280 0.000 5.346 0.000 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/linalg/blas.py:182(find_best_blas_type)
289905 0.198 0.000 0.198 0.000 {method 'index' of 'list' objects}
This is the first time I have used a profiler, but from what I can understand, the heaviest function is features.py:89(points_distance), which turns out to be a very simple implementation:
def points_distance(self, points):
    # absolute difference between each point's distance from the centre and the radius
    d = n.abs(
        n.sqrt(
            n.power(self.xc - points[:, 0], 2) + n.power(self.yc - points[:, 1], 2)
        )
        - self.radius
    )
    return d
Any suggestions? Maybe Cython?
Use scipy.spatial.distance.cdist for the distance calculation in points_distance.
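For reference, a minimal sketch of what that substitution might look like, assuming points is an (N, 2) array and self.xc, self.yc, self.radius are as in the question:

import numpy as np
from scipy.spatial.distance import cdist

def points_distance(self, points):
    # distance of each point to the circle: | ||p - c|| - r |
    center = np.array([[self.xc, self.yc]])          # shape (1, 2)
    return np.abs(cdist(points, center).ravel() - self.radius)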
First, optimize your code in pure Python and NumPy. Then, if necessary, port the critical parts to Cython. Since a number of functions are called repeatedly on the order of a few hundred thousand times, you should get some speed-up from Cython for those parts. Unless, of course, the computational bottleneck is in the distance calculation, which would then limit the overall execution time.
By the way, you should sort your profiler results by tottime so they are easier to read.
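For example, with the pstats module (assuming the profile run was saved to a file; the file name here is made up):

import pstats

# produced with: python -m cProfile -o ransac.prof your_script.py
stats = pstats.Stats("ransac.prof")
stats.sort_stats("tottime").print_stats(30)

The same ordering is available directly on the command line with python -m cProfile -s tottime your_script.py.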
I've been testing a caching system of my own making. Its purpose is to speed up a Django web application, and it stores everything in memory. According to cProfile, most of the time in my tests is spent inside QuerySet._clone(), which turns out to be terribly inefficient (it's actually not that strange given the implementation).
I had high hopes for using PyPy to speed things up. I have a 64-bit machine. However, after installing all the required libraries, it turns out that the code runs about 2.5x slower under PyPy than under regular CPython, and I don't know what to make of it. The code is CPU-bound (there are absolutely no database queries, so being I/O-bound is not a possibility). A single test runs for about 10 seconds, so I guess that should be enough for the JIT to kick in. I'm using PyPy 1.5. One note - I didn't compile the sources myself, I just downloaded the 64-bit Linux version.
I'd like to know how common it is for CPU-intensive code to actually run slower under PyPy. Is there, hopefully, something wrong I could have done that would prevent PyPy from running at its best?
EDIT
Exact cProfile output:
PyPy 1.5:
3439146 function calls (3218654 primitive calls) in 19.094 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
2/1 0.000 0.000 18.956 18.956 <string>:1(<module>)
2/1 0.000 0.000 18.956 18.956 /path/to/my/project/common/integrity/models/transactions.py:200(newfn)
2/1 0.000 0.000 18.956 18.956 /path/to/my/project/common/integrity/models/transactions.py:134(recur)
2/1 0.000 0.000 18.956 18.956 /usr/local/pypy/site-packages/django/db/transaction.py:210(inner)
2/1 0.172 0.086 18.899 18.899 /path/to/my/project/common/integrity/tests/optimization.py:369(func_cached)
9990 0.122 0.000 18.632 0.002 /usr/local/pypy/site-packages/django/db/models/manager.py:131(get)
9990 0.127 0.000 16.638 0.002 /path/to/my/project/common/integrity/models/cache.py:1068(get)
9990 0.073 0.000 12.478 0.001 /usr/local/pypy/site-packages/django/db/models/query.py:547(filter)
9990 0.263 0.000 12.405 0.001 /path/to/my/project/common/integrity/models/cache.py:1047(_filter_or_exclude)
9990 0.226 0.000 12.096 0.001 /usr/local/pypy/site-packages/django/db/models/query.py:561(_filter_or_exclude)
9990 0.187 0.000 8.383 0.001 /path/to/my/project/common/integrity/models/cache.py:765(_clone)
9990 0.212 0.000 7.662 0.001 /usr/local/pypy/site-packages/django/db/models/query.py:772(_clone)
9990 1.025 0.000 7.125 0.001 /usr/local/pypy/site-packages/django/db/models/sql/query.py:226(clone)
129942/49972 1.674 0.000 6.021 0.000 /usr/local/pypy/lib-python/2.7/copy.py:145(deepcopy)
140575/110605 0.120 0.000 4.066 0.000 {len}
9990 0.182 0.000 3.972 0.000 /usr/local/pypy/site-packages/django/db/models/query.py:74(__len__)
19980 0.260 0.000 3.777 0.000 /path/to/my/project/common/integrity/models/cache.py:1062(iterator)
9990 0.255 0.000 3.154 0.000 /usr/local/pypy/site-packages/django/db/models/sql/query.py:1149(add_q)
9990 0.210 0.000 3.073 0.000 /path/to/my/project/common/integrity/models/cache.py:973(_query)
9990 0.371 0.000 2.316 0.000 /usr/local/pypy/site-packages/django/db/models/sql/query.py:997(add_filter)
9990 0.364 0.000 2.168 0.000 /path/to/my/project/common/integrity/models/cache.py:892(_deduct)
29974/9994 0.448 0.000 2.078 0.000 /usr/local/pypy/lib-python/2.7/copy.py:234(_deepcopy_tuple)
19990 0.362 0.000 2.065 0.000 /path/to/my/project/common/integrity/models/cache.py:566(__init__)
10000 0.086 0.000 1.874 0.000 /path/to/my/project/common/integrity/models/cache.py:1090(get_query_set)
19990 0.269 0.000 1.703 0.000 /usr/local/pypy/site-packages/django/db/models/query.py:31(__init__)
9990 0.122 0.000 1.643 0.000 /path/to/my/project/common/integrity/models/cache.py:836(_deduct_recur)
19980 0.274 0.000 1.636 0.000 /usr/local/pypy/site-packages/django/utils/tree.py:55(__deepcopy__)
9990 0.607 0.000 1.458 0.000 /path/to/my/project/common/integrity/models/cache.py:789(_deduct_local)
10020 0.633 0.000 1.437 0.000 /usr/local/pypy/site-packages/django/db/models/sql/query.py:99(__init__)
129942 0.841 0.000 1.191 0.000 /usr/local/pypy/lib-python/2.7/copy.py:267(_keep_alive)
9994/9992 0.201 0.000 1.019 0.000 /usr/local/pypy/lib-python/2.7/copy.py:306(_reconstruct)
Python 2.7:
3326403 function calls (3206359 primitive calls) in 12.430 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 12.457 12.457 <string>:1(<module>)
1 0.000 0.000 12.457 12.457 /path/to/my/project/common/integrity/models/transactions.py:200(newfn)
1 0.000 0.000 12.457 12.457 /path/to/my/project/common/integrity/models/transactions.py:134(recur)
1 0.000 0.000 12.457 12.457 /usr/local/lib/python2.7/dist-packages/django/db/transaction.py:210(inner)
1 0.000 0.000 12.457 12.457 /path/to/my/project/common/integrity/models/transactions.py:165(recur2)
1 0.089 0.089 12.450 12.450 /path/to/my/project/common/integrity/tests/optimization.py:369(func_cached)
9990 0.198 0.000 12.269 0.001 /usr/local/lib/python2.7/dist-packages/django/db/models/manager.py:131(get)
9990 0.087 0.000 11.281 0.001 /path/to/my/project/common/integrity/models/cache.py:1068(get)
9990 0.040 0.000 8.161 0.001 /usr/local/lib/python2.7/dist-packages/django/db/models/query.py:547(filter)
9990 0.110 0.000 8.121 0.001 /path/to/my/project/common/integrity/models/cache.py:1047(_filter_or_exclude)
9990 0.127 0.000 7.983 0.001 /usr/local/lib/python2.7/dist-packages/django/db/models/query.py:561(_filter_or_exclude)
9990 0.100 0.000 5.593 0.001 /path/to/my/project/common/integrity/models/cache.py:765(_clone)
9990 0.122 0.000 5.125 0.001 /usr/local/lib/python2.7/dist-packages/django/db/models/query.py:772(_clone)
9990 0.405 0.000 4.899 0.000 /usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py:226(clone)
129942/49972 1.456 0.000 4.505 0.000 /usr/lib/python2.7/copy.py:145(deepcopy)
129899/99929 0.191 0.000 3.117 0.000 {len}
9990 0.111 0.000 2.968 0.000 /usr/local/lib/python2.7/dist-packages/django/db/models/query.py:74(__len__)
19980 0.070 0.000 2.843 0.000 /path/to/my/project/common/integrity/models/cache.py:1062(iterator)
9990 0.208 0.000 2.190 0.000 /path/to/my/project/common/integrity/models/cache.py:973(_query)
9990 0.182 0.000 2.114 0.000 /usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py:1149(add_q)
19984/9994 0.291 0.000 1.644 0.000 /usr/lib/python2.7/copy.py:234(_deepcopy_tuple)
9990 0.288 0.000 1.599 0.000 /usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py:997(add_filter)
9990 0.171 0.000 1.454 0.000 /path/to/my/project/common/integrity/models/cache.py:892(_deduct)
19980 0.177 0.000 1.208 0.000 /usr/local/lib/python2.7/dist-packages/django/utils/tree.py:55(__deepcopy__)
9990 0.099 0.000 1.199 0.000 /path/to/my/project/common/integrity/models/cache.py:836(_deduct_recur)
9990 0.349 0.000 1.040 0.000 /path/to/my/project/common/integrity/models/cache.py:789(_deduct_local)
Brushing aside the fact that PyPy might really be intrinsically slower for your case, there are some factors that could be making it unnecessarily slower:
Profiling is known to slow PyPy a lot more than CPython.
Some debugging/logging code can disable optimizations (by, e.g., forcing frames).
The server you're using can be a dominant factor in performance (think about how awful classic CGI would be with a JIT: it would never warm up). It can also simply influence results (different WSGI servers have shown various speed-ups).
Old-style classes are slower than new-style ones.
Even if everything is in memory, you could be hitting e.g. slow paths in PyPy's SQLite.
You can also check the JIT Friendliness wiki page for more hints about what can make PyPy slower. A nightly build will probably be faster too, as there are many improvements relative to 1.5.
A more detailed description of your stack (server, OS, DB) and setup (how did you benchmark? how many queries?) would allow us to give better answers.
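Regarding the first point (profiler overhead hits PyPy much harder than CPython): a quick sanity check is to time the same test without cProfile attached and compare the two interpreters on wall-clock time alone. A rough sketch, assuming the cached test function can be called directly:

import time

def time_it(fn, repeats=3):
    # plain wall-clock timing: no profiler frames, so PyPy's JIT gets a fair chance
    for _ in range(repeats):
        start = time.time()
        fn()
        print("run took %.3f s" % (time.time() - start))

# hypothetical usage, adapt to however your test is actually invoked:
# time_it(func_cached)

If the gap shrinks or disappears without the profiler, the 2.5x figure is mostly measurement overhead rather than PyPy being slower on your workload.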