In trying to track down the source of an unacceptable (257ms) runtime of matplotlib's plt.draw() function, I stumbled upon this article: http://bastibe.de/2013-05-30-speeding-up-matplotlib.html. In particular, this quote caught my eye:
"I am using pause() here to update the plot without blocking. The correct way to do this is to use draw() instead..."
Digging further, I found that plt.draw() can be substituted by two commands,
plt.pause(0.001)
fig.canvas.blit(ax1.bbox)
which take 256ms and 1ms respectively in my code.
This seemed abnormal: why would a 1ms pause take 256ms to complete? I took some measurements and found the following:
plt.pause(n):
n(s) time(ms) overhead (time(ms) - n(ms))
0.0001 270-246 ~246ms
0.001 270-254 ~253ms
0.01 280-265 ~255ms
0.1 398-354 ~254ms
0.2 470-451 ~251ms
0.5 779-759 ~259ms
1.0 1284-1250 ~250ms
numbers courtesy of rkern's line_profiler
This made it very clear that plt.pause() was doing more than just pausing the program, and sure enough:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
175 def pause(interval):
176 """
177 Pause for *interval* seconds.
178
179 If there is an active figure it will be updated and displayed,
180 and the GUI event loop will run during the pause.
181
182 If there is no active figure, or if a non-interactive backend
183 is in use, this executes time.sleep(interval).
184
185 This can be used for crude animation. For more complex
186 animation, see :mod:`matplotlib.animation`.
187
188 This function is experimental; its behavior may be changed
189 or extended in a future release.
190
191 """
192 1 6 6.0 0.0 backend = rcParams['backend']
193 1 1 1.0 0.0 if backend in _interactive_bk:
194 1 5 5.0 0.0 figManager = _pylab_helpers.Gcf.get_active()
195 1 0 0.0 0.0 if figManager is not None:
196 1 2 2.0 0.0 canvas = figManager.canvas
197 1 257223 257223.0 20.4 canvas.draw()
198 1 145 145.0 0.0 show(block=False)
199 1 1000459 1000459.0 79.5 canvas.start_event_loop(interval)
200 1 2 2.0 0.0 return
201
202 # No on-screen figure is active, so sleep() is all we need.
203 import time
204 time.sleep(interval)
once again courtesy of rkern's line_profiler
This was a breakthrough: it was suddenly clear why plt.pause() was able to replace plt.draw(). It calls canvas.draw() internally, with that same ~250ms overhead I was seeing at the start of my program.
At this point, I decided to profile plt.draw() itself:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
551 def draw():
571 1 267174 267174.0 100.0 get_current_fig_manager().canvas.draw()
Alright, one more step down the rabbit hole:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
57 def draw_wrapper(artist, renderer, *args, **kwargs):
58 769 1798 2.3 0.7 before(artist, renderer)
59 769 242060 314.8 98.5 draw(artist, renderer, *args, **kwargs)
60 769 1886 2.5 0.8 after(artist, renderer)
Unfortunately, this was the point at which my ability to run the profiler through the source code ended, leaving me scratching my head at this next layer of draw() calls and why it was being hit 769 times.
It turns out the answer was right in front of me the whole time! That same article, which started this whole obsessive hunt in the first place, was written to study this very behavior. Its solution: replace plt.draw() with individual calls to each artist that needs to be updated, rather than redrawing every single one. A rough sketch of that approach follows.
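For illustration, here is a minimal sketch of that per-artist update approach, assuming an Agg-based interactive backend, an existing figure fig with axes ax1, a Line2D object line returned by a previous ax1.plot(...) call, and a hypothetical new_ydata(frame) function supplying updated data:
# One full draw to initialize the canvas, then cache the static background.
fig.canvas.draw()
background = fig.canvas.copy_from_bbox(ax1.bbox)
for frame in range(100):
    line.set_ydata(new_ydata(frame))       # new_ydata() is hypothetical
    fig.canvas.restore_region(background)  # restore the cached background
    ax1.draw_artist(line)                  # redraw only the artist that changed
    fig.canvas.blit(ax1.bbox)              # push just that region to the screen
    fig.canvas.flush_events()
Only the changed artist is redrawn, which is what avoids paying the ~250ms full canvas.draw() on every update.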
I hope my chasing of this behavior helps others understand it, though currently I'm stuck with a CGContextRef is NULL error whenever I try to replicate the article's method, which seems to be specific to the MacOSX backend...
More info as it comes! Please add any relevant information in answers below, especially if you can help me with the CGContextRef is NULL error.
I am trying to generate a summary of a large text file using the Gensim summarizer.
I am getting a memory error. I have been facing this issue for some time; any help would be really appreciated. Feel free to ask for more details.
from gensim.summarization.summarizer import summarize

file_read = open("xxxxx.txt", 'r')
Content = file_read.read()

def Summary_gen(content):
    print(len(Content))
    summary_r = summarize(Content, ratio=0.02)
    print(summary_r)

Summary_gen(Content)
The length of the document is:
365042
Error message:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-6-a91bd71076d1> in <module>()
10
11
---> 12 Summary_gen(Content)
<ipython-input-6-a91bd71076d1> in Summary_gen(content)
6 def Summary_gen(content):
7 print(len(Content))
----> 8 summary_r=summarize(Content,ratio=0.02)
9 print(summary_r)
10
c:\python3.6\lib\site-packages\gensim\summarization\summarizer.py in summarize(text, ratio, word_count, split)
428 corpus = _build_corpus(sentences)
429
--> 430 most_important_docs = summarize_corpus(corpus, ratio=ratio if word_count is None else 1)
431
432 # If couldn't get important docs, the algorithm ends.
c:\python3.6\lib\site-packages\gensim\summarization\summarizer.py in summarize_corpus(corpus, ratio)
367 return []
368
--> 369 pagerank_scores = _pagerank(graph)
370
371 hashable_corpus.sort(key=lambda doc: pagerank_scores.get(doc, 0), reverse=True)
c:\python3.6\lib\site-packages\gensim\summarization\pagerank_weighted.py in pagerank_weighted(graph, damping)
57
58 """
---> 59 adjacency_matrix = build_adjacency_matrix(graph)
60 probability_matrix = build_probability_matrix(graph)
61
c:\python3.6\lib\site-packages\gensim\summarization\pagerank_weighted.py in build_adjacency_matrix(graph)
92 neighbors_sum = sum(graph.edge_weight((current_node, neighbor)) for neighbor in graph.neighbors(current_node))
93 for j in xrange(length):
---> 94 edge_weight = float(graph.edge_weight((current_node, nodes[j])))
95 if i != j and edge_weight != 0.0:
96 row.append(i)
c:\python3.6\lib\site-packages\gensim\summarization\graph.py in edge_weight(self, edge)
255
256 """
--> 257 return self.get_edge_properties(edge).setdefault(self.WEIGHT_ATTRIBUTE_NAME, self.DEFAULT_WEIGHT)
258
259 def neighbors(self, node):
c:\python3.6\lib\site-packages\gensim\summarization\graph.py in get_edge_properties(self, edge)
404
405 """
--> 406 return self.edge_properties.setdefault(edge, {})
407
408 def add_edge_attributes(self, edge, attrs):
MemoryError:
I have tried looking up this error on the internet, but couldn't find a workable solution.
From the logs, it looks like the code builds an adjacency matrix
---> 59 adjacency_matrix = build_adjacency_matrix(graph)
This probably tries to create a huge adjacency matrix from your 365042-character document, which cannot fit in your memory (i.e., RAM).
You could try:
Reducing the document size (maybe start with the first 10000 characters) and check if it works
Running it on a system with more RAM
Did you try using the word_count argument instead of ratio?
If the above still doesn't solve the problem, then that's down to gensim's implementation limitations. The only way to use gensim if you still get OOM errors is to split the document. That will also speed up your solution (and if the document is really big, splitting shouldn't be a problem anyway). See the sketch below.
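As an illustration only (not part of gensim's API), a minimal sketch of that splitting workaround could look like the following; the chunk size is an arbitrary assumption, and splitting by character count can cut sentences in half, so a sentence-aware split would be better in practice:
from gensim.summarization.summarizer import summarize

def summarize_in_chunks(text, chunk_size=50000, ratio=0.02):
    # Summarize each chunk separately so the TextRank graph stays small.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Note: summarize() raises ValueError on chunks with too few sentences,
    # so a very short trailing chunk may need to be skipped or merged.
    return "\n".join(summarize(chunk, ratio=ratio) for chunk in chunks)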
What's the problem with summarize:
gensim's summarizer uses TextRank by default, an algorithm that uses PageRank. In gensim it is unfortunately implemented using a Python list of PageRank graph nodes, so it may fail if your graph is too big.
BTW is the document length measured in words, or characters?
I am using Python 3.5.3 and igraph 0.7.1.
Why does the following code finish with the error message "Process finished with exit code -1073740791 (0xC0000409)"?
from igraph import Graph
g = Graph.Read_Ncol('test.csv', directed=False)
test.csv
119 205
119 625
124 133
124 764
124 813
55 86
55 205
55 598
133 764
The Read_Ncol function reads files in NCOL format, as produced by the Large Graph Layout program.
Your example works fine for me, also on Python 3.5.3 with igraph 0.7.1.
>>> g = Graph.Read_Ncol('test.csv', directed=False)
>>> g
<igraph.Graph object at 0x10c4844f8>
>>> print(g)
IGRAPH UN-- 10 9 --
+ attr: name (v)
+ edges (vertex names):
119--205, 119--625, 124--133, 124--764, 124--813, 55--86, 205--55, 55--598,
133--764
It seems the error C0000409 means "Stack Buffer Overrun" on Windows, which probably means that your program is writing outside of the space allocated on the stack (it's different from a stack overflow, according to this Microsoft TechNet blog).
The bottleneck of my code is currently a conversion from a Python list to a C array using ctypes, as described in this question.
A small experiment shows that it is indeed very slow, in comparison of other Python instructions:
import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
print(timeit.timeit(stmt='array("I",t)',setup=setup,number=10))
print(timeit.timeit(stmt='set(t)',setup=setup,number=10))
Gives:
1.790962941000089
0.0911122129996329
0.3200237319997541
I obtained these results with CPython 3.4.2. I get similar times with CPython 2.7.9 and Pypy 2.4.0.
I tried running the above code with perf, commenting out the timeit instructions to run only one at a time. I get these results:
ctypes
Performance counter stats for 'python3 perf.py':
1807,891637 task-clock (msec) # 1,000 CPUs utilized
8 context-switches # 0,004 K/sec
0 cpu-migrations # 0,000 K/sec
59 523 page-faults # 0,033 M/sec
5 755 704 178 cycles # 3,184 GHz
13 552 506 138 instructions # 2,35 insn per cycle
3 217 289 822 branches # 1779,581 M/sec
748 614 branch-misses # 0,02% of all branches
1,808349671 seconds time elapsed
array
Performance counter stats for 'python3 perf.py':
144,678718 task-clock (msec) # 0,998 CPUs utilized
0 context-switches # 0,000 K/sec
0 cpu-migrations # 0,000 K/sec
12 913 page-faults # 0,089 M/sec
458 284 661 cycles # 3,168 GHz
1 253 747 066 instructions # 2,74 insn per cycle
325 528 639 branches # 2250,011 M/sec
708 280 branch-misses # 0,22% of all branches
0,144966969 seconds time elapsed
set
Performance counter stats for 'python3 perf.py':
369,786395 task-clock (msec) # 0,999 CPUs utilized
0 context-switches # 0,000 K/sec
0 cpu-migrations # 0,000 K/sec
108 584 page-faults # 0,294 M/sec
1 175 946 161 cycles # 3,180 GHz
2 086 554 968 instructions # 1,77 insn per cycle
422 531 402 branches # 1142,636 M/sec
768 338 branch-misses # 0,18% of all branches
0,370103043 seconds time elapsed
The code with ctypes has fewer page faults than the code with set and the same number of branch misses as the other two. The only thing I see is that it executes more instructions and branches (but I still don't know why) and has more context switches (but that is certainly a consequence of the longer run time rather than a cause).
I therefore have two questions:
Why is ctypes so slow?
Is there a way to improve performance, either with ctypes or with another library?
The solution is to use the array module and cast the address or use the from_buffer method...
import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
print(timeit.timeit(stmt="v = array('I',t);assert v.itemsize == 4; addr, count = v.buffer_info();p = ctypes.cast(addr,ctypes.POINTER(ctypes.c_uint32))",setup=setup,number=10))
print(timeit.timeit(stmt="v = array('I',t);a = (ctypes.c_uint32 * len(v)).from_buffer(v)",setup=setup,number=10))
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
print(timeit.timeit(stmt='set(t)',setup=setup,number=10))
It is then many times faster when using Python 3:
$ python3 convert.py
0.08303386811167002
0.08139665238559246
1.5630637975409627
0.3013848252594471
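One point worth noting (my own illustration, not part of the original answer): from_buffer does not copy the data, it wraps the array's existing memory, so the ctypes view and the array stay in sync:
from array import array
import ctypes

v = array('I', [1, 2, 3])
a = (ctypes.c_uint32 * len(v)).from_buffer(v)  # shares v's buffer, no copy
a[0] = 99
print(v[0])  # prints 99, both names refer to the same memory
That is also why both array-based variants above are roughly an order of magnitude faster than (ctypes.c_uint32 * len(t))(*t), which converts every element from a Python int one at a time.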
While this is not a definitive answer, the problem seems to be the constructor call with *t. Doing the following instead decreases the overhead significantly:
array = (ctypes.c_uint32 * len(t))()
array[:] = t
Test:
import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
print(timeit.timeit(stmt='a = (ctypes.c_uint32 * len(t))(); a[:] = t',setup=setup,number=10))
print(timeit.timeit(stmt='array("I",t)',setup=setup,number=10))
print(timeit.timeit(stmt='set(t)',setup=setup,number=10))
Output:
1.7090932869978133
0.3084979929990368
0.08278547400186653
0.2775516299989249
Sorry for my English.
My Arduino serial gives 3 values like this, at 300 Hz:
-346 54 -191
-299 12 -123
-497 -214 77
-407 -55 -19
45 129 46
297 123 -197
393 71 -331
544 115 -273
515 -355 -89
510 -183 -47
With this Python code I read the serial data and write it to a file correctly, but afterwards the while loop does not terminate, the shell remains open, and "stop" is never printed:
...
ard = serial.Serial(portname, baudrate)
print "start"
while True:
    x = ard.readline()
    #print x
    a = open(filename, 'ab')
    a.write(x)
    a.close
print "stop"
...
I am a beginner programmer; can you tell me a solution to write the serial data to a file and then continue?
Thanks
You're never breaking from the while loop. You should:
Add a timeout to the serial reader
When there are no bytes received, break the loop
Taking your code as a base, try something like this:
...
ard = serial.Serial(addr, baud)
ard.timeout = 1  # in seconds
print "start"
while True:
    x = ard.readline()
    if len(x) == 0:
        break
    a = open(fname, 'ab')
    a.write(x)
    a.close()  # note the parentheses: a.close alone never actually closes the file
print "stop"
...
It works!
I have used ard.timeout and the if condition (the if condition alone does not work).
Another question:
My Arduino serial output starts and terminates like this:
Start
-663 -175 76
361 47 157
425 -229 -174
531 -283 -288
518 -40 -28
538 -228 206
581 188 174
445 5 176
end
Is it possible to start writing to the file after the "Start" string and terminate before the "end" string?
I have tried something like this but it does not work:
while True:
    x = ard.readline()
    if x == "end":
        break
    #print x
    a = open(fname, 'ab')
    a.write(x)
    a.close
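One likely reason the comparison above never matches is that readline() returns each line with its trailing newline (e.g. "end\r\n"), so x == "end" is never true. A minimal, untested sketch of the marker-based approach, reusing the ard and fname names from above, might be:
recording = False
while True:
    x = ard.readline()
    line = x.strip()          # drop the trailing "\r\n" before comparing
    if line == "Start":
        recording = True      # begin writing only after the Start marker
        continue
    if line == "end":
        break                 # stop before the end marker is written
    if recording:
        a = open(fname, 'ab')
        a.write(x)
        a.close()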
I have a project that relies on finding all cycles in a graph that pass through a vertex at most k times. Naturally, I'm sticking with the case of k=1 for the sake of development right now. I've come to the conclusion that this algorithm as a depth first search is at worst O((kn)^(kn)) for a complete graph, but I rarely approach this upper bound in the context of the problem, so I would still like to give this approach a try.
I've implemented the following as a part of the project to achieve this end:
class Graph(object):
    ...
    def path_is_valid(self, current_path):
        """
        :param current_path:
        :return: Boolean indicating whether the given path is valid
        """
        length = len(current_path)
        if length < 3:
            # The path is too short
            return False

        # Passes through vertex twice... sketchy for general case
        if len(set(current_path)) != len(current_path):
            return False

        # The idea here is to take a moving window of width three along the path
        # and see if it's contained entirely in a polygon.
        arc_triplets = (current_path[i:i+3] for i in xrange(length-2))
        for triplet in arc_triplets:
            for face in self.non_fourgons:
                if set(triplet) <= set(face):
                    return False

        # This is all kinds of unclear to look at. There is an edge case
        # pertaining to the beginning and end of a path existing inside of a
        # polygon. The previous filter will not catch this, so we cycle the path
        # and recheck the moving window filter.
        path_copy = list(current_path)
        for i in xrange(length):
            path_copy = path_copy[1:] + path_copy[:1]  # rotate the path by one
            arc_triplets = (path_copy[i:i+3] for i in xrange(length-2))
            for triplet in arc_triplets:
                for face in self.non_fourgons:
                    if set(triplet) <= set(face):
                        return False

        return True

    def cycle_dfs(self, current_node, start_node, graph, current_path):
        """
        :param current_node:
        :param start_node:
        :param graph:
        :param current_path:
        :return:
        """
        if len(current_path) >= 3:
            last_three_vertices = current_path[-3:]
            previous_three_faces = [set(self.faces_containing_arcs[vertex])
                                    for vertex in last_three_vertices]
            intersection_all = set.intersection(*previous_three_faces)
            if len(intersection_all) == 2:
                return []

        if current_node == start_node:
            if self.path_is_valid(current_path):
                return [tuple(shift(list(current_path)))]
            else:
                return []
        else:
            loops = []
            for adjacent_node in set(graph[current_node]):
                current_path.append(adjacent_node)
                graph[current_node].remove(adjacent_node)
                graph[adjacent_node].remove(current_node)
                loops += list(self.cycle_dfs(adjacent_node, start_node,
                                             graph, current_path))
                graph[current_node].append(adjacent_node)
                graph[adjacent_node].append(current_node)
                current_path.pop()
            return loops
path_is_valid() aims to cut down on the number of paths produced by the depth first search as they are found, based upon filtering criteria that are specific to the problem. I tried to explain the purpose of each one reasonably, but everything is clearer in one's own head; I'd be happy to improve the comments if needed.
I'm open to any and all suggestions to improve performance, since, as the profile below shows, this is what is taking all my time.
Also, I'm about to turn to Cython, but my code heavily relies on Python objects and I don't know if that's a smart move. Can anyone shed some light as to whether or not this route is even beneficial with this many native Python data structures involved? I can't seem to find much information on this and any help would be appreciated.
Since I know people will ask, I have profiled my entire project and this is the source of the problem:
311 1 18668669 18668669.0 99.6 cycles = self.graph.find_cycles()
Here's the line-profiled output of the self.graph.find_cycles() and self.path_is_valid():
Function: cycle_dfs at line 106
Total time: 11.9584 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
106 def cycle_dfs(self, current_node, start_node, graph, current_path):
107 """
108 Naive depth first search applied to the pseudo-dual graph of the
109 reference curve. This sucker is terribly inefficient. More to come.
110 :param current_node:
111 :param start_node:
112 :param graph:
113 :param current_path:
114 :return:
115 """
116 437035 363181 0.8 3.6 if len(current_path) >= 3:
117 436508 365213 0.8 3.7 last_three_vertices = current_path[-3:]
118 436508 321115 0.7 3.2 previous_three_faces = [set(self.faces_containing_arcs[vertex])
119 1746032 1894481 1.1 18.9 for vertex in last_three_vertices]
120 436508 539400 1.2 5.4 intersection_all = set.intersection(*previous_three_faces)
121 436508 368725 0.8 3.7 if len(intersection_all) == 2:
122 return []
123
124 437035 340937 0.8 3.4 if current_node == start_node:
125 34848 1100071 31.6 11.0 if self.path_is_valid(current_path):
126 486 3400 7.0 0.0 return [tuple(shift(list(current_path)))]
127 else:
128 34362 27920 0.8 0.3 return []
129
130 else:
131 402187 299968 0.7 3.0 loops = []
132 839160 842350 1.0 8.4 for adjacent_node in set(graph[current_node]):
133 436973 388646 0.9 3.9 current_path.append(adjacent_node)
134 436973 438763 1.0 4.4 graph[current_node].remove(adjacent_node)
135 436973 440220 1.0 4.4 graph[adjacent_node].remove(current_node)
136 436973 377422 0.9 3.8 loops += list(self.cycle_dfs(adjacent_node, start_node,
137 436973 379207 0.9 3.8 graph, current_path))
138 436973 422298 1.0 4.2 graph[current_node].append(adjacent_node)
139 436973 388651 0.9 3.9 graph[adjacent_node].append(current_node)
140 436973 412489 0.9 4.1 current_path.pop()
141 402187 285471 0.7 2.9 return loops
Function: path_is_valid at line 65
Total time: 1.6726 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
65 def path_is_valid(self, current_path):
66 """
67 Aims to implicitly filter during dfs to decrease output size. Observe
68 that more complex filters are applied further along in the function.
69 We'd rather do less work to show the path is invalid rather than more,
70 so filters are applied in order of increasing complexity.
71 :param current_path:
72 :return: Boolean indicating a whether the given path is valid
73 """
74 34848 36728 1.1 2.2 length = len(current_path)
75 34848 33627 1.0 2.0 if length < 3:
76 # The path is too short
77 99 92 0.9 0.0 return False
78
79 # Passes through arcs twice... Sketchy for later.
80 34749 89536 2.6 5.4 if len(set(current_path)) != len(current_path):
81 31708 30402 1.0 1.8 return False
82
83 # The idea here is take a moving window of width three along the path
84 # and see if it's contained entirely in a polygon.
85 3041 6287 2.1 0.4 arc_triplets = (current_path[i:i+3] for i in xrange(length-2))
86 20211 33255 1.6 2.0 for triplet in arc_triplets:
87 73574 70670 1.0 4.2 for face in self.non_fourgons:
88 56404 94019 1.7 5.6 if set(triplet) <= set(face):
89 2477 2484 1.0 0.1 return False
90
91 # This is all kinds of unclear when looking at. There is an edge case
92 # pertaining to the beginning and end of a path existing inside of a
93 # polygon. The previous filter will not catch this, so we cycle the path
94 # a reasonable amount and recheck moving window filter.
95 564 895 1.6 0.1 path_copy = list(current_path)
96 8028 7771 1.0 0.5 for i in xrange(length):
97 7542 14199 1.9 0.8 path_copy = path_copy[1:] + path_copy[:1] # wtf
98 7542 11867 1.6 0.7 arc_triplets = (path_copy[i:i+3] for i in xrange(length-2))
99 125609 199100 1.6 11.9 for triplet in arc_triplets:
100 472421 458030 1.0 27.4 for face in self.non_fourgons:
101 354354 583106 1.6 34.9 if set(triplet) <= set(face):
102 78 83 1.1 0.0 return False
103
104 486 448 0.9 0.0 return True
Thanks!
EDIT: Well, after a lot of merciless profiling, I was able to bring the run time down from 12 seconds to ~1.5.
I changed this portion of cycle_dfs()
last_three_vertices = current_path[-3:]
previous_three_faces = [set(self.faces_containing_arcs[vertex])
                        for vertex in last_three_vertices]
intersection_all = set.intersection(*previous_three_faces)
if len(intersection_all) == 2: ...
to this:
# Count the number of times each face appears by incrementing values
# of face_id's
containing_faces = defaultdict(lambda: 0)
for face in (self.faces_containing_arcs[v]
             for v in current_path[-3:]):
    for f in face:
        containing_faces[f] += 1

# If there's any face_id f that has a value of three, that means that
# there is one face that all three arcs bound. This is a trivial path
# so we discard it.
if 3 in containing_faces.values(): ...
This was motivated by another post I saw benchmarking Python dictionary assignment; it turns out that assigning and updating values in a dict is only a tiny bit slower than adding integers (which still blows my mind). Along with the two additions to self.path_is_valid(), I squeaked out a 12x speedup. However, further suggestions would be appreciated, since better performance overall will only make harder problems easier as the input complexity grows.
I would recommend two optimizations for path_is_valid. Of course, your main problem is in cycle_dfs, and you probably just need a better algorithm.
1) Avoid creating extra data structures:
for i in xrange(length-2):
    for face in self.non_fourgons:
        if path[i] in face and path[i+1] in face and path[i+2] in face:
            return False
2) Create a dictionary mapping points to the non_fourgons they are members of:
for i in xrange(length-2):
    for face in self.non_fourgons[path[i]]:
        if path[i+1] in face and path[i+2] in face:
            return False
The expression self.non_fourgons[ p ] should return a list of the non-fourgons
which contain p as a member. This reduces the number of polygons you have to check.
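To make suggestion 2 concrete, here is a minimal sketch (my own illustration; non_fourgons_by_point is a made-up name) of precomputing that point-to-faces mapping once, so path_is_valid only inspects faces that can actually contain the window:
from collections import defaultdict

# Build the lookup once, e.g. in Graph.__init__, after self.non_fourgons is known.
non_fourgons_by_point = defaultdict(list)
for face in self.non_fourgons:
    face_set = set(face)
    for p in face_set:
        non_fourgons_by_point[p].append(face_set)

# Inside path_is_valid, scan only the faces that contain path[i]:
for i in xrange(length - 2):
    for face in non_fourgons_by_point[path[i]]:
        if path[i+1] in face and path[i+2] in face:
            return False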