How can I execute ser.readline() at a controlled rate, say every 0.002 seconds? The Python code below returns a list of varying size after every run, meaning the sampling rate varies from run to run. Is there a controlled way of reading from the serial port given a desired sampling rate of 500 scans/second?
import numpy as np
from time import time
import serial

ser = serial.Serial('COM3', 115200, timeout=1)
ser.flushInput()

digital_data = np.array([])
# Set the end time 60 seconds from start
te = time() + 60
# While loop runs for 60 seconds
while time() <= te:
    digital_data = np.append(digital_data, ser.readline().decode('utf-8'))
print(len(digital_data))  # Varies in size for each run
This may not be exact depending on the OS you're running, but you can time each read and sleep away whatever remains of the 0.002 s interval:
import numpy as np
import time
import serial

ser = serial.Serial('COM3', 115200, timeout=1)
ser.flushInput()

digital_data = np.array([])
# Set the end time 60 seconds from start
te = time.time() + 60
delay = 0.002  # 500 scans/second
while time.time() <= te:
    start = time.time()
    digital_data = np.append(digital_data, ser.readline().decode('utf-8'))
    duration = time.time() - start
    if duration < delay:
        time.sleep(delay - duration)
print(len(digital_data))  # Should now be close to 30000 (500 scans/s for 60 s)
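If OS scheduler jitter still matters at 500 scans/second, a common refinement (a sketch of the idea, not part of the original answer) is to pace the loop against absolute deadlines rather than per-iteration sleeps, so small timing errors do not accumulate:

import numpy as np
import time
import serial

ser = serial.Serial('COM3', 115200, timeout=1)
ser.flushInput()

digital_data = np.array([])
delay = 0.002  # 500 scans/second
te = time.perf_counter() + 60
next_deadline = time.perf_counter() + delay
while time.perf_counter() <= te:
    digital_data = np.append(digital_data, ser.readline().decode('utf-8'))
    # Advance the deadline by a fixed step each iteration, so a slow
    # read shortens the next sleep instead of shifting all subsequent
    # samples.
    next_deadline += delay
    remaining = next_deadline - time.perf_counter()
    if remaining > 0:
        time.sleep(remaining)
print(len(digital_data))

Note that ser.readline() blocks until a newline arrives, so the achievable rate is ultimately bounded by how fast the device actually sends lines.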
I have recently started looking into using CUDA to optimise searches over numeric arrays. The simplified piece of code below demonstrates the issue.
import numpy as np
import time
from numba import cuda

@cuda.jit
def count_array4(device_array, pivot_point, device_output_array):
    for i in range(len(device_array)):
        if (pivot_point - 0.05) < device_array[i] < (pivot_point + 0.05):
            device_output_array[i] = True
        else:
            device_output_array[i] = False

width = 512
height = 512
size = width * height
print(f'Number of records {size}')

array_of_random = np.random.rand(size)
device_array = cuda.to_device(array_of_random)

start = time.perf_counter()
device_output_array = cuda.device_array(size)
print(f'Copy Host to Device: {time.perf_counter() - start}')

for x in range(10):
    start = time.perf_counter()
    count_array4[512, 512](device_array, .5, device_output_array)
    print(f'Run: {x} Time: {time.perf_counter() - start}')

start = time.perf_counter()
output_array = device_output_array.copy_to_host()
print(f'Copy Device to Host: {time.perf_counter() - start}')
print(np.sum(output_array))
This gives me the expected optimization in processing; however, the time it takes to return the data to the host seems extremely high.
Number of records 262144
Copy Host to Device: 0.00031610000000004135
Run: 0 Time: 0.0958601
Run: 1 Time: 0.0001626999999999601
Run: 2 Time: 0.00012100000000003774
Run: 3 Time: 0.00011590000000005762
Run: 4 Time: 0.00011419999999995323
Run: 5 Time: 0.0001126999999999656
Run: 6 Time: 0.00011289999999997136
Run: 7 Time: 0.0001122999999999541
Run: 8 Time: 0.00011490000000002887
Run: 9 Time: 0.00011200000000000099
Copy Device to Host: 13.0583358
26110.0
I'm fairly sure that I'm missing something basic here, or a technique that I don't know the correct term to search for. If anyone can point me in the right direction I'd be very grateful.
Kernel launches are asynchronous and the driver can queue multiple launches. As a result, you are measuring only kernel launch overhead within the loop, and then the data transfer, which is a blocking call, captures all the kernel execution time. You can change this behaviour by modifying your code like this:
for x in range(10):
    start = time.perf_counter()
    count_array4[512, 512](device_array, .5, device_output_array)
    cuda.synchronize()
    print(f'Run: {x} Time: {time.perf_counter() - start}')
The synchronize call ensures each kernel launch completes and the device is idle before the next kernel launches. The effect should be that each reported kernel run time increases, while the indicated transfer time decreases.
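As an aside (my own sketch, not part of the original answer): as written, every one of the 512×512 threads loops over the entire array and computes every element. The idiomatic numba pattern is one element per thread, using cuda.grid(1) to give each thread its own global index; the launch configuration shown in the comments is a hypothetical example:

from numba import cuda

@cuda.jit
def count_array_per_thread(device_array, pivot_point, device_output_array):
    # Each thread computes exactly one output element.
    i = cuda.grid(1)
    if i < device_array.size:
        device_output_array[i] = (pivot_point - 0.05) < device_array[i] < (pivot_point + 0.05)

# Hypothetical launch covering the whole array:
# threads_per_block = 256
# blocks = (size + threads_per_block - 1) // threads_per_block
# count_array_per_thread[blocks, threads_per_block](device_array, .5, device_output_array)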
I'm trying to simulate coin tosses and profits, and to plot the graph in matplotlib:
from random import choice
import matplotlib.pyplot as plt
import time

start_time = time.time()

num_of_graphs = 2000
tries = 2000
coins = [150, -100]
last_loss = 0

for a in range(num_of_graphs):
    profit = 0
    line = []
    for i in range(tries):
        profit = profit + choice(coins)
        if (profit < 0 and last_loss < i):
            last_loss = i
        line.append(profit)
    plt.plot(line)

plt.show()

print("--- %s seconds ---" % (time.time() - start_time))
print("No losses after " + str(last_loss) + " iterations")
The end result is
--- 9.30498194695 seconds ---
No losses after 310 iterations
Why is it taking so long to run this script? If I change num_of_graphs to 10000, the script never finishes.
How would you optimize this?
Your measurement of execution time is too coarse. The following separates the time needed for the simulation from the time needed for plotting, and vectorizes the simulation with numpy:
import matplotlib.pyplot as plt
import numpy as np
import time

def run_sims(num_sims, num_flips):
    start = time.time()
    sims = [np.random.choice(coins, num_flips).cumsum() for _ in range(num_sims)]
    end = time.time()
    print(f"sim time = {end-start}")
    return sims

def plot_sims(sims):
    start = time.time()
    for line in sims:
        plt.plot(line)
    end = time.time()
    print(f"plotting time = {end-start}")
    plt.show()

if __name__ == '__main__':
    start_time = time.time()
    num_sims = 2000
    num_flips = 2000
    coins = np.array([150, -100])
    plot_sims(run_sims(num_sims, num_flips))
result:
sim time = 0.13962197303771973
plotting time = 6.621474981307983
As you can see, the sim time is greatly reduced (it was on the order of 7 seconds on my 2011 laptop); the plotting time is matplotlib dependent.
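If the plotting time matters, one option (my own sketch, not from the original answer) is to draw all the simulated paths as a single LineCollection instead of 2000 separate plt.plot calls, which avoids most of the per-line overhead:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

def plot_sims_fast(sims):
    fig, ax = plt.subplots()
    x = np.arange(len(sims[0]))
    # One (num_flips, 2) array of x/y pairs per simulated path.
    segments = [np.column_stack([x, line]) for line in sims]
    ax.add_collection(LineCollection(segments, linewidths=0.5))
    ax.autoscale()  # add_collection does not rescale the view on its own
    plt.show()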
matplotlib is getting slower as the script progresses because it is redrawing all of the lines that you have previously plotted, even the ones that have scrolled off the screen.
This is from a previous answer by Simon Gibbons.
matplotlib is optimized for graphics quality rather than raw speed. Here are links to a few libraries that were developed with speed in mind:
http://www.pyqtgraph.org/
http://code.google.com/p/guiqwt/
http://code.enthought.com/projects/chaco/
You can refer to the matplotlib cookbook for more about performance.
In order to better optimize your code, I would always try to replace loops with vectorization using numpy or, depending on my specific needs, other libraries that use numpy under the hood.
In this case, you could calculate and plot your profits this way:
import matplotlib.pyplot as plt
import time
import numpy as np
start_time = time.time()
num_of_graphs = 2000
tries = 2000
coins = [150, -100]
# Create a 2-D array with random choices
# rows for tries, columns for individual runs (graphs).
coin_tosses = np.random.choice(coins, (tries, num_of_graphs))
# Calculate 2-D array of profits by summing
# cumulatively over rows (trials).
profits = coin_tosses.cumsum(axis=0)
# Plot everything in one shot.
plt.plot(profits)
plt.show()
print("--- %s seconds ---" % (time.time() - start_time))
In my configuration, this code took approx. 6.3 seconds (6.2 of them plotting) to run, while your code took almost 15 seconds.
Background:
This blog reported speed benefits from using numpy.fromiter() over numpy.array(). Using the provided script as a base, I wanted to see the benefits of numpy.fromiter() when executed by the map() and submit() methods of Python's concurrent.futures.ProcessPoolExecutor class.
Below are my findings for a 2-second run:
In general, numpy.fromiter() is faster than numpy.array() when the array size is <256.
However, the performance of both numpy.fromiter() and numpy.array() can be significantly poorer than a serial run, and inconsistent, when executed by the map() and submit() methods of Python's concurrent.futures.ProcessPoolExecutor class.
Questions:
Can the inconsistent and poor performance of numpy.fromiter() and numpy.array(), when used with the map() and submit() methods of Python's concurrent.futures.ProcessPoolExecutor class, be avoided? How can I improve my scripts?
The Python scripts I used for this benchmarking are given below.
map():
#!/usr/bin/env python3.5
import concurrent.futures
from itertools import chain
import time
import numpy as np
import pygal
from os import path

list_sizes = [2**x for x in range(1, 11)]
seconds = 2

def test(size_array):
    pyarray = [float(x) for x in range(size_array)]
    # fromiter
    start = time.time()
    iterations = 0
    while time.time() - start <= seconds:
        np.fromiter(pyarray, dtype=np.float32, count=size_array)
        iterations += 1
    fromiter_count = iterations
    # array
    start = time.time()
    iterations = 0
    while time.time() - start <= seconds:
        np.array(pyarray, dtype=np.float32)
        iterations += 1
    array_count = iterations
    #return array_count, fromiter_count
    return size_array, array_count, fromiter_count

begin = time.time()
results = {}
with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
    data = list(chain.from_iterable(executor.map(test, list_sizes)))
print('data = ', data)
for i in range(0, len(data), 3):
    res = tuple(data[i+1:i+3])
    size_array = data[i]
    results[size_array] = res
    print("Result for size {} in {} seconds: {}".format(size_array, seconds, res))

out_folder = path.dirname(path.realpath(__file__))
print("Create diagrams in {}".format(out_folder))
chart = pygal.Line()
chart.title = "Performance in {} seconds".format(seconds)
chart.x_title = "Array size"
chart.y_title = "Iterations"
array_result = []
fromiter_result = []
x_axis = sorted(results.keys())
print(x_axis)
chart.x_labels = x_axis
chart.add('np.array', [results[x][0] for x in x_axis])
chart.add('np.fromiter', [results[x][1] for x in x_axis])
chart.render_to_png(path.join(out_folder, 'result_{}_concurrent_futures_map.png'.format(seconds)))

end = time.time()
compute_time = end - begin
print("Program Time = ", compute_time)
submit():
#!/usr/bin/env python3.5
import concurrent.futures
from itertools import chain
import time
import numpy as np
import pygal
from os import path

list_sizes = [2**x for x in range(1, 11)]
seconds = 2

def test(size_array):
    pyarray = [float(x) for x in range(size_array)]
    # fromiter
    start = time.time()
    iterations = 0
    while time.time() - start <= seconds:
        np.fromiter(pyarray, dtype=np.float32, count=size_array)
        iterations += 1
    fromiter_count = iterations
    # array
    start = time.time()
    iterations = 0
    while time.time() - start <= seconds:
        np.array(pyarray, dtype=np.float32)
        iterations += 1
    array_count = iterations
    return size_array, array_count, fromiter_count

begin = time.time()
results = {}
with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
    future_to_size_array = {executor.submit(test, size_array): size_array
                            for size_array in list_sizes}
    data = list(chain.from_iterable(
        f.result() for f in concurrent.futures.as_completed(future_to_size_array)))
print('data = ', data)
for i in range(0, len(data), 3):
    res = tuple(data[i+1:i+3])
    size_array = data[i]
    results[size_array] = res
    print("Result for size {} in {} seconds: {}".format(size_array, seconds, res))

out_folder = path.dirname(path.realpath(__file__))
print("Create diagrams in {}".format(out_folder))
chart = pygal.Line()
chart.title = "Performance in {} seconds".format(seconds)
chart.x_title = "Array size"
chart.y_title = "Iterations"
x_axis = sorted(results.keys())
print(x_axis)
chart.x_labels = x_axis
chart.add('np.array', [results[x][0] for x in x_axis])
chart.add('np.fromiter', [results[x][1] for x in x_axis])
chart.render_to_png(path.join(out_folder, 'result_{}_concurrent_futures_submitv2.png'.format(seconds)))

end = time.time()
compute_time = end - begin
print("Program Time = ", compute_time)
Serial: (with minor changes to the original code)
#!/usr/bin/env python3.5
import time
import numpy as np
import pygal
from os import path

list_sizes = [2**x for x in range(1, 11)]
seconds = 2

def test(size_array):
    pyarray = [float(x) for x in range(size_array)]
    # fromiter
    start = time.time()
    iterations = 0
    while time.time() - start <= seconds:
        np.fromiter(pyarray, dtype=np.float32, count=size_array)
        iterations += 1
    fromiter_count = iterations
    # array
    start = time.time()
    iterations = 0
    while time.time() - start <= seconds:
        np.array(pyarray, dtype=np.float32)
        iterations += 1
    array_count = iterations
    return array_count, fromiter_count

begin = time.time()
results = {}
for size_array in list_sizes:
    res = test(size_array)
    results[size_array] = res
    print("Result for size {} in {} seconds: {}".format(size_array, seconds, res))

out_folder = path.dirname(path.realpath(__file__))
print("Create diagrams in {}".format(out_folder))
chart = pygal.Line()
chart.title = "Performance in {} seconds".format(seconds)
chart.x_title = "Array size"
chart.y_title = "Iterations"
x_axis = sorted(results.keys())
print(x_axis)
chart.x_labels = x_axis
chart.add('np.array', [results[x][0] for x in x_axis])
chart.add('np.fromiter', [results[x][1] for x in x_axis])
#chart.add('np.array', [x[0] for x in results.values()])
#chart.add('np.fromiter', [x[1] for x in results.values()])
chart.render_to_png(path.join(out_folder, 'result_{}_serial.png'.format(seconds)))

end = time.time()
compute_time = end - begin
print("Program Time = ", compute_time)
The inconsistent and poor performance of numpy.fromiter() and numpy.array() that I encountered earlier appears to be associated with the number of CPUs used by concurrent.futures.ProcessPoolExecutor. I had earlier used 6 CPUs. The diagrams below show the corresponding performance of numpy.fromiter() and numpy.array() when 2, 4, 6, and 8 CPUs were used. They show that there is an optimum number of CPUs: using too many (i.e. more than 4) is detrimental for small array sizes (<512 elements). For example, more than 4 CPUs can halve performance, and can even make it inconsistent compared to a serial run.
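A minimal sketch (my own addition, not part of the original post) of how the optimum worker count could be found empirically, reusing the test() function and list_sizes defined in the scripts above:

import os
import time
import concurrent.futures

def sweep_workers(list_sizes, max_cpus=None):
    # Try every pool size from 1 up to the machine's CPU count and
    # time the same workload under each one. Assumes test() is defined
    # at module level; on Windows, call this under if __name__ == '__main__'.
    max_cpus = max_cpus or os.cpu_count()
    for n_workers in range(1, max_cpus + 1):
        begin = time.time()
        with concurrent.futures.ProcessPoolExecutor(max_workers=n_workers) as executor:
            list(executor.map(test, list_sizes))
        print("{} workers: {:.3f} s".format(n_workers, time.time() - begin))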
I was always sure that there is no point in having more threads/processes than CPU cores (from a performance perspective). However, my Python sample shows me a different result.
import concurrent.futures
import random
import time

def doSomething(task_num):
    print("executing...", task_num)
    time.sleep(1)  # simulate heavy operation that takes ~ 1 second
    return random.randint(1, 10) * random.randint(1, 500)  # real operation, used random to avoid caches and so on...

def main():
    # This part is not taken in consideration because I don't want to
    # measure the worker creation time
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=60)

    start_time = time.time()
    for i in range(1, 100):  # execute tasks 1..99
        executor.map(doSomething, [i, ])
    executor.shutdown(wait=True)
    print("--- %s seconds ---" % (time.time() - start_time))

if __name__ == '__main__':
    main()
Program results:
1 WORKER   --- 100.28233647346497 seconds ---
2 WORKERS  --- 50.26122164726257 seconds ---
3 WORKERS  --- 33.32741022109985 seconds ---
4 WORKERS  --- 25.399883031845093 seconds ---
5 WORKERS  --- 20.434186220169067 seconds ---
10 WORKERS --- 10.903695344924927 seconds ---
50 WORKERS --- 6.363946914672852 seconds ---
60 WORKERS --- 4.819359302520752 seconds ---
How can this run faster when I have just 4 logical processors?
Here are my computer specifications (tested on Windows 8 and Ubuntu 14):
CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
Sockets: 1
Cores: 2
Logical processors: 4
The reason is that sleep() uses only a negligible amount of CPU. In this case, it is a poor simulation of actual work performed by a thread.
All sleep() really does is suspend the thread until the timer expires. While the thread is suspended, it doesn't use any CPU cycles.
I extended your example with a more intensive computation (e.g. matrix inversion). You will see what you expected: the total time decreases as workers are added, up to the number of cores, and increases afterwards (because of the cost of context switching).
import concurrent.futures
import random
import time
import numpy as np
import matplotlib.pyplot as plt

def doSomething(task_num):
    print("executing...", task_num)
    for i in range(100000):
        A = np.random.normal(0, 1, (1000, 1000))
        B = np.linalg.inv(A)
    return random.randint(1, 10) * random.randint(1, 500)  # real operation, used random to avoid caches and so on...

def measureTime(nWorkers: int):
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=nWorkers)

    start_time = time.time()
    for i in range(1, 40):  # execute tasks 1..39
        executor.map(doSomething, [i, ])
    executor.shutdown(wait=True)
    return (time.time() - start_time)

def main():
    # This part is not taken in consideration because I don't want to
    # measure the worker creation time
    maxWorkers = 20
    dT = np.zeros(maxWorkers)
    for i in range(maxWorkers):
        dT[i] = measureTime(i + 1)
        print("--- %s seconds ---" % dT[i])

    plt.plot(np.linspace(1, maxWorkers, maxWorkers), dT)
    plt.show()

if __name__ == '__main__':
    main()
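As an aside (my own note, not part of the original exchange): calling executor.map in a loop with one-element lists works because ProcessPoolExecutor submits each call immediately, but a single map over the whole range is the idiomatic way to fan tasks out across the pool. A minimal sketch, using the doSomething from the example above:

import concurrent.futures

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        # One map call distributes all task numbers across the workers
        # and yields the results in submission order.
        results = list(executor.map(doSomething, range(1, 100)))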
I am curious how to alternate between two loops. The first loop takes a set of pictures; the second deletes all of them. I want to take pics, delete, take, delete infinitely (or at least for a very long period of time).
import time
import picamera
import webbrowser
import io
import os

frames = 60
deletecount = 0

def filenames():
    frame = 0
    while frame < frames:
        yield 'image%02d.jpg' % frame
        frame += 1

with picamera.PiCamera() as camera:
    camera.resolution = (1024, 768)
    camera.framerate = 60
    camera.start_preview()
    time.sleep(1)
    start = time.time()
    camera.capture_sequence(filenames(), use_video_port=True)
    finish = time.time()  # takes 60 pics

while deletecount < frames:
    if os.path.exists("/home/pi/image%02d.jpg" % deletecount):
        os.remove("/home/pi/image%02d.jpg" % deletecount)
    deletecount += 1
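One way to alternate the two phases indefinitely is to wrap both in a single outer loop. Below is a minimal sketch (my own assumption about the intent, reusing the same picamera calls as above, and assuming the script runs from /home/pi as the absolute paths suggest):

import os
import time
import picamera

frames = 60

def filenames():
    # A fresh generator per pass, so no counters need resetting.
    for frame in range(frames):
        yield 'image%02d.jpg' % frame

with picamera.PiCamera() as camera:
    camera.resolution = (1024, 768)
    camera.framerate = 60
    camera.start_preview()
    time.sleep(1)
    while True:  # capture, then delete, forever
        camera.capture_sequence(filenames(), use_video_port=True)
        for i in range(frames):
            path = '/home/pi/image%02d.jpg' % i
            if os.path.exists(path):
                os.remove(path)

Because filenames() is called anew on every pass, each capture/delete cycle is independent, and the loop can run until interrupted.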