I have an Arduino Nano 33 BLE that is updating a few Bluetooth characteristics with a string representation of BNO055 sensor calibration and quaternion data. On the Arduino side, I see the calibration and quaternion data getting updated in a nice orderly sequence, as expected.
I have a Python (3.9) program running on Windows 10 that uses asyncio to subscribe to the characteristics on the Arduino and read the updates. Everything works fine when I have an update rate on the Arduino of 1/second. By "works fine" I mean I see the orderly sequence of updates: quaternion, calibration, quaternion, calibration, .... The problem is that I changed the update rate to 10/second (100 ms delay on the Arduino) and now I am getting, for example, 100 updates for the quaternion data but only 50 updates for the calibration data, when the number of updates should be equal. Somehow I'm not handling the updates properly on the Python side.
The Python code is listed below:
import asyncio
import pandas as pd
from bleak import BleakClient
from bleak import BleakScanner
ardAddress = ''
found = ''
exit_flag = False
temperaturedata = []
timedata = []
calibrationdata=[]
quaterniondata=[]
# loop: asyncio.AbstractEventLoop
tempServiceUUID = '0000290c-0000-1000-8000-00805f9b34fb' # Temperature Service UUID on Arduino 33 BLE
stringUUID = '00002a56-0000-1000-8000-00805f9b34fb' # Characteristic of type String [Write to Arduino]
inttempUUID = '00002a1c-0000-1000-8000-00805f9b34fb' # Characteristic of type Int [Temperature]
longdateUUID = '00002a08-0000-1000-8000-00805f9b34fb' # Characteristic of type Long [datetime millis]
strCalibrationUUID = '00002a57-0000-1000-8000-00805f9b34fb' # Characteristic of type String [BNO055 Calibration]
strQuaternionUUID = '9e6c967a-5a87-49a1-a13f-5a0f96188552' # Characteristic of type Long [BNO055 Quaternion]
async def scanfordevices():
devices = await BleakScanner.discover()
for d in devices:
print(d)
if (d.name == 'TemperatureMonitor'):
global found, ardAddress
found = True
print(f'{d.name=}')
print(f'{d.address=}')
ardAddress = d.address
print(f'{d.rssi=}')
return d.address
async def readtemperaturecharacteristic(client, uuid: str):
val = await client.read_gatt_char(uuid)
intval = int.from_bytes(val, byteorder='little')
print(f'readtemperaturecharacteristic: Value read from: {uuid} is: {val} | as int={intval}')
async def readdatetimecharacteristic(client, uuid: str):
val = await client.read_gatt_char(uuid)
intval = int.from_bytes(val, byteorder='little')
print(f'readdatetimecharacteristic: Value read from: {uuid} is: {val} | as int={intval}')
async def readcalibrationcharacteristic(client, uuid: str):
# Calibration characteristic is a string
val = await client.read_gatt_char(uuid)
strval = val.decode('UTF-8')
print(f'readcalibrationcharacteristic: Value read from: {uuid} is: {val} | as string={strval}')
async def getservices(client):
svcs = await client.get_services()
print("Services:")
for service in svcs:
print(service)
ch = service.characteristics
for c in ch:
print(f'\tCharacteristic Desc:{c.description} | UUID:{c.uuid}')
def notification_temperature_handler(sender, data):
"""Simple notification handler which prints the data received."""
intval = int.from_bytes(data, byteorder='little')
# TODO: review speed of append vs extend. Extend using iterable but is faster
temperaturedata.append(intval)
#print(f'Temperature: Sender: {sender}, and byte data= {data} as an Int={intval}')
def notification_datetime_handler(sender, data):
"""Simple notification handler which prints the data received."""
intval = int.from_bytes(data, byteorder='little')
timedata.append(intval)
#print(f'Datetime: Sender: {sender}, and byte data= {data} as an Int={intval}')
def notification_calibration_handler(sender, data):
"""Simple notification handler which prints the data received."""
strval = data.decode('UTF-8')
numlist=extractvaluesaslist(strval,':')
#Save to list for processing later
calibrationdata.append(numlist)
print(f'Calibration Data: {sender}, and byte data= {data} as a List={numlist}')
def notification_quaternion_handler(sender, data):
"""Simple notification handler which prints the data received."""
strval = data.decode('UTF-8')
numlist=extractvaluesaslist(strval,':')
#Save to list for processing later
quaterniondata.append(numlist)
print(f'Quaternion Data: {sender}, and byte data= {data} as a List={numlist}')
def extractvaluesaslist(raw, separator=':'):
# Get everything after separator
s1 = raw.split(sep=separator)[1]
s2 = s1.split(sep=',')
return list(map(float, s2))
async def runmain():
# Based on code from: https://github.com/hbldh/bleak/issues/254
global exit_flag
print('runmain: Starting Main Device Scan')
await scanfordevices()
print('runmain: Scan is done, checking if found Arduino')
if found:
async with BleakClient(ardAddress) as client:
print('runmain: Getting Service Info')
await getservices(client)
# print('runmain: Reading from Characteristics Arduino')
# await readdatetimecharacteristic(client, uuid=inttempUUID)
# await readcalibrationcharacteristic(client, uuid=strCalibrationUUID)
print('runmain: Assign notification callbacks')
await client.start_notify(inttempUUID, notification_temperature_handler)
await client.start_notify(longdateUUID, notification_datetime_handler)
await client.start_notify(strCalibrationUUID, notification_calibration_handler)
await client.start_notify(strQuaternionUUID, notification_quaternion_handler)
while not exit_flag:
await asyncio.sleep(1)
# TODO: This does nothing. Understand why?
print('runmain: Stopping notifications.')
await client.stop_notify(inttempUUID)
print('runmain: Write to characteristic to let it know we plan to quit.')
await client.write_gatt_char(stringUUID, 'Stopping'.encode('ascii'))
else:
print('runmain: Arduino not found. Check that its on')
print('runmain: Done.')
def main():
# get main event loop
loop = asyncio.get_event_loop()
try:
loop.run_until_complete(runmain())
except KeyboardInterrupt:
global exit_flag
print('\tmain: Caught keyboard interrupt in main')
exit_flag = True
finally:
pass
print('main: Getting all pending tasks')
# From book Pg 26.
pending = asyncio.all_tasks(loop=loop)
print(f'\tmain: number of tasks={len(pending)}')
for task in pending:
task.cancel()
group = asyncio.gather(*pending, return_exceptions=True)
print('main: Waiting for tasks to complete')
loop.run_until_complete(group)
loop.close()
# Display data recorded in Dataframe
if len(temperaturedata)==len(timedata):
print(f'Temperature data len={len(temperaturedata)}, and len of timedata={len(timedata)}')
df = pd.DataFrame({'datetime': timedata,
'temperature': temperaturedata})
#print(f'dataframe shape={df.shape}')
#print(df)
df.to_csv('temperaturedata.csv')
else:
print(f'No data or lengths different: temp={len(temperaturedata)}, time={len(timedata)}')
if len(quaterniondata)==len(calibrationdata):
print('Processing Quaternion and Calibration Data')
#Load quaternion data
dfq=pd.DataFrame(quaterniondata,columns=['time','qw','qx','qy','qz'])
print(f'Quaternion dataframe shape={dfq.shape}')
#Add datetime millis data
#dfq.insert(0,'Time',timedata)
#Load calibration data
dfcal=pd.DataFrame(calibrationdata,columns=['time','syscal','gyrocal','accelcal','magcal'])
print(f'Calibration dataframe shape={dfcal.shape}')
#Merge two dataframes together
dffinal=pd.concat([dfq,dfcal],axis=1)
dffinal.to_csv('quaternion_and_cal_data.csv')
else:
print(f'No data or lengths different. Quat={len(quaterniondata)}, Cal={len(calibrationdata)}')
if len(quaterniondata)>0:
dfq = pd.DataFrame(quaterniondata, columns=['time', 'qw', 'qx', 'qy', 'qz'])
dfq.to_csv('quaterniononly.csv')
if len(calibrationdata)>0:
dfcal = pd.DataFrame(calibrationdata, columns=['time','syscal', 'gyrocal', 'accelcal', 'magcal'])
dfcal.to_csv('calibrationonly.csv')
print("main: Done.")
if __name__ == "__main__":
'''Starting Point of Program'''
main()
So, my first question is: can anyone help me understand why I do not seem to be getting all the updates in my Python program? I should be seeing notification_quaternion_handler() and notification_calibration_handler() called the same number of times, but I am not. I assume I am not using asyncio properly, but I am at a loss as to how to debug it at this point.
My second question is: are there best practices for receiving relatively high-frequency updates over Bluetooth, for example every 10-20 ms? I am trying to read IMU sensor data and it needs to be done at a fairly high rate.
This is my first attempt at Bluetooth and asyncio, so clearly I have a lot to learn.
Thank you for the help.
Fantastic answer by @ukBaz.
In summary, for others who may have a similar issue:
On the Arduino side I ended up with something like this (only the important parts are shown):
typedef struct __attribute__ ((packed)) {
unsigned long timeread;
int qw; //float Quaternion values will be scaled to int by multiplying by constant
int qx;
int qy;
int qz;
uint8_t cal_system;
uint8_t cal_gyro;
uint8_t cal_accel;
uint8_t cal_mag;
} sensordata;
//Declare struct and populate
sensordata datareading;
datareading.timeread=tnow;
datareading.qw=(int) (quat.w()*10000);
datareading.qx=(int) (quat.x()*10000);
datareading.qy=(int) (quat.y()*10000);
datareading.qz=(int) (quat.z()*10000);
datareading.cal_system=system;
datareading.cal_gyro=gyro;
datareading.cal_accel=accel;
datareading.cal_mag=mag;
//Write values to Characteristics.
structDataChar.writeValue((uint8_t *)&datareading, sizeof(datareading));
Then on the Python (Windows Desktop) side I have this to unpack the data being sent:
import struct

sensorstructdata = []  # accumulated readings, one row per notification

def notification_structdata_handler(sender, data):
"""Simple notification handler which prints the data received."""
# NOTE: IT IS CRITICAL THAT THE UNPACK BYTE STRUCTURE MATCHES THE STRUCT
# CONFIGURATION SHOWN IN THE ARDUINO C PROGRAM.
# Format key: '<' = little-endian, 'h' = short (2 bytes), 'b' = signed byte (1 byte),
# 'i' = int (4 bytes); the Arduino's unsigned long is also 4 bytes.
#Scale factor used in Arduino to convert floats to ints.
scale=10000
# Main Sensor struct
t,qw,qx,qy,qz,cs,cg,ca,cm= struct.unpack('<5i4b', data)
sensorstructdata.append([t,qw/scale,qx/scale,qy/scale,qz/scale,cs,cg,ca,cm])
print(f'--->Struct Decoded. time={t}, qw={qw/scale}, qx={qx/scale}, qy={qy/scale}, qz={qz/scale},'
f'cal_s={cs}, cal_g={cg}, cal_a={ca}, cal_m={cm}')
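One optional detail worth double-checking when keeping the two sides in sync (this is an alternative I did not actually need): the Arduino fields are unsigned (unsigned long and uint8_t), so a format string that mirrors the struct type-for-type would be '<L4i4B', which is also 24 bytes. '<5i4b' works here because millis() values and the 0-3 calibration levels fit comfortably in the signed ranges, but the explicit version is arguably easier to keep matched to the struct:
# Hypothetical alternative format string, matching the struct field types exactly:
# '<' little-endian, 'L' unsigned long (4 bytes), '4i' four 4-byte ints, '4B' four unsigned bytes
t, qw, qx, qy, qz, cs, cg, ca, cm = struct.unpack('<L4i4B', data)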
Thanks for all the help and as promised the performance is MUCH better than what I started with!
You have multiple characteristics that are being updated at the same frequency. It is more efficient in Bluetooth Low Energy (BLE) to transmit those values in the same characteristic. The other thing I noticed is that you appear to be sending the values as strings. It looks like the string format might be "key:value", judging by the way you are extracting information from the string. This is also an inefficient way to send data via BLE.
The data that is transmitted over BLE is always a list of bytes, so if a float is required, it needs to be changed into an integer to be sent as bytes. As an example, if we wanted to send a value with two decimal places, multiplying it by 100 would always remove the decimal places. To go the other way, divide by 100. e.g.:
>>> value = 12.34
>>> send = int(value * 100)
>>> print(send)
1234
>>> send / 100
12.34
The struct library allows integers to be easily packed into a series of bytes to send. As an example:
>>> import struct
>>> value1 = 12.34
>>> value2 = 67.89
>>> send_bytes = struct.pack('<hh', int(value1 * 100), int(value2 * 100))
>>> print(send_bytes)
b'\xd2\x04\x85\x1a'
To then unpack that:
>>> r_val1, r_val2 = struct.unpack('<hh', send_bytes)
>>> print(f'Value1={r_val1/100} : Value2={r_val2/100}')
Value1=12.34 : Value2=67.89
Using a single characteristic with the minimum number of bytes being transmitted should allow for faster notifications.
To look at how other characteristics do this, have a look at the following document from the Bluetooth SIG:
https://www.bluetooth.com/specifications/specs/gatt-specification-supplement-5/
A good example might be the Blood Pressure Measurement characteristic.
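As a rough illustration of the general pattern those characteristics follow - a flags byte that tells the receiver which optional fields are present, followed by the packed fields - here is a generic sketch on the receiving side (this is not the exact Blood Pressure Measurement layout, just the idea):
import struct

# Generic sketch: one flags byte says which optional fields follow,
# then two mandatory little-endian 16-bit values; a third 16-bit value
# is only present when bit 0 of the flags byte is set.
def parse_packed(data):
    flags = data[0]
    val1, val2 = struct.unpack_from('<hh', data, 1)
    extra = None
    if flags & 0x01:
        (extra,) = struct.unpack_from('<h', data, 5)
    return flags, val1, val2, extra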
The consumer bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic my_topic works and I can see the logs as: 2021-12-11T22:40:13.800Z {"ts":1639262395.220755,"uid":"CiaUp427FXwzqySsOh","id.orig_h":"fe80::f816:3eff:fef4:a877","id.orig_p":5353,"id.resp_h":"ff02::fb","id.resp_p":5353,"proto":"udp","service":"dns","duration":0.40987586975097659,"orig_bytes":1437,"resp_bytes":0,"conn_state":"S0","missed_bytes":0,"history":"D","orig_pkts":4,"orig_ip_bytes":1629,"resp_pkts":0,"resp_ip_bytes":0}
However, with the consumer code listed below I am getting "JSON Decoder Error: Extra data line 1 column 4 char 4", which seems to be an easy error related to parsing the data, since each log starts with the date:time as shown above. Meaning the consumer gets the first log but cannot parse it.
Easy enough, yet it seems I cannot get around it, as this is part of the KafkaConsumer object. If anyone can give a hint or show how to do it, it would be great. Thanks and regards, M
from json import loads
from kafka import KafkaConsumer, TopicPartition
import threading, time
from IPython.display import clear_output
KAFKA_SERVER='10.10.10.10:9092'
TOPIC = 'my_topic'
AUTO_OFFSET_RESET = 'earliest'
CONSUMER_TIME_OUT=1000 #milliseconds
MAXIMUM_SECONDS=0.01 #seconds
class TrafficConsumer():
def __init__(self, offset=AUTO_OFFSET_RESET, verbose=False, close=True):
try:
self.__traffic_consumer = KafkaConsumer(
TOPIC,
bootstrap_servers = [KAFKA_SERVER],
auto_offset_reset = offset,
enable_auto_commit = True,
#group_id = GROUP_ID,
value_deserializer = lambda x : loads(x.decode('utf-8')),
consumer_timeout_ms = CONSUMER_TIME_OUT,
#on_commit = self.commit_completed(),
)
self.__traffic_consumer.subscribe([TOPIC])
threading.Thread.__init__(self)
self.stop_event = threading.Event()
except Exception as e:
print("Consumer is not accessible. Check: the connections and the settings in attributes_kafka.", e)
self.set_conn_log_traffic(verbose=verbose, close=close)
def stop(self):
self.stop_event.set()
def get_consumer(self):
return self.__traffic_consumer
def set_conn_log_traffic(self, verbose=False, close=True):
while not self.stop_event.is_set():
for ind_flow in self.__traffic_consumer.poll(2):
print(ind_flow)
if self.stop_event.is_set():
break
if close: self.__traffic_consumer.close()
Your data isn't proper JSON. It includes a timestamp before the JSON object, which cannot be decoded using json.loads.
You should verify how the producer is sending data, since the timestamp is part of the value rather than the Kafka record timestamp.
Or, you can handle the problem in the consumer by using a different deserializer function.
For example:
import json

def safe_deserialize(value):
    # Split off the leading timestamp, then parse the remaining JSON object
    _, data = value.decode('utf-8').split(" ", 1)
    return json.loads(data)

...

KafkaConsumer(
    ...
    value_deserializer = safe_deserialize,
)
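If the prefix in front of the JSON is not always separated from it by a single space, a slightly more defensive variant (a sketch, assuming the JSON object always starts at the first '{') could be used instead:
import json

def tolerant_deserialize(value):
    # Ignore anything before the first '{' so the leading timestamp,
    # whatever its exact format, is skipped before parsing.
    text = value.decode('utf-8')
    return json.loads(text[text.index('{'):])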
My Problem:
The web app I'm building relies on real-time transcription of a user's voice along with timestamps for when each word begins and ends.
Google's Speech-to-Text API has a limit of 4 minutes for streaming requests but I want users to be able to run their mic's for as long as 30 minutes if they so choose.
Thankfully, Google provides its own code examples for how to make successive requests to their Speech-to-Text API in a way that mimics endless streaming speech recognition.
I've adapted their Python infinite streaming example for my purposes (see below for my code). The timestamps provided by Google are pretty accurate but the issue is that when I exceed the streaming limit (4 minutes) and a new request is made, the timestamped transcript returned by Google's API from the new request is off by as much as 5 seconds or more.
Below is an example of the output when I adjust the streaming limit to 10 seconds (so a new request to Google's Speech-to-Text API begins every 10 seconds).
The timestamp you see printed next to each transcribed response (the 'corrected_time' in the code) is the timestamp for the end of the transcribed line, not the beginning. These timestamps are accurate for the first request but are off by ~4 seconds in the second request and ~9 seconds in the third request.
In a Nutshell, I want to make sure that when the streaming limit is exceeded and a new request is made, the timestamps returned by Google for that new request are adjusted accurately.
My Code:
To help you understand what's going on, I would recommend running it on your machine (it only takes a couple of minutes to get working if you have a Google Cloud service account).
I've included more detail on my current diagnosis below the code.
#!/usr/bin/env python
"""Google Cloud Speech API sample application using the streaming API.
NOTE: This module requires the dependencies `pyaudio`.
To install using pip:
pip install pyaudio
Example usage:
python THIS_FILENAME.py
"""
# [START speech_transcribe_infinite_streaming]
import os
import re
import sys
import time
from google.cloud import speech
import pyaudio
from six.moves import queue
# Audio recording parameters
STREAMING_LIMIT = 20000 # 20 seconds (originally 4 mins but shortened for testing purposes)
SAMPLE_RATE = 16000
CHUNK_SIZE = int(SAMPLE_RATE / 10) # 100ms
# Environment Variable set for Google Credentials. Put the json service account
# key in the root directory
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'YOUR_SERVICE_ACCOUNT_KEY.json'
def get_current_time():
"""Return Current Time in MS."""
return int(round(time.time() * 1000))
class ResumableMicrophoneStream:
"""Opens a recording stream as a generator yielding the audio chunks."""
def __init__(self, rate, chunk_size):
self._rate = rate
self.chunk_size = chunk_size
self._num_channels = 1
self._buff = queue.Queue()
self.closed = True
self.start_time = get_current_time()
self.restart_counter = 0
self.audio_input = []
self.last_audio_input = []
self.result_end_time = 0
self.is_final_end_time = 0
self.final_request_end_time = 0
self.bridging_offset = 0
self.last_transcript_was_final = False
self.new_stream = True
self._audio_interface = pyaudio.PyAudio()
self._audio_stream = self._audio_interface.open(
format=pyaudio.paInt16,
channels=self._num_channels,
rate=self._rate,
input=True,
frames_per_buffer=self.chunk_size,
# Run the audio stream asynchronously to fill the buffer object.
# This is necessary so that the input device's buffer doesn't
# overflow while the calling thread makes network requests, etc.
stream_callback=self._fill_buffer,
)
def __enter__(self):
self.closed = False
return self
def __exit__(self, type, value, traceback):
self._audio_stream.stop_stream()
self._audio_stream.close()
self.closed = True
# Signal the generator to terminate so that the client's
# streaming_recognize method will not block the process termination.
self._buff.put(None)
self._audio_interface.terminate()
def _fill_buffer(self, in_data, *args, **kwargs):
"""Continuously collect data from the audio stream, into the buffer."""
self._buff.put(in_data)
return None, pyaudio.paContinue
def generator(self):
"""Stream Audio from microphone to API and to local buffer"""
while not self.closed:
data = []
"""
THE BELOW 'IF' STATEMENT IS WHERE THE ERROR IS LIKELY OCCURRING
This statement runs when the streaming limit is hit and a new request is made.
"""
if self.new_stream and self.last_audio_input:
chunk_time = STREAMING_LIMIT / len(self.last_audio_input)
if chunk_time != 0:
if self.bridging_offset < 0:
self.bridging_offset = 0
if self.bridging_offset > self.final_request_end_time:
self.bridging_offset = self.final_request_end_time
chunks_from_ms = round(
(self.final_request_end_time - self.bridging_offset)
/ chunk_time
)
self.bridging_offset = round(
(len(self.last_audio_input) - chunks_from_ms) * chunk_time
)
for i in range(chunks_from_ms, len(self.last_audio_input)):
data.append(self.last_audio_input[i])
self.new_stream = False
# Use a blocking get() to ensure there's at least one chunk of
# data, and stop iteration if the chunk is None, indicating the
# end of the audio stream.
chunk = self._buff.get()
self.audio_input.append(chunk)
if chunk is None:
return
data.append(chunk)
# Now consume whatever other data's still buffered.
while True:
try:
chunk = self._buff.get(block=False)
if chunk is None:
return
data.append(chunk)
self.audio_input.append(chunk)
except queue.Empty:
break
yield b"".join(data)
def listen_print_loop(responses, stream):
"""Iterates through server responses and prints them.
The responses passed is a generator that will block until a response
is provided by the server.
Each response may contain multiple results, and each result may contain
multiple alternatives; Here we print only the transcription for the top
alternative of the top result.
In this case, responses are provided for interim results as well. If the
response is an interim one, print a line feed at the end of it, to allow
the next result to overwrite it, until the response is a final one. For the
final one, print a newline to preserve the finalized transcription.
"""
for response in responses:
if get_current_time() - stream.start_time > STREAMING_LIMIT:
stream.start_time = get_current_time()
break
if not response.results:
continue
result = response.results[0]
if not result.alternatives:
continue
transcript = result.alternatives[0].transcript
result_seconds = 0
result_micros = 0
if result.result_end_time.seconds:
result_seconds = result.result_end_time.seconds
if result.result_end_time.microseconds:
result_micros = result.result_end_time.microseconds
stream.result_end_time = int((result_seconds * 1000) + (result_micros / 1000))
corrected_time = (
stream.result_end_time
- stream.bridging_offset
+ (STREAMING_LIMIT * stream.restart_counter)
)
# Display interim results, but with a carriage return at the end of the
# line, so subsequent lines will overwrite them.
if result.is_final:
sys.stdout.write("FINAL RESULT # ")
sys.stdout.write(str(corrected_time/1000) + ": " + transcript + "\n")
stream.is_final_end_time = stream.result_end_time
stream.last_transcript_was_final = True
# Exit recognition if any of the transcribed phrases could be
# one of our keywords.
if re.search(r"\b(exit|quit)\b", transcript, re.I):
sys.stdout.write("Exiting...\n")
stream.closed = True
break
else:
sys.stdout.write("INTERIM RESULT # ")
sys.stdout.write(str(corrected_time/1000) + ": " + transcript + "\r")
stream.last_transcript_was_final = False
def main():
"""start bidirectional streaming from microphone input to speech API"""
client = speech.SpeechClient()
config = speech.RecognitionConfig(
encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=SAMPLE_RATE,
language_code="en-US",
max_alternatives=1,
)
streaming_config = speech.StreamingRecognitionConfig(
config=config, interim_results=True
)
mic_manager = ResumableMicrophoneStream(SAMPLE_RATE, CHUNK_SIZE)
print(mic_manager.chunk_size)
sys.stdout.write('\nListening, say "Quit" or "Exit" to stop.\n\n')
sys.stdout.write("End (ms) Transcript Results/Status\n")
sys.stdout.write("=====================================================\n")
with mic_manager as stream:
while not stream.closed:
sys.stdout.write(
"\n" + str(STREAMING_LIMIT * stream.restart_counter) + ": NEW REQUEST\n"
)
stream.audio_input = []
audio_generator = stream.generator()
requests = (
speech.StreamingRecognizeRequest(audio_content=content)
for content in audio_generator
)
responses = client.streaming_recognize(streaming_config, requests)
# Now, put the transcription responses to use.
listen_print_loop(responses, stream)
if stream.result_end_time > 0:
stream.final_request_end_time = stream.is_final_end_time
stream.result_end_time = 0
stream.last_audio_input = []
stream.last_audio_input = stream.audio_input
stream.audio_input = []
stream.restart_counter = stream.restart_counter + 1
if not stream.last_transcript_was_final:
sys.stdout.write("\n")
stream.new_stream = True
if __name__ == "__main__":
main()
# [END speech_transcribe_infinite_streaming]
My Current Diagnosis
The 'corrected_time' is not being set correctly when new requests are made. This is due to the 'bridging_offset' not being set correctly. So what we need to look at is the 'generator()' method in the 'ResumableMicrophoneStream' class.
In the 'generator()' method, there is an 'if' statement which is run when the streaming limit is hit and a new request is made
if self.new_stream and self.last_audio_input:
Its purpose appears to be to take any lingering audio data that wasn't finished being transcribed before the streaming limit was hit and add it to the buffer before any new audio chunks so that it's transcribed in the new request.
It is also the responsibility of this 'if' statement to set the 'bridging offset' but I'm not entirely sure what this offset represents. All I know is that however it is being set, it is not being set accurately.
Time offset values show the beginning and the end of each spoken word that is recognized in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.
This tells us that the time offsets you receive are always measured from the beginning of the audio supplied in the current request, not from the start of your whole session. That would be my guess as to why it's causing your application problems: every time a new request is started after the streaming limit is hit, the offsets start again from zero and have to be re-based onto your overall timeline.
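To make that arithmetic concrete, here is a rough sketch (not a drop-in fix for your code; cumulative_audio_ms and replayed_ms are bookkeeping values you would have to maintain yourself) of how a per-request offset can be re-based onto a session timeline:
# Rough sketch: re-base a per-request offset onto the whole session.
# cumulative_audio_ms = total duration of audio consumed by all previous requests
# replayed_ms         = duration of buffered audio replayed at the head of this request
def to_session_time_ms(result_end_time_ms, cumulative_audio_ms, replayed_ms):
    # result_end_time_ms is measured from the first byte sent in the current
    # request, which may start with replayed (already counted) audio.
    return cumulative_audio_ms - replayed_ms + result_end_time_ms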
So here is my use case:
I read from a database rows containing information to make a complex SOAP call (I'm using zeep to do these calls).
One row from the database corresponds to a request to the service.
There can be up to 20 thousand lines, so I don't want to read everything into memory before making the calls.
I need to process the responses - when the response is OK, I need to store some returned information back into my database, and when there is an exception I need to process the exception for that particular request/response pair.
I need also to capture some external information at the time of the request creation, so that I know where to store the response from the request. In my current code I'm using the delightful property of gather() that makes the results come in the same order.
I read the relevant PEPs and Python documentation but I'm still very confused, as there seems to be multiple ways to solve the same problem.
I also went through countless exercises on the web, but the examples are all trivial - it's either asyncio.sleep() or some webscraping with a finite list of urls.
The solution that I have come up with so far kinda works - the asyncio.gather() method is very, very useful, but I have not been able to 'feed' it from a generator. I'm currently just counting to an arbitrary size and then starting a .gather() operation. I've transcribed the code below, with the boring parts left out, and I've tried to anonymise it.
I've tried solutions involving semaphores, queues and different event loops, but I'm failing every time. Ideally I'd like to be able to create Futures 'continuously' - I think I'm missing the logic of 'convert this awaitable call to a future'.
I'd be grateful for any help!
import asyncio
from asyncio import Future
import zeep
from zeep.plugins import HistoryPlugin
history = HistoryPlugin()
max_concurrent_calls = 5
provoke_errors = True
def export_data_async(db_variant: str, order_nrs: set):
st = time.time()
results = []
loop = asyncio.get_event_loop()
def get_client1(service_name: str, system: Systems = Systems.ACME) -> Tuple[zeep.Client, zeep.client.Factory]:
client1 = zeep.Client(wsdl=system.wsdl_url(service_name=service_name),
transport=transport,
plugins=[history],
)
factory_ns2 = client1.type_factory(namespace='ns2')
return client1, factory_ns2
table = 'ZZZZ'
moveback_table = 'EEEEEE'
moveback_dict = create_default_empty_ordered_dict('attribute1 attribute2 attribute3 attribute3')
client, factory = get_client1(service_name='ACMEServiceName')
if log.isEnabledFor(logging.DEBUG):
client.wsdl.dump()
zeep_log = logging.getLogger('zeep.transports')
zeep_log.setLevel(logging.DEBUG)
with Db(db_variant) as db:
db.open_db(CON_STRING[db_variant])
db.init_table_for_read(table, order_list=order_nrs)
counter_failures = 0
tasks = []
sids = []
results = []
def handle_future(future: Future) -> None:
results.extend(future.result())
def process_tasks_concurrently() -> None:
nonlocal tasks, sids, counter_failures, results
futures = asyncio.gather(*tasks, return_exceptions=True)
futures.add_done_callback(handle_future)
loop.run_until_complete(futures)
for i, response_or_fault in enumerate(results):
if type(response_or_fault) in [zeep.exceptions.Fault, zeep.exceptions.TransportError]:
counter_failures += 1
log_webservice_fault(sid=sids[i], db=db, err=response_or_fault, object=table)
else:
db.write_dict_to_table(
moveback_table,
{'sid': sids[i],
'attribute1': response_or_fault['XXX']['XXX']['xxx'],
'attribute2': response_or_fault['XXX']['XXX']['XXXX']['XXX'],
'attribute3': response_or_fault['XXXX']['XXXX']['XXX'],
}
)
db.commit_db_con()
tasks = []
sids = []
results = []
return
for row in db.rows(table):
if int(row.id) % 2 == 0 and provoke_errors:
payload = faulty_message_payload(row=row,
factory=factory,
)
else:
payload = message_payload(row=row,
factory=factory,
)
tasks.append(client.service.myRequest(
MessageHeader=factory.MessageHeader(**message_header_arguments(row=row)),
myRequestPayload=payload,
_soapheaders=[security_soap_header],
))
sids.append(row.sid)
if len(tasks) == max_concurrent_calls:
process_tasks_concurrently()
if tasks: # this is the remainder of len(db.rows) % max_concurrent_calls
process_tasks_concurrently()
loop.run_until_complete(transport.session.close())
db.execute_this_statement(statement=update_sql)
db.commit_db_con()
log.info(db.activity_log)
if counter_failures:
log.info(f"{table :<25} Count failed: {counter_failures}")
print("time async: %.2f" % (time.time() - st))
return results
Failed attempt with Queue: (blocks at await client.service)
loop = asyncio.get_event_loop()
counter = 0
results = []
async def payload_generator(db_variant: str, order_nrs: set):
# code that generates the data for the request
yield counter, row, payload
async def service_call_worker(queue, results):
while True:
counter, row, payload = await queue.get()
results.append(await client.service.myServicename(
    MessageHeader=calculate_message_header(row=row),
    myPayload=payload,
    _soapheaders=[security_soap_header],
))
print(colorama.Fore.BLUE + f'after result returned {counter}')
# Here do the relevant processing of response or error
queue.task_done()
async def main_with_q():
n_workers = 3
queue = asyncio.Queue(n_workers)
e = pprint.pformat(queue)
p = payload_generator(DB_VARIANT, order_list_from_args())
results = []
workers = [asyncio.create_task(service_call_worker(queue, results))
for _ in range(n_workers)]
async for c in p:
await queue.put(c)
await queue.join() # wait for all tasks to be processed
for worker in workers:
worker.cancel()
if __name__ == '__main__':
try:
loop.run_until_complete(main_with_q())
loop.run_until_complete(transport.session.close())
finally:
loop.close()
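For what it's worth, the shape of solution I am imagining is something like the minimal sketch below, bounding concurrency with a semaphore (do_request here is only a placeholder for the real zeep call, and note that this still creates one Task per row, so it limits concurrent calls but not the number of pending Task objects):
import asyncio

async def bounded_call(semaphore, payload):
    # do_request is a hypothetical stand-in for the real zeep service call
    async with semaphore:
        return await do_request(payload)

async def run_all(payloads, max_concurrent=5):
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [asyncio.ensure_future(bounded_call(semaphore, p)) for p in payloads]
    # gather() keeps the results in submission order, which is what I rely on
    return await asyncio.gather(*tasks, return_exceptions=True)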
I am trying to use a MPU-6000 accelerometer and Raspberry Pi Zero W to log vibration data in a windshield. I'm fairly new to Python so please bear with me.
I've written a Python 2 script that configures the MPU-6000 to communicate over I2C, with the clock configured to 400 kHz.
The MPU-6000 gives an interrupt when there is new data available in the accelerometer registers, which is read, converted to 2's complement and then written to a CSV file together with a timestamp. The output rate of the accelerometer is configured to be 1 kHz.
I'm experiencing that when sampling all three sensor axes, the script isn't able to write all the data points to the CSV file. Instead of 1000 data points per axis per second, I get approximately 650 data points per axis per second.
I've tried writing only one axis, which proved successful with 1000 data points per second. I know that the MPU-6000 has a FIFO register available, which I could probably burst-read to get 1000 samples/s without any problem. The problem would be obtaining a timestamp for each sample, so I haven't tried to implement reading from the FIFO register yet.
I will most likely do most of the post-processing in Matlab, so the most important thing the Python script should do is write the sensor data, in any form, to a CSV file at the determined rate, with a timestamp.
Is there any way to further improve my Python script, so I can sample all three axes and write to a CSV file at a 1 kHz rate?
Parts of my script are shown below:
#!/usr/bin/python
import smbus
import math
import csv
import time
import sys
import datetime
# Register addresses
power_mgmt_1 = 0x6b
power_mgmt_2 = 0x6c
samlerate_divider = 0x19
accel_config = 0x1C
INT_Enable = 0x38
def read_byte(reg):
return bus.read_byte_data(address, reg)
def read_word(reg):
h = bus.read_byte_data(address, reg)
l = bus.read_byte_data(address, reg+1)
value = (h <<8)+l
return value
def read_word_2c(reg):
val = read_word(reg)
if (val >= 0x8000):
return -((65535 - val) + 1)
else:
return val
csvwriter = None
def csv_open():
    global csvwriter
    csvfile = open('accel-data.csv', 'a')
    csvwriter = csv.writer(csvfile)
def csv_write(timedelta, accelerometerx, accelerometery, accelerometerz):
global csvwriter
csvwriter.writerow([timedelta, accelerometerx, accelerometery,
accelerometerz])
# I2C configs
bus = smbus.SMBus(1)
address = 0x69
#Power management configurations
bus.write_byte_data(address, power_mgmt_1, 0)
bus.write_byte_data(address, power_mgmt_2, 0x00)
#Configure sample-rate divider
bus.write_byte_data(address, 0x19, 0x07)
#Configure data ready interrupt:
bus.write_byte_data(address,INT_Enable, 0x01)
#Opening csv file and getting ready for writing
csv_open()
csv_write('Time', 'X_Axis', 'Y_Axis', 'Z_Axis')
print
print "Accelerometer"
print "---------------------"
print "Printing acccelerometer data: "
#starttime = datetime.datetime.now()
while True:
data_interrupt_read = bus.read_byte_data(address, 0x3A)
if data_interrupt_read == 1:
meas_time = datetime.datetime.now()
# delta_time = meas_time - starttime
accelerometer_xout = read_word_2c(0x3b)
accelerometer_yout = read_word_2c(0x3d)
accelerometer_zout = read_word_2c(0x3f)
# accelerometer_xout = read_word(0x3b)
# accelerometer_yout = read_word(0x3d)
# accelerometer_zout = read_word(0x3f)
# accelerometer_xout_scaled = accelerometer_xout / 16384.0
# accelerometer_yout_scaled = accelerometer_yout / 16384.0
# accelerometer_zout_scaled = accelerometer_zout / 16384.0
# csv_write(meas_time, accelerometer_xout_scaled,
#           accelerometer_yout_scaled, accelerometer_zout_scaled)
csv_write(meas_time, accelerometer_xout, accelerometer_yout,
accelerometer_zout)
continue
If the data you are trying to write is continuous, then the best approach is to minimise the amount of processing needed to write it and also to minimise the amount of data being written. A good way to do this is to write the raw data to a binary-formatted file. Each data word then only requires 2 bytes to be written, and the datetime object can be converted into a timestamp, which needs 4 bytes. So you would use a format such as:
[4 byte timestamp][2 byte x][2 byte y][2 byte z]
Python's struct library can be used to convert multiple variables into a single binary string which can be written to a file. The data appears to be signed; if that is the case, you could try writing each word as-is and then using the library's built-in support for signed values to read it back in later.
For example, the following could be used to write the raw data to a binary file:
#!/usr/bin/python
import smbus
import math
import csv
import time
import sys
import datetime
import struct
# Register addresses
power_mgmt_1 = 0x6b
power_mgmt_2 = 0x6c
samlerate_divider = 0x19
accel_config = 0x1C
INT_Enable = 0x38
def read_byte(reg):
return bus.read_byte_data(address, reg)
def read_word(reg):
h = bus.read_byte_data(address, reg)
l = bus.read_byte_data(address, reg+1)
value = (h <<8)+l
return value
# I2C configs
bus = smbus.SMBus(1)
address = 0x69
#Power management configurations
bus.write_byte_data(address, power_mgmt_1, 0)
bus.write_byte_data(address, power_mgmt_2, 0x00)
#Configure sample-rate divider
bus.write_byte_data(address, 0x19, 0x07)
#Configure data ready interrupt:
bus.write_byte_data(address, INT_Enable, 0x01)
print
print "Accelerometer"
print "---------------------"
print "Printing accelerometer data: "
#starttime = datetime.datetime.now()
bin_format = 'L3H'
with open('accel-data.bin', 'ab') as f_output:
while True:
#data_interrupt_read = bus.read_byte_data(address, 0x3A)
data_interrupt_read = 1
if data_interrupt_read == 1:
meas_time = datetime.datetime.now()
timestamp = time.mktime(meas_time.timetuple())
accelerometer_xout = read_word(0x3b)
accelerometer_yout = read_word(0x3d)
accelerometer_zout = read_word(0x3f)
f_output.write(struct.pack(bin_format, timestamp, accelerometer_xout, accelerometer_yout, accelerometer_zout))
Then, later on, you could convert the binary file to a CSV file using:
from datetime import datetime
import csv
import struct
bin_format = 'L3h' # Read data as signed words
entry_size = struct.calcsize(bin_format)
with open('accel-data.bin', 'rb') as f_input, open('accel-data.csv', 'wb') as f_output:
csv_output = csv.writer(f_output)
csv_output.writerow(['Time', 'X_Axis', 'Y_Axis', 'Z_Axis'])
while True:
bin_entry = f_input.read(entry_size)
if len(bin_entry) < entry_size:
break
entry = list(struct.unpack(bin_format, bin_entry))
entry[0] = datetime.fromtimestamp(entry[0]).strftime('%Y-%m-%d %H:%M:%S')
csv_output.writerow(entry)
If your data collection is not continuous, you could make use of threads. One thread would read your data into a queue, and another thread would read items out of the queue and write them to disk.
If the collection is continuous, this approach will fail whenever writing the data is slower than reading it.
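For that non-continuous case, a minimal sketch of the producer/consumer split might look like the following (the names here are only illustrative, and it assumes the sampling loop packs each reading with struct as above):
import threading
import Queue  # 'queue' on Python 3

sample_queue = Queue.Queue()

def disk_writer():
    # Drain packed samples from the queue and append them to the binary log
    with open('accel-data.bin', 'ab') as f_output:
        while True:
            item = sample_queue.get()
            if item is None:   # sentinel: stop the writer thread
                break
            f_output.write(item)

writer = threading.Thread(target=disk_writer)
writer.start()
# In the sampling loop, replace the direct write with:
#   sample_queue.put(struct.pack(bin_format, timestamp, x, y, z))
# and when sampling is finished:
#   sample_queue.put(None); writer.join()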
Take a look at the special Format characters used to tell struct how to pack and unpack the binary data.
I've read the Google documentation and looked at their examples, however I have not managed to get this working correctly in my particular use case. The problem is that the packets of the audio stream are broken up into smaller chunks (frame size), base64 encoded and sent over MQTT - meaning that the generator approach is likely to stop part way through, despite the stream not being fully completed by the sender. My MicrophoneSender component will send the final part of the message with segment_key = -1, so this is the flag that the complete message has been sent and that a full/final processing of the stream can be done. Prior to that point the buffer may not hold the complete stream, so it's difficult to get either a) the generator to stop yielding, or b) Google to return a partial transcription. A partial transcription is required once every 10 or so frames.
To illustrate this better here is my code.
inside receiver:
STREAMFRAMETHRESHOLD = 10
def mqttMsgCallback(self, client, userData, msg):
if msg.topic.startswith("MicSender/stream"):
msgDict = json.loads(msg.payload)
streamBytes = b64decode(msgDict['audio_data'].encode('utf-8'))
frameNum = int(msgDict['segment_num'])
if frameNum == 0:
self.asr_time_start = time.time()
self.asr.endOfStream = False
if frameNum >= 0:
self.asr.store_stream_bytes(streamBytes)
self.asr.endOfStream = False
if frameNum % STREAMFRAMETHRESHOLD == 0:
self.asr.get_intermediate_and_print()
else:
#FINAL, received -1
trans = self.asr.finish_stream()
self.send_message(trans)
self.frameCount=0
inside Google Speech Class implementation:
class GoogleASR(ASR):
def __init__(self, name):
super().__init__(name)
# STREAMING
self.stream_buf = queue.Queue()
self.stream_gen = self.getGenerator(self.stream_buf)
self.endOfStream = True
self.requests = (types.StreamingRecognizeRequest(audio_content=chunk) for chunk in self.stream_gen)
self.streaming_config = types.StreamingRecognitionConfig(config=self.config)
self.current_transcript = ''
self.numCharsPrinted = 0
def getGenerator(self, buff):
while not self.endOfStream:
# Use a blocking get() to ensure there's at least one chunk of
# data, and stop iteration if the chunk is None, indicating the
# end of the audio stream.
chunk = buff.get()
if chunk is None:
return
data = [chunk]
# Now consume whatever other data's still buffered.
while True:
try:
chunk = buff.get(block=False)
data.append(chunk)
except queue.Empty:
self.endOfStream = True
yield b''.join(data)
break
yield b''.join(data)
def store_stream_bytes(self, bytes):
self.stream_buf.put(bytes)
def get_intermediate_and_print(self):
self.get_intermediate()
def get_intermediate(self):
if self.stream_buf.qsize() > 1:
print("stream buf size: {}".format(self.stream_buf.qsize()))
responses = self.client.streaming_recognize(self.streaming_config, self.requests)
# print(responses)
try:
# Now, put the transcription responses to use.
if not self.numCharsPrinted:
self.numCharsPrinted = 0
for response in responses:
if not response.results:
continue
# The `results` list is consecutive. For streaming, we only care about
# the first result being considered, since once it's `is_final`, it
# moves on to considering the next utterance.
result = response.results[0]
if not result.alternatives:
continue
# Display the transcription of the top alternative.
self.current_transcript = result.alternatives[0].transcript
# Display interim results, but with a carriage return at the end of the
# line, so subsequent lines will overwrite them.
#
# If the previous result was longer than this one, we need to print
# some extra spaces to overwrite the previous result
overwrite_chars = ' ' * (self.numCharsPrinted - len(self.current_transcript))
sys.stdout.write(self.current_transcript + overwrite_chars + '\r')
sys.stdout.flush()
self.numCharsPrinted = len(self.current_transcript)
def finish_stream(self):
self.endOfStream = False
self.get_intermediate()
self.endOfStream = True
final_result = self.current_transcript
self.stream_buf= queue.Queue()
self.allBytes = bytearray()
self.current_transcript = ''
self.requests = (types.StreamingRecognizeRequest(audio_content=chunk) for chunk in self.stream_gen)
self.streaming_config = types.StreamingRecognitionConfig(config=self.config)
return final_result
Currently what this does is output nothing from the transcription side.
stream buf size: 21
stream buf size: 41
stream buf size: 61
stream buf size: 81
stream buf size: 101
stream buf size: 121
stream buf size: 141
stream buf size: 159
But the response/transcript is empty. If I put a breakpoint on the for response in responses line inside the get_intermediate function, it never runs, which means that for some reason responses is empty (nothing is returned from Google). However, if I put a breakpoint on the generator and take too long (> 5 seconds) to continue yielding the data, Google tells me that the data is probably being sent to the server too slowly: google.api_core.exceptions.OutOfRange: 400 Audio data is being streamed too slow. Please stream audio data approximately at real time.
Maybe someone can spot the obvious here...
The way you have organized your code, the generator you give to the Google API is initialized exactly once - in __init__, using a generator expression: self.requests = (...). As constructed, this generator will also run exactly once and become 'exhausted'. The same applies to the generator function that it pulls from (self.getGenerator()): it will run only once and stop when it has retrieved 10 chunks of data (which are very small, from what I can see). Then the outer generator (what you assigned to self.requests) will also stop forever, giving the ASR only a short bit of data (10 times 20 bytes, looking at the printed debug output). There's nothing recognizable in that, most likely.
By the way, note that you have a redundant yield b''.join(data) in your generator function; the data will be sent twice.
You will need to redo the (outer) generator so that it does not return until all of the data has been received. If you want to keep using an inner generator, as you do now, to gather each bigger chunk for the 'outer' generator from which the Google API is reading, you will need to re-create it every time you begin a new loop with it.
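A rough sketch of that idea (illustrative only, not a drop-in replacement for your class) is to build a fresh request generator for every streaming_recognize call and only end it when the real end-of-stream sentinel arrives, for example a None pushed into the buffer when the final segment (segment_num == -1) is received:
def make_requests(self):
    # Build a brand new generator for every streaming_recognize call.
    def gen():
        while True:
            chunk = self.stream_buf.get()
            if chunk is None:   # sentinel pushed when the final segment arrives
                return
            yield types.StreamingRecognizeRequest(audio_content=chunk)
    return gen()

# ...and for each new recognition pass:
# responses = self.client.streaming_recognize(self.streaming_config, self.make_requests())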