This is my first time asking something here, so I hope I am asking the following question the "correct way". If not, please let me know and I will give more information.
I am using one Python script to read serial data at 4000 Hz and write it to a CSV file.
The structure of the CSV file is as follows: (this example shows the beginning of the file)
Time of mSure Calibration: 24.10.2020 20:03:14.462654
Calibration Data - AICC: 833.95; AICERT: 2109; AVCC: 0.00; AVCERT: 0
Sampling Frequency: 4000Hz
timestamp,instantaneousCurrentValue,instantaneousVoltageValue,activePowerValueCalculated,activePowerValue
24.10.2020 20:03:16.495828,-0.00032,7e-05,-0.0,0.0
24.10.2020 20:03:16.496078,0.001424,7e-05,0.0,0.0
24.10.2020 20:03:16.496328,9.6e-05,7e-05,0.0,0.0
24.10.2020 20:03:16.496578,-0.000912,7e-05,-0.0,0.0
Data will be written to this CSV as long as the script reading the serial data is active, so the file might become huge over time. (Data is written in chunks of 8000 rows, i.e. every two seconds.)
Here is my problem: I want to plot this data live, for example updating the plot each time data is written to the CSV file. The plotting shall be done from a different script than the one reading and writing the serial data.
What is working: 1. Creating the CSV file. 2. Plotting a finished CSV file using another script - actually pretty well :-)
I have this script for plotting:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Data Computation Software for TeensyDAQ - Reads and computes CSV-File"""

# region imports
import getopt
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import pathlib
from scipy.signal import argrelextrema
import sys
# endregion

# region globals
inputfile = ''
outputfile = ''
# endregion

# region functions
def main(argv):
    """Main application"""
    # region define variables
    global inputfile
    global outputfile
    inputfile = str(pathlib.Path(__file__).parent.absolute().resolve()) + "\\noFilenameProvided.csv"
    outputfile = str(pathlib.Path(__file__).parent.absolute().resolve()) + "\\noFilenameProvidedOut.csv"
    # endregion

    # region read system arguments
    try:
        opts, args = getopt.getopt(
            argv, "hi:o:", ["infile=", "outfile="])
    except getopt.GetoptError:
        print('dataComputation.py -i <inputfile> -o <outputfile>')
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print('dataComputation.py -i <inputfile> -o <outputfile>')
            sys.exit()
        elif opt in ("-i", "--infile"):
            inputfile = str(pathlib.Path(__file__).parent.absolute().resolve()) + "\\" + arg
        elif opt in ("-o", "--outfile"):
            outputfile = str(pathlib.Path(__file__).parent.absolute().resolve()) + "\\" + arg
    # endregion

    # region read csv
    colTypes = {'timestamp': 'str',
                'instantaneousCurrent': 'float',
                'instantaneousVoltage': 'float',
                'activePowerCalculated': 'float',
                'activePower': 'float',
                'apparentPower': 'float',
                'fundReactivePower': 'float'
                }
    cols = list(colTypes.keys())
    df = pd.read_csv(inputfile, usecols=cols, dtype=colTypes,
                     parse_dates=True, dayfirst=True, skiprows=3)
    df['timestamp'] = pd.to_datetime(
        df['timestamp'], utc=True, format='%d.%m.%Y %H:%M:%S.%f')
    df.insert(loc=0, column='tick', value=np.arange(len(df)))
    # endregion

    # region plot data
    fig, axes = plt.subplots(nrows=6, ncols=1, sharex=True, figsize=(16, 8))
    fig.canvas.set_window_title(df['timestamp'].iloc[0])
    fig.align_ylabels(axes[0:5])

    df['instantaneousCurrent'].plot(ax=axes[0], color='red'); axes[0].set_title('Momentanstrom'); axes[0].set_ylabel('A', rotation=0)
    df['instantaneousVoltage'].plot(ax=axes[1], color='blue'); axes[1].set_title('Momentanspannung'); axes[1].set_ylabel('V', rotation=0)
    df['activePowerCalculated'].plot(ax=axes[2], color='green'); axes[2].set_title('Momentanleistung ungefiltert'); axes[2].set_ylabel('W', rotation=0)
    df['activePower'].plot(ax=axes[3], color='brown'); axes[3].set_title('Momentanleistung'); axes[3].set_ylabel('W', rotation=0)
    df['apparentPower'].plot(ax=axes[4], color='brown'); axes[4].set_title('Scheinleistung'); axes[4].set_ylabel('VA', rotation=0)
    df['fundReactivePower'].plot(ax=axes[5], color='brown'); axes[5].set_title('Blindleistung'); axes[5].set_ylabel('VAr', rotation=0); axes[5].set_xlabel('microseconds since start')

    plt.tight_layout()
    plt.show()
    # endregion
# endregion

if __name__ == "__main__":
    main(sys.argv[1:])
My thoughts on how to solve my problem:
Modify my plotting script to continuously re-read the CSV file and update the plot using matplotlib's animation functionality (a rough sketch of this idea follows below).
Use some sort of streaming functionality to read the CSV as a stream. I have read about the streamz library, but I have no idea how I could use it.
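To make the first idea more concrete, here is a rough, untested sketch of what I have in mind. It simply re-reads the whole CSV on every animation tick; the file name and the three skipped header lines are taken from the example above, and the performance on a large file is exactly what I am unsure about:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Untested sketch: live plot by periodically re-reading the growing CSV."""
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import pandas as pd

CSVFILE = "noFilenameProvided.csv"  # the file the logging script writes to

fig, ax = plt.subplots()

def update(frame):
    # Re-read the whole file on every tick; with 4000 new rows per second
    # this will become slow over time.
    df = pd.read_csv(CSVFILE, skiprows=3)
    ax.clear()
    df['instantaneousCurrentValue'].plot(ax=ax, color='red')
    ax.set_ylabel('A', rotation=0)

ani = animation.FuncAnimation(fig, update, interval=2000)  # refresh every 2 s
plt.show()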
Any help is highly appreciated!
Kind regards,
Sascha
EDIT 31.10.2020:
Since I do not know how long one usually waits for help here, I will add more input, which may lead to helpful comments.
I wrote the following script, which emulates my real script without the need for external hardware: random data is produced and CSV-formatted using a timer, and each time there are 50 new rows, the data is written to the CSV file.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import csv
from random import randrange
import time
import threading
import pathlib
from datetime import datetime, timedelta

datarows = list()
datarowsToWrite = list()
outputfile = str(pathlib.Path(__file__).parent.absolute().resolve()) + "\\noFilenameProvided.csv"
sampleCount = 0

def startBatchWriteThread():
    global outputfile
    global datarows
    global datarowsToWrite
    datarowsToWrite.clear()
    datarowsToWrite = datarows[:]
    datarows.clear()
    thread = threading.Thread(target=batchWriteData, args=(outputfile, datarowsToWrite))
    thread.start()

def batchWriteData(file, data):
    print("Items to write: " + str(len(data)))
    with open(file, 'a+') as f:
        for item in data:
            f.write("%s\n" % item)

def generateDatarows():
    global sampleCount
    timer1 = threading.Timer(0.001, generateDatarows)
    timer1.daemon = True
    timer1.start()
    datarow = datetime.now().strftime("%d.%m.%Y %H:%M:%S.%f")[:] + "," + str(randrange(10)) + "," + str(randrange(10)) + "," + str(randrange(10)) + "," + str(randrange(10)) + "," + str(randrange(10)) + "," + str(randrange(10))
    datarows.append(datarow)
    sampleCount += 1

try:
    datarows.append("row 1")
    datarows.append("row 2")
    datarows.append("row 3")
    datarows.append("timestamp,instantaneousCurrent,instantaneousVoltage,activePowerCalculated,activePower,apparentPower,fundReactivePower")
    startBatchWriteThread()
    generateDatarows()
    while True:
        if len(datarows) == 50:
            startBatchWriteThread()
except KeyboardInterrupt:
    print("Shutting down, writing the rest of the buffer.")
    batchWriteData(outputfile, datarows)
    print("Done, writing " + outputfile)
The script from my initial post can then plot the data from the CSV file.
I need to plot the data as it is written to the CSV file to see the data more or less live.
Hope this makes my problem more understandable.
For the Googlers: I could not find a way to achieve my goal as described in the question.
However, if you are trying to plot live data arriving at high speed over serial comms (4000 Hz in my case), I recommend designing your application as a single program with multiple processes.
The problem in my particular case was that when I tried to plot and compute the incoming data simultaneously in the same thread/task/process, my serial receive rate went down to 100 Hz instead of 4 kHz. By using multiprocessing and passing data between the processes with the quick_queue module, I could resolve the problem.
I ended up with a program that receives data from a Teensy via serial communication at 4 kHz; the incoming data is buffered into blocks of 4000 samples, each block is pushed to the plotting process, and additionally each block is written to a CSV file in a separate thread.
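To illustrate the structure: this is not my actual code, just a rough skeleton that uses a standard multiprocessing.Queue instead of quick_queue and a dummy data source instead of the real serial reader.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Rough skeleton of the final design: acquisition process -> queue -> plotting process,
with CSV writing done in a background thread of the acquisition process."""
import csv
import threading
from multiprocessing import Process, Queue

BLOCKSIZE = 4000  # one block per second at 4 kHz

def write_block(block):
    # runs in a background thread so the acquisition loop is never blocked by disk I/O
    with open("data.csv", "a", newline="") as f:
        csv.writer(f).writerows(block)

def acquisition_process(q):
    block = []
    sample = 0
    while True:
        block.append((sample, 0.0, 0.0))  # replace with the real serial read
        sample += 1
        if len(block) == BLOCKSIZE:
            q.put(block)  # hand the block to the plotting process
            threading.Thread(target=write_block, args=(block,)).start()
            block = []

def plotting_process(q):
    while True:
        block = q.get()
        # update the matplotlib figure with `block` here

if __name__ == "__main__":
    q = Queue()
    Process(target=plotting_process, args=(q,), daemon=True).start()
    acquisition_process(q)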
Best,
S
Related
I'm continuously getting readings from an ADC in Python, but during the process of writing them to a file, I lose some samples because there is a small delay. Is there a way I could avoid losing these samples (I'm sampling at 100 Hz)?
I'm using multithreading, but in the process of writing and cleaning the list used to write the data to a file, I always lose some samples. The code is copied here as I have written it and all advice is welcome.
Thanks in advance.
import threading
import time
from random import randint
import os
from datetime import datetime
import ADS1256
import RPi.GPIO as GPIO
import sys
import csv

ADC = ADS1256.ADS1256()
ADC.ADS1256_init()

value_list = []

# adc_reading function reads adc values and writes a list continuously.
def adc_reading():
    global value_list
    value_list = []
    while True:
        adc_value = ADC.ADS1256_GetAll()
        timestamp = time.time()
        x = adc_value[1]
        y = adc_value[2]
        z = adc_value[3]
        value_list.append([timestamp, x, y, z])

# function to create a new file every 60 seconds with the values gathered in adc_reading()
def cronometro():
    global value_list
    while True:
        contador = 60
        inicio = time.time()
        diferencia = 0
        while diferencia <= contador:
            diferencia = time.time() - inicio
        write_to_file(value_list)

# write_to_file() function writes the values gathered in adc_reading() to a file every 60 seconds.
def write_to_file(lista):
    nombre_archivo = str(int(time.time())) + ".finish"
    with open(nombre_archivo, 'w') as f:
        # using csv.writer method from CSV package
        write = csv.writer(f)
        write.writerows(lista)
    value_list = []

escritor = threading.Thread(target=adc_reading)
temporizador = threading.Thread(target=cronometro)
escritor.start()
temporizador.start()
At 100 Hz, I have to wonder if the write operation really takes longer than 10 ms. You could probably do both operations in the same loop: just collect data in a buffer and write it (about 6000 values) once every 60 seconds without incurring more than a few milliseconds of delay:
import time
import ADS1256
import csv

ADC = ADS1256.ADS1256()
ADC.ADS1256_init()

def adc_reading():
    buffer = []
    contador = 60
    while True:
        check = inicio = time.time()
        while check - inicio <= contador:
            adc_value = ADC.ADS1256_GetAll()
            buffer.append([(check := time.time()), *adc_value[1:4]])
        nombre_archivo = str(int(check)) + ".finish"
        with open(nombre_archivo, 'w') as f:
            write = csv.writer(f)
            write.writerows(buffer)
        buffer = []

if __name__ == '__main__':
    adc_reading()
If you do need them to run in parallel (slow computer, other circumstances), you shouldn't use threads, but processes from multiprocessing.
The two threads won't run in parallel; they will alternate. You could run the data collection in a separate process and collect data from it in the main process.
Here's an example of doing this with some toy code; I think it's easy to see how to adjust it for your case:
from multiprocessing import SimpleQueue, Process
from random import randint
from time import sleep, time

def generate_signals(q: SimpleQueue):
    c = 0
    while True:
        sleep(0.01)  # about 100 Hz
        q.put((c, randint(1, 42)))
        c += 1

def write_signals(q: SimpleQueue):
    delay = 3  # 3 seconds for demo, 60 works as well
    while True:
        start = time()
        while (check := time()) - start < delay:
            sleep(.1)
        values = []
        while not q.empty():
            values.append(str(q.get()))
        with open(f'{str(int(check))}.finish', 'w') as f:
            f.write('\n'.join(values))

if __name__ == "__main__":
    q = SimpleQueue()
    generator = Process(target=generate_signals, args=((q),))
    generator.start()
    writer = Process(target=write_signals, args=((q),))
    writer.start()
    writer.join(timeout=10)  # run for no more than 10 seconds, enough for demo
    writer.kill()
    generator.join(timeout=0)
    generator.kill()
Edit: added a counter to show that no values are missed.
I have created a program which reads sensor data. As per my use case, I have to store the readings in a new text file every 5 seconds, with new file names (for example file1, file2, file3 and so on), until I stop the program from the keyboard. Can anyone guide me on how to accomplish this task?
import sys
import time
import board
import digitalio
import busio
import csv
import adafruit_lis3dh
from datetime import datetime, timezone

i2c = busio.I2C(board.SCL, board.SDA)
int1 = digitalio.DigitalInOut(board.D6)  # Set this to the correct pin for the interrupt!
lis3dh = adafruit_lis3dh.LIS3DH_I2C(i2c, int1=int1)
lis3dh.range = adafruit_lis3dh.RANGE_2_G

with open("/media/pi/D427-7B2E/test.txt", 'w') as f:
    sys.stdout = f
    while True:
        ti = int(datetime.now(tz=timezone.utc).timestamp() * 1000)
        x, y, z = lis3dh.acceleration
        print('{}, {}, {}, {}'.format(ti, x / 9.806, y / 9.806, z / 9.806))
        time.sleep(0.001)
Can you try logging?
Have a look at TimedRotatingFileHandler; maybe this could help.
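A minimal sketch of that idea (the file name and the sensor reading are placeholders; when='S' with interval=5 rotates every 5 seconds):
import logging
import time
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger("SensorLog")
logger.setLevel(logging.INFO)
# rotate to a new file every 5 seconds; backupCount limits how many old files are kept
handler = TimedRotatingFileHandler("sensor.txt", when="S", interval=5, backupCount=100)
logger.addHandler(handler)

while True:
    # replace the zeros with the real lis3dh.acceleration reading from the question
    logger.info("%d, %f, %f, %f", int(time.time() * 1000), 0.0, 0.0, 0.0)
    time.sleep(0.001)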
Can you look into threading.Timer? Use it recursively, set the timer interval as you wish, and call the function at program start. In that function you can either update the file name in an array/dict so that it is reflected in the rest of the program, or you can write the contents of the array into a file and empty the array.
def write_it():
    threading.Timer(300, write_it).start()

write_it()
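Fleshed out a little, this could look like the following sketch (the 5-second interval, the file naming and the shared buffer are assumptions; the sensor loop would append rows to `buffer` elsewhere):
import threading
import time

buffer = []                 # filled with rows by the sensor loop elsewhere
buffer_lock = threading.Lock()

def write_it():
    # re-arm the timer first so the next rotation is not delayed by the file write
    t = threading.Timer(5, write_it)
    t.daemon = True
    t.start()
    with buffer_lock:
        rows = buffer[:]
        buffer.clear()
    filename = "file_{}.txt".format(int(time.time()))
    with open(filename, "w") as f:
        f.writelines(row + "\n" for row in rows)

write_it()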
I solved it using RotatingFileHandler, which is a Python logging class. Our scope changed from 5 seconds to one hour, so I ran the program for 1 hour, got a file of 135 MB, converted that size to bytes and used it for the maxBytes argument. What happens now in this code is that once the file size reaches the number of bytes given in the code, it creates another file and starts storing the values in it, and it keeps doing the same for 5 files. I hope this helps other people with a similar kind of use case.
import logging
import time
from datetime import datetime, timezone
from logging.handlers import RotatingFileHandler

# lis3dh is set up as in the question above

logger = logging.getLogger("Rotating Log")
logger.setLevel(logging.INFO)
handler = RotatingFileHandler('/home/pi/VibrationSensor/test.txt', maxBytes=20, backupCount=5)
logger.addHandler(handler)

while True:
    ti = int(datetime.now(tz=timezone.utc).timestamp() * 1000)
    x, y, z = lis3dh.acceleration
    logger.info('{}, {}, {}, {}'.format(ti, x / 9.806, y / 9.806, z / 9.806))
    time.sleep(0.001)
I have an acceleration sensor which continuously outputs readings at 400 Hz (like [0.21511 0.1451 0.2122]). I want to store them and post-process them. Right now I am only able to store the first reading, not all of them.
How can I make this happen?
Thanks.
from altimu10v5.lsm6ds33 import LSM6DS33
from time import sleep
import numpy as np

lsm6ds33 = LSM6DS33()
lsm6ds33.enable()

accel = lsm6ds33.get_accelerometer_g_forces()

while True:
    DataOut = np.column_stack(accel)
    np.savetxt('output.dat', np.expand_dims(accel, axis=0), fmt='%2.2f %2.2f %2.2f')
    sleep(1)
The actual problem is that you are calling get_accelerometer_g_forces() only once.
Just move it inside the while loop.
Updated:
while True:
    accel = lsm6ds33.get_accelerometer_g_forces()
    f = open('output.dat', 'ab')
    DataOut = np.column_stack(accel)
    np.savetxt(f, np.expand_dims(accel, axis=0), fmt='%2.2f %2.2f %2.2f')
    f.close()
    sleep(1)
Here is a reference: How to write a numpy array to a csv file?
Make sure that reading the data is enclosed within the loop!
You don't need numpy here yet:
while True:
    with open("output.dat", "w") as f:
        f.write("%.5f, %.5f, %.5f" % tuple(lsm6ds33.get_accelerometer_g_forces()))
Note that there is no condition to stop outputting the data.
My program first clusters a big dataset into 100 clusters, then runs a model on each cluster of the dataset using multiprocessing. My goal is to concatenate all the output values into one big CSV file, i.e. the concatenation of the outputs of all 100 fitted models.
For now, I am just creating 100 CSV files, then looping over the folder containing these files and copying them one by one, line by line, into a big file.
My question: is there a smarter method to get this big output file without exporting 100 files? I use pandas and scikit-learn for data processing, and multiprocessing for parallelization.
Have your worker processes return the dataset to the main process rather than writing the CSV files themselves; then, as they give data back to your main process, have it write them to one continuous CSV.
from multiprocessing import Process, Manager

def worker_func(proc_id, results):
    # Do your thing
    results[proc_id] = ["your dataset from %s" % proc_id]

def convert_dataset_to_csv(dataset):
    # Placeholder example. I realize what its doing is ridiculous
    converted_dataset = [','.join(data.split()) for data in dataset]
    return converted_dataset

m = Manager()
d_results = m.dict()
worker_count = 100
jobs = [Process(target=worker_func,
                args=(proc_id, d_results))
        for proc_id in range(worker_count)]

for j in jobs:
    j.start()
for j in jobs:
    j.join()

with open('somecsv.csv', 'w') as f:
    for d in d_results.values():
        # if the actual conversion function benefits from multiprocessing,
        # you can do that there too instead of here
        for r in convert_dataset_to_csv(d):
            f.write(r + '\n')
If all of your partial csv files have no headers and share column number and order, you can concatenate them like this:
with open("unified.csv", "w") as unified_csv_file:
    for partial_csv_name in partial_csv_names:
        with open(partial_csv_name) as partial_csv_file:
            unified_csv_file.write(partial_csv_file.read())
Pinched the guts of this from http://computer-programming-forum.com/56-python/b7650ebd401d958c.htm; it's a gem.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from glob import glob

n = 1
file_list = glob('/home/rolf/*.csv')
concat_file = open('concatenated.csv', 'w')
files = [open(f, 'r').read for f in file_list]
print("There are {x} files to be concatenated".format(x=len(files)))
for f in files:
    print("files added {n}".format(n=n))
    concat_file.write(f())
    n += 1
concat_file.close()
There are several similar questions but none of them answers this simple question directly:
How can I catch a command's output and stream that content into numpy arrays without creating a temporary string object to read from?
So, what I would like to do is this:
import subprocess
import numpy
import StringIO

def parse_header(fileobject):
    # this function moves the filepointer and returns a dictionary
    d = do_some_parsing(fileobject)
    return d

sio = StringIO.StringIO(subprocess.check_output(cmd))
d = parse_header(sio)
# now the file pointer is at the start of data, parse_header takes care of that.
# ALL of the data is now available in the next line of sio
dt = numpy.dtype([(key, 'f8') for key in d.keys()])

# I don't know how to make this work:
data = numpy.fromxxxx(sio, dt)

# if I did this, I would create another copy besides the StringIO object, wouldn't I?
# so this works, but isn't this 'bad'?
datastring = sio.read()
data = numpy.fromstring(datastring, dtype=dt)
I tried it with StringIO and cStringIO but both are not accepted by numpy.frombuffer and numpy.fromfile.
Using a StringIO object, I first have to read the stream into a string and then use numpy.fromstring, but I would like to avoid creating the intermediate object (several gigabytes).
An alternative for me would be if I can stream sys.stdin into numpy arrays, but that does not work with numpy.fromfile either (seek needs to be implemented).
Are there any work-arounds for this? I can't be the first one trying this (unless this is a PEBKAC case?)
Solution:
This is the current solution; it's a mix of unutbu's instructions on how to use Popen with PIPE and eryksun's hint to use bytearray, so I don't know whom to accept!? :S
import subprocess as sp
import numpy as np

proc = sp.Popen(cmd, stdout=sp.PIPE, shell=True)
d = parse_des_header(proc.stdout)
rec_dtype = np.dtype([(key, 'f8') for key in d.keys()])
data = bytearray(proc.stdout.read())
ndata = np.frombuffer(data, dtype=rec_dtype)
I didn't check whether the data really avoids another copy; I don't know how. But I noticed that this works much faster than everything I tried before, so many thanks to both answers' authors!
Update 2022:
I just tried the above solution without the bytearray() step and it works fine. Thanks to Python 3, I guess?
You can use Popen with stdout=subprocess.PIPE. Read in the header, then load the rest into a bytearray to use with np.frombuffer.
Additional comments based on your edit:
If you're going to call proc.stdout.read(), it's equivalent to using check_output(). Both create a temporary string. If you preallocate data, you could use proc.stdout.readinto(data). Then if the number of bytes read into data is less than len(data), free the excess memory, else extend data by whatever is left to be read.
data = bytearray(2**32)  # 4 GiB
n = proc.stdout.readinto(data)
if n < len(data):
    data[n:] = b''
else:
    data += proc.stdout.read()
You could also come at this starting with a pre-allocated ndarray ndata and use buf = np.getbuffer(ndata). Then readinto(buf) as above.
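A sketch of that pre-allocation idea; note that np.getbuffer only exists on Python 2, so on Python 3 one would pass the array itself, since an ndarray already satisfies the writable buffer protocol (command name, dtype and record count below are placeholders):
import subprocess as sp
import numpy as np

rec_dtype = np.dtype([('a', 'f8'), ('b', 'f8')])  # would come from the parsed header
n_records = 1000                                  # must be known or over-estimated up front

proc = sp.Popen(['some_command'], stdout=sp.PIPE)
ndata = np.empty(n_records, dtype=rec_dtype)
# readinto() fills the array's memory directly, no intermediate string/bytes object
nbytes = proc.stdout.readinto(ndata)
ndata = ndata[:nbytes // rec_dtype.itemsize]      # trim if fewer records arrived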
Here's an example to show that the memory is shared between the bytearray and the np.ndarray:
>>> data = bytearray('\x01')
>>> ndata = np.frombuffer(data, np.int8)
>>> ndata
array([1], dtype=int8)
>>> ndata[0] = 2
>>> data
bytearray(b'\x02')
Since your data can easily fit in RAM, I think the easiest way to load the data into a numpy array is to use a ramfs.
On Linux,
sudo mkdir /mnt/ramfs
sudo mount -t ramfs -o size=5G ramfs /mnt/ramfs
sudo chmod 777 /mnt/ramfs
Then, for example, if this is the producer of the binary data:
writer.py:
from __future__ import print_function
import random
import struct

N = random.randrange(100)
print('a b')
for i in range(2*N):
    print(struct.pack('<d', random.random()), end='')
Then you could load it into a numpy array like this:
reader.py:
import subprocess
import numpy

def parse_header(f):
    # this function moves the filepointer and returns a dictionary
    header = f.readline()
    d = dict.fromkeys(header.split())
    return d

filename = '/mnt/ramfs/data.out'
with open(filename, 'w') as f:
    cmd = 'writer.py'
    proc = subprocess.Popen([cmd], stdout=f)
    proc.communicate()

with open(filename, 'r') as f:
    header = parse_header(f)
    dt = numpy.dtype([(key, 'f8') for key in header.keys()])
    data = numpy.fromfile(f, dt)