I am currently using Python 3.9.7.
I have a lot of serial port data continuously incoming. The data arrives as dictionaries that I append to a list: each element is a dictionary whose first part is an integer and whose rest is a string. I then subcategorise each string and save everything to an Excel spreadsheet.
I need to prevent duplicates being appended to my list. Below is my code attempting this; however, when I view the Excel log being created, I often see up to 50k rows of the same data repeated.
I was able to successfully prevent duplicate Excel rows with these approaches when reading from a text file, but I can't seem to find a solution for the continuously incoming data.
The output I would like is unique values only, for each element appended to my list and appearing in my Excel file.
Code below:
import serial
import re
import xlsxwriter
#ser-prt-connection
ser = serial.Serial(port='COM2', baudrate=9600)
#separate X (int) and Y (string)
regex = r"(?:.*')?X=(?P<X>-?\d+)\s?Y=(?P<Y>.*)"
extracted_vals = []  #list to be appended
less_vals = []  #want list with no duplicates
row = 0
workbook = xlsxwriter.Workbook('Serial_port_data.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(row, 0, 'X')
worksheet.write(row, 1, 'Ya')
worksheet.write(row, 2, 'Yb')
worksheet.write(row, 3, 'Yc')
while True:
    for line in ser.readlines():
        signal = str(line)
        for part in parts:
            m = re.match(regex, parts)
            if m is None:
                continue
            X,Y = m.groupdict(),values()
            # each element appending to list (incl. duplicates)
            item = dict(X=int(X),Y=str(Y.lower())
            extracted_vals.append(item)
        for i in range(0, len(extracted_vals):
            i+=0
            X_val = extracted_vals[i].setdefault('X')
            Key = extracted_vals[i].keys()
            data = extracted_vals[i].setdefault('Y')
            for val in extracted_vals:
                if val not in less_vals:
                    less_vals.append(val)
            for j in range(0, len(less_vals)):
                j+=0
                X_val = less_vals[j].setdefault('X')
                less_data = less_vals[j].setdefault('Y')
                #separate Y part into substrings
                Ya = less_data[0:3]
                Yb = less_data[3:6]
                Yc = less_data[6:]
                less = X_val, Ya, Yb, Yc
                #to check for no duplicates in output, compared with raw data
                print(signal)  #raw incoming data continuously
                print(less)  #want list, no duplicates
                # write to excel file, column for X value
                # 3 more columns for Ya, Yb, Yc each
                if True:
                    row = row+1
                    worksheet.write(row,0,X_val)
                    worksheet.write(row,1,Ya)
                    worksheet.write(row,2,Yb)
                    worksheet.write(row,3,Yc)
                #Chosen no. rows to write to file
                if row == 10:
                    workbook.close()
serial.close()
Example of what a line of raw data 'signal' looks like:
X=-10Y=yyAyyBthisisYc
Example of what the list 'less' looks like for one line of raw data:
(-10, 'yyA', 'yyB', 'thisisYc')
#(repeats in similar fashion for subsequent lines)
#each line of raw data gets its own row in the excel file, one column per part
#-10 is the X value
My main issue is that sometimes the data being printed is unique, but the Excel file contains many duplicates.
My other issue is that sometimes the data is printed as alternating duplicates, like 1,2,1,2,1,2,1,2,
and the same is being saved to the Excel file.
I have only been programming for a few weeks now, so any advice at all is welcome.
Your program has quite a lot of problems. Below I have tried to show you how you can improve it; bear in mind that I did not test it.
import serial
import re  # needed for re.match below
import xlsxwriter
#ser-prt-connection
ser = serial.Serial(port='COM2', baudrate=9600)
#separate X (int) and Y (string)
regex = r"(?:.*')?X=(?P<X>-?\d+)\s?Y=(?P<Y>.*)"
# First problem:
# maybe you don't need these lists. Perhaps you could use a set and a list. Let's do it:
# extracted_vals = []  # list to be appended
extracted_vals = set()
less_vals = []  # want list with no duplicates
row = 0  # this will be used at the end of the program.
MAXROWS = 10  # Given that you want to limit the number of rows, I'd write that limit where it can be seen and changed easily.

workbook = xlsxwriter.Workbook('Serial_port_data.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(row, 0, 'X')
worksheet.write(row, 1, 'Ya')
worksheet.write(row, 2, 'Yb')
worksheet.write(row, 3, 'Yc')

while True:  # Second problem: this loop lacks a way to terminate!
    # When you write 'while True:', you also MUST write 'break' or 'return' somewhere inside the loop! I'll take care of it.
    for line in ser.readlines():
        signal = str(line)

        # Third problem: where is this 'parts' defined? Your line would throw a NameError! Maybe you meant "for part in signal"?
        for part in signal:  # was 'parts'
            m = re.match(regex, signal)  # was 'parts'
            if m is None:
                continue
            X, Y = m.groupdict().values()  # was 'm.groupdict(),values()' -- the comma must be a dot

            # each element appending to list (incl. duplicates)
            # Fourth problem: you forgot the closing parenthesis in "dict(X=int(X),Y=str(Y.lower())",
            # which is a syntax error. Fifth problem: now that extracted_vals is a set, its items
            # must be hashable, and a dictionary is not -- so we store a tuple instead:
            item = (int(X), Y.lower())
            extracted_vals.add(item)
            # instead of 'extracted_vals.append(item)'
            # At this point, you have no duplicates: sets automatically ignore them!

    less_vals = list(extracted_vals)  # And now you have a non-duplicated list.

    # Sixth problem: all the following lines were included in the "for line in ser.readlines():" loop;
    # you forgot to de-dent them. I've done it for you. Written the way you wrote them, for each line
    # read from serial you also repeated all of the following! That might be the main reason for
    # creating many duplicate rows.
    for i in range(len(less_vals)):
        # Seventh problem: if you need to process all items in a sequence, you should use
        # "for item in sequence:", not "for i in range(len(sequence)):". Moreover, the
        # range() function does not need a starting value, if you need it to start from 0.
        # Eighth problem: your "i+=0" line did nothing at all, so it is gone.
        #
        # Your "X_val = extracted_vals[i].setdefault('X')" would put an item {'X': None} into the
        # dictionary if the key were missing, and assign the value of extracted_vals[i]["X"]
        # (possibly None) to X_val. Are you sure you needed an {'X': None} item?
        # You could have written X_val = less_vals[i]["X"] or, much better:
        #     for elem in less_vals:
        #         X_val, data = elem["X"], elem["Y"]
        # Ninth problem: "Key = extracted_vals[i].keys()" assigned to 'Key' a dictionary view
        # object that you never use, so it is gone too. The same setdefault remarks apply to
        # "data = extracted_vals[i].setdefault('Y')".
        # Since our items are now tuples, we can simply unpack them:
        X_val, data = less_vals[i]

        # this loop of yours is completely useless now (the set already removed the duplicates):
        # for val in extracted_vals:    # Sixth problem, again: this loop was nested inside the
        #     if val not in less_vals:  # "for i in range(...)" loop. I commented it out entirely.
        #         less_vals.append(val)
        #
        # I noticed that I could put all the following lines in the previous loop, without re-reading all of less_vals:
        # for j in range(0, len(less_vals)):  # Sixth problem, again: this loop too was nested in the "for i" loop.
        #     j+=0                            # Eighth problem, again: this line does nothing.
        #
        # so, now we are continuing the loop on less_vals:
        # X_val = less_vals[j].setdefault('X')      -- we already have it
        # less_data = less_vals[j].setdefault('Y')  -- instead of 'less_data', we use 'data' that we already have

        #separate Y part into substrings
        # Remember that setdefault could hand you None? Tenth problem, you should have dealt with
        # this. With the tuples above, data is always a string, but the guard costs nothing:
        if data is None:
            Ya = Yb = Yc = ""
        else:
            Ya = data[0:3]
            Yb = data[3:6]
            Yc = data[6:]
        less = X_val, Ya, Yb, Yc
        #to check for no duplicates in output, compared with raw data
        print(signal)  # it may contain only the last line read from serial, if the raw data contained more than 1 line
        print(less)    # I don't think this check is useful, but if you like it...
        # write to excel file, column for X value
        # 3 more columns for Ya, Yb, Yc each
        # if True:  -- Eleventh problem: this line did nothing, so it is gone.
        row = row + 1
        worksheet.write(row, 0, X_val)
        worksheet.write(row, 1, Ya)
        worksheet.write(row, 2, Yb)
        worksheet.write(row, 3, Yc)

        if row >= MAXROWS:
            break  # this exits the loop over less_vals
    if row >= MAXROWS:
        workbook.close()
        break  # this exits the "while True:" loop, solving the Second problem.
ser.close()  # note: ser.close(), not serial.close() -- 'serial' is the module, 'ser' is the port
This is a refined version of your program; I tried to follow my own advice. I have no serial device (and no time to mock one), so I didn't test it properly.
# A bit refined, albeit untested.
import serial
import re
import xlsxwriter

ser = serial.Serial(port='COM2', baudrate=9600)
#separate X (int) and Y (string)
regex = r"(?:.*')?X=(?P<X>-?\d+)\s?Y=(?P<Y>.*)"
MAXROWS = 50
row = 0  # this will be used at the end of the program.

workbook = xlsxwriter.Workbook('Serial_port_data.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(row, 0, 'X')
worksheet.write(row, 1, 'Ya')
worksheet.write(row, 2, 'Yb')
worksheet.write(row, 3, 'Yc')

while True:
    extracted_vals = set()
    for line in ser.readlines():
        m = re.match(regex, str(line))  # pyserial gives bytes; make it a str first
        if m is None:
            continue
        values = m.groupdict()
        extracted_vals.add((int(values['X']), values['Y'].lower()))  # each element in extracted_vals will be a tuple (integer, string)

    for this in extracted_vals:
        X_val = this[0]
        data = this[1]
        Ya = data[0:3]
        Yb = data[3:6]
        Yc = data[6:]
        row = row + 1
        worksheet.write(row, 0, X_val)
        worksheet.write(row, 1, Ya)
        worksheet.write(row, 2, Yb)
        worksheet.write(row, 3, Yc)

        if row >= MAXROWS:
            break
    if row >= MAXROWS:
        workbook.close()
        break
ser.close()
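One more hedged suggestion, again untested: because extracted_vals is re-created on every pass of the while loop, a value that arrives in two different batches will still be written twice. If you want uniqueness over the whole session, keep one 'seen' set alive for the lifetime of the program, roughly like this (same port, regex and column layout as above):

import re
import serial
import xlsxwriter

ser = serial.Serial(port='COM2', baudrate=9600)
regex = r"(?:.*')?X=(?P<X>-?\d+)\s?Y=(?P<Y>.*)"
MAXROWS = 50

workbook = xlsxwriter.Workbook('Serial_port_data.xlsx')
worksheet = workbook.add_worksheet()
for col, name in enumerate(('X', 'Ya', 'Yb', 'Yc')):
    worksheet.write(0, col, name)

seen = set()  # lives for the whole session, so duplicates are skipped across batches too
row = 0
while row < MAXROWS:
    for line in ser.readlines():
        # pyserial yields bytes; decode and strip the line ending before matching
        m = re.match(regex, line.decode(errors='replace').strip())
        if m is None:
            continue
        item = (int(m.group('X')), m.group('Y').lower())
        if item in seen:
            continue  # this (X, Y) pair has already been written once
        seen.add(item)
        X_val, data = item
        row += 1
        worksheet.write(row, 0, X_val)
        worksheet.write(row, 1, data[0:3])  # Ya
        worksheet.write(row, 2, data[3:6])  # Yb
        worksheet.write(row, 3, data[6:])   # Yc
        if row >= MAXROWS:
            break
workbook.close()
ser.close()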
I have a device that stores three data sets in a .DAT file. They always have the same headings and number of columns, but the number of rows varies: the blocks are (n x 4), (m x 4) and (L x 3).
I need to extract the three data sets into separate arrays for plotting.
I have been trying to use numpy.genfromtxt and numpy.loadtxt, but the only way I can get them to work with this format is to manually define the row at which each data set starts. As I will regularly need to deal with this format, I have been trying to automate it.
If someone could suggest a method which might work I would greatly appreciate it. I have attached an example file.
Just a quick and dirty solution. At your file size you might run into performance issues; if you know m, n and L, initialise the output vectors with the respective lengths.
Here is the strategy: load the whole file into a variable and read it line by line. As soon as you discover a keyword, raise a flag that you are in the specific block; from the next line on, read the values out into the correct variables.
isblock1 = isblock2 = isblock3 = False
fout = []  # construct also all the other variables that you want to collect.
with open(fname, 'r') as fh:  # 'fname' is the path to your .DAT file
    lines = fh.readlines()  # read all the lines
for line in lines:
    if not line.strip():
        continue  # skip empty lines
    # check for the keywords first, so the heading lines themselves are never split
    if 'Frequency' in line:
        isblock1 = True
        isblock2 = isblock3 = False
        continue
    if 'Phasor' in line:
        isblock2 = True
        isblock1 = isblock3 = False
        continue
    if 'Voltage' in line:
        isblock3 = True
        isblock1 = isblock2 = False
        continue
    if isblock1:
        (f, psd, ipj, itj) = line.split()
        fout.append(f)  # do this also with the other variables
    if isblock2:
        (t1, p1, p2, p12) = line.split()
    if isblock3:
        (t2, v1, v2) = line.split()
Hope that helps.
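If you then want NumPy arrays rather than plain lists for plotting, here is an equally rough, untested sketch of the same flag idea. 'Frequency', 'Phasor' and 'Voltage' are the assumed block keywords from above, 'example.DAT' is a hypothetical file name, and it assumes the data rows follow each heading directly:

import numpy as np

blocks = {'Frequency': [], 'Phasor': [], 'Voltage': []}
current = None
with open('example.DAT') as fh:
    for line in fh:
        if not line.strip():
            continue  # skip blank separator lines
        key = next((k for k in blocks if k in line), None)
        if key is not None:
            current = key  # a heading line: switch block, don't parse it
            continue
        if current is not None:
            blocks[current].append([float(v) for v in line.split()])

freq = np.array(blocks['Frequency'])   # shape (n, 4)
phasor = np.array(blocks['Phasor'])    # shape (m, 4)
voltage = np.array(blocks['Voltage'])  # shape (L, 3)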
I am extracting 150 different cell values from 350,000 ASCII raster files (about 20 kB each). My current code is fine for processing the 150 cell values from hundreds of the ASCII files, but it is very slow when running on the full data set.
I am still learning Python, so: are there any obvious inefficiencies, or suggestions to improve the code below?
I have tried closing the 'dat' file in the second function (dat = None); no improvement.
First: I have a function which returns the row and column locations from a cartesian grid.
def world2Pixel(gt, x, y):
    ulX = gt[0]
    ulY = gt[3]
    xDist = gt[1]
    yDist = gt[5]
    rtnX = gt[2]
    rtnY = gt[4]
    pixel = int((x - ulX) / xDist)
    line = int((ulY - y) / xDist)
    return (pixel, line)
Second: a function to which I pass lists of 150 'id', 'x' and 'y' values in a for loop. The first function is called within it and used to extract the cell value, which is appended to a new list. I also have a list of files, 'asc_list', and corresponding times in 'date_list'. Please ignore count/enumerate, as I use it later; unless it is impeding efficiency.
import gdal

def asc2series(id, x, y):
    #count = 1
    ls_id = []
    ls_p = []
    ls_d = []
    for n, (asc, date) in enumerate(zip(asc_list, date_list)):
        dat = gdal.Open(asc_list)
        gt = dat.GetGeoTransform()
        pixel, line = world2Pixel(gt, x, y)
        band = dat.GetRasterBand(1)
        #dat = None
        value = band.ReadAsArray(pixel, line, 1, 1)[0, 0]
        ls_id.append(id)
        ls_p.append(value)
        ls_d.append(date)
    return ls_id, ls_p, ls_d
Many thanks
In world2Pixel you are setting rtnX and rtnY, which you don't use.
You probably meant gdal.Open(asc) -- not asc_list.
You could move gt = dat.GetGeoTransform() out of the loop. (Rereading made me realise you can't really.)
You could cache calls to world2Pixel.
You're opening the dat file for each pixel -- you should probably turn the logic around and open each file only once, looking up all the pixels mapped to that file; see the sketch after this list.
Benchmark; check the links in this podcast to see how: http://talkpython.fm/episodes/show/28/making-python-fast-profiling-python-code
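To make that last point concrete, here is a rough, untested sketch of the turned-around loop. It reuses world2Pixel from the question and assumes ids, xs, ys, asc_list and date_list are the parallel lists described there:

import gdal  # or 'from osgeo import gdal' on newer installs

def asc2series(ids, xs, ys, asc_list, date_list):
    ls_id, ls_p, ls_d = [], [], []
    for asc, date in zip(asc_list, date_list):
        dat = gdal.Open(asc)  # open each of the 350,000 files once...
        gt = dat.GetGeoTransform()
        band = dat.GetRasterBand(1)
        for id_, x, y in zip(ids, xs, ys):  # ...and read all 150 cells from it
            pixel, line = world2Pixel(gt, x, y)
            ls_id.append(id_)
            ls_p.append(band.ReadAsArray(pixel, line, 1, 1)[0, 0])
            ls_d.append(date)
        dat = None  # dereference so GDAL closes the file
    return ls_id, ls_p, ls_d

If all the rasters share the same geotransform, you could go further and compute the 150 (pixel, line) pairs once, before the file loop, instead of caching world2Pixel.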
I am a beginner in Python (and in programming generally). I have a large file containing a repeating pattern: three lines with numbers, one empty line, and so on.
If I print the file it looks like:
1.93202838
1.81608154
1.50676177
2.35787777
1.51866227
1.19643624
...
I want to take each set of three numbers as one vector, do some maths with them, write the results back to a new file, and then move to the next three lines -- the next vector. So here is my code (it doesn't work):
import math
inF = open("data.txt", "r+")
outF = open("blabla.txt", "w")
a = []
fin = []
b = []
for line in inF:
    a.append(line)
    if line.startswith(" \n"):
        fin.append(b)
        h1 = float(fin[0])
        k2 = float(fin[1])
        l3 = float(fin[2])
        h = h1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        k = k1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        l = l1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        vector = [str(h), str(k), str(l)]
        outF.write('\n'.join(vector)
        b = a
        a = []
inF.close()
outF.close()
print "done!"
I want to get a "vector" from each 3 lines in my file and put it into the blabla.txt output file. Thanks a lot!
My 'code comment' answer:
take care to close all parentheses, in order to match the opened ones! (this is very likely to raise a SyntaxError ;-) )
fin is created as an empty list and is never filled; trying to access any value with fin[n] is therefore very likely to break with an IndexError;
k2 and l3 are created but never used;
k1 and l1 are used but never created, which is very likely to break with a NameError;
b is created as a copy of a, so it is a list. But then you do fin.append(b): what do you expect in this case from appending (not extending) a list?
Hope this helps!
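For what it's worth, here is a rough, untested sketch of the three-lines-at-a-time idea, keeping the same normalisation the question attempts (including the '+1' in the denominator, which I have simply carried over from the original formula):

import math

# Gather three numbers at a time; a blank line only separates groups, so it can be skipped.
with open("data.txt") as inF, open("blabla.txt", "w") as outF:
    nums = []
    for line in inF:
        line = line.strip()
        if not line:
            continue  # skip the empty separator lines
        nums.append(float(line))
        if len(nums) == 3:
            h1, k1, l1 = nums
            denom = math.sqrt(h1*h1 + k1*k1 + l1*l1) + 1
            outF.write('\n'.join(str(v / denom) for v in (h1, k1, l1)) + '\n')
            nums = []  # start collecting the next vector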
This is only in the answers section for length and formatting.
Input and output.
Control flow
I know nothing of vectors; you might want to look into the math module or NumPy.
Those links should hopefully give you all the information you need to at least get started with this problem. As yuvi said, the code won't be written for you, but you can come back when you have something that isn't working as you expected or that you don't fully understand.
I can't seem to find a way to return the number of columns in a worksheet with xlwt.Workbook(). The idea is to take a wad of .xls files in a directory and combine them into one. One problem I am having is changing the column position when writing the next file. This is what I'm working with thus far:
import xlwt, xlrd, os

def cbc(rd_sheet, wt_sheet, rlo=0, rhi=None,
        rshift=0, clo=0, chi=None, cshift=0):
    if rhi is None: rhi = rd_sheet.nrows
    if chi is None: chi = 2  # only first two cols needed
    for row_index in xrange(rlo, rhi):
        for col_index in xrange(clo, chi):
            cell = rd_sheet.cell(row_index, col_index)
            wt_sheet.write(row_index + rshift, col_index + cshift, cell.value)

Dir = '/home/gerg/Desktop/ex_files'
ext = '.xls'
list_xls = [file for file in os.listdir(Dir) if file.endswith(ext)]
files = [Dir + '/%s' % n for n in list_xls]
output = '/home/gerg/Desktop/ex_files/copy_test.xls'
wbook = xlwt.Workbook()
wsheet = wbook.add_sheet('Summary', cell_overwrite_ok=True)  # overwrite just for the repeated testing
for XLS in files:
    rbook = xlrd.open_workbook(XLS)
    rsheet = rbook.sheet_by_index(0)
    cbc(rsheet, wsheet, cshift=0)
wbook.save(output)
list_xls returns:
['file2.xls', 'file3.xls', 'file1.xls', 'copy_test.xls']
files returns:
['/home/gerg/Desktop/ex_files/file2.xls', '/home/gerg/Desktop/ex_files/file3.xls', '/home/gerg/Desktop/ex_files/file1.xls', '/home/gerg/Desktop/ex_files/copy_test.xls']
My question is how to scoot each file written into the xlwt workbook over by two columns each time. This code gives me the first file saved to .../copy_test.xls. Is there a problem with the file listing as well? I have a feeling there may be.
This is Python 2.6, and I bounce between Windows and Linux.
Thank you for your help,
GM
You are using only the first two columns in each input spreadsheet, so you don't need "the number of columns in a worksheet in xlwt.Workbook()". You already have the cshift mechanism in your code, but you are not using it. All you need to do is change the loop in your outer block, like this:
for file_index, file_name in enumerate(files):
    rbook = xlrd.open_workbook(file_name)
    rsheet = rbook.sheet_by_index(0)
    cbc(rsheet, wsheet, chi=2, cshift=file_index * 2)
For generality, change the line
if chi is None: chi = 2
in your function to
if chi is None: chi = rd_sheet.ncols
and pass chi=2 in as an arg as I have done in the above code.
I don't understand your rationale for overriding the overwrite check ... surely, in your application, overwriting an existing cell value is incorrect?
You say "This code gives me the first file saved to .../copy_test.xls". First in input order is file2.xls. The code that you have shown overwrites previous input and will give you the LAST file (in input order), not the first ... perhaps you are mistaken. Note: the last input file, 'copy_test.xls', is quite likely to be a previous OUTPUT file; perhaps your output file should be put in a separate folder (see the sketch below).
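As a rough illustration of that last point (untested; 'copy_test.xls' is the output name taken from your own script), you could simply exclude the output name when building the input list:

import os

out_name = 'copy_test.xls'
list_xls = [n for n in os.listdir(Dir) if n.endswith(ext) and n != out_name]
files = [os.path.join(Dir, n) for n in list_xls]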