I'm reading data from my serial port and I can store this data to a .csv file. The problem is that I want to write my data to a second or third column.
With this code the data is stored in the first column:
file = open('test.csv', 'w', encoding="utf", newline="")
writer = csv.writer(file)
while True:
    if serialInst.in_waiting:
        packet = serialInst.readline()
        packet = [str(packet.decode().rstrip())]  # decode and strip the trailing \r\n
        writer.writerow(packet)
Output of the code in the .csv file (everything lands in column A):

Column A    Column B
Data 1
Data 2
Data 3
Data 4
Example of the desired output in the .csv file:

Column A    Column B
Data1       data 2
Data3       Data 4
I've not used csv.writer before, but a quick read of the docs seems to indicate that you can only write one row at a time, while you are getting your data one cell/value at a time.
In your code example you already have a file handle. Instead of writing one row at a time, you want to write one cell at a time. You'll need some extra variables to keep track of when to start a new line.
file = open('test.csv', 'w', encoding="utf", newline="")
ncols = 2  # 2 columns total in this example, but it's easy to imagine you might want more one day
col = 0    # use Python convention of zero-based lists/arrays
while True:
    if serialInst.in_waiting:
        packet = serialInst.readline()
        value = packet.decode().rstrip()  # decode and strip the trailing \r\n
        if col == ncols - 1:
            # last column, leave out the comma and add a newline
            file.write(value + '\n')
            col = 0  # reset col to the first position
        else:
            file.write(value + ',')
            col = col + 1
In this code, we're using the write method of a file object instead of using the csv module. See these docs for how to directly read and write from/to files.
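If you'd rather keep the csv module's quoting and escaping, another option (just a sketch, assuming serialInst is the pyserial port object from the question) is to buffer values until a full row has been collected and only then call writerow:

import csv
# serialInst is assumed to be an already-opened pyserial Serial object, as in the question

ncols = 2
row = []  # buffer for the current row

with open('test.csv', 'w', encoding="utf-8", newline="") as file:
    writer = csv.writer(file)
    while True:
        if serialInst.in_waiting:
            packet = serialInst.readline()
            row.append(packet.decode().rstrip())  # decode and strip \r\n
            if len(row) == ncols:
                writer.writerow(row)  # writes one line with ncols cells
                row = []              # start buffering the next row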
The analysis software I'm using outputs many groups of results in one CSV file and separates the groups with two empty lines.
I would like to break the results into groups so that I can then analyse them separately.
I'm sure there is a built-in function in Python (or one of its libraries) that does this. I tried this piece of code that I found somewhere, but it doesn't seem to work:
import csv
results = open('03_12_velocity_y.csv').read().split("\n\n")
# Feed first csv.reader
first_csv = csv.reader(results[0], delimiter=',')
# Feed second csv.reader
second_csv = csv.reader(results[1], delimiter=',')
Update:
The original code actually works, but my Python skills are pretty limited and I did not implement it properly.
The .split('\n\n\n') call does work, but csv.reader returns an object, and to get the data into a list (or something similar) it needs to iterate through all the rows and write them to the list.
I then used Pandas to remove the header and convert the scientific-notation values to float. The code is below. Thanks everyone for the help.
import csv
import pandas as pd

# Open the csv file, read it and split it when it encounters 2 empty lines (\n\n\n)
results = open('03_12_velocity_y.csv').read().split('\n\n\n')

# Create csv.reader objects that are used to iterate over rows in a csv file
# Define the output - create an empty multi-dimensional list
output1 = [[],[]]

# Iterate through the rows in the csv file and append the data to the empty list
# Feed first csv.reader
csv_reader1 = csv.reader(results[0].splitlines(), delimiter=',')
for row in csv_reader1:
    output1.append(row)

df = pd.DataFrame(output1)
# remove first 7 rows of data (the start position of the slice is always included)
df = df.iloc[7:]
# Convert all data from string to float
df = df.astype(float)
If your row counts are inconsistent across groups, you'll need a little state machine to check when you're between groups and do something with the last group.
#!/usr/bin/env python3
import csv

def write_group(group, i):
    with open(f"group_{i}.csv", "w", newline="") as out_f:
        csv.writer(out_f).writerows(group)

with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    group_i = 1
    group = []
    last_row = []
    for row in reader:
        if row == [] and last_row == [] and group != []:
            write_group(group, group_i)
            group = []
            group_i += 1
            continue
        if row == []:
            last_row = row
            continue
        group.append(row)
        last_row = row

# flush remaining group
if group != []:
    write_group(group, group_i)
I mocked up this sample CSV:
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3


g2r1c1,g2r1c2,g2r1c3
g2r2c1,g2r2c2,g2r2c3


g3r1c1,g3r1c2,g3r1c3
g3r2c1,g3r2c2,g3r2c3
g3r3c1,g3r3c2,g3r3c3
g3r4c1,g3r4c2,g3r4c3
g3r5c1,g3r5c2,g3r5c3
And when I run the program above I get three CSV files:
group_1.csv
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3
group_2.csv
g2r1c1,g2r1c2,g2r1c3
g2r2c1,g2r2c2,g2r2c3
group_3.csv
g3r1c1,g3r1c2,g3r1c3
g3r2c1,g3r2c2,g3r2c3
g3r3c1,g3r3c2,g3r3c3
g3r4c1,g3r4c2,g3r4c3
g3r5c1,g3r5c2,g3r5c3
If your row counts are consistent, you can do this with fairly vanilla Python or using the Pandas library.
Vanilla Python
Define your group size and the size of the break (in "rows") between groups.
Loop over all the rows adding each row to a group accumulator.
When the group accumulator reaches the pre-defined group size, do something with it, reset the accumulator, and then skip break-size rows.
Here, I'm writing each group to its own numbered file:
import csv

group_sz = 5
break_sz = 2

def write_group(group, i):
    with open(f"group_{i}.csv", "w", newline="") as f_out:
        csv.writer(f_out).writerows(group)

with open("input.csv", newline="") as f_in:
    reader = csv.reader(f_in)
    group_i = 1
    group = []
    for row in reader:
        group.append(row)
        if len(group) == group_sz:
            write_group(group, group_i)
            group_i += 1
            group = []
            for _ in range(break_sz):
                try:
                    next(reader)
                except StopIteration:  # gracefully ignore an expected StopIteration (at the end of the file)
                    break
group_1.csv
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3
g1r4c1,g1r4c2,g1r4c3
g1r5c1,g1r5c2,g1r5c3
With Pandas
I'm new to Pandas, and learning this as I go, but it looks like Pandas will automatically trim blank rows/records from a chunk of data^1.
With that in mind, all you need to do is specify the size of your group, and tell Pandas to read your CSV file in "iterator mode", where you can ask for a chunk (your group size) of records at a time:
import pandas as pd

group_sz = 5

with pd.read_csv("input.csv", header=None, iterator=True) as reader:
    i = 1
    while True:
        try:
            df = reader.get_chunk(group_sz)
        except StopIteration:
            break
        df.to_csv(f"group_{i}.csv")
        i += 1
Pandas adds an "ID" column and a default header when it writes out the CSV:
group_1.csv
,0,1,2
0,g1r1c1,g1r1c2,g1r1c3
1,g1r2c1,g1r2c2,g1r2c3
2,g1r3c1,g1r3c2,g1r3c3
3,g1r4c1,g1r4c2,g1r4c3
4,g1r5c1,g1r5c2,g1r5c3
Try this out with your data:
import pandas as pd

# csv file name to be read in
in_csv = 'input.csv'

# get the number of lines of the csv file to be read
number_lines = sum(1 for row in open(in_csv))

# size of rows of data to write to the csv,
# you can change the row size according to your need
rowsize = 500

# start looping through data, writing it to a new file for each set
for i in range(1, number_lines, rowsize):
    df = pd.read_csv(in_csv,
                     header=None,
                     nrows=rowsize,   # number of rows to read at each loop
                     skiprows=i)      # skip rows that have been read
    # csv to write data to a new file with indexed name. input_1.csv etc.
    out_csv = 'input' + str(i) + '.csv'
    df.to_csv(out_csv,
              index=False,
              header=False,
              mode='a')  # append data to csv file
I updated the question with the last details that answer my question.
My question is not how to open a .csv file, detect which rows I want to omit, and write a new .csv file with my desired lines. I'm already doing that successfully:
def sanitize(filepath):  # Removes header information, leaving only column names and data. Outputs "sanitized" file.
    with open(filepath) as unsan, open(dirname + "/" + newname + '.csv', 'w', newline='') as san:
        writer = csv.writer(san)
        line_count = 0
        headingrow = 0
        datarow = 0
        safety = 1
        for row in csv.reader(unsan, delimiter=','):
            # Detect data start
            if "DATA START" in str(row):
                safety = 0
                headingrow = line_count + 1
                datarow = line_count + 4
            # Detect data end
            if "DATA END" in str(row):
                safety = 1
            # Write data
            if safety == 0:
                if line_count == headingrow or line_count >= datarow:
                    writer.writerow(row)
            line_count += 1
I have .csv data files that are megabytes, sometimes gigabytes (up to 4 GB) in size. Out of 180,000 lines in each file, I only need to omit about 50 lines.
Example pseudo-data (rows I want to keep are indented):

[Header Start]
...48 lines of header data...
[Header End]
Blank Line
[Data Start]
    Row with Column Names
Column Units
Column Variable Type
    ...180,000 lines of data...
I understand that I can't edit a .csv file as I iterate over it (Learned here:
How to Delete Rows CSV in python). Is there a quicker way to remove the header information from the file, like maybe writing the remaining 180,000 lines as a block instead of iterating through and writing each line?
Maybe one solution would be to append all the data rows to a list of lists and then use writer.writerows(list of lists) instead of writing them one at a time (Batch editing of csv files with Python, https://docs.python.org/3/library/csv.html)? However, wouldn't that mean I'm loading essentially the whole file (up to 4Gb) into my RAM?
UPDATE:
I've got a pandas version working, but when I time it, it takes about twice as long as the code above. Specifically, the to_csv portion takes about 10 s for a 26 MB file.

import csv, pandas as pd

filepath = r'input'

with open(filepath) as unsan:
    line_count = 0
    headingrow = 0
    datarow = 0
    safety = 1
    row_count = sum(1 for row in csv.reader(unsan, delimiter=','))
    unsan.seek(0)  # rewind so the next reader starts from the top of the file
    for row in csv.reader(unsan, delimiter=','):
        # Detect data start
        if "DATA START" in str(row):
            safety = 0
            headingrow = line_count + 1
            datarow = line_count + 4
        # Grab the column names, then stop scanning
        if safety == 0:
            if line_count == headingrow:
                colnames = row
                line_count += 1
                break
        line_count += 1

badrows = [*range(0, 55, 1), row_count - 1]
df = pd.read_csv(filepath, names=[*colnames], skiprows=[*badrows], na_filter=False)
df.to_csv(r'output', index=None, header=True)
Here's the research I've done:
Deleting rows with Python in a CSV file
https://intellipaat.com/community/18827/how-to-delete-only-one-row-in-csv-with-python
https://www.reddit.com/r/learnpython/comments/7tzbjm/python_csv_cleandelete_row_function_doesnt_work/
https://nitratine.net/blog/post/remove-columns-in-a-csv-file-with-python/
Delete blank rows from CSV?
If it is not important that the file is read in Python, or with a CSV reader/writer, you can use other tools. On *nix you can use sed:
sed -n '/DATA START/,/DATA END/p' myfile.csv > headerless.csv
This will be very fast for millions of lines.
perl is more multi-platform:
perl -F -lane "print if /DATA START/ .. /DATA END/;" myfile.csv
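If you'd rather stay in Python but still stream the file instead of loading gigabytes into memory, a rough equivalent of the sed range above (a sketch, assuming the same DATA START / DATA END marker lines as in the question) is:

# Stream lines between the DATA START and DATA END markers to a new file,
# one line at a time, so memory use stays constant regardless of file size.
copying = False
with open("myfile.csv") as src, open("headerless.csv", "w", newline="") as dst:
    for line in src:
        if "DATA START" in line:
            copying = True
        if copying:
            dst.write(line)
        if "DATA END" in line:
            break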
To avoid editing the file, and instead read the file (headers and all) straight into Python and then into Pandas, you can wrap the file in your own file-like object.
Given an input file called myfile.csv with this content:
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER

now, some, data
1,2,3
4,5,6
7,8,9
You can read that file in directly using a wrapper class:
import io

class HeaderSkipCsv(io.TextIOBase):
    def __init__(self, filename):
        """ create an iterator from the filename """
        self.data = self.yield_csv(filename)

    def readable(self):
        """ here for compatibility """
        return True

    def yield_csv(self, filename):
        """ open filename and read past the first empty line.
        Then yield characters one by one. This reads just one
        line at a time into memory.
        """
        with open(filename) as f:
            for line in f:
                if line.strip() == "":
                    break
            for line in f:
                for char in line:
                    yield char

    def read(self, n=None):
        """ called by Pandas with some 'n'; this returns
        the next 'n' characters since the last read as a string
        """
        data = ""
        for i in range(n):
            try:
                data += next(self.data)
            except StopIteration:
                break
        return data

WANT_PANDAS = True  # set to False to just write the file

if WANT_PANDAS:
    import pandas as pd
    df = pd.read_csv(HeaderSkipCsv('myfile.csv'))
    print(df.head(5))
else:
    with open('myoutfile.csv', 'w') as fo:
        with HeaderSkipCsv('myfile.csv') as fi:
            c = fi.read(1024)
            while c:
                fo.write(c)
                c = fi.read(1024)
which outputs:
   now  some  data
0    1     2     3
1    4     5     6
2    7     8     9
Because Pandas allows any file-like object, we can provide our own! Pandas calls read on the HeaderSkipCsv object as it would on any file object. Pandas just cares about reading valid csv data from a file object when it calls read on it. Rather than providing Pandas with a clean file, we provide it with a file-like object that filters out the data Pandas does not like (i.e. the headers).
The yield_csv generator iterates over the file without reading it in, so only as much data as Pandas requests is loaded into memory. The first for loop in yield_csv advances f to beyond the first empty line. f represents a file pointer and is not reset at the end of a for loop while the file remains open. Since the second for loop receives f under the same with block, it starts consuming at the start of the csv data, where the first for loop left it.
Another way of writing the first for loop would be
next((line for line in f if line.isspace()), None)
which is more explicit about advancing the file pointer, but arguably harder to read.
Because we skip the lines up to and including the empty line, Pandas just gets the valid csv data. For the headers, no more than one line is ever loaded.
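The file-pointer behaviour this relies on is easy to verify on its own: iterating a file object consumes it, so a second loop picks up exactly where the first one stopped. A tiny stand-alone demonstration (demo.txt is just an illustrative file):

# Demonstrate that a second loop over the same file object resumes
# where the first one stopped.
with open("demo.txt", "w") as f:
    f.write("header\n\nrow1\nrow2\n")

with open("demo.txt") as f:
    for line in f:                 # consume lines up to and including the first blank line
        if line.strip() == "":
            break
    remaining = [line.rstrip("\n") for line in f]   # picks up after the blank line

print(remaining)  # ['row1', 'row2']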
I have one CSV file, and I want to extract the first column of it. My CSV file is like this:
Device ID;SysName;Entry address(es);IPv4 address;Platform;Interface;Port ID (outgoing port);Holdtime
PE1-PCS-RANCAGUA;;;192.168.203.153;cisco CISCO7606 Capabilities Router Switch IGMP;TenGigE0/5/0/1;TenGigabitEthernet3/3;128 sec
P2-CORE-VALPO.cisco.com;P2-CORE-VALPO.cisco.com;;200.72.146.220;cisco CRS Capabilities Router;TenGigE0/5/0/0;TenGigE0/5/0/4;128 sec
PE2-CONCE;;;172.31.232.42;Cisco 7204VXR Capabilities Router;GigabitEthernet0/0/0/14;GigabitEthernet0/3;153 sec
P1-CORE-CRS-CNT.entel.cl;P1-CORE-CRS-CNT.entel.cl;;200.72.146.49;cisco CRS Capabilities Router;TenGigE0/5/0/0;TenGigE0/1/0/6;164 sec
For that purpose I use the following code that I saw here:
import csv
makes = []
with open('csvoutput/topologia.csv', 'rb') as f:
    reader = csv.reader(f)
    # next(reader)  # Ignore first row
    for row in reader:
        makes.append(row[0])

print makes
Then, for each value in the first column, I want to substitute that value into a particular place in a text file and save the result as a new file.
Original textfile:
PLANNED.IMPACTO_ID = IMPACTO.ID AND
PLANNED.ESTADOS_ID = ESTADOS_PLANNED.ID AND
TP_CLASIFICACION.ID = TP_DATA.ID_TP_CLASIFICACION AND
TP_DATA.PLANNED_ID = PLANNED.ID AND
PLANNED.FECHA_FIN >= CURDATE() - INTERVAL 1 DAY AND
PLANNED.DESCRIPCION LIKE '%P1-CORE-CHILLAN%';
Expected output:
PLANNED.IMPACTO_ID = IMPACTO.ID AND
PLANNED.ESTADOS_ID = ESTADOS_PLANNED.ID AND
TP_CLASIFICACION.ID = TP_DATA.ID_TP_CLASIFICACION AND
TP_DATA.PLANNED_ID = PLANNED.ID AND
PLANNED.FECHA_FIN >= CURDATE() - INTERVAL 1 DAY AND
PLANNED.DESCRIPCION LIKE 'FIRST_COLUMN_VALUE';
And so on for every value in the first column, and save it as a separate file.
How can I do this? Thank you very much for your help.
You could just read the file, apply your changes, and write the file back out again. There is no efficient way to edit a file in place (inserting characters is not efficiently possible); you can only rewrite it.
If your file is going to be big, you should not keep the whole table in memory.
import csv

makes = []
with open('csvoutput/topologia.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        makes.append(row)

# Apply changes in makes

with open('csvoutput/topologia.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(makes)
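For the second half of the question, producing one output file per value of the first column, a possible sketch is below; query_template.txt and the output file names are hypothetical, while the ';' delimiter and the '%P1-CORE-CHILLAN%' placeholder are taken from the sample data and the example query:

import csv

# Read the first column (Device IDs); the sample data is ';'-separated
device_ids = []
with open('csvoutput/topologia.csv', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    next(reader)                       # skip the header row
    for row in reader:
        device_ids.append(row[0])

# Read the query template once
with open('query_template.txt') as f:  # hypothetical name for the original text file
    template = f.read()

# Write one output file per Device ID, substituting it into the LIKE clause
for device_id in device_ids:
    query = template.replace('%P1-CORE-CHILLAN%', device_id)
    with open('query_{}.sql'.format(device_id), 'w') as out:
        out.write(query)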
Hello, I'm really new here as well as to the world of Python.
I have some (~1000) .csv files, each containing ~1,800,000 rows of information. The files are in the following form:
5302730,131841,-0.29999999999999999,NULL,2013-12-31 22:00:46.773
5303072,188420,28.199999999999999,NULL,2013-12-31 22:27:46.863
5350066,131841,0.29999999999999999,NULL,2014-01-01 00:37:21.023
5385220,-268368577,4.5,NULL,2014-01-01 03:12:14.163
5305752,-268368587,5.1900000000000004,NULL,2014-01-01 03:11:55.207
So, for all of the files, I would like:
(1) to remove the 4th (NULL) column
(2) to keep in every file only certain rows (depending on the value of the first column, e.g. 5302730: keep only the rows containing that value)
I don't know if this is even possible, so any answer is appreciated!
Thanks in advance.
Have a look at the csv module.
You can use the csv.reader function to generate an iterator of lines, with each line's cells as a list.
for line in csv.reader(open("filename.csv")):
    # Remove 4th column, remember Python starts counting at 0
    line = line[:3] + line[4:]
    if line[0] == "thevalueforthefirstcolumn":
        dosomethingwith(line)
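Putting that together with a writer, here is a minimal sketch; the target value 5302730 comes from the question, while filtered.csv is just a placeholder output name:

import csv

# Keep only rows whose first column matches, dropping the 4th (NULL) column
with open("filename.csv", newline="") as infile, \
     open("filtered.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile):
        line = line[:3] + line[4:]          # remove the 4th column (index 3)
        if line[0] == "5302730":
            writer.writerow(line)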
If you wish to do this sort of operation with CSV files more than once and want to use different parameters regarding column to skip, column to use as key and what to filter on, you can use something like this:
import csv

def read_csv(filename, column_to_skip=None, key_column=0, key_filter=None):
    data_from_csv = []
    with open(filename) as csvfile:
        csv_reader = csv.reader(csvfile)
        for row in csv_reader:
            # Skip data in specific column
            if column_to_skip is not None:
                del row[column_to_skip]
            # Filter out rows where the key doesn't match
            if key_filter is not None:
                key = row[key_column]
                if key_filter != key:
                    continue
            data_from_csv.append(row)
    return data_from_csv

def write_csv(filename, data_to_write):
    with open(filename, 'w') as csvfile:
        csv_writer = csv.writer(csvfile)
        for row in data_to_write:
            csv_writer.writerow(row)

data = read_csv('data.csv', column_to_skip=3, key_filter='5302730')
write_csv('data2.csv', data)
I want to perform multiple edits to most rows in a csv file without making multiple writes to the output csv file.
I have a csv that I need to convert and clean up into specific format for another program to use. For example, I'd like to:
remove all blank rows
remove all rows where the value of column "B" is not a number
with this new data, create a new column and explode the first part of the values in column B into the new column
Here's an example of the data:
"A","B","C","D","E"
"apple","blah","1","","0.00"
"ape","12_fun","53","25","1.00"
"aloe","15_001","51","28",2.00"
I can figure out the logic behind each process, but what I can't figure out is how to perform each process without reading and writing to a file each time. I'm using the CSV module. Is there a better way to perform these steps at once before writing a final CSV?
I would define a set of tests and a set of processes.
If all tests pass, all processes are applied, and the final result is written to output:
import csv

#
# Row tests
#
def test_notblank(row):
    return any(len(i) for i in row)

def test_bnumeric(row):
    return row[1].isdigit()

def do_tests(row, tests=[test_notblank, test_bnumeric]):
    return all(t(row) for t in tests)

#
# Row processing
#
def process_splitb(row):
    b = row[1].split('.')
    row[1] = b[0]
    row.append(b[1])
    return row

def do_processes(row, processes=[process_splitb]):
    for p in processes:
        row = p(row)
    return row

def main():
    with open("in.csv", "rb") as inf, open("out.csv", "wb") as outf:
        incsv = csv.reader(inf)
        outcsv = csv.writer(outf)
        outcsv.writerow(incsv.next())  # pass header row through
        outcsv.writerows(do_processes(row) for row in incsv if do_tests(row))

if __name__ == "__main__":
    main()
Simple for loops.
import csv

csv_file = open('in.csv', 'rb')
csv_reader = csv.reader(csv_file)

header = csv_reader.next()
header.append('F')  # add new column
records = [header]

# process records
for record in csv_reader:
    # skip blank records
    if record == []:
        continue
    # make sure column "B" has 2 parts
    try:
        part1, part2 = record[1].split('_')
    except:
        continue
    # make sure part1 is a digit
    if not part1.isdigit():
        continue
    record[1] = part1       # make column B equal part1
    record.append(part2)    # add data for the new column F to the record
    records.append(record)

new_csv_file = open('out.csv', 'wb')
csv_writer = csv.writer(new_csv_file, quoting=csv.QUOTE_ALL)
for r in records:
    csv_writer.writerow(r)
Why use the CSV module? A CSV file is made up of text lines (strings), and you can use Python's string methods (split, join, replace, len) to create your result:
line_cols = line.split(',') and back again: line = ','.join(line_cols)
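As a concrete sketch of that string-only approach (in.csv and out.csv are placeholder names, and the '_' split mirrors the sample data in the question; note that a naive split(',') breaks if a field contains a quoted comma):

# Plain-string version of the same cleanup: no csv module, just split/join.
with open('in.csv') as src, open('out.csv', 'w') as dst:
    header = next(src).strip()
    dst.write(header + ',"F"\n')              # pass header through, adding the new column
    for line in src:
        line = line.strip()
        if not line:
            continue                          # remove blank rows
        cols = line.split(',')
        part1, _, part2 = cols[1].strip('"').partition('_')
        if not part1.isdigit():
            continue                          # drop rows where column B isn't numeric
        cols[1] = '"%s"' % part1              # make column B just the numeric part
        cols.append('"%s"' % part2)           # exploded second part goes in the new column
        dst.write(','.join(cols) + '\n')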