I have a CSV file which can look like this:
Col1,"Col2","Col31
Col32
Col33"
That means if I use this code:
import csv

with open(inpPath, "r+") as csvFile:
    wb_obj = list(csv.reader(csvFile, delimiter=','))
    for row in wb_obj:
        print(row)
The output looks like this:
['Col1', 'Col2', 'Col31\nCol32\nCol33']
So I am trying to replace the \n characters with spaces so the CSV file would be rewritten like this: Col1,"Col2","Col31 Col32 Col33"
I have written this short function, but it results in the error: Cannot open mapping csv, Exception thrown: I/O operation on closed file.
def processCSV(fileName):
    with open(fileName, "rU") as csvFile:
        filtered = (line.replace('\n', ' ') for line in csvFile)
        wb_obj = csv.reader(filtered, delimiter=",")
        return wb_obj
How could I fix that? Thank you very much for any help
Your processCSV function returns an iterable based on the file object csvFile, yet by the time the caller consumes that iterable, csvFile has already been closed by the context manager, because the consumption happens outside the with statement; hence the error. The common pattern for such a function is to make it accept a file object as a parameter and let the caller open the file, so that the file remains open while the caller iterates over the result.
You also should not replace all newlines with spaces to begin with, since you really only want to replace newlines if they are within double quotes, which would be parsed by csv.reader as part of a column value rather than row separators. Instead, you should let csv.reader do its parsing first, and then replace newline characters with spaces in all column values:
def processCSV(file):
    return ([col.replace('\n', ' ') for col in row] for row in csv.reader(file))
# the caller
with open(filename) as file:
    for row in processCSV(file):
        print(*row, sep=',')
This comes down to using a generator expression inside of the file context. See a longer explanation here: https://stackoverflow.com/a/39656712/15981783.
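To see the pitfall in isolation, here is a minimal sketch (the file name mapping.csv is just a placeholder) of a function with the same flaw; consuming its result after the with block raises the same error:

import csv

def broken(fileName):
    with open(fileName) as csvFile:
        # the generator below is returned unconsumed, and the file closes here
        return csv.reader(line.replace('\n', ' ') for line in csvFile)

rows = broken("mapping.csv")
next(rows)  # ValueError: I/O operation on closed file.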
You can change the generator expression to a list comprehension and it will work:
import csv

def processCSV(fileName):
    with open(fileName, "r") as csvFile:
        # list comprehension used here instead
        filtered = [line.replace('\n', ' ') for line in csvFile]
        wb_obj = csv.reader(filtered, delimiter=",")
        return wb_obj

print(list(processCSV("tmp.csv")))
Returns:
[['Col1', 'Col2', 'Col31 Col32 Col33']]
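If you also want to rewrite the CSV file with the cleaned values, as the question describes, a minimal sketch (reading tmp.csv as above; the output name tmp_clean.csv is an assumption) could feed the cleaned rows to csv.writer:

import csv

with open("tmp.csv", newline='') as src, open("tmp_clean.csv", "w", newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # replace embedded newlines in each column value before writing
        writer.writerow([col.replace('\n', ' ') for col in row])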
Related
I have a text file data.txt that contains 2 rows of text.
first_row_1 first_row_2 first_row_3
second_row_1 second_row_2 second_row_3
I would like to read the second row of the text file and convert its contents into a list of strings in Python. The list should look like this:
txt_list_str=['second_row_1','second_row_2','second_row_3']
Here is my attempted code:
import csv
with open('data.txt', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)
    row2 = next(reader)
    my_list = row2.split(" ")
I got the error AttributeError: 'list' object has no attribute 'split'
I am using python v3.
EDIT: Thanks for all the answers. I am sure all of them works. But can someone tell me what is wrong with my own attempted code? Thanks.
The reason your code doesn't work is you are trying to use split on a list, but it is meant to be used on a string. Therefore in your example you would use row2[0] to access the first element of the list.
my_list = row2[0].split(" ")
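Putting that fix into the attempted code from the question gives a minimal working sketch:

import csv

with open('data.txt', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)                # first row, not needed here
    row2 = next(reader)                # a one-element list like ['second_row_1 second_row_2 second_row_3']
    txt_list_str = row2[0].split(" ")  # split the single string into items

print(txt_list_str)
# ['second_row_1', 'second_row_2', 'second_row_3']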
Alternatively, if you have access to the numpy library you can use loadtxt.
import numpy as np
f = np.loadtxt("data.txt", dtype=str, skiprows=1)
print(f)
# ['second_row_1' 'second_row_2' 'second_row_3']
The result of this is an array as opposed to a list. You could simply cast the array to a list if you require one:
print(list(f))
#['second_row_1', 'second_row_2', 'second_row_3']
Open the file with the built-in open function to get a file object. E.g.
>>> fp = open('temp.txt')
The file object is an iterator over its lines, so you can advance it with the next method; call it once to skip the first line.
>>> next(fp)
'first_row_1 first_row_2 first_row_3\n'
Call next again to get the second line into a variable.
>>> second_line = next(fp)
>>> second_line
'second_row_1 second_row_2 second_row_3'
Use the string split method to get the items as a list. split takes an optional separator argument; if it is omitted, the string is split on whitespace.
>>> second_line.split()
['second_row_1', 'second_row_2', 'second_row_3']
And finally, close the file.
>>> fp.close()
Note: There are a number of ways to get the desired output, but you should make your own attempt first, as DavidG said in a comment.
with open("file.txt", "r") as f:
next(f) # skipping first line; will work without this too
for line in f:
txt_list_str = line.split()
print(txt_list_str)
Output
['second_row_1', 'second_row_2', 'second_row_3']
I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
import csv

pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)

with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The output of print(reader) looks exactly how I want it (or at least it's a format I can process further).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new csv file. However the output looks like
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing it later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader, delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append(float(row[0]))
        y.append(float(row[1]))
print(x)
print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,
The right way with the csv module, without the replace step or the float casts:
import csv
with open('file.csv', 'r') as f, open('filenew.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out, quotechar=None)
    for r in reader:
        for i in r:
            writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects treat comma , as default delimiter (no need to file.read().replace(',', '\n'))
quotechar=None is specified for csv.writer object to eliminate double quotes around the values being saved
You need to split the values to form a list to represent a row. Presently the code is splitting the string into individual characters to represent the row.
import csv

pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        text_rows = old_file.read().split(",")
        for row in text_rows:
            items = row.split(":")
            # strip the literal double quotes before converting to int
            csv_writer.writerow([int(items[0].strip('"')), items[1].strip()])
If you look at the documentation for writerow, it says:
Write the row parameter to the writer’s file
object, formatted according to the current dialect.
But you are writing an entire string in your code:
csv_writer.writerow(reader)
because reader is a string at this point.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.
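For example, a minimal sketch of such preprocessing (file names are taken from the question; the 400,0.1 layout follows the format suggested there):

import csv

with open(r"C:\pathtofile\file.csv") as f:
    # build a list of lists: split on commas for rows, then on ':' for columns
    rows = [pair.split(':') for pair in f.read().strip().split(',')]

with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as out:
    writer = csv.writer(out)
    for row in rows:
        # strip the literal quotes around the key, e.g. '"400"' -> '400'
        writer.writerow([row[0].strip('"'), row[1]])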
I am receiving an error on this code. It is "TypeError: expected string or buffer". I looked around and found out that the error occurs because I am passing re.sub a list, and it does not take lists. However, I wasn't able to figure out how to change my line from the csv file into something that it would read.
I am trying to change all the periods in a csv file into commas. Here is my code:
import csv
import re
in_file = open("/test.csv", "rb")
reader = csv.reader(in_file)
out_file = open("/out.csv", "wb")
writer = csv.writer(out_file)
for row in reader:
    newrow = re.sub(r"(\.)+", ",", row)
    writer.writerow(newrow)
in_file.close()
out_file.close()
I'm sorry if this has already been answered somewhere. There was certainly a lot of answers regarding this error, but I couldn't make any of them work with my csv file. Also, as a side note, this was originally an .xslb excel file that I converted into csv in order to be able to work with it. Was that necessary?
You could use a list comprehension to apply your substitution to each item in row:
for row in reader:
    newrow = [re.sub(r"(\.)+", ",", item) for item in row]
    writer.writerow(newrow)
for row in reader does not return a single element to parse; rather, it returns a list of the elements in that row, so you have to unpack that list and parse each item individually, just like @Trii showed you:
[re.sub(r"(\.)+", ",", s) for s in row]
In this case, we are using glob to access all the csv files in the directory.
The code below overwrites the source csv file, so there is no need to create an output file.
NOTE:
If you want to get a second file with the substitutions made by re.sub, replace write = open(i, 'w') with write = open('secondFile.csv', 'w')
import re
import glob
for i in glob.glob("*.csv"):
    read = open(i, 'r')
    reader = read.read()
    csvRe = re.sub(r"(\.)+", ",", reader)
    write = open(i, 'w')
    write.write(csvRe)
    read.close()
    write.close()
I need to read a csv in Python, and the text file that I have has this structure:
"114555","CM13","0004","0","C/U"#"99172","CM13","0001","0","C/U"#"178672","CM13","0001","0","C/U"
delimiter: ,
newline: #
My code so far:
import csv
data = []
with open('stock.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', lineterminator='#')
    for row in reader:
        data.append({'MATERIAL': row[0], 'CENTRO': row[1], 'ALMACEN': row[2], 'STOCK_VALORIZADO': row[3], 'STOCK_UMB': row[4]})
print(data)  # this prints just one row
This code prints only one row, because csv.reader does not recognize # as a newline, and the affected field keeps the # and quotes:
[{'MATERIAL': '114555', 'CENTRO': 'CM13', 'ALMACEN': '0004', 'STOCK_VALORIZADO': '0', 'STOCK_UMB': 'C/U#"99172"'}]
According to https://docs.python.org/2/library/csv.html :
"The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future." Hence for now, providing the argument lineterminator='#' will not work.
I think the best option is to read your entire file into a variable and replace all '#' characters. You can do this as follows:
with open("stock.csv", "r") as myfile:
data = myfile.read().replace('#', '\n')
Now you need to adjust your algorithm in such a way that you can pass the variable data to csv.reader (instead of the file stock.csv), according to the python doc:
"The "iterable" argument can be any object that returns a line
of input for each iteration, such as a file object or a list. [...]"
Hence you can pass data.splitlines() to csv.reader.
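Putting this together with the dictionary-building loop from the question, a minimal sketch:

import csv

data = []
with open('stock.csv') as myfile:
    content = myfile.read().replace('#', '\n')

# content.splitlines() gives csv.reader one logical line per record
for row in csv.reader(content.splitlines(), delimiter=','):
    data.append({'MATERIAL': row[0], 'CENTRO': row[1], 'ALMACEN': row[2],
                 'STOCK_VALORIZADO': row[3], 'STOCK_UMB': row[4]})

print(data)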
I was struggling with CRLF ('\r\n') line endings using csv.reader. I was able to get it working using the newline parameter in open
with open(local_file, 'r', newline='\r\n') as f:
    reader = csv.reader(f)
I'm trying to remove the last row in a csv, but I'm getting an error: _csv.Error: string with NUL byte
This is what I have so far:
import csv

dcsv = open('PnL.csv', 'a+r+b')
cWriter = csv.writer(dcsv, delimiter=' ')
cReader = csv.reader(dcsv)
for row in cReader:
    cWriter.writerow(row[:-1])
I can't figure out why I keep getting errors.
I would just read in the whole file with readlines(), pop off the last row, and then write the rest back with the csv module.
import csv

f = open("summary.csv", "r+")
lines = f.readlines()
lines = lines[:-1]  # drop the last row

f.seek(0)
f.truncate()  # rewind and clear the file before rewriting it
cWriter = csv.writer(f, delimiter=',')
for line in lines:
    cWriter.writerow(line.strip().split(','))
f.close()
This should work
f = open('Pnl.csv', "r+")
lines = f.readlines()
f.close()

lines.pop()  # remove the last row

f = open('Pnl.csv', "w+")
f.writelines(lines)
f.close()
I'm not sure what you're doing with the 'a+r+b' file mode and reading and writing to the same file, so I won't provide a complete code snippet, but here's a simple method to skip any lines that contain a NUL byte in a file you're reading, whether it's the last line, the first, or one in the middle.
The trick is to realize that the docs say the csvfile argument to csv.reader() "can be any object which supports the iterator protocol and returns a string each time its next() method is called." This means that you can replace the file argument in the call with a simple filter iterator function defined this way:
def filter_nul_byte_lines(a_file):
    for line in a_file:
        if '\x00' not in line:
            yield line
and use it in a way similar to this:
import csv

dcsv = open('Pnl.csv', 'r')
cReader = csv.reader(filter_nul_byte_lines(dcsv))
for row in cReader:
    print(row)
This will cause any lines with a NUL byte in them to be ignored while reading the file. Also this technique works on-the-fly as each line is read, so it does not require reading the entire file into memory at once or preprocessing it ahead of time.
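To then also drop the last row, as the original question asks, here is a minimal sketch building on the filter above (writing to a separate output file named PnL_trimmed.csv is an assumption):

import csv

with open('PnL.csv') as src, open('PnL_trimmed.csv', 'w', newline='') as dst:
    rows = list(csv.reader(filter_nul_byte_lines(src)))
    csv.writer(dst).writerows(rows[:-1])  # write everything except the last row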