Read CSV with comma as linebreak - python

I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
import csv

pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)
with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The output of print(reader) looks exactly how I want (or at least it's a format I can process further).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new csv file. However the output looks like
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader,delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append( float(row[0]) )
        y.append( float(row[1]) )
    print(x)
    print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,

The right way with csv module, without replacing and casting to float:
import csv
with open('file.csv', 'r') as f, open('filenew.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out, quotechar=None)
    for r in reader:
        for i in r:
            writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects treat the comma as the default delimiter, so there is no need for file.read().replace(',', '\n')
quotechar=None is specified for csv.writer object to eliminate double quotes around the values being saved
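The conversion can be tried without touching the disk. The following is a minimal sketch that assumes the one-line input from the question and uses io.StringIO in place of real files; the helper name convert is made up for illustration. With the default dialect, the fields already reach the writer without their double quotes.

```python
import csv
import io

def convert(text):
    """Turn '"400":0.1,"401":0.2' into one key,value row per record."""
    source = io.StringIO(text, newline='')
    target = io.StringIO(newline='')
    writer = csv.writer(target, lineterminator='\n')
    for record in csv.reader(source):
        for field in record:
            # The reader has already dropped the double quotes,
            # so each field looks like 400:0.1 at this point.
            writer.writerow(field.split(':'))
    return target.getvalue()

print(convert('"400":0.1,"401":0.2,"402":0.3'))
```

This prints the three rows 400,0.1, 401,0.2 and 402,0.3, each on its own line.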

You need to split the values to form a list to represent a row. Presently the code is splitting the string into individual characters to represent the row.
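import csv

pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        # Records are separated by commas; key and value by a colon.
        text_rows = old_file.read().strip().split(",")
        for row in text_rows:
            items = row.split(":")
            # Strip the surrounding double quotes before converting the key.
            csv_writer.writerow([int(items[0].strip('"')), items[1]])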

If you look at the documentation for writerow, it says:
Write the row parameter to the writer's file object, formatted according to the current dialect.
But in your code you are passing an entire string:
csv_writer.writerow(reader)
because reader is a string at this point, so each character is treated as a separate field.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.
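The difference between passing a string and passing a list to writerow() can be seen in a small in-memory sketch. The helper write_rows and the sample values are made up for illustration.

```python
import csv
import io

def write_rows(rows):
    """Write each item of rows with csv.writer and return the CSV text."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, lineterminator='\n')
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()

# A string is itself a sequence, so every character becomes a field.
as_string = write_rows(['400:0.1'])
# A list of values gives one field per value.
as_list = write_rows([['400', '0.1']])

print(as_string)  # 4,0,0,:,0,.,1
print(as_list)    # 400,0.1
```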

Related

Adding custom delimiters back to a csv?

Currently, I take in a csv file using custom delimiters, "|". I then read it in and modify it using the code below:
import csv
ChangedDate = '2018-10-31'
firstfile = open('example.csv',"r")
firstReader = csv.reader(firstfile, delimiter='|')
firstData = list(firstReader)
outputFile = open("output.csv","w")
iteration = 0
for row in firstData:
    firstData[iteration][25] = ChangedDate
    iteration+=1
outputwriter = csv.writer(open("output.csv","w"))
outputwriter.writerows(firstData)
outputFile.close()
However, when I write the rows to my output file, they are comma separated. This is a problem because I am dealing with large financial data, so commas appear naturally, as in $8,000.00, hence the "|" delimiters of the original file. Is there a way to "re-delimit" my list before I write it to an output file?
You can pass the delimiter to csv.writer:
with open("output.csv", "w", newline="") as f:
    outputwriter = csv.writer(f, delimiter='|')
    outputwriter.writerows(firstData)
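As a rough illustration of the full round trip, the sketch below reads pipe-delimited text, overwrites one column and writes it back with the same delimiter. The function name set_column and the sample rows are hypothetical, and the data is held in memory rather than in example.csv.

```python
import csv
import io

def set_column(text, index, value, delimiter='|'):
    """Read delimited text, overwrite one column in every row, write it back."""
    rows = list(csv.reader(io.StringIO(text, newline=''), delimiter=delimiter))
    for row in rows:
        row[index] = value
    out = io.StringIO(newline='')
    csv.writer(out, delimiter=delimiter, lineterminator='\n').writerows(rows)
    return out.getvalue()

# Made-up sample with commas inside the amounts.
sample = 'ACME|$8,000.00|2018-01-01\nINIT|$1,250.50|2018-02-01\n'
print(set_column(sample, 2, '2018-10-31'))
```

Because the comma is no longer the delimiter, amounts such as $8,000.00 are written unquoted and unchanged.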

how can I use csv tools for zip text file?

Update: my file.txt.zp is tab delimited and looks kind of like this:
[image: file.txt.zp]
I want to split the first column by : _ /
Original post:
I have a very large zipped tab delimited file.
I want to open it, scan it one row at a time, split some of the columns, and write the result to a new file.
I got various errors (every time I fix one, another pops up).
This is my code:
import csv
import re
import gzip
f = gzip.open('file.txt.gz')
original = f.readlines()
f.close()
original_l = csv.reader(original)
for row in original_l:
    file_l = re.split('_|:|/',row)
    with open ('newfile.gz', 'w', newline='') as final:
        finalfile = csv.writer(final,delimiter = ' ')
        finalfile.writerow(file_l)
Thanks!
For this code I got the error:
for row in original_l:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
so based on what I found here I added this after f.close():
original = original.decode('utf8')
and then got the error:
original = original.decode('utf8')
AttributeError: 'list' object has no attribute 'decode'
Update 2
This code should produce the output that you're after.
import csv
import gzip
import re
with gzip.open('file.txt.gz', mode='rt') as f, \
        open('newfile.gz', 'w', newline='') as final:
    writer = csv.writer(final, delimiter=' ')
    reader = csv.reader(f, delimiter='\t')
    _ = next(reader) # skip header row
    for row in reader:
        # Only the first column is split on _ : /
        writer.writerow(re.split(r'_|:|/', row[0]))
Update
Open the gzip file in text mode because str objects are required by the CSV module in Python 3.
f = gzip.open('file.txt.gz', 'rt')
Also specify the delimiter when creating the csv.reader.
original_l = csv.reader(original, delimiter='\t')
This will get you past the first hurdle.
Now you need to explain what the data is, which columns you wish to extract, and what the output should look like.
Original answer follows...
One obvious problem is that the output file is constantly being overwritten by the next row of input. This is because the output file is opened in (over)write mode ('w') once per row.
It would be better to open the output file once outside of the loop.
Also, the CSV file delimiter is not specified when creating the reader. You said that the file is tab delimited so specify that:
original_l = csv.reader(original, delimiter='\t')
On the other hand, your code attempts to split each row using other delimiters, however, the rows coming from the csv.reader are represented as a list, not a string as the re.split() code would require.
Another problem is that the output file is not zipped as the name suggests.
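If the output really should be compressed, gzip.open can also be used for writing in text mode. The sketch below is one way to do that under a few assumptions: the input has no header row, only the first column is wanted, and the sample rows are invented. The function name split_first_column is made up for illustration.

```python
import csv
import gzip
import os
import re
import tempfile

def split_first_column(src_path, dst_path):
    """Split column 0 of a gzipped TSV on _ : / and write a gzipped result."""
    with gzip.open(src_path, 'rt', newline='') as src, \
            gzip.open(dst_path, 'wt', newline='') as dst:
        reader = csv.reader(src, delimiter='\t')
        writer = csv.writer(dst, delimiter=' ', lineterminator='\n')
        for row in reader:
            writer.writerow(re.split(r'[_:/]', row[0]))

# Hypothetical sample data, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'file.txt.gz')
    dst_path = os.path.join(tmp, 'newfile.gz')
    with gzip.open(src_path, 'wt', newline='') as f:
        f.write('chr1:100_A/T\tx\nchr2:200_G/C\ty\n')

    split_first_column(src_path, dst_path)
    with gzip.open(dst_path, 'rt') as f:
        result = f.read()

print(result)
```

Both files are opened once, outside the loop, so rows are appended to the output instead of overwriting it.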

Reading data from one CSV and displaying parsed data on to another CSV file

I am very new to Python. I am trying to read a CSV file and write the result to another CSV file, keeping only selected rows from the input. Below is the code I have written so far. It reads every row from the input file, 1.csv, and writes it to an output file, out.csv. How can I change it so that the output file contains only those rows that start with READ in column 8 and are not equal to 0000 in column 10? Both conditions need to be met. Also, this block of code handles a single CSV file. Can anyone please tell me how I can do it for, say, 10000 CSV files? Finally, when I execute the code, I can see spaces between the lines in out.csv. How can I remove those spaces?
import csv
f1 = open("1.csv", "r")
reader = csv.reader(f1)
header = reader.next()
f2 = open("out.csv", "w")
writer = csv.writer(f2)
writer.writerow(header)
for row in reader:
    writer.writerow(row)
f1.close()
f2.close()
Something like:
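import os
import csv
import glob

class CSVReadWriter(object):
    def munge(self, filename, suffix):
        # 'data.csv' with suffix '_out' becomes 'data_out.csv'
        name, ext = os.path.splitext(filename)
        return '{0}{1}{2}'.format(name, suffix, ext)

    def is_valid(self, row):
        # Keep rows that start with READ in column 8 and are not 0000 in column 10.
        # Indices are taken from the question; shift them by one if your
        # columns are counted from 1.
        return row[8].startswith('READ') and row[10] != '0000'

    def filter_csv(self, fin, fout):
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # header
        for row in reader:
            if self.is_valid(row):
                writer.writerow(row)

    def read_write(self, iname, suffix):
        # In Python 3, newline='' replaces the 'rb'/'wb' modes used with csv in Python 2.
        with open(iname, newline='') as fin:
            oname = self.munge(iname, suffix)
            with open(oname, 'w', newline='') as fout:
                self.filter_csv(fin, fout)

work_directory = r"C:\Temp\Data"
# Match every CSV file in the directory, not the directory itself.
for filename in glob.glob(os.path.join(work_directory, '*.csv')):
    csvrw = CSVReadWriter()
    csvrw.read_write(filename, '_out')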
I've made it a class so that you can override the munge and is_valid methods to suit different cases. Being a class also means that you can store state better, for example if you wanted to output lines between certain criteria.
The extra spaces between lines that you mention come from the \r\n (carriage return and line feed) line endings. Opening the output file with 'wb' in Python 2, or with newline='' in Python 3, should resolve it.
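The line-ending effect can be reproduced on any platform. This is a minimal sketch assuming Python 3, where newline='' plays the role that 'wb' played in Python 2. The helper written_bytes is made up for illustration, and newline='\r\n' is used to mimic the default text-mode translation on Windows.

```python
import csv
import os
import tempfile

def written_bytes(newline):
    """Write one CSV row with the given newline setting and return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        with open(path, 'w', newline=newline) as f:
            csv.writer(f).writerow(['a', 'b'])
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)

# csv.writer ends rows with \r\n; translating the \n again yields \r\r\n.
print(written_bytes('\r\n'))  # b'a,b\r\r\n'
# With newline='' nothing is translated, so the row ends with a single \r\n.
print(written_bytes(''))      # b'a,b\r\n'
```

The doubled \r in the first result is what many programs display as an empty line between rows.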

re.sub for a csv file

I am receiving an error on this code. It is "TypeError: expected string or buffer". I looked around and found out that the error occurs because I am passing re.sub a list, and it does not take lists. However, I wasn't able to figure out how to change my line from the csv file into something that it would read.
I am trying to change all the periods in a csv file into commas. Here is my code:
import csv
import re
in_file = open("/test.csv", "rb")
reader = csv.reader(in_file)
out_file = open("/out.csv", "wb")
writer = csv.writer(out_file)
for row in reader:
    newrow = re.sub(r"(\.)+", ",", row)
    writer.writerow(newrow)
in_file.close()
out_file.close()
I'm sorry if this has already been answered somewhere. There were certainly a lot of answers regarding this error, but I couldn't make any of them work with my csv file. Also, as a side note, this was originally an .xlsb Excel file that I converted into csv in order to be able to work with it. Was that necessary?
You could use a list comprehension to apply your substitution to each item in row:
for row in reader:
    newrow = [re.sub(r"(\.)+", ",", item) for item in row]
    writer.writerow(newrow)
for row in reader does not return a single string to parse. It returns a list of the elements in that row, so you have to go through that list and process each item individually, just as @Trii showed you:
[re.sub(r'(\.)+', ',', s) for s in row]
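The per-item substitution can be tried on a plain list, with no file involved. In this small sketch the function name periods_to_commas and the sample row are made up for illustration.

```python
import re

def periods_to_commas(row):
    """Replace each run of periods with a single comma in every field."""
    return [re.sub(r"\.+", ",", item) for item in row]

print(periods_to_commas(["1.5", "a..b", "plain"]))
# ['1,5', 'a,b', 'plain']
```

When such a row is passed to csv.writer with the default comma delimiter, fields that now contain a comma are quoted on output, so the file stays parseable.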
In this case, we are using glob to access all the csv files in the directory.
The code below overwrites the source csv file, so there is no need to create an output file.
NOTE:
If you would rather write the result of re.sub to a second file, replace open(i, 'w') with open('secondFile.csv', 'w').
import re
import glob

for i in glob.glob("*.csv"):
    # Read the whole file as text before reopening it for writing.
    with open(i, 'r') as read:
        reader = read.read()
    csvRe = re.sub(r"(\.)+", ",", reader)
    with open(i, 'w') as write:
        write.write(csvRe)

How to read csv on python with newline separator #

I need to read a csv on Python, and the text file that I have has this structure:
"114555","CM13","0004","0","C/U"#"99172","CM13","0001","0","C/U"#"178672","CM13","0001","0","C/U"
delimiter: ,
newline: #
My code so far:
import csv
data = []
with open('stock.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', lineterminator='#')
    for row in reader:
        data.append({'MATERIAL': row[0],'CENTRO': row[1], 'ALMACEN': row[2], 'STOCK_VALORIZADO' : row[3], 'STOCK_UMB':row[4]})
print(data) #this print just one row
This code only prints one row, because it does not recognize # as a newline,
and it prints the next value with its quotes:
[{'MATERIAL': '114555', 'CENTRO': 'CM13', 'ALMACEN': '0004', 'STOCK_VALORIZADO': '0', 'STOCK_UMB': 'C/U#"99172"'}]
According to https://docs.python.org/2/library/csv.html :
"The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future." Hence for now, providing the argument lineterminator='#' will not work.
I think the best option is to read your entire file into a variable and replace all '#' characters. You can do this as follows:
with open("stock.csv", "r") as myfile:
    data = myfile.read().replace('#', '\n')
Now adjust your code so that you pass the variable data to csv.reader (instead of the file stock.csv). According to the Python documentation:
"The "iterable" argument can be any object that returns a line of input for each iteration, such as a file object or a list. [...]"
Hence you can pass data.splitlines() to csv.reader.
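Put together, the approach might look like the sketch below. It works on a string instead of stock.csv and assumes that '#' never occurs inside a field. The function name parse_hash_terminated is made up for illustration.

```python
import csv

def parse_hash_terminated(text):
    """Parse CSV text whose records end with '#' instead of a newline."""
    lines = text.replace('#', '\n').splitlines()
    return list(csv.reader(lines, delimiter=','))

raw = '"114555","CM13","0004","0","C/U"#"99172","CM13","0001","0","C/U"'
for row in parse_hash_terminated(raw):
    print(row)
```

Each record now comes back as its own list of five unquoted values.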
I was struggling with CRLF ('\r\n') line endings using csv.reader. I was able to get it working using the newline parameter in open
with open(local_file, 'r', newline='\r\n') as f:
    reader = csv.reader(f)
