I'm trying to remove the last row in a csv but I'm getting an error: _csv.Error: string with NUL byte
This is what I have so far:
dcsv = open('PnL.csv' , 'a+r+b')
cWriter = csv.writer(dcsv, delimiter=' ')
cReader = csv.reader(dcsv)
for row in cReader:
    cWriter.writerow(row[:-1])
I can't figure out why I keep getting errors.
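I would just read in the whole file with readlines(), pop off the last row, and then write the rest with the csv module:
import csv

with open("summary.csv", newline="") as f:
    lines = f.readlines()

lines = lines[:-1]  # drop the last row

with open("summary.csv", "w", newline="") as f:
    cWriter = csv.writer(f, delimiter=',')
    # Parse each remaining line into fields so the writer gets lists, not raw strings
    cWriter.writerows(csv.reader(lines))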
This should work:
with open('Pnl.csv') as f:
    lines = f.readlines()
lines.pop()  # drop the last row
with open('Pnl.csv', 'w') as f:
    f.writelines(lines)
I'm not sure what you're doing with the 'a+r+b' file mode and reading and writing to the same file, so I won't provide a complete code snippet, but here's a simple method to skip any lines that contain a NUL byte in a file you're reading, whether it's the last, first, or one in the middle.
The trick is to realize that the docs say the csvfile argument to csv.reader() "can be any object which supports the iterator protocol and returns a string each time its next() method is called." This means that you can replace the file argument in the call with a simple filter iterator function defined this way:
def filter_nul_byte_lines(a_file):
    for line in a_file:
        if '\x00' not in line:
            yield line
and use it in a way similar to this:
dcsv = open('Pnl.csv', 'rb+')
cReader = csv.reader(filter_nul_byte_lines(dcsv))
for row in cReader:
    print row
This will cause any lines with a NUL byte in them to be ignored while reading the file. Also this technique works on-the-fly as each line is read, so it does not require reading the entire file into memory at once or preprocessing it ahead of time.
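For anyone on Python 3, here is a minimal sketch of the same filter. It uses an in-memory io.StringIO in place of the real file, and the sample data is made up purely for illustration:

```python
import csv
import io


def filter_nul_byte_lines(a_file):
    """Yield only the lines that do not contain a NUL character."""
    for line in a_file:
        if '\x00' not in line:
            yield line


# Hypothetical sample data: the second line contains a NUL character.
sample = io.StringIO("a,b,c\n1,\x002,3\nx,y,z\n")

# The generator is consumed lazily, one line at a time, by csv.reader.
rows = list(csv.reader(filter_nul_byte_lines(sample)))
print(rows)
```

With a real file you would pass a file object opened in text mode (for example with newline='') instead of the StringIO stand-in.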
I'm trying to write code that, if the csv file has more than 3 lines, edits it to delete the first line.
I want to delete the first line from the existing file instead of saving it as a new file.
For this reason, I had to delete the existing file and create a file with the same name, but only one line is saved and the commas disappear.
I'm using a Pandas data frame, but I'd rather not use it if it isn't needed.
The function names might be weird because I'm a beginner.
Thanks.
file = open("./csv/selllist.csv", encoding="ANSI")
reader = csv.reader(file)
lines= len(list(reader))
if lines > 3:
    df = pd.read_csv('./csv/selllist.csv', 'r+', encoding="ANSI")
    dfa = df.iloc[1:]
    print(dfa)
    with open("./csv/selllist.csv", 'r+', encoding="ANSI") as x:
        x.truncate(0)
    with open('./csv/selllist.csv', 'a', encoding="ANSI", newline='') as fo:
        # Pass the CSV file object to the writer() function
        wo = writer(fo)
        # Result - a writer object
        # Pass the data in the list as an argument into the writerow() function
        wo.writerow(dfa)
        # Close the file object
        fo.close()
print()
This is the type of csv file I deal with
string, string, string, string, string
string, string, string, string, string
string, string, string, string, string
string, string, string, string, string
Take a 2-step approach.
Open the file for reading and count the number of lines. If there are more than 3 lines, re-open the file (for writing) and update it.
For example:
lines = []
with open('./csv/selllist.csv') as csv:
    lines = csv.readlines()
if len(lines) > 3:
    with open('./csv/selllist.csv', 'w') as csv:
        for line in lines[1:]:  # skip first line
            csv.write(line)
With pandas, you can pass header=None when reading and header=False when writing, so no row is treated as (or written as) a header:
import pandas as pd

df = pd.read_csv("data.csv", header=None)
if len(df) > 3:
    df.iloc[1:].to_csv("data.csv", header=False, index=False)
With the csv module:
import csv
with open("data.csv") as infile:
    reader = csv.reader(infile)
    lines = list(reader)
if len(lines) > 3:
    with open("data.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter=",")
        writer.writerows(lines[1:])
With one open call and using seek and truncate.
Setup
out = """\
Look at me
I'm a file
for sure
4th line, woot!"""
with open('filepath.csv', 'w') as fh:
    fh.write(out)
Solution
I'm aiming to minimize the stuff I'm doing. I'll only open one file and only one time. I'll only split one time.
with open('filepath.csv', 'r+') as csv:
    top, rest = csv.read().split('\n', 1)  # Only necessary to pop off first line
    if rest.count('\n') > 1:               # If 4 or more lines, there will be at
                                           # least two more newline characters
        csv.seek(0)                        # Once we're done reading, we need to
                                           # go back to beginning of file
        csv.truncate()                     # truncate to reduce size of file as well
        csv.write(rest)
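One thing to watch with counting newline characters is that a file ending in a trailing newline has one more '\n' than a file without it, so a 3-line file with a trailing newline would also pass the check. Below is a rough variation that counts lines with str.splitlines instead. The function name drop_first_line, its min_lines parameter, and the temporary demo path are my own assumptions, not part of the answer above:

```python
import os
import tempfile


def drop_first_line(path, min_lines=4):
    """Remove the first line of `path` in place if it has at least `min_lines` lines.

    Returns True when the file was rewritten, False otherwise.
    """
    with open(path, 'r+') as fh:
        # keepends=True keeps each line's own newline, so nothing is lost on rewrite
        lines = fh.read().splitlines(keepends=True)
        if len(lines) < min_lines:
            return False
        fh.seek(0)                # go back to the start before rewriting
        fh.writelines(lines[1:])  # write everything except the first line
        fh.truncate()             # cut off the leftover tail of the old content
        return True


# Demo on a throwaway file (generated path, used only for this example).
demo_path = os.path.join(tempfile.mkdtemp(), 'filepath.csv')
with open(demo_path, 'w') as fh:
    fh.write("Look at me\nI'm a file\nfor sure\n4th line, woot!\n")

changed = drop_first_line(demo_path)
with open(demo_path) as fh:
    remaining = fh.read()
print(changed, repr(remaining))
```

It still opens the file only once; the difference is that the line count no longer depends on whether the last line ends with a newline.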
I'm trying to have output to be without commas, and separate each line into two strings and print them.
My code so far yields:
173,70
134,63
122,61
140,68
201,75
222,78
183,71
144,69
But I'd like it to print without the comma, with the two values on each line separated as strings.
if __name__ == '__main__':
    # Complete main section of code
    file_name = "data.txt"
    # Open the file for reading here
    my_file = open('data.txt')
    lines = my_file.read()
    with open('data.txt') as f:
        for line in f:
            lines.split()
            lines.replace(',', ' ')
    print(lines)
In your sample code, lines contains the full content of the file as a str.
my_file = open('data.txt')
lines = my_file.read()
You then later re-open the file to iterate the lines:
with open('data.txt') as f:
    for line in f:
        lines.split()
        lines.replace(',', ' ')
Note, however, str.split and str.replace do not modify the existing value, as strs in python are immutable. Also note you are operating on lines there, rather than the for-loop variable line.
Instead, you'll need to assign the result of those functions into new values, or give them as arguments (E.g., to print). So you'll want to open the file, iterate over the lines and print the value with the "," replaced with a " ":
with open("data.txt") as f:
    for line in f:
        print(line.replace(",", " "))
Or, since you are operating on the whole file anyway:
with open("data.txt") as f:
    print(f.read().replace(",", " "))
Or, as your file appears to be CSV content, you may wish to use the csv module from the standard library instead:
import csv
with open("data.txt", newline="") as csvfile:
    for row in csv.reader(csvfile):
        print(*row)
with open('data.txt', 'r') as f:
    for line in f:
        for value in line.strip().split(','):
            print(value)
While Python offers several ways to open files, this is the preferred one. The file is read lazily, one line at a time (which matters especially for large files), and after leaving the with block (the indented block) the file is closed automatically.
Here we open the file in read mode. File objects follow the iterator protocol, so we can iterate over them like lists. Each item is one line of the file, as a string.
After getting the line in the line variable, we split it (see str.split()) into two tokens, one before the comma and one after it. split returns a newly constructed list of strings. To drop unwanted characters, such as the trailing newline, use str.strip(); strip and split are usually combined.
elegant and efficient file reading - method 1
with open("data.txt", 'r') as io:
    for line in io:
        sl = line.strip().split(',')  # now sl is a list of strings
        print("{} {}".format(sl[0], sl[1]))  # format the two values for printing
non elegant, but efficient file reading - method 2
fp = open("data.txt", 'r')
line = fp.readline()
while line != '':  # when line becomes an empty string, EOF has been reached
    sl = line.strip().split(',')
    print("{} {}".format(sl[0], sl[1]))
    line = fp.readline()
fp.close()
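To make the split-and-strip combination described above concrete, here is a small sketch that keeps the two values of each line as separate strings. It uses an io.StringIO stand-in for data.txt, filled with the first three rows from the question:

```python
import io

# Stand-in for data.txt; the values are the first rows shown in the question.
sample = io.StringIO("173,70\n134,63\n122,61\n")

pairs = []
for line in sample:
    # strip() removes the trailing newline, split(',') separates the two values
    first, second = line.strip().split(',')
    pairs.append((first, second))
    print(first, second)
```

Unpacking into two names assumes every line has exactly one comma; a line with a different shape would raise a ValueError.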
update-my file.txt.zp is tab delimited and looks kind of like this :
file.txt.zp
I want to split the first col by : _ /
original post:
I have a very large zipped tab delimited file.
I want to open it, scan it one row at a time, split some of the col, and write it to a new file.
I got various errors (every time I fix one another pops)
This is my code:
import csv
import re
import gzip
f = gzip.open('file.txt.gz')
original = f.readlines()
f.close()
original_l = csv.reader(original)
for row in original_l:
    file_l = re.split('_|:|/',row)
    with open ('newfile.gz', 'w', newline='') as final:
        finalfile = csv.writer(final,delimiter = ' ')
        finalfile.writerow(file_l)
Thanks!
For this code I got the error:
for row in original_l:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
so based on what I found here I added this after f.close():
original = original.decode('utf8')
and then got the error:
original = original.decode('utf8')
AttributeError: 'list' object has no attribute 'decode'
Update 2
This code should produce the output that you're after.
import csv
import gzip
import re
with gzip.open('file.txt.gz', mode='rt') as f, \
        open('newfile.gz', 'w', newline='') as final:
    writer = csv.writer(final, delimiter=' ')
    reader = csv.reader(f, delimiter='\t')
    _ = next(reader)  # skip header row
    for row in reader:
        writer.writerow(re.split(r'_|:|/', row[0]))
Update
Open the gzip file in text mode because str objects are required by the CSV module in Python 3.
f = gzip.open('file.txt.gz', 'rt')
Also specify the delimiter when creating the csv.reader.
original_l = csv.reader(original, delimiter='\t')
This will get you past the first hurdle.
Now you need to explain what the data is, which columns you wish to extract, and what the output should look like.
Original answer follows...
One obvious problem is that the output file is constantly being overwritten by the next row of input. This is because the output file is opened in (over)write mode ('w') once per row.
It would be better to open the output file once outside of the loop.
Also, the CSV file delimiter is not specified when creating the reader. You said that the file is tab delimited so specify that:
original_l = csv.reader(original, delimiter='\t')
On the other hand, your code attempts to split each row using other delimiters, however, the rows coming from the csv.reader are represented as a list, not a string as the re.split() code would require.
Another problem is that the output file is not zipped as the name suggests.
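If the output really should be gzip-compressed, one possible sketch is to open it with gzip.open in text mode as well, and to open both files once, outside the loop. The sample rows, the header line, and the temporary paths below are invented for the demo, since the real file layout isn't shown; keeping the remaining columns unchanged is also my assumption:

```python
import csv
import gzip
import os
import re
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'file.txt.gz')
dst = os.path.join(workdir, 'newfile.gz')

# Hypothetical sample input: tab-delimited, first column contains : _ /
with gzip.open(src, 'wt', newline='') as f:
    f.write("id\tvalue\nchr1:100_A/T\t0.5\nchr2:200_G/C\t0.7\n")

# Text mode ('rt' / 'wt') gives the csv module the str objects it needs.
with gzip.open(src, 'rt', newline='') as fin, \
        gzip.open(dst, 'wt', newline='') as fout:
    reader = csv.reader(fin, delimiter='\t')
    writer = csv.writer(fout, delimiter=' ')
    next(reader)  # skip the (assumed) header row
    for row in reader:
        # Split only the first column; keep the remaining columns as they are.
        writer.writerow(re.split(r'[_:/]', row[0]) + row[1:])

# Read the compressed result back to check it.
with gzip.open(dst, 'rt', newline='') as f:
    result = f.read()
print(result)
```

Because rows are processed one at a time, the whole file never has to be held in memory, which matters for a very large input.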
I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)
with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The print reader output looks exactly how I want (or at least it's a format I can further process).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new csv file. However the output looks like
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader,delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append( float(row[0]) )
        y.append( float(row[1]) )
    print(x)
    print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,
The right way with csv module, without replacing and casting to float:
import csv
with open('file.csv', 'r') as f, open('filenew.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out, quotechar=None)
    for r in reader:
        for i in r:
            writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects treat comma , as default delimiter (no need to file.read().replace(',', '\n'))
quotechar=None is specified for csv.writer object to eliminate double quotes around the values being saved
You need to split the values to form a list to represent a row. Presently the code is splitting the string into individual characters to represent the row.
import csv

pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        text_rows = old_file.read().strip().split(",")
        for row in text_rows:
            items = row.split(":")
            # Strip the surrounding double quotes before converting the key to int
            csv_writer.writerow([int(items[0].strip('"')), items[1]])
If you look at the documentation for writerow, it says:
Write the row parameter to the writer’s file
object, formatted according to the current dialect.
But, you are writing an entire string in your code
csv_writer.writerow(reader)
because reader is a string at this point.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.
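As a rough sketch of that preprocessing, assuming the target format is the 400,0.1 layout mentioned in the question: the data string is copied from the question, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# The single-line content from the question.
raw = '"400":0.1,"401":0.2,"402":0.3'

# Build a list of lists: one [key, value] pair per comma-separated item.
rows = []
for item in raw.split(','):
    key, value = item.split(':')
    rows.append([key.strip('"'), value])

# Write to an in-memory buffer here; a real file opened with newline='' works the same way.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Each sublist becomes one row, which is what writerow() and writerows() expect, rather than a single string that gets split into characters.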
I need to read a csv on Python, and the text file that I have has this structure:
"114555","CM13","0004","0","C/U"#"99172","CM13","0001","0","C/U"#"178672","CM13","0001","0","C/U"
delimiter: ,
newline: #
My code so far:
import csv
data = []
with open('stock.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', lineterminator='#')
    for row in reader:
        data.append({'MATERIAL': row[0],'CENTRO': row[1], 'ALMACEN': row[2], 'STOCK_VALORIZADO' : row[3], 'STOCK_UMB':row[4]})
print(data) #this print just one row
This code only prints one row, because it doesn't recognize # as a newline,
and prints it with quotes:
[{'MATERIAL': '114555', 'CENTRO': 'CM13', 'ALMACEN': '0004', 'STOCK_VALORIZADO': '0', 'STOCK_UMB': 'C/U#"99172"'}]
According to https://docs.python.org/2/library/csv.html :
"The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future." Hence for now, providing the argument lineterminator='#' will not work.
I think the best option is to read your entire file into a variable, and replace all '#' characters, you can do this as follows:
with open("stock.csv", "r") as myfile:
    data = myfile.read().replace('#', '\n')
Now you need to adjust your algorithm in such a way that you can pass the variable data to csv.reader (instead of the file stock.csv), according to the python doc:
"The "iterable" argument can be any object that returns a line
of input for each iteration, such as a file object or a list. [...]"
Hence you can pass data.splitlines() to csv.reader.
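Put together, the idea might look roughly like this. The string is the sample from the question (in practice it would come from reading stock.csv), and using csv.DictReader with explicit fieldnames is my own choice to get the dictionaries directly. Note the assumption that '#' never appears inside a quoted field, since the replace would break such a field apart:

```python
import csv

# The single-line content from the question (normally read from stock.csv).
raw = ('"114555","CM13","0004","0","C/U"#'
       '"99172","CM13","0001","0","C/U"#'
       '"178672","CM13","0001","0","C/U"')

fieldnames = ['MATERIAL', 'CENTRO', 'ALMACEN', 'STOCK_VALORIZADO', 'STOCK_UMB']

# Turn '#' into real line breaks, then hand the list of lines to the csv module.
lines = raw.replace('#', '\n').splitlines()
data = list(csv.DictReader(lines, fieldnames=fieldnames))
print(data)
```

The surrounding double quotes are removed by the csv parser itself, so no extra stripping is needed.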
I was struggling with CRLF ('\r\n') line endings using csv.reader. I was able to get it working using the newline parameter in open
with open(local_file, 'r', newline='\r\n') as f:
    reader = csv.reader(f)
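For comparison, the csv module documentation recommends opening files with newline='' and letting the reader handle the line endings itself. A small sketch with a throwaway CRLF file (the path and contents are made up for the demo):

```python
import csv
import os
import tempfile

# Create a small file with CRLF line endings (hypothetical demo data).
path = os.path.join(tempfile.mkdtemp(), 'crlf.csv')
with open(path, 'w', newline='') as f:
    f.write('a,b\r\n1,2\r\n')

# newline='' passes the line endings through untouched; csv.reader deals with them.
with open(path, newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```

This also copes with files that mix '\n' and '\r\n' endings, which a fixed newline='\r\n' would not split consistently.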