How to remove delimiters when reading csv file in Python? - python

Just trying to learn python and trying to help a friend with taking a column from a .csv file to print it with a label-maker. The first problem I came across is this:
I will use this example file: test.csv
1111,2222,3333,4444
aaaa,bbbb,cccc,dddd
aaaa,bbbb,cccc,dddd
I run it through:
import csv
with open('test.csv', 'r') as csv_File:
    csv_reader = csv.reader(csv_File)
    with open('test2.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file)
        for line in csv_reader:
            (csv_writer).writerow(line[1])
and get the output:
2,2,2,2
b,b,b,b
b,b,b,b
I want the output:
2222
bbbb
bbbb
what am I doing wrong?

writerow is expecting a whole list to write as a row, just as you got a whole list from the reader. To output one field only you should wrap it in a list:
csv_writer.writerow([line[1]])
But note it would be simpler to just write the data directly, since you don't need any of the functionality that the CSV writer gives you:
with open('test2.csv', 'w') as new_file:
    for line in csv_reader:
        new_file.write(line[1] + '\n')  # write() does not add a line break by itself

writerow takes an iterable holding the data of one row. You provide it a single string, which is interpreted as an iterable, so each character is written as a separate column.
Fix:
csv_writer.writerow([line[1]]) # put the string into a list so you provide a single item row
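The difference between passing a bare string and passing a one-element list can be checked in memory. This is only a minimal sketch: an io.StringIO buffer stands in for test2.csv, and the helper name write_rows is made up for the illustration.

```python
import csv
import io

def write_rows(rows):
    """Write each row with csv.writer and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()

# A bare string is iterated character by character: one field per character.
as_string = write_rows(["2222", "bbbb"])

# A one-element list is written as a single field.
as_list = write_rows([["2222"], ["bbbb"]])

print(as_string)
print(as_list)
```

The first call reproduces the 2,2,2,2 output from the question, the second gives one value per line.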

Related

List to csv without commas in Python

I have a following problem.
I would like to save a list into a csv (in the first column).
See example here:
import csv
mylist = ["Hallo", "der Pixer", "Glas", "Telefon", "Der Kühlschrank brach kaputt."]
def list_na_csv(file, mylist):
    with open(file, "w", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerows(mylist)
list_na_csv("example.csv", mylist)
My output in excel looks like this:
Desired output is:
You can see that I have two issues: Firstly, each character is followed by a comma. Secondly, I don't know how to use some encoding, for example UTF-8 or cp1250. How can I fix it, please?
I tried to search similar question, but nothing worked for me. Thank you.
You have two problems here.
writerows expects a list of rows, in other words a list of iterables. As a string is iterable, each word is written as a separate row, one character per field. If you want one row with one word per field, you should use writerow:
csv_writer.writerow(mylist)
By default, the csv module uses the comma as the delimiter (this is the most common one). But Excel is a pain in the ass with it: it expects the delimiter to be the one of the locale, which is the semicolon (;) in many West European countries, including Germany. If you want to use your file easily with your Excel, you should change the delimiter:
csv_writer = csv.writer(csv_file, delimiter=';')
After your edit, you want all the data in the first column, one element per row. This is a kind of degenerate csv file, because it has only one value per record and no separator. If the fields can never contain a semicolon or a newline, you could just write a plain text file:
...
with open(file, "w", newline="") as csv_file:
    for row in mylist:
        print(row, file=csv_file)
...
If you want to be safe and avoid problems when you later have to handle corner-case values, you could still use the csv module and write one element per row by wrapping it in another iterable:
...
with open(file, "w", newline="") as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=';')
    csv_writer.writerows([elt] for elt in mylist)
...
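To see what "being safe" buys you, here is a small sketch of the one-element-per-row approach with a value that contains the delimiter itself. The sample values are invented and an io.StringIO buffer replaces the real file, so treat it as an illustration rather than a drop-in replacement.

```python
import csv
import io

# Sample values; the second one contains the delimiter on purpose.
values = ["Hallo", "Glas; Telefon", "Der Kühlschrank"]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerows([value] for value in values)  # one single-field row per value
text = buffer.getvalue()

# Reading the text back with the same delimiter recovers the original values.
round_trip = [row[0] for row in csv.reader(io.StringIO(text), delimiter=";")]

print(text)
print(round_trip)
```

The writer quotes only the field that needs it, which a plain print() per line would not do.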
l = ["Hallo", "der Pixer", "Glas", "Telefon", "Der Kühlschrank brach kaputt."]
with open("file.csv", "w") as msg:
    msg.write(",".join(l))
For less trivial examples:
l = ["Hallo", "der, Pixer", "Glas", "Telefon", "Der Kühlschrank, brach kaputt."]
with open("file.csv", "w") as msg:
    msg.write(",".join(['"' + x + '"' for x in l]))
Here every list element is wrapped in quotes, which prevents problems with commas inside a field.
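One caveat with wrapping fields in quotes by hand: it stops working as soon as a value contains a double quote itself. The sketch below compares the manual join with csv.writer on an invented list; parsing both results back with csv.reader shows which one survives the round trip.

```python
import csv
import io

# Invented sample; the second field contains both quotes and a comma.
fields = ["Glas", 'der "Pixer", neu', "Telefon"]

# Manual quoting: the embedded quotes are not escaped.
manual = ",".join('"' + x + '"' for x in fields)

# csv.writer doubles embedded quotes and quotes only where needed.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(fields)
with_csv = buffer.getvalue()

# Parse both lines back into fields.
parsed_manual = next(csv.reader([manual]))
parsed_csv = next(csv.reader([with_csv]))

print(parsed_manual)
print(parsed_csv)
```

Only the csv.writer output parses back into the original three fields.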
Try this it will work 100%
import csv
mylist = ["Hallo", "der Pixer", "Glas", "Telefon", "Der Kühlschrank brach kaputt."]
def list_na_csv(file, mylist):
    with open(file, "w") as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(mylist)
list_na_csv("example.csv", mylist)
If you want to write the entire list of strings to a single row, use csv_writer.writerow(mylist) as mentioned in the comments.
If you want to write each string to a new row, as I believe your reference to writing them in the first column implies, you'll have to format your data as the class expects: "A row must be an iterable of strings or numbers for Writer objects". On this data that would look something like:
csv_writer.writerows((entry,) for entry in mylist)
There, I'm using a generator expression to wrap each word in a tuple, thus making it an iterable of strings. Without something like that, your strings are themselves iterables and lead to it delimiting between each character as you've seen.
Using csv to write a single entry per line is almost pointless, but it does have the advantage that it will escape your delimiter if it appears in the data.
To specify an encoding, the docs say:
Since open() is used to open a CSV file for reading, the file will by
default be decoded into unicode using the system default encoding (see
locale.getpreferredencoding()). To decode a file using a different
encoding, use the encoding argument of open:
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
The same applies to writing in something other than the system default encoding: specify the encoding argument when
opening the output file.
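For the writing side, a sketch along these lines should do. The path is a throwaway file in a temporary directory (an assumption for the example), and "utf-8" can be swapped for "cp1250" if that is what the consuming program expects.

```python
import csv
import os
import tempfile

words = ["Hallo", "Der Kühlschrank brach kaputt."]

# Hypothetical output location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# The encoding is passed to open(), not to csv.writer;
# newline="" lets the csv module control the line endings.
with open(path, "w", newline="", encoding="utf-8") as csv_file:
    csv.writer(csv_file).writerows([word] for word in words)

# Read the raw bytes back to confirm how the umlaut was encoded.
with open(path, "rb") as raw:
    raw_bytes = raw.read()

print(raw_bytes)
```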
Try split("\n")
example:
amazing_list = ["hello", "hi"]
for x in amazing_list:
    ok = x.split("\n")  # split() returns a list, which writerow() accepts as one row
    writer.writerow(ok)  # writer is an existing csv.writer

Incorrect format in CSV excel file when loading JSON data into CSV

I have some data in a JSON file, and I have used the code below to write them into csv file, but I found that the each word in a sentence has occupied one column, I want to store whole sentence in a single column.
This is the code:
for line in open('test1.json', 'r'):
    if not line.strip():
        continue
    data = json.loads(line)
    text = data["text"]
    filtered_text = clean_tweets(text)
    print(filtered_text)
    with open ('test1.csv', 'a', encoding='utf-8') as f:
        csvWriter = csv.writer(f)
        csvWriter.writerow(filtered_text)
        f.close()
This is the output of csv file.
writerow() expects an iterable parameter. Each item in the iterable is placed in a column. Strings are iterable, hence you get a single character in each column.
Put the string(s) in a list:
csvWriter.writerow([filtered_text])
But since you seem to only have one column, using the csv module is unnecessary. Just use:
with open('test1.csv', 'a', encoding='utf8') as f:
    f.write(filtered_text + '\n') # add newline if needed
Another option:
with open('test1.csv', 'a', encoding='utf8') as f:
    print(filtered_text, file=f) # will add the newline
csvWriter.writerow() takes a list or similar of columns - you need to supply for example csvWriter.writerow([tweet_id, filtered_tweet])
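Putting the pieces together, the whole loop could look roughly like this. The question does not show clean_tweets(), so clean_text below is a hypothetical stand-in, and in-memory buffers replace test1.json and test1.csv so the sketch runs on its own.

```python
import csv
import io
import json

def clean_text(text):
    """Hypothetical stand-in for the question's clean_tweets(): collapse whitespace."""
    return " ".join(text.split())

# Two JSON documents, one per line, with a blank line in between.
json_lines = io.StringIO('{"text": "hello   world"}\n\n{"text": "a, b and c"}\n')

output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")  # create the writer once, outside the loop
for line in json_lines:
    if not line.strip():
        continue  # skip blank lines
    data = json.loads(line)
    writer.writerow([clean_text(data["text"])])  # one sentence, one column

result = output.getvalue()
print(result)
```

The second sentence contains a comma, so the writer quotes it and it still ends up in a single column.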

how can I use csv tools for zip text file?

update-my file.txt.zp is tab delimited and looks kind of like this :
file.txt.zp
I want to split the first col by : _ /
original post:
I have a very large zipped tab delimited file.
I want to open it, scan it one row at a time, split some of the col, and write it to a new file.
I got various errors (every time I fix one another pops)
This is my code:
import csv
import re
import gzip
f = gzip.open('file.txt.gz')
original = f.readlines()
f.close()
original_l = csv.reader(original)
for row in original_l:
    file_l = re.split('_|:|/',row)
    with open ('newfile.gz', 'w', newline='') as final:
        finalfile = csv.writer(final,delimiter = ' ')
        finalfile.writerow(file_l)
Thanks!
for this code i got the error:
for row in original_l:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
so based on what I found here I added this after f.close():
original = original.decode('utf8')
and then got the error:
original = original.decode('utf8')
AttributeError: 'list' object has no attribute 'decode'
Update 2
This code should produce the output that you're after.
import csv
import gzip
import re
with gzip.open('file.txt.gz', mode='rt') as f, \
        open('newfile.gz', 'w') as final:
    writer = csv.writer(final, delimiter=' ')
    reader = csv.reader(f, delimiter='\t')
    _ = next(reader) # skip header row
    for row in reader:
        writer.writerow(re.split(r'_|:|/', row[0]))
Update
Open the gzip file in text mode because str objects are required by the CSV module in Python 3.
f = gzip.open('file.txt.gz', 'rt')
Also specify the delimiter when creating the csv.reader.
original_l = csv.reader(original, delimiter='\t')
This will get you past the first hurdle.
Now you need to explain what the data is, which columns you wish to extract, and what the output should look like.
Original answer follows...
One obvious problem is that the output file is constantly being overwritten by the next row of input. This is because the output file is opened in (over)write mode ('w') once per row.
It would be better to open the output file once outside of the loop.
Also, the CSV file delimiter is not specified when creating the reader. You said that the file is tab delimited so specify that:
original_l = csv.reader(original, delimiter='\t')
On the other hand, your code attempts to split each row using other delimiters, however, the rows coming from the csv.reader are represented as a list, not a string as the re.split() code would require.
Another problem is that the output file is not zipped as the name suggests.
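If the output really should be compressed, gzip.open() can be used for writing in text mode as well. The following is a sketch under assumptions: the sample rows and column layout are invented, since the real data is not shown, and both files live in a temporary directory.

```python
import csv
import gzip
import os
import re
import tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "file.txt.gz")
target = os.path.join(workdir, "newfile.gz")

# Build a small tab-delimited sample; the column contents are invented.
with gzip.open(source, "wt", newline="") as f:
    f.write("id\tvalue\nchr1:100_A/T\t5\nchr2:200_G/C\t7\n")

# Open the output once, outside the loop, and compress it as well.
with gzip.open(source, "rt", newline="") as f, \
        gzip.open(target, "wt", newline="") as final:
    reader = csv.reader(f, delimiter="\t")
    writer = csv.writer(final, delimiter=" ", lineterminator="\n")
    next(reader)  # skip the header row
    for row in reader:
        writer.writerow(re.split(r"_|:|/", row[0]))  # split the first column only

# Decompress the result to inspect it.
with gzip.open(target, "rt") as f:
    converted = f.read()

print(converted)
```

Both files are read and written row by row, so nothing has to be loaded into memory in one go.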

Read CSV with comma as linebreak

I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)
with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The print reader output looks exactly how I want (or at least it's a format I can further process).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new csv file. However the output looks like
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader,delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append( float(row[0]) )
        y.append( float(row[1]) )
    print(x)
    print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,
The right way with csv module, without replacing and casting to float:
import csv
with open('file.csv', 'r') as f, open('filenew.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out, quotechar=None)
    for r in reader:
        for i in r:
            writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects treat comma , as default delimiter (no need to file.read().replace(',', '\n'))
quotechar=None is specified for csv.writer object to eliminate double quotes around the values being saved
You need to split the values to form a list to represent a row. Presently the code is splitting the string into individual characters to represent the row.
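import csv
pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        # strip() drops a trailing line break before splitting into pairs
        text_rows = old_file.read().strip().split(",")
        for row in text_rows:
            items = row.split(":")
            # remove the surrounding double quotes before converting the key
            csv_writer.writerow([int(items[0].strip('"')), items[1]])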
If you look at the documentation for writerow, it says:
Write the row parameter to the writer’s file
object, formatted according to the current dialect.
But, you are writing an entire string in your code
csv_writer.writerow(reader)
because reader is a string at this point.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.
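As a rough illustration of that preprocessing step, the sketch below turns the sample line from the question into a list of lists and hands it to the writer. It works on a string and an in-memory buffer instead of the real files, and assumes every entry has the "key":value shape shown in the question.

```python
import csv
import io

raw = '"400":0.1,"401":0.2,"402":0.3'

# Build one [key, value] row per comma-separated pair.
rows = []
for pair in raw.split(","):
    key, value = pair.split(":")
    rows.append([int(key.strip('"')), float(value)])

# Each sublist becomes one line of the CSV output.
output = io.StringIO()
csv.writer(output, lineterminator="\n").writerows(rows)
converted = output.getvalue()

# The same rows also give the x and y lists from the question.
x = [row[0] for row in rows]
y = [row[1] for row in rows]

print(converted)
```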

re.sub for a csv file

I am receiving a error on this code. It is "TypeError: expected string or buffer". I looked around, and found out that the error is because I am passing re.sub a list, and it does not take lists. However, I wasn't able to figure out how to change my line from the csv file into something that it would read.
I am trying to change all the periods in a csv file into commas. Here is my code:
import csv
import re
in_file = open("/test.csv", "rb")
reader = csv.reader(in_file)
out_file = open("/out.csv", "wb")
writer = csv.writer(out_file)
for row in reader:
    newrow = re.sub(r"(\.)+", ",", row)
    writer.writerow(newrow)
in_file.close()
out_file.close()
I'm sorry if this has already been answered somewhere. There were certainly a lot of answers regarding this error, but I couldn't make any of them work with my csv file. Also, as a side note, this was originally an .xlsb Excel file that I converted into csv in order to be able to work with it. Was that necessary?
You could use list comprehension to apply your substitution to each item in row
for row in reader:
    newrow = [re.sub(r"(\.)+", ",", item) for item in row]
    writer.writerow(newrow)
for row in reader does not return a single element to parse; rather, it returns a list of the elements in that row, so you have to unpack that list and parse each item individually, just like @Trii showed you:
[re.sub(r'(\.)+', ',', s) for s in row]
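Here is the per-item substitution as a self-contained Python 3 sketch. The two sample rows are invented, and in-memory buffers replace /test.csv and /out.csv.

```python
import csv
import io
import re

# Invented sample input with single and repeated periods.
source = io.StringIO("1.5,abc\n2..75,d.e\n")
output = io.StringIO()

writer = csv.writer(output, lineterminator="\n")
for row in csv.reader(source):
    # row is a list of strings, so apply re.sub to each item separately.
    writer.writerow([re.sub(r"\.+", ",", item) for item in row])

converted = output.getvalue()
print(converted)
```

Note that a field which now contains a comma is wrapped in quotes by the writer, so it is still read back as one column.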
In this case, we are using glob to access all the csv files in the directory.
The code below overwrites the source csv file, so there is no need to create an output file.
NOTE:
If you want to get a second file with the parameters provided with re.sub, replace write = open(i, 'w') with write = open('secondFile.csv', 'w')
import re
import glob
for i in glob.glob("*.csv"):
    read = open(i, 'r')
    reader = read.read()
    read.close()
    # replace every run of periods with a single comma
    csvRe = re.sub(r"(\.)+", ",", reader)
    write = open(i, 'w')
    write.write(csvRe)
    write.close()
