Creating a CSV file from a function result in Python

I am using this PDF-to-CSV function from {Python module for converting PDF to text} and I was wondering how I can now export the result to a CSV file on my drive. I tried adding this to the function:
with open('C:\location', 'wb') as f:
    writer = csv.writer(f)
    for row in data:
        writer.writerow(row)
but the resulting CSV file has one character per row, not the rows I see when printing data in Python.

If you are printing a single character per row, then what you have is a string. Your loop
for row in data:
translates to
for character in string:
so you need to break your string up into the chunks you want written on a single row. You might be able to use something like data.split() but it's hard to say without seeing more of your code and data.
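For instance, a minimal sketch, assuming data is one big string with newline-separated rows and comma-separated fields (the names mirror the question; the sample string is hypothetical):

import csv

data = "a,b,c\nd,e,f"  # hypothetical stand-in for the extracted PDF text

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for line in data.split('\n'):          # one chunk per intended row
        writer.writerow(line.split(','))   # naive field split

Note that a naive split(',') breaks on quoted fields that contain commas; it only works for simple data.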
In response to your comment:
Yes, you can just dump the data to a CSV file... if it adheres to the rules of CSV. If your data is separated by commas, with each row terminated by a newline, then you can just write your data to a file:
with open("file.csv", 'w') as f:
    f.write(data)
This will ONLY work if your data adheres to the rules of CSV.

Related

How to create the header of a CSV file?

I want to write a CSV file in Python and use these two words as the header.
import csv

myFile = open('tabelle.csv', 'w')
with myFile:
    writer = csv.writer(myFile)
    writer.writerow(["Wort", "Haeufigkeit"])
Is that enough to build my header? Now I want to add the other words to this CSV file, under these two words. Does Python accept this as a header, or just as a normal row?
As far as the csv writer is concerned, the header is like any other row. The idea of a header only comes up when you want to read and interpret a CSV file. So yes, what you have does work.
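A minimal sketch, with hypothetical data rows added under that header:

import csv

with open('tabelle.csv', 'w', newline='') as myFile:
    writer = csv.writer(myFile)
    writer.writerow(["Wort", "Haeufigkeit"])  # header row
    writer.writerow(["Haus", 3])              # hypothetical data rows, written
    writer.writerow(["Baum", 7])              # exactly the same way

When reading the file back, something like csv.DictReader is what actually treats that first row as field names.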

In Python CSV module while reading and writing how to get rid of apostrophe

In the Python csv module, while reading and writing, how do I get rid of the apostrophes? My code is as follows; the original file doesn't have apostrophes, but the output does.
import csv

with open("test.csv", 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        writer = open('output.csv', 'a')
        writer.write(str(row))
        writer.write('\n')
        writer.close()
As write() asks for a string, you could write the elements one at a time. Try this:
writer = open('output.csv', 'a')
for row in reader:
    for element in row:
        # print(type(element))
        writer.write(element)
    writer.write('\n')
writer.close()
Also, I believe writer.close() should be called only once the loop has finished all of the writing.
Edit: in the print you can see that the type of the element variable is a string, not a list.
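An alternative sketch (Python 3): the apostrophes come from writing str(row), which is the repr of a list. Letting csv.writer format each row avoids that entirely:

import csv

with open("test.csv", 'r') as f, open('output.csv', 'w', newline='') as out:
    reader = csv.reader(f)
    writer = csv.writer(out)
    for row in reader:
        writer.writerow(row)  # row is a list of strings; the writer joins the fields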

Cleaning unicode characters while writing to csv

I am using a certain REST API to get data and then attempting to write it to a CSV file using Python 2.7.
In the CSV, every item within a tuple has u'...' around it. For example, with the 'tags' field I am retrieving, I am getting [u'01d/02d/major/--', u'45m/04h/12h/24h', u'internal', u'net', u'premium_custom', u'priority_fields_swapped', u'priority_saved', u'problem', u'urgent', u'urgent_priority_issue']. However, if I print the data in the program prior to it being written to the CSV, the data looks fine, i.e. ('01d/02d/major--', '45m/04h/12h/24h', etc.). So I am assuming I have to modify something in the CSV write command or within the csv writer object itself. My question is how to write the data to the CSV properly so that there are no unicode markers.
In Python 3, just define the encoding when opening the csv file you write to. If a row contains non-ASCII characters, you will get a UnicodeEncodeError:
import csv

row = [u'01d/02d/major/--', u'45m/04h/12h/24h', u'internal', u'net', u'premium_custom', u'priority_fields_swapped', u'priority_saved', u'problem', u'urgent', u'urgent_priority_issue']

with open('output.csv', 'w', newline='', encoding='ascii') as f:
    writer = csv.writer(f)
    writer.writerow(row)
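Since the question mentions Python 2.7: a minimal sketch for that version, assuming the values are plain ASCII and that the u'...' markers come from writing the repr of the list, is to pass the writer a list of encoded str values:

import csv

row = [u'01d/02d/major/--', u'45m/04h/12h/24h', u'internal']  # shortened sample

with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow([element.encode('ascii') for element in row])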

Converting tsv to tsv in python

I have a tsv file (tab-separated) and would like to filter out a lot of data using Python before I import it into a PostgreSQL database.
My problem is that I can't find a way to keep the format of the original file, which is mandatory because otherwise the import process won't work.
The web suggested that I use the csv library, but no matter what delimiter I use I always end up with files in a different format than the original, e.g. files that contain a comma after every character, files that contain a tab after every character, or files that have all the data in one row.
Here is my code:
import csv
import glob

# create a list of all tsv files in one directory
liste = glob.glob("/some_directory/*.tsv")

# go through all the files
for item in liste:
    # open the tsv file for reading and a file for writing
    with open(item, 'r') as tsvin, open('/some_directory/new.tsv', 'w') as csvout:
        tsvin = csv.reader(tsvin, delimiter='\t')
        # I am not sure if I have to enter a delimiter here for the outfile.
        # If I enter delimiter='\t' like for the in-file, the outfile ends up
        # with a tab after every character.
        writer = csv.writer(csvout)
        # go through all lines of the input tsv
        for row in tsvin:
            # do some filtering
            if 'some_substring1' in row[4] or 'some_substring2' in row[4]:
                # do some more filtering
                if 'some_substring1' in str(row[9]) or 'some_substring1' in str(row[9]):
                    # now I get lost...
                    writer.writerow(row)
Do you have any idea what I am doing wrong? The final file has to have a tab between every field and some kind of line break at the end.
Somehow you are passing a string to writer.writerow(), not a list as expected.
Remember that strings are iterable; each iteration returns a single character from the string. writerow() simply iterates over its argument writing each item separated by the delimiter character (by default a comma). So if you pass a string to writerow() it will write each character from the string separated by the delimiter.
How is it that row is a string? It could be that the delimiter for the input file is incorrect - perhaps the file does not use tabs but has fixed field widths using runs of spaces as the delimiter.
You can check whether the reader is correctly parsing your file by printing out the value of row:
for row in tsvin:
    print(row)
    ...
If the file is being correctly parsed, expect to see that row is a list, and that each element of the list corresponds to a column/field from the file.
If it is not parsing correctly then you might see that row is a string, or that it's a list but the fields are empty and/or out of place.
It would be helpful if you added a sample of your input file to the question.
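If the parsing does look right, here is a sketch of the write side with the tab delimiter set on the writer as well (the paths and filter conditions are the question's placeholders):

import csv

with open('in.tsv', 'r') as tsvin, open('new.tsv', 'w', newline='') as tsvout:
    reader = csv.reader(tsvin, delimiter='\t')
    writer = csv.writer(tsvout, delimiter='\t')  # tab between output fields too
    for row in reader:
        if 'some_substring1' in row[4] or 'some_substring2' in row[4]:
            writer.writerow(row)  # the writer appends the line terminator itself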

Python csv writer : "Unknown Dialect" Error

I have a very large string in the CSV format that will be written to a CSV file.
I try to write it to a CSV file using the simplest of Python scripts:
results=""" "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
resultArray = results.split('\n')

with open(csvfile, 'wb') as f:
    writer = csv.writer(f)
    for row in resultArray:
        writer.writerows(row)
The code returns an "Unknown Dialect" error.
Is the error because of the script or is it due to the string that is being written?
EDIT
If the problem is bad input, how do I sanitize it so that it can be used by the csv.writer() method?
You need to specify the format of your string:
with open(csvfile, 'wb') as f:
    writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)
You might also want to re-visit your writing loop; the way you have it written you will get one column in your file, and each row will be one character from the results string.
To really exploit the module, try this:
import csv

lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
reader = csv.reader(lines, quotechar="'")

with open('out.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(list(reader))
out.csv will have:
A,bunch+of,multiline,"CSV,LIKE,STRING"
If you want to quote all of the column values, add quoting=csv.QUOTE_ALL to the writer object; then your file will have:
"A","bunch+of","multiline","CSV,LIKE,STRING"
To change the quotes to ', add quotechar="'" to the writer object.
The above code does not give csv.writer.writerows input that it expects. Specifically:
resultArray = results.split('\n')
This creates a list of strings. Then you pass each individual string to your writer and call writerows() on it:
for row in resultArray:
    writer.writerows(row)
But writerows does not expect a single string. From the docs:
csvwriter.writerows(rows)
Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.
So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:
A row must be a sequence of strings or numbers for Writer objects
Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.
For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.
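A minimal sketch of that direct approach, using the question's csvfile and results names:

with open(csvfile, 'w') as f:
    f.write(results)  # the string is already CSV-formatted text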
If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.
You can use register_dialect, for example for escaped formatting:
csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)
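A minimal sketch of using the registered dialect by name afterwards:

import csv

csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f, dialect='escaped')
    writer.writerow(['A', 'bunch+of', 'CSV,LIKE,STRING'])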
