Pandas to_csv with multiple separators - python

I want to convert a pandas dataframe to csv with multiple separators. Is there a way?
dataframe.to_csv(file.csv, sep="%%")
Error: delimiter must be 1-character string

The easiest way might be to use a unique single-character separator first, then replace it:
tsv = dataframe.to_csv(sep='\t')  # use '\1' if your data contains tabs
psv = tsv.replace('\t', '%%')
with open('file.csv', 'w') as outfile:
    outfile.write(psv)
P.S.: Consider using an extension other than .csv since it's not comma separated.

I think there might be bugs with replace, as John says, because it can't guarantee that every replaced character is actually a separator.
Besides, since to_csv returns the whole file as a string, a large DataFrame might lead to a memory error.
Here is another feasible solution.
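with open('test_pandas.txt', 'w') as f:
    # itertuples keeps each column's type; iterrows packs a row into one Series,
    # so ints in a row that also holds floats would be written as e.g. '1.0'
    for row in dataframe.itertuples(index=False, name=None):
        line = '%%'.join(map(str, row))
        f.write(line + '\n')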
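The two answers can also be combined: render the DataFrame in chunks with a single-character placeholder separator, swap in '%%', and write each chunk straight to the file, so only one chunk is held as a string at a time. A minimal sketch; the helper name to_csv_multichar, the '\x01' placeholder and the chunk size are assumptions, and it writes the header but not the index.

```python
import pandas as pd


def to_csv_multichar(df, path, sep="%%", placeholder="\x01", chunksize=10_000):
    """Hypothetical helper: write df to path with a multi-character separator.

    Each chunk is rendered by DataFrame.to_csv with a one-character
    placeholder as the separator, then the placeholder is swapped for `sep`.
    The header is written once; the index is not written.
    """
    # The swap is only safe if the placeholder never occurs in the header or data.
    if any(placeholder in str(name) for name in df.columns) or any(
        col.astype(str).str.contains(placeholder, regex=False).any()
        for _, col in df.items()
    ):
        raise ValueError("placeholder character occurs in the DataFrame")

    # newline="" writes the line endings produced by to_csv unchanged.
    with open(path, "w", newline="") as out:
        # max(..., 1) makes an empty DataFrame still produce its header line.
        for start in range(0, max(len(df), 1), chunksize):
            chunk = df.iloc[start:start + chunksize]
            text = chunk.to_csv(sep=placeholder, header=(start == 0), index=False)
            out.write(text.replace(placeholder, sep))
```

Usage: to_csv_multichar(dataframe, 'file.txt') writes the same content as the replace-based answer minus the index column; raise or lower chunksize to trade memory for the number of to_csv calls.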

Related

How to replace characters in a csv file

I'm doing some measurements in the lab and want to transform them into some nice Python plots. The problem is the way the software exports CSV files, as I can't find a way to properly read the numbers. It looks like this:
-10;-0,0000026
-8;-0,00000139
-6;-0,000000546
-4;-0,000000112
-2;-5,11E-09
0,0000048;6,21E-09
2;0,000000318
4;0,00000304
6;0,0000129
8;0,0000724
10;0,000268
Separation by ';' is fine, but I need every ',' replaced with a '.'.
Ideally I would like Python to be able to read numbers such as 6.21E-09 as well, but I should be able to fix that in Excel...
My main issue: Change every , to . so Python can read them as a float.
The simplest way would be to treat the content as a string and use the .replace() method, which can do pretty much anything. For example:
txt = "0,0000048;6,21E-09"
txt = txt.replace(',', '.')
You could also read the CSV file (I don't know how you are reading it) and, depending on the library, change the 'delimiter' (to ';', for example). CSV stands for comma-separated values and, as the name implies, it separates columns by means of ','.
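Taking the library route mentioned above: pandas' read_csv can be told both the column separator (sep) and the decimal character (decimal), so no text replacement is needed. A minimal sketch, assuming the file has no header row; the helper name load_measurements and the column names x and y are placeholders.

```python
import io

import pandas as pd


def load_measurements(source):
    """Read 'a;b' lines that use ',' as the decimal mark into float columns.

    `source` may be a path or a file-like object. The column names
    'x' and 'y' are illustrative placeholders.
    """
    return pd.read_csv(source, sep=";", decimal=",", header=None, names=["x", "y"])


# A few lines from the question's data.
sample = "-10;-0,0000026\n-2;-5,11E-09\n0,0000048;6,21E-09\n"
df = load_measurements(io.StringIO(sample))
print(df)
```

With a real file, pass the path instead: load_measurements('measurements.csv'), where the file name is hypothetical.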
You can do whatever you want in Python, for example:
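import csv
with open('path_to_csv_file', 'r', newline='') as csv_file:
    data = list(csv.reader(csv_file, delimiter=';'))
# Both columns may use decimal commas (e.g. '0,0000048'), so convert both to float;
# 'if row' skips blank lines
data = [(float(row[0].replace(',', '.')), float(row[1].replace(',', '.'))) for row in data if row]
with open('path_to_csv_file', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerows(data)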
You could also use a regex to match every ',' in the text and replace each match with '.'.
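A regex can also restrict the replacement to decimal commas, i.e. commas with a digit on both sides, leaving any other commas alone. A minimal sketch; the pattern and the helper name parse_semicolon_line are illustrative, not from the question.

```python
import re

# Matches a comma only when a digit sits on both sides (a decimal comma).
DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")


def parse_semicolon_line(line):
    """Hypothetical helper: turn one 'a;b' line with decimal commas into floats."""
    fixed = DECIMAL_COMMA.sub(".", line.strip())
    return tuple(float(field) for field in fixed.split(";"))


print(parse_semicolon_line("0,0000048;6,21E-09\n"))
```

Python's float() already accepts exponent notation such as 6.21E-09, so once the commas are fixed nothing needs to be done in Excel.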

Want to convert the csv file from line break mode to be separated by comma

Currently the CSV file has one value per line, but it should be comma-separated so that I can input the data as an array.
The current csv file:
test#eaxmple.com
test#eaxmple.com
test#eaxmple.com
The ideal csv file:
test#eaxmple.com, test#eaxmple.com, test#eaxmple.com
The code:
def get_addresses():
    with open('./addresses.csv') as f:
        addresses_file = csv.reader(f)
        # Need to be converted
How can I convert it? I hope to use Python.
I tried this:
with open('./addresses.txt') as input, open('./addresses.csv', 'w') as output:
    output.write(','.join(input.readlines()))
    output.write('\n')
The result:
test#eaxmple.com
,test#eaxmple.com
,test#eaxmple.com
with open('./addresses.txt') as f:
    print(",".join(f.read().splitlines()))
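Load the original file into pandas using:
import pandas as pd
df = pd.read_csv('YOUR_FILE', escapechar='\\')
Then export it back to .csv (by default this will be comma separated).
df.to_csv('YOUR_FILE')
Note that to_csv only controls the separator between columns: a single-column file like this one is still written with one value per line.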
For this simple task, just read them into an array, then join the array on commas.
with open('./addresses.txt') as input, open('./addresses.csv', 'w') as output:
    output.write(','.join(input.read().splitlines()))
    output.write('\n')
This ignores any complications in the CSV formatting - if your data could contain commas (which are reserved as the field separator) or double quotes (which are reserved for quoting other reserved characters) you will want to switch to the proper csv module for output and perhaps for input.
Overwriting your input file is also an unnecessary complication, so I suggest you rename the input file to addresses.txt and use addresses.csv only for output.
Demo: https://repl.it/repls/AdequateStunningVideogames
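Following the advice to switch to the csv module when values may contain commas or double quotes, a minimal sketch that reads one address per line and writes them as a single properly quoted CSV row. The function name join_lines_to_csv_row is hypothetical.

```python
import csv


def join_lines_to_csv_row(in_path, out_path):
    """Hypothetical helper: read one value per line, write them as one CSV row."""
    # Strip line endings and skip blank lines.
    with open(in_path, newline="") as infile:
        values = [line.rstrip("\r\n") for line in infile if line.strip()]
    # csv.writer quotes any value containing the delimiter or the quotechar.
    with open(out_path, "w", newline="") as outfile:
        csv.writer(outfile).writerow(values)
```

Usage: join_lines_to_csv_row('./addresses.txt', './addresses.csv'). Note that csv.writer ends the row with '\r\n' by default; pass lineterminator='\n' to csv.writer if a Unix line ending is wanted.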
Another common trick is to read one line at a time, and write a separator before each output except the first. This is more scalable for large input files.
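with open('./addresses.txt') as input, open('./addresses.csv', 'w') as output:
    separator = ''  # for first line
    for line in input:
        output.write(separator)
        output.write(line.rstrip('\n'))  # drop the newline, or each value stays on its own line
        separator = ','  # for subsequent input lines
    output.write('\n')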

Writing to a CSV without getting quote marks, escapechar error

I have an output I am writing to a CSV. I need to add csv.QUOTE_NONE but I can't seem to find the right location without it producing an error.
variable:
variable = ['20', '10', '30,30']
Note: some of the variables I am using will contain strings, e.g. ['Test','Output', '100']
code:
with open('file.csv', 'w') as csv_file:
    writerc = csv.writer(csv_file)
    for item in variable():
        writerc.writerow(item)
When using the above code, it produces the following line in the CSV.
20,10,"30,30"
The required output is:
20,10,30,30
If I use quoting=csv.QUOTE_NONE I get an escapechar error _csv.Error: need to escape, but no escapechar set - this is resolved if I set an escapechar but this then adds a character in place of the quotation marks.
Any ideas?
You could try further splitting your data before writing it. This would avoid it needing to use quote characters automatically.
It works by creating a new list of values possibly containing multiple new split entries, for example your '30,30' would become ['30', '30']. Next it uses Python's chain function to flatten these sub-lists back into a single list which can then be written to your output CSV file.
import itertools
import csv
data = [['20', '10', '30,30'], ['Test','Output', '100']]
with open('file.csv', 'w', newline='') as f_output:  # Python 3: text mode, newline=''
    csv_output = csv.writer(f_output)
    for line in data:
        csv_output.writerow(list(itertools.chain.from_iterable(v.split(',') for v in line)))
This would give you the following file.csv:
20,10,30,30
Test,Output,100
I think the problem lies in the fact that .csv means comma-separated values, so a "," inside a value would be read as a separator (delimiter). That's why the writer automatically wraps such values in double quotes.
I suggest you use the pandas library which makes it easier to deal with this issue.
Code
import pandas as pd
df = pd.DataFrame({'variable' : ['20', '10', '30,30']})
# Note that I use '\t' as the separator instead of ',' and get a .tsv file which
# is essentially the same as .csv file except the separator.
df.to_csv(sep = '\t', path_or_buf='file.tsv', index=False)
You can see the differences between using these 2 separators in the Full script. Another thing: your code suggests that you use variable as the name of the column, but your output suggests that you use it as the name of a row (or index). Anyway, my answer assumes that you use variable as the name of the column. Hope it helps ;)
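The same idea works with the standard csv module alone: once the delimiter is a character that never appears in the data, the writer has no reason to add quotes. A minimal sketch using a tab, as in the pandas answer; the function name rows_to_tsv is hypothetical.

```python
import csv
import io


def rows_to_tsv(rows):
    """Hypothetical helper: serialize rows with a tab delimiter."""
    buf = io.StringIO()
    # QUOTE_MINIMAL (the default) only quotes fields containing the delimiter,
    # the quotechar, or line-terminator characters; ',' is none of these here.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


print(rows_to_tsv([["20", "10", "30,30"], ["Test", "Output", "100"]]))
```

The comma stays inside the third field, so a reader must also split on tabs to get the values back.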

Python csv writer : "Unknown Dialect" Error

I have a very large string in the CSV format that will be written to a CSV file.
I try to write it to CSV using the simplest Python script:
results=""" "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
resultArray = results.split('\n')
with open(csvfile, 'wb') as f:
    writer = csv.writer(f)
    for row in resultArray:
        writer.writerows(row)
The code fails with an "Unknown Dialect" error.
Is the error because of the script or is it due to the string that is being written?
EDIT
If the problem is bad input how do I sanitize it so that it can be used by the csv.writer() method?
You need to specify the format of your string:
with open(csvfile, 'w', newline='') as f:  # Python 3; use 'wb' on Python 2
    writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)
You might also want to revisit your writing loop; the way you have it written, you will get one column in your file, and each row will be one character from the results string.
To really exploit the module, try this:
import csv
lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
reader = csv.reader(lines, quotechar="'")
with open('out.csv', 'w', newline='') as f:  # Python 3; use 'wb' on Python 2
    writer = csv.writer(f)
    writer.writerows(list(reader))
out.csv will have:
A,bunch+of,multiline,"CSV,LIKE,STRING"
If you want to quote all the column values, add quoting=csv.QUOTE_ALL to the writer object; then your file will have:
"A","bunch+of","multiline","CSV,LIKE,STRING"
To change the quotes to ', add quotechar="'" to the writer object.
The above code does not give csv.writer.writerows the input it expects. Specifically:
resultArray = results.split('\n')
This creates a list of strings. Then, you pass each string to your writer and tell it to writerows with it:
for row in resultArray:
    writer.writerows(row)
But writerows does not expect a single string. From the docs:
csvwriter.writerows(rows)
Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.
So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:
A row must be a sequence of strings or numbers for Writer objects
Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.
For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.
If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.
You can use 'register_dialect', for example for escaped formatting:
csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)
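Once registered, the dialect is selected by name when creating a writer (or reader). A name that was never registered makes the csv module raise csv.Error with the message "unknown dialect". A minimal sketch; write_escaped is a hypothetical helper.

```python
import csv
import io

# Same registration as above: backslash escapechar, doubled quotes,
# and every field quoted.
csv.register_dialect("escaped", escapechar="\\", doublequote=True,
                     quoting=csv.QUOTE_ALL)


def write_escaped(rows):
    """Hypothetical helper: serialize rows with the registered 'escaped' dialect."""
    buf = io.StringIO()
    # The dialect is looked up by the name it was registered under.
    writer = csv.writer(buf, dialect="escaped")
    writer.writerows(rows)
    return buf.getvalue()


print(write_escaped([["a", 'say "hi"', 1]]))
```

Because doublequote=True, an embedded '"' is doubled rather than backslash-escaped, and QUOTE_ALL quotes the number 1 as well.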

Insert whitespace after delimiter with CSV writer

f = open("file1.csv", "r")
g = open("file2.csv", "w")
a = csv.reader(f, delimiter=";", skipinitialspace=True)
b = csv.writer(g, delimiter=";")
for line in a:
    b.writerow(line)
In the above code, I try to load file1.csv using the csv module in Python 2.7, and then write it to file2.csv using a csv.writer.
My issue comes from existing whitespace (a single space character) after the delimiter in the input file. I need to remove it in order to do some data manipulation later on, so I used the skipinitialspace=True argument for the reader. However, I cannot get the writer to print the space character after the delimiter, which disturbs any subsequent diffing of the two files.
I tried to use the Sniffer class to auto-generate a Dialect but I guess my input files (coming from a large complex legacy system, with dozens of fields and poor quoting and escaping) are proving to be too complex for this.
In more simple terms I'm looking for the answers to the following questions:
How can I insert a space character after each delimiter in the writer?
Incidentally, what are the reasons for prohibiting multi-character strings as delimiters? delimiter="; " would've solved my problem.
You can wrap your file objects in proxies that add the whitespace:
>>> class DelimitedFile(file):
...     def write(self, value):
...         super(DelimitedFile, self).write(value.replace(";", "; "))
...
>>> f = DelimitedFile("foo", "w")
>>> f.write("hello;world")
>>> f.close()
>>> open("foo").read()
'hello; world'
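The file subclass above only works in Python 2, where file is a built-in type. In Python 3 the same proxy idea can wrap any text stream, because csv.writer only needs an object with a write() method. A minimal sketch; the class name SpacedDelimiterWriter is hypothetical, and like the original it also adds a space after semicolons that appear inside quoted fields.

```python
import csv
import io


class SpacedDelimiterWriter:
    """Hypothetical proxy: forwards write() to `stream`, adding a space after
    every delimiter. It cannot tell a real delimiter from the same character
    inside a quoted field."""

    def __init__(self, stream, delimiter=";"):
        self._stream = stream
        self._delimiter = delimiter

    def write(self, text):
        return self._stream.write(text.replace(self._delimiter, self._delimiter + " "))


buf = io.StringIO()
writer = csv.writer(SpacedDelimiterWriter(buf), delimiter=";", lineterminator="\n")
writer.writerow(["hello", "world"])
print(buf.getvalue())
```

For the question's case, wrap the output file instead: csv.writer(SpacedDelimiterWriter(open("file2.csv", "w", newline="")), delimiter=";").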
If you left the whitespace you want written in (removing/restoring it during processing), or put it back after processing but before writing, that would take care of it.
One solution would be to write to a StringIO object, and then to replace the semicolons with '; ', or to do so during processing of the lines, if you do any other processing.
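The StringIO route can be sketched like this, assuming the whole output fits in memory; the helper name rows_to_text_with_spaced_delimiter is hypothetical, and the same caveat about semicolons inside quoted fields applies.

```python
import csv
import io


def rows_to_text_with_spaced_delimiter(rows, delimiter=";"):
    """Hypothetical helper: write rows with a one-character delimiter into
    memory, then add a space after every delimiter in one pass."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    # Caveat: delimiters inside quoted fields get a space too.
    return buf.getvalue().replace(delimiter, delimiter + " ")


print(rows_to_text_with_spaced_delimiter([["a", "b"], ["1", "2"]]))
```

The returned string can then be written to file2.csv in a single write() call.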
As for the first, I would probably do something like this:
for line in a:
    # line is a list of fields from csv.reader; prefix every field
    # except the first with a space
    b.writerow(line[:1] + [' ' + field for field in line[1:]])
As for the second, I have no idea.
