I have a Python script to convert XLSX files to CSV, like this:
import sys
import xlrd
import csv

def toCsv(xlsPath, csvPath):
    wb = xlrd.open_workbook(xlsPath)
    sh = wb.sheet_by_index(0)
    your_csv_file = open(csvPath, 'wb')
    wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
    for rownum in xrange(sh.nrows):
        rv = sh.row(rownum)
        print rv
        wr.writerow(sh.row_values(rownum))
    your_csv_file.close()

toCsv(sys.argv[1], sys.argv[2])
Some of the cells in the XLSX file contain single and double quotes. When the script hits such cells, the output CSV comes out empty. How do I escape the single or double quotes? I tried, unsuccessfully, passing Dialect.doublequote = False (and True) as a third argument to the writer() call. Please help.
Thanks
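For reference, a minimal sketch of how the csv module treats embedded quotes (Python 3 here; the file name is just a placeholder): with the default doublequote=True the writer doubles any embedded double quote itself, and single quotes never need escaping; an escapechar is only required if you turn doublequoting off.

import csv

with open('quotes.csv', 'w', newline='', encoding='utf-8') as f:
    # default behaviour: an embedded " becomes "" inside a quoted field
    wr = csv.writer(f, quoting=csv.QUOTE_ALL, doublequote=True)
    wr.writerow(["it's a \"quoted\" value", "plain"])
    # only when doublequote=False must an escapechar be supplied
    wr_esc = csv.writer(f, quoting=csv.QUOTE_ALL, doublequote=False, escapechar='\\')
    wr_esc.writerow(['another "quoted" value'])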
I am trying to convert a CSV file to a .xlsx file, where the source CSV file is saved on my Desktop. I want the output file to be saved to my Desktop.
I have tried the code below. However, I am getting a 'file not found' error and a 'create the parser' error, and I do not know what these errors mean.
I seek:
1. help to fix the script, and
2. help understanding the causes of the problem.
import pandas as pd
read_file = pd.read_csv(r'C:\Users\anthonyedwards\Desktop\credit_card_input_data.csv')
read_file.to_excel(r'C:\Users\anthonyedwards\Desktop\credit_card_output_data.xlsx', index = None, header=True)
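As a hedged first check (the paths below are the ones from the question): the 'file not found' part usually just means pandas cannot find the CSV at that exact path, and writing .xlsx additionally needs an Excel writer engine such as openpyxl installed.

import os
import pandas as pd

src = r'C:\Users\anthonyedwards\Desktop\credit_card_input_data.csv'
dst = r'C:\Users\anthonyedwards\Desktop\credit_card_output_data.xlsx'

# confirm the source file really is at that path before converting
print(os.path.exists(src))

if os.path.exists(src):
    read_file = pd.read_csv(src)
    read_file.to_excel(dst, index=False, header=True)  # needs openpyxl (or xlsxwriter) installed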
Here's an example using xlsxwriter:
import os
import glob
import csv
from xlsxwriter.workbook import Workbook

for csvfile in glob.glob(os.path.join('.', 'file.csv')):
    workbook = Workbook(csvfile[:-4] + '.xlsx')
    worksheet = workbook.add_worksheet()
    with open(csvfile, 'rt', encoding='utf8') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader):
            for c, col in enumerate(row):
                worksheet.write(r, c, col)
    workbook.close()
FYI, there is also a package called openpyxl that can read/write Excel 2007 xlsx/xlsm files.
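As a rough sketch of that alternative, assuming the same placeholder file names as above, the openpyxl version of the CSV-to-xlsx conversion looks roughly like this:

import csv
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# append each CSV row as a worksheet row
with open('file.csv', 'rt', encoding='utf8') as f:
    for row in csv.reader(f):
        ws.append(row)

wb.save('file.xlsx')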
I have quite a lot of xlsx files, and it is a pain to convert them one by one to tab-delimited files.
I would like to know if there is a way to do this with Python. Here is what I found and what I tried to do, without success.
I found this and tried the solution, but it did not work: Mass Convert .xls and .xlsx to .txt (Tab Delimited) on a Mac.
I also tried to do it for one file to see how it works, but with no success:
#!/usr/bin/python
import xlrd
import csv

def main():
    # I open the xlsx file
    myfile = xlrd.open_workbook('myfile.xlsx')
    # I don't know the name of the sheet
    mysheet = myfile.sheet_by_index(0)
    # I open the output csv
    myCsvfile = open('my.csv', 'wb')
    # I write the file into it
    wr = csv.writer(myCsvfile, delimiter="\t")
    for rownum in xrange(mysheet.nrows):
        wr.writerow(mysheet.row_values(rownum))
    myCsvfile.close()

if __name__ == '__main__':
    main()
No real need for the main function.
I'm not sure about your indentation problems, but this is how I would write what you have. (And it should work, according to the first comment above.)
#!/usr/bin/python
import xlrd
import csv

# open the output csv
with open('my.csv', 'wb') as myCsvfile:
    # define a writer
    wr = csv.writer(myCsvfile, delimiter="\t")

    # open the xlsx file
    myfile = xlrd.open_workbook('myfile.xlsx')
    # get a sheet
    mysheet = myfile.sheet_by_index(0)

    # write the rows
    for rownum in xrange(mysheet.nrows):
        wr.writerow(mysheet.row_values(rownum))
Why go with so much pain when you can do it in 3 lines:
import pandas as pd
file = pd.read_excel('myfile.xlsx')
file.to_csv('myfile.txt', sep="\t", index=False)
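Since the original problem was a whole folder of xlsx files, here is a hedged sketch of looping that pandas call over every xlsx in the current directory (the directory and the .txt extension are assumptions, not from the question):

import glob
import os
import pandas as pd

# convert every .xlsx in the current directory to a tab-delimited .txt
for path in glob.glob('*.xlsx'):
    df = pd.read_excel(path)
    out = os.path.splitext(path)[0] + '.txt'
    df.to_csv(out, sep='\t', index=False)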
I've written a simple script in Python using the xlrd module that reads data from an xlsx file and writes it to CSV. When I try to write the CSV without quoting, I get the error below:
Error: need to escape, but no escapechar set
With reference to question 23296356 on SO, I've tried setting quotechar to an empty string to fix the error, but that did not fix the issue. What am I missing here? Below is the code snippet that I've been running:
import xlrd
import csv

wb = xlrd.open_workbook('testfile.xlsx')
lisT = wb.sheet_names()
lLength = len(lisT)
for i in range(0, lLength - 1):
    sh = wb.sheet_by_name(lisT[i])
    shfile = lisT[i] + ".csv"
    csvoutfile = open(shfile, 'wb')
    wr = csv.writer(csvoutfile, quoting=csv.QUOTE_NONE, quotechar='')  # facing the issue here
    for rownum in xrange(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    csvoutfile.close()
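For context, a minimal sketch of what the csv module expects with quoting=csv.QUOTE_NONE (the file name is a placeholder): since nothing can be quoted, any field containing the delimiter or quote character must be escaped, so an escapechar has to be set; an empty quotechar does not help.

import csv

with open('out.csv', 'w', newline='') as f:
    wr = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\')
    # the embedded comma is written as \, instead of raising
    # "need to escape, but no escapechar set"
    wr.writerow(['a,b', 'plain value'])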
Related questions:
1. Error in converting txt to xlsx using python
Converting txt to xlsx while setting the cell property for number cells as number
My code is
import csv
import openpyxl
import sys


def convert(input_path, output_path):
    """
    Read a csv file (with no quoting), and save its contents in an excel file.
    """
    wb = openpyxl.Workbook()
    ws = wb.worksheets[0]
    with open(input_path) as f:
        reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row_index, row in enumerate(reader, 1):
            for col_index, value in enumerate(row, 1):
                ws.cell(row=row_index, column=col_index).value = value
    print 'hello world'
    wb.save(output_path)
    print 'hello world2'


def main():
    try:
        input_path, output_path = sys.argv[1:]
    except ValueError:
        print 'Usage: python %s input_path output_path' % (sys.argv[0],)
    else:
        convert(input_path, output_path)


if __name__ == '__main__':
    main()
This code works, except for some input files. I couldn't find what the difference is between the input txt that causes this problem and input txt that doesn't.
My first guess was encoding. I tried changing the encoding of the input file to UTF-8 and UTF-8 with BOM. But this failed.
My second guess was that it simply uses too much memory. But my computer has an SSD and 32 GB of RAM, so perhaps this code is not fully utilizing that RAM?
How can I fix this?
Edit: I added the lines
print 'hello world'
and
print 'hello world2'
to check whether everything before 'hello world' runs correctly.
The code does print 'hello world', but not 'hello world2', so it really seems likely that
wb.save(output_path)
is causing the problem.
openpyxl has optimised modes for reading and writing large files.
wb = Workbook(write_only=True) will enable this.
I'd also recommend that you install lxml for speed. This is all covered in the documentation.
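A hedged sketch of that write-only mode applied to this task (paths are placeholders): a write-only workbook streams rows to disk, so you append whole rows rather than assigning individual cells.

import csv
from openpyxl import Workbook

wb = Workbook(write_only=True)
ws = wb.create_sheet()

with open('input.txt') as f:
    reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        ws.append(row)

wb.save('output.xlsx')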
Below are three alternatives:
RANGE FOR LOOP
Possibly the two enumerate() calls add some overhead, since indexing must occur inside a nested for loop. Consider reading the csv.reader content into a list (which is subscriptable) and using range(). Though admittedly even this may not be efficient, since the whole file is first collected into a list in memory.
with open(input_path) as f:
    reader = csv.reader(f)
    rows = []
    for data in reader:
        rows.append(data)

# openpyxl cells are 1-indexed, so offset the 0-based list indices
for i in range(len(rows)):
    for j in range(len(rows[i])):
        ws.cell(row=i + 1, column=j + 1).value = rows[i][j]
OPTIMIZED WRITER
OpenPyXL even warns that scrolling through cells, even without assigning values, keeps them in memory. As a solution, you can use the optimized (write-only) writer with the rows list produced above from csv.reader. This route appends entire rows to a write-only workbook instance:
from openpyxl import Workbook

wb = Workbook(write_only=True)
ws = wb.create_sheet()

# append each row from the rows list built above
for irow in rows:
    ws.append(['%s' % v for v in irow])

wb.save(r'C:\Path\To\Outputfile.xlsx')
WIN32COM LIBRARY
Finally, consider using the win32com library (part of the pywin32 package), where you open the csv in Excel and save it as an xlsx or xls workbook. Do note this package is only for Python installations on Windows.
import win32com.client as win32
excel = win32.Dispatch('Excel.Application')
# OPEN CSV DIRECTLY INSIDE EXCEL
wb = excel.Workbooks.Open(input_path)
excel.Visible = False
outxl=r'C:\Path\To\Outputfile.xlsx'
# SAVE EXCEL AS xlOpenXMLWorkbook TYPE (51)
wb.SaveAs(outxl, FileFormat=51)
wb.Close(False)
excel.Quit()
Here are a few points you can consider:
Check the /tmp folder, the default folder where temporary files are created; your code may be using up all the space in that folder. Either enlarge that folder or change the temporary file path when creating the workbook.
I used the in-memory option for my task and it worked.
Below is my code:
#!/usr/bin/python
import os
import csv
import io
import sys
import traceback
from xlsxwriter.workbook import Workbook

fileNames = sys.argv[1]
try:
    f = open(fileNames, mode='r')
    workbook = Workbook(fileNames[:-4] + '.xlsx', {'in_memory': True})
    worksheet = workbook.add_worksheet()
    workbook.use_zip64()
    rowCnt = 0
    # Create the bold style for the header row
    for line in f:
        rowCnt = rowCnt + 1
        row = line.split("\001")
        for j in range(len(row)):
            worksheet.write(rowCnt, j, row[j].strip())
    f.close()
    workbook.close()
    print ('success')
except ValueError:
    print ('failure')
I'm using Python 3.3 with xlrd and csv modules to convert an xls file to csv. This is my code:
import xlrd
import csv

def csv_from_excel():
    wb = xlrd.open_workbook('MySpreadsheet.xls')
    sh = wb.sheet_by_name('Sheet1')
    your_csv_file = open('test_output.csv', 'wb')
    wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
    for rownum in range(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    your_csv_file.close()
With that I am receiving this error: TypeError: 'str' does not support the buffer interface
I tried changing the encoding and replaced the line within the loop with this:
wr.writerow(bytes(sh.row_values(rownum),'UTF-8'))
But I get this error: TypeError: encoding or errors without a string argument
Anyone know what may be going wrong?
Try this:
import xlrd
import csv

def csv_from_excel():
    wb = xlrd.open_workbook('MySpreadsheet.xlsx')
    sh = wb.sheet_by_name('Sheet1')
    your_csv_file = open('output.csv', 'w', encoding='utf8')
    wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
    for rownum in range(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    your_csv_file.close()
I recommend using the pandas library for this task:
import pandas as pd
xls = pd.ExcelFile('file.xlsx')
df = xls.parse(sheetname="Sheet1", index_col=None, na_values=['NA'])
df.to_csv('file.csv')
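Note that newer pandas releases spell the keyword sheet_name (sheetname was deprecated); a rough equivalent of the same conversion on a current pandas would be:

import pandas as pd

df = pd.read_excel('file.xlsx', sheet_name='Sheet1', index_col=None, na_values=['NA'])
df.to_csv('file.csv')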
Your problem is basically that you open your file with Python 2 semantics. Python 3 is locale-aware, so if you just want to write text to this file (and you do), open it as a text file with the right options:
your_csv_file = open('test_output.csv', 'w', encoding='utf-8', newline='')
The encoding parameter specifies the output encoding (it does not have to be utf-8), and the Python 3 documentation for csv expressly says that you should specify newline='' for csv file objects.
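Putting that together with the question's code, a minimal sketch of the corrected function (the spreadsheet, sheet, and output names are the ones from the question):

import xlrd
import csv

def csv_from_excel():
    wb = xlrd.open_workbook('MySpreadsheet.xls')
    sh = wb.sheet_by_name('Sheet1')
    # text mode, explicit encoding, and newline='' as the csv docs recommend
    with open('test_output.csv', 'w', encoding='utf-8', newline='') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))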
A quicker way to do it with pandas:
import pandas as pd
xls_file = pd.read_excel('MySpreadsheet.xls', sheetname="Sheet1")
# index=False stops pandas from writing its row index as an extra first column in the CSV
xls_file.to_csv('MySpreadsheet.csv', index=False)
You can read more about pandas.read_excel here.