Converting CSV file to .xlsx file - python

I am trying to convert a CSV file to a .xlsx file, where the source CSV file is saved on my Desktop. I want the output file to be saved to my Desktop.
I have tried the below code. However, I am getting a 'file not found' error and 'create the parser' error. I do not know what these errors mean.
I am looking for help fixing the script and understanding the causes of the problem.
import pandas as pd
read_file = pd.read_csv(r'C:\Users\anthonyedwards\Desktop\credit_card_input_data.csv')
read_file.to_excel(r'C:\Users\anthonyedwards\Desktop\credit_card_output_data.xlsx', index = None, header=True)

Here's an example using xlsxwriter:
import os
import glob
import csv
from xlsxwriter.workbook import Workbook
for csvfile in glob.glob(os.path.join('.', 'file.csv')):
    # Name the workbook after the CSV file, swapping the extension
    workbook = Workbook(csvfile[:-4] + '.xlsx')
    worksheet = workbook.add_worksheet()
    with open(csvfile, 'rt', encoding='utf8') as f:
        reader = csv.reader(f)
        # Copy every cell to the same row/column position in the worksheet
        for r, row in enumerate(reader):
            for c, col in enumerate(row):
                worksheet.write(r, c, col)
    workbook.close()
FYI, there is also a package called openpyxl, that can read/write Excel 2007 xlsx/xlsm files.
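On the 'file not found' part of the question: that error generally means the path handed to read_csv does not point to an existing file, for example because of a typo, a different user folder, or a hidden extension. The following is a minimal sketch for checking a path before reading it. The helper name describe_csv_path is hypothetical, and it only uses the standard library.

```python
from pathlib import Path


def describe_csv_path(path):
    """Return a short diagnostic string for a CSV path (hypothetical helper)."""
    p = Path(path)
    if p.is_file():
        return f"found: {p}"
    if not p.parent.is_dir():
        return f"folder does not exist: {p.parent}"
    # The folder exists, so list the CSV-like files that are really there;
    # a typo or a hidden extension such as 'data.csv.txt' shows up here.
    nearby = sorted(f.name for f in p.parent.iterdir() if '.csv' in f.name.lower())
    return f"no such file: {p.name}; CSV-like files in that folder: {nearby}"
```

Calling describe_csv_path with the same raw string that is passed to pd.read_csv shows whether the folder or the file name is the part that does not match.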

Related

What's the equivalent of reader = csv.reader(...) for xlsx sheets?

I have a script with
path=r"mypath\myfile.xlsx"
with open(path) as f:
    reader = csv.reader(f)
but it won't work because the code is trying to open an xlsx file with a module made for csv files.
So, does an expression equivalent for xlsx files exist?
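The equivalent of the highlighted code for xlsx sheets is:
import pandas as pd
path = r"mypath\myfile.xlsx"
# An .xlsx file is binary, so open it with 'rb'
# (pd.read_excel also accepts the path directly)
with open(path, 'rb') as f:
    reader = pd.read_excel(f)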
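If a row-by-row iterator in the style of csv.reader is preferred over a DataFrame, openpyxl can provide one. This is a minimal sketch, assuming openpyxl is installed; the function name iter_xlsx_rows and its sheet_index parameter are hypothetical.

```python
def iter_xlsx_rows(path, sheet_index=0):
    """Yield each row of one worksheet as a list of cell values."""
    # Assumption: openpyxl is installed. It is imported here so the
    # dependency is visible at the point of use.
    from openpyxl import load_workbook

    # read_only streams the sheet; data_only returns cached values
    # instead of formula strings.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook.worksheets[sheet_index]
        for row in sheet.iter_rows(values_only=True):
            yield list(row)
    finally:
        workbook.close()
```

It would be used like csv.reader: for row in iter_xlsx_rows(r"mypath\myfile.xlsx"): print(row). Unlike csv.reader, the values keep their types (numbers, dates, None for empty cells) rather than arriving as strings.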

Python - XLRDError: Unsupported format, or corrupt file: Expected BOF record

I am trying to open an Excel file that was given to me for my project; the file is an export from an SAP system. When I try to open it with pandas, I get the following error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\xff\xfe\r\x00\n\x00\r\x00'
The following is my code:
import pandas as pd
# To open an excel file
df = pd.ExcelFile('myexcel.xls').parse('Sheet1')
I don't know whether this will work for you, but it worked for me once, so you could try the following:
from __future__ import unicode_literals
from xlwt import Workbook
import io
filename = r'myexcel.xls'
# Opening the file using 'utf-16' encoding
file1 = io.open(filename, "r", encoding="utf-16")
data = file1.readlines()
# Creating a workbook object
xldoc = Workbook()
# Adding a sheet to the workbook object
sheet = xldoc.add_sheet("Sheet1", cell_overwrite_ok=True)
# Iterating and saving the data to sheet
for i, row in enumerate(data):
    # Two things are done here:
    # removing the '\n' left at the end of each line by io.open,
    # and splitting the line on '\t' to get the cell values
    for j, val in enumerate(row.replace('\n', '').split('\t')):
        sheet.write(i, j, val)
# Saving the file as an excel file
xldoc.save('myexcel.xls')
I faced the same xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; error and solved it by writing an XML to XLSX converter. You can then call pd.ExcelFile on the converted .xlsx file. The reason is that pandas uses xlrd to read Excel files, and xlrd does not support the XML Spreadsheet format (*.xml), which is neither XLS nor XLSX.
import pandas as pd
from bs4 import BeautifulSoup
def convert_to_xlsx():
    with open('sample.xls') as xml_file:
        soup = BeautifulSoup(xml_file.read(), 'xml')
        writer = pd.ExcelWriter('sample.xlsx')
        for sheet in soup.findAll('Worksheet'):
            sheet_as_list = []
            for row in sheet.findAll('Row'):
                sheet_as_list.append([cell.Data.text if cell.Data else '' for cell in row.findAll('Cell')])
            pd.DataFrame(sheet_as_list).to_excel(writer, sheet_name=sheet.attrs['ss:Name'], index=False, header=False)
        # close() writes the workbook to disk (older pandas versions used writer.save())
        writer.close()
What worked for me was applying this advice:
How to cope with an XLRDError
There you will also find an explanation that applied in my case: the file had not been saved in a proper Excel format. When I opened the .xls file, Excel offered to save it as HTML. I saved it as .xlsx instead, and that solved the problem.
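The answers above all come down to the file not really being a binary Excel workbook despite its .xls name. One way to check is to look at the first bytes of the file. The following is a rough sketch using only the standard library; the function name sniff_spreadsheet_format and the returned labels are hypothetical, and the byte signatures cover only the common cases mentioned in this thread.

```python
def sniff_spreadsheet_format(path):
    """Guess what a so-called Excel file really contains from its first bytes."""
    with open(path, 'rb') as f:
        head = f.read(64)
    if head.startswith(b'PK\x03\x04'):
        return 'zip container (typical for .xlsx)'
    if head.startswith(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'):
        return 'OLE2 compound file (typical for binary .xls)'
    if head.startswith((b'\xff\xfe', b'\xfe\xff')):
        # A UTF-16 byte order mark, as in the error message quoted above
        return 'UTF-16 text (for example a tab-delimited export)'
    # Skip a UTF-8 byte order mark and leading whitespace before checking markup
    text = head.lstrip(b'\xef\xbb\xbf \t\r\n').lower()
    if text.startswith(b'<?xml'):
        return 'XML (for example an XML Spreadsheet)'
    if text.startswith((b'<html', b'<!doctype html', b'<table')):
        return 'HTML'
    return 'unknown'
```

The bytes in the question's error message, '\xff\xfe\r\x00\n\x00\r\x00', start with a UTF-16 byte order mark, which is consistent with the first answer reading the file as UTF-16 tab-delimited text.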

how to convert xlsx to tab delimited files

I have quite a lot of xlsx files, and converting them one by one to tab-delimited files is a pain.
I would like to know if there is a way to do this with Python. Here is what I found and what I tried, without success.
I found this solution and tried it, but it did not work: Mass Convert .xls and .xlsx to .txt (Tab Delimited) on a Mac
I also tried to do it for a single file to see how it works, but with no success:
#!/usr/bin/python
import xlrd
import csv
def main():
    # I open the xlsx file
    myfile = xlrd.open_workbook('myfile.xlsx')
    # I don't know the name of sheet
    mysheet = myfile.sheet_by_index(0)
    # I open the output csv
    myCsvfile = open('my.csv', 'wb')
    # I write the file into it
    wr = csv.writer(myCsvfile, delimiter="\t")
    for rownum in xrange(mysheet.nrows):
        wr.writerow(mysheet.row_values(rownum))
    myCsvfile.close()
if __name__ == '__main__':
    main()
No real need for the main function.
And not sure about your indentation problems, but this is how I would write what you have. (And should work, according to first comment above)
#!/usr/bin/python
import xlrd
import csv
# open the output csv
with open('my.csv', 'wb') as myCsvfile:
    # define a writer
    wr = csv.writer(myCsvfile, delimiter="\t")
    # open the xlsx file
    myfile = xlrd.open_workbook('myfile.xlsx')
    # get a sheet
    mysheet = myfile.sheet_by_index(0)
    # write the rows
    for rownum in xrange(mysheet.nrows):
        wr.writerow(mysheet.row_values(rownum))
Why go through so much pain when you can do it in three lines:
import pandas as pd
file = pd.read_excel('myfile.xlsx')
# Write to a new name; reusing 'myfile.xlsx' would overwrite the source workbook
file.to_csv('myfile.txt',
            sep="\t",
            index=False)
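Since the question is about many files rather than one, a loop over the folder may help. The following is a minimal sketch, not a tested batch tool: the names xlsx_to_tsv and convert_folder are hypothetical, pandas with an .xlsx engine such as openpyxl is assumed to be installed, and only the first sheet of each workbook is converted. Files whose names start with ~$ are skipped on the assumption that they are Excel lock files.

```python
from pathlib import Path


def xlsx_to_tsv(xlsx_path, tsv_path):
    """Convert the first sheet of one workbook to a tab-delimited file."""
    # Assumption: pandas (plus an .xlsx engine such as openpyxl) is installed.
    import pandas as pd

    pd.read_excel(xlsx_path).to_csv(tsv_path, sep='\t', index=False)


def convert_folder(folder, convert_one=xlsx_to_tsv):
    """Convert every .xlsx file in `folder`; return the list of files written."""
    written = []
    for xlsx_path in sorted(Path(folder).glob('*.xlsx')):
        if xlsx_path.name.startswith('~$'):
            continue  # assumed to be a temporary Excel lock file
        tsv_path = xlsx_path.with_suffix('.txt')  # never reuse the .xlsx name
        convert_one(xlsx_path, tsv_path)
        written.append(tsv_path)
    return written
```

Passing the per-file converter as a parameter is a design choice made here so that the loop can be tried out with a stand-in function before pandas touches any real data.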

Converting xls to csv in Python 3 using xlrd

I'm using Python 3.3 with xlrd and csv modules to convert an xls file to csv. This is my code:
import xlrd
import csv
def csv_from_excel():
    wb = xlrd.open_workbook('MySpreadsheet.xls')
    sh = wb.sheet_by_name('Sheet1')
    your_csv_file = open('test_output.csv', 'wb')
    wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
    for rownum in range(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    your_csv_file.close()
With that I am receiving this error: TypeError: 'str' does not support the buffer interface
I tried changing the encoding and replaced the line within the loop with this:
wr.writerow(bytes(sh.row_values(rownum),'UTF-8'))
But I get this error: TypeError: encoding or errors without a string argument
Anyone know what may be going wrong?
Try this
import xlrd
import csv
def csv_from_excel():
    wb = xlrd.open_workbook('MySpreadsheet.xlsx')
    sh = wb.sheet_by_name('Sheet1')
    your_csv_file = open('output.csv', 'w', encoding='utf8')
    wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
    for rownum in range(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    your_csv_file.close()
I recommend using the pandas library for this task:
import pandas as pd
xls = pd.ExcelFile('file.xlsx')
# Current pandas versions spell this keyword sheet_name (formerly sheetname)
df = xls.parse(sheet_name="Sheet1", index_col=None, na_values=['NA'])
df.to_csv('file.csv')
Your problem is basically that you open your file with Python2 semantics. Python3 is locale-aware, so if you just want to write text to this file (and you do), open it as a text file with the right options:
your_csv_file = open('test_output.csv', 'w', encoding='utf-8', newline='')
The encoding parameter specifies the output encoding (it does not have to be utf-8) and the Python3 documentation for csv expressly says that you should specify newline='' for csv file objects.
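Put together, the advice above might look like the sketch below. The helper name write_rows_as_csv is hypothetical, and it takes plain rows so that it does not depend on xlrd; with the question's sheet object, the rows would be (sh.row_values(i) for i in range(sh.nrows)).

```python
import csv


def write_rows_as_csv(rows, csv_path, encoding='utf-8'):
    """Write an iterable of rows to `csv_path` as fully quoted CSV."""
    # Text mode ('w') because csv.writer produces str in Python 3;
    # newline='' lets the csv module control the line endings itself.
    with open(csv_path, 'w', encoding=encoding, newline='') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerows(rows)
```

Using a with block also closes the file if writing fails partway, which the explicit close() call in the question's code would not do.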
A quicker way to do it with pandas:
import pandas as pd
xls_file = pd.read_excel('MySpreadsheet.xls', sheet_name="Sheet1")
xls_file.to_csv('MySpreadsheet.csv', index=False)
# index=False leaves out the DataFrame's row index, which to_csv would otherwise write as the first column.
You can read more about pandas.read_excel in the pandas documentation.

Trying to convert XLS to CSV in Python

I'm trying to convert .xls to .csv, but when I run the code below nothing happens.
import xlrd
import csv
def csv_from_excel():
    wb = xlrd.open_workbook('d://Documents and Settings//tdrub//Desktop//TreinamentoPython XLS-CSV//Teste.xls')
    sh = wb.sheet_by_name('Sheet1')
    Agencia = open('d://Documents and Settings//tdrub//Desktop//Agencia.csv', 'wb')
    wr = csv.writer(Agencia, quoting=csv.QUOTE_ALL)
    for rownum in xrange(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    Agencia.close()
The directory is correct and the sheet name is correct, but when I run the code no .csv file is created.
I would appreciate it if someone could help me :)
import xlrd
import csv
# 'wb' is the Python 2 idiom; on Python 3 use open('out.csv', 'w', newline='')
with open('out.csv', 'wb') as f:
    wr = csv.writer(f, quoting=csv.QUOTE_ALL)
    book = xlrd.open_workbook("F.xls")
    # Write the rows of every sheet, one after another
    for sheet in book.sheets():
        for row in range(sheet.nrows):
            wr.writerow(sheet.row_values(row))
