Convert a TSV file to XLS/XLSX using Python

I want to convert a file in TSV format to XLS/XLSX.
I tried using
os.rename("sample.tsv","sample.xlsx")
But the resulting file is corrupted, because renaming changes only the extension and not the contents. Is there another way to do it?

Here is a simple example of converting TSV to XLSX using XlsxWriter and the core csv module:
import csv
from xlsxwriter.workbook import Workbook
# Add some command-line logic to read the file names.
tsv_file = 'sample.tsv'
xlsx_file = 'sample.xlsx'
# Create an XlsxWriter workbook object and add a worksheet.
workbook = Workbook(xlsx_file)
worksheet = workbook.add_worksheet()
# Open the TSV file in text mode (Python 3) and create a reader for it.
with open(tsv_file, 'r', newline='') as tsv_handle:
    tsv_reader = csv.reader(tsv_handle, delimiter='\t')
    # Read the row data from the TSV file and write it to the XLSX file.
    for row, data in enumerate(tsv_reader):
        worksheet.write_row(row, 0, data)
# Close the XLSX file.
workbook.close()

You need to:
Read the data from the TSV file.
Convert the values to the types you want them to be.
Write them to an Excel file with openpyxl (for xlsx) or xlwt (for xls).

import csv
from xlsxwriter.workbook import Workbook
# Add some command-line logic to read the file names.
tsv_file = 'sample.tsv'
xlsx_file = 'sample.xlsx'
# Create an XlsxWriter workbook object and add a worksheet.
workbook = Workbook(xlsx_file)
worksheet = workbook.add_worksheet()
# Create a TSV file reader.
tsv_reader = csv.reader(open(tsv_file, 'rt'), delimiter="\t")
# Read the row data from the TSV file and write it to the XLSX file.
for row, data in enumerate(tsv_reader):
    worksheet.write_row(row, 0, data)
# Close the XLSX file.
workbook.close()
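The three steps listed in this answer (read, convert, write) can also be sketched with openpyxl, which the answer names but does not show. This is only a minimal sketch: the helper names `coerce`, `read_tsv` and `write_xlsx` are made up for illustration, the file is assumed to be UTF-8, and the number detection is a naive heuristic (for instance, a field such as "nan" would be turned into a float).

```python
import csv


def coerce(text):
    """Turn a TSV field into int or float when it parses as one, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def read_tsv(path):
    """Steps 1 and 2: read every row and convert each field."""
    with open(path, newline='', encoding='utf-8') as handle:
        return [[coerce(field) for field in row]
                for row in csv.reader(handle, delimiter='\t')]


def write_xlsx(rows, path):
    """Step 3: write the converted rows to an .xlsx file with openpyxl."""
    # Imported here so the two helpers above work without openpyxl installed.
    from openpyxl import Workbook

    workbook = Workbook()
    worksheet = workbook.active
    for row in rows:
        worksheet.append(row)  # one list per spreadsheet row
    workbook.save(path)
```

With these helpers the conversion would be `write_xlsx(read_tsv('sample.tsv'), 'sample.xlsx')`. Converting the fields first means numbers end up as numeric cells rather than as text.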

Related

Converting CSV file to .xlsx file

I am trying to convert a CSV file to a .xlsx file, where the source CSV file is saved on my Desktop. I want the output file to be saved to my Desktop.
I have tried the below code. However, I am getting a 'file not found' error and 'create the parser' error. I do not know what these errors mean.
I seek:
Help to fix the script and
Help understanding the causes of the problem.
import pandas as pd
read_file = pd.read_csv(r'C:\Users\anthonyedwards\Desktop\credit_card_input_data.csv')
read_file.to_excel(r'C:\Users\anthonyedwards\Desktop\credit_card_output_data.xlsx', index = None, header=True)
Here's an example using xlsxwriter:
import os
import glob
import csv
from xlsxwriter.workbook import Workbook
for csvfile in glob.glob(os.path.join('.', 'file.csv')):
    workbook = Workbook(csvfile[:-4] + '.xlsx')
    worksheet = workbook.add_worksheet()
    with open(csvfile, 'rt', encoding='utf8') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader):
            for c, col in enumerate(row):
                worksheet.write(r, c, col)
    workbook.close()
FYI, there is also a package called openpyxl, that can read/write Excel 2007 xlsx/xlsm files.
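On the "causes" part of the question: a 'file not found' error from pd.read_csv generally means that the path passed in does not point at an existing file (a typo in the user name or file name, or a different extension than expected). Assuming that is what happens here, a small check before reading can make the mistake easier to spot. The helper names `find_csv` and `csv_to_xlsx` below are hypothetical, not part of pandas.

```python
from pathlib import Path


def find_csv(folder, filename):
    """Return the path to `filename` inside `folder`, or raise with a hint."""
    path = Path(folder) / filename
    if path.is_file():
        return path
    # List the CSV files that do exist to make a typo easier to spot.
    candidates = sorted(p.name for p in Path(folder).glob('*.csv'))
    raise FileNotFoundError(f'{path} not found; CSV files in folder: {candidates}')


def csv_to_xlsx(csv_path):
    """Write an .xlsx file next to the CSV and return its path."""
    # pandas (plus an Excel engine such as openpyxl) is imported lazily,
    # so find_csv stays usable without it.
    import pandas as pd

    xlsx_path = Path(csv_path).with_suffix('.xlsx')
    pd.read_csv(csv_path).to_excel(xlsx_path, index=False)
    return xlsx_path
```

A call could look like `csv_to_xlsx(find_csv(Path.home() / 'Desktop', 'credit_card_input_data.csv'))`. That the Desktop folder lives directly under the home directory is an assumption; on some Windows setups it is redirected elsewhere.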

Python Pandas XLRDError when reading .xls files

I'm having a problem with reading .xls files in Pandas.
Here's the code
df = pd.read_excel('sample.xls')
And the output states,
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xff\xfeD\x00A\x00T\x00'
Has anyone experienced the same issue? How can I fix it?
# Make all string literals in this module unicode (only needed on Python 2)
from __future__ import unicode_literals
# Used to save the file as an Excel workbook
# Need to install this library
from xlwt import Workbook
# Used to open the "corrupt" Excel file
import io
filename = r'sample.xls'
# Open the file using 'utf-16' encoding
file1 = io.open(filename, "r", encoding="utf-16")
data = file1.readlines()
file1.close()
# Create a workbook object
xldoc = Workbook()
# Add a sheet to the workbook object
sheet = xldoc.add_sheet("Sheet1", cell_overwrite_ok=True)
# Iterate over the lines and save the data to the sheet
for i, row in enumerate(data):
    # Two things are done here:
    # remove the '\n' that comes with each line read via io.open,
    # then split the line on '\t' to get the cell values
    for j, val in enumerate(row.replace('\n', '').split('\t')):
        sheet.write(i, j, val)
# Save the file as an Excel file
xldoc.save('1.xls')
Credits to this Medium Article
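The bytes quoted in the error, b'\xff\xfe...', are a UTF-16 byte order mark, which suggests the file is tab-separated text saved with an .xls extension and not a binary workbook. That is why the code above opens it as 'utf-16' text. A rough way to check what a file really is before choosing a reader is to look at its first bytes. This is only a sketch: `sniff_format` and `read_utf16_tsv` are hypothetical helper names, and the check covers just the three cases that matter here.

```python
import codecs
import csv

OLE2_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # legacy binary .xls container
ZIP_MAGIC = b'PK\x03\x04'                          # .xlsx is a ZIP archive


def sniff_format(path):
    """Guess the real format of a file from its first bytes."""
    with open(path, 'rb') as handle:
        head = handle.read(8)
    if head.startswith(OLE2_MAGIC):
        return 'xls'
    if head.startswith(ZIP_MAGIC):
        return 'xlsx'
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16 text'
    return 'unknown'


def read_utf16_tsv(path):
    """Read a UTF-16, tab-separated text file into a list of rows."""
    with open(path, newline='', encoding='utf-16') as handle:
        return list(csv.reader(handle, delimiter='\t'))
```

If `sniff_format('sample.xls')` reports 'utf-16 text', the rows from `read_utf16_tsv` can be written to a real workbook with xlwt as shown above, or with any other Excel writer.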

Getting the Excel file after df.to_excel(...) with pandas

I am using Pyrebase to upload my files to Firebase.
I have a DataFrame df and convert it to an Excel File as follows:
writer = ExcelWriter('results.xlsx')
excelFile = df.to_excel(writer,'Sheet1')
print(excelFile)
# Save to firebase
childRef = "path/to/results.xlsx"
storage = firebase.storage()
storage.child(childRef).put(excelFile)
However, this stores the Excel file as an Office Spreadsheet with zero bytes. If I run writer.save() then I do get the appropriate filetype (xlsx), but it is stored on my Server (which I want to avoid). How can I generate the right filetype as one would do with writer.save()?
Note: print(excelFile) returns None
It can be solved by building the workbook in memory:
from io import BytesIO
import pandas as pd
# init writer
bio = BytesIO()
writer = pd.ExcelWriter(bio, engine='xlsxwriter')
filename = "output.xlsx"
# sheets
dfValue.to_excel(writer, sheet_name="sheetname")
# save the workbook (writer.save() on pandas versions before 2.0)
writer.close()
bio.seek(0)
# get the excel file (answers my question)
excelFile = bio.read()
# save the excelfile to firebase
# see also issue: https://github.com/thisbejim/Pyrebase/issues/142
childRef = "/path/to/" + filename
storage = firebase.storage()
storage.child(childRef).put(excelFile)
fileUrl = storage.child(childRef).get_url(None)
According to the documentation, you should add
writer.save()
(pandas 2.0 removed ExcelWriter.save(); on those versions call writer.close() instead.)

How to convert xlsx to tab-delimited files

I have quite a lot of xlsx files, and converting them one by one to tab-delimited files is a pain.
I would like to know whether there is a way to do this with Python. Here is what I found and what I tried, without success.
I found this solution and tried it, but it did not work: Mass Convert .xls and .xlsx to .txt (Tab Delimited) on a Mac
I also tried to do it for a single file to see how it works, but with no success:
#!/usr/bin/python
import xlrd
import csv
def main():
    # I open the xlsx file
    myfile = xlrd.open_workbook('myfile.xlsx')
    # I don't know the name of sheet
    mysheet = myfile.sheet_by_index(0)
    # I open the output csv
    myCsvfile = open('my.csv', 'wb')
    # I write the file into it
    wr = csv.writer(myCsvfile, delimiter="\t")
    for rownum in xrange(mysheet.nrows):
        wr.writerow(mysheet.row_values(rownum))
    myCsvfile.close()
if __name__ == '__main__':
    main()
There is no real need for the main function.
I am not sure about your indentation problems, but this is how I would write what you have (and it should work, according to the first comment above).
#!/usr/bin/python
import xlrd
import csv
# open the output csv (text mode with newline='' on Python 3)
with open('my.csv', 'w', newline='') as myCsvfile:
    # define a writer
    wr = csv.writer(myCsvfile, delimiter="\t")
    # open the xlsx file (xlrd 2.0 and later reads only .xls; older versions also read .xlsx)
    myfile = xlrd.open_workbook('myfile.xlsx')
    # get a sheet
    mysheet = myfile.sheet_by_index(0)
    # write the rows
    for rownum in range(mysheet.nrows):
        wr.writerow(mysheet.row_values(rownum))
Why go through so much pain when you can do it in three lines (note the separate output name, so the source workbook is not overwritten):
import pandas as pd
file = pd.read_excel('myfile.xlsx')
file.to_csv('myfile.tsv', sep="\t", index=False)
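Since the question is about a whole batch of files, the pandas approach can be wrapped in a loop over a folder. This is a minimal sketch under a few assumptions: all workbooks sit in one folder, only the first sheet of each is wanted, and the helper names `tsv_path_for` and `convert_folder` are made up for illustration.

```python
from pathlib import Path


def tsv_path_for(xlsx_path, out_dir=None):
    """Map 'dir/name.xlsx' to 'dir/name.tsv' (or 'out_dir/name.tsv')."""
    xlsx_path = Path(xlsx_path)
    target_dir = Path(out_dir) if out_dir is not None else xlsx_path.parent
    return target_dir / (xlsx_path.stem + '.tsv')


def convert_folder(folder, out_dir=None):
    """Convert the first sheet of every .xlsx in `folder` to a .tsv file."""
    # Imported lazily; reading .xlsx with pandas also needs openpyxl installed.
    import pandas as pd

    written = []
    for xlsx_path in sorted(Path(folder).glob('*.xlsx')):
        target = tsv_path_for(xlsx_path, out_dir)
        # read_excel loads the first sheet by default
        pd.read_excel(xlsx_path).to_csv(target, sep='\t', index=False)
        written.append(target)
    return written
```

Calling `convert_folder('.')` would then write one .tsv file next to each workbook in the current directory and return the list of paths it wrote.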

How to write to an already opened Excel file using openpyxl

I opened an Excel file using the following code:
from openpyxl import load_workbook
wb = load_workbook('path of the file')
DriverTableSheet = wb.get_sheet_by_name(name = 'name of the sheet')
After that, I have to append some values to that Excel file. For that I used the following code:
DriverTableSheet.cell(row=1, column=2).value="value"
But it is not responding. Can you please explain how to write or append data to that Excel file and save it?
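One likely explanation, assuming the assignment itself runs without an error: openpyxl changes only the workbook object in memory, and nothing reaches the file until save() is called. A minimal sketch follows; the function name `write_and_save` and its parameters are hypothetical.

```python
from openpyxl import load_workbook


def write_and_save(path, sheet_name, row, column, value, extra_row=None):
    """Set one cell, optionally append a row, and save the workbook in place."""
    workbook = load_workbook(path)
    sheet = workbook[sheet_name]   # replaces the deprecated get_sheet_by_name
    sheet.cell(row=row, column=column).value = value
    if extra_row is not None:
        sheet.append(extra_row)    # adds a row below the last used row
    workbook.save(path)            # changes only reach the file here
```

For the example in the question this would be `write_and_save('path of the file', 'name of the sheet', 1, 2, 'value')`. If the same file is open in Excel at that moment, saving may fail because the file is locked, so it is safer to close it there first.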
