Converting xlsm to csv - python

I have this simple snippet which used to work well until today. It converts xlsm files into csv.
import xlrd
workbook = xlrd.open_workbook('T:/DataDump/3.26.17.xlsm')
for sheet in workbook.sheets():
with open('{}.csv'.format(sheet.name), 'w') as f:
writer = csv.writer(f)
writer.writerows(sheet.row_values(row) for row in range(sheet.nrows))
print ("CSV converted")
xlsm file:
Name Date Status
Python 12/15/2014 Manager
Pandas 10/17/2014 Senior
csv file:
Name Date Status
Python 12/15/2014 Manager
Pandas 10/17/2014 Senior
This snippet is providing me with the csv but with double spaces between the rows. How can I fix this please?

In Python 3,open f with the additional parameter newline=''
with open('{}.csv'.format(sheet.name), 'w', newline='') as f:

Related

python csv file with several columns

Hello i want to create a csv file in python with the standard library csv.
My Code:
csv_columns = ['receiptNr','category','date','allvalue', 'quantity', 'value']
dict_data = {'receiptNr': 293293, 'category': 'Sbudget' ,'date': '29.11.2020' ,'allvalue': '2.70' , 'quantity': '2 STK' , 'value': '1.35'}
csv_file = r"C:\Maturaprojekt\table.csv"
try:
with open(csv_file, 'w') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=csv_columns, lineterminator = '\n')
writer.writeheader()
writer.writerow(dict_data)
except IOError:
print("I/O error")
the output is:
output
but i want to have:
wantedfile
raw text data:
data
That's problem of delimiter.
The delimiter used in your csv file is probably not the same delimiter used in your spreadsheet program.
You can add a delimiter parameter to csv.DictWriter to change the delimiter of your csv file if you want, like this:
writer = csv.DictWriter(csvfile, delimiter=';', fieldnames=csv_columns, lineterminator = '\n')
Or you could change the csv file delimiter chosen in your spreadsheet program.
As #Chris commented, you're showing us screenshots from a spreadsheet program, you will have to select the right delimiter in your spreadsheet program (usually when you open the csv file).
In short: the default delimiter used by csv.DictWriter is not the same that is used in your spreadsheet program.
How you could change the delimiter (separator) used by your spreadshet program depends on the spreadsheet program you use.
You could go to your favorite search engine and search for something like this:
csv file change separator libreoffice, or csv file change separator excel, or csv file change delimiter excel, etc.

Adding csv filename to a column in python (200 files)

I have 200 files with dates in the file name. I would like to add date from this file name into new column in each file.
I created macro in Python:
import pandas as pd
import os
import openpyxl
import csv
os.chdir(r'\\\\\\\')
for file_name in os.listdir(r'\\\\\\'):
with open(file_name,'r') as csvinput:
reader = csv.reader(csvinput)
all = []
row = next(reader)
row.append('FileName')
all.append(row)
for row in reader:
row.append(file_name)
all.append(row)
with open(file_name, 'w') as csvoutput:
writer = csv.writer(csvoutput, lineterminator='\n')
writer.writerows(all)
if file_name.endswith('.csv'):
workbook = openpyxl.load_workbook(file_name)
workbook.save(file_name)
csv_filename = pd.read_csv(r'\\\\\\')
csv_data= pd.read_csv(csv_filename, header = 0)
csv_data['filename'] = csv_filename`
Right now I see "InvalidFileException: File is not a zip file" and only first file has added column with the file name.
Can you please advise what am I doing wrong? BTW I,m using Python 3.4.
Many thanks,
Lukasz
First problem, this section:
with open(file_name, 'w') as csvoutput:
writer = csv.writer(csvoutput, lineterminator='\n')
writer.writerows(all)
should be indented, to be included in the for loop. Now it is only executed once after the loop. This is why you only get one output file.
Second problem, the exception is probably caused by openpyxl.load_workbook(file_name). Presumably openpyxl can only open actual Excel files (which are .zip files with other extension), no CSV files. Why do you want to open and save it after all? I think you can just remove those three lines.

What's the equivalent of reader = csv.reader(...) for xlsx sheets?

I have a script with
path=r"mypath\myfile.xlsx"
with open(path) as f:
reader = csv.reader(f)
but it won't work because the code is trying to open an xlsx file with a module made for csv files.
So, does an expression equivalent for xlsx files exist?
The equivalent of the highlighted code for xlsx sheets is:
path=r"mypath\myfile.xlsx"
import pandas as pd
with open(path) as f:
reader = pd.read_excel(f)

how to convert xlsx to tab delimited files

I have quite a lot of xlsx files which is a pain to convert them one by one to tab delimited files
I would like to know if there is any solution to do this by python. Here what I found and what tried to do with failure
This I found and I tried the solution but did not work Mass Convert .xls and .xlsx to .txt (Tab Delimited) on a Mac
I also tried to do it for one file to see how it works but with no success
#!/usr/bin/python
import xlrd
import csv
def main():
# I open the xlsx file
myfile = xlrd.open_workbook('myfile.xlsx')
# I don't know the name of sheet
mysheet = myfile.sheet_by_index(0)
# I open the output csv
myCsvfile = open('my.csv', 'wb')
# I write the file into it
wr = csv.writer(myCsvfile, delimiter="\t")
for rownum in xrange(mysheet.nrows):
wr.writerow(mysheet.row_values(rownum))
myCsvfile.close()
if __name__ == '__main__':
main()
No real need for the main function.
And not sure about your indentation problems, but this is how I would write what you have. (And should work, according to first comment above)
#!/usr/bin/python
import xlrd
import csv
# open the output csv
with open('my.csv', 'wb') as myCsvfile:
# define a writer
wr = csv.writer(myCsvfile, delimiter="\t")
# open the xlsx file
myfile = xlrd.open_workbook('myfile.xlsx')
# get a sheet
mysheet = myfile.sheet_by_index(0)
# write the rows
for rownum in xrange(mysheet.nrows):
wr.writerow(mysheet.row_values(rownum))
Why go with so much pain when you can do it in 3 lines:
import pandas as pd
file = pd.read_excel('myfile.xlsx')
file.to_csv('myfile.xlsx',
sep="\t",
index=False)

Converting xls to csv in Python 3 using xlrd

I'm using Python 3.3 with xlrd and csv modules to convert an xls file to csv. This is my code:
import xlrd
import csv
def csv_from_excel():
wb = xlrd.open_workbook('MySpreadsheet.xls')
sh = wb.sheet_by_name('Sheet1')
your_csv_file = open('test_output.csv', 'wb')
wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
for rownum in range(sh.nrows):
wr.writerow(sh.row_values(rownum))
your_csv_file.close()
With that I am receiving this error: TypeError: 'str' does not support the buffer interface
I tried changing the encoding and replaced the line within the loop with this:
wr.writerow(bytes(sh.row_values(rownum),'UTF-8'))
But I get this error: TypeError: encoding or errors without a string argument
Anyone know what may be going wrong?
Try this
import xlrd
import csv
def csv_from_excel():
wb = xlrd.open_workbook('MySpreadsheet.xlsx')
sh = wb.sheet_by_name('Sheet1')
your_csv_file = open('output.csv', 'w', encoding='utf8')
wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
for rownum in range(sh.nrows):
wr.writerow(sh.row_values(rownum))
your_csv_file.close()
i recommend using pandas library for this task
import pandas as pd
xls = pd.ExcelFile('file.xlsx')
df = xls.parse(sheetname="Sheet1", index_col=None, na_values=['NA'])
df.to_csv('file.csv')
Your problem is basically that you open your file with Python2 semantics. Python3 is locale-aware, so if you just want to write text to this file (and you do), open it as a text file with the right options:
your_csv_file = open('test_output.csv', 'w', encoding='utf-8', newline='')
The encoding parameter specifies the output encoding (it does not have to be utf-8) and the Python3 documentation for csv expressly says that you should specify newline='' for csv file objects.
A quicker way to do it with pandas:
import pandas as pd
xls_file = pd.read_excel('MySpreadsheet.xls', sheetname="Sheet1")
xls_file.to_csv('MySpreadsheet.csv', index = False)
#remove the index because pandas automatically indexes the first column of CSV files.
You can read more about pandas.read_excel here.

Categories

Resources