What's the equivalent of reader = csv.reader(...) for xlsx sheets? - python

I have a script with
path=r"mypath\myfile.xlsx"
with open(path) as f:
reader = csv.reader(f)
but it won't work because the code is trying to open an xlsx file with a module made for csv files.
So, does an expression equivalent for xlsx files exist?

The equivalent of the highlighted code for xlsx sheets is:
path=r"mypath\myfile.xlsx"
import pandas as pd
with open(path) as f:
reader = pd.read_excel(f)

Related

Adding csv filename to a column in python (200 files)

I have 200 files with dates in the file name. I would like to add date from this file name into new column in each file.
I created macro in Python:
import pandas as pd
import os
import openpyxl
import csv
os.chdir(r'\\\\\\\')
for file_name in os.listdir(r'\\\\\\'):
with open(file_name,'r') as csvinput:
reader = csv.reader(csvinput)
all = []
row = next(reader)
row.append('FileName')
all.append(row)
for row in reader:
row.append(file_name)
all.append(row)
with open(file_name, 'w') as csvoutput:
writer = csv.writer(csvoutput, lineterminator='\n')
writer.writerows(all)
if file_name.endswith('.csv'):
workbook = openpyxl.load_workbook(file_name)
workbook.save(file_name)
csv_filename = pd.read_csv(r'\\\\\\')
csv_data= pd.read_csv(csv_filename, header = 0)
csv_data['filename'] = csv_filename`
Right now I see "InvalidFileException: File is not a zip file" and only first file has added column with the file name.
Can you please advise what am I doing wrong? BTW I,m using Python 3.4.
Many thanks,
Lukasz
First problem, this section:
with open(file_name, 'w') as csvoutput:
writer = csv.writer(csvoutput, lineterminator='\n')
writer.writerows(all)
should be indented, to be included in the for loop. Now it is only executed once after the loop. This is why you only get one output file.
Second problem, the exception is probably caused by openpyxl.load_workbook(file_name). Presumably openpyxl can only open actual Excel files (which are .zip files with other extension), no CSV files. Why do you want to open and save it after all? I think you can just remove those three lines.

Python save XLSX to CSV

I am not a python user. However, I am sick of saving Excel files to CSV manually, and everybody hates Perl at this shop. I can't get Spreadsheet::XLSX to work in this work environment. They only use Python.
The python version is 2.4.
#!/usr/bin/python
import openpyxl
import csv
wb = openpyxl.load_workbook('DailySnapshot.xlsx')
sh = wb.get_active_sheet()
with open('test.csv', 'wb') as f:
c = csv.writer(f)
for r in sh.rows:
c.writerow([cell.value for cell in r])
The DailySnapshot.xlxs is saved in the same directory a the script. It is a one page excel spreadsheet, and the sheet is called 'Table1'. I figured I would name the CSV file test.csv. This is the error it throws.
File "./secondPyTry.py", line 8
with open('test.csv', 'wb') as f:
^ SyntaxError: invalid syntax
As it was said in the comments, Python 2.4 does not wupport with. You should open the file like this instead:
f = open('test.csv', 'wb')
c = csv.writer(f)
for r in sh.rows:
c.writerow([cell.value for cell in r])
f.close()

Converting xlsm to csv

I have this simple snippet which used to work well until today. It converts xlsm files into csv.
import xlrd
workbook = xlrd.open_workbook('T:/DataDump/3.26.17.xlsm')
for sheet in workbook.sheets():
with open('{}.csv'.format(sheet.name), 'w') as f:
writer = csv.writer(f)
writer.writerows(sheet.row_values(row) for row in range(sheet.nrows))
print ("CSV converted")
xlsm file:
Name Date Status
Python 12/15/2014 Manager
Pandas 10/17/2014 Senior
csv file:
Name Date Status
Python 12/15/2014 Manager
Pandas 10/17/2014 Senior
This snippet is providing me with the csv but with double spaces between the rows. How can I fix this please?
In Python 3,open f with the additional parameter newline=''
with open('{}.csv'.format(sheet.name), 'w', newline='') as f:

Append a sheet to an existing excel file using openpyxl

I saw this post to append a sheet using xlutils.copy:
https://stackoverflow.com/a/38086916/2910740
Is there any solution which uses only openpyxl?
I found solution. It was very easy:
def store_excel(self, file_name, sheet_name):
if os.path.isfile(file_name):
self.workbook = load_workbook(filename = file_name)
self.worksheet = self.workbook.create_sheet(sheet_name)
else:
self.workbook = Workbook()
self.worksheet = self.workbook.active
self.worksheet.title = time.strftime(sheet_name)
.
.
.
self.worksheet.cell(row=row_num, column=col_num).value = data
I would recommend storing data in a CSV file, which is a ubiquitous file format made specifically to store tabular data. Excel supports it fully, as do most open source Excel-esque programs.
In that case, it's as simple as opening up a file to append to it, rather than write or read:
with open("output.csv", "a") as csvfile:
wr = csv.writer(csvfile, dialect='excel')
wr.writerow(YOUR_LIST)
As for Openpyxl:
end_of_sheet = your_sheet.max_row
will return how many rows your sheet is so that you can start writing to the position after that.

how to convert xlsx to tab delimited files

I have quite a lot of xlsx files which is a pain to convert them one by one to tab delimited files
I would like to know if there is any solution to do this by python. Here what I found and what tried to do with failure
This I found and I tried the solution but did not work Mass Convert .xls and .xlsx to .txt (Tab Delimited) on a Mac
I also tried to do it for one file to see how it works but with no success
#!/usr/bin/python
import xlrd
import csv
def main():
# I open the xlsx file
myfile = xlrd.open_workbook('myfile.xlsx')
# I don't know the name of sheet
mysheet = myfile.sheet_by_index(0)
# I open the output csv
myCsvfile = open('my.csv', 'wb')
# I write the file into it
wr = csv.writer(myCsvfile, delimiter="\t")
for rownum in xrange(mysheet.nrows):
wr.writerow(mysheet.row_values(rownum))
myCsvfile.close()
if __name__ == '__main__':
main()
No real need for the main function.
And not sure about your indentation problems, but this is how I would write what you have. (And should work, according to first comment above)
#!/usr/bin/python
import xlrd
import csv
# open the output csv
with open('my.csv', 'wb') as myCsvfile:
# define a writer
wr = csv.writer(myCsvfile, delimiter="\t")
# open the xlsx file
myfile = xlrd.open_workbook('myfile.xlsx')
# get a sheet
mysheet = myfile.sheet_by_index(0)
# write the rows
for rownum in xrange(mysheet.nrows):
wr.writerow(mysheet.row_values(rownum))
Why go with so much pain when you can do it in 3 lines:
import pandas as pd
file = pd.read_excel('myfile.xlsx')
file.to_csv('myfile.xlsx',
sep="\t",
index=False)

Categories

Resources