I've spent the better part of the weekend trying to figure out the best way to transfer data from an MS Access table into an Excel sheet using Python. I've found a few modules that may help (execsql, python-excel), but given my limited knowledge and the modules I have to use to create certain data (I'm a GIS professional, so I'm creating spatial data in an Access table using the ArcGIS arcpy module), I'm not sure what the best approach is.
All I need to do is copy 4 columns of data from Access to Excel and then format the Excel sheet. I have the formatting part solved.
Should I:
Iterate through the rows using a cursor and somehow load the rows into excel?
Copy the columns from access to excel?
Export the whole access table into a sheet in excel?
Thanks for any suggestions.
I eventually found a way to do this. I thought I'd post my code for anyone who may run into the same situation. I use some GIS files; if you don't, this is still doable: set a variable to a directory path instead of using env.workspace, and use a regular database cursor instead of the arcpy.SearchCursor function.
import arcpy
from arcpy import env
from xlwt import Workbook

# Set the workspace: the location of the feature class or .dbf file. I used a .dbf file.
# Raw strings keep the backslashes in Windows paths from being read as escapes.
env.workspace = r"C:\data"
# The search cursor yields row objects that expose the field values
cur = arcpy.SearchCursor("SMU_Areas.dbf")
# Set up workbook and sheets
book = Workbook()
sheet1 = book.add_sheet('Sheet 1')
book.add_sheet('Sheet 2')
# Row counter; it is incremented before each write, so sheet row 0 stays empty
rowx = 0
# Loop through the rows in the .dbf file
for row in cur:
    rowx += 1
    # Write each field of the row to its own column in the sheet
    sheet1.write(rowx, 0, row.ID)
    sheet1.write(rowx, 1, row.SHAPE_Area / 10000)
book.save(r'C:\data\MyExcel.xls')
del cur, row
I currently use the XLRD module to suck in data from an Excel spreadsheet and an insert cursor to create a feature class, which works very well.
You should be able to use a search cursor to iterate through the feature class records and then use the XLWT Python module (http://www.python-excel.org/) to write the records to Excel.
You can use ADO to read the data from Access (there are separate connection strings for Access 2007+ (.accdb files) and Access 2003 and earlier (.mdb files)) and then use Excel's Range.CopyFromRecordset method (assuming you are using Excel via COM) to copy the entire recordset into Excel.
The best approach might be to not use Python for this task.
You could use the macro recorder in Excel to record the import of the External data into Excel.
After starting the macro recorder click Data -> Get External Data -> New Database Query and enter your criteria. Once the data import is complete you can look at the code that was generated and replace the hard coded search criteria with variables.
Another idea - how important is the formatting part? If you can ditch the formatting, you can output your data as CSV. Excel can open CSV files, and the CSV format is much simpler than the Excel format - it's so simple you can write it directly from Python like a text file, and that way you won't need to mess with Office COM objects.
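A minimal sketch of the CSV route, using only Python's standard csv module. The function name write_rows_to_csv and the column names in the usage note are made up for illustration; in practice the rows would come from whatever cursor reads the Access table.

```python
import csv

def write_rows_to_csv(path, header, rows):
    """Write a header line followed by the data rows to a CSV file."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

Calling write_rows_to_csv("areas.csv", ["ID", "AreaHa"], [(1, 12.5), (2, 7.25)]) would produce a three-line file that Excel can open directly.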
Related
I'm running a python script to automate some of my day-to-day tasks at work. One task I'm trying to do is simply add a row to an existing ods sheet that I usually open via LibreOffice.
This file has multiple sheets and depending on what my script is doing, it will add data to different sheets.
The thing is, I'm having trouble finding a simple and easy way to just add some data to the first unpopulated row of the sheet.
Reading about odslib3, pyexcel and other packages, it seems that to write a row I need to specify the exact row number and column to write to. Opening the ods file just to see which cell to write to, and then telling the Python script, seems unproductive.
Is there a way to easily add a row of data to an ods sheet without specifying the row number and column?
If I understand the question, I believe that using .remove() and .append() will do the trick. It will create and populate data on the last row (I can't say it's the most efficient way, though).
For example:
from pyexcel_ods3 import save_data
from pyexcel_ods3 import get_data
data = get_data("info.ods")
print(data["Sheet1"])
# [['first_row', 'first_row'], []]
if [] in data["Sheet1"]:
    data["Sheet1"].remove([])  # remove unpopulated row
data["Sheet1"].append(["second_row", "second_row"])  # add new row
print(data["Sheet1"])
# [['first_row', 'first_row'], ['second_row', 'second_row']]
save_data("info.ods", data)  # write the change back to the file
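The same idea can be wrapped in a small helper. This sketch assumes, as in the example above, that each sheet is a plain list of rows; the name append_row is made up for illustration. It drops only trailing empty rows, so blank rows in the middle of the sheet stay where they are.

```python
def append_row(rows, new_row):
    """Append new_row after the last populated row of a sheet.

    rows is a list of lists, one inner list per spreadsheet row.
    """
    # Strip trailing rows that are empty or contain only blank cells
    while rows and all(cell in ("", None) for cell in rows[-1]):
        rows.pop()
    rows.append(list(new_row))
    return rows

sheet = [["first_row", "first_row"], []]
append_row(sheet, ["second_row", "second_row"])
print(sheet)  # [['first_row', 'first_row'], ['second_row', 'second_row']]
```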
I have a spreadsheet with the following columns:
TestID TestData ExpectedOutput ActualOutput Result
I have a separate Python script for each test ID. Each script needs to read the row corresponding to its test ID and, after execution, update the result in the same spreadsheet. I am not able to update that result value. Can someone please help?
I read the spreadsheet using Pandas.
e.g.
a row in the spreadsheet:
TestID    TestData                 ExpectedOutput  ActualOutput   Result
Testid-1  Min_freq=5,Max_freq=60,  Drive started   Drive started  Pass
My script would search for this test ID and read the test data. After execution, it would compare the output with the expected output and update the value of the Result cell accordingly. I don't understand how to update the result value.
Please help me.
Of the common packages, openpyxl is the one that can modify an existing Excel (.xlsx) file in place.
You can read a file with xlrd, but you cannot modify it with xlwt or xlsxwriter, which can only create and write new xls and xlsx files.
However, if another program (such as Excel itself) is editing the same file, the two are not working on the same copy but on two separate in-memory copies. Be sure to save in that program before letting Python read the file, and vice versa.
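A minimal sketch of updating the Result cell with openpyxl, assuming the first row of the sheet holds the headers shown in the question. The helper name set_result is made up for illustration, and the demo builds a workbook in memory instead of opening a real file.

```python
from openpyxl import Workbook

def set_result(ws, test_id, result):
    """Write result into the 'Result' column of the row whose TestID matches.

    Returns True if a matching row was found and updated, else False.
    """
    # Map header text to its 1-based column index using the first row
    columns = {cell.value: idx for idx, cell in enumerate(ws[1], start=1)}
    for row in range(2, ws.max_row + 1):
        if ws.cell(row=row, column=columns["TestID"]).value == test_id:
            ws.cell(row=row, column=columns["Result"], value=result)
            return True
    return False

# In-memory demo data mirroring the row in the question
wb = Workbook()
ws = wb.active
ws.append(["TestID", "TestData", "ExpectedOutput", "ActualOutput", "Result"])
ws.append(["Testid-1", "Min_freq=5,Max_freq=60", "Drive started", "Drive started", None])
set_result(ws, "Testid-1", "Pass")
print(ws["E2"].value)
```

With a real file, the workbook would come from openpyxl.load_workbook(path) instead, and the change reaches the file only once wb.save(path) is called.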
I am writing a Python script using XlsxWriter to generate an .xlsx file comprising multiple worksheets. Each worksheet will have multiple tables and lots of formatting, so my code is getting pretty long. Therefore, I am looking for a way to split the code up, e.g. worksheet 1 corresponding to worksheet1.py, with a 'main' file to compile the worksheets into a single workbook.
I have tried using a function to create a worksheet and calling that from another file to add to an existing workbook - but this method does not work. XlsxWriter requires you to add the worksheet to an existing workbook. (If I'm missing something and this is possible please let me know).
Alternately, I thought of creating individual workbooks with a single worksheet inside and using a second package (openpyxl) to collate the worksheets. However, I think this will alter the formatting on the worksheets. (Again, please let me know if I am missing something).
Any ideas on this subject would be gratefully received.
Thanks
Edit: example table
Pandas will actually be very helpful in this case.
First, create a writer for your Excel file (with pandas imported as pd):
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
Create your tables as DataFrames, then write one to a sheet:
df.to_excel(writer, sheet_name='Sheet 1', startrow=0, startcol=0)
This places the table into whichever sheet of the workbook you want; just provide the sheet name as an argument.
To put another table in the same sheet:
df_1.to_excel(writer, sheet_name='Sheet 1', startrow=20, startcol=0)
Change startrow to set the row where the table begins, or change the sheet name to target a different sheet. Close the writer with writer.close() when you are done so that the file is actually written.
The Problem:
Open a ListObject (Excel table) of an Excel file from a Python environment.
The why:
There are multiple solutions to open an excel file in python. Starting with pandas:
import pandas as pd
mysheetName="sheet1"
df = pd.read_excel(io=file_name, sheet_name=mysheetName)
This will pass the sheet1 into a pandas data frame.
So far so good.
Another, more detailed solution is to use a dedicated library. This code comes from a Stack Overflow question:
from openpyxl import load_workbook
wb2 = load_workbook('test.xlsx')
print(wb2.sheetnames)
# ['Sheet2', 'New Title', 'Sheet1']
worksheet1 = wb2['Sheet1'] # one way to load a worksheet
worksheet2 = wb2.get_sheet_by_name('Sheet2') # another way (deprecated in recent openpyxl in favor of wb2['Sheet2'])
print(worksheet1['D18'].value)
So far so good as well.
BUT:
If you have a ListObject (Excel table) in a sheet, I did not find any way to access the data of the ListObject.
ListObjects are often used by somewhat more advanced Excel users, above all when programming macros in VBA. They are very convenient and could be seen as the Excel equivalent of a pandas DataFrame. Having a bridge between an Excel ListObject and a pandas DataFrame seems entirely logical. Nevertheless, I have not found any solution, library or workaround for doing that so far.
The question.
Does anyone know of a Python library or solution to directly extract ListObjects from Excel sheets?
NOTE1: Not nice solution
Of course, knowing the "placement" of the ListObject, it is possible to refer to its first and last cell, but this is a really bad solution because it does not allow you to modify the ListObject in the Excel file without also modifying the Python code straight away. As soon as the placement of the ListObject changes, or the ListObject itself gets bigger, the Python code breaks.
NOTE2: My current solution:
I export the ListObject from Excel (with a macro) into a JSON file and read it from Python. But the extra work is obvious: VBA code, an extra file, and so on.
Last comment: if you are interested in this issue but don't know what a ListObject in Excel is, it is what Excel's user interface calls a table.
James is right:
https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html
https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.table.html
There is a class in openpyxl for reading tables. Tables also have an id:
class openpyxl.worksheet.table.Table(id=1,...
id=1 would mean the first table of the worksheet.
Always remember that ListObjects in Excel are called Tables in the user interface. That's odd (as is often the case with VBA). If you work with VBA you might forget that ListObject = Table.
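As a rough sketch of how this can look in practice: recent openpyxl versions expose the tables of a worksheet through ws.tables, keyed by table name, and each table carries its cell range in ref. The helper name table_to_records and the table name 'Sales' are made up for illustration, and the demo builds a workbook in memory rather than loading a real file.

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table

def table_to_records(ws, table_name):
    """Return the rows of a named Excel table as a list of dicts."""
    ref = ws.tables[table_name].ref  # cell range of the table, e.g. "A1:B3"
    rows = [[cell.value for cell in row] for row in ws[ref]]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in body]

# In-memory demo: a small sheet with one table named 'Sales'
wb = Workbook()
ws = wb.active
for row in (["Item", "Qty"], ["apple", 3], ["pear", 5]):
    ws.append(row)
ws.add_table(Table(displayName="Sales", ref="A1:B3"))
print(table_to_records(ws, "Sales"))
```

Because the range is looked up from the table itself, the Python side does not need to change when the table moves or grows, and the resulting list of dicts can be passed straight to pandas.DataFrame.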
This is also possible with xlwings. The API is a bit different:
import xlwings as xw
wb = xw.Workbook.active()
xw.Range('TableName[ColumnName]').value
Or to get the column including header and Total row, you could do:
xw.Range('TableName[[#All], [ColumnName]]').value
2 Questions to ask:
Ques 1:
I just started learning xlrd for reading Excel files in Python.
I was wondering if there is a method in xlrd similar to get_active_sheet() in openpyxl, or any other way to get the active sheet?
get_active_sheet() works like this in openpyxl:
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
active_sheet = wb.get_active_sheet()
output : Worksheet "Sheet1"
I had found methods in xlrd for retrieving the names of sheets, but none of them could tell me the active sheet.
Ques 2:
Is xlrd the best package in Python for reading Excel files? I also came across a page with information about other Python packages (xlsxwriter, xlwt, xlutils) for reading and writing Excel files.
Which of these would be best for building an app that reads an Excel file and applies different validations to different columns?
For example: the column with header 'ID' should have unique values, and the column with header 'Country' should contain valid countries.
By "active sheet" you seem to mean the last sheet selected when the workbook was saved or closed. You can get this sheet via the sheet_visible value.
import xlrd
xl = xlrd.open_workbook("example.xls")
for sht in xl.sheets():
    # a sht.sheet_visible value of 1 marks the "active sheet"
    print(sht.name, sht.sheet_selected, sht.sheet_visible)
Usually only one sheet is selected at a time, so it may look like sheet_visible and sheet_selected are the same, but multiple sheets can be selected at a time (ctrl+click multiple sheet tabs, for example).
Another reason this may seem confusing is because Excel uses "visible" in terms of hidden/visible sheets. In xlrd, this is instead sheet.visibility (see https://stackoverflow.com/a/44583134/4258124)
Welcome to Stack Overflow.
I have been working with Excel files in Python for a while now, so I could help you with your question, I think.
openpyxl and xlrd solve different problems, one is for xlsx files (Excel 2007+), where the other one is for xls files (Excel 1997-2003), respectively.
Xenon said in his answer that Excel doesn't recognize the concept of an active sheet, which is not totally true. If you open an Excel document, go to some other sheet (that isn't the first one) and save and close the document, the next time you open it, Excel will open the document on the last sheet you were on.
However, xlrd does not support this kind of workflow, i.e. asking for the active sheet. If you know the sheet name, then you could use the method sheet_by_name, or if you know the sheet index, you could use the method sheet_by_index.
I don't know if the xlrd is the best package around, but it is pretty solid, and I have had nary a problem using it.
The example given could be solved by first iterating through the first row and building a dictionary that maps each header to its column index. Then store all the values of the ID column in a list and compare the length of that list with the length of a set created from it, i.e. len(values) == len(set(values)); the two are equal only if every ID is unique. Following that, you could iterate through the column with the header Country and check whether each value is in a set you previously built with all the valid countries.
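The checks described above can be sketched in plain Python. The rows are assumed to be already read into a list of lists with the header in the first row; the function name validate and the short country list are placeholders.

```python
VALID_COUNTRIES = {"India", "France", "Brazil"}  # placeholder list

def validate(rows):
    """Return a list of error messages for the ID and Country columns."""
    header, body = rows[0], rows[1:]
    # Map each header name to its column index
    col = {name: idx for idx, name in enumerate(header)}
    errors = []

    # Uniqueness check: a set drops duplicates, so the lengths differ if any exist
    ids = [row[col["ID"]] for row in body]
    if len(ids) != len(set(ids)):
        errors.append("ID column contains duplicate values")

    # Membership check; numbering starts at 2 because row 1 is the header
    for line_no, row in enumerate(body, start=2):
        country = row[col["Country"]]
        if country not in VALID_COUNTRIES:
            errors.append(f"Row {line_no}: invalid country {country!r}")
    return errors

rows = [
    ["ID", "Country"],
    [1, "India"],
    [2, "Atlantis"],
    [2, "France"],
]
print(validate(rows))
```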
I hope this answer suits your needs.
Summary: Stick with xlrd because it is mature enough.
You can see all worksheets in a given workbook with the sheet_names() function. Excel has no concept of an "active sheet", but if my assumption that you are referring to the first sheet is correct, you can get the first element of sheet_names() to get the "active sheet."
With regards to your second question, it's not easy to say that a package is better than another package objectively. However, xlrd is widely used, and the most popular Python library for what it does.
I would recommend sticking with it.