How can the Python xlrd library scan all sheets in an Excel file? - python

I am planning to use the xlrd library to read the number of rows and columns in an Excel file that I import.
I use the following code, which works perfectly fine.
import xlrd
path = 'sample123.xlsx'
inputWorkbook = xlrd.open_workbook(path)
inputWorksheet = inputWorkbook.sheet_by_index(0)
print("Your worksheet has: " + str(inputWorksheet.nrows) + " rows")
print("Your worksheet has: " + str(inputWorksheet.ncols) + " columns")
However, that code only handles one sheet (the first one). If I want to import a number of Excel files without knowing how many sheets each file has or what they are named, is there a way to scan through all sheets in a file, so that the number of rows and columns can be detected for every sheet?
Thanks very much for your assistance.

However, that codes only run for a sheet (the first one)
That is because you are passing index 0 when calling sheet_by_index().
You call the method like this:
inputWorkbook.sheet_by_index(index)
where index is the index of the sheet. If you don't know it, you can find it by name:
inputWorkbook.sheet_names().index(nameOfMySheet)
See the xlrd documentation for details.
Here is an example of how to get all the sheets in a workbook:
import xlrd
book = xlrd.open_workbook("sample.xls")
for sheet in book.sheets():
    print(sheet.name)

To read all sheets from one Excel file using xlrd:
import xlrd
path = 'sample123.xlsx'
inputWorkbook = xlrd.open_workbook(path)
dict_sheet_tabs = {}  # Store sheets in a dictionary
for sheet_name in inputWorkbook.sheet_names():
    print(sheet_name)  # name of each tab
    all_sheet = inputWorkbook.sheet_by_name(sheet_name)  # read sheet by name
    dict_sheet_tabs.update({sheet_name: all_sheet})
print(dict_sheet_tabs)
>>> {'sheet_name1': <xlrd.sheet.Sheet object at 0x7fa903b6efd0>, 'sheet_name2': <xlrd.sheet.Sheet object at 0x7fa9038ece10>}
# The dictionary keys are the sheet names and the values are the Sheet objects
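For the original goal, the row and column counts of every sheet, a minimal sketch could look like the following. It relies only on the sheets() method and the name, nrows and ncols attributes of xlrd sheet objects; the function names sheet_dimensions and report are made up for this example. Also note that xlrd 2.0 and later reads only .xls files, so a .xlsx file needs an older xlrd or a different library.

```python
def sheet_dimensions(book):
    """Map each sheet name to its (rows, columns) pair.

    `book` is expected to be a workbook returned by xlrd.open_workbook().
    """
    return {sheet.name: (sheet.nrows, sheet.ncols) for sheet in book.sheets()}


def report(book):
    """Print one line per sheet with its row and column counts."""
    for name, (rows, cols) in sheet_dimensions(book).items():
        print("Sheet {!r} has {} rows and {} columns".format(name, rows, cols))
```

After import xlrd, calling report(xlrd.open_workbook(path)) would print the counts for every sheet without needing any sheet name or index in advance.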

Related

Export specific Sheet, and save different file openpyxl

I'm trying to export a specific sheet from an Excel file, but without success. I want to export one specific sheet to a completely new file. What I have written is:
import openpyxl
book = openpyxl.load_workbook(r'C:\Python\test.xlsx')  # raw string, so \t is not read as a tab
a = book.get_sheet_names()
sheet1 = book[a[5]]
sheet1.save(r'C:\Python\sheet2.xlsx')
Another thing I can't do is look up a certain sheet when I have its name.
I apologize if the questions are simple, but it's only been a few days since I started with Python :)
Well, openpyxl does provide copy_worksheet(), but it cannot be used between different workbooks. You can copy your sheet cell by cell, or you can modify your starting workbook in memory and then save it under a different file name. Here is the code:
import openpyxl
# your starting wb with 2 sheets: Sheet1 and Sheet2
wb = openpyxl.load_workbook('test.xlsx')
sheets = wb.sheetnames  # ['Sheet1', 'Sheet2']
for s in sheets:
    if s != 'Sheet2':
        # wb[s] and wb.remove() replace the deprecated get_sheet_by_name() and remove_sheet()
        wb.remove(wb[s])
# your final wb with just Sheet2
wb.save('test_with_just_sheet2.xlsx')
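The same idea can be wrapped in a small helper. This is only a sketch: the name keep_only_sheet is hypothetical, and it assumes a workbook object with openpyxl's sheetnames property, name lookup via wb[name], and the remove() method. It refuses to run when the requested sheet does not exist, so it cannot empty the workbook by accident.

```python
def keep_only_sheet(wb, keep):
    """Remove every worksheet from `wb` except the one named `keep`."""
    if keep not in wb.sheetnames:
        raise KeyError("no sheet named {!r}".format(keep))
    # Iterate over a copy of the names, because removing sheets changes wb.sheetnames.
    for name in list(wb.sheetnames):
        if name != keep:
            wb.remove(wb[name])
    return wb
```

With a workbook loaded through openpyxl.load_workbook('test.xlsx'), keep_only_sheet(wb, 'Sheet2').save('sheet2_only.xlsx') would write the single remaining sheet to a new file; the output file name here is just an example. The check keep in wb.sheetnames also covers the second part of the question, looking up a sheet when only its name is known.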

Read Excel Sheet based on VBA CodeName property

I have a .xls Excel file and know the VBA CodeName property for the sheet I want to read. How can I read the sheet into Python by using the sheet's CodeName property, rather than its Name property?
Topic discussing the difference between VBA CodeName and Name: Excel tab sheet names vs. Visual Basic sheet names
# you have to install the xlrd module first
# import the xlrd module
import xlrd
workbook = xlrd.open_workbook("example.xls")  # xlrd 2.0+ reads only .xls; older versions also read .xlsx
# if you know the name of the sheet, use
sheet = workbook.sheet_by_name("name of the sheet")
# or you can use the index number (an integer, starting at 0)
sheet = workbook.sheet_by_index(0)
# you can print the cell you want like this (row and column indices start at 0)
print("Value in the 4th row, 2nd column: {}".format(sheet.cell(3, 1).value))
If this was helpful, give it a like. If not, sorry: I don't understand English well and I am new here. Thank you.

Create a hyperlink to a different Excel sheet in the same workbook

I'm using the module openpyxl for Python and am trying to create a hyperlink that will take me to a different tab in the same Excel workbook. Doing something similar to the following creates the hyperlink; however, when I click on it, it tells me it can't open the file.
from openpyxl import Workbook
wb = Workbook()
first_sheet = wb.create_sheet(title='first')
second_sheet = wb.create_sheet(title='second')
first_sheet['A1'] = "hello"
second_sheet['B2'] = "goodbye"
link_from = first_sheet['A1']
link_to = second_sheet['B2'].value
link_from.hyperlink = link_to
wb.save("C:/somepath/workbook.xlsx")
I'm assuming the issue lies in the value of 'link_to'; however, I don't know what would need to be changed or what kind of path I would have to write.
I'm using Python 2.7.6 and Excel 2013.
I found a way to do it.
Assuming one .xlsx file named 'workbookEx.xlsx' with two sheets named 'sheet1' and 'sheet2', and needing a link from one cell (A1) of 'sheet1' to another cell (E5) of 'sheet2':
from openpyxl import load_workbook
wb = load_workbook("workbookEx.xlsx")
ws = wb["sheet1"]
link = "workbookEx.xlsx#sheet2!E5"
ws.cell(row=1, column=1).hyperlink = link
wb.save("workbookEx.xlsx")  # write the change back to disk
The secret was the "#". Excel does not show it to you, but it uses '#' for same-file links. I just had to copy a same-file link created in Excel into a Word document to see the '#'.
It is also possible to omit the filename, i.e. to link to a sheet of the active document just use: _cell.hyperlink = '#sheetName!A1'.
To name the link you just created, set the cell value to the desired string: _cell.value = 'Linkname'.
As an addendum to Marcus.Luck's answer, if you want to use Excel's built-in HYPERLINK function directly, you may need to format it as:
'=HYPERLINK("{}", "{}")'.format(link, "Link Name")
Without this formatting, the file would not open for me without needing a repair, which removed the cell values when clicking the links.
e.g. ws.cell(row=1, column=1).value = '=HYPERLINK("{}", "{}")'.format(link, "Link Name")
Another working solution is to use Excel's built-in function HYPERLINK.
It doesn't turn the value in the cell into a hyperlink; it puts a formula in the cell that acts like a hyperlink.
ws['A1'].value = '=HYPERLINK("#sheet2!E5","Link name")'
Support for hyperlinks in openpyxl is currently extremely rudimentary and largely limited to reading the links in existing files.
In addition to the previous answers, it is also useful to format the cell so that it looks like the hyperlinks created within Excel. To do this, use the Excel style named "Hyperlink", as in the example below, which also puts single quotes around sheet names in case they include spaces (as mentioned by Neofish).
ws.cell(row=row, column=col).value = '=HYPERLINK("#\'{}\'!A1", "{}")'.format(sheet_name_with_spaces, "Link Name")
ws.cell(row=row, column=col).style = 'Hyperlink'
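The quoting used in these formulas can be gathered into one helper. The sketch below is not part of openpyxl; internal_link_formula is a made-up name, and it only builds the formula string. It always wraps the sheet name in single quotes, doubles any single quote inside the name, and doubles any double quote inside the label, which is how Excel expects those characters to be written.

```python
def internal_link_formula(sheet_name, cell="A1", label="Link Name"):
    """Build an Excel HYPERLINK formula that jumps to `cell` on `sheet_name`."""
    # Inside a quoted sheet name, a literal single quote is written as two.
    quoted = "'{}'".format(sheet_name.replace("'", "''"))
    # Inside a formula string, a literal double quote is written as two.
    safe_label = label.replace('"', '""')
    return '=HYPERLINK("#{}!{}", "{}")'.format(quoted, cell, safe_label)
```

The result can be assigned to a cell value in the same way as the formulas above, for example ws.cell(row=1, column=1).value = internal_link_formula("my sheet", "E5", "Go to E5").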
import openpyxl as opxl
import pandas as pd
def hyperlinking(New_file_path):
    xls = pd.ExcelFile(New_file_path)
    sheets = xls.sheet_names
    wb = opxl.load_workbook(New_file_path)
    ws = wb.create_sheet("Consolitated_Sheet")
    ws['A1'] = "Sheet_Name"; ws['B1'] = "Active_link"
    for i, j in enumerate(sheets):
        # print('A'+str(i+2) + ' value is: ' + j)
        ws['A' + str(i + 2)] = j
        ws['B' + str(i + 2)].value = '=HYPERLINK("%s", "%s")' % ('#' + str(j) + '!A1', 'Clickhere')
    wb.save(New_file_path)
    wb.close()
Some of the answers didn't really fit my case (I'm not saying they are wrong :) my Excel file is just weird), so I did a little experiment, and this code works perfectly for me.
Suppose that clicking cell "A1" on sheet "first" should move to cell "A1" on sheet "second":
sheet_to_move = 'second'
cell_to_move = 'A1'
# this assumes first_sheet['A1'] already has a hyperlink; otherwise .hyperlink is None
first_sheet['A1'].hyperlink.location = f"'{sheet_to_move}'!{cell_to_move}"
first_sheet['A1'].hyperlink.display = f"'{sheet_to_move}'!{cell_to_move}"

Dynamically Parsing a worksheet in Pandas using Python 3

My question is about parsing worksheets in pandas (Python 3).
Right now my code looks like this:
var = input("Enter the path for the Excel file you want to use: ")
import pandas as pd
xl = pd.ExcelFile(var)
df = xl.parse("HelloWorld")
df.head()
This code parses the worksheet "HelloWorld" within an Excel file that the user provides. However, sometimes the worksheet within the file will not be called "HelloWorld", in which case the parsing code will fail.
Does anyone know how to set the variable "df" so that it dynamically reads whatever worksheet is in the Excel file? There will always be only ONE worksheet in these Excel files, so whatever worksheet is in the file, I want my code to read it.
Thank you for the help!
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.excel.ExcelFile.parse.html
You can pass in the sheet number instead of the name.
var = input("Enter the path for the Excel file you want to use: ")
import pandas as pd
xl = pd.ExcelFile(var)
df = xl.parse(sheet_name=0)  # first sheet; the keyword was spelled sheetname before pandas 0.21
df.head()

Save row_slice information in workbook xlrd python

I am trying to extract the header row (the first row) from multiple files, each of which has multiple sheets. The output of each sheet should be saved and appended to a new master file that contains all the headers from each sheet and each file.
The easiest way I have found is to use the command row_slice. However, the output from the file is a list of Cell objects and I cannot seem to access their indices.
I am looking for a way to save the data extracted into a new workbook.
Here is the code I have so far:
from xlrd import open_workbook, cellname
book = open_workbook(r'E:\Files_combine\MOU worksheets 2012\Walmart-GE_MOU 2012-209_worksheet_v03.xls')
last_index = len(book.sheet_names())
for sheet_index in range(last_index):
    sheet = book.sheet_by_index(sheet_index)
    print(sheet.name)
    print(sheet.row_slice(0, 1))
I cannot get the output and store it as input to a new file. Also, any ideas on how to automate this process for 100+ files would be appreciated.
You can store the output in a CSV file, and you can use os.listdir and a for loop to loop over all the file names:
import csv
import os
from xlrd import open_workbook
EXCEL_DIR = r'E:\Files_combine\MOU worksheets 2012'
# newline='' keeps the csv module from writing blank lines between rows on Windows
with open("headers.csv", 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for file_name in os.listdir(EXCEL_DIR):
        if file_name.endswith(".xls"):
            book = open_workbook(os.path.join(EXCEL_DIR, file_name))
            for sheet in book.sheets():
                if sheet.nrows == 0:
                    continue  # skip empty sheets, which have no header row
                # writerow takes a sequence; row_values(0) returns the first row
                # as plain values, whereas row_slice returns Cell objects
                writer.writerow(sheet.row_values(0))
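On the Cell objects mentioned in the question: row_slice(rowx, start_colx) returns Cell objects, each with a value attribute, and its second argument is the starting column, so row_slice(0, 1) skips the first column of row 0. A minimal sketch of collecting plain header values per sheet, assuming a workbook opened with xlrd.open_workbook() and using the made-up name header_rows, might look like this:

```python
def header_rows(book):
    """Yield (sheet name, list of header values) for each non-empty sheet."""
    for sheet in book.sheets():
        if sheet.nrows == 0:
            continue  # an empty sheet has no row 0 to read
        # row_values(0) returns the whole first row as plain Python values.
        yield sheet.name, sheet.row_values(0)
```

Each yielded list can be passed directly to csv.writer.writerow(), optionally with the file name and sheet name placed in front so that the master file records where every header came from.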
