How to find a cell containing specific text using Python?

I am trying to iterate over an xlsx file and find the cell that contains our company's name using Python. The file consists of two or more sheets, and each sheet has information on six companies. Each cell I am looking for has the following format:
Cell F6 = 1ST(Company_A+Company_B)
Cell G6 = 2ND(Company_C+Company_D)
Cell H6 = 3RD(Company_E+Company_F)
and so on.
I'd like to find the cell that contains Company_A. I have written some code, but I've run into a problem.
Here is what I have so far:
import openpyxl
bid = openpyxl.load_workbook('C:/Users/User/Desktop/bidding.xlsx', data_only=True)
for sheet in bid.worksheets:
    for row in sheet.iter_rows():
        for entry in row:
            if entry.value == '1ST(Company_A+Company_B)':
                print(entry.offset(row=1).value)
                print(round(entry.offset(row=8).value/100,5))
This finds the value I want, but I'd like to find the cell without having to type its entire contents.

Because you're using ==, the script only matches cells whose text is exactly that string. Use in instead, which checks whether the text appears anywhere in the cell.
Your code should be:
import openpyxl
bid = openpyxl.load_workbook('C:/Users/User/Desktop/bidding.xlsx', data_only=True)
for sheet in bid.worksheets:
    for row in sheet.iter_rows():
        for entry in row:
            # Only strings support a substring test; skip empty cells, numbers and dates
            if isinstance(entry.value, str) and 'Company_A' in entry.value:
                print(entry.offset(row=1).value)
                print(round(entry.offset(row=8).value / 100, 5))
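A minimal sketch of the same substring idea, generalized to report every match with its sheet and coordinate instead of printing offsets, plus an optional case-insensitive mode. The names find_matches and find_in_workbook are hypothetical. The search logic works on plain row tuples, as produced by openpyxl's ws.iter_rows(values_only=True), and openpyxl is only imported inside the wrapper. Keep in mind that a plain substring test would also match a longer name such as Company_AB.

```python
def find_matches(rows, needle, case_sensitive=True):
    """Yield 1-based (row, column) positions of string values containing needle.

    rows is any iterable of row tuples, e.g. ws.iter_rows(values_only=True).
    Empty cells, numbers and dates are skipped rather than raising TypeError.
    """
    if not case_sensitive:
        needle = needle.casefold()
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            if not isinstance(value, str):
                continue
            text = value if case_sensitive else value.casefold()
            if needle in text:
                yield r, c


def find_in_workbook(path, needle, case_sensitive=True):
    """Return (sheet title, coordinate) pairs for every matching cell in every sheet."""
    import openpyxl  # third-party; imported here so find_matches needs nothing extra
    from openpyxl.utils import get_column_letter

    wb = openpyxl.load_workbook(path, data_only=True)
    hits = []
    for ws in wb.worksheets:
        # With no bounds given, iter_rows starts at A1, so positions map to coordinates
        rows = ws.iter_rows(values_only=True)
        for r, c in find_matches(rows, needle, case_sensitive):
            hits.append((ws.title, f"{get_column_letter(c)}{r}"))
    return hits
```

Each (title, coordinate) pair can be turned back into a cell with wb[title][coordinate], after which .offset(row=1) and .offset(row=8) work as in the answer's loop.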

Related


How to paste into cell based on value - Openpyxl

Good morning all.
I'm in a situation where I have two Excel workbooks. The first has my source data, and the second I'm trying to paste the source data into.
My code searches for a particular cell with today's date in the first workbook, finds the associated cells of data I need, then tries to paste that range of data into the second workbook.
The code currently iterates over the first workbook and finds the correct data, but the problem comes when I try to paste the data into the second workbook.
For example, if the data found is in A40:C40, it is pasted into the second workbook at the same location (A40:C40). I need the code to search the second workbook and find the correct location to paste the data, based on another cell's value.
To be clear, the location of the copy and the paste varies every day. I cannot use a fixed cell reference.
from openpyxl import Workbook
import openpyxl
import datetime
wb = openpyxl.load_workbook('Online Log.xlsx')
wb1 = openpyxl.load_workbook('Blank.xlsx')
sheet = wb['Weather']
sheet1 = wb1['Sheet2']
# Find yesterday in date format
today = datetime.date.today()
yesterday = str(today - datetime.timedelta(days=1))
# Find the midnight row on today's DPR
for row in sheet.iter_rows():
    for cell in row:
        if str(cell.value) == (str(today) + ' 00:00:00'):
            Start_Coord = sheet.cell(row=cell.row, column=3).coordinate
            End_Coord = sheet.cell(row=cell.row + 3, column=9).coordinate
for row in sheet[Start_Coord:End_Coord]:
    for cell in row:
        sheet1[cell.coordinate].value = cell.value
wb1.save('file2.xlsx')
I've tried incorporating the following code to search for the relevant place to paste into the second workbook, but that doesn't work either.
for rows in sheet1.iter_rows():
    for cell in rows:
        if str(cell.value) == 'Paste Cell Below':
            Start_Coord_2 = sheet1.cell(row=cell.row, column=3).coordinate
            End_Coord_2 = sheet1.cell(row=cell.row + 3, column=9).coordinate
for rows in sheet1[Start_Coord_2:End_Coord_2]:
    for cell in rows:
        sheet1[cell.coordinate].value = cell.value
        print(cell.coordinate)
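No answer is included for this question, so here is a minimal sketch of one possible approach; it is not the asker's code. It assumes the sheet names ('Weather', 'Sheet2') and the block size (four rows, columns C to I) from the question's code, and it assumes the data should start on the row directly below the 'Paste Cell Below' marker. The function names are hypothetical. The key change is that the destination row comes from the marker's position rather than from the source cell's coordinate, and dates are compared as datetime values instead of strings. The source is loaded with data_only=True so cached values, not formulas, are copied.

```python
import datetime


def find_first(rows, predicate):
    """Return the 1-based (row, column) of the first value matching predicate, or None.

    rows is any iterable of row tuples, e.g. ws.iter_rows(values_only=True).
    """
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            if predicate(value):
                return r, c
    return None


def is_midnight_of(day):
    """Build a predicate that matches a datetime at 00:00 on the given date."""
    target = datetime.datetime.combine(day, datetime.time())
    return lambda value: isinstance(value, datetime.datetime) and value == target


def copy_to_marker(src_path, dst_path, out_path, marker="Paste Cell Below",
                   n_rows=4, first_col=3, last_col=9):
    """Copy today's block from src_path into the rows below `marker` in dst_path."""
    import openpyxl  # third-party; imported here so the helpers above need nothing extra

    src_wb = openpyxl.load_workbook(src_path, data_only=True)
    dst_wb = openpyxl.load_workbook(dst_path)
    src = src_wb["Weather"]
    dst = dst_wb["Sheet2"]

    src_pos = find_first(src.iter_rows(values_only=True),
                         is_midnight_of(datetime.date.today()))
    dst_pos = find_first(dst.iter_rows(values_only=True), lambda v: v == marker)
    if src_pos is None or dst_pos is None:
        raise LookupError("today's date or the paste marker was not found")

    src_top = src_pos[0]      # row holding today's midnight timestamp
    dst_top = dst_pos[0] + 1  # first row below the marker
    for i in range(n_rows):
        for col in range(first_col, last_col + 1):
            value = src.cell(row=src_top + i, column=col).value
            dst.cell(row=dst_top + i, column=col).value = value
    dst_wb.save(out_path)
```

With the question's file names this would be called as copy_to_marker('Online Log.xlsx', 'Blank.xlsx', 'file2.xlsx').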

Python 3x win32com: Copying used cells from worksheets in workbook

I have 6 worksheets in my workbook. I want to copy the data (all used cells except the header) from 5 of them and paste it into the 1st. Here is the relevant snippet:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(mergedXL)
wsSIR = wb.Sheets(1)
sheetList = wb.Sheets
for ws in sheetList:
    used = ws.UsedRange
    if ws.Name != "1st sheet":
        print("Copying cells from " + ws.Name)
        used.Copy()
used.Copy() copies all used cells, but I don't want the first (header) row of any worksheet. I want to copy from each sheet and paste into the first blank row of the 1st sheet. So the cells from the first source sheet (not the sheet I am copying into) should be pasted into the 1st sheet starting at A3, and every subsequent paste should go into the next available blank row. I probably haven't explained this well, but I'd appreciate some help; I haven't worked with win32com much.
I also have this code from one of my old scripts, but I don't understand exactly how it's copying stuff and how I can modify it to work for me this time around:
ws.Range(ws.Cells(1,1),ws.Cells(ws.UsedRange.Rows.Count,ws.UsedRange.Columns.Count)).Copy()
wsNew.Paste(wsNew.Cells(wsNew.UsedRange.Rows.Count,1))
If I understand your problem correctly, this code should do the job:
import win32com.client
# Create an instance of Excel
excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
# Open the workbook (raw string so the backslash is not read as an escape such as \f)
file_name = r'path_to_your\file.xlsx'
wb = excel.Workbooks.Open(file_name)
# Select the first sheet, which receives the data from the other sheets
ws_paste = wb.Sheets('Sheet1')
# Loop over all the sheets
for ws in wb.Sheets:
    if ws.Name != 'Sheet1':  # Skip the destination sheet
        # 11 = xlCellTypeLastCell (see the VBA Range.SpecialCells method)
        used_range = ws.UsedRange.SpecialCells(11)
        # used_range.Row and used_range.Column give the last used row and column
        # Copy the range from cell A2 (skipping the header) to the last row/column
        ws.Range("A2", ws.Cells(used_range.Row, used_range.Column)).Copy()
        # Get the last row used in the first sheet;
        # +1 moves to the next row so the paste does not overlap existing data
        row_copy = ws_paste.UsedRange.SpecialCells(11).Row + 1
        # Paste into the first sheet, starting at column A (1) of the first empty row
        ws_paste.Paste(ws_paste.Cells(row_copy, 1))
# Save and close the workbook
wb.Save()
wb.Close()
# Quit the Excel instance
excel.Quit()
I hope it helps you to understand your old code as well.
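To make the row arithmetic in this answer explicit, here is a small pure-Python sketch (no Excel or win32com needed) that plans which range to copy from each sheet and where to paste it. The input tuples stand in for what UsedRange.SpecialCells(11) reports, i.e. the row and column of each sheet's last used cell. plan_pastes and column_letter are hypothetical helpers, and the sketch assumes every sheet's data starts in column A with a single header row.

```python
def column_letter(col):
    """Convert a 1-based column number to Excel letters (1 -> 'A', 28 -> 'AB')."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def plan_pastes(dest_last_row, sources, header_rows=1):
    """Work out copy ranges and paste anchors for stacking sheets under a destination.

    sources: list of (sheet_name, last_row, last_col) for each sheet's last used cell.
    Returns (sheet_name, copy_range, paste_cell) tuples in A1 notation.
    """
    plan = []
    next_row = dest_last_row + 1  # first blank row below the destination's data
    first_data_row = header_rows + 1
    for name, last_row, last_col in sources:
        if last_row < first_data_row:
            continue  # header only: nothing to copy
        copy_range = f"A{first_data_row}:{column_letter(last_col)}{last_row}"
        plan.append((name, copy_range, f"A{next_row}"))
        next_row += last_row - header_rows  # number of rows actually pasted
    return plan
```

Each planned entry could then drive the COM calls, e.g. ws.Range(copy_range).Copy() followed by ws_paste.Paste(ws_paste.Range(paste_cell)). Unlike the answer, which re-reads the destination's last cell after every paste, this keeps a running row counter; the two should agree as long as each pasted block ends on a non-empty row.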
Have you considered using pandas?
import pandas as pd
# Create a list of DataFrames, one per sheet (assumes each table, header included, starts at E6)
dfs = [pd.read_excel("source.xlsx", sheet_name=n, skiprows=5, usecols="E:J") for n in range(0, 4)]
# Concatenate the DataFrames
df = pd.concat(dfs)
# Write the combined DataFrame to another spreadsheet
# (ExcelWriter.save() was removed in pandas 2.0; the context manager closes and saves the file)
with pd.ExcelWriter('merged.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')

How to know number of rowspan in Excel Python using openpyxl

I want to find the rowspan (4 in my case) of the cell in the first column that contains "abc", using openpyxl in Python. It's not in a specific table; it's somewhere in my Excel sheet, which I am parsing.
One approach would be to determine if a given cell is within a merged range.
The following script returns the merged cell range associated with a given cell. It first locates the cell containing your text, then looks for the merged range that contains it:
import openpyxl

def find_cell(ws, text):
    # Return the first cell whose value equals text, or None
    for row in ws.iter_rows():
        for cell in row:
            if cell.value == text:
                return cell
    return None

def get_merged_range(ws, coordinate):
    # Return the merged range (a CellRange) containing the coordinate, or None
    for merged_range in ws.merged_cells.ranges:
        if coordinate in merged_range:
            return merged_range
    return None

wb = openpyxl.load_workbook(filename='input.xlsx')
ws = wb.active
found_cell = find_cell(ws, 'abc')
if found_cell is not None:
    merged = get_merged_range(ws, found_cell.coordinate)
    print(merged)
    if merged is not None:
        print(merged.max_row - merged.min_row + 1)  # the rowspan
If the passed cell is not merged, the function will return None.
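Once the merged range is known, the rowspan is just the number of rows it covers. As a dependency-free sketch of that arithmetic, the helpers below (hypothetical names) parse an A1-style range such as 'A3:A6' and return its row and column span if a given cell lies inside it. In openpyxl itself, openpyxl.utils.range_boundaries('A3:A6') returns the same boundaries as (min_col, min_row, max_col, max_row), and each merged range's text form is available as rng.coord when iterating ws.merged_cells.ranges.

```python
import re

_CELL_REF = re.compile(r"\$?([A-Z]+)\$?(\d+)")


def parse_cell(coord):
    """Convert an A1 reference such as 'AB12' (or '$AB$12') to 1-based (row, column)."""
    m = _CELL_REF.fullmatch(coord.upper())
    if m is None:
        raise ValueError(f"not an A1 cell reference: {coord!r}")
    letters, digits = m.groups()
    col = 0
    for ch in letters:
        col = col * 26 + (ord(ch) - ord("A") + 1)  # base 26 with A=1 ... Z=26
    return int(digits), col


def span_of(range_ref, coord):
    """Return (rowspan, colspan) of range_ref, e.g. 'A3:A6', if coord lies inside it, else None."""
    first, _, last = range_ref.partition(":")
    (r1, c1), (r2, c2) = parse_cell(first), parse_cell(last or first)
    row, col = parse_cell(coord)
    if r1 <= row <= r2 and c1 <= col <= c2:
        return r2 - r1 + 1, c2 - c1 + 1
    return None
```

For a cell 'A3' inside a merged range A3:A6, span_of(rng.coord, 'A3') gives (4, 1), i.e. a rowspan of 4.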

openpyxl: Issue reading $ and . from xlsx

I want to convert an xlsx file to a tab-delimited CSV file using Python. After some reading, I was pointed to a library called openpyxl (code below):
import os
from openpyxl import load_workbook

def importXLSX(fileName):
    temp = os.path.basename(fileName).split('.xlsx')
    tempFileName = os.path.dirname(os.path.abspath(fileName)) + "/TEMP_" + temp[0] + ".csv"
    wb = load_workbook(filename=fileName)
    ws = wb.worksheets[0]  # Get first worksheet
    with open(tempFileName, 'w') as tempFile:
        for row in ws.rows:  # Iterate over rows
            for cell in row:
                cellValue = ""
                if cell.value is not None:
                    cellValue = str(cell.value)  # numbers must be converted before concatenating
                tempFile.write(cellValue + '\t')
            tempFile.write('\n')
    os.remove(fileName)
    return tempFileName
The input file contains billing data, but this function converts $2,000 to 2000 and 0.00 to 0.
Any idea why?
This is because when you apply a format in Excel, e.g. a currency format so that the value 2000 displays as $2,000.00, you aren't changing the value stored in the cell. openpyxl reads the stored value, not its formatted (displayed) presentation; if you had typed the text '$2,000.00 into the cell, that's what openpyxl would see. The same applies to a format showing two decimal places: 0 is displayed as 0.00, but the value in the cell is still 0, so that's what openpyxl sees.
There are other questions on SO about reading cell formatting with the xlrd library instead of openpyxl, for example "How to get Excel cell properties in Python" and "Identifying Excel Sheet cell color code using XLRD package"; more generally, search for "python excel read cell format".
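openpyxl does expose the display format of each cell as cell.number_format, so one option is to re-apply simple formats yourself. The sketch below (render is a hypothetical name) approximates Excel's displayed text for a few common patterns only: an optional "$" prefix, an optional thousands separator (#,##0) and a fixed number of decimals (0.00), with negative numbers getting a leading minus sign. It honors only the first section of a multi-section format, drops _x padding codes, and falls back to str(value) for anything else, including General, dates, percentages and locale codes such as [$$-409].

```python
import re

# Optional quoted or bare "$" prefix, optional "#,##" thousands group, "0", optional ".00..."
_SIMPLE_FORMAT = re.compile(r'(?:"([^"]*)"|(\$))?(#,##)?0(?:\.(0+))?')


def render(value, number_format):
    """Approximate the text Excel displays for value under a simple number_format."""
    if value is None:
        return ""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return str(value)
    # Only the first (positive-number) section is used; "_x" codes only add padding
    section = re.sub(r"_.", "", number_format.split(";")[0])
    m = _SIMPLE_FORMAT.fullmatch(section)
    if m is None:
        return str(value)  # General, dates, percentages, locale codes, ...
    prefix = m.group(1) or m.group(2) or ""
    thousands = "," if m.group(3) else ""
    decimals = len(m.group(4) or "")
    sign = "-" if value < 0 else ""
    return f"{sign}{prefix}{abs(value):{thousands}.{decimals}f}"
```

In importXLSX, writing render(cell.value, cell.number_format) instead of the raw value would, under those assumptions, turn a cell holding 2000 with the format "$"#,##0.00 into $2,000.00, and 0 with the format 0.00 into 0.00.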
