Loop through and compare spreadsheet cells in Python

Please excuse the crude code; I'm new to programming and I'm sure there are better ways to do this. I have an Excel file with two sheets: Sheet1 is populated in column A, and Sheet2 is populated in columns A, B, and C. I want to go through every cell in Sheet1 column A, search for a match in Sheet2 column A, and, if one is found, copy the values from B and C into Sheet1. The code below kind of works: it copies some data, but the rows don't match up correctly, and it seems to skip a lot of cells when they hold the same value as the previous cell. Any help would be greatly appreciated.
import openpyxl
wb = openpyxl.load_workbook('spreadsheet.xlsx')
sheet1 = wb.get_sheet_by_name('Sheet1')
sheet2 = wb.get_sheet_by_name('Sheet2')
for row in sheet1['A1':'A200']:
    for cell in row:
        obj1 = cell.value
        for row2 in sheet2['A1':'A2000']:
            for cell2 in row2:
                obj2 = cell2.value
                if obj1 == obj2:
                    row = str(cell2.row)
                    site = 'B' + row
                    tic = 'C' + row
                    sheet1[site] = sheet2[site].value
                    sheet1[tic] = sheet2[tic].value
wb.save('spreadsheet2.xlsx')

Your question is a little unclear but if I understand you correctly this should help:
import openpyxl
wb = openpyxl.load_workbook('spreadsheet.xlsx')
sheet1 = wb.get_sheet_by_name('Sheet1')
sheet2 = wb.get_sheet_by_name('Sheet2')
for i in range(1, 201):
    if sheet1.cell(row=i, column=1).value == sheet2.cell(row=i, column=1).value:
        sheet1.cell(row=i, column=2).value = sheet2.cell(row=i, column=2).value
        sheet1.cell(row=i, column=3).value = sheet2.cell(row=i, column=3).value
wb.save('spreadsheet2.xlsx')
I was able to clean up the code by using the .cell() method. Note that this compares the two sheets row by row (row i of Sheet1 against row i of Sheet2). If this isn't what you need, just comment and tell me exactly what you are trying to do. Hope this helps!
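If the goal is a lookup rather than a row-by-row comparison (the question asks to search all of Sheet2 column A for each value in Sheet1 column A), one possible approach is to build a dictionary from Sheet2 once and then scan Sheet1. This is only a sketch and is not from either post: `build_lookup`, `match_rows`, and `copy_matches` are hypothetical names, and the workbook part assumes openpyxl 2.6 or later for `iter_rows(values_only=True)`.

```python
def build_lookup(sheet2_rows):
    """Map each column-A key to its (B, C) pair; the first occurrence wins."""
    lookup = {}
    for key, b, c in sheet2_rows:
        if key is not None and key not in lookup:
            lookup[key] = (b, c)
    return lookup


def match_rows(sheet1_keys, lookup):
    """Yield (row_number, b, c) for every Sheet1 key found in the lookup."""
    for row_number, key in enumerate(sheet1_keys, start=1):
        if key in lookup:
            b, c = lookup[key]
            yield row_number, b, c


def copy_matches(path_in, path_out):
    """Workbook wiring; assumes the keys sit in column A of both sheets."""
    import openpyxl  # imported here so the helpers above run without it

    wb = openpyxl.load_workbook(path_in)
    sheet1, sheet2 = wb['Sheet1'], wb['Sheet2']
    lookup = build_lookup(
        sheet2.iter_rows(min_row=1, min_col=1, max_col=3, values_only=True)
    )
    keys = [
        row[0]
        for row in sheet1.iter_rows(min_row=1, min_col=1, max_col=1, values_only=True)
    ]
    for row_number, b, c in match_rows(keys, lookup):
        # Write to the Sheet1 row that held the key, not the Sheet2 row.
        sheet1.cell(row=row_number, column=2, value=b)
        sheet1.cell(row=row_number, column=3, value=c)
    wb.save(path_out)
```

Because every Sheet1 row is looked up independently, repeated values in Sheet1 column A each get their own copy of B and C, and the result is written next to the Sheet1 cell that matched.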

Related

Python Openpyxl iter_rows and add defined value in each cell

Question: Can someone please let me know how I can achieve the following task:
I've defined the column, but I need a specific value to go into each cell within that column.
Also, if column 6 only has x rows, then I want column 7 to also have only x rows with the values pasted in it.
This is the code I've tried.
import openpyxl
wb = openpyxl.load_workbook(filename=r'C:\Users\.spyder-py3\data\BMA.xlsx')
ws = wb.worksheets[0]
for row in ws.iter_rows('G{}:G{}'.format(ws.min_row,ws.max_row)):
    for cell in row:
        ws.cell(row=cell, column=7).value = 'BMA'
wb.save(r'C:\Users\.spyder-py3\data\BMA.csv')
wb.close()

I figured out most of the issue by looking at this answer:
https://stackoverflow.com/a/15004956/9649146
This is the code I ended up with:
import openpyxl
wb = openpyxl.load_workbook(filename=r'C:\Users\.spyder-py3\data\AAXN.xlsx')
ws = wb.worksheets[0]
r = 2
for row in ws.iter_rows('G{}:G{}'.format(ws.min_row,ws.max_row)):
    for cell in row:
        ws.cell(row=r, column=7).value = 'AAXN'
        r += 1
wb.save(r'C:\Users\.spyder-py3\data\AAXN.csv')
wb.close()

Or, you can do something like this:
for row in filesheet.iter_rows(min_row=2, max_row=filesheet.max_row):
    filesheet.cell(row=row[0].row, column=7).value = 'my value'
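For the original requirement (column 7 should get a value only where column 6 has one), a sketch along these lines might work. It is not taken from either answer: `rows_to_fill` and `fill_column_g` are made-up names, the paths and label are placeholders, and it assumes openpyxl 2.6 or later for `values_only=True`.

```python
def rows_to_fill(column_f_values, first_row=1):
    """Return the row numbers whose column-F value is not empty."""
    return [
        row_number
        for row_number, value in enumerate(column_f_values, start=first_row)
        if value is not None
    ]


def fill_column_g(path_in, path_out, label):
    """Write `label` into column G for every row that has data in column F."""
    import openpyxl  # local import keeps rows_to_fill usable on plain lists

    wb = openpyxl.load_workbook(path_in)
    ws = wb.worksheets[0]
    column_f = [
        row[0]
        for row in ws.iter_rows(min_row=1, min_col=6, max_col=6, values_only=True)
    ]
    for row_number in rows_to_fill(column_f):
        ws.cell(row=row_number, column=7, value=label)
    # openpyxl always writes xlsx-format data, so path_out should end in .xlsx.
    wb.save(path_out)
```

To skip a header row, read from `min_row=2` and pass `first_row=2` to `rows_to_fill` so the row numbers stay aligned.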

How to transfer data from one worksheet into another using python in the same workbook?

I want to transfer the data in 'Sheet1' into the worksheet 'Sheet2' in the same workbook, using Python. I have written the script below, but it fails with 'IndexError: list index out of range'. I know this is not the best way to go about it, and I would appreciate it if anyone could guide me to a more efficient way. I am sharing a snapshot of the Excel file below.
Can I write a 'Sheet1' cell value directly to a 'Sheet2' cell value, rather than going 'Sheet1' cell value ---> list ---> 'Sheet2' cell value?
import openpyxl
wb = openpyxl.load_workbook(r'C:\Users\laban\Desktop\Par-emails1.xlsx')
type(wb)
wb.get_sheet_names()
sheet = wb.get_sheet_by_name('Sheet1')
type(sheet)
anotherSheet = wb.active
sheet1 = wb.get_sheet_by_name('Sheet2')
type(sheet1)
par=[sheet.cell(row= col, column=1).value for col in range(1, 2450)]
email_no=[sheet.cell(row= col, column=2).value for col in range(1, 2450)]
Domain=[sheet.cell(row= col, column=3).value for col in range(1, 2450)]
email=[sheet.cell(row= col, column=4).value for col in range(1, 2450)]
for x in range(0,2450):
    if email_no[x]<9:
        sheet1.cell(row= x+1, column=1).value=par[x]
        sheet1.cell(row= x+1, column=2).value=email_no[x]
        sheet1.cell(row= x+1, column=3).value=Domain[x]
        sheet1.cell(row= x+1, column=4).value=email[x]
wb.save(r'C:\Users\laban\Desktop\Par-emails1.xlsx')

You can use:
import openpyxl
wb = openpyxl.load_workbook('C:/Users/laban/Desktop/Par-emails1.xlsx')
sheet1 = wb.get_sheet_by_name('Sheet1')
sheet2 = wb.get_sheet_by_name('Sheet2')
for i, row in enumerate(sheet1.iter_rows()):
    for j, col in enumerate(row):
        sheet2.cell(row=i+1, column=j+1).value = col.value
Apparently in 2.4 you can do this with one command: Copy whole worksheet with openpyxl
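On the question of writing 'Sheet1' values straight into 'Sheet2' without intermediate lists while keeping the `email_no < 9` filter, something like the following could be a starting point. It is a sketch, not the method from any answer here: `rows_below_limit` and `copy_filtered` are hypothetical names, and it assumes the count sits in the second column and that openpyxl 2.6 or later is installed.

```python
def rows_below_limit(rows, limit=9, count_index=1):
    """Yield rows whose numeric value at count_index is below limit."""
    for row in rows:
        count = row[count_index]
        # Skip header text and empty cells, which cannot be compared to a number.
        if isinstance(count, (int, float)) and count < limit:
            yield row


def copy_filtered(path):
    """Copy the matching Sheet1 rows (first four columns) into Sheet2."""
    import openpyxl  # local import so the filter above runs on plain tuples

    wb = openpyxl.load_workbook(path)
    source, target = wb['Sheet1'], wb['Sheet2']
    rows = source.iter_rows(min_row=1, max_col=4, values_only=True)
    for row in rows_below_limit(rows):
        target.append(row)  # append writes below the last used row
    wb.save(path)
```

Unlike the loop in the question, which writes each match to the same row number it came from, `append` packs the matching rows together with no gaps.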

Obviously this is a very simplified version of what I am currently using.
Here is a code snippet to copy data from sheet1 to sheet2 without any formatting.
You don't need to hard-code the maximum row and column, as you can get them from the
sheet.max_row and sheet.max_column properties.
from openpyxl.cell import Cell
max_row = sheet1.max_row #Get max row of first sheet
max_col = sheet1.max_column #Get max column of first sheet
for row_num in range(1,max_row + 1): #Iterate through rows
    for col_num in range(1, max_col + 1): #Iterate through columns
        _cell1 = sheet1.cell(row=row_num, column=col_num)
        _cell2 = sheet2.cell(row=row_num, column=col_num)
        _cell2.value = _cell1.value
I added the extra variables for readability. You can compact the code at your end.

bernie probably has the best answer here (Copy whole worksheet) but your index error might be coming from:
par=[sheet.cell(row= col, column=1).value for col in range(1, 2450)]
being 1 cell shorter than:
for x in range(0,2450):
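The mismatch can be reproduced without a spreadsheet. In this small sketch, `values` is just a stand-in for the `par` list from the question:

```python
# range(1, 2450) produces 2449 items, so the valid indexes are 0..2448.
values = list(range(1, 2450))

failed_at = None
try:
    for x in range(0, 2450):  # the last x is 2449, one past the end
        _ = values[x]
except IndexError:
    failed_at = x

print(len(values), failed_at)

# Looping over the list's own length keeps the two in step.
for x in range(len(values)):
    _ = values[x]
```

The loop fails on its final iteration, which is why most of the data looks fine before the error appears.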

import xlrd        # note: xlrd 2.0+ reads only .xls files; .xlsx needs an older xlrd
import xlsxwriter
def excel_op():
    filename = 'D://Python//excelread.xlsx'
    sheet_name = 'Sheet1'
    book = xlrd.open_workbook(str(filename))
    sheet = book.sheet_by_name(sheet_name)
    workbook = xlsxwriter.Workbook('excelwrite.xlsx')
    worksheet = workbook.add_worksheet()
    row_count = int(sheet.nrows)
    col_count = int(sheet.ncols)
    # copy every cell as text into the new workbook
    for row in range(0, int(row_count)):
        for col in range(0, int(col_count)):
            worksheet.write(row, col, str(sheet.cell(row, col).value))
    workbook.close()
    del book

import shutil
shutil.copy("Source.xlsx", "target.xlsx")

Write Pivot Table to Excel Workbook Range using XLSX Writer

I have a series of tables I would like to write to the same worksheet. The only other post similar to this is here. I also looked here but didn't see a solution.
I was hoping for a similar situation to SAS ODS Output codes that send proc freq results to an excel file. My thought was turning the table results into a new data frame and then stacking the output results to a worksheet.
pd.value_counts(df['name'])
df.groupby('name').aggregate({'Id': lambda x: x.unique()})
If I know the number of rows corresponding to the table, I should ideally know the appropriate range of cells to write to.
I am using:
import xlsxwriter
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()
tableone = pd.value_counts(df['name'])
tabletwo = df.groupby('name').aggregate({'Id': lambda x: x.unique()})
worksheet.write('B2:C15', tableone)
worksheet.write('D2:E15', tabletwo)
workbook.close()
EDIT: Include view of tableone
TableOne:
Name | Freq
A 5
B 1
C 6
D 11

import pandas as pd
import xlsxwriter
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()
tableone = pd.value_counts(df['name'])
tabletwo = df.groupby('name').aggregate({'Id': lambda x: x.unique()})
row, col = 1, 1  # zero-indexed, so this is cell B2
for value in tableone:
    if col == 16:
        row += 1
        col = 1
    worksheet.write(row, col, value)
    col += 1
row, col = 1, 3  # this is cell D2
for value in tabletwo:
    if col == 16:
        row += 1
        col = 1
    worksheet.write(row, col, value)
    col += 1
workbook.close()  # the file is only written when the workbook is closed
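A different way to look at this is that `worksheet.write` takes one cell at a time, so a table has to be unpacked into (row, column, value) writes. The helper below is a hypothetical sketch, not part of XlsxWriter: `write_table` and `RecordingSheet` are invented names, and `RecordingSheet` only stands in for a real worksheet so the example runs without any library installed.

```python
class RecordingSheet:
    """Stand-in with the same write(row, col, value) shape as a worksheet."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


def write_table(worksheet, first_row, first_col, rows):
    """Write each row of `rows` starting at (first_row, first_col), zero-indexed.

    Returns the index of the first free row below the table.
    """
    row_index = first_row
    for row in rows:
        for col_offset, value in enumerate(row):
            worksheet.write(row_index, first_col + col_offset, value)
        row_index += 1
    return row_index


sheet = RecordingSheet()
# Name/frequency pairs from the question's TableOne, written from cell B2 down.
next_row = write_table(sheet, 1, 1, [("A", 5), ("B", 1), ("C", 6), ("D", 11)])
print(next_row, sheet.cells[(1, 1)], sheet.cells[(4, 2)])
```

With a real xlsxwriter worksheet and the question's `tableone`, passing `tableone.items()` as `rows` should put the names in column B and the counts in column C. The returned row index can be used to stack the next table underneath.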

Conditional parsing and output of xlsx files with Openpyxl

I'm working through data for a research project. Output is in the form of .csv files, which have been converted to .xlsx files. There is a separate output file for each participant, with each file containing data on about 40 different measurements across several dozen stimuli. To make any sense of the data collected, we need to look at each stimulus separately with its relevant associated measurements. Each output file is large (50 columns by 60000 rows). I'm looking to parse the data using openpyxl, searching a pre-specified column for cells with a particular string value. When such a cell is found, I want to write that cell to a new workbook along with other specified columns in the same row.
For instance, parsing the following table, I’m trying to use openpyxl to search column A for ‘Slide 2’. When this value is found for a particular row, that cell is written to a new workbook along with the values in column C and D for that same row.
    A        B      C      D
1   Slide    Data1  Data2  Data3
2   Slide 1  1      2      3
3   Slide 2  4      5      6
4   Slide 2  7      8      9
Would write:
    A        B  C  D
2   Slide 2  5  6
3
4
... or some similar format.
I would also look to fill column D and E with data from the next file, and F and G with data from the file after that (and so on), but I can probably figure that part out.
I’ve tried:
from openpyxl import load_workbook
wb = load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
dest_filename = r'output.xlsx'
for x in range (0, 100): #0-100 as proof of concept before parsing entire worksheet
    if ws.cell(row = x, column =26) == 'some_image.jpg':
        print (ws.cell(row =x, column =26), ws.cell(row = x, column = 10), ws.cell(row = x, column = 17))
wb.save = dest_filename
also with adding the following in an attempt to create a worksheet in memory within which to manipulate cells:
for i in range (0, 30):
    for j in range (0, 100):
        print (ws.cell(row =i, column=j))
... both with minor variations, but they all output a copy of the original file.
I’ve read and re-read the documentation for openpyxl but to no avail. There doesn’t seem to be any similar question on the forums here either.
Any insight in correctly manipulating and writing data would be greatly appreciated. I also hope this might help other people trying to make sense of huge datasets. Thanks in advance!
I'm on Windows 7 running Python3.3.2 (64 bit) with openpyxl-1.6.2. Data was originally in .csv format, so could be exported to .xls or other formats if this helps. I looked into xlutils (using xlwt and xlrd) briefly, but openpyxl worked better with xlsx files.
Edit
Many thanks to @MikeMüller for pointing out I needed two workbooks to transfer data between. That makes much more sense.
I now have the following, but it still returns an empty workbook. The original cells are not blank. (The commented lines are for simplification - without the indent, of course - but code not successful either way.)
import openpyxl
wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
#n = 1
#for x in range (0, 1000):
#if ws.cell(row = x, column = 27) == '7.image2.jpg':
ws_out.cell(row = n, column = 1) == ws.cell(row = x, column = 26) #x changed
ws_out.cell(row = n, column = 2) == ws.cell(row = x, column = 10) #x changed
ws_out.cell(row = n, column = 3) == ws.cell(row = x, column = 17) #x changed
#n += 1
wb_out.save('output108.xlsx')
Edit 2
I've updated the code to include the .value for cells, but it still returns a blank workbook.
import openpyxl
wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
n = 1
for x in range (0, 1000):
    if ws.cell(row=x, column=27).value == '7.Image001.jpg':
        ws_out.cell(row=n, column=1).value = ws.cell(row=x, column=27).value
        ws_out.cell(row=n, column=2).value = ws.cell(row=x, column=10).value
        ws_out.cell(row=n, column=3).value = ws.cell(row=x, column=17).value
        n += 1
wb_out.save('output108.xlsx')
Summary for the next person with trouble:
You need two workbooks in memory: one loaded from your file, the other to write out as a new workbook file.
Use the cell's .value attribute to read the contents of each cell in the imported workbook, and assign it to the .value of the target cell in the exported workbook.
Make sure you start counting rows and columns at zero in openpyxl 1.x, the version used here; from openpyxl 2.0 onward, rows and columns start at 1.

You are doing cell assignment incorrectly. Here's what should work:
import openpyxl
wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
n = 1
for x in range (0, 1000):
    if ws.cell(row=x, column=27).value == '7.image2.jpg':
        ws_out.cell(row=n, column=1).value = ws.cell(row=x, column=26).value #x changed
        ws_out.cell(row=n, column=2).value = ws.cell(row=x, column=10).value #x changed
        ws_out.cell(row=n, column=3).value = ws.cell(row=x, column=17).value #x changed
        n += 1
wb_out.save('output108.xlsx')
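For readers on a current openpyxl, where rows and columns are numbered from 1, the same filter-and-copy idea can be sketched as below. This is an illustration rather than the code from the thread: `select_matching` and `extract` are hypothetical names, and `iter_rows(values_only=True)` assumes openpyxl 2.6 or later.

```python
def select_matching(rows, match_col, wanted, keep_cols):
    """Yield the keep_cols values of every row whose match_col equals wanted.

    Column numbers are 1-based, as in current openpyxl.
    """
    for row in rows:
        if row[match_col - 1] == wanted:
            yield tuple(row[col - 1] for col in keep_cols)


def extract(path_in, path_out, match_col, wanted, keep_cols):
    """Copy the selected columns of matching rows into a new workbook."""
    import openpyxl  # local import so select_matching runs on plain tuples

    ws = openpyxl.load_workbook(path_in).worksheets[0]
    wb_out = openpyxl.Workbook()
    ws_out = wb_out.active
    rows = ws.iter_rows(min_row=1, values_only=True)
    for values in select_matching(rows, match_col, wanted, keep_cols):
        ws_out.append(values)  # each match goes on the next free output row
    wb_out.save(path_out)
```

For the example table in the question, `select_matching(rows, 1, 'Slide 2', (1, 3, 4))` picks column A plus columns C and D of each 'Slide 2' row.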

You need to open a second workbook for writing:
import openpyxl
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
Put this in your loop (n is the output row counter):
ws_out.cell(row=n, column=1).value = desired_value
Save your file:
wb_out.save(dest_filename)

Insert column using openpyxl

I'm working on a script that modifies an existing excel document and I need to have the ability to insert a column between two other columns like the VBA macro command .EntireColumn.Insert.
Is there any method with openpyxl to insert a column like this?
If not, any advice on writing one?

Here is an example of a much much faster way:
import openpyxl
wb = openpyxl.load_workbook(filename)
sheet = wb.worksheets[0]
# this statement inserts a column before column 2
sheet.insert_cols(2)
wb.save("filename.xlsx")

I haven't found anything like .EntireColumn.Insert in openpyxl.
The first thing that comes to mind is to insert the column manually by modifying _cells on the worksheet. I don't think it's the best way to insert a column, but it works:
from openpyxl.workbook import Workbook
from openpyxl.cell import get_column_letter, Cell, column_index_from_string, coordinate_from_string
wb = Workbook()
dest_filename = r'empty_book.xlsx'
ws = wb.worksheets[0]
ws.title = "range names"
# inserting sample data
for col_idx in xrange(1, 10):
    col = get_column_letter(col_idx)
    for row in xrange(1, 10):
        ws.cell('%s%s' % (col, row)).value = '%s%s' % (col, row)
# inserting column between 4 and 5
column_index = 5
new_cells = {}
ws.column_dimensions = {}
for coordinate, cell in ws._cells.iteritems():
    column_letter, row = coordinate_from_string(coordinate)
    column = column_index_from_string(column_letter)
    # shifting columns
    if column >= column_index:
        column += 1
    column_letter = get_column_letter(column)
    coordinate = '%s%s' % (column_letter, row)
    # it's important to create new Cell object
    new_cells[coordinate] = Cell(ws, column_letter, row, cell.value)
ws._cells = new_cells
wb.save(filename=dest_filename)
I understand that this solution is very ugly, but I hope it helps you think in the right direction.
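The shifting step in the snippet above can be looked at on its own, away from openpyxl internals. In this sketch the worksheet is replaced by a plain dictionary keyed by 1-based (row, column) tuples; that layout and the name `shift_columns` are assumptions made for illustration, not openpyxl's API. In practice, the `insert_cols` call shown earlier in this thread is the simpler route.

```python
def shift_columns(cells, insert_at):
    """Return a new dict with every column >= insert_at moved one to the right.

    `cells` maps (row, column) tuples (1-based) to values. Building a new dict
    avoids overwriting entries that have not been moved yet.
    """
    shifted = {}
    for (row, column), value in cells.items():
        if column >= insert_at:
            column += 1
        shifted[(row, column)] = value
    return shifted


cells = {(1, 1): "A1", (1, 2): "B1", (1, 3): "C1"}
result = shift_columns(cells, 2)
print(sorted(result.items()))
```

After the call, column 2 is empty and the old B and C values sit in columns 3 and 4, which is the effect of inserting a blank column before column 2.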
