Getting staircase output from openpyxl when importing data - python

I'm trying to import data from multiple sheets to another in excel, and in order to do this I need python to input the data into the first empty cell, instead of overwriting the data from the last file. It seems to almost work, however, each column is jumping to its "own" empty row, and not staying in the correct row with the rest of its matching data, creating a staircase type pattern.
This is my code
import os
import openpyxl

os.chdir('C:\\Users\\XX\\Desktop')
wb1 = openpyxl.load_workbook('Test file python.xlsx', data_only = True) #open source excel file
ws1 = wb1.worksheets[0]
wb2 = openpyxl.load_workbook('test3.xlsx', data_only = True) #destination excel file
ws2 = wb2.active
#row_offset = ws2.max_row + 1
for i in range(10,150):
    for j in range(3,13):
        c = ws1.cell(row = i, column = j)
        rowOffset = ws2.max_row + 1
        rowNum = rowOffset
        ws2.cell(row = rowNum, column = j-2).value = c.value
wb2.save('test3.xlsx')
Here is a screenshot of the output in Excel (staircase output).

Every time you write a value into ws2 (i.e. ws2.cell(row = rowNum, column = j-2).value = c.value), ws2.max_row can go up by one. Because rowOffset is recomputed from ws2.max_row inside the inner loop, each column is written one row lower than the previous one, which creates that effect.
Use current_row = ws2.max_row once, outside of the nested loop, and derive every destination row from that fixed value; that should fix your "staircase" issue.
Also, mind that on the first run max_row == 1 for an empty sheet, which is why your data starts at row 2 and not at row 1.
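The fix described above can be sketched as follows. This is a minimal sketch, not the original poster's final code: copy_block is a hypothetical helper name, and small in-memory workbooks stand in for the two .xlsx files.

```python
from openpyxl import Workbook


def copy_block(src_ws, dst_ws, first_row, last_row, first_col, last_col):
    """Append a rectangular block from src_ws below the existing data in dst_ws."""
    # Read max_row ONCE, before writing anything, so that every column
    # of a given source row lands on the same destination row.
    start_row = dst_ws.max_row + 1
    for i in range(first_row, last_row + 1):
        dst_row = start_row + (i - first_row)
        for j in range(first_col, last_col + 1):
            value = src_ws.cell(row=i, column=j).value
            dst_ws.cell(row=dst_row, column=j - first_col + 1).value = value


# Hypothetical in-memory data standing in for the source file.
src_wb = Workbook()
src_ws = src_wb.active
for i in range(10, 13):
    for j in range(3, 6):
        src_ws.cell(row=i, column=j).value = f"r{i}c{j}"

# Hypothetical destination that already has one row of content.
dst_wb = Workbook()
dst_ws = dst_wb.active
dst_ws.append(["h1", "h2", "h3"])

copy_block(src_ws, dst_ws, 10, 12, 3, 5)
```

Calling copy_block again with another source sheet would continue below the rows just written, because max_row is read afresh at the start of each call.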

Related

If condition based on cell value in excel using Openpyxl

There are two excel files, where the data on condition should be appended to another excel file.
CONDITION: If Any value in Column A is equal to 'x' then it should get value from col B and get it appended directly to col A/B in excel file 2.
The below table is present in Excel File 1.
The below should be the output... which is in Excel file 2.
Am new to this.. please help with this code, and preferably if code is done using "Openpyxl", it would be much helpful !
Thanks in advance.
A slight improvement on Redox's solution:
import openpyxl

# Open the input file (file1)
wb1 = openpyxl.load_workbook('file1.xlsx')
ws1 = wb1['Sheet1']
# Create the output workbook and write its header row
wb2 = openpyxl.Workbook()
ws2 = wb2.active
ws2.append(["Base", "A/B"])
for row in ws1.iter_rows(min_row=2, max_col=3, values_only=True):
    base, a, b = row
    if a != "x":
        new_row = [base, a]
    else:
        new_row = [base, b]
    ws2.append(new_row)
# Save the result (file2)
wb2.save('file2.xlsx')
Ideally you should also check that the third column has a valid value.
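One way to add that check is sketched below. It assumes that "valid" simply means the third column is not empty; the sample rows and the skipped list are hypothetical and only there to make the sketch self-contained.

```python
from openpyxl import Workbook

# Hypothetical in-memory input standing in for file1.xlsx.
wb1 = Workbook()
ws1 = wb1.active
ws1.append(["Base", "A", "B"])
ws1.append(["p", 1, 10])
ws1.append(["q", "x", 20])
ws1.append(["r", "x", None])  # "x" in column A but nothing to fall back on

wb2 = Workbook()
ws2 = wb2.active
ws2.append(["Base", "A/B"])

skipped = []  # base values of rows that could not be resolved
for base, a, b in ws1.iter_rows(min_row=2, max_col=3, values_only=True):
    if a != "x":
        ws2.append([base, a])
    elif b is not None:
        ws2.append([base, b])
    else:
        # Assumption: an empty third column counts as invalid, so skip the row.
        skipped.append(base)

rows = list(ws2.iter_rows(values_only=True))
```

Collecting the skipped rows instead of silently dropping them makes it easy to report which inputs need attention.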
So, a simple solution and a more complicated one:
Then between files you can use a link or index() or indirect().
To do this using python-openpyxl, you can use the below code... added comments so it is easy to understand... hope this helps. Let me know in case of questions.
The python code
import openpyxl

#Open Input File open (file1)
wb1 = openpyxl.load_workbook('file1.xlsx')
ws1 = wb1['Sheet1']
#Create new file for Output (file2)
wb2 = openpyxl.Workbook()
ws2 = wb2.active
#Add header to output file
ws2.cell(row=1,column=1).value = "BASE"
ws2.cell(row=1,column=2).value = "A/B"
# Iterate through each line in input file from row 2 (skipping header) to last row
for row in ws1.iter_rows(min_row=2, max_row=ws1.max_row, min_col=1, max_col=3):
    for col, cell in enumerate(row):
        if col == 0: #First column, write to output
            ws2.cell(cell.row, col+1).value = cell.value
        elif col == 1:
            if cell.value != "X": #2nd column, write to output if not X
                ws2.cell(cell.row, col+1).value = cell.value
            else: #2nd column, write 3rd column if X
                ws2.cell(cell.row, col+1).value = ws1.cell(cell.row, col+2).value
wb2.save('file2.xlsx')
Output excel after running

Convert column of numbers to integer openpyxl

I'm copying data from one file to another with openpyxl. One column contains numbers formatted as text, and I want to convert the whole column (excluding the header) to integers. I managed to do it for one cell, but I can't get my loop to convert all the cells.
I tried to loop over the cell 'A' + i to go through all cells, but the int() function doesn't accept it.
Below is the code working and getting all the data and converting only the cell A2:
(I use the first part of the loop as I'm getting values from different columns.)
for i in range(2, 6000):
    # Case_ID
    cell_range1 = ws1.cell(i,6)
    cell_range2 = ws2.cell(i,1)
    cell_range2.value = cell_range1.value
ws2['A2'] = int(ws2['A2'].value)
Hope you can help me :)
This should do the trick.
for i in range(2, 6000):
    # Case_ID
    # value of cell from worksheet 1
    ws1_cell_value = ws1.cell(row=i, column=6).value
    # convert ws1 value and set ws2 cell value
    ws2.cell(row=i, column=1).value = int(ws1_cell_value)
I made a test file and it works as expected. Full code below:
from openpyxl import Workbook

# make test file
wb1 = Workbook()
ws1 = wb1.active
for x in range(2, 6000):
    ws1.cell(row=x, column=6).value = f"{x}"
wb1.save('string_column.xlsx')

# out file
wb2 = Workbook()
ws2 = wb2.active

# ACTUAL ANSWER CODE:
for i in range(2, 6000):
    # Case_ID
    # value of cell from worksheet 1
    ws1_cell_value = ws1.cell(row=i, column=6).value
    # convert ws1 value and set ws2 cell value
    ws2.cell(row=i, column=1).value = int(ws1_cell_value)
wb2.save('int_column.xlsx')
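Note that int() raises an error on empty cells (None) and on text that is not a whole number, which is likely in a real column of several thousand rows. A defensive variant might look like the sketch below; to_int is a hypothetical helper, and leaving unconvertible values unchanged is an assumption about what is wanted.

```python
from openpyxl import Workbook


def to_int(value):
    """Return value as an int, or unchanged if it cannot be converted."""
    if value is None:
        return None  # empty cell: nothing to convert
    try:
        return int(str(value).strip())
    except ValueError:
        return value  # assumption: keep non-numeric text as it is


# Hypothetical column with a header, clean numbers, a blank and stray text.
wb = Workbook()
ws = wb.active
ws.append(["Case_ID"])
for text in ["101", " 102 ", None, "n/a"]:
    ws.append([text])

# Convert column A in place, skipping the header in row 1.
for (cell,) in ws.iter_rows(min_row=2, max_col=1):
    cell.value = to_int(cell.value)

converted = [ws.cell(row=r, column=1).value for r in range(2, 6)]
```

Using iter_rows with min_row=2 also removes the need for a hard-coded upper bound such as 6000.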

Python/Pandas copy and paste from excel sheet

I found this syntax to copy and paste from a specific sheet in one workbook to another workbook. However, what I need help with is how to paste the copied information starting at a specific cell in the second workbook/sheet. For example, I need the information to be pasted starting at cell B3 instead of A1.
Thank you
import openpyxl as xl

path1 = "C:/Users/almur_000/Desktop/disandpopbyage.xlsx"
path2 = "C:/Users/almur_000/Desktop/disandpopbyage2.xlsx"
wb1 = xl.load_workbook(filename=path1)
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook(filename=path2)
ws2 = wb2.create_sheet(ws1.title)
for row in ws1:
    for cell in row:
        ws2[cell.coordinate].value = cell.value
wb2.save(path2)
wb2 is path2 "C:/Users/almur_000/Desktop/disandpopbyage2.xlsx"
Since the OP is using the openpyxl module I wanted to show a way to do this using that module. With this answer I demonstrate a way to move the original data to new column and row coordinates (there may be better ways to do this).
This fully reproducible example first creates a workbook for demonstration purposes called 'test.xlsx', with three sheets named 'test_1', 'test_2' and 'test_3'. Then, using openpyxl, it keeps only 'test_2' and saves it to a new workbook called 'new.xlsx', shifting the cells over 4 columns and down 3 rows. It makes use of the ord() and chr() functions.
import pandas as pd
import numpy as np
import openpyxl

# This section is sample code that creates a workbook in the current directory with 3 worksheets
df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='test_1', index=False)
df.to_excel(writer, sheet_name='test_2', index=False)
df.to_excel(writer, sheet_name='test_3', index=False)
writer.close()
# End of sample code that creates a workbook in the current directory with 3 worksheets

wb = openpyxl.load_workbook('test.xlsx')
ws_name_wanted = "test_2"
# Remove every sheet except the one we want.
# (wb.sheetnames, wb[name] and wb.remove() replace the deprecated
# get_sheet_names(), get_sheet_by_name() and remove_sheet().)
for item in wb.sheetnames:
    if item != ws_name_wanted:
        wb.remove(wb[item])
ws = wb[ws_name_wanted]
for row in ws.iter_rows():
    for cell in row:
        cell_value = cell.value
        # Shift the column letter by 4; this only works for single-letter columns (A-V)
        new_col_loc = chr(ord(cell.coordinate[0:1]) + 4)
        new_row_loc = cell.coordinate[1:]
        ws['%s%d' % (new_col_loc, int(new_row_loc) + 3)] = cell_value
        # Blank out the original cell
        ws[cell.coordinate] = ' '
wb.save("new.xlsx")
Here's what 'test.xlsx' looks like:
And here's what 'new.xlsx' looks like:
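The letter arithmetic in the example above only holds for single-letter columns. A sketch of the same shift using numeric row and column offsets is shown below; copy_with_offset is a hypothetical helper, the sheets are created in memory for illustration, and it assumes a recent openpyxl release where cell.column is an integer.

```python
from openpyxl import Workbook


def copy_with_offset(src_ws, dst_ws, row_offset, col_offset):
    """Copy every cell value of src_ws into dst_ws, shifted by the given offsets."""
    for row in src_ws.iter_rows():
        for cell in row:
            # cell.row and cell.column are 1-based integers in recent openpyxl.
            dst_ws.cell(row=cell.row + row_offset,
                        column=cell.column + col_offset).value = cell.value


# Hypothetical source data.
wb = Workbook()
src = wb.active
src.append(["a", "b"])
src.append([1, 2])

dst = wb.create_sheet("shifted")
# Offsets of 2 rows and 1 column move A1 to B3, as asked in the question.
copy_with_offset(src, dst, row_offset=2, col_offset=1)
```

Writing into a separate destination sheet avoids having to blank out the original cells afterwards.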
Thank you to those who helped me.
I found the answer with a slight modification. I removed the last def statement and kept everything else as it is. It works fantastically: it copies and pastes into the place I need without removing anything from the template.
#! Python 3
# Copy and Paste Ranges using OpenPyXl library
import openpyxl

#Prepare the spreadsheets to copy from and paste to.
#File to be copied
wb = openpyxl.load_workbook("foo.xlsx") #Add file name
sheet = wb["foo"] #Add Sheet name
#File to be pasted into
template = openpyxl.load_workbook("foo2.xlsx") #Add file name
temp_sheet = template["foo2"] #Add Sheet name

#Copy range of cells as a nested list
#Takes: start cell, end cell, and sheet you want to copy from.
def copyRange(startCol, startRow, endCol, endRow, sheet):
    rangeSelected = []
    #Loops through selected Rows
    for i in range(startRow,endRow + 1,1):
        #Appends the row to a RowSelected list
        rowSelected = []
        for j in range(startCol,endCol+1,1):
            rowSelected.append(sheet.cell(row = i, column = j).value)
        #Adds the RowSelected List and nests inside the rangeSelected
        rangeSelected.append(rowSelected)
    return rangeSelected

#Paste range
#Paste data from copyRange into template sheet
def pasteRange(startCol, startRow, endCol, endRow, sheetReceiving,copiedData):
    countRow = 0
    for i in range(startRow,endRow+1,1):
        countCol = 0
        for j in range(startCol,endCol+1,1):
            sheetReceiving.cell(row = i, column = j).value = copiedData[countRow][countCol]
            countCol += 1
        countRow += 1

def createData():
    print("Processing...")
    selectedRange = copyRange(1,2,4,14,sheet) #Change the 4 number values
    pastingRange = pasteRange(1,3,4,15,temp_sheet,selectedRange) #Change the 4 number values
    #You can save the template as another file to create a new file here too.
    template.save("foo.xlsx")
    print("Range copied and pasted!")

#Run the copy and paste
createData()
To copy and paste an entire sheet from one workbook to another:
import pandas as pd
#change NameOfTheSheet with the sheet name that includes the data
data = pd.read_excel(path1, sheet_name="NameOfTheSheet")
#save it to the 'NewSheet' in destfile
data.to_excel(path2, sheet_name='NewSheet')

Copying and Rearranging columns in excel with Openpyxl [duplicate]

This question already has an answer here:
Copy paste column range using OpenPyxl
(1 answer)
Closed 5 years ago.
I have data in an excel file, but for it to be useful I need to copy & paste the columns into a different order.
I have figured out how to open & read my file and to write a new excel file. I can also get the data from the original, and paste it into my new file but not in a loop.
Here's an example of the data I'm working with to visualize my issue: I need A1, B1, C1 next to each other, and then A2, B2, C2, and so on.
Here is my code from a smaller test file I created to play around with:
import openpyxl as op

wb = op.load_workbook('coding_test.xlsx')
ws = wb.active
mylist = []
mylist2 = []
mylist3 = []
for row in ws.iter_rows('H13:H23'):
    for cell in row:
        mylist.append(cell.value)
for row in ws.iter_rows('L13:L23'):
    for cell in row:
        mylist2.append(cell.value)
for row in ws.iter_rows('P13:P23'):
    for cell in row:
        mylist3.append(cell.value)
print (mylist, mylist2, mylist3)
new_wb = op.Workbook()
dest_filename = 'empty_coding_test.xlsx'
new_ws = new_wb.active
for row in zip (mylist, mylist2, mylist3):
    new_ws.append(row)
new_wb.save(filename=dest_filename)
I want to create a loop to do the rest of the work, but I can't figure out how to design it so that I don't have to code for each column and set.
Well, you can reuse your code with something like:
import openpyxl as op

wb = op.load_workbook('coding_test.xlsx')
ws = wb.active
new_wb = op.Workbook()
dest_filename = 'empty_coding_test.xlsx'
new_ws = new_wb.active
# ws['H13:H23'] returns the rows of the range; recent openpyxl versions
# no longer accept a range string in iter_rows().
for row in ws['H13:H23']:
    for cell in row:
        # Use cell.row (the row number), not the cell object itself
        new_ws['A%s' % cell.row].value = cell.value
for row in ws['L13:L23']:
    for cell in row:
        new_ws['B%s' % cell.row].value = cell.value
for row in ws['P13:P23']:
    for cell in row:
        new_ws['C%s' % cell.row].value = cell.value
new_wb.save(filename=dest_filename)
Tell me if that works for you.
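To avoid writing one loop per column, the source column letters can be kept in a list and walked with a single nested loop. The following is only a sketch under assumptions: the in-memory workbook and its cell contents are made up, and unlike the answer above it writes the copied values starting at row 1 of the new sheet rather than at row 13.

```python
from openpyxl import Workbook

# Hypothetical in-memory data standing in for coding_test.xlsx.
wb = Workbook()
ws = wb.active
for r in range(13, 24):
    ws[f"H{r}"] = f"H{r}"
    ws[f"L{r}"] = f"L{r}"
    ws[f"P{r}"] = f"P{r}"

source_columns = ["H", "L", "P"]  # extend this list for further columns
first_row, last_row = 13, 23

new_wb = Workbook()
new_ws = new_wb.active
for dst_col, letter in enumerate(source_columns, start=1):
    # A single-column range yields rows that each hold exactly one cell.
    for (cell,) in ws[f"{letter}{first_row}:{letter}{last_row}"]:
        new_ws.cell(row=cell.row - first_row + 1, column=dst_col).value = cell.value
```

The first source column lands in A, the second in B, and so on, so reordering the output is just a matter of reordering source_columns.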

Conditional parsing and output of xlsx files with Openpyxl

I'm working through data for a research project. Output is in the form of .csv files, which have been converted to .xlsx files. There is a separate output file for each participant, with each file containing data on about 40 different measurements across several dozen (or so) stimuli. To make any sense of the data collected, we need to look at each stimulus separately with its relevant associated measurements. Each output file is large (50 columns by 60000 rows). I'm looking to parse each file using openpyxl to search a pre-specified column for cells with a particular string value. When such a cell is found, I want to write that cell to a new workbook along with other specified columns in the same row.
For instance, parsing the following table, I’m trying to use openpyxl to search column A for ‘Slide 2’. When this value is found for a particular row, that cell is written to a new workbook along with the values in column C and D for that same row.
    A        B      C      D
1   Slide    Data1  Data2  Data3
2   Slide 1  1      2      3
3   Slide 2  4      5      6
4   Slide 2  7      8      9
Would write:
    A        B  C  D
2   Slide 2  5  6
3
4
... or some similar format.
I would also look to fill column D and E with data from the next file, and F and G with data from the file after that (and so on), but I can probably figure that part out.
I’ve tried:
from openpyxl import load_workbook

wb = load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
dest_filename = r'output.xlsx'
for x in range (0, 100): #0-100 as proof of concept before parsing entire worksheet
    if ws.cell(row = x, column =26) == 'some_image.jpg':
        print (ws.cell(row =x, column =26), ws.cell(row = x, column = 10), ws.cell(row = x, column = 17))
wb.save = dest_filename
also with adding the following in an attempt to create a worksheet in memory within which to manipulate cells:
for i in range (0, 30):
    for j in range (0, 100):
        print (ws.cell(row =i, column=j))
... both with minor variations, but they all output a copy of the original file.
I’ve read and re-read the documentation for openpyxl but to no avail. There doesn’t seem to be any similar question on the forums here either.
Any insight in correctly manipulating and writing data would be greatly appreciated. I also hope this might help other people trying to make sense of huge datasets. Thanks in advance!
I'm on Windows 7 running Python3.3.2 (64 bit) with openpyxl-1.6.2. Data was originally in .csv format, so could be exported to .xls or other formats if this helps. I looked into xlutils (using xlwt and xlrd) briefly, but openpyxl worked better with xlsx files.
Edit
Many thanks to @MikeMüller for pointing out I needed two workbooks to transfer data between. That makes much more sense.
I now have the following, but it still returns an empty workbook. The original cells are not blank. (The commented lines are for simplification - without the indent, of course - but code not successful either way.)
import openpyxl
wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
#n = 1
#for x in range (0, 1000):
#if ws.cell(row = x, column = 27) == '7.image2.jpg':
ws_out.cell(row = n, column = 1) == ws.cell(row = x, column = 26) #x changed
ws_out.cell(row = n, column = 2) == ws.cell(row = x, column = 10) #x changed
ws_out.cell(row = n, column = 3) == ws.cell(row = x, column = 17) #x changed
#n += 1
wb_out.save('output108.xlsx')
Edit 2
I've updated the code to include the .value for cells, but it still returns a blank workbook.
import openpyxl

wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
n = 1
for x in range (0, 1000):
    if ws.cell(row=x, column=27).value == '7.Image001.jpg':
        ws_out.cell(row=n, column=1).value = ws.cell(row=x, column=27).value
        ws_out.cell(row=n, column=2).value = ws.cell(row=x, column=10).value
        ws_out.cell(row=n, column=3).value = ws.cell(row=x, column=17).value
        n += 1
wb_out.save('output108.xlsx')
Summary for the next person with trouble:
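You need to create two workbooks in memory: one to import your file, the other to write to a new workbook file.
Use the cell.value attribute to pull the content of each cell of your imported workbook, and assign it (with =, not ==) to the .value of the desired cells in the exported workbook.
Make sure you start counting rows and columns at zero. This applies to the openpyxl 1.x release used here; as far as I know, later releases (2.0 onward) count rows and columns from 1, so range(0, ...) would have to become range(1, ...) there.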
You are doing cell assignment incorrectly. Here's what should work:
import openpyxl

wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
n = 1
for x in range (0, 1000):
    if ws.cell(row=x, column=27).value == '7.image2.jpg':
        ws_out.cell(row=n, column=1).value = ws.cell(row=x, column=26).value #x changed
        ws_out.cell(row=n, column=2).value = ws.cell(row=x, column=10).value #x changed
        ws_out.cell(row=n, column=3).value = ws.cell(row=x, column=17).value #x changed
        n += 1
wb_out.save('output108.xlsx')
You need to open a second workbook for writing:
import openpyxl
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
Put this in your loop:
ws_out.cell('cell indices here').value = desired_value
Save your file:
wb_out.save(dest_filename)
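For readers on a current openpyxl release, the same filter-and-copy idea can be sketched with iter_rows and append, which sidesteps manual row counters altogether. This is a sketch under assumptions: it uses the small example table from the question, built in memory, rather than the real 50-column files, and it relies on the values_only option of recent releases.

```python
from openpyxl import Workbook

# In-memory copy of the example table from the question.
wb = Workbook()
ws = wb.active
ws.append(["Slide", "Data1", "Data2", "Data3"])
ws.append(["Slide 1", 1, 2, 3])
ws.append(["Slide 2", 4, 5, 6])
ws.append(["Slide 2", 7, 8, 9])

# Second workbook that receives the matching rows.
wb_out = Workbook()
ws_out = wb_out.active

# Skip the header, read columns A-D as plain values, keep A, C and D on a match.
for slide, _data1, data2, data3 in ws.iter_rows(min_row=2, max_col=4, values_only=True):
    if slide == "Slide 2":
        ws_out.append([slide, data2, data3])

result = list(ws_out.iter_rows(values_only=True))
```

With real files, wb would come from load_workbook(...) and the result would be written with wb_out.save(...).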
