Copying and Rearranging columns in excel with Openpyxl [duplicate]

This question already has an answer here:
Copy paste column range using OpenPyxl
(1 answer)
Closed 5 years ago.
I have data in an Excel file, but for it to be useful I need to copy and paste the columns into a different order.
I have figured out how to open and read my file and how to write a new Excel file. I can also get the data from the original and paste it into my new file, but not in a loop.
Here's an example of the data I'm working with, to visualize my issue: I need A1, B1, C1 next to each other, then A2, B2, C2, and so on.
Here is my code from a smaller test file I created to play around with:
import openpyxl as op
wb = op.load_workbook('coding_test.xlsx')
ws = wb.active
mylist = []
mylist2 = []
mylist3 = []
# ws['H13:H23'] yields one tuple of cells per row; current openpyxl
# releases no longer accept a range string in iter_rows()
for row in ws['H13:H23']:
    for cell in row:
        mylist.append(cell.value)
for row in ws['L13:L23']:
    for cell in row:
        mylist2.append(cell.value)
for row in ws['P13:P23']:
    for cell in row:
        mylist3.append(cell.value)
print(mylist, mylist2, mylist3)
new_wb = op.Workbook()
dest_filename = 'empty_coding_test.xlsx'
new_ws = new_wb.active
for row in zip(mylist, mylist2, mylist3):
    new_ws.append(row)
new_wb.save(filename=dest_filename)
I want to create a loop to do the rest of the work, but I can't figure out how to design it so that I don't have to code for each column and set.

Well, you can recycle code by doing something like:
import openpyxl as op
wb = op.load_workbook('coding_test.xlsx')
ws = wb.active
new_wb = op.Workbook()
dest_filename = 'empty_coding_test.xlsx'
new_ws = new_wb.active
# Build the destination coordinate from the source row number (cell.row);
# formatting the cell object itself does not give a valid coordinate
for row in ws['H13:H23']:
    for cell in row:
        new_ws['A%s' % cell.row].value = cell.value
for row in ws['L13:L23']:
    for cell in row:
        new_ws['B%s' % cell.row].value = cell.value
for row in ws['P13:P23']:
    for cell in row:
        new_ws['C%s' % cell.row].value = cell.value
new_wb.save(filename=dest_filename)
Let me know if that works for you.
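To avoid writing one loop per column, the source column letters can live in a list and a single loop can read them all. This is a minimal sketch, assuming a current openpyxl and Python 3.6+; the helper name `rearrange_columns` and the in-memory demo data are hypothetical stand-ins for `coding_test.xlsx`.

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string


def rearrange_columns(src_ws, dst_ws, source_letters, first_row, last_row):
    """Copy the listed source columns side by side into dst_ws.

    source_letters gives the new order, e.g. ["H", "L", "P"] puts H in
    column A, L in column B and P in column C of the destination sheet.
    """
    columns = []
    for letter in source_letters:
        col = column_index_from_string(letter)
        # Read one source column (first_row..last_row) as a list of values
        columns.append(
            [src_ws.cell(row=r, column=col).value for r in range(first_row, last_row + 1)]
        )
    # zip(*columns) regroups the per-column lists into per-row tuples
    for row in zip(*columns):
        dst_ws.append(row)


# Small in-memory demo (hypothetical data)
src = Workbook().active
for r in range(13, 16):
    src[f"H{r}"] = f"A{r - 12}"
    src[f"L{r}"] = f"B{r - 12}"
    src[f"P{r}"] = f"C{r - 12}"

dst = Workbook().active
rearrange_columns(src, dst, ["H", "L", "P"], 13, 15)
print(list(dst.values))
```

For the question's file, `rearrange_columns(ws, new_ws, ["H", "L", "P"], 13, 23)` followed by `new_wb.save(...)` should give the same layout as the original code; handling more columns only means extending the list.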

Related

If condition based on cell value in excel using Openpyxl

There are two Excel files; data from the first should be appended to the second based on a condition.
CONDITION: If any value in column A is equal to 'x', the value from column B should be taken instead and appended directly to column A/B in Excel file 2.
The table below is present in Excel file 1.
The output below should be in Excel file 2.
I'm new to this, so please help with this code; a solution using openpyxl would be very helpful!
Thanks in advance.

A slight improvement on Redox's solution:
import openpyxl
#Open Input File open (file1)
wb1 = openpyxl.load_workbook('file1.xlsx')
ws1 = wb1['Sheet1']
wb2 = openpyxl.Workbook()
ws2 = wb2.active
ws2.append(["Base", "A/B"])
for row in ws1.iter_rows(min_row=2, max_col=3, values_only=True):
    base, a, b = row
    if a != "x":
        new_row = [base, a]
    else:
        new_row = [base, b]
    ws2.append(new_row)
wb2.save('file2.xlsx')
Ideally you should also check that the third column has a valid value.
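Following that suggestion, here is a minimal sketch that validates the third column before using it. It assumes the layout implied by both answers (column 1 = Base, column 2 = A, column 3 = B, header in row 1), compares the marker case-insensitively because the question says 'x' while one answer tests "X", and skips rows where column B is empty; the function name and the skip policy are assumptions.

```python
from openpyxl import Workbook


def conditional_copy(src_ws, dst_ws, marker="x"):
    """Append (Base, A/B) rows to dst_ws, taking B whenever A equals marker.

    Returns the source row numbers that were skipped because the
    replacement value in column B was missing.
    """
    dst_ws.append(["Base", "A/B"])
    skipped = []
    rows = src_ws.iter_rows(min_row=2, max_col=3, values_only=True)
    for row_num, (base, a, b) in enumerate(rows, start=2):
        # Compare as text, ignoring case and surrounding spaces
        is_marker = isinstance(a, str) and a.strip().lower() == marker.lower()
        if not is_marker:
            dst_ws.append([base, a])
        elif b is not None and b != "":
            dst_ws.append([base, b])
        else:
            skipped.append(row_num)  # A says "use B", but B is empty
    return skipped


# In-memory demo (hypothetical data)
src = Workbook().active
src.append(["Base", "A", "B"])
src.append(["r1", 10, 99])
src.append(["r2", "X", 20])
src.append(["r3", "x", None])

dst = Workbook().active
skipped = conditional_copy(src, dst)
print(list(dst.values), skipped)
```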

So, a simple solution and a more complicated one:
Then, between files, you can use a link, INDEX() or INDIRECT().

To do this using openpyxl, you can use the code below. I've added comments so it is easy to understand. Hope this helps; let me know if you have questions.
The Python code:
import openpyxl
#Open Input File open (file1)
wb1 = openpyxl.load_workbook('file1.xlsx')
ws1 = wb1['Sheet1']
#Create new file for Output (file2)
wb2 = openpyxl.Workbook()
ws2 = wb2.active
#Add header to output file
ws2.cell(row=1,column=1).value = "BASE"
ws2.cell(row=1,column=2).value = "A/B"
# Iterate through each line in input file from row 2 (skipping header) to last row
for row in ws1.iter_rows(min_row=2, max_row=ws1.max_row, min_col=1, max_col=3):
    for col, cell in enumerate(row):
        if col == 0:  # First column, write to output
            ws2.cell(cell.row, col+1).value = cell.value
        elif col == 1:
            if cell.value != "X":  # 2nd column, write to output if not X
                ws2.cell(cell.row, col+1).value = cell.value
            else:  # 2nd column, write 3rd column if X
                ws2.cell(cell.row, col+1).value = ws1.cell(cell.row, col+2).value
wb2.save('file2.xlsx')
Output excel after running

Getting staircase output from openpyxl when importing data

I'm trying to import data from multiple sheets into another Excel file. To do this, I need Python to write the data into the first empty row instead of overwriting the data from the previous file. It almost works; however, each column jumps to its "own" empty row instead of staying in the same row as the rest of its matching data, creating a staircase pattern.
This is my code
import os
import openpyxl
os.chdir('C:\\Users\\XX\\Desktop')
wb1 = openpyxl.load_workbook('Test file python.xlsx', data_only = True) #open source excel file
ws1 = wb1.worksheets[0]
wb2 = openpyxl.load_workbook('test3.xlsx', data_only = True) #destination excel file
ws2 = wb2.active
#row_offset = ws2.max_row + 1
for i in range(10,150):
    for j in range(3,13):
        c = ws1.cell(row = i, column = j)
        rowOffset = ws2.max_row + 1
        rowNum = rowOffset
        ws2.cell(row = rowNum, column = j-2).value = c.value
wb2.save('test3.xlsx')
Here is a screenshot of the output in Excel: Staircase output

You are changing ws2.max_row each time you write something to ws2 (i.e. ws2.cell(row = rowNum, column = j-2).value = c.value). Every write raises max_row by one, and because rowOffset is recomputed from it inside the inner loop, each column lands one row lower than the previous one, which creates that effect.
Use current_row = ws2.max_row outside of the nested loop and it should fix your "staircase" issue.
Also note that on the first run max_row is 1 even for an empty sheet, which is why your data starts at row 2 and not at row 1.
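The fix described above can be sketched like this: read the starting row once, before both loops, and derive each destination row from the source row index rather than from `max_row`. The source range mirrors the question (rows 10 to 149, columns 3 to 12 written to columns 1 to 10); the helper names and the empty-sheet check for row 1 are assumptions.

```python
from openpyxl import Workbook


def next_empty_row(ws):
    """Row number just below the existing data (1 for an empty sheet)."""
    # max_row reports 1 even for a brand-new sheet, so look at row 1 too
    if ws.max_row == 1 and all(cell.value is None for cell in ws[1]):
        return 1
    return ws.max_row + 1


def append_block(src_ws, dst_ws, first_row, last_row, first_col, last_col):
    """Copy a rectangular block below the data in dst_ws, row by row."""
    start = next_empty_row(dst_ws)  # computed ONCE, before the loops
    for i in range(first_row, last_row + 1):
        dst_row = start + (i - first_row)  # whole source row -> one dest row
        for j in range(first_col, last_col + 1):
            value = src_ws.cell(row=i, column=j).value
            dst_ws.cell(row=dst_row, column=j - first_col + 1).value = value


# In-memory demo (hypothetical data): two "files" appended one after another
src = Workbook().active
for i in range(10, 13):
    for j in range(3, 6):
        src.cell(row=i, column=j).value = f"r{i}c{j}"

dst = Workbook().active
append_block(src, dst, 10, 12, 3, 5)  # first file
append_block(src, dst, 10, 11, 3, 5)  # second file goes underneath
for row in dst.iter_rows(values_only=True):
    print(row)
```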

Openpyxl - Appending data from an Excel workbook to another

I would like to get a specific row from Workbook 1 and append it after the existing data in Workbook 2.
The code I have tried so far is below:
import openpyxl as xl
from openpyxl.utils import range_boundaries
min_cols, min_rows, max_cols, max_rows = range_boundaries('A:GH')
#Take source file
source = r"C:\Users\Desktop\Python project\Workbook1.xlsx"
wb1 = xl.load_workbook(source)
ws1 = wb1["P2"] #get the needed sheet
#Take destination file
destination = r"C:\Users\Desktop\Python project\Workbook2.xlsx"
wb2 = xl.load_workbook(destination)
ws2 = wb2["A3"] #get the needed sheet
row_data = 0
#Get row & col position and store it in row_data & col_data
for row in ws1.iter_rows():
    for cell in row:
        if cell.value == "Positive":
            row_data += cell.row
for row in ws1.iter_rows(min_row=row_data, min_col = 1, max_col=250, max_row = row_data):
    ws2.append((cell.value for cell in row[min_cols:max_cols]))
wb2.save(destination)
wb2.close()
But when I run the code above, I get the result shifted by one row.
I want the data that is appended to row 8, to be on row 7, right after the last data in Workbook 2.
(See image below)
Workbook 2
Does anyone have any feedback?
Thanks!

I found the solution and am posting it here in case anyone has the same problem. Although the cells below my data looked empty, they apparently had some leftover formatting. That's why the Python script treated those cells as non-empty and appended the data further down (below the formatted cells, where there was no formatting).
The solution is to clear the formatting of every row below your data (just copy a range of empty cells from a new workbook and paste it below your data).
Hope that helps! ;)
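Instead of clearing the formatting by hand, the last row that actually holds a value can be found in code. `ws.append()` writes below `ws.max_row`, and `max_row` also counts cells that only carry formatting, so this sketch scans for the last non-empty value and writes at an explicit row. The helper names are hypothetical; the "Positive" search is modelled on the question.

```python
from openpyxl import Workbook
from openpyxl.styles import Font


def last_data_row(ws):
    """Number of the last row holding a non-empty value (0 if none).

    Unlike ws.max_row, this ignores rows whose cells only carry formatting.
    """
    last = 0
    for row_idx, values in enumerate(ws.iter_rows(values_only=True), start=1):
        if any(v not in (None, "") for v in values):
            last = row_idx
    return last


def find_row(ws, text):
    """Row number of the first cell whose value equals text, or None."""
    for row in ws.iter_rows():
        for cell in row:
            if cell.value == text:
                return cell.row
    return None


def copy_row_below_data(src_ws, src_row, dst_ws, max_col):
    """Copy columns 1..max_col of src_row into the first truly empty row."""
    dst_row = last_data_row(dst_ws) + 1
    for col in range(1, max_col + 1):
        dst_ws.cell(row=dst_row, column=col).value = src_ws.cell(row=src_row, column=col).value
    return dst_row


# In-memory demo (hypothetical data)
src = Workbook().active
src.append(["id", "status", "amount"])
src.append([1, "Negative", 10])
src.append([2, "Positive", 20])

dst = Workbook().active
dst.append(["id", "status", "amount"])
dst.append([0, "Old", 5])
dst.cell(row=6, column=1).font = Font(bold=True)  # formatted but empty

pos_row = find_row(src, "Positive")
written = copy_row_below_data(src, pos_row, dst, max_col=3)
print(dst.max_row, written)
```

Separately, in the question's code `range_boundaries('A:GH')` returns 1-based columns while Python slices are 0-based, so `row[min_cols:max_cols]` drops column A; `row[min_cols - 1:max_cols]` would keep it.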

Openpyxl - Copy range of cells(with formula) from a workbook to another

I'm trying to copy specific rows from Workbook 1 and append them to the existing data in Workbook 2.
Copy the highlighted rows from Workbook 1 and append them in Workbook 2 below 'March'.
So far I have managed to copy and paste the range, but there are two problems:
1. The cells are shifted.
2. The percentage (formula) is missing, leaving only numeric values.
See Result here
import openpyxl as xl
source = r"C:\Users\Desktop\Test_project_20200401.xlsx"
wbs = xl.load_workbook(source)
wbs_sheet = wbs["P2"] #selecting the sheet
destination = r"C:\Users\Desktop\Try999.xlsx"
wbd = xl.load_workbook(destination)
wbd_sheet = wbd["A3"] #select the sheet
row_data = 0
for row in wbs_sheet.iter_rows():
    for cell in row:
        if cell.value == "Yes":
            row_data += cell.row
for row in wbs_sheet.iter_rows(min_row=row_data, min_col = 1, max_col=250, max_row = row_data+1):
    wbd_sheet.append((cell.value for cell in row))
wbd.save(destination)
Does anyone have any idea how I can solve this?
Any feedback/solution would help!
Thanks!

I think min_col should be 0.
Range("A1").Formula (in VBA) gets the formula.
Range("A1").Value (in VBA) gets the value.
So try using .formula in Python
(thanks to: Get back a formula from a cell - VBA ... if this works)

I just want to add my own solution here.
What I did was iterate through the columns and set cell.number_format = '0%', which displays the cell value as a percentage (the stored value itself does not change).
for col in ws.iter_cols(min_row=1, min_col=2, max_row=250, max_col=250):
    for cell in col:
        cell.number_format = '0%'
More info can be found here:
https://openpyxl.readthedocs.io/en/stable/_modules/openpyxl/styles/numbers.html
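Both answers touch on how formulas and percentages survive the copy. In openpyxl, a workbook loaded without `data_only=True` already returns the formula text from `cell.value` (e.g. `"=B2/C2"`), while the percent display is stored separately in `cell.number_format`, which is what the `'0%'` answer sets. The minimal sketch below copies both for one row and re-targets relative references with `openpyxl.formula.translate.Translator`; the helper name, the sheet layout and the row numbers are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.formula.translate import Translator
from openpyxl.utils import get_column_letter


def copy_row_with_formulas(src_ws, src_row, dst_ws, dst_row, max_col):
    """Copy values, formulas and number formats of one row to dst_row.

    Formula text is re-targeted so relative references follow the new row,
    e.g. =B2/C2 copied from row 2 to row 10 becomes =B10/C10.
    """
    for col in range(1, max_col + 1):
        src_cell = src_ws.cell(row=src_row, column=col)
        dst_cell = dst_ws.cell(row=dst_row, column=col)
        value = src_cell.value
        if isinstance(value, str) and value.startswith("="):
            letter = get_column_letter(col)
            value = Translator(value, origin=f"{letter}{src_row}").translate_formula(
                f"{letter}{dst_row}"
            )
        dst_cell.value = value
        # The "%" display lives in the number format, not in the value
        dst_cell.number_format = src_cell.number_format


# In-memory demo: a source row whose last cell is a percentage formula
src = Workbook().active
src.append(["Item", "Done", "Total", "Ratio"])
src.append(["March", 3, 4, "=B2/C2"])
src["D2"].number_format = "0%"

dst = Workbook().active
copy_row_with_formulas(src, 2, dst, 10, max_col=4)
print(dst["D10"].value, dst["D10"].number_format)
```

Writing to an explicit `dst_row` instead of calling `append()` also sidesteps the shift caused by formatted-but-empty rows. Absolute references such as `$B$2` are left unchanged by `Translator`.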

Copying the segment from one Excel file to another with python: xlrd and xlsxwriter

I am trying to copy an entire segment of an Excel sheet to another file.
The segment is actually a header/description, which mainly describes the attributes of the file, the date it was created, etc...
All of this occupies some cells in the first five rows and first three columns, say A1:C3.
Here's the code I've written (for sake of example, made only for 3 rows):
import xlsxwriter
import xlrd
#### open original excelbook
workbook = xlrd.open_workbook('hello.xlsx')
sheet = workbook.sheet_by_index(0)
# list of populated header rows
row_header_list = ['A1','A2','A3','A4','A5']
i = 0
c = 0
while c <= 2:
    #### read original excel book 3 rows by loop - counter is further below
    data = [sheet.cell_value(c, col) for col in range(sheet.ncols)]
    #print(data)
    #### write rows to the new excel book
    workbook = xlsxwriter.Workbook('tty_header.xlsx')
    worksheet = workbook.add_worksheet()
    worksheet.write_row(row_header_list[i], data)
    print(i, c, row_header_list[i], data)
    i+=1
    c+=1
    print("new i is", i, "new c is", c, "list value", row_header_list[i], "data is", data)
workbook.close()
The counters, data and list values all seem to be correct and in sync, according to the print statements. However, when I run this code, only the 3rd row gets populated in the newly created file; rows 1 and 2 are EMPTY. I don't understand why...
To test the issue, I made another example, a really inelegant one (no looping, control lists, etc.), just a blunt approach:
import xlsxwriter
import xlrd
# open original excelbook
workbook = xlrd.open_workbook('hello.xlsx')
sheet = workbook.sheet_by_index(0)
data1 = [sheet.cell_value(0, col) for col in range(sheet.ncols)]
data2 = [sheet.cell_value(1, col) for col in range(sheet.ncols)]
data3 = [sheet.cell_value(2, col) for col in range(sheet.ncols)]
data4 = [sheet.cell_value(3, col) for col in range(sheet.ncols)]
### new excelbook
workbook = xlsxwriter.Workbook('tty_header2.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write_row('A1', data1)
worksheet.write_row('A2', data2)
worksheet.write_row('A3', data3)
worksheet.write_row('A4', data4)
workbook.close()
In THIS case everything worked fine and all the needed data was transferred.
Can anyone explain what is wrong with the first one? Thank you.
An additional problem: if, after placing the header, I start to populate columns, the header values become NULL. This happens even though I start populating from the cell below the "header" cell (in the code below, column 1, starting from cell 6). Any ideas on how to solve it?
workbook = xlrd.open_workbook('tty_header2.xlsx.xlsx')
sheet = workbook.sheet_by_index(0)
data = [sheet.cell_value(row, 2) for row in range(23, sheet.nrows)]
print(data)
##### writing new file with xlswriter
workbook = xlsxwriter.Workbook('try2.xlsx')
worksheet = workbook.add_worksheet('A')
worksheet.write_column('A6', data)
workbook.close()
UPDATE: Here's the revised code, after Mike's correction:
import xlsxwriter
import xlrd
# open original excelbook and access first sheet
workbook = xlrd.open_workbook('hello_.xlsx')
sheet = workbook.sheet_by_index(0)
# define description rows
row_header_list = ['A1','A2','A3','A4','A5']
i = 0
c = 0
#create second file, add first sheet
workbook2 = xlsxwriter.Workbook('try2.xlsx')
worksheet = workbook2.add_worksheet('A')
# read original excel book 5 rows by loop - counter is further below
while c <= 5:
    data = [sheet.cell_value(c, col) for col in range(1,5)]
    #print(data)
    # write rows to the new excel book
    worksheet.write_row(row_header_list[i], data)
    # print("those are initial values", i, c, row_header_list[i], data)
    i+=1
    c+=1
    # print("new i is", i, "new c is", c, "list value", row_header_list[i], "data is", data)
####### works !!! xlrd - copy some columns, skipping the first 23 rows, and write the data to the new file
columnB_data = [sheet.cell_value(row, 2) for row in range(23, 72)]
print(columnB_data)
##### writing new file with xlsxwriter - works, without (!!!) converting data to tuple
worksheet.write_column('A5', columnB_data)
columnG_data = [sheet.cell_value(row, 6) for row in range(23, 72)]
#worksheet = workbook.add_worksheet('B')
print(columnG_data)
worksheet.write_column('B5', columnG_data)
worksheet = workbook.add_worksheet('C')
columnC_dta = [sheet.cell_value(row, 7) for row in range(23, 72)]
print(columnC_dta)
worksheet.write_column('A5', columnC_dta)
#close workbook2
workbook2.close()
After running this, I get the following error:
Traceback (most recent call last):
  File "C:/Users/Michael/PycharmProjects/untitled/cleaner.py", line 28, in
    worksheet.write_row(row_header_list[i], data)
IndexError: list index out of range
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in > ignored
The "line 28" refers to:
worksheet.write_row(row_header_list[i], data)
Running the segment from the beginning up to the end of the loop seems fine and produces the correct output, so the problem seems to be further down.
If I use the explicit close method, as suggested, I will not be able to use the add_sheet method again, since it would overwrite my current sheet. The documentation mentions "sheet.activate" and "sheet.select" methods, but they seem to be for cosmetic purposes. I have tried putting the xlsxwriter work into a different variable (although if I place all the "copying" at the top, I don't mind "workbook" being overwritten), but it didn't help.

You create a new output file with the same name on every iteration:
while c <= 2:
    # ...
    workbook = xlsxwriter.Workbook('tty_header.xlsx')
    worksheet = workbook.add_worksheet()
Therefore, you overwrite the file on each iteration and only the last row gets saved.
Just move this out of the loop:
workbook = xlsxwriter.Workbook('tty_header.xlsx')
worksheet = workbook.add_worksheet()
while c <= 2:
    ...  # read and write each row as before
workbook.close()
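The same idea applies to the updated code. XlsxWriter only creates new files and cannot read or modify existing ones, so the header rows and the data columns have to be written during one `Workbook` session, before a single `close()`; writing the columns through a separate `Workbook` replaces the file and loses the header. The `IndexError` comes from `while c <= 5`: the loop body runs six times, so on the last pass `i` is 5 and `row_header_list[5]` is past the end of the five-item list. This sketch uses `enumerate()` instead of the hand-written coordinate list and XlsxWriter's zero-indexed `write_row(row, col, data)` / `write_column(row, col, data)` forms; the function names and the `first_data_row` default are hypothetical.

```python
def write_header_and_columns(worksheet, header_rows, columns, first_data_row):
    """Write header rows at the top, then each data column below them.

    Uses XlsxWriter-style zero-indexed (row, col) arguments, so
    first_data_row=5 means Excel row 6.
    """
    # enumerate() keeps the row index in step with the data, so there is
    # no separate counter that can run past the end of a lookup list
    for row_idx, values in enumerate(header_rows):
        worksheet.write_row(row_idx, 0, values)
    for col_idx, values in enumerate(columns):
        worksheet.write_column(first_data_row, col_idx, values)


def build_workbook(path, header_rows, columns, first_data_row=5):
    """Create the output file once, write everything, close once."""
    import xlsxwriter  # imported here so the helper above has no dependency

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet("A")
    write_header_and_columns(worksheet, header_rows, columns, first_data_row)
    workbook.close()
```

With the question's data, something like `build_workbook('try2.xlsx', header_rows, [columnB_data, columnG_data])`, where `header_rows` is a hypothetical list of the five header rows read with xlrd, should produce one sheet containing both parts. Note also that the updated code calls `add_worksheet('C')` on the xlrd `workbook` rather than on `workbook2`; xlrd books have no such method, so that line would fail next.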
