Convert column of numbers to integer openpyxl - python

I'm getting data from 1 file to another with openpyxl. 1 column contains numbers formatted as text but I want to convert the full column without the header into integer. I managed to do it for 1 cell and I don't manage to do my loop in order to convert all cells.
I tried to loop on the cell 'A + i' to go through all cells but the int() functions doesn't accept it.
Below is the code working and getting all the data and converting only the cell A2:
(I use the first part of the loop as I'm getting values from different columns.)
for i in range(2, 6000):
# Case_ID
cell_range1 = ws1.cell(i,6)
cell_range2 = ws2.cell(i,1)
cell_range2.value = cell_range1.value
ws2['A2'] = int(ws2['A2'].value)
Hope you can help me :)

This should do the trick.
for i in range(2, 6000):
# Case_ID
# value of cell from worksheet 1
ws1_cell_value = ws1.cell(row=i, column=6).value
# convert ws1 value and set ws2 cell value
ws2.cell(row=i, column=1).value = int(ws1_cell_value)
I made a test file and it works as expected. Full code below:
from openpyxl import Workbook
# make test file
wb1 = Workbook()
ws1 = wb1.active
for x in range(2, 6000):
ws1.cell(row=x, column=6).value = f"{x}"
wb1.save('string_column.xlsx')
# out file
wb2 = Workbook()
ws2 = wb2.active
# ACTUAL ANSWER CODE:
for i in range(2, 6000):
# Case_ID
# value of cell from worksheet 1
ws1_cell_value = ws1.cell(row=i, column=6).value
# convert ws1 value and set ws2 cell value
ws2.cell(row=i, column=1).value = int(ws1_cell_value)
wb2.save('int_column.xlsx')

Related

Python openpyxl to automate entire column in excel

import openpyxl
i=2
workbook= openpyxl.load_workbook()
sheet = workbook.active
for i, cellObj in enumerate (sheet['I'],2):
cellObj.value = '=IF(ISNUMBER(A2)*(A2<>0),A2,IF(ISNUMBER(F2)*(F2<>0),F2,IF(ISBLANK(A2)*ISBLANK(F2)*ISBLANK(H2),0,H2)))'
workbook.save()
Using openpxl, I tried to apply formula to entire column 'I' its not working as per the formula, I wanted formula to start from I2 but its start from I1 and wrong output as well.
I have attached a screenshot.
.
Can someone please correct the code?
Output of print(list(enumerate(sheet['I']))):
You'd probably be better off to do it this way, auto skip row 1 by starting the iteration at row 2 and update the formula using the cell row number.
import openpyxl
excelfile = 'foo.xlsx'
workbook= openpyxl.load_workbook(excelfile)
sheet = workbook.active
mr = sheet.max_row # Last row to add formula to
for row in sheet.iter_rows(min_col=9, max_col=9, min_row=2, max_row=mr):
for cell in row:
cr = cell.row # Get the current row number to use in formula
cell.value = f'=IF(ISNUMBER(A{cr})*(A{cr} <> 0), A{cr}, IF(ISNUMBER(F{cr})*(F{cr} <> 0), F{cr}, IF(ISBLANK(A{cr})*ISBLANK(F{cr})*ISBLANK(H{cr}), 0, H{cr})))'
workbook.save(excelfile)
If you know the from and to row numbers, then you can use it like this:
from openpyxl import load_workbook
wb = load_workbook(filename="/content/sample_data/Book1.xlsx")
ws = wb.active
from_row = 2
to_row = 4
for i in range(from_row, to_row+1):
ws[f"C{i}"] = f'=_xlfn.CONCAT(A{i}, "_", B{i})'
wb.save("/content/sample_data/formula.xlsx")
Input (Book1.xlsx):
Output (formula.xlsx):
I don't have your data, so I did not test the following formula; but your formula can be translated to format string as:
for i in range(from_row, to_row+1):
ws[f"I{i}"] = f'=IF(ISNUMBER(A{i})*(A{i}<>0),A{i},IF(ISNUMBER(F{i})*(F{i}<>0),F{i},IF(ISBLANK(A{i})*ISBLANK(F{i})*ISBLANK(H{i}),0,H{i})))'
It formats the formula as:
=IF(ISNUMBER(A2)*(A2<>0),A2,IF(ISNUMBER(F2)*(F2<>0),F2,IF(ISBLANK(A2)*ISBLANK(F2)*ISBLANK(H2),0,H2)))
=IF(ISNUMBER(A3)*(A3<>0),A3,IF(ISNUMBER(F3)*(F3<>0),F3,IF(ISBLANK(A3)*ISBLANK(F3)*ISBLANK(H3),0,H3)))
=IF(ISNUMBER(A4)*(A4<>0),A4,IF(ISNUMBER(F4)*(F4<>0),F4,IF(ISBLANK(A4)*ISBLANK(F4)*ISBLANK(H4),0,H4)))

If condition based on cell value in excel using Openpyxl

There are two excel files, where the data on condition should be appended to another excel file.
CONDITION: If Any value in Column A is equal to 'x' then it should get value from col B and get it appended directly to col A/B in excel file 2.
The below table is present in Excel File 1.
The below should be the output... which is in Excel file 2.
Am new to this.. please help with this code, and preferably if code is done using "Openpyxl", it would be much helpful !
Thanks in advance.
A slight improvement on Redox's solution:
import openpyxl
#Open Input File open (file1)
wb1 = openpyxl.load_workbook('file1.xlsx')
ws1 = wb1['Sheet1']
wb2 = openpyxl.Workbook()
ws2 = wb2.active
ws2.append(["Base", "A/B"])
for row in ws1.iter_rows(min_row=2, max_col=3, values_only=True):
base, a, b = row
if a != "x":
new_row = [base, a]
else:
new_row = [base, b]
ws2.append(new_row)
Ideally you should also check that the third column has a valid value.
So, a simple solution and a more complicated one:
Then between files you can use a link or index() or indirect().
To do this using python-openpyxl, you can use the below code... added comments so it is easy to understand... hope this helps. Let me know in case of questions.
The python code
import openpyxl
#Open Input File open (file1)
wb1 = openpyxl.load_workbook('file1.xlsx')
ws1 = wb1['Sheet1']
#Create new file for Output (file2)
wb2 = openpyxl.Workbook()
ws2 = wb2.active
#Add header to output file
ws2.cell(row=1,column=1).value = "BASE"
ws2.cell(row=1,column=2).value = "A/B"
# Iterate through each line in input file from row 2 (skipping header) to last row
for row in ws1.iter_rows(min_row=2, max_row=ws1.max_row, min_col=1, max_col=3):
for col, cell in enumerate(row):
if col == 0: #First column, write to output
ws2.cell(cell.row, col+1).value = cell.value
elif col == 1:
if cell.value != "X": #2nd column, write to output if not X
ws2.cell(cell.row, col+1).value = cell.value
else: #2nd column, write 3rd column if X
ws2.cell(cell.row, col+1).value = ws1.cell(cell.row, col+2).value
wb2.save('file2.xlsx')
Output excel after running

some of excel formula change into only value with python openpyxl

I am doing some projects with python openpyxl. In the excel file, some of the parts should be input excel function and the other parts should be input only value, so I cannot 'data_only = True'
wb = load_workbook(merged_excel_file)
ws = wb.active
last_row = ws.max_row
for o in range(5, last_row + 1):
ws.cell(row=o, column=17).value = f"=if(countif($B$5:B{o},B{o}) = 1,3000,"'0'")"
Is there anything else that I can change the excel function into only values except for 'data_only = True' Because other cells should be shown by the data formula? This is my codding as followed.

Getting staircase output from openpyxl when importing data

I'm trying to import data from multiple sheets to another in excel, and in order to do this I need python to input the data into the first empty cell, instead of overwriting the data from the last file. It seems to almost work, however, each column is jumping to its "own" empty row, and not staying in the correct row with the rest of its matching data, creating a staircase type pattern.
This is my code
import os
import openpyxl
os.chdir('C:\\Users\\XX\\Desktop')
wb1 = openpyxl.load_workbook('Test file python.xlsx', data_only = True) #open source excel file
ws1 = wb1.worksheets[0]
wb2 = openpyxl.load_workbook('test3.xlsx', data_only = True) #destination excel file
ws2 = wb2.active
#row_offset = ws2.max_row + 1
for i in range(10,150):
for j in range(3,13):
c = ws1.cell(row = i, column = j)
rowOffset = ws2.max_row + 1
rowNum = rowOffset
ws2.cell(row = rowNum, column = j-2).value = c.value
wb2.save('test3.xlsx')
Here is a screenshot of the output in excel Staircase output
You are changing ws2.max_row each time you put something in ws2 (i.e. - ws2.cell(row = rowNum, column = j-2).value = c.value) your max_row goes up by one affecting the entire loop creating that effect.
use current_row = ws2.max_row outside of the nested loop and it should fix your "staircase" issue.
Also, mind that when you run in the first iteration max_row == 1 that is why your sheet starts at row 2 and not at row 1.

Insert column using openpyxl

I'm working on a script that modifies an existing excel document and I need to have the ability to insert a column between two other columns like the VBA macro command .EntireColumn.Insert.
Is there any method with openpyxl to insert a column like this?
If not, any advice on writing one?
Here is an example of a much much faster way:
import openpyxl
wb = openpyxl.load_workbook(filename)
sheet = wb.worksheets[0]
# this statement inserts a column before column 2
sheet.insert_cols(2)
wb.save("filename.xlsx")
Haven't found anything like .EntireColumn.Insert in openpyxl.
First thought coming into my mind is to insert column manually by modifying _cells on a worksheet. I don't think it's the best way to insert column but it works:
from openpyxl.workbook import Workbook
from openpyxl.cell import get_column_letter, Cell, column_index_from_string, coordinate_from_string
wb = Workbook()
dest_filename = r'empty_book.xlsx'
ws = wb.worksheets[0]
ws.title = "range names"
# inserting sample data
for col_idx in xrange(1, 10):
col = get_column_letter(col_idx)
for row in xrange(1, 10):
ws.cell('%s%s' % (col, row)).value = '%s%s' % (col, row)
# inserting column between 4 and 5
column_index = 5
new_cells = {}
ws.column_dimensions = {}
for coordinate, cell in ws._cells.iteritems():
column_letter, row = coordinate_from_string(coordinate)
column = column_index_from_string(column_letter)
# shifting columns
if column >= column_index:
column += 1
column_letter = get_column_letter(column)
coordinate = '%s%s' % (column_letter, row)
# it's important to create new Cell object
new_cells[coordinate] = Cell(ws, column_letter, row, cell.value)
ws._cells = new_cells
wb.save(filename=dest_filename)
I understand that this solution is very ugly but I hope it'll help you to think in a right direction.

Categories

Resources