Changing some Excel formulas into plain values with Python openpyxl - python

I am working on a project with Python and openpyxl. In the Excel file, some cells should contain Excel formulas and others should contain only values, so I cannot load the workbook with data_only=True. This is my code:
from openpyxl import load_workbook
wb = load_workbook(merged_excel_file)
ws = wb.active
last_row = ws.max_row
for o in range(5, last_row + 1):
    ws.cell(row=o, column=17).value = f"=IF(COUNTIF($B$5:B{o},B{o})=1,3000,0)"
Is there any way, other than data_only=True, to turn this formula into a plain value, given that other cells still need to show their formulas?
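Since openpyxl never evaluates formulas, one option is to compute the result in Python and write the plain value only into the cells that should hold values, while other cells keep their formula strings. The sketch below assumes the intent of the formula is "3000 for the first occurrence of each value in column B, otherwise 0". The function names are hypothetical, `ws` is assumed to be an openpyxl worksheet, and the comparison is exact, so it does not reproduce COUNTIF's case-insensitive matching or its handling of blank cells.

```python
def first_occurrence_amounts(keys, amount=3000, default=0):
    """Approximate =IF(COUNTIF($B$5:B{o},B{o})=1, amount, default) for a list of values."""
    seen = set()
    results = []
    for key in keys:
        # The running COUNTIF equals 1 only the first time a value appears.
        if key in seen:
            results.append(default)
        else:
            seen.add(key)
            results.append(amount)
    return results


def write_amounts(ws, first_row=5, key_column=2, target_column=17):
    """Write computed values (not formulas) into target_column of a worksheet."""
    rows = range(first_row, ws.max_row + 1)
    keys = [ws.cell(row=r, column=key_column).value for r in rows]
    for r, value in zip(rows, first_occurrence_amounts(keys)):
        ws.cell(row=r, column=target_column).value = value
```

Calling `write_amounts(ws)` after `load_workbook(...)` (without data_only=True) would leave every other formula in the workbook untouched.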

Related

Convert column of numbers to integer openpyxl

I'm copying data from one file to another with openpyxl. One column contains numbers formatted as text, and I want to convert the whole column, excluding the header, to integers. I managed to do it for one cell, but I can't get my loop to convert all the cells.
I tried to loop over the cell 'A' + i to go through all the cells, but the int() function doesn't accept it.
Below is the code that works: it copies all the data but converts only cell A2.
(I use the first part of the loop because I'm getting values from different columns.)
for i in range(2, 6000):
    # Case_ID
    cell_range1 = ws1.cell(i,6)
    cell_range2 = ws2.cell(i,1)
    cell_range2.value = cell_range1.value
    ws2['A2'] = int(ws2['A2'].value)
Hope you can help me :)
This should do the trick.
for i in range(2, 6000):
    # Case_ID
    # value of cell from worksheet 1
    ws1_cell_value = ws1.cell(row=i, column=6).value
    # convert ws1 value and set ws2 cell value
    ws2.cell(row=i, column=1).value = int(ws1_cell_value)
I made a test file and it works as expected. Full code below:
from openpyxl import Workbook
# make test file
wb1 = Workbook()
ws1 = wb1.active
for x in range(2, 6000):
    ws1.cell(row=x, column=6).value = f"{x}"
wb1.save('string_column.xlsx')
# out file
wb2 = Workbook()
ws2 = wb2.active
# ACTUAL ANSWER CODE:
for i in range(2, 6000):
    # Case_ID
    # value of cell from worksheet 1
    ws1_cell_value = ws1.cell(row=i, column=6).value
    # convert ws1 value and set ws2 cell value
    ws2.cell(row=i, column=1).value = int(ws1_cell_value)
wb2.save('int_column.xlsx')
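Note that int() raises an error on empty cells (None) and on non-numeric text, which a fixed range such as 2 to 6000 can easily run into on real data. A minimal sketch of a more tolerant conversion follows. The helper names are hypothetical, and `ws_src` and `ws_dst` are assumed to be openpyxl worksheets.

```python
def to_int_or_keep(value):
    """Return int(value) for numeric text; return anything else unchanged."""
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            # Not an integer string (for example a header or "3.5"): keep it as-is.
            return value
    return value


def copy_column_as_int(ws_src, ws_dst, src_col=6, dst_col=1, first_row=2):
    """Copy one column between worksheets, converting numeric text to int."""
    for r in range(first_row, ws_src.max_row + 1):
        value = ws_src.cell(row=r, column=src_col).value
        ws_dst.cell(row=r, column=dst_col).value = to_int_or_keep(value)
```

Using ws_src.max_row instead of a hard-coded 6000 stops the loop at the last used row of the source sheet.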

How to paste values only in Excel using Python and openpyxl

I have an Excel worksheet.
In column J I have some source data, which I used to make calculations in column K.
Column K has the values I need, but when I click on a cell the formula shows up.
I only want the values from column K, not the formula.
I read somewhere that I need to set data_only=True, which I have done.
I then pasted the data from column K into column L (with the intention of later deleting columns J and K).
I thought that column L would contain only the values from K, but if I click on a cell, the formula still shows up.
How do I simply paste values only from one column to another?
import openpyxl
wb = openpyxl.load_workbook('edited4.xlsx', data_only=True)
sheet = wb['Sheet1']
last_row = 100
for i in range(2, last_row):
    cell = "K" + str(i)
    a_cell = "J" + str(i)
    sheet[cell] = '=IF(' + a_cell + '="R","Yes","No")'
rangeselected = []
for i in range (1, 100,1):
    rangeselected.append(sheet.cell(row = i, column = 11).value)
for i in range (1, 1000,1):
    sheet.cell(row=i, column=12).value = rangeselected[i-1]
wb.save('edited4.xlsx')
It's been a while since I've used openpyxl. But:
Openpyxl doesn't run an Excel formula. It reads either the formula string or the result of the last calculation run by Excel*. This means that if a formula is created outside of Excel, and the file has never been opened by Excel, then only the formula will be available. Unless you need to display (for historical purposes, etc.) what the formula is, you should do the calculation in Python, which will be faster and more efficient anyway.
* When I say Excel, I also include any Excel-like spreadsheet that will cache the results of the last run.
Try this (adjust column numbers as desired):
import openpyxl
wb = openpyxl.load_workbook('edited4.xlsx', data_only=True)
sheet = wb['Sheet1']
last_row = 100
test_column = 10    # column J: the source data to test
result_column = 12  # column L: receives plain "Yes"/"No" values
for i in range(2, last_row):
    if sheet.cell(row=i, column=test_column).value == "R":
        sheet.cell(row=i, column=result_column).value = "Yes"
    else:
        sheet.cell(row=i, column=result_column).value = "No"
wb.save('edited4.xlsx')
If you have a well-formed data sheet, you could probably shorten this by another step or two by using enumerate() and Worksheet.iter_rows() but I'll leave that to your imagination.
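As a rough illustration of the Worksheet.iter_rows() idea, the sketch below walks a single column and writes the result next to it. The function names are hypothetical, `sheet` is assumed to be an openpyxl worksheet, and the column numbers assume the question's layout (J as input, L as output).

```python
def yes_no(value):
    """Python equivalent of =IF(J2="R","Yes","No") with an exact comparison."""
    return "Yes" if value == "R" else "No"


def fill_yes_no(sheet, test_column=10, result_column=12, first_row=2):
    """Write plain "Yes"/"No" values based on the contents of test_column."""
    # iter_rows yields one tuple of cells per row; the range is one column wide,
    # so each tuple holds exactly one cell.
    for (cell,) in sheet.iter_rows(min_row=first_row, max_row=sheet.max_row,
                                   min_col=test_column, max_col=test_column):
        sheet.cell(row=cell.row, column=result_column).value = yes_no(cell.value)
```

Because the values are computed in Python, the workbook never needs a cached formula result, so this works even if the file has not been opened in Excel.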

Getting staircase output from openpyxl when importing data

I'm trying to import data from multiple sheets into another sheet in Excel. To do this, I need Python to write the data starting at the first empty row instead of overwriting the data from the previous file. It almost works, but each column jumps to its "own" empty row instead of staying in the same row as the rest of its matching data, which creates a staircase pattern.
This is my code:
import os
import openpyxl
os.chdir('C:\\Users\\XX\\Desktop')
wb1 = openpyxl.load_workbook('Test file python.xlsx', data_only = True) #open source excel file
ws1 = wb1.worksheets[0]
wb2 = openpyxl.load_workbook('test3.xlsx', data_only = True) #destination excel file
ws2 = wb2.active
#row_offset = ws2.max_row + 1
for i in range(10,150):
    for j in range(3,13):
        c = ws1.cell(row = i, column = j)
        rowOffset = ws2.max_row + 1
        rowNum = rowOffset
        ws2.cell(row = rowNum, column = j-2).value = c.value
wb2.save('test3.xlsx')
Here is a screenshot of the output in Excel: [image: staircase output]
ws2.max_row changes every time you write to ws2 (i.e. ws2.cell(row = rowNum, column = j-2).value = c.value). Because you re-read it inside the inner loop, each cell lands one row below the previous one, which produces the staircase.
Read ws2.max_row once, before the nested loops (for example first_free_row = ws2.max_row + 1), and write source row i to first_free_row + (i - 10). That keeps every column of a source row in the same destination row.
Also note that max_row == 1 on an empty sheet, which is why your data starts at row 2 and not at row 1.
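A minimal sketch of that fix is shown below. The function names are hypothetical, `ws_src` and `ws_dst` are assumed to be openpyxl worksheets, and the default row and column bounds mirror the question's range(10, 150) and range(3, 13).

```python
def destination_row(first_free_row, src_row, first_src_row):
    """Map a source row to its destination row using a fixed offset."""
    return first_free_row + (src_row - first_src_row)


def append_block(ws_src, ws_dst, first_src_row=10, last_src_row=149,
                 first_col=3, last_col=12):
    """Copy a block of cells below the existing data in ws_dst."""
    # Read max_row once, before anything is written to the destination.
    first_free_row = ws_dst.max_row + 1
    for i in range(first_src_row, last_src_row + 1):
        row_num = destination_row(first_free_row, i, first_src_row)
        for j in range(first_col, last_col + 1):
            # Shift columns so that first_col lands in column 1.
            ws_dst.cell(row=row_num, column=j - first_col + 1).value = (
                ws_src.cell(row=i, column=j).value
            )
```

Every cell of source row i is written to the same destination row, so the block keeps its shape.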

Copying and Rearranging columns in excel with Openpyxl [duplicate]

This question already has an answer here: Copy paste column range using OpenPyxl (1 answer). Closed 5 years ago.
I have data in an Excel file, but for it to be useful I need to copy and paste the columns into a different order.
I have figured out how to open and read my file and how to write a new Excel file. I can also get the data from the original and paste it into my new file, but not in a loop.
To visualize my issue with the data I'm working with: I need A1, B1, C1 next to each other, then A2, B2, C2, and so on.
Here is my code from a smaller test file I created to play around with:
import openpyxl as op
wb = op.load_workbook('coding_test.xlsx')
ws = wb.active
mylist = []
mylist2 = []
mylist3 = []
for row in ws.iter_rows('H13:H23'):
    for cell in row:
        mylist.append(cell.value)
for row in ws.iter_rows('L13:L23'):
    for cell in row:
        mylist2.append(cell.value)
for row in ws.iter_rows('P13:P23'):
    for cell in row:
        mylist3.append(cell.value)
print (mylist, mylist2, mylist3)
new_wb = op.Workbook()
dest_filename = 'empty_coding_test.xlsx'
new_ws = new_wb.active
for row in zip (mylist, mylist2, mylist3):
    new_ws.append(row)
new_wb.save(filename=dest_filename)
I want to create a loop to do the rest of the work, but I can't figure out how to design it so that I don't have to code for each column and set.
Well, you can reuse your code by doing something like this:
import openpyxl as op
wb = op.load_workbook('coding_test.xlsx')
ws = wb.active
new_wb = op.Workbook()
dest_filename = 'empty_coding_test.xlsx'
new_ws = new_wb.active
# ws['H13:H23'] returns a tuple of rows, each of which is a tuple of cells
for row in ws['H13:H23']:
    for cell in row:
        # use cell.row (the row number), not the Cell object, to build the address
        new_ws['A%s' % cell.row].value = cell.value
for row in ws['L13:L23']:
    for cell in row:
        new_ws['B%s' % cell.row].value = cell.value
for row in ws['P13:P23']:
    for cell in row:
        new_ws['C%s' % cell.row].value = cell.value
new_wb.save(filename=dest_filename)
Tell me if that works for you.
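To avoid writing one loop per column, the source columns can be listed once and regrouped into rows with zip(). This is only a sketch: the function names are hypothetical, `ws` and `new_ws` are assumed to be openpyxl worksheets, and columns are given as numbers (H = 8, L = 12, P = 16).

```python
def interleave_columns(columns):
    """Turn [[a1, a2], [b1, b2], [c1, c2]] into rows [(a1, b1, c1), (a2, b2, c2)]."""
    return list(zip(*columns))


def copy_columns(ws, new_ws, source_columns=(8, 12, 16), first_row=13, last_row=23):
    """Append the chosen source columns side by side to new_ws."""
    columns = [
        [ws.cell(row=r, column=c).value for r in range(first_row, last_row + 1)]
        for c in source_columns
    ]
    for row in interleave_columns(columns):
        # append() writes each tuple as the next row, starting in column A.
        new_ws.append(row)
```

Adding another set of columns then only means calling copy_columns() again with a different source_columns tuple.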

Conditional parsing and output of xlsx files with Openpyxl

I'm working through data for a research project. Output is in the form of .csv files, which have been converted to .xlsx files. There is a separate output file for each participant, with each file containing data on about 40 different measurements across several dozen (or so) stimuli. To make any sense of the data collected, we need to look at each stimulus separately with its relevant associated measurements. Each output file is large (50 columns by 60000 rows). I’m looking to parse the data using openpyxl, searching a pre-specified column for cells with a particular string value. When such a cell is found, I want to write that cell to a new workbook along with other specified columns in the same row.
For instance, parsing the following table, I’m trying to use openpyxl to search column A for ‘Slide 2’. When this value is found in a particular row, that cell is written to a new workbook along with the values in columns C and D of that same row.
     A         B       C       D
1    Slide     Data1   Data2   Data3
2    Slide 1   1       2       3
3    Slide 2   4       5       6
4    Slide 2   7       8       9
Would write:
     A         B       C       D
2    Slide 2   5       6
3
4
... or some similar format.
I would also look to fill column D and E with data from the next file, and F and G with data from the file after that (and so on), but I can probably figure that part out.
I’ve tried:
from openpyxl import load_workbook
wb = load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
dest_filename = r'output.xlsx'
for x in range (0, 100): #0-100 as proof of concept before parsing entire worksheet
    if ws.cell(row = x, column =26) == 'some_image.jpg':
        print (ws.cell(row =x, column =26), ws.cell(row = x, column = 10), ws.cell(row = x, column = 17))
wb.save = dest_filename
also with adding the following in an attempt to create a worksheet in memory within which to manipulate cells:
for i in range (0, 30):
    for j in range (0, 100):
        print (ws.cell(row =i, column=j))
... both with minor variations, but they all output a copy of the original file.
I’ve read and re-read the documentation for openpyxl but to no avail. There doesn’t seem to be any similar question on the forums here either.
Any insight in correctly manipulating and writing data would be greatly appreciated. I also hope this might help other people trying to make sense of huge datasets. Thanks in advance!
I'm on Windows 7 running Python3.3.2 (64 bit) with openpyxl-1.6.2. Data was originally in .csv format, so could be exported to .xls or other formats if this helps. I looked into xlutils (using xlwt and xlrd) briefly, but openpyxl worked better with xlsx files.
Edit
Many thanks to @MikeMüller for pointing out I needed two workbooks to transfer data between. That makes much more sense.
I now have the following, but it still returns an empty workbook. The original cells are not blank. (The commented lines are for simplification - without the indent, of course - but code not successful either way.)
import openpyxl
wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
#n = 1
#for x in range (0, 1000):
#if ws.cell(row = x, column = 27) == '7.image2.jpg':
ws_out.cell(row = n, column = 1) == ws.cell(row = x, column = 26) #x changed
ws_out.cell(row = n, column = 2) == ws.cell(row = x, column = 10) #x changed
ws_out.cell(row = n, column = 3) == ws.cell(row = x, column = 17) #x changed
#n += 1
wb_out.save('output108.xlsx')
Edit 2
I've updated the code to include the .value for cells, but it still returns a blank workbook.
import openpyxl
wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
n = 1
for x in range (0, 1000):
    if ws.cell(row=x, column=27).value == '7.Image001.jpg':
        ws_out.cell(row=n, column=1).value = ws.cell(row=x, column=27).value
        ws_out.cell(row=n, column=2).value = ws.cell(row=x, column=10).value
        ws_out.cell(row=n, column=3).value = ws.cell(row=x, column=17).value
        n += 1
wb_out.save('output108.xlsx')
Summary for the next person with trouble:
You need to create two workbooks in memory: one to import your file, the other to write to a new workbook file.
Use cell.value to pull the content of each cell of your imported workbook, and assign it to the .value of the desired cells in the exported workbook.
Make sure you start counting rows and columns at zero (this applies to openpyxl 1.x, as used here; openpyxl 2.0 and later count rows and columns from one).
You are doing cell assignment incorrectly. Here's what should work:
import openpyxl
wb = openpyxl.load_workbook(filename = r'test108.xlsx')
ws = wb.worksheets[0]
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
n = 1
for x in range (0, 1000):
    if ws.cell(row=x, column=27).value == '7.image2.jpg':
        ws_out.cell(row=n, column=1).value = ws.cell(row=x, column=26).value #x changed
        ws_out.cell(row=n, column=2).value = ws.cell(row=x, column=10).value #x changed
        ws_out.cell(row=n, column=3).value = ws.cell(row=x, column=17).value #x changed
        n += 1
wb_out.save('output108.xlsx')
You need to open a second workbook for writing:
import openpyxl
wb_out = openpyxl.Workbook()
ws_out = wb_out.worksheets[0]
Put this in your loop:
ws_out.cell('cell indices here').value = desired_value
Save your file:
wb_out.save(dest_filename)
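Putting the pieces together, the filtering step can be sketched as below. This assumes a recent openpyxl (2.6 or later), where rows and columns are counted from one and iter_rows() accepts values_only=True. The function names are hypothetical, `ws` and `ws_out` are assumed to be openpyxl worksheets, and the sheet is assumed to be at least match_column columns wide.

```python
def select_rows(rows, match_index, match_value, keep_indices):
    """Keep the chosen fields of every row whose match field equals match_value."""
    selected = []
    for row in rows:
        if row[match_index] == match_value:
            selected.append(tuple(row[i] for i in keep_indices))
    return selected


def copy_matching_rows(ws, ws_out, match_column=27, match_value="7.image2.jpg",
                       keep_columns=(26, 10, 17)):
    """Append the kept columns of each matching row of ws to ws_out."""
    # values_only=True yields plain tuples of cell values.
    # Tuple indices are 0-based while worksheet columns are 1-based.
    rows = ws.iter_rows(values_only=True)
    keep = [c - 1 for c in keep_columns]
    for row in select_rows(rows, match_column - 1, match_value, keep):
        ws_out.append(row)
```

Applied to the small example table from the question (match on column A, keep columns A, C and D), select_rows() returns one tuple per 'Slide 2' row.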
