I have a dictionary of filenames and corresponding values that I want to populate to a spreadsheet with openpyxl. In the code I've included a small example dict.
The filenames are already in Column A of Sheet1 but I'm struggling to populate the values to Column B in the corresponding rows. There isn't a logical order to the files so I've written a function to iterate over Column A and populate Column B when the required filename is found. I then run the dict through that function. At the moment it's returning a TypeError: 'str' object cannot be interpreted as an integer.
I'm thinking there's definitely a more straightforward way to do this...
from openpyxl import Workbook
import openpyxl

dict = {'file_a': 20, 'file_b': 30, 'file_c': 40}
file = 'spreadsheey.xlsx'
wb = openpyxl.load_workbook(file, read_only=True)
ws = wb.active

def populate_row(filename, length):
    for row in ws.iter_rows('A'):
        for cell in row:
            if cell.value == filename:
                ws.cell(row=cell.row, column=2).value = length

for key, value in dict.items():
    populate_row(key, value)
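One possible fix, as a sketch rather than a definitive answer: in recent openpyxl versions iter_rows takes integer bounds (min_row, max_row, min_col, max_col) instead of a column letter, which is the likely source of the TypeError, and a workbook loaded with read_only=True cannot be written to. The helper name populate_column_b and the in-memory workbook below are hypothetical stand-ins so the example runs without a file on disk; with a real file you would call openpyxl.load_workbook(file) without read_only and save afterwards.

```python
from openpyxl import Workbook


def populate_column_b(ws, values_by_name):
    """Walk column A once and write the matching value into column B."""
    # min_col/max_col restrict the iteration to column A; each row is a 1-tuple.
    for (cell,) in ws.iter_rows(min_col=1, max_col=1):
        if cell.value in values_by_name:
            ws.cell(row=cell.row, column=2, value=values_by_name[cell.value])


# Hypothetical in-memory sheet standing in for the real spreadsheet.
wb = Workbook()
ws = wb.active
for name in ("file_c", "file_a", "unlisted_file", "file_b"):
    ws.append([name])

lengths = {"file_a": 20, "file_b": 30, "file_c": 40}
populate_column_b(ws, lengths)
```

Looking names up in the dict while walking column A once avoids rescanning the whole column for every key.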
I am using Python and openpyxl to extract values from an Excel sheet, store them in an array, and use the array to write to a new file.
This is the code:
from openpyxl import load_workbook

workbook = load_workbook(filename="./test.xlsx")
sheet = workbook.active  # assumption: the data is on the active sheet
fileNameArray = []

for column in sheet.iter_cols(min_row=2, min_col=1, max_row=300, max_col=1):
    for cell in column:
        fileNameArray.append(cell.value)
Now, I want to create a new array called linkArray as follows:
If the filename starts with anything but XYZ, then the linkArray entry for that fileName in fileNameArray should be https://www.linkName.com/fileNameArrayValue1.
If the filename starts with XYZ, then the linkArray entry should be https://www.differentLinkIfNameStartsWithXYZ/fileNameArrayValue2.
And so on for hundreds of values.
So, in the end:
linkArray = ['https://www.linkName.com/fileNameArrayValue1', 'https://www.differentLinkIfNameStartsWithXYZ/fileNameArrayValue2' ...]
What's the best way to achieve this?
Thanks!
Extending your code:
from openpyxl import load_workbook

workbook = load_workbook(filename="./test.xlsx")
sheet = workbook.active  # assumption: the data is on the active sheet
fileNameArray = []
linkArray = []

for column in sheet.iter_cols(min_row=2, min_col=1, max_row=300, max_col=1):
    for cell in column:
        temp = cell.value
        fileNameArray.append(temp)
        # Names starting with XYZ get the alternate base URL, as the question specifies.
        if temp.startswith('XYZ'):
            linkArray.append(f'https://www.differentLinkIfNameStartsWithXYZ/{temp}')
        else:
            linkArray.append(f'https://www.linkName.com/{temp}')
Or, if you are populating linkArray later, in a separate pass:
for temp in fileNameArray:
    if temp.startswith('XYZ'):
        linkArray.append(f'https://www.differentLinkIfNameStartsWithXYZ/{temp}')
    else:
        linkArray.append(f'https://www.linkName.com/{temp}')
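The mapping the question describes can also be written as a small helper with a list comprehension. This is only a sketch: build_links and the sample names are hypothetical, and it assumes every entry is a non-empty string (an empty cell would come back as None and would need to be handled first).

```python
DEFAULT_BASE = "https://www.linkName.com/"
XYZ_BASE = "https://www.differentLinkIfNameStartsWithXYZ/"


def build_links(file_names):
    """Return one link per file name, choosing the base URL by prefix."""
    return [
        (XYZ_BASE if name.startswith("XYZ") else DEFAULT_BASE) + name
        for name in file_names
    ]


# Hypothetical sample values standing in for fileNameArray.
sample_names = ["report1", "XYZ_report2"]
sample_links = build_links(sample_names)
```

Because the result has exactly one link per name, sample_links[i] always corresponds to sample_names[i].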
I am new to programming and am currently learning Python and openpyxl for Excel file manipulation.
I am writing a script to help update a repairs database by picking specific records from an Excel sheet.
The script already finds the row numbers of the Excel data I need to update, but the challenge now is how to build a list of lists, one inner list per row (record), through iteration.
For example, I have found that I need data from rows 22, 23, 34 and 35. Is there a way of getting the data from these rows without having to change the min_row and max_row numbers for every row?
The portion of the code that I need rewritten is:
# copy the record on the rows of the row numbers found above.
rows_records = []
row_record = []
for row_cells in ws2.iter_rows(min_row=23, max_row=23, min_col=1, max_col = 6):
    for cell in row_cells:
        row_record.append(cell.value)
    rows_records.append(row_record)
print(row_record)
print(rows_records)
Technically the same as accepted answer
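from typing import Iterable
from openpyxl.worksheet.worksheet import Worksheet

sheet: Worksheet = book.active  # book is an already loaded workbook
indices: set[int] = {10, 20}  # 1-based row numbers to keep
rows: Iterable[tuple] = (row for i, row in enumerate(sheet.rows, 1) if i in indices)
for row in rows:
    ...  # do whatever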
Since openpyxl returns a generator, you have to convert it to a list before you can access a particular index.
import openpyxl

workbook = openpyxl.load_workbook('numbers.xlsx')
worksheet = workbook.active
rows = list(worksheet.iter_rows(max_col=6, values_only=True))
values = []
# List indices are 0-based, so rows[21] holds worksheet row 22.
row_indices = [21, 22, 23, 34, 35]
for row_index in row_indices:
    values.append(rows[row_index])
import openpyxl

book = openpyxl.load_workbook('numbers.xlsx')
sheet = book.active
rows = list(sheet.rows)  # sheet.rows is a generator, so materialize it before indexing
values = []
your_row_indices = [21, 22, 23]
for index in your_row_indices:  # 0-based list indices
    for cell in rows[index]:
        values.append(cell.value)
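Another way to get the list-of-lists the question asks for, sketched under the assumption that the row numbers are the 1-based numbers shown in Excel: request each wanted row on its own through iter_rows, so nothing has to be edited by hand per row. The helper name rows_by_number and the in-memory test sheet are hypothetical.

```python
from openpyxl import Workbook


def rows_by_number(ws, row_numbers, max_col):
    """Return one list of cell values per requested 1-based row number."""
    records = []
    for number in row_numbers:
        # Restricting min_row and max_row to the same number yields exactly one row.
        row = next(ws.iter_rows(min_row=number, max_row=number,
                                min_col=1, max_col=max_col, values_only=True))
        records.append(list(row))
    return records


# Hypothetical sheet with 40 rows standing in for the repairs data.
wb = Workbook()
ws = wb.active
for n in range(1, 41):
    ws.append([n, n * 10, f"item{n}"])

records = rows_by_number(ws, [22, 23, 34, 35], max_col=3)
```

A fresh list is created for every row, which avoids the original problem of all records piling up in a single shared row_record.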
I have 3 Excel files with a column of data in cells A1 to A10 (the "Source Cells") in each workbook (on sheet 1 in each workbook). I would like to copy the data from the Source Cells into a new workbook, but the data must shift into a new column each time.
For example:
the Source Cells in File 1 must be copied to cells A1 to A10 in the new workbook;
the Source Cells in File 2 must be copied to cells B1 to B10 in the new workbook; and
the Source Cells in File 3 must be copied to cells C1 to C10 in the new workbook.
I'm struggling to figure the best way to adjust "j" in my code on each iteration. I'm also not sure what the cleanest way is to run each function for the different source files.
All suggestions on how to make this code cleaner will also be appreciated because I admit it's so messy at the moment!
Thanks in advance!
import openpyxl as xl

filename_1 = "C:\\workspace\\scripts\\file1.xlsx"
filename_2 = "C:\\workspace\\scripts\\file2.xlsx"
filename_3 = "C:\\workspace\\scripts\\file3.xlsx"
destination_filename = "C:\\workspace\\scripts\\new_file.xlsx"
num_rows = 10
num_columns = 1

def open_source_workbook(path):
    '''Open the workbook and worksheet in the source Excel file'''
    workbook = xl.load_workbook(path)
    worksheet = workbook.worksheets[0]
    return worksheet

def open_destination_workbook(path):
    '''Open the destination workbook I want to copy the data to.'''
    new_workbook = xl.load_workbook(path)
    return new_workbook

def open_destination_worksheet(path):
    '''Open the worksheet of the destination workbook I want to copy the data to.'''
    new_worksheet = new_workbook.active
    return new_worksheet

def copy_to_new_file(worksheet, new_worksheet):
    for i in range (1, num_rows + 1):
        for j in range (1, num_columns + 1):
            c = worksheet.cell(row = i, column = j)
            new_worksheet.cell(row = i, column = j).value = c.value

worksheet = open_source_workbook(filename_1)
new_workbook = open_destination_workbook(destination_filename)
new_worksheet = open_destination_worksheet(new_workbook)
copy_to_new_file(worksheet, new_worksheet)
new_workbook.save(str(destination_filename))
Question: Loop files, copy a specific column, with each new “paste” shifting to the adjacent column?
This approach first collects the column cell values from all files.
It then rearranges them so that they can be passed to the worksheet's append(...) method.
Therefore, no knowledge of the target column is needed.
Reference:
class collections.OrderedDict([items])
Ordered dictionaries are just like regular dictionaries but have some extra capabilities relating to ordering operations.
openpyxl.utils.cell.coordinate_to_tuple(coordinate)
Convert an Excel style coordinate to (row, column) tuple
iter_rows(min_row=None, max_row=None, min_col=None, max_col=None, values_only=False)
Produces cells from the worksheet, by row. Specify the iteration range using indices of rows and columns.
map(function, iterable, ...)
Return an iterator that applies function to every item of iterable, yielding the results.
zip(*iterables)
Make an iterator that aggregates elements from each of the iterables.
Used imports
import openpyxl as opxl
from collections import OrderedDict
Define the files in an OrderedDict to retain the file <=> column order
file = OrderedDict.fromkeys(('file1', 'file2', 'file3'))
Define the Range as index values.
Convert the Excel A1 notation into index values
# range_to_tuple returns (sheet_title, (min_col, min_row, max_col, max_row))
min_col, min_row, max_col, max_row = opxl.utils.cell.range_to_tuple('DUMMY!A1:A10')[1]
Loop the defined files,
load every Workbook and get a reference to the default Worksheet
get the cell values from the defined range:
min_col=1, max_col=1, min_row=1, max_row=10
for fname in file.keys():
    wb = opxl.load_workbook(fname)
    ws = wb.active
    file[fname] = ws.iter_rows(min_row=min_row,
                               max_row=max_row,
                               min_col=min_col,
                               max_col=max_col,
                               values_only=True)
Define a new Workbook and get a reference to the default Worksheet
wb2 = opxl.Workbook()
ws2 = wb2.active
Zip the values, row by row, from all files.
Map the zipped list of tuples using a lambda to flatten each one into a list of row values.
Append the list of values to the new Worksheet
for row_value in map(lambda r: tuple(v for c in r for v in c),
                     zip(*(file[k] for k in file))
                     ):
    ws2.append(row_value)
Save the new Workbook
# wb2.save(...)
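For comparison, the column shift the question asks about ("j") can also be handled directly with enumerate over the source sheets, so each file writes to the next column. This is a minimal sketch, not the approach used above: copy_columns_side_by_side is a hypothetical name, and in-memory workbooks stand in for file1 to file3 so the example runs without files on disk.

```python
from openpyxl import Workbook


def copy_columns_side_by_side(source_sheets, dest_sheet, num_rows=10):
    """Copy A1:A<num_rows> of each source sheet into successive destination columns."""
    # enumerate supplies the destination column: 1 for the first file, 2 for the second, ...
    for dest_col, source in enumerate(source_sheets, start=1):
        for row in range(1, num_rows + 1):
            value = source.cell(row=row, column=1).value
            dest_sheet.cell(row=row, column=dest_col).value = value


# Hypothetical in-memory sources standing in for the three Excel files.
sources = []
for file_index in range(1, 4):
    wb = Workbook()
    ws = wb.active
    for row in range(1, 11):
        ws.cell(row=row, column=1, value=f"file{file_index}_row{row}")
    sources.append(ws)

new_workbook = Workbook()
new_sheet = new_workbook.active
copy_columns_side_by_side(sources, new_sheet)
```

With real files, each source sheet would come from load_workbook(path).worksheets[0], and new_workbook.save(...) would write the result.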
I have a problem with multi-index column names. I'm using XLRD to convert Excel data to JSON using json.dumps, but it only gives me one row of column names. I have read about multilevel JSON but I have no idea how to do it using XLRD.
Here is my sample of table column name
Sample of code:
for i in path:
    with xlrd.open_workbook(i) as wb:
        print([i])
        kwd = 'sage'
        print(wb.sheet_names())
        for j in range(wb.nsheets):
            worksheet = wb.sheet_by_index(j)
            data = []
            n = 0
            nn = 0
            keyword = 'sage'
            keyword2 = 'adm'
            try:
                skip = skip_row(worksheet, n, keyword)
                keys = [v.value for v in worksheet.row(skip)]
            except:
                try:
                    skip = skip_row2(worksheet, nn, keyword2)
                    keys = [v.value for v in worksheet.row(skip)]
                except:
                    continue
            print(keys)
            for row_number in range(check_skip(skip), worksheet.nrows):
                if row_number == 0:
                    continue
                row_data = {}
                for col_number, cell in enumerate(worksheet.row(row_number)):
                    row_data[keys[col_number]] = cell.value
                data.append(row_data)
            print(json.dumps({'Data': data}))
By the way, each worksheet has a different number of rows to skip before the column names, which is why my code has the skip-row functions. Once I have skipped those rows and found the exact location of the column names, I start reading the values. As far as I can tell, that is where the problem arises, because there are two rows of column names. I am still confused about how to build multilevel JSON with XLRD, or at least how to join the column names with XLRD (which I guess isn't possible).
Desired outcome multilevel json:
{ "Data":[{ "ID" : "997", "Tax" : [{"Date" : "9/7/2019", "Total" : 2300, "Grand Total" : 340000}], "Tax ID" : "ST-000", .... }]}
P.S. I've tried to use pandas, but it gives me a lot of trouble since I work with big data.
You can use multi-indexing in pandas. First you need to get the header row indexes for each sheet.
header_indexes = get_header_indexes(excel_filepath, sheet_index)  # returns list of header indexes
You need to write the get_header_indexes function yourself; it scans the sheet and returns the header row indexes.
You can then use pandas to get JSON from the DataFrame.
import pandas as pd
df = pd.read_excel(excel_filepath, header=header_indexes, sheet_name=sheet_index)
data = df.to_dict(orient="records")
With multiple header rows, data contains a list of dicts, and each dict has tuples as keys; you can reformat it into the final JSON as required.
Note: pd.read_excel does not support chunksize; for large files, limit what is loaded with nrows, skiprows, or usecols.
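The reformatting step for tuple keys could look roughly like the sketch below. It is pure Python and makes several assumptions: nest_record is a hypothetical helper, every key is a (top, sub) pair from two header rows, and columns without a second-level header carry an empty or "Unnamed..." sub-label, which is how pandas typically names blank header cells. The sample record mirrors the desired outcome from the question.

```python
import json


def nest_record(record):
    """Turn {(top, sub): value} into nested JSON-friendly data."""
    nested = {}
    for (top, sub), value in record.items():
        if not sub or str(sub).startswith("Unnamed"):
            # No second-level header: keep the column flat.
            nested[top] = value
        else:
            # Group sub-columns under their top header, as a one-element list of dicts.
            group = nested.setdefault(top, [{}])
            group[0][sub] = value
    return nested


# Hypothetical record shaped like one entry of df.to_dict(orient="records").
record = {
    ("ID", "Unnamed: 0_level_1"): "997",
    ("Tax", "Date"): "9/7/2019",
    ("Tax", "Total"): 2300,
    ("Tax", "Grand Total"): 340000,
    ("Tax ID", "Unnamed: 4_level_1"): "ST-000",
}
nested = nest_record(record)
payload = json.dumps({"Data": [nested]})
```

The same function works on tuples built by hand from two xlrd header rows, so it does not depend on pandas.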
I am using openpyxl to try to delete rows from a spreadsheet. I understand that there is a function specifically for deleting rows; however, I was trying to solve this problem without knowing about that function, and I am now wondering why my method does not work.
To simplify the problem, I set up a spreadsheet and filled some of the cells with letters. In this case, the first print(sheet.max_row) printed "9". After setting all the cell values to None, I expected the number of rows to be 0; however, the second print statement printed "9" again.
Is it possible to reduce the row count by setting all the cells in a row to None?
import openpyxl
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter, column_index_from_string

spreadsheet = load_workbook(filename = pathToSpreadsheet) #pathToSpreadsheet represents the absolute path I had to the spreadsheet that I created.
sheet = spreadsheet.active
print(sheet.max_row) # Printed "9".
rowCount = sheet.max_row
columnCount = sheet.max_column
finalBoundary = get_column_letter(columnCount) + str(rowCount)
allCellObjects = sheet["A1":finalBoundary]
for rowOfCells in allCellObjects:
    for cell in rowOfCells:
        cell.value = None
print(sheet.max_row) # Also printed "9".
Thank you for your time and effort!
Short answer: no.
However, you could access the cells on the sheet by their coordinates and delete them.
for rowOfCells in allCellObjects:
    for cell in rowOfCells:
        del sheet[cell.coordinate]
print(sheet.max_row)
A slightly more elaborate answer is that a worksheet in openpyxl stores its _cells as a dict with coordinates as keys. The max_row property is defined as:
@property
def max_row(self):
    """The maximum row index containing data (1-based)
    :type: int
    """
    max_row = 1
    if self._cells:
        rows = set(c[0] for c in self._cells)
        max_row = max(rows)
    return max_row
So if the cell values were None, the keys/coordinates would still be present, e.g. _cells = {(1,1):None, (1,2):None, (5,4): None}.
max_row would then still give us the largest row component among the keys.
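The behaviour described above can be reproduced on an in-memory workbook. This is a small sketch under the assumption of a recent openpyxl version; the variable names and the 9 x 3 sample grid are illustrative only.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for _ in range(9):
    ws.append(["a", "b", "c"])
rows_before = ws.max_row

# Clearing the values keeps the cell objects (and their coordinates) in the sheet.
for row in ws.iter_rows():
    for cell in row:
        cell.value = None
rows_after_clearing = ws.max_row

# Deleting the cells removes their coordinates, so max_row falls back to 1.
# list(...) materializes the rows first so iteration is not affected by the deletes.
for row in list(ws.iter_rows()):
    for cell in row:
        del ws[cell.coordinate]
rows_after_deleting = ws.max_row
```

Note that max_row never reports 0: with no cells stored it returns its initial value of 1.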