I'm trying to write Python code that repeatedly copies a certain range within an Excel file.
The content is copied a few columns to the right, onto the same sheet that already holds the original content.
Although I'm a beginner, I initially believed this would be really easy to implement. I have implemented most of my plan with the code below, except for the merged cells.
import openpyxl
from openpyxl.utils import range_boundaries

wb = openpyxl.load_workbook('DSD.xlsx')
ws = wb.worksheets[3]

min_col, min_row, max_col, max_row = range_boundaries('I1')
for row, row_cells in enumerate(wb['2'], min_row):
    for column, cell in enumerate(row_cells, min_col):
        # Copy the value and style from the source cell to the target worksheet cell
        ws.cell(row=row, column=column).value = cell.value
        ws.cell(row=row, column=column)._style = cell._style
wb.save('DSD.xlsx')
However, I've been searching and thinking about it for a few days, and I still don't know how to reproduce the merged cells in the copy. I don't even know whether this is technically possible.
My idea today was to pull the list of merged cells from the sheet, shift the coordinates of each merged range by the number of columns being added, and then issue merge commands for the shifted ranges.
However, having pulled out the list of merged cells as shown below, I can't work out what to do next.
Is there a way?
import openpyxl
from openpyxl import Workbook
wb=openpyxl.load_workbook('merged.xlsx')
ws=wb.active
Merged_Cells = ws.merged_cell_ranges
a = Merged_Cells[0]
print(ws.merged_cell_ranges)
print(a)
[<MergedCellRange B2:C3>, <MergedCellRange B9:C9>, <MergedCellRange B13:B14>]
B2:C3
The value B2:C3 held in a appears to be a special object rather than a plain string: a.replace(~~) does not work even though the value looks like "B2:C3".
Is there really no way?
You'll need to look at how CellRanges work, because you will need to calculate the offset for each range and then create a new MergedCellRange for the new cells using the dimensions of the original.
The following should give you an idea of how to do this.
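from openpyxl.worksheet.cell_range import CellRange

area = CellRange("A1:F13")  # area being copied
# iterate over a snapshot, because merge_cells() adds to ws.merged_cells
for mcr in list(ws.merged_cells.ranges):
    if mcr.coord not in area:
        continue  # skip merged ranges outside the copied area
    cr = CellRange(mcr.coord)
    cr.shift(col_shift=10)  # move the range 10 columns to the right
    ws.merge_cells(cr.coord)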
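To make the offset calculation concrete, here is a minimal library-free sketch of what shifting a merged range by a number of columns amounts to. The helper names (col_to_index, index_to_col, shift_range) are hypothetical and only for illustration; with openpyxl, CellRange.shift does this work for you.

```python
import re

def col_to_index(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def index_to_col(index):
    """Convert a 1-based column index back to a label such as 'AB'."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def shift_range(coord, col_shift=0, row_shift=0):
    """Return the A1-style cell or range moved by the given columns/rows."""
    cells = re.findall(r"([A-Za-z]+)(\d+)", coord)
    moved = [
        f"{index_to_col(col_to_index(col) + col_shift)}{int(row) + row_shift}"
        for col, row in cells
    ]
    return ":".join(moved)

# The merged ranges printed in the question, moved 10 columns to the right
merged = ["B2:C3", "B9:C9", "B13:B14"]
shifted = [shift_range(c, col_shift=10) for c in merged]
print(shifted)  # ['L2:M3', 'L9:M9', 'L13:L14']
```

Each shifted string is what you would then pass to ws.merge_cells. The shape of every range is preserved because both corners move by the same offset.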
I am looking for a way to find the first empty column in a given row. In my use case, I want to locate H3 (to add the current date) and then H4 and H5 (to add my daily metrics) [screenshot attached]. I have tried the following with xlwings.
import xlwings as xw
from xlwings import Range, constants
wb = xw.Book(r"path to xlsx")
sht1 = wb.sheets['Sheet1']
sht1.range('G3').value = current_date
sht1.range('G4').value = 5678
sht1.range('G5').value = 1234
wb.save(r"path to xlsx")
The issue is that I have hardcoded the column and row references in the script. I want the script to find H3, H4 and H5 dynamically through xlwings and update the metrics programmatically. Can someone guide me on this?
You can do this by finding the last used column of the data. Here are two options:
Option 1: SpecialCells(11), a VBA method accessed through .api; the constant 11 is xlCellTypeLastCell.
Option 2: .end("right"), the equivalent of Ctrl+Right in Excel.
Option 1 would work well if there is no other data in the spreadsheet, so the last cell in the sheet would be the correct column. This is convenient and doesn't require knowledge of the starting cell (in this case B3).
Option 2 would be preferred for spreadsheets where other data may be on the sheet, so the last cell will not necessarily be in the last column of your desired data. This option does, however, require that the header row has no gaps: .end("right") stops at the last filled cell before the first empty one, which would then not be the true last column of the data.
An alternative could be to import all the data to Python as a pd.DataFrame, then append an additional column and return. If you need to append many columns of data, this would probably be more efficient (especially if you already have a DataFrame of the data you are pasting to Excel).
The last_col is an integer, as this is most easily manipulated (for example, increased by 1). The range calls have therefore been modified to use it: instead of A1 style (e.g. range("A1")), a tuple of the form (row_num, col_num) is used (e.g. range((row_num, col_num))).
import xlwings as xw
import datetime as dt
current_date = dt.date.today().strftime("%d-%b-%y")
wb = xw.Book(r"path to xlsx")
sht1 = wb.sheets['Sheet1']
# option 1: last column in the sheet through SpecialCells
last_col = sht1.range("A1").api.SpecialCells(11).Column
# option 2: starting at cell B3, the first of the date headers, move to the right (like Ctrl+Right in Excel)
# (this overrides option 1 when both lines are kept; keep only the one you need)
last_col = sht1.range("B3").end("right").column
# paste new values
sht1.range((3, last_col+1)).value = current_date
sht1.range((4, last_col+1)).value = 5678
sht1.range((5, last_col+1)).value = 1234
wb.save(r"path to xlsx")
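The difference between the two options can be illustrated without Excel. The sketch below models a sheet as a list of rows, with None for empty cells; last_used_column and end_right are hypothetical stand-ins for SpecialCells(11) and .end("right"), not xlwings functions. It assumes the start cell and its right-hand neighbour are filled, which is the simple case of Ctrl+Right.

```python
def last_used_column(grid):
    """Option 1 analogue: rightmost column (1-based) holding any value in the sheet."""
    last = 0
    for row in grid:
        for col, value in enumerate(row, start=1):
            if value is not None:
                last = max(last, col)
    return last

def end_right(row, start_col):
    """Option 2 analogue: walk right from start_col (1-based) while the next cell is filled."""
    col = start_col
    # row[col] is the cell one column to the right of the 1-based column `col`
    while col < len(row) and row[col] is not None:
        col += 1
    return col

# Columns A..G: date headers in B..D, plus an unrelated value in G
header = [None, "01-Jan", "02-Jan", "03-Jan", None, None, "note"]
print(last_used_column([header]))  # 7: option 1 is misled by the stray value in G
print(end_right(header, 2))        # 4: option 2 stops at D, the last date header

# A gap in the headers makes option 2 stop early
header_gap = [None, "01-Jan", "02-Jan", None, "04-Jan"]
print(end_right(header_gap, 2))    # 3, although the data really ends in column 5
```

In both cases the new value would be written to the returned column plus one, so the choice of option decides where the new date lands.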
Set-up
I create a Pandas dataframe from all records in a google sheet like this,
df = pd.DataFrame(wsheet.get_all_records())
as explained in the Gspread docs.
Issue
Since today, Python seems to hang when I execute this command. I don't get any error; after a while I interrupt Python with KeyboardInterrupt.
I suspect the sheet is too large for Google to return in one go: ±3500 rows with 18 columns.
Question
Now, I actually don't really need the entire sheet. The first 300 rows would do just fine.
The docs show values_list = worksheet.row_values(1), which would return the first row values in a list.
I guess I could create a loop, but I was wondering if there's a built-in / better solution?
I believe your goal is as follows.
You want to retrieve the values from the 1st row to the 300th row of a sheet in a Google Spreadsheet.
From your mention of ±3500 rows with 18 columns, I take it you want to retrieve the values from columns "A" to "R".
You want to convert the retrieved values to a DataFrame.
You want to achieve this using gspread.
To achieve this, I would like to propose the following sample script.
In this answer, I used the values_get method.
Sample script:
import gspread
import pandas as pd

spreadsheetId = "###"  # Please set the Spreadsheet ID.
rangeA1notation = "Sheet1!A1:R300"  # Please set the range using A1Notation.
# credentials is assumed to have been created beforehand
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
values = spreadsheet.values_get(rangeA1notation)
v = values['values']
df = pd.DataFrame(v)
print(df)
Note:
Please set the range in A1Notation. When "A1:R300" is used instead of "Sheet1!A1:R300", the values are retrieved from the 1st tab in the Spreadsheet.
When "A1:300" is used, the values are retrieved from column "A" to the last column of the sheet.
When the 1st row is the header row and the data starts from the 2nd row, please modify the script as follows.
From
df = pd.DataFrame(v)
To
df = pd.DataFrame(v[1:], columns=v[0])
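One thing to watch with the header variant: the Sheets API omits trailing empty cells, so rows in v can be shorter than the header row. The sketch below is a library-free illustration of padding such rows before building records; rows_to_records is a hypothetical helper, and the sample values are made up for the example.

```python
def rows_to_records(values, fill=""):
    """Turn [header, row, row, ...] into a list of dicts, padding short rows with `fill`."""
    header, *rows = values
    records = []
    for row in rows:
        # Pad rows whose trailing empty cells were dropped by the API
        padded = list(row) + [fill] * (len(header) - len(row))
        # zip() stops at the header length, so extra cells beyond it are ignored
        records.append(dict(zip(header, padded)))
    return records

# Made-up sample in the shape of values['values']
v = [
    ["name", "qty", "comment"],
    ["apple", "3", "fresh"],
    ["pear", "5"],  # trailing empty "comment" cell omitted
]
records = rows_to_records(v)
print(records[1])  # {'name': 'pear', 'qty': '5', 'comment': ''}
```

A list of dicts like this can be passed straight to pd.DataFrame(records), which gives the same columns as the header row.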
Reference:
values_get
I used the openpyxl package.
import openpyxl as xl
wb = xl.load_workbook('your_file_name')
sheet = wb['name_of_your_sheet']
Specify your range (rows 1 to 300 here; the end of range() is exclusive).
for row in range(1, 301):
Now you can perform many operations, e.g. this points at row 1, column 3 in the first iteration
    cell = sheet.cell(row, 3)
If you want to change the cell value
    cell.value = 'something'
The package covers pretty much everything you need.
Here is a link to the docs: https://openpyxl.readthedocs.io/en/stable/
I am new to Python, especially when it comes to using it with Excel. I need to write code to search for the string “Mac”, “Asus”, “AlienWare”, “Sony”, or “Gigabit” within a longer string in each cell of column A. Depending on which of these strings it finds in column A’s cell, it should write that string to the corresponding row in column C. For example, if cell A2 contained the string “ProLiant Asus DL980 G7”, the correct code would write “Asus” to cell C2. It should do this for every single cell in column A, writing the appropriate string to the corresponding cell in column C. Every cell in column A should have one of the five strings Mac, Asus, AlienWare, Sony, or Gigabit within it; if a cell doesn’t contain any of them, I want the corresponding cell in column C to have the string “Other” written to it. So far, this is the code that I have (not much at all):
import openpyxl
wb = openpyxl.load_workbook(path)
sheet = wb.active
for i in range(1, sheet.max_row + 1):  # openpyxl rows are 1-based
    cell1 = sheet.cell(row=i, column=1)
    cell2 = sheet.cell(row=i, column=3)
    # missing code here
wb.save(path)
You haven't tried writing any code to solve the problem. You might want to first get openpyxl to write to the Excel workbook and verify that it is working, even if it's with dummy data.
Once that is working all you'd need is a simple function that takes in a string as an argument.
def get_column_c_value(string_from_column_a):
    if "Lenovo" in string_from_column_a:
        return "Lenovo"
    elif "HP" in string_from_column_a:
        return "HP"
    # strings you need to check for here in the same format as above
    else:
        return "other"
Try out those and if you have any issues let me know where you're getting stuck.
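Filling in the five strings from the question, the same idea can be written as a loop over a tuple instead of a chain of branches. This is a minimal sketch: the BRANDS constant and the default argument are assumptions for illustration, and the match is a plain case-sensitive substring test, so "Mac" would also match a word like "Machine".

```python
BRANDS = ("Mac", "Asus", "AlienWare", "Sony", "Gigabit")

def get_column_c_value(text, brands=BRANDS, default="Other"):
    """Return the first brand found in `text`, or `default` when none matches."""
    # Empty cells come back as None, and numeric cells are not strings
    text = "" if text is None else str(text)
    for brand in brands:
        if brand in text:
            return brand
    return default

print(get_column_c_value("ProLiant Asus DL980 G7"))  # Asus
print(get_column_c_value("Generic server"))          # Other
```

Inside the row loop from the question, the result could then be assigned with cell2.value = get_column_c_value(cell1.value).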
I have not worked much with openpyxl, but it sounds like you are trying to do a simple string search.
You can access individual cells by using
cell1.internal_value
Then, your if/else statement would look something like
if "HP" in str(cell1.internal_value):
Data can be assigned directly to a cell so you could have
ws['C' + str(i)] = "HP"
You could do this for all of the data in your cells
Let's say I have a cell (9,3). I want to get the values from (9,3) to (9,99). How do I step through the columns to get the values? I am trying to write the values into another Excel file, starting at (13,3) and ending at (13,99). How do I write a loop for that in xlrd?
def write_into_cols_rows(r, c):
    for num in range(0, 96):
        c += 1
    return (r, c)
worksheet.row(int) returns the row as a list of cells; to get the value of a particular column, use row[int].value.
For more information, you can read this pdf file (Page 9 Introspecting a sheet).
import xlrd
workbook = xlrd.open_workbook(filename)
# This will get you the very first sheet in the workbook.
worksheet = workbook.sheet_by_name(workbook.sheet_names()[0])
for index in range(worksheet.nrows):
    row = worksheet.row(index)
    row_value = [col.value for col in row]
    # row_value is now a list containing all the column values of this row
    print(row_value[3:99])  # columns 3 to 98; the slice end is exclusive
To write data to an Excel file, you might want to check out the xlwt package.
BTW, seems like you are doing something like reading from excel.. do some work... write to excel...
I would also recommend you take a look at numpy, scipy or R. When I usually do data munging, I use R and it saves me so much time.
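For the specific case in the question (read columns 3 to 99 of row 9, write them to columns 3 to 99 of row 13), the index bookkeeping can be sketched without any Excel library. The helper names below are hypothetical, indices are assumed to be 0-based as in xlrd, and source_row is a stand-in for the list of values read from worksheet.row(9).

```python
def copy_row_segment(src_row, first_col, last_col):
    """Return the values in columns first_col..last_col (both inclusive, 0-based)."""
    return src_row[first_col:last_col + 1]  # +1 because the slice end is exclusive

def target_cells(dest_row, first_col, values):
    """Pair each value with the (row, col) position it should be written to."""
    return [((dest_row, first_col + offset), value)
            for offset, value in enumerate(values)]

source_row = list(range(120))  # stand-in for [c.value for c in worksheet.row(9)]
values = copy_row_segment(source_row, 3, 99)
writes = target_cells(13, 3, values)

print(len(values))  # 97 cells: columns 3 through 99 inclusive
print(writes[0])    # ((13, 3), 3)
print(writes[-1])   # ((13, 99), 99)
```

Each ((row, col), value) pair could then be handed to whatever writer you use, for example a write(row, col, value) call on an xlwt sheet.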
I started learning Python and am trying to work on a small project for myself. It just opens an Excel spreadsheet, looks in one column, and then randomly chooses one of the cells to print. I did some research and found multiple ways to do it, but I liked this one because it is short and sweet. The problem is that when it prints, I want it to pick a random cell from one column, so I wanted to know if there is a way to do that. Thanks, all help will be appreciated!
import xlrd
wb = xlrd.open_workbook("quotes.xlsx")
sh1 = wb.sheet_by_index(0)
print(sh1.cell(0, 0).value)
Use the following:
from random import choice
import xlrd
wb = xlrd.open_workbook("quotes.xlsx")
sh1 = wb.sheet_by_index(0)
column = 2 # or whatever column you want to select from
print(choice(sh1.col(column)).value)
The Sheet.col() method returns a list, and random.choice returns a random element from a list.
If you want to restrict the rows from which you randomly select an element you can generate a random row number and use that to index the column instead. You can do that like this:
import random
startRow = 3
endRow = 29
row = random.randint(startRow, endRow)  # both endpoints are included
print(sh1.cell(row, column).value)  # xlrd's cell() takes the row index first
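The row-restricted selection can be tried out without a spreadsheet. In this sketch pick_from_rows is a hypothetical helper, quotes is a stand-in for the values of one column, and a seeded random.Random instance is passed in only to make the run repeatable.

```python
import random

def pick_from_rows(column_values, start_row, end_row, rng=random):
    """Pick one value at random from rows start_row..end_row (inclusive, 0-based)."""
    row = rng.randint(start_row, end_row)  # randint includes both endpoints
    return row, column_values[row]

quotes = [f"quote {i}" for i in range(40)]  # stand-in for [c.value for c in sh1.col(column)]
row, quote = pick_from_rows(quotes, 3, 29, rng=random.Random(0))
print(row, quote)
```

Leaving out the rng argument uses the module-level generator, so each run picks a different row.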
See also: How to randomly select an item from a list?