Manipulating Excel spreadsheets with Python

I am new to Python, especially when it comes to using it with Excel. I need to write code that searches for the string “Mac”, “Asus”, “AlienWare”, “Sony”, or “Gigabit” within the longer string in each cell of column A. Depending on which of these strings it finds, it should write that string to the cell in the same row of column C. If it finds none of the five, it should write “Other” to column C instead. For example, if cell A2 contained the string “ProLiant Asus DL980 G7”, the correct code would write “Asus” to cell C2. It should do this for every cell in column A. So far, this is the code that I have (not much at all):
import openpyxl
wb = openpyxl.load_workbook(path)
sheet = wb.active
# openpyxl rows are 1-indexed, so start at 1 and include max_row
for i in range(1, sheet.max_row + 1):
    cell1 = sheet.cell(row=i, column=1)
    cell2 = sheet.cell(row=i, column=3)
    # missing code here
wb.save(path)

You haven't tried writing any code to solve the problem. You might want to first get openpyxl to write to the Excel workbook and verify that this works, even if it's only dummy data. This page looks helpful - here
Once that is working, all you need is a simple function that takes a string as an argument.
def get_column_c_value(string_from_column_a):
    if "Lenovo" in string_from_column_a:
        return "Lenovo"
    elif "HP" in string_from_column_a:
        return "HP"
    # strings you need to check for here in the same format as above
    else:
        return "Other"
Try out those and if you have any issues let me know where you're getting stuck.

I have not worked much with openpyxl, but it sounds like you are trying to do a simple string search.
You can read the content of an individual cell by using
cell1.internal_value
Then, your if/else statement would look something like
if "HP" in str(cell1.internal_value):
Data can be assigned directly to a cell, so you could have
ws['C' + str(i)] = "HP"
You could do this for all of the data in your cells.
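Putting the two answers together, a minimal sketch of the whole task might look like the following. The names `BRANDS`, `classify` and `label_column_c` are illustrative and not part of openpyxl. The sketch also assumes that the data starts in row 1 with no header row, that matching is case-sensitive, and that the first brand in the list wins if a cell contains more than one.

```python
BRANDS = ("Mac", "Asus", "AlienWare", "Sony", "Gigabit")  # strings from the question


def classify(text):
    """Return the first brand found in text, or "Other" if none is present."""
    text = "" if text is None else str(text)  # empty cells come back as None
    for brand in BRANDS:
        if brand in text:
            return brand
    return "Other"


def label_column_c(path):
    """Write the brand found in column A of each row to column C (hypothetical helper)."""
    import openpyxl  # third-party; imported here so classify() works without it

    wb = openpyxl.load_workbook(path)
    sheet = wb.active
    for i in range(1, sheet.max_row + 1):  # openpyxl rows are 1-indexed
        found = classify(sheet.cell(row=i, column=1).value)
        sheet.cell(row=i, column=3).value = found
    wb.save(path)
```

Keeping the string search in its own function means it can be checked without a workbook, for example `classify("ProLiant Asus DL980 G7")` should give `"Asus"`.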

Related

How to use python to fill specific data to column in excel based on information of the first column?

I have a problem with an Excel file, and I want to automate it with a Python script that fills in a column based on the information in the first column. For example:
if data == 'G711Alaw 64k' or 'G711Ulaw 64k'
print('1-Jan') until it finds '2-Jan', then print('2-Jan'), and so on.
[image: before automation]
I need it to look like this after automation:
[image: after automation]
Can anyone help me solve this issue?
The file:
[link: the Excel file]
Thanks a lot for your help.
Try this. pandas reads your 1-Jan values as a datetime type; if you need them as strings, you can convert them directly in the code. The following code assigns the most recently read date to the second column:
import pandas as pd
df = pd.read_excel("add_date_column.xlsx", engine="openpyxl")
sig = []  # holds the most recently seen date
def t(x):
    global sig
    # anything that is not a string is treated as a date
    if not isinstance(x.values[0], str):
        tmp_sig = x.values[0]
        if tmp_sig not in sig:
            sig = [tmp_sig]
    # assumes the first row holds a date, so sig is never empty here
    x.iloc[1] = sig[-1]
    return x
new_df = df.apply(t, axis=1)
new_df.to_excel("new.xlsx", index=False)
The concept is very simple:
If the value is a date/time, copy it to [same row, next column].
If not, [same row, next column] is copied from [previous row, next column].
You do not specifically need Python for this task. The Excel formula for this would be:
=IF(ISNUMBER(A:A),A:A,B1)
Instead of checking whether a value is a date/time, I took advantage of the fact that Excel stores dates as numbers while the rest of the entries are alphanumeric text (containing both letters and digits). This formula is applied to the new column.
Of course, you might already be in Python and prefer to stay in the same environment. So, here's the loop:
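from datetime import datetime

for i in range(len(df)):
    # pandas Timestamps subclass datetime, so isinstance matches them
    if isinstance(df.loc[i, "Orig. Codec"], datetime):
        df.loc[i, "Column1"] = df.loc[i, "Orig. Codec"]
    else:
        # assumes the first row is a date, so row i-1 is already filled
        df.loc[i, "Column1"] = df.loc[i - 1, "Column1"]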
There might be a way to express the same idea with a lambda function, but I am not aware of how to apply a lambda and shift at the same time.
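The copy-down idea described in this answer can also be sketched in plain Python, independent of pandas or Excel. The function name `forward_fill_dates` and the sample values are made up for illustration; the only assumption is that dates arrive as `date`/`datetime` objects and everything else as strings.

```python
from datetime import date


def forward_fill_dates(values):
    """Return a list holding, for each entry, the most recent date seen so far."""
    filled = []
    current = None  # stays None until the first date appears
    for value in values:
        # datetime objects (and pandas Timestamps) are subclasses of date
        if isinstance(value, date):
            current = value
        filled.append(current)
    return filled


# Made-up sample column: dates act as group headers for the rows below them
column = [date(2021, 1, 1), "G711Alaw 64k", "G711Ulaw 64k", date(2021, 1, 2), "G729"]
print(forward_fill_dates(column))
```

With a DataFrame, the result could be assigned back in one step, for example `df["Column1"] = forward_fill_dates(df["Orig. Codec"])`, using the column names from the loop above.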

Retrieve first empty column and row using xlwings

I am looking for a way to find the first empty column and row. In my use case, I need to locate H3 (to add the current date) and then H4 and H5 (to add my daily metrics) [screenshot attached]. I have tried the following with xlwings:
import xlwings as xw
from xlwings import Range, constants
wb = xw.Book(r"path to xlsx")
sht1 = wb.sheets['Sheet1']
sht1.range('G3').value = current_date
sht1.range('G4').value = 5678
sht1.range('G5').value = 1234
wb.save(r"path to xlsx")
The issue is that I have hardcoded the column and row references in the script. I want the script to find H3, H4 and H5 dynamically through xlwings and update the metrics programmatically. Can someone guide me on this?
You can do this by finding the last used column of the data. Here are two options for getting it:
Option 1: Using SpecialCells(11), a VBA method accessed through .api (11 is the value of the xlCellTypeLastCell constant); information about this can be found here.
Option 2: Using .end("right"), the equivalent of Ctrl + Right in Excel.
Option 1 works well if there is no other data in the spreadsheet, so that the last cell in the sheet is in the correct column. This is convenient and doesn't require knowledge of the starting cell (in this case B3).
Option 2 is preferable for spreadsheets where other data may be on the sheet, so that the last cell will not necessarily be in the last column of your desired data. This option does, however, require that the header row has no empty cells: .end("right") stops at the last cell before a gap, which would then not be the last column of the data.
An alternative could be to import all the data to Python as a pd.DataFrame, then append an additional column and write it back. If you need to append many columns of data, this would probably be more efficient (especially if you already have a DataFrame of the data you are pasting to Excel).
last_col is an integer, as this is most easily manipulated (such as increasing by 1). The range calls have therefore also been modified to make use of this: instead of A1 style (e.g. range("A1")), a tuple of the format (row_num, col_num) is used (e.g. range((row_num, col_num))).
import xlwings as xw
import datetime as dt
current_date = dt.date.today().strftime("%d-%b-%y")
wb = xw.Book(r"path to xlsx")
sht1 = wb.sheets['Sheet1']
# option 1: last column in the sheet through SpecialCells (11 = xlCellTypeLastCell)
last_col = sht1.range("A1").api.SpecialCells(11).Column
# option 2 (use instead of option 1, not in addition): starting at cell B3, the first of the date headers, move to the right (like Ctrl+Right in Excel)
last_col = sht1.range("B3").end("right").column
# paste new values
sht1.range((3, last_col+1)).value = current_date
sht1.range((4, last_col+1)).value = 5678
sht1.range((5, last_col+1)).value = 1234
wb.save(r"path to xlsx")
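Because `last_col` is an integer, it can be handy to see which A1-style column letter it corresponds to, for example when logging or checking the result by eye. The helper below is a small sketch of that conversion; `column_letter` is a hypothetical name and not an xlwings function.

```python
def column_letter(col_num):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if col_num < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while col_num > 0:
        # shift to 0-based before dividing, since there is no "zero" letter
        col_num, remainder = divmod(col_num - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


last_col = 7  # e.g. data currently ends in column G
print(column_letter(last_col + 1) + "3")  # prints H3
```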

How to return the string of a header based on the max value of a cell in Openpyxl

Good morning guys! quick question for Openpyxl:
I am working with Python editing a xlsx document and generating various stats. Part of my script is to generate max values of a cell range :
temp_list=[]
temp_max=[]
for row in sheet.iter_rows(min_row=3, min_col=10, max_row=508, max_col=13):
    print(row)
    for cell in row:
        temp_list.append(cell.value)
    print(temp_list)
    temp_max.append(max(temp_list))
    temp_list=[]
I would also like to be able to print the string of the header of the column that contains the max value for the cell range desired. My data structure looks like this :
Any idea on how to do so?
Thanks!
This seems like a typical INDEX/MATCH Excel problem.
Have you tried retrieving the index for the max value in each temp_list?
You can use a function like numpy.argmax() to get the index of your max value within your "temp_list" array, then use this index to locate the header and append the string to a new list called, say, "max_headers" which contains all the header strings in order of appearance.
It would look something like this
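for cell in row:
    temp_list.append(cell.value)
i_max = int(np.argmax(temp_list))  # 0-based position within the scanned columns
# min_col stands for the first scanned column (10 in the question), so the offset maps back to a sheet column
max_headers.append(sheet.cell(row=1, column=min_col + i_max).value)
And so on and so forth. np.argmax accepts a plain Python list as well as a numpy array, so temp_list can stay a list. For this to work, numpy has to be imported as np, the max_headers list has to be defined beforehand, and the headers are assumed to be in row 1.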
First, Thanks Bernardo for the hint. I found a decently working solution but still have a little issue. Perhaps someone can be of assistance.
Let me amend my initial statement : here is the code I am working with now :
temp_list=[]
headers_list=[]
for row in sheet.iter_rows(min_row=3, min_col=27, max_row=508, max_col=32): #Index starts at 1 // Here we set the rows/columns containing the data to be analyzed
    for cell in row:
        temp_list.append(cell.value)
    for cell in row:
        if cell.value == max(temp_list):
            print(str(cell.column))
            print(cell.value)
            print(sheet.cell(row=1, column=cell.column).value)
            headers_list.append(sheet.cell(row=1,column=cell.column).value)
        else:
            print('keep going.')
    temp_list = []
This loop works, but it has a small issue: if, for instance, a row contains the same maximum value twice (e.g. 25,9,25,8,9), it will print two headers instead of one. My question is:
How can I get this loop to take into account only the first match of the max value in a row?
You probably want something like this:
headers = [c for c in next(ws.iter_rows(min_col=27, max_col=32, min_row=1, max_row=1, values_only=True))]
for row in ws.iter_rows(min_row=3, min_col=27, max_row=508, max_col=32, values_only=True):
    mx = max(row)
    idx = row.index(mx)
    col = headers[idx]
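The reason this handles ties is that `index()` returns the position of the first match only. A self-contained sketch of that step, using plain tuples in place of worksheet rows, is shown below; the function name `first_max_header` and the header names are made up for illustration.

```python
def first_max_header(headers, row):
    """Return the header above the first occurrence of the row's maximum."""
    idx = row.index(max(row))  # index() stops at the first match
    return headers[idx]


headers = ("Q1", "Q2", "Q3", "Q4", "Q5")  # made-up header names
row = (25, 9, 25, 8, 9)  # the tie example from the question
print(first_max_header(headers, row))  # prints Q1
```

Note that `max()` raises a `TypeError` if a row mixes numbers with empty cells (`None`), so rows with gaps would need to be filtered or given a default first.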

Going down columns using xlrd

Let's say I have a cell (9,3). I want to get the values from (9,3) to (9,99). How do I step through the columns to get the values? I am trying to write the values into another Excel file, in the range that starts at (13,3) and ends at (13,99). How do I write a loop for that in xlrd?
def write_into_cols_rows(r, c):
    for num in range (0,96):
        c += 1
    return (r,c)
worksheet.row(int) returns the row as a list of cells; to get the value in a certain column, use row[int].value.
For more information, you can read this pdf file (Page 9 Introspecting a sheet).
import xlrd
workbook = xlrd.open_workbook(filename)
# This will get you the very first sheet in the workbook.
worksheet = workbook.sheet_by_name(workbook.sheet_names()[0])
for index in range(worksheet.nrows):
    row = worksheet.row(index)
    row_value = [col.value for col in row]
    # row_value is now a list containing all the column values of this row
    print(row_value[3:99])
To write data to an Excel file, you might want to check out the xlwt package.
BTW, it seems like you are reading from Excel, doing some work, and then writing back to Excel.
I would also recommend you take a look at numpy, scipy or R. When I do data munging, I usually use R and it saves me so much time.
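The loop in the question returns only a single tuple, whereas the copy needs one source/target pair per column. A minimal sketch of generating those pairs is shown below; it assumes the coordinates in the question are (row, column) and that both ends of the range are included. `paired_cells` is a hypothetical helper name.

```python
def paired_cells(src_row, dst_row, first_col, last_col):
    """Yield ((src_row, col), (dst_row, col)) for every column in the range, inclusive."""
    for col in range(first_col, last_col + 1):
        yield (src_row, col), (dst_row, col)


pairs = list(paired_cells(9, 13, 3, 99))
print(pairs[0])    # ((9, 3), (13, 3))
print(pairs[-1])   # ((9, 99), (13, 99))
print(len(pairs))  # 97
```

With xlrd and xlwt, each pair could then drive a call such as `out_sheet.write(dst_r, dst_c, in_sheet.cell_value(src_r, src_c))`, where `in_sheet` and `out_sheet` are assumed names for an xlrd sheet and an xlwt sheet. Both libraries count rows and columns from 0.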

Extracting values only from an Excel row received using xlrd - python

This problem is specific to using the xlrd package in Python.
I got a row from an Excel sheet in the form of a list, but each item is an integer value in the form
type:value
and not a string. The row is saved by:
import xlrd
book = xlrd.open_workbook('myfile.xls')
sh = book.sheet_by_index(0)
for rx in range(sh.nrows):
    row = sh.row(rx)
so the saved row has the value:
row=[text:u'R', text:u'xyz', text:u'Y', text:u'abc', text:u'lmn', empty:'']
This is a list of int. I want the values extracted:
R
xyz
Y
abc
lmn
''
There has to be some method to convert it, but I am not sure which one or how to use it.
Now, I know I can get a value directly with:
cell_value = sh.cell_value(rowx=rx, colx=1)
but my program needs to collect the rows first and then extract the values from each saved row.
Thanks.
The row is a sequence of Cell instances, which have the attribute value.
for cell in row:
    cell_value = cell.value
    # etc
I am not sure why you want to do it this way - the reference to collecting rows first seems odd to me, given that you can get the rows directly from the worksheet.
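For a row that has already been collected, the same attribute access fits in a list comprehension. The sketch below uses a stand-in `FakeCell` tuple (a made-up name, not part of xlrd) so that it runs without a workbook; real xlrd cells expose the same `ctype` and `value` attributes.

```python
from collections import namedtuple

# Stand-in for xlrd's Cell, used only so the sketch runs without a workbook.
FakeCell = namedtuple("FakeCell", ["ctype", "value"])


def row_to_values(row):
    """Return the plain values from a saved row of cell objects."""
    return [cell.value for cell in row]


# Mirrors the row from the question: ctype 1 is text, ctype 0 is empty in xlrd
saved_row = [
    FakeCell(1, "R"), FakeCell(1, "xyz"), FakeCell(1, "Y"),
    FakeCell(1, "abc"), FakeCell(1, "lmn"), FakeCell(0, ""),
]
print(row_to_values(saved_row))
```

If the cell objects themselves are not needed later, xlrd's `sh.row_values(rx)` returns the plain values of a row directly.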
