Let's say I have a cell (9,3). I want to get the values from (9,3) to (9,99). How do I go down the columns to get the values. I am trying to write the values into another excel file that starts from (13, 3) and ends at (13,99). How do I write a loop for that in xlrd?
def write_into_cols_rows(r, c):
for num in range (0,96):
c += 1
return (r,c)
worksheet.row(int) will return you the row, and to get the value of certain columns, you need to run row[int].value to get the value.
For more information, you can read this pdf file (Page 9 Introspecting a sheet).
import xlrd
workbook = xlrd.open_workbook(filename)
# This will get you the very first sheet in the workbook.
worksheet = workbook.sheet_by_name(workbook.sheet_names()[0])
for index in range(worksheet.nrows):
try:
row = worksheet.row(index)
row_value = [col.value for col in row]
# now row_value is a list contains all the column values
print row_value[3:99]
except:
pass
To write data to Excel file, you might want to check out xlwt package.
BTW, seems like you are doing something like reading from excel.. do some work... write to excel...
I would also recommend you take a look at numpy, scipy or R. When I usually do data munging, I use R and it saves me so much time.
Related
I am trying to use Openpyxl to search for data in one Excel workbook, and write it to another (pre-existing) Excel workbook. The aim is something like this:
Search in Workbook1 for rows containing the word "Sales" (there being several such rows)
Copy the data from Column E of those rows (a numerical value), into a specific cell in Workbook2 (second worksheet, cell C3).
My code below appears to run without any errors, however, when I open Workbook 2, no data is being written in to it. Does anyone know why/can suggest a fix?
# importing openpyxl module
import openpyxl as xl
import sys
# opening the source excel file
filename ="C:\\Users\\hadam\\Documents\\Testing\\TB1.xlsx"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]
# opening the destination excel file
filename1 = "C:\\Users\\hadam\\Documents\\Testing\\comp2.xlsx"
wb2 = xl.load_workbook(filename1)
ws2 = wb2.worksheets[1]
for sheet in wb1.worksheets:
for row in sheet.iter_rows():
for cell in row:
try:
if 'Sale' in cell.value:
# reading cell value from source excel file
c = ws1.cell(row=cell.row, column=4).value
# writing the read value to destination excel file
ws2.cell(row=2, column=2).value = c.value
except (AttributeError, TypeError):
continue
# saving the destination excel file
wb2.save(str(filename1))
sys.exit()
Other info:
Specifically, the text string I am searching for ('Sales') is in Column A of Workbook1. It is not an exact match, but e.g. a given cell contains "5301 Sales - Domestic - type4". Therefore I want to sum the numerical values in Column E which contain "Sales" in Column A, into a single cell in Workbook2.
I am mega new to Python/coding. However, my environment seems to be set up okay, e.g. I have already tested a code (copied from elsewhere in the Web) that can write all the data from one spreadsheet into another pre-existing spreadsheet). I am using Mu editor in Python 3 mode and openpyxl module.
The line
c = ws1.cell(row=cell.row, column=4).value
copies the value in Col 4 to the variable 'c'. 'column=4' is Column D in the spreadsheet, you want this to be 'column=5' to get the value from Column E. If Column D is empty c would have no value each time.
In your code c is equal to the value in that Column E which matches your search criteria, currently it would be overwritten on each iteration of the loop. To sum the values you must add each new value to the existing value of c for each iteration using '+='.
There is no need to update the 2nd book cell value for each loop, you only want the resulting sum writen so these steps can be completed once the sum has
been calculated at the completion of the loop.
See code below that modifies that part of your code;
initialize c
c = 0
change the line so c sums the value in Col E for each loop match
c += ws1.cell(row=cell.row, column=5).value
then place the cell value update and save to excel book at the same indentation as the loop so this is executed once the loop completes.
Note that c is a python integer it has no attribute called value it's just 'c'. Using 'c.value' would give a blank value. Also the cell C3 you want to write this value to is row/col 3
ws2.cell(row=3, column=3).value = c
Finally save the cell to the 2nd excel book
Modded code;
...
c = 0
for sheet in wb1.worksheets:
for row in sheet.iter_rows():
for cell in row:
try:
if 'Sale' in cell.value:
# reading cell value from source excel file
c += ws1.cell(row=cell.row, column=5).value
except (AttributeError, TypeError):
continue
# writing the read value to destination excel file
ws2.cell(row=3, column=3).value = c
# saving the destination excel file
wb2.save(str(filename1))
...
Set-up
I create a Pandas dataframe from all records in a google sheet like this,
df = pd.DataFrame(wsheet.get_all_records())
as explained in the Gspread docs.
Issue
It seems Python stays in limbo when I execute the command since today. I don't get any error; I interrupt Python with KeyboardInterrupt after a while.
I suspect Google finds the records too much; ±3500 rows with 18 columns.
Question
Now, I actually don't really need the entire sheet. The first 300 rows would do just fine.
The docs show values_list = worksheet.row_values(1), which would return the first row values in a list.
I guess I could create a loop, but I was wondering if there's a build-in / better solution?
I believe your goal as follows.
You want to retrieve the values from 1st row to 300 row from a sheet in Google Spreadsheet.
From I suspect Google finds the records too much; ±3500 rows with 18 columns., you want to retrieve the values from the columns "A" to "R"?
You want to convert the retrieved values to the dataFrame.
You want to achieve this using gspread.
In order to achieve this, I would like to propose the following sample script.
In this answer, I used the method of values_get.
Sample script:
spreadsheetId = "###" # Please set the Spreadsheet ID.
rangeA1notation = "Sheet1!A1:R300" # Please set the range using A1Notation.
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
values = spreadsheet.values_get(rangeA1notation)
v = values['values']
df = pd.DataFrame(v)
print(df)
Note:
Please set the range as the A1Notation. In this case, when "A1:R300" instead of "Sheet1!A1:R300" is used, the values are retrieved from the 1st tab in the Spreadsheet.
When "A1:300" is used, the values are retrieved from the column "A" to the last column of the sheet.
When the 1st row is the header row and the data is after the 2nd row, please modify as follows.
From
df = pd.DataFrame(v)
To
df = pd.DataFrame(v[1:], columns=v[0])
Reference:
values_get
I used openpyxl package.
import openpyxl as xl
wb = xl.load_workbook('your_file_name')>
sheet = wb['name_of_your_sheet']
Specify your range.
for row in range(1, 300):
Now you can perform many opertions e.g this will point at row(1) & col(3) in first iteration
cell = sheet.cell(row, 3)
if you want to change the cell value
cell.value = 'something'
It's has pretty much all of it.
Here is a link to the docs: https://openpyxl.readthedocs.io/en/stable/
I am new to Python especially when it comes to using it with Excel. I need to write code to search for the string “Mac”, “Asus”, “AlienWare”, “Sony”, or “Gigabit” within a longer string for each cell in column A. Depending on which of these strings it finds within the entire entry in column A’s cell, it should write one of these 5 strings to the corresponding row in column C’s cell. Else if it doesn’t find any of the five, it would write “Other” to the corresponding row in column C. For example, if Column A2’s cell contained the string “ProLiant Asus DL980 G7, the correct code would write “Asus” to column C2’s cell. It should do this for every single cell in column A, writing the appropriate string to the corresponding cell in column C. Every cell in column A will have one of the five strings Mac, Asus, AlienWare, Sony, or Gigabit within it. If it doesn’t contain one of those strings, I want the corresponding cell in column 3 to have the string “Other” written to it. So far, this is the code that I have (not much at all):
import openpyxl
wb = openpyxl.load_workbook(path)
sheet = wb.active
for i in range (sheet.max_row):
cell1 = sheet.cell (row = i, column = 1)
cell2 = sheet.cell (row = I, column = 3)
# missing code here
wb.save(path)
You haven't tried writing any code to solve the problem. You might want to first get openpyxl to write to the excel workbook and verify that is working - even if it's dummy data. This page looks helpful - here
Once that is working all you'd need is a simple function that takes in a string as an argument.
def get_column_c_value(string_from_column_a):
if "Lenovo" in string_from_column_a:
return "Lenovo"
else if "HP" in string_from_column_a:
return "HP"
# strings you need to check for here in the same format as above
else return "other"
Try out those and if you have any issues let me know where you're getting stuck.
I have not worked much with openpyxl, but it sounds like you are trying to do a simple string search.
You can access individual cells by using
cell1.internal_value
Then, your if/else statement would look something like
if "HP" in str(cell1.internal_value):
Data can be assigned directly to a cell so you could have
ws['C' + str(i)] = "HP"
You could do this for all of the data in your cells
I am new to Python, and I am trying to sort of 'migrate' a excel solver model that I have created to Python, in hopes of more efficient processing time.
I receive a .csv sheet that I use as my input for the model, it is always in the same format.
This model essentially uses 4 different metrics associated with product A, B and C, and I essentially determine how to price A, B, and C accordingly.
I am at the very nascent stage of effectively inputting this data to Python. This is what I have, and I would not be surprised if there is a better approach, so open to trying anything you veterans have to recommend!
import csv
f = open("141881.csv")
for row in csv.reader(f):
price = row[0]
a_metric1 = row[1]
a_metric2 = row[2]
a_metric3 = row[3]
a_metric4 = row[4]
b_metric1 = row[7]
b_metric2 = row[8]
b_metric3 = row[9]
b_metric4 = row[10]
c_metric1 = row[13]
c_metric2 = row[14]
c_metric3 = row[15]
c_metric4 = row[16]
The .csv file comes in the format of price,a_metric1,a_metric2,a_metric3,a_metric4,,price,b_metric1,b_metric2,b_metric3,b_metric4,price,,c_metric1,c_metric2,c_metric3,c_metric4
I skip the second and third price column as they are identical to the first one.
However when I run the python script, I get the following error:
c_metric1 = row[13]
IndexError: list index out of range
And I have no idea why this occurs, when I can see the data is there myself (in excel, this .csv file would go all the way to column Q, or what I understand as row[16].
Your help is appreciated, and any advice on my approach is more than welcomed.
Thanks in advance!
Using print() can be your friend here:
import csv
with open('141881.csv') as file_handle:
file_reader = csv.reader(file_handle)
for row in file_reader:
print(row)
The code above will print out EACH row.
To print out ONLY the first row replace the for loop with: print(file_reader.__next__()) (assuming Python3)
Printing out row(s) will allow you to see what exactly a "row" is.
P.S.
Using with is advisable because it handles the opening and closing of the file for you
Look into pandas.
Read file as:
data = pd.read_csv('141881.csv'))
to read a columns:
col = data.columns['column_name']
to read a row:
row = data.ix[row_number]
CSV Module in Python transforms a spreadsheet into a matrice : a list of list
The python module to read csv transform each line of your input into a list.
For each row, it will split the row into a list of cell.In other words, one array is composed of as many columns you have into your excel spreadsheet.
Try in terminal:
>>> f = open("141881.csv")
>>> print csv.reader(f)
>>>[["id", "name", "company", "email"],[1563, "defoe", "SuperFastCompany",],["def#superfastcie.net"],[1564, "doe", "Awsomestartup", "doe#awesomestartup"], ...]`
So that's why you iterate throught the rows of your spreadsheet assigning the value into a new variable.
I recommend you to read on basics of list manipulation.
But...
What is an IndexError? catching exception:
If one cell is empty or one row has less columns than other: it will thraw an Error. Such as you described. IndexError means Python wasn't able to find a value for this specific cell. In other words if some line of your excel spreadsheet are smaller than the other it will say there is no such value to asign and throw an Index Error. That why knowing how to catch exception could be very useful to see the problem. Try to verify that the list of each has the same lenght if not assign an empty value for example
try:
#if row has always 17 cells with values
#I can just assign it directly using a little trick
price,a_metric1,a_metric2,a_metric3,a_metric4,,price,b_metric1,b_metric2,b_metric3,b_metric4,price,c_metric1,c_metric2,c_metric3,c_metric4 = row'
except IndexError:
# if there is no 17 cells
# tell me how many cells is actually in the list
# you will see there that there less than 17 elements
print len(row)
Now you can just skip the error by assigning None value to those who don't appears in the csv file
You can read more about Catching Exception
Thanks everyone for your input - printing the results made me realize that I was getting the IndexError because of the very first row, which only had headers. Skipping that row got rid of the error.
I will look into pandas, it seems like that will be useful for the type of work I am doing.
Thanks again for all of your help, much appreciated.
This problem is specific wrt using xlrd package in python
I got row of excel which is in form of list but each item is integer value;
type:value
this is not string. The row is save by;
import xlrd
book = xlrd.open_workbook('myfile.xls')
sh = book.sheet_by_index(0)
for rx in range(sh2.nrows):
row = sh.row(rx)
so row saved has value;
row=[text:u'R', text:u'xyz', text:u'Y', text:u'abc', text:u'lmn', empty:'']
This is a list of int. I want the values extracted -
R
xyz
Y
abc
lmn
''
There has to be some method to convert it, but not sure which and how.
Now, I know I can get value just by;
cell_value = sh.cell_value(rowx=rx, colx=1)
but my program requires to collect rows first and then extract values from save row.
Thanks.
The row is a sequence of Cell instances, which have the attribute value.
for cell in row:
cell_value = cell.value
# etc
I am not sure why you want to do it this way - the reference to collecting rows first seems odd to me, given that you can get the rows directly from the worksheet.