Populating google spreadsheet by row, not by cell - python

I have a spreadsheet that I want to populate with values from dictionaries in a list. I wrote a for loop that updates cell by cell, but it is too slow and I often get gspread.httpsession.HTTPError. I am trying to write a loop that updates row by row instead. This is what I have:
lstdic=[
{'Amount': 583.33, 'Notes': '', 'Name': 'Jone', 'isTrue': False,},
{'Amount': 58.4, 'Notes': '', 'Name': 'Kit', 'isTrue': False,},
{'Amount': 1083.27, 'Notes': 'Nothing', 'Name': 'Jordan', 'isTrue': True,}
]
Here is my cell by cell loop:
headers = wks.row_values(1)
for k in range(len(lstdic)):
    for key in headers:
        cell = wks.find(key)
        cell_value = lstdic[k][key]
        wks.update_cell(cell.row + 1 + k, cell.col, cell_value)
For each dictionary, it finds the header cell that matches each key and updates the cell below it. On the next iteration the row offset increases by one, so it updates the same columns one row further down. This is too slow, and I want to update a whole row at a time. My attempt:
headers = wks.row_values(1)
row = 2
for k in range(len(lstdic)):
    cell_list = wks.range('B%s:AA%s' % (row, row))
    for key in headers:
        for cell in cell_list:
            cell.value = lstdic[k][key]
    row += 1
    wks.update_cells(cell_list)
This one updates each row quickly, but every cell gets the same value: the innermost for loop assigns one value to all the cells. I am breaking my head trying to figure out how to assign the right values to the cells. Help appreciated.
P.S. I am using headers because I want the values to appear in a specific column order in the Google spreadsheet.

The following code is similar to Koba's answer, but it writes the full sheet with a single update_cells call instead of one call per row. This is even faster:
# sheet_data is a list of lists representing a matrix of data, with the headers as the first row.
# first make sure the worksheet is the right size
worksheet.resize(len(sheet_data), len(sheet_data[0]))
cell_matrix = []
for rownumber, row in enumerate(sheet_data, start=1):
    # max 26 columns (A-Z); wider tables need two-letter column names, which I didn't need.
    cellrange = 'A{row}:{letter}{row}'.format(row=rownumber, letter=chr(ord('A') + len(row) - 1))
    # get the row from the worksheet
    cell_list = worksheet.range(cellrange)
    for cell, value in zip(cell_list, row):
        cell.value = value
    # add the cell_list, which represents all cells in a row, to the full matrix
    cell_matrix.extend(cell_list)
# output the full matrix all at once to the worksheet
worksheet.update_cells(cell_matrix)
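The chr(...) trick above only covers columns A to Z. A minimal sketch of a column-number-to-letters conversion that also handles wider sheets follows; column_letter and row_range are hypothetical helper names, not part of gspread. Recent gspread versions also ship gspread.utils.rowcol_to_a1(row, col), which should serve the same purpose.

```python
def column_letter(n):
    """Convert a 1-based column number to A1-notation letters (1 -> 'A', 27 -> 'AA')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while n > 0:
        # shift to 0-based so that 26 maps to 'Z' instead of rolling over
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def row_range(row_number, width):
    """A1 range covering columns 1..width of one row, e.g. row_range(2, 28) -> 'A2:AB2'."""
    return "A{row}:{last}{row}".format(row=row_number, last=column_letter(width))
```

In the loop above, cellrange = row_range(rownumber, len(row)) would then replace the chr(...) expression.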

I ended up writing the following loop, which fills the spreadsheet row by row amazingly fast:
headers = wks.row_values(1)
row = 2  # start from the second row because the first row holds the headers
for k in range(len(lstdic)):
    values = []
    cell_list = wks.range('B%s:AB%s' % (row, row))  # make sure the row range matches the length of the values list
    for key in headers:
        values.append(lstdic[k][key])
    for cell, value in zip(cell_list, values):
        cell.value = value
    wks.update_cells(cell_list)
    print("Updating row " + str(k + 2) + '/' + str(len(lstdic) + 1))
    row += 1
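The key step in this loop is building each row's values in header order. A minimal sketch that separates that step from the gspread calls, assuming every header is a key of the dictionaries (missing keys are filled with an empty string); rows_in_header_order is a hypothetical name, and the headers list below stands in for wks.row_values(1):

```python
def rows_in_header_order(records, headers, missing=""):
    """Turn a list of dicts into row lists whose values follow the header order."""
    # record.get() fills in `missing` for any header a dict doesn't have,
    # so every row comes out exactly len(headers) long
    return [[record.get(header, missing) for header in headers] for record in records]


lstdic = [
    {'Amount': 583.33, 'Notes': '', 'Name': 'Jone', 'isTrue': False},
    {'Amount': 58.4, 'Notes': '', 'Name': 'Kit', 'isTrue': False},
    {'Amount': 1083.27, 'Notes': 'Nothing', 'Name': 'Jordan', 'isTrue': True},
]
headers = ['Name', 'Amount', 'Notes', 'isTrue']  # hypothetical header order
rows = rows_in_header_order(lstdic, headers)
```

Each resulting list can be assigned to a range exactly as wide as headers, or all of them can be collected into one batch for a single update_cells call, as in the previous answer.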

Related

How do I update a value in a cell without coordinates?

I find the required cell from the row number and the column name (searching by column name is necessary because columns can move around in the table) and try to update its value using gspread. The code runs without errors, but for some reason it updates a completely different cell (A1):
import gspread
sa = gspread.service_account('path to service_account.json')
sh = sa.open('name of sheet')
wks = sh.worksheet('Sheet 1')
all = wks.get_all_records() #Get a list of dictionaries from the sheet
tab = 'Column number 3' #We will search and change the value from this column
cell_to_update = (all[0].get(tab)) #If we print() this, we get value of cell C2
wks.update(cell_to_update,'any text')
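I do not know why, but it updates cell A1, although it should update cell C2.
Thanks to a tip from @BRemmelzwaal, I found the answer to my question: cell_to_update holds the cell's value, not its address, so it cannot tell wks.update() which cell to change. Looking the cell up first and then updating it by row and column works: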
all = wks.get_all_records() #Get a list of dictionaries from the sheet
tab = 'Column number 3' #We will search and change the value from this column
value_to_update = (all[0].get(tab)) #If we print() this, we get value of cell C2
cell_to_update = wks.find(str(value_to_update))
wks.update_cell(cell_to_update.row, cell_to_update.col, 'any text')
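wks.find() searches the whole sheet for the value, so if the same value appears in another cell it may return the wrong one. A minimal sketch that computes the coordinates directly instead, assuming get_all_records() read its header from row 1 and returned one record per following sheet row with no rows skipped; record_cell_coordinates is a hypothetical helper:

```python
def record_cell_coordinates(headers, record_index, column_name, header_row=1):
    """Return the (row, col) of a field from get_all_records(), both 1-based."""
    # records start on the row right after the header row
    row = header_row + 1 + record_index
    # look the column up by name, so reordering the columns in the sheet is fine
    col = headers.index(column_name) + 1
    return row, col


# With gspread (not executed here):
#   headers = wks.row_values(1)
#   row, col = record_cell_coordinates(headers, 0, 'Column number 3')
#   wks.update_cell(row, col, 'any text')
```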

Parsing excel sheet in python with no empty spaces if column extends across multiple rows

I am trying to parse an excel sheet in which some columns spread across multiple rows. Here is the sheet -
I am trying to get all the inputs, which start from column E. In the sheet, 'human readable' is a single merged cell in row 4. When I print row 4 I get: 'human readable', '', '', '', 'from OVERRIDE_EN_TT', 'test00[3]', 'test00[2:0]', 'test08[2:0]', 'test09[2:0]', 'test10[2:0]', 'test11[2:0]'. The empty strings after 'human readable' are causing errors for me. The code I have is:
import xlrd

# the file name must be a string; note that xlrd 2.0+ only reads legacy .xls files,
# so opening .xlsx requires xlrd < 2.0
wbook = xlrd.open_workbook("Book2.xlsx")
my_sheet = wbook.sheet_by_name("Sheet1")
for ii in range(0, my_sheet.nrows):
    if "table_start" in str(my_sheet.cell(ii, 0)):
        starting_row = ii + 1
        hr_row = ii + 3
        len_row = len(my_sheet.row(hr_row))
        for aa in range(1, len_row):
            if (my_sheet.cell(starting_row, aa).value == 'INPUT'
                    and my_sheet.cell(starting_row + 1, aa).value != 'human readable'):
                starting_column = aa
                break
How can I get column E as the starting column? Or, if there is another way to do it, how can I do it?
I want a list of the items from column E to the end. I need the starting column so I can run a loop like this:
my_list = []
ending_column = my_sheet.ncols  # one past the last column index
for i in range(1, len_row):
    for j in range(starting_column, ending_column):
        my_list.append(my_sheet.cell(i, j).value)
I want my_list to look like this:
['', 'from OVERRIDE_EN_TT', 'test00[3]', 'test00[2:0]', 'test08[2:0]', 'test09[2:0]', 'test10[2:0]', 'test11[2:0]']
I am having trouble getting the starting column because of the merged cell.
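One way to find the first column after a merged cell is to look up the merged range that contains it. A minimal sketch, assuming the merged ranges are available as (rlo, rhi, clo, chi) tuples with exclusive upper bounds, which is the layout of xlrd's Sheet.merged_cells (as far as I know, xlrd only fills that list when the workbook is opened with formatting_info=True, and only for .xls files). For .xlsx, openpyxl exposes ws.merged_cells.ranges with 1-based, inclusive min_row/max_row/min_col/max_col bounds, which would need converting to (min_row - 1, max_row, min_col - 1, max_col). The function name is hypothetical:

```python
def column_after_merge(merged_ranges, row, col):
    """Return the first 0-based column index to the right of the cell at (row, col).

    merged_ranges holds (rlo, rhi, clo, chi) tuples with exclusive upper bounds.
    If (row, col) lies inside a merged range, the whole range is skipped;
    otherwise the next column is returned.
    """
    for rlo, rhi, clo, chi in merged_ranges:
        if rlo <= row < rhi and clo <= col < chi:
            return chi
    return col + 1
```

For example, if 'human readable' sits at row index 3, column index 0, merged across columns A to D, the range is (3, 4, 0, 4) and the function returns 4, which is column E.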

Is it possible to update a row of data using position of column (e.g. like a list index) in Python / SQLAlchemy?

I am trying to compare two rows of data, which I have stored in a list, to each other.
for x in range(0, len_data_row):
    if company_data[0][0][x] == company_data[1][0][x]:
        print('MATCH 1: {} - {}'.format(x, company_data[0][0][x]))
        # do nothing
    if company_data[0][0][x] is None and company_data[1][0][x] is not None:
        print('MATCH 2: {} - {}'.format(x, company_data[1][0][x]))
        # update first company_id with data from 2nd
    if company_data[0][0][x] is not None and company_data[1][0][x] is None:
        print('MATCH 3: {} - {}'.format(x, company_data[0][0][x]))
        # update second company_id with data from 1st
Pseudocode of what I want to do:
If the value at index x is not None for row 2 but is blank for row 1, write row 2's value at index x into row 1 in my database.
The part I can't figure out is whether SQLAlchemy lets you specify which column is updated by an "index" (I think index means something different in database terms; I mean a list index, e.g. list[1]), and whether you can choose the column dynamically by passing a variable to the update code. Here's what I'm looking to do (it doesn't work, of course):
def some_name(column_by_index, column_value):
    u = table_name.update().where(table_name.c.id == row_id).values(column_by_index=column_value)
    db.execute(u)
Thank you!
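In SQLAlchemy Core a Table keeps its columns in definition order, so list(table.c)[i] gives the i-th Column object, and .values() accepts a dictionary keyed by Column objects (or by column-name strings), which also lets the column be chosen at runtime. A minimal sketch, assuming SQLAlchemy 1.4 or newer; the companies table, its data, and update_column_by_index are hypothetical, and an in-memory SQLite database stands in for the real one:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

metadata = MetaData()
# hypothetical table standing in for table_name in the question
companies = Table(
    "companies",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("city", String),
)


def update_column_by_index(conn, table, row_id, column_index, column_value):
    """Set the column at position column_index (0-based, in definition order)."""
    column = list(table.c)[column_index]  # Column object picked by position
    # keying .values() with a dict lets the column be chosen at runtime
    stmt = table.update().where(table.c.id == row_id).values({column: column_value})
    conn.execute(stmt)


engine = create_engine("sqlite://")  # throwaway in-memory database
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(companies.insert(), [
        {"id": 1, "name": "Acme", "city": None},
        {"id": 2, "name": "Acme", "city": "Oslo"},
    ])
    rows = conn.execute(select(companies).order_by(companies.c.id)).fetchall()
    first, second = rows[0], rows[1]
    # copy values from row 2 into row 1 wherever row 1 is blank (None)
    for x in range(len(first)):
        if first[x] is None and second[x] is not None:
            update_column_by_index(conn, companies, first[0], x, second[x])
    city = conn.execute(select(companies.c.city).where(companies.c.id == 1)).scalar()
```

If only the column name is known, .values({name: value}) or .values(**{name: value}) works the same way.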

OpenPyXL - ReadOnly: How to skip empty rows without knowing when they occur?

I'm pretty new to programming, so please bear with me if my code is not nice or the answer is too obvious. :)
I want to parse an Excel file into a dictionary so I can later access the values by key. I won't know how the Excel file is structured before parsing it, so I can't hardcode which empty rows to skip, since they occur at random.
For this, I am using Python 3 and openpyxl (read-only mode). This is my code:
from openpyxl import load_workbook
import pprint

# path to file
c = "test.xlsx"
wb = load_workbook(filename=c, read_only=True, data_only=True)
# top-level dictionary
data = {}
# list of worksheet names
wsname = []
# values in rows per worksheet
valuename = []
# took these odd numbers since pprint orders the keys oddly when 1s and 10s are involved
# counter for row
k = 9
# counter for column
i = 10
# splits the name of the xlsx file from .xlsx
workbook = c.split(".")[0]
data[workbook] = {}
for ws in wb.worksheets:
    # takes the worksheet name and puts it into the wsname list
    wsname.append(ws.title)
    wsrealname = wsname.pop()
    worksheet = wsrealname
    data[workbook][worksheet] = {}
    for row in ws.rows:
        k += 1
        for cell in row:
            # reads value per row and column
            data[workbook][worksheet]["Row: " + str(k) + " Column: " + str(i)] = cell.value
            i += 1
        i = 10
    k = 9
pprint.pprint(data)
With this I get output like this:
{'test': {'Worksheet1': {'Row: 10 Column: 10': None,
                         'Row: 10 Column: 11': None,
                         'Row: 10 Column: 12': None,
                         'Row: 10 Column: 13': None,
                         'Row: 11 Column: 10': None,
                         'Row: 11 Column: 11': 'Test1',
                         'Row: 11 Column: 12': None,
                         'Row: 11 Column: 13': None}}}
This is the output I want, except that in this example I want to skip the whole of row 10, since all its values are None and therefore empty.
As mentioned, I don't know when empty rows will occur, so I can't just hardcode a certain row to be skipped. In read-only mode, if you print(row), the row contains only EmptyCell objects, like this:
(<EmptyCell>, <EmptyCell>, <EmptyCell>, <EmptyCell>)
I tried using set() to check whether all the values in the row are the same:
if len(set(row)) == 1:
    ...
but that doesn't solve the issue, because I get this error message:
TypeError: unhashable type: 'ReadOnlyCell'
If I compare cell.value with None and exclude all the None values, I get this output:
{'test': {'Worksheet1': {'Row: 11 Column: 11': 'Test1'}}}
which is not what I want, since I only want to skip cells when the whole row is empty. The output should look like this:
{'test': {'Worksheet1': {'Row: 11 Column: 10': None,
                         'Row: 11 Column: 11': 'Test1',
                         'Row: 11 Column: 12': None,
                         'Row: 11 Column: 13': None}}}
So, could you please help in figuring out how to skip cells only if the complete row (and therefore all cells) is empty?
Thanks a lot!
from openpyxl.cell.read_only import EmptyCell

for row in ws:
    # the row is empty if every cell is an EmptyCell (or check whether every value is None)
    empty = all(isinstance(cell, EmptyCell) for cell in row)
    if empty:
        continue
NB: in read-only mode, avoid repeated calls like data[workbook][worksheet]['A1'], as they force the library to parse the worksheet again and again.
Or just create a custom generator that yields only non-empty rows:
def iter_rows_with_data(worksheet):
    for row in worksheet.iter_rows(values_only=True):
        # test for None explicitly, so rows holding only 0 or '' are not skipped
        if any(value is not None for value in row):
            yield row
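Combining that check with the question's key format gives the desired output. A minimal sketch, assuming the rows arrive as tuples of values, as produced by ws.iter_rows(values_only=True); rows_to_dict is a hypothetical name, and the 10 offsets only mirror the question's counters:

```python
def rows_to_dict(rows, first_row=10, first_col=10):
    """Map 'Row: r Column: c' keys to cell values, skipping rows that are entirely empty."""
    result = {}
    # enumerate keeps counting through skipped rows, so row numbers stay aligned with the sheet
    for row_number, row in enumerate(rows, start=first_row):
        if all(value is None for value in row):
            continue  # the whole row is empty: skip every cell in it
        for col_number, value in enumerate(row, start=first_col):
            result["Row: {} Column: {}".format(row_number, col_number)] = value
    return result
```

Inside the worksheet loop, data[workbook][ws.title] = rows_to_dict(ws.iter_rows(values_only=True)) would then replace the nested row and cell loops.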

Google chart input data

I have a Python script that builds inputs for a Google chart. It correctly creates the column headers and the correct number of rows, but it repeats the data of the last row in every row. I tried explicitly setting the row indices instead of using a loop (which wouldn't work in practice, but should have worked for testing), and it still gives me the same values for each entry. It also worked when I had this code on the same page as the HTML user form.
end1 = number of rows in the data table
end2 = number of columns in the data table represented by a list of column headers
viewData = data stored in database
c = connections['default'].cursor()
c.execute("SELECT * FROM {0}.\"{1}\"".format(analysis_schema, viewName))
viewData=c.fetchall()
curDesc = c.description
end1 = len(viewData)
end2 = len(curDesc)
Creates column headers:
colOrder = [curDesc[2][0]]
if activityOrCommodity == "activity":
    tableDescription = {curDesc[2][0]: ("string", "Activity")}
elif activityOrCommodity in ("commodity", "aa_commodity"):
    tableDescription = {curDesc[2][0]: ("string", "Commodity")}
for i in range(3, end2):
    attValue = curDesc[i][0]
    tableDescription[curDesc[i][0]] = ("number", attValue)
    colOrder.append(curDesc[i][0])
Creates row data:
data = []
values = {}
for i in range(0, end1):
    for j in range(2, end2):
        if j == 2:
            values[curDesc[j][0]] = viewData[i][j].encode("utf-8")
        else:
            values[curDesc[j][0]] = viewData[i][j]
    data.append(values)
dataTable = gviz_api.DataTable(tableDescription)
dataTable.LoadData(data)
return dataTable.ToJSon(columns_order=colOrder)
An example javascript output:
var dt = new google.visualization.DataTable({cols:[{id:'activity',label:'Activity',type:'string'},{id:'size',label:'size',type:'number'},{id:'compositeutility',label:'compositeutility',type:'number'}],rows:[{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]}]}, 0.6);
It seems you're appending values to data, but values is never reset between iterations. Every entry in data is therefore the same dict object, which ends up holding the last row's values.
I assume this is not intended? If so, just move values = {} inside the first (outer) for loop of your row-building code.
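A minimal sketch of the difference, with hypothetical function names and simplified input (plain tuples, no utf-8 encoding and no column offset of 2): build_rows creates a fresh dict per row, while build_rows_shared reproduces the question's bug by reusing one dict.

```python
def build_rows(view_data, column_names):
    """Build one dict per database row for DataTable.LoadData()."""
    data = []
    for record in view_data:
        values = {}  # new dict for every row, so rows don't share one object
        for name, value in zip(column_names, record):
            values[name] = value
        data.append(values)
    return data


def build_rows_shared(view_data, column_names):
    """The buggy variant: one dict is created once and appended repeatedly."""
    data = []
    values = {}
    for record in view_data:
        for name, value in zip(column_names, record):
            values[name] = value
        data.append(values)  # appends the same dict object every time
    return data
```

With two input rows, build_rows returns two different dicts, while build_rows_shared returns the same dict twice, holding only the second row's values.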
