How to get first 300 rows of Google sheet via gspread - python

Set-up
I create a Pandas dataframe from all records in a google sheet like this,
df = pd.DataFrame(wsheet.get_all_records())
as explained in the Gspread docs.
Issue
Since today, Python seems to hang when I execute this command. I don't get any error, and after a while I interrupt it with KeyboardInterrupt.
I suspect Google finds the records too much; ±3500 rows with 18 columns.
Question
Now, I actually don't really need the entire sheet. The first 300 rows would do just fine.
The docs show values_list = worksheet.row_values(1), which would return the first row values in a list.
I guess I could create a loop, but I was wondering if there's a built-in or better solution?

I believe your goal is as follows.
You want to retrieve the values from the 1st row to the 300th row of a sheet in a Google Spreadsheet.
From "I suspect Google finds the records too much; ±3500 rows with 18 columns.", I assume you want to retrieve the values from columns "A" to "R".
You want to convert the retrieved values to a DataFrame.
You want to achieve this using gspread.
To achieve this, I would like to propose the following sample script.
In this answer, I use the values_get method.
Sample script:
import gspread
import pandas as pd

spreadsheetId = "###"  # Please set the Spreadsheet ID.
rangeA1notation = "Sheet1!A1:R300"  # Please set the range using A1Notation.
client = gspread.authorize(credentials)  # credentials: your existing authorized credentials object
spreadsheet = client.open_by_key(spreadsheetId)
values = spreadsheet.values_get(rangeA1notation)
v = values['values']  # list of rows, each row a list of cell values
df = pd.DataFrame(v)
print(df)
Note:
Please set the range in A1Notation. When "A1:R300" is used instead of "Sheet1!A1:R300", the values are retrieved from the 1st tab in the Spreadsheet.
When "A1:300" is used, the values are retrieved from column "A" to the last column of the sheet.
When the 1st row is the header row and the data starts from the 2nd row, please modify the script as follows.
From
df = pd.DataFrame(v)
To
df = pd.DataFrame(v[1:], columns=v[0])
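One caveat with the header variant: the Sheets API omits trailing empty cells, so rows in values['values'] can be shorter than the header row, and building the DataFrame can then fail with a column-count error when no data row is as long as the header. The following is a minimal sketch that pads the rows first. The helper name rows_to_dataframe is hypothetical, and the sample v is made-up data standing in for the API response.

```python
import pandas as pd


def rows_to_dataframe(v):
    """Build a DataFrame from a list of rows whose first row is the header."""
    header, data = v[0], v[1:]
    width = len(header)
    # Pad short rows with empty strings (and trim over-long ones) to the header width.
    padded = [(row + [""] * width)[:width] for row in data]
    return pd.DataFrame(padded, columns=header)


# Made-up sample shaped like values['values'] returned by values_get.
v = [["name", "city", "note"], ["Ann", "Oslo"], ["Bob"]]
df = rows_to_dataframe(v)
print(df)
```

Padding with empty strings is only one choice; None or NaN may suit numeric columns better.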
Reference:
values_get

I used the openpyxl package (this works on a local .xlsx file, e.g. the sheet exported from Google Sheets).
import openpyxl as xl
wb = xl.load_workbook('your_file_name')
sheet = wb['name_of_your_sheet']
Specify your range. Note that range(1, 301) covers rows 1 to 300.
for row in range(1, 301):
Now you can perform many operations inside the loop, e.g. this points at row 1, column 3 in the first iteration:
    cell = sheet.cell(row, 3)
If you want to change the cell value:
    cell.value = 'something'
It has pretty much everything you need.
Here is a link to the docs: https://openpyxl.readthedocs.io/en/stable/

Related

How to use python to fill specific data to column in excel based on information of the first column?

I have a problem with an Excel file, and I want to automate it with a Python script that fills a column based on the information in the first column. For example:
if data == 'G711Alaw 64k' or data == 'G711Ulaw 64k':
    print('1-Jan') until the first column equals '2-Jan', then print('2-Jan'), and so on.
[Image: before automation]
I need it to look like this after automation:
[Image: after automation]
Can anyone help me solve this issue?
The file:
the Excel file
Thanks a lot for your help.
Try this. pandas reads your "1-Jan" cells as a datetime type; if you need them as strings, you can convert them in the code. The following code assigns the most recently read date to the second column of every row:
import pandas as pd

df = pd.read_excel("add_date_column.xlsx", engine="openpyxl")
sig = []  # holds the most recent date seen in the first column

def t(x):
    global sig
    # A non-string value in the first column is a date row.
    if not isinstance(x.iloc[0], str):
        tmp_sig = x.iloc[0]
        if tmp_sig not in sig:
            sig = [tmp_sig]
    # Write the current date into the second column of this row.
    x.iloc[1] = sig[-1]
    return x

new_df = df.apply(t, axis=1)
new_df.to_excel("new.xlsx", index=False)
The concept is very simple:
If the value is a date/time, copy it to [same row, next column].
If not, [same row, next column] is copied from [previous row, next column].
You do not specifically need Python for this task. The Excel formula for this, entered in B2 and filled down, would be:
=IF(ISNUMBER(A2),A2,B1)
Instead of checking whether the value is a date/time, I took advantage of the fact that Excel stores dates as numbers, while the rest of the entries are alphanumeric text (containing both letters and numbers). This formula is applied to the new column.
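Of course, you might already be in Python and prefer to stay in the same environment. So, here's the loop (assuming df has the default 0..n-1 index):
from datetime import datetime

df["Column1"] = None  # create the target column first
for i in range(len(df)):
    # isinstance also matches pandas Timestamp, which subclasses datetime
    if isinstance(df.loc[i, "Orig. Codec"], datetime):
        df.loc[i, "Column1"] = df.loc[i, "Orig. Codec"]
    elif i > 0:
        df.loc[i, "Column1"] = df.loc[i - 1, "Column1"]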
There may be a way to express the same idea with a lambda function, but I am not aware of how to combine lambda and shift.
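One vectorized way to get the same result is to blank out the non-date rows and forward-fill. This is a minimal sketch rather than the answerer's method: the sample frame is made-up data, and the column names "Orig. Codec" and "Column1" are taken from the loop above.

```python
from datetime import datetime

import pandas as pd

# Made-up sample: date rows followed by codec rows, as in the question.
df = pd.DataFrame({
    "Orig. Codec": [
        datetime(2021, 1, 1), "G711Alaw 64k", "G711Ulaw 64k",
        datetime(2021, 1, 2), "G711Alaw 64k",
    ]
})

# True where the first column holds a date (pandas Timestamp subclasses datetime).
is_date = df["Orig. Codec"].map(lambda value: isinstance(value, datetime))

# Keep only the dates, convert to a datetime column, then carry each date downwards.
dates_only = pd.to_datetime(df["Orig. Codec"].where(is_date))
df["Column1"] = dates_only.ffill()
print(df)
```

Rows that appear before the first date stay empty (NaT), which matches the guard in the loop version.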

Get specific sheet range using gspread acell

I want to get a specific sheet range that spans multiple rows, for example A1:C3.
My code is:
work_book = sheet.get_worksheet(sheet_idx)
row_value = work_book.acell('A1:C3').value
print(row_value)
When I run the code, I only get the value from cell A1 (which is empty). If I use A2:C3, the result is Hello. I expected it to return the entire range. Should I loop over the cells, or is there another way? Please assist.
acell returns only a single cell. For a range, you should use get: https://gspread.readthedocs.io/en/latest/api.html#gspread.models.Worksheet.get
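A minimal sketch of that suggestion, assuming work_book is the gspread Worksheet from the question. The helper name read_range is hypothetical; the only gspread call used is Worksheet.get with an A1-notation range.

```python
def read_range(worksheet, a1_range):
    """Return the values of an A1-notation range as a list of rows."""
    # Worksheet.get returns one list per row, whereas acell returns a single Cell.
    return [list(row) for row in worksheet.get(a1_range)]


# Usage with the worksheet from the question (needs real credentials):
# rows = read_range(work_book, "A1:C3")
# for row in rows:
#     print(row)
```

Trailing empty cells may be omitted from the returned rows, so rows can differ in length.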

Retrieve first empty column and row using xlwings

I am looking for a way to find the first empty column and row. As part of my use case, I am trying to locate H3 (to add the current date) and then H4 and H5 (to add my daily metrics) [screenshot attached]. I have tried the following with xlwings.
import xlwings as xw
from xlwings import Range, constants
wb = xw.Book(r"path to xlsx")
sht1 = wb.sheets['Sheet1']
sht1.range('G3').value = current_date
sht1.range('G4').value = 5678
sht1.range('G5').value = 1234
wb.save(r"path to xlsx")
The issue is that I have hardcoded the column and row references in the script. I want H3, H4 and H5 to be found dynamically through xlwings so that the metrics are updated programmatically. Can someone guide me on this?
You can do this by finding the last column of the data used. Here are two options to get this data:
Using SpecialCells(11), a VBA method accessed through .api; the constant 11 is xlCellTypeLastCell, the last cell in the used range.
Using .end("right"), the equivalent of ctrl + right in Excel.
Option 1 would work well if there is no other data in the spreadsheet, so the last cell in the sheet would be the correct column. This is convenient and doesn't require knowledge of the starting cell (in this case B3).
Option 2 would be preferred for spreadsheets where other data may be on the sheet, so the last cell will not necessarily be in the last column of your desired data. This option does, however, require that the row has no empty cells, because .end("right") stops at the last cell before a gap, which would then not be the last column of the data.
An alternative could be to import all the data to Python as a pd.DataFrame, then append an additional column and return. If you need to append many columns of data, this would probably be more efficient (especially if you already have a DataFrame of the data you are pasting to Excel).
The last_col is an integer, as this is most easily manipulated (such as increasing it by 1). Therefore, the range has also been modified to make use of this: instead of A1 style (e.g. range("A1")), a tuple of the format (row_num, col_num) is used (e.g. range((row_num, col_num))).
import xlwings as xw
import datetime as dt
current_date = dt.date.today().strftime("%d-%b-%y")
wb = xw.Book(r"path to xlsx")
sht1 = wb.sheets['Sheet1']
# Use ONE of the two options below; as written, option 2 overrides option 1.
# option 1: last column in the sheet through SpecialCells
last_col = sht1.range("A1").api.SpecialCells(11).Column
# option 2: starting at cell B3, the first in the date headers, move to the right (like ctrl+right in Excel)
last_col = sht1.range("B3").end("right").column
# paste new values
sht1.range((3, last_col+1)).value = current_date
sht1.range((4, last_col+1)).value = 5678
sht1.range((5, last_col+1)).value = 1234
wb.save(r"path to xlsx")

How to fix: linksOutToCells[] are not allowed for update_rows() operation

I'm using the Smartsheet Python SDK and attempting to update rows in a smartsheet in which many of the cells to be updated have existing links out to other sheets. I want to update the cell values with data from a pandas df, while keeping the links out intact. When I attempt to update_rows with new cell values (but keeping the original links_out_to_cells object attached to the original cell), I get API Error 1032: "The attribute(s) cell.linksOutToCells[] are not allowed for this operation." Does anyone know a good workaround for this issue?
Here is my evaluate_row_and_build_updates function (passing in the smartsheet row and the row from the pandas df -- the first value in each row in the smartsheet is meant to be preserved with the update)
def evaluate_row_and_build_updates(ss_row, df_row):
    new_row = smartsheet.models.Row()
    new_row.id = ss_row.id
    new_row.cells = ss_row.cells
    # Skip the first cell, which must be preserved.
    empty_cell_lst = list(new_row.cells)[1:]
    for i in range(len(empty_cell_lst)):
        empty_cell_lst[i].value = df_row[1][i]
    return new_row
When making the request to update the cell values on the source cells of the links, you don't have to include the linksOutToCells object. You can just update the cell's value. The link out to the other sheet will stay in place, and the new cell value will be propagated to the linked sheets.
It could look like this:
# Build new cell value
new_cell = smartsheet.models.Cell()
new_cell.column_id = <COLUMN_ID>
new_cell.value = "testing"
# Build the row to update
new_row = smartsheet.models.Row()
new_row.id = <ROW_ID>
new_row.cells.append(new_cell)
# Update rows
updated_row = smar_client.Sheets.update_rows(
<SHEET_ID>,
[new_row])
Running that code on a cell that has a link going out will keep the cell link in place.

Google Sheet Python API: how get values in first sheet if don't know range?

The Python API for Google sheets has a get method to get values from a spreadsheet, but it requires a range argument. I.e., your code must be something like this,
sheets_service = service.spreadsheets().values()
data = sheets_service.get(spreadsheetId = _GS_ID, range = _SHEET_NAME).execute()
and you cannot omit the range argument, nor will a value of '' work, or a value of 'Sheet1' or similar (unless there is a sheet named Sheet1).
What if I do not know the sheet name ahead of time? Can I reference the first or left-most sheet somehow? Failing that, is there a way to get a list of all the sheets? I have been looking at the API and have not found anything for that purpose, but this seems like such a basic need that I feel I'm missing something obvious.
You can retrieve the values and metadata of a Spreadsheet using spreadsheets.get of the Sheets API. With the fields parameter, you can select which information about the Spreadsheet is returned.
Sample 1 :
This sample retrieves the index, sheet ID and sheet name of each sheet in the Spreadsheet. In this case, index: 0 means the first sheet.
service.spreadsheets().get(spreadsheetId=_GS_ID, fields='sheets(properties(index,sheetId,title))').execute()
Sample 2 :
This sample retrieves the sheet name and the last row and last column numbers of the data range, using the sheet index. A sheet index of 0 means the first sheet.
res = service.spreadsheets().get(spreadsheetId=_GS_ID, fields='sheets(data/rowData/values/userEnteredValue,properties(index,sheetId,title))').execute()
sheetIndex = 0
sheetName = res['sheets'][sheetIndex]['properties']['title']
lastRow = len(res['sheets'][sheetIndex]['data'][0]['rowData'])
lastColumn = max([len(e['values']) for e in res['sheets'][sheetIndex]['data'][0]['rowData'] if e])
Reference :
spreadsheets.get
Convert column index into corresponding column letter
For the column, the thread above shows how to convert an index to a letter.
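For illustration, the index-to-letter conversion can be sketched as follows. The function names column_letter and a1_range are hypothetical, and the sample values stand in for lastRow and lastColumn from Sample 2; this is one common way to do it, not necessarily the method in the linked thread.

```python
def column_letter(index):
    """Convert a 1-based column index to its letter (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Shift to 0-based before dividing, because there is no zero digit in A..Z.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_range(last_row, last_column):
    """Build an A1-notation range from A1 to the given last row and column."""
    return f"A1:{column_letter(last_column)}{last_row}"


# Sample values standing in for lastRow and lastColumn from Sample 2.
print(a1_range(300, 18))
```

The resulting string can be prefixed with the sheet name (for example sheetName + "!") before being passed as the range argument.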
