I'm really new to Python, and I'm working with gspread and Google Sheets. I have several spreadsheets I would like to pull data from. They all have the same name with an appended numerical value (e.g., SpreadSheet(1), SpreadSheet(2), SpreadSheet(3), etc.).
I would like to parse through each spreadsheet, pull the data, and generate a single dataframe with the data. I can do this quite easily with a single spreadsheet, but I'm having trouble doing it with several.
I can create a list of the spreadsheet titles with the code below, but I'm not sure if that's the right direction.
titles_list = []
for spreadsheet in client.openall():
    titles_list.append(spreadsheet.title)
Using a mix of your starting code and @Tanaike's answer, here is a snippet of code that does what you expect.
import gspread
import pandas as pd

# Create an authorized client (credentials created beforehand)
client = gspread.authorize(credentials)

# Create a list to hold the values
values = []

# Get all spreadsheets
for spreadsheet in client.openall():
    # Get the spreadsheet's worksheets
    worksheets = spreadsheet.worksheets()
    for ws in worksheets:
        # Append the values of the worksheet to values
        values.extend(ws.get_all_values())

# Create a dataframe from the values
df = pd.DataFrame(values)
print(df)
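A small caveat on the snippet above: if each worksheet carries its own header row, a plain extend() will repeat that header in the middle of the data. One way to handle it, sketched here on plain lists so it runs without a live gspread client (in practice each inner list would be the result of ws.get_all_values()):

```python
import pandas as pd

def merge_sheet_values(all_sheet_values):
    """Merge rows from several sheets, keeping only the first header row."""
    values = []
    for i, sheet_values in enumerate(all_sheet_values):
        if i == 0:
            values.extend(sheet_values)      # keep the header from the first sheet
        else:
            values.extend(sheet_values[1:])  # skip the repeated header rows
    return values

# Stand-ins for ws.get_all_values() from two worksheets
sheets = [
    [["id", "name"], ["1", "Ann"]],
    [["id", "name"], ["2", "Bob"]],
]
merged = merge_sheet_values(sheets)
df = pd.DataFrame(merged[1:], columns=merged[0])
print(df)
```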
Hope I was clear.
I believe your goal is as follows.
You want to merge the values retrieved from all sheets in a Google Spreadsheet.
You want to convert the retrieved values to a dataframe.
Each sheet has 4 columns, 100 rows and no header rows.
You want to achieve this using gspread with python.
You have already been able to get and put values for Google Spreadsheet using Sheets API.
For this, how about this answer?
Flow:
Retrieve all sheets in the Google Spreadsheet using worksheets().
Retrieve all values from all sheets using get_all_values() and merge the values.
Convert the retrieved values to the dataframe.
Sample script:
spreadsheetId = "###" # Please set the Spreadsheet ID.
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(spreadsheetId)
worksheets = spreadsheet.worksheets()
values = []
for ws in worksheets:
    values.extend(ws.get_all_values())
df = pd.DataFrame(values)
print(df)
References:
worksheets()
get_all_values()
I have a data source that has a column containing hyperlinked text. When I read it with pandas, the hyperlinks are gone. I still want to get the URL for each of the rows and put it into a new column called "URL".
So, the idea is to create a new column that contains the URL. In this example, the pandas dataframe will have 4 columns:
Agreement Code
URL
Entity Name
Agreement Date
To my knowledge, pandas doesn't have this functionality; there is an open feature request for hyperlinks here. However, you can use openpyxl to accomplish this task:
import openpyxl

### Load the workbook and select the worksheet
wb = openpyxl.load_workbook('file_name.xlsx')
ws = wb['sheet_name']  # wb.get_sheet_by_name() is deprecated

### You can access a hyperlink like this by changing the row number
print(ws.cell(row=2, column=1).hyperlink.target)
You can iterate row-wise to get all the hyperlinks and store them in a new column. For more details regarding openpyxl, please refer to the docs.
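To illustrate that row-wise iteration, here is a sketch that builds a small workbook in memory and then collects each hyperlink target into a new "URL" column; the file contents and column positions are invented for the example, so adjust them to your real sheet:

```python
import openpyxl
import pandas as pd

# Build a tiny workbook in memory to stand in for the real file
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Agreement Code", "Entity Name", "Agreement Date"])
ws.append(["AC-001", "Acme Corp", "2021-01-15"])
ws.append(["AC-002", "Beta LLC", "2021-02-20"])
ws.cell(row=2, column=1).hyperlink = "https://example.com/ac-001"
ws.cell(row=3, column=1).hyperlink = "https://example.com/ac-002"

# Iterate row-wise and pull each hyperlink target (None where a cell has no link)
urls = []
for row in range(2, ws.max_row + 1):
    cell = ws.cell(row=row, column=1)
    urls.append(cell.hyperlink.target if cell.hyperlink else None)

# Read the same data into pandas and insert the URL column in second position
data = [[c.value for c in r] for r in ws.iter_rows(min_row=2)]
df = pd.DataFrame(data, columns=[c.value for c in ws[1]])
df.insert(1, "URL", urls)
print(df)
```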
I am working on an automation that consists of sending the values of a dataframe to a Google Sheet. The following is my code for a sample dataframe, which is similar to the one I am working on:
import pandas as pd

#Creates a dictionary containing values for a sample dataframe
col = {'id': ["1"], 'name': ["Juan"], 'code': ["1563"], 'group': ["3"], 'class': ["A"]}

#Creates a pandas dataframe
df = pd.DataFrame(col)
df
I need to send just the dataframe values to the Google Sheet, without the header. This is just a sample of the data I am working with, and of course I need the header in the dataframe itself, because I am doing some column transformations before sending it to Sheets, since the data comes from an API.
This is the code to send the dataframe to google sheet:
import gspread
from gspread_dataframe import set_with_dataframe

gc = gspread.service_account(filename='API_creds.json')
sheet = gc.open_by_key('SHEET_ID')

# Send the dataframe values to the google sheet
row = 1
col = 1
worksheet = sheet.get_worksheet(0)
set_with_dataframe(worksheet, df, row, col)
After sending the dataframe to Sheets through set_with_dataframe(worksheet, df, row, col), the sheet gets updated with the dataframe including the header. I need to update the sheet with just the values of the dataframe. How can I modify the parameters of set_with_dataframe() to achieve this?
You should be able to do this by setting the include_column_header argument to False.
set_with_dataframe(worksheet, df, row, col, include_column_header=False)
So basically I am making a Discord bot that takes trades for in-game items in a game I play and stores the order in a Google Sheet. What would be the easiest way to do this through Python? I know how to do all the bot stuff, but when it comes to accessing a Google Sheet, searching through it, and collecting certain rows of information, I can't find much that helps. Which module would make this as easy as possible? It needs to be able to search the sheet for specific values in one column, find the first empty cell in a column, and collect all the information from a row. If anyone knows a good module for doing this, it would be greatly appreciated.
Note: I have set up the OAuth and all that kind of stuff for the Sheets API. I saw that there are a bunch of modules that make accessing the sheet easier, so I was wondering which one is best at making the coding easier, as I am not super experienced.
Use the Google Sheets API to get the data, and then pandas to read the data in as a dataframe. Once you have the dataframe, pandas can accomplish everything you listed in various ways: searching the sheet for specific values in one column, finding the first empty cell in a column, and collecting all the information from a row.
from googleapiclient.discovery import build
from google.oauth2.credentials import Credentials
import pandas as pd
SAMPLE_SPREADSHEET_ID = 'sheet ID' # your sheet ID
SAMPLE_NAME = 'sheet name' # your sheet name
RANGE = '!A1:D2' # your row/col sheet range
TOKEN_PATH = 'token.json' # path to your token file
SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']
SAMPLE_RANGE_NAME = SAMPLE_NAME + RANGE
creds = Credentials.from_authorized_user_file(TOKEN_PATH, SCOPES)
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SAMPLE_SPREADSHEET_ID,
                            range=SAMPLE_RANGE_NAME).execute()
values = result.get('values', [])
df = pd.DataFrame(data=values[1:], columns=values[0])
Adapted from https://developers.google.com/sheets/api/quickstart/python
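Once the values are in a dataframe, the three operations from the question can look like this; the trade data below is made up for illustration:

```python
import pandas as pd

# Hypothetical trade data, as it might come back from the Sheets API
df = pd.DataFrame({
    "item":   ["sword", "shield", "sword", None],
    "price":  [100, 50, 120, None],
    "trader": ["alice", "bob", "carol", None],
})

# 1. Search one column for specific values
swords = df[df["item"] == "sword"]

# 2. Find the first empty cell in a column (index of the first missing value)
first_empty = df["item"].isna().idxmax() if df["item"].isna().any() else None

# 3. Collect all the information from one row
row_info = df.loc[0].to_dict()
```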
I want to scrape a website for some information; it would be 3 to 4 columns. The difficult part is that I want to export all the data to Google Sheets and make the crawler run at specific intervals. I'll be using Scrapy for this purpose. Any suggestions on how I can do this (by making a custom pipeline or any other way, as I don't have much experience writing custom pipelines)?
You can use the Google API and the Python pygsheets module.
Refer to this link for more details: Click Here
Please see the sample code below; it might help you.
import pygsheets
import pandas as pd
#authorization
gc = pygsheets.authorize(service_file='/Users/desktop/creds.json')
# Create empty dataframe
df = pd.DataFrame()
# Create a column
df['name'] = ['John', 'Steve', 'Sarah']
#open the google spreadsheet (where 'PY to Gsheet Test' is the name of my sheet)
sh = gc.open('PY to Gsheet Test')
#select the first sheet
wks = sh[0]
#update the first sheet with df, starting at cell A1.
wks.set_dataframe(df,(1,1))
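For the custom-pipeline route, one possible shape is a Scrapy item pipeline that buffers the scraped items and writes them to the sheet in a single call when the spider closes. This is only a sketch: the class name, credentials file, and sheet name are placeholders, and pygsheets is imported inside close_spider so the class can be defined without it. Running the crawl at intervals would then be handled outside the pipeline (e.g., with cron).

```python
import pandas as pd

class GoogleSheetPipeline:
    """Buffer scraped items and push them to a Google Sheet on close."""

    def __init__(self, service_file='creds.json', sheet_name='Scrape results'):
        self.service_file = service_file
        self.sheet_name = sheet_name
        self.items = []

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        # One bulk write at the end instead of one API call per item
        import pygsheets  # imported here so the class is importable without it
        gc = pygsheets.authorize(service_file=self.service_file)
        wks = gc.open(self.sheet_name)[0]
        wks.set_dataframe(pd.DataFrame(self.items), (1, 1))

# Enable it in settings.py, e.g.:
# ITEM_PIPELINES = {'myproject.pipelines.GoogleSheetPipeline': 300}
```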
I'm working on a program that generates a dynamic google spreadsheet report.
Sometimes when I create a new row (with data) in a Google spreadsheet using the gspread append_row function, it doesn't work as expected and no exception is thrown. The new line is added, but there is no data inside.
example code below:
#!/usr/bin/python
import gspread
# report line data
report_line = ['name', 'finished <None or int>', 'duration <str>', 'id']
connection = gspread.login('email#google.com', 'password')
spreadsheet = connection.open('report_name')
sheet = spreadsheet.sheet1
sheet.append_row(report_line)
Am I missing something? Is this a known issue?
How can I be certain that the append_row function completes successfully?
It appends a new row after the last row in the sheet. A sheet has 1000 rows by default, so you should find your appended row at row 1001.
Try to resize the sheet to the number of rows that are present:
sheet.resize(1)
You should now be able to append rows at the end of your data rather than at the end of the sheet. The number of rows has to be >= 1.
I'm adding to @BFTM's answer:
Make sure you call sheet.resize(1) only once, and not every time you want to append a row (it will delete all the rows you wrote beyond row 1).
To get the number of rows dynamically, in case you don't want to count it manually, you can look at this answer (get the values from one column and find their length).
If you have direct access to the spreadsheet, you can also just delete the empty rows once and then use append_row() as usual.
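If you prefer not to resize at all, a small helper can compute the first free row from one column's values: gspread's col_values() only returns values up to the last non-empty cell, so its length works even on a sheet padded to 1000 rows. The sheet and variable names below are placeholders:

```python
def next_free_row(col_values):
    """Given one column's values (as returned by sheet.col_values(1)),
    return the 1-based index of the first free row. col_values() stops
    at the last non-empty cell, so length + 1 is the next free row,
    even on a sheet padded with empty rows."""
    return len(col_values) + 1

# Usage sketch with gspread:
# sheet = connection.open('report_name').sheet1
# free_row = next_free_row(sheet.col_values(1))
# sheet.insert_row(report_line, index=free_row)
print(next_free_row(["name", "row2", "row3"]))  # → 4
```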