Get hyperlink from a cell in Google Sheets API v4 - Python

I want to get the hyperlink of a cell (A1, for example) in Python. I have this code so far. Thanks
properties = {
    "requests": [
        {
            "cell": {
                "HyperlinkDisplayType": "LINKED"
            },
            "fields": "userEnteredFormat.HyperlinkDisplayType"
        }
    ]
}
result = service.spreadsheets().values().get(
    spreadsheetId=spreadsheet_id, range=rangeName, body=properties).execute()
values = result.get('values', [])

How about using sheets.spreadsheets.get? This sample script assumes that the service object in your script can already be used with spreadsheets().values().get().
Sample script:
spreadsheetId = '### Spreadsheet ID ###'
range = ['sheet1!A1:A1']  # This is a sample.
result = service.spreadsheets().get(
    spreadsheetId=spreadsheetId,
    ranges=range,
    fields="sheets/data/rowData/values/hyperlink"
).execute()
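If that call works for you, the link sits deep inside the nested response. A minimal sketch of digging it out, assuming a single requested range and a single cell (note that the hyperlink field may be absent, for example when the link comes from an =HYPERLINK() formula):
sheets = result.get('sheets', [])
if sheets:
    rows = sheets[0]['data'][0].get('rowData', [])
    if rows and rows[0].get('values'):
        link = rows[0]['values'][0].get('hyperlink')
        print(link)  # the cell's URL, or None if the API did not return one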
If this was not useful for you, I'm sorry.

It seems to me like this is the only way to actually get the link info (address as well as display text):
result = service.spreadsheets().values().get(
    spreadsheetId=spreadsheetId, range=range_name,
    valueRenderOption='FORMULA').execute()
values = result.get('values', [])
This returns the raw content of the cells, which for a hyperlinked cell looks like this:
'=HYPERLINK("http://www.sample.com","sample link")'
For my use I've parsed it with the following simple regex:
r'=HYPERLINK\("(.*?)","(.*?)"\)'
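For example, a small sketch of applying that regex to one cell value (the first group is the URL and the second the display text, following the formula's argument order):
import re

HYPERLINK_RE = re.compile(r'=HYPERLINK\("(.*?)","(.*?)"\)')

cell = '=HYPERLINK("http://www.sample.com","sample link")'
match = HYPERLINK_RE.match(cell)
if match:
    url, label = match.groups()
    print(url, label)  # http://www.sample.com sample link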

You can check the hyperlink by adding this at the end:
print(values[0])

Related

Python for Google Sheets: Create new sheet (in the same workbook) automatically and write a new dataframe into it

I have a loop that runs 'n' times and generates 1 new dataframe in each iteration. How can I automatically create a new Google sheet (in the same workbook) every time a new dataframe is created at the end of each iteration?
I use this code to write one dataframe to a Google Sheet that I already created manually:
# Code to write one dataframe to Google Sheets:
cell_range_insert = 'B7'
values = df1.to_json()
body = {'values': values}
response_date = service.spreadsheets().values().append(
    spreadsheetId=spreadsheet_id,
    valueInputOption='RAW',
    range=cell_range_insert,
    body=dict(
        majorDimension='ROWS',
        values=df1.T.reset_index().T.values.tolist()
    )
).execute()
# Code to generate one dataframe in each iteration:
Ticker_List = ["AAPL", "GOOG", "AMZN"]
for Ticker in Ticker_List:
    values = [['=GOOGLEFINANCE("' + str(Ticker) + '", "ALL", "1/1/2014", "2/6/2018", "DAILY")']]
    cell_range_insert = 'B7'
    body = {'values': values}

    # Send formula to Google Sheet
    service.spreadsheets().values().update(
        spreadsheetId=spreadsheet_id,
        valueInputOption='USER_ENTERED',
        range=cell_range_insert,
        body=body
    ).execute()

    # Read back stock data from Google Sheets:
    response = service.spreadsheets().values().get(
        spreadsheetId=spreadsheet_id,
        majorDimension='ROWS',
        range='Sheet1'
    ).execute()

    # Read back data from Google Sheets and assign it to dataframe df:
    columns = response['values'][1]
    data = response['values'][2:]
    df = pd.DataFrame(data, columns=columns)
In your situation, how about the following flow?
Insert new sheets and put the formula into cell "B7" of each new sheet using the batchUpdate method.
Retrieve the values from each new sheet using the values.batchGet method.
When this flow is reflected in your script, it becomes as follows.
Modified script:
import time

import pandas as pd

spreadsheet_id = "###"  # Please set your Spreadsheet ID.

requests = []
Ticker_List = ["AAPL", "GOOG", "AMZN"]
start_sheet_id = 123456
for i, Ticker in enumerate(Ticker_List):
    sheet_id = start_sheet_id + i
    formula = '=GOOGLEFINANCE("' + str(Ticker) + '", "ALL", "1/1/2014", "2/6/2018", "DAILY")'
    requests.extend(
        [
            {"addSheet": {"properties": {"title": Ticker, "sheetId": sheet_id}}},
            {
                "updateCells": {
                    "start": {"sheetId": sheet_id, "rowIndex": 6, "columnIndex": 1},
                    "rows": [
                        {
                            "values": [
                                {"userEnteredValue": {"formulaValue": formula}}
                            ]
                        }
                    ],
                    "fields": "userEnteredValue",
                }
            },
        ]
    )
service.spreadsheets().batchUpdate(spreadsheetId=spreadsheet_id, body={"requests": requests}).execute()

time.sleep(5)  # Wait so the GOOGLEFINANCE formulas have time to finish calculating.

res = service.spreadsheets().values().batchGet(spreadsheetId=spreadsheet_id, ranges=[f"'{e}'!B7:G" for e in Ticker_List]).execute()

# Put the values into a dataframe.
for e in res["valueRanges"]:
    values = e["values"]
    columns = values[0]
    data = values[1:]
    df = pd.DataFrame(data, columns=columns)
    print(df)
When this script is run, three new sheets named "AAPL", "GOOG", and "AMZN" are inserted, and the formula is put into cell "B7" of each inserted sheet. This is done with the batchUpdate method.
After the batchUpdate call finishes, the script waits 5 seconds so the GOOGLEFINANCE formulas have time to calculate. Please adjust this for your actual situation; 10 seconds might be more suitable.
After the wait, the values are retrieved from each inserted sheet and put into a dataframe.
Note:
This script uses only two Sheets API requests (one batchUpdate and one values.batchGet).
The dataframe handling is taken from your script, so please modify it for your actual situation.
References:
Method: spreadsheets.batchUpdate
Method: spreadsheets.values.batchGet

Constructing a message format from the fetchall result in Python

(New to programming.)
Question: I need to take the "Data" below (two rows, as lists) queried from SQL and use it to create the message structure below.
Data from SQL using fetchall():
Data = [[100,1,4,5],[101,1,4,6]]
## expected message structure
message = {
    "name": "Tom",
    "Job": "IT",
    "info": [
        {
            "id_1": "100",
            "id_2": "1",
            "id_3": "4",
            "id_4": "5"
        },
        {
            "id_1": "101",
            "id_2": "1",
            "id_3": "4",
            "id_4": "6"
        },
    ]
}
I tried to create the method below to iterate over the rows and fill in the values. This was just a start, but it was also not working:
def create_message(data):
    for row in data:
        {
            "id_1": str(data[0][0]),
            "id_2": str(data[0][1]),
            "id_3": str(data[0][2]),
            "id_4": str(data[0][3]),
        }
Latest Code
def create_info(data):
    info = []
    for row in data:
        temp_dict = {"id_1_tom": "", "id_2_hell": "", "id_3_trip": "", "id_4_clap": ""}
        for i in range(0, 1):
            temp_dict["id_1_tom"] = str(row[i])
            temp_dict["id_2_hell"] = str(row[i + 1])
            temp_dict["id_3_trip"] = str(row[i + 2])
            temp_dict["id_4_clap"] = str(row[i + 3])
        info.append(temp_dict)
    return info
Edit: Updated answer based on updates to the question and comment by original poster.
Based on the attempt you've provided, this function should produce the desired output for the example you've given:
def create_info(data):
    info = []
    for row in data:
        temp_dict = {}
        temp_dict['id_1_tom'] = str(row[0])
        temp_dict['id_2_hell'] = str(row[1])
        temp_dict['id_3_trip'] = str(row[2])
        temp_dict['id_4_clap'] = str(row[3])
        info.append(temp_dict)
    return info
For the input:
[[100, 1, 4, 5], [101, 1, 4, 6]]
this function will return a list of dictionaries:
[{"id_1_tom": "100", "id_2_hell": "1", "id_3_trip": "4", "id_4_clap": "5"},
 {"id_1_tom": "101", "id_2_hell": "1", "id_3_trip": "4", "id_4_clap": "6"}]
This can serve as the value for the key info in your dictionary message. Note that you would still have to construct the message dictionary.
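For completeness, a small sketch of assembling the full message around that list (the "name" and "Job" values are the placeholders from the question, and the key names follow your latest function):
def create_message(data):
    # Wrap the per-row dictionaries in the outer message structure.
    return {
        "name": "Tom",
        "Job": "IT",
        "info": create_info(data),
    }

Data = [[100, 1, 4, 5], [101, 1, 4, 6]]
print(create_message(Data))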

API not accepting my JSON data from Python

I'm new to Python and dealing with JSON. I'm trying to grab an array of strings from my database and give them to an API. I don't know why I'm getting the missing data error. Can you guys take a look?
###########################################
rpt_cursor = rpt_conn.cursor()

sql = """SELECT `ContactID` AS 'ContactId' FROM
`BWG_reports`.`bounce_log_dummy`;"""

rpt_cursor.execute(sql)
row_headers = [x[0] for x in rpt_cursor.description]  # this will extract row headers
row_values = rpt_cursor.fetchall()

json_data = []
for result in row_values:
    json_data.append(dict(zip(row_headers, result)))
results_to_load = json.dumps(json_data)

print(results_to_load)  # Prints: [{"ContactId": 9}, {"ContactId": 274556}]

headers = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
}

targetlist = '302'

# This is for their PUT to "add multiple contacts to lists".
api_request_url = ('https://api2.xyz.com/api/list/' + str(targetlist)
                   + '/contactid/Api_Key/' + bwg_apikey)

print(api_request_url)  # Prints https://api2.xyz.com/api/list/302/contactid/Api_Key/#####

response = requests.put(api_request_url, headers=headers, data=results_to_load)
print(response)  # Prints <Response [200]>
print(response.content)  # Prints b'{"status":"error","Message":"ContactId is Required."}'

rpt_conn.commit()
rpt_cursor.close()
###########################################################
Edit for Clarity:
I'm passing it this [{"ContactId": 9}, {"ContactId": 274556}]
and I'm getting this response body b'{"status":"error","Message":"ContactId is Required."}'
The API doc gives this as the form to follow for the request body:
[
    {
        "ContactId": "string"
    }
]
When I manually put this data into their test tool, I get what I want:
[
    {
        "ContactId": "9"
    },
    {
        "ContactId": "274556"
    }
]
Maybe there is something wrong with json.dumps vs json.load? Am I not creating a dict, but rather a string that looks like a dict?
EDIT I FIGURED IT OUT!:
This was dumb.
I needed to define results_to_load = [] as an empty list before I loaded it with results_to_load = json.dumps(json_data).
Thanks for all the answers and attempts to help.
I would recommend checking the API docs to be sure, but from the error it seems the API expects a field named contactId whose value is an array, rather than an array of objects where every object has the key contactId. That is:
// correct
{
    "contactId": [9, 229]
}
instead of
// not correct
[{"contactId": 9}, {"contactId": 229}]
Tweaking this might help:
res = {}
contacts = []
for result in row_values:
    contacts.append(result)
res['contactId'] = contacts
...
...
response = requests.put(api_request_url, headers=headers, data=res)
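One caveat with that last line: passing a plain dict as data= makes requests form-encode the body rather than send JSON. A small sketch of sending it as JSON instead, reusing api_request_url and headers from the question (the payload shape here is the one suggested above and is an assumption about the API):
import json
import requests

payload = {'contactId': [9, 274556]}  # hypothetical payload in the suggested shape

# Either serialize it yourself and keep the explicit Content-Type header...
response = requests.put(api_request_url, headers=headers, data=json.dumps(payload))

# ...or let requests serialize the dict and set Content-Type for you.
response = requests.put(api_request_url, json=payload)
print(response.status_code, response.content)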
I FIGURED IT OUT!:
This was dumb.
I needed to define results_to_load = [] as an empty list before I loaded it with results_to_load = json.dumps(json_data).
Thanks for all the answers and attempts to help.

How to get all documents under an Elasticsearch index with the Python client?

I'm trying to get all of an index's documents using the Python client, but the result shows me only the first document.
This is my Python code:
res = es.search(index="92c603b3-8173-4d7a-9aca-f8c115ff5a18", doc_type="doc", body={
    'size': 10000,
    'query': {
        'match_all': {}
    }
})
print("%d documents found" % res['hits']['total'])
data = [doc for doc in res['hits']['hits']]
for doc in data:
    print(doc)
    return "%s %s %s" % (doc['_id'], doc['_source']['0'], doc['_source']['5'])
try "_doc" instead of "doc"
res = es.search(index="92c603b3-8173-4d7a-9aca-f8c115ff5a18", doc_type="_doc", body={
    'size': 100,
    'query': {
        'match_all': {}
    }
})
Elasticsearch by default retrieves only 10 documents. You could change this behaviour - doc here. The best practices for pagination are the search_after query and the scroll query; which one fits depends on your needs. Please read this answer: Elastic search not giving data with big number for page size. A scroll-based sketch is shown after the next snippet.
To show all the results:
for doc in res['hits']['hits']:
    print(doc['_id'], doc['_source'])
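A minimal sketch of the scroll approach mentioned above, using the helpers.scan wrapper from the official client (index name taken from the question; adjust the query to your needs):
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch()

# scan() drives the scroll API under the hood and yields every matching document.
for hit in scan(es, index="92c603b3-8173-4d7a-9aca-f8c115ff5a18",
                query={"query": {"match_all": {}}}):
    print(hit['_id'], hit['_source'])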
You can try the following query. It will return all the documents.
result = es.search(index="index_name", body={"query":{"match_all":{}}})
You can also use elasticsearch_dsl and its Search API which allows you to iterate over all your documents via the scan method.
import elasticsearch
from elasticsearch_dsl import Search

client = elasticsearch.Elasticsearch()
search = Search(using=client, index="92c603b3-8173-4d7a-9aca-f8c115ff5a18")

for hit in search.scan():
    print(hit)
I don't see it mentioned that the index must be refreshed if you have just added data. Use this:
es.indices.refresh(index="index_name")

Dynamically create list/dict during for loop for conversion to JSON

I am trying to build a list/dict that will be converted to JSON later on. I am trying to write the code that builds and populates the multiple levels of the JSON format I ultimately need. I am having an issue wrapping my head around this. Thank you for the help.
What I ultimately need is to populate this list/dict:
dataset_permission_json = []
with this format:
{
    "projects": [
        {
            "project": "test-project-1",
            "datasets": [
                {
                    "dataset": "testing1",
                    "permissions": [
                        {
                            "role": "READER",
                            "google_group": "testing1#test.com"
                        }
                    ]
                },
                {
                    "dataset": "testing2",
                    "permissions": [
                        {
                            "role": "OWNER",
                            "google_group": "testing2#test.com"
                        }
                    ]
                },
                {
                    "dataset": "testing3",
                    "permissions": [
                        {
                            "role": "READER",
                            "google_group": "testing3#test.com"
                        }
                    ]
                },
                {
                    "dataset": "testing4",
                    "permissions": [
                        {
                            "role": "WRITER",
                            "google_group": "testing4#test.com"
                        }
                    ]
                }
            ]
        }
    ]
}
I have multiple for loops that successfully print out the information I am pulling from an external API, but I need to be able to enter that data into the list/dict. The dynamic values I am trying to input are:
'project', e.g. test-project-1
'dataset', e.g. testing1
'role', e.g. READER
'google_group', e.g. testing1#test.com
I have tried things like:
dataset_permission_json.update({'project': project})
but cannot figure out how not to overwrite the data during the multiple for loops.
for project in projects:
    print(project)  ## Need to add this variable to 'projects'
    for bq_group in bq_groups:
        delegated_credentials = credentials.create_delegated(bq_group)
        http_auth = delegated_credentials.authorize(Http())
        list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
        datasets = list_datasets_in_project.get('datasets', [])
        print(dataset['datasetReference']['datasetId'])  ## Add the dataset to 'datasets' under the project
        for dataset in datasets:
            get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
            dataset_permissions = get_dataset_permissions_result.get('access', [])
            ### ADD THE NEXT LEVEL 'permissions' here?
            for dataset_permission in dataset_permissions:
                if 'groupByEmail' in dataset_permission:
                    if bq_group in dataset_permission['groupByEmail']:
                        print(dataset['datasetReference']['datasetId'], dataset_permission['groupByEmail'])  ## Add to each dataset
I appreciate the help.
EDIT: Updated Progress
OK, I have created the nested structure that I was looking for, using another StackOverflow answer.
Things are great except for the last part. I am trying to append the role & group to each 'permission' nest, but after everything runs the data is only appended to the last 'permission' nest in the JSON structure. It seems like it is overwriting itself during the for loop. Thoughts?
Updated for loop:
for project in projects:
    for bq_group in bq_groups:
        delegated_credentials = credentials.create_delegated(bq_group)
        http_auth = delegated_credentials.authorize(Http())
        list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
        datasets = list_datasets_in_project.get('datasets', [])
        for dataset in datasets:
            get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
            dataset_permissions = get_dataset_permissions_result.get('access', [])
            for dataset_permission in dataset_permissions:
                if 'groupByEmail' in dataset_permission:
                    if bq_group in dataset_permission['groupByEmail']:
                        dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
                        permission = {'group': dataset_permission['groupByEmail'], 'role': dataset_permission['role']}
                        dataset_permission_json['permissions'] = permission
UPDATE: Solved.
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
permission = {'group': dataset_permission['groupByEmail'],'role': dataset_permission['role']}
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions'] = permission
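For anyone hitting the same overwrite problem, a minimal sketch of one way to build the nested structure so that permissions are appended rather than replaced; the helper and the project/dataset/permission variables are illustrative and mirror the loop above:
dataset_permission_json = {"projects": []}

def get_or_append(items, key, value, extra):
    # Return the dict in items whose key matches value, creating and appending it if missing.
    for item in items:
        if item[key] == value:
            return item
    item = {key: value, **extra}
    items.append(item)
    return item

# Inside the innermost loop, instead of assigning to a single 'permissions' key:
project_entry = get_or_append(dataset_permission_json["projects"], "project", project, {"datasets": []})
dataset_entry = get_or_append(project_entry["datasets"], "dataset",
                              dataset['datasetReference']['datasetId'], {"permissions": []})
dataset_entry["permissions"].append({
    "role": dataset_permission['role'],
    "google_group": dataset_permission['groupByEmail'],
})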
