Paginating Azure Cosmos DB with the Python SDK using continuation tokens

I am trying to implement pagination over an API using the Azure Cosmos Python SDK. From what I have read and understood, continuation tokens are needed. However, I cannot find any function in the SDK documentation that consumes the token and returns the remaining data from the query. My current flow:
Initialize CosmosClient
Get database object
Get container object
Query the container, setting max_item_count=1
Get the paged response and send it as the response to the API call
Now, if I want the next page of the query in another API call, where do I pass the continuation token so that I can continue from where the previous query left off?
from azure.cosmos import exceptions, CosmosClient, PartitionKey

endpoint = "https://xxxxxxxx.documents.azure.com:443/"
key = '===xxxx===xxxx===xxx'
client = CosmosClient(endpoint, key)

database_name = 'test'
database = client.create_database_if_not_exists(id=database_name)

container_name = 'FamilyContainer'
container = database.get_container_client(container_name)

query = "SELECT * FROM c"
items = container.query_items(
    query=query,
    max_item_count=1,
    enable_cross_partition_query=True
)

pager = items.by_page()
first_page = list(pager.next())
print("first page: ", first_page)
Now, if I want the next page in another API call, where do I pass the continuation token?
Azure SDK versions:
$ pip freeze | grep azure
azure-core==1.9.0
azure-cosmos==4.2.0
azure-nspkg==3.0.2
azure-storage-blob==12.6.0
azure-storage-nspkg==3.1.0

Here is an example of how to use it.
Here is the README file for the SDK, with lots of tips and valuable information, including its limitations.
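In short, the page iterator returned by by_page() exposes a continuation_token attribute once a page has been fetched, and by_page() accepts a previously saved token to resume the query. A minimal sketch (assuming azure-cosmos 4.x, continuing the code from the question):

# First API call: fetch one page and hand the token back to the caller
items = container.query_items(
    query="SELECT * FROM c",
    max_item_count=1,
    enable_cross_partition_query=True
)
pager = items.by_page()
first_page = list(pager.next())
token = pager.continuation_token   # return this to the client along with the page

# Next API call: resume the same query from the saved token
items = container.query_items(
    query="SELECT * FROM c",
    max_item_count=1,
    enable_cross_partition_query=True
)
pager = items.by_page(token)       # pass the token received back from the client
second_page = list(pager.next())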

Related

Receive HTTP request with Variable, Query BQ and Return Response

I'm looking to create a Cloud Function in GCP that receives an HTTP request with parameters, passes those parameters into a SQL statement against BigQuery, and returns a result that I can pass back to a website.
I am very new at this and I am not an engineer by any stretch. I have got to the point where my Cloud Function deploys correctly and I receive an "OK" response in the browser when I hit it, but I can't get the values returned from BigQuery to show in the browser.
Here's my function so far, and thanks for any help in advance.
import google.cloud.bigquery

def audience(QUERY):
    # BQ Query to get add to cart sessions
    QUERY = """select
visitId,
from bigquery-public-data.google_analytics_sample.ga_sessions_20170801
limit 10;
return QUERY"""

print(audience)
This is an example of a Cloud Function that runs exactly the query you mention in your post. Nonetheless, it can easily be adapted to any other query according to your needs. You basically need to follow this tutorial in order to deploy the function and get a basic understanding of how to query data using the Client Library for BigQuery.
Here is the summary of what you need to do:
Create a folder (e.g. cloudfunctionsexample), cd into it (cd cloudfunctionsexample), and create two files inside it: main.py and requirements.txt.
a. main.py :
from flask import escape
from google.cloud import bigquery

client = bigquery.Client()

def bigquery_example(request):
    request_json = request.get_json(silent=True)
    request_args = request.args

    # Check if the request has all the parameters needed to run the query
    if request_json and 'column' in request_json:
        column = request_json['column']
    elif request_args and 'column' in request_args:
        column = request_args['column']
    else:
        return('You are missing the column parameter on the request.')

    if request_json and 'name' in request_json:
        name = request_json['name']
    elif request_args and 'name' in request_args:
        name = request_args['name']
    else:
        return('You are missing the name of the dataset parameter on the request.')

    if request_json and 'limit' in request_json:
        limit = request_json['limit']
    elif request_args and 'limit' in request_args:
        limit = request_args['limit']
    else:
        return('You are missing the limit parameter on the request.')

    # Construct the query based on the parameters
    QUERY = ('SELECT '+column+' FROM `'+name+'` LIMIT '+limit)
    #print(QUERY)

    try:
        query_job = client.query(QUERY)  # API request
        rows = query_job.result()  # Waits for query to finish
        # Build a list and make the results HTML friendly so they can be displayed in the browser
        row_list = []
        for row in rows:
            row_list.append(str(row[column]))
        return("<p>" + "</p><p>".join(row_list) + "</p>")
    except Exception as e:
        return(str(e))
b. requirements.txt :
flask
google-cloud-bigquery
Assuming you have the Cloud SDK installed and that the App Engine default service account (which is the default account used by Cloud Functions) has the Editor role assigned, run the following command to deploy the function to your project:
gcloud functions deploy bigquery_http_example --runtime python37 --trigger-http --allow-unauthenticated --entry-point=bigquery_example --timeout=540
Get the Cloud Function URL and either use the curl command to make a POST request, or simply add the parameters to the Cloud Function URL to make an HTTP request to the Cloud Function endpoint and see the results directly in your browser.
a. curl :
curl -X POST https://[REGION-FUNCTIONS_PROJECT_ID].cloudfunctions.net/bigquery_http_example -H "Content-Type:application/json" -d '{"column":"visitId","name":"bigquery-public-data.google_analytics_sample.ga_sessions_20170801","limit":"10"}'
b. Cloud Function URL :
https://[REGION-FUNCTIONS_PROJECT_ID].cloudfunctions.net/bigquery_http_example?column=visitId&name=bigquery-public-data.google_analytics_sample.ga_sessions_20170801&limit=10
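If you prefer Python over curl on the client side, a small sketch using the requests library against the same endpoint (the URL below is the same placeholder as above):

import requests

url = "https://[REGION-FUNCTIONS_PROJECT_ID].cloudfunctions.net/bigquery_http_example"
payload = {
    "column": "visitId",
    "name": "bigquery-public-data.google_analytics_sample.ga_sessions_20170801",
    "limit": "10"
}

# POST the parameters as JSON, mirroring the curl example above
response = requests.post(url, json=payload)
print(response.text)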

Where would I receive an HTTP request in AWS

I am a beginner with Amazon Web Services.
I have the below Lambda Python function:
import sys
import logging
import pymysql
import json

rds_host = ".amazonaws.com"
name = "name"
password = "123"
db_name = "db"
port = 3306

def save_events(event):
    result = []
    conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name,
                           connect_timeout=30)
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("select * from bodyPart")
        result = cur.fetchall()
        cur.close()
        print("Data from RDS...")
        print(result)
    bodyparts = json.dumps(result)
    bodyParts = bodyparts.replace("\"", "'")
    return bodyParts

def lambda_handler(event, context):
    return save_events(event)
Using the above function I am sending JSON to the client through API Gateway. Now suppose the user selects an item from the list and sends it back as JSON: where will I receive that HTTP request, and how should I process it?
I just want to add some additional information to @Harsh Manvar's answer.
The easiest way, I think, is to use API Gateway proxy integration with Lambda (api-gateway-proxy-integration-lambda).
Currently API Gateway supports AWS Lambda very well; you can pass the request body (JSON) to your Lambda function through the event.
I use it every day in a hobby project (a Slack command bot, which is harder because you need to map from application/x-www-form-urlencoded to JSON through a mapping template).
For you I think it is simpler, because you are using only JSON for the request and response. The key is to set the Integration type to Lambda function.
You can find some quick tutorials on Medium.com for more detail; I am only linking the docs from Amazon.
@mohith: Hi, I just put together a simple approach for you; you can see it here.
First you need to create an API (see the docs above) and then link it to your Lambda function. Because you only use JSON, you need to check the option named Use Lambda Proxy integration, like this:
Then you need to deploy it!
Then in your function you can handle your code; in my case, I return the whole event that is passed to my function, like this:
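For example, a minimal Python sketch of such a handler (with Lambda proxy integration the function must return a dict with a statusCode and a string body; the names here are illustrative, not my exact code):

import json

def lambda_handler(event, context):
    # With proxy integration the whole HTTP request arrives in `event`;
    # the raw request body is available as a string in event['body'].
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(event)  # echo the incoming event back to the caller
    }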
Finally you can post to your endpoint; I used Postman in my case:
I hope you get my idea: once you have successfully deployed your API, you can do anything with it from your front end.
I suggest you also research CloudWatch; when you work with API Gateway, Lambda, etc. it is a Swiss army knife, you cannot live without it, and it makes tracing and debugging your code very easy.
Please do not hesitate to ask me anything.
You can use the AWS service called API Gateway; it will give you an endpoint for HTTP API requests.
API Gateway connects to your Lambda, and you can pass the HTTP request through to the Lambda function.
Here is info about creating a REST API on Lambda that you can check out: https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-create-api.html
AWS also provides GET/POST Lambda examples; you just have to edit the code and it will automatically create the API Gateway for you. You can check it as a reference.
From the Lambda Console > Create function > choose AWS Serverless Application Repository > type "get" in the search bar and search > api-lambda-dynamodb > it will take a value from the user and process it in Lambda.
Here is a link where you can check the examples directly: https://console.aws.amazon.com/lambda/home?region=us-east-1#/create?tab=serverlessApps

How to query AWS CloudSearch domain using Python boto3 library?

I'm trying to use boto3 to query my CloudSearch domain using the docs as a guide: http://boto3.readthedocs.io/en/latest/reference/services/cloudsearchdomain.html#client
import boto3
import json

boto3.setup_default_session(profile_name='myprofile')
cloudsearch = boto3.client('cloudsearchdomain')
response = cloudsearch.search(
    query="(and name:'foobar')",
    queryParser='structured',
    returnFields='address',
    size=10
)
print(json.dumps(response))
...but it fails with:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://cloudsearchdomain.eu-west-1.amazonaws.com/2013-01-01/search"
But how am I supposed to set or configure the endpoint or domain that I want to connect to? I tried adding an endpoint parameter to the request, thinking maybe it was an accidental omission from the docs, but I got this error response:
Unknown parameter in input: "endpoint", must be one of: cursor, expr, facet, filterQuery, highlight, partial, query, queryOptions, queryParser, return, size, sort, start, stats
The docs say:
The endpoint for submitting Search requests is domain-specific. You submit search requests to a domain's search endpoint. To get the search endpoint for your domain, use the Amazon CloudSearch configuration service DescribeDomains action. A domain's endpoints are also displayed on the domain dashboard in the Amazon CloudSearch console.
I know what my search endpoint is, but how do I supply it?
I found a post on a Google forum with the answer. You have to add the endpoint_url parameter to the client constructor, e.g.
client = boto3.client('cloudsearchdomain', endpoint_url='http://...')
I hope those docs get updated, because I wasted a lot of time before I figured that out.
import boto3

client = boto3.client('cloudsearchdomain',
                      aws_access_key_id='access-key',
                      aws_secret_access_key='some-secret-key',
                      region_name='us-east-1',   # your chosen region
                      endpoint_url='cloudsearch-url'
                      # endpoint_url is your Search Endpoint as defined in the AWS console
                      )

response = client.search(
    query='Foo',  # your search string
    size=10
)
Reference response['hits'] for returned results.
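If you don't already know your search endpoint, a hedged sketch of looking it up with the CloudSearch configuration client (describe_domains) and then iterating over the hits; the domain name and region below are placeholders:

import boto3

# The configuration client (cloudsearch) knows about your domains;
# the data client (cloudsearchdomain) is what you actually search with.
config_client = boto3.client('cloudsearch', region_name='eu-west-1')
domains = config_client.describe_domains(DomainNames=['my-domain'])
search_endpoint = domains['DomainStatusList'][0]['SearchService']['Endpoint']

client = boto3.client('cloudsearchdomain',
                      region_name='eu-west-1',
                      endpoint_url='https://' + search_endpoint)

response = client.search(
    query="(and name:'foobar')",
    queryParser='structured',
    returnFields='address',
    size=10
)
for hit in response['hits']['hit']:
    print(hit['id'], hit.get('fields'))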

How to modify API gateway integration request using Boto3

I have created an API in API Gateway from my existing API using the boto3 import command.
apiClient = boto3.client('apigateway', awsregion)
api_response = apiClient.import_rest_api(
    failOnWarnings=True,
    body=open('apifileswagger.json', 'rb').read()
)
But I can't modify the integration request. I tried the following boto3 command.
apiClient = boto3.client('apigateway', awsregion)
api_response = apiClient.put_integration(
    restApiId=apiName,
    resourceId='/api/v1/hub',
    httpMethod='GET',
    integrationHttpMethod='GET',
    type='AWS',
    uri='arn:aws:lambda:us-east-1:141697213513:function:test-lambda',
)
But I got an error like this:
Unexpected error: An error occurred () when calling the PutIntegration operation:
I need to change the Lambda function region and name using a boto3 command. Is that possible?
If it is possible, what is the actual issue with this command?
In the put_integration() call listed above, your restApiId and resourceId look incorrect. Here's what you should do.
After importing your rest API, check to see if it is available by calling your apiClient's get_rest_apis(). If the API was imported correctly, you should see it listed in the response along with the API's ID (which is generated by AWS). Capture this ID for future operations.
Next, you'll need to look at all of the resources associated with this API by calling your apiClient's get_resources(). Capture the resource ID for the resource you wish to modify.
Using the API ID and resource ID, check to see if an integration config exists by calling your apiClient's get_integration(). If it does exist you can modify the integration request by calling update_integration(); if it does not exist, you need to create a new integration by calling put_integration() and passing the integration request as a parameter.
Here's an example of how that might look in code:
# Import API
api_response1 = apiClient.import_rest_api(failOnWarnings=True, body=open('apifileswagger.json', 'rb').read())
print(api_response1)

# Get API ID
api_response2 = apiClient.get_rest_apis()
for endpoint in api_response2['items']:
    if endpoint['name'] == "YOUR_API_NAME":
        api_ID = endpoint['id']

# Get Resource ID
api_response3 = apiClient.get_resources(restApiId=api_ID)
for resource in api_response3['items']:
    if resource['path'] == "YOUR_PATH":
        resource_ID = resource['id']

# Check for Existing Integrations
api_response4 = apiClient.get_integration(restApiId=api_ID, resourceId=resource_ID, httpMethod='GET')
print(api_response4)

# Create Integration with Request
integration_request = { 'application/json': '{\r\n "body" : $input.json(\'$\'),\r\n}' }
api_response5 = apiClient.put_integration(restApiId=api_ID, resourceId=resource_ID, httpMethod='GET', type='AWS',
                                          integrationHttpMethod='GET', uri="YOUR_LAMBDA_URI", requestTemplates=integration_request)
print(api_response5)
All the methods listed above are explained in the Boto3 Documentation found here.
As with most API Gateway updates to API definitions, in order to update an integration request you have to do a PATCH and pass a body with a patch document in the expected format. See the documentation here.
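For illustration, a hedged sketch (reusing the variable names from the example above) of changing an existing integration's Lambda URI with update_integration and a patch operation; the account ID and function name are placeholders:

# Repoint an existing GET integration at a different Lambda function
api_response6 = apiClient.update_integration(
    restApiId=api_ID,
    resourceId=resource_ID,
    httpMethod='GET',
    patchOperations=[
        {
            'op': 'replace',
            'path': '/uri',
            # Lambda integration URIs use the apigateway "path" format;
            # the account ID and function name below are placeholders.
            'value': 'arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:123456789012:function:my-new-lambda/invocations'
        }
    ]
)
print(api_response6)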

How to use Bigquery streaming insertall on app engine & python

I would like to develop an App Engine application that streams data directly into a BigQuery table.
According to Google's documentation there is a simple way to stream data into BigQuery:
http://googlecloudplatform.blogspot.co.il/2013/09/google-bigquery-goes-real-time-with-streaming-inserts-time-based-queries-and-more.html
https://developers.google.com/bigquery/streaming-data-into-bigquery#streaminginsertexamples
(note: in the above link you should select the Python tab, not Java)
Here is the sample code snippet on how streaming insert should be coded:
body = {"rows":[
{"json": {"column_name":7.7,}}
]}
response = bigquery.tabledata().insertAll(
projectId=PROJECT_ID,
datasetId=DATASET_ID,
tableId=TABLE_ID,
body=body).execute()
Although I've downloaded the client API, I didn't find any reference to a "bigquery" module/object like the one referenced in Google's example above.
Where should the bigquery object (from the snippet) be located?
Can anyone show a more complete way to use this snippet (with the right imports)?
I've been searching for this a lot and found the documentation confusing and partial.
Minimal working example (as long as you fill in the right IDs for your project):
import httplib2
from apiclient import discovery
from oauth2client import appengine

_SCOPE = 'https://www.googleapis.com/auth/bigquery'

# Change the following 3 values:
PROJECT_ID = 'your_project'
DATASET_ID = 'your_dataset'
TABLE_ID = 'TestTable'

body = {"rows": [
    {"json": {"Col1": 7}}
]}

credentials = appengine.AppAssertionCredentials(scope=_SCOPE)
http = credentials.authorize(httplib2.Http())
bigquery = discovery.build('bigquery', 'v2', http=http)

response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=DATASET_ID,
    tableId=TABLE_ID,
    body=body).execute()

print response
As Jordan says: "Note that this uses the appengine robot to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset. Note that if you also want to use the robot to run queries, not just stream, you need the robot to be a member of the project 'team' so that it is authorized to run jobs."
Here is a working code example from an appengine app that streams records to a BigQuery table. It is open source at code.google.com:
http://code.google.com/p/bigquery-e2e/source/browse/sensors/cloud/src/main.py#124
To find out where the bigquery object comes from, see
http://code.google.com/p/bigquery-e2e/source/browse/sensors/cloud/src/config.py
Note that this uses the appengine robot to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset.
Note that if you also want to use the robot to run queries, not just stream, you need the robot to be a member of the project 'team' so that it is authorized to run jobs.
