Problem with continuation token when querying from Cosmos DB - python

I'm facing a problem with continuation tokens when querying items from Cosmos DB.
I've already tried the following solution, but with no success: I'm only able to query the first 10 results of a page, even though I get a token that is not NULL.
The token has a size of 10733 bytes and looks like this:
{"token":"+RID:gtQwAJ9KbavOAAAAAAAAAA==#RT:1#TRC:10#FPP:AggAAAAAAAAAAGoAAAAAKAAAAAAAAAAAAADCBc6AEoAGgAqADoASgAaACoAOgBKABoAKgA6AE4AHgAuAD4ASgAeACoAPgBOAB4ALgA+AE4AHgAqAD4ASgAeAC4APgBOAB4ALgA+AE4AIgA2AEYAFgAmADYARgAaACYAPgBKABYAKgA6AE4AHgAuAD4ATgAeAC4APgBOAB4ALgA+AE4AIgAuAD4ATgAeAC4APgBOACIAMgA+AFIAIgAyAD4AUgAmADIAQgAWACIALgBCABIAIgAyAEIAEgAiADIAQgAOACYANgBKAB4AJgA6AEYAGgAqADoATgAeAC4APgB....etc...etc","range":{"min":"","max":"05C1BF3FB3CFC0"}}
My code looks like this. The function QueryDocuments did not work; instead I had to use QueryItems.
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 10
q = client.QueryItems(collection_link, query, options)
results_1 = q._fetch_function(options)
# the continuation token is a string representing a JSON object
token = results_1[1]['x-ms-continuation']
data = list(q._fetch_function({'maxItemCount': 10, 'enableCrossPartitionQuery': True, 'continuation': token}))
Is there a solution to this? Thanks for your help.

Please use the pydocumentdb package and refer to the sample code below.
from pydocumentdb import document_client

endpoint = "https://***.documents.azure.com:443/"
primaryKey = "***"
client = document_client.DocumentClient(endpoint, {'masterKey': primaryKey})
collection_link = "dbs/db/colls/coll"
query = "select c.id from c"
query_with_optional_parameters = []
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 2})
results_1 = q._fetch_function({'maxItemCount': 2})
print(results_1)
token = results_1[1]['x-ms-continuation']
results_2 = q._fetch_function({'maxItemCount': 2, 'continuation': token})
print(results_2)
Output: (console output not reproduced here)
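If you need to keep paging until the query is exhausted, a minimal sketch reusing client, collection_link and query from the sample above might look like this. Note that _fetch_function is an internal API; this sketch assumes, as the results_1[1] indexing above suggests, that it returns a (documents, headers) pair:
options = {'maxItemCount': 2}
q = client.QueryDocuments(collection_link, query, options)
docs, headers = q._fetch_function(options)
all_docs = list(docs)
# keep fetching while the service returns a continuation header
while headers.get('x-ms-continuation'):
    options['continuation'] = headers['x-ms-continuation']
    docs, headers = q._fetch_function(options)
    all_docs.extend(docs)
print(len(all_docs))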

Related

Elasticsearch DSL update by scan?

Is it possible to update with the scan() method from Python's elasticsearch-dsl library?
For example:
search = Search(index=INDEX).query(query).params(request_timeout=6000)
print('Scanning Query and updating.')
for hit in search.scan():
    _id = hit.meta.id
    # sql query to get the record from the database
    record = get_record_by_id(_id)
    hit.column1 = record['column1']
    hit.column2 = record['column2']
    # not sure what to use to update here
This can be done by using the elasticsearch client:
for hit in search.scan():
    _id = hit.meta.id
    print(_id)
    # get data from the database
    record = get_record_by_id(_id)
    body = dict(record)
    body.pop('_id')
    # update via the elasticsearch client
    elastic_client.update(
        index=INDEX,
        id=_id,
        body={"doc": body}
    )
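If there are many documents, issuing one update() call per hit is slow. As a sketch, the same loop can be fed through the bulk helper instead, assuming the same elastic_client, INDEX and get_record_by_id as above:
from elasticsearch.helpers import bulk

def update_actions():
    # yield one partial-update action per scanned hit
    for hit in search.scan():
        _id = hit.meta.id
        record = get_record_by_id(_id)
        body = dict(record)
        body.pop('_id')
        yield {
            "_op_type": "update",
            "_index": INDEX,
            "_id": _id,
            "doc": body,
        }

bulk(elastic_client, update_actions())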

Orator ORM model Create method invalid SQL

I have a database I created with a migration. One of my tables looks like this
def create_customer_table(self):
    with self.schema.create("customer") as table:
        table.char("name", 120).unique()
        table.integer("transmitting_hours").default(24)  # how many hours after transmission the vehicle is considered transmitting
        table.boolean("is_tpms").default(False)
        table.boolean("is_dor").default(False)
        table.boolean("is_otr").default(False)
        table.boolean("is_track_and_trace").default(False)
        table.char("contact_person", 25)
        table.char("created_by", 25)
        table.enum("temperature_unit", TEMP_UNITS)
        table.enum("pressure_unit", PRESSURE_UNITS)
        table.enum("distance_unit", DISTANCE_UNITS)
        table.char("time_zone", 25)
        table.char("language", 2)
        table.timestamps()
I have a very simplistic ORM model on top
class Customer(Model):
    __table__ = "customer"
    __timestamps__ = False
    __primary_key__ = "name"
    __fillable__ = ['*']
I then try to do a basic insert with the following code
def add_sample_customer():
    sample_customer = {}
    sample_customer["name"] = "customer_2"
    sample_customer["contact_person"] = "Abradolf"
    sample_customer["created_by"] = "Frodo"
    sample_customer["time_zone"] = "GMT-5"
    sample_customer["language"] = "EN"
    sample_customer["temperature_unit"] = "FAHRENHEIT"
    sample_customer["pressure_unit"] = "PSI"
    sample_customer["distance_unit"] = "MI"
    customer_model = Customer.create(_attributes=sample_customer)
The exception I get from this code looks like
orator.exceptions.query.QueryException: syntax error at or near ")"
LINE 1: INSERT INTO "customer" () VALUES () RETURNING "name"
(SQL: INSERT INTO "customer" () VALUES () RETURNING "name" ([]))
It looks like Orator just isn't filling in the columns and values here. I have also tried a few different syntactic ways of passing the dict in, using **sample_customer and also passing the dict in directly, and none of them work; all fail with the same exception. I started debugging by printing from inside the Orator libraries but haven't gotten anywhere yet.
My inserts work if I do the model attribute assignment individually and use the model.save() method, like this
def add_sample_customer():
    sample_customer = {}
    sample_customer["name"] = "customer_2"
    sample_customer["contact_person"] = "Abradolf"
    sample_customer["created_by"] = "Frodo"
    sample_customer["time_zone"] = "GMT-5"
    sample_customer["language"] = "EN"
    sample_customer["temperature_unit"] = "FAHRENHEIT"
    sample_customer["pressure_unit"] = "PSI"
    sample_customer["distance_unit"] = "MI"
    customer_model = Customer()
    for k, v in sample_customer.items():
        setattr(customer_model, k, v)
    customer_model.save()
Does anyone understand why the model.create() syntax fails?
I would think the answer is simply passing the dictionary instead of using keyword notation with _attributes:
Customer.create(sample_customer)
or
Customer.create(attribute=value, attribute2=value2, ...)
which are the valid notations.
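For example, with the sample dict from the question (only a few columns shown; the remaining ones are filled in the same way), either form should insert the row; run only one of them, since name is the primary key:
sample_customer = {
    "name": "customer_2",
    "contact_person": "Abradolf",
    "created_by": "Frodo",
    # ... remaining columns as in the question ...
}
# pass the dict positionally
customer_model = Customer.create(sample_customer)
# or unpack it into keyword arguments
# customer_model = Customer.create(**sample_customer)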

Cosmos DB - Delete Document with Python

In this SO question I had learnt that I cannot delete a Cosmos DB document using SQL.
Using Python, I believe I need the DeleteDocument() method. This is how I'm getting the document IDs that are required (I believe) to then call the DeleteDocument() method.
# set up the client
client = document_client.DocumentClient()
# use a SQL based query to get a bunch of documents
query = { 'query': 'SELECT * FROM server s' }
result_iterable = client.QueryDocuments('dbs/DB/colls/coll', query, options)
results = list(result_iterable)
for x in range(0, len(results)):
    docID = results[x]['id']
Now, at this stage I want to call DeleteDocument().
Its inputs are document_link and options.
I can define document_link as something like
document_link = 'dbs/DB/colls/coll/docs/' + docID
and I can successfully call ReadAttachments(), for example, which has the same inputs as DeleteDocument().
When I do, however, I get an error:
The partition key supplied in x-ms-partitionkey header has fewer components than defined in the collection
...and now I'm totally lost.
UPDATE
Following on from Jay's help, I believe I'm missing the partitionKey element in the options.
In this example, I've created a testing database (the screenshot showing its setup is omitted here).
So I think my partition key is /testPART.
When I include the partitionKey in the options, however, no results are returned (and so print len(results) outputs 0).
Removing partitionKey means that results are returned, but the delete attempt fails as before.
# Query them in SQL
query = { 'query': 'SELECT * FROM c' }
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
options['partitionKey'] = '/testPART'
result_iterable = client.QueryDocuments('dbs/testDB/colls/testCOLL', query, options)
results = list(result_iterable)
# should be > 0
print len(results)
for x in range(0, len(results)):
    docID = results[x]['id']
    print docID
    client.DeleteDocument('dbs/testDB/colls/testCOLL/docs/' + docID, options=options)
    print 'deleted', docID
According to your description, I tried to use the pydocumentdb module to delete a document in my Azure DocumentDB and it works for me.
Here is my code:
import pydocumentdb
import pydocumentdb.document_client as document_client

config = {
    'ENDPOINT': 'Your url',
    'MASTERKEY': 'Your master key',
    'DOCUMENTDB_DATABASE': 'familydb',
    'DOCUMENTDB_COLLECTION': 'familycoll'
}

# Initialize the Python DocumentDB client
client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})

# use a SQL based query to get a bunch of documents
query = { 'query': 'SELECT * FROM server s' }
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
result_iterable = client.QueryDocuments('dbs/familydb/colls/familycoll', query, options)
results = list(result_iterable)
print(results)
client.DeleteDocument('dbs/familydb/colls/familycoll/docs/id1', options)
print 'delete success'
Console Result:
[{u'_self': u'dbs/hitPAA==/colls/hitPAL3OLgA=/docs/hitPAL3OLgABAAAAAAAAAA==/', u'myJsonArray': [{u'subId': u'sub1', u'val': u'value1'}, {u'subId': u'sub2', u'val': u'value2'}], u'_ts': 1507687788, u'_rid': u'hitPAL3OLgABAAAAAAAAAA==', u'_attachments': u'attachments/', u'_etag': u'"00002100-0000-0000-0000-59dd7d6c0000"', u'id': u'id1'}, {u'_self': u'dbs/hitPAA==/colls/hitPAL3OLgA=/docs/hitPAL3OLgACAAAAAAAAAA==/', u'myJsonArray': [{u'subId': u'sub3', u'val': u'value3'}, {u'subId': u'sub4', u'val': u'value4'}], u'_ts': 1507687809, u'_rid': u'hitPAL3OLgACAAAAAAAAAA==', u'_attachments': u'attachments/', u'_etag': u'"00002200-0000-0000-0000-59dd7d810000"', u'id': u'id2'}]
delete success
Please notice that you need to set the enableCrossPartitionQuery property to True in options if your documents are cross-partitioned.
Must be set to true for any query that requires to be executed across
more than one partition. This is an explicit flag to enable you to
make conscious performance tradeoffs during development time.
You can find the above description here.
Update Answer:
I think you misunderstand the meaning of the partitionKey property in the options.
For example, my container is created with 'name' as its partition key (the creation screenshot is omitted here).
My documents are as below:
{
    "id": "1",
    "name": "jay"
}
{
    "id": "2",
    "name": "jay2"
}
My partition key is 'name', so here I have two partitions: 'jay' and 'jay2'.
So here you should set the partitionKey property to 'jay' or 'jay2', not 'name'.
Please modify your code as below:
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
options['partitionKey'] = 'jay'  # change this to your own partition key value
result_iterable = client.QueryDocuments('dbs/db/colls/testcoll', query, options)
results = list(result_iterable)
print(results)
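Putting this together with the question's loop, a sketch of the query-then-delete flow (assuming the question's /testPART partition key path, so each document carries its partition key value in the testPART field):
query = { 'query': 'SELECT * FROM c' }
options = {'enableCrossPartitionQuery': True, 'maxItemCount': 2}
results = list(client.QueryDocuments('dbs/testDB/colls/testCOLL', query, options))
for doc in results:
    # partitionKey must be the document's value for the key, not the key path
    delete_options = {'partitionKey': doc['testPART']}
    client.DeleteDocument('dbs/testDB/colls/testCOLL/docs/' + doc['id'], delete_options)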
Hope it helps you.
Using the azure.cosmos library:
Install and import the azure-cosmos package:
from azure.cosmos import exceptions, CosmosClient, PartitionKey
Define a delete-items function, in this case using the partition key in the query:
def deleteItems(deviceid):
    client = CosmosClient(config.cosmos.endpoint, config.cosmos.primarykey)
    # Create the database if it does not exist
    database = client.create_database_if_not_exists(id="azure-cosmos-db-name")
    # Create the container
    # Using a good partition key improves the performance of database operations.
    container = database.create_container_if_not_exists(
        id="container-name",
        partition_key=PartitionKey(path='/your-partition-path'),
        offer_throughput=400)
    # fetch items
    query = f"SELECT * FROM c WHERE c.device.deviceid IN ('{deviceid}')"
    items = list(container.query_items(query=query, enable_cross_partition_query=False))
    for item in items:
        container.delete_item(item, 'partition-key')
Usage:
deviceid = 10
deleteItems(deviceid)
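Note that 'partition-key' in delete_item above is a placeholder for the item's partition key value, not the key path. Assuming (hypothetically, to match the query above) that the partition key path is /device/deviceid, you would pass each item's own value:
for item in items:
    container.delete_item(item, partition_key=item['device']['deviceid'])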
Full example on GitHub: https://github.com/eladtpro/python-iothub-cosmos

How to prevent triples from getting mixed up while uploading to Dydra programmatically?

I am trying to upload some data to Dydra from a Sesame triplestore I have on my computer. While the download from Sesame works fine, the triples get mixed up (the s-p-o relationships change, as the object of one triple becomes the object of another). Can someone please explain why this is happening and how it can be resolved? The code is below:
# Imports assumed for this snippet; they are not shown in the original post
from SPARQLWrapper import SPARQLWrapper, JSON
from rdflib import Graph, URIRef
import pprint
import requests
from bs4 import BeautifulSoup

# Querying the triplestore to retrieve all results
sesameSparqlEndpoint = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name'
sparql = SPARQLWrapper(sesameSparqlEndpoint)
queryStringDownload = 'SELECT * WHERE {?s ?p ?o}'
dataGraph = Graph()
sparql.setQuery(queryStringDownload)
sparql.method = 'GET'
sparql.setReturnFormat(JSON)
output = sparql.query().convert()
print output
for i in range(len(output['results']['bindings'])):
    # The encoding is necessary to parse non-English characters
    output['results']['bindings'][i]['s']['value'].encode('utf-8')
    try:
        subject_extract = output['results']['bindings'][i]['s']['value']
        if 'http' in subject_extract:
            subject = "<" + subject_extract + ">"
            subject_url = URIRef(subject)
            print subject_url
        predicate_extract = output['results']['bindings'][i]['p']['value']
        if 'http' in predicate_extract:
            predicate = "<" + predicate_extract + ">"
            predicate_url = URIRef(predicate)
            print predicate_url
        objec_extract = output['results']['bindings'][i]['o']['value']
        if 'http' in objec_extract:
            objec = "<" + objec_extract + ">"
            objec_url = URIRef(objec)
            print objec_url
        else:
            objec = objec_extract
            objec_wip = '"' + objec + '"'
            objec_url = URIRef(objec_wip)
        # Loading the data on a graph
        dataGraph.add((subject_url, predicate_url, objec_url))
    except UnicodeError as error:
        print error

# Print all statements in dataGraph
for stmt in dataGraph:
    pprint.pprint(stmt)

# Upload to Dydra
URL = 'http://dydra.com/login'
key = 'my_key'
with requests.Session() as s:
    resp = s.get(URL)
    soup = BeautifulSoup(resp.text, "html5lib")
    csrfToken = soup.find('meta', {'name': 'csrf-token'}).get('content')
    # print csrf_token
    payload = {
        'account[login]': key,
        'account[password]': '',
        'csrfmiddlewaretoken': csrfToken,
        'next': '/'
    }
    # print payload
    p = s.post(URL, data=payload, headers=dict(Referer=URL))
    # print p.text
    r = s.get('http://dydra.com/username/rep_name/sparql')
    # print r.text
    dydraSparqlEndpoint = 'http://dydra.com/username/rep_name/sparql'
    for stmt in dataGraph:
        queryStringUpload = 'INSERT DATA {%s %s %s}' % stmt
        sparql = SPARQLWrapper(dydraSparqlEndpoint)
        sparql.setCredentials(key, key)
        sparql.setQuery(queryStringUpload)
        sparql.method = 'POST'
        sparql.query()
A far simpler way to copy your data over (apart from using a CONSTRUCT query instead of a SELECT, as I mentioned in the comment) is simply to have Dydra itself directly access your Sesame endpoint, for example via a SERVICE clause.
Execute the following on your Dydra database, and (after some time, depending on how large your Sesame database is), everything will be copied over:
INSERT { ?s ?p ?o }
WHERE {
  SERVICE <http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name>
    { ?s ?p ?o }
}
If the above doesn't work on Dydra, you can alternatively just directly access the RDF statements from your Sesame store by using the URI http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements. Assuming Dydra has an upload-feature where you can provide the URL of an RDF document, you can simply provide it the above URI and it should be able to load it.
The code above can work if the following changes are made:
Use a CONSTRUCT query instead of SELECT (see the sketch below). Details here -> How to iterate over CONSTRUCT output from rdflib?
Use key as input for both account[login] and account[password].
However, this is probably not the most efficient way. Primarily, doing individual INSERTs for every triple is not a good approach; Dydra doesn't record all statements this way (I got only about 30% of the triples inserted). In contrast, using the http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements method as suggested by Jeen enabled me to port all the data successfully.
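A minimal sketch of the first change, assuming the same Sesame endpoint as above: with a CONSTRUCT query, SPARQLWrapper can hand back an rdflib Graph directly, so the terms keep their RDF types and nothing needs to be re-wrapped in URIRef by hand:
from SPARQLWrapper import SPARQLWrapper, XML

sesameSparqlEndpoint = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name'
sparql = SPARQLWrapper(sesameSparqlEndpoint)
sparql.setQuery('CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o}')
sparql.setReturnFormat(XML)
# for CONSTRUCT queries, convert() yields an rdflib Graph
dataGraph = sparql.query().convert()
for stmt in dataGraph:
    print(stmt)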

Select creatives of type TemplateCreatives with DoubleClick DFP

How can I query for creatives of type TemplateCreative with the DoubleClick DFP API?
I am able to query for creatives with the DFP API by using PQL and the Python Google Ads API Library.
Like this:
from googleads import dfp

client = dfp.DfpClient.LoadFromStorage('./googleads.yaml')
creative_service = client.GetService('CreativeService', version='v201502')
creative_query = ''
creative_statement = dfp.FilterStatement(creative_query)
while True:
    response = creative_service.getCreativesByStatement(
        creative_statement.ToStatement())
    if 'results' in response:
        # do your thing
        creative_statement.offset += dfp.SUGGESTED_PAGE_LIMIT
    else:
        break
This will return all creatives. It works as advertised! In my case it returns a combination of ImageCreative, CustomCreative and TemplateCreative.
When I use the query to select only ImageCreatives, it also works!
Like this:
from googleads import dfp

client = dfp.DfpClient.LoadFromStorage('./googleads.yaml')
creative_service = client.GetService('CreativeService', version='v201502')
creative_values = [{
    'key': 'creativeType',
    'value': {
        'xsi_type': 'TextValue',
        'value': 'ImageCreative'
    }
}]
creative_query = 'WHERE creativeType = :creativeType'
creative_statement = dfp.FilterStatement(creative_query, creative_values)
while True:
    response = creative_service.getCreativesByStatement(
        creative_statement.ToStatement())
    if 'results' in response:
        # do your thing
        creative_statement.offset += dfp.SUGGESTED_PAGE_LIMIT
    else:
        break
If I search for CustomCreative, it also works. However, I don't seem to be able to query for TemplateCreative.
Answering my own question: it seems that the DFP API does not support this: Getting creatives by type TemplateCreative
The solution I have adopted is getting all creatives and parsing them, which sucks. Like this:
for creative in response['results']:
    if "TemplateCreative" in str(creative.__class__):
        # do your thing
