I make a request to my ES cluster in Python, but I can only get 10,000 results, and I need to retrieve many more (several hundred thousand). I've modified the "size" parameter, but it cannot exceed 10,000:
res_cpe = es.search(index=cpe_index, doc_type="entries", body={
    'size': 10000,
    'query': {
        'match_all': {}
    }
})
I would like to have all entries in my "res_cpe" variable.
You should try the Scroll API, which lets you retrieve large numbers of results (or even all results, as in your case).
This functionality is similar to cursors in traditional databases.
All you need to do is add the scroll parameter to your request in the Python client. A minimal viable example could look like this:
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='query_then_fetch',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print "Scrolling..."
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned by the last scroll
    scroll_size = len(page['hits']['hits'])
    print "scroll size: " + str(scroll_size)
    # Do something with the obtained page
Example taken from here - https://gist.github.com/drorata/146ce50807d16fd4a6aa
Python client docs reference - https://elasticsearch-py.readthedocs.io/en/master/api.html
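As a side note, the Python client also ships a helper that wraps this scroll loop for you. Below is a minimal sketch using elasticsearch.helpers.scan, adapted to the index and query from the question; the es client and cpe_index variable are assumed to already be defined as in your snippet.

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

# Assumed to exist as in the question; placeholder values here
es = Elasticsearch()
cpe_index = 'your_cpe_index'

# scan() drives the scroll API under the hood and yields every hit
all_entries = [
    hit['_source']
    for hit in scan(
        es,
        index=cpe_index,
        query={'query': {'match_all': {}}},
        size=1000,    # per-batch size, not a global limit
        scroll='2m',  # how long each scroll context is kept alive
    )
]
print(len(all_entries))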
I tried to do API trading on KuCoin. I developed a bot that finds trading opportunities well, but I ran into problems when placing new orders. Please check the code and help me make it functional.
The code, edited in accordance with the comment of Lev Levitsky, is as follows:
import json
import time
import urllib
import requests
import base64
import hmac
import hashlib
api_key = 'api_key'
api_secret = 'api_secret'
api_passphrase = 'api_passphrase'
base_uri = 'https://api-futures.kucoin.com'
endpoint = '/api/v1/orders?symbol=MATICUSDTM'
method = 'POST'
x= {}
x["symbol"] = "MATICUSDTM"
x["signal_type"] = "SHORT"
x["leverage"] = 5
x["exchange"] = "Kucoin"
x["entrance_price"] = 2.1000
x["trading_size"] = 150
x["tp1"] = 2.08
x["sl1"] = 2.12
all_futures_signals = list()
all_futures_signals.append(x)
def get_headers(method, endpoint, api_key, api_passphrase, body):
    api_secret = ''
    now = int(time.time() * 1000)
    str_to_sign = str(now) + method + endpoint + str(body)
    signature = base64.b64encode(hmac.new(api_secret.encode('utf-8'), str_to_sign.encode('utf-8'), hashlib.sha256).digest())
    passphrase = base64.b64encode(hmac.new(api_secret.encode('utf-8'), api_passphrase.encode('utf-8'), hashlib.sha256).digest())
    return {'KC-API-KEY': api_key,
            'KC-API-KEY-VERSION': '2',
            'KC-API-PASSPHRASE': passphrase,
            'KC-API-SIGN': signature,
            'KC-API-TIMESTAMP': str(now)}
body = {
    "clientOid": "",
    "reduceOnly": False,  # A mark to reduce the position size only
    "closeOrder": False,  # If closeOrder is set to TRUE, the system will close the position and the position size will become 0. Side, Size and Leverage fields can be left empty and the system will determine the side and size automatically.
    "forceHold": False,  # The system will forcely freeze certain amount of funds for this order, including orders whose direction is opposite to the current positions. This feature is to ensure that the order won’t be canceled by the matching engine in such a circumstance that not enough funds are frozen for the order.
    "hidden": False,  # A hidden order will enter but not display on the orderbook.
    "iceberg": False,  # When placing an iceberg order, you need to set the visible size. The minimum visible size is 1/20 of the order size. The minimum visible size shall be greater than the minimum order size, or an error will occur.
    "visibleSize": 0,  # When placing an iceberg order, you need to set the visible size. The minimum visible size is 1/20 of the order size. The minimum visible size shall be greater than the minimum order size, or an error will occur.
    "leverage": x["leverage"],
    "postOnly": False,  # The post-only flag ensures that the trader always pays the maker fee and provides liquidity to the order book.
    "price": 2.1000,  # The price specified must be a multiple number of the contract tickSize
    "remark": "remark",
    "side": "buy",  # sell/buy
    "size": x["trading_size"],  # The size must be no less than the lotSize for the contract and no larger than the maxOrderQty.
    "stop": "",  # down/up
    "stopPrice": "",
    "stopPriceType": "",  # TP/MP/IP: TP for trade price, MP for mark price, and IP for index price
    "symbol": x["symbol"],
    "timeInForce": "",  # GTC/IOC: Good Till Canceled GTC and Immediate Or Cancel IOC
    "type": "limit",  # limit/market
}
headers = get_headers(method, endpoint, api_key, api_passphrase, body)
x["opening_response"] = requests.post( base_uri + endpoint, body, headers=headers).json()
print(x["opening_response"])
I receive this error: {'code': '400005', 'msg': 'Invalid KC-API-SIGN'}
All inputs are correct, so I think there is a problem with the code.
Best regards,
Javad
I think the problem is your endpoint variable. I believe you should not append the symbol to the endpoint when you are creating a new order; remove it from the endpoint and pass the symbol in the body object. Also, I think you do not need to pass empty strings for the optional fields in the body object. I am not sure about this, but I would remove them from the body object. Since it is a signing problem, you have to check four variables: timestamp, method, endpoint, and body. I hope this works.
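As a rough sketch of that suggestion (keeping the rest of the script from the question unchanged), the endpoint would lose the symbol and the symbol would travel only in the body, with the optional empty-string fields dropped:

# Hypothetical adjustment per the suggestion above: no symbol in the endpoint
endpoint = '/api/v1/orders'

body = {
    "clientOid": "my-client-oid",  # placeholder value
    "symbol": x["symbol"],         # symbol goes in the body only
    "side": "buy",
    "leverage": x["leverage"],
    "price": 2.1000,
    "size": x["trading_size"],
    "type": "limit",
    # optional fields with empty values omitted entirely
}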
Have you considered using ccxt? It makes dealing with lower-level API details like this somewhat easier.
Given that it's singling out the signature as invalid, but not saying any headers are missing, it could mean that the signature computed from your str_to_sign variable is wrong.
Let's look at it:
api_secret = ''
now = int(time.time() * 1000)
str_to_sign = str(now) + method + endpoint + str(body)
From the looks of it, your api_secret is just an empty string, so the resulting signature won't be correct.
So, a couple of lines down, when you make the signature:
signature = base64.b64encode(hmac.new(api_secret.encode('utf-8'), str_to_sign.encode('utf-8'), hashlib.sha256).digest())
Even though api_secret is given a value in the higher scope, it's overridden by this local variable which is an empty string.
If your api_secret actually were an empty string, your code would produce the correct signature; but it isn't an empty string, so the signature is wrong.
So, if you pass it as a parameter to your get_headers function, i.e.
def get_headers(method, endpoint, api_key, api_passphrase, api_secret, body):
and delete the first line, api_secret = '', then maybe it will work.
If it doesn't work, then it's a different problem (e.g. if you had actually put your API secret in there and just redacted it before posting). I don't know, since I haven't tried running your code.
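For concreteness, here is a minimal sketch of the corrected signing helper, with api_secret passed in rather than shadowed by an empty string. The surrounding variables (api_key, api_passphrase, body, etc.) are assumed to be defined as in the question, and the exact serialization of body may still need to match what you actually send in the request.

import base64
import hashlib
import hmac
import time

def get_headers(method, endpoint, api_key, api_secret, api_passphrase, body):
    # Use the real secret passed in as a parameter, not a local empty string
    now = int(time.time() * 1000)
    str_to_sign = str(now) + method + endpoint + str(body)
    signature = base64.b64encode(
        hmac.new(api_secret.encode('utf-8'), str_to_sign.encode('utf-8'), hashlib.sha256).digest())
    passphrase = base64.b64encode(
        hmac.new(api_secret.encode('utf-8'), api_passphrase.encode('utf-8'), hashlib.sha256).digest())
    return {'KC-API-KEY': api_key,
            'KC-API-KEY-VERSION': '2',
            'KC-API-PASSPHRASE': passphrase,
            'KC-API-SIGN': signature,
            'KC-API-TIMESTAMP': str(now)}

# Usage (mirroring the question's variables):
# headers = get_headers(method, endpoint, api_key, api_secret, api_passphrase, body)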
PS: To Edelweiss: Greetings from the Bernese Oberland!
Following a comment by edelweiss, an easier way is to use the official KuCoin API client.
First, I installed the client:
!pip install kucoin-futures-python
Then, I opened a position with this code:
from kucoin_futures.client import Trade
client = Trade(key='api_key', secret='api_secret', passphrase='api_passphrase', is_sandbox=False, url='')
order_id = client.create_limit_order(symbol, side, lever, size, price, clientOid='', **kwargs)
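For example, an untested sketch of how that call might look with the values from the question; the side, price, size, and leverage here are illustrative placeholders taken from the x dictionary above, not a verified working order, and the expected argument types may need adjusting:

from kucoin_futures.client import Trade

client = Trade(key='api_key', secret='api_secret', passphrase='api_passphrase', is_sandbox=False)

# Hypothetical short entry for the signal defined in the question
order_id = client.create_limit_order(
    symbol='MATICUSDTM',  # x["symbol"]
    side='sell',          # a SHORT signal is opened with a sell order
    lever='5',            # x["leverage"]
    size=150,             # x["trading_size"]
    price='2.1000',       # x["entrance_price"]
)
print(order_id)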
I am using the Microsoft Graph API to pull my emails in Python and return them as a JSON object. There is a limitation: it only returns 12 emails at a time. The code is:
def get_calendar_events(token):
    graph_client = OAuth2Session(token=token)

    # Configure query parameters to modify the results
    query_params = {
        #'$select': 'subject,organizer,start,end,location',
        #'$orderby': 'createdDateTime DESC'
        '$select': 'sender, subject',
        '$skip': 0,
        '$count': 'true'
    }

    # Send GET to /me/messages
    events = graph_client.get('{0}/me/messages'.format(graph_url), params=query_params)
    events = events.json()

    # Return the JSON result
    return events
The response I get is twelve emails with subject and sender, plus the total count of my emails.
Now I want to iterate over the emails, changing the skip value in query_params to get the next 12. Is there any way to iterate over them using loops or recursion?
I'm thinking something along the lines of this:
def get_calendar_events(token):
    graph_client = OAuth2Session(token=token)

    json_list = []
    ct = 0
    while True:
        # Configure query parameters to modify the results
        query_params = {
            #'$select': 'subject,organizer,start,end,location',
            #'$orderby': 'createdDateTime DESC'
            '$select': 'sender, subject',
            '$skip': ct,
            '$count': 'true'
        }

        # Send GET to /me/messages
        events = graph_client.get('{0}/me/messages'.format(graph_url), params=query_params)
        events = events.json()

        # Stop once a page comes back empty (no more messages)
        if not events.get('value'):
            break

        json_list.append(events)
        ct += 12

    # Return the list of JSON results
    return json_list
This may require some tweaking, but essentially you're adding 12 to the offset each time, as long as the request keeps returning results. It then appends each JSON page to a list and returns that list.
If you know how many emails you have, you could also batch the requests that way.
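Alternatively, Graph responses usually include an @odata.nextLink property pointing at the next page, so you can follow that instead of computing $skip yourself. A rough sketch, assuming the same graph_client and graph_url as above:

def get_all_messages(graph_client, graph_url):
    """Collect every page of /me/messages by following @odata.nextLink."""
    query_params = {'$select': 'sender, subject', '$count': 'true'}
    url = '{0}/me/messages'.format(graph_url)

    pages = []
    while url:
        resp = graph_client.get(url, params=query_params).json()
        pages.append(resp)
        # The next page URL already encodes the paging parameters
        url = resp.get('@odata.nextLink')
        query_params = None  # only needed on the first request
    return pages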
In this SO question I learned that I cannot delete a Cosmos DB document using SQL.
Using Python, I believe I need the DeleteDocument() method. This is how I'm getting the document IDs that are (I believe) required to then call DeleteDocument():
# set up the client
client = document_client.DocumentClient()

# use a SQL based query to get a bunch of documents
query = { 'query': 'SELECT * FROM server s' }
result_iterable = client.QueryDocuments('dbs/DB/colls/coll', query, options)
results = list(result_iterable)

for x in range(0, len(results)):
    docID = results[x]['id']
Now, at this stage I want to call DeleteDocument().
Its inputs are document_link and options.
I can define document_link as something like
document_link = 'dbs/DB/colls/coll/docs/' + docID
and successfully call ReadAttachments(), for example, which has the same inputs as DeleteDocument().
When I call DeleteDocument(), however, I get an error...
The partition key supplied in x-ms-partitionkey header has fewer components than defined in the collection
...and now I'm totally lost.
UPDATE
Following on from Jay's help, I believe I'm missing the partitionKey element in the options.
In this example, I've created a testing database; it looks like this:
So I think my partition key is /testPART.
When I include the partitionKey in the options, however, no results are returned (and so print len(results) outputs 0).
Removing partitionKey means that results are returned, but the delete attempt fails as before.
# Query them in SQL
query = { 'query': 'SELECT * FROM c' }

options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
options['partitionKey'] = '/testPART'

result_iterable = client.QueryDocuments('dbs/testDB/colls/testCOLL', query, options)
results = list(result_iterable)

# should be > 0
print len(results)

for x in range(0, len(results)):
    docID = results[x]['id']
    print docID
    client.DeleteDocument('dbs/testDB/colls/testCOLL/docs/' + docID, options=options)
    print 'deleted', docID
Based on your description, I tried using the pydocumentdb module to delete documents in my Azure DocumentDB, and it works for me.
Here is my code:
import pydocumentdb
import pydocumentdb.document_client as document_client

config = {
    'ENDPOINT': 'Your url',
    'MASTERKEY': 'Your master key',
    'DOCUMENTDB_DATABASE': 'familydb',
    'DOCUMENTDB_COLLECTION': 'familycoll'
}

# Initialize the Python DocumentDB client
client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})

# use a SQL based query to get a bunch of documents
query = { 'query': 'SELECT * FROM server s' }

options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2

result_iterable = client.QueryDocuments('dbs/familydb/colls/familycoll', query, options)
results = list(result_iterable)
print(results)

client.DeleteDocument('dbs/familydb/colls/familycoll/docs/id1', options)
print 'delete success'
Console Result:
[{u'_self': u'dbs/hitPAA==/colls/hitPAL3OLgA=/docs/hitPAL3OLgABAAAAAAAAAA==/', u'myJsonArray': [{u'subId': u'sub1', u'val': u'value1'}, {u'subId': u'sub2', u'val': u'value2'}], u'_ts': 1507687788, u'_rid': u'hitPAL3OLgABAAAAAAAAAA==', u'_attachments': u'attachments/', u'_etag': u'"00002100-0000-0000-0000-59dd7d6c0000"', u'id': u'id1'}, {u'_self': u'dbs/hitPAA==/colls/hitPAL3OLgA=/docs/hitPAL3OLgACAAAAAAAAAA==/', u'myJsonArray': [{u'subId': u'sub3', u'val': u'value3'}, {u'subId': u'sub4', u'val': u'value4'}], u'_ts': 1507687809, u'_rid': u'hitPAL3OLgACAAAAAAAAAA==', u'_attachments': u'attachments/', u'_etag': u'"00002200-0000-0000-0000-59dd7d810000"', u'id': u'id2'}]
delete success
Please notice that you need to set the enableCrossPartitionQuery property to True in options if your documents are cross-partitioned.
Must be set to true for any query that requires to be executed across
more than one partition. This is an explicit flag to enable you to
make conscious performance tradeoffs during development time.
You can find the above description here.
Update Answer:
I think you misunderstand the meaning of the partitionKey property in options.
For example, my container is created like this:
My documents are as below:
{
    "id": "1",
    "name": "jay"
}
{
    "id": "2",
    "name": "jay2"
}
My partition key is 'name', so here I have two partitions: 'jay' and 'jay2'.
So, here you should set the partitionKey property to 'jay' or 'jay2', not 'name'.
Please modify your code as below:
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
options['partitionKey'] = 'jay'  # please change here in your code

result_iterable = client.QueryDocuments('dbs/db/colls/testcoll', query, options)
results = list(result_iterable)
print(results)
Hope it helps you.
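Putting this together for the /testPART collection in the question, a rough, untested sketch would query across partitions and then pass each document's own partition key value when deleting. It assumes each returned document carries a testPART field whose value is that document's partition key:

# Query across all partitions
query = { 'query': 'SELECT * FROM c' }
query_options = {'enableCrossPartitionQuery': True, 'maxItemCount': 2}

results = list(client.QueryDocuments('dbs/testDB/colls/testCOLL', query, query_options))

for doc in results:
    # Hypothetical field name: the partition key value lives in the document's testPART field
    delete_options = {'partitionKey': doc['testPART']}
    client.DeleteDocument('dbs/testDB/colls/testCOLL/docs/' + doc['id'], delete_options)
    print('deleted ' + doc['id'])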
Using the azure.cosmos library:
Install and import the azure-cosmos package:
from azure.cosmos import exceptions, CosmosClient, PartitionKey
Define a delete-items function, in this case using the partition key in the query:
def deleteItems(deviceid):
    client = CosmosClient(config.cosmos.endpoint, config.cosmos.primarykey)

    # Create the database if it does not exist
    database = client.create_database_if_not_exists(id='azure-cosmos-db-name')

    # Create the container
    # Using a good partition key improves the performance of database operations.
    container = database.create_container_if_not_exists(
        id='container-name',
        partition_key=PartitionKey(path='/your-partition-path'),
        offer_throughput=400)

    # Fetch items
    query = f"SELECT * FROM c WHERE c.device.deviceid IN ('{deviceid}')"
    items = list(container.query_items(query=query, enable_cross_partition_query=False))

    for item in items:
        container.delete_item(item, 'partition-key')
Usage:
deviceid = 10
deleteItems(deviceid)
Full example on GitHub: https://github.com/eladtpro/python-iothub-cosmos
I am trying to upload some data to Dydra from a Sesame triplestore I have on my computer. While the download from Sesame works fine, the triples get mixed up (the s-p-o relationships change, as the object of one triple becomes the object of another). Can someone please explain why this is happening and how it can be resolved? The code is below:
import pprint
import requests
from bs4 import BeautifulSoup
from rdflib import Graph, URIRef
from SPARQLWrapper import SPARQLWrapper, JSON

# Querying the triplestore to retrieve all results
sesameSparqlEndpoint = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name'
sparql = SPARQLWrapper(sesameSparqlEndpoint)
queryStringDownload = 'SELECT * WHERE {?s ?p ?o}'
dataGraph = Graph()

sparql.setQuery(queryStringDownload)
sparql.method = 'GET'
sparql.setReturnFormat(JSON)
output = sparql.query().convert()
print output

for i in range(len(output['results']['bindings'])):
    # The encoding is necessary to parse non-English characters
    output['results']['bindings'][i]['s']['value'].encode('utf-8')
    try:
        subject_extract = output['results']['bindings'][i]['s']['value']
        if 'http' in subject_extract:
            subject = "<" + subject_extract + ">"
            subject_url = URIRef(subject)
            print subject_url

        predicate_extract = output['results']['bindings'][i]['p']['value']
        if 'http' in predicate_extract:
            predicate = "<" + predicate_extract + ">"
            predicate_url = URIRef(predicate)
            print predicate_url

        objec_extract = output['results']['bindings'][i]['o']['value']
        if 'http' in objec_extract:
            objec = "<" + objec_extract + ">"
            objec_url = URIRef(objec)
            print objec_url
        else:
            objec = objec_extract
            objec_wip = '"' + objec + '"'
            objec_url = URIRef(objec_wip)

        # Loading the data into a graph
        dataGraph.add((subject_url, predicate_url, objec_url))
    except UnicodeError as error:
        print error

# Print all statements in dataGraph
for stmt in dataGraph:
    pprint.pprint(stmt)
# Upload to Dydra
URL = 'http://dydra.com/login'
key = 'my_key'

with requests.Session() as s:
    resp = s.get(URL)
    soup = BeautifulSoup(resp.text, "html5lib")
    csrfToken = soup.find('meta', {'name': 'csrf-token'}).get('content')
    # print csrf_token
    payload = {
        'account[login]': key,
        'account[password]': '',
        'csrfmiddlewaretoken': csrfToken,
        'next': '/'
    }
    # print payload
    p = s.post(URL, data=payload, headers=dict(Referer=URL))
    # print p.text
    r = s.get('http://dydra.com/username/rep_name/sparql')
    # print r.text

dydraSparqlEndpoint = 'http://dydra.com/username/rep_name/sparql'

for stmt in dataGraph:
    queryStringUpload = 'INSERT DATA {%s %s %s}' % stmt
    sparql = SPARQLWrapper(dydraSparqlEndpoint)
    sparql.setCredentials(key, key)
    sparql.setQuery(queryStringUpload)
    sparql.method = 'POST'
    sparql.query()
A far simpler way to copy your data over (apart from using a CONSTRUCT query instead of a SELECT, as I mentioned in the comment) is simply to have Dydra itself directly access your Sesame endpoint, for example via a SERVICE clause.
Execute the following on your Dydra database and (after some time, depending on how large your Sesame database is) everything will be copied over:
INSERT { ?s ?p ?o }
WHERE {
SERVICE <http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name>
{ ?s ?p ?o }
}
If the above doesn't work on Dydra, you can alternatively just directly access the RDF statements from your Sesame store by using the URI http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements. Assuming Dydra has an upload-feature where you can provide the URL of an RDF document, you can simply provide it the above URI and it should be able to load it.
The code above can work if the following changes are made:
1. Use a CONSTRUCT query instead of SELECT. Details here -> How to iterate over CONSTRUCT output from rdflib?
2. Use key as the input for both account[login] and account[password].
However, this is probably not the most efficient way. In particular, doing an individual INSERT for every triple is not a good approach: Dydra didn't record all statements this way (I got only about 30% of the triples inserted). In contrast, using the http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements method suggested by Jeen enabled me to port all the data successfully.
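For reference, a minimal sketch of the CONSTRUCT-based download (untested; it assumes SPARQLWrapper's RDFXML return format, in which case convert() hands back an rdflib Graph directly, and it reuses the endpoint from the question):

from SPARQLWrapper import SPARQLWrapper, RDFXML

sesameSparqlEndpoint = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name'

sparql = SPARQLWrapper(sesameSparqlEndpoint)
sparql.setQuery('CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }')
sparql.setReturnFormat(RDFXML)

# For CONSTRUCT queries the result is parsed into an rdflib Graph,
# so subjects, predicates and objects keep their proper term types.
dataGraph = sparql.query().convert()
print(len(dataGraph))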
I want to fetch 10 articles from my Articles model on first load and then another 10 via AJAX as the user scrolls towards the bottom. I store the IDs of the first 10 in an array and then append the subsequent ones to it; this is so I can fetch articles that are not already in that list. But I get an empty array, and the same first 10 articles are fetched again every time I scroll to the bottom.
First load:
import json
my_interest = user_object.interet #this returns [3,4,55,24,57]
articles = Articles.objects.all()[:10]
fetched = [x.id for x in articles]
request.session['fetched'] = json.dumps(fetched)
Another 10 via AJAX:
import operator
from functools import reduce
from django.db.models import Q

fetched = json.loads(request.session['fetched'])
my_interest = user_object.interet #this returns [3,4,55,24,57]
query = reduce(operator.and_,[Q(cat_id__in = my_interest ), ~Q(id__in = fetched )])
articles = Articles.objects.filter(query)[:10]
request.session['fetched'] = json.dumps( fetched + [x.id for x in articles])
context = {'articles': articles, 'fetched': request.session['fetched']}
return render_to_response('mysite/loadmore.html', context)
But I still get the same first 10 articles repeatedly as I scroll to the bottom of the page, and if I put <p> Fetched: {{fetched}} </p> in my template I only see "Fetched:".
I'm not sure if this is your problem, but there is no need for either reduce or Q in your query. It is much simpler and clearer to write:
articles = Articles.objects.filter(cat_id__in=my_interest).exclude(id__in=fetched)
Also, there's no point in dumping/loading to JSON when saving the IDs to the session; the session already takes care of serialization. Just do:
fetched = request.session['fetched']
...
request.session['fetched'].extend([x.id for x in articles])
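Putting both suggestions together, the AJAX view might look roughly like this (a sketch only; Articles, user_object, and the template path come from the question, and the session value is re-assigned here, as in the question, rather than extended in place):

from django.shortcuts import render_to_response

def load_more(request):
    # Hypothetical view name; Articles and user_object are as in the question
    fetched = request.session.get('fetched', [])  # plain list, no JSON round-trip
    my_interest = user_object.interet              # e.g. [3, 4, 55, 24, 57]

    articles = Articles.objects.filter(cat_id__in=my_interest).exclude(id__in=fetched)[:10]

    # Re-assign the key so the session records the updated list
    request.session['fetched'] = fetched + [a.id for a in articles]

    context = {'articles': articles, 'fetched': request.session['fetched']}
    return render_to_response('mysite/loadmore.html', context)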
Q(id__in = fetched )
It looks like you are requesting articles with the same IDs that you already fetched and saved to the session?