Status update on bulk elastic upload using helper

Status update on bulk elastic upload using helper - python

I am trying to update a bunch of records in a single index, however i need the status if the update is successful or not. Reason is that the particular id may not be available in the index. This is how my code looks like, i dont get any output for "update_response" even though the records are being updated in my index. Shouldnt i be getting a " It returns a tuple with summary information - number of successfully executed actions and either list of errors or number of errors if stats_only is set to True."
as per function definition at this page https://elasticsearch-py.readthedocs.io/en/7.x/helpers.html
es = config.createESConnection()
start_time = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
api_url = 'abcd'
parameters = {
"processed":False,
"date":start_time
}
response = requests.get(api_url,params=parameters)
info= requests.get(api_url,params=parameters).json()['data']
es_data=[]
for i in range(len(info)):
action = {
"_index": "index",
"_id" : info[i]['id'],
"_op_type":"update",
"doc":{"x" : info[i]['x']}
}
es_data.append(action)
update_response = helpers.bulk(es, es_data,stats_only =True)
print(update_response)
except Exception:

okay so this is to help others, in case there is an error in helpers.bulk you need to enable raise_on_error=False, doing so will complete the updation on all the records, skipping the ones having error and then you get the return object.
so use this
update_response = helpers.bulk(es, es_data,raise_on_error=False)

Related

Using python and suds, data not read by server side because element is not defined as an array

I am a very inexperienced programmer with no formal education. Details will be extremely helpful in any responses.
I have made several basic python scripts to call SOAP APIs, but I am running into an issue with a specific API function that has an embedded array.
Here is a sample excerpt from a working XML format to show nested data:
<bomData xsi:type="urn:inputBOM" SOAP-ENC:arrayType="urn:bomItem[]">
<bomItem>
<item_partnum></item_partnum>
<item_partrev></item_partrev>
<item_serial></item_serial>
<item_lotnum></item_lotnum>
<item_sublotnum></item_sublotnum>
<item_qty></item_qty>
</bomItem>
<bomItem>
<item_partnum></item_partnum>
<item_partrev></item_partrev>
<item_serial></item_serial>
<item_lotnum></item_lotnum>
<item_sublotnum></item_sublotnum>
<item_qty></item_qty>
</bomItem>
</bomData>
I have tried 3 different things to get this to work to no avail.
I can generate the near exact XML from my script, but a key attribute missing is the 'SOAP-ENC:arrayType="urn:bomItem[]"' in the above XML example.
Option 1 was using MessagePlugin, but I get an error because my section is like the 3 element and it always injects into the first element. I have tried body[2], but this throws an error.
Option 2 I am trying to create the object(?). I read a lot of stack overflow, but I might be missing something for this.
Option 3 looked simple enough, but also failed. I tried setting the values in the JSON directly. I got these examples by an XML sample to JSON.
I have also done a several other minor things to try to get it working, but not worth mentioning. Although, if there is a way to somehow do the following, then I'm all ears:
bomItem[]: bomData = {"bomItem"[{...,...,...}]}
Here is a sample of my script:
# for python 3
# using pip install suds-py3
from suds.client import Client
from suds.plugin import MessagePlugin
# Config
#option 1: trying to set it as an array using plugin
class MyPlugin(MessagePlugin):
def marshalled(self, context):
body = context.envelope.getChild('Body')
bomItem = body[0]
bomItem.set('SOAP-ENC:arrayType', 'urn:bomItem[]')
URL = "http://localhost/application/soap?wsdl"
client = Client(URL, plugins=[MyPlugin()])
transact_info = {
"username":"",
"transaction":"",
"workorder":"",
"serial":"",
"trans_qty":"",
"seqnum":"",
"opcode":"",
"warehouseloc":"",
"warehousebin":"",
"machine_id":"",
"comment":"",
"defect_code":""
}
#WIP - trying to get bomData below working first
inputData = {
"dataItem":[
{
"fieldname": "",
"fielddata": ""
}
]
}
#option 2: trying to create the element here and define as an array
#inputbom = client.factory.create('ns3:inputBOM')
#inputbom._type = "SOAP-ENC:arrayType"
#inputbom.value = "urn:bomItem[]"
bomData = {
#Option 3: trying to set the time and array type in JSON
#"#xsi:type":"urn:inputBOM",
#"#SOAP-ENC:arrayType":"urn:bomItem[]",
"bomItem":[
{
"item_partnum":"",
"item_partrev":"",
"item_serial":"",
"item_lotnum":"",
"item_sublotnum":"",
"item_qty":""
},
{
"item_partnum":"",
"item_partrev":"",
"item_serial":"",
"item_lotnum":"",
"item_sublotnum":"",
"item_qty":""
}
]
}
try:
response = client.service.transactUnit(transact_info,inputData,bomData)
print("RESPONSE: ")
print(response)
#print(client)
#print(envelope)
except Exception as e:
#handle error here
print(e)
I appreciate any help and hope it is easy to solve.

I have found the answer I was looking for. At least a working solution.
In any case, option 1 worked out. I read up on it at the following link:
https://suds-py3.readthedocs.io/en/latest/
You can review at the '!MessagePlugin' section.
I found a solution to get message plugin working from the following post:
unmarshalling Error: For input string: ""
A user posted an example how to crawl through the XML structure and modify it.
Here is my modified example to get my script working:
#Using MessagePlugin to modify elements before sending to server
class MyPlugin(MessagePlugin):
# created method that could be reused to modify sections with similar
# structure/requirements
def addArrayType(self, dataType, arrayType, transactUnit):
# this is the code that is key to crawling through the XML - I get
# the child of each parent element until I am at the right level for
# modification
data = transactUnit.getChild(dataType)
if data:
data.set('SOAP-ENC:arrayType', arrayType)
def marshalled(self, context):
# Alter the envelope so that the xsd namespace is allowed
context.envelope.nsprefixes['xsd'] = 'http://www.w3.org/2001/XMLSchema'
body = context.envelope.getChild('Body')
transactUnit = body.getChild("transactUnit")
if transactUnit:
self.addArrayType('inputData', 'urn:dataItem[]', transactUnit)
self.addArrayType('bomData', 'urn:bomItem[]', transactUnit)

Find all unique values for field in Elasticsearch through python

I've been scouring the web for some good python documentation for Elasticsearch. I've got a query term that I know returns the information I need, but I'm struggling to convert the raw string into something Python can interpret.
This will return a list of all unique 'VALUE's in the dataset.
{"find": "terms", "field": "hierarchy1.hierarchy2.VALUE"}
Which I have taken from a dashboarding tool which accesses this data.
But I don't seem to be able to convert this into correct python.
I've tried this:
body_test = {"find": "terms", "field": "hierarchy1.hierarchy2.VALUE"}
es = Elasticsearch(SETUP CONNECTION)
es.search(
index="INDEX_NAME",
body = body_test
)
but it doesn't like the find value. I can't find anything in the documentation about find.
RequestError: RequestError(400, 'parsing_exception', 'Unknown key for
a VALUE_STRING in [find].')
The only way I've got it to slightly work is with
es_search = (
Search(
using=es,
index=db_index
).source(['hierarchy1.hierarchy2.VALUE'])
)
But I think this is pulling the entire dataset and then filtering (which I obviously don't want to be doing each time I run this code). This needs to be done through python and so I cannot simply POST the query I know works.
I am completely new to ES and so this is all a little confusing. Thanks in advance!

So it turns out that the find in this case was specific to Grafana (the dashboarding tool I took the query from.
In the end I used this site and used the code from there. It's a LOT more complicated than I thought it was going to be. But it works very quickly and doesn't put a strain on the database (which my alternative method was doing).
In case the link dies in future years, here's the code I used:
from elasticsearch import Elasticsearch
es = Elasticsearch()
def iterate_distinct_field(es, fieldname, pagesize=250, **kwargs):
"""
Helper to get all distinct values from ElasticSearch
(ordered by number of occurrences)
"""
compositeQuery = {
"size": pagesize,
"sources": [{
fieldname: {
"terms": {
"field": fieldname
}
}
}
]
}
# Iterate over pages
while True:
result = es.search(**kwargs, body={
"aggs": {
"values": {
"composite": compositeQuery
}
}
})
# Yield each bucket
for aggregation in result["aggregations"]["values"]["buckets"]:
yield aggregation
# Set "after" field
if "after_key" in result["aggregations"]["values"]:
compositeQuery["after"] = \
result["aggregations"]["values"]["after_key"]
else: # Finished!
break
# Usage example
for result in iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"):
print(result) # e.g. {'key': {'pattern': 'mypattern'}, 'doc_count': 315}

Elasticsearch TransportError(400, u'parse_exception', u'Failed to derive xcontent')

I'm trying to add a new field to an existing document in elasticsearch. I use the elasticsearch python API to do this.
My query is :
addField = { "script" : { "inline" : "ctx._source.langTranslation = test"}}
esclient.bulk(body = addField)
When I try this I get this error :
I search how to avoid this error but not found anything. What is wrong with my resquest ?
Thanks for your answer

I have found a solution, instead of use bulk function I use update function :
esclient.update(index = hit['_index'], doc_type = hit['_type'], id = hit['_id'], body = {"doc": {"langTranslation": translation}})
And it works, I don't have any error message.

Best way to change type of column in dynamoDB

I have the DynamoDB table that is filled from different services/sources. The table has next schema:
{
"Id": 14782,
"ExtId": 1478240974, //pay attention it is Number
"Name": "name1"
}
Sometimes, after services started to work I had found that one service sends data in incorrect format. It looks like:
{
"Id": 14782,
"ExtId": "1478240974", //pay attention it is String
"Name": "name1"
}
DynamoDB is NoSQL database so, now I have millions mixed records that are difficult to query or scan. I understand that my main fault was missed validation.
Now I have to go throw all records and if it is inappropriate type - remove it and add with same data but with the correct format. Is it possible to do in another gracefully way?

So it was pretty easy. It is possible to do with attribute_type method.
First of all, I added imports:
from boto3.dynamodb.conditions import Attr
import boto3
And my code:
attr = Attr('ExtId').attribute_type('S')
response = table.scan(FilterExpression = attr)
items = response['Items']
while 'LastEvaluatedKey' in response:
response = table.scan(FilterExpression = attr, ExclusiveStartKey = response['LastEvaluatedKey'])
items.extend(response['Items'])
It is possible to find more condition customization in the next article - DynamoDB Customization Reference

Updating a custom field using ASANA Python API

I'm trying to update the values of custom fields in my Asana list. I'm using the Official Python client library for the Asana API v1.
My code currently looks like this;
project = "Example Project"
keyword = "Example Task"
print "Logging into ASANA"
api_key = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
client = asana.Client.basic_auth(api_key)
me = client.users.me()
all_projects = next(workspace for workspace in me['workspaces'])
projects = client.projects.find_by_workspace(all_projects['id'])
for project in projects:
if 'Example Project' not in project['name']:
continue
print "Project found."
print "\t"+project['name']
print
tasks = client.tasks.find_by_project(project['id'], {"opt_fields":"this.name,custom_fields"}, iterator_type=None)
for task in tasks:
if keyword in task['name']:
print "Task found:"
print "\t"+str(task)
print
for custom_field in task['custom_fields']:
custom_field['text_value'] = "New Data!"
print client.tasks.update(task['id'], {'data':task})
But when I run the code, the task doesn't update. The return of print client.tasks.update returns all the details of the task, but the custom field has not been updated.

I think the problem is that our API is not symmetrical with respect to custom fields... which I kind of find to be a bummer; it can be a real gotcha in cases like this. Rather than being able to set the value of a custom field within the block of values as you're doing above, which is intuitive, you have to set them with a key:value dictionary-like setup of custom_field_id:new_value - not as intuitive, unfortunately. So above, where you have
for custom_field in task['custom_fields']:
custom_field['text_value'] = "New Data!"
I think you'd have to do something like this:
new_custom_fields = {}
for custom_field in task['custom_fields']:
new_custom_fields[custom_field['id']] = "New Data!"
task['custom_fields'] = new_custom_fields
The goal is to generate JSON for the POST request that looks something like
{
"data": {
"custom_fields":{
"12345678":"New Data!"
}
}
}
As a further note, the value should be the new text string if you have a text custom field, a number if it's a number custom field, and the ID of the enum_options choice (take a look at the third example under this header on our documentation site) if it's an enum custom field.

Thanks to Matt, I got to the solution.
new_custom_fields = {}
for custom_field in task['custom_fields']:
new_custom_fields[custom_field['id']] = "New Data!"
print client.tasks.update(task['id'], {'custom_fields':new_custom_fields})
There were two problems in my original code, the first was that I was trying to treat the API symmetrically and this was identified and solved by Matt. The second was that I was trying to update in an incorrect format. Note the difference between client.tasks.update in my original and updated code.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Status update on bulk elastic upload using helper - python

Related

Using python and suds, data not read by server side because element is not defined as an array

Find all unique values for field in Elasticsearch through python

Elasticsearch TransportError(400, u'parse_exception', u'Failed to derive xcontent')

Best way to change type of column in dynamoDB

Updating a custom field using ASANA Python API

Categories

Resources