How to post data to the jobs API in Elasticsearch - Python

I'm trying to post data to a machine learning API using Elasticsearch. What format do the JSON docs need to be in?
I've attempted to send data as JSON docs separated by newlines in a txt file. I've also tried converting back and forth to JSON using dump and load, to no avail. The documentation states that the documents can be separated by whitespace, but no matter what I try, they aren't accepted.
https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-post-data.html
Here is an example of a JSON doc saved as file_name.json:
[{"myid": "id1", "client": "client1", "submit_date": 1514764857},
{"my_id": "id2", "client": "client_2", "submit_date": 1514764857}]
Here is the basic code needed to post the data:

import os
import json

from elasticsearch import Elasticsearch
from elasticsearch.client.xpack import MlClient

es = elastic_connection()  # helper (defined elsewhere) that returns an Elasticsearch instance
es_ml = MlClient(es)

def post_training_data(directory='Training Data', file_name='file_name.json'):
    with open(os.path.join(directory, file_name), mode='r') as train_file:
        train_data = json.load(train_file)
        es_ml.post_data(job_id=job_id, body=train_data)  # job_id is set elsewhere

post_training_data()
This is the specific error I am getting with this:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "..\train_model.py", line 218, in post_training_data
self.es_ml.post_data(job_id=self.job_id, body=train_data)
File "..\inc_anamoly\lib\site-packages\elasticsearch\client\utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
File "..\inc_anamoly\lib\site-packages\elasticsearch\client\xpack\ml.py", line 81, in post_data
body=self._bulk_body(body))
AttributeError: 'MlClient' object has no attribute '_bulk_body'

This turned out to be a bug in the elasticsearch-py client. Issue reported:
https://github.com/elastic/elasticsearch-py/issues/959
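
Until the fix lands, one possible workaround is to skip MlClient.post_data (which hits the missing _bulk_body attribute) and send the payload through the transport layer yourself. The sketch below is an assumption-heavy illustration, not the library's documented API: post_training_data_raw is a hypothetical helper, and the endpoint path assumes a 6.x X-Pack cluster. It formats the documents as one JSON object per line, which satisfies the whitespace-separated format the docs describe.

import json

def post_training_data_raw(es, job_id, records):
    # One JSON object per line (NDJSON) satisfies the documented
    # "separated by whitespace" format for the post-data endpoint.
    payload = "\n".join(json.dumps(record) for record in records)
    # Hypothetical direct call to the 6.x X-Pack ML endpoint; the path and
    # version compatibility are assumptions, so verify against your cluster.
    return es.transport.perform_request(
        'POST',
        '/_xpack/ml/anomaly_detectors/%s/_data' % job_id,
        body=payload,
    )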

Related

openapi-python-client write-only field is required in GET request

I'm using DRF + openapi-python-client for my project.
I have two questions for now:
1. openapi-python-client generate does not accept a URL for generating the client.
2. The generated client fails to serialize a model that has a write-only file field.
Thanks in advance!
Launching my server with the manage.py script works fine; I can get my OpenAPI schema at http://127.0.0.1:8000/control_center/schema/?format=openapi-json in my browser. At the same time, the command openapi-python-client generate --url http://localhost:8000/control_center/schema?format=openapi-json gives me
Traceback (most recent call last):
File "pydantic/main.py", line 522, in pydantic.main.BaseModel.parse_obj
TypeError: 'NoneType' object is not iterable
What am I missing? A workaround for me is to copy the schema from the browser into a JSON file and run openapi-python-client generate --path schema.json.
I have a model StorageFile with a write-only file field. In GET responses this field is omitted; getting the file content is a separate endpoint. Getting storage file details with a browser or with a manual request is fine, but using the generated client like
file_details: models.StorageFile = retrieve_storage_file.sync(
    id=file.id, client=self.api_client)
gives me the following error:
Traceback (most recent call last):
File ".../api_client_tests/test_storage_files_api.py", line 18, in test_can_get_storage_file_details
file_details: models.StorageFile = retrieve_storage_file.sync(id=file.id,
File ".../api_client/control_center_client/control_center_client/api/control_center/retrieve_storage_file.py", line 86, in sync
return sync_detailed(
File ".../api_client/control_center_client/control_center_client/api/control_center/retrieve_storage_file.py", line 70, in sync_detailed
return _build_response(response=response)
File ".../api_client/control_center_client/control_center_client/api/control_center/retrieve_storage_file.py", line 43, in _build_response
parsed=_parse_response(response=response),
File ".../api_client/control_center_client/control_center_client/api/control_center/retrieve_storage_file.py", line 32, in _parse_response
response_200 = StorageFile.from_dict(response.json())
File ".../api_client/control_center_client/control_center_client/models/storage_file.py", line 128, in from_dict
file = File(payload=BytesIO(d.pop("file")))
KeyError: 'file'
In schema.json this field is described as

"file": {
    "type": "string",
    "format": "binary",
    "writeOnly": true
}

What am I missing here?
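
The traceback above points at the cause: the generated from_dict pops "file" unconditionally, but a writeOnly field is never present in a GET response, hence the KeyError. Below is a minimal sketch of a local patch; parse_file_field is a hypothetical helper, and the tolerant pop() default is an assumption, not official openapi-python-client behavior.

from io import BytesIO

def parse_file_field(d):
    # d is the decoded GET payload (a dict). The generated line
    #     file = File(payload=BytesIO(d.pop("file")))
    # raises KeyError because the write-only field is absent from responses.
    # Popping with a default treats the field as optional instead:
    raw = d.pop("file", None)
    return BytesIO(raw) if raw is not None else None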

Unable to read json file with python

I am reading a JSON file with Python using the code below:

import json

Ums = json.load(open('commerceProduct.json'))
for um in Ums:
    des = um['description']
    if des == None:
        um['description'] = "Null"
        with open("sample.json", "w") as outfile:
            json.dump(um, outfile)
        break
It is giving me the following error:
Traceback (most recent call last):
File "test.py", line 2, in <module>
Ums = json.load(open('commerceProduct.json'))
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5528 (char 5527)
When I check the JSON file, it looks fine.
The thing is, it has one object per line, with the delimiter being '\n'.
It is not corrupted, since I have imported the same file into Mongo.
Can someone please suggest what could be wrong with it?
Thanks.
Your JSON data is not in a valid format; a single slip will trip up the Python parser. Test your JSON data with a JSON validator to make sure it is correctly formatted.
The line return _default_decoder.decode(s) is reached when the Python parser finds something wrong in your JSON.
The code itself is valid and will work with a valid JSON doc.
You have one json object per line? That's not a valid json file. You have newline-delimited json, so consider using the ndjson package to read it. It has the same API as the json package you are familiar with.
import ndjson
Ums = ndjson.load(open('commerceProduct.json'))
...
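
If you'd rather not add a dependency, the same newline-delimited file can also be read with the standard json module, one line at a time. A minimal sketch, assuming the same commerceProduct.json as above:

import json

Ums = []
with open('commerceProduct.json') as f:
    for line in f:
        line = line.strip()
        if line:  # skip any blank lines
            Ums.append(json.loads(line))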

Parsing Google Analytics API Python json response into python dataframe

Trying to parse a Google Analytics API JSON response into a Python dataframe, and then ETL it to MS SQL Server using Python.
I get a successful output called feed:
import json, gdata

data_query = gdata.analytics.client.DataFeedQuery({
    'ids': 'ga:67981229',
    'dimensions': 'ga:userType,ga:sessionCount,ga:source',  ## ga:source,ga:medium
    'metrics': 'ga:pageviews',
    ## 'filters': 'ga:pagePath==/my_url_comes_here/',
    ## 'segment': '',
    'start-date': '2015-01-01',
    'end-date': '2015-01-03',
    'prettyprint': 'true',
    'output': 'json',
})
feed = my_client.GetDataFeed(data_query)
However, when I try to parse the data using this code, it doesn't work and I get the error below:
response = json.parse(feed) ## I also tried json.load(feed) and json.loads(feed)
data = json.parse(feed)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    data = json.parse(feed)
AttributeError: 'module' object has no attribute 'parse'
data = json.loads(feed)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    data = json.loads(feed)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
data = json.load(feed)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    data = json.load(feed)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 286, in load
    return loads(fp.read(),
AttributeError: 'DataFeed' object has no attribute 'read'
I have already imported all of json, as seen at the top. Furthermore, my end objective is to ETL this into MS SQL Server, so any help on an effective method to do this with a JSON Python object would help a lot! Thanks!
Instead of parsing the JSON response manually into a dataframe, you could try the Pandas library, which has built-in methods to query the Google Analytics API. Once you get your Google Analytics metrics into a dataframe, you can insert the records into SQL Server using the to_sql method.
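
A rough sketch of that approach, assuming an older pandas release that still ships the pandas.io.ga module (it has since been removed) and a hypothetical SQL Server connection string; the read_ga arguments mirror the query above but should be checked against your pandas version:

import pandas as pd
from pandas.io import ga                 # only present in older pandas releases
from sqlalchemy import create_engine

# Pull the same dimensions/metrics as the gdata query above.
df = ga.read_ga(
    metrics=['pageviews'],
    dimensions=['userType', 'sessionCount', 'source'],
    start_date='2015-01-01',
    end_date='2015-01-03',
)

# ETL step: write the dataframe to MS SQL Server.
engine = create_engine('mssql+pyodbc://user:password@my_dsn')  # hypothetical DSN
df.to_sql('ga_pageviews', engine, if_exists='replace', index=False)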

reading a file with JSON data with Python throws an error that I cannot identify

I have a JSON file named json.txt with the following data:
{"id":99903727,"nickname":"TEST_MLA_OFF","registration_date":"2010-12-03T14:19:33.000-04:00","country_id":"AR","user_type":"normal","logo":null,"points":0,"site_id":"MLA","permalink":"http://perfil.mercadolibre.com.ar/TEST_MLA_OFF","seller_reputation":{"level_id":null,"power_seller_status":null,"transactions":{"period":"12 months","total":25,"completed":25,"canceled":0,"ratings":{"positive":0,"negative":0,"neutral":1}}},"status":{"site_status":"deactive"}}
I obtained it using wget. I tried to load that JSON data with Python using the following code:
json_data = json.load('json.txt')
data = json.load(json_data)
json_data.close()
print data
but that throws the following error,
Traceback (most recent call last):
File "json-example.py", line 28, in <module>
main()
File "json-example.py", line 21, in main
json_data = json.load('json.txt')
File "/opt/sage-4.6.2-linux-64bit-ubuntu_8.04.4_lts-x86_64-Linux/local/lib/python/json/__init__.py", line 264, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
I couldn't find the reason for the error by googling.
Best regards.
Even better practice is to use the with statement:

with open('json.txt', 'r') as json_file:
    data = json.load(json_file)

This makes sure the file gets closed properly without you worrying about it.
You need to give json.load a file stream object:
json_file = open('json.txt')
data = json.load(json_file)
json_file.close()
print data

Yahoo BOSS Python Library, ExpatError

I tried to install the Yahoo BOSS mashup framework, but am having trouble running the examples provided. Examples 1, 2, 5, and 6 work, but 3 & 4 give Expat errors. Here is the output from ex3.py:
$ python examples/ex3.py
examples/ex3.py:33: Warning: 'as' will become a reserved keyword in Python 2.6
Traceback (most recent call last):
File "examples/ex3.py", line 27, in <module>
digg = db.select(name="dg", udf=titlef, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news")
File "/usr/lib/python2.5/site-packages/yos/yql/db.py", line 214, in select
tb = create(name, data=data, url=url, keep_standards_prefix=keep_standards_prefix)
File "/usr/lib/python2.5/site-packages/yos/yql/db.py", line 201, in create
return WebTable(name, d=rest.load(url), keep_standards_prefix=keep_standards_prefix)
File "/usr/lib/python2.5/site-packages/yos/crawl/rest.py", line 38, in load
return xml2dict.fromstring(dl)
File "/usr/lib/python2.5/site-packages/yos/crawl/xml2dict.py", line 41, in fromstring
t = ET.fromstring(s)
File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 963, in XML
parser.feed(text)
File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 1245, in feed
self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0
It looks like both examples are failing when trying to query Digg.com. Here is the query that is constructed in ex3.py's code:
diggf = lambda r: {"title": r["title"]["value"], "diggs": int(r["diggCount"]["value"])}
digg = db.select(name="dg", udf=diggf, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news")
The problem is the Digg search query string: it should be "s=", not "search=".
I believe that must be an error in the example: it's getting a JSON result. Indeed, if you copy and paste that URL into your browser, you'll download a file named search.json which starts with

{"results":[{"profile_image_url":
"http://a3.twimg.com/profile_images/255524395/KEN_OMALLEY_REVISED_normal.jpg",
"created_at":"Mon, 14 Sep 2009 14:52:07 +0000","from_user":"twilightlords",

i.e. perfectly normal JSON; but then instead of parsing it with modules such as json or simplejson, it tries to parse it as XML -- and obviously this attempt fails.
I believe the fix (which probably needs to be brought to the attention of whoever maintains that code so they can incorporate it) is either to ask for XML instead of JSON output, OR to parse the resulting JSON with appropriate means instead of trying to look at it as XML (not sure how to best implement either change, as I'm not familiar with that code).
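
A minimal sketch of the second option: fetch the feed and hand it to a JSON parser instead of ElementTree. The URL is the one from ex3.py; simplejson is assumed because the examples target Python 2.5, where the stdlib json module does not yet exist, and the endpoint itself has long since changed.

import urllib2
import simplejson                  # stdlib json only arrived in Python 2.6

url = ("http://digg.com/rss_search?search=google+android"
       "&area=dig&type=both&section=news")
payload = urllib2.urlopen(url).read()

# Parse the payload as JSON instead of feeding it to ET.fromstring(),
# which is what raised the ExpatError above.
data = simplejson.loads(payload)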
