Deleting an Index in Elasticsearch - Python

I want to write a python program to delete a whole index.
The following code works: I am able to connect to my Elasticsearch instance, create a client, and perform searches. But when it comes to deleting an index, I am failing.
The following block of code runs perfectly and returns the results.
from elasticsearch import Elasticsearch

ELASTIC_PASSWORD = "password"

client = Elasticsearch(
    "http://username:password@localhost:9200",
    basic_auth=("elastic", ELASTIC_PASSWORD)
)
print(client.info())

user_request = "some information"
query_body = {
    "query": {
        "bool": {
            "must": {
                "match": {
                    "Log-Message": user_request
                }
            }
        }
    }
}
result = client.search(index="my_index", body=query_body)
print("query hits:", result["hits"]["hits"])
Now, in the next step, I am trying to delete the index named my_index using the following code and I get this error.
result = client.delete(index='my_index')
print(result)
The error is:
return func(*args, params=params, headers=headers, **kwargs)
TypeError: delete() takes at least 3 arguments (4 given)
According to the Elasticsearch Python API documentation, I think I am using the correct function to delete the index.
But now I am not sure whether I am using the wrong delete-index function or making a mistake in how I call the Python functions.
Any kind of information would be useful!
Thanks in advance!
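For reference, a minimal sketch of the distinction in the elasticsearch-py client (assuming a recent client and the placeholder host and credentials from above): client.delete removes a single document and therefore expects an id, while deleting a whole index goes through client.indices.

from elasticsearch import Elasticsearch

# Sketch only; the host and credentials are the placeholders used in the question.
client = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "password"),
)

# client.delete targets one document, so it requires an id:
# client.delete(index="my_index", id="some-document-id")

# Deleting the whole index lives under the indices namespace:
response = client.indices.delete(index="my_index")
print(response)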

Related

Trouble comparing JSON with Python

I have to implement a web service using Flask and MongoDB. I have two collections. The JSON files are shown below:
File products.json:
{ "name":"red apples", "price":0.5, "description":"Apples from Greece", "category":"fruits", "stock":25 }
{ "name":"fish", "price":5, "description":"Fresh fish", "category":"fish", "stock":25 }
{ "name":"pasta", "price":1.5, "description":"Pasta from Italia", "category":"pasta", "stock":25 }
File users.json:
{"name":"George Vaios", "password":"321", "email":"george#gmail.gr", "category":"admin"}
{"name":"Admin", "password":"admin", "email":"admin#gmail.gr", "category":"admin"}
What I'm trying to do?
I'm trying to implement a function called add_product. This function is only allowed for users whose category is admin ("category": "admin"). The route uses the POST method because I'm passing the email of the user along with the details needed to insert a new product into products.json.
Code:
(lines of code for loading data)
user = users.find_one({"email": data['email']})
if (user['category'] == "admin"):
    product = {
        "name": data['name'],
        "price": data['price'],
        "description": data['description'],
        "category": data['category'],
        "stock": data['stock']
    }
    products.insert_one(product)
    return Response("Product was added successfully", status=200, mimetype='application/json')
else:
    return Response("Account has no permission. Must be an admin", status=400, mimetype='application/json')
With that code I'm trying to check if the user is an admin by finding him in the JSON file using the find_one method and then comparing user['category'] with the string 'admin'.
Data that I'm passing via Postman:
{
    "email": "george@gmail.gr",
    "name": "chocolate",
    "price": 25,
    "description": "mklkgj",
    "category": "sweets",
    "stock": 30
}
Error that I get:
File "/home/michael/main.py", line 241, in add_product_admin
if (user['category']=="admin"):
TypeError: 'NoneType' object is not subscriptable
I can't understand why the if statement doesn't make the comparison. Any thoughts?
Mongo is telling you that nothing came up when searching for that value. I would start by checking if data['email'] has the value you would expect and if your collection is initialized correctly.
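To make that concrete, a minimal sketch of guarding the find_one result, using the names from the question (users, products, data, Response); the 404 status is just an illustrative choice:

user = users.find_one({"email": data.get("email")})

if user is None:
    # find_one() returns None when no document matches, which is exactly what
    # produces "'NoneType' object is not subscriptable" on user['category'].
    return Response("No user found for that email", status=404, mimetype='application/json')

if user["category"] == "admin":
    # Build and insert the product as in the original code.
    product = {key: data[key] for key in ("name", "price", "description", "category", "stock")}
    products.insert_one(product)
    return Response("Product was added successfully", status=200, mimetype='application/json')

return Response("Account has no permission. Must be an admin", status=400, mimetype='application/json')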
Thanks to everyone who explained to me what's going on. The actual problem was with the MongoDB client parameters in the Python script; that's why the entries in the JSON files were unreadable. If anyone comes up against this problem again, be sure that you have checked:
# Choose database
database = client['database you want to access']
Then don't forget to check with the mongo shell that your expected collections exist:
db.<your collection name>.find({})
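The same check can be done from Python with pymongo (a sketch; the connection string and the database name are placeholders, the users collection is from the question):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
database = client["your_database_name"]

# List the collections that actually exist in the selected database.
print(database.list_collection_names())

# Peek at a few documents from the collection you expect to query.
for doc in database["users"].find().limit(3):
    print(doc)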
Thank you for your time!

Nested Dict As HttpRequest Django

I am trying to write some test cases for code I've developed using Elasticsearch and Django. The concept is straightforward: I just want to test a GET request, which will be an Elasticsearch query. However, I am constructing the query as a nested dict. When I pass the nested dict to the Client object in the test script, it gets passed through Django until it ends up at the urlencode function, which doesn't look like it can handle nested dicts, only MultiValueDicts. Any suggestions or solutions? I don't want to use any additional packages, as I don't want to depend on potentially unsupported packages for this application.
Generic Code:
import elasticsearch
from django.test import Client, TestCase


class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")

    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        response = client.get("", query)
        print(response)
Link for urlencode function: urlencode Django
The issue is clearly at the conditional statement where the urlencode function checks if the dictionary value is a str or bytes object. If it isn't, it creates a generator object which can never access the nested portions of the dictionary.
EDIT: 07/25/2018
So I was able to come up with a temporary workaround to at least run the test. However, it is ugly and I feel like there must be a better way. The first thing I tried was specifying the content_type and converting the dict to a JSON string first. However, Django still kicked back an error in the urlencode function.
class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")

    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        response = client.get("", data=json.dumps(query), content_type="application/json")
        print(response)
So instead I had to do:
class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")

    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        query = json.dumps(query)
        response = client.get("", data={"q": query}, content_type="application/json")
        print(response)
This let me send the HttpRequest to my View and parse it back out using:
json.loads(request.GET["q"])
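For completeness, a sketch of what the view side of this workaround could look like (the view name and index are hypothetical; it assumes the older elasticsearch-py client used above, whose search() returns a plain dict):

import json

from django.http import JsonResponse
from elasticsearch import Elasticsearch

es_connection = Elasticsearch("localhost:9200")


def my_view(request):
    # The test client sent the query as an urlencoded "q" parameter,
    # so decode it back into a dict before handing it to Elasticsearch.
    query = json.loads(request.GET["q"])
    result = es_connection.search(index="my_index", body=query)
    return JsonResponse(result)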
Then I was able to successfully get the requested data from Elasticsearch and return it as an HttpResponse. I feel like there has to be a way in Django to just pass a JSON-formatted string directly to the Client object's get function. I thought specifying the content_type as application/json would work, but it still calls the urlencode function. Any ideas? I really don't want to put this current system into production.
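One way to bypass urlencode entirely (a sketch, not necessarily the idiomatic answer) is to send the JSON as the raw request body with the test client's generic() method and read request.body in the view; the path here is a placeholder:

import json

from django.test import Client

client = Client()
query = {"query": {"term": {"city": "some city"}}}

# generic() sends the payload as the raw request body, so urlencode is never called.
response = client.generic("GET", "/my-view/", data=json.dumps(query),
                          content_type="application/json")

# In the view, the payload is then available as:
#     query = json.loads(request.body)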

update elasticsearch data by using elasticsearch-dsl

How can I update Elasticsearch data using the elasticsearch-dsl package? Is that possible?
I found the Elasticsearch update API, but it seems a bit difficult. What I am looking for is:
searchObj = Search(using=logserver, index=INDEX)
searchObj=searchObj.query("term",attribute=value).update(attribute=new_value)
response = searchObj.execute()
@kingArther's answer is not correct.
elasticsearch-dsl supports update very well!
By mapping an index to an object (DocType), it allows
you to save and update easily without any raw JSON REST requests.
You can find examples and the API here
It's probably late, but I am leaving this reply for those who are still having the same issue:
elasticsearch-dsl offers an update function that can be called on instances of classes extending the elasticsearch-dsl Document class. The following is the code:
data = yourIndexClass.get(id=documentIdInIndex)
data.update(key=NewValue)
That is it. Simple. Find details Here
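To make that concrete, a minimal sketch (the index name, field, and document id are placeholders, not taken from the original posts):

from elasticsearch_dsl import Document, Text, connections

# Assumed local cluster; adjust hosts as needed.
connections.create_connection(hosts=["localhost:9200"])


class LogEntry(Document):
    attribute = Text()

    class Index:
        name = "myindex"


# Fetch an existing document by id and update a single field in place.
entry = LogEntry.get(id="some-document-id")
entry.update(attribute="new_value")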
If I'm not wrong, elasticsearch_dsl doesn't have an option for update/bulk update.
So, if you like, you can use the elasticsearch-py package for the same.
Example
from elasticsearch import Elasticsearch

INDEX = 'myindex'
LOG_HOST = 'myhost:myport'
logserver = Elasticsearch(LOG_HOST)

script = "ctx._source.attribute = new_value"
update_body = {
    "query": {
        "bool": {
            "filter": {
                "term": {"attribute": "value"}
            }
        }
    },
    "script": {
        "source": script,
        "lang": "painless"
    }
}
update_response = logserver.update_by_query(index=INDEX, body=update_body)
For more information, see this official documentation

"Response too large to return" on simple SELECT in BigQuery, even with allowLargeResults=True?

I am using BigQuery with Python. I am trying to work out how to run a simple SELECT query, but I am getting errors about large results.
I have tested my query in the BigQuery interface before writing it in Python. It runs fine, returns 1 row, takes 4.0 seconds and processes 18.2GB. The underlying table is about 150GB, 200m rows.
This is my code:
credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)

try:
    query_request = bigquery_service.jobs()
    query_data = {
        "allowLargeResults": True,
        'query': (
            'SELECT org_code, item_code FROM [mytable] ',
            "WHERE (time_period='201501') ",
            "AND item_code='0212000AAAAAAAA' ",
            "AND (org_code='B82005') "
            "LIMIT 10;"
        )
    }
    print ' '.join(query_data['query'])
    response = query_request.query(
        projectId=project_id,
        body=query_data).execute()
    job_ref = response['jobReference']
    print 'job_ref', job_ref
except HttpError as err:
    print('Error: {}'.format(err.content))
    raise err
This is the output I get:
SELECT org_code, item_code FROM [mytable] WHERE (time_period='201501') AND (item_code='0212000AAAAAAAA') AND (org_code='B82005') LIMIT 10;
Error: {
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "responseTooLarge",
        "message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
      }
    ],
    "code": 403,
    "message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
  }
}
Traceback (most recent call last):
File "query.py", line 93, in <module>
main(args.project_id)
File "query.py", line 82, in main
raise err
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/bigquery/v2/projects/824821804911/queries?alt=json returned "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors">
There are a couple of different things that confuse me about this:
It says I should use allowLargeResults, even though I already am.
It's giving me a warning about large results, although this is a straightforward SELECT query with no grouping, and it returns 1 row.
I understand that the warning will fire if any part of the query processing becomes too large. But I don't really know how to get round this, given the query I'm doing is just a SELECT with no grouping etc. I'm not even using SELECT *.
Surely the whole point of BigQuery is that it can handle this kind of thing?
How can I fix this problem?
If configuration.query.allowLargeResults is set to true, it also requires configuration.query.destinationTable to be set.
You should either add a destinationTable object or (as your output seems to be small) set allowLargeResults to false.
Added example of configuration:
'query': {
    'query': 'my_query_text',
    'destinationTable': {
        'projectId': 'my_project',
        'datasetId': 'my_dataset',
        'tableId': 'my_table'
    },
    'createDisposition': 'CREATE_IF_NEEDED',
    'writeDisposition': 'WRITE_TRUNCATE',
    'allowLargeResults': True
}
I took a look at your job; you're not setting allowLargeResults, and you're also not using a limit or a filter (your query was essentially just selecting two fields from the table).
There are two ways to run a query in the BigQuery API. The first is to call jobs.query(). This is the 'simple' way, but it lacks some bells and whistles. The other is to call jobs.insert() with a query job configuration. This has full support for things like setting a destination table and allowing large results.
It looks like you are calling the former (jobs.query()), but you want jobs.insert().
It is easier than it sounds to use the more fully-fledged jobs.insert() call. You can get the job id back from the jobs.insert() call, and then pass that to jobs.getQueryResults() to get the query results; the format of the results returned by that method is the same as calling jobs.query(). Check out the sample code here.
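A rough sketch of that flow, reusing the bigquery_service and project_id objects from the question (the query text and destination table ids are placeholders):

job_data = {
    'configuration': {
        'query': {
            'query': "SELECT org_code, item_code FROM [mytable] WHERE time_period='201501'",
            'allowLargeResults': True,
            'destinationTable': {
                'projectId': 'my_project',
                'datasetId': 'my_dataset',
                'tableId': 'my_results_table'
            }
        }
    }
}

# Insert the query job, then fetch results with the returned job id
# (a real script should also check the jobComplete flag in the response).
insert_response = bigquery_service.jobs().insert(
    projectId=project_id, body=job_data).execute()
job_id = insert_response['jobReference']['jobId']

results = bigquery_service.jobs().getQueryResults(
    projectId=project_id, jobId=job_id).execute()
print(results.get('rows', []))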
Let's clear up some things which are wrong here.
Queries that return large results are subject to additional limitations:
You must specify a destination table.
You can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
The documentation is clear about configuration.query.allowLargeResults: "If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires destinationTable to be set."
Is [mytable] possibly a view rather than a table?
I had the same issue. I solved it by using jobs().insert() instead of jobs().query(). Specify true for allowLargeResults, and also set a destinationTable for the query.
Here is the sample code:
job_data = {
    "jobReference": {
        "projectId": "project_id"
    },
    "configuration": {
        "query": {
            "query": "query",
            "allowLargeResults": True,
            "destinationTable": {
                "projectId": "project_id",
                "tableId": "table_name",
                "datasetId": "dataset_name"
            }
        }
    }
}
return bigquery.jobs().insert(
    projectId="project_id",
    body=job_data).execute()

400 Error while trying to POST to JIRA issue

I am trying to set the 'transition' property in a JIRA issue from whatever it is to completed (which, according to the doc, is 10000). According to the documentation, this error occurs "if there is no transition specified."
Also I have used ?expand=transitions.fields to verify that 10000 is for complete.
using these docs
https://docs.atlassian.com/jira/REST/latest/#api/2/issue-doTransition
https://jira.atlassian.com/plugins/servlet/restbrowser#/resource/api-2-issue-issueidorkey-transitions/POST
Here is my request
import json
import requests

url = 'http://MYURL/rest/api/2/issue/ISSUE-ID/transitions'
payload1 = open('data3.json', 'r').read()
payload = json.loads(payload1)
textFile = requests.post(url, auth=('username', 'password'), json=payload)
The contents of my data3.json file are:
{
    "transition": 10000
}
edit: I also changed my JSON to this and I get a 500 error
{
    "transition": {
        "id": "10000"
    }
}
The error I get
{"errorMessages":["Can not instantiate value of type [simple type,classcom.atlassian.jira.rest.v2.issue.TransitionBean] from JSON integral number;no single-int-arg constructor/factory method (through reference chain:com.atlassian.jira.rest.v2.issue.IssueUpdateBean[\"transition\"])"]}400
I'm pretty confident that my issue is in my JSON file, since I have used GET in the code above this snippet multiple times, but I could be wrong.
Possible cause - https://jira.atlassian.com/browse/JRA-32132
I believe the issue I was having was a process-flow one. I cannot jump straight from my issue being opened to 'completed'. However, I can go from the issue being created to 'Done'.
{
    "transition": {
        "name": "Done",
        "id": "151"
    }
}
As this does what I need, I will use it. If I find how to make the ticket complete, I will post back.
Also, I think the fact that we customize our JIRA led to my getting 'Completed' as a valid transition even though it wasn't.
Yes, you're right that the JSON is wrong: the API expects the transition value to be an object, not a bare number. The doc says:
The fields that can be set on transition, in either the fields parameter or the update parameter, can be determined using the /rest/api/2/issue/{issueIdOrKey}/transitions?expand=transitions.fields resource.
So you need to do a GET request on /rest/api/2/issue/{issueIdOrKey}/transitions?expand=transitions.fields to get the list of possible values, and then set that id in the JSON:
{
    "transition": {
        "id": "an_id_from_response"
    }
}
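Putting the two steps together, a small sketch with the requests library (the URL, credentials, and issue key are the placeholders from the question):

import requests

url = 'http://MYURL/rest/api/2/issue/ISSUE-ID/transitions'
auth = ('username', 'password')

# Step 1: list the transitions currently available for this issue.
available = requests.get(url, auth=auth,
                         params={'expand': 'transitions.fields'}).json()
for transition in available.get('transitions', []):
    print(transition['id'], transition['name'])

# Step 2: POST the chosen transition id wrapped in an object.
payload = {"transition": {"id": "an_id_from_response"}}
response = requests.post(url, auth=auth, json=payload)
print(response.status_code)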
