I want to programmatically (w/ Python 3.10.5) query an ElasticSearch (v7.16.2) index and receive a list of fields identical to what Kibana would show under "Available Fields" for the index.
I've seen a few versions of this question asked. I think one answer involved an API call that is no longer available. I tried the "Field usage stats API" but there were always fields missing. Using the "Get field mapping API" returns everything in the mapping which is way too broad. A couple of answers have suggested that it isn't possible to do this, which makes me curious as to how Kibana knows which fields are available for each index.
These are the lines of code I used to pull the fields from each index.
field_usage_stats = es.indices.field_usage_stats(index=index)  # too few fields returned
field_mapping = es.indices.get_mapping(index=index)  # too many fields returned
I reviewed the raw data returned by each of these methods to ensure the issue wasn't an error in my parsing.
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/field-usage-stats.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/indices-get-mapping.html
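For what it's worth, Kibana appears to build its field list for an index pattern from the field capabilities API, with the "Available" subset then narrowed to fields actually present in the sampled documents. So a sketch like this (the connection details and index name are placeholders) may be the closest programmatic equivalent:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder connection
index = "my-index"  # placeholder index name

# Field capabilities: the searchable/aggregatable fields across the index,
# which is what Kibana's field list is derived from.
caps = es.field_caps(index=index, fields="*")
fields = sorted(caps["fields"].keys())
print(fields)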
I am trying to get additional fields when making a call through Asana's Python API Tasks.find_by_project(). My code for the call is:
project_tasks = Tasks(self.client).find_by_project(project_gid, opt_fields=["name", "memberships", "gid"])
And I am getting:
{'id': 408541814417314, 'gid': '408541814417314', 'memberships': [{}], 'name': 'Reports - Develop quality control report to run for MES'}
It seems like I can only access the fields that are populated in the compact task record, but I need additional fields and would like to get them without re-looping through all the tasks and fetching each complete task. Oddly, memberships comes back as a list containing an empty dict, even though when I look at the full task record there are memberships for this task.
I saw this question, which seems to be similar but the given (attempted) solution doesn't work for me (I get no additional fields):
How can I access custom fields from Asana API using Python?
In case anyone else runs into this issue, I had to work with Asana to get this figured out. memberships doesn't return any nested data on its own; you have to request the nested fields, e.g. Tasks(self.client).find_by_project(project_gid, opt_fields=["name", "memberships.section", "gid"]) or Tasks(self.client).find_by_project(project_gid, opt_fields=["name", "memberships.project", "gid"]). You can also apparently pass opt_expand=['memberships'] to get all of the data.
from asana:
Thanks for your patience!
We heard back from our Platform Team regarding the issue. What you are experiencing is currently expected behavior, but it is not intuitive, because the membership object doesn't have any data of its own.

If you want the nested data, you can specify which fields you want with opt_fields=['memberships.project', 'memberships.section'] in the request. Another option is to use opt_expand=['memberships'] to get all of the data.

Hope this helps! Let me know if there's anything else I can assist you with.
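Putting it together, a minimal sketch of the two working variants (assuming, as in the question, that self.client is an authenticated asana.Client and Tasks comes from the same client library):

# Request only the nested membership fields you need:
project_tasks = Tasks(self.client).find_by_project(
    project_gid,
    opt_fields=["name", "memberships.project", "memberships.section", "gid"],
)

# Or expand the membership objects to get all of their data:
project_tasks = Tasks(self.client).find_by_project(
    project_gid,
    opt_expand=["memberships"],
)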
How do you find the database that a record was loaded from, utilizing Django's multiple database support?
I know how to retrieve a record from the non-default database by doing:
record = MyModel.objects.using('otherdatabase').get(id=123)
But given the record, how do I look up the using value? I tried:
record._default_manager.db
but this always returns default, regardless of the value I sent to using().
_state seems to hold what you're looking for.
record._state.db
If you're interested, it's used internally in the source code here: https://docs.djangoproject.com/en/dev/_modules/django/db/models/base/
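A minimal sketch, reusing MyModel and the 'otherdatabase' alias from the question:

record = MyModel.objects.using('otherdatabase').get(id=123)
print(record._state.db)  # -> 'otherdatabase'

record = MyModel.objects.get(id=123)  # routed to the default database
print(record._state.db)  # -> 'default'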
Let's take an example: I run a blog that automatically updates its posts.
I would like to keep entities of the class (= model) BlogPost in two different "groups", one called "FutureBlogPosts" and one called "PastBlogPosts".
This is a reasonable division that would let me work with my blog posts efficiently (query them separately, etc.).
Basically, the problem is that the "kind" of my model will always be "BlogPost". So how can I separate it into two different groups?
Here are the options I found so far:
Duplicating the same model class code twice (once as a FutureBlogPost class and once as a PastBlogPost class, so their kinds will be different) -- seems quite ridiculous.
Putting them under different ancestors (FutureBlogPost, "SomeConstantValue", BlogPost, #id), but this method also has its implications (1 write per second?), and the whole ancestor-child relationship doesn't seem to fit here. (And why would I have to use "SomeConstantValue" if I chose that option?)
Using different namespaces -- seems too radical for such a simple separation
What is the right way to do it?
Well, it seems like I finally found the relevant article.
As I understand it, pulling all entities of a specific kind and pulling them by a specific property make no difference; both require the same type of work in the background.
(Querying by a specific full key, however, is still faster.)
So basically, adding a property named "Type" (or whatever property you want to use to split your entities into groups) is just as useful as giving them a different kind.
Read more here: https://developers.google.com/appengine/articles/storage_breakdown
As you see, both EntitiesByKind and EntitiesByProperty are nothing but index tables to the original key.
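As a hedged sketch of that property-based split, with the old google.appengine.ext.db API (the model and property names are illustrative):

from google.appengine.ext import db

class BlogPost(db.Model):
    title = db.StringProperty()
    post_type = db.StringProperty(choices=('future', 'past'))  # the "Type" property

# Filtering on the property walks the EntitiesByProperty index table,
# much like filtering by kind walks EntitiesByKind.
future_posts = BlogPost.all().filter('post_type =', 'future').fetch(100)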
Finally, an answer.
Why not just put a boolean in your "BlogPost" entity: 0 if it's past, 1 if it's future? That will let you query them separately easily.
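Or, with the boolean this answer suggests instead of the string property (same sketch as above otherwise):

is_future = db.BooleanProperty()  # True = future post, False = past post

future_posts = BlogPost.all().filter('is_future =', True).fetch(100)
past_posts = BlogPost.all().filter('is_future =', False).fetch(100)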
Is there a difference in the results I can expect from this code:
query = MyModel.all(keys_only=True).filter('myFlag', True)
keys = list(query)
models = db.get(keys)
versus this code:
query = MyModel.all().filter('myFlag', True)
models = list(query)
i.e, will models be the same in both?
If not, why not? I had thought that eventual consistency is used to describe how indices for models take a while to update and can therefore be inconsistent with the most recently written data.
But I recently experienced a case where I was actually getting stale data from a query like the second one, where model.myFlag was True for the models retrieved via query but False when I actually got the model via key.
So in that case, where is the data for myFlag coming from?
Is it that getting an entity via key ensures replication across the datastore nodes and returns the latest data, whereas getting it via query simply retrieves the data from the nearest datastore node?
Edit:
I read this article, and assuming the Cloud Datastore works the same way as the Appengine Datastore, the answer to my question is yes, entities returned from queries may have stale values.
https://cloud.google.com/developers/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore#h.tf76fya5nqk8
Yes, as you mentioned, queries may return stale values. When doing a query, the datastore chooses performance over consistency.

More in-depth: for an entity group, each node has a log of writes which have not been applied yet. When you execute a read or an ancestor query, the entity groups involved first have their logs applied. However, when you execute a normal query, the results could come from any entity group, so the entity groups are not caught up.

Be careful with the first code example, though: the indexes used to actually find those entities may not be up to date, so it is very possible to miss entities with myFlag = True. If you are interested, I would recommend reading the Megastore paper.
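To make the trade-off concrete, here is a hedged sketch with the old db API (MyModel reconstructed from the question). The keys-only query can still miss or over-match entities because of a stale index, but the entity data returned by db.get() is current, so re-checking the flag drops false positives:

from google.appengine.ext import db

class MyModel(db.Model):
    myFlag = db.BooleanProperty()

# The index may lag: recently flipped flags can be missing or extra here.
keys = list(MyModel.all(keys_only=True).filter('myFlag', True))

# Gets by key apply pending writes for the entity groups involved,
# so these entities carry current values.
models = db.get(keys)
fresh = [m for m in models if m is not None and m.myFlag]  # drop stale matches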
I just started using Flask and SQLAlchemy in Flask.
So I have a many-to-many relationship using the example here http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
If you scroll down to the part about keywords and tags, that's what I am working on.
So far I am able to insert new Keywords related to my Post, and I am using append(), which I know is wrong. What happens is that the next time a non-unique keyword occurs in a blog post, it throws a conflict error for the Keyword (since keywords are supposed to be unique).
I know the right way is something else, I just don't know what. I have seen an example of
get_or_create(keyword), which basically filters by the keyword and then adds it if not found. However, I believe that as the data size grows this will also be wrong (several extra queries on every save for a single insert). I love the way SQLAlchemy does multiple inserts automatically; I want to keep that but avoid this duplicate-key issue.
Edit: found the solution. The SQLAlchemy docs guide you towards the error, but the explanation is in there. I have added the answer.
OK, after hours of trial and error I found the solution, plus some things I was doing wrong.
This is how SQLAlchemy works: the answer is merge.
Make a list of tags as Tag models; it doesn't matter if they already exist, as long as your primary key is the name or something unique.

tags = [Tag('a1'), Tag('a2')]

Say you already have Tag 'a1' in the DB, but we don't really care. All we want is to insert the related data only if it does not exist, which is what SQLAlchemy's merge does.
Now you make a Post with the list of all the tags we made. Even if there is only one tag, it still goes in a list. Therefore:

new_post = Post('a great new post', post_tags=tags)
db.session.merge(new_post)
db.session.commit()

I have used Flask-SQLAlchemy syntax, but the idea is the same. Just make sure you are not creating the models outside the session; most likely you won't be.
This was actually simple, but this example is mentioned nowhere in the SQLAlchemy docs. They use append(), which is wrong here: append() is only for creating new tags when you know you are not adding duplicates.
Hope it helps.
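For anyone who wants to run the pattern end to end, here is a minimal self-contained sketch (Flask-SQLAlchemy; the model, table, and column names are illustrative, with the tag name as primary key so merge() can match existing rows):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///blog.db'
db = SQLAlchemy(app)

# Association table for the many-to-many relationship.
post_tags = db.Table(
    'post_tags',
    db.Column('post_id', db.Integer, db.ForeignKey('post.id')),
    db.Column('tag_name', db.String, db.ForeignKey('tag.name')),
)

class Tag(db.Model):
    name = db.Column(db.String, primary_key=True)  # unique name as the PK

class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String)
    tags = db.relationship('Tag', secondary=post_tags)

with app.app_context():
    db.create_all()
    # 'a1' may already exist in the DB; merge() reconciles instead of failing.
    tags = [Tag(name='a1'), Tag(name='a2')]
    new_post = Post(title='a great new post', tags=tags)
    db.session.merge(new_post)  # cascades the merge to the related tags
    db.session.commit()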