define parent with elasticsearch-py - python

I am trying to put parent/child relationships into Elasticsearch. I was able to put my parents into ES, but when I try to add the child objects I get the following error:
elasticsearch.exceptions.RequestsError: TransportError(400:, 'illegal_argument_exception', "Can't specify parent if no parent field has been configured")
I understand that I have to tell ES that my parents are actually parents somewhere in the mappings. But where and how?
This is how I do it right now:
# the parent
es.index(index='www01', doc_type='domain', id=k, body=json.dumps(rest))
# the child
es.index(index='www01', doc_type='framework', id=key, body=json.dumps(value), parent=k)
Edit:
Alright. I think I've got it now. You really have to spell out everything that deviates from the default settings explicitly in a mapping, and create an empty index with that mapping first.
In this mapping you have to say that your child type has a parent of a specific type. (The Elasticsearch documentation is pretty clear on this; RTFM.)
After you've created your specific mapping on an empty index, you can put your stuff in.
Note: You don't have to specify everything in your mapping, just everything that is not "standard". So at first it is enough to just give the relationships. If you later want to say that some fields are dates or geo-coordinates, you can specify that later. But keep in mind that you have to empty your index first, so save your data somewhere!
Here is what I did:
def del_all_indices():
    print "----- deleting the following indices (not >>.kibana<<) -----"
    for idx in es.indices.get('*'):
        print idx
        if not idx == ".kibana":
            es.indices.delete(index=idx, ignore=[400, 404])
    print "----- remaining indices -----"
    for idx in es.indices.get('*'):
        print idx

create_index_body = {
    "mappings": {
        "domain": {},
        "framework": {
            "_parent": {
                "type": "domain"
            }
        }
    }
}
del_all_indices()
es.indices.create(index='www01', body=create_index_body)
# the parent
es.index(index='www01', doc_type='domain', id=k, body=json.dumps(rest))
# the child
es.index(index='www01', doc_type='framework', id=key, body=json.dumps(value), parent=k)
It goes without saying that if I'm missing something, please tell me. (I hope this helps, since there is almost no documentation on this topic.)
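For what it's worth, once the mapping is in place you can also query across the relation. Here is a hedged sketch: the query bodies are plain dicts (index and type names match the question), so they can be built without a live cluster; `parent_id` requires ES 5.x, while `has_child`/`has_parent` also work on 2.x:

```python
# Hedged sketch: query bodies for the parent/child mapping above.
# With a live cluster, pass them as es.search(index='www01', body=...).

def children_of(domain_id):
    """All 'framework' docs whose parent is the given 'domain' id.
    (parent_id query; requires ES 5.x - on 2.x use has_parent instead.)"""
    return {
        "query": {
            "parent_id": {
                "type": "framework",   # the CHILD type
                "id": domain_id,
            }
        }
    }

def domains_with_frameworks():
    """All 'domain' docs that have at least one 'framework' child."""
    return {
        "query": {
            "has_child": {
                "type": "framework",
                "query": {"match_all": {}},
            }
        }
    }
```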

Related

What's the most Pythonic way to pull from structured data with an inconsistent maximum "depth"?

I've got a JSON file holding different dialog lines for a Discord bot to use, sorted by which bot command triggers the line. It looks something like this:
{
    "!remind": {
        "responses": {
            "welcome": "Reminder set for {time}.",
            "reminder": "At {time} you asked me to remind you {thing}."
        },
        "errors": {
            "R0": "Invalid reminder time.",
            "R1": "Reminder time is in the past."
        },
        "help": "To set a reminder say `!reminder [time] [thing]`"
    },
    "<!timezone, !spoilimage, !crosspost, etc.>": {
        <same structure>
    }
}
I have a function that's meant to access the values stored in the JSON file, do any necessary formatting using kwargs, and return a string. My original approach was
def dialog(command, category, name, **fmt):
    json_data = <json stuff>
    return json_data[command][category][name].format(**fmt)

# Sample call:
pastcommand = <magic>
reply = dialog("!remind", "responses", "reminder", time=pastcommand.time, thing=pastcommand.message)
# Although in practice I've made wrapper methods to avoid having to specify all of these args each time
But this will only work for "responses" and "errors", not "help", since in "help" the message to send is a level "shallower".
Two other things to note:
It's unlikely there will ever need to be anything in "help" other than the single value.
Currently there are no name conflicts between keys in different subcategories, and it's very easy to keep it that way. However, "responses"/"errors"/"help" is consistent across all categories, and some key names are repeated across categories (although I could change that if necessary).
So, in terms of fixing this, I could always just restructure the JSON file, something like
"help": {"main": "To set a reminder say `!reminder [time] [thing]`"}
but I don't like the idea of turning a string into a dict containing just a single string, just to satisfy the constraints of a function that pulls it.
Beyond that, I've run through a number of options, namely: explicitly checking the category and making it a special case (if category == "help"); trying both options with a try/except block; and using pandas.json_normalize (which I'm pretty sure would work, though I haven't actually used it; either way, any time a seemingly simple problem brings me to a third-party library, it makes me suspect I'm doing something wrong).
What I've settled on, so far, is this:
def dialog(*json_keys, **fmt):
    json_data = <json stuff>
    current_level = json_data
    for key in json_keys:
        # Let's pretend I did error-handling here.
        current_level = current_level[key]
    return current_level.format(**fmt)
It's a lot more elegant and more flexible than any of the other things I considered, but I'm self-taught and pretty inexperienced, and I'm wondering if I'm overlooking some better approach.
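For what it's worth, here is a fleshed-out sketch of that settled approach, with the pretended-away error handling made explicit. The sample data is trimmed from the JSON above; loading `json_data` at module level rather than per-call is my own shortcut:

```python
# Sketch of the variadic dialog() with explicit error handling.
# Sample data trimmed from the question's JSON file.
json_data = {
    "!remind": {
        "responses": {"welcome": "Reminder set for {time}."},
        "help": "To set a reminder say `!reminder [time] [thing]`",
    }
}

def dialog(*json_keys, **fmt):
    current_level = json_data
    for key in json_keys:
        try:
            current_level = current_level[key]
        except (KeyError, TypeError):
            raise KeyError("no dialog entry at %r" % (json_keys,))
    if not isinstance(current_level, str):
        # The keys stopped at a category dict instead of a dialog line.
        raise KeyError("%r is a category, not a dialog line" % (json_keys,))
    return current_level.format(**fmt)

print(dialog("!remind", "responses", "welcome", time="5pm"))
print(dialog("!remind", "help"))  # works despite being a level shallower
```

The isinstance check catches the case where a caller stops one level short, which the original "pretend" comment glossed over.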

Entity Framework: Update multiple objects only if all objects exist

I'm writing a PATCH REST API in C# using Entity Framework that updates a specific field in multiple objects.
Request body:
{
    "ids": [
        "id1",
        "id2"
    ],
    "foo": "bar"
}
I would like to update all objects' foo field to be bar, but only if all objects exist.
I'm trying to keep it clean by not having a preemptive select that checks whether all objects exist (which, by the way, might not be good enough, because if an object exists now it doesn't mean it will still exist a few milliseconds later).
I'm looking for a short solution that would rollback and raise an exception if one of the objects didn't successfully update (or doesn't exist).
The only solution I found is to open a transaction and update each object in a loop, which IMHO isn't the best way because I don't want to hit the database one row at a time.
What's the best way to implement this?
The DbContext.SaveChanges method returns the number of entries written to the database.
In the case of an update, that is the number of updated rows.
So what you want to do is:
Start a new transaction
Execute a single update query for all your IDs together
Check the return value of SaveChanges, and commit if it matches the number of IDs in your query, or roll back otherwise.
The best thing I can come up with is the following:
var ids = new List<int>() { 1, 2, 3, 4, 5, 6, 7 };
var records = db.Records.Where(x => ids.Contains(x.Id));
try
{
    foreach (var i in ids)
    {
        var record = records.FirstOrDefault(x => x.Id == i);
        if (record == null)
        {
            throw new Exception($"Record with Id {i} not found");
        }
        record.Foo = "Bar";
    }
    db.SaveChanges();
}
catch (Exception ex)
{
    // Roll back changes: reset every tracked modification.
    var changedEntries = db.ChangeTracker.Entries()
        .Where(x => x.State != EntityState.Unchanged).ToList();
    foreach (var entry in changedEntries)
    {
        entry.State = EntityState.Unchanged;
    }
}
The reasoning here is that EF implicitly uses a transaction, which is "committed" when you call .SaveChanges(). If something goes wrong, you simply reset the entities' state to Unchanged and never call SaveChanges().

Repeating an extra when using a Dragonfly CompoundRule

Using dragonfly2, the voice command framework, you can make a grammar like so:
chrome_rules = MappingRule(
    name='chrome',
    mapping={
        'down [<n>]': actions.Key('space:%(n)d'),
    },
    extras=[
        IntegerRef("n", 1, 100)
    ],
    defaults={
        "n": 1
    }
)
This lets me press space n times, where n is some integer. But what do I do if I want to use the same variable (n) multiple times in the same grammar? If I repeat it in the grammar, e.g. 'down <n> <n>', and then say something like "down three four", Dragonfly will parse it correctly, but it will only execute actions.Key('space:%(n)d') with n=3, using the first value of n. How can I get it to press space 3 times and then 4 times using the same variable?
Ideally I don't want to have to duplicate the variable n in the extras and defaults, because that seems like redundant code.
TL;DR: Your MappingRule passes data to your Action (e.g. Key, Text) in the form of a dictionary, so it can only pass one value per extra. Your best bet right now is probably to create multiple extras.
This is a side-effect of the way dragonfly parses recognitions. I'll explain it first with Action objects, then we can break down why this happens at the Rule level.
When Dragonfly receives a recognition, it has to deconstruct it and extract any extras that occurred. The speech recognition engine itself has no trouble with multiple occurrences of the same extra, and it does pass that data to dragonfly, but dragonfly loses that information.
All Action objects are derived from ActionBase, and this is the method dragonfly calls when it wants to execute an Action:
def execute(self, data=None):
    self._log_exec.debug("Executing action: %s (%s)" % (self, data))
    try:
        if self._execute(data) == False:
            raise ActionError(str(self))
    except ActionError as e:
        self._log_exec.error("Execution failed: %s" % e)
        return False
    return True
This is how Text works, same with Key. It's not documented here, but data is a dictionary of extras mapped to values. For example:
{
    "n": "3",
    "text": "some recognized dictation",
}
See the issue? That means we can only communicate a single value per extra. Even if we combine multiple actions, we have the same problem. For example:
{
    "down <n> <n>": Key("%(n)d") + Text("%(n)d"),
}
Under the hood, these two actions are combined into an ActionSeries object - a single action. It exposes the same execute interface. One series of actions, one data dict.
Note that this doesn't happen with compound rules, even if each underlying rule shares an extra with the same name. That's because data is decoded & passed per-rule. Each rule passes a different data dict to the Action it wishes to execute.
If you're curious where we lose the second extra, we can navigate up the call chain.
Each rule has a process_recognition method. This is the method that's called when a recognition occurs. It takes the current rule's node and processes it. This node might be a tree of rules, or it could be something lower-level, like an Action. Let's look at the implementation in MappingRule:
def process_recognition(self, node):
    """
    Process a recognition of this rule.

    This method is called by the containing Grammar when this
    rule is recognized. This method collects information about
    the recognition and then calls *self._process_recognition*.

    - *node* -- The root node of the recognition parse tree.
    """
    # Prepare *extras* dict for passing to _process_recognition().
    extras = {
        "_grammar": self.grammar,
        "_rule": self,
        "_node": node,
    }
    extras.update(self._defaults)
    for name, element in self._extras.items():
        extra_node = node.get_child_by_name(name, shallow=True)
        if extra_node:
            extras[name] = extra_node.value()
        elif element.has_default():
            extras[name] = element.default

    # Call the method to do the actual processing.
    self._process_recognition(node, extras)
I'm going to skip some complexity - the extras variable you see here is an early form of the data dictionary. See where we lose the value?
extra_node = node.get_child_by_name(name, shallow=True)
Which looks like:
def get_child_by_name(self, name, shallow=False):
    """Get one node below this node with the given name."""
    for child in self.children:
        if child.name:
            if child.name == name:
                return child
            if shallow:
                # If shallow, don't look past named children.
                continue
        match = child.get_child_by_name(name, shallow)
        if match:
            return match
    return None
So, you see the issue. Dragonfly tries to extract one value for each extra, and it gets the first one. Then, it stuffs that value into a dictionary and passes it down to Action. Additional occurrences are lost.
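To see that first-match behaviour concretely, here is a toy reconstruction using a minimal Node class of my own (not dragonfly's real one), with the lookup logic copied from the method quoted above:

```python
# Toy reconstruction: a minimal Node class (my own, not dragonfly's) whose
# lookup logic matches the get_child_by_name method quoted above.
class Node:
    def __init__(self, name=None, value=None, children=()):
        self.name = name
        self._value = value
        self.children = list(children)

    def value(self):
        return self._value

    def get_child_by_name(self, name, shallow=False):
        for child in self.children:
            if child.name:
                if child.name == name:
                    return child
                if shallow:
                    continue  # don't look past named children
            match = child.get_child_by_name(name, shallow)
            if match:
                return match
        return None

# "down three four" parses into two children both named "n":
root = Node(children=[Node("n", 3), Node("n", 4)])
extras = {"n": root.get_child_by_name("n", shallow=True).value()}
print(extras)  # only n=3; the second occurrence (4) is never consulted
```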

py2neo cypher create several relations to central node in for loop

Just starting out with neo4j, py2neo and Cypher.
I have encountered the following problem, and Google and my knowledge of what to ask have not yet given me an answer or a helpful hint in the right direction. Anyway:
Problem:
I don't know how to, in python/py2neo, create relations between a unique starting node and a number of following nodes that I create dynamically in a for loop.
Background:
I have a json object which defines a person object, who will have an id, and several properties, such as favourite colour, favourite food etc.
So at the start of my py2neo script I define my person. After this I loop through my json for every property this person has.
This works fine, and with no relations I end up with a neo4j chart with several nodes with the right parameters.
If I'm understanding the docs right I have to make a match to find my newly created person, for each new property I want to link. This seems absurd to me as I just created this person and still have the reference to the person object in memory. But for me it is unclear on how to actually write the code for creating the relation. Also, as a relative newbie in both python and Cypher, best practices are still an unknown to me.
What I understand is I can use py2neo
graph = Graph("http://...")
tx = graph.begin()
p = Node("Person", id)
tx.create(p)
and then I can reference p later on. But for my properties, of which there can be many, I create a string in python like so (pseudocode here; I have a nice one-liner for this that fits my actual case, with lambda, join, map, format and so on)
for param in params:
    par = "MERGE (par:" + param + ... )
    tx.append(par)
tx.process()
tx.commit()
How do I create a relation "likes" back to the person for each and every par in the for loop?
Or do I need to rethink my whole solution?
Help?! :-)
//Jonas
Considering you've created a node Alice and you want to create the others dynamically, I'd suggest that while dynamically parsing through the nodes, you store each one (inside the loop) in a variable, make a Node out of it, and then use the Relationship syntax. The syntax is
Relationship(Node_1, Relation, Node_2)
The key thing to know here is that type(Node_1) and type(Node_2) will both be Node.
I've stored all the nodes (only their names) from json in a list named nodes.
Since you mentioned you only have reference to Alice
a = Node("Person", name="Alice")
for node in nodes:  # excluding Alice
    b = Node("Person", name=node)
    rel = Relationship(a, "LIKES", b)
    graph.create(rel)
Make sure to create each relationship inside the loop (or vary the variable names), otherwise each iteration will just overwrite the last.
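Another option that avoids re-matching the person for every property: build one parameterised Cypher statement per property and reuse the person's id. This is a hedged sketch; the "LIKES" relationship type and the label handling are assumptions taken from the question, and only the string building runs here (the commented lines show the py2neo calls):

```python
# Hedged sketch: parameterised Cypher per property, reusing the person's id.
# "LIKES" and the label handling are assumptions from the question.

def likes_statement(prop_label):
    """Cypher that finds the person, MERGEs the property node, and links them."""
    return (
        "MATCH (p:Person {id: $pid}) "
        "MERGE (v:%s {name: $value}) "
        "MERGE (p)-[:LIKES]->(v)" % prop_label
    )

# With py2neo this would run roughly as:
#   tx = graph.begin()
#   for label, value in params.items():
#       tx.run(likes_statement(label), pid=person_id, value=value)
#   tx.commit()
```

MERGE keeps the property nodes unique, and parameters ($pid, $value) avoid string-splicing values into the query.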

Using google.protobuf.Any in python file

I have the following .proto file:
syntax = "proto3";

import "google/protobuf/any.proto";

message Request {
    google.protobuf.Any request_parameters = 1;
}
How can I create Request object and populate its fields? I tried this:
import ma_pb2
from google.protobuf.any_pb2 import Any
parameters = {"a": 1, "b": 2}
Request = ma_pb2.Request()
some_any = Any()
some_any.CopyFrom(parameters)
Request.request_parameters = some_any
But I have an error:
TypeError: Parameter to CopyFrom() must be instance of same class: expected google.protobuf.Any got dict.
UPDATE
Following the prompts of @Kevin, I added a new message to the .proto file:
message Small {
    string a = 1;
}
Now code looks like this:
Request = ma_pb2.Request()
small = ma_pb2.Small()
small.a = "1"
some_any = Any()
some_any.Pack(small)
Request.request_parameters = small
But at the last assignment I have an error:
Request.request_parameters = small
AttributeError: Assignment not allowed to field "request_parameters" in protocol message object.
What did I do wrong?
Any is not a magic box for storing arbitrary keys and values. The purpose of Any is to denote "any" message type, in cases where you might not know which message you want to use until runtime. But at runtime, you still need to have some specific message in mind. You can then use the .Pack() and .Unpack() methods to convert that message into an Any, and at that point you would do something like Request.request_parameters.CopyFrom(some_any).
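Here is a self-contained demo of that Pack/CopyFrom/Unpack flow. It substitutes the well-known StringValue wrapper for the question's Small message, since Small lives in the asker's generated ma_pb2 module; with ma_pb2 available you'd Pack a Small instead:

```python
# Self-contained demo of Pack/CopyFrom/Unpack, using the well-known
# StringValue type in place of the question's generated ma_pb2.Small.
from google.protobuf.any_pb2 import Any
from google.protobuf.wrappers_pb2 import StringValue

small = StringValue(value="1")  # stands in for ma_pb2.Small(a="1")

some_any = Any()
some_any.Pack(small)            # wrap the concrete message in an Any

# Message fields can't be assigned directly; CopyFrom is the way in.
holder = Any()                  # stands in for Request.request_parameters
holder.CopyFrom(some_any)

# Receiving side: check the payload type, then unpack it.
out = StringValue()
assert holder.Is(StringValue.DESCRIPTOR)
holder.Unpack(out)
assert out.value == "1"
```

The same pattern explains the AttributeError above: `Request.request_parameters = small` is direct assignment to a message field, while `Request.request_parameters.CopyFrom(some_any)` (or `.Pack(small)` on the field itself) mutates it in place.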
So, if you want to store this specific dictionary:
{"a": 1, "b": 2}
...you'll need a .proto file which describes some message type that has integer fields named a and b. Personally, I'd see that as overkill; just throw your a and b fields directly into the Request message, unless you have a good reason for separating them out. If you "forget" one of these keys, you can always add it later, so don't worry too much about completeness.
If you really want a "magic box for storing arbitrary keys and values" rather than what I described above, you could use a Map instead of Any. This has the advantage of not requiring you to declare all of your keys upfront, in cases where the set of keys might include arbitrary strings (for example, HTTP headers). It has the disadvantage of being harder to lint or type-check (especially in statically-typed languages), because you can misspell a string more easily than an attribute. As shown in the linked resource, Maps are basically syntactic sugar for a repeated field like the following (that is, the on-wire representation is exactly the same as what you'd get from doing this, so it's backwards compatible to clients which don't support Maps):
message MapFieldEntry {
    key_type key = 1;
    value_type value = 2;
}

repeated MapFieldEntry map_field = N;
