I'm trying to query a node index across all fields. This is what I thought would work:
idx = db.node.indexes.get('myindex')
idx.query('*:search_query')
But this returns no results. However, this works:
idx = db.node.indexes.get('myindex')
idx.query('*:*')
And it returns all the nodes in the index as expected. Am I wrong in assuming that the first version should work at all?
I don't expect the first version to work, and am surprised the second does. Neo4j parses those queries using this Lucene syntax, and I don't see anything there about wildcard fields. Instead, remove the field entirely to search against an implied "all fields".
Plug: for an easier way to build Lucene queries (compatible with Neo4j), check out lucene-querybuilder. It's used by neo4j-rest-client and neo4django.
EDIT:
I can't seem to find support for the "all fields" implicit search I thought existed, sorry! I guess you'll just have to manually include all fields in the query (e.g., "name:falmarri OR userType:falmarri").
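For example, the multi-field query could be built up like this (a sketch reusing the idx object from the question; the field names are the ones from the example above):
fields = ['name', 'userType']  # the example fields from above
term = 'falmarri'
lucene_query = ' OR '.join('%s:%s' % (f, term) for f in fields)

idx = db.node.indexes.get('myindex')
results = idx.query(lucene_query)  # "name:falmarri OR userType:falmarri"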
I have a document reference that I am retrieving from a query on my Firestore database. I want to use the DocumentReference as a query parameter for another query. However, when I do that, it says:
TypeError: sequence item 1: expected str instance, DocumentReference found
This makes sense, because I am trying to pass a DocumentReference in my update statement:
db.collection("Teams").document(team).update("Dictionary here") # team is a DocumentReference
Is there a way to get the document name from a DocumentReference? Before you mark this as a duplicate: I looked at the docs here and at the question here, but the docs were confusing and the question had no answer.
Any help is appreciated, Thank You in advance!
Yes: split the .refPath. The document "name" is always the last element after the split; something like lodash's _.last() works, or any other technique that identifies the last element in the array.
Note, btw, the refPath is the full path to the document. This is extremely useful (as in: I use it a lot) when you find documents via collectionGroup() - it allows you to parse to find parent document(s)/collection(s) a particular document came from.
Also note: there is a pseudo-field __name__ available (really an alias of documentID()). In spite of its name(s), it returns the FULL PATH (i.e. refPath) to the document, NOT the documentID by itself.
I think I figured it out: by doing team.path.split("/")[1] I could get the document name. This might not work for all Firestore databases (like subcollections), though, so if anyone has a better solution, please go ahead. Thanks!
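For what it's worth, a slightly more robust variant for the Python client (a sketch assuming google-cloud-firestore, whose DocumentReference exposes .path and .id):
# team is a DocumentReference; .path is the full path, like refPath
doc_name = team.path.split("/")[-1]  # last segment also works for subcollections
doc_name = team.id                   # or let the client give it to you directly
db.collection("Teams").document(doc_name).update({"some_field": "some value"})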
I'm attempting to search for contacts via the Google Contacts API, using multiple search terms. Searching by a single term works fine and returns contact(s):
query = gdata.contacts.client.ContactsQuery()
query.text_query = '1048'
feed = gd_client.GetContacts(q=query)
for entry in feed.entry:
    # Do stuff
However, I would like to search by multiple terms:
query = gdata.contacts.client.ContactsQuery()
query.text_query = '1048 1049 1050'
feed = gd_client.GetContacts(q=query)
When I do this, no results are returned, and I've found so far that spaces are being replaced by + signs:
https://www.google.com/m8/feeds/contacts/default/full?q=3066+3068+3073+3074
I'm digging through the gdata-client-python code right now to find where it's building the query string, but wanted to toss the question out there as well.
According to the docs, both types of search are supported by the API, and I've seen some similar docs when searching through related APIs (Docs, Calendar, etc):
https://developers.google.com/google-apps/contacts/v3/reference#contacts-query-parameters-reference
Thanks!
Looks like I was mistaken in my understanding of the gdata query string functionality.
https://developers.google.com/gdata/docs/2.0/reference?hl=en#Queries
'The service returns all entries that match all of the search terms (like using AND between terms).'
Helps to read the docs and understand them!
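So if OR-style matching is needed, one workaround is to issue a query per term and merge the results client-side (a sketch reusing the objects from the question; de-duplicating on entry.id.text, the Atom entry id, is an assumption about the feed entries):
terms = ['1048', '1049', '1050']
seen = {}
for term in terms:
    query = gdata.contacts.client.ContactsQuery()
    query.text_query = term  # a single term matches, as shown above
    feed = gd_client.GetContacts(q=query)
    for entry in feed.entry:
        seen[entry.id.text] = entry  # de-duplicate contacts found by several terms
matches = seen.values()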
The haystack documentation (link below) makes this statement:
Additionally, we're providing use_template=True on the text field. This allows us to use a data template (rather than error prone concatenation) to build the document the search engine will use in searching.
How would one go about using concatenation to build the document? I couldn't find an example.
It may have something to do with overriding the prepare method (second link). But in the example given in the documentation the prepare method is used together with a template, so the two might also be orthogonal.
https://github.com/toastdriven/django-haystack/blob/master/docs/tutorial.rst
http://django-haystack.readthedocs.org/en/latest/searchindex_api.html#advanced-data-preparation
You can see how it works in the Haystack source. Basically, the default implementation of the prepare method on SearchField (the base class for Haystack's fields) calls prepare_template if use_template is True.
If you don't want to use a template, you can indeed use concatenation - it's as simple as just joining the data you want together, separated by something (here I've used a newline):
def prepare_myfield(self, obj):
    # Pull the values straight off the model instance and join them
    return obj.field1 + '\n' + obj.field2
etc.
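Put together, a no-template index might look like this (a sketch against the Haystack 2.x API; the Note model and its title/body fields are hypothetical):
from haystack import indexes
from myapp.models import Note  # hypothetical model

class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    # No use_template=True here, so prepare_text below builds the document
    text = indexes.CharField(document=True)

    def get_model(self):
        return Note

    def prepare_text(self, obj):
        # Plain concatenation instead of a data template
        return '\n'.join([obj.title, obj.body])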
With Mongo, it is OK with the following:
> db.posts.find("this.text.indexOf('Hello') > 0")
But with pymongo, when executing the following:
for post in db.posts.find("this.text.indexOf('Hello') > 0"):
    print post['text']
an error occurs.
I think Full Text Search in Mongo is a better way in this example, but is it possible to use the "find" method with a "javascript" query in pymongo?
You are correct: you do this with server-side JavaScript by using the $where clause [1]:
db.posts.find({"$where": "this.text.indexOf('Hello') > 0"})
This will work on all but sharded setups, but the cost is deemed prohibitive, as you will be inspecting every document in the collection; that's why it's generally not considered a great idea.
You could also do a regular expression search:
db.posts.find({'text':{'$regex':'Hello'}})
This will also do a full collection scan, because the regular expression isn't anchored. (If you anchor a regular expression, for example to check whether a field begins with a value, and you have an index on that field, the query can utilise the index.)
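To illustrate the anchoring point (a sketch; create_index is the standard pymongo call):
import re

db.posts.create_index('text')                  # index on the queried field
db.posts.find({'text': {'$regex': '^Hello'}})  # anchored: can use the index
db.posts.find({'text': re.compile('Hello')})   # unanchored: full collection scan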
Given that those two approaches are expensive and won't perform or scale well, what is the best approach?
Well, the full text search approach described in the link you gave [2] works well. Create a _keywords field which stores the keywords, lowercased, in an array; index that field; then you can query like so:
db.posts.find({"_keywords": {"$in": "hello"});
That will scale, and because it utilises an index it will be performant.
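In pymongo the keyword approach might look like this (a sketch; the _keywords field name follows the linked article, with keywords stored lowercased):
db.posts.create_index('_keywords')
db.posts.insert({'text': 'Hello world', '_keywords': ['hello', 'world']})
for post in db.posts.find({'_keywords': {'$in': ['hello']}}):
    print post['text']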
[1] http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-JavascriptExpressionsand%7B%7B%24where%7D%7D
[2] http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo
I want to create a SQL interface on top of a non-relational data store. It's a non-relational store, but it makes sense to access the data in a relational manner.
I am looking into using ANTLR to produce an AST that represents the SQL as a relational algebra expression. Then return data by evaluating/walking the tree.
I have never implemented a parser before, and I would therefore like some advice on how to best implement a SQL parser and evaluator.
Does the approach described above sound about right?
Are there other tools/libraries I should look into? Like PLY or Pyparsing.
Pointers to articles, books or source code that will help me are appreciated.
Update:
I implemented a simple SQL parser using pyparsing. Combined with Python code that implements the relational operations against my data store, this was fairly simple.
As I said in one of the comments, the point of the exercise was to make the data available to reporting engines. To do this, I probably will need to implement an ODBC driver. This is probably a lot of work.
I have looked into this issue quite extensively. python-sqlparse is a non-validating parser, which is not really what you need. The examples in ANTLR need a lot of work to convert to a nice AST in Python. The SQL standard grammars are here, but it would be a full-time job to convert them yourself, and it is likely that you would only need a subset of them, i.e. no joins. You could also try looking at Gadfly (a Python SQL database), but I avoided it as they used their own parsing tool.
For my case, I essentially only needed a where clause. I tried booleneo (a boolean-expression parser) written with pyparsing, but ended up using pyparsing from scratch. The first link in Mark Rushakoff's reddit post gives a SQL example using it. Whoosh, a full-text search engine, also uses it, but I have not looked at the source to see how.
Pyparsing is very easy to use, and you can very easily customize it to not be exactly the same as SQL (most of the syntax you will not need). I did not like PLY, as it uses some magic based on naming conventions.
In short, give pyparsing a try; it will most likely be powerful enough to do what you need, and the simple integration with Python (with easy callbacks and error handling) will make the experience pretty painless.
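As a taste of what a pyparsing grammar for that kind of where clause looks like (a minimal sketch, nowhere near full SQL):
from pyparsing import Word, alphas, alphanums, nums, oneOf

identifier = Word(alphas, alphanums + '_')
number = Word(nums)
comparison = (identifier('field')
              + oneOf('= != < > <= >=')('op')
              + (number | identifier)('value'))

result = comparison.parseString('c > 20')
print result.field, result.op, result.value  # c > 20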
This reddit post suggests python-sqlparse as an existing implementation, among a couple other links.
TwoLaid's Python SQL Parser works very well for my purposes. It's written in C and needs to be compiled. It is robust. It parses out individual elements of each clause.
https://github.com/TwoLaid/python-sqlparser
I'm using it to parse out query column names to use in report headers. Here is an example.
import sqlparser

def get_query_columns(sql):
    '''Return a list of column headers from the given SQL's select clause.'''
    columns = []
    parser = sqlparser.Parser()
    # Parser does not like new lines
    sql2 = sql.replace('\n', ' ')
    # Check for syntax errors
    if parser.check_syntax(sql2) != 0:
        raise Exception('get_query_columns: SQL invalid.')
    stmt = parser.get_statement(0)
    root = stmt.get_root()
    qcolumns = root.__dict__['resultColumnList']
    for qcolumn in qcolumns.list:
        if qcolumn.aliasClause:
            alias = qcolumn.aliasClause.get_text()
            columns.append(alias)
        else:
            name = qcolumn.get_text()
            name = name.split('.')[-1]  # remove table alias
            columns.append(name)
    return columns
sql = '''
SELECT
    a.a,
    replace(coalesce(a.b, 'x'), 'x', 'y') as jim,
    a.bla as sally -- some comment
FROM
    table_a as a
WHERE
    c > 20
'''

print get_query_columns(sql)
# output: ['a', 'jim', 'sally']
Of course, it may be best to leverage python-sqlparse on Google Code.
UPDATE: Now I see that this has already been suggested; I concur that it is worthwhile:
I am using python-sqlparse with great success.
In my case I am working with queries that are already validated, so my AST-walking code can make some sane assumptions about the structure.
https://pypi.org/project/sqlparse/
https://sqlparse.readthedocs.io/en/latest/
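For reference, basic usage looks like this (a sketch; sqlparse is non-validating, so it tokenizes and groups without checking that the SQL is legal):
import sqlparse

stmt = sqlparse.parse('SELECT a.a, a.bla AS sally FROM table_a AS a')[0]
print(stmt.get_type())  # SELECT
for token in stmt.tokens:
    print(token.ttype, repr(token.value))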