pyravendb query parameters parsing error - python

I've noticed a weird parsing problem with RavenDB's Python client.
When I use this query:
query_result = list(session.query().where_equals("url",url).select("Id","htmlCode","url"))
where url = "http://www.mywebsite.net/"
The relevant part of the error stack is the following:
File "/usr/local/lib/python3.5/dist-packages/pyravendb/store/session_query.py", line 71, in __iter__
return self._execute_query().__iter__()
File "/usr/local/lib/python3.5/dist-packages/pyravendb/store/session_query.py", line 307, in _execute_query
includes=self.includes)
File "/usr/local/lib/python3.5/dist-packages/pyravendb/d_commands/database_commands.py", line 286, in query
raise exceptions.ErrorResponseException(response["Error"][:100])
pyravendb.custom_exceptions.exceptions.ErrorResponseException: Lucene.Net.QueryParsers.ParseException: Could not parse: 'url:http://www.mywebsite.net/' --->
BUT if I simply add a space (' ') to the url parameter in the query, it works without any parsing error (though it doesn't return a result, since the value is no longer the same).
I would like to contribute to pyravendb on GitHub, but I'm not sure where it parses the parameters; it's probably calling Lucene for that.
Any idea why a simple space makes such a difference to parsing?

The query you send to Lucene is url:http://www.mywebsite.net/
The Lucene key (field) is url, and the value is supposed to be http://www.mywebsite.net/
Because http://www.mywebsite.net/ contains a :, which is the character Lucene uses to split the key from the value, the parser gets "confused" and raises a parsing error.
To fix your problem, escape the : in your url parameter before passing it to the query, so your url parameter looks like this:
http\://www.mywebsite.net/
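A minimal sketch of that escaping step, assuming the classic Lucene (Lucene.Net 3.x) query-parser rules. The helper `escape_lucene` is hypothetical and not part of pyravendb; it backslash-escapes the characters the classic parser's escape routine treats as special. For this URL only the `:` changes.

```python
# Characters escaped by the classic Lucene 3.x query parser (assumption: the
# server-side parser follows those rules).
_LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&')


def escape_lucene(value):
    """Prefix every Lucene special character in value with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in _LUCENE_SPECIAL else ch for ch in value)


url = "http://www.mywebsite.net/"
print(escape_lucene(url))  # http\://www.mywebsite.net/
```

With pyravendb 1.3.1.1 you would pass the escaped value, e.g. `where_equals("url", escape_lucene(url))`. Since a later version is expected to fix this, check whether your version already escapes values before adding your own escaping on top.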
As for why a simple space changes the parsing: in Lucene, a space separates terms, so the parser treats what follows it as another term to look for. (You can see the query we build when you use the where_in method.)
This issue will be fixed in the next version of pyravendb (current version is 1.3.1.1)

Related

How to search exact index (not index pattern) in elasticsearch eland - updated?

I am querying Elasticsearch data using Eland. The documentation is simple and I am able to implement it, but when searching the index, Eland uses es_index_pattern, which is basically a wildcard (this is also stated in the documentation).
from elasticsearch import Elasticsearch
import eland as ed
es = Elasticsearch(hosts=[{"host": "myhost", "port": 0000}])
search_body = {
    "bool": {
        "filter": [
            {"exists": {"field": "customer_name"}},
            {"match_phrase": {"city": "chicago"}},
        ]
    }
}
# Success: I get results when I search the index through the elasticsearch API. I tried this repeatedly and it works every time
results = es.search(index="my_index", body={"query": search_body})
# Failure: I get a ReadTimeoutError instead of results when I connect to the 'my_index' index on the same Elasticsearch host using Eland
df = ed.DataFrame(es_client=es, es_index_pattern="my_index")
I have to hand-type the error message because I cannot copy it outside the environment I am using. Also, my real host and port are different.
...
File ".../elasticsearch/transport.py", line 458, in perform_request
raise e
File "......elasticsearch/transport.py", line 419, in perform_request
File "..... /elasticsearch/connection/http_urllib3.py", line 275, in perform_request
raise ConnectionTimeout("TIMEOUT", str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='myhost', port=0000): Read timed out. (read timeout=10))
I think the search through the elasticsearch client gets results because it uses the exact index name, so it doesn't time out.
Eland, on the other hand, uses es_index_pattern, which I believe treats my_index as a wildcard, i.e. *my_index*, and that is why I run into the ReadTimeoutError.
I looked through the source code for a way to make Eland match the index name exactly rather than as a pattern, but I see no option for searching an exact index in either the documentation or the source code.
How do I search for exact index string in Eland?
Sources:
https://www.elastic.co/guide/en/elasticsearch/client/eland/current/overview.html
https://github.com/elastic/eland/blob/main/eland/ndframe.py
https://github.com/elastic/eland/blob/main/eland/dataframe.py
Also posted this on GitHub, but I'll replicate it here:
Searching an exact index only requires passing the exact index name, no wildcards are used:
import eland as ed
from elasticsearch import Elasticsearch
client = Elasticsearch(...)
client.index(index="test", document={"should": "seethis"})
client.index(index="test1", document={"should": "notseethis"})
client.index(index="1test", document={"should": "notseethis"})
client.indices.refresh(index="*test*")
df = ed.DataFrame(client, es_index_pattern="test")
print(df.to_pandas())
As expected, the output of the above is:
should
SNTTnH4BRC8cqQQMds-V seethis
The word "pattern" in the option name doesn't mean Eland adds wildcards; it's the index pattern that Eland sends to Elasticsearch in the search and index APIs, so a plain name such as "test" matches only that index.
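Elasticsearch only expands an index expression when it contains a wildcard such as `*`. The snippet below approximates that with Python's `fnmatch.fnmatchcase` over the three index names from the example; it is not Elasticsearch's actual resolver, the helper `resolve` is hypothetical, and `fnmatch` additionally treats `?` and `[...]` as wildcards, which Elasticsearch does not.

```python
from fnmatch import fnmatchcase

# The three indices created in the example above.
indices = ["test", "test1", "1test"]


def resolve(pattern, names):
    """Return the names matched by a comma-separated index pattern (simplified approximation)."""
    parts = pattern.split(",")
    return [n for n in names if any(fnmatchcase(n, p) for p in parts)]


print(resolve("test", indices))    # ['test']
print(resolve("*test*", indices))  # ['test', 'test1', '1test']
```

A name without wildcards selects exactly one index, which matches the Eland output shown above.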

Escaping characters for instance query matching in webpy

(The title may be in error here, but I believe that the problem is related to escaping characters)
I'm using webpy to create a VERY simple todo list, using peewee with SQLite to store simple, user-submitted todo list items, such as "do my taxes" or "don't forget to interact with people".
What I've noticed is that the DELETE request fails on certain inputs that contain specific symbols. For example, while I can add the following entries to the SQLite database that holds all the user input, I cannot DELETE them:
what?
test#
test & test
This is a test?
I can DELETE user input containing any other symbols with no issues. Here's the webpy error message I get in the browser when I try to DELETE the inputs listed above:
<class 'peewee.UserInfoDoesNotExist'> at /del/test
Instance matching query does not exist: SQL: SELECT "t1"."id", "t1"."title" FROM "userinfo" AS t1 WHERE ("t1"."title" = ?) PARAMS: [u'test']
Python /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/peewee.py in get, line 2598
Web POST http://0.0.0.0:7700/del/test
When I view the database file (called todoUserList.db) in sqlitebrowser, I can see that these entries do exist with the symbols, they're all there.
In my main webpy app script, I'm using a regex to search through the db to make a DELETE request, it looks like this:
urls = (
'/', 'Index',
'/del/(.*?)', 'Delete'
)
I've tried variations of the regex, such as '/del/(.*)', but still get the same error, so I don't think that's the problem.
Given the error message above, is webpy not "seeing" certain symbols in the user input because they're not being escaped properly?
I'm confused as to why this only seems to happen with the symbols listed above.
Depending on how the URL escaping works, this could be an issue with how "?" and "&" in particular are interpreted by the browser: in a typical GET-style request, ? and & are special characters used to start and separate query string parameters. Similarly, # marks the start of a fragment, which the browser never sends to the server.
Instead of passing those values as part of the URL itself, pass them in as an escaped query string. As far as I know, no web server is going to respect unescaped values like that as part of a URL. If they are escaped and put in the query string (or POST body), you'll be fine, though.
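The effect of escaping can be seen with Python 3's `urllib.parse` (the question uses Python 2, where the equivalents live in `urllib` and `urlparse`). A minimal sketch: `BASE` and `delete_url` are hypothetical, modeled on the `/del/` route from the question. Splitting the unescaped URL shows `?` and `#` cutting the title short, while the percent-encoded URL round-trips every title. `&` only acts as a separator inside the query string, so it survives in the path here.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical base URL matching the '/del/(.*?)' route from the question.
BASE = "http://0.0.0.0:7700/del/"


def delete_url(title):
    """Build a /del/ URL with the title percent-encoded (hypothetical helper)."""
    # safe="" also encodes "/", so a title containing a slash stays one path segment.
    return BASE + quote(title, safe="")


for title in ["what?", "test#", "test & test", "This is a test?"]:
    raw = urlsplit(BASE + title)           # unescaped: "?" starts the query, "#" the fragment
    escaped = urlsplit(delete_url(title))  # escaped: the whole title stays in the path
    print(repr(raw.path), "->", repr(unquote(escaped.path[len("/del/"):])))
```

Whether the handler receives the decoded title depends on the server; WSGI servers normally percent-decode the path (PATH_INFO) before the application sees it.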

Strange urllib2.urlopen() error with variable vs string

I am having some strange behavior while using urllib2 to open a URL and download a video.
I am trying to open a video resource and here is an example link:
https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361
I have the following code:
mp4_url = ''
#response_body is a json response that I get the mp4_url from
if response_body['outputs'][0]['label'] == 'mp4':
    mp4_url = response_body['outputs'][0]['url']
if mp4_url:
    logging.info('this is the mp4_url')
    logging.info(mp4_url)
    #if I add the line directly below this then it works just fine
    mp4_url = 'https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361'
    mp4_video = urllib2.urlopen(mp4_url)
    logging.info('successfully opened the url')
The code works when I add the designated line, but without it I get an HTTP Error 403: Forbidden message, which makes me think mp4_url is getting mangled somehow. The confusing part is that when I check the logged mp4_url, it is exactly what I hardcoded. What could the difference be? Are there characters in there that might be disrupting it? I have tried converting it to a string by doing:
mp4_video = urllib2.urlopen(str(mp4_url))
But that didn't do anything. Any ideas?
UPDATE:
Following the suggestion to use print repr(mp4_url), I get:
u'https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361'
I suppose the difference (the u prefix, meaning it's a unicode string rather than a str) is what causes the error, but what would be the best way to handle this?
UPDATE II:
It turned out that I did need to cast it to a string, but the source I was getting the link from (an encoded video) also needed a delay of nearly 60 seconds before it could serve that URL. That is why it kept working when I hardcoded it: by then, the delay had passed. Thanks for the help!
It would be better to simply dump the response you obtained. That way you can check what response_body['outputs'][0]['label'] evaluates to. In your case, you initialize mp4_url to ''. Like None, an empty string is falsy, so the if mp4_url: block only runs when the label check succeeded and set a non-empty URL.
You may want to check that the initial if statement, where you compare response_body['outputs'][0]['label'], is correct.
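UPDATE II says the URL simply wasn't servable for roughly 60 seconds after encoding. A minimal sketch of waiting that out, assuming a 403 means "not ready yet": `open_when_ready` is a hypothetical helper written for Python 3's `urllib.request`/`urllib.error` (the question uses Python 2's `urllib2`, whose `HTTPError` also carries a `.code`), and the attempt count and delay are placeholders. The opener and sleep function are parameters so the retry logic can be exercised without a network.

```python
import time
import urllib.error
import urllib.request


def open_when_ready(url, attempts=7, delay=10.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Open url, retrying while the server answers 403 (hypothetical helper).

    With the defaults this sleeps at most 6 x 10 = 60 seconds in total.
    """
    for attempt in range(attempts):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            # Treat only 403 as "not ready yet"; re-raise anything else,
            # and re-raise the 403 itself once the attempts are used up.
            if err.code != 403 or attempt == attempts - 1:
                raise
            sleep(delay)
```

Polling like this avoids hardcoding a fixed 60-second sleep when the URL sometimes becomes available sooner.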

Catch full URL in python

I'm developing an application using Bottle. How do I get the full query string of a GET request?
I don't want to read individual parameters like:
param_a = request.GET.get("a","")
because I don't want to fix the number of parameters in the URL.
How can I get the full query string of the requested URL?
You can use the attribute request.query_string to get the whole query string.
Use request.query or request.query.getall(key) if you have more than one value for a single key.
For example, request.query.a returns the param_a value you wanted, request.query.b returns the value of parameter b, and so on.
If you only want the query string itself, you can use @halex's answer.
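If you take the whole string from request.query_string and still want every parameter without naming them in advance, the standard library can parse it. A minimal sketch: `all_params` is a hypothetical helper built on Python 3's `urllib.parse.parse_qs`, so it runs without Bottle; in a Bottle handler you would pass it `request.query_string`.

```python
from urllib.parse import parse_qs


def all_params(query_string):
    """Map every parameter name to its list of values, whatever the names are."""
    # keep_blank_values=True keeps "b=" as {"b": [""]} instead of dropping it.
    return parse_qs(query_string, keep_blank_values=True)


print(all_params("a=1&b=2&a=3"))  # {'a': ['1', '3'], 'b': ['2']}
```

Values come back as lists because a key can repeat, which mirrors what request.query.getall(key) gives you for a single key.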

AssertionError - Selenium/Python

I am creating a Python script with Selenium. I want to run a specific test that checks the default text of a textbox when the page loads. Below is my code:
try:
    self.assertEqual("Search by template name or category..", sel.get_text("//table[@id='pluginToolbarButton_forms']/tbody/tr[2]/td[2]/em"))
    logging.info(' PASS: text box text is correct')
except Exception:
    logging.exception(' FAIL: text box text is incorrect')
Here is my error:
self.assertEqual("Search by template name or category..", sel.get_text("//table[@id='pluginToolbarButton_forms']/tbody/tr[2]/td[2]/em"))
File "C:\Python27\lib\unittest\case.py", line 509, in assertEqual
assertion_func(first, second, msg=msg)
File "C:\Python27\lib\unittest\case.py", line 502, in _baseAssertEqual
raise self.failureException(msg)
AssertionError: 'Search by template name or category..' != u'Submitter Requests'
Am I using the wrong function?
Your AssertionError states that the assertion you tried (that's the self.assertEqual(...) in your first code example) failed:
AssertionError: 'Search by template name or category..' != u'Submitter Requests'
This assertion reports that the string 'Search by template name or category..' is different from 'Submitter Requests', which is correct: the strings are, in fact, different.
I would check your second parameter to self.assertEqual and make sure that you're selecting the right feature.
This looks like you are using the right function, but perhaps you are not running your tests in the correct fashion.
The problem seems to be that you are not selecting the right element to compare with. You are essentially asserting that "Search by template name or category.." equals the contents of whatever is at:
//table[@id='pluginToolbarButton_forms']/tbody/tr[2]/td[2]/em
Apparently, the contents are "Submitter Requests", i.e. not what you expected, so the test fails (as it should). You might not be selecting the right element with that XPath query; a CSS selector might work better. You can read about element selectors in the Selenium documentation.
Watch out for another pitfall as well: the text returned by Selenium is a Unicode object, and you are comparing it against a byte string (str). This might not work as expected with special (non-ASCII) characters.
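To see which element an XPath actually selects, you can evaluate it against a saved copy of the page's markup. A minimal sketch using the standard library's `xml.etree.ElementTree`, which supports a subset of XPath, including `[@id='...']` attribute tests and 1-based `[n]` positions (it needs a leading `.` before `//`). The markup below is invented purely for illustration, and real-world HTML usually needs an HTML parser or the browser itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page.
PAGE = """
<html><body>
  <table id="pluginToolbarButton_forms"><tbody>
    <tr><td><em>Search by template name or category..</em></td></tr>
    <tr><td><em>Forms</em></td><td><em>Submitter Requests</em></td></tr>
  </tbody></table>
</body></html>
"""

root = ET.fromstring(PAGE)
# tr[2] and td[2] pick the second <tr>/<td> among their same-tag siblings.
path = ".//table[@id='pluginToolbarButton_forms']/tbody/tr[2]/td[2]/em"
print(root.find(path).text)  # Submitter Requests
```

In this made-up page, the default search text lives in tr[1]/td[1], so the original path would compare against the wrong cell, just as the assertion error suggests.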
