Mongoengine - using icontains with all - python

I have seen this question but it does not answer my question, or even pose it very well.
I think that this is best explained with an example:
class Blah(Document):
    someList = ListField(StringField())
Blah.drop_collection()
Blah(someList=['lop', 'glob', 'hat']).save()
Blah(someList=['hello', 'kitty']).save()
# One of these should match the first entry
print(Blah.objects(someList__icontains__all=['Lo']).count())
print(Blah.objects(someList__all__icontains=['Lo']).count())
I assumed that this would print either 1, 0 or 0, 1 (or miraculously 1, 1) but instead it gives
0
Traceback (most recent call last):
File "metst.py", line 14, in <module>
print(Blah.objects(someList__all__icontains=['lO']).count())
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 1034, in count
return self._cursor.count(with_limit_and_skip=True)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 608, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 390, in _query
self._mongo_query = self._query_obj.to_query(self._document)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 213, in to_query
query = query.accept(QueryCompilerVisitor(document))
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 278, in accept
return visitor.visit_query(self)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 170, in visit_query
return QuerySet._transform_query(self.document, **query.query)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/queryset.py", line 755, in _transform_query
value = field.prepare_query_value(op, value)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/fields.py", line 594, in prepare_query_value
return self.field.prepare_query_value(op, value)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/site-packages/mongoengine/fields.py", line 95, in prepare_query_value
value = re.escape(value)
File "/home/blah/.pythonbrew/pythons/Python-3.1.4/lib/python3.1/re.py", line 246, in escape
return bytes(s)
TypeError: 'str' object cannot be interpreted as an integer
Neither query works!
Does MongoEngine support some way to search using icontains and all? Or some way to get around this?
Note: I want to use MongoEngine, not PyMongo.
Edit: The same issue exists with Python 2.7.3.

As of version 0.8.0, the only way to do this is with a __raw__ query, possibly combined with re.compile(), like so:
import re
input_list = ['Lo']
converted_list = [re.compile(q, re.I) for q in input_list]
print(Blah.objects(__raw__={"someList": {"$all": converted_list}}).count())
MongoEngine currently has no way to combine all and icontains; the only operator that can be used together with other operators is not. The docs mention this only in passing:
not – negate a standard check, may be used before other operators (e.g. Q(age__not__mod=5))
The docs do not say that other operators cannot be chained this way, but that is in fact the case.
You can confirm this behavior by looking at the source:
version 0.8.0+ (in module - mongoengine/queryset/transform.py - lines 42-48):
if parts[-1] in MATCH_OPERATORS:
    op = parts.pop()

negate = False
if parts[-1] == 'not':
    parts.pop()
    negate = True
In older versions the above lines can be seen in mongoengine/queryset.py within the _transform_query method.
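One difference from icontains is worth noting: the traceback in the question shows MongoEngine calling re.escape() on the search value, while the __raw__ example above compiles the input unescaped, so characters such as . or + would act as regex syntax. A minimal sketch of a helper that escapes each term first (the function name icontains_all is hypothetical, not part of MongoEngine):

```python
import re


def icontains_all(field, terms):
    """Build a raw MongoDB filter: every term must occur, case-insensitively,
    as a substring of at least one element of the list field."""
    # re.escape makes each term match literally, as icontains would
    patterns = [re.compile(re.escape(term), re.IGNORECASE) for term in terms]
    return {field: {"$all": patterns}}


raw_query = icontains_all("someList", ["Lo", "a.b"])
# Assumed usage with the Blah document from the question:
# print(Blah.objects(__raw__=raw_query).count())
```

The helper only builds the dictionary, so it can be checked without a database connection before it is passed to __raw__.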

Related

Columnwise combining of three strings throws "TypeError: '<' not supported between instances of 'str' and 'int'"

I am puzzled and have no idea what is happening. My script contains the following line. It is used to combine contents of three columns of a dataframe into one of them (only for rows that fulfill the specified condition):
share_data_sm[yr]['MMR']= np.where((share_data_sm[yr]['MC']!='MA') & (share_data_sm[yr]['MC']!=' ') & (share_data_sm[yr]['MY']!=' '), share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str), share_data_sm[yr]['MFR'])
'share_data_sm' is a dictionary of dataframes, one table per 'yr'. What puzzles me the most is that the error is thrown for only one particular value of 'yr'. The command is part of a loop over several values of 'yr', and for every value except 2021 the script runs smoothly. I thought there might be some peculiarity in the contents of the 2021 dataframe, but I found nothing exceptional; everything looks exactly like the others. The following is the traceback from the console:
Traceback (most recent call last):
File "…ipykernel_1380/3858926177.py", line 1, in <module>
runfile('…work_folder/Groups/Structure/shareholding.py', wdir='…_work_folder/Groups/Structure')
File "…pydevd\_pydev_bundle\pydev_umd.py", line 167, in runfile
execfile(filename, namespace)
File "…pydevd\_pydev_imps\_pydev_execfile.py", line 25, in execfile
exec(compile(contents + "\n", file, 'exec'), glob, loc)
File "…work_folder/Groups/Structure/shareholding.py", line 281, in <module>
share_data_sm[yr]['MMR']= np.where((share_data_sm[yr]['MC']!='MA') & (share_data_sm[yr]['MC']!=' ') & (share_data_sm[yr]['MY']!=' '), share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str), share_data_sm[yr]['MFR'])
File "…pandas\core\ops\common.py", line 69, in new_method
return method(self, other)
File "…pandas\core\arraylike.py", line 96, in __radd__
return self._arith_method(other, roperator.radd)
File "…pandas\core\frame.py", line 6864, in _arith_method
self, other = ops.align_method_FRAME(self, other, axis, flex=True, level=None)
File "…pandas\core\ops\__init__.py", line 306, in align_method_FRAME
left, right = left.align(
File "…pandas\core\frame.py", line 4677, in align
return super().align(
File "…pandas\core\generic.py", line 8591, in align
return self._align_series(
File "…pandas\core\generic.py", line 8708, in _align_series
join_index, lidx, ridx = join_index.join(
File "…pandas\core\indexes\base.py", line 207, in join
join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
File "…pandas\core\indexes\base.py", line 3987, in join
return this.join(other, how=how, return_indexers=True)
File "…pandas\core\indexes\base.py", line 207, in join
join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
File "…pandas\core\indexes\base.py", line 3995, in join
return self._join_monotonic(other, how=how)
File "…pandas\core\indexes\base.py", line 4327, in _join_monotonic
join_array, lidx, ridx = self._outer_indexer(other)
File "…pandas\core\indexes\base.py", line 345, in _outer_indexer
joined_ndarray, lidx, ridx = libjoin.outer_join_indexer(sv, ov)
File "…pandas\_libs\join.pyx", line 562, in pandas._libs.join.outer_join_indexer
TypeError: '<' not supported between instances of 'str' and 'int'
I'll appreciate any help - how can I overcome the problem?
I think I see it.
The code can be reformatted as:
condition = \
    (share_data_sm[yr]['MC']!='MA') & \
    (share_data_sm[yr]['MC']!=' ') & \
    (share_data_sm[yr]['MY']!=' ')
val_if_true = share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str)
val_if_false = share_data_sm[yr]['MFR']
share_data_sm[yr]['MMR'] = np.where(condition, val_if_true, val_if_false)
Now you can see that the value types of val_if_true and val_if_false are different - in the first case, you add together 3 str values. In the second, you're keeping the datatype of share_data_sm[yr]['MFR'].
I bet it complains when you try to add both types into the same array.
The traceback says the error is in the complicated
np.where((share_data_sm[yr]['MC']!='MA') & (share_data_sm[yr]['MC']!=' ') & (share_data_sm[yr]['MY']!=' '), share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str), share_data_sm[yr]['MFR'])
But keep in mind that it has to evaluate each of the 3 arguments before passing them to where.
(share_data_sm[yr]['MC']!='MA') & (share_data_sm[yr]['MC']!=' ') & (share_data_sm[yr]['MY']!=' ')
share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str)
share_data_sm[yr]['MFR']
The traceback is a little hard to read, but the __radd__ frame suggests the failure is in the middle argument, the + concatenation, even though you are only adding string values.
But I am also seeing this.join and indexers, which suggests that it's trying to line up the indices of the operands. So the indices may be mostly strings, with an oddball numeric one. But this is just a guess; I'm not a pandas expert.
I'd suggest evaluating those 3 arguments beforehand, before using them in the where, to better isolate the problem. Expressions that extend over many lines are hard to debug.
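The suggestion to evaluate the pieces separately can be sketched as a small diagnostic. The traceback passes through pandas\core\frame.py, which hints that one operand of the + may be a DataFrame rather than a Series. One possible cause, and this is only an assumption about the 2021 table, is a duplicated column label. The helper name describe_operands and the sample data are hypothetical:

```python
import pandas as pd


def describe_operands(df, columns):
    """Report the type and index dtype of each column selection."""
    report = {}
    for col in columns:
        part = df[col]  # a DataFrame, not a Series, if the label is duplicated
        report[col] = (type(part).__name__, str(part.index.dtype))
    return report


# Hypothetical frame in which the label 'N' appears twice
df = pd.DataFrame(
    [["A", 1, 2, "x"], ["B", 3, 4, "y"]],
    columns=["MC", "N", "N", "MFR"],
)
report = describe_operands(df, ["MC", "N", "MFR"])
print(report)
```

If any entry reports DataFrame, that selection is not a single column, and adding it to a Series makes pandas align the Series index against the frame's column labels, which is the kind of join seen in the traceback.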

How to use a list in word2vec.similarity

I have a word2vec model using pre-trained GoogleNews-vectors-negative300.bin. The model works fine and I can get the similarities between the two words. For example:
word2vec.similarity('culture','friendship')
0.2732939
Now I want to use list elements instead of literal words. For example, suppose I have a list named "tag" whose first row starts with the elements culture and friendship, so tag[0,0] = culture and tag[0,1] = friendship.
I use the following code which gives me an error:
word2vec.similarity(tag[0,0],tag[0,1])
The "tag" list is a numpy.ndarray.
The error is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\s\AppData\Local\Programs\Python6436\Python36\lib\site-packages\gensim\models\keyedvectors.py", line 992, in similarity
return dot(matutils.unitvec(self[w1]), matutils.unitvec(self[w2]))
File "C:\Users\s\AppData\Local\Programs\Python6436\Python36\lib\site-packages\gensim\models\keyedvectors.py", line 337, in __getitem__
return self.get_vector(entities)
File "C:\Users\s\AppData\Local\Programs\Python6436\Python36\lib\site-packages\gensim\models\keyedvectors.py", line 455, in get_vector
return self.word_vec(word)
File "C:\Users\s\AppData\Local\Programs\Python6436\Python36\lib\site-packages\gensim\models\keyedvectors.py", line 452, in word_vec
raise KeyError("word '%s' not in vocabulary" % word)
KeyError: "word ' friendship' not in vocabulary"
I think there are some leading spaces in your word ' friendship'.
Could you try this:
word2vec.similarity(tag[0,0].strip(),tag[0,1].strip())
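Since the question says tag is a numpy.ndarray, the whitespace could also be removed from every element once, rather than at each call. A minimal sketch with made-up array contents:

```python
import numpy as np

# Hypothetical tag array with stray whitespace, as in the error message
tag = np.array([["culture", " friendship"], ["music ", "art"]])

# np.char.strip removes leading and trailing whitespace element-wise
clean_tag = np.char.strip(tag)
print(clean_tag[0, 1])

# Assumed usage with the model from the question:
# word2vec.similarity(clean_tag[0, 0], clean_tag[0, 1])
```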
If tag is, as your question says, a Python list, then the problem is that you cannot index a list with a tuple.
If your list looks like [["culture","friendship"],[...]...]
then you should write word2vec.similarity(tag[0][0],tag[0][1])

how to replace +xx in pandas str replace

I'm using Python 2.7.12 and pandas 0.20.3. I have a data frame like the one below, with a column called number whose dtype is object. When I try to remove +91 from that column, I get the error below.
number
0 +9185600XXXXX
1 +9199651XXXXX
2 99211XXXXX
3 99341XXXXX
4 +9199651XXXXX
sre_constants.error: nothing to repeat
full trace,
Traceback (most recent call last):
File "encoder.py", line 21, in <module>
df['number']=df['number'].str.replace('+91','')
File "/home/hduser/.local/lib/python2.7/site-packages/pandas/core/strings.py", line 1574, in replace
flags=flags)
File "/home/hduser/.local/lib/python2.7/site-packages/pandas/core/strings.py", line 424, in str_replace
regex = re.compile(pat, flags=flags)
File "/usr/lib/python2.7/re.py", line 194, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
Replacing 91 works as I expect; it only fails when I put + in front.
Please help me solve this problem.
The error occurs at:
df['number']=df['number'].str.replace('+91','')
You can escape special regex value + (one or more repetitions) like:
df['number'] = df['number'].str.replace(r'\+91','')
Or use parameter regex=False:
df['number'] = df['number'].str.replace('+91','', regex=False)
import pandas as pd
data={'number':['+9185600XXXXX','+9199651XXXXX']}
f=pd.DataFrame(data)
f['number']=f['number'].str.replace(r'\+91','')
print(f)
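Both fixes can be put side by side, with re.escape() doing the escaping so that no special character has to be spotted by hand. This sketch assumes a pandas version whose str.replace accepts the regex keyword, which as far as I know was added after the 0.20.3 used in the question. The sample numbers are taken from the question:

```python
import re

import pandas as pd

df = pd.DataFrame({"number": ["+9185600XXXXX", "+9199651XXXXX", "99211XXXXX"]})

# Option 1: treat the pattern as a plain string, not a regex
literal = df["number"].str.replace("+91", "", regex=False)

# Option 2: keep regex matching but escape the special character,
# and anchor with ^ so that only a leading +91 is removed
pattern = "^" + re.escape("+91")
escaped = df["number"].str.replace(pattern, "", regex=True)

print(literal.tolist())
print(escaped.tolist())
```

Passing regex explicitly also avoids depending on the default, which is not the same across pandas versions.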

convert date from numpyarray into datetime type -> getting mystic error

I load a file f with the numpy.loadtxt function and wanted to extract some dates.
The date has a format like this: 15.08. - 21.08.2011
numpyarr = loadtxt(f, dtype=str, delimiter=";", skiprows = 1)
alldate = numpyarr[:,0][0]
dat = datetime.datetime.strptime(alldate,"%d.%m. - %d.%m.%Y")
And here is the whole error:
Traceback (most recent call last):
File "C:\PYTHON\Test DATETIME_2.py", line 52, in <module>
dat = datetime.datetime.strptime(alldate,"%d.%m. - %d.%m.%Y")
File "C:\Python27\lib\_strptime.py", line 308, in _strptime
format_regex = _TimeRE_cache.compile(format)
File "C:\Python27\lib\_strptime.py", line 265, in compile
return re_compile(self.pattern(format), IGNORECASE)
File "C:\Python27\lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\lib\re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: redefinition of group name 'd' as group 3; was group 1
Does somebody have an idea what is going on?
A datetime holds a single date and time, while your field contains two dates that you are trying to extract into a single variable. Specifically, the error you're getting is because you've used %d and %m twice: strptime turns each directive into a named regex group, so repeating one redefines the group name.
You can try something along the lines of:
a = datetime.datetime.strptime(alldate.split('-')[0],"%d.%m. ")
b = datetime.datetime.strptime(alldate.split('-')[1]," %d.%m.%Y")
a = datetime.datetime(b.year, a.month, a.day)
(it's not the best code, but it demonstrates the fact that there are two dates in two different formats hiding in your string).
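A slightly tidier variant of the same idea, as a sketch: parse the end date first, then append its year to the start part before parsing, so the start date never falls back to strptime's default year of 1900. The function name parse_range is hypothetical, and the sketch assumes both dates fall in the same year:

```python
from datetime import datetime


def parse_range(text):
    """Parse a string like '15.08. - 21.08.2011' into (start, end)."""
    start_part, end_part = [part.strip() for part in text.split("-")]
    end = datetime.strptime(end_part, "%d.%m.%Y")
    # The start part ends with a dot ('15.08.'), so appending the year
    # yields a complete date in the same format as the end part
    start = datetime.strptime(start_part + str(end.year), "%d.%m.%Y")
    return start, end


start, end = parse_range("15.08. - 21.08.2011")
print(start.date(), end.date())
```

A range that crosses New Year would need extra handling, for example subtracting one year from start when it comes out later than end.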

Weird TypeError from json.dumps

In Python 3.4.0, json.dumps() throws a TypeError in one case but works like a charm in another case (which I think is equivalent to the first one).
I have a dict where keys are strings and values are numbers and other dicts (i.e. something like {'x': 1.234, 'y': -5.678, 'z': {'a': 4, 'b': 0, 'c': -6}}).
This fails (the stack trace is not from this particular code snippet but from my larger script, which I won't paste here, but it is essentially the same):
>>> x = dict(foo()) # obtain the data and make a new dict of it to really be sure
>>> import json
>>> json.dumps(x)
Traceback (most recent call last):
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/pydevd.py", line 1733, in <module>
debugger.run(setup['file'], None, None)
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/pydevd.py", line 1226, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/_pydev_execfile.py", line 38, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
File "/mnt/data/gandalv/School/PhD/Other work/Krachy/code/recalculate.py", line 54, in <module>
ls[1] = json.dumps(f)
File "/usr/lib/python3.4/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.4/json/encoder.py", line 192, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.4/json/encoder.py", line 250, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.4/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 306 is not JSON serializable
The 306 is one of the values in one of the inner dicts in x. It is not always the same number; sometimes it is a different number contained in the dict, apparently because a dict is unordered.
However, this works like a charm:
>>> x = foo() # obtain the data and make a new dict of it to really be sure
>>> import ast
>>> import json
>>> x2 = ast.literal_eval(repr(x))
>>> x == x2
True
>>> json.dumps(x2)
"{...}" # the json representation of dict as it should be
Could anyone, please, tell me why does this happen or what could be the cause? The most confusing part is that those two dicts (the original one and the one obtained through evaluation of the representation of the original one) are equal but the dumps() function behaves differently for each of them.
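The cause was that the numbers inside the dict were not ordinary Python ints but numpy.int64 values, which the json encoder does not support. They compare equal to Python ints and print the same way, which is why the repr/literal_eval round trip produced an equal dict made of plain ints that serializes fine.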
As you have seen, numpy int64 data types are not serializable into json directly:
>>> import numpy as np
>>> import json
>>> a=np.zeros(3, dtype=np.int64)
>>> a[0]=-9223372036854775808
>>> a[2]=9223372036854775807
>>> jstr=json.dumps(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 192, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 250, in iterencode
return _iterencode(o, 0)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([-9223372036854775808, 0, 9223372036854775807]) is not JSON serializable
However, Python integers -- including longer integers -- can be serialized and deserialized:
>>> json.loads(json.dumps(2**123))==2**123
True
So with numpy, you can convert directly to Python data structures then serialize:
>>> jstr=json.dumps(a.tolist())
>>> b=np.array(json.loads(jstr))
>>> np.array_equal(a,b)
True
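When the NumPy values are buried inside nested dicts, as in the question, converting everything by hand is awkward. json.dumps() accepts a default callable that is invoked for any object it cannot serialize, so the conversion can happen during encoding. A minimal sketch (the function name numpy_default and the sample data are made up for illustration):

```python
import json

import numpy as np


def numpy_default(obj):
    """Fallback for json.dumps: convert NumPy values to built-in types."""
    if isinstance(obj, np.generic):   # NumPy scalars such as np.int64
        return obj.item()
    if isinstance(obj, np.ndarray):   # whole arrays become nested lists
        return obj.tolist()
    raise TypeError(repr(obj) + " is not JSON serializable")


data = {"x": 1.234, "z": {"a": np.int64(306), "b": 0}, "arr": np.arange(3)}
encoded = json.dumps(data, default=numpy_default, sort_keys=True)
print(encoded)
```

The hook is only called for objects the encoder rejects, so ordinary ints, floats and strings pass through unchanged.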
