I'm running some code that trains, saves and loads a Word2Vec model (it's part of a library I've downloaded, made by a user on github, as part of a published paper). Upon running it, two parts of the code seem to be problematic, although the code does actually run to the end.
The first error arises from a method called train_word2vec() (which is called as part of the main method of the program).
The second error arises from a line later in the main method.
Problematic line 1 - within a method train_word2vec():
if exists(model_name):
    embedding_model = word2vec.Word2Vec.load(model_name)  # This line causes a UserWarning.
Problematic line 2 - Later in the program, in the main method:
x_train, x_val, x_test, vocabulary, vocabulary_inv, sentences = load_data() #This line seems to run fine.
embedding_weights = train_word2vec(sentences, vocabulary_inv) #This line causes two DeprecationWarnings.
The DeprecationWarnings are specifically created by the following line in train_word2vec:
embedding_weights = [np.array([embedding_model[w] if w in embedding_model else np.random.uniform(-0.25,0.25,embedding_model.vector_size) for w in vocabulary_inv])]
Upon executing the code, the first problematic line causes a UserWarning:
" C:\Users\User1\Anaconda3\lib\site-packages\smart_open\smart_open_lib.py398: UserWarning: This function is deprecated, use smart_open.open instead. See the migration notes for details: https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst#migrating-to-the-new-open-function "
The second problematic line causes two DeprecationWarnings:
"
load_w2v.py:91: DeprecationWarning: Call to deprecated 'contains' (Method will be removed in 4.0.0, use self.wv.contains() instead).
embedding_weights = [np.array([embedding_model[w] if w in embedding_model else np.random.uniform(-0.25,0.25,embedding_model.vector_size) for w in vocabulary_inv])]"
"load_w2v.py:91: DeprecationWarning: Call to deprecated 'getitem' (Method will be removed in 4.0.0, use self.wv.getitem() instead).
embedding_weights = [np.array([embedding_model[w] if w in embedding_model else np.random.uniform(-0.25,0.25,embedding_model.vector_size) for w in vocabulary_inv])]
"
I've looked at the RaRe Technologies README. What's confusing is that nowhere in my code do I call smart_open, so I don't understand why the first warning has been raised. smart_open isn't even in the imports at the start of the Python file.
Regarding the DeprecationWarnings, I call neither a __contains__ method nor a __getitem__ method in my code, so I'm not sure where those warnings are coming from either.
As far as I can tell, the code seems to run properly, and the final file seems to have been created successfully. However, as I am recreating some code that somebody else has written, I am not certain that the file has been created properly.
Do DeprecationWarnings and UserWarnings actually indicate that the program has not executed successfully? Or are they there just as 'warnings'? I.e. is it possible for the code to run fine and 'warnings' still be thrown?
If anyone can see how I could alter the code to avoid these errors, that'd be appreciated. I'm new to Python so please point out any errors. Thanks.
In general, you can often ignore various 'warnings'. If they halted operation, or corrupted results, they'd appear as more serious errors or exceptions-that-must-be-handled for execution to proceed.
Specifically here, both warnings are actually about 'deprecation' of some method. That typically means a method's use is discouraged in favor of some newer, more-recommended approach – but still works for now (and possibly for quite a while longer).
With an abundance of caution, you could try to preemptively ensure all your code (and libraries) are using the most-recommended approaches – but it's not usually urgent or even necessary if things are otherwise working.
The notice about smart_open is essentially an issue for gensim to fix in a forthcoming release. The smart_open package changed its preferred way of doing something, and gensim is still using the older (still-working but 'deprecated') approach. You're not calling smart_open directly, so it's not really your concern.
The notices about contains and getitem might be more under your control – they appear to be triggered by lines in a file load_w2v.py – which is not in gensim, and whose code you've not shown. In particular, gensim now encourages word-vector accesses to go through a Word2Vec model's .wv property, rather than through the top-level model (where these warnings will be generated). (Still, though, the old method works, and will until gensim decides to make a breaking change.)
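For illustration, here is a hedged sketch of how that line 91 could be rewritten to go through .wv instead of the top-level model. The FakeModel/FakeWV stand-ins below are mine (so the snippet runs without gensim); in the real code, embedding_model is the object returned by word2vec.Word2Vec.load(model_name):

```python
import numpy as np

# Stand-ins so this sketch runs without gensim; in the real code,
# embedding_model = word2vec.Word2Vec.load(model_name).
class FakeWV(dict):
    pass

class FakeModel(object):
    def __init__(self):
        self.wv = FakeWV(hello=np.zeros(4))
        self.vector_size = 4

embedding_model = FakeModel()
vocabulary_inv = ["hello", "unseen"]

# The deprecated `w in embedding_model` / `embedding_model[w]` accesses
# become `.wv` accesses, which is what the warnings recommend:
embedding_weights = [np.array(
    [embedding_model.wv[w] if w in embedding_model.wv
     else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)
     for w in vocabulary_inv])]

print(embedding_weights[0].shape)  # (2, 4)
```

The behaviour is unchanged; only the lookup path moves to the .wv (KeyedVectors) attribute, which silences both DeprecationWarnings.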
If the warning display really bugs you, and you don't want to deep-dive into your code to avoid triggering them, you can also simply suppress their display, as the comment and link from @tom-dalton describe.
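A minimal sketch of that suppression, using only the standard library's warnings module (place it before the imports or calls that trigger the warnings):

```python
import warnings

# Silence only the specific categories that are flooding the output;
# other, potentially more important, warnings still get through.
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)
```

Filtering by category (rather than ignoring all warnings) keeps the noise down without hiding anything unexpected.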
Related
I am new to TensorFlow and programming in general.
I am following a tutorial on GitHub (https://github.com/experiencor/keras-yolo3) to learn object detection with YOLOv3.
After running the code below:
!python train.py -c config.json
I received several messages in the output, and I am trying to understand what each means.
One of them is as below:
WARNING:tensorflow:From train.py:26: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
Question one: do I have to fix the mentioned piece of code (tf.keras.backend.set_session), since it is "deprecated" as said here?
Question two: how does a warning in general, and this warning specifically, affect my final model if it is not fixed?
Answer one: long story short, a deprecated function is an old one, replaced by something (hopefully) better, and kept around for backward compatibility. You can use it, but you will not get the latest development/support and, at some point, your code will stop working (since the fate of a deprecated function is to disappear in a future release).
Answer two :
Warning messages are typically issued in situations where it is useful to alert the user of some condition in a program, where that condition (normally) doesn’t warrant raising an exception and terminating the program. For example, one might want to issue a warning when a program uses an obsolete module.
https://docs.python.org/3/library/warnings.html
All in all, here, the interpreter just warns you that you are using a function that you will not be able to use in the future.
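A tiny self-contained demonstration of this point (old_api is a made-up function, not part of TensorFlow): a deprecated function warns, but still runs to completion and returns its result.

```python
import warnings

def old_api():
    # A deprecated function typically emits a warning, then still does its job:
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning)
    return "model saved"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api()

# The warning was recorded, but execution completed normally:
print(result)                                   # model saved
print(caught[0].category is DeprecationWarning) # True
```

So a DeprecationWarning by itself does not mean the program failed; it only flags that the code path has an expiry date.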
I have a coverage report that may be lying or distorted. It says that I have coverage for a line in my Django model code. I can't see where that line is being exercised. I can see that the module is imported, that the class is imported, but not that it's being invoked/instantiated.
Thus, coverage report says I have Line A covered. Presumably that means Line B, somewhere, is exercising it. I'd like to know where Line B is. Is there a way to find the set of Line-B's (one or more) that are calling Line A, in my tests?
It seems this could be an annotation in the coverage report somehow/somewhere. It's definitely knowable, since coverage has to keep track of a thing being used.
I'm not seeing it.
If this isn't implemented, I'd like to suggest it. I know, it may be too complex as a full stack trace for each line of execution. But, maybe just the inspect of the immediate calling frame would be a good start, and helpful.
New in coverage.py 5.0 are dynamic contexts which can tell you what test ran each line of code. It won't tell you the immediate caller of the line, but it's a start.
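A minimal configuration sketch enabling dynamic contexts (section and option names per the coverage.py 5.0 docs):

```ini
[run]
dynamic_context = test_function

[html]
show_contexts = True
```

After running the tests under coverage and generating the HTML report, each covered line is annotated with the test function(s) that executed it.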
Here's a fun way to discover what covers that line:
Insert a bug in the line.
If you then run the tests, the ones truly covering the line will fail. The stacktraces should include Line B.
My python script starts with
from __future__ import division
In R I do
library(rPython)
python.load("myscript.py")
I get
File "", line 2 SyntaxError: from future imports must
occur at the beginning of the file
I just bumped into the same problem - apparently python.load() is simply executing the script loaded from the location as if it were a bunch of commands.
I'm not sure if it's wrapped or preceded with some boilerplate code by default somehow, but it seems so. And if you were to catch errors using rPython it would surely be executed within a try... block (given the current code on GitHub at least).
However, using a workaround based on execfile() did the job for me:
python.exec("execfile('myscript.py')")
Another approach is, if there's no need to execute code in the main block, to import the module
python.exec("import myscript")
however, in this slightly more convoluted case, you likely have to deal with path problems, as mentioned e.g. here.
(It would probably be a good idea to let the package maintainers know about this situation, and that it could use something better than a workaround.)
I am working on a project using python 2.7.2, sqlalchemy 0.7, unittest, eclipse 3.7.2 and pydev 2.4. I am setting breakpoints in python files (unit test files), but they are completely ignored (at some point before, they worked). By now I have upgraded all related software (see above), started new projects, played around with settings, hypnotized my screen, but nothing works.
The only idea I got from some post is that it has something to do with changing some .py file names to lower case.
Does anyone have any ideas?
added: I even installed the aptana version of eclipse and copied the .py files to it => same result; breakpoints are still ignored.
still no progress: I have changed some code that might be seen as unusual and replaced it with a more straightforward solution.
Some more info: it probably has something to do with the unittest module:
- breakpoints in my files defining test suites work
- breakpoints in the standard unittest files themselves work
- breakpoints in my test methods in classes derived from unittest.TestCase do not work
- breakpoints in my code being tested in the test cases do not work
At some point before, I could define working breakpoints in test methods and in the code being tested. Some things I changed after that: started using test suites, changed some filenames to lowercase, ...
This problem also occurs if my code runs without exceptions or test failures.
What I have already tried:
- removed .pyc files
- defined a new project and copied only the .py files to it
- rebooted several times in between
- upgraded to eclipse 3.7.2
- installed the latest pydev on eclipse 3.7.2
- switched to aptana (and back)
- removed code that 'manually' added classes to my module
- fiddled with some configurations
What I can still do: start a new project with my code, remove/change code until breakpoints work, and sort of black-box figure out whether this has something to do with some part of my code.
Does anyone have any idea what might cause these problems or how they might be solved?
Is there any other place I could look for a solution?
Do pydev developers look into the questions on Stack Overflow?
Is there an older version of pydev that I might try?
I have been working with pydev/eclipse for a long time and it works well for me, but without debugging I'd be forced to switch IDEs.
In answer to Fabio's questions below:
The python version is 2.7.2,
sys.gettrace() gives None (but I have no idea what in my code could influence that)
This is the output of the debugger after changing the suggested parameters:
pydev debugger:
starting
('Executing file ', 'D:\\.eclipse\\org.eclipse.platform_3.7.0_248562372\\plugins\\org.python.pydev.debug_2.4.0.2012020116\\pysrc\\runfiles.py')
('arguments:', "['D:\\\\.eclipse\\\\org.eclipse.platform_3.7.0_248562372\\\\plugins\\\\org.python.pydev.debug_2.4.0.2012020116\\\\pysrc\\\\runfiles.py', 'D:\\\\Documents\\\\Code\\\\Eclipse\\\\workspace\\\\sqladata\\\\src\\\\unit_test.py', '--port', '49856', '--verbosity', '0']")
('Connecting to ', '127.0.0.1', ':', '49857')
('Connected.',)
('received command ', '501\t1\t1.1')
sending cmd: CMD_VERSION 501 1 1.1
sending cmd: CMD_THREAD_CREATE 103 2 <xml><thread name="pydevd.reader" id="-1"/></xml>
sending cmd: CMD_THREAD_CREATE 103 4 <xml><thread name="pydevd.writer" id="-1"/></xml>
('received command ', '111\t3\tD:\\Documents\\Code\\Eclipse\\workspace\\sqladata\\src\\testData.py\t85\t**FUNC**testAdjacency\tNone')
Added breakpoint:d:\documents\code\eclipse\workspace\sqladata\src\testdata.py - line:85 - func_name:testAdjacency
('received command ', '122\t5\t;;')
Exceptions to hook : []
('received command ', '124\t7\t')
('received command ', '101\t9\t')
Finding files... done.
Importing test modules ... testAtomic (testTypes.TypeTest) ... ok
testCyclic (testTypes.TypeTest) ...
The rest is output of the unit test.
Continuing from Fabio's answer part 2:
I have added the code at the start of the program, and the debugger stops working at the last line of the following method in sqlalchemy\orm\attributes.py (it is a descriptor, but how or whether it interferes with the debugging is beyond my current knowledge):
class InstrumentedAttribute(QueryableAttribute):
    """Class bound instrumented attribute which adds descriptor methods."""

    def __set__(self, instance, value):
        self.impl.set(instance_state(instance),
                      instance_dict(instance), value, None)

    def __delete__(self, instance):
        self.impl.delete(instance_state(instance), instance_dict(instance))

    def __get__(self, instance, owner):
        if instance is None:
            return self
        dict_ = instance_dict(instance)
        if self._supports_population and self.key in dict_:
            return dict_[self.key]
        else:
            return self.impl.get(instance_state(instance), dict_)  # <= last line of debugging
From there the debugger steps into the __getattr__ method of one of my own classes, derived from a declarative_base() class of sqlalchemy.
Probably solved (though not understood):
The problem seemed to be that the __getattr__ mentioned above created something similar to infinite recursion; however, the program/unittest/sqlalchemy recovered without reporting any error. I do not understand the sqlalchemy code well enough to know why the __getattr__ method was called.
I changed the __getattr__ method to call super for the attribute name for which the recursion occurred (most likely not my final solution) and the breakpoint problem seems gone.
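A hedged sketch of the kind of guard described above (Record and _data are illustrative names, not from the original code): a __getattr__ that bails out early for private/dunder names, so lookups made by internal machinery cannot recurse back through it.

```python
class Record(object):
    def __init__(self):
        self._data = {"name": "example"}

    def __getattr__(self, attr):
        # __getattr__ is called only when normal attribute lookup fails.
        # Bail out for private/dunder names instead of touching self._data:
        # if _data itself is not set yet, looking it up here would re-enter
        # __getattr__ and recurse forever.
        if attr.startswith("_"):
            raise AttributeError(attr)
        try:
            return self._data[attr]
        except KeyError:
            raise AttributeError(attr)

r = Record()
print(r.name)  # example
```

The same shape of guard (or delegating unknown names to super) prevents the silent runaway recursion that was killing the trace function.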
If I can formulate the problem in a concise manner, I will probably try to get some more info on the Google sqlalchemy newsgroup, or at least check my solution for robustness.
Thank you Fabio for your support, the trace_func() function pinpointed the problem for me.
Seems really strange... I need some more info to better diagnose the issue:
Open \plugins\org.python.pydev.debug\pysrc\pydevd_constants.py and change
DEBUG_TRACE_LEVEL = 3
DEBUG_TRACE_BREAKPOINTS = 3
run your use-case with the problem and add the output to your question...
Also, it could be that for some reason the debugging facility is reset in some library you use or in your code, so, do the following: in the same place that you'd put the breakpoint do:
import sys
print 'current trace function', sys.gettrace()
(note: when running in the debugger, it'd be expected that the trace function is something as: <bound method PyDB.trace_dispatch of <__main__.PyDB instance at 0x01D44878>> )
Also, please post which Python version you're using.
Answer part 2:
The fact that sys.gettrace() returns None is probably the real issue... I know some external libraries which mess with it (i.e.:DecoratorTools -- read: http://pydev.blogspot.com/2007/06/why-cant-pydev-debugger-work-with.html) and have even seen Python bugs and compiled extensions break it...
Still, the most common reason it breaks is probably because Python will silently disable the tracing (and thus the debugger) when a recursion throws a stack overflow error (i.e.: RuntimeError: maximum recursion depth exceeded).
You can probably put a breakpoint in the very beginning of your program and step in the debugger until it stops working.
Or maybe simpler is the following: Add the code below to the very beginning of your program and see how far it goes with the printing... The last thing printed is the code just before it broke (so, you could put a breakpoint at the last line printed knowing it should be the last line where it'd work) -- note that if it's a large program, printing may take a long time -- it may even be faster printing to a file instead of a console (such as cmd, bash or eclipse) and later opening that file (just redirect the print from the example to a file).
import sys

def trace_func(frame, event, arg):
    print 'Context: ', frame.f_code.co_name, '\tFile:', frame.f_code.co_filename, '\tLine:', frame.f_lineno, '\tEvent:', event
    return trace_func

sys.settrace(trace_func)
If you still can't figure it out, please post more information on the obtained results...
Note: a workaround until you find the actual place is using:
import pydevd;pydevd.settrace()
on the place where you'd put the breakpoint -- that way you'd have a breakpoint in code which should definitely work, as it'll force setting the tracing facility at that point (it's very similar to the remote debugging: http://pydev.org/manual_adv_remote_debugger.html except that as the debugger was already previously connected, you don't really have to start the remote debugger, just do the settrace to emulate a breakpoint)
Coming late into the conversation, but just in case it helps. I just ran into a similar problem and found that the debugger is very particular w.r.t. which lines it considers "executable" and available to break on.
If you are using line continuations, or multi-line expressions (e.g. inside a list), put the breakpoint in the last line of the statement.
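For instance (an illustrative snippet, not from the question's code):

```python
# With a statement spanning several lines, set the breakpoint on the line
# that closes the statement, not on the opening '[' line:
values = [
    1,
    2,
    3,
]  # <- breakpoint here
total = sum(values)
print(total)  # 6
```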
I hope it helps.
Try removing the corresponding .pyc file (compiled) and then running.
Also, I have sometimes realized I was running more than one instance of a program, which confused pydev.
I've definitely seen this before too. Quite a few times.
Ran into a similar situation running a django app in Eclipse/pydev. What was happening was that the code that was running was the one installed in my virtualenv, not my source code. I removed my project from my virtualenv's site-packages, restarted the django app in the eclipse/pydev debugger, and everything was fine.
I had similar-sounding symptoms. It turned out that my module import sequence was rexec'ing my entry-point python module because a binary (non-Python) library had to be dynamically loaded, i.e., the LD_LIBRARY_PATH was dynamically reset. I don't know why this causes the debugger to ignore subsequent breakpoints. Perhaps the rexec call is not specifying debug=true; it should specify debug=true/false based on the calling context state?
Try setting a breakpoint at your first import statement being cognizant of whether you are then s(tep)'ing into or n(ext)'ing over the imports. When I would "next" over the 3rdparty import that required the dynamic lib loading, the debug interpreter would just continue past all breakpoints.
I plan to use these functions (compile and compiler.parse) in a web environment, so my concern is whether they can be exploited to execute malicious software on the server.
Edit: I don't execute the result. I parse the AST tree and/or catch SyntaxError.
This is the code in question:
try:
    # compile the code and check for syntax errors
    compile(code_string, filename, "exec")
except SyntaxError, value:
    msg = value.args[0]
    (lineno, offset, text) = value.lineno, value.offset, value.text
    if text is None:
        return [{"line": 0, "offset": 0,
                 "message": u"Problem decoding source"}]
    else:
        line = text.splitlines()[-1]
        if offset is not None:
            offset = offset - (len(text) - len(line))
        else:
            offset = 0
        return [{"line": lineno, "offset": offset, "message": msg}]
else:
    # no syntax errors, check it with pyflakes
    tree = compiler.parse(code_string)
    w = checker.Checker(tree, filename)
    w.messages.sort(lambda a, b: cmp(a.lineno, b.lineno))
checker.Checker is pyflakes class that parses the AST tree.
I think the more interesting question is what are you doing with the compiled functions? Running them is definitely unsafe.
I've tested the few exploits I could think of. Seeing as it's just a syntax check (it can't redefine classes/functions, etc.), I don't think there is any way to get Python to execute arbitrary code at compile time.
If the resulting code or AST object is never evaluated, I think you are only subject to denial-of-service attacks.
If you are evaluating user-inputted code, it is the same as giving shell access as the webserver user to every user.
They are not safe, but it's not too hard to find a subset of Python that can be sandboxed to a point. If you want to go down that road, you need to parse that subset of Python yourself and intercept all calls, attribute lookups and everything else involved. You also don't want to give users access to language constructs such as non-terminating loops and more.
Still interested? Head over to jinja2.sandbox :)
compiler.parse and compile could most definitely be used for an attack if the attacker can control their input and the output is executed. In most cases, you are going to either eval or exec their output to make it run so those are still the usual suspects and compile and compiler.parse (deprecated BTW) are just adding another step between the malicious input and the execution.
EDIT: Just saw that you left a comment indicating that you are actually planning on using these on USER INPUT. Don't do that. Or at least, don't actually execute the result. That's a huge security hole for whoever ends up running that code. And if nobody's going to run it, why compile it? Since you clarified that you only want to check syntax, this should be fine. I would not store the output though as there's no reason to make anything easier for a potential attacker and being able to get arbitrary code onto your system is a first step.
If you do need to store it, I would probably favor a scheme similar to that commonly used for images where they are renamed in a non-predictable manner with the added step of making sure that it is not stored on the import path.
Yes, they can be maliciously exploited.
If you really want safe sandboxing, you could look at PyPy's sandboxing features, but be aware that sandboxing is not easy, and there may be better ways to accomplish whatever you are seeking.
Correction
Since you've updated your question to clarify that you're only parsing the untrusted input to AST, there is no need to sandbox anything: sandboxing is specifically about executing untrusted code (which most people probably assumed your goal was, by asking about sandboxing).
Using compile / compiler only for parsing this way should be safe: Python source parsing does not have any hooks into code execution. (Note that this is not necessarily true of all languages: for example, Perl cannot be (completely) parsed without code execution.)
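A quick self-contained check of this claim: compiling a string with a side effect produces a code object but triggers nothing; the side effect would only happen on an explicit exec of the result.

```python
import io
from contextlib import redirect_stdout

source = "print('side effect!')"

buf = io.StringIO()
with redirect_stdout(buf):
    # Parsing/compiling only -- nothing is printed here:
    code_obj = compile(source, "<untrusted>", "exec")

print(repr(buf.getvalue()))  # '' -- compile produced no output
# exec(code_obj)  # <- only this step would run the code; never do it on untrusted input
```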
The only other remaining risk is that someone may be able to craft some pathological Python source code that makes one of the parsers use runaway amounts of memory / processor time, but resource exhaustion attacks affect everything, so you'll just want to manage this as it becomes necessary. (For example, if your deployment is mission-critical and cannot afford a denial of service by an attacker armed with pathological source code, you can execute the parsing in a resource-limited subprocess).
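A hedged sketch of that subprocess isolation (parse_untrusted is an illustrative helper, not a real API): run the parse in a short-lived child process with a wall-clock timeout, so pathological input cannot hang or crash the main server process.

```python
import subprocess
import sys

def parse_untrusted(source, timeout=5.0):
    """Return True if `source` parses as Python, False on SyntaxError or timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c",
             "import sys, ast; ast.parse(sys.stdin.read())"],
            input=source, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0

print(parse_untrusted("x = 1"))  # True
print(parse_untrusted("x = ("))  # False
```

In a real deployment you might additionally cap the child's memory (e.g. with OS-level resource limits), but the timeout alone already contains the worst hangs.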