I'm learning about crawlers, and after writing a few basic ones I downloaded the google scholar crawler master from GitHub to see how it runs. After a few errors that I could fix, I ran into ModuleNotFoundError: No module named 'proxy' (the from proxy import PROXIES line in middleware.py is the issue).
This code has had a few problems involving approaches that are no longer supported or advised in Python 3.x, including modules that have since been renamed or moved, but I was unable to find out whether that is the case here as well. I would appreciate help.
Assuming you're talking about this https://github.com/geekan/google-scholar-crawler crawler:
I just tried to run it on Python 2.7 and had no problems with it. A brief look at the misc module told me that there is a possible problem with relative imports (some information about it may be found in the question Relative imports in Python 3).
So the short answer is simply to use Python 2.7, as it will allow you to concentrate on understanding how scrapy crawlers work instead of on language version differences.
UPD: also make sure to remove all of the import pdb; pdb.set_trace() breakpoints in the code
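If you would rather stay on Python 3, the relative-import difference mentioned above can be reproduced in isolation. The sketch below is only an illustration: it assumes proxy.py sits next to middleware.py in the same package (as the from proxy import PROXIES line suggests), and the package name demo_misc, the file names and the proxy value are all made up for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create a throwaway package (hypothetical names) with two import styles."""
    pkg = Path(root) / "demo_misc"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "proxy.py").write_text("PROXIES = [{'ip_port': '127.0.0.1:8080'}]\n")
    # Python 2 style: implicit relative import of a sibling module
    (pkg / "middleware_py2.py").write_text("from proxy import PROXIES\n")
    # Python 3 style: explicit relative import (note the leading dot)
    (pkg / "middleware_py3.py").write_text("from .proxy import PROXIES\n")


def try_import(name):
    """Return the imported module, or the exception if the import fails."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        return exc


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        implicit = try_import("demo_misc.middleware_py2")
        explicit = try_import("demo_misc.middleware_py3")
    finally:
        sys.path.remove(root)

print(type(implicit).__name__)  # the implicit form fails on Python 3
print(explicit.PROXIES)         # the explicit form finds the sibling module
```

Whether changing that one line is enough for the actual crawler is not something I have checked; other Python 2 idioms in the project may still need attention.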
I followed a tutorial for websockets in Python and stumbled across the issue that Pylance does not suggest the functions related to the class that I have imported from a module:
My Editor:
The Tutorial:
The code itself runs without any issue, so the import seems to work, but I don't receive the suggestions in VS Code. What is the reason for this, or where could I debug something like this?
Thanks to wjandrea I found something I personally had not stumbled across before.
A classic mistake happened to me when following a tutorial: I work on newer versions than the 1.5-year-old video uses. Unfortunately, the part talking about versions was in another part...
Long story short, in the meantime a bigger change landed in the websockets module: the functions used are now imported lazily. This makes sense to reduce startup time in case you run a websocket server with the module.
A little info about lazy imports (it was the first time I had heard about this clever feature)
In case anybody else stumbles across this: I'm currently on Python 3.10.7 and I am talking about websockets 10.3!
Back to the issue.
Pylance can't make any suggestions because functions like websockets.connect(uri, ...) are only loaded by the websockets module when they are used at runtime, so code-completion tools inside the editor do not know they are there.
I took a glance inside the module, and with wjandrea's indirect hint about the lazy imports, the dictionary listed inside __init__.py made much more sense! Based on it I could trace back the Python scripts I need for my functions, or rather the ones Pylance needs to create those handy suggestions for me inside VS Code (or any other IDE).
For now I just manually imported the desired script so I have a bit more guidance while writing. Since startup time is not crucial in my current project, I'll let the manual imports stay, or I'll switch the import style depending on whether I am currently developing or the code goes into production.
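For anyone curious what "imported lazily" looks like in practice, here is a minimal sketch of the general technique using a module-level __getattr__ (PEP 562). It is not the websockets implementation: the module name lazy_demo and the name-to-module mapping are made up, and json and math only stand in for the real submodules.

```python
import importlib
import types


def make_lazy_module(name, lazy_names):
    """Build a module whose attributes are imported on first access.

    lazy_names maps an attribute name to the module that really defines it.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Only called when normal attribute lookup on the module fails
        try:
            source = lazy_names[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(source), attr)
        setattr(module, attr, value)  # cache so later lookups are direct
        return value

    module.__getattr__ = __getattr__
    return module


# Hypothetical mapping, similar in spirit to a dictionary in __init__.py
lazy = make_lazy_module("lazy_demo", {"dumps": "json", "sqrt": "math"})

before = "dumps" in vars(lazy)   # the name does not exist yet
result = lazy.dumps({"a": 1})    # first access triggers the real import
after = "dumps" in vars(lazy)    # now it is cached on the module

print(before, result, after)
```

Nothing in the module body binds dumps before the first access, which is the reason a static tool that only reads the source has nothing to offer for it.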
I have an environment with some extreme constraints that require me to reduce the size of a planned Python 3.8.1 installation. The OS is not connected to the internet, and a user will never open an interactive shell or attach a debugger.
There are of course lots of ways to do this, and one of the ways I am exploring is to remove some core modules, for example python3-email. I am concerned that 3rd-party packages that future developers may include in their apps have unused but required dependencies on core Python features. For example, if python3-email is missing, which 3rd-party packages that one would expect to work might not? If a developer decides to use a logging package that contains an unreferenced EmailLogger class in a referenced module, it will break, simply because import email appears at the top.
Do package design requirements or guidelines exist that address this?
It's an interesting question, but it is too broad to be cleanly answered here. In short, the Python standard library is expected to always be there, even though it is sometimes broken up into multiple parts (by Debian, for example). But you say it yourself: you don't know what your requirements are, since you don't yet know which future packages will run on this interpreter... This is impossible to answer. One thing you could do is to run something like modulefinder on the future code before letting it run on that constrained Python interpreter.
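A minimal sketch of that modulefinder idea, run on a full Python installation before deploying to the constrained one. The script name future_app.py and the helper top_level_imports are made up for illustration; a real check would point at the actual application entry script.

```python
import tempfile
from modulefinder import ModuleFinder
from pathlib import Path


def top_level_imports(script_path):
    """Return the top-level names of every module the script pulls in,
    directly or through other modules, as found by static analysis."""
    finder = ModuleFinder()
    finder.run_script(str(script_path))
    return sorted(
        {name.split(".")[0] for name in finder.modules if name != "__main__"}
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for code a future developer might ship
    script = Path(tmp) / "future_app.py"
    script.write_text("import email\n")
    required = top_level_imports(script)

print("email" in required)
print(len(required), "top-level modules needed in total")
```

modulefinder works from the bytecode, so imports built at runtime (for example through importlib.import_module with a computed name) will not show up in the result.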
I was able to get to a solution. The issue was best described to me as cascading imports. It is possible to stop a module from being loaded by adding an entry to sys.modules. For example, when importing the asyncio module, the ssl and _ssl modules will be loaded, even though _ssl is not needed outside of ssl. This can be stopped with the following code. It can be verified both by seeing that the python process is 3MB smaller and by using module load hooks to watch each module as it loads:
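import importhook
import sys

# A None entry in sys.modules makes any later "import ssl" raise ImportError
sys.modules['ssl'] = None

# Uncomment the decorator to watch each module as it loads:
# @importhook.on_import(importhook.ANY_MODULE)
def on_any_import(module):
    print(module.__spec__.name)
    assert module.__spec__.name not in ['ssl', '_ssl']

import asyncio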
For my original question about 3rd-party design guidelines, some recommend placing the import statements within the class rather than at the module level; however, this is not routinely done.
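To make both points concrete without the importhook dependency, here is a small standard-library sketch. The blocked_module helper and the EmailLogger class are hypothetical names for this example (EmailLogger echoes the class from the question); the only real mechanism used is that a None entry in sys.modules makes the import fail.

```python
import contextlib
import sys


@contextlib.contextmanager
def blocked_module(name):
    """Temporarily make "import name" fail by storing None in sys.modules."""
    missing = object()
    previous = sys.modules.get(name, missing)
    sys.modules[name] = None
    try:
        yield
    finally:
        # Restore whatever was there before the block
        if previous is missing:
            del sys.modules[name]
        else:
            sys.modules[name] = previous


class EmailLogger:
    """Hypothetical class that needs email only inside one method."""

    def format_alert(self, text):
        # Deferred import: evaluated when the method runs, not when the
        # class is defined, so a missing email package breaks only here
        import email
        return email.message_from_string("Subject: alert\n\n" + text)


with blocked_module("email"):
    logger = EmailLogger()  # defining and instantiating still works
    try:
        logger.format_alert("disk full")
        raised = False
    except ImportError:
        raised = True

message = EmailLogger().format_alert("disk full")
print(raised, message["Subject"], message.get_payload())
```

Had import email been at module level instead, the failure would happen as soon as the defining module is imported, which is the situation described in the question.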
I'm trying to run a Python script using Matlab's built-in py. It's pretty simple, but I'm running into some difficulty trying to debug an error in my code (which runs fine when tested in my Python IDE but crashes when run through Matlab).
The issue is that Matlab seems to be caching the module the first time I call a function, and I can't figure out how to get it to recognize changes to the module without restarting Matlab. Is anyone aware of a way to avoid this issue?
This is the first limitation listed on the MATLAB documentation's Limitations to Python Support page:
Editing and reloading a Python® module in the same MATLAB session. To
use an updated module, restart MATLAB
Sorry. That said, that page might help you figure out what the issue is, as there are other limitations that might be coming into play. You might also find their page about troubleshooting Python errors useful.
I use IntelliJ with the Python plugin.
When I want to import Python libs like
import random
I get an editor error:
No module named random less... (Ctrl+F1)
This inspection detects names that should resolve but don't. Due to dynamic dispatch and duck typing, this is possible in a limited but useful number of cases. Top-level and class-level items are supported better than instance items.
When I run the code, everything is OK.
What can I do to make IntelliJ recognize these libs?
You may have fixed this by now, but I was having the same problem and finally solved it, so I figured I'd post the solution for anyone who came here by Googling/Binging the error message:
I went to File > Project Structure > Modules, highlighted the main module, then pressed the plus sign and then "Python" under "Framework".
Hope that helps you or someone else.
Maybe my IntelliJ version is different from yours.
On Windows I fixed this problem by:
1. File > Project Structure > Modules
2. On the module's Dependencies panel, change the Module SDK from the JDK to Python
3. Done
I followed this guide to set up OpenCV 2.3.1 in Python 2.7 with Eclipse.
I also copied the libraries into my python folder:
http://i.snag.gy/J9RrC.jpg
Here is my Hello World program, which runs correctly (creates a named window and displays the image), but Eclipse still shows syntax errors.
Every error says "Undefined variable from import".
Here are my python settings for this project:
http://i.snag.gy/KBXiB.jpg
http://i.snag.gy/KfTpF.jpg
Have I set up my PythonPath incorrectly? How can I get Eclipse to work properly?
Thanks
I had the same problem, everything ran correctly even though there were undefined import errors all over the place. I eventually solved it by adding 'cv' to the list of Forced Builtins: Window > Preferences > Pydev > Interpreter - Python > Forced Builtins > New.
This is how I came across the solution:
How to use code completion into Eclipse with OpenCV
I hope that this may help you too.
EDIT: FYI, according to the top answer here, if you're just getting started (like me!) it's almost certainly better to use the cv2 interface instead of the older one provided in cv2.cv. The author of that answer, Abid Rahman, has some tutorials that look pretty good. (end EDIT)
I used Debian's tools to install the python-opencv package. There was no .../dist-packages/opencv directory to be found, and the cv.py file contained only:
from cv2.cv import *
I'm fairly inexperienced with Python and completely so with Python access to external libraries, so this looked like some sort of workaround related to that. Not so, apparently. I followed Casper's link above, and found the solution that he used (which worked for me,) but I wasn't happy using "forced builtins" when I wasn't entirely sure of the consequences.
However, the second, lower-rated answer there is my preferred solution. Instead of
import cv
I'm using
import cv2.cv as cv
From what I can tell, this just removes the cv.py middleman from the import chain, if that makes sense. A save/close/reload of my script had Eclipse recognizing cv.LoadImageM as defined and autocompleting other things from OpenCV.
I'm reproducing that answer here because it seems cleaner to me and I found this question first when I searched for the answer to the same problem.
It would be helpful to show the error you're getting and your code. However, I suspect that the problem is that the syntax errors which PyDev shows are based on its own parsing of the code, which is much more simplistic than the actual Python interpreter. If your code runs, then the apparently undefined variables must be defined, but the PyDev parser just can't see them and reports them as "undefined".
The cause of this is that OpenCV doesn't explicitly define its variables in a way which can be read by PyDev. Unfortunately I don't have an easy solution. I usually deal with the problem by using from ... import ... so that the error only appears once. If you want you could write a wrapper module which explicitly imports the variables into its local namespace, then import that module instead.