I'm trying to test Scrapy logging with ScrapyFileLogObserver. In my source code, I import it like this:
from scrapy.log import ScrapyFileLogObserver
but I get this error when I launch my spider:
from scrapy.log import ScrapyFileLogObserver
ImportError: cannot import name ScrapyFileLogObserver
For reference, I'm using the latest version of Scrapy (1.0.1).
How can I fix this?
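In Scrapy 1.0 the logging system was completely rewritten on top of Python's standard logging module, and ScrapyFileLogObserver no longer exists. Instead, Scrapy now uses Twisted's PythonLoggingObserver directly to forward Twisted's own log messages into Python logging:
from twisted.python import log as twisted_log

observer = twisted_log.PythonLoggingObserver('twisted')
observer.start()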
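If the goal was simply to write Scrapy's log to a file, the LOG_FILE setting covers the common case. Alternatively, because Scrapy 1.0+ emits its messages through the standard logging module, a plain logging.FileHandler on the root logger should pick them up. A minimal stdlib-only sketch; attach_file_log and detach_file_log are hypothetical helpers, and the logger name scrapy.core.engine only stands in for Scrapy's own loggers:

```python
import logging


def attach_file_log(path, level=logging.DEBUG):
    """Copy every record that reaches the root logger into `path` (hypothetical helper).

    Named loggers such as 'scrapy.core.engine' propagate to the root logger by
    default, so their records end up in this file as well.
    """
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")
    )
    root = logging.getLogger()
    root.addHandler(handler)
    # The root logger defaults to WARNING; lower it so INFO/DEBUG records get through.
    root.setLevel(level)
    return handler


def detach_file_log(handler):
    """Remove the handler from the root logger and close the file."""
    logging.getLogger().removeHandler(handler)
    handler.close()
```

In a script you would call attach_file_log("spider.log") before starting the crawl and detach_file_log(...) once it has finished.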
I made a big mistake a long time ago: I created an app called requests inside my Django project, and the system has been running for years. Now I need to use the requests library, which the documentation imports with import requests, but whenever I do this import it imports my app instead of the library. How can I solve this?
You can try importing the library's requests/__init__.py file directly, as shown in the docs: https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly.
Example:
import sys
import importlib.util
module_name = 'requests'
# declare the full path to requests/__init__.py file below
module_path = '/path/to/virtualenv/site-packages/requests/__init__.py'
spec = importlib.util.spec_from_file_location(module_name, module_path)
requests = importlib.util.module_from_spec(spec)
sys.modules[module_name] = requests
spec.loader.exec_module(requests)
print(requests.post) # should not raise error
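Hard-coding the site-packages path breaks as soon as the virtualenv moves. A variation, assuming the clash comes from the Django project root being on sys.path (as it is when running manage.py), is to let importlib search sys.path with that one directory left out, using importlib.machinery.PathFinder.find_spec. import_shadowed is a hypothetical helper, and like the code above it registers the library under sys.modules['requests'], so anything that later imports the app by that name gets the library instead:

```python
import importlib.machinery
import importlib.util
import os
import sys


def import_shadowed(name, skip_dir):
    """Import top-level module `name`, ignoring sys.path entries equal to skip_dir."""
    skip = os.path.abspath(skip_dir)
    # '' in sys.path means the current directory; abspath('') resolves it to the CWD.
    search_path = [p for p in sys.path if os.path.abspath(p) != skip]
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    if spec is None:
        raise ImportError("%r not found outside %r" % (name, skip_dir))
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the package's own relative imports resolve.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Hypothetical usage from a Django module, BASE_DIR being the project root:
#     requests = import_shadowed("requests", BASE_DIR)
```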
I'm trying to import a class from a file in the spiders folder, but it gives me an error.
I used the following import:
from .spiders.amazon_Spider import amazonSpider
This is my file layout:
-amazonWebScraping
    -amazonWebScraping
        -spiders
            -amazon_spiders.py
        -scraper.py
I'm trying to access class amazonSpider(scrapy.Spider), defined in amazon_spider.py, from scraper.py, but I get this error:
ImportError: attempted relative import with no known parent package
I have been using the method described on Stack Overflow (https://stackoverflow.com/a/43661172/5037146) to run Scrapy from a script with CrawlerRunner, so that the process can be restarted.
However, I don't get any console logs when running the process through CrawlerRunner, whereas CrawlerProcess outputs the status and progress.
Code is available online: https://colab.research.google.com/drive/14hKTjvWWrP--h_yRqUrtxy6aa4jG18nJ
With CrawlerRunner you need to set up logging manually, which you can do with configure_logging(). See https://docs.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script
When you use CrawlerRunner, you have to configure logging manually. You can do it with the scrapy.utils.log.configure_logging function, for example:
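from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from my_spider import MySpider

# CrawlerRunner does not set up logging itself, so do it before crawling
configure_logging(
    {
        "LOG_FORMAT": "%(levelname)s: %(message)s",
    },
)
runner = CrawlerRunner()
d = runner.crawl(MySpider)
# The crawl only runs while the Twisted reactor runs; stop it when the crawl ends
d.addBoth(lambda _: reactor.stop())
reactor.run()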
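Why nothing shows up otherwise: CrawlerProcess calls configure_logging() for you, CrawlerRunner does not, and with no handler configured Python's logging falls back to a last-resort handler that only prints WARNING and above, so Scrapy's INFO progress messages are dropped. A stdlib-only sketch of that behaviour; the logger name scrapy.core.engine stands in for Scrapy's loggers, and add_console_handler is a hypothetical, much simplified stand-in for the root handler configure_logging() installs:

```python
import contextlib
import io
import logging


def add_console_handler(stream, fmt="%(levelname)s: %(message)s", level=logging.INFO):
    """Attach a formatted handler to the root logger (simplified, hypothetical)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler


def log_progress():
    """Emit an INFO record under a Scrapy-style logger name."""
    logging.getLogger("scrapy.core.engine").info("Spider opened")


if __name__ == "__main__":
    silent = io.StringIO()
    with contextlib.redirect_stderr(silent):
        log_progress()  # no handler yet: INFO is below the WARNING threshold
    print("without handler:", repr(silent.getvalue()))

    shown = io.StringIO()
    handler = add_console_handler(shown)
    log_progress()  # now formatted with LOG_FORMAT-style "%(levelname)s: %(message)s"
    logging.getLogger().removeHandler(handler)
    print("with handler:", repr(shown.getvalue()))
```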
I want to test my Scrapy spider: import it into a test file, make a test spider and override start_urls, but I have a problem importing it. Here is the project structure:
...product-scraper\test_spider.py
...product-scraper\oxygen\oxygen\spiders\oxygen_spider.py
...product-scraper\oxygen\oxygen\items.py
The problem is that the spider imports the Product class from items.py:
from oxygen.items import Product
ImportError: No module named items
Running scrapy crawl oxygen_spider from cmd works.
I tried changing sys.path and using site.addsitedir in every way I could think of, e.g.:
import os
import sys

basedir = os.path.abspath(os.path.dirname(__file__))
module_path = os.path.join(basedir, "oxygen\\oxygen")
sys.path.append(basedir)  # also tried sys.path.append(module_path)
No success :(
I'm using Python 2.7 on Windows.
Do you really get the error "No module named items"? Or is it something like "No module named oxygen.items"?
Also I'm not really sure why you would want to use os.path commands. Wouldn't this just work:
from items import Product
So, without the "oxygen." prefix. As far as I know, however, this only works if Product is a class in your items.py. If it's not a class, I would suggest simply using:
import items
If that does not work, please specify what Product is in your items.py
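Another way to read the sys.path attempt in the question: from oxygen.items import Product needs the directory that contains the inner oxygen package on sys.path, i.e. product-scraper\oxygen, which is presumably where scrapy crawl is run from (assuming that folder holds scrapy.cfg). Neither basedir (product-scraper) nor basedir\oxygen\oxygen (the package itself) is that directory. A minimal sketch under that assumption; add_scrapy_project_to_path is a hypothetical helper:

```python
import os
import sys


def add_scrapy_project_to_path(base_dir, project_dir_name="oxygen"):
    """Put the Scrapy project root on sys.path (hypothetical helper).

    Assumed layout, as in the question:
        base_dir/oxygen/          <- project root (where `scrapy crawl` runs)
        base_dir/oxygen/oxygen/   <- the package that contains items.py
    Python looks for a package named `oxygen` inside each sys.path entry, so only
    the project root makes `from oxygen.items import Product` resolvable.
    """
    project_root = os.path.join(os.path.abspath(base_dir), project_dir_name)
    if project_root not in sys.path:
        sys.path.insert(0, project_root)
    return project_root


# Usage in test_spider.py (which sits in base_dir):
#     add_scrapy_project_to_path(os.path.dirname(os.path.abspath(__file__)))
#     from oxygen.items import Product
```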
I have a very basic spider, following the instructions in the getting started guide, but for some reason, trying to import my items into my spider returns an error. Spider and items code is shown below:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from myProject.items import item

class MyProject(BaseSpider):
    name = "spider"
    allowed_domains = ["website.com"]
    start_urls = [
        "website.com/start"
    ]

    def parse(self, response):
        print response.body
# items.py
from scrapy.item import Item, Field

class ProjectItem(Item):
    title = Field()
When I run this code, Scrapy either can't find my spider or can't import my items file. What's going on here? This should be a really simple example to run, right?
I also ran into this several times while working with Scrapy. You can add this line at the beginning of your Python modules:
from __future__ import absolute_import
More info here:
http://www.python.org/dev/peps/pep-0328/#rationale-for-absolute-imports
http://pythonquirks.blogspot.ru/2010/07/absolutely-relative-import.html
You are importing item, which items.py does not define; you must import a class from items.py,
like from myProject.items import ProjectItem.
I came across this problem the other day and fixed it through trial and error. I couldn't find any documentation on it, so I'm posting it here in case anyone else runs into the same problem.
This isn't so much an issue with Scrapy as an issue with file naming and how Python imports modules. If you name your spider file the same as the project, your imports break. When the spider imports something from the project package, Python 2 (which tries implicit relative imports first) looks in the spider's own directory, finds the spider file with the project's name, and tries to import from that file instead, which fails.
The fix is simply to rename your spider file to something else, and everything will run fine.
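The collision can be reproduced without Scrapy. In Python 3 implicit relative imports are gone, but the same clash happens whenever the spiders folder is searched before the project root, for example when the spider file is run directly as a script (its folder then comes first on sys.path). The sketch below assumes that situation and uses made-up file contents: with a spider file named myProject.py, the name myProject resolves to that file instead of the project package, so myProject.items cannot be found; after renaming the file, the same import succeeds.

```python
import importlib
import os
import sys
import tempfile


def build_project(root, spider_module):
    """Create a toy layout (illustrative names only):
    root/myProject/__init__.py, root/myProject/items.py,
    root/myProject/spiders/<spider_module>.py
    """
    pkg = os.path.join(root, "myProject")
    spiders = os.path.join(pkg, "spiders")
    os.makedirs(spiders)
    files = {
        os.path.join(pkg, "__init__.py"): "",
        os.path.join(pkg, "items.py"): "class ProjectItem(object):\n    pass\n",
        os.path.join(spiders, "__init__.py"): "",
        os.path.join(spiders, spider_module + ".py"): "name = 'spider'\n",
    }
    for path, body in files.items():
        with open(path, "w") as f:
            f.write(body)
    return spiders


def import_items(root, spiders_dir):
    """Import myProject.items with the spiders folder searched before the project root."""
    for mod in [m for m in sys.modules if m.split(".")[0] == "myProject"]:
        del sys.modules[mod]  # forget any earlier attempt
    sys.path[:0] = [spiders_dir, root]
    importlib.invalidate_caches()
    try:
        return importlib.import_module("myProject.items")
    finally:
        del sys.path[:2]


if __name__ == "__main__":
    clash = tempfile.mkdtemp()
    try:
        import_items(clash, build_project(clash, "myProject"))
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        print("spider file named myProject.py:", exc)

    fixed = tempfile.mkdtemp()
    items = import_items(fixed, build_project(fixed, "project_spider"))
    print("renamed spider file:", items.ProjectItem.__name__)
```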
If the structure is like this:
package/
    __init__.py
    subpackage1/
        __init__.py
        moduleX.py
        moduleY.py
    subpackage2/
        __init__.py
        moduleZ.py
    moduleA.py
and you are in moduleX.py, you can import the other modules like this:
from .moduleY import *
from ..moduleA import *
from ..subpackage2.moduleZ import *
Reference: PEP 328, Imports: Multi-Line and Absolute/Relative
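The relative imports for this layout can be checked end to end. This sketch builds the package tree above in a temporary folder (the imported names spam, foo and eggs are placeholders) and runs moduleX twice: as a module with python -m, where its relative imports resolve, and as a plain script, where Python has no parent package to resolve them against and fails with an ImportError about a relative import (on Python 3.6+, "attempted relative import with no known parent package").

```python
import os
import subprocess
import sys
import tempfile

# Module contents for the layout above; spam/foo/eggs are placeholder names.
FILES = {
    "package/__init__.py": "",
    "package/moduleA.py": "foo = 'A'\n",
    "package/subpackage1/__init__.py": "",
    "package/subpackage1/moduleY.py": "spam = 'Y'\n",
    "package/subpackage2/__init__.py": "",
    "package/subpackage2/moduleZ.py": "eggs = 'Z'\n",
    "package/subpackage1/moduleX.py": (
        "from .moduleY import spam\n"
        "from ..moduleA import foo\n"
        "from ..subpackage2.moduleZ import eggs\n"
        "print(spam + foo + eggs)\n"
    ),
}


def build(root):
    """Write every file in FILES below root."""
    for rel, body in FILES.items():
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(body)


def run(root, *args):
    """Run the current interpreter inside root; return (exit code, combined output)."""
    proc = subprocess.run([sys.executable, *args], cwd=root,
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    build(root)
    # Imported as part of the package: the relative imports resolve.
    print(run(root, "-m", "package.subpackage1.moduleX"))
    # Run as a plain script: no parent package, so the first relative import fails.
    print(run(root, os.path.join("package", "subpackage1", "moduleX.py")))
```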