How to import django models in scrapy pipelines.py file - python

I'm trying to import the models of a Django application in my pipelines.py so that I can save data with the Django ORM. I created a Scrapy project, scrapy_project, inside the first Django application involved, "app1" (is that a good choice, by the way?).
I added these lines to my scrapy settings file:
def setup_django_env(path):
    import imp, os
    from django.core.management import setup_environ
    f, filename, desc = imp.find_module('settings', [path])
    project = imp.load_module('settings', f, filename, desc)
    setup_environ(project)
current_dir = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
setup_django_env(os.path.join(current_dir, '../../d_project1'))
When I try to import models of my django application app1 I get this error message:
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 4, in <module>
execute()
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 122, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 76, in _run_print_help
func(*a, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 129, in _run_command
cmd.run(args, opts)
File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 43, in run
spider = self.crawler.spiders.create(spname, **opts.spargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/command.py", line 33, in crawler
self._crawler.configure()
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 41, in configure
self.engine = ExecutionEngine(self, self._spider_closed)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 63, in __init__
self.scraper = Scraper(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
mwcls = load_object(clspath)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 39, in load_object
raise ImportError, "Error loading object '%s': %s" % (path, e)
ImportError: Error loading object 'scrapy_project.pipelines.storage.storage': No module named dydict.models
Why can't Scrapy access the Django application's models (given that app1 is in INSTALLED_APPS)?

In the pipelines you don't import Django models directly; you use Scrapy items bound to a Django model.
You have to add the Django settings in the Scrapy settings file, not afterwards.
To use Django models in a Scrapy project you have to use DjangoItem:
https://github.com/scrapy-plugins/scrapy-djangoitem (make it importable from your PYTHONPATH)
My recommended file structure is:
Projects
    |-DjangoScrapy
        |-DjangoProject
        |     |-Djangoproject
        |     |-DjangoAPP
        |-ScrapyProject
              |-ScrapyProject
                    |-Spiders
Then, in your Scrapy project settings, you have to add the full path of the Django project to the Python path:
# Setting up django's project full path.
import sys
sys.path.insert(0, '/home/PycharmProject/scrap/DjangoProject')
# Setting up django's settings module name.
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'DjangoProject.settings'
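On Django 1.7 and later, setting the path and DJANGO_SETTINGS_MODULE is normally followed by a call to django.setup() before any model is imported. A minimal sketch of that bootstrap, assuming it sits at the top of the Scrapy settings.py; the helper name configure_django and its run_setup flag are made up for illustration, and the path and settings module are the ones used above:

```python
import os
import sys


def configure_django(project_dir, settings_module, run_setup=False):
    """Make a Django project importable and point Django at its settings.

    Hypothetical helper: project_dir is the folder that contains the Django
    project package, settings_module is its dotted settings path.
    """
    project_dir = os.path.abspath(project_dir)
    # Put the project first on sys.path, but only once.
    if project_dir not in sys.path:
        sys.path.insert(0, project_dir)
    # setdefault keeps a value that was already exported in the shell.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    if run_setup:
        # Imported lazily so the helper itself does not require Django.
        import django
        django.setup()  # loads settings and populates the app registry
    return project_dir


# In a real Scrapy settings.py one would pass run_setup=True.
configured_path = configure_django(
    "/home/PycharmProject/scrap/DjangoProject", "DjangoProject.settings"
)
```

With run_setup=True the call has to happen before items.py or pipelines.py import any model, which is why the settings file is a convenient place for it.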
Then in your items.py you can bind your Django models to Scrapy items:
from DjangoProject.models import Person, Job
from scrapy_djangoitem import DjangoItem
# Named PersonItem/JobItem so the items do not shadow the imported models.
class PersonItem(DjangoItem):
    django_model = Person
class JobItem(DjangoItem):
    django_model = Job
Then you can use the .save() method in the pipelines once the spider has yielded an item:
spider.py
from scrapy.spider import BaseSpider
from mybot.items import PersonItem
class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = ['http://www.dmoz.org/World/Espa%C3%B1ol/Artes/Artesan%C3%ADa/']
    def parse(self, response):
        # do stuff
        return PersonItem(name='zartch')
pipelines.py
from myapp.models import Person
class MybotPipeline(object):
    def process_item(self, item, spider):
        # get_or_create returns an (object, created) tuple
        obj, created = Person.objects.get_or_create(name=item['name'])
        # hand the item on to any later pipeline
        return item
I have a repository with minimal working code (you just have to set the path of your Django project in the Scrapy settings):
https://github.com/Zartch/Scrapy-Django-Minimal
In:
https://github.com/Zartch/Scrapy-Django-Minimal/blob/master/mybot/mybot/settings.py
you have to change my Django project path to the path of your own Django project:
sys.path.insert(0, '/home/zartch/PycharmProjects/Scrapy-Django-Minimal/myweb')

Try:
from ..models import MyModel
OR
from ...models import MyModel
Each leading dot stands for a package level: one dot is the current package, and every additional dot moves one level up.
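How the dots resolve can be checked with the standard library's importlib.util.resolve_name. A small sketch, assuming the pipeline module lives in a package called app1.scrapy_project.pipelines (a layout guessed from the question, not confirmed by it):

```python
from importlib.util import resolve_name

# Package that contains the module doing the relative import (assumed layout).
package = "app1.scrapy_project.pipelines"

one_dot = resolve_name(".models", package)       # same package
two_dots = resolve_name("..models", package)     # one level up
three_dots = resolve_name("...models", package)  # two levels up

print(one_dot)
print(two_dots)
print(three_dots)

# If Scrapy loads the project as the top-level package "scrapy_project"
# (as the traceback in the question suggests), three dots would climb above
# the top-level package and the relative import cannot be resolved.
try:
    resolve_name("...models", "scrapy_project.pipelines")
    escaped_top_level = False
except (ImportError, ValueError):  # exception type depends on the Python version
    escaped_top_level = True
```

So the relative form probably only helps when the Django app and the Scrapy project are imported as parts of one package; otherwise the sys.path approach from the previous answer is the likelier fix.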

Related

Circular import with beanie ODM

I need to use a cross-reference in my MongoDB schema. I use beanie as the ODM. Here are my models:
entity.py
from typing import List
from beanie import Document
class Entity(Document):
    path: List["Folder"] = []
folder.py
from entity import Entity
class Folder(Entity):
    pass
init_beanie.py
import beanie
from motor.motor_asyncio import AsyncIOMotorClient
from entity import Entity
from folder import Folder
models = [Entity, Folder]
async def init_beanie():
    client = AsyncIOMotorClient("mongo-uri")
    Entity.update_forward_refs(Folder=Folder)
    await beanie.init_beanie(database=client["mongo-db-name"], document_models=models)
main.py
from fastapi import FastAPI
from init_beanie import init_beanie
my_app = FastAPI()
@my_app.on_event("startup")
async def init():
    await init_beanie()
But when I start my app I get an error:
...
File "pydantic/main.py", line 816, in pydantic.main.BaseModel.update_forward_refs
File "pydantic/typing.py", line 553, in pydantic.typing.update_model_forward_refs
File "pydantic/typing.py", line 519, in pydantic.typing.update_field_forward_refs
File "pydantic/typing.py", line 65, in pydantic.typing.evaluate_forwardref
File "/usr/local/lib/python3.9/typing.py", line 554, in _evaluate
eval(self.__forward_code__, globalns, localns),
File "<string>", line 1, in <module>
NameError: name 'Folder' is not defined
What am I doing wrong?

Django Exception : AppRegistryNotReady("Models aren't loaded yet.")

I found a lot of content on the AppRegistryNotReady exception, but none of it seems definitive. I just wanted my 2 cents of info on the topic.
My django project was working fine. I created a new app, and created the following model. No view, no urls nothing. Just a model.
from __future__ import unicode_literals
from django.db import models
# Create your models here.
from django.conf import settings
from django.core.exceptions import ValidationError
from django.contrib.auth import get_user_model
User = get_user_model()
class File(models.Model):
    path = models.TextField() #The path does not include MEDIA_ROOT, obviously
    filename = models.CharField(max_length=500)
    # file = models.FileField(upload_to=upload_to)
    file = models.FileField(upload_to=path+filename)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, models.PROTECT) #Protects User from being deleted when there are files left
    def clean(self):
        #Check if path has a trailing '/'
        if self.path[-1]!='/':
            self.path = self.path+"/"
        if self.filename[0]=='/':
            self.filename = self.filename[1:]
        #Get the full path
        username = self.user.__dict__[User.USERNAME_FIELD] #Need to do this the roundabout way to make sure that this works with CUSTOM USER MODELS. Else, we could have simply went for self.user.username
        self.path = "tau/"+username+"/"+self.path
    def save(self, *args, **kwargs):
        self.full_clean()
        return super(File, self).save(*args, **kwargs)
    def __str__(self):
        if self.path[-1]=='/':
            text = "\n"+str(self.path)+str(self.filename)
        else:
            text = "\n"+str(self.path)+"/"+str(self.filename)
        return text
Then I tried to makemigrations on the model. And ended up with the following error.
(test) ~/Workspace/WebDevelopment/Django/test/stud$python manage.py makemigrations
Traceback (most recent call last):
File "manage.py", line 22, in <module>
execute_from_command_line(sys.argv)
File "/home/raghuram/Workspace/WebDevelopment/Django/test/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/home/raghuram/Workspace/WebDevelopment/Django/test/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 341, in execute
django.setup()
File "/home/raghuram/Workspace/WebDevelopment/Django/test/local/lib/python2.7/site-packages/django/__init__.py", line 27, in setup
apps.populate(settings.INSTALLED_APPS)
File "/home/raghuram/Workspace/WebDevelopment/Django/test/local/lib/python2.7/site-packages/django/apps/registry.py", line 108, in populate
app_config.import_models(all_models)
File "/home/raghuram/Workspace/WebDevelopment/Django/test/local/lib/python2.7/site-packages/django/apps/config.py", line 199, in import_models
self.models_module = import_module(models_module_name)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/home/raghuram/Workspace/WebDevelopment/Django/test/stud/tau/models.py", line 10, in <module>
User = get_user_model()
File "/home/raghuram/Workspace/WebDevelopment/Django/test/local/lib/python2.7/site-packages/django/contrib/auth/__init__.py", line 163, in get_user_model
return django_apps.get_model(settings.AUTH_USER_MODEL)
File "/home/raghuram/Workspace/WebDevelopment/Django/test/local/lib/python2.7/site-packages/django/apps/registry.py", line 192, in get_model
self.check_models_ready()
File "/home/raghuram/Workspace/WebDevelopment/Django/test/local/lib/python2.7/site-packages/django/apps/registry.py", line 131, in check_models_ready
raise AppRegistryNotReady("Models aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Models aren't loaded yet.
Just to complete the test, I changed my model to this:
class File(models.Model):
    file = models.FileField()
And that stopped the exception. So my guess is that the exception was being raised by this:
from django.contrib.auth import get_user_model
User = get_user_model()
But I need to use that, since I'm working with a custom user model. Any idea how I can make it work?
AbstractBaseUser provides a get_username() method which you can use instead. It practically does the same as what you're doing: return getattr(self, self.USERNAME_FIELD).
class File(models.Model):
    ...
    def clean(self):
        #Check if path has a trailing '/'
        if self.path[-1]!='/':
            self.path = self.path+"/"
        if self.filename[0]=='/':
            self.filename = self.filename[1:]
        #Get the full path
        username = self.user.get_username()
        self.path = "tau/"+username+"/"+self.path
Your original approach failed because get_user_model() is executed when the module is first imported, at which point the app registry is not fully initialized. If you need to use get_user_model() in a models.py file, you should call it within a function or method, not at the module level:
class File(models.Model):
    ...
    def clean(self):
        User = get_user_model()
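The difference between the two placements can be reproduced without Django. The sketch below uses a toy registry; ToyRegistry, RegistryNotReady and this get_user_model are stand-ins written for the illustration, not Django's implementation:

```python
class RegistryNotReady(Exception):
    """Stand-in for django.core.exceptions.AppRegistryNotReady."""


class ToyRegistry:
    """Toy app registry: models can only be looked up once loading is done."""

    def __init__(self):
        self.ready = False
        self._models = {}

    def register(self, name, model):
        self._models[name] = model

    def get_model(self, name):
        if not self.ready:
            raise RegistryNotReady("Models aren't loaded yet.")
        return self._models[name]


registry = ToyRegistry()


def get_user_model():
    return registry.get_model("auth.User")


# 1) Module-level call: runs while models are still being imported.
try:
    User = get_user_model()
    import_time_error = None
except RegistryNotReady as exc:
    import_time_error = str(exc)


# 2) Deferred call: the lookup only happens when the method runs.
class File:
    def clean(self):
        return get_user_model()


class CustomUser:
    pass


registry.register("auth.User", CustomUser)
registry.ready = True  # what the framework does after importing all models

resolved = File().clean()
```

The first lookup fails because it runs before the registry is marked ready; the second succeeds because it is postponed until clean() is called.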

testing Django with standard Unittest: 'DatabaseWrapper' object has no attribute 'Database'

I am trying to test my Django test files separately, one by one, as ./manage.py test freezes for 5secs after each run due to heavy apps.
This is my test (not even a test yet though, just playing with requests):
if __name__ == "__main__":
    import unittest
    # manually get all the django stuff into memory if file is called directly
    import os,sys
    TEST_ROOT = os.path.realpath(os.path.dirname(__file__))
    PROJ_AND_TEST_ROOT = os.path.dirname(os.path.dirname(TEST_ROOT))
    PROJECT_ROOT = os.path.join(PROJ_AND_TEST_ROOT, 'the_game')
    sys.path.append(PROJECT_ROOT)
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "the_game.settings")
    from django.core.wsgi import get_wsgi_application
    application = get_wsgi_application()
from django.test import TestCase, RequestFactory
from django.contrib.auth.models import User
from interface.views import bets
class ViewTests(TestCase):
    def test_bets_view(self):
        request_factory=RequestFactory()
        request=request_factory.get('/')
        user = User.objects.create_user('testname','test@na.me','testname')
        request.user=user
        response=bets(request)
        print response
if __name__ == "__main__":
    unittest.main()
However, running this I get the following:
======================================================================
ERROR: test_bets_view (__main__.ViewTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/test/testcases.py", line 182, in __call__
self._pre_setup()
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/test/testcases.py", line 754, in _pre_setup
self._fixture_setup()
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/test/testcases.py", line 887, in _fixture_setup
if not connections_support_transactions():
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/test/testcases.py", line 874, in connections_support_transactions
for conn in connections.all())
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/test/testcases.py", line 874, in <genexpr>
for conn in connections.all())
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/utils/functional.py", line 55, in __get__
res = instance.__dict__[self.func.__name__] = self.func(instance)
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/db/backends/__init__.py", line 782, in supports_transactions
self.connection.leave_transaction_management()
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/db/backends/__init__.py", line 338, in leave_transaction_management
if managed == self.get_autocommit():
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/db/backends/__init__.py", line 345, in get_autocommit
self.ensure_connection()
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/db/backends/__init__.py", line 133, in ensure_connection
self.connect()
File "/Users/1111/.virtualenvs/the_game/lib/python2.7/site-packages/django/db/utils.py", line 86, in __exit__
db_exc_type = getattr(self.wrapper.Database, dj_exc_type.__name__)
AttributeError: 'DatabaseWrapper' object has no attribute 'Database'
The project itself works perfectly without errors (./manage.py runserver), as do django tests (./manage.py test ../tests).
How can I fix this?
p.s. This is not a duplicate of this question. Its author had problems with standard Django testing, while it works fine for my project. My trouble is with third-party testing.
Unfortunately, what you're doing is not quite enough to initialize django's settings system. You can try this:
from django.core.management import setup_environ
from mysite import settings
setup_environ(settings)
Or you could look into writing a custom management command (which will already have the settings system initialized) that calls the test runner.

Error in scrapy screen scraper - Cant find what's wrong for the life of me

I can't figure out what is causing this error. The error is happening on line 3 of the craig.py file but I don't see any discrepancy.
Folder structure:
Craig (folder)
    Spiders (folder)
        __init__.py
        __init__.pyc
        craig.py
        craig.pyc
    __init__.py
    __init__.pyc
    pipelines.py
    settings.py
    settings.pyc
scrapy.cfg
Project name: Craig
File name: Craig
Spider name: Craig.py
Craig.py
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craig.items import CraigslistSampleItem
class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/sfc/npo/"]
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//p")
        items = []
        for title in titles:
            item = CraigslistSampleItem()
            item["title"] = title.select("a/text()").extract()
            item["link"] = title.select("a/@href").extract()
            items.append(item)
        return items
items.py
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
from scrapy.item import Item, Field
class CraigslistSampleItem(Item):
title = Field()
link = Field()
Here's the error:
Traceback (most recent call last):
File "C:\Python27\Scripts\scrapy-script.py", line 9, in <module>
load_entry_point('scrapy==0.24.4', 'console_scripts', 'scrapy')()
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py"
, line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py"
, line 89, in _run_print_help
func(*a, **kw)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\cmdline.py"
, line 150, in _run_command
cmd.run(args, opts)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\commands\cr
awl.py", line 57, in run
crawler = self.crawler_process.create_crawler()
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\crawler.py"
, line 87, in create_crawler
self.crawlers[name] = Crawler(self.settings)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\crawler.py"
, line 25, in __init__
self.spiders = spman_cls.from_crawler(self)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\spidermanag
er.py", line 35, in from_crawler
sm = cls.from_settings(crawler.settings)
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\spidermanag
er.py", line 31, in from_settings
return cls(settings.getlist('SPIDER_MODULES'))
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\spidermanag
er.py", line 22, in __init__
for module in walk_modules(name):
File "C:\Python27\lib\site-packages\scrapy-0.24.4-py2.7.egg\scrapy\utils\misc.
py", line 68, in walk_modules
submod = import_module(fullpath)
File "C:\Python27\lib\importlib\__init__.py", line 37, in import_module
__import__(name)
File "C:\Users\Turbo\craig\craig\spiders\craig.py", line 3, in <module>
from craig.items import CraigslistSampleItem
ImportError: No module named items
Please show where items.py sits in the project structure.
You should have something like this:
craig (folder)
    craig.py
    items (folder)
        __init__.py
        items.py
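Whether craig.items can be found at all is easy to check with importlib.util.find_spec; note that the folder listing in the question shows no items.py next to settings.py. A rough sketch that builds a throwaway package in a temporary directory; the helper module_exists and the package name craig_demo are invented for this example:

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def module_exists(dotted_name):
    """Return True if Python can locate the module (parent packages are imported)."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of dotted_name cannot be found.
        return False


root = tempfile.mkdtemp()
package_dir = os.path.join(root, "craig_demo")
os.mkdir(package_dir)
open(os.path.join(package_dir, "__init__.py"), "w").close()
sys.path.insert(0, root)
importlib.invalidate_caches()

before = module_exists("craig_demo.items")  # items.py is missing

# Create items.py next to __init__.py, where the import expects it.
with open(os.path.join(package_dir, "items.py"), "w") as handle:
    handle.write("class CraigslistSampleItem(object):\n    pass\n")
importlib.invalidate_caches()

after = module_exists("craig_demo.items")
print(before, after)
```

Running the same check for "craig.items" from the directory that holds scrapy.cfg would show whether the import in craig.py has anything to resolve to.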

Custom python database logger, having circular import

I am trying to create my own log handler to log to db models, which extends logging.Handler
import logging
from logging import Handler
from logger.models import SearchLog
class DBHandler(Handler,object):
    model = None
    def __init__(self, model):
        super(DBHandler, self).__init__()
        mod = __import__(model)
        components = model.split('.')
        for comp in components[1:]:
            mod = getattr(mod, comp)
        self.model = mod
    def emit(self,record):
        log_entry = self.model(level=record.levelname, message=record.msg)
        log_entry.save()
and this is the log config:
'db_search_log':{
    'level': 'INFO',
    'class': 'db_logger.handlers.DBHandler',
    'model': 'db_logger.models.SearchLog',
    'formatter': 'verbose',
}
However, I am getting the following error (see the stack trace):
Traceback (most recent call last):
File "manage.py", line 10, in <module>
execute_from_command_line(sys.argv)
File "/home/james/virtualenv/hydrogen/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 443, in execute_from_command_line
utility.execute()
File "/home/james/virtualenv/hydrogen/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 382, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/james/virtualenv/hydrogen/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 252, in fetch_command
app_name = get_commands()[subcommand]
File "/home/james/virtualenv/hydrogen/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 101, in get_commands
apps = settings.INSTALLED_APPS
File "/home/james/virtualenv/hydrogen/local/lib/python2.7/site-packages/django/utils/functional.py", line 184, in inner
self._setup()
File "/home/james/virtualenv/hydrogen/local/lib/python2.7/site-packages/django/conf/__init__.py", line 42, in _setup
self._wrapped = Settings(settings_module)
File "/home/james/virtualenv/hydrogen/local/lib/python2.7/site-packages/django/conf/__init__.py", line 135, in __init__
logging_config_func(self.LOGGING)
File "/usr/lib/python2.7/logging/config.py", line 777, in dictConfig
dictConfigClass(config).configure()
File "/usr/lib/python2.7/logging/config.py", line 575, in configure
'%r: %s' % (name, e))
ValueError: Unable to configure handler 'db_search_log': Unable to configure handler 'db_search_log': 'module' object has no attribute 'handlers'
db_logger/
    __init__.py
    __init__.pyc
    handlers.py
    handlers.pyc
    log_handlers.pyc
    models.py
    models.pyc
    router.py
    router.pyc
    tests.py
    views.py
Thanks to @istruble for pointing out that this is due to circular imports of settings. How can I avoid it and still log to the database models?
I just came up with another, actually more canonical, way of implementing it using delayed imports; my original problem was trying to import the model inside the __init__ function:
from logging import Handler
class DBHandler(Handler,object):
    model_name = None
    def __init__(self, model=""):
        super(DBHandler,self).__init__()
        self.model_name = model
    def emit(self,record):
        # resolve the model class only when a record is emitted
        try:
            model = self.get_model(self.model_name)
        except Exception:
            from logger.models import GeneralLog as model
        log_entry = model(level=record.levelname, message=self.format(record))
        log_entry.save()
    def get_model(self, name):
        names = name.split('.')
        mod = __import__('.'.join(names[:-1]), fromlist=names[-1:])
        return getattr(mod, names[-1])
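The same delayed lookup can also be written with importlib.import_module, which avoids the fromlist trick. A minimal sketch; resolve_dotted_path is a name chosen here, and collections.OrderedDict merely stands in for a model path such as db_logger.models.SearchLog:

```python
from importlib import import_module


def resolve_dotted_path(dotted_path):
    """Import 'package.module.Attr' and return Attr."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError("expected 'module.attribute', got %r" % dotted_path)
    module = import_module(module_path)
    return getattr(module, attr_name)


# Meant to be called at emit() time in a handler, so nothing is imported
# while the logging configuration is still being processed.
resolved_class = resolve_dotted_path("collections.OrderedDict")
print(resolved_class.__name__)
```

Because the import happens on first use rather than when settings are loaded, the handler module no longer needs to import the models module at the top.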
I found a workaround, and I admit it looks like a hack: it injects the model at the actual logging point, like this:
from logging import Handler
class DBHandler(Handler,object):
    def __init__(self):
        super(DBHandler, self).__init__()
    def emit(self,record):
        model = record.model
        log_entry = model(level=record.levelname, message=record.msg)
        log_entry.save()
and you log to the correct model by doing this:
import logging
from logger.models import TheModel
logger = logging.getLogger(__name__)
logger.info(123, extra={'model': TheModel})
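What extra does can be seen without a database. In this sketch InMemoryModel is a made-up stand-in for a Django model (its save() appends to a list), and the handler simply skips records that carry no model attribute:

```python
import logging


class InMemoryModel:
    """Hypothetical stand-in for a Django model; save() appends to a list."""

    saved = []

    def __init__(self, level, message):
        self.level = level
        self.message = message

    def save(self):
        type(self).saved.append((self.level, self.message))


class DBHandler(logging.Handler):
    def emit(self, record):
        # Keys passed via extra= become attributes on the LogRecord.
        model = getattr(record, "model", None)
        if model is None:
            return  # no model injected for this record
        model(level=record.levelname, message=record.getMessage()).save()


demo_logger = logging.getLogger("db_handler_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(DBHandler())

demo_logger.info("search for %s", "django", extra={"model": InMemoryModel})
demo_logger.info("ignored: no model attached")
print(InMemoryModel.saved)
```

Using getattr with a default keeps the handler from raising AttributeError when some other part of the code logs without passing extra.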
