Python - How to pass a Feedparser object to a Celery task?

I'm using the feedparser module to parse RSS feeds, and I need to pass the parsed feedparser object to a Celery task.
When I try to pass the object, I receive an error saying time.struct_time(tm_year=2015, tm_mon=2, tm_mday=12, tm_hour=8, tm_min=19, tm_sec=11, tm_wday=3, tm_yday=43, tm_isdst=0) is not JSON serializable.
How do I pass the feedparser object to a Celery task?
Here is my code:-
rss_content = feedparser.parse(rss_link)
content_entries = rss_content['entries']
for content in content_entries:
    parse_link.apply_async(args=[content, link, json_id, news_category], queue=news_source)  # celery task
How do I do it?

You need to create a custom encoder and decoder that convert the time.struct_time object into something serializable (a dict), and then register them in the kombu serializer registry as described in the docs, so that Celery can use your new serializer for the task.
import json
import time
import datetime

class FeedContentEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, time.struct_time):
            # Encode a struct_time as a tagged dict holding its epoch timestamp.
            epoch = int(time.mktime(obj))
            return {'__type__': '__time__', 'time': epoch}
        return json.JSONEncoder.default(self, obj)

def decode_feed_content(obj):
    if isinstance(obj, dict) and '__type__' in obj:
        if obj['__type__'] == '__time__':
            # Rebuild the struct_time from the stored epoch timestamp.
            return datetime.datetime.fromtimestamp(obj['time']).timetuple()
    return obj
Next, notify kombu about the new serialization by registering it in the serializer registry.
from kombu.serialization import register

def feed_content_json_dumps(obj):
    return json.dumps(obj, cls=FeedContentEncoder)

def feed_content_json_loads(obj):
    return json.loads(obj, object_hook=decode_feed_content)

register('feedcontentjson',
         feed_content_json_dumps,
         feed_content_json_loads,
         content_type='application/x-feedcontent-json',
         content_encoding='utf-8')
Finally, tell Celery to use the new serializer when dispatching the task, as described in the Celery docs, by passing the serializer parameter when you call the task:
parse_link.apply_async(args=[content, link, json_id, news_category], queue=news_source, serializer='feedcontentjson')
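As a quick sanity check, you can round-trip an entry through the two helper functions directly. This is just an illustrative sketch (the feed URL is made up), not part of the original answer:
# Illustrative round-trip check for the custom serializer (hypothetical feed URL).
import feedparser

entry = feedparser.parse('http://example.com/rss.xml')['entries'][0]
payload = feed_content_json_dumps(dict(entry))   # struct_time fields become {'__type__': '__time__', ...}
restored = feed_content_json_loads(payload)      # ... and are decoded back into struct_time tuples
print(restored.get('published_parsed'))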
Hope this helps.

Related

How to use Airflow ExternalTaskSensor as a SmartSensor?

I'm trying to implement ExternalTaskSensor using SmartSensors, but since it uses execution_date to poke the status of the other DAG, I can't seem to pass it through. If I omit it from my SmartExternalSensor, I get a KeyError for execution_date, since it doesn't exist.
I tried overriding the get_poke_context method:
def get_poke_context(self, context):
    result = super().get_poke_context(context)
    if self.execution_date is None:
        result['execution_date'] = context['execution_date']
    return result
but now it says that the datetime object is not JSON serializable (the serialization happens while registering the sensor as a SmartSensor, via json.dumps) and the sensor runs as a normal sensor. If I pass the string form of that datetime directly, it says that a str object has no isoformat() method, so I know the execution date must be a datetime object.
Do you guys have any idea on how to work around this?
I ran into similar issues trying to use ExternalTaskSensor as a SmartSensor. The code below hasn't been tested extensively, but it seems to work.
import datetime

from airflow.sensors.external_task import ExternalTaskSensor
from airflow.utils.session import provide_session

class SmartExternalTaskSensor(ExternalTaskSensor):
    # Something a bit odd happens with ExternalTaskSensor when run as a smart
    # sensor. ExternalTaskSensor requires execution_date in the poke context,
    # but the smart sensor system passes all poke context values to the
    # constructor of ExternalTaskSensor, which doesn't allow execution_date
    # as an argument. So we accept it (and discard it)...
    def __init__(self, execution_date=None, **kwargs):
        super().__init__(**kwargs)

    def get_poke_context(self, context):
        return {
            'external_dag_id': self.external_dag_id,
            'external_task_id': self.external_task_id,
            'timeout': self.timeout,
            'check_existence': self.check_existence,
            # ... but execution_date has to be manually extracted from the
            # context, and converted to a string, since it will be JSON
            # encoded by the smart sensor system...
            'execution_date': context['execution_date'].isoformat(),
        }

    @provide_session
    def poke(self, context, session=None):
        return super().poke(
            {
                **context,
                # ... and then converted back to a datetime object, since
                # that's what ExternalTaskSensor.poke needs
                'execution_date': datetime.datetime.fromisoformat(
                    context['execution_date']
                ),
            },
            session,
        )
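For reference, a hypothetical usage sketch inside a with DAG(...) block (the dag_id/task_id values below are made up, not from the original answer):
# Hypothetical usage: wait for a task in another DAG (IDs are illustrative).
wait_for_upstream = SmartExternalTaskSensor(
    task_id='wait_for_upstream',
    external_dag_id='upstream_dag',
    external_task_id='upstream_task',
    timeout=3600,
    check_existence=True,
)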

Flask.session doesn't store data

I'm trying to save a serialized BaseQuery object to flask.session['t'], but every time a GET request is sent to this endpoint, session.get('t') is None.
q_transactions is a non-empty list.
Could you help me understand why it behaves that way? Did I miss or misunderstand something?
# app.py
@bp.route('/testing')
def testing():
    import sys
    from flask import session
    from sqlalchemy.ext.serializer import loads, dumps

    form = FiltersForm(request.args)
    if session.get('t') is not None:
        print('|' * 80)
        print(loads(session.get('t')), file=sys.stdout)
        print('|' * 80)
    filters = form.data
    q_transactions = current_user.transactions()
    q_transactions = Transaction.apply_filters(q_transactions, filters)
    session['t'] = dumps(q_transactions)
    return render_template('test_edit_transaction.html', form=form)
EDIT:
The issue is probably that the data is too large: Flask's default session is stored client-side in a cookie, which holds roughly 4 KB at most.
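If cookie size is indeed the limit, one common workaround (a sketch, not from the original thread) is to keep session data server-side, for example with the Flask-Session extension:
# Sketch: server-side sessions via the Flask-Session extension, which avoids
# the cookie size limit by storing session data on the server.
from flask import Flask
from flask_session import Session

app = Flask(__name__)
app.config['SESSION_TYPE'] = 'filesystem'  # or 'redis', 'sqlalchemy', ...
Session(app)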

How to Mock __init__ and resolvers with various arguments

I've seen a lot of examples of mock tests, but none that show how one could mock something like <graphql.execution.base.ResolveInfo object at 0x106f002a8>.
For instance, if I wanted to test that the two methods in this class were working properly, how would I mock the values that are being passed in?
class mySearch(graphene.ObjectType):
    my_search = graphene.Field(
        MySearchWrapper,
        query=graphene.String(description="Search query")
    )

    def __init__(self, args, context, info):
        super(mySearch, self).__init__()

    def resolve_my_search(self, args, context, info):
        return promisify(MySearchWrapper, args, context, info)
The __init__ method receives:
args: {}, context: <Request 'http://localhost:8080/graphql' [POST]>, info: <graphql.execution.base.ResolveInfo object at 0x106f000c8>
And the resolve_my_search method receives:
args: {'page_type': [u'MyCorgi', u'YourCorgi'], 'query': u'Corgi family', 'domain': u'corgidata.com'}, context: <Request 'http://localhost:8080/graphql' [POST]>, info: <graphql.execution.base.ResolveInfo object at 0x106f002a8>
I know I can mock the dictionary value with mock_args.json.return_value, but I'm not sure about the Request and ResolveInfo objects. Any ideas? Guidance? I've already spent a week on this and have found no way out.
Use the unittest.mock package. Here is an example:
from unittest import mock
from graphql.execution.base import ResolveInfo

info = mock.create_autospec(ResolveInfo)
info.context.user = AnonymousUser()  # e.g. Django's django.contrib.auth.models.AnonymousUser
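With the autospec'd info in hand, you can pass it straight into the resolver under test. A minimal sketch against the question's class (the args values and the context mock are hypothetical; mock comes from the import above):
# Hypothetical: exercise the resolver from the question with mocked arguments.
args = {'query': 'Corgi family'}        # plain dict, no mocking needed
context = mock.Mock(name='request')     # stands in for the HTTP Request object
result = mySearch.resolve_my_search(mock.Mock(), args, context, info)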
Here is how I did it. You can tweak it to mock the GraphQLResolveInfo but for this use case I only needed to mock the Request object.
import pytest
from unittest.mock import patch

from starlette.authentication import AuthenticationError
from starlette.requests import Request

@patch("graphql.type.definition.GraphQLResolveInfo.context")
def test_func(context_mock):
    with pytest.raises(AuthenticationError):
        context_mock.side_effect = Request()
        # resolver, info_mock and data_inputs come from the code under test
        func_to_be_tested(resolver("fake_source", info_mock, data_inputs))

How do I turn a MongoDB query into JSON?

posts = []
for p in db.collection.find({"test_set": "abc"}):
    posts.append(p)
thejson = json.dumps({'results': posts})
return HttpResponse(thejson, mimetype="application/javascript")
In my Django/Python code, I can't return JSON from a Mongo query because of the ObjectId. The error says that ObjectId is not serializable.
What do I have to do?
A hacky way would be to loop through:
for p in posts:
    p['_id'] = ""
The json module won't work due to things like the ObjectId.
Luckily PyMongo provides json_util, which ...

... allow[s] for specialized encoding and decoding of BSON documents into Mongo Extended JSON's Strict mode. This lets you encode / decode BSON documents to JSON even when they use special BSON types.
Here is a simple example, using pymongo 2.2.1:
import json

import pymongo
from bson import json_util
from pymongo.errors import ConnectionFailure

if __name__ == '__main__':
    try:
        connection = pymongo.Connection('mongodb://localhost:27017')
        database = connection['mongotest']
    except ConnectionFailure:
        print('Error: Unable to Connect')
        connection = None

    if connection is not None:
        database["test"].insert({'name': 'foo'})
        doc = database["test"].find_one({'name': 'foo'})
        # json_util.default handles BSON types such as ObjectId
        print(json.dumps(doc, sort_keys=True, indent=4, default=json_util.default))
It's pretty easy to write a custom serializer which copes with the ObjectIds. Django already includes one which handles decimals and dates, so you can extend that:
from django.core.serializers.json import DjangoJSONEncoder
from bson import objectid

class MongoAwareEncoder(DjangoJSONEncoder):
    """JSON encoder class that adds support for Mongo objectids."""
    def default(self, o):
        if isinstance(o, objectid.ObjectId):
            return str(o)
        return super(MongoAwareEncoder, self).default(o)
Now you can just tell json to use your custom serializer:
thejson = json.dumps({'results': posts}, cls=MongoAwareEncoder)
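For instance, in the question's view (a hypothetical sketch reusing the encoder above; db/collection are as in the question):
# Hypothetical Django view using MongoAwareEncoder.
import json
from django.http import HttpResponse

def posts_view(request):
    posts = list(db.collection.find({"test_set": "abc"}))
    return HttpResponse(json.dumps({'results': posts}, cls=MongoAwareEncoder),
                        content_type="application/json")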
Something even simpler that works for me on Python 3.6, using
motor==1.1
pymongo==3.4.0
from bson.json_util import dumps, loads

for mongo_doc in await cursor.to_list(length=10):
    # mongo_doc is a dict returned from the async mongo driver (motor / pymongo),
    # the result of executing a simple find() query.
    json_string = dumps(mongo_doc)
    # serialize the dict into a str ...
    back_to_dict = loads(json_string)
    # ... and deserialize the str back into a dict with the original ObjectId type

Importing Model / Lib Class and calling from controller

I'm new to Python and Pylons, although experienced in PHP.
I'm trying to write a model class that will act as the data access layer for my database (CouchDB). My problem is simple.
My model looks like this and is called models/BlogModel.py
from couchdb import *

class BlogModel:
    def getTitles(self):
        # code to get titles here
        pass

    def saveTitle(self):
        # code to save title here
        pass
My controller is called controllers/main.py
import logging

from pylons import request, response, session, tmpl_context as c
from pylons.controllers.util import abort, redirect_to
from billion.lib.base import BaseController, render
from billion.model import BlogModel

log = logging.getLogger(__name__)

class MainController(BaseController):
    def index(self):
        return render('/main.mako')
In my index action, how do I access the getTitles() method of BlogModel?
I've tried:
x = BlogModel()
x.getTitles()
but I get:
TypeError: 'module' object is not callable
Also, BlogModel.getTitles() results in:
AttributeError: 'module' object has no attribute 'getTitles'
Is this down to the way I'm importing the class? Can someone tell me the best way to do this?
Thanks.
x = BlogModel.BlogModel()
Or, more verbosely:
After the import, you have an object in your namespace called BlogModel. That object is the BlogModel module. (The module name comes from the filename.) Inside that module there is a class object called BlogModel, which is what you were after. (The class name comes from the source code you wrote.)
Instead of:
from billion.model import BlogModel
You could use:
from billion.model.BlogModel import BlogModel
then your
x = BlogModel()
would work.
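For example, with the second import style the index action could look like this (a sketch; it assumes getTitles returns a list of titles, which the question leaves unspecified):
# Sketch: calling the model from the controller, using the second import style.
from billion.model.BlogModel import BlogModel

class MainController(BaseController):
    def index(self):
        c.titles = BlogModel().getTitles()  # instantiate the class, then call its method
        return render('/main.mako')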
