Items remain in the lists even after using del statement - python

I'm building an app using Python and Django and I'm facing a problem with some variables.
Below are simplified portions from my code to explain the problem (you can access the complete code in the repo):
# resources.py
class HREmployeeResource(ModelResource):
    def after_import_row(self, row, row_result, row_number=None, **kwargs):
        row_result.employee_code = row.get('Employee Code')
        if not kwargs.get('dry_run'):
            import_type = row_result.import_type
            employee_code = row_result.employee_code
            instance = kwargs.get('instance')
            # we only need the ones belonging to these 2 companies in the report
            if instance.company in ['Company 1', 'Company 2']:
                self.report.setdefault(import_type, []).append(employee_code)
# views.py
import gc

from resources import HREmployeeResource

def import_hr_file(request):
    if request.method == 'POST':
        hr_file = request.FILES['hr-file']
        data = get_import_data(hr_file)
        hr_resource = HREmployeeResource()
        result = hr_resource.import_data(data, dry_run=True)
        if not result.has_errors():
            result = hr_resource.import_data(data, dry_run=False)
            ImportStatus.objects.create(
                date=timezone.now(),
                status='S1',
                data=hr_resource.report
            )
        del result, hr_resource, data
        gc.collect()
        return redirect('core:import-report')
    return render(request, 'core/upload-hr-file.html', {})
The ModelResource class is from a third-party library called django-import-export, and the import_data method belongs to this class; you can find the code of the method here.
When I run the import_hr_file view for the first time everything works fine, but when I execute it a second time I find that the old items still exist in the lists of the report dict of the HREmployeeResource instance.
Originally I used these two lines (below) to free up the memory after the import process:
del result, hr_resource, data
gc.collect()
But the memory is not freed, and now I have found this problem as well (the old items of the previous run remain every time I run the import_hr_file view).
So, how can I fix this?
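For context, the classic way to reproduce this symptom is mutable state that outlives the instance. A minimal hypothetical sketch (assuming report is a class attribute rather than set in __init__, which the repo would need to confirm):
class HREmployeeResource:  # simplified stand-in for the real resource
    report = {}  # class attribute: one dict shared by ALL instances

    def add(self, import_type, employee_code):
        self.report.setdefault(import_type, []).append(employee_code)

first = HREmployeeResource()
first.add('new', 'E001')
del first  # deletes the instance; the class attribute is untouched

second = HREmployeeResource()
print(second.report)  # {'new': ['E001']} -- the old items are still there

# The usual fix is per-instance state, e.g. setting self.report = {} in __init__.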

Related

Create activities for a user with Stream-Framework

I'm trying to set up stream-framework (the one here, not the newer getstream). I've set up the Redis server and the environment properly; the issue I'm facing is in creating activities for a user.
I've been trying to create activities, following the documentation to add an activity, but it gives me an error message as follows:
...
File "/Users/.../stream_framework/activity.py", line 110, in serialization_id
if self.object_id >= 10 ** 10 or self.verb.id >= 10 ** 3:
AttributeError: 'int' object has no attribute 'id'
Here is the code:
from stream_framework.activity import Activity
from stream_framework.feeds.redis import RedisFeed

class PinFeed(RedisFeed):
    key_format = 'feed:normal:%(user_id)s'

class UserPinFeed(PinFeed):
    key_format = 'feed:user:%(user_id)s'

feed = UserPinFeed(13)
print(feed)

activity = Activity(
    actor=13,  # Thierry's user id
    verb=1,    # The id associated with the Pin verb
    object=1,  # The id of the newly created Pin object
)

feed.add(activity)  # Error at this line
I think there is something missing in the documentation or maybe I'm doing something wrong. I'll be very grateful if anyone helps me get the stream framework working properly.
The documentation is inconsistent. The verb you pass to the activity should be (an instance of?*) a subclass of stream_framework.verbs.base.Verb. Check out this documentation page on custom verbs and the tests for this class.
The following should fix the error you posted:
from stream_framework.activity import Activity
from stream_framework.feeds.redis import RedisFeed
from stream_framework.verbs import register
from stream_framework.verbs.base import Verb

class PinFeed(RedisFeed):
    key_format = 'feed:normal:%(user_id)s'

class UserPinFeed(PinFeed):
    key_format = 'feed:user:%(user_id)s'

class Pin(Verb):
    id = 5
    infinitive = 'pin'
    past_tense = 'pinned'

register(Pin)

feed = UserPinFeed(13)
activity = Activity(
    actor=13,
    verb=Pin,
    object=1,
)
feed.add(activity)
I quickly looked over the code for Activity and it looks like passing ints for actor and object should work. However, it is possible that these parameters are also outdated in the documentation.
* The tests pass classes as the verb. However, the Verb base class has the methods serialize and __str__, which can only be meaningfully invoked on an instance of the class. So I'm still unsure which is required here. It seems that, in the current state, the framework never calls these methods, so classes still work, but I feel the author originally intended instances to be passed.
With the help of the great answer by @He3lixxx, I was able to solve it partially. As the package is no longer maintained, it pulls in the latest Redis client for Python, which was causing too many issues; installing redis-2.10.5 when using stream-framework-1.3.7 should fix them.
I would also like to add a complete guide to properly add activity to a user feed.
Key points:
If you are not using a feed manager, make sure to first insert the activity with the feed.insert_activity(activity) method before adding it to the user's feed.
If getting feeds with feed[:] throws an error like the one below:
File "/Users/.../stream_framework/activity.py", line 44, in get_hydrated
activity = activities[int(self.serialization_id)]
KeyError: 16223026351730000000001005L
then you need to clear the data for that user, using the key format for it. In my case the key is feed:user:13 for user 13; delete it with DEL feed:user:13. If that doesn't fix the issue, you can FLUSHALL, which will delete everything from Redis.
Sample code:
from stream_framework.activity import Activity
from stream_framework.feeds.redis import RedisFeed
from stream_framework.verbs import register
from stream_framework.verbs.base import Verb

class PinFeed(RedisFeed):
    key_format = 'feed:normal:%(user_id)s'

class UserPinFeed(PinFeed):
    key_format = 'feed:user:%(user_id)s'

class Pin(Verb):
    id = 5
    infinitive = 'pin'
    past_tense = 'pinned'

register(Pin)

feed = UserPinFeed(13)
print(feed[:])

activity = Activity(
    actor=13,
    verb=Pin,
    object=1)

feed.insert_activity(activity)
activity_id = feed.add(activity)
print(activity_id)
print(feed[:])

refactoring a function to have a robust design

I have a simple app example here.
Say I have this piece of code, which handles requests from a user to get a list of books stored in a database:
from .handlers import all_books

@apps.route('/show/all', methods=['GET'])
@jwt_required
def show_books():
    user_name = get_jwt_identity()['user_name']
    all_books(user_name=user_name)
and in handlers.py I have:
def all_books(user_name):
    db = get_db('books')
    books = []
    for book in db.books.find():
        books.append(book)
    return books
But while writing unit tests I realised that if I use get_db() inside all_books(), it would be harder to unit test the method.
So I thought this would be the better way:
from .handlers import all_books

@apps.route('/show/all', methods=['GET'])
@jwt_required
def show_books():
    user_name = get_jwt_identity()['user_name']
    db = get_db('books')
    collection = db.books
    all_books(collection=collection)

def all_books(collection):
    books = []
    for book in collection.find():
        books.append(book)
    return books
I want to know which design is better: having all the code doing one thing in one place, like the first example, or the second example?
To me the first one seems clearer, as it has all the related logic in one place, but it's easier to pass a fake collection in the second case to unit test it.
You should probably use the mock library; see the quick guide: https://docs.python.org/3/library/unittest.mock.html#quick-guide
(if you use Python 2 you will need to pip install mock)
from unittest.mock import Mock, patch

def test_it():
    # patch get_db where it is looked up; the 'handlers' module path is an
    # assumption -- adjust it to wherever get_db lives in your project
    with patch('handlers.get_db', Mock(return_value=Mock(books=[1, 2, 3]))) as mocked_db:
        from handlers import get_db
        x = get_db("ASDASD")
        print(x.books)
        # you can also do cool stuff like this
        mocked_db.assert_called_with("ASDASD")
Of course, for yours you will have to construct a slightly more complex object:
my_mocked_get_db = Mock(return_value=Mock(books=Mock(find=Mock(return_value=[1, 2, 3, 4]))))

with patch('handlers.get_db', my_mocked_get_db):
    from handlers import get_db
    x = get_db("ASDASD")
    print(x.books.find())
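Note that with the second design from the question, all_books(collection) needs no patching at all; any fake object with a find method can stand in for the collection. A minimal sketch (the handlers import path is an assumption):
from unittest.mock import Mock
from handlers import all_books  # assumed module path

def test_all_books_with_fake_collection():
    fake_collection = Mock()
    fake_collection.find.return_value = [{'title': 'Book A'}, {'title': 'Book B'}]
    # all_books should simply return whatever the collection's find() yields
    assert all_books(collection=fake_collection) == [{'title': 'Book A'}, {'title': 'Book B'}]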

Python: import of Modules with mutual dependency

I have two files, rest_api.py and Contact.py. Contact is similar to a domain object (it contains the Contact class), while rest_api has functions for setting up the application.
In rest_api I have the following lines:
from Contact import Contact
...
client = MongoClient('localhost', 27017)
collection = client.crypto_database.test_collection

def dbcollection():
    return collection
...
api.add_resource(Contact, '/contact/<string:contact_id>')
In Contact I try to do the following:
from rest_api import dbcollection

class Contact(Resource):
    def get(self, contact_id):
        result = {}
        result['data'] = dbcollection().find_one({'contact_id': contact_id})
        result['code'] = 200 if result['data'] else 404
        return make_response(dumps(result), result['code'], {"Content-type": "application/json"})
This fails with the following error:
ImportError: cannot import name Contact
What is the correct way of importing Contact, so that it can also use variables/functions from rest_api?
P.S. If I move the collection code to a different file and import that file instead, things work, but I assume there is some other way..
This is a circular import dependency, which cannot be solved as such. The problem is that importing a Python module actually runs its code, and this has to happen in some order: one of the modules has to go first.
I would say that having the support code in a different file would be the proper way to go.
In this case, however, dbcollection is not actually needed at import time. Thus you can solve this by moving the import from the module level into the get function. For example:
class Contact(Resource):
    def get(self, contact_id):
        from rest_api import dbcollection
        result = {}
        result['data'] = dbcollection().find_one({'contact_id': contact_id})
        result['code'] = 200 if result['data'] else 404
        return make_response(dumps(result), result['code'], {"Content-type": "application/json"})
A similar approach would be the following:
import rest_api

class Contact(Resource):
    def get(self, contact_id):
        result = {}
        result['data'] = rest_api.dbcollection().find_one({'contact_id': contact_id})
        result['code'] = 200 if result['data'] else 404
        return make_response(dumps(result), result['code'], {"Content-type": "application/json"})
This should work, as Python makes some effort to resolve circular import dependencies: when it starts importing a module, it creates an empty module object for it in sys.modules. When it finds a nested import, it proceeds with that one; if that in turn imports a module that is already in the import process, it just skips it. Thus, at the time Contact.py is being loaded, import rest_api just takes the module object that is already there. Since that object does not contain dbcollection yet, from rest_api import dbcollection fails; a plain import rest_api does work, however, since its members are only accessed after Contact.py finishes importing (unless you call Contact.get at module level from within).
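To make this concrete, here is a minimal standalone sketch (hypothetical modules a.py and b.py, not from the question) of why a plain import survives a cycle while a from-import does not:
# a.py
import b               # b starts loading here, before greet below exists

def greet():
    return "hello from a"

# b.py
import a               # OK: a is already in sys.modules, half-initialised
# from a import greet  # would raise ImportError: greet is not defined yet

def call_a():
    return a.greet()   # fine: the attribute is only looked up at call time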
Assuming you haven't a clue which modules are going to import which other ones, you can track that yourself and NOT do the import.
In your __init__.py, define these --
__module_imports__ = {}

def requires_module(name):
    return name not in __module_imports__

def importing_module(name):
    __module_imports__[name] = True
Then, at the top of each file where you define your classes, add the following (shown here for this_module.py):
from my_modules import requires_module, importing_module

importing_module('ThisModule')

if requires_module('ThatModule'):
    from my_modules.that_module import ThatModule

class ThisModule:
    """ Real Stuff Goes Here """
    pass
and this to that_module.py:
from my_modules import requires_module, importing_module

importing_module('ThatModule')

if requires_module('ThisModule'):
    from my_modules.this_module import ThisModule

class ThatModule:
    """ Real Stuff Goes Here """
    pass
Now you get your imports regardless of which module gets imported first.

memcache.Client not setting cache values on GAE python

I am trying to add memcache to my webapp deployed on GAE, and to do this I am using memcache.Client() to prevent damage from any race conditions:
from google.appengine.api import memcache

client = memcache.Client()

class BlogFront(BlogHandler):
    def get(self):
        global client
        val = client.gets(FRONT_PAGE_KEY)
        posts = list()
        if val is not None:
            posts = list(val)
        else:
            posts = db.GqlQuery("select * from Post order by created desc limit 10")
            client.cas(FRONT_PAGE_KEY, list(posts))
        self.render('front.html', posts=posts)
To test the problem I have a front page for a blog that displays the 10 most recent entries. If there is nothing in the cache, I hit the DB with a request, otherwise I just present the cached results to the user.
The problem is that no matter what I do, I always get val == None, which means that I always hit the database with a useless request.
I have sifted through the documentation:
https://developers.google.com/appengine/docs/python/memcache/
https://developers.google.com/appengine/docs/python/memcache/clientclass
http://neopythonic.blogspot.pt/2011/08/compare-and-set-in-memcache.html
And it appears that I am doing everything correctly. What am I missing?
(PS: I am a Python newb; if this is a silly error, please bear with me xD)
from google.appengine.api import memcache

class BlogFront(BlogHandler):
    def get(self):
        client = memcache.Client()
        client.gets(FRONT_PAGE_KEY)
        client.cas(FRONT_PAGE_KEY, 'my content')
For a reason I cannot yet understand, the solution lies in having a gets call right before the cas call...
I think I will stick with the non-thread-safe version of the memcache code for now...
I suspect that the client.cas call is failing because there is no object. Perhaps client.cas only works to update an existing object (not to set a new object if there is none currently)? You might try client.add() (which will fail if an object already exists with the specified key, which I think is what you want?) instead of client.cas().
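A minimal sketch of how that suggestion could look in the original handler, written against the documented App Engine memcache Client API (add/gets/cas) but untested:
from google.appengine.api import memcache

client = memcache.Client()

def cache_front_page(posts):
    # add() only succeeds when the key is absent; cas() only succeeds after
    # a gets() on this client has recorded a CAS id for the key.
    if not client.add(FRONT_PAGE_KEY, posts):
        client.gets(FRONT_PAGE_KEY)        # fetch current value and CAS id
        client.cas(FRONT_PAGE_KEY, posts)  # may still fail on a concurrent update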

django list not destroyed between views

I am creating a list from some of the model data, but I am not doing it correctly. It works, but when I refresh the page in the browser, reportResults just gets added to. I hoped it would get garbage collected between requests, but obviously I am doing something wrong. Any ideas, anyone?
Thanks,
Ewan
reportResults = []  # the list that doesn't get collected

def addReportResult(fix, description):
    fix.description = description
    reportResults.append(fix)

def unitHistory(request, unitid, syear, smonth, sday, shour, fyear, fmonth, fday, fhour, type=None):
    waypoints = Fixes.objects.filter(name=(unitid))
    waypoints = waypoints.filter(gpstime__range=(awareStartTime, awareEndTime)).order_by('gpstime')[:1000]
    if waypoints:
        for index in range(len(waypoints)):
            # ...do stuff here selecting some waypoints and generating "description" text
            addReportResult(waypointsindex, description)  # append the list with this, adding a text description
    return render_to_response('unitHistory.html', {'fixes': reportResults})
You are reusing the same list each time; to fix it, you need to restructure your code to create a new list on every request. This can be done in multiple ways, and this is one such way:
def addReportResult(reportResults, fix, description):
    fix.description = description
    reportResults.append(fix)

def unitHistory(request, unitid, syear, smonth, sday, shour, fyear, fmonth, fday, fhour, type=None):
    reportResults = []  # Here we create our local list that is recreated each request.
    waypoints = Fixes.objects.filter(name=(unitid))
    waypoints = waypoints.filter(gpstime__range=(awareStartTime, awareEndTime)).order_by('gpstime')[:1000]
    if waypoints:
        for index in range(len(waypoints)):
            # Do processing
            addReportResult(reportResults, waypointsindex, description)  # We pass the list to the function so it can use it.
    return render_to_response('unitHistory.html', {'fixes': reportResults})
If addReportResult stays small, you could also inline the description attribute assignment by removing the call to addReportResult altogether and doing waypointsindex.description = description at the same position, as sketched below.
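A sketch of that inlined variant, reusing the placeholder names from the question:
for index in range(len(waypoints)):
    # ...generate description...
    waypointsindex.description = description
    reportResults.append(waypointsindex)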
Just so you're aware of the request life cycle: mod_wsgi will keep a process open to service multiple requests. That process gets recycled every so often, but it is definitely not bound to a single request as you've assumed.
That means you need a local list. I would suggest moving the addReportResult function contents directly in-line, but that's not a great idea if it needs to be reusable or if the function is too long. Instead I'd make that function return the item, and you can collect the results locally.
def create_report(fix, description):  # I've changed the name to snake_casing
    fix.description = description
    return fix

def unit_history(request, unitid, syear, smonth, sday, shour, fyear, fmonth, fday, fhour, type=None):
    reports = []
    waypoints = Fixes.objects.filter(name=(unitid))
    waypoints = waypoints.filter(gpstime__range=(awareStartTime, awareEndTime)).order_by('gpstime')[:1000]
    if waypoints:
        for index in range(len(waypoints)):
            report = create_report(waypointsindex, description)
            reports.append(report)
    return render_to_response('unitHistory.html', {'fixes': reports})
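As an aside, here is a tiny hypothetical view (not from the question) that makes the process-reuse point easy to observe: the module-level counter keeps growing across requests served by the same worker process.
from django.http import HttpResponse

REQUEST_COUNT = 0  # module-level state survives between requests

def counter_view(request):
    global REQUEST_COUNT
    REQUEST_COUNT += 1  # keeps increasing until the worker process is recycled
    return HttpResponse(str(REQUEST_COUNT))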
