In my Django project, I have a situation where different views communicate via the request's session data, as follows:
def view_one(request):
    ...
    request.session['my_session_key'] = data
    ...

def view_two(request):
    ...
    data = request.session['my_session_key']
    ...
However, this has the following problems:
The key string my_session_key is not a constant, so it will be error-prone to scale this up once other parts of the code start writing to and/or reading from it.
As the system grows, it will become harder to identify which keys are being read from and written to in the session.
In Kotlin (which I'm more familiar with), one way to solve this would be to use an extension property, like so:
var Request.my_session_key: Int
    get() = this.session["my_session_key"]
    set(value) { this.session["my_session_key"] = value }
This way, I could now write my views as follows:
def view_one(request):
    ...
    request.my_session_key = data
    ...

def view_two(request):
    ...
    data = request.my_session_key
    ...
Is there any way to accomplish something similar in Python or with Django? Or, alternatively, taking a step back, what would be the best way to organize the data stored in a Django session across multiple views?
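Python has no extension properties, but one way to get similar ergonomics (a minimal sketch, not an established Django idiom; all names are illustrative) is to keep the raw key string in exactly one module and wrap request.session in a small class that exposes each known key as a property:

# session_data.py

MY_SESSION_KEY = 'my_session_key'  # the only place the raw string appears

class SessionData:
    """Typed, discoverable access to the session keys this project uses."""

    def __init__(self, session):
        self._session = session

    @property
    def my_session_key(self):
        return self._session.get(MY_SESSION_KEY)

    @my_session_key.setter
    def my_session_key(self, value):
        self._session[MY_SESSION_KEY] = value

# usage in views:
def view_one(request):
    SessionData(request.session).my_session_key = data

def view_two(request):
    data = SessionData(request.session).my_session_key

With this, every key the project uses is declared in one place, and adding a new key means adding one constant and one property.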
I'm using Notion to store my data (client name, subscription, ...) in tables.
I want to extract some data using Python, but I can't figure out how.
For example, counting the total number of clients or getting the total amount of subscriptions.
Could you please suggest a way to do this?
If you need to do this only once, you can export a Notion page (database) to HTML, which will probably be easier to extract from.
If you want this as a weekly/daily/monthly thing, I can't help with doing that in Python, but Zapier and Automate.io would be perfect.
For fetching the data you can use notion-client. The great thing about it is that it supports both sync and async interfaces. What it lacks, though, is an easy way to navigate the data's structure (which is quite complicated in Notion)
For that you can use basic-notion. It allows you to use model classes to easily access all the properties and attributes of your Notion objects - kind of like you would with an ORM.
In your case the code might look something like this:
from notion_client import Client
from basic_notion.query import Query
from basic_notion.page import NotionPage, NotionPageList
from basic_notion.field import SelectField, TitleField, NumberField

# First define models
class MyRow(NotionPage):
    name = TitleField(property_name='Name')
    subscription = SelectField(property_name='Subscription')
    some_number = NumberField(property_name='Some Number')
    # ... your other fields go here
    # See your database's schema and the available field classes
    # in basic_notion.field to define this correctly.

class MyData(NotionPageList[MyRow]):
    ITEM_CLS = MyRow

# You need to create an integration and get an API token from Notion:
NOTION_TOKEN = '<your-notion-api-token>'
DATABASE_ID = '<your-database-ID>'

# Now you can fetch the data
def get_data(database_id: str) -> MyData:
    client = Client(auth=NOTION_TOKEN)
    data = client.databases.query(
        **Query(database_id=database_id).filter(
            # Some filter here
            MyRow.name.filter.starts_with('John')
        ).sorts(
            # You can sort it here
            MyRow.name.sort.ascending
        ).serialize()
    )
    return MyData(data=data)

my_data = get_data(DATABASE_ID)
for row in my_data.items():
    print(f'{row.name.get_text()} - {row.some_number.number}')
    # Do whatever else you may need to do
For more info, examples and docs see:
notion-client: https://github.com/ramnes/notion-sdk-py
basic-notion: https://github.com/altvod/basic-notion
Notion API Reference: https://developers.notion.com/reference/intro
To provide a bit of context, I am building a risk model that pulls data from various different sources. Initially I wrote the model as a single function that, when executed, read in the different data sources as pandas.DataFrame objects and used those objects when necessary. As the model grew in complexity, it quickly became unreadable and I found myself copying and pasting blocks of code often.
To cleanup the code I decided to make a class that when initialized reads, cleans and parses the data. Initialization takes about a minute to run and builds my model in its entirety.
The class also has some additional functionality. There is a generate_email method that sends an email with details about high risk factors, and another method, append_history, that snapshots the risk model at a point in time and saves it so I can run time comparisons.
The thing about these two additional methods is that I cannot imagine a scenario where I would call them without first re-calibrating my risk model. So I have considered calling them in __init__() like my other methods. I haven't, only because I am trying to justify having a class in the first place.
I am consulting this community because my project structure feels clunky and awkward. I am inclined to believe that I should not be using a class at all. Is it frowned upon to create classes merely for the purpose of organization? Also, is it bad practice to call instance methods (that take upwards of a minute to run) within __init__()?
Ultimately, I am looking for reassurance or a better code structure. Any help would be greatly appreciated.
Here is some pseudo code showing my project structure:
import pandas as pd

class RiskModel:
    def __init__(self, data_path_a, data_path_b):
        self.data_path_a = data_path_a
        self.data_path_b = data_path_b

        self.historical_data = None
        self.raw_data = None
        self.lookup_table = None
        self._read_in_data()

        self.risk_breakdown = None
        self._generate_risk_breakdown()

        self.risk_summary = None
        self._generate_risk_summary()

    def _read_in_data(self):
        # read in a .csv
        self.historical_data = pd.read_csv(self.data_path_a)

        # read an excel file containing many sheets into an ordered dictionary
        self.raw_data = pd.read_excel(self.data_path_b, sheet_name=None)

        # store a specific sheet from the excel file that is used by most of
        # my class's methods
        self.lookup_table = self.raw_data["Lookup"]

    def _generate_risk_breakdown(self):
        '''
        A function that creates a DataFrame from self.historical_data,
        self.raw_data, and self.lookup_table and stores it in
        self.risk_breakdown
        '''
        self.risk_breakdown = some_dataframe

    def _generate_risk_summary(self):
        '''
        A function that creates a DataFrame from self.lookup_table and
        self.risk_breakdown and stores it in self.risk_summary
        '''
        self.risk_summary = some_dataframe

    def generate_email(self, recipient):
        '''
        A function that sends an email with details about high risk factors
        '''

if __name__ == "__main__":
    risk_model = RiskModel(data_path_a, data_path_b)
    risk_model.generate_email("recipient@generic.com")
In my opinion it is a good way to organize your project, especially since you mentioned the high rate of re-usability of parts of the code.
One thing though: I wouldn't put the _read_in_data, _generate_risk_breakdown and _generate_risk_summary methods inside __init__, but would instead let the user call these methods after initializing the RiskModel instance.
This way the user can read in data from a different path, or generate only the risk breakdown or summary, without reading in the data again.
Something like this:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
my_risk_model.generate_risk_breakdown(parameters)
my_risk_model.generate_risk_summary(other_parameters)
If there is a risk of the user calling these methods in an order that would break the logical chain, you could raise an exception if generate_risk_breakdown or generate_risk_summary is called before read_in_data (see the sketch below). Of course, you could also move only the generate... methods out, leaving the data import inside __init__.
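A minimal sketch of such a guard, reusing the illustrative names from the pseudo code above:

import pandas as pd

class RiskModel:
    def __init__(self):
        self.historical_data = None  # filled in by read_in_data

    def read_in_data(self, data_path_a, data_path_b):
        self.historical_data = pd.read_csv(data_path_a)
        # ... read the remaining sources here ...

    def generate_risk_breakdown(self, parameters):
        if self.historical_data is None:
            raise RuntimeError("Call read_in_data() before generate_risk_breakdown().")
        # ... build the breakdown from the loaded data ...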
To advocate more for exposing the generate... methods outside of __init__, consider a scenario where you would like to generate multiple risk summaries, changing various parameters. It would make sense not to create the RiskModel and read the same data every time, but instead to change the input to the generate_risk_summary method:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
for parameter in [50, 60, 80]:
    my_risk_model.generate_risk_summary(parameter)
    my_risk_model.generate_email('test@gmail.com')
I am using GAE with python, and I am using many forms. Usually, my code looks something like this:
class Handler(BaseHandler):
    #...
    def post(self):
        name = self.request.get("name")
        last_name = self.request.get("last_name")
        # More variables...
        n = self.request.get("n")

        # Do something with the variables, validations, etc.

        # Add them to a dictionary
        data = dict(name=name, last_name=last_name, n=n)
        info = testdb.Test(**data)
        info.put()
I have noticed lately that it gets too long when there are many inputs in the form (variables), so I thought maybe I could send a stringified JSON object (which can be treated as a python dictionary using json.loads). Right now it looks like this:
class Handler(BaseHandler):
    #...
    def post(self):
        data = validate_dict(json.loads(self.request.body))
        # Use a variable like this: data['last_name']
        test = testdb.Test(**data)
        test.put()
Which is a lot shorter. I am inclined to do things this way (and stop using self.request.get("something")), but I am worried I may be missing some disadvantage of doing this apart from the client needing javascript for it to even work. Is it OK to do this or is there something I should consider before rearranging my code?
There is absolutely nothing wrong with your short JSON-focused code variant (few web apps today bother supporting clients w/o Javascript anyway:-).
You'll just need to adapt the client-side code preparing that POST, from being just a traditional HTML form, to a JS-richer approach, of course. But, I'm pretty sure you're aware of that -- just spelling it out!-)
BTW, there is nothing here that's App Engine-specific: the same considerations would apply no matter how you chose to deploy your server.
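One server-side point worth spelling out: since the parsed JSON is unpacked straight into testdb.Test(**data), the validate_dict step is a good place to whitelist the keys you expect. The question only mentions validate_dict by name; a hypothetical sketch of such a check (field names are illustrative):

ALLOWED_FIELDS = {"name", "last_name", "n"}  # fields your model actually accepts

def validate_dict(data):
    """Keep only expected keys; reject anything else before **-unpacking."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unexpected = set(data) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError("unexpected fields: %s" % ", ".join(sorted(unexpected)))
    return data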
I've written a program in Python which works with two distinct APIs to get data from two different services (CKAN and MediaWiki).
In particular, there is a class Resource, which requests the data from the above-mentioned services and processes it.
At some point I've come to the conclusion that my app needs tests.
The problem is that all the examples I've found on the web and in books do not deal with such cases.
For example, inside Resource class I've got a method:
def load_from_ckan(self):
    """
    Get the resource specified by self.id
    from config.ckan_api_url
    """
    data = json.dumps({'id': self.id})
    headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
    url = config.ckan_api_url + '/action/resource_show'
    r = requests.post(url, timeout=config.ckan_request_timeout, data=data, headers=headers)
    assert r.ok, r
    resource = json.loads(r.content)
    resource = resource["result"]
    for key in resource:
        setattr(self, key, resource[key])
The load_from_ckan method gets the data about a resource from the CKAN API and assigns it to the object. It is simple, but...
My question is: how do I test methods like this? Or, what should I test here?
I thought about the possibility of pickling (saving) results to disk. Then I could load them in the test and compare with an object initialized with load_from_ckan(). But CKAN is a community-driven platform, so the behavior of such tests would be unpredictable.
If there are any books on the philosophy of automated testing (what to test, what not to test, how to make tests meaningful, etc.), please give me a link.
With any testing, the key question really is - what could go wrong?
In your case, it looks like the three risks are:
The web API in question could stop working. But you check for this already, with assert r.ok.
You, or someone else, could make a mistaken change to the code in future (e.g. mistyping a variable) which breaks it.
The API could change, so that it no longer returns the fields or the format you need.
It feels like you could write a fairly simple test for the latter two, depending on what data from this API you actually rely on: for example, if you're expecting the JSON to have a field called "temperature" which is a floating-point Celsius number, you could write a test which calls your function, then checks that self.temperature is an instance of 'float' and is within a sensible range of values (-30 to 50?). That should leave you confident that both the API and your function are working as designed.
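A sketch of such a test, using the illustrative temperature field and range from the paragraph above (how Resource is constructed depends on your code):

import unittest

class LoadFromCkanTest(unittest.TestCase):
    def test_temperature_is_a_sane_float(self):
        resource = Resource('some-resource-id')  # construct however your class requires
        resource.load_from_ckan()                # talks to the real CKAN API
        self.assertIsInstance(resource.temperature, float)
        self.assertTrue(-30.0 <= resource.temperature <= 50.0)

if __name__ == "__main__":
    unittest.main()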
Typically if you want to test against some external service like this you will need to use a mock/dummy object to fake the API of the external service. This must be configurable at run-time, either via the method's arguments, the class's constructor, or another type of indirection. Another, more complex option would be to monkey-patch globals during testing, like "import requests; requests.post = fake_post", but that can create more problems.
So for example your method could take an argument like so:
def load_from_ckan(self, post=requests.post):
    # ...
    r = post(url, timeout=config.ckan_request_timeout, data=data,
             headers=headers)
    # ...
Then during testing you would write your own post function that returns the JSON results you'd see coming back from CKAN. For example:
def mock_post(url, timeout=30, data='', headers=None):
    # ... probably check input arguments
    class DummyResponse:
        pass
    r = DummyResponse()
    r.ok = True
    r.content = json.dumps({'result': {'attr1': 1, 'attr2': 2}})
    return r
Constructing the result in your test gives you a lot more flexibility than pickling results and returning them because you can fabricate error conditions or focus in on specific formats your code might not expect but you know could exist.
Overall you can see how complicated this could become, so I would only start adding this sort of testing if you are experiencing repeated errors or other difficulties. It will just be more code you have to maintain.
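To tie it together, a test using the injected mock_post could look roughly like this (the attr1/attr2 names come from the fake response above; how Resource is constructed is an assumption):

def test_load_from_ckan_with_mock():
    resource = Resource('some-resource-id')  # construct however your class requires
    resource.load_from_ckan(post=mock_post)
    assert resource.attr1 == 1
    assert resource.attr2 == 2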
At this point, you can test that the response from CKAN is properly parsed. So you can pull the JSON from CKAN and ensure that it's returning data with the attributes you're interested in.
How do you normally load and store stuff from the DB in global constants for caching during initialisation? The global constants will not change again later.
Do you just make the DB query during load time and put it in a constant, or use a lazy loading mechanism of some sort?
What I have in mind is code in the global scope like this:
SPECIAL_USER_GROUP = Group.objects.get(name='very special users')
OTHER_THING_THAT_DOESNT_CHANGE = SomeDbEnum.objects.filter(is_enabled=True)
# several more items like this
I ran into issues doing that when running tests using an empty test database. An option would be to put all the needed data in fixtures, but I want to avoid coupling each individual test with irrelevant data they don't need.
Would the following be considered good style?
@memoize
def get_special_user_group():
    return Group.objects.get(name='very special users')
Or would a generic reusable mechanism be preferred?
Django has a cache framework that you could use.
http://docs.djangoproject.com/en/dev/topics/cache/
It's got a low level caching api that does what you want.
from django.core.cache import cache
cache.set('my_key', 'hello, world!', 30)
cache.get('my_key')
To use it, you'd do something like
if cache.get("key"):
    return cache.get("key")
else:
    value = some_expensive_operation()
    cache.set("key", value)
    return value
Using something like this will give you more flexibility in the future.
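Applied to the get_special_user_group example from the question, that pattern might look like this (the cache key and one-hour timeout are arbitrary illustrative choices):

from django.contrib.auth.models import Group
from django.core.cache import cache

def get_special_user_group():
    group = cache.get('special_user_group')
    if group is None:
        group = Group.objects.get(name='very special users')
        cache.set('special_user_group', group, 60 * 60)  # cache for one hour
    return group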
An option would be to put all the needed data in fixtures,
Good thinking.
but I want to avoid coupling each individual test with irrelevant data they don't need.
Then define smaller fixtures.
If necessary, use the TestCase setUp method to create the necessary database row.
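For example, a minimal sketch of that setUp approach, assuming the special-users group from the question:

from django.contrib.auth.models import Group
from django.test import TestCase

class SpecialUserGroupTest(TestCase):
    def setUp(self):
        # create only the row this test actually needs
        self.special_group = Group.objects.create(name='very special users')

    def test_lookup_finds_the_group(self):
        self.assertEqual(
            Group.objects.get(name='very special users'),
            self.special_group,
        )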