Hope I can express myself clearly here, bear with me...
Say I have a retail app that sells t-shirts. Each t-shirt can be in a pre-sale or in-sale phase.
Each time a user goes to a particular t-shirt-page, I can compare the datetime now versus the datetime of when it goes on sale and determine whether it is pre-sale or in-sale and output the appropriate data/content.
Instead, I can have a "phase" string property on my t-shirt, initially set to "presale". I can then set a task queue to execute when the sale starts and switch the "phase" property of the t-shirt from "presale" to "insale". When a user visits a t-shirt page, I check the string, whether it is "presale" or "insale", and output the appropriate data/content.
My question is: is one method preferred over the other? I'd assume that the first method, a datetime calculation/comparison, would be less efficient than the second, which is a simple string comparison. However, the second method requires task queues, which adds overhead/cost.
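For concreteness, here is a minimal sketch of the two approaches as I picture them (the attribute and field names are made up):

from datetime import datetime

# Method 1: compare datetimes on every page view
def phase_by_datetime(tshirt):
    return "presale" if datetime.now() < tshirt.sale_start else "insale"

# Method 2: read a precomputed string that a scheduled task flips at sale time
def phase_by_string(tshirt):
    return tshirt.phase  # "presale" until the task queue sets it to "insale"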
The first thing that comes to my mind is configuration. It's local, it's just reading a file, and it does not compare anything.
Okay, I'll use a structure I've used a lot on my Django and Flask projects. My folder structure usually looks like this:
main/
├── app/
│   ├── __init__.py
│   └── app.py
├── templates/
├── settings/
│   ├── __init__.py
│   ├── settings.py
│   └── settings.json
└── main.py
My settings.py file is where the variables are. Its value is huge, but its implementation is simple.
Let's see an example:
#settings.py
import json

path_of_json = "path here"

class Settings:
    # Object interface for the settings

    class _Product:
        # Non-public class that implements an object interface for the products
        def __init__(self, name, sale_status):
            # Initialization
            self.name = name
            self.sale_status = sale_status

    def __init__(self):
        # Handles what happens at initialization
        self.dict = self._load_json(path_of_json)  # this dictionary holds the data we load from the JSON
        # Here we use the data that was loaded from the JSON
        self.product1 = self._Product(self.dict['product1_name'], self.dict['product1_sale_status'])
        self.product2 = self._Product(self.dict['product2_name'], self.dict['product2_sale_status'])

    def _load_json(self, path):
        # Here you implement the JSON loading
        with open(path) as f:
            return json.load(f)
That's our configuration file.
Now to the main app, and how to make this do something:
#main.py
from settings.settings import Settings

my_settings = Settings()

# Here the product is loaded from the settings
product1 = {"name": my_settings.product1.name, "sale_status": my_settings.product1.sale_status}

#A lot of awesome code here

def render_product_page(product):
    # This function receives a dictionary; swap in however your
    # template engine shows fields here ("product_page" is a placeholder)
    render_template("product_page", name=product["name"], sale_status=product["sale_status"])
That's how it's (kinda) done.
Now all that you need to implement is a little daemon that wakes every 24 hours, checks the data in the settings.json file, and updates it if it's outdated. (This I'll leave to you XD)
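A minimal sketch of what such a daemon could look like, assuming the JSON also stores an ISO sale-start datetime per product (the field names here are made up):

#update_daemon.py - hypothetical sketch
import json
import time
from datetime import datetime

path_of_json = "path here"

def update_settings():
    with open(path_of_json) as f:
        data = json.load(f)
    # assumed field: the datetime when product1 goes on sale
    if datetime.fromisoformat(data["product1_sale_start"]) <= datetime.now():
        data["product1_sale_status"] = "insale"
    with open(path_of_json, "w") as f:
        json.dump(data, f)

while True:
    update_settings()
    time.sleep(24 * 60 * 60)  # wake every 24 hours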
Hope this helps :)
The Challenge
I am working on a Python project that will act as a translation layer for the SCPI command line interface on scientific instruments.
It is similar to the concept described in Multi layer package in python; however, my situation is slightly more complex and I can't seem to figure out how to make it work.
The following is a representation of my project structure (I am happy to change it if it is required):
Some things to keep in mind
The names of the files in translator_lib.instruments.supplierX.moduleX are only class1.py, class2.py and class3.py; the rest of the filename is just a reference to describe where they are derived from.
Many of the modules and classes inside of supplier1 and supplier2 have the same names (as in the example).
Every directory contains an __init__.py file.
The __init__.py files (based on the example layout) look as follows (note: I only have one module for testing):
translator_lib\__init__.py
from .instruments import supplier1
__all__ = ['supplier1']
translator_lib\instruments\__init__.py
from .supplier1 import module1
__all__ = ['module1']
What I'm trying to do
Compile three libraries called my_translator_lib, supplier1_translator_lib and supplier2_translator_lib.
Reason
The development team would import my_translator_lib to do what they need to, but if we want to send sample code to supplier1, we want to send them supplier1_translator_lib, and they should only be able to import supplier1_translator_lib.
Example 1 : Developer
from translator_lib.instruments import supplier1
from translator_lib.instruments import supplier2
class DoStuff:
    def __init__(self):
        self.sup1_class1 = supplier1.module1.class1.class1()
        self.sup2_class1 = supplier2.module1.class1.class1()
Example 2 : Supplier 1
from supplier1_translator_lib import module1
from supplier1_translator_lib import module2
class DoStuff:
    def __init__(self):
        self.class1 = module1.class1.class1()
        self.class2 = module1.class2.class2()
I've tried multiple combinations and sections from How to create a Python library and Deep dive: Create and publish your first Python library. I managed to create a library and install it, but ultimately I can only import my_translator_lib; nothing else is visible or available.
Any help in this regard would be truly appreciated.
Have you created the __init__.py files?
You need to have them in every subfolder of your module.
Files named __init__.py are used to mark directories on disk as Python package directories. Try to follow this:
mydir/spam/__init__.py
mydir/spam/module.py
If this does not solve your problem, as a last resort you can try sys.path.append('path_to_other_modules').
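For the three-libraries part of the question, one possible approach (a sketch, untested against your exact layout) is a separate setup.py per distribution that remaps the subpackage to a top-level name with setuptools' package_dir:

# setup_supplier1.py - hypothetical; run from the repository root
from setuptools import setup

setup(
    name='supplier1_translator_lib',
    version='0.1',
    package_dir={'supplier1_translator_lib': 'translator_lib/instruments/supplier1'},
    packages=[
        'supplier1_translator_lib',
        'supplier1_translator_lib.module1',  # list each module package you ship
    ],
)

After installing that distribution, from supplier1_translator_lib import module1 should work as in Example 2. One caveat: any absolute imports inside supplier1 that reference translator_lib would break under the new top-level name, so keep the intra-supplier imports relative.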
I have spent the last few days learning how to structure a data science project to keep it simple, reusable and pythonic. Sticking to this guideline I have created my_project. You can see its structure below.
├── README.md
├── data
│   ├── processed      <-- data files
│   └── raw
├── notebooks
│   └── notebook_1
├── setup.py
│
├── settings.py        <-- settings file
└── src
    ├── __init__.py
    │
    └── data
        └── get_data.py <-- script
I defined a function that loads data from ./data/processed. I want to use this function in other scripts and also in Jupyter notebooks located in ./notebooks.
import random

import pandas as pd

def data_sample(code=None):
    df = pd.read_parquet('../../data/processed/my_data')
    if not code:
        code = random.choice(df.code.unique())
    df = df[df.code == code].sort_values('Date')
    return df
Obviously this function won't work anywhere unless I run it directly in the script where it is defined.
My idea was to create settings.py where I'd declare:
from os.path import join, dirname
DATA_DIR = join(dirname(__file__), 'data', 'processed')
So now I can write:
import os
import random

import pandas as pd

from my_project import settings

def data_sample(code=None):
    file_path = os.path.join(settings.DATA_DIR, 'my_data')
    df = pd.read_parquet(file_path)
    if not code:
        code = random.choice(df.code.unique())
    df = df[df.code == code].sort_values('Date')
    return df
Questions:
Is it common practice to refer to files in this way? settings.DATA_DIR looks kinda ugly.
Is this at all how settings.py should be used? And should it be placed in this directory? I have seen it in a different spot in this repo, under .samr/settings.py.
I understand that there might not be 'one right answer', I just try to find logical, elegant way of handling these things.
I'm maintaining an economics data project based on the DataDriven Cookiecutter, which I feel is a great template.
Separating your data folders and code seems like an advantage to me, allowing you to treat your work as a directed flow of transformations (a 'DAG'), starting with immutable initial data and going to interim and final results.
Initially, I reviewed pkg_resources, but declined to use it (long syntax, and I fell short of understanding how to create a package) in favour of my own helper functions/classes that navigate through the directory tree.
Essentially, the helpers do two things:
1. Persist the project root folder and some other paths in constants:
from pathlib import Path

# shorter version
ROOT = Path(__file__).parents[3]

# longer version
def find_repo_root():
    """Returns root folder for repository.

    Current file is assumed to be:
        <repo_root>/src/kep/helper/<this file>.py
    """
    levels_up = 3
    return Path(__file__).parents[levels_up]

ROOT = find_repo_root()

DATA_FOLDER = ROOT / 'data'
UNPACK_RAR_EXE = str(ROOT / 'bin' / 'UnRAR.exe')
XL_PATH = str(ROOT / 'output' / 'kep.xlsx')
This is similar to what you do with DATA_DIR. A possible weak point is that here I manually hardcode the relative location of the helper file in relation to the project root. If the helper file location is moved, this needs to be adjusted. But hey, this is the same way it is done in Django.
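For reference, the Django parallel: the default settings.py (in older Django project templates) anchors everything relative to its own file in the same way:

# from a default Django settings.py
import os
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))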
2. Allow access to specific data in raw, interim and processed folders.
This can be a simple function returning a full path by a filename in a folder, for example:
def interim(filename):
    """Return path for *filename* in the 'data/interim' folder."""
    return str(ROOT / 'data' / 'interim' / filename)
In my project I have year-month subfolders for the interim and processed directories, and I address data by year, month and sometimes frequency. For this data structure I have InterimCSV and ProcessedCSV classes that reference specific paths, like:
from .helper import ProcessedCSV, InterimCSV

# somewhere in code
csv_text = InterimCSV(self.year, self.month).text()

# later in code
path = ProcessedCSV(2018, 4).path(freq='q')
The code for the helper is here. Additionally, the classes create subfolders if they are not present (I want this for unit tests in a temp directory), and there are methods for checking that files exist and for reading their contents.
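Purely as a sketch of the shape such a class takes (the linked code is authoritative; the subfolder layout and default filename here are assumptions):

class InterimCSV:
    # reference a CSV under data/interim/<year>/<month>, creating folders as needed
    # (ROOT is the Path constant defined above)
    def __init__(self, year, month):
        self.folder = ROOT / 'data' / 'interim' / str(year) / str(month)
        self.folder.mkdir(parents=True, exist_ok=True)  # handy for unit tests in a temp dir

    def path(self, filename='tab.csv'):  # assumed default filename
        return self.folder / filename

    def text(self):
        return self.path().read_text()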
In your example, you can easily have the root directory fixed in settings.py, but I think you can go a step further in abstracting your data.
Currently data_sample() mixes file access and data transformations, which is not a great sign, and it also uses a global name, another bad sign for a function. I suggest you consider the following:
# keep this in settings.py
import os

def processed(filename):
    return os.path.join(DATA_DIR, filename)

# this works on a dataframe - your argument is a dataframe,
# and you return a dataframe
import random
import pandas as pd

def transform_sample(df: pd.DataFrame, code=None) -> pd.DataFrame:
    # FIXME: what is `code`?
    if not code:
        code = random.choice(df.code.unique())
    return df[df.code == code].sort_values('Date')

# make a small but elegant pipeline of data transformation
file_path = processed('my_data')
df0 = pd.read_parquet(file_path)
df = transform_sample(df0)
As long as you are not committing lots of data, and you make clear the difference between snapshots of the uncontrolled outside world and your own derived data (code + raw == state), this layout works well.

It is sometimes useful to use an append-only-ish raw area and to think about symlinking steps like raw/interesting_source/2018.csv.gz -> raw_appendonly/interesting_source/2018.csv.gz.20180401T12:34:01 or some similar pattern to establish a "use latest" input structure.

Try to clearly separate config settings (my_project/__init__.py, config.py, settings.py or whatever) that might need to change depending on the environment (imagine swapping out the filesystem for a blob store or whatever).

setup.py usually sits at the top level, my_project/setup.py, with anything related to runnable stuff (not docs or examples) in my_project/my_project.

Define one _mydir = os.path.dirname(os.path.realpath(__file__)) in one place (config.py) and rely on that to avoid refactoring pain.
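A minimal sketch of that single anchor (directory names assumed):

# config.py - the one place that knows where it lives
import os

_mydir = os.path.dirname(os.path.realpath(__file__))

# derive every other path from the anchor above
DATA_DIR = os.path.join(_mydir, 'data', 'processed')
RAW_DIR = os.path.join(_mydir, 'data', 'raw')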
No, the use of a settings.py is only common practice if you are using Django. As far as referencing the data directory this way: it depends on whether you ever want users to be able to change this value. The way you have it set up, changing the value requires editing the settings.py file. If you want users to have a default but also be able to change it easily as they use your function, create the base path value inline and make it a default argument: def data_sample(..., datadir=filepath):.
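A sketch of that default-argument version, reusing the function from the question (the default path is an assumption):

import os
import random

import pandas as pd

# default lives next to this file; callers can override it per call
_DEFAULT_DATA_DIR = os.path.join(os.path.dirname(__file__), 'data', 'processed')

def data_sample(code=None, datadir=_DEFAULT_DATA_DIR):
    df = pd.read_parquet(os.path.join(datadir, 'my_data'))
    if not code:
        code = random.choice(df.code.unique())
    return df[df.code == code].sort_values('Date')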
You can open a file using open(), save the handle in a variable, and keep using that variable wherever you wish to refer to the file:

with open('Test.txt', 'r') as f:
    ...

or

f = open('Test.txt', 'r')

and use f to refer to the file.
If you want the file to be both readable and writable, use r+ in place of r.
I was wondering if it were possible, and preferably not too difficult, to use Django DiscoverRunner to delete my media directory between every test, including once at the very beginning and once at the very end. I was particularly interested in the new attributes "test_suite" and "test_runner" that were introduced in Django 1.7 and was wondering if they would make this task easier.
I was also wondering how I can make the test-specific MEDIA_ROOT a temporary file. Currently I have a regular MEDIA_ROOT called "media" and a testing MEDIA_ROOT called "media_test", and I use rmtree in setUp and tearDown of every test class that involves the media directory. The way I specify which MEDIA_ROOT to use is in my test.py settings file; currently I just have:
MEDIA_ROOT = normpath(join(DJANGO_ROOT, 'media_test'))
Is there a way I can set MEDIA_ROOT to a temporary directory named "media" instead?
This question is a bit old; my answer is for Django 2.0 and Python 3.6.6 or later, although I think the technique works on older versions too, YMMV.
I think this is a much more important question than it gets credit for! When you write good tests, it's only a matter of time before you need to whip up test files or generate them. Either way, you're in danger of polluting the file system of your server or developer machine. Neither is desirable!
I think the write-up on this page is a best practice. I'll copy/paste the code snippet below if you don't care about the reasoning (more notes afterwards):
----
First, let’s write a basic, really basic, model
from django.db import models

class Picture(models.Model):
    picture = models.ImageField()
Then, let’s write a really, really basic, test.
from PIL import Image
import tempfile

from django.test import TestCase
from django.test import override_settings

from .models import Picture

def get_temporary_image(temp_file):
    size = (200, 200)
    color = (255, 0, 0)  # JPEG has no alpha channel, so use an RGB color
    image = Image.new("RGB", size, color)
    image.save(temp_file, 'jpeg')
    return temp_file

class PictureDummyTest(TestCase):
    @override_settings(MEDIA_ROOT=tempfile.TemporaryDirectory(prefix='mediatest').name)
    def test_dummy_test(self):
        temp_file = tempfile.NamedTemporaryFile()
        test_image = get_temporary_image(temp_file)
        # test_image.seek(0)
        picture = Picture.objects.create(picture=test_image.name)
        print("It Worked!, ", picture.picture)
        self.assertEqual(len(Picture.objects.all()), 1)
----
I made one important change to the code snippet: TemporaryDirectory().name. The original snippet used gettempdir(). The TemporaryDirectory function creates a new folder with a system-generated name every time it's called. That folder will be removed by the OS - but we don't know when! This way, we get a new folder each run, so there is no chance of name conflicts. Note I had to add the .name element to get the name of the generated folder, since MEDIA_ROOT has to be a string. Finally, I added prefix='mediatest' so all the generated folders are easy to identify in case I want to clean them up in a script.
Also potentially useful to you is that the settings override can be easily applied to a whole test class, not just one test function. See this page for details.
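Applied at class level it would look something like this (same decorator, just on the class):

import tempfile

from django.test import TestCase, override_settings

@override_settings(MEDIA_ROOT=tempfile.TemporaryDirectory(prefix='mediatest').name)
class PictureDummyTest(TestCase):
    ...  # same test methods as above, no per-test decorator needed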
Also note that in the comments after this article, some people show an even easier way to get a temp file name, without worrying about media settings, using NamedTemporaryFile (only valid for tests that don't use media settings!).
The answer by Richard Cooke works, but leaves the temporary directories lingering in the file system, at least on Python 3.7 and Django 2.2. This can be avoided by using a combination of setUpClass, tearDownClass and overriding the settings in the test methods. For example:
import tempfile

from django.test import TestCase

class ExampleTestCase(TestCase):
    temporary_dir = None

    @classmethod
    def setUpClass(cls):
        cls.temporary_dir = tempfile.TemporaryDirectory()
        super(ExampleTestCase, cls).setUpClass()

    @classmethod
    def tearDownClass(cls):
        cls.temporary_dir = None  # dropping the last reference triggers cleanup
        super(ExampleTestCase, cls).tearDownClass()

    def test_example(self):
        with self.settings(MEDIA_ROOT=self.temporary_dir.name):
            # perform a test
            pass
This way the temporary files are removed right away, and you don't need to worry about the name of the temporary directory either. (Of course, if you want, you can still use the prefix argument when calling tempfile.TemporaryDirectory.)
One solution I have found that works is to simply delete it in setUp / tearDown. I would prefer finding some way to make it apply automatically to all tests instead of having to put the logic in every test file that involves media, but I have not figured out how to do that yet.
The code I use is:
from shutil import rmtree

from django.conf import settings
from django.test import TestCase

class MyTests(TestCase):
    def setUp(self):
        rmtree(settings.MEDIA_ROOT, ignore_errors=True)

    def tearDown(self):
        rmtree(settings.MEDIA_ROOT, ignore_errors=True)
The reason I do it in both setUp and tearDown is that if I only have it in setUp, I might end up with a lingering media_test directory; even though it won't be checked in to GitHub by accident (it's in the .gitignore), it still takes up unnecessary space in my project explorer, and I just prefer not having it sit there. If I only have it in tearDown, then I risk causing problems if I quit out of the tests partway through and a later run tries to execute a test that involves media while the media from the terminated test still lingers.
Something like that?
TESTING_MODE = True
...
MEDIA_ROOT = os.path.join(DJANGO_ROOT, 'media_test' if TESTING_MODE else 'media')
In a Django project, I have a directory structure that looks something like this:
project/
├╴module_name/
│ ├╴dbrouters.py
│ ...
...
In dbrouters.py, I define a class that starts out like this:
class CustomDBRouter(object):
current_connection = 'default'
...
The idea is to have a database router that sets the connection to use at the start of each request and then uses that connection for all subsequent database queries similarly to what is described in the Django docs for automatic database routing.
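For context, a sketch of how such a router might use that attribute (db_for_read/db_for_write are the standard Django router hooks; the rest is assumed):

class CustomDBRouter(object):
    current_connection = 'default'

    def db_for_read(self, model, **hints):
        # every query in the current request uses the connection chosen at request start
        return CustomDBRouter.current_connection

    def db_for_write(self, model, **hints):
        return CustomDBRouter.current_connection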
Everything works great, except that when I want to import CustomDBRouter in a script, I have to use the absolute path, or else something weird happens.
Let's say in one part of the application, CustomDBRouter.current_connection is changed:
from project.module_name.dbrouters import CustomDBRouter
...
CustomDBRouter.current_connection = 'alternate'
In another part of the application (assume that it is executed after the above code), I use a relative import instead:
from .dbrouters import CustomDBRouter
...
print(CustomDBRouter.current_connection) # Outputs 'default', not 'alternate'!
I'm confused as to why this is happening. Is Python creating a new class object for CustomDBRouter because I'm using a different import path?
Bonus points: Is there a better way to implement a 'global' class property?
It depends on how the script is being executed. When you're using relative imports, you have to make sure the module doing the import has a __name__ other than __main__. If it is run as __main__, from .dbrouters import CustomDBRouter effectively becomes from __main__.dbrouters import CustomDBRouter.
I found this here.
It turns out, the problem was being caused by a few lines in another file:
PROJECT_ROOT = '/path/to/project'
sys.path.insert(0, '%s' % PROJECT_ROOT)
sys.path.insert(1, '%s/module_name' % PROJECT_ROOT)
The files that were referencing .dbrouters were imported using the "shortcut" path (e.g., import views instead of import module_name.views). Because the same file was importable under two names, Python created two separate module objects, each with its own copy of the CustomDBRouter class.
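A quick way to see the duplication (hypothetical session, given the sys.path entries above):

import module_name.dbrouters as a
import dbrouters as b  # only importable because of the extra sys.path entry

print(a is b)  # False: two module objects, hence two independent CustomDBRouter classes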
I would like to test a pyramid view like the following one:
def index(request):
    data = request.some_custom_property.do_something()
    return {'some': data}
some_custom_property is added to the request via such an event handler:
@subscriber(NewRequest)
def prepare_event(event):
    event.request.set_property(
        create_some_custom_property,
        'some_custom_property', reify=True
    )
My problem is: if I create a test request manually, the event is not set up correctly, because no events are triggered. Because the real event handler is more complicated and depends on configuration settings, I don't want to reproduce that code in my test code. I would like to use the Pyramid infrastructure as much as possible. I learned from an earlier question how to set up a real Pyramid app from an ini file:
from webtest import TestApp
from pyramid.paster import get_app
app = get_app('testing.ini#main')
test_app = TestApp(app)
The test_app works fine, but I can only get back the HTML output (which is the idea of TestApp). What I want to do is execute index in the context of app or test_app, but get back the result of index before it's sent to a renderer.
Any hint how to do that?
First of all, I believe it is a really bad idea to write doctests like this, since they require a lot of initialization work which is going to be included in the documentation (remember, doctests) and will not "document" anything. To me, these tests seem to be a job for unit/integration tests. But if you really want, here's a way to do it:
import myapp
from pyramid.paster import get_appsettings
from webtest import TestApp
app, conf = myapp.init(get_appsettings('settings.ini#appsection'))
rend = conf.testing_add_renderer('template.pt')
test_app = TestApp(app)
resp = test_app.get('/my/view/url')
rend.assert_(key='val')
where myapp.init is a function that does the same work as your application initialization function called by pserve (like the main function here), except that myapp.init takes one argument, the settings dictionary (instead of main(global_config, **settings)), and returns app (i.e. conf.make_wsgi_app()) and conf (i.e. the pyramid.config.Configurator instance). rend is a pyramid.testing.DummyTemplateRenderer instance.
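A sketch of what that myapp.init could look like, assuming a standard Pyramid main() (the scan call stands in for whatever your real initialization does):

# myapp/__init__.py - hypothetical init() mirroring main()
from pyramid.config import Configurator

def init(settings):
    conf = Configurator(settings=settings)
    conf.scan()  # plus whatever routes/includes your main() normally sets up
    return conf.make_wsgi_app(), conf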
P.S. Sorry for my English, I hope you'll be able to understand my answer.
UPD: Forgot to mention that rend has a _received property, which is the value passed to the renderer, though I would not recommend using it, since it is not part of the public interface.