Best practices to filter in Django - python

One page of my Django app simply displays every employee's information in a table, like so:
First Name:   Last Name:   Age:   Hire Date:
Bob           Johnson      21     03/19/2011
Fred          Jackson      50     12/01/1999
Now, I prompt the user for 2 dates and I want to know if an employee was hired between those 2 dates.
On HTTP GET I just render the page, and on HTTP POST I redirect to a URL that carries the two dates in its path.
My urls.py file has these patterns:
('^employees/employees_by_date/$','project.reports.filter_by_date'),
('^employees/employees_by_date/sort/(?P<begin_date>\d+)/(?P<end_date>\d+)/$', EmployeesByDate.as_view()),
And my filter_by_date function looks like this:
def filter_by_date(request):
    if request.method == 'GET':
        return render(request,"../templates/reports/employees_by_date.html",{'form':BasicPrompt(),})
    else:
        form = BasicPrompt(request.POST)
        if form.is_valid():
            begin_date = form.cleaned_data['begin_date']
            end_date = form.cleaned_data['end_date']
            return HttpResponseRedirect('../reports/employees_by_date/sort/'+str(begin_date)+'/'+str(end_date)+'/')
The code works fine, the problem is I'm new to web dev and this doesn't feel like I'm accomplishing this in the right way. I want to use best practices so can anyone either confirm I am or guide me in the proper way to filter by dates?
Thanks!

You're right, it's a bit awkward to query your API in that way. If you need to add the employee name and something else to the filter, you will end up with a very long URL and it won't be flexible.
Your filter parameters (start and end date) should be passed as a query string in the URL, not as part of the path.
In this case, the URL would be employees/employees_by_date/?start_date=xxx&end_date=yyy and the dates can be retrieved in the view using start_date = request.GET['start_date'].
If a form is used with method='get', the form's inputs are automatically converted to a query string and appended to the URL.
If no form is used, the parameters must be URL-encoded so that values containing special characters such as \, /, $ or % are passed correctly.
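As a rough illustration of the encoding step, here is a minimal sketch using only Python's standard library. The path and parameter names follow the example above; the sample date values are made up for illustration. In a Django view the decoded values would be read from request.GET instead of being parsed by hand.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Sample values; the date formats are assumptions for illustration only.
params = {"start_date": "2011-01-01", "end_date": "2011/12/31"}

# urlencode escapes special characters such as "/" so they survive in a URL.
url = "/employees/employees_by_date/?" + urlencode(params)
print(url)

# Decoding the query string recovers the original values.
decoded = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(decoded)
```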

Use Unix timestamps instead of mm/dd/yyyy dates. A Unix timestamp is the number of seconds that have elapsed since 00:00:00 UTC on Jan 1 1970 ("the Epoch"), so it's just a simple integer. As I'm writing this, the Unix time is 1432071354.
They aren't very human-readable, but Unix timestamps are unambiguous, concise, and can be matched with the simple regex \d+.
You'll see lots of APIs around the web use them, for example Facebook. Scroll down to "time based pagination", those numbers are Unix timestamps.
The problem with mm/dd/yyyy dates is ambiguity. Is it mm/dd/yyyy (US)? or dd/mm/yyyy (elsewhere)? What about mm-dd-yyyy?
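To make the conversion concrete, here is a small sketch using Python's standard datetime module. The helper names to_timestamp and from_timestamp are made up for this example. It assumes timezone-aware datetimes; a naive datetime passed to timestamp() would be interpreted as local time.

```python
from datetime import datetime, timezone

def to_timestamp(dt):
    """Return whole seconds since the Epoch for a timezone-aware datetime."""
    return int(dt.timestamp())

def from_timestamp(ts):
    """Return a UTC datetime for a Unix timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

# The timestamp quoted in the answer above.
moment = from_timestamp(1432071354)
print(moment.isoformat())  # 2015-05-19T21:35:54+00:00
print(to_timestamp(moment))
```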

Related

SOQL Socrata query datetime between

May I know what is wrong with my code below? I would like to query all rows where date_occ is between '2015-01-10T12:00:00' and '2015-12-31T24:00:00'
response = requests.get('https://data.lacity.org/api/id/7fvc-faax.json?$select=*&$where = date_occ between 2015-01-10T12:00:00 and 2015-12-131T24:00:00')
I get the following error:
Unrecognized arguments [$where ]
I realise the following doesn't work either:
response = requests.get('https://data.lacity.org/api/id/7fvc-faax.json?$select=*&vict_age >20')
data = response.json()
data = json_normalize(data)
data = pd.DataFrame(data)
But this works:
response = requests.get('https://data.lacity.org/api/id/7fvc-faax.json?$select=*&vict_sex=M')
what am I missing here?
There are a few questions in this one. Starting with your second query, where you want to look at ages above 20: looking at the metadata (click the down arrow), the victim age is stored as a text string, not a number. Thus, you won't be able to use operators like greater than or less than. However, you can test for equality. The query below will work:
https://data.lacity.org/resource/7fvc-faax.json?$where=vict_age = '20'
Note: I've dropped the $select and am just using $where for simpler display.
Your third example works since you've set it to query a text field. If you want LA to change it to a numeric, click the "Contact Dataset Owner" under the ellipsis button.
Your first question on dates has a few changes. First, your single quotation marks were not aligned and some were missing. Second, the latter date is 2015-12-131T24:00:00, which has an invalid day. Finally, the data on the portal does not have a timestamp, so you only need the year-month-day. This will work:
https://data.lacity.org/resource/7fvc-faax.json?$where=date_occ between '2015-01-10' and '2015-12-13'
Finally, I would recommend that you use the URL structure, https://data.lacity.org/resource/7fvc-faax.json? instead of /api/id/. The former is the proper URL structure for Socrata-based APIs.
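One way to avoid quoting and spacing problems is to let a library encode the $where clause. The sketch below only builds the URL with Python's standard library and makes no network request; the helper name build_query_url is made up for this example. The trailing space in the reported error ("[$where ]") suggests the space before "=" in the original request became part of the parameter name, which encoding from a dictionary avoids.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://data.lacity.org/resource/7fvc-faax.json"

def build_query_url(where_clause):
    """Return the endpoint URL with a percent-encoded $where clause."""
    return BASE_URL + "?" + urlencode({"$where": where_clause})

url = build_query_url("date_occ between '2015-01-10' and '2015-12-13'")
print(url)
```

The resulting string can be passed to requests.get as-is. Alternatively, requests accepts the same dictionary through its params argument and performs the encoding itself.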

Django url multiple parameter

I have been searching for a week without finding an answer to my problem. I don't know if I am going about this the right way, but I am trying to build a URL that looks like this: article/title/0/search=ad/2017-08-01/2017-08-09/, where the parameters (in braces) are: article/{filter}/{page}/search={search}/{date1}/{date2}/.
My url.py regex is:
url(r'article/(?P<filter>.+)/(?P<page>\d+)/(?:search=(?P<search>.*)/)?(?:(?P<date1>(\d{4}-\d{1,2}-\d{1,2})?)/)(?:(?P<date2>(\d{4}-\d{1,2}-\d{1,2})?)/)?$')
When search, date1 and date2 are all filled in, the pattern treats {searched word}/2017-08-01/2017-08-09/ as the search value.
When search, date1 and date2 are empty, my links look like this: article/title/0/search=///. When only the dates are filled in: article/title/0/search=/2017-08-01/2017-08-09/.
In my template, I need my url to be like this: {% url "view" filter page search date1 date2 %}
Can someone help me, and correct me if I did it the wrong way?
EDIT
I'll try to re-ask my problem another way:
In my template, I have a form with 3 fields, search, date1 and date2, none of them required, so they can be None (this is my problem). I also have some links that need to change only one of the view parameters filter and page while keeping the field parameters in the URL.
When the form is POSTed to my view (which currently looks like this: def AllArticle(request, filter, page, search="", date1="", date2="")), I use them to filter my ArticleModel objects (I don't need help with this; it is already done and works).
Regarding @Cory_Madden's suggestion: I have never used this method before and I don't know how to use it. I just tried it and Django returned a MultiValueDictKeyError.
This is not how you use GET parameters. You need to access them from the request object. So let's say you have a function based view, this is how you would access any GET parameters appended to your URL:
def index(request, name='World'):
    search_param = request.GET['search']
    ...
And your url in this case looks like this:
url(r'/(?P<name>.*)', views.index)
A request path in your address bar would then look like:
/Joe?search=search-term
And request.GET['search'] would return "search-term".
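The MultiValueDictKeyError mentioned in the question is what request.GET['search'] raises when the key is absent from the URL. A possible way around it, sketched below, is to read optional parameters with .get and a default. The helper read_filters is hypothetical; the parameter names come from the question, and a plain dict stands in for request.GET so the snippet runs without Django.

```python
def read_filters(query):
    """Collect optional filter values from a mapping such as request.GET.

    Missing keys fall back to an empty string instead of raising an error.
    """
    return {
        "search": query.get("search", ""),
        "date1": query.get("date1", ""),
        "date2": query.get("date2", ""),
    }

# A plain dict stands in for request.GET here.
print(read_filters({"search": "ad", "date1": "2017-08-01"}))
```

Inside a view this would be called as read_filters(request.GET), with a URL such as article/title/0/?search=ad&date1=2017-08-01.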

Scraping blog and saving date to database causes DateError: unknown date format

I am working on a project where I scrape a number of blogs and save a selection of the data, such as the title of the post, the date it was posted, and the content of the post, to a SQLite database.
The goal in the end is to do some fancy textual analyses, but right now I have a problem with writing the data to the database.
I work with the library pattern for Python. (the module about databases can be found here)
I am busy with the third blog now. The data from the two other blogs is already saved in the database, and for the third blog, which is similarly structured, I adapted the code.
There are several functions well integrated with each other, they work fine. I also got access to all the data the right way, when I try it out in IPython Notebook it works fine. When I ran the code as a trial in the Console for only one blog page (it has 43 altogether), it also worked and saved everything nicely in the database. But when I ran it again for 43 pages, it threw a data error.
There are some comments and print statements inside the functions now which I used for debugging. The problem seems to happen in the function parse_post_info, which passes a dictionary on to the function that goes over all blog pages and opens every single post, and then saves the dictionary that the function parse_post_info returns IF it is not None, but I think it IS empty because something about the date format goes wrong.
Also - why does the code work once, and the same code throws a dateerror the second time:
DateError: unknown date format for '2015-06-09T07:01:55+00:00'
Here is the function:
from pattern.db import Database, field, pk, date, STRING, INTEGER, BOOLEAN, DATE, NOW, TEXT, TableError, PRIMARY, eq, all
from pattern.web import URL, Element, DOM, plaintext

def parse_post_info(p):
    """ This function receives a post Element from the post list and
    returns a dictionary with post url, post title, labels, date.
    """
    try:
        post_header = p("header.entry-header")[0]
        title_tag = post_header("a < h1")[0]
        post_title = plaintext(title_tag.content)
        print post_title
        post_url = title_tag("a")[0].href
        date_tag = post_header("div.entry-meta")[0]
        post_date = plaintext(date_tag("time")[0].datetime).split("T")[0]
        #post_date = date(post_date_text)
        print post_date
        post_id = int(((p).id).split("-")[-1])
        post_content = get_post_content(post_url)
        labels = " "
        print labels
        return dict(blog_no=blog_no,
                    post_title=post_title,
                    post_url=post_url,
                    post_date=post_date,
                    post_id=post_id,
                    labels=labels,
                    post_content=post_content
                    )
    except:
        pass
The date() function returns a new Date, a convenient subclass of Python's datetime.datetime. It takes an integer (Unix timestamp), a string or NOW.
The result may differ from your local time.
Also, the expected format is "YYYY-MM-DD hh:mm:ss".
The time format conversion can be found here
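Assuming the "YYYY-MM-DD hh:mm:ss" format named above is what the database layer expects, one way to get there from the string in the error message is to normalize it with the standard library before storing it. The helper normalize_timestamp is made up for this sketch, and it simply discards the UTC offset, which is only harmless when the offset is +00:00 as in the example.

```python
from datetime import datetime

def normalize_timestamp(raw):
    """Convert an ISO 8601 string to 'YYYY-MM-DD hh:mm:ss'."""
    # Keep only the 'YYYY-MM-DDThh:mm:ss' part and drop the UTC offset.
    parsed = datetime.strptime(raw[:19], "%Y-%m-%dT%H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")

print(normalize_timestamp("2015-06-09T07:01:55+00:00"))  # 2015-06-09 07:01:55
```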

Script for a changing URL

I am having a bit of trouble in coding a process or a script that would do the following:
I need to get data from the URL of:
nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140430/gfs_hd_00z
But the file URLs change (both the days and the model runs), so the script has to assume this base structure, with these variables:
Y - Year
M - Month
D - Day
C - Model Forecast/Initialization Hour
F- Model Frame Hour
Like so:
nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hdYYYYMMDD/gfs_hd_CCz
The script would run and fill in the date (as YYYYMMDD, plus CC) using those variables.
So the goal is to get, for example,
http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140430/gfs_hd_00z
by substituting the current date into the format:
http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hdYYYYMMDD/gfs_hd_CCz
Can you please advise how to go about finding the URL for the latest date in this format? Whether it's a script or something with wget, I'm all ears. Thank you in advance.
In Python, the requests library can be used to get at the URLs.
You can generate the URL by combining the base URL string with a timestamp built from the datetime class, the timedelta class and the strftime method, which produces the date in the required format.
That is, start by getting the current time with datetime.datetime.now() and then, in a loop, subtract an hour (or whichever time gradient you think they're using) via timedelta and keep checking the URL with the requests library. The first one you see that's there is the latest one, and you can then do whatever further processing you need to do with it.
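The URL-generation part of that loop could look roughly like the sketch below. The function name candidate_urls and the 6-hour step between model runs are assumptions for illustration, and the times are treated as UTC. Checking each URL (for example with requests.get) is left out so that the snippet runs without network access.

```python
from datetime import datetime, timedelta

BASE = "http://nomads.ncep.noaa.gov/dods/gfs_hd"

def candidate_urls(now, count=8, step_hours=6):
    """Yield run URLs from the most recent run backwards in time."""
    # Round down to the latest run hour (00, 06, 12 or 18 when step_hours=6).
    run = now.replace(hour=now.hour - now.hour % step_hours,
                      minute=0, second=0, microsecond=0)
    for _ in range(count):
        yield f"{BASE}/gfs_hd{run:%Y%m%d}/gfs_hd_{run:%H}z"
        run -= timedelta(hours=step_hours)

# A fixed time is used here so the output is reproducible.
for url in candidate_urls(datetime(2014, 4, 30, 3, 15), count=3):
    print(url)
```

The first URL in the sequence that responds successfully would then be the latest available run.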
If you need to scrape the contents of the page, scrapy works well for that.
I'd try scraping the index one level up at http://nomads.ncep.noaa.gov/dods/gfs_hd ; the last link-of-particular-form there should take you to the daily downloads pages, where you could do something similar.
Here's an outline of scraping the daily downloads page:
import BeautifulSoup
import urllib

grdd = urllib.urlopen('http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140522')
soup = BeautifulSoup.BeautifulSoup(grdd)
datalinks = 'http://nomads.ncep.noaa.gov:80/dods/gfs_hd/gfs_hd'
for link in soup.findAll('a'):
    if link.get('href').startswith(datalinks):
        print('Suitable link: ' + link.get('href')[len(datalinks):])
        # Figure out if you already have it, choose if you want info, das, dds, etc etc.
and scraping the page with the last thirty would, of course, be very similar.
The easiest solution would be just to mirror the parent directory:
wget -np -m -r http://nomads.ncep.noaa.gov:9090/dods/gfs_hd
However, if you just want the latest date, you can use Mojo::UserAgent as demonstrated on Mojocast Episode 5
use strict;
use warnings;
use Mojo::UserAgent;
my $url = 'http://nomads.ncep.noaa.gov:9090/dods/gfs_hd';
my $ua = Mojo::UserAgent->new;
my $dom = $ua->get($url)->res->dom;
my @links = $dom->find('a')->attr('href')->each;
my @gfs_hd = reverse sort grep {m{gfs_hd/}} @links;
print $gfs_hd[0], "\n";
On May 23rd, 2014, this outputs:
http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/gfs_hd20140523

Parametrizing a Date on urls.py in Django

I have the following URL definition:
url(r'^date-add/(?P<entity_id>\d+)$', views.date_add, name='date_add'),
That allows me to call date_add function with the following URL:
/app_name/date-add/<id>
I would like to extend this to allow a date. For example:
/app_name/date-add/1/2013-04-23
How should I edit my urls.py definition in order to achieve this?
You can define your URL regex like this:
url(r'^date-add/(?P<entity_id>\d+)/(?P<date>\d{4}-\d{2}-\d{2})/$', views.date_add, name='date_add'),
and the view, obviously, would be
def date_add(request, entity_id, date):
    # convert to datetime object from string here.
    ...
Typically you break it down into named parameters corresponding to the year, month and day:
url(r'^date-add/(?P<entity_id>\d+)/(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})/$', views.date_add, name='date_add_with_param'),
Then you can use datetime.date to construct the date in your view, which should receive year, month and day as parameters.
This is the usual pattern, in particular for archive views, where the URLs might get more specific as you drill down: /archive/2013/ and /archive/2013/11/ might both be valid, although of course you probably wouldn't have a single regexp matching both. It might be unnecessarily complex compared to the single named pattern regexp that karthikr's answer shows, which you could then parse with datetime.strptime.
In either case you can also use somewhat more restrictive regexps if you like, such as ones that don't allow a first digit other than 0 or 1 for the month, or other than 0, 1, 2 or 3 for the day.
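Both conversions mentioned in these answers can be sketched with the standard library. The helper names date_from_parts and date_from_string are made up for this example. URL captures always arrive as strings, and both helpers raise ValueError for impossible dates, which a view would need to handle (for instance by returning a 404).

```python
from datetime import date, datetime

def date_from_parts(year, month, day):
    """Build a date from the three captured strings; raises ValueError if invalid."""
    return date(int(year), int(month), int(day))

def date_from_string(value):
    """Parse a single 'YYYY-MM-DD' capture; raises ValueError if invalid."""
    return datetime.strptime(value, "%Y-%m-%d").date()

print(date_from_parts("2013", "04", "23"))
print(date_from_string("2013-04-23"))
```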
