How can I convert this Python code into ColdFusion code?

I am looking to build a web application that utilizes the Reddit (open source) algorithm.
I plan to tweak it over time but for now I think it'll be a good start to use their ranking system.
I read a blog post about this algorithm and the example is written in Python. How can I convert this for use in ColdFusion? Additional bonus points for usage in a CFC if it's easier?
The code:
# Rewritten code from /r2/r2/lib/db/_sorts.pyx
from datetime import datetime, timedelta
from math import log

epoch = datetime(1970, 1, 1)

def epoch_seconds(date):
    """Returns the number of seconds from the epoch to date."""
    td = date - epoch
    return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

def score(ups, downs):
    return ups - downs

def hot(ups, downs, date):
    """The hot formula. Should match the equivalent function in postgres."""
    s = score(ups, downs)
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = epoch_seconds(date) - 1134028003
    return round(order + sign * seconds / 45000, 7)
The blog post that talks about this code:
http://amix.dk/blog/post/19588
Looking forward to hearing some ideas and examples.
Many thanks!
Michael.
Also, as an additional question: would this be better done in an SQL query, or with some kind of sorting in ColdFusion after the data set has already been collected? My DB of choice would be MySQL.
UPDATE:
Just found another question on here that relates to what I was asking...I think it helps.
How are Reddit and Hacker News ranking algorithms used?

Most of that code is just standard code you'd find in any programming language.
For example, getting the seconds since a certain date is easy in ColdFusion (note the argument order: dateDiff returns the second date minus the first, so the epoch goes first):
<cfset seconds = dateDiff('s', createDate(1970, 1, 1), now()) />
Not sure which bits you're stuck on, but it's all pretty simple. Every function I see there has a ColdFusion equivalent, and rather than just rewriting it for you, I suggest you try it yourself and ask if you get stuck on something.
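Before porting, it can help to pin down the expected behaviour. A quick sanity check in Python (hypothetical vote counts; the constant 1134028003 is Reddit's reference timestamp from the code above) shows what your ColdFusion version should reproduce: with equal scores, the newer post ranks higher.

```python
from datetime import datetime
from math import log

epoch = datetime(1970, 1, 1)

def epoch_seconds(date):
    # seconds from the Unix epoch to date
    td = date - epoch
    return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

def hot(ups, downs, date):
    # same formula as in the question
    s = ups - downs
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = epoch_seconds(date) - 1134028003
    return round(order + sign * seconds / 45000, 7)

old_post = hot(100, 2, datetime(2023, 1, 1))  # hypothetical counts
new_post = hot(100, 2, datetime(2023, 1, 2))  # same score, one day newer
# one day is 86400 s, so the two scores differ by 86400 / 45000 = 1.92
```

Running the same inputs through your CFC and comparing against these values is an easy way to verify the port.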

Related

Converting an InfluxQL v1 query to a Flux query in Python -- getting the last reading for every key-value tag

So I am new to InfluxDB (v1) and even newer to InfluxDB v2 and Flux. Please bear with me, as I am really trying hard to get my Python code working again.
Recently, I upgraded my InfluxDB database from v1.8 to 2.6. This has been an absolute challenge, but I think I have things working for the most part (at least inserting data back into the database). Reading items out of the database, however, has been especially challenging, as I can't get my Python code to work.
This is what I previously used in my Python code when I was running InfluxDB 1.8 and using InfluxQL. Essentially I need to convert these InfluxQL queries to Flux and get the expected results:
meterids = influx_client.query('show tag values with key = "meter_id"')
metervalues = influx_client.query('select last(reading) from meter_consumption group by *;')
With InfluxDB v2.6 I must use Flux queries. For 'meterids' I do the following and it seems to work. (This took me days to figure out.)
meterid_list = []
query_api = influx_client.query_api()
querystr = 'from(bucket: "rtlamr_bucket") \
    |> range(start: -1h) \
    |> keyValues(keyColumns: ["meter_id"])'
# gives a bunch of meter ids, formatted like [('reading', '46259044'), ('reading', '35515159'), ...]
result = query_api.query(query=querystr)
for table in result:
    for record in table.records:
        meterid_list.append(record.get_value())
print('This is meterids: %s' % (meterid_list))
But when I try to pull the actual last readings for each meter_id (the meter consumption), I can't seem to get any Flux query to work. This is what I currently have:
# metervalues = influx_client.query('select last(reading) from meter_consumption group by *;')
metervalues_list = []
querystrconsumption = 'from(bucket: "rtlamr_bucket") \
    |> range(start: -2h) \
    |> filter(fn: (r) => r._measurement == "meter_consumption") \
    |> group(columns: ["_time"], mode: "by") \
    |> last()'
resultconsumption = query_api.query(query=querystrconsumption)
for tableconsumption in resultconsumption:
    for recordconsumption in tableconsumption.records:
        metervalues_list.append(recordconsumption.get_value())
print('\n\nThis is metervalues: %s' % (metervalues_list))
Not sure if this will help, but in v1.8 of influxdb these were my measurements, tags and fields:
Time: timestamp
Measurement: consumption <-- consumption is the "measurement name"
Key-Value Tags (meta): meter_id, meter_type
Key-Value Fields (data): <meter_consumption in gal, ccf, etc.>
Any thoughts, suggestions or corrections would be most greatly appreciated. Apologies if I am not using the correct terminology. I have tried reading tons of Google articles but I can't seem to figure this one out. :(
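Not an authoritative answer, but one thing stands out in the attempt above: it groups by _time, whereas the old `group by *` grouped by tags. A sketch (untested; the bucket and measurement names are taken from the question) that groups by the meter_id tag instead, so that last() returns one most-recent reading per meter:

```python
# Untested sketch: group by the meter_id tag so last() picks the most
# recent reading per meter, mirroring `select last(reading) ... group by *`.
querystrconsumption = '''from(bucket: "rtlamr_bucket")
  |> range(start: -2h)
  |> filter(fn: (r) => r._measurement == "meter_consumption")
  |> group(columns: ["meter_id"], mode: "by")
  |> last()'''
# pass to query_api.query(query=querystrconsumption) as before;
# each returned table should then correspond to a single meter_id
```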

How to query with time filters in GoogleScraper?

Even though Google's official API does not offer time information in query results - nor time filtering for keywords - there is a time filtering option in the advanced search:
Google results for stackoverflow in the last one hour
The GoogleScraper library offers many flexible options, but no time-related ones. How can time features be added using the library?
After a bit of inspection, I've found that Google sends the time-filtering information as a qdr value in the tbs key (possibly short for "time based search", although not officially stated):
https://www.google.com/search?tbs=qdr:h1&q=stackoverflow
This gets the results for the past hour. m and y letters can be used for months and years respectively.
Also, to add sorting by date feature, add the sbd (should mean sort by date) value as well:
https://www.google.com/search?tbs=qdr:h1,sbd:1&q=stackoverflow
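For experimenting outside the library, the URL pattern is easy to generate; a small helper (the function name is my own):

```python
def google_time_url(query, period="h1", sort_by_date=True):
    # period is the qdr value: h/d/w/m/y, optionally with a count, e.g. "d15"
    tbs = "qdr:" + period + (",sbd:1" if sort_by_date else "")
    return "https://www.google.com/search?tbs={}&q={}".format(tbs, query)

url = google_time_url("stackoverflow")
# -> https://www.google.com/search?tbs=qdr:h1,sbd:1&q=stackoverflow
```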
I was able to insert these keywords into the base Google URL of GoogleScraper. Insert the lines below at the end of the get_base_search_url_by_search_engine() method (just before return) in scraping.py:
if "google" in str(specific_base_url):
    specific_base_url = "https://www.google.com/search?tbs=qdr:{},sbd:1".format(config.get("time_filter", ""))
Now use the time_filter option in your config:
from GoogleScraper import scrape_with_config

config = {
    'use_own_ip': True,
    'keyword_file': "keywords.txt",
    'search_engines': ['google'],
    'num_pages_for_keyword': 2,
    'scrape_method': 'http',
    "time_filter": "d15"  # up to 15 days ago
}
search = scrape_with_config(config)
Results will only include items from the requested time range. Additionally, text snippets in the results will contain raw date information:
one_sample_result = search.serps[0].links[0]
print(one_sample_result.snippet)
4 mins ago It must be pretty easy - let propertytotalPriceOfOrder =
order.items.map(item => +item.unit * +item.quantity * +item.price);.
where order is your entire json object.

Get time since comment was posted [Praw]

Is there a way to get the time since a comment was posted using Praw?
I've looked over the docs but couldn't find any mention of it, if there isn't, are there any workarounds to get the time?
I don't know if you're still looking for an answer. In any case, someone might find this via a search engine, so here's an idea:
import praw
import datetime
reddit = praw.Reddit(...)
comment = reddit.comment(id="ctu29cb")
now = int(datetime.datetime.timestamp(datetime.datetime.today()))
then = int(comment.created)
delta = now - then
print("comment has been created with timestamp", then)
print("which means on", datetime.datetime.fromtimestamp(then).strftime('%Y-%m-%d %H:%M:%S'))
print("that was", delta, "seconds or", str(datetime.timedelta(seconds=delta)), "hours ago")
which returns
comment has been created with timestamp 1438924830
which means on 2015-08-07 05:20:30
that was 69533363 seconds or 804 days, 18:49:23 hours ago
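The delta arithmetic is independent of PRAW, so it can be factored into a small helper (the name is my own) and checked against the output above:

```python
import datetime

def comment_age(created_ts, now_ts=None):
    # seconds elapsed since a Unix timestamp, plus a readable form
    if now_ts is None:
        now_ts = int(datetime.datetime.timestamp(datetime.datetime.today()))
    delta = int(now_ts) - int(created_ts)
    return delta, str(datetime.timedelta(seconds=delta))

# reproduce the example output with a fixed "now"
secs, human = comment_age(1438924830, now_ts=1438924830 + 69533363)
# secs == 69533363, human == '804 days, 18:49:23'
```

In real use you'd pass comment.created (or comment.created_utc, with a matching UTC "now") as created_ts.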

elastic search performance using pyes

Sorry for cross-posting. The following question is also posted on Elasticsearch's Google group.
In short, I am trying to find out why I am not able to get optimal performance while doing searches on an ES index which contains about 1.5 million records.
Currently I am able to get about 500-1000 searches in 2 seconds. I would think that this should be orders of magnitude faster. Also, I am currently not using Thrift.
Here is how I am checking the performance.
Using 0.19.1 version of pyes (tried both stable and dev version from github)
Using 0.13.8 version of requests
conn = ES(['localhost:9201'], timeout=20, bulk_size=1000)
loop_start = time.clock()
q1 = TermQuery("tax_name", "cellvibrio")

for x in xrange(1000000):
    if x % 1000 == 0 and x > 0:
        loop_check_point = time.clock()
        print 'took %s secs to search %d records' % (loop_check_point - loop_start, x)
    results = conn.search(query=q1)
    if results:
        for r in results:
            pass
            # print len(results)
    else:
        pass
I'd appreciate any help you can give me with scaling up the searches.
Thanks!
Isn't it just a matter of concurrency?
You're doing all your queries in sequence, so a query has to finish before the next one can start. If you have a 1 ms RTT to the server, this limits you to 1000 requests per second.
Try running a few instances of your script in parallel and see what kind of performance you get.
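The same idea can be sketched in a single process. Here a thread pool keeps several requests in flight at once; run_search is a stand-in for the actual conn.search(query=q1) call, since the client details don't matter for the point:

```python
from concurrent.futures import ThreadPoolExecutor

def run_search(i):
    # stand-in for conn.search(query=q1); swap in the real call
    return i

# issue 1000 searches with up to 16 concurrent requests
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(run_search, range(1000)))
# each worker overlaps its network wait with the others',
# so throughput is no longer bound by one round-trip at a time
```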
There are several ways to improve that with pyes.
First of all, try to get rid of the DottedDict class/object, which is used to generate an object from the json/dict of every result you get.
Second, switch the JSON encoder to ujson.
These two things improved performance considerably.
The disadvantage is that you have to access dicts the plain way instead of the dotted version: instead of "result.facets.attribute.term" you have to use something like "result.facets['attribute']['term']" or "result.facets.get('attribute', {}).get('term', None)".
I did this by extending the ES class and replacing the "_send_request" function.

Getting a count of action from today's date from filtered results in Python

I have an action that a user can do many times a day. I'm trying to get a count of how many times the user has taken that action, but only for today's date. Here's the way I'm currently solving this, but is there an easier way? I feel like I should be able to fit this in one line. :)
today_slaps = 0
slaps = Slap.objects.filter(from_user=request.user.id)
for slap in slaps:
    if slap.date.date() == datetime.now().date():
        today_slaps += 1
The logic I'm looking for is:
slaps = Slap.objects.filter(from_user=2, date.date()=datetime.now().date()).count()
But that's obviously throwing an error that a keyword can't be an expression. Sorry if this is a basic one, but thoughts?
slap_count = Slap.objects.filter(from_user=request.user,
                                 date__gte=datetime.date.today()).count()
# specifically setting datetimefield=datetime.date.today() won't work
# gte works for a datetimefield vs. a date object starting at that date
# it's also assumed there will never be a slap from the future
Generates the following SQL:
SELECT ... FROM ... WHERE ... date >= 2011-02-26 00:00:00
So it's safe to say you will only get today's slaps, again, unless you have slaps from the future. If you do, I'd set every date__day, date__year, date__month explicitly.
Thanks to Yuji (below) we came up with this answer:
slap_count = Slap.objects.filter(from_user=request.user.id, date__gte=datetime.today().date()).count()
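If future-dated rows ever are a concern, bounding both ends of the day avoids the caveat entirely; the window arithmetic is plain datetime (a sketch, model names as above):

```python
from datetime import datetime, time, timedelta

def today_window(now=None):
    # half-open interval [today 00:00, tomorrow 00:00)
    now = now or datetime.now()
    start = datetime.combine(now.date(), time.min)
    return start, start + timedelta(days=1)

start, end = today_window(datetime(2011, 2, 26, 15, 30))
# start == 2011-02-26 00:00:00, end == 2011-02-27 00:00:00
# then: Slap.objects.filter(from_user=..., date__gte=start, date__lt=end).count()
```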
