Multithreading with updates to MySQL - Python

I need to evaluate around 80k rows of data every day at 11 AM, and I hope to finish within a few minutes.
I used multithreading with Django's select_for_update(): each thread locks one row at a time, updates it, and then fetches a new row.
The problem is that the counter increases too fast, which leads me to assume that some rows are being evaluated twice.
Here is my current code block:
while True:
    with transaction.atomic():
        user_history = UserHistory.objects.select_for_update().filter(is_finished=False).first()
        if user_history:
            user = UserProfile.objects.filter(id=user_history.user_profile_id).first()
            card_today = CardToday.objects.filter(id=user_history.card_today_id).first()
            rewarded_value = 0
            if user_history is item1:
                if card_today.item1 > card_today.item2:
                    rewarded_value = card_today.item2/card_today.item1 + 1
            elif user_history is item2:
                if card_today.item2 > card_today.item1:
                    rewarded_value = card_today.item1/card_today.item2 + 1
            user.coins += float(user.coins) + rewarded_value  # the value increases too high here
            user.save()
            user_history.save()
        else:
            break
This is the model for CardToday:
class CardToday(models.Model):
    item1 = models.IntegerField(default=0)
    item2 = models.IntegerField(default=0)
This is the model for UserHistory:
class UserHistory(models.Model):
    card_today = models.ForeignKey(CardToday, on_delete=models.CASCADE)
    answer_item = models.ForeignKey(Item, on_delete=models.CASCADE)
    is_finished = models.BooleanField(default=False)  # checks whether the card has already been evaluated
rewarded_value's computation is as follows:
rewarded_value = majority/minority + 1
where majority and minority switch depending on which item has the greater value.
Each user_history can only choose between item1 and item2.
After a certain amount of time has passed, the code evaluates which item was picked on a CardToday.
Is there a better way of accomplishing this?
The framework I'm using is Django, and I have a cron job running from the library django-cron.
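Two things stand out in the loop above: user.coins += float(user.coins) + rewarded_value adds the existing balance twice (plain user.coins + rewarded_value is probably intended), and is_finished is never set to True before user_history.save(), so the same row can be selected again on the next iteration, which would explain counts that are too high. Separately, the reward rule can be factored into a pure function that is easy to test in isolation. Note a discrepancy worth checking: the prose says majority/minority + 1, but the code computes minority/majority + 1; the sketch below follows the code. The function name and the "item1"/"item2" labels are illustrative, not part of the original.

```python
def compute_reward(picked: str, item1: int, item2: int) -> float:
    """Reward as computed in the loop above: a pick earns
    minority/majority + 1 when the picked item is the majority,
    otherwise 0.  (The prose says majority/minority + 1 instead;
    double-check which is intended.)"""
    if picked == "item1" and item1 > item2:
        return item2 / item1 + 1
    if picked == "item2" and item2 > item1:
        return item1 / item2 + 1
    return 0.0
```

With the reward isolated, the transaction body shrinks to: compute the reward, add it to the coins once, mark the history row finished, and save both.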

How can I display custom feedback after a trial in psynet based on information contained in 'analysis' from analyze_recording()?

I want to show participants a feedback page containing e.g. how many taps were detected during the trial they just completed. Here is what I'm trying to do:
class PracticeTrialMaker(StaticTrialMaker):
    give_end_feedback_passed = True
    performance_check_type = "performance"
    performance_check_threshold = 0
    end_performance_check_waits = True

    def get_end_feedback_passed_page(self, score):
        how_many_taps = "NA" if score is None else f"{score:.0f}"
        return InfoPage(
            Markup(
                f"You tapped <strong>{how_many_taps} times</strong>."
            ),
            time_estimate=5,
        )

    def performance_check(self, experiment, participant, participant_trials):
        # for now, just count the number of taps detected
        n_taps_detected = participant_trials.analysis['num_resp_raw_all']
        failed = participant_trials.failed
        return {"score": n_taps_detected, "passed": not failed}
But I don't know how to find the analysis results or pass them into performance_check; they are not contained in participant_trials, participant, or experiment, unless I am completely missing something.
How can I show analysis-based feedback to a participant?
The method get_end_feedback_passed_page provides feedback at the end of a StaticTrialMaker based on the overall performance of all trials within that trial maker. More precisely, it only shows the feedback if the participant has met the performance threshold.
For your case, I would use the method show_feedback within a StaticTrial. This allows you to provide feedback after each individual trial. You can then access the output of the analysis (the dictionary returned by the method analyze_recording) via the analysis output stored in the trial, i.e. self.details["analysis"]. Here is an example in context:
class MyExperimentalTrial(StaticTrial):
    __mapper_args__ = {"polymorphic_identity": "custom_trial"}
    wait_for_feedback = True

    def gives_feedback(self, experiment, participant):
        return True

    def show_feedback(self, experiment, participant):
        output_analysis = self.details["analysis"]
        if output_analysis["num_taps"] > 20:
            return InfoPage(
                Markup(
                    f"""
                    <h3>Excellent!</h3>
                    <hr>
                    We detected {output_analysis["num_taps"]} taps.
                    """
                ),
                time_estimate=5,
            )
        else:
            return InfoPage(
                Markup(
                    """
                    <h3>VERY BAD...</h3>
                    """
                ),
                time_estimate=5,
            )
Note that you should use this in combination with gives_feedback, and also set check_performance_every_trial=False in the definition of the corresponding trial maker.

Django: Creating a Reference Code for each new order

For my e-commerce project, I am trying to generate a reference code that is understandable yet unique.
The code should be generated after each purchase and include the day, month, year, hour, minute, and a counter that increases with each new transaction:
DDMMYYHHMMXXX
Day, month, year, hour, minute, plus 3 digits starting at 001 and increasing with each new order.
How do I do it?
My current code is:
def create_ref_code():
    return ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))
models.py
class Order(models.Model):
    ref_code = models.CharField(max_length=20, blank=True, null=True)
    ordered_date = models.DateTimeField()

    def __str__(self):
        return self.user.username
This is how far I have reached, but I am not sure how to increase the count with every new order:
def create_ref_code():
    now = datetime.now()
    code = now.strftime("%y%m%d%H%M%S")
    print(code)
    count = + 1
    digit = str(count).zfill(3)
    my_code = (code, digit)
    return ''.join(my_code)
For that, you can extend the save method and retrieve the order count there. You can use something like this to pad the leading zeroes on that count:
str(1).zfill(3)
This produces the string '001'; you need it as a string anyway to concatenate, so there is no need to convert it back to an integer.
def save(self, *args, **kwargs):
    count = ***retrieve your count of the orders using a query*** + 1
    digit = str(count).zfill(3)
    self.reference_code = your logic to create the reference code
    super().save(*args, **kwargs)
(Note: super().save() should run after the reference code is set, otherwise the code is never persisted.)
Updated:
You don't have to increment count like that:
def create_ref_code():
    now = datetime.now()
    # make a query to count all of today's orders here, e.g.:
    # count = Order.objects.filter(<filter argument by date>).count() + 1
    code = now.strftime("%y%m%d%H%M%S")
    digit = str(count).zfill(3)
    my_code = (code, digit)
    return ''.join(my_code)
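Putting the pieces above together as a sketch: if the timestamp and the count are passed in as parameters, the helper stays a pure function (easy to test) and the date-filtered count query stays in the caller. The function name here is illustrative, not from the original code.

```python
from datetime import datetime

def build_ref_code(now: datetime, count: int) -> str:
    """Timestamp prefix plus a zero-padded 3-digit order counter,
    using the same %y%m%d%H%M%S format as the answer above."""
    return now.strftime("%y%m%d%H%M%S") + str(count).zfill(3)

# e.g. the 7th order placed at 2021-01-02 03:04:05
# -> "210102030405007"
```

In the view or save() method, the count would come from something like a date-filtered Order query, then be passed in here.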
Instead of DDMMYYHHMMXXX, try UUID4.
Code:
import uuid
uuid.uuid4()

Python multiprocessing on same method that deals with data

I have a Django application where I am trying to manage the data for one of my models, since the number of rows in its database table has gotten quite unruly.
I have a model staticmethod that gets all the Vehicle objects in my database, goes through them, checks whether each image URL is still active, and if not deletes it.
There are 1.2 million records (and some records have more than one image to check), so it's going to take a pretty long time to go through all of them.
I know that you can use threading to run multiple processes, but I also know that for a method that deals with data, each thread has to be aware of the others. Is there a way I can use multithreading to cut down the time it takes to go through the queryset and make those checks? E.g. if thread 1 looks at queryset items 1 and 2, thread 2 starts looking at queryset item 3, and then thread 1 skips to queryset item 4 once it's done, if thread 2 hasn't finished with item 3?
Method
@staticmethod
def image_existence_check():
    import threading
    import requests
    from dealer.models import Dealer

    vehicles = Vehicle.objects.all()
    for index, veh in enumerate(vehicles):
        images = veh.images.all()
        if images.count() == 1:
            image = images[0]
            response = requests.get(image.image_url)
            if response.status_code == 200:
                veh.has_image = True
            else:
                veh.has_image = False
        elif images.count() > 1:
            has_image = True
            for img in images:
                response = requests.get(img.image_url)
                if response.status_code != 200:
                    has_image = False
            veh.has_image = has_image
        else:
            veh.has_image = False
        veh.save()
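One way to parallelize the slow part (the HTTP requests) without the threads having to coordinate over the queryset is to hand the work to a thread pool: each task checks one vehicle's URLs, and the pool does the scheduling. Below is a standalone sketch using concurrent.futures, not the original method: the fetch function is injected as a parameter so it can be stubbed in tests, and in the real code it would wrap requests.get and check response.status_code == 200.

```python
from concurrent.futures import ThreadPoolExecutor

def check_vehicles(vehicles, url_is_live, max_workers=16):
    """vehicles: iterable of (vehicle_id, [image_urls]) pairs.
    url_is_live: callable url -> bool (e.g. wrapping requests.get
    and checking response.status_code == 200).
    Returns {vehicle_id: has_image}, computed concurrently."""
    def check_one(item):
        vehicle_id, urls = item
        # mirrors the loop above: has_image is True only if the vehicle
        # has at least one image URL and every URL is still live
        return vehicle_id, bool(urls) and all(url_is_live(u) for u in urls)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(check_one, vehicles))
```

With the results collected, the has_image flags can then be written back in two bulk queries (e.g. Vehicle.objects.filter(id__in=...).update(has_image=True) and the same for False) instead of one save() per row, which removes most of the database round trips as well.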

Incrementing IntegerField counter in a database

As a beginner at Django, I tried to make a simple application that gives an HTTP response saying how many times the content was viewed.
I created a new Counter model with an IntegerField called count:
class Counter(models.Model):
    count = models.IntegerField(default=0)

    def __int__(self):
        return self.count
In views, I made a counter variable from the Counter() class and tried adding +1 to the counter.count integer, but when I tried to save it, I got an error that the integer couldn't be saved.
So I tried saving the instance instead:
def IndexView(response):
    counter = Counter()
    counter.count = counter.count + 1
    counter.save()
    return HttpResponse(counter.count)
This keeps showing 1 and does not change after a reload.
How would I update the IntegerField properly, so that it increments on every view and the value persists even if the server is reloaded?
The problem
Yes, but you are creating a new Counter object on each request, which starts again at 0; that's your problem:
def IndexView(response):
    counter = Counter()  # This creates a new counter each time
    counter.count = counter.count + 1
    counter.save()
    return HttpResponse(counter.count)
What you were doing above results in a bunch of Counter objects with count = 1 in the database.
The Solution
My example below shows how to get an existing Counter object and increment it, or create it if it doesn't already exist, with get_or_create().
First we need to associate a Counter with e.g. a page (or anything else, but we need some way to identify it and grab it from the DB):
class Counter(models.Model):
    count = models.IntegerField(default=0)
    page = models.IntegerField()  # or any other way to identify
                                  # what this counter belongs to
then:
def IndexView(response):
    # Get an existing page counter, or create one if not found (first page hit)
    # Example below is for page 1
    counter, created = Counter.objects.get_or_create(page=1)
    counter.count = counter.count + 1
    counter.save()
    return HttpResponse(counter.count)
Avoid race conditions that can happen with count = count + 1
To avoid race conditions, use an F expression:
# When many requests are coming in,
# this may use an outdated value of counter.count:
# counter.count = counter.count + 1

# Using an F expression makes the +1 happen in the database:
from django.db.models import F
counter.count = F('count') + 1
counter.save()
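Under the hood, F('count') + 1 compiles into an SQL-side increment, so the read and the write happen in a single statement instead of a read-modify-write round trip through Python. A standalone sqlite3 sketch of roughly the SQL Django emits (this is plain DB-API code, not Django itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (page INTEGER PRIMARY KEY, count INTEGER)")
conn.execute("INSERT INTO counter VALUES (1, 0)")

# What counter.count = F('count') + 1; counter.save() boils down to:
# the database itself computes count + 1, so concurrent requests
# can never write back a stale Python-side value.
for _ in range(3):
    conn.execute("UPDATE counter SET count = count + 1 WHERE page = 1")

value = conn.execute("SELECT count FROM counter WHERE page = 1").fetchone()[0]
print(value)  # 3
```

One Django caveat: after save(), the in-memory counter.count still holds the F expression object, so call counter.refresh_from_db() before using the value (e.g. in the HttpResponse).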

Iteration and memory problems in Django

I need to create pairs of hashtags so people can judge whether the two tags in question refer to the same thing. The problem is that there are A LOT of hashtags, and I'm running the code on a Dreamhost VPS, so my memory is somewhat limited.
Here are my relevant models:
class Hashtag(models.Model):
    text = models.CharField(max_length=140)
    competitors = models.ManyToManyField('Hashtag', through='Competitors')
    tweet = models.ManyToManyField('Tweet')

    def __unicode__(self):
        return unicode_escape(self.text)

class Competitors(models.Model):
    tag1 = models.ForeignKey('Hashtag', related_name='+')
    tag2 = models.ForeignKey('Hashtag', related_name='+')
    yes = models.PositiveIntegerField(default=0, null=False)
    no = models.PositiveIntegerField(default=0, null=False)
    objects = models.Manager()

    def __unicode__(self):
        return u'{0} vs {1}'.format(unicode_escape(self.tag1.text), unicode_escape(self.tag2.text))
Here's the code I've developed to create the Competitors objects and save them to my DB:
class Twitterator(object):
    def __init__(self, infile=None, outfile=None, verbosity=True):
        ...
        self.competitors_i = 1
        ...

    def __save_comps__(self, tag1, tag2):
        try:
            comps = Competitors(id=self.competitors_i,
                                tag1=tag1,
                                tag2=tag2,
                                yes=0,
                                no=0)
            comps.save()
        except IntegrityError:
            self.competitors_i += 1
            self.__save_comps__(tag1, tag2)
        else:
            self.competitors_i += 1

    def competitors_to_db(self, start=1):
        tags = Hashtag.objects.all()
        i = start
        while True:
            try:
                tag1 = tags.get(pk=i)
                j = i + 1
                while True:
                    try:
                        tag2 = tags.get(pk=j)
                        self.__save_comps__(tag1, tag2)
                        j += 1
                    except Hashtag.DoesNotExist:
                        break
                i += 1
            except Hashtag.DoesNotExist:
                break
It all "works", but it never gets far before I run out of memory and the whole thing gets killed. I thought using .get would be less memory-intensive, but apparently not enough. I'm under the impression that Django QuerySets are iterators already, so my usual 'make an iterator' trick is out. Any suggestions for further reducing my memory footprint?
I think the problem is in this function: i is not getting incremented properly, so you keep looping over the same value of i.
def competitors_to_db(self, start=1):
    tags = Hashtag.objects.all()
    i = start
    while True:
        try:
            tag1 = tags.get(pk=i)
            j = i + 1
            while True:
                try:
                    tag2 = tags.get(pk=j)
                    self.__save_comps__(tag1, tag2)
                    j += 1
                except Hashtag.DoesNotExist:
                    break  # <------ move this after i += 1, otherwise i will not increment
            i += 1
        except Hashtag.DoesNotExist:
            break
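On the memory side, the pair generation itself does not need one get() query per pk: itertools.combinations over the primary keys yields each unordered pair exactly once, lazily, and slicing it into batches keeps only one batch in memory at a time. A standalone sketch (in the Django version, pks would come from Hashtag.objects.values_list('pk', flat=True).iterator(), and each batch would feed Competitors.objects.bulk_create(...); the function name and batch size are illustrative):

```python
from itertools import combinations, islice

def pair_batches(pks, batch_size=500):
    """Yield lists of (pk1, pk2) pairs, at most batch_size pairs per list.
    combinations() is lazy, so the full n*(n-1)/2 set of pairs is never
    materialized at once."""
    pairs = combinations(pks, 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch
```

This also replaces the nested get()-until-DoesNotExist loops with a single pass over the pk list, and bulk_create per batch replaces one save() per Competitors row.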
