Structured query over a many-to-many relationship in NDB (Python)

My model is like this:
Club
User
Course
    reference to club (key)
Session
    reference to Course (key)
ClubMembership
    reference to club (key)
    reference to user (key)
CourseSubscription
    reference to course (key)
    reference to user (key)
Now, given a club and a user as input, I want to get all the courses I'm subscribed to.
What I did is:
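courses = Course.query(Course.club == club.key).fetch(keys_only=True)
real_list = []
for course in courses:
    # One datastore get per course. get_by_id(user, course) relies on the
    # key scheme from the linked answer, not the stock ndb signature.
    if CourseSubscription.get_by_id(user, course):
        real_list.append(course)
sessions = Session.query(Session.course.IN(real_list), Session.start_date >= start_date).fetch()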
For CourseSubscription I used this approach: https://stackoverflow.com/a/26746542/1257185. That is why I can do the if, but it's expensive.
Is there a better way to do it, or at least a less expensive one? I was thinking of GQL, but then I'd still have a list of IN to do.
Probably something like:
select key from CourseSubscription where course.club = {clubid} and user = {user}, and then an ndb.get_multi to load the query results?
Is this possible somehow?

The for loop makes one request per course. You can combine them into a single batch request.
If your CourseSubscription is the same as the CourseInscription in your SO link above, then you could get a list of subscription keys and make a single get_multi() call to get all of the subscriptions:
subscription_keys = [ndb.Key(CourseSubscription, CourseSubscription.build_id(user, course))
                     for course in courses]
real_list = ndb.get_multi(subscription_keys)
If no entity exists for a key, the corresponding item in the result is None. You will have to filter those out.
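Below is a minimal sketch of that filtering step. It assumes `subscriptions` is the list returned by `ndb.get_multi(subscription_keys)` above, in the same order as `courses`, with `None` for each missing entity. `subscribed_course_keys` is a hypothetical helper name. Plain strings stand in for keys and entities so the logic can run without App Engine.

```python
def subscribed_course_keys(course_keys, subscriptions):
    """Return the course keys whose CourseSubscription entity exists.

    Assumes `subscriptions` is the result of ndb.get_multi(subscription_keys),
    aligned index-by-index with `course_keys`, with None for missing entities.
    """
    return [course for course, subscription in zip(course_keys, subscriptions)
            if subscription is not None]


# Offline check with stand-in values instead of real ndb keys/entities.
print(subscribed_course_keys(["course-a", "course-b", "course-c"],
                             ["sub-a", None, "sub-c"]))  # ['course-a', 'course-c']
```

In the App Engine code this would become `real_list = subscribed_course_keys(courses, ndb.get_multi(subscription_keys))`. The resulting course keys can then feed the `Session.query(Session.course.IN(real_list), ...)` call from the question. It is worth skipping that query entirely when `real_list` is empty.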

Confused on dictionary input example in Python Crash Course

I am working my way through Python Crash Course, and in Chapter 8 the author gives the following code as an example of filling a dictionary with user input. I am confused by the step where he stores the responses in the dictionary. To my eye, it looks as though he is only saving one piece of immutable data, "response", to the "responses" dictionary under the key "name". I am missing how both the name input and the response input are put into the dictionary.
It seems to make no sense to me, but that is what I have loved about this journey so far, finding sense in seeming nonsense. Thank you for helping demystify this world for me.
responses = {}
# Set a flag to indicate that polling is active.
polling_active = True
while polling_active:
    # Prompt for the person's name and response.
    name = input("\nWhat is your name? ")
    response = input("Which mountain would you like to climb someday? ")
    # Store the response in the dictionary.
    responses[name] = response
    # Find out if anyone else is going to take the poll.
    repeat = input("Would you like to let another person respond? (yes/no) ")
    if repeat == 'no':
        polling_active = False
# Polling is complete. Show the results.
print("\n--- Poll Results ---")
for name, response in responses.items():
    print(name + " would like to climb " + response + ".")
The thing with dictionaries is that you can set the value for a key like this: dictionary[key] = value. If the key doesn't exist, this simply creates it. You don't need a method like append, which is used for lists. The line responses[name] = response runs once on every pass through the while loop. Each time the loop runs again, it asks for new input and rebinds name and response to the new values. In conclusion, a new name/response pair is added every time the loop runs, as long as the name is not already in the dictionary. If the name is already there, its value is simply replaced with the new response.
Does this answer your question?
name and response are variable names that are filled with the inputted data, let's say 'John' and 'Kilimanjaro'.
Now, 'John' and 'Kilimanjaro' are indeed immutable, but that doesn't mean you can't replace the values stored in name and response in the next loop. You can assign a new value to name if you want, and that value may also be immutable.
One possible source of confusion could be that you started learning dictionaries using statements like responses['John'] = 'Kilimanjaro', where both the key and the value were string literals. Now you are doing responses[name] = response, with no quotes around name and response. So you create a key equal to whatever was stored in name and a value equal to whatever was stored in response.
If in the next iteration the value of name is replaced by, let's say, 'Maria' and response becomes 'Andes', the new responses[name] = response will be equivalent to responses['Maria'] = 'Andes'.
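The same mechanics can be checked without typing any input. The sketch below replaces the input() calls with a fixed list of (name, response) pairs. `collect_responses` is only an illustrative name and does not come from the book.

```python
def collect_responses(answers):
    """Mimic the polling loop: each (name, response) pair is stored under its name."""
    responses = {}
    for name, response in answers:
        # A new name creates a key; a repeated name overwrites the old value.
        responses[name] = response
    return responses


poll = [("John", "Kilimanjaro"), ("Maria", "Andes"), ("John", "Everest")]
print(collect_responses(poll))  # {'John': 'Everest', 'Maria': 'Andes'}
```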
In the most basic explanation, dictionaries associate an arbitrary value with an arbitrary key. So what the author is actually doing is associating each user's response with their name, using the name as the key and the response as the value. Using dictionaries like this is fairly common.
If you want to retrieve a single value from the dictionary, you must know its key. However, you can retrieve all key-value pairs with dictionary.items(). This way, the author can get the two associated pieces of data (the name and the response) together.
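Here is a short sketch of both ways of reading the data back. A direct lookup needs the key, while items() yields every (key, value) pair. The sample data is made up for illustration.

```python
responses = {"John": "Kilimanjaro", "Maria": "Andes"}

# Direct lookup requires knowing the key.
print(responses["Maria"])  # Andes

# items() gives every (name, response) pair, in insertion order.
for name, response in responses.items():
    print(name + " would like to climb " + response + ".")
```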

Tweepy: about accessing the "id" of a user after a Pagination response

I'm really stuck on this one.
I'm using Tweepy to get the IDs of all users that liked a specific tweet. I seem to get a list of "User" structures that contain "id", "name" and "username", but I'm not able to get only the "id".
The code is simple:
client = tweepy.Client(
    bearer_token=bearer_token,
    consumer_key=api_key, consumer_secret=api_secret,
    access_token=user_token, access_token_secret=user_token_secret,
    wait_on_rate_limit=True
)
for response in tweepy.Paginator(client.get_liking_users, id=tweetid, max_results=100, limit=10):
    for item in response:
        print("ITEM:\n", item)
        if item is not None:
            for user in item:
                if user is not None:
                    print(user)
The print of "item" gets me this (simplified, of course; the number of structures is high, that's why I have to use Paginator):
[<User id=0000001 name=user1 username=UserName1>, <User id=0002 name=user2 username=UserName2>, <User id=000003 name=user3 username=UserName3>]
and the print of "user" just gets me the individual usernames: "UserName1", etc.
But no way to get user.id, user.User.id, nor anything similar. And I'm frustrated, because the information is right there, just I can't access it easily.
Thank you!
Tweepy documentation provides an example of something very similar to what you want to do: https://docs.tweepy.org/en/stable/examples.html -> API v2 -> Get Tweet’s Liking Users
import tweepy
bearer_token = ""
client = tweepy.Client(bearer_token)
# Get Tweet's Liking Users
# This endpoint/method allows you to get information about a Tweet’s liking
# users
tweet_id = 1460323737035677698
# By default, only the ID, name, and username fields of each user will be
# returned
# Additional fields can be retrieved using the user_fields parameter
response = client.get_liking_users(tweet_id, user_fields=["profile_image_url"])
for user in response.data:
    print(user.username, user.profile_image_url)
This example prints the user's username and profile image URL, but note the comment says the id is also returned, so something like user.id should work. Otherwise, you can also add id to user_fields to make sure it's returned, although that shouldn't be necessary.
Unfortunately, I am not able to test it myself because I don't have a Twitter developer account with the required elevated access.
Edit: I got access to an API account with elevated access and was able to test your code. See the update below.
Iterating paginated results
You need a double for loop to reach the users, and your code eventually crashes after showing some results with an error saying that a str object has no id attribute. Both happen because you are not iterating the Paginator results correctly.
For the sake of simplicity, I'm going to label your three nested for loops:
loop 0: for response in tweepy.Paginator(...
loop 1: for item in response
loop 2: for user in item
Paginator yields one Response object per page. Response is a namedtuple with the fields data, includes, errors and meta, and the results you want are in data.
When you do loop 1, you are iterating over all of these fields of the Response, not only over data.
If the field you are iterating happens to be data, loop 2 iterates the users and you get the output you expect.
But loop 1 also reaches the other fields of the Response, outside of data.
Let's see, for example, what happens when loop 1 enters the meta attribute.
meta is a dictionary that looks like this:
meta={'result_count': 80, 'next_token': '676f9b7bumw8i3jbm4nnifamw2ejjaktp8kjym6akdak9'}
When loop 2 runs on the meta attribute, it iterates over the keys, not the values, because that is how iterating a dict works in Python. So the value of user in loop 2 will be either 'result_count' or 'next_token'. That is when you get the error saying you are trying to access id on a str.
What you should do instead is iterate over response.data in loop 1. That also removes the need for a second loop:
for response in tweepy.Paginator(client.get_liking_users, id=tweetid, max_results=100, limit=10):
    for user in response.data:
        print(user.id)
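The failure can be reproduced offline with a stand-in for Tweepy's Response, which is assumed here to be a namedtuple with the fields data, includes, errors and meta. FakeUser and the sample values are made-up placeholders. The only point of the sketch is the iteration behaviour.

```python
from collections import namedtuple

# Stand-in mirroring the field layout assumed for tweepy's Response namedtuple.
Response = namedtuple("Response", ("data", "includes", "errors", "meta"))


class FakeUser:
    """Hypothetical placeholder for tweepy.User: has an id, prints as its username."""

    def __init__(self, user_id, username):
        self.id = user_id
        self.username = username

    def __str__(self):
        return self.username


page = Response(
    data=[FakeUser(1, "UserName1"), FakeUser(2, "UserName2")],
    includes={},
    errors=[],
    meta={"result_count": 2, "next_token": "abc"},
)

# Loops 1 and 2 from the question: loop 1 walks the four fields of the namedtuple.
seen = []
for item in page:
    if item is not None:
        for user in item:
            seen.append(user)
# seen now holds the two FakeUser objects followed by the meta keys
# 'result_count' and 'next_token', which are plain str with no .id attribute.

# Iterating page.data directly yields only the users.
ids = [user.id for user in page.data]
print(ids)  # [1, 2]
```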
Edit: grammar and style

Make directory system from user id in python list

Please help!
Well, first of all, I will explain what this should do. I'm trying to store Discord users and servers in lists, by ID: the users that use the bot and the servers the bot is in.
for example:
class User(object):
    name = ""
    uid = 0
Now, all Discord IDs are very long, and I want to store lots of users and servers in my lists (one list for each). Suppose I have 10,000 users in my list and I want to get the last one, without knowing it's the last one: that would take a lot of time. Instead, I thought I could make a directory system for storing users in the list and finding them quickly. This is how it works:
I can get the id easily, so imagine my id is 12345.
Now I convert it into a string using Python's str(id) function and store it in a variable, strId.
For each digit of the string, I use it as an index into the users list, like this:
The User() is where the user is stored
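users_list = [[[], [[], [], [[], [], [], [User()]]]]]  # schematic only: the nesting is illustrative
actual_dir = users_list
for digit in strId:
    actual_dir = actual_dir[int(digit)]  # descend one level per digit
user = actual_dir[0]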
And that's how I reach the user (or something like that)
Now, here is where my problem is. I know I can get the user easily by its id, but when I want to save the changes, I would have to do something like users_list[1][2][3][4][5] = changed_user_variable, and as far as I know I cannot do something like list[1] += [2].
Is there any way to reach the user and save the changes?
Thanks in advance
You can use a python dictionary with the user id as the key and the user object as the value. I ran a test on my own computer and found that finding 100 000 random users in a dictionary with 10 million users only took 0.3s. This method is much simpler and I would guess it's just as fast, if not faster.
You can create a dictionary and add users with:
users = {}
users[userID] = some_user
(There are many other ways of doing this.)
By using a dictionary, you can easily change a user's field with:
users[userID].some_field = "Some value"
or overwrite the same way you add users in the first place.
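Below is a minimal sketch of the dictionary approach. It assumes a simple User class like the one in the question, with uid and name set in a constructor (the constructor is an addition). The dictionary stores a reference to each User object, so changing a field on the looked-up object already saves the change. There is no nested index path to write back to.

```python
class User:
    """Simplified user record; uid is the Discord ID used as the dictionary key."""

    def __init__(self, uid, name):
        self.uid = uid
        self.name = name


users = {}

# Add users keyed by their ID (the IDs here are made-up examples).
for user in (User(123456789012345678, "alice"), User(876543210987654321, "bob")):
    users[user.uid] = user

# Average O(1) lookup by ID, no matter where the user was inserted.
found = users[876543210987654321]

# Mutating the looked-up object changes the object stored in the dict.
found.name = "bob_renamed"
print(users[876543210987654321].name)  # bob_renamed

# Or overwrite the entry entirely, the same way users were added.
users[123456789012345678] = User(123456789012345678, "alice2")
```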

Get by name in google app engine

Instead of using the get_by_id() method to fetch a specific entry by its id and print its content from the Google Datastore, I am trying to use the name from the URL and print the matching content. For example:
print all the content that has this specific name (there may be more than one row of content with this name)
print the content of the specific id
I am using get_by_id(long(id)) to get the entry in the second part of my example, and it's working. I am trying to use get_by_key_name(name) for the first part, but it does not work. Any ideas on that? Thank you.
Sorry, but since I couldn't leave a comment, I am editing my question. So far, I can get all the animal names from my datastore, and I have made them clickable using HTML in a template file. In the datastore, some animal names appear in more than one entry (e.g. name=duck, content=water and name=duck, content=lake). I used DISTINCT in my GQL query so that a repeated name (e.g. duck) is printed only once. Since name=duck has two contents, when I click on the name duck I want to see both of them. My problem is that get_by_id(long(id)) returns a single entry by its unique id, so it will not print both contents for name=duck, because every entry has its own id. I want the content of all the entries with the same name. I am trying the following, but it does not work:
msg = MODEL.Animals.get_by_key_name(name)
self.response.write("%s" % msg.content)
With get_by_id() you can get an entity only if you know its ID. These operations are counted as "Small operations" in the quota and are cheaper than datastore reads, but to get a list of entities filtered by an indexed property, you should use filters:
query = MODEL.Animals.query()
query = query.filter(MODEL.Animals.name == 'duck')
ducks = query.fetch(limit=100) # limit number of returned animals
for duck in ducks:
    self.response.write('%s - %s' % (duck.name, duck.content))
By default, all string properties are indexed, so you will be able to do such requests.

Aggregating across columns in Django

I'm trying to figure out if there's a way to do a somewhat-complex aggregation in Django using its ORM, or if I'm going to have to use extra() to stick in some raw SQL.
Here are my object models (stripped to show just the essentials):
from django.contrib.auth.models import User
from django.db import models

class Submission(models.Model):
    favorite_of = models.ManyToManyField(User, related_name="favorite_submissions")

class Response(models.Model):
    submission = models.ForeignKey(Submission, on_delete=models.CASCADE)
    voted_up_by = models.ManyToManyField(User, related_name="voted_up_responses")
What I want to do is sum all the votes for a given submission: that is, all of the votes for any of its responses, and then also including the number of people who marked the submission as a favorite.
I have the first part working using the following code; this returns the total votes for all responses of each submission:
from django.db.models import Count
submission_list = Response.objects\
    .values('submission')\
    .annotate(votes=Count('voted_up_by'))\
    .filter(votes__gt=0)\
    .order_by('-votes')[:TOP_NUM]
(So after getting the vote total, I sort in descending order and return the top TOP_NUM submissions, to get a "best of" listing.)
That part works. Is there any way you can suggest to include the number of people who have favorited each submission in its votes? (I'd prefer to avoid extra() for portability, but I'm thinking it may be necessary, and I'm willing to use it.)
EDIT: I realized after reading the suggestions below that I should have been clearer in my description of the problem. The ideal solution would be one that allowed me to sort by total votes (the sum of voted_up_by and favorited) and then pick just the top few, all within the database. If that's not possible then I'm willing to load a few of the fields of each response and do the processing in Python; but since I'll be dealing with 100,000+ records, it'd be nice to avoid that overhead. (Also, to Adam and Dmitry: I'm sorry for the delay in responding!)
One possibility would be to re-arrange your current query slightly. What if you tried something like the following:
submission_list = Response.objects\
    .annotate(votes=Count('voted_up_by'))\
    .filter(votes__gt=0)\
    .order_by('-votes')[:TOP_NUM]
submission_list.query.group_by = ['submission_id']
This will return a queryset of Response objects (objects with the same Submission will be lumped together). In order to access the related submission and/or the favorite_of list/count, you have two options:
num_votes = submission_list[0].votes
submission = submission_list[0].submission
num_favorite = submission.favorite_of.count()
or...
submissions = []
for response in submission_list:
    submission = response.submission
    submission.votes = response.votes
    submissions.append(submission)
num_votes = submissions[0].votes
submission = submissions[0]
num_favorite = submission.favorite_of.count()
Basically the first option has the benefit of still being a queryset, but you have to be sure to access the submission object in order to get any info about the submission (since each object in the queryset is technically a Response). The second option has the benefit of being a list of the submissions with both the favorite_of list as well as the votes, but it is no longer a queryset (so be sure you don't need to alter the query anymore afterwards).
You can count favorites in another query, like:
favorite_list = Submission.objects.annotate(favorites=Count('favorite_of'))
After that, add up the values from the two querysets:
total_votes = {}
for item in submission_list:
    # values('submission') yields dicts like {'submission': <id>, 'votes': <n>}
    total_votes[item['submission']] = item['votes']
for item in favorite_list:
    has_votes = total_votes.get(item.id, 0)
    total_votes[item.id] = has_votes + item.favorites
I am using ids in the dictionary because the Submission objects from the two queries will not be identical objects. If you need the Submissions themselves, you can use one more dictionary, or store a (submission, votes) tuple instead of just votes.
Added: this solution is better than the previous one because it needs only two DB requests.
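Here is a pure-Python sketch of that merge step, extended with the final "top few" selection the question asks for. It makes three assumptions:
- The vote rows are dicts shaped like the output of the question's values('submission').annotate(votes=...) query.
- The favorites come as (submission_id, favorites) pairs, e.g. from Submission.objects.annotate(favorites=Count('favorite_of')).values_list('id', 'favorites').
- To rank every submission, the vote rows come from that query without the [:TOP_NUM] slice.

top_submissions is a hypothetical helper name.

```python
from collections import Counter


def top_submissions(vote_rows, favorite_rows, top_num):
    """Combine response votes and favorites per submission; return the top_num totals.

    vote_rows: dicts like {'submission': <id>, 'votes': <n>}
    favorite_rows: (submission_id, favorites) pairs
    """
    totals = Counter()
    for row in vote_rows:
        totals[row["submission"]] += row["votes"]
    for submission_id, favorites in favorite_rows:
        totals[submission_id] += favorites
    # most_common() sorts by total, highest first.
    return totals.most_common(top_num)


votes = [{"submission": 1, "votes": 5}, {"submission": 2, "votes": 3}]
favorites = [(1, 0), (2, 4), (3, 1)]
print(top_submissions(votes, favorites, 2))  # [(2, 7), (1, 5)]
```

This keeps the work to two queries, but it still loads one row per submission into Python. That is the overhead the question hoped to avoid for 100,000+ records.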
