I want to fetch 10 articles from my Articles model on first load and then another 10 via AJAX as the user scrolls towards the bottom. I store the ids of the first 10 in an array and then append the subsequent ones to it, so that I can fetch articles that are not in that list. But I get an empty array, and the same first 10 articles are fetched again and again as I scroll to the bottom.
First load:
import json
my_interest = user_object.interet #this returns [3,4,55,24,57]
articles = Articles.objects.all()[:10]
fetched = [x.id for x in articles]
request.session['fetched'] = json.dumps(fetched)
Another 10 via AJAX:
import operator
fetched = json.loads(request.session['fetched'])
my_interest = user_object.interet #this returns [3,4,55,24,57]
query = reduce(operator.and_,[Q(cat_id__in = my_interest ), ~Q(id__in = fetched )])
articles = Articles.objects.filter(query)[:10]
request.session['fetched'] = json.dumps( fetched + [x.id for x in articles])
context = {'articles': articles, 'fetched': request.session['fetched']}
return render_to_response('mysite/loadmore.html', context)
But I still get the same 10 articles that were fetched first, repeatedly, as I scroll to the bottom of the page, and if I put <p> Fetched: {{fetched}} </p> in my template I only see "Fetched:".
I'm not sure if this is your problem, but there is no need for either reduce or Q in your query. It is much simpler and clearer to write:
articles = Articles.objects.filter(cat_id__in=my_interest).exclude(id__in=fetched)
Also there's no point in dumping/loading to JSON when saving the IDs to the session. The session already takes care of serialization. Just do:
fetched = request.session['fetched']
...
request.session['fetched'].extend([x.id for x in articles])
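Putting both suggestions together, a minimal sketch of the AJAX view could look like this (the view name is hypothetical, and it assumes user_object is available and that request.session['fetched'] was set to a plain list on first load):

def load_more(request):
    fetched = request.session.get('fetched', [])    # plain list, no JSON round-trip
    my_interest = user_object.interet               # e.g. [3, 4, 55, 24, 57]

    # Only articles in the user's categories that haven't been sent yet
    articles = Articles.objects.filter(cat_id__in=my_interest).exclude(id__in=fetched)[:10]

    # Re-assigning the key (rather than mutating the list in place) makes sure
    # Django marks the session as modified and saves it
    request.session['fetched'] = fetched + [x.id for x in articles]

    context = {'articles': articles, 'fetched': request.session['fetched']}
    return render_to_response('mysite/loadmore.html', context)

If you do mutate the list in place with .extend(), set request.session.modified = True so the change is persisted.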
Q(id__in = fetched )
Looks like you request articles with the same ids that you already fetched and saved to the session?
I am doing some data changes in a django app with a large amount of data and would like to know if there is a way to make this more efficient. It's currently taking a really long time.
I have a model that used to look like this (simplified and changed names):
class Thing(models.Model):
    ... some fields...
    stuff = models.JSONField(encoder=DjangoJSONEncoder, default=list, blank=True)
I need to split the list up based on a new model.
class Tag(models.Model):
    name = models.CharField(max_length=200)

class Thing(models.Model):
    .... some fields ...
    stuff = models.JSONField(encoder=DjangoJSONEncoder, default=list, blank=True)
    other_stuff = models.JSONField(encoder=DjangoJSONEncoder, default=list, blank=True)
    tags = models.ManyToManyField(Tag)
What I need to do is take the list that is currently in stuff and split it up. For items that have a matching Tag in the Tag model, add them to the many-to-many. For items that don't have a Tag, add them to other_stuff. Then, at the end, the stuff field should contain only the items that were saved as tags.
I start by looping through the Tags to make a dict that maps the string version that would be in the stuff list to the tag object so I don't have to keep querying the Tag model.
Then I loop through the Thing model, get the stuff field, loop through that, and add each Tag item to the many-to-many while keeping lists of the items that are or aren't in Tags. Then I put those in the stuff and other_stuff fields at the end.
tags = Tag.objects.all()
tag_dict = {tag.name.lower():Tag for tag in tags}
things = Thing.objects.all()
for thing in things:
    stuff_list = thing.stuff
    stuff_in_tags = []
    stuff_not_in_tags = []
    for item in stuff_list:
        if item.lower() in tag_dict.keys():
            stuff_in_tags.append(item)
            thing.tags.add(tag_dict[item.lower()])
        else:
            stuff_not_in_tags.append(item)
    thing.stuff = stuff_in_tags
    thing.other_stuff = stuff_not_in_tags
    thing.save()
(Ignore any typos. This code works in my actual code)
That seems pretty efficient to me, but it's taking hours to run, as our database is pretty big (about 500k+ records). Are there any other ways to make this more efficient?
Unless you move some work to the database level with bulk operations, it won't run faster. You are making at least N (500k+) UPDATE queries.
If the parsing cannot be done on the DB level, chunked bulk_update is the next option.
Also, you can use iterator() to avoid loading all the objects into memory, and only() to load only the relevant columns.
There is a typo in tag_dict - it should be : tag (the instance) instead of : Tag (the model).
EDIT: I originally missed the thing.tags.add - this needs additional handling: you have to bulk_create the m2m through-table rows.
chunk_size = 10000
TagsToThing = Thing.tags.through
tag_dict = {tag.name.lower():tag for tag in Tag.objects.all()}
for_update = []
tags_for_create = []
for thing in Thing.objects.only('pk', 'stuff').iterator(chunk_size):
    stuff_in_tags = []
    stuff_not_in_tags = []
    for item in thing.stuff:
        if item.lower() in tag_dict.keys():
            stuff_in_tags.append(item)
            tags_for_create.append(
                TagsToThing(thing=thing, tag=tag_dict[item.lower()])
            )
        else:
            stuff_not_in_tags.append(item)
    thing.stuff = stuff_in_tags
    thing.other_stuff = stuff_not_in_tags
    for_update.append(thing)
    if len(for_update) == chunk_size:
        Thing.objects.bulk_update(for_update, ['stuff', 'other_stuff'], chunk_size)
        TagsToThing.objects.bulk_create(tags_for_create, ignore_conflicts=True)  # in case the tag is already assigned
        for_update = []
        tags_for_create = []

# Save remaining objects
Thing.objects.bulk_update(for_update, ['stuff', 'other_stuff'], chunk_size)
TagsToThing.objects.bulk_create(tags_for_create, ignore_conflicts=True)  # in case the tag is already assigned
I have a page that is meant to show company financials based on the customer's input (they enter which company they're looking for). Once they submit, the view function should build 5 API URLs, fetch the JSON data from each, create a list of the dates from one API result (they all contain the same dates, so I can reuse the same list for all), and then build new dictionaries with specific data from each API call plus the list of dates. I then want to create a DataFrame for each dictionary and render each one as HTML in the template.
In my first attempt I did all the requests.get and json.loads calls one after another in a try block inside the "if request.method == 'POST'" block. This worked well when grabbing data from one API call, but did not work with 5: I would get a "local variable referenced before assignment" error, which makes me think either the multiple requests.get or json.loads calls were creating the error.
My current attempt (which was created out of curiosity to see if it worked this way) does work as expected, but is obviously not correct, as it calls the API multiple times inside the for loop, as shown. (I have taken out some code for simplicity.)
def get_financials(request, *args, **kwargs):
    pd.options.display.float_format = '{:,.0f}'.format
    IS_financials = {}  #Income statement dictionary
    BS_financials = {}  #Balance sheet dictionary
    dates = []
    if request.method == 'POST':
        ticker = request.POST['ticker']
        IS_Url = APIURL1
        BS_URL = APIURL2
        try:
            IS_r = requests.get(IS_Url)
            IS = json.loads(IS_r.content)
            for year in IS:
                y = year['date']
                dates.append(y)
            for item in range(len(dates)):
                IS_financials[dates[item]] = {}
                IS_financials[dates[item]]['Revenue'] = IS[item]['revenue'] / thousands
                IS_financials[dates[item]]["Cost of Revenue"] = IS[item]['costOfRevenue'] / thousands
            IS_fundementals = pd.DataFrame.from_dict(IS_financials, orient="columns")
            for item in range(len(dates)):
                BS_r = requests.get(BS_URL)
                BS = json.loads(BS_r.content)
                BS_financials[dates[item]] = {}
                BS_financials[dates[item]]['Cash and Equivalents'] = BS[item]['cashAndCashEquivalents'] / thousands
                BS_financials[dates[item]]['Short Term Investments'] = BS[item]['shortTermInvestments'] / thousands
            BS_fundementals = pd.DataFrame.from_dict(BS_financials, orient="columns")
        except Exception as e:
            apiList = "Error..."
        return render(request, 'financials.html', {'IS': IS_fundementals.to_html(), 'BS': BS_fundementals.to_html()})
    else:
        return render(request, 'financials.html', {})
I'm trying to think of the proper way to do this. I'm new to Django/Python and not quite sure what the best practice for a problem like this would be. I thought about making separate functions for each API, but then I would be unable to render them all on the same page. Can I use nested functions, where only the main function renders to the template and all the inner functions simply return their DataFrame to the outer function? Would class-based views be better for something like this? I have never worked with class-based views yet, so there would be a bit of a learning curve.
Another question I have is how to change the HTML of the table that is rendered from the DataFrame? The table/font that is currently rendered is quite large.
Thanks for any tips/advice!
It's not common to use pandas only for its .to_html() method, but I have invoked pandas in a Django view for less.
A more common approach is to loop over the IS and BS objects with the Django template language's for loops to generate the HTML tables.
To make this method more efficient, move the BS API call out of the date loop, as long as the API call does not change with the date.
Reasonable timeouts on the API calls would also help.
def get_financials(request, *args, **kwargs):
    pd.options.display.float_format = '{:,.0f}'.format
    IS_financials = {}  #Income statement dictionary
    BS_financials = {}  #Balance sheet dictionary
    dates = []
    if request.method == 'POST':
        ticker = request.POST['ticker']
        IS_Url = APIURL1
        BS_URL = APIURL2
        try:
            IS_r = requests.get(IS_Url, timeout=10)
            IS = json.loads(IS_r.content)
            BS_r = requests.get(BS_URL, timeout=10)
            BS = json.loads(BS_r.content)
            for year in IS:
                y = year['date']
                dates.append(y)
            for item in range(len(dates)):
                IS_financials[dates[item]] = {}
                IS_financials[dates[item]]['Revenue'] = IS[item]['revenue'] / thousands
                IS_financials[dates[item]]["Cost of Revenue"] = IS[item]['costOfRevenue'] / thousands
            IS_fundementals = pd.DataFrame.from_dict(IS_financials, orient="columns")
            for item in range(len(dates)):
                BS_financials[dates[item]] = {}
                BS_financials[dates[item]]['Cash and Equivalents'] = BS[item]['cashAndCashEquivalents'] / thousands
                BS_financials[dates[item]]['Short Term Investments'] = BS[item]['shortTermInvestments'] / thousands
            BS_fundementals = pd.DataFrame.from_dict(BS_financials, orient="columns")
        except Exception as e:
            apiList = "Error..."
        return render(request, 'financials.html', {'IS': IS_fundementals.to_html(), 'BS': BS_fundementals.to_html()})
    else:
        return render(request, 'financials.html', {})
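On the styling question: DataFrame.to_html() accepts a classes argument, so one option (just a sketch, assuming financials.html loads a stylesheet that defines these classes) is to tag the generated tables with CSS classes and control the font and size from your CSS:

# Attach CSS classes to the generated <table> elements; the class names here
# are placeholders for whatever your stylesheet defines.
context = {
    'IS': IS_fundementals.to_html(classes='table table-sm financials-table'),
    'BS': BS_fundementals.to_html(classes='table table-sm financials-table'),
}
return render(request, 'financials.html', context)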
I am using YouTube's API to get comment data from a list of music videos. The way I have it working right now is by manually typing in my query and then writing the data to a CSV file, repeating this for each song, like so:
query = "song title"
query_results = service.search().list(
    part = 'snippet',
    q = query,
    order = 'relevance', # You can consider using viewCount
    maxResults = 20,
    type = 'video', # Channels might appear in search results
    relevanceLanguage = 'en',
    safeSearch = 'moderate',
).execute()
What I would like to do is use the title and artist columns from a CSV file I have containing the songs I am trying to gather data for, so I can run the program once without having to manually type in each song.
A friend suggested using something like this:
import pandas as pd

data = pd.read_csv("metadata.csv")

def songtitle():
    for i in data.index:
        title = data.loc[i,'title']
        title = '\"' + title + '\"'
        artist = data.loc[i,'artist']
    return(artist, title)
But I am not sure how to make this work: when I run it, it only returns the final row of data. And even if it did run correctly, how would I get the rest of the program to repeat itself for every new song?
You can save the song titles and artists to lists, then loop over the titles to run the query for each song.
def get_songTitles():
    data = pd.read_csv("metadata.csv")
    return data['artist'].tolist(), data['title'].tolist()

artist, song_titles = get_songTitles()

for song in song_titles:
    query_results = service.search().list(
        part = 'snippet',
        q = song,
        order = 'relevance', # You can consider using viewCount
        maxResults = 20,
        type = 'video', # Channels might appear in search results
        relevanceLanguage = 'en',
        safeSearch = 'moderate',
    ).execute()
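If you also want to use the artist column and keep each song's results (the loop above overwrites query_results on every pass), one approach is to zip the two lists and collect the matching video IDs into a dict. This is only a sketch, and combining the title and artist into the query string is an assumption:

results_by_song = {}
for artist_name, song in zip(artist, song_titles):
    query_results = service.search().list(
        part = 'snippet',
        q = '{} {}'.format(song, artist_name),  # assumption: search on title + artist
        order = 'relevance',
        maxResults = 20,
        type = 'video',
        relevanceLanguage = 'en',
        safeSearch = 'moderate',
    ).execute()
    # Keep the matching video IDs keyed by (artist, title) for later processing
    results_by_song[(artist_name, song)] = [
        item['id']['videoId'] for item in query_results.get('items', [])
    ]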
I am using the Microsoft Graph API to pull my emails in Python and return them as a JSON object. There is a limitation that it only returns 12 emails. The code is:
def get_calendar_events(token):
    graph_client = OAuth2Session(token=token)

    # Configure query parameters to modify the results
    query_params = {
        #'$select': 'subject,organizer,start,end,location',
        #'$orderby': 'createdDateTime DESC'
        '$select': 'sender, subject',
        '$skip': 0,
        '$count': 'true'
    }

    # Send GET to /me/messages
    events = graph_client.get('{0}/me/messages'.format(graph_url), params=query_params)
    events = events.json()

    # Return the JSON result
    return events
The response I get is twelve emails with subject and sender, plus the total count of my emails.
Now I want to iterate over the emails, changing the skip value in query_params to get the next 12. Is there any way to iterate using loops or recursion?
I'm thinking something along the lines of this:
def get_calendar_events(token):
    graph_client = OAuth2Session(token=token)

    json_list = []
    ct = 0
    while True:
        # Configure query parameters to modify the results
        query_params = {
            #'$select': 'subject,organizer,start,end,location',
            #'$orderby': 'createdDateTime DESC'
            '$select': 'sender, subject',
            '$skip': ct,
            '$count': 'true'
        }

        # Send GET to /me/messages
        events = graph_client.get('{0}/me/messages'.format(graph_url), params=query_params)
        events = events.json()

        # Stop when the service reports an error or returns an empty page
        if 'error' in events or not events.get('value'):
            break

        json_list.append(events)
        ct += 12

    # Return the list of JSON results
    return json_list
It may require some tweaking, but essentially you're adding 12 to the offset each time, as long as the call doesn't return an error. Then it appends the JSON to a list and returns that.
If you know how many emails you have, you could also batch it that way.
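Another option: each page of results from Graph includes an @odata.nextLink URL when more items remain, so you can follow that link instead of computing $skip yourself. A rough sketch (the function name and the $top page size of 12 are just assumptions to match the behaviour described above):

def get_messages_paged(token):
    graph_client = OAuth2Session(token=token)
    query_params = {
        '$select': 'sender, subject',
        '$top': 12,        # requested page size (assumption; adjust as needed)
        '$count': 'true'
    }
    pages = []
    url = '{0}/me/messages'.format(graph_url)
    while url:
        resp = graph_client.get(url, params=query_params).json()
        pages.append(resp)
        # @odata.nextLink is absent on the last page; it already encodes the
        # paging parameters, so clear query_params for subsequent requests
        url = resp.get('@odata.nextLink')
        query_params = None
    return pages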
I'm trying to get a Raspberry Pi 3 Python-based birdhouse to read the popularity of its own uploaded videos (which lets it deduce which ones should be deleted, so it can avoid hundreds of uploaded files).
I thought the best way to read the number of views/likes was to use yt_analytics_report.py, but it always returns 0 values. When I input:
$python yt_analytics_report.py --filters="video==bb_o--nP1mg"
or
$python yt_analytics_report.py --filters="channel==UCGXIq7h2UeFh9RhsSEsFsiA"
The output is:
$ python yt_analytics_report.py --filters="video==bb_o--nP1mg"
{'metrics': 'views,estimatedMinutesWatched,averageViewDuration', 'filters': 'video==bb_o--nP1mg', 'ids': 'channel==MINE', 'end_date': '2018-01-12', 'start_date': '2018-01-06'}
Please visit this URL to authorize this application: [note: here was the URL with the auth sequence etc., which I acknowledged] followed by the result:
views estimatedMinutesWatched averageViewDuration
0.0 0.0 0.0
I'm new to this; for the last 3 days I've been testing a variety of filters, but the result is always the same, so I guess I'm doing something severely wrong.
The (auto sensor triggered) video uploads work fine, so I presume the root cause is related to the way I'm using the yt_analytics example.
Any suggestions on the root cause, or on alternative methods to retrieve the views/likes of self-uploaded YouTube videos, are appreciated.
After a few days of trying I have found a solution for generating, with Python and the YouTube API v3, a list that contains the views, likes etc. of the uploaded videos in my own YouTube channel.
I would like to share the complete code in case anyone is facing the same challenge. The code contains remarks and referrals to additional information.
Please be aware that using the API consumes API credits. This implies that (when you run this script continuously or often) you can run out of the daily maximum number of API credits set by Google.
# This python 2.7.14 example shows how to retrieve with Youtube API v3 a list of uploaded Youtube videos in a channel and also
# shows additional statistics of each individual youtube video such as number of views, likes etc.
# Please notice that YOU HAVE TO change API_KEY and Youtube channelID
# Have a look at the referred examples to get to understand how the API works
#
# The code consists of two parts:
# - The first part queries the videos in a channel and stores it in a list
# - The second part queries in detail each individual video
#
# Credits to the Coding 101 team, the guy previously guiding me to a query and Google API explorer who/which got me on track.
#
# RESULTING EXAMPLE OUTPUT: The output of the script will look a bit like this:
#
# https://www.youtube.com/watch?v=T3U2oz_Y8T0
# Upload date: 2018-01-13T09:43:27.000Z
# Number of views: 8
# Number of likes: 2
# Number of dislikes: 0
# Number of favorites:0
# Number of comments: 0
#
# https://www.youtube.com/watch?v=EFyC8nhusR8
# Upload date: 2018-01-06T14:24:34.000Z
# Number of views: 6
# Number of likes: 2
# Number of dislikes: 0
# Number of favorites:0
# Number of comments: 0
#
#
import urllib #importing to use its urlencode function
import urllib2 #for making http requests
import json #for decoding a JSON response
#
API_KEY = 'PLACE YOUR OWN YOUTUBE API HERE' # What? How? Learn here: https://www.youtube.com/watch?v=JbWnRhHfTDA
ChannelIdentifier = 'PLACE YOUR OWN YOUTUBE channelID HERE' # What? How? Learn here: https://www.youtube.com/watch?v=tf42K4pPWkM
#
# This first part will query the list of videos uploaded of a specific channel
# The identification is done through the ChannelIdentifier which you have defined as a variable
# The results from this first part will be stored in the list videoMetadata. This will be used in the second part of the code below.
#
# This code is based on a very good example from Coding 101 which you can find here: https://www.youtube.com/watch?v=_M_wle0Iq9M
#
url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&channelId='+ChannelIdentifier+'&maxResults=50&type=video&key='+API_KEY
response = urllib2.urlopen(url) #makes the call to YouTube
videos = json.load(response) #decodes the response so we can work with it
videoMetadata = [] #declaring our list
for video in videos['items']:
    if video['id']['kind'] == 'youtube#video':
        videoMetadata.append(video['id']['videoId']) #Appends each videoID to our list
#
# In this second part, a loop will run through the list videoMetadata
# During each step the details of a specific video are retrieved and displayed
# The structure of the API-return can be tested with the API explorer (which you can execute without OAuth):
# https://developers.google.com/apis-explorer/#p/youtube/v3/youtube.videos.list?part=snippet%252CcontentDetails%252Cstatistics&id=Ks-_Mh1QhMc&_h=1&
#
for metadata in videoMetadata:
    print "https://www.youtube.com/watch?v="+metadata # Here the videoID is printed
    SpecificVideoID = metadata
    SpecificVideoUrl = 'https://www.googleapis.com/youtube/v3/videos?part=snippet%2CcontentDetails%2Cstatistics&id='+SpecificVideoID+'&key='+API_KEY
    response = urllib2.urlopen(SpecificVideoUrl) #makes the call to a specific YouTube
    videos = json.load(response) #decodes the response so we can work with it
    videoMetadata = [] #declaring our list
    for video in videos['items']:
        if video['kind'] == 'youtube#video':
            print "Upload date: "+video['snippet']['publishedAt'] # Here the upload date of the specific video is listed
            print "Number of views: "+video['statistics']['viewCount'] # Here the number of views of the specific video is listed
            print "Number of likes: "+video['statistics']['likeCount'] # etc
            print "Number of dislikes: "+video['statistics']['dislikeCount']
            print "Number of favorites:"+video['statistics']['favoriteCount']
            print "Number of comments: "+video['statistics']['commentCount']
            print "\n"
Building on Sefo's answer above, I was able to clean up the outputs a bit.
The first function creates a list of relevant videos (you could replace this however you want), and the second iterates through this list and grabs the statistics and basic text data associated with each individual video.
The output is a list of dictionaries, perfect for conversion into a pandas DataFrame.
def youtube_search_list(q, max_results=10):
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                    developerKey=DEVELOPER_KEY)

    # Call the search.list method to retrieve results matching the specified
    # query term.
    search_response = youtube.search().list(
        q=q,
        part='id,snippet',
        maxResults=max_results,
        order='viewCount'
    ).execute()

    return search_response
def youtube_search_video(q='spinners', max_results=10):
    order = "viewCount"
    token = None
    location = None
    location_radius = None

    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                    developerKey=DEVELOPER_KEY)

    # Return list of matching records up to max_results
    search_result = youtube_search_list(q, max_results)

    videos_list = []
    for search_result in search_result.get("items", []):
        if search_result["id"]["kind"] == 'youtube#video':
            temp_dict_ = {}
            # Available from initial search
            temp_dict_['title'] = search_result['snippet']['title']
            temp_dict_['vidId'] = search_result['id']['videoId']

            # Secondary call to find statistics results for individual video
            response = youtube.videos().list(
                part='statistics, snippet',
                id=search_result['id']['videoId']
            ).execute()
            response_statistics = response['items'][0]['statistics']
            response_snippet = response['items'][0]['snippet']

            snippet_list = ['publishedAt', 'channelId', 'description',
                            'channelTitle', 'tags', 'categoryId',
                            'liveBroadcastContent', 'defaultLanguage']
            for val in snippet_list:
                try:
                    temp_dict_[val] = response_snippet[val]
                except KeyError:
                    # Not stored if not present
                    temp_dict_[val] = 'xxNoneFoundxx'

            stats_list = ['favoriteCount', 'viewCount', 'likeCount',
                          'dislikeCount', 'commentCount']
            for val in stats_list:
                try:
                    temp_dict_[val] = response_statistics[val]
                except KeyError:
                    # Not stored if not present
                    temp_dict_[val] = 'xxNoneFoundxx'

            # Add back to main list
            videos_list.append(temp_dict_)

    return videos_list
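As a quick usage example (assuming the API constants used above are defined), the returned list of dictionaries converts straight into a DataFrame:

import pandas as pd

videos = youtube_search_video(q='spinners', max_results=10)
df = pd.DataFrame(videos)   # one row per video, columns for snippet and statistics fields
print(df[['title', 'viewCount', 'likeCount']].head())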
This code will help you a ton; I was struggling with this for a long time. Just provide the API key, the YouTube channel name, and the channel ID in the search list.
from googleapiclient.discovery import build

DEVELOPER_KEY = "paste your API key here"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

title = []
channelId = []
channelTitle = []
categoryId = []
videoId = []
viewCount = []
likeCount = []
dislikeCount = []
commentCount = []
favoriteCount = []
category = []
tags = []
videos = []

max_search = 50
order = "relevance"
token = None
location = None
location_radius = None

youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                developerKey=DEVELOPER_KEY)

# this request gets the videos of the channel by its name and ID
search_result = youtube.search().list(q="put the channel name here", type="video",
                                      pageToken=token, order=order, part="id,snippet",
                                      maxResults=max_search, location=location,
                                      locationRadius=location_radius,
                                      channelId='put the channel ID here').execute()

for search_result in search_result.get("items", []):
    if search_result["id"]["kind"] == 'youtube#video':
        title.append(search_result['snippet']['title'])  # the title of the video
        videoId.append(search_result['id']['videoId'])   # the ID of the video

        # this is the other request, because the statistics and snippet
        # details require videos().list rather than search()
        response = youtube.videos().list(part='statistics, snippet',
                                         id=search_result['id']['videoId']).execute()

        channelId.append(response['items'][0]['snippet']['channelId'])        # channel ID, which is constant here
        channelTitle.append(response['items'][0]['snippet']['channelTitle'])  # channel title, also constant
        categoryId.append(response['items'][0]['snippet']['categoryId'])      # stores the categories of the videos
        favoriteCount.append(response['items'][0]['statistics']['favoriteCount'])  # stores the favourite count of the videos
        viewCount.append(response['items'][0]['statistics']['viewCount'])          # stores the view counts

        # the likes and dislikes had a bug all along, which required the if/else
        # instead of just behaving like the viewCount
        if 'likeCount' in response['items'][0]['statistics'].keys():  # checks for likes count then stores it, if no likes stores 0
            likeCount.append(response['items'][0]['statistics']['likeCount'])
        else:
            likeCount.append('0')

        if 'dislikeCount' in response['items'][0]['statistics'].keys():  # checks for dislikes count then stores it, if no dislikes found stores 0
            dislikeCount.append(response['items'][0]['statistics']['dislikeCount'])
        else:
            dislikeCount.append('0')

        if 'commentCount' in response['items'][0]['statistics'].keys():  # checks for comment count then stores it, if no comments found stores 0
            commentCount.append(response['items'][0]['statistics']['commentCount'])
        else:
            commentCount.append('0')

        if 'tags' in response['items'][0]['snippet'].keys():  # checks for tags then stores them, if none found stores 0
            tags.append(response['items'][0]['snippet']['tags'])
        else:
            tags.append('0')

youtube_dict = {
    'tags': tags,
    'channelId': channelId,
    'channelTitle': channelTitle,
    'categoryId': categoryId,
    'title': title,
    'videoId': videoId,
    'viewCount': viewCount,
    'likeCount': likeCount,
    'dislikeCount': dislikeCount,
    'commentCount': commentCount,
    'favoriteCount': favoriteCount
}

for x in youtube_dict:
    print(x)
    for y in youtube_dict[x]:
        print(y)
Good luck!