I am using the SoundCloud API through the Python SDK.
When I get track data through a search, the track attribute 'playback_count' seems to be
smaller than the actual count seen on the web.
How can I avoid this problem and get the actual playback_count?
(For example, this track's playback_count gives me 2700,
but it's actually 15k when displayed on the web:
https://soundcloud.com/drumandbassarena/ltj-bukem-soundcrash-mix-march-2016
)
Note: this problem does not occur for comment or like counts.
Following is my code:
## Search ##
tracks = client.get('/tracks', q=querytext,
                    created_at={'from': startdate},
                    duration={'from': startdur},
                    limit=200)

outputlist = []
resultnum = 0
for t in tracks:
    trackinfo = {}
    resultnum += 1
    trackinfo["id"] = resultnum
    trackinfo["title"] = t.title
    trackinfo["username"] = t.user["username"]
    trackinfo["created_at"] = t.created_at[:-5]
    trackinfo["genre"] = t.genre
    trackinfo["plays"] = t.playback_count
    trackinfo["comments"] = t.comment_count
    trackinfo["likes"] = t.likes_count
    trackinfo["url"] = t.permalink_url
    outputlist.append(trackinfo)
There is an issue with the playback count being incorrect when reported via the API.
I have encountered this when getting data via the /me endpoints for activity and likes, to name a couple.
(The original answer included screenshots comparing the data returned for the currently playing track in the SoundCloud widget with the data returned by the me/activities endpoint; both show the lower counts.)
Looking at the SoundCloud website, they actually call a second version of the API to populate the track list on the user page. It's similar to the documented version, but not quite the same.
If you issue a request to https://api-v2.soundcloud.com/stream/users/[userid]?limit=20&client_id=[clientid] then you'll get back a JSON object showing the same numbers you see on the web.
Since this is an undocumented version, I'm sure it'll change the next time they update their website.
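For illustration, a minimal sketch of that call using requests (untested; USER_ID and CLIENT_ID are placeholders, and the assumption that results arrive under a "collection" key wrapping a "track" object mirrors the v1 linked partitioning format):

import requests

USER_ID = 12345               # placeholder: the numeric user id
CLIENT_ID = "your_client_id"  # placeholder: your registered client id

resp = requests.get(
    "https://api-v2.soundcloud.com/stream/users/%d" % USER_ID,
    params={"limit": 20, "client_id": CLIENT_ID},
)
resp.raise_for_status()
# assumption: each stream item wraps its track under a "track" key
for item in resp.json().get("collection", []):
    track = item.get("track")
    if track:
        print(track["title"], track["playback_count"])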
Related
I'm trying to work out how to use the Google Books API from Python. I'm finding it difficult to find any docs, so at the moment I am really just reading the code and trying to go from there. I finally worked out that you can provide your API key as "developerKey" when creating a service, which I haven't seen written anywhere, and I figured out how to do a search, so this works:
from googleapiclient.discovery import build

service = build("books", "v1", developerKey="...my api key...")
volumes = service.volumes()
request = volumes.list(q="David eddings")
result = request.execute()
print(result)
However, the result comes back with totalItems of 1162, but an items collection of only 10 items, and I can't figure out how to ask for the next set of items, or how to change the number of items that comes back in the first place.
Is there a way?
And is this documented anywhere? I can find documentation on the Google REST API for Books, but I can't find a word anywhere on the Python API for Books.
I think most of the parameters are listed here.
You can use the startIndex parameter to paginate your results and the maxResults parameter to get more than 10 results (the default) in a single request.
For example: https://www.googleapis.com/books/v1/volumes?q=search+terms&maxResults=20&startIndex=20
This will request 20 elements, starting from index 20, skipping all elements from 0 to 19.
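Translated to the Python client from the question, a hedged sketch (untested) might look like this:

from googleapiclient.discovery import build

service = build("books", "v1", developerKey="...my api key...")
volumes = service.volumes()

start = 0
page_size = 20  # the REST docs cap maxResults at 40
while True:
    result = volumes.list(q="David eddings",
                          startIndex=start,
                          maxResults=page_size).execute()
    items = result.get("items", [])
    if not items:
        break
    for item in items:
        print(item["volumeInfo"]["title"])
    start += page_size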
On the SoundCloud API guide (https://developers.soundcloud.com/docs/api/guide#pagination),
the example given for reading more than 100 pieces of data is as follows:
# get first 100 tracks
tracks = client.get('/tracks', order='created_at', limit=page_size)
for track in tracks:
    print track.title

# start paging through results, 100 at a time
tracks = client.get('/tracks', order='created_at', limit=page_size,
                    linked_partitioning=1)
for track in tracks:
    print track.title
I'm pretty certain this is wrong, as I found that 'tracks.collection' needs referencing rather than just 'tracks'. Based on the GitHub python soundcloud API wiki, it should look more like this:
tracks = client.get('/tracks', order='created_at', limit=10, linked_partitioning=1)
while tracks.collection != None:
    for track in tracks.collection:
        print(track.playback_count)
    tracks = tracks.GetNextPartition()
Here I have removed the indent from the last line (I think there is an error on the wiki: there it sits inside the for loop, which makes no sense to me). This works for the first loop. However, it doesn't work for successive pages, because the GetNextPartition() function is not found. I've tried the last line as:
tracks = tracks.collection.GetNextPartition()
...but no success.
Maybe I'm getting versions mixed up? But I'm trying to run this with Python 3.4 after downloading the version from here: https://github.com/soundcloud/soundcloud-python
Any help much appreciated!
For anyone that cares, I found this solution on the SoundCloud developer forum. It is slightly modified from the original case (searching for tracks) to list my own followers. The trick is to call the client.get function repeatedly, passing the previously returned "users.next_href" as the request that points to the next page of results. Hooray!
pgsize = 200
c = 1
me = client.get('/me')

# first call to get a page of followers
users = client.get('/users/%d/followers' % me.id, limit=pgsize, order='id',
                   linked_partitioning=1)
for user in users.collection:
    print(c, user.username)
    c = c + 1

# linked_partitioning means .next_href exists
while users.next_href != None:
    # pass the contents of users.next_href that contains 'cursor=' to
    # locate next page of results
    users = client.get(users.next_href, limit=pgsize, order='id',
                       linked_partitioning=1)
    for user in users.collection:
        print(c, user.username)
        c = c + 1
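Adapted back to the original case of searching for tracks, a sketch of the same pattern (untested; querytext as in the question) would be:

tracks = client.get('/tracks', q=querytext, limit=200, linked_partitioning=1)
while True:
    for t in tracks.collection:
        print(t.title, t.playback_count)
    if tracks.next_href is None:
        break
    tracks = client.get(tracks.next_href)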
Using an access token from the Facebook Graph API Explorer (https://developers.facebook.com/tools/explorer), with an access scope that includes user likes, I am using the following code to try to get all the likes of a user profile:
myfbgraph = facebook.GraphAPI(token)
mylikes = myfbgraph.get_connections(id="me", connection_name="likes")['data']
for like in mylikes:
    print like['name'], like['category']
...
However this is always giving me only 25 likes, whereas I know that the profile I'm using has 42 likes. Is there some innate limit operating here, or what's the problem in getting ALL the page likes of a user profile?
Per the Graph API documentation:
When you make an API request to a node or edge, you will usually not
receive all of the results of that request in a single response. This
is because some responses could contain thousands and thousands of
objects, and so most responses are paginated by default.
https://developers.facebook.com/docs/graph-api/using-graph-api/v2.2#paging
Well, this appears to work (a method which accepts a user's Facebook graph):
import requests

def get_myfacebook_likes(myfacebook_graph):
    myfacebook_likes = []
    # first page of likes
    myfacebook_likes_info = myfacebook_graph.get_connections("me", "likes")
    while myfacebook_likes_info['data']:
        for like in myfacebook_likes_info['data']:
            myfacebook_likes.append(like)
        # follow the paging cursor until there is no 'next' page
        if 'next' in myfacebook_likes_info['paging'].keys():
            myfacebook_likes_info = requests.get(myfacebook_likes_info['paging']['next']).json()
        else:
            break
    return myfacebook_likes
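Called with the graph object from the question, for example:

all_likes = get_myfacebook_likes(myfbgraph)
print(len(all_likes))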
The above answers will work, but pretty slowly for anything with many likes. If you just want the number of likes, you can get it much more efficiently with the summary total count:
myfacebook_likes_info = graph.get_connections(post['id'], 'likes?summary=1')
print myfacebook_likes_info["summary"]["total_count"]
I've crawled a tracklist of 36,000 songs that have been played on the Danish national radio station P3. I want to do some statistics on how frequently each genre has been played within this period, so I figured the Discogs API might help label each track with a genre. However, the documentation for the API doesn't seem to include an example for querying the genre of a particular song.
I have a CSV file with 3 columns: Artist, Title & Test (Test is where I want the API to label each song with the genre).
Here's a sample of the script I've built so far:
import json
import pandas as pd
import requests
import discogs_client

d = discogs_client.Client('ExampleApplication/0.1')
d.set_consumer_key('key-here', 'secret-here')

input = pd.read_csv('Desktop/TEST.csv', encoding='utf-8', error_bad_lines=False)
df = input[['Artist', 'Title', 'Test']]
df.columns = ['Artist', 'Title', 'Test']

for i in range(0, len(list(df.Artist))):
    x = df.Artist[i]
    g = d.artist(x)
    df.Test[i] = str(g)

df.to_csv('Desktop/TEST2.csv', encoding='utf-8', index=False)
This script has worked so far with a dummy file of 3 records, for mapping the artist of a given ID#. But as soon as the file gets larger (e.g. 2000 records), it returns an HTTPError when it cannot find the artist.
I have some questions regarding this approach:
1) Would you recommend using the search query function in the API for retrieving a variable such as 'Genre'? Or do you think it is possible to retrieve Genre with a 'd.' function from the API?
2) Will I need to acquire an API key? I have successfully mapped the 3 records without an API key so far. Looks like the key is free, though.
Here's the guide I have been following:
https://github.com/discogs/discogs_client
And here's the documentation for the API:
https://www.discogs.com/developers/#page:home,header:home-quickstart
Maybe you need to re-read the discogs_client examples. I am not an expert myself, but a newbie trying to use this API.
AFAIK, g = d.artist(x) fails because x must be an integer, not a string.
So you must first do a search, then get the artist id, then call d.artist(artist_id).
Sorry for not providing an example, I am a Python newbie right now ;)
Also, have you checked AcoustID?
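For what it's worth, a minimal sketch of that search-then-lookup flow (untested, assuming the discogs_client search API; 'Artist Name' and 'Track Title' are placeholders):

results = d.search('Track Title', artist='Artist Name', type='release')
try:
    release = results[0]
    print(release.genres)
except IndexError:
    print('no match found')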
It's probably a rate limit.
Read the status code of your response; you should find a 429 Too Many Requests.
Unfortunately, if that's the case, the only solution is to add a sleep to your code so that you make one request per second.
Check out the API doc:
http://www.discogs.com/developers/#page:home,header:home-rate-limiting
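Applied to the loop from the question, a sketch (untested) would be:

import time

for i in range(0, len(list(df.Artist))):
    g = d.artist(df.Artist[i])
    df.Test[i] = str(g)
    time.sleep(1)  # stay under one request per second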
I found this guide:
https://github.com/neutralino1/discogs_client.
Access the API with your key and try something like:
d = discogs_client.Client('something.py', user_token=auth_token)
release = d.release(774004)
genre = release.genres
If you find a better solution, please share.
I'm trying to use the Python-Twitter library (https://github.com/bear/python-twitter) to extract mentions of a Twitter account using the GetMentions() function. The script populates a database and runs periodically on a cron job, so I don't want to extract every mention, only those since the last time the script was run.
The code below extracts the mentions fine, but for some reason the 'since_id' argument doesn't seem to do anything: the function returns all the mentions every time I run it, rather than filtering for only the most recent mentions. For reference, the documentation is here: https://python-twitter.googlecode.com/hg/doc/twitter.html#Api-GetMentions
What is the correct way to implement the GetMentions() function? (I've looked but I can't find any examples online.) Alternatively, is there a different/more elegant way of extracting Twitter mentions that I'm overlooking?
def scan_timeline():
    ''' Scans the timeline and populates the database with the results '''
    FN_NAME = "scan_timeline"
    # Establish the api connection
    api = twitter.Api(
        consumer_key = "consumerkey",
        consumer_secret = "consumersecret",
        access_token_key = "accesskey",
        access_token_secret = "accesssecret"
    )
    # Tweet ID of most recent mention from the last time the function was run
    # (In actual code this is dynamic and extracted from a database)
    since_id = 498404931028938752
    # Retrieve all mentions created since the last scan of the timeline
    length_of_response = 20
    page_number = 0
    while length_of_response == 20:
        # Retrieve most recent mentions
        results = api.GetMentions(since_id, None, page_number)
        ### Additional code inserts the tweets into a database ###
Your syntax seems to be consistent with what's described in the Python-Twitter library. What I think is happening is the following:
If the limit of tweets has been exceeded since the since_id, the since_id will be forced to the oldest ID available.
This would lead to all the tweets starting from the oldest available ID being returned. Try working with a more recent since_id value, and also check whether the since_id you're passing is appropriate.
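As a hedged illustration (untested, and assuming this version of python-twitter accepts since_id as a keyword argument), passing it by keyword and remembering the newest ID between runs keeps the argument order unambiguous:

# fetch only mentions newer than the stored id
results = api.GetMentions(since_id=since_id)
for status in results:
    ### insert the tweet into the database ###
    since_id = max(since_id, status.id)
# write since_id back to the database for the next cron run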