Following the SoundCloud API Documentation at https://developers.soundcloud.com/docs/api/reference#tracks, I started to write an implementation of the SoundCloud API in one of my projects. I tried to get 50 tracks of a specific genre with a minimum length of 120000ms using this code:
def get_starttracks(genres="Rock"):
    # 'client' is an initialised soundcloud.Client instance
    return client.get("/tracks", genres=genres, duration={
        'from': 120000
    }, limit='50')
SoundCloud responds with a valid list of tracks, but their durations don't match the given filter.
Example:
print(get_starttracks(genres="Pop")[0].fields()['duration'])
> 30000
Is the API ignoring the 'duration' parameter, or is there an error in the filter in my code?
P.S.: This could be related to "soundcloud search api ignoring duration filter?", if the error isn't in the Python code.
After trying to fix this problem with several changes to my code, I finally found the cause:
It's NOT a bug. When SoundCloud released their "Go+" service, some official tracks were limited to a 30-second preview. The API filter appears to compare against the duration of the full track, while sending only the preview version back to the client (if you have not subscribed to "Go+" and/or your application is not logged in).
So, the only way to filter by duration is to iterate over the received tracks yourself:
# Keep only full-length tracks; build a new list rather than calling
# remove() while iterating, which would skip elements.
tracks = [track for track in tracks if track.duration > 30000]
I want to translate a column called Full plain text from a DataFrame, which I access from the Google Cloud Platform client. I have read that the maximum number of characters you can translate is 10k, but when I run my code I always get an error. So I set a threshold of 5k (which is the limit of the online translator you find if you google "google translator").
```
column_translation = "Full plain text no sig"
tr = Data.loc[:, [column_translation]]
# Make a threshold to divide the text
threshold = 5000
tr.reset_index(inplace=True, drop=True)
l = []
for i in range(len(tr)):
    if len(tr.loc[i, column_translation]) < threshold:
        l.append(TR(text=tr.loc[i, column_translation], language="spanish"))
    print(f"{i+1} out of {len(tr)}")
```
I just reset the index of my DataFrame, which only has the Full plain text column, so I can use loc[] to get the value; TR() is the function that translates. When I first run it, I have roughly 9749 samples to translate, but when I get to around the 2000th sample I get this error:
Exception: Unexpected status code "429" from ('translate.google.com',) and <Response [200]>
I totally get that it is a 429 error, but what I do not understand is why I got a 200 response with requests.get(). To double-check, I tried (iwr -uri http://translate.google.com).StatusCode in PowerShell (Win10) and I got a 200! I do not understand! I also wanted to find out how long I should wait before the next request, but I don't know where to look; I have checked (iwr -uri http://translate.google.com).Headers but there is no rate-limit header in there...
Based on the check you did: you queried http://translate.google.com, which is NOT the Google Cloud Translation API endpoint. If you are using http://translate.google.com in your code to translate data in bulk, there is a high possibility that your IP is being blocked, because this is not the official Google Translate API.
You should be using the https://translation.googleapis.com/language/translate/v2 endpoint in your code (which is the official Google Translate API); a sketch using the official Python client library follows the quota steps below. See the Translate API guide on how to use the Google Translate API. You can then check your usage on the Google Cloud Platform Quotas page.
To view quota usage and limits for all resources in your project:
In the Google Cloud Console, go to the Quotas page. The list includes one line item for each type of quota available in each service.
Sort and filter the results to focus on the information you need:
To view a specific property, click Filter table and type in "Cloud Translation API" to view quotas for the Cloud Translation API.
By default, the list is sorted to show your most used quota first (in terms of peak usage over the last seven days), which helps you see limits that are at risk of being exceeded. To view the least used first, toggle the Quota status arrow.
To learn more about your current usage for a particular quota, click All Quotas in the Details column.
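As a rough sketch, here is how the translation loop could look with the official Python client library (google-cloud-translate) instead of the TR() helper from the question. This assumes you have a Cloud project with the Cloud Translation API enabled and credentials configured; the DataFrame and column names are taken from your snippet.
```
from google.cloud import translate_v2 as translate

# Requires: pip install google-cloud-translate
# and GOOGLE_APPLICATION_CREDENTIALS pointing at a service account key.
client = translate.Client()

column_translation = "Full plain text no sig"  # column name from the question
texts = Data[column_translation].tolist()

translations = []
for i, text in enumerate(texts):
    # translate() returns a dict with a 'translatedText' key
    result = client.translate(text, target_language="es")
    translations.append(result["translatedText"])
    print(f"{i + 1} out of {len(texts)}")
```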
Also, the recommended maximum number of characters per request is 5K for basic translation and 30K for advanced translation. See Translation quotas and limits for a detailed take on this.
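If individual rows can be longer than that, here is a minimal sketch of a helper that splits a long string into chunks under the limit before translating each chunk (the helper name and the 5,000-character figure are just illustrative; adjust to the tier you use):
```
def split_into_chunks(text, max_chars=5000):
    """Split text into pieces no longer than max_chars, breaking on spaces
    where possible so words are not cut in half."""
    chunks = []
    while len(text) > max_chars:
        cut = text.rfind(" ", 0, max_chars)
        if cut == -1:          # no space to break on, hard cut
            cut = max_chars
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```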
I am using the Twitter API (premium sandbox) in Python, and the maximum number of tweets I can get in one request is 100. So if I need 500 tweets for one day, that means I need to make the request 5 times, but how can I make sure that the tweets I get in each request are different and not duplicates?
Also, is it possible to get the number of tweets for a specific hashtag broken down by day? Any help?
Here is the doc for Premium Search: https://developer.twitter.com/en/docs/tweets/search/api-reference/premium-search.html
See the next parameter. Basically, along with the 100 results you should see a "next" property in the response. In your next search request, include the next parameter with the value returned in "next".
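A minimal sketch of that pagination loop against the 30-day premium search endpoint, assuming a sandbox environment labelled "dev" and a bearer token (both placeholders you would replace with your own values):
```
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"   # placeholder
# "dev" is a placeholder for your dev environment label
URL = "https://api.twitter.com/1.1/tweets/search/30day/dev.json"
headers = {"Authorization": "Bearer {}".format(BEARER_TOKEN)}

params = {"query": "#yourhashtag", "maxResults": 100}
tweets = []
while len(tweets) < 500:
    page = requests.get(URL, headers=headers, params=params).json()
    tweets.extend(page.get("results", []))
    if "next" not in page:            # no more pages available
        break
    params["next"] = page["next"]     # pass the token back for the next page
```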
With Premium Sandbox you cannot get counts for specific days.
I'm trying to iterate over submissions of a certain subreddit from the newest to the oldest using PRAW. I used to do it like this:
subreddit = reddit.subreddit('LandscapePhotography')
for submission in subreddit.submissions(None, time.time()):
    print("Submission Title: {}".format(submission.title))
However, when I try to do it now I get the following error:
AttributeError: 'Subreddit' object has no attribute 'submissions'
From looking at the docs I can't seem to figure out how to do this. The best I can do is:
for submission in subreddit.new(limit=None):
    print("Submission Title: {}".format(submission.title))
However, this is limited to the first 1000 submissions only.
Is there a way to do this with all submissions and not just the first 1000?
Unfortunately, Reddit removed this function from their API.
Check out the PRAW changelog. One of the changes in version 6.0.0 is:
Removed
Subreddit.submissions as the API endpoint backing the method is no more. See
https://www.reddit.com/r/changelog/comments/7tus5f/update_to_search_api/.
The linked post says that Reddit is disabling Cloudsearch for all users:
Starting March 15, 2018 we’ll begin to gradually move API users over to the new search system. By end of March we expect to have moved everyone off and finally turn down the old system.
PRAW's Subreddit.submissions() used Cloudsearch to search for posts between the given timestamps. Since Cloudsearch has been removed and the search that replaced it doesn't support timestamp search, it is no longer possible to perform a timestamp-based search with PRAW or any other Reddit API client. This includes trying to get all posts from a subreddit.
For more information, see this thread from /r/redditdev posted by the maintainer of PRAW.
Alternatives
Since Reddit limits all listings to ~1000 entries, it is currently impossible to get all posts in a subreddit using their API. However, third-party datasets with APIs exist, such as pushshift.io. As /u/kungming2 said on Reddit:
You can use Pushshift.io to still return data from defined time
periods by using their API:
https://api.pushshift.io/reddit/submission/search/?after=1334426439&before=1339696839&sort_type=score&sort=desc&subreddit=translator
This, for example, allows you to parse submissions to r/translator
between 2012-04-14 and 2012-06-14.
You can retrieve all the data from pushshift.io using an iterative loop. Just set the start date to the current epoch timestamp, fetch up to 1000 items, then pass the created_utc of the last item in the list as the before parameter to get the next batch, and keep going until nothing is returned.
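A minimal sketch of that loop against the Pushshift submission search endpoint quoted above (the subreddit name is just an example, and the per-request cap may be lower than 1000 depending on the API's current limits):
```
import time
import requests

URL = "https://api.pushshift.io/reddit/submission/search/"
before = int(time.time())              # start from "now" and walk backwards
all_submissions = []

while True:
    params = {"subreddit": "LandscapePhotography", "before": before,
              "size": 1000, "sort": "desc", "sort_type": "created_utc"}
    batch = requests.get(URL, params=params).json().get("data", [])
    if not batch:                      # nothing returned: reached the oldest post
        break
    all_submissions.extend(batch)
    before = batch[-1]["created_utc"]  # next page: everything older than the last item
```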
Below is a useful link for further information:
https://www.reddit.com/r/pushshift/comments/b7onr6/max_number_of_results_returned_per_query/
Pushshift doesn't work for private subreddits. In that case you can build your own database, collecting up to 1000 submissions at a time from now on (it is not retroactive).
If you just need as many submissions as possible, you could try using the different sort methods (top, hot, new) and combining their results, as in the sketch below.
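A rough sketch of combining those listings and de-duplicating by submission ID with PRAW, reusing the subreddit object from the question (each listing is still capped at roughly 1000 items by Reddit, so this raises the ceiling but does not remove it):
```
seen = {}
# Each listing generator is capped at ~1000 items by Reddit.
for listing in (subreddit.new(limit=None),
                subreddit.hot(limit=None),
                subreddit.top(limit=None)):
    for submission in listing:
        seen[submission.id] = submission   # de-duplicate by ID

print("Collected {} unique submissions".format(len(seen)))
```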
OK, so I'm really new to Python and I am trying to create a bot to assist me in marketing my music via social media. I am trying to code it so that when I compare a user's followers with my followers, if I am not following one of their followers, it automatically follows them. Here is what I have:
import twitter
import time
now = time.time
username = raw_input("whos followers")
api = twitter.Api(...)
friendslist = api.GetFollowersPaged(screen_name=username, count=1,)
myfollowers = api.GetFollowersPaged(user_id=821151801785405441, count=1)
for u in friendslist:
    if u not in myfollowers:
        api.CreateFriendship(u.friendslist)
        print 'you followed new people'
        time.sleep(15)
I am using Python 2.7 and the python-twitter API wrapper. My error seems to start at the api.CreateFriendship line. Also, I set the count to 1 to try to avoid rate limiting, but I have had it as high as 150 (200 being the max).
The Twitter API has fairly subjective controls in place for write operations. There are daily follow limits, and they are designed to limit exactly the sort of thing you are doing.
See https://support.twitter.com/articles/15364:
If you do reach a limit, we'll let you know with an error message
telling you which limit you've hit. For limits that are time-based
(like the direct messages, Tweets, changes to account email, and API
request limits), you'll be able to try again after the time limit has
elapsed.
I want to collect data from Twitter using the Python Tweepy library.
I looked into the rate limits for the Twitter API, which is 180 requests per 15-minute window.
What I want to know is how much data I can get for one specific keyword. Put another way: when I use Tweepy.Cursor, when will it stop?
I'm not asking about the maths (100 count * 180 requests * 4 times/hour etc.) but about real experience. I found a claim as follows:
"With a specific keyword, you can typically only poll the last 5,000 tweets per keyword. You are further limited by the number of requests you can make in a certain time period. "
http://www.brightplanet.com/2013/06/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/
Is this correct (if it is, I only need to run the program for about 5 minutes)? Or do I need to keep fetching as many tweets as there are (which may make the program run for a very long time)?
You will definitely not be getting as many tweets as exist. The way Twitter limits how far back you can go (and therefore how many tweets are available) is with a minimum since_id parameter passed to the GET search/tweets call to the Twitter API. In Tweepy, the API.search function interfaces with the Twitter API. Twitter's GET search/tweets documentation has a lot of good info:
There are limits to the number of Tweets which can be accessed through the API. If the limit of Tweets has occurred since the since_id, the since_id will be forced to the oldest ID available.
In practical terms, Tweepy's API.search should not take long to get all the available tweets. Note that not all tweets are available via the Twitter API, but I've never had a search take more than 10 minutes.
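For reference, a minimal sketch of such a search with Tweepy's Cursor (the keyword and credentials are placeholders; in newer Tweepy versions API.search was renamed to API.search_tweets):
```
import tweepy

# Placeholder credentials
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
# wait_on_rate_limit makes Tweepy sleep through the 180-requests/15-minute window
api = tweepy.API(auth, wait_on_rate_limit=True)

count = 0
for tweet in tweepy.Cursor(api.search, q="your keyword", count=100).items():
    count += 1

print("Retrieved {} tweets".format(count))
```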