I am just writing a simple 'grab followers' script in Python using Tweepy. When I run the script, everything works fine and it does what it needs to, but I am being rate limited very heavily, almost instantly it seems.
I run other scripts through Tweepy; I've scraped nearly 800 accounts' tweets before being rate limited previously, multiple times even.
Can someone shed some light on this? My account was even suspended temporarily last night simply for trying to let it finish :-\
import sys
import tweepy

APP_KEY = ''
APP_SECRET = ''

auth = tweepy.AppAuthHandler(APP_KEY, APP_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

if not api:
    print "Didn't Authenticate!"
    sys.exit(-1)

def get_followers(screen_name, filename):
    result = []
    # page through the follower ids of the given account
    for page in tweepy.Cursor(api.followers_ids, screen_name=screen_name).pages():
        result.extend(page)
    write_to = open(filename, 'w')
    # look up each follower id and write its screen name to the file
    for num in result:
        name = api.get_user(num)
        write_to.write(str(name.screen_name) + '\n')
    write_to.write(str(len(result)))
    write_to.close()

user_input = raw_input('Please enter Twitter name to get followers, or hit enter to use default file:')
if len(user_input) == 0:
    # read one screen name per line from the default file
    with open('names.txt') as names_file:
        for name in names_file:
            name = name.strip()
            file_name = name + '.txt'
            get_followers(name, file_name)
else:
    file_name = str(user_input) + '.txt'
    get_followers(user_input, file_name)
You are using the followers/ids endpoint. Its rate limit is 15 requests per 15-minute window; see the docs. Try making a request once every minute and you should be fine.
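For example, here is a rough sketch of spacing the page requests out, assuming api is the authenticated tweepy.API object from the question and screen_name is the account you are fetching:
import time
import tweepy

# Sketch only: api and screen_name are taken from the question's setup.
follower_ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name=screen_name).pages():
    follower_ids.extend(page)
    time.sleep(60)  # one request per minute stays under 15 requests per 15 minutes
If the per-follower get_user calls are what is actually hitting a limit, batching the ids with api.lookup_users (up to 100 ids per call) may also be worth trying.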
Essentially, I've written a program for a Reddit bot designed to list certain aspects of a Reddit post, such as the title or poster, as long as they fit certain criteria. I want it to run automatically once every hour, and I also want it to be able to make a post once every 7 days. Could someone share code for these, please?
#!/usr/bin/python
import base64
import praw

# Enter your correct Reddit information into the variables below
userAgent = 'RRBot-Beta'
cID = 'Enter your client id'
cSC = 'Enter your secret'
userN = 'Enter your Reddit username'
userP = 'Enter your Reddit password'

unfilled_post_URL = []
unfilled_post_url_B64 = []
submission_title_and_poster = {}
filled_requests = 0
unfilled_requests = 0
requests = 0

reddit = praw.Reddit(user_agent=userAgent,
                     client_id=cID,
                     client_secret=cSC,
                     username=userN,
                     password=userP)

subreddit = reddit.subreddit('riprequestsnew')  # any subreddit you want to monitor
title_keywords = {'requests', 'request'}        # keywords to look for in post titles
comment_keyword = "share"

for submission in subreddit.new(limit=None):    # go through the subreddit's posts
    lowercase_title = submission.title.lower()  # lowercase the title so we can compare our keywords with it
    if any(keyword in lowercase_title for keyword in title_keywords):
        requests += 1                           # track the number of request posts
        submission.comments.replace_more(limit=0)
        comments = " ".join(c.body.lower() for c in submission.comments.list())
        if comment_keyword in comments:         # someone has shared something: this post counts as filled
            filled_requests += 1
        else:                                   # no one has shared anything: remember the post
            submission_title_and_poster[submission.title] = str(submission.author)
            unfilled_post_URL.append(submission.url)
            unfilled_requests += 1

# base64-encode each unfilled post URL and add it to a new list
for url in unfilled_post_URL:
    url_encoded = base64.b64encode(url.encode("utf-8"))
    unfilled_post_url_B64.append(url_encoded)
Schedule (https://pypi.python.org/pypi/schedule) seems to be what you need.
You will have to install the library:
pip install schedule
Then adapt the sample script:
import schedule
import time

def job():
    print("I'm working...")  # replace this with your own task

schedule.every(10).seconds.do(job)
schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
Put your own code in job() and keep the schedule call that matches your timing.
Then you can run the script with nohup.
Be advised that you will need to start it again if you reboot.
Here are the docs.
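Applied to your bot, a minimal sketch could look like the following; check_requests() and make_post() are hypothetical wrappers for your scanning code and your posting code:
import schedule
import time

def check_requests():
    pass  # hypothetical: the subreddit-scanning code from the question goes here

def make_post():
    pass  # hypothetical: the code that submits your weekly post goes here

schedule.every().hour.do(check_requests)  # run the scan once every hour
schedule.every(7).days.do(make_post)      # make a post once every 7 days

while True:
    schedule.run_pending()
    time.sleep(1)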
I have a script that checks the input link; if it's equivalent to one I specified in the code, it runs my code, otherwise it opens the link in Chrome.
I want to make that script act as a kind of default browser, to gain speed compared to opening the browser, grabbing the link with the help of an extension, and then sending it to my script using POST.
I used Procmon to check where the process in question queries the registry, and it seems it tried to read HKCU\Software\Classes\ChromeHTML\shell\open\command, so I added a key there; in command, I set the value to my script path and arguments (-- %1) (the -- is only here for testing purposes).
Unfortunately, once the program queries this to send a link, Windows prompts me to choose a browser instead of running my script, which isn't what I want.
Any ideas?
In HKEY_CURRENT_USER\Software\Classes\ChromeHTML\Shell\open\command, replace the (Default) value with "C:\Users\samdra.r\AppData\Local\Programs\Python\Python39\pythonw.exe" "[Script_path_here]" %1
When launching a link, you'll be asked to set a default browser only once (it asks for a default browser after each change you make to the key); I select Chrome in my case.
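If you prefer to make that registry change from Python rather than regedit, a rough sketch with the standard winreg module might look like this; the pythonw and script paths are placeholders you would replace with your own:
import winreg

# Placeholders: point these at your own pythonw.exe and script.
pythonw = r"C:\Path\To\pythonw.exe"
script = r"C:\Path\To\your_script.pyw"
command = '"{}" "{}" %1'.format(pythonw, script)

# Set the (Default) value of HKCU\Software\Classes\ChromeHTML\shell\open\command
key_path = r"Software\Classes\ChromeHTML\shell\open\command"
with winreg.CreateKey(winreg.HKEY_CURRENT_USER, key_path) as key:
    winreg.SetValueEx(key, "", 0, winreg.REG_SZ, command)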
As for the Python script, here it is:
import sys
import browser_cookie3
import requests
from bs4 import BeautifulSoup as BS
import re
import os
import asyncio
import shutil

def Prep_download(args):
    settings = os.path.abspath(__file__.split("NewAltDownload.py")[0] + '/settings.txt')
    if args[1] == "-d" or args[1] == "-disable":
        with open(settings, 'r+') as f:
            f.write(f.read() + "\n" + "False")
        print("Background program disabled, exiting...")
        exit()
    if args[1] == "-e" or args[1] == "-enable":
        with open(settings, 'r+') as f:
            f.write(f.read() + "\n" + "True")
    link = args[-1]
    with open(settings, 'r+') as f:
        try:
            data = f.read()
            osupath = data.split("\n")[0]
            state = data.split("\n")[1]
        except:
            f.write(f.read() + "\n" + "True")
            print("Possible first run, wrote True, exiting...")
            exit()
    if state == "True":
        asyncio.run(Download_map(osupath, link))

async def Download_map(osupath, link):
    # only handle osu.ppy.sh beatmap links
    if link.split("/")[2] == "osu.ppy.sh" and (link.split("/")[3] == "b" or link.split("/")[3] == "beatmapsets"):
        with requests.get(link) as r:
            link = r.url.split("#")[0]
        BMID = []
        id = re.sub("[^0-9]", "", link)
        # collect the beatmap ids already present in the Songs folder
        for ids in os.listdir(os.path.abspath(osupath + "/Songs/")):
            if re.match(r"(^\d*)", ids).group(0).isdigit():
                BMID.append(re.match(r"(^\d*)", ids).group(0))
        if id in BMID:
            print(link + ": Map already exists")
            os.system('"' + os.path.abspath("C:/Program Files (x86)/Google/Chrome/Application/chrome.exe") + '" ' + link)
            return
        if not id.isdigit():
            print("Invalid id")
            return
        cj = browser_cookie3.load()
        print("Downloading", link, "in", os.path.abspath(osupath + "/Songs/"))
        headers = {"referer": link}
        with requests.get(link) as r:
            t = BS(r.text, 'html.parser').title.text.split("·")[0]
        with requests.get(link + "/download", stream=True, cookies=cj, headers=headers) as r:
            if r.status_code == 200:
                try:
                    id = re.sub("[^0-9]", "", link)
                    with open(os.path.abspath(__file__.split("NewAltDownload.pyw")[0] + id + " " + t + ".osz"), "wb") as otp:
                        otp.write(r.content)
                    shutil.copy(os.path.abspath(__file__.split("NewAltDownload.pyw")[0] + id + " " + t + ".osz"),
                                os.path.abspath(osupath + "/Songs/" + id + " " + t + ".osz"))
                except:
                    print("You either aren't connected to osu!'s website or you're limited by the API, in which case you now have to wait 1h and then try again.")
            else:
                os.system('"' + os.path.abspath("C:/Program Files (x86)/Google/Chrome/Application/chrome.exe") + '" ' + link)

args = sys.argv
if len(args) == 1:
    print("No arguments provided, exiting...")
    exit()
Prep_download(args)
You obtain the %1 argument (the link) with sys.argv[-1] (since sys.argv is a list), and from there you just check whether the link is similar to the one you're looking for (in my case it needs to look like https://osu.ppy.sh/b/ or https://osu.ppy.sh/beatmapsets/).
If that's the case, run your own code; otherwise just launch Chrome with the Chrome executable and the link as the argument. If the id of the beatmap is already found in the Songs folder, I also open the link in Chrome.
To make it work in the background I had to fight with subprocesses and a few more tricks, and in the end it suddenly started working with pythonw and the .pyw extension.
I'm using Spotipy and LyricsGenius to open lyrics in a web browser from a terminal.
I can open a URL for one song, but I have to rerun the script each time to handle consecutive songs. What are some ways to detect the end of a song using Spotipy?
import spotipy
import webbrowser
import lyricsgenius as lg
...
# Create our spotifyObject
spotifyObject = spotipy.Spotify(auth=token)

# Create our geniusObject
geniusObject = lg.Genius(access_token)
...
while True:
    currently_playing = spotifyObject.currently_playing()
    artist = currently_playing['item']['artists'][0]['name']
    title = currently_playing['item']['name']
    search_query = artist + " " + title
    # if (currently_playing has changed):
    song = geniusObject.search_songs(search_query)
    song_url = song['hits'][0]['result']['url']
    webbrowser.open(song_url)
I was reading relevant threads such as this and this, and read through the documentation, but could not find an answer to whether this can be handled by Spotipy. I would appreciate any suggestions, thank you.
I used time.sleep(length), with the argument length being the remaining duration of the current track.
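A rough sketch of that idea, assuming spotifyObject is the authenticated client from the question and that currently_playing() returns progress_ms along with the item's duration_ms:
import time

while True:
    currently_playing = spotifyObject.currently_playing()
    if currently_playing is None or currently_playing.get('item') is None:
        time.sleep(5)  # nothing is playing; poll again shortly
        continue

    # ... look up and open the lyrics for the current track here ...

    # sleep until the current track should be over, then loop and check the next one
    remaining_ms = currently_playing['item']['duration_ms'] - currently_playing['progress_ms']
    time.sleep(remaining_ms / 1000)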
I am trying to pull the number of followers from a list of Instagram accounts. I have tried using the "find" method within Requests; however, the string I am looking for when I inspect the actual Instagram page no longer appears when I print "r" from the code below.
I was able to get this code to run successfully in the past; however, it will no longer run.
Webscraping Instagram follower count BeautifulSoup
import requests
user = "espn"
url = 'https://www.instagram.com/' + user
r = requests.get(url).text
start = '"edge_followed_by":{"count":'
end = '},"followed_by_viewer"'
print(r[r.find(start)+len(start):r.rfind(end)])
I receive "-1", which means the substring from the find method was not found within the variable "r".
I think it's because of the last ' in start and the first ' in end... This will work:
import requests
import re
user = "espn"
url = 'https://www.instagram.com/' + user
r = requests.get(url).text
followers = re.search('"edge_followed_by":{"count":([0-9]+)}',r).group(1)
print(followers)
'14061730'
I want to suggest an updated solution to this question, as Derek Eden's answer above from 2019 no longer works, as stated in its comments.
The fix was to add an r before the regular expression in the re.search, like so:
follower_count = re.search(r'"edge_followed_by\\":{\\"count\\":([0-9]+)}', response).group(1)
This r prefix is really important: without it, Python applies the backslash escapes itself before the regex engine ever sees the pattern, so the search no longer matches the escaped quotes in the page source and gives no results.
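A quick way to see the difference, using the HTML slice shown further down as test data:
import re

html = r'mRL&s=1\",\"edge_followed_by\":{\"count\":110070},\"fbid\":\"1784'

# Raw string: the regex engine receives \\ and matches the literal backslashes in the page source.
print(re.search(r'"edge_followed_by\\":{\\"count\\":([0-9]+)}', html))  # match object

# Plain string: Python collapses \\ to a single backslash first, so the regex engine
# ends up looking for "edge_followed_by":{"count": ... without backslashes and finds nothing.
print(re.search('"edge_followed_by\\":{\\"count\\":([0-9]+)}', html))   # None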
Also, the Instagram page seems to have backslashes in the object we look for (at least in my tests), so the code example I use is the following; it is written for Python 3.10 and working as of July 2022:
# get follower count of an Instagram profile
import os.path
import requests
import re
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def get_instagram_follower_count(instagram_username):
    url = "https://www.instagram.com/" + instagram_username
    filename = "instagram.html"
    try:
        if not os.path.isfile(filename):
            r = requests.get(url, verify=False)
            print(r.status_code)
            print(r.text)
            response = r.text
            if not r.status_code == 200:
                raise Exception("Error: " + str(r.status_code))
            with open(filename, "w") as f:
                f.write(response)
        else:
            with open(filename, "r") as f:
                response = f.read()
        # print(response)
        follower_count = re.search(r'"edge_followed_by\\":{\\"count\\":([0-9]+)}', response).group(1)
        return follower_count
    except Exception as e:
        print(e)
        return 0

print(get_instagram_follower_count('your.instagram.profile'))
The method returns the follower count as expected. Please note that I added a few lines to avoid hammering Instagram's web server and getting blocked while testing, by simply saving the response to a file.
This is a slice of the original html content that contains the part we are looking for:
... mRL&s=1\",\"edge_followed_by\":{\"count\":110070},\"fbid\":\"1784 ...
I debugged the regex in RegExr; it seems to work just fine at this point in time.
There are many posts about the regex r prefix, like this one.
The documentation of the re package also shows clearly that this is the issue with the code above.
I'm trying to make a simple web browser in Python. (I'm a novice programmer and this is the first time I'm using Python.)
I'm aware that I have to save my links in a list and create a function that goes to a URL whenever the user picks one from that list, but I have no idea how to do that. I would very much appreciate it if someone could please help me with that.
Here's my code:
#!/usr/bin/env python
import urllib

url = "http://google.com"
data = urllib.urlopen(url)
tokens = data.read().split()

List = []
for token in tokens:
    if token == '<body>':
        print ''
    elif token == '</body>':
        print ''
    #elif token[6:-2] == '<a href':
    else:
        print token,

selectedLink = raw_input('Select a link:')
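A minimal sketch of the idea described in the question (collect the href values into a list, then fetch whichever one the user picks); it sticks with Python 2 to match the code above, and the regular expression is only a simple assumption about how the links could be pulled out of the page:
#!/usr/bin/env python
import re
import urllib

url = "http://google.com"
html = urllib.urlopen(url).read()

# Assumption: pull href values out of anchor tags with a simple regex.
links = re.findall(r'<a[^>]+href="([^"]+)"', html)

# show the links with an index so the user can pick one
for i, link in enumerate(links):
    print i, link

choice = raw_input('Select a link (number): ')
selectedLink = links[int(choice)]
print 'Opening', selectedLink
data = urllib.urlopen(selectedLink).read()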