Python Bot with tweepy

I'm trying to write a Twitter bot using the tweepy library, but I'm not getting results. I need help with code that replies to tweets that mention me.
search = '#MoviesRandom'
numberOfTweets = 10
phrase = movies()  # movies() is a function I defined earlier; it has no errors

for tweet in tweepy.Cursor(api.search, search).items(numberOfTweets):
    try:
        tweetId = tweet.user.idusername
        username = tweet.user.screen_name
        api.update_status("@" + username + " " + phrase, in_reply_to_status_id=tweetId)
        print("Replied with " + phrase)
    except tweepy.TweepError as e:
        print(e.reason)

It's likely caused by this line:
    tweetId = tweet.user.idusername
There is no such attribute as idusername; as @Andy mentioned, it should just be the id attribute:
tweetId = tweet.user.id
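Putting the fix in context, a minimal sketch of the corrected loop (assuming api is an already-authenticated tweepy.API instance and movies() is your own helper). One hedged note: in_reply_to_status_id expects the ID of the tweet you are replying to, so tweet.id is typically what you want there rather than the user's ID:

    search = '#MoviesRandom'
    phrase = movies()  # your helper from before

    for tweet in tweepy.Cursor(api.search, search).items(10):
        try:
            username = tweet.user.screen_name
            # reply to the tweet itself; tweet.user.id is the author's ID
            api.update_status("@" + username + " " + phrase,
                              in_reply_to_status_id=tweet.id)
            print("Replied with " + phrase)
        except tweepy.TweepError as e:
            print(e.reason)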

Related

Tweepy search_full_archive() missing 2 required positional arguments: 'label' and 'query'

I'm using Tweepy 3.10.0 to collect tweets containing specific keywords and hashtags, one calendar day at a time. I recently upgraded from the standard Developer Account to the Premium Account to access the full archive. I know this changes the search function to search_full_archive and changes a couple of other small syntax things. I thought I made the correct changes, but I'm still getting this error. I've checked the Developer API reference.
consumer_key = '****'
consumer_secret = '****'
access_token = '****'
access_token_secret = '****'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

def get_tweets_withHashTags(query, startdate, enddate, count=300):
    tweets_hlist = []
    tweets_list = []
    qt = str(query)
    for page in tweepy.Cursor(api.search_full_archive, environment_name='FullArchive', q=qt, fromDate=startdate, toDate=enddate, count=300, tweet_mode='extended').pages(100):
        count = len(page)
        print("Count of tweets in each page for " + str(qt) + " : " + str(count))
        for value in page:
            hashList = value._json["entities"]["hashtags"]
            flag = 0
            for tag in hashList:
                if qt.lower() in tag["text"].lower():
                    flag = 1
            if flag == 1:
                tweets_hlist.append(value._json)
            tweets_list.append(value._json)
        print("tweets_hash_" + query + ": " + str(len(tweets_hlist)))
        print("tweets_" + query + ": " + str(len(tweets_list)))
    with open("/Users/Victor/Documents/tweetCollection/data/" + startdate + "/" + "query1_hash_" + str(startdate) + "_" + str(enddate) + "_" + query + '.json', 'w') as outfile:
        json.dump(tweets_hlist, outfile, indent=2)
    with open("/Users/Victor/Documents/tweetCollection/data/" + startdate + "/" + "query1_Contains_" + str(startdate) + "_" + str(enddate) + "_" + query + '.json', 'w') as outfile:
        json.dump(tweets_list, outfile, indent=2)
    return len(tweets_list)

query = ["KeyWord1", "KeyWord2", "KeyWord3", etc.]
for value in query:
    get_tweets_withHashTags(value, "2020-04-21", "2020-04-22")
According to the API's code (https://github.com/tweepy/tweepy/blob/5b2dd086c2c5a08c3bf7be54400adfd823d19ea1/tweepy/api.py#L1144), api.search_full_archive takes label (the environment name) and query as arguments. So change
    api.search_full_archive, environment_name='FullArchive', q=qt, fromDate=startdate, toDate=enddate, count=300, tweet_mode='extended'
to
    api.search_full_archive, label='FullArchive', query=qt, fromDate=startdate, toDate=enddate
As for tweet_mode='extended', it is not available for search_full_archive or search_30_day. You can see how to access the full text at https://github.com/tweepy/tweepy/issues/1461
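For context, a minimal sketch of the corrected cursor loop, with one hedged way to read the full text (the linked issue discusses this: on the premium endpoints, tweets over 140 characters carry an extended_tweet payload; verify that field name against your own responses):

    for page in tweepy.Cursor(api.search_full_archive, label='FullArchive', query=qt,
                              fromDate=startdate, toDate=enddate).pages(100):
        for status in page:
            data = status._json
            # no tweet_mode='extended' here; long tweets instead ship an
            # "extended_tweet" object holding the full text
            text = data.get("extended_tweet", {}).get("full_text", data.get("text"))
            print(text)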

Tweepy Error: Request exceeds account’s current package request limits

I'm using the Tweepy API to collect tweets containing specific keywords or hashtags through the standard Academic Research Developer Account. This allows me to collect 10,000,000 tweets per month. I'm trying to collect the tweets from one full calendar day at a time using the full archive search. I've gotten a rate limit error (despite the wait_on_rate_limit flag being set to true) and now this request limit error. I'm not sure why this is happening or what to change at this point.
consumer_key = '***'
consumer_secret = '***'
access_token = '***'
access_token_secret = '***'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

def get_tweets_withHashTags(query, startdate, enddate, count=300):
    tweets_hlist = []
    tweets_list = []
    qt = str(query)
    for page in tweepy.Cursor(api.search_full_archive, label='myLabel', query=qt, fromDate=startdate+'0000', toDate=enddate+'0000', maxResults=100).pages(100):
        count = len(page)
        print("Count of tweets in each page for " + str(qt) + " : " + str(count))
        for value in page:
            hashList = value._json["entities"]["hashtags"]
            flag = 0
            for tag in hashList:
                if qt.lower() in tag["text"].lower():
                    flag = 1
            if flag == 1:
                tweets_hlist.append(value._json)
            tweets_list.append(value._json)
        print("tweets_hash_" + query + ": " + str(len(tweets_hlist)))
        print("tweets_" + query + ": " + str(len(tweets_list)))
    with open("/Users/Victor/Documents/tweetCollection/data/" + startdate + "/" + "query1_hash_" + str(startdate) + "_" + str(enddate) + "_" + query + '.json', 'w') as outfile:
        json.dump(tweets_hlist, outfile, indent=2)
    with open("/Users/Victor/Documents/tweetCollection/data/" + startdate + "/" + "query1_Contains_" + str(startdate) + "_" + str(enddate) + "_" + query + '.json', 'w') as outfile:
        json.dump(tweets_list, outfile, indent=2)
    return len(tweets_list)

query = ["keyword1", "keyword2", "keyword3", etc.]
for value in query:
    get_tweets_withHashTags(value, "20200422", "20200423")
At the time of writing this answer, Tweepy does not support version 2 of the Twitter API (although there is a Pull Request to add initial support). The api.search_full_archive method is actually hitting the v1.1 Premium Full Archive search, which has lower request and Tweet cap volume limits, so that is why you are seeing an error about exceeding your package.
For the Full Archive / All Tweets endpoint in v2 of the API, you need a Project in the Academic Track, and an App in that Project. You'll need to point your code at the /2/tweets/search/all endpoint. If you are using Python, there is sample code in the TwitterDev repo. This uses the requests library, rather than using a full API library like Tweepy.
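For illustration, a minimal hedged sketch of calling the v2 full-archive endpoint directly with requests (the bearer token must come from an App inside an Academic Track Project; parameter names follow the public v2 docs):

    import requests

    BEARER_TOKEN = '***'

    def search_all(query, start_time, end_time, max_results=100):
        # v2 full-archive search endpoint
        url = 'https://api.twitter.com/2/tweets/search/all'
        headers = {'Authorization': 'Bearer ' + BEARER_TOKEN}
        params = {
            'query': query,
            'start_time': start_time,  # e.g. '2020-04-22T00:00:00Z'
            'end_time': end_time,
            'max_results': max_results,  # 10 to 500 on this endpoint
        }
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()  # surface rate-limit and cap errors
        return resp.json()

    tweets = search_all('keyword1', '2020-04-22T00:00:00Z', '2020-04-23T00:00:00Z')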

How to make text clickable in Python

How do I make text clickable?
class ComplainceServer():
    def __init__(self, jira_server, username, password, encoding='utf-8'):
        if jira_server is None:
            error('No server provided.')
        #print(jira_server)
        self.jira_server = jira_server
        self.username = username
        self.password = password
        self.encoding = encoding

    def checkComplaince(self, appid, toAddress):
        query = "/rest/api/2/search?jql=issuetype = \"Application Security\" AND \"Prod Due Date\" < now()"
        request = self._createRequest()
        response = request.get(query, contentType='application/json')
        # Parse result
        if response.status == 200 and action == "warn":
            data = Json.loads(response.response)
            print "#### Issues found"
            issues = {}
            msg = "WARNING: The below tickets are non-complaint in fortify, please fix them or raise exception.\n"
            issue1 = data['issues'][0]['key']
            for item in data['issues']:
                issue = item['key']
                issues[issue] = item['fields']['summary']
                print u"* {0} - {1}".format(self._link(issue), item['fields']['summary'])
                print "\n"
                data = u" {0} - {1}".format(self._link(issue), item['fields']['summary'])
                msg += '\n' + data
            SOCKET_TIMEOUT = 30000  # 30s
            email = SimpleEmail()
            email.setHostName('smtp.com')
            email.setSmtpPort(25)
            email.setSocketConnectionTimeout(SOCKET_TIMEOUT)
            email.setSocketTimeout(SOCKET_TIMEOUT)
            email.setFrom('R@group.com')
            for toAddress in toAddress.split(','):
                email.addTo(toAddress)
            email.setSubject('complaince report')
            email.addHeader('X-Priority', '1')
            email.setMsg(str(msg))
            email.send()

    def _createRequest(self):
        return HttpRequest(self.jira_server, self.username, self.password)

    def _link(self, issue):
        return '[{0}]({1}/browse/{0})'.format(issue, self.jira_server['url'])
This is the calling code. appid and toAddress will be passed in from a different UI.
from Complaince import ComplainceServer
jira = ComplainceServer(jiraServer, username, password)
issues = jira.checkComplaince(appid, toAddress)
I want the issue ID to be an embedded link. Currently the email renders as:
MT-4353(https://check.com/login/browse/MT-4353) - Site Sc: DM isg_cq5
but I want MT-4353 to be a hyperlink to the URL https://check.com/login/browse/MT-4353.
Firstly, you need to encode your email as HTML. I'm not familiar with the library you are using, so I cannot give an example of this.
I have replaced a snippet of your code with HTML syntax just to illustrate the point that you are meant to use HTML syntax to get clickable links in an email.
msg = "<p>WARNING: The below tickets are non-compliant in fortify, please fix them or raise exception.</p>"
issue1 = data['issues'][0]['key']
for item in data['issues']:
issue = item['key']
issues[issue] = item['fields']['summary']
data = u"<a href='{0}'>{1}</a>".format(self._link(issue), item['fields']['summary'])
msg += '<br />'+ data
In future, please phrase your questions carefully: your title does not indicate what you are actually asking. You also have spelling mistakes: Compliant.
Oh, I missed the point of self._link(issue) not returning the correct link. It returns MT-4353(https://check.com/login/browse/MT-4353), so you are going to need to extract the link part between the brackets. I suggest a regular expression.
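For illustration, a hedged sketch of that extraction with a regular expression (assuming the text always has the form KEY(URL), as in the example above):

    import re

    def split_link(text):
        # "MT-4353(https://check.com/login/browse/MT-4353)" -> ("MT-4353", "https://...")
        m = re.match(r'^(.+?)\((.+?)\)$', text)
        if m:
            return m.group(1), m.group(2)
        return text, None

    issue_id, url = split_link('MT-4353(https://check.com/login/browse/MT-4353)')
    anchor = u"<a href='{0}'>{1}</a>".format(url, issue_id)  # clickable HTML link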

Reddit to Twitter python bot error (python: praw & tweepy used)

I have a problem with this part of my code:
timeline = tweepy.Cursor(api.user_timeline).items(1)
for submission in reddit.subreddit('StonerPhilosophy').top('hour', limit=1):
    if len(submission.title) <= 280:
        try:
            api.update_status(status = submission.title)
            sleep(120)
            for tweet in timeline:
                api.update_status(status = 'Credits: Posted by /u/' + str(submission.author) + 'url: redd.it/' + str(submission.id) , in_reply_to_status_id = tweet.id)
        except:
            print('Fail')
    elif len(submission.title) <= 560:
        try:
            s = submission.title
            first_half = s[0:len(s)//2]
            second_half = s[len(s)//2 if len(s)%2 == 0 else ((len(s)//2)+1):]
            api.update_status(status = first_half)
            for tweet in timeline:
                api.update_status(status = second_half) , in_reply_to_status_id = tweet.id)
            sleep(120)
            for tweet in timeline:
                api.update_status(status = 'Credits: Posted by /u/' + str(submission.author) + 'url: redd.it/' + str(submission.id)) , in_reply_to_status_id = tweet.id)
        except:
            print('Fail')
When I try to run it I get 'Invalid Syntax' as an error. The problem is with tweet.id (but for some reason only the second and third occurrences; there's no problem with the first one).
What I am doing with tweet.id is getting the ID of my last tweet so that I can reply to it with either the credits or the second part of the tweet (I have to break tweets into two parts if they are longer than 280 characters), and then reply again with the credits. I have been trying to fix this for hours.
Not sure what your API calls should look like, but assuming api.update_status() takes a status and a tweet ID, you are only passing it a status. If you look at the brackets,
    api.update_status(status = second_half)
is all that you would actually call, if the rest of the line didn't make it fail at parse time.
You might have better luck with:
    api.update_status(status = second_half, in_reply_to_status_id = tweet.id)
More information on your API/code would be helpful.
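Applied to both lines that fail, a minimal sketch of the fix (keeping the question's variable names; the keyword argument simply moves inside the call's parentheses):

    for tweet in timeline:
        api.update_status(status=second_half, in_reply_to_status_id=tweet.id)
    sleep(120)
    for tweet in timeline:
        api.update_status(status='Credits: Posted by /u/' + str(submission.author) +
                          'url: redd.it/' + str(submission.id),
                          in_reply_to_status_id=tweet.id)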

Why is one of my functions running twice?

The function search_for_song(pbody) is running twice, and I can't figure out why.
I would like some help; I just started learning Python a few days ago.
Here's the full code:
# a bot that replies with youtube songs that were mentioned in the comments
import traceback
import praw
import time
import sqlite3
import requests
from lxml import html
import socket
import errno
import re
import urllib
from bs4 import BeautifulSoup
import sys
import urllib2

'''USER CONFIGURATION'''
APP_ID = ""
APP_SECRET = ""
APP_URI = ""
APP_REFRESH = ""
# https://www.reddit.com/comments/3cm1p8/how_to_make_your_bot_use_oauth2/
USERAGENT = "Python automatic youtube linkerbot"
# This is a short description of what the bot does.
# For example "Python automatic replybot v2.0 (by /u/GoldenSights)"
SUBREDDIT = "kqly"
# This is the sub or list of subs to scan for new posts. For a single sub, use "sub1". For multiple subreddits, use "sub1+sub2+sub3+..."
DO_SUBMISSIONS = False
DO_COMMENTS = True
# Look for submissions, comments, or both.
KEYWORDS = ["linksong"]
# These are the words you are looking for
KEYAUTHORS = []
# These are the names of the authors you are looking for
# The bot will only reply to authors on this list
# Keep it empty to allow anybody.
#REPLYSTRING = "**Hi, I'm a bot.**"
# This is the word you want to put in reply
MAXPOSTS = 100
# This is how many posts you want to retrieve all at once. PRAW can download 100 at a time.
WAIT = 30
# This is how many seconds you will wait between cycles. The bot is completely inactive during this time.
CLEANCYCLES = 10
# After this many cycles, the bot will clean its database
# Keeping only the latest (2*MAXPOSTS) items
'''All done!'''

try:
    import bot
    USERAGENT = bot.aG
except ImportError:
    pass

print('Opening SQL Database')
sql = sqlite3.connect('sql.db')
cur = sql.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS oldposts(id TEXT)')

print('Logging in...')
r = praw.Reddit(USERAGENT)
r.set_oauth_app_info(APP_ID, APP_SECRET, APP_URI)
r.refresh_access_information(APP_REFRESH)

def replybot():
    print('Searching %s.' % SUBREDDIT)
    subreddit = r.get_subreddit(SUBREDDIT)
    posts = []
    if DO_SUBMISSIONS:
        posts += list(subreddit.get_new(limit=MAXPOSTS))
    if DO_COMMENTS:
        posts += list(subreddit.get_comments(limit=MAXPOSTS))
    posts.reverse()
    for post in posts:
        #print ("Searching for another the next comment")
        # Anything that needs to happen every loop goes here.
        pid = post.id
        try:
            pauthor = post.author.name
        except AttributeError:
            # Author is deleted. We don't care about this post.
            continue
        if pauthor.lower() == r.user.name.lower():
            # Don't reply to yourself, robot!
            print('Will not reply to myself.')
            continue
        if KEYAUTHORS != [] and all(auth.lower() != pauthor for auth in KEYAUTHORS):
            # This post was not made by a keyauthor
            continue
        cur.execute('SELECT * FROM oldposts WHERE ID=?', [pid])
        if cur.fetchone():
            # Post is already in the database
            continue
        if isinstance(post, praw.objects.Comment):
            pbody = post.body
        else:
            pbody = '%s %s' % (post.title, post.selftext)
        pbody = pbody.lower()
        if not any(key.lower() in pbody for key in KEYWORDS):
            # Does not contain our keyword
            continue
        cur.execute('INSERT INTO oldposts VALUES(?)', [pid])
        sql.commit()
        print('Replying to %s by %s' % (pid, pauthor))
        try:
            if search_for_song(pbody):
                # pbody=pbody[8:]
                # pbody=pbody.replace("\n", "")
                temp = pbody[8:].lstrip()
                post.reply("[**"+temp+"**]("+search_for_song(pbody)+") \n ---- \n ^^This ^^is ^^an ^^automated ^^message ^^by ^^a ^^bot, ^^if ^^you ^^found ^^any ^^bug ^^and/or ^^willing ^^to ^^contact ^^me. [**^^Press ^^here**](https://www.reddit.com/message/compose?to=itailitai)")
        except praw.errors.Forbidden:
            print('403 FORBIDDEN - is the bot banned from %s?' % post.subreddit.display_name)

def search_for_song(pbody):
    #print("in search_for_song")
    song = pbody
    if len(song) > 8:
        song = song[8:]
        if song.isspace() == True or song == '':
            return False
        else:
            print("Search if %s exists in the database" % song)
            #HEADERS = {'User-Agent': 'Song checker - check if songs exists by searching this website, part of a bot for reddit'}
            author, song_name = song_string_generator(song)
            url = 'http://www.songlyrics.com/'+author+'/'+song_name+'-lyrics/'
            print url
            #page = requests.get(url, HEADERS)
            check = 1
            while check == 1:
                try:
                    headers = { 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0' }
                    req = urllib2.Request(url, None, headers)
                    page = urllib2.urlopen(req)
                    check = 2
                except socket.error as error:
                    pass
                except Exception:
                    print('An error occurred while trying to verify song existence')
                    return False
            soup = BeautifulSoup(page.read(), "lxml")
            if "Please check the spelling and try again" not in soup.get_text():
                print ("Song was found in the database!")
                result = first_youtube(song)
                return result
            else:
                print ("Song was not found in the database!")
                return False

def song_string_generator(song):
    #print("in song_string_generator")
    song = song
    author, song_name = '', ''
    try:
        if "-" in song:
            l = song.split('-', 1)
            print ("2 ", l)
            author = l[0]
            song_name = l[1]
        elif "by" in song:
            l = song.split('by', 1)
            print ("2 ", l)
            author = l[1]
            song_name = l[0]
        song_name = " ".join(song_name.split())
        author = " ".join(author.split())
        print (author, song_name)
        if author == 'guns and roses':
            author = "guns n' roses"
        song_name = song_name.replace("\n", "")
        author = author.replace("\n", "")
        author = author.replace(" ", "-")
        song_name = song_name.replace(" ", "-")
        author = author.replace("'", "-")
        song_name = song_name.replace("'", "-")
        song_name = song_name.rstrip()
        song_name = " ".join(song_name.split())
        return author, song_name
    except:
        print ("No song was mentioned in the comment!")
        return False

def first_youtube(textToSearch):
    reload(sys)
    sys.setdefaultencoding('UTF-8')
    query_string = textToSearch
    try:
        html_content = urllib.urlopen("http://www.youtube.com/results?search_query=" + query_string)
        search_results = re.findall(r'href=\"\/watch\?v=(.{11})', html_content.read().decode())
        result = "http://www.youtube.com/watch?v=" + search_results[0]
        return result
    except IOError:
        print ("IOError occurred while contacting Youtube!")
    except Exception:
        print ("A non IOError occurred while contacting Youtube!")
        return False

cycles = 0
while True:
    try:
        replybot()
        cycles += 1
    except Exception as e:
        traceback.print_exc()
    if cycles >= CLEANCYCLES:
        print('Cleaning database')
        cur.execute('DELETE FROM oldposts WHERE id NOT IN (SELECT id FROM oldposts ORDER BY id DESC LIMIT ?)', [MAXPOSTS * 2])
        sql.commit()
        cycles = 0
    print('Running again in %d seconds \n' % WAIT)
    time.sleep(WAIT)
This is the output I'm getting:
Opening SQL Database
Logging in...
Searching kqly.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Replying to d0kwcrs by itailitai
Search if guns and roses - paradise city exists in the database
('2 ', [u' guns and roses ', u' paradise city'])
(u'guns and roses', u'paradise city')
http://www.songlyrics.com/guns-n--roses/paradise-city-lyrics/
Song was found in the database!
Search if guns and roses - paradise city exists in the database
('2 ', [u' guns and roses ', u' paradise city'])
(u'guns and roses', u'paradise city')
http://www.songlyrics.com/guns-n--roses/paradise-city-lyrics/
Song was found in the database!
Running again in 30 seconds
It's a bot for reddit that replies with the YouTube video of a song mentioned in the comments, if anyone wants to know.
With a cursory reading of your code, you have
    if search_for_song(pbody):
        # do stuff..
        post.reply("[**"+temp+"**]("+search_for_song(pbody)+") \n ---- \n ^^This ^^is ^^an ^^automated ^^message ^^by ^^a ^^bot, ^^if ^^you ^^found ^^any ^^bug ^^and/or ^^willing ^^to ^^contact ^^me. [**^^Press ^^here**](https://www.reddit.com/message/compose?to=itailitai)")
You call the function at the start of the if and again in your post.reply line.
RESPONDING TO COMMENTS
If you need to check the result but don't want to call the function twice, simply save the output:
    res = search_for_song(pbody)
    if res:
        #...
        post.reply(... + res + ...)
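Concretely, applied to the bot's reply block, that looks something like this (variable names taken from the question):

    result = search_for_song(pbody)  # call once, reuse the value
    if result:
        temp = pbody[8:].lstrip()
        post.reply("[**" + temp + "**](" + result + ")")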
I've just quickly searched for the function call search_for_song; I suppose the following piece of code results in two function calls:
    if search_for_song(pbody):
        # pbody=pbody[8:]
        # pbody=pbody.replace("\n", "")
        temp=pbody[8:].lstrip()
        post.reply("[**"+temp+"**]("+search_for_song(pbody)+")
Once at the if statement, and once inside the post.reply statement.
