Reddit to Twitter python bot error (python: praw & tweepy used) - python

I have a problem with this part of my code:
timeline = tweepy.Cursor(api.user_timeline).items(1)
for submission in reddit.subreddit('StonerPhilosophy').top('hour', limit=1):
    if len(submission.title) <= 280:
        try:
            api.update_status(status = submission.title)
            sleep(120)
            for tweet in timeline:
                api.update_status(status = 'Credits: Posted by /u/' + str(submission.author) + 'url: redd.it/' + str(submission.id), in_reply_to_status_id = tweet.id)
        except:
            print('Fail')
    elif len(submission.title) <= 560:
        try:
            s = submission.title
            first_half = s[0:len(s)//2]
            second_half = s[len(s)//2 if len(s)%2 == 0 else ((len(s)//2)+1):]
            api.update_status(status = first_half)
            for tweet in timeline:
                api.update_status(status = second_half) , in_reply_to_status_id = tweet.id)
            sleep(120)
            for tweet in timeline:
                api.update_status(status = 'Credits: Posted by /u/' + str(submission.author) + 'url: redd.it/' + str(submission.id)) , in_reply_to_status_id = tweet.id)
        except:
            print('Fail')
When I try to run it I get 'Invalid Syntax' as an error. The problem is with 'tweet.id' (and for some reason only the second and third occurrences; no problem with the first one).
What I am doing with 'tweet.id' is getting the id of my last tweet so that I can reply to it with either the credits or the second part of the tweet (I have to break tweets into two parts if they are longer than 280 characters), and then another reply with the credits. I have been trying to fix this for hours.

Not sure what your API calls should look like, but assuming
api.update_status() takes a status and a tweet ID, you are only passing it a status. If you look at the brackets on it,
api.update_status(status = second_half) is all that you would be calling, if the rest of the line didn't fail at parse time.
You might have better luck with:
api.update_status(status = second_half, in_reply_to_status_id = tweet.id)
More information on your API/code would be helpful.
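To put the fix in context, here is a sketch of the corrected flow (assuming `api` is an authenticated tweepy.API instance and `submission` comes from PRAW as in the question; in tweepy, `update_status()` returns the posted Status, so its id can be used directly instead of re-reading the timeline):

```python
def split_in_half(s):
    """Split s into two halves; the first half takes the extra character."""
    mid = (len(s) + 1) // 2
    return s[:mid], s[mid:]

def post_with_credits(api, submission):
    # Keyword arguments must sit inside a single call -- the original
    # `api.update_status(status = x) , in_reply_to_status_id = y)` closes
    # the call after `x`, which is the reported 'Invalid Syntax'.
    first, second = split_in_half(submission.title)
    tweet = api.update_status(status=first)
    reply = api.update_status(status=second, in_reply_to_status_id=tweet.id)
    credits = 'Credits: Posted by /u/%s url: redd.it/%s' % (submission.author, submission.id)
    api.update_status(status=credits, in_reply_to_status_id=reply.id)
```

Chaining off the returned Status also sidesteps the one-shot `timeline` cursor, which is exhausted after its first iteration.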

Related

JSON request returns a blank response randomly causing script to fail

I have a python script that is failing when my request returns a blank JSON response. The code loops through the function repeatedly and is successful 99% of the time, but fails every once in a while when a blank JSON response is received.
i = 0
while i < 1000:
    r = 0
    while r < 4:
        time.sleep(5)
        response = c.get_quote(symbol)
        jsonData = response.json()
        for key in jsonData:
            jsonkey = key
        if jsonkey == symbol:
            print(i)
            print("you are good", response)
            print(jsonData)
            print()
            break
        else:
            print("you have a problem, jsonkey=")
            print()
            print(jsonData)
            print()
            r =+ 1
    current_price = response.json()[symbol]["lastPrice"]
    i += 1
The 'while r < 4:' loop was added in an attempt to add error handling. If I can figure out what to trap on, I would retry response = c.get_quote(symbol), but the blank JSON response is slipping past the if jsonkey == symbol logic.
The error message received is:
current_price = response.json()[symbol]["lastPrice"]
KeyError: 'NVCR'
and the output from print(jsonData) is: {}
as opposed to a healthy response, which contains the symbol as a key with additional data to follow. The request is returning a response [200], so unfortunately it isn't that simple...
Instead of validating the key with jsonkey == symbol, use a try-except block to catch the blank response errors and handle them.
For instance:
i = 0
while i < 1000:
    time.sleep(5)
    response = c.get_quote(symbol)
    jsonData = response.json()
    try:
        for key in jsonData:
            jsonkey = key
            if jsonkey == symbol:
                print(i)
                print("you are good", response)
                print(jsonData, "\n")
                break
    except:
        print("you have a problem \n")
        print(jsonData, "\n")
    current_price = response.json()[symbol]["lastPrice"]
    i += 1
@DeepSpace is also likely correct in the comments. My guess is that the server that you're pulling JSON data from (nsetools?) is throttling your requests, so it might be worth looking deeper into their docs to see if you can find a limit, and then use time.sleep() to stay under it.
Edit: If you are using nsetools, their api seems to be built by reverse-engineering the api that the nse website is built on and performing json api calls to urls such as this one (these can be found in this source code file). Because of this, it's not documented what the rate limit is, as this data is scraped directly from NSE and subject to their rate limit. Using this data is against NSE's terms of use (unless they have express written consent from the government of India which for all I know nsetools has, but I assume you do not.)
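If throttling is the culprit, a small client-side throttle keeps requests spaced out. This is only a sketch: the safe interval for this data source is unknown, so `min_interval` is a placeholder, and the clock/sleep parameters exist mainly so the logic can be exercised without real waiting.

```python
import time

def make_throttled(func, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so consecutive calls are at least min_interval seconds apart."""
    last_call = [None]  # mutable cell so the wrapper can update it

    def wrapper(*args, **kwargs):
        now = clock()
        if last_call[0] is not None:
            wait = min_interval - (now - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper
```

Wrapping the quote call as `get_quote = make_throttled(c.get_quote)` would then enforce the spacing everywhere it is used.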
OK, so thanks to @DeepSpace and @CarlHR I think I have a solution, but it still seems like there is too much code for what I am trying to accomplish. This works:
i = 0
while i < 1000:
    r = 1
    while r < 5:
        time.sleep(1)
        response = c.get_quote(symbol)
        jsonData = response.json()
        try:
            current_price = response.json()[symbol]["lastPrice"]
            print("Looks good, moving on")
            break
        except KeyError:
            print("There was a problem with the JSON response, trying again. Retry number:", r)
            print(jsonData)
            print()
            r += 1
    i += 1
    print("Moving on to the next iteration")
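The same retry logic can be folded into a helper, which trims the nested-loop bookkeeping. The `get_quote` callable and the `symbol`/`lastPrice` response shape are taken from the question; the function name and retry defaults are illustrative:

```python
import time

def fetch_last_price(get_quote, symbol, retries=4, delay=1, sleep=time.sleep):
    """Return lastPrice for `symbol`, retrying when the JSON comes back blank."""
    for attempt in range(1, retries + 1):
        data = get_quote(symbol).json()
        try:
            return data[symbol]["lastPrice"]
        except KeyError:
            # A blank response is just {}, so the symbol key is missing.
            print("Blank JSON response, retry number:", attempt)
            sleep(delay)
    raise KeyError(symbol)  # every retry came back blank
```

The outer `while i < 1000` loop then shrinks to a single `current_price = fetch_last_price(c.get_quote, symbol)` call per iteration.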

data scraping on discord using python

I'm currently trying to learn web scraping and decided to scrape some discord data. Code follows:
import requests
import json

def retrieve_messages(channelid):
    num = 0
    headers = {
        'authorization': 'here we enter the authorization code'
    }
    r = requests.get(
        f'https://discord.com/api/v9/channels/{channelid}/messages?limit=100', headers=headers
    )
    jsonn = json.loads(r.text)
    for value in jsonn:
        print(value['content'], '\n')
        num = num + 1
    print('number of messages we collected is', num)

retrieve_messages('server id goes here')
The problem: when I tried changing the limit here (messages?limit=100), it apparently only accepts numbers between 0 and 100, meaning the maximum number of messages I can get is 100. I tried changing this number to 900, for example, to scrape more messages, but then I get the error TypeError: string indices must be integers.
Any ideas on how I could get, possibly, all the messages in a channel?
Thank you very much for reading!
APIs that return a bunch of records are almost always limited to some number of items.
Otherwise, if a large quantity of items is requested, the API may fail due to being out of memory.
For that purpose, most APIs implement pagination using limit, before and after parameters where:
limit: tells you how many messages to fetch
before: get messages before this message ID
after: get messages after this message ID
Discord API is no exception as the documentation tells us.
Here's how you do it:
First, you will need to query the data multiple times.
For that, you can use a while loop.
Make sure to add a condition that will prevent the loop from running indefinitely - I added a check for whether there are any messages left.
while True:
    # ... requests code
    jsonn = json.loads(r.text)
    if len(jsonn) == 0:
        break
    for value in jsonn:
        print(value['content'], '\n')
        num = num + 1
Define a variable that holds the id of the last message you fetched, updating it as you print each message:
def retrieve_messages(channelid):
    last_message_id = None
    while True:
        # ...
        for value in jsonn:
            print(value['content'], '\n')
            last_message_id = value['id']
            num = num + 1
Now on the first run last_message_id is None, and on subsequent requests it holds the id of the last message you printed.
Use that to build your query:
while True:
    query_parameters = f'limit={limit}'
    if last_message_id is not None:
        query_parameters += f'&before={last_message_id}'
    r = requests.get(
        f'https://discord.com/api/v9/channels/{channelid}/messages?{query_parameters}', headers=headers
    )
    # ...
# ...
Note: Discord servers give you the latest messages first, so you have to use the before parameter.
Here's a fully working example of your code
import requests
import json

def retrieve_messages(channelid):
    num = 0
    limit = 10
    headers = {
        'authorization': 'auth header here'
    }
    last_message_id = None
    while True:
        query_parameters = f'limit={limit}'
        if last_message_id is not None:
            query_parameters += f'&before={last_message_id}'
        r = requests.get(
            f'https://discord.com/api/v9/channels/{channelid}/messages?{query_parameters}', headers=headers
        )
        jsonn = json.loads(r.text)
        if len(jsonn) == 0:
            break
        for value in jsonn:
            print(value['content'], '\n')
            last_message_id = value['id']
            num = num + 1
    print('number of messages we collected is', num)

retrieve_messages('server id here')
To answer this question, we must look at the discord API. Googling "discord api get messages" gets us the developer reference for the discord API. The particular endpoint you are using is documented here:
https://discord.com/developers/docs/resources/channel#get-channel-messages
The limit is documented here, along with the around, before, and after parameters. Using one of these parameters (most likely after) we can paginate the results.
In pseudocode, it would look something like this:
offset = 0
limit = 100
all_messages = []
while True:
    r = requests.get(
        f'https://discord.com/api/v9/channels/{channelid}/messages?limit={limit}&after={offset}', headers=headers
    )
    all_messages.append(extract messages from response)
    if (number of responses < limit):
        break  # We have reached the end of all the messages, exit the loop
    else:
        offset += limit
By the way, you will probably want to print(r.text) right after the response comes in so you can see what the response looks like. It will save a lot of confusion.
Here is my solution. Feedback is welcome as I'm newish to Python. Kindly provide me w/ credit/good-luck if using this. Thank you =)
import requests

CHANNELID = 'REPLACE_ME'
HEADERS = {'authorization': 'REPLACE_ME'}
LIMIT = 100
all_messages = []

r = requests.get(f'https://discord.com/api/v9/channels/{CHANNELID}/messages?limit={LIMIT}', headers=HEADERS)
all_messages.extend(r.json())
print(f'len(r.json()) is {len(r.json())}', '\n')

while len(r.json()) == LIMIT:
    last_message_id = r.json()[-1].get('id')
    r = requests.get(f'https://discord.com/api/v9/channels/{CHANNELID}/messages?limit={LIMIT}&before={last_message_id}', headers=HEADERS)
    all_messages.extend(r.json())
    print(f'len(r.json()) is {len(r.json())} and last_message_id is {last_message_id} and len(all_messages) is {len(all_messages)}')
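The before-cursor pattern used by all of these answers can also be written as a generator with the HTTP call injected, so the paging logic stands alone and can be tested without a token. Here `fetch(params)` is assumed to return one page (a list of message dicts, newest first) the way the Discord endpoint does:

```python
def paginate_messages(fetch, limit=100):
    """Yield messages newest-first until a short or empty page appears."""
    before = None
    while True:
        params = {'limit': limit}
        if before is not None:
            params['before'] = before
        page = fetch(params)
        if not page:
            break
        for message in page:
            yield message
        if len(page) < limit:
            break  # a short page means we reached the oldest message
        before = page[-1]['id']
```

A real `fetch` would wrap `requests.get` with the channel URL and auth header; in a test it can just return canned pages.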

Python Bot with tweepy

I'm trying to code a Twitter bot using the tweepy lib, but I'm not getting results. I need help with code to reply to tweets that mention me.
search = '#MoviesRandom'
numberOfTweets = 10
phrase = movies()  # Here im using a function declared by me before. Doesn't having errors here

for tweet in tweepy.Cursor(api.search, search).items(numberOfTweets):
    try:
        tweetId = tweet.user.idusername
        username = tweet.user.screen_name
        api.update_status("#" + username + " " + phrase, in_reply_to_status_id=tweetId)
        print("Replied with " + phrase)
    except tweepy.TweepError as e:
        print(e.reason)
It's likely caused by this line here:
tweetId = tweet.user.idusername
There is no such attribute as idusername and, as @Andy mentioned, it should just be the id attribute:
tweetId = tweet.user.id
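Two further details worth flagging: in the Twitter API, in_reply_to_status_id expects the id of the tweet being replied to (tweet.id, not the user's id), and a mention needs an '@' prefix rather than the '#' used in the question. A hedged sketch of the repaired loop body, assuming the `api`, `tweepy`, and `phrase` objects from the question:

```python
def build_reply(username, phrase):
    """Compose the mention reply; '@' (not '#') addresses the user."""
    return "@{} {}".format(username, phrase)

# Inside the question's loop, the fixed calls would look like:
#
#     for tweet in tweepy.Cursor(api.search, search).items(numberOfTweets):
#         api.update_status(build_reply(tweet.user.screen_name, phrase),
#                           in_reply_to_status_id=tweet.id)
```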

why is web scraping with Robobrowser in Python causing "Task was destroyed but it is pending!"

I'm writing code for a Discord bot which searches different game hosting sites. It searches for an image and a description in the html of the page using Robobrowser.
Before, I had no issues. However, I just added a case for the Google Play Store, and now it tells me "Task was destroyed but it is pending!" when it tries to get those items through a Google Play Store link.
I don't know why this is happening, nor how to fix it. I looked up all the other "Task was destroyed..." cases, but none were similar to mine.
I've tried threading it and awaiting it. Robobrowser cannot be awaited, so that didn't work. Threading also didn't work because I need the functions to return a string. I know it's possible to return something while using a different thread, but it was overly complex for what I'm trying to fix.
Here is my code:
def get_embed_caption(url):
    print("Getting caption")
    desc = None
    if url != "No Link":
        try:
            browser.open(url)
            desc = "something"
        except:
            print("Caption ERROR with url")
            desc = None
    if desc != None:
        if "itch.io" in url and " " not in url:
            parse = browser.parsed
            parse = str(parse)
            pos2 = parse.find("og:description")
            pos1 = parse.rfind('content=', 0, pos2)
            desc_type = parse[pos1+8:pos1+9]
            pos2 = parse.rfind(desc_type, 0, pos2-2)
            pos1 = parse.find(desc_type, pos1)
            desc = parse[pos1+1:pos2]
            if len(desc) > 1000:
                desc = desc[:1000]
            if "/><" in desc:
                pos = parse.find("formatted_description user_formatted")
                pos = parse.find("<p>", pos)
                desc = parse[pos+3:parse.find('</p>', pos)]
        elif "steam" in url and " " not in url:
            parse = browser.parsed
            parse = str(parse)
            pos = parse.find("game_description_snippet")
            pos = parse.find('"', pos)
            pos = parse.find('>', pos)
            desc = parse[pos+1:parse.find('<', pos+1)]
        elif "play.google" in url and " " not in url:
            parse = browser.parsed
            parse = str(parse)
            pos = parse.find('aria-label="Description"')
            print(parse[pos:pos+20])
            pos = parse.rfind("content", 0, pos)
            print(parse[pos:pos+20])
            pos = parse.find('"', pos)
            print(parse[pos:pos+20])
            desc = parse[pos+1:parse.find('"', pos+1)]
        else:
            print("No caption")
            desc = None
        if desc != None:
            desc = desc.replace("<p>", "")
            desc = desc.replace("</p>", "")
            desc = desc.replace("<em>", "`")
            desc = desc.replace("</em>", "`")
            desc = desc.replace("<br>", "")
            desc = desc.replace("<br/>", "")
    return desc
Task was destroyed but it is pending!
task: <Task pending coro=<Client._run_event() running at C:\Users\Gman\AppData\Local\Programs\Python\Python36\lib\site-packages\discord\client.py:307> wait_for=<Future pending cb=[BaseSelectorEventLoop._sock_connect_done(696)(), <TaskWakeupMethWrapper object at 0x0000000005DEAA98>()]>>
It seems to run through the process just fine, but right when it finishes, it crashes.
Google Play Store has a lot of gibberish HTML, probably to make web scraping there difficult on purpose. This led to it taking more than 10 seconds to parse the page. I do not know what caused the task to destroy itself, however.
The fix was to use Python's play-scraper library, which was 20x faster, taking less than half a second to gather the info.
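Independent of the play-scraper fix, the offset arithmetic with str.find above is fragile against markup changes. A sturdier sketch using only the stdlib html.parser to read the og:description meta tag (the same tag the itch.io branch is already hunting for); the class and function names here are mine:

```python
from html.parser import HTMLParser

class MetaDescription(HTMLParser):
    """Collects the content of a <meta property="og:description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if attrs.get('property') == 'og:description':
            self.description = attrs.get('content')

def get_description(html_text):
    parser = MetaDescription()
    parser.feed(html_text)
    return parser.description
```

Feeding `str(browser.parsed)` (or the raw response text) to `get_description()` would replace the manual `find`/`rfind` slicing for any site that publishes Open Graph tags.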

Why is one of my functions running twice?

The function search_for_song(pbody) is running twice, and I can't figure out why.
I would like some help; I just started learning Python a few days ago.
Here's the full code:
#a bot that replies with youtube songs that were mentioned in the comments
import traceback
import praw
import time
import sqlite3
import requests
from lxml import html
import socket
import errno
import re
import urllib
from bs4 import BeautifulSoup
import sys
import urllib2

'''USER CONFIGURATION'''
APP_ID = ""
APP_SECRET = ""
APP_URI = ""
APP_REFRESH = ""
# https://www.reddit.com/comments/3cm1p8/how_to_make_your_bot_use_oauth2/
USERAGENT = "Python automatic youtube linkerbot"
# This is a short description of what the bot does.
# For example "Python automatic replybot v2.0 (by /u/GoldenSights)"
SUBREDDIT = "kqly"
# This is the sub or list of subs to scan for new posts. For a single sub, use "sub1". For multiple subreddits, use "sub1+sub2+sub3+..."
DO_SUBMISSIONS = False
DO_COMMENTS = True
# Look for submissions, comments, or both.
KEYWORDS = ["linksong"]
# These are the words you are looking for
KEYAUTHORS = []
# These are the names of the authors you are looking for
# The bot will only reply to authors on this list
# Keep it empty to allow anybody.
#REPLYSTRING = "**Hi, I'm a bot.**"
# This is the word you want to put in reply
MAXPOSTS = 100
# This is how many posts you want to retrieve all at once. PRAW can download 100 at a time.
WAIT = 30
# This is how many seconds you will wait between cycles. The bot is completely inactive during this time.
CLEANCYCLES = 10
# After this many cycles, the bot will clean its database
# Keeping only the latest (2*MAXPOSTS) items
'''All done!'''

try:
    import bot
    USERAGENT = bot.aG
except ImportError:
    pass

print('Opening SQL Database')
sql = sqlite3.connect('sql.db')
cur = sql.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS oldposts(id TEXT)')

print('Logging in...')
r = praw.Reddit(USERAGENT)
r.set_oauth_app_info(APP_ID, APP_SECRET, APP_URI)
r.refresh_access_information(APP_REFRESH)

def replybot():
    print('Searching %s.' % SUBREDDIT)
    subreddit = r.get_subreddit(SUBREDDIT)
    posts = []
    if DO_SUBMISSIONS:
        posts += list(subreddit.get_new(limit=MAXPOSTS))
    if DO_COMMENTS:
        posts += list(subreddit.get_comments(limit=MAXPOSTS))
    posts.reverse()
    for post in posts:
        #print ("Searching for another the next comment")
        # Anything that needs to happen every loop goes here.
        pid = post.id
        try:
            pauthor = post.author.name
        except AttributeError:
            # Author is deleted. We don't care about this post.
            continue
        if pauthor.lower() == r.user.name.lower():
            # Don't reply to yourself, robot!
            print('Will not reply to myself.')
            continue
        if KEYAUTHORS != [] and all(auth.lower() != pauthor for auth in KEYAUTHORS):
            # This post was not made by a keyauthor
            continue
        cur.execute('SELECT * FROM oldposts WHERE ID=?', [pid])
        if cur.fetchone():
            # Post is already in the database
            continue
        if isinstance(post, praw.objects.Comment):
            pbody = post.body
        else:
            pbody = '%s %s' % (post.title, post.selftext)
        pbody = pbody.lower()
        if not any(key.lower() in pbody for key in KEYWORDS):
            # Does not contain our keyword
            continue
        cur.execute('INSERT INTO oldposts VALUES(?)', [pid])
        sql.commit()
        print('Replying to %s by %s' % (pid, pauthor))
        try:
            if search_for_song(pbody):
                # pbody=pbody[8:]
                # pbody=pbody.replace("\n", "")
                temp=pbody[8:].lstrip()
                post.reply("[**"+temp+"**]("+search_for_song(pbody)+") \n ---- \n ^^This ^^is ^^an ^^automated ^^message ^^by ^^a ^^bot, ^^if ^^you ^^found ^^any ^^bug ^^and/or ^^willing ^^to ^^contact ^^me. [**^^Press ^^here**](https://www.reddit.com/message/compose?to=itailitai)")
        except praw.errors.Forbidden:
            print('403 FORBIDDEN - is the bot banned from %s?' % post.subreddit.display_name)

def search_for_song(pbody):
    #print("in search_for_song")
    song=pbody
    if len(song)>8:
        song=song[8:]
    if song.isspace()==True or song=='':
        return False
    else:
        print("Search if %s exists in the database" % song)
        #HEADERS = {'User-Agent': 'Song checker - check if songs exists by searching this website, part of a bot for reddit'}
        author, song_name = song_string_generator(song)
        url = 'http://www.songlyrics.com/'+author+'/'+song_name+'-lyrics/'
        print url
        #page = requests.get(url, HEADERS)
        check=1
        while check==1:
            try:
                headers = { 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0' }
                req = urllib2.Request(url, None, headers)
                page = urllib2.urlopen(req)
                check=2
            except socket.error as error:
                pass
            except Exception:
                print('An error occured while trying to verify song existence')
                return False
        soup = BeautifulSoup(page.read(), "lxml")
        if "Please check the spelling and try again" not in soup.get_text():
            print ("Song was found in the database!")
            result=first_youtube(song)
            return result
        else:
            print ("Song was not found in the database!")
            return False

def song_string_generator(song):
    #print("in song_string_generator")
    author,song_name= '',''
    try:
        if "-" in song:
            l=song.split('-', 1)
            print ("2 ",l)
            author=l[0]
            song_name=l[1]
        elif "by" in song:
            l=song.split('by', 1)
            print ("2 ",l)
            author=l[1]
            song_name=l[0]
        song_name=" ".join(song_name.split())
        author=" ".join(author.split())
        print (author,song_name)
        if author == 'guns and roses':
            author="guns n' roses"
        song_name=song_name.replace("\n", "")
        author=author.replace("\n", "")
        author=author.replace(" ", "-")
        song_name=song_name.replace(" ", "-")
        author=author.replace("'", "-")
        song_name=song_name.replace("'", "-")
        song_name=song_name.rstrip()
        song_name=" ".join(song_name.split())
        return author, song_name
    except:
        print ("No song was mentioned in the comment!")
        return False

def first_youtube(textToSearch):
    reload(sys)
    sys.setdefaultencoding('UTF-8')
    query_string = textToSearch
    try:
        html_content = urllib.urlopen("http://www.youtube.com/results?search_query=" + query_string)
        search_results = re.findall(r'href=\"\/watch\?v=(.{11})', html_content.read().decode())
        result="http://www.youtube.com/watch?v=" + search_results[0]
        return result
    except IOError:
        print ("IOError Occured while contacting Youtube!")
    except Exception:
        print ("A non IOError Occured while contacting Youtube!")
    return False

cycles = 0
while True:
    try:
        replybot()
        cycles += 1
    except Exception as e:
        traceback.print_exc()
    if cycles >= CLEANCYCLES:
        print('Cleaning database')
        cur.execute('DELETE FROM oldposts WHERE id NOT IN (SELECT id FROM oldposts ORDER BY id DESC LIMIT ?)', [MAXPOSTS * 2])
        sql.commit()
        cycles = 0
    print('Running again in %d seconds \n' % WAIT)
    time.sleep(WAIT)
This is the output I'm getting:
Opening SQL Database
Logging in...
Searching kqly.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Will not reply to myself.
Replying to d0kwcrs by itailitai
Search if guns and roses - paradise city exists in the database
('2 ', [u' guns and roses ', u' paradise city'])
(u'guns and roses', u'paradise city')
http://www.songlyrics.com/guns-n--roses/paradise-city-lyrics/
Song was found in the database!
Search if guns and roses - paradise city exists in the database
('2 ', [u' guns and roses ', u' paradise city'])
(u'guns and roses', u'paradise city')
http://www.songlyrics.com/guns-n--roses/paradise-city-lyrics/
Song was found in the database!
Running again in 30 seconds
It's a bot for reddit that replies with the YouTube video of a song that was mentioned in the comments, if anyone wants to know.
With a cursory reading of your code, you have:
if search_for_song(pbody):
    # do stuff..
    post.reply("[**"+temp+"**]("+search_for_song(pbody)+") \n ---- \n ^^This ^^is ^^an ^^automated ^^message ^^by ^^a ^^bot, ^^if ^^you ^^found ^^any ^^bug ^^and/or ^^willing ^^to ^^contact ^^me. [**^^Press ^^here**](https://www.reddit.com/message/compose?to=itailitai)")
You call the function at the start of the if and again in your post.reply line.
RESPONDING TO COMMENTS
If you need to check the result but don't want to call twice, simply save the output:
res = search_for_song(pbody)
if res:
    #...
    post.reply(... + res + ...)
I've just quickly searched for the function call search_for_song; I suppose the following piece of code results in two function calls:
if search_for_song(pbody):
    # pbody=pbody[8:]
    # pbody=pbody.replace("\n", "")
    temp=pbody[8:].lstrip()
    post.reply("[**"+temp+"**]("+search_for_song(pbody)+")
Once at the if statement, and once inside the post.reply statement.
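The double output in the log can be reproduced without Reddit at all. In this self-contained sketch, a stub stands in for search_for_song, and a list records each call the way the "Search if ... exists" lines appear in the log:

```python
calls = []

def search_for_song(pbody):
    # Stub for the real function; one append marks one full search,
    # i.e. one round of "Search if ... exists" output in the log.
    calls.append(pbody)
    return "http://www.youtube.com/watch?v=x"

# Pattern from the question: the search runs twice.
if search_for_song("linksong paradise city"):
    link = search_for_song("linksong paradise city")
assert len(calls) == 2

# Call once and reuse the result: the search runs only once.
calls[:] = []
res = search_for_song("linksong paradise city")
if res:
    link = res
assert len(calls) == 1
```

Caching the result also halves the bot's lyrics-site and YouTube traffic per reply, since the real function performs both lookups on every call.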
