Getting file input into Python script for praw script - python

So I have a simple reddit bot set up which I wrote using the praw framework. The code is as follows:
import praw
import time
import numpy
import pickle

r = praw.Reddit(user_agent="Gets the Daily General Thread from subreddit.")
print("Logging in...")
r.login()

words_to_match = ['sdfghm']
cache = []

def run_bot():
    print("Grabbing subreddit...")
    subreddit = r.get_subreddit("test")
    print("Grabbing thread titles...")
    threads = subreddit.get_hot(limit=10)
    for submission in threads:
        thread_title = submission.title.lower()
        isMatch = any(string in thread_title for string in words_to_match)
        if submission.id not in cache and isMatch:
            print("Match found! Thread ID is " + submission.id)
            r.send_message('FlameDraBot', 'DGT has been posted!', 'You are awesome!')
            print("Message sent!")
            cache.append(submission.id)
    print("Comment loop finished. Restarting...")

# Run the script
while True:
    run_bot()
    time.sleep(20)
I want to create a file (a text file, XML, or something else) through which the user can change the fields for the various pieces of information being queried. For example, I want a file with lines such as:

Words to Search for = sdfghm
Subreddit to Search in = text
Send message to = FlameDraBot

I want the info to be read field by field, so that the script takes the value after Words to Search for = rather than the whole line. After the information has been entered into the file and saved, I want my script to pull it from the file, store it in variables, and use those variables in the appropriate functions, such as:

words_to_match = ['sdfghm']
subreddit = r.get_subreddit("test")
r.send_message('FlameDraBot'....

So basically I want a config file for the script. How do I go about making my script take input from a .txt (or another appropriate) file and use it in my code?

Yes, that's just a plain old Python config, which you can implement in an ASCII file, or else in YAML or JSON.

Create a subdirectory ./config and put your settings in ./config/__init__.py; then import config.

Using PEP 8 compliant names, the file ./config/__init__.py would look like:

search_string = ['sdfghm']
subreddit_to_search = 'text'
notify = ['FlameDraBot']

If you want more complicated config, just read the many other posts on that.
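If you would rather keep the settings in a plain data file instead of Python code, here is a minimal JSON-based sketch; the filename settings.json and the key names are illustrative, not part of the original script:

import json

# settings.json might contain:
# {"words_to_match": ["sdfghm"], "subreddit": "test", "notify": "FlameDraBot"}
with open("settings.json") as f:
    cfg = json.load(f)

words_to_match = cfg["words_to_match"]          # list of strings to search for
subreddit = r.get_subreddit(cfg["subreddit"])   # subreddit to search in
# later: r.send_message(cfg["notify"], subject, body)

Editing the JSON file then changes the bot's behaviour without touching the code.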

Related

How to get twitter handle from tweet using Tweepy API 2.0

I am using the Twitter API StreamingClient from the Python module Tweepy. I am currently running a short stream, collecting tweets and saving the entire ID and text from each tweet inside a JSON object, then writing it to a file.
My goal is to collect the Twitter handle from each specific tweet and save it to a JSON file (and preferably print it in the output terminal as well).
This is what the current code looks like:
KEY_FILE = './keys/bearer_token'
DURATION = 10

def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    #print('Received tweet:', json_obj)
    print(f'Tweet Screen Name: {json_obj.user.screen_name}')
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)

bearer_token = open(KEY_FILE).read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
streaming_client.sample(threaded=True)
time.sleep(DURATION)
streaming_client.disconnect()
I have no idea how to do this; the only thing I found is that someone did this:

json_obj.user.screen_name

However, that did not work at all, and I am completely stuck.
So, a couple of things.

Firstly, I'd recommend using on_response rather than on_data, because StreamingClient already defines an on_data function that parses the JSON (and then fires on_tweet, on_response, on_error, etc.).

Secondly, json_obj.user.screen_name is part of API v1, I believe, which is why it doesn't work.

To get extra data using Twitter API v2, you'll want to use Expansions and Fields (see the Tweepy documentation and the Twitter documentation).

For your case, you'll probably want "username", which is under the user_fields.
def on_response(response: tweepy.StreamResponse):
    tweet: tweepy.Tweet = response.data
    users: list = response.includes.get("users")
    # response.includes is a dictionary holding all the requested fields
    # (user_fields, media_fields, etc.)
    # response.includes["users"] is a list of `tweepy.User`
    # the first user in the list is the author (at least from what I've tested)
    # the rest of the users in that list are anyone who is mentioned in the tweet
    author_username = users and users[0].username
    print(tweet.text, author_username)

streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_response = on_response
streaming_client.sample(threaded=True, user_fields=["id", "name", "username"])  # using user fields
time.sleep(DURATION)
streaming_client.disconnect()
Hope this helped. (Also, the Tweepy documentation definitely needs more examples for API v2.)
The updated stream code, passing expansions and fields to sample() and handling the raw payload in on_data, ends up looking like this:

KEY_FILE = './keys/bearer_token'
DURATION = 10

def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    print('Received tweet:', json_obj)
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)

bearer_token = open(KEY_FILE).read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
streaming_client.on_closed = on_finish  # on_finish: a cleanup callback assumed to be defined elsewhere
streaming_client.sample(threaded=True, expansions="author_id", user_fields="username", tweet_fields="created_at")
time.sleep(DURATION)
streaming_client.disconnect()
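As a side note, json.loads() returns a plain dict, so the payload must be read with key access rather than attribute access (which is why json_obj.user.screen_name failed). A minimal sketch of pulling the author's username out of the raw v2 payload inside on_data, assuming expansions="author_id" and user_fields="username" were requested as above:

def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    # With author_id expanded, the v2 payload carries the author under
    # includes -> users; use dict keys, not attributes.
    users = json_obj.get("includes", {}).get("users", [])
    if users:
        print("Tweet author:", users[0]["username"])
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)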

pyzmail: use a loop variable to get a single raw message

First, I should say I don't really know much about computer programming, but I find Python fairly easy to use for automating simple tasks, thanks to Al Sweigart's book "Automate the Boring Stuff."
I want to collect email text bodies; I'm trying to move homework to email to save paper. I thought I could do that by getting the numbers of the unseen mails and just looping through them. If I try that, the IDLE3 shell just becomes unresponsive: Ctrl-C does nothing, and I have to restart the shell.
Question: why can't I just use a loop variable in server.fetch()?
for msgNum in unseenMessages:
    rawMessage = server.fetch([msgNum], ['BODY[]', 'FLAGS'])
It seems you need an actual number like 57, not msgNum, in there, or it won't work.
After looking at various questions and answers here on SO, the following works for me. I suppose it collects all the email bodies in one swoop.
import pyzmail
import pprint
from imapclient import IMAPClient

server = IMAPClient('imap.qq.com', use_uid=True, ssl=True)
server.login('myEmail@foxmail.com', 'myIMAPpassword')
select_info = server.select_folder('Inbox')
unseenMessages = server.search(['UNSEEN'])
rawMessage = server.fetch(unseenMessages, ['BODY[]', 'FLAGS'])
for msgNum in unseenMessages:
    message = pyzmail.PyzMessage.factory(rawMessage[msgNum][b'BODY[]'])
    text = message.text_part.get_payload().decode(message.text_part.charset)
    print('Text' + str(msgNum) + ' = ')
    print(text)
I've found this gist with nice and clean code, and a page with many helpful examples.
The main difference between the APIs of imaplib and pyzmail is that pyzmail is an all-in-one package covering both message parsing and all the client-server communication, whereas in the standard library those concerns are split across different packages (imaplib and email). Basically, they both provide almost the same functionality through much the same methods.
As an additional important note, pyzmail looks quite abandoned.
To preserve a useful piece of code from that gist, I copy it here as is, with very small modifications such as extracting a main function (note: it's for Python 3):
#!/usr/bin/env python
#
# Very basic example of using Python 3 and IMAP to iterate over emails in a
# gmail folder/label. This code is released into the public domain.
#
# This script is example code from this blog post:
# http://www.voidynullness.net/blog/2013/07/25/gmail-email-with-python-via-imap/
#
# This is an updated version of the original -- modified to work with Python 3.4.
#
import sys
import imaplib
import getpass
import email
import email.header
import email.utils
import datetime

EMAIL_ACCOUNT = "notatallawhistleblowerIswear@gmail.com"
# Use 'INBOX' to read inbox. Note that whatever folder is specified,
# after successfully running this script all emails in that folder
# will be marked as read.
EMAIL_FOLDER = "Top Secret/PRISM Documents"

def process_mailbox(M):
    """
    Do something with email messages in the folder.
    For the sake of this example, print some headers.
    """
    rv, data = M.search(None, "ALL")
    if rv != 'OK':
        print("No messages found!")
        return
    for num in data[0].split():
        rv, data = M.fetch(num, '(RFC822)')
        if rv != 'OK':
            print("ERROR getting message", num)
            return
        msg = email.message_from_bytes(data[0][1])
        hdr = email.header.make_header(email.header.decode_header(msg['Subject']))
        subject = str(hdr)
        print('Message %s: %s' % (num, subject))
        print('Raw Date:', msg['Date'])
        # Now convert to local date-time
        date_tuple = email.utils.parsedate_tz(msg['Date'])
        if date_tuple:
            local_date = datetime.datetime.fromtimestamp(
                email.utils.mktime_tz(date_tuple))
            print("Local Date:",
                  local_date.strftime("%a, %d %b %Y %H:%M:%S"))

def main(host, login, folder):
    with imaplib.IMAP4_SSL(host) as M:
        rv, data = M.login(login, getpass.getpass())
        print(rv, data)
        rv, mailboxes = M.list()
        if rv == 'OK':
            print("Mailboxes:")
            print(mailboxes)
        rv, data = M.select(folder)
        if rv == 'OK':
            print("Processing mailbox...\n")
            process_mailbox(M)
        else:
            print("ERROR: Unable to open mailbox ", rv)

if __name__ == '__main__':
    try:
        main('imap.gmail.com', EMAIL_ACCOUNT, EMAIL_FOLDER)
    except imaplib.IMAP4.error as e:
        print('Error while processing mailbox:', e)
        sys.exit(1)

Move Lotus Notes Email to a Different Folder Python

After extracting some of an email's data, I would like to move the email to a specified folder with Python. I've searched and haven't found what I need.
Has anyone done this before?
Per a comment, I've added my current logic in hopes that it clarifies my problem. I loop through my folder and extract the details; after doing that, I want to move the email to a different folder.
import win32com.client
import getpass
import re

'''
Loops through Lotus Notes folder to view messages
'''
def docGenerator(folderName):
    # Get credentials
    mailServer = 'server'
    mailPath = 'PubDir\inbox.nsf'
    # Password
    pw = getpass.getpass('Enter password: ')
    # Connect
    session = win32com.client.Dispatch('Lotus.NotesSession')
    # Initializing the session and database
    session.Initialize(pw)
    db = session.GetDatabase(mailServer, mailPath)
    # Get folder
    folder = db.GetView(folderName)
    if not folder:
        raise Exception('Folder "%s" not found' % folderName)
    # Get the first document
    doc = folder.GetFirstDocument()
    # If the document exists,
    while doc:
        # Yield it
        yield doc
        # Get the next document
        doc = folder.GetNextDocument(doc)

# Loop through emails
for doc in docGenerator('Folder\Here'):
    # setting variables
    subject = doc.GetItemValue('Subject')[0].strip()
    invoice = re.findall(r'\d+', subject)[0]
    body = doc.GetItemValue('Body')[0].strip()
    # Move email after extracting above data
    # ???
Since you will move the document before getting the next one, I'd recommend replacing your loop with:

doc = folder.GetFirstDocument()
while doc:
    docN = folder.GetNextDocument(doc)
    yield doc
    doc = docN

And then, to move the message to the proper folder, you need:

doc.PutInFolder(r"Destination\Folder")
doc.RemoveFromFolder(r"Origin\Folder")

Of course, take care to escape your backslashes, or use raw string literals, so the view name is passed correctly.
doc.PutInFolder creates the folder if it doesn't exist. In that case, the user needs permission to create public folders; otherwise, the created folder will be private. (If the folder already exists, this is of course not a problem.)
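Putting the pieces together, a minimal sketch of the extraction loop with the move added; the destination folder name Processed\Here is a placeholder:

for doc in docGenerator(r'Folder\Here'):
    subject = doc.GetItemValue('Subject')[0].strip()
    invoice = re.findall(r'\d+', subject)[0]
    body = doc.GetItemValue('Body')[0].strip()
    # Move the email once its data has been extracted
    doc.PutInFolder(r'Processed\Here')    # hypothetical destination folder
    doc.RemoveFromFolder(r'Folder\Here')  # remove it from the original folder

This relies on the modified generator above, which advances to the next document before yielding, so moving doc does not break the iteration.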

Python load variables into an array from file

I'm making a reddit bot, and I've made it store comment IDs in an array so it doesn't reply to the same comment twice. However, if I close the program, the array is cleared.
I'm looking for a way to keep the array, such as storing it in an external file and reading it back in. Thanks!
Here's my code:
import praw
import time
import random
import pickle

# logging into the Reddit API
r = praw.Reddit(user_agent="Random Number machine by /u/---")
print("Logging in...")
r.login(---, ---, disable_warning=True)
print("Logged in.")

wordsToMatch = ["+randomnumber", "+random number", "+ randomnumber", "+ random number"]  # Words which the bot looks for.
cache = []  # If a comment ID is stored here, the bot will not reply back to the same post.

def run_bot():
    print("Start of new loop.")
    subreddit = r.get_subreddit(---)  # Decides which sub-reddit to search for comments.
    comments = subreddit.get_comments(limit=100)  # Grabbing comments...
    print(cache)
    for comment in comments:
        comment_text = comment.body.lower()  # Stores the comment in a variable and lowercases it.
        isMatch = any(string in comment_text for string in wordsToMatch)  # True if the comment matches the wordsToMatch array.
        if comment.id not in cache and isMatch:  # If a comment is found and the ID isn't in the cache.
            print("Comment found: {}".format(comment.id))  # Printed to the console; developers see this only.
            #comment.reply("Hey, I'm working!")
            #cache.append(comment.id)

while True:
    run_bot()
    time.sleep(5)
What you're looking for is called serialization. You can use json or yaml, or even pickle. They all have very similar APIs:

import json

a = [1, 2, 3, 4, 5]
with open("/tmp/foo.json", "w") as fd:
    json.dump(a, fd)

with open("/tmp/foo.json") as fd:
    b = json.load(fd)

assert b == a

foo.json:

$ cat /tmp/foo.json
[1, 2, 3, 4, 5]
json and yaml only work with basic types like strings, numbers, lists, and dictionaries. pickle has more flexibility, allowing you to serialize more complex types such as classes. json is generally used for communication between programs, while yaml tends to be used when the input also needs to be read or edited by humans.
For your case, you probably want json. If you want to make the output pretty, the json library has options to indent it (for example, json.dump(a, fd, indent=2)).
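Applied to your bot, a minimal sketch of persisting the cache with json; the filename cache.json is just an illustration:

import json
import os

CACHE_FILE = "cache.json"  # hypothetical filename

# Load the cache saved by a previous run, if any
if os.path.exists(CACHE_FILE):
    with open(CACHE_FILE) as fd:
        cache = json.load(fd)
else:
    cache = []

def save_cache():
    # Rewrite the file whenever a new comment ID is appended
    with open(CACHE_FILE, "w") as fd:
        json.dump(cache, fd)

# ...then, inside run_bot(), after cache.append(comment.id):
# save_cache()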

cURL method in Python for JSON feed [duplicate]

This question already has answers here:
How to download a file over HTTP?
(30 answers)
Closed 7 years ago.
While building a Flask website, I'm using an external JSON feed to populate the local MongoDB with content. The feed is parsed, and keys from the JSON are repurposed as keys in Mongo.
One of the available keys from the feed is called "img_url" and contains, guess what, a URL to an image.
Is there a way, in Python, to mimic a PHP-style cURL? I'd like to grab that key, download the image, and store it somewhere locally while keeping the other associated keys, and have that as an entry in my db.
Here is my script so far:
import json
import sys
import urllib2
from datetime import datetime

import pymongo
import pytz

from utils import slugify
# from utils import logger

client = pymongo.MongoClient()
db = client.artlogic

def fetch_artworks():
    # logger.debug("downloading artwork data from Artlogic")
    AL_artworks = []
    AL_artists = []
    url = "http://feeds.artlogic.net/artworks/artlogiconline/json/"
    while True:
        f = urllib2.urlopen(url)
        data = json.load(f)
        AL_artworks += data['rows']
        # logger.debug("retrieved page %s of %s of artwork data" % (data['feed_data']['page'], data['feed_data']['no_of_pages']))
        # Stop: we are at the last page
        if data['feed_data']['page'] == data['feed_data']['no_of_pages']:
            break
        url = data['feed_data']['next_page_link']
    # Now we have a list called 'artworks' in which all the descriptions are stored.
    # We are going to put them into the mongoDB database,
    # making sure that if the artwork is already encoded (an object with the same id
    # is already in the database) we update the existing description instead of
    # inserting a new one ('upsert').
    # logger.debug("updating local mongodb database with %s entries" % len(artworks))
    for artwork in AL_artworks:
        # Mongo does not like keys that have a dot in their name;
        # this property does not seem to be used anyway, so let us
        # delete it:
        if 'artworks.description2' in artwork:
            del artwork['artworks.description2']
        # upsert into the database:
        db.AL_artworks.update({"id": artwork['id']}, artwork, upsert=True)
        # artwork['artist_id'] is not functioning properly
        db.AL_artists.update({"artist": artwork['artist']},
                             {"artist_sort": artwork['artist_sort'],
                              "artist": artwork['artist'],
                              "slug": slugify(artwork['artist'])},
                             upsert=True)
    # db.meta.update({"subject": "artworks"}, {"updated": datetime.now(pytz.utc), "subject": "artworks"}, upsert=True)
    return AL_artworks

if __name__ == "__main__":
    fetch_artworks()
First, you might like the requests library.
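For instance, a minimal sketch with requests; img_url and the destination path are placeholders for values pulled from your feed:

import requests

resp = requests.get(img_url, timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors
with open('/var/www/static/image.jpg', 'wb') as f:
    f.write(resp.content)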
Otherwise, if you want to stick to the stdlib, it will be something along the lines of:

import os
import urllib2
import uuid

def fetchfile(url, dst):
    fi = urllib2.urlopen(url)
    fo = open(dst, 'wb')
    while True:
        chunk = fi.read(4096)
        if not chunk:
            break
        fo.write(chunk)
    fo.close()

fetchfile(
    data['feed_data']['next_page_link'],
    os.path.join('/var/www/static', uuid.uuid1().get_hex()))

with the correct exception catching (I can elaborate if you want, but I'm sure the documentation will be clear enough).
You could also put fetchfile() into a pool of async jobs to fetch many files at once, as sketched below.
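A minimal sketch using a thread pool from the stdlib; the urls list stands in for the img_url values collected from the feed:

import os
import uuid
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API

def fetch_to_static(url):
    dst = os.path.join('/var/www/static', uuid.uuid1().get_hex())
    fetchfile(url, dst)
    return dst

pool = Pool(8)  # 8 concurrent downloads
paths = pool.map(fetch_to_static, urls)
pool.close()
pool.join()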
https://docs.python.org/2/library/json.html
https://docs.python.org/2/library/urllib2.html
https://docs.python.org/2/library/tempfile.html
https://docs.python.org/2/library/multiprocessing.html
