Pulling historical channel messages from Slack with Python

I am attempting to create a small dataset by pulling messages/responses from a Slack channel I am a part of. I would like to use Python to pull the data from the channel; however, I am having trouble figuring out my API key. I have created an app on Slack, but I am not sure how to find my API key. I see my client secret, signing secret, and verification token, but I can't find my API key.
Here is a basic example of what I believe I am trying to accomplish:
import slack

sc = slack.SlackClient("api key")
sc.api_call(
    "channels.history",
    channel="C0XXXXXX"
)
I am willing to just download the data manually if that is possible as well. Any help is greatly appreciated.

Messages
Below is example code for pulling messages from a channel in Python.
It uses the official Python Slack library and calls conversations_history with paging. It will therefore work with any type of channel and can fetch large numbers of messages if needed.
The result is written to a file as a JSON array.
You can specify the channel and the maximum number of messages to retrieve.
Threads
Note that the conversations.history endpoint will not return thread messages. Those have to be retrieved additionally, with one call to conversations.replies for every thread you want to retrieve messages for.
Threads can be identified in the messages for each channel by checking for the thread_ts property in the message. If it exists, there is a thread attached to that message. See this page for more details on how threads work.
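As a rough illustration, fetching the replies for one thread could look like the sketch below. It assumes the same client WebClient and CHANNEL constant as in the example code further down, and that message is a message dict returned by conversations_history.
# Hedged sketch: fetch the replies of a single thread with conversations.replies.
# Assumes `client` and `CHANNEL` from the example code below and a `message` dict
# returned by conversations_history.
if "thread_ts" in message:
    replies = client.conversations_replies(
        channel=CHANNEL,
        ts=message["thread_ts"],   # the parent message's timestamp identifies the thread
    )
    thread_messages = replies["messages"]  # includes the parent message itself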
IDs
This script will not replace IDs with names, though. If you need that, here are some pointers on how to implement it:
You need to replace IDs for users, channels, bots and usergroups (the latter only on a paid plan)
You can fetch the lists for users, channels and usergroups from the API with users_list, conversations_list and usergroups_list respectively; bots need to be fetched one by one with bots_info (if needed). (A minimal sketch for the user lookup follows after this list.)
IDs occur in many places in messages:
user top-level property
bot_id top-level property
as links in any property that allows text, e.g. <@U12345678> for users or <#C1234567> for channels. These can occur in the top-level text property, but also in attachments and blocks.
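Here is a minimal sketch of the user part of that mapping, assuming the same WebClient as in the example code below; the regex only covers <@U...> user mentions, and channels, bots and usergroups would need similar lookups.
import re

# Hedged sketch: build a user ID -> display name map and replace user mentions.
# Assumes `client` is the slack WebClient used in the example code below.
users = {}
response = client.users_list()
for user in response["members"]:
    users[user["id"]] = user.get("real_name") or user["name"]

def replace_user_mentions(text):
    # turn "<@U12345678>" into "@Jane Doe"
    return re.sub(
        r"<@(U[A-Z0-9]+)>",
        lambda match: "@" + users.get(match.group(1), match.group(1)),
        text,
    )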
Example code
import os
import slack
import json
from time import sleep

CHANNEL = "C12345678"
MESSAGES_PER_PAGE = 200
MAX_MESSAGES = 1000

# init web client
client = slack.WebClient(token=os.environ['SLACK_TOKEN'])

# get first page
page = 1
print("Retrieving page {}".format(page))
response = client.conversations_history(
    channel=CHANNEL,
    limit=MESSAGES_PER_PAGE,
)
assert response["ok"]
messages_all = response['messages']

# get additional pages if below the max message count and if there are any
while len(messages_all) + MESSAGES_PER_PAGE <= MAX_MESSAGES and response['has_more']:
    page += 1
    print("Retrieving page {}".format(page))
    sleep(1)  # need to wait 1 sec before the next call due to rate limits
    response = client.conversations_history(
        channel=CHANNEL,
        limit=MESSAGES_PER_PAGE,
        cursor=response['response_metadata']['next_cursor']
    )
    assert response["ok"]
    messages = response['messages']
    messages_all = messages_all + messages

print(
    "Fetched a total of {} messages from channel {}".format(
        len(messages_all),
        CHANNEL
    ))

# write the result to a file
with open('messages.json', 'w', encoding='utf-8') as f:
    json.dump(
        messages_all,
        f,
        sort_keys=True,
        indent=4,
        ensure_ascii=False
    )

This uses the Slack Web API directly, so you will need to install the requests package. It should grab all the messages in a channel. You need a token, which can be grabbed from the apps management page, and you can use the getChannels() function to look up channel IDs. Once you grab all the messages, you will need to see who wrote which message; for that you need to do ID matching (map IDs to usernames), which you can do with the getUsers() function. Follow https://api.slack.com/custom-integrations/legacy-tokens to generate a legacy token if you do not want to use a token from your app.
import requests

def getMessages(token, channelId):
    print("Getting Messages")
    # this function gets all the messages from the given channel
    slack_url = "https://slack.com/api/conversations.history?token=" + token + "&channel=" + channelId
    messages = requests.get(slack_url).json()
    return messages

def getChannels(token):
    '''
    Returns a dictionary mapping the names of all channels in a given
    workspace to their IDs.
    '''
    channelsURL = "https://slack.com/api/conversations.list?token=%s" % token
    channelList = requests.get(channelsURL).json()["channels"]  # an array of channels
    channels = {}
    # putting the channels and their ids into a dictionary
    for channel in channelList:
        channels[channel["name"]] = channel["id"]
    return {"channels": channels}

def getUsers(token):
    # this function gets a list of users in the workspace, including bots
    usersURL = "https://slack.com/api/users.list?token=%s&pretty=1" % token
    members = requests.get(usersURL).json()["members"]
    return members
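For illustration, here is a hedged usage sketch that ties the helpers together and does the ID-to-name matching mentioned above. The SLACK_TOKEN environment variable and the channel name "general" are assumptions for the example.
import os

# Hedged usage sketch for the helper functions above.
token = os.environ["SLACK_TOKEN"]
channels = getChannels(token)["channels"]
channel_id = channels["general"]             # pick a channel by name (example name)
history = getMessages(token, channel_id)     # raw conversations.history response
users = {u["id"]: u["name"] for u in getUsers(token)}  # map user IDs to usernames

for message in history.get("messages", []):
    author = users.get(message.get("user"), message.get("bot_id", "unknown"))
    print("{}: {}".format(author, message.get("text", "")))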

Related

How to get the next Telegram messages from specific users

I'm implementing a Telegram bot that will serve users. Initially, it got any new message sequentially, even in the middle of an ongoing session with another user. Because of that, any time two or more users tried to use the bot, everything got jumbled up. To solve this I implemented a queue system that puts users on hold until the ongoing conversation is finished. But this queue system is turning out to be a big hassle. I think my problems would be solved with just a method to get the new messages from a specific chat_id or user. This is the code that I'm using to get any new messages:
def get_next_message_result(self, update_id: int, chat_id: str):
    """
    Get the next message of a given chat.
    If the next message is from another user, put that user on the queue and
    wait again for the expected one.
    """
    update_id += 1
    link_requisicao = f'{self.url_base}getUpdates?timeout={message_timeout}&offset={update_id}'
    result = json.loads(requests.get(link_requisicao).content)["result"]
    if len(result) == 0:
        return result, update_id  # timeout
    message_chat_id = result[0]["message"]["chat"]["id"]
    if "text" not in result[0]["message"]:
        self.responder(speeches.no_text_speech, message_chat_id)
        return [], update_id  # message without text
    while message_chat_id != chat_id:
        self.responder(speeches.wait_speech, message_chat_id)
        if message_chat_id not in self.current_user_queue:
            self.current_user_queue.append(message_chat_id)
            print("Queuing user with the following chat_id:", message_chat_id)
        update_id += 1
        link_requisicao = f'{self.url_base}getUpdates?timeout={message_timeout}&offset={update_id}'
        result = json.loads(requests.get(link_requisicao).content)["result"]
        if len(result) == 0:
            return result, update_id  # timeout
        message_chat_id = result[0]["message"]["chat"]["id"]
        if "text" not in result[0]["message"]:
            self.responder(speeches.no_text_speech, message_chat_id)
            return [], update_id  # message without text
    return result, update_id
On another note: I use the queue so that the moment the current conversation ends, the bot calls the next user in line. Should I just drop the queue feature and tell concurrent users to wait a few minutes, while ignoring any messages not from the current chat_id?

lyricsgenius lyrics sometimes end with "EmbedShare URLCopyEmbedCopy"

I am making a Discord lyrics bot, and to receive the lyrics I am using the Genius API (the lyricsgenius API wrapper). But when I receive the lyrics, they sometimes end with this:
"away" is the last word in the song, but it is accompanied by EmbedShare URLCopyEmbedCopy. Sometimes, with the same song, it is just the plain lyrics without the EmbedShare text.
Is there any way to prevent that?
Source code for the lyrics command:
@commands.command(help="Gives the lyrics of the song XD! format //lyrics (author) (song name)")
async def lyrics(self, ctx, arg1, arg2):
    song = genius.search_song(arg1, arg2)
    print(song.lyrics)
    name = "Lyrics for " + arg2.capitalize() + " by " + arg1.capitalize()
    gembed = discord.Embed(title=name.capitalize(), description=song.lyrics)
    await ctx.send(embed=gembed)
This is a known bug with lyricsgenius and there's an open PR to address this issue: https://github.com/johnwmillr/LyricsGenius/pull/215.
This is because lyricsgenius scrapes the lyrics from Genius' website, which means that if the website changes, lyricsgenius can fail to fetch the lyrics. The library hasn't been updated in 6 months; for a web-scraping library, that kind of inactivity makes it severely unstable. Since the library is licensed under MIT, you can fork it and maintain an up-to-date version for your project/bot. However, it would be much better to use a dedicated API to fetch song lyrics to guarantee stability.
Also, lyricsgenius uses the synchronous requests library, which means it will "block" your asynchronous bot while it fetches the lyrics. This is definitely undesirable for a Discord bot, since your bot would be completely unresponsive while it fetches the lyrics. Consider rewriting the command using aiohttp, or use run_in_executor when calling blocking functions; a rough sketch of the latter follows.
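A minimal sketch of the run_in_executor approach, assuming genius is the same lyricsgenius.Genius instance used in the command above and that the command lives in a discord.py cog:
import asyncio
import functools

import discord
from discord.ext import commands

class Lyrics(commands.Cog):
    """Hedged sketch: same command as above, but the blocking lyricsgenius call
    runs in a thread pool so the bot's event loop stays responsive."""

    @commands.command()
    async def lyrics(self, ctx, arg1, arg2):
        loop = asyncio.get_running_loop()
        # run the synchronous genius.search_song call in the default thread pool
        search = functools.partial(genius.search_song, arg1, arg2)
        song = await loop.run_in_executor(None, search)
        if song is None:
            return await ctx.send("No lyrics found.")
        embed = discord.Embed(
            title=f"Lyrics for {arg2.capitalize()} by {arg1.capitalize()}",
            description=song.lyrics[:4096],  # stay under Discord's embed description limit
        )
        await ctx.send(embed=embed)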
Some Random API is easy to work with when you are creating a command that sends song lyrics.
This is how to do it with Some Random API:
# these imports are used for this particular lyrics command; the essential import
# here is aiohttp, which will be used to fetch the lyrics from the API
import textwrap
import urllib.parse
import datetime

import aiohttp
import discord

@bot.command(aliases=['l', 'lyrc', 'lyric'])  # aliases so the command can be triggered with other names
async def lyrics(ctx, *, search=None):
    """A command to find lyrics easily!"""
    if not search:  # if the user hasn't given an argument, send an error and return from the command
        embed = discord.Embed(
            title="No search argument!",
            description="You haven't entered anything, so I couldn't find lyrics!"
        )
        # ctx.reply is available only on discord.py version 1.6.0 and up; on a lower version use ctx.send
        return await ctx.reply(embed=embed)

    song = urllib.parse.quote(search)  # url-encode the song provided so it can be passed on to the API
    async with aiohttp.ClientSession() as lyricsSession:
        async with lyricsSession.get(f'https://some-random-api.ml/lyrics?title={song}') as jsondata:  # fetch from the API
            if not 300 > jsondata.status >= 200:  # if an unexpected HTTP status code is received, send an error and return
                return await ctx.send(f'Received poor status code of {jsondata.status}')
            lyricsData = await jsondata.json()  # load the response body as JSON

    error = lyricsData.get('error')
    if error:  # if the API returned an error, send an error message and return from the command
        return await ctx.send(f'Received unexpected error: {error}')

    songLyrics = lyricsData['lyrics']  # the lyrics
    songArtist = lyricsData['author']  # the author's name
    songTitle = lyricsData['title']  # the song's title
    songThumbnail = lyricsData['thumbnail']['genius']  # the song's picture/thumbnail

    # sometimes the song's lyrics are longer than 4096 characters, and then they cannot
    # be sent in a single message on Discord due to the embed character limit,
    # so we split the lyrics into chunks of 4096 characters and send each part individually
    for chunk in textwrap.wrap(songLyrics, 4096, replace_whitespace=False):
        embed = discord.Embed(
            title=songTitle,
            description=chunk,
            color=discord.Color.blurple(),
            timestamp=datetime.datetime.utcnow()
        )
        embed.set_thumbnail(url=songThumbnail)
        await ctx.send(embed=embed)

How to access data in documents from a realtime listener in Python

Apologies if some of this doesn't make sense; I'm struggling to fully understand realtime listeners.
I'm trying to add a realtime listener to the chat part of my app, so I can add new messages to the screen as they come into the database. In the code below I load all current messages to the screen when the user opens the page, and then I (try to) add the realtime listener so any new messages can be added to the screen.
However, the doc_snapshot is just a list of the document IDs rather than the message_dict I have been using above. How do I access the data for each document in doc_snapshot, rather than just the ID?
Or am I doing it completely wrong? Should I not do a one-time load of the messages when the screen is opened, and instead just use a realtime listener to load the messages and listen for new messages?
self.local_id is the ID of the user who has logged in, and doc_id is the ID of the person they're messaging.
def move_to_chat(self, doc_id):
    group_id = self.local_id + ":" + doc_id
    doc_ref = self.my_firestore.db.collection(u'messages').document(group_id)
    doc = doc_ref.get()
    if doc.exists:  # Check if the document exists. If it does, load the messages to the screen
        get_messages = self.my_firestore.db.collection(u'messages').document(group_id).collection(group_id).order_by(u'Timestamp').limit(20)
        messages = get_messages.stream()
        for message in messages:
            message_dict = message.to_dict()
            try:
                if message_dict['IdFrom'] == self.local_id:
                    pass  # Add label to left of screen
                else:
                    pass  # Add label to right of screen
            except:
                pass
    else:  # If it doesn't, create it
        self.my_firestore.db.collection(u'messages').document(group_id).set({
            u'GroupId': group_id
        })
        add_to_doc = self.my_firestore.db.collection(u'messages').document(group_id).collection(group_id).document()
        add_to_doc.set({
            u'Timestamp': datetime.datetime.now()
        })

    # Watch for new messages
    self.query_watch = self.my_firestore.db.collection(u'messages').document(group_id).collection(group_id)
    # Watch the collection
    self.query_watch.on_snapshot(self.on_snapshot)

def on_snapshot(self, doc_snapshot, changes, read_time):
    for doc in doc_snapshot:
        # Here's where I'd like to access data from the documents, to find the message that has been added.
        pass
The Google Cloud Firestore documentation on the snapshot classes explains the classes used for representing documents in the Google Cloud Firestore API. You can refer to it to confirm what the callback receives: each element of doc_snapshot is a DocumentSnapshot, so you can read its fields the same way as in your one-time load.
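For example, a minimal sketch of the callback, using the same on_snapshot signature and field names as in the question:
# Hedged sketch: each element of doc_snapshot is a DocumentSnapshot, so its
# fields can be read with to_dict(), just like in the one-time load above.
def on_snapshot(self, doc_snapshot, changes, read_time):
    for doc in doc_snapshot:
        message_dict = doc.to_dict()  # the document's fields as a dict
        print(doc.id, message_dict.get(u'IdFrom'), message_dict.get(u'Timestamp'))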

Send push notification to many users via python

For my project I am using Firebase messaging to send push notifications. I have users' Firebase tokens stored in the database, and using them I send a push to each user. The total sending time is about 100 seconds for 100 users. Is there a way to send pushes asynchronously (I mean send many push notifications at one time)?
# Code works synchronously
for user in users:
    message = messaging.Message(
        notification=messaging.Notification(
            title="Push title",
            body="Push body"
        ),
        token=user['fcmToken']
    )
    response = messaging.send(message)
Sure, you could use one of the python concurrency libraries. Here's one option:
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

def send_message(user):
    message = messaging.Message(
        notification=messaging.Notification(
            title="Push title",
            body="Push body"),
        token=user['fcmToken'])
    return messaging.send(message)

with ThreadPoolExecutor(max_workers=10) as executor:  # may want to try more workers
    future_list = []
    for u in users:
        future_list.append(executor.submit(send_message, u))
    wait(future_list, return_when=ALL_COMPLETED)
    # each future's result is the message ID returned by messaging.send
    print([future.result() for future in future_list])
If you want to send the same message to all tokens, you can use a single API call with a multicast message. The Github repo has this sample of sending a multicast message in Python:
def send_multicast():
    # [START send_multicast]
    # Create a list containing up to 500 registration tokens.
    # These registration tokens come from the client FCM SDKs.
    registration_tokens = [
        'YOUR_REGISTRATION_TOKEN_1',
        # ...
        'YOUR_REGISTRATION_TOKEN_N',
    ]
    message = messaging.MulticastMessage(
        data={'score': '850', 'time': '2:45'},
        tokens=registration_tokens,
    )
    response = messaging.send_multicast(message)
    # See the BatchResponse reference documentation
    # for the contents of response.
    print('{0} messages were sent successfully'.format(response.success_count))
    # [END send_multicast]

Reading page's messages with Python Facebook SDK

Basically I need to get all the messages of a page using the Facebook SDK in Python.
Following some tutorials I arrived at this point:
import facebook

def main():
    cfg = {
        "page_id": "MY PAGE ID",
        "access_token": "LONG LIVE ACCESS TOKEN"
    }
    api = get_api(cfg)
    msg = "Hre"
    status = api.put_wall_post(msg)  # used to post to wall message Hre
    x = api.get_object('/' + str(MY PAGE ID) + "/conversations/")  # Give actual conversations

def get_api(cfg):
    graph = facebook.GraphAPI(cfg['access_token'])
    resp = graph.get_object('me/accounts')
    page_access_token = None
    for page in resp['data']:
        if page['id'] == cfg['page_id']:
            page_access_token = page['access_token']
    graph = facebook.GraphAPI(page_access_token)
    return graph

if __name__ == "__main__":
    main()
The first problem is that api.get_object('/'+str(MY PAGE ID)+"/conversations/") returns a dictionary containing a lot of information, but what I would like to see is the messages users sent to me, while for now it only prints the ID of the user who sent me a message.
The output looks like the following:
{u'paging': {u'next': u'https://graph.facebook.com/v2.4/571499452991432/conversations?access_token=Token&limit=25&until=1441825848&__paging_token=enc_AdCqaKAP3e1NU9MGSsvSdzDPIIDtB2ZCe2hCYfk7ft5ZAjRhsuVEL7eFYOOCdQ8okvuhZA5iQWaYZBBbrZCRNW8uzWmgnKGl69KKt4catxZAvQYCus7gZDZD', u'previous': u'https://graph.facebook.com/v2.4/571499452991432/conversations?access_token=token&limit=25&since=1441825848&__paging_token=enc_AdCqaKAP3e1NU9MGSsvSdzDPIIDtB2ZCe2hCYfk7ft5ZAjRhsuVEL7eFYOOCdQ8okvuhZA5iQWaYZBBbrZCRNW8uzWmgnKGl69KKt4catxZAvQYCus7gZDZD&__previous=1'}, u'data': [{u'link': u'/communityticino/manager/messages/?mercurythreadid=user%3A1055476438&threadid=mid.1441825847634%3Af2e0247f54f5c4d222&folder=inbox', u'id': u't_mid.1441825847634:f2e0247f54f5c4d222', u'updated_time': u'2015-09-09T19:10:48+0000'}]}
which is basically paging and data.
Given this, is there a way to read the conversations?
In order to get the message content, you first need to request the single messages in the conversation, accessible with the 'id' field in the dictionary you copied, which is the result of
x = api.get_object('/'+str(MY PAGE ID)+"/conversations/") #Give actual conversations
you can request the messages in the conversation by calling
msg = api.get_object('/'+<message id>)
Here it gets tricky, because according to the Graph API documentation you should receive back a dictionary with ALL the possible fields, including the 'message' (content) field. However, the function returns only the fields 'created_time' and 'id'.
Thanks to this other question, Request fields in Python Facebook SDK, I found that you can request those fields by adding a dict with those fields specified as arguments to the graph.get_object() function. As far as I know this is undocumented in the Facebook SDK reference for Python.
The correct code is
args = {'fields' : 'message'}
msg = api.get_object('/'+<message id>, **args)
Similar question: Read facebook messages using python sdk
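Putting the answer together, here is a hypothetical sketch that lists the text of every message in every conversation. The PAGE_ID variable is a placeholder, api is the page-scoped facebook.GraphAPI object returned by get_api(cfg), and paging is ignored for brevity.
# Hypothetical sketch: walk the conversations and print each message's content.
conversations = api.get_object(str(PAGE_ID) + "/conversations")
for conversation in conversations["data"]:
    # ask the conversation's messages edge explicitly for the message field
    messages = api.get_object(conversation["id"] + "/messages", fields="message")
    for message in messages["data"]:
        print(message["message"])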
