So I'm kind of new to Python and I want to make a Twitter bot.
I did this:
print(api.get_user(screen_name="My account's handle"))
(with tweepy imported and my script given the correct authentication keys/tokens, etc.)
That line printed a lot of text; what I want to do is get the number after "in_reply_to_status_id=",
which is 1048042979359936513
The text that was printed is pasted inside here:
https://pastebin.com/ZVWzYEJw
(had to use Pastebin because it was too long and has links)
I hope this makes sense...
I'm not entirely familiar with Tweepy's response object, but if it's as you described above, i.e. the User object, then you can probably try this:
>>> data = User._json  # _json is already a dictionary, no json.loads needed
>>> data['in_reply_to_status_id']
1048042979359936513
Edit: If in_reply_to_status_id is an attribute of User, then you should be able to get it with just User.in_reply_to_status_id.
Tweepy's API.get_user() method returns a User object. The long text you see in the response is the string representation of that User object. As #kerwei says, you can check which properties exist on this object by looking at the keys of user._json (which is a dictionary).
But in_reply_to_status_id lives on the Status object (representing a tweet), not on the User object. So you should first get a Status object, e.g. with API.get_status(); after that, you should be able to read in_reply_to_status_id from it.
You can get in_reply_to_status_id from a Status object like this:
>>> status = api.get_status(1234567890)
>>> reply_id = status.in_reply_to_status_id
>>> print(reply_id)
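And if you want to see exactly which fields a User object does carry, here is a small hedged check (assuming the same authenticated api as in your question):
>>> user = api.get_user(screen_name="My account's handle")
>>> sorted(user._json.keys())  # lists every field available on the User object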
I'm really stuck on this one.
I'm using Tweepy to get the IDs of all users that liked a specific tweet. I seem to get a list of "User" structures that contain "id", "name" and "username", but I'm not able to get only the "id".
The code is simple:
client = tweepy.Client(
    bearer_token=bearer_token,
    consumer_key=api_key, consumer_secret=api_secret,
    access_token=user_token, access_token_secret=user_token_secret,
    wait_on_rate_limit=True
)

for response in tweepy.Paginator(client.get_liking_users, id=tweetid, max_results=100, limit=10):
    for item in response:
        print("ITEM:\n", item)
        if item is not None:
            for user in item:
                if user is not None:
                    print(user)
The print of "item" gets me this (simplified, of course; the number of structures is high, that's why I have to use Paginator):
[<User id=0000001 name=user1 username=UserName1>, <User id=0002 name=user2 username=UserName2>, <User id=000003 name=user3 username=UserName3>]
and the print of "user" just gets me the individual usernames: "UserName1", etc.
But there's no way to get user.id, user.User.id, or anything similar. And I'm frustrated, because the information is right there; I just can't access it easily.
Thank you!
Tweepy documentation provides an example of something very similar to what you want to do: https://docs.tweepy.org/en/stable/examples.html -> API v2 -> Get Tweet’s Liking Users
import tweepy
bearer_token = ""
client = tweepy.Client(bearer_token)
# Get Tweet's Liking Users
# This endpoint/method allows you to get information about a Tweet’s liking
# users
tweet_id = 1460323737035677698
# By default, only the ID, name, and username fields of each user will be
# returned
# Additional fields can be retrieved using the user_fields parameter
response = client.get_liking_users(tweet_id, user_fields=["profile_image_url"])
for user in response.data:
    print(user.username, user.profile_image_url)
This example prints the user's username and profile image URL, but note the comment says the id is also returned, so something like user.id should work. Otherwise, you can also add id to user_fields to make sure it's returned, although that shouldn't be necessary.
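For example, a minimal variation of that loop (using the same response object as above) that prints the id as well:
for user in response.data:
    print(user.id, user.username)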
Unfortunately, I am not able to test it myself because I don't have a Twitter developer account with the required elevated access.
Edit: I got access to an API account with elevated access and I was able to test your code, see the update below
Iterating paginated results
The reason you need a double for loop to iterate the paginated results, and the reason it eventually crashes after showing some results with an error about accessing a non-existent id attribute on a str object, is that you are not iterating the Paginator results correctly.
For the sake of simplicity, I'm going to label your three nested for loops:
loop 0: for response in tweepy.Paginator(...
loop 1: for item in response
loop 2: for user in item
Paginator yields Response objects with all the results in the data attribute. Each Response also has other attributes, such as meta, includes, and errors.
When you do loop 1, you are iterating over all of these attributes of the Response, not just data.
If the attribute currently being iterated happens to be data, loop 2 starts and iterates the results, giving the output you expect.
But loop 1 will also hand loop 2 the other Response attributes outside of data.
Let's see, for example, what happens when loop 1 enters the meta attribute.
meta is a dictionary that looks like this:
meta={'result_count': 80, 'next_token': '676f9b7bumw8i3jbm4nnifamw2ejjaktp8kjym6akdak9'}
When loop 2 runs over the meta attribute, it iterates the dictionary's keys (not the values, because that's how dicts work in Python), so the value of user in loop 2 will be either result_count or next_token. And that's when you get your error about trying to access id on a str.
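A quick plain-Python illustration of that behaviour:
# iterating a dict yields its keys, which are strings
meta = {'result_count': 80, 'next_token': '676f9b7bumw8i3jbm4nnifamw2ejjaktp8kjym6akdak9'}
for user in meta:
    print(user)   # prints 'result_count', then 'next_token'
    # user.id     # would raise AttributeError: 'str' object has no attribute 'id'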
What you should be doing instead is iterating response.data in loop 1, which also removes the need for the second loop:
for response in tweepy.Paginator(client.get_liking_users, id=tweetid, max_results=100, limit=10):
    for user in response.data:
        print(user.id)
Basically, I want to get the conversation_id if the Tweet is a reply to another Tweet, so I can get the chain of replies to analyze.
My code:
from tweepy import StreamingClient

class Listener(StreamingClient):
    def on_response(self, response):
        print(response)

listener = Listener(auth['bearer_token'])
listener.sample(expansions=['in_reply_to_user_id'], tweet_fields=['conversation_id'])
When using this, I only get the user_id to which it is replying, but I cannot get any type of conversation_id.
I have a slight feeling I am missing something essential.
From the relevant FAQ section about this in Tweepy's documentation:
If you are simply printing the objects and looking at that output, the string representations of API v2 models/objects only include the default fields that are guaranteed to exist.
The objects themselves still include the relevant data, which you can access as attributes or by subscription.
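In other words, a minimal sketch adapting your Listener (the field is there on response.data even though print(response) doesn't show it):
from tweepy import StreamingClient

class Listener(StreamingClient):
    def on_response(self, response):
        tweet = response.data
        # conversation_id was requested via tweet_fields, so it is on the object
        # even though the default string representation does not show it
        print(tweet.id, tweet.conversation_id)
        # subscription-style access works too
        print(tweet["conversation_id"])

listener = Listener(auth['bearer_token'])
listener.sample(expansions=['in_reply_to_user_id'], tweet_fields=['conversation_id'])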
Hello, I am trying to scrape the tweets of a certain user using Tweepy.
Here is my code:
tweets = []
username = 'example'
count = 140  # number of tweets

try:
    # Pulling individual tweets from query
    for tweet in api.user_timeline(id=username, count=count, include_rts=False):
        # Adding to list that contains all tweets
        tweets.append((tweet.text))
except BaseException as e:
    print('failed on_status,', str(e))
    time.sleep(3)
The problem I am having is the tweets are coming back unfinished with "..." at the end.
I think I've looked at all the other similar problems on Stack Overflow and elsewhere, but nothing works. Most do not concern me because I am NOT dealing with retweets.
I have tried putting tweet_mode = 'extended' and/or tweet.full_text or tweet._json['extended_tweet']['full_text'] in different combinations.
I don't get an error message but nothing works, just an empty list in return.
And it looks like the documentation is out of date, because it says nothing about the 'tweet_mode' or 'include_rts' parameters:
Has anyone managed to get the full text of each tweet?? I'm really stuck on this seemingly simple problem and am losing my hair so I would appreciate any advice :D
Thanks in advance!!!
TL;DR: You're most likely running into a Rate Limiting issue. And use the full_text attribute.
Long version:
First,
The problem I am having is the tweets are coming back unfinished with "..." at the end.
From the Tweepy documentation on Extended Tweets, this is expected:
Compatibility mode
... It will also be discernible that the text attribute of the Status object is truncated as it will be suffixed with an ellipsis character, a space, and a shortened self-permalink URL to the Tweet.
Regarding:
And it looks like the documentation is out of date, because it says nothing about the 'tweet_mode' or 'include_rts' parameters:
It hasn't been explicitly added to the documentation of each method; however, the docs do specify that tweet_mode is accepted as a parameter:
Standard API methods
Any tweepy.API method that returns a Status object accepts a new tweet_mode parameter. Valid values for this parameter are compat and extended, which give compatibility mode and extended mode, respectively. The default mode (if no parameter is provided) is compatibility mode.
So without tweet_mode added to the call, you do get the tweets (with partial text), but with it, all you get is an empty list? If you remove it and immediately retry, verify whether you still get an empty list; i.e., once you get an empty-list result, check whether you keep getting an empty list even when you change the params back to the ones that worked.
Based on bug #1329 - API.user_timeline sometimes returns an empty list - it appears to be a Rate Limiting issue:
Harmon758 commented on Feb 13
This API limitation would manifest itself as exactly the issue you're describing.
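If rate limiting is indeed the culprit, one hedged, partial mitigation (it won't help if it's a hard API limitation, and it assumes your existing auth handler) is to let Tweepy wait out the limit:
import tweepy

# wait_on_rate_limit tells Tweepy to sleep until the rate-limit window resets
# instead of failing the call
api = tweepy.API(auth, wait_on_rate_limit=True)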
Even if it were working, the full text is in the full_text attribute, not the usual text. So the line
tweets.append((tweet.text))
should be
tweets.append(tweet.full_text)
(and you can skip the extra enclosing ())
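Putting both fixes together, a hedged version of your loop (assuming the same api, username and count as in your snippet):
tweets = []
for tweet in api.user_timeline(id=username, count=count,
                               include_rts=False, tweet_mode="extended"):
    tweets.append(tweet.full_text)  # full_text holds the untruncated text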
Btw, if you're not interested in retweets, see this example for the correct way to handle them:
Given an existing tweepy.API object and id for a Tweet, the following can be used to print the full text of the Tweet, or if it’s a Retweet, the full text of the Retweeted Tweet:
status = api.get_status(id, tweet_mode="extended")
try:
    print(status.retweeted_status.full_text)
except AttributeError:  # Not a Retweet
    print(status.full_text)
If status is a Retweet, status.full_text could be truncated.
As per the Twitter API v2:
tweet_mode does not work at all. You need to add the expansion referenced_tweets.id. Then, in the response, look at includes: you will find the full versions of all the truncated tweets there. You will still see the truncated tweets in the response data, but don't worry about that.
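A hedged sketch of that v2 approach (the bearer token and tweet ID below are placeholders, and get_tweet is used just as an example endpoint):
import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # placeholder token

response = client.get_tweet(
    1234567890,  # placeholder tweet ID
    expansions=["referenced_tweets.id"],
)
print(response.data.text)  # may be truncated if the tweet is a retweet
# the full versions of any referenced tweets are returned in `includes`
for referenced in response.includes.get("tweets", []):
    print(referenced.text)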
I have a URL which outputs JSON data when called. I have to check for a specific word in that JSON output. Example below -
r1 = session.get(authurl, headers=headers, timeout=6)
resphead = r1.headers.get('content-type')

if 'application/json' in resphead:
    json_data = r1.json()
    overallstatus1 = str(json_data['status'])
    overallstatus2 = str(json_data['status']['code'])
Sometimes the value has to be read using ['status']['code'], and sometimes the code section is not present in the output, i.e. it's just ['status']. Similarly, I'll have many other statuses to check that differ in the same way.
What can be done here to read the output even if the keys change?
Kindly clarify.
When you call .json() on your response object, it returns a Python dict. Therefore you can take advantage of the dictionaries' built-in get() method, which returns the value for a key if it exists in the dictionary, and otherwise a pre-specified value (the default is None).
From the docs:
https://docs.python.org/2/library/stdtypes.html#dict.get
The calls in your code would then be:
overallstatus1 = str(json_data.get('status'))
overallstatus2 = str(json_data.get('status', {}).get('code'))
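A quick illustration of the difference, using hypothetical payloads:
payload_with_code = {'status': {'code': 200}}
payload_without_code = {'status': {}}
payload_without_status = {}

print(payload_with_code.get('status', {}).get('code'))      # 200
print(payload_without_code.get('status', {}).get('code'))   # None
print(payload_without_status.get('status', {}).get('code')) # None, no KeyError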
I'm starting to learn Python and I've written the following code (some of it omitted). It works fine, but I'd like to understand it better. So I do the following:
html_doc = requests.get('[url here]')
Followed by:
if html_doc.status_code == 200:
    soup = BeautifulSoup(html_doc.text, 'html.parser')
    line = soup.find('a', class_="some_class")
    value = re.search('[regex]', str(line))
    print(value.group(0))
My questions are:
What does html_doc.text really do? I understand that it makes "text" (a string?) out of html_doc, but why isn't it text already? What is it? Bytes? Maybe a stupid question, but why doesn't requests.get create a really long string containing the HTML code?
The only way that I could get the result of re.search was with value.group(0), but I have literally no idea what this does. Why can't I just look at value directly? I'm passing it a string and there's only one match, so why is the resulting value not a string?
requests.get()'s return value, as stated in the docs, is a Response object.
re.search()'s return value, as stated in the docs, is a MatchObject.
Both objects exist because they carry much more information than simply the response bytes (e.g. the HTTP status code, response headers, etc.) or the matched string value (e.g. a match object includes the positions of the first and last matched characters).
For more information you'll have to study the docs.
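For a concrete feel of what a match object carries, here is a tiny standalone sketch:
import re

m = re.search(r'\d+', 'order 42 shipped')
print(m.group(0))          # '42' -> the text matched by the whole pattern
print(m.start(), m.end())  # 6 8  -> where the match sits in the string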
FYI, to check the type of a returned value you may use the built-in type function:
response = requests.get('[url here]')
print(type(response))  # <class 'requests.models.Response'>
It seems to me you are lacking some basic knowledge about classes, objects, methods, etc.; you should read more about them here (for Python 2.7) and about the requests module here.
Concerning what you asked: when you type html_doc = requests.get('url'), you are creating an instance of the class requests.models.Response, which you can check with:
>>> type(html_doc)
<class 'requests.models.Response'>
Now html_doc has attributes and methods; html_doc.text gives you the body of the server's response, decoded as a string.
The same goes for the re module: its functions return objects (such as match objects) that are not simply an int or a string.