Excluding link at the end while pulling tweets in tweepy Streaming - python

I am pulling text or extended_text using tweepy streaming, but when I pull these tweets, there is always a t.co/randomletters link at the end that leads to nowhere. What is it and how do I get rid of it?
Here is an example:
"text": "To make room for more expression, we will now count all emojis as equal—including those with gender‍‍‍ ‍‍and skin tone modifiers https://t.co(forward slash)MkGjXf9aXm"
Please help

As far as my experience with Twitter and tweepy goes, these URLs are included in a tweet's text whenever the actual tweet contains a URL of some sort, so we can't really avoid getting them.
You could remove them after you get them. This simple regex replaces the pattern of these URLs with an empty string.
import re
# Escape the dot so it only matches a literal "." and keep the result of re.sub
tweet_text = re.sub(r' https://t\.co/\w{10}', '', tweet_text)
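A slightly more tolerant variant of the same idea, as a minimal sketch: it assumes the short link may use http or https and may have an ID of any length. The helper name strip_tco_links is hypothetical and not part of tweepy.

```python
import re

# Matches a t.co short link together with any whitespace in front of it.
# Assumption: the ID after "t.co/" consists only of word characters.
TCO_LINK = re.compile(r"\s*https?://t\.co/\w+")

def strip_tco_links(text):
    """Return the text with every t.co short link removed."""
    return TCO_LINK.sub("", text).strip()

print(strip_tco_links("we will now count all emojis as equal https://t.co/MkGjXf9aXm"))
```

Because the pattern is not anchored to the end of the string, it also removes short links that appear in the middle of a tweet.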

Related

Is it possible to set multiple strings in query for search method of tweepy? python

What I want is to search Twitter with Python for tweets that contain multiple words I choose.
The official doc does not say anything about it, but it seems that the search method only takes one query.
source code
import tweepy
CK=
CS=
AT=
AS=
auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)
api = tweepy.API(auth)
for status in api.search(q='word', count=100):  # I want to set multiple words in q but when I do.
    print(status.user.id)
    print(status.user.screen_name)
    print(status.user.name)
    print(status.text)
    print(status.created_at)
What I have tried is below. It didn't raise any error, but it searched only with the last word in the query. In this case, the results were only tweets with the word "Python"; it did not get tweets with both words.
for status in api.search(q='Java' and 'Python', count=100):
Official doc
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
So my question is: is it possible to set multiple words in the query?
Is the way I wrote it simply wrong?
If so, please let me know.
If it can't take multiple words, I would appreciate it if you could share simple Python code that does what I want.
Thank you in advance.
Use:
for status in api.search(q='Java Python', count=100):
From the Search Tweets: Standard v1.1 section Standard search operators:
watching now - containing both “watching” and “now”. This is the default operator.
As explained by Vlad Siv, just put every word you want to search for inside the quotation marks of the q parameter, separated by spaces. The search should then look for tweets containing all of these words.
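To see why the original attempt searched only for "Python", note that Python evaluates 'Java' and 'Python' before tweepy ever sees it, and "and" returns its last operand when both are non-empty strings. The sketch below shows that, plus one way to build the query string from a list. The variable names are illustrative, and the OR form assumes the OR operator from the standard search operators.

```python
# "and" between two non-empty strings evaluates to the second string,
# so api.search(q='Java' and 'Python') is the same as q='Python'.
q_attempt = 'Java' and 'Python'
print(q_attempt)

words = ['Java', 'Python']

# Space-separated words: tweets must contain all of them (the default operator).
q_all = ' '.join(words)

# Assumption: OR is the standard search operator for "either word".
q_any = ' OR '.join(words)

print(q_all)
print(q_any)
```

Either string can then be passed as q in the api.search call shown above.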

Extract tweets that match a string of words exactly

I am trying to get tweets that have an exact match to a string.
Here is the code:
query = "last dance"
language="en"
results = api.search(q=query, lang=language, count=200)
But I get results that include tweets containing the words last and dance separately, whereas I want tweets that contain the exact phrase last dance.
Please, make sure to URL encode these queries before making the request. There are several online tools to help you to do that, or you can search at twitter.com/search and copy the encoded URL from the browser’s address bar. The table below shows some example mappings from search queries to URL encoded queries:
Search query        URL encoded query
#haiku #poetry      %23haiku+%23poetry
"happy hour" :)     %22happy%20hour%22%20%3A%29
Note that the space character can be represented by “%20” or “+” sign.
For more info - https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators
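Applied to the question, the "happy hour" row suggests wrapping the phrase in double quotes to get an exact match. A minimal sketch using only the standard library reproduces the encodings from the table. It assumes that tweepy URL-encodes the q parameter on its own, so the manual encoding would only matter when building a request URL by hand.

```python
from urllib.parse import quote, quote_plus

phrase = "last dance"

# Double quotes around the phrase ask for an exact match.
query = f'"{phrase}"'
print(query)

# Percent-encoding, with spaces written as %20.
print(quote(query))

# The two example rows from the table above.
print(quote_plus('#haiku #poetry'))
print(quote('"happy hour" :)'))
```

With tweepy, the unencoded string would be passed directly, for example api.search(q=query, lang=language).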

how to get the full status text from twitter API with JSON?

tldr: how to access the FULL tweet body with JSON?
Hello
I have a problem finding the full text of a tweet in JSON.
I am making a python app with tweepy. I would like to take a status, and then access the text
EDIT
I used user_timeline() to get a tweet_list. Then got one tweet from them like this:
tweet=tweet_list[index]._json
now when I do this:
tweet['text']
it returns a shortened tweet with a link to the original
eg:
Unemployment for Black Americans is the lowest ever recorded. Trump
approval ratings with Black Americans has doubl…
(the shortened link, couldn't directly link due to stackoverflow rules)
I want to return this:
Unemployment for Black Americans is the lowest ever recorded. Trump
approval ratings with Black Americans has doubled. Thank you, and it
will get even (much) better! #FoxNews
I don't mind if the link is added as long as the full tweet is shown
Okay, after looking a bit more, I believe it is impossible to do it directly with the JSON of a tweet fetched in the default mode.
There is a solution about getting the full tweet; you can see it here.
The problem with that answer is that full_text turns the object into a string. If you need the object in its initial state, so you can use it later with JSON to get other info, do the following:
Use tweet_mode="extended" in user_timeline() and save the result in tweet_list, e.g.:
tweet_list = api.user_timeline("user", count=10, tweet_mode="extended")
Take one tweet only, like this: tweet = tweet_list[0]
If you want the full text of the tweet, do this: tweet.full_text
If you need a JSON version of the object, do this: jtweet = tweet._json, or just access a key like this: tweet._json['id']
Hope that helps
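The lookup can also be done on the JSON dictionary itself. The sketch below is an assumption-laden helper, not part of tweepy: get_full_text is a hypothetical name, and the keys it checks (full_text for extended mode, extended_tweet in streaming payloads, retweeted_status for retweets) reflect my understanding of the v1.1 payload layout.

```python
def get_full_text(tweet_json):
    """Return the longest available text from a tweet's JSON dict."""
    # For retweets, the complete text lives on the original tweet.
    if 'retweeted_status' in tweet_json:
        return get_full_text(tweet_json['retweeted_status'])
    # REST calls made with tweet_mode="extended" use the full_text key.
    if 'full_text' in tweet_json:
        return tweet_json['full_text']
    # Streaming payloads nest the long form under extended_tweet.
    if 'extended_tweet' in tweet_json:
        return tweet_json['extended_tweet']['full_text']
    # Fall back to the (possibly truncated) text key.
    return tweet_json['text']

sample = {'id': 1, 'text': 'short…', 'extended_tweet': {'full_text': 'the whole tweet'}}
print(get_full_text(sample))
```

It would be called as get_full_text(tweet._json), which leaves the status object itself untouched for later use.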
You didn't provide any information about how you want to achieve your goal. Looking at the tweepy API, there is an optional flag argument, full_text, which you can pass to the get direct message function.
It defaults to False, which causes the returned messages to be shortened to 140 characters. Just set it to True and see what happens.

Using tweepy to get unique tweets

I am trying to get a corpus of tweets using a number of search terms. One issue I am having is that I am not able to get unique tweets; that is, the results include retweets.
Is there a way to remove these beforehand without doing any text processing?
What I've got now:
api = tweepy.API(auth)
for search in hashtags:
    for tweet in tweepy.Cursor(api.search, q=search, count=1000, lang="en").items():
        text = repr(tweet.text.encode("utf-8"))
        out.write(text + "\n")
You can add " -filter:retweets" to your query to only get original tweets. Maybe not the prettiest solution, but it works.
api = tweepy.API(auth)
for search in hashtags:
    for tweet in tweepy.Cursor(api.search, q=search + " -filter:retweets", count=1000, lang="en").items():
        text = repr(tweet.text.encode("utf-8"))
        out.write(text + "\n")
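If the query operator is not an option, retweets and repeated tweets can also be dropped after fetching, without looking at the text at all. This is a minimal sketch that works on tweet JSON dictionaries; the helper name unique_original_tweets is hypothetical, and it assumes that retweets carry a retweeted_status key and that every tweet has an id key.

```python
def unique_original_tweets(tweets):
    """Yield each non-retweet once, based on the tweet's id."""
    seen_ids = set()
    for tweet in tweets:
        # Assumption: only retweets carry a retweeted_status key.
        if 'retweeted_status' in tweet:
            continue
        # The same tweet can match several search terms; keep the first copy.
        if tweet['id'] in seen_ids:
            continue
        seen_ids.add(tweet['id'])
        yield tweet

fetched = [
    {'id': 1, 'text': 'original'},
    {'id': 2, 'text': 'RT original', 'retweeted_status': {'id': 1}},
    {'id': 1, 'text': 'original'},
]
print([t['id'] for t in unique_original_tweets(fetched)])
```

With tweepy status objects, the dictionaries would come from status._json.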

Filtering in tweepy

I am new to tweepy and have encountered a problem. I want to download tweets with special hashtags. But it seems
stream.filter(track = ['word1', 'word2', 'word3'])
looks for these words in tweet and not in hashtags of the tweet. How can I filter on hashtags?
You can actually filter tweets based on your special hashtag.
stream.filter(track=['#MySpecialHashtag', '#AlsoThisHashtag'])
This will pick up only tweets that contain the hashtags you provide as part of the tweet text and save you from arbitrarily collecting tweets and checking if the hashtag field has your hashtag in it.
You can find the tags in the status object. That is where you have to compare them with the ones you are looking for.
example:
for hashtag in status.entities['hashtags']:
    print(hashtag['text'])
example here: http://www.pythoncentral.io/introduction-to-tweepy-twitter-for-python/
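The comparison described above could look like the sketch below. It works on the JSON dictionary of a status; the helper name has_wanted_hashtag is hypothetical, and it assumes each entry in entities['hashtags'] stores the tag without the leading "#" under the text key, as in the loop above.

```python
def has_wanted_hashtag(status_json, wanted):
    """Return True if the status carries at least one of the wanted hashtags."""
    # Normalise the wanted tags: drop a leading "#" and ignore case.
    wanted_lower = {tag.lstrip('#').lower() for tag in wanted}
    hashtags = status_json.get('entities', {}).get('hashtags', [])
    return any(h['text'].lower() in wanted_lower for h in hashtags)

status_json = {
    'text': 'Learning tweepy #Python',
    'entities': {'hashtags': [{'text': 'Python'}]},
}
print(has_wanted_hashtag(status_json, ['#python', '#java']))
```

Inside a stream listener, this check would decide whether a received status is kept or skipped.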
