I have a text file with userID and tweet text separated by a "-->". I want to load these into a dictionary and then iterate over the values, computing the sentiment for each tweet using AlchemyAPI.
My input data is similar to this (real file has millions of records):
v2cigs --> New #ecig #regulations in #Texas mean additional shipping charges for residents. https:\/\/t.co\/aN3O5UfGUM #vape #ecigs #vapeon #vaporizer
JessyQuil --> FK SHIPPING I DON'T WANT TO WAIT TO BUY MY VAPE STUFF
thebeeofficial --> #Lancashire welcomes latest #ECIG law READ MORE: https:\/\/t.co\/qv6foghaOL https:\/\/t.co\/vYiTAQ6VED
2br --> #Lancashire welcomes latest #ECIG law READ MORE: https:\/\/t.co\/ghRWTxQy8r https:\/\/t.co\/dKh9TLkNRe
My code is:
import re
from alchemyapi import AlchemyAPI

alchemyapi = AlchemyAPI()
outputFile = open("intermediate.txt", "w")
tid = 1     # counter for keys in dictionary
tdict = {}  # dictionary to store tweet data

with open("testData.txt", "r") as inputfile:
    for line in inputfile:
        tweet = line.split("-->")[1].lstrip()
        tweet = re.sub("[^A-Za-z0-9#\s'.#]+", '', tweet)
        tdict[tid] = tweet.strip("\n")
        tid += 1

for k in tdict:
    response = alchemyapi.sentiment("text", str(tdict[k]))
    sentiment = response["docSentiment"]["type"]
    print(sentiment)
I am getting the error:
sentiment = response["docSentiment"]["type"]
KeyError: 'docSentiment'
I don't understand what I am doing wrong. Can anybody please help?
You need to check if the response was successful before trying to access the key.
for k in tdict:
    response = alchemyapi.sentiment("text", str(tdict[k]))
    status = response.get('status')
    if status == 'ERROR':
        print(response['statusInfo'])
    else:
        print(response['docSentiment']['type'])
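For completeness, here is a sketch of how the checked loop could also write each result to the intermediate.txt file that the question opens; the tab-separated "record number / sentiment" output format is just an assumption, not something from the original post:

with open("intermediate.txt", "w") as outputFile:
    for k in tdict:
        response = alchemyapi.sentiment("text", str(tdict[k]))
        if response.get('status') == 'ERROR':
            # skip tweets the API could not score (e.g. empty or unsupported text)
            print(response.get('statusInfo'))
            continue
        # hypothetical output format: "<record number>\t<sentiment type>"
        outputFile.write("{}\t{}\n".format(k, response['docSentiment']['type']))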
I'm trying to figure out how to filter for a specific country's tweets using search_recent_tweets. I take a country name as input, use pycountry to get the 2-character country code, and then try to apply a location filter either inside my query string or as a search_recent_tweets parameter. Nothing I have tried in either place has worked so far.
######
import tweepy
from tweepy import OAuthHandler
from tweepy import API
import pycountry as pyc
# upload token
BEARER_TOKEN='XXXXXXXXX'
# get tweets
client = tweepy.Client(bearer_token=BEARER_TOKEN)
# TAKE USER INPUT
countryQuery = input("Find recent tweets about travel in a certain country (input country name): ")
keyword = 'women safe' # gets tweets containing women and safe for that country (safe will catch safety)
# get country code to plug in as param in search_recent_tweets
country_code = str(pyc.countries.search_fuzzy(countryQuery)[0].alpha_2)
# get 100 recent tweets containing keywords and from location = countryQuery
query = str(keyword+' place_country='+str(countryQuery)+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
# expansions=geo.place_id, place.fields=[country_code],
# filter posts to remove retweets
# export tweets to json
import json
with open('twitter.json', 'w') as fp:
    for tweet in posts.data:
        json.dump(tweet.data, fp)
        fp.write('\n')
        print("* " + str(tweet.text))
I have tried variations of:
query = str(keyword+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, place_fields=[str(countryQuery), country_code], max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
and:
query = str(keyword+' place.fields='+str(countryQuery)+','+country_code+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
These attempts either returned None (no tweets at all) or caused the error:
"The place.fields query parameter value [Germany] is not one of [contained_within,country,country_code,full_name,geo,id,name,place_type]"
From the documentation for search_recent_tweets, it looks like place.fields / place_fields / place_country should be supported.
Any advice would help!!!
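For what it's worth, here is a sketch of the approach usually suggested for country filtering with the v2 API. Two assumptions that are not in the original post: your access level allows the place_country: query operator (it is not available on every product track), and you want the matched place objects returned via expansions. Also note that only tweets that are actually geo-tagged can match a place operator.

import tweepy
import pycountry as pyc

client = tweepy.Client(bearer_token=BEARER_TOKEN)

countryQuery = input("Find recent tweets about travel in a certain country (input country name): ")
country_code = pyc.countries.search_fuzzy(countryQuery)[0].alpha_2

# place_country: is a query operator (colon syntax), not a request parameter
query = f'women safe place_country:{country_code} -is:retweet'

posts = client.search_recent_tweets(
    query=query,
    max_results=100,
    tweet_fields=['id', 'text', 'entities', 'author_id', 'geo'],
    expansions=['geo.place_id'],                 # attach the referenced place objects
    place_fields=['country_code', 'full_name'],
)

# place objects arrive in the "includes" section, keyed here by place id
places = {p.id: p for p in (posts.includes or {}).get('places', [])}
for tweet in posts.data or []:
    place = places.get((tweet.geo or {}).get('place_id'))
    print(tweet.id, place.country_code if place else None, tweet.text[:60])

If place_country: is rejected for your access level, an alternative is to drop the operator and filter client-side using the place objects returned via the expansion, accepting that most tweets carry no geo data at all.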
I'm trying to extract a dataset using tweepy. I have a set of tweet IDs that I use to fetch full-text tweets. I loop over the IDs and call tweepy functions to get the tweet texts, but my program keeps crashing because a few of the tweet IDs in my list belong to suspended accounts.
This is the related code snippet I'm using:
# Creating DataFrame using pandas
db = pd.DataFrame(columns=['username', 'description', 'location', 'following',
                           'followers', 'totaltweets', 'retweetcount', 'text', 'hashtags'])

# reading tweet IDs from file
df = pd.read_excel('dataid.xlsx')
mylist = df['tweet_id'].tolist()

# tweet counter
n = 1

# looping to extract tweets
for i in mylist:
    tweets = api.get_status(i, tweet_mode="extended")
    username = tweets.user.screen_name
    description = tweets.user.description
    location = tweets.user.location
    following = tweets.user.friends_count
    followers = tweets.user.followers_count
    totaltweets = tweets.user.statuses_count
    retweetcount = tweets.retweet_count
    text = tweets.full_text
    hashtext = list()
    ith_tweet = [username, description, location, following, followers,
                 totaltweets, retweetcount, text, hashtext]
    db.loc[len(db)] = ith_tweet
    n = n + 1

filename = 'scraped_tweets.csv'
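One way to keep the loop alive is to catch Tweepy's exception around the get_status call and skip the IDs that fail. This is only a sketch: it assumes Tweepy 4.x, where API errors derive from tweepy.errors.TweepyException (on 3.x you would catch tweepy.TweepError instead), and it simply records the failing IDs rather than doing anything else with them:

import tweepy

failed_ids = []  # IDs whose tweets could not be fetched (suspended/deleted accounts, etc.)

for i in mylist:
    try:
        tweets = api.get_status(i, tweet_mode="extended")
    except tweepy.errors.TweepyException as e:
        # suspended accounts typically come back as Forbidden (403), deleted tweets as NotFound (404)
        failed_ids.append(i)
        print("Skipping tweet {}: {}".format(i, e))
        continue

    db.loc[len(db)] = [
        tweets.user.screen_name, tweets.user.description, tweets.user.location,
        tweets.user.friends_count, tweets.user.followers_count,
        tweets.user.statuses_count, tweets.retweet_count, tweets.full_text, [],
    ]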
I'm gathering all cities, towns and villages of some countries from OSM using an Overpass query in a Python program.
Everything seems to be correct, but I found a town in Luxembourg that is missing from my result set: the town of Kiischpelt.
import requests
import json

Country = 'LU'

overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-1"=""" + Country + """][admin_level=2]->.search;
(node["place"="city"](area.search);
node["place"="town"](area.search);
node["place"="village"](area.search);
way["place"="city"](area.search);
way["place"="town"](area.search);
way["place"="village"](area.search);
rel["place"="city"](area.search);
rel["place"="town"](area.search);
rel["place"="village"](area.search);
);
out center;
"""

response = requests.get(overpass_url, params={'data': overpass_query})
data = response.json()

filename = """C:/Data/GetGeoData/data/""" + Country + 'cities' + '.json'
f = open(filename, 'w', encoding="utf-8")
json.dump(data, f)
f.close()
When searching on the OSM site for Kiischpelt, I get a result of type relation, but it doesn't appear in my result set.
Also, when I change the query to rel["place"]; (which should return places of all kinds: city, town, village, isolated dwelling, ...), Kiischpelt still does not show up.
Any idea what I'm doing wrong?
Many thanks!
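One thing worth checking is whether Kiischpelt carries a place value other than city/town/village (for example municipality or hamlet), because the query above only matches those three values exactly. Below is a sketch of a broadened query using an Overpass regular-expression filter; the list of place values is an assumption, not something confirmed for Kiischpelt:

import requests

overpass_url = "http://overpass-api.de/api/interpreter"

# nwr is shorthand for node + way + relation; the regex matches several place values
broadened_query = """
[out:json];
area["ISO3166-1"="LU"][admin_level=2]->.search;
(
  nwr["place"~"^(city|town|village|municipality|hamlet)$"](area.search);
);
out center;
"""

data = requests.get(overpass_url, params={'data': broadened_query}).json()
names = [el.get('tags', {}).get('name') for el in data['elements']]
print('Kiischpelt' in names)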
I am currently trying to access the place names and coordinates of tweets from a JSON file created by Twitter's API. While not all of my tweets include these attributes, some do, and I'd like to collect them. My current approach is:
for line in tweets_json:
    try:
        tweet = json.loads(line.strip())  # only messages containing a 'text' field are tweets
        tweet_id = tweet['id']  # the tweet's id
        created_at = tweet['created_at']  # when the tweet was posted
        text = tweet['text']  # content of the tweet
        user_id = tweet['user']['id']  # id of the user who posted the tweet
        hashtags = []
        for hashtag in tweet['entities']['hashtags']:
            hashtags.append(hashtag['text'])
        lat = []
        long = []
        for coordinates in tweet['coordinates']['coordinates']:
            lat.append(coordinates[0])
            long.append(coordinates[1])
        country_code = []
        place_name = []
        for place in tweet['place']:
            country_code.append(place['country_code'])
            place_name.append(place['full_name'])
    except:
        # skip lines that are not valid JSON (errors sometimes occur)
        continue
As of right now, no attributes past the hashtags are being collected. Am I accessing the attributes incorrectly? More information regarding the JSON object can be found here: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
By wrapping all your code in a try/except block, you're silently passing over every error that occurs, including the errors raised when a tweet has no 'coordinates'.
If only some of the parsed tweet dictionaries contain a given key and you want to collect it when present, you can do something like this:
from json import JSONDecodeError

for line in tweets_json:
    # try to parse json
    try:
        tweet = json.loads(line.strip())  # only messages containing a 'text' field are tweets
    except JSONDecodeError:
        print('bad json')
        continue

    tweet_id = tweet['id']  # the tweet's id
    created_at = tweet['created_at']  # when the tweet was posted
    text = tweet['text']  # content of the tweet
    user_id = tweet['user']['id']  # id of the user who posted the tweet

    hashtags = []
    for hashtag in tweet['entities']['hashtags']:
        hashtags.append(hashtag['text'])

    lat = []
    long = []
    # 'coordinates' may be present but None, so check for a truthy value, not just the key
    if tweet.get('coordinates'):
        # GeoJSON point: [longitude, latitude]
        lon_value, lat_value = tweet['coordinates']['coordinates']
        long.append(lon_value)
        lat.append(lat_value)

    country_code = []
    place_name = []
    # 'place' is a single object (or None), not a list of places
    if tweet.get('place'):
        country_code.append(tweet['place']['country_code'])
        place_name.append(tweet['place']['full_name'])
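For reference, this is roughly the shape of the relevant fields in a v1.1 tweet payload; the concrete values below are made up, but the structure (a single nullable place object, and a GeoJSON point whose coordinates are ordered [longitude, latitude]) is what the checks above rely on:

sample_tweet = {
    "id": 1,
    "text": "...",
    # None when the tweet carries no exact location
    "coordinates": {
        "type": "Point",
        "coordinates": [6.13, 49.61],   # [longitude, latitude] (made-up values)
    },
    # None when no place is attached to the tweet
    "place": {
        "country_code": "LU",
        "full_name": "Luxembourg City, Luxembourg",
    },
}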
I created a scraper with Python that gets all the followers of a particular Twitter user. The issue is that when I use this list of user IDs to get their tweets with Logstash, I get an error.
I used http://gettwitterid.com/ to manually check whether these IDs are valid, and they are, but the list is far too long to check one by one.
Is there a way with Python to split the IDs into two lists, one containing the valid IDs and the other the invalid ones, so that I can use the valid list as input for Logstash?
The first 10 rows of the csv file is like this :
"id"
"602169027"
"95104995"
"874339739557670912"
"2981270769"
"93054327"
"870723159011545088"
"3008493180"
"874804469082533888"
"756339889092829184"
"1077712806"
I tried this code to get tweets using the IDs imported from the CSV, but unfortunately it raises error 144 (Not found):
import tweepy
import pandas as pd

consumer_key = ""
consumer_secret = ""
access_token_key = "-"
access_token_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

dfuids = pd.read_csv('Uids.csv')
for index, row in dfuids.iterrows():
    print(row['id'])
    tweet = api.get_status(dfuids['id'])
importing ids from csv
Try to change your code to this:
for index, row in dfuids.iterrows():
    print(row['id'])
    tweet = api.get_status(row['id'])
To avoid potential errors, you can add a try/except block later.
I got the solution after some experiments:
dfuids = pd.read_csv('Uids.csv')
valid = []
notvalid = []
for index, row in dfuids.iterrows():
    print(index)
    x = str(row.id)
    #print x, type(x)
    try:
        tweet = api.user_timeline(row.id)
        #print "Fine :", row.id
        valid.append(x)
        #print x, "added to valid"
    except:
        #print "NotOk :", row.id
        notvalid.append(x)
        #print x, "added to notvalid"
This part of the code was what I needed: it loops over all the IDs and tests whether each user ID returns any tweets from the timeline. If it does, the ID is appended as a string to a list called valid; if an exception is raised for any reason, it is appended to notvalid instead.
We can then save these lists into DataFrames and export them to CSV:
df = pd.DataFrame(valid)
dfnotv = pd.DataFrame(notvalid)
df.to_csv('valid.csv', index=False, encoding='utf-8')
dfnotv.to_csv('notvalid.csv', index=False, encoding='utf-8')
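If the goal is only to find out which user IDs still resolve to live accounts, a batched lookup can reduce the number of API calls compared with one user_timeline request per ID. This is a sketch, not the original approach: it reuses the authenticated api object from the question, assumes Tweepy 3.x, where the endpoint is exposed as api.lookup_users(user_ids=...) and accepts up to 100 IDs per call, and treats any ID missing from the response as not valid:

import pandas as pd
import tweepy

dfuids = pd.read_csv('Uids.csv')
all_ids = [str(uid) for uid in dfuids['id']]

valid = []
# look the IDs up in batches of 100 (the per-request maximum)
for start in range(0, len(all_ids), 100):
    batch = all_ids[start:start + 100]
    try:
        users = api.lookup_users(user_ids=batch)
    except tweepy.TweepError:
        # e.g. none of the IDs in this batch could be resolved
        users = []
    valid.extend(str(u.id) for u in users)

valid_set = set(valid)
notvalid = [uid for uid in all_ids if uid not in valid_set]

Note that this only confirms the account exists and is not suspended, which is a slightly weaker criterion than the user_timeline test above: a protected account would pass the lookup but still fail a timeline request.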