KeyError: 'docSentiment' for sentiment analysis using AlchemyAPI in Python

I have a text file with userID and tweet text separated by a "-->". I want to load these into a dictionary and then iterate over the values, computing the sentiment for each tweet using AlchemyAPI.
My input data is similar to this (real file has millions of records):
v2cigs --> New #ecig #regulations in #Texas mean additional shipping charges for residents. https:\/\/t.co\/aN3O5UfGUM #vape #ecigs #vapeon #vaporizer
JessyQuil --> FK SHIPPING I DON'T WANT TO WAIT TO BUY MY VAPE STUFF
thebeeofficial --> #Lancashire welcomes latest #ECIG law READ MORE: https:\/\/t.co\/qv6foghaOL https:\/\/t.co\/vYiTAQ6VED
2br --> #Lancashire welcomes latest #ECIG law READ MORE: https:\/\/t.co\/ghRWTxQy8r https:\/\/t.co\/dKh9TLkNRe
My code is:
import re
from alchemyapi import AlchemyAPI
alchemyapi = AlchemyAPI()
outputFile = open("intermediate.txt", "w")
tid = 1  # counter for keys in dictionary
tdict = {}  # dictionary to store tweet data
with open("testData.txt", "r") as inputfile:
    for lines in inputfile:
        tweets = lines.split("-->")[1].lstrip()
        tweets = re.sub("[^A-Za-z0-9#\s'.#]+", '', tweets)
        tdict[tid] = tweets.strip("\n")
        tid += 1
for k in tdict:
    response = alchemyapi.sentiment("text", str(tdict[k]))
    sentiment = response["docSentiment"]["type"]
    print sentiment
I am getting the error:
sentiment = response["docSentiment"]["type"]
KeyError: 'docSentiment'
I don't understand what I am doing wrong. Can anybody please help?

You need to check if the response was successful before trying to access the key.
for k in tdict:
    response = alchemyapi.sentiment("text", str(tdict[k]))
    status = response.get('status')
    if status == 'ERROR':
        print(response['statusInfo'])
    else:
        print(response['docSentiment']['type'])
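For completeness, here is a sketch of how that check might be folded back into the original loop, writing each tweet and its sentiment to the intermediate.txt file opened at the top. It reuses only the fields already shown above (status, statusInfo, docSentiment):

# Sketch: combine the status check with the original loop and write the
# results to intermediate.txt; statusInfo explains why a particular call failed.
with open("intermediate.txt", "w") as outputFile:
    for k in tdict:
        response = alchemyapi.sentiment("text", str(tdict[k]))
        if response.get("status") == "ERROR":
            print(response["statusInfo"])
            continue
        sentiment = response["docSentiment"]["type"]
        outputFile.write("%s\t%s\n" % (tdict[k], sentiment))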

Related

How to search a specific country's tweets with Tweepy client.search_recent_tweets()

y'all. I'm trying to figure out how to filter for a specific country's tweets using search_recent_tweets. I take a country name as input, use pycountry to get the 2-character country code, and then I can either put some sort of location filter in my query or in the search_recent_tweets params. Nothing I have tried so far in either has worked.
import tweepy
from tweepy import OAuthHandler
from tweepy import API
import pycountry as pyc
# upload token
BEARER_TOKEN='XXXXXXXXX'
# get tweets
client = tweepy.Client(bearer_token=BEARER_TOKEN)
# TAKE USER INPUT
countryQuery = input("Find recent tweets about travel in a certain country (input country name): ")
keyword = 'women safe' # gets tweets containing women and safe for that country (safe will catch safety)
# get country code to plug in as param in search_recent_tweets
country_code = str(pyc.countries.search_fuzzy(countryQuery)[0].alpha_2)
# get 100 recent tweets containing keywords and from location = countryQuery
query = str(keyword+' place_country='+str(countryQuery)+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
# expansions=geo.place_id, place.fields=[country_code],
# filter posts to remove retweets
# export tweets to json
import json
with open('twitter.json', 'w') as fp:
    for tweet in posts.data:
        json.dump(tweet.data, fp)
        fp.write('\n')
        print("* " + str(tweet.text))
I have tried variations of:
query = str(keyword+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, place_fields=[str(countryQuery), country_code], max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
and:
query = str(keyword+' place.fields='+str(countryQuery)+','+country_code+' -is:retweet') # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
These either ended up pulling NoneType tweets (i.e. nothing) or caused the error:
"The place.fields query parameter value [Germany] is not one of [contained_within,country,country_code,full_name,geo,id,name,place_type]"
The documentation for search_recent_tweets makes it seem like place.fields / place_fields / place_country should be supported.
Any advice would help!!!
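For reference, here is a minimal sketch of the query-operator approach. It assumes the credentials have access to the Twitter API v2 geo operators (place_country has historically been limited to the Academic Research track); note also that place_fields only selects which fields are returned for expanded place objects, it is not a filter:

import tweepy
import pycountry as pyc

# Sketch (assumes access to the v2 geo operators): place_country is a query
# *operator*, written inside the query string with an ISO alpha-2 code.
client = tweepy.Client(bearer_token='XXXXXXXXX')
country_code = str(pyc.countries.search_fuzzy('Germany')[0].alpha_2)  # 'DE'
query = 'women safe place_country:' + country_code + ' -is:retweet'
posts = client.search_recent_tweets(
    query=query,
    max_results=100,
    tweet_fields=['id', 'text', 'entities', 'author_id', 'geo'],
    expansions=['geo.place_id'],
    place_fields=['country_code', 'full_name'],
)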

How to handle errors during tweet extraction using python?

I'm trying to extract a dataset using tweepy. I have a set of tweet IDs that I use to extract full-text tweets. I have looped over the IDs with tweepy functions to get the tweet texts, but my program keeps crashing because a few of the tweet IDs on my list are from suspended accounts.
This is the related code snippet I'm using:
# Creating DataFrame using pandas
db = pd.DataFrame(columns=['username', 'description', 'location', 'following',
                           'followers', 'totaltweets', 'retweetcount', 'text', 'hashtags'])
# reading tweet IDs from file
df = pd.read_excel('dataid.xlsx')
mylist = df['tweet_id'].tolist()
# tweet counter
n = 1
# looping to extract tweets
for i in mylist:
    tweets = api.get_status(i, tweet_mode="extended")
    username = tweets.user.screen_name
    description = tweets.user.description
    location = tweets.user.location
    following = tweets.user.friends_count
    followers = tweets.user.followers_count
    totaltweets = tweets.user.statuses_count
    retweetcount = tweets.retweet_count
    text = tweets.full_text
    hashtext = list()
    ith_tweet = [username, description, location, following, followers,
                 totaltweets, retweetcount, text, hashtext]
    db.loc[len(db)] = ith_tweet
    n = n + 1
filename = 'scraped_tweets.csv'
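A common pattern for this (only a rough sketch, not tested against the asker's data) is to wrap the api.get_status call in try/except and skip IDs that can no longer be fetched. The exception class depends on the tweepy version: tweepy.TweepError on 3.x, tweepy.errors.TweepyException and its subclasses on 4.x.

# Rough sketch: skip tweets from suspended or deleted accounts instead of
# letting the loop crash. Assumes tweepy 4.x; on 3.x catch tweepy.TweepError.
import tweepy

for i in mylist:
    try:
        tweets = api.get_status(i, tweet_mode="extended")
    except tweepy.errors.TweepyException as e:
        print("Skipping tweet id", i, ":", e)
        continue
    # ... build ith_tweet and append to db exactly as in the loop above ...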

OSM Overpass missing data in query result

I'm gathering all cities, towns and villages of some countries from OSM using an Overpass query in a Python program.
Everything seems to be correct, but I found a town in Luxembourg that is missing in my result set: the town of Kiischpelt.
import requests
import json
Country = 'LU'
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-1"=""" + Country + """][admin_level=2]->.search;
(node["place"="city"](area.search);
node["place"="town"](area.search);
node["place"="village"](area.search);
way["place"="city"](area.search);
way["place"="town"](area.search);
way["place"="village"](area.search);
rel["place"="city"](area.search);
rel["place"="town"](area.search);
rel["place"="village"](area.search);
);
out center;
"""
response = requests.get(overpass_url,
                        params={'data': overpass_query})
data = response.json()
filename = """C:/Data/GetGeoData/data/""" + Country + 'cities' +'.json'
f = open(filename,'w', encoding="utf-8")
json.dump(data, f)
f.close()
When searching on the OSM site for Kiischpelt, I get a result of type relation, but it doesn't appear in my result set.
I also tried changing the query to rel"place"; which should return places of all kinds (city, town, village, isolated dwelling, ...).
Any idea what I'm doing wrong?
Many thanks!
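One thing that may help narrow this down (a sketch only, not a confirmed explanation for the missing relation): a key-only filter ["place"] matches every place=* value, so it also returns elements tagged with values the query above does not list (hamlet, municipality, and so on) and shows how Kiischpelt is actually tagged:

import requests

overpass_url = "http://overpass-api.de/api/interpreter"
# Key-only filter ["place"] matches any place=* value instead of only
# city/town/village, which may reveal how Kiischpelt is tagged.
overpass_query = """
[out:json];
area["ISO3166-1"="LU"][admin_level=2]->.search;
(
  node["place"](area.search);
  way["place"](area.search);
  rel["place"](area.search);
);
out center;
"""
response = requests.get(overpass_url, params={'data': overpass_query})
elements = response.json()['elements']
# list the distinct place=* values that actually come back
print(sorted({e['tags'].get('place') for e in elements if 'tags' in e}))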

How to access place and geo objects in tweet JSON object

I am currently trying to access the place names and coordinates of tweets from a JSON file created by Twitter's API. While not all of my tweets include these attributes, some do and I'd like to collect them. My current approach is:
for line in tweets_json:
    try:
        tweet = json.loads(line.strip())  # only messages containing a 'text' field are tweets
        tweet_id = (tweet['id'])  # This is the tweet's id
        created_at = (tweet['created_at'])  # when the tweet was posted
        text = (tweet['text'])  # content of the tweet
        user_id = (tweet['user']['id'])  # id of the user who posted the tweet
        hashtags = []
        for hashtag in tweet['entities']['hashtags']:
            hashtags.append(hashtag['text'])
        lat = []
        long = []
        for coordinates in tweet['coordinates']['coordinates']:
            lat.append(coordinates[0])
            long.append(coordinates[1])
        country_code = []
        place_name = []
        for place in tweet['place']:
            country_code.append(place['country_code'])
            place_name.append(place['full_name'])
    except:
        # skip lines that are not in JSON format (errors sometimes occur)
        continue
As of right now, no attributes past the hashtags are being collected. Am I trying to access the attributes wrong? More information regarding the JSON object can be found here: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
By wrapping all your code in a try/except block, you're passing over every error that occurs, including the KeyError raised when trying to access a 'coordinates' key that doesn't exist.
If only some of the parsed tweet dictionaries contain a key and you want to collect it when present, you can do something like this:
from json import JSONDecodeError

for line in tweets_json:
    # try to parse json
    try:
        tweet = json.loads(line.strip())  # only messages containing a 'text' field are tweets
    except JSONDecodeError:
        print('bad json')
        continue
    tweet_id = (tweet['id'])  # This is the tweet's id
    created_at = (tweet['created_at'])  # when the tweet was posted
    text = (tweet['text'])  # content of the tweet
    user_id = (tweet['user']['id'])  # id of the user who posted the tweet
    hashtags = []
    for hashtag in tweet['entities']['hashtags']:
        hashtags.append(hashtag['text'])
    lat = []
    long = []
    # this is how you check for the presence of coordinates:
    # 'coordinates' is either absent, None, or a GeoJSON Point whose
    # 'coordinates' list is [longitude, latitude]
    if tweet.get('coordinates') and tweet['coordinates'].get('coordinates'):
        lon, la = tweet['coordinates']['coordinates']
        long.append(lon)
        lat.append(la)
    country_code = []
    place_name = []
    # 'place' is likewise either absent, None, or a single Place object
    if tweet.get('place'):
        country_code.append(tweet['place']['country_code'])
        place_name.append(tweet['place']['full_name'])

Validate whether Twitter user IDs are valid in order to scrape tweets

I created a scraper with Python that gets all the followers of a particular Twitter user. The issue is that when I use this list of user IDs to get their tweets with Logstash, I get an error.
I used http://gettwitterid.com/ to manually check whether these IDs are valid, and they are, but the list is really too long to check one by one.
Is there a way with Python to split the IDs into two lists, one containing the valid IDs and the other the invalid ones, so that I can use the valid list as input for Logstash?
The first 10 rows of the csv file is like this :
"id"
"602169027"
"95104995"
"874339739557670912"
"2981270769"
"93054327"
"870723159011545088"
"3008493180"
"874804469082533888"
"756339889092829184"
"1077712806"
I tried this code to get tweets using the IDs imported from the CSV, but unfortunately it raises error 144 (Not Found):
import tweepy
import pandas as pd
consumer_key = ""
consumer_secret = ""
access_token_key = "-"
access_token_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
dfuids = pd.read_csv('Uids.csv')
for index, row in dfuids.iterrows():
    print row['id']
    tweet = api.get_status(dfuids['id'])
Try to change your code to this:
for index, row in dfuids.iterrows():
    print row['id']
    tweet = api.get_status(row['id'])
To avoid potential errors, you can add a try/except block later.
I got the solution after some experiments:
dfuids = pd.read_csv('Uids.csv')
valid = []
notvalid = []
for index, row in dfuids.iterrows():
    print index
    x = str(row.id)
    #print x, type(x)
    try:
        tweet = api.user_timeline(row.id)
        #print "Fine :", row.id
        valid.append(x)
        #print x, "added to valid"
    except:
        #print "NotOk :", row.id
        notvalid.append(x)
        #print x, "added to notvalid"
This part of the code was what I needed: it loops over all the IDs and tests whether each user ID returns some tweets from the timeline. If it does, the ID is appended as a string to a list called valid; if an exception is raised for any reason, it is appended to notvalid instead.
We can save these lists into DataFrames and export them to CSV:
df = pd.DataFrame(valid)
dfnotv = pd.DataFrame(notvalid)
df.to_csv('valid.csv', index=False, encoding='utf-8')
dfnotv.to_csv('notvalid.csv', index=False, encoding='utf-8')
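One small refinement worth considering (a sketch, assuming tweepy 3.x as in the code above, where the library's exception class is tweepy.TweepError; on tweepy 4.x it is tweepy.errors.TweepyException): catching that class specifically keeps the bare except from hiding unrelated problems such as network or coding errors:

# Sketch: same validation loop, but catching tweepy's own exception class
# so that unrelated errors still surface instead of being swallowed.
valid = []
notvalid = []
for index, row in dfuids.iterrows():
    x = str(row.id)
    try:
        api.user_timeline(row.id)
        valid.append(x)
    except tweepy.TweepError:
        notvalid.append(x)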
