VaderSentiment: emoji analyzer does not work in Jupyter Notebook - python

I am trying to do some sentiment analysis on r/wallstreetbets content and would also like to use the meaning of emojis.
Here is my code:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
wsb_lingo = {
"bullish": 4.0,
"bearish": -4.0,
"bagholder": -4.0,
"BTFD": 4.0,
"FD": 4.0,
"diamond hands": 0.0,
"paper hands": 0.0,
"DD": 4.0,
"GUH": -4.0,
"pump": 4.0,
"dump": -4.0,
"gem stone": 4.0, # emoji
"rocket": 4.0, # emoji
"andromeda": 0.0,
"to the moon": 4.0,
"stonks": -4.0,
"tendies": 4.0,
"buy": 4.0,
"sell": -4.0,
"hold": 4.0,
"short": 4.0,
"long": 4.0,
"overvalued": -4.0,
"undervalued": 4.0,
"calls": 4.0,
"call": 4.0,
"puts": -4.0,
"put": -4.0,
}
sid = SentimentIntensityAnalyzer()
sid.lexicon.update(wsb_lingo)
# Test
print(sid.polarity_scores('🚀'))
print(sid.polarity_scores('😄'))
The output is given below:
{'neg': 0.0, 'neu': 0.0, 'pos': 0.0, 'compound': 0.0}
{'neg': 0.0, 'neu': 0.0, 'pos': 0.0, 'compound': 0.0}
How is it possible that it's unable to give any sentiment for emojis (e.g., due to Jupyter Notebook)? Am I forgetting something here? All libraries are up-to-date.

If I use vaderSentiment instead of nltk.sentiment.vader, it works for me:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
new = { "rocket": 4.0 }
sia = SentimentIntensityAnalyzer()
sia.polarity_scores('🚀')
# Outputs: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
sia.lexicon.update(new)
sia.polarity_scores('🚀')
# Outputs: {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.7184}
See also this issue
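If the difference is that the vaderSentiment package translates emoji into words before looking them up and the NLTK module does not (an assumption here, though it fits the outputs above), one workaround for NLTK is to replace the emoji yourself before scoring. A minimal sketch: EMOJI_WORDS and replace_emoji are made-up names for illustration and are not part of either library.

```python
# Hypothetical hand-written emoji-to-word map; extend it as needed.
EMOJI_WORDS = {
    "🚀": "rocket",
    "💎": "gem",
}

def replace_emoji(text, mapping=EMOJI_WORDS):
    """Replace each known emoji with its word, separated by spaces."""
    for emoji, word in mapping.items():
        text = text.replace(emoji, " " + word + " ")
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(text.split())

print(replace_emoji("GME 🚀🚀 💎"))
# GME rocket rocket gem
```

The cleaned string can then be passed to sid.polarity_scores(...), so that custom lexicon entries such as "rocket" get a chance to match.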

Related

python list comprehension and looping question

I have variables as follows:
J = range (1,16)
T = range (1,9)
x= {} # 0,1 decision variable to be determined
These variables turn into combinations of x[j,t].
I am trying to implement a constraint that sets x[j,t] = 0 for every t type in T that is unacceptable for a given j.
I have a dictionary U with the j values as keys and, for each j, a list with one value per t type. A value of 0 means t is unacceptable, 1 means acceptable. The positions are numbered like range(1,9), i.e. 1 to 8, not 0 to 7. So in the example below, for j 2, type 3 (because it is at position 3 in range(1,9)) is the only acceptable value.
{1: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
2: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
3: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
4: [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
5: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
6: [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
7: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
8: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
9: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
10: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
11: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
12: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
13: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
14: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
15: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
}
I am struggling to get the x[j,t] combinations because of the misaligned index. I set it up like so:
for j,t in x:
    if t in U[j]==0:
        # do the thing... #addConstr(x[j,t], GRB.EQUAL,0)
So for j 2 the results I need are {(2,1):0, (2,2):0, (2,4):0, (2,5):0, (2,6):0, (2,7):0, (2,8):0} where the index value on range (1,9) becomes the t value in the tupledict.
Any pointers? Thank you!
Assuming your example data is stored in U, what you want to do is:
j_results = []
for j, types in U.items():
    result = {}
    # enumerate from 1 so the position in the list is the t value
    for t, acceptable in enumerate(types, start=1):
        if acceptable == 0.0:
            result[(j, t)] = 0
    j_results.append(result)
The j_results list will contain all the results as you described:
for j 2 the results I need are {(2,1):0, (2,2):0, (2,4):0, (2,5):0, (2,6):0, (2,7):0, (2,8):0}
will be in j_results[1] (counterintuitive, because your U keys start from 1 while list indices start from 0).
Note that the t in each key is the 1-based position in the list, not the float stored there, so the keys are tuples of ints like the ones in your expected output.
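Since the question mentions comprehensions, here is a sketch of the same idea as a single dict comprehension. The U below is a shortened stand-in for the dictionary in the question, and forbidden is just an illustrative name.

```python
# Shortened stand-in for the U dictionary from the question.
U = {
    1: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    2: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    4: [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}

# enumerate(..., start=1) turns list positions into t values 1..8.
forbidden = {
    (j, t): 0
    for j, flags in U.items()
    for t, flag in enumerate(flags, start=1)
    if flag == 0.0
}

# Show only the entries for j = 2.
print({k: v for k, v in forbidden.items() if k[0] == 2})
# {(2, 1): 0, (2, 2): 0, (2, 4): 0, (2, 5): 0, (2, 6): 0, (2, 7): 0, (2, 8): 0}
```

Looping over forbidden then gives exactly the (j, t) pairs whose x[j,t] should be fixed to zero.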

Remove non zero values from dictionary of dictionary in python

I have a dictionary of dictionaries, shown below. How can I remove the non-zero key/value pairs from it?
{'abcdef': {'1987': 0.0,
'0544': 0.0,
'0568': 0.0,
'3000': 0.0,
'7095': 0.0,
'75609': 1.0,
'56565': 2.0,
'98656': 3.0,
'756095': 0.0,
'23432': 0.0},
'fgrd': {'1987': 0.0,
'0544': 0.0,
'0568': 0.0,
'3000': 0.0,
'7095': 0.0,
'75609': 1.0,
'56565': 2.0,
'98656': 3.0,
'756095': 0.0,
'23432': 0.0}
}
Tried below,
{key:val for key, val in my_dict.items() if val.values() != 0.0}
and getting AttributeError: 'float' object has no attribute 'values',
Thanks
You have to iterate over the primary dictionary and then look at the values in each secondary dict.
Here is an example:
val = {'abcdef': {'1987': 0.0,
'0544': 0.0,
'0568': 0.0,
'3000': 0.0,
'7095': 0.0,
'75609': 1.0,
'56565': 2.0,
'98656': 3.0,
'756095': 0.0,
'23432': 0.0}}
for dicts in val.values():
    # iterate over a copy so entries can be removed while looping
    for k in dicts.copy():
        if dicts[k] != 0.0:
            dicts.pop(k)
print(val)
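If you would rather not modify the original in place, a nested dict comprehension can build a filtered copy instead. A small sketch on a shortened version of the data; data and zeros_only are illustrative names.

```python
# Shortened version of the nested dictionary from the question.
data = {
    'abcdef': {'1987': 0.0, '75609': 1.0, '56565': 2.0, '23432': 0.0},
    'fgrd': {'0544': 0.0, '98656': 3.0},
}

# Keep only the entries whose value is zero; the original is untouched.
zeros_only = {
    outer: {k: v for k, v in inner.items() if v == 0.0}
    for outer, inner in data.items()
}

print(zeros_only)
# {'abcdef': {'1987': 0.0, '23432': 0.0}, 'fgrd': {'0544': 0.0}}
```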

Select and display a certain value in dataframe column

I have two data frame columns and I was wondering how do I select and return only the 'compound' value?
Data frame:
attributes categories
0 {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0} {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
1 {'neg': 0.0, 'neu': 0.865, 'pos': 0.135, 'compound': 0.0} {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
2 {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.1} {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
3 {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.1} {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.1}
4 {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.1} {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.1}
Desired results:
attributes categories
0.0 0.0
0.0 0.0
0.1 0.0
0.1 0.1
0.1 0.1
Let us try
df['attributes']=df['attributes'].str.get('compound')
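To get the desired result for both columns, the same call can be applied to each of them. A sketch on a two-row stand-in for the data frame, assuming the cells hold real dict objects (if they are strings, they would need parsing first, for example with ast.literal_eval).

```python
import pandas as pd

# Two-row stand-in for the data frame in the question.
df = pd.DataFrame({
    'attributes': [{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
                   {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.1}],
    'categories': [{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
                   {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.1}],
})

# Pull the 'compound' entry out of every dict, column by column.
for col in ['attributes', 'categories']:
    df[col] = df[col].str.get('compound')

print(df)
```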

Python: ValueError: setting an array element with a sequence

I'm trying to use scikit-learn to do some ML.
I am using the preprocessing module to prep my data. The data are of type float.
From reading other questions about this error (ValueError: setting an array element with a sequence), it is caused either by wrongly structured data or by data of type string. Neither seems to be the case here.
Please let me know if you have any idea how to solve this issue or what it even means. Thank you.
The code:
print(X)
pred_X = np.array(pred_X)
pred_Y = np.array(pred_Y)
X = np.array(X)
Y = np.array(Y)
X = preprocessing.scale(X)
pred_X = preprocessing.scale(pred_X)
Output of print(X):
[[547180.0, 120.0, 113.0, 456701.0, 1.0, 6.43, -1.0, 0.313, 0.42, 0.267 3.0, 11800.0, 607208.0, 120.0, 113.0, 456701.0, 1.0, 0.273, 0.331, 0.154, 6.0, 10300.0, 458015.0, 113.0, 120.0, 45328 6.0, 1.0, 2.54, -1.0, 0.32, 0.443, 0.257, 3.0, 92000.0, 543685.0, 120.0, 113.0, 456701.0, 1.0, 6.43, 1.0, 0.296, 0.4, 0.234, 2.0, 8800.0, 594809.0, 475582.0, 120.0, 113.0, 456701.0, 1.0, 1.0, 0.295, 0.384, 0.264, 4.0, 7700.0],
[547180.0, 120.0, 113.0, 456701.0, 1.0, 6.43, -1.0, 0.313, 0.42, 0.267, 3.0, 11800.0, 607208.0, 120.0, 113.0, 456701.0, 1.0, 0.273, 0.331, 0.154, 6.0, 10300.0, 458015.0, 113.0, 120.0, 453286.0, 1.0, 2.54, -1.0, 0.32, 0.443, 0.257, 3.0, 92000.0, 543685.0, 120.0, 113.0, 456701.0, 1.0, 6.43, 1.0, 0.296, 0.4, 0.234, 2.0, 8800.0, 594809.0, 435062.0, 120.0, 113.0, 456701.0, 1.0, 1.0, 0.312, 0.364, 0.154, 5.0, 6900.0],
[547180.0, 120.0, 113.0, 456701.0, 1.0, 6.43, -1.0, 0.313, 0.42, 0.267, 3.0, 11800.0, 607208.0, 120.0, 113.0, 456701.0, 1.0, 0.273, 0.331, 0.154, 6.0, 10300.0, 458015.0, 113.0, 120.0, 453286.0, 1.0, 2.54, -1.0, 0.32, 0.443, 0.257, 3.0, 92000.0, 543685.0, 120.0, 113.0, 456701.0, 1.0, 6.43, 1.0, 0.296, 0.4, 0.234, 2.0, 8800.0, 594809.0, 446308.0, 120.0, 113.0, 456701.0, 1.0, 0.0, 0.221, 0.28e, 0.115, 8.0, 6400.0]]
The Error:
Traceback (most recent call last):
  File "sampleSVM.py", line 46, in <module>
    X = preprocessing.scale(X)
  File "/home/user/.local/lib/python3.5/site-packages/sklearn/preprocessing/data.py", line 133, in scale
    dtype=FLOAT_DTYPES)
  File "/home/user/.local/lib/python3.5/site-packages/sklearn/utils/validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
Your input array X is malformed: there are 59 elements in row 1 and 58 in rows 2 and 3. When you convert it to a NumPy array it becomes an array of shape (3,) with dtype=object, and that cannot then be converted to floats.
The solution is to check and fix your input data. Each row in X must be the same length.
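A quick way to find the offending rows is to compare row lengths before calling np.array. A minimal sketch on a small made-up X (not the data from the question):

```python
from collections import Counter

# Small made-up example: the first row has one element too many.
X = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
]

lengths = [len(row) for row in X]
# Treat the most common length as the expected one.
expected = Counter(lengths).most_common(1)[0][0]
bad_rows = [i for i, n in enumerate(lengths) if n != expected]

print(lengths)   # [4, 3, 3]
print(bad_rows)  # [0]
```

Any index listed in bad_rows points at a row that needs fixing before the data can become a rectangular float array.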

Why doesn't the function work with my input?

I have a default example dictionary which looks like this:
critics = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
I use a function that returns the most similar person in the dictionary using the Pearson correlation coefficient which looks like this:
from math import sqrt

def sim_pearson(prefs, p1, p2):
    # items rated by both people
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]: si[item] = 1
    # find the number of common items
    n = len(si)
    # if they have no items in common, return 0
    if n == 0: return 0
    # add up all the ratings
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    # sum up the squares
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])
    # sum up the products
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    # compute the Pearson coefficient
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    if den == 0: return 0
    r = num / den
    return r
and it works. For example, for the call print(sim_pearson(critics, 'Toby', 'Lisa Rose')) I get the coefficient 0.991240707162.
However, when I try the same function with my dictionary which is:
tests = {'dzam': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0},
'kex': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0},
'rokoko': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0},
'test#example.com': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0},
'seljak': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, }}
I always get 1.0, regardless of which entries in the dictionaries match. Why is that?
By the way, I'm using hashes, so my dictionary MUST have these long strings. :)
You are probably being fooled by the long keys, which make it hard to see which strings actually differ.
Try setting all the values to 0 in test 'seljak' and run a correlation with it. You'll see a correlation of 0:
print(sim_pearson(tests, 'test#example.com', 'seljak'))
Then change the last value of test 'seljak' to 1 and re-run the script; you will see a negative correlation.
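To see why the original data gives 1.0: the users in tests have identical ratings, and 'seljak' only has two distinct keys (the repeated ones collapse in the dict literal). With just two common items, this formula can only return 1, -1 or 0. A small sketch; pearson is a hypothetical helper using the same formula as sim_pearson but taking two plain lists.

```python
from math import sqrt

def pearson(xs, ys):
    """Same formula as sim_pearson, applied to two equal-length lists."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = sxy - sx * sy / n
    den = sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))
    return 0 if den == 0 else num / den

a = [5.0, 1.0]
print(pearson(a, [5.0, 1.0]))  # identical ratings -> 1.0
print(pearson(a, [0.0, 0.0]))  # constant ratings  -> 0
print(pearson(a, [0.0, 1.0]))  # reversed order    -> -1.0

# With three or more points, intermediate values become possible.
print(pearson([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # 0.5
```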
