Why doesn't the function work with my input? - python

I have a default example dictionary which looks like this:
critics = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
I use a function that returns the most similar person in the dictionary using the Pearson correlation coefficient, which looks like this:
from math import sqrt

def sim_pearson(prefs, p1, p2):
    # build the list of mutually rated items
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]: si[item] = 1
    # find the number of common elements
    n = len(si)
    # if they have no ratings in common, return 0
    if n == 0: return 0
    # add up all the preferences
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    # sum up the squares
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])
    # sum up the products
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])
    # calculate the Pearson coefficient
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    if den == 0: return 0
    r = num / den
    return r
and it works. For example, for the call print sim_pearson(critics, 'Toby', 'Lisa Rose') I get the coefficient 0.991240707162.
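As a cross-check (a minimal sketch assuming NumPy is available; not part of the original question), the same value falls out of numpy.corrcoef over the three movies both critics rated:
import numpy as np

# Movies rated by both Toby and Lisa Rose: Snakes on a Plane,
# You, Me and Dupree, and Superman Returns.
shared = [m for m in critics['Toby'] if m in critics['Lisa Rose']]
toby = [critics['Toby'][m] for m in shared]
lisa = [critics['Lisa Rose'][m] for m in shared]
print(np.corrcoef(toby, lisa)[0, 1])  # ~0.991240707162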
However, when I try the same function with my own dictionary, which is:
tests = {'dzam': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0},
'kex': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0},
'rokoko': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0},
'test#example.com': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0},
'seljak': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0,
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, }}
I always get 1.0, no matter what matches I have in the dictionaries. Why is that so?
By the way, I'm using hashes, so my dictionary MUST have these long strings. :)

You are probably being fooled by the long keys, which hide from the eye which strings are different: the ratings of your test users are identical item for item, so a correlation of 1.0 is exactly right.
Try setting all the values to 0 in test 'seljak' and running a correlation with it. You'll see a 0 correlation:
print sim_pearson(tests, 'test#example.com', 'seljak')
Change the last value of test 'seljak' to 1 and you will see a negative correlation when re-running the script.
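To make this visible (a minimal check built on the data above, not part of the original answer), compare two of the test users directly; their ratings are identical item for item, and identical ratings correlate perfectly:
common = [k for k in tests['dzam'] if k in tests['kex']]
print(all(tests['dzam'][k] == tests['kex'][k] for k in common))  # True: same ratings
print(sim_pearson(tests, 'dzam', 'kex'))  # 1.0, correct for identical raters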

Related

How to train an LSTM with sequences of numbers of different lengths?

I am trying to train an LSTM with a dataset in which both the input and the output are sequences of numbers of different lengths. Each number in the input represents a timestep. Example of input and output:
Input:
ent
229 [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0, ...
511 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, ...
110 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 4.0, 4.0, ...
243 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, ...
334 [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0, ...
Output:
sal
229 [6.0, 7.0, 3.0, 0.0, 1.0, 4.0, 5.0, 2.0]
511 [0.0, 1.0, 6.0, 7.0, 2.0, 4.0, 5.0, 6.0, 7.0]
110 [3.0, 5.0, 0.0, 1.0, 5.0, 6.0, 7.0, 3.0]
243 [3.0, 6.0, 7.0, 4.0, 6.0, 7.0, 0.0, 1.0, 4.0]
334 [6.0, 7.0, 3.0, 4.0, 3.0, 5.0, 4.0]
When training the model, this error always appears:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Input(shape=(None, 200)))
model.add(layers.LSTM(20))
Should I select a different NN or include padding?
I have also tried changing the dimension to:
ent
229 [[3.0], [3.0], [3.0], [3.0], [3.0], [3.0], [3....
Do you know what I could do?
Traceback (most recent call last):
  at block 8, line 8
  at /opt/python/envs/default/lib/python3.8/site-packages/keras/utils/traceback_utils.py, line 67, in error_handler(*args, **kwargs)
  at /opt/python/envs/default/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py, line 106, in convert_to_eager_tensor(value, ctx, dtype)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
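For reference, this error typically means NumPy built a 1-D object array because the sequences have different lengths. A minimal sketch of the padding idea raised in the question, using Keras's pad_sequences on illustrative data (not the asker's):
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative ragged sequences (different lengths per sample).
ent = [[3.0, 3.0, 3.0, 2.0, 2.0], [0.0, 0.0, 3.0], [2.0, 2.0, 2.0, 4.0]]

# Pad with zeros at the end so every sample has the same length;
# the result is a regular 2-D float array that converts cleanly to a tensor.
padded = pad_sequences(ent, padding='post', dtype='float32')
print(padded.shape)  # (3, 5)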

I can't properly shape a 2D array for a classifier

I have an issue with the numpy.array method. I'm trying to set up an array of dimensions (73, 125) with my data, but when applying the .array method I get something like this:
set arousal (73,) [list([3.0, 4.0, 4.0, 3.0, 5.0, 3.0, 2.0, 4.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 3.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0, 3.0, 3.0, 1.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 4.0, 4.0, 5.0, 3.0, 3.0, 3.0, 5.0, 2.0, 3.0, 2.0, 4.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0, 2.0, 4.0, 3.0, 4.0, 5.0, 4.0, 3.0, 4.0, 4.0, 4.0, 3.0, 5.0, 3.0, 5.0, 2.0, 3.0, 3.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0, 5.0, 4.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.0, 1.0, 2.0]) # etc...
While instead I was expecting something like set arousal (73, 125).
This is my code
# Before this I imported the packages, the relevant datasets and did some preprocessing to drop "bad" data
info_en = info_clean[info_clean['QESTN_LANGUAGE'] == 'ENG']
rating_en = rating_clean[rating_clean['LANGUAGE'] == 'ENG']
info_en_set = info_en.copy()
ratings_set = rating_en.copy()
lArousal = []
lValence = []
for case in case_list:
    # note: 'set' shadows the built-in set type here
    set = ratings_set[ratings_set['CASE'] == case]
    lArousal.append(list(set.loc[:,['AROUSAL_RATING']]['AROUSAL_RATING']))
    lValence.append(list(set.loc[:,['VALENCE_RATING_RECODED']]['VALENCE_RATING_RECODED']))
arrArousal = np.asarray(lArousal)
arrValence = np.asarray(lValence)
print('set arousal',arrArousal.shape,arrArousal)
print('set valence',arrValence.shape,arrValence)
When I try to train my sklearn classifier I get the error message "setting an array element with a sequence.", which I can understand, but I can't solve the list issue.
Apparently, the for loop works for one dataset that I am testing but not for the other: in one case I correctly get the 2D array; in the other, I am stuck with this array of lists.
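For what it's worth, a sketch on made-up data (not the original datasets): np.asarray can only produce a (73, 125) array if every inner list has exactly 125 elements; with unequal lengths it falls back to a 1-D object array of lists, which later triggers "setting an array element with a sequence" in sklearn. Checking the lengths usually pinpoints the problem:
import numpy as np

ragged = [[1.0, 2.0, 3.0], [4.0, 5.0]]        # unequal lengths
regular = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # equal lengths

print(np.asarray(ragged, dtype=object).shape)  # (2,)   -> 1-D array of lists
print(np.asarray(regular).shape)               # (2, 3) -> proper 2-D array

# Diagnose: a single distinct length means the data can form a 2-D array.
print({len(row) for row in ragged})   # {2, 3}
print({len(row) for row in regular})  # {3}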

Error when importing Python code from "Programming Collective Intelligence"

I'm reading the Programming Collective Intelligence book. In chapter 2 I have a problem when doing this step:
>>>from recommendations import critics
the terminal shows me this message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name critics
The whole code is:
# A dictionary of movie critics and their ratings of a small
# set of movies
critics = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
I haven't read this book, but it looks like you need to download the "recommendations" module to complete the example. Here it is: https://github.com/arthur-e/Programming-Collective-Intelligence/blob/master/chapter2/recommendations.py
P.S.
You need to put this file in your working directory.
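A quick sanity check (a sketch of my own, not from the original answer) that the interpreter can actually see the file before importing:
import os

print(os.getcwd())                              # the directory the interactive interpreter searches first
print('recommendations.py' in os.listdir('.'))  # should print True once the file is in place
from recommendations import critics             # then the book's import succeeds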

TensorFlow: split a large tensor into small tensors

I have a tensor like below
x = tf.Variable(tf.truncated_normal([batch, input], stddev=0.1))
Assume that batch = 99, input = 5, and I would like to split it up into smaller tensors.
If x is below:
[[1.0, 2.0, 3.0, 4.0, 5.0]
[2.0, 3.0, 4.0, 5.0, 6.0]
[3.0, 4.0, 5.0, 6.0, 7.0]
[4.0, 5.0, 6.0, 7.0, 8.0]
.........................
.........................
.........................
[44.0, 55.0, 66.0, 77.0, 88.0]
[55.0, 66.0, 77.0, 88.0, 99.0]]
I want to split it up into two tensors:
[[1.0, 2.0, 3.0, 4.0, 5.0]
[2.0, 3.0, 4.0, 5.0, 6.0]
[3.0, 4.0, 5.0, 6.0, 7.0]]
and
[4.0, 5.0, 6.0, 7.0, 8.0]
.........................
.........................
[44.0, 55.0, 66.0, 77.0, 88.0]
[55.0, 66.0, 77.0, 88.0, 99.0]]
I don't know how to use tf.split to split by rows.
An expedient way would be to call tf.slice twice.
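A minimal sketch of both routes (TF 1.x style to match the question; assuming batch = 99, input = 5, and the 3-row/96-row cut shown above):
import tensorflow as tf

x = tf.Variable(tf.truncated_normal([99, 5], stddev=0.1))

# Two tf.slice calls: begin = [row, col], size = [rows, cols].
top = tf.slice(x, [0, 0], [3, 5])      # first 3 rows
bottom = tf.slice(x, [3, 0], [96, 5])  # remaining 96 rows

# Equivalently, tf.split can cut along axis 0 with uneven sizes:
top2, bottom2 = tf.split(x, [3, 96], axis=0)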

Remove duplicates in each list of a list of lists

I have a list of lists:
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
What I need to do is remove all the duplicates in the list of lists and keep the previous sequence. Such as
a = [[1.0],
[2.0, 3.0, 4.0],
[3.0, 5.0],
[1.0, 4.0, 5.0],
[5.0],
[1.0]
]
If order is important, you can just compare to the set of items seen so far:
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]]
for index, lst in enumerate(a):
    seen = set()
    a[index] = [i for i in lst if i not in seen and seen.add(i) is None]
Here i is added to seen as a side effect, using Python's short-circuit evaluation of and; seen.add(i) is only called when the first check (i not in seen) evaluates to True.
Attribution: I saw this technique yesterday from @timgeb.
If you have access to OrderedDict (Python 2.7 onwards), abusing it is a good way to do this:
import collections
import pprint
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
b = [list(collections.OrderedDict.fromkeys(i)) for i in a]
pprint.pprint(b, width = 40)
Outputs:
[[1.0],
[2.0, 3.0, 4.0],
[3.0, 5.0],
[1.0, 4.0, 5.0],
[5.0],
[1.0]]
This will help you.
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
for _ in range(len(a)):
    a[_] = sorted(list(set(a[_])))
print a
OUTPUT:
[[1.0], [2.0, 3.0, 4.0], [3.0, 5.0], [1.0, 4.0, 5.0], [5.0], [1.0]]
Inspired by DOSHI, here's another way, probably the best for a small number of possible elements (i.e. a small number of index lookups for sorted); otherwise, a way that remembers insertion order may be better:
b = [sorted(set(i), key=i.index) for i in a]
So just to compare the methods, a seen set versus sorting a set by an original index lookup:
>>> setup = 'l = [1,2,3,4,1,2,3,4,1,2,3,4]*100'
>>> timeit.repeat('sorted(set(l), key=l.index)', setup)
[23.231241687943111, 23.302754517266294, 23.29650511717773]
>>> timeit.repeat('seen = set(); [i for i in l if i not in seen and seen.add(i) is None]', setup)
[49.855933579601697, 50.171151882997947, 51.024657420945005]
Here we see that for the larger case, the containment test that Jon uses for every element becomes relatively very costly, and since insertion order is quickly determined by index in this case, this method is much more efficient.
However, by appending more elements to the end of the list, we see that Jon's method does not bear much increased cost, whereas mine does:
>>> setup = 'l = [1,2,3,4,1,2,3,4,1,2,3,4]*100 + [8,7,6,5]'
>>> timeit.repeat('sorted(set(l), key=l.index)', setup)
[93.221347206941573, 93.013769266020972, 92.64512197257136]
>>> timeit.repeat('seen = set(); [i for i in l if i not in seen and seen.add(i) is None]', setup)
[51.042504915545578, 51.059295348750311, 50.979311841569142]
I think I'd prefer Jon's method with a seen set, given the bad lookup times for list.index.
