Spam Classification Lab: Python Error in Array Syntax - python

I receive an invalid syntax error for this code... Any help would be greatly appreciated.
#show the vectors for each sentence
print(X.toarray())
[[0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0]
[0 0 0 0 0 2 0 1 1 0 1 0 0 1 0 0 0]
[0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 1 0]
[1 1 1 1 1 0 0 1 0 0 0 0 0 1 1 0 1]]

Simply try this with NumPy. Since X is a sparse matrix, convert it with toarray() first; np.array(X) alone would only wrap the sparse object:
import numpy as np
arr = np.asarray(X.toarray())
print(arr)
If you're solving a text-based problem such as converting words to vectors, I'd recommend the Tokenizer from the TensorFlow library. Use the following code:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
sentences = [
'This is sample one',
'This is sample two'
]
tokenizer = Tokenizer(num_words = 100)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
print(word_index)
Try this.
Hope this helps.
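One possible cause of the syntax error, assuming the printed array ended up in the same cell as the code: the text NumPy prints has no commas between elements, so it is not valid Python source. The sketch below uses a small hypothetical helper, is_valid_python, to show the difference between printed output and a real list literal.

```python
import ast

# How NumPy prints an array: no commas, so this is NOT valid Python source.
printed_output = "[[0 0 1]\n [1 0 0]]"

# The same data written as a Python list literal.
literal = "[[0, 0, 1], [1, 0, 0]]"


def is_valid_python(source):
    """Return True if the text parses as Python code (hypothetical helper)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(is_valid_python(printed_output))
print(is_valid_python(literal))
```

If that is what happened, keeping only the print(X.toarray()) line in the cell should avoid the error.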

Related

How to find vertices of a polytope in python?

I want to find the vertices given the following:
`A = np.array([
[-1,2/3,0,0,0,0,0,0,0],
[1,-2/3,0,0,0,0,0,0,0],
[-1,0,0,2/3,0,0,0,0,0],
[-1,0,2/3,0,0,0,0,0,0],
[1,0,-2/3,0,0,0,0,0,0],
[-1,0,0,0,2/3,0,0,0,0],
[-1,0,0,0,2/3,0,0,0,0],
[1,0,0,-2/3,0,0,0,0,0],
[-1,2/3,0,0,0,0,0,0,0]])`
`b = np.array([-1/3, 1/3, -2/3, 1/3, -1/3, 0, -2/3, 2/3, -1/3])` .
I tried to compute the vertices with
vertices = pypoman.compute_polytope_vertices(A,b)
However, I get the following error:
raise Exception("Polyhedron is not a polytope")
Exception: Polyhedron is not a polytope
Has anyone had a problem like this?
Use pycddlib:
# -*- coding: utf-8 -*-
import numpy as np
import cdd as pcdd
import fractions as frac
A = np.array([
[-1,frac.Fraction(2,3),0,0,0,0,0,0,0],
[1,-frac.Fraction(2,3),0,0,0,0,0,0,0],
[-1,0,0,frac.Fraction(2,3),0,0,0,0,0],
[-1,0,frac.Fraction(2,3),0,0,0,0,0,0],
[1,0,-frac.Fraction(2,3),0,0,0,0,0,0],
[-1,0,0,0,frac.Fraction(2,3),0,0,0,0],
[-1,0,0,0,frac.Fraction(2,3),0,0,0,0],
[1,0,0,-frac.Fraction(2,3),0,0,0,0,0],
[-1,frac.Fraction(2,3),0,0,0,0,0,0,0]])
b = np.array([
[-frac.Fraction(1,3)],
[frac.Fraction(1,3)],
[-frac.Fraction(2,3)],
[frac.Fraction(1,3)],
[-frac.Fraction(1,3)],
[0],
[-frac.Fraction(2,3)],
[frac.Fraction(2,3)],
[-frac.Fraction(1,3)]
])
M = np.hstack( (b, -A) )
mat = pcdd.Matrix(M, linear=False, number_type="fraction")
mat.rep_type = pcdd.RepType.INEQUALITY
poly = pcdd.Polyhedron(mat)
ext = poly.get_generators()
print(ext)
Giving:
1 2/3 1/2 3/2 0 0 0 0 0 0
0 1 3/2 3/2 3/2 0 0 0 0 0
0 2/3 1 1 1 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1
Indeed, this is not a polytope. A "1" in the first column marks an extreme point (a vertex) and a "0" marks a ray. Here there is only one vertex and six rays, so the polyhedron is unbounded.
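To make the reading of that output explicit, here is a minimal sketch that splits the generator rows by their first column. The helper split_generators is hypothetical (not part of pycddlib), the rows are typed in by hand from the output above, and any lineality information cdd may report is ignored.

```python
from fractions import Fraction


def split_generators(rows):
    """Split cdd-style generator rows into vertices (flag 1) and rays (flag 0)."""
    vertices = [row[1:] for row in rows if row[0] == 1]
    rays = [row[1:] for row in rows if row[0] == 0]
    return vertices, rays


# Generator rows copied from the output above (first entry is the flag).
rows = [
    [1, Fraction(2, 3), Fraction(1, 2), Fraction(3, 2), 0, 0, 0, 0, 0, 0],
    [0, 1, Fraction(3, 2), Fraction(3, 2), Fraction(3, 2), 0, 0, 0, 0, 0],
    [0, Fraction(2, 3), 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
]

vertices, rays = split_generators(rows)
# A polytope is bounded, so it must have no rays.
is_bounded = len(rays) == 0
print(len(vertices), len(rays), is_bounded)
```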

Create a matrix that contains 1 if there is a 1 in the bottom-right corner

Given an n*n matrix M (containing only 0 and 1), I want to build the matrix that contains a 1 in position (i, j) if and only if there is at least one 1 in the bottom-right submatrix M[i:n, j:n].
Please note that I know there are optimal algorithms to compute this, but for performance reasons I'm looking for a solution using numpy (so the algorithm is fully compiled).
Example:
Given this matrix:
0 0 0 0 1
0 0 1 0 0
0 0 0 0 1
1 0 1 0 0
I'm looking for a way to compute this matrix:
0 0 0 0 1
0 0 1 1 1
0 0 1 1 1
1 1 1 1 1
Thanks
Using numpy, you can accumulate the maximum value over each axis:
import numpy as np
M = np.array([[0,0,0,0,1],
[0,0,1,0,0],
[0,0,0,0,1],
[1,0,1,0,0]])
M = np.maximum.accumulate(M)
M = np.maximum.accumulate(M,axis=1)
print(M)
[[0 0 0 0 1]
[0 0 1 1 1]
[0 0 1 1 1]
[1 1 1 1 1]]
Note: This matches your example result (presence of a 1 in the top-left quadrant). Your explanation of the logic, however, would produce a different result.
If we go with M[i:n,j:n] (bottom-right):
M = np.array([[0,0,0,0,1],
[0,0,1,0,0],
[0,0,0,0,1],
[1,0,1,0,0]])
M = np.maximum.accumulate(M[::-1,:])[::-1,:]
M = np.maximum.accumulate(M[:,::-1],axis=1)[:,::-1]
print(M)
[[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 0 0]]
It is essentially the same approach except with reversed accumulation on the axes
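The reversed accumulation can be wrapped in a small function. This is a sketch only; the name any_one_bottom_right is made up for illustration, and it flips both axes once instead of reversing each axis separately.

```python
import numpy as np


def any_one_bottom_right(M):
    """Return a matrix with 1 at (i, j) iff M[i:, j:] contains at least one 1."""
    M = np.asarray(M)
    flipped = M[::-1, ::-1]                        # bottom-right becomes top-left
    acc = np.maximum.accumulate(flipped, axis=0)   # running max down the rows
    acc = np.maximum.accumulate(acc, axis=1)       # running max along the columns
    return acc[::-1, ::-1]                         # flip back to the original layout


M = np.array([[0, 0, 0, 0, 1],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1],
              [1, 0, 1, 0, 0]])
result = any_one_bottom_right(M)
print(result)
```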

Counting repeated sequences in transition table

I'm using the following function to generate a transition table:
import numpy as np
import pandas as pd
def make_table(allSeq):
    n = max([ max(s) for s in allSeq ]) + 1
    arr = np.zeros((n,n), dtype=int)
    for seq in allSeq:
        ind = (seq[1:], seq[:-1])
        arr[ind] += 1
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')
However, my result is incorrect:
list1 = [1,2,3,4,5,4,5,4,5]
list2 = [4,5,4,5]
make_table([list1, list2])
Current 0 1 2 3 4 5
Next
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 1 0 0 0 0
3 0 0 1 0 0 0
4 0 0 0 1 0 2
5 0 0 0 0 2 0
For example, the transition 4->5 should be counted 5 times, but it's only counted once per sequence (2). I know the issue is the arr[ind] += 1 line, but I just can't figure it out! Do I nest another loop, or is there a slick way to add the total number of instances at once? Thanks!
Figured it out! Switched to the following:
def make_table(allSeq):
    n = max([ max(s) for s in allSeq ]) + 1
    arr = np.zeros((n,n), dtype=int)
    for seq in allSeq:
        for i,j in zip(seq[1:],seq[:-1]):
            ind = (i,j)
            arr[ind] += 1
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')
Another loop seems like the easiest solution, with a bit of a twist of using zip:
import numpy as np
import pandas as pd
def make_table(allSeq):
    n = max([ max(s) for s in allSeq ]) + 1
    arr = np.zeros((n,n), dtype=int)
    for seq in allSeq:
        ind = zip(seq[1:], seq[:-1])
        for i in ind:
            arr[i] += 1
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')
list1 = [1,2,3,4,5,4,5,4,5]
list2 = [4,5,4,5]
make_table([list1, list2])
returns
Next 0 1 2 3 4 5
------ --- --- --- --- --- ---
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 1 0 0 0 0
3 0 0 1 0 0 0
4 0 0 0 1 0 3
5 0 0 0 0 5 0
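For the "slick way" asked about in the question, one option that avoids the inner Python loop is np.add.at. With fancy indexing, arr[ind] += 1 writes each repeated index only once, whereas np.add.at accumulates every occurrence. A sketch, keeping the question's function name and table layout:

```python
import numpy as np
import pandas as pd


def make_table(all_seq):
    """Count transitions; rows are the next state, columns the current state."""
    n = max(max(s) for s in all_seq) + 1
    arr = np.zeros((n, n), dtype=int)
    for seq in all_seq:
        seq = np.asarray(seq)
        # Unbuffered add: repeated (next, current) pairs are each counted.
        np.add.at(arr, (seq[1:], seq[:-1]), 1)
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')


list1 = [1, 2, 3, 4, 5, 4, 5, 4, 5]
list2 = [4, 5, 4, 5]
table = make_table([list1, list2])
print(table)
```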

Efficient implementation of words count in the several lists using Python

I have the list of comments in the following format:
Comments=[['hello', 'world'], ['would', 'hard', 'press'],['find', 'place', 'less']]
wordset={'hello','world','hard','would','press','find','place','less'}
I want a table or dataframe that has wordset as its index and, for each comment in Comments, the individual word counts.
The following code produces the required dataframe, but it is very slow and I am looking for a more efficient implementation. Since the corpus is large, this has a huge impact on the efficiency of our ranking algorithm.
result=pd.DataFrame()
for comment in Comments:
    worddict_terms=dict.fromkeys(wordset,0)
    for items in comment:
        worddict_terms[items]+=1
    df_comment=pd.DataFrame.from_dict([worddict_terms])
    frames=[result,df_comment]
    result = pd.concat(frames)
Comments_raw_terms=result.transpose()
The result we expect is:
0 1 2
hello 1 0 0
world 1 0 0
would 0 1 0
press 0 1 0
find 0 0 1
place 0 0 1
less 0 0 1
hard 0 1 0
I think your nested for loop is increasing the complexity. The code below replaces the two for loops with a single map call. It only goes as far as producing, for each comment in Comments, the count dictionary for "hello" and "world"; you can reuse the remaining code that builds the table with pandas.
from collections import Counter
import funcy
from funcy import project
def fun(comment):
    wordset={'hello','world'}
    temp_dict_comment = Counter(comment)
    temp_dict_comment = dict(temp_dict_comment)
    final_dict = project(temp_dict_comment,wordset)
    print(final_dict)
    return final_dict
Comments=[['hello', 'world'], ['would', 'hard', 'press'],['find', 'place', 'less', 'excitingit', 'wors', 'watch', 'paint', 'dri']]
# map is lazy in Python 3, so wrap it in list() to actually run fun on every comment
results = list(map(fun,Comments))
This should help, since it uses a single map instead of two for loops.
Try this approach:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
text = pd.Series(Comments).str.join(' ')
X = vect.fit_transform(text)
r = pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out())  # use get_feature_names() on scikit-learn < 1.0
Result:
In [49]: r
Out[49]:
find hard hello less place press world would
0 0 0 1 0 0 0 1 0
1 0 1 0 0 0 1 0 1
2 1 0 0 1 1 0 0 0
In [50]: r.T
Out[50]:
0 1 2
find 0 0 1
hard 0 1 0
hello 1 0 0
less 0 0 1
place 0 0 1
press 0 1 0
world 1 0 0
would 0 1 0
Pure Pandas solution:
In [61]: pd.get_dummies(text.str.split(expand=True), prefix_sep='', prefix='')
Out[61]:
find hello would hard place world less press
0 0 1 0 0 0 1 0 0
1 0 0 1 1 0 0 0 1
2 1 0 0 0 1 0 1 0
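If scikit-learn is not available, a similar table can be built from collections.Counter and a single DataFrame construction, which avoids the repeated pd.concat in the question. This is a sketch only; count_table is a made-up name, and it assumes each comment is already a list of individual words.

```python
from collections import Counter

import pandas as pd


def count_table(comments, wordset):
    """Rows are the words in wordset, columns are comment positions."""
    counts = [dict(Counter(comment)) for comment in comments]
    df = pd.DataFrame(counts)
    # Keep only words in wordset; words missing from a comment count as 0.
    df = df.reindex(columns=sorted(wordset)).fillna(0).astype(int)
    return df.T


comments = [['hello', 'world'], ['would', 'hard', 'press'], ['find', 'place', 'less']]
wordset = {'hello', 'world', 'hard', 'would', 'press', 'find', 'place', 'less'}
table = count_table(comments, wordset)
print(table)
```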

rotate an nxnxn matrix in python

I have a binary array of size 64x64x64, where a volume of 40x40x40 is set to "1" and rest is "0". I have been trying to rotate this cube about its center around z-axis using skimage.transform.rotate and also Opencv as:
def rotateImage(image, angle):
    row, col = image.shape
    center = tuple(np.array([row, col]) / 2)
    rot_mat = cv2.getRotationMatrix2D(center, angle, 1.0)
    new_image = cv2.warpAffine(image, rot_mat, (col, row))
    return new_image
In the case of OpenCV, I tried 2D rotation of each individual slice of the cube (Cube[:,:,n=1,2,3...p]).
After rotating, the total sum of the values in the array changes. This may be caused by interpolation during rotation. How can I rotate a 3D array of this kind without adding anything to the array?
OK, so I understand now what you are asking. The closest I can come up with is scipy.ndimage. There is also a way to interface with ImageJ from Python, which might be easier. Here is what I did with scipy.ndimage:
import numpy as np
from scipy import ndimage
angle = 25 # angle should be in degrees
Rotatedim = ndimage.rotate(yourimage, angle, reshape=False, output=np.int32, order=5, prefilter=False)
This preserved the sum for some angles but not for others; by playing around more with the parameters you might be able to get your desired outcome.
One option is to convert into sparse, and transform the coordinates using a matrix rotation. Then transform back into dense. In 2 dimensions, this looks like:
import numpy as np
import scipy.sparse
import math
N = 10
space = np.zeros((N, N), dtype=np.int8)
space[3:7, 3:7].fill(1)
print(space)
print(np.sum(space))
space_coo = scipy.sparse.coo_matrix(space)
Coords = np.array(space_coo.nonzero()) - 3
theta = 30 * 3.1416 / 180
R = np.array([[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]])
space2_coords = R.dot(Coords)
space2_coords = np.round(space2_coords).astype(int)  # integer indices for coo_matrix
space2_coords += 3
space2_sparse = scipy.sparse.coo_matrix(([1] * space2_coords.shape[1], (space2_coords[0], space2_coords[1])), shape=(N, N))
space2 = space2_sparse.todense()
print(space2)
print(np.sum(space2))
Output:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
16
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 1 1 1 1 0 0 0 0]
[0 0 1 1 1 1 1 0 0 0]
[0 1 1 0 1 1 0 0 0 0]
[0 0 0 1 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
16
The advantage is that the values sum to the same total before and after the transform. The downside is that you might get 'holes', as above, and/or duplicate coordinates, which give values of '2' in the final dense matrix.
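The same idea carries over to the 3D case in the question by rotating only the first two coordinates and leaving z unchanged. The following is a sketch under assumptions: rotate_voxels_z is a made-up name, the rotation is about the centre of the array in the first two axes, duplicates are summed with np.add.at, and voxels that land outside the array are dropped (which would change the sum).

```python
import numpy as np


def rotate_voxels_z(volume, angle_deg):
    """Rotate nonzero voxels about the z-axis through the array centre."""
    volume = np.asarray(volume)
    coords = np.argwhere(volume)                   # (k, 3) indices of nonzero voxels
    values = volume[tuple(coords.T)]
    center = (np.array(volume.shape[:2]) - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Rotate the first two coordinates, then snap back to the integer grid.
    xy = np.rint((coords[:, :2] - center) @ R.T + center).astype(int)
    inside = np.all((xy >= 0) & (xy < np.array(volume.shape[:2])), axis=1)
    out = np.zeros_like(volume)
    # Unbuffered add so duplicate target voxels accumulate instead of overwriting.
    np.add.at(out, (xy[inside, 0], xy[inside, 1], coords[inside, 2]), values[inside])
    return out


cube = np.zeros((16, 16, 16), dtype=int)
cube[4:12, 4:12, 4:12] = 1
rotated = rotate_voxels_z(cube, 30)
print(cube.sum(), rotated.sum())
```

As with the 2D version, the total is preserved only as long as every rotated voxel stays inside the array, and the result can still contain holes and values greater than 1.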
