I need to calculate the length of this poly line with the following coordinates formatted this exact way:
coords = [[1.0, 1.0], [1.5, 2.0], [2.2, 2.4], [3.0, 3.2], [4.0, 3.6], [4.5, 3.5], [4.8, 3.2], [5.2, 2.8], [5.6, 2.0],
[6.5, 1.2]]
using this distance formula: $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
Our lab wants us to use a loop and to be able to reuse the same code on other sets of coordinates. I am lost on where to start.
Something like this could work?
Define a function for the Euclidean distance:
import math
def distance(x1, x2, y1, y2):
    return math.sqrt((x2-x1)**2 + (y2-y1)**2)
Then, another function which receives coords as input and gives back the sum of the point-to-point Euclidean distances:
def compute_poly_line_length(coords):
    distances = []
    # Walk over consecutive pairs of points (i, i+1)
    for i in range(len(coords)-1):
        current_line = coords[i]
        next_line = coords[i+1]
        distances.append(distance(current_line[0],next_line[0],current_line[1],next_line[1]))
    return sum(distances)
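A more compact variant of the same loop, as a rough sketch: it pairs each point with its successor using zip and relies on math.dist, which assumes Python 3.8 or newer. The name polyline_length is just an illustration, not something from the lab handout.

```python
import math

def polyline_length(coords):
    """Sum the Euclidean distances between consecutive points."""
    # zip(coords, coords[1:]) yields (p0, p1), (p1, p2), ...
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))

coords = [[1.0, 1.0], [1.5, 2.0], [2.2, 2.4], [3.0, 3.2], [4.0, 3.6],
          [4.5, 3.5], [4.8, 3.2], [5.2, 2.8], [5.6, 2.0], [6.5, 1.2]]
print(polyline_length(coords))
```

Because the function only takes the list as an argument, it can be called unchanged on any other set of coordinates; a list with a single point gives a length of 0.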
I am trying to round given values up and down to the nearest multiple of 0.25. I have tried the round, ceil and floor functions but am not getting the desired output. Is there another function I can use to get the expected output shown after the code snippet below?
import math
x=[0.14,2.31,1.56,1.98,0.12,0.11,0.13,0.25]
lup=[]
ldown=[]
base=0.25
for i in x:
    x1= base*(round(float(i/base)))
    x2= base*(math.floor(float(i/base)))
    lup.append(x1)
    ldown.append(x2)
    # print(F"value:{i},fraction:{float(i/base)},fround:{round(float(i/base))},lup:{x1},ldown:{x2}")
print(F"lup={lup}")
print(F"ldown={ldown}")
expected output:
lup = [0.25,2.5,1.75,2.0,0.25,0.25,0.25]
ldown=[0,2.25,1.5,1.75,0.0,0.0,0.0,0.25]
obtained output:
lup=[0.25, 2.25, 1.5, 2.0, 0.0, 0.0, 0.25, 0.25]
ldown=[0.0, 2.25, 1.5, 1.75, 0.0, 0.0, 0.0, 0.25]
Use math.ceil (the counterpart of math.floor) instead of the built-in round, which rounds to the nearest integer rather than always up:
x1= base*(math.ceil(float(i/base)))
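Putting the fix together, a minimal sketch could look like the following. The helper name round_to_multiple is made up for illustration. It assumes the base (0.25 here) is exactly representable in binary floating point; for bases such as 0.1 the division can land slightly above or below an integer and give surprising results.

```python
import math

def round_to_multiple(values, base=0.25):
    """Return (rounded up, rounded down) lists to the nearest multiple of base."""
    ups = [base * math.ceil(v / base) for v in values]
    downs = [base * math.floor(v / base) for v in values]
    return ups, downs

x = [0.14, 2.31, 1.56, 1.98, 0.12, 0.11, 0.13, 0.25]
lup, ldown = round_to_multiple(x)
print(f"lup={lup}")      # lup=[0.25, 2.5, 1.75, 2.0, 0.25, 0.25, 0.25, 0.25]
print(f"ldown={ldown}")  # ldown=[0.0, 2.25, 1.5, 1.75, 0.0, 0.0, 0.0, 0.25]
```

A value that is already a multiple of the base (0.25 in the list) maps to itself in both directions.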
I would like to organize my collected data (from computer simulations) into an HDF5 file using Python.
I measured positions and velocities [x,y,z,vx,vy,vz] of all atoms within a certain space region over many time steps. The number of atoms, of course, varies from time step to time step.
A minimal example could look as follows:
[
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ],
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ]
]
(2 time steps,
first time step: 2 atoms,
second time step: 3 atoms)
My idea was to create an HDF5 dataset within Python which stores all the information. At each time step it should store a 2D array of the positions/velocities of all atoms, i.e.
dataset[0] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ]
dataset[1] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ].
The idea is clear, I think. However, I struggle with the definition of the correct data type of the data set with varying array length.
My code looks like this:
import numpy as np
import h5py
file = h5py.File ('file.h5','w')
columnNo = 6
rowtype = np.dtype("%sfloat32" % columnNo)
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )
dataset = file.create_dataset("dset", (2,), dtype=dt)
print dataset.value
testarray = np.array([[1.,2.,3.,2.,3.,4.],[1.,2.,3.,2.,3.,4.]])
print testarray
dataset[0] = testarray
print dataset[0]
This, however, does not work. When I run the script I get the error message "AttributeError: 'float' object has no attribute 'dtype'."
It seems that my defined dtype is wrong.
Does anybody see how it should be defined correctly?
Thanks very much,
Sven
The error in your case is buried, though it is clear it occurs when trying to assign the testarray to the dataset:
Traceback (most recent call last):
  File "stack41465480.py", line 26, in <module>
    dataset[0] = testarray
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-GhwtGD/h5py-2.6.0/h5py/_objects.c:2577)
  ...
  File "h5py/_conv.pyx", line 712, in h5py._conv.ndarray2vlen (/build/h5py-GhwtGD/h5py-2.6.0/h5py/_conv.c:6171)
AttributeError: 'float' object has no attribute 'dtype'
I'm not skilled with special_dtype and vlen, but I was able to write a numpy structured array to h5py:
import numpy as np
import h5py
file = h5py.File ('file.h5','w')
columnNo = 6
# rowtype = np.dtype("%sfloat32" % columnNo)
rowtype = np.dtype([('f0', '<f4',(6,))])
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )
print('rowtype',rowtype)
print('dt',dt)
dataset = file.create_dataset("dset", (2,), dtype=rowtype)
print('value')
print(dataset.value[0])
arr = np.ones((2,),dtype=rowtype)
print(repr(arr))
dataset[0] = arr[0]
print(dataset.value)
testarray = np.array([([1.,2.,3.,2.,3.,4.],),([2.,3.,4.,1.,2.,3.],)], dtype=rowtype)
print(repr(testarray))
dataset[1] = testarray[1]
print(dataset.value)
print(dataset.value['f0'])
producing
1316:~/mypy$ python3 stack41465480.py
rowtype [('f0', '<f4', (6,))]
dt object
value
([0.0, 0.0, 0.0, 0.0, 0.0, 0.0],)
array([([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],), ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)],
dtype=[('f0', '<f4', (6,))])
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([0.0, 0.0, 0.0, 0.0, 0.0, 0.0],)]
array([([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],), ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)],
dtype=[('f0', '<f4', (6,))])
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)]
[[ 1. 1. 1. 1. 1. 1.]
[ 2. 3. 4. 1. 2. 3.]]
Thanks for the quick answer. It helped a lot.
If I now simply change the data type of the data set to
dtype = dt,
I get what I would like to have.
Here, the Python code (for completeness):
import numpy as np
import h5py
file = h5py.File ('file.h5','w')
columnNo = 6
rowtype = np.dtype([('f0', '<f4',(6,))])
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )
print('rowtype',rowtype)
print('dt',dt)
dataset = file.create_dataset("dset", (2,), dtype=dt)
# print('value')
# print(dataset.value[0])
arr = np.ones((3,),dtype=rowtype)
# print(repr(arr))
dataset[0] = arr
# print(dataset.value)
testarray = np.array([([1.,2.,3.,2.,3.,4.],),([2.,3.,4.,1.,2.,3.],)], dtype=rowtype)
# print(repr(testarray))
dataset[1] = testarray
print(dataset.value)
for i in range(2): print(dataset[i])
And the corresponding output reads
('rowtype', dtype([('f0', '<f4', (6,))]))
('dt', dtype('O'))
[ array([([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],),
([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],), ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)],
dtype=[('f0', '<f4', (6,))])
array([([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],), ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)],
dtype=[('f0', '<f4', (6,))])]
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)
([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)]
[([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],) ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)]
Just to get it right: The problem in my original code was a bad definition of my rowtype data structure, right?
Best,
Sven
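For comparison, here is a different layout that avoids the structured row type altogether. This is only a sketch of an alternative, not the approach from the answer above: each time step is flattened to a 1D float32 array, stored in a variable-length dataset, and reshaped back to 6 columns on read. The function names write_steps and read_step and the file name are made up for illustration.

```python
import numpy as np
import h5py

COLUMNS = 6  # x, y, z, vx, vy, vz

def write_steps(path, steps):
    """Store one flattened (n_atoms * 6) float32 array per time step."""
    dt = h5py.special_dtype(vlen=np.dtype("float32"))
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("dset", (len(steps),), dtype=dt)
        for i, step in enumerate(steps):
            # Flatten the (n_atoms, 6) array so it fits a 1D vlen element
            dset[i] = np.asarray(step, dtype="float32").ravel()

def read_step(path, index):
    """Read one time step back as an (n_atoms, 6) array."""
    with h5py.File(path, "r") as f:
        return f["dset"][index].reshape(-1, COLUMNS)

if __name__ == "__main__":
    steps = [np.ones((2, COLUMNS)), np.ones((3, COLUMNS))]  # 2 atoms, then 3 atoms
    write_steps("file.h5", steps)
    print(read_step("file.h5", 1).shape)
```

The trade-off is that the column count lives in the reading code rather than in the file's dtype, so it may be worth recording it as a dataset attribute as well.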
I am new to machine learning and am trying out the following problem.
The input is 2 arrays of descriptions of the same length, and the output is an array of similarity scores: the first string of the first array compared to the first string of the second array, and so on.
Each item in the array (a numpy array) is a description string. Can you write a function that measures how similar two strings are by counting how many identical, co-occurring word IDs they share, and assigns a score (one possible weight could be the frequency of co-occurrence divided by the sum of the frequencies of the individual word IDs)? Then apply the function to the two arrays to get an array of scores.
Please also let me know if there are other approaches you would want to consider.
Thanks!
Data:
array(['0/1/2/3/4/5/6/5/7/8/9/3/10', '11/12/13/14/15/15/16/17/12',
'18/19/20/21/22/23/24/25',
'26/27/28/29/30/31/32/33/34/35/36/37/38/39/33/34/40/41',
'5/42/43/15/44/45/46/47/48/26/49/50/51/52/49/53/54/51/55/56/22',
'57/58/59/60/61/49/62/23/57/58/63/57/58', '64/65/66/63/67/68/69',
'70/71/72/73/74/75/76/77',
'78/79/80/81/82/83/84/85/86/87/88/89/90/91',
'33/34/92/93/94/95/85/96/97/98/99/60/85/86/100/101/102/103',
'104/105/106/107/108/109/110/86/107/111/112/113/5/114/110/113/115/116',
'117/118/119/120/121/12/122/123/124/125',
'14/126/127/128/122/129/130/131/132/29/54/29/129/49/3/133/134/135/136',
'137/138/139/140/141/142',
'143/144/145/146/147/148/149/150/151/152/4/153/154/155/156/157/158/128/159',
'160/161/162/163/131/2/164/165/166/167/168/169/49/170/109/171',
'172/173/174/175/176/177/73/178/104/179/180/179/181/173',
'182/144/183/179/73',
'184/163/68/185/163/8/186/187/188/54/189/190/191',
'181/192/0/1/193/194/22/195',
'113/196/197/198/68/199/68/200/201/202/203/201',
'204/205/206/207/208/209/68/200',
'163/210/211/122/212/213/214/215/216/217/100/101/160/139/218/76/179/219',
'220/221/222/223/5/68/224/225/54/225/226/227/5/221/222/223',
'214/228/5/6/5/215/228/228/229',
'230/231/232/233/122/215/128/214/128/234/234',
'235/236/191/237/92/93/238/239',
'13/14/44/44/240/241/242/49/54/243/244/245/55/56',
'220/21/246/38/247/201/248/73/160/249/250/203/201',
'214/49/251/252/253/254/255/256/257/258'],
dtype='|S127')
array(['151/308/309/310/311/215/312/160/313/214/49/12',
'314/315/316/275/317/42/318/319/320/212/49/170/179/29/54/29/321/322/323',
'324/325/62/220/326/194/327/328/218/76/241/329',
'330/29/22/103/331/314/68/80/49',
'78/332/85/96/97/227/333/4/334/188',
'57/335/336/34/187/337/21/338/212/213/339/340',
'341/342/167/343/8/254/154/61/344',
'2/292/345/346/42/347/348/348/100/349/202/161/263',
'283/39/312/350/26/351', '352/353/33/34/144/218/73/354/355',
'137/356/357/358/357/359/22/73/170/87/88/78/123/360/361/53/362',
'23/363/10/364/289/68/123/354/355',
'188/28/365/149/366/98/367/368/369/370/371/372/368',
'373/155/33/34/374/25/113/73', '104/375/81/82/168/169/81/82/18/19',
'179/376/377/378/179/87/88/379/20',
'380/85/381/333/382/215/128/383/384', '385/129/386/387/388',
'389/280/26/27/390/391/302/392/393/165/394/254/302/214/217/395/396',
'397/398/291/140/399/211/158/27/400', '401/402/92/93/68/80',
'77/129/183/265/403/404/405/406/60/407/162/408/409/410/411/412/413/156',
'129/295/90/259/38/39/119/414/415/416/14/318/417/418',
'419/420/421/422/423/23/424/241/421/425/58',
'426/244/427/5/428/49/76/429/430/431',
'257/432/433/167/100/101/434/435/436', '437/167/438/344/356/170',
'439/440/441/442/192/443/68/80/444/445/111', '446/312/23/447/448',
'385/129/218/449/450/451/22/452/125/129/453/212/128/454/455/456/457/377'],
dtype='|S127')
The following code should give you what you need in Python 3.x:
import numpy as np
from collections import Counter
def jaccardSim(c1, c2):
    # Multiset union (max of counts) and intersection (min of counts)
    cU = c1 | c2
    cI = c1 & c2
    sim = sum(cI.values()) / sum(cU.values())
    return sim
def byteArraySim(b1, b2):
    # Decode each byte string and count its word IDs
    cA = [Counter(b1[i].decode(encoding="utf-8", errors="strict").split("/"))
          for i in range(len(b1))]
    cB = [Counter(b2[i].decode(encoding="utf-8", errors="strict").split("/"))
          for i in range(len(b2))]
    # Assuming both 'b1' and 'b2' have the same length
    cSim = [jaccardSim(cA[i], cB[i]) for i in range(len(b1))]
    return cSim # Array of similarities
The Jaccard similarity score is used in this implementation. You may use other scores, such as cosine or Hamming, to your liking.
Assuming that the arrays are stored in variables a and b, the resulting function byteArraySim(a,b) outputs the following similarity scores:
[0.0,
0.0,
0.0,
0.038461538461538464,
0.0,
0.041666666666666664,
0.0,
0.0,
0.0,
0.08,
0.0,
0.05555555555555555,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.058823529411764705,
0.0,
0.0,
0.0,
0.05555555555555555,
0.0,
0.0,
0.0,
0.0,
0.0]
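If you want to try the cosine score instead, a rough sketch could look like this. It works on plain str descriptions (decode the byte strings first) and treats each description as a vector of word-ID counts. The name cosine_sim is made up for illustration.

```python
import math
from collections import Counter

def cosine_sim(s1, s2, sep="/"):
    """Cosine similarity between the word-ID count vectors of two strings."""
    c1, c2 = Counter(s1.split(sep)), Counter(s2.split(sep))
    # Only IDs present in both strings contribute to the dot product
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0

print(cosine_sim("0/1/2/3", "2/3/4/5"))
```

Like the Jaccard version, it returns 0.0 when the two descriptions share no word IDs and a value close to 1.0 when their count vectors point in the same direction.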
I am trying to create a left-right discrete HMM in sklearn to recognize words from recognized characters. The symbol set is " " plus the 26 letters, for 27 symbols in total.
import numpy as np
from sklearn import hmm
# alphabet is symbols
symbols = [' ','a','b','c','d','e','f','g','h','i','j', #0-10
'k','l','m','n','o','p','q','r','s','t', #11-20
'u','v','w','x','y','z'] #21-26
num_symbols = len(symbols)
# words up to 6 letters
n_states = 6
obsONE = np.array([ [0,0,15,14,5,0],    # __one_
                    [15,14,5,0,0,0],    # one___
                    [0,0,0,15,14,5],    # ___one
                    [0,15,14,5,0,0],    # _one__
                    [0,0,16,14,5,0],    # __pne_
                    [15,14,3,0,0,0],    # onc___
                    [0,0,0,15,13,5],    # ___ome
                    [0,15,14,5,0,0],    # _one__
                    [0,0,15,14,5,0],    # __one_
                    [15,14,5,0,10,15],  # one_jo
                    [1,14,0,15,14,5],   # an_one
                    [0,15,14,5,0,16],   # _one_p
                    [20,0,15,14,5,0],   # t_one_
                    [15,14,5,0,10,15],  # one_jo
                    [21,20,0,15,14,5],  # ut_one
                    [0,15,14,5,0,20],   # _one_t
                    [21,0,15,14,5,0],   # u_one_
                    [15,14,5,0,10,15],  # one_jo
                    [0,0,0,15,14,5],    # ___one
                    [0,15,14,5,0,26],   # _one_z
                    [5,20,0,15,14,5] ]) # et_one
pi = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]) # initial state is the left one always
A = np.array([[0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # node 1 goes to node 2
              [0.0, 0.5, 0.5, 0.0, 0.0, 0.0],  # node 2 can self loop or goto 3
              [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],  # node 3 can self loop or goto 4
              [0.0, 0.0, 0.0, 0.5, 0.5, 0.0],  # node 4 can self loop or goto 5
              [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],  # node 5 can self loop or goto 6
              [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]) # node 6 goes to node 1
model = hmm.MultinomialHMM(n_components=n_states,
                           startprob=pi,       # this is the start matrix, pi
                           transmat=A,         # this is the transition matrix, A
                           params='e',         # update e during training (aka B)
                           init_params='ste')  # initialize with s,t,e
model.n_symbols = num_symbols
model.fit(obsONE)
But I get ValueError: Input must be both positive integer array and every element must be continuous.
The code seems to directly want observations to be implemented as [0,1,2,3,4,5]
How should I set this up to get to the HMM model that I want???
I faced the same problem. It seems to me that the observation sequence that is input doesn't contain some characters from the vocabulary. So instead of assigning numbers to them statically, you can assign numbers to characters after finding out what characters are present in the observation sequence.
For example, suppose
Word = 'apzaqb'
Then use
Symbols = ['a','b','p','q','z'] for numbering
(i.e. ObsOne = np.array([0,2,4,0,3,1]))
instead of
Symbols = [' ','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'] for numbering
(i.e. ObsOne = np.array([1,16,26,1,17,2]))
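The renumbering described above can be sketched in a few lines. The helper name encode_contiguous is made up for illustration; it builds the symbol list from the characters that actually occur and maps each character to its index, so the resulting integers run from 0 upward without gaps.

```python
def encode_contiguous(word):
    """Map the characters of word to contiguous integers 0..k-1."""
    symbols = sorted(set(word))                      # only symbols that occur
    index = {s: i for i, s in enumerate(symbols)}    # symbol -> integer code
    return symbols, [index[ch] for ch in word]

symbols, obs = encode_contiguous('apzaqb')
print(symbols)  # ['a', 'b', 'p', 'q', 'z']
print(obs)      # [0, 2, 4, 0, 3, 1]
```

Keep the returned symbols list around, since it is needed to translate the integer codes back into characters after decoding.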