How to cluster trajectories of x,y coordinates in R or Python?

I'm trying to cluster trajectories, but this is not easy.
The following stream data (spatio-temporal data) exists.
Each Object_ID has several (x, y) points, and together they form a trajectory.
So I want to follow these points and get the following clusters:
I have already thought of many ways, for example DBSCAN, TRACLUS, ...
But if I use DBSCAN, I do not know how to format the input.
In other words, how do I pass each Object_ID's trajectory as one input value? (In what form?)
Or is there a way to group the multiple coordinates of each Object_ID first, like this?
object_1: [{x1, y1}, {x2, y2}, {x3, y3}, ... {xn, yn}],
object_2: [{x1, y1}, {x2, y2}, {x3, y3}, ... {xn, yn}],
object_3: [{x1, y1}, {x2, y2}, {x3, y3}, ... {xn, yn}],
.
.
.
And after I get the clustering results, each cluster must still carry the Object information.
Does anyone know how to do this in R or Python?

DBSCAN has no particular requirements on the data type.
You just need to be able to compute distances.
So organize the data as necessary for your time series distance function.
I would then try HAC first, then DBSCAN.
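A minimal sketch of the approach in this answer, assuming each Object_ID's points are already grouped into an (n, 2) NumPy array. The symmetric Hausdorff distance (built from scipy.spatial.distance.directed_hausdorff) is only one possible trajectory distance, picked here because it handles trajectories of different lengths; it ignores the order of the points, so a time-aware measure such as DTW or the Fréchet distance may suit your data better. The sample trajectories and the thresholds (eps=2.0, t=2.0) are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import directed_hausdorff, squareform
from sklearn.cluster import DBSCAN

# Made-up example: one (n_points, 2) array of x, y per Object_ID.
# Trajectories may have different numbers of points.
trajectories = {
    "object_1": np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]]),
    "object_2": np.array([[0.0, 0.2], [1.0, 0.3], [2.0, 0.4], [3.0, 0.5]]),
    "object_3": np.array([[0.0, 5.0], [1.0, 5.1], [2.0, 5.2]]),
}


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences (ignores time order)."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])


# Fix an order of Object_IDs so matrix rows can be mapped back to objects.
ids = list(trajectories)
n = len(ids)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = hausdorff(trajectories[ids[i]], trajectories[ids[j]])
        dist[i, j] = dist[j, i] = d

# HAC: linkage expects the condensed (upper-triangle) form of the matrix.
Z = linkage(squareform(dist), method="average")
hac_labels = fcluster(Z, t=2.0, criterion="distance")

# DBSCAN on the same matrix; metric="precomputed" tells it these are distances.
# min_samples=1 means no object is labelled as noise (-1).
db_labels = DBSCAN(eps=2.0, min_samples=1, metric="precomputed").fit_predict(dist)

# Each cluster label stays attached to its Object_ID.
hac_clusters = dict(zip(ids, hac_labels))
db_clusters = dict(zip(ids, db_labels))
print(hac_clusters)
print(db_clusters)
```

Because both algorithms only see the distance matrix, the fixed order of `ids` is what lets you map every label back to its Object_ID. With min_samples above 1, DBSCAN may mark isolated trajectories as noise (-1).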

Related

Add custom property to vtkXMLUnstructuredGrid using python

I have a .vtu file representing a mesh which I read through vtkXMLUnstructuredGridReader. Then I create a numpy array (nbOfPoints x 3) in which I store the mesh vertex coordinates, which I'll call meshArray.
I also have a column array (nbOfPoints x 1), which I'll call brightnessArray, representing a property I want to assign to the vertices of meshArray, so that each vertex has a corresponding scalar value. For example, meshArray[0] corresponds to brightnessArray[0], and so on.
How can I do this?
Is it then possible to interpolate the values at the mesh vertices to obtain a smooth variation of the property, in order to visualize it in ParaView?
Thank you.
Simon
Here is what you need to do:
Write a Python Programmable Source to read your numpy data as a vtkUnstructuredGrid.
Here are a few examples of programmable sources :
https://www.paraview.org/Wiki/ParaView/Simple_ParaView_3_Python_Filters
https://www.paraview.org/Wiki/Python_Programmable_Filter
Read your .vtu dataset
Use a "Resample With Dataset" filter on your Python Programmable Source output and select your dataset as "source"
And you're done.
The hardest part is writing the programmable source script.

STRESS array is empty in python scripting Abaqus

I want to extract the stress at each node on the top surface of my model, but I can't get it to work. When I use this script:
odb = visualization.openOdb('My.odb')
frame=odb.steps['AStep'].frames[-1]
dispNode = odb.rootAssembly.nodeSets['UPPER']
STRESS= frame.fieldOutputs['S'].getSubset(region=dispNode).values
COORD= frame.fieldOutputs['COORD'].getSubset(region=dispNode).values
print(STRESS)
print(COORD[1].data)
STRESS returns an empty array.
How can I edit my script to get the stress and its corresponding coordinates?
Your code can't work if you only calculated your stress values at the integration points. There are simply no values at the nodes, so if you request values at nodes you will get an empty array.
This is how it should work:
Extrapolate your integration point results to the nodes
Average your ElementNodal values. This is how that works: https://stackoverflow.com/a/43175485/4045774
Extract your node coordinates (deformed or undeformed)
Get the node labels from your point set
With the node labels from your point set find the corresponding unique nodal values https://docs.scipy.org/doc/numpy/reference/generated/numpy.in1d.html
If you need a small code example, feel free to ask.
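The averaging step and the node-set lookup are plain NumPy once the values are out of the odb. A minimal sketch, assuming you have already collected, for every ElementNodal stress value, its node label and its data (for example by iterating over a field output subset requested at the ELEMENT_NODAL position); the helper names and the tiny arrays below are hypothetical. It uses np.isin, the current NumPy replacement for the np.in1d linked above.

```python
import numpy as np


def average_element_nodal(node_labels, values):
    """Average the ElementNodal values that share a node label.

    node_labels: (m,) labels, one per element-node value
    values: (m,) or (m, k) values, e.g. k = 6 stress components
    Returns the sorted unique labels and an (n_unique, k) array of averages.
    """
    node_labels = np.asarray(node_labels)
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, None]
    unique, inverse = np.unique(node_labels, return_inverse=True)
    sums = np.zeros((unique.size, values.shape[1]))
    np.add.at(sums, inverse, values)  # unbuffered sum per node label
    counts = np.bincount(inverse, minlength=unique.size)
    return unique, sums / counts[:, None]


def select_nodes(labels, averaged, set_labels):
    """Keep only the nodes whose labels belong to the node set."""
    mask = np.isin(labels, set_labels)
    return labels[mask], averaged[mask]


def attach_coordinates(labels, coord_labels, coords):
    """Look up the coordinate row for each label (all labels must be present)."""
    coord_labels = np.asarray(coord_labels)
    order = np.argsort(coord_labels)
    idx = np.searchsorted(coord_labels, labels, sorter=order)
    return np.asarray(coords)[order[idx]]


# Hypothetical data: node 2 is shared by 2 elements, node 3 by 3 elements.
elem_nodal_labels = np.array([1, 2, 2, 3, 3, 3])
elem_nodal_stress = np.array([10.0, 20.0, 40.0, 30.0, 60.0, 90.0])
labels, averaged = average_element_nodal(elem_nodal_labels, elem_nodal_stress)
upper_labels, upper_stress = select_nodes(labels, averaged, [2, 3])  # e.g. the 'UPPER' set
upper_xyz = attach_coordinates(
    upper_labels, [3, 1, 2], [[3.0, 3.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0]]
)
print(upper_labels, upper_stress.ravel(), upper_xyz)
```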

How to use distancematrix function from Biopython?

I would like to calculate the distance matrix (using a genetic distance function) on a data set using http://biopython.org/DIST/docs/api/Bio.Cluster.Record-class.html#distancematrix, but I keep getting errors, typically telling me that the rank is not 2. I'm not actually sure what it expects as input, since the documentation never says and there are no examples online.
Say I read in some aligned gene sequences:
SingleLetterAlphabet() alignment with 7 rows and 52 columns
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIRL...SKA COATB_BPIKE/30-81
AEPNAATNYATEAMDSLKTQAIDLISQTWPVVTTVVVAGLVIKL...SRA Q9T0Q8_BPIKE/1-52
DGTSTATSYATEAMNSLKTQATDLIDQTWPVVTSVAVAGLAIRL...SKA COATB_BPI22/32-83
AEGDDP---AKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPM13/24-72
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA COATB_BPZJ2/1-49
AEGDDP---AKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKL...SKA Q9T0Q9_BPFD/1-49
FAADDATSQAKAAFDSLTAQATEMSGYAWALVVLVVGATVGIKL...SRA COATB_BPIF1/22-73
which would be done by
from Bio import AlignIO
data = AlignIO.read("dataset.fasta", "fasta")
But distancematrix in the Cluster.Record class does not accept this. How can I get it to work? I.e.
dist_mtx = distancematrix(data)
The short answer: You don't.
From the documentation:
A Record stores the gene expression data and related information
The Cluster object is meant for gene expression data, not for multiple sequence alignments (MSAs).
I would recommend using an external tool like MSARC which runs in Python as well.
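If what you need is a distance matrix for the alignment itself, Biopython has a separate tool for that in its phylogenetics module, Bio.Phylo.TreeConstruction.DistanceCalculator. This is a different technique from the MSARC suggestion and is offered only as a sketch: the three short sequences are made up, and the "identity" model (roughly, the fraction of aligned positions that differ) could be swapped for a substitution-matrix model such as "blosum62" for protein alignments.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.TreeConstruction import DistanceCalculator
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Made-up, already aligned sequences (all rows must have the same length).
aln = MultipleSeqAlignment([
    SeqRecord(Seq("AEPNAATNYA"), id="seq_a"),
    SeqRecord(Seq("AEPNAATNYT"), id="seq_b"),
    SeqRecord(Seq("DGTSTATSYA"), id="seq_c"),
])
# With a real file instead:
# from Bio import AlignIO
# aln = AlignIO.read("dataset.fasta", "fasta")

calculator = DistanceCalculator("identity")
dm = calculator.get_distance(aln)  # DistanceMatrix; rows/columns named by record id
print(dm)
```

The resulting DistanceMatrix can be indexed by record ids, e.g. dm["seq_a", "seq_b"].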

Most suitable clustering method for a dataset containing 10 dimension numerical arrays

I have a data set (~4k samples) of the following structure:
sample type: string - very general
sample sub type: string
sample model number: number - may be None
signature: number array[10]
sampleID: string - unique id
I want to cluster the samples based on the "signature" (I have a function that measures the "distance" between one signature and another), so that when I encounter a new signature I can tell which type/sub type the sample belongs to.
Which algorithm should I use?
P.S. I am using Python and scikit-learn. I also need to visualize the results somehow.
Since you already have a distance function and your data set is tiny, just use HAC (hierarchical agglomerative clustering), the grandfather of all clustering algorithms.
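A minimal sketch of HAC with a custom distance, using SciPy rather than scikit-learn because scipy.spatial.distance.pdist accepts a Python callable as the metric. signature_distance below is a hypothetical placeholder for your own function, and the random signatures and the choice of two clusters are only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def signature_distance(a, b):
    """Hypothetical placeholder for your own distance; here the L1 distance."""
    return float(np.abs(a - b).sum())


# Made-up signatures: two well-separated groups of 10-dimensional vectors.
rng = np.random.default_rng(0)
signatures = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 10)),
    rng.normal(5.0, 0.1, size=(5, 10)),
])

# pdist calls the function once per pair and returns the condensed distance matrix.
condensed = pdist(signatures, metric=signature_distance)
Z = linkage(condensed, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into at most 2 clusters
print(labels)
```

Average linkage is used because, unlike Ward, it makes no Euclidean assumption about the distances. With about 4k samples the condensed matrix has roughly 8 million entries (about 64 MB as float64), which fits in memory, but calling a Python function for every pair is slow, so vectorising the distance helps. For the visualization, scipy.cluster.hierarchy.dendrogram(Z) draws the merge tree with matplotlib.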

Implementing Disjoint Set Data Structure in Python

I'm working on a small project involving clustering, and I think the code given here https://www.ics.uci.edu/~eppstein/PADS/UnionFind.py might be a good starting point. However, I have come across a few difficulties applying it to my work:
If I make a set containing all my clusters, cluster=set([0,1,2,3,4,...,99]) (there are 100 points, labelled by these numbers), and I would like to group the numbers into clusters, do I simply write cluster=UnionFind()? What is the data type of cluster then?
How can I perform the usual set operations on cluster? For instance, I would like to read all the points (which may have been grouped together) in cluster, but typing print cluster gives <__main__.UnionFind instance at 0x00000000082F6408>. I would also like to keep adding new elements to cluster; how do I do that? Do I need to write specific methods for UnionFind()?
How do I get all the members of a group given one of its members? For instance, if 0,1,3,4 are grouped together and I call 3, I want it to print 0,1,3,4. How do I do this?
thanks
Here's a small sample code on how to use the provided UnionFind class.
Initialization
The only way to create a set using the provided class is to FIND it, because it creates a set for a point only when it doesn't find it. You might want to create an initialization method instead.
union_find = UnionFind()
clusters = set([0,1,2,3,4])
for i in clusters:
    union_find[i]  # looking up a point creates its singleton set
Union
# Merge clusters 0 and 1
union_find.union(0, 1)
# Add point 2 to the same set
union_find.union(0, 2)
Find
# Get the set representative (root) for points 0 and 1
print(union_find[0])
print(union_find[1])
Getting all Clusters
# Print every point and the representative of its set
for cluster in union_find:
    print(cluster, union_find[cluster])
Note:
There is no direct way that gets you all the points given a cluster number. You can loop over all the points and pick the ones that have the required cluster number. You might want to modify the given class to support that operation more efficiently.
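To answer the last question directly, here is a small self-contained disjoint-set class written from scratch rather than taken from PADS, so its method names (add, find, union, groups, members) belong to this sketch, not to the PADS API. It keeps parent pointers with path compression and union by size, and members() does the loop over all points described in the note above.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set (hypothetical stand-in for the PADS class)."""

    def __init__(self, elements=()):
        self.parent = {}
        self.size = {}
        for x in elements:
            self.add(x)

    def add(self, x):
        """Create a singleton set for x if it is not known yet."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        """Return the root of x's set, compressing the path on the way."""
        self.add(x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # point every node on the path directly at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets of a and b; the larger set's root becomes the root."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra

    def groups(self):
        """Map each root to the set of its members."""
        out = defaultdict(set)
        for x in self.parent:
            out[self.find(x)].add(x)
        return dict(out)

    def members(self, x):
        """All elements in the same set as x (O(n) scan)."""
        root = self.find(x)
        return {y for y in self.parent if self.find(y) == root}


uf = UnionFind(range(5))
uf.union(0, 1)
uf.union(3, 4)
uf.union(1, 3)
uf.add(5)  # new elements can be added at any time
print(sorted(uf.members(3)))  # [0, 1, 3, 4]
```

If members() is called often, keeping a member list per root and merging the lists inside union() avoids the full scan.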
