I don't understand the k-means scipy algorithm

I'm trying to use the scipy kmeans algorithm.
So I have this really simple example:
from numpy import array
from scipy.cluster.vq import vq, kmeans, whiten
features = array([[3,4],[3,5],[4,2],[4,2]])
book = array((features[0],features[2]))
final = kmeans(features,book)
and the result is
final
(array([[3, 4],
[4, 2]]), 0.25)
What I don't understand is this: I expected each centroid to be the barycentre of all the points belonging to its cluster, so in this example
[3, 9/2] and [4, 2].
Can anyone explain the result the scipy algorithm is giving?

It looks like kmeans is preserving the data type you give it (int), so the centroid coordinate 4.5 ends up stored as 4. Try floating-point input instead:
features = array([[3., 4.], [3., 5.], [4., 2.], [4., 2.]])
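As a minimal sketch of this fix, the question's example can be rerun with floating-point input. The names `codebook` and `distortion` are only illustrative; the expected centroid is the barycentre the asker computed by hand.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Same points as in the question, but stored as floats.
features = np.array([[3., 4.], [3., 5.], [4., 2.], [4., 2.]])
# Initial guess for the two centroids (first and third point).
book = np.array((features[0], features[2]))

codebook, distortion = kmeans(features, book)
print(codebook)    # first centroid is now the barycentre [3., 4.5]
print(distortion)
```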

Related

Creating a NumPy array out of another array with shifted indices

I would like to produce a 4D array from a 2D one by periodic shifts, in a way that can be summarized by the following:
uuvv[kx,ky,qx,qy] = uu[kx+qx,ky+qy]
This is easiest to illustrate with a "2D from 1D" MWE:
def pc(idx):
    return idx - Npts*int(idx/Npts)
uu = np.square(np.arange(Npts))
uv = np.zeros((Npts,Npts))
for kx in np.arange(Npts):
    for qx in np.arange(Npts):
        uv[kx,qx] = uu[pc(kx+qx)]
Here, the periodicity condition pc just brings the index back into the allowed range. The output for Npts=4 is:
array([[0., 1., 4., 9.],
[1., 4., 9., 0.],
[4., 9., 0., 1.],
[9., 0., 1., 4.]])
Each row is the previous row cyclically shifted by one position. For the "4D from 2D" case, I could obviously use:
def pbc(idx):
    return idx - Npts*int(idx/Npts)
uv = np.zeros((Npts,Npts,Npts,Npts))
for kx in np.arange(Npts):
    for ky in np.arange(Npts):
        for qx in np.arange(Npts):
            for qy in np.arange(Npts):
                uv[kx,ky,qx,qy] = uu[pbc(kx+qx),pbc(ky+qy)]
However, using four loops is going to be slow, as I will be doing this multiple times for much larger arrays. How can I do this more efficiently?
Please note that, although the MWE example could be reproduced by applying the square function to a 2D array, that would not be a helpful solution. Using the MWE to illustrate, the goal is to apply the function as few times as possible (i.e. only on the 1D array) and then to create the 2D array without for loops. Ultimately, I will need to do this to generate a 4D array from a 2D array. How can I do this?

You can replicate the 2D array and then extract the shifted 2D sub-arrays (avoiding modulus and conditionals). Here is how to do that:
uuRep = np.tile(uu, (2,2))
uv = np.zeros((Npts,Npts,Npts,Npts))
for kx in np.arange(Npts):
    for ky in np.arange(Npts):
        uv[kx,ky,:,:] = uuRep[kx:kx+Npts,ky:ky+Npts]
With Npts=64, this solution is about 1000 times faster.
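The two remaining loops can also be removed with integer-array (fancy) indexing. The following is a sketch rather than a benchmarked replacement: it builds a table of wrapped indices once and lets broadcasting produce the 4D result. The names `idx`, `rows`, `cols` and the random test array are assumptions made for illustration.

```python
import numpy as np

Npts = 4
rng = np.random.default_rng(0)
uu = rng.random((Npts, Npts))  # stand-in for the real 2D data

# idx[k, q] = (k + q) mod Npts, the periodic index for every (k, q) pair.
k = np.arange(Npts)
idx = (k[:, None] + k[None, :]) % Npts

# Broadcast the index table so that
# uv[kx, ky, qx, qy] = uu[idx[kx, qx], idx[ky, qy]].
rows = idx[:, None, :, None]   # shape (Npts, 1, Npts, 1)
cols = idx[None, :, None, :]   # shape (1, Npts, 1, Npts)
uv = uu[rows, cols]            # shape (Npts, Npts, Npts, Npts)

# The "2D from 1D" case from the question uses the same table directly.
uu1d = np.square(np.arange(Npts))
uv1d = uu1d[idx]
print(uv1d)
```

The result still holds Npts**4 elements, so memory rather than loop overhead may become the limit for large arrays.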

Spline interpolation

I'm having difficulty performing a spline interpolation on the data set below:
import numpy
SOURCE = numpy.array([[1,2,3],[3,4,5], [9,10,11]])
from scipy.interpolate import griddata
from scipy.interpolate import interp1d
input = [0.5,2,3,6,9,15]
Linear interpolation works fine, but when I replace linear with cubic, I get an error:
f = interp1d(SOURCE[:,0], SOURCE[:,1:], kind="linear", axis=0, bounds_error=False)
f(input)
f = interp1d(SOURCE[:,0], SOURCE[:,1:], kind="cubic", axis=0, bounds_error=False)
ValueError: The number of derivatives at boundaries does not match: expected 1, got 0+0
How can I perform this cubic interpolation?

Your SOURCE data is too short. A cubic spline needs at least four points to interpolate from, but you only provide three. If you add one more point to SOURCE, it works as expected:
>>> SOURCE = numpy.array([[1,2,3],[3,4,5], [9,10,11], [12,13,14]]) # added an extra value
>>> f = interp1d(SOURCE[:,0], SOURCE[:,1:], kind="cubic", axis=0, bounds_error=False)
>>> f(input)
array([[nan, nan],
[ 3., 4.],
[ 4., 5.],
[ 7., 8.],
[10., 11.],
[nan, nan]])
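If a fourth point is not available, one option is to drop to a lower-order spline. The sketch below assumes that interp1d accepts kind="quadratic" with three points (a spline of order k needs at least k + 1 points) and repeats the four-point cubic case for comparison. `query`, `f_quad` and `f_cubic` are illustrative names; `query` replaces `input` to avoid shadowing the built-in.

```python
import numpy as np
from scipy.interpolate import interp1d

query = [0.5, 2, 3, 6, 9, 15]

# Three points: enough for a quadratic spline, not for a cubic one.
source3 = np.array([[1, 2, 3], [3, 4, 5], [9, 10, 11]])
f_quad = interp1d(source3[:, 0], source3[:, 1:], kind="quadratic",
                  axis=0, bounds_error=False)
quad = f_quad(query)

# Four points: the cubic spline from the answer above.
source4 = np.array([[1, 2, 3], [3, 4, 5], [9, 10, 11], [12, 13, 14]])
f_cubic = interp1d(source4[:, 0], source4[:, 1:], kind="cubic",
                   axis=0, bounds_error=False)
cubic = f_cubic(query)

# Queries outside the x range come back as NaN because bounds_error=False.
print(quad)
print(cubic)
```

Because the sample data lies on a straight line, both splines reproduce it inside the x range; with real data the two orders will generally differ.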

map int over 2D array python

To parse
s="1,2,3,4_5,6,7,8"
as [[1,2,3,4],[5,6,7,8]]
I am currently using
import numpy as np
a=np.array([list(map(int,r.split(","))) for r in s.split("_")])
Is there a more pythonic or one-shot inbuilt way of doing this or am I on the right track here?
Python newbie.

Using list comprehensions:
import numpy as np
s="1,2,3,4_5,6,7,8"
a = np.array([[int(x) for x in r.split(',')] for r in s.split('_')])

You can use np.genfromtxt:
from io import StringIO
import numpy as np
s="1,2,3,4_5,6,7,8"
np.genfromtxt(StringIO(s.replace("_", "\n")), delimiter=",")
array([[1., 2., 3., 4.],
[5., 6., 7., 8.]])
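Note that the call above returns floats. If an integer array is wanted, as in the question, np.genfromtxt also takes a dtype argument. A small sketch of that variant:

```python
from io import StringIO
import numpy as np

s = "1,2,3,4_5,6,7,8"
# Turn "_" into line breaks so each group becomes one row, then parse as ints.
a = np.genfromtxt(StringIO(s.replace("_", "\n")), delimiter=",", dtype=int)
print(a)
```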

What does the command "preprocessing.scale" do in terms of math?

I have read the manual on the scikit-learn website, but I still don't know what mathematical formula is behind this command.
>>> from sklearn import preprocessing
>>> import numpy as np
>>> X = np.array([[ 1., -1., 2.],
... [ 2., 0., 0.],
... [ 0., 1., -1.]])
>>> X_scaled = preprocessing.scale(X)
>>> X_scaled
array([[ 0. ..., -1.22..., 1.33...],
[ 1.22..., 0. ..., -0.26...],
[-1.22..., 1.22..., -1.06...]])

It centers each feature to the mean and scales it component-wise to unit variance.
This means that the mean along the axis (axis=0 by default, i.e. per column) is subtracted from X, and the result is divided by the standard deviation along the same axis.

Andrey's formula in the comments is correct - I'd just add that numpy and scikit-learn use the population formula for calculating the standard deviation, not the sample standard deviation, which is the default in other languages like R. So numpy and scikit-learn divide the sum of squares by n, instead of n-1.
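The description above can be checked directly in NumPy. This sketch assumes the default behaviour of preprocessing.scale (column-wise, axis=0) and uses np.std with its default ddof=0, i.e. the population standard deviation. `X_manual` is an illustrative name.

```python
import numpy as np

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# Subtract the column means, then divide by the column standard deviations.
# np.std defaults to ddof=0, which divides the sum of squares by n.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_manual)

# Each column now has zero mean and unit (population) variance.
print(X_manual.mean(axis=0), X_manual.std(axis=0))
```

Passing ddof=1 to np.std would give the sample standard deviation instead, which is what produces slightly different numbers in tools that default to n-1.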

How do I convert a numpy array to (and display) an image?

I have created an array thusly:
import numpy as np
data = np.zeros( (512,512,3), dtype=np.uint8)
data[256,256] = [255,0,0]
What I want this to do is display a single red dot in the center of a 512x512 image. (At least to begin with... I think I can figure out the rest from there)

The following should work:
from matplotlib import pyplot as plt
plt.imshow(data, interpolation='nearest')
plt.show()
If you are using Jupyter notebook/lab, use this inline command before importing matplotlib:
%matplotlib inline
A more featureful option is to install ipympl (pip install ipympl) and use
%matplotlib widget

You could use PIL to create (and display) an image:
from PIL import Image
import numpy as np
w, h = 512, 512
data = np.zeros((h, w, 3), dtype=np.uint8)
data[0:256, 0:256] = [255, 0, 0] # red patch in upper left
img = Image.fromarray(data, 'RGB')
img.save('my.png')
img.show()

Note: both these APIs have been first deprecated, then removed.
Shortest path is to use scipy, like this:
# Note: deprecated in v0.19.0 and removed in v1.3.0
from scipy.misc import toimage
toimage(data).show()
This requires PIL or Pillow to be installed as well.
A similar approach also requiring PIL or Pillow but which may invoke a different viewer is:
# Note: deprecated in v1.0.0 and removed in v1.8.0
from scipy.misc import imshow
imshow(data)

How to show images stored in numpy array with example (works in Jupyter notebook)
I know there are simpler answers but this one will give you understanding of how images are actually drawn from a numpy array.
Load example
from sklearn.datasets import load_digits
digits = load_digits()
digits.images.shape #this will give you (1797, 8, 8). 1797 images, each 8 x 8 in size
Display array of one image
digits.images[0]
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])
Create empty 10 x 10 subplots for visualizing 100 images
import matplotlib.pyplot as plt
fig, axes = plt.subplots(10,10, figsize=(8,8))
Plotting 100 images
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i])
What does axes.flat do?
It returns a flat iterator (numpy.flatiter) over the 2D array of axes, so you can loop over every subplot in order and draw on it.
Example:
import numpy as np
x = np.arange(6).reshape(2,3)
x.flat
for item in x.flat:
    print(item, end=' ')

import numpy as np
from keras.preprocessing.image import array_to_img
img = np.zeros([525,525,3], np.uint8)
b=array_to_img(img)
b

Using pillow's fromarray, for example:
from PIL import Image
import numpy as np
im = np.array(Image.open('image.jpg'))
Image.fromarray(im).show()

Using pygame, you can open a window, get the surface as an array of pixels, and manipulate as you want from there. You'll need to copy your numpy array into the surface array, however, which will be much slower than doing actual graphics operations on the pygame surfaces themselves.

The Python Imaging Library can display images using Numpy arrays. Take a look at this page for sample code:
Convert Between Numerical Arrays and PIL Image Objects
EDIT: As the note on the bottom of that page says, you should check the latest release notes which make this much simpler:
http://effbot.org/zone/pil-changes-116.htm

A supplement for doing this with matplotlib, which I found handy for computer vision tasks. Let's say you have data with dtype = int32:
from matplotlib import pyplot as plot
import numpy as np
fig = plot.figure()
ax = fig.add_subplot(1, 1, 1)
# make sure your data is in H W C, otherwise you can change it by
# data = data.transpose((_, _, _))
data = np.zeros((512,512,3), dtype=np.int32)
data[256,256] = [255,0,0]
ax.imshow(data.astype(np.uint8))
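One thing to watch with the cast in the last line: astype(np.uint8) wraps values outside 0..255 instead of saturating them. If the int32 data can leave that range, clipping first is a common precaution. A minimal sketch with made-up sample values:

```python
import numpy as np

# Hypothetical int32 pixel values, some of them outside the displayable range.
data = np.array([[-5, 0, 128], [255, 300, 512]], dtype=np.int32)

wrapped = data.astype(np.uint8)                   # values wrap modulo 256
clipped = np.clip(data, 0, 255).astype(np.uint8)  # values saturate at 0 and 255

print(wrapped)
print(clipped)
```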

For example, suppose your image is in an array named 'image'. All you need is:
import matplotlib.pyplot as plt
plt.imshow(image)
plt.show()
This will display the array as an image.

This is one possible solution:
from skimage import io
import numpy as np
data=np.random.randn(5,2)
io.imshow(data)
