Python, Numpy - Normalize a matrix/array [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
This is most likely a dumb question but being a beginner in Python/Numpy I will ask it anyways. I have come across a lot of posts on how to Normalize an array/matrix in numpy. But I am not sure about the WHY. Why/When does an array/matrix need to be normalized in numpy? When is it used?
"Normalize" can have multiple meanings in different contexts. My question belongs to the field of Data Analytics/Data Science. What does normalization mean in this context? Or, more specifically, in what situations should I normalize an array?
The second part to this question is - What are the different methods of Normalization and can they be used interchangeably in all situations?
The third and final part - can Normalization be used for Arrays of any dimensions?
Links to any reference material (for beginners) will be appreciated.

Consider trying to cluster objects with two numerical attributes A and B. Both are equally important. Attribute A can range from 0 to 1000 and attribute B can range from 0 to 5.
If you did not normalize A and B you would end up with attribute A completely overpowering attribute B when applying any standard distance metric.
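The effect described above can be sketched with a few made-up points. The data, the helper names `min_max_scale` and `dist`, and the choice of min-max scaling are all assumptions for illustration; other normalization schemes would make the same point.

```python
import numpy as np

# Hypothetical objects: column 0 is attribute A (range 0..1000),
# column 1 is attribute B (range 0..5).
X = np.array([
    [100.0, 0.0],
    [110.0, 5.0],   # close to the first object in A, as far as possible in B
    [900.0, 0.0],   # far from the first object in A, identical in B
])

def min_max_scale(data):
    """Rescale every column to [0, 1]; assumes no column is constant."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    return (data - lo) / (hi - lo)

def dist(a, b):
    """Euclidean distance between two 1-D vectors."""
    return np.linalg.norm(a - b)

# Raw distances: the difference in A swamps the difference in B.
raw_01 = dist(X[0], X[1])
raw_02 = dist(X[0], X[2])

# Scaled distances: a full-range difference in B now counts as much
# as a full-range difference in A.
Xs = min_max_scale(X)
scaled_01 = dist(Xs[0], Xs[1])
scaled_02 = dist(Xs[0], Xs[2])

print("raw:   ", raw_01, raw_02)
print("scaled:", scaled_01, scaled_02)
```

Before scaling, the third object looks far more distant from the first than the second one does, even though the second differs by the entire range of B. After scaling, the two distances are comparable.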

Related

How to perform clustering of points when distance between any two points are given? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a set of, let's say, 100 points, and the distance from each point to every other point is given. That means I have a 100x100 dataset giving the distance between every pair of the 100 points. I want to form clusters from this dataset under the condition that the distance between any two points in a cluster is not greater than x (where x could be, for example, 25 km).
I am new to clustering and data science. Please guide me on how to solve this problem. Which libraries can solve it most efficiently? Any help will be appreciated. :)
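This can be solved using sklearn's agglomerative clustering (AgglomerativeClustering) by setting the affinity to "precomputed" and passing the distance matrix to fit. In recent scikit-learn releases this parameter is named metric instead of affinity.
Refer to this link for the solution.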
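Here is a minimal sketch of clustering from a precomputed distance matrix. It uses SciPy's hierarchical clustering rather than the sklearn class mentioned above, so that it does not depend on which name the sklearn parameter has in a given version. The random points, the variable names, and the choice of complete linkage are assumptions. Complete linkage is picked because cutting its tree at a threshold keeps every pairwise distance inside a cluster at or below that threshold.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical data: 100 random 2-D points, used only to build a
# 100x100 distance matrix like the one described in the question.
rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(100, 2))
diff = points[:, None, :] - points[None, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

max_dist = 25.0  # the "x" from the question

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dist_matrix, checks=False)
tree = linkage(condensed, method="complete")

# Cut the tree so that no merge above max_dist is accepted.
labels = fcluster(tree, t=max_dist, criterion="distance")

print("number of clusters:", len(np.unique(labels)))
```

Each entry of `labels` is the cluster id of the corresponding point. This does not minimize the number of clusters; it only guarantees the distance condition.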

Is it possible to use time series without seasonal and cyclical patterns to forecast the future? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am trying to analyse a time series (the blue one) that looks like this.
As you can see, it is not seasonal. I also plotted the log of this series, and it does not look seasonal either.
I wonder what can be done to forecast the future.
log of ts
Your question amounts to: "Can I predict variable Y using variable X (X being the time), assuming Y and X are independent?" The short answer is no, you can't.
Now, your assertion that your data is not cyclical seems like jumping to conclusions, in my opinion. You might have complex cycles and hidden dependent variables that explain part of the variance, leaving you with more cyclical residuals.
You could try using a periodogram (there are many Python packages for it) to find important parameters, for example the dominant frequency of your signal.
For regularly sampled signals, try:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.periodogram.html
For irregularly sampled signals, on the other hand, I'd suggest:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.lombscargle.html
Hope this helps!
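As a rough sketch of both suggestions, the snippet below looks for the dominant frequency of a synthetic signal. The 5 Hz sine, the sampling rate, and the frequency grid are made up for illustration and would be replaced by your own series. Note that scipy.signal.lombscargle expects angular frequencies.

```python
import numpy as np
from scipy.signal import lombscargle, periodogram

rng = np.random.default_rng(0)

# Regularly sampled case: hypothetical 5 Hz sine plus noise at 100 Hz sampling.
fs = 100.0
t = np.arange(0, 10, 1 / fs)
y = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal(t.size)

freqs, power = periodogram(y, fs=fs)
# Skip index 0 (the zero-frequency bin) when looking for the peak.
dominant_regular = freqs[1:][np.argmax(power[1:])]

# Irregularly sampled case: the same sine observed at random times.
t_irr = np.sort(rng.uniform(0, 10, size=300))
y_irr = np.sin(2 * np.pi * 5 * t_irr)

# Candidate frequencies in Hz, converted to angular frequencies.
candidate_hz = np.linspace(0.1, 20, 2000)
pgram = lombscargle(t_irr, y_irr - y_irr.mean(), 2 * np.pi * candidate_hz)
dominant_irregular = candidate_hz[np.argmax(pgram)]

print("regular sampling peak (Hz):  ", dominant_regular)
print("irregular sampling peak (Hz):", dominant_irregular)
```

A clear peak suggests a cycle at that frequency. A flat spectrum would support the view that there is no periodic component.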

Text clustering/NLP [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Imagine there is a column in a dataset representing the university. We need to group the values so that the number of groups is as close as possible to the real number of universities. The problem is that the same university may be named in different ways. An example: University of Stanford = Stanford University = Uni of Stanford. Is there a suitable NLP method/function/solution in Python 3?
Let's consider both cases: data might be tagged as well as untagged.
Thanks in advance.
A very simple unsupervised approach would be to use k-means. The advantage here is that you know exactly how many clusters (k) to expect, since you know the number of universities in advance.
Then you could use a package such as scikit-learn to create your feature vectors (most likely character n-grams, using a CountVectorizer with the option analyzer="char") and run the clustering on them to group together similarly written university names.
There is no guarantee that the groups will match perfectly, but I think that it should work quite well, as long as the different spellings are somewhat similar.
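A minimal sketch of this approach follows. The list of names, the n-gram range, and k = 2 are assumptions made up for illustration. As noted above, the resulting groups are not guaranteed to line up with the real universities, because shared words such as "University" also contribute n-grams.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical column values: two universities, several spellings each.
names = [
    "University of Stanford",
    "Stanford University",
    "Uni of Stanford",
    "University of Harvard",
    "Harvard University",
    "Uni of Harvard",
]
k = 2  # assumed to be known in advance

# Character n-grams (here 2- and 3-grams) as features.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
features = vectorizer.fit_transform(names)

# n_init and random_state are set explicitly for repeatable runs.
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

for name, label in zip(names, labels):
    print(label, name)
```

Inspecting the printed groups on a sample of your own data is a quick way to judge whether the n-gram range needs tuning.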

normalize a matrix along one specific dimension [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I have a matrix of shape [1000,500], and I would like to normalize the matrix along the second dimension. Is the following implementation right?
import numpy as np

def norm(x):
    return (x - np.mean(x)) / (np.std(x) + 1e-7)

for row_id in range(datamatrix.shape[0]):
    datamatrix[row_id, :] = norm(datamatrix[row_id, :])
Your implementation does normalize each row independently. (I'm not sure what you mean by the second dimension, as rows are usually the first dimension of a matrix and NumPy numbers axes from 0.) You don't need the trailing colon: datamatrix[row_id] already selects every column of that row.
Remember to give datamatrix a float dtype such as float32 rather than an integer dtype: assigning the normalized values back into an integer array truncates them instead of converting the array to float.
A cleaner and more efficient implementation is sklearn.preprocessing.scale(datamatrix, axis=1), which standardizes each row in one call. Note that sklearn.preprocessing.normalize does something different: it rescales each row to unit norm.
Also be aware that you are using standard score (z-score) normalization, which is easiest to interpret when the data are roughly normally distributed.
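The row loop can also be written without a loop in plain NumPy. This is a sketch under the assumption that the matrix starts out with an integer dtype; the random data and the names `data` and `normalized` are made up, and float64 is used here only to keep rounding error small.

```python
import numpy as np

# Hypothetical integer-valued matrix of shape (1000, 500).
rng = np.random.default_rng(0)
datamatrix = rng.integers(0, 100, size=(1000, 500))

# Convert to float first so the normalized values are not truncated.
data = datamatrix.astype(np.float64)

# axis=1 computes one mean/std per row; keepdims=True keeps the shape
# (1000, 1) so the result broadcasts against the (1000, 500) matrix.
mean = data.mean(axis=1, keepdims=True)
std = data.std(axis=1, keepdims=True)
normalized = (data - mean) / (std + 1e-7)

print(normalized.shape)
```

Each row of `normalized` then has a mean of approximately 0 and a standard deviation of approximately 1.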

Fitting arbitrary data from simulation [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Hello dear Python experts:)
From a simulation I got data (the course of the energy over time) which I have to fit. When I plot the energy, it shows a non-periodic, oscillating course. There are a number of helper functions, such as curve_fit from scipy, but you always have to specify the function to fit. I don't know a suitable function a priori.
I need something like a Fourier fit to get a function representing the data (like it is possible in MatLab) to later use this function to determine its maxima. Has anyone an idea how to deal with such a problem?
Here is an example course: 2
If you like, you can have a look at the data in a .csv-file: https://1drv.ms/u/s!AuQAmr8-QRJSdzNTzyvWPhUaEnw
I would be very delighted to get some help:-)
Many thanks:-)
Using the Fourier fit in MATLAB you also specify a model (how many sin/cos you want).
For instance "Fourier 2" is:
f(x) = a0 + a1*cos(x*w) + b1*sin(x*w)
          + a2*cos(2*x*w) + b2*sin(2*x*w)
Check http://exnumerus.blogspot.nl/2010/04/how-to-fit-sine-wave-example-in-python.html to see how to fit for "Fourier1".
If you really want no model you need to use something like "eureqa", which is free for academic use (http://www.nutonian.com/download/eureqa-desktop-download/).
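For the "Fourier 2" model written out above, a fit with curve_fit from scipy could look roughly like the sketch below. The synthetic data, the parameter values, and the initial guess p0 are assumptions for illustration. With real data the starting value for w matters a lot and may need to be estimated first, for example from a periodogram. The maxima are then located on a dense grid with scipy.signal.find_peaks, which is one option among several.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks

def fourier2(x, a0, a1, b1, a2, b2, w):
    """Two-term Fourier model, matching the 'Fourier 2' formula."""
    return (a0 + a1 * np.cos(x * w) + b1 * np.sin(x * w)
            + a2 * np.cos(2 * x * w) + b2 * np.sin(2 * x * w))

# Hypothetical data: the model with known parameters plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 400)
true_params = (1.0, 0.5, -0.3, 0.2, 0.1, 2.0)
y = fourier2(x, *true_params) + 0.01 * rng.standard_normal(x.size)

# Initial guess (assumed): the mean for a0, small amplitudes,
# and a frequency close to the expected one.
p0 = (y.mean(), 0.1, 0.1, 0.1, 0.1, 1.95)
params, _ = curve_fit(fourier2, x, y, p0=p0)

# Evaluate the fitted function on a dense grid and locate its local maxima.
x_dense = np.linspace(x.min(), x.max(), 5000)
y_dense = fourier2(x_dense, *params)
peaks, _ = find_peaks(y_dense)

print("fitted parameters:", params)
print("maxima at x =", x_dense[peaks])
```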
