I have a question about the difference between using plain max() and np.max() on a list or array.
Is the only difference here the time it takes for Python to return the result?
They may differ in edge cases, such as a list containing NaNs.
import numpy as np
a = max([2, 4, np.nan]) # 4
b = np.max([2, 4, np.nan]) # nan
NumPy propagates NaN in such cases, while the result of Python's max depends on where the NaN appears in the list, since ordering comparisons against NaN are always False.
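For illustration (this example is not from the original post, but follows from how Python's max keeps a running maximum):
import numpy as np
# max() keeps a running maximum, and every ordering comparison with NaN is False,
# so the result depends on where the NaN sits in the list.
max([2, 4, np.nan])   # 4   -- the NaN never "wins" a comparison
max([np.nan, 2, 4])   # nan -- nothing compares greater than the initial NaN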
There are also subtle issues regarding data types:
a = max([10**n for n in range(20)]) # a is an integer
b = np.max([10**n for n in range(20)]) # b is a float
And of course there are the running-time differences documented in numpy.max or max ? Which one is faster?
Generally, one should use max for Python lists and np.max for NumPy arrays to minimize the number of surprises. For instance, my second example is not really about np.max but about the data type conversion: to use np.max, the list is first converted to a NumPy array, but elements like 10**19 are too large to be represented by NumPy's integer types, so they become floats.
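A quick check of why that conversion happens (not from the original answer, just the underlying arithmetic):
import numpy as np
# 10**19 exceeds the largest 64-bit signed integer, so the list cannot be
# stored as a plain int64 array; as described above, it ends up as floats.
print(np.iinfo(np.int64).max)           # 9223372036854775807
print(10**19 > np.iinfo(np.int64).max)  # True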
I'm trying to write code with NumPy that outputs the maximum value between given indexes. I think argmax could be useful, but I don't know how to use slices without a for loop in Python. A pandas function would work too. I want to make the computation as fast as possible.
list_ = np.array([9887.89, 9902.99, 9902.99, 9910.23, 9920.79, 9911.34, 9920.01, 9927.51, 9932.3, 9932.33, 9928.87, 9929.22, 9929.22, 9935.24, 9935.24, 9935.26, 9935.26, 9935.68, 9935.68, 9940.5])
indexes = np.array([0, 5, 10, 19])
Expected result:
Max number between index(0 - 5): 9920.79 at index 5
Max number between index(5 - 10): 9932.33 at index 10
Max number between index(10 - 19): 9940.5 at index 19
You can use reduceat directly on your array without needing to slice/split it:
np.maximum.reduceat(list_, indexes[:-1])
output:
array([9920.79, 9932.33, 9940.5 ])
Assuming that the first (zero) index and the last index are both specified in the indexes array,
import numpy as np
list_ = np.array([9887.89, 9902.99, 9902.99, 9910.23, 9920.79, 9911.34, 9920.01, 9927.51, 9932.3, 9932.33, 9928.87, 9929.22, 9929.22, 9935.24, 9935.24, 9935.26, 9935.26, 9935.68, 9935.68, 9940.5])
indexes = np.array([0, 5, 10, 19])
chunks = np.split(list_, indexes[1:-1])   # split at the interior boundaries
print([c.max() for c in chunks])          # maximum of each chunk
max_ind = [c.argmax() for c in chunks]    # position of the maximum within each chunk
print(max_ind + indexes[:-1])             # shift back to positions in the full array
With an arbitrary specification of indices, the chunks will generally not all have the same size, so the vectorization benefits of NumPy are going to be lost one way or another (you can't have a NumPy array whose elements have different sizes in memory and still keep all the benefits of vectorization).
At least one for loop is going to be necessary, I think. However, you can use split to make the splitting itself a NumPy-optimized operation.
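For the example data above, this prints the chunk maxima [9920.79, 9932.33, 9940.5] and the corresponding positions [ 4  9 19] in the full array (the exact print formatting depends on the NumPy version).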
I have two NumPy int matrices a and b. If I compute diff = a - b, something weird happens: I get huge values that weren't present in either of the two matrices.
The max value of both a and b is 52 and there are no NaN values, yet diff[0][8] (and many other entries, but not all) is 4294967295.
[Screenshots of the results]
Any guess?
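One hedged guess, suggested by the value itself: 4294967295 is 2**32 - 1, which is what you get when a subtraction on an unsigned 32-bit dtype wraps around instead of going negative. A minimal sketch of that behavior (the dtype of the original matrices is an assumption here):
import numpy as np
# If the matrices are uint32, a - b wraps around whenever b > a.
a = np.array([[3, 10]], dtype=np.uint32)
b = np.array([[4, 2]], dtype=np.uint32)
print(a - b)                                    # [[4294967295 8]]  (-1 wrapped around to 2**32 - 1)
print(a.astype(np.int64) - b.astype(np.int64))  # [[-1  8]] after casting to a signed type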
Can anyone explain why the second method of computing the log change yields a NumPy array, discarding the index, instead of a DataFrame? If I wrap it in DataFrame I get one with an integer-based index. The first method works as desired. Thanks for any insight.
import numpy as np
import pandas as pd
import pandas_datareader as pdr
aapl = pdr.get_data_yahoo('AAPL')
close = pd.DataFrame(aapl['Close'])
change = np.log(close['Close'] / close['Close'].shift(1))
another_change = np.diff(np.log(close['Close']))
I can't find documentation to back this up, but it seems that the type returned is being converted to ndarray when there's a reduction in dimension from the Series input. This happens with diff but not with log.
Taking the simple example:
x = pd.Series(range(5))
change = np.log(x / x.shift(1)) # Series of float64 of length 5
another_change = np.diff(np.log(x)) # array of float64 of length 4
We can observe that x / x.shift(1) is still a 5-element Series (even though elements 0 and 1 are NaN and inf). So np.log, which doesn't reduce dimension, will still return a 5-element something, matching the dimensionality of x.
However, np.diff does reduce dimension -- it is supposed to return (according to the doc):
diff : ndarray
The n-th differences. The shape of the output is the same as a except along axis where the dimension is smaller by n. [...]
The next sentence appears in the above doc for numpy 1.13 but not 1.12 and earlier:
[...] The type of the output is the same as that of the input.
So the type of the output is still an array-like structure, but because of the dimension being reduced, perhaps it doesn't get re-converted to a Series (the array-like input). At least in versions 1.12 and earlier.
That's my best guess.
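If keeping the DatetimeIndex is what matters, one possible workaround (not part of the original answer, just a sketch using the pandas API) is to do the differencing on the pandas side instead of with np.diff:
# np.log on a Series returns a Series, and Series.diff keeps the index,
# putting NaN in the first slot instead of dropping it.
another_change = np.log(close['Close']).diff()
another_change = another_change.dropna()   # optionally drop the leading NaN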
I would like to combine an array full of floats with an array full of strings. Is there a way to do this?
(I am also having trouble rounding my floats; insert is changing them to scientific notation. I am unable to reproduce this with a small example.)
A=np.array([[1/3,257/35],[3,4],[5,6]],dtype=float)
B=np.array([7,8,9],dtype=float)
C=np.insert(A,A.shape[1],B,axis=1)
print(np.around(B, decimals=2))
D=np.array(['name1','name2','name3'])
How do I append D onto the end of C in the same way that I appended B onto A (insert D as the last column of C)?
I suspect that there is a type issue with having strings and floats in the same array. It would also answer my question if there were a way to change a float (or a number in scientific notation; my numbers are displayed as '5.02512563e-02') to a string with about 4 digits (.0502).
I believe concatenate will not work, because the array dimensions are (3, 3) and (3,). D is a 1-D array, and D.T is no different from D. Also, when I plug this in I get "ValueError: all the input arrays must have same number of dimensions."
I don't care about accuracy loss due to appending, as this is the last step before I print.
Use dtype=object in your NumPy array, like below:
np.array([1, 'a'], dtype=object)
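Applied to the arrays from the question, that might look like this (a sketch, assuming C and D as defined above):
C_obj = C.astype(object)                                      # an object array can hold floats and strings together
combined = np.concatenate((C_obj, D.reshape(-1, 1)), axis=1)  # (3, 4) object array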
Try making D a numpy array first, then transposing and concatenating with C:
D=np.array([['name1','name2','name3']])
np.concatenate((C, D.T), axis=1)
See the documentation for concatenate for explanation and examples:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html
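On the formatting part of the question: this is not in the linked docs, but one way (a sketch) to avoid scientific notation is to turn the floats into fixed-format strings before concatenating, e.g. with np.char.mod:
C_str = np.char.mod('%.4f', C)                                # elementwise '%.4f' formatting -> string array
combined = np.concatenate((C_str, D.reshape(-1, 1)), axis=1)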
NumPy arrays support only one data type per array, and converting the floats to str is not a great idea, since the string values will only approximate the original numbers.
Try using pandas instead; it supports a different data type in each column.
import numpy as np
import pandas as pd

np_ar1 = np.array([1.3, 1.4, 1.5])
np_ar2 = np.array(['name1', 'name2', 'name3'])
df1 = pd.DataFrame({'ar1': np_ar1})
df2 = pd.DataFrame({'ar2': np_ar2})
pd.concat([df1.ar1, df2.ar2], axis=1)   # float column and string column side by side
I believed this was a simple question and looked for related topics, but I didn't find the right thing. Here is the problem:
I have two NumPy arrays on which I need to do some statistical analysis by calculating a few criteria, for example the correlation coefficient and the Nash criterion (for those who are familiar with Nash). Since the first array holds observation data (the second holds simulation results), it contains some NaNs. I would like my program to calculate the criteria while ignoring the value pairs where the value in the first array is NaN.
I tried the mask method. It worked well when I only needed to deal with the first array (to calculate its average, for example), but it didn't work for comparing the two arrays value by value.
Could anyone give some help? Thanks!
I just answered a similar question, Numpy only on finite entries. You can find the NaN values in your array with NumPy's isnan function and replace them, which is a common way to deal with NaN values.
import numpy as np
replace_NaN = np.isnan(array_name)   # boolean mask marking the NaN positions
array_name[replace_NaN] = 0          # set those entries to 0
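Since the question is about ignoring the pairs where the observation is NaN (rather than replacing them with 0), here is a small sketch of that approach (the array names and values are placeholders, not from the original post):
import numpy as np
obs = np.array([1.0, np.nan, 3.0, 4.0])         # observations, may contain NaN
sim = np.array([1.1, 2.2, 2.9, 4.2])            # simulation results
valid = ~np.isnan(obs)                          # keep only the pairs with a real observation
r = np.corrcoef(obs[valid], sim[valid])[0, 1]   # correlation on the remaining pairs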