Please consider the following Python code:
import matplotlib.pyplot as plt
import numpy as np
#create some data to plot.
dt = 0.001
t = np.arange(0.0,100,dt)
r = np.exp(-t[:1000]/0.05)
x = np.random.randn(len(t))
s = np.convolve(x,r)[:len(x)]*dt
The code runs and I largely understand what it is doing. However, I am confused about what '[:len(x)]' is actually doing. If I change 's' to 'np.convolve(x,r)*dt', the code fails and there is an error message from 'base.py' as follows:
"raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (100000,) and (100999,)"
What is '[:len(x)]' actually doing, and is there something in the language documentation that gives some examples of this sort of usage?
Thanks.
All the objects are of type 'ndarray'.
t is length 100000
t is of shape (100000,)
r is length 1000
r is of shape (1000,)
x is length 100000
x is of shape (100000,)
s is length 100999
s is of shape (100999,)
If we read the docs for np.convolve, we see that with the default parameters, it returns an array that is one shorter than the sum of the lengths of the input arrays. That is, if you call np.convolve(a, b), and len(a) = A and len(b) = B, the output has length A + B - 1.
This is because a convolution can be interpreted as integrating the product of two functions, with one of the functions shifted relative to the other. By default, np.convolve calculates this convolution for all points at which these functions overlap, so the length of the output is approximately the sum of the lengths of the input functions. In your case, x has length 100,000, and r has length 1,000, so the output length is 100,000 + 1,000 - 1 = 100,999.
You can change this behaviour with the mode parameter, so that np.convolve truncates the output automatically, but neither of the alternative options seems to match your use case. For your own interest, you could try supplying mode='same', which ensures the output is the same length as the longest input, and see what happens.
Since t (length 100,000) and s need to be the same length so that you can plot (I assume) s against t, you need to truncate the output s to a length of 100,000 to match.
This is what the notation [:len(x)] does. This is called "slice" notation, and the gist is that A[start:stop] allows you to select the subset of values in A from start (inclusive) to stop (exclusive). If you don't supply a start or end, it defaults to the start or end of the array respectively. So [:len(x)] picks from 0 to len(x) (exclusive) which gives you an array of length len(x). This ensures len(s) = len(x).
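As a small illustration of the lengths involved, here is a sketch with short made-up arrays (the sizes 10 and 4 are arbitrary stand-ins for the 100,000 and 1,000 in the question, and the values are not the question's data):

```python
import numpy as np

x = np.arange(10, dtype=float)           # stand-in for the noise signal
r = np.array([1.0, 0.5, 0.25, 0.125])    # stand-in for the exponential kernel

full = np.convolve(x, r)                 # default mode='full'
s = full[:len(x)]                        # slice: keep indices 0 .. len(x) - 1
same = np.convolve(x, r, mode='same')    # also len(x) long, but centred

print(len(full))   # len(x) + len(r) - 1
print(len(s))      # len(x)
print(len(same))   # max(len(x), len(r))
```

Both s and same have length len(x), but they are in general different windows of the full result: the slice keeps the first len(x) samples, whereas mode='same' keeps the central ones.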
Related
What is [0] and (1) in: r = np.random.randint((1), 24000, 1)[0]
Entire github code:
https://github.com/Pawandeep-prog/facial-emotion-detection-webapp/blob/main/facial-detection.py#L230
Here np is numpy.
(1) is the low argument, i.e. the minimum value of the output; the parentheses around it have no effect.
np.random.randint((1), 24000, 1)
returns an array, so [0] selects the first element of that array.
for more info check https://numpy.org/doc/1.16/reference/generated/numpy.random.randint.html#numpy.random.randint
Deciphering
The numpy.random.randint function returns an array of random integers if called with the optional size parameter (the third parameter). The low bound (first parameter), 1 in the given case, is enclosed in parentheses that serve no purpose, which is part of the confusion.
The following code would generate a one-element array of random integers, in other words, just one random integer, in the range [1,24000) (a value which is greater or equal to 1 and less than 24000):
import numpy as np
np.random.randint(1, 24000, 1)
As we know from basic Python, we access array elements by index in square brackets [], where the first element is indicated by [0]. In the given case, this is also the only element.
Clarification
To get an integer instead of an array, the numpy.random.randint function provides the option to leave out the size parameter. That's why the code in question would be better rewritten as
np.random.randint(1, 24000)
as kindly commented by user2357112 supports Monica.
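The difference between the two call styles can be checked with a short sketch (the variable names are only illustrative):

```python
import numpy as np

as_array = np.random.randint(1, 24000, 1)   # size=1 -> array with one element
first = as_array[0]                          # index 0 picks that element
scalar = np.random.randint(1, 24000)         # no size -> a single integer

print(as_array.shape)      # shape of the one-element array
print(np.ndim(scalar))     # number of dimensions of the scalar result
print(1 <= first < 24000, 1 <= scalar < 24000)
```

In both cases the value lies in the half-open range [1, 24000).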
I am trying to declare a constraint in Pyomo where one parameter would be a list instead of a scalar and then build the constraint by providing a set of the right dimension but it seems that Pyomo does the Cartesian product of every set and deduces the number of inputs (which are then considered as scalars).
Below is a dummy example of what I want to achieve:
model.inputs = RangeSet(0,1)
model.x = model.inputs*model.inputs
model.p = Var()
def constraint_rule(model,x,i):
    return x[i] > model.p
model.constraint = Constraint(model.x,model.inputs,rule=constraint_rule)
To be more precise about what I want to achieve, my constraint is of the form:
f(x_0,x_1,i) > p
And I'd like to input x as a vector instead of inputting x_0 and x_1 separately (or more if I have more x_i). So I want to input a list of lists for the first parameter and an iterator as the second parameter, which could specify which element of the list I want.
I can of course decompose the list x of length n into n scalars x[i], but because I want to keep changing the size of the inputs, I wanted to only change model.x and hope it would automatically scale.
Below is the full mathematical problem (I don't have enough reputation to put an image, sorry about that):
Tr(M_i^{x_i} N_x) > p
Tr(N_x) = 1
M_i^0 + M_i^1 = Id
M_i^0, M_i^1 and N_x SDP
Here the Ms and Ns are 2x2 matrices.
We have 4 matrices N, one for each bitstring of length 2, and 2 matrices M per value of i (since x_i is either 0 or 1).
Also, x specifies the bitstring (here we have only x_0 and x_1) and i specifies which bit (for instance, for i=0 we want the value x_0, i.e. the first bit of x). But i could be larger than the number of bits in x (for instance, we could decide that for i=2 we want the value x_0 xor x_1, i.e. the parity of the bits). So I wanted to encode the first constraint such that it receives a bitstring x and a value i which specifies which information I want about that bitstring.
I hope that's clearer.
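To make the intended lookup concrete, here is a plain-Python sketch with no Pyomo involved. The helper name bit_info and the rule "any i beyond the last bit means parity" are assumptions made for illustration only:

```python
from itertools import product

def bit_info(x, i):
    """Hypothetical helper: the information about bitstring x selected by i."""
    if i < len(x):
        return x[i]          # i = 0, 1, ...: the i-th bit of x
    return sum(x) % 2        # larger i: parity of the bits (x_0 xor x_1 xor ...)

n_bits = 2
# All bitstrings of length n_bits: (0, 0), (0, 1), (1, 0), (1, 1)
bitstrings = list(product((0, 1), repeat=n_bits))
# One entry per (bitstring, i) pair, which is the indexing the constraint needs
table = {(x, i): bit_info(x, i) for x in bitstrings for i in range(n_bits + 1)}

print(table[((1, 0), 0)], table[((1, 0), 2)])
```

Only n_bits has to change to scale the set of bitstrings, which is the behaviour I am hoping to get from the model.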
I'm trying to complete the following function, but I have been running into problems with the indexing, resulting in "ValueError: operands could not be broadcast together with shapes (0,9) (5)".
I think my error might be coming from how I'm trying to call the values from ssd_difference[], but I'm not entirely sure.
Also how would I go about using convolve2d based on the hint given below? I understand numpy has a function for it, but I have no idea what I would need to put in to make it work.
Additional information: binomialFilter5() returns a 5x1 numpy array of dtype float representing a binomial filter. I'm also assuming that the "weights[]" are the ssd_difference[] values.
def transitionDifference(ssd_difference):
    """ Compute the transition costs between frames, taking dynamics into
        account.
    Instructions:
        1. Iterate through the rows and columns of ssd difference, ignoring the
           first two values and the last two values.
            1a. For each value at i, j, multiply the binomial filter of length
                five (implemented later in the code) by the weights starting two
                frames before until two frames after, and take the sum of those
                products.
                i.e. Your weights for frame i are:
                     [weight[i - 2, j - 2],
                      weight[i - 1, j - 1],
                      weight[i, j],
                      weight[i + 1, j + 1],
                      weight[i + 2, j + 2]]
                Multiply that by the binomial filter weights at each i, j to get
                your output.
                It may take a little bit of understanding to get why we are
                computing this, the simple explanation is that to change from
                frame 4 to 5, lets call this ch(4, 5), and we make this weight:
                ch(4, 5) = ch(2, 3) + ch(3, 4) + ch(4, 5) + ch(5, 6) + ch(6, 7)
                This accounts for the weights in previous changes and future
                changes when considering the current frame.
                Of course, we weigh all these sums by the binomial filter, so
                that the weight ch(4, 5) is still the most important one, but
                hopefully that gives you a better understanding.
    Args:
        ssd_difference (numpy.ndarray): A difference matrix as produced by your
                                        ssd function.
    Returns:
        output (numpy.ndarray): A difference matrix that takes preceding and
                                following frames into account. The output
                                difference matrix should have the same dtype as
                                the input, but be 4 rows and columns smaller,
                                corresponding to only the frames that have valid
                                dynamics.
    Hint: There is an efficient way to do this with 2d convolution. Think about
          the coordinates you are using as you consider the preceding and
          following frame pairings.
    """
    output = np.zeros((ssd_difference.shape[0] - 4,
                       ssd_difference.shape[1] - 4), dtype=ssd_difference.dtype)
    # WRITE YOUR CODE HERE.
    for i in range(len(ssd_difference)):
        for j in range(len(ssd_difference)):
            if i == 0:
                if j > 1:
                    output[i,j] = np.sum( ssd_difference[i-2:i+2]*binomialFilter5())
            elif i == ssd_difference.shape[0] - 1:
                if j < ssd_difference.shape[1] - 2:
                    output[i,j] = np.sum( ssd_difference[i-2:i+2]*binomialFilter5())
            else:
                output[i,j] = np.sum( ssd_difference[i-2:i+2]*binomialFilter5())
    # END OF FUNCTION.
    return output
As I commented, you really should tell us the line that produced the error message.
But I can guess, since there are just a couple of lines that do an operation that involves broadcasting. Most likely it is:
output[i,j] = np.sum( ssd_difference[i-2:i+2]*binomialFilter5())
You write that binomialFilter5() produces a (5,1) array, but the error talks about a (5,). It probably doesn't matter here, but you really should keep the number of dimensions straight. Sometimes (5,1) is significantly different from (5,).
output has shape (ssd_difference.shape[0] - 4, ssd_difference.shape[1] - 4). But you are iterating i,j both over range(len(ssd_difference)). output[i,j] will eventually result in an index error. Especially when iterating over a 2d array, it is better to use the correct shape element, rather than len().
But I suspect the immediate error results from ssd_difference[i-2:i+2]. When i==0, this is ssd_difference[-2:2]. This is producing the (0,9) array, since the -2 index means second from the last, which is larger than 2.
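That empty slice is easy to reproduce on a made-up 9x9 array (the array contents here are arbitrary; only the shapes matter):

```python
import numpy as np

a = np.arange(81).reshape(9, 9)   # stand-in for ssd_difference

empty = a[-2:2]    # start resolves to row 7, stop is row 2 -> no rows selected
window = a[0:5]    # a five-row window starting at row 0

print(empty.shape)
print(window.shape)
```

The first slice selects nothing because its start lies after its stop, which is where the (0,9) in the error message comes from.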
I think you are intending to pull 5 rows from this array, to match the 5 values in the other array. A correct iteration would, I think, be:
for i in range(output.shape[0]):
    for j in range(output.shape[1]):
        ....
        output[i,j] = np.sum(ssd_difference[i:i+5, :] * binomialFilter5())
        ...
You should test expressions like that individually in an interactive shell, with selected values of i. ssd_difference[i:i+5, :] should have shape (5,9), and binomialFilter5() should be (5,1).
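For reference, the docstring itself describes weighting the five entries along the diagonal from (i - 2, j - 2) to (i + 2, j + 2), rather than five whole rows. Below is a minimal sketch of that reading. It assumes binomialFilter5() holds the normalised weights [1, 4, 6, 4, 1] / 16, which the question does not show, and the names binomial_filter5 and transition_difference are hypothetical stand-ins:

```python
import numpy as np

def binomial_filter5():
    # Assumed stand-in for the assignment's binomialFilter5(), flattened to 1-D.
    return np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def transition_difference(ssd_difference):
    weights = binomial_filter5()
    rows, cols = ssd_difference.shape
    # Accumulate in float, then cast back to the input dtype at the end.
    acc = np.zeros((rows - 4, cols - 4), dtype=float)
    for k, w in enumerate(weights):
        # Shifted view: entry (i, j) of this view is ssd_difference[i + k, j + k],
        # i.e. the k-th element of the diagonal centred on (i + 2, j + 2).
        acc += w * ssd_difference[k:k + rows - 4, k:k + cols - 4]
    return acc.astype(ssd_difference.dtype)

ssd = np.arange(81, dtype=float).reshape(9, 9)
out = transition_difference(ssd)
print(out.shape)
```

Because the weights sit on a diagonal, this should be equivalent to a 'valid' 2D convolution with np.diag(weights) as the kernel, for example via scipy.signal.convolve2d (NumPy itself has no convolve2d), which is presumably what the hint refers to.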
Why does the following code return a ValueError?
from scipy.optimize import fsolve
import numpy as np
def f(p,a=0):
    x,y = p
    return (np.dot(x,y)-a,np.outer(x,y)-np.ones((3,3)),x+y-np.array([1,2,3]))
x,y = fsolve(f,(np.ones(3),np.ones(3)),9)
ValueError: setting an array element with a sequence.
The basic problem here is that your function f does not satisfy the criteria required for fsolve to work. These criteria are described in the documentation - although arguably not very clearly.
The particular things that you need to be aware of are:
1. The input to the function that will be solved for must be an n-dimensional vector (referred to in the docs as ndarray), such that the value of x you want is the solution to f(x, *args) = 0.
2. The output of f must be the same shape as the x input to f.
Currently, your function takes a 2-member tuple of 1x3 arrays (in p) and a fixed scalar offset (in a). It returns a 3-member tuple of types (scalar, 3x3 array, 1x3 array).
As you can see, neither condition 1 nor 2 is met.
It is hard to advise you on exactly how to fix this without being sure of the equation you are trying to solve. It seems you are trying to solve some particular equation f(x,y,a) = 0 for x and y with x0 = (1,1,1) and y0 = (1,1,1) and a = 9 as a fixed value. You might be able to do this by passing in x and y concatenated (e.g. pass in p0 = (1,1,1,1,1,1) and in the function use x = p[:3] and y = p[3:]), but then you must modify your function to output a 6-dimensional vector as well, one residual per unknown. This depends on the exact function you are solving for, and I can't work this out from the output of your existing f (i.e. a tuple based on a dot product, an outer product and a sum).
Note that arguments that you don't pass in the vector (e.g. a in your case) will be treated as fixed values and won't be varied as part of the optimisation or returned as part of any solution.
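To show the shape requirement in action, here is a minimal sketch with a made-up system. The equations x + y = (1, 2, 3) and x - y = a are purely illustrative and are not the equations from the question; the point is only that six unknowns go in as one flat vector and six residuals come out:

```python
import numpy as np
from scipy.optimize import fsolve

def f(p, a=0.0):
    # Unpack the flat 6-vector into the two 3-vectors being solved for.
    x, y = p[:3], p[3:]
    eq_sum = x + y - np.array([1.0, 2.0, 3.0])   # three residuals
    eq_diff = x - y - a                           # three more residuals
    # Return one flat vector with the same shape as p.
    return np.concatenate([eq_sum, eq_diff])

p0 = np.ones(6)                        # initial guess for x and y, concatenated
solution = fsolve(f, p0, args=(1.0,))  # a is fixed at 1.0 and is not varied
x, y = solution[:3], solution[3:]
print(x, y)
```

The fixed parameter a is passed through args, so it is held constant while the six entries of p are varied.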
Note for those who like the full story...
As the docs say:
fsolve is a wrapper around MINPACK’s hybrd and hybrj algorithms.
If we look at the MINPACK hybrd documentation, the conditions for the input and output vectors are more clearly stated. See the relevant bits below (I've cut some stuff out for clarity - indicated with ... - and added the comment to show that the input and output must be the same shape - indicated with <--)
1 Purpose.
The purpose of HYBRD is to find a zero of a system of N non-
linear functions in N variables by a modification of the Powell
hybrid method. The user must provide a subroutine which calcu-
lates the functions. The Jacobian is then calculated by a for-
ward-difference approximation.
2 Subroutine and type statements.
SUBROUTINE HYBRD(FCN,N,X, ...
...
FCN is the name of the user-supplied subroutine which calculates
the functions. FCN must be declared in an EXTERNAL statement
in the user calling program, and should be written as follows.
SUBROUTINE FCN(N,X,FVEC,IFLAG)
INTEGER N,IFLAG
DOUBLE PRECISION X(N),FVEC(N) <-- input X is an array length N, so is output FVEC
----------
CALCULATE THE FUNCTIONS AT X AND
RETURN THIS VECTOR IN FVEC.
----------
RETURN
END
N is a positive integer input variable set to the number of
functions and variables.
X is an array of length N. On input X must contain an initial
estimate of the solution vector. On output X contains the
final estimate of the solution vector.
I have a gaussian_kde.resample array. I don't know if it is a numpy array so that I can use numpy functions.
I had data x consisting of 3000 values with 0 < x <= 0.5, and I used
kde = scipy.stats.gaussian_kde(x) # can also mention bandwidth here (x,bandwidth)
sample = kde.resample(100000) # returns 100,000 values that follow the prob distribution of "x"
This gave me a sample of data that follows the probability distribution of "x". The problem is that, no matter what bandwidth I select, I get a few negative values in my "sample". I only want values within the range 0 < sample <= 0.5.
I tried to do:
sample = np.array(sample) # to convert this to a numpy array
keep = 0<sample<=0.5
sample = sample[keep] # using the binary conditions
But this does not work! How can I remove the negative values in my array?
Firstly, you can check what type it is by using the 'type' call within python:
x = kde.resample(10000)
type(x)
numpy.ndarray
Secondly, it should be working in the way you wrote, but I would be more explicit in your binary condition:
print x
array([[ 1.42935658, 4.79293343, 4.2725778 , ..., 2.35775067, 1.69647609]])
x.size
10000
y = x[(x>1.5) & (x<4)]
which, as you can see, applies the combined condition correctly and keeps only the values with 1.5 < x < 4:
print y
array([ 2.95451084, 2.62400183, 2.79426449, ..., 2.35775067, 1.69647609])
y.size
5676
I know I'm answering about 3 years late, but this may be useful for future reference.
The catch is that while kde.resample(100000) does return a NumPy array, that array is two-dimensional, with shape (1, 100000): the samples sit in an inner array(!), and that gets in the way of attempts to use indexing to get subsets of the sample. To get the array that the resample() method probably should have returned all along, do this instead:
sample = kde.resample(100000)[0]
The array variable sample should then have all 100000 samples, and indexing this array should work as expected.
Why SciPy does it this way, I don't know. This misfeature doesn't even appear to be documented.
First of all, the return value of kde.resample is a numpy array, so you do not need to reconvert it.
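The problem lies in the line
keep = 0 < sample <= 0.5
It does not do what you would think: Python evaluates the chained comparison as (0 < sample) and (sample <= 0.5), and "and" cannot be applied element-wise to arrays with more than one element. Try: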
keep = (0 < sample) * (sample <= 0.5)
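Putting the pieces from the answers together, a sketch of the whole filtering step might look as follows. The input data are made up with a uniform generator, since the original x is not shown, and the smaller sample size of 1000 is only for brevity:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.uniform(0.01, 0.5, size=3000)    # made-up stand-in for the original data

kde = gaussian_kde(x)
raw = kde.resample(1000)                 # 2-D array, one row per data dimension
sample = raw[0]                          # 1-D array holding the samples

# Element-wise mask: combine the two comparisons with & instead of chaining them.
keep = (sample > 0) & (sample <= 0.5)
kept = sample[keep]

print(raw.shape, sample.shape, kept.size)
```

Note that discarding out-of-range draws leaves fewer than the requested number of samples, so more may need to be drawn if an exact count matters.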