vtk/python VTKFloatArray setters for tuple and component don't commute - python

Problem
The following code snippet configures a vtkFloatArray in python for later filling with the vertex coordinates of the unit cube: 8 tuples of 3 components (x, y, z) are set. It seems that the setters do not commute. Is this the expected behaviour? The number of components apparently has to be set first. Could anyone reproduce this? Thanks for your answer.
import vtk
import numpy as np
from itertools import product as itprod

vertices = np.array(list(itprod([0, 1], repeat=3)))
print(vertices.shape[0])  # 8 vertices
print(vertices.shape[1])  # 3 coordinates x-y-z

array = vtk.vtkFloatArray()
array.SetNumberOfComponents(vertices.shape[1])
array.SetNumberOfTuples(vertices.shape[0])
print(array)  # number of tuples is 8, number of components is 3 -- OK

array = vtk.vtkFloatArray()
array.SetNumberOfTuples(vertices.shape[0])
array.SetNumberOfComponents(vertices.shape[1])
print(array)  # number of tuples is 2, number of components is 3 -- WRONG

VTK is always a fickle thing, especially when it comes to documentation. I found some information on SetNumberOfTuples and SetNumberOfComponents at the corresponding links.
The former (SetNumberOfTuples):
Set the number of tuples (a component group) in the array.
Note that this may allocate space depending on the number of components. [...]
The latter (SetNumberOfComponents):
Set/Get the dimension (n) of the components.
Must be >= 1. Make sure that this is set before allocation.
As I see it, the former may allocate space, but the latter has to be called before any allocation. So they indeed do not commute: you need to call SetNumberOfComponents first, and that is the working order (which is in line with your results). This also explains the 2 you see: calling SetNumberOfTuples(8) with the default of 1 component allocates 8 values, and reinterpreting that buffer as 3-component tuples leaves room for only 8 // 3 = 2 complete tuples.
The links admittedly don't correspond to the python implementation, but given that the C++ version is not supposed to commute, you should not expect commutativity in python either.
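For completeness, here is a minimal sketch of the working order, filling the array with the cube vertices from the question (SetTuple3 is a standard vtkDataArray setter):
array = vtk.vtkFloatArray()
array.SetNumberOfComponents(3)   # must be set before any allocation
array.SetNumberOfTuples(8)       # allocates 8 tuples of 3 components
for i, (x, y, z) in enumerate(vertices):
    array.SetTuple3(i, x, y, z)  # fill with the vertex coordinates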

Related

Python: Computation in for loop doesn't match result of manual computation

I'm currently working on a project researching properties of some gas mixtures. Testing my code with different inputs, I came upon a bug(?) which I cannot explain. It concerns a computation on a numpy array in a for loop: the loop yields a different (and wrong) result than the manual construction of the result using the exact same code snippets, just indexing by hand. I have no clue why it is happening, or whether it is my own mistake or a bug within numpy.
It's super weird that certain instances of the desired input objects run through the whole for loop without any problem, while others run perfectly up to a certain index, and others fail on the very first iteration.
For instance, one input always stopped at index 16, throwing a:
ValueError: could not broadcast input array from shape (25,) into shape (32,)
Upon further investigation I could confirm that the previous 15 iterations gave the correct results; the results at index 16 were wrong and not even of the correct size. When running iteration 16 manually through the console, no errors occurred.
(Screenshots omitted: inside the loop, index 16 produces the wrong, shorter array; running the same code for index 16 manually in the console gives the results one would expect.)
The important part of the code is really only the np.multiply() in the for loop - I left the rest of it for context, but I'm pretty sure it shouldn't interfere with my intentions.
import copy
import numpy as np
import cantera as ct

def thermic_dissociation(input_gas, pressure):
    # Copy of the input_gas object, which may not be altered out of scope
    gas = copy.copy(input_gas)
    # Temperature range
    T = np.logspace(2.473, 4.4, 1000)
    # Matrix containing the data over the whole range of interest
    moles = np.zeros((gas.gas_cantera.n_species, len(T)))
    # Array containing the other property of interest
    sum_particles = np.zeros(len(T))
    # The troublesome for-loop:
    for index in range(len(T)):
        print(str(index) + ' start')
        # Set temperature and pressure of the gas
        gas.gas_cantera.TP = T[index], pressure
        # Set gas mixture to a state of chemical equilibrium
        gas.gas_cantera.equilibrate('TP')
        # Sum of particles = molar density * Avogadro constant for every temperature
        sum_particles[index] = gas.gas_cantera.density_mole * ct.avogadro
        # This multiplication is doing the weird stuff; printed to see what is
        # computed before it goes into the result matrix and throws the error
        print(np.multiply(list(gas.gas_cantera.mole_fraction_dict().values()), sum_particles[index]))
        # This is where the error is thrown, as the resulting array is smaller than it should be
        moles[:, index] = np.multiply(list(gas.gas_cantera.mole_fraction_dict().values()), sum_particles[index])
        print(str(index) + ' end')
    # An array helping to handle the results
    molecule_order = list(gas.gas_cantera.mole_fraction_dict().keys())
    return [moles, sum_particles, T, molecule_order]
Help would be very much appreciated!
If you want the array of all species mole fractions, you should use the X property of the cantera.Solution object, which always returns the full array directly. See the documentation for that property: cantera.Solution.X.
The mole_fraction_dict method is specifically meant for cases where you want to refer to the species by name, rather than their order in the Solution object, such as when relating two different Solution objects that define different sets of species.
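As a sketch, reusing the names from the question, the assignment inside the loop could then become:
# Solution.X holds the mole fractions of all species, in the species
# order of the Solution object, so no dictionary round-trip is needed
moles[:, index] = gas.gas_cantera.X * sum_particles[index]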
This particular issue is not related to numpy. The call to mole_fraction_dict returns a standard python dictionary. The number of elements in the dictionary depends on the optional threshold argument, which has a default value of 0.0.
The source code of Cantera can be inspected to see what happens exactly: see mole_fraction_dict and getMoleFractionsByName.
In other words, a value ends up in the dictionary only if x > threshold. Maybe it would make more sense if >= were used here instead of >, and maybe that would have prevented the unexpected outcome in your case.
As confirmed in the comments, you can use mole_fraction_dict(threshold=-np.inf) to get all of the desired values in the dictionary; -float('inf') can also be used.
In your code you then call .values() on the dictionary, but this would be problematic if the order of the values were not guaranteed (I'm not sure whether it is). It might be better to make the order explicit by retrieving values out of the dict using their keys, as in the sketch below.
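Here is a sketch combining both suggestions, reusing the names from the question (species_names is the canonical species ordering of the Solution object):
import numpy as np
# include every species via the threshold argument, then make the
# ordering explicit by looking each value up by species name
fractions = gas.gas_cantera.mole_fraction_dict(threshold=-np.inf)
species = gas.gas_cantera.species_names
moles[:, index] = [fractions[name] * sum_particles[index] for name in species]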

(scipy.stats.qmc) How to do multiple randomized Quasi Monte Carlo

I want to generate many randomized realizations of a low-discrepancy sequence with scipy.stats.qmc. I only know this way, which directly provides a randomized sequence:
from scipy.stats import qmc
ld = qmc.Sobol(d=2, scramble=True)
r = ld.random_base2(m=10)
But if I run
r = ld.random_base2(m=10)
twice I get
The balance properties of Sobol' points require n to be a power of 2. 2048 points have been previously generated, then: n=2048+2**10=3072. If you still want to do this, the function 'Sobol.random()' can be used.
It seems like using Sobol.random() is discouraged by the docs.
What I would like (and it should be faster) is to first get
ld = qmc.Sobol(d=2, scramble=False)
and then generate, say, 1000 scramblings (or other randomizations) from this initial sequence.
That avoids having to regenerate the Sobol sequence for each sample; only the scrambling is redone.
How can I do that?
It seems to me like this is the proper way to do many randomized QMC runs, but I might be wrong and there might be other ways.
As the warning suggests, Sobol' is a sequence, meaning that there is a link with the previous samples. You have to respect the 2^m properties. It's perfectly fine to use Sobol.random() if you understand how to use it; this is why we created Sobol.random_base2(), which prints a warning if you try to do something that would break the properties of the sequence. Remember that with Sobol' you cannot skip 10 points and then sample 5, or do arbitrary things like that. If you do, you will not get the convergence rate guaranteed by Sobol'.
In your case, what you want to do is reset the sequence between the draws (Sobol.reset). A new draw will be different from the previous one if scramble=True. Another way (using a non-scrambled sequence, for instance) is to sample 2^k, skip the first 2^(k-1) points, and then sample 2^n with n < k-1.
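For illustration, a minimal sketch of one way (not necessarily the only one) to obtain many independent randomizations: construct a fresh scrambled engine per realization. The seed values here are arbitrary illustrative choices.
from scipy.stats import qmc
# each scrambled engine, seeded differently, yields an independent
# randomization of the same underlying Sobol' sequence
realizations = [
    qmc.Sobol(d=2, scramble=True, seed=seed).random_base2(m=10)
    for seed in range(1000)
]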

Multiplying a numpy array within Psychopy

TL;DR: Can I multiply a numpy.average by 2? If yes, how?
For an orientation discrimination experiment, during which people report how well they can discriminate the angle between a visible grating and a non-visible reference grating, I want to calculate the Just Noticeable Difference (JND).
At the end of the code I have this:
# write JND to logfile (average of last 10 reversals)
if len(staircase[stairnum].reversalIntensities) < 10:
    dataFile.write('JND = %.3f\n' % numpy.average(staircase[stairnum].reversalIntensities))
else:
    dataFile.write('JND = %.3f\n' % numpy.average(staircase[stairnum].reversalIntensities[-10:]))
This is where the JND is written to the file, and I thought it would be easy to multiply that numpy.average line by 2, which doesn't work. I then thought of making two different variables containing the same array and using numpy.sum to add them together.
# Possible solution
x = numpy.average(staircase[stairnum].reversalIntensities[-10:])
y = numpy.average(staircase[stairnum].reversalIntensities[-10:])
numpy.sum(x, y, [et cetera])
I am sure the procedure is very simple, but my current programming capabilities are limited, and the psychopy and python reference materials did not provide what I was looking for (if there is such material, please share!).
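For what it's worth, numpy.average of a 1-D sequence returns a plain scalar, so it can be multiplied directly; a minimal sketch reusing the names from the snippet above:
# double the average before formatting it into the logfile line
jnd = 2 * numpy.average(staircase[stairnum].reversalIntensities[-10:])
dataFile.write('JND = %.3f\n' % jnd)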

Is there a good way to keep track of large numbers of symbols in scipy?

I need to do symbolic manipulations on very large systems of equations and end up with well over 200 variables that I need to do computations with. The problem is that one would usually name their variables x, y, possibly z when solving a small system of equations. Even starting at a, b, ... you only get 26 unique variables this way.
Is there a nice way of fixing this problem? Say for instance I wanted to fill up a 14x14 matrix with a different variable in each spot. How would I go about doing this?
You could use symbolic matrices via MatrixSymbol:
>>> from sympy import MatrixSymbol
>>> A = MatrixSymbol('A', 14, 14)
This can be accessed as you would expect
>>> A[2, 3]
A[2, 3]
I think the most straightforward way to do this is to use sympy.symarray, like so:
x = sympy.symarray("x",(5,5,5))
This creates an accordingly sized (numpy) array - here the size is 5x5x5 - that contains sympy variables. More specifically, these variables are prefixed with whatever you chose (here "x") and have as many indices as you provided dimensions (here 3). Of course you can make as many of these arrays as you need - perhaps it makes sense to use different prefixes for different groups of variables for readability etc.
You can then use these in your code by using e.g. x[i,j,k]:
In [6]: x[0,1,4]
Out[6]: x_0_1_4
(Note that you cannot access the elements via x_i_j_k - I found this a bit counterintuitive when I started using sympy, but once you get the hang of python vs. sympy variables, it makes perfect sense.)
You can of course also use slicing on the array, e.g. x[:,0,0].
If you need a python list of your variables, you can use e.g. x.flatten().tolist().
This is in my opinion preferable to using sympy.MatrixSymbol because (a) you get to decide the number of indices you want, and (b) the elements are "normal" sympy.Symbols, meaning you can be sure you can do anything with them you could also do with them if you declared them as regular symbols.
(I'm not sure this is still the case in sympy 1.1, but in sympy 1.0 it used to be that not all functionality was implemented for MatrixElement.)
I'd recommend the package numpy so you can use NumPy arrays.
# import statement
import numpy as np
# instantiate a NumPy array (matrix) with 14 rows and 14 columns
variableMatrix = np.zeros((14,14))
Note that np.zeros((14,14)) will fill the matrix with 0s, and you can replace each element with your desired variable value later. Notice that the extra pair of parentheses in the function call is necessary!
You can access the i,jth element of the matrix using the syntax variableMatrix[i-1,j-1]. I subtracted one from the index since Python indexing starts at 0 of course.

Can someone explain how arrays and scalars are handled in a Python code snippet

I have this python code snippet that I am trying to understand. I don't understand how scalars operate on arrays in all cases; in most code I read, it makes sense that operations work on each value of an array.
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
var_norm = sqrt(sig_sq_samples/kN)
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
I want to know how each line functions. The reason is that I don't have a linux machine set up with the library and thought someone might be able to help me understand this python code I found in an article. I cannot set up the environment in a reasonable amount of time.
invgamma.rvs() - returns an array of numeric values
beta - is a scalar value
sig_sq_samples (I'm assuming) - is an array of beta * each value of the array that invgamma.rvs() returns
var_norm - I have no idea what this value is supposed to be, because the norm.rvs function underneath takes a scalar (scale=var_norm)
In short, how is sqrt(sig_sq_samples/kN), with kN also a scalar, returning back a scalar? What is happening here? This one line is what is getting me. Like I said earlier, sig_sq_samples is an array - I hope I'm not wrong about the line that produces it. At one point or another the values being worked on must be scalars. I come from C#, where hard types are used, and I have worked with scripting languages such as PERL, so I have a lot of experience with what "shortcut" operations do. C#, for example, does not allow you to multiply a scalar by an array. I tried to look up how scalars work with arrays, but it didn't clarify this code for me. Anyone answering is more than welcome to look up the functions above in case I am wrong about anything. I have put in a lot of effort and have many years of development experience; either this code snippet is wrong, or I'm just not seeing something really obvious.
In the line
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
var_norm has n_samples entries, so what is happening is that the ith sample is generated using the ith scale parameter of var_norm, i.e. var_norm[i].
Internal to the code is
vals = vals * scale + loc
When scale is an array, this uses broadcasting, which is a common feature of numpy. norm.rvs has already generated an array of n_samples random values; when multiplied by scale, an element-wise multiplication between the two arrays is performed, so the left-hand side is also an array. For more information see here.
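To illustrate, a small runnable sketch with made-up parameter values (the values of n_samples, alpha, beta, kN and mean_norm are arbitrary here):
import numpy as np
from scipy.stats import norm, invgamma

n_samples, alpha, beta, kN, mean_norm = 5, 3.0, 2.0, 10.0, 0.0
sig_sq_samples = beta * invgamma.rvs(alpha, size=n_samples)  # array of 5 values
var_norm = np.sqrt(sig_sq_samples / kN)  # element-wise, still an array of 5
# each sample i is drawn with its own scale var_norm[i] via broadcasting
mu_samples = norm.rvs(mean_norm, scale=var_norm, size=n_samples)
print(mu_samples.shape)  # (5,)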
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
if
invgamma.rvs() - returns an array of numeric values
beta - is a scalar value
then
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
produces another array of the same size. Scalar beta just multiplies each element.
In
var_norm = sqrt(sig_sq_samples/kN)
kN is a scalar doing the same thing - dividing each element. I assume sqrt is numpy.sqrt, which takes the square root of each element. So var_norm is again an array of the original size (that of invgamma.rvs()).
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
I don't know what norm.rvs does, or where it is from. It's not numpy, but could be a package in scipy; I'd have to google it. It takes one positional argument, here mean_norm, and (at least) two keyword arguments. n_samples is probably a number, e.g. 100. But scale could certainly take an array, such as var_norm.
======================
http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.rv_continuous.rvs.html#scipy.stats.rv_continuous.rvs
appears to be the documentation for the rvs method (norm is a subclass of rv_continuous).
Arguments are:
arg1, arg2, arg3,... : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
scale : array_like, optional
Scale parameter (default=1).
size : int or tuple of ints, optional
Defining number of random variates (default is 1).
and the result is
rvs : ndarray or scalar
Random variates of given size.
I'm guessing invgamma.rvs is the analogous method for a different subclass. alpha must be the shape argument for the first, and mean_norm the shape for the second.
