How to match elements of two series by np.array? - python

I have two series, train_1 and train_2:
import numpy as np
mean = 0
std = 1
num_samples = 4
train_1 = np.random.normal(mean, std, size=num_samples)
train_2 = np.random.normal(mean, std, size=num_samples)
When I run this command:
X = np.array([train_1, train_2], dtype=float)
I get this output:
array([[ 0.82561222,  0.95885746,  0.40454621,  1.37793967],
       [ 0.93473674, -1.51716492, -0.56732792,  1.03333013]])
But I would like these two series paired element by element, like this:
Y = np.array(([3,5], [5,1], [10,2], [6,1.5]), dtype=float)
Y
array([[  3. ,   5. ],
       [  5. ,   1. ],
       [ 10. ,   2. ],
       [  6. ,   1.5]])

I might be misunderstanding your question, but is this not simply the transpose?
X = np.array([train_1, train_2], dtype=float).T
Note the .T at the end. In this case X will have two columns: the first will be train_1 and the second will be train_2.
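Equivalently, np.column_stack builds the paired array in one call (a minimal sketch using the train_1 and train_2 arrays above):
import numpy as np

# each row i pairs train_1[i] with train_2[i]
X = np.column_stack((train_1, train_2))
print(X.shape)  # (4, 2)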


Parsing an irregular .dat file in Python

I have a .dat file of coordinates (x,y and z), separated by a marker (an integer). Here's a snippet of it:
500
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
0.06223 0.06222 0
0.04705 0.05386 0
0.03388 0.04528 0
0.02281 0.03663 0
0.01391 0.02808 0
42
0.00733 0.01969 0
0.00297 0.01152 0
0.01809 -0.01422 0
0.03068 -0.01687 0
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
42
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
What's the best way to separate it into chunks (preferably one array per interval between markers)?
This is just a fraction of the data; in reality there are a few thousand points.
I would suggest applying the pandas and numpy libraries.
We start by loading the input file into a dataframe, skipping the first row (skiprows=1) and explicitly naming the columns (names=['x','y','z']); this way each marker line is parsed as a one-value row padded with NaN (like 42.00000 NaN NaN):
import pandas as pd
import numpy as np
coords = pd.read_table('test.dat', delim_whitespace=True, header=None,
                       engine='python', skiprows=1, names=['x','y','z'])
Then we find the positions of the marker lines, at which the coords dataframe will be split into chunks:
na_markers = coords.loc[coords['y'].isna()].index
Finally, we split and extract the needed numpy arrays:
coords = [chunk.dropna().to_numpy() for chunk in np.split(coords, na_markers)]
That's it, now coords contains a list of the needed coordinates "chunks":
[array([[0.14166, 0.09077, 0.     ],
        [0.11918, 0.08461, 0.     ],
        [0.09838, 0.07771, 0.     ],
        [0.07937, 0.07022, 0.     ],
        [0.06223, 0.06222, 0.     ],
        [0.04705, 0.05386, 0.     ],
        [0.03388, 0.04528, 0.     ],
        [0.02281, 0.03663, 0.     ],
        [0.01391, 0.02808, 0.     ]]),
 array([[ 0.00733,  0.01969,  0.     ],
        [ 0.00297,  0.01152,  0.     ],
        [ 0.01809, -0.01422,  0.     ],
        [ 0.03068, -0.01687,  0.     ],
        [ 0.14166,  0.09077,  0.     ],
        [ 0.11918,  0.08461,  0.     ],
        [ 0.09838,  0.07771,  0.     ],
        [ 0.07937,  0.07022,  0.     ]]),
 array([[0.14166, 0.09077, 0.     ],
        [0.11918, 0.08461, 0.     ],
        [0.09838, 0.07771, 0.     ],
        [0.07937, 0.07022, 0.     ]])]
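If you would rather avoid pandas, a plain-Python sketch gives the same list of arrays, assuming (as in the snippet above) that marker lines hold a single integer and data lines hold three fields:
import numpy as np

chunks, current = [], []
with open('test.dat') as f:
    next(f)                    # skip the leading marker line
    for line in f:
        fields = line.split()
        if len(fields) == 1:   # a marker line closes the current chunk
            chunks.append(np.array(current, dtype=float))
            current = []
        else:
            current.append([float(v) for v in fields])
chunks.append(np.array(current, dtype=float))  # the last chunk has no trailing marker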

Convert cumsum() output to binary array in xarray

I have a 3-D xarray DataArray over which I compute cumulative sums for specific time periods, and I'd like to detect which time periods meet a certain condition (set to 1) and which do not (set to zero). I'll explain using the code below:
import pandas as pd
import xarray as xr
import numpy as np
# Create demo x-array
data = np.random.rand(20, 5, 5)
times = pd.date_range('2000-01-01', periods=20)
lats = np.arange(10, 0, -2)
lons = np.arange(0, 10, 2)
data = xr.DataArray(data, coords=[times, lats, lons], dims=['time', 'lat', 'lon'])
data.values[6:12] = 0 # Ensure some values are set to zero so that the cumsum can reset between valid time steps
data.values[18:] = 0
# This creates a DataArray in which the cumsum is calculated but resets each time a zero value is found
cumsum = data.cumsum(dim='time')
cumulative = cumsum - cumsum.where(data.values == 0).ffill(dim='time').fillna(0)
print(cumulative[:,0,0])
>>> <xarray.DataArray (time: 20)>
array([0.13395 , 0.961934, 1.025337, 1.252985, 1.358501, 1.425393, 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.366988, 0.896463,
       1.728956, 2.000537, 2.316263, 2.922798, 0.      , 0.      ])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-20
    lat      int64 10
    lon      int64 0
The print statement shows that the cumulative sum resets each time a zero is encountered along the time dimension. I need a way to identify which of the two periods exceeds a value of 2, and to convert the result to a binary array confirming where the condition is met.
So my expected output would be (for this specific example):
<xarray.DataArray (time: 20)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.,
       0., 0.])
Solved this using some masking and the backfill functionality:
# make something to put the results in
out = xr.full_like(cumulative, fill_value=0.0)
# mark the points that have exceeded the threshold of 2
out.values[cumulative.values > 2] = 1
# set the other nonzero stretches to NaN so the backfill can decide them
out.values[(cumulative.values > 0) & (cumulative.values <= 2)] = np.nan
# backfill: periods that never exceed 2 are filled with 0,
# and periods that do are filled with 1
out_ds = out.bfill(dim='time').fillna(1)
print('Cumulative array:')
print(cumulative.values[:,0,0])
print(' ')
print('Binary array:')
print(out_ds.values[:,0,0])
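For intuition, here is the same mask-and-backfill idea on a plain 1-D pandas Series (a toy sketch with made-up values, not the xarray solution above):
import numpy as np
import pandas as pd

cum = pd.Series([0.13, 0.96, 1.43, 0.0, 0.37, 2.32, 2.92, 0.0])
out = pd.Series(0.0, index=cum.index)
out[cum > 2] = 1.0                    # steps already over the threshold
out[(cum > 0) & (cum <= 2)] = np.nan  # undecided steps, left for the backfill
out = out.bfill().fillna(1.0)         # each period inherits its final verdict
print(out.tolist())                   # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0]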

I want to take data from a CSV file and put it into two different array variables

I have this code in Python for importing a CSV file:
import pandas as pd

Location = r'C:\assign\lrdataset.csv'
df = pd.read_csv(Location, names=['Xi','Yi'])
print(df)
Print shows this:
         Xi          Yi
0 -2.552990 -218.408328
1  1.764052  155.118872
2 -1.791328 -128.884326
3 -1.214077  -91.571734
4 -1.444940 -122.267726
5  0.195070   12.248124
6  1.480515  135.444007
........
But I want to put these input and output values into two different variables, each in array form, like this:
X=np.array([[-2.552990],[1.764052],[-1.791328],[-1.214077]])
Y=np.array([[-218.408328],[155.118872],[-128.884326],[-91.571734]])
Is this what you want?
In [43]: X = df.Xi.to_numpy().reshape((len(df), 1))
In [44]: X
Out[44]:
array([[-2.55299 ],
       [ 1.764052],
       [-1.791328],
       [-1.214077],
       [-1.44494 ],
       [ 0.19507 ],
       [ 1.480515]])
If you want to round your values:
In [62]: df.Xi.round(5).to_numpy().reshape(len(df), 1)
Out[62]:
array([[-2.55299],
       [ 1.76405],
       [-1.79133],
       [-1.21408],
       [-1.44494],
       [ 0.19507],
       [ 1.48052]])
You can also use reshape together with df.shape:
print(df.shape)
(7, 2)
X = df.Xi.to_numpy().reshape((df.shape[0], 1))
print(X)
[[-2.55299 ]
 [ 1.764052]
 [-1.791328]
 [-1.214077]
 [-1.44494 ]
 [ 0.19507 ]
 [ 1.480515]]
Y = df.Yi.to_numpy().reshape((df.shape[0], 1))
print(Y)
[[-218.408328]
 [ 155.118872]
 [-128.884326]
 [ -91.571734]
 [-122.267726]
 [  12.248124]
 [ 135.444007]]
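As a side note, selecting with a list of column labels yields the 2-D shape directly, so no reshape is needed (a small sketch, assuming the df from the question):
# double brackets keep a 2-D (n, 1) result per variable
X = df[['Xi']].to_numpy()
Y = df[['Yi']].to_numpy()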

Fastest way to compute upper-triangular matrix of geometric series (Python)

Using Python (mostly numpy), I am trying to compute an upper-triangular matrix where each row holds a geometric series that starts with a 1 on the diagonal, all rows using the same parameter.
For example, if my parameter is B (where abs(B) <= 1, i.e. B in [-1,1]), then row 1 would be [1 B B^2 B^3 ... B^(N-1)], row 2 would be [0 1 B B^2 ... B^(N-2)], ..., and row N would be [0 0 0 ... 1].
This computation is key to a Bayesian Metropolis-Gibbs sampler, and so needs to be done thousands of times for new values of "B".
I have tried this in two ways so far:
Method 1 - Mostly Vectorized:
B_Matrix = np.triu(np.dot(np.reshape(B**(-1*np.array(range(N))),(N,1)),np.reshape(B**(np.array(range(N))),(1,N))))
Essentially, this is the upper-triangular part of the outer product of an Nx1 column and a 1xN row:
upper triangle ([1 B^(-1) B^(-2) ... B^(-(N-1))]' * [1 B B^2 B^3 ... B^(N-1)])
This works well for small N (algebraically it is correct), but for large N it overflows, and it fails outright for B=0 (which should be allowed). I believe this stems from B^(-N) ~ inf for small B and large N.
Method 2:
B_Matrix = np.zeros((N,N))
B_Row_1 = B**(np.array(range(N)))
for n in range(N):
    B_Matrix[n,n:] = B_Row_1[0:N-n]
So that just fills in the matrix row by row, but uses a loop which slows things down.
I was wondering if anyone had run into this before, or had any better ideas on how to compute this matrix in a faster way.
I've never posted on Stack Overflow before and didn't see this question anywhere, so I thought I'd ask.
Let me know if there's a better place to ask this, or if I should provide any more detail.
You could use scipy.linalg.toeplitz:
In [11]: from scipy.linalg import toeplitz

In [12]: n = 5

In [13]: b = 0.5

In [14]: toeplitz(b**np.arange(n), np.zeros(n)).T
Out[14]:
array([[ 1.    ,  0.5   ,  0.25  ,  0.125 ,  0.0625],
       [ 0.    ,  1.    ,  0.5   ,  0.25  ,  0.125 ],
       [ 0.    ,  0.    ,  1.    ,  0.5   ,  0.25  ],
       [ 0.    ,  0.    ,  0.    ,  1.    ,  0.5   ],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  1.    ]])
If your use of the array is strictly "read only", you can play tricks with numpy strides to quickly create an array that uses only 2*n-1 elements (instead of n^2):
In [55]: from numpy.lib.stride_tricks import as_strided
In [56]: def make_array(b, n):
   ....:     vals = np.zeros(2*n - 1)
   ....:     vals[n-1:] = b**np.arange(n)
   ....:     a = as_strided(vals[n-1:], shape=(n, n), strides=(-vals.strides[0], vals.strides[0]))
   ....:     return a
   ....:
In [57]: make_array(0.5, 4)
Out[57]:
array([[ 1.   ,  0.5  ,  0.25 ,  0.125],
       [ 0.   ,  1.   ,  0.5  ,  0.25 ],
       [ 0.   ,  0.   ,  1.   ,  0.5  ],
       [ 0.   ,  0.   ,  0.   ,  1.   ]])
If you will modify the array in-place, make a copy of the result returned by make_array(b, n). That is, arr = make_array(b, n).copy().
The function make_array2 incorporates the suggestion @Jaime made in the comments:
In [30]: def make_array2(b, n):
   ....:     vals = np.zeros(2*n-1)
   ....:     vals[n-1] = 1
   ....:     vals[n:] = b
   ....:     np.cumprod(vals[n:], out=vals[n:])
   ....:     a = as_strided(vals[n-1:], shape=(n, n), strides=(-vals.strides[0], vals.strides[0]))
   ....:     return a
   ....:
....:
In [31]: make_array2(0.5, 4)
Out[31]:
array([[ 1.   ,  0.5  ,  0.25 ,  0.125],
       [ 0.   ,  1.   ,  0.5  ,  0.25 ],
       [ 0.   ,  0.   ,  1.   ,  0.5  ],
       [ 0.   ,  0.   ,  0.   ,  1.   ]])
make_array2 is more than twice as fast as make_array:
In [35]: %timeit make_array(0.99, 600)
10000 loops, best of 3: 23.4 µs per loop
In [36]: %timeit make_array2(0.99, 600)
100000 loops, best of 3: 10.7 µs per loop
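Another option, staying in plain numpy, is to build the exponent matrix by broadcasting and clip the negative exponents before exponentiating; this sidesteps the B**(-N) overflow from Method 1 and also handles B = 0 (a sketch, not benchmarked against the versions above):
import numpy as np

def geom_triu(b, n):
    # the exponent of b at position (i, j) is j - i on the upper triangle
    exp = np.arange(n) - np.arange(n)[:, None]
    # clip negative exponents to 0, exponentiate, then zero out the lower triangle
    return np.triu(b ** np.clip(exp, 0, None))

print(geom_triu(0.5, 4))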

How can I vertically concatenate 1x2 arrays in numpy?

How can I concatenate 1x2 arrays produced in another function?
I have a for loop that produces xa as output, which is a float64 array of shape (1L, 2L):
xa = [[ 1.17281823 1.210732 ]]
The code I tried for concatenate is
A = []
for i in range(5):
    # actually xa = UglyCalculation(**Inputs[i])
    xa = np.array([[ i, i+1 ]])  # for our example here
    # do something
All I want to do is concatenate/join/append these xa values vertically.
Basically the desired output should be
0 1
1 2
2 3
3 4
4 5
I have tried the following lines of code, but none of them works. Can you help?
A = np.concatenate((A, xa)) # Concatenate the matrices
A = np.concatenate((A,xa),axis=0)
A=np.vstack((A,xa))
Setting a shape on A allows concatenation with arrays that have a matching number of columns:
import numpy as np
A = np.array([])
A.shape = (0, 2)  # set A to be a 0-row x 2-column matrix
for i in range(5):
    xa = np.array([[i, i+1]])
    A = np.concatenate((A, xa))
print(A)
output
[[ 0.  1.]
 [ 1.  2.]
 [ 2.  3.]
 [ 3.  4.]
 [ 4.  5.]]
Without A.shape = (0, 2), an exception will be thrown:
Traceback (most recent call last):
  File "./arr.py", line 5, in <module>
    A = np.concatenate( (A,xa) )
ValueError: all the input arrays must have same number of dimensions
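An alternative that sidesteps the shape issue entirely is to collect the rows in a Python list and stack them once at the end, which also avoids repeatedly reallocating A inside the loop (a sketch using the same stand-in for UglyCalculation):
import numpy as np

rows = []
for i in range(5):
    xa = np.array([[i, i + 1]])  # stand-in for UglyCalculation(**Inputs[i])
    rows.append(xa)

A = np.vstack(rows)  # a single concatenation of all the 1x2 rows
print(A)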
