How to create a tensor from sparse data? - python

I have a dataset where each time point is represented by a set of sparse x and y values. For data storage purposes, if y = 0, that data point is not recorded.
Imagine data point t0:
#Real data
#t0
x0 = [200, 201, 202, 203, 204, 205, 206, 207, ...]
y0 = [5, 10, 0, 7, 0, 0, 15, 20, ...]
#Data stored
#t0
x0 = [200, 201, 203, 206, 207, ...]
y0 = [5, 10, 7, 15, 20, ...]
Now, imagine I have data point t1:
#Data stored
#t1
x1 = [201, 204, 206, 207, ...]
y1 = [10, 15, 3, 20, ...]
Is there a simple and efficient way to rebuild the full dataset for a custom number of data points? Let's say I want a data structure that represents all data contained in t0 + t1:
#t0+t1
M = [[200, 201, 203, 204, 206, 207, ...], # this contains all xs recorded for both t0 and t1
[5, 10, 7, 0, 15, 20, ... ], # y values from t0. Missing values are filled with 0
[0, 10, 0, 15, 3, 20, ...] # y values from t1. Missing values are filled with 0
]
Any help would be really appreciated!

It looks like np.searchsorted is what you are looking for:
import numpy as np
m0 = np.unique(x0 + x1)  # assuming x0 and x1 are Python lists; + concatenates them
M = np.zeros((3, len(m0)), dtype=int)
M[0] = m0
M[1, np.searchsorted(m0, x0)] = y0
M[2, np.searchsorted(m0, x1)] = y1
>>> M
array([[200, 201, 203, 204, 206, 207],
[ 5, 10, 7, 0, 15, 20],
[ 0, 10, 0, 15, 3, 20]])
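The same idea extends to any number of time points. Below is a minimal sketch, assuming each time point is a pair of equal-length Python lists of integers with no repeated x inside one time point; `stack_sparse` is a hypothetical helper name, not a NumPy function:

```python
import numpy as np


def stack_sparse(xs, ys):
    """Align sparse (x, y) recordings on the union of their x values.

    xs, ys: sequences of equal-length lists, one (x, y) pair per time point.
    Returns an int array with len(xs) + 1 rows: row 0 is the sorted union of
    all x values, row t + 1 holds the y values of time point t, and 0 marks
    an x that time point did not record.
    """
    grid = np.unique(np.concatenate([np.asarray(x) for x in xs]))
    M = np.zeros((len(xs) + 1, grid.size), dtype=int)
    M[0] = grid
    for row, (x, y) in enumerate(zip(xs, ys), start=1):
        # Every x is in grid, so searchsorted returns its exact column.
        M[row, np.searchsorted(grid, x)] = y
    return M


x0 = [200, 201, 203, 206, 207]
y0 = [5, 10, 7, 15, 20]
x1 = [201, 204, 206, 207]
y1 = [10, 15, 3, 20]
M = stack_sparse([x0, x1], [y0, y1])
print(M)
```

Each extra (x, y) pair passed in adds one row. np.unique already returns the union sorted, which np.searchsorted requires.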

Related

How to generate sequential subsets of integers?

I have the following start and end values:
start = 0
end = 54
I need to generate subsets of 4 sequential integers starting from start until end with a space of 20 between each subset. The result should be this one:
0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51
In this example, we obtained 3 subsets:
0, 1, 2, 3
24, 25, 26, 27
48, 49, 50, 51
How can I do it using numpy or pandas?
If I do r = [i for i in range(0,54,4)], I get [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52].
This should get you what you want:
j = 20
k = 4
result = [split for i in range(0,55, j+k) for split in range(i, k+i)]
print(result)
Output:
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
Maybe something like this:
r = [j for i in range(0, 54, 24) for j in range(i, i + 4)]
print(r)
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
You can use numpy.arange, which returns an ndarray of evenly spaced values within a given range:
import numpy as np
r = np.arange(0, 54, 4)
print(r)
Result
[ 0  4  8 12 16 20 24 28 32 36 40 44 48 52]
Numpy approach
You can use np.arange to generate numbers with a step of 20 + 4, where 20 is the gap between subsets and 4 is the length of each sequential subarray.
import numpy as np
start = 0
end = 54
out = np.arange(start, end, 24)  # array([ 0, 24, 48]): the starting point of each subarray
step = np.tile(np.arange(4), (len(out), 1))
# [[0 1 2 3]
# [0 1 2 3]
# [0 1 2 3]]
res = out[:, None] + step
# array([[ 0, 1, 2, 3],
# [24, 25, 26, 27],
# [48, 49, 50, 51]])
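To get the flat sequence from the question, the 2-D result still needs flattening. A compact sketch of the same idea that broadcasts the offsets instead of calling np.tile; `sequential_subsets` and its `size`/`gap` parameters are illustrative names, not from the original answer:

```python
import numpy as np


def sequential_subsets(start, end, size=4, gap=20):
    # Start of each run: start, start + (size + gap), ... (below end).
    starts = np.arange(start, end, size + gap)
    # Broadcast (n, 1) starts against (size,) offsets -> (n, size) grid,
    # then flatten row by row into one 1-D array.
    return (starts[:, None] + np.arange(size)).ravel()


result = sequential_subsets(0, 54)
print(result.tolist())  # [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
```

Like the other answers, this only keeps subset starts below `end`; a subset that starts just before `end` may run past it.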
This can be done in plain Python:
rangeStart = 0
rangeStop = 54
setLen = 4
step = 20
stepTot = step + setLen
a = list( list(i+s for s in range(setLen)) for i in range(rangeStart,rangeStop,stepTot))
This gives the subsets as sublists: [[0, 1, 2, 3], [24, 25, 26, 27], [48, 49, 50, 51]].
I don't think you need numpy or pandas for this. I did it with a simple while loop:
num = 0
end = 54
sequence = []
while num <= end:
    sequence.append(num)
    num += 1
    if num % 4 == 0:  # four numbers added (works because each subset starts at a multiple of 4)
        num += 20
print(sequence)
# output: [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]

Why is meshgrid changing (x, y, z) order to (y, x, z)?

I have 3 vectors:
u = np.array([0, 100, 200, 300]) #hundreds
v = np.array([0, 10, 20]) #tens
w = np.array([0, 1]) #units
Then I used np.meshgrid to compute u[i] + v[j] + w[k]:
x, y, z = np.meshgrid(u, v, w)
func1 = x + y + z
So, when (i,j,k)=(3,2,1), func1[i, j, k] should return 321, but I only get 321 if I put func1[2, 3, 1].
Why is it asking me for vector v before u? Should I use numpy.ix_ instead?
From the meshgrid docs:
Notes
-----
This function supports both indexing conventions through the indexing
keyword argument. Giving the string 'ij' returns a meshgrid with
matrix indexing, while 'xy' returns a meshgrid with Cartesian indexing.
In the 2-D case with inputs of length M and N, the outputs are of shape
(N, M) for 'xy' indexing and (M, N) for 'ij' indexing. In the 3-D case
with inputs of length M, N and P, outputs are of shape (N, M, P) for
'xy' indexing and (M, N, P) for 'ij' indexing.
In [109]: U,V,W = np.meshgrid(u,v,w, sparse=True)
In [110]: U
Out[110]:
array([[[ 0], # (1,4,1)
[100],
[200],
[300]]])
In [111]: U+V+W
Out[111]:
array([[[ 0, 1],
[100, 101],
[200, 201],
[300, 301]],
[[ 10, 11],
[110, 111],
[210, 211],
[310, 311]],
[[ 20, 21],
[120, 121],
[220, 221],
[320, 321]]])
The result is a (3, 4, 2) array; this is the Cartesian ('xy') case described in the notes.
With the documented indexing change:
In [113]: U,V,W = np.meshgrid(u,v,w, indexing='ij',sparse=True)
In [114]: U.shape
Out[114]: (4, 1, 1)
In [115]: (U+V+W).shape
Out[115]: (4, 3, 2)
This matches the np.ix_ version you asked about:
In [116]: U,V,W = np.ix_(u,v,w)
In [117]: (U+V+W).shape
Out[117]: (4, 3, 2)
You can use either, or even np.ogrid, as mentioned in the docs.
Or even the home-brewed broadcasting:
In [118]: (u[:,None,None]+v[:,None]+w).shape
Out[118]: (4, 3, 2)
Maybe the 2d layout clarifies the two coordinates:
In [119]: Out[111][:,:,0]
Out[119]:
array([[ 0, 100, 200, 300], # u going across, x-axis
[ 10, 110, 210, 310],
[ 20, 120, 220, 320]])
In [120]: (u[:,None,None]+v[:,None]+w)[:,:,0]
Out[120]:
array([[ 0, 10, 20], # u going down - rows
[100, 110, 120],
[200, 210, 220],
[300, 310, 320]])
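Applied to the question directly: with indexing='ij' the full (non-sparse) grids come back in (u, v, w) axis order, so func1[3, 2, 1] selects u[3] + v[2] + w[1]. A short sketch using the question's arrays:

```python
import numpy as np

u = np.array([0, 100, 200, 300])  # hundreds
v = np.array([0, 10, 20])         # tens
w = np.array([0, 1])              # units

x, y, z = np.meshgrid(u, v, w, indexing="ij")
func1 = x + y + z

print(func1.shape)     # (4, 3, 2): axis 0 -> u, axis 1 -> v, axis 2 -> w
print(func1[3, 2, 1])  # 321
```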
For your indexing method, you need axis 0 to index the 100s, axis 1 the 10s, and axis 2 the 1s.
You can build the grid with the inputs in reverse order and then transpose the axes into that order:
u = np.array([0, 100, 200, 300]) #hundreds
v = np.array([0, 10, 20, 30]) #tens
w = np.array([0, 1, 2, 3]) #units
x,y,z = np.meshgrid(w,v,u)
func1 = x + y + z
func1 = func1.transpose(2,0,1)
func1
# axis 2 is 1s
#------------------>
array([[[ 0, 1, 2, 3],
[ 10, 11, 12, 13], #
[ 20, 21, 22, 23], # Axis 1 is 10s
[ 30, 31, 32, 33]],
[[100, 101, 102, 103], #
[110, 111, 112, 113], # Axis 0 is 100s
[120, 121, 122, 123], #
[130, 131, 132, 133]],
[[200, 201, 202, 203],
[210, 211, 212, 213],
[220, 221, 222, 223],
[230, 231, 232, 233]],
[[300, 301, 302, 303],
[310, 311, 312, 313],
[320, 321, 322, 323],
[330, 331, 332, 333]]])
Testing this by indexing:
>>> func1[2,3,1]
231
>>> func1[3,2,1]
321
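Alternatively, if func1 has already been built with the default 'xy' indexing exactly as in the question, swapping only the first two axes puts it in (u, v, w) order without reversing the meshgrid inputs. A minimal sketch with the question's original three arrays:

```python
import numpy as np

u = np.array([0, 100, 200, 300])  # hundreds
v = np.array([0, 10, 20])         # tens
w = np.array([0, 1])              # units

x, y, z = np.meshgrid(u, v, w)      # default indexing='xy' -> shape (3, 4, 2)
func1 = (x + y + z).swapaxes(0, 1)  # now shape (4, 3, 2), indexed [i, j, k]

print(func1[3, 2, 1])  # 321
```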

How can I evenly sample an array in Python, in order, according to a sample rate?

I have array_large and array_small. I need to evenly sample from array_large so that I end up with an array the same size as array_small. (Or in other words, I need a representative, downsized version of array_large to match up with array_small.)
As a super-trivial example, if array_small = [0, 1] and array_large = [0, 1, 2, 3] I would expect sample = [0, 2] or sample = [1, 3].
Let's imagine array_small is 30 items and array_large is 100.
array_small = [i for i in range(30)]
array_large = [i for i in range(100)]
sample_rate = len(array_large) / len(array_small)
In that case our sample_rate is 3.333... which means we want about every 3rd item, but sometimes every 4th item. Since the sample_rate is a float we can account for that with math.floor() and use the mod operator on the array index:
import math
array_large_sample = [
num for i, num in enumerate(array_large)
if math.floor(i % sample_rate) == 0
]
print(array_large_sample)
print(len(array_large_sample))
OUTPUT:
[0, 4, 7, 11, 14, 17, 21, 24, 27, 31, 34, 37, 41, 44, 47, 51, 54, 57, 61, 64, 67, 71, 74, 77, 81, 84, 87, 91, 94, 97]
30
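The modulo test depends on floating-point rounding: 100 / 30 is stored slightly above 10/3, so 10 % sample_rate comes out near 3.33 instead of 0, which is why 11 is picked rather than 10. A different technique that stays in integer arithmetic picks index i * len(array_large) // len(array_small) for each i. A sketch, assuming array_large is at least as long as array_small; `even_sample` is a hypothetical helper name:

```python
def even_sample(array_large, n):
    """Pick n items from array_large, in order, at evenly spaced indices.

    Index i * len(array_large) // n uses only integer arithmetic, so it
    returns exactly n items (assuming n <= len(array_large)).
    """
    size = len(array_large)
    return [array_large[i * size // n] for i in range(n)]


array_small = list(range(30))
array_large = list(range(100))
sample = even_sample(array_large, len(array_small))
print(sample[:6], len(sample))  # [0, 3, 6, 10, 13, 16] 30
```

With the super-trivial example from the question, even_sample([0, 1, 2, 3], 2) returns [0, 2].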

How to convert numeric to strings in the box plot

I want to plot the boxplot of the following dataset:
A = [150, 112, 108, 70]
B = [260, 90, 165, 100]
C = [160, 50, 90, 60]
D = [110, 20, 35, 70]
E = [105, 450, 45, 200]
One way I can do it is via the following code:
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
font_prop = font_manager.FontProperties( size=18)
Positions = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5]
Heat = [150, 112, 108, 70, 260, 90, 165, 100, 160, 50, 90, 60, 110, 20, 35, 70, 105, 450, 45, 200]
groups = [[] for i in range(max(Positions))]
for pos, heat in zip(Positions, Heat):
    groups[pos - 1].append(heat)
b = plt.boxplot(groups, patch_artist=False)
plt.rcParams.update({'font.size': 16})
plt.rc('xtick', labelsize=16)
plt.rc('ytick', labelsize=18)
for median in b['medians']:
    median.set(color='r', linewidth=2)
This gives me a box plot, but I want the x-axis numbers 1...5 to be replaced by A...E. Is there a way to do this?
To convert a char to an integer, use
ord(char)
To convert an integer to a char, use
chr(int)
An example:
int_array = list(range(5))
char_array = [chr(x + ord('A')) for x in int_array]
# char_array = ['A', 'B', 'C', 'D', 'E']
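The letters still have to be assigned to the x-axis ticks to appear on the plot. By default boxplot draws box k at x = k (1-based), so relabeling the ticks 1...5 is enough. A sketch using the question's data; the Agg backend line is an assumption added only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this to show a window
import matplotlib.pyplot as plt

Positions = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5]
Heat = [150, 112, 108, 70, 260, 90, 165, 100, 160, 50, 90, 60,
        110, 20, 35, 70, 105, 450, 45, 200]

groups = [[] for _ in range(max(Positions))]
for pos, heat in zip(Positions, Heat):
    groups[pos - 1].append(heat)

fig, ax = plt.subplots()
b = ax.boxplot(groups, patch_artist=False)
for median in b["medians"]:
    median.set(color="r", linewidth=2)

# Letters built with chr/ord as above, placed at the box positions 1..N.
labels = [chr(ord("A") + i) for i in range(len(groups))]
ax.set_xticks(list(range(1, len(groups) + 1)))
ax.set_xticklabels(labels)
```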

numpy.ufunc operation equivalent

I have 4D numpy array a and 3D array b. Also I have 2D arrays of indices i0, j0, k0. Suppose I want to use the following construction:
np.add.at(a, (slice(None), k0, i0, j0), b)
As far as I understand, a[slice(None), k0, i0, j0] += b is not equivalent to this np.add.at.
The question is: how can this np.add.at line be replaced properly with a plain numpy a[...] += b[...]?
You wrote that i0, j0 and k0 are 2-D arrays, so I assumed that each row of these arrays defines a 3-D slice of b (each array giving the start and stop for its own dimension).
For the test I defined these arrays as:
k0 = np.array([[0,1],[1,2]])
i0 = np.array([[1,3],[2,4]])
j0 = np.array([[0,2],[2,4]])
so that the first 3-D slice from b is: b[0:1, 1:3, 0:2].
Because k0, i0 and j0 (taken row by row) define separate slices,
you cannot combine them in a single instruction.
My proposition is to perform your addition in the following loop:
for sl in np.dstack([k0, i0, j0]):
    idx = tuple(np.insert(np.r_[np.apply_along_axis(
        lambda row: slice(*row), 0, sl)], 0, slice(None)))
    a[tuple(idx)] += b[tuple(idx[1:])]
idx is a 4-tuple used to index a. Its first element is slice(None),
i.e. all elements along the first dimension; the remaining elements
are the slices for the other three dimensions.
b is indexed with the same idx, minus its first element.
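The same slices can also be built without np.dstack or np.apply_along_axis by zipping the rows of k0, i0 and j0 directly. A sketch under the same assumption (row r of each index array holds a (start, stop) pair), using the answer's test arrays copied in so the block runs on its own:

```python
import numpy as np

aDim = (2, 2, 4, 4)
bDim = aDim[1:]
a = np.arange(1, np.prod(aDim) + 1).reshape(aDim)
b = np.arange(501, np.prod(bDim) + 501).reshape(bDim)

k0 = np.array([[0, 1], [1, 2]])
i0 = np.array([[1, 3], [2, 4]])
j0 = np.array([[0, 2], [2, 4]])

# Row r of k0, i0, j0 holds (start, stop) bounds along axes 1, 2, 3 of a.
for k, i, j in zip(k0, i0, j0):
    idx = (slice(*k), slice(*i), slice(*j))
    a[(slice(None),) + idx] += b[idx]  # leading ':' covers all of a's first axis
```

Afterwards a holds the same values as the result printed below.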
To test the above code, I defined both arrays as:
aDim = (2, 2, 4, 4)
bDim = aDim[1:]
a = np.arange(1, np.array(aDim).prod() + 1).reshape(aDim)
b = np.arange(501, np.array(bDim).prod() + 501).reshape(bDim)
(print them to see the initial content).
After my code was executed, a contains:
array([[[[ 1, 2, 3, 4],
[510, 512, 7, 8],
[518, 520, 11, 12],
[ 13, 14, 15, 16]],
[[ 17, 18, 19, 20],
[ 21, 22, 23, 24],
[ 25, 26, 554, 556],
[ 29, 30, 562, 564]]],
[[[ 33, 34, 35, 36],
[542, 544, 39, 40],
[550, 552, 43, 44],
[ 45, 46, 47, 48]],
[[ 49, 50, 51, 52],
[ 53, 54, 55, 56],
[ 57, 58, 586, 588],
[ 61, 62, 594, 596]]]])
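For comparison, when k0, i0, j0 are used as ordinary integer index arrays (as in the question's np.add.at call, rather than as slice bounds), the two forms differ only for repeated (k, i, j) triples: np.add.at adds every occurrence, while the buffered a[:, k0, i0, j0] += b increments a repeated target only once. A minimal sketch with made-up indices; `both_ways` is a hypothetical helper:

```python
import numpy as np

b = np.ones((2, 2, 2), dtype=int)


def both_ways(k0, i0, j0):
    """Apply b with np.add.at and with buffered fancy-index += on fresh zero arrays."""
    a_at = np.zeros((2, 3, 3, 3), dtype=int)
    np.add.at(a_at, (slice(None), k0, i0, j0), b)  # unbuffered: repeats accumulate
    a_plus = np.zeros((2, 3, 3, 3), dtype=int)
    a_plus[:, k0, i0, j0] += b  # buffered: a repeated target is incremented once
    return a_at, a_plus


# Made-up indices: entries [0, 0] and [0, 1] both target (k, i, j) = (0, 1, 2).
k0 = np.array([[0, 0], [1, 2]])
i0 = np.array([[1, 1], [0, 2]])
j0 = np.array([[2, 2], [1, 0]])
a_at, a_plus = both_ways(k0, i0, j0)
print(a_at[:, 0, 1, 2], a_plus[:, 0, 1, 2])  # [2 2] [1 1]

# Changing one k so all (k, i, j) triples are unique makes the two forms agree.
k1 = np.array([[0, 1], [1, 2]])
u_at, u_plus = both_ways(k1, i0, j0)
print(np.array_equal(u_at, u_plus))  # True
```

So a plain a[:, k0, i0, j0] += b is a safe replacement only when the index triples are known to be unique.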
