I have some data and I am trying to make a full circle and a half circle out of it. Below is the code I have so far, but the curve should start at zero and end at zero, and what it produces is only a so-called half circle. Is there a way to create a half circle and a full circle that start at zero and end at zero, ideally using the data without manipulating it?
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(15)
data = np.random.randint(0, 100, 100)
print(data)
arr = data - np.mean(data)
arr = np.cumsum(np.sort(arr))
plt.plot(arr)
plt.axhline(0, color="#000000", ls="-.", linewidth=0.5)
plt.show()
[72 12 5 0 28 27 71 75 85 47 93 17 31 23 32 62 10 15 68 39 37 19 44 77
60 29 79 15 56 49 1 31 96 85 26 34 75 50 65 53 70 41 34 40 22 63 79 56
28 99 4 7 66 42 96 7 24 60 45 83 49 53 29 76 88 76 33 2 88 42 81 51
62 23 93 98 87 18 90 90 16 77 90 32 70 4 28 84 35 28 69 54 64 73 84 56
46 38 35 14]
You can use Circle (http://matplotlib.org/api/patches_api.html):
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
c = plt.Circle((0, 0), radius=1, edgecolor='b', facecolor='none')
ax.add_patch(c)
ax.set_aspect('equal')  # keep the circle round rather than stretched into an ellipse
plt.show()
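For the half-circle part of the question, one option (a sketch I'm adding, assuming a unit half circle centred at the origin is what you want) is matplotlib.patches.Arc:

import matplotlib.pyplot as plt
from matplotlib.patches import Arc

fig, ax = plt.subplots()
# Arc draws only the outline; theta1/theta2 are in degrees,
# so 0..180 gives the upper half of the circle
half = Arc((0, 0), width=2, height=2, theta1=0, theta2=180, edgecolor='b')
ax.add_patch(half)
ax.set_xlim(-1.2, 1.2)
ax.set_ylim(-0.2, 1.2)
ax.set_aspect('equal')
plt.show()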
To select data for training and validation in my machine learning projects, I usually use NumPy's masking functionality. A typical recurring block of code for selecting the indices of the validation and training data looks like this:
import numpy as np
validation_split = 0.2
all_idx = np.arange(0,100000)
idxValid = np.random.choice(all_idx, int(validation_split * len(all_idx)))
idxTrain = np.setdiff1d(all_idx, idxValid)
Now the following should always be true:
len(all_idx) == len(idxValid)+len(idxTrain)
Unfortunately, I found out that somehow this is not always the case. As I increase the number of elements chosen from the all_idx array, the resulting numbers do not add up properly. Here is another standalone example, which breaks as soon as I increase the number of randomly chosen validation indices above 1000:
import numpy as np
all_idx = np.arange(0,100000)
idxValid = np.random.choice(all_idx, 1000)
idxTrain = np.setdiff1d(all_idx, idxValid)
print(len(all_idx), len(idxValid), len(idxTrain))
This results in: 100000, 1000, 99005
I am confused! Please try it yourself; I would be glad to understand this.
Careful: you need to indicate that you don't want duplicates in idxValid. To do so, you just have to add replace=False in np.random.choice:
idxValid = np.random.choice(all_idx, 10, replace=False)
From the NumPy documentation for np.random.choice:
replace boolean, optional
Whether the sample is with or without replacement
Consider the following example:
all_idx = np.arange(0, 100)
print(all_idx)
>>> [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99]
Now if you print out your validation dataset:
idxValid = np.random.choice(all_idx, int(validation_split * len(all_idx)))
print(idxValid)
>>> [31 57 55 45 26 25 55 76 33 69 49 90 46 14 18 30 89 73 47 82]
You can actually observe that there are duplicates in the resulting set, and thus
len(all_idx) == len(idxValid) + len(idxTrain)
won't evaluate to True.
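A quick way to confirm this (just a sketch using np.unique; not part of the original snippet):

print(len(idxValid), len(np.unique(idxValid)))  # the unique count is smaller whenever duplicates were drawn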
What you need to do is make sure that np.random.choice samples without replacement by passing replace=False:
idxValid = np.random.choice(all_idx, int(validation_split * len(all_idx)), replace=False)
Now the results should be as expected:
import numpy as np
validation_split = 0.2
all_idx = np.arange(0, 100)
print(all_idx)
idxValid = np.random.choice(all_idx, int(validation_split * len(all_idx)), replace=False)
print(idxValid)
idxTrain = np.setdiff1d(all_idx, idxValid)
print(idxTrain)
print(len(all_idx) == len(idxValid)+len(idxTrain))
and the output is:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99]
[12 85 96 64 48 21 55 56 80 42 11 92 54 77 49 36 28 31 70 66]
[ 0 1 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 22 23 24 25 26
27 29 30 32 33 34 35 37 38 39 40 41 43 44 45 46 47 50 51 52 53 57 58 59
60 61 62 63 65 67 68 69 71 72 73 74 75 76 78 79 81 82 83 84 86 87 88 89
90 91 93 94 95 97 98 99]
True
Consider using train_test_split from scikit-learn, which is straightforward:
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2)
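If you want to stay in pure NumPy, another option (a sketch I'm adding, using np.random.permutation) is to shuffle the indices once and then slice; the split is disjoint and exhaustive by construction:

import numpy as np

validation_split = 0.2
all_idx = np.random.permutation(100000)        # shuffled 0..99999
n_valid = int(validation_split * len(all_idx))
idxValid = all_idx[:n_valid]
idxTrain = all_idx[n_valid:]
assert len(idxValid) + len(idxTrain) == len(all_idx)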
I want to perform a manual short-time Fourier transform. I have a simple time series in the form of a cosine wave, and I want to compute the STFT by splitting the time series into a number of evenly spaced segments that overlap... how do I do that?
this is my time series:
import numpy as np

fs = 10e3              # Sampling frequency
N = 1e5                # Number of samples
time = np.arange(N) / fs
x = np.cos(5*time)     # Some random audio wave
# x.shape gives (100000,)
How do I split it into, say, 10 evenly spaced segments?
Here's one way to do this.
import numpy as np
def get_windows(n, Mt, olap):
    """Split a signal of length n into olap% overlapping windows, each containing Mt terms."""
    ists = []
    ieds = []
    ist = 0
    while True:
        ied = ist + Mt
        if ied > n:
            break
        ists.append(ist)
        ieds.append(ied)
        ist += int(Mt * (1 - olap/100))
    return ists, ieds

n = 100
x = np.arange(n)
ists, ieds = get_windows(n, Mt=20, olap=50)  # windows of length 20 with 50% overlap
for ist, ied in zip(ists, ieds):
    print(x[ist:ied])
result:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]
[20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
[30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
[40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
[50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69]
[60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
[70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89]
[80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
If your data is relatively small and you are comfortable with storing all the windows in RAM, then you can continue as follows:
X = np.array([x[ist:ied] for ist, ied in zip(ists, ieds)])
# X.shape is (nwindows, Mt)
By doing this, you can generate a windowing function W (e.g. a Hanning window) as a 1D array of shape (Mt,), so that W*X broadcasts and W is applied to each window in X.
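To finish the manual STFT from here, a minimal sketch (my addition, assuming the X built above and a Hanning window):

W = np.hanning(X.shape[1])          # tapering window of length Mt
frames = W * X                      # broadcasting applies W to every segment
stft = np.fft.rfft(frames, axis=1)  # one spectrum per segment, shape (nwindows, Mt//2 + 1)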
I just noticed that the term "window" is used with two meanings in this context. Sorry for the confusion.
Largest product in a grid
Problem 11
In the 20×20 grid below, four numbers along a diagonal line have been marked in red.
The product of these numbers is 26 × 63 × 78 × 14 = 1788696.
What is the greatest product of four adjacent numbers in the same direction (up, down, left, right, or diagonally) in the 20×20 grid?
x ='''
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48
'''
import numpy as np
import pandas as pd
arr = np.array(x.split(), dtype=int)
arr = arr.reshape(20, 20)
arr_m = np.zeros((20, 20))
A method that loops over the elements:
%%time
for i in range(0, 17):
    for j in range(0, 17):
        x = list(range(i, i+4))
        y = list(range(j, j+4))
        arr_m[i, j] = max(arr[i, j:j+4].prod(), arr[i:i+4, j].prod(), arr[x, y].prod())
print(arr_m.max())
A method that loops over rows, columns, and sub-arrays:
arr1 = np.zeros((20, 20), dtype=int)
arr2 = np.zeros((20, 20), dtype=int)
arr3 = np.zeros((20, 20), dtype=int)
%%time
for i in range(0, 20):
    arr1[:, i] = arr[:, i:i+4].prod(1)
for i in range(0, 20):
    arr2[i, :] = arr[i:i+4, :].prod(0)
for i in range(0, 20):
    for j in range(0, 20):
        arr3[i, j] = arr[i:i+4, j:j+4].diagonal().prod()
max(arr1.max(), arr2.max(), arr3.max())
I want to push it a little bit further.
Is there any pure NumPy or pandas way to do this without loops?
Here is an approach using stride_tricks. It creates windowed views along all relevant directions and then just multiplies and finds the index of the best value.
The rest is just a bit of bookkeeping to recover the indices in the original grid.
import numpy as np
x = '''08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48
'''
arr = np.array(x.split(), dtype=int)
arr = arr.reshape(20, 20)
I, J = arr.shape
S, T = arr.strides
# windowed views of length 4 along each direction; the last stride walks along the window
horz = np.lib.stride_tricks.as_strided(arr, (I, J-3, 4), (S, T, T)).prod(axis=2)           # left to right
vert = np.lib.stride_tricks.as_strided(arr, (I-3, J, 4), (S, T, S)).prod(axis=2)           # top to bottom
tlbr = np.lib.stride_tricks.as_strided(arr, (I-3, J-3, 4), (S, T, S+T)).prod(axis=2)       # top-left to bottom-right
bltr = np.lib.stride_tricks.as_strided(arr[3:], (I-3, J-3, 4), (S, T, -S+T)).prod(axis=2)  # bottom-left to top-right
all_ = horz, vert, tlbr, bltr
midx = [np.unravel_index(o.argmax(), o.shape) for o in all_]
mval = [o[idx] for o, idx in zip(all_, midx)]
hy, hx, vy, vx, ty, tx, by, bx = np.ravel(midx)
a = np.arange(4)
idx = list(map(tuple, np.reshape(np.s_[hy, hx:hx+4, vy:vy+4, vx, ty+a, tx+a, by+a[::-1], bx+a], (4, 2))))
for name, I, V in zip('horizontal vertical topleft-bottomright bottomleft-topright'.split(), idx, mval):
print('best', name, ':', V, '=', ' x '.join(map(str, arr[I])))
Output:
best horizontal : 48477312 = 78 x 78 x 96 x 83
best vertical : 51267216 = 66 x 91 x 88 x 97
best topleft-bottomright : 40304286 = 94 x 99 x 71 x 61
best bottomleft-topright : 70600674 = 87 x 97 x 94 x 89
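As a side note (my addition, not part of the original answer): on NumPy 1.20+ the same windows can be built with np.lib.stride_tricks.sliding_window_view, which avoids writing strides by hand. A sketch, assuming arr is the parsed 20x20 grid from above:

win = np.lib.stride_tricks.sliding_window_view(arr, (4, 4))                # shape (17, 17, 4, 4)
horz = np.lib.stride_tricks.sliding_window_view(arr, 4, axis=1).prod(-1)   # rows
vert = np.lib.stride_tricks.sliding_window_view(arr, 4, axis=0).prod(-1)   # columns
tlbr = win.diagonal(axis1=2, axis2=3).prod(-1)                             # top-left to bottom-right
bltr = win[..., ::-1].diagonal(axis1=2, axis2=3).prod(-1)                  # bottom-left to top-right
print(max(o.max() for o in (horz, vert, tlbr, bltr)))                      # 70600674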
NumPy accelerates computations, but above all it gives versatile ways to scan the data. To simplify the computation of the useful products, you can:
pad the array with zeros to manage the borders simply;
use a flattened version of the data: different directions are then just different shifts, and the four directions are handled with the same logic.
Code :
import numpy as np
import pandas as pd

data = pd.read_clipboard(header=None).values  # read the grid from the clipboard
m, n = data.shape
blocksize = 4
arr = np.zeros((m+blocksize, n+1), int)  # pad with the right amount of zeros
arr[:m, :n] = data
flat = arr.ravel()
usefulsize = data.size + m   # index of the last non-zero value + 1
shifts = [1, n, n+1, n+2]    # - / | \ , the four directions
blocks = np.array([[flat[i*s:][:usefulsize] for s in shifts]
                   for i in range(blocksize)])  # 15 µs
scores = blocks.prod(axis=0)                    # 8 µs
Even with less development time, it's ~200x faster than the loops. Output:
print(scores.max())
i,j = np.where(scores==scores.max())
print(blocks[:,i,j])
70600674
[[89][94][97][87]]
My shot would be:
def n_prod_max(df, n):
    # No rolling(...).prod() or anything out of pandas' box
    win_prod = lambda x: x.prod()
    # Supposed to be square
    df_size = df.shape[0]
    diag_nums = pd.Series(range(-df_size + n, df_size - n + 1))
    # Columns max
    col_max = df.rolling(n).agg(win_prod).max().max()
    # Rows max
    row_max = df.T.rolling(n).agg(win_prod).max().max()
    # Diagonals max
    diag_vals = df.values
    diag_max = diag_nums.apply(lambda d: pd.Series(diag_vals.diagonal(d))
                                           .rolling(n)
                                           .agg(win_prod)
                                           .max()).max()
    # Antidiagonals max
    adiag_vals = np.rot90(df.values)
    adiag_max = diag_nums.apply(lambda d: pd.Series(adiag_vals.diagonal(d))
                                            .rolling(n)
                                            .agg(win_prod)
                                            .max()).max()
    return max([col_max, row_max, diag_max, adiag_max])
>>> n_prod_max(df, 4)
70600674.0
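For completeness, df here is assumed to be the grid as a pandas DataFrame; one way to build it from the arr parsed earlier (my assumption, not part of the answer):

import pandas as pd
df = pd.DataFrame(arr)    # arr is the 20x20 integer array parsed from the text above
print(n_prod_max(df, 4))  # 70600674.0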
So basically we are given a text that looks like this:
20
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48
It's a 20x20 square grid, and you have to figure out the greatest product of four adjacent numbers in the same direction (horizontal, vertical, or diagonal) in this grid of positive integers. This is what I have:
def main():
    # open file for reading
    in_file = open("./Grid.txt", "r")
    dimension = in_file.readline()
    dimensions = int(dimension)
    greatest = 0
    grid = ''
    largest = [0, 0, 0, 0]
    for i in range(0, dimensions):
        grid = grid + in_file.readline()
    grid = grid.strip()
    grid = grid.replace(" ", "")
    i = 0
    j = 0
    print(int(grid[i]))
    for i in range(0, dimensions * 2 + (dimensions - 1)):
        for j in range(0, dimensions * 2 + (dimensions - 1) - 3):
            if (int(grid[i])*10 + int(grid[i+1]))*(int(grid[i+2])*10 + int(grid[i+3]))*(int(grid[i+4])*10 + int(grid[i+5]))*(int(grid[i+6])*10 + int(grid[i+7])) > largest[0]:
                largest[0] = (int(grid[i])*10 + int(grid[i+1]))*(int(grid[i+2])*10 + int(grid[i+3]))*(int(grid[i+4])*10 + int(grid[i+5]))*(int(grid[i+6])*10 + int(grid[i+7]))
    print(max(largest))

main()
I know it's super complicated, but basically I'm not sure how to make this set of numbers look like a list of numbers (an array)... so I essentially end up having to build the numbers myself. For example, the first number is 02, so I multiply 0 by 10 and add 2... Anyway, the problem is that I get ValueError: invalid literal for int() with base 10: '\n'. Any help is appreciated!
The problem is this line:
grid = grid + in_file.readline()
Change it to:
grid = grid + in_file.readline().strip() # you must strip each line
You need to strip each line as you read it; currently you strip only the final concatenated string, so the newline at the end of every intermediate line stays inside grid. Eventually, your code tries to convert a newline character into a number and runs into the error.
After the fix, running it produces the following output:
➜ /tmp ./test.py
0
1614
Additional Recommendations
You definitely need to make your code more readable before posting. It was painful to look at and even more painful to debug... I almost left it there.
One possible start could be in the complicated for loop. Consider:
for i in range(0, dimensions * 2 + (dimensions - 1)):
    for j in range(0, dimensions * 2 + (dimensions - 1) - 3):
        tmp = (int(grid[i]) * 10 + int(grid[i+1])) \
            * (int(grid[i+2]) * 10 + int(grid[i+3])) \
            * (int(grid[i+4]) * 10 + int(grid[i+5])) \
            * (int(grid[i+6]) * 10 + int(grid[i+7]))
        if tmp > largest[0]:
            largest[0] = tmp
First, it allowed me to see that the culprit was the int(grid[i+7]) call, whereas before the traceback would show the entire line while complaining and was not informative.
Second, it does not calculate exactly the same thing twice. It uses a temporary variable instead.
Third, you should consider converting your grid variable into an actual grid (e.g. an array of arrays). Currently, it's merely a string, so the name is misleading.
Fourth, while turning grid into an actual grid, you can use a list comprehension to convert the values into numbers directly, as in this short example:
>>> line = '12 34 5 6 78 08 1234'
>>> [int(v) for v in line.split()]
[12, 34, 5, 6, 78, 8, 1234] # array of integers, not strings
>>>
It will save you the conversions later on and validates the data for you while the code is still simple, instead of waiting for your complicated calculations to blow up.
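Putting the third and fourth points together, a minimal sketch of the reading step (assuming the same Grid.txt layout: a dimension line followed by the rows):

with open("./Grid.txt") as in_file:
    dimensions = int(in_file.readline())
    grid = [[int(v) for v in in_file.readline().split()] for _ in range(dimensions)]
# grid[row][col] is now an int, e.g. grid[0][0] == 8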
I have a long 121-element array where the data is stored in ascending order, and I want to reshape it into an 11x11 matrix, so I use the NumPy reshape command:
Z = data.attributevalue[2,time,axial,:]
Z = np.reshape(Z, (int(math.sqrt(datacount)), int(math.sqrt(datacount))))
The data should be oriented in a Cartesian plane and I create the mesh grid with the following
x = np.arange(1.75, 12.5, 1)
y = np.arange(1.75, 12.5, 1)
X,Y = np.meshgrid(x, y)
The issue is that the rows of Z are in the wrong order: the data that ends up in the last row of the matrix should be in the first, and vice versa. I want to rearrange it so the rows are filled in the proper manner. The starting array Z is assembled in the following arrangement: [datapoint #1, datapoint #2, ..., datapoint #N]. Datapoint #1 should be in the top left and the last point in the bottom right. Is there a simple way of accomplishing this, or do I have to write a function to change the order of the rows?
My plot statement is the following:
surf = self.ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
linewidth=1, antialiased=True)
*** UPDATE ***
I tried populating the initial array backwards and still had no luck. I changed the orientation of the axis to the following:
y = np.arange(12.5, 1, -1)
This flipped the data, but my axis labels are wrong, so it is not a real solution to my issue. Any ideas?
It is possible that your original array does not look like a 1x121 array. The following code block shows how you reshape an array from 1x121 to 11x11.
import numpy as np
A = np.arange(1, 122)
print(A)
print(A.reshape((11, 11)))
Gives:
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121]
[[ 1 2 3 4 5 6 7 8 9 10 11]
[ 12 13 14 15 16 17 18 19 20 21 22]
[ 23 24 25 26 27 28 29 30 31 32 33]
[ 34 35 36 37 38 39 40 41 42 43 44]
[ 45 46 47 48 49 50 51 52 53 54 55]
[ 56 57 58 59 60 61 62 63 64 65 66]
[ 67 68 69 70 71 72 73 74 75 76 77]
[ 78 79 80 81 82 83 84 85 86 87 88]
[ 89 90 91 92 93 94 95 96 97 98 99]
[100 101 102 103 104 105 106 107 108 109 110]
[111 112 113 114 115 116 117 118 119 120 121]]
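Coming back to the row-order issue from the question (my suggestion, not part of the answer above): if the reshaped Z simply has its rows in reverse order, you can flip the array itself rather than reversing the y axis, which keeps the axis labels intact:

Z = np.reshape(Z, (11, 11))
Z = np.flipud(Z)  # reverse the row order; equivalent to Z[::-1, :]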