Given an input n, I want to print n lines of n numbers each, such that the numbers 1 through n² are displayed in a zig-zag way: 1 appears at the bottom-right corner of the output matrix, 2 at the end of the second-to-last row, and so on.
Examples:
Given Input 3.
Print:
9 4 3
8 5 2
7 6 1
Given Input 1.
Print:
1
Given Input 4.
Print:
13 12 5 4
14 11 6 3
15 10 7 2
16 9 8 1
Attempt
n = int(input("Enter dimensions of matrix :"))
m = n
x = 1
columns = []
for row in range(n):
    inner_column = []
    for col in range(m):
        inner_column.append(x)
        x = x + 1
    columns.append(inner_column)
for inner_column in columns:
    print(' '.join(map(str, inner_column)))
I've tried something like this, but it prints out the array incorrectly. Any ideas?
Your code explicitly sets x = 1 and then performs x = x + 1 in a loop. Since the first column must be in descending order and there are n*n numbers to output, the top-left value should instead be x = n * n, decreasing down the first column with x = x - 1. The next column should be filled from bottom to top, the one after that from top to bottom, and so on.
I would suggest making an iterator that visits rows in that zig-zag manner: 0, 1, 2, ... n - 1, and then n - 1, n - 2, ... 0, and back from the start. With that iterator you know exactly to which row you should append the next x value:
# Helper function to generate row numbers in zig-zag order, for as
# long as needed.
def zigzag(n):
    if n % 2:
        yield from range(n)
    while True:
        yield from range(n - 1, -1, -1)
        yield from range(n)
n = int(input("Enter dimensions of matrix :"))
matrix = [[] for _ in range(n)]
visit = zigzag(n)
for x in range(n*n, 0, -1):
    matrix[next(visit)].append(x)
Then print it:
for row in matrix:
    print(' '.join(map(str, row)))
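As a cross-check on the iterator approach, here is a minimal sketch that fills the matrix column by column instead. The name zigzag_matrix is hypothetical (it is not part of the code above), and it assumes the pattern seen in the examples: columns are numbered from the right, and they alternate between bottom-up and top-down filling.

```python
def zigzag_matrix(n):
    """Build the n x n zig-zag matrix with 1 in the bottom-right corner."""
    matrix = [[0] * n for _ in range(n)]
    for k in range(n):              # k counts columns from the right
        col = n - 1 - k
        for step in range(n):       # step-th value placed in this column
            value = k * n + step + 1
            # Even k: fill bottom to top; odd k: fill top to bottom
            row = n - 1 - step if k % 2 == 0 else step
            matrix[row][col] = value
    return matrix


for row in zigzag_matrix(4):
    print(' '.join(map(str, row)))
```

Because it writes each cell by index, this version does not need the infinite generator, at the cost of being a little less general.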
I have trouble solving this problem:
First line of input - N. N+1 is number of train stations.
Second line of input - N integers c(i) - price of a ticket between stations i-1 and i.
Third line of input - k - number of passengers.
Next k lines: int a and int b (first and last station for each passenger).
Desired output: the ticket price for each passenger. For example:
Input:
4
12 23 34 45
3
0 4
1 3
3 2
Output:
114
57
34
My code:
n = int(input())
prices = list(map(int, input().split()))
x = int(input())
for i in range(x):
    a, b = sorted(map(int, input().split()))
    print(sum(prices[a:b]))
I guess my solution is far from optimal as I get Time Limit Exceeded error.
Solution using accumulated array
def accum(a):
    " creates the accumulation of array a as input "
    b = [0] * (len(a) + 1)
    for i, v in enumerate(a):
        b[i+1] = b[i] + v
    return b
def price(acc, t):
    " Price using accumulated array "
    # t provides the start, stop points (e.g. [0, 4])
    mini, maxi = min(t), max(t)
    return acc[maxi] - acc[mini]
Usage of above functions
prices = [12, 23, 34, 45]
# create accumulation of prices
acc = accum(prices)
# Using your test cases
tests = [[0, 4], [1, 3], [3, 2]]
for t in tests:
    print(t, price(acc, t))
Output
[0, 4] 114
[1, 3] 57
[3, 2] 34
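The same prefix-sum idea can also be sketched with itertools.accumulate from the standard library. The helper names prefix_sums and ticket_price are made up for this example. The point is that the sums are built once in O(N), after which each query is answered in O(1), instead of re-summing a slice per passenger.

```python
from itertools import accumulate


def prefix_sums(prices):
    """acc[i] is the total price of travelling from station 0 to station i."""
    return [0] + list(accumulate(prices))


def ticket_price(acc, a, b):
    """Price between stations a and b, in either direction."""
    lo, hi = sorted((a, b))
    return acc[hi] - acc[lo]


acc = prefix_sums([12, 23, 34, 45])
for a, b in [(0, 4), (1, 3), (3, 2)]:
    print(ticket_price(acc, a, b))
```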
I'm having trouble with implementing vectorization in pandas. Let me preface this by saying I am a total newbie to vectorization so it's extremely likely that I'm getting some syntax wrong.
Let's say I've got two pandas dataframes.
Dataframe one describes the x,y coordinates of some circles with radius R, with unique IDs.
>>> data1 = {'ID': [1, 2], 'x': [1, 10], 'y': [1, 10], 'R': [4, 5]}
>>> df_1=pd.DataFrame(data=data1)
>>>
>>> df_1
ID x y R
1 1 1 4
2 10 10 5
Dataframe two describes the x,y coordinates of some points, also with unique IDs.
>>> data2 = {'ID': [3, 4, 5], 'x': [1, 3, 9], 'y': [2, 5, 9]}
>>> df_2=pd.DataFrame(data=data2)
>>>
>>> df_2
ID x y
3 1 2
4 3 5
5 9 9
Now, imagine plotting the circles and the points on a 2D plane. Some of the points will reside inside the circles. See the image below.
All I want to do is create a new column in df_2 called "host_circle" that indicates the ID of the circle that each point resides in. If the particle does not reside in a circle, the value should be "None".
My desired output would be
>>> df_2
ID x y host_circle
3 1 2 1
4 3 5 None
5 9 9 2
First, define a function that checks if a given particle (x2,y2) resides inside a given circle (x1,y1,R1,ID_1). If it does, return the ID of the circle; else, return None.
>>> def func(x1,y1,R1,ID_1,x2,y2):
...     dist = np.sqrt( (x1-x2)**2 + (y1-y2)**2 )
...     if dist < R1:
...         return ID_1
...     else:
...         return None
Next, the actual vectorization. I'm sorta lost here. I think it should be something like
df_2['host']=func(df_1['x'],df_1['y'],df_1['R'],df_1['ID'],df_2['x'],df_2['y'])
but that just throws errors. Can someone help me?
One final note: My actual data I'm working with is VERY large; tens of millions of rows. Speed is crucial, hence why I'm trying to make vectorization work.
Numba v1
You might have to install numba with
pip install numba
Then use Numba's JIT compiler via the njit function decorator
import numpy as np
from numba import njit
@njit
def distances(point, points):
    return ((points - point) ** 2).sum(1) ** .5
@njit
def find_my_circle(point, circles):
    points = circles[:, :2]
    radii = circles[:, 2]
    dist = distances(point, points)
    mask = dist < radii
    i = mask.argmax()
    return i if mask[i] else -1
@njit
def find_my_circles(points, circles):
    n = len(points)
    out = np.zeros(n, np.int64)
    for i in range(n):
        out[i] = find_my_circle(points[i], circles)
    return out
# A result of -1 (no host) indexes the trailing NaN appended here
ids = np.append(df_1.ID.values, np.nan)
points = df_2[['x', 'y']].values
i = find_my_circles(points, df_1[['x', 'y', 'R']].values)
df_2['host_circle'] = ids[i]
df_2
ID x y host_circle
0 3 1 2 1.0
1 4 3 5 NaN
2 5 9 9 2.0
This iterates row by row, meaning it tries to find the host circle one point at a time. The search over the circles for each point is still vectorized, and the loop over points should be very fast. The massive benefit is that you don't occupy tons of memory.
Numba v2
This one is more loopy but short circuits when it finds a host
import numpy as np
from numba import njit
@njit
def distance(a, b):
    return ((a - b) ** 2).sum() ** .5
@njit
def find_my_circles(points, circles):
    n = len(points)
    m = len(circles)
    out = -np.ones(n, np.int64)
    centers = circles[:, :2]
    radii = circles[:, 2]
    for i in range(n):
        for j in range(m):
            if distance(points[i], centers[j]) < radii[j]:
                out[i] = j
                break
    return out
# A result of -1 (no host) indexes the trailing NaN appended here
ids = np.append(df_1.ID.values, np.nan)
points = df_2[['x', 'y']].values
i = find_my_circles(points, df_1[['x', 'y', 'R']].values)
df_2['host_circle'] = ids[i]
df_2
Vectorized
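But still problematic: it builds an intermediate array with one entry per point-circle pair, which can exhaust memory on large inputs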
c = ['x', 'y']
centers = df_1[c].values
points = df_2[c].values
radii = df_1['R'].values
i, j = np.where(((points[:, None] - centers) ** 2).sum(2) ** .5 < radii)
df_2.loc[df_2.index[i], 'host_circle'] = df_1['ID'].iloc[j].values
df_2
ID x y host_circle
0 3 1 2 1.0
1 4 3 5 NaN
2 5 9 9 2.0
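One way to soften the memory cost of full broadcasting is to process the points in blocks. The following is only a sketch under stated assumptions: host_circles_chunked and its chunk parameter are hypothetical names, it works on plain NumPy arrays rather than the DataFrames, it compares squared distances to avoid the square root, and if a point lies in several circles it keeps the first one.

```python
import numpy as np


def host_circles_chunked(points, centers, radii, ids, chunk=100_000):
    """Return the ID of the first circle containing each point, or NaN."""
    out = np.full(len(points), np.nan)
    r2 = radii.astype(float) ** 2
    for start in range(0, len(points), chunk):
        block = points[start:start + chunk]
        # Squared distances for this block only: shape (len(block), len(centers))
        d2 = ((block[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        inside = d2 < r2
        first = inside.argmax(axis=1)   # first matching circle per point
        found = inside.any(axis=1)      # False where no circle matched
        out[start:start + chunk][found] = ids[first[found]]
    return out


centers = np.array([[1, 1], [10, 10]])
radii = np.array([4, 5])
ids = np.array([1, 2])
points = np.array([[1, 2], [3, 5], [9, 9]])
print(host_circles_chunked(points, centers, radii, ids, chunk=2))
```

The peak temporary size is chunk × len(centers) instead of len(points) × len(centers), so chunk can be tuned to the available memory.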
Explanation
The distance of a point (x1, y1) from the center (x0, y0) of a circle is
((x1 - x0) ** 2 + (y1 - y0) ** 2) ** .5
I can use broadcasting if I extend one of my arrays into a third dimension
points[:, None] - centers
array([[[ 0,  1],
        [-9, -8]],
       [[ 2,  4],
        [-7, -5]],
       [[ 8,  8],
        [-1, -1]]])
That is all six combinations of vector differences. Now to calculate the distances.
((points[:, None] - centers) ** 2).sum(2) ** .5
array([[ 1.        , 12.04159458],
       [ 4.47213595,  8.60232527],
       [11.3137085 ,  1.41421356]])
That's all 6 combinations of distances, and I can compare them against the radii to see which are within the circles
((points[:, None] - centers) ** 2).sum(2) ** .5 < radii
array([[ True, False],
       [False, False],
       [False,  True]])
Ok, I want to find where the True values are. That is a perfect use case for np.where. It will give me two arrays, the first will be the row positions, the second the column positions of where these True values are. Turns out, the row positions are the points and column positions are the circles.
i, j = np.where(((points[:, None] - centers) ** 2).sum(2) ** .5 < radii)
Now I just have to slice df_2 with i somehow and assign to it values I get from df_1 using j somehow... But I showed that above.
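To make the last step concrete, here is a small self-contained sketch of what np.where returns for the example data and how i and j map back to circle IDs. It uses plain NumPy arrays in place of the DataFrames; the names circle_ids and host are chosen for this example only.

```python
import numpy as np

centers = np.array([[1, 1], [10, 10]])
radii = np.array([4, 5])
circle_ids = np.array([1, 2])
points = np.array([[1, 2], [3, 5], [9, 9]])

# Boolean matrix: rows are points, columns are circles
inside = ((points[:, None] - centers) ** 2).sum(2) ** .5 < radii
i, j = np.where(inside)   # i -> [0, 2] (points), j -> [0, 1] (circles)

# Points that never appear in i keep NaN, meaning "no host circle"
host = np.full(len(points), np.nan)
host[i] = circle_ids[j]
print(i, j)
print(host)
```

Note that if a point lies inside several circles, its position appears more than once in i and only one of the matches survives the assignment.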
Try this. I have modified your function a bit: it returns a list, on the assumption that a point may lie inside several circles. You can modify it if that's not the case. The list will be empty if the particle does not reside in any of the circles.
def func(df, x2,y2):
    val = df.apply(lambda row: np.sqrt((row['x']-x2)**2 + (row['y']-y2)**2) < row['R'], axis=1)
    return list(val.index[val==True])
df_2['host'] = df_2.apply(lambda row: func(df_1, row['x'],row['y']), axis=1)
Suppose we have a matrix of dimension N x M and we want to reduce its dimension, preserving the values by summing each group of first neighbors.
Suppose the matrix A is a 4x4 matrix:
A =
3 4 5 6
2 3 4 5
2 2 0 1
5 2 2 3
we want to reduce it to a 2x2 matrix as follows:
A1 =
12 20
11 6
In particular, my matrix represents the number of incident cases in an x-y plane. My matrix A is 103x159; if I plot it I get:
What I want to do is aggregate those data over a bigger area, such as
Assuming you're using a numpy.matrix:
import numpy as np
A = np.matrix([
    [3,4,5,6],
    [2,3,4,5],
    [2,2,0,1],
    [5,2,2,3]
])
N, M = A.shape
assert N % 2 == 0
assert M % 2 == 0
A1 = np.empty((N//2, M//2))
for i in range(N//2):
    for j in range(M//2):
        A1[i,j] = A[2*i:2*i+2, 2*j:2*j+2].sum()
Though these loops can probably be optimized away by proper numpy functions.
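One possible way to drop the loops is to reshape the array so that each block gets its own pair of axes and then sum over those axes. This is a sketch, not a tested drop-in: block_sum is a hypothetical helper, it converts the input to a plain ndarray first (np.matrix is always 2-D and cannot be reshaped to 4-D), and it zero-pads odd dimensions, which is an assumption about how the edges of something like a 103x159 matrix should be treated.

```python
import numpy as np


def block_sum(a, bh=2, bw=2):
    """Sum non-overlapping bh x bw blocks of a 2-D array."""
    a = np.asarray(a)
    n, m = a.shape
    # Zero-pad on the bottom/right so both dimensions divide evenly
    a = np.pad(a, ((0, -n % bh), (0, -m % bw)), mode="constant")
    n2, m2 = a.shape
    # Axes 1 and 3 index positions inside each block
    return a.reshape(n2 // bh, bh, m2 // bw, bw).sum(axis=(1, 3))


A = [[3, 4, 5, 6],
     [2, 3, 4, 5],
     [2, 2, 0, 1],
     [5, 2, 2, 3]]
print(block_sum(A))
```

Padding with zeros keeps the grand total of the matrix unchanged, which matters if the entries are counts.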
I see that there is a solution using numpy.matrix; maybe you can test my solution too and give me your feedback.
It works with an a*b matrix if a and b are both even; it may fail if either is odd.
Here is my solution:
v = [
    [3,4,5,6],
    [2,3,4,5],
    [2,2,0,1],
    [5,2,2,3]
]
def shape(v):
    return len(v), len(v[0])
def chunks(v, step):
    """
    Chunk list step per step and sum
    Example: step = 2
    [3,4,5,6] => [7,11]
    [2,3,4,5] => [5,9]
    [2,2,0,1] => [4,1]
    [5,2,2,3] => [7,5]
    """
    for i in v:
        for k in range(0, len(i),step):
            yield sum(j for j in i[k:k+step])
def sum_chunks(k, step):
    """
    Sum near values with step
    Example: step = 2
    [
        [7,11],        [
        [5,9],    =>     [12, 11],
        [4,1],           [20, 6]
        [7,5]          ]
    ]
    """
    a, c = [k[i::step] for i in range(step)], []
    for m in a:
        # sum near values
        c.append([sum(m[j:j+2]) for j in range(0, len(m), 2)])
    return c
rows, columns = shape(v)
chunk_list = list(chunks(v, columns // 2))
final_sum = sum_chunks(chunk_list, rows // 2)
print(final_sum)
Output (note that the result is transposed relative to A1 in the question):
[[12, 11], [20, 6]]
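For comparison, a pure-Python sketch that returns the block sums in row-major order, matching A1 in the question. The function name block_sums is hypothetical, and clipping at the edges (so that odd dimensions yield smaller edge blocks) is an assumption rather than something the question specifies.

```python
def block_sums(v, step=2):
    """Sum non-overlapping step x step blocks of a list of lists."""
    rows, cols = len(v), len(v[0])
    return [
        [
            sum(
                v[r][c]
                for r in range(i, min(i + step, rows))
                for c in range(j, min(j + step, cols))
            )
            for j in range(0, cols, step)
        ]
        for i in range(0, rows, step)
    ]


v = [
    [3, 4, 5, 6],
    [2, 3, 4, 5],
    [2, 2, 0, 1],
    [5, 2, 2, 3],
]
print(block_sums(v))
```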
I need to generate a file with three "random" values per line (10 lines), where the values on each line must sum to 15.
The structure is: "INDEX A B C".
Example:
1 15 0 0
2 0 15 0
3 0 0 15
4 1 14 0
5 2 13 0
6 3 12 0
7 4 11 0
8 5 10 0
9 6 9 0
10 7 8 0
If you want to avoid needing to create (or iterate through) the full space of satisfying permutations (which, for large N, is important), then you can solve this problem with sequential sampling.
My first approach was to just draw a value uniformly from [0, N], call it x. Then draw a value uniformly from [0, N-x] and call it y, then set z = N - x - y. If you then shuffle these three, you'll get back a reasonable draw from the space of solutions, but it won't be exactly uniform.
As an example, consider where N=3. Then the probability of some permutation of (3, 0, 0) is 1/4, even though it is only one out of 10 possible triplets. So this privileges values that contain a high max.
You can perfectly counterbalance this effect by sampling the first value x proportionally to how many values will be possible for y conditioned on x. So for example, if x happened to be N, then there is only 1 compatible value for y, but if x is 0, then there are 4 compatible values, namely 0 through 3.
In other words, let Pr(X=x) be (N-x+1)/sum_i(N-i+1) for i from 0 to N. Then let Pr(Y=y | X=x) be uniform on [0, N-x].
This works out to P(X,Y) = P(Y|X=x) * P(X) = 1/(N-x+1) * [N - x + 1]/sum_i(N-i+1), which is seen to be uniform, 1/sum_i(N-i+1), for each candidate triplet.
Note that sum(N-i+1 for i in range(0, N+1)) gives the number of different ways to sum 3 non-negative integers to get N. I don't know a good proof of this, and would be happy if someone adds one to the comments!
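A short counting argument for that claim: once x is fixed, y can be any of the N-x+1 values in [0, N-x], and z = N-x-y is then forced, so summing N-x+1 over x from 0 to N counts every triplet exactly once. The total also equals (N+1)(N+2)/2, the standard "stars and bars" count. The sketch below checks this by brute force for small N; the function names are made up for the example.

```python
from itertools import product


def count_triplets(n):
    """Brute-force count of (x, y, z) with x + y + z == n and all >= 0."""
    # z is determined by x and y, so it is enough to count valid (x, y) pairs
    return sum(1 for x, y in product(range(n + 1), repeat=2) if x + y <= n)


def count_by_sum(n):
    """The closed-form sum used for the sampling weights."""
    return sum(n - i + 1 for i in range(n + 1))


for n in range(10):
    assert count_triplets(n) == count_by_sum(n) == (n + 1) * (n + 2) // 2
print(count_by_sum(3), count_by_sum(15))
```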
Here's a solution that will sample this way:
import random
from collections import Counter
def discrete_sample(weights):
    u = random.uniform(0, 1)
    w_t = 0
    for i, w in enumerate(weights):
        w_t += w
        if u <= w_t:
            return i
    return len(weights)-1
def get_weights(N):
    vals = [(N-i+1.0) for i in range(0, N+1)]
    totl = sum(vals)
    return [v/totl for v in vals]
def draw_summing_triplet(N):
    weights = get_weights(N)
    x = discrete_sample(weights)
    y = random.randint(0, N-x)
    triplet = [x, y, N - x - y]
    random.shuffle(triplet)
    return tuple(triplet)
Much credit goes to @DSM in the comments for questioning my original answer and providing good feedback.
In this case, we can test out the sampler like this:
foo = Counter(draw_summing_triplet(3) for i in range(10**6))
print(foo)
Counter({(1, 2, 0): 100381,
(0, 2, 1): 100250,
(1, 1, 1): 100027,
(2, 1, 0): 100011,
(0, 3, 0): 100002,
(3, 0, 0): 99977,
(2, 0, 1): 99972,
(1, 0, 2): 99854,
(0, 0, 3): 99782,
(0, 1, 2): 99744})
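To tie this back to the "INDEX A B C" layout in the question, here is a hedged sketch that produces the ten lines. Note that it swaps in a different uniform sampler, the "stars and bars" construction (pick two bar positions among N+2 slots), rather than the weighted sampler above; uniform_triplet and make_lines are hypothetical names.

```python
import random


def uniform_triplet(total, rng=random):
    """Uniformly sample (a, b, c) of non-negative ints with a + b + c == total."""
    # Two distinct bar positions among total + 2 slots split the stars in three
    b1, b2 = sorted(rng.sample(range(total + 2), 2))
    return b1, b2 - b1 - 1, total + 1 - b2


def make_lines(count=10, total=15, rng=random):
    """Return lines in the form 'INDEX A B C'."""
    return [
        "{} {} {} {}".format(i, *uniform_triplet(total, rng))
        for i in range(1, count + 1)
    ]


lines = make_lines()
print("\n".join(lines))
# To write the file instead (the file name is an assumption):
# with open("rand.txt", "w") as f:
#     f.write("\n".join(lines) + "\n")
```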
If the numbers can be anything, just use combinations:
from itertools import combinations
with open("rand.txt","w") as f:
    combs = [x for x in combinations(range(16),3) if sum(x ) == 15 ][:10]
    for a,b,c in combs:
        f.write("{} {} {}\n".format(a,b,c))
This seems straightforward to me and it utilizes the random module.
import random
def foo(x):
    a = random.randint(0,x)
    b = random.randint(0,x-a)
    c = x - (a +b)
    return (a,b,c)
for i in range(100):
    print(foo(15))