I have a dataframe df that looks like this:
ID Sequence
0 A->A
1 C->C->A
2 C->B->A
3 B->A
4 A->C->A
5 A->C->C
6 A->C
7 A->C->C
8 B->B
9 C->C
and so on ....
I want to create a column called 'Outcome', which is binomial in nature.
Its value depends on three lists that I generate below:
Whenever 'A' occurs in a sequence, the probability of 'Outcome' being 1 is 2%
Whenever 'B' occurs in a sequence, the probability of 'Outcome' being 1 is 6%
Whenever 'C' occurs in a sequence, the probability of 'Outcome' being 1 is 1%
Here is the code that generates these three lists (bi_A, bi_B, bi_C):
A = 0.02
B = 0.06
C = 0.01
count_A = 0
count_B = 0
count_C = 0
for i in range(len(df)):
    if 'A' in df.sequence[i]:
        count_A += 1
    if 'B' in df.sequence[i]:
        count_B += 1
    if 'C' in df.sequence[i]:
        count_C += 1
bi_A = np.random.binomial(1, A, count_A)
bi_B = np.random.binomial(1, B, count_B)
bi_C = np.random.binomial(1, C, count_C)
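As a side note, the counting loop can be written without an explicit loop using pandas string methods. A minimal sketch (assuming the column is named sequence, and using a small inline sample in place of the real df):

```python
import numpy as np
import pandas as pd

# Small inline sample standing in for the real dataframe.
df = pd.DataFrame({'sequence': ['A->A', 'C->C->A', 'C->B->A', 'B->A']})

# Vectorized counts: how many sequences contain each letter.
count_A = df['sequence'].str.contains('A').sum()
count_B = df['sequence'].str.contains('B').sum()
count_C = df['sequence'].str.contains('C').sum()

bi_A = np.random.binomial(1, 0.02, count_A)
bi_B = np.random.binomial(1, 0.06, count_B)
bi_C = np.random.binomial(1, 0.01, count_C)
```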
What I am trying to do is combine these three lists into an "Output" column so that the probability of Output being 1 when "A" is in the sequence is 2%, and so on. How do I solve this? As I understand it there would be data overlap: bi_A might say a sequence's outcome is 0 while bi_B says it's 1. How should that conflict be resolved?
End data should look like -
ID Sequence Output
0 A->A 0
1 C->C->A 1
2 C->B->A 0
3 B->A 0
4 A->C->A 0
5 A->C->C 1
6 A->C 0
7 A->C->C 0
8 B->B 0
9 C->C 0
and so on ....
Such that when I compute the probability of Output = 1 given that 'A' is in the string, it should be 2%.
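One possible way to resolve the overlap, sketched below, is to make a single draw per row, using a per-row probability derived from the letters present. The rule chosen here (take the largest probability among the letters that occur) is my assumption, not something the requirements pin down:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
probs = {'A': 0.02, 'B': 0.06, 'C': 0.01}

# Small inline sample standing in for the real dataframe.
df = pd.DataFrame({'sequence': ['A->A', 'C->C->A', 'C->B->A', 'B->A']})

# Per-row probability: the maximum over the letters present (assumed rule).
p_row = df['sequence'].map(lambda s: max(probs[ch] for ch in probs if ch in s))
# One binomial draw per row, so there is no bi_A/bi_B conflict to reconcile.
df['Output'] = rng.binomial(1, p_row)
```

With one draw per row there is nothing to reconcile; the trade-off is that the marginal probability of Output = 1 given 'A' will be pulled above 2% whenever 'B' co-occurs, so the rule may need adjusting to your exact requirement.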
EDIT -
You can generate the sequence data using this code:
import pandas as pd
import itertools
import numpy as np
import random
alphabets = ['A', 'B', 'C']
combinations = []
for i in range(1, len(alphabets) + 1):
    combinations.append(['->'.join(p) for p in itertools.product(alphabets, repeat=i)])
combinations = sum(combinations, [])
weights = np.random.normal(100, 30, len(combinations))
weights /= sum(weights)
weights = weights.tolist()
#weights=np.random.dirichlet(np.ones(len(combinations))*1000.,size=1)
'''n = len(combinations)
weights = [random.random() for _ in range(n)]
sum_weights = sum(weights)
weights = [w/sum_weights for w in weights]'''
df = pd.DataFrame(random.choices(population=combinations, weights=weights, k=10000),
                  columns=['sequence'])
t = int(input())
lis = []
for i in range(t):
    col = list(map(int, input()))
    colindex = col[0] - 1
    count = 0
    matsize = col[0] * col[0]
    mat = list(map(int, input().split()))
    while len(lis) != matsize:
        for j in range(len(mat)):
            if colindex < len(mat):
                if mat[j] == mat[colindex]:
                    lis.append(mat[j])
                    colindex += col[0]
                    count += 1
        colindex = col[0] - 1
        colindex -= count
    for i in lis:
        print(i, end=' ')
Given a square matrix mat[][] of size N x N. The task is to rotate it by 90 degrees in anti-clockwise direction without using any extra space.
Input:
The first line of input contains a single integer T denoting the number of test cases. Then T test cases follow. Each test case consists of two lines. The first line of each test case consists of an integer N, where N is the size of the square matrix. The second line of each test case contains N x N space-separated values of the matrix mat.
Output:
Corresponding to each test case, in a new line, print the rotated array.
Constraints:
1 ≤ T ≤ 50
1 ≤ N ≤ 50
1 ≤ mat[][] ≤ 100
Example:
Input:
2
3
1 2 3 4 5 6 7 8 9
2
5 7 10 9
Output:
3 6 9 2 5 8 1 4 7
7 9 5 10
Explanation:
Testcase 1: Matrix is as below:
1 2 3
4 5 6
7 8 9
Rotating it by 90 degrees in anticlockwise directions will result as below matrix:
3 6 9
2 5 8
1 4 7
https://practice.geeksforgeeks.org/problems/rotate-by-90-degree/0
It doesn't look like there is a problem with j. Can colindex ever be below 0? One way to identify this is to simply keep track of the counters. For example, you can add an extra condition, if colindex >= 0:, before if mat[j] == mat[colindex]: (in Python a negative index silently wraps around to the end of the list, so this failure mode is easy to miss).
Rather than using a one-dimensional list, we can use a two-dimensional list to solve this challenge. From the given statement and sample test cases, we get the following information:
Print the rotated matrix in a single line.
If the given matrix has n columns, the rotated matrix contains the elements of the (n-1)-th column, then the (n-2)-th column, ..., down to the 0th column.
Here is my accepted solution of this challenge:
def get_rotated_matrix(ar, n):
    # split the flat input into an n x n 2D list
    ar_2d = []
    for i in range(0, len(ar) - n + 1, n):
        ar_2d.append(ar[i:i + n])
    # walk the columns from last to first, top to bottom
    result = []
    for i in range(n - 1, -1, -1):
        for j in range(n):
            result.append(str(ar_2d[j][i]))
    return result

cas = int(input())
for t in range(cas):
    n = int(input())
    ar = list(map(int, input().split()))
    result = get_rotated_matrix(ar, n)
    print(" ".join(result))
Explanation:
To keep the solution simple, I created a two-dimensional list, ar_2d, to store the input data as a 2D matrix.
Then I traversed the matrix column-wise, from the last column to the first, and appended the values to the result list as strings.
Finally, I printed the result with a space between elements using the join method.
Disclaimer:
My solution uses a 1D list to store the rotated matrix elements and thus uses extra space.
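For comparison, the same anticlockwise rotation can be written with zip, transposing first and then reversing the row order; this is a sketch with a hypothetical helper name:

```python
def rotate_anticlockwise(ar, n):
    # build the n x n matrix, transpose it, then reverse the row order
    mat = [ar[i:i + n] for i in range(0, n * n, n)]
    rotated = list(zip(*mat))[::-1]
    return [str(v) for row in rotated for v in row]

# Testcase 1 from the problem statement:
print(" ".join(rotate_anticlockwise([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)))
# → 3 6 9 2 5 8 1 4 7
```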
I have some data in the form:
ID A B VALUE EXPECTED RESULT
1 1 2 5 GROUP1
2 2 3 5 GROUP1
3 3 4 6 GROUP2
4 3 5 5 GROUP1
5 6 4 5 GROUP3
What I want to do is iterate through the data (thousands of rows) and create a common field so I can join the data easily (A is the start node, B the end node, VALUE the order; the data form something like a chain where only neighbours share a common A or B).
Rules for joining:
equal value for all elements of a group
A of element one equal to B of element two (or the opposite, but NOT A=A' or B=B')
The most difficult one: assign to the same group all sequential data that form a series of intersecting nodes.
That is, the first element [1 1 2 5] has to be joined with [2 2 3 5] and then with [4 3 5 5].
Any idea how to accomplish this robustly when iterating through a large amount of data? I have a problem with rule number 3; the others are easy to apply. For limited data I have had some success, but it depends on the order in which I start examining the data, and it doesn't work for the large dataset.
I can use arcpy (preferably), or even Python, R or Matlab. I have tried arcpy without success, so I am checking alternatives.
In ArcPy this code works, but only to a limited extent (i.e. in large features with many segments I get 3-4 groups instead of 1):
TheShapefile="c:/Temp/temp.shp"
desc = arcpy.Describe(TheShapefile)
flds = desc.fields
fldin = 'no'
for fld in flds:  # Check if the new field exists
    if fld.name == 'new':
        fldin = 'yes'
if fldin != 'yes':  # If not, create it
    arcpy.AddField_management(TheShapefile, "new", "SHORT")
arcpy.CalculateField_management(TheShapefile, "new", '!FID!', "PYTHON_9.3")  # Copy FID to new
with arcpy.da.SearchCursor(TheShapefile, ["FID", "NODE_A", "NODE_B", "ORDER_", "new"]) as TheSearch:
    for SearchRow in TheSearch:
        if SearchRow[1] == SearchRow[4]:
            Outer_FID = SearchRow[0]
        else:
            Outer_FID = SearchRow[4]
        Outer_NODEA = SearchRow[1]
        Outer_NODEB = SearchRow[2]
        Outer_ORDER = SearchRow[3]
        Outer_NEW = SearchRow[4]
        with arcpy.da.UpdateCursor(TheShapefile, ["FID", "NODE_A", "NODE_B", "ORDER_", "new"]) as TheUpdate:
            for UpdateRow in TheUpdate:
                Inner_FID = UpdateRow[0]
                Inner_NODEA = UpdateRow[1]
                Inner_NODEB = UpdateRow[2]
                Inner_ORDER = UpdateRow[3]
                if Inner_ORDER == Outer_ORDER and (Inner_NODEA == Outer_NODEB or Inner_NODEB == Outer_NODEA):
                    UpdateRow[4] = Outer_FID
                    TheUpdate.updateRow(UpdateRow)
And some data in shapefile form and dbf form
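Rule 3 is essentially a connected-components problem: treat each row as an edge between node A and node B within its VALUE, and each group is one connected component. A minimal plain-Python sketch over the sample rows (union/find are hypothetical helpers, and I assume the data fits in memory):

```python
# Sample rows: (ID, A, B, VALUE)
rows = [
    (1, 1, 2, 5),
    (2, 2, 3, 5),
    (3, 3, 4, 6),
    (4, 3, 5, 5),
    (5, 6, 4, 5),
]

parent = {}

def find(x):
    # root of x's component, with path halving
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Nodes only connect within the same VALUE, so key them by (value, node).
for _id, a, b, value in rows:
    union((value, a), (value, b))

# Label each row by the component of its nodes, numbering groups in ID order.
labels, groups = {}, {}
for _id, a, b, value in rows:
    root = find((value, a))
    labels.setdefault(root, len(labels) + 1)
    groups[_id] = labels[root]

print(groups)
# → {1: 1, 2: 1, 3: 2, 4: 1, 5: 3}
```

This runs in near-linear time in the number of rows and gives the same grouping regardless of the order in which rows are examined, which is the property the double-cursor approach lacks.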
Using matlab:
A = [1 1 2 5
2 2 3 5
3 3 4 6
4 3 5 5
5 6 4 5]
%% Initialization
% index of matrix lines sharing the same group
ind = 1
% length of the index
len = length(ind)
% the group array
g = []
% group counter
c = 1
% Start the small algorithm
while 1
    % Check if another line with the same "Value" shares a common node
    ind = find(any(ismember(A(:,2:3), A(ind,2:3)) & A(:,4) == A(ind(end),4), 2));
    % If there is no new line, we create a group with the discovered lines
    if length(ind) == len
        % group assignment
        g(A(ind,1)) = c
        c = c+1
        % delete the already discovered lines (or nodes...)
        A(ind,:) = []
        % break if no more nodes remain
        if isempty(A)
            break
        end
        % reset the index for the next group
        ind = 1;
    end
    len = length(ind);
end
And here is the output:
g =
1 1 2 1 3
As expected
How can I find all the points (or the region) in a 3-dimensional image, where the first two dimensions are the resolution and the third is the density? I can use Matlab or Python. Is there a native function for finding those points that is least computationally expensive?
UPDATE:
Imagine I have the following:
A= [1,2,3; 4,6,6; 7,6,6]
A =
1 2 3
4 6 6
7 6 6
>> B=[7,8,9; 10,11,11; 1, 11,11]
B =
7 8 9
10 11 11
1 11 11
>> C=[0,1,2; 3, 7, 7; 5,7,7]
C =
0 1 2
3 7 7
5 7 7
How can I find the lower square in which all the values of A are equal, all the values of B are equal, and all the values of C are equal? If this is too much, how can I find the lower square in A wherein all the values of A are equal?
*The shown values are the intensities of the image.
UPDATE: I tried the provided answer and got this error:
>> c=conv2(M,T, 'full');
Warning: CONV2 on values of class UINT8 is obsolete.
Use CONV2(DOUBLE(A),DOUBLE(B)) or CONV2(SINGLE(A),SINGLE(B)) instead.
> In uint8/conv2 (line 10)
Undefined function 'conv2' for input arguments of type 'double' and attributes 'full 3d real'.
Error in uint8/conv2 (line 17)
y = conv2(varargin{:});
*I also tried convn, and it took forever, so I just stopped it!
Basically, how do I do this for a 2D array as described above?
A possible solution:
A = [1,2,3; 4,6,6; 7,6,6];
B = [7,8,9; 10,11,11; 1, 11,11];
C = [0,1,2; 3, 7, 7; 5,7,7];
%create a 3D array
D = cat(3,A,B,C)
%reshape the 3D array to 2D
%its columns represent the third dimension
%and its rows represent resolution
E = reshape(D,[],size(D,3));
%the third output of the unique function applied row-wise to the data
%is the label of each pixel, a [m*n, 1] vector
[~,~,F] = unique(E,'rows');
%reshape the vector to a [m, n] matrix of labels
result = reshape(F, size(D,1), size(D,2));
You can reshape the 3D matrix to a 2D matrix (E) whose columns represent the third dimension and whose rows represent the resolution.
Then, using the unique function, you can label the image.
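The same labeling can be sketched in Python/numpy for reference (assuming numpy ≥ 1.13, which added the axis argument to np.unique):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 6, 6], [7, 6, 6]])
B = np.array([[7, 8, 9], [10, 11, 11], [1, 11, 11]])
C = np.array([[0, 1, 2], [3, 7, 7], [5, 7, 7]])

D = np.stack([A, B, C], axis=-1)        # rows x cols x channels
E = D.reshape(-1, D.shape[-1])          # one row per pixel
# unique rows; the inverse indices label equal pixels identically
_, F = np.unique(E, axis=0, return_inverse=True)
result = F.reshape(D.shape[:2]) + 1     # 1-based labels, as in MATLAB
```

Label numbering follows the lexicographic sort of the unique rows, so the exact numbers differ from MATLAB/Octave, but equal pixels get equal labels either way.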
We have a 3D matrix:
A =
1 2 3
4 6 6
7 6 6
B =
7 8 9
10 11 11
1 11 11
C =
0 1 2
3 7 7
5 7 7
When we reshape the 3D matrix to a 2D matrix E we get:
E =
1 7 0
4 10 3
7 1 5
2 8 1
6 11 7
6 11 7
3 9 2
6 11 7
6 11 7
So we need to classify the rows based on their values.
The unique function is capable of extracting unique rows and assigning the same label to rows that are equal to each other.
Here the variable F captures the third output of unique, which is the label of each row:
F =
1
4
6
2
5
5
3
5
5
which should be reshaped back to 2D:
result =
1 2 3
4 5 5
6 5 5
so each region has a different label.
If you want to segment distinct regions (based on both their values and their spatial positions), you need to label the image in a loop:
numcolors = max(F);
N = 0;
segment = zeros(size(result));
for c = 1 : numcolors
    [label, n] = bwlabel(result == c);
    segment = segment + label + logical(label)*N;
    N = N + n;
end
Here you need to mark disconnected regions that have the same values with different labels. Since MATLAB doesn't have a function for gray-level segmentation, you can apply bwlabel once per label value and add the result of each iteration to the running total. The segment variable contains the segmented image.
*Note: this result was obtained with GNU Octave, whose labeling differs from MATLAB's. If you use unique(E,'rows','last');, the MATLAB and Octave results will be the same.
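For reference, the same per-value labeling loop can be sketched in Python with scipy.ndimage (an assumption that SciPy is available; ndimage.label plays the role of bwlabel, though it defaults to 4-connectivity where bwlabel defaults to 8):

```python
import numpy as np
from scipy import ndimage

# Label matrix as produced by the unique/reshape step.
result = np.array([[1, 2, 3], [4, 5, 5], [6, 5, 5]])

segment = np.zeros_like(result)
N = 0
for c in range(1, result.max() + 1):
    # connected components of the pixels carrying value c
    label, n = ndimage.label(result == c)
    segment += label + (label > 0) * N
    N += n
```

For this small example every value forms a single connected region, so segment ends up identical to result; on an image with disconnected same-valued regions the labels would split.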
You can use a pair of horizontal and vertical 1D filters, where the horizontal filter has the kernel [1 -1] and the vertical filter has the kernel [1; -1]. The effect is to take the pairwise differences between neighbouring elements along each dimension separately. You can then perform image filtering or convolution with these two kernels, making sure to replicate the borders. Uniform regions are the locations where both filter results are 0 in every channel.
To do this, first take the logical complement of both filtering results, so that uniform locations (which were 0) become 1 and vice versa. Then combine the two with a logical AND, and finally require that, for each pixel, the result is true across all channels. This means that at such a spatial location, every channel is uniform, as you expect.
In MATLAB, assuming you have the Image Processing Toolbox, use imfilter to filter the image, then use all to check across the third dimension of the combined filtering results, and finally use regionprops to find the coordinates of the regions you seek. So do something like this:
%# Reproducing your data
A = [1,2,3; 4,6,6; 7,6,6];
B = [7,8,9; 10,11,11; 1, 11,11];
C = [0,1,2; 3, 7, 7; 5,7,7];
%# Create a 3D matrix to allow for efficient filtering
D = cat(3, A, B, C);
%# Filter using the kernels
ker = [1 -1];
ker2 = ker.'; %# Vertical kernel (transpose of the horizontal one)
out = imfilter(D, ker, 'replicate');
out2 = imfilter(D, ker2, 'replicate');
%# Find uniform regions
regions = all(~out & ~out2, 3);
%# Determine the locations of the uniform areas
R = regionprops(regions, 'BoundingBox');
%# Round to ensure pixel accuracy and reshape into a matrix
coords = round(reshape([R.BoundingBox], 4, [])).';
coords would be an N x 4 matrix, each row giving the upper-left coordinates of a bounding box along with its width and height. The first and second elements in a row are the column and row coordinates, while the third and fourth elements are the width and height of the bounding box.
The regions we have detected can be found in the regions variable. Both of these show:
>> regions
regions =
3×3 logical array
0 0 0
0 1 1
0 1 1
>> coords
coords =
2 2 2 2
This tells us that we have localised the region of "uniformity" to be the bottom right corner while the coordinates of the top-left corner of the bounding box are row 2, column 2 with a width and height of 2 and 2 respectively.
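The same pairwise-difference idea can be sketched in Python/numpy. Here np.diff with an appended edge column/row mimics the replicated border, under the assumption that each pixel is compared with its right and lower neighbours, as imfilter does with these kernels:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 6, 6], [7, 6, 6]])
B = np.array([[7, 8, 9], [10, 11, 11], [1, 11, 11]])
C = np.array([[0, 1, 2], [3, 7, 7], [5, 7, 7]])
D = np.stack([A, B, C], axis=-1)

# Differences with the right and lower neighbour; the appended edge
# replicates the border, so the last column/row always differ by 0 there.
dh = np.diff(D, axis=1, append=D[:, -1:, :])
dv = np.diff(D, axis=0, append=D[-1:, :, :])

# Uniform where both differences vanish in every channel.
regions = np.all((dh == 0) & (dv == 0), axis=-1)
print(regions.astype(int))
# → [[0 0 0]
#    [0 1 1]
#    [0 1 1]]
```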
Check out https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.signal.correlate2d.html
2D correlation basically "slides" the two images across each other and adds up the dot product of the overlap.
More reading: http://www.cs.umd.edu/~djacobs/CMSC426/Convolution.pdf
https://en.wikipedia.org/wiki/Two-dimensional_correlation_analysis
I am looking for the right approach to solve the following task (using Python):
I have a dataset which is a 2D matrix. Lets say:
1 2 3
5 4 7
8 3 9
0 7 2
From each row I need to pick one number which is not 0 (I can also make it NaN if that's easier).
I need to find the combination with the lowest total sum.
So far, so easy: I take the lowest value of each row.
The solution would be:
1 x x
x 4 x
x 3 x
x x 2
Sum: 10
But: there is a variable minimum and a maximum sum allowed for each column. So just choosing the minimum of each row may lead to an invalid combination.
Let's say the minimum is defined as 2 in this example and no maximum is defined. Then the solution would be:
1 x x
5 x x
x 3 x
x x 2
Sum: 11
I need to choose 5 in row two, as otherwise column one would fall below the minimum (2).
I could use brute force and test all possible combinations. But due to the amount of data to be analyzed (the number of datasets, not the size of each dataset), that's not possible.
Is this a common problem with a known mathematical/statistical or other solution?
Thanks
Robert
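To answer the closing question: yes, picking one entry per row to minimise the total subject to per-column sum bounds is a small integer linear program. A sketch (assumptions on my part: SciPy ≥ 1.9 for scipy.optimize.milp, and that the minimum of 2 applies to every column):

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

M = np.array([[1, 2, 3],
              [5, 4, 7],
              [8, 3, 9],
              [0, 7, 2]], dtype=float)
col_min = np.array([2.0, 2.0, 2.0])  # assumed per-column minimum

n_rows, n_cols = M.shape
c = M.ravel()  # objective: sum of the chosen entries (x[i, j] is binary)

# exactly one pick per row
row_A = np.kron(np.eye(n_rows), np.ones((1, n_cols)))
row_con = LinearConstraint(row_A, 1, 1)

# column sums of the chosen values must reach the minimum
col_A = np.zeros((n_cols, n_rows * n_cols))
for j in range(n_cols):
    for i in range(n_rows):
        col_A[j, i * n_cols + j] = M[i, j]
col_con = LinearConstraint(col_A, col_min, np.inf)

# forbid picking zeros by fixing their upper bound to 0
bounds = Bounds(0, (c != 0).astype(float))

res = milp(c=c, constraints=[row_con, col_con],
           integrality=np.ones_like(c), bounds=bounds)
picked = res.x.reshape(M.shape).round().astype(bool)
print(M[picked].sum())
# → 11.0
```

A per-column maximum would simply become a finite upper bound in the same LinearConstraint, and the MILP solver avoids the brute-force enumeration over all combinations.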