Find number of positives in segment of list - python

I have an unsorted list. I'm asked to answer k queries, each printing the number of positive values in a segment. The segment borders are 1-based, not 0-based, so if the borders are [1, 3], we should count the positives among the elements with indices [0, 1, 2].
For example,
2 -1 2 -2 3
4
1 1
1 3
2 4
1 5
The answer needs to be:
1
2
1
3
Currently, I'm creating a list of the same length as the original, where an element is 1 if the corresponding original element is positive and 0 if it is negative or zero. After that I sum over the requested segment:
lst = list(map(int, input().split()))
k = int(input())
neg = [1 if x > 0 else 0 for x in lst]
for i in range(k):
    l, r = map(int, input().split())
    l = l - 1
    print(sum(neg[l:r]))
Even though this is the fastest version I have written so far, it is still too slow for the task. How would I optimize it (make it faster)?

If I understand you correctly, there doesn't seem to be a lot of room for optimization. The only thing that comes to mind really is that the lst and neg steps could be combined, which would save one loop:
positive = [int(x) > 0 for x in input().split()]
k = int(input())
for i in range(k):
    l, r = map(int, input().split())
    print(sum(positive[l-1:r]))
We can just have bools in the positive list, because bool is a subclass of int, meaning True is treated like 1 and False like 0. (Also, I would call the list positive instead of neg, since that is what it marks.)
Each query still takes O(n) time, though.
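One further idea, not part of the answer above but worth mentioning since the queries seem to dominate: precompute prefix sums of the positive flags once in O(n), after which each query is a single O(1) subtraction. A minimal sketch, assuming the same input format:
from itertools import accumulate

lst = list(map(int, input().split()))
k = int(input())
# prefix[i] = number of positives among the first i elements
prefix = [0] + list(accumulate(1 if x > 0 else 0 for x in lst))
for _ in range(k):
    l, r = map(int, input().split())
    print(prefix[r] - prefix[l - 1])
With the example input 2 -1 2 -2 3, prefix is [0, 1, 1, 2, 2, 3], so the query 2 4 gives prefix[4] - prefix[1] = 1, matching the expected output.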


Print 1..N² in NxN matrix, starting at bottom-right and zig-zag

Given an input n, I want to print n lines of n numbers each, such that the numbers 1 through n² are displayed in a zig-zag way: 1 appears at the bottom-right corner of the output matrix, 2 at the end of the next-to-last row, and so on.
Examples:
Given Input 3.
Print:
9 4 3
8 5 2
7 6 1
Given Input 1.
Print:
1
Given Input 4.
Print:
13 12 5 4
14 11 6 3
15 10 7 2
16 9 8 1
Attempt
n = int(input("Enter dimensions of matrix :"))
m = n
x = 1
columns = []
for row in range(n):
    inner_column = []
    for col in range(m):
        inner_column.append(x)
        x = x + 1
    columns.append(inner_column)
for inner_column in columns:
    print(' '.join(map(str, inner_column)))
I've tried something like this, but it prints out the array incorrectly. Any ideas?
Your code explicitly performs x = 1 and then x = x + 1 in a loop. As there are n*n numbers to output and the examples place the largest numbers in the first column, you should instead start with x = n * n and decrease it with x = x - 1. The first column is filled in one direction, the next column in the opposite direction, then forward again, etc.
I would suggest making an iterator that visits row indices in that zig-zag manner: 0, 1, 2, ..., n - 1, then n - 1, n - 2, ..., 0, and so on from the start. With that iterator you know exactly which row the next x value should be appended to:
# Helper function to generate row numbers in zig-zag order, for as
# long as needed.
def zigzag(n):
    if n % 2:
        yield from range(n)
    while True:
        yield from range(n - 1, -1, -1)
        yield from range(n)
n = int(input("Enter dimensions of matrix :"))
matrix = [[] for _ in range(n)]
visit = zigzag(n)
for x in range(n*n, 0, -1):
    matrix[next(visit)].append(x)
Then print it:
for row in matrix:
    print(' '.join(map(str, row)))
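To see the visiting order the generator produces, a quick sanity check (using the zigzag defined above; the first ten values for n = 3):
from itertools import islice

print(list(islice(zigzag(3), 10)))
# [0, 1, 2, 2, 1, 0, 0, 1, 2, 2]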

Pick subset of items minimizing the count of the most frequent of the selected item's labels

Problem
I want to pick a subset of fixed size from a list of items such that the count of the most frequent occurrence of the labels of the selected items is minimized. In English, I have a DataFrame consisting of a list of 10000 items, generated as follows.
import random
import pandas as pd
def RandLet():
    alphabet = "ABCDEFG"
    return alphabet[random.randint(0, len(alphabet) - 1)]
items = pd.DataFrame([{"ID": i, "Label1": RandLet(), "Label2": RandLet(), "Label3": RandLet()} for i in range(0, 10000)])
items.head(3)
Each item has 3 labels. The labels are letters within ABCDEFG, and the order of the labels doesn't matter. An item may be tagged multiple times with the same label.
[Example of the first 3 rows]
ID Label1 Label2 Label3
0 0 G B D
1 1 C B C
2 2 C A B
From this list, I want to pick 1000 items in a way that minimizes the number of occurrences of the most frequently appearing label within those items.
For example, if my DataFrame consisted of only the above 3 items and I wanted to pick 2 of them, then picking items #1 and #2 would make the label 'C' appear 3 times, 'B' 2 times, 'A' 1 time, and all other labels 0 times; the maximum of these counts is 3. However, I could have done better by picking items #0 and #2, where the most frequent label, 'B', appears only 2 times. Since 2 is less than 3, picking items #0 and #2 is better than picking items #1 and #2.
In the case where there are multiple ways to pick 1000 items such that the count of the maximum label occurrence is minimized, returning any of those selections is fine.
What I've got
To me, this feels similar to a knapsack problem in len("ABCDEFG") = 7 dimensions. I want to put 1000 items in the knapsack, and each item's size in the relevant dimension is the sum of the occurrences of that label for that particular item. To that extent, I've built this function to convert my list of items into a list of sizes for the knapsack.
def ReshapeItems(items):
    alphabet = "ABCDEFG"
    item_rebuilder = []
    for i, row in items.iterrows():
        letter_counter = {}
        for letter in alphabet:
            letter_count = sum(row[[c for c in items.columns if "Label" in c]].apply(lambda x: 1 if x == letter else 0))
            letter_counter[letter] = letter_count
        letter_counter["ID"] = row["ID"]
        item_rebuilder.append(letter_counter)
    items2 = pd.DataFrame(item_rebuilder)
    return items2
items2 = ReshapeItems(items)
items2.head(3)
[Example of the first 3 rows of items2]
A B C D E F G ID
0 0 1 0 1 0 0 1 0
1 0 1 2 0 0 0 0 1
2 1 1 1 0 0 0 0 2
Unfortunately, at that point, I am completely stuck. I think the point of a knapsack problem is to maximize some sort of value while keeping the sum of the selected items' sizes under some limit. However, here my problem is the opposite: I want to minimize the sum of the selected sizes such that my value is at least some amount.
What I'm looking for
Although a function that takes in items or items2 and returns a subset of these items that meets my specifications would be ideal, I'd be happy to accept any sufficiently detailed answer that points me in the right direction.
Using a different approach, here is my take on your interesting question.
def get_best_subset(
    df: pd.DataFrame, n_rows: int, key_cols: list[str], iterations: int = 50_000
) -> tuple[int, pd.DataFrame]:
    """Subset df in such a way that the frequency
    of most frequent values in key columns is minimum.

    Args:
        df: input dataframe.
        n_rows: number of rows in subset.
        key_cols: columns to consider.
        iterations: max number of tries. Defaults to 50_000.

    Returns:
        Minimum frequency, subset of n rows of input dataframe.
    """
    lowest_frequency: int = df.shape[0] * df.shape[1]
    best_df = pd.DataFrame([])
    # Iterate through possible subsets
    for _ in range(iterations):
        sample_df = df.sample(n=n_rows)
        # Count values in each column, concat and sum counts, get max count
        frequency = (
            pd.concat([sample_df[col].value_counts() for col in key_cols])
            .pipe(lambda df_: df_.groupby(df_.index).sum())
            .max()
        )
        if frequency < lowest_frequency:
            lowest_frequency = frequency
            best_df = sample_df
    return lowest_frequency, best_df.sort_values(by=["ID"]).reset_index(drop=True)
And so, with the toy dataframe constructor you provided:
lowest_frequency, best_df = get_best_subset(
    items, 1_000, ["Label1", "Label2", "Label3"]
)
print(lowest_frequency)
# 431
print(best_df)
# Output
ID Label1 Label2 Label3
0 39 F D A
1 46 B G E
2 52 D D B
3 56 D F B
4 72 C D E
.. ... ... ... ...
995 9958 A F E
996 9961 E E E
997 9966 G E C
998 9970 B C B
999 9979 A C G
[1000 rows x 4 columns]

Running perfectly in my IDE, but the line (if mat[j] == mat[colindex]:) gives an index out of range error when submitting on GeeksforGeeks

t = int(input())
lis = []
for i in range(t):
    col = list(map(int, input()))
    colindex = col[0] - 1
    count = 0
    matsize = col[0] * col[0]
    mat = list(map(int, input().split()))
    while len(lis) != matsize:
        for j in range(len(mat)):
            if colindex < len(mat):
                if mat[j] == mat[colindex]:
                    lis.append(mat[j])
                    colindex += col[0]
        count += 1
        colindex = col[0] - 1
        colindex -= count
    for i in lis:
        print(i, end=' ')
Given a square matrix mat[][] of size N x N. The task is to rotate it by 90 degrees in anti-clockwise direction without using any extra space.
Input:
The first line of input contains a single integer T denoting the number of test cases. Then T test cases follow. Each test case consists of two lines. The first line of each test case contains an integer N, the size of the square matrix. The second line contains N x N space-separated values of the matrix mat.
Output:
Corresponding to each test case, in a new line, print the rotated array.
Constraints:
1 ≤ T ≤ 50
1 ≤ N ≤ 50
1 <= mat[][] <= 100
Example:
Input:
2
3
1 2 3 4 5 6 7 8 9
2
5 7 10 9
Output:
3 6 9 2 5 8 1 4 7
7 9 5 10
Explanation:
Testcase 1: Matrix is as below:
1 2 3
4 5 6
7 8 9
Rotating it by 90 degrees in anticlockwise directions will result as below matrix:
3 6 9
2 5 8
1 4 7
https://practice.geeksforgeeks.org/problems/rotate-by-90-degree/0
It doesn't look like there is a problem with j. Can colindex ever be below 0? One way to identify this would be to simply keep track of the counters. For example, you can add an extra condition, if colindex >= 0:, before if mat[j] == mat[colindex]:.
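A sketch of where that guard would go, using the variable names from the question (inner loop only):
for j in range(len(mat)):
    # extra guard: skip the lookup once colindex has gone negative
    if 0 <= colindex < len(mat):
        if mat[j] == mat[colindex]:
            lis.append(mat[j])
            colindex += col[0]
Note that a slightly negative colindex does not raise an error in Python, since negative indices wrap around; mat[colindex] only raises IndexError once colindex drops below -len(mat), which is why the problem shows up on some inputs but not in local testing.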
Rather than using a one-dimensional list, we can use a two-dimensional list to solve this challenge. From the given statement and the sample test case, we get the following information:
Print the rotated matrix in a single line.
If the given matrix has n columns, the rotated matrix consists of the elements of the (n-1)th column, then the (n-2)th column, ..., down to the 0th column.
Here is my accepted solution of this challenge:
def get_rotated_matrix(ar, n):
    ar_2d = []
    for i in range(0, len(ar) - n + 1, n):
        ar_2d.append(ar[i:i + n])
    result = []
    for i in range(n - 1, -1, -1):
        for j in range(n):
            result.append(str(ar_2d[j][i]))
    return result

cas = int(input())
for t in range(cas):
    n = int(input())
    ar = list(map(int, input().split()))
    result = get_rotated_matrix(ar, n)
    print(" ".join(result))
Explanation:
To keep the solution simple, I created a two-dimensional list called ar_2d to store the input data as a 2D matrix.
Then I traversed the matrix column-wise, from the last column to the first, and appended the values to the result list as strings.
Finally, I printed the result with spaces between the elements using the join method.
Disclaimer:
My solution stores the rotated matrix elements in an extra list and thus uses extra space.
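Since the original statement asks for a rotation without extra space, here is a minimal sketch of a genuinely in-place anticlockwise rotation of a 2D list, using the standard identity of transposing and then reversing the row order (an addition for reference, not part of the solution above):
def rotate_anticlockwise_inplace(mat):
    n = len(mat)
    # Transpose in place by swapping across the main diagonal
    for i in range(n):
        for j in range(i + 1, n):
            mat[i][j], mat[j][i] = mat[j][i], mat[i][j]
    # Reverse the order of the rows
    mat.reverse()

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rotate_anticlockwise_inplace(m)
print(m)  # [[3, 6, 9], [2, 5, 8], [1, 4, 7]]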

Masking nested array with value at index with a second array

I have a nested array with some values, and another array whose length equals that of the nested array. I'd like to get as output a nested array of 1's and 0's, with a 1 wherever the value in the second array equals the corresponding value in the nested array.
I've taken a look at existing Stack Overflow questions but have been unable to construct an answer.
import numpy as np

masks_list = []
for i in range(len(y_pred)):
    mask = (y_pred[i] == y_test.values[i]) * 1
    masks_list.append(mask)
masks = np.array(masks_list)
Essentially, that's the code I currently have, and it works, but I think it's probably not the most efficient way of doing it.
YPRED:
[[4 0 1 2 3 5 6]
[0 1 2 3 5 6 4]]
YTEST:
8 1
5 4
Masks:
[[0 0 1 0 0 0 0]
[0 0 0 0 0 0 1]]
Another good solution, with fewer lines of code:
a = set(y_pred).intersection(y_test)
f = [1 if i in a else 0 for i, j in enumerate(y_pred)]
After that you can check the performance, as in this answer, as follows:
from time import perf_counter as pc

t0 = pc()
a = set(y_pred).intersection(y_test)
f = [1 if i in a else 0 for i, j in enumerate(y_pred)]
t1 = pc() - t0

t0 = pc()
masks_list = []
for i in range(len(y_pred)):
    mask = (y_pred[i] == y_test[i]) * 1
    masks_list.append(mask)
t2 = pc() - t0

val = t1 - t2
In general, if val is positive, the first solution is slower.
If you have an np.array instead of a list, you can try to do as described in this answer:
type(y_pred)
>> numpy.ndarray
y_pred = y_pred.tolist()
type(y_pred)
>> list
Idea (fewest explicit loops): compare the array and the nested array directly, reshaping y_test so that each row of y_pred is compared against its own target value:
masks = np.equal(y_pred, y_test.values[:, None]).astype(int)
You can also look at these:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
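Tying this together, a small self-contained check with the arrays from the question (assuming y_test carries the values 1 and 4):
import numpy as np

y_pred = np.array([[4, 0, 1, 2, 3, 5, 6],
                   [0, 1, 2, 3, 5, 6, 4]])
y_test_values = np.array([1, 4])

# Compare each row of y_pred against that row's target value
masks = (y_pred == y_test_values[:, None]).astype(int)
print(masks)
# [[0 0 1 0 0 0 0]
#  [0 0 0 0 0 0 1]]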

Comparing rows of two pandas dataframes?

This is a continuation of my question: Fastest way to compare rows of two pandas dataframes?
I have two dataframes A and B:
A is 1000 rows x 500 columns, filled with binary values indicating either presence or absence.
For a condensed example:
A B C D E
0 0 0 0 1 0
1 1 1 1 1 0
2 1 0 0 1 1
3 0 1 1 1 0
B is 1024 rows x 10 columns, and is a full iteration from 0 to 1023 in binary form.
Example:
0 1 2
0 0 0 0
1 0 0 1
2 0 1 0
3 0 1 1
4 1 0 0
5 1 0 1
6 1 1 0
7 1 1 1
I am trying to find which rows in A, restricted to a particular set of 10 columns of A, correspond to each row of B.
Each row of A[My_Columns_List] is guaranteed to be somewhere in B, but not every row of B will match up with a row in A[My_Columns_List].
For example, I want to show that for columns [B,D,E] of A,
rows [1,3] of A match up with row [6] of B,
row [0] of A matches up with row [2] of B,
row [2] of A matches up with row [3] of B.
I have tried using:
pd.merge(B.reset_index(), A.reset_index(),
         left_on=B.columns.tolist(),
         right_on=A.columns[My_Columns_List].tolist(),
         suffixes=('_B', '_A'))
This works, but I was hoping that this method would be faster:
S = 2**np.arange(10)
A_ID = np.dot(A[My_Columns_List],S)
B_ID = np.dot(B,S)
out_row_idx = np.where(np.in1d(A_ID,B_ID))[0]
But when I do this, out_row_idx returns an array containing all the indices of A, which doesn't tell me anything.
I think this method will be faster, but I don't know why it returns an array from 0 to 999.
Any input would be appreciated!
Also, credit goes to @jezrael and @Divakar for these methods.
I'll stick by my initial answer but maybe explain better.
You are asking to compare 2 pandas dataframes. Because of that, I'm going to build dataframes. I may use numpy, but my inputs and outputs will be dataframes.
Setup
You said we have a 1000 x 500 array of ones and zeros. Let's build that.
import numpy as np
import pandas as pd

A_init = pd.DataFrame(np.random.binomial(1, .5, (1000, 500)))
A_init.columns = pd.MultiIndex.from_product([range(A_init.shape[1] // 10), range(10)])
A = A_init
In addition, I gave A a MultiIndex to easily group by columns of 10.
Solution
This is very similar to @Divakar's answer, with one minor difference that I'll point out.
For one group of 10 ones and zeros, we can treat it as a bit array of length 10. We can then calculate its integer value by taking the dot product with an array of powers of 2.
twos = 2 ** np.arange(10)
I can execute this for every group of 10 ones and zeros in one go like this
AtB = A.stack(0).dot(twos).unstack()
I stack level 0 so that each row's 50 groups of 10 become separate rows of 10 columns, which lets me do the dot product more elegantly, and then bring the result back with unstack.
I now have a 1000 x 50 dataframe of numbers that range from 0-1023.
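As a tiny illustration of the encoding for a single group (the bit values here are arbitrary), with the first position treated as the least-significant bit:
import numpy as np

twos = 2 ** np.arange(10)  # [1, 2, 4, ..., 512]
bits = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(bits.dot(twos))      # 2 + 4 = 6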
Assume B is a dataframe with each row being one of the 1024 unique combinations of ones and zeros. B should be sorted, e.g. B = B.sort_values(by=list(B.columns)).reset_index(drop=True).
This is the part I think I failed at explaining last time. Look at
AtB.loc[:2, :2]
The value in the (0, 0) position, 951, means that the first group of 10 ones and zeros in the first row of A matches the row in B with index 951. That's what you want! The funny thing is, I never looked at B. You know why? B is irrelevant! It's just a goofy way of representing the numbers from 0 to 1023. This is the difference in my answer: I'm ignoring B, and skipping that useless step should save time.
These are all functions that take two dataframes A and B and return a dataframe of indices where A matches B. Spoiler alert: I'll ignore B completely.
def FindAinB(A, B):
    assert A.shape[1] % 10 == 0, 'Number of columns in A is not a multiple of 10'
    A.columns = pd.MultiIndex.from_product([range(A.shape[1] // 10), range(10)])
    twos = 2 ** np.arange(10)
    return A.stack(0).dot(twos).unstack()
def FindAinB2(A, B):
    assert A.shape[1] % 10 == 0, 'Number of columns in A is not a multiple of 10'
    A.columns = pd.MultiIndex.from_product([range(A.shape[1] // 10), range(10)])
    # use clever bit shifting instead of dot product with powers
    # questionable improvement
    return (A.stack(0) << np.arange(10)).sum(1).unstack()
I'm channelling my inner @Divakar (read: this is stuff I've learned from Divakar).
def FindAinB3(A, B):
    assert A.shape[1] % 10 == 0, 'Number of columns in A is not a multiple of 10'
    a = A.values.reshape(-1, 10)
    a = np.einsum('ij->i', a << np.arange(10))
    return pd.DataFrame(a.reshape(A.shape[0], -1), A.index)
Minimalist One Liner
f = lambda A: pd.DataFrame(np.einsum('ij->i', A.values.reshape(-1, 10) << np.arange(10)).reshape(A.shape[0], -1), A.index)
Use it like
f(A)
Timing
FindAinB3 is an order of magnitude faster
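To reproduce the comparison, a hypothetical harness along these lines should work, assuming A was built as in the Setup section; copies are passed because the first two functions overwrite A's columns, and B is ignored by all three, so None stands in for it:
import timeit

for fn in (FindAinB, FindAinB2, FindAinB3):
    t = timeit.timeit(lambda: fn(A.copy(), None), number=10)
    print(fn.__name__, round(t, 3))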
