The problem is:
Given an array containing 0s and 1s, if you are allowed to replace no more than 'k' 0s with 1s, find the length of the longest contiguous subarray having all 1s.
Input: Array=[0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1], k=2
Output: 6
Explanation: Replace the '0' at index 5 and 8 to have the longest contiguous subarray of 1s having length 6.
def length_of_longest_substring(arr, k):
    '''
    Create a hashmap that records the values of 0 and 1, initialize them to 0. Do a sliding
    window.
    WHILE the frequency of 0 is greater than k, subtract arr[windowStart] from HM and then
    increment
    wS.
    Use the max function to record longest substring length. Return that.
    '''
    hm = {'0': '0', '1': '0'}
    (windowStart, longest) = (0, 0)
    for windowEnd in range(len(arr)):
        right = arr[windowEnd]
        hm[right] = hm.get(right, 0) + 1
        while hm["0"] > k:
            hm[arr[windowStart]] -= 1
            windowStart += 1
        longest = max(longest, windowEnd - windowStart + 1)
    return longest

def main():
    print(length_of_longest_substring([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1], 2))
    #Return 6
    print(length_of_longest_substring([1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1], 3))
    #Return 9

main()
I am getting an error on the line "while hm["0"] > k:". It says:
File "main.py", line 12, in length_of_longest_substring
while hm["0"] > k:
KeyError: 0
It works if I replace both starting indices with 0.
I tried the .get function as well. I did hm.get("0") and got the same error.
I want the while loop to count the VALUES of 0. How can I achieve that? Thank you in advance, all is very much appreciated.
I think you got the logic wrong in your for loop. I have modified it and the error got fixed.
# While the frequency of 0 is greater than k, subtract arr[windowStart] from hm and then increment windowStart.
if hm[0] > k:
    hm[arr[windowStart]] -= 1
    windowStart += 1
# Record longest substring length.
longest = max(longest, windowEnd - windowStart + 1)
# Increment the frequency of arr[windowEnd] in hm.
hm[arr[windowEnd]] += 1
Output:
6
10
You are comparing an integer with a str. Convert the str to an int.
Note: I am not solving the problem but resolving the error as requested by the question.
def length_of_longest_substring(arr, k):
    '''
    Create a hashmap that records the values of 0 and 1, initialize them to 0. Do a sliding
    window.
    WHILE the frequency of 0 is greater than k, subtract arr[windowStart] from HM and then
    increment
    wS.
    Use the max function to record longest substring length. Return that.
    '''
    hm = {'0': '0', '1': '0'}
    (windowStart, longest) = (0, 0)
    for windowEnd in range(len(arr)):
        right = arr[windowEnd]
        hm[right] = hm.get(right, 0) + 1
        while int(hm["0"]) > k:
            hm[arr[windowStart]] -= 1
            windowStart += 1
        longest = max(longest, windowEnd - windowStart + 1)
    return longest

def main():
    print(length_of_longest_substring([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1], 2))
    #Return 6
    print(length_of_longest_substring([1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1], 3))
    #Return 9

main()
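For reference, the underlying mismatch is that the list holds the integers 0 and 1 while the dictionary is created with the string keys '0' and '1' and string values, so the counts are written under one set of keys and read under another. Below is a minimal sketch of the sliding window that uses integer keys and integer counts throughout. The function name longest_ones_after_replacement is made up for this illustration and is not from the question.

```python
def longest_ones_after_replacement(arr, k):
    """Length of the longest window of arr that contains at most k zeros."""
    counts = {0: 0, 1: 0}  # integer keys, matching the values stored in arr
    window_start = 0
    longest = 0
    for window_end, value in enumerate(arr):
        counts[value] += 1
        # Shrink from the left until the window holds at most k zeros.
        while counts[0] > k:
            counts[arr[window_start]] -= 1
            window_start += 1
        longest = max(longest, window_end - window_start + 1)
    return longest


print(longest_ones_after_replacement([0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1], 2))  # 6
print(longest_ones_after_replacement([1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1], 3))  # 9
```

A window with at most k zeros is exactly a window that becomes all 1s after at most k replacements, which is why counting zeros is enough.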
I'm trying to figure it out a way to count the number of times that a subset appears in a list of lists. For example if I have the following list:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
The pattern [0,0,1,0,1,0] appears in three of the four items of the list (i.e. in three of the lists, the elements at index 2 and index 4 are set to 1, just like in the pattern). How can I count the number of times that the pattern appears?
So far I've tried this, but it does not work:
subsets_count = []
for i in range(len(dataset)):
    current_subset_count = 0
    for j in range(len(dataset)):
        if dataset[i] in dataset[j]:
            subset_count += 1
    subsets_count.append(current_subset_count)
Using one of my favorite itertools, compress:
from itertools import compress
[sum(all(compress(e, d)) for e in dataset)
 for d in dataset]
Results in:
[3, 1, 1, 1]
For each sublist, generate a set of indices where the ones exist. Do the same for the pattern. Then, for each set of indices, find whether the pattern indices are a subset of that set. If so, the pattern is in the sublist.
one_indices_of_subsets = [{i for i, v in enumerate(sublist) if v} for sublist in dataset]
pattern_indices = {i for i, v in enumerate(pattern) if v}
result = sum(1 for s in one_indices_of_subsets if pattern_indices <= s)
print(result)
This outputs:
3
This allows one digit to differ from the pattern.
Straightforward pattern matcher:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]
m = len(pattern)
subsets_count = 0
for i in range(len(dataset)):
    count = 0
    # count the positions where this row agrees with the pattern
    for j in range(m):
        if dataset[i][j] == pattern[j]:
            count += 1
    # accept the row if at most one position differs
    if count >= m-1:
        subsets_count += 1
print(subsets_count)
Output:
3
If you want to count exact occurrences of a pattern (taking the order of its elements into account), you can simply use the .count() method as follows:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
num_count = dataset.count([0,0,1,0,1,0])
print(num_count)
output:
2
And if you don't care about the order of the zeros and ones, you can use:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
num_count = [sum(el) for el in dataset].count(sum([0,0,1,0,1,0]))
print(num_count)
output2:
3
Try:
dataset = [
[0, 0, 1, 0, 1, 0],
[0, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 0],
[0, 1, 1, 0, 0, 0],
]
pat = [0, 0, 1, 0, 1, 0]
cnt = sum(all(a == b for a, b in zip(pat, d) if a == 1) for d in dataset)
print(cnt)
Prints:
3
How can I create a function that returns True if there are 2 or more items in my list that are different from 0 and returns False if there are fewer than 2 items in my list that aren't 0?
(Incorrect code so you get the idea)
list=[0, 0, 0, 0, 0, 1, 0 , 4]
def checker:
    if > 2 items in list are > 0:
        return True
    else:
        return False
How could I actually do this in Python?
You don't need to loop over the numbers, just count the zeroes and compare to the length of the list...
my_list=[0, 0, 0, 0, 0, 1, 0 , 4]
def checker(my_list):
    return len(my_list) - my_list.count(0) >= 2
checker(my_list)
A straight-forward solution is to count the number of elements that are not 0
def checker(lst):
    counter = 0
    for i in lst:
        if i != 0:
            counter += 1
    return counter >= 2
A better solution is to use list comprehension:
def checker(lst):
    return len([i for i in lst if i != 0]) >= 2
def checker(l, thresh=2):
    return len([i for i in l if i > 0]) >= thresh
list1=[0, 0, 0, 0, 0, 1, 0 , 4]
if len([x for x in list1 if x!=0 ])>1:
    print('True')
else:
    print('False')
The most efficient method would simply be to use the count method of a list:
def check(l):
    return (len(l) - l.count(0)) >= 2
data=[0, 0, 0, 0, 0 , 0 , 4]
def checker(data):
    return len(list(filter(lambda x: x!=0, data)))>=2
print(checker(data))
You could do it this way:
lst=[0, 0, 0, 0, 0, 1, 0 , 4]
if sum(map(bool,lst)) >= 2:
    print("2 or more non-zero")
else:
    print("fewer than 2 non-zero")
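All of the versions above scan the whole list. If the lists are long, a variant that stops as soon as the second non-zero item is found may be worth trying. Here is a minimal sketch using itertools.islice. The name has_at_least and its n parameter are made up for this illustration.

```python
from itertools import islice


def has_at_least(lst, n=2):
    """Return True if lst contains at least n non-zero items."""
    non_zero = (x for x in lst if x != 0)  # lazy: nothing is scanned yet
    # islice pulls at most n items, so the scan stops at the n-th non-zero.
    return sum(1 for _ in islice(non_zero, n)) == n


print(has_at_least([0, 0, 0, 0, 0, 1, 0, 4]))  # True
print(has_at_least([0, 0, 0, 0, 0, 0, 4]))     # False
```

Whether this is actually faster than list.count depends on the data: count runs in C, so the early exit mostly pays off when the non-zero items tend to appear near the front of a long list.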
I have a list of 1 and 0 --> output = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
I would like to convert that list of ones and zeroes to a string, where each group of 8 bits in "little-endian" order represents one letter in "latin1".
So far I have this code (below), which works fine, but I think it's quite slow and it seems to slow down my script...
out_str = ""
for i in range(0,len(output),8):
    x=output[i:i+8]
    l="".join([str(j) for j in x[::-1]])
    out_str += chr(int(("0b"+l),base=2))
Do you have any faster ideas?
Here's a faster solution using a dictionary of tuples for the 256 possible characters:
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
chars = { tuple(map(int,f"{n:08b}"[::-1])):chr(n) for n in range(0,256) }
def toChars(bits):
    return "".join(chars[tuple(bits[i:i+8])] for i in range(0,len(bits),8) )
roughly 3x faster than original solution
[EDIT] and an even faster one using bytes and zip:
chars = { tuple(map(int,f"{n:08b}")):n for n in range(256) }
def toChars(bits):
    # decode as latin1 so that byte values above 127 map to one character each
    return bytes(chars[b] for b in zip(*(bits[7-i::8] for i in range(8)))).decode("latin1")
about 2x faster than the previous one (on long lists)
[EDIT2] a bit of explanations for this last one ...
b in the list comprehension will be a tuple of 8 bits
chars[b] will return an integer corresponding to the 8 bits
bytes(...).decode() converts the list of integers to a string based on the chr(n) of each value
zip(*(... 8 bit iterators...)) unpacks the 8 striding ranges of bits running in parallel, each from a different starting point
The strategy with the unpacked zip is to go through the bits in steps of 8. For example, if we were going through 8 parallel ranges, we would get this:
bits[7::8] -> [ 0, 0, ... ]   zip returns: (0,1,0,0,0,0,1,1)
bits[6::8] -> [ 1, 1, ... ]                (0,1,1,0,1,1,1,1)
bits[5::8] -> [ 0, 1, ... ]                ...
bits[4::8] -> [ 0, 0, ... ]
bits[3::8] -> [ 0, 1, ... ]
bits[2::8] -> [ 0, 1, ... ]
bits[1::8] -> [ 1, 1, ... ]
bits[0::8] -> [ 1, 1, ... ]
The zip function will take one column of this per iteration and return it as a tuple of bits.
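To see the striding concretely, the following sketch prints the eight slices and the tuples that zip builds from them for the 16-bit example. It only restates the explanation above. The variable names slices, columns and codes are chosen for illustration.

```python
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]

# Slice number i holds bit (7 - i) of every byte, so the most
# significant bit of each byte comes first.
slices = [bits[7 - i::8] for i in range(8)]
for i, s in enumerate(slices):
    print(f"bits[{7 - i}::8] -> {s}")

# zip(*slices) takes one item from each slice per step: one byte, MSB first.
columns = list(zip(*slices))
print(columns)  # [(0, 1, 0, 0, 0, 0, 1, 1), (0, 1, 1, 0, 1, 1, 1, 1)]

# Reading each tuple as a binary number gives the character codes.
codes = [int("".join(map(str, c)), 2) for c in columns]
print(codes)                           # [67, 111]
print(bytes(codes).decode("latin1"))   # Co
```

The dictionary lookup in the answer replaces the int(...) conversion in the last step with a precomputed table, which is where the speed-up comes from.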
#!/usr/bin/python
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
result = []
c = 0
for i,v in enumerate(bits):
    i = i % 8
    c = c | v << i
    if i == 7:
        result.append(chr(c))
        c = 0
print(''.join(result))
Testing:
$ python ./test.py
Co
Using sum and enumerate should be faster, as they are built-ins. Let's time yours and mine, on the same machine.
Each version was run 100,000 times in a loop and timed with time python3 tmp.py. (The figures are user times. For both, the sys time hovered around 0m0.012s, so it had only a marginal influence on the results.)
Yours: 0m1.624s
Mine is 50% faster: 0m1.063s, with this
output = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
out_str = ""
for item in [output[i:i + 8] for i in range(0, len(output), 8)]:
    out_str += chr(sum(x<<i for i,x in enumerate(item)))
I did some measurements of the execution time for all valid solutions. See the results below in the code. The snippets are sorted from slowest to fastest, the fastest being the one from Alain T. I tested them on a fairly large list that results in a string of 200000 characters.
Even for such a large list, the execution time is still pretty fast, even for my original solution. There has to be an issue somewhere else in my program... :-)
Thank you all for your codes!
import time
start_time = time.time()
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0] * 100000
### tested code ###
print("Execution time: ", time.time() - start_time, "seconds")

### former solution --> 0.59 seconds
out_str = ""
for i in range(0,len(bits),8):
    x=bits[i:i+8]
    l="".join([str(j) for j in x[::-1]])
    out_str += chr(int(("0b"+l),base=2))

### enumerate and result.append --> 0.48 seconds
result = []
c = 0
for i,v in enumerate(bits):
    i = i % 8
    c = c | v << i
    if i == 7:
        result.append(chr(c))
        c = 0
out_str = ''.join(result)

### sum and enumerate --> 0.45 seconds
out_str = ""
for item in [bits[i:i + 8] for i in range(0, len(bits), 8)]:
    out_str += chr(sum(x<<i for i,x in enumerate(item)))

### map and chars dictionary --> 0.10 seconds
chars = { tuple(map(int,f"{n:08b}"[::-1])):chr(n) for n in range(0,256) }
def toChars(bits):
    return "".join(chars[tuple(bits[i:i+8])] for i in range(0,len(bits),8) )

### bytes and zip --> 0.06 seconds
chars = { tuple(map(int,f"{n:08b}")):n for n in range(256) }
def toChars(bits):
    return bytes(chars[b] for b in zip(*(bits[7-i::8] for i in range(8)))).decode("latin1")
EDIT:
I wrote the best (fastest) solution in a more understandable form (not using list comprehensions) so that I could step through the code, because it took me a while to understand how it works (solution by Alain T.):
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0] * 10
chars = {tuple(map(int,f"{n:08b}")):n for n in range(256)}
temp = []
out = []
# eight strided slices, most significant bit first
for i in range(8):
    temp.append(bits[7-i::8])
# each tuple from zip is one byte
unzipped = zip(*temp)
for b in unzipped:
    out.append(bytes([chars[b]]).decode("latin1"))
print("".join(out))
Check whether this is faster:
tmp_list = []
for i in range(0,len(output),8):
    byte_value = 0
    # reverse the chunk so the most significant bit is shifted in first
    for digit in output[i:i+8][::-1]:
        byte_value = (byte_value<<1) + digit
    tmp_list.append(chr(byte_value))
out_str = ''.join(tmp_list)
I have a list that looks like this:
a = [0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0...]
How do I get the index of the first 1 in each block of zero - one so the resulting index is:
[8 23 ..] and so on
I've been using this code:
def find_one(a):
    for i in range(len(a)):
        if a[i] > 0:
            return i
print(find_one(a))
but it gives me only the first occurrence of 1. How can I implement it to iterate through the entire list?
Thank you!!
You can do it using zip and a list comprehension:
a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
r = [i for n,(i,v) in zip([1]+a,enumerate(a)) if v > n]
print(r) # [8,23]
Since you tagged pandas, can use groupby. If s = pd.Series(a) then
>>> x = s.groupby(s.diff().ne(0).cumsum()).head(1).astype(bool)
>>> x[x].index
Int64Index([8, 23], dtype='int64')
Without pandas:
b = a[1:]
[(num+1) for num,i in enumerate(zip(a,b)) if i == (0,1)]
# `state` is (prev_char, cur_char)
# where `prev_char` is the previous character seen
# and `cur_char` is the current character
#
# (0, 1) .... previous was "0"
#             current is "1"
#             RECORD THE INDEX.
#             STRING OF ONES JUST BEGAN
#
# (0, 0) .... previous was "0"
#             current is "0"
#             do **NOT** record the index
#
# (1, 1) .... previous was "1"
#             current is "1"
#             we are in a string of ones, but
#             not at the beginning of it.
#             do **NOT** record the index.
#
# (1, 0) .... previous was "1"
#             current is "0"
#             string of ones just ended,
#             not the start of a string of ones.
#             do **NOT** record the index.
state_to_print_decision = dict()
state_to_print_decision[(0, 1)] = True

def find_one(a, state_to_print_decision):
    indices = []
    # pretend we just saw a bunch of zeros:
    # initialize state to (0, 0)
    state = (0, 0)
    for i in range(len(a)):
        # a[i] is the current character
        # state[0] is the left element of state
        # state[1] is the right element of state
        #
        # state[1] was the current character
        # and is now the previous character
        state = (state[1], a[i])
        it_is_time_to_print = state_to_print_decision.get(state, False)
        if it_is_time_to_print:
            indices.append(i)
    return indices

a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(find_one(a, state_to_print_decision))
I have this code:
gs = open("graph.txt", "r")
gp = gs.readline()
gp_splitIndex = gp.find(" ")
gp_nodeCount = int(gp[0:gp_splitIndex])
gp_edgeCount = int(gp[gp_splitIndex+1:-1])
matrix = [] # predeclare the array
for i in range(0, gp_nodeCount):
    matrix.append([])
    for y in range(0, gp_nodeCount):
        matrix[i].append(0)
for i in range(0, gp_edgeCount-1):
    gp = gs.readline()
    gp_splitIndex = gp.find(" ") # get the index of space, dividing the 2 numbers on a row
    gp_from = int(gp[0:gp_splitIndex])
    gp_to = int(gp[gp_splitIndex+1:-1])
    matrix[gp_from][gp_to] = 1
print(matrix)
The file graph.txt contains this:
5 10
0 1
1 2
2 3
3 4
4 0
0 3
3 1
1 4
4 2
2 0
The first two numbers tell me that the graph has 5 nodes and 10 edges. The following number pairs describe the edges between nodes. For example, "1 4" means an edge between node 1 and node 4.
Problem is, the output should be this:
[[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
But instead of that, I get this:
[[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
Only one number is different, and I can't understand why this is happening. The edge "3 1" is not present. Can someone explain where the problem is?
Change for i in range(0, gp_edgeCount-1): to
for i in range(0, gp_edgeCount):
The range() function already does the "-1" operation. range(0,3) "==" [0,1,2]
And it is not the "3 1" edge that is missing, it is the "2 0" edge that is missing, and that is the last edge. The matrices start counting at 0.
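A quick way to check both points without the file is the sketch below. The edges list is a stand-in for the ten edge lines of graph.txt from the question, and the names read_with_bug and read_fixed are made up for this illustration.

```python
edge_count = 10
edges = ["0 1", "1 2", "2 3", "3 4", "4 0", "0 3", "3 1", "1 4", "4 2", "2 0"]

print(list(range(0, 3)))  # [0, 1, 2]: the end value is excluded

# Looping up to edge_count - 1 visits only 9 of the 10 lines.
read_with_bug = [edges[i] for i in range(0, edge_count - 1)]
read_fixed = [edges[i] for i in range(0, edge_count)]

print(len(read_with_bug), len(read_fixed))   # 9 10
print(set(read_fixed) - set(read_with_bug))  # {'2 0'}
```

The only edge that never gets read is "2 0", which matches the single differing entry matrix[2][0] in the two outputs.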
Matthias has it; you don't need edgeCount - 1 since the range function doesn't include the end value in the iteration.
There are several other things you can do to clean up your code:
The with statement is preferred for opening files, since it closes them automatically for you.
You don't need to call find and manually slice, split already does what you want.
You can convert and assign directly to a pair of numbers using a generator expression and iterable unpacking
You can call range with just an end value, the 0 start is implicit.
The multiplication operator is handy for initializing lists
With all of those changes:
with open('graph.txt', 'r') as graph:
    node_count, edge_count = (int(n) for n in graph.readline().split())
    matrix = [[0]*node_count for _ in range(node_count)]
    for i in range(edge_count):
        src, dst = (int(n) for n in graph.readline().split())
        matrix[src][dst] = 1
print(matrix)
# [[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
Just to keep your code and style, of course it could be much more readable:
gs = open("graph.txt", "r")
gp = gs.readline()
gp_splitIndex = gp.split(" ")
gp_nodeCount = int(gp_splitIndex[0])
gp_edgeCount = int(gp_splitIndex[1])
matrix = [] # predeclare the array
for i in range(0, gp_nodeCount):
    matrix.append([])
    for y in range(0, gp_nodeCount):
        matrix[i].append(0)
for i in range(0, gp_edgeCount):
    gp = gs.readline()
    gp_Index = gp.split(" ") # split the row into its 2 numbers at the space
    gp_from = int(gp_Index[0])
    gp_to = int(gp_Index[1])
    matrix[gp_from][gp_to] = 1
print(matrix)
It is exactly the last line that is not used: the 2 0 from your file. Hence the missing 1. Have a nice day!
The other answers are correct, another version similar to the one of tzaman:
with open('graph.txt', mode='r') as txt_file:
    lines = [l.strip() for l in txt_file.readlines()]
number_pairs = [[int(n) for n in line.split(' ')] for line in lines]
header = number_pairs[0]
edge_pairs = number_pairs[1:]
num_nodes, num_edges = header
edges = [[0] * num_nodes for _ in range(num_nodes)]
for edge_start, edge_end in edge_pairs:
    edges[edge_start][edge_end] = 1
print(edges)