I have a list of 1 and 0 --> output = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
I would like to convert that list of ones and zeroes to a string, where each group of 8 bits in "little-endian" order represents one letter in "latin1".
So far I have this code (below), which works fine, but I think it's quite slow and seems to slow down my script...
out_str = ""
for i in range(0,len(output),8):
    x=output[i:i+8]
    # reverse the 8 bits (little-endian) and parse them as a binary number
    l="".join([str(j) for j in x[::-1]])
    out_str += chr(int(("0b"+l),base=2))
Do you have any faster ideas?
Here's a faster solution using a dictionary of tuples for the 256 possible characters:
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
chars = { tuple(map(int,f"{n:08b}"[::-1])):chr(n) for n in range(0,256) }
def toChars(bits):
    return "".join(chars[tuple(bits[i:i+8])] for i in range(0,len(bits),8) )
roughly 3x faster than original solution
[EDIT] and an even faster one using bytes and zip:
chars = { tuple(map(int,f"{n:08b}")):n for n in range(256) }
def toChars(bits):
    # decode as latin1 so that byte values above 127 map to a single character
    return bytes(chars[b] for b in zip(*(bits[7-i::8] for i in range(8)))).decode("latin1")
about 2x faster than the previous one (on long lists)
[EDIT2] a bit of explanation for this last one ...
b in the generator expression will be a tuple of 8 bits
chars[b] will return the integer corresponding to those 8 bits
bytes(...).decode("latin1") converts the sequence of integers to a string, mapping each value n to chr(n) (the default UTF-8 decoding would fail on many values above 127)
zip(*(... 8 bit iterators...)) unpacks the 8 strided slices of bits and runs through them in parallel, each from a different starting point
The strategy with the unpacked zip is to go through the bits in steps of 8. For example, if we were going through 8 parallel ranges, we would get this:
bits[7::8] -> [ 0, 0, ... ]    zip returns: (0,1,0,0,0,0,1,1)
bits[6::8] -> [ 1, 1, ... ]                 (0,1,1,0,1,1,1,1)
bits[5::8] -> [ 0, 1, ... ] ...
bits[4::8] -> [ 0, 0, ... ]
bits[3::8] -> [ 0, 1, ... ]
bits[2::8] -> [ 0, 1, ... ]
bits[1::8] -> [ 1, 1, ... ]
bits[0::8] -> [ 1, 1, ... ]
The zip function will take one column of this per iteration and return it as a tuple of bits.
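To make the column-wise behaviour concrete, here is a small sketch that prints what zip yields for the sample list. The names rows, columns and values are only illustrative, and the lookup dictionary is replaced by int(..., 2) to keep the example short.

```python
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]

# Eight strided slices: rows[0] holds bit 7 of every byte, rows[7] holds bit 0.
rows = [bits[7 - i::8] for i in range(8)]

# zip(*rows) yields one column per byte, most significant bit first.
columns = list(zip(*rows))
print(columns)  # [(0, 1, 0, 0, 0, 0, 1, 1), (0, 1, 1, 0, 1, 1, 1, 1)]

# Each column read as a binary number is the byte value.
values = [int("".join(map(str, column)), 2) for column in columns]
text = bytes(values).decode("latin1")
print(values, text)  # [67, 111] Co
```

If the list length is not a multiple of 8, zip stops at the shortest slice, so the trailing incomplete byte is silently dropped.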
#!/usr/bin/python
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
result = []
c = 0
for i,v in enumerate(bits):
    i = i % 8
    c = c | v << i
    if i == 7:
        result.append(chr(c))
        c = 0
print(''.join(result))
Testing:
$ python ./test.py
Co
Using sum and enumerate should be faster, as they are built-ins. Let's time yours and mine on the same machine.
Each version was run 100,000 times in a loop and timed with time python3 tmp.py. The figures below are the user values; sys time hovered around 0m0.012s for both, so it changes the results by only about a percent.
Yours: 0m1.624s
Mine is 50% faster: 0m1.063s, with this
output = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
out_str = ""
for item in [output[i:i + 8] for i in range(0, len(output), 8)]:
    out_str += chr(sum(x<<i for i,x in enumerate(item)))
I did some measurements of the execution time for all valid solutions. See the results in the code below. The codes are sorted from slowest to fastest, the fastest being the one from Alain T. I tested them on a fairly large list, resulting in a string of 200000 characters.
Even for such a large list the execution time is still pretty fast also for my original solution. There has to be an issue somewhere else in my program... :-)
Thank you all for your codes!
import time
start_time = time.time()
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0] * 100000
### tested code ###
print("Execution time: ", time.time() - start_time, "seconds")
### former solution --> 0.59 seconds
out_str = ""
for i in range(0,len(bits),8):
    x=bits[i:i+8]
    l="".join([str(j) for j in x[::-1]])
    out_str += chr(int(("0b"+l),base=2))
### enumerate and result.append --> 0.48 seconds
result = []
c = 0
for i,v in enumerate(bits):
    i = i % 8
    c = c | v << i
    if i == 7:
        result.append(chr(c))
        c = 0
out_str = ''.join(result)
### sum and enumerate --> 0.45 seconds
out_str = ""
for item in [bits[i:i + 8] for i in range(0, len(bits), 8)]:
    out_str += chr(sum(x<<i for i,x in enumerate(item)))
### map and chars dictionary --> 0.10 seconds
chars = { tuple(map(int,f"{n:08b}"[::-1])):chr(n) for n in range(0,256) }
def toChars(bits):
    return "".join(chars[tuple(bits[i:i+8])] for i in range(0,len(bits),8) )
### bytes and zip --> 0.06 seconds
chars = { tuple(map(int,f"{n:08b}")):n for n in range(256) }
def toChars(bits):
    # latin1 keeps byte values above 127 decodable as single characters
    return bytes(chars[b] for b in zip(*(bits[7-i::8] for i in range(8)))).decode("latin1")
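A single time.time() difference can be noisy, so the comparison could also be run with the standard timeit module. The following is only a sketch: the function names with_int_parsing and with_lookup_table are made up for this example, the list is shorter than in the measurements above, and the numbers it prints will differ from machine to machine.

```python
import timeit

bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0] * 1000

def with_int_parsing(bits):
    # The original approach: build a binary string per byte and parse it.
    out = []
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        out.append(chr(int("".join(str(b) for b in chunk[::-1]), 2)))
    return "".join(out)

# Lookup table from a little-endian 8-bit tuple to its character.
LOOKUP = {tuple(map(int, f"{n:08b}"[::-1])): chr(n) for n in range(256)}

def with_lookup_table(bits):
    return "".join(LOOKUP[tuple(bits[i:i + 8])] for i in range(0, len(bits), 8))

# Both variants must agree before their timings are worth comparing.
assert with_int_parsing(bits) == with_lookup_table(bits)

for func in (with_int_parsing, with_lookup_table):
    # Take the best of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(lambda: func(bits), number=10, repeat=3))
    print(f"{func.__name__}: {best:.4f} s for 10 calls")
```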
EDIT:
I wrote the best (fastest) solution in a more understandable form (not using list comprehensions) so I could step through the code because it took me some while to understand how it works (solution by Alain T.):
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0] * 10
chars = {tuple(map(int,f"{n:08b}")):n for n in range(256)}
temp = []
out = []
for i in range(8):
    temp.append(bits[7-i::8])
unzipped = zip(*temp)
for b in unzipped:
    # latin1 so that single byte values above 127 can be decoded too
    out.append(bytes([chars[b]]).decode("latin1"))
print("".join(out))
Check whether this is faster:
tmp_list = []
for i in range(0,len(output),8):
    byte_value = 0
    # output[i:i+8:-1] would be an empty slice, so reverse the chunk explicitly
    for digit in reversed(output[i:i+8]):
        byte_value = (byte_value<<1) + digit
    tmp_list.append(chr(byte_value))
out_str = ''.join(tmp_list)
The problem is:
Given an array containing 0s and 1s, if you are allowed to replace no more than 'k' 0s with 1s, find the length of the longest contiguous subarray having all 1s.
Input: Array=[0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1], k=2
Output: 6
Explanation: Replace the '0' at index 5 and 8 to have the longest contiguous subarray of 1s having length 6.
def length_of_longest_substring(arr, k):
    '''
    Create a hashmap that records the values of 0 and 1, initialize them to 0. Do a sliding
    window.
    WHILE the frequency of 0 is greater than k, subtract arr[windowStart] from HM and then
    increment
    wS.
    Use the max function to record longest substring length. Return that.
    '''
    hm = {'0': '0', '1': '0'}
    (windowStart, longest) = (0, 0)
    for windowEnd in range(len(arr)):
        right = arr[windowEnd]
        hm[right] = hm.get(right, 0) + 1
        while hm["0"] > k:
            hm[arr[windowStart]] -= 1
            windowStart += 1
        longest = max(longest, windowEnd - windowStart + 1)
    return longest
def main():
    print(length_of_longest_substring([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1], 2))
    #Return 6
    print(length_of_longest_substring([1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1], 3))
    #Return 9
main()
I am getting error with "while hm["0"] > k:" it says
File "main.py", line 12, in length_of_longest_substring
while hm["0"] > k:
KeyError: 0
It works if I replace both starting indices with 0.
I tried the .get function as well. I did hm.get("0"), same error.
I want the while loop to count the VALUES of 0. How can I achieve that? Thank you in advance, all is very much appreciated.
I think you got the logic wrong in your for loop. I have modified it and the error got fixed.
# While the frequency of 0 is greater than k, subtract arr[windowStart] from hm and then increment windowStart.
if hm[0] > k:
hm[arr[windowStart]] -= 1
windowStart += 1
# Record longest substring length.
longest = max(longest, windowEnd - windowStart + 1)
# Increment the frequency of arr[windowEnd] in hm.
hm[arr[windowEnd]] += 1
Output:
6
10
You are comparing an integer with a str. Convert the str to an int first.
Note: I am not solving the problem but resolving the error as requested by the question.
def length_of_longest_substring(arr, k):
    '''
    Create a hashmap that records the values of 0 and 1, initialize them to 0. Do a sliding
    window.
    WHILE the frequency of 0 is greater than k, subtract arr[windowStart] from HM and then
    increment
    wS.
    Use the max function to record longest substring length. Return that.
    '''
    hm = {'0': '0', '1': '0'}
    (windowStart, longest) = (0, 0)
    for windowEnd in range(len(arr)):
        right = arr[windowEnd]
        hm[right] = hm.get(right, 0) + 1
        while int(hm["0"]) > k:
            hm[arr[windowStart]] -= 1
            windowStart += 1
        longest = max(longest, windowEnd - windowStart + 1)
    return longest
def main():
    print(length_of_longest_substring([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1], 2))
    #Return 6
    print(length_of_longest_substring([1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1], 3))
    #Return 9
main()
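For completeness, here is a sketch of a sliding window that also solves the problem rather than only silencing the error. The underlying issue in the question is that the dictionary keys are the strings '0' and '1' while the array holds the integers 0 and 1, so the counts end up under different keys. The version below assumes arr contains only the integers 0 and 1 and replaces the dictionary with a single integer counter; that simplification is my own choice, not something taken from the question.

```python
def length_of_longest_substring(arr, k):
    zero_count = 0      # number of 0s inside the current window
    window_start = 0
    longest = 0
    for window_end, value in enumerate(arr):
        if value == 0:
            zero_count += 1
        # Shrink from the left until at most k zeros remain in the window.
        while zero_count > k:
            if arr[window_start] == 0:
                zero_count -= 1
            window_start += 1
        longest = max(longest, window_end - window_start + 1)
    return longest

print(length_of_longest_substring([0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1], 2))        # 6
print(length_of_longest_substring([1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1], 3))  # 9
```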
I'm trying to figure out a way to count the number of times that a subset appears in a list of lists. For example, if I have the following list:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
The pattern [0,0,1,0,1,0] appears in three of the four items of the list (i.e. in three of the lists, the elements at index 2 and index 4 are set to 1, just like in the pattern). How can I count the number of times that the pattern appears?
So far I've tried this, but it does not work:
subsets_count = []
for i in range(len(dataset)):
    current_subset_count = 0
    for j in range(len(dataset)):
        if dataset[i] in dataset[j]:
            subset_count += 1
    subsets_count.append(current_subset_count)
Using one of my favorite itertools, compress:
from itertools import compress
[sum(all(compress(e, d)) for e in dataset)
 for d in dataset]
Results in:
[3, 1, 1, 1]
For each sublist, generate a set of indices where the ones exist. Do the same for the pattern. Then, for each set of indices, find whether the pattern indices are a subset of that set. If so, the pattern is in the sublist.
one_indices_of_subsets = [{i for i, v in enumerate(sublist) if v} for sublist in dataset]
pattern_indices = {i for i, v in enumerate(pattern) if v}
result = sum(1 for s in one_indices_of_subsets if pattern_indices <= s)
print(result)
This outputs:
3
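The same subset test can also be written with integer bitmasks instead of sets of indices. This is a sketch of a swapped-in technique, not the answer's own code: to_mask and count_supersets are hypothetical helper names, and it assumes every row contains only 0s and 1s and has the same length as the pattern.

```python
def to_mask(row):
    """Pack a list of 0/1 values into one integer, first element most significant."""
    mask = 0
    for bit in row:
        mask = (mask << 1) | bit
    return mask

def count_supersets(dataset, pattern):
    """Count the rows in which every 1 of the pattern is also set."""
    pattern_mask = to_mask(pattern)
    return sum(1 for row in dataset if (to_mask(row) & pattern_mask) == pattern_mask)

dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]
print(count_supersets(dataset, pattern))  # 3
```

A row contains the pattern exactly when ANDing its mask with the pattern mask leaves the pattern mask unchanged, which mirrors the pattern_indices <= s subset check.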
A straightforward pattern matcher that allows one digit to differ from the pattern:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]
m = len(pattern)
subsets_count = 0
for i in range(len(dataset)):
    count = 0
    for j in range(m):
        if dataset[i][j] == pattern[j]:
            count +=1
    if count >= m-1:
        subsets_count +=1
print(subsets_count)
Output:
3
If you want to count a pattern (taking the order of the pattern into account), you can simply use the .count() method as follows:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
num_count = dataset.count([0,0,1,0,1,0])
print(num_count)
output:
2
And if you don't care about the order of the zeros and ones, you can use:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
num_count = [sum(el) for el in dataset].count(sum([0,0,1,0,1,0]))
print(num_count)
output2:
3
Try:
dataset = [
[0, 0, 1, 0, 1, 0],
[0, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 0],
[0, 1, 1, 0, 0, 0],
]
pat = [0, 0, 1, 0, 1, 0]
cnt = sum(all(a == b for a, b in zip(pat, d) if a == 1) for d in dataset)
print(cnt)
Prints:
3
I would like to loop over the following check_matrix in such a way that the code recognizes whether the first and second elements are 1 and 1, or 1 and 2, and so on. Then, for each separate class of pair (i.e. 1,1 or 1,2 or 2,2), the code should store in new matrices the sum of the last element (which in this case has index 8) times exp(-i*q·(check_matrix[k][2:5]-check_matrix[k][5:8])), where i is the imaginary unit, k is the running index over the rows of check_matrix, and q is a vector defined as given below. There are 20 q vectors.
import numpy as np
q= []
for i in np.linspace(0, 10, 20):
    q.append(np.array((0, 0, i)))
q = np.array(q)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
[1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
[1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
[2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
[2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
[2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
This means that in principle I should end up with 20 matrices of shape 2x2, one for each q vector.
At the moment my code gives only one matrix, which appears to be the last one, even though I am appending to Matrices. My code looks like this:
for i in range(2):
i = i+1
for j in range(2):
j= j +1
j_list = []
Matrices = []
for k in range(len(check_matrix)):
if check_matrix[k][0] == i and check_matrix[k][1] == j:
j_list.append(check_matrix[k][8]*np.exp(-1J*np.dot(q,(np.subtract(check_matrix[k][2:5],check_matrix[k][5:8])))))
j_11 = np.sum(j_list)
I_matrix[i-1][j-1] = j_11
Matrices.append(I_matrix)
I_matrix is defined as below:
I_matrix= np.zeros((2,2),dtype=np.complex_)
At the moment I get following output.
Matrices = [array([[-0.66071446-0.77603624j, -0.29038112+2.34855023j], [-0.31387562-0.08116629j, 4.2788 +0.j ]])]
However, I would like to get a matrix for each q value, meaning 20 matrices in total in this case, where each element of a 2x2 matrix contains the sum over the rows belonging to the 1,1 and 1,2 and 2,2 pairs, laid out in the following manner:
array([[11., 12.],
[21., 22.]])
I would highly appreciate your suggestions for correcting it. Thanks in advance!
I am pretty sure you can solve this problem in an easier way and I am not 100% sure that I understood you correctly, but here is some code that does what I think you want. If you have a possibility to check if the results are valid, I would suggest you do so.
import numpy as np
n = 20
q = np.zeros((20, 3))
q[:, -1] = np.linspace(0, 10, n)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
[1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
[1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
[2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
[2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
[2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
check_matrix[:, :2] -= 1  # python indexing is zero based
# np.complex_ was removed in NumPy 2.0; np.complex128 is the same type
matrices = np.zeros((n, 2, 2), dtype=np.complex128)
for i in range(2):
    for j in range(2):
        k_list = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                k_list.append(check_matrix[k][8] *
                              np.exp(-1J * np.dot(q, check_matrix[k][2:5]
                                                  - check_matrix[k][5:8])))
        matrices[:, i, j] = np.sum(k_list, axis=0)
NOTE: I changed your indices to have consistent zero-based indexing.
Here is another approach where I replaced the k-loop with a vectorized version:
for i in range(2):
    for j in range(2):
        k = np.logical_and(check_matrix[:, 0] == i, check_matrix[:, 1] == j)
        temp = np.dot(check_matrix[k, 2:5] - check_matrix[k, 5:8], q[:, :, np.newaxis])[..., 0]
        temp = check_matrix[k, 8:] * np.exp(-1J * temp)
        matrices[:, i, j] = np.sum(temp, axis=0)
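To check the NumPy results by hand, the same sum can be sketched in plain Python with cmath. This is only an illustration under the assumption that each row is laid out as (i, j, three components of r1, three components of r2, weight), as in check_matrix. The names rows, pair_sums and q_vectors are hypothetical, and the q values are meant to mirror np.linspace(0, 10, 20).

```python
import cmath
from collections import defaultdict

rows = [
    (1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293),
    (1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567),
    (1, 2, 0, 0, 0, -1, 0, 0, 1.126557),
    (2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934),
    (2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192),
    (2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394),
]

def pair_sums(rows, q):
    """Return {(i, j): sum of weight * exp(-1j * q.(r1 - r2))} for one q vector."""
    sums = defaultdict(complex)
    for row in rows:
        i, j = int(row[0]), int(row[1])
        r1, r2, weight = row[2:5], row[5:8], row[8]
        # Dot product of q with the difference vector r1 - r2.
        phase = sum(qc * (a - b) for qc, a, b in zip(q, r1, r2))
        sums[(i, j)] += weight * cmath.exp(-1j * phase)
    return dict(sums)

# 20 q vectors along z, evenly spaced from 0 to 10.
q_vectors = [(0.0, 0.0, 10.0 * n / 19) for n in range(20)]
matrices = [pair_sums(rows, q) for q in q_vectors]
print(matrices[0])
```

For q = (0, 0, 0) every phase factor is 1, so the entries reduce to the plain sums of the last column for each pair, which gives a quick sanity check.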
3 line solution
You asked for efficient solution in your original title so how about this solution that avoids nested loops and if statements in a 3 liner, which is thus hopefully faster?
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
grp=np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[np.sum(x) for x in grp]
output:
[-0.23872600000000002, 1.126557, 0.023742000000000003, 0.21394]
How does it work?
I combine the first two columns into a single index, treating each as "bits" (i.e. base 2)
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
( If you have indexes that exceed 2, you can still use this technique but you will need to use a different base to combine the columns. i.e. if your indices go from 1 to 18, you would need to multiply column 0 by a number equal to or larger than 18 instead of 2. )
So the result of the first line is
array([0., 0., 1., 2., 2., 3.])
Note as well that this assumes the data is ordered so that one column changes fastest. If that is not the case, you will need an extra step to sort the index and the original check matrix. In your example the data is ordered.
The next step groups the data according to the index, and uses the solution posted here.
np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[array([-0.243293, 0.004567]), array([1.126557]), array([ 0.038934, -0.015192]), array([0.21394])]
i.e. it outputs the column at index 8 of check_matrix, split according to the grouping of fac
then the last line simply sums those... knowing how the first two columns were combined to give the single index allows you to map the result back. Or you could simply add it to check matrix as a 9th column if you wanted.
I have a list that looks like this:
a = [0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0...]
How do I get the index of the first 1 in each block of zero - one so the resulting index is:
[8 23 ..] and so on
I've been using this code:
def find_one (a):
    for i in range(len(a)):
        if (a[i] > 0):
            return i
print(find_one(a))
but it gives me only the first occurrence of 1. How can I implement it to iterate through the entire list?
Thank you!!
You can do it using zip and a list comprehension:
a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
r = [i for n,(i,v) in zip([1]+a,enumerate(a)) if v > n]
print(r) # [8,23]
Since you tagged pandas, can use groupby. If s = pd.Series(a) then
>>> x = s.groupby(s.diff().ne(0).cumsum()).head(1).astype(bool)
>>> x[x].index
Int64Index([8, 23], dtype='int64')
Without pandas:
b = a[1:]
[(num+1) for num,i in enumerate(zip(a,b)) if i == (0,1)]
# `state` is (prev_char, cur_char)
# where `prev_char` is the previous character seen
# and `cur_char` is the current character
#
# (0, 1) .... previous was "0"
#             current is "1"
#             RECORD THE INDEX.
#             STRING OF ONES JUST BEGAN
#
# (0, 0) .... previous was "0"
#             current is "0"
#             do **NOT** record the index
#
# (1, 1) .... previous was "1"
#             current is "1"
#             we are in a string of ones, but
#             not at the beginning of it.
#             do **NOT** record the index.
#
# (1, 0) .... previous was "1"
#             current is "0"
#             string of ones just ended,
#             not the start of a string of ones.
#             do **NOT** record the index.
state_to_print_decision = dict()
state_to_print_decision[(0, 1)] = True
def find_one (a, state_to_print_decision):
    # start index of every string of ones found so far
    indices = []
    #
    # pretend we just saw a bunch of zeros
    # initialize state to (0, 0)
    state = (0, 0)
    for i in range(len(a)):
        #
        # a[i] is the current character
        #
        # state[0] is the left element of state
        #
        # state[1] is the right element of state
        #
        # state[1] was the current character,
        # is now the previous character
        #
        state = (state[1], a[i])
        it_is_time_to_print = state_to_print_decision.get(state, False)
        if it_is_time_to_print:
            indices.append(i)
    return indices
a = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(find_one(a, state_to_print_decision))
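Another way to find where each block of ones begins is itertools.groupby, which splits the list into runs of equal values. This is a sketch of an alternative technique rather than any of the answers above; run_starts is a hypothetical name, and target=1 is assumed to be the value of interest.

```python
from itertools import groupby

def run_starts(values, target=1):
    """Return the start index of every run of consecutive `target` values."""
    starts = []
    index = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)  # consume the run to learn its length
        if value == target:
            starts.append(index)
        index += length
    return starts

a = [0] * 8 + [1] * 8 + [0] * 7 + [1] * 7 + [0] * 4
print(run_starts(a))  # [8, 23]
```

Unlike the zip-based answer, which seeds the comparison with a leading 1, this version also reports index 0 when the list starts with a 1.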
I have this code:
gs = open("graph.txt", "r")
gp = gs.readline()
gp_splitIndex = gp.find(" ")
gp_nodeCount = int(gp[0:gp_splitIndex])
gp_edgeCount = int(gp[gp_splitIndex+1:-1])
matrix = [] # predeclare the array
for i in range(0, gp_nodeCount):
    matrix.append([])
    for y in range(0, gp_nodeCount):
        matrix[i].append(0)
for i in range(0, gp_edgeCount-1):
    gp = gs.readline()
    gp_splitIndex = gp.find(" ") # get the index of space, dividing the 2 numbers on a row
    gp_from = int(gp[0:gp_splitIndex])
    gp_to = int(gp[gp_splitIndex+1:-1])
    matrix[gp_from][gp_to] = 1
print matrix
The file graph.txt contains this:
5 10
0 1
1 2
2 3
3 4
4 0
0 3
3 1
1 4
4 2
2 0
The first two numbers tell me that the graph has 5 nodes and 10 edges. The following number pairs describe the edges between nodes. For example, "1 4" means an edge between nodes 1 and 4.
Problem is, the output should be this:
[[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
But instead of that, I get this:
[[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
Only one number is different and I can't understand why is this happening. The edge "3 1" is not present. Can someone explain, where is the problem?
Change for i in range(0, gp_edgeCount-1): to
for i in range(0, gp_edgeCount):
The range() function already does the "-1" for you, because it stops before the end value: range(0, 3) yields 0, 1, 2.
And it is not the "3 1" edge that is missing, it is the "2 0" edge that is missing, and that is the last edge. The matrices start counting at 0.
Matthias has it; you don't need edgeCount - 1 since the range function doesn't include the end value in the iteration.
There are several other things you can do to clean up your code:
The with statement is preferred for opening files, since it closes them automatically for you.
You don't need to call find and manually slice, split already does what you want.
You can convert and assign directly to a pair of numbers using a generator expression and iterable unpacking
You can call range with just an end value, the 0 start is implicit.
The multiplication operator is handy for initializing lists
With all of those changes:
with open('graph.txt', 'r') as graph:
    node_count, edge_count = (int(n) for n in graph.readline().split())
    matrix = [[0]*node_count for _ in range(node_count)]
    for i in range(edge_count):
        src, dst = (int(n) for n in graph.readline().split())
        matrix[src][dst] = 1
print matrix
# [[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
Keeping your code and style (it could of course be made much more readable):
gs = open("graph.txt", "r")
gp = gs.readline()
gp_splitIndex = gp.split(" ")
gp_nodeCount = int(gp_splitIndex[0])
gp_edgeCount = int(gp_splitIndex[1])
matrix = [] # predeclare the array
for i in range(0, gp_nodeCount):
    matrix.append([])
    for y in range(0, gp_nodeCount):
        matrix[i].append(0)
for i in range(0, gp_edgeCount):
    gp = gs.readline()
    gp_Index = gp.split(" ") # split the row at the space dividing the 2 numbers
    gp_from = int(gp_Index[0])
    gp_to = int(gp_Index[1])
    matrix[gp_from][gp_to] = 1
print matrix
Exactly: the last line of your file, the 2 0, is never read. Hence the missing 1. Have a nice day!
The other answers are correct. Here is another version, similar to the one by tzaman:
with open('graph.txt', mode='r') as txt_file:
    lines = [l.strip() for l in txt_file.readlines()]
    number_pairs = [[int(n) for n in line.split(' ')] for line in lines]
    header = number_pairs[0]
    edge_pairs = number_pairs[1:]
    num_nodes, num_edges = header
    edges = [[0] * num_nodes for _ in xrange(num_nodes)]
    for edge_start, edge_end in edge_pairs:
        edges[edge_start][edge_end] = 1
print edges
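The snippets in this thread are Python 2 (print statement, xrange). Below is a Python 3 sketch of the same idea that reads from any iterable of lines, so it can be tried without a graph.txt on disk. build_adjacency_matrix is a hypothetical name, and the io.StringIO sample simply reproduces the file contents from the question.

```python
import io

def build_adjacency_matrix(lines):
    """Build a directed adjacency matrix from an iterable of text lines."""
    rows = iter(lines)
    node_count, edge_count = (int(n) for n in next(rows).split())
    matrix = [[0] * node_count for _ in range(node_count)]
    for _ in range(edge_count):  # range already stops before edge_count
        src, dst = (int(n) for n in next(rows).split())
        matrix[src][dst] = 1
    return matrix

sample = io.StringIO("5 10\n0 1\n1 2\n2 3\n3 4\n4 0\n0 3\n3 1\n1 4\n4 2\n2 0\n")
matrix = build_adjacency_matrix(sample)
print(matrix)
# [[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
```

With a real file, the call would look like build_adjacency_matrix(open_file) inside a with open('graph.txt') as open_file: block, since a file object is itself an iterable of lines.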