Why will the loop in my allSubStrings function not work? - python

I am trying to write a Python program that has two functions. The first finds all substrings of a word with a given length and adds them to a list (e.g. "hello" with x = 3 would return 'hel', 'ell', 'llo'). The second function uses this to find all possible substrings in a word. However, whenever I run it through a loop, the second function does not work. Can someone explain why?
def subString(string, x):
    cutList = []
    j = 0
    i = 0
    while(j < 1):
        sliceText = slice(i, i + x)
        cut = string[sliceText]
        if (len(cut) == x):
            cutList.append(cut)
        else:
            j += 5
        i = i + 1
    return cutList
def allSubStrings(string):
    fullList = []
    for k in range(len(string)):
        tempList = subString(string, k)
        fullList.extend(tempList)
        print(k)
    return fullList

The problem is that subString() goes into an infinite loop when x == 0, because len(cut) == 0 will always be true. Since you don't need zero-length substrings, the loop in allSubStrings() should use range(1, len(string)).
def subString(string, x):
    # the last valid start index is len(string) - x, hence the + 1
    return [string[i:i+x] for i in range(len(string) - x + 1)]
def allSubStrings(string):
    fullList = []
    for k in range(1, len(string)):
        tempList = subString(string, k)
        fullList.extend(tempList)
        print(k)
    return fullList
print(allSubStrings("abcdefgh"))
Output is:
1
2
3
4
5
6
7
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'ab', 'bc', 'cd', 'de', 'ef', 'fg', 'gh', 'abc', 'bcd', 'cde', 'def', 'efg', 'fgh', 'abcd', 'bcde', 'cdef', 'defg', 'efgh', 'abcde', 'bcdef', 'cdefg', 'defgh', 'abcdef', 'bcdefg', 'cdefgh', 'abcdefg', 'bcdefgh']
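Note that a loop over range(1, len(string)) stops one short of the full length, so the whole word itself is never returned. Here is a minimal sketch, assuming the full word should count as a substring too. The names substrings_of_length and all_substrings are only illustrative, not from the question.

```python
def substrings_of_length(text, size):
    """Return every contiguous substring of text that is exactly size long."""
    # The last valid start index is len(text) - size.
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def all_substrings(text):
    """Collect substrings of every length from 1 up to len(text)."""
    result = []
    for size in range(1, len(text) + 1):
        result.extend(substrings_of_length(text, size))
    return result


# Slicing past the end gives an empty string instead of an error, which is
# why a length check against 0 never fails in the original while loop.
print(repr("hello"[7:7]))
print(substrings_of_length("hello", 3))
print(all_substrings("abc"))
```

Starting the outer range at 1 keeps the zero-length case out entirely, so nothing depends on how empty slices behave.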

Related

how to separate alternating digits and characters in string into dict or list?

'L134e2t1C1o1d1e1'
The original string is "LeetCode", but I need to separate the letters from the digits. A number is not always a single digit; it can also have 3-4 digits, like 345.
My code needs to build a dict of key/value pairs: the keys are the characters and the values are the numbers that come right after each character. It should also create two separate lists, one with only the numbers and one with only the letters.
expected output:
letters = ['L', 'e', 't', 'C', 'o', 'd', 'e']
digits = [134,2,1,1,1,1,1]
This code is not properly processing this.
def f(s):
    d = dict()
    letters = list()
    # letters = list(filter(lambda x: not x.isdigit(), s))
    i = 0
    while i < len(s):
        print('----------------------')
        if not s[i].isdigit():
            letters.append(s[i])
        else:
            j = i
            temp = ''
            while j < len(s) and s[j].isdigit():
                j += 1
            substr = s[i:j]
            print(substr)
        i += 1
        print('----END -')
    print(letters)
With the following modification your function separates letters from digits in s:
def f(s):
    letters = list()
    digits = list()
    i = 0
    while i < len(s):
        if not s[i].isdigit():
            letters.append(s[i])
            i += 1
        else:
            j = i
            temp = ''
            while j < len(s) and s[j].isdigit():
                j += 1
            substr = s[i:j]
            i = j
            digits.append(substr)
    print(letters)
    print(digits)
f('L134e2t1C1o1d1e1')
As I said in my comments, you didn't update i after the inner loop terminates, which made i go back to a previous, already processed index.
If I were not allowed to use regex, I would do it the following way:
text = 'L134e2t1C1o1d1e1'
letters = [i for i in text if i.isalpha()]
digits = ''.join(i if i.isdigit() else ' ' for i in text).split()
print(letters)
print(digits)
output
['L', 'e', 't', 'C', 'o', 'd', 'e']
['134', '2', '1', '1', '1', '1', '1']
Explanation: for letters I use a simple list comprehension with a condition; .isalpha() is a str method that checks whether a string (here consisting of one character) is alphabetic. For digits (which should rather be called numbers) I replace every non-digit with a single space, turn the result into a string using ''.join, then use .split() (it splits on runs of one or more whitespace characters). Note that digits is now a list of strs rather than ints; if ints are desired, add the following line:
digits = list(map(int,digits))
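The question also asks for a dict that maps each character to the number after it. A minimal sketch of how the two lists could be paired up, assuming every letter is followed by exactly one number (so both lists have the same length). The name pairs is only illustrative.

```python
text = 'L134e2t1C1o1d1e1'

letters = [ch for ch in text if ch.isalpha()]
# Replace non-digits with spaces, split on whitespace, convert to int.
numbers = [int(tok) for tok in ''.join(ch if ch.isdigit() else ' ' for ch in text).split()]

# A letter can occur more than once ('e' here), so each key keeps a list.
pairs = {}
for letter, number in zip(letters, numbers):
    pairs.setdefault(letter, []).append(number)

print(letters)
print(numbers)
print(pairs)
```

Storing a list per key avoids losing the earlier value when a letter repeats; a plain dict assignment would keep only the last number.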
Your string only had two e's, so I've added one more to complete the example. This is one way you could do it:
import re
t = 'L1e34e2t1C1o1d1e1'
print(re.sub('[^a-zA-Z]', '', t))
Result:
LeetCode
I know you cannot use regex, but to complete this answer, I'll just add a solution:
def f(s):
    d = re.findall('[0-9]+', s)
    l = re.findall('[a-zA-Z]', s)
    print(d)
    print(l)
f('L134e2t1C1o1d1e1')
Result:
['134', '2', '1', '1', '1', '1', '1']
['L', 'e', 't', 'C', 'o', 'd', 'e']
You edited your question and I got a bit confused, so here is some really exhaustive code giving you a list of the letters, a list of the numbers, the dict associating each letter with its numbers, and finally the sentence with each letter repeated the corresponding number of times ...
def f(s):
    letters = [c for c in s if c.isalpha()]
    numbers = [c for c in s if c.isdigit()]
    mydict = {}
    currentKey = ""
    for c in s:
        print(c)
        if c.isalpha():
            mydict[c] = [] if c not in mydict.keys() else mydict[c]
            currentKey = c
        elif c.isdigit():
            mydict[currentKey].append(c)
    sentence = ""
    for i in range(len(letters)):
        count = int(numbers[i])
        while count > 0:
            sentence += letters[i]
            count -= 1
    print(letters)
    print(numbers)
    print(mydict)
    print(sentence)
letters = []
digits = []
dig = ""
for letter in 'L134e2t1C1o1d1e1':
    if letter.isalpha():
        # do not add empty string to list
        if dig:
            # append dig to list of digits
            digits.append(dig)
            dig = ""
        letters.append(letter)
        # if it is an actual letter continue
        continue
    # add digits to `dig`
    dig = dig + letter
# the string ends with digits, so append the last number as well
if dig:
    digits.append(dig)
Try this. The idea is to skip all actual letters and add the digits to dig.
I know there's an accepted answer but I'll throw this one in anyway:
letters = []
digits = []
dct = {}
lc = 'L134e2t1C1o1d1e1'
n = None
for c in lc:
    if c.isalpha():
        if n is not None:
            digits.append(n)
            n = None
        letters.append(c)
    else:
        if n is None:
            n = int(c)
        else:
            n *= 10
            n += int(c)
if n is not None:
    digits.append(n)
for k, v in zip(letters, digits):
    dct.setdefault(k, []).append(v)
print(letters)
print(digits)
print(dct)
Output:
['L', 'e', 't', 'C', 'o', 'd', 'e']
[134, 2, 1, 1, 1, 1, 1]
{'L': [134], 'e': [2, 1], 't': [1], 'C': [1], 'o': [1], 'd': [1]}

Convert array of letters into their counts with Python

I have an array, for example
[A,A,A,B,B,C,A,A,D,D,D,B]
I want to count the entries and convert it to this
[3A,2B,1C,2A,3D,1B]
I have tried a load of if else type logic but I'm having issues. Is there a neat way to do this?
Seems like a good use of itertools.groupby:
from itertools import groupby
l = ['A','A','A','B','B','C','A','A','D','D','D','B']
[f'{len(list(g))}{k}' for k, g in groupby(l)]
# ['3A', '2B', '1C', '2A', '3D', '1B']
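To see why groupby fits here, it may help to look at the groups it produces before they are counted. A small sketch; the names letters, runs and counts are only illustrative.

```python
from itertools import groupby

letters = ['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A', 'D', 'D', 'D', 'B']

# groupby yields one (key, iterator) pair per run of equal neighbours,
# so the second run of 'A' stays separate from the first one.
runs = [(key, list(group)) for key, group in groupby(letters)]
print(runs[:4])

counts = [f"{len(items)}{key}" for key, items in runs]
print(counts)
```

Because groupby only merges adjacent equal items, no sorting is needed (and sorting first would give total counts instead of run lengths).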
This is one way to solve this problem.
letters = ['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A', 'D', 'D', 'D']
previous_letter = letters[0]
counter = 1
res = list()
for i in range(1, len(letters)):
    current_letter = letters[i]
    if current_letter == previous_letter:
        counter += 1
    else:
        # the run that just ended belongs to previous_letter
        res.append(f"{counter}{previous_letter}")
        previous_letter = letters[i]
        counter = 1
res.append(f"{counter}{previous_letter}")
print(res)
The trick is in checking a change of letters and keeping track of the count.
You could take the first item, count how many times it repeats at the front of the list while removing those items with list.pop(0), and do this in a while loop until len(lst) returns 0.
lst = ['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A', 'D', 'D', 'D', 'B']
out = []
while len(lst) > 0:
    s = lst[0]
    c = 0
    # pop the leading run of items equal to s, counting them
    while len(lst) > 0 and lst[0] == s:
        c += 1
        lst.pop(0)
    out.append(f"{c}{s}")

Split a string into chunks of substrings with successively increasing length

Let's say I have this string:
a = 'abcdefghijklmnopqrstuvwxyz'
And I want to split this string into chunks, like below:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz  ']
so that every chunk has a different number of characters. For instance, the first one should have one character, the second two and so on.
If there are not enough characters in the last chunk, then I need to add spaces so it matches the length.
I tried this code so far:
print([a[i: i + i + 1] for i in range(len(a))])
But it outputs:
['a', 'bc', 'cde', 'defg', 'efghi', 'fghijk', 'ghijklm', 'hijklmno', 'ijklmnopq', 'jklmnopqrs', 'klmnopqrstu', 'lmnopqrstuvw', 'mnopqrstuvwxy', 'nopqrstuvwxyz', 'opqrstuvwxyz', 'pqrstuvwxyz', 'qrstuvwxyz', 'rstuvwxyz', 'stuvwxyz', 'tuvwxyz', 'uvwxyz', 'vwxyz', 'wxyz', 'xyz', 'yz', 'z']
Here is my desired output:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz  ']
I don't think any one-liner or for loop will look as elegant, so let's go with a generator:
from itertools import islice, count
def get_increasing_chunks(s):
    it = iter(s)
    c = count(1)
    nxt, c_ = next(it), next(c)
    while nxt:
        # pad the chunk with spaces up to its target length
        yield nxt.ljust(c_)
        nxt, c_ = ''.join(islice(it, c_+1)), next(c)
[*get_increasing_chunks(a)]
# ['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz  ']
Thanks to @Prune's comment, I managed to figure out a way to solve this:
a = 'abcdefghijklmnopqrstuvwxyz'
lst = []
c = 0
for i in range(1, len(a) + 1):
    c += i
    lst.append(c)
print([a[x: y] + ' ' * (i - len(a[x: y])) for i, (x, y) in enumerate(zip([0] + lst, lst), 1) if a[x: y]])
Output:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz  ']
I find the triangular numbers, then do a list comprehension and add spaces if the length is not right.
So what you need is a number that controls how many characters you're going to grab (in this case the iteration count), a second number that remembers what the last index was, and one last number to tell where to stop.
my_str = "abcdefghijklmnopqrstuvwxyz"
last_index = 0
index = 1
iter_count = 1
while True:
    sub_string = my_str[last_index:index]
    print(sub_string)
    last_index = index
    iter_count += 1
    index = index + iter_count
    if last_index > len(my_str):
        break
Note that you don't need the while loop. I was just feeling lazy.
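For what it's worth, the same bookkeeping (a chunk size that grows by one and a start index that remembers where the last chunk ended) can be written with a bounded for loop that also collects the chunks and pads the last one. This is only a sketch; the function name increasing_chunks and the fill parameter are made up for illustration.

```python
def increasing_chunks(text, fill=" "):
    """Split text into chunks of length 1, 2, 3, ... and pad the last one."""
    chunks = []
    start = 0
    # At most len(text) chunks are ever needed, so the loop is bounded.
    for size in range(1, len(text) + 1):
        if start >= len(text):
            break
        # ljust only adds padding when the final chunk comes up short.
        chunks.append(text[start:start + size].ljust(size, fill))
        start += size
    return chunks


print(increasing_chunks("abcdefghijklmnopqrstuvwxyz"))
```

Checking start >= len(text) before slicing means an empty trailing chunk is never produced when the length happens to be an exact triangular number.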
It seems like the split_into recipe at more_itertools can help here. This is less elegant than the answer by @cs95, but perhaps this will help others discover the utility of the itertools module.
Yield a list of sequential items from iterable of length 'n' for each integer 'n' in sizes.
>>> list(split_into([1,2,3,4,5,6], [1,2,3]))
[[1], [2, 3], [4, 5, 6]]
To use this, we need to construct a list of sizes like [1, 2, 3, 4, 5, 6, 7].
import itertools
def split_into(iterable, sizes):
    it = iter(iterable)
    for size in sizes:
        if size is None:
            yield list(it)
            return
        else:
            yield list(itertools.islice(it, size))
a = 'abcdefghijklmnopqrstuvwxyz'
sizes = [1]
# keep adding sizes until they cover the whole string
while sum(sizes) < len(a):
    next_value = sizes[-1] + 1
    sizes.append(next_value)
# sizes = [1, 2, 3, 4, 5, 6, 7]
list(split_into(a, sizes))
# [['a'],
# ['b', 'c'],
# ['d', 'e', 'f'],
# ['g', 'h', 'i', 'j'],
# ['k', 'l', 'm', 'n', 'o'],
# ['p', 'q', 'r', 's', 't', 'u'],
# ['v', 'w', 'x', 'y', 'z']]
chunks = list(map("".join, split_into(a, sizes)))
# ['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz']
# Pad last item with whitespace.
chunks[-1] = chunks[-1].ljust(sizes[-1], " ")
# ['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz  ']
Here is a solution using accumulate from itertools.
>>> from itertools import accumulate
>>> from string import ascii_lowercase
>>> s = ascii_lowercase
>>> n = 0
>>> accum = 0
>>> while accum < len(s):
...     n += 1
...     accum += n
...
>>> L = [s[j:i+j] for i, j in enumerate(accumulate(range(n)), 1)]
>>> L[-1] += ' ' * (n-len(L[-1]))
>>> L
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz  ']
Update: Could also be obtained within the while loop
n = 0
accum = 0
L = []
while accum < len(s):
    n += 1
    L.append(s[accum:accum+n])
    accum += n
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz']
Adding a little to U11-Forward's answer:
a = 'abcdefghijklmnopqrstuvwxyz'
l = list(range(len(a))) # list of numbers from 0 to len(a) - 1
triangular = [sum(l[:i+2]) for i in l] # running sums 1, 1+2, 1+2+3, 1+2+3+4 and so on
print([a[x: y].ljust(i, ' ') for i, (x, y) in enumerate(zip([0] + triangular, triangular), 1) if a[x: y]])
Output:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz  ']
Find the triangular numbers, do a list comprehension and fill with spaces if the length is incorrect.
a = 'abcdefghijklmnopqrstuvwxyz'
inc = 0
output = []
for i in range(0, len(a)):
    # stop once the whole string has been consumed
    if inc >= len(a):
        break
    # take the next i+1 characters (padded with spaces) before advancing inc
    output.append(a[inc: inc+i+1].ljust(i+1))
    inc = inc+i+1
print(output)
Hey, here is the snippet for your required output. I have just altered your logic.
Output:
['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz  ']

Generating n-grams from a string

I need to make a list of all n-grams beginning at the head of the string for each integer n from 1 to M, then return a tuple of M such lists.
def letter_n_gram_tuple(s, M):
    s = list(s)
    output = []
    for i in range(0, M+1):
        output.append(s[i:])
    return(tuple(output))
From letter_n_gram_tuple("abcd", 3) output should be:
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
However, my output is:
(['a', 'b', 'c', 'd'], ['b', 'c', 'd'], ['c', 'd'], ['d']).
Should I use string slicing and then saving slices into the list?
You can use nested for loops: the outer one over the n-gram length, the inner one to slice the string.
def letter_n_gram_tuple(s, M):
    output = []
    for i in range(1, M + 1):
        gram = []
        for j in range(0, len(s)-i+1):
            gram.append(s[j:j+i])
        output.append(gram)
    return tuple(output)
Or just one line with a list comprehension:
output = [[s[j:j+i] for j in range(0, len(s)-i+1)] for i in range(1, M + 1)]
Or use windowed from more_itertools (it yields tuples of characters, so join them back into strings):
import more_itertools
output = [[''.join(w) for w in more_itertools.windowed(s, i)] for i in range(1, M + 1)]
Test and output:
print(letter_n_gram_tuple("abcd", 3))
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
You need one more for loop to iterate over the letters of the string:
def letter_n_gram_tuple(s, M):
    output = []
    for i in range(0, M):
        vals = [s[j:j+i+1] for j in range(len(s)) if len(s[j:j+i+1]) == i+1]
        output.append(vals)
    return tuple(output)
print(letter_n_gram_tuple("abcd", 3))
Output:
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
Use the function below:
def letter_n_gram_tuple(s, M):
    s = list(s)
    output = [s]
    for i in range(M + 1):
        output.append([''.join(sorted(set(a + b), key=lambda x: (a + b).index(x))) for a, b in zip(output[-1], output[-1][1:])])
    return tuple(filter(lambda x: len(x) > 1, output))
And now:
print(letter_n_gram_tuple('abcd',3))
Returns:
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
def n_grams(word,max_size):
    i=1
    output=[]
    while i<= max_size:
        index = 0
        innerArray=[]
        while index < len(word)-i+1:
            innerArray.append(word[index:index+i])
            index+=1
        i+=1
        output.append(innerArray)
        innerArray=[]
    return tuple(output)
print(n_grams("abcd",3))

Find all substrings in a string in python 3 with brute-force

I want to find all substrings that run from an 'A' to a 'B' in L = ['C', 'A', 'B', 'A', 'A', 'X', 'B', 'Y', 'A'] with brute force. This is what I've done:
def find_substring(L):
    t = 0
    s = []
    for i in range(len(L) - 1):
        l = []
        if ord(L[i]) == 65:
            for j in range(i, len(L)):
                l.append(L[j])
                if ord(L[j]) == 66:
                    t = t + 1
                    s.append(l)
    return s, t
Now I want the output:
[['A','B'], ['A','B','A','A','X','B'], ['A','A','X','B'], ['A','X','B']]
But I get:
[['A','B','A','A','X','B','Y','A'],['A','B','A','A','X','B','Y','A'],['A','A','X','B','Y','A'],['A','X','B','Y','A']]
Can someone tell me what I'm doing wrong?
The problem is that the list s holds references to the l lists.
So even though you are appending the correct l lists to s, they are changed after being appended, because later iterations of the j loop keep modifying the same l lists.
You can fix this by appending a copy of the l list: l[:].
Also, you can compare strings directly; there is no need to convert to ASCII codes.
def find_substring(L):
    s = []
    for i in range(len(L) - 1):
        l = []
        if L[i] == 'A':
            for j in range(i, len(L)):
                l.append(L[j])
                if L[j] == 'B':
                    s.append(l[:])
    return s
which now works:
>>> find_substring(['C', 'A', 'B', 'A', 'A', 'X', 'B', 'Y', 'A'])
[['A', 'B'], ['A', 'B', 'A', 'A', 'X', 'B'], ['A', 'A', 'X', 'B'], ['A', 'X', 'B']]
When you append l to s, you are adding a reference to a list which you then continue to grow. You want to append a copy of the l list's contents at the time when you append, to keep it static.
s.append(l[:])
This is a common FAQ; this question should probably be closed as a duplicate.
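In case the reference-versus-copy point is not obvious, here is a tiny standalone sketch of the difference. The variable names rows, snapshots and current are only illustrative.

```python
# Appending the list itself stores a reference to the same object each time.
rows = []
current = []
for ch in "AB":
    current.append(ch)
    rows.append(current)
print(rows)                # both entries show the final contents
print(rows[0] is rows[1])  # True: it is one list, stored twice

# Appending a slice copy stores an independent snapshot instead.
snapshots = []
current = []
for ch in "AB":
    current.append(ch)
    snapshots.append(current[:])
print(snapshots)           # each entry keeps the contents at append time
```

current[:] makes a shallow copy, which is enough here because the items are plain strings.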
You would be better off first finding all indices of 'A' and 'B', then iterating over those, avoiding brute force.
def find_substrings(lst):
    idx_A = [i for i, c in enumerate(lst) if c == 'A']
    idx_B = [i for i, c in enumerate(lst) if c == 'B']
    return [lst[i:j+1] for i in idx_A for j in idx_B if j > i]
You can reset l to a copy of the list after l is appended: put l = l[:] right after the last append.
So, you want all the substrings that start with 'A' and end with 'B'?
When you use @Joeidden's code, you can change the for i in range(len(L) - 1): to for i in range(len(L)): because only strings that end with 'B' will be appended to s.
def find_substring(L):
    s = []
    for i in range(len(L)):
        l = []
        if L[i] == 'A':
            for j in range(i, len(L)):
                l.append(L[j])
                if L[j] == 'B':
                    s.append(l[:])
    return s
Another slightly different approach would be this:
L = ['C', 'A', 'B', 'A', 'A', 'X', 'B', 'Y', 'A']
def find_substring(L):
    output = []
    # Start searching for A.
    for i in range(len(L)):
        # If you found one start searching all B's until you reach the end.
        if L[i]=='A':
            for j in range(i,len(L),1):
                # If you found a B, append the sublist from i index to j+1 index (positions of A and B respectively).
                if L[j]=='B':
                    output.append(L[i:j+1])
    return output
result = find_substring(L)
print(result)
Output:
[['A', 'B'], ['A', 'B', 'A', 'A', 'X', 'B'], ['A', 'A', 'X', 'B'], ['A', 'X', 'B']]
In case you need a list comprehension of the above:
def find_substring(L):
    output = [L[i:j+1] for i in range(len(L)) for j in range(i,len(L),1) if L[i]=='A' and L[j]=='B']
    return output
