How to get a certain number of letters from a list? - python

I have a list of 26 numbers. I want to print a list of letters according to those numbers. For example, given this list (26 numbers read from input):
[0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
I would like the output to be like this:
[e,e,l,s]
'e' appears twice in the output because index 4 corresponds to 'e' in the English alphabet and the number at index 4 is 2. Likewise 'l' appears once because it is at index 11 and the number there is 1, and the same goes for 's'. The other letters don't appear because their numbers are zero.
As another example, given this 26-number input:
[1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
The output should be:
[a,b,b,c,c,d,d,d,e,e,e,e,g,g,g,h,h,h,h,i,i,i,i,j,k,k,k,l,m,m,m,m,n,n,n,n,o,u,u,u,u,v,v,w,w,w,x,x,y,y,z]
Is there a way to do this in Python 3?

You can use chr(97 + item_index) to get the letter for each index and then multiply it by the count stored at that index:
In [40]: [j * chr(97 + i) for i, j in enumerate(lst) if j]
Out[40]: ['ee', 'l', 's']
If you want them as separate items, you can use the itertools module:
In [44]: from itertools import repeat, chain
In [45]: list(chain.from_iterable(repeat(chr(97 + i), j) for i, j in enumerate(lst) if j))
Out[45]: ['e', 'e', 'l', 's']

Yes, it is definitely possible in Python 3.
Firstly, define an example list (as you did) of numbers and an empty list to store the alphabetical results.
The logic that links a letter to its index is chr(97 + index): ord("a") is 97, so conversely chr(97) is "a". The first index is 0, so 97 stays as it is, and as the index increases during iteration the letter advances through the alphabet too.
Next, use a nested for loop: the outer loop iterates over the list of numbers and the inner loop appends the same letter as many times as the number says.
We could do result.append(chr(97 + i) * my_list[i]) in the first loop itself, but that would not yield every letter separately as in [a,b,b,c,c,d,d,d...]; instead it would look like [a,bb,cc,ddd...].
my_list = [1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
result = []
for i in range(len(my_list)):
    if my_list[i] > 0:
        for j in range(my_list[i]):
            result.append(chr(97 + i))
    else:
        pass
print(result)
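The index-to-letter mapping described above can also be checked directly. This is only a minimal sketch; expand_counts is a hypothetical helper name that does not appear in the answer, and the input is assumed to be a list of 26 non-negative integers.

```python
def expand_counts(counts):
    """Return one letter per count: index 0 maps to 'a', index 25 to 'z'."""
    base = ord("a")  # 97, so chr(base + 4) is 'e'
    # The inner "for _ in range(n)" repeats each letter n times (zero times when n == 0).
    return [chr(base + i) for i, n in enumerate(counts) for _ in range(n)]

counts = [0] * 26
counts[4] = 2   # 'e'
counts[11] = 1  # 'l'
counts[18] = 1  # 's'
print(expand_counts(counts))  # ['e', 'e', 'l', 's']
```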

An alternative to the wonderful answer by @Kasramvd
import string
n = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
res = [i * c for i, c in zip(n, string.ascii_lowercase) if i]
print(res) # -> ['ee', 'l', 's']
Your second example produces:
['a', 'bb', 'cc', 'ddd', 'eeee', 'ggg', 'hhhh', 'iiii', 'j', 'kkk', 'l', 'mmmm', 'nnnn', 'o', 'uuuu', 'vv', 'www', 'xx', 'yy', 'z']
Splitting the strings ('bb' into 'b', 'b') can be done with the standard flattening idiom:
[x for y in something for x in y]
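Applied to the first example, the flattening idiom might look like the sketch below. The names grouped and flat are illustrative and not part of the answer.

```python
import string

n = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]

# One string per letter with a non-zero count, e.g. 'ee' for a count of 2.
grouped = [count * letter for count, letter in zip(n, string.ascii_lowercase) if count]

# Iterating over a string yields its characters, so this splits 'ee' into 'e', 'e'.
flat = [ch for group in grouped for ch in group]

print(grouped)  # ['ee', 'l', 's']
print(flat)     # ['e', 'e', 'l', 's']
```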

Using a slightly different approach, which gives the characters individually as in your example:
import string
import numpy as np
a = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
# repeat each index as many times as its count
alphabet_lookup = np.repeat(np.arange(len(a)), a)
letter_lookup = np.array(list(string.ascii_lowercase))
res = letter_lookup[alphabet_lookup]
print(res)
To get
['e' 'e' 'l' 's']

Related

Correct word generation without repetitions

How many five-letter words can you make from a 26-letter alphabet (no repetitions)?
I am writing a program that generates names (just words) of 5 letters in the format consonant_vowel_consonant_vowel_consonant, using only Latin letters. I just want to understand how many times I have to run the generation loop. At 65780 iterations, for example, repetitions already begin. Can you please tell me how to do it correctly?
import random
import xlsxwriter
consonants = ['B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q',
              'R', 'S', 'T', 'V', 'W', 'X', 'Z']
vowels = ['A', 'E', 'I', 'O', 'U', 'Y']
workbook = xlsxwriter.Workbook('GeneratedNames.xlsx')
worksheet = workbook.add_worksheet()
def names_generator(size=5, chars=consonants + vowels):
    for y in range(65780):
        toggle = True
        _id = ""
        for i in range(size):
            if toggle:
                toggle = False
                _id += random.choice(consonants)
            else:
                toggle = True
                _id += random.choice(vowels)
        worksheet.write(y, 0, _id)
        print(_id)
    workbook.close()
names_generator()
You can use itertools.combinations to get 3 different consonants and 2 different vowels and get the permutations of those to generate all possible "names".
from itertools import combinations, permutations
names = [a+b+c+d+e for cons in combinations(consonants, 3)
                   for a, c, e in permutations(cons)
                   for vow in combinations(vowels, 2)
                   for b, d in permutations(vow)]
There are only 205,200 = 20x19x18x6x5 in total, so this will take no time at all for 5 letters, but will quickly take longer for more. That is, if by "no repetitions" you mean that no letter should occur more than once. If, instead, you just want that no consecutive letters are repeated (which is already guaranteed by alternating consonants and vowels), or that no names are repeated (which is guaranteed by constructing them without randomness), you can just use itertools.product instead, for a total of 288,000 = 20x20x20x6x6 names:
from itertools import product
names = [a+b+c+d+e for a, c, e in product(consonants, repeat=3)
                   for b, d in product(vowels, repeat=2)]
If you want to generate them in random order, you could just random.shuffle the list afterwards, or if you want just a few such names, you can use random.sample or random.choice on the resulting list.
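As a rough sketch of that last suggestion, the snippet below builds the full list once and then draws a few names without repetition. The fixed seed and the sample size of 10 are assumptions made only so the example is reproducible.

```python
import random
from itertools import product

consonants = list("BCDFGHJKLMNPQRSTVWXZ")
vowels = list("AEIOUY")

# Every consonant-vowel-consonant-vowel-consonant name, each exactly once.
names = ["".join(p) for p in product(consonants, vowels, consonants, vowels, consonants)]

rng = random.Random(0)           # seed chosen arbitrarily for reproducibility
picked = rng.sample(names, 10)   # sample() never returns the same list element twice
print(picked)

rng.shuffle(names)               # alternatively, put the whole list in random order
```

Because the names are generated exhaustively first, randomness only affects the order or the selection, never the uniqueness.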
If you want to avoid duplicates, you shouldn't use randomness but simply generate all such IDs:
from itertools import product
C = consonants
V = vowels
for id_ in map(''.join, product(C, V, C, V, C)):
    print(id_)
or
from itertools import cycle, islice, product
for id_ in map(''.join, product(*islice(cycle((consonants, vowels)), 5))):
    print(id_)
itertools allows for non repetitive permutations https://docs.python.org/3/library/itertools.html
import itertools, re
names = list(itertools.product(consonants + vowels, repeat=5))
consonants_regex = "(" + "|".join(consonants) + ")"
vowels_regex = "(" + "|".join(vowels) + ")"
search_string = consonants_regex + vowels_regex + consonants_regex + vowels_regex + consonants_regex
names_format = ["".join(name) for name in names if re.match(search_string, "".join(name))]
Output:
>>> len(names)
11881376
>>> len(names_format)
288000
I want to make sure to answer your question
"I just want to understand how many times I have to run the cycle for generation"
since I think it is important to get a better intuition about the problem.
You have 20 consonants and 6 vowels and in total that yields 20x6x20x6x20 = 288000 different combinations for words. Since it is sequential, you can split it up to make that easier to understand. You have 20 different consonants you can put as the 1st letter and for each one 6 vowels you can attach afterwards = 20x6 = 120. Then you can keep going and say for those 120 combinations you can add 20 consonants for each = 120x20 = 2400 ... and so on.
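The step-by-step multiplication above can be cross-checked by enumerating the names. This is only a verification sketch; the variable names total and names are illustrative.

```python
from itertools import product

consonants = list("BCDFGHJKLMNPQRSTVWXZ")  # 20 letters
vowels = list("AEIOUY")                    # 6 letters

# Count by multiplication: 20 x 6 x 20 x 6 x 20
total = len(consonants) * len(vowels) * len(consonants) * len(vowels) * len(consonants)

# Count by enumeration: a set keeps only distinct names.
names = {"".join(p) for p in product(consonants, vowels, consonants, vowels, consonants)}

print(total, len(names))  # 288000 288000
```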

All possible substring in Python

Can anyone help me find all the possible substrings of a string using Python?
E.g:
string = 'abc'
output
a, b, c, ab, bc, abc
P.s : I am a beginner and would appreciate if the solution is simple to understand.
You could do something like:
for length in range(len(string)):
    for index in range(len(string) - length):
        print(string[index:index+length+1])
Output:
a
b
c
ab
bc
abc
Another way is to use itertools.combinations:
from itertools import combinations
s = 'abc'
[
''.join(x)
for size in range(1, len(s) + 1)
for x in (combinations(s, size))
]
Out
['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
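Note that combinations picks characters that need not be adjacent, which is why 'ac' appears even though it is not a contiguous substring of 'abc'. If only contiguous substrings are wanted, one possible sketch is to apply combinations to the slice boundaries instead of the characters:

```python
from itertools import combinations

s = 'abc'

# Each pair (i, j) with i < j from 0..len(s) is one start/end boundary pair.
subs = [s[i:j] for i, j in combinations(range(len(s) + 1), 2)]

print(subs)  # ['a', 'ab', 'abc', 'b', 'bc', 'c']
```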
Every substring contains a unique start index and a unique end index (which is greater than the start index). You can use two for loops to get all unique combinations of indices.
def all_substrings(s):
    all_subs = []
    for end in range(1, len(s) + 1):
        for start in range(end):
            all_subs.append(s[start:end])
    return all_subs
s = 'abc'
print(all_substrings(s)) # prints ['a', 'ab', 'b', 'abc', 'bc', 'c']
You can do it like this:
def subString(s):
    for i in range(len(s)):
        for j in range(i+1,len(s)+1):
            print(s[i:j])
subString("aashu")
a
aa
aas
aash
aashu
a
as
ash
ashu
s
sh
shu
h
hu
u

How to wrap a string or an array around and slice the wrapped string or array in Python?

Before anything: I did read Wrapping around a python list as a slice operation and wrapping around slices in Python / numpy
This question is not a duplicate of any of those two questions simply because this question is a totally different question. So stop downvoting it and do not mark it as a duplicate. In the first mentioned thread, the "wrap" there means something different. For the second mentioned thread, they dealt with ndarray and can only work for integers only.
Real question:
How to slice a string or an array from a point to another point with an end between them?
Essentially, we want to do something like this,
n = whatever we want
print(string[n-5:n+6])
The above code may look normal, but it doesn't work near the edges (near the beginning or the end of the string/array), because Python's slicing doesn't allow slicing past the end of the array and continuing from the beginning. What if n is smaller than 5, or the string is shorter than n+6?
Here's a better example, consider that we have
array = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
We want to print each element together with its two nearest neighbors, as a string, for every element in the array:
print("Two neighbors:")
for i, x in enumerate(array):
    print(array[i-1] + array[i] + array[(i+1)%len(array)])
Output:
Two neighbors:
kab
abc
bcd
cde
def
efg
fgh
ghi
hij
ijk
jka
So far so good, let's do it with four neighbors.
print("Four neighbors:")
for i, x in enumerate(array):
    print(array[i-2] + array[i-1] + array[i] + array[(i+1)%len(array)] + array[(i+2)%len(array)])
Output:
Four neighbors:
jkabc
kabcd
abcde
bcdef
cdefg
defgh
efghi
fghij
ghijk
hijka
ijkab
You can see where this is going: as the desired number of neighbors grows, the number of terms we must type out one by one increases.
Is there a way instead of s[n-3]+s[n-2]+s[n-1]+s[n]+s[n+1]+s[n+2]+s[n+3], we can do something like s[n-3:n+4]?
Note that s[n-3:n]+s[n:(n+4)%len(s)] doesn't work at the edges.
NOTE:
For the particular example above, it is possible to use 3*array, or to add a number of elements to the front and to the back, to essentially "pad" it.
However, this type of answer costs extra memory AND cannot work when we want to wrap around many times.
Consider the following,
# len(string) = 10
# n = 0 or any number we want
print(string[n-499:n+999])
If the start and end indices can be flexible instead of mirroring each other(eg. string[n-2:n+9] instead of string[n-3:n+4]), it is even better.
A solution which doesn't use an excessive amount of memory is as follows
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
def get_sequences(a_list, sequence_length):
    sequences = []
    # use the a_list parameter rather than the global my_list
    for i in range(len(a_list)):
        sequences.append("".join(str(a_list[(x + i) % len(a_list)]) for x in range(sequence_length)))
    return sequences
print(get_sequences(my_list, 2))
print(get_sequences(my_list, 3))
will output
['12', '23', '34', '45', '56', '67', '78', '89', '91']
['123', '234', '345', '456', '567', '678', '789', '891', '912']
This is nice because each sequence is built from a generator expression with modular indexing, so no padded copy of the list is needed.
This could give ideas. The only thing to check is the order in your interval. Works with any n.
array = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
def print_neighbors(n_neighbors):
    for idx in range(len(array)):
        start = (idx- n_neighbors//2) % len(array)
        end = (idx+n_neighbors//2) % len(array) + 1
        if start > end:
            print(''.join(array[start:] + array[:end]))
        else:
            print(''.join(array[start:end]))
>>> print_neighbors(6)
ijkabcd
jkabcde
kabcdef
abcdefg
bcdefgh
cdefghi
defghij
efghijk
fghijka
ghijkab
hijkabc
You could create a class to wrap your original iterable like this:
class WrappingIterable():
    def __init__(self, orig):
        self.orig=orig
    def __getitem__(self, index):
        return self.orig[index%len(self.orig)]
    def __len__(self):
        return len(self.orig)
>>> w = WrappingIterable("qwerty")
>>> for i in range(-2, 8):
...     print(w[i])
t
y
q
w
e
r
t
y
q
w
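The same modular indexing can be extended from single items to a whole range, which is closer to the s[n-3:n+4] notation asked about. The following is only a sketch: wrapped_slice is a hypothetical helper, not a built-in, and it assumes a non-empty sequence and start <= stop.

```python
def wrapped_slice(seq, start, stop):
    """Return the items at indices start..stop-1, each taken modulo len(seq)."""
    n = len(seq)
    items = [seq[i % n] for i in range(start, stop)]
    # Join back into a string when the input was a string; otherwise keep a list.
    return "".join(items) if isinstance(seq, str) else items

array = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
print("".join(wrapped_slice(array, -2, 3)))  # jkabc
print(wrapped_slice("abcdefghij", 8, 13))    # ijabc
```

Since every index is reduced modulo the length, the range may wrap around the sequence any number of times, and start and stop do not have to be symmetric around a center.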
For this particular issue you can use a snippet like this:
def print_neighbors(l, n):
    wrapped = l[-(n//2):] + l + l[:(n//2)]
    for i in range(len(l)):
        print(''.join(wrapped[i:i+n+1]))
l = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
print_neighbors(l, 2)
print_neighbors(l, 4)
Hope it makes sense!

Count letter differences of two strings Python

I'm having some problems with an exercise about strings in python.
I have 2 different lists:
list1= "ABCDEFABCDEF"
and
list2= "AZBYCXDWEVFABCDEF"
I need to compare these two strings position by position (the first letters together, then the second, and so on) up to the length of the shorter one (here the length of list1), and store the letters in new variables depending on whether they are the same or different.
identicals=[]
different=[]
I tried to code something and it seems to find the same ones, but doesn't work on the different ones since it copies them multiple times.
for x in list1:
    for y in list2:
        if list1>list2:
            if x==y:
                identicals.append(x)
            if x!=y :
                different.append(x)
        if list2>list1:
            if y==x:
                identicals.append(y)
            if y!=x:
                different.append(y)
EDIT: Output result should be something like this:
identicals=['A']
different=["Z","B","Y","C","X","D","W","E","V","F","A"]
The thing is that the letter A is only shown on identicals but not in different even if F!=A.
You are getting unwanted duplicates because you have a nested pair of for loops, so each item in list2 gets tested against every item in list1.
The key idea is to iterate over the two strings in parallel. You can do that with the built-in zip function, which yields a tuple of the corresponding items from each iterable you feed it, stopping as soon as one of the iterables runs out of items.
From your example code, it looks like you want to take the items for the different list from the longer string. To do that efficiently, figure out which string is the longer before you start looping.
I've renamed your strings because it's confusing to give strings a name starting with "list".
s1 = "ABCDEFABCDEF"
s2 = "AZBYCXDWEVFABCDEF"
identicals = []
different = []
small, large = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
for x, y in zip(small, large):
    if x == y:
        identicals.append(y)
    else:
        different.append(y)
print(identicals)
print(different)
output
['A']
['Z', 'B', 'Y', 'C', 'X', 'D', 'W', 'E', 'V', 'F', 'A']
We can make the for loop more compact at the expense of readability. We put our destination lists into a tuple and then use the equality test to select which list in that tuple to append to. This works because False has a numeric value of 0, and True has a numeric value of 1.
for x, y in zip(small, large):
    (different, identicals)[x == y].append(y)
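The truncation behavior of zip that this approach relies on can be seen directly. A small sketch using the strings from the question; pairs and matching_positions are illustrative names.

```python
s1 = "ABCDEFABCDEF"
s2 = "AZBYCXDWEVFABCDEF"

# zip stops at the end of the shorter string, so only 12 pairs are produced.
pairs = list(zip(s1, s2))
print(len(pairs))  # 12

# Positions at which both strings hold the same letter.
matching_positions = [i for i, (x, y) in enumerate(pairs) if x == y]
print(matching_positions)  # [0]
```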
The problem is the inner loop. You are comparing each of the letters in list1 with all the letters of list2.
Instead you should have a single loop:
identicals=[]
different=[]
short_list = list1 if len(list1)<= len(list2) else list2
for i in range(len(short_list)):
    if list1[i] == list2[i]:
        identicals.append(list1[i])
    else:
        different.append(short_list[i])
Try this
a = "ABCDEFABCDEF"
b = "AZBYCXDWEVFABCDEF"
import numpy
A = numpy.array(list(a))
B = numpy.array(list(b))
common = A[:len(B)] [ (A[:len(B)] == B[:len(A)]) ]
different = A[:len(B)] [ ~(A[:len(B)] == B[:len(A)]) ]  # ~ inverts the boolean mask
>>> list(common)
['A']
>>> list(different)
['B', 'C', 'D', 'E', 'F', 'A', 'B', 'C', 'D', 'E', 'F']

Multiply an integer in a list by a word in the list

I'm not sure how to multiply a number following a string by the string. I want to find the RMM of a compound so I started by making a dictionary of RMMs then have them added together. My issue is with compounds such as H2O.
name = input("Insert the name of a molecule/atom to find its RMM/RAM: ")
compound = re.sub('([A-Z])', r' \1', name)
Compound = compound.split(' ')
r = re.split('(\d+)', compound)
For example:
When name = H2O
Compound = ['', 'H2', 'O']
r = ['H', '2', 'O']
I want to multiply 2 by H making a value "['H', 'H', 'O']."
TLDR: I want integers following names in a list to print the previously listed object 'x' amount of times (e.g. [O, 2] => O O, [C, O, 2] => C O O)
The question is somewhat complicated, so let me know if I can clarify it. Thanks.
How about the following, after you define compound:
test = re.findall(r'([a-zA-Z]+)(\d*)', compound)
expand = [a*int(b) if len(b) > 0 else a for (a, b) in test]
This matches one or more letters followed by optional digits. If there are no digits we just return the letters; if there are digits we repeat the letters that many times. This doesn't quite return what you expected (it returns ['HH', 'O'] instead), so please let me know if this suits.
EDIT: assuming your compounds use elements consisting of either a single capital letter or a single capital followed by a number of lowercase letters, you can add the following:
final = re.findall('[A-Z][a-z]*', ''.join(expand))
Which will return your elements each as a separate entry in the list, e.g. ['H', 'H', 'O']
EDIT 2: with the assumption of my previous edit, we can actually reduce the whole thing down to just a couple of lines:
import re
name = input("Insert the name of a molecule/atom to find its RMM/RAM: ")
test = re.findall(r'([A-Z][a-z]*)(\d*)', name)
final = re.findall('[A-Z][a-z]*', ''.join([a*int(b) if len(b) > 0 else a for (a, b) in test]))
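Under the same assumption (an element is one capital letter followed by optional lowercase letters), the join-and-resplit step can be avoided by repeating each element directly. This is a sketch only; expand_formula is a hypothetical name, and parentheses such as in Ca(OH)2 are not handled.

```python
import re

def expand_formula(formula):
    """Expand e.g. 'H2O' into ['H', 'H', 'O']; a missing count means 1."""
    parts = re.findall(r'([A-Z][a-z]*)(\d*)', formula)
    return [element
            for element, count in parts
            for _ in range(int(count) if count else 1)]

print(expand_formula("H2O"))                 # ['H', 'H', 'O']
print(expand_formula("NaCl"))                # ['Na', 'Cl']
print(expand_formula("C6H12O6").count("H"))  # 12
```

Multi-digit counts work because \d* captures all consecutive digits after the element symbol.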
You could probably do something like...
compound = 'h2o'
final = []
for x in range(len(compound)):
    if compound[x].isdigit() and x != 0:
        for count in range(int(compound[x])-1):
            final.append(compound[x-1])
    else:
        final.append(compound[x])
Use regex and a generator function:
import re
def multilpy_string(seq):
    regex = re.compile("([a-zA-Z][0-9])|([a-zA-Z])")
    for alnum, alpha in regex.findall(''.join(seq)):
        if alnum:
            for char in alnum[0] * int(alnum[1]):
                yield char
        else:
            yield alpha
l = ['C', 'O', '2'] # ['C', 'O', 'O']
print(list(multilpy_string(l)))
We join your list back together using ''.join. Then we compile a regex pattern that matches two kinds of substrings. If a letter is followed by a digit, the pair is put in the first group. If it is a single letter, it is put in the second group. We then iterate over the matches, and whichever group is non-empty determines the values we yield.
Here are a few nested for comprehensions to get it done in two lines:
In [1]: groups = [h*int(''.join(t)) if len(t) else h for h, *t in re.findall(r'[A-Z]\d*', 'H2O')]
In [2]: [c for cG in groups for c in cG]
Out[2]: ['H', 'H', 'O']
Note: I am deconstructing and reconstructing strings so this is probably not the most efficient method.
Here is a longer example:
In [2]: def findElements(molecule):
   ...:     groups = [h*int(''.join(t)) if len(t) else h for h, *t in re.findall(r'[A-Z]\d*', molecule)]
   ...:     return [c for cG in groups for c in cG]
In [3]: findElements("H2O5S7D")
Out[3]: ['H', 'H', 'O', 'O', 'O', 'O', 'O', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'D']
In Python 3 (this works in Python 2 as well) you can simply multiply a string by an integer.
for example:
print("H"*2) # HH
print(2*"H") # HH
Proof that this information is useful:
r = ['H', '2', 'O']
replacements = [(index, int(ch)) for index, ch in enumerate(r) if ch.isdigit()]
for position, times in replacements:
    r[position] = (times - 1) * r[position - 1]
# flatten the result
r = [ch for s in r for ch in s]
print(r) # ['H', 'H', 'O']
