How many five-letter words can you make from a 26-letter alphabet (no repetitions)?
I am writing a program that generates names (just words) of 5 letters in the format consonant_vowel_consonant_vowel_consonant, using only Latin letters. I just want to understand how many times I have to run the generation loop. At 65780 iterations, for example, repetitions already begin. Can you please tell me how to do it correctly?
import random
import xlsxwriter
consonants = ['B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q',
              'R', 'S', 'T', 'V', 'W', 'X', 'Z']
vowels = ['A', 'E', 'I', 'O', 'U', 'Y']
workbook = xlsxwriter.Workbook('GeneratedNames.xlsx')
worksheet = workbook.add_worksheet()
def names_generator(size=5, chars=consonants + vowels):
    for y in range(65780):
        toggle = True
        _id = ""
        for i in range(size):
            if toggle:
                toggle = False
                _id += random.choice(consonants)
            else:
                toggle = True
                _id += random.choice(vowels)
        worksheet.write(y, 0, _id)
        print(_id)
    workbook.close()
names_generator()
You can use itertools.combinations to get 3 different consonants and 2 different vowels and get the permutations of those to generate all possible "names".
from itertools import combinations, permutations
names = [a+b+c+d+e for cons in combinations(consonants, 3)
                   for a, c, e in permutations(cons)
                   for vow in combinations(vowels, 2)
                   for b, d in permutations(vow)]
There are only 205,200 = 20x19x18x6x5 in total, so this will take no time at all for 5 letters, but will quickly take longer for more. That is, if by "no repetitions" you mean that no letter should occur more than once. If, instead, you just want that no consecutive letters are repeated (which is already guaranteed by alternating consonants and vowels), or that no names are repeated (which is guaranteed by constructing them without randomness), you can just use itertools.product instead, for a total of 288,000 = 20x20x20x6x6 names:
from itertools import product
names = [a+b+c+d+e for a, c, e in product(consonants, repeat=3)
                   for b, d in product(vowels, repeat=2)]
If you want to generate them in random order, you could just random.shuffle the list afterwards, or if you want just a few such names, you can use random.sample or random.choice on the resulting list.
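As a minimal sketch of that last suggestion (the sample size of 10 and the variable names are arbitrary choices for illustration), building the full list once and drawing from it with random.sample gives distinct names, which calling random.choice in a loop cannot guarantee:

```python
import random
from itertools import product

consonants = list("BCDFGHJKLMNPQRSTVWXZ")
vowels = list("AEIOUY")

# Every consonant-vowel-consonant-vowel-consonant name, each exactly once.
all_names = [''.join(p) for p in product(consonants, vowels, consonants, vowels, consonants)]

# random.sample draws without replacement, so the picks are all different.
picked = random.sample(all_names, 10)
print(len(all_names))
print(picked)
```

If every name is needed in random order, random.shuffle(all_names) on the same list would do instead.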
If you want to avoid duplicates, you shouldn't use randomness but simply generate all such IDs:
from itertools import product
C = consonants
V = vowels
for id_ in map(''.join, product(C, V, C, V, C)):
    print(id_)
or
from itertools import cycle, islice, product
for id_ in map(''.join, product(*islice(cycle((consonants, vowels)), 5))):
    print(id_)
itertools allows for non repetitive permutations https://docs.python.org/3/library/itertools.html
import itertools, re
names = list(itertools.product(consonants + vowels, repeat=5))
consonants_regex = "(" + "|".join(consonants) + ")"
vowels_regex = "(" + "|".join(vowels) + ")"
search_string = consonants_regex + vowels_regex + consonants_regex + vowels_regex + consonants_regex
names_format = ["".join(name) for name in names if re.match(search_string, "".join(name))]
Output:
>>> len(names)
11881376
>>> len(names_format)
288000
I want to make sure to answer your question, "I just want to understand how many times I have to run the cycle for generation", since I think it is important to get a better intuition about the problem.
You have 20 consonants and 6 vowels and in total that yields 20x6x20x6x20 = 288000 different combinations for words. Since it is sequential, you can split it up to make that easier to understand. You have 20 different consonants you can put as the 1st letter and for each one 6 vowels you can attach afterwards = 20x6 = 120. Then you can keep going and say for those 120 combinations you can add 20 consonants for each = 120x20 = 2400 ... and so on.
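The counting argument above can be checked with a short sketch. The second half is an addition of mine rather than part of the answer: it uses the standard formula N(1 - (1 - 1/N)^n) for the expected number of distinct values after n uniform random draws from N possibilities, which suggests why a loop of 65780 random draws already contains repeats.

```python
from itertools import product
from math import prod

consonants = "BCDFGHJKLMNPQRSTVWXZ"   # 20 letters
vowels = "AEIOUY"                     # 6 letters

# One entry per letter position: consonant, vowel, consonant, vowel, consonant.
slots = [consonants, vowels, consonants, vowels, consonants]

total = prod(len(s) for s in slots)              # 20 * 6 * 20 * 6 * 20
enumerated = sum(1 for _ in product(*slots))     # brute-force check of the same count
print(total, enumerated)

# Expected number of distinct names after `draws` random picks with replacement.
draws = 65780
expected_distinct = total * (1 - (1 - 1 / total) ** draws)
print(round(expected_distinct))
```

The expected number of distinct names comes out below the number of draws, so some of the randomly generated names must be duplicates.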
Related
I have a list of 26 numbers. I want to print out a list of letters of the alphabet according to those numbers. For example, I have a list (consisting of 26 numbers from input):
[0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
I would like the output to be like this:
[e,e,l,s]
'e' appears in the output twice because 'e' is at index 4 of the English alphabet and the number at index 4 is 2. The same goes for 'l', since it is at index 11 and its number is 1, and likewise for 's'. The other letters don't appear because their numbers are zero.
For example, I give another 26-digit input. Like this:
[1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
The output should be:
[a,b,b,c,c,d,d,d,e,e,e,e,g,g,g,h,h,h,h,i,i,i,i,j,k,k,k,l,m,m,m,m,n,n,n,n,o,u,u,u,u,v,v,w,w,w,x,x,y,y,z]
Is there any way to do this in Python 3?
You can use chr(97 + item_index) to get the respective letter and then multiply it by the count stored at that index:
In [40]: [j * chr(97 + i) for i, j in enumerate(lst) if j]
Out[40]: ['ee', 'l', 's']
If you want them separate you can utilize itertools module:
In [44]: from itertools import repeat, chain
In [45]: list(chain.from_iterable(repeat(chr(97 + i), j) for i, j in enumerate(lst) if j))
Out[45]: ['e', 'e', 'l', 's']
Yes, it is definitely possible in Python 3.
Firstly, define an example list (as you did) of numbers and an empty list to store the alphabetical results.
The link between index and letter is chr(97 + index): ord("a") is 97, so conversely chr(97) is 'a'. The first index is 0, so 97 stays as it is, and as the index increases during iteration the letter advances through the alphabet with it.
Next, use a nested for-loop: the outer loop iterates over the list of numbers and the inner loop appends the same letter as many times as the number says.
We could do result.append(chr(97 + i) * my_list[i]) in the first loop itself, but it wouldn't yield every letter separately as in [a,b,b,c,c,d,d,d...]; rather it would look like [a,bb,cc,ddd...].
my_list = [1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
result = []
for i in range(len(my_list)):
    if my_list[i] > 0:
        for j in range(my_list[i]):
            result.append(chr(97 + i))
    else:
        pass
print(result)
An alternative to the wonderful answer by @Kasramvd
import string
n = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
res = [i * c for i, c in zip(n, string.ascii_lowercase) if i]
print(res) # -> ['ee', 'l', 's']
Your second example produces:
['a', 'bb', 'cc', 'ddd', 'eeee', 'ggg', 'hhhh', 'iiii', 'j', 'kkk', 'l', 'mmmm', 'nnnn', 'o', 'uuuu', 'vv', 'www', 'xx', 'yy', 'z']
Splitting the strings ('bb' to 'b', 'b') can be done with the standard schema:
[x for y in something for x in y]
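A minimal sketch of that flattening schema applied to the first example (the names counts, grouped and flat are only illustrative):

```python
import string

counts = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]

# One string per letter with a non-zero count, e.g. 'ee'.
grouped = [n * c for n, c in zip(counts, string.ascii_lowercase) if n]

# Iterating over a string yields its characters, so this splits 'ee' into 'e', 'e'.
flat = [ch for group in grouped for ch in group]
print(flat)  # ['e', 'e', 'l', 's']
```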
Using a slightly different approach, which gives the characters individually as in your example:
import string
import numpy as np
a = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
# Repeat each index as many times as its count, e.g. [4, 4, 11, 18]
alphabet_lookup = np.repeat(np.arange(len(a)), a)
letter_lookup = np.array(list(string.ascii_lowercase))
res = letter_lookup[alphabet_lookup]
print(res)
To get
['e' 'e' 'l' 's']
I'm not sure how to multiply a number following a string by the string. I want to find the RMM of a compound so I started by making a dictionary of RMMs then have them added together. My issue is with compounds such as H2O.
import re
name = input("Insert the name of a molecule/atom to find its RMM/RAM: ")
compound = re.sub('([A-Z])', r' \1', name)
Compound = compound.split(' ')
r = re.split(r'(\d+)', compound)
For example:
When name = H2O
Compound = ['', 'H2', 'O']
r = ['H', '2', 'O']
I want to multiply 2 by H making a value "['H', 'H', 'O']."
TLDR: I want integers following names in a list to print the previously listed object 'x' amount of times (e.g. [O, 2] => O O, [C, O, 2] => C O O)
The question is somewhat complicated, so let me know if I can clarify it. Thanks.
How about the following, after you define compound:
test = re.findall(r'([a-zA-Z]+)(\d*)', compound)
expand = [a*int(b) if len(b) > 0 else a for (a, b) in test]
Match on letters of 1 or more instances followed by an optional number of digits - if there's no digit we just return the letters, if there is a digit we duplicate the letters by the appropriate value. This doesn't quite return what you expected - it instead will return ['HH', 'O'] - so please let me know if this suits.
EDIT: assuming your compounds use elements consisting of either a single capital letter or a single capital followed by a number of lowercase letters, you can add the following:
final = re.findall('[A-Z][a-z]*', ''.join(expand))
Which will return your elements each as a separate entry in the list, e.g. ['H', 'H', 'O']
EDIT 2: with the assumption of my previous edit, we can actually reduce the whole thing down to just a couple of lines:
import re
name = input("Insert the name of a molecule/atom to find its RMM/RAM: ")
test = re.findall(r'([A-Z][a-z]*)(\d*)', name)
final = re.findall('[A-Z][a-z]*', ''.join([a*int(b) if len(b) > 0 else a for (a, b) in test]))
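Since the stated goal is the RMM itself, the expansion step can also be skipped. The following is only a sketch under assumptions: the ATOMIC_MASS table and the relative_mass function are hypothetical names, the table holds approximate values for just three elements, and formulas with parentheses are not handled.

```python
import re

# Approximate relative atomic masses; a hypothetical, deliberately tiny table.
ATOMIC_MASS = {'H': 1.008, 'C': 12.011, 'O': 15.999}

def relative_mass(formula):
    """Sum the atomic masses in a simple formula such as 'H2O' or 'CO2'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r'([A-Z][a-z]*)(\d*)', formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

print(round(relative_mass('H2O'), 3))
print(round(relative_mass('CO2'), 3))
```

An element missing from the table raises KeyError here, which may be preferable to silently returning a wrong mass.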
You could probably do something like...
compound = 'h2o'
final = []
for x in range(len(compound)):
    if compound[x].isdigit() and x != 0:
        for count in range(int(compound[x])-1):
            final.append(compound[x-1])
    else:
        final.append(compound[x])
Use regex and a generator function:
import re
def multiply_string(seq):
    # Group 1: a letter followed by a digit; group 2: a lone letter
    regex = re.compile("([a-zA-Z][0-9])|([a-zA-Z])")
    for alnum, alpha in regex.findall(''.join(seq)):
        if alnum:
            for char in alnum[0] * int(alnum[1]):
                yield char
        else:
            yield alpha
l = ['C', 'O', '2'] # ['C', 'O', 'O']
print(list(multiply_string(l)))
We join your list back together using ''.join. Then we compile a regex pattern that matches two kinds of substrings. A letter followed by a digit goes into the first group; a single letter on its own goes into the second group. We then iterate over the matches: if the first group matched, we yield the letter repeated the given number of times; otherwise we yield the lone letter.
Here are a few nested for comprehensions to get it done in two lines:
In [1]: groups = [h*int(''.join(t)) if len(t) else h for h, *t in re.findall(r'[A-Z]\d*', 'H2O')]
In [2]: [c for cG in groups for c in cG]
Out[2]: ['H', 'H', 'O']
Note: I am deconstructing and reconstructing strings so this is probably not the most efficient method.
Here is a longer example:
In [2]: def findElements(molecule):
   ...:     groups = [h*int(''.join(t)) if len(t) else h for h, *t in re.findall(r'[A-Z]\d*', molecule)]
   ...:     return [c for cG in groups for c in cG]
In [3]: findElements("H2O5S7D")
Out[3]: ['H', 'H', 'O', 'O', 'O', 'O', 'O', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'D']
In python3 (I don't know about python2) you can simply multiply strings.
for example:
print("H"*2) # HH
print(2*"H") # HH
Proof that this information is useful:
r = ['H', '2', 'O']
replacements = [(index, int(ch)) for index, ch in enumerate(r) if ch.isdigit()]
for position, times in replacements:
    r[position] = (times - 1) * r[position - 1]
# flatten the result
r = [ch for s in r for ch in s]
print(r) # ['H', 'H', 'O']
I am doing a python project for my Intro to CSC class. We are given a .txt file that is basically 200,000 lines of single words. We have to read in the file line by line, and count how many times each letter in the alphabet appears as the first letter of a word. I have the count figured out and stored in a list. But now I need to print it in the format
"a:10,898 b:9,950 c:17,045 d:10,596 e:8,735
f:11,257 .... "
Another aspect is that it has to print 5 of the letter counts per line, as I did above.
This is what I am working with so far...
def main():
    file_name = open('dictionary.txt', 'r').readlines()
    counter = 0
    totals = [0]*26
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    for i in file_name:
        for n in range(0,26):
            if i.startswith(alphabet[n]):
                totals[n] = totals[n]+1
    print(totals)
main()
This code currently outputs
[10898, 9950, 17045, 10675, 7421, 7138, 5998, 6619, 6619, 7128, 1505, 1948, 5393, 10264, 4688, 6079, 15418, 890, 10790, 20542, 9463, 5615, 2924, 3911, 142, 658]
I would highly recommend using a dictionary to store the counts. It will greatly simplify your code, and make it much faster. I'll leave that as an exercise for you since this is clearly homework. (other hint: Counter is even better). In addition, right now your code is only correct for lowercase letters, not uppercase ones. You need to add additional logic to either treat uppercase letters as lowercase ones, or treat them independently. Right now you just ignore them.
Having said that, the following will get it done for your current format:
print(', '.join('{}:{}'.format(letter, count) for letter, count in zip(alphabet, totals)))
zip takes n iterables and produces tuples of n elements, with each element coming from one of the inputs. join concatenates a sequence of strings using the supplied separator. And format does string interpolation, filling the placeholders in a string with the provided values according to format specifiers.
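The requested layout also needs thousands separators and five entries per line. One possible sketch follows; format_counts and per_line are hypothetical names, and the sample numbers are the six counts quoted in the question's desired output.

```python
import string

def format_counts(letters, counts, per_line=5):
    """Return 'letter:count' cells, comma-grouped, with per_line cells per row."""
    # The ',' format specifier inserts thousands separators: 10898 -> '10,898'.
    cells = ['{}:{:,}'.format(letter, count) for letter, count in zip(letters, counts)]
    # Slice the cells into consecutive chunks of per_line and join each chunk.
    rows = [' '.join(cells[i:i + per_line]) for i in range(0, len(cells), per_line)]
    return '\n'.join(rows)

sample = [10898, 9950, 17045, 10596, 8735, 11257]
print(format_counts(string.ascii_lowercase, sample))
```

Because zip stops at the shorter input, passing all 26 counts prints the whole alphabet in rows of five with a final row of one.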
Python 3.4: the solution is to read each line of the file into the words variable below, in a loop, and use Counter.
from collections import Counter
import string
words = 'this is a test of functionality'
result = Counter(map(lambda x: x[0], words.split(' ')))
words = 'and this is also very cool'
result = result + Counter(map(lambda x: x[0], words.split(' ')))
counters = ['{letter}:{value}'.format(letter=x, value=result.get(x, 0)) for x in string.ascii_lowercase]
If you print counters, you get:
['a:3', 'b:0', 'c:1', 'd:0', 'e:0', 'f:1', 'g:0', 'h:0', 'i:2', 'j:0', 'k:0', 'l:0', 'm:0', 'n:0', 'o:1', 'p:0', 'q:0', 'r:0', 's:0', 't:3', 'u:0', 'v:1', 'w:0', 'x:0', 'y:0', 'z:0']
I'm trying to compare two lists in order to find the number of elements they have in common.
The main problem I'm having is when either list contains repeated elements, for example
A = [1,1,1,1] and
B = [1,1,2,3]
using the code
n = 0
for x in A:
    if x in B:
        n += 1
print(n)
gives me the output n = 4, as technically all elements of A are in B.
I'd like to get the output n = 2, preferably without using sets. Is there any way I can adapt my code, or a new way of thinking about the problem, to achieve this?
Thanks
It's not entirely clear what your specification is, but if you want the number of elements in A that appear in B, without regard to order, but with regard to multiplicity, use collections.Counter:
>>> from collections import Counter
>>> A = [1,1,1,1]
>>> B = [1,1,2,3]
>>> C = Counter(A) & Counter(B)
>>> sum(C.values())
2
>>> list(C.elements())
[1, 1]
Here is an efficient (O(n logn)) way to do it without using sets:
def count_common(a, b):
    ret = 0
    a = sorted(a)
    b = sorted(b)
    i = j = 0
    while i < len(a) and j < len(b):
        # Three-way comparison (Python 3 has no cmp): -1, 0 or 1
        c = (a[i] > b[j]) - (a[i] < b[j])
        if c == 0:
            ret += 1
        if c <= 0:
            i += 1
        if c >= 0:
            j += 1
    return ret
print(count_common([1,1,1,1], [1,1,2,3]))
If your lists are always sorted, as they are in your example, you can drop the two sorted() calls. This would give an O(n) algorithm.
Here's an entirely different way of thinking about the problem.
Imagine I've got two words, "hello" and "world". To find the common elements, I could iterate through "hello", giving me ['h', 'e', 'l', 'l', 'o']. For each element, if it also appears in the second word, I remove one occurrence of it from the second word.
Is 'h' in ['w', 'o', 'r', 'l', 'd']? No.
Is 'e' in ['w', 'o', 'r', 'l', 'd']? No.
Is 'l' in ['w', 'o', 'r', 'l', 'd']? Yes!
Remove it from "world", giving me ['w', 'o', 'r', 'd'].
Is 'l' in ['w', 'o', 'r', 'd']? No.
Is 'o' in ['w', 'o', 'r', 'd']? Yes!
Remove it from ['w', 'o', 'r', 'd'], giving me ['w', 'r', 'd'].
Compare the length of the original object (make sure you've kept a copy around) to the newly generated object and you will see a difference of 2, indicating 2 common letters.
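The walk-through above could be sketched as follows (the function name count_common and its parameter names are my own choice):

```python
def count_common(first, second):
    """Count shared elements, respecting how often each one occurs."""
    remaining = list(second)          # work on a copy so the caller's data is untouched
    for item in first:
        if item in remaining:
            remaining.remove(item)    # list.remove drops one occurrence only
    # Every removal corresponds to one common element.
    return len(second) - len(remaining)

print(count_common("hello", "world"))            # 2
print(count_common([1, 1, 1, 1], [1, 1, 2, 3]))  # 2
```

Both the membership test and remove scan the list, so this is quadratic in the worst case, which is fine for short inputs.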
So you want the program to check whether only elements at the same indices in the two lists are equal? That would be pretty simple: just iterate over the length of the two lists (which, I presume, are supposed to be of the same length), say using a variable i, and compare A[i] with B[i] at each step.
If you'd like, I could post the code.
If this is not what you want to do, please do make your problem clearer.
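For reference, the index-by-index reading of the problem might look like this minimal sketch (count_same_position is a hypothetical name; zip is used in place of an explicit index variable):

```python
def count_same_position(a, b):
    """Count positions where both lists hold an equal element."""
    # zip pairs a[0] with b[0], a[1] with b[1], and stops at the shorter list.
    return sum(1 for x, y in zip(a, b) if x == y)

print(count_same_position([1, 1, 1, 1], [1, 1, 2, 3]))  # 2
```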
I'm trying to see if a word or sentence has each letter of the alphabet and I can't get it to print all the letters that aren't in the sentence/word.
alpha = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t'
         ,'u','v','w','x','y','z']
x = input('')
counter = 0
counter2 = 0
for i in range(len(x)):
    counter += 1
    for o in range(26):
        counter2 += 1
        if alpha[counter2] not in x[counter]:
and I'm stuck there...
alphabet = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t'
            ,'u','v','w','x','y','z'}
input_chars = set(input())
print(alphabet - input_chars)
All we do is take the set difference between the set of alphabet characters and the set of characters in our input. Note that the set.difference() method (unlike the - operator, which requires both operands to be sets) accepts any iterable as its argument, so we don't even have to turn our input into a set if we don't want to, although doing so may speed up the difference a small amount. Furthermore, there is a built-in string which gives us the ASCII letters, so we could do it like this:
import string
print(set(string.ascii_lowercase).difference(input()))
Using set difference:
import string
x = input()
not_found = set(string.ascii_lowercase) - set("".join(x.split()))
print(list(not_found))
output:
>>>
the quick brown fox
['a', 'd', 'g', 'j', 'm', 'l', 'p', 's', 'v', 'y', 'z']
Since you're already iterating over both strings, there is no need to use counter and counter2.
You were almost there. Python makes list operations simple, so there's no need to iterate over the lists element-by-element using indices:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
sentence = input('Enter a sentence: ').lower() # Because 'a' != 'A'
letters = []
# Collect each distinct letter that appears in the sentence
for letter in sentence:
    if letter in alphabet and letter not in letters:
        letters.append(letter)
print(letters)
Much easier:
import string
x = input()
print([c for c in string.ascii_lowercase if c not in x])
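A slightly fuller sketch of the same idea, wrapped in a function so it can be reused and made case-insensitive (missing_letters is a hypothetical name, and lowercasing the input is my assumption about what is wanted):

```python
import string

def missing_letters(text):
    """Return the lowercase ASCII letters that do not occur in text, in order."""
    present = set(text.lower())   # set membership tests are fast
    return [c for c in string.ascii_lowercase if c not in present]

print(missing_letters("The quick brown fox"))
# ['a', 'd', 'g', 'j', 'l', 'm', 'p', 's', 'v', 'y', 'z']
print(missing_letters("The quick brown fox jumps over the lazy dog"))  # []
```

An empty result means the text contains every letter of the alphabet.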