Randomizing list of words with Python

I need help solving a problem. I have a list of 8 words.
What I need to achieve is to generate all possible variants in which exactly 3 of these words are included. Each variant needs to be saved in a separate .txt file. The same words in different positions should be treated as a different variant.
It needs to be done on a Raspberry Pi in Python.
To be honest, I don't even know where to start...
I am a total noob at any sort of programming...
Any clues how to do it?

You can easily solve this problem by using itertools. In the following example, I produce all possible ordered selections (permutations) of 3 elements from the list l:
>>> import itertools
>>> l = [1, 2, 3, 4, 5]
>>> list(itertools.permutations(l, 3))
[(1, 2, 3),
 (1, 2, 4),
 (1, 2, 5),
 (1, 3, 2),
 (1, 3, 4),
 ...
 (5, 4, 1),
 (5, 4, 2),
 (5, 4, 3)]
That is 5 * 4 * 3 = 60 tuples in total. Unlike combinations, both (1, 2, 3) and (1, 3, 2) appear, because order matters.
Now, if you want to save each of those tuples in its own text file, you can do the following. The tuple has to be turned into a string before it can be written:
for i, e in enumerate(itertools.permutations(l, 3)):
    with open(f"file_{i}.txt", "w") as f:
        f.write(" ".join(str(x) for x in e))
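The following sketch puts the two steps together for the original question: 8 words, exactly 3 per variant, and order matters. Some parts are illustrative assumptions and do not come from the question: the word list, the `variant_N.txt` naming scheme and the helper name `write_permutations`. The demo writes into a temporary directory so it does not clutter the current folder.

```python
import itertools
import os
import tempfile


def write_permutations(words, k, out_dir):
    """Write every ordered selection of k words to its own .txt file in out_dir.

    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, variant in enumerate(itertools.permutations(words, k)):
        path = os.path.join(out_dir, f"variant_{i}.txt")
        with open(path, "w") as f:
            # One variant per file, words separated by spaces
            f.write(" ".join(variant) + "\n")
        paths.append(path)
    return paths


# Hypothetical list of 8 words standing in for the asker's list
words = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]
paths = write_permutations(words, 3, tempfile.mkdtemp())
print(len(paths))  # 8 * 7 * 6 = 336 files
```

Pass a real directory instead of `tempfile.mkdtemp()` to keep the files. Because of the f-strings, this needs Python 3.6 or newer, which current Raspberry Pi OS images ship.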

lmiguelvargasf's answer only covers half the question, so here is the part where you save the word combinations to individual files.
import itertools
import random
import string

class fileNames:
    def __init__(self):
        self.file_names = []

    def randomString(self, stringLength):
        letters = string.ascii_lowercase
        # Draw random names until we get one that has not been used yet
        while True:
            file_name = ''.join(random.choice(letters) for i in range(stringLength))
            if file_name not in self.file_names:
                break
        self.file_names.append(file_name)
        return file_name

# Original word list
l = [1, 2, 3]
# Create a new list containing the permutations
word_combinations = list(itertools.permutations(l, 3))
# Create an instance of the fileNames class
files = fileNames()
# Number of characters in each file name
n = 5
# Save each of these combinations in its own file
for word_comb in word_combinations:
    # The file will be named by a random string containing n characters
    with open('{}.txt'.format(files.randomString(n)), 'w') as f:
        # The file will contain each word separated by a space; change the string below as desired
        f.write('{} {} {}'.format(word_comb[0], word_comb[1], word_comb[2]))
If you want each file name to be an integer that increases by 1 for every file, swap the last part for this:
# Save each of these combinations in its own file
for i, word_comb in enumerate(word_combinations):
    # The file will be named by an integer
    with open('{}.txt'.format(i), 'w') as f:
        # The file will contain each word separated by a space; change the string below as desired
        f.write('{} {} {}'.format(word_comb[0], word_comb[1], word_comb[2]))

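Try using random.choice():
import random
# your code (words is your list of 8 words, file is an open file object)
word = []
for x in range(3):
    word.append(random.choice(words))
file.write(' '.join(word))
Repeat everything after word = [] in a for loop. Note that random.choice can pick the same word twice, so you also need to check for that. You can even check whether two variants are the same using a method described here.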
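The sketch below makes the "pick randomly, then check for duplicates" idea concrete. It keeps drawing random variants into a set until all of them have been seen. It swaps random.choice for random.sample, so a single variant never contains the same word twice. The word list and the function name random_variants are hypothetical. This approach is much slower than enumerating the variants with itertools.permutations and is shown only to illustrate the idea.

```python
import itertools
import random


def random_variants(words, k=3):
    """Collect every ordered k-word variant by random sampling until none are missing."""
    # Number of ordered selections: n * (n - 1) * ... * (n - k + 1)
    total = 1
    for i in range(k):
        total *= len(words) - i
    found = set()
    while len(found) < total:
        # random.sample never repeats a word within one variant;
        # the set silently discards variants we have already seen
        found.add(tuple(random.sample(words, k)))
    return found


# Hypothetical list of 8 words
words = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]
variants = random_variants(words)
print(len(variants))  # 336
print(variants == set(itertools.permutations(words, 3)))  # True
```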

Related

Filter generated permutations in python

I want to generate permutations of elements in a list, but only keep a set in which each element appears in each position only once.
For example [1, 2, 3, 4, 5, 6] could be a user list and I want 3 permutations. A good set would be:
[1,2,3,5,4,6]
[2,1,4,6,5,3]
[3,4,5,1,6,2]
However, one could not add, for example, [1,3,2,6,5,4] to the above. In that case 1 would be in the first position twice and 5 would be in the 5th position twice, while the other elements appear in those positions only once.
My code so far is:
import random

# this simply generates a number of permutations specified by number_of_samples
def generate_perms(player_list, number_of_samples):
    myset = set()
    while len(myset) < number_of_samples:
        random.shuffle(player_list)
        myset.add(tuple(player_list))
    return [list(x) for x in myset]

# And this is my function that takes the stratified samples for permutations.
def generate_stratified_perms(player_list, number_of_samples):
    user_idx_dict = {}
    i = 0
    while(i < number_of_samples):
        perm = generate_perms(player_list, 1)
        for elem in perm:
            if not user_idx_dict[elem]:
                user_idx_dict[elem] = [perm.index(elem)]
            else:
                user_idx_dict[elem] += [perm.index(elem)]
        [...]
    return total_perms
but I don't know how to finish the second function.
So in short, I want to give my function a number of permutations to generate. The function should return that number of permutations, in which no element appears in the same position more often than the others (once if all appear there once, twice if all appear there twice, etc.).
Let's start by solving the case of generating n or fewer rows. In that case, your output must be a Latin rectangle or a Latin square. These are easy to generate: construct a Latin square, shuffle the rows, shuffle the columns, and then keep just the first r rows. The following always works as a starting Latin square:
1 2 3 ... n
2 3 4 ... 1
3 4 5 ... 2
... ... ...
n 1 2 3 ...
Shuffling rows is a lot easier than shuffling columns, so we'll shuffle the rows, then take the transpose, then shuffle the rows again. Here's an implementation in Python:
from random import shuffle

def latin_rectangle(n, r):
    square = [
        [1 + (i + j) % n for i in range(n)]
        for j in range(n)
    ]
    shuffle(square)
    square = list(zip(*square))  # transpose
    shuffle(square)
    return square[:r]
Example:
>>> latin_rectangle(5, 4)
[(2, 4, 3, 5, 1),
(5, 2, 1, 3, 4),
(1, 3, 2, 4, 5),
(3, 5, 4, 1, 2)]
Note that this algorithm can't generate all possible Latin squares; by construction, the rows are cyclic permutations of each other, so you won't get Latin squares in other equivalence classes. I'm assuming that's OK since generating a uniform probability distribution over all possible outputs isn't one of the question requirements.
The upside is that this is guaranteed to work, and consistently in O(n^2) time, because it doesn't use rejection sampling or backtracking.
Now let's solve the case where r > n, i.e. we need more rows. Each column can't have equal frequencies for each number unless r % n == 0, but it's simple enough to guarantee that the frequencies in each column will differ by at most 1. Generate enough Latin squares, put them on top of each other, and then slice r rows from it. For additional randomness, it's safe to shuffle those r rows, but only after taking the slice.
def generate_permutations(n, r):
    rows = []
    while len(rows) < r:
        rows.extend(latin_rectangle(n, n))
    rows = rows[:r]
    shuffle(rows)
    return rows
Example:
>>> generate_permutations(5, 12)
[(4, 3, 5, 2, 1),
(3, 4, 1, 5, 2),
(3, 1, 2, 4, 5),
(5, 3, 4, 1, 2),
(5, 1, 3, 2, 4),
(2, 5, 1, 3, 4),
(1, 5, 2, 4, 3),
(5, 4, 1, 3, 2),
(3, 2, 4, 1, 5),
(2, 1, 3, 5, 4),
(4, 2, 3, 5, 1),
(1, 4, 5, 2, 3)]
This uses the numbers 1 to n because of the formula 1 + (i + j) % n in the first list comprehension. If you want to use something other than the numbers 1 to n, you can take it as a list (e.g. players) and change this part of the list comprehension to players[(i + j) % n], where n = len(players).
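Here is a sketch of that substitution. The function name latin_rectangle_of and the player names are hypothetical. With an arbitrary list of items, the construction looks like this: every row is a permutation of players, and no item repeats within a column.

```python
from random import shuffle


def latin_rectangle_of(players, r):
    """Same construction as latin_rectangle, but filled with arbitrary items."""
    n = len(players)
    square = [
        [players[(i + j) % n] for i in range(n)]
        for j in range(n)
    ]
    shuffle(square)
    square = list(zip(*square))  # transpose
    shuffle(square)
    return square[:r]


players = ["ann", "bob", "cid", "dee", "eve", "fay"]  # hypothetical player names
rows = latin_rectangle_of(players, 3)
for row in rows:
    print(row)
```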
If runtime is not that important, I would go the lazy way: generate all possible permutations (itertools can do that for you) and then filter out every permutation that does not meet your requirements.
Here is one way to do it.
import itertools

def permuts(l, n):
    all_permuts = list(itertools.permutations(l))
    picked = []
    for a in all_permuts:
        valid = True
        for p in picked:
            for i in range(len(a)):
                if a[i] == p[i]:
                    valid = False
                    break
        if valid:
            picked.append(a)
        if len(picked) >= n:
            break
    print(picked)

permuts([1, 2, 3, 4, 5, 6], 3)
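A small helper like the following can check a candidate set against the requirement from the question: in every position, each element occurs equally often, up to a difference of 1. It works on the output of either approach. The name is_balanced and the "differ by at most 1" criterion are my reading of the question, not code from it.

```python
from collections import Counter


def is_balanced(perms):
    """True if, in every position, the element frequencies differ by at most 1."""
    if not perms:
        return True
    elements = set(perms[0])
    for column in zip(*perms):
        counts = Counter(column)
        # Elements that never appear in this column count as 0
        freqs = [counts[e] for e in elements]
        if max(freqs) - min(freqs) > 1:
            return False
    return True


good = [[1, 2, 3, 5, 4, 6], [2, 1, 4, 6, 5, 3], [3, 4, 5, 1, 6, 2]]
print(is_balanced(good))                         # True
print(is_balanced(good + [[1, 3, 2, 6, 5, 4]]))  # False: 1 twice in position 1
```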

top 10 most frequent wordlengths in a list of words

I am writing a function that returns the top 10 most frequent word lengths in a file called wordlist.txt, which contains words starting with every letter from a to z. I have written a function (named 'value_length') that returns a list of each word's length inside a certain list. I also applied Counter (from the collections module) to a dictionary, with the word lengths as keys and the frequencies of those lengths as values, to solve the problem.
from collections import Counter

def value_length(seq):
    '''This function takes a sequence and returns a list that contains
    the length of each element
    '''
    value_l = []
    for i in range(len(seq)):
        length = len(seq[i])
        value_l.append(length)
    print(value_l)

# open the txt file
fileobj = open("wordlist.txt", "r")
file_content = []
# create a list with length of every single word
for line in fileobj:
    file_content.append(line)
wordlist_lengths = value_length(file_content)
# create a dictionary that has the number of occurrence of each length as key
occurrence = {x:file_content.count(x) for x in file_content}
c = Counter(occurrence)
c.most_common(10)
But whenever I run this code, I do not get the result I desired; I only get the outcome from the value_length function (i.e. an extremely long list that has the length of each word). In other words, Python does not interpret the dictionary. I do not understand what my mistake is.
There's no need to store the lengths in a list, or to use the list's count method; you've imported Counter already, so just use that to do the counting.
c = Counter()
for word in seq:
    length = len(word)
    c[length] += 1
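Applied to the original problem, a sketch of the full function might look like this. The function name top_word_lengths and the one-word-per-line file format are assumptions. The function returns the result instead of printing it. It also strips each line: lines read from a file keep their trailing newline, so len(line) would be one too large.

```python
from collections import Counter
import os
import tempfile


def top_word_lengths(path, k=10):
    """Return the k most common word lengths in a file with one word per line."""
    c = Counter()
    with open(path) as f:
        for line in f:
            # strip() drops the trailing newline, which would otherwise add 1 to every length
            word = line.strip()
            if word:  # skip blank lines
                c[len(word)] += 1
    return c.most_common(k)


# Demo with a small temporary file standing in for wordlist.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hi\nbye\nhello\nwhat\nno\ncrazy\nwhy\nsay\nimaginary\n")
result = top_word_lengths(tmp.name)
os.remove(tmp.name)
print(result)  # [(3, 3), (2, 2), (5, 2), (4, 1), (9, 1)]
```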
This code finds the length of each list item and sorts the lengths. Then you can simply make a tuple out of each length and the number of times it occurs in the list:
words = ["Hi", "bye", "hello", "what", "no", "crazy", "why", "say", "imaginary"]
lengths = [len(w) for w in words]
print(lengths)
sortedLengths = sorted(lengths)
print(sortedLengths)
countedLengths = [(w, sortedLengths.count(w)) for w in sortedLengths]
print(countedLengths)
This prints:
[2, 3, 5, 4, 2, 5, 3, 3, 9]
[2, 2, 3, 3, 3, 4, 5, 5, 9]
[(2, 2), (2, 2), (3, 3), (3, 3), (3, 3), (4, 1), (5, 2), (5, 2), (9, 1)]
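As the last printed line shows, the list comprehension repeats each pair once per occurrence. The variation below counts each length only once, using the Counter class the question already imports:

```python
from collections import Counter

words = ["Hi", "bye", "hello", "what", "no", "crazy", "why", "say", "imaginary"]
lengths = [len(w) for w in words]

counts = Counter(lengths)
# One (length, count) pair per distinct length, sorted by length
print(sorted(counts.items()))  # [(2, 2), (3, 3), (4, 1), (5, 2), (9, 1)]
# The question asks for the most frequent lengths, which most_common() returns directly
print(counts.most_common(10))  # [(3, 3), (2, 2), (5, 2), (4, 1), (9, 1)]
```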

Quickest way to remove mirror opposites from a list

Say I have a list of tuples [(0, 1, 2, 3), (4, 5, 6, 7), (3, 2, 1, 0)], I would like to remove all instances where a tuple is reversed e.g. removing (3, 2, 1, 0) from the above list.
My current (rudimentary) method is:
L = list(itertools.permutations(np.arange(x), 4))
for ll in L:
    if ll[::-1] in L:
        L.remove(ll[::-1])
Where time taken increases exponentially with increasing x. So if x is large this takes ages! How can I speed this up?
Using a set comes to mind:
L = set()
for ll in itertools.permutations(np.arange(x), 4):
    if ll[::-1] not in L:
        L.add(ll)
or even, for slightly better performance:
L = set()
for ll in itertools.permutations(np.arange(x), 4):
    if ll not in L:
        L.add(ll[::-1])
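Every 4-permutation's reverse is itself one of the permutations. A different technique exploits this and avoids the set entirely: keep a tuple only if its first element is smaller than its last. This sketch assumes the elements are distinct and comparable, which is true for np.arange(x); range(5) stands in for np.arange(x) here. For sorted input, the tuple kept from each pair is the one itertools.permutations yields first, which is the same one the first set-based loop keeps.

```python
import itertools


def permutations_without_reversals(items, k):
    """Keep one tuple from each (t, reversed t) pair produced by itertools.permutations.

    Assumes k >= 2 and that items are distinct and mutually comparable.
    """
    # The first and last elements of such a tuple always differ, so exactly one of
    # t and t[::-1] satisfies t[0] < t[-1]; no membership test is needed.
    return [t for t in itertools.permutations(items, k) if t[0] < t[-1]]


kept = permutations_without_reversals(range(5), 4)
print(len(kept))  # 60, half of the 5 * 4 * 3 * 2 = 120 permutations
```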
Because you need to keep the first occurrence, you seem to be forced to iterate with a conditional.
a = [(0, 1, 2, 3), (4, 5, 6, 7), (3, 2, 1, 0)]
s = set()
a1 = []
for t in a:
    if t not in s:
        a1.append(t)
        s.add(t[::-1])
Edit: The accepted answer addresses the example code (i.e. the itertools permutations sample). This answers the generalized question for any list (or iterable).
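The same loop can be wrapped in a generator so it works lazily on any iterable of hashable, sliceable tuples. For example, it can consume itertools.permutations(...) directly without building a list first. The name unique_up_to_reversal is hypothetical.

```python
import itertools


def unique_up_to_reversal(iterable):
    """Yield items from iterable, skipping any item whose reverse was already yielded."""
    seen_reversed = set()
    for t in iterable:
        if t not in seen_reversed:
            yield t
            seen_reversed.add(t[::-1])


a = [(0, 1, 2, 3), (4, 5, 6, 7), (3, 2, 1, 0)]
print(list(unique_up_to_reversal(a)))  # [(0, 1, 2, 3), (4, 5, 6, 7)]
print(sum(1 for _ in unique_up_to_reversal(itertools.permutations(range(5), 4))))  # 60
```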

Fast removal of consecutive duplicates in a list and corresponding items from another list

My question is similar to this previous SO question.
I have two very large lists of data (almost 20 million data points) that contain numerous consecutive duplicates. I would like to remove the consecutive duplicates as follows:
list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2] # This is 20M long!
list2 = ... # another list of size len(list1), also 20M long!
i = 0
while i < len(list1)-1:
    if list1[i] == list1[i+1]:
        del list1[i]
        del list2[i]
    else:
        i = i+1
And the output should be [1, 2, 3, 4, 5, 1, 2] for the first list.
Unfortunately, this is very slow, since deleting an element from a list is itself a slow operation. Is there any way I can speed up this process? Please note that, as shown in the code snippet above, I also need to keep track of the index i so that I can remove the corresponding element in list2.
Python has groupby in its standard library (itertools) for exactly this:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
You can tweak it using groupby's key argument (here a function named keyfunc, passed positionally) to also process the second list at the same time.
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]
If you want to split those pairs back into separate sequences again (in Python 3, zip returns an iterator, so wrap it in list):
>>> list(zip(*_)) # "unzip" them
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
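The two approaches do not keep the same list2 items. groupby with next(g) keeps the first pair of each run. The question's while-loop deletes the earlier duplicates and therefore keeps the last one, so with the sample data above the first run yields 8 rather than 9. If that exact behaviour matters, the sketch below preserves it with a single pass that builds new lists. It avoids calling del, which is slow because it shifts every later element. The function name is hypothetical, and list2 uses the same sample values as above.

```python
def drop_consecutive_dups(list1, list2):
    """Return new lists with consecutive duplicates (judged by list1) removed.

    Like the question's while-loop, this keeps the *last* item of each run, so the
    surviving list2 items are the ones paired with the last duplicate in list1.
    """
    out1, out2 = [], []
    last = len(list1) - 1
    for i in range(len(list1)):
        # Keep index i only if it ends a run of equal list1 values
        if i == last or list1[i] != list1[i + 1]:
            out1.append(list1[i])
            out2.append(list2[i])
    return out1, out2


list1 = [1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2]
list2 = [9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 5]
print(drop_consecutive_dups(list1, list2))
# ([1, 2, 3, 4, 5, 1, 2], [8, 7, 7, 6, 6, 6, 5])
```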
You can use collections.deque and its maxlen argument to set a window size of 2. Then just compare the 2 entries in the window, and append the newer one to the results if they differ.
def remove_adj_dups(x):
    """
    :parameter x is something like '1, 1, 2, 3, 3'
    from an iterable such as a string or list or a generator
    :return 1,2,3, as list
    """
    result = []
    from collections import deque
    d = deque([object()], maxlen=2) # 1st entry is object() which only matches with itself. Kudos to Trey Hunner -->object()
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            result.append(b)
    return result
I generated a random sequence of 2 million numbers between 0 and 9, with duplicates.
def random_nums_with_dups(number_range=None, range_len=None):
    """
    :param number_range: use the numbers between 0 and number_range - 1. The smaller this is, the more dups
    :param range_len: max len of the results list used in the generator
    :return: a generator
    Note: If number_range = 2, then random binary is returned
    """
    import random
    return (random.choice(range(number_range)) for i in range(range_len))
I then tested with
range_len = 2000000

def mytest():
    return remove_adj_dups(random_nums_with_dups(number_range=10, range_len=range_len))

big_result = mytest()
print(len(big_result))
The resulting length was 1800197 (i.e. the duplicates were removed), in under 5 seconds, including the time spent generating the random numbers.
I lack the experience/know-how to say whether it is memory efficient as well. Could someone comment, please?

unexpected list appearing in python loop

I am new to python and have the following piece of test code featuring a nested loop and I'm getting some unexpected lists generated:
import pybel
import math
import openbabel
search = ["CCC","CCCC"]
matches = []
#n = 0
#b = 0
print search
for n in search:
    print "n=",n
    smarts = pybel.Smarts(n)
    allmol = [mol for mol in pybel.readfile("sdf", "zincsdf2mols.sdf.txt")]
    for b in allmol:
        matches = smarts.findall(b)
        print matches, "\n"
Essentially, the list "search" is a couple of strings I am looking to match in some molecules and I want to iterate over both strings in every molecule contained in allmol using the pybel software. However, the result I get is:
['CCC', 'CCCC']
n= CCC
[(1, 2, 28), (1, 2, 4), (2, 4, 5), (4, 2, 28)]
[]
n= CCCC
[(1, 2, 4, 5), (5, 4, 2, 28)]
[]
as expected except for a couple of extra empty lists slotted in which are messing me up and I cannot see where they are coming from. They appear after the "\n" so are not an artefact of the smarts.findall(). What am I doing wrong?
thanks for any help.
allmol has 2 items, so you're looping twice, and matches is an empty list the second time.
Notice how the newline is printed after each; changing that "\n" to "<-- matches" may clear things up for you:
print matches, "<-- matches"
# or, more commonly:
print "matches:", matches
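To see the effect without pybel installed, here is a plain-Python simulation written for Python 3. fake_findall and the two molecule names are hypothetical stand-ins for smarts.findall and the molecules read from the SDF file. When the second molecule has no match, the inner loop prints an empty list, much like the output in the question.

```python
search = ["CCC", "CCCC"]
allmol = ["mol_a", "mol_b"]  # hypothetical stand-ins for the two molecules in the file


def fake_findall(pattern, mol):
    """Hypothetical stand-in for smarts.findall: only mol_a contains any match."""
    return [(1, 2, 4)] if mol == "mol_a" else []


printed = []
for n in search:
    for b in allmol:
        matches = fake_findall(n, b)
        printed.append((n, b, matches))
        print("n =", n, "molecule:", b, "matches:", matches)
```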
Perhaps it is supposed to end like this:
for b in allmol:
    matches.append(smarts.findall(b))
print matches, "\n"
Otherwise, I'm not sure why you'd initialise matches to an empty list.
If that is the case, you can instead write
matches = [smarts.findall(b) for b in allmol]
print matches
Another possibility is that the file ends with an empty line:
for b in allmol:
    if not b.strip(): continue
    matches.append(smarts.findall(b))
print matches, "\n"
