unexpected list appearing in python loop - python

I am new to python and have the following piece of test code featuring a nested loop and I'm getting some unexpected lists generated:
import pybel
import math
import openbabel
search = ["CCC","CCCC"]
matches = []
#n = 0
#b = 0
print search
for n in search:
    print "n=",n
    smarts = pybel.Smarts(n)
    allmol = [mol for mol in pybel.readfile("sdf", "zincsdf2mols.sdf.txt")]
    for b in allmol:
        matches = smarts.findall(b)
        print matches, "\n"
Essentially, the list "search" is a couple of strings I am looking to match in some molecules and I want to iterate over both strings in every molecule contained in allmol using the pybel software. However, the result I get is:
['CCC', 'CCCC']
n= CCC
[(1, 2, 28), (1, 2, 4), (2, 4, 5), (4, 2, 28)]
[]
n= CCCC
[(1, 2, 4, 5), (5, 4, 2, 28)]
[]
as expected except for a couple of extra empty lists slotted in which are messing me up and I cannot see where they are coming from. They appear after the "\n" so are not an artefact of the smarts.findall(). What am I doing wrong?
thanks for any help.

allmol has 2 items and so you're looping twice with matches being an empty list the second time.
Notice how the newline is printed after each; changing that "\n" to "<-- matches" may clear things up for you:
print matches, "<-- matches"
# or, more commonly:
print "matches:", matches
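To see why the extra lists are real results rather than stray output, here is a small sketch of the same loop shape without pybel. The `molecules` dictionary and the `describe_matches` helper are hypothetical stand-ins for the SDF file and `smarts.findall`, and the syntax is Python 3. Labelling each line shows that the empty list simply belongs to the second molecule.

```python
# Hypothetical stand-in data: each "molecule" maps to precomputed match
# tuples, so no pybel installation is needed to run this.
molecules = {
    "mol_1": [(1, 2, 28), (1, 2, 4)],
    "mol_2": [],  # a molecule without a match yields an empty list
}

def describe_matches(molecules):
    """Return one labelled line per molecule, mirroring one print per loop pass."""
    lines = []
    for name, matches in molecules.items():
        lines.append("{}: {} match(es) {}".format(name, len(matches), matches))
    return lines

for line in describe_matches(molecules):
    print(line)
```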

Perhaps it is supposed to end like this:
for b in allmol:
    matches.append(smarts.findall(b))
print matches, "\n"
Otherwise I'm not sure why you'd initialise matches to an empty list.
If that is the case, you can instead write
matches = [smarts.findall(b) for b in allmol]
print matches
Another possibility is that the file ends in an empty line:
for b in allmol:
    if not b.strip(): continue
    matches.append(smarts.findall(b))
print matches, "\n"
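If the goal is to keep the results for later rather than just print them, one option is to store them per search pattern and skip molecules without a hit. This is only a sketch in Python 3: `fake_findall` and the `molecules` list are made-up stand-ins for `smarts.findall` and the molecules read from the file, so it runs without pybel.

```python
# Hypothetical stand-in for smarts.findall(mol): plain substring search
def fake_findall(pattern, mol):
    """Return the start index of every (possibly overlapping) occurrence."""
    return [i for i in range(len(mol) - len(pattern) + 1)
            if mol[i:i + len(pattern)] == pattern]

search = ["CCC", "CCCC"]
molecules = ["CCCCO", "CO"]  # made-up data; the second one matches nothing

# One result list per pattern, containing only the non-empty matches
results = {}
for pattern in search:
    results[pattern] = []
    for mol in molecules:
        found = fake_findall(pattern, mol)
        if found:  # an empty list is falsy, so misses are skipped
            results[pattern].append(found)

print(results)
```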

Related

Randomizing list of words with Python

I need help solving a problem. I have a list of 8 words.
What I need to achieve is to generate all possible variants in which exactly 3 of these words are included. All of those variants need to be saved in separate .txt files. The same words in different positions should be treated as a different variant.
It needs to be done on a Raspberry Pi in Python.
To be honest, I don't even know where to start...
I am a total noob at any sort of programming...
Any clues how to do it?
You can easily solve this problem by using itertools. In the following example, I will produce all possible ordered arrangements (permutations) of 3 elements from the list l:
>>> import itertools
>>> l = [1, 2, 3, 4, 5]
>>> list(itertools.permutations(l, 3))
[(1, 2, 3),
(1, 2, 4),
(1, 2, 5),
(1, 3, 2),
(1, 3, 4),
(1, 3, 5),
...
...
(5, 4, 1),
(5, 4, 2),
(5, 4, 3)]
Now, if you want to save each of those values in its own text file, you should do the following:
for i, e in enumerate(itertools.permutations(l, 3)):
    with open(f"file_{i}.txt", "w") as f:
        # e is a tuple, so convert it to a string before writing
        f.write(" ".join(map(str, e)))
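For the original question (8 words, 3 per variant, order matters), the same idea can be sketched as below. The word list, the scratch directory and the file naming are assumptions made for illustration. With 8 words there are 8 * 7 * 6 = 336 ordered variants, so 336 files are written.

```python
import itertools
import os
import tempfile

# Hypothetical word list; replace with your own 8 words
words = ["red", "green", "blue", "cyan", "pink", "gray", "gold", "teal"]

# Every ordered selection of 3 distinct words
variants = list(itertools.permutations(words, 3))

# Write each variant to its own file inside a scratch directory
out_dir = tempfile.mkdtemp()
for i, variant in enumerate(variants):
    path = os.path.join(out_dir, "variant_{}.txt".format(i))
    with open(path, "w") as f:
        f.write(" ".join(variant))

print(len(variants), "files written to", out_dir)
```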
@lmiguelvargasf's answer only covers half the question, so here comes the part where you save the word combinations to individual files.
import itertools
import random
import string
class fileNames:
    def __init__(self):
        self.file_names = []
    def randomString(self, stringLength):
        letters = string.ascii_lowercase
        file_name = ''.join(random.choice(letters) for i in range(stringLength))
        # If the name is already taken, try again so no file gets overwritten
        if file_name in self.file_names:
            return self.randomString(stringLength)
        self.file_names.append(file_name)
        return file_name
# Original word list
l = [1, 2, 3]
# Create a new list, containing the combinations
word_combinations = list(itertools.permutations(l, 3))
# Creating an instance of fileNames() class
files = fileNames()
# Specifying the number of characters
n = 5
# For each of these combinations, save in a file
for word_comb in word_combinations:
    # The file will be named by a random string containing n characters
    with open('{}.txt'.format(files.randomString(n)), 'w') as f:
        # The file will contain each word separated by a space, change the string below as desired
        f.write('{} {} {}'.format(word_comb[0], word_comb[1], word_comb[2]))
If you want the filename to be an integer that increases by 1 for every file, replace the last part with this:
# For each of these combinations, save in a file
for n, word_comb in enumerate(word_combinations):
    # The file will be named by an integer
    with open('{}.txt'.format(n), 'w') as f:
        # The file will contain each word separated by a space, change the string below as desired
        f.write('{} {} {}'.format(word_comb[0], word_comb[1], word_comb[2]))
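Try using random.choice():
import random
# your code (this assumes words is your list of 8 words and file is an open, writable file)
word = []
for x in range(3):  # pick 3 words per variant
    word.append(random.choice(words))
file.write(' '.join(word) + '\n')
Repeat everything after word = [] in an outer for loop; you can even check whether two variants are the same using a method described here.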
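Note that `random.choice` can pick the same word more than once. If each variant must contain three different words, `random.sample` from the standard library draws without replacement. A minimal sketch, assuming a hypothetical `words` list:

```python
import random

# Hypothetical word list; replace with your own 8 words
words = ["red", "green", "blue", "cyan", "pink", "gray", "gold", "teal"]

# Draw 3 distinct words in random order
variant = random.sample(words, 3)
print(" ".join(variant))
```

Random draws do not guarantee that every variant is eventually produced, so enumerating them with itertools is the safer route when all of them are needed.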

Adding list python

Help! What do I have to change so that it comes out like this?
[('Mavis', 3), ('Ethel', 1), ('Rick', 2), ('Joseph', 5), ('Louis', 4)]
Right now, with my code, it comes out like this.
bots_status = [(bot_one_info) + (bot_two_info) + (bot_three_info) + (bot_four_info) + (bot_five_info)]
[('Mavis', 3, 'Ethel', 1, 'Rick', 2, 'Joseph', 5, 'Louis', 4)]
Place commas instead of + signs between your bots.
If you are working with a variable number of entries, initialize a list and add to it using append:
bots_status = []
for bot_info in bot_infos:
    bots_status.append(bot_info)
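As a sketch of the variable-length case, assuming the tuples are already gathered in some iterable (the name `bot_infos` and its contents below are illustrative):

```python
# Illustrative data: any iterable of (name, rank) tuples works
bot_infos = (("Mavis", 3), ("Ethel", 1), ("Rick", 2))

# Build the list one tuple at a time
bots_status = []
for bot_info in bot_infos:
    bots_status.append(bot_info)

print(bots_status)

# list() produces the same result in a single call
assert bots_status == list(bot_infos)
```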
Replace the plusses (+) by commas (,) to make this a list of tuples instead of a list of one concatenated tuple:
bots_status = [bot_one_info, bot_two_info, bot_three_info, bot_four_info, bot_five_info]
Since your bot_x_info variables already are tuples, you also don’t need to use parentheses around the names (those don’t do anything).
The problem with your code was that you were using + on the tuples. The add operator concatenates tuples to a single one:
>>> (1, 2) + (3, 4)
(1, 2, 3, 4)
That’s why you ended up with one giant tuple in your list.
What you wanted is to have each tuple as a separate item in the list, so you just need to create a list from those. Just like you would write [1, 2, 3] to create a list with three items, using a comma to separate each item, you do the same with other values, e.g. tuples in your case.
Let's say:
bot_one_info = ('Mavis', 3)
bot_two_info = ('Mavi', 3)
If you use +
lis = [bot_one_info + bot_two_info]
print lis
#Output
[('Mavis', 3, 'Mavi', 3)]
But if you use ,
lis = [bot_one_info,bot_two_info]
print lis
#Output
[('Mavis', 3), ('Mavi', 3)]
So here you can use , instead of +.

Split string with parenthesis within parenthesis in Python

I have two types of strings; each type can have one of the following exemplary forms:
str = ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
or
str = ((0, 1, 2), (3, 4, 5, 6, 7), (8, 9))
The number of substrings within parentheses in the second form can range from 1 to any number.
I need a) to be able to detect which form is present, and b) if the string has the second form, to extract each of the substrings within the inner parentheses.
I have a basic understanding of regular expressions but I can't see how this should be handled.
If these are the only two options, you can use:
if type(str[0]) == int:
    print 'TYPE1'
elif type(str[0]) == tuple:
    print 'TYPE2'
else:
    print 'unknown'
and for your second question, in case you're in form 2, use:
list(sum(str, ()))
to flatten the tuple, this way you can access every element individually.
If you want to access the tuples as a whole, you can use:
for element in str:
    # element is an inner tuple
    for inner_element in element:
        # inner_element is an integer within the tuple
        print inner_element
Hope this helps
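The code above treats the values as tuples. If they really arrive as text, one way to get to that point is `ast.literal_eval` from the standard library, which safely evaluates a Python literal. This is a sketch under that assumption, written in Python 3; `classify` is a hypothetical helper name.

```python
import ast

def classify(text):
    """Parse a tuple literal and report which of the two forms it has."""
    value = ast.literal_eval(text)
    if all(isinstance(item, int) for item in value):
        return "TYPE1", [value]          # one flat group
    if all(isinstance(item, tuple) for item in value):
        return "TYPE2", list(value)      # one entry per inner group
    return "unknown", []

print(classify("((0, 1, 2, 3, 4, 5, 6, 7, 8, 9))"))
print(classify("((0, 1, 2), (3, 4, 5, 6, 7), (8, 9))"))
```

A second-form string with a single inner group has to be written with a trailing comma, as in ((0, 1, 2),); otherwise Python reads it as the first form.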

Find and use multiple occurrences of a string in a string

I recently started using Python and wrote some simple scripts
Now I have this question:
I have this string:
mystring = 'AAAABBAAABBAAAACCAAAACCAAAA'
and I have these following strings:
String_A = 'BB'
String_B = 'CC'
I would like to get all possible combinations of strings starting with String_A and ending with String_B (kind of vague so below is the desired output)
output:
BBAAABBAAAACCAAAACC
BBAAABBAAAACC
BBAAAACCAAAACC
BBAAAACC
I am able to count the number of occurrences of String_A and String_B in mystring using
mystring.count()
And I am able to print out one specific output (the one with the first occurrence of String_A and the first occurrence of String_B) by doing the following:
if String_A in mystring:
    String_B_End = mystring.index(String_B) + len(String_B)
    output = mystring[mystring.index(String_A):String_B_End]
    print(output)
this works perfect but only gives me the following output:
BBAAABBAAAACC
How can I get all the specified output strings from mystring?
thanx in advance!
If I understand the intention of your question correctly, you can use the following code:
>>> import re
>>> mystring = 'AAAABBAAABBAAAACCAAAACCAAAA'
>>> String_A = 'BB'
>>> String_B = 'CC'
>>> def find_occurrences(s, a, b):
...     a_is = [m.start() for m in re.finditer(re.escape(a), s)] # All indexes of a in s
...     b_is = [m.start() for m in re.finditer(re.escape(b), s)] # All indexes of b in s
...     result = [s[i:j+len(b)] for i in a_is for j in b_is if j>i]
...     return result
...
>>> find_occurrences(mystring, String_A, String_B)
['BBAAABBAAAACC', 'BBAAABBAAAACCAAAACC', 'BBAAAACC', 'BBAAAACCAAAACC']
This uses the "find all occurrences of a substring" code from this answer.
In its current form the code does not handle overlapping substrings: if mystring = 'BBB' and you look for the substring 'BB', it only returns index 0. To account for such overlapping substrings, change the lines that collect the indexes to use a lookahead, e.g. a_is = [m.start() for m in re.finditer("(?={})".format(re.escape(a)), s)]
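The difference between the two patterns can be checked directly. A small sketch (Python 3) using the 'BBB' example:

```python
import re

s = "BBB"
sub = "BB"

# Plain pattern: each match consumes characters, so overlaps are skipped
plain = [m.start() for m in re.finditer(re.escape(sub), s)]

# Lookahead pattern: zero-width matches, so every start position is found
overlapping = [m.start() for m in re.finditer("(?={})".format(re.escape(sub)), s)]

print(plain)        # [0]
print(overlapping)  # [0, 1]
```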
Well, first you need to get the indexes of String_A and String_B in the text. See this:
s = mystring
[i for i in range(len(s)-len(String_A)+1) if s[i:i+len(String_A)]==String_A]
it returns [4, 9], i.e. the indexes of 'BB' in mystring. You do similarly for String_B for which the answer would be [15, 21].
Then you do this:
[(i, j) for i in [4, 9] for j in [15, 21] if i < j]
This line combines each starting location with each ending location and ensures that the starting location occurs before the ending location. The i < j would not be essential for this particular example, but in general you should have it. The result is [(4, 15), (4, 21), (9, 15), (9, 21)].
Then you just convert the start and end indices to substrings:
[s[a:b+len(String_B)] for a, b in [(4, 15), (4, 21), (9, 15), (9, 21)]]
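Putting the three steps together, a sketch in Python 3 might look like this. The function names `find_all` and `between` are made up for this example.

```python
def find_all(s, sub):
    """Return every index at which sub starts in s (overlaps included)."""
    return [i for i in range(len(s) - len(sub) + 1) if s[i:i + len(sub)] == sub]

def between(s, start, end):
    """Return every substring that begins with start and finishes with end."""
    starts = find_all(s, start)
    ends = find_all(s, end)
    # Keep only pairs where the start marker comes before the end marker
    pairs = [(i, j) for i in starts for j in ends if i < j]
    return [s[i:j + len(end)] for i, j in pairs]

mystring = 'AAAABBAAABBAAAACCAAAACCAAAA'
for match in between(mystring, 'BB', 'CC'):
    print(match)
```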

python string slicing with a list

Here is my list:
liPos = [(2,5),(8,9),(18,22)]
The first item of each tuple is the starting position and the second is the ending position.
Then I have a string like this:
s = "I hope that I will find an answer to my question!"
Now, considering my liPos list, I want to format the string by removing the chars between each starting and ending position (and including the surrounding numbers) provided in the tuples. Here is the result that I want:
"I tt I will an answer to my question!"
So basically, I want to remove the chars between 2 and 5 (including 2 and 5), then between 8,9 (including 8 and 9) and finally between 18,22 (including 18 and 22).
Any suggestion?
This assumes that liPos is already sorted; if it is not, use sorted(liPos, reverse=True) in the for loop instead of reversed(liPos).
liPos = [(2,5),(8,9),(18,22)]
s = "I hope that I will find an answer to my question!"
for begin, end in reversed(liPos):
    s = s[:begin] + s[end+1:]
print s
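Going through the ranges from the back matters because every deletion shifts the characters that come after it. A short Python 3 sketch comparing both orders; `remove_ranges` is a hypothetical helper:

```python
def remove_ranges(s, ranges):
    """Delete each inclusive (begin, end) range from s, in the order given."""
    for begin, end in ranges:
        s = s[:begin] + s[end + 1:]
    return s

liPos = [(2, 5), (8, 9), (18, 22)]
s = "I hope that I will find an answer to my question!"

# Back to front: earlier positions are untouched by later deletions
print(remove_ranges(s, reversed(liPos)))

# Front to back: positions shift after the first deletion, so the wrong
# characters are removed
print(remove_ranges(s, liPos))
```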
Here is an alternative method that constructs a new list of slice tuples to include, and then joins only those portions of the string.
from itertools import chain, izip_longest
# second slice index needs to be increased by one, do that when creating liPos
liPos = [(a, b+1) for a, b in liPos]
result = "".join(s[b:e] for b, e in izip_longest(*[iter(chain([0], *liPos))]*2))
To make this slightly easier to understand, here are the slices generated by izip_longest:
>>> list(izip_longest(*[iter(chain([0], *liPos))]*2))
[(0, 2), (6, 8), (10, 18), (23, None)]
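`izip_longest` exists only in Python 2. In Python 3 the same function is called `itertools.zip_longest`; below is a sketch of the equivalent code, with the shared iterator pulled out into a variable (`flat`, a name chosen here) for readability.

```python
from itertools import chain, zip_longest

liPos = [(2, 5), (8, 9), (18, 22)]
s = "I hope that I will find an answer to my question!"

# Convert inclusive end positions to exclusive slice ends
bounds = [(a, b + 1) for a, b in liPos]

# Flatten to 0, 2, 6, 8, 10, 18, 23 and pair consecutive values off as the
# slices to keep; the last pair is padded with None, meaning "to the end"
flat = iter(chain([0], *bounds))
keep = list(zip_longest(flat, flat))

result = "".join(s[b:e] for b, e in keep)
print(keep)
print(result)
```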
liPos = [(2,5),(8,9),(18,22)]
s = "I hope that I will find an answer to my question!"
exclusions = set().union(* (set(range(t[0], t[1]+1)) for t in liPos) )
pruned = ''.join(c for i,c in enumerate(s) if i not in exclusions)
print pruned
Here is one, compact possibility:
"".join(s[i] for i in range(len(s)) if not any(start <= i <= end for start, end in liPos))
This ... is a quick stab at the problem. There may be a better way, but it's a start at least.
>>> liPos = [(2,5),(8,9),(18,22)]
>>>
>>> toRemove = [i for x, y in liPos for i in range(x, y + 1)]
>>>
>>> toRemove
[2, 3, 4, 5, 8, 9, 18, 19, 20, 21, 22]
>>>
>>> s = "I hope that I will find an answer to my question!"
>>>
>>> s2 = ''.join([c for i, c in enumerate(s) if i not in toRemove])
>>>
>>> s2
'I  tt I will an answer to my question!'
