I am trying to fetch the different file paths through this:
for (i, imagePath) in enumerate(imagePaths):
    name = set(imagePath.split(os.path.sep)[-2])
It returns multiple paths with the same names, such as:
Angelina Jolie
Angelina Jolie
Sam
Sam
Sam
What I want is to print only the unique names, e.g. print Angelina Jolie only once. But whatever I try, whether a unique method or converting the list to a set with set(), it returns something like this, and I don't understand the logic behind it.
{'l', 'e', 'J', 'g', 'i', ' ', 'o', 'a', 'A', 'n'}
{'l', 'e', 'J', 'g', 'i', ' ', 'o', 'a', 'A', 'n'}
{'l', 'e', 'J', 'g', 'i', ' ', 'o', 'a', 'A', 'n'}
{'l', 'e', 'J', 'g', 'i', ' ', 'o', 'a', 'A', 'n'}
Please help me understand why this is happening and what solution I should look for.
There are two ways to remove duplicates.
list(dict.fromkeys(imagePaths).keys())
# or
list(set(imagePaths)) # If you don't care about order
You don't necessarily have to convert to lists btw.
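As for why the output is a set of letters: set() iterates over whatever it is given, and iterating over a single string yields its characters. In the loop, set() is applied to one folder name at a time, so each name is broken into letters. A minimal sketch of collecting the names first and de-duplicating afterwards, assuming paths of the form dataset/<person name>/<file> (the paths below are made up for illustration):

```python
import os

# Hypothetical paths of the form dataset/<person name>/<file>; not from the original post.
image_paths = [
    os.path.join('dataset', 'Angelina Jolie', '001.jpg'),
    os.path.join('dataset', 'Angelina Jolie', '002.jpg'),
    os.path.join('dataset', 'Sam', '001.jpg'),
    os.path.join('dataset', 'Sam', '002.jpg'),
]

# set() on a single string iterates over its characters.
letters_of_name = set('Sam')  # the three characters 'S', 'a' and 'm'

# Extract every folder name first, then de-duplicate the whole collection.
names = [path.split(os.path.sep)[-2] for path in image_paths]
unique_names = list(dict.fromkeys(names))  # keeps first-seen order

for name in unique_names:
    print(name)
```

Use set(names) instead of dict.fromkeys(names) if the order does not matter.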
I am trying to print a Python list using join after randomly selecting a specified number of characters. I want all the characters printed next to each other on one line instead of each character on a separate line. Everything works fine up until my for statement: if I print out password_letters, it prints (on separate lines) the specified number of letters based on nr_letters. All I want is to join/concatenate the selected letters onto one line. I have followed the documentation on here and some on Google, but I still can't find where I have gone wrong.
Please help me find where I have gone wrong in the code below:
import random
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
nr_letters= int(input("How many letters would you like in your password?\n"))
password_letters = random.sample(letters, nr_letters )
for letter in password_letters:
    print("".join(letter))
No need for a loop, just join the list.
print("".join(password_letters))
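The reason the loop prints one character per line is that "".join(letter) receives a single one-character string and returns it unchanged, and each print() call ends with a newline. A minimal sketch of the whole flow, with nr_letters hard-coded in place of the input() call and string.ascii_letters standing in for the hand-typed list (both are substitutions made for this example):

```python
import random
import string

nr_letters = 8  # stands in for int(input(...)) so the sketch runs unattended

# Joining a single one-character string gives back that same character.
single = "".join("a")

# random.sample picks without repetition, so nr_letters must not exceed 52 here;
# a larger value raises ValueError.
password_letters = random.sample(string.ascii_letters, nr_letters)

# Join the whole list once, outside of any loop.
password = "".join(password_letters)
print(password)
```

If repeated letters are acceptable, random.choices(string.ascii_letters, k=nr_letters) removes the 52-letter limit.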
I want to write a really short script that will help me generate a random/nonsense word with the following qualities:
-Has 8 letters
-First letter is "A"
-Second and Fourth letters are random letters
-Fifth letter is a vowel
-Sixth and Seventh letters are random letters and are the same
-Eighth letter is a vowel that's not "a"
This is what I have tried so far (using all the info I could find and understand online)
firsts = 'A'
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
print [''.join(first, second, third, fourth, fifth)
    for first in firsts
    for second in seconds
    for third in thirds
    for fourth in fourths
    for fifth in fifths
    for sixth in sixths
    for seventh in sevenths
    for eighth in eighths]
However, it keeps showing a SyntaxError: invalid syntax after the for, and now I have absolutely no idea how to make this work. If possible, please look into this for me. Thank you so much!
So the magic function you need to know about to pick a random letter is random.choice. You can pass a list into this function and it will give you a random element from that list. It also works with strings, because a string is a sequence of characters. Also, to make your life easier, use the string module: string.ascii_lowercase is a string containing all the letters from a to z, so you don't have to type them out. Lastly, you don't need loops to join strings together. Keep it simple. You can just add them together.
import string
from random import choice
first = 'A'
second = choice(string.ascii_lowercase)
third = choice(string.ascii_lowercase)
fourth = choice(string.ascii_lowercase)
fifth = choice("aeiou")
sixthSeventh = choice(string.ascii_lowercase)
eighth = choice("eiou")
word = first + second + third + fourth + fifth + sixthSeventh + sixthSeventh + eighth
print(word)
Try this:
import random
sixth=random.choice(sixths)
s='A'+random.choice(seconds)+random.choice(thirds)+random.choice(fourths)+random.choice(fifths)+sixth+sixth+random.choice(eighths)
print(s)
Output:
Awixonno
Ahiwojjy
etc
There are several things to consider. First, the str.join() method takes in an iterable (e.g. a list), not a bunch of individual elements. Doing
''.join([first, second, third, fourth, fifth])
fixes the program in this respect. If you are using Python 3, print() is a function, and so you should add parentheses around the entire list comprehension.
With the syntax out of the way, let's get to a more interesting problem: your program constructs every possible word (82,255,680 of them!). This takes a lot of time and memory. What you want is probably to just pick one. You can of course do this by first constructing all of them and then picking one at random. It's far cheaper, though, to pick one letter at random from each of firsts, seconds, etc. and then collect these. Note that the sixth letter has to be picked once and used twice, otherwise the sixth and seventh letters will usually differ. All together then:
import random
firsts = ['A']
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
sixth = random.choice(sixths)  # picked once so the sixth and seventh letters match
result = ''.join([
    random.choice(firsts),
    random.choice(seconds),
    random.choice(thirds),
    random.choice(fourths),
    random.choice(fifths),
    sixth,
    sixth,
    random.choice(eighths),
])
print(result)
To improve the code from here, try to:
Find a way to generate the "data" in a neater way than writing it out explicitly. As an example:
import string
seconds = list(string.ascii_lowercase) # you don't even need list()!
Instead of having separate variables firsts, seconds, etc., collect them in a single variable, e.g. one list in which each original list becomes a single str containing all of its characters.
This will implement what you describe. You can make the code neater by putting the choices into an overall list rather than have several different variables, but you will have to explicitly deal with the fact that the sixth and seventh letters are the same; they will not be guaranteed to be the same simply because there are the same choices available for each of them.
The list choices_list could contain sub-lists per your original code, but as you are choosing single characters it will work equally with strings when using random.choice and this also makes the code a bit neater.
import random
choices_list = [
    'A',
    'abcdefghijklmnopqrstuvwxyz',
    'aeiouy',
    'abcdefghijklmnopqrstuvwxyz',
    'aeiouy',
    'abcdefghijklmnopqrstuvwxyz',
    'eiouy'
]
letters = [random.choice(choices) for choices in choices_list]
word = ''.join(letters[:6] + letters[5:]) # here the 6th letter gets repeated
print(word)
Some example outputs:
Alaeovve
Aievellu
Ategiwwo
Aeuzykko
Here's the syntax fix:
print(["".join([first, second, third])
       for first in firsts
       for second in seconds
       for third in thirds])
This method might take up a lot of memory.
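To put a number on "a lot of memory": the full eight-level comprehension from the question would build one string per combination of letters. A rough sketch that counts those combinations without building them, and then draws a single word instead, assuming the same letter pools as the question (the list name pools is made up here, and math.prod needs Python 3.8 or later):

```python
import math
import random
import string

letters = string.ascii_lowercase

# One pool per position, mirroring firsts ... eighths from the question.
pools = ['A', letters, 'aeiouy', letters, 'aeiouy', letters, letters, 'eiouy']

# Number of words the nested comprehension would build: the product of the pool sizes.
total = math.prod(len(pool) for pool in pools)
print(total)  # 82255680

# Drawing one letter per position touches 8 items instead of 82 million strings.
picked = [random.choice(pool) for pool in pools]
picked[6] = picked[5]  # the seventh letter must repeat the sixth
print(''.join(picked))
```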
I have been working on the RNA Splicing exercise from the Rosalind Bioinformatics Stronghold. I am currently using Python 3.6. Python didn't report any error in my code, so I'm assuming my code is fine. However, no output is produced, and there is no error or warning whatsoever. Below is my code:
DNA_CODON_TABLE = {
'TTT': 'F', 'CTT': 'L', 'ATT': 'I', 'GTT': 'V',
'TTC': 'F', 'CTC': 'L', 'ATC': 'I', 'GTC': 'V',
'TTA': 'L', 'CTA': 'L', 'ATA': 'I', 'GTA': 'V',
'TTG': 'L', 'CTG': 'L', 'ATG': 'M', 'GTG': 'V',
'TCT': 'S', 'CCT': 'P', 'ACT': 'T', 'GCT': 'A',
'TCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A',
'TCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A',
'TCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A',
'TAT': 'Y', 'CAT': 'H', 'AAT': 'N', 'GAT': 'D',
'TAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D',
'TAA': '-', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E',
'TAG': '-', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E',
'TGT': 'C', 'CGT': 'R', 'AGT': 'S', 'GGT': 'G',
'TGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G',
'TGA': '-', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G',
'TGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'
}
def result(s):
    result = ''
    lines = s.split()
    dna = lines[0]
    introns = lines[1:]
    for intron in introns:
        dna = dna.replace(intron, '')
    for i in range(0, len(dna), 3):
        codon = dna[i:i+3]
        protein = None
        if codon in DNA_CODON_TABLE:
            protein = DNA_CODON_TABLE[codon]
        if protein == '-':
            break
        if protein:
            result += protein
    return ''.join(list(result))

if __name__ == "__main__":
    """small_dataset = ' '"""
    large_dataset = open('rosalind_splc.txt').read().strip()
    print(result(large_dataset))
This is the content in rosalind_splc.txt text file:
>Rosalind_3363
ATGGGGCTGAGCCCATGTCTAAATGATATCTTGGTGCATTGCAATCTAACTATTTTTTCG
CAACCATGTTCCATCTGGCGCAAAATGGGCGTGTAGGGAGCTTCGCTATAGTCACTGAAG
AACATTCGCAACTTACAGCTCTCGAGAGGGTACAGCTGGACGGTGTTTGTTTGGTCTAAG
TCTGAGTCCAAAGTCGTTGAATGTCGAGCTAGGTTGACGTCATTCTTCGAGTTACGTCTT
CATTGATTCGCGGCGGCCGCCAGCATTTGATTGTACACATCCGACGTCTTTGGCAATCTA
CATAATTATATTGAGAGGGGCGCCATTACTCGAACCCATAACAAACAACTGTCCGTTTAC
AAGGTTATATTATCATGACCTAATGGTTGAGCTACGGAGTGGGGGGCCCTCGGCTACAGG
TGTTAAACTATCCTGCGGATGCGGATCTTAGCCCGATTTGCATGGCCCAGTAAGGCGCTG
ATTGTAAACCGCCTAGCATACATGTGCTTCTTACTCCAGGGTCCATTGCTACCAGTTCGC
TTCTGACGCCTCAATTGTACCTTCCTTTTTTGAATGGCAACCTGCAATAGCAGTCGACTG
ATGGGGCGTTACAGTATGAAGGCTATATTTACATTATCTCTAAACACACTGCTACCGCGA
AACCCCAACTCGGACCGGTCAGAGCGCTCGTGCTTTGTTCTTGGTCGCTAGCGACCAACA
GTGGATAGGTGGGCGCGGGCCTTGCACCTCCTAGAGCATCACGTGGAGTGGATGCAAACA
GTCTATGGTCCCCCGCTTCGGCTCACGGGTAACGTCTCTTGTGGTACTAGACCATAGGCA
TCCAGGTGAGGGCTACATCCGTATTTAATGAAACTGAGTTCCTCCAAAGCTCCTCGGGAC
GCAGGCAGGTTCATCCGCAGTCAGTAAGGGAGGGAAGAGCTTTCCCCGTTCCACCCAGAT
GCCCTGTGCACGGGAGAGAGATCCAGGTGGTAG
>Rosalind_0423
TCGCAACTTACAGCTCTCGAGAGGG
>Rosalind_5768
GCCCAGTAAGGCGCTGATTGTAAACCGCCTAGCATACAT
>Rosalind_6780
GTCTTCATTGATTCGCGGCGGCCGCCAGCA
>Rosalind_6441
GCAAACAGTCT
>Rosalind_3315
TTGGTCGCTAGCGACCAACAGTGGATAGGTGGGCGCGGGCCTTGCACCT
>Rosalind_7467
TTATCTCTAAACACACTGC
>Rosalind_3159
CGCAGTCAGTAAGGGAGG
>Rosalind_6420
TCTAAGTCTGAGTCCAAAGTCGTTGAATGTCGAGCTAGGTTGACGT
>Rosalind_8344
GGGGCGCCATTACTCGAACCCATAACAAACAACT
>Rosalind_2993
CCAGGTGAGGGCTACATCCGTAT
>Rosalind_0536
ATTATCATGACCTAATG
>Rosalind_3774
TCGCAACCATGTTCCAT
>Rosalind_7168
GGGCCCTCGGCTACAGGTGTTAAACTAT
>Rosalind_8059
CAATTGTACCTTCCTTTTTTGAATG
Since there is no output given, I would like to know which part of my code need to be fixed in order for the output to come out. Thanks.
To understand which part of your code you need to change, it helps to understand what goes wrong in your code. If you have a code editor with a debugger, it helps to step through the code. If you don't have one, you can use the online tool http://pythontutor.com. Here is a direct link to your code with the first few lines of your input.
Click on the forward button under the code. At step 20 you jump into your function result(). After step 24 your input is split on the newlines. You can see that lines is now:
lines = ['>Rosalind_3363',
'ATGGGGCTGAGCCCATGTCTAAATGATATCTTGGTGCATTGCAATCTAACTATTTTTTCG',
'CAACCATGTTCCATCTGGCGCAAAATGGGCGTGTAGGGAGCTTCGCTATAGTCACTGAAG',
'>Rosalind_0423',
'TCGCAACTTACAGCTCTCGAGAGGG',
'>Rosalind_5768',
'GCCCAGTAAGGCGCTGATTGTAAACCGCCTAGCATACAT']
In step 25, you assign the first item of lines to the variable dna. So dna is now equal to >Rosalind_3363. You assign the rest of the items in the list to the variable introns in the next step. So now we have
dna = '>Rosalind_3363'
introns = ['ATGGGGCTGAGCCCATGTCTAAATGATATCTTGGTGCATTGCAATCTAACTATTTTTTCG',
'CAACCATGTTCCATCTGGCGCAAAATGGGCGTGTAGGGAGCTTCGCTATAGTCACTGAAG',
'>Rosalind_0423',
'TCGCAACTTACAGCTCTCGAGAGGG',
'>Rosalind_5768',
'GCCCAGTAAGGCGCTGATTGTAAACCGCCTAGCATACAT']
Here the first signs of trouble are already apparent. You probably expect dna to contain a DNA sequence, but it contains the sequence header of the FASTA file. Similarly, introns should contain only DNA sequences, but here it also contains FASTA sequence headers (>Rosalind_0423, >Rosalind_5768) and the wrapped lines of the first sequence.
So what happens in the next lines doesn't make any sense anymore with the data you have now.
In the lines
for intron in introns:
    dna = dna.replace(intron, '')
you want to remove the introns from the DNA, but dna doesn't contain a DNA sequence string, and introns contains things other than substrings of dna. So after this loop, dna still equals >Rosalind_3363. None of the three-letter chunks of dna (>Ro, sal, ind, ...) are valid codons, so they are not found in DNA_CODON_TABLE. Hence, result() returns an empty string.
Now my guess as to what happened. You lifted the code verbatim from the internet (it is exactly equal to the code here) without understanding what it does and without realizing that the original author had already preprocessed the input data.
So, what do you need to do to fix the code?
Parse the FASTA file, for example using Bio.SeqIO.parse().
If necessary, concatenate the DNA strings of the first sequence. This is what should end up in your dna variable.
The remaining sequence strings are what should end up in your introns variable.
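These steps can be sketched without Biopython, as a rough illustration only. The helper name parse_fasta and the short sample records are made up for this example; Bio.SeqIO.parse() remains the more robust choice for real files.

```python
def parse_fasta(text):
    """Return the sequences of a FASTA string in file order, joining wrapped lines."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            sequences.append('')    # a header line starts a new record
        elif sequences:
            sequences[-1] += line   # sequence lines extend the current record
    return sequences


# Hypothetical miniature input in the same layout as rosalind_splc.txt.
sample = """>seq_1
ATGGGG
CTGTAA
>seq_2
GGGCTG
"""

# The first record is the DNA string, all later records are introns.
dna, *introns = parse_fasta(sample)
for intron in introns:
    dna = dna.replace(intron, '')

print(dna)
print(introns)
```

With dna and introns filled this way, the codon loop from the question can run on dna unchanged.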
When I run the script below, I get no output at all. What I really want to do is create a string from an iterable and then use this string as an argument to re.findall.
print(tab) gives a-z0-9.
import re
my_tab = ['a-z',
'0-9']
tab = ''.join(my_tab)
line = 'and- then 3 times minus 456: no m0re!'
re.findall('tab', 'line')
What am I missing here? Is this the most Pythonic way to achieve this?
This will not work: you are telling the regular expression engine to search for the literal string 'tab' in the literal string 'line'.
Even without that mistake, searching the string you named line, 'and- then 3 times minus 456: no m0re!', with the pattern you named tab, 'a-z0-9', would find nothing. Outside square brackets, 'a-z0-9' is not a character class, so it only matches the literal text a-z0-9, which does not occur in line.
If you wanted to find any instance of a lower-case letter (a-z) or a number (0-9) you could use this:
>>> re.findall(r'([a-z\d])', 'and- then 3 times minus 456: no m0re!')
['a', 'n', 'd', 't', 'h', 'e', 'n', '3', 't', 'i', 'm', 'e', 's', 'm', 'i', 'n', 'u', 's', '4', '5', '6', 'n', 'o', 'm', '0', 'r', 'e']
But I do not see how this helps you. Maybe you could explain what you are trying to do. Either way, I suggest you read up on regular expressions to learn more.
You have done 'tab' and not tab. One is a string, another is a variable. You want to do re.findall(tab, line) (see how tab is no longer a string). You also did this for line.
However, if you print tab beforehand, you'll notice you have:
a-z0-9
whereas I think you intended to have
[a-z0-9]
So you can concatenate strings:
>>> # Add a bracket to each side of a-z0-9 to form a valid regex character class, [a-z0-9]
>>> print(re.findall('[' + tab + ']', line))
['a', 'n', 'd', 't', 'h', 'e', 'n', '3', 't', 'i', 'm', 'e', 's', 'm', 'i', 'n', 'u', 's', '4', '5', '6', 'n', 'o', 'm', '0', 'r', 'e']
Or you can use str.format():
>>> print(re.findall('[{}]'.format(tab), line))
['a', 'n', 'd', 't', 'h', 'e', 'n', '3', 't', 'i', 'm', 'e', 's', 'm', 'i', 'n', 'u', 's', '4', '5', '6', 'n', 'o', 'm', '0', 'r', 'e']
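Putting the pieces together, here is a small self-contained sketch of building the pattern from the iterable and passing variables, not quoted names, to re.findall. The variable names pattern and matches are introduced only for this example.

```python
import re

my_tab = ['a-z', '0-9']
line = 'and- then 3 times minus 456: no m0re!'

# Join the ranges and wrap them in brackets to form the character class [a-z0-9].
pattern = '[' + ''.join(my_tab) + ']'

# Pass the variables themselves; 'tab' and 'line' in quotes would be literal strings.
matches = re.findall(pattern, line)

print(pattern)           # [a-z0-9]
print(''.join(matches))  # andthen3timesminus456nom0re
```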
re.findall(tab, line)
You have passed two string literals, not the variables. And I think what you actually want is re.findall('[a-z0-9]', line). Note that a list comprehension such as [x for x in line if x != ' '] is not a substitute here: it only drops spaces, so it would also keep '-', ':' and '!'.