I am new to Python, and to programming in general, and am learning Python through a website called rosalind.info, which aims to teach through problem solving.
Here is the problem, wherein you're asked to calculate the percentage of guanine and cytosine in the DNA string given for each ID, then return the ID of the sample with the greatest percentage.
I'm working on the sample problem on the page and am experiencing some difficulty. I know my code is probably really inefficient and cumbersome but I take it that's to be expected for those who are new to programming.
Anyway, here is my code.
gc = open("rosalind_gcsamp.txt","r")
biz = gc.readlines()
i = 0
gcc = 0
d = {}
for i in xrange(biz.__len__()):
    if biz[i].startswith(">"):
        biz[i] = biz[i].replace("\n","")
        biz[i+1] = biz[i+1].replace("\n","") + biz[i+2].replace("\n","")
        del biz[i+2]
What I'm trying to accomplish here is, given input such as this:
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
Break what's given into a list based on the lines and concatenate the two lines of DNA like so:
['>Rosalind_6404', 'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG', 'TCCCACTAATAATTCTGAGG\n']
And delete the entry two indices after the ID, which is >Rosalind. What I do with it later I still need to figure out.
However, I keep getting an index error and can't, for the life of me, figure out why. I'm sure it's a trivial reason, I just need some help.
I've even attempted the following to limited success:
for i in xrange(biz.__len__()):
    if biz[i].startswith(">"):
        biz[i] = biz[i].replace("\n","")
        biz[i+1] = biz[i+1].replace("\n","") + biz[i+2].replace("\n","")
    elif biz[i].startswith("A" or "C" or "G" or "T") and biz[i+1].startswith(">"):
        del biz[i]
which still gives me an index error but at least gives me the biz value I want.
Thanks in advance.
This is very easy to do with itertools.groupby, using the lines that start with > both as the keys and as the delimiters:
from itertools import groupby
with open("rosalind_gcsamp.txt","r") as gc:
    # group elements using lines that start with ">" as the delimiter
    groups = groupby(gc, key=lambda x: not x.startswith(">"))
    d = {}
    for k,v in groups:
        # if k is False, the line starts with ">" (a header), so use it
        # as the key and call next on the groupby object to get the
        # sequence lines that follow
        if not k:
            key, val = list(v)[0].rstrip(), "".join(map(str.rstrip, next(groups)[1]))
            d[key] = val
print(d)
{'>Rosalind_0808': 'CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT', '>Rosalind_5959': 'CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC', '>Rosalind_6404': 'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG'}
If you need order use a collections.OrderedDict in place of d.
You are looping over the length of biz. So in your last iteration biz[i+1] and biz[i+2] don't exist. There is no item after the last.
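One way to avoid indexing past the end of the list is to not index at all: walk the lines once and append each sequence line to whichever ID was seen last. The following is only a minimal sketch of that idea. The function names parse_fasta and gc_content are made up for illustration, and the sample lines are the ones from the question.

```python
def parse_fasta(lines):
    """Map each '>' header (without the '>') to its joined sequence."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif current is not None:
            # sequence line: append to the most recent ID, no i+1 / i+2 needed
            records[current] += line
    return records


def gc_content(seq):
    """Percentage of G and C characters in seq."""
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


sample = [
    ">Rosalind_6404\n",
    "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC\n",
    "TCCCACTAATAATTCTGAGG\n",
]
records = parse_fasta(sample)
best = max(records, key=lambda name: gc_content(records[name]))
print(best, gc_content(records[best]))
```

Because nothing is deleted from the list while it is being iterated, the length of the list never changes underneath the loop.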
I have lists of items:
['MRS_103_005_010_BG_001_v001',
'MRS_103_005_010_BG_001_v002',
'MRS_103_005_010_FG_001_v001',
'MRS_103_005_010_FG_001_v002',
'MRS_103_005_010_FG_001_v003',
'MRS_103_005_020_BG_001_v001',
'MRS_103_005_020_BG_001_v002',
'MRS_103_005_020_BG_001_v003']
I need to identify the latest version of each item and store it to a new list. Having trouble with my logic.
Based on how this has been built I believe I need to first compare the indices to each other. If I find a match I then check to see which number is greater.
I figured I first needed to check whether the folder names matched between the current index and the next index. I did this by making two variables, starting at 0 and 1, to represent the indices so I could do a staggered incremental comparison of the list against itself. If the two entries matched, I then needed to check the vXXX number on the end. Whichever one was higher would be appended to the new list.
I suspect that the problem lies in one copy of the list getting to an empty index before the other one does but I'm unsure of how to compensate for that.
Again, I am not a programmer by trade. Any help would be appreciated! Thank you.
# Preparing variables for filtering the folders
versions = foundVerList
verAmountTotal = len(foundVerList)
verIndex = 0
verNextIndex = 1
highestVerCount = 1
filteredVersions = []
# Filtering, this will find the latest version of each folder and store to a list
while verIndex < verAmountTotal:
    try:
        nextVer = (versions[verIndex])
        nextVerCompare = (versions[verNextIndex])
    except IndexError:
        verNextIndex -= 1
    if nextVer[0:24] == nextVerCompare[0:24]:
        if nextVer[-3:] < nextVerCompare[-3:]:
            filteredVersions.append(nextVerCompare)
        else:
            filteredVersions.append(nextVer)
    verIndex += 1
    verNextIndex += 1
My expected output is:
print filteredVersions
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v003']
['MRS_103_005_020_BG_001_v003']
The actual output is:
print filteredVersions
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v002',
'MRS_103_005_010_FG_001_v003']
['MRS_103_005_020_BG_001_v002', 'MRS_103_005_020_BG_001_v003']
During the with loop I am using os.list on each folder referenced via verIndex. I believe the problem is that a list is being generated for every folder that is searched but I want all the searches to be combined in a single list which will THEN go through the groupby and sorted actions.
Seems like a case for itertools.groupby:
from itertools import groupby
grouped = groupby(data, key=lambda version: version.rsplit('_', 1)[0])
result = [sorted(group, reverse=True)[0] for key, group in grouped]
print(result)
Output:
['MRS_103_005_010_BG_001_v002',
'MRS_103_005_010_FG_001_v003',
'MRS_103_005_020_BG_001_v003']
This groups the entries by everything before the last underscore, which I understand to be the "item code".
Then, it sorts each group in reverse order. The elements of each group differ only by the version, so the entry with the highest version number will be first.
Lastly, it extracts the first entry from each group, and puts it back into a result list.
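For reference, here is a self-contained sketch of the same idea using the list from the question as data. Two details are assumptions on my part rather than part of the answer above: the input is sorted by item code first (groupby only groups consecutive entries), and max is used in place of sorting each group. Plain string comparison is enough here only because the version numbers are zero-padded to the same width.

```python
from itertools import groupby

data = [
    'MRS_103_005_010_BG_001_v001',
    'MRS_103_005_010_BG_001_v002',
    'MRS_103_005_010_FG_001_v001',
    'MRS_103_005_010_FG_001_v002',
    'MRS_103_005_010_FG_001_v003',
    'MRS_103_005_020_BG_001_v001',
    'MRS_103_005_020_BG_001_v002',
    'MRS_103_005_020_BG_001_v003',
]


def item_code(version):
    """Everything before the last underscore, e.g. 'MRS_103_005_010_BG_001'."""
    return version.rsplit('_', 1)[0]


# Sort so that entries with the same item code are adjacent, then keep
# the largest entry of each group.
latest = [max(group) for _, group in groupby(sorted(data, key=item_code), key=item_code)]
print(latest)
```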
Try this:
text = """MRS_103_005_010_BG_001_v001
MRS_103_005_010_BG_001_v002
MRS_103_005_010_FG_001_v001
MRS_103_005_010_FG_001_v002
MRS_103_005_010_FG_001_v003
MRS_103_005_020_BG_001_v001
MRS_103_005_020_BG_001_v002
MRS_103_005_020_BG_001_v003
"""
result = {}
versions = text.splitlines()
for item in versions:
    v = item.split('_')
    num = int(v.pop()[1:])
    name = item[:-3]
    if result.get(name, 0) < num:
        result[name] = num
# zero-pad the number so the original three-digit version suffix is kept
filteredVersions = ["{}{:03d}".format(k, v) for k, v in result.items()]
print(filteredVersions)
output:
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v003', 'MRS_103_005_020_BG_001_v003']
I need to write a function that accepts a dictionary as the inventory and also a product_list of (name, number) pairs which indicate when we should update the inventory of that product by adding a certain number to it which could be a negative number.
Once a product is mentioned for the first time it is added to the dictionary, and when its count reaches zero it should remain in the dictionary. If the count ever becomes negative I need to raise a ValueError.
Example:
d = {"apple":50, "pear":30, "orange":25}
ps = [("apple",20),("pear",-10),("grape",18)]
shelve(d,ps)
d
{'pear': 20, 'grape': 18, 'orange': 25, 'apple': 70}
shelve(d,[("apple",-1000)])
Traceback (most recent call last):
ValueError: negative amount for apple
My code is giving either an unexpected EOF error or invalid syntax depending on if I include the last print line. It is definitely not currently accomplishing the goal but I believe this is the format and somewhat the logic I'll need to solve this. I need the function to print 'negative amount for x' where x is the fruit that is negative. Any help on this is appreciated
Code:
def shelve(inventory,product_list):
    count = 0
    try:
        for x in product_list:
            if x == True:
                product_list.append(x)
                count += key
            else:
                return product_list
    except ValueError:
        print ('negative amount for (product)')
print "hello program starts here"
d = {"apple":50, "pear":30, "orange":25}
ps = [("apple",20),("pear",-10),("grape",18)]
shelve(d,ps)
The important part of your task is to split your problem into sub-problems. Working with the dict and list data structures is mainly a matter of iterating over them. Start simple and do one step at a time. So one way to solve the problem could be:
1.) Iterate over the product list (you can print the items to see what is happening). This will be the product loop.
for x in ps:
    print(x)
Check how you can access the list's elements by, e.g., changing print(x) to print(x[0]) or print(x[1]).
2.) Now for every product in the product loop, you need to iterate the inventory and set the inventory to the corresponding values. Start by just iterating the inventory and print its contents. Check out how it works before doing more complicated stuff, play around with it. ^^-d
(I just noticed there is a simpler solution than iterating: since it's a dict, you will know what to do.)
3.) Now add the ValueError and the exception handling.
Hope this helps
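If you want to compare your result against something once you have worked through the steps, here is one possible sketch. It assumes the simpler dict lookup hinted at in step 2 (dict.get with a default of 0) and that the inventory is updated in place, as in the example session from the question. It is not the only way to write it.

```python
def shelve(inventory, product_list):
    """Apply each (name, amount) change to inventory in place."""
    for name, amount in product_list:
        # A product that is not in the inventory yet starts at 0.
        new_count = inventory.get(name, 0) + amount
        if new_count < 0:
            raise ValueError("negative amount for " + name)
        # A count of exactly 0 is kept in the dictionary.
        inventory[name] = new_count


d = {"apple": 50, "pear": 30, "orange": 25}
shelve(d, [("apple", 20), ("pear", -10), ("grape", 18)])
print(d)

try:
    shelve(d, [("apple", -1000)])
except ValueError as err:
    print(err)
```

Note that the function raises the ValueError itself. Catching it with try/except belongs to the caller, not inside shelve.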
I have been searching for hours on this problem and have tried everything I could think of, but I can't get it cracked. I am quite a dictionary noob.
I work with Maya and have lights with clashing names. This happens when you duplicate a group: all the children keep the same names as before, so having an ALL_KEY in one group results in a clashing name with a key_char in another group.
I need to identify a clashing short name and return the long name, so I can print that the long name is a duplicate or even do a cmds.select.
Unfortunately, everything I find on this matter on the internet is about checking whether a list contains duplicate values and only returns True or False, which is useless for me. So I tried list cleaning and list comparison, but I get stuck on using a dictionary to keep the long and short names together.
I managed to fetch the short names if they are duplicates and return them, but along the way the long name got lost, so of course I can't identify it clearly anymore.
import itertools
import fnmatch
import maya.cmds as mc
LIGHT_TYPES = ["spotLight", "areaLight", "directionalLight", "pointLight", "aiAreaLight", "aiPhotometricLight", "aiSkyDomeLight"]
#create dict
dblList = {'long' : 'short'}
for x in mc.ls (type=LIGHT_TYPES, transforms=True):
    y = x.split('|')[-1:][0]
    dblList['long','short'] = dblList.setdefault(x, y)
#reverse values with keys for easier detection
rev_multidict = {}
for key, value in dblList.items():
    rev_multidict.setdefault(value, set()).add(key)
#detect the doubles in the dict
#print [values for key, values in rev_multidict.items() if len(values) > 1]
flattenList = set(itertools.chain.from_iterable(values for key, values in rev_multidict.items() if len(values) > 1))
#so by now I got all the long names which clash in the scene already!
#means now I just need to make a for loop strip away the pipes and ask if the object is already in the list, then return the path with the pipe, and ask if the object is in lightlist and return the longname if so.
#but after many many hours I can't get this part working.
##as example until now print flattenList returns
set([u'ALL_blockers|ALL_KEY', u'ABCD_0140|scSet', u'SARAH_TOPShape', u'ABCD_0140|scChars', u'ALL|ALL_KEY', u'|scChars', u'|scSet', u'|scFX', ('long', 'short'), u'ABCD_0140|scFX'])
#we see ALL_KEY is double! and that's exactly what I need returned as long name
#THIS IS THE PART THAT I CAN'T GET WORKING, CHECK IN THE LIST WHICH VALUES ARE DOUBLE IN THE LONGNAME AND RETURN THE SHORTNAME LIST.
THE WHOLE DICTIONARY IS STILL COMPLETE AS
seen = set()
uniq = []
for x in dblList2:
    if x[0].split('|')[-1:][0] not in seen:
        uniq.append(x.split('|')[-1:][0])
        seen.add(x.split('|')[-1:][0])
thanks for your help.
I'm going to take a stab with this. If this isn't what you want let me know why.
If I have a scene with a hierarchy like this:
group1
    nurbsCircle1
group2
    nurbsCircle2
group3
    nurbsCircle1
I can run this (adjust ls() if you need it for selection or whatnot):
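import maya.cmds as cmds

conflictObjs = {}
objs = cmds.ls(shortNames = True, transforms = True)
for obj in objs:
    # ls returns a partial path (containing '|') only when the short name is not unique
    if len( obj.split('|') ) > 1:
        conflictObjs[obj] = obj.split('|')[-1]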
And the output of conflictObjs will be:
# Dictionary of objects with non-unique short names
# {<long name>:<short name>}
{u'group1|nurbsCircle1': u'nurbsCircle1', u'group3|nurbsCircle1': u'nurbsCircle1'}
Showing me what objects don't have unique short names.
This will give you a list of all the lights which have duplicate short names, grouped by what the duplicated name is and including the full path of the duplicated objects:
def clashes_by_type(*types):
    long_names = cmds.ls(type = types, l=True) or []
    # get the parents from the lights, not using ls -type transform
    long_names = set(cmds.listRelatives(*long_names, p=True, f=True) or [])
    short_names = set([i.rpartition("|")[-1] for i in long_names])
    short_dict = dict()
    for sn in short_names:
        short_dict[sn] = [i for i in long_names if i.endswith("|"+ sn)]
    clashes = dict((k,v) for k, v in short_dict.items() if len(v) > 1)
    return clashes

clashes_by_type('directionalLight', 'ambientLight')
The main points to note:
Work down from long names. Short names are inherently unreliable!
When deriving the short names, include the last pipe so you don't get accidental overlaps of common names.
Each value in short_dict will always be a list, since it's created by a comprehension.
Once you have a dict of (name, [objects with that short name]), it's easy to get clashes by looking for values longer than 1.
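The grouping step itself does not depend on Maya, so it can be tried out on plain strings. The sketch below is an assumption-laden illustration rather than the answer's code: find_clashes is a made-up name, the paths are hypothetical ones modeled on the names in the question, and a defaultdict replaces the endswith scan so that each long name is visited only once.

```python
from collections import defaultdict


def find_clashes(long_names):
    """Group full paths by their last path component and keep the clashes."""
    by_short = defaultdict(list)
    for long_name in long_names:
        # Everything after the last pipe is the short name.
        short = long_name.rpartition("|")[-1]
        by_short[short].append(long_name)
    return {short: paths for short, paths in by_short.items() if len(paths) > 1}


# Hypothetical full paths, as cmds.ls(..., l=True) might return them.
paths = ["|ALL|ALL_KEY", "|ALL_blockers|ALL_KEY", "|ALL|SARAH_TOP"]
print(find_clashes(paths))
```

The values of the returned dict are the long names, so they could be passed on to something like cmds.select inside Maya.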
This is kind of convoluted, so if I'm missing out on an easy construct for this, please let me know :)
I'm analysing the results of some matching experiments. In the end, I want to be able to query things such as experiments[0]["cat"]["cat"], which yields the number of times "cat" was matched against "cat". Conversely, I could do experiments[0]["cat"]["dog"], when the query was a cat and the match attempt was a dog.
The following is my code to populate this structure:
# initializing the first layer, a list of dictionaries.
experiments = []
for assignment in assignments:
    match_sums = {}
    experiments.append(match_sums)
for i in xrange(len(classes)):
    for experiment in xrange(len(experiments)):
        # experiments[experiment][classes[i]] should hold a dictionary,
        # where the keys are the things that were matched against classes[i],
        # and the value is the number of times this occurred.
        experiments[experiment][classes[i]] = collections.defaultdict(dict)
        # matches[experiment][i] is an integer for what the i'th match was in an experiment.
        # classes[j] for some integer j is the string name of the i'th match. could be "dog" or "cat".
        experiments[experiment][classes[i]][classes[matches[experiment][i]]] += 1
    total_class_sums[classes[i]] = total_class_sums.get(classes[i], 0) + 1
print experiments[0]["cat"]["cat"]
exit()
So clearly this is a bit convoluted. And I'm getting a value of "1" for the last match, rather than a full dictionary at experiments[0]["cat"]. Have I approached this wrong? What could the bug here be? Sorry for the craziness and thanks for any possible help!
Two points:
Dictionary keys can be tuples; and
If you're counting things, use collections.Counter. (You can use defaultdict(int), but Counter is more useful.)
So, instead of
experiments[experiment][classes[i]][classes[matches[experiment][i]]] += 1
write
from collections import Counter
experiments = Counter()
...
experiments[experiment, classes[i], classes[matches[experiment][i]]] += 1
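To make the tuple-key idea concrete, here is a small runnable sketch. The contents of classes and matches are invented for illustration and only follow the layout described in the question's comments: classes[i] is the label of the i'th query, and matches[e][i] is the index of whatever it was matched against in experiment e.

```python
from collections import Counter

# Made-up data in the layout described in the question.
classes = ["cat", "dog", "cat"]
matches = [
    [0, 0, 2],  # experiment 0
    [1, 1, 0],  # experiment 1
]

counts = Counter()
for experiment, experiment_matches in enumerate(matches):
    for i, matched_index in enumerate(experiment_matches):
        # Key is (experiment, query label, matched label).
        counts[experiment, classes[i], classes[matched_index]] += 1

print(counts[0, "cat", "cat"])
print(counts[0, "cat", "dog"])  # a missing key reads as 0, no KeyError
```

Since the Counter is created once and never reassigned inside the loops, the counts accumulate instead of being reset on every iteration.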
I'm just trying to guess your needs, so I tried changing the order of your dimensions.
experiments = {}
# enumerate yields (index, value) pairs, in that order
for classIdx, className in enumerate(classes):
    experiment = collections.defaultdict(list)
    experiments[className] = experiment
    for assignmentIdx, assignment in enumerate(assignments):
        counterpart = classes[matches[assignmentIdx][classIdx]]
        experiment[counterpart].append((assignment, assignmentIdx))
print(len(experiments["cat"]["cat"]), len(experiments["cat"]))
I would like to figure out if any deal is selected twice or more.
The following example is stripped down for sake of readability. But in essence I thought the best solution would be using a dictionary, and whenever any deal-container (e.g. deal_pot_1) contains the same deal twice or more, I would capture it as an error.
The following code served me well, however by itself it throws an exception...
if deal_pot_1:
    duplicates[deal_pot_1.pk] += 1
if deal_pot_2:
    duplicates[deal_pot_2.pk] += 1
if deal_pot_3:
    duplicates[deal_pot_3.pk] += 1
...if I didn't initialize this before hand like the following.
if deal_pot_1:
    duplicates[deal_pot_1.pk] = 0
if deal_pot_2:
    duplicates[deal_pot_2.pk] = 0
if deal_pot_3:
    duplicates[deal_pot_3.pk] = 0
Is there any way to simplify/combine this?
There are basically two options:
Use a collections.defaultdict(int). Upon access of an unknown key, it will initialise the corresponding value to 0.
For a dictionary d, you can do
d[x] = d.get(x, 0) + 1
to initialise and increment in a single statement.
Edit: A third option is collections.Counter, as pointed out by Mark Byers.
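The three options can be compared side by side in a short sketch. The list picked stands in for the deal primary keys from the question. Its values are made up, and the real code would collect deal_pot_N.pk for each non-empty container instead.

```python
from collections import Counter, defaultdict

# Hypothetical primary keys of the selected deals.
picked = [3, 7, 3, 9, 7, 3]

# Option 1: plain dict with get(), initialise and increment in one statement.
with_get = {}
for pk in picked:
    with_get[pk] = with_get.get(pk, 0) + 1

# Option 2: defaultdict(int), unknown keys start at 0.
with_default = defaultdict(int)
for pk in picked:
    with_default[pk] += 1

# Option 3: Counter does the counting itself.
with_counter = Counter(picked)

# Any key counted more than once was selected at least twice.
repeated = sorted(pk for pk, n in with_counter.items() if n > 1)
print(repeated)
```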
It looks like you want collections.Counter.
Look at collections.defaultdict. It looks like you want defaultdict(int).
So you only want to know if there are duplicated values? Then you could use a set:
duplicates = set()
for value in values:
    if value in duplicates:
        raise Exception('Duplicate!')
    duplicates.add(value)
If you would like to find all duplicates:
maybe_duplicates = set()
confirmed_duplicates = set()
for value in values:
    if value in maybe_duplicates:
        confirmed_duplicates.add(value)
    else:
        maybe_duplicates.add(value)
if confirmed_duplicates:
    raise Exception('Duplicates: ' + ', '.join(map(str, confirmed_duplicates)))
A set is probably the way to go here - collections.defaultdict is probably more than you need.
Don't forget to come up with a canonical order for your hands - like sort the cards from least to greatest, by suit and face value. Otherwise you might not detect some duplicates.
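As a rough sketch of what a canonical order could look like, assume each hand is a list of (suit, value) tuples. That representation is my assumption, not something given in the question. Sorting each hand into a tuple before putting it in a set makes two hands with the same cards compare equal regardless of the order they were dealt in.

```python
def canonical(hand):
    """Return the hand as a sorted tuple so card order does not matter."""
    return tuple(sorted(hand))


# Hypothetical hands; the first two hold the same cards in a different order.
hands = [
    [("hearts", 10), ("spades", 2)],
    [("spades", 2), ("hearts", 10)],
    [("clubs", 5), ("hearts", 10)],
]

seen = set()
duplicates = set()
for hand in hands:
    key = canonical(hand)  # tuples are hashable, lists are not
    if key in seen:
        duplicates.add(key)
    else:
        seen.add(key)

print(duplicates)
```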