python return double entry in dictionary - python

I have been searching for hours on this problem and have tried everything I can think of, but I can't crack it; I am quite a dictionary noob.
I work with Maya and have clashing light names. This happens when you duplicate a group: all children keep the same names as before, so having an ALL_KEY in one group results in a name clash with a key_char in another group.
I need to identify a clashing short name and return the long name, so I can print that the long name is a duplicate or even do a cmds.select.
Unfortunately, everything I find on the internet about this only checks whether a list contains duplicate values and returns True or False, which is useless for me. So I tried list cleaning and list comparison, but I get stuck using a dictionary to maintain long and short names at the same time.
I managed to fetch the short names that are duplicates and return them, but along the way the long name got lost, so of course I can't identify the object clearly anymore.
import itertools
import fnmatch
import maya.cmds as mc
LIGHT_TYPES = ["spotLight", "areaLight", "directionalLight", "pointLight", "aiAreaLight", "aiPhotometricLight", "aiSkyDomeLight"]
#create dict
dblList = {'long' : 'short'}
for x in mc.ls (type=LIGHT_TYPES, transforms=True):
    y = x.split('|')[-1:][0]
    dblList['long','short'] = dblList.setdefault(x, y)
#reverse values with keys for easier detection
rev_multidict = {}
for key, value in dblList.items():
    rev_multidict.setdefault(value, set()).add(key)
#detect the doubles in the dict
#print [values for key, values in rev_multidict.items() if len(values) > 1]
flattenList = set(itertools.chain.from_iterable(values for key, values in rev_multidict.items() if len(values) > 1))
#so by now I got all the long names which clash in the scene already!
#means now I just need to make a for loop strip away the pipes and ask if the object is already in the list, then return the path with the pipe, and ask if the object is in lightlist and return the longname if so.
#but after many many hours I can't get this part working.
##as example until now print flattenList returns
set([u'ALL_blockers|ALL_KEY', u'ABCD_0140|scSet', u'SARAH_TOPShape', u'ABCD_0140|scChars', u'ALL|ALL_KEY', u'|scChars', u'|scSet', u'|scFX', ('long', 'short'), u'ABCD_0140|scFX'])
#we see ALL_KEY is double! and that's exactly what I need returned as long name
#THIS IS THE PART THAT I CAN'T GET WORKING, CHECK IN THE LIST WHICH VALUES ARE DOUBLE IN THE LONGNAME AND RETURN THE SHORTNAME LIST.
THE WHOLE DICTIONARY IS STILL COMPLETE AS
seen = set()
uniq = []
for x in dblList2:
    if x[0].split('|')[-1:][0] not in seen:
        uniq.append(x.split('|')[-1:][0])
        seen.add(x.split('|')[-1:][0])
thanks for your help.

I'm going to take a stab at this. If this isn't what you want, let me know why.
If I have a scene with a hierarchy like this:
group1
    nurbsCircle1
group2
    nurbsCircle2
group3
    nurbsCircle1
I can run this (adjust ls() if you need it for selection or whatnot):
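import maya.cmds as cmds

conflictObjs = {}
# ls() reports a node whose short name is not unique as a partial path containing '|'
objs = cmds.ls(shortNames = True, transforms = True)
for obj in objs:
    if len( obj.split('|') ) > 1:
        # key: partial path, value: the clashing short name
        conflictObjs[obj] = obj.split('|')[-1]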
And the output of conflictObjs will be:
# Dictionary of objects with non-unique short names
# {<long name>:<short name>}
{u'group1|nurbsCircle1': u'nurbsCircle1', u'group3|nurbsCircle1': u'nurbsCircle1'}
Showing me what objects don't have unique short names.

This will give you a list of all the lights which have duplicate short names, grouped by what the duplicated name is and including the full path of the duplicated objects:
import maya.cmds as cmds

def clashes_by_type(*types):
    long_names = cmds.ls(type = types, l=True) or []
    # get the parents from the lights, not using ls -type transform
    long_names = set(cmds.listRelatives(*long_names, p=True, f=True) or [])
    short_names = set([i.rpartition("|")[-1] for i in long_names])
    short_dict = dict()
    for sn in short_names:
        short_dict[sn] = [i for i in long_names if i.endswith("|"+ sn)]
    clashes = dict((k,v) for k, v in short_dict.items() if len(v) > 1)
    return clashes

clashes_by_type('directionalLight', 'ambientLight')
The main points to note:
Work down from long names. Short names are inherently unreliable!
When matching on the short names, include the last pipe so you don't get accidental overlaps of common names.
Every value in short_dict is a list, since each one is created by a list comprehension.
Once you have a dict of (name, [objects with that shortname]) it's easy to get clashes by looking for values longer than 1.
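The grouping idea can be tried outside Maya as well. The following is only a sketch under assumptions: find_clashes and sample_paths are hypothetical names, and the paths are made-up stand-ins for what a long-name query might return. It groups each path under whatever follows its last pipe and keeps the groups with more than one member.

```python
from collections import defaultdict

def find_clashes(long_names):
    """Group full pipe-separated paths by short name and keep only the clashes."""
    by_short = defaultdict(list)
    for long_name in long_names:
        # The short name is whatever follows the last pipe.
        short = long_name.rpartition("|")[-1]
        by_short[short].append(long_name)
    # Keep only short names shared by more than one path.
    return {short: sorted(paths) for short, paths in by_short.items() if len(paths) > 1}

# Hypothetical paths standing in for long names from a scene.
sample_paths = ["|ALL|ALL_KEY", "|ALL_blockers|ALL_KEY", "|ALL|RIM"]
clashes = find_clashes(sample_paths)
print(clashes)
```

Because the split happens on the last pipe only, a name such as MY_ALL_KEY would not be mistaken for ALL_KEY.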

Related

Iterate over Python list with clear code - rewriting functions

I've followed a tutorial to write a Flask REST API and have a special request about a Python code.
The offered code is following:
# data list is where my objects are stored
def put_one(name):
    list_by_id = [list for list in data_list if list['name'] == name]
    list_by_id[0]['name'] = [new_name]
    print({'list_by_id' : list_by_id[0]})
It works, which is nice, and even though I understand what line 2 is doing, I would like to rewrite it so that it's clear how the function iterates over the different lists. I already have an approach, but it raises KeyError: 0
def put(name):
    list_by_id = []
    list = []
    for list in data_list:
        if(list['name'] == name):
            list_by_id = list
    list_by_id[0]['name'] = request.json['name']
    return jsonify({'list_by_id' : list_by_id[0]})
My goal with this is also to be able to put other elements that don't necessarily have the key 'name'. If I get to rewrite the function in another way, I'll be more likely to adapt it to my needs.
I've looked for tools to convert one way of coding into the other, and for answers in forums, before coming here and couldn't find any.
It may not be beautiful code, but it gets the job done:
def put(value):
    for i in range(len(data_list)):
        key_list = list(data_list[i].keys())
        if data_list[i][key_list[0]] == value:
            print(f"old value: {key_list[0], data_list[i][key_list[0]]}")
            data_list[i][key_list[0]] = request.json[test_key]
            print(f"new value: {key_list[0], data_list[i][key_list[0]]}")
            break
Now it doesn't matter what the key is: with this iteration the method only changes the value when it finds it in data_list. Before, the code broke at every iteration because the keys were different and they played a role.
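For reference, the KeyError: 0 in the question's rewrite comes from list_by_id = list, which stores a single dictionary, so list_by_id[0] looks up the key 0 in that dictionary instead of taking the first element of a list. A minimal sketch of an explicit loop, without Flask, might look like this; find_by_field and the sample data_list contents are hypothetical, introduced only for illustration.

```python
def find_by_field(records, field, value):
    """Return the first dict whose `field` equals `value`, or None if there is none."""
    for record in records:
        # .get() avoids a KeyError for records that lack the field entirely.
        if record.get(field) == value:
            return record
    return None

# Hypothetical stand-in for the tutorial's data_list.
data_list = [{"name": "groceries"}, {"name": "chores"}]

match = find_by_field(data_list, "name", "chores")
if match is not None:
    # match is the same dict object that lives in data_list, so this updates it in place.
    match["name"] = "errands"
print(data_list)
```

Passing the field name as a parameter is one way to reuse the same loop for keys other than 'name'.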

Regular expressions matching words which contain the pattern but also the pattern plus something else

I have the following problem:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
I need to find which elements of list2 are in list1. In actual fact the elements of list1 correspond to a numerical value which I need to obtain then change. The problem is that 'xyz2' contains 'xyz' and therefore matches also with a regular expression.
My code so far (where 'data' is a python dictionary and 'specie_name_and_initial_values' is a list of lists where each sublist contains two elements, the first being specie name and the second being a numerical value that goes with it):
all_keys = list(data.keys())
for i in range(len(all_keys)):
    if all_keys[i]!='Time':
        #print all_keys[i]
        pattern = re.compile(all_keys[i])
        for j in range(len(specie_name_and_initial_values)):
            print re.findall(pattern,specie_name_and_initial_values[j][0])
Variations of the regular expression I have tried include:
pattern = re.compile('^'+all_keys[i]+'$')
pattern = re.compile('^'+all_keys[i])
pattern = re.compile(all_keys[i]+'$')
And I've also tried using 'in' as a qualifier (i.e. within a for loop)
Any help would be greatly appreciated. Thanks
Ciaran
----------EDIT------------
To clarify, my current code is below. It is used within a class/method-like structure.
def calculate_relative_data_based_on_initial_values(self,copasi_file,xlsx_data_file,data_type='fold_change',time='seconds'):
    copasi_tool = MineParamEstTools()
    data=pandas.io.excel.read_excel(xlsx_data_file,header=0)
    #uses custom class and method to get the list of lists from a file
    specie_name_and_initial_values = copasi_tool.get_copasi_initial_values(copasi_file)
    if time=='minutes':
        data['Time']=data['Time']*60
    elif time=='hour':
        data['Time']=data['Time']*3600
    elif time=='seconds':
        print 'Time is already in seconds.'
    else:
        print 'Not a valid time unit'
    all_keys = list(data.keys())
    species=[]
    for i in range(len(specie_name_and_initial_values)):
        species.append(specie_name_and_initial_values[i][0])
    for i in range(len(all_keys)):
        for j in range(len(specie_name_and_initial_values)):
            if all_keys[i] in species[j]:
                print all_keys[i]
The table returned from pandas is accessed like a dictionary. I need to go to my data table, extract the headers (i.e. the all_keys bit), then look up the name of the header in the specie_name_and_initial_values variable and obtain the corresponding value (the second element within the specie_name_and_initial_value variable). After this, I multiply all values of my data table by the value obtained for each of the matched elements.
I'm most likely over complicating this. Do you have a better solution?
thanks
----------edit 2 ---------------
Okay, below are my variables
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
species = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
You don't need a regex to find common elements, set.intersection will find all elements in list2 that are also in list1:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
print(set(list2).intersection(list1))
set(['xyz'])
Also, if you wanted to compare 'xyz' to 'xyz2', you would use ==, not in, and then it would correctly return False.
You can also rewrite your own code a lot more succinctly:
for key in data:
    if key != 'Time':
        # compile the key itself as the pattern
        pattern = re.compile(key)
        for name, _ in specie_name_and_initial_values:
            print re.findall(pattern, name)
Based on your edit, you have somehow managed to turn lists into strings; one option is to strip the []:
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
specie_name_and_initial_values = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
specie_name_and_initial_values = set(s.strip("[]") for s in specie_name_and_initial_values)
print(all_keys.intersection(specie_name_and_initial_values))
Which outputs:
set([u'Cyp26_G_R1', u'Cyp26_G_rep1'])
FYI, if you had lists inside the set you would have gotten an error as lists are mutable so are not hashable.
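Since the goal is to fetch the numerical value that goes with each matched name, a dictionary keyed by the stripped name may be more direct than an intersection. This is only a sketch: the numeric values below are invented for illustration, and initial_values and matched are hypothetical names.

```python
# Hypothetical [name, initial_value] pairs; the numbers are made up for illustration.
specie_name_and_initial_values = [
    ["[Cyp26_G_R1]", 2.0],
    ["[Cyp26_G_rep1]", 0.5],
    ["[Cyp26_G_R1R2]", 10.0],
]
all_keys = ["Cyp26_G_R1", "Cyp26_G_rep1", "Time"]

# Strip the surrounding brackets once and index the values by exact name.
initial_values = {name.strip("[]"): value for name, value in specie_name_and_initial_values}

# Exact dictionary lookup: 'Cyp26_G_R1' does not match 'Cyp26_G_R1R2'.
matched = {key: initial_values[key] for key in all_keys if key in initial_values}
print(matched)
```

Each column of the data table could then be multiplied by matched[key] for the keys that were found.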

Constantly getting IndexError and am unsure why in Python

I am new to Python, and really to programming in general, and am learning Python through a website called rosalind.info, which aims to teach through problem solving.
Here is the problem, wherein you're asked to calculate the percentage of guanine and cytosine in the string of DNA given for each ID, then return the ID of the sample with the greatest percentage.
I'm working on the sample problem on the page and am experiencing some difficulty. I know my code is probably really inefficient and cumbersome but I take it that's to be expected for those who are new to programming.
Anyway, here is my code.
gc = open("rosalind_gcsamp.txt","r")
biz = gc.readlines()
i = 0
gcc = 0
d = {}
for i in xrange(biz.__len__()):
    if biz[i].startswith(">"):
        biz[i] = biz[i].replace("\n","")
        biz[i+1] = biz[i+1].replace("\n","") + biz[i+2].replace("\n","")
        del biz[i+2]
What I'm trying to accomplish here is, given input such as this:
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
Break what's given into a list based on the lines and concatenate the two lines of DNA like so:
['>Rosalind_6404', 'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG', 'TCCCACTAATAATTCTGAGG\n']
And delete the entry two indices after the ID, which is >Rosalind. What I do with it later I still need to figure out.
However, I keep getting an index error and can't, for the life of me, figure out why. I'm sure it's a trivial reason, I just need some help.
I've even attempted the following to limited success:
for i in xrange(biz.__len__()):
    if biz[i].startswith(">"):
        biz[i] = biz[i].replace("\n","")
        biz[i+1] = biz[i+1].replace("\n","") + biz[i+2].replace("\n","")
    elif biz[i].startswith("A" or "C" or "G" or "T") and biz[i+1].startswith(">"):
        del biz[i]
which still gives me an index error but at least gives me the biz value I want.
Thanks in advance.
It is very easy to do with itertools.groupby, using lines that start with > as the keys and as the delimiters:
from itertools import groupby
with open("rosalind_gcsamp.txt","r") as gc:
    # group elements using lines that start with ">" as the delimiter
    groups = groupby(gc, key=lambda x: not x.startswith(">"))
    d = {}
    for k,v in groups:
        # if k is False we have a header line, i.e. a non-match for our not x.startswith(">"),
        # so use the value v as the key and call next on the grouper object
        # to get the next group, which holds the sequence lines
        if not k:
            key, val = list(v)[0].rstrip(), "".join(map(str.rstrip, next(groups)[1]))
            d[key] = val
print(d)
{'>Rosalind_0808': 'CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT', '>Rosalind_5959': 'CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC', '>Rosalind_6404': 'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG'}
If you need order use a collections.OrderedDict in place of d.
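You are looping over the full length of biz, so in your last iterations biz[i+1] and biz[i+2] don't exist: there is no item after the last. In the first version, del biz[i+2] also shortens the list while the xrange was computed from the original length, so i eventually runs past the end.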
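One way to sidestep the index arithmetic altogether is to build each record while reading line by line. The sketch below assumes the percentage wanted is the share of G and C bases; read_fasta and gc_content are hypothetical helper names, and the sample lines are the ones quoted in the question.

```python
def read_fasta(lines):
    """Map each '>' header (without the '>') to its concatenated sequence."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new record.
            current = line[1:]
            records[current] = ""
        elif current is not None:
            # Any other line extends the sequence of the current record.
            records[current] += line
    return records

def gc_content(seq):
    """Percentage of G and C characters in seq."""
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)

sample = [
    ">Rosalind_6404\n",
    "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC\n",
    "TCCCACTAATAATTCTGAGG\n",
]
records = read_fasta(sample)
best = max(records, key=lambda name: gc_content(records[name]))
print(best, gc_content(records[best]))
```

No index is ever incremented here, so there is nothing to run past the end of the list, and a sequence may span any number of lines.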

How do you get back tuple or 2 lists with key and value matching order of reg pattern group names?

I'm trying to create a repaired path using 2 dicts created with groupdict() from a compiled re pattern.
The idea is to swap out values from the wrong path with the equally named values of the correct dict.
However, because they are not in captured-group order, I can't rebuild the resulting string as a correct path: the values are not in the order the path requires.
I hope that makes sense, I've only been using python for a couple of months, so I may be missing the obvious.
# for k, v in pat_list.iteritems():
#     pat = re.compile(v)
#     m = pat.match(Path)
#     if m:
#         mgd = m.groups(0)
#         pp (mgd)
This gives the correct value order, and groupdict() creates the right k, v pairs, but in the wrong order.
You could perhaps use something like this:
import re
pat = re.compile(r"(?P<FULL>(?P<to_ext>(?:(?P<path_file_type>(?P<path_episode>(?P<path_client>[A-Z]:[\\/](?P<client_name>[a-zA-z0-1]*))[\\/](?P<episode_format>[a-zA-z0-9]*))[\\/](?P<root_folder>[a-zA-Z0-9]*)[\\/])(?P<file_type>[a-zA-Z0-9]*)[\\/](?P<path_folder>[a-zA-Z0-9]*[_,\-]\d*[_-]?\d*)[\\/](?P<base_name>(?P<episode>[a-zA-Z0-9]*)(?P<scene_split>[_,\-])(?P<scene>\d*)(?P<shot_split>[_-])(?P<shot>\d*)(?P<version_split>[_,\-a-zA-Z]*)(?P<version>[0-9]*))))[\.](?P<ext>[a-zA-Z0-9]*))")
s = r"T:\Grimm\Grimm_EPS321\Comps\Fusion\G321_08_010\G321_08_010_v02.comp"
mat = pat.match(s)
result = []
# group numbers run from 1 to pat.groups inclusive
for i in range(1, pat.groups + 1):
    # look up the name whose group index is i
    name = list(pat.groupindex.keys())[list(pat.groupindex.values()).index(i)]
    cap = mat.group(i)
    result.append([name, cap])
That will give you a list of lists, each smaller list having the group name as its first item and the captured text as its second item.
Or if you want 2 lists, you can make something like:
names = []
captures = []
for i in range(1, pat.groups + 1):
    name = list(pat.groupindex.keys())[list(pat.groupindex.values()).index(i)]
    cap = mat.group(i)
    names.append(name)
    captures.append(cap)
Getting key from value in a dict obtained from this answer
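An alternative that avoids the reverse lookup on every iteration is to sort groupindex by group number once. This is a sketch with a small hypothetical pattern, not the full path pattern above; any pattern whose groups are all named should behave the same way.

```python
import re

# Small hypothetical pattern; every capture group is named.
pat = re.compile(r"(?P<episode>[A-Za-z0-9]+)_(?P<scene>\d+)_(?P<shot>\d+)")
mat = pat.match("G321_08_010")

# groupindex maps name -> group number, so sorting by number restores pattern order.
names = [name for name, _ in sorted(pat.groupindex.items(), key=lambda item: item[1])]
ordered = [(name, mat.group(name)) for name in names]
print(ordered)
```

The names list and the captured values can also be kept as two separate lists if that suits the path rebuilding better.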

Python dictionary key error.. big mess of dictionaries in dictionaries in lists

This is kind of convoluted, so if I'm missing out on an easy construct for this, please let me know :)
I'm analysing the results of some matching experiments. At the end game, I want to be able to query things such as experiments[0]["cat"]["cat"], which yields the number of times "cat" was matched against "cat". Conversely, I could do experiments[0]["cat"]["dog"], when the first query was a cat and the match attempt was a dog.
The following is my code to populate this structure:
# initializing the first layer, a list of dictionaries.
experiments = []
for assignment in assignments:
    match_sums = {}
    experiments.append(match_sums)
for i in xrange(len(classes)):
    for experiment in xrange(len(experiments)):
        # experiments[experiment][classes[i]] should hold a dictionary,
        # where the keys are the things that were matched against classes[i],
        # and the value is the number of times this occurred.
        experiments[experiment][classes[i]] = collections.defaultdict(dict)
        # matches[experiment][i] is an integer for what the i'th match was in an experiment.
        # classes[j] for some integer j is the string name of the i'th match. could be "dog" or "cat".
        experiments[experiment][classes[i]][classes[matches[experiment][i]]] += 1
        total_class_sums[classes[i]] = total_class_sums.get(classes[i], 0) + 1
print experiments[0]["cat"]["cat"]
exit()
So clearly this is a bit convoluted. And I'm getting a value of "1" for the last match, rather than a full dictionary at experiments[0]["cat"]. Have I approached this wrong? What could the bug here be? Sorry for the craziness and thanks for any possible help!
Two points:
Dictionary keys can be tuples; and
If you're counting things, use collections.Counter. (You can use defaultdict(int), but Counter is more useful.)
So, instead of
experiments[experiment][classes[i]][classes[matches[experiment][i]]] += 1
write
experiments = Counter()
...
experiments[experiment, classes[i], classes[matches[experiment][i]]] += 1
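As a small runnable illustration of the tuple-key idea, assuming made-up data (queries, matched and counts are hypothetical names, and there is a single experiment with index 0):

```python
from collections import Counter

# Hypothetical data: what was queried and what each query was matched against.
queries = ["cat", "cat", "dog"]
matched = ["cat", "dog", "dog"]

counts = Counter()
for query, match in zip(queries, matched):
    # The key is a tuple: (experiment index, query class, matched class).
    counts[0, query, match] += 1

print(counts[0, "cat", "cat"])
print(counts[0, "dog", "cat"])  # a Counter returns 0 for keys it has never seen
```

The flat tuple key replaces the nested experiments[0]["cat"]["cat"] lookup with counts[0, "cat", "cat"], so no inner dictionaries need initializing.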
I am just trying to guess your needs, so I tried changing the order of your dimensions.
experiments = {}
# enumerate yields (index, item) pairs
for classIdx, className in enumerate(classes):
    experiment = collections.defaultdict(list)
    experiments[className] = experiment
    for assignmentIdx, assignment in enumerate(assignments):
        counterpart = classes[matches[assignmentIdx][classIdx]]
        experiment[counterpart].append((assignment,assignmentIdx))
print(len(experiments["cat"]["cat"]), len(experiments["cat"]))
