Sum values in a dict of sets


I have what might be a simple task and I tried several solutions but can't seem to figure it out.
I have a dict of sets containing gene names and corresponding positions as sets like:
gene_nr_snp = {'gene1': {3, 9}, 'gene2': {2, 3, 1}, 'gene3': {1}}
I want to return a dict with the gene name and the corresponding summed value.
I tried the following:
gene_values = {}
for gene, snp in gene_nr_snp.items():
    for i in snp:  # iterate the values in each set
        snp_total = 0
        snp_total += i
        gene_values[gene].add(snp_total)
This is returning the same set of values

You can use a dict comprehension and the sum() function:
gene_values = {gene: sum(snp) for gene, snp in gene_nr_snp.items()}
Your attempt fails because you set the snp_total variable to 0 for every value in snp, thus failing to sum anything. You then seem to treat gene_values[gene] as a set but the dictionary starts empty, so you'll get a KeyError. A working version would be:
gene_values = {}
for gene, snp in gene_nr_snp.items():
    snp_total = 0
    for i in snp:  # iterate the values in each set
        snp_total += i
    gene_values[gene] = snp_total
but since sum() already does this, the inner loop is more verbose than needed; the whole loop body could be replaced by gene_values[gene] = sum(snp).
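As a quick check, here is a minimal runnable sketch of the dict comprehension, using the example data from the question with the quoting corrected:

```python
# Example data from the question: gene name -> set of positions.
gene_nr_snp = {"gene1": {3, 9}, "gene2": {2, 3, 1}, "gene3": {1}}

# sum() accepts any iterable of numbers, including a set.
gene_values = {gene: sum(snp) for gene, snp in gene_nr_snp.items()}

print(gene_values)  # {'gene1': 12, 'gene2': 6, 'gene3': 1}
```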

Related

dictionary being replaced and I am not sure why it is happening?

I have some code which is something along the lines of
storage = {}
for index, n in enumerate(dates):
    if n in specific_dates:
        for i in a_list:
            my_dict[i] = {}
            my_dict[i]["somthing"] = value
            my_dict[i]["somthing2"] = value_2
    else:
        #print(storage[dates[index - 1]]["my_dict"][i]["somthing"])
        for i in a_list:
            my_dict[i] = {}
            my_dict[i]["somthing"] = different_value - storage[dates[index - 1]]["my_dict"][i]["somthing"]
            my_dict[i]["somthing2"] = different_value_2
    storage[n]["my_dict"] = my_dict
The first pass runs the code under if n in specific_dates:, and the second pass goes to the for i in a_list: loop in the else branch.
Essentially, the code sets a value on specific dates, and that value is then used for the non-specific dates that follow until the next specific date overrides it. However, at every date, I save a dictionary of values within a master dictionary called storage.
I found the problem: when I print my_dict on the second pass, my_dict[i] is literally an empty dictionary, whereas prior to that loop it was filled. Where I have put the commented-out print line, it would print value. I have fixed this by changing storage[n]["my_dict"] = my_dict to storage[n]["my_dict"] = my_dict.copy() and can now access value.
However, I do not really understand why this didn't work as I expected in the first place, as I thought that assigning my_dict to storage created new memory.
I was hoping someone could explain why this is happening and why storage[dates[index - 1]]["my_dict"][i]["somthing"] doesn't refer to a new space in memory, if that is indeed what is happening.
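The behaviour described is consistent with how assignment works in Python: it binds a name or key to an existing object and does not copy it. The following is a minimal sketch, not the asker's actual code; the keys "day1" and "day1_copy" and the item "a" are made up for illustration.

```python
storage = {}
my_dict = {}

my_dict["a"] = {"somthing": 1}

# Plain assignment stores a reference to the same dict object.
storage["day1"] = {"my_dict": my_dict}
# .copy() makes a new outer dict (a shallow copy).
storage["day1_copy"] = {"my_dict": my_dict.copy()}

# On the "next pass" the loop resets the entry on the shared dict.
my_dict["a"] = {}

print(storage["day1"]["my_dict"] is my_dict)  # True
print(storage["day1"]["my_dict"])             # {'a': {}}
print(storage["day1_copy"]["my_dict"])        # {'a': {'somthing': 1}}
```

Note that .copy() is shallow: the inner dictionaries are still shared between the original and the copy. It helps here only because the loop rebinds my_dict[i] to a fresh dict instead of mutating the old one. If the inner dictionaries were modified in place, copy.deepcopy would be the safer choice.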

python comparing two lists and retaining second list index

I am taking a user input of "components", splitting it into a list, and comparing those components to a list of available components generated from column A of a Google Sheet. What I am then attempting to do is return the cell value from column G corresponding to the column A index, and repeat this for all input values.
So far I am getting the first value just fine but I'm obviously missing something to get it to cycle back and to the remaining user input components. I tried some stuff using itertools but wasn't able to get the results I wanted. I have a feeling I will facepalm when I discover the solution to this through here or on my own.
mix = select.split(',')  # sets user input to string and separates elements
ws = s.worksheet("Details")  # opens table in google sheet
c_list = ws.col_values(1)  # sets column A to a list
modifier = [""] * len(mix)  # sets size of list based on user input
list = str(c_list).lower()
for i in range(len(mix)):
    if str(mix[i]).lower() in str(c_list).lower():
        for j in range(len(c_list)):
            if str(mix[i]).lower() == str(c_list[j]).lower():
                modifier[i] = ws.cell(j+1,7).value  # get value of cell from Column G corresponding to Column A for component name
print(mix)
print(modifier)

You are overcomplicating the code by writing it in a C-like style.
I have replaced all the loops you had with a single, simpler loop. I have also left comments above each line of code to explain what it does.
# Here we use .lower() to lower case all the values in select
# before splitting them and adding them to the list "mix"
mix = select.lower().split(",")
ws = s.worksheet("Details")
# Here we used a list comprehension to create a list of the "column A"
# values but all in lower case
c_list = [cell.lower() for cell in ws.col_values(1)]
modifier = [""] * len(mix)
# Here we loop through every item in mix, but also keep a count of iterations
# we have made, which we will use later to add the "column G" element to the
# corresponding location in the list "modifier"
for i, value in enumerate(mix):
    # Here we check if the value exists in the c_list
    if value in c_list:
        # If we find the value in the c_list, we get the index of the value in c_list
        index = c_list.index(value)
        # Here we add the value of column G that has an index of "index + 1" to
        # the modifier list at the same location of the value in list "mix"
        modifier[i] = ws.cell(index + 1, 7).value
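The same matching logic can be tried without a spreadsheet. The sketch below uses plain lists as hypothetical stand-ins for columns A and G (column_a, column_g and the sample input are invented). It also strips whitespace around each component, on the assumption that users may type a space after a comma, which would otherwise prevent a match.

```python
# Hypothetical stand-ins for the sheet columns (not real data).
column_a = ["Flour", "Sugar", "Butter"]
column_g = ["x1.0", "x0.5", "x2.0"]
select = "sugar, Butter,salt"

# Normalise the user input: split, trim whitespace, lower-case.
mix = [part.strip().lower() for part in select.split(",")]

# Map each lower-cased column A name to its 0-based row index.
lookup = {name.lower(): row for row, name in enumerate(column_a)}

# Pick the column G value for each match; leave "" for unknown components.
modifier = [column_g[lookup[v]] if v in lookup else "" for v in mix]

print(mix)       # ['sugar', 'butter', 'salt']
print(modifier)  # ['x0.5', 'x2.0', '']
```

A dictionary lookup avoids scanning the column once per component, which may matter for long sheets.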

How to add the value pairs to excel without including the brackets from a python dictionary?

I want to append the key value pairs in my python dictionary without including the brackets... I'm not really sure how to do that.
I've tried looking at similar questions but it isn't working for me.
#this creates a new workbook call difference
file = xlrd.open_workbook('/Users/im/Documents/Exception_Cases/Orders.xls')
wb = xl_copy(file)
Sheet1 = wb.add_sheet('differences')
#this creates header for two columns
Sheet1.write(0,0,"S_Numbers")
Sheet1.write(0,1," Values")
#this would store all the of Key, value pair of my dictionary into their respective SO_Numbers, Booking Values column
print(len(diff_so_keyval))
rowplacement = 1
while rowplacement < len(diff_so_keyval):
    for k, v in diff_so_keyval.items():
        Sheet1.write(rowplacement,0,k)
        Sheet1.write(rowplacement,1,str(v))
        rowplacement = rowplacement + 1
#This is what I have in my diff_so_keyval dictionary
diff_so_keyval = {104370541: [31203.7],
                  106813775: [187500.0],
                  106842625: [60349.8],
                  106843037: [492410.5],
                  106918995: [7501.25],
                  106919025: [427090.0],
                  106925184: [30676.4],
                  106941476: [203.58],
                  106941482: [203.58],
                  106941514: [407.16],
                  106962317: [61396.36]}
#this is the output
S_numbers Values
104370541 [31203.7]
106813775 [187500.0]
106842625 [60349.8]
I want the values without the brackets

Looks to me like the 'values' in the dictionary are actually single-element lists.
If you simply extract the 0th element out of the list, then that should work for 'removing the brackets':
Sheet1.write(rowplacement, 1, v[0])
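To see the effect without the Excel libraries, here is a small sketch that collects the rows into a list instead of calling Sheet1.write. The rows list is a stand-in introduced for illustration. It also uses enumerate with start=1 to produce the row counter, as one possible alternative to nesting a for loop inside a while loop.

```python
# A few entries from the dictionary in the question.
diff_so_keyval = {
    104370541: [31203.7],
    106813775: [187500.0],
    106842625: [60349.8],
}

rows = []
# Row 0 holds the headers, so data rows start at 1.
for rowplacement, (k, v) in enumerate(diff_so_keyval.items(), start=1):
    # v is a single-element list; v[0] is the bare number.
    rows.append((rowplacement, k, v[0]))

for row in rows:
    print(row)
```

In the real script, each tuple would correspond to a pair of Sheet1.write calls for columns 0 and 1 of that row.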

Why isn't all the data being stored?

I have a dictionary of key: value pairs, but it only saves the last iteration and discards the previous entries. Where is it being reset? This is the output showing the iteration counter (CTR) and the length of the dictionary:
Return the complete Term and DocID Ref.
LENGTH:6960
CTR:88699
My code:
class IndexData:
    def getTermDocIDCollection(self):
        ...............
        for term in terms:
            #TermDocIDCollection[term] = sourceFile['newid']
            TermDocIDCollection[term] = []
            TermDocIDCollection[term].append(sourceFile['newid'])
        return TermDocIDCollection

The piece of code you've commented out does the following:
1. Sets a value to the key (removing whatever was there before, if it existed)
2. Sets a new value to the key (an empty list)
3. Appends the value set in step 1 to the new empty list
Sadly, it would do the same each iteration, so you'd end up with [last value] assigned to the key. The new code (with update) does something similar. In the old days you'd do this:
if term in TermDocIDCollection:
    TermDocIDCollection[term].append(sourceFile['newid'])
else:
    TermDocIDCollection[term] = [sourceFile['newid']]
or a variation of the theme using try-except. After collections was added you can do this instead:
from collections import defaultdict
# ... code...
TermDocIDCollection = defaultdict(list)
and you'd update it like this:
TermDocIDCollection[term].append(sourceFile['newid'])
There is no need to check whether term exists in the dictionary. If it doesn't, the defaultdict will first call the constructor you passed (list) to create the initial value for the key.
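A minimal runnable sketch of the defaultdict approach follows. The postings list is hypothetical data standing in for the terms and sourceFile['newid'] values that the real code reads from its source files.

```python
from collections import defaultdict

# Hypothetical (term, document id) pairs; the real code derives these
# from its parsed source files.
postings = [("apple", 1), ("pear", 1), ("apple", 2), ("apple", 3)]

TermDocIDCollection = defaultdict(list)
for term, newid in postings:
    # A missing key is created with an empty list before append runs.
    TermDocIDCollection[term].append(newid)

print(dict(TermDocIDCollection))  # {'apple': [1, 2, 3], 'pear': [1]}
```

Every document id for a term is kept because the list is created only once per key and is never reset inside the loop.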

How do you get back tuple or 2 lists with key and value matching order of reg pattern group names?

I'm trying to create a repaired path using 2 dicts created with groupdict() from re.compile.
The idea is to swap out values from the wrong path with the equally named values of the correct dict.
However, because they are not in captured-group order, I can't rebuild the resulting string as a correct path, as the values are not in the order required for the path.
I hope that makes sense, I've only been using python for a couple of months, so I may be missing the obvious.
# for k, v in pat_list.iteritems():
#     pat = re.compile(v)
#     m = pat.match(Path)
#     if m:
#         mgd = m.groups(0)
#         pp (mgd)
This gives the correct value order, and groupdict() creates the right k, v pairs, but in the wrong order.

You could perhaps use something like this:
pat = re.compile(r"(?P<FULL>(?P<to_ext>(?:(?P<path_file_type>(?P<path_episode>(?P<path_client>[A-Z]:[\\/](?P<client_name>[a-zA-z0-1]*))[\\/](?P<episode_format>[a-zA-z0-9]*))[\\/](?P<root_folder>[a-zA-Z0-9]*)[\\/])(?P<file_type>[a-zA-Z0-9]*)[\\/](?P<path_folder>[a-zA-Z0-9]*[_,\-]\d*[_-]?\d*)[\\/](?P<base_name>(?P<episode>[a-zA-Z0-9]*)(?P<scene_split>[_,\-])(?P<scene>\d*)(?P<shot_split>[_-])(?P<shot>\d*)(?P<version_split>[_,\-a-zA-Z]*)(?P<version>[0-9]*))))[\.](?P<ext>[a-zA-Z0-9]*))")
s = r"T:\Grimm\Grimm_EPS321\Comps\Fusion\G321_08_010\G321_08_010_v02.comp"
mat = pat.match(s)
result = []
for i in range(1, pat.groups + 1):  # group numbers run from 1 to pat.groups inclusive
    name = list(pat.groupindex.keys())[list(pat.groupindex.values()).index(i)]
    cap = mat.group(i)
    result.append([name, cap])
That will give you a list of lists, each smaller list having the capture group's name as its first item and the captured text as its second item.
Or if you want 2 lists, you can make something like:
names = []
captures = []
for i in range(1, pat.groups + 1):  # group numbers run from 1 to pat.groups inclusive
    name = list(pat.groupindex.keys())[list(pat.groupindex.values()).index(i)]
    cap = mat.group(i)
    names.append(name)
    captures.append(cap)
The technique for getting a key from a value in a dict was taken from another answer.
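A possibly simpler variant is to sort the group names by their group number, since pat.groupindex maps each name to its number. The sketch below assumes every group in the pattern is named, and it uses a much smaller, hypothetical pattern and path than the ones above, purely for illustration.

```python
import re

# Hypothetical, simplified pattern; all groups are named.
pat = re.compile(
    r"(?P<drive>[A-Z]):[\\/](?P<client>\w+)[\\/](?P<base>\w+)\.(?P<ext>\w+)"
)
m = pat.match(r"T:\Grimm\G321_08_010_v02.comp")

# groupindex maps name -> group number, so sorting by that number
# recovers the order in which the groups open in the pattern.
ordered_names = sorted(pat.groupindex, key=pat.groupindex.get)

# Pair each name with its captured text, then split into two tuples.
pairs = [(name, m.group(name)) for name in ordered_names]
names, captures = zip(*pairs)

print(names)     # ('drive', 'client', 'base', 'ext')
print(captures)  # ('T', 'Grimm', 'G321_08_010_v02', 'comp')
```

This avoids rebuilding the key and value lists on every iteration, and m.group(name) looks up each capture by name rather than by position.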
