Update key names in a dictionary in Python

I have the following fasta file in a dictionary, in the following shape:
from Bio import SeqIO
alignment_file = '/Users/dissertation/Desktop/Alignment 4 sequences.fasta'
seq_dict = {rec.id : rec.seq for rec in SeqIO.parse(alignment_file, "fasta")}
Which gives me the following output:
{'NC_000962.3': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
'NC_008596.1': Seq('------------------------------------------------------...ccg'),
'NC_009525.1': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
'NC_002945.4': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN')}
The only issue is that I would like to replace the key names with others that are easier to identify when comparing the sequences in other parts of my code. So I have tried the following:
name_list = ['Tuberculosis', 'Smegmatis', 'H37Ra', 'Bovis']
for key in seq_dict:
    for name in name_list:
        seq_dict[name[x]]= seq_dict[key]
seq_dict
However I get the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/var/folders/pq/ghtv3wj159j681vy0ny3tz9w0000gp/T/ipykernel_47822/1486954832.py in <module>
9
---> 10 for key in seq_dict:
11 for name in name_list:
12 seq_dict[name[x]]= seq_dict[key]
RuntimeError: dictionary changed size during iteration
I understand that there is no easy, straightforward way of updating key names in a dictionary, but I don't understand the error. Is there a way of doing something similar?
I have also tried this:
seq_dict.update({'NC_000962.3': 'Tuberculosis', 'NC_008596.1': 'Smegmatis', 'NC_009525.1': 'H37Ra', 'NC_002945.4': 'Bovis'})
But this gives me the following output:
{'NC_000962.3': 'Tuberculosis',
'NC_008596.1': 'Smegmatis',
'NC_009525.1': 'H37Ra',
'NC_002945.4': 'Bovis'}
My desired output would look like this:
{'Tuberculosis': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
'Smegmatis': Seq('------------------------------------------------------...ccg'),
'H37Ra': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
'Bovis': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN')}
Does anybody have an idea on how to update these?

Construct a new dictionary and then assign it to seq_dict in a single operation, rather than mutating seq_dict as you're in the process of iterating over it. I think this is what you're aiming for:
seq_dict = dict(zip(name_list, seq_dict.values()))
although I'd personally want to have an explicit mapping from sequence IDs to names rather than relying on the ordering being the same.
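A minimal sketch of that explicit-mapping idea follows. Plain strings stand in for the Seq objects so it runs without Biopython, the name id_to_name is made up for illustration, and the pairing of accession IDs to names is taken from the update() attempt in the question:

```python
# Plain strings stand in for Bio.Seq.Seq objects (assumption, for a runnable sketch).
seq_dict = {
    "NC_000962.3": "ctgtta",
    "NC_008596.1": "------",
    "NC_009525.1": "ctgtta",
    "NC_002945.4": "ctgtta",
}

# Hypothetical explicit mapping from accession ID to a readable name.
id_to_name = {
    "NC_000962.3": "Tuberculosis",
    "NC_008596.1": "Smegmatis",
    "NC_009525.1": "H37Ra",
    "NC_002945.4": "Bovis",
}

# Build a new dictionary rather than mutating seq_dict while looping over it.
# IDs that are missing from id_to_name keep their original key.
seq_dict = {id_to_name.get(seq_id, seq_id): seq for seq_id, seq in seq_dict.items()}

print(seq_dict)
```

Because each ID is looked up by name, the result no longer depends on the FASTA records and name_list happening to be in the same order.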

Related

Iterate through a dict, store the key and value, then iterate again to look for a similar key and delete both from the dict, e.g. (Light1on, Light1off), in Python

[I had a problem working out how to iterate through a dict to find a pair of similar words, output it, and then delete it from the dict.]
My intention is to generate random output labels and store them in a dictionary. I then want to iterate through the dictionary, take the first key, and search the dictionary for a similar key (e.g. Light1on and Light1off both contain Light1), then get the values for both keys and store them in a table in their respective columns.
For example:
Dict = {Light1on,Light2on,Light1off...}
Store the value for Light1on, then iterate through the dictionary to find e.g. Light1off, and store Light1on:value1 and Light1off:value2 in a table or DataFrame with the columns On: value1 and Off: value2.
I don't know how to insert the code as code, so I can only provide the image. Sorry for the trouble; it's my first time asking a question here. Thanks.
from collections import defaultdict
import difflib, random
olist = []
input = 10
olist1 = ['Light1on','Light2on','Fan1on','Kettle1on','Heater1on']
olist2 = ['Light2off','Kettle1off','Light1off','Fan1off','Heater1off']
events = list(range(input + 1))
for i in range(len(olist1)):
    output1 = random.choice(olist1)
    print(output1,'1')
    olist1.remove(output1)
    output2 = random.choice(olist2)
    print(output2,'2')
    olist2.remove(output2)
    olist.append(output1)
    olist.append(output2)
print(olist,'3')
outputList = {olist[i]:events[i] for i in range(10)}
print (str(outputList),'4')
# Iterating through the keys finding a pair match
for s in range(5):
    for i in outputList:
        if i == list(outputList)[0]:
            skeys = difflib.get_close_matches(i, outputList, n=2, cutoff=0.75)
            print(skeys,'5')
            del outputList[skeys]
# Modified Dictionary
difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animaltion'])
['animal']
Update: I was unable to delete the pair of similar keys from the dictionary after finding the pair.
You're probably getting an error about a dictionary changing size during iteration. That's because you're deleting keys from a dictionary you're iterating over, and Python doesn't like that:
d = {1:2, 3:4}
for i in d:
    del d[i]
That will throw:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around that, one solution is to store a list of the keys you want to delete, then delete all those keys after you've finished iterating:
keys_to_delete = []
d = {1:2, 3:4}
for i in d:
    if i%2 == 1:
        keys_to_delete.append(i)
for i in keys_to_delete:
    del d[i]
Ta-da! Same effect, but this way avoids the error.
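Two other common ways to get the same effect are sketched here with made-up data: loop over a snapshot of the keys, or build a new dictionary that keeps only the items you want.

```python
d = {1: 2, 3: 4, 6: 7}

# Option 1: list(d) copies the keys up front, so deleting from d inside
# the loop no longer changes the thing being iterated over.
for key in list(d):
    if key % 2 == 1:
        del d[key]

# Option 2: build a filtered copy instead of deleting in place.
e = {1: 2, 3: 4, 6: 7}
e = {key: value for key, value in e.items() if key % 2 == 0}

print(d, e)
```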
Also, your code above doesn't call the difflib.get_close_matches function properly. You can run help(difflib.get_close_matches) to see how you are meant to call that function. You need to provide a second argument that lists the items against which the first argument is compared for possible matches.
All of that said, I have a feeling that you can accomplish your fundamental goals much more simply. If you spend a few minutes describing what you're really trying to do (this shouldn't involve any references to data types, it should just involve a description of your data and your goals), then I bet someone on this site can help you solve that problem much more simply!
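As one guess at the underlying goal (an assumption, since the question does not spell it out), if every label is a device name followed by "on" or "off", the pairs can be collected without fuzzy matching by stripping the suffix and grouping on what is left. The labels and values below are made up:

```python
# Made-up labels and values; the question generates these randomly.
output = {"Light1on": 0, "Kettle1off": 1, "Fan1on": 2, "Light1off": 3, "Fan1off": 4}

pairs = {}  # device name -> {"on": value, "off": value}
for label, value in output.items():
    for state in ("on", "off"):
        if label.endswith(state):
            device = label[: -len(state)]  # e.g. "Light1on" -> "Light1"
            pairs.setdefault(device, {})[state] = value
            break

print(pairs)
```

Nothing is deleted while iterating, so the RuntimeError does not come up. If a table is wanted, a nested dictionary shaped like this can be passed to pandas.DataFrame.from_dict(pairs, orient="index"), which gives one row per device.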

Running a loop where each iteration returns a dict. How best to combine them into one?

I've found several SO posts on similar questions but I'm maybe overthinking my problem.
I'm running a loop. Each iteration returns a dict with the same keys and their own values. I'd like to combine them into a new master dict.
On each loop iteration I can save the results to a list
store_response = []  # will store the results of each iteration here
myloop:
    code here...
    store_response.append(iresponse.copy())
Or I can do:
store_response = {}  # will store the results of each iteration here
myloop:
    code here...
    store_response[page_token] = iresponse  # store this iteration and call it whatever string page_token currently is
So I can return either a list of dicts or dict of dicts.
My question is, how can I combine them into just one dict?
Tried several for loops but keep hitting errors e.g.:
for d in store_response:
    for key, value in d.iteritems():
        test[key].append(value)
Traceback (most recent call last):
  File "<input>", line 2, in <module>
AttributeError: 'dict' object has no attribute 'iteritems'
Here is how the variable looks in PyCharm's variables pane, currently in list form, but I could make it a dict:
How can I take each dict within store response and create a single master dict?
You could try a pattern like:
from collections import defaultdict
store_response = defaultdict(list)
for _ in loop:
    # Assuming the loop provides the key and value
    store_response[key].append(value)
This results in a single dict in which each key maps to a list of all the values collected for that key. (In your use case the dictionaries only have one key, but this solution works for arbitrarily many keys, like 'reports'.)
You are using Python 3, and in Python 3 iteritems has been removed; use items instead.
test = {}  # the master dict must exist before the loop
for d in store_response:
    for key, value in d.items():
        test.setdefault(key, [])
        test[key].append(value)
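Putting the two answers together, a self-contained sketch could look like the following. The contents of store_response are invented stand-ins for the real per-iteration results:

```python
from collections import defaultdict

# Invented stand-in for the list of per-iteration dicts.
store_response = [
    {"reports": [1, 2], "token": "a"},
    {"reports": [3], "token": "b"},
]

master = defaultdict(list)
for response in store_response:
    for key, value in response.items():
        # append keeps one entry per iteration; use extend instead if the
        # values are lists that should be flattened into one long list.
        master[key].append(value)

master = dict(master)  # optional: convert back to a plain dict
print(master)
```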

How to create a nested python dictionary with keys as strings?

Summary of issue: I'm trying to create a nested Python dictionary, with keys defined by pre-defined variables and strings. And I'm populating the dictionary from regular expressions outputs. This mostly works. But I'm getting an error because the nested dictionary - not the main one - doesn't like having the key set to a string, it wants an integer. This is confusing me. So I'd like to ask you guys how I can get a nested python dictionary with string keys.
Below I'll walk you through the steps of what I've done. What is working, and what isn't. Starting from the top:
# Regular expressions module
import re
# Read text data from a file
file = open("dt.cc", "r")
dtcc = file.read()
# Create a list of stations from regular expression matches
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))
The result is good, and is something like this:
stations = ['AAAA','BBBB','CCCC','DDDD']
# Initialize a new dictionary
rows = {}
# Loop over each station in the station list, and start populating
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
The result is good, and is something like this:
rows['AAAA'] = ['AAAA 0.1132 0.32 P',...]
However, when I try to create a sub-dictionary with a string key:
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
    rows[station]["dt"] = re.findall("%s\s(\S+)" %station, dtcc)
I get the following error.
"TypeError: list indices must be integers, not str"
It doesn't seem to like that I'm specifying the second dictionary key as "dt". If I give it a number instead, it works just fine. But then my dictionary key name is a number, which isn't very descriptive.
Any thoughts on how to get this working?
The issue is that by doing
rows[station] = re.findall(...)
you are creating a dictionary with the station names as keys and the return values of re.findall as values, and those return values are lists. So when you then write
rows[station]["dt"] = re.findall(...)
rows[station] on the left-hand side is a list, and lists are indexed by integers, which is what the TypeError is complaining about. You could do rows[station][0], for example, to get the first match from the regex. You said you want a nested dictionary, though, so inside the loop you could do
rows[station] = dict()
rows[station]["dt"] = re.findall(...)
To make it a bit nicer, you could use a defaultdict from the collections module instead.
A defaultdict is a dictionary that takes a factory for its default values as its argument. For example, dictlist = defaultdict(list) defines a dictionary whose values are lists. Doing dictlist[key].append(item1) straight away is then legal, because the empty list is created automatically the first time a missing key is accessed.
In your case you could do
from collections import defaultdict
rows = defaultdict(dict)
for station in stations:
    # raw strings keep \s and \S as regex escapes rather than string escapes
    rows[station]["bulk"] = re.findall(r"%s\s(.+)" % station, dtcc)
    rows[station]["dt"] = re.findall(r"%s\s(\S+)" % station, dtcc)
Here you have to assign the first regex result to a new key, "bulk" in this case, but you can call it whatever you like. Hope this helps.
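To see the whole thing run end to end, here is a sketch with a made-up dtcc string in roughly the shape the question describes. The sample data and the use of re.escape are additions for illustration, not part of the original code:

```python
import re
from collections import defaultdict

# Made-up sample text standing in for the contents of dt.cc.
dtcc = "\nAAAA 0.1132 0.32 P\nBBBB 0.2051 0.10 S\nAAAA 0.0931 0.55 S\n"

stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))

rows = defaultdict(dict)  # each missing station starts as an empty dict
for station in stations:
    name = re.escape(station)  # guards against regex metacharacters in names
    rows[station]["bulk"] = re.findall(rf"{name}\s(.+)", dtcc)
    rows[station]["dt"] = re.findall(rf"{name}\s(\S+)", dtcc)

print(rows["AAAA"]["dt"])
```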

Python: My dictionary is not giving individual value output

I have written code that uses xlrd to import data into two dictionaries in Python.
Code:
import xlrd
#category.clear()
#term.clear()
book = xlrd.open_workbook("C:\Users\Koen\Google Drive\etc...etc..")
sheet = book.sheet_by_index(0)
num_rows = sheet.nrows
for i in range(1,num_rows,1):
    category = {i:( sheet.cell_value(i, 0))}
    term = {i:( sheet.cell_value(i, 1))}
When I print from one of the two dictionaries (category or term), it presents me with a list of values.
print(category[i])
So far, so good.
However, when I try to open an individual value with
print(category["2"])
it consistently gives me an error:
Traceback (most recent call last):
  File "testfile", line 15, in <module>
    print(category["2"])
KeyError: '2'
The keys are indeed numbered (as determined by i).
I've already tried [], {}, "", '', and so on. Nothing works.
As I need those values later on in the code, I would like to know what the cause of the key-error is.
Thanks in advance for taking a look!
First off, you are reassigning category and term on every iteration of the for loop, so each dictionary only ever holds one key and ends up with just the last index. If the sheet has 100 rows, the dict will only contain the key 99. To fix this, define the dictionaries outside the loop and assign the keys inside it, like this:
category = {}
term = {}
for i in range(1, num_rows, 1):
    category[i] = (sheet.cell_value(i, 0))
    term[i] = (sheet.cell_value(i, 1))
Second, because the keys come from for i in range(1, num_rows, 1), they are integers, so you have to access the dictionary with an integer key, like category[1]. To use string keys, you need to convert them when you store them, with category[str(i)] for example.
I hope this clarifies the problem.
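The difference between the integer key 2 and the string key "2" can be seen in a small sketch. The cells list is a made-up stand-in for the values that sheet.cell_value(i, 0) would return:

```python
# Made-up values standing in for sheet.cell_value(i, 0); index 0 is the header row.
cells = ["header", "fruit", "vegetable", "dairy"]

category = {}  # created once, outside the loop
for i in range(1, len(cells)):
    category[i] = cells[i]  # i is an int, so the keys are ints

print(category[2])         # integer key: found
print("2" in category)     # the string "2" is a different key
print(category[int("2")])  # convert a string to int before looking it up
```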

Error while iterating list to put those into a key value pairs in python

I am new to Python. I have a Python list, old_ecim_mims_list, like the one below:
['ReqSyncPort_v2_5_0', 'ECIM_SwM_v2_1_0_2_2', 'ECIM_SwM_v3_0_0_2_3', 'ResPowerDistribution_v1_0_0', 'ECIM_SwM_v4_2_0_3_2', 'ResPowerDistribution_v3_4_1', 'LratBb_v1_8025_0']
My requirement is to iterate over it and put it into a map with the key-value structure below:
ReqSyncPort=ReqSyncPort_v2_5_0
ECIM_SwM=ECIM_SwM_v2_1_0_2_2,ECIM_SwM_v3_0_0_2_3,ECIM_SwM_v4_2_0_3_2
ResPowerDistribution=ResPowerDistribution_v1_0_0,ResPowerDistribution_v3_4_1
LratBb=LratBb_v1_8025_0
I have written a sample program for this, but I get an error when executing it:
old_ecim_mims_map={} ;
for index , item in enumerate(old_ecim_mims_list) :
    print(index , item ) ;
    split_str=item.split("_v");
    #print(split_str[0]);
    if split_str[0] in old_ecim_mims_map :
        new_prop_map[split_str[0]].append(item);
        #old_ecim_mims_map.update({split_str[0]:item }) ;
    else :
        old_ecim_mims_map[split_str[0]]=item ;
Error:
Traceback (most recent call last):
  File "F:/DeltaProject/com/dash/abinash/DeltaOperation/Createdelta.py", line 50, in <module>
    new_prop_map[split_str[0]].append(item);
AttributeError: 'str' object has no attribute 'append'
Please tell me where I am going wrong. I have searched a lot of concepts, but they did not help me much. Any help will be appreciated.
Your code fails because the last line (old_ecim_mims_map[split_str[0]]=item) stores a plain string as the value in the dictionary (map) instead of wrapping it in [] to make a list. The next time you come across the same key, you try to append to a string, not to a list.
What you have to do (and managed to do) is first check whether a certain key is already in the map. If it is, you can append to the list old_ecim_mims_map[key]. Alternatively, as in the code below, just try to append: if there is no such key, a KeyError is raised, and you then create a new list with el inside it.
old_ecim_mims_list = ['ReqSyncPort_v2_5_0', 'ECIM_SwM_v2_1_0_2_2', 'ECIM_SwM_v3_0_0_2_3', 'ResPowerDistribution_v1_0_0', 'ECIM_SwM_v4_2_0_3_2', 'ResPowerDistribution_v3_4_1', 'LratBb_v1_8025_0']
old_ecim_mims_map = {}
for el in old_ecim_mims_list:
    key, _ = el.split('_v')
    try:
        old_ecim_mims_map[key].append(el)
    except KeyError:
        old_ecim_mims_map[key] = [el]
This version is much cleaner. If you would rather keep your own code, just change the last line to
old_ecim_mims_map[split_str[0]]=[item]
Edit: As suggested in the comments, although I do not prefer this, it can be done by checking whether key is in map:
old_ecim_mims_list = ['ReqSyncPort_v2_5_0', 'ECIM_SwM_v2_1_0_2_2', 'ECIM_SwM_v3_0_0_2_3', 'ResPowerDistribution_v1_0_0', 'ECIM_SwM_v4_2_0_3_2', 'ResPowerDistribution_v3_4_1', 'LratBb_v1_8025_0']
old_ecim_mims_map = {}
for el in old_ecim_mims_list:
    key, _ = el.split('_v')
    if key in old_ecim_mims_map: # The same as if key in old_ecim_mims_map.keys()
        old_ecim_mims_map[key].append(el)
    else:
        old_ecim_mims_map[key] = [el]
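A third variant, not part of the original answer, is collections.defaultdict, which creates the empty list on first access so neither the try/except nor the membership check is needed. This sketch also uses split('_v', 1) so that a name containing '_v' more than once would still split into exactly two parts:

```python
from collections import defaultdict

old_ecim_mims_list = ['ReqSyncPort_v2_5_0', 'ECIM_SwM_v2_1_0_2_2', 'ECIM_SwM_v3_0_0_2_3',
                      'ResPowerDistribution_v1_0_0', 'ECIM_SwM_v4_2_0_3_2',
                      'ResPowerDistribution_v3_4_1', 'LratBb_v1_8025_0']

old_ecim_mims_map = defaultdict(list)
for el in old_ecim_mims_list:
    key = el.split('_v', 1)[0]  # text before the first "_v"
    old_ecim_mims_map[key].append(el)

# Print in the key=value1,value2 shape shown in the question.
for key, values in old_ecim_mims_map.items():
    print(f"{key}={','.join(values)}")
```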
