Editing dictionary key names based on a specific value - python

I have a dictionary represented as a string in Python, and I am looking to edit some key names based on a particular value. Here's an example of the dictionary in string format:
s = '{"some.info": "ABC","more.info": "DEF","device.0.Id":"12345678", "device.0.Type":"DEVICE-X", ' \
'"device.0.Status":"ACTIVE", "device.1.Id":"123EFEF8", "device.1.Type":"DEVICE-Y", "device.1.Status":"NOT FOUND", ' \
'"device.2.Id":"ABCD4328", "device.2.Type":"DEVICE-Z", "device.2.Status":"SLEEPING", "other.info":"Hello", ' \
'"additional.info":"Hi Again",}'
I have a working method below, which converts the string into a dictionary, scans for keys containing '.Type', and collects a list of tuples holding the key section to replace and what to replace it with. However, the whole process seems inefficient. Is there a better way to do this?
I have key value pairs of interest in my dictionary like this:
'device.0.Type':'DEVICE-X'
'device.1.Type':'DEVICE-Y'
'device.2.Type':'DEVICE-Z'
What I am looking to do is change all Key name instances of device.X to the value given for key 'device.X.Type'.
For example:
'device.0.Id':'12345678', 'device.0.Type':'DEVICE-X', 'device.0.Status':'ACTIVE',
'device.1.Id':'123EFEF8', 'device.1.Type':'DEVICE-Y', 'device.1.Status':'NOT FOUND', etc
would become:
'DEVICE-X.Id':'12345678', 'DEVICE-X.Type':'DEVICE-X', 'DEVICE-X.Status':'ACTIVE',
'DEVICE-Y.Id':'123EFEF8', 'DEVICE-Y.Type':'DEVICE-Y', 'DEVICE-Y.Status':'NOT FOUND', etc
Basically I am looking to remove the ambiguity of 'device.X' with something that's easier to read based on the device type
Here's my longwinded version:
s = '{"some.info": "ABC","more.info": "DEF","device.0.Id":"12345678", "device.0.Type":"DEVICE-X", ' \
'"device.0.Status":"ACTIVE", "device.1.Id":"123EFEF8", "device.1.Type":"DEVICE-Y", "device.1.Status":"NOT FOUND", ' \
'"device.2.Id":"ABCD4328", "device.2.Type":"DEVICE-Z", "device.2.Status":"SLEEPING", "other.info":"Hello", ' \
'"additional.info":"Hi Again",}'
d = eval(s)
devs = []
for k, v in d.items():
    if '.Type' in k:
        devs.append((k.split('.Type')[0], v))
for item in devs:
    if item[0] in s:
        s = s.replace(item[0], item[1])
s = eval(s)
print(s)

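You can solve this by loading the data as JSON, then iterating over it (note that the trailing comma before the closing brace has been removed from the string, since JSON does not allow it):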
import json
s = '{"some.info": "ABC","more.info": "DEF","device.0.Id":"12345678", "device.0.Type":"DEVICE-X", "device.0.Status":"ACTIVE", "device.1.Id":"123EFEF8", "device.1.Type":"DEVICE-Y", "device.1.Status":"NOT FOUND", "device.2.Id":"ABCD4328", "device.2.Type":"DEVICE-Z", "device.2.Status":"SLEEPING", "other.info":"Hello", "additional.info":"Hi Again"}'
# load the string to a dictionary
devices_data = json.loads(s)
device_names = {}
for key, value in devices_data.items():
    if key.endswith("Type"):
        # if the key looks like a device type, store the value
        device_names[key.rpartition(".")[0]] = value
renamed_device_data = {}
for key, value in devices_data.items():
    x = key.rpartition(".")  # split the key apart
    if x[0] in device_names:  # check if the first part matches a device name
        renamed_device_data[f"{device_names[x[0]]}.{x[2]}"] = value  # add the new key to the renamed dictionary with the value
    else:
        renamed_device_data[key] = value  # for non-matches, put them in as is
This could certainly be optimised, but it should work at least!
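One possible tightening, offered only as a sketch: build the prefix-to-type lookup with a dict comprehension, then rebuild every key in a single pass. The function name rename_device_keys and the shortened sample string are illustrative assumptions, not part of the original post.

```python
import json

# Shortened sample input (an assumption for illustration).
s = ('{"some.info": "ABC", "device.0.Id": "12345678", '
     '"device.0.Type": "DEVICE-X", "device.0.Status": "ACTIVE", '
     '"device.1.Id": "123EFEF8", "device.1.Type": "DEVICE-Y", '
     '"other.info": "Hello"}')


def rename_device_keys(data):
    """Hypothetical helper: swap each 'device.N' prefix for its '.Type' value."""
    suffix = ".Type"
    # Map each prefix such as "device.0" to the value stored under "<prefix>.Type".
    names = {k[:-len(suffix)]: v for k, v in data.items() if k.endswith(suffix)}
    renamed = {}
    for key, value in data.items():
        prefix, dot, field = key.rpartition(".")
        # Unknown prefixes fall back to themselves, so other keys pass through unchanged.
        renamed[names.get(prefix, prefix) + dot + field] = value
    return renamed


renamed = rename_device_keys(json.loads(s))
print(renamed)
```

Unlike replacing text in the raw string, this only touches keys, so a value that happens to contain "device.0" is left alone.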

Related

"dictionary changed size during iteration" while updating a list of dictionaries

I have a list of dictionaries, which looks like this:
car_list = [
{'Toyota': '{name}/Toyota'},
{'Mazda': '{name}/Mazda'},
{'Nissan': '{name}/Nissan'}
]
Now, using a regex, I want to replace all {name}s with another string (say "car"), and update the list of dictionaries. This is the code:
regex = r'\{.+?\}'
for dic in car_list:
    for key, value in dic.items():
        for name in re.findall(regex, value):
            value = value.replace(name, "car")
        dic.update(key=value)
I know for a fact that the regex part is working. However, I get this error:
RuntimeError: dictionary changed size during iteration
What am I doing wrong?
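.update(key=value) inserts a new key into the dictionary, where the key is the string literal 'key' and the value is value (as assigned in the line above). Adding that new key while you are looping over dic.items() changes the dictionary's size, which is what raises the error.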
You should use brackets to index into the dictionary, rather than calling .update():
for dic in car_list:
    for key, value in dic.items():
        for name in re.findall(regex, value):
            value = value.replace(name, "car")
        dic[key] = value
# Prints [{'Toyota': 'car/Toyota'}, {'Mazda': 'car/Mazda'}, {'Nissan': 'car/Nissan'}]
print(car_list)
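A small check of the difference between the two calls, as a sketch on a made-up one-entry dictionary (the name d is just for illustration):

```python
d = {"Toyota": "{name}/Toyota"}

# The keyword argument's name becomes the literal string key "key".
d.update(key="car/Toyota")
print(d)  # {'Toyota': '{name}/Toyota', 'key': 'car/Toyota'}

# Indexing with the variable assigns to the existing key instead.
key = "Toyota"
d[key] = "car/Toyota"
print(d)  # {'Toyota': 'car/Toyota', 'key': 'car/Toyota'}
```

Assigning to an existing key does not change the dictionary's size, so doing it inside a loop over dic.items() is fine.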
There's no need to use a regex - str has two built-in methods, format and format_map, to replace fields marked with curly brackets, like your sample code:
msg = "Hello, {location}!"
print(msg.format(location="World"))
msg2 = "{greeting}, {place}!"
params = {"greeting": "Bonjour", "place": "Birmingham"}
print(msg2.format_map(params))
Using this:
for dct in car_list:
    for key, value in dct.items():
        dct[key] = value.format(name="car")

How to sort a Python dictionary by a substring contained in the keys, according to the order set in a list?

I'm very new to Python and I'm stuck on a task. First I read a file containing a number of fasta sequences into a dictionary keyed by sequence name, then managed to select only those I want, based on substrings included in the keys, which are defined in the list "flu_genes".
Now I'm trying to reorder the items in this dictionary based on the order of substrings defined in the list "flu_genes". I'm completely stuck; I found a way of reordering based on the key order in a list BUT it is not my case, as the order is defined not by the keys but by a substring within the keys.
I should also add that in this case the substring is at the end, with the format "_GENE"; however, it could be in the middle of the string with the same format, perhaps "GENE", so I'd rather not rely on code that looks for the substring at the end of the string.
I hope this is clear enough and thanks in advance for any help!
"full_genome.fasta"
>A/influenza/1/1_NA
atgcg
>A/influenza/1/1_NP
ctgat
>A/influenza/1/1_FluB
agcta
>A/influenza/1/1_HA
tgcat
>A/influenza/1/1_FluC
agagt
>A/influenza/1/1_M
tatag
consensus = {}
flu_genes = ['_HA', '_NP', '_NA', '_M']
with open("full_genome.fasta", 'r') as myseq:
    for line in myseq:
        line = line.rstrip()
        if line.startswith('>'):
            key = line[1:]
        else:
            if key in consensus:
                consensus[key] += line
            else:
                consensus[key] = line
flu_fas = {key : val for key, val in consensus.items() if any(ele in key for ele in flu_genes)}
print("Dictionary after removal of keys : " + str(flu_fas))
>>>Dictionary after removal of keys : {'>A/influenza/1/1_NA': 'atgcg', '>A/influenza/1/1_NP': 'ctgat', '>A/influenza/1/1_HA': 'tgcat', '>A/influenza/1/1_M': 'tatag'}
#reordering by keys order (not going to work!) as in: https://try2explore.com/questions/12586065
reordered_dict = {k: flu_fas[k] for k in flu_genes}
Dictionaries were traditionally unordered, but since Python 3.7 a dict is guaranteed to remember its insertion order (in CPython 3.6 this was an implementation detail), and you're not going to change anything later, so you can do what you're doing.
The problem is, of course, that you're not working with the actual keys. So let's just set up a list of the keys, and sort that according to your criteria. Then you can do the other thing you did, except using the actual keys.
flu_genes = ['_HA', '_NP', '_NA', '_M']
def get_gene_index(k):
    for index, gene in enumerate(flu_genes):
        if k.endswith(gene):
            return index
    raise ValueError('I thought you removed those already')
reordered_keys = sorted(flu_fas.keys(), key=get_gene_index)
reordered_dict = {k: flu_fas[k] for k in reordered_keys}
for k, v in reordered_dict.items():
    print(k, v)
A/influenza/1/1_HA tgcat
A/influenza/1/1_NP ctgat
A/influenza/1/1_NA atgcg
A/influenza/1/1_M tatag
Normally I wouldn't scan the whole gene list for every key, but I'm assuming the number of lines in the data file is much larger than the number of flu_genes, which makes that scan essentially a fixed constant.
This may or may not be the best data structure for your application, but I'll leave that to code review.
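Since the question says the gene tag may not always sit at the end of the name, here is a variant sketch that matches the tag anywhere in the key using the in operator. The helper name gene_rank and the hard-coded flu_fas sample are assumptions for illustration; note that substring matching can be ambiguous if one tag also appears inside another part of the name.

```python
flu_genes = ['_HA', '_NP', '_NA', '_M']

# Sample of the filtered dictionary, hard-coded so the sketch runs on its own.
flu_fas = {
    'A/influenza/1/1_NA': 'atgcg',
    'A/influenza/1/1_NP': 'ctgat',
    'A/influenza/1/1_HA': 'tgcat',
    'A/influenza/1/1_M': 'tatag',
}


def gene_rank(key):
    """Hypothetical helper: position of the first gene tag found anywhere in key."""
    for index, gene in enumerate(flu_genes):
        if gene in key:
            return index
    # Keys without a known tag sort last instead of raising.
    return len(flu_genes)


# sorted() is stable, and dict() keeps the resulting insertion order.
reordered_dict = dict(sorted(flu_fas.items(), key=lambda item: gene_rank(item[0])))
for k, v in reordered_dict.items():
    print(k, v)
```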
It's because you are trying to reorder it with non-existent dictionary keys. Your keys are
['>A/influenza/1/1_NA', '>A/influenza/1/1_NP', '>A/influenza/1/1_HA', '>A/influenza/1/1_M']
which doesn't match the list
['_HA', '_NP', '_NA', '_M']
You first need to transform them so that they match. Since we know the pattern (the gene is at the end of the string, starting with an underscore), we can split at underscores and take the last part.
consensus = {}
flu_genes = ['_HA', '_NP', '_NA', '_M']
with open("full_genome.fasta", 'r') as myseq:
    for line in myseq:
        line = line.rstrip()
        if line.startswith('>'):
            sequence = line
            gene = line.split('_')[-1]
            key = f"_{gene}"
        else:
            consensus[key] = {
                'sequence': sequence,
                'data': line
            }
flu_fas = {key : val for key, val in consensus.items() if any(ele in key for ele in flu_genes)}
print("Dictionary after removal of keys : " + str(flu_fas))
reordered_dict = {k: flu_fas[k] for k in flu_genes}

How can I store this in a dictionary?

I must write a dictionary. This is my first time doing it and I can't wrap my head around it. The first 5 characters of each line should be the key and the rest the value.
for i in verseny:
    if i not in eredmeny:
        eredmeny[i] = 1
    else:
        eredmeny[i] += 1
YS869 CCCADCADBCBCCB this is a line from the hw. This YS869 should be the key and this CCCADCADBCBCCB should be the value.
The problem is that I can't store them in a dictionary. I'm grinding gears here but getting nowhere.
Assuming that eredmeny is your dictionary name and that verseny is the list that holds your key and value strings, this should do it:
verseny = ['YS869 CCCADCADBCBCCB', 'CS769 CCCADCADBCBCCB', 'BS869 CCCADCADBCBCCB']
eredmeny = {}
for i in verseny:
    key, value = i.split(' ', 1)
    if key not in eredmeny:
        eredmeny[key] = [value]  # store a list so a repeated key can be appended to
    else:
        eredmeny[key].append(value)
I'm not really understanding the question well, but an easy way to do the task at hand would be to take each line as a string and use split():
line = 'YS869 CCCADCADBCBCCB'
words = line.split()
d = {words[0]: words[1]}
print(d)
This is a basic Python skill, so I'd suggest working through some existing introductory material. For your example, here is a demonstration using slicing:
line = 'YS869 CCCADCADBCBCCB'
dictionary = {}
dictionary[line[:5]] = line[6:]  # first 5 characters are the key, the rest (after the space) is the value
print(dictionary) # {'YS869': 'CCCADCADBCBCCB'}
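For a whole list of such lines, one compact sketch is to split each line once and hand the pairs to dict(). The sample list below is assumed for illustration, and a repeated key would simply keep the last value seen.

```python
verseny = ['YS869 CCCADCADBCBCCB', 'CS769 CCCADCADBCBCCB', 'BS869 CCCADCADBCBCCB']

# split(maxsplit=1) yields [key, value]; dict() accepts an iterable of such pairs.
eredmeny = dict(line.split(maxsplit=1) for line in verseny)
print(eredmeny)
```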

Two dimensional dictionary with list as a value python

I'm writing a simple parser for exercise and I have a problem with saving downloaded data to a dictionary.
data = {"":{"":[]}}
with open("Training_01.txt", "r") as open_file:
    text = open_file.read()
text = text.split("\n")
for i in text:
    i = i.split("/")
    try:
        data[i[1]] = {i[2]:[].append(i[3])}
    except:
        print("Can't")
This is an example of the data that I want to parse:
/a/abbey/sun_aobrvxdhumowzajn.jpg
/a/abbey/sun_apstfzmbeiwbjqvb.jpg
/a/abbey/sun_apyilcssuybumhbu.jpg
/a/abbey/sun_arrohcvipmrghrzh.jpg
/a/abbey/sun_asgeghboyugsatii.jpg
/a/airplane_cabin/sun_blczihbhbntqccux.jpg
/a/airplane_cabin/sun_ayzaayjpoknjvpds.jpg
/a/airplane_cabin/sun_afuoinkozbbhqksk.jpg
/b/butte/sun_asfnwmuzhtjrztns.jpg
/b/butte/sun_ajzkngginlffsozz.jpg
/b/butte/sun_adonkmfgywrhpakt.jpg
/c/cabin/outdoor/sun_atqvmarllxqynnks.jpg
/c/cabin/outdoor/sun_acfcobswmnoyhyfi.jpg
/c/cabin/outdoor/sun_afgjdqosvakljsmc.jpg
I want to create a dictionary with "a", "b", "c" or any other letter as a key (I can't hard-code it), with a dictionary as the value that maps the place where the images were taken to a list of images.
But when I read my saved data, I get None as a value:
print(data["a"])
Output: {'auto_factory': None}
Try to use defaultdict from python stdlib. It's very convenient in situations like this:
from collections import defaultdict
data = defaultdict(lambda: defaultdict(list))
with open("Training_01.txt", "r") as open_file:
    text = open_file.read()
text = text.split("\n")
for line in text:
    try:
        _, key, subkey, rem = line.split("/", 3)
        data[key][subkey].append(rem)
    except ValueError:
        # raised by the unpacking when a line has too few "/" separated parts
        print("Can't")
print(data)
Explanation: the first time you access data (which is a dictionary) with a key that does not exist, a new entry for that key is created. This entry is itself a defaultdict, so the first time you access it with a key that does not exist, a new (nested this time) entry is created, and that entry is a list. You can then safely append a new element to it.
UPD: Here is a way to implement the same requirement but without defaultdict:
data = {} # just a plain dict
# for ...:
data[key] = data.get(key, {}) # try to access the key, if it doesn't exist - create a new dict entry for such a key
data[key][subkey] = data[key].get(subkey, []) # same as above but for the sub key
data[key][subkey].append(rem) # finally do the job
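Another plain-dict option is dict.setdefault, which returns the existing entry or inserts the given default and returns that. A minimal sketch, using a few of the sample paths from the question in place of the file:

```python
lines = [
    "/a/abbey/sun_aobrvxdhumowzajn.jpg",
    "/a/abbey/sun_apstfzmbeiwbjqvb.jpg",
    "/a/airplane_cabin/sun_blczihbhbntqccux.jpg",
    "/c/cabin/outdoor/sun_atqvmarllxqynnks.jpg",
]

data = {}  # just a plain dict
for line in lines:
    _, key, subkey, rem = line.split("/", 3)
    # Create the inner dict and the list only if they are missing, then append.
    data.setdefault(key, {}).setdefault(subkey, []).append(rem)

print(data["a"])
```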
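Because data[i[1]] = {i[2]:[].append(i[3])} creates a new second-layer dictionary every time, overwriting what was stored before, and list.append returns None, so None is what ends up as the value.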
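The None in the output can be reproduced in isolation. In this short sketch (the names images, result and broken are illustrative), list.append changes the list in place and returns None:

```python
images = []
result = images.append("sun_a.jpg")  # append mutates the list in place
print(result)  # None
print(images)  # ['sun_a.jpg']

# Using the return value of append as a dict value therefore stores None.
broken = {"abbey": [].append("sun_a.jpg")}
print(broken)  # {'abbey': None}
```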
This is a possible solution. It is not the cleanest solution, but it shows each step. It creates a new dict and list if the key is not yet present at that layer, and appends the value to the existing list if it is.
data = {"":{"":[]}}
with open("Training_01.txt", "r") as open_file:
    text = open_file.read()
text = text.split("\n")
for i in text:
    i = i.split("/")
    try:
        # kept inside the try so that empty or short lines are reported instead of crashing
        key_1 = i[1]
        key_2 = i[2]
        value = i[3]
        if key_1 in data.keys():  # Whether the key i[1] is in the 1st layer of the dict
            if key_2 in data[key_1].keys():  # Whether the key i[2] is in the 2nd layer of the dict
                # Yes, append to the list
                data[key_1][key_2].append(value)
            else:
                # No, create a new list
                data[key_1][key_2] = [value]
        # if i[1] is not in the 1st layer, create a 2nd layer dict with i[2] as key and [i[3]] as value
        else:
            data[key_1] = {key_2: [value]}
    except IndexError:
        print("Can't")
print(data['a'])

why does this only write the name to the file and not scoreavg2 as well

data = {name:[scoreavg2]}
for k,v in data.items():
    f.write(k + ": " + str(v))
When I run this code and enter the data, it only writes the variable name but not the score average as well.
Let's create some sample data:
>>> data = {'dog':80, 'cat':90}
Now, let's write it to file in a useful format:
>>> filehandle.writelines('%s: %s\n' % item for item in data.items())
The file will contain:
dog: 80
cat: 90
How it works
We have a dictionary with several items and we want to print them in some useful form. We do this in two steps.
First, to get the items in a dictionary, we use data.items(). This returns key, value pairs.
Second, for each key, value pair, we need to format it. Here I chose the key followed by a colon and a space, followed by the value and a newline: '%s: %s\n' % item. Putting it all together looks like:
'%s: %s\n' % item for item in data.items()
This is what we write to file.
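To try this without touching the disk, here is a runnable sketch in which io.StringIO stands in for the open file (that substitution is an assumption for the demo; a real file object opened for writing behaves the same way with writelines):

```python
import io

data = {'dog': 80, 'cat': 90}

filehandle = io.StringIO()  # stand-in for an open text file
# Each (key, value) tuple fills the two %s slots of the format string.
filehandle.writelines('%s: %s\n' % item for item in data.items())

written = filehandle.getvalue()
print(written, end='')
```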
Iterating over a dictionary, by definition, iterates over the keys only. It doesn't matter if you do it explicitly (for key in data:) or implicitly (list(data) or f.writelines(data)).
If you want to iterate over both keys and values, use .iteritems() (Python 2) or .items() (Python 3):
data = {name:[scoreavg2]}
for k,v in data.items():
    f.write("{}: {}\n".format(k, v))
