How to create a nested python dictionary with keys as strings? - python

Summary of issue: I'm trying to create a nested Python dictionary, with keys defined by pre-defined variables and strings. And I'm populating the dictionary from regular expressions outputs. This mostly works. But I'm getting an error because the nested dictionary - not the main one - doesn't like having the key set to a string, it wants an integer. This is confusing me. So I'd like to ask you guys how I can get a nested python dictionary with string keys.
Below I'll walk you through the steps of what I've done. What is working, and what isn't. Starting from the top:
# Regular expressions module
import re
# Read text data from a file
file = open("dt.cc", "r")
dtcc = file.read()
# Create a list of stations from regular expression matches
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))
The result is good, and is as something like this:
stations = ['AAAA','BBBB','CCCC','DDDD']
# Initialize a new dictionary
rows = {}
# Loop over each station in the station list, and start populating
for station in stations:
rows[station] = re.findall("%s\s(.+)" %station, dtcc)
The result is good, and is something like this:
rows['AAAA'] = ['AAAA 0.1132 0.32 P',...]
However, when I try to create a sub-dictionary with a string key:
for station in stations:
rows[station] = re.findall("%s\s(.+)" %station, dtcc)
rows[station]["dt"] = re.findall("%s\s(\S+)" %station, dtcc)
I get the following error.
"TypeError: list indices must be integers, not str"
It doesn't seem to like that I'm specifying the second dictionary key as "dt". If I give it a number instead, it works just fine. But then my dictionary key name is a number, which isn't very descriptive.
Any thoughts on how to get this working?

The issue is that by doing
rows[station] = re.findall(...)
You are creating a dictionary with the station names as keys and the return value of re.findall method as values, which happen to be lists. So by calling them again by
rows[station]["dt"] = re.findall(...)
on the LHS row[station] is a list that is indexed by integers, which is what the TypeError is complaining about. You could do rows[station][0] for example, you would get the first match from the regex. You said you want a nested dictionary. You could do
rows[station] = dict()
rows[station]["dt"] = re.findall(...)
To make it a bit nicer, a data structure that you could use instead is a defaultdict from the collections module.
The defaultdict is a dictionary that accepts a default type as a type for its values. You enter the type constructor as its argument. For example dictlist = defaultdict(list) defines a dictionary that has as values lists! Then immediately doing dictlist[key].append(item1) is legal as the list is automatically created when setting the key.
In your case you could do
from collections import defaultdict
rows = defaultdict(dict)
for station in stations:
rows[station]["bulk"] = re.findall("%s\s(.+)" %station, dtcc)
rows[station]["dt"] = re.findall("%s\s(\S+)" %station, dtcc)
Where you have to assign the first regex result to a new key, "bulk" here but you can call it whatever you like. Hope this helps.

Related

Dynamically building a dictionary based on variables

I am trying to build a dictionary based on a larger input of text. From this input, I will create nested dictionaries which will need to be updated as the program runs. The structure ideally looks like this:
nodes = {}
node_name: {
inc_name: inc_capacity,
inc_name: inc_capacity,
inc_name: inc_capacity,
}
Because of the nature of this input, I would like to use variables to dynamically create dictionary keys (or access them if they already exist). But I get KeyError if the key doesn't already exist. I assume I could do a try/except, but was wondering if there was a 'cleaner' way to do this in python. The next best solution I found is illustrated below:
test_dict = {}
inc_color = 'light blue'
inc_cap = 2
test_dict[f'{inc_color}'] = inc_cap
# test_dict returns >>> {'light blue': 2}
Try this code, for Large Scale input. For example file input
Lemme give you an example for what I am aiming for, and I think, this what you want.
File.txt
Person1: 115.5
Person2: 128.87
Person3: 827.43
Person4:'18.9
Numerical Validation Function
def is_number(a):
try:
float (a)
except ValueError:
return False
else:
return True
Code for dictionary File.txt
adict = {}
with open("File.txt") as data:
adict = {line[:line.index(':')]: line[line.index(':')+1: ].strip(' \n') for line in data.readlines() if is_number(line[line.index(':')+1: ].strip('\n')) == True}
print(adict)
Output
{'Person1': '115.5', 'Person2': '128.87', 'Person3': '827.43'}
For more explanation, please follow this issue solution How to fix the errors in my code for making a dictionary from a file
As already mentioned in the comments sections, you can use setdefault.
Here's how I will implement it.
Assume I want to add values to dict : node_name and I have the keys and values in two lists. Keys are in inc_names and values are in inc_ccity. Then I will use the below code to load them. Note that inc_name2 key exists twice in the key list. So the second occurrence of it will be ignored from entry into the dictionary.
node_name = {}
inc_names = ['inc_name1','inc_name2','inc_name3','inc_name2']
inc_ccity = ['inc_capacity1','inc_capacity2','inc_capacity3','inc_capacity4']
for i,names in enumerate(inc_names):
node = node_name.setdefault(names, inc_ccity[i])
if node != inc_ccity[i]:
print ('Key=',names,'already exists with value',node, '. New value=', inc_ccity[i], 'skipped')
print ('\nThe final list of values in the dict node_name are :')
print (node_name)
The output of this will be:
Key= inc_name2 already exists with value inc_capacity2 . New value= inc_capacity4 skipped
The final list of values in the dict node_name are :
{'inc_name1': 'inc_capacity1', 'inc_name2': 'inc_capacity2', 'inc_name3': 'inc_capacity3'}
This way you can add values into a dictionary using variables.

Thingspeak: Parse json response with Python

I would like to create an Alexa skill using Python to use data uploaded by sensors to Thingspeak. The cases where I only use one specific value is quite easy, the response from Thingspeak is the value only. When I want to use several values, in my case to sum up the athmospheric pressure to determine tendencies, teh response is a json object like this:
{"channel":{"id":293367,"name":"Weather Station","description":"My first attempt to build a weather station based on an ESP8266 and some common sensors.","latitude":"51.473509","longitude":"7.355569","field1":"humidity","field2":"pressure","field3":"lux","field4":"rssi","field5":"temp","field6":"uv","field7":"voltage","field8":"radiation","created_at":"2017-06-25T07:35:37Z","updated_at":"2018-08-04T12:11:22Z","elevation":"121","last_entry_id":1812},"feeds":
[{"created_at":"2018-10-21T18:11:45Z","entry_id":1713,"field2":"1025.62"},
{"created_at":"2018-10-21T18:12:05Z","entry_id":1714,"field2":"1025.58"},
{"created_at":"2018-10-21T18:12:25Z","entry_id":1715,"field2":"1025.56"},
{"created_at":"2018-10-21T18:12:45Z","entry_id":1716,"field2":"1025.65"},
{"created_at":"2018-10-21T18:13:05Z","entry_id":1717,"field2":"1025.58"},
{"created_at":"2018-10-21T18:13:25Z","entry_id":1718,"field2":"1025.63"}]
I now started with
f = urllib.urlopen(link) # Get your data
json_object = json.load(f)
for entry in json_object[0]
print entry["field2"]
The json object is a bit recursive, it is a list containing a list with an element with an array as the value.
Now I am not quite sure how to iterate over the values of the key "field2" in the array. I am quite new to Python and also json. Perhaps anyone can help me out?
Thanks in advance!
This has nothing to do with json - once the json string parsed by json.load(), what you get is a plain python object (usually a dict, sometimes a list, rarely - but this would be legal - a string, int, float, boolean or None).
it is a list containing a list with an element with an array as the value.
Actually it's a dict with two keys "channel" and "feeds". The first one has another dict for value, and the second a list of dicts. How to use dicts and lists is extensively documented FWIW
https://docs.python.org/3/tutorial/datastructures.html#dictionaries
https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
https://docs.python.org/3/tutorial/introduction.html#lists
https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
Here the values you're looking for are stored under the "field2" keys of the dicts in the "feeds" key, so what you want is:
# get the list stored under the "feeds" key
feeds = json_object["feeds"]
# iterate over the list:
for feed in feeds:
# get the value for the "field2" key
print feed["field2"]
You have a dictionary. Use key to access the value
Ex:
json_object = {"channel":{"id":293367,"name":"Weather Station","description":"My first attempt to build a weather station based on an ESP8266 and some common sensors.","latitude":"51.473509","longitude":"7.355569","field1":"humidity","field2":"pressure","field3":"lux","field4":"rssi","field5":"temp","field6":"uv","field7":"voltage","field8":"radiation","created_at":"2017-06-25T07:35:37Z","updated_at":"2018-08-04T12:11:22Z","elevation":"121","last_entry_id":1812},"feeds":
[{"created_at":"2018-10-21T18:11:45Z","entry_id":1713,"field2":"1025.62"},
{"created_at":"2018-10-21T18:12:05Z","entry_id":1714,"field2":"1025.58"},
{"created_at":"2018-10-21T18:12:25Z","entry_id":1715,"field2":"1025.56"},
{"created_at":"2018-10-21T18:12:45Z","entry_id":1716,"field2":"1025.65"},
{"created_at":"2018-10-21T18:13:05Z","entry_id":1717,"field2":"1025.58"},
{"created_at":"2018-10-21T18:13:25Z","entry_id":1718,"field2":"1025.63"}]}
for entry in json_object["feeds"]:
print entry["field2"]
Output:
1025.62
1025.58
1025.56
1025.65
1025.58
1025.63
I just figured it out, it was just like expected.
You have to get the entries array from the dict and than iterate over the list of items and print the value to the key field2.
# Get entries from the response
entries = json_object["feeds"]
# Iterate through each measurement and print value
for entry in entries:
print entry['field2']

Regular expressions matching words which contain the pattern but also the pattern plus something else

I have the following problem:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
I need to find which elements of list2 are in list1. In actual fact the elements of list1 correspond to a numerical value which I need to obtain then change. The problem is that 'xyz2' contains 'xyz' and therefore matches also with a regular expression.
My code so far (where 'data' is a python dictionary and 'specie_name_and_initial_values' is a list of lists where each sublist contains two elements, the first being specie name and the second being a numerical value that goes with it):
all_keys = list(data.keys())
for i in range(len(all_keys)):
if all_keys[i]!='Time':
#print all_keys[i]
pattern = re.compile(all_keys[i])
for j in range(len(specie_name_and_initial_values)):
print re.findall(pattern,specie_name_and_initial_values[j][0])
Variations of the regular expression I have tried include:
pattern = re.compile('^'+all_keys[i]+'$')
pattern = re.compile('^'+all_keys[i])
pattern = re.compile(all_keys[i]+'$')
And I've also tried using 'in' as a qualifier (i.e. within a for loop)
Any help would be greatly appreciated. Thanks
Ciaran
----------EDIT------------
To clarify. My current code is below. its used within a class/method like structure.
def calculate_relative_data_based_on_initial_values(self,copasi_file,xlsx_data_file,data_type='fold_change',time='seconds'):
copasi_tool = MineParamEstTools()
data=pandas.io.excel.read_excel(xlsx_data_file,header=0)
#uses custom class and method to get the list of lists from a file
specie_name_and_initial_values = copasi_tool.get_copasi_initial_values(copasi_file)
if time=='minutes':
data['Time']=data['Time']*60
elif time=='hour':
data['Time']=data['Time']*3600
elif time=='seconds':
print 'Time is already in seconds.'
else:
print 'Not a valid time unit'
all_keys = list(data.keys())
species=[]
for i in range(len(specie_name_and_initial_values)):
species.append(specie_name_and_initial_values[i][0])
for i in range(len(all_keys)):
for j in range(len(specie_name_and_initial_values)):
if all_keys[i] in species[j]:
print all_keys[i]
The table returned from pandas is accessed like a dictionary. I need to go to my data table, extract the headers (i.e. the all_keys bit), then look up the name of the header in the specie_name_and_initial_values variable and obtain the corresponding value (the second element within the specie_name_and_initial_value variable). After this, I multiply all values of my data table by the value obtained for each of the matched elements.
I'm most likely over complicating this. Do you have a better solution?
thanks
----------edit 2 ---------------
Okay, below are my variables
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
species = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
You don't need a regex to find common elements, set.intersection will find all elements in list2 that are also in list1:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
print(set(list2).intersection(list1))
set(['xyz'])
Also if you wanted to compare 'xyz' to 'xyz2' you would use == not in and then it would correctly return False.
You can also rewrite your own code a lot more succinctly, :
for key in data:
if key != 'Time':
pattern = re.compile(val)
for name, _ in specie_name_and_initial_values:
print re.findall(pattern, name)
Based on your edit you have somehow managed to turn lists into strings, one option is to strip the []:
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
specie_name_and_initial_values = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
specie_name_and_initial_values = set(s.strip("[]") for s in specie_name_and_initial_values)
print(all_keys.intersection(specie_name_and_initial_values))
Which outputs:
set([u'Cyp26_G_R1', u'Cyp26_G_rep1'])
FYI, if you had lists inside the set you would have gotten an error as lists are mutable so are not hashable.

Pandas Dataframe to Dictionary with Multiple Keys

I am currently working with a dataframe consisting of a column of 13 letter strings ('13mer') paired with ID codes ('Accession') as such:
However, I would like to create a dictionary in which the Accession codes are the keys with values being the 13mers associated with the accession so that it looks as follows:
{'JO2176': ['IGY....', 'QLG...', 'ESS...', ...],
'CYO21709': ['IGY...', 'TVL...',.............],
...}
Which I've accomplished using this code:
Accession_13mers = {}
for group in grouped:
Accession_13mers[group[0]] = []
for item in group[1].iteritems():
Accession_13mers[group[0]].append(item[1])
However, now I would like to go back through and iterate through the keys for each Accession code and run a function I've defined as find_match_position(reference_sequence, 13mer) which finds the 13mer in in a reference sequence and returns its position. I would then like to append the position as a value for the 13mer which will be the key.
If anyone has any ideas for how I can expedite this process that would be extremely helpful.
Thanks,
Justin
I would suggest creating a new dictionary, whose values are another dictionary. Essentially a nested dictionary.
position_nmers = {}
for key in H1_Access_13mers:
position_nmers[key] = {} # replicate key, val in new dictionary, as a dictionary
for value in H1_Access_13mers[key]:
position_nmers[key][value] = # do something
To introspect the dictionary and make sure it's okay:
print position_nmers
You can iterate over the groupby more cleanly by unpacking:
d = {}
for key, s in df.groupby('Accession')['13mer']:
d[key] = list(s)
This also makes it much clearer where you should put your function!
... However, I think that it might be better suited to an enumerate:
d2 = {}
for pos, val in enumerate(df['13mer']):
d2[val] = pos

Adding Multiple Values to a Single Key in Python Dictionary

Python dictionaries really have me today. I've been pouring over stack, trying to find a way to do a simple append of a new value to an existing key in a python dictionary adn I'm failing at every attempt and using the same syntaxes I see on here.
This is what i am trying to do:
#cursor seach a xls file
definitionQuery_Dict = {}
for row in arcpy.SearchCursor(xls):
# set some source paths from strings in the xls file
dataSourcePath = str(row.getValue("workspace_path")) + "\\" + str(row.getValue("dataSource"))
dataSource = row.getValue("dataSource")
# add items to dictionary. The keys are the dayasource table and the values will be definition (SQL) queries. First test is to see if a defintion query exists in the row and if it does, we want to add the key,value pair to a dictionary.
if row.getValue("Definition_Query") <> None:
# if key already exists, then append a new value to the value list
if row.getValue("dataSource") in definitionQuery_Dict:
definitionQuery_Dict[row.getValue("dataSource")].append(row.getValue("Definition_Query"))
else:
# otherwise, add a new key, value pair
definitionQuery_Dict[row.getValue("dataSource")] = row.getValue("Definition_Query")
I get an attribute error:
AttributeError: 'unicode' object has no attribute 'append'
But I believe I am doing the same as the answer provided here
I've tried various other methods with no luck with various other error messages. i know this is probably simple and maybe I couldn't find the right source on the web, but I'm stuck. Anyone care to help?
Thanks,
Mike
The issue is that you're originally setting the value to be a string (ie the result of row.getValue) but then trying to append it if it already exists. You need to set the original value to a list containing a single string. Change the last line to this:
definitionQuery_Dict[row.getValue("dataSource")] = [row.getValue("Definition_Query")]
(notice the brackets round the value).
ndpu has a good point with the use of defaultdict: but if you're using that, you should always do append - ie replace the whole if/else statement with the append you're currently doing in the if clause.
Your dictionary has keys and values. If you want to add to the values as you go, then each value has to be a type that can be extended/expanded, like a list or another dictionary. Currently each value in your dictionary is a string, where what you want instead is a list containing strings. If you use lists, you can do something like:
mydict = {}
records = [('a', 2), ('b', 3), ('a', 4)]
for key, data in records:
# If this is a new key, create a list to store
# the values
if not key in mydict:
mydict[key] = []
mydict[key].append(data)
Output:
mydict
Out[4]: {'a': [2, 4], 'b': [3]}
Note that even though 'b' only has one value, that single value still has to be put in a list, so that it can be added to later on.
Use collections.defaultdict:
from collections import defaultdict
definitionQuery_Dict = defaultdict(list)
# ...

Categories

Resources