Python - Regex capture multiple fields and build a dictionary with them


I have this in the HTML code:
"name":"London Street" dsakjhasfsa safksafas "north":"232","south":"12","east":"113","west":"9","curRoom":"110"
"name":"London Street" something asdgas dsakhdask "north":"0","south":"22","east":"131","west":"19","curRoom":"10"
I have tried these regexes, but somehow I'm failing somewhere.
\"name\":\"\A(...)\Z\"*?north\":\"(d+)\",\"south\":\"(\d+)\",east\":\"(\d+)\",west\":\"(\d+)\",curRoom\":\"(\d+)\"
\"name\":\"^(...)$\"*?north\":\"(\d+)\",\"south\":\"(\d+)\",east\":\"(\d+)\",west\":\"(\d+)\",currentRoom\":\"(d+)\"
\"name\":\"(...)+\"*?north\":\"(\d+)\",\"south\":\"(d+)\",east\":\"(d+)\",west\":\"(\d+)\",currentRoom\":\"(\d+)\"
And with those captures I want to create a dictionary like this:
{ key is current room, values [position 0: a list with neighbours [1,2,3], position 1 - the name of the room] }
I only know how to achieve this up to a point, by assigning the result of each find to a variable, like this:
list_of_neighbours = []
number_south = re.findall('south\":\"(\d+)\"', url)
list_of_neighbours.append(number_south)
....
list_of_neighbours = [n,s,e,w]
dictionay ={}
for k, v in list_of_neighbours:
dictionay[k] = str(number_current_room)
dictionay[k].append(v)
but this doesn't add the room and has too many steps.
The questions are: Is there a shorter way to do this? And how can I fix the regex? Thanks

You're trying to parse JSON with regex. Don't. JSON is basically already a dictionary, just stringified in a standard format.
Use the json module:
import json
rooms = json.loads(some_json_string)
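As a minimal sketch of how that could produce the dictionary asked for, assume the page embeds a JSON array of room objects. The payload below and the name room_map are made up for illustration; only the field names come from the question.

```python
import json

# Hypothetical payload: assumes the rooms arrive as a JSON array of objects
some_json_string = '''[
  {"name": "London Street", "north": "232", "south": "12",
   "east": "113", "west": "9", "curRoom": "110"},
  {"name": "London Street", "north": "0", "south": "22",
   "east": "131", "west": "19", "curRoom": "10"}
]'''

rooms = json.loads(some_json_string)

# key: current room, value: [list of neighbours (n, s, e, w), room name]
room_map = {
    room["curRoom"]: [
        [room[direction] for direction in ("north", "south", "east", "west")],
        room["name"],
    ]
    for room in rooms
}

print(room_map["110"])
```

If the real data is only a fragment of a larger document, the JSON part would have to be isolated first; how to do that depends on the actual page.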

Related

How to parse JSON with a list of lists?

I am trying to parse a "complicated" JSON string that is returned to me by an API.
It looks like this:
{
"data":[
["Distance to last strike","23.0","miles"],
["Time of last strike","1/14/2022 9:23:42 AM",""],
["Number of strikes today","1",""]
]
}
While the end goal will be to extract the distance, date/time, as well as count, for right now I am just trying to successfully get the distance.
My python script is:
import requests
import json
response_API = requests.get('http://localhost:8998/api/extra/lightning.json')
data = response_API.text
parse_json = json.loads(data)
value = parse_json['Distance to last strike']
print(value)
This does not work. If I change the value line to
value = parse_json['data']
then the entire string I listed above is returned.
I am hoping it's just a simple formatting issue. Suggestions?

You have an object with a list of lists. If you fetch
value = parse_json['data']
Then you will have a list containing three lists. So:
print(value[0][1])
will print "23.0".
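To reach all three readings by label rather than by position, one option is to turn the inner lists into a dictionary. This is only a sketch: the JSON is pasted in as a string so it runs without the API, and the name readings is made up here.

```python
import json

# Same payload as in the question, inlined so no HTTP request is needed
data = '''{
  "data": [
    ["Distance to last strike", "23.0", "miles"],
    ["Time of last strike", "1/14/2022 9:23:42 AM", ""],
    ["Number of strikes today", "1", ""]
  ]
}'''

parse_json = json.loads(data)

# Each inner list is [label, value, unit]; index the rows by their label
readings = {label: (value, unit) for label, value, unit in parse_json["data"]}

distance = float(readings["Distance to last strike"][0])
last_strike = readings["Time of last strike"][0]
count = int(readings["Number of strikes today"][0])
print(distance, last_strike, count)
```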

Python Retrieve value from dict using variable as key [duplicate]

This question already has answers here:
Access nested dictionary items via a list of keys?
(20 answers)
Closed 2 years ago.
Apologies if this is really simple, but I can't seem to get it going.
My application is working with nested dicts. For example: -
test = {
"alpha": "first",
"beta": {
"mid": {
"message": "winner winner"
}
},
"omega": "last"
}
Now I need to be able to retrieve values out of that dict using a variable whose value is constructed dynamically based on myriad other factors. So essentially I'm trying to put together a way to generate the key that I need depending on variable factors.
For example if I get back from one function "beta", from another, "mid" and from another "message", the best I can think to do is assemble a string which looks like the key path.
So for example:
current = '["beta"]["mid"]["message"]'
How can I use current to get back the "winner winner" string?
I have tried things like:-
v = '"[beta"]["mid"]"message]"'
print(test[v])
But just hitting Key errors.
Must be an easy way to get values based on calculated keys. Would appreciate a shove in the right direction.
[Question text updated]
Yes, I know I can do:
val = test['beta']['mid']['message']
And get back the value, but I'm stuck on how to use the generated string as the key path. Apologies for not being clear enough.

import re
t = '["beta"]["mid"]["message"]'
val = None
# Pull each quoted key out of the string and descend one level per key
for i in re.findall(r'"([^"]+)"', t):
    if val is None:
        val = test.get(i)
    else:
        val = val.get(i)
print(val)
or,
from functools import reduce
import operator
import re
t = '["beta"]["mid"]["message"]'
print(reduce(operator.getitem, re.findall(r'"([^"]+)"', t), test))
winner winner

Store the three keys as three different variables rather than as a string:
key_one = 'beta'
key_two = 'mid'
key_three = 'message'
v = test[key_one][key_two][key_three]
If you already have the keys in the string format you describe, then do some string splitting to produce three variables like the above. You don't want to eval the code as it creates a security risk.

current = '["beta"]["mid"]["message"]'
keys = [w.strip('[]"') for w in current.split('"]["')]
test[keys[0]][keys[1]][keys[2]]
# or
# key_one = keys[0]
# key_two = keys[1]
# key_three = keys[2]
# v = test[key_one][key_two][key_three]
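If the keys can be kept as a list from the start, the path can have any depth and no string parsing is needed. A small sketch, where get_by_path is a hypothetical helper name and not something from the question:

```python
from functools import reduce
import operator

test = {
    "alpha": "first",
    "beta": {"mid": {"message": "winner winner"}},
    "omega": "last",
}

def get_by_path(mapping, keys):
    """Follow keys one level at a time; raises KeyError if a key is missing."""
    return reduce(operator.getitem, keys, mapping)

# Collect the keys as they come back from the other functions
keys = ["beta", "mid", "message"]
print(get_by_path(test, keys))
```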

This should do it:
v = test['beta']['mid']['message']
print(v)
Note: The issue is you're indexing the dictionary with a string in your example, not a set of keys.

How to create a nested python dictionary with keys as strings?

Summary of issue: I'm trying to create a nested Python dictionary, with keys defined by pre-defined variables and strings. And I'm populating the dictionary from regular expressions outputs. This mostly works. But I'm getting an error because the nested dictionary - not the main one - doesn't like having the key set to a string, it wants an integer. This is confusing me. So I'd like to ask you guys how I can get a nested python dictionary with string keys.
Below I'll walk you through the steps of what I've done. What is working, and what isn't. Starting from the top:
# Regular expressions module
import re
# Read text data from a file
file = open("dt.cc", "r")
dtcc = file.read()
# Create a list of stations from regular expression matches
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))
The result is good, and is something like this:
stations = ['AAAA','BBBB','CCCC','DDDD']
# Initialize a new dictionary
rows = {}
# Loop over each station in the station list, and start populating
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
The result is good, and is something like this:
rows['AAAA'] = ['AAAA 0.1132 0.32 P',...]
However, when I try to create a sub-dictionary with a string key:
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
    rows[station]["dt"] = re.findall("%s\s(\S+)" %station, dtcc)
I get the following error.
"TypeError: list indices must be integers, not str"
It doesn't seem to like that I'm specifying the second dictionary key as "dt". If I give it a number instead, it works just fine. But then my dictionary key name is a number, which isn't very descriptive.
Any thoughts on how to get this working?

The issue is that by doing
rows[station] = re.findall(...)
you are creating a dictionary with the station names as keys and the return values of re.findall as values, and those return values are lists. So when you then write
rows[station]["dt"] = re.findall(...)
rows[station] on the left-hand side is a list, and lists can only be indexed by integers, which is what the TypeError is complaining about. For example, rows[station][0] would give you the first match from the regex. You said you want a nested dictionary, so you could do
rows[station] = dict()
rows[station]["dt"] = re.findall(...)
To make it a bit nicer, you could instead use a defaultdict from the collections module.
A defaultdict is a dictionary that takes a factory callable as its argument and uses it to create a default value for any missing key. For example, dictlist = defaultdict(list) defines a dictionary whose values are lists. Doing dictlist[key].append(item1) straight away is then legal, because the empty list is created automatically the first time the key is accessed.
In your case you could do
from collections import defaultdict
rows = defaultdict(dict)
for station in stations:
    rows[station]["bulk"] = re.findall(r"%s\s(.+)" % station, dtcc)
    rows[station]["dt"] = re.findall(r"%s\s(\S+)" % station, dtcc)
Here the first regex result has to be assigned to a key of its own; I used "bulk", but you can call it whatever you like. Hope this helps.
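Here is a self-contained version of the defaultdict approach that can be run without dt.cc. The sample text is invented for illustration and only assumes lines of the form "STATION value value phase"; re.escape is added as a precaution in case a station name ever contains regex metacharacters.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the contents of dt.cc
dtcc = "\nAAAA 0.1132 0.32 P\nBBBB 0.2210 0.41 S\nAAAA 0.0950 0.28 S\n"

stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))

# Missing station keys get an empty dict automatically on first access
rows = defaultdict(dict)
for station in stations:
    name = re.escape(station)
    rows[station]["bulk"] = re.findall(rf"{name}\s(.+)", dtcc)
    rows[station]["dt"] = re.findall(rf"{name}\s(\S+)", dtcc)

print(rows["AAAA"]["dt"])
```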

Urlencode dictionary using Python - naming key and value in the url

I am attempting to generate a URL link in the following format using urllib and urlencode.
<img src=page.psp?KEY=%28SpecA%2CSpecB%29&VALUE=1&KEY=%28SpecA%2C%28SpecB%2CSpecC%29%29&VALUE=2>
I'm trying to use data from my dictionary to input into the urllib.urlencode() function however, I need to get it into a format where the keys and values have a variable name, like below. So the keys from my dictionary will = NODE and values will = VALUE.
wanted = urllib.urlencode( [("KEY",v1),("VALUE",v2)] )
req.write( "<a href=page.psp?%s>" % (s) );
The problem is that I want the URL as above, i.e. KEY=(SpecA,SpecB) NODE=1, KEY=(SpecA,SpecB,SpecC) NODE=2, but instead I am getting what is below:
KEY=%28SpecA%2CSpecB%29%2C%28%28SpecA%2CSpecB%29%2CSpecC%29&VALUE=1%2C2
So far I have extracted keys and values from the dictionary, extracted into tuples, lists, strings and also tried dict.items() but it hasn't helped much as I still can't get it to go into the format I want. Also I am doing this using Python server pages which is why I keep having to print things as a string due to constant string errors. This is part of what I have so far:
k = (str(dict))
ver1 = dict.keys()
ver2 = dict.values()
new = urllib.urlencode(function)
f = urllib.urlopen("page.psp?%s" % new)
I am wondering what I need to change in terms of extracting values from the dictionary/converting them to different formats in order to get the output I want? Any help would be appreciated and I can add more of my code (as messy as it has become) if need be. Thanks.

This should give you the format you want:
import urllib
data = {
    '(SpecA,SpecB)': 1,
    '(SpecA,SpecB,SpecC)': 2,
}
# Python 2: build a flat list of (name, value) pairs so KEY and VALUE can repeat
params = []
for k, v in data.iteritems():
    params.append(('KEY', k))
    params.append(('VALUE', v))
new = urllib.urlencode(params)
Note that the KEY/VALUE pairings may not be the order you want, given that dicts are unordered.
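The code above is Python 2. On Python 3 the same idea might look like the sketch below, where urlencode lives in urllib.parse and iteritems() becomes items(). The dictionary keys are taken from the URL in the question. From Python 3.7 on, dicts keep insertion order, so the pairs come out in the order they were defined.

```python
from urllib.parse import urlencode

data = {
    "(SpecA,SpecB)": 1,
    "(SpecA,(SpecB,SpecC))": 2,
}

# A list of pairs (rather than a dict) lets the same parameter name repeat
params = []
for key, value in data.items():
    params.append(("KEY", key))
    params.append(("VALUE", value))

query = urlencode(params)
print("page.psp?" + query)
```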

Stuck on learnpython.org exercise using JSON

http://www.learnpython.org/Serialization_using_JSON_and_pickle
Here are the instructions:
The aim of this exercise is to print out the JSON string with key-value pair "Me" : 800 added to it.
And below is the starting code, which we should modify.
#Exercise fix this function, so it adds the given name and salary pair to the json it returns
def add_employee(jsonSalaries, name, salary):
    # Add your code here
    return jsonSalaries
#Test code - shouldn't need to be modified
originalJsonSalaries = '{"Alfred" : 300, "Jane" : 301 }'
newJsonSalaries = add_employee(originalJsonSalaries, "Me", 800)
print(newJsonSalaries)
I'm completely lost. The JSON lesson was brief, at best. The issue I seem to be running into here is that originalJsonSalaries is defined as a string (containing all sorts of unnecessary symbols like brackets). In fact, I think if the single quotes surrounding its definition were removed, originalJsonSalaries would be a dictionary and this would be a lot easier. But as it stands, how can I append "Me" and 800 to the string and still maintain the dictionary-like formatting?
And yes, I'm very very new to coding. The only other language I know is tcl.
EDIT:
OK, thanks to the answers, I figured out I was being dense and I wrote this code:
import json
#Exercise fix this function, so it adds the given name and salary pair to the json it returns
def add_employee(jsonSalaries, name, salary):
    # Add your code here
    jsonSalaries = json.loads(jsonSalaries)
    jsonSalaries["Me"] = 800
    return jsonSalaries
#Test code - shouldn't need to be modified
originalJsonSalaries = '{"Alfred" : 300, "Jane" : 301 }'
newJsonSalaries = add_employee(originalJsonSalaries, "Me", 800)
print(newJsonSalaries)
This does not work. For whatever reason, the original dictionary keys are formatted as unicode (I don't know where that happened), so when I print out the dictionary, the "u" flag is shown:
{u'Jane': 301, 'Me': 800, u'Alfred': 300}
I have tried using dict.pop() to replace the key ( dict("Jane") = dict.pop(u"Jane") ) but that just brings up SyntaxError: can't assign to function call
Is my original solution incorrect, or is this some annoying formatting issue and how to resolve it?

The page you linked to says exactly how to do this:
In order to use the json module, it must first be imported:
import json
[...]
To load JSON back to a data structure, use the "loads" method. This method takes a string and turns it back into the json object datastructure:
print json.loads(json_string)
They gave you a string (jsonSalaries). Use json.loads to turn it into a dictionary.

Your last question is a new question, but... When you print a dictionary like that you are just using the fact that python is nice enough to show you the contents of its variables in a meaningful way. To print the dictionary in your own format, you would want to iterate through the keys and print the key and value:
for k in newJsonSalaries:
    print("Employee {0} makes {1}".format(k, newJsonSalaries[k]))
There are other problems in your code....
It is weird to load the JSON inside the add employee function. That should be separate...
Also, your add_employee() function is hard-coded to always add the same values, Me and 800, instead of using the name and salary arguments that are passed in, so that line should be:
jsonSalaries[name] = salary

Use this:
import json
def add_employee(jsonSalaries, name, salary):
    # Add your code here
    jsonSalaries = json.loads(jsonSalaries)
    jsonSalaries[name] = salary
    jsonSalaries = json.dumps(jsonSalaries)
    return jsonSalaries
#Test code - shouldn't need to be modified
originalJsonSalaries = '{"Alfred" : 300, "Jane" : 301 }'
newJsonSalaries = add_employee(originalJsonSalaries, "Me", 800)
print(newJsonSalaries)
The only change needed is this line before return jsonSalaries, which turns the dictionary back into a JSON string:
jsonSalaries = json.dumps(jsonSalaries)
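The difference between the two steps is easy to see by checking the types on each side. A short sketch (the variable names salaries and as_json are made up here): json.loads gives a dict you can modify, and json.dumps gives back a string, which is what the exercise expects to be printed.

```python
import json

original = '{"Alfred" : 300, "Jane" : 301 }'

salaries = json.loads(original)   # JSON string -> dict
salaries["Me"] = 800
as_json = json.dumps(salaries)    # dict -> JSON string

print(type(salaries).__name__)    # dict
print(type(as_json).__name__)     # str
print(as_json)
```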
