I have a .csv file with the regular expression patterns that I want to match as well as the replacement patterns that I want. Some are extremely simple, such as "." -> "" or "," -> "".
When I run the following code, however, it doesn't seem to recognize the variables and the pattern is never matched.
f = open('normalize_patterns.csv', 'rU')
c = csv.DictReader(f)
for row in c:
    v = re.sub(row['Pattern'], row['Replacement'], v)
Afterwards, v is never changed and I can't seem to find out why. When I run the simple case of
v = re.sub("\.", "", v)
v = re.sub(",", "", v)
however, all the periods and commas are removed. Any help on the issue would be amazing. Thank you in advance! (I am pretty sure that the .csv file is formatted correctly; I've run it with just the "." and "" case and it still does not work, for some reason.)
Edit:
Here are the outputs of printing row. (Thanks David!)
{'Pattern': "r'(?i)&'", 'ID': '1', 'Replacement': "'and'"}
{'Pattern': "r'(?i)\\bAssoc\\b\\.?'", 'ID': '2', 'Replacement': "'Association'"}
{'Pattern': "r'(?i)\\bInc\\b\\.?'", 'ID': '3', 'Replacement': "'Inc.'"}
{'Pattern': "r'(?i)\\b(L\\.?){2}P\\.?'", 'ID': '4', 'Replacement': "''"}
{'Pattern': "r'(?i)\\bUniv\\b\\.?'", 'ID': '5', 'Replacement': "'University'"}
{'Pattern': "r'(?i)\\bCorp\\b\\.?'", 'ID': '6', 'Replacement': "'Corporation'"}
{'Pattern': "r'(?i)\\bAssn\\b\\.?'", 'ID': '7', 'Replacement': "'Association'"}
{'Pattern': "r'(?i)\\bUnivesity\\b'", 'ID': '8', 'Replacement': "'University'"}
{'Pattern': "r'(?i)\\bIntl\\b\\.?'", 'ID': '9', 'Replacement': "'International'"}
{'Pattern': "r'(?i)\\bInst\\b\\.?'", 'ID': '10', 'Replacement': "'Institute'"}
{'Pattern': "r'(?i)L\\.L\\.C\\.'", 'ID': '11', 'Replacement': "'LLC'"}
{'Pattern': "r'(?i)Chtd'", 'ID': '12', 'Replacement': "'Chartered'"}
{'Pattern': "r'(?i)Mfg\\b\\.?'", 'ID': '13', 'Replacement': "'Manufacturing'"}
{'Pattern': 'r"Nat\'l"', 'ID': '14', 'Replacement': "'National'"}
{'Pattern': "r'(?i)Flordia'", 'ID': '15', 'Replacement': "'Florida'"}
{'Pattern': "r'(?i)\\bLtd\\b\\.?'", 'ID': '16', 'Replacement': "'Ltd.'"}
{'Pattern': "r'(?i)\\bCo\\b\\.?'", 'ID': '17', 'Replacement': "'Company'"}
{'Pattern': "r'(?i)\\bDept\\b\\.?i\\'", 'ID': '18', 'Replacement': "'Department'"}
{'Pattern': "r'(?i)Califronia'", 'ID': '19', 'Replacement': "'California'"}
{'Pattern': "r'(?i)\\bJohn\\bHopkins\\b'", 'ID': '20', 'Replacement': "'Johns Hopkins'"}
{'Pattern': "r'(?i)\\bOrg\\b\\.?'", 'ID': '21', 'Replacement': "'Organization'"}
{'Pattern': "r'(?i)^[T]he\\s'", 'ID': '22', 'Replacement': "''"}
{'Pattern': "r'(?i)\\bAuth\\b\\.?'", 'ID': '23', 'Replacement': "'Authority'"}
{'Pattern': "r'.'", 'ID': '24', 'Replacement': "''"}
{'Pattern': "r','", 'ID': '25', 'Replacement': "''"}
{'Pattern': "r'(?i)\\s+'", 'ID': '0', 'Replacement': "''"}
And here are a few lines of the csv file (Opened in TextMate)
0,r'(?i)\s+',''
1,r'(?i)&','and'
2,r'(?i)\bAssoc\b\.?','Association'
3,r'(?i)\bInc\b\.?','Inc.'
Your issue is that your pattern values are not actually the regex patterns you want: each regex is wrapped in an additional string.
For example, in your dictionary you have the value "r'.'", which you are using as a pattern. Your code will run re.sub("r'.'", "", v), which probably isn't what you want:
>>> re.sub("r'.'", "", "This . won't match")
"This . won't match"
>>> re.sub("r'.'", "", "This r'x' will match")
'This will match'
To fix this you should go back to where you are adding the regex to the dictionary and stop doing whatever is causing the string wrapping. It might be something like row['Pattern'] = repr(regex).
If you need to keep the dictionary the same for some reason, then be very careful with eval: if the strings are coming from an untrusted source, eval is a big security risk. Use ast.literal_eval instead.
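A minimal sketch of that approach, using the "." -> "" case mentioned in the question (the stored cell is a Python string literal, and ast.literal_eval turns it back into the actual pattern):

```python
import ast
import re

# each CSV cell holds a Python *string literal* such as "r'\\.'",
# not the pattern itself, so evaluate the literal safely first
pattern = ast.literal_eval("r'\\.'")   # yields the pattern '\\.'
replacement = ast.literal_eval("''")   # yields ''

v = re.sub(pattern, replacement, "U.S.A.")
print(v)  # -> USA
```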
If you remove the r'' wrapper around the pattern, it will work.
So the pattern that matches . should simply be '\.' instead of "r'\.'".
The problem is that the r in your pattern is taken as a literal r character instead of carrying its raw-string meaning.
So you can also try:
v = re.sub(eval(row['Pattern']), eval(row['Replacement']), v)
I am a network engineer by day learning python to automate tasks, please go easy as I am a python newbie.
My goal is to iterate through a range of switchport interfaces and identify down switchport interfaces, then apply a new VLAN ID to the port.
The first stage of my script is below, which presents me with a list of down ports.
The issue I am facing is that I only want to iterate over port numbers 3-6 and 38-52 that are down.
At present I am iterating through the entire list of ports identified on the switch.
import netmiko
from netmiko import ConnectHandler
from getpass4 import getpass

user = 'example_user'
password = getpass('Password: ')

net_connect = ConnectHandler(
    device_type="hp_procurve",
    host="10.0.0.1",
    username=user,
    password=password,
)

print('*** Sending command ***')
show_int_brief = net_connect.send_command("show int brief", use_textfsm=True)
net_connect.disconnect()

int_down = []
for item in show_int_brief:
    if item['status'] == 'Down':
        int_down.append(item['port'])

print('*** Port status known as down ***\n', int_down)
Example output prior to being added to the list int_down.
[{'port': '1', 'type': '100/1000T', 'intrusion_alert': 'No', 'enabled': 'Yes', 'status': 'Up', 'mode': '1000FDx', 'mdi_mode': 'MDI', 'flow_ctrl': 'off', 'bcast_limit': '0'},
{'port': '2', 'type': '100/1000T', 'intrusion_alert': 'No', 'enabled': 'Yes', 'status': 'Up', 'mode': '1000FDx', 'mdi_mode': 'MDIX', 'flow_ctrl': 'off', 'bcast_limit': '0'},
{'port': '3', 'type': '100/1000T', 'intrusion_alert': 'No', 'enabled': 'Yes', 'status': 'Down', 'mode': '1000FDx', 'mdi_mode': 'Auto', 'flow_ctrl': 'off', 'bcast_limit': '0'},
{'port': '4', 'type': '100/1000T', 'intrusion_alert': 'No', 'enabled': 'Yes', 'status': 'Down', 'mode': '1000FDx', 'mdi_mode': 'Auto', 'flow_ctrl': 'off', 'bcast_limit': '0'},
{'port': '5', 'type': '100/1000T', 'intrusion_alert': 'No', 'enabled': 'Yes', 'status': 'Down', 'mode': '1000FDx', 'mdi_mode': 'Auto', 'flow_ctrl': 'off', 'bcast_limit': '0'},
{'port': '6', 'type': '100/1000T', 'intrusion_alert': 'No', 'enabled': 'Yes', 'status': 'Down', 'mode': '1000FDx', 'mdi_mode': 'Auto', 'flow_ctrl': 'off', 'bcast_limit': '0'}]
And so on..
Example output after being placed in 'int_down' and printed.
Numbers identified are expected, as these are in a down state.
['3', '4', '5', '6', '7', '8', '9', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '47', '49', '50', '51', '52']
The idea is to then use this list in another command that will apply VLAN configuration only to these ports, though I will tackle that once I get past this hurdle.
Cheers,
Luppa
I don't think it is worth limiting to a range a priori, since getting the port data into the proper shape and organization would take similar or even more computational effort than looping through all ports directly. Instead I suggest filtering ad hoc.
I use this boiled-down list of port dicts:
ports = [
    {'port': '1', 'status': 'Down'},
    {'port': '7', 'status': 'Down'},
    {'port': '2', 'status': 'Up'},
    {'port': '8', 'status': 'Up'},
    {'port': '4', 'status': 'Down'},
    {'port': '5', 'status': 'Up'},
    {'port': '9', 'status': 'Up'},
    {'port': '6', 'status': 'Down'},
]
Please note that they are not in order and have missing entries to simulate real world data more closely.
Then I first implement a helper function where I can specify the ranges of interest. If there are more than two, consider using any() and a list of ranges for readability.
def of_interest(port_num: int) -> bool:
    return port_num in range(3, 7) or port_num in range(38, 53)
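With more than two ranges, the any() form suggested above might look like this (a sketch; the bounds are the ones from the question, ports 3-6 and 38-52):

```python
# one entry per range of interest; extend the list as needed
RANGES_OF_INTEREST = [range(3, 7), range(38, 53)]

def of_interest(port_num: int) -> bool:
    # True if the port number falls inside any configured range
    return any(port_num in r for r in RANGES_OF_INTEREST)

print(of_interest(5))   # True
print(of_interest(10))  # False
```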
Now your nested for and if structure can be expressed as list comprehension:
down_ports = [e['port'] for e in ports if e['status'] == 'Down' and of_interest(int(e['port']))]
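Run against the boiled-down sample list, that comprehension gives the following (a self-contained sketch that repeats the definitions so it runs on its own):

```python
ports = [
    {'port': '1', 'status': 'Down'},
    {'port': '7', 'status': 'Down'},
    {'port': '2', 'status': 'Up'},
    {'port': '8', 'status': 'Up'},
    {'port': '4', 'status': 'Down'},
    {'port': '5', 'status': 'Up'},
    {'port': '9', 'status': 'Up'},
    {'port': '6', 'status': 'Down'},
]

def of_interest(port_num: int) -> bool:
    return port_num in range(3, 7) or port_num in range(38, 53)

# keep only down ports whose number is in one of the ranges of interest
down_ports = [e['port'] for e in ports
              if e['status'] == 'Down' and of_interest(int(e['port']))]
print(down_ports)  # ['4', '6']
```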
Of course, depending on what you want to do in later steps, it might make sense to keep all the values from the port entry, not just the port number.
Does this help?
I have a simple Python script that queries an API and parses the JSON data. Specifically, I am trying to find all ids that fall into a rectangle based on given latitude and longitude coordinates. I am having some data type issues, since the coordinates come back as type str while the values I compare them against are of type float. The coordinates are: (37.769754, -122.427050) and (37.748554, -122.404535).
Below is my code, sample JSON, and the trace.
Code:
import requests

def get_ids():
    url = "https://retro.umoiq.com/service/publicJSONFeed?command=vehicleLocations&a=sf-muni&t=0"
    response = requests.get(url).json()
    id_list = []
    for id in response['vehicle']:
        lat = id['lat']
        lon = id['lon']
        if (lat <= 37.769754 and lat >= 37.748554):
            if (lon >= -122.427050 and lon <= -122.404535):
                id_list.append(id['id'])
    return id_list

def main():
    print(get_ids())

if __name__ == "__main__":
    main()
JSON:
{'lastTime': {'time': '1653259435728'}, 'copyright': 'All data copyright San Francisco Muni 2022.', 'vehicle': [{'routeTag': 'KT', 'predictable': 'true', 'heading': '218', 'speedKmHr': '0', 'lon': '-122.405464', 'id': '1462', 'dirTag': 'KT___O_F20', 'lat': '37.708099', 'secsSinceReport': '33', 'leadingVehicleId': '1487'}, {'routeTag': '33', 'predictable': 'true', 'heading': '165', 'speedKmHr': '35', 'lon': '-122.40744', 'id': '5817', 'dirTag': '33___O_F00', 'lat': '37.763451', 'secsSinceReport': '6'}, {'routeTag': '1', 'predictable': 'true', 'heading': '269', 'speedKmHr': '0', 'lon': '-122.492844', 'id': '5818', 'dirTag': '1____O_F00', 'lat': '37.77985', 'secsSinceReport': '33'}, {'routeTag': '1', 'predictable': 'true', 'heading': '219', 'speedKmHr': '0', 'lon': '-122.493156', 'id': '5819', 'lat': '37.779823', 'secsSinceReport': '6'}, {'routeTag': 'N', 'predictable': 'true', 'heading': '195', 'speedKmHr': '6', 'lon': '-122.457748', 'id': '1453', 'dirTag': 'N____O_F01', 'lat': '37.764671', 'secsSinceReport': '33'}, {'routeTag': '24', 'predictable': 'true', 'heading': '231', 'speedKmHr': '0', 'lon': '-122.412033', 'id': '5813', 'dirTag': '24___I_F00', 'lat': '37.739773', 'secsSinceReport': '20'}, {'routeTag': '1', 'predictable': 'true', 'heading': '80', 'speedKmHr': '0', 'lon': '-122.397522', 'id': '5815', 'dirTag': '1____I_F00', 'lat': '37.795418', 'secsSinceReport': '46'}, {'routeTag': '1', 'predictable': 'true', 'heading': '87', 'speedKmHr': '0', 'lon': '-122.472931', 'id': '5827', 'dirTag': '1____I_F00', 'lat': '37.78437', 'secsSinceReport': '6'}, {'routeTag': 'KT', 'predictable': 'true', 'heading': '330', 'speedKmHr': '32', 'lon': '-122.468117', 'id': '1469', 'dirTag': 'KT___I_F20', 'lat': '37.7290149', 'secsSinceReport': '6'}, {'routeTag': '33', 'predictable': 'true', 'heading': '77', 'speedKmHr': '0', 'lon': '-122.456421', 'id': '5828', 'dirTag': '33___O_F00', 'lat': '37.786957', 'secsSinceReport': '6'}, {'routeTag': '45', 'predictable': 'true', 'heading': 
'165', 'speedKmHr': '21', 'lon': '-122.406647', 'id': '5829', 'dirTag': '45___I_F00', 'lat': '37.78756', 'secsSinceReport': '6'}
etc...
Trace:
Traceback (most recent call last):
  File "/main.py", line 51, in <module>
    main()
  File "/main.py", line 48, in main
    get_ids()
  File "/main.py", line 41, in get_ids
    if (lat < 37.769754 and lon < -122.427050):
TypeError: '<' not supported between instances of 'str' and 'float'
Try converting the data to floats, like so:
lat = float(id['lat'])
lon = float(id['lon'])
This will allow the comparison operators to work correctly, since they will be comparing two floats.
Keep in mind the operators themselves are wrong (lon = -179 and lat = -179 would fit inside your rectangle).
I took the liberty of improving some of your code and fixing the comparison operators:
import requests

VEHICLE_LOCATIONS_URL = "https://retro.umoiq.com/service/publicJSONFeed?command=vehicleLocations&a=sf-muni&t=0"
# (min_lat, max_lat), (min_long, max_long)
BOUNDARIES = ((37.748554, 37.769754), (-122.427050, -122.404535))

def get_ids_in_boundry():
    response = requests.get(VEHICLE_LOCATIONS_URL).json()
    id_list = []
    for vehicle in response['vehicle']:
        lat, long = float(vehicle['lat']), float(vehicle['lon'])
        if ((BOUNDARIES[0][0] <= lat <= BOUNDARIES[0][1])
                and (BOUNDARIES[1][0] <= long <= BOUNDARIES[1][1])):
            id_list.append(vehicle['id'])
    return id_list

def main():
    print(get_ids_in_boundry())

if __name__ == "__main__":
    main()
I've made the URL a constant, and the boundaries as well, and the id_list is returned from the function rather than printed inside it. Many other improvements could be added, such as taking the boundaries as function parameters or splitting the request and the boundary check into two different functions.
If I understood everything correctly, the answer is to convert the variables lat and lon to floats.
To do that, just modify this in your code:
lat = float(id['lat'])
lon = float(id['lon'])
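Applied to the loop from the question, the fix might look like this (a sketch over sample data in the shape the API returns; chained comparisons keep the bounds readable):

```python
# sample data in the shape the API returns: coordinates come back as strings
response = {'vehicle': [
    {'id': '5817', 'lat': '37.763451', 'lon': '-122.40744'},
    {'id': '1462', 'lat': '37.708099', 'lon': '-122.405464'},
]}

id_list = []
for vehicle in response['vehicle']:
    lat = float(vehicle['lat'])   # convert str -> float before comparing
    lon = float(vehicle['lon'])
    if 37.748554 <= lat <= 37.769754 and -122.427050 <= lon <= -122.404535:
        id_list.append(vehicle['id'])

print(id_list)  # ['5817']
```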
I'm not a coder by trade, rather an infrastructure engineer who's learning to code for my role. I have an output that I'm getting, and I'm struggling to work out how to handle it.
I've asked some of my colleagues, but the data is output in a weird format and I am unsure how to get the outcome I want. I have tried splitting the lines, but it will not work perfectly.
The current code is simple. It just pulls the command output from the switch, and I then have it split the lines:
output = net_connect.send_command("show switch")
switchlines = output.splitlines()
print(output)
print(switchlines[5])
It will then output the following in this case:
Switch/Stack Mac Address : 188b.45ea.a000 - Local Mac Address
Mac persistency wait time: Indefinite
                                            H/W       Current
 Switch#  Role     Mac Address      Priority  Version  State
 ------------------------------------------------------------
 *1       Active   188b.45ea.a000   15        V01      Ready
  2       Standby  00ca.e5fc.1780   14        V06      Ready
  3       Member   00ca.e5fc.5e80   13        V06      Ready
  4       Member   00ca.e588.f480   12        V06      Ready
  5       Member   00ca.e588.ee80   11        V06      Ready
 *1       Active   188b.45ea.a000   15        V01      Ready
That table comes out as a string and, essentially, I need to find a way to split it into usable chunks (i.e. a 2D array) so I can use each field individually.
You already have the lines separated in a list (switchlines), so all that's left to do is iterate over that list and split each line into fields. split() with no arguments splits on any run of whitespace, so the multiple spaces between columns are handled automatically. So you could do something like:
res = []
for line in switchlines[5:]:
    elements = line.split()
    res.append(elements)
And this gives on your example text:
[['*1', 'Active', '188b.45ea.a000', '15', 'V01', 'Ready'],
['2', 'Standby', '00ca.e5fc.1780', '14', 'V06', 'Ready'],
['3', 'Member', '00ca.e5fc.5e80', '13', 'V06', 'Ready'],
['4', 'Member', '00ca.e588.f480', '12', 'V06', 'Ready'],
['5', 'Member', '00ca.e588.ee80', '11', 'V06', 'Ready']]
Another option, which can later help you work with the data, is to collect it into dictionaries instead of lists:
res = []
for line in switchlines[5:]:
    switch, role, mac, prio, ver, state, *extras = line.split()
    res.append({'switch': switch, 'role': role, 'mac': mac,
                'prio': prio, 'ver': ver, 'state': state, 'extras': extras})
And this gives on your example text:
[{'switch': '*1', 'role': 'Active', 'mac': '188b.45ea.a000', 'prio': '15', 'ver': 'V01', 'state': 'Ready', 'extras': []},
{'switch': '2', 'role': 'Standby', 'mac': '00ca.e5fc.1780', 'prio': '14', 'ver': 'V06', 'state': 'Ready', 'extras': []},
{'switch': '3', 'role': 'Member', 'mac': '00ca.e5fc.5e80', 'prio': '13', 'ver': 'V06', 'state': 'Ready', 'extras': []},
{'switch': '4', 'role': 'Member', 'mac': '00ca.e588.f480', 'prio': '12', 'ver': 'V06', 'state': 'Ready', 'extras': []},
{'switch': '5', 'role': 'Member', 'mac': '00ca.e588.ee80', 'prio': '11', 'ver': 'V06', 'state': 'Ready', 'extras': []}]
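Once the rows are dictionaries, later lookups become straightforward. For instance (a sketch over a couple of the parsed rows above):

```python
# a couple of rows in the parsed shape shown above
rows = [
    {'switch': '*1', 'role': 'Active', 'mac': '188b.45ea.a000',
     'prio': '15', 'ver': 'V01', 'state': 'Ready', 'extras': []},
    {'switch': '2', 'role': 'Standby', 'mac': '00ca.e5fc.1780',
     'prio': '14', 'ver': 'V06', 'state': 'Ready', 'extras': []},
    {'switch': '3', 'role': 'Member', 'mac': '00ca.e5fc.5e80',
     'prio': '13', 'ver': 'V06', 'state': 'Ready', 'extras': []},
]

# e.g. collect the MAC address of every non-active switch
non_active_macs = [r['mac'] for r in rows if r['role'] != 'Active']
print(non_active_macs)  # ['00ca.e5fc.1780', '00ca.e5fc.5e80']
```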
https://developer.yahoo.com/fantasysports/guide/game-resource.html
So on the API guide, under stat_categories, there are a set of ids,
https://fantasysports.yahooapis.com/fantasy/v2/game/nba/stat_categories
But when I look at the JSON data from all the API requests I make, there's no
{'stats': [{'stat': {'stat_id': 0,
'name': 'Games Played',
'display_name': 'GP',
'sort_order': '1',
'position_types': [{'position_type': 'P'}]}}
in it.
This is the result from my JSON data. As you can see below, there's no stat_id of 0, 1, or 2; it starts at 3.
'stats': [{'stat': {'stat_id': '3', 'value': '3473'}},
{'stat': {'stat_id': '4', 'value': '1625'}},
{'stat': {'stat_id': '6', 'value': '920'}},
{'stat': {'stat_id': '7', 'value': '713'}},
{'stat': {'stat_id': '9', 'value': '1069'}},
{'stat': {'stat_id': '10', 'value': '384'}},
{'stat': {'stat_id': '12', 'value': '4347'}},
{'stat': {'stat_id': '13', 'value': '408'}},
{'stat': {'stat_id': '15', 'value': '1792'}},
{'stat': {'stat_id': '16', 'value': '1016'}},
{'stat': {'stat_id': '17', 'value': '271'}},
{'stat': {'stat_id': '18', 'value': '132'}},
{'stat': {'stat_id': '19', 'value': '586'}},
{'stat': {'stat_id': '27', 'value': '63'}},
{'stat': {'stat_id': '28', 'value': '3'}}]
Can anyone help me with this?
I need the number of games played each day to do my analysis.
If you query "League Settings", Yahoo will let you know which stats your league tracks for points. Yahoo only provides stat details for stats you're tracking for points.
Most likely, stat 0 ("Games Played") is not a tracked/scoring statistic, hence Yahoo does not provide this info when you query players.
The only way to get this stat is to make your league track it as a scoring stat.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
In my Python code, I get strings from a text file like:
a = "[{'index': '1', 'selected': 'true', 'length': '0', 'completedLength': '0', 'path': '', 'uris': [{'status': 'used', 'uri': 'http://www.single.com'}]}]"
b ="[{'index': '1', 'selected': 'true', 'length': '0', 'completedLength': '0', 'path': '', 'uris': [{'status': 'used', 'uri': 'http://www.mirrors.com'}, {'status': 'used', 'uri': 'http://www.mirrors2.com'}]}]"
c ="[{'index': '1', 'selected': 'true', 'length': '103674793', 'completedLength': '0', 'path': '/home/dr/Maher_Al-Muaiqly_(MP3_Quran)/002.mp3', 'uris': []}, {'index': '2', 'selected': 'true', 'length': '62043128', 'completedLength': '0', 'path': '/home/dr/Maher_Al-Muaiqly_(MP3_Quran)/004.mp3', 'uris': []}, {'index': '3', 'selected': 'true', 'length': '57914945', 'completedLength': '0', 'path': '/home/dr/Maher_Al-Muaiqly_(MP3_Quran)/003.mp3', 'uris': []}]"
I want to get the text of the value uris; the output should look like:
a = [{'status': 'used', 'uri': 'http://www.single.com'}]
b = [{'status': 'used', 'uri': 'http://www.mirrors.com'}, {'status': 'used', 'uri': 'http://www.mirrors2.com'}]
c = [[],[],[]]
I spent many hours in failed attempts to get this result using the string functions,
uris = str.split('}, {')
for uri in uris:
    uri = uri.split(',')
    # and so on ...
but it works badly, especially in the second case. I hope someone can do it with a regex or any other way.
They are all Python literals, so you can use ast.literal_eval. There is no need to use regular expressions.
>>> a = "[{'index': '1', 'selected': 'true', 'length': '0', 'completedLength': '0', 'path': '', 'uris': [{'status': 'used', 'uri': 'http://www.single.com'}]}]"
>>> b = "[{'index': '1', 'selected': 'true', 'length': '0', 'completedLength': '0', 'path': '', 'uris': [{'status': 'used', 'uri': 'http://www.mirrors.com'}, {'status': 'used', 'uri': 'http://www.mirrors2.com'}]}]"
>>> c = "[{'index': '1', 'selected': 'true', 'length': '103674793', 'completedLength': '0', 'path': '/home/dr/Maher_Al-Muaiqly_(MP3_Quran)/002.mp3', 'uris': []}, {'index': '2', 'selected': 'true', 'length': '62043128', 'completedLength': '0', 'path': '/home/dr/Maher_Al-Muaiqly_(MP3_Quran)/004.mp3', 'uris': []}, {'index': '3', 'selected': 'true', 'length': '57914945', 'completedLength': '0', 'path': '/home/dr/Maher_Al-Muaiqly_(MP3_Quran)/003.mp3', 'uris': []}]"
>>> import ast
>>> [x['uris'] for x in ast.literal_eval(a)]
[[{'status': 'used', 'uri': 'http://www.single.com'}]]
>>> [x['uris'] for x in ast.literal_eval(b)]
[[{'status': 'used', 'uri': 'http://www.mirrors.com'}, {'status': 'used', 'uri': 'http://www.mirrors2.com'}]]
>>> [x['uris'] for x in ast.literal_eval(c)]
[[], [], []]
In JavaScript you can do this (note the backreference is $1 in JavaScript):
a = a.replace(/^.*uris[^[]*(\[[^\]]*\]).*$/, '$1');
In PHP it would be done this way:
$a = preg_replace('/^.*uris[^[]*(\[[^\]]*\]).*$/', '\1', $a);
Edit: well, I see it wouldn't do your complete task for 'c' -.-