Python matches the part after a .* at its last occurrence [duplicate] - python

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 5 years ago.
I'm trying to read the server states from the Guild Wars API. To do that, I match the server name, followed by an optional language specifier and a ",\n, which I intend to match with .*, and after that comes the population. But instead of matching the first occurrence of population, the regex matches the last one. Can someone tell me why (and how to fix this)?
Edit: I found a workaround. Replacing .* with .{,20} works.
Relevant part of the API response:
"name": "Riverside [DE]",
"population": "Full"
import re
from urllib.request import urlopen

with urlopen('https://api.guildwars2.com/v2/worlds?ids=all') as api:
    s = api.read()
s = s.decode('utf-8')
# Greedy .* (with re.S) runs to the LAST '"population": "' in the response
search = re.search(r'''Riverside.*"population": "''', s, re.S)
print(search)
s = s[search.span()[1]:]
state = re.search(r'[a-zA-Z]*', s)
print(state)

There are two things:
You can use .*? (with a trailing question mark), which makes the match non-greedy so that it stops at the first occurrence. I don't think this is the best solution, though.
Instead, once you have the data, parse it as JSON and work with the parsed objects:
import json
from urllib.request import urlopen

with urlopen('https://api.guildwars2.com/v2/worlds?ids=all') as api:
    s = api.read()
s = s.decode('utf-8')
jsondata = json.loads(s)
# filter() returns an iterator in Python 3, so build a list before indexing
filtered_data = [a for a in jsondata if "Riverside" in str(a["name"])]
print(filtered_data[0]["population"])
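To see the difference between the greedy and non-greedy quantifier without calling the API, here is a small sketch. The `sample` string is made up for illustration and only imitates the shape of the response shown in the question.

```python
import re

# Hypothetical sample imitating the API response (values are invented).
sample = '''[
  {"id": 1, "name": "Riverside [DE]",
   "population": "Full"},
  {"id": 2, "name": "Other World",
   "population": "Medium"}
]'''

# Greedy: .* consumes as much as possible, then backtracks to the LAST
# '"population": "' in the text.
greedy = re.search(r'Riverside.*"population": "([A-Za-z]*)', sample, re.S)

# Non-greedy: .*? consumes as little as possible, so it stops at the FIRST
# '"population": "' after "Riverside".
lazy = re.search(r'Riverside.*?"population": "([A-Za-z]*)', sample, re.S)

print(greedy.group(1))  # Medium
print(lazy.group(1))    # Full
```

The `.{,20}` workaround from the question works for a similar reason: it caps how far the quantifier can run, so it cannot reach a later entry.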

Related

How do I read a word after the # symbol [duplicate]

This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed last month.
I'm having a problem. I need to create a #everyone_or_person feature, a bit like Discord. I have to read the word after the #, stop reading when there is a ("SPACE"/"_"), and check for that word in a list. I've appended a simple version as an example. I knew it would not work, but I couldn't think of anything else.
input = input("input: ")
value = input.find("#")
output = input.partition("#")[0]
print(str(output))
I've tried to look up how to do it but to no avail.
Simply use split:
test = "Some input with #your_desired_value in it"
result = test.split("#")[1].split(" ")[0]
print(result)
This splits your text at the #, takes the entire string after the #, splits that again at the first space, and takes the part before it.
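The split approach raises an IndexError when the input contains no # at all. A minimal sketch that handles that case and then checks the word against a list might look like this; `word_after_hash` and `known_names` are hypothetical names chosen for illustration, not part of the question.

```python
def word_after_hash(text):
    """Return the text between the first '#' and the next space, or None."""
    _, sep, after = text.partition("#")
    if not sep:
        return None  # no '#' in the input
    return after.split(" ", 1)[0]


# Hypothetical list of valid mentions, assumed for this example.
known_names = ["everyone", "your_desired_value"]

word = word_after_hash("Some input with #your_desired_value in it")
print(word)                 # your_desired_value
print(word in known_names)  # True
print(word_after_hash("no marker here"))  # None
```

Like the answer above, this stops only at a space. If the word should also end at an underscore, the second split would need to change accordingly.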

How to match regex to line ending in python [duplicate]

This question already has an answer here:
Regular expression works on regex101.com, but not on prod
(1 answer)
Closed 2 years ago.
I'm trying to get a Python regex to match the end of a string (primarily because I want to remove a common section from the end of the string). I have the following code, which I think is how the docs describe doing it, but it's not behaving as I expect:
import re

input_value = "Non-numeric qty or weight, from 00|XFX|201912192009"
pattern = ", from .*$"
match = re.match(pattern, input_value)
print(match)
The result is None, however I'm expecting to have matched something. I've also tested these values with an online regex tool: https://regex101.com/ using the python flavour, and it works as expected.
What am I doing wrong?
match = re.match(".*, from.*$", input_value)
You should put .* in front; otherwise re.match only tries to match at the beginning of the string.
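As an alternative sketch, `re.search` scans the whole string rather than anchoring at the start, and `re.sub` can remove the trailing section directly, which is what the question is ultimately after. The variable names `found` and `trimmed` are only illustrative.

```python
import re

input_value = "Non-numeric qty or weight, from 00|XFX|201912192009"
pattern = r", from .*$"

# re.match only tries at position 0, so the original pattern finds nothing.
print(re.match(pattern, input_value))  # None

# re.search looks for the pattern anywhere in the string.
found = re.search(pattern, input_value)
print(found.group(0))  # , from 00|XFX|201912192009

# re.sub strips the matched tail in one step.
trimmed = re.sub(pattern, "", input_value)
print(trimmed)  # Non-numeric qty or weight
```

This difference is also why the pattern appears to work on regex101, which searches the whole input in the way `re.search` does.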

Splitting a string after a specific character in python [duplicate]

This question already has answers here:
How to get a string after a specific substring?
(9 answers)
How can I split a URL string up into separate parts in Python?
(6 answers)
Closed 2 years ago.
I want to split off everything that comes after = and assign it to a new variable.
Example:
https://www.exaple.com/index.php?id=24124
I want to take whatever comes after =, which in this case is 24124, and put it into a new variable.
You can of course split this specific string. rsplit() would be a good choice since you are interested in the rightmost value:
s = "https://www.exaple.com/index.php?id=24124"
rest, n = s.rsplit('=', 1)
# n == '24124'
However, if you are dealing with URLs this is fragile. For example, a URL for the same page might look like:
s = "https://www.exaple.com/index.php?id=24124#anchor"
and the above split would return '24124#anchor', which is probably not what you want.
Python includes good URL parsing, which you should use if you are dealing with URLs. In this case it's just as simple to get what you want, and less fragile:
from urllib.parse import (parse_qs, urlparse)
s = "https://www.exaple.com/index.php?id=24124"
qs = urlparse(s)
parse_qs(qs.query)['id'][0]
# '24124'
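To make the fragility concrete, here is a short sketch that runs both approaches on the URL with the #anchor fragment from above. Only standard-library calls are used; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlparse

s = "https://www.exaple.com/index.php?id=24124#anchor"

# Plain string splitting keeps the fragment attached to the value.
naive = s.rsplit("=", 1)[1]
print(naive)  # 24124#anchor

# urlparse separates the query string from the fragment first.
parsed = urlparse(s)
print(parsed.fragment)  # anchor

id_value = parse_qs(parsed.query)["id"][0]
print(id_value)  # 24124
```

Note that `parse_qs` returns a list for each key, because a query string may repeat a parameter, hence the `[0]`.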
Simply use .split() and take only the second part:
url = 'https://www.exaple.com/index.php?id=24124'
print(url.split('=')[1])
For your specific case, you could do...
url = "https://www.exaple.com/index.php?id=24124"
id_number = url.split('=')[1]
If you want to store id_number as an integer, then id_number = int(url.split('=')[1]) instead.

Make part of a regex match in python optional [duplicate]

This question already has answers here:
How to use regex with optional characters in python?
(5 answers)
Closed 5 years ago.
I'm trying to match a URL using re but am having trouble in regards to making part of the match optional.
import re
x = input('Link: ')
reg = r'(http|https)://(iski|www\.iskis|iskis)\.(in|com)/[A-Za-z0-9?&=/?_]+'
if re.match(reg, x):
    print('True')
Currently, the above code would match something like:
https://iskis.com/?loc=shop_view_item&item=220503032
I would like to alter the regular expression so that the [A-Za-z0-9?&=/?_]+ part is optional. That is, anything after the slash isn't required, so the following should also match:
https://iskis.com
I'm sure there is a simple solution but I don't know how to go about solving this.
reg = r'(http|https)://(iski|www\.iskis|iskis)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
Should do it. Surround the character class with () so it's a group, put a ? after it to make the text match 0-1 instances of that group, and put a $ at the end so that the regex will match to the end.
EDIT:
Come to think of it, you could use the optional match elsewhere in your regex.
reg = r'(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
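A quick way to check the optional group is to run the pattern against a few URLs. The sketch below uses the second regex from the answer; the third test URL (with a bare trailing slash) is an extra case added here for illustration.

```python
import re

reg = r'(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'

urls = [
    "https://iskis.com/?loc=shop_view_item&item=220503032",  # with path/query
    "https://iskis.com",                                      # optional part absent
    "https://iskis.com/",                                     # bare trailing slash
]

results = [bool(re.match(reg, url)) for url in urls]
for url, ok in zip(urls, results):
    print(url, ok)
```

The first two URLs match. The third does not, because the optional group requires at least one character after the slash; changing the `+` inside the group to `*` would be one way to allow it, if that is wanted.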

Python: Dynamically add a word to a Regex which finds phone number from text? [duplicate]

This question already has answers here:
How to match exact "multiple" strings in Python
(5 answers)
Closed 5 years ago.
I have to distinguish tel:919443177747 from fax=919976384999 in a large text. The keyword can be any of the following:
contact_text_pattern =["sale","call","inquiry","inquiries","caller","enquiries","enquiry","tel"]
Please help me change my regex so that it collects only the numbers that have one of the above keywords next to them.
Below is my regex:
phones = re.findall(r'\+?[0-9]{7,11}',text)
I have written a loop to substitute each word. But I'm not able to proceed. Please share your comments.
phone_word = re.findall(r"(" + '|'.join(self.contact_merged_list) + r")", text)  # find whether any of our keywords are in the text
for word in phone_word:
Regex would help.
Better than a regex would be to use a package that can intelligently recognize, parse, and format international phone numbers. There is a Python port of the Google library, called phonenumbers (on GitHub).
Here's an example from the documentation
import phonenumbers
text = "Call me at 510-748-8230 if it's before 9:30, or on 703-4800500 after 10am."
for match in phonenumbers.PhoneNumberMatcher(text, "US"):
    print(match)
> PhoneNumberMatch [11,23) 510-748-8230
> PhoneNumberMatch [51,62) 703-4800500
for match in phonenumbers.PhoneNumberMatcher(text, "US"):
    print(phonenumbers.format_number(match.number, phonenumbers.PhoneNumberFormat.E164))
> +15107488230
> +17034800500
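If a plain regex is still preferred, the keyword list from the question can be joined into a single pattern. This is only a sketch under a few assumptions: the separator between keyword and number is an optional `:` or `=`, and the upper digit limit is widened from 11 to 12 because the sample numbers in the question have 12 digits.

```python
import re

contact_text_pattern = ["sale", "call", "inquiry", "inquiries", "caller",
                        "enquiries", "enquiry", "tel"]

# Escape each keyword and join them into one alternation.
keywords = "|".join(re.escape(word) for word in contact_text_pattern)

# keyword, optional ':' or '=' separator (assumed), then the number.
# {7,12} is an assumption; the question's own regex used {7,11}.
pattern = re.compile(
    r"\b(?:" + keywords + r")\b\s*[:=]?\s*(\+?[0-9]{7,12})\b",
    re.IGNORECASE,
)

text = "tel:919443177747, fax=919976384999"
phones = pattern.findall(text)
print(phones)  # ['919443177747']
```

Only the number after `tel` is returned, since `fax` is not in the keyword list.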
