I've got a DB chock full o' phone numbers stored as strings, all formatted like 1112223333. I'd like to display them as 111-222-3333 in my Django template.
I know I can do
n = contacts.objects.get(name=name)
n.phone = n.phone[:3] + '-' + n.phone[3:6] + '-' + n.phone[6:]
but is there a better / more pythonic way?
It may be overkill for your use case if all your numbers are formatted the same way, but you might consider using the phonenumbers module. It would allow you to add functionality (e.g. international phone numbers, different formatting, etc) very easily.
You can parse your numbers like this:
>>> import phonenumbers
>>> parsed_number = phonenumbers.parse('1112223333', 'US')
>>> parsed_number
PhoneNumber(country_code=1, national_number=1112223333L, extension=None, italian_leading_zero=False, country_code_source=None, preferred_domestic_carrier_code=None)
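Then, to format it the way you want, you could do this (note that the second argument is documented as a PhoneNumberFormat constant, as in the examples below, so passing a PhoneNumber() instance may not behave the same across versions):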
>>> phonenumbers.format_number(parsed_number, phonenumbers.PhoneNumber())
u'111-222-3333'
Note that you could easily use other formats:
>>> phonenumbers.format_number(parsed_number, phonenumbers.PhoneNumberFormat.NATIONAL)
u'(111) 222-3333'
>>> phonenumbers.format_number(parsed_number, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
u'+1 111-222-3333'
>>> phonenumbers.format_number(parsed_number, phonenumbers.PhoneNumberFormat.E164)
u'+11112223333'
Just one other solution:
n.phone = "%c%c%c-%c%c%c-%c%c%c%c" % tuple(map(ord, n.phone))
or
n.phone = "%s%s%s-%s%s%s-%s%s%s%s" % tuple(n.phone)
This is quite a bit belated, but I figured I'd post my solution anyway. It's super simple and takes advantage of creating your own template filter (for use throughout your project). The other part of this is putting parentheses around the area code.
from django import template

register = template.Library()

def phonenumber(value):
    phone = '(%s) %s - %s' % (value[0:3], value[3:6], value[6:10])
    return phone

register.filter('phonenumber', phonenumber)
For the rest of your project, all you need to do is {{ var|phonenumber }}
Since we're speaking Pythonic :), it's a good habit to always use join instead of addition (+) to join strings:
phone = n.phone
n.phone = '-'.join((phone[:3],phone[3:6],phone[6:]))
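If you want to reuse the slicing approach, here is a minimal sketch of it wrapped in a helper. The function name `format_us_phone` and the "return the input unchanged if it isn't exactly 10 digits" behaviour are my own assumptions, not something from the question.

```python
def format_us_phone(digits):
    """Return 'AAA-BBB-CCCC' for a 10-digit string.

    Anything that is not exactly 10 digits is returned unchanged
    (an assumption; you may prefer to raise an error instead).
    """
    if len(digits) != 10 or not digits.isdigit():
        return digits
    return '-'.join((digits[:3], digits[3:6], digits[6:]))


print(format_us_phone('1112223333'))  # 111-222-3333
```

Keeping the check in one place means the template or view code never has to worry about short or malformed values.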
def formatPhone(phone):
    formatted = ''
    i = 0
    # clean phone. skip not digits
    phone = ''.join(x for x in phone if x.isdigit())
    # set pattern
    if len(phone) > 10:
        pattern = 'X (XXX) XXX-XX-XX'
    else:
        pattern = 'XXX-XXX-XX-XX'
    # reverse
    phone = phone[::-1]
    pattern = pattern[::-1]
    # scan pattern
    for p in pattern:
        if i >= len(phone):
            break
        # skip non X
        if p != 'X':
            formatted += p
            continue
        # add phone digit
        formatted += phone[i]
        i += 1
    # reverse again
    formatted = formatted[::-1]
    return formatted
print formatPhone('+7-111-222-33-44')
7 (111) 222-33-44
print formatPhone('222-33-44')
222-33-44
print formatPhone('23344')
2-33-44
I just started learning dictionaries and regex and I'm having trouble creating a dictionary. In my task, area code is a combination of plus sign and three numbers. The phone number itself is a combination of 7-8 numbers. The phone number might be separated from the area code with a whitespace, but not necessarily.
def find_phone_numbers(text: str) -> dict:
    pattern = r'\+\w{3} \w{8}|\+\w{11}|\+\w{3} \w{7}|\+\w{10}|\w{8}|\w{7}'
    match = re.findall(pattern, text)
    str1 = " "
    phone_str = str1.join(match)
    phone_dict = {}
    phones = phone_str.split(" ")
    for phone in phones:
        if phone[0] == "+":
            phone0 = phone
            if phone_str[0:4] not in phone_dict.keys():
                phone_dict[phone_str[0:4]] = [phone_str[5:]]
    return phone_dict
The result should be:
print(find_phone_numbers("+372 56887364 +37256887364 +33359835647 56887364 +11 1234567 +327 1 11111111")) ->
{'+372': ['56887364', '56887364'], '+333': ['59835647'], '': ['56887364', '1234567', '11111111']}
The main problem is that phone numbers with the same area code can be written together or separately. I had an idea to use a for loop to get rid of the "tail" in the form of a phone number and only the area code will remain, but I don't understand how to get rid of the tail here +33359835647. How can this be done and is there a more efficient way?
Try (the regex pattern explained here - Regex101):
import re
s = "+372 56887364 +37256887364 +33359835647 56887364 +11 1234567 +327 1 11111111"
pat = re.compile(r"(\+\d{3})?\s*(\d{7,8})")
out = {}
for pref, number in pat.findall(s):
    out.setdefault(pref, []).append(number)
print(out)
Prints:
{
"+372": ["56887364", "56887364"],
"+333": ["59835647"],
"": ["56887364", "1234567", "11111111"],
}
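To match the function signature from the question, the same regex can be wrapped like this. It is only a sketch: the module-level name `PHONE_RE` is my own, and the pattern is the one from the answer, where an optional `+` followed by three digits is the area code and 7 to 8 digits are the number.

```python
import re

# Optional "+ddd" area code, optional whitespace, then a 7-8 digit number.
PHONE_RE = re.compile(r"(\+\d{3})?\s*(\d{7,8})")


def find_phone_numbers(text: str) -> dict:
    """Group phone numbers by area code; numbers without one go under ''."""
    result = {}
    for prefix, number in PHONE_RE.findall(text):
        # findall() yields '' for the optional group when it did not match
        result.setdefault(prefix, []).append(number)
    return result


print(find_phone_numbers("+372 56887364 +37256887364 +33359835647 56887364"))
```

Because the area-code group is optional, `findall` returns an empty string for it when it is missing, which is what produces the `''` key.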
I have two variables that store two numbers in total.
I want to combine those numbers and separate them with a comma. I read that I can use {variablename:+} to insert a plus or spaces or a zero but the comma doesn't work.
x = 42
y = 73
print(f'the number is {x:}{y:,}')
Here is my weird solution: I'm adding a + and then replacing the + with a comma. Is there a more direct way?
x = 42
y = 73
print(f'the number is {x:}{y:+}'.replace("+", ","))
Let's say I have names and domain names and I want to build a list of email addresses. So I want to fuse the two names with an # symbol in the middle and a .com at the end.
That's just one example I can think of off the top of my head.
x = "John"
y = "gmail"
z = ".com"
print(f'the email is {x}{y:+}{z}'.replace(",", "#"))
results in:
print(f'the email is {x}{y:+}{z}'.replace(",", "#"))
ValueError: Sign not allowed in string format specifier
You are over-complicating things.
Since only what's between { and } is going to be evaluated, you can simply do
print(f'the number is {x},{y}') for the first example, and
print(f'the email is {x}#{y}{z}') for the second.
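For context, the part after the colon inside the braces is a format specification for that one value, not text to insert. A small sketch of the difference (the variable `big` is a made-up value, only there to make the thousands separator visible):

```python
x = 42
y = 73
big = 1234567  # hypothetical value, large enough to show digit grouping

signed = f'{x:+}'     # '+' in a format spec forces a sign: '+42'
grouped = f'{big:,}'  # ',' in a format spec groups thousands: '1,234,567'
small = f'{y:,}'      # 73 has no thousands to group, so this is just '73'
joined = f'{x},{y}'   # a literal comma goes outside the braces: '42,73'

print(signed, grouped, small, joined)
```

That is why `{y:,}` seemed to do nothing in the question: the comma was treated as a thousands separator, and 73 has nothing to separate.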
When you put something into "{}" inside an f-string, it's actually being evaluated. So anything that shouldn't be evaluated goes outside the "{}".
Some Examples:
x = 42
y = 73
print(f'Numbers are: {x}, {y}') # will print: 'Numbers are: 42, 73'
print(f'Sum of numbers: {x+y}') # will print: 'Sum of numbers: 115'
You can even do something like:
def compose_email(user_name, domain):
    return f'{user_name}#{domain}'
user_name = 'user'
domain = 'gmail.com'
print(f'email is: {compose_email(user_name, domain)}')
>>email is: user#gmail.com
For more examples see:
Nested f-strings
I want all the tags in a text that look like <Car:1234|Bob Alice> or <Bus:5678|Nelson Mandela> to be replaced with <a my-inner-type="CR:1234">Bob Alice</a> and <a my-inner-type="BS:5678">Nelson Mandela</a> respectively. So basically, depending on the type (TypeA or TypeB), I want to replace the tag accordingly in a text string using Python 3 and regex.
I tried doing the following in python but not sure if that's the right approach to go forward:
import re
def my_replace():
    re.sub(r'\<(.*?)\>', replace_function, data)
With the above, I am trying to match each < > tag with a regex and pass every tag I find to a function called replace_function, which splits the text inside the tag, determines whether it is a TypeA or a TypeB, and returns the replacement tag dynamically. I am not even sure this is possible with re.sub, but any leads would help. Thank you.
Examples:
<Car:1234|Bob Alice> becomes <a my-inner-type="CR:1234">Bob Alice</a>
<Bus:5678|Nelson Mandela> becomes <a my-inner-type="BS:5678">Nelson Mandela</a>
This is perfectly possible with re.sub, and you're on the right track with using a replacement function (which is designed to allow dynamic replacements). See below for an example that works with the examples you give. You will probably have to modify it to suit your use case, depending on what other data is present in the text (i.e. other tags you need to ignore).
import re
def replace_function(m):
    # note: to not modify the text (ie if you want to ignore this tag),
    # simply do (return the entire original match):
    #   return m.group(0)
    inner = m.group(1)
    t, name = inner.split('|')
    kind, ident = t.split(':')
    # process type here - stripping the 'Type' prefix will only work if types
    # always follow the TypeA/TypeB pattern given in the question
    code = kind[4:]
    # EDIT: based on your edits, you will probably need more processing here
    # eg:
    if kind == 'Car':
        code = 'CR'
    # etc
    # keep the id after the type code, e.g. 'CR:1234'
    return '<a my-inner-type="{}:{}">{}</a>'.format(code, ident, name)

def my_replace(data):
    return re.sub(r'\<(.*?)\>', replace_function, data)
# let's just test it
data = 'I want all the tags in a text that look like <TypeA:1234|Bob Alice> or <TypeB:5678|Nelson Mandela> to be replaced with'
print(my_replace(data))
Warning: if this text is actually full html, regex matching will not be reliable - use an html processor like beautifulsoup. ;)
Probably an extension to @swalladge's answer, but here we take advantage of a dictionary, assuming we know the mapping. (You could replace the dictionary with a custom mapping function.)
import re
d = {'TypeA': 'A',
     'TypeB': 'B',
     'Car': 'CR',
     'Bus': 'BS'}

def repl(m):
    return '<a my-inner-type="' + d[m.group(1)] + m.group(2) + '">' + m.group(3) + '</a>'

s = '<TypeA:1234|Bob Alice> or <TypeB:5678|Nelson Mandela>'
# raw strings keep \d and \| from being treated as string escapes
print(re.sub(r'<(.*?)(:\d+)\|(.*?)>', repl, s))
print()
s = '<Bus:1234|Bob Alice> or <Car:5678|Nelson Mandela>'
print(re.sub(r'<(.*?)(:\d+)\|(.*?)>', repl, s))
OUTPUT
<a my-inner-type="A:1234">Bob Alice</a> or <a my-inner-type="B:5678">Nelson Mandela</a>
<a my-inner-type="BS:1234">Bob Alice</a> or <a my-inner-type="CR:5678">Nelson Mandela</a>
regex
We capture what we need in three groups and refer to them through the match object. The three parenthesised groups in the regex are:
<(.*?)(:\d+)\|(.*?)>
We use these 3 groups in our repl function to return the right string.
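One thing to watch with a plain dictionary lookup is that an unknown type raises a KeyError. Here is a sketch of a variant that leaves unrecognised tags untouched by returning the whole match. The names `TYPE_CODES`, `TAG_RE` and `replace_tags` are my own, and the mapping only covers the two types from the question's examples.

```python
import re

# Mapping taken from the question's examples; extend as needed.
TYPE_CODES = {'Car': 'CR', 'Bus': 'BS'}

# <Type:digits|name> with the three parts captured separately.
TAG_RE = re.compile(r'<(\w+):(\d+)\|([^>]*)>')


def repl(m):
    kind, ident, name = m.groups()
    code = TYPE_CODES.get(kind)
    if code is None:
        # Unknown type: return the original tag unchanged.
        return m.group(0)
    return '<a my-inner-type="{}:{}">{}</a>'.format(code, ident, name)


def replace_tags(text):
    return TAG_RE.sub(repl, text)


print(replace_tags('<Car:1234|Bob Alice> and <Bus:5678|Nelson Mandela>'))
```

Using `[^>]*` for the name instead of `.*?` is a small design choice: it can never run past the closing `>` of the tag it started in.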
Sorry this isn't a complete answer but I'm falling asleep at the computer, but this is the regex that'll match either of the strings you provided, (<Type)(\w:)(\d+\|)(\w+\s\w+>). Check out https://pythex.org/ for testing your regex stuff.
Try with:
import re
def get_tag(match):
    base = '<a my-inner-type="{}">{}</a>'
    inner_type = match.group(1).upper()
    # first and last letter of the type, e.g. 'BUS' -> 'BS'
    my_inner_type = '{}{}:{}'.format(inner_type[0], inner_type[-1], match.group(2))
    return base.format(my_inner_type, match.group(3))

# the pattern ends at the closing '>' so text after the tag is left alone
print(re.sub(r'\<(\w+):(\d+)\W([^\>]+)\>', get_tag, '<Bus:1234|Bob Alice>'))
print(re.sub(r'\<(\w+):(\d+)\W([^\>]+)\>', get_tag, '<Car:5678|Nelson Mandela>'))
This code will work if you have it in the form <Type:num|name>:
def replaceupdate(tag):
    replace = ''
    t = ''
    i = 1  # skip the leading '<'
    ident = ''
    name = ''
    typex = ''
    # read the type, including the trailing ':'
    while t != ':':
        typex += tag[i]
        t = tag[i]
        i += 1
    # read the id, stopping before '|'
    while tag[i] != '|':
        ident += tag[i]
        i += 1
    i += 1  # skip the '|'
    # read the name, stopping before the closing '>'
    while tag[i] != '>':
        name += tag[i]
        i += 1
    replace = '<a my-inner-type="{}{}">{}</a>'.format(typex, ident, name)
    return replace
I know it does not use regex and it has to split the text some other way, but this is the main bulk.
I'm trying to scrape dates from URL's of blogs and the like.
Since there's no universal way to get a date, I am for now, relying
on the date to be in the URL of the resource.
The dates come for the most part, in these formats:
url1 = "foo/bar/baz/2014/01/01/more/text"
url2 = "foo/bar/baz/2014/01/more/text"
url3 = "foo/bar/baz/20140101/more/text"
url4 = "foo/bar/baz/2014-01-01/more/text"
url5 = "foo/bar/baz/2014-01more/text"
url6 = "foo/bar/baz/2014_01_01/more/text"
url7 = "foo/bar/baz/2014_01/more/text"
# forgot one
url8 = "foo/bar/baz20140101more/text"
I've written a brute force code to get what I want.
It's explicit, but not elegant and probably not very robust.
I tried to cover the cases where I match "/" or "-" or "_" with a single pattern, with no luck.
So I'm curious as to how one does that.
Although my main question is:
What's the best robust way to capture dates in a URL with the intention of converting them to datetime objects.
I don't think it's common for time elements to be in the format.
Cheers !
UPDATE
I believe I have the solution from Casimer. I'd like to add one more
url-date format that I missed before and might add a little trouble:
# this one may not have a regex solution. Maybe machine learning.
# and it's not that big a deal if I get the wrong day for this application.
# I think it's safe to assume that a legit date with Y/m/d will have
# a trailing "/", i.e. /Y/m/d/
http://www.nakedcapitalism.com/2014/03/17-million-reasons-rent-control-efficient.html
2014/03/17 # group captured
2014-03-17 00:00:00 # date time object
http://www.nakedcapitalism.com/2014/11/200pm-water-cooler-11514.html
2014/11/20
2014-11-20 00:00:00
# i put more restrictions on the number matching, but perhaps there's a better way...?
pat = r'(20[0-1][0-5]([-_/]?)[0-1][0-9]\2[0-3][0-9])'
Existing ugly solution:
NOTE: I've restricted the year info, because I was capturing strings of numbers that do not represent a date. Plus I figured it was more robust that way.
def get_date_from_url(self, url):
    #pat = "(20[0-14]{2}\w+[0-9]{2}(?!\w+[0-9]{2}))"
    pat = "(20[0-1][0-5]/[0-9]{2}/[0-9]{2})"
    ob1 = re.compile(pat)
    pat = "(20[0-1][0-5]-[0-9]{2}-[0-9]{2})"
    ob2 = re.compile(pat)
    pat = "(20[0-1][0-5]_[0-9]{2}_[0-9]{2})"
    ob3 = re.compile(pat)
    pat = "(20[0-1][0-5]/[0-9]{2})"
    ob4 = re.compile(pat)
    pat = "(20[0-1][0-5]-[0-9]{2})"
    ob5 = re.compile(pat)
    pat = "(20[0-1][0-5]_[0-9]{2})"
    ob6 = re.compile(pat)
    if ob1.search(url):
        grp = ob1.search(url).group()
    elif ob2.search(url):
        grp = ob2.search(url).group()
    elif ob3.search(url):
        grp = ob3.search(url).group()
    elif ob4.search(url):
        grp = ob4.search(url).group()
    elif ob5.search(url):
        grp = ob5.search(url).group()
    elif ob6.search(url):
        grp = ob6.search(url).group()
    else:
        return None
    print url
    print grp
    grp = re.sub('_', '/', grp) # fail to match return orig string
    date = to_datetime(grp)
    if isinstance(date, datetime.datetime):
        print date
    else:
        return None
You can use this:
pat = r'(20[0-1][0-5]([-_/]?)[0-9]{2}(?:\2[0-9]{2})?)'
The delimiter is captured in group 2, so I use a backreference \2 for the second delimiter. The delimiter can be -, _ or /, but it is optional too (thanks to the ? quantifier).
This makes the day optional too, by putting it in an optional non-capturing group: (?:\2[0-9]{2})?
Note that you can add slashes at the beginning and at the end of the pattern to ensure that the date is enclosed between path separators.
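To go from the match to a datetime object, something along these lines might work. It is a sketch under a few assumptions of mine: the year is widened to `20[0-9]{2}` (the `[0-1][0-5]` form would skip 2006 to 2009 and anything after 2015), each part gets its own group, a missing day defaults to 1, and the names `DATE_RE` and `date_from_url` are made up for the example.

```python
import re
from datetime import datetime

# Same shape as the pattern above: year, optional delimiter (reused via \2),
# month, and an optional day. The 20xx year range is an assumption.
DATE_RE = re.compile(r'(20[0-9]{2})([-_/]?)([0-9]{2})(?:\2([0-9]{2}))?')


def date_from_url(url):
    """Return the first plausible date in `url` as a datetime, or None."""
    for m in DATE_RE.finditer(url):
        year, _delim, month, day = m.groups()
        try:
            # A missing day falls back to the first of the month.
            return datetime(int(year), int(month), int(day or 1))
        except ValueError:
            # e.g. month 13 or day 32: not a real date, try the next match
            continue
    return None


print(date_from_url("foo/bar/baz/2014-01-01/more/text"))
```

Letting `datetime` reject impossible values avoids having to encode month and day ranges in the regex itself.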
I am trying to allow the user to do this:
Let's say initially the text says:
"hello world hello earth"
when the user searches for "hello" it should display:
|hello| world |hello| earth
here's what I have:
m = re.compile(pattern)
i = 0
match = False
while i < len(self.fcontent):
    content = " ".join(self.fcontent[i])
    i = i + 1;
    for find in m.finditer(content):
        print i,"\t"+content[:find.start()]+"|"+content[find.start():find.end()]+"|"+content[find.end():]
        match = True
        pr = raw_input( "(n)ext, (p)revious, (q)uit or (r)estart? ")
        if (pr == 'q'):
            break
        elif (pr == 'p'):
            i = i - 2
        elif (pr == 'r'):
            i = 0
if match is False:
    print "No matches in the file!"
where :
pattern = user specified pattern
fcontent = contents of a file read in and stored as array of words and lines e.g:
[['line','1'],['line','2','here'],['line','3']]
however it prints
|hello| world hello earth
hello world |hello| earth
How can I merge the two lines so they are displayed as one?
Thanks
Edit:
This is part of a larger search function where the pattern (in this case the word "hello") is passed in by the user, so I have to use regex search/match/finditer to find the pattern. The replace and other string methods sadly won't work, because the user can choose to search for "[0-9]$", and that would mean putting the ending number between |'s.
If you're just doing that, use str.replace (a compiled pattern keeps its source string in m.pattern):
print(content.replace(m.pattern, "|%s|" % m.pattern))
You can use a regexp as follows:
import re
src = "hello world hello earth"
dst = re.sub('hello', '|hello|', src)
print(dst)
or use string replace:
dst = src.replace('hello', '|hello|')
Ok, going back to original solution since OP confirmed that word would stand on its own (ie not be a substring of another word).
target = 'hello'
line = 'hello world hello earth'
rep_target = '|{}|'.format(target)
line = line.replace(target, rep_target)
yields:
|hello| world |hello| earth
As has been pointed out based on your example, using str.replace is the easiest. If more complex criteria are required, then you can adapt the following...
import re
def highlight(string, words, boundary='|'):
    if isinstance(words, str):
        words = [words]
    # join the alternatives with '|' (regex "or"), longest first, so the
    # pattern stays valid whatever boundary marker is chosen
    rs = '({})'.format('|'.join(sorted(map(re.escape, words), key=len, reverse=True)))
    return re.sub(rs, lambda L: '{0}{1}{0}'.format(boundary, L.group(1)), string)
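Since the question's edit says the user supplies an arbitrary regex, here is a sketch that marks every match on a line in a single pass, so both occurrences end up on the same output line. The function name `highlight_matches` and the `marker` parameter are my own; the pattern is used as given, without escaping.

```python
import re


def highlight_matches(pattern, line, marker='|'):
    """Wrap every regex match in `line` with `marker`."""
    # m.group(0) is the full text of the current match
    return re.sub(pattern, lambda m: marker + m.group(0) + marker, line)


print(highlight_matches('hello', 'hello world hello earth'))
# |hello| world |hello| earth
print(highlight_matches('[0-9]$', 'line 42'))
# line 4|2|
```

Using a function as the replacement avoids any trouble with backslashes or group references appearing in the matched text.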