Hello, I have a question:
How can I remove numbers from a string? I know that the fastest and best way is to use translate:
'hello467'.translate(None, '0123456789')
This returns hello, but what if I want to remove only the numbers that are not attached to a word? For example, 'the 1rst of every month I spend 10 dollars' should become 'the 1rst of every month I spend dollars' and not 'the rst of every month I spend dollars'.
import re
s = "the 1rst of every month I spend 10 dollars"
result = re.sub(r"\b\d+\b", '', s)
result = re.sub(r" {2,}", " ", result)  # collapse the double space left where the number was
should give:
the 1rst of every month I spend dollars
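As a minimal sketch, the two substitutions above can be wrapped in one helper. The name remove_standalone_numbers is made up for illustration, and the whitespace cleanup uses split/join instead of a second re.sub, which is a swapped-in choice rather than what the answer does.

```python
import re

def remove_standalone_numbers(text):
    # \b\d+\b only matches digit runs with a word boundary on both sides,
    # so the "1" in "1rst" is left alone.
    without_numbers = re.sub(r"\b\d+\b", "", text)
    # Splitting on whitespace and re-joining collapses any leftover gaps.
    return " ".join(without_numbers.split())

print(remove_standalone_numbers("the 1rst of every month I spend 10 dollars"))
```

Note that \b treats punctuation as a boundary too, so a value such as 3.5 would be reduced to a lone dot with this pattern.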
Split the string, keep every element that is not made up purely of digits, and join the remaining elements back together.
words = "the 1rst of every month I spend 10 dollars".split(' ')
result = ' '.join([w for w in words if not w.isdigit()])
print(result)
How can I achieve this without using if statements and huge multiline constructions?
Example:
around $98 million last week
above $10 million next month
about €5 billion past year
after £1 billion this day
Convert to
around 98 million dollars last week
above 10 million dollars next month
about 5 billion euros past year
after 1 billion pounds this day
You probably have more cases than these, so this regular expression may be too specific, but re.sub can be used with a function that processes each match and makes the correct replacement. The code below handles the cases provided:
import re
text = '''\
around $98 million last week
above $10 million next month
about €5 billion past year
after £1 billion this day
'''
currency_name = {'$':'dollars', '€':'euros', '£':'pounds'}
def replacement(match):
    # group 2 is the digits and million/billion,
    # tack on the currency type afterward using a lookup dictionary
    return f'{match.group(2)} {currency_name[match.group(1)]}'
# capture the symbol followed by digits and million/billion
print(re.sub(r'([$€£])(\d+ [mb]illion)\b', replacement, text))
Output:
around 98 million dollars last week
above 10 million dollars next month
about 5 billion euros past year
after 1 billion pounds this day
You could make a dictionary mapping currency symbol to name, and then generate the regexes. Note that these regexes will only work for something in the form of a number and then a word.
import re
CURRENCIES = {
    r"\$": "dollars",  # Note the backslash; $ is a special regex character
    "€": "euros",
    "£": "pounds",
}
REGEXES = []
for symbol, name in CURRENCIES.items():
    REGEXES.append((re.compile(rf"{symbol}(\d+ [^\W\d_]+)"), rf"\1 {name}"))
text = """around $98 million last week
above $10 million next month
about €5 billion past year
after £1 billion this day"""
for regex, replacement in REGEXES:
    text = regex.sub(replacement, text)
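A possible variation, sketched under the assumption that one pass over the text is preferable to one pass per currency: build a single alternation from the dictionary keys with re.escape, so the symbols need no manual backslash. The names CURRENCY_NAMES, PATTERN and spell_out_currency are hypothetical.

```python
import re

CURRENCY_NAMES = {"$": "dollars", "€": "euros", "£": "pounds"}

# re.escape takes care of "$", which is otherwise a regex metacharacter.
symbols = "|".join(re.escape(symbol) for symbol in CURRENCY_NAMES)
# Group 1 is the symbol, group 2 is "<digits> <word>", e.g. "98 million".
PATTERN = re.compile(rf"({symbols})(\d+ [^\W\d_]+)")

def spell_out_currency(text):
    # Look the matched symbol up in the dictionary for each match.
    return PATTERN.sub(
        lambda m: f"{m.group(2)} {CURRENCY_NAMES[m.group(1)]}", text
    )

print(spell_out_currency("around $98 million last week"))
print(spell_out_currency("about €5 billion past year"))
```

As with the loop above, this only handles a number followed by a single word.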
It's useful in a case like this to remember that re.sub can accept a lambda rather than just a string.
The following requires Python 3.8+ for the := operator.
import re
s = "about €5 billion past year"
re.sub(r'([€$])(\d+)\s+([mb]illion)',
       lambda m: f"{(g := m.groups())[1]} {g[2]} {'euros' if g[0] == '€' else 'dollars'}",
       s)
# 'about 5 billion euros past year'
I have a list of strings and I would like to verify some conditions on the strings. For example:
String_1: 'The price is 15 euros'.
String_2: 'The price is 14 euros'.
Condition: The price is > 14 --> OK
How can I verify it?
This is what I'm currently doing:
if ('price is 13' in string):
    print('ok')
and I'm writing out every valid case.
I would like to have just one condition.
You can collect all of the integers in the string and then use them in an if statement.
text = "price is 16 euros"
for number in [int(s) for s in text.split() if s.isdigit()]:
    if number > 14:
        print("ok")
If your string contains more than one number, you can select which one you want to use in the list.
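For instance, a small sketch of picking one number out of several. The helper name integers_in and the sample sentence are invented for illustration, and it assumes the price is the last whitespace-separated integer in the string.

```python
def integers_in(text):
    # Keep only the whitespace-separated tokens made up purely of digits.
    return [int(token) for token in text.split() if token.isdigit()]

numbers = integers_in("2 items, the price is 16 euros")
# Assumption for this sketch: the price is the last integer found.
if numbers and numbers[-1] > 14:
    print("ok")
```

The check on numbers avoids an IndexError when the string contains no integer at all.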
Hope it helps.
You can compare the strings directly if they differ only in the number and the numbers have the same digit count, e.g.:
String_1 = 'The price is 15 euros'
String_2 = 'The price is 14 euros'
String_3 = 'The price is 37 EUR'
They will be naturally sorted as String_3 > String_1 > String_2.
But this will NOT work for:
String_4 = 'The price is 114 euros'
It has 3 digits instead of 2, so it compares as String_4 < String_3.
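A quick check of that behaviour, as a sketch: Python compares strings character by character, so only the first differing character decides the result. The lowercase variable names are chosen here and mirror the strings above.

```python
string_1 = 'The price is 15 euros'
string_2 = 'The price is 14 euros'
string_3 = 'The price is 37 EUR'
string_4 = 'The price is 114 euros'

# Same digit count: the string order happens to agree with the numeric order.
print(string_3 > string_1 > string_2)  # True

# Different digit count: '1' < '3' decides it, even though 114 > 37.
print(string_4 < string_3)  # True
```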
So it is better to extract the number from the string, as follows:
import re
def get_price(s):
    m = re.match("The price is ([0-9]*)", s)
    # Guard against a failed match and against an empty capture
    if m and m.group(1):
        return int(m.group(1))
    return 0
Now you can compare prices as integers:
price = get_price(String_1)
if price > 14:
    print ("Okay!")
. . .
if get_price(String_1) > 14:
    print ("Okay!")
([0-9]*) is the capturing group of the regular expression: whatever is matched inside the parentheses is returned by the group(1) method of the Python match object. You can extend the simple regular expression [0-9]* further to suit your needs.
If you have a list of strings:
string_list = [String_1, String_2, String_3, String_4]
for s in string_list:
    if get_price(s) > 14:
        print ("'{}' is okay!".format(s))
Is the string format always going to be exactly the same? As in, will it always start with "The price is", then have a positive integer, and then end with "euros"? If so, you can just split the string into words, index the integer, cast it to an int, and check if it's greater than 14.
if int(s.split()[3]) > 14:
    print('ok')
If the strings will not be consistent, you may want to consider a regex solution to get the numeral part of the sentence out.
You could use a regular expression to extract the number after "price is", convert it from a string to an int, and finally check whether it is greater than 14. For example:
import re
p = re.compile(r'price\sis\s\d\d*')
string1 = 'The price is 15 euros'
string2 = 'The price is 14 euros'
number = re.findall(p, string1)[0].split("price is ")
if int(number[1]) > 14:
    print('ok')
Output:
ok
I suppose you have only one value in your string, so we can do it with a regex.
import re
String_1 = 'The price is 15 euros.'
if float(re.findall(r'\d+', String_1)[0]) > 14:
    print("OK")
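A slightly more defensive sketch of the same idea, assuming the first number in the string is the price. The helper name price_exceeds is hypothetical, and the optional decimal part in the pattern is an addition not present in the answers above.

```python
import re

def price_exceeds(text, threshold):
    # First run of digits, optionally followed by a decimal part.
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        # No number in the string: treat it as not exceeding the threshold.
        return False
    return float(match.group()) > threshold

for s in ['The price is 15 euros', 'The price is 14 euros', 'No price given']:
    print(s, price_exceeds(s, 14))
```

Returning False for strings without a number is a design choice for this sketch; raising an error may suit other callers better.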
def extractdollarsign(text):
    """
    > extractdollarsign('the day is good $GoodDay')
    ['GoodDay']
    > extractdollarsign('the day is good$GoodDay')
    []
    > extractdollarsign('the day is good $GoodDay $Day')
    ['GoodDay', 'Day']
    """
    list = []
    extractedtxt = text[text.find("$")+1:].split()[0]
    list.append(extractedtxt)
    return list
This is what I have so far. However, this code only returns the text that follows the first dollar sign and does not append the words after the other dollar signs to the list. Any help would be greatly appreciated.
Is this what you want? This method takes a string as input and returns a list of strings. This list can be empty if no match is found.
def extractalphanum(word):
    # Keep the leading alphanumeric characters and stop at the first other character
    alphanum_word = ''
    for c in word:
        if c.isalnum():
            alphanum_word += c
        else:
            break
    return alphanum_word
def extractdollarsign(sentence):
    sentence_parts = sentence.split(" $")[1:]
    words = [sentence_part.split(" ")[0] for sentence_part in sentence_parts]
    alphanum_words = [extractalphanum(word) for word in words]
    return alphanum_words
print(extractdollarsign('the day is good $GoodDay'))
print(extractdollarsign('the day is good$GoodDay'))
print(extractdollarsign('the day is good $GoodDay $Day'))
print(extractdollarsign('the day is good $Good Day'))
print(extractdollarsign('the day is $goodday, but $tomorrowday is better'))
print(extractdollarsign('the day is $good_day'))
It returns
['GoodDay']
[]
['GoodDay', 'Day']
['Good']
['goodday', 'tomorrowday']
['good']
You can use the output of the method for further processing:
len(extractdollarsign('the day is good $GoodDay $Day'))+1 #=> 3
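For comparison, a regex-based sketch that gives the same results on the examples above. The function name extract_dollar_words is made up, and restricting the word to ASCII letters and digits is an assumption chosen to mirror the isalnum loop on these inputs.

```python
import re

def extract_dollar_words(sentence):
    # (?<!\S) requires the "$" to sit at the start of the string or right
    # after whitespace, so "good$GoodDay" is skipped.
    return re.findall(r"(?<!\S)\$([A-Za-z0-9]+)", sentence)

print(extract_dollar_words('the day is good $GoodDay $Day'))
print(extract_dollar_words('the day is $goodday, but $tomorrowday is better'))
```

One difference from the split(" $") version: this sketch also picks up a dollar word at the very start of the string.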
I want to separate numbers from the characters adjacent to them.
Example
Input:
we spend 100year
Output:
we spend 100 year
Input:
today i'm200 pound
Output:
today i'm 200 pound
Input:
he maybe have212cm
Output:
he maybe have 212 cm
I tried re.sub(r'(?<=\S)\d', ' \d', string) and re.sub(r'\d(?=\S)', '\d ', string), but neither works.
This will do it:
import re
ins='''\
we spend 100year
today i'm200 pound
he maybe have212cm'''
for line in ins.splitlines():
    line=re.sub(r'\s*(\d+)\s*',r' \1 ', line)
    print(line)
Prints:
we spend 100 year
today i'm 200 pound
he maybe have 212 cm
Same syntax for multiple matches in the same line of text:
>>> re.sub(r'\s*(\d+)\s*',r' \1 ', "we spend 100year + today i'm200 pound")
"we spend 100 year + today i'm 200 pound"
The capturing groups (generally) are numbered left to right and the \number refers to each numbered group in the match:
>>> re.sub(r'(\d)(\d)(\d)',r'\2\3\1','567')
'675'
If it is easier to read, you can name your capturing groups rather than using the \1 \2 notation:
>>> line="we spend 100year today i'm200 pound"
>>> re.sub(r'\s*(?P<nums>\d+)\s*',r' \g<nums> ',line)
"we spend 100 year today i'm 200 pound"
This takes care of one case:
>>> s = 'he maybe have212cm'
>>> re.sub(r'([a-zA-Z])(?=\d)',r'\1 ',s)
'he maybe have 212cm'
And this takes care of the other:
>>> re.sub(r'(?<=\d)([a-zA-Z])',r' \1',s)
'he maybe have212 cm'
Hopefully someone with more regex experience than me can figure out how to combine them ...
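One way the two cases might be combined, as a sketch: match the zero-width position between a letter and a digit (in either order) with lookarounds, and replace that position with a space. The names BOUNDARY and space_out_numbers are invented here, and the pattern only considers ASCII letters.

```python
import re

# Matches the empty position between a letter and a digit, in either order.
BOUNDARY = re.compile(r"(?<=[a-zA-Z])(?=\d)|(?<=\d)(?=[a-zA-Z])")

def space_out_numbers(text):
    # Nothing is consumed, so the replacement simply inserts a space.
    return BOUNDARY.sub(" ", text)

for line in ["we spend 100year", "today i'm200 pound", "he maybe have212cm"]:
    print(space_out_numbers(line))
```

Because only letter/digit boundaries are touched, numbers that are already surrounded by spaces are left unchanged.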
I need help with regex or Python to extract a substring from a set of strings. The strings are alphanumeric. I just want the substring that starts after the first space and ends before the last space, as in the example given below.
Example 1:
A:01 What is the date of the election ?
BK:02 How long is the river Nile ?
Results:
What is the date of the election
How long is the river Nile
While I am at it, is there an easy way to extract the text before or after a certain character? For example, I want to extract the date or day from strings like the ones given in Example 2.
Example 2:
Date:30/4/2013
Day:Tuesday
Results:
30/4/2013
Tuesday
I have actually read about regex but it's very alien to me. Thanks.
I recommend using split:
>>> s="A:01 What is the date of the election ?"
>>> " ".join(s.split()[1:-1])
'What is the date of the election'
>>> s="BK:02 How long is the river Nile ?"
>>> " ".join(s.split()[1:-1])
'How long is the river Nile'
>>> s="Date:30/4/2013"
>>> s.split(":")[1:][0]
'30/4/2013'
>>> s="Day:Tuesday"
>>> s.split(":")[1:][0]
'Tuesday'
>>> s="A:01 What is the date of the election ?"
>>> s.split(" ", 1)[1].rsplit(" ", 1)[0]
'What is the date of the election'
There's no need to dig into regex if this is all you need; you can use str.partition
s = "A:01 What is the date of the election ?"
before,sep,after = s.partition(' ') # could be, eg, a ':' instead
If all you want is the last part, you can use _ as a placeholder for 'don't care':
_,_,theReallyAwesomeDay = s.partition(':')
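Applied to Example 2, a minimal sketch. The helper name value_after is hypothetical, and returning None when the separator is missing is a choice made for this illustration, relying on the fact that str.partition returns an empty separator in that case.

```python
def value_after(text, sep=":"):
    # partition splits at the first occurrence of sep only.
    _, found, after = text.partition(sep)
    # found is '' when sep does not occur in text.
    return after if found else None

print(value_after("Date:30/4/2013"))
print(value_after("Day:Tuesday"))
```

Since partition splits only once, a value that itself contains the separator, such as "Time:10:30", comes back whole as "10:30".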