Python find position of last digit in string

I have a string of characters with no specific pattern. I have to look for some specific words and then extract some information.
Currently I am stuck at finding the position of the last number in a string.
So, for example if:
mystring="The total income from company xy was 12320 for the last year and 11932 in the previous year"
I want to find out the position of the last number in this string.
So the result should be "2" at position 70 (counting from 1).

You can do this with a regular expression; here's a quick attempt:
>>> import re
>>> mo = re.match(r'.*([0-9])[^0-9]*$', mystring)
>>> print(mo.group(1), mo.start(1))
2 69
This is a 0-based position, of course.

You can use a generator expression inside next() to loop over enumerate() backwards, starting from the end of the string:
>>> next(i for i,j in list(enumerate(mystring,1))[::-1] if j.isdigit())
70
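The same backwards scan can be written without building and reversing a full list of (index, character) pairs, by walking the indices with range(). A minimal sketch; the function name last_digit_index and the -1 "not found" default are illustrative choices, not part of the answer above:

```python
def last_digit_index(s, default=-1):
    """Return the 0-based index of the last digit in s, or default if there is none."""
    # range(len(s) - 1, -1, -1) yields len(s)-1, ..., 1, 0 without copying the string
    return next((i for i in range(len(s) - 1, -1, -1) if s[i].isdigit()), default)


mystring = "The total income from company xy was 12320 for the last year and 11932 in the previous year"
print(last_digit_index(mystring))     # 69 (0-based); add 1 for the 1-based position 70
print(last_digit_index("no digits"))  # -1
```

Because next() stops at the first hit from the right, the scan only touches the tail of the string up to the last digit.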
Or using a regex:
>>> import re
>>> m = re.search(r'(\d)[^\d]*$', mystring)
>>> m.start() + 1
70

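Save all the whitespace-separated numbers from the string in a list and pop the last one off. Note that this gives the last number itself (11932 here), not the position of the last digit.
array = [int(s) for s in mystring.split() if s.isdigit()]
last_number = array.pop()
It avoids regular expressions and is arguably more readable; any speed advantage over a regex depends on the input.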
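If you need the last number and the position of its final digit together, a regex scan can give both, and it also catches numbers glued to punctuation (such as "11932." at the end of a sentence), which split() plus isdigit() skips. A sketch that keeps only the last match from re.finditer():

```python
import re

mystring = "The total income from company xy was 12320 for the last year and 11932 in the previous year"

last = None
for last in re.finditer(r"\d+", mystring):
    pass  # after the loop, last holds the final match object (or None if there were no digits)

if last is not None:
    print(last.group())    # '11932', the last run of digits
    print(last.end() - 1)  # 69, the 0-based index of its final digit
```

last.start() gives where that number begins, if you need it as well.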

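def find_last(s):
    # walk the (index, character) pairs from the end of the string
    temp = list(enumerate(s))
    temp.reverse()
    for pos, ch in temp:
        try:
            return (pos, int(ch))  # first character from the right that parses as an int
        except ValueError:
            continue
    return None  # the string contains no digit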

You could reverse the string and get the first match with a simple regex:
import re
s = mystring[::-1]
m = re.search(r'\d', s)
pos = len(s) - m.start(0)  # 1-based position of the last digit (70 here)

Related

Finding exact number of characters in word

I'm looking for a way to find words that contain an exact number of a given character.
For example:
If we have this input: ['teststring1','strringrr','wow','strarirngr'] and we are looking for 4 r characters,
it should return only ['strringrr','strarirngr'] because they are the words with exactly 4 r's in them.
I decided to use a regex, but after reading the documentation I can't find a function that satisfies my needs.
I tried [r{4}], but it apparently returns any word with the letter r in it.
Please help
Something like this:
import collections

def map_characters(string):
    # count how many times each character occurs in the string
    characters = collections.defaultdict(int)
    for char in string:
        characters[char] += 1
    return characters

items = ['teststring1','strringrr','wow','strarirngr']
for item in items:
    characters_map = map_characters(item)
    # print the words that contain exactly 4 r's
    if characters_map['r'] == 4:
        print(item)
# this outputs strringrr and strarirngr,
# because these are the words with 4 r letters
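The standard library's collections.Counter does the same per-character counting in one call. A minimal sketch that checks for exactly four r's, as the question asks (the names words, letter and amount are illustrative):

```python
from collections import Counter

words = ['teststring1', 'strringrr', 'wow', 'strarirngr']
letter = 'r'
amount = 4

# Counter(word) maps each character to its count; a missing key counts as 0
matches = [word for word in words if Counter(word)[letter] == amount]
print(matches)  # ['strringrr', 'strarirngr']
```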
You can use str.count() to count the occurrences of a character, combined with list comprehensions to create a new list:
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
filtered = [item for item in myArray if item.count(letter) == amount]
print(filtered) # ['strringrr', 'strarirngr']
If you wanted to make this reusable (to look for different letters or different amounts), you could pack it into a function:
def filterList(stringList, pattern, occurrences):
    return [item for item in stringList if item.count(pattern) == occurrences]
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
print(filterList(myArray, letter, amount)) # ['strringrr', 'strarirngr']
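The square brackets define a character class, which matches any single character in the set: [abc] matches one a, b or c. So [r{4}] is just the set of characters r, {, 4 and }, and any word with a single r is a match. Without the brackets, r{4} matches four consecutive r's (rrrr), which neither of your target words contains, so you need a pattern that allows other characters between the r's.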
Since you asked about using regex, you could use the following:
import re
l = ['teststring1', 'strringrr', 'wow', 'strarirngr']
[ word for word in l if re.match(r'(.*r.*){4}', word) ]
output: ['strringrr', 'strarirngr']
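The (.*r.*){4} pattern above is satisfied by four or more r's, so a word with five r's would pass too. To require exactly four with a regex, one option is to match the whole word while allowing only non-r characters between the r's. A sketch using re.fullmatch(); the helper name has_exactly is hypothetical:

```python
import re

def has_exactly(word, letter, amount):
    """True if word contains exactly `amount` occurrences of the single character `letter`."""
    other = '[^' + re.escape(letter) + ']*'  # any run of characters other than letter
    # e.g. for 'r' and 4: [^r]*(?:r[^r]*){4}
    pattern = other + '(?:' + re.escape(letter) + other + '){' + str(amount) + '}'
    return re.fullmatch(pattern, word) is not None

words = ['teststring1', 'strringrr', 'wow', 'strarirngr', 'rrrrr']
print([w for w in words if has_exactly(w, 'r', 4)])  # ['strringrr', 'strarirngr']
```

fullmatch() anchors the pattern at both ends, which is what rules out a fifth r.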

Looping through string to only append integers to list [duplicate]

I am new to Python. I have a string and I want to extract the numbers from it. For example:
str1 = "3158 reviews"
print (re.findall('\d+', str1 ))
Output is ['4', '3']
I want to get 3158 only, preferably as an integer, not as a list.
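You can filter the string by digits using the str.isdigit method (Python 2, where filter() on a string returns a string):
>>> int(filter(str.isdigit, str1))
3158
For Python 3, where filter() returns an iterator, join the digits first:
int(''.join(filter(str.isdigit, my_str)))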
This code works fine. There is definitely some other problem:
>>> import re
>>> str1 = "3158 reviews"
>>> print (re.findall('\d+', str1 ))
['3158']
Your regex looks correct. Are you sure you haven't made a mistake with the variable names? In your code above you mix up total_hotel_reviews_string and str.
>>> import re
>>> s = "3158 reviews"
>>>
>>> print(re.findall("\d+", s))
['3158']
IntVar = int("".join(filter(str.isdigit, StringVar)))
You were quite close to the final answer. Your re.findall expression already catches all detected numbers; wrapping the pattern in a capturing group, as below, returns the same list:
re.findall(r'(\d+)', str1)
For a more general string like str1 = "3158 reviews, 432 users", this code would yield:
Output: ['3158', '432']
Now to obtain integers, you can map the int function over the strings:
A = list(map(int, re.findall(r'(\d+)', str1)))
Alternatively, you can use a list comprehension:
A = [int(x) for x in re.findall(r'(\d+)', str1)]
Both methods are equally correct. They yield A = [3158, 432].
Your final result for the original question would be the first entry in the list A, so either of these expressions works:
result = list(map(int, re.findall(r'(\d+)', str1)))[0]
result = int(re.findall(r'(\d+)', str1)[0])
Even if there is only one number in str1, re.findall still returns a list, so you need to retrieve the first element A[0] manually.
To extract a single number from a string you can use re.search(), which returns the first match (or None):
>>> import re
>>> string = '3158 reviews'
>>> int(re.search(r'\d+', string).group(0))
3158
In Python 3.6+ you can also index into a match object instead of using group():
>>> int(re.search(r'\d+', string)[0])
3158
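Both forms assume a match exists: re.search() returns None when the string contains no digits, and calling .group() on None or indexing it then raises an error. A minimal sketch that guards against that (the name first_int and the None fallback are illustrative choices):

```python
import re

def first_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group(0)) if match else None

print(first_int('3158 reviews'))  # 3158
print(first_int('no reviews'))    # None
```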
If the format is that simple (a space separates the number from the rest) then
int(str1.split()[0])
would do it
For more complex strings, with minus signs and decimal points:
import re
str1 = "sg-23.0 300sdf343fc -34rrf-3.4r"  # numbers mixed into the text in various ways
num = [float(s) for s in re.findall(r'-?\d+\.?\d*', str1)]
print(num)
Output:
[-23.0, 300.0, 343.0, -34.0, -3.4]
The solutions above assume integers. Here's a minor modification to allow decimals:
num = float("".join(filter(lambda d: str.isdigit(d) or d == '.', inputString)))
(This doesn't account for a minus sign, and it assumes any period is a decimal point inside the digit string, not an English-language full stop lying around. It's not built to be indestructible, but it worked for my data.)
Python 2.7:
>>> str1 = "3158 reviews"
>>> int(filter(str.isdigit, str1))
3158
Python 3:
>>> str1 = "3158 reviews"
>>> int(''.join(filter(str.isdigit, str1)))
3158
There may be a small problem with the code from Vishnu's answer: if there are no digits in the string, int() raises ValueError. Here is my suggestion to avoid this (Python 2):
>>> digit = lambda x: int(filter(str.isdigit, x) or 0)
>>> digit('3158 reviews')
3158
>>> digit('reviews')
0
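In Python 3, filter() returns an iterator rather than a string, so the lambda above has to join the characters first. A sketch of the same "fall back to 0" idea for Python 3 (the default parameter is an addition to the original answer):

```python
def digit(text, default=0):
    """Join every digit character in text into an int; return default if there are none."""
    digits = ''.join(filter(str.isdigit, text))
    return int(digits) if digits else default

print(digit('3158 reviews'))  # 3158
print(digit('reviews'))       # 0
```

Like the other join-based answers, this concatenates separate numbers: digit('3 of 5') gives 35.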
For python3
input_str = '21ddd3322'
int(''.join(filter(str.isdigit, input_str)))
> 213322
a = []
line = "abcd 3455 ijkl 56.78 ij"
for word in line.split():
    try:
        a.append(float(word))
    except ValueError:
        pass  # skip words that are not numbers
print(a)
OUTPUT
[3455.0, 56.78]
I am a beginner in coding. This is my attempt to answer the question, using Python 3.7 without importing any libraries.
This code extracts decimal numbers from a string made of sets of characters separated by blanks (words).
Attention: if there is more than one number, each one is printed, and a is left holding the last value.
line = input('Please enter your string ')
for word in line.split():
    try:
        a = float(word)
        print(a)
    except ValueError:
        pass
My answer does not require any additional libraries, and it's easy to understand. But note that if there's more than one number inside a string, my code returns everything from the first digit to the last, including the characters in between (for example, "a12b34" gives "12b34").
def search_number_string(string):
    index_list = []
    for i, x in enumerate(string):
        if x.isdigit():
            index_list.append(i)  # remember the position of every digit
    if not index_list:
        return ''  # no digits at all
    start = index_list[0]
    end = index_list[-1] + 1
    number = string[start:end]
    return number
To get all the numeric occurrences in a string in general, use split() plus a list comprehension plus isdigit():
* split() converts the string into a list of words,
* the list comprehension iterates through that list,
* and isdigit() keeps only the words made entirely of digits.
test_string = "i have four balloons for 2 kids"
print("The original string : " + test_string)
# list comprehension + isdigit() + split()
res = [int(i) for i in test_string.split() if i.isdigit()]
print("The numbers list is : " + str(res))  # The numbers list is : [2]
To extract numeric values from a string in Python and find the largest:
* Find the list of all integer numbers in the string (separated by lower-case characters) using the re.findall(expression, string) method.
* Convert each number from string form to an integer and then find the max of them.
import re

def extractMax(text):
    # get a list of all numbers separated by lower case characters
    numbers = re.findall(r'\d+', text)  # \d+ means one or more digits
    # convert to int before comparing; max() on the strings would compare them alphabetically
    print(max(map(int, numbers)))

if __name__ == "__main__":
    text = '100klh564abc365bg'  # example input; prints 564
    extractMax(text)
You can use the method below to extract all the digits from a string.
def extract_numbers_from_string(string):
    number = ''
    for i in string:
        try:
            number += str(int(i))
        except ValueError:
            pass  # skip characters that are not digits
    return number
(OR) you could use i.isdigit() or i.isnumeric() (str.isnumeric is available in every Python 3 version):
def extract_numbers_from_string(string):
    number = ''
    for i in string:
        if i.isdigit():
            number += i
    return number
a = '343fdfd3'
print(extract_numbers_from_string(a))
# 3433
Using a list comprehension and Python 3:
>>> int("".join([c for c in str1 if str.isdigit(c)]))
3158

moving integers from one string to another?

I am looking at adding numbers to a string as python reads through a string.
So if I had a string a = "253+"
I would then have an empty string b.
So, how would I have Python read the 2 and add it to string b, then read the 5 and add it to string b, then read the 3 and add it to string b, and stop when it hits something that isn't an integer?
Then string b would be b = "253".
Is there a specific call in an iteration that would check for integers and then add them to another string?
tl;dr
I want to use an iteration to add numbers from one string to another, which stops when it reaches a non-integer.
String b would be an empty string, and string a would be a="253+".
After the call is done, string b would equal b="253".
I know this sounds like a homework question, but it's not. If you need anything else clarified, I would be happy to help.
Here is a simple method for extracting the digits from a string:
In [13]: a="253+"
In [14]: ''.join(c for c in a if c.isdigit())
Out[14]: '253'
The question is a bit unclear, but is this what you're looking for?
a = "123+"
b = ""
for c in a:
    try:
        int(c)
        b = b + c
    except ValueError:
        print('This is not an int ' + c)
        break
Running this leaves b as "123" and breaks on the + character. It sounds like the tricky part for you at the moment is the try..except ValueError bit. Note that I don't have to break the loop when a ValueError happens; I could keep looping over the remaining characters in the string and ignore the ones that cannot be parsed into an int.
In [201]: import itertools as IT
In [202]: a = "253+9"
In [203]: ''.join(IT.takewhile(str.isdigit, a))
Out[203]: '253'
IT.takewhile will stop at the first character in a which is not a digit.
Another way would be to use a regex pattern. You could split the string on non-digits using the pattern r'\D':
In [208]: import re
In [209]: re.split(r'\D', a, maxsplit=1)[0]
Out[209]: '253'
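Matching the digits at the start of the string gives the same "stop at the first non-digit" behaviour directly: re.match() only tries position 0, and \d* also succeeds, with an empty result, when a does not start with a digit. A sketch:

```python
import re

for a in ["253+9", "+253", "42"]:
    leading = re.match(r'\d*', a).group(0)  # digits before the first non-digit character
    print(repr(a), '->', repr(leading))
# '253+9' -> '253'
# '+253' -> ''
# '42' -> '42'
```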
With a for loop, this is relatively easy. From the ASCII table, we know that the digits range from 48 (the character '0') to 57 (the character '9').
We can find the ASCII value of a character with the built-in function ord(x), where x is the character (i.e. ord('4') is equal to 52, the integer).
With this knowledge, it is easy to add the check to our for loop. We make a for loop that goes from 0 to the length of the string minus 1. Inside the loop, we use the current iteration as an index, get the character at that index in our string, and check whether its ord value falls in the range we want.
This will look something like this:
def method(a):
    b = ''
    for i in range(0, len(a)):
        if (#something):
            if (#something):
                b = b+a[i]
    return b
Can you fill in the "#somethings"?
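One possible way to fill in the blanks, using the ord() range described above and stopping at the first non-digit as the question asks. This is a sketch, not necessarily what the answerer intended; it folds the two nested checks into one condition and adds a break, and the function name leading_digits is hypothetical:

```python
def leading_digits(a):
    """Collect characters of a while their ASCII code is 48..57 ('0'..'9')."""
    b = ''
    for i in range(0, len(a)):
        if ord(a[i]) >= 48 and ord(a[i]) <= 57:  # '0' is 48, '9' is 57
            b = b + a[i]
        else:
            break  # stop at the first character that is not a digit
    return b

print(leading_digits("253+"))  # 253
```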
Try this:
a = "253+"  # initial value of a
b = ""  # empty string to store the result
for each in a:  # iterate through all characters of a
    if each.isdigit():  # check if the current character is a digit
        b += each  # append to b if the current character is a digit
    else:  # if the current character is NOT a digit
        break  # break out of the for loop
print(b)  # print the result: 253
Hope this helps!
You can write a generator with a regex and generate them one by one:
>>> import re
>>> s='123+456abc789'
>>> nums=(m.group(1) for m in re.finditer(r'(\d+)', s))
>>> next(nums)
'123'
>>> next(nums)
'456'
>>> next(nums)
'789'

I want to implement OR operator in find() in python

When I run the following code I don't get a syntax error, but I don't get all the results either. The point of the program is to scan a string sequence, find some specific substrings in it, and print a resulting string containing the substring and 19 characters following it. It should print every occurrence of those substrings and every resulting string.
here is the code..
x=raw_input('GET STRING:: ');
m=len(x);
k=0;
while(k<m):
    if('AAT'in x or 'AAC' in x or 'AAG' in x):
        start = x.find('AAT') or x.find('AAC') or x.find('AAG')
        end=start+19
        print x[start:end]
When I input a string like ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT, it only searches for AAT and prints the resulting substring, but not AAG and AAC. Can anyone help me implement the operator?
In your example, it's probably better to use a regular expression.
>>> import re
>>> text = 'ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT'
>>> re.search('(?:AA[TCG])(.{19})', text).group(1)
'CTTGTGATTGCATTGACAC'
You could switch to re.findall if you want multiple matches from the string. (But this won't work well if you want overlapping matches, i.e. if one of your triplets appears again within the 19 characters.)
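Search for the first occurrence starting from k:
mystring = input('GET STRING:: ')
m = len(mystring)
k = 0
while k < m:
    x = mystring[k:]
    # find() returns -1 for a missing substring, so keep only real positions
    found = [p for p in (x.find('AAT'), x.find('AAC'), x.find('AAG')) if p >= 0]
    if not found:
        break
    start = min(found)
    end = start + 19
    print(x[start:end])
    k += start + 1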
You should set start to the minimum non-negative value of the three find statements.
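Part of the problem in the question's code is that x.find('AAT') or x.find('AAC') or x.find('AAG') returns the first truthy value, and find() returns -1 (which is truthy) for a missing substring, so the later calls only run when 'AAT' sits at index 0. A small helper that tries every substring and keeps the smallest non-negative index makes the "OR" explicit. A sketch; the name find_any and its (index, substring) return value are illustrative:

```python
def find_any(text, subs, start=0):
    """Return (index, substring) for the earliest occurrence of any of subs
    at or after start, or (-1, None) if none of them occurs."""
    best = (-1, None)
    for sub in subs:
        pos = text.find(sub, start)  # -1 means "not found", so it is skipped below
        if pos != -1 and (best[0] == -1 or pos < best[0]):
            best = (pos, sub)
    return best


seq = 'ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT'
hits = []
k = 0
while True:
    start, sub = find_any(seq, ('AAT', 'AAC', 'AAG'), k)
    if start == -1:
        break
    hits.append((start, sub))
    print(start, sub, seq[start:start + 19])  # the match plus the following characters, 19 in total
    k = start + 1  # resume the search just past this hit
```

For the sequence in the question, hits comes out as [(4, 'AAT'), (40, 'AAG'), (43, 'AAC'), (55, 'AAT')] (0-based start positions).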
You can handle overlapping matches with regular expressions that use lookahead assertions together with a capturing group:
>>> import re
>>> regex = re.compile("(?=(AA[TCG].{19}))")
>>> regex.findall("ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT")
['AATCTTGTGATTGCATTGACAC', 'AAGAACTCTTAGTGAAATATCA', 'AACTCTTAGTGAAATATCAGTA']
>>>
How about this:
import re
seq = "ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT"
alist = ['AAT', 'AAC', 'AAG']
newlist = [re.findall(e, seq) for e in alist]
Output: [['AAT', 'AAT'], ['AAC'], ['AAG']]
Here is a slightly heavier version that also reports indexes:
import re
astr= "ATGGAATCTTGTGATTGCATTGACACGCCATGCCCTGGTGAAGAACTCTTAGTGAAATATCAGTATATCT"
def find_triple_base(astr, nth_sub):
return [(m.end(), m.group(), astr[m.end(0):m.end(0)+nth_sub]) for m in re.finditer(r'AA[TCG]', astr)]
for e in find_triple_base(astr, 19): print(e)
Output:
(7, 'AAT', 'CTTGTGATTGCATTGACAC')
(43, 'AAG', 'AACTCTTAGTGAAATATCA')
(46, 'AAC', 'TCTTAGTGAAATATCAGTA')
(58, 'AAT', 'ATCAGTATATCT')
What it does: findall finds all occurrences of the base triplets you'd like to find (alist) and builds a new list containing one list per triplet, e.g. [['AAT','AAT'],['AAC'],['AAG']]. It's straightforward to print this out.
I hope this helps!
Have a look at this: http://ideone.com/U70n4y
Code:
x = input('GET STRING:: ')
if 'AAT' in x:
    start = x.find('AAT')
    end = start + 19
    print(x[start:end])
elif 'AAC' in x:
    start = x.find('AAC')
    end = start + 19
    print(x[start:end])
elif 'AAG' in x:
    start = x.find('AAG')
    end = start + 19
    print(x[start:end])
Edit: try this regexp code:
import re
y = r"(?:AA[TCG]).{19}"
x = input('GET STRING:: ')
l = re.findall(y, x)
for match in l:
    print(match)
    print(len(match))
