Python print first N integers from a string - python

Is it possible without regex in python to print the first n integers from a string containing both integers and characters?
For instance:
string1 = 'test120202test34234e23424'
string2 = 'ex120202test34234e23424'
foo(string1,6) => 120202
foo(string2,6) => 120202

Anything's possible without a regex. Most things are preferable without a regex.
One easy way is:
>>> str = 'test120202test34234e23424'
>>> str2 = 'ex120202test34234e23424'
>>> ''.join(c for c in str if c.isdigit())[:6]
'120202'
>>> ''.join(c for c in str2 if c.isdigit())[:6]
'120202'
You might want to handle your corner cases some specific way -- it all depends on what you know your code should do.
>>> str3 = "hello 4 world"
>>> ''.join(c for c in str3 if c.isdigit())[:6]
'4'
And don't name your strings str!

You can remove all the letters from your string with str.translate and then slice off the number of digits you want, like this:
import string
def foo(input_string, num):
    return input_string.translate(None, string.letters)[:num]
print foo('test120202test34234e23424', 6) # 120202
print foo('ex120202test34234e23424', 6) # 120202
Note: This simple technique works only in Python 2.x, and it removes only letters, so any other non-digit characters (spaces, punctuation) stay in the result.
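For Python 3 readers, a minimal sketch of the same translate idea, assuming only ASCII letters need to be removed. The name foo_py3 and the module-level table are illustrative and not part of the original answer; str.maketrans with a third argument builds a table that deletes those characters.

```python
import string

# Translation table that deletes every ASCII letter (third argument of maketrans).
_DELETE_LETTERS = str.maketrans('', '', string.ascii_letters)

def foo_py3(input_string, num):
    """Drop ASCII letters, then keep the first `num` remaining characters."""
    return input_string.translate(_DELETE_LETTERS)[:num]

print(foo_py3('test120202test34234e23424', 6))  # 120202
```

As with the Python 2 version, anything that is not an ASCII letter survives, so this only returns digits when the input contains nothing but letters and digits.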
But the most efficient way is to use itertools.islice:
from itertools import islice
def foo(input_string, num):
    return "".join(islice((char for char in input_string if char.isdigit()), num))
This is the most efficient way because it stops as soon as it has collected num digits, so it doesn't have to process the entire string before returning the result.

If you didn't want to process the whole string - not a problem with the length of strings you give as an example - you could try:
import itertools
"".join(itertools.islice((c for c in str2 if c.isdigit()),0,5))

Related

How to make this simple string function "pythonic"

Coming from the C/C++ world and being a Python newb, I wrote this simple string function that takes an input string (guaranteed to be ASCII) and returns the last four characters. If there are fewer than four characters, I want to fill the leading positions with the letter 'A'. (This was not an exercise, but a valuable part of another complex function.)
There are dozens of methods of doing this, from brute force, to simple, to elegant. My approach below, while functional, didn’t seem "Pythonic".
NOTE: I’m presently using Python 2.6 — and performance is NOT an issue. The input strings are short (2-8 characters), and I call this function only a few thousand times.
def copyFourTrailingChars(src_str):
    four_char_array = bytearray("AAAA")
    xfrPos = 4
    for x in src_str[::-1]:
        xfrPos -= 1
        four_char_array[xfrPos] = x
        if xfrPos == 0:
            break
    return str(four_char_array)
input_str = "7654321"
print("The output of {0} is {1}".format(input_str, copyFourTrailingChars(input_str)))
input_str = "21"
print("The output of {0} is {1}".format(input_str, copyFourTrailingChars(input_str)))
The output is:
The output of 7654321 is 4321
The output of 21 is AA21
Suggestions from Pythoneers?
I would use simple slicing and then str.rjust() to right-justify the result, using 'A' as the fill character. Example:
def copy_four(s):
    return s[-4:].rjust(4,'A')
Demo:
>>> copy_four('21')
'AA21'
>>> copy_four('1233423')
'3423'
You can simply add four sentinel 'A' characters before the original string, then take the last four characters:
def copy_four(s):
    return ('AAAA'+s)[-4:]
That's simple enough!
How about something with string formatting?
def copy_four(s):
    return '{}{}{}{}'.format(*('A'*(4-len(s[-4:])) + s[-4:]))
Result:
>>> copy_four('abcde')
'bcde'
>>> copy_four('abc')
'Aabc'
Here's a nicer, more canonical option:
def copy_four(s):
    return '{:A>4}'.format(s[-4:])
Result:
>>> copy_four('abcde')
'bcde'
>>> copy_four('abc')
'Aabc'
You could use slicing to get the last 4 characters, then string repetition (* operator) and concatenation (+ operator) as below:
def trailing_four(s):
    s = s[-4:]
    s = 'A' * (4 - len(s)) + s
    return s
You can try this:
def copy_four_trailing_chars(input_string):
    list_a = ['A','A','A','A']
    str1 = input_string[-4:]
    if len(str1) < 4:
        str1 = "%s%s" % (''.join(list_a[:4-len(str1)]), str1)
    return str1
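As a quick sanity check, the sketch below runs three of the suggested variants side by side on a few inputs, including the empty string. The function names pad_rjust, pad_sentinel and pad_format are invented here just to tell the variants apart.

```python
def pad_rjust(s):
    # slice the last four characters, then left-pad with 'A'
    return s[-4:].rjust(4, 'A')

def pad_sentinel(s):
    # prepend four sentinels, then keep the last four characters
    return ('AAAA' + s)[-4:]

def pad_format(s):
    # format spec: fill with 'A', right-align, width 4
    return '{:A>4}'.format(s[-4:])

for sample in ['', '1', '21', 'abc', '7654321']:
    results = {pad_rjust(sample), pad_sentinel(sample), pad_format(sample)}
    assert len(results) == 1  # all three variants agree
    print(repr(sample), '->', results.pop())
```

All three agree on these inputs, so the choice is mostly about which one reads best to you.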

Extract Number from String in Python

I am new to Python and I have a String, I want to extract the numbers from the string. For example:
str1 = "3158 reviews"
print (re.findall('\d+', str1 ))
Output is ['4', '3']
I want to get 3158 only, as an Integer preferably, not as List.
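You can filter the string by digits using the str.isdigit method. In Python 2, where filter on a str returns a str:
>>> int(filter(str.isdigit, str1))
3158
For Python 3, filter returns an iterator, so join the digits back into a string first:
int(''.join(filter(str.isdigit, my_str)))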
This code works fine. There is definitely some other problem:
>>> import re
>>> str1 = "3158 reviews"
>>> print (re.findall('\d+', str1 ))
['3158']
Your regex looks correct. Are you sure you haven't made a mistake with the variable names? In your code above you mix up total_hotel_reviews_string and str.
>>> import re
>>> s = "3158 reviews"
>>>
>>> print(re.findall("\d+", s))
['3158']
IntVar = int("".join(filter(str.isdigit, StringVar)))
You were quite close to the final answer. Your re.findall expression already returns every run of digits; enclosing the pattern in parentheses (a capturing group) is optional here and gives the same result:
re.findall(r'(\d+)', str1)
For a more general string like str1 = "3158 reviews, 432 users", this code would yield:
Output: ['3158', '432']
Now to obtain integers, you can map the int function to convert strings into integers:
A = list(map(int,re.findall('(\d+)',str1)))
Alternatively, you can use this one-liner loop:
A = [ int(x) for x in re.findall('(\d+)',str1) ]
Both methods are equally correct. They yield A = [3158, 432].
Your final result for the original question would be the first entry in the list A, so we arrive at either of these expressions:
result = list(map(int,re.findall( '(\d+)' , str1 )))[0]
result = int(re.findall( '(\d+)' , str1 )[0])
Even if there is only one number present in str1, re.findall will still return a list, so you need to retrieve the first element A[0] manually.
To extract a single number from a string you can use re.search(), which returns the first match (or None):
>>> import re
>>> string = '3158 reviews'
>>> int(re.search(r'\d+', string).group(0))
3158
In Python 3.6+ you can also index into a match object instead of using group():
>>> int(re.search(r'\d+', string)[0])
3158
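Since re.search() returns None when nothing matches, calling .group() directly raises AttributeError on a string without digits. A minimal sketch of a guarded version follows; the name first_int and its default parameter are assumptions made for this example.

```python
import re

def first_int(text, default=None):
    """Return the first run of digits in text as an int, or default if none."""
    match = re.search(r'\d+', text)
    return int(match.group(0)) if match else default

print(first_int('3158 reviews'))    # 3158
print(first_int('no reviews yet'))  # None
```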
If the format is that simple (a space separates the number from the rest) then
int(str1.split()[0])
would do it
For more complex strings (signs, decimals, numbers embedded in words), a regex works best:
str1 = "sg-23.0 300sdf343fc -34rrf-3.4r" #All kinds of occurrence of numbers between strings
num = [float(s) for s in re.findall(r'-?\d+\.?\d*', str1)]
print(num)
Output:
[-23.0, 300.0, 343.0, -34.0, -3.4]
Above solutions seem to assume integers. Here's a minor modification to allow decimals:
num = float("".join(filter(lambda d: str.isdigit(d) or d == '.', inputString)))
(Doesn't account for - sign, and assumes any period is properly placed in digit string, not just some english-language period lying around. It's not built to be indestructible, but worked for my data case.)
Python 2.7:
>>> str1 = "3158 reviews"
>>> int(filter(str.isdigit, str1))
3158
Python 3:
>>> str1 = "3158 reviews"
>>> int(''.join(filter(str.isdigit, str1)))
3158
There may be a small problem with the code from Vishnu's answer: if there are no digits in the string, it will raise ValueError. Here is my suggestion to avoid this (Python 2, where filter on a str returns a str):
>>> digit = lambda x: int(filter(str.isdigit, x) or 0)
>>> digit('3158 reviews')
3158
>>> digit('reviews')
0
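In Python 3 filter returns an iterator, so the lambda above no longer works as written. A rough Python 3 equivalent might look like this; the name digits_to_int is hypothetical, and str.isdecimal is swapped in for str.isdigit because isdigit also accepts characters such as superscript digits that int() rejects.

```python
def digits_to_int(text, default=0):
    """Join every decimal digit in text and convert it; fall back to default."""
    digits = ''.join(ch for ch in text if ch.isdecimal())
    return int(digits) if digits else default

print(digits_to_int('3158 reviews'))  # 3158
print(digits_to_int('reviews'))       # 0
```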
For python3
input_str = '21ddd3322'
int(''.join(filter(str.isdigit, input_str)))
> 213322
a = []
line = "abcd 3455 ijkl 56.78 ij"
for word in line.split():
    try:
        a.append(float(word))
    except ValueError:
        pass
print(a)
OUTPUT
[3455.0, 56.78]
I am a beginner in coding. This is my attempt to answer the question. I used Python 3.7 without importing any libraries.
This code extracts and prints decimal numbers from a string made of sets of characters separated by blanks (words).
Attention: if there is more than one number, each one is printed in turn, and a ends up holding the last value.
line = input('Please enter your string ')
for word in line.split():
    try:
        a = float(word)
        print(a)
    except ValueError:
        pass
My answer does not require any additional libraries, and it's easy to understand. But note that if there is more than one number in the string, my code returns everything from the first digit to the last digit, including any non-digit characters in between.
def search_number_string(string):
    # collect the index of every digit character
    index_list = []
    for i, x in enumerate(string):
        if x.isdigit():
            index_list.append(i)
    if not index_list:
        return ''  # no digits found
    start = index_list[0]
    end = index_list[-1] + 1
    number = string[start:end]
    return number
Use this to extract numbers from a string in general, getting all the numeric occurrences.
The split function converts the string to a list of words, the list comprehension iterates through that list, and the isdigit function picks out the words that consist only of digits.
Getting numbers from a string using a list comprehension + isdigit():
test_string = "i have four ballons for 2 kids"
print("The original string : " + test_string)
# list comprehension + isdigit() + split()
res = [int(i) for i in test_string.split() if i.isdigit()]
print("The numbers list is : " + str(res))
To extract numeric values from a string in Python:
* Find all integers in the string using the re.findall(expression, string) method.
* Convert each number from its string form to an int, and then find the maximum.
import re
def extractMax(input):
    # get a list of all runs of digits in the string
    numbers = re.findall(r'\d+', input)
    # \d+ is a regular expression that matches one or more digits
    numbers = [int(n) for n in numbers]
    # max() raises ValueError on an empty list, so guard against no matches
    print(max(numbers) if numbers else None)
if __name__ == "__main__":
    input = 'sting'  # contains no digits, so this prints None
    extractMax(input)
You can use the method below to extract all the digits from a string.
def extract_numbers_from_string(string):
    number = ''
    for i in string:
        try:
            number += str(int(i))
        except ValueError:
            pass
    return number
Alternatively, you could use i.isdigit() or i.isnumeric() (both are str methods in Python 3):
def extract_numbers_from_string(string):
    number = ''
    for i in string:
        if i.isnumeric():
            number += str(int(i))
    return number
a = '343fdfd3'
print(extract_numbers_from_string(a))
# 3433
Using a list comprehension and Python 3:
>>> int("".join([c for c in str1 if str.isdigit(c)]))
3158

Retrieving a full number

Assume I have a string as follows: expression = '123 + 321'.
I am walking over the string character by character as follows: for p in expression. I am checking if p is a digit using p.isdigit(). If p is a digit, I'd like to grab the whole number (so grab 123 and 321, not just p, which initially would be 1).
How can I do that in Python?
In C (coming from a C background), the equivalent would be:
int x = 0;
sscanf(p, "%d", &x);
// the full number is now in x
EDIT:
Basically, I am accepting a mathematical expression from a user that accepts positive integers, +, -, *, / as well as brackets: '(' and ')'. I am walking the string character by character and I need to be able to determine whether the character is a digit or not. Using isdigit(), I can do that. If it is a digit, however, I need to grab the whole number. How can that be done?
>>> from itertools import groupby
>>> expression = '123 + 321'
>>> expression = ''.join(expression.split()) # strip whitespace
>>> for k, g in groupby(expression, str.isdigit):
...     if k: # it's a digit
...         print('digit')
...         print(list(g))
...     else:
...         print('non-digit')
...         print(list(g))
...
digit
['1', '2', '3']
non-digit
['+']
digit
['3', '2', '1']
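Building on the groupby idea, here is a sketch that joins each run of digits into an int and keeps every other character as its own token. The function name tokenize is an assumption for this example, and it only handles the positive integers, operators and brackets described in the question.

```python
from itertools import groupby

def tokenize(expression):
    """Split an expression into ints and single-character symbols."""
    tokens = []
    for is_digit, group in groupby(expression.replace(' ', ''), str.isdigit):
        chars = list(group)
        if is_digit:
            tokens.append(int(''.join(chars)))  # whole number, e.g. 123
        else:
            tokens.extend(chars)                # each symbol separately, e.g. ')' and '/'
    return tokens

print(tokenize('123 + 321'))           # [123, '+', 321]
print(tokenize('(92831 * 948) / 32'))  # ['(', 92831, '*', 948, ')', '/', 32]
```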
This is one of those problems that can be approached from many different directions. Here's what I think is an elegant solution based on itertools.takewhile:
>>> from itertools import chain, takewhile
>>> def get_numbers(s):
...     s = iter(s)
...     for c in s:
...         if c.isdigit():
...             yield ''.join(chain(c, takewhile(str.isdigit, s)))
...
>>> list(get_numbers('123 + 456'))
['123', '456']
This even works inside a list comprehension:
>>> def get_numbers(s):
...     s = iter(s)
...     return [''.join(chain(c, takewhile(str.isdigit, s)))
...             for c in s if c.isdigit()]
...
>>> get_numbers('123 + 456')
['123', '456']
Looking over other answers, I see that this is not dissimilar to jamylak's groupby solution. I would recommend that if you don't want to discard the extra symbols. But if you do want to discard them, I think this is a bit simpler.
The Python documentation includes a section on simulating scanf, which gives you some idea of how you can use regular expressions to simulate the behavior of scanf (or sscanf, it's all the same in Python). In particular, r'\-?\d+' is the Python string that corresponds to the regular expression for an integer. (r'\d+' for a nonnegative integer.) So you could embed this in your loop as
integer = re.compile(r'\-?\d+')
for p in expression:
    if p.isdigit():
        # somehow find the current position in the string
        integer.match(expression, curpos)
But that still reflects a very C-like way of thinking. In Python, your iterator variable p is really just an individual character that has actually been pulled out of the original string and is standing on its own. So in the loop, you don't naturally have access to the current position within the string, and trying to calculate it is going to be less than optimal.
What I'd suggest instead is using Python's built in regexp matching iteration method:
integer = re.compile(r'\-?\d+') # only do this once in your program
all_the_numbers = integer.findall(expression)
and now all_the_numbers is a list of string representations of all the integers in the expression. If you wanted to actually convert them to integers, then you could do this instead of the last line:
all_the_numbers = [int(m.group()) for m in integer.finditer(expression)]
Here I've used finditer instead of findall because you don't have to make a list of all the strings before iterating over them again to convert them to integers.
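If you also need to know where each number sits in the string (the "current position" problem mentioned above), the match objects from finditer carry that information. A small sketch, with numbers_with_positions as a made-up helper name and a pattern restricted to non-negative integers:

```python
import re

INTEGER = re.compile(r'\d+')  # non-negative integers only

def numbers_with_positions(expression):
    """Return (start_index, value) for every integer found in expression."""
    return [(m.start(), int(m.group())) for m in INTEGER.finditer(expression)]

print(numbers_with_positions('123 + 321'))  # [(0, 123), (6, 321)]
```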
Though I'm not familiar with sscanf, I'm no C developer, it looks like it's using format strings in a way not dissimilar to what I'd use python's re module for. Something like this:
import re
nums = re.compile(r'\d+')
found = nums.findall('123 + 321')
# if you know you're only looking for two values.
left, right = found
You can use shlex http://docs.python.org/library/shlex.html
>>> from shlex import shlex
>>> expression = '123 + 321'
>>> for e in shlex(expression):
...     print(e)
...
123
+
321
>>> expression = '(92831 * 948) / 32'
>>> for e in shlex(expression):
...     print(e)
...
(
92831
*
948
)
/
32
I'd split the string up on the ' + ' string, giving you what's outside of them:
>>> expression = '123 + 321'
>>> ex = expression.split(' + ')
>>> ex
['123', '321']
>>> int_ex = list(map(int, ex))
>>> int_ex
[123, 321]
>>> sum(int_ex)
444
It's dangerous, but you could use eval:
>>> eval('123 + 321')
444
I'm just taking a stab at you parsing the string, and doing raw calculations on it.
e_array = expression.split('+')
i_array = list(map(int, e_array))
And i_array holds all integers in the expression.
UPDATE
If you already know all the special characters in your expression and you want to eliminate them all:
import re
e_array = re.split(r'[*/+\-() ]', expression) # the characters here are multiply, divide, plus, minus, left and right parentheses, and space
i_array = list(map(int, filter(lambda x: len(x), e_array)))

Count lower case characters in a string

What is the most pythonic and/or efficient way to count the number of characters in a string that are lowercase?
Here's the first thing that came to mind:
def n_lower_chars(string):
    return sum([int(c.islower()) for c in string])
Clever trick of yours! However, I find it more readable to filter the lower chars, adding 1 for each one.
def n_lower_chars(string):
    return sum(1 for c in string if c.islower())
Also, we do not need to create a new list for that, so removing the [] will make sum() work over an iterator, which consumes less memory.
def n_lower_chars(string):
    return len(list(filter(str.islower, string)))
def n_lower_chars(string):
    return sum(map(str.islower, string))
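A short check that the generator version and the map version count the same thing. The function names here are invented to keep the two variants apart; the second relies on True counting as 1 and False as 0 when summed.

```python
def count_lower_sum(text):
    # add 1 for each lowercase character
    return sum(1 for c in text if c.islower())

def count_lower_bools(text):
    # str.islower returns a bool; summing bools counts the True values
    return sum(map(str.islower, text))

sample = "ABC abc 123"
print(count_lower_sum(sample), count_lower_bools(sample))  # 3 3
```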
If you want to divide things a little more finely:
from collections import Counter
text = "ABC abc 123"
print(Counter("lower" if c.islower() else
              "upper" if c.isupper() else
              "neither" for c in text))
