How to separate number and ascii character from string?

Let me keep it simple: I have the string "10fo22baar" and I want to turn it into ["1022","fobaar"] or ["10","fo","22","baar"].
Is there a way to do something like that in Python 3 or 2?

Part 1: You can use filter() with str.isdigit to keep only the numeric characters:
>>> my_str = "10fo22baar"
>>> ''.join(filter(str.isdigit, my_str))
'1022'
To get the non-numeric characters, you can use itertools.filterfalse() (Python 3; in Python 2 it is called itertools.ifilterfalse()):
>>> from itertools import filterfalse
>>> ''.join(filterfalse(str.isdigit, my_str))
'fobaar'
# OR, for older Python versions, use a generator expression:
# ''.join(c for c in my_str if not c.isdigit())
Store the above values in a list to get your desired format.
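If the same snippet has to run on both Python 2 and Python 3, one option is a guarded import, since Python 2 names the function itertools.ifilterfalse(). A minimal sketch, assuming byte strings under Python 2; the helper name split_digits_letters is only illustrative:

```python
try:
    from itertools import filterfalse  # Python 3
except ImportError:
    from itertools import ifilterfalse as filterfalse  # Python 2


def split_digits_letters(s):
    """Return [digits, non_digits] for the string s (hypothetical helper)."""
    digits = ''.join(filter(str.isdigit, s))
    others = ''.join(filterfalse(str.isdigit, s))
    return [digits, others]


print(split_digits_letters("10fo22baar"))  # ['1022', 'fobaar']
```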
Alternatively, you can use regex to extract the digits and the letters separately:
import re
my_str = "10fo22baar"
# - To extract digits, use the expression r"\d+"
# - To extract letters, use the expression r"[a-zA-Z]+"
digits = ''.join(re.findall(r'\d+', my_str))
# where the `digits` variable will hold the string:
# '1022'
alphabets = ''.join(re.findall(r'[a-zA-Z]+', my_str))
# where the `alphabets` variable will hold the string:
# 'fobaar'
# Create your desired list from the above variables:
# my_list = [digits, alphabets]
You can simplify the above logic into one line:
my_regex = [r'\d+', r'[a-zA-Z]+']
my_list = [''.join(re.findall(r, my_str)) for r in my_regex]
# where `my_list` will give you:
# ['1022', 'fobaar']
Part 2: You can use itertools.groupby() to get your second desired format, with digits and letters grouped together in a single list while maintaining the order:
from itertools import groupby
my_list = [''.join(x) for _, x in groupby(my_str, str.isdigit)]
# where `my_list` will give you:
# ['10', 'fo', '22', 'baar']
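The same grouped output can also be produced with a single regular expression whose alternation matches either a run of digits or a run of non-digits. This is a swapped-in technique rather than part of the answer above; the function name group_runs is hypothetical:

```python
import re


def group_runs(s):
    # \d+ matches a run of digits, \D+ a run of anything else;
    # findall scans left to right, so the original order is kept
    return re.findall(r'\d+|\D+', s)


print(group_runs("10fo22baar"))  # ['10', 'fo', '22', 'baar']
```

One difference to keep in mind: in Python 3, \d matches Unicode decimal digits only, while str.isdigit() also accepts characters such as '²', so the two approaches can group such characters differently.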

You could try a for loop, like this:
my_str = "10fo22baar"  # avoid naming it `str`, which shadows the built-in
nums = []
chars = []
for char in my_str:
    try:
        int(char)
        nums.append(char)
    except ValueError:
        chars.append(char)
sep = ["".join(nums), "".join(chars)]
print(sep)
The output would be: ['1022', 'fobaar']

Using string methods:
s = "10fo22baar"
num = ""
string = ""
for char in s:
    if char.isnumeric():
        num += char
    else:
        string += char
print(num, string)
In Python 3 this prints: 1022 fobaar

Here's a simple, easy-to-understand solution.
mystring = '10fo22baar'
nums = []
chars = []
for char in mystring:
    if char in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']:
        nums.append(char)
    else:
        chars.append(char)
How it works:
We start with mystring set to the string we want to read.
We define two new lists, one for the digits and one for the other characters.
We loop through each char in mystring.
If the current char is a digit, we append it to nums.
If it's not a digit, it must be a normal char, so we append it to chars.
That's it. To get strings back, join the lists: ''.join(nums) gives '1022' and ''.join(chars) gives 'fobaar'.
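Writing the digit list by hand is easy to get wrong (a missing comma between two string literals silently concatenates them into one). A sketch of the same loop that uses the standard string.digits constant instead, assuming you want the two joined strings as the result; split_chars is an illustrative name:

```python
import string


def split_chars(mystring):
    nums = []
    chars = []
    for char in mystring:
        # string.digits is the constant '0123456789'
        if char in string.digits:
            nums.append(char)
        else:
            chars.append(char)
    return ''.join(nums), ''.join(chars)


print(split_chars('10fo22baar'))  # ('1022', 'fobaar')
```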

1- First, we loop over the entire string with a for loop.
2- Then, to check whether each character is a number or not, we can use either of two methods:
str.isnumeric()
or
str.isalpha()
3- After checking, we separate the characters into lists and format them to our liking.
Our code looks like this:
myString = '10fo22baar'
charString = []
charNum = []
for char in myString:
    if char.isnumeric():
        charNum.append(char)
    else:
        charString.append(char)
myString = [''.join(charNum), ''.join(charString)]
print(myString)
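str.isnumeric() accepts more than the ASCII digits, which matters if the separated digits are later passed to int(). A small check of how the three related string methods differ on a few characters (the sample characters and the safe_int helper are chosen for illustration):

```python
samples = ['5', '\u00b2', '\u00bd']  # '5', SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF

# Columns: repr, isdecimal(), isdigit(), isnumeric()
for ch in samples:
    print(repr(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())


def safe_int(ch):
    """Return int(ch), or None when int() cannot parse the character."""
    try:
        return int(ch)
    except ValueError:
        return None


print([safe_int(ch) for ch in samples])  # [5, None, None]
```

Only characters for which isdecimal() is true can be passed to int(), so isdecimal() is the safer test when the separated characters will be converted to a number.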

Related

Finding exact number of characters in word

I'm looking for a way to find words with an exact number of a given character.
For example:
If we have this input: ['teststring1','strringrr','wow','strarirngr'] and we are looking for 4 r characters,
it should return only ['strringrr','strarirngr'] because they are the words with exactly 4 letters r in them.
I decided to use regex, but after reading the documentation I can't find a function that satisfies my needs.
I tried [r{4}], but it apparently returns any word with the letter r in it.
Please help.

Something like this:
import collections
def map_characters(string):
    # count how many times each character occurs
    characters = collections.defaultdict(lambda: 0)
    for char in string:
        characters[char] += 1
    return characters
items = ['teststring1','strringrr','wow','strarirngr']
for item in items:
    characters_map = map_characters(item)
    # print the word if it contains exactly 4 letters r
    if characters_map['r'] == 4:
        print(item)
# the result is strringrr and strarirngr
# because these words have exactly 4 r letters
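The standard library's collections.Counter already does what map_characters() does, so the same filter can be sketched more compactly. This is a swapped-in alternative; words_with_exact_count and its parameters are hypothetical names:

```python
from collections import Counter


def words_with_exact_count(words, letter, amount):
    # Counter(word)[letter] is 0 when the letter is absent
    return [word for word in words if Counter(word)[letter] == amount]


items = ['teststring1', 'strringrr', 'wow', 'strarirngr']
print(words_with_exact_count(items, 'r', 4))  # ['strringrr', 'strarirngr']
```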

You can use str.count() to count the occurrences of a character, combined with list comprehensions to create a new list:
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
filtered = [item for item in myArray if item.count(letter) == amount]
print(filtered) # ['strringrr', 'strarirngr']
If you wanted to make this reusable (to look for different letters or different amounts), you could pack it into a function:
def filterList(stringList, pattern, occurrences):
    return [item for item in stringList if item.count(pattern) == occurrences]
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
print(filterList(myArray, letter, amount)) # ['strringrr', 'strarirngr']

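Square brackets define a character class, which matches any single character from the set; e.g. [abc] matches a, b or c. In your case, [r{4}] matches any one of the characters r, {, 4 or }, so any word containing an r is a match. Removing the brackets doesn't help here either: r{4} means four consecutive r's, which neither 'strringrr' nor 'strarirngr' contains. To allow other characters between the r's and require exactly four, use a pattern such as ^[^r]*(?:r[^r]*){4}$.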
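A quick check of what each pattern actually matches, using the words from the question (the sample string 'a{4}r' is made up for illustration):

```python
import re

# Inside [...] every character is literal, so [r{4}] is the set {r, '{', '4', '}'}
print(re.findall(r'[r{4}]', 'a{4}r'))  # ['{', '4', '}', 'r']

# r{4} needs four r's in a row, which neither word contains
print(re.search(r'r{4}', 'strringrr'))   # None
print(re.search(r'r{4}', 'strarirngr'))  # None

# Exactly four r's, with any non-r characters around them
exact_four = re.compile(r'[^r]*(?:r[^r]*){4}')
words = ['teststring1', 'strringrr', 'wow', 'strarirngr']
print([w for w in words if exact_four.fullmatch(w)])  # ['strringrr', 'strarirngr']
```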

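Since you asked about using regex, you could use the following:
import re
l = ['teststring1', 'strringrr', 'wow', 'strarirngr']
[word for word in l if re.fullmatch(r'[^r]*(?:r[^r]*){4}', word)]
output: ['strringrr', 'strarirngr']
(re.fullmatch requires Python 3.4+. The pattern allows any non-r characters around exactly four r's; a looser pattern such as (.*r.*){4} with re.match only checks for at least four.)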

How can we remove word with repeated single character?

I am trying to collapse words that consist of a single repeated character using regex in Python, for example:
good => good
gggggggg => g
What I have tried so far is the following:
re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
The problem with the above solution is that it changes good to god, and I only want to collapse words made of a single repeated character.

A better approach here is to use a set:
def modify(s):
    # Create a set from the string
    c = set(s)
    # If there is only one distinct character, return it as a string
    if len(c) == 1:
        return ''.join(c)
    # Otherwise return the original string
    else:
        return s
print(modify('good'))
print(modify('gggggggg'))
If you want to use regex, mark the start and end of the string in the regex with ^ and $ (inspired by @bobblebubble's comment):
import re
def modify(s):
    # Substitute only if the whole string is a single repeated character,
    # anchoring the pattern at the start and end of the string
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out
print(modify('good'))
print(modify('gggggggg'))
The output will be:
good
g
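If the words sit inside a longer sentence, the ^ and $ anchors can be replaced by word boundaries so that each word is handled independently. A sketch assuming words are runs of \w characters (letters, digits, underscore); collapse_words is a hypothetical name:

```python
import re


def collapse_words(text):
    # \b(\w)\1+\b: a whole word made of one character repeated 2+ times
    return re.sub(r'\b(\w)\1+\b', r'\1', text)


print(collapse_words('good gggggggg day'))  # good g day
```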

If you do not want to use a set, this should do the trick:
def simplify(s):
    l = len(s)
    if l > 1 and s.count(s[0]) == l:
        return s[0]
    return s
print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))
output (the last call, for the empty string, prints an empty line after these):
good
abba
g
g
Explanations:
First, compute the length of the string.
Then count the characters equal to the first one and compare that count with the string length.
Depending on the result, return the first character or the whole string.

In C#, you can use the Trim method.
Take a look at this example:
"ggggggg".Trim('g');
Note that Trim removes every leading and trailing 'g', so this returns an empty string rather than "g".
Update:
For characters in the middle of the string, use this function (thanks to this answer):
in C#:
public static string RemoveDuplicates(string input)
{
    return new string(input.ToCharArray().Distinct().ToArray());
}
in Python:
used = set()
unique = [x for x in mylist if x not in used and (used.add(x) or True)]
But none of these handle a case like aaaaabbbbbcda: that string has an a at the end, which does not appear in the result (abcd). For this kind of situation, use this function, which I wrote:
In:
def unique(s):
    used = set()
    ret = list()
    for x in s:
        if x not in used:
            ret.append(x)
            # reset so that only the previous character is remembered
            used = set()
            used.add(x)
    return ret
print(unique('aaaaabbbbbcda'))
out:
['a', 'b', 'c', 'd', 'a']
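The unique() function above effectively collapses runs of consecutive equal characters. The standard library's itertools.groupby() groups consecutive equal items, so the same result can be sketched in one line (a swapped-in alternative, not the author's code; collapse_runs is an illustrative name):

```python
from itertools import groupby


def collapse_runs(s):
    # groupby yields one (key, group) pair per run of equal consecutive items
    return [key for key, _ in groupby(s)]


print(collapse_runs('aaaaabbbbbcda'))  # ['a', 'b', 'c', 'd', 'a']
print(''.join(collapse_runs('good')))  # god
```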

Print the first, second occurred character in a list

I am working on a simple algorithm that prints the first character to occur twice or more (the one whose repeat comes earliest).
For example:
string = 'abcabc'
output = a
string = 'abccba'
output = c
string = 'abba'
output = b
What I have done is:
string = 'abcabc'
s = []
for x in string:
    if x in s:
        print(x)
        break
    else:
        s.append(x)
output: a
But its time complexity is O(n^2); how can I do this in O(n)?

Change s = [] to s = set() (and, correspondingly, append to add). The in operator on a set is O(1) on average, unlike in on a list, which scans sequentially.
Alternatively, with regular expressions (O(n^2), but rather fast and easy):
import re
match = re.search(r'(.).*\1', string)
if match:
    print(match.group(1))
The regular expression (.).*\1 means "any character, which we'll remember for later, any number of intervening characters, then the remembered character again". Since the regex is scanned left to right, it finds a in "abba" rather than b: it returns the leftmost character that repeats anywhere later, which is not the same as the character whose repeat comes first in your examples.
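To compare the two approaches on the examples from the question, here is a sketch wrapping each in a function (first_repeat and first_repeat_regex are illustrative names): the set version returns the character whose repeat appears earliest, while the regex returns the leftmost character that occurs again later.

```python
import re


def first_repeat(s):
    """Character whose second occurrence comes first, or None (O(n) on average)."""
    seen = set()
    for ch in s:
        if ch in seen:
            return ch
        seen.add(ch)
    return None


def first_repeat_regex(s):
    """Leftmost character that occurs again later in s, or None."""
    match = re.search(r'(.).*\1', s)
    return match.group(1) if match else None


for s in ['abcabc', 'abccba', 'abba']:
    print(s, first_repeat(s), first_repeat_regex(s))
```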

Use dictionaries:
string = 'abcabc'
s = {}
for x in string:
    if x in s:
        print(x)
        break
    else:
        s[x] = 0
or use sets:
string = 'abcabc'
s = set()
for x in string:
    if x in s:
        print(x)
        break
    else:
        s.add(x)
Both dictionaries and sets are hash-based, so insertions and membership tests are O(1) on average, which makes the whole loop O(n).

Looping through string to only append integers to list [duplicate]

I am new to Python and I have a string from which I want to extract the numbers. For example:
str1 = "3158 reviews"
print(re.findall(r'\d+', str1))
Output is ['4', '3']
I want to get only 3158, preferably as an integer, not as a list.

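You can filter the string by digits using the str.isdigit method (in Python 2, filter() on a string returns a string):
>>> int(filter(str.isdigit, str1))
3158
For Python 3, where filter() returns an iterator, join the digits first:
int(''.join(filter(str.isdigit, str1)))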

This code works fine. There must be some other problem:
>>> import re
>>> str1 = "3158 reviews"
>>> print(re.findall(r'\d+', str1))
['3158']

Your regex looks correct. Are you sure you haven't made a mistake with the variable names? In your code above you mix up total_hotel_reviews_string and str.
>>> import re
>>> s = "3158 reviews"
>>> print(re.findall(r"\d+", s))
['3158']

IntVar = int("".join(filter(str.isdigit, StringVar)))

You were quite close to the final answer. re.findall returns every match of the pattern, with or without enclosing parentheses:
re.findall(r'(\d+)', str1)
For a more general string like str1 = "3158 reviews, 432 users", this code yields:
Output: ['3158', '432']
Now, to obtain integers, you can map the int function over the strings:
A = list(map(int, re.findall(r'(\d+)', str1)))
Alternatively, you can use a list comprehension:
A = [int(x) for x in re.findall(r'(\d+)', str1)]
Both methods are equally correct. They yield A = [3158, 432].
The final result for the original question is the first entry of the list A, so we arrive at either of these expressions:
result = list(map(int, re.findall(r'(\d+)', str1)))[0]
result = int(re.findall(r'(\d+)', str1)[0])
Even if there is only one number in str1, re.findall still returns a list, so you need to retrieve the first element A[0] manually.

To extract a single number from a string you can use re.search(), which returns the first match (or None):
>>> import re
>>> string = '3158 reviews'
>>> int(re.search(r'\d+', string).group(0))
3158
In Python 3.6+ you can also index into a match object instead of using group():
>>> int(re.search(r'\d+', string)[0])
3158
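re.search() returns None when the string contains no digits, so calling .group() on the result directly raises AttributeError. A minimal sketch that returns a caller-supplied default instead (first_int and its default parameter are hypothetical):

```python
import re


def first_int(s, default=None):
    # re.search returns the first match object, or None if there is no match
    match = re.search(r'\d+', s)
    return int(match.group(0)) if match else default


print(first_int('3158 reviews'))    # 3158
print(first_int('no reviews yet'))  # None
```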

If the format is that simple (a space separates the number from the rest), then
int(str1.split()[0])
would do it.

For more complex cases, such as signed and decimal numbers embedded in text:
import re
str1 = "sg-23.0 300sdf343fc -34rrf-3.4r"  # numbers of various kinds mixed with letters
num = [float(s) for s in re.findall(r'-?\d+\.?\d*', str1)]
print(num)
Output:
[-23.0, 300.0, 343.0, -34.0, -3.4]
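The pattern above needs at least one digit before any decimal point and does not recognise exponents, so '.5' comes out as 5.0 and '1e3' as two separate numbers. A sketch of a broader pattern, assuming plain ASCII numbers with an optional sign and optional exponent (NUMBER and extract_floats are hypothetical names):

```python
import re

# Optional sign, then digits with an optional fraction or a bare fraction,
# then an optional exponent
NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')


def extract_floats(text):
    return [float(m) for m in NUMBER.findall(text)]


print(extract_floats('x=.5 and 1e3, -2.5E-2'))  # [0.5, 1000.0, -0.025]
print(extract_floats('sg-23.0 300sdf343fc -34rrf-3.4r'))
```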

The above solutions seem to assume integers. Here's a minor modification to allow decimals:
num = float("".join(filter(lambda d: str.isdigit(d) or d == '.', inputString)))
(It doesn't account for a - sign, and it assumes any period is properly placed in the digit string rather than being an English-language period lying around. It's not built to be indestructible, but it worked for my data.)

Python 2.7:
>>> str1 = "3158 reviews"
>>> int(filter(str.isdigit, str1))
3158
Python 3:
>>> str1 = "3158 reviews"
>>> int(''.join(filter(str.isdigit, str1)))
3158

There may be a small problem with the code from Vishnu's answer: if there are no digits in the string, it raises a ValueError. Here is my suggestion to avoid this (joining the filtered characters first makes it work in both Python 2 and 3):
>>> digit = lambda x: int(''.join(filter(str.isdigit, x)) or 0)
>>> digit('3158 reviews')
3158
>>> digit('reviews')
0

For Python 3:
input_str = '21ddd3322'
int(''.join(filter(str.isdigit, input_str)))
> 213322

a = []
line = "abcd 3455 ijkl 56.78 ij"
for word in line.split():
    try:
        a.append(float(word))
    except ValueError:
        pass
print(a)
OUTPUT
[3455.0, 56.78]

I am a beginner in coding, and this is my attempt to answer the question. I used Python 3.7 without importing any libraries.
This code extracts decimal numbers from a string made of groups of characters separated by blanks (words) and prints each one.
Attention: if there is more than one number, each one is printed, and a ends up holding the last value.
line = input('Please enter your string ')
for word in line.split():
    try:
        a = float(word)
        print(a)
    except ValueError:
        pass

My answer does not require any additional libraries, and it's easy to understand. But note that if there's more than one number inside a string, my code returns everything from the first digit to the last one, including any characters in between.
def search_number_string(string):
    index_list = []
    for i, x in enumerate(string):
        if x.isdigit():
            index_list.append(i)
    if not index_list:
        return ''  # no digits found
    start = index_list[0]
    end = index_list[-1] + 1
    number = string[start:end]
    return number

# Use this for extracting numbers from a string in general.
# To get all the numeric occurrences:
* Use split() to convert the string to a list of words, then a list comprehension
to iterate through that list,
and str.isdigit() to keep only the words that are numbers.
Getting numbers from a string
using a list comprehension + isdigit():
test_string = "i have four balloons for 2 kids"
print("The original string : " + test_string)
# list comprehension + isdigit() + split()
res = [int(i) for i in test_string.split() if i.isdigit()]
print("The numbers list is : " + str(res))
# To extract numeric values from a string in Python:
* Find the list of all integer numbers in the string, separated by lower-case characters, using the re.findall(expression, string) method.
* Convert each number from a string into an integer and then find the max of them.
import re
def extractMax(s):
    # get a list of all numbers separated by lower-case characters;
    # \d+ is a regular expression which means one or more digits
    numbers = re.findall(r'\d+', s)
    # convert to int before comparing; default=None covers strings with no digits (Python 3.4+)
    print(max(map(int, numbers), default=None))
if __name__ == "__main__":
    s = 'sting'
    extractMax(s)

You can use the method below to extract all the digits from a string.
def extract_numbers_from_string(string):
    number = ''
    for i in string:
        try:
            number += str(int(i))
        except ValueError:
            pass
    return number
(Or) you could use i.isdigit() or i.isnumeric() (str.isnumeric() is available in all Python 3 versions):
def extract_numbers_from_string(string):
    number = ''
    for i in string:
        if i.isnumeric():
            number += str(int(i))
    return number
a = '343fdfd3'
print(extract_numbers_from_string(a))
# 3433

Using a list comprehension and Python 3:
>>> int("".join([c for c in str1 if str.isdigit(c)]))
3158
