python split string number in digits - python

Hi, I would like to split the string "1234" into ['1', '2', '3', '4'] in Python.
My current approach uses the re module:
import re
re.compile(r'(\d)').split("1234")
['', '1', '', '2', '', '3', '', '4', '']
But I get some extra empty strings. I am not an expert in regular expressions; what would be a proper regular expression in Python to accomplish this?
Any advice would be appreciated.

Simply use the list function, like this:
>>> list("1234")
['1', '2', '3', '4']
The list function iterates the string, and creates a new list with all the characters in it.
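For comparison with the regex attempt in the question, here is a minimal sketch of three ways to get the per-digit result. The variable names are only illustrative. Note that re.findall with \d would also skip any non-digit characters, which list() does not.

```python
import re

text = "1234"

# list() iterates over the string and collects each character.
chars = list(text)

# re.findall returns every non-overlapping match, so r"\d" yields one
# entry per digit without the empty strings that re.split produces.
digits = re.findall(r"\d", text)

# If integers are wanted rather than one-character strings:
numbers = [int(c) for c in text]

print(chars)    # ['1', '2', '3', '4']
print(digits)   # ['1', '2', '3', '4']
print(numbers)  # [1, 2, 3, 4]
```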

Strings are already sequences of characters, so you can iterate over them and slice them directly:
>>> nums = "1234"
>>> for i in nums:
...     print(i)
...
1
2
3
4
>>> nums[:-1]
'123'

Related

Regex for split or findall each digit python

What is the best way to split this string into a list of runs of consecutive identical digits?
My solution:
>>> import re
>>> s = '2223334441214844'
>>> list(filter(None, re.split("(0+)|(1+)|(2+)|(3+)|(4+)|(5+)|(6+)|(7+)|(8+)|(9+)", s)))
['222', '333', '444', '1', '2', '1', '4', '8', '44']
The more flexible way would be to use itertools.groupby which is made to match consecutive groups in iterables:
>>> s = '2223334441214844'
>>> import itertools
>>> [''.join(group) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
The key would be the single key that is being grouped on (in your case, the digit). And the group is an iterable of all the items in the group. Since the source iterable is a string, each item is a character, so in order to get back the fully combined group, we need to join the characters back together.
You could also repeat the key for the length of the group to get this output:
>>> [key * len(list(group)) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
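To make the key/group relationship more concrete, here is a small sketch that returns each run as a (character, length) pair. The function name run_lengths is made up for this example.

```python
from itertools import groupby

def run_lengths(s):
    """Return (character, run length) pairs for consecutive runs in s."""
    # Each group is a one-shot iterator, so count its items while iterating.
    return [(key, sum(1 for _ in group)) for key, group in groupby(s)]

pairs = run_lengths('2223334441214844')
print(pairs)

# Rebuilding the runs from the pairs gives the same list as above.
rebuilt = [key * count for key, count in pairs]
print(rebuilt)
```

The pairs form is handy if you need the counts as well as the substrings.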
If you wanted to use regular expressions, you could make use of backreferences to find consecutive characters without having to specify them explicitly:
>>> re.findall('((.)\\2*)', s)
[('222', '2'), ('333', '3'), ('444', '4'), ('1', '1'), ('2', '2'), ('1', '1'), ('4', '4'), ('8', '8'), ('44', '4')]
For finding consecutive characters in a string, this is essentially what groupby does. You can then keep only the combined match (the first item of each tuple) to get the desired result:
>>> [x for x, *_ in re.findall('((.)\\2*)', s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
One solution without regex (that is not specific to digits) would be to use itertools.groupby():
>>> from itertools import groupby
>>> s = '2223334441214844'
>>> [''.join(g) for _, g in groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
If you only need to extract consecutive identical digits, you may use a matching approach using r'(\d)\1*' regex:
import re
s='2223334441214844'
print([x.group() for x in re.finditer(r'(\d)\1*', s)])
# => ['222', '333', '444', '1', '2', '1', '4', '8', '44']
Here,
(\d) - matches and captures into Group 1 any digit
\1* - a backreference to Group 1 matching the same value, 0+ repetitions.
This solution can be customized to match any specific consecutive chars (instead of \d, you may use \S - non-whitespace, \w - word, [a-fA-F] - a specific set, etc.). If you replace \d with . and use re.DOTALL modifier, it will work as the itertools solutions posted above.
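As a rough sketch of that customization, the character class can be passed in as a parameter. The helper name runs and its parameters are assumptions for illustration, and char_class is expected to be a pattern that matches exactly one character.

```python
import re

def runs(s, char_class=r"\d", flags=0):
    """Return runs of one repeated character that matches char_class."""
    # Group 1 captures a single character; \1* repeats that same character.
    pattern = re.compile(r"(%s)\1*" % char_class, flags)
    return [m.group() for m in pattern.finditer(s)]

digit_runs = runs("2223334441214844")
word_runs = runs("aaBBB  cc", r"\S")          # whitespace is skipped
any_runs = runs("aa\n\nb", r".", re.DOTALL)   # DOTALL lets . match newlines

print(digit_runs)
print(word_runs)
print(any_runs)
```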
Use a capture group and backreference.
import re
s = '2223334441214844'
print([i[0] for i in re.findall(r'((\d)\2*)', s)])
\2 matches whatever the (\d) capture group matched. The list comprehension is needed because when the RE contains capture groups, findall returns a list of the capture groups, not the whole match. So we need an extra group to get the whole match, and then need to extract that group from the result.
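The difference in what findall returns, depending on the number of capture groups, can be seen side by side in this short sketch (variable names are illustrative):

```python
import re

s = '2223334441214844'

# No groups: findall returns the full matches.
no_groups = re.findall(r'\d+', '12 ab 345')

# One group: findall returns only that group's text, not the whole run.
one_group = re.findall(r'(\d)\1*', s)

# Two groups: findall returns a tuple of both groups per match.
two_groups = re.findall(r'((\d)\2*)', s)

# finditer yields match objects, and .group() is always the whole match,
# so no extra outer group is needed.
whole = [m.group() for m in re.finditer(r'(\d)\1*', s)]

print(no_groups)
print(one_group)
print(two_groups[:2])
print(whole)
```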
What about doing it without importing any module?
You can write your own logic in pure Python. Here is a recursive approach:
string_1='2223334441214844'
list_2=[i for i in string_1]
def con(list_1):
    group = []
    if not list_1:
        return 0
    else:
        track=list_1[0]
        for j,i in enumerate(list_1):
            if i==track[0]:
                group.append(i)
            else:
                # run ended: print it, then recurse on the rest of the list
                print(group)
                return con(list_1[j:])
        return group
print(con(list_2))
output:
['2', '2', '2']
['3', '3', '3']
['4', '4', '4']
['1']
['2']
['1']
['4']
['8']
['4', '4']
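The recursive version prints most groups instead of returning them. If you want all runs back as one list, an iterative variant might look like the sketch below (still no imports; the name split_runs is made up for this example):

```python
def split_runs(s):
    """Split s into runs of identical consecutive characters."""
    runs = []
    for ch in s:
        if runs and runs[-1][-1] == ch:
            runs[-1] += ch      # same character: extend the current run
        else:
            runs.append(ch)     # different character: start a new run
    return runs

result = split_runs('2223334441214844')
print(result)
```

An empty input simply gives an empty list, and there is no recursion depth to worry about for long strings.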

Convert a String in List format into List - python 3 [duplicate]

This question already has answers here:
Convert string to list. Python [string.split() acting weird]
(2 answers)
Closed 5 years ago.
In python3, I'd like to turn a string, like this:
my_str = "['1', '2', '3', '4', '72']"
into a list, like this:
my_list = ['1', '2', '3', '4', '72']
Is there a simple way to do this?
Many thanks, y'all.
Use ast.literal_eval:
>>> import ast
>>> my_str = "['1', '2', '3', '4', '72']"
>>> ast.literal_eval(my_str)
['1', '2', '3', '4', '72']
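This is a much safer option than eval(): literal_eval only accepts Python literals (strings, numbers, lists, dicts and so on) and raises an exception for anything else instead of executing it.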
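A small sketch of what "fails" means in practice. The wrapper name parse_list is hypothetical, and catching ValueError and SyntaxError is an assumption that covers the inputs shown here rather than every possible failure.

```python
import ast

def parse_list(text):
    """Parse text as a Python literal; return None if it is not one."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

ok = parse_list("['1', '2', '3', '4', '72']")
not_literal = parse_list("__import__('os').getcwd()")  # a call, not a literal
malformed = parse_list("[1', '2']")                    # unbalanced quote

print(ok)
print(not_literal)
print(malformed)
```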
import re
my_str = "[1', '2', '3', '4', '72']"
re.compile(r'(\d+)').findall(my_str)
['1', '2', '3', '4', '72']
Note: with re you get the desired output even if a quote is missing from the string, as in the example above.

How to use a dictionary to print its partner value

Not sure if the title is specific enough.
words = ['sense', 'The', 'makes', 'sentence', 'perfect', 'sense', 'now']
numbers = ['1', '2', '3', '4', '5', '6']
dictionary = dict(zip(numbers, words))
print(dictionary)
correctorder = ['2', '4', '7', '3', '5', '6']
I'm simply trying to figure out how exactly I can print specific values from the dictionary using the correctorder array so that the sentence makes sense.
You can just iterate over correctorder and get the corresponding dict value, then join the result together.
' '.join(dictionary[ele] for ele in correctorder)
This is assuming that you fix numbers to include '7' at the end.
>>> ' '.join(dictionary[ele] for ele in correctorder)
'The sentence now makes perfect sense'
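Putting it together as a runnable sketch, assuming numbers is extended with '7' as described. The second part shows what happens with the original six keys: zip stops at the shorter list, so the last word is dropped, and dict.get with a placeholder avoids the KeyError.

```python
words = ['sense', 'The', 'makes', 'sentence', 'perfect', 'sense', 'now']
numbers = ['1', '2', '3', '4', '5', '6', '7']  # '7' added so every word has a key
dictionary = dict(zip(numbers, words))
correctorder = ['2', '4', '7', '3', '5', '6']

sentence = ' '.join(dictionary[key] for key in correctorder)
print(sentence)

# With only six keys, '7' is missing from the dictionary;
# .get() substitutes a placeholder instead of raising KeyError.
short = dict(zip(numbers[:6], words))
partial = ' '.join(short.get(key, '?') for key in correctorder)
print(partial)
```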
What you want is this.
for i in correctorder:
    print(dictionary[i], end=" ")
Short and simple. As Mitch said, fix the 7 though.
You could use operator.itemgetter to avoid an explicit loop:
>>> from operator import itemgetter
>>> print(itemgetter(*correctorder)(dictionary))
To concatenate this simply use str.join:
>>> ' '.join(itemgetter(*correctorder)(dictionary))

Capturing multiple optional groups in a regex both repeating and non repeating

I have to match an expression similar to these
STAR 13
STAR 13, 23
STAR 1, 2 and 3 and STAR 1
But only capture the digits.
The number of digits is unspecified.
I've tried with STAR(?:\s*(?:,|and)\s*(#\d+))+
But it doesn't seem to capture the terms exactly.
No other dependencies could be added. Just the re module only.
The problem is a much larger one where STAR is another regular expression which has already been solved. Please don't bother about it and just consider it as a letter combination. Just include the letters STAR in regular expressions.
If you don't know how many digits there are, use r'[0-9]+' to match one or more digits. To capture every number, you can use r'(\d+)'.
Do it with one regex:
re.findall("STAR ([0-9]+),? ?([0-9]+)? ?a?n?d? ?([0-9]+)?",a)
[('13', '', '')]
[('13', '23', '')]
[('1', '2', '3'), ('1', '', '')]
It may be easier and cleaner to do this in two steps. First, put the strings in a list:
tab = ["STAR 13","STAR 13, 23","STAR 1, 2 and 3 and STAR 1"]
list_star = filter(lambda x: re.match("^STAR",x),tab)
for i in list_star:
    print(re.findall(r'\d+', i))
['13']
['13', '23']
['1', '2', '3', '1']
After that, you just need to collect the results in a new list, for example with my_digit += re.findall(r'\d+', i).
In one line:
import functools
import re
tab = ["STAR 13","STAR 13, 23","STAR 1, 2 and 3 and STAR 1"]
digit=functools.reduce(lambda x,y: x+re.findall(r"\d+",y),filter(lambda x: re.match("^STAR ",x),tab),[])
['13', '13', '23', '1', '2', '3', '1']
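One reason the single-pattern attempt in the question cannot return every number is that Python's re keeps only the last value of a repeated capturing group. A two-step sketch that keeps the numbers grouped per STAR clause, using only re, could look like this. The names CLAUSE and star_numbers are invented for the example, and the literal STAR stands in for the real sub-pattern.

```python
import re

# One whole "STAR n, n and n" clause; STAR stands in for the real sub-pattern.
CLAUSE = re.compile(r'STAR\s+\d+(?:\s*(?:,|and)\s*\d+)*')

def star_numbers(text):
    """Return one list of digit strings per STAR clause found in text."""
    return [re.findall(r'\d+', m.group()) for m in CLAUSE.finditer(text)]

for line in ["STAR 13", "STAR 13, 23", "STAR 1, 2 and 3 and STAR 1"]:
    print(star_numbers(line))
```

The outer pattern only decides where a clause starts and ends; the inner findall does the actual capturing, so the number of digits per clause does not need to be known in advance.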

How to remove specific strings from a list

From the following list, how can I remove the elements ending with Text?
My expected result is a=['1,2,3,4']
My list is a=['1,2,3,4,5Text,6Text']
Should I use endswith for this?
Split on commas, then filter on strings that are only digits:
a = [','.join(v for v in a[0].split(',') if v.isdigit())]
Demo:
>>> a=['1,2,3,4,5Text,6Text']
>>> [','.join(v for v in a[0].split(',') if v.isdigit())]
['1,2,3,4']
It looks as if you really wanted to work with lists of more than one element though, at which point you could just filter:
a = ['1', '2', '3', '4', '5Text', '6Text']
a = list(filter(str.isdigit, a))
or, using a list comprehension (more suitable for Python 3 too):
a = ['1', '2', '3', '4', '5Text', '6Text']
a = [v for v in a if v.isdigit()]
Use str.endswith to filter out such items:
>>> a = ['1,2,3,4,5Text,6Text']
>>> [','.join(x for x in a[0].split(',') if not x.endswith('Text'))]
['1,2,3,4']
Here str.split splits the string at ',' and returns a list:
>>> a[0].split(',')
['1', '2', '3', '4', '5Text', '6Text']
Now filter out items from this list and then join them back using str.join.
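The split, filter and join steps can be wrapped in a small helper, sketched here. The name drop_items and its parameters are made up for the example.

```python
def drop_items(csv_text, suffix):
    """Remove comma-separated items that end with suffix."""
    kept = [item for item in csv_text.split(',') if not item.endswith(suffix)]
    return ','.join(kept)

a = ['1,2,3,4,5Text,6Text']
a = [drop_items(entry, 'Text') for entry in a]
print(a)
```

This also works when the list holds more than one comma-separated string, since each entry is processed on its own.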
Try this. It works whatever text the items end with, and it yields integers rather than strings:
a=['1,2,3,4,5Text,6Text']
a = a[0].split(',')
li = []
for v in a:
    try:
        li.append(int(v))
    except ValueError:
        pass
print(li)
