Python - parsing a string with letters and numbers not blocked together - python

I'm having trouble parsing out a string that contains letters and numbers and getting a list back. For example:
>>> s = '105Bii2016'
>>> foo(s)
['105', 'Bii', '2016']
Right now I can only do it if the numbers are together:
def foo(s):
    num, letter = '', ''
    for i in s:
        if i.isdigit():
            num += i
        else:
            letter += i
    return [letter, num]
And when I call this:
>>> s = '1234gdfh1234'
>>> foo(s)
['gdfh', '12341234']

How about itertools.groupby:
>>> s = '1234gdfh1234'
>>> from itertools import groupby
>>> print [''.join(v) for k,v in groupby(s,str.isdigit)]
['1234', 'gdfh', '1234']
Another solution uses a regex:
>>> import re
>>> print [x for x in re.split(r'(\d+)',s) if x]
['1234', 'gdfh', '1234']

>>> from re import split
>>> s = '1234gdfh1234'
>>> [i for i in split(r'(\d+)',s) if i]
['1234', 'gdfh', '1234']
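For Python 3, where print is a function, both approaches above can be sketched roughly as follows. The function names split_runs_groupby and split_runs_regex are made up for this illustration; note also that str.isdigit and \d can disagree on a few unusual Unicode characters, so the two are only assumed to be equivalent for ordinary ASCII input.

```python
import re
from itertools import groupby


def split_runs_groupby(s):
    """Split s into maximal runs of digits and non-digits."""
    # groupby starts a new group whenever str.isdigit flips
    # between True and False.
    return [''.join(group) for _, group in groupby(s, str.isdigit)]


def split_runs_regex(s):
    """Same idea using re.split with a capturing group."""
    # The capturing group keeps the digit runs in the output; the
    # filter drops the empty strings re.split yields at the edges.
    return [part for part in re.split(r'(\d+)', s) if part]


print(split_runs_groupby('105Bii2016'))  # ['105', 'Bii', '2016']
print(split_runs_regex('1234gdfh1234'))  # ['1234', 'gdfh', '1234']
```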

Related

Getting the nth char of each string in a list of strings

Let's say I have the list ['abc', 'def', 'gh']. I need to build a string from the first char of each string, then the second char of each string, and so on.
So the result would look like this: "adgbehcf". The problem is that the last string in the list could have only one or two chars.
I already tried a nested for loop, but that didn't work.
Code:
n = 3  # The encryption number
for i in range(n):
    x = [s[i] for s in partiallyEncrypted]
    fullyEncrypted.append(x)
A version using itertools.zip_longest:
from itertools import zip_longest
lst = ['abc', 'def', 'gh']
strg = ''.join(''.join(item) for item in zip_longest(*lst, fillvalue=''))
print(strg)
To get an idea of why this works, it may help to look at the tuples that zip_longest produces:
for tpl in zip_longest(*lst, fillvalue=''):
    print(tpl)
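As a small illustration of that loop, here is what the intermediate tuples look like for the example list. The variable names columns and interleaved are only used for this sketch.

```python
from itertools import zip_longest

lst = ['abc', 'def', 'gh']

# Each tuple holds the i-th character of every string; 'gh' has no third
# character, so fillvalue='' pads the last tuple with an empty string.
columns = list(zip_longest(*lst, fillvalue=''))
print(columns)
# [('a', 'd', 'g'), ('b', 'e', 'h'), ('c', 'f', '')]

# Joining each tuple, then joining the pieces, gives the final string.
interleaved = ''.join(''.join(column) for column in columns)
print(interleaved)  # adgbehcf
```

Because the padding is an empty string, it simply disappears when the tuples are joined.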
I guess you can use:
from itertools import izip_longest
l = ['abc', 'def', 'gh']
print "".join(filter(None, [i for sub in izip_longest(*l) for i in sub]))
# adgbehcf
Having:
l = ['abc', 'def', 'gh']
This would work:
s = ''
In [18]: for j in range(0, len(max(l, key=len))):
    ...:     for elem in l:
    ...:         if len(elem) > j:
    ...:             s += elem[j]
In [28]: s
Out[28]: 'adgbehcf'
Please don't use this:
''.join(''.join(y) for y in zip(*x)) +
''.join(y[-1] for y in x if len(y) == max(len(j) for j in x))

Replace in strings of list

I can use re.sub easily in single string like this:
>>> a = "ajhga':&+?%"
>>> a = re.sub('[.!,;+?&:%]', '', a)
>>> a
"ajhga'"
If I use this on list of strings then I am not getting the result. What I am doing is:
>>> a = ["abcd:+;", "(l&'kka)"]
>>> for x in a:
...     x = re.sub('[\(\)&\':+]', '', x)
...
>>> a
['abcd:+;', "(l&'kka)"]
How can I strip expressions from strings in list?
>>> a = ["abcd:+;", "(l&'kka)"]
>>> a = [re.sub('[\(\)&\':+]', '', x) for x in a]
>>> a
['abcd;', 'lkka']
>>>
for index, x in enumerate(a):
    a[index] = re.sub('[\(\)&\':+]', '', x)
You're changing the value but not updating your list. enumerate is a function that returns a tuple (index, value) for each item of the list.
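To make the difference concrete, here is a minimal Python 3 sketch. The names pattern and cleaned are chosen for this example only, and the character class is written as a raw string without the unnecessary backslashes.

```python
import re

a = ["abcd:+;", "(l&'kka)"]
pattern = re.compile(r"[()&':+]")

# Rebinding the loop variable only changes what the name x points to;
# the list still holds the original strings.
for x in a:
    x = pattern.sub('', x)
print(a)  # ['abcd:+;', "(l&'kka)"]

# Building a new list (or assigning to a[index]) actually stores the results.
cleaned = [pattern.sub('', x) for x in a]
print(cleaned)  # ['abcd;', 'lkka']
```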

Split string into tuple (Upper,lower) 'ABCDefgh' . Python 2.7.6

my_string = 'ABCDefgh'
desired = ('ABCD','efgh')
The only way I can think of doing this is a for loop that checks each character in the string individually, appends it to one of two strings, and then creates the tuple. Is there a more efficient way to do this?
It will always be in the format UPPERlower.
print re.split("([A-Z]+)",my_string)[1:]
Simple way (two passes):
>>> import itertools
>>> my_string = 'ABCDefgh'
>>> desired = (''.join(itertools.takewhile(lambda c:c.isupper(), my_string)), ''.join(itertools.dropwhile(lambda c:c.isupper(), my_string)))
>>> desired
('ABCD', 'efgh')
Efficient way (one pass):
>>> my_string = 'ABCDefgh'
>>> uppers = []
>>> done = False
>>> i = 0
>>> while not done:
...     c = my_string[i]
...     if c.isupper():
...         uppers.append(c)
...         i += 1
...     else:
...         done = True
...
>>> lowers = my_string[i:]
>>> desired = (''.join(uppers), lowers)
>>> desired
('ABCD', 'efgh')
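One caveat: the while loop above indexes past the end of the string and raises IndexError if there is no lowercase part at all. The question says the input is always UPPERlower, so that may never matter, but a one-pass variant that tolerates such input could look roughly like this (Python 3; split_upper_lower is a hypothetical helper name):

```python
def split_upper_lower(text):
    """Split text into (leading uppercase run, remainder)."""
    # Find the index of the first character that is not uppercase;
    # fall back to len(text) when every character is uppercase.
    split_at = next(
        (i for i, char in enumerate(text) if not char.isupper()),
        len(text),
    )
    return text[:split_at], text[split_at:]


print(split_upper_lower('ABCDefgh'))  # ('ABCD', 'efgh')
print(split_upper_lower('ABCD'))      # ('ABCD', '')
```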
Because I throw itertools.groupby at everything:
>>> my_string = 'ABCDefgh'
>>> from itertools import groupby
>>> [''.join(g) for k,g in groupby(my_string, str.isupper)]
['ABCD', 'efgh']
(A little overpowered here, but scales up to more complicated problems nicely.)
my_string='ABCDefg'
import re
desired = (re.search('[A-Z]+',my_string).group(0),re.search('[a-z]+',my_string).group(0))
print desired
A more robust approach without using re:
>>> import string
>>> txt = "ABCeUiioualfjNLkdD"
>>> tup = (''.join([char for char in txt if char in string.ascii_uppercase]),
...        ''.join([char for char in txt if char not in string.ascii_uppercase]))
>>> tup
('ABCUNLD', 'eiioualfjkd')
Using char not in string.ascii_uppercase instead of char in string.ascii_lowercase means that you never lose any data if your string has non-letters in it. That can be useful when you suddenly start getting errors because the input is rejected 20 function calls later.

How to remove the punctuation in the middle of a word or numbers?

For example, if I have a string of numbers and a list of words:
My_number = ("5,6!7,8")
My_word =["hel?llo","intro"]
Using str.translate:
>>> from string import punctuation
>>> lis = ["hel?llo","intro"]
>>> [ x.translate(None, punctuation) for x in lis]
['helllo', 'intro']
>>> strs = "5,6!7,8"
>>> strs.translate(None, punctuation)
'5678'
Using regex:
>>> import re
>>> [ re.sub(r'[{}]+'.format(punctuation),'',x ) for x in lis]
['helllo', 'intro']
>>> re.sub(r'[{}]+'.format(punctuation),'', strs)
'5678'
Using a list comprehension and str.join:
>>> ["".join([c for c in x if c not in punctuation]) for x in lis]
['helllo', 'intro']
>>> "".join([c for c in strs if c not in punctuation])
'5678'
Function:
>>> from collections import Iterable
>>> def my_strip(args):
...     if isinstance(args, Iterable) and not isinstance(args, basestring):
...         return [ x.translate(None, punctuation) for x in args]
...     else:
...         return args.translate(None, punctuation)
...
>>> my_strip("5,6!7,8")
'5678'
>>> my_strip(["hel?llo","intro"])
['helllo', 'intro']
Assuming you meant for my_number to be a string,
>>> from string import punctuation
>>> my_number = "5,6!7,8"
>>> my_word = ["hel?llo", "intro"]
>>> remove_punctuation = lambda s: s.translate(None, punctuation)
>>> my_number = remove_punctuation(my_number)
>>> my_word = map(remove_punctuation, my_word)
>>> my_number
'5678'
>>> my_word
['helllo', 'intro']
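Here's a Unicode-aware solution. Po is the Unicode category "Punctuation, other", which covers characters such as ?, ! and the comma; other punctuation falls into the remaining P categories (Pc, Pd, Ps, Pe, Pi, Pf).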
>>> import unicodedata
>>> mystr = "1?2,3!abc"
>>> mystr = "".join([x for x in mystr if unicodedata.category(x) != "Po"])
>>> mystr
'123abc'
You can do it with a regex too, using re.sub. Sadly, the standard library re module doesn't support Unicode categories, so you would have to specify all the characters you want to remove manually. There's a separate library called regex which has such a feature, but it is not part of the standard library.
Using filter + str.isalnum:
>>> filter(str.isalnum, '5,6!7,8')
'5678'
>>> filter(str.isalnum, 'hel?llo')
'helllo'
>>> [filter(str.isalnum, word) for word in ["hel?llo","intro"]]
['helllo', 'intro']
This works only in Python 2. In Python 3, filter always returns an iterator, so you have to do ''.join(filter(str.isalnum, the_text))
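Most of the answers above use the Python 2 form s.translate(None, punctuation), which does not exist in Python 3. A rough Python 3 equivalent, assuming str input, might look like the sketch below. The names ASCII_PUNCT_TABLE, strip_ascii_punctuation and strip_unicode_punctuation are invented for this example. Keep in mind that string.punctuation also contains symbols such as $ and +, which Unicode classifies as symbols rather than punctuation, so the two functions are not interchangeable.

```python
import unicodedata
from string import punctuation

# str.maketrans with three arguments maps every character in the third
# argument to None, so translate() deletes it.
ASCII_PUNCT_TABLE = str.maketrans('', '', punctuation)


def strip_ascii_punctuation(text):
    """Remove the ASCII characters listed in string.punctuation."""
    return text.translate(ASCII_PUNCT_TABLE)


def strip_unicode_punctuation(text):
    """Remove every character whose Unicode category starts with 'P'."""
    return ''.join(
        char for char in text
        if not unicodedata.category(char).startswith('P')
    )


print(strip_ascii_punctuation('5,6!7,8'))      # 5678
print([strip_ascii_punctuation(w) for w in ['hel?llo', 'intro']])
# ['helllo', 'intro']
print(strip_unicode_punctuation('1?2,3!abc'))  # 123abc
```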

How can I remove multiple characters in a list?

Having such list:
x = ['+5556', '-1539', '-99','+1500']
How can I remove + and - in nice way?
This works but I'm looking for more pythonic way.
x = ['+5556', '-1539', '-99', '+1500']
n = 0
for i in x:
    x[n] = i.replace('-','')
    n += 1
n = 0
for i in x:
    x[n] = i.replace('+','')
    n += 1
print x
Edit
+ and - are not always in leading position; they can be anywhere.
Use string.translate(), or for Python 3.x str.translate:
Python 2.x:
>>> import string
>>> identity = string.maketrans("", "")
>>> "+5+3-2".translate(identity, "+-")
'532'
>>> x = ['+5556', '-1539', '-99', '+1500']
>>> x = [s.translate(identity, "+-") for s in x]
>>> x
['5556', '1539', '99', '1500']
Python 2.x unicode:
>>> u"+5+3-2".translate({ord(c): None for c in '+-'})
u'532'
Python 3.x version:
>>> no_plus_minus = str.maketrans("", "", "+-")
>>> "+5-3-2".translate(no_plus_minus)
'532'
>>> x = ['+5556', '-1539', '-99', '+1500']
>>> x = [s.translate(no_plus_minus) for s in x]
>>> x
['5556', '1539', '99', '1500']
Use str.strip() or preferably str.lstrip():
In [1]: x = ['+5556', '-1539', '-99','+1500']
using list comprehension:
In [3]: [y.strip('+-') for y in x]
Out[3]: ['5556', '1539', '99', '1500']
using map():
In [2]: map(lambda x:x.strip('+-'),x)
Out[2]: ['5556', '1539', '99', '1500']
Edit:
Use the str.translate()-based solution by @Duncan if you have + and - in between the numbers as well.
x = [i.replace('-', "").replace('+', '') for i in x]
string.translate() with a deletechars argument only works on byte-string objects, not unicode. I would use re.sub:
>>> import re
>>> x = ['+5556', '-1539', '-99','+1500', '45+34-12+']
>>> x = [re.sub('[+-]', '', item) for item in x]
>>> x
['5556', '1539', '99', '1500', '453412']
These functions clean a list of strings of undesired characters.
lst = ['+5556', '-1539', '-99','+1500']
to_be_removed = "+-"
def remove(elem, to_be_removed):
    """ Remove characters from string"""
    return "".join([char for char in elem if char not in to_be_removed])
def clean_str(lst, to_be_removed):
    """Clean list of strings"""
    return [remove(elem, to_be_removed) for elem in lst]
clean_str(lst, to_be_removed)
# ['5556', '1539', '99', '1500']
basestr ="HhEEeLLlOOFROlMTHEOTHERSIDEooEEEEEE"
def replacer(basestr, toBeRemove, newchar):
    for i in toBeRemove:
        if i in basestr:
            basestr = basestr.replace(i, newchar)
    return basestr
newstring = replacer(basestr,['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'], "")
print(basestr)
print(newstring)
Output :
HhEEeLLlOOFROlMTHEOTHERSIDEooEEEEEE
helloo
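When the replacement is the empty string, as in the call above, the same result can probably be had with a single translation table instead of one replace call per letter. This is a sketch for Python 3; drop_uppercase is a name chosen for this example.

```python
import string

basestr = "HhEEeLLlOOFROlMTHEOTHERSIDEooEEEEEE"

# Translation table that deletes every ASCII uppercase letter.
drop_uppercase = str.maketrans('', '', string.ascii_uppercase)

newstring = basestr.translate(drop_uppercase)
print(newstring)  # helloo
```

The table is built once and can be reused for any number of strings, which avoids scanning the input 26 times.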
