Need to remove quotes inside the array of a string using Python

input = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"
expected output:
"'Siva', [Aswin,latha], 'Senthil',[Aswin,latha]"
I have used a positive lookbehind and lookahead, but it's not working.
Pattern:
(?<=\[)\'+(?=\])

We can use re.sub here with a callback lambda function:
import re
inp = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"
output = re.sub(r'\[.*?\]', lambda x: x.group().replace("'", ""), inp)
print(output)
This prints:
'Siva', [Aswin,latha], 'Senthil',[Aswin,latha]

import re
input = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"
print(re.sub(r"(?<=\[).*?(?=\])", lambda val: re.sub(r"'(\w+?)'", r"\1", val.group()), input))
# 'Siva', [Aswin,latha], 'Senthil',[Aswin,latha]

You can try something like this if you don't want to import re:
X = eval("'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']")
Y = []
for x in X:
    Y.append(f"[{', '.join(x)}]" if isinstance(x, list) else f"'{x}'")
print(", ".join(Y))
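A minimal sketch of the same idea with ast.literal_eval swapped in for eval, assuming the input is always a valid Python literal. literal_eval only parses literals, so it does not execute arbitrary code. The helper name format_items is hypothetical. Note that joining with ', ' puts a space after every comma, so the spacing differs slightly from the expected output in the question.

```python
import ast

def format_items(text):
    # Parse the string as a Python literal (here: a tuple of str and list).
    items = ast.literal_eval(text)
    parts = []
    for item in items:
        if isinstance(item, list):
            # List elements are written without quotes.
            parts.append("[" + ", ".join(item) + "]")
        else:
            # Top-level strings keep their single quotes.
            parts.append("'" + item + "'")
    return ", ".join(parts)

result = format_items("'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']")
print(result)
# 'Siva', [Aswin, latha], 'Senthil', [Aswin, latha]
```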

You can use re.findall with an alternation pattern to find either fragments between ] and [, or otherwise non-single-quote characters, and then join the fragments together with ''.join:
''.join(re.findall(r"[^\]]*\[|\][^\[]*|[^']+", input))
Demo: https://replit.com/#blhsing/ClassicFrostyBookmark
This is generally more efficient than using re.sub with a callback since there is overhead involved in making a callback for each match.
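A small sketch that puts the re.sub callback and the re.findall alternation side by side on the question's input, to check that they agree. The function names with_callback and with_findall are hypothetical, and no timing is measured here, so the efficiency claim is not verified by this example.

```python
import re

text = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"

def with_callback(s):
    # Strip quotes only inside each [...] match.
    return re.sub(r"\[.*?\]", lambda m: m.group().replace("'", ""), s)

def with_findall(s):
    # Keep "...[" and "]..." fragments whole; elsewhere keep only non-quote runs.
    return "".join(re.findall(r"[^\]]*\[|\][^\[]*|[^']+", s))

print(with_callback(text))
print(with_findall(text))
# Both print: 'Siva', [Aswin,latha], 'Senthil',[Aswin,latha]
```

The two are not interchangeable on every input: for a string with no brackets at all, such as "'a'", the findall version drops the quotes while the callback version leaves the string unchanged.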

Related

Python finding multiple strings

I've a list ['test_x', 'text', 'x']. Is there any way to use regex in python to find if the string inside the list contains either '_x' or 'x'?
'x' should match only as a standalone string, not as part of a word unless it is preceded by '_'.
The output should result in ['test_x', 'x'].
Thanks.
Using a one-line list comprehension:
l = ['test_x', 'text', 'x']
result = [i for i in l if '_x' in i or 'x' == i]
You can use regexp this way:
import re
print(list(filter(lambda x: re.findall(r'_x|^x$',x),l)))
The regexp searches for '_x' anywhere in an element, or for an element that is exactly 'x'. filter() applies the function to each element of the iterable and keeps the elements for which it returns a truthy value.
You can make your expression more generic this way:
print(list(filter(lambda x: re.findall(r'[^A-Za-z]x|^\W*x\W*$',x),l)))
Here I am telling Python to search for an x that is preceded by a character other than a letter (A to Z or a to z), OR for a string that consists of x surrounded by zero or more non-word characters. You can refer to this quick cheatsheet on regular expressions: https://www.debuggex.com/cheatsheet/regex/python
[re.findall('x|_x', s) for s in your_list]
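As a sketch of the regex approach, the pattern can be compiled once and used with search in a list comprehension. The names PATTERN and select are hypothetical.

```python
import re

# '_x' anywhere in the string, or the whole string being exactly 'x'.
PATTERN = re.compile(r"_x|^x$")

def select(items):
    # Keep only the strings in which the pattern is found.
    return [s for s in items if PATTERN.search(s)]

print(select(['test_x', 'text', 'x']))
# ['test_x', 'x']
```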

Regex: Split characters with "/"

I have these strings, for example:
['2300LO/LCE','2302KO/KCE']
I want to have output like this:
['2300LO','2300LCE','2302KO','2302KCE']
How can I do it with Regex in Python?
Thanks!
You can make a simple generator that yields the pairs for each string. Then you can flatten them into a single list with itertools.chain()
import re
from itertools import product, chain
def getCombos(s):
    nums, code = re.match(r'(\d+)(.*)', s).groups()
    for pair in product([nums], code.split("/")):
        yield ''.join(pair)
a = ['2300LO/LCE','2302KO/KCE']
list(chain.from_iterable(map(getCombos, a)))
# ['2300LO', '2300LCE', '2302KO', '2302KCE']
This has the added side benefit of working with strings like '2300LO/LCE/XX/CC' which will give you ['2300LO', '2300LCE', '2300XX', '2300CC',...]
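The same idea can be sketched without itertools, assuming every string starts with a run of digits followed by '/'-separated codes. The names expand and expand_all are hypothetical, and returning the string unchanged when it has no leading digits is an assumption, not something the question specifies.

```python
import re

def expand(s):
    # Split the leading digits from the rest of the string.
    m = re.match(r"(\d+)(.*)", s)
    if m is None:
        # Assumption: strings without leading digits are passed through.
        return [s]
    prefix, codes = m.groups()
    # Attach the numeric prefix to every '/'-separated code.
    return [prefix + code for code in codes.split("/")]

def expand_all(items):
    out = []
    for s in items:
        out.extend(expand(s))
    return out

print(expand_all(['2300LO/LCE', '2302KO/KCE']))
# ['2300LO', '2300LCE', '2302KO', '2302KCE']
```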
You can try something like this:
import re
list1 = ['2300LO/LCE','2302KO/KCE']
list2 = []
for x in list1:
    a = x.split('/')
    tmp = re.findall(r'\d+', a[0]) # extracting digits
    list2.append(a[0])
    list2.append(tmp[0] + a[1])
print(list2)
This can be implemented with simple string splits.
Since you asked the output with regex, here is your answer.
list1 = ['2300LO/LCE','2302KO/KCE']
import re
r = re.compile("([0-9]{1,4})([a-zA-Z].*)/([a-zA-Z].*)")
out = []
for s in list1:
    items = r.findall(s)[0]
    out.append(items[0]+items[1])
    out.append(items[2])
print(out)
The explanation for the regex: (a number of one to four digits), followed by (a letter and any further characters), followed by a / and (the rest of the characters, starting with a letter).
They are grouped with (), so that when you use findall, they come back as individual elements of a tuple.

python regular expression split function issue

I'm using python2 and I want to get rid of these empty strings in the output of the following python regular expression:
import re
x = "010101000110100001100001"
print re.split("([0-1]{8})", x)
and the output is this :
['', '01010100', '', '01101000', '', '01100001', '']
I just want to get this output:
['01010100', '01101000', '01100001']
Regex probably isn't what you want to use in this case. It seems that you want to just split the string into groups of n (8) characters.
I poached an answer from this question.
def split_every(n, s):
    return [ s[i:i+n] for i in xrange(0, len(s), n) ]
split_every(8, "010101000110100001100001")
Out[2]: ['01010100', '01101000', '01100001']
One possible way:
print filter(None, re.split("([0-1]{8})", x))
import re
x = "010101000110100001100001"
l = re.split("([0-1]{8})", x)
l2 = [i for i in l if i]
out:
['01010100', '01101000', '01100001']
This is exactly what split is for: it splits the string using the regular expression as a separator.
If you need to find all matches, try findall instead:
import re
x = "010101000110100001100001"
print(re.findall("([0-1]{8})", x))
print([a for a in re.split("([0-1]{8})", x) if a != ''])
Following your regex approach, you can simply use a filter to get your desired output.
import re
x = "010101000110100001100001"
unfiltered_list = re.split("([0-1]{8})", x)
print filter(None, unfiltered_list)
If you run this, you should get:
['01010100', '01101000', '01100001']
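A short Python 3 sketch of why the empty strings appear and how findall avoids them. The variable name chunks is hypothetical.

```python
import re

x = "010101000110100001100001"

# With a capturing group, re.split also returns the separators themselves,
# so the empty strings are the (empty) text between consecutive matches.
print(re.split(r"([01]{8})", x))
# ['', '01010100', '', '01101000', '', '01100001', '']

# findall returns only the matches, so there is nothing to filter out.
chunks = re.findall(r"[01]{8}", x)
print(chunks)
# ['01010100', '01101000', '01100001']
```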

how to change the case of first letter of a string?

s = ['my', 'name']
I want to change the first letter of each element to upper case.
s = ['My', 'Name']
Both .capitalize() and .title() change the other letters in the string to lower case.
Here is a simple function that only changes the first letter to upper case, and leaves the rest unchanged.
def upcase_first_letter(s):
    return s[0].upper() + s[1:]
You can use the capitalize() method:
s = ['my', 'name']
s = [item.capitalize() for item in s]
print s # print(s) in Python 3
This will print:
['My', 'Name']
You can use 'my'.title() which will return 'My'.
To get over the complete list, simply map over it like this:
>>> map(lambda x: x.title(), s)
['My', 'Name']
Actually, .title() makes all words start with uppercase. If you want to strictly limit it to the first letter, use capitalize() instead. (This makes a difference, for example, in 'this word' being changed to either This Word or This word.)
It probably doesn't matter, but you might want to use this instead of the capitalize() or title() string methods because, in addition to uppercasing the first letter, they also lowercase the rest of the string (and this doesn't):
s = map(lambda e: e[:1].upper() + e[1:] if e else '', s)
Note: In Python 3, you'd need to use:
s = list(map(lambda e: e[:1].upper() + e[1:] if e else '', s))
because in Python 3 map() returns an iterator that applies the function to every item of the iterable, instead of a list as it did in Python 2 (so you have to turn it into a list yourself).
You can use
for i in range(len(s)):
    s[i]=s[i].capitalize()
print s
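A quick Python 3 sketch of how the three approaches differ on a mixed-case string. This variant of upcase_first_letter uses s[:1] instead of s[0] so that an empty string does not raise an IndexError; the sample string is made up for illustration.

```python
def upcase_first_letter(s):
    # s[:1] is '' for an empty string, so this is safe on empty input.
    return s[:1].upper() + s[1:]

word = "mY name"
print(word.capitalize())          # My name  (rest is lowercased)
print(word.title())               # My Name  (every word is capitalized)
print(upcase_first_letter(word))  # MY name  (rest is left unchanged)

print([upcase_first_letter(w) for w in ['my', 'name']])
# ['My', 'Name']
```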

How do I coalesce a sequence of identical characters into just one?

Suppose I have this:
My---sun--is------very-big---.
I want to replace all multiple hyphens with just one hyphen.
import re
astr='My---sun--is------very-big---.'
print(re.sub('-+','-',astr))
# My-sun-is-very-big-.
If you want to replace any run of consecutive characters, you can use
>>> import re
>>> a = "AA---BC++++DDDD-EE$$$$FF"
>>> print(re.sub(r"(.)\1+",r"\1",a))
A-BC+D-E$F
If you only want to coalesce non-word-characters, use
>>> print(re.sub(r"(\W)\1+",r"\1",a))
AA-BC+DDDD-EE$FF
If it's really just hyphens, I recommend unutbu's solution.
If you really only want to coalesce hyphens, use the other suggestions. Otherwise you can write your own function, something like this:
>>> def coalesce(x):
...     n = []
...     for c in x:
...         if not n or c != n[-1]:
...             n.append(c)
...     return ''.join(n)
...
>>> coalesce('My---sun--is------very-big---.')
'My-sun-is-very-big-.'
>>> coalesce('aaabbbccc')
'abc'
As usual, there's a nice itertools solution, using groupby:
>>> from itertools import groupby
>>> s = 'aaaaa----bbb-----cccc----d-d-d'
>>> ''.join(key for key, group in groupby(s))
'a-b-c-d-d-d'
How about:
>>> import re
>>> re.sub("-+", "-", "My---sun--is------very-big---.")
'My-sun-is-very-big-.'
The regular expression "-+" will look for 1 or more "-".
re.sub('-+', '-', "My---sun--is------very-big---")
How about an alternate without the re module:
'-'.join(filter(lambda w: len(w) > 0, 'My---sun--is------very-big---.'.split("-")))
Or going with Tim and FogleBird's previous suggestion, here's a more general method:
def coalesce_factory(x):
    return lambda sent: x.join(filter(lambda w: len(w) > 0, sent.split(x)))
hyphen_coalesce = coalesce_factory("-")
hyphen_coalesce('My---sun--is------very-big---.')
Though personally, I would use the re module first :)
Another simple solution is the String object's replace function.
while '--' in astr:
    astr = astr.replace('--','-')
If you don't want to use regular expressions:
my_string = my_string.split('-')
my_string = filter(None, my_string)
my_string = '-'.join(my_string)
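The regex and split/filter/join approaches give the same result on the question's string, but they differ when the run of hyphens is at the start or end. A minimal Python 3 sketch; the function names coalesce_regex and coalesce_split are hypothetical.

```python
import re

def coalesce_regex(s, ch="-"):
    # Replace each run of one or more `ch` with a single `ch`.
    return re.sub(re.escape(ch) + "+", lambda m: ch, s)

def coalesce_split(s, ch="-"):
    # Split on `ch`, drop the empty pieces, and join with a single `ch`.
    return ch.join(part for part in s.split(ch) if part)

text = 'My---sun--is------very-big---.'
print(coalesce_regex(text))   # My-sun-is-very-big-.
print(coalesce_split(text))   # My-sun-is-very-big-.

# Leading and trailing runs are kept (as one hyphen) by the regex version
# but dropped entirely by the split version.
print(coalesce_regex('--a--b--'))  # -a-b-
print(coalesce_split('--a--b--'))  # a-b
```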
I have
my_str = 'a, b,,,,, c, , , d'
I want
'a,b,c,d'
Remove all the blanks (the "replace" bit), then split on the comma, then join the non-empty pieces with a comma in between:
my_str_2 = ','.join([i for i in my_str.replace(" ", "").split(',') if i])
