Python: list strip overkill

I just want to remove the '.SI' suffix from each item in the list, but strip() overkills: it also removes any S, I or . characters at either end of the string.
ab = ['abc.SI','SIV.SI','ggS.SI']
[x.strip('.SI') for x in ab]
>> ['abc','V','gg']
The output I want is:
>> ['abc','SIV','ggS']
Is there an elegant way to do it? I'd prefer not to use an explicit for loop, as my list is long.

Why strip()? You can use .replace():
[x.replace('.SI', '') for x in ab]
Output:
['abc', 'SIV', 'ggS']
(This will remove .SI anywhere in the string; have a look at the other answers if you want to remove it only at the end.)
The reason strip() doesn't work is explained in the docs:
The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped
So it strips any of the characters in the argument from both ends of the string, stopping at the first character that is not in that set.
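A small sketch of what that means in practice, using made-up variable names: the argument behaves like a set of characters, so its order is irrelevant, and characters in the middle of the string are never touched.

```python
# strip() treats its argument as a set of characters, not as a substring.
word = 'SIV.SI'

# The order of the characters in the argument makes no difference.
same = word.strip('.SI') == word.strip('IS.')

# Stripping runs inward from each end until a character outside the set is hit.
stripped = word.strip('.SI')            # only 'V' survives
untouched_middle = 'aSIb'.strip('SI')   # 'a' and 'b' shield the middle

print(same, stripped, untouched_middle)
```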

If you want to remove the substring only from the end, the correct way to achieve this will be:
>>> ab = ['abc.SI','SIV.SI','ggS.SI']
>>> sub_string = '.SI'
>>> # the endswith() check makes sure the substring is only removed from the end
>>> [s[:-len(sub_string)] if s.endswith(sub_string) else s for s in ab]
['abc', 'SIV', 'ggS']
This is safer than str.replace() (as mentioned in TrakJohnson's answer), which removes the substring even if it is in the middle of the string. For example:
>>> 'ab.SIrt'.replace('.SI', '')
'abrt'
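On Python 3.9 or newer, str.removesuffix() does the same check-and-slice in one call. A minimal sketch, assuming that version may or may not be available; strip_suffix is a hypothetical fallback helper, and the extra 'ab.SIrt' item is added only to show that a match in the middle is left alone.

```python
import sys

ab = ['abc.SI', 'SIV.SI', 'ggS.SI', 'ab.SIrt']

def strip_suffix(s, suffix):
    """Hypothetical fallback helper: drop suffix only if s ends with it."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

if sys.version_info >= (3, 9):
    # str.removesuffix() was added in Python 3.9.
    cleaned = [s.removesuffix('.SI') for s in ab]
else:
    cleaned = [strip_suffix(s, '.SI') for s in ab]

print(cleaned)  # ['abc', 'SIV', 'ggS', 'ab.SIrt']
```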

Use this: [x[:-3] for x in ab]. Note that it assumes every item ends with the three-character suffix '.SI'.

Use split instead of strip and get the first element:
[x.split('.SI')[0] for x in ab]

Related

Python3: Replace splitted string

This is my string:
VISA1240129006|36283354A|081016860665
I need to replace the first field. For example, I need to get the following string:
FIXED_REPLACED_STRING|36283354A|081016860665
Is there an elegant way to do this using Python 3?
You can do it this way:
>>> l = 'VISA1240129006|36283354A|081016860665'.split('|')
>>> l[0] = 'FIXED_REPLACED_STRING'
>>> l
['FIXED_REPLACED_STRING', '36283354A', '081016860665']
>>> '|'.join(l)
'FIXED_REPLACED_STRING|36283354A|081016860665'
Explanation: first, you split a string into a list. Then, you change what you need in the position(s) you want. Finally, you rebuild the string from such a modified list.
If you need a complete replacement of all the occurrences regardless of their position, check out also the other answers here.
You can use the .replace() method:
l="VISA1240129006|36283354A|081016860665"
l=l.replace("VISA1240129006","FIXED_REPLACED_STRING")
You can use re.sub() from the re module. See the similar question "replace string".
My solution using regex is:
import re
l="VISA1240129006|36283354A|081016860665"
new_l = re.sub(r'^(\w+|)', 'FIXED_REPLACED_STRING', l)
It replaces the first run of word characters, i.e. the first field before the "|" character.
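If only the first field should ever change, the split can be limited to the first separator. A short sketch of that idea using str.partition() and split() with maxsplit=1; the variable names are illustrative only.

```python
s = 'VISA1240129006|36283354A|081016860665'

# partition() cuts at the first '|' only: (head, separator, rest).
_head, sep, rest = s.partition('|')
result = 'FIXED_REPLACED_STRING' + sep + rest

# Equivalent with split(): maxsplit=1 leaves the remaining fields untouched.
parts = s.split('|', 1)
parts[0] = 'FIXED_REPLACED_STRING'
result_split = '|'.join(parts)

print(result)  # FIXED_REPLACED_STRING|36283354A|081016860665
```

Unlike .replace(), neither variant needs to know the old value of the first field.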

python string, delete character, count from right

I have some strings I created with elements coming from many sources, number of elements will vary each time the program is run; I created a sample string that my program creates now.
I want to count back from the end of the following string (something like [:-3]) and delete the last comma:
'{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000,}'
so that my string looks like:
'{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000}'
I just can't quite get there; help appreciated.
To remove the third last character from the string you can use:
string[:-3] + string[-2:]
>>> string = "hellothere"
>>> string[:-3] + string[-2:]
'hellothre'
I would use rsplit to split on the rightmost occurrence of the substring (maxsplit=1 limits it to two results) and then join them with an empty string:
''.join(s.rsplit(',', 1))
a = '{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000,}'
a[:len(a) - 2] + a[len(a) - 1:]
You could obviously use different expressions in the brackets, I just wanted to show that you could use any expressions you wanted.
You can try rfind to find the last comma:
s = '{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000,}'
idx = s.rfind(",")
s[:idx]+s[idx+1:]
You get:
'{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000}'
Using regex:
>>> import re
>>> print(re.sub(r",(?=[^,]*$)", "", s))
{"SEignjExQfumwZRacPNHvq8UcsBjKWPERB":1.00000000,"SCaWymicaunRLAxNSTTRhVxLMAB9PaKBDK":2.80000000,"SGFHTxuRLttUShUjZyFMzs8NgC1JopSUK6":1.20000000}
This matches a ',' that is followed only by characters that are NOT ',' up to the end of the string. In other words, it matches the last ',' in the string.
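The same "remove the last occurrence" idea can be wrapped in a small function with str.rpartition(). This is only a sketch: remove_last is a hypothetical helper name, the short keys stand in for the long ones from the question, and the json.loads() call assumes the cleaned string is meant to be JSON.

```python
import json

def remove_last(s, sub):
    """Hypothetical helper: remove the last occurrence of sub from s."""
    head, _sep, tail = s.rpartition(sub)
    # If sub is absent, rpartition returns ('', '', s), so s comes back unchanged.
    return head + tail

# Shortened stand-in for the string in the question.
s = '{"a":1.00000000,"b":2.80000000,"c":1.20000000,}'
fixed = remove_last(s, ',')
print(fixed)  # {"a":1.00000000,"b":2.80000000,"c":1.20000000}

# Once the trailing comma is gone, the string parses as JSON.
data = json.loads(fixed)
```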

Python - Parse strings with variable repeating substring

I am trying to do something which I thought would be simple (and probably is), however I am hitting a wall. I have a string that contains document numbers. In most cases the format is ######-#-### however in some cases, where the single digit should be, there are multiple single digits separated by a comma (i.e. ######-#,#,#-###). The number of single digits separated by a comma is variable. Below is an example:
For the string below:
('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
I need to return:
['030421-1-001', '030421-2-001', '030421-1-002', '030421-1-002', '030421-2-002', '030421-3-002', '030421-1-003']
I have only gotten as far as returning the strings that match the ######-#-### pattern:
import re
p = re.compile(r'\d{6}-\d{1}-\d{3}')
m = p.findall('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
print(m)
Thanks in advance for any help!
Matt
Perhaps something like this:
>>> import re
>>> s = '030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003'
>>> it = re.finditer(r'(\b\d{6}-)(\d(?:,\d)*)(-\d{3})\b', s)
>>> for m in it:
...     a, b, c = m.groups()
...     for x in b.split(','):
...         print(a + x + c)
...
030421-1-001
030421-2-001
030421-1-002
030421-1-002
030421-2-002
030421-3-002
030421-1-003
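Or using a list comprehension (re-create the iterator first, because the loop above has already exhausted it):
>>> it = re.finditer(r'(\b\d{6}-)(\d(?:,\d)*)(-\d{3})\b', s)
>>> [a+x+c for a, b, c in (m.groups() for m in it) for x in b.split(',')]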
['030421-1-001', '030421-2-001', '030421-1-002', '030421-1-002', '030421-2-002', '030421-3-002', '030421-1-003']
Use '\d{6}-\d(,\d)*-\d{3}'.
* means "as many as you want (0 included)".
It is applied to the previous element, here '(,\d)'.
I wouldn't use a single regular expression to try and parse this. Since it is essentially a list of strings, you might find it easier to replace the "&" with a comma globally in the string and then use split() to put the elements into a list.
Looping over the list will allow you to write a single function to parse and fix each string; you can then push the result onto a new list and display your string (myfunction is a placeholder for that parsing function).
string = string.replace('&', ',')
initialList = string.split(',')
newList = []
for item in initialList:
    newItem = myfunction(item)
    newList.append(newItem)
newstring = ','.join(newList)
(\d{6}-)((?:\d,?)+)(-\d{3})
We use three capturing groups. The first and last parts are matched the easy way. The center part is a digit optionally followed by a ',', repeated one or more times. A repeated capturing group only keeps its last repetition, so the inner group is made non-capturing with ?: and the outer group captures the whole run. What we're left with is the following result:
>>> p = re.compile(r'(\d{6}-)((?:\d,?)+)(-\d{3})')
>>> m = p.findall('030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003')
>>> m
[('030421-', '1,2', '-001'), ('030421-', '1', '-002'), ('030421-', '1,2,3', '-002'), ('030421-', '1', '-003')]
You'll have to manually process the 2nd term to split them up and join them, but a list comprehension should be able to do that.
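A sketch of that last step, assuming the middle group never ends in a stray comma (true for the sample string); the names prefix, digits and suffix are just illustrative.

```python
import re

s = '030421-1,2-001 & 030421-1-002,030421-1,2,3-002, 030421-1-003'
p = re.compile(r'(\d{6}-)((?:\d,?)+)(-\d{3})')

# For every (prefix, digits, suffix) tuple, emit one document number per digit.
expanded = [
    prefix + digit + suffix
    for prefix, digits, suffix in p.findall(s)
    for digit in digits.split(',')
]
print(expanded)
```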

Python - regex, blank element at the end of the list?

I have this code:
print(re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!"))
which results in
['Holy', 'moly', 'feferoni', '']
How can I get rid of this last blank element, and what caused it?
If this is a dirty way to get rid of punctuation and spaces in a string, how else can I write it with regex?
Expanding on what @HamZa said in his comment, you would use re.findall and a negated character set:
>>> from re import findall
>>> findall(r"[^\s?!,;]+", "Holy moly, feferoni!")
['Holy', 'moly', 'feferoni']
You get the empty string as the last element of your list because the regex splits after the last !. It gives you what's before the ! and what's after it, but after it there's simply nothing, i.e. an empty string. You might have the same problem in the middle of the string if you hadn't wisely added the + to your regex.
If you want to elegantly get rid of the optional empty string, use filter(). In Python 3, filter() returns an iterator, so wrap it in a call to list if you can't work with an iterator:
list(filter(None, re.split(r"[\s?!,;]+", "Holy moly, feferoni!")))
This will result in:
['Holy', 'moly', 'feferoni']
What this does is remove every element that is not truthy. The filter function generally only returns elements that satisfy a requirement given as a function, but if you pass None it checks the truth value of the element itself. Because an empty string is falsy and every other string is truthy, it removes every empty string from the list.
Also note that I removed the escaping of special characters in the character class, as it is simply not necessary and just makes the regex harder to read.
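The two approaches side by side, as a small sketch with illustrative variable names: splitting on the separators and dropping empty strings afterwards, versus matching the words directly.

```python
import re

text = "Holy moly, feferoni!"
delims = r"[\s?!,;]+"

# The trailing "!" is a separator, so split() reports the empty text after it.
raw = re.split(delims, text)

# Option 1: drop empty strings afterwards.
filtered = [word for word in raw if word]

# Option 2: match the words themselves instead of the separators.
found = re.findall(r"[^\s?!,;]+", text)

print(raw)
print(filtered)
print(found)
```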
The first thing that comes to my mind is something like this:
>>> import re
>>> mystring = re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!")
>>> mystring
['Holy', 'moly', 'feferoni', '']
>>> mystring.pop()
''
>>> print(mystring)
['Holy', 'moly', 'feferoni']
__import__('re').findall(r'[^\s?!,;]+', 'Holy moly, feferoni!')

How do I strip a string given a list of unwanted characters? Python

Is there a way to pass in a list instead of a char to str.strip() in python? I have been doing it this way:
unwanted = [c for c in '!@#$%^&*(FGHJKmn']
s = 'FFFFoFob*&%ar**^'
for u in unwanted:
    s = s.strip(u)
print(s)
Desired output (this output is correct, but there should be a more elegant way than how I'm coding it above):
oFob*&%ar
Strip and friends take a string representing a set of characters, so you can skip the loop:
>>> s = 'FFFFoFob*&%ar**^'
>>> s.strip('!@#$%^&*(FGHJKmn')
'oFob*&%ar'
(The downside of this is that things like fn.rstrip(".png") seem to work for many filenames but don't really: rstrip removes any trailing run of the characters ".", "p", "n" and "g", so "ping.png" becomes "pi".)
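If the unwanted characters really do arrive as a list, joining them into one string first is enough. A minimal sketch with a character list modelled on the one in the question:

```python
unwanted = list('!@#$%^&*(FGHJKmn')   # a list of single characters
s = 'FFFFoFob*&%ar**^'

# strip() wants one string of characters, so join the list first.
chars = ''.join(unwanted)
result = s.strip(chars)
print(result)  # oFob*&%ar
```

Unlike the one-character-at-a-time loop, a single call keeps stripping until it hits a character outside the set, so the result does not depend on the order of the list.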
Since you don't want to delete characters from the middle, you can just use strip():
>>> 'FFFFoFob*&%ar**^'.strip('!@#$%^&*(FGHJKmn')
'oFob*&%ar'
Otherwise, use str.translate(). In Python 3, build the deletion table with str.maketrans():
>>> 'FFFFoFob*&%ar**^'.translate(str.maketrans('', '', '!@#$%^&*(FGHJKmn'))
'oobar'
