Python - removing characters from a list

I have a list of elements like this:
['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
My question is:
How can I remove specific characters from the elements of this list?
As a result I want to have:
['test', 'test', '1989', 'test', '']
Any suggestions or solutions?
Thanks in advance.

>>> re.findall(r'\{(.*)\}', '1:{test}')
['test']
Just make a loop with it:
[(re.findall(r'\{(.*)\}', i) or [''])[0] for i in your_list]
or maybe:
[''.join(re.findall(r'\{(.*)\}', i)) for i in your_list]

You could use a regular expression, like so:
import re
s = re.compile(r"\d+:{(.*)}")
data = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
result = [s.match(d).group(1) if s.match(d) else d for d in data]
results in
['test', 'test', '1989', 'test', '']

Python's strip() function does exactly that -- remove specific characters from the ends of a string -- but there are probably better ways to do what you want.
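As a rough illustration of the strip() idea on the data from the question, here is a minimal sketch. It assumes every non-empty element has the form number:{text}; the names data and result are only illustrative.

```python
data = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']

# partition(':') splits at the first colon; index 2 is the part after it.
# strip('{}') then removes any '{' or '}' characters from both ends.
result = [item.partition(':')[2].strip('{}') for item in data]

print(result)  # ['test', 'test', '1989', 'test', '']
```

Because strip() removes a set of characters rather than a fixed prefix or suffix, this would also eat braces that belong to the value itself, for example in '1:{{x}}'.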

You haven't said exactly what the pattern is, or what you want if there are no braces, but this will work on your example:
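import re

stripped = []
for x in my_data:
    # Capture only the text between the braces
    m = re.search(r"\{(.*)\}", x)
    # Keep the element unchanged when it has no braces
    stripped.append(m.group(1) if m else x)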

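t = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
pattern = re.compile(r'(?<=\{).+(?=\})')
map(lambda string: pattern.search(string).group(0) if pattern.search(string) else string, t)
Granted, this is not the best-formatted or easiest answer to read. It maps an anonymous function over the list; the function returns what is inside the braces of each element, or the element unchanged when there is no match (as with the empty string, where search() returns None). In Python 3, wrap the call in list() to get a list back.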
(?<=...) means "match only text that has this immediately before it, but don't include it in the result"
(?=...) means "match only text that has this immediately after it, but don't include it in the result"
.+ means "at least one character of any kind"
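To make the effect of the lookbehind and lookahead concrete, the following small sketch compares the same search with and without them. The variable names are illustrative only.

```python
import re

text = '4:{1989}'

# Without lookarounds the braces are part of the match.
with_braces = re.search(r'\{.+\}', text).group(0)

# With lookbehind/lookahead the braces must be present but are not returned.
without_braces = re.search(r'(?<=\{).+(?=\})', text).group(0)

# A string with no braces gives no match at all, so search() returns None.
no_match = re.search(r'(?<=\{).+(?=\})', '')

print(with_braces)     # {1989}
print(without_braces)  # 1989
print(no_match)        # None
```

The None result is why code that calls .group(0) directly needs a guard for elements such as the empty string.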

Related

Python: list strip overkill

I just want to remove the '.SI' suffix from each item in the list, but strip() overdoes it and also removes any other S or I at the ends of the items.
ab = ['abc.SI','SIV.SI','ggS.SI']
[x.strip('.SI') for x in ab]
>> ['abc','V','gg']
The output I want is
>> ['abc','SIV','ggS']
Is there an elegant way to do it? I'd prefer not to use a for loop, as my list is long.
Why strip? You can use .replace():
[x.replace('.SI', '') for x in ab]
Output:
['abc', 'SIV', 'ggS']
(This removes .SI anywhere in the string; have a look at the other answers if you want to remove it only at the end.)
The reason strip() doesn't work is explained in the docs:
The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped
So it strips, from both ends, every character that appears in the string you pass as the argument.
If you want to remove the substring only from the end, the correct way to achieve this will be:
>>> ab = ['abc.SI','SIV.SI','ggS.SI']
>>> sub_string = '.SI'
# endswith() checks for the presence of the substring at the end
>>> [s[:-len(sub_string)] if s.endswith(sub_string) else s for s in ab]
['abc', 'SIV', 'ggS']
This is preferable to str.replace() (as mentioned in TrakJohnson's answer), which removes the substring even when it is in the middle of the string. For example:
>>> 'ab.SIrt'.replace('.SI', '')
'abrt'
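The difference between stripping a character set and removing an exact suffix can be sketched as follows. This assumes Python 3.9 or later, where str.removesuffix() is available; the extra item 'ab.SIrt' is added here only to show the mid-string case.

```python
ab = ['abc.SI', 'SIV.SI', 'ggS.SI', 'ab.SIrt']

# strip() treats '.SI' as the set of characters {'.', 'S', 'I'}.
stripped = [s.strip('.SI') for s in ab]

# removesuffix() (Python 3.9+) removes the exact suffix once, if present.
trimmed = [s.removesuffix('.SI') for s in ab]

print(stripped)  # ['abc', 'V', 'gg', 'ab.SIrt']
print(trimmed)   # ['abc', 'SIV', 'ggS', 'ab.SIrt']
```

On older Python versions the endswith() check with slicing shown above gives the same result.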
Use this: [x[:-3] for x in ab]. (It assumes every item ends with the three-character suffix.)
Use split instead of strip and get the first element:
[x.split('.SI')[0] for x in ab]

How can I split a string in Python and get the result with the delimiter?

I have code like
a = "*abc*bbc"
a.split("*")#['','abc','bbc']
#i need ["*","abc","*","bbc"]
a = "abc*bbc"
a.split("*")#['abc','bbc']
#i need ["abc","*","bbc"]
How can I get a list that includes the delimiter, using Python's split function, a regex, or partition?
I am using Python 2.7 on Windows.
You need to use RegEx with the delimiter as a group and ignore the empty string, like this
>>> [item for item in re.split(r"(\*)", "abc*bbc") if item]
['abc', '*', 'bbc']
>>> [item for item in re.split(r"(\*)", "*abc*bbc") if item]
['*', 'abc', '*', 'bbc']
Note 1: You need to escape * with \, because RegEx has special meaning for *. So, you need to tell RegEx engine that * should be treated as the normal character.
Note 2: You'll get an empty string when you split a string in which the delimiter is at the beginning or at the end. Check this question to understand the reason behind it.
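The two notes can be folded into a small helper. This is only a sketch; split_keep_delimiter is a hypothetical name, and re.escape() is used so that any literal delimiter (not just *) is treated as a normal character.

```python
import re

def split_keep_delimiter(text, delimiter):
    """Split text on delimiter, keeping the delimiter as its own item."""
    # re.escape() makes characters such as '*' literal; the parentheses
    # form a capture group, so re.split() keeps the delimiter in the result.
    pattern = '(' + re.escape(delimiter) + ')'
    # Drop the empty strings produced by a leading or trailing delimiter.
    return [item for item in re.split(pattern, text) if item]

print(split_keep_delimiter('*abc*bbc', '*'))  # ['*', 'abc', '*', 'bbc']
print(split_keep_delimiter('abc*bbc', '*'))   # ['abc', '*', 'bbc']
```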
import re
x="*abc*bbc"
print [x for x in re.split(r"(\*)",x) if x]
You have to use re.split and group the delimiter.
or
x="*abc*bbc"
print re.findall(r"[^*]+|\*",x)
This does the same thing through re.findall.
Use partition():
a = "abc*bbc"
print (a.partition("*"))
Output:
('abc', '*', 'bbc')
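One caveat worth checking before relying on partition(): it splits only once. The sketch below runs it on the other string from the question; the variable names are illustrative.

```python
a = "*abc*bbc"

# partition() splits only at the first occurrence of the separator.
first = a.partition("*")

# rpartition() does the same, starting from the right-hand side.
last = a.rpartition("*")

print(first)  # ('', '*', 'abc*bbc')
print(last)   # ('*abc', '*', 'bbc')
```

So for a string with several delimiters, partition() alone does not produce the requested ["*", "abc", "*", "bbc"].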

Regex findall on a function definition. Want to match args but not function

I have a list of strings that look like
"funcname(arg, another_arg)*20 + second_func(arg1, arg2)"
and I want to pull out just the args. I've tried the following:
re.findall(r'\w[\w\d_]+(?!\()', string)
however this returns
['funcnam', 'arg', 'another_arg', '20', 'second_fun', 'arg1', 'arg2']
Firstly, I'm a bit confused as to why I am seeing the '20', since I specified the string should start with a word character. Secondly, I'm wondering how I can improve my look-ahead to match what I'm looking for.
I should note that some of the strings don't have functions and look like
"value1 + value_two"
so I can't simply search inside the parentheses.
>>> pattern = '[a-zA-Z_]\w*(?![\(\w])'
>>> re.findall(pattern, "funcname(arg, another_arg)*20 + second_func(arg1, arg2)")
['arg', 'another_arg', 'arg1', 'arg2']
>>> re.findall(pattern, "value1 + value_two")
['value1', 'value_two']
Here is a regex that should work better:
(?!\w+\()[^\W\d]\w+
For example:
>>> s = "funcname(arg, another_arg)*20 + second_func(arg1, arg2)"
>>> re.findall(r'(?!\w+\()[^\W\d]\w+', s)
['arg', 'another_arg', 'arg1', 'arg2']
[^\W\d] is equivalent to [a-zA-Z_].
This uses the same logic as your regex, but moving the lookahead to the beginning of the pattern prevents a match like funcnam from funcname(...). Here is a similar alternative:
[^\W\d]\w+(?![\w(])
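The following sketch shows why '20' appeared in the original attempt and how the patterns differ on one-letter names. The test string "f(x) + g(y1)" is made up for illustration.

```python
import re

s = "funcname(arg, another_arg)*20 + second_func(arg1, arg2)"

# \w also matches digits, which is why the original pattern picks up '20'.
original = re.findall(r'\w[\w\d_]+(?!\()', s)

# Starting with [^\W\d] (a letter or underscore) rules out numbers, and
# (?![\w(]) stops the engine from backtracking to a shorter prefix.
fixed = re.findall(r'[^\W\d]\w*(?![\w(])', s)

# \w+ needs at least two characters in total; \w* also accepts one-letter names.
short_plus = re.findall(r'[^\W\d]\w+(?![\w(])', "f(x) + g(y1)")
short_star = re.findall(r'[^\W\d]\w*(?![\w(])', "f(x) + g(y1)")

print(original)    # ['funcnam', 'arg', 'another_arg', '20', 'second_fun', 'arg1', 'arg2']
print(fixed)       # ['arg', 'another_arg', 'arg1', 'arg2']
print(short_plus)  # ['y1']
print(short_star)  # ['x', 'y1']
```

If single-character arguments can occur in the input, the \w* form is the safer choice.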
This is probably a bad solution, but it works for me...:
R = r"[a-zA-Z_]\w*(?:\s*\()?"  # This captures every identifier, leaving the left parenthesis on function names
values = filter(lambda x: '(' != x[-1], re.findall(R, s))  # now filter out everything ending with a left parenthesis
# Or if you prefer list comprehensions...
values = [x for x in re.findall(R, s) if x[-1] != '(']
The other answers will probably be better than this, though. The one benefit of this approach is that it lets you easily pick out the functions after the fact: they end with '('.
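As a sketch of that benefit, the tokens can be separated into function names and plain values in one pass. The pattern below is written with \s* (optional whitespace before the parenthesis), which is an assumption about what was intended; the names tokens, functions and values are illustrative.

```python
import re

s = "funcname(arg, another_arg)*20 + second_func(arg1, arg2)"

# An identifier, optionally followed by whitespace and an opening parenthesis.
R = r"[a-zA-Z_]\w*(?:\s*\()?"
tokens = re.findall(R, s)

# Tokens ending in '(' are function names; the rest are plain values.
functions = [t[:-1].rstrip() for t in tokens if t.endswith('(')]
values = [t for t in tokens if not t.endswith('(')]

print(functions)  # ['funcname', 'second_func']
print(values)     # ['arg', 'another_arg', 'arg1', 'arg2']
```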

"/1/2/3/".split("/")

It's too hot and I'm probably missing something obvious.
>>> "/1/2/3/".split("/")
['', '1', '2', '3', '']
What's with the empty elements at the start and end?
Edit: Thanks all, I'm putting this down to heat-induced brain failure. The docs aren't the clearest, though; from http://docs.python.org/library/stdtypes.html:
"Return a list of the words in the string, using sep as the delimiter string"
Is there a word before the first, or after the last "/"?
Compare with:
"1/2/3".split("/")
Empty elements are still elements.
You could use strip('/') to trim the delimiter from the beginning/end of your string.
As JLWarlow says, you have an extra '/' in the string. Here's another example:
>>> "//2//3".split('/')
['', '', '2', '', '3']
Slashes are separators, so there are empty elements before the first and after the last.
You're splitting on /. You have 4 slashes, so the returned list will have 5 elements.
That is exactly what I would expect, but we are all different :)
What would you expect from "1,,2,3".split(",")?
You can use strip() to get rid of the leading and trailing fields... Then call split() as before.
[x for x in "//1///2/3///".split("/") if x != ""]
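The suggestions above behave differently when there are empty fields in the middle of the string. A short sketch, with illustrative variable names:

```python
path = "/1/2/3/"

# Every separator marks a boundary, so n separators give n + 1 fields.
fields = path.split("/")

# strip("/") removes leading and trailing separators only.
trimmed = path.strip("/").split("/")

messy = "//1///2/3///"

# Filtering removes empty fields everywhere, including the middle.
non_empty = [x for x in messy.split("/") if x]

# strip() followed by split() keeps the empty fields in the middle.
inner_kept = messy.strip("/").split("/")

print(fields)      # ['', '1', '2', '3', '']
print(trimmed)     # ['1', '2', '3']
print(non_empty)   # ['1', '2', '3']
print(inner_kept)  # ['1', '', '', '2', '3']
```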

How to split a string by using [] in Python

So from this string:
"name[id]"
I need this:
"id"
I used str.split('[]'), but it didn't work. Does it only take a single delimiter?
Use a regular expression:
import re
s = "name[id]"
re.search(r"\[(.*?)\]", s).group(1) # = 'id'
str.split() takes a string on which to split input. For instance:
"i,split,on commas".split(',') # = ['i', 'split', 'on commas']
The re module also allows you to split by regular expression, which can be very useful, and I think is what you meant to do.
import re
s = "name[id]"
# split by either a '[' or a ']'
re.split(r'\[|\]', s) # = ['name', 'id', '']
Either
"name[id]".split('[')[1][:-1] == "id"
or
"name[id]".split('[')[1].split(']')[0] == "id"
or
re.search(r'\[(.*?)\]',"name[id]").group(1) == "id"
or
re.split(r'[\[\]]',"name[id]")[1] == "id"
Yes, the delimiter is the whole string argument passed to split. So your example would only split a string like 'name[]id[]'.
Try something like:
'name[id]'.split('[', 1)[-1].split(']', 1)[0]
'name[id]'.split('[', 1)[-1].rstrip(']')
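A quick sketch of the point about the delimiter being the whole string, plus what the chained split returns when there are no brackets at all. The variable names are illustrative.

```python
# The separator is matched as a whole, so '[]' is only found when the
# two characters appear next to each other.
whole = 'name[id]'.split('[]')
adjacent = 'name[]id[]'.split('[]')

# Splitting on each bracket separately extracts the inner text.
chained = 'name[id]'.split('[', 1)[-1].split(']', 1)[0]

# With no brackets in the input, the chained form returns the input unchanged.
no_brackets = 'name'.split('[', 1)[-1].split(']', 1)[0]

print(whole)        # ['name[id]']
print(adjacent)     # ['name', 'id', '']
print(chained)      # id
print(no_brackets)  # name
```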
I'm not a fan of regex, but in cases like this it often provides the best solution.
Triptych already recommended this, but I'd like to point out that the ?P<> group assignment can be used to assign a match to a dictionary key:
>>> m = re.match(r'.*\[(?P<id>\w+)\]', 'name[id]')
>>> result_dict = m.groupdict()
>>> result_dict
{'id': 'id'}
>>>
You don't actually need regular expressions for this. The .index() function and string slicing will work fine.
Say we have:
>>> s = 'name[id]'
Then:
>>> s[s.index('[')+1:s.index(']')]
'id'
To me, this is easy to read: "start one character after the [ and finish before the ]".
def between_brackets(text):
    return text.partition('[')[2].partition(']')[0]
This will also work even if your string does not contain a […] construct, and it assumes an implied ] at the end in the case you have only a [ somewhere in the string.
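The edge cases described above can be checked directly, and compared with the index() approach, which raises an exception instead. This is a sketch; the sample inputs are made up.

```python
def between_brackets(text):
    return text.partition('[')[2].partition(']')[0]

print(between_brackets('name[id]'))    # id
print(between_brackets('name[id'))     # id  (implied ']' at the end)
print(repr(between_brackets('name')))  # ''  (no brackets at all)

# By contrast, index() raises ValueError when a bracket is missing.
s = 'name'
try:
    inner = s[s.index('[') + 1:s.index(']')]
except ValueError:
    inner = None

print(inner)  # None
```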
I'm new to python and this is an old question, but maybe this?
str.split('[')[1].strip(']')
You can get a value from the list by indexing with []. For example, create a list from a URL with split, as below.
>>> urls = 'http://quotes.toscrape.com/page/1/'
This generates a list like the one below.
>>> print( urls.split("/") )
['http:', '', 'quotes.toscrape.com', 'page', '1', '']
What if you want only "http:" from this list? Index it like this:
>>> print(urls.split("/")[0])
http:
Or what if you want only "1" from this list? Index it like this:
>>> print(urls.split("/")[-2])
1
str.split uses the entire parameter to split a string. Try:
str.split("[")[1].split("]")[0]
