How to split a string by using [] in Python

So from this string:
"name[id]"
I need this:
"id"
I used str.split('[]'), but it didn't work. Does it only take a single delimiter?

Use a regular expression:
import re
s = "name[id]"
re.search(r"\[(.*?)\]", s).group(1)  # = 'id'
str.split() takes a literal string to split on, not a set of delimiter characters. For instance:
"i,split,on commas".split(',') # = ['i', 'split', 'on commas']
The re module also allows you to split by regular expression, which can be very useful, and I think is what you meant to do.
import re
s = "name[id]"
# split by either a '[' or a ']'
re.split(r'\[|\]', s)  # = ['name', 'id', '']

Either
"name[id]".split('[')[1][:-1] == "id"
or
"name[id]".split('[')[1].split(']')[0] == "id"
or
re.search(r'\[(.*?)\]',"name[id]").group(1) == "id"
or
re.split(r'[\[\]]',"name[id]")[1] == "id"

Yes, the delimiter is the whole string argument passed to split, so your example would only split a string that contains the literal '[]', like 'name[]id[]'.
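For example, a quick check of that literal-delimiter behaviour:
>>> 'name[]id[]'.split('[]')
['name', 'id', '']
>>> 'name[id]'.split('[]')  # no literal '[]' anywhere, so nothing is split
['name[id]']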
Try e.g. something like:
'name[id]'.split('[', 1)[-1].split(']', 1)[0]
'name[id]'.split('[', 1)[-1].rstrip(']')

I'm not a fan of regex, but in cases like this it often provides the best solution.
Triptych already recommended this, but I'd like to point out that the ?P<> group assignment can be used to assign a match to a dictionary key:
>>> m = re.match(r'.*\[(?P<id>\w+)\]', 'name[id]')
>>> result_dict = m.groupdict()
>>> result_dict
{'id': 'id'}
>>>

You don't actually need regular expressions for this. The .index() method and string slicing will work fine.
Say we have:
>>> s = 'name[id]'
Then:
>>> s[s.index('[')+1:s.index(']')]
'id'
To me, this is easy to read: "start one character after the [ and finish before the ]".

def between_brackets(text):
    return text.partition('[')[2].partition(']')[0]
This will also work even if your string does not contain a […] construct, and it assumes an implied ] at the end in case the string contains only a [ somewhere.
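A few quick checks of those edge cases, assuming the between_brackets definition above:
>>> between_brackets('name[id]')
'id'
>>> between_brackets('no brackets here')
''
>>> between_brackets('name[id')  # no closing ], everything after the [ is returned
'id'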

I'm new to Python and this is an old question, but maybe this?
str.split('[')[1].strip(']')

You can get a value out of the resulting list by indexing with []. For example, create a list from a URL like the one below with split.
>>> urls = 'http://quotes.toscrape.com/page/1/'
This generates a list like the one below.
>>> print(urls.split("/"))
['http:', '', 'quotes.toscrape.com', 'page', '1', '']
And what if you want only "http:" from this list? You can do it like this:
>>> print(urls.split("/")[0])
http:
Or what if you want only "1" from this list? You can do it like this:
>>> print(urls.split("/")[-2])
1

str.split uses the entire parameter to split a string. Try:
str.split("[")[1].split("]")[0]

Related

Removing variable number from string

I have several strings in a list:
['~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item2.png', '~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item3.png', '~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item4.png']
I need to remove the 'item2', 'item3', 'item4' so I can later replace them with another variable that changes each time I pass it in: variable = {changing item}
I have tried things like string.replace("item{i}.format(i) for i in range(20), "") or re.sub but I can't seem to get it to work - any suggestions?
I would expect the output [~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_{changing item1}.png, ~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_{changing item2}.png, ~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_{changing item3}.png]
You can use re.sub to replace string ('item<number>') like so:
re.sub(r'item\d+', var, x)
Code:
import re
lst = ['~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item2.png', 'some/thing/0034_item5.png']
var = 'foo'
result = [re.sub(r'item\d+', var, x) for x in lst]
# ['~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_foo.png', 'some/thing/0034_foo.png']
Try using the re regex module. You need to use a valid regex. You can specify one or more digits 0 to 9 by using [0-9]+.
fixed_str = re.sub("item[0-9]+", "", input_str)
Here is the reference for how to format regexes:
https://docs.python.org/3/library/re.html
You can also use online sites such as regex101.com to experiment with regex formatting in real time to make sure it works ahead of time.

Why is the split() returning list objects that are empty? [duplicate]

I have the following file names that exhibit this pattern:
000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...
I want to extract the middle two time stamp parts after the second underscore '_' and before '.txt'. So I used the following Python regex string split:
time_info = re.split('^[0-9]+_[LU]_|-|\.txt$', f)
But this gives me two extra empty strings in the returned list:
time_info=['', '20111007T084734', '20111008T023142', '']
How do I get only the two time stamp information? i.e. I want:
time_info=['20111007T084734', '20111008T023142']
I'm no Python expert but maybe you could just remove the empty strings from your list?
str_list = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
time_info = list(filter(None, str_list))
Don't use re.split(), use the groups() method of regex Match/SRE_Match objects.
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')
You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
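For example, a short sketch of that named-group version, using the pattern suggested above and the same f as before:
>>> m = re.search(r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.', f)
>>> m.groupdict()
{'groupA': '20111007T084734', 'groupB': '20111008T023142'}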
If the timestamps are always after the second _ then you can use str.split and str.strip:
>>> strs = "000014_L_20111007T084734-20111008T023142.txt"
>>> strs.strip(".txt").split("_",2)[-1].split("-")
['20111007T084734', '20111008T023142']
Since this came up on google and for completeness, try using re.findall as an alternative!
This does require a little re-thinking, but it still returns a list of matches like split does. This makes it a nice drop-in replacement for some existing code and gets rid of the unwanted text. Pair it with lookaheads and/or lookbehinds and you get very similar behavior.
Yes, this is a bit of a "you're asking the wrong question" answer and doesn't use re.split(). It does solve the underlying issue: your list of matches suddenly has zero-length strings in it and you don't want that.
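A minimal sketch of the findall approach, assuming re is imported; the timestamp pattern \d{8}T\d{6} is an assumption based on the two sample names:
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> re.findall(r'\d{8}T\d{6}', f)
['20111007T084734', '20111008T023142']
>>> re.findall(r'(?<=[_-])\d+T\d+', f)  # lookbehind variant
['20111007T084734', '20111008T023142']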
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> f[9:-4].split('-')
['20111007T084734', '20111008T023142']
or, somewhat more general:
>>> f[f.rfind('_')+1:-4].split('-')
['20111007T084734', '20111008T023142']

python: Match string using regular expression

I am learning regular expressions. I don't understand how to match the following pattern:
" myArray = ["Var1","Var2"]; "
Ideally I want to get the data in the array and convert it into a Python list.
Are the array items guaranteed to be surrounded by double-quotes?
This is a quick and dirty method:
re.findall('"([^,]+)"', source)
where source is your string.
I didn't escape the double-quotes in the regex since you can also use single-quotes in Python.
This returns a list of the items that were surrounded by double quotes,
so in your example: ['Var1', 'Var2']
Regular expression complexity differs a lot depending on how the input can vary. The easiest expressions that match the given string are:
>>> from re import search, findall
>>> s = ' myArray = ["Var1","Var2"]; '
>>> name, body = search(r'\s*(\w*)\s*=\s*\[(.*)\]', s).groups(0)
>>> contents = findall(r'"(\w*)"', body)
>>> name, contents
('myArray', ['Var1', 'Var2'])
"Converting" to python array can be done like this:
>>> globals().update({name: contents})
>>> myArray
['Var1', 'Var2']
Though it is actually a bad idea, as it writes garbage into globals. Instead, try using a separate dictionary, or something similar.
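For example, a small sketch of the separate-dictionary idea (namespace is just an illustrative name; name and contents come from the snippet above):
>>> namespace = {}
>>> namespace[name] = contents
>>> namespace['myArray']
['Var1', 'Var2']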
If you are interested in just getting the data in the array, you can skip using regex and use eval instead.
Consider this:
myArray = eval('["Var1","Var2"]')
If you must use the line you gave in the example, you can also use exec. However this command is somewhat dangerous and needs special care if used.
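A cautious sketch of the exec route (ns is just an illustrative name; strip() avoids an IndentationError from the leading space, and the separate dict keeps the assignment out of your real namespace; only do this with input you trust):
>>> line = ' myArray = ["Var1","Var2"]; '
>>> ns = {}
>>> exec(line.strip(), ns)
>>> ns['myArray']
['Var1', 'Var2']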
Without using re, you could use built-in string methods and ast.literal_eval, which, given your example, returns a usable list object:
from ast import literal_eval
text = ' myArray = ["Var1","Var2"]; '
name, arr_text = (el.strip('; ') for el in text.split('='))
arr = literal_eval(arr_text)
print(name, arr)
Then do what you want with name and arr...

Python - removing characters from a list

I have list of elements like this:
['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
My question is:
How can I remove specific characters from the elements of this list ?
As a result I want to have :
['test', 'test', '1989', 'test', '']
Any suggestions, solutions ?
Thanks in advance.
>>> re.findall(r'\{(.*)\}', '1:{test}')
['test']
Just make a loop with it:
[(re.findall(r'\{(.*)\}', i) or [''])[0] for i in your_list]
or maybe:
[''.join(re.findall(r'\{(.*)\}', i)) for i in your_list]
You could use a regular expression, like so:
import re
s = re.compile(r"\d+:{(.*)}")
data = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
result = [s.match(d).group(1) if s.match(d) else d for d in data]
results in
['test', 'test', '1989', 'test', '']
Python's strip() function does exactly that -- remove specific characters from the ends of a string -- but there are probably better ways to do what you want.
You haven't said exactly what the pattern is, or what you want if there are no braces, but this will work on your example:
stripped = []
for x in my_data:
    m = re.search(r"{(.*)}", x)
    stripped.append(m.group(1) if m else x)
t = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
list(map(lambda s: m.group(0) if (m := re.search(r'(?<=\{).+(?=\})', s)) else s, t))
Granted, this is not the most well-formatted or easiest-to-read of answers. It maps an anonymous function that finds and returns what is inside the brackets over each element of the list (falling back to the element itself when there are no braces), returning the whole list.
(?<=...) means "match only if this comes immediately before the match, but don't include it in the result"
(?=...) means "match only if this comes immediately after the match, but don't include it in the result"
.+ means "at least one character of any kind"
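A quick check of that pattern on a single element, assuming re is imported:
>>> re.search(r'(?<=\{).+(?=\})', '4:{1989}').group(0)
'1989'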

parsing in python

I have following string
adId:4028cb901dd9720a011e1160afbc01a3;siteId:8a8ee4f720e6beb70120e6d8e08b0002;userId:5082a05c-015e-4266-9874-5dc6262da3e0
I need only the values of adId, siteId and userId, i.e.:
4028cb901dd9720a011e1160afbc01a3
8a8ee4f720e6beb70120e6d8e08b0002
5082a05c-015e-4266-9874-5dc6262da3e0
all three in different variables or in an array so that I can use all three.
You can split them to a dictionary if you don't need any fancy parsing:
In [2]: dict(kvpair.split(':') for kvpair in s.split(';'))
Out[2]:
{'adId': '4028cb901dd9720a011e1160afbc01a3',
'siteId': '8a8ee4f720e6beb70120e6d8e08b0002',
'userId': '5082a05c-015e-4266-9874-5dc6262da3e0'}
You could do something like this:
input = 'adId:4028cb901dd9720a011e1160afbc01a3;siteId:8a8ee4f720e6beb70120e6d8e08b0002;userId:5082a05c-015e-4266-9874-5dc6262da3e0'
result = {}
for pair in input.split(';'):
    (key, value) = pair.split(':')
    result[key] = value
print(result['adId'])
print(result['siteId'])
print(result['userId'])
matches = re.findall(r"([a-z0-9A-Z_]+):([a-zA-Z0-9\-]+);?", buf)
for m in matches:
    # m[0] is 'adId' and the other key names
    # m[1] is the long value string
You can also limit the length using {32}, like
([a-zA-Z0-9]{32});
Regular expressions allow you to validate the string and split it into component parts.
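A minimal sketch of that idea, validating the overall shape and capturing the three values at once; the exact pattern is an assumption based on the sample string:
import re
s = 'adId:4028cb901dd9720a011e1160afbc01a3;siteId:8a8ee4f720e6beb70120e6d8e08b0002;userId:5082a05c-015e-4266-9874-5dc6262da3e0'
m = re.fullmatch(r'adId:([0-9a-f]+);siteId:([0-9a-f]+);userId:([0-9a-f\-]+)', s)
if m:
    ad_id, site_id, user_id = m.groups()  # the three values, in order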
There is an awesome string method called split() in Python that will work nicely for you. I would suggest using it twice: once on ';', then again on each of those pieces using ':'.
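For instance, a short sketch of that two-split approach (values is just an illustrative name; s is the string from the question):
values = {}
for piece in s.split(';'):
    key, value = piece.split(':')
    values[key] = value  # values['adId'], values['siteId'], values['userId']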
