Convert the formatted string to array in Python

I have the following string
myString = "cat(50),dog(60),pig(70)"
I am trying to convert the above string to a 2D array.
The result I want to get is
myResult = [['cat', 50], ['dog', 60], ['pig', 70]]
I already know how to solve this with the legacy string methods, but that is quite complicated, so I don't want to use that approach.
# Legacy approach
# 1. Split string by ","
# 2. Run loop and split string by "(" => got the <name of animal>
# 3. Get the number by excluding ")".
Any suggestions would be appreciated.

You can use the re.findall method:
>>> import re
>>> re.findall(r'(\w+)\((\d+)\)', myString)
[('cat', '50'), ('dog', '60'), ('pig', '70')]
If you want a list of lists, as noted by RomanPerekhrest, convert it with a list comprehension:
>>> [list(t) for t in re.findall(r'(\w+)\((\d+)\)', myString)]
[['cat', '50'], ['dog', '60'], ['pig', '70']]
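Note that both versions above return the numbers as strings, while the question asks for integers. A minimal sketch of one way to convert them, assuming every number is a non-negative integer written in decimal digits (the name pairs is only for illustration):

```python
import re

myString = "cat(50),dog(60),pig(70)"

# Each match is a (name, digits) tuple; cast the digits to int while building the rows.
pairs = [[name, int(num)] for name, num in re.findall(r'(\w+)\((\d+)\)', myString)]
print(pairs)  # [['cat', 50], ['dog', 60], ['pig', 70]]
```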

Alternative solution using re.split() function:
import re
myString = "cat(50),dog(60),pig(70)"
result = [re.split(r'[)(]', i)[:-1] for i in myString.split(',')]
print(result)
The output:
[['cat', '50'], ['dog', '60'], ['pig', '70']]
r'[)(]' - pattern, treats parentheses as delimiters for splitting
[:-1] - slice containing all items except the last one (which is an empty string '', not a space)
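To see where that last item comes from, it may help to look at the split of a single element. This is only a small illustration; parts is a name chosen here:

```python
import re

# The closing parenthesis is the final character of the element,
# so splitting on it leaves an empty string after it.
parts = re.split(r'[)(]', 'cat(50)')
print(parts)       # ['cat', '50', '']
print(parts[:-1])  # ['cat', '50']
```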

Related

strip all strings in list of specific character

I have been looking for an answer to this for a while but keep finding answers about stripping a specific string from a list.
Let's say this is my list of strings
stringList = ["cat\n","dog\n","bird\n","rat\n","snake\n"]
But all list items contain a new line character (\n)
How can I remove this from all the strings within the list?
Use a list comprehension with rstrip():
stringList = ["cat\n","dog\n","bird\n","rat\n","snake\n"]
output = [x.rstrip() for x in stringList]
print(output) # ['cat', 'dog', 'bird', 'rat', 'snake']
If you really want to target a single newline character only at the end of each string, then we can get more precise with re.sub:
import re
stringList = ["cat\n","dog\n","bird\n","rat\n","snake\n"]
output = [re.sub(r'\n$', '', x) for x in stringList]
print(output) # ['cat', 'dog', 'bird', 'rat', 'snake']
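One point worth keeping in mind with the rstrip() version: called with no argument it removes all trailing whitespace, not only newlines, while rstrip("\n") removes only trailing newline characters. A short sketch with a made-up sample string (sample, all_ws and only_nl are illustrative names):

```python
sample = "cat \n\n"  # hypothetical input: a trailing space followed by two newlines

all_ws = sample.rstrip()       # strips the newlines and the space
only_nl = sample.rstrip("\n")  # strips the newlines, keeps the space

print(repr(all_ws))   # 'cat'
print(repr(only_nl))  # 'cat '
```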
By applying the method strip (or rstrip) to all terms of the list with map
out = list(map(str.strip, stringList))
print(out)
or with a more rudimentary check and slice
strip_char = '\n'
out = [s[:-len(strip_char)] if s.endswith(strip_char) else s for s in stringList]
print(out)
Since you can use an if to check if a new line character exists in a string, you can use the code below to detect string elements with the new line character and replace those characters with empty strings
stringList = ["cat\n","dog\n","bird\n","rat\n","snake\n"]
nlist = []
for string in stringList:
    if "\n" in string:
        nlist.append(string.replace("\n", ""))
    else:
        # keep strings that have no newline instead of silently dropping them
        nlist.append(string)
print(nlist)
You could also use map() along with str.rstrip:
>>> string_list = ['cat\n', 'dog\n', 'bird\n', 'rat\n', 'snake\n']
>>> new_string_list = list(map(str.rstrip, string_list))
>>> new_string_list
['cat', 'dog', 'bird', 'rat', 'snake']

Trailing empty string after re.split()

I have two strings where I want to isolate sequences of digits from everything else.
For example:
import re
s = 'abc123abc'
print(re.split(r'(\d+)', s))
s = 'abc123abc123'
print(re.split(r'(\d+)', s))
The output looks like this:
['abc', '123', 'abc']
['abc', '123', 'abc', '123', '']
Note that in the second case, there's a trailing empty string.
Obviously I can test for that and remove it if necessary but it seems cumbersome and I wondered if the RE can be improved to account for this scenario.
You can use filter to drop the empty string, like below:
>>> s = 'abc123abc123'
>>> re.split(r'(\d+)', s)
['abc', '123', 'abc', '123', '']
>>> list(filter(None, re.split(r'(\d+)', s)))
['abc', '123', 'abc', '123']
Thanks to @chepner, you can also use a list comprehension like below:
>>> [x for x in re.split(r'(\d+)', s) if x]
['abc', '123', 'abc', '123']
If the string also contains symbols or other characters, the same split still works:
>>> s = '&^%123abc123$##123'
>>> list(filter(None, re.split(r'(\d+)', s)))
['&^%', '123', 'abc', '123', '$##', '123']
This has to do with the implementation of re.split() itself: you can't change it. When the function splits, it doesn't check anything that comes after the capture group, so it can't choose for you to either keep or discard the empty string that is left after splitting. It just splits there and leaves the rest of the string (which can be empty) to the next cycle.
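The behaviour described above applies at either end of the input: an empty string appears wherever the separator matches at the very start or the very end. A small illustration (the helper name split_digits is made up here):

```python
import re

def split_digits(s):
    # The capturing group keeps the digit runs in the result.
    return re.split(r'(\d+)', s)

print(split_digits('abc123abc'))     # ['abc', '123', 'abc']
print(split_digits('abc123abc123'))  # ['abc', '123', 'abc', '123', '']
print(split_digits('123abc'))        # ['', '123', 'abc']
```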
If you don't want that empty string, you can get rid of it in various ways before collecting the results into a list. user1740577's is one example, but personally I prefer a list comprehension, since it's more idiomatic for simple filter/map operations:
parts = [part for part in re.split(r'(\d+)', s) if part]
I recommend against checking and getting rid of the element after the list has already been created, because it involves more operations and allocations.
A simple way to use regular expressions for this would be re.findall:
def bits(s):
    return re.findall(r"(\D+|\d+)", s)
bits("abc123abc123")
# ['abc', '123', 'abc', '123']
But it seems easier and more natural with itertools.groupby. After all, you are chunking an iterable based on a single condition:
from itertools import groupby
def bits(s):
    return ["".join(g) for _, g in groupby(s, key=str.isdigit)]
bits("abc123abc123")
# ['abc', '123', 'abc', '123']

Python Regex capture multiple sections within string

I have strings that are always of the format track-a-b where a and b are integers.
For example:
track-12-29
track-1-210
track-56-1
How do I extract a and b from such strings in python?
If it's just a single string, I would approach this using split:
>>> s = 'track-12-29'
>>> s.split('-')[1:]
['12', '29']
If it is a multi-line string, I would use the same approach ...
>>> s = 'track-12-29\ntrack-1-210\ntrack-56-1'
>>> results = [x.split('-')[1:] for x in s.splitlines()]
>>> results
[['12', '29'], ['1', '210'], ['56', '1']]
You'll want to use re.findall() with capturing groups:
results = [re.findall(r'track-(\d+)-(\d+)', datum) for datum in data]
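The line above assumes an iterable named data that the answer does not define, and re.findall returns a list of tuples of strings for each input. A hedged sketch that yields one pair of integers per string, using re.fullmatch instead (a swapped-in call, not the answer's own) and a data list built from the question's examples:

```python
import re

data = ['track-12-29', 'track-1-210', 'track-56-1']  # sample inputs from the question

pattern = re.compile(r'track-(\d+)-(\d+)')

results = []
for datum in data:
    m = pattern.fullmatch(datum)
    if m:  # skip strings that do not follow the track-a-b format
        a, b = map(int, m.groups())
        results.append((a, b))

print(results)  # [(12, 29), (1, 210), (56, 1)]
```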

Split numbers and letters in a string Python

How would I split numbers and letters in a string? So if given:
string = "12really happy15blob"
splitString = []
splitString = mySplitter(string)
print(splitString)
would return ["12","really happy","15","blob"]
You could use re.split here:
>>> import re
>>> re.split(r'(\d+)', "12really happy15blob")
['', '12', 'really happy', '15', 'blob']
Note that you actually get an empty string from splitting between the start of the string and the initial 12. You'd have to filter that out if you didn't want it.
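A minimal sketch of that filtering step, written as the mySplitter function the question asks for (named my_splitter here, which is an illustrative choice):

```python
import re

def my_splitter(s):
    # Split on digit runs, keep them via the capturing group,
    # and drop any empty strings produced at the edges.
    return [part for part in re.split(r'(\d+)', s) if part]

print(my_splitter("12really happy15blob"))  # ['12', 'really happy', '15', 'blob']
```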

Splitting strings in python based on index

This sounds pretty basic, but I can't think of a neat, straightforward method to do this in Python yet.
I have a string like "abcdefgh" and I need to create a list of elements picking two characters at a time from the string to get ['ab','cd','ef','gh'].
What I am doing right now is this
output = []
for i in range(0,len(input),2):
    output.append(input[i:i+2])
Is there a nicer way?
In [2]: s = 'abcdefgh'
In [3]: [s[i:i+2] for i in range(0, len(s), 2)]
Out[3]: ['ab', 'cd', 'ef', 'gh']
Just for the fun of it, if you hate for
>>> s='abcdefgh'
>>> list(map(''.join, zip(s[::2], s[1::2])))
['ab', 'cd', 'ef', 'gh']
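One caveat with the zip approach: zip stops at the shorter of the two slices, so a string of odd length loses its last character, whereas the slicing version keeps it. A quick comparison on a made-up input (zipped and sliced are illustrative names):

```python
s = 'abcdefg'  # odd length, chosen for illustration

zipped = list(map(''.join, zip(s[::2], s[1::2])))
sliced = [s[i:i+2] for i in range(0, len(s), 2)]

print(zipped)  # ['ab', 'cd', 'ef']
print(sliced)  # ['ab', 'cd', 'ef', 'g']
```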
Is there a nicer way?
Sure. List comprehension can do that.
def n_chars_at_a_time(s, n=2):
    return [s[i:i+n] for i in range(0, len(s), n)]
should do what you want. The s[i:i+n] returns the substring starting at i and ending n characters later.
n_chars_at_a_time("foo bar baz boo", 2)
produces
['fo', 'o ', 'ba', 'r ', 'ba', 'z ', 'bo', 'o']
in the python REPL.
For more info see Generator Expressions and List Comprehensions:
Two common operations on an iterator’s output are 1) performing some operation for every element, 2) selecting a subset of elements that meet some condition.
For example, given a list of strings, you might want to strip off trailing whitespace from each line or extract all the strings containing a given substring.
List comprehensions and generator expressions (short form: “listcomps” and “genexps”) are a concise notation for such operations...
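The two operations from the quoted passage might look like this; the sample lines list is invented for illustration:

```python
lines = ["cat  \n", "dog\n", "catfish \n"]  # hypothetical sample data

# Operation on every element: strip trailing whitespace (list comprehension).
stripped = [line.rstrip() for line in lines]

# Selecting a subset: keep strings containing "cat" (generator expression,
# evaluated lazily and consumed here by list()).
with_cat = list(s for s in stripped if "cat" in s)

print(stripped)  # ['cat', 'dog', 'catfish']
print(with_cat)  # ['cat', 'catfish']
```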
