Given a string as shown below,
"[xyx],[abc].[cfd],[abc].[dgr],[abc]"
how can I print it as shown below?
1.[xyx]
2.[cfd]
3.[dgr]
The original string will always maintain the above-mentioned format.
I did not realize you had both periods and commas, which adds a bit of trickery: you have to split on the periods too.
I would use something like this:
list_to_parse = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
count = 0
for pair in list_to_parse.split('.'):      # "[xyx],[abc]", "[cfd],[abc]", ...
    for position, item in enumerate(pair.split(',')):
        if position == 0:                  # keep only the item before the comma
            count += 1
            print(str(count) + "." + item)
Another option is to split on the left bracket, re-add it while numbering with enumerate, and then strip the trailing commas and periods. This method is probably also a tiny bit faster, as it's not a loop inside a loop.
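list_to_parse = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
for index, i in enumerate(list_to_parse.split('[')):
    # index 0 is the empty string before the first '['; odd indexes hold the first item of each pair
    if index % 2:
        print(str((index + 1) // 2) + ".[" + i.rstrip(',.'))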
Also, the argument to rstrip() is really "which characters to remove", not a specific pattern. You can list any characters you want removed from the right, and it will work through the string until it hits a character it can't remove. There are also lstrip() and strip().
String manipulation can always get tricky, so pay attention: splitting on the left bracket produces an empty first element, so index zero isn't printed. Always practice and learn your needs :D
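To make the point about character sets concrete, here is a minimal sketch. The sample strings are made up for illustration and are not taken from the question.

```python
# rstrip/lstrip/strip take a *set* of characters, not a literal prefix or suffix.
trimmed_right = "xyx],.".rstrip(",.")     # removes trailing ',' and '.' in any order
trimmed_left = ",.[abc]".lstrip(",.")     # trims only from the left
trimmed_both = ".,[abc],.".strip(",.")    # trims the same characters from both ends

# Stripping stops at the first character that is not in the set,
# so the comma in the middle survives.
stops_early = "a,b,.".rstrip(",.")

print(trimmed_right)  # xyx]
print(trimmed_left)   # [abc]
print(trimmed_both)   # [abc]
print(stops_early)    # a,b
```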
You can use the split() function:
a = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
desired_strings = [i.split(',')[0] for i in a.split('.')]
for i, string in enumerate(desired_strings):
    print(f"{i+1}.{string}")
This is just a fun way to solve it:
lst = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
count = 1
var = 1
for char in range(0, len(lst), 6):
    if var % 2:
        print(f"{count}.{lst[char:char + 5]}")
        count += 1
    var += 1
output:
1.[xyx]
2.[cfd]
3.[dgr]
Explanation: "[" appears at indexes 0, 6, 12, and so on. var is a toggle used to skip the second item of each pair, and count is the running number that gets printed.
Here we can squeeze the above code using list comprehension and slicing instead of those flag variables. It's now more Pythonic:
lst = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
lst = [lst[i:i+5] for i in range(0, len(lst), 6)][::2]
res = (f"{i}.{item}" for i, item in enumerate(lst, 1))
print("\n".join(res))
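Both slicing answers rely on every item being exactly five characters wide, followed by one separator. As a quick sanity check (a sketch, not part of the original answer), the bracket positions can be listed directly:

```python
lst = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"

# Positions of every opening bracket in the string.
bracket_positions = [i for i, ch in enumerate(lst) if ch == "["]
print(bracket_positions)  # [0, 6, 12, 18, 24, 30]

# Every second position starts the first item of a pair.
first_items = [lst[i:i + 5] for i in bracket_positions[::2]]
print(first_items)  # ['[xyx]', '[cfd]', '[dgr]']
```

If the items could vary in length, the fixed step of 6 would no longer line up, and one of the split-based or regex-based answers would be the safer choice.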
You can use RegEx:
import re  # the standard-library module is enough here
pattern = r"(\[[a-zA-Z]*\]),\[[a-zA-Z]*\]\.?"
results = re.findall(pattern, '[xyx],[abc].[cfd],[abc].[dgr],[abc]')
print(results)
Using re.findall:
import re
s = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
print('\n'.join(f'{i+1}.{x}' for i, x in
                enumerate(re.findall(r'(\[[^]]+\])(?=,)', s))))
Output:
1.[xyx]
2.[cfd]
3.[dgr]
I'm trying to format any number by inserting ',' every 3 digits from the end, without using format()
123456789 becomes 123,456,789
1000000 becomes 1,000,000
What I have so far only seems to work from the start. I've tried different ideas to get it to work in reverse, but they don't behave as I hoped.
def format_number(number):
    s = [x for x in str(number)]
    for a in s[::3]:
        if s.index(a) is not 0:
            s.insert(s.index(a), ',')
    return ''.join(s)
print(format_number(1123456789))
>> 112,345,678,9
But obviously what I want is 1,123,456,789
I tried reversing the range [:-1:3] but I get 112,345,6789
Clarification: I don't want to use format to structure the number, I'd prefer to understand how to do it myself just for self-study's sake.
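Here is a solution for you that avoids format() (it still uses str, list, range and len):
def format_number(number):
    s = list(str(number))[::-1]   # digits, least significant first
    o = ''
    for a in range(len(s)):
        if a and a % 3 == 0:      # a comma before every third digit, except at the start
            o += ','
        o += s[a]
    return o[::-1]
print(format_number(1123456789))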
And here is the same result using the built-in str.format:
def format_number(number):
    return '{:,}'.format(number)
print(format_number(1123456789))
I hope this helps. :D
One way to do it without built-in functions at all...
def format_number(number):
    i = 0
    r = ""
    while True:
        r = "0123456789"[number % 10] + r
        number //= 10
        if number == 0:
            return r
        i += 1
        if i % 3 == 0:
            r = "," + r
Here's a version that's almost free of built-in functions or methods (it does still have to use str):
def format_number(number):
    i = 0
    r = ""
    for character in str(number)[::-1]:
        if i > 0 and i % 3 == 0:
            r = "," + r
        r = character + r
        i += 1
    return r
Another way to do it without format but with other built-ins is to reverse the number, split it into chunks of 3, join them with a comma, and reverse it again.
def format_number(number):
    backward = str(number)[::-1]
    r = ",".join(backward[i:i+3] for i in range(0, len(backward), 3))
    return r[::-1]
Your current approach has the following drawbacks:
checking for equality/inequality should in most cases (especially for int) be done with the ==/!= operators, not is/is not;
list.index returns the first occurrence from the left end (so s.index('1') will always be 0 in your example); we can iterate over a range of indices instead (using the range built-in).
We can have something like
def format_number(number):
    s = [x for x in str(number)]
    for index in range(len(s) - 3, 0, -3):
        s.insert(index, ',')
    return ''.join(s)
Test
>>> format_number(1123456789)
'1,123,456,789'
>>> format_number(6789)
'6,789'
>>> format_number(135)
'135'
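The list.index drawback described above is easy to reproduce. The sketch below uses the same number as the question; the variable names are illustrative only.

```python
digits = list(str(1123456789))

# list.index always reports the first (leftmost) match,
# so both leading '1' characters map to position 0.
positions_by_index = [digits.index(d) for d in digits[:3]]
print(positions_by_index)  # [0, 0, 2]

# Walking a range of indices from the right avoids the lookup entirely.
insert_points = list(range(len(digits) - 3, 0, -3))
print(insert_points)  # [7, 4, 1]
```

Inserting from the right also means earlier insertions never shift the positions that are still to be processed.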
If range, list.insert and str.join are not allowed
We can replace
range with a while loop,
list.insert using slicing and concatenation,
str.join with concatenation,
like
def format_number(number):
    s = [x for x in str(number)]
    index = len(s) - 3
    while index > 0:
        s = s[:index] + [','] + s[index:]
        index -= 3
    result = ''
    for character in s:
        result += character
    return result
Using str.format
Finally, following the docs:
The ',' option signals the use of a comma for a thousands separator. For a locale aware separator, use the 'n' integer presentation type instead.
your function can be simplified to
def format_number(number):
    return '{:,}'.format(number)
and it will even work for floats.
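As a small illustration of the ',' option on integers and floats, here is a sketch using str.format, the built-in format() and an f-string. The sample values are arbitrary.

```python
# The ',' option inserts a thousands separator for ints and floats alike.
int_text = '{:,}'.format(1123456789)
float_text = format(1234567.891, ',')

# It can be combined with a precision, here two decimal places.
amount = 9876543.21
fixed_text = f'{amount:,.2f}'

print(int_text)    # 1,123,456,789
print(float_text)  # 1,234,567.891
print(fixed_text)  # 9,876,543.21
```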
This might be more information than necessary to explain my question, but I am trying to combine 2 scripts (I wrote for other uses) together to do the following.
TargetString (input_file) 4FOO 2BAR
Result (output_file) 1FOO 2FOO 3FOO 4FOO 1BAR 2BAR
My first script finds the pattern and copies to file_2
pattern = "\d[A-Za-z]{3}"
matches = re.findall(pattern, input_file.read())
f1.write('\n'.join(matches))
My second script opens the output_file and, using re.sub, replaces and alters the target string(s) using capturing groups and back-references. But I am stuck on how to turn, e.g., 3 into 1 2 3.
Any ideas?
This simple example doesn't need regular expressions, but if you want to use re anyway, here's an example (note: you have a minor error in your pattern; it should be A-Z, not A-A):
import re
text_input = '4FOO 2BAR'
matches = re.findall(r"(\d)([A-Za-z]{3})", text_input)
for (count, what) in matches:
    for i in range(1, int(count)+1):
        print(f'{i}{what}', end=' ')
print()
Prints:
1FOO 2FOO 3FOO 4FOO 1BAR 2BAR
Note: If you want to support multiple digits, you can use (\d+) - note the + sign.
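A minimal sketch of the (\d+) variant follows. The helper name expand is hypothetical, and the three-letter suffix is kept as an assumption from the question's pattern.

```python
import re

def expand(text):
    """Expand each '<count><three letters>' token into 1..count copies."""
    expanded = []
    # (\d+) captures one or more digits, so counts above 9 work too.
    for count, what in re.findall(r"(\d+)([A-Za-z]{3})", text):
        expanded.extend(f"{i}{what}" for i in range(1, int(count) + 1))
    return " ".join(expanded)

print(expand("4FOO 2BAR"))  # 1FOO 2FOO 3FOO 4FOO 1BAR 2BAR
print(expand("11BAZ"))
```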
Assuming your numbers are between 1 and 9, without regex, you can use a list comprehension with f-strings (Python 3.6+):
L = ['4FOO', '2BAR']
res = [f'{j}{i[1:]}' for i in L for j in range(1, int(i[0])+1)]
['1FOO', '2FOO', '3FOO', '4FOO', '1BAR', '2BAR']
Reading and writing to CSV files are covered elsewhere: read, write.
More generalised, to account for numbers greater than 9, you can use itertools.groupby:
from itertools import groupby
L = ['4FOO', '10BAR']
def make_var(x, int_flag):
    return int(''.join(x)) if int_flag else ''.join(x)
vals = ((make_var(b, a) for a, b in groupby(i, str.isdigit)) for i in L)
res = [f'{j}{k}' for num, k in vals for j in range(1, num+1)]
print(res)
['1FOO', '2FOO', '3FOO', '4FOO', '1BAR', '2BAR', '3BAR', '4BAR',
'5BAR', '6BAR', '7BAR', '8BAR', '9BAR', '10BAR']
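To see why groupby handles multi-digit counts, it can help to look at what it yields for a single token. This is only an illustrative sketch; the name runs is not from the answer.

```python
from itertools import groupby

token = '10BAR'
# groupby collects consecutive characters that share the same str.isdigit result.
runs = [(is_digit, ''.join(chars)) for is_digit, chars in groupby(token, str.isdigit)]
print(runs)  # [(True, '10'), (False, 'BAR')]
```

The first run is the digit part, which make_var converts to an int, and the second is the label.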
My function looks like this:
def accum(s):
    a = []
    for i in s:
        b = s.index(i)
        a.append(i * (b+1))
    x = "-".join(a)
    return x.title()
with the expected input of:
'abcd'
the output should be and is:
'A-Bb-Ccc-Dddd'
but if the input has a recurring character:
'abccba'
it returns:
'A-Bb-Ccc-Ccc-Bb-A'
instead of:
'A-Bb-Ccc-Cccc-Bbbbb-Aaaaaa'
how can I fix this?
Don't use str.index(), it'll return the first match. Since c and b and a appear early in the string you get 2, 1 and 0 back regardless of the position of the current letter.
Use the enumerate() function to give you a position counter instead:
for i, letter in enumerate(s, 1):
    a.append(i * letter)
The second argument is the starting value; setting this to 1 means you can avoid having to + 1 later on. See What does enumerate mean? if you need more details on what enumerate() does.
You can use a list comprehension here rather than use list.append() calls:
def accum(s):
    a = [i * letter for i, letter in enumerate(s, 1)]
    x = "-".join(a)
    return x.title()
which could, at a pinch, be turned into a one-liner:
def accum(s):
    return '-'.join([i * c for i, c in enumerate(s, 1)]).title()
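The difference between str.index() and enumerate() described in this answer can be seen directly on the failing input. A small sketch, with illustrative variable names:

```python
s = 'abccba'

# str.index reports the first occurrence, so repeated letters share a position.
first_positions = [s.index(letter) for letter in s]
print(first_positions)  # [0, 1, 2, 2, 1, 0]

# enumerate pairs each letter with its real position; starting at 1
# gives the repeat count directly.
counted = list(enumerate(s, 1))
print(counted)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'c'), (5, 'b'), (6, 'a')]
```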
This is because s.index(i) returns the first index of the character. You can use enumerate to pair elements with their indices.
Here is a Pythonic solution:
def accum(s):
    return "-".join(c*(i+1) for i, c in enumerate(s)).title()
Simple:
def accum(s):
    a = []
    for i in range(len(s)):
        a.append(s[i]*(i+1))
    x = "-".join(a)
    return x.title()
Is there a Pythonic way to split a string after the nth occurrence of a given delimiter?
Given a string:
'20_231_myString_234'
It should be split into (with the delimiter being '_', after its second occurrence):
['20_231', 'myString_234']
Or is the only way to accomplish this to count, split and join?
>>> text = '20_231_myString_234'
>>> n = 2
>>> groups = text.split('_')
>>> '_'.join(groups[:n]), '_'.join(groups[n:])
('20_231', 'myString_234')
Seems like this is the most readable way; the alternative is a regex.
Using re to get a regex of the form ^((?:[^_]*_){n-1}[^_]*)_(.*) where n is a variable:
import re
n = 2
s = '20_231_myString_234'
m = re.match(r'^((?:[^_]*_){%d}[^_]*)_(.*)' % (n-1), s)
if m:
    print(m.groups())
or have a nice function:
import re
def nthofchar(s, c, n):
    # c must be a single character, since it is interpolated with %c
    regex = r'^((?:[^%c]*%c){%d}[^%c]*)%c(.*)' % (c, c, n-1, c, c)
    l = ()
    m = re.match(regex, s)
    if m:
        l = m.groups()
    return l
s = '20_231_myString_234'
print(nthofchar(s, '_', 2))
Or without regexes, using iterative find:
def nth_split(s, delim, n):
    p, c = -1, 0
    while c < n:
        p = s.index(delim, p + 1)
        c += 1
    return s[:p], s[p + 1:]
s1, s2 = nth_split('20_231_myString_234', '_', 2)
print(s1, ":", s2)
I like this solution because it works without any actual regex and can easily be adapted to another "nth" or delimiter.
import re
string = "20_231_myString_234"
occur = 2  # the occurrence at which you want to split
indices = [x.start() for x in re.finditer("_", string)]
part1 = string[0:indices[occur-1]]
part2 = string[indices[occur-1]+1:]
print(part1, ' ', part2)
I thought I would contribute my two cents. The second parameter to split(), maxsplit, lets you limit the number of splits that are made:
def split_at(s, delim, n):
    r = s.split(delim, n)[n]
    return s[:-len(r)-len(delim)], r
On my machine, the two good answers by @perreal, iterative find and regular expressions, actually measure 1.4 and 1.6 times slower (respectively) than this method.
It's worth noting that it can become even quicker if you don't need the initial bit. Then the code becomes:
def remove_head_parts(s, delim, n):
    return s.split(delim, n)[n]
Not so sure about the naming, I admit, but it does the job. Somewhat surprisingly, it is 2 times faster than iterative find and 3 times faster than regular expressions.
I put up my testing script online. You are welcome to review and comment.
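The testing script mentioned above is not reproduced here. As a rough stand-in, the sketch below times two of the approaches from this thread with the standard timeit module. The sample string and iteration count are arbitrary assumptions, and the numbers will vary by machine and Python version.

```python
import timeit

def split_at(s, delim, n):
    # Uses maxsplit so only the first n delimiters are processed.
    r = s.split(delim, n)[n]
    return s[:-len(r) - len(delim)], r

def nth_split(s, delim, n):
    # Iterative search for the nth delimiter.
    p = -1
    for _ in range(n):
        p = s.index(delim, p + 1)
    return s[:p], s[p + 1:]

sample = '20_231_myString_234'  # assumed benchmark input
for func in (split_at, nth_split):
    seconds = timeit.timeit(lambda: func(sample, '_', 2), number=50_000)
    print(f'{func.__name__}: {seconds:.3f}s')
```

A short input like this mostly measures call overhead, so longer strings would be needed to compare the approaches fairly.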
>>> import re
>>> s = '20_231_myString_234'
>>> occurrences = [m.start() for m in re.finditer('_', s)]  # positions of every '_'
>>> occurrences
[2, 6, 15]
>>> result = [s[:occurrences[1]], s[occurrences[1]+1:]]  # [s[:6], s[7:]]
>>> result
['20_231', 'myString_234']
It depends on what the pattern for this split is. If the first two elements are always numbers, for example, you can build a regular expression and use the re module, which is able to split your string as well.
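As a sketch of that idea, assuming the string always starts with two underscore-separated integers (an assumption, not something the question guarantees), a single match can capture both halves:

```python
import re

text = '20_231_myString_234'
# Assumed pattern: two integers joined by '_', then '_', then the remainder.
match = re.match(r'(\d+_\d+)_(.*)', text)
parts = list(match.groups()) if match else []
print(parts)  # ['20_231', 'myString_234']
```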
I had a larger string to split at every nth separator, and ended up with the following code:
# Split every 6 spaces
n = 6
sep = ' '
n_split_groups = []
groups = err_str.split(sep)
while len(groups):
    n_split_groups.append(sep.join(groups[:n]))
    groups = groups[n:]
print(n_split_groups)
Thanks @perreal!
Here is @AllBlackt's solution in function form:
def split_nth(s, sep, n):
    n_split_groups = []
    groups = s.split(sep)
    while len(groups):
        n_split_groups.append(sep.join(groups[:n]))
        groups = groups[n:]
    return n_split_groups
s = "aaaaa bbbbb ccccc ddddd eeeeeee ffffffff"
print(split_nth(s, " ", 2))
['aaaaa bbbbb', 'ccccc ddddd', 'eeeeeee ffffffff']
As @Yuval has noted in his answer, and @jamylak commented in his answer, the split and rsplit methods accept a second (optional) parameter, maxsplit, to avoid making splits beyond what is necessary. Thus, I find the better solution (both for readability and performance) is this:
text = '20_231_myString_234'
first_part = text.rsplit('_', 2)[0]  # Gives '20_231' (rsplit counts from the right, so this relies on the string having exactly three '_')
second_part = text.split('_', 2)[2]  # Gives 'myString_234'
This is not only simple, but also avoids performance hits of regex solutions and other solutions using join to undo unnecessary splits.
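For an arbitrary n, one possible generalisation of the maxsplit idea looks like the sketch below. The function name split_after_nth and the choice to raise ValueError are assumptions made for illustration; it does use join, but only on the first n pieces.

```python
def split_after_nth(text, delim, n):
    """Split text at the nth occurrence of delim (n >= 1)."""
    parts = text.split(delim, n)  # at most n splits, so at most n + 1 pieces
    if len(parts) <= n:
        raise ValueError(f'fewer than {n} occurrences of {delim!r}')
    return delim.join(parts[:n]), parts[n]

print(split_after_nth('20_231_myString_234', '_', 2))  # ('20_231', 'myString_234')
print(split_after_nth('a_b_c_d_e', '_', 3))            # ('a_b_c', 'd_e')
```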
I have a string '0000000000000201' in Python
dpid_string = '0000000000000201'
What is the best way to convert this to the following string?
00:00:00:00:00:00:02:01
You'd partition the string into chunks of size 2, and join them with str.join():
':'.join([dpid_string[i:i + 2] for i in range(0, len(dpid_string), 2)])
Demo:
>>> dpid_string = '0000000000000201'
>>> ':'.join([dpid_string[i:i + 2] for i in range(0, len(dpid_string), 2)])
'00:00:00:00:00:00:02:01'
seq = '0000000000000201'
length = 2
":".join([seq[i:i+length] for i in range(0, len(seq), length)])
Although it is not very simple, you can do
dpid_string = '0000000000000201'
''.join([':' + char if not i % 2 else char for i, char in enumerate(dpid_string)])[1:]
To break it down from within the list comprehension:
[char for char in dpid_string] just loops over the characters and returns them as a list.
We want it to return a string, so we join the full list using ''.join(list).
Now we want it to react to the position of each character, so we need the index. Therefore we use i, char in enumerate(dpid_string).
If this index is even (i % 2 is 0, so not i % 2 is True), add a colon before the char.
This leaves us with a colon at index 0, which we remove by slicing with [1:].
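The steps above can be traced on a shortened input. This is only an illustrative sketch; the four-character string is an assumption chosen so each intermediate value stays short.

```python
dpid_string = '0201'  # shortened input so each step fits on one line

# Even indexes get a leading colon, odd indexes are left alone.
pieces = [':' + char if not i % 2 else char for i, char in enumerate(dpid_string)]
print(pieces)  # [':0', '2', ':0', '1']

joined = ''.join(pieces)
print(joined)  # :02:01

result = joined[1:]  # drop the leading colon
print(result)  # 02:01
```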
An alternative using re.sub:
import re
dpid_string = '0000000000000201'
subbed = re.sub(r'(..)(?!$)', r'\1:', dpid_string)
# 00:00:00:00:00:00:02:01
Read as: take every 2 characters that aren't at the end of the string, and replace them with those two characters followed by a colon.