Python findall in a string - python

There must be an easier way or function to do this code here:
#!/usr/bin/env python
string = "test [*string*] test [*st[ *ring*] test"
points = []
result = string.find("[*")
new_string = string[result+1:]
while result != -1:
    points.append(result)
    new_string = new_string[result+1:]
    result = new_string.find("[*")
print(points)
Any ideas?

import re
string = "test [*string*] test [*st[ *ring*] test"
points = [m.start() for m in re.finditer(r'\[', string)]

It looks like you're trying to get the indices in the string that match '[*'...
indices=[i for i in range(len(string)-1) if string[i:i+2] == '[*']
But this output is different than what your code will produce. Can you verify that your code does what you want?
Also note that string is the name of a python module in the standard library -- while it isn't used very often, it's probably a good idea to avoid using it as a variable name. (don't use str either)
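For reference, here is a minimal sketch that puts the two ideas above side by side: a regex search with the token escaped, and a plain str.find loop that passes a start offset instead of slicing the string. The names find_all and find_all_no_regex are made up for this illustration, and both versions return only non-overlapping matches.

```python
import re


def find_all(text, token):
    """Start index of every non-overlapping occurrence of token (regex version)."""
    # re.escape makes characters such as '[' and '*' match literally.
    return [m.start() for m in re.finditer(re.escape(token), text)]


def find_all_no_regex(text, token):
    """Same result using str.find with a start offset, so indices stay absolute."""
    positions = []
    start = text.find(token)
    while start != -1:
        positions.append(start)
        # Continue searching just past the match that was found.
        start = text.find(token, start + len(token))
    return positions


sample = "test [*string*] test [*st[ *ring*] test"
print(find_all(sample, "[*"))           # [5, 21]
print(find_all_no_regex(sample, "[*"))  # [5, 21]
```

Because the search offset is passed to find rather than slicing off the front of the string, the reported positions always refer to the original string.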

>>> from functools import reduce  # needed on Python 3
>>> indexes = lambda str_, pattern: reduce(
...     lambda acc, x: acc + [acc[-1] + len(x) + len(pattern)],
...     str_.split(pattern), [-len(pattern)])[1:-1]
>>> indexes('123(456(', '(')
[3, 7]
>>> indexes('', 'x')
[]
>>> indexes("test [*string*] test [*st[ *ring*] test", '[*')
[5, 21]
>>> indexes('1231231','1')
[0, 3, 6]

Related

python split and remove duplicates

I have the following output with print var:
test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan
I'm trying to find the correct split function to populate below:
test.qa.home-page.website.com
test.qa.home-page.website.net
test.qa.home-page.website.com
test.qa.home-page.website.net
...as well as remove duplicates:
test.qa.home-page.website.com
test.qa.home-page.website.net
The numeric values after "com-" or "net-" are random so I think my struggle is finding out how to rsplit ("-" + [CHECK_FOR_ANY_NUMBER])[0] . Any suggestions would be great, thanks in advance!
How about:
import re
output = [
"test.qa.home-page.website.com-3412-jan",
"test.qa.home-page.website.net-5132-mar",
"test.qa.home-page.website.com-8422-aug",
"test.qa.home-page.website.net-9111-jan"
]
trimmed = set([re.split("-[0-9]", item)[0] for item in output])
print(trimmed)
# out : {'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
If you have an array of values, and you want to remove duplicates, you can use set.
>>> l = [1,2,3,1,2,3]
>>> l
[1, 2, 3, 1, 2, 3]
>>> set(l)
{1, 2, 3}
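You can get to a useful list by applying s.rsplit('-', 2)[0] to every value. A plain s.split('-')[0] would not work here, because it stops at the hyphen in "home-page".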
You could use a regex to parse the individual lines and a set comprehension to uniqueify:
txt='''\
test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan'''
>>> import re
>>> {re.sub(r'^(.*\.(?:com|net)).*', r'\1', s) for s in txt.split() }
{'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
Or just use the same regex with set and re.findall with the re.M flag:
>>> set(re.findall(r'^(.*\.(?:com|net))', txt, flags=re.M))
{'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
If you want to maintain order, use {}.fromkeys() (dicts keep insertion order in CPython 3.6, and the language guarantees it from Python 3.7):
>>> list({}.fromkeys(re.findall(r'^(.*\.(?:com|net))', txt, flags=re.M)).keys())
['test.qa.home-page.website.com', 'test.qa.home-page.website.net']
Or, if you know your target always ends at the second - from the end, just use .rsplit() with maxsplit=2:
>>> {s.rsplit('-',maxsplit=2)[0] for s in txt.splitlines()}
{'test.qa.home-page.website.com', 'test.qa.home-page.website.net'}
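Putting the pieces together, a small sketch along these lines trims the suffix and removes duplicates while keeping first-seen order. It assumes every line ends in "-<number>-<month>" as in the question; the function name unique_hosts is only illustrative.

```python
def unique_hosts(lines):
    """Strip the trailing '-<number>-<month>' part and drop duplicates, keeping order."""
    # rsplit with maxsplit=2 only cuts at the last two hyphens,
    # so the hyphen inside "home-page" is left alone.
    trimmed = (line.rsplit("-", 2)[0] for line in lines)
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(trimmed))


output = [
    "test.qa.home-page.website.com-3412-jan",
    "test.qa.home-page.website.net-5132-mar",
    "test.qa.home-page.website.com-8422-aug",
    "test.qa.home-page.website.net-9111-jan",
]
print(unique_hosts(output))
# ['test.qa.home-page.website.com', 'test.qa.home-page.website.net']
```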

I am able to parse the log file but not getting output in correct format in python [duplicate]

How do I concatenate a list of strings into a single string?
For example, given ['this', 'is', 'a', 'sentence'], how do I get "this-is-a-sentence"?
For handling a few strings in separate variables, see How do I append one string to another in Python?.
For the opposite process - creating a list from a string - see How do I split a string into a list of characters? or How do I split a string into a list of words? as appropriate.
Use str.join:
>>> words = ['this', 'is', 'a', 'sentence']
>>> '-'.join(words)
'this-is-a-sentence'
>>> ' '.join(words)
'this is a sentence'
A more generic way (covering also lists of numbers) to convert a list to a string would be:
>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
12345678910
It's very useful for beginners to know why join is a string method. It seems strange at first, but it turns out to be very useful.
The result of join is always a string, but the object to be joined can be of many types (generators, lists, tuples, etc.), as long as every element is a string.
.join is also faster than classical concatenation with +, because it allocates memory for the result only once instead of building a new string at every step.
Once you learn it, it's very comfortable, and you can do tricks like this to add parentheses:
>>> ",".join("12345").join(("(", ")"))
'(1,2,3,4,5)'
>>> parens = ["(", ")"]
>>> ",".join("12345").join(parens)
'(1,2,3,4,5)'
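As a small illustration of the point that join accepts any iterable but only of strings, here is a sketch with a hypothetical helper, join_any, that converts each element with str() first. The last lines show the TypeError that plain join raises for non-string elements.

```python
def join_any(items, sep="-"):
    """Join any iterable (list, tuple, generator, ...) after converting items to str."""
    return sep.join(str(item) for item in items)


print(join_any(["this", "is", "a", "sentence"]))  # this-is-a-sentence
print(join_any((1, 2, 3), sep=", "))              # 1, 2, 3
print(join_any(n * n for n in range(4)))          # 0-1-4-9

# Without the str() conversion, join rejects non-string elements.
try:
    "-".join([1, 2, 3])
except TypeError as exc:
    print("join needs strings:", exc)
```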
Edit from the future: Please don't use the answer below. This function was removed in Python 3 and Python 2 is dead. Even if you are still using Python 2 you should write Python 3 ready code to make the inevitable upgrade easier.
Although @Burhan Khalid's answer is good, I think it's more understandable like this:
from string import join  # Python 2 only; this function was removed in Python 3
sentence = ['this','is','a','sentence']
join(sentence, "-")
The second argument to join() is optional and defaults to " ".
list_abc = ['aaa', 'bbb', 'ccc']
string = ''.join(list_abc)
print(string)
# aaabbbccc
string = ','.join(list_abc)
print(string)
# aaa,bbb,ccc
string = '-'.join(list_abc)
print(string)
# aaa-bbb-ccc
string = '\n'.join(list_abc)
print(string)
# aaa
# bbb
# ccc
We can also use Python's reduce function:
from functools import reduce
sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)
We can specify how we join the string. Instead of '-', we can use ' ':
sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)
If you have a mixed content list and want to stringify it, here is one way:
Consider this list:
>>> aa
[None, 10, 'hello']
Convert it to string:
>>> st = ', '.join(map(str, map(lambda x: f'"{x}"' if isinstance(x, str) else x, aa)))
>>> st = '[' + st + ']'
>>> st
'[None, 10, "hello"]'
If required, convert back to the list:
>>> import ast
>>> ast.literal_eval(st)
[None, 10, 'hello']
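If the only goal is a string that ast.literal_eval can turn back into the same list, a simpler sketch is to let repr() do the quoting. This is an alternative to the manual quoting above, not the same technique, and it assumes the list holds only Python literals (strings, numbers, None, and so on).

```python
import ast

values = [None, 10, "hello"]

# repr() quotes strings and leaves None and numbers as literals.
text = repr(values)
print(text)  # [None, 10, 'hello']

# literal_eval safely parses Python literals without running arbitrary code.
restored = ast.literal_eval(text)
print(restored == values)  # True
```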
If you want to generate a string of strings separated by commas in final result, you can use something like this:
sentence = ['this','is','a','sentence']
sentences_strings = "'" + "','".join(sentence) + "'"
print (sentences_strings) # you will get "'this','is','a','sentence'"
def eggs(someParameter):
    del someParameter[3]  # remove 'cats'
    someParameter.insert(3, ' and cats.')
spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam = ','.join(spam)
print(spam)  # apples,bananas,tofu, and cats.
Without the .join() method you can use this approach:
my_list = ["this", "is", "a", "sentence"]
concatenated_string = ""
for i in range(len(my_list)):
    if i == len(my_list) - 1:
        concatenated_string += my_list[i]
    else:
        concatenated_string += f'{my_list[i]}-'
print([concatenated_string])
# ['this-is-a-sentence']
This is a range-based for loop: when Python reaches the last word of the list, it shouldn't add "-" to concatenated_string. For every word that is not the last one, it appends the word followed by "-".

Extraction of 2 or more digit number into a list from string like 12+13

I'm trying to extract numbers from a string like "12+13".
When I extract only the digits from it into a list, it becomes [1,2,1,3].
What I actually want is the list [12,13], with 12 and 13 as integers.
I have tried my best to solve this; my code is below, but it still has a disadvantage:
I am forced to put a space at the end of the string for it to work correctly.
My code:
def extract(string1):
    l = len(string1)
    pos = 0
    num = []
    continuity = 0
    for i in range(l):
        if string1[i].isdigit():
            continuity += 1
        else:
            num = num + [int(string1[pos:continuity])]
            continuity += 1
            pos = continuity
    return num
string = "1+134-15 "  # added a space at the end of the string
num1 = []
num1 = extract(string)
print(num1)
This will work perfectly with your situation (and with all operators, not just +):
>>> import re
>>> equation = "12+13"
>>> tmp = re.findall('\\b\\d+\\b', equation)
>>> [int(i) for i in tmp]
[12, 13]
But if you format your string to be with spaces between operators (which I think is the correct way to go, and still supports all operators, with a space) then you can do this without even using regex like this:
>>> equation = "12 + 13"
>>> [int(s) for s in equation.split() if s.isdigit()]
[12, 13]
Side note: If your only operator is the + one, you can avoid regex by doing:
>>> equation = "12+13"
>>> [int(s) for s in equation.split("+") if s.isdigit()]
[12, 13]
The other answer is great (as of now), but I want to provide you with a detailed explanation. What you are trying to do is split the string on the "+" symbol. In python, this can be done with str.split("+").
When that translates into your code, it turns out like this.
ourStr = "12+13"
ourStr = ourStr.split("+")
But don't you want to convert those to integers? In Python, we can use a list comprehension with int() to achieve this.
To convert the entire list to ints, we can use the following, which loops over each element and converts the string to an integer:
ourStr = [int(s) for s in ourStr]
Combining this together, we get
ourStr = "12+13"
ourStr = ourStr.split("+")
ourStr = [int(s) for s in ourStr]
But let's say there might be other unknown symbols in the list. As @Idos did, it is probably a good idea to check that each item is a number before putting it in the list.
We can further refine the code to:
ourStr = "12+13"
ourStr = ourStr.split("+")
ourStr = [int(s) for s in ourStr if s.isdigit()]
This can be solved with just list comprehension or built-in methods, no need for regex:
s = '12+13+14+15+16'
l = [int(x) for x in s.split('+')]
l = map(int, s.split('+'))  # Python 2
l = list(map(int, s.split('+')))  # Python 3
# [12, 13, 14, 15, 16]
If you are not sure whether there are any non-digit strings, then just add a condition to the list comprehension (note that the map variants put None in place of a non-digit item rather than dropping it):
l = [int(x) for x in s.split('+') if x.isdigit()]
l = map(lambda s:int(s) if s.isdigit() else None, s.split('+'))  # Python 2
l = list(map(lambda s:int(s) if s.isdigit() else None, s.split('+')))  # Python 3
Now consider a case where you could have something like:
s = '12 + 13 + 14+15+16'
l = [int(x.strip()) for x in s.split('+') if x.strip().isdigit()]  # had to strip x of any whitespace
l = map(lambda s:int(s.strip()) if s.strip().isdigit() else None, s.split('+'))  # Python 2
l = list(map(lambda s:int(s.strip()) if s.strip().isdigit() else None, s.split('+')))  # Python 3
# [12, 13, 14, 15, 16]
Or:
l = [int(x) for x in map(str.strip,s.split('+')) if x.isdigit()]
l = map(lambda y:int(y) if y.isdigit() else None, map(str.strip,s.split('+')))
l = list(map(lambda y:int(y) if y.isdigit() else None, map(str.strip,s.split('+')))) #Python3
You can just use Regular Expressions, and this becomes very easy:
>>> s = "12+13"
>>> import re
>>> re.findall(r'\d+',s)
['12', '13']
Basically, \d matches any digit and + means 1 or more. So re.findall(r'\d+',s) looks for every part of the string that is 1 or more digits in a row and returns each one it finds.
To turn them into integers, as many people have said, you can just use a list comprehension on the result:
result = ['12', '13']
int_list = [int(x) for x in result]
python regex documentation
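The two steps above (find the digit runs, then convert them) fit naturally into one small function. This is just a sketch; the name extract_numbers is made up here, and it deliberately ignores signs and decimal points.

```python
import re


def extract_numbers(expression):
    """Return every run of digits in expression as an int, in order of appearance."""
    # r'\d+' matches one or more consecutive digits; everything else is skipped.
    return [int(token) for token in re.findall(r"\d+", expression)]


print(extract_numbers("12+13"))          # [12, 13]
print(extract_numbers("1+134-15"))       # [1, 134, 15]
print(extract_numbers("ab73t9+-*/182"))  # [73, 9, 182]
```

No trailing space is needed, because findall also returns a match that ends at the end of the string.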
I have made a function which extracts numbers from a string.
def extract(string1):
    # A space is appended so that the last number is also extracted.
    string1 = string1 + " "
    l = len(string1)
    pos = 0
    num = []
    continuity = 0
    for i in range(l):
        if string1[i].isdigit():
            continuity += 1
        else:
            if pos != continuity:
                # Skip empty runs, e.g. between two consecutive non-digits.
                num = num + [int(string1[pos:continuity])]
            continuity += 1
            pos = continuity
    return num
string = "ab73t9+-*/182"
num1 = extract(string)
print(num1)  # [73, 9, 182]
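For comparison, the same "collect runs of digits" idea can be written without manual index bookkeeping or regex by grouping consecutive characters with itertools.groupby. This is an alternative sketch, not the author's method; the name extract_digit_runs is hypothetical, and it assumes the digits are plain ASCII 0-9.

```python
from itertools import groupby


def extract_digit_runs(text):
    """Group consecutive characters by 'is a digit' and convert each digit run to int."""
    return [
        int("".join(group))
        for is_digit, group in groupby(text, key=str.isdigit)
        if is_digit
    ]


print(extract_digit_runs("ab73t9+-*/182"))  # [73, 9, 182]
print(extract_digit_runs("1+134-15"))       # [1, 134, 15]
```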

Replacing an item in a python list by index.. failing?

Any idea why when I call:
>>> hi = [1, 2]
>>> hi[1]=3
>>> print(hi)
[1, 3]
I can update a list item by its index, but when I call:
>>> phrase = "hello"
>>> for item in "123":
...     list(phrase)[int(item)] = list(phrase)[int(item)].upper()
...
>>> print(phrase)
hello
It fails?
Should be hELLo
You never stored the list you intended to modify in a variable. Each call to list(phrase) builds a brand-new list, so on every iteration you create a list, change it, and immediately throw it away; phrase itself is never touched.
If you were intending to change the characters of phrase itself, that's not possible, because strings are immutable in Python.
Instead, make phraselist = list(phrase), then edit that list in the for loop. Also, you can use range():
>>> phrase = "hello"
>>> phraselist = list(phrase)
>>> for i in range(1,4):
...     phraselist[i] = phraselist[i].upper()
...
>>> print(''.join(phraselist))
hELLo
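To see why the original loop had no visible effect, this short sketch shows that list(phrase) returns a new, independent list on every call, and that the string only "changes" once a new string is built and rebound to the name. The variable temp is just an illustrative name.

```python
phrase = "hello"

# Each call to list() builds a separate list object.
print(list(phrase) is list(phrase))  # False

# Editing a copy does not affect the original string.
temp = list(phrase)
temp[1] = temp[1].upper()
print(temp)    # ['h', 'E', 'l', 'l', 'o']
print(phrase)  # hello

# The edit becomes visible only when a new string is created and assigned.
phrase = "".join(temp)
print(phrase)  # hEllo
```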
>>> phrase = "hello"
>>> list_phrase = list(phrase)
>>> for index in (1, 2, 3):
...     list_phrase[index] = phrase[index].upper()
...
>>> ''.join(list_phrase)
'hELLo'
If you prefer a one-liner:
>>> ''.join(x.upper() if index in (1, 2, 3) else x for
...         index, x in enumerate(phrase))
'hELLo'
Another answer, just for fun :)
phrase = 'hello'
func = lambda x: x[1].upper() if str(x[0]) in '123' else x[1]
print(''.join(map(func, enumerate(phrase))))
# hELLo
To make this robust, I created a method: (because I am awesome, and bored)
def method(phrase, indexes):
    # indexes is a string of digits; the substring test is only reliable for single-digit indexes
    func = lambda x: x[1].upper() if str(x[0]) in indexes else x[1]
    return ''.join(map(func, enumerate(phrase)))
print(method('hello', '123'))
# hELLo
Consider that strings are immutable in Python: you can't modify an existing string, only create a new one.
''.join([c if i not in (1, 2, 3) else c.upper() for i, c in enumerate(phrase)])
list() creates a new list. Your loop creates and instantly discards two new lists on each iteration. You could write it as:
phrase = "hello"
L = list(phrase)
L[1:4] = phrase[1:4].upper()
print("".join(L))
Or without a list:
print("".join([phrase[:1], phrase[1:4].upper(), phrase[4:]]))
Strings are immutable in Python therefore to change it, you need to create a new string.
Or if you are dealing with bytestrings, you could use bytearray which is mutable:
phrase = bytearray(b"hello")
phrase[1:4] = phrase[1:4].upper()
print(phrase.decode())
If the indexes are not consecutive, you could use an explicit for loop:
indexes = [1, 2, 4]
for i in indexes:
    L[i] = L[i].upper()
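The approaches above can be wrapped into one reusable helper. The following is a sketch with a made-up name, upper_at, that takes integer indexes (consecutive or not) and builds a new string, since the original cannot be changed in place.

```python
def upper_at(text, indexes):
    """Return a copy of text with the characters at the given indexes upper-cased."""
    wanted = set(indexes)  # set membership is exact, unlike a substring test on digits
    return "".join(
        ch.upper() if i in wanted else ch
        for i, ch in enumerate(text)
    )


print(upper_at("hello", [1, 2, 3]))  # hELLo
print(upper_at("hello", [1, 2, 4]))  # hELlO
```

Indexes outside the string are simply ignored, because enumerate never produces them.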

Python - Make sure string is converted to correct Float

I have possible strings of prices like:
20.99, 20, 20.12
Sometimes the string could be sent to me wrongly by the user to something like this:
20.99.0, 20.0.0
These should be converted back to :
20.99, 20
So basically removing anything from the 2nd . if there is one.
Just to be clear, they would be alone, one at a time, so just one price in one string
Any nice one liner ideas?
For a one-liner, you can use .split() and .join():
>>> '.'.join('20.99.0'.split('.')[:2])
'20.99'
>>> '.'.join('20.99.1231.23'.split('.')[:2])
'20.99'
>>> '.'.join('20.99'.split('.')[:2])
'20.99'
>>> '.'.join('20'.split('.')[:2])
'20'
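If the cleaned value should end up as a number rather than a string, the same split/join idea can be wrapped in a function. This is only a sketch: the name clean_price is hypothetical, and using Decimal instead of float is an assumption made here because the values are prices.

```python
from decimal import Decimal


def clean_price(raw):
    """Keep at most the first two dot-separated parts and parse the result."""
    parts = raw.strip().split(".")
    # parts[:2] is the integer part plus, if present, the first fractional part.
    return Decimal(".".join(parts[:2]))


for raw in ("20.99", "20", "20.12", "20.99.0", "20.0.0"):
    print(raw, "->", clean_price(raw))
```

Swapping Decimal for float works the same way if binary floating point is acceptable for the use case.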
You could do something like this
>>> s = '20.99.0, 20.0.0'
>>> s.split(',')
['20.99.0', ' 20.0.0']
>>> list(map(lambda x: x[:x.find('.',x.find('.')+1)], s.split(',')))
['20.99', ' 20.0']
Look at the inner expression of find: I find the first '.', move one position past it, find the next '.', and the slice drops everything from there on. This assumes every value contains at least two dots; with fewer, find returns -1 and the slice cuts off the last character instead.
Edit: Note that this solution will not discard everything from the second decimal point; it removes only the second point and keeps the additional digits. If you want to discard all of those digits, you could use e.g. @Blender's solution.
It only qualifies as a one-liner if two instructions per line with a ; count, but here's what I came up with:
>>> x = "20.99.1234"
>>> s = x.split("."); x = s[0] + "." + "".join(s[1:])
>>> x
'20.991234'
It should be a little faster than scanning through the string multiple times, though. If you accept that performance cost, you can write it as a single statement:
>>> x = x.split(".")[0] + "." + "".join(x.split(".")[1:])
For a whole list:
>>> def numify(x):
...     s = x.split(".")
...     return float(s[0] + "." + "".join(s[1:]))
...
>>> x = ["123.4.56", "12.34", "12345.6.7.8.9"]
>>> [numify(f) for f in x]
[123.456, 12.34, 12345.6789]
>>> s = '20.99, 20, 20.99.23'
>>> ','.join(x if x.count('.') in [1,0] else x[:x.rfind('.')] for x in s.split(','))
'20.99, 20, 20.99'
If you are looking for a regex-based solution and your intended behaviour is to discard everything from the second . (decimal point) onward, then:
>>> import re
>>> st = "20.99.123"
>>> string_decimal = re.match(r'\d+(?:\.\d+)?', st).group()
>>> float(string_decimal)
20.99
