regex help to remove a +

I am having a problem with a + sign.
Here is the format of my CSV:
wifichannelnumber+ssid+macaddress of AP
Here is an example of a good line:
6+Jills-Equinox+78:61:7c:19:xx:xx
And here is a problem line. Note the + right after S8.
11+Samsung-Galaxy-S8+-4469+a2:cc:2b:8d:xx:xx
I would like to remove the extra plus in bash or Python (Edit) for a whole CSV.

phone = "11+Samsung-Galaxy-S8+-4469+a2:cc:2b:8d:xx:xx"
print(phone.replace("S8+","S8"))
>>>11+Samsung-Galaxy-S8-4469+a2:cc:2b:8d:xx:xx

The regex you want is:
^(\d+)\+(.*)\+(([\w\d]{2}\:){5}[\d\w]{2})$
You can then use Python to remove every '+' sign from the second capture group.
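As a rough sketch of that idea: the snippet below matches a line against a slightly simplified form of the regex (the MAC octets use a non-capturing group, which is my assumption to keep the group numbering simple) and strips every '+' from the SSID group. The helper name strip_plus_in_ssid is hypothetical, and lines that do not match are returned unchanged.

```python
import re

# channel '+' ssid '+' mac, where the ssid itself may contain stray '+' signs
LINE_RE = re.compile(r'^(\d+)\+(.*)\+((?:\w{2}:){5}\w{2})$')


def strip_plus_in_ssid(line):
    """Remove '+' characters from the SSID field only (hypothetical helper)."""
    match = LINE_RE.match(line)
    if match is None:
        return line  # leave lines we do not understand untouched
    channel, ssid, mac = match.groups()
    return '+'.join((channel, ssid.replace('+', ''), mac))


print(strip_plus_in_ssid('11+Samsung-Galaxy-S8+-4469+a2:cc:2b:8d:xx:xx'))
print(strip_plus_in_ssid('6+Jills-Equinox+78:61:7c:19:xx:xx'))
```

Because (.*) is greedy, the SSID group extends to the last '+' that still leaves a MAC-shaped field at the end of the line.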

Python solution:
s = '11+Samsung-Galaxy-S8+-4469+a2:cc:2b:8d:xx:xx'
if s.count('+') > 2:
    parts = s.split('+')
    s = '{}+{}+{}'.format(parts[0], ''.join(parts[1:-1]), parts[-1])
print(s)
The output:
11+Samsung-Galaxy-S8-4469+a2:cc:2b:8d:xx:xx
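Since the question asks about a whole CSV, here is a minimal sketch that applies the same split-and-rejoin idea to every line. The function names merge_extra_plus and fix_lines are hypothetical, and the sketch assumes the first field (channel) and the last field (MAC address) never contain a '+'.

```python
def merge_extra_plus(line):
    """Collapse any '+' inside the middle (SSID) field of one record."""
    record = line.rstrip('\n')
    parts = record.split('+')
    if len(parts) <= 3:
        return record  # already channel+ssid+mac
    return '+'.join((parts[0], ''.join(parts[1:-1]), parts[-1]))


def fix_lines(lines):
    """Apply merge_extra_plus to every line of an iterable of lines."""
    return [merge_extra_plus(line) for line in lines]


sample = [
    '6+Jills-Equinox+78:61:7c:19:xx:xx\n',
    '11+Samsung-Galaxy-S8+-4469+a2:cc:2b:8d:xx:xx\n',
]
for fixed in fix_lines(sample):
    print(fixed)
```

A file object opened in text mode iterates over its lines, so it could be passed to fix_lines in place of the sample list.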

Related

Better way to change specific chars in a string separated by underscores without using re

I have files with names like centerOne_camera_2_2018-04-11_15:11:21_2.0.jpg. I want to change the last string i.e. image_name.split('_')[5].split('.')[0] to some other string. I can't seem to find a neat way to do this and ended up doing the following which is very crude
new_name = image_base.split('_')[0] + image_base.split('_')[1] + image_base.split('_')[2] + image_base.split('_')[3] + image_base.split('_')[4] + frameNumber
That is, my output should be centerOne_camera_2_2018-04-11_15:11:21_<some string>.0.jpg
Any better way is appreciated. Note: I want to retain the rest of the string too.

I think you may be looking for this:
>>> "centerOne_camera_2_2018-04-11_15:11:21_2.0.jpg".rpartition("_")
('centerOne_camera_2_2018-04-11_15:11:21', '_', '2.0.jpg')
That is for the last element. But from the comments I gather you want to split at delimiter n.
>>> n = 3
>>> temp = "centerOne_camera_2_2018-04-11_15:11:21_2.0.jpg".split("_",n)
>>> "_".join(temp[:n]),temp[n]
('centerOne_camera_2', '2018-04-11_15:11:21_2.0.jpg')
I'm not sure what your objection to using + is, but you can do this if you like:
>>> temp="centerOne_camera_2_2018-04-11_15:11:21_2.0.jpg".rpartition("_")
>>> "{0}<some_string>{2}".format(*temp)
'centerOne_camera_2_2018-04-11_15:11:21<some_string>2.0.jpg'

You can try rsplit:
"centerOne_camera_2_2018-04-11_15:11:21_2.0.jpg".rsplit("_", 1)
['centerOne_camera_2_2018-04-11_15:11:21', '2.0.jpg']
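To get from either split to the output the question asks for, one possible sketch combines rpartition with partition: the first isolates the last underscore-separated field, the second separates the leading number from the rest of that field. The helper name replace_frame is hypothetical.

```python
def replace_frame(name, new_frame):
    """Swap the text between the last '_' and the following '.' (hypothetical helper)."""
    head, sep, tail = name.rpartition('_')   # tail is e.g. '2.0.jpg'
    _old, dot, rest = tail.partition('.')    # rest is e.g. '0.jpg'
    return head + sep + new_frame + dot + rest


print(replace_frame('centerOne_camera_2_2018-04-11_15:11:21_2.0.jpg', '17'))
```

Both methods return empty strings for the missing pieces when the separator is absent, so the concatenation does not raise on unusual names.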

Parsing a MAC address with python

How can I convert a hex value "0000.0012.13a4" into "00:00:00:12:13:A4"?

text = '0000.0012.13a4'
text = text.replace('.', '').upper() # a little pre-processing
# chunk into groups of 2 and re-join
out = ':'.join([text[i : i + 2] for i in range(0, len(text), 2)])
print(out)
00:00:00:12:13:A4

import re
old_string = "0000.0012.13a4"
new_string = ':'.join(s for s in re.split(r"(\w{2})", old_string.upper()) if s.isalnum())
print(new_string)
OUTPUT
> python3 test.py
00:00:00:12:13:A4
>
Without modification, this approach can also handle some other MAC formats that you might run into, such as "00-00-00-12-13-a4".

Try the following code:
import re
hx = '0000.0012.13a4'.replace('.', '').upper()
print(':'.join(re.findall('..', hx)))
Output: 00:00:00:12:13:A4

There is a pretty simple three-step solution.
First, strip those pesky periods:
hexStrBad = '0000.0012.13a4'
step1 = hexStrBad.replace('.','')
Then, if the formatting is consistent, insert the colons by slicing:
step2 = step1[0:2] + ':' + step1[2:4] + ':' + step1[4:6] + ':' + step1[6:8] + ':' + step1[8:10] + ':' + step1[10:12]
Finally, convert the result to uppercase:
step3 = step2.upper()
It's not the prettiest, but it will do what you need!
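The answers above can be folded into one small function. This is only a sketch: the name normalize_mac is hypothetical, and it assumes that any character that is not a hex digit is a separator and that a valid address has exactly 12 hex digits.

```python
import re


def normalize_mac(raw):
    """Return raw as colon-separated uppercase octets (hypothetical helper)."""
    digits = re.sub(r'[^0-9A-Fa-f]', '', raw).upper()
    if len(digits) != 12:
        raise ValueError('expected 12 hex digits, got %d' % len(digits))
    # chunk into groups of 2 and re-join
    return ':'.join(digits[i:i + 2] for i in range(0, 12, 2))


print(normalize_mac('0000.0012.13a4'))
print(normalize_mac('00-00-00-12-13-a4'))
```

Raising on a wrong digit count is a design choice; returning the input unchanged would be an equally reasonable alternative for messy data.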

It's unclear what you're asking exactly, but if all you want is to make a string all uppercase, use .upper()
Try to clarify your question somewhat, because if you're asking about converting some weirdly formatted string into what looks like a MAC address, we need to know that to answer your question.

In Python how to strip dollar signs and commas from dollar related fields only

I'm reading in a large text file with lots of columns, dollar related and not, and I'm trying to figure out how to strip the dollar fields ONLY of $ and , characters.
so say I have:
a|b|c
$1,000|hi,you|$45.43
$300.03|$MS2|$55,000
where a and c are dollar-fields and b is not.
The output needs to be:
a|b|c
1000|hi,you|45.43
300.03|$MS2|55000
I was thinking that regex would be the way to go, but I can't figure out how to express the replacement:
f=open('sample1_fixed.txt','wb')
for line in open('sample1.txt', 'rb'):
    new_line = re.sub(r'(\$\d+([,\.]\d+)?k?)',????, line)
    f.write(new_line)
f.close()
Anyone have an idea?
Thanks in advance.

Unless you are really tied to the idea of using a regex, I would suggest doing something simple, straight-forward, and generally easy to read:
def convert_money(inval):
    # only touch fields that start with '$' and are numeric once cleaned
    if inval.startswith('$'):
        test_val = inval[1:].replace(",", "")
        try:
            float(test_val)
        except ValueError:
            pass
        else:
            inval = test_val
    return inval

def convert_string(s):
    return "|".join(map(convert_money, s.split("|")))

a = '$1,000|hi,you|$45.43'
b = '$300.03|$MS2|$55,000'
print(convert_string(a))
print(convert_string(b))
OUTPUT
1000|hi,you|45.43
300.03|$MS2|55000

A simple approach:
>>> import re
>>> exp = r'\$\d+(,|\.)?\d+'
>>> s = '$1,000|hi,you|$45.43'
>>> '|'.join(i.translate(str.maketrans('', '', '$,')) if re.match(exp, i) else i for i in s.split('|'))
'1000|hi,you|45.43'

It sounds like you are addressing the entire line of text at once. I think your first task would be to break up your string by columns into an array or some other variables. Once you've done that, your solution for converting strings of currency into numbers doesn't have to worry about the other fields.
With the columns separated, there is probably an easier way to do this task than with regular expressions. You could start with this SO question.
If you really want to use regex though, then this pattern should work for you:
/[$,]/g
Demo on regex101
Replace matches with empty strings. The pattern gets a little more complicated if you have other kinds of currency present.
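A minimal sketch of the split-into-columns idea, using the standard csv module with '|' as the delimiter. It assumes the dollar columns are known by header name (here 'a' and 'c', as in the question); the function name strip_dollar_columns is hypothetical.

```python
import csv
import io


def strip_dollar_columns(rows, dollar_columns):
    """Strip '$' and ',' only from the named columns; the first row is the header."""
    rows = iter(rows)
    header = next(rows)
    targets = [i for i, name in enumerate(header) if name in dollar_columns]
    cleaned = [header]
    for row in rows:
        row = list(row)
        for i in targets:
            row[i] = row[i].replace('$', '').replace(',', '')
        cleaned.append(row)
    return cleaned


sample = "a|b|c\n$1,000|hi,you|$45.43\n$300.03|$MS2|$55,000\n"
reader = csv.reader(io.StringIO(sample), delimiter='|')
result = strip_dollar_columns(reader, {'a', 'c'})
for row in result:
    print('|'.join(row))
```

Choosing columns by name means a value such as $MS2 in column b is never touched, without needing a numeric check.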

Try this regex and adapt it if necessary.
\$(\d+)[\,]*([\.]*\d*)
SEE DEMO : http://regex101.com/r/wM0zB6/2

Use the regex
((?<=\d),(?=\d))|(\$(?=\d))
e.g.
>>> import re
>>> x="$1,000|hi,you|$45.43"
>>> re.sub( r'((?<=\d),(?=\d))|(\$(?=\d))', r'', x)
'1000|hi,you|45.43'

Try the below regex and then replace the matched strings with \1\2\3
\$(\d+(?:\.\d+)?)(?:(?:,(\d{2}))*(?:,(\d{3})))?
DEMO

Defining a blacklist and checking whether each character is in it is an easy way to do this:
blacklist = ("$", ",") # define characters to remove
with open('sample1_fixed.txt', 'w') as f:
    for line in open('sample1.txt', 'r'):
        # note: this strips the characters from every field, not only the dollar fields
        clean_line = "".join(c for c in line if c not in blacklist)
        f.write(clean_line)

\$(?=(?:[^|]+,)|(?:[^|]+\.))
Try this. Replace the matches with an empty string and use the re.M option. See the demo:
http://regex101.com/r/gT6kI4/6

Python regex - faster search

I need a way to optimize my regex. Here is the string I am working with:
rr='JA=3262SGF432643;KL=ASDF43TQ;ME=FQEWF43344;JA=4355FF;PE=FDSDFHSDF;EB=SFGDASDSD;JA=THISONE;IH=42DFG43;'
I want to take only the JA=4355FF entry that comes before JA=THISONE, so I did it this way:
aa='.*JA=([^.]*)JA=THISONE[^.]*'
aa=re.compile(aa)
print (re.findall(aa,rr))
and I get:
['4355FF;PE=FDSDFHSDF;EB=SFGDASDSD;']
My first problem is that the search is slow (because the string I am searching is very large and JA=THISONE is usually near the end of it).
My second problem is that I don't get 4355FF, but everything up to JA=THISONE.
Can someone help me optimize my regex? Thank you!

I. Consider using string search instead of regexes:
thisone_pos = rr.find('JA=THISONE')
range_start = rr.rfind("JA=", 0, thisone_pos) + 3
range_end = rr.find(';', range_start)
print(rr[range_start:range_end])
II. Consider flipping the string and constructing your regex in reverse:
re.findall(pattern, rr[::-1])
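The string-search approach in I. can be wrapped in a function that also copes with a missing marker. This is a sketch only: the name value_before and its parameters are hypothetical, and it assumes every entry is terminated by ';'.

```python
def value_before(record, marker, key='JA='):
    """Return the value of the last key entry located before marker, or None."""
    marker_pos = record.find(marker)
    if marker_pos == -1:
        return None  # marker not present
    key_pos = record.rfind(key, 0, marker_pos)
    if key_pos == -1:
        return None  # no earlier entry with this key
    start = key_pos + len(key)
    end = record.find(';', start, marker_pos)
    if end == -1:
        end = marker_pos  # malformed input: no terminator before the marker
    return record[start:end]


rr = ('JA=3262SGF432643;KL=ASDF43TQ;ME=FQEWF43344;JA=4355FF;'
      'PE=FDSDFHSDF;EB=SFGDASDSD;JA=THISONE;IH=42DFG43;')
print(value_before(rr, 'JA=THISONE'))
```

Because rfind scans backwards from the marker, the work done after locating the marker depends on the distance to the previous JA= entry rather than on the length of the whole string.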

You could consider the following solution:
import re
rr='JA=3262SGF432643;KL=ASDF43TQ;ME=FQEWF43344;JA=4355FF;PE=FDSDFHSDF;EB=SFGDASDSD;JA=THISONE;IH=42DFG43;'
m = re.findall( r"(JA=[^;]+;)", rr )
# Print all hits
print(m)
# Print the hit preceding "JA=THISONE;"
print(m[m.index("JA=THISONE;") - 1])
First, you find all entries starting with "JA=", and then you pick the one located immediately before "JA=THISONE;".
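If a single regex is preferred, one possible sketch captures the value directly by refusing to step over another JA= entry between the capture and the marker. This addresses the second problem in the question (getting too much text); I make no claim that it is faster than the approaches above. The pattern and the name PREV_JA are my own, not from the answers.

```python
import re

rr = ('JA=3262SGF432643;KL=ASDF43TQ;ME=FQEWF43344;JA=4355FF;'
      'PE=FDSDFHSDF;EB=SFGDASDSD;JA=THISONE;IH=42DFG43;')

# A JA entry, then any number of non-JA entries, then the marker entry.
PREV_JA = re.compile(r'(?:^|;)JA=([^;]*);(?:(?!JA=)[^;]*;)*JA=THISONE;')

match = PREV_JA.search(rr)
value = match.group(1) if match else None
print(value)
```

The negative lookahead (?!JA=) is what stops the first JA= entry from matching: the skipped entries between the capture and the marker may not themselves start with JA=.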

Regex + Python - Remove all lines beginning with a *

I want to remove all lines from a given file that begin with a *. So for example, the following:
* This needs to be gone
But this line should stay
*remove
* this too
End
Should generate this:
But this line should stay
End
What I ultimately need to do is the following:
1. Remove all text inside parentheses and brackets (parentheses/brackets included).
2. As mentioned above, remove lines starting with '*'.
So far I was able to address #1 with the following: re.sub(r'\[.*?\]|\(.*?\)', '', fileString). I tried several things for #2 but always ended up removing things I don't want to.
Solution 1 (no regex)
>>> f = open('path/to/file.txt', 'r')
>>> [n for n in f.readlines() if not n.startswith('*')]
Solution 2 (regex)
>>> s = re.sub(r'(?m)^\*.*\n?', '', s)
Thanks everyone for the help.

Using regex:
s = re.sub(r'(?m)^\*.*\n?', '', s)
Check this demo.
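Both steps from the question can be combined in one function. This is a sketch under two assumptions: the bracketed text never spans multiple lines or nests, and the function name clean_text is hypothetical.

```python
import re


def clean_text(text):
    """Drop (...) and [...] spans, then drop lines that begin with '*'."""
    text = re.sub(r'\[.*?\]|\(.*?\)', '', text)
    # MULTILINE makes ^ match at the start of every line
    return re.sub(r'^\*.*\n?', '', text, flags=re.MULTILINE)


sample = (
    "* This needs to be gone\n"
    "But this line should stay\n"
    "*remove\n"
    "* this too\n"
    "End\n"
)
print(clean_text(sample), end='')
```

Note the order: because brackets are removed first, a line such as "(note)* text" would start with '*' by the time the second substitution runs and would be dropped as well.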

You don't need regex for this.
text = file.split('\n') # split everything into lines.
for line in text:
    # do something here
    pass
Let us know if you need any more help.

You should really give more information here: at a minimum, which version of Python you are using and a code snippet. That said, why do you need a regular expression? I don't see why you can't just use startswith.
The following works for me with Python 2.7.3:
s = '* this line gotta go!!!'
print(s.startswith('*'))
>>>True

>>> f = open('path/to/file.txt', 'r')
>>> [n for n in f.readlines() if not n.startswith('*')]
['But this line should stay\n', 'End\n']
