Parse a string to find and remove a float - python

I have created a Python method that takes a string of variable length, which always includes a floating point number at the end:
"adsfasdflkdslf:asldfasf-adslfk:1.5698464586546"
OR
"asdif adfi=9393 adfkdsf:1.84938"
I need to parse the string and return the floating point number at the end. There is usually a delimiter character before the float, such as : - or a space.
def findFloat(stringArg):
    stringArg.rstrip()
    stringArg.replace("-",":")
    if stringArg.rfind(":"):
        locateFloat = stringArg.rsplit(":")
        #second element should be the desired float
        magicFloat = locateFloat[1]
    return magicFloat
I am receiving:
magicFloat = locateFloat[1]
IndexError: list index out of range
Any guidance on how to locate the float and return it would be awesome.

In Python, strings are immutable. No matter what function you call on a string, the actual text of that string does not change. Thus, methods like rstrip, replace etc. create a new string representing the modified version. (You would know this if you read the documentation.) In your code, you do not assign the results of these calls anywhere in the first two statements, so the results are lost.
Without specifying a number of splits, rsplit does the exact same thing that split does. It checks for splits from the end, sure, but it still splits at every possible point, so the net effect is the same. You need to specify that you want to split at most one time.
However, you shouldn't do that anyway; a much simpler way to get "everything after the last colon, or everything if there is no colon" is to use rpartition.
You don't actually have to remove whitespace from the end for float conversion. Although you probably should actually, you know, perform the conversion.
Finally, there is no point in assigning to a variable just to return it; just return the expression directly.
Putting that together gives us the exceptionally simple:
def findFloat(stringArg):
    return float(stringArg.replace('-', ':').rpartition(':')[2])
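The points above can be checked with a short sketch. The sample strings come from the question (with two trailing spaces added to one of them for illustration), and the variable names are only placeholders:

```python
# Strings are immutable: rstrip() returns a new string and leaves s untouched.
s = "asdif adfi=9393 adfkdsf:1.84938  "
s.rstrip()                      # result is discarded
unchanged = s.endswith("  ")    # still True
stripped = s.rstrip()           # keep the result by assigning it

text = "adsfasdflkdslf:asldfasf-adslfk:1.5698464586546"

# Without a split limit, rsplit() gives the same list as split().
all_parts = text.rsplit(":")
# With maxsplit=1, only the last colon is used.
last_split = text.rsplit(":", 1)

# rpartition() returns (head, separator, tail); the tail is everything
# after the last colon, or the whole string when there is no colon.
tail = text.rpartition(":")[2]
no_colon_tail = "1.5".rpartition(":")[2]

print(all_parts, last_split, tail, no_colon_tail)
```

This also suggests where the IndexError comes from: when the input has no colon (for example because the replace result was thrown away), the split list has a single element and index 1 does not exist.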

re always rocks. Depending on what your floating point number looks like (leading 0?) something like:
import re
magicFloat = re.search(r'.*([0-9]\.[0-9]+)', st).group(1)
p.s. if you do this a lot, precompile the regex first:
re_float = re.compile(r'.*([0-9]\.[0-9]+)')
# later in your code
magicFloat = re_float.search(st).group(1)

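You could do it in an easier manner:
def findFloat(stringArg):
    s = stringArg.rstrip()
    return s.replace('-', ':').split(':')[-1]
rstrip() returns the stripped string; you must store the result somewhere.
split() takes a single separator string, not a set of characters, so split('-:') would only split on the literal text "-:". Normalize the delimiters with replace() first, or use re.split().
rsplit() is an optimization, but split()[-1] will always take the last element of the split list.
locateFloat is not defined if the if branch is not taken. Note that rfind() returns -1 (which is truthy) when nothing is found, so it is not a reliable existence test.
If you need to check for a character, you could write if ':' in stringArg: instead.
Hope these tips help you later :)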

If it's always at the end you should use $ in your re:
import re
def findFloat(stringArg):
    fl = re.search(r'([\.0-9]+)$', stringArg)
    return fl and float(fl.group(1))
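The character class above also accepts text such as "1.2.3" or "...", which float() would reject. A slightly stricter sketch is shown here, assuming the number is unsigned, has no exponent, and may be followed by trailing whitespace; the names TRAILING_NUMBER and find_trailing_float are made up for this example:

```python
import re

# Digits, optionally followed by a dot and more digits, at the very end
# of the string (trailing whitespace allowed).
TRAILING_NUMBER = re.compile(r'(\d+(?:\.\d+)?)\s*$')

def find_trailing_float(text):
    """Return the number at the end of text as a float, or None if absent."""
    match = TRAILING_NUMBER.search(text)
    return float(match.group(1)) if match else None

print(find_trailing_float("adsfasdflkdslf:asldfasf-adslfk:1.5698464586546"))
print(find_trailing_float("asdif adfi=9393 adfkdsf:1.84938 "))
print(find_trailing_float("no number here"))
```

Returning None for a missing number is a design choice for the sketch; raising an exception may suit some callers better.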

You can use regular expressions.
>>> import re
>>> st = "adsfasdflkdslf:asldfasf-adslfk:1.5698464586546"
>>> float(re.split(r':|\s|-',st)[-1])
1.5698464586545999
I have used re.split(pattern, string, maxsplit=0, flags=0), which splits the string at each occurrence of pattern.
Here the pattern lists your delimiters: a colon (:), whitespace (\s), or a hyphen (-).

Related

How to find the indexes of certain character not in quotes in Python?

I ultimately want to split a string by a certain character. I tried Regex, but it started escaping \, so I want to avoid that with another approach (all the attempts at unescaping the string failed). So, I want to get all positions of a character char in a string that is not within quotes, so I can split them up accordingly.
For example, given the phrase hello-world:la\test, I want to get back 11 if char is :, as that is the only : in the string, and it is at index 11. However, re does split it, but I get ['hello-world,lat\\test'].
EDIT:
@BoarGules made me realize that re didn't actually change anything; it's just how Python displays backslashes.
Here's a function that works:
import re
def split_by_char(string,char=':'):
    PATTERN = re.compile(rf'''((?:[^\{char}"']|"[^"]*"|'[^']*')+)''')
    return [string[m.span()[0]:m.span()[1]] for m in PATTERN.finditer(string)]
string = 'hello-world:la\test'
char = ':'
print(string.find(char))
Prints
11
char_index = string.find(char)
string[:char_index]
Returns
'hello-world'
string[char_index+1:]
Returns
'la\test'
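The find-based snippet above locates the first occurrence only and does not look at quotes. To get every position outside quotes, as the question title asks, one possible sketch is a single pass that tracks whether the scan is currently inside a quoted section. The function name unquoted_indexes is hypothetical, and the sketch assumes quotes are not escaped or nested:

```python
def unquoted_indexes(text, char=":"):
    """Return the indexes of char in text that are not inside '...' or "..."."""
    indexes = []
    quote = None  # the quote character currently open, if any
    for i, ch in enumerate(text):
        if quote:
            if ch == quote:      # closing quote found
                quote = None
        elif ch in "\"'":        # opening quote
            quote = ch
        elif ch == char:
            indexes.append(i)
    return indexes

print(unquoted_indexes(r'hello-world:la\test'))
print(unquoted_indexes('a:"b:c":d'))
```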
Solution for the case you're likely encountering (a pseudo-CSV format you're hand-rolling a parser for; if you're not in that situation, it's still a likely situation for people finding this question later):
Just use the csv module.
import csv
import io
test_strings = ['field1:field2:field3', 'field1:"field2:with:embedded:colons":field3']
for s in test_strings:
    for row in csv.reader(io.StringIO(s), delimiter=':'):
        print(row)
which outputs:
['field1', 'field2', 'field3']
['field1', 'field2:with:embedded:colons', 'field3']
correctly ignoring the colons within the quoted field, requiring no kludgy, hard-to-verify hand-written regexes.

Format a string to a proper JSON object

I have a string (from an API call) that looks something like this:
val=
{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},
{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}
I need to do temp=json.loads(val), but the problem is that the string is not valid JSON. The keys and values do not have quotes around them. I tried putting the quotes in by hand and that worked.
How can I programmatically add the quotes to such a string before reading it as JSON?
Also, how can I replace the numbers in scientific notation with decimals? E.g. 0d-2 becomes "0" and 8.0d-1 becomes "0.8".
You could catch anything that's a string with a regex and replace it accordingly.
Assuming your strings that need quotes:
start with a letter
can have numbers at the end
never start with numbers
never have numbers or special characters in between them
This would be a regex code to catch them:
([a-z]*\d*):
You can try it out here. Or learn more about regex here.
Let's do it in python:
import re
# catch a string in json
json_string = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
               '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')  # note the single quotes!
# search the strings according to our rule
string_search = re.search(r'([a-z]*\d*):', json_string)
# extract the first capture group; so everything we matched in brackets
# this is to exclude the colon at the end from the found string as
# we don't want to enquote the colons as well
extracted_strings = string_search.group(1)
This is a solution in case you will build a loop later.
However if you just want to catch all possible strings in python as a list you can do simply the following instead:
import re
# catch ALL strings in json
json_string = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
               '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')  # note the single quotes!
extract_all_strings = re.findall(r'([a-z]*\d*):', json_string)
# note that this by default catches only our capture group in brackets
# so no extra step required
That covers the basics of using a regex to find everything.
With these basics you could either use re.sub to replace each match with itself in quotes, or generate a list of replacements first to verify that everything went right (probably something you'd want to do, given that this approach may be a little unstable).
Note that this is why I wrote this more comprehensive answer instead of just pointing you to a "re.sub" one-liner.
You can approach your scientific notation problem in the same way.
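As a rough illustration of the re.sub route, here is a sketch that works on the sample string from the question. It assumes keys and bare word values are lowercase letters optionally followed by digits, that the "d" exponent marker only appears in numbers, and that no quoted string contains text resembling a key or a number. The function name to_json_text is made up for this example:

```python
import json
import re

val = ('{input:a,matches:[{in:["w","x","y","z"],output:{num1:0d-2,num2:7.0d-1}},'
       '{in:["w","x"],output:{num1:0d-2,num2:8.0d-1}}]}')

def to_json_text(raw):
    # 1. Turn the "d" exponent marker into the "e" that JSON understands.
    text = re.sub(r'(\d+(?:\.\d+)?)d([+-]?\d+)', r'\1e\2', raw)
    # 2. Quote bare keys: a word right after "{" or "," and right before ":".
    text = re.sub(r'([{,])([a-z]+\d*):', r'\1"\2":', text)
    # 3. Quote bare word values: a word right after ":" and before "," or "}".
    text = re.sub(r':([a-z]+\d*)([,}])', r':"\1"\2', text)
    return text

data = json.loads(to_json_text(val))
print(data["input"])
print(data["matches"][1]["output"]["num2"])
```

The numbers come back as Python floats rather than strings; call str() on them if strings are needed. Printing the intermediate text before json.loads is a cheap way to verify the replacements on real data.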

Equivalent to vlookup for strings in Python?

Is there a way to search for a substring in a larger string, and then return a different substring x places left or right of the original substring?
I want to look through a string like "blahblahlink:"www.example.com"blahblah for the string "link:" and return the subsequent url.
Thanks!
Python 3, if that matters.
I think you should use regular expressions. There is module called re in python for that.
A pure Python solution would be to use index, which tells you where in the string the first match occurs, then use the [start:end] slicing notation to select the string from that point:
"blahblahlink:www.example.comblahblah".index("link")
# returns 8
i = "blahblahlink:www.example.comblahblah".index("link")
"blahblahlink:www.example.comblahblah"[i+5:i+20]
# returns 'www.example.com'
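The slice above relies on knowing the URL's length in advance. If the URL is wrapped in double quotes, as in the question's sample, a regular expression can capture whatever sits between them. This is only a sketch under that assumption:

```python
import re

text = 'blahblahlink:"www.example.com"blahblah'

# Capture everything between the quotes that follow "link:".
match = re.search(r'link:"([^"]*)"', text)
url = match.group(1) if match else None
print(url)
```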

Search a delimited string in a file - Python

I have the following read.json file
{:{"JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr","LAPTOP":"error"}
and python script :
import re
shakes = open("read.json", "r")
needed = open("needed.txt", "w")
for text in shakes:
    if re.search('JOL":"(.+?).tr', text):
        print >> needed, text,
I want it to find what's between two markers (JOL":" and .tr) and then print it. But all it does is print all the text in "read.json".
You're calling re.search, but you're not doing anything with the returned match, except to check that there is one. Instead, you're just printing out the original text. So of course you get the whole line.
The solution is simple: just store the result of re.search in a variable, so you can use it. For example:
for text in shakes:
    match = re.search('JOL":"(.+?).tr', text)
    if match:
        print >> needed, match.group(1)
In your example, the match is JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr, and the first (and only) group in it is EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD, which is (I think) what you're looking for.
However, a couple of side notes:
First, . is a special pattern in a regex, so you're actually matching anything up to any character followed by tr, not .tr. For that, escape the . with a \. (And, once you start putting backslashes into a regex, use a raw string literal.) So: r'JOL":"(.+?)\.tr'.
Second, this is making a lot of assumptions about the data that probably aren't warranted. What you really want here is not "everything between JOL":" and .tr", it's "the value associated with key 'JOL' in the JSON object". The only problem is that this isn't quite a JSON object, because of that prefixed :. Hopefully you know where you got the data from, and therefore what format it's actually in. For example, if you know it's actually a sequence of colon-prefixed JSON objects, the right way to parse it is:
d = json.loads(text[1:])
if 'JOL' in d:
    print >> needed, d['JOL']
Finally, you don't actually have anything named needed in your code; you opened a file named 'needed.txt', but you called the file object love. If your real code has a similar bug, it's possible that you're overwriting some completely different file over and over, and then looking in needed.txt and seeing nothing changed each time…
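The first side note, about the unescaped dot, can be seen with a small made-up input (Python 3 syntax here, unlike the Python 2 print statements above). The value "abcXtr.tr" is invented so that the two patterns give different results:

```python
import re

# Hypothetical input: the value contains "Xtr" before the real ".tr" suffix.
text = '{"JOL":"abcXtr.tr","LAPTOP":"error"}'

# Unescaped dot: "." matches any character, so the match ends at "Xtr".
loose = re.search('JOL":"(.+?).tr', text).group(1)

# Escaped dot in a raw string: only a literal ".tr" ends the match.
strict = re.search(r'JOL":"(.+?)\.tr', text).group(1)

print(loose)
print(strict)
```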
If you know that your starting and ending matching strings only appear once, you can ignore that it's JSON. If that's OK, then you can split on the starting characters (JOL":"), take the 2nd element of the split array [1], then split again on the ending characters (.tr) and take the 1st element of the split array [0].
>>> text = '{:{"JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr","LAPTOP":"error"}'
>>> text.split('JOL":"')[1].split('.tr')[0]
'EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD'

in python find index in list if combination of strings exist

I'm writing my first script and trying to learn python.
But I'm stuck and can't get out of this one.
I'm writing a script to change file names.
Let's say I have a string = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
I want the result to be string = "This Is Test3 E00"
This is what I have so far:
l = list(string)
# Transform the string into a list
for i in l:
    if "E" in l:
        p = l.index("E")
        if isinstance((p+1), int) is True:
            if isinstance((p+2), int) is True:
                delp = p+3
                a = p-3
                del l[delp:]
new = "".join(l)
new = new.replace("."," ")
print (new)
The idea is to get the index of "E" and check whether there are 2 integers after it, then delete everything after the second integer.
However, this will not work if there is an "E" anywhere else.
At the moment the result I get is:
this is tEst
because it finds the index of the first "E" in the list and deletes everything after index+3.
I guess my question is how to get the index in the list if a combination of strings exists, but I can't seem to find how.
Thanks for everyone's answers.
I was going in another direction, but it is also not working.
If someone could see why, that would be awesome. It is much better to learn by doing than by just copying what others write :)
This is what I came up with:
for i in l:
    if i=="E" and isinstance((i+1), int ) is True:
        p = l.index(i)
        print (p)
Can anyone tell me why this isn't working? I get an error.
Thank you so much.
Have you ever heard of a Regular Expression?
Check out python's re module. Link to the Docs.
Basically, you can define a "regex" that would match "E and then two integers" and give you the index of it.
After that, I'd just use python's "Slice Notation" to choose the piece of the string that you want to keep.
Then, check out the string methods for str.replace to swap the periods for spaces, and str.title to put them in Title Case
An easy way is to use a regex to find up until the E followed by 2 digits criteria, with s as your string:
import re
up_until = re.match(r'(.*?E\d{2})', s).group(1)
# this.is.tEst3.E00
Then, we replace the . with a space and then title case it:
output = up_until.replace('.', ' ').title()
# This Is Test3 E00
The technique to consider using is Regular Expressions. They allow you to search for a pattern of text in a string, rather than a specific character or substring. Regular Expressions have a bit of a tough learning curve, but are invaluable to learn and you can use them in many languages, not just in Python. Here is the Python resource for how Regular Expressions are implemented:
http://docs.python.org/2/library/re.html
The pattern you are looking to match in your case is an "E" followed by two digits. In Regular Expressions (usually shortened to "regex" or "regexp"), that pattern looks like this:
E\d\d # ('\d' is the specifier for any digit 0-9)
In Python, you create a string of the regex pattern you want to match, and pass that and your file name string into the search() method of the re module. Regex patterns tend to use a lot of special characters, so it's common in Python to prepend the regex pattern string with 'r', which tells the Python interpreter not to interpret the special characters as escape characters. All of this together looks like this:
import re
filename = 'this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv'
match_object = re.search(r'E\d\d', filename)
if match_object:
    # The '0' refers to group 0, i.e. the entire match; end(0) is the index just past it
    index_of_Exx = match_object.end(0)
    truncated_filename = filename[:index_of_Exx]
    # Now take care of any more processing
Regular expressions can get very detailed (and complex). In fact, you can probably accomplish your entire task of fully changing the file name using a single regex that's correctly put together. But since I don't know the full details about what sorts of weird file names might come into your program, I can't go any further than this. I will add one more piece of information: if the 'E' could possibly be lower-case, then you want to add a flag as a third argument to your pattern search which indicates case-insensitive matching. That flag is 're.I' and your search() method would look like this:
match_object = re.search(r'E\d\d', filename, re.I)
Read the documentation on Python's 're' module for more information, and you can find many great tutorials online, such as this one:
http://www.zytrax.com/tech/web/regex.htm
And before you know it you'll be a superhero. :-)
The reason why this isn't working:
for i in l:
    if i=="E" and isinstance((i+1), int ) is True:
        p = l.index(i)
        print (p)
...is because 'i' contains a character from the list 'l', not an integer. You compare it with 'E' (which works), but then try to add 1 to it, which raises a TypeError.
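For readers who want to stay with the loop approach rather than a regex, one way to sketch it is to iterate over positions with enumerate and test the two characters that follow each "E" with str.isdigit. The function name find_episode_index is invented for this example, and the sketch works on the string directly rather than on a list of characters:

```python
def find_episode_index(name):
    """Return the index of the first 'E' followed by two digits, or -1."""
    for i, ch in enumerate(name):
        following = name[i + 1:i + 3]   # the two characters after position i
        if ch == "E" and len(following) == 2 and following.isdigit():
            return i
    return -1

filename = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
idx = find_episode_index(filename)

# Keep everything up to and including "Exx", then tidy it up.
if idx != -1:
    result = filename[:idx + 3].replace(".", " ").title()
else:
    result = filename
print(result)
```

The key difference from the failing loop is that i here is an integer position, so i + 1 is valid, and the digit check looks at the characters themselves instead of calling isinstance on an index.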
