I have a list of strings and I would like to extract "000000_5.612230" from:
A = '/calibration/test_min000000_5.612230.jpeg'
As the length of the strings may change, I tried locating the position of the "n" of "min". I tried to get the right index with:
print sorted(A, key=len).index('n')
But I got "11", which corresponds to the "n" of "calibration". How can I get the highest index at which the character occurs instead?
It is difficult to answer since you don't specify which part of the filename stays constant and which part may change. Is it always a jpeg? Is the number always the last part? Is it always preceded by '_min'?
In any case, I would suggest using a regex instead:
import re
A = '/calibration/test_min000000_5.612230.jpeg'
p = re.compile(r'.*min([_\d.]*)\.jpeg')
value = p.search(A).group(1)
print value
output :
000000_5.612230
Note that this snippet assumes a match is always found. If the filename doesn't contain the pattern, p.search(...) returns None and calling .group(1) on it raises an AttributeError, so you should check for that case.
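The missing-match check mentioned above could look like the following minimal sketch. The helper name extract_value is hypothetical, and the pattern assumes the value always sits between "min" and a trailing ".jpeg".

```python
import re

# Assumed pattern: the value is the run of digits, underscores and dots
# between "min" and the ".jpeg" extension at the end of the path.
PATTERN = re.compile(r'min([_\d.]*)\.jpeg$')

def extract_value(path):
    """Return the captured value, or None when the path does not match."""
    match = PATTERN.search(path)
    if match is None:
        return None
    return match.group(1)

print(extract_value('/calibration/test_min000000_5.612230.jpeg'))
print(extract_value('/calibration/readme.txt'))
```

Returning None keeps the decision about how to handle a bad filename with the caller; raising a ValueError would be an equally reasonable choice.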
You can use the re module and a regex to do that, for example:
import re
A = '/calibration/test_min000000_5.612230.jpeg'
text = re.findall('\d.*\d', A)
Now text is a list. If you print it, the output will look like this: ['000000_5.612230']
So to extract the value, take the first element (or loop over the list):
import re
A = '/calibration/test_min000000_5.612230.jpeg'
text = re.findall('\d.*\d', A)
print text[0]
String slicing seems like a good solution for this
>>> A = '/calibration/test_min000000_5.612230.jpeg'
>>> start = A.index('min') + len('min')
>>> end = A.index('.jpeg')
>>> A[start:end]
'000000_5.612230'
This avoids having to import re.
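If "min" could also occur earlier in the path, or the extension may vary, the slicing idea can be anchored on the last occurrence instead. A rough sketch, assuming the marker "min" always directly precedes the value; the helper name value_after_marker is made up for illustration:

```python
import os.path

def value_after_marker(path, marker='min'):
    """Slice out the text between the last `marker` and the file extension."""
    stem, _ext = os.path.splitext(path)         # drop '.jpeg', '.png', ...
    start = stem.rindex(marker) + len(marker)   # raises ValueError if absent
    return stem[start:]

print(value_after_marker('/calibration/test_min000000_5.612230.jpeg'))
```

str.rindex searches from the right, which is what the question was after when it asked for the highest index.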
Try this (if extension is always '.jpeg'):
A.split('test_min')[1][:-5]
If your string is regular at the end, you can use negative indices to slice the string:
>>> a = '/calibration/test_min000000_5.612230.jpeg'
>>> a[-20:-5]
'000000_5.612230'
Related
How can I copy data from a changing string?
I tried to slice, but the length of the slice changes.
For example, in one case I should copy the number 128 from the string '"edge_liked_by":{"count":128}'; in another I should copy 15332 from '"edge_liked_by":{"count":15332}'.
You could use a regular expression:
import re
string = '"edge_liked_by":{"count":15332}'
number = re.search(r'{"count":(\d*)}', string).group(1)
Really depends on the situation, however I find regular expressions to be useful.
To grab the numbers from the string without caring about their location, you would do as follows:
import re
def get_string(string):
    return re.search(r'\d+', string).group(0)
>>> get_string('"edge_liked_by":{"count":128}')
'128'
To get numbers only from the end of the string, you can use an anchor to ensure the result is pulled from the far end. The following example grabs any unbroken sequence of digits that is preceded by a colon and ends within 5 characters of the end of the string:
import re
def get_string(string):
    rval = None
    string_match = re.search(r':(\d+).{0,5}$', string)
    if string_match:
        rval = string_match.group(1)
    return rval
>>> get_string('"edge_liked_by":{"count":128}')
'128'
>>> get_string('"edge_liked_by":{"1321":1}')
'1'
In the above example, adding the colon will ensure that we only pick values and don't match keys such as the "1321" that I added in as a test.
If you just want anything after the last colon, but excluding the bracket, try combining split with slicing:
>>> '"edge_liked_by":{"count":128}'.split(':')[-1][0:-1]
'128'
Finally, considering this looks like a JSON object, you can add curly brackets to the string and treat it as such. Then it becomes a nested dict you can query:
>>> import json
>>> string = '"edge_liked_by":{"count":128}'
>>> string = '{' + string + '}'
>>> string = json.loads(string)
>>> string.get('edge_liked_by').get('count')
128
The first two will return a string and the final one returns a number due to being treated as a JSON object.
It looks like the type of string you are working with is read from JSON, maybe you are getting it as the output of some API you are working with?
If it is JSON, you've probably gone one step too far in atomizing it to a string like this. I'd work with the original output, if possible, if I were you.
If not, I'd turn it into valid JSON by wrapping it in {} and then parse it with the json.loads function.
import json
string = '"edge_liked_by":{"count":15332}'
string = "{"+string+"}"
json_obj = json.loads(string)
count = json_obj['edge_liked_by']['count']
count will have the desired output. I prefer this option to using regular expressions because you can rely on the structure of the data and reuse the code in case you wish to parse out other attributes, in a very intuitive way. With regular expressions, the code you use will change if the data are decimal, or negative, or contain non-numeric characters.
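As a small illustration of that last point, the same lookup keeps working when the value is negative or a decimal. The negative and decimal inputs below are invented for the example, and the helper name get_count is hypothetical:

```python
import json

def get_count(fragment):
    """Wrap a '"key":{...}' fragment in braces and read the nested count."""
    data = json.loads('{' + fragment + '}')
    return data['edge_liked_by']['count']

print(get_count('"edge_liked_by":{"count":15332}'))
print(get_count('"edge_liked_by":{"count":-3}'))
print(get_count('"edge_liked_by":{"count":12.5}'))
```

A pattern such as \d+ would need to be rewritten for each of these cases, whereas the parser already knows the number syntax.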
Does this help?
a='"edge_liked_by":{"count":128}'
import re
b=re.findall(r'\d+', a)[0]
b
Out[16]: '128'
I have a spreadsheet with text values like A067, A002, A104. What is the most efficient way to convert them to integers? Right now I am doing the following:
str = 'A067'
str = str.replace('A','')
n = int(str)
print n
Depending on your data, the following might be suitable:
import string
print int('A067'.strip(string.ascii_letters))
Python's strip() method takes a string of characters to be removed from the start and end of a string. By passing string.ascii_letters, it removes any leading and trailing letters from the string.
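A quick sketch of that behaviour: letters are removed only from the two ends, so a letter in the middle survives and int() would then fail on the result. The sample values other than 'A067' are invented:

```python
import string

letters = string.ascii_letters

print('A067'.strip(letters))    # leading letter removed
print('A067B'.strip(letters))   # leading and trailing letters removed
print('A0B67'.strip(letters))   # the inner 'B' is kept
```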
If the only non-number part of the input will be the first letter, the fastest way will probably be to slice the string:
s = 'A067'
n = int(s[1:])
print n
If you believe that you will find more than one number per string, though, the regex-based answers will most likely be easier to work with.
You could use regular expressions to find numbers.
import re
s = 'A067'
s = re.findall(r'\d+', s) # This will find all numbers in the string
n = int(s[0]) # The first number. Raises IndexError if no digits were found; check that s is non-empty to avoid this
print n
Here's some example output of findall with different strings
>>> a = re.findall(r'\d+', 'A067')
>>> a
['067']
>>> a = re.findall(r'\d+', 'A067 B67')
>>> a
['067', '67']
You can use a named group with the search method of a compiled regex from the re module.
import re
line = 'A067'
regex = re.compile(r"(?P<numbers>\d+)")
matcher = regex.search(line)
if matcher:
    numbers = int(matcher.groupdict()["numbers"])  # the digits captured by the named group
import string
str = 'A067'
print (int(str.strip(string.ascii_letters)))
Let's say I have a string like this:
string1 = 'bla/bla1/blabla/bla2/bla/bla/wowblawow1'
I need to take the text after the last '/' and delete everything else:
string2 = 'wowblawow1'
Is there any method I could use?
string1 = 'bla/bla1/blabla/bla2/bla/bla/wowblawow1'
string2 = string1.split(r'/')[-1] # Out[2]: 'wowblawow1'
See https://docs.python.org/2/library/stdtypes.html#str.split to see how it works. But as @Emilien suggested, if you are looking to extract the basename, use os.path: https://docs.python.org/2/library/os.path.html
Or maybe you are even looking for this?
>>> import os
>>> os.path.basename("/var/log/syslog")
'syslog'
>>> os.path.dirname("/var/log/syslog")
'/var/log'
I generally use os.path.basename when dealing with forward slashes.
I know this may not be the most practical way, but for generally trying to locate the content after the last occurrence of something:
string1 = 'bla/bla1/blabla/bla2/bla/bla/wowblawow1'
index = (len(string1)-1) - string1[::-1].find('/')
string1 = string1[index+1:]
Details:
string1[::-1] # reverse the string
string1[::-1].find(my_string_to_search_for) # gets the index of the first occurrence of the argument in the reversed string
(len(string1)-1) # the maximum index value
(len(string1)-1) - string1[::-1].find(my_string_to_search_for) # the index as counted from the front of the string
string1 = string1[index+1:] # gives the substring of everything after the index of the last occurrence of your string
You could make the code more readable by doing something like:
def get_last_index_of(string, search_content):
    return (len(string)-1) - string[::-1].find(search_content)
string1 = 'bla/bla1/blabla/bla2/bla/bla/wowblawow1'
string1 = string1[get_last_index_of(string1, '/')+1:]
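The built-in string methods can already search from the right, so the reversal can be avoided. A brief sketch of three equivalent approaches using str.rfind, str.rsplit and str.rpartition:

```python
string1 = 'bla/bla1/blabla/bla2/bla/bla/wowblawow1'

# rfind returns the index of the last '/', or -1 if there is none
last = string1[string1.rfind('/') + 1:]

# rsplit with a limit of 1 splits once, starting from the right
last_rsplit = string1.rsplit('/', 1)[-1]

# rpartition returns (head, separator, tail)
last_rpartition = string1.rpartition('/')[2]

print(last)
```

All three return the whole string when no '/' is present.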
I was wondering if anyone has a simpler solution for extracting a few letters from the middle of a string. I want to retrieve the 3 letters (in this case, GMB), and all the entries follow the same pattern. I'm struggling to find a simpler way of doing this.
Here is an example of what I've been using.
entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"
symbol = entry.strip('entries-alphabetical.jsp?raceid13=')
symbol = symbol[0:3]
print symbol
thanks
First of all, the argument passed to str.strip is not a prefix or suffix; it is a set of characters that will be stripped from both ends of the string.
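To make the pitfall concrete, here is a small sketch. The second input is invented to show where strip goes wrong, and remove_prefix is a hypothetical helper, not a built-in:

```python
prefix = 'entries-alphabetical.jsp?raceid13='

def remove_prefix(text, prefix):
    """Drop `prefix` from the start of `text` only if it is really there."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

entry = prefix + 'GMB$20140313A'
tricky = prefix + 'abc$20140313A'

print(entry.strip(prefix))                # happens to work: 'G' is not in the set
print(tricky.strip(prefix))               # strips too much: 'abc' vanishes as well
print(remove_prefix(tricky, prefix)[:3])  # slicing off the real prefix keeps 'abc'
```

The original code only works because the symbol is uppercase and the prefix contains no uppercase letters.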
Since the string looks like a URL, you can use urlparse.parse_qsl:
>>> import urlparse
>>> urlparse.parse_qsl(entry)
[('entries-alphabetical.jsp?raceid13', 'GMB$20140313A')]
>>> urlparse.parse_qsl(entry)[0][1][:3]
'GMB'
This is what regular expressions are for. http://docs.python.org/2/library/re.html
import re
val = re.search(r'(GMB.*)', entry)
print val.group(1)
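If the symbol is not always GMB, the pattern can describe its position instead of its value. A sketch that assumes the symbol is exactly three uppercase letters directly after the '=' sign:

```python
import re

entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"

# Capture three uppercase letters that directly follow '='
match = re.search(r'=([A-Z]{3})', entry)
symbol = match.group(1) if match else None
print(symbol)
```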
I want to write a simple regular expression in Python that extracts a number from HTML. The HTML sample is as follows:
Your number is <b>123</b>
Now, how can I extract "123", i.e. the contents of the first bold text after the string "Your number is"?
import re
m = re.search(r"Your number is <b>(\d+)</b>",
              "xxx Your number is <b>123</b> fdjsk")
if m:
    print m.groups()[0]
Given s = "Your number is <b>123</b>" then:
import re
m = re.search(r"\d+", s)
will work and give you
m.group()
'123'
The regular expression looks for 1 or more consecutive digits in your string.
Note that in this specific case we knew that there would be a numeric sequence. Otherwise you would have to test the return value of re.search() to make sure that m is not None, because calling m.group() on None raises an AttributeError exception.
Of course if you are going to process a lot of HTML you want to take a serious look at BeautifulSoup - it's meant for that and much more. The whole idea with BeautifulSoup is to avoid "manual" parsing using string ops or regular expressions.
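BeautifulSoup is a third-party package. To illustrate the same idea of letting a parser handle the markup, here is a rough sketch that swaps in Python 3's standard html.parser module instead; the class name FirstBoldText is invented for this example:

```python
from html.parser import HTMLParser

class FirstBoldText(HTMLParser):
    """Collect the text inside the first <b> element encountered."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.result = None

    def handle_starttag(self, tag, attrs):
        if tag == 'b' and self.result is None:
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == 'b':
            self.in_bold = False

    def handle_data(self, data):
        # Keep only the first piece of text seen inside <b>...</b>
        if self.in_bold and self.result is None:
            self.result = data

parser = FirstBoldText()
parser.feed('Your number is <b>123</b>')
print(parser.result)
```

The result is still a string, so wrap it in int() if a number is needed.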
import re
x = 'Your number is <b>123</b>'
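re.search(r'(?<=Your number is )<b>(\d+)</b>',x).group(1)
This searches for the number that follows the 'Your number is' string. group(1) returns the captured digits; group(0) would return the whole match, including the <b> tags.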
import re
print re.search(r'(\d+)', 'Your number is <b>123</b>').group(0)
The simplest way is to extract just the digits:
re.search(r"\d+",text)
val="Your number is <b>123</b>"
Option 1:
m=re.search(r'(<.*?>)(\d+)(<.*?>)',val)
m.group(2)
Option 2:
re.sub(r'([\s\S]+)(<.*?>)(\d+)(<.*?>)',r'\3',val)
import re
found = re.search(r"Your number is <b>(\d+)</b>", "something.... Your number is <b>123</b> something...")
if found:
    print found.group(1)
Here (\d+) is the capturing group. Since there is only one group, group(1) returns it, while group(0) is the whole match. When there are several groups, pass the index of the group you want.
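The difference between the whole match and a captured group can be seen in a short sketch:

```python
import re

m = re.search(r"Your number is <b>(\d+)</b>",
              "something.... Your number is <b>123</b> something...")

print(m.group(0))   # the entire matched text
print(m.group(1))   # only the first parenthesised group
print(m.groups())   # tuple of all captured groups
```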
To extract the matches as a Python list, you can use findall:
>>> import re
>>> string = 'Your number is <b>123</b>'
>>> pattern = '\d+'
>>> re.findall(pattern,string)
['123']
>>>
You can use the following example to solve your problem:
import re
text = 'Your number is <b>123</b>'
match = re.search(r"\d+", text)  # match object for the first run of digits
print("Matched Number:", match.group(0))
print("Starting Index Of Digit:", match.start())
print("Ending Index Of Digit:", match.end())
import re
x = 'Your number is <b>123</b>'
output = re.search('(?<=Your number is )<b>(\d+)</b>',x).group(1)
print(output)