python string manipulation, finding a substring within a string [duplicate]

This question already has answers here:
How do I get a substring of a string in Python? [duplicate]
(16 answers)
Closed 8 years ago.
I am trying to find a substring within a larger string in Python. Specifically, I want the text that follows the string "Requests per second:". It seems my knowledge of Python strings, and of Python in general, is lacking.
My error is on the third line of code, minusStuffBeforeReqPer = output[reqPerIndx[0], len(output)]. Without the [0] on reqPerIndx, I get an error saying that I am trying to access a tuple; with it, I get an error saying that an int object has no attribute __getitem__. I am trying to find the index at which reqPerStr starts in the output string.
The code
#output contains the string reqPerStr.
reqPerStr = "Requests per second:"
reqPerIndx = output.find(reqPerStr)
minusStuffBeforeReqPer = output[reqPerIndx[0], len(output)]
eolIndx = minusStuffBeforeReqPer.find("\n")
semiColIndx = minusStuffBeforeReqPer.find(":")
instanceTestObj.reqPerSec = minusStuffBeforeReqPer[semiColIndx+1, eolIndx]

You must use output[begin:end], not output[begin, end] (that's just how the syntax for slicing ordinary strings/lists/etc works). So:
minusStuffBeforeReqPer = output[reqPerIndx:len(output)]
However, this is redundant. So you should instead probably do this:
minusStuffBeforeReqPer = output[reqPerIndx:]
By omitting the end part of the slice, the slice will go all the way to the end of output.
You get an error about a tuple without the [0] because you have passed a tuple, namely (reqPerIndx, len(output)), to the indexing brackets [...]. You get an error about int having no __getitem__ because reqPerIndx[0] asks for the 0th element of reqPerIndx, which is an integer, and integers do not have elements.
As @AshwiniChaudhary points out in the comments, str.find returns -1 if the substring is not found. If you are certain that the thing you're looking for will always be found somewhere in output, you don't strictly need to handle the -1 case, but it is a good idea to do so anyway.
reqPerIndx = output.find(reqPerStr)
if reqPerIndx != -1:
    minusStuffBeforeReqPer = ...
    # etc
else:
    # handle this case separately
    pass
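Putting the find-and-slice pieces together, a minimal sketch might look like the following. The helper name text_after and the sample text are made up for illustration; the real contents of output are not shown in the question, so treat this as one possible shape rather than the definitive fix.

```python
def text_after(output, marker):
    """Return the rest of the line following `marker`, or None if absent."""
    start = output.find(marker)
    if start == -1:          # str.find signals "not found" with -1
        return None
    start += len(marker)     # skip past the marker itself
    end = output.find("\n", start)
    if end == -1:            # marker is on the last line, no newline after it
        end = len(output)
    return output[start:end].strip()


# Hypothetical sample input, not taken from the question
sample = "Time taken: 1.2 s\nRequests per second: 24.5 [#/sec]\nDone\n"
print(text_after(sample, "Requests per second:"))   # 24.5 [#/sec]
print(text_after(sample, "Missing label:"))         # None
```

Passing start as the second argument to find avoids building an intermediate substring just to locate the end of the line.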
You might have better luck with regexes. I don't know what output looks like, so I sort of just guessed - you should adapt this to match whatever you have in output.
>>> import re
>>> re.findall(r'(?:Requests per second:)\s*(\d+)', "Requests: 24")
[]
>>> re.findall(r'(?:Requests per second:)\s*(\d+)', "Requests per second: 24")
['24']

You have the error on those two lines:
minusStuffBeforeReqPer = output[reqPerIndx[0], len(output)]
instanceTestObj.reqPerSec = minusStuffBeforeReqPer[semiColIndx+1, eolIndx]
You have to use the : to create a range. start:end.
You can omit the last parameter to go to the end, or omit the first parameter to start from the beginning. The parameters can be negative numbers too. Since find might return -1, you'll have to handle that case separately, because if the string isn't found you'll end up with:
minusStuffBeforeReqPer = output[-1:]
Which is the last char in the string.
You should have code that looks like this:
#output contains the string reqPerStr.
reqPerStr = "Requests per second:"
reqPerIndx = output.find(reqPerStr)
if reqPerIndx != -1:
    # reqPerIndx is an int, so it is used directly as the slice start
    minusStuffBeforeReqPer = output[reqPerIndx:]
    eolIndx = minusStuffBeforeReqPer.find("\n")
    semiColIndx = minusStuffBeforeReqPer.find(":")
    if eolIndx > semiColIndx >= 0:
        instanceTestObj.reqPerSec = minusStuffBeforeReqPer[semiColIndx+1:eolIndx]
This works, but a regex would be a better fit. As I understand it, you want to match a string that starts with reqPerStr and ends with \n, and get everything between the : and the \n.
You could do that with such pattern:
"Requests per second:(.*)\n"
You'll end up with:
import re
# re.search scans the whole string; re.match would only match at its start
match = re.search("Requests per second:(.*)\n", output)
if match:
    instanceTestObj.reqPerSec = match.group(1)
If you want to find all matches, you can do this:
for match in re.finditer("Requests per second:(.*)", output):
    instanceTestObj.reqPerSec = match.group(1)
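One detail worth checking when adapting the regex approach: re.match only tries to match at the very beginning of the string, while re.search scans forward for the first match. The sketch below illustrates the difference on a made-up two-line output; the sample text is an assumption, not the asker's real data.

```python
import re

# Hypothetical sample; the label is not on the first line
output = "Concurrency Level: 10\nRequests per second: 24.5 [#/sec]\n"

pattern = re.compile(r"Requests per second:\s*(.*)")

anchored = pattern.match(output)    # only tries position 0
anywhere = pattern.search(output)   # scans the whole string

print(anchored)              # None
print(anywhere.group(1))     # 24.5 [#/sec]
```

Because . does not match a newline by default, the captured group stops at the end of the line without needing an explicit \n in the pattern.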

How to print multiple letters in a string when combining two prompts [duplicate]

This question already has answers here:
Understanding slicing
(38 answers)
Closed 29 days ago.
I want to get a new string from the third character to the end of the string, e.g. myString[2:end]. If omitting the second part means "to the end", does omitting the first part mean it starts from the start?

>>> x = "Hello World!"
>>> x[2:]
'llo World!'
>>> x[:2]
'He'
>>> x[:-2]
'Hello Worl'
>>> x[-2:]
'd!'
>>> x[2:-2]
'llo Worl'
Python calls this concept "slicing" and it works on more than just strings. Take a look here for a comprehensive introduction.

Just for completeness as nobody else has mentioned it. The third parameter to an array slice is a step. So reversing a string is as simple as:
some_string[::-1]
Or selecting alternate characters would be:
"H-e-l-l-o- -W-o-r-l-d"[::2] # outputs "Hello World"
The ability to step forwards and backwards through the string maintains consistency with being able to array slice from the start or end.

Substr() normally (i.e. PHP and Perl) works this way:
s = Substr(s, beginning, LENGTH)
So the parameters are beginning and LENGTH.
But Python's behaviour is different; it expects the beginning and one past the END (!). This is easy for beginners to miss. So the correct replacement for Substr(s, beginning, LENGTH) is
s = s[ beginning : beginning + LENGTH]
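As a rough illustration of that translation, a small helper could wrap the slice. The name substr is chosen here only to mirror the PHP/Perl function, and the sketch assumes non-negative arguments; negative offsets or lengths behave differently in those languages and are not handled.

```python
def substr(s, beginning, length):
    """Rough stand-in for substr(s, beginning, length), non-negative arguments only."""
    return s[beginning:beginning + length]


print(substr("Hello World!", 6, 5))   # World
print(substr("Hello", 3, 100))        # lo  (a too-large length is simply clamped)
```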

A common way to achieve this is by string slicing.
MyString[a:b] gives you a substring from index a to (b - 1).

One example seems to be missing here: full (shallow) copy.
>>> x = "Hello World!"
>>> x
'Hello World!'
>>> x[:]
'Hello World!'
>>> x==x[:]
True
>>>
The [:] slice is a common idiom for creating a shallow copy of sequence types such as lists; it is not needed for strings, which are immutable. See also "Python list slice syntax used for no obvious reason".
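To see what "shallow" means for a list copy made with [:], here is a small sketch with a nested list. The variable names are arbitrary.

```python
original = [[1, 2], [3, 4]]
copy = original[:]            # new outer list, same inner lists

print(copy == original)       # True
print(copy is original)       # False

copy.append([5, 6])           # changes only the copy's outer list
copy[0].append(99)            # inner list is shared, so original sees it
print(original)               # [[1, 2, 99], [3, 4]]
print(copy)                   # [[1, 2, 99], [3, 4], [5, 6]]
```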

Is there a way to substring a string in Python, to get a new string from the 3rd character to the end of the string?
Maybe like myString[2:end]?
Yes, this actually works if you assign, or bind, the name end to the constant singleton None:
>>> end = None
>>> myString = '1234567890'
>>> myString[2:end]
'34567890'
Slice notation has 3 important arguments:
start
stop
step
Their defaults when not given are None - but we can pass them explicitly:
>>> stop = step = None
>>> start = 2
>>> myString[start:stop:step]
'34567890'
If leaving the second part means 'till the end', if you leave the first part, does it start from the start?
Yes, for example:
>>> start = None
>>> stop = 2
>>> myString[start:stop:step]
'12'
Note that we include start in the slice, but we only go up to, and not including, stop.
When step is None, by default the slice uses 1 for the step. If you step with a negative integer, Python is smart enough to go from the end to the beginning.
>>> myString[::-1]
'0987654321'
I explain slice notation in great detail in my answer to Explain slice notation Question.
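To inspect how the None defaults are resolved for a given string length, the built-in slice object has an indices() method. The following is only an illustrative sketch; my_string is a stand-in for the myString used above.

```python
my_string = '1234567890'

for s in (slice(2, None), slice(None, 2), slice(None, None, -1)):
    # indices() fills in concrete start/stop/step values for this length
    start, stop, step = s.indices(len(my_string))
    print(s, '->', (start, stop, step), repr(my_string[s]))

# The resolved values are meant to be fed to range(), e.g. to walk the
# same positions that the slice would select:
reversed_chars = [my_string[i]
                  for i in range(*slice(None, None, -1).indices(len(my_string)))]
print(''.join(reversed_chars))   # 0987654321
```

For the reversed slice, indices() reports a stop of -1, which is a value for range() to stop before index 0 is passed, not a position counted from the end.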

I would like to add two points to the discussion:
You can use None instead of an empty space to specify "from the start" or "to the end":
'abcde'[2:None] == 'abcde'[2:] == 'cde'
This is particularly helpful in functions, where you can't provide an empty space as an argument:
def substring(s, start, end):
    """Return the slice of `s` from index `start` up to, but not
    including, index `end`; `None` means "no bound" on that side.
    Examples
    --------
    >>> substring('abcde', 0, 3)
    'abc'
    >>> substring('abcde', 1, None)
    'bcde'
    """
    return s[start:end]
Python has slice objects:
idx = slice(2, None)
'abcde'[idx] == 'abcde'[2:] == 'cde'

You've got it right there except for "end". It's called slice notation. Your example should read:
new_sub_string = myString[2:]
If you leave out the second parameter it is implicitly the end of the string.

text = "StackOverflow"
# Using Python slicing, you can get different subsets of the above string
# reverse of the string
text[::-1] # 'wolfrevOkcatS'
# first five characters
text[:5] # 'Stack'
# last five characters
text[-5:] # 'rflow'
# 3rd character to the fifth character
text[2:5] # 'ack'
# characters at even positions (counting from 1)
text[1::2] # 'tcOefo'

If myString contains an account number that begins at offset 6 and has length 9, then you can extract the account number this way: acct = myString[6:][:9].
If the OP accepts that, they might want to try, in an experimental fashion,
myString[2:][:999999]
It works - no error is raised, and no default 'string padding' occurs.
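The reason no error is raised is that slice bounds are clamped to the length of the string, unlike single-index access. A quick sketch of the difference:

```python
s = 'abc'

print(repr(s[1:999999]))     # 'bc'  (stop is clamped to len(s))
print(repr(s[10:]))          # ''    (start past the end gives an empty string)
print(repr(s[2:][:999999]))  # 'c'

try:
    s[10]                    # indexing, unlike slicing, is bounds-checked
except IndexError as exc:
    print('IndexError:', exc)
```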

Well, I got a situation where I needed to translate a PHP script to Python, and it had many usages of substr(string, beginning, LENGTH).
If I had chosen Python's string[beginning:end], I'd have had to calculate a lot of end indexes, so the easier way was to use string[beginning:][:length]. It saved me a lot of trouble.

>>> str1='There you are'
>>> str1[:]
'There you are'
>>> str1[1:]
'here you are'
#To print alternate characters, skipping one element in between
>>> str1[::2]
'Teeyuae'
#To print the last element, stepping backwards from the end
>>> str1[:-2:-1]
'e'
#Using slice datatype
>>> str1='There you are'
>>> s1=slice(2,6)
>>> str1[s1]
'ere '

Maybe I missed it, but I couldn't find a complete answer on this page to the original question(s) because variables are not further discussed here. So I had to go on searching.
Since I'm not yet allowed to comment, let me add my conclusion here. I'm sure I was not the only one interested in it when accessing this page:
>>> myString = 'Hello World'
>>> end = 5
>>> myString[2:end]
'llo'
If you leave out the first part, you get
>>> myString[:end]
'Hello'
And if you leave out the : in the middle as well, you get the simplest substring, which is the character at index 5 (counting starts at 0, so it's the blank in this case):
>>> myString[end]
' '

Hardcoded indexes scattered through the code can become a mess.
To avoid that, Python offers the built-in slice() object.
string = "my company has 1000$ on profit, but I lost 500$ gambling."
Suppose we want to know how much money I have left.
Normal solution:
final = int(string[15:19]) - int(string[43:46])
print(final)
>>>500
Using slices:
EARNINGS = slice(15, 19)
LOSSES = slice(43, 46)
final = int(string[EARNINGS]) - int(string[LOSSES])
print(final)
>>>500
Using slice you gain readability.

a="Helloo"
print(a[:-1])
In the above code, [:-1] selects everything from the start up to, but not including, the last character.
OUTPUT :
>>> Hello
Note: Here a[:-1] is the same as a[0:-1] and a[0:len(a)-1].
a="I Am Siva"
print(a[2:])
OUTPUT:
>>> Am Siva
In the above code, a[2:] selects everything from index 2 through the last element.
Remember that if you give x as the upper limit of a slice, the result runs up to index x-1, and that the index of a list or string always starts from 0.

I have a simple solution that uses a for loop to find a given substring in a string.
Let's say we have two string variables,
main_string = "lullaby"
match_string = "ll"
If you want to check whether the given match string exists in the main string, you can do this,
match_string_len = len(match_string)
for index, value in enumerate(main_string):
    sub_string = main_string[index:match_string_len+index]
    if sub_string == match_string:
        print("match string found in main string")
        break  # stop at the first match
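For comparison, the same check can be written with the built-in in operator and str.find, which do the scanning internally. This is a sketch using the same two example variables as above.

```python
main_string = "lullaby"
match_string = "ll"

# Membership test: True if match_string occurs anywhere in main_string
print(match_string in main_string)      # True

# Position of the first occurrence, or -1 if there is none
print(main_string.find(match_string))   # 2
print(main_string.find("zz"))           # -1
```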

How do I remove specific elements from a string without using replace() method?

I'm trying to traverse a string using its indices and remove specific elements. Because the string gets shorter as elements are removed, the index always goes out of range by the time the final element is reached.
Here's some code to illustrate what I'm trying to do. For example, going from "1.2.3.4" to "1234".
string = "1.2.3.4"
for i in range(len(string)):
    if string[i] == ".":
        string = string[:i] + string[i+1:]
I know there are alternative approaches, such as the string method replace(), so that I can run string = string.replace(string[i], "", 1), or I can traverse the individual elements (not the indices).
But how would I solve it using the approach above (traversing string indices)? What techniques can I use to halt the loop after it reaches the final element of the string, without continuing to advance the index, which will go out of range as elements are removed earlier in the string?

Use this:
string = "1.2.3.4"
res = ""
for s in string:
    if s != '.':
        res += s
The result is of course '1234'.

You can use the re module:
import re
string = "1.2.3.4"
string = re.sub(r'\.', '', string)  # raw string, so \. reaches the regex engine unchanged
print(string)

If I understand correctly, you want to modify a string by index while its length keeps changing.
That's pretty dangerous.
The problem you ran into is caused by range(len(string)). Once the range is created, it doesn't change. Inside the loop the string gets shorter, and that's why you get the out-of-range error.
So what you want to do is check the current length of the string on every iteration, and advance the index only when nothing was removed. Here is an example:
string = '1.2.3.4'
i = 0
while i < len(string):
    if string[i] == '.':
        string = string[:i] + string[i+1:]
    else:
        i += 1
Still, there are plenty of better ways to deal with your string; this approach is not recommended.
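Another index-based option, offered here only as a sketch, is to walk the indices from the end towards the start. Removing the character at index i shifts only the positions after i, and those have already been visited, so the fixed range stays valid. The function name remove_char is made up for illustration.

```python
def remove_char(s, ch):
    """Remove every occurrence of ch from s by visiting indices in reverse."""
    for i in range(len(s) - 1, -1, -1):   # last index down to 0
        if s[i] == ch:
            s = s[:i] + s[i + 1:]         # indices below i are unaffected
    return s


print(remove_char("1.2.3.4", "."))   # 1234
```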

It could be done like this (with a try/except block), but that's not really a great way to approach this problem:
string = "1.2.3.4"
for i in range(len(string)):
    try:
        if string[i] == ".":
            string = string[:i] + string[i+1:]
    except IndexError:
        # i has run past the end of the shortened string
        break
The result is 1234.
The only real change is that the try/except inside the loop saves us from the IndexError that would normally come up once we try to access an index that is now out of bounds.
Once that happens, the exception is caught and we simply exit the loop with our finished string.

Python - Capture string with or without specific character

I am trying to capture the part of a sentence that comes after a specific word. Each sentence in my code is different, and a sentence doesn't necessarily contain the word to split by. If the word doesn't appear, I just need something like an empty string or list.
Example 1: working
my_string="Python is a amazing programming language"
print(my_string.split("amazing",1)[1])
programming language
Example 2:
my_string="Java is also a programming language."
print(my_string.split("amazing",1)[1]) # amazing word doesn't appear in the sentence.
Error: IndexError: list index out of range
Output needed :empty string or list ..etc.
I tried something like this, but it still fails.
my_string.split("amazing",1)[1] if my_string.split("amazing",1)[1] == None else my_string.split("amazing",1)[1]

When you use the .split() method, you can pick the part of the resulting list you want with either integers or slices. If you want to check for a specific word in your string, you can do something like this:
my_str = "Python is cool"
my_str_list = my_str.split()
if 'cool' in my_str_list:
    print(my_str)
output:
Python is cool
Otherwise, you can loop over a list of strings to check whether the word appears in any of them.

You have some options here. You can split and check the result:
tmp = my_string.split("amazing", 1)
result = tmp[1] if len(tmp) > 1 else ''
Or you can check for containment up front:
result = my_string.split("amazing", 1)[1] if 'amazing' in my_string else ''
The first option is more efficient if most of the sentences have matches, the second one if most don't.
Another option similar to the first is
result = my_string.split("amazing", 1)[-1]
if result == my_string:
result = ''
In all cases, consider doing something equivalent to
result = result.lstrip()
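A further option, sketched here under the assumption that an empty string is the desired fallback, is str.partition. It always returns three parts, so no index check is needed. The helper name text_after_word is made up for illustration.

```python
def text_after_word(sentence, word):
    """Return the text following the first occurrence of word, or '' if absent."""
    before, sep, after = sentence.partition(word)
    # sep is '' when word was not found; after is then '' as well
    return after.lstrip() if sep else ''


print(repr(text_after_word("Python is a amazing programming language", "amazing")))
# 'programming language'
print(repr(text_after_word("Java is also a programming language.", "amazing")))
# ''
```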

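Instead of using index 1, use index -1, which selects the last item in the list.
my_string="Java is also a programming language."
print(my_string.split("amazing",1)[-1])
When the word is missing, this prints the whole string, 'Java is also a programming language.', rather than raising an IndexError; when the word is present, it prints the text after it.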

Algorithm to extract number of varying length from title of file

I have a list of 400,000 file names (column in excel) of the format
xxx.Number.Date.zzz.txt
and I want to extract the Number from the string
Normally I would just set it to take the 5th through 9th character in that string, but the numbers vary in length (2 - 4 digits) and I am not sure how to design an algorithm that can tell how long the number is.
Using python3 if anyone is interested, but really I just need help with the pseudocode
I looked at this previous question, but it did not really answer the question in terms that I can use since it seems like it is using bash functions or I did not understand the explanation:
Extract number of variable length from string

If the format of the file is always xxx.Number.Date.zzz.txt, and we only care about Number, then you could split the string into a list on the dots and extract the element at index 1 of that list. Example:
file = "xxx.4432.Date.zzz.txt"
num = file.split(".")[1]
print(num) # prints 4432
You could write this in a loop to go through your Excel column (check out openpyxl if you haven't yet).
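Applied to many names at once, the split approach might be sketched as follows. The file names and the helper extract_number are hypothetical, and the isdigit check is an added safeguard for names that do not follow the expected format.

```python
def extract_number(filename):
    """Return the integer between the first two dots, or None if it is not numeric."""
    parts = filename.split(".")
    if len(parts) > 1 and parts[1].isdigit():
        return int(parts[1])
    return None


# Hypothetical file names following xxx.Number.Date.zzz.txt
names = ["xxx.42.20200101.zzz.txt", "xxx.4432.20200102.zzz.txt", "badname.txt"]
print([extract_number(n) for n in names])   # [42, 4432, None]
```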

You can use a regular expression (available in most languages):
.*?\.(\d+)\.
which matches the number between the first two dots:
import re
re.match(r'.*?\.(\d+)\.', 'xxx.12345.Date.zzz.txt').group(1)
#'12345'
An explanation on regex101.
This can also be done in pure Python (easily translatable to other languages):
s = 'xxx.12345.Date.zzz.txt'
out = ''
in_num = False
for c in s:
    if in_num:
        if c == '.':
            break
        out += c
    elif c == '.':
        in_num = True
giving out as: '12345'.
Note that with this second method, we do not verify that the characters between the first two full stops are digits.

How do I find the first of a few characters in a string in python

How do I find the first of a few characters in a string in python? I have used find() and index() but they find only one character. How do I find the first position of a single character out of the few characters I want to be searched for?
So I want to find the position of the first operator(out of the 4 arithmetic operators) in an inputted string else it should return -1.
Sorry if this is a very stupid question but I have been searching and trying out multiple options over the past few days. I am also a beginner in python.
I tried this, but I know it's wrong:
>>> str1 ='12-23+23*12/12'
>>> str1.find('+') or str1.find('-') or str1.find('*') or str1.find('/')
This returns the position of the + operator, the first one listed, rather than the position of the first operator in the string.
Also, I have tried
for x in str1:
    if (x=='+' or x=='-' or x=='*' or x=='/'):
        print(str1[x])
I know this is wrong.
I am a beginner and I'm trying to learn over a summer course I have taken. So I have not much knowledge on the topic.

str1 = '12-23+23*12/12+65'
str2 = str1  # working copy that shrinks as operators are removed
while True:
    if '+' not in str2:  # Taking '+' as an example
        break
    found = str2.find('+')
    print(found)
    str2 = str2.replace('+', '', 1)
    # str2 = str2[:found] + str2[found+1:]
There is another way to do what the last line of the loop does. It is shown in the comment above:
str2 = str2[:found] + str2[found+1:]

You can do something like the following:
validInput = set(['+','-','*','/'])
checkString = '12-23+23*12*12'
def checkInput(input):
    if input not in validInput:
        raise Exception
    return input
def findSign(sign,string):
    sign = checkInput(sign)
    if sign in string:
        return string.find(sign)
    return -1
print(findSign('/',checkString))
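To get the position of whichever of the four operators comes first, as the question asks, two possible sketches are shown below: one using a regular expression character class and one using str.find for each operator. The function names are made up for illustration.

```python
import re


def first_operator_index(expr):
    """Index of the first +, -, * or / in expr, or -1 if there is none."""
    m = re.search(r"[+\-*/]", expr)   # character class: any one of the four
    return m.start() if m else -1


def first_operator_index_find(expr, operators="+-*/"):
    """Same result using str.find for each operator and taking the smallest hit."""
    hits = [i for i in (expr.find(op) for op in operators) if i != -1]
    return min(hits, default=-1)


print(first_operator_index('12-23+23*12/12'))        # 2
print(first_operator_index_find('12-23+23*12/12'))   # 2
print(first_operator_index('1223'))                  # -1
```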
