How should I delete everything in a string starting from a character I specify?
Let's say:
PHP/USD
I want to delete everything starting from '/', so what's left would be 'PHP'.
And how do I do the reverse, so that 'USD' would be the only thing left?
I only found answers on deleting the middle of a string, or deleting a string that starts with something, but not a PART of the string. What is the better approach here?
Is it something with oldstr.replace()? Have I misread the replace documentation?
Use re.sub and replace the pattern /.*$ with an empty string:
import re
text = "PHP/USD"  # avoid the name input, which shadows the built-in input()
line = re.sub(r"/.*$", "", text)
print(line)
# prints PHP
If you want the reverse, use the pattern ^.*/ instead: it removes everything up to and including the '/', leaving 'USD'.
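A minimal Python 3 sketch of both directions on the question's PHP/USD example. The variable name text is my own (the answer's input shadows Python's built-in input()), and str.partition is a non-regex alternative that the answer does not mention.

```python
import re

text = "PHP/USD"

# Delete from "/" to the end of the string -> keep the left part.
left = re.sub(r"/.*$", "", text)

# Delete from the start up to and including "/" -> keep the right part.
right = re.sub(r"^.*/", "", text)

# Without regex: str.partition splits at the first "/" only.
before, sep, after = text.partition("/")

print(left, right)     # PHP USD
print(before, after)   # PHP USD
```

One difference to keep in mind: ^.*/ is greedy, so on a string with several slashes it removes everything up to the last one, while partition() splits at the first.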
I am looking to remove the last statement in a rule used for parsing. The statements are encapsulated with # characters, and the rule itself is encapsulated with pattern tags.
What I want to do is just remove the last rule statement.
My current idea to achieve this goes like this:
Opens the rules file, saves each line as an element into a list.
Selects the line that contains the correct rule-id and then saves the rule pattern as a new string.
Reverses the saved rule pattern.
Removes the last rule statement.
Re-reverses the rule pattern.
Adds in the trailing pattern tag.
So the input will look like:
<pattern>#this is a statement# #this is also a statement#</pattern>
Output will look like:
<pattern>#this is a statement# </pattern>
My current attempt goes like this:
import re
with open(rules) as f:
    lines = f.readlines()
string = ""
for line in lines:
    if ruleid in line:
        position = lines.index(line)
        string = lines[position + 2]  # the rule pattern will be two lines down
                                      # from where the rule-id is located, hence
                                      # the position + 2
def reversed_string(a_string):  # reverses the string
    return a_string[::-1]
def remove_at(x):  # removes everything until the # character
    return re.sub('^.*?#', '', x)
print(reversed_string(remove_at(remove_at(reversed_string(string)))))
This will reverse the string but not remove the last rule statement once it has been reversed.
Running just the reversed_string() function will successfully reverse the string, but trying to run that same string through the remove_at() function will not work at all.
But if you manually create the input string (with the same rule pattern) and skip opening the file and grabbing the rule pattern, it successfully removes the trailing rule statement.
The successful code looks like this:
import re
string = '<pattern>#this is a statement# #this is also a statement#</pattern>'
def reversed_string(a_string):  # reverses the string
    return a_string[::-1]
def remove_at(x):  # removes everything until the # character
    return re.sub('^.*?#', '', x)
print(reversed_string(remove_at(remove_at(reversed_string(string)))))
As well, how would I add in the pattern tag after the removal is complete?
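The lines you are reading probably have a \n at the end, and that's why your replacement is not working: after reversing, the string starts with \n, and the . in ^.*?# does not match a newline, so the pattern never matches. There is a related question about reading a file without the trailing newlines.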
One option is to remove the \n with rstrip(), like this:
string = lines[position + 2].rstrip("\n")
Now, about the replacement, I think you could simplify it by using this regular expression:
#[^#]+#(?!.*#)
It consists of the following parts:
#[^#]+# matches one # followed by one or more characters that are not an # and then another #.
(?!.*#) is a negative lookahead that checks that no other # appears anywhere after the match (zero or more characters followed by a #).
This expression should match only the last statement, so you would not need to reverse the string:
re.sub(r"#[^#]+#(?!.*#)", "", string)
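A short sketch putting the two suggestions together on the question's sample rule. The line below is hypothetical input standing in for lines[position + 2]. Because the match stops at the closing #, the </pattern> tag is never touched, which also covers the follow-up question: there is no tag to add back.

```python
import re

# A line as readlines() would return it, including the trailing "\n".
line = "<pattern>#this is a statement# #this is also a statement#</pattern>\n"

# Strip the newline first, as suggested above.
string = line.rstrip("\n")

# Remove the last #...# statement; (?!.*#) ensures no "#" follows the match.
result = re.sub(r"#[^#]+#(?!.*#)", "", string)

print(result)  # <pattern>#this is a statement# </pattern>
```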
I'm trying to replace a string element, but only if it doesn't have additional characters after the match, though the characters before the match can vary. For example, if I tokenize a name containing underscores, I want to replace anything that ends with "R", but not elements that merely start with it: it should replace "R" or "SideR", but not "Rear", because there are characters after the "R". I remember someone showing me something like this before, but I can't find it. It was something akin to \n (but it wasn't \n, which is a newline; there is no newline here) that could be put at the end of a string to denote no further characters. (There was either one for the start as well, or the same thing worked for both start and end.)
test = "New_R_SideR_Rear_Object"
tokens = test.split("_")
newtest = ""
for each in tokens:
    if "R" in each:
        each = each.replace("R", "L")
    newtest = (newtest + each + "_")
I'm positive there is something I can add to the end of the if "R" in each line, or to the .replace line, that will ensure "Rear" doesn't become "Lear", while both "R" and "SideR" do get replaced.
The above is just simplified for ease of explanation. Thanks for your time
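You can use a regular expression. The regular expression language provides a compact way to express how to match text. For your example, R(_|\b) matches an R followed by either an underscore or a word boundary (such as the end of the string). The underscore has to be listed separately because _ counts as a word character, so there is no \b between R and _: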
$ python3
>>> import re
>>> test="New_R_SideR_Rear_Object"
>>> re.sub(r"R(_|\b)", r"L\1", test)
'New_L_SideL_Rear_Object'
>>>
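The anchor the question half-remembers is most likely $, which matches only at the end of the string (^ is its counterpart for the start). A minimal sketch, assuming you keep the question's token-by-token approach, applies R$ to each token; joining with "_" also avoids the trailing underscore that the original loop appends.

```python
import re

test = "New_R_SideR_Rear_Object"
tokens = test.split("_")

# "$" anchors the match at the end of each token, so only a trailing "R"
# is replaced: "R" and "SideR" change, "Rear" does not.
new_tokens = [re.sub(r"R$", "L", token) for token in tokens]

newtest = "_".join(new_tokens)
print(newtest)  # New_L_SideL_Rear_Object
```

Without regex, token.endswith("R") followed by token[:-1] + "L" does the same job.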
I am trying to find out whether a "\n" character is in a string using this:
if "\n" in errors.text:
This works fine for a string like "one\ntwo", but when the newline is at the end of the string, like "one\n", it doesn't seem to work. I am using Selenium to get this string from a website. Is it possible that it is not catching the newline at the end and simply not including it?
Or could this be the problem?
fixedText = errors.text.split("\n")[0]
I want the fixed text to drop all newlines and keep only the first line of text. It works except for the case described above.
If you want the fixed text to only be the first line in a string, you can do this:
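if errors.text:  # skips empty strings
    fixedText = errors.text.split("\n")[0]
This is because split() is reasonably robust:
>>> 'a'.split("\n")[0]
'a'
>>> 'a\n'.split("\n")[0]
'a'
>>> 'a\n1'.split("\n")[0]
'a'
>>> ''.split("\n")
['']
That last example shows that split("\n") on an empty string still returns a one-element list, so [0] is safe; the check just skips empty text. Note that split() with no argument behaves differently: ''.split() returns [], so indexing it would raise an IndexError, which is why an empty-string check matters there.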
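If you'd rather not guard against empty strings at each call site, str.splitlines() is an alternative to split("\n") that the answer does not use. A small sketch; the helper name first_line is hypothetical, and its text argument stands in for errors.text.

```python
def first_line(text):
    # splitlines() treats "\n", "\r\n" and other line boundaries alike,
    # ignores a trailing newline, and returns [] for an empty string.
    lines = text.splitlines()
    return lines[0] if lines else ""


print(first_line("one\ntwo"))   # one
print(first_line("one\n"))      # one
print(repr(first_line("")))     # ''
```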
I have the following read.json file
{:{"JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr","LAPTOP":"error"}
and this Python script:
import re
shakes = open("read.json", "r")
needed = open("needed.txt", "w")
for text in shakes:
    if re.search('JOL":"(.+?).tr', text):
        print(text, end="", file=needed)
I want it to find what's between two strings (JOL":" and .tr) and then print it. But all it does is print all the text in read.json.
You're calling re.search, but you're not doing anything with the returned match, except to check that there is one. Instead, you're just printing out the original text. So of course you get the whole line.
The solution is simple: just store the result of re.search in a variable, so you can use it. For example:
for text in shakes:
    match = re.search('JOL":"(.+?).tr', text)
    if match:
        print(match.group(1), file=needed)
In your example, the match is JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr, and the first (and only) group in it is EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD, which is (I think) what you're looking for.
However, a couple of side notes:
First, . is a special pattern in a regex, so you're actually matching anything up to any character followed by tr, not .tr. For that, escape the . with a \. (And, once you start putting backslashes into a regex, use a raw string literal.) So: r'JOL":"(.+?)\.tr'.
Second, this is making a lot of assumptions about the data that probably aren't warranted. What you really want here is not "everything between JOL":" and .tr", it's "the value associated with key 'JOL' in the JSON object". The only problem is that this isn't quite a JSON object, because of that prefixed :. Hopefully you know where you got the data from, and therefore what format it's actually in. For example, if you know it's actually a sequence of colon-prefixed JSON objects, the right way to parse it is:
import json
d = json.loads(text[1:])  # skip the leading ":"
if 'JOL' in d:
    print(d['JOL'], file=needed)
Finally, make sure the object you print to is the file you actually opened. If your real code opens needed.txt under one name (say, love) but writes to something else called needed, it's possible that you're overwriting some completely different file over and over, and then looking in needed.txt and seeing nothing change each time...
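To put the side notes together, here is a sketch comparing the escaped-dot regex with the JSON approach. It assumes, as the answer does, that each line is a JSON object prefixed by a single ":"; the sample line is hypothetical, with the JOL value shortened from the question's data.

```python
import json
import re

# Hypothetical input line in the assumed ":"-prefixed JSON format.
text = ':{"JOL":"EuXaqHIbfEDyvph%2BMHPd.tr","LAPTOP":"error"}\n'

# Regex approach: escape the "." and use a raw string literal.
match = re.search(r'JOL":"(.+?)\.tr', text)
regex_value = match.group(1) if match else None

# JSON approach: drop the leading ":" and parse the rest.
d = json.loads(text[1:])
json_value = d.get("JOL")

print(regex_value)  # EuXaqHIbfEDyvph%2BMHPd
print(json_value)   # EuXaqHIbfEDyvph%2BMHPd.tr
```

The JSON route returns the whole value, including its trailing .tr, because that is part of the string; strip it afterwards if you don't want it.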
If you know that your starting and ending strings appear only once, you can ignore the fact that it's JSON. If that's OK, you can split on the starting characters (JOL":"), take the second element of the resulting list [1], then split that on the ending characters (.tr) and take the first element [0].
>>> text = '{:{"JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr","LAPTOP":"error"}'
>>> text.split('JOL":"')[1].split('.tr')[0]
'EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD'
I was testing a function I wrote. It is supposed to give me the count of full stops (.) in a line or string. The full stops I am interested in counting have a tab before and after them.
Here is what I have written.
def Seek():
    a = '1 . . 3 .'
    b = a.count(r'\t\.\t')
    return b
Seek()
However, when I test it, it returns 0. In a, there are 2 full stops (.) with a tab both before and after them. Am I using regular expressions improperly? Did I represent a incorrectly? Any help is appreciated.
Thanks.
It doesn't look like a has any tabs in it. Although you may have hit the Tab key on your keyboard, your text editor may have interpreted that as "insert a number of spaces to align with the next tab stop". You need your line to look like this:
a = '1\t.\t.\t3\t.'
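That fixes the input. Note, though, that str.count() looks for a literal substring rather than a regular expression, so you also need the re module. A more complete example: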
import re
def Seek():
    a = '1\t.\t.\t3\t.'
    pattern = re.compile(r'(?<=\t)\.(?=\t)')
    return len(pattern.findall(a))
print(Seek())
This uses a lookbehind and a lookahead to match the tab characters without consuming them. What does that mean? In \t.\t.\t, you will match both the first and the second period. The plain pattern \t\.\t would match the initial \t.\t and consume both tabs, so the second period would no longer have an unconsumed tab in front of it, and there would be no second match. Lookaround syntax is "zero width": the condition is tested, but it takes up no space in the final match. Thus, the code snippet I just gave returns 2, just as you would expect.
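It can work if a contains real tab characters (typed with a single Tab key press, assuming your editor keeps them as tabs) and the substring you count is made of real tabs and a plain period rather than the raw-string escapes r'\t\.\t'.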
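Note also that str.count() counts only non-overlapping occurrences of a literal substring, so even '\t.\t' finds just one match in '1\t.\t.\t3\t.' because the two periods share a tab. It won't work as intended unless you use a regex with lookarounds instead, or change your substring to test only for a tab in front of the period (which, in this example, would also count the final period that has no tab after it).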
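To make the overlap point concrete, this sketch counts the periods in the tab-separated string from the answers in four ways: the question's raw-string count, a literal count, a plain regex, and the lookaround regex.

```python
import re

a = '1\t.\t.\t3\t.'  # real tab characters; the last "." has no tab after it

# The question's call: r'\t\.\t' means literal backslashes, which a lacks.
raw_count = a.count(r'\t\.\t')

# str.count skips overlapping matches: the two periods share the middle tab.
literal_count = a.count('\t.\t')

# A plain regex consumes both tabs, so it has the same overlap problem.
plain_regex_count = len(re.findall(r'\t\.\t', a))

# Lookbehind/lookahead check for the tabs without consuming them.
lookaround_count = len(re.findall(r'(?<=\t)\.(?=\t)', a))

print(raw_count, literal_count, plain_regex_count, lookaround_count)  # 0 1 1 2
```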