I installed a plugin that will alphabetize blocks. I just need a way to select all the defs in a python file. So far I've got this regex.
This doesn't select the last line because there isn't any newline. I could enter a newline at the end, but I'd like to avoid that. In fact, ideally I'd like to avoid grabbing all the newlines above.
But I'm worried that if I don't grab the newline, then it won't match functions that have a blank line in the middle.
If there's a better way than what I'm trying--by selecting the blocks and using an alphabetizer plugin--then please suggest it. Otherwise, is there some way I can get the regex to match just the defs?
def.+(\n?\n.+)+
Will accomplish what you want. (Sublime seems to follow the usual "dot is not newline" convention)
Breaking down the components of the expression:
def.+ - match the def line, up to a newline
\n?\n.+ - match a newline, followed by some characters, optionally prepended by another newline (the prepend handles the case of an empty line in the middle of a def)
(...)+ - start a capture group, and match its pattern one or more times
(\n?\n.+)+ - combine the previous two pieces, so we match any sequence of non-empty lines with at most one empty line between any two non-empty lines (pedantically, any sequence of non-empty-line and empty-line-then-non-empty-line blocks)
The final + could be a * instead if it's permissible to match "empty" defs like
def empty():
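To see what the expression captures outside the editor, here is a minimal Python sketch. The sample source and the names SOURCE and DEF_BLOCK are made up for illustration. It assumes the defs are separated by two blank lines: a single blank line between two defs would be absorbed by the \n?\n part and merge them into one match.

```python
import re

# Hypothetical sample file: two defs separated by two blank lines,
# with no trailing newline after the last line.
SOURCE = (
    "def beta():\n"
    "    x = 1\n"
    "\n"
    "    return x\n"
    "\n"
    "\n"
    "def alpha():\n"
    "    return 2"
)

# Same expression as above, written with a non-capturing group.
DEF_BLOCK = re.compile(r"def.+(?:\n?\n.+)+")

# Each match is one whole def, including a blank line inside its body.
blocks = [m.group(0) for m in DEF_BLOCK.finditer(SOURCE)]

# Alphabetize the blocks, roughly what the plugin would do with the selections.
sorted_blocks = sorted(blocks)

for block in sorted_blocks:
    print(block)
    print()
```

The last def is matched even though the file does not end with a newline, and none of the separating blank lines are part of any match.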
Try this
^(\s*)(def.*(?:\n\1\s+.*|\n\s*)+$)
I am looking to find all matches in a string and print all substrings until I match these strings to a new line.
e.g.
"123ABC97edfABCaaabbdd1234ABC0009ui50ABC_1234"
should print:
ABC97edf
ABCaaabbdd1234
ABC0009ui50
ABC_1234
where "ABC" is the pattern match which is recurring.
Is there an efficient way I can do so using findall?
New to Python here, using python version 2.4.3
Edit just an F.Y.I:
What I am trying to do is basically I have a 250+Gb file which has control characters showing start and end of line but these Ctrl Characters (because of issues.. mostly network) are embedded within these lines i.e. in between the start/end indicating control characters.
With that, there is no specific distinction between the start/end control chars and the ones that come in between these messages.
So I am basically removing these control chars, and I wish to have a complete message per line pertaining to some specific regex.
The regex here is not necessarily ABC or in order for all of these messages.
I have tried using findall and am able to find all the matches; I just did not know how to get the strings following these until I find the next match. (The regex here can be either -ABC=35nga|DEF=64325:dfaf:1234| or **ABC=35632|DEF=61 and many different forms.)
And I have to break for each line, and for the ones which have multiple lines embedded within a line.
Using re.findall:
import re
s = "123ABC97edfABCaaabbdd1234ABC0009ui50ABC_1234"
re.findall("ABC.*?(?=ABC|$)", s)
which gives a list:
['ABC97edf', 'ABCaaabbdd1234', 'ABC0009ui50', 'ABC_1234']
And if you wanted to print the elements in this list, you could simply do:
for sub in re.findall("ABC.*?(?=ABC|$)", s):
    print(sub)
which would output:
ABC97edf
ABCaaabbdd1234
ABC0009ui50
ABC_1234
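Since the question says the marker is not always ABC, the same lazy-match-plus-lookahead idea can be wrapped in a small helper. This is only a sketch: split_on_marker is a made-up name, and the marker argument is assumed to be a regex fragment that you trust (it is interpolated as-is).

```python
import re

def split_on_marker(text, marker):
    """Return each chunk that starts at a marker and runs up to the next one.

    `marker` is a regex fragment (hypothetical parameter), e.g. "ABC" or "ABC|DEF".
    """
    # Marker, then as little as possible, until the next marker or the end.
    pattern = re.compile(r"(?:%s).*?(?=(?:%s)|$)" % (marker, marker))
    return [m.group(0) for m in pattern.finditer(text)]

s = "123ABC97edfABCaaabbdd1234ABC0009ui50ABC_1234"
for chunk in split_on_marker(s, "ABC"):
    print(chunk)
```

finditer yields matches one at a time, so on a large input you could write each chunk out as it is found instead of building the list first.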
I am using a small function to loop over files so that any hyphens - get replaced by en-dashes – (alt + 0150).
The function I use adds some regex flavor to a solution in a related problem (how to replace a character INSIDE the text content of many files automatically?)
def mychanger(fileName):
    with open(fileName,'r') as file:
        str = file.read()
        str = str.decode("utf-8")
        str = re.sub(r"[^{]{1,4}(-)","–", str).encode("utf-8")
    with open(fileName,'wb') as file:
        file.write(str)
I used the regular expression [^{]{1,4}(-) because the search is actually performed on latex regression tables and I only want to replace the hyphens that occur around numbers.
To be clear: I want to replace all hyphens EXCEPT in cases where we have genuine latex code such as \cmidrule(lr){2-4}.
In this case there is a { close (within 3-4 characters max) to the hyphen and to the left of it. Of course, this hyphen should not be changed into an en-dash otherwise the latex code will break.
I think the left part condition of the exclusion is important to write the correct exception in regex. Indeed, in a regression table you can have things like -0.062\sym{***} (that is, a { on the close right of the hyphen) and in that case I do want to replace the hyphen.
A typical line in my table is
variable & -2.061\sym{***}& 4.032\sym{**} & 1.236 \\
& (-2.32) & (-2.02) & (-0.14)
However, my regex does not appear to be correct. For instance, a (-1.2) will be replaced as –1.2, dropping the parenthesis.
What is the problem here?
Thanks!
I can offer the following two step replacement:
str = "-1 Hello \cmidrule(lr){2-4} range 1-5 other stuff a-5"
str = re.sub(r"((?:^|[^{])\d+)-(\d+[^}])","\\1$\\2", str).encode("utf-8")
str = re.sub(r"(^|[^0-9])-(\d+)","\\1$\\2", str).encode("utf-8")
print(str)
The first replacement targets all ranges which are not of the LaTeX form {1-9}, i.e. are not contained within curly braces. The second replacement targets all numbers preceded by a non-digit or by the start of the string.
re.sub replaces the entire match. In this case that includes the non-{ character preceding your -. You can wrap that bit in parentheses to create a \1 group and include that in your substitution (you also don't need parentheses around your –):
re.sub(r"([^{]{1,4})-",r"\1–", str)
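Note that [^{]{1,4} can match as little as one character, so the 2- inside {2-4} still matches and would still be converted. One possible way around that, sketched here for Python 3 and not taken from the answers above, is to match brace ranges explicitly and put them back unchanged. The names to_en_dash and PATTERN are made up, and the sketch assumes the only hyphens to protect look like {digits-digits} and the only hyphens to convert are directly followed by a digit.

```python
import re

EN_DASH = "\u2013"

# Alternative 1 captures a brace range such as {2-4} so it can be kept as-is.
# Alternative 2 is a hyphen immediately followed by a digit.
PATTERN = re.compile(r"(\{\d+-\d+\})|-(?=\d)")

def to_en_dash(text):
    def repl(match):
        # If the brace range matched, return it untouched.
        if match.group(1) is not None:
            return match.group(1)
        # Otherwise the match is a lone hyphen before a digit.
        return EN_DASH
    return PATTERN.sub(repl, text)

line = r"variable & -2.061\sym{***}& (-2.32) \cmidrule(lr){2-4}"
print(to_en_dash(line))
```

Because the hyphen alternative consumes only the hyphen itself, the surrounding characters such as the opening parenthesis in (-2.32) are left alone.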
Given the following code
print("aaa")
#print("bbb")
# print("ccc")
def doSomething():
    print("doSomething")
How can I use regular expression in Atom text editor to find all the print functions that are not commented out? I mean I only want to match the prints in print("aaa") and print("doSomething").
I've tried [^#]print, but this also matches the print in # print("ccc"), which is something that is not desired.
[^# ]print doesn't match any line here.
The reason I want to do this is that I want to disable the log messages inside a legacy project written by others.
Since you confirm my first suggestion (^(?![ \t]*#)[ \t]*print) worked for you (I deleted that first comment), I believe you just want to find the print on single lines.
The \s matches any whitespace, incl. newline symbols. If you need to just match tabs or spaces, use a [ \t] character class.
Use
^[ \t]*print
or (a bit safer in order not to find any printers):
^[ \t]*print\(
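The pattern can be checked outside the editor too. The sketch below uses Python's re module instead of Atom's search box, so the re.MULTILINE flag is needed to make ^ match at the start of every line; CODE and PATTERN are illustrative names.

```python
import re

# The snippet from the question, with the indentation of the function body restored.
CODE = (
    'print("aaa")\n'
    '#print("bbb")\n'
    '# print("ccc")\n'
    'def doSomething():\n'
    '    print("doSomething")\n'
)

# Only spaces or tabs may precede print( on the line, so a leading # rules it out.
PATTERN = re.compile(r"^[ \t]*print\(", re.MULTILINE)

# 1-based line numbers of the uncommented print calls.
line_numbers = [CODE.count("\n", 0, m.start()) + 1 for m in PATTERN.finditer(CODE)]
print(line_numbers)
```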
If you want to match only the print (and not all arguments), you can use:
^\s*(print)
See this live sample : http://refiddle.com/refiddles/57b56c8075622d22e8080000
I have some config file from which I need to extract only some values. For example, I have this:
PART
{
    title = Some Title
    description = Some description here. // this 2 params are needed
    tags = qwe rty // don't need this param
    ...
}
I need to extract value of certain param, for example description's value. How do I do this in Python3 with regex?
Here is the regex, assuming that the file text is in txt:
import re
m = re.search(r'^\s*description\s*=\s*(.*?)(?=(//)|$)', txt, re.M)
print(m.group(1))
Let me explain.
^ matches at beginning of line.
Then \s* means zero or more whitespace characters (spaces or tabs; note that \s also matches newlines).
description is your anchor for finding the value part.
After that we expect = sign with optional spaces before or after by denoting \s*=\s*.
Then we capture everything after the = and optional spaces, by denoting (.*?). This expression is captured by the parentheses. Inside the parentheses we say match anything (the dot) any number of times (the asterisk) in a non-greedy manner (the question mark), that is, stop as soon as the following expression can match.
The following expression is a lookahead expression, starting with (?= which matches the thing right after the (?=.
And that thing is actually two options, separated by the vertical bar |.
The first option, to the left of the bar, is // (in parentheses to make it a single unit for the vertical bar choice operation), that is, the start of the comment, which, I suppose, you don't want to capture.
The second option is $, meaning the end of the line, which will be reached if there is no comment // on the line.
So we look for everything we can after the first = sign, until either we meet a // pattern, or we meet the end of the line. This is the essence of the (?=(//)|$) part.
We also need the re.M flag, to tell the regex engine that we want ^ and $ match the start and end of lines, respectively. Without the flag they match the start and end of the entire string, which isn't what we want in this case.
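As written, the captured value keeps any spaces that sit between the text and the // comment. A small variation, sketched below, trims them inside the pattern and works for any key. The function name get_param and the sample TXT are made up for illustration, and [ \t] is used instead of \s so the match cannot spill over onto another line.

```python
import re

# Hypothetical sample mirroring the config from the question.
TXT = (
    "PART\n"
    "{\n"
    "    title = Some Title\n"
    "    description = Some description here. // this 2 params are needed\n"
    "    tags = qwe rty // don't need this param\n"
    "}\n"
)

def get_param(text, key):
    """Return the value of `key`, without a trailing // comment, or None."""
    pattern = r"^[ \t]*%s[ \t]*=[ \t]*(.*?)[ \t]*(?=//|$)" % re.escape(key)
    m = re.search(pattern, text, re.M)
    return m.group(1) if m else None

print(get_param(TXT, "description"))
print(get_param(TXT, "title"))
```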
The better approach would be to use an established configuration file system. Python has built-in support for INI-like files in the configparser module.
However, if you just desperately need to get the string of text in that file after the description, you could do this:
def get_value_for_key(key, file):
    with open(file) as f:
        lines = f.readlines()
        for line in lines:
            line = line.lstrip()
            if line.startswith(key + " ="):
                return line.split("=", 1)[1].lstrip()
You can use it with a call like: get_value_for_key("description", "myfile.txt"). The method will return None if nothing is found. It is assumed that your file will be formatted where there is a space and the equals sign after the key name, e.g. key = value.
This avoids regular expressions altogether and preserves everything on the right side of the value, including trailing whitespace, the newline, and any // comment on the same line. (If the trailing whitespace is not important to you, you can use strip instead of lstrip.)
Why avoid regular expressions? They're expensive and really not ideal for this scenario. Use simple string matching. This avoids importing a module and simplifies your code. But really I'd say to convert to a supported configuration file format.
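For comparison, here is roughly what the configparser route could look like. This is a sketch under a loud assumption: the file has been converted to INI form, with a [PART] section header in place of the PART { ... } block. The original format is not something configparser reads as-is. INI_TEXT is made-up sample data.

```python
import configparser

# Assumed INI version of the config from the question (not the original format).
INI_TEXT = """
[PART]
title = Some Title
description = Some description here. // this 2 params are needed
tags = qwe rty // don't need this param
"""

# Treat // as an inline comment marker so it is stripped from values.
parser = configparser.ConfigParser(inline_comment_prefixes=("//",))
parser.read_string(INI_TEXT)

description = parser["PART"]["description"]
print(description)
```

For a real file you would call parser.read with the file path instead of read_string.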
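This is a pretty simple regex: you just need a positive lookbehind. Optionally, to leave out a trailing comment, make the match lazy and follow it with a lookahead for // or the end of the line. (Just appending ?(//)? to the regex does not do it, because the lazy match is then allowed to be empty.)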
r"(?<=description = ).*"
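A quick check of the lookbehind in Python, applied to a single line so that $ means the end of that line. The second pattern is one assumed way to stop before a // comment and is not part of the answer above.

```python
import re

line = "    description = Some description here. // this 2 params are needed"

# Lookbehind only: everything after "description = ", comment included.
with_comment = re.search(r"(?<=description = ).*", line).group(0)

# Lazy match plus lookahead: stop before optional spaces and a // comment.
without_comment = re.search(r"(?<=description = ).*?(?=\s*//|$)", line).group(0)

print(with_comment)
print(without_comment)
```

The lookbehind requires exactly one space on each side of the equals sign, so it is stricter about formatting than a pattern built with \s*.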
How do I use the ^ and $ symbols to parse only /blog/articles in the following?
I've created ex3.txt that contains this:
/blog/article/1
/blog/articles
/blog
/admin/blog/articles
and the regex:
^/blog/articles$
doesn't appear to work, as in when I type it using 'regetron' (see learning regex the hard way) there is no output on the next line.
This is my exact procedure:
At command line in the correct directory, I type: regetron ex3.txt. ex3.txt contains one line with the following:
/blog/article/1 /blog/articles /blog /admin/blog/articles
although I have tried it with newlines between entries.
I type in ^/blog/article/[0-9]$ and nothing is returned on the next line.
I tried the first solution posted, ^\/blog\/articles$, and nothing is returned.
Thanks in advance SOers!
Change your regex to:
^\/blog\/articles$
You need to escape your slashes if your regex flavor treats / as a delimiter (Python's re module, for one, does not require it).
Also, ensure there are no trailing spaces on the end of each line in your ex3.txt file.
Based on your update, it sounds like ^ and $ might not be the right operators for you. Those match the beginning and end of a line respectively. If you have multiple strings that you want to match on the same line, then you'll need something more like this:
(?:^|\s)(\/blog\/articles)(?:$|\s)
What this does:
(?:^|\s) Matches, but does not capture (?:), a line start (^) OR (|) a whitespace (\s)
(\/blog\/articles) Matches and captures /blog/articles.
(?:$|\s) Matches, but does not capture (?:), a line end ($) OR (|) a whitespace (\s)
This will work for both cases, but be aware that it will match (but will not capture) up to a single whitespace before and after /blog/articles.
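Both situations can be reproduced with Python's re module. This is a sketch only: MULTI and SINGLE are made-up stand-ins for the two versions of ex3.txt, and the slashes are left unescaped because Python does not need them escaped.

```python
import re

# One entry per line, as in the first version of ex3.txt.
MULTI = "/blog/article/1\n/blog/articles\n/blog\n/admin/blog/articles\n"

# Everything on one line, as in the updated question.
SINGLE = "/blog/article/1 /blog/articles /blog /admin/blog/articles"

# With re.M, ^ and $ anchor at the start and end of every line.
anchored = re.findall(r"^/blog/articles$", MULTI, re.M)

# On a single line the anchors never fit around one entry...
anchored_single = re.findall(r"^/blog/articles$", SINGLE)

# ...so delimit the entry by whitespace or a line boundary instead.
delimited = re.findall(r"(?:^|\s)(/blog/articles)(?:$|\s)", SINGLE)

print(anchored, anchored_single, delimited)
```

Because the pattern has exactly one capturing group, findall returns only the captured path and not the surrounding whitespace.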