Why does pep8 suggest using two spaces in comments? - python

PEP-8 states:
You should use two spaces after a sentence-ending period.
In my usual refactoring, I am used to replacing such consecutive double spaces with a single one, thinking that this habit has come from the typewriter days (I have went through this Wikipedia page briefly).
Also most of the times I have seen mono-space fonts being used for the programming, so it's much clearer than the other cases which can sometimes need 2 spaces to easily identify sentences.
Is there any reason behind this being used in PEP-8?

Only those who authored the PEP can answer the "why" with any degree of certainty.
I've had a look at the standard library source code, and my conclusion is that this particular aspect of the style guide is not followed consistently: some standard modules follow it and some don't.
Until you pointed it out, I've never heard of the double space convention, and have never noticed anyone following it.

The answer is simple: readability :)

The reasoning behind the double space still exists in code.
For people who believe that two spaces improves readability, the reasoning has become less relevant with regard to WYSIWYG editors with kerning. However, code is written in monospaced fonts, which means that if you want extra space between sentences, you have to put it there.
That being said, I prefer single spaces :)

Related

(python - cpp) - How to split the c++ codes while writing a lexical analyzer in python?

I wrote a lexical analyzer for cpp codes in python, but the problem is when I use input.split(" ") it won't recognize codes like x=2 or function() as three different tokens unless I add an space between them manually, like: x = 2 .
also it fails to recognize the tokens at the beginning of each line.
(if i add spaces between each two tokens and also at the beginning of each line, my code works correctly)
I tried splitting the code first by lines then by space but it got complicated and still I wasn't able to solve the first problem.
Also I thought about splitting it by operators, yet I couldn't actually implement it. plus I need the operators to be recognized as tokens as well, so this might not be a good idea.
I would appreciate it if anyone could give any solution or suggestion, Thank You.
f=open("code.txt")
input=f.read()
input=input.split(" ")
f=open("code.txt")
input=f.read()
input1=input.split("\n")
for var in input1:
var=var.split(" ")
Obviously, if you try to have success splitting such an expression like x=2 and also x = 2... it seems pretty obvious that isn't going to work.
What you are looking is for a solution that works with both right?
Basic solution is to use an and operator, and use the conditions that you need to parse. Note that this solution isn't scalable, neither fits into the category of good practices, but it can help you to figure out better but harder solutions.
if input.split(' ') and input.split('='):
An intermediate solution would be to use regex.
Regex isn't an easy topic, but you can checkout online documentation, and then you have wonderful online tools to check your regex codes.
Regex 101
The last one, would be to convert your input data into an AST, which stands for abstract syntax tree. This is the technique employed by C++ compilers like, for example, Clang.
This last one is a real hard topic, so for figure out a basic lexer, probably will be really time consuming, but maybe it could fit your needs.
The usual approach is to scan the incoming text from left to right. At each character position, the lexical analyser selects the longest string which fits some pattern for a "lexeme", which is either a token or ignored input (whitespace and comments, for example). Then the scan continues at the next character.
Lexical patterns are often described using regular expressions, but the standard regular expression module re is not as much help as it could be for this procedure, because it does not have the facility of checking multiple regular expressions in parallel. (And neither does the possible future replacement, the regex module.) Or, more precisely, the library can check multiple expressions in parallel (using alternation syntax, (...|...|...)), but it lacks an interface which can report which of the alternatives was matched. [Note 1]. So it would be necessary to try every possible pattern one at a time and select whichever one turns out to have the longest match.
Note that the matches are always anchored at the current input point; the lexical analyser does not search for a matching pattern. Every input character becomes part of some lexeme, even if that lexeme is ignored, and lexemes do not overlap.
You can write such an analyser by hand for a simple language, but C++ is hardly a simple language. Hand-built lexical analysers most certainly exist, but all the ones I've seen are thousands of lines of not very readable code. So it's usually easier to build an analyzer automatically using software designed for that purpose. These have been around for a long time -- Lex was written almost 50 years ago, for example -- and if you are planning on writing more than one lexical analyser, you would be well advised to investigate some of the available tools.
Notes
The PCRE2 and Oniguruma regex libraries provide a "callout" feature which I believe could be used for this purpose. I haven't actually seen it used in lexical analysis, but it's a fairly recent addition, particularly for Oniguruma, and as far as I can see, the Python bindings for those two libraries do not wrap the callout feature. (Although, as usual with Python bindings to C libraries, documentation is almost non-existent, so I can't say for certain.)

Regular Expression in Python matching string of alphanumerics [duplicate]

I've seen regex patterns that use explicitly numbered repetition instead of ?, * and +, i.e.:
Explicit Shorthand
(something){0,1} (something)?
(something){1} (something)
(something){0,} (something)*
(something){1,} (something)+
The questions are:
Are these two forms identical? What if you add possessive/reluctant modifiers?
If they are identical, which one is more idiomatic? More readable? Simply "better"?
To my knowledge they are identical. I think there maybe a few engines out there that don't support the numbered syntax but I'm not sure which. I vaguely recall a question on SO a few days ago where explicit notation wouldn't work in Notepad++.
The only time I would use explicitly numbered repetition is when the repetition is greater than 1:
Exactly two: {2}
Two or more: {2,}
Two to four: {2,4}
I tend to prefer these especially when the repeated pattern is more than a few characters. If you have to match 3 numbers, some people like to write: \d\d\d but I would rather write \d{3} since it emphasizes the number of repetitions involved. Furthermore, down the road if that number ever needs to change, I only need to change {3} to {n} and not re-parse the regex in my head or worry about messing it up; it requires less mental effort.
If that criteria isn't met, I prefer the shorthand. Using the "explicit" notation quickly clutters up the pattern and makes it hard to read. I've worked on a project where some developers didn't know regex too well (it's not exactly everyone's favorite topic) and I saw a lot of {1} and {0,1} occurrences. A few people would ask me to code review their pattern and that's when I would suggest changing those occurrences to shorthand notation and save space and, IMO, improve readability.
I can see how, if you have a regex that does a lot of bounded repetition, you might want to use the {n,m} form consistently for readability's sake. For example:
/^
abc{2,5}
xyz{0,1}
foo{3,12}
bar{1,}
$/x
But I can't recall ever seeing such a case in real life. When I see {0,1}, {0,} or {1,} being used in a question, it's virtually always being done out of ignorance. And in the process of answering such a question, we should also suggest that they use the ?, * or + instead.
And of course, {1} is pure clutter. Some people seem to have a vague notion that it means "one and only one"--after all, it must mean something, right? Why would such a pathologically terse language support a construct that takes up a whole three characters and does nothing at all? Its only legitimate use that I know of is to isolate a backreference that's followed by a literal digit (e.g. \1{1}0), but there are other ways to do that.
They're all identical unless you're using an exceptional regex engine. However, not all regex engines support numbered repetition, ? or +.
If all of them are available, I'd use characters rather than numbers, simply because it's more intuitive for me.
They're equivalent (and you'll find out if they're available by testing your context.)
The problem I'd anticipate is when you may not be the only person ever needing to work with your code.
Regexes are difficult enough for most people. Anytime someone uses an unusual syntax, the question
arises: "Why didn't they do it the standard way? What were they thinking that I'm missing?"

Why python code shouldn't exceed 80 chars? [duplicate]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Why in this millennium should Python PEP-8 specify a maximum line length of 79 characters?
Pretty much every code editor under the sun can handle longer lines. What to do with wrapping should be the choice of the content consumer, not the responsibility of the content creator.
Are there any (legitimately) good reasons for adhering to 79 characters in this age?
Much of the value of PEP-8 is to stop people arguing about inconsequential formatting rules, and get on with writing good, consistently formatted code. Sure, no one really thinks that 79 is optimal, but there's no obvious gain in changing it to 99 or 119 or whatever your preferred line length is. I think the choices are these: follow the rule and find a worthwhile cause to battle for, or provide some data that demonstrates how readability and productivity vary with line length. The latter would be extremely interesting, and would have a good chance of changing people's minds I think.
Keeping your code human readable not just machine readable. A lot of devices still can only show 80 characters at a time. Also it makes it easier for people with larger screens to multi-task by being able to set up multiple windows to be side by side.
Readability is also one of the reasons for enforced line indentation.
I am a programmer who has to deal with a lot of code on a daily basis. Open source and what has been developed in house.
As a programmer, I find it useful to have many source files open at once, and often organise my desktop on my (widescreen) monitor so that two source files are side by side. I might be programming in both, or just reading one and programming in the other.
I find it dissatisfying and frustrating when one of those source files is >120 characters in width, because it means I can't comfortably fit a line of code on a line of screen. It upsets formatting to line wrap.
I say '120' because that's the level to which I would get annoyed at code being wider than. After that many characters, you should be splitting across lines for readability, let alone coding standards.
I write code with 80 columns in mind. This is just so that when I do leak over that boundary, it's not such a bad thing.
I believe those who study typography would tell you that 66 characters per a line is supposed to be the most readable width for length. Even so, if you need to debug a machine remotely over an ssh session, most terminals default to 80 characters, 79 just fits, trying to work with anything wider becomes a real pain in such a case. You would also be suprised by the number of developers using vim + screen as a day to day environment.
Printing a monospaced font at default sizes is (on A4 paper) 80 columns by 66 lines.
Here's why I like the 80-character with: at work I use Vim and work on two files at a time on a monitor running at, I think, 1680x1040 (I can never remember). If the lines are any longer, I have trouble reading the files, even when using word wrap. Needless to say, I hate dealing with other people's code as they love long lines.
Since whitespace has semantic meaning in Python, some methods of word wrapping could produce incorrect or ambiguous results, so there needs to be some limit to avoid those situations. An 80 character line length has been standard since we were using teletypes, so 79 characters seems like a pretty safe choice.
I agree with Justin. To elaborate, overly long lines of code are harder to read by humans and some people might have console widths that only accommodate 80 characters per line.
The style recommendation is there to ensure that the code you write can be read by as many people as possible on as many platforms as possible and as comfortably as possible.
because if you push it beyond the 80th column it means that either you are writing a very long and complex line of code that does too much (and so you should refactor), or that you indented too much (and so you should refactor).

Selecting column headers in read_table [duplicate]

I've seen regex patterns that use explicitly numbered repetition instead of ?, * and +, i.e.:
Explicit Shorthand
(something){0,1} (something)?
(something){1} (something)
(something){0,} (something)*
(something){1,} (something)+
The questions are:
Are these two forms identical? What if you add possessive/reluctant modifiers?
If they are identical, which one is more idiomatic? More readable? Simply "better"?
To my knowledge they are identical. I think there maybe a few engines out there that don't support the numbered syntax but I'm not sure which. I vaguely recall a question on SO a few days ago where explicit notation wouldn't work in Notepad++.
The only time I would use explicitly numbered repetition is when the repetition is greater than 1:
Exactly two: {2}
Two or more: {2,}
Two to four: {2,4}
I tend to prefer these especially when the repeated pattern is more than a few characters. If you have to match 3 numbers, some people like to write: \d\d\d but I would rather write \d{3} since it emphasizes the number of repetitions involved. Furthermore, down the road if that number ever needs to change, I only need to change {3} to {n} and not re-parse the regex in my head or worry about messing it up; it requires less mental effort.
If that criteria isn't met, I prefer the shorthand. Using the "explicit" notation quickly clutters up the pattern and makes it hard to read. I've worked on a project where some developers didn't know regex too well (it's not exactly everyone's favorite topic) and I saw a lot of {1} and {0,1} occurrences. A few people would ask me to code review their pattern and that's when I would suggest changing those occurrences to shorthand notation and save space and, IMO, improve readability.
I can see how, if you have a regex that does a lot of bounded repetition, you might want to use the {n,m} form consistently for readability's sake. For example:
/^
abc{2,5}
xyz{0,1}
foo{3,12}
bar{1,}
$/x
But I can't recall ever seeing such a case in real life. When I see {0,1}, {0,} or {1,} being used in a question, it's virtually always being done out of ignorance. And in the process of answering such a question, we should also suggest that they use the ?, * or + instead.
And of course, {1} is pure clutter. Some people seem to have a vague notion that it means "one and only one"--after all, it must mean something, right? Why would such a pathologically terse language support a construct that takes up a whole three characters and does nothing at all? Its only legitimate use that I know of is to isolate a backreference that's followed by a literal digit (e.g. \1{1}0), but there are other ways to do that.
They're all identical unless you're using an exceptional regex engine. However, not all regex engines support numbered repetition, ? or +.
If all of them are available, I'd use characters rather than numbers, simply because it's more intuitive for me.
They're equivalent (and you'll find out if they're available by testing your context.)
The problem I'd anticipate is when you may not be the only person ever needing to work with your code.
Regexes are difficult enough for most people. Anytime someone uses an unusual syntax, the question
arises: "Why didn't they do it the standard way? What were they thinking that I'm missing?"

Parsing a parenthesis

I'm trying to make a calculator in Python 3 (just to learn). I want to be able to evaluate (just as an example) "5 * ( 2 + 1 )^2" from an input(). I would like to be able to detect if parenthesis are closed or contain another set of parenthesis. I also need to be able to isolate the information within so I can evaluate it in the proper order.
I realize that this could be a significant chunk of code, so if you could point me in the right direction I would be very grateful. I'm looking for links to documentation, function names, and any hints you could provide to help me.
A stack based calculator is what you are looking for !
http://en.wikipedia.org/wiki/Reverse_Polish_notation
This is a classic stack data structure practice problem. There are two approaches you can use, one being converting from infix to post/prefix notation which is considerably easier to process but still require the additional step of converting, or you can go and directly evaluate the expression.
Here is a good starting point on the subject, a basic implementation of a stack and some more in-depth information about your subject. Starting from there, you should easily find your way, otherwise give me a comment and I'll try to help you out.

Categories

Resources