Why in this millennium should Python PEP-8 specify a maximum line length of 79 characters?
Pretty much every code editor under the sun can handle longer lines. What to do with wrapping should be the choice of the content consumer, not the responsibility of the content creator.
Are there any (legitimately) good reasons for adhering to 79 characters in this age?
Much of the value of PEP-8 is to stop people arguing about inconsequential formatting rules, and get on with writing good, consistently formatted code. Sure, no one really thinks that 79 is optimal, but there's no obvious gain in changing it to 99 or 119 or whatever your preferred line length is. I think the choices are these: follow the rule and find a worthwhile cause to battle for, or provide some data that demonstrates how readability and productivity vary with line length. The latter would be extremely interesting, and would have a good chance of changing people's minds I think.
It keeps your code human-readable, not just machine-readable. A lot of devices can still only show 80 characters at a time. It also makes it easier for people with larger screens to multi-task by setting up multiple windows side by side.
Readability is also one of the reasons for enforced line indentation.
I am a programmer who has to deal with a lot of code on a daily basis. Open source and what has been developed in house.
As a programmer, I find it useful to have many source files open at once, and often organise my desktop on my (widescreen) monitor so that two source files are side by side. I might be programming in both, or just reading one and programming in the other.
I find it dissatisfying and frustrating when one of those source files is >120 characters in width, because it means I can't comfortably fit a line of code on a line of screen. It upsets formatting to line wrap.
I say '120' because that's the point beyond which code width starts to annoy me. After that many characters, you should be splitting across lines for readability, let alone coding standards.
I write code with 80 columns in mind. This is just so that when I do leak over that boundary, it's not such a bad thing.
I believe those who study typography would tell you that 66 characters per line is supposed to be the most readable line length. Even so, if you need to debug a machine remotely over an SSH session, most terminals default to 80 characters, and 79 just fits; trying to work with anything wider becomes a real pain in such a case. You would also be surprised by the number of developers using vim + screen as their day-to-day environment.
Printing a monospaced font at default sizes is (on A4 paper) 80 columns by 66 lines.
Here's why I like the 80-character width: at work I use Vim and work on two files at a time on a monitor running at, I think, 1680x1040 (I can never remember). If the lines are any longer, I have trouble reading the files, even when using word wrap. Needless to say, I hate dealing with other people's code, as they love long lines.
Since whitespace has semantic meaning in Python, some methods of word wrapping could produce incorrect or ambiguous results, so there needs to be some limit to avoid those situations. An 80 character line length has been standard since we were using teletypes, so 79 characters seems like a pretty safe choice.
I agree with Justin. To elaborate, overly long lines of code are harder to read by humans and some people might have console widths that only accommodate 80 characters per line.
The style recommendation is there to ensure that the code you write can be read by as many people as possible on as many platforms as possible and as comfortably as possible.
Because if you push it beyond the 80th column, it means that either you are writing a very long and complex line of code that does too much (and so you should refactor), or that you indented too much (and so you should refactor).
I've started following PEP-8 strictly, because I thought I should at least try it before just picking and choosing the things I like.
However, there seems to be a conflict. They strongly recommend limiting each line to 79 characters, yet they strongly recommend that method and variable names use_underscores_between_words.
On a typical line, where you're nested in a
class ->
method ->
loop ->
conditional ->
my_var_name += do_some_function(some_parameter, another_parameter)
you only have 79 - 16 = 63 characters to work with, and you're wasting 6 on just underscores. So the line above is already over the limit, even though it's actually a pretty short statement.
It seems productivity would suffer if I have to count characters so often, or split a rudimentary line like this onto several lines.
I understand that now it says "if your team agrees, use 99", but it seems this should be the standard, or rather that camelCaseVars should be the standard since we like short lines so much.
My issue with coding standards-compliant Python is that I can't seem to write any code without either using cryptic names, or violating the line length or naming convention. I could post my code here to show you my specific issues, but my example above represents the issue I'm having with my code.
Which ideal is less important? Clear names, short lines, or using_underscores?
UPDATE: while no one actually suggested this, I'm getting the feeling that using less descriptive function and variable names is what is tacitly being asked of me. I know people would say "of course not, just wrap your lines", but in practice, it seems to be a mix of "use very short names" and "wrap lines", while sticking to 80.
I think that's what people have done, but I think on the business project level, where productivity is king, teams have just thrown out that rule and jumped to 120. For now, I think I'll just stick to 79 with lots of (imho) ugly line wraps, and feel comforted by the thought of me being able to view 2 files side by side on a small monitor.
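For what it's worth, here is a sketch of what those wraps end up looking like at the nesting depth described above (my own illustration; all the names here are hypothetical):

def do_some_function(some_parameter, another_parameter):
    # Hypothetical helper, only here so the sketch runs on its own.
    return some_parameter + another_parameter

class MyClass:
    def my_method(self, some_parameter, another_parameter):
        my_var_name = 0
        for item in range(10):
            if item % 2:
                # The long line from above, wrapped inside the call's
                # parentheses so no backslash continuation is needed.
                my_var_name += do_some_function(
                    some_parameter, another_parameter
                )
        return my_var_name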
Python Zen says:
Flat is better than nested.
Therefore, consider decreasing the nesting of your code by decomposition: make the code less nested by extracting some parts into separate methods or functions. Then you'll always have enough horizontal space in your editor.
Update
For nested loops with conditions, you can also decrease the nesting level by decomposition.
For example you can change this code:
class MyClass(BaseClass):
    def my_method(self):
        for item in self.my_collection:
            if item.numeric_property % 2:
                self.my_property = item.other_property + item.age * self.my_coefficient
        self.do_other_stuff()
to this one:
class MyClass(BaseClass):
    def my_method(self):
        """People see what happens because of the clear names."""
        self.process_my_collection()
        self.do_other_stuff()

    def process_my_collection(self):
        """People see what happens because of the clear names.

        They probably don't even need to read your math in
        process_my_collection_item(item) at all.
        And it makes the code flatter as well.
        """
        for item in self.my_collection:
            self.process_my_collection_item(item)

    def process_my_collection_item(self, item):
        """Most people don't need to read these calculations every time,
        since they just know they work and are tested,
        but they'd like to work with the methods above frequently.
        """
        if not item.numeric_property % 2:
            return
        self.my_property = item.other_property + item.age * self.my_coefficient
As you can see, I divided one method into a few simple operations and made it less nested.
Your goal should be to write code that is easy to understand.
Generally speaking, adhering to PEP8 gets you a step closer to that goal. If the nature of your code and/or your team is such that camelcase works better, by all means use camelcase.
That being said, if you think it is important to save six characters on what you perceive is a typical line, maybe that is telling you that your typical lines are nested too deeply, and there's a better solution than changing naming conventions.
Use underscores, it's what everyone expects.
However, the 79-character limit is one of the more easily set-aside recommendations. My team uses pep8 with 119-character lines, but the vast majority of our lines are under 80 characters anyway.
PEP-8 states:
You should use two spaces after a sentence-ending period.
In my usual refactoring, I am used to replacing such consecutive double spaces with a single one, thinking that this habit comes from the typewriter days (I have gone through this Wikipedia page briefly).
Also, most of the time I have seen monospace fonts being used for programming, where sentence boundaries are already much clearer than in other cases that can sometimes need two spaces to identify sentences easily.
Is there any reason behind this being used in PEP-8?
Only those who authored the PEP can answer the "why" with any degree of certainty.
I've had a look at the standard library source code, and my conclusion is that this particular aspect of the style guide is not followed consistently: some standard modules follow it and some don't.
Until you pointed it out, I've never heard of the double space convention, and have never noticed anyone following it.
The answer is simple: readability :)
The reasoning behind the double space still exists in code.
For people who believe that two spaces improve readability, the reasoning has become less relevant in WYSIWYG editors with kerning. However, code is written in monospaced fonts, which means that if you want extra space between sentences, you have to put it there yourself.
That being said, I prefer single spaces :)
import string,random,platform,os,sys

def rPass():
    sent = os.urandom(random.randrange(900,7899))
    print sent,"\n"
    intsent=0
    for i in sent:
        intsent += ord(i)
    print intsent
    intset=0

rPass()
I need help figuring out the total possible outputs for the bytecode section of this algorithm. Don't worry about the for loop and the ord stuff; that's for down the line. (Newbie crypto guy out.)
I won't worry about the loop and the ord stuff, so let's just throw that out and look at the rest.
Also, I don't understand "I need help figuring out total possible outputs for the unicode section of this algorithm", because there is no Unicode section of the algorithm, or in fact any Unicode anything anywhere in your code. But I can help you figure out the total possible outputs of the whole thing. Which we'll do by simplifying it step by step.
First:
li=[]
for a in range(900,7899):
    li.append(a)
This is exactly equivalent to:
li = range(900, 7899)
Meanwhile:
li[random.randint(0,7000)]
Because li happens to be exactly 6999 elements long, this is exactly the same as random.choice(li).
And, putting the last two together, this means it's equivalent to:
random.choice(range(900,7899))
… which is equivalent to:
random.randrange(900,7899)
But wait, what about that random.shuffle(li, random.random)? Well (ignoring the fact that random.random is already the default for the second parameter), the choice is already random-but-not-cryptographically-so, and adding another shuffle doesn't change that. If someone is trying to mathematically predict your RNG, adding one more trivial shuffle with the same RNG will not make it any harder to predict (while adding a whole lot more work based on the results may make a timing attack easier).
In fact, even if you used a subset of li instead of the whole thing, there's no way that could make your code more unpredictable. You'd have a smaller range of values to brute-force through, for no benefit.
So, your whole thing reduces to this:
sent = os.urandom(random.randrange(900, 7899))
The possible output is: any byte string between 900 and 7898 bytes long (randrange excludes its upper bound).
The length is random, and roughly evenly distributed, but it's not random in a cryptographically-unpredictable sense. Fortunately, that's not likely to matter, because presumably the attacker can see how many bytes he's dealing with instead of having to predict it.
The content is random, both evenly distributed and cryptographically unpredictable, at least to the extent that your system's urandom is.
And that's all there is to say about it.
However, the fact that you've made it much harder to read, write, maintain, and think through gives you a major disadvantage, with no compensating disadvantage to your attacker.
So, just use the one-liner.
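For concreteness, a minimal sketch of what that reduced version looks like, ported to Python 3 (the original is Python 2; the function name is kept from the question):

import os
import random

def rPass():
    # Everything above collapses to this: a random-length chunk of urandom output.
    sent = os.urandom(random.randrange(900, 7899))  # 900-7898 random bytes
    print(sent)
    return sent

rPass()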
I think in your followup questions, you're asking how many possible values there are for 900-7898 bytes of random data.
Well, how many values are there for 900 bytes? 256**900. How many for 901? 256**901. So, the answer is:
sum(256**i for i in range(900, 7899))
… which is about 2**63184, or 10**19020.
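If you want to sanity-check those figures, here's a quick sketch (exact big-integer arithmetic, so it takes a moment but finishes fine):

total = sum(256**i for i in range(900, 7899))
print(total.bit_length())  # 63185 bits, i.e. roughly 2**63184
print(len(str(total)))     # 19021 decimal digits, i.e. roughly 10**19020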
So, 63184 bits of security sounds pretty impressive, right? Probably not. If your algorithm has no flaws in it, 100 bits is more than you could ever need. If your algorithm is flawed (and of course it is, because they all are), blindly throwing thousands more bits at it won't help.
Also, remember, the whole point of crypto is that you want cracking to be 2**N slower than legitimate decryption, for some large N. So, making legitimate decryption much slower makes your scheme much worse. This is why every real-life working crypto scheme uses a few hundred bits of key, salt, etc. (Yes, public-key encryption uses a few thousand bits for its keys, but that's because its keys aren't randomly distributed. And generally, all you do with those keys it to encrypt a randomly-generated session/document key of a few hundred bits.)
One last thing: I know you said to ignore the ord, but…
First you can write that whole part as intsent=sum(bytearray(sent)).
But, more importantly, if all you're doing with this buffer is summing it up, you're using a lot of entropy to generate a single number with a lot less entropy. (This should be obvious once you think about it. If you have two separate bytes, there are 65536 possibilities; if you add them together, there are only 511.)
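A tiny brute-force sketch of that collapse:

pairs = {(a, b) for a in range(256) for b in range(256)}
sums = {a + b for a in range(256) for b in range(256)}
print(len(pairs), len(sums))  # 65536 distinct byte pairs, but only 511 distinct sums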
Also, by generating a few thousand one-byte random numbers and adding them up, you get basically a very close approximation of a normal or Gaussian distribution. (If you're a D&D player, think of how 3D6 gives 10 and 11 more often than 3 and 18… and how that's more true for 3D6 than for 2D6… and then consider 6000D6.) But then, by making the number of bytes range from 900 to 7898, you're flattening it back toward a uniform distribution from 900*127.5 to 7898*127.5. At any rate, if you can describe the distribution you're trying to get, you can probably generate that directly, without wasting all this urandom entropy and computation.
It's worth noting that there are very few cryptographic applications that can possibly make use of this much entropy. Even things like generating SSL certs use on the order of 128-1024 bits, not 64K bits.
You say:
trying to kill the password.
If you're trying to encrypt a password so it can be, say, stored on disk or sent over the network, this is almost always the wrong approach. You want to use some kind of zero-knowledge proof: store hashes of the password, or use challenge-response instead of sending data, etc. If you want to build a "keep me logged in" feature, do that by actually keeping the user logged in (create and store a session auth token, rather than storing the password). See the Wikipedia article on passwords for the basics.
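As a rough sketch of the "store hashes" idea (the function names and parameters here are illustrative, not tuned recommendations), the standard library already ships a salted, slow password hash:

import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=200000):
    # Store the salt and digest instead of the password itself.
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return salt, digest

def check_password(password, salt, expected, iterations=200000):
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return hmac.compare_digest(candidate, expected)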
Occasionally, you do need to encrypt and store passwords. For example, maybe you're building a "password locker" program for a user to store a bunch of passwords in. Or a client to a badly-designed server (or a protocol designed in the 70s). Or whatever. If you need to do this, you want one layer of encryption with a relatively small key (remember that a typical password is itself only about 256 bits long, and has less than 64 bits of actual information, so there is absolutely no benefit in using a key thousands of times as long as the password). The only way to make it more secure is to use a better algorithm, but really, the encryption algorithm will almost never be the best attack surface (unless you've tried to design one yourself); put your effort into the weakest areas of the infrastructure, not the strongest.
You ask:
Also is urandom's output codependent on the assembler it's working with?
Well… there is no assembler it's working with, and I can't think of anything else you could be referring to that makes any sense.
All that urandom is dependent on is your OS's entropy pool and PRNG. As the docs say, urandom just reads /dev/urandom (Unix) or calls CryptGenRandom (Windows).
If you want to know exactly how that works on your system, man urandom or look up CryptGenRandom in MSDN. But all of the major OSes can generate enough entropy and mix it well enough that you basically don't have to worry about this at all. Under the covers, they all effectively have some pool of entropy, some cryptographically-secure PRNG to "stretch" that pool, and some kernel device (Linux, Windows) or user-space daemon (OS X) that gathers whatever entropy it can from unpredictable things like user actions to mix into the pool.
So, what is that dependent on? Assuming you don't have any apps wasting huge amounts of entropy, and your machine hasn't been compromised, and your OS doesn't have a major security flaw… it's basically not dependent on anything. Or, to put it another way, it's dependent on those three assumptions.
To quote the linux man page, /dev/urandom is good enough for "everything except long-lived GPG/SSL/SSH keys". (And on many systems, if someone tries to run a program that, like your code, reads thousands of bytes of urandom, or tries to kill the entropy-seeding daemon, or whatever, it'll be logged, and hopefully the user/sysadmin can deal with it.)
hmmmm python goes through an interpreter of its own so i'm not sure how that plays in
It doesn't. Obviously, calling urandom(8) does a bunch of extra stuff before and after the syscall that you wouldn't do in, say, a C program… but the actual syscall to read 8 bytes from /dev/urandom is identical. So the urandom device can't even tell the difference between the two.
but I'm simply asking if urandom will produce different results on a different architecture.
Well, yes, obviously. For example, Linux and OS X use entirely different CSPRNGs and different ways of accumulating entropy. But the whole point is that it's supposed to be different, even on an identical machine, or at a different time on the same machine. As long as it produces "good enough" results on every platform, that's all that matters.
For instance would a processor\assembler\interpreter cause a fingerprint specific to said architecture, which is within reason stochastically predictable?
As mentioned above, the interpreter ultimately makes the same syscall as compiled code would.
As for an assembler… there probably isn't any assembler involved anywhere. The relevant parts of the Python interpreter, the random device, the entropy-gathering service or driver, etc. are most likely written in C. And even if they were hand-coded in assembly, the whole point of coding in assembly is that you pretty much directly control the machine code that gets generated, so different assemblers wouldn't make any difference.
The processor might leave a "fingerprint" in some sense. For example, I'll bet that if you knew the RNG algorithm, and controlled its state directly, you could write code that could distinguish an x86 from an x86_64, or maybe even one generation of i7 from another, based on timing. But I'm not sure what good that would do you. The algorithm will still generate the same results from the same state. And the actual attacks used against RNGs are about attacking the algorithm, the entropy accumulator, and/or the entropy estimator.
At any rate, I'm willing to bet large sums of money that you're safer relying on urandom than on anything you come up with yourself. If you need something better (and you don't), implement—or, better, find a well-tested implementation of—Fortuna or BBS, or buy a hardware entropy-generating device.
In "The Zen of Python", by Tim Peters, the sentence "Complex is better than complicated" confused me. Can anyone give a more detailed explanation or an example?
Although complex and complicated sound alike, they do not mean the same thing in this context.
The Zen therefore says: It is okay to build very complex applications, as long as the need for it is reasonable.
To give an example:
counter = 0
while counter < 5:
    print counter
    counter += 1
The code is very easy to understand. It is not complex. However, it is complicated. You do not need to manually perform most of the steps above.
for i in xrange(5):
    print i
This code is more complex than the above example. But: knowing the documentation of xrange, you can understand it at a single glance. Many steps are hidden behind an easy-to-use interface.
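For reference, the equivalent in Python 3, where range plays the role of xrange:

for i in range(5):
    print(i)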
As processes grow bigger, the gap between complicated and complex gets wider and wider.
A general rule of thumb is to follow the other principles of the Zen of Python:
If it is hard to explain, it is not a good idea.
If it's easy to explain, it might be a good idea.
Complex: Does a lot. Usually unavoidable.
Complicated: Difficult to understand.
I like this quote (source):
A complex person is like an iPod. That is to say that they are consistent, straightforward and ‘user friendly’ while also being rather sophisticated. Unlike the complicated person, interacting with a complex person does not require special knowledge of their complicated ways, because their ways are not complicated. When mistakes are made, they tend to be very forgiving because they understand that people are imperfect. In short, they are mature, sensible human beings.
and this one (source):
An Airbus A380 is complicated. A jellyfish is complex. The Paris Metro network is complicated. How people use it is complex. Your skeleton is complicated. You are complex. A building is complicated. A city is complex.
Some more articles on this:
Simple vs. Complicated vs. Complex vs. Chaotic
More on Complex versus Complicated
I haven't read this book.
Complex, in my opinion, is a solution that might not be easy to understand but is written in simple, logical code.
Complicated is a solution that might be simple (or complex) but is written in code that is not easy to understand, because there are no patterns or logic in it and no proper metaphors or naming.
Complicated systems are highly coupled and therefore fragile.
Complex systems are made of simple parts operating together to create complex emergent behavior. While the emergent behaviors may still be a challenge, the individual parts can be isolated, studied, and debugged. Individual parts can be removed and reused.
I comment more on this topic and provide examples on my blog
For "complicated", the difficult thing is on the surface. (You description is complicated.)
For "complex", the difficult thing is under the table. (A car is complex.)
Just as shown by EinLama's examples.
According to Pro Python third edition:
For the sake of this guideline, most situations tend to take the following view of the two terms:
• Complex: Made up of many interconnected parts
• Complicated: So complex as to be difficult to understand
So in the face of an interface that requires a large number of things to keep track of, it’s even more important to retain as much simplicity as possible. This can take the form of consolidating methods onto a smaller number of objects, perhaps grouping objects into more logical arrangements, or even simply making sure to use names that make sense without having to dig into the code to understand them.
So, as the book says, you need to make your code and files more organized, and use the most readable names you can to define variables and functions.
Actually, the accepted answer above is more likely describing the preceding rule:
Simple is better than complex.
Here is the snippet example of "Simple is better than complex." from the book:
if value is not None and value != '':
if value:
Obviously, the second line is simpler than the first one and easier to work with, much like the example code in the accepted answer.
Complicated: Need a lot of brain juice (your internal CPU) to solve. But once you solved it, you know it is right. Solving a math problem is complicated. Once done, easy for you to do it a second time. But difficult again for your friend.
Complex: Need a lot of intuition to solve (your accumulated experience). And once you choose a way, you cannot be sure this was the best one. Human relations are complex. Doing it again a second time will still be challenging for you. But someone following your path (reading your code...) can follow you easily.
Algorithms aim to solve complicated problems.
Machine learning aims to find answers to complex problems.
Algorithms are predictable. Deep Learning raises explainability questions on why the computer has decided to select that specific answer.
So now it is your time to answer: was your question complex or complicated?
I'm trying to come up with a good coding problem to ask interview candidates to solve with Python.
They'll have an hour to work on the problem, with an IDE and access to documentation (we don't care what people have memorized).
I'm not looking for a tough algorithmic problem - there are other sections of the interview where we do that kind of thing. The point of this section is to sit and watch them actually write code. So it should be something that makes them use just the data structures which are the everyday tools of the application developer - lists, hashtables (dictionaries in Python), etc, to solve a quasi-realistic task. They shouldn't be blocked completely if they can't think of something really clever.
We have a problem which we use for Java coding tests, which involves reading a file and doing a little processing on the contents. It works well with candidates who are familiar with Java (or even C++). But we're running into a number of candidates who just don't know Java or C++ or C# or anything like that, but do know Python or Ruby. Which shouldn't exclude them, but leaves us with a dilemma: On the one hand, we don't learn much from watching someone struggle with the basics of a totally unfamiliar language. On the other hand, the problem we use for Java turns out to be pretty trivial in Python (or Ruby, etc) - anyone halfway competent can do it in 15 minutes. So, I'm trying to come up with something better.
Surprisingly, Google doesn't show me anyone doing something like this, unless I'm just too dumb to enter the obvious search term. The best idea I've come up with involves scheduling workers to time slots, but it's maybe a little too open-ended. Have you run into a good example? Or a bad one? Or do you just have an idea?
I've asked candidates to write code to implement bowling scoring before, which is readily comprehensible but contains enough wrinkles that most people have to iterate their approach a couple times to cover all the edge cases.
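For reference, one shape a solution often ends up taking (my sketch; it assumes a complete, valid game as input):

def score_game(rolls):
    """Score a complete ten-pin bowling game from the list of pins per roll."""
    total = 0
    i = 0
    for _frame in range(10):
        if rolls[i] == 10:                    # strike: 10 plus the next two rolls
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:   # spare: 10 plus the next roll
            total += 10 + rolls[i + 2]
            i += 2
        else:                                 # open frame
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total

assert score_game([10] * 12) == 300            # a perfect game
assert score_game([0] * 20) == 0               # all gutter balls
assert score_game([5, 5, 3] + [0] * 17) == 16  # one spare followed by a 3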
A lot of the problems at http://www.streamtech.nl/site/problem+set, which are taken from ACM competitions, are also suitable for your use. I used them to familiarize myself with Python syntax and language features. A lot of them amount to straightforward applications of standard data structures; some are more focused on algorithmic issues. If you sort through them, I'm sure you'll find several that fit your needs.
I can recommend to you Checkio.org
You can always just give them a few more questions on top of the Java one, like ask them to do the Java task, then ask them to define a class, then ask them to do FizzBuzz. That should be about as rigorous as your Java task.
Don't be afraid to ask a series of questions. Maybe you can even ask them to write a few one-liners to make sure they get the finer points of Python (write a list comprehension, how do you define a lambda, etc.)
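A sketch of the kind of warm-up one-liners I mean (all hypothetical, just for flavour):

# FizzBuzz for 1..15 as a list comprehension:
fizzbuzz = [
    "FizzBuzz" if i % 15 == 0
    else "Fizz" if i % 3 == 0
    else "Buzz" if i % 5 == 0
    else str(i)
    for i in range(1, 16)
]
squares = [n * n for n in range(10)]  # a plain list comprehension
double = lambda x: x * 2              # defining a lambda
print(fizzbuzz)
print(squares, double(21))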
Here's a question I answered on SO recently that might be the start of something suitable:
Given a string "O João foi almoçar :).", split it into a list of words. You must strip all punctuation except for emoticons. Result, for example: ['O', 'João', 'foi', 'almoçar', ':)']
I've tidied up the question a bit. See the original linked above along with my answer. It tests a number of things, and there are different ways of tackling the problem. They can also get a half-solution out that first disregards the emoticons and punctuation aspect. Just finding the emoticons is another sub-problem that can be solved separately. And so on...
You could extend it to asking about emoticons adjacent to other punctuation, adjacent emoticons, overlapping emoticons, defining emoticons in :) form but also searching for those of the form :-). You could also turn it into a frequency count problem instead of just splitting to somewhat line up with your Java question.
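One hedged sketch of how a candidate might attack it (not the answer linked above; the emoticon set is an assumption):

import re

EMOTICONS = [':)', ':(', ':-)', ':-(', ':D', ';)']  # assumed set; extend as needed
PATTERN = re.compile('|'.join(map(re.escape, EMOTICONS)) + r'|\w+')

def tokenize(text):
    # Emoticons are tried before \w+ so ':)' survives; other punctuation is dropped.
    return PATTERN.findall(text)

print(tokenize('O João foi almoçar :).'))
# ['O', 'João', 'foi', 'almoçar', ':)']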
I also suggest searching through the python+interview-questions questions posted on SO. There are some good ones, and you may even want to broaden your search to skim all interview-questions posts if you have time.
I don't know about Python specifically, but I found that interview questions which involve recursion are a very effective filter. I have asked candidates to produce all the permutations of a string (and think about how to test it), and I have been asked to pseudo-code the Longest Common Subsequence.