Maximum characters that can be stuffed into raw_input() in Python

For an InterviewStreet challenge, we have to be able to accommodate a 10,000-character string typed at the keyboard, but when I copy/paste a 10k-long word into my local testing, it cuts off at a thousand characters or so.
What's the official limit in Python? And is there a way to change this?
Thanks guys
Here's the challenge by-the-by:
http://www.interviewstreet.com/recruit/challenges/solve/view/4e1491425cf10/4edb8abd7cacd

Are you sure that your 10k-long word doesn't contain newlines?
raw_input([prompt])
If the prompt argument is present, it is written to standard output without a trailing newline. The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that. When EOF is read, EOFError is raised.
...
If the readline module was loaded, then raw_input() will use it to provide elaborate line editing and history features.
There is no maximum limit (in python) of the buffer returned by raw_input, and as I tested some big length of input to stdin I could not reproduce your result. I tried to search the web for information regarding this but came up with nothing that would help me answer your question.
my tests
:/tmp% python -c 'print "A"*1000000' | python -c 'print len (raw_input ())';
1000000
:/tmp% python -c 'print "A"*210012300' | python -c 'print len (raw_input ())';
210012300
:/tmp% python -c 'print "A"*100+"\n"+"B"*100' | python -c 'print len (raw_input ())';
100

I had this same experience, and found that Python limits the length of input to raw_input if you do not import the readline module. Once I imported the readline module, the limit was lifted (or at least raised enough that the text I was using worked just fine). This was on my Mac with Python 2.7.15. Additionally, it's been confirmed working on at least 3.9.5 (where raw_input() is named input()).

I guess this is part of the challenge. The FAQ suggests raw_input() might not be the optimal approach:
The most common (possibly naive) methods are listed below. (...)
There are indeed Python standard modules helping to handle system input/output.
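As a rough Python 3 sketch of the standard-library route hinted at above: reading a line from a stream such as sys.stdin imposes no length cap of its own and stops at the first newline. The helper name read_word is hypothetical, and whether this is what the challenge FAQ actually recommends is an assumption. The input is simulated with io.StringIO so the sketch runs without a terminal.

```python
import io
import sys


def read_word(stream=None):
    """Return the first line of `stream` without its trailing newline."""
    if stream is None:
        stream = sys.stdin  # default to real standard input
    return stream.readline().rstrip("\n")


# Simulated input: one 10,000-character word, and one word split by a newline.
long_word = read_word(io.StringIO("A" * 10000 + "\n"))
split_word = read_word(io.StringIO("A" * 100 + "\n" + "B" * 100))
print(len(long_word), len(split_word))
```

Calling read_word() with no argument reads from standard input, so it can be fed with a pipe or a redirected file instead of a paste into the terminal.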

Related

Python 3.6.3 vs 2.7.3 for regular expressions: same script different results

I am running the same script with Python versions 3.6.3 and 2.7.3. The script works fine in 2.7.3, but not in 3.6.3. It seems the difference is in the regular expression portion of my code.
I am searching for some strings in the same external file for both script versions, saving the hits in lists. The len() of the resulting lists are different for the two versions.
I tried to make a MWE that reproduces the error by creating a small file to use for the regexes, but then both versions of Python produce the same output. The only solution I have is to provide the original file. But this is quite a long text file, so you can download it from here: https://ufile.io/jjc56 This file is available for 30 days. I thought perhaps this was better than pasting everything into the question.
This piece of code reproduces the error.
import re
inputfile = "opt-guess-firsttetint-r-h2o.out"
with open(inputfile,"r") as input_file:
    input_string = input_file.read()
input_file.close()
match_geometry = list(re.findall('CARTESIAN COORDINATES \(ANGSTROEM\)(.*?)CARTESIAN COORDINATES \(A\.U\.\)', input_string, re.DOTALL))
match_energy = list(re.findall('FINAL SINGLE POINT ENERGY(.*?)-------------------------', input_string, re.DOTALL))
print(len(match_geometry))
print(len(match_energy))
Output with Python 3.6.3:
78
77
Output with Python 2.7.3:
188
188
For comparison:
$ grep "CARTESIAN COORDINATES (ANGSTROEM)" externalfile | wc -l
> 188
$ grep "FINAL SINGLE POINT ENERGY" externalfile | wc -l
> 188
If you need more information, please say so!
The main difference between Python 2 and Python 3 is text handling. In Python 2, text is treated as in bare C, i.e. as a sequence of bytes that happen to match printable ASCII characters (codes 32-126). That is not true in Python 3, where the bytes in your file are assumed to be in some text encoding and are decoded to Unicode code points before your program sees them.
Likewise, in Python 2 regexps operate by default on "byte strings", and in Python 3 on text strings (in Python 2 you can work with text as well if both the expression and the text are 'unicode' objects rather than 'str').
We'd need more context, but your problem likely lies in Python 3 reading your text file with an incorrect encoding. For example, your data might be UTF-8 while Python assumes Latin-1. Characters outside the ASCII range would then be read incorrectly without raising an error, since every byte from 0 to 255 is valid Latin-1, and the resulting mojibake would make the regexp fail to match.
Just pass an explicit encoding="..." that matches your file when opening it and you should be fine.
FYI, one character that would trigger the behavior I described above is "Å" - which I don't find unlikely to occur in this particular case.
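A small Python 3 sketch of the effect described above. The sample line is made up for illustration and is not taken from the asker's file; the sketch only shows that UTF-8 bytes read as Latin-1 decode silently into different characters, while an explicit matching encoding recovers the text.

```python
import os
import tempfile

sample = "distance in Å\n"  # made-up line, not from the asker's file

# Write the sample to a temporary file as UTF-8 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

try:
    # Wrong guess: every byte is valid Latin-1, so no error is raised.
    with open(path, "r", encoding="latin-1") as f:
        wrong = f.read()
    # Explicit encoding that matches how the file was written.
    with open(path, "r", encoding="utf-8") as f:
        right = f.read()
finally:
    os.remove(path)

print(repr(wrong))
print(repr(right))
```

A pattern containing "Å" would match right but not wrong, which is the kind of silent mismatch the answer has in mind.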

C program char buffer unexpected overflow

I am trying to understand two different behaviors of an overflow in a C program (call it vulnerable_prog) on Linux that asks for input, so that you can overflow a buffer. I understand that the compiler lays out the stack frame in particular ways, which sometimes causes a bit of unpredictability. What I can't understand is the difference in how memory is handled when I overflow the buffer by using a Python script to feed 20 characters to the program, as opposed to running vulnerable_prog by hand and typing the 20 characters myself.
The example program declares an array of "char name[20]", and the goal is to overflow it and write a specific value into the other variable that will be overwritten. (This is from a classic wargaming site.)
I understand that the processor (64-bit) reads 8 bytes at a time, so this requires padding of arrays that are not multiples of 8 to keep memory organized. Therefore my char [20] is actually occupying 24 bytes of memory and is accessible to the processor as 8-byte words.
The unexpected behavior is this:
When using a python script, the overflow behaves as follows:
$ python -c 'print "A"*20 + "\xre\xhe\xyt\xhe"' | /path/vulnerable_prog
The 20 characters overflow the buffer, and the expected value is written into the correct spot in memory.
HOWEVER, when you try to overflow the buffer by running the program from the command prompt and inputting 20 characters manually, followed by the required hex string to be written to memory, you must use one additional hex character in order to have your value end up in the correct place that you want it:
$echo$ 'AAAAAAAAAAAAAAAAAAAA\xre\xhe\xyt\xhe\xaf'
(output of the 'echo' is then copied and pasted into the prompt that vulnerable_prog offers when run from the command line)
Where does this difference in the padding of the character array between the script and the command line exploitation come into play?
I have been doing a lot of research of C Structure padding and reading in the ISO/IEC 9899:201x, but cannot find anything that would explain this nuance.
(This is my first question on Stack Overflow so I apologize if I did not quite ask this correctly.)
Your Python script, when piped, actually sends 25 characters into /path/vulnerable_prog. The print statement adds a newline character. Here is your Python program plus a small Python script that counts the characters written to its standard input:
python -c'print "A"*20 + "\xre\xhe\xyt\xhe"' | python -c "import sys; print(len(sys.stdin.read()))"
I'm guessing you're not pasting the newline character that comes from echo into the program's prompt. Unfortunately, I don't think I have enough information to explain why you need 25, not 24, characters to achieve what you're attempting.
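The newline point can be checked without the target program. Below is a minimal Python 3 sketch; the four payload bytes are placeholders, since the escapes in the question are obscured, and the helper name printed_length is hypothetical. It compares what print emits with what a plain write emits.

```python
import io


def printed_length(payload):
    """Return how many characters print() emits for `payload`."""
    buf = io.StringIO()
    print(payload, file=buf)  # print appends "\n" by default
    return len(buf.getvalue())


payload = "A" * 20 + "\x01\x02\x03\x04"  # placeholder bytes, not the real ones

raw = io.StringIO()
raw.write(payload)  # write() adds nothing

print(printed_length(payload), len(raw.getvalue()))
```

The first number includes the trailing newline that print adds, so the piped program receives one more character than the payload itself contains.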
P.S. Welcome to Stack Overflow!

Is there anything similar to the "perl -pe" option in Python?

TL;DR: I want to know which option in Python roughly corresponds to the "-pe" option in Perl.
I used Perl for some time, but these days I have been switching to Python for its ease and simplicity. When I needed to do simple text processing at the shell prompt, I used the perl -pe option. The following is an example.
grep foo *.txt | perl -pe 's/foo/bar/g'
To do a similar thing in Python, could someone explain to me how I can do this?
-pe is a combination of two arguments, -p and -e. The -e option is roughly equivalent to Python's -c option, which lets you specify code to run on the command line. There is no Python equivalent of -p, which effectively adds code to run your passed code in a loop that reads from standard input. To get that, you'll actually have to write the corresponding loop in your code, along with the code that reads from standard input.
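The loop that -p would otherwise supply can be sketched like this in Python 3. The function name substitute_lines is hypothetical; the demo feeds it a list of lines instead of real standard input so that it runs on its own.

```python
import io
import re
import sys


def substitute_lines(lines, pattern, replacement, out):
    """Rewrite each line with re.sub and write it out, like perl -pe 's/.../.../g'."""
    for line in lines:
        out.write(re.sub(pattern, replacement, line))


# Demo with in-memory lines; pass sys.stdin and sys.stdout for real use.
demo_out = io.StringIO()
substitute_lines(["foo one\n", "two foo foo\n"], r"foo", "bar", demo_out)
print(demo_out.getvalue(), end="")
```

On the command line the same idea fits into -c, for example: grep foo *.txt | python3 -c "import sys, re; [sys.stdout.write(re.sub('foo', 'bar', line)) for line in sys.stdin]"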
Perl, although a fully fledged programming language, was initially conceived, and evolved, as a tool for text manipulation.
Python, on the other hand, has always been a general-purpose programming language. It can handle text with enormous flexibility compared with, say, Java or C++, but it does so within its general syntax, without exceptions to those rules such as special syntax for regular expressions that, in the absence of other directives, become a program in themselves. The same goes for opening, reading and writing files, given their names.
So you can use "python -c ..." to run Python code from the command line, but your command must be a full program, with a beginning, middle and end.
So, for a simple regular expression substitution in several files passed in the stdin, you could try something along:
grep -l foo *.txt | python3 -c "import sys, re, os; [(open(filename + 'new', 'wt').write(re.sub('foo', 'bar', open(filename).read())), os.rename(filename + 'new', filename)) for filename in map(str.strip, sys.stdin)]"

running python in an argument of a program & how to give hex when string expected

I've come across this command and want to understand how it works. program is just a simple C program, run on Mac or Linux.
./program `python -c 'print "\xC8\xCE\xC5\x06"'`
1) Can someone explain how this command works?
2) Is this the only way to give a hex value to a program when a string is expected?
This is a way of evaluating python expressions from the command line in bash. It has nothing to do with C. The only python code here is print "\xC8\xCE\xC5\x06". The rest is bash code.
You can try this command in bash: python -c "print 'Hello World'"
You can also read man python for more information on python command line flags.
In python strings \xHH is used to translate a hex value into characters. In this case u"\xC8\xCE\xC5" == u"ÈÎÅ". If you don't use unicode strings, the output will be some non-ascii characters, which program might make sense of, but that cannot be entered or printed in a regular bash session. program might not care if the string is printable, and instead just treat it as binary data.
The backticks in bash run the enclosed command first and then substitute its output into the outer command line (command substitution). Another way of doing this in bash would be this:
./program $(python -c "print '\xC8\xCE\xC5\x06'")
To answer your second question: there are other ways of doing this as well. You could probably use printf instead of python, like this:
./program $(printf "\xC8\xCE\xC5\x06")
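The commands above use Python 2, where print writes the string's bytes unchanged. As a hedged Python 3 sketch of the difference: there, "\xC8" in a str is a character rather than a byte, so the bytes that reach the program depend on the encoding used for output. The variable names are illustrative only.

```python
text = "\xC8\xCE\xC5\x06"   # four characters: È, Î, Å and a control character
raw = b"\xC8\xCE\xC5\x06"   # the four bytes the program is meant to receive

as_latin1 = text.encode("latin-1")  # one byte per character
as_utf8 = text.encode("utf-8")      # the accented letters take two bytes each

print(as_latin1 == raw)
print(len(as_utf8))
```

To emit the exact bytes from Python 3, writing a bytes object to sys.stdout.buffer avoids the encoding step, for example: python3 -c 'import sys; sys.stdout.buffer.write(b"\xC8\xCE\xC5\x06")'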

Shortest Python Quine?

Python 2.x (30 bytes):
_='_=%r;print _%%_';print _%_
Python 3.x (32 bytes)
_='_=%r;print(_%%_)';print(_%_)
Is this the shortest possible Python quine, or can it be done better? This one seems to improve on all the entries on The Quine Page.
I'm not counting the trivial 'empty' program.
I'm just going to leave this here (save as exceptionQuine.py):
  File "exceptionQuine.py", line 1
    File "exceptionQuine.py", line 1
    ^
IndentationError: unexpected indent
Technically, the shortest Python quine is the empty file. Apart from this trivial case:
Since Python's print automatically appends a newline, the quine is actually _='_=%r;print _%%_';print _%_\n (where \n represents a single newline character in the file).
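The trailing-newline point can be checked mechanically. Here is a minimal Python 3 sketch, assuming a quine counts only if its output equals its source plus the single newline that print appends; the helper name is_quine is hypothetical, and the two candidates are the Python 3 entries quoted in this thread.

```python
import contextlib
import io


def is_quine(source):
    """Run `source` and report whether it printed exactly itself plus a newline."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(source, {})  # fresh namespace for each candidate
    return captured.getvalue() == source + "\n"


candidates = [
    "_='_=%r;print(_%%_)';print(_%_)",
    "print((lambda s:s%s)('print((lambda s:s%%s)(%r))'))",
]
results = [is_quine(src) for src in candidates]
print(results)
```

This check says nothing about the Python 2 entries, which would need a Python 2 interpreter to run.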
Both
print open(__file__).read()
and anything involving import are not valid quines, because a quine by definition cannot take any input. Reading an external file is considered taking input, and thus a quine cannot read a file -- including itself.
For the record, technically speaking, the shortest possible quine in python is a blank file, but that is sort of cheating too.
In a slightly non-literal approach, taking 'shortest' to mean short in terms of the number of statements as well as just the character count, I have one here that doesn't include any semicolons.
print(lambda x:x+str((x,)))('print(lambda x:x+str((x,)))',)
In my mind this contends, because it's all one function, whereas others are multiple. Does anyone have a shorter one like this?
Edit: User flornquake made the following improvement (backticks for repr() to replace str() and shave off 6 characters):
print(lambda x:x+`(x,)`)('print(lambda x:x+`(x,)`)',)
Even shorter:
print(__file__[:-3])
And name the file print(__file__[:-3]).py (Source)
Edit: actually,
print(__file__)
named print(__file__) works too.
Python 3.8
exec(s:='print("exec(s:=%r)"%s)')
Here is another similar to postylem's answer.
Python 3.6:
print((lambda s:s%s)('print((lambda s:s%%s)(%r))'))
Python 2.7:
print(lambda s:s%s)('print(lambda s:s%%s)(%r)')
I would say:
print open(__file__).read()
Source
As of Python 3.8 I have a new quine! I'm quite proud of it because until now I have never created my own. I drew inspiration from _='_=%r;print(_%%_)';print(_%_), but made it into a single function (with only 2 more characters). It uses the new walrus operator.
print((_:='print((_:=%r)%%_)')%_)
This one is the least cryptic; the core is a.format(a)
a="a={1}{0}{1};print(a.format(a,chr(34)))";print(a.format(a,chr(34)))
I am strictly against your solution.
The formatting operator % is definitely too advanced a high-level language feature. If such constructs are allowed, I would say that import must be allowed as well. Then I can construct a shorter quine by introducing some other high-level language construct (which, BTW, is much less powerful than the % operator, so it is less advanced):
Here is a Unix shell script creating such a quine.py file and checking it really works:
echo 'import x' > quine.py
echo "print 'import x'" > x.py
python quine.py | cmp - quine.py; echo $?
outputs 0
Yes, that's cheating, like using %. Sorry.
