Should I convert a long binary string before operating on it?


In one problem I have to deal with long inputs given as binary strings, such as
"1000101011111101010100100101010101010101"
I am required to use the bitwise OR operator | in this problem. From what I have read, this operator works on regular integers, not on binary strings, so I call int(thing, 2) on the input and then apply the operator. However, something troubles me. Doesn't the Python interpreter change the integer back to binary to apply the bitwise OR? Isn't that a repeated step?
Is there no way to use the string directly? Maybe iterating over all the characters is a better approach? There is also a problem with integer precision: sometimes the input is longer than 500 characters, so I am afraid I can't store it as an integer.
I tried something like this, where a and b are two binary strings:
for comparison in zip(a, b):
    if any(bit == "1" for bit in comparison):
        pass  # Do stuff if OR gives 1
This proved to be very slow. Please enlighten me.
Thanks in advance.

First, definitely use int(binary_string, 2); any other method will take longer.
(The for loop using zip and any is quite clever, but it is not optimal.)
The Python interpreter will not change your number back to binary, because the computer already stores the number as binary in memory. It applies the CPU's OR instruction to the stored bits (word by word for long integers) without converting them first, so there is no repeated step. Python integers also have arbitrary precision, so an input of 500 or more bits still fits in an int.
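As a minimal sketch of the approach described above (the helper name or_binary_strings is made up for illustration), the two strings can be parsed once, combined with |, and formatted back to a binary string of the original width:

```python
def or_binary_strings(a: str, b: str) -> str:
    """Bitwise OR of two binary strings, returned as a binary string."""
    width = max(len(a), len(b))
    # int(..., 2) parses the text once; Python ints have no fixed size limit.
    result = int(a, 2) | int(b, 2)
    # format(..., "b") drops leading zeros, so pad back to the input width.
    return format(result, "b").zfill(width)


a = "1000101011111101010100100101010101010101"
b = "0000000000000000000000000000000000001010"
print(or_binary_strings(a, b))

# Inputs far longer than 64 bits work the same way.
long_a = "10" * 300
long_b = "01" * 300
print(or_binary_strings(long_a, long_b) == "1" * 600)
```

The per-character loop does the same work in interpreted Python code, one character at a time, which is the likely reason it is so much slower.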

Related

Is it possible to loop through 10^8 possibilities to determine the correct answer?

I have a number which is 615 digits long. Throughout it, there are 8 places where digits are missing. I have to find out what the digits are. There are 10^8 possibilities.
It is for an RSA problem. The number in question is the private key, and I am trying to find out what it is. To help me, I have the public key pair (n, e), both of which are also 615 digits long, and also a plaintext and corresponding ciphertext.
So the only way to figure out d is to brute-force it. I am trying to use gmpy2 in Python to figure it out. I had to jump through a lot of hoops to get it to work, and I do not even know if I did it correctly. I had to download Python 2.7 so I could run the gmpy2 installer without getting an error message. But I think it works now, as I can type
>>> import gmpy2
in the terminal and it doesn't give me an error.
Before I try to loop through 10^8 possibilities, I want to know if it's possible to do so in a relatively short amount of time, considering my situation. I do not want to fry my computer or freeze it trying to compute this. I also want to know if I am using the right tools for this: is gmpy2 the right library, and is Python 2.7 good and fast enough? I am running gmpy2 on Python 2.7 on a laptop.
In the end I suppose I want to take all 10^8 candidates for d and check which one satisfies C^d = M mod n. So that's an (already) large number raised to the power of a number 615 digits long, 10^8 times. Is this possible? If it is, how can I do this using gmpy2? Is there a more efficient way to compute this?
I sincerely apologize if this is not the right place to ask this. Thank you for any help.

I'm the gmpy2 maintainer.
To calculate C**d mod n, you should use the builtin pow() and specify all three values. pow(C, d, n) will be much faster than C**d % n.
Using gmpy2 should be easy for this. Instead of using int() to convert a string to a Python integer, you just need to use gmpy2.mpz(). You can use pow() with mpz instances. (And if even one of the three values passed to pow() is an mpz, gmpy2 will be used for the calculation.)
I estimate the running time with gmpy2 to range from less than an hour to a few hours. Python's native integers might be 10x slower.
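A rough sketch of that brute-force loop, under some assumptions: the key is given as a decimal string with "?" marking each missing digit, and the helper names candidate_exponents and recover_d are invented for this example. The toy modulus below is tiny so the example runs instantly; gmpy2 is used only if it is installed, otherwise plain Python integers are used.

```python
from itertools import product

try:
    from gmpy2 import mpz  # optional: faster big-integer arithmetic
except ImportError:
    mpz = int  # fall back to Python's built-in integers


def candidate_exponents(template):
    """Yield every decimal string obtained by filling the '?' holes."""
    holes = template.count("?")
    pattern = template.replace("?", "{}")
    for digits in product("0123456789", repeat=holes):
        yield pattern.format(*digits)


def recover_d(template, C, M, n):
    """Return all candidates d with C^d = M (mod n)."""
    C, M, n = mpz(C), mpz(M), mpz(n)
    # Three-argument pow() does modular exponentiation without
    # ever building the huge intermediate value C**d.
    return [int(d) for d in candidate_exponents(template)
            if pow(C, mpz(d), n) == M]


# Toy RSA parameters (assumed for illustration): n = 61 * 53, e = 17, d = 2753.
n, e, M = 3233, 17, 65
C = pow(M, e, n)
print(recover_d("27?3", C, M, n))
```

With eight holes the generator produces the 10^8 candidates from the question one at a time, so memory use stays small; only the total running time grows.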

You're not going to fry your computer.
It may take a long time to run, but it seems like this is a straight O(n) problem, so it won't blow up to infinity. As long as it doesn't take an obscene amount of time to check whether one candidate is valid or not, this may even take less than a minute to run. Modern machines measure clock speed in GHz, that is, 10^9 cycles per second. And besides, since you say you can't make any inferences about the correct answer from wrong guesses, brute force seems like the only solution.

How do arithmetic operators work in python?

I am wondering how the "+" operator works in python, or indeed how any of the basic arithmetic operators work. My knowledge is very limited with regards to this topic, so I hope this isn't a repeat of a question already here.
More specifically, I would like to know how this code:
a = 5
b = 2
c = a + b
print (c)
produces the result of c = 7 when run. How does the computer perform this operation? I found a thread on Reddit explaining how the computer performs the calculation in binary (https://www.reddit.com/r/askscience/comments/1oqxfr/how_do_computers_do_math/) which I can understand. What I fail to comprehend, however, is how the computer knows how to convert the values of 5 and 2 into binary and then perform the calculation. Is there a set formula for doing this for all integers or base 10 numbers? Or is there something else happening at a deeper hardware level here?
Again, I'm sorry if this is a repeat or if the question seems completely silly; I just can't seem to understand how Python can take any two numbers and then add, subtract, divide or multiply them. Cheers.

The numbers are always in binary. The computer just isn't capable of keeping them in a different numerical system (well, there are ternary computers, but these are a rare exception). The decimal system is just used as a "human representation", so that it is easier to read, but all the symbols (including the symbol "5" in the file, which is just a character) are mapped to numbers through some encoding (e.g. ASCII). These numbers are, of course, in binary; the computer simply knows (through the specification of the encoding) that if there is a 1000001 in the context of a string of characters, it has to display the symbol A (in the case of ASCII). That's it: the computer doesn't know the number 58 as such. In the file it is just two symbols, and they are kept in memory as ones and zeros.
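To make the encoding point concrete, here is a small sketch that uses only Python builtins. It shows the character codes behind the text "58" and the separate binary value obtained once that text is parsed as a number:

```python
text = "58"

# Each character is stored as a number defined by the encoding (ASCII here).
codes = [ord(ch) for ch in text]
code_bits = [format(code, "08b") for code in codes]
print(codes, code_bits)

# Parsing the text yields the integer 58, which has its own bit pattern.
value = int(text)
value_bits = format(value, "b")
print(value, value_bits)

# The letter A is the bit pattern 1000001 (decimal 65) in ASCII.
letter_bits = format(ord("A"), "b")
print(letter_bits)
```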
Now, memory. This is where it gets interesting. All the instructions and the data are kept in one place as a large buffer of ones and zeros. These are passed to the CPU, which (using its instruction set) knows what the first chunk of ones and zeros (this is what we call a "word") means. The first word is an instruction, then the argument(s) follow. Depending on the instruction, different things happen. OK, what happens if the instruction means "add these two numbers and store the result here"?
Well, now it's a hardware job. Adding binary numbers isn't that complicated; it's explained in the link you provided. But how does the CPU know that this is the algorithm, and how does it execute it? Well, it uses a bunch of "full-adders". What is a "full-adder"? It is a hardware circuit that takes two input bits plus a carry-in bit (each of them either one or zero), "adds" them, and outputs the result as two other bits: a sum bit and a carry-out bit. But how does the full-adder work? Well, it is constructed (physically) from two half-adders and an OR gate, and the half-adders are constructed from standard AND and XOR gates. If you're familiar with the corresponding operators (&, ^ and | in Python) you probably know how they work. These gates are designed to work as expected using the physical properties of the elements (the most important of them being silicon) used in the electronic components. And I think this is where I'll stop.
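The adder chain described above can be imitated in software. This is only an illustrative model (real hardware does it with gates, and the function names here are made up), but it uses nothing except the &, ^ and | operators on single bits:

```python
def half_adder(a, b):
    """Add two bits; return (sum_bit, carry_bit)."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Add two bits plus a carry-in using two half-adders and an OR."""
    partial_sum, carry1 = half_adder(a, b)
    sum_bit, carry2 = half_adder(partial_sum, carry_in)
    return sum_bit, carry1 | carry2


def ripple_add(x, y, width=8):
    """Add two non-negative integers bit by bit, lowest bit first."""
    carry = 0
    result = 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    return result  # a carry out of the top bit is dropped, as in fixed-width hardware


print(ripple_add(5, 2))
```

Calling ripple_add(5, 2) reproduces the a + b from the question, one bit position at a time.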

How can I use python to calculate very large numbers?

Like, really, really large numbers..
I'm trying out a variation of the Fibonacci series (the most significant variation being that it squares each term before feeding it in again, although there are a few other modifications as well), and I need to obtain a particular term whose value is too large for Python to handle. I'm talking well over a thousand digits, probably more. The program just starts and does nothing at all.
Is there any way I can use python to print such massive numbers, or can it be done with JavaScript (preferred) or any other language?
Program in question:
g = [0 for y in range(31)]
g[0] = 0
g[1] = 1
for x in range(2, 31):
    g[x] = pow((g[x-1] + g[x-2]), 2)
print(g[30])

Your program does nothing because it has probably consumed all the memory. Python itself can handle very large numbers. Check this link:
https://www.python.org/dev/peps/pep-0237/
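One way to see how fast this sequence grows, without printing the huge values, is to track each term's bit length. This is a sketch that stops at 16 terms (the function name is invented for the example); because every step squares the previous sum, the size roughly doubles per term, so the 30th term would be enormously larger than what is shown here.

```python
def squared_fibonacci(terms):
    """Return the first `terms` values of g[x] = (g[x-1] + g[x-2]) ** 2."""
    g = [0, 1]
    for _ in range(2, terms):
        g.append((g[-1] + g[-2]) ** 2)
    return g


g = squared_fibonacci(16)
bits = [value.bit_length() for value in g]

# Print sizes rather than values: the numbers themselves get very long.
for index, size in enumerate(bits):
    print(index, size)
```

Reporting int.bit_length() avoids converting each value to decimal text, which is itself slow for numbers this large.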

How best to store large sequences of text in Python?

I recently discovered that a student of mine was doing an independent project in which he was using very large strings (2-4MB) as values in a dictionary.
I've never had a reason to work with such large blocks of text and it got me wondering if there were performance issues associated with creating such large strings.
Is there a better way of doing it than to simply create a string? I realize this question is largely context dependent, but I'm looking for generalized answers that may cover more than one possible use-case.
If you were working with that much text, how would you store it in your code, and would you do anything different than if you were simply working with an ordinary string of only a few characters?

It depends a lot on what you're doing with the strings. I'm not exactly sure how Python stores strings but I've done a lot of work on XEmacs (similar to GNU Emacs) and on the underlying implementation of Emacs Lisp, which is a dynamic language like Python, and I know how strings are implemented there. Strings are going to be stored as blocks of memory similar to arrays. There's not a huge issue creating large arrays in Python, so I don't think simply storing the strings this way will cause performance issues. Some things to consider though:
How are you building up the string? If you build it up piece by piece by simply appending to ever larger strings, you have an O(N^2) algorithm that will be very slow. Java handles this with a StringBuilder class. I'm not sure if there's an exact equivalent in Python, but you can simply create a list with all the parts you want to join together, then join at the end using ''.join(parts).
Do you need to search the string? This isn't related to creating the strings but it's something to consider. Searching will in general be O(n) in the size of the string; there are speedups that make it O(n/m) where m is the size of the substring you're searching for, but that's about it. The main consideration here is whether to store one big string or a series of substrings. If you need to search all the substrings, that won't help much over searching a big string, but it's possible you might know in advance that some parts don't need to be searched.
Do you need to access substrings? Again, this isn't related to creating the strings, it's something to consider. Accessing a substring by position is just a matter of indexing to the right memory location, but if you need to take large substrings, it may be inefficient, and you might be able to speed things up by storing your string as an array of substrings, and then creating a new string as another array with some of the strings shared. However, doing it this way takes work, and shouldn't be done unless it's really necessary.
In sum, I think for simple cases it's fine to have large strings like this, but you should think about the sorts of operations you're going to perform and what their O(...) time is.
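The join idiom mentioned above can be sketched as follows. io.StringIO from the standard library is shown as well, since it is probably the closest thing Python has to a StringBuilder; the function names and sample data are made up for the example.

```python
import io


def build_with_join(parts):
    """Collect the pieces in a list and join them once at the end."""
    return "".join(parts)


def build_with_stringio(parts):
    """Write the pieces into an in-memory text buffer."""
    buffer = io.StringIO()
    for part in parts:
        buffer.write(part)
    return buffer.getvalue()


# Hypothetical input: many small lines forming one text of a few megabytes.
parts = ["line %d\n" % i for i in range(200000)]
text = build_with_join(parts)
print(len(text))
print(text == build_with_stringio(parts))
```

Either version copies each piece into the final string once, instead of recopying the whole accumulated string on every append.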

I would say that potential issues depend on two things:
how many strings of this kind are held in memory at the same time, compared to the capacity of the memory (the RAM)?
what operations are done on these strings?
I seem to have read that operations on strings in Python are very efficient, so working on very long strings isn't supposed to present a problem. But in fact it depends on the algorithm of each operation performed on a big string.
This answer is rather vague; I don't have enough experience to make a more useful estimate of the problem. But the question is also very broad.

Why is there an error when dividing 2/5.0 in Python? [duplicate]

Closed 13 years ago.
Possible Duplicate:
Python float - str - float weirdness
In Python, 2/5.0 or 2/float(5) returns 0.40000000000000002
Why do I get that error at the end and how can I get the right value to use in additional calculations?

Welcome to IEEE754, enjoy your stay.
Use decimal instead.

Because floating point arithmetic is not exact. You should use this value in your additional calculations, and round off the result when you're finished. If you need it to be exact, use another data type.

Ignacio above has the right answer.
There are IEEE standards for efficiently storing floating point numbers in binary computers. These go into excruciating detail about exactly how numbers are stored, and these rules are followed on almost every computer.
They are also inexact for everyday values. Binary floating point cannot represent most decimal fractions exactly, only sums of powers of two. Instead of doing something tricky, such as recomputing the bottom bits to round off, the standards choose efficiency.
That way, you can curse at your system that runs slightly faster. There are occasional debates about changing Python in some way to work around these problems, but the answers are not trivial without a huge loss in efficiency.
Getting around this:
One option is digging into the "decimal" package of the standard library. If you stick to the examples and ignore the long page, it will get you what you want. No bets on efficiency.
The second is to do the rounding and string truncation yourself in one output function. Who cares if the number is off a bit if you never print those bits?
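Both workarounds can be sketched with the standard library alone. The variable names are only for illustration; the first part uses decimal.Decimal for exact decimal arithmetic, the second keeps the float and rounds only when formatting the output.

```python
from decimal import Decimal

# The float result is the nearest binary double to 0.4, not 0.4 itself.
binary_result = 2 / 5.0
exact_stored_value = Decimal(binary_result)  # exact expansion of that double
print(exact_stored_value)

# Option 1: do the arithmetic in decimal. Build Decimals from strings or
# integers, not from floats, or the binary error is carried along.
decimal_result = Decimal("2") / Decimal("5.0")
print(decimal_result)

# Option 2: keep the float for calculations and round only for display.
rounded_for_display = f"{binary_result:.2f}"
print(rounded_for_display)
```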

Note that Python 3.1 has a new floating point formatting function that avoids this sort of appearance. See What's new in Python 3.1 for more details (search for "floating point").
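A short check of that behaviour, assuming a Python 3 interpreter: repr() gives the shortest string that round-trips to the same float, while asking for 17 significant digits brings back the output from the question.

```python
value = 2 / 5.0

# Shortest representation that converts back to exactly the same float.
short_form = repr(value)
print(short_form)

# Forcing 17 significant digits exposes the stored binary approximation.
long_form = format(value, ".17g")
print(long_form)
```

The stored value is identical in both cases; only the text used to display it differs.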

See this question for the explanation. The right way would be to either
Use integers until the "final" calculation
Live with rounding errors.
