I am currently implementing a messaging system. I want to send an error protected message to a receiver, but I am failing at the basics, i.e. calculating the error correcting codes. I use the following library for error correction.
Consider the following MWE:
from reedsolo import RSCodec
with open("imageToSend.png", "rb") as pic:
    picContent = pic.read()
correctionLength = int((len(picContent)/100)*20)
rs = RSCodec(correctionLength)
rs.encode(picContent)
As you can see I want to protect the image from 20% errors that might occur. The problem here? The encoded bytearray is empty. And my question: Is it possible to protect large files from errors, without chunking them into smaller pieces and then calculating the error correcting codes?
Is it possible to protect large files from errors, without chunking
them into smaller pieces
Depends on the code. With bytewise RS, chunks are necessary (but this lib does the work for you).
As you can see I want to protect the image from 20% errors that might
occur. The problem here?
Yes. The number isn't meant to be a percent-like thing in the first place. You should really read the examples of the lib, and get to know a bit how RS works.
The number is how many bytes out of every 255 are used for error correction. E.g. 40 means that for every 215 bytes of data there will be 40 bytes of RS code (about 20%), and within that 255-byte block it can correct up to 20 byte errors.
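To make that arithmetic concrete, here is a minimal sketch that only does the bookkeeping and does not call the library. The helper name rs_layout is hypothetical, and it assumes the last, shorter chunk is encoded without padding, which may differ between implementations.

```python
import math

def rs_layout(data_len, nsym, block_size=255):
    """Return (chunks, encoded_len, correctable_per_block) for byte-wise RS."""
    if not 0 < nsym < block_size:
        raise ValueError("nsym must be between 1 and block_size - 1")
    payload = block_size - nsym              # data bytes per block, e.g. 215
    chunks = math.ceil(data_len / payload)   # blocks needed for the whole input
    encoded_len = data_len + chunks * nsym   # assumes the last block is not padded
    return chunks, encoded_len, nsym // 2    # nsym parity bytes correct nsym // 2 errors

print(rs_layout(1000, 40))  # (5, 1200, 20)
```

A value such as 20% of the file length is far larger than 255 for any realistic image, which is why it cannot work as the parity count.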
Finally, LDPC codes might be something you want to look into. They are a bit worse than RS at correcting errors, but not by much, and they are much faster.
Addition from the comments:
Whether it can be corrected depends on the locations of the errors, yes. If whole 255-byte blocks are gone, it can't correct that. To make the span larger, higher-order RS codes could be used (e.g. one independent block could have 65536 bytes instead of 255), but a) that's again much slower than the (already slow) 255-byte RS, and b) the RS libs I know can't do it (yours included). You would have to write it yourself.
Again, LDPC could help, if it doesn't bother you that it's a completely different thing. E.g. it has no clear threshold for how many errors are too many to correct or detect; that depends on the error pattern too. And since it's newer than RS, there are fewer codes/libraries online, maybe none for your case.
((Well, it's old too, but for decades nobody was interested in it, until someone realized that it's useful)).
I am wondering how the "+" operator works in python, or indeed how any of the basic arithmetic operators work. My knowledge is very limited with regards to this topic, so I hope this isn't a repeat of a question already here.
More specifically, I would like to know how this code:
a = 5
b = 2
c = a + b
print (c)
produces the result of c = 7 when run. How does the computer perform this operation? I found a thread on Reddit explaining how the computer performs the calculation in binary (https://www.reddit.com/r/askscience/comments/1oqxfr/how_do_computers_do_math/) which I can understand. What I fail to comprehend however is how the computer knows how to convert the values of 5 and 2 into binary and then perform the calculation. Is there a set formula for doing this for all integers or base 10 numbers? Or is there something else happening at a deeper hardware level here?
Again I'm sorry if this is a repeat or if the question seems completely silly, I just can't seem to understand how python can take any two numbers and then sum them, add them, divide them or multiply them. Cheers.
The numbers are always in binary. The computer just isn't capable of keeping them in a different numerical system (well, there are ternary computers but these are a rare exception). The decimal system is just used for a "human representation", so that it is easier to read, but all the symbols (including the symbol "5" in the file, it's just a character) are mapped to numbers through some encoding (e.g. ASCII). These numbers are, of course, in binary; the computer just knows (through the specification of the encoding) that if there is a 1000001 in the context of some string of characters, it has to display the symbol A (in the case of ASCII). That's it: the computer doesn't know the number 58; for it, these are just two symbols, kept in memory as ones and zeros.
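A small sketch of that idea in Python: the characters of "58" are just codes, and turning them into the number 58 is an ordinary positional calculation. The function parse_decimal is a hypothetical stand-in for what int() does, written out for illustration only.

```python
# The character "A" is stored as the number 65, i.e. binary 1000001.
assert ord("A") == 0b1000001 == 65

def parse_decimal(text):
    """Turn a string of decimal digits into an int, one digit at a time."""
    value = 0
    for ch in text:
        digit = ord(ch) - ord("0")   # "5" is code 53 and "0" is code 48, so 5
        value = value * 10 + digit   # shift earlier digits one decimal place
    return value

codes = [ord(ch) for ch in "58"]
number = parse_decimal("58")
print(codes, number, bin(number))    # [53, 56] 58 0b111010
```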
Now, memory. This is where it gets interesting. All the instructions and the data are kept in one place as a large buffer of ones and zeros. These are passed to the CPU which (using its instruction set) knows what the first chunk of ones and zeros (this is what we call a "word") means. The first word is an instruction, then the argument(s) follow. Depending on the instruction, different things happen. OK, what happens if the instruction means "add these two numbers and store the result here"?
Well, now it's a hardware job. Adding binary numbers isn't that complicated; it's explained in the link you provided. But how does the CPU know that this is the algorithm, and how does it execute it? Well, it uses a bunch of "full-adders". What is a "full-adder"? It is a hardware circuit that, given two input bits plus a carry-in bit, "adds" them and outputs the result as two other bits (a sum bit and a carry-out bit). But how does the full-adder work? Well, it is (physically) built from half-adders, which are built from standard AND and XOR gates. If you're familiar with the similar operators (& and ^ in Python) you probably know how they work. These gates are designed to work as expected using the physical properties of the elements (the most important of them being silicon) used in the electronic components. And I think this is where I'll stop.
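The adder chain described above can be imitated in software with the same & and ^ operators. This is only an illustrative sketch: the function names and the 8-bit width are assumptions, and the OR that merges the two carries is the usual textbook construction rather than something stated in the answer.

```python
def half_adder(a, b):
    """Add two bits: XOR gives the sum bit, AND gives the carry bit."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Add two bits plus a carry-in using two half-adders and an OR."""
    s1, c1 = half_adder(a, b)
    total, c2 = half_adder(s1, carry_in)
    return total, c1 | c2

def ripple_add(x, y, width=8):
    """Chain full-adders from the lowest bit upwards (result wraps at width bits)."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

print(ripple_add(5, 2))  # 7
```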
I have a Matlab script that is old and decrepit, and I am trying to rewrite parts of it in Python. Unfortunately, not only am I unfamiliar with Matlab, but the script was written some 5-6 years ago, so there is no one that fully understands what it does or how it does it. For now the line I am interested in is this:
% Matlab
[rawData,count,errorMsg] = fscanf(serialStream, '%f')
Now, I incorrectly tried to do that as:
# Python
rawData = []
count = 0
while True:
    rawData.append(struct.unpack('f', ser.read(4))[0])
    count += 1
However, this prints out completely garbage values. Upon further research, I learned that, in Matlab, %f does not mean float like it does in any sensible language, but fixed point number. As such, it makes sense that my data looked like garbage.
Through trial and error, I have determined that I should be getting blocks of 156 bytes from a serial port. However, I am unsure of how many values that translates to, as I can't find documentation that explains how large fixed point numbers are in Matlab (this says they can be up to 128 bits, but that's not very helpful). I have also found the python library decimal, and it seems like I would want to form them from the constituent parts (i.e. provide sign, digits and exponent), but I'm not sure how those are stored in the stream of data I'm getting.
Is there a good way of making a fixed point number from a binary stream in Python? Or do I have to look up the implementation in Matlab and recreate it? Perhaps there's a better way of doing what I want to do?
import string,random,platform,os,sys
def rPass():
    sent = os.urandom(random.randrange(900,7899))
    print sent,"\n"
    intsent=0
    for i in sent:
        intsent += ord(i)
    print intsent
    intset=0
rPass()
I need help figuring out total possible outputs for the bytecode section of this algorithm. Don't worry about the for loop and the ord stuff that's for down the line. -newbie crypto guy out.
I won't worry about the loop and the ord stuff, so let's just throw that out and look at the rest.
Also, I don't understand "I need help figuring out total possible outputs for the unicode section of this algorithm", because there is no Unicode section of the algorithm, or in fact any Unicode anything anywhere in your code. But I can help you figure out the total possible outputs of the whole thing. Which we'll do by simplifying it step by step.
First:
li=[]
for a in range(900,7899):
    li.append(a)
This is exactly equivalent to:
li = range(900, 7899)
Meanwhile:
li[random.randint(0,7000)]
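Because li happens to be exactly 6999 elements long, this is effectively the same as random.choice(li). (Strictly, random.randint(0,7000) includes both endpoints, so it can also return 6999 or 7000, which would raise an IndexError.)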
And, putting the last two together, this means it's equivalent to:
random.choice(range(900,7899))
… which is equivalent to:
random.randrange(900,7899)
But wait, what about that random.shuffle(li, random.random)? Well (ignoring the fact that random.random is already the default for the second parameter), the choice is already random-but-not-cryptographically-so, and adding another shuffle doesn't change that. If someone is trying to mathematically predict your RNG, adding one more trivial shuffle with the same RNG will not make it any harder to predict (while adding a whole lot more work based on the results may make a timing attack easier).
In fact, even if you used a subset of li instead of the whole thing, there's no way that could make your code more unpredictable. You'd have a smaller range of values to brute-force through, for no benefit.
So, your whole thing reduces to this:
sent = os.urandom(random.randrange(900, 7899))
The possible output is: Any byte string between 900 and 7898 bytes long.
The length is random, and roughly evenly distributed, but it's not random in a cryptographically-unpredictable sense. Fortunately, that's not likely to matter, because presumably the attacker can see how many bytes he's dealing with instead of having to predict it.
The content is random, both evenly distributed and cryptographically unpredictable, at least to the extent that your system's urandom is.
And that's all there is to say about it.
However, the fact that you've made it much harder to read, write, maintain, and think through gives you a major disadvantage, with no compensating disadvantage to your attacker.
So, just use the one-liner.
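For reference, a minimal Python 3 sketch of that one-liner and of the length range it implies. The function name r_pass is just a tidied version of the question's rPass, and nothing here changes the security argument above.

```python
import os
import random

lengths = range(900, 7899)   # 900..7898 inclusive; the stop value is excluded
print(len(lengths))          # 6999

def r_pass():
    """Return between 900 and 7898 random bytes from the OS random source."""
    return os.urandom(random.randrange(900, 7899))

sent = r_pass()
print(len(sent))
```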
I think in your followup questions, you're asking how many possible values there are for 900-7898 bytes of random data.
Well, how many values are there for 900 bytes? 256**900. How many for 901? 256**901. So, the answer is:
sum(256**i for i in range(900, 7899))
… which is about 2**63184, or 10**19020.
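Those magnitudes can be checked directly, since Python integers are arbitrary precision. A quick sketch (it avoids printing the number itself, which has roughly nineteen thousand digits):

```python
import math

total = sum(256**i for i in range(900, 7899))

# bit_length() - 1 is the exponent of the largest power of two not above total.
print(total.bit_length() - 1)          # 63184
# math.log10 accepts arbitrarily large ints.
print(math.floor(math.log10(total)))   # 19020
```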
So, 63184 bits of security sounds pretty impressive, right? Probably not. If your algorithm has no flaws in it, 100 bits is more than you could ever need. If your algorithm is flawed (and of course it is, because they all are), blindly throwing thousands more bits at it won't help.
Also, remember, the whole point of crypto is that you want cracking to be 2**N slower than legitimate decryption, for some large N. So, making legitimate decryption much slower makes your scheme much worse. This is why every real-life working crypto scheme uses a few hundred bits of key, salt, etc. (Yes, public-key encryption uses a few thousand bits for its keys, but that's because its keys aren't randomly distributed. And generally, all you do with those keys is to encrypt a randomly-generated session/document key of a few hundred bits.)
One last thing: I know you said to ignore the ord, but…
First you can write that whole part as intsent=sum(bytearray(sent)).
But, more importantly, if all you're doing with this buffer is summing it up, you're using a lot of entropy to generate a single number with a lot less entropy. (This should be obvious once you think about it. If you have two separate bytes, there are 65536 possibilities; if you add them together, there are only 511, the sums 0 through 510.)
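A short sketch of both points: the sum(bytearray(...)) form of the loop, and a brute-force count of how many distinct values the sum of two bytes can take. The 900-byte buffer is just an example size.

```python
import os

pairs = 256 * 256                                             # ordered byte pairs
sums = len({a + b for a in range(256) for b in range(256)})   # distinct totals
print(pairs, sums)   # 65536 511

sent = os.urandom(900)
intsent = sum(bytearray(sent))   # same total as looping over ord() in Python 2
print(0 <= intsent <= 255 * len(sent))   # True
```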
Also, by generating a few thousand one-byte random numbers and adding them up, that's basically a very close approximation of a normal or gaussian distribution. (If you're a D&D player, think of how 3D6 gives 10 and 11 more often than 3 and 18… and how that's more true for 3D6 than for 2D6… and then consider 6000D6.) But then, by making the number of bytes range from 900 to 7898, you're flattening it back toward a uniform distribution from 900*127.5 to 7898*127.5. At any rate, if you can describe the distribution you're trying to get, you can probably generate that directly, without wasting all this urandom entropy and computation.
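The dice analogy can be checked by enumerating every roll. A small sketch (dice_sum_counts is a hypothetical helper name):

```python
from collections import Counter
from itertools import product

def dice_sum_counts(n_dice, sides=6):
    """Count how many of the sides**n_dice rolls produce each total."""
    faces = range(1, sides + 1)
    return Counter(sum(roll) for roll in product(faces, repeat=n_dice))

three = dice_sum_counts(3)
print(three[3], three[10], three[11], three[18])   # 1 27 27 1
two = dice_sum_counts(2)
print(two[2], two[7], two[12])                     # 1 6 1
```

The middle totals are 27 times as likely as the extremes with three dice, but only 6 times as likely with two; adding more dice keeps sharpening the peak.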
It's worth noting that there are very few cryptographic applications that can possibly make use of this much entropy. Even things like generating SSL certs use on the order of 128-1024 bits, not 64K bits.
You say:
trying to kill the password.
If you're trying to encrypt a password so it can be, say, stored on disk or sent over the network, this is almost always the wrong approach. You want to use some kind of zero-knowledge proof: store hashes of the password, or use challenge-response instead of sending data, etc. If you want to build a "keep me logged in" feature, do that by actually keeping the user logged in (create and store a session auth token, rather than storing the password). See the Wikipedia article password for the basics.
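As a rough illustration of the "store hashes of the password" option, here is a sketch using only the standard library. The choice of PBKDF2 with SHA-256, the 16-byte salt and the iteration count are assumptions made for the example, not recommendations taken from the answer.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value only

def hash_password(password, salt=None):
    """Return (salt, digest); these are stored instead of the password."""
    if salt is None:
        salt = os.urandom(16)   # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(password, salt, expected):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))   # True
print(check_password("wrong guess", salt, stored))     # False
```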
Occasionally, you do need to encrypt and store passwords. For example, maybe you're building a "password locker" program for a user to store a bunch of passwords in. Or a client to a badly-designed server (or a protocol designed in the 70s). Or whatever. If you need to do this, you want one layer of encryption with a relatively small key (remember that a typical password is itself only about 256 bits long, and has less than 64 bits of actual information, so there is absolutely no benefit from using a key thousands of times as long as it is). The only way to make it more secure is to use a better algorithm, but really, the encryption algorithm will almost never be the best attack surface (unless you've tried to design one yourself); put your effort into the weakest areas of the infrastructure, not the strongest.
You ask:
Also is urandom's output codependent on the assembler it's working with?
Well… there is no assembler it's working with, and I can't think of anything else you could be referring to that makes any sense.
All that urandom is dependent on is your OS's entropy pool and PRNG. As the docs say, urandom just reads /dev/urandom (Unix) or calls CryptGenRandom (Windows).
If you want to know exactly how that works on your system, man urandom or look up CryptGenRandom in MSDN. But all of the major OSes can generate enough entropy and mix it well enough that you basically don't have to worry about this at all. Under the covers, they all effectively have some pool of entropy, and some cryptographically-secure PRNG to "stretch" that pool, and some kernel device (Linux, Windows) or user-space daemon (OS X) that gathers whatever entropy it can get from unpredictable things like user actions to mix it into the pool.
So, what is that dependent on? Assuming you don't have any apps wasting huge amounts of entropy, and your machine hasn't been compromised, and your OS doesn't have a major security flaw… it's basically not dependent on anything. Or, to put it another way, it's dependent on those three assumptions.
To quote the Linux man page, /dev/urandom is good enough for "everything except long-lived GPG/SSL/SSH keys". (And on many systems, if someone tries to run a program that, like your code, reads thousands of bytes of urandom, or tries to kill the entropy-seeding daemon, or whatever, it'll be logged, and hopefully the user/sysadmin can deal with it.)
hmmmm python goes through an interpreter of its own so i'm not sure how that plays in
It doesn't. Obviously calling urandom(8) does a bunch more stuff before and after the syscall to read 8 bytes from /dev/urandom than you'd do in, say, a C program… but the actual syscall is identical. So the urandom device can't even tell the difference between the two.
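A sketch of the two routes side by side, assuming a Unix-like system where /dev/urandom exists (the helper read_device is hypothetical and returns None elsewhere). Depending on the Python version and OS, os.urandom may reach the kernel through a different call than opening the device file, but both draw on the same OS-level generator.

```python
import os

def read_device(n, path="/dev/urandom"):
    """Read n bytes straight from the random device, or None if it is absent."""
    if not os.path.exists(path):   # e.g. on Windows
        return None
    with open(path, "rb") as dev:
        return dev.read(n)

a = os.urandom(8)      # via the interpreter's wrapper
b = read_device(8)     # via the device file, if present
print(len(a), None if b is None else len(b))
```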
but I'm simply asking if urandom will produce different results on a different architecture.
Well, yes, obviously. For example, Linux and OS X use entirely different CSPRNGs and different ways of accumulating entropy. But the whole point is that it's supposed to be different, even on an identical machine, or at a different time on the same machine. As long as it produces "good enough" results on every platform, that's all that matters.
For instance would a processor\assembler\interpreter cause a fingerprint specific to said architecture, which is within reason stochastically predictable?
As mentioned above, the interpreter ultimately makes the same syscall as compiled code would.
As for an assembler… there probably isn't any assembler involved anywhere. The relevant parts of the Python interpreter, the random device, the entropy-gathering service or driver, etc. are most likely written in C. And even if they were hand-coded in assembly, the whole point of coding in assembly is that you pretty much directly control the machine code that gets generated, so different assemblers wouldn't make any difference.
The processor might leave a "fingerprint" in some sense. For example, I'll bet that if you knew the RNG algorithm, and controlled its state directly, you could write code that could distinguish an x86 vs. an x86_64, or maybe even one generation of i7 vs. another, based on timing. But I'm not sure what good that would do you. The algorithm will still generate the same results from the same state. And the actual attacks used against RNGs are about attacking the algorithm, the entropy accumulator, and/or the entropy estimator.
At any rate, I'm willing to bet large sums of money that you're safer relying on urandom than on anything you come up with yourself. If you need something better (and you don't), implement—or, better, find a well-tested implementation of—Fortuna or BBS, or buy a hardware entropy-generating device.
I have a little C program that's continuously acquiring a stream of data and sending it via UDP, and in real time, to a different computer. The basic framework for what I originally set out to do has been laid. In addition, however, I'd like to visualize in real time the data that's being acquired. To that end, I was thinking of using Python and its various plotting libraries. My question is how difficult it would be to let Python have access to what is essentially a first in, first out circular buffer of my C program. For concreteness, let's assume there are 1024 samples in this buffer. Does the idea of "letting Python have a continuous peek at dynamic C array" even sound reasonable/possible? If not, what sort of plotting options are best suited to this problem?
Thanks.
You can quite easily listen to your UDP port with the standard socket module. Examples are available.
As a first step, your data could go in a simple Python list, as lists are optimized for appending data. Removing the first elements takes much more time, so you might want to do this only from time to time and, in the meantime, plot only the last 1024 (or whatever) elements of the list.
Plotting can then conveniently be done with the famous Matplotlib plotting library: matplotlib.pyplot.plot(data_list). Since you want real time, you might find the animation examples useful.
If you need to optimize the data acquisition speed, you can have the (also famous) NumPy array-manipulation library directly interpret the data from the stream as an array of numbers (Matplotlib can plot such arrays), with the numpy.frombuffer() function.
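The receive-and-keep-the-latest-1024 part of this answer can be sketched with the standard library alone. Several things here are assumptions for illustration: the samples are taken to be little-endian 32-bit floats, a loopback sender stands in for the C program, and a collections.deque with maxlen is swapped in for the plain list so that old samples drop off automatically.

```python
import socket
import struct
from collections import deque

SAMPLES_KEPT = 1024
window = deque(maxlen=SAMPLES_KEPT)   # oldest samples fall off the left end

def handle_datagram(payload, fmt="<f"):
    """Unpack a datagram of packed samples (assumed little-endian float32)."""
    size = struct.calcsize(fmt)
    usable = len(payload) - len(payload) % size   # ignore a trailing partial sample
    for (value,) in struct.iter_unpack(fmt, payload[:usable]):
        window.append(value)

# Loopback demo standing in for the C sender.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
receiver.settimeout(2.0)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(struct.pack("<4f", 0.5, 1.5, 2.5, 3.5), receiver.getsockname())

payload, _addr = receiver.recvfrom(65535)
handle_datagram(payload)
sender.close()
receiver.close()
print(list(window))   # [0.5, 1.5, 2.5, 3.5]
```

In a real viewer the recvfrom call would sit in a loop, and list(window) is what would be handed to matplotlib.pyplot.plot on each redraw.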
It is possible, but not too simple.
You should inform yourself about the API and maybe have a look at some implementations.
Once you have done so, you could provide a function that not only gives you a peek at the raw array, but also reassembles it into the right order and length (if it is a circular buffer). This might be very convenient, as you have to copy the data anyway.
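The reassembly step itself is simple; it is sketched here in Python for brevity, although in the setup described it would more likely live on the C side. The names linearize, head and count are hypothetical, and head is assumed to be the index of the oldest sample.

```python
def linearize(ring, head, count=None):
    """Return the ring's contents oldest-first as a new list.

    ring  : fixed-size sequence used as circular storage
    head  : index of the oldest sample (assumed convention)
    count : number of valid samples, defaults to a full ring
    """
    size = len(ring)
    if count is None:
        count = size
    return [ring[(head + i) % size] for i in range(count)]

ring = [4, 5, 6, 7, 0, 1, 2, 3]
print(linearize(ring, head=4))            # [0, 1, 2, 3, 4, 5, 6, 7]
print(linearize(ring, head=6, count=3))   # [2, 3, 4]
```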