Obfuscating distributed text/image/sound files (Python)

In distributing my app, I'd like to prevent casual users from viewing my png files, playing my mp3s or reading/modifying the plain text files I use to load and store data. The text I guess could be binary pickled? What about the images/sounds? What do you do when distributing your app?
Assuming py2exe or py2app.

You can use zip files, but they'll be visible while the program is running; you could extract them to a run-time generated temporary directory with tempfile.mkdtemp(), but it still would not be difficult to track them down.
Another solution would be to use lightweight encryption, or even simple obfuscation (such as ROT13 for the text files, and a simple XOR cipher on the binary files). This will add some time to the execution of your program, so make sure to take that into account.
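For illustration, here is a minimal XOR-obfuscation sketch along those lines; the key and file names are placeholders, and this is obfuscation against casual browsing, not real encryption:

    import codecs

    KEY = b"not-a-secret"  # placeholder key; XOR is obfuscation, not encryption

    def xor_bytes(data: bytes, key: bytes = KEY) -> bytes:
        """XOR every byte with a repeating key; applying it twice restores the data."""
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    # At build time, scramble a binary asset (e.g. a PNG) before shipping:
    with open("button.png", "rb") as f, open("button.dat", "wb") as out:
        out.write(xor_bytes(f.read()))

    # At runtime, unscramble in memory instead of writing a plain file back out:
    with open("button.dat", "rb") as f:
        png_bytes = xor_bytes(f.read())

    # ROT13 works the same way for the text files:
    scrambled = codecs.encode("secret text", "rot13")
    plain = codecs.decode(scrambled, "rot13")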

You could archive those files and, at runtime, unarchive them, use them, then delete them.
The zipfile module documentation ("Work with ZIP archives") covers the archive handling.
It's not a very strong protection method, but it will discourage hobby hackers.
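A rough sketch of that extract-use-delete pattern, assuming the assets ship in an assets.zip next to the program (the names are illustrative):

    import atexit
    import shutil
    import tempfile
    import zipfile

    # Extract the bundled assets into a throwaway directory.
    tmp_dir = tempfile.mkdtemp(prefix="assets-")
    with zipfile.ZipFile("assets.zip") as zf:
        zf.extractall(tmp_dir)

    # Best-effort cleanup when the program exits.
    atexit.register(shutil.rmtree, tmp_dir, ignore_errors=True)

    # ... load images/sounds from tmp_dir while the program runs ...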

Related

In my "small" python exe GUI program the tcl folder has 820 files (mostly tzdata). Any chance of reducing this number?

As stated in the title: I have a "small" python exe GUI program generated by pyinstaller which creates a tcl folder that has 820 files (mostly tzdata). Any chance of reducing this number?
It takes a long time to copy the program because of all the tiny files.
I've used the datetime library. I just need the date and time to pop up on a pdf that I'm printing, so it doesn't need to be that fancy. I just need the time on the computer :)
I can use "--onefile" to just get the .exe, but that takes too long to open.
Program is only for Windows atm.
You can almost certainly delete the http1.0 and opt0.4 directories outright. They're obsolete packages included for backward compatibility only.
The *.tcl and tclIndex files should be left (except for parray.tcl, which you likely don't need).
Of the encoding, msgs and tzdata directories, if you're deploying to a restricted set of locations, you can delete a lot of the contents; you only need the encodings, message catalogs and timezone definitions that you actually use when running. Thus, if you're only supporting English speakers in the USA, you can delete a very large fraction of the files. (If you're not using Tcl to format or parse dates at all, you don't need any timezone definitions.) The main encoding that you must retain is the one that the scripts are written in! (NB: support for the UTF-8 and ISO8859-1 encodings, and the UTF-16-derived ones used for talking to the Windows API, is built directly into Tcl; you can't remove support for them.)
Which things you can remove depend on your application and where you deploy it. That's why we can't tell you outright which files to delete.
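If you'd rather not delete files by hand after every build, one common approach (my assumption here, not part of the answer above) is to filter the collected data files in the PyInstaller .spec file so the tzdata entries never get bundled in the first place:

    # In the PyInstaller .spec file, after the Analysis(...) call.
    # Each entry in a.datas is a (dest_name, src_path, typecode) tuple;
    # drop every entry whose destination path mentions tzdata.
    a.datas = [entry for entry in a.datas if "tzdata" not in entry[0]]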
Generally, the 'blunt' approach is to attack the problem by deleting the files (or some of them) and seeing whether your program still works as intended without any bugs. That can sometimes be rather complicated and time-consuming, and sometimes not even possible.
Libraries like pyinstaller and cx_freeze tend to be super-inclusive of files that you don't even need, so that the program is guaranteed to work.
Generally, I advise you to create an installer for your program (with something like Inno Setup); it will look much more professional and will diminish your current problem.
Also, Python supports zipped libraries, which can drastically decrease the size of the app for some libraries. See one of my own questions on the topic: Python3 compiled App Decrease Size with zip?.
Have fun!

Is there any way to protect executable's resources?

Is there any way to protect my executable's resources, such as the .png files I used to make button designs etc. in my python executable? So that if someone messes with them, the executable will fail.
I mean something like zipping, which the user cannot read or write but the program or executable can.
You could protect them with a checksum (like SHA-2): if the resource is changed, the checksum changes, and you can emit an error.
Another approach would be to load each resource from a blob embedded into the program as a byte array. This approach is worse, but it would help prevent accidental tampering.
But: as soon as somebody with enough interest downloads your program, everything you try to protect your resources with will fail.
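A minimal sketch of the checksum idea using hashlib; the file name and the baked-in digest are placeholders you would generate at build time:

    import hashlib
    import sys

    # Digest recorded at build time for the shipped resource (placeholder value).
    EXPECTED_SHA256 = "replace-with-the-real-digest-at-build-time"

    def file_sha256(path: str) -> str:
        """Hash the file in chunks so large resources don't load fully into memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    if file_sha256("button.png") != EXPECTED_SHA256:
        sys.exit("Resource 'button.png' has been modified or corrupted.")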
This can get pretty complicated fast; what you're looking for is obfuscation.
You could go as simple as checking a SHA-1 checksum of a file you load to make sure it hasn't been altered, or as far as cryptographically encoding your source to prevent targeted reverse-engineering attacks.
I would recommend the following websites to read more about this:
https://pyob.oxyry.com/
https://docs.python.org/3/library/hashlib.html
But overall, this is a topic which is a bit too complex for a simple answer.

Attribute system similar to HTTP Headers for local files

I am in the process of writing a program and need some guidance. Essentially, I am trying to determine if a file has some marker or flag attached to it, sort of like the attributes in an HTTP header.
If such a marker exists, that file will be manipulated in some way (moved to another directory).
My question is:
Where exactly should I be storing this flag/marker? Do files have a system similar to HTTP headers? I don't want to access or manipulate the contents of the file, just some kind of property of the file that can be edited without corrupting the actual file, and it must be fairly universal among file types, as my potential domain of file types is unbounded. I have some experience with Web APIs, so I am familiar with HTTP headers and JSON. Does any similar system exist for local files in Windows? I am especially interested in anyone who has professional/industry knowledge of common techniques that programmers use when trying to store 'metadata' in files in order to access it later. Or, if anyone knows where to point me, as I am unsure what I should be researching.
For the record, I am going to write a program for Windows probably using Golang or Python. And the files I am going to manipulate will be potentially all common ones (.docx, .txt, .pdf, etc.)
Metadata you wish to add is best kept in a separate file, or in a database covering all files.
Or in another file with the same name and a different extension or prefix, which you can make hidden.
Relying on the file system is very tricky: your data will be bound by the restrictions and capabilities of the file system your file is stored on.
And you cannot count on your data remaining intact, as any application may wish to change these flags.
Some of them have a very specific, clearly defined use, such as creation time, modification time, access time...
That said, if you only need to flag the document, you could use the creation time, which will stay unchanged throughout the life of the document (until it is copied), to store your flags. :D
Very dirty business: unprofessional, unreliable and all that.
But it's a solution. A poor one, but it exists.
I don't know of any extra bits in the FAT32 or NTFS file systems for flagging, beyond those already used by the OS.
Unix's EXT family of file systems does support some extra bits, and even then you should be careful in case some other important application makes use of them for something.
macOS may support some metadata by itself, but I am not 100% sure.
On Windows you have one more option for associating extra data with a file, but I wouldn't use it either.
The NTFS file system (FAT doesn't support this) has a feature called streams.
In essence, the same file can have multiple data streams under itself, i.e. more than one set of file contents under the same file node.
To be clearer: the same file contains two different files.
When you open the file normally, only the main stream is visible to the application. Applications must check whether other streams are present and choose the one they want to follow.
So you may choose to store metadata under a second stream of the file.
But what if all the streams are taken?
Even worse, anti-virus programs may deny you access to the metadata out of paranoia, or at least ask for permission.
I don't know why MS included this option, probably for file duplication or something, but bad hackers made use of the fact that you can store data under an existing regular file that nobody is aware of.
Imagine a virus writing its copy into another stream of one of the programs already there.
All that is needed for it to start in place of your old program the next time you run it is a batch script, added to the task scheduler, that flips the two streams, making the virus data the main one.
A nasty trick! So when this feature started to be abused, anti-virus software began restricting files with multiple streams, so it's as if this feature doesn't exist.
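For completeness, this is what reading and writing an NTFS alternate data stream looks like from Python. The stream name here is arbitrary, and, as warned above, this only works on NTFS volumes and may upset anti-virus tools:

    # Windows/NTFS only: "file.txt:metadata" addresses a second stream
    # attached to file.txt; the main contents remain untouched.
    with open("file.txt:metadata", "w") as stream:
        stream.write("flag=move-to-archive")

    with open("file.txt:metadata") as stream:
        print(stream.read())  # -> flag=move-to-archive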
If you want to add some metadata using the OS's own technology, use the Windows registry, but even that is unwise.
What can I tell you?
Don't add metadata to the files themselves. Keep a separate file, or index your data in special files with the same name as the file you are referring to, in the same folder.
If you are dealing with binary files like docx and pdf, you're best off storing the metadata in separate files or in a SQLite file.
Metadata is usually stored separately from files, in data structures called inodes (at least on Unix systems; Windows probably has something similar). But you probably don't want to go that deep into the rabbit hole.
If your goal is to query the system based on metadata, then it would be easier and more efficient to use something like SQLite. Having the metadata in the file would mean that you would need to open the file, read it into memory from disk, and then check the metadata, i.e. slower queries.
If you don't need to query based on metadata, then storing metadata in the file might make sense. It would reduce the dependencies in your application, but in order to access the contents of the file through Word or Adobe Reader, you'd need to strip the metadata before handing the file off to the application. Not worth the hassle, usually.
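A minimal sketch of the SQLite approach, keyed by file path; the schema and the flag value are made up for illustration:

    import sqlite3

    conn = sqlite3.connect("metadata.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_meta (path TEXT PRIMARY KEY, flag TEXT)"
    )

    # Tag a file without touching its contents.
    conn.execute(
        "INSERT OR REPLACE INTO file_meta (path, flag) VALUES (?, ?)",
        (r"C:\docs\report.pdf", "move-to-archive"),
    )
    conn.commit()

    # Later: find every file carrying the marker.
    for (path,) in conn.execute(
        "SELECT path FROM file_meta WHERE flag = ?", ("move-to-archive",)
    ):
        print(path)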

Create split archives (zip, rar, 7z)?

In short:
I need to split a single file (or more) into multiple max-sized archives, using a dummy-safe format (e.g. zip or rar; anything that works will do!).
I would love to know when a certain part is done (callback?) so I could start shipping it away.
I would rather not do it using rar or zip command line utilities unless impossible otherwise.
I'm trying to make it OS-independent for the future, but right now I can live with the compression working only on Linux (my main PC); it still needs to be easily opened on Windows (my wife's PC).
In long:
I'm writing a hopefully-to-be-awesome backup utility that scans my pictures folder, zips each folder and uploads it to whatever uploading class is registered (be it mail-sending, ftp-uploading, http-uploading).
I used zipfile to create a gigantic archive for each folder, but since my uploading speed is really bad I let it work only at night, and my internet goes off occasionally and the whole thing messes up. So I decided to split it into ~10MB pieces. I found no way of doing that with zipfile, so I just added files to the zip until it reached > 10MB.
The problem is that there are often 200-300MB (and sometimes larger) videos in there, and again we hit the middle-of-the-night cutoffs.
I am using subprocess with "rar" right now to create the split archives, but since the directories are so big and I'm using heavy compression this takes ages, even though the first files are ready early on; this is why I'd love to know when a file is ready to be sent.
So, short story long: I need a good way to split files into max-sized archives.
I am looking at making it somewhat generic and as dummy-proof as possible, as eventually I'm planning on making it some awesome extensible backup library.
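For what it's worth, a rough sketch of the "add files until the part exceeds the cap" approach described above, with a per-part callback so each finished archive can be shipped immediately. The names and the 10MB cap are illustrative, and note the limitation already hit above: a single file larger than the cap still lands in one oversized part, because zipfile cannot split an entry across archives:

    import zipfile

    MAX_PART_SIZE = 10 * 1024 * 1024  # ~10 MB of compressed data per part

    def zip_in_parts(paths, base_name, on_part_done):
        """Pack `paths` into base_name.part001.zip, base_name.part002.zip, ...,
        calling on_part_done(part_path) as soon as each part is closed."""
        part_num, current, part_path = 1, None, None
        for path in paths:
            if current is None:
                part_path = f"{base_name}.part{part_num:03d}.zip"
                current = zipfile.ZipFile(part_path, "w", zipfile.ZIP_DEFLATED)
            current.write(path)
            # Sum the compressed entry sizes; close the part once past the cap.
            if sum(i.compress_size for i in current.infolist()) >= MAX_PART_SIZE:
                current.close()
                on_part_done(part_path)  # e.g. hand the part to an uploader
                part_num, current = part_num + 1, None
        if current is not None:
            current.close()
            on_part_done(part_path)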

What is the best utility/library/strategy with Python to copy files across multiple computers?

I have data across several computers, stored in folders. Many of the folders contain 40-100 GB of files ranging in size from 500 KB to 125 MB. There are some 4 TB of files which I need to archive, and I need to build a unified metadata system depending on the metadata stored on each computer.
All systems run Linux, and we want to use Python. What is the best way to copy the files and archive them?
We already have programs to analyze files and fill the metadata tables, and they are all running in Python. What we need to figure out is a way to successfully copy files without data loss, and ensure that the files have been copied successfully.
We have considered using rsync and unison, run via subprocess.Popen, but they are essentially sync utilities. This is essentially copy once, but copy properly. Once the files are copied, the users would move to the new storage system.
My worries are: 1) when the files are copied there should not be any corruption; 2) the file copying must be efficient, though there are no speed expectations. The LAN is 10/100 with Gigabit ports.
Are there any scripts that can be incorporated, or any suggestions? All computers will have ssh keys set up (ssh-keygen), so we can make passwordless connections.
The directory structures would be maintained on the new server, which is very similar to that of old computers.
I would look at the Python fabric library. This library streamlines the use of SSH, and if you are concerned about data integrity, I would use SHA-1 or some other hash algorithm to create a fingerprint for each file before transfer, then compare the fingerprints generated at the source and the destination. All of this could be done using fabric.
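A sketch of that fabric-plus-fingerprint idea, assuming Fabric 2.x and a sha1sum binary on the remote host (host and paths are placeholders):

    import hashlib

    from fabric import Connection

    def sha1_of(path):
        # Local SHA-1 fingerprint, hashed in chunks to cope with large files.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def copy_and_verify(host, local_path, remote_path):
        before = sha1_of(local_path)
        with Connection(host) as c:  # uses your passwordless ssh keys
            c.put(local_path, remote=remote_path)
            result = c.run(f"sha1sum {remote_path}", hide=True)
        after = result.stdout.split()[0]
        if before != after:
            raise IOError(f"Checksum mismatch for {local_path}")

    # copy_and_verify("backup-server", "/data/file.bin", "/archive/file.bin")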
If more seamless Python integration is the goal, you can look at:
Duplicity
pyrsync
I think rsync is the solution. If you are concerned about data integrity, look at the explanation of the "--checksum" parameter in the man page.
Other arguments that might come in handy are "--delete" and "--archive". Make sure the exit code of the command is checked properly.
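A minimal sketch of driving rsync that way from Python (host and paths are placeholders):

    import subprocess

    # --archive preserves permissions and times, --checksum verifies content,
    # --delete mirrors deletions; check=True raises if rsync exits non-zero.
    subprocess.run(
        [
            "rsync",
            "--archive",
            "--checksum",
            "--delete",
            "/data/folders/",
            "backup-server:/archive/folders/",
        ],
        check=True,
    )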
