Identifying an unknown file encoding and reading it in Python

I am accustomed to dealing with top level data using SQL (I used some Fortran IV and COBOL back in the day), and am trying to re-train myself in Python. I have a problem reading a file which I think is uuencoded. Could you confirm this, or suggest what it may be, and what the best way to read it in Python would be? Here it is:
4¬xUÕÀÀBAyJ¾ ‚Å

Related

How do I convert code of another language into Python?

I am wondering how can I convert Stata code into Python code.
For example, my Stata code looks like
if ("`var1'"=="") {
local l_QS "select distinct CountryName from `l_tableName'"
ODBCLoad, exec("`l_QS'") dsn("`dsn'") clear
}
And I want to convert it to Python code such as
if (f"{var1}" == ""):
    l_QS = f"select distinct CountryName from {l_tableName}"
    SQL_read(f"{l_QS}", dsn=f"{dsn}")
I am new to coding so I don't know what branch of computer science knowledge or what tools/techniques are relevant. I suppose knowledge about compilers and/or using regular expressions may help so I put those tags on my question. Any high-level pointers are appreciated, and specific code examples would be even better. Thanks in advance.

A very simple workaround would be to use the subprocess module included with Python and write a basic command-line wrapper around your existing scripts so you can keep using their functionality, then write all new code in Python.
If you have a large amount of Stata code that would take forever to convert manually, you could also check whether Stata can be exposed through an API. This would require access to a server and could be costly, but it would be cleaner than the subprocess module and would not require the source code to live on your local machine. Note that Stata may not have tools for building an API.
As far as I am aware, there are no projects that will directly parse a file from an arbitrary language and convert it into Python. This would be a huge project; it might be possible with machine learning or AI, but it would still be very difficult. There are libraries for wrapping C and C++ code (and probably other languages, but these are the ones I know of), but I can't find anything for Stata.

Append unique fingerprint to file

I have a set of files (compiled software) that I want to give a unique fingerprint before distribution. The idea is to write a script that:
Randomly generates a character sequence
Appends the character sequence to a file in the project
Stores the fingerprint in a database with the addressee
Distributes the software to the addressee
The requirements for the fingerprint process are that:
The fingerprint is difficult to detect (i.e. not stored in the file metadata or easily accessible areas)
The fingerprint does not corrupt the data of the file the sequence is added to
The fingerprint can be added to an executable or dll file
It's easy to read the fingerprint if you know where to look
Are there any open source solutions that are built for the purpose of fingerprinting files?

Storing information in a file without corrupting it, and in a way that is not easily detectable, is an exercise in steganography, and quite a hard one. Such a tool would need to parse the executable's structure and modify it properly: edit offsets if needed, detect padding areas, and generally redo some of the work that the compiler does. I doubt that one exists, or that it would be reliable.
However, there are quite a few steganography tools that can store information in pictures by subtly changing the colors of the pixels, perhaps you can store your information in the icon of the exe file or any included asset.
Another way is to hide the data at compilation time by varying the optimization level of the performance-uncritical parts of the executable, so that the compiler generates slightly different code while the behavior is guaranteed to stay the same. You can then use the file hash as your fingerprint.
Yet another way is to create an unused string inside some function, mark it as volatile (or the equivalent in your language of choice) to prevent the compiler from optimizing it out of your program, and put something recognizable in it, like REPLACE_ME. You can then open the compiled file, search for this string, and replace it with the identifier you have generated. As long as the identifier and the placeholder are the same length, no offsets change and the rest of the file is untouched.
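The placeholder approach can be sketched in a few lines of Python. This is only a minimal illustration: the function name stamp_fingerprint and the file paths are hypothetical, and it assumes the placeholder occurs exactly once in the binary and is stored as plain single-byte text. Keep in mind that patching a signed executable changes its bytes, so any code signature would have to be applied after stamping.

```python
from pathlib import Path

PLACEHOLDER = b"REPLACE_ME"  # the marker string compiled into the binary


def stamp_fingerprint(src, dst, fingerprint, placeholder=PLACEHOLDER):
    """Copy src to dst, swapping the placeholder for a same-length fingerprint."""
    if len(fingerprint) != len(placeholder):
        # A different length would shift every byte after the string.
        raise ValueError("fingerprint must be exactly as long as the placeholder")
    data = Path(src).read_bytes()
    if data.count(placeholder) != 1:
        # Refuse to guess if the marker is missing or ambiguous.
        raise ValueError("placeholder must occur exactly once in the file")
    Path(dst).write_bytes(data.replace(placeholder, fingerprint))
```

For example, stamp_fingerprint("app.exe", "app_customer42.exe", b"A1B2C3D4E5") would produce a copy of the same size in which only those ten bytes differ.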
Another, more subtle way is to create multiple different rephrasings of the same messages in your app and swap them in and out as a way to differentiate versions. If your programming language stores null-terminated strings then this is very easy, just make your strings in the code as long as the longest rephrasing. If your language stores length of the string then you have to dynamically recalculate it too.
Alternatively, if you are working with Unicode strings in your code, you can use similar-looking glyphs in some strings as a lower-effort version of the previous idea. Basically, you are performing a homograph attack on your own strings. You can also use Unicode control characters (ZWJ, ZWNJ, etc.) that do not affect most languages and are invisible.
All of these schemes are easily discovered by diffing two different distributions of the software. The one with the different optimization levels could plausibly be written off as just different builds of the software, but a persistent attacker could still figure it out.
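To illustrate the invisible-character variant, here is a rough sketch that hides a small integer identifier inside a message using ZWNJ (U+200C) for a 0 bit and ZWJ (U+200D) for a 1 bit. The function names, the 16-bit width, and the choice to insert the payload after the first character are all assumptions made for the example, and it assumes the original message contains neither character.

```python
ZERO = "\u200c"  # zero-width non-joiner, stands for bit 0
ONE = "\u200d"   # zero-width joiner, stands for bit 1


def embed(message, ident, bits=16):
    """Hide `ident` in `message` as invisible characters, most significant bit first."""
    payload = "".join(
        ONE if (ident >> i) & 1 else ZERO for i in range(bits - 1, -1, -1)
    )
    # Insert after the first visible character so the string does not
    # start or end with a control character.
    return message[:1] + payload + message[1:]


def extract(message):
    """Recover the hidden identifier, or None if no marker characters are present."""
    marks = [c for c in message if c in (ZERO, ONE)]
    if not marks:
        return None
    value = 0
    for c in marks:
        value = (value << 1) | (1 if c == ONE else 0)
    return value
```

Stripping both characters from the stamped string gives back the original message, which is also how an attacker who knows the trick would remove the mark.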

Since you are talking about compiled software, an alternative solution could be to use an executable encryption tool. When you run the file, it asks for a password; if the password is correct, the tool uses it to generate a key and then uses that key to decrypt the program directly in memory. That way recipients won't be able to analyze the binary on disk, and even with the key it would be a lot more difficult to do so, much less to modify it. You can put as many fingerprints as you like into the code as regular text strings, and they will most likely stay there.

How to detect end of file using scipy.io.FortranFile

I am reading an unformatted sequential file output from a Fortran program. I am using the scipy.io.FortranFile class to do this, and am successfully extracting the information I need.
My problem: I do not know how long the input file is, and have no way of knowing how many records to read in. Currently, I am simply iteratively reading the file until an exception is raised (a TypeError, but I don't know if this is how it would always fail). I would prefer to do this more elegantly.
Is there any way to detect EOF using the FortranFile class? Or, alternatively, is there a better way to read in unformatted sequential files?
Some cursory research (I am not a Fortran programmer) indicates to me that if reading this using the Fortran READ function, one can check the IOSTAT flag to determine if you are at the end of the file. I would be surprised if a similar capability isn't provided in the FortranFile class, but I don't see any mention of it in the documentation.
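For what it is worth, end of file can be detected without relying on an exception by reading the record framing directly. The sketch below is not part of scipy; it assumes the common layout in which each record is preceded and followed by a 4-byte little-endian length marker (the default for gfortran on x86), and the function name iter_fortran_records is made up for the example. Other compilers, byte orders, or marker sizes would need a different marker format.

```python
import struct


def iter_fortran_records(path, marker="<i"):
    """Yield the raw bytes of each record until the file ends cleanly."""
    size = struct.calcsize(marker)
    with open(path, "rb") as f:
        while True:
            head = f.read(size)
            if not head:
                return  # nothing left to read: a clean end of file
            if len(head) < size:
                raise ValueError("file ends inside a record marker")
            (length,) = struct.unpack(marker, head)
            if length < 0:
                # Negative markers indicate split records, not handled here.
                raise ValueError("subrecords are not supported by this sketch")
            payload = f.read(length)
            tail = f.read(size)
            if len(payload) != length or tail != head:
                raise ValueError("corrupt or truncated record")
            yield payload
```

Each yielded payload could then be decoded with something like numpy.frombuffer, and a plain for loop over the generator stops by itself at the end of the file.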

Interfacing with Python code via file read/write?

I am working with a Windows program that has its own language and minimal options for interfacing with external code, but it can read and write files. I am looking for a way to send a set of configuration values such as "12,43,47,62" to Python 3 code, which would query data in Pandas and return the associated results.
Someone mentioned this could be done through a file interface, where the originating program writes its inputs to one file and reads the results back from another. I have a few questions about this approach that I hope someone can clarify:
How well does this method handle simultaneous access where multiple calls are being made for different queries?
What is the correct terminology for this type of task?
Is there a way to do it so the Python code senses the change as opposed to repeatedly checking for changes?

1) Poorly. You should put each query in its own file, put each response in its own file, and encode request IDs or other identifying information in the file names.
2) I'm not sure there is one. "File-based communication", maybe.
3) Yes, with the Python watchdog library.
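A minimal sketch of the one-file-per-request idea from point 1, using only the standard library. The directory layout, the .req/.resp extensions, and the function names are assumptions for illustration; submit_request stands in for whatever the originating program would do. Files are written under a temporary name and then renamed, so a reader never sees a half-written request or response.

```python
import os
import uuid
from pathlib import Path


def submit_request(inbox, values):
    """Write one request file with a unique ID in its name and return that ID."""
    request_id = uuid.uuid4().hex
    tmp = Path(inbox) / f"{request_id}.tmp"
    tmp.write_text(",".join(str(v) for v in values))
    os.replace(tmp, Path(inbox) / f"{request_id}.req")  # atomic rename
    return request_id


def serve_pending(inbox, outbox, handler):
    """Answer every pending request once; return how many were handled."""
    handled = 0
    for req in sorted(Path(inbox).glob("*.req")):
        values = [int(v) for v in req.read_text().split(",")]
        result = handler(values)  # e.g. a function that queries a DataFrame
        tmp = Path(outbox) / f"{req.stem}.tmp"
        tmp.write_text(str(result))
        os.replace(tmp, Path(outbox) / f"{req.stem}.resp")
        req.unlink()  # remove the request so it is not answered twice
        handled += 1
    return handled
```

serve_pending could be called in a polling loop, or from a file-system event callback if a library such as watchdog is used to react to new files.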

Python Audio Edit

I am searching for a way to write a simple Python program to perform an automatic edit on an audio file.
With PIL, I wrote a script that automatically resizes pictures to a predefined size.
I would like to write the same kind of thing for automatically re-encoding a file to a predefined bitrate.
Similarly, I would like to write a Python program that can stretch an audio file and re-encode it.
Do I have to parse MP3s myself, or is there a library that can be used for this?

Rather than doing this natively in Python, I strongly recommend leaving the heavy lifting up to FFMPEG, by executing it from your script.
It can chop, encode, and decode just about anything you throw at it. You can find a list of common parameters here: http://howto-pages.org/ffmpeg/
This way, you can leave your Python program to figure out the logic of what you want to cut and where, and not spend a decade writing code to deal with all of the audio formats available.
If you don't like the idea of directly executing it, there is also a Python wrapper available for FFMPEG.
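As a rough sketch of what executing FFMPEG from a script might look like, the helper below builds a command line that re-encodes to a given audio bitrate (the -b:a option) and optionally changes the tempo with the atempo audio filter. The function names and default values are assumptions for the example, ffmpeg must be installed and on the PATH, and the accepted atempo range depends on the FFMPEG version.

```python
import subprocess


def build_ffmpeg_command(src, dst, bitrate="128k", tempo=None):
    """Return the ffmpeg argument list for re-encoding src into dst."""
    cmd = ["ffmpeg", "-y", "-i", src]  # -y overwrites dst without asking
    if tempo is not None:
        # atempo changes playback speed without changing pitch.
        cmd += ["-filter:a", f"atempo={tempo}"]
    cmd += ["-b:a", bitrate, dst]
    return cmd


def reencode(src, dst, bitrate="128k", tempo=None):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_command(src, dst, bitrate, tempo), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.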

There is pydub. It's an easy to use library.
