decoding a file with .msr extension - python

I am trying to decode a file with the .msr extension. It is a data file from an old version of the program "PHYWE measure 4", which is used for recording various physics experiments. Its encoding is completely incomprehensible to me: I went through all the available encodings in Notepad++ and also tried reading the bytes with Python. The first line contains data like this:
\x19\x05\x06\x07\x08\tmeasure45 FileFormat\x04\x00\x01\x00X\x00\x00\x00\x02\x00\xfc\x1d\xba\x13\x00\x00\x00\x004\xa1\xd3\xdf\xb7\xca\xe5#\xa4\xf8\xb8\x13\xe4\x17\xb9\x13\x00\x00\x00\x00\x02\x00\x00\x00\xa3\xf3\x00\x00\x01\x00\x00\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\xa4p}?\n
Can you please tell me if it is possible to get numerical data in my case?

The PHYWE measure software uses a proprietary (binary) file format, but you can download the software for free from the PHYWE website:
https://repository.curriculab.net/files/software/setupm.exe
In the measure application, you can use the option "Export data..." to either save the numerical data as a text file or copy the values to the clipboard.
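Once the data has been exported as a text file, the numeric columns can be read back with plain Python. A minimal sketch, assuming a tab-separated export with a header row (the file name, delimiter, and decimal-comma handling are assumptions; adjust them to whatever the export dialog actually produces):

import csv

# Read a hypothetical measure export; "experiment.txt" and the tab
# delimiter are assumptions about the export format.
with open("experiment.txt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    header = next(reader)  # e.g. column labels such as "t / s", "U / V"
    rows = [
        [float(value.replace(",", ".")) for value in row]  # tolerate decimal commas
        for row in reader
    ]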

Related

Identifying an unknown file encoding and reading it in Python

I am accustomed to dealing with top level data using SQL (I used some Fortran IV and COBOL back in the day), and am trying to re-train myself in Python. I have a problem reading a file which I think is uuencoded. Could you confirm this, or suggest what it may be, and what the best way to read it in Python would be? Here it is:
4¬xUÕÀÀBAyJ¾ ‚Å

Windows file names displayed as corrupted characters in Linux

I believe this is a common issue arising from the different default character encodings on Linux and Windows. However, after searching the internet I have not found any easy way to fix it automatically, so I am about to write a script to do it.
Here is the scenario:
I created some files on a Windows system, some with non-English names (specifically Chinese, in my case), and compressed them into a zip file using 7-Zip. I then downloaded the zip file to a Linux system (Ubuntu 16.04 LTS) and extracted the files with the default archive program. As I had guessed, all the non-English file names are now displayed as corrupted characters! At first I thought it would be easy with convmv, but...
I tried convmv, and it says: "Skipping, already utf8". Nothing changed.
I then decided to write a tool in Python to do the dirty job, but after some testing I reached a point where I could not associate the original file names with the corrupted ones (short of hashing the contents).
Here is an example. I set up a web server to list the file names on Windows, and one file name, decoded as "gbk" in Python, is displayed as
u'j\u63a5\u53e3\u6587\u6863'
I can also query the file names on my Linux system. I can create a file directly with the name shown above, and the name is CORRECT. I can likewise encode that unicode string to utf8 and create a file, and the name is also CORRECT. (I cannot do both at the same time, since they are indeed the same name.) But when I read back the name of the file I extracted earlier, which should be the same file, the file name is completely different:
'j\xe2\x95\x9c\xe2\x95\x99.....'
Decoding it with utf8 gives something like u'j\u255c\u2559...'. Decoding it with gbk raises a UnicodeDecodeError, and I also tried decoding it with utf8 and then encoding with gbk, but the result is still something else.
To summarize: I cannot recover the original file name by decoding or encoding it after it has been extracted on the Linux system. If I really want a program to do the job, I have to either re-create the archive (perhaps with some encoding options) or fall back on hashing the file contents (md5, sha1, etc.) to match each file to its original name on Windows.
Do I still have any chance of inferring the original names from a Python script in the above case, other than comparing file contents between the two systems?
With a little experimentation with common encodings, I was able to reverse your mojibake:
>>> bad = 'j\xe2\x95\x9c\xe2\x95\x99\xe2\x94\x90\xe2\x94\x8c\xe2\x95\xac\xe2\x94\x80\xe2\x95\xa1\xe2\x95\xa1'
>>> good = bad.decode('utf8').encode('cp437').decode('gbk')
>>> good
u'j\u63a5\u53e3\u6587\u6863' # u'j接口文档'
gbk - common Chinese Windows encoding
cp437 - common US Windows OEM console encoding
utf8 - common Linux encoding
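The same repair works in Python 3, where directory listings come back as str rather than bytes, so there is one decode step fewer. A minimal sketch (the variable names are illustrative):

# Python 3 version: the corrupted name arrives as a str of box-drawing
# characters, so re-encoding with cp437 recovers the raw GBK bytes.
bad = 'j\u255c\u2559\u2510\u250c\u256c\u2500\u2561\u2561'
good = bad.encode('cp437').decode('gbk')
print(good)  # j接口文档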

Python write to ram file when using command line, ghostscript

I want to run this command from python:
gs.exe -sDEVICE=jpeg -dTextAlphaBits=4 -r300 -o a.jpg a.pdf
Using Ghostscript, to convert a PDF to a series of images. How do I use RAM for the input and output files? Is there something like StringIO that gives you a file path?
I noticed there's a python ghostscript library, but it doesn't seem to offer much more than the command line.
You can't use RAM for the input and output files with the standard Ghostscript code; it doesn't support that. You can pipe input from stdin and output to stdout, but that's it for the standard code.
You can use the Ghostscript API to feed data from any source, and you can write your own device (or co-opt the display device) to have the page buffer (which is what the input is rendered to) made available elsewhere, provided you have enough memory to hold the entire page, of course.
Doing that will require you to write code to interface with the Ghostscript shared object or DLL. Possibly the Python library does this; I wouldn't know, not being a Python developer.
I suspect that the pointer from John Coleman is sufficient for your needs though.
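To illustrate the stdin/stdout route mentioned above, here is a minimal sketch using subprocess. It assumes gs (or gs.exe on Windows) is on the PATH; note that with a multi-page PDF the JPEG pages are simply concatenated on stdout, so this is most useful for single pages:

import subprocess

# Render a PDF held in memory to JPEG bytes, never touching disk for
# the output. "-" reads the PDF from stdin; -sOutputFile=- plus -q
# sends the rendered image to stdout.
with open("a.pdf", "rb") as f:          # or PDF bytes from any source
    pdf_bytes = f.read()

result = subprocess.run(
    ["gs", "-sDEVICE=jpeg", "-dTextAlphaBits=4", "-r300",
     "-q", "-dBATCH", "-dNOPAUSE", "-sOutputFile=-", "-"],
    input=pdf_bytes, stdout=subprocess.PIPE, check=True,
)
jpeg_bytes = result.stdout              # raw JPEG data, in RAM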

Read entire physical file including file slack with python?

Is there a simple way to read all the allocated clusters of a given file with python? The usual python read() seemingly only allows me to read up to the logical size of the file (which is reasonable, of course), but I want to read all the clusters including slack space.
For example, I have a file called "test.bin" that is 1234 bytes in logical size, but because my file system uses clusters of 4096 bytes, the file occupies 4096 bytes on disk. I.e., there are 2862 bytes of file slack space.
I'm not sure where to even start with this problem... I know I can read the raw disk from /dev/sda, but I'm not sure how to locate the clusters of interest. Of course, that is the whole point of having a file system (matching file names to sectors on disk), but I don't know enough about how Python interacts with the file system to figure this out... yet. Any help or pointers to references would be greatly appreciated.
Assuming an ext2/3/4 filesystem, as you guessed yourself, your best bet is probably to:
use a wrapper (like this one) around debugfs to get the list of blocks associated with a given file:
debugfs: blocks ./f.txt
2562
read back that/those block(s) from the block device / image file:
>>> f = open('/tmp/test.img', 'rb')   # the block device or image file
>>> f.seek(2562 * 4 * 1024)           # block 2562, assuming 4 KiB blocks
10493952
>>> data = f.read(4 * 1024)           # one full block, slack included
>>> data
b'Some data\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...'
Not very fancy, but that will work. Note that you don't have to mount the filesystem to do any of these steps. This is especially important for forensic applications, where you cannot trust the content of the disk in any way and/or are not allowed, per regulation, to mount the disk image.
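Scripted end to end, the two steps above might look like this from Python; a rough sketch, assuming e2fsprogs is installed, with illustrative paths throughout:

import subprocess

# Step 1: ask debugfs for the block numbers backing the file
# (read-only, no mount required; typically needs root).
out = subprocess.run(
    ["debugfs", "-R", "blocks ./f.txt", "/tmp/test.img"],
    capture_output=True, text=True, check=True,
).stdout
block_numbers = [int(b) for b in out.split()]   # e.g. [2562]

# Step 2: read each full block, slack included, from the image.
with open('/tmp/test.img', 'rb') as img:
    for n in block_numbers:
        img.seek(n * 4 * 1024)                  # assuming 4 KiB blocks
        block = img.read(4 * 1024)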
There is an open-source forensic tool written in C that implements this kind of file access successfully.
Here is an overview of the tool: Link.
You can download it here.
It basically uses the POSIX system call open(), which returns a raw file descriptor (an integer) that you can then use with the POSIX system calls read() and write() without the restriction of stopping at EOF, which is what keeps you from accessing the file slack.
There are lots of examples online of how to make system calls from Python, e.g. this one.
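For completeness, here is what those calls look like from Python's os module. A sketch only: note that on a regular file, read() still stops at the logical EOF, so to see the slack you read the corresponding block of the underlying device (the device path and block number below are illustrative, and reading a raw device requires root):

import os

# Read one full filesystem block straight from the block device,
# slack space included.
fd = os.open("/dev/sda1", os.O_RDONLY)      # raw device, needs root
os.lseek(fd, 2562 * 4096, os.SEEK_SET)      # block 2562, 4 KiB blocks
block = os.read(fd, 4096)                   # logical EOF doesn't apply here
os.close(fd)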

How can I load byte strings into a file and put them in another location (python)

Sorry if that doesn't describe what I need to do, but I need to make a program or script that goes to the 'Pictures' folder on a Windows system, grabs a picture (I assume as a byte string), stores it in a file (pickle or ...), and loads the file into another folder...
Long story short, I have a program that would be complete if I could add a function that can be run on a computer (mine, or anyone's with my program installed), go to their 'Pictures' folder on Windows, take a picture image file, and store it in a transportable file (pickle); then take that file and unload (unpickle) it on my/another computer, preferably using the same function.
As I mentioned in my earlier comment, you could write something that open()s both the source image file and a destination file in binary mode, and then uses the file read() and write() methods to copy the bytes from one file to the other.
However, that somewhat low-level approach would be reinventing the wheel.
A better alternative would be to just use one of the existing copyfile...() or even higher-level copy...() file copying functions in the shutil module, which you can read about here.
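A minimal sketch of that approach (the paths are placeholders):

import shutil
from pathlib import Path

# Copy an image out of the user's Pictures folder into a transport
# directory; shutil does the byte-for-byte work for us.
src = Path.home() / "Pictures" / "photo.jpg"     # illustrative file name
dst = Path("transport") / "photo.jpg"
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copyfile(src, dst)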
