Embed Strings as a Resource

Embed Strings as a Resource - python

I'm writing a DLL, and I would like to, post-compilation, add some strings to it as an embedded resource.
To do that, I'm using a Python script that looks similar to the following:
hRes = win32api.BeginUpdateResource(myFile, 0)
win32api.UpdateResource(hRes, win32con.RT_STRING, 409, buf, 1033)
win32api.EndUpdateResource(hRes, 0)
And that appears to work, I can see the strings in the PE with my hex-editor.
The problem occurs when my Dll tries to use LoadString() to pull that string back out.
The call is something like:
LoadString(myDll, 409, someBuf, lenOfBuf);
And my program is appearing to de-reference a bad pointer in the LoadString() call.
Does my problem lie with how I'm adding the string, or pulling it out? And can anyone point me towards example code that does both steps?
Edit: I'd prefer to use the Win32 APIs for this.

You can do it manually, by taking the DLL, appending the data you want to store to it, and after it a 4byte integer containing the size of the appended data in bytes.
Now, if you want to read the data, read the latest 4 bytes of the file, interpret it as an integer (watch byte order), and read that amount of bytes from the end of the file (if the amount of bytes is N you read from END - N - 4 to END - 4).

Silly me, I just needed to use the STRINGTABLE structure instead of just dumping in raw strings.

Related

How to read complex data from TB size binary file, fast and keep the most accuracy?

Use Python 3.9.2 read the beginning of TB size binary file (piece of it) as below:
file=open(filename,'rb')
bytes=file.read(8)
print(bytes)
b'\x14\x00\x80?\xb5\x0c\xf81'
I tried np.fromfile np.fromfile(np.complex64) ways to read the file filename.
float_data1 = np.fromfile(filename,np.float32)
float_data2 = np.fromfile(filename,np.complex64)
As the binary file always bigger than 500GB,even TB size,how to read complex data from TB size binary file, fast and keep the most acuuracy?

This is related to your ham post.
samples = np.fromfile(filename, np.complex128)
and
Those codes equal to -1.9726906072368233e-31,+3.6405886029665884e-23.
No, they don't equal that. That's just your interpretation of bytes as float64. That interpretation is incorrect!
You assume these are 64-bit floating point numbers. They are not; you really need to stop assuming that; it's wrong, and we can't help you if you still act as if it were 64-bit floats forming a 128 bit complex value.
Besides documents,I compare the byte content in the answer,that is more than reading docs.
As I already pointed out, that is wrong. Your computer can read anything as any type, just as you tell them, even if it's not the original type it's been stored in. You stored complex64, but read complex128. That's why your values are so inplausible.
It's 32-bit floats, forming a 64 bit complex value. The official block documentation for the file sink also points that out, and even explains the numpy dtype you need to use!
Anyways, you can use numpy's memmap functionality to map the file contents without reading them all to RAM. That works. Again, you need to use the right dtype, which is, to repeat this the 10th time, not complex128.
It's really easy:
data = numpy.memmap(filename, dtype=numpy.complex64)
done.

Divide and Conquer Lists in Python (to read sav files using pyreadstat)

I am trying to read sav files using pyreadstat in python but for some rare scenarios I am getting error of UnicodeDecodeError since the string variable has special characters.
To handle this I think instead of loading the entire variable set I will load only variables which do not have this error.
Below is the pseudo-code that I have with me. This is not a very efficient code since I check for error in each item of list using try and except.
# Reads only the medata to get information about the variables
df, meta = pyreadstat.read_sav('Test.sav', metadataonly=True)
list = meta.column_names # All variables are stored in list
result = []
for var in list:
print(var)
try:
df, meta = pyreadstat.read_sav('Test.sav', usecols=[str(var)])
# If no error that means we can store this variable in result
result.append(var)
except:
pass
# This will finally load the sav for non error variables
df, meta = pyreadstat.read_sav('Test.sav', usecols=result)
For a sav file with 1000+ variables it takes a long amount of time to process this.
I was thinking if there is a way to use divide and conquer approach and do it faster. Below is my suggested approach but I am not very good in implementing recursion algorithm. Can someone please help me with pseudo code it would be very helpful.
Take the list and try to read sav file
In case of no error then output can be stored in result and then we read the sav file
In case of error then split the list into 2 parts and run these again ....
Step 3 needs to run again until we have a list where it does not give any error
Using the second approach 90% of my sav files will get loaded on the first pass itself hence I think recursion is a good method
You can try to reproduce the issue for sav file here

For this specific case I would suggest a different approach: you can give an argument "encoding" to pyreadstat.read_sav to manually set the encoding. If you don't know which one it is, what you can do is iterate over the list of encodings here: https://gist.github.com/hakre/4188459 to find out which one makes sense. For example:
# here codes is a list with all the encodings in the link mentioned before
for c in codes:
try:
df, meta = p.read_sav("Test.sav", encoding=c)
print(encoding)
print(df.head())
except:
pass
I did and there were a few that may potentially make sense, assuming that the string is in a non-latin alphabet. However the most promising one is not in the list: encoding="UTF8" (the list contains UTF-8, with dash and that fails). Using UTF8 (no dash) I get this:
నేను గతంలో వాడిన బ
which according to google translate means "I used to come b" in Telugu. Not sure if that fully makes sense, but it's a way.
The advantage of this approach is that if you find the right encoding, you will not be loosing data, and reading the data will be fast. The disadvantage is that you may not find the right encoding.
In case you would not find the right encoding, you anyway would be reading the problematic columns very fast, and you can discard them later in pandas by inspecting which character columns do not contain latin characters. This will be much faster than the algorithm you were suggesting.

Loop over BIG json file without loading the file in advance - Python [duplicate]

This question already has answers here:
Is there a memory efficient and fast way to load big JSON files?
(11 answers)
Closed 3 years ago.
I would like to load one by one the items of my json file. The file could be up to 3gb so loading it in advance and looping over it is not an option.
My json file is basically a dictionary of key and value pairs (hundreds of pairs), and there is nothing I want to discard (ijson).
I just want to load one pair at a time to work with it. Is there anyway to do that?

So basically I found out in this answer how to do it in a much simple way:
https://stackoverflow.com/a/17326199/2933485
Using ijson, it looks like you can loop over the file without loadin it but opening the file and using ijson parse function over it, this is the example I found:
import ijson
for prefix, the_type, value in ijson.parse(open(json_file_name)):
print prefix, the_type, value

Why dont you populate a sqlite table with the data once and query the data using the record PK? See https://docs.python.org/3.7/library/sqlite3.html

OK, so json is a nested format, which means each repeating block (dict or list object) is surrounded by start and end characters. Normally, you read the entire file, and in doing so, can confirm the well-formed, structure and "closedness" of each object - in other words, it's verifiable that all objects are legally structured. When you load a json file into memory using the json library, part of that process is the validation.
If you want to do that for an extra large file - you have to forgo the normal library and roll your own, loading in a line (or chunk) at a time, and processing that under the assumption that validation will retrospectively succeed.
That's achievable (assuming you're able to put your faith in such an assumption) but it's probably something you'll have to write yourself.
One strategy might be to read a line at a time, splitting on the colon : character, with commas as record delimiters, which is a crude approximation of how key-value pairs are coded within json. Following this method, you're going to be able to process all but the first and final key-value pairs cleanly in sequence.
That just leaves you to write some special conditions for properly parsing the first and final records, which will come through garbled using this strategy.
Crudely then, call something like this (referencing the csv library) and treat the json like a massive, unusually formatted csv file.
import csv
with open('big.json', newline=',') as csv_json_franken_file:
jsonreader = csv.reader(csv_json_franken_file, delimiter=':', quotechar='"')
for row in jsonreader: # This bit reads in a "row" at a time, until finished
print(', '.join(row))
Then do some edge-case treatment of the first and last rows (more or less depending on the structure of your json) to repair the garbling caused by what is a fairly blatant hack. It's not clean, and it's not robust to changes in the content - but sometimes, you just have to play the hand you've been dealt.
To be honest, generating json files of 3GB in size is a little irresponsible, so if anyone comes asking, you've got that in your corner.

struct unpack (Type Error: a byte like object is required, not 'str"), Unpack a List?

Attempting to use script that was built in Python 2, and now use it in Python 3 without adding a version 2 onto system. For the script, the only error I get is related to a struct.unpack at this line....
def go_parse(rec_list):
size = (len(rec_list) - 4) / 4
first_arg = "%sI" % size
return struct.unpack(first_arg, (rec_list.rstrip ('\xff\xff\xff\xff')))
at line 4 of this function, I get the error:
TypeError: a bytes-like object is required, not 'str'
I read several other posts regarding this, and that it needs to be explicitly labeled as bytes, but I cant seem to figure out where to explicitly label that in this context. The other examples on SO i found here and here, did not provide much of an explanation to it. The struct pages don't seem to cover the error possibilities of 2-3...Only that struct.unpack(fmt, buffer) has the two arguments of fmt and buffer. Based on those examples, i have tried identifying this as bytes explicitly as a second arg using b, as well as bytes before the arguments and on the tuple for the .strip. I attempted return this as a bytearray, but that seemed to produce the same errors.
As an alternative, I am able to get the byte data I wanted into a list as shown below, what would be a way to get the list into unpack, attempt b'i' just looks at i as bytes.
list1 = [b'\x03\x00\x00\x00\xff\xff\xff\xff',
b'\x07\x00\x00\x00\x06\x00\x00\x00\xff\xff\xff\xff']
print(struct.unpack('<s', bytes(i)))
The bytes are of varying length, but all end in \xff\xff\xff\xff. The data I am looking at is text, just trying to bring it back to text.

I answered my own question spending some time in the Documentation, and someone pointing me in the right direction.
Generating a list of binary data that needed to be brought back to text for display, I used the codecs, a standard library. All my binary data was saved into a list named bin_list.
import codecs
bin_list = [b'Some Binary data', b'some more binary data']
for i in bin_list: # used the loop to print each to new line, nice and neat
print (codecs.decode(i, "utf-8", "ignore")) # or any other conversion avilable on codecs.

reading a binary file in python

I have to read a binary file in python. This is first written by a Fortran 90 program in this way:
open(unit=10,file=filename,form='unformatted')
write(10)table%n1,table%n2
write(10)table%nH
write(10)table%T2
write(10)table%cool
write(10)table%heat
write(10)table%cool_com
write(10)table%heat_com
write(10)table%metal
write(10)table%cool_prime
write(10)table%heat_prime
write(10)table%cool_com_prime
write(10)table%heat_com_prime
write(10)table%metal_prime
write(10)table%mu
if (if_species_abundances) write(10)table%n_spec
close(10)
I can easily read this binary file with the following IDL code:
n1=161L
n2=101L
openr,1,file,/f77_unformatted
readu,1,n1,n2
print,n1,n2
spec=dblarr(n1,n2,6)
metal=dblarr(n1,n2)
cool=dblarr(n1,n2)
heat=dblarr(n1,n2)
metal_prime=dblarr(n1,n2)
cool_prime=dblarr(n1,n2)
heat_prime=dblarr(n1,n2)
mu =dblarr(n1,n2)
n =dblarr(n1)
T =dblarr(n2)
Teq =dblarr(n1)
readu,1,n
readu,1,T
readu,1,Teq
readu,1,cool
readu,1,heat
readu,1,metal
readu,1,cool_prime
readu,1,heat_prime
readu,1,metal_prime
readu,1,mu
readu,1,spec
print,spec
close,1
What I want to do is reading this binary file with Python. But there are some problems.
First of all, here is my attempt to read the file:
import numpy
from numpy import *
import struct
file='name_of_my_file'
with open(file,mode='rb') as lines:
c=lines.read()
I try to read the first two variables:
dummy, n1, n2, dummy = struct.unpack('iiii',c[:16])
But as you can see I had to add to dummy variables because, somehow, the fortran programs add the integer 8 in those positions.
The problem is now when trying to read the other bytes. I don't get the same result of the IDL program.
Here is my attempt to read the array n
double = 8
end = 16+n1*double
nH = struct.unpack('d'*n1,c[16:end])
However, when I print this array I get non sense value. I mean, I can read the file with the above IDL code, so I know what to expect. So my question is: how can I read this file when I don't know exactly the structure? Why with IDL it is so simple to read it? I need to read this data set with Python.

What you're looking for is the struct module.
This module allows you to unpack data from strings, treating it like binary data.
You supply a format string, and your file string, and it will consume the data returning you binary objects.
For example, using your variables:
import struct
content = f.read() #I'm not sure why in a binary file you were using "readlines",
#but if this is too much data, you can supply a size to read()
n, T, Teq, cool = struct.unpack("dddd",content[:32])
This will make n, T, Teq, and cool hold the first four doubles in your binary file. Of course, this is just a demonstration. Your example looks like it wants lists of doubles - conveniently struct.unpack returns a tuple, which I take for your case will still work fine (if not, you can listify them). Keep in mind that struct.unpack needs to consume the whole string passed into it - otherwise you'll get a struct.error. So, either slice your input string, or only read the number of characters you'll use, like I said above in my comment.
For example,
n_content = f.read(8*number_of_ns) #8, because doubles are 8 bytes
n = struct.unpack("d"*number_of_ns,n_content)

Did you give scipy.io.readsav a try?
Simply read you file like this:
mydict = scipy.io.readsav('name_of_file')

It looks like you are trying to read the cooling_0000x.out file generated by RAMSES.
Note that the first two integers (n1, n2) provide the dimensions of the two dimentional tables (arrays) that follow in the body of the file... So you need to first process those two integers before you know how much real*8 data is in the rest of the file.
scipy should be of help -- it lets you read arbitrary dimensioned binary data:
http://wiki.scipy.org/Cookbook/InputOutput#head-e35c7736718209eea00ebf37a7e1dfb91df696e1
If you already have this python code, please let me know as I was going to write it today (17Sep2014).
Rick

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.