Parsing data from binary file using structs - python

I have a save game file that I am trying to parse, pulling out all of the character's attributes by reading the file using hex offsets. I'm able to get all the strings out properly, since they're plain text, but I am having issues with parsing the binary portions that I am working with.
I'm pretty sure that I'm reading in the right data, but when I unpack it I am getting unexpected (incorrect) output.
The file I'm working with is www.retro-gaming-world.com/SAVE.DAT
import struct

infile = open('SAVE.DAT', 'rb')
try:
    buff = infile.read()

    infile.seek(0x00, 0)
    print "Save Signature: " + infile.read(0x18)
    print "Save Version: " + str(struct.unpack('>i', buff[0x18:0x18+4])[0])

    infile.seek(0x1C, 0)
    print "The letter R: " + infile.read(0x01)

    infile.seek(0x1D, 0)
    print "Character Name: " + infile.read(0x20)

    infile.seek(0x3D, 0)
    print "Save Game Name: " + infile.read(0x1E)

    print "Save game day: " + str(struct.unpack('>i', buff[0x5B:0x5B+4])[0])
    print "Save game month: " + str(struct.unpack('>i', buff[0x5D:0x5D+4])[0])
    print "Save game year: " + str(struct.unpack('>i', buff[0x5F:0x5F+4])[0])
finally:
    infile.close()
I'm having two different issues: either the wrong data is returned, or, when I try to unpack some of the fields, I get an error that the string isn't long enough. I can read in more, but the day, month and year are only 2 and 4 bytes respectively, and they are integers. I'm not sure I'm going about this the right way; I believe I'm fetching the right fields, but I think I'm unpacking or handling the data incorrectly somewhere, if not completely.
version should return 0100
day should return 21
month should return 09
year should return 2013
What exactly am I getting wrong here? Is there another way, or a better way, to go about parsing the fields from the binary?

The problem is that, although the values are integers, they are only 2 bytes long: an unsigned short in C. Thus, you have to read them as
struct.unpack('>H', buff[0x5B:0x5B+2])[0]
and so on. Signed or unsigned does not seem to make a difference here. If available, check the documentation of the save file format; the appropriate type should be noted there. If not, good luck trying (itertools can be helpful).
For more details on the types, check the table in the Python documentation for struct.
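As a quick sketch of the fix, using a synthetic buffer (the real SAVE.DAT layout is only assumed here, based on the offsets in the question):

```python
import struct

# Synthetic stand-in for the bytes at offset 0x5B of the save file:
# day=21, month=9, year=2013, each packed as a big-endian unsigned short.
buff = struct.pack('>HHH', 21, 9, 2013)

# Each field is exactly 2 bytes, so '>H' (big-endian unsigned short) fits.
day = struct.unpack('>H', buff[0:2])[0]
month = struct.unpack('>H', buff[2:4])[0]
year = struct.unpack('>H', buff[4:6])[0]
print("%d %d %d" % (day, month, year))  # 21 9 2013
```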
As a big fan of Fallout 1 and 2 I do wish you good luck and lots of success with the project (-;

Searching in file using RegEx to find ID and its number with Python

I am fairly new to regex and completely baffled by it at this point, so any help will be greatly appreciated.
Sorry if this question is around already, could not find it.
All of this is happening in Python
I am trying to do a search inside text file which has some text like:
www.google.com, something, something : something, [{'id': 481, 'name': 'name it needs to match'}]
="1000" t5:someplace="7713" t5:somethingelse="10" t5:someotherthing="10"
It has multiple, very long lines (it's a log, essentially).
What I need to do is find the word "id" and its number by the "name" it has inside the brackets, and keep only the number (or all of it, it doesn't matter) so I can put it back into the program. (I am trying to create something that already exists through the API, and I want to use the ID of the existing thing later on.)
I am pretty sure I can do the slicing after I have filtered out the ID and its number, but I am struggling to write a regex that can find the "id" and its number by name.
Anyone have some insight? any advice let alone piece of code would be an immense help.
It is entirely possible that there is a far easier solution I am not seeing, and I will be grateful if anyone can tell me said solution; I am here to learn as much as possible :)
PS: the position of the "name" variable is almost always the same, or at least very close to it. Is there a way to use regex to find the "name" and then slice everything around it for approx. 15-20 characters? I could then filter out just the number, since I will know it is there.
Try
'id': (\d+),.*?'name': '([^']*)'
This way you'll get the ID in capture group 1 and the name in capture group 2.
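For illustration, here is how that pattern behaves on the sample line from the question:

```python
import re

# The sample line is taken verbatim from the question.
line = ("www.google.com, something, something : something, "
        "[{'id': 481, 'name': 'name it needs to match'}]")

# Group 1 captures the digits after 'id':, group 2 the quoted name.
m = re.search(r"'id': (\d+),.*?'name': '([^']*)'", line)
print(m.group(1))  # 481
print(m.group(2))  # name it needs to match
```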
So, in the end I found the answer, just not by using regex.
This problem occurred when I was doing error handling within a script that loads data through an API.
What I needed: I was creating something with a "name" or "description" (just one of each, never both), and since it was already on the API I was denied. In that case I wanted it to fall into error handling, which would list all the items on the API that relate to it, find the "name" or "description" I wanted to create in the first place, get its ID, and use that in the other steps that need the ID for their own creations.
I simply avoided using regex by getting a response from the API in JSON and searching through that with a for loop. I presume it is not very efficient, but it does its job.
*args is a placeholder for "name" or "description", and since it is returned as a tuple of one element I simply slice the thing apart and use it to search through the JSON response.
A rather odd issue was the KeyError problem: since I get either "name" OR "description", I found it hard to detect a match properly, so I went with try/except to pass over KeyErrors and keep going until I found the one I needed and passed its ID along.
import json
import requests

def errorHandling(url, accessToken, args):
    if not args:
        return
    lookingForThis = str(args[0])
    error = ("Creation of " + lookingForThis + " at " + url + " failed. It is probably because it already exists"
             ", anyhow, if it does, we found its ID and used that")
    # print(error)
    with open("whathaveIdone.txt", "a+") as json_file:
        json_file.write("{0}\n".format(error))
    response = requests.request("GET", url, headers=getHeadersWithAuthorization(accessToken), verify=False)
    jsonData = json.loads(response.content)
    objectID = 0
    while objectID == 0:
        for p in jsonData:
            try:
                # print(p)
                if lookingForThis == p["description"]:
                    objectID = p["id"]
            except KeyError:
                # print("cant find description match of " + lookingForThis + " in " + str(p))
                pass
            try:
                if lookingForThis == p["name"]:
                    objectID = p["id"]
            except KeyError:
                # print("cant find name match of " + lookingForThis + " in " + str(p))
                pass
    print("We managed to find and use " + lookingForThis + " at " + url + " with ID: " + str(objectID))
    with open("whathaveIdone.txt", "a+") as log:
        log.write("{0} {1} {2}\n".format(url, lookingForThis, str(objectID)))
    return objectID
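For what it's worth, dict.get() avoids the try/except dance entirely: it returns None for a missing key instead of raising KeyError. A minimal sketch with made-up data shaped like the API response described above:

```python
# Made-up items mimicking the API response: each has either a
# "name" or a "description", never both.
items = [
    {'id': 481, 'name': 'alpha'},
    {'id': 482, 'description': 'beta'},
]

def find_id(items, wanted):
    # p.get() returns None for a missing key, so no KeyError handling needed.
    for p in items:
        if wanted in (p.get('name'), p.get('description')):
            return p['id']
    return None

print(find_id(items, 'beta'))  # 482
```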

how to write 'list data' in a file using python logging.info module

I am trying to log list data to a text file using Python's logging.info.
I can see the logged data in my log text file, but it always puts the list data inside '(' ')' brackets, and I can't understand why.
Please help me understand it, and please bear with me if I am missing anything obvious, as I am a beginner in Python.
Here is my code:
therapy = "log data:",":".join ("{:02x}".format(x) for x in respList[4:])
logging.info(therapy)
date = "Date:","/".join("{:02x}".format(a) for a in ts[:-4])
logging.info(date)
time = "Time:",":".join("{:02x}".format(a) for a in ts[3:6])#-3:
logging.info(time)
Here is the output in log text file:
2019-03-04 17:31:18,943 -
('log data:','01:00:00:03:05:26:58:18:00:00:03:e8:00:00:32
:00:01:32:0e:00:c8:01:19:03:04:11:27:25:1c')
2019-03-04 17:31:18,943 - ('Date:', '19/03/04')
2019-03-04 17:31:18,943 - ('Time:', '11:27:25')
Thanks in advance!
You just made a tuple, "string","string", instead of a concatenation, "string"+"string".
By putting a comma between the two strings as a value for all of your assignments, you're assigning tuples to the variables, which, when passed the logger, would be formatted with parentheses around it because that's how the tuple's __repr__ function formats it.
You can instead concatenate the two strings with the + operator, but put a space after the colon for better readability:
therapy = "log data: " + ":".join("{:02x}".format(x) for x in respList[4:])
logging.info(therapy)
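Alternatively, logging supports deferred %-style formatting: you pass the message template and its arguments separately, and the logger combines them itself, so no tuple ever reaches the output. A sketch with a made-up byte list standing in for respList:

```python
import logging

logging.basicConfig(level=logging.INFO)

respList = [0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x03]  # made-up data
hex_str = ":".join("{:02x}".format(x) for x in respList[4:])

# The logger applies the % substitution only if the record is emitted.
logging.info("log data: %s", hex_str)
```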

How Do I Start Pulling Apart This Block of JSON Data?

I'd like to make a program that makes offline copies of math questions from Khan Academy. I have a huge 21.6MB text file that contains data on all of their exercises, but I have no idea how to start analyzing it, much less start pulling the questions from it.
Here is a pastebin containing a sample of the JSON data. If you want to see all of it, you can find it here. Warning for long load time.
I've never used JSON before, but I wrote up a quick Python script to try to load up individual "sub-blocks" (or equivalent, correct term) of data.
import sys
import json

exercises = open("exercises.txt", "r+b")
byte = 0
frontbracket = 0
backbracket = 0

while byte < 1000:  # while byte < character we want to read up to
                    # keep at 1000 for testing purposes
    char = exercises.read(1)
    sys.stdout.write(char)
    # Here we decide what to do based on what char we have
    if str(char) == "{":
        frontbracket = byte
        while True:
            char = exercises.read(1)
            byte = byte + 1
            if str(char) == "}":
                backbracket = byte
                break
        exercises.seek(frontbracket)
        block = exercises.read(backbracket - frontbracket)
        print "Block is " + str(backbracket - frontbracket) + " bytes long"
        jsonblock = json.loads(block)
        sys.stdout.write(block)
        print jsonblock["translated_display_name"]
        print "\nENDBLOCK\n"
    byte = byte + 1
Ok, the repeated pattern appears to be this: http://pastebin.com/4nSnLEFZ
To get an idea of the structure of the response, you can use JSONlint to copy/paste portions of your string and 'validate'. Even if the portion you copied is not valid, it will still format it into something you can actually read.
First I have used requests library to pull the JSON for you. It's a super-simple library when you're dealing with things like this. The API is slow to respond because it seems you're pulling everything, but it should work fine.
Once you get a response from the API, you can convert that directly to python objects using .json(). What you have is essentially a mixture of nested lists and dictionaries that you can iterate through and pull specific details. In my example below, my_list2 has to use a try/except structure because it would seem that some of the entries do not have two items in the list under translated_problem_types. In that case, it will just put 'None' instead. You might have to use trial and error for such things.
Finally, since you haven't used JSON before, it's also worth noting that a JSON object behaves like a dictionary: you are not guaranteed the order in which you receive its entries. In this case, though, the outermost structure is a list, so in theory the order could be consistent, but don't rely on it; we don't know how the list is constructed.
import requests

api_call = requests.get('https://www.khanacademy.org/api/v1/exercises')
json_response = api_call.json()

# Assume we first want to list "author name" with "author key".
# This should loop through the repeated pattern in the pastebin;
# access items as a dictionary.
my_list1 = []
for item in json_response:
    my_list1.append([item['author_name'], item['author_key']])
print my_list1[0:5]

# Now let's assume we want the 'sha' of the SECOND entry in
# translated_problem_types to also be listed with author name.
my_list2 = []
for item in json_response:
    try:
        the_second_entry = item['translated_problem_types'][0]['items'][1]['sha']
    except IndexError:
        the_second_entry = 'None'
    my_list2.append([item['author_name'], item['author_key'], the_second_entry])
print my_list2[0:5]
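Since the data is already saved locally (the 21.6MB exercises.txt), it is also worth noting that there is no need to scan for braces by hand: json.loads parses the whole document at once. A tiny sketch with made-up data in the same shape as the response:

```python
import json

# Made-up fragment shaped like the API response (a list of dictionaries);
# the keys match those used in the answer above.
raw = '[{"author_name": "A", "author_key": "k1"}, {"author_name": "B", "author_key": "k2"}]'
data = json.loads(raw)  # for a file, json.load(open("exercises.txt")) works the same way

authors = [(d["author_name"], d["author_key"]) for d in data]
print(len(authors))  # 2
```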

Print works fine, but when I write() the same thing to file, I get "Expected a character buffer object"?

I'm working with Splunk, but this seems to be a python-related problem I'm having.
By an API call, I'm receiving a list of dictionaries, and I'm iterating through the individual dictionaries to print out a specific field. It looks like this:
with open(lobFileName, "w+") as LOBs:  # read/write, truncates file!
    for item in reader:
        for key in item:  # iterate through the dictionary
            if key == 'cost_center':
                print item[key]  # TODO: Replace this with however I display it on the webpage.
                LOBs.write(item[key])  # write the LOBs to the file, one LOB per line
                LOBs.write("\n")
reader is the list, item is the individual dictionary.
The print call works perfectly. It prints out the lines of businesses as I want it to, as it should. So I don't give out personal information (the real words are English, similar in length, if that matters... one word no spaces), the output looks like:
Alpha
Bravo
Charlie
However, when I write() the same thing (item[key]), I get an: "expected a character buffer object" error.
So, I changed it to LOBs.write(str(item[key])). But when I write the file, instead of getting the above output, I get (A, B, C bolded for ease of sight):
Alpha~1116~7F4F9983-72F8-48C8-BFAD-82C0F713CA34 1116:18886924 1437770160 1 07-24-2015 16:35:59.888 -0400 INFO Metrics -
group=per_index_thruput, series="clo", kbps=3.596555, eps=13.129038,
kb=111.493164, ev=407, avg_age=2.422604, max_age=27 199 ['ksplidx4c',
'_internal'] splunkd .888 2015-07-24T16:35:59.888-04:00
Bravo
psplfwd1a
_internal 1 clo /opt/splunk/var/log/splunk/metrics.log splunkd ksplidx4c
_internal~1116~7F4F9983-72F8-48C8-BFAD-82C0F713CA34 1116:18886931 1437770160 1 07-24-2015 16:35:59.888 -0400 INFO Metrics -
group=per_index_thruput, series="cos", kbps=564.982992,
eps=1387.129659, kb=17514.464844, ev=43001, avg_age=2.232622,
max_age=11 198 ['ksplidx4c', '_internal'] splunkd .888
2015-07-24T16:35:59.888-04:00
Charlie
psplfwd1a
_internal 1 cos /opt/splunk/var/log/splunk/metrics.log splunkd ksplidx4c
_internal~1116~7F4F9983-72F8-48C8-BFAD-82C0F713CA34 1116:18886952 1437770160 1 07-24-2015 16:35:59.888 -0400 INFO Metrics -
group=per_index_thruput, series="issofim", kbps=1.250410,
eps=12.193554, kb=38.762695, ev=378, avg_age=1.738095, max_age=8 195
['ksplidx4c', '_internal'] splunkd .888 2015-07-24T16:35:59.888-04:00
Now, I know that looks huge and you have no idea what it means. Just hear me out :). Obviously there's a difference in how write() works vs. how print works. Now that this is explained, my question:
Does anybody know how I can mimic the way print() works into how
write() works, so that I get the clean A, B, C output on each line?
Thank you so much. I think this^ is the best way to approach the problem, if possible.
Can you try again with this code instead and provide us the output?
with open(lobFileName, "w+") as LOBs:  # read/write, truncates file!
    for item in reader:
        for key in item:  # iterate through the dictionary
            if key == 'cost_center':
                print "%s\n" % (item[key])  # TODO: Replace this with however I display it on the webpage.
                LOBs.write("%s\n" % (item[key]))  # write the LOBs to the file, one LOB per line
Now you should see the same in the print as in the file
https://docs.python.org/3/library/functions.html#print
All non-keyword arguments are converted to strings like str() does and written to the stream,
As the error message indicates, you need to convert your objects to strings (however that is appropriate for your purpose) before you can write it to a stream.
Link to Python 2.7 docs: https://docs.python.org/release/2.7/library/functions.html#print
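A minimal sketch of the difference, using an in-memory buffer instead of a real file (written in Python 3 syntax, but the idea is the same in Python 2):

```python
import io

values = ["Alpha", "Bravo", "Charlie"]  # made-up stand-ins for the cost centers

buf = io.StringIO()
for value in values:
    # write() wants a string; str() mirrors the conversion print does implicitly
    buf.write(str(value) + "\n")

print(buf.getvalue())  # Alpha, Bravo, Charlie on separate lines
```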

What's the correct way to translate a "dynamic" string in PyQt

To allow the internationalization of a Python plugin for QGIS, I'm using QCoreApplication.translate() like this:
message = QCoreApplication.translate('Multipart split',"No multipart features selected.")
How can I prepare a dynamic string, like the following, for translation
message = "Splited " + str(n_of_splitted_features) + " multipart feature(s)"
without needing to break it into sub-strings, like this
message = QCoreApplication.translate('Multipart split', 'Splited ') + str(n_of_splitted_features) + QCoreApplication.translate('Multipart split', 'multipart feature(s)')
which does not appear to be the best option.
I have found that in C++ using the tr() with .arg(), one can do this:
statusBar()->showMessage(tr("Host %1 found").arg(hostName))
But I was unable to replicate this using Python.
Try the format method on the result of the tr method:
statusBar().showMessage(tr("Host {0} found").format(hostName))
The translation in the ts file should also contain the {0} string.
Edit: with Python 2.7, you can simply type {} without the 0.
I found the solution myself, maybe it's useful for someone else.
message = QCoreApplication.translate('Multipart split', "Splited %d multipart feature(s)") %(n_of_splitted_features)
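A stand-alone sketch of the placeholder approach (no QGIS required; the template string below stands in for whatever QCoreApplication.translate() would return):

```python
# The translatable template keeps its placeholder; the value is substituted
# only after translation, so translators can reorder the words if needed.
template = "Splited {0} multipart feature(s)"  # stand-in for translate() output
n_of_splitted_features = 3
message = template.format(n_of_splitted_features)
print(message)  # Splited 3 multipart feature(s)
```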
