Python rawkit how read metadata values from RAW file?

Python rawkit how read metadata values from RAW file? - python

I'm writing python script and I need to obtain exif information from raw photo file (.CR2 for example).
I found Python Rawkit offer the ability to do that.
with Raw(filename=image_path) as raw:
print raw.metadata
Metadata(aperture=-1.2095638073643314e+38, timestamp=4273602232L,
shutter=-1.1962713245823862e+38, flash=True,
focal_length=-1.2228562901462766e+38, height=3753,
iso=-1.182978841800441e+38,
make='Canon', model='EOS 5D Mark II',
orientation=0, width=5634)
But I'm a little bit confused, how read this values ?. For example I'm expecting iso value like 100/200/400 but what is -1.182978841800441e+38 ?
My question is not specific for iso, it's also for shutter, aperture, ...
I ckecked libraw and rawkit doc but was not able to find how read / convert this kind of values.
This part in the doc is not very detailed :
float iso_speed;
ISO sensitivity.
float shutter;
Shutter speed.
Can someone help me understand how to read these values?
Thanks
[Update]
As neo suggest, I will use ExifRead. In fact it's a better choice, I'm writting a python script. With ExifRead no need of extra C library dependency.
I was able to open Canon raw file and parse Exif but unfortunately facing a wrong value for the aperture :
EXIF ApertureValue (Ratio): 3
# My photo was taken in 2.8 (maybe a rounded value on this flag ?)
Quick answer : use Fnumber flag
EXIF FNumber (Ratio): 14/5
14/5 is in fact 2.8 (do the math)
Long answer (how I found / debug that) :
Reading this exelent link Understanding What is stored in a Canon RAW .CR2 file, How and Why ( http://lclevy.free.fr/cr2/ ) I decided to decode myself and know what is going on.
This link send me on the graal to decode a raw file cr2_poster.pdf
From that I thought the best value seems to be in my canon specific MakerNote section on the FNumber value. (All values description is here canon_tags)
Tag Id : 3 (In fact 0x0003 that you write 0x3)
Name : FNumber
I opened my file with an Hexa editor (hexedit) and ... I was totally lost.
Key things :
An offset is a address in the file that will contain your value.
Read : C8 05 in the file should be read 05C8. Example for an offset, the address is 0x5C8
With that found the MakeNote section is easy.
Quick way is to search directly the 0x927c MarkerNote (so in the file 7C 92) flag that contain the address of the MakerNote section.
If you are not able to found that, go throught the IFD section to find the EXIF subsection. And then in that subsection you will find the MakerNote section
Tag Type Count Value
7C 92 07 00 B8 A0 00 00 84 03 00 00
Offset : 84 03 00 00 -> 00 00 03 84 (0x384 address)
Go to this address and search in the MakerNote section the FNumber 0x3
Tag Type Count Value
03 00 03 00 04 00 00 00 C8 05 00 00
Go to the offset 0x5C8 to find our value (count 4 x type 3 ushort, 16 bits)
0x0x5C8 : 00 00 00 00 00 00 00 00
And ... fail, in fact my canon does not filled this section.
Reading http://www.exiv2.org/tags.html The FNumber can be found in EXIF subsection.
Do the same process to find the EXIF subsection and the tag "0x829d Exif.Image.FNumber type 5 Rational"
Rational type is composed of 64 bits (numerator and denominator ulongs) Rational_data_type
Tag Type Count Value
9D 82 05 00 01 00 00 00 34 03 00 00
And then read the 0x334 offset
1C 00 00 00 0A 00 00 00
As we can read in Hexa : 0x1C / 0XA
In decimal, do the math : 28/10 = 14/5 = 2.8
Verify I have this value in ExifRead
EXIF.py 100EOS5D/IMG_8813.CR2 -vv | grep -i 14/5
EXIF FNumber (Ratio): 14/5
And voila !
I was looking for 2.8 float and this value is stored in fraction format. So the library don't do the math and just simplify the fraction.
This is why we have 14/5 and not 2.8 as expected.

I suggest you use a library that is focused on EXIF reading. The stuff available in libraw/rawkit is really just a nice extra. I can recommend the ExifRead library. It's pure Python and also damn fast. And it gives you better to understand values.

If compatibility with many formats is more of an issue to you than performance you could call exiftool as a subprocess with -j option to give you a json string which you can turn into a dictionary.
That should set you up for most raw formats and even stuff that isn't images at all. And it is going to squeeze every last bit of exif info out of the file. However in comparison with other options it is rather sluggish (like 200x slower):
from PIL import Image
import PIL.ExifTags
import subprocess
import json
import datetime
import exifread
filePath = "someImage.jpg"
filePath = "someRawImage.CR2"
filePath = "someMovie.mov"
filePath = "somePhotoshopImage.psd"
try:
start = datetime.datetime.now()
img = Image.open(filePath)
exif_0 = {
PIL.ExifTags.TAGS[k]: v
for k, v in img.getexif().items()
if k in PIL.ExifTags.TAGS
}
end = datetime.datetime.now()
print("Pillow time:")
print(end-start)
print(str(len(exif_0)), "tags retrieved")
print (exif_0, "\n")
except:
pass
try:
start = datetime.datetime.now()
exif_1 = json.loads(subprocess.run(["/usr/local/bin/exiftool", "-j", filePath], stdout=subprocess.PIPE).stdout.decode("utf-8"))
end = datetime.datetime.now()
print("subprocess time:")
print(end-start)
print(str(len(exif_1[0])), "tags retrieved")
print(exif_1, "\n")
except:
pass
try:
start = datetime.datetime.now()
f = open(filePath, "rb")
exif_2 = exifread.process_file(f)
end = datetime.datetime.now()
print("Exifread time:")
print(end-start)
print(str(len(exif_2)), "tags retrieved")
print(exif_2, "\n")
except:
pass

Related

Need help decoding registry binary data

I am trying to build a small app using shellbags in the windows registry. I am trying to decode some data which is in the REG_BINARY form and have no idea where to begin.
If you go to:
Computer\HKEY_CLASSES_ROOT\Local Settings\Software\Microsoft\Windows\Shell\BagMRU\0
You will find a series of values, 0, 1, 2, 3 e.t.c of type REG_BINARY and opening them sometimes shows what seems to be a folder along with a ton of what seems like gibberish.
I also need to understand the binary columns ('Sort' and 'Colinfo') of keys of the form:
Computer\HKEY_CLASSES_ROOT\Local Settings\Software\Microsoft\Windows\Shell\Bags\1\Shell\{5C4F28B5-F869-4E84-8E60-F11DB97C5CC7}
I tried looking at shellbags python programs on the web but honestly have no idea what they are doing and they seem to be written with python2 in mind, so no dice.
I already wrote a small python program to help, but it is missing a way to get the node slot and I am trying to link any given folder name to a node slot. Here is my program currently.
from winreg import *
from codecs import decode
folder_reg_path = "Software\\Classes\\Local Settings\\Software\\Microsoft\\Windows\Shell\\Bags\\1375\\Shell"
bags_mru_path = "Software\\Classes\\Local Settings\\Software\\Microsoft\\Windows\\Shell\\BagMRU"
def get_sniffed_folder_type(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
value = QueryValueEx(key, 'SniffedFolderType')
return '%s' % (value[0])
def get_current_nodeid(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key, 0, KEY_READ) as key:
#value = QueryValueEx(key, '0')
#return value[0].hex().decode('utf-8')
value = EnumValue(key, 2)
return decode(value[1], 'ascii', 'ignore')
# which clsid should be used? the last one in the list
def get_current_clsid(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
key_idx = 0
value = None
# keep looping until the last clsid entry is found
while 1:
try:
temp = EnumKey(key, key_idx)
key_idx += 1
value = temp
except:
break
return value
# the size of icons used by the folder
def get_folder_icon_size(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
value = QueryValueEx(key, 'IconSize')
return '%d pixels' % (value[0])
# the folder view. details, list, tiles e.t.c
def get_logical_view_mode(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
value = QueryValueEx(key, 'LogicalViewMode')
logical_view_mode_dict = {1 : "Details view", 2 : "Tiles view", 3 : "Icons view", 4 : "List view", 5 : "Content view"}
return logical_view_mode_dict[value[0]]
# folder view is based on view mode. so you can have a logical view mode of icons view with a view mode of large icons for instance
def get_folder_view_mode(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
value = QueryValueEx(key, 'Mode')
# view_mode 7 is only available on xp. A dead os
view_mode_dict = {1 : "Medium icons", 2 : "Small icons", 3 : "List", 4 : "Details", 5 : "Thumbnail icons", 6 : "Large icons", 8 : "Content"}
return view_mode_dict[value[0]]
# how is the folder being sorted
def get_folder_sort_by(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
value = QueryValueEx(key, 'Sort')
folder_sort_dict = {"0E000000" : "Date Modified", "10000000" : "Date Accessed", "0F000000" : "Date Created", "0B000000" : "Type", "0C000000" : "Size", "0A000000" : "Name", "02000000" : "Title", "05000000" : "Tags"}
# we get a byte value which we will hexify and get a rather long string
# similar to : 000000000000000000000000000000000100000030f125b7ef471a10a5f102608c9eebac0c000000ffffffff
reg_value = value[0].hex()
# now for this string, we need to get the last 16 strings. then we now get the first 8 out of it. so we will have
folder_sort_dict_key = (reg_value[-16:][:8]).upper()
#return folder_sort_dict[folder_sort_dict_key]
print (reg_value)
return folder_sort_dict_key
# in what order is the folder being sorted. ascending or descending???
def get_folder_sort_by_order(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
value = QueryValueEx(key, 'Sort')
folder_sort_dict = {"01000000" : "Ascending", "FFFFFFFF" : "Descending"}
# we get a byte value which we will hexify and get a rather long string
# similar to : 000000000000000000000000000000000100000030f125b7ef471a10a5f102608c9eebac0c000000ffffffff
reg_value = value[0].hex()
# now for this string, we need to get the last 16 strings. then we now get the last 8 out of it. so we will have
folder_sort_dict_key = (reg_value[-16:][-8:]).upper()
return folder_sort_dict[folder_sort_dict_key]
# How is the folder being grouped
def get_folder_group_by(reg_key):
with OpenKey(HKEY_CURRENT_USER, reg_key) as key:
value = QueryValueEx(key, 'GroupByKey:PID')
folder_group_dict = {'10' : "Name", '14' : "Date Modified", '4*' : "Type", '12' : "Size", '15' : "Date Created", '5' : "Tags", '2' : "Title", '16' : "Date Accessed", '0' : "No Group Applied"}
return folder_group_dict[str(value[0])]
# Registry is of the form:
# HKEY_CURRENT_USER\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\Bags\1375\Shell\{5C4F28B5-F869-4E84-8E60-F11DB97C5CC7}
# where 1375 is a value called the NodeList, and {5C4F28B5-F869-4E84-8E60-F11DB97C5CC7} is a value under Shell chosen based on creation date
print ( 'The suggested folder type is %s' % get_sniffed_folder_type(folder_reg_path) )
# lets start by getting a value similar to {5C4F28B5-F869-4E84-8E60-F11DB97C5CC7} by finding the last child of folder_reg_path
folder_reg_path = folder_reg_path + '\\' + get_current_clsid(folder_reg_path)
print ( get_current_nodeid(bags_mru_path) )
print ( 'The registry path is %s' % (folder_reg_path) )
icon_size = get_folder_icon_size(folder_reg_path)
logical_view_mode = get_logical_view_mode(folder_reg_path)
view_mode = get_folder_view_mode(folder_reg_path)
sorted_by = get_folder_sort_by(folder_reg_path)
sorted_by_order = get_folder_sort_by_order(folder_reg_path)
folder_group_by = get_folder_group_by(folder_reg_path)
print ('The folder icon size is %s' % icon_size)
print('The folder logical view mode is %s' % logical_view_mode)
print('The folder view mode is %s' % view_mode)
print('The folder is sorted by %s in %s order' % (sorted_by, sorted_by_order))
print('The folder is grouped by: %s' % folder_group_by)

Answer in progress. Please be patient...
When Explorer is opened for the first time in a newly-created account, The folder view will be dtermined by the TopView settings of the corresponding FolderType defined in the registry under:
HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\FolderTypes
Most users are familiar with a subset of these, the templates available on the Customize tab of a folder's Properties dialog:
General Items (Generic)
Documents
Pictures
Music
Videos
Contacts and Downloads templates exist as well, but they are assigned only co their respective system folders using the paths found under:
HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders
A complete list of FolderType IDs and their respective names can be generated with the following PowerShell code:
gp (gci HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\FolderTypes).PSPath |
select PSChildName, CanonicalName | sort CanonicalName |
Out-GridView # or Out-File FTList.txt # or Set-Clipboard
PSChildName CanonicalName
----------- -------------
{db2a5d8f-06e6-4007-aba6-af877d526ea6} AccountPictures
{91475fe5-586b-4eba-8d75-d17434b8cdf6} Communications
{503a4e73-1734-441a-8eab-01b3f3861156} Communications.SearchResults
{80213e82-bcfd-4c4f-8817-bb27601267a9} CompressedFolder
...
{5fa96407-7e77-483c-ac93-691d05850de8} Videos
{631958a6-ad0f-4035-a745-28ac066dc6ed} Videos.Library
{292108be-88ab-4f33-9a26-7748e62e37ad} Videos.LibraryFolder
{ea25fbd7-3bf7-409e-b97f-3352240903f4} Videos.SearchResults
Once a folder has been viewed in Explorer, Windows saves the view settings for that individual folder under two registry keys. If the folders are local, the keys are:
HKCU:\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\BagMRU
--- and ---
HKCU:\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\Bags
Network locations and a special bag, the Desktop layout, are here:
HKCU\Software\Microsoft\Windows\Shell\BagMRU
--- and ---
HKCU\Software\Microsoft\Windows\Shell\Bags
The numbered subkeys under Bags each correspond to a folder. Each can save three seperate view states: Shell (Explorer), ComDLg (eg Notepad), and ComDlgLegacy (eg Reg Export). These can each have a GUID-named sub-key corresponding to the ID of the FolderType applied to the folder. That sub-key contains the various properties of the view:
PS C:\> $Bags = 'HKCU:\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\Bags'
PS C:\> (gci $bags -Recurse | ? PSChildName -like "{*}" | select -first 2)[1] | ft -AutoSize
Hive: HKEY_CURRENT_USER\Software\Classes\Local
Settings\Software\Microsoft\Windows\Shell\Bags\1\Shell
Name Property
---- --------
{5C4F28B5-F869-4E84-8E60-F11DB97C5CC7} Rev : 67
FFlags : 1090519041
Vid : {137E7700-3573-11CF-AE69-08002B2E1262}
Mode : 4
LogicalViewMode : 1
IconSize : 16
Sort : {0, 0, 0, 0...}
ColInfo : {0, 0, 0, 0...}
GroupView : 0
GroupByKey:FMTID : {00000000-0000-0000-0000-000000000000}
GroupByKey:PID : 0
GroupByDirection : 1
Value Name Description
---------- -----------
Rev • Seems to be a counter that increments when BagMRU/Bags are deleted or Reset
Folders is executed from the Folder Options dialog
Fflags • The FolderFlags that apply to the folder view.
GroupView A DWORD flag determining whether or not the view is grouped:
• 0x00000000 = No grouping
• 0xffffffff = View grouped by column specified by the GroupByKey pair.
GroupByDirection A DWORD flag determing sort direction of group names:
• 0x00000001 = Ascending
• 0xffffffff = Descending
Vid, Mode, LogicalViewMode, and IconSize are all detremined by the Icon mode:
Name LVM Mode Vid IconSize
---- --- ---- --- --------
Details 1 4 {137E7700-3573-11CF-AE69-08002B2E1262} 16
Tiles 2 8 {65F125E5-7BE1-4810-BA9D-D271C8432CE3} 48
SmIcons 3 1 {089000C0-3573-11CF-AE69-08002B2E1262} 16..31
Icons(M-XL) 3 1 {0057D0E0-3573-11CF-AE69-08002B2E1262} 33..256
List 4 3 {0E1FA5E0-3573-11CF-AE69-08002B2E1262} 16
Content 5 8 {30C2C434-0889-4C8D-985D-A9F71830B0A9} 32
All of the values that specify columns:
ColInfo
GroupBy
Sort
do so using by specifying the corresponding property as an FMTID- PID pair. The easiest way to determine the pair for a given property is to group a folder on that property, close it, and examine the resulting GroupByKey:FMTID and GroupByKey:PID values in the corresponding Bag. Here’s what I’ve got so far:
FMTID PID Name
----- --- ----
{14B81DA1-0135-4D31-96D9-6CBFC9671A99} 0x010f Camera Maker
{14B81DA1-0135-4D31-96D9-6CBFC9671A99} 0x0110 Camera Model
{14B81DA1-0135-4D31-96D9-6CBFC9671A99} 0x829a Exposure Time
{14B81DA1-0135-4D31-96D9-6CBFC9671A99} 0x8822 Expsure Program
{14B81DA1-0135-4D31-96D9-6CBFC9671A99} 0x9003 Date Taken
{14B81DA1-0135-4D31-96D9-6CBFC9671A99} 0x9204 Expsure Bias
{14B81DA1-0135-4D31-96D9-6CBFC9671A99} 0x9209 Flash Mode
{14B81DA1-0135-4D31-96D9-6CBFC9671A99} 0x920a Focal Length
{28636AA6-953D-11D2-B5D6-00C04FD918D0} 0x05 Computer
{56A3372E-CE9C-11D2-9F0E-006097C686F6} 0x02 Contributing Artists
{56A3372E-CE9C-11D2-9F0E-006097C686F6} 0x04 Album
{56A3372E-CE9C-11D2-9F0E-006097C686F6} 0x05 Year
{56A3372E-CE9C-11D2-9F0E-006097C686F6} 0x07 #
{56A3372E-CE9C-11D2-9F0E-006097C686F6} 0x0b Genre
{56A3372E-CE9C-11D2-9F0E-006097C686F6} 0x0d Album Artist
{56A3372E-CE9C-11D2-9F0E-006097C686F6} 0x23 Beats-per-minute
{56A3372E-CE9C-11D2-9F0E-006097C686F6} 0x24 Conductor
{5CBF2787-48CF-4208-B90E-EE5E5D420294} 0x15 Description
{5CBF2787-48CF-4208-B90E-EE5E5D420294} 0x17 DateVisited
{6444048F-4C8B-11D1-8B70-080036B11A03} 0x0d Dimensions
{64440491-4C8B-11D1-8B70-080036B11A03} 0x03 Frame Width
{64440491-4C8B-11D1-8B70-080036B11A03} 0x04 Frame Height
{64440491-4C8B-11D1-8B70-080036B11A03} 0x06 Frame Rate
{64440492-4C8B-11D1-8B70-080036B11A03} 0x04 Bit Rate
{64440492-4C8B-11D1-8B70-080036B11A03} 0x0b Copyright
{64440492-4C8B-11D1-8B70-080036B11A03} 0x13 Composer
{9B174B34-40FF-11D2-A27E-00C04FC30871} 0x04 Owner
{A0E74609-B84D-4F49-B860-462BD9971F98} 0x64 35 mm Focal Length
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x02 FolderName
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x04 Type
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x0a Name
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x0c Size
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x0d Attributes
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x0e DateModified
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x0f DateCreated
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x10 DateAccessed
{D35F743A-EB2E-47F2-A286-844132CB1427} 0x64 EXIF Version
{D5CDD502-2E9C-101B-9397-08002B2CF9AE} 0x02 Categories
{DABD30ED-0043-4789-A7F8-D013A4736622} 0x64 Folder
{E3E0584C-B788-4A5A-BB20-7F5A44C9ACDD} 0x06 FolderPath
{F29F85E0-4FF9-1068-AB91-08002B27B3D9} 0x02 Title
{F29F85E0-4FF9-1068-AB91-08002B27B3D9} 0x04 Authors
{F29F85E0-4FF9-1068-AB91-08002B27B3D9} 0x05 Tags
{F7DB74B4-4287-4103-AFBA-F1B13DCD75CF} 0x64 Date
In ColInfo & Sort, the pair is in a binary format, the PROPERTYKEY structure:
typedef struct {
GUID fmtid;
DWORD pid;
} PROPERTYKEY;
There is some byte-shuffling in converting a CLSID-style GUID to its binary equivalent. Using Name as an example:
{B725F130-47EF-101A-A5F1-02608C9EEBAC} 0x0a
becomes
[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac][0a 00 00 00] ( Continuous in registry, grouped for clarity )
I use this PowerShell snippet to convert a binary GUID/CLSID to its more familiar text representaton:
$Bags = 'HKCU:\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\Bags'
$Sample = gci $rp.Bags -Recurse | ? Property -Contains Sort | Select -First 5 | gp
$Sample | ForEach{ '{{{0}}}' -f [GUID][Byte[]]$_.Sort[20..35] }
Output:
{b725f130-47ef-101a-a5f1-02608c9eebac}
{b725f130-47ef-101a-a5f1-02608c9eebac}
{f29f85e0-4ff9-1068-ab91-08002b27b3d9}
{83914d1a-c270-48bf-b00d-1c4e451b0150}
{f7db74b4-4287-4103-afba-f1b13dcd75cf}
The PROPERTYKEY structures is used to build the structures that define ColInfo and Sort. If we set a folder to have a single column, Name, and examine the resulting ColInfo value, we see the following bytes:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \ Constant
fd df df fd 10 00 00 00 00 00 00 00 00 00 00 00 / Header
[01 00 00 00] ← Column Count
[18 00 00 00] ← Byte count per column?
[[[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac] ← FMTID
[0a 00 00 00] ← PID ] ← PROPERTYKEY
[da 01 00 00] ← Width (Pixels?) ] <- PROPERTYKEY/Width pair
For each column added to the view, the column count is incremented and the approprate PROPERTYKEY/Width pair is added:
[00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fd df df fd 10 00 00 00 00 00 00 00 00 00 00 00]
[03 00 00 00]
[18 00 00 00]
[[[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac]
[0d 00 00 00]] ← 'Attributes'
[53 00 00 00]]
[[[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac]
[0a 00 00 00]] ← 'Name'
[10 01 00 00]]
[[[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac]
[04 00 00 00]] ← 'Type'
[0a 01 00 00]]
Sort uses the SORTCOLUMN structure:
typedef struct SORTCOLUMN {
PROPERTYKEY propkey;
SORTDIRECTION direction;
} SORTCOLUMN;
The Sort structure consists of:
Header composed of 16 zero-valued bytes
SORTCOLUMN count (DWORD)
One or more SORTCOLUMN structures
Name, ascending:
[00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00] <- Header
[01 00 00 00] <- SORTCOLUMN count
[[[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac] <- FMTID
[0a 00 00 00] <- PID ] <- PROPERTYKEY
[01 00 00 00] <- Direction ] <- SORTCOLUMN
and Name, descending:
[00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
[01 00 00 00]
[[[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac]
[0a 00 00 00]]
[ff ff ff ff] <- Direction ] <- SORTCOLUMN
When secondary, tertiary or quaternary properties are selected via a < Shift >+click on the column header, the corresponding SORTCOLUMN is added to the Sort structure.
Type, descending, then Name, ascending:
(gp 'HKCU:\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\Bags\66\Shell\{7D49D726-3C21-4F05-99AA-FDC2C9474656}\').Sort | Format-Hex
Path:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000010 02 00 00 00 30 F1 25 B7 EF 47 1A 10 A5 F1 02 60 ....0ñ%·ïG..¥ñ.`
00000020 8C 9E EB AC 04 00 00 00 FF FF FF FF 30 F1 25 B7 ë¬........0ñ%·
00000030 EF 47 1A 10 A5 F1 02 60 8C 9E EB AC 0A 00 00 00 ïG..¥ñ.`ë¬....
00000040 01 00 00 00 ....
[00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
[02 00 00 00] <- Two columns specified
[[[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac]
[04 00 00 00]] <- 'TYpe'
[ff ff ff ff]]
[[[30 f1 25 b7 ef 47 1a 10 a5 f1 02 60 8c 9e eb ac]
[0a 00 00 00]] <- 'Name'
[01 00 00 00]]
BagMRU
BagMRU and its sub-keys are the tree-structured index that associates a folder's path with its numbered Bag (NodeSlot). BagMRU and its sub-keys mirror the Shell Namespace, with the BagMRU key itself corresponding to the rooted (virtual) Desktop. Each node in BagMRU corresponds to a folder in the Shell namespace. The fact that folders are referenced by their namespace path means one file system folder, say Documents, will have three separate views:
Desktop\This PC\Documents
Shell:UsersFilesFolder\Documents
Desktop\This PC\C:\Users<UserName>\Documents
No folder names are readily apparent when examining BagMRU; the nodes have simple integer names, and there are no string values. Where's the human-readable path?!?!? Why does MS make everything so difficult?!?!?:D
Well,every integer-named node has a corresponding integer-named REG_BINARY value in its parent node. It is this value that identifies its associated sub-key. The key to the mystery? Those binary values are Item IDs (actually the corresponding ITEMID_CHILD type of the ITEMIDLIST structure). So, we can use the ID Lists of a node and its ancestors to construct an IDLIST_ABSOLUTE, the analog to a fully-qualified path:
But it's still just an array of cryptic bytes. To extract meaningful information, we turn to the Windows API, specifically SHGetNameFromIDList. Per the documentation:
"Retrieves the display name of an item identified by its IDList."
Fortunately, PowerShell can incorporate APIs via Add-Type. Unfortunately, if PInvoke.net deosn't have a C# signature/wrapper in its database, discovering a working signature and wrapper can be frustrating. But through studying other signatures and trial-and-error, I hit on one:
$NSPBsource = #"
using System;
using System.Runtime.InteropServices;
using System.Text;
public class IDLTranslator
{
[DllImport("shell32.dll", SetLastError=true, CharSet=CharSet.Unicode)]
[return: MarshalAs(UnmanagedType.Error)]
static extern int SHGetNameFromIDList(IntPtr pidl, uint sigdnName, out StringBuilder ppszName);
public string GetFolderName(Byte[] IDL) {
GCHandle pinnedArray = GCHandle.Alloc(IDL, GCHandleType.Pinned);
IntPtr PIDL = pinnedArray.AddrOfPinnedObject();
StringBuilder name = new StringBuilder(2048);
int result = SHGetNameFromIDList(PIDL, 0x0, out name);
pinnedArray.Free();
return name.ToString();
}
}
"# # End $NSPBsource
Add-Type -TypeDefinition $NSPBsource
$NSPB = New-Object IDLTranslator
# Invocation:
# $FolderName = $NSPB.GetFolderName($IDL)
# Where $IDL is a [Byte[]] ItemIDList
This is enough to deccode the IDLs of the folders rooted in the Desktop:
$BagMRU = 'HKCU:\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\BagMRU'
(gp $Bagmru).PsBase.Properties | ? Name -match '\d+' | ForEach{
$NSPB.GetFolderName($_.Value)
} | sort
OUtput: ( specific results will vary :D )
Control Panel
Dustbin
Keith Miller
Libraries
My Posts
One Drive
Quick Access
Sandbox
Search Results in !Archive
Search Results in !Archive
Search Results in AAA Mess
Search Results in AAA Mess
Search Results in CopyTEst
Search Results in CopyTEst
Search Results in Desktop
Search Results in Desktop
Search Results in Documents
Search Results in Folder View Defaults
Search Results in Folder View Defaults
Search Results in iTunes
Search Results in Local Documents
Search Results in Local Downloads
Search Results in Local Downloads
Search Results in Local Music
Search Results in Local Music
Search Results in Local Pictures
Search Results in Local Videos
Search Results in Microsoft.BingWeather_4.36.20503.0_x64__8wekyb3d8bbwe
Search Results in Microsoft.BingWeather_4.36.20503.0_x64__8wekyb3d8bbwe
Search Results in MTV Party to Go, Vol. 3
Search Results in One Drive
Search Results in One Drive
Search Results in Pictures
Search Results in PoweShell Snippets
Search Results in PoweShell Snippets
Search Results in Private
Search Results in Quick Access
Search Results in Sandbox
Search Results in Sandbox
Search Results in Sandbox
Search Results in Sandbox
Search Results in Screenshots
Search Results in Sort music on Artist
Search Results in Sort music on Artist
Search Results in Standalone Programs
Search Results in Standalone Programs
Search Results in Users
Search Results in Users
Search Results in Windows (C:)
Search Results in Windows (C:)
This Device
This Device
This PC
Websites
Nifty, huh? And don't worry, those "Search Results..." lines that appear to be duplicates aren't --- they work together. Because the .SearchResults and .Library FolderTypes have mutilple TopViews (selected via ArrangeBy >) available, one bag specifies which TopView is last used (and hence used when the folder is re-displayed) and the other one(s) hold the view settings for a specific TopView. More on that later --- we're not done with paths yet.
To be continued...

Python search for a start hex string, save the hex string after that to a new file

PYTHON
I would like to open a text file, search for a HEX pattern that starts with 45 3F, grab the following hex 6 HEX values, for example 45 4F 5D and put all this in a new file. I also know that it always ends with 00 00.
So the file can look like: bla bla sdsfsdf 45 3F 08 DF 5D 00 00 dsafasdfsadf 45 3F 07 D3 5F 00 00 xztert
And should be put in a new file like this:
08 DF 5D
07 D3 5F
How can I do that?
I have tried:
output = open('file.txt','r')
my_data = output.read()
print re.findall(r"45 3F[0-9a-fA-F]", my_data)
but it only prints:
[]
Any suggestions?
Thank you :-)

Here's a complete answer (Python 3):
with open('file.txt') as reader:
my_data = reader.read()
matches = re.findall(r"45 3F ([0-9A-F]{2} [0-9A-F]{2} [0-9A-F]{2})", my_data)
data_to_write = " ".join(matches)
with open('out.txt', 'w') as writer:
writer.write(data_to_write)
re.findall returns the capturing group, so there's no need to clear out '45 3F'.
When dealing with files, use with open so that the file descriptor will be released when indentation ends.
if you want to accept the first two octets dynamically:
search_string = raw_input()
regex_pattern = search_string.upper() + r" ([0-9A-F]{2} [0-9A-F]{2} [0-9A-F]{2})"
matches = re.findall(regex_pattern, my_data)

Python parsing serial hex string with fixed format

I am successfully communicating with a simple device over serial, with a specific request packet, and a fixed format return packet is received. Im starting with python so I can use this on multiple devices, I'm only really used to PHP/C.
For example, I send the following as hex:
12 05 0b 03 1f
and in return I get
12 05 0b 03 1f 12 1D A0 03 18 00 22 00 00 CA D4 4F 00 00 22 D6 99 18 00 70 80 00 80 00 06 06 00 00 D9
I know how the packet is constructed, the first 5 bytes is the data that was sent. The next 3 bytes are an ID, the packet length, and a response code. Its commented in my code here:
import serial, time
ser = serial.Serial(port='COM1', baudrate=9600, timeout=0, parity=serial.PARITY_EVEN, stopbits=serial.STOPBITS_ONE, bytesize=serial.EIGHTBITS)
while True:
# Send The Request - 0x12 0x05 0x0B 0x03 0x1F
ser.write("\x12\x05\x0B\x03\x1F")
# Clear first 5 bytes (Original Request is Returned)
ser.read(5)
# Response Header (3 bytes)
# - ID (Always 12)
# - Packet Length (inc these three)
# - General Response (a0 = Success, a1 = Busy, ff = Bad Command)
ResponseHeader = ser.read(3).encode('hex')
PacketLength = int(ResponseHeader[2:4],16)
if ResponseHeader[4:6] == "a0":
# Response Success
ResponseData = ser.read(PacketLength).encode('hex')
# Read First Two Bytes
Data1 = int(ResponseData[0:4],16)
print Data1
else:
# Clear The Buffer
RemainingBuffer = ser.inWaiting()
ser.read(RemainingBuffer)
time.sleep(0.12)
To keep it simple for now, I was just trying to read the first two bytes of the actual response (ResponseData), which should give me the hex 0318. I then want to output that as a decimal =792. The program is meant to run in a continuous loop.
Some of the variables in the packet are one byte, some are two bytes. Although, up to now I'm just getting an error:
ValueError: invalid literal for int() with base 16: ''
I'm guessing this is due to the format of the data/variables I have set, so not sure if I'm even going about this the right way. I just want to read the returned HEX data in byte form and be able to access them on an individual level, so I can format/output them as required.
Is there a better way to do this? Many thanks.

I recommend using the struct module to read binary data, instead of recoding it using string functions to hex and trying to parse the hex strings.

As your code stands now, you send binary (not hex) data over the wire, and receive binary (not hex) data back from the device. Then you convert the binary data to hex, only to convert it again to Python variables.
Let's skip the extra conversion step by using struct.unpack:
# UNTESTED
import struct
...
while True:
# Send The Request - 0x12 0x05 0x0B 0x03 0x1F
ser.write("\x12\x05\x0B\x03\x1F")
# Clear first 5 bytes (Original Request is Returned)
ser.read(5)
# Response Header (3 bytes)
# - ID (Always 12)
# - Packet Length (inc these three)
# - General Response (a0 = Success, a1 = Busy, ff = Bad Command)
ResponseHeader = ser.read(3)
ID,PacketLength,Response = struct.unpack("!BBB", ResponseHeader)
if Response == 0xa0:
# Response Success
ResponseData = ser.read(PacketLength)
# Read First Two Bytes
Data1 = struct.unpack("!H", ResponseData[0:2])
print Data1
else:
# Clear The Buffer
RemainingBuffer = ser.inWaiting()
ser.read(RemainingBuffer)

Problem writing unicode UTF-16 data to file in python

I'm working on Windows with Python 2.6.1.
I have a Unicode UTF-16 text file containing the single string Hello, if I look at it in a binary editor I see:
FF FE 48 00 65 00 6C 00 6C 00 6F 00 0D 00 0A 00
BOM H e l l o CR LF
What I want to do is read in this file, run it through Google Translate API, and write both it and the result to a new Unicode UTF-16 text file.
I wrote the following Python script (actually I wrote something more complex than this with more error checking, but this is stripped down as a minimal test case):
#!/usr/bin/python
import urllib
import urllib2
import sys
import codecs
def translate(key, line, lang):
ret = ""
print "translating " + line.strip() + " into " + lang
url = "https://www.googleapis.com/language/translate/v2?key=" + key + "&source=en&target=" + lang + "&q=" + urllib.quote(line.strip())
f = urllib2.urlopen(url)
for l in f.readlines():
if l.find("translatedText") > 0 and l.find('""') == -1:
a,b = l.split(":")
ret = unicode(b.strip('"'), encoding='utf-16', errors='ignore')
break
return ret
rd_file_name = sys.argv[1]
rd_file = codecs.open(rd_file_name, encoding='utf-16', mode="r")
rd_file_new = codecs.open(rd_file_name+".new", encoding='utf-16', mode="w")
key_file = open("api.key","r")
key = key_file.readline().strip()
for line in rd_file.readlines():
new_line = translate(key, line, "ja")
rd_file_new.write(unicode(line) + "\n")
rd_file_new.write(new_line)
rd_file_new.write("\n")
This gives me an almost-Unicode file with some extra bytes in it:
FF FE 48 00 65 00 6C 00 6C 00 6F 00 0D 00 0A 00 0A 00
20 22 E3 81 93 E3 82 93 E3 81 AB E3 81 A1 E3 81 AF 22 0A 00
I can see that 20 is a space, 22 is a quote, I assume that "E3" is an escape character that urllib2 is using to indicate that the next character is UTF-16 encoded??
If I run the same script but with "cs" (Czech) instead of "ja" (Japanese) as the target language, the response is all ASCII and I get the Unicode file with my "Hello" first as UTF-16 chars and then "Ahoj" as single byte ASCII chars.
I'm sure I'm missing something obvious but I can't see what. I tried urllib.unquote() on the result from the query but that didn't help. I also tried printing the string as it comes back in f.readlines() and it all looks pretty plausible, but it's hard to tell because my terminal window doesn't support Unicode properly.
Any other suggestions for things to try? I've looked at the suggested dupes but none of them seem to quite match my scenario.

I believe the output from Google is UTF-8, not UTF-16. Try this fix:
ret = unicode(b.strip('"'), encoding='utf-8', errors='ignore')

Those E3 bytes are not "escape characters". If one had no access to documentation, and was forced to make a guess, the most likely suspect for the response encoding would be UTF-8. Expectation (based on a one-week holiday in Japan): something like "konnichiwa".
>>> response = "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF"
>>> ucode = response.decode('utf8')
>>> print repr(ucode)
u'\u3053\u3093\u306b\u3061\u306f'
>>> import unicodedata
>>> for c in ucode:
... print unicodedata.name(c)
...
HIRAGANA LETTER KO
HIRAGANA LETTER N
HIRAGANA LETTER NI
HIRAGANA LETTER TI
HIRAGANA LETTER HA
>>>
Looks close enough to me ...

Reading and writing binary data of dynamic sizes question

I am trying to read data from a file in binary mode and manipulate that data.
try:
resultfile = open("binfile", "rb")
except:
print "Error"
resultsize = os.path.getsize("binfile")
There is a 32 byte header which I parse fine then the buffer of binary data starts. The data can be of any size from 16 to 4092 and can be in any format from text to a pdf or image or anything else. The header has the size of the data so to get this information I do
contents = resultfile.read(resultsize)
and this puts the entire file into a string buffer. I found out this is probably my problem because when I try to copy chunks of the hex data from "contents" into a new file some bytes do not copy correctly so pdfs and images will come out corrupted.
Printing out a bit of the file string buffer in the interpreter yields for example something like "%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n1 0 obj\r\n" when I just want the bytes themselves in order to write them to a new file. Is there an easy solution to this problem that I am missing?
Here is an example of a hex dump with the pdf written with my python and the real pdf:
25 50 44 46 2D 31 2E 35 0D 0D 0A 25 B5 B5 B5 B5 0D 0D 0A 31 20 30 20 6F 62 6A 0D 0D 0A
25 50 44 46 2D 31 2E 35 0D 0A 25 B5 B5 B5 B5 0D 0A 31 20 30 20 6F 62 6A
It seems like a 0D is being added whenever there is a 0D 0A. In image files it might be a different byte, I don't remember and might have to test it.
My code to write the new file is pretty simple, using contents as the string buffer holding all the data.
fbuf = contents[offset+8:size+offset]
fl = open(fname, 'a')
fl.write(fbuf)
This is being called in a loop based on a signature that is found in the header. Offset+8 is beginning of actual pdf data and size is the size of the chunk to copy.

You need to open your output file in binary mode, as you do your input file. Otherwise, newline characters may get changed. You can see that this is what happens in your hex dump: 0A characters ('\n') are changed into OD 0A ('\r\n').
This should work:
input_file = open('f1', 'rb')
contents = input_file.read()
#....
data = contents[offset+8:size+offset] #for example
output_file = open('f2', 'wb')
output_file.write(data)

The result you get is "just the bytes themselves". You can write() them to an open file to copy them.
"It seems like a 0D is being added whenever there is a 0D 0A"
Sounds like you are on windows, and you are opening one of your files in text mode instead of binary.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.