I want to find out what fonts a PSD file uses, with Python. I was able to read the PSD file as a binary file and convert the contents into hex.
>>> with open(test_file,'rb') as f:
... content = f.read()
... hex_content = binascii.hexlify(content)
Then I decoded the hex contents back into text and wrote it to a file.
>>> with open('./decoded1.txt', 'w') as f:
... f.write(hex_content.decode("hex"))
Near the bottom of the decoded file, I found some sort of header named /FontSet, which I think is what I am looking for.
/FontSet [
<<
/Name (þÿ A d o b e I n v i s F o n t)
/Script 0
/FontType 0
/Synthetic 0
>>
<<
/Name (þÿ M y r i a d P r o - R e g u l a r)
/Script 0
/FontType 0
/Synthetic 0
>>
]
Am I on the right track? I recognize MyriadPro-Regular as the font used in my test file. What is AdobeInvisFont? Is this the Adobe Blank font?
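For what it's worth, once the /FontSet block is located, the font names can be pulled out of the raw bytes without the hex round-trip. This is only a heuristic sketch, not a real PSD parser: it assumes each name appears as a parenthesized UTF-16 string with a byte-order mark, as in the dump above, and the function name psd_font_names is made up here.

```python
import re

def psd_font_names(psd_bytes):
    """Heuristically pull font names out of a PSD's /FontSet descriptor.

    Assumes each name appears as /Name (<UTF-16 text with BOM>), as in
    the decoded dump above. Not a full PSD parser.
    """
    names = []
    for match in re.finditer(rb'/Name \(([^)]*)\)', psd_bytes):
        # the BOM (0xFEFF) tells decode('utf-16') which byte order to use
        names.append(match.group(1).decode('utf-16'))
    return names
```

On a file like the one dumped above, this should yield names such as MyriadPro-Regular.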
I am trying to read a .rpt file using the python code:
>>> with open(r'C:\Users\lenovo-pc\Desktop\training2.rpt','r',encoding = 'utf-8', errors = 'replace') as d:
... count = 0
... for i in d.readlines():
... count = count + 1
... print(i+"\n")
...
...
u
i
d
|
e
x
p
i
d
|
n
a
m
e
|
d
o
m
a
i
n
And I am getting the result shown above.
Kindly let me know how I can read the .rpt file using Python 3.
This is, indeed, strange behavior. While I cannot easily reproduce the error without knowing the format of the .rpt file, here are some hints about what might go wrong. I assume it looks something like this:
uid|expid|name|domain
...
Which can be read and printed with the following code:
with open(r'C:\Users\lenovo-pc\Desktop\training2.rpt','r',encoding = 'utf-8', errors = 'replace') as rfile:
    count = 0
    for line in rfile:
        count += 1
        print(line.strip())  # this removes white space, line breaks etc.
However, the problem seems to be that you iterate over the string of the first line in your file instead of over the lines of the file. That would produce the pattern you see, as the print() function adds a line break (in addition to the one you add manually). This leaves you with one character per line (followed by two line breaks).
>>> for i in "foo":
...     print(i+"\n")
...
f

o

o
Make sure you did not reuse variable names from earlier in the session and do not overwrite the file object.
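If the file really is pipe-delimited as assumed above, a minimal sketch for splitting each row into fields could look like this (the function name read_rpt is made up; the file name and column layout are taken from the question):

```python
def read_rpt(path):
    """Read a pipe-delimited .rpt file into a list of field lists."""
    rows = []
    with open(path, encoding='utf-8', errors='replace') as rfile:
        for line in rfile:
            rows.append(line.rstrip('\n').split('|'))  # one list of fields per row
    return rows
```

For the assumed format, read_rpt(r'C:\Users\lenovo-pc\Desktop\training2.rpt') should then return the header row ['uid', 'expid', 'name', 'domain'] followed by the data rows.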
Sorry for my previous post; I had no idea what I was doing. I am trying to cut out certain ranges of lines from a given input file and print each range to a separate file. The input file looks like:
18
generated by VMD
C 1.514895 -3.887949 2.104134
C 2.371076 -2.780954 1.718424
C 3.561071 -3.004933 1.087316
C 4.080424 -4.331872 1.114878
C 3.289761 -5.434047 1.607808
C 2.018473 -5.142150 2.078551
C 3.997237 -6.725186 1.709355
C 5.235126 -6.905640 1.295296
C 5.923666 -5.844841 0.553037
O 6.955216 -5.826197 -0.042920
O 5.269004 -4.590026 0.590033
H 4.054002 -2.184680 0.654838
H 1.389704 -5.910354 2.488783
H 5.814723 -7.796634 1.451618
O 1.825325 -1.537706 1.986256
H 2.319215 -0.796042 1.550394
H 3.390707 -7.564847 2.136680
H 0.535358 -3.663175 2.483943
18
generated by VMD
C 1.519866 -3.892621 2.109595
I would like to print every 100th frame, starting from the first frame, into its own file named "snapshot0.xyz" (the first frame is frame 0).
For example, the above input shows two snapshots. I would like to print lines 1:20 into their own file named snapshot0.xyz, then skip 100 snapshots (2000 lines) and print the 100th snapshot to snapshot1.xyz. My attempt was in Python, but you can choose grep, awk, sed, or Python.
My input file: frames.dat
#!/usr/bin/Python


mest = open('frames.dat', 'r')
test = mest.read().strip().split('\n')

for i in range(len(test)):
    if test[i] == '18':
        f = open("out"+`i`+".dat", "w")
        for j in range(19):
            print >> f, test[j]
        f.close()
I suggest using the csv module for this input.
import csv

def strip_empty_columns(line):
    return list(filter(lambda s: s.strip() != "", line))  # list() so len() also works on Python 3

def is_count(line):
    return len(line) == 1 and line[0].strip().isdigit()

def is_float(s):
    try:
        float(s.strip())
        return True
    except ValueError:
        return False

def is_data_line(line):
    return len(line) == 4 and is_float(line[1]) and is_float(line[2]) and is_float(line[3])

with open('frames.dat', 'r') as mest:
    r = csv.reader(mest, delimiter=' ')
    current_count = 0
    frame_nr = 0
    outfile = None
    for line in r:
        line = strip_empty_columns(line)
        if is_count(line):
            if frame_nr % 100 == 0:
                outfile = open("snapshot%d.xyz" % frame_nr, "w+")
            elif outfile:
                outfile.close()
                outfile = None
            frame_nr += 1  # increment the frame counter every time you see a header line like '18'
        elif is_data_line(line):
            if outfile:
                outfile.write(" ".join(line) + "\n")
The opening post asks to write every 100th frame to an output file named snapshot0.xyz. I assume the 0 should be a counter, or you would continuously overwrite the file. I updated the code with a frame_nr counter and a few lines that open/close an output file depending on frame_nr and write data while an output file is open.
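Alternatively, since each snapshot in the sample is a fixed 20 lines (the count line, the comment line, and 18 atom lines), a simpler sketch that relies on that fixed size might work; the function name split_snapshots is made up here:

```python
def split_snapshots(in_path, frame_lines=20, every=100):
    """Write every `every`-th fixed-size frame to its own snapshot<N>.xyz file."""
    with open(in_path) as f:
        lines = f.readlines()
    # starts of the frames we keep: 0, frame_lines*every, 2*frame_lines*every, ...
    for out_idx, start in enumerate(range(0, len(lines), frame_lines * every)):
        with open('snapshot%d.xyz' % out_idx, 'w') as out:
            out.writelines(lines[start:start + frame_lines])
```

This names the files snapshot0.xyz, snapshot1.xyz, ... as asked, but it breaks if a frame ever has a different atom count.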
This might work for you (GNU sed and csplit):
sed -rn '/^18/{x;/x{100}/z;s/^/x/;x};G;/\nx$/P' file | csplit -f snapshot -b '%d.xyz' -z - '/^18/' '{*}'
Filter every 100th frame using sed and pass that file to csplit to create the individual files.
Context: I want to install ".msi" file on remote windows machine via python script.
I have installed cygwin on the remote Windows machine and the ssh service is running. I execute commands on the remote Windows machine via ssh from a Linux host using a python script. For installation of the msi file I used the command below:
msiexec /package "msi file name" /quiet /norestart /log "log file name (say instlog.log)"
Now, to verify that the installation succeeded, I list the contents of the log file (instlog.log) and check for the string "Installation success or error status: 0".
Problem:
The "type" command does not work in cygwin, so I tried "cd {0}; cat {1} | tail -5".format(FileLocation, FileName) to list the file contents, but I am getting output in a different format and the python script is unable to match the above-mentioned string in the output. This is what I want to display on the console:
MSI (s) (64:74) [18:03:51:360]: Windows Installer installed the product. Product Name: pkg-name. Product Version: 0.2.24-10891. Product Language: 1033. Manufacturer: XYZ Company. Installation success or error status: 0.
And what I am actually getting is:
M S I ( s ) ( 6 4 : 7 4 ) [ 1 8 : 0 3 : 5 1 : 3 6 0 ] : W i n d o w s I n s t a l l e r i n s t a l l e d t h e p r o d u c t . P r o d u c t N a m e : p k g - n a m e . P r o d u c t V e r s i o n : 0 . 2 . 2 4 - 1 0 8 9 1 . P r o d u c t L a n g u a g e : 1 0 3 3 . M a n u f a c t u r e r : X Y Z C o m p a n y . I n s t a l l a t i o n s u c c e s s o r e r r o r s t a t u s : 0 .
So somehow an extra space is introduced after each character in the output. I want to know how I can get the output in a normal way rather than in this space-separated format. Thank you.
The problem is that msiexec saved its log file in Unicode (UTF-16) format. In that format each character that you see is stored as 2 bytes; because your text is plain English, one of those two bytes is always 0 (\0, \x00, NULL). Some popular editors are smart enough to figure the encoding out and display only the characters, leaving the interleaved NULL bytes aside. There are a few ways to get through this.
Upgrade cygwin. On my computer (I also have Cygwin installed) I don't experience this problem (my Cygwin uses GNU coreutils 8.15; you can check with tail --version). Here are some outputs (I included the hexdump at the end to show that the file is in Unicode format):
cat unicode.txt
yields: unicode chars
tail unicode.txt
yields: unicode chars
hexdump unicode.txt
yields:
0000000 0075 006e 0069 0063 006f 0064 0065 0020
0000010 0063 0068 0061 0072 0073 000d 000a
000001e
Convert the msiexec logs to ASCII format. I am not aware of any native tool that does that, but you can search for a unicode-to-ascii converter and download such a tool; or, as mentioned earlier, there are editors that understand Unicode (one that I've tried and that can convert files from Unicode to ASCII is Textpad); or you can write the tool yourself.
If you're reading the msi log file from Python, you could handle the Unicode files in the script itself. I assume you have some code that reads the file contents like this (note: no exception handling included):
f = open("some_msi_log_file.log", "rb")
text = f.read()
f.close()
and you're doing the processing on text. If you modify the code above to:
f = open("some_msi_log_file.log", "rb")
unicode_text = f.read()
f.close()
text = "".join([char for char in unicode_text if char != '\x00'])
text won't contain the \x00s anymore (and will also work with regular ASCII files).
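A variant of the same idea is to let the I/O layer do the decoding with the codecs module; this sketch assumes the log really is UTF-16 with a byte-order mark, and the function name read_msi_log is made up:

```python
import codecs

def read_msi_log(path):
    # 'utf-16' picks the byte order from the BOM and strips it
    with codecs.open(path, 'r', encoding='utf-16') as f:
        return f.read()
```

The returned text can then be searched for "Installation success or error status: 0" directly.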
The log file should be converted to an 8-bit encoding such as UTF-8. This can be achieved with the iconv command: install it with the cygwin installer, and then use the following command:
iconv -f ucs2 -t utf8 instlog.log > instlog2.log
So someone wrote this code, which outputs the x, y, z positions of some particles.
if rs.contains_block(file+'.hdf5', "POS ", parttype=1):
    d1 = rs.read_block(file, "POS ", parttype=1, verbose=False)
    blocksize = struct.pack('I', len(d1)*8*3)
    f.write(blocksize)
    for i in range(len(d1)):
        for j in range(3):
            data = struct.pack('d', d1[i][j])
            f.write(data)
    f.write(blocksize)
    print 'Position real', d1, k
else:
    data = struct.pack('I', 0)
    f.write(data)
    f.write(data)
Or basically:
for i = 1:Nparticles
for j = 1:3
write xyz[i][j]
end
end
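For reference, the Python writer above emits one Fortran-style unformatted record: a 4-byte length marker, the payload, and the marker again, with every coordinate packed as an 8-byte double ('d'), while the Fortran reader declares real*4. A sketch of reading such a record back in Python, assuming that 8-byte layout (the function name read_record_doubles is made up), would be:

```python
import struct

def read_record_doubles(path):
    """Read one Fortran-style unformatted record packed as 8-byte doubles."""
    with open(path, 'rb') as f:
        (nbytes,) = struct.unpack('I', f.read(4))              # leading length marker
        values = struct.unpack('%dd' % (nbytes // 8), f.read(nbytes))
        f.read(4)                                              # trailing length marker
    return values
```

If the data really is written as doubles, the Fortran side would need real*8 (or the Python side would need 'f' and len(d1)*4*3) for the two to agree.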
Now I am trying to read this back in F90 like:
real*4, allocatable :: pos(:,:)
N=sum(Nparticles)
open (1, file=filename, form='unformatted')
allocate(pos(1:3,1:N))
read (1) pos
close(1)
When I print the first three rows I get:
do i =1,3
print *, pos(1:3,i)
end do
>> 0.00000000 2.61613369 -2.00000000
1.88289821 -2.00000000 1.96834707
2.00000000 2.61616445 2.00000000
Now I know for certain that the positions range from 0 to 25, so getting -2.0000 everywhere is concerning. Is there something about my friend's Python write-out (f.write()) that I need to account for in the Fortran read so it outputs the positions correctly? Leading characters? Allocation correction? I've used a Python read-in and I get the following for the first 3 entries:
>>> pos = rs.read_block("filename","POS ",parttype=1)
>>> pos(1:3,:)
array([[ 12.49398994, 21.89432526, 6.23691988],
[ 12.48858261, 21.89297867, 6.23258686],
[ 12.48777962, 21.89576149, 6.23423147],
Which is not what I get when I do the Fortran read in above.
Thanks.
I have a Python script that reads a list of path names from a file and opens them using the gzip module. It works well under Linux, but when I use it under Windows I get an error when calling the gzip.open function. The error message is as follows:
File "C:\dev_tools\Python27\lib\gzip.py", line 34, in open
return GzipFile(filename, mode, compresslevel)
File "C:\dev_tools\Python27\lib\gzip.py", line 89, in __init__
fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
The filename should be something like
'G:\ext_pt1\cfx33_50instr4_testset\cfx33_50instr4_0-99\cfx33_50instr4_cov\cfx33_50instr4_id0_cov\cfx33_50instr4_id0.detail.rpt.gz'
But when I printed the filename, it printed out something like
' ■G : \ e x t _ p t 1 \ c f x 3 3 _ 5 0 i n s t r 4 _ t e s t s e t \
c f x 3 3 _ 5 0 i n s t r 4 _ 0 - 9 9 \ c f x 3 3 _ 5 0 i n s t r 4 _
c o v \ c f x 3 3 _ 5 0 i n s t r 4 _ i d 0 _ c o v \ c f x 3 3 _ 5 0
i n s t r 4 _ i d 0 . d e t a i l . r p t . g z'
And when I printed repr(filename), it printed out something like
'\xff\xfeG\x00:\x00\\x00e\x00x\x00t\x00_\x00p\x00t\x001\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00s\x00t\x00r\x004\x00_\x00t\x00e\x00s\x00t\x00s\x00e\x00t\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00\x00t\x
00r\x004\x00_\x000\x00-\x009\x009\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00\x00t\x00r\x004\x00_\x00c\x00o\x00v\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00s\x00t\x00r\x004\x00_\x00i\x00d\x000\x00_\x00c\x00o\x00v\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00s\x00t\x00r\x004\x00_\x00i\x00d\x000\x00.\x00d\x00e\x00t\x00a\x00i\x00l\x00.\x00r\x00p\x00t\x00.\x00g\x00z\x00'
I don't know why Python added those spaces (possibly the NULL bytes?) when it read the file. Does anyone have any clue?
Python has not added anything; it has merely read what is in the file. You have a little-endian UTF-16 string there, as you can plainly tell by the byte-order mark in the first two bytes. If you are not expecting this, you could convert it to ASCII (assuming it doesn't have any non-ASCII characters).
# convert mystring from little-endian UTF-16 with optional BOM to ASCII
mystring = unicode(mystring, encoding="utf-16le").encode("ascii", "ignore")
Or just convert it to proper Unicode and use it that way, if Windows will tolerate it:
mystring = unicode(mystring, encoding="utf-16le").lstrip(u"\ufeff")
Above, I have manually specified the byte order and then stripped off the BOM, rather than specifying "utf-16" as the encoding and letting Python figure out the byte order. This is because the BOM is going to be found once at the beginning of the file, not at the beginning of each line, so if you are converting the lines to Unicode one at a time, you won't have a BOM most of the time.
However, it might make more sense to go back to the source of that file and figure out why it's being saved in little-endian UTF-16 if you expected ASCII. Is the file generated the same way on Linux and Windows, for instance? Has it been touched by a text editor that defaults to saving as Unicode? Etc.
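On Python 3, where a binary read gives you bytes rather than a str, the equivalent conversion might look like this (the function name to_text is made up):

```python
def to_text(raw_bytes):
    """Decode little-endian UTF-16 bytes and drop a leading BOM, if any."""
    return raw_bytes.decode('utf-16-le').lstrip('\ufeff')
```

Applied to the repr shown in the question, this would turn b'\xff\xfeG\x00:\x00...' back into a usable 'G:...' path string.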
It seems that the encoding of your file has some problem; the printed file name pasted in your question contains abnormal characters. Have you saved your path-list file in a Unicode format?
I had the same problem. I replaced \ with / and it was OK. Just wanted to remind you of this possibility before going into more advanced remedies.