.doc files, .pdf files, and some image formats all contain metadata about the file, such as the author.
Is a .py file just a plain text file whose contents are all visible once opened with a code editor like Sublime, or does it also contain metadata? If so, how does one access this metadata?
On Linux and most Unixes, .py's are just text (sometimes unicode text).
On Windows and Mac, there are cubbyholes where you can stash data, but I doubt Python uses them.
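.pyc's, on the other hand, do have a little metadata in them. Each one starts with a short header: a magic number identifying the bytecode version and, by default, the modification time and size of the source file it was compiled from. Because the timestamp is stored inside the file, copying a filesystem hierarchy doesn't make Python recreate all the .pyc's on import. Since Python 3.7 the header can hold a hash of the source instead of the timestamp.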
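To see that header for yourself, here is a minimal sketch that assumes CPython 3.7 or later, where the header is 16 bytes. The file name demo_module.py and the helper read_pyc_header are made up for the illustration.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

def read_pyc_header(pyc_path):
    """Return the fields of the 16-byte header of a CPython 3.7+ .pyc file."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    flags, field2, field3 = struct.unpack("<III", header[4:16])
    info = {"magic": magic, "flags": flags}
    if flags & 0x1:
        # Hash-based .pyc: the last 8 bytes are a hash of the source file.
        info["source_hash"] = header[8:16]
    else:
        # Timestamp-based .pyc: source mtime and source size, 32 bits each.
        info["source_mtime"] = field2
        info["source_size"] = field3
    return info

# Demo: compile a throwaway module (hypothetical name) and read its header.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_module.py")
    with open(src, "wb") as f:
        f.write(b"x = 1\n")
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(tmp, "demo_module.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    pyc_info = read_pyc_header(pyc)

print(pyc_info)
print("magic matches this interpreter:",
      pyc_info["magic"] == importlib.util.MAGIC_NUMBER)
```

The .py file itself has no such header; everything in it is the text you see in the editor.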
I want to read VFP vcx files as plaintext in python. Any tips on how I should go about it?
I understand that the mime type of the file is an octet stream, which is typically associated with binary files. Also apparent is that VFP uses vcx file in combination with vct files to display the initial Source code.
I have been trying some static code analysis methods to extract the information that I need from the vct file, but I had no luck since the control characters mess up even the legible parts of the vct file, which is very hard to automate.
I have searched for weeks. This is my last resort before going into VFP and scraping it manually.
Any help is much appreciated.
You have a few options:
Fernando Bozzo's Foxbin is a github repository with some VFP code to convert vcx, scx ... to prg files.
In the VFP Tools menu there is a View Class Code option.
There is scctext that ships with VFP.
All the above generate VFP prg files which are text. But probably that is not what you meant. Then you could simply open a vcx as a table (it is a table with a vcx extension) and read all the object names, properties, methods and such.
I want to read VFP vcx files as plaintext in python
For what it's worth, *.vcx / *.vct files are just renamed dBase/xBase *.DBF / *.FPT file pairs, just like *.scx / *.sct VFP Form files. So you could probably use something like dbfread.
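To check the "it is a table" claim without any third-party package, here is a rough sketch that reads only the standard DBF header and field descriptors using the struct module. The function name read_dbf_header and the file name in the usage note are made up, and the sketch does not read the records or the memo (.vct) contents; a library such as dbfread is the better tool for that.

```python
import struct

def read_dbf_header(data: bytes) -> dict:
    """Parse the fixed DBF header and the field descriptors that follow it."""
    # Bytes 0-11: version, last update (YY, MM, DD), record count,
    # header length, record length. All little-endian.
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack_from(
        "<4BIHH", data, 0
    )
    fields = []
    offset = 32  # field descriptors start after the 32-byte fixed header
    # Each descriptor is 32 bytes; the list ends with a 0x0D byte.
    while offset < header_len and data[offset] != 0x0D:
        raw_name, ftype, _displacement, length, _decimals = struct.unpack_from(
            "<11scIBB", data, offset
        )
        name = raw_name.split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, ftype.decode("ascii"), length))
        offset += 32
    return {
        "version": version,
        "last_update": (yy, mm, dd),  # raw values as stored in the file
        "record_count": n_records,
        "header_length": header_len,
        "record_length": record_len,
        "fields": fields,
    }
```

Usage would look like `read_dbf_header(open("myclasses.vcx", "rb").read())`, where myclasses.vcx is a placeholder name. Fields of type M are memo fields whose text lives in the matching .vct file.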
Good morning all,
I've made a Python script that adds text on top of images, based on a preset template. I'm now developing a template editor that will let the user edit the template in a GUI, then save the template as a config file. The idea is that one user can create a template, export it, and send it to a new user on a separate computer, who can import it into their config file. The second user will retain full edit abilities on the template (if any changes need to be made).
Now, in addition to the text, I also want the ability to add up to two images (company logos, etc.) to the template/stills. Now, my question: Is there a way to convert a JPG to pure text data that can be saved to a config file and reinterpreted as a JPG on the receiving system? And if not, what would be the best way to achieve this? What I'm hoping to avoid is the user having to send the image files separately.
Sounds questionable that you want to ship an image as a text file. It's easy (base64 is supplied with Python), but it increases the number of bytes by about a third. I'd strongly recommend not doing that.
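For reference, the base64 round trip looks roughly like this. It is only a sketch: the helper names are made up and the "image" is a stand-in byte string rather than a real JPG.

```python
import base64

def image_bytes_to_text(data: bytes) -> str:
    """Encode raw bytes as ASCII text that is safe to put in a config file."""
    return base64.b64encode(data).decode("ascii")

def text_to_image_bytes(text: str) -> bytes:
    """Decode the text back into the original bytes."""
    return base64.b64decode(text)

# Stand-in for the contents of a JPG file (not a real image).
fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(range(256)) * 4

encoded = image_bytes_to_text(fake_jpeg)
restored = text_to_image_bytes(encoded)

print("original bytes:", len(fake_jpeg))
print("encoded chars: ", len(encoded))
print("round trip ok: ", restored == fake_jpeg)
```

Every 3 input bytes become 4 output characters, which is where the size overhead comes from.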
I'd rather take the text and embed it in the image metadata! That way, you would still have a valid image file, but if loaded with your application, that application could read the metadata, interpret it as text config.
There's EXIF and XMP metadata; Python modules exist for both.
Alternatively, it would make more sense to simply put the images and config file into one archive file. (You know .docx Word documents? They do exactly that, just like .odt. Java JAR files? Same. Android APK files? All archive files with multiple files inside.) Python ships a zipfile module that lets you do that easily.
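A minimal sketch of the archive approach with the standard zipfile module follows. The member names (template.json, images/...), the helper names, and the sample config are all assumptions made for the example, not a fixed layout.

```python
import io
import json
import zipfile

def pack_template(config: dict, images: dict) -> bytes:
    """Bundle a config dict and named image blobs into one zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("template.json", json.dumps(config, indent=2))
        for name, data in images.items():
            zf.writestr("images/" + name, data)
    return buf.getvalue()

def unpack_template(blob: bytes):
    """Return (config, images) from an archive made by pack_template."""
    prefix = "images/"
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        config = json.loads(zf.read("template.json"))
        images = {
            name[len(prefix):]: zf.read(name)
            for name in zf.namelist()
            if name.startswith(prefix)
        }
    return config, images

# Demo with made-up data; the logo bytes are a stand-in, not a real JPG.
demo_config = {"title": {"x": 40, "y": 20, "font_size": 32}}
demo_images = {"logo.jpg": b"\xff\xd8\xff\xe0 stand-in"}
bundle = pack_template(demo_config, demo_images)
loaded_config, loaded_images = unpack_template(bundle)
print(loaded_config, list(loaded_images))
```

The user then sends a single file, and the images stay as ordinary binary data inside it.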
Instead of an archive, you could also build a PDF file. That way, you could simply have the images embedded in the PDF, the text editable on top of it, any browser can display it, and the text stays editable. Operating on pdf files can be done in many ways, but I like Fitz from the PyMuPDF package. Just make a document the size of your image, add the image file, put the text on top. On the reader side, find the image and text elements. It's relatively ok to do!
PDF is a very flexible format. If you need more config than just text information, you can add arbitrary text streams to the document that are not displayed.
If I understand correctly, you want to use the config file as a settings file that stores a user's preferences. You could store such data as JSON, XML, YAML or similar; these formats hold data as readable text rather than binary, and can be parsed into a Python dict. As for the images, you could upload them to a server and store their URLs so they can be re-downloaded when needed. Or did I misunderstand the question?
I want to check whether a string is a valid file extension in Python.
For example, I have the string png or .png and I want to check whether it is an existing extension. So I think I need a list of extensions like ["png", "jpg", "pdf", "txt", ....], but I can't find one anywhere.
Does anyone have a way to do this, or a list of extensions?
I'm using Python 3.8 and Windows 10. Thanks.
“Valid extension” makes no sense, as extensions are purely arbitrary representation metadata. You can for instance open a PNG file whose name does not end with .png in any image viewer (blatantly ill-behaved exceptions aside). Files do not even need to have an extension at all.
If what you want is an extension list to be used in contexts like file pickers (that filter files according to their extension), then that list is up to your application domain: you must define what file types (and hence their common extensions) you want to support.
For a (long but still not exhaustive) list of common file extensions, you may check out websites such as http://www.fileextension.org or https://www.extension.info (many others exist).
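If a "commonly known" check is enough, one option that needs no external list is the standard mimetypes module, which maps extensions to MIME types. This is a sketch, not a definition of validity: the result depends on the module's built-in table plus whatever the operating system registers, and is_known_extension is a made-up helper name.

```python
import mimetypes

def is_known_extension(ext: str) -> bool:
    """Return True if the mimetypes module recognises the extension."""
    ext = ext.strip().lower()
    if not ext.startswith("."):
        ext = "." + ext
    # guess_type only looks at the name, so a placeholder file name is enough.
    mime_type, encoding = mimetypes.guess_type("placeholder" + ext)
    # Compression suffixes such as .gz are reported as an encoding instead.
    return mime_type is not None or encoding is not None

for candidate in ("png", ".png", "PDF", "txt", "zzzqqq"):
    print(candidate, is_known_extension(candidate))
```

For a file picker, a short list you define yourself is still the more predictable choice.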
I am writing a Python script to index a large set of Windows installers into a DB.
I would like to know how to read the metadata information (Company, Product Name, Version, etc.) from EXE, MSI and ZIP files using Python running on Linux.
Software
I am using Python 2.6.5 on Ubuntu 10.04 64-bit with Django 1.2.1.
Found so far:
Windows command line utilities that can extract EXE metadata (like filever from SysUtils), or other individual CL utils that only work in Windows. I've tried running these through Wine but they have problems and it hasn't been worth the work to go and find the libs and frameworks that those CL utils depend on and try installing them in Wine/Crossover.
Win32 modules for Python that can do some things but won't run in Linux (right?)
Secondary question:
Obviously changing the file's metadata would change the MD5 hashsum of the file. Is there a general method of hashing a file independent of the metadata, besides locating it and reading it in (e.g. skipping the first 1024 bytes)?
Take a look at this library: http://bitbucket.org/haypo/hachoir/wiki/Home and this example program: http://pypi.python.org/pypi/hachoir-metadata/1.3.3. The second link uses the Hachoir binary file manipulation library (first link) to parse the metadata.
The library can handle these formats:
Archives: bzip2, gzip, zip, tar
Audio: MPEG audio ("MP3"), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG), MIDI, AIFF, AIFC, Real audio (RA)
Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
Misc: Torrent
Program: EXE
Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime (MOV), Ogg/Theora, Real media (RM)
Additionally, Hachoir can do some file manipulation operations which I would assume includes some primitive metadata manipulation.
hachoir-metadata returns the "Product Version", but compilers change the "File Version", so the version it returns is not the one we need.
I found a small solution that works well:
http://pev.sourceforge.net/
I've tested it successfully. It's simple, fast and stable.
To answer one of your questions, you can use the zipfile module, specifically the ZipInfo object to get the metadata for zip files.
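A short sketch of reading that per-member metadata follows. The helper name list_zip_metadata and the demo archive are made up for the example; the attributes used (filename, date_time, file_size, compress_size, CRC) are standard ZipInfo attributes.

```python
import io
import zipfile

def list_zip_metadata(source):
    """Collect basic metadata for every member of a zip archive.

    `source` can be a path or a file-like object.
    """
    rows = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            rows.append({
                "name": info.filename,
                "modified": info.date_time,      # (Y, M, D, h, m, s)
                "size": info.file_size,          # uncompressed size
                "compressed_size": info.compress_size,
                "crc32": info.CRC,
            })
    return rows

# Demo on an in-memory archive with one placeholder member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    member = zipfile.ZipInfo("setup.exe", date_time=(2010, 6, 1, 12, 0, 0))
    zf.writestr(member, b"MZ placeholder")
buf.seek(0)

demo_rows = list_zip_metadata(buf)
for row in demo_rows:
    print(row)
```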
As for hashing only the data of the file, you can only do that if you know which parts are data and which are metadata. There can be no general method, as many file formats store their metadata differently.
To answer your second question: no, there is no way to hash a PE file or ZIP file, ignoring the metadata, without locating and reading the metadata. This is because the metadata you're interested in is stored at variable locations in the file.
In the case of PE files (EXE, DLL, etc), it's stored in a resource block, typically towards the end of the file, and a series of pointers and tables at the start of the file gives the location.
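To illustrate the "pointers at the start of the file", here is a small sketch that follows the first of them with the struct module: the DOS header stores the offset of the PE signature at 0x3C, and the 20-byte COFF header comes right after that signature. It deliberately stops there. The Company and Product Name strings live in the version resource, which this sketch does not parse, and read_pe_coff_header is a made-up name.

```python
import struct

def read_pe_coff_header(data: bytes) -> dict:
    """Follow the DOS header pointer to the PE signature and COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Offset 0x3C of the DOS header holds the file offset of "PE\0\0".
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # COFF file header: 20 bytes immediately after the signature.
    (machine, n_sections, timestamp, _symtab_ptr, _n_symbols,
     optional_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    return {
        "pe_offset": pe_offset,
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,  # seconds since 1970, set by the linker
        "optional_header_size": optional_header_size,
        "characteristics": characteristics,
    }
```

Usage would be `read_pe_coff_header(open(path, "rb").read())` for some EXE at `path`. Getting from here to the version strings means walking the section table and the resource directory, which is what dedicated PE tools do.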
In the case of ZIP files, it's scattered throughout the archive -- each included file is preceded by its own metadata, and then there's a table at the end giving the locations of each metadata block. But it sounds like you might actually be wanting to read the files within the ZIP archive and look for EXEs in there if you're after program metadata; the ZIP archive itself does not store company names or version numbers.
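For ZIP files specifically, one way to get a hash that ignores the archive's own metadata is to hash only the member names and their decompressed contents. This is a sketch under that assumption about what counts as "data"; zip_content_digest and the demo helper are made-up names.

```python
import hashlib
import io
import zipfile

def zip_content_digest(source) -> str:
    """Hash member names and decompressed data, ignoring zip metadata."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(source) as zf:
        for name in sorted(zf.namelist()):
            data = zf.read(name)
            digest.update(name.encode("utf-8") + b"\x00")
            # Length prefix keeps member boundaries unambiguous.
            digest.update(len(data).to_bytes(8, "little"))
            digest.update(data)
    return digest.hexdigest()

def _make_demo_zip(date_time, comment: bytes) -> bytes:
    """Build a one-member archive with the given timestamp and comment."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.comment = comment
        member = zipfile.ZipInfo("app/readme.txt", date_time=date_time)
        zf.writestr(member, b"same payload")
    return buf.getvalue()

# Two archives: identical contents, different timestamps and comments.
zip_a = _make_demo_zip((2010, 1, 1, 0, 0, 0), b"first build")
zip_b = _make_demo_zip((2011, 5, 5, 0, 0, 0), b"second build")

print("raw MD5 equal:     ",
      hashlib.md5(zip_a).hexdigest() == hashlib.md5(zip_b).hexdigest())
print("content hash equal:",
      zip_content_digest(io.BytesIO(zip_a)) == zip_content_digest(io.BytesIO(zip_b)))
```

This still reads and interprets the metadata in order to skip it, which is exactly the point made above: there is no shortcut like skipping a fixed number of bytes.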
I have a .rtf file that contains nothing but an integer, say 15. I wish to read this integer in through python and manipulate that integer in some way. However, it seems that python is reading in much of the metadata associated with .rtf files. Why is that? How can I avoid it? For example, trying to read in this file, I get..
{\rtf1\ansi\ansicpg1252\cocoartf949\cocoasubrtf460
{\fonttbl\f0\fswiss\fcharset0
Helvetica;}
{\colortbl;\red255\green255\blue255;}
\margl720\margr720\margb720\margt720\vieww9000\viewh8400\viewkind0
\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\ql\qnatural\pardirnatural
That's the nature of .RTF (i.e. Rich Text Format) files: they include extra data to define how the text is laid out and formatted.
It is not recommended to store data in such files, lest you encounter the difficulties you noted. Should you go through the effort of parsing this file to "recover" your one numeric value, you may expose your application to the risk that an updated version of the RTF format renders the parsing logic partially incorrect, and hence yields wrong numeric data for the application.
Why not store this info in a true text file? This could be a flat text file or, preferably, an XML, YAML or JSON file, for added "forward" compatibility as your application evolves and you add extra parameters and such to the file.
If this file is a given, however, there probably exist Python libraries to read and write to it. Check the Python Package Index (PyPI) for the RTF keyword.
That's exactly what the RTF file contains, so Python (in the absence of further instruction) is giving you what the file contains.
You may be looking for a library to read the contents of RTF files, such as pyrtf-ng.
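If the file really is that trivial and installing a library is not an option, a crude regex-based strip can recover the number. Treat this as a sketch with loud assumptions: it only handles simple files like the one in the question, it ignores escapes such as \'hh and Unicode control words, and the tail of the sample below (the part after the truncated dump, ending in 15) is my guess at what the file contains.

```python
import re

def naive_rtf_to_text(rtf: str) -> str:
    """Very rough RTF-to-text conversion for trivial documents only."""
    text = rtf.strip()
    # Drop the outermost braces that wrap the whole document.
    if text.startswith("{") and text.endswith("}"):
        text = text[1:-1]
    # Repeatedly remove innermost groups (font table, colour table, ...).
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{[^{}]*\}", "", text)
    # Remove control words such as \pard or \fs24, plus one delimiter space.
    text = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", text)
    return text.strip()

# Sample modelled on the question; the last line is an assumption.
sample_rtf = r"""{\rtf1\ansi\ansicpg1252\cocoartf949\cocoasubrtf460
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
\margl720\margr720\margb720\margt720\vieww9000\viewh8400\viewkind0
\pard\tx566\tx1133\ql\qnatural\pardirnatural

\f0\fs24 \cf0 15}"""

value = int(naive_rtf_to_text(sample_rtf))
print(value + 1)
```

Anything beyond a single value like this is better handled by a real RTF library, or avoided by saving the file as plain text in the first place.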