Most Effecient way to parse Evtx files for specific content

Most Effecient way to parse Evtx files for specific content - python

I have hundreds of gigs of Evtx security event logs I want to parse for specific Event IDs (4624) and usernames (joe) based on the Event IDs. I have attempted to use Powershell cmdlet like below:
get-winevent -filterhashtable #{Path="mypath.evtx"; providername="securitystuffprovider"; id=4624}
I know I can pass a variable containing a list to the Path parameter for all of my evtx files, but I am unable to filter based on a subset of the message of the EVTX. Also, this takes an incredibly long time to parse just one Evtx file much less 150 or so. I know there is a python package to parse Evtx but I am not sure how that would look as the python-evtx parser doesn't provide great examples of importing and using the package itself. I can not extract all of the data into csv as that would take too much disk space. Any ideas on how would be amazing. Thanks.

Use -Path with the -FilterXPath parameter, and then filter using an XPath expression like so:
$Username = 'jdoe'
$XPathFilter = "*[System[(EventID=4624)] and EventData[Data[#Name='SubjectUserName'] and (Data='$Username')]]"
Get-WinEvent -Path C:\path\to\log\files\*.evtx -FilterXPath $XPathFilter

Related

Is there a way to feed downloaded xml continuously into XMLPullParser?

In my python script, I'm downloading some XML from a url. It contains the a list of elements within the root element. It really takes quite some time to do so and since the documentation of etree suggested to use the XMLPullParser for things like that, I wanted to try it, but didn't find any way of continuously reading the url into the XMLPullParser. I had hoped to already be able to process the list entries one by one that way, while still downloading. Anyone any idea?

You could try using urllib.request.urlopen from the standard library. Like open, you can use this as a context manager;
with urllib.request.urlopen("http://www.python.org/") as uf:
while True:
data = uf.read(1024) # read returns empty string when finished.
if data:
# feed to pullparser here...
print(data)
else:
break;

Is it possible to insert text in a existing .otp file through Python?

I'm looking for a way/module to insert a string of text formatted by position and text size, into an already existing .otp file(OpenDocument Presentation Template/Libre&OpenOffice version of a .ppt file). Preferably through Python. I've tried googling but the only thing I find is about macros and I'm not sure if that will work the way I want it to.
As an example I'm looking for an end product like this, where the text is inserted through running a Python script.
I imagine the outline of the script to look something like:
import modules
file = 'cat.otp'
open file
some function for inserting formatted text(right size, text size etc) in otp file

you wrote .opt in text and in pseudo-code .otp, but extension should be .odp for a presentation file. otp is for a master. If you just want to add content this should work for both:
While there is a convenient lib for pptx: python-pptx I don't know one for OpenOffice-Files.
If you don't have to create from scratch but just edit something (and it sounds like that) you may unpack the file via zipfile, edit the content.xml and zip back the whole file structure to odp-file. If you take your demo-file and read the xml (use notepad++ with XML-Tools extension and pretty print function to de-linearize) it is quite self-explaining, you'll get an idea pretty fast.

python-libtorrent torrent_info method

I've been using python-libtorrent to check what pieces belong to a file in a torrent containing multiple files.
I'm using the below code to iterate over the torrent file
info = libtorrent.torrent_info('~/.torrent')
for f in info.files():
print f
But this returns <libtorrent.file_entry object at 0x7f0eda4fdcf0> and I don't know how to extract information from this.
I'm unaware of the torrent_info property which would return piece value information of various files. Some help is appreciated.

the API is documented here and here. Obviously the python API can't always be exactly as the C++ one. But generally the interface takes a file index and returns some property of that file.

Parameter with dictionary path

I am very new to Python and am not very familiar with the data structures in Python.
I am writing an automatic JSON parser in Python, the JSON message is read into a dictionary using Ultra-JSON:
jsonObjs = ujson.loads(data)
Now, if I try something like:
jsonObjs[param1][0][param2] it works fine
However, I need to get the path from an external source (I read it from the DB), we initially thought we'll just write in the DB:
myPath = [param1][0][param2]
and then try to access:
jsonObjs[myPath]
But after a couple of failures I realized I'm trying to access:
jsonObjs[[param1][0][param2]]
Is there a way to fix this without parsing myPath?
Many thanks for your help and advice

Store the keys in a format that preserves type information, e.g. JSON, and then use reduce() to perform recursive accesses on the structure.

abstracting the conversion between id3 tags, m4a tags, flac tags

I'm looking for a resource in python or bash that will make it easy to take, for example, mp3 file X and m4a file Y and say "copy X's tags to Y".
Python's "mutagen" module is great for manupulating tags in general, but there's no abstract concept of "artist field" that spans different types of tag; I want a library that handles all the fiddly bits and knows fieldname equivalences. For things not all tag systems can express, I'm okay with information being lost or best-guessed.
(Use case: I encode lossless files to mp3, then go use the mp3s for listening. Every month or so, I want to be able to update the 'master' lossless files with whatever tag changes I've made to the mp3s. I'm tired of stubbing my toes on implementation differences among formats.)

I needed this exact thing, and I, too, realized quickly that mutagen is not a distant enough abstraction to do this kind of thing. Fortunately, the authors of mutagen needed it for their media player QuodLibet.
I had to dig through the QuodLibet source to find out how to use it, but once I understood it, I wrote a utility called sequitur which is intended to be a command line equivalent to ExFalso (QuodLibet's tagging component). It uses this abstraction mechanism and provides some added abstraction and functionality.
If you want to check out the source, here's a link to the latest tarball. The package is actually a set of three command line scripts and a module for interfacing with QL. If you want to install the whole thing, you can use:
easy_install QLCLI
One thing to keep in mind about exfalso/quodlibet (and consequently sequitur) is that they actually implement audio metadata properly, which means that all tags support multiple values (unless the file type prohibits it, which there aren't many that do). So, doing something like:
print qllib.AudioFile('foo.mp3')['artist']
Will not output a single string, but will output a list of strings like:
[u'The First Artist', u'The Second Artist']
The way you might use it to copy tags would be something like:
import os.path
import qllib # this is the module that comes with QLCLI
def update_tags(mp3_fn, flac_fn):
mp3 = qllib.AudioFile(mp3_fn)
flac = qllib.AudioFile(flac_fn)
# you can iterate over the tag names
# they will be the same for all file types
for tag_name in mp3:
flac[tag_name] = mp3[tag_name]
flac.write()
mp3_filenames = ['foo.mp3', 'bar.mp3', 'baz.mp3']
for mp3_fn in mp3_filenames:
flac_fn = os.path.splitext(mp3_fn)[0] + '.flac'
if os.path.getmtime(mp3_fn) != os.path.getmtime(flac_fn):
update_tags(mp3_fn, flac_fn)

I have a bash script that does exactly that, atwat-tagger. It supports flac, mp3, ogg and mp4 files.
usage: `atwat-tagger.sh inputfile.mp3 outputfile.ogg`
I know your project is already finished, but somebody who finds this page through a search engine might find it useful.

Here's some example code, a script that I wrote to copy tags between
files using Quod Libet's music format classes (not mutagen's!). To run
it, just do copytags.py src1 dest1 src2 dest2 src3 dest3, and it
will copy the tags in sec1 to dest1 (after deleting any existing tags
on dest1!), and so on. Note the blacklist, which you should tweak to
your own preference. The blacklist will not only prevent certain tags
from being copied, it will also prevent them from being clobbered in
the destination file.
To be clear, Quod Libet's format-agnostic tagging is not a feature of mutagen; it is implemented on top of mutagen. So if you want format-agnostic tagging, you need to use quodlibet.formats.MusicFile to open your files instead of mutagen.File.
Code can now be found here: https://github.com/DarwinAwardWinner/copytags
If you also want to do transcoding at the same time, use this: https://github.com/DarwinAwardWinner/transfercoder
One critical detail for me was that Quod Libet's music format classes
expect QL's configuration to be loaded, hence the config.init line in my
script. Without that, I get all sorts of errors when loading or saving
files.
I have tested this script for copying between flac, ogg, and mp3, with "standard" tags, as well as arbitrary tags. It has worked perfectly so far.
As for the reason that I didn't use QLLib, it didn't work for me. I suspect it was getting the same config-related errors as I was, but was silently ignoring them and simply failing to write tags.

You can just write a simple app with a mapping of each tag name in each format to an "abstract tag" type, and then its easy to convert from one to the other. You don't even have to know all available types - just those that you are interested in.
Seems to me like a weekend-project type of time investment, possibly less. Have fun, and I won't mind taking a peek at your implementation and even using it - if you won't mind releasing it of course :-) .

There's also tagpy, which seems to work well.

Since the other solutions have mostly fallen off the net, here is what I came up, based on the python mediafile library (python3-mediafile in Debian GNU/Linux).
#!/usr/bin/python3
import sys
from mediafile import MediaFile
src = MediaFile (sys.argv [1])
dst = MediaFile (sys.argv [2])
for field in src.fields ():
try:
setattr (dst, field, getattr (src, field))
except:
pass
dst.save ()
Usage: mediafile-mergetags srcfile dstfile
It copies (merges) all tags from srcfile into dstfile, and seems to work properly with flac, opus, mp3 and so on, including copying album art.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.