I'm trying to anonymize SIP traces by replacing all the phone numbers with random ones. I can read the file and extract the numbers from it. What I can't do, however, is modify the file without corrupting it.
I've tried different parsers (pyshark, dpkt & scapy) and they're all great for reading the file. Modifying, however, doesn't work.
What I've tried:
"Brute Force" by just reading in the file, modifying it and saving
it as .pcap again. This obviously didn't work at all and Wireshark
complained about the file being cut short (which it was (probably for character reasons?)).
All the parsers. The problem with these is that I can read the file, but I can't write to it without turning everything into a string, which again breaks the file.
Is there some kind of function in one of the libraries where I could replace a pattern with another one? Or do any of you have an idea of how I could solve this differently?
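For reference, the scapy version of my brute-force attempt looks roughly like this (trace.pcap is a stand-in for my capture, and replace_numbers is a dummy for my number-swapping logic):

```python
import re
from scapy.all import rdpcap, wrpcap, Raw

def replace_numbers(payload):
    # Dummy for my real anonymization logic: swap long digit runs
    # for other digits of (possibly) different length.
    return re.sub(rb"\d{5,}", b"0123456", payload)

packets = rdpcap("trace.pcap")        # "trace.pcap" stands in for my capture
for pkt in packets:
    if pkt.haslayer(Raw):
        pkt[Raw].load = replace_numbers(pkt[Raw].load)
        # If the replacement changes the payload length, the stored
        # IP/UDP length and checksum fields no longer match, which I
        # assume is what corrupts the file.
wrpcap("trace_anon.pcap", packets)
```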
Thank you for your answers
I'm very new to Python and programming in general, and I'm looking to make a Discord bot that has a lot of hand-written chat lines to randomly pick from and send back to the user. Making a really huge variable holding a list of sentences seems like a bad idea. Is there a way I can store the chat lines in a separate file and have the bot pick from the lines in that file? Or is there anything else that would be better, and how would I do it?
I'll interpret this question as "how large a variable is too large", to which the answer is pretty simple. A variable is too large when it becomes a problem. So, how can a variable become a problem? The big one is that the machine could run out of memory, and an OOM killer (out-of-memory killer) or similar mechanism will stop your program. How would you know if your variable is causing these issues? Pretty simple: your program crashes.
If the variable is static (with a size fully known at compile time or prior to interpretation), you can calculate how much RAM it will take. (This is a bit finicky with Python, so it might be easier to load it up at runtime and figure it out with a profiler.) If it's more than ~500 megabytes, you should be concerned. Over a gigabyte, and you'll probably want to reconsider your approach[^0].
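For a rough runtime estimate without a full profiler, something like sys.getsizeof can ballpark a list of strings; this is only a sketch with made-up data, and getsizeof counts the list's pointer array separately from the strings:

```python
import sys

lines = ["chat line number %d" % i for i in range(100000)]  # made-up data

# getsizeof(lines) covers only the list object itself (its array of
# pointers), so the contained strings are summed separately.
total = sys.getsizeof(lines) + sum(sys.getsizeof(s) for s in lines)
print("approx. %.1f MiB" % (total / 1024.0 / 1024.0))
```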
So, what do you do then? As suggested by #FishballNooodles, you can store your data line by line in a file and read the lines into an array. Unfortunately, the code they've provided still reads the entire thing into memory. If you use the code they're providing, you've got a few options, non-exhaustively listed below.
Consume a random number of newlines from the file when you need a line of text: read one character at a time, compare it to \n, and return the line once you've encountered the requested number of newlines. This is O(n) worst case with respect to the number of lines in the file.
Rather than storing the text you need at a given index, store its byte offset in the file. Then you can seek to that offset (which is effectively O(1)) and read the text. This requires an O(n) construction cost at the start of the program, but works much better at runtime; see the sketch after this list.
Use an actual database. It's usually better not to reinvent the wheel. If you're just storing plain text, this is probably overkill, but don't discount it.
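Here is a minimal sketch of the second option (the byte-offset index); the filename responses.txt is made up:

```python
import random

# Build the offset index once at startup: O(n), but only offsets are kept.
offsets = []
with open("responses.txt", "rb") as f:
    pos = f.tell()
    while f.readline():
        offsets.append(pos)
        pos = f.tell()

def random_line():
    # Seek straight to a random line: no full read, O(1)-ish per lookup.
    with open("responses.txt", "rb") as f:
        f.seek(random.choice(offsets))
        return f.readline().decode("utf-8").rstrip("\n")
```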
[^0]: These numbers are actually just random. If you control the server environment on which you run the code, then you can probably come up with some more precise signposts.
You can store your data in a file, say response.txt, and retrieve it in the discord bot file with open("response.txt").readlines()
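For completeness, picking one line at random from that file might look like this (random.choice is my addition, not part of the original suggestion):

```python
import random

with open("response.txt") as f:
    lines = f.readlines()          # note: the whole file ends up in memory

reply = random.choice(lines).strip()   # one random chat line to send back
```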
I have a variable in my Python script that holds the path to a file as a string. Is there a way, through a Python module or similar, to keep the file path updated if the file is moved to another destination?
Short answer: no, not with a string.
Long answer: If you want to use only a string to record the location of this file, you're probably out of luck unless you use other tools to find the file or record whenever it moves, and I don't know what those tools would be.
You don't give a lot of info in your question about what you want this variable for. As #DeepSpace says in his comment, if you're trying to make this string follow the file between different runs of this program, then you'd be better off making it an argument to the script. If, however, you expect the file to move sometime during the execution of your program, you might be able to use a file object to keep track of it. Rather than keeping the file path in memory, keep an open file object (the kind you get from the open() function) in memory instead, and just never close it until the program terminates. You can use seek to return to the start of the file if you need to read it multiple times. Problems with this include that it ties up a file handle for the program's entire lifetime, and it's absolutely not a best practice.
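A rough sketch of that idea; data.txt is a placeholder, and this leans on the OS keeping an open descriptor valid across renames and moves within the same filesystem, which is POSIX behavior and not guaranteed everywhere:

```python
# Open once and keep the file object alive for the program's lifetime.
tracked = open("data.txt", "r")    # "data.txt" is a placeholder

def read_contents():
    tracked.seek(0)                # rewind so the file can be re-read any time
    return tracked.read()

# ... use read_contents() wherever the path-as-string was used ...
# tracked.close() only once the program is done with the file
```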
TL;DR
Your best bet is probably to go with the solution #DeepSpace mentioned: call your script with a command-line parameter, which forces the user to supply a valid path.
This is actually a really good question, but unfortunately, speaking purely in Python terms, it is impossible.
No Python module will be able to dynamically link a variable to a path on the file system. You will need an external script or routine to update whatever data structure holds the path value.
Even then, you could track a change to the file's name, but not to its location. Here is what I mean by that.
Let's say you wrapped that file in a folder containing only that specific file. Since you now know its location is fixed (theoretically speaking), you can have another Python script/routine read the file name and store it in a text file. Your other script could then fetch that file name (assuming the routine syncs it on a regular basis). But as soon as the location of the file changes, how can you possibly know where it is? It would have to be manually hard-coded somewhere to get anything close to the behavior you're expecting.
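A toy version of that routine, just to make the idea concrete; every name here is made up:

```python
import os

WATCH_DIR = "wrapped_folder"        # the folder holding exactly that one file

def sync_filename(note_path="current_name.txt"):
    entries = os.listdir(WATCH_DIR)
    if len(entries) == 1:
        # Record whatever the file is currently called.
        with open(note_path, "w") as f:
            f.write(entries[0])
```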
Note that my example is not in any way a go-to solution for your problem. I'm actually trying to underline the shortcomings of such a feature.
I am reading an unformatted sequential file output from a Fortran program. I am using the scipy.io.FortranFile class to do this, and am successfully extracting the information I need.
My problem: I do not know how long the input file is, and have no way of knowing how many records to read in. Currently, I simply read the file iteratively until an exception is raised (a TypeError, but I don't know whether that's how it would always fail). I would prefer to do this more elegantly.
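Here's the pattern, roughly; output.dat and the dtype are stand-ins for my actual data:

```python
import numpy as np
from scipy.io import FortranFile

f = FortranFile("output.dat", "r")     # "output.dat" stands in for my file
records = []
while True:
    try:
        records.append(f.read_reals(dtype=np.float64))
    except TypeError:
        break                          # this is how it fails at end of file
f.close()
```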
Is there any way to detect EOF using the FortranFile class? Or, alternatively, is there a better way to read in unformatted sequential files?
Some cursory research (I am not a Fortran programmer) suggests that when reading with the Fortran READ statement, one can check the IOSTAT flag to determine whether you are at the end of the file. I would be surprised if a similar capability weren't provided in the FortranFile class, but I don't see any mention of it in the documentation.
I have an executable (converted from Python to an exe using py2exe) that outputs lists of numbers that could be from 0 to 50K lines long, or a little bit more.
While developing, I just saved them to a TXT file using simple f.write.
The person wants to print this output on paper! (don't ask why lol)
So, I'm wondering if I can output it to something like HTML or XML: something that could display tables of 50K lines and maybe 3 columns, and that would also open on any PC without additional programs?
Suggestions?
EDIT:
Regarding CSV:
In most situations, the best way in my opinion would be to make a CSV. I'm not opposing it in any way; rather, I think others might find Lott's answer useful for their cases. Sorry I didn't explain my constraints that well in my question.
My constraints are: the user has no office suite and no Python installed. Think of a PC with the bare minimum after a clean Windows XP/Vista installation, maybe Internet Explorer 7 or 8. This PC has to be able to open my output file and allow for reasonable viewing, searching, and printing.
CSV.
http://docs.python.org/library/csv.html
http://en.wikipedia.org/wiki/Comma-separated_values
They can load it into a spreadsheet and print anything they want.
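A minimal sketch with the csv module; the rows and headers here are dummies:

```python
import csv

rows = [(i, i * 2, i * 3) for i in range(50000)]   # dummy 3-column data

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b", "c"])   # placeholder column headers
    writer.writerows(rows)             # 50k rows stream out row by row
```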
If you can't install anything on the computer, then you might be best off outputting an HTML file with the data in a <table> that the user could view, search, and print in IE.
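A rough sketch of generating such a table; the file name, headers, and data are made up:

```python
import html

rows = [(i, i * 2, i * 3) for i in range(50000)]   # dummy data

with open("report.html", "w", encoding="utf-8") as f:
    f.write("<html><body><table border='1'>\n")
    f.write("<tr><th>a</th><th>b</th><th>c</th></tr>\n")
    for row in rows:
        cells = "".join("<td>%s</td>" % html.escape(str(c)) for c in row)
        f.write("<tr>%s</tr>\n" % cells)
    f.write("</table></body></html>\n")
```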
You could use LaTeX to produce a PDF, maybe? But why exactly isn't a text file good enough?
You can produce a PDF using ReportLab. After all, if you really want full control of the printed output, nothing beats PDF.
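For example, a bare-bones sketch using ReportLab's canvas API (the margins and line height here are arbitrary choices, not ReportLab defaults):

```python
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

def lines_to_pdf(lines, path="output.pdf"):
    c = canvas.Canvas(path, pagesize=A4)
    height = A4[1]
    y = height - 40                 # arbitrary top margin
    for line in lines:
        c.drawString(40, y, line)   # arbitrary left margin
        y -= 14                     # fixed line height
        if y < 40:                  # page full: start a new one
            c.showPage()
            y = height - 40
    c.save()
```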
Does 50k lines make too large a file? If not, just continue writing text files. Otherwise, an easy solution would be to keep spitting out text files and compress them, e.g. with zip. You could use the zipfile library in Python; most computers have no trouble opening zip files.
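For instance, a minimal sketch with zipfile; the file names are placeholders:

```python
import zipfile

# Compress the finished text output; Windows can open zip archives
# natively, so the user needs no extra software.
with zipfile.ZipFile("output.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("output.txt")          # the text file the program produced
```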
Question: How do you write data to the beginning of an already existing file without overwriting what's already there and without reading the entire file into memory (i.e., prepend)?
Info:
I'm working on a project right now where the program frequently dumps data into a file. This file will very quickly balloon to 3-4 GB, and I'm running the simulation on a computer with only 768 MB of RAM. Pulling all that data into RAM over and over would be a great pain and a huge waste of time, and the simulation already takes long enough to run as it is.
The file is structured so that the number of dumps made so far is listed at the beginning as a simple value, like 6. Each time the program makes a new dump, I want that value incremented, so now it's 7. The problem arises at the 10th, 100th, 1000th, and so on, dump: the program writes the 10 just fine, but it clobbers the first character of the next line:
"9\n580,2995,2083,028\n..."
"10\n80,2995,2083,028\n..."
Obviously, the difference between 580 and 80 in this case is significant, and I can't lose those values. So I need a way to add a little space in there so I can insert the new data without losing mine, and without having to pull up the entire file and rewrite it.
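For concreteness, the update I'm doing now is essentially this (simplified), and it's this in-place write that eats the character:

```python
# "dump_file" stands in for my real file, which starts "9\n580,2995,..."
with open("dump_file", "r+") as f:
    f.seek(0)
    f.write("10\n")   # 3 bytes overwrite "9\n5", so "580,..." becomes "80,..."
```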
Basically what I'm looking for is a kind of prepend function. something to add data to the beginning of a file instead of the end.
Programmed in Python
~n
See the answers to this question:
How do I modify a text file in Python?
Summary: you can't do it without reading the file in (this is due to how the operating system works, rather than a Python limitation)
It's not addressing your original question, but here are some possible workarounds:
Use SQLite (it's bundled with your Python)
Use a fancier database, either RDBMS or NoSQL
Just track the number of dumps in a different text file
The first couple of options are a little more work up front, but provide more flexibility. The last option is the easiest solution to your current problem.
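A sketch of that last option: keep the counter in its own tiny file so the multi-gigabyte dump file is never rewritten (the file name is made up):

```python
def bump_dump_count(path="dump_count.txt"):   # file name is made up
    try:
        with open(path) as f:
            count = int(f.read())
    except (IOError, ValueError):
        count = 0                 # no counter file yet
    count += 1
    with open(path, "w") as f:
        f.write(str(count))
    return count
```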
You could quite easily create a new file, write the data you wish to prepend to it, then copy the content of the existing file, append it to the new one, and rename.
This would avoid having to hold the whole file in memory at once, if that is the primary issue.
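A minimal sketch of that approach, streaming the old content in chunks so it never has to fit in RAM (the details here are one reasonable choice, not the only one):

```python
import os
import shutil

def prepend(path, text):
    tmp = path + ".tmp"   # temporary sibling file
    with open(tmp, "w") as out, open(path, "r") as src:
        out.write(text)                  # new data goes in first
        shutil.copyfileobj(src, out)     # stream old content in chunks
    os.replace(tmp, path)                # swap the new file into place
```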