writing files, reading them and call them with different moduls

writing files, reading them and call them with different moduls - python

I am trying to save variables after I run a program to a text file and read them in different module, to call them back in the original program. Point of that is to write plots with 4 different outcome of the main program.
attempt at coding
#main program
a = array([[0.05562032, 0.05386903, 0.05216994, 0.03045489, 0.03029977,
0.03014554],
[0. , 0.00175129, 0.00345037, 0.15353227, 0.1536874 ,
0.15384163]])
#save paramaters in external file
save_paramaters = open('save.txt','w')
save_paramaters.write(str(a))
save_paramaters.close()
I open the txt file in python module and save it as a variable, which I corrected manually(replacing spaces with commas)
#new program
dat = "save.txt"
b = open(dat, "r")
c = array(b.read())
In the main program I now call the variable with
a = array([[0.05562032, 0.05386903, 0.05216994, 0.03045489, 0.03029977,
0.03014554],
[0. , 0.00175129, 0.00345037, 0.15353227, 0.1536874 ,
0.15384163])
#save paramaters in external file
save_paramaters = open('save.txt','w')
save_paramaters.write(str(a))
save_paramaters.close()
#open the variable
from program import c
from matplotlib.pyplot import figure, plot
#and try to plot it
plot(c[1][:], label ='results2')
plot(c[0][:], label ='results1')
File "/Example.py", line 606, in example
plot(c[1][:], label ='results2') #model
IndexError: too many indices for array

If you want to save an array you can't just save it as text and expect python to figure it out. When you read it, you're reading it as text (as a string) and that's all your program can know.
If you want to save complex objects you have several other options:
You can save text (as you do) but parse it manually when reading it to turn it into an array. This is complex to write without bugs and will get even more complex if you have anything even more complex than an array.
You can save it using pickle - while this is a good solution for almost all objects, the file created wouldn't be readable to humans, and that's perhaps not what you want.
A good middle ground is to save objects as JSON - this is a standard for most datatypes and would work beautifully for dicts and lists and tuples (but will fail with more complex objects), and more importantly, it will be readable to humans such as yourself.
Let's say you go with JSON. You save a list like this:
import json
with open('save.txt','w') as f:
json.dump(your_object, f)
As simple as that. To read back the list:
import json
with open('save.txt','r') as f:
your_new_object = json.load(f)
This is fairly simple isn't it? Notice I used a with statement to open the files to make sure they close properly as well, but that's also more simple to write. Using pickles is fairly similar and even has the same syntax, but objects are saved as bytes and not text (so you have to use 'rb' and 'wb' modes on files to read and write, respectively).
To do the same thing with numpy array, we can also use numpy.save:
np.save('save', your_numpy_array)
And we read it back (with a npy extension) with numpy.load:
your_array = np.load('save.npy')
In readability terms, opening the file would be semi-readable (less than JSON, more than pickle)

Related

Is there any feasible solution to read WOT battle results .dat files?

I am new here to try to solve one of my interesting questions in World of Tanks. I heard that every battle data is reserved in the client's disk in the Wargaming.net folder because I want to make a batch of data analysis for our clan's battle performances.
image
It is said that these .dat files are a kind of json files, so I tried to use a couple of lines of Python code to read but failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and obviously fails to make it. Well, could anyone tell me the truth about that?
Added on Feb. 9th, 2022
After I tried another set of codes via Jupyter Notebook, it seems like something can be shown from the .dat files
import struct
import numpy as np
import matplotlib.pyplot as plt
import io
with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
fbuff = io.BufferedReader(f)
N = len(fbuff.read())
print('byte length: ', N)
with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
data =struct.unpack('b'*N, f.read(1*N))
The result is a set of tuple but I have no idea how to deal with it now.

Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb') # This can be improved to not keep the file opened.
# Converting pickle items from python2 to python3 you need to use the "bytes" encoding or "latin1".
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
The response contains a mixture of more binary files as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.

First you can look at the replay file itself in a text editor. But it won't show the code at the beginning of the file that has to be cleaned out. Then there is a ton of info that you have to read in and figure out but it is the stats for each player in the game. THEN it comes to the part that has to do with the actual replay. You don't need that stuff.
You can grab the player IDs and tank IDs from WoT developer area API if you want.

After loading the pickle files like gabzo mentioned, you will see that it is simply a list of values and without knowing what the value is referring to, its hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile
WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"
archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
if file.startswith(BATTLE_RESULTS_PATH):
archive.extract(file)
You can then decompile the python files(uncompyle6) and then go through the code to see the identifiers for the values.
One thing to note is that the list of values for the main pickle objects (like brAccount from gabzo's code) always has a checksum as the first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.

Get different strings from a file and write a .txt

I'am trying to get lines from a text file (.log) into a .txt document.
I need get into my .txt file the same data. But the line itself is sometimes different. From what I have seen on internet, it's usualy done with a pattern that will anticipate how the line is made.
1525:22Player 11 spawned with userinfo: \team\b\forcepowers\0-5-030310001013001131\ip\46.98.134.211:24806\rate\25000\snaps\40\cg_predictItems\1\char_color_blue\34\char_color_green\34\char_color_red\34\color1\65507\color2\14942463\color3\2949375\color4\2949375\handicap\100\jp\0\model\desann/default\name\Faybell\pbindicator\1\saber1\saber_malgus_broken\saber2\none\sex\male\ja_guid\420D990471FC7EB6B3EEA94045F739B7\teamoverlay\1
The line i'm working with usualy looks like this. The data i'am trying to collect are :
\ip\0.0.0.0
\name\NickName_of_the_player
\ja_guid\420D990471FC7EB6B3EEA94045F739B7
And print these data, inside a .txt file. Here is my current code.
As explained above, i'am unsure about what keyword to use for my research on google. And how this could be called (Because the string isn't the same?)
I have been looking around alot, and most of the test I have done, have allowed me to do some things, but i'am not yet able to do as explained above. So i'am in hope for guidance here :) (Sorry if i'am noobish, I understand alot how it works, I just didn't learned language in school, I mostly do small scripts, and usualy they work fine, this time it's way harder)
def readLog(filename):
with open(filename,'r') as eventLog:
data = eventLog.read()
dataList = data.splitlines()
return dataList
eventLog = readLog('games.log')

You'll need to read the files in "raw" mode rather than as strings. When reading the file from disk, use open(filename,'rb'). To use your example, I ran
text_input = r"1525:22Player 11 spawned with userinfo: \team\b\forcepowers\0-5-030310001013001131\ip\46.98.134.211:24806\rate\25000\snaps\40\cg_predictItems\1\char_color_blue\34\char_color_green\34\char_color_red\34\color1\65507\color2\14942463\color3\2949375\color4\2949375\handicap\100\jp\0\model\desann/default\name\Faybell\pbindicator\1\saber1\saber_malgus_broken\saber2\none\sex\male\ja_guid\420D990471FC7EB6B3EEA94045F739B7\teamoverlay\1"
text_as_array = text_input.split('\\')
You'll need to know which columns contain the strings you care about. For example,
with open('output.dat','w') as fil:
fil.write(text_as_array[6])
You can figure these array positions from the sample string
>>> text_as_array[6]
'46.98.134.211:24806'
>>> text_as_array[34]
'Faybell'
>>> text_as_array[44]
'420D990471FC7EB6B3EEA94045F739B7'
If the column positions are not consistent but the key-value pairs are always adjacent, we can leverage that
>>> text_as_array.index("ip")
5
>>> text_as_array[text_as_array.index("ip")+1]
'46.98.134.211:24806'

Increase speed numpy.loadtxt?

I have hundred of thousands of data text files to read. As of now, I'm importing the data from text files every time I run the code. Perhaps the easy solution would be to simply reformat the data into a file faster to read.
Anyway, right now every text files I have look like:
User: unknown
Title : OE1_CHANNEL1_20181204_103805_01
Sample data
Wavelength OE1_CHANNEL1
185.000000 27.291955
186.000000 27.000877
187.000000 25.792290
188.000000 25.205620
189.000000 24.711882
.
.
.
The code where I read and import the txt files is:
# IMPORT DATA
path = 'T2'
if len(sys.argv) == 2:
path = sys.argv[1]
files = os.listdir(path)
trans_import = []
for index, item in enumerate(files):
trans_import.append(np.loadtxt(path+'/'+files[1], dtype=float, skiprows=4, usecols=(0,1)))
The resulting array looks in the variable explorer as:
{ndarray} = [[185. 27.291955]\n [186. 27.000877]\n ... ]
I'm wondering, how I could speed up this part? It takes a little too long as of now just to import ~4k text files. There are 841 lines inside every text files (spectrum). The output I get with this code is 841 * 2 = 1682. Obviously, it considers the \n as a line...

It would probably be much faster if you had one large file instead of many small ones. This is generally more efficient. Additionally, you might get a speedup from just saving the numpy array directly and loading that .npy file in instead of reading in a large text file. I'm not as sure about the last part though. As always when time is a concern, I would try both of these options and then measure the performance improvement.
If for some reason you really can't just have one large text file / .npy file, you could also probably get a speedup by using, e.g., multiprocessing to have multiple workers reading in the files at the same time. Then you can just concatenate the matrices together at the end.
Not your primary question but since it seems to be an issue - you can rewrite the text files to not have those extra newlines, but I don't think np.loadtxt can ignore them. If you're open to using pandas, though, pandas.read_csv with skip_blank_lines=True should handle that for you. To get a numpy.ndarray from a pandas.DataFrame, just do dataframe.values.

Let use pandas.read_csv (with C speed) instead of numpy.loadtxt. This is a very helpful post:
http://akuederle.com/stop-using-numpy-loadtxt

How to read file to import matrix?

I need to use some matrices in Python programs, like
Q = np.matrix([[1,0,1,1,0],
[0,2,0,1,1],
[1,0,2,0,1],
[1,1,0,1,0],
[0,1,1,0,1]])
and I want to import the matrix (use numpy) from a file, so what should I do to realize it? what code should I write and what file should I use (.txt?). I am quite new to python, anyone can help me? Thank you in advance.

I'm assuming that you're not only importing the matrices, but also exporting them to files in the first place.
If that's true, there are multiple easy options, with different tradeoffs.
np.save saves the array in a binary format that's only usable by NumPy. But it's very fast, and generates reasonably small files.
np.save('matrix.npy', Q)
Q = np.load('matrix.npy')
np.savetxt saves the array in a text file, using a dialect of CSV (with whitespace separators, by default). It's slower, and generates bigger files, but if you want to be able to read or edit the files (or send them through an ASCII-only channel, like email without attachments), it's the best option.
np.savetxt('matrix.txt', Q)
Q = np.loadtxt('matrix.txt')
np.savetxt can also save the array in a compressed text file. This gives you small files, but they're slower to save and load. They're not directly human-readable, but it's very easy to un-gzip a file, and then you've got a text file you can read and edit. So, sometimes this is worth doing.
np.savetxt('matrix.txt.gz', Q)
Q = np.loadtxt('matrix.txt.gz')
Finally, you can just use standard Python saving and loading mechanisms, like pickle:
with open('matrix.pickle', 'wb') as f:
pickle.dump(Q, f)
with open('matrix.pickle', 'rb') as f:
Q = pickle.load(f)
This is really only useful if you need to store NumPy arrays together with non-NumPy objects.
If you have to save multiple matrices, instead of saving one per file, you might want to look at savez and savez_compressed. Or, if you need multiple objects, only some of which are NumPy, pickle may be the best option.

get specific content from file python

I have a file test.txt which has an array:
array = [3,5,6,7,9,6,4,3,2,1,3,4,5,6,7,8,5,3,3,44,5,6,6,7]
Now what I want to do is get the content of array and perform some calculations with the array. But the problem is when I do open("test.txt") it outputs the content as the string. Actually the array is very big, and if I do a loop it might not be efficient. Is there any way to get the content without splitting , ? Any new ideas?

I recommend that you save the file as json instead, and read it in with the json module. Either that, or make it a .py file, and import it as python. A .txt file that looks like a python assignment is kind of odd.

Does your text file need to look like python syntax? A list of comma separated values would be the usual way to provide data:
1,2,3,4,5
Then you could read/write with the csv module or the numpy functions mentioned above. There's a lot of documentation about how to read csv data in efficiently. Once you had your csv reader data object set up, data could be stored with something like:
data = [ map( float, row) for row in csvreader]

If you want to store a python-like expression in a file, store only the expression (i.e. without array =) and parse it using ast.literal_eval().
However, consider using a different format such as JSON. Depending on the calculations you might also want to consider using a format where you do not need to load all data into memory at once.

Must the array be saved as a string? Could you use a pickle file and save it as a Python list?
If not, could you try lazy evaluation? Maybe only process sections of the array as needed.
Possibly, if there are calculations on the entire array that you must always do, it might be a good idea to pre-compute those results and store them in the txt file either in addition to the list or instead of the list.

You could also use numpy to load the data from the file using numpy.genfromtxt or numpy.loadtxt. Both are pretty fast and both have the ability to do the recasting on load. If the array is already loaded though, you can use numpy to convert it to an array of floats, and that is really fast.
import numpy as np
a = np.array(["1", "2", "3", "4"])
a = a.astype(np.float)

You could write a parser. They are very straightforward. And much much faster than regular expressions, please don't do that. Not that anyone suggested it.
# open up the file (r = read-only, b = binary)
stream = open("file_full_of_numbers.txt", "rb")
prefix = '' # end of the last chunk
full_number_list = []
# get a chunk of the file at a time
while True:
# just a small 1k chunk
buffer = stream.read(1024)
# no more data is left in the file
if '' == buffer:
break
# delemit this chunk of data by a comma
split_result = buffer.split(",")
# append the end of the last chunk to the first number
split_result[0] = prefix + split_result[0]
# save the end of the buffer (a partial number perhaps) for the next loop
prefix = split_result[-1]
# only work with full results, so skip the last one
numbers = split_result[0:-1]
# do something with the numbers we got (like save it into a full list)
full_number_list += numbers
# now full_number_list contains all the numbers in text format
You'll also have to add some logic to use the prefix when the buffer is blank. But I'll leave that code up to you.

OK, so the following methods ARE dangerous. Since they are used to attack systems by injecting code into them, used them at your own risk.
array = eval(open("test.txt", 'r').read().strip('array = '))
execfile('test.txt') # this is the fastest but most dangerous.
Safer methods.
import ast
array = ast.literal_eval(open("test.txt", 'r').read().strip('array = ')).
...
array = [float(value) for value in open('test.txt', 'r').read().strip('array = [').strip('\n]').split(',')]
The eassiest way to serialize python objects so you can load them later is to use pickle. Assuming you dont want a human readable format since this adds major head, either-wise, csv is fast and json is flexible.
import pickle
import random
array = random.sample(range(10**3), 20)
pickle.dump(array, open('test.obj', 'wb'))
loaded_array = pickle.load(open('test.obj', 'rb'))
assert array == loaded_array
pickle does have some overhead and if you need to serialize large objects you can specify the compression ratio, the default is 0 no compression, you can set it to pickle.HIGHEST_PROTOCOL pickle.dump(array, open('test.obj', 'wb'), pickle.HIGHEST_PROTOCOL)
If you are working with large numerical or scientific data sets then use numpy.tofile/numpy.fromfile or scipy.io.savemat/scipy.io.loadmat they have little overhead, but again only if you are already using numpy/scipy.
good luck.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

writing files, reading them and call them with different moduls - python

Related

Is there any feasible solution to read WOT battle results .dat files?

Get different strings from a file and write a .txt

Increase speed numpy.loadtxt?

How to read file to import matrix?

get specific content from file python

Categories

Resources