I'm currently developing a text-based adventure game, and I've implemented a basic saving system.
The process uses the 'pickle' module. It creates or appends to a file with a custom extension (which is, in reality, a text file).
The engine pickles the player's location, their inventory, and, well, the last part is where it gets a little weird.
The game loads dialog from a specially formatted script (script as in an actor's script). Some dialog changes based on certain conditions (whether you've already spoken to the character, a new event, etc.). So, for that third object, the engine saves ALL dialog trees in their current positions; that is, it saves the literal script in its current state.
Here is the saving routine:
import pickle

with open('save.devl', 'wb') as file:
    # First record: the player's position, inventory, and the dialog trees.
    pickle.dump((current_pos, player_inv, dia_dict), file)
    # Then one record per room.
    for room in save_map:
        pickle.dump(room, file)
# (The 'with' block closes the file; an explicit file.close() is unnecessary.)
My problem is that this process makes a very ugly, very verbose, very large text file. Now, I know text files are about the smallest files I can generate, but I was wondering if there is any way to compress the save, or otherwise make recording the state of everything in the game more efficient. Or, less preferably but better in the long run, a smarter way to save the player's data.
The format of dialog was requested. Here is a sample:
[Isaac]
a: Hello.|1. I'm Evan.|b|
b: Nice to meet you.|1. Where are you going?\2.Goodbye.|c,closer|
c: My cousin's wedding.|1. Interesting. Where are you from?\2. What do you know about the ship?\3. Goodbye.|e,closer|
closer: See you later.||break|
e: It's the WPT Magnus. Cruise-class zeppelin. Been in service for about three years, I believe.||c|
standing: Hello, again.|1. What do you know about the ship?\2.Goodbye.|e,closer|
The name in brackets is how the program identifies which tree to call. Each letter is a separate branch in the tree. The bars separate the branch into three parts: 1. What the character says 2. The responses you are allowed 3. Where each response goes, or if the player doesn't respond, where the player is directed afterwards.
In this example, after the player has talked to Isaac, the 'a' branch is erased from the copy of the tree that the game stores in memory. It then permanently uses the 'standing' branch.
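(For reference, a branch line in this format could be parsed roughly like the sketch below; this is illustrative only, not the engine's actual code.)
def parse_branch(line):
    # "a: Hello.|1. I'm Evan.|b|" -> ("a", "Hello.", ["1. I'm Evan."], ["b"])
    branch_id, rest = line.split(":", 1)
    text, responses, targets, _ = rest.strip().split("|")
    return (
        branch_id.strip(),
        text,
        [r for r in responses.split("\\") if r],  # responses are separated by "\"
        targets.split(","),                       # follow-up branches by ","
    )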
Pickle has several protocols beyond protocol 0 (the default in Python 2, and the only "text based" one); all the others are binary and more compact.
But even then, you would hardly get more than a 50% reduction in file size. To improve on that, we need to know more about what you are saving, and whether there are smarter ways to save your data - for example, by avoiding repeating the same sub-structure if it is present in several of your rooms. (Although if you are using object identity inside your game, pickle should take care of that.)
That said, just change your pickle.dump calls to include the protocol parameter - the value -1 is equivalent to "HIGHEST_PROTOCOL", which is usually the most efficient:
pickle.dump(room,file, protocol=-1)
(Loading the pickles does not require the protocol to be passed at all.)
Additionally, you might want to use Python's zlib module to compress the pickle data. That could give you another 20-30% file size reduction. You have to chain the calls to file.write, zlib.compress and pickle.dumps, so it is easier with a small helper; you also need to record each chunk's size, because unlike pickle, zlib does not advance the file pointer for you:
import pickle, zlib

def store_obj(file_, obj):
    # Pickle with the most efficient protocol, then compress.
    compressed = zlib.compress(pickle.dumps(obj, protocol=-1), level=9)
    # Prefix each chunk with its size so it can be read back later.
    file_.write(len(compressed).to_bytes(4, "little"))
    file_.write(compressed)

def get_obj(file_):
    obj_size = int.from_bytes(file_.read(4), "little")
    if obj_size == 0:
        # Either an empty chunk or end of file.
        return None
    data = zlib.decompress(file_.read(obj_size))
    return pickle.loads(data)
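With those helpers in place, the saving routine from the question could look roughly like this (a sketch; it reuses the variable names from the question):
with open('save.devl', 'wb') as file:
    store_obj(file, (current_pos, player_inv, dia_dict))
    for room in save_map:
        store_obj(file, room)

# Loading it back:
with open('save.devl', 'rb') as file:
    current_pos, player_inv, dia_dict = get_obj(file)
    save_map = []
    while True:
        room = get_obj(file)
        if room is None:  # end of file
            break
        save_map.append(room)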
I read the following in Working Effectively with Legacy Code (the book): "Unit tests run fast. If they don't run fast, they aren't unit tests. A test is not a unit test if: 1. It talks to a database. 2. It communicates across a network. 3. It touches the file system. 4. You have to do special things to your environment (such as editing configuration files) to run it."
I have a function that downloads a zip from the internet and converts it into Python objects of a particular class.
import typing as t

import requests

def get_book_objects(date: str) -> t.List[Book]:
    # download the zip with the date from the endpoint
    res = requests.get(f"HTTP-URL-{date}")
    # code to read the response content in BytesIO and then use the ZipFile module
    # to extract data.
    # parse the data and return a list of Book objects
    return books
Let's say I want to write a unit test for get_book_objects. How am I supposed to write one without making a network request? I would prefer a file system read over a network request, since it will be much faster, and although a good unit test supposedly doesn't touch the file system either, I'd be fine with that.
So even if I write a unit test that provides a local zip file, I either have to modify the existing function to open the file from the local file system, or add an extra parameter so the test can pass in a zip file path.
What will you do to write a good unit test in this kind of situation?
In the TDD world, the usual answer would be to delegate the work to a more easily tested component.
Consider:
def get_book_objects(date: str) -> t.List[Book]:
    # This is the piece that makes get_book_objects hard
    # to isolate
    http_get = requests.get

    # download the zip with the date from the endpoint
    res = http_get(f"HTTP-URL-{date}")
    # code to read the response content in BytesIO and then use the ZipFile module
    # to extract data.
    # parse the data and return a list of Book objects
    return books
which might then become something like
def get_book_objects(date: str) -> t.List[Book]:
    # This is the piece that makes get_book_objects hard
    # to isolate
    http_get = requests.get
    return get_book_objects_v2(http_get, date)

def get_book_objects_v2(http_get, date: str) -> t.List[Book]:
    # download the zip with the date from the endpoint
    res = http_get(f"HTTP-URL-{date}")
    # code to read the response content in BytesIO and then use the ZipFile module
    # to extract data.
    # parse the data and return a list of Book objects
    return books
get_book_objects is still hard to test, but it is also "so simple that there are obviously no deficiencies". On the other hand, get_book_objects_v2 is easy to test, because your test can control what callable is passed to the subject, and can use any reasonable substitute you like.
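For example, a test might look something like this (a sketch only: it assumes get_book_objects_v2 reads res.content as zip bytes, and the fake response, zip layout and assertions are made up for illustration):
import io
import zipfile

# from mymodule import get_book_objects_v2  # wherever the function lives (hypothetical)

class FakeResponse:
    def __init__(self, content: bytes):
        self.content = content

def make_zip_bytes() -> bytes:
    # Build a tiny in-memory zip; the entry name and contents are invented.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("books.csv", "title,author\nDune,Herbert\n")
    return buf.getvalue()

def test_get_book_objects_v2():
    calls = []
    def fake_get(url):
        calls.append(url)
        return FakeResponse(make_zip_bytes())

    books = get_book_objects_v2(fake_get, "2021-01-01")

    assert calls == ["HTTP-URL-2021-01-01"]
    assert isinstance(books, list)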
What we've done is shift most of the complexity/risk into a "unit" that is easier to test. For the function that is still hard to test, we'll use other techniques.
When authors talk about tests "driving" the design, this is one example - we're treating "complicated code needs to be easy to test" as a constraint on our design.
You've already identified the correct reference (Working Effectively with Legacy Code). The material you want is the discussion of seams.
A seam is a place where you can alter behavior in your program without editing in that place.
(In my edition of the book, the discussion begins in Chapter 4).
I'm coding a text game in Python 3.4, and when I thought about adding a save-game feature, this question came up:
How can I jump to the place that the player stopped?
My friends and I are making a simple game, and I just want to jump to a certain part of the code; I can't do that without making around 15 copies of the code. So, can I jump to a line?
You can do that using something like python-goto, but it is a very bad idea.
In Python, you rarely have any real reason to use a goto.
A much better way is to save the structure containing your data with something like pickle and load it back when the player wants to resume the game.
For instance:
import pickle

game_data = {'something': [1, 2, 3]}
with open('file.bin', 'wb') as f:
    pickle.dump(game_data, f)
Then, you can load the data back:
import pickle

with open('file.bin', 'rb') as f:
    game_data = pickle.load(f)
There is no goto built into Python. There are ways to effectively 'halt' in a method by using yield and creating a generator, which is essentially how Python coroutines work (see the asyncio module), but this isn't really appropriate for your needs.
For saving game state, serialising the state you need to resume gameplay in a more general way is a much better idea. You could use pickle for this serialisation.
You need to consider the game-state as something that you can assign a value (or values) to. If this is a very simple text game, then the player will have a location, and that location will presumably be something you can "jump" to via use of a reference.
Let's say your code follows this pseudo-code pattern:
player_location = 0
print_start_game_text()

while True:
    display_text_for_location(player_location)
    display_options_for_location(player_location)
    player_location = parse_player_response(response_options_for_location[player_location])
    if is_game_end_condition(player_location):
        break

print_end_game_text()
This pattern would reference a data file that, for each location, provides a collection such as: 1, "You are in a room; doors are [E]ast and [W]est. You can [S]ave your game, or [L]oad a previously saved one.", { "E": 3, "W": 2, "S": "savegame", "L": "loadgame" }
A function then displays the options, collects the user's response, and parses it, returning a single value: the next location. That value becomes the new key referencing the next element in the data file, as sketched below.
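For illustration, that data might deserialize into something like this (the names and structure are assumptions):
# Hypothetical location table keyed by location id:
# (text, {player response: next location id or action})
locations = {
    1: (
        "You are in a room; doors are [E]ast and [W]est. "
        "You can [S]ave your game, or [L]oad a previously saved one.",
        {"E": 3, "W": 2, "S": "savegame", "L": "loadgame"},
    ),
}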
If your game is as simple as this, then your save file need only contain a single reference: the player's location. Simple.
If you have objects that the player can manipulate, then you'll need to figure out a way to keep track of those, their locations, or state-values - it all depends on what your game does, and how it's played.
You should be thinking along these program vs data lines though, as it will make the game much easier to design, and later, extend, since all you'd have to do to create a new adventure, or level, is provide a new datafile.
I'm seeking advice about methods of implementing object persistence in Python. To be more precise, I wish to be able to link a Python object to a file in such a way that any Python process that opens a representation of that file shares the same information, any process can change its object and the changes will propagate to the other processes, and even if all processes "storing" the object are closed, the file will remain and can be re-opened by another process.
I found three main candidates for this in my distribution of Python - anydbm, pickle, and shelve (dbm appeared to be perfect, but it is Unix-only, and I am on Windows). However, they all have flaws:
anydbm can only handle a dictionary of string values (I'm seeking to store a list of dictionaries, all of which have string keys and string values, though ideally I would seek a module with no type restrictions)
shelve requires that a file be re-opened before changes propagate - for instance, if two processes A and B load the same file (containing a shelved empty list), and A adds an item to the list and calls sync(), B will still see the list as being empty until it reloads the file.
pickle (the module I am currently using for my test implementation) has the same "reload requirement" as shelve, and also does not overwrite previous data - if process A dumps fifteen empty strings onto a file, and then the string 'hello', process B will have to load the file sixteen times in order to get the 'hello' string. I am currently dealing with this problem by preceding any write operation with repeated reads until end of file ("wiping the slate clean before writing on it"), and by making every read operation repeated until end of file, but I feel there must be a better way.
My ideal module would behave as follows (with "A>>>" representing code executed by process A, and "B>>>" code executed by process B):
A>>> import imaginary_perfect_module as mod
B>>> import imaginary_perfect_module as mod
A>>> d = mod.load('a_file')
B>>> d = mod.load('a_file')
A>>> d
{}
B>>> d
{}
A>>> d[1] = 'this string is one'
A>>> d['ones'] = 1 #anydbm would sulk here
A>>> d['ones'] = 11
A>>> d['a dict'] = {'this dictionary' : 'is arbitrary', 42 : 'the answer'}
B>>> d['ones'] #shelve would raise a KeyError here, unless A had called d.sync() and B had reloaded d
11 #pickle (with different syntax) would have returned 1 here, and then 11 on next call
(etc. for B)
I could achieve this behaviour by creating my own module that uses pickle, and editing the dump and load behaviour so that they use the repeated reads I mentioned above - but I find it hard to believe that this problem has never occurred to, and been fixed by, more talented programmers before. Moreover, these repeated reads seem inefficient to me (though I must admit that my knowledge of operation complexity is limited, and it's possible that these repeated reads are going on "behind the scenes" in otherwise apparently smoother modules like shelve). Therefore, I conclude that I must be missing some code module that would solve the problem for me. I'd be grateful if anyone could point me in the right direction, or give advice about implementation.
Use the ZODB (the Zope Object Database) instead. Backed with ZEO it fulfills your requirements:
Transparent persistence for Python objects
ZODB uses pickles underneath so anything that is pickle-able can be stored in a ZODB object store.
Full ACID-compatible transaction support (including savepoints)
This means changes from one process propagate to all the other processes when they are good and ready, and each process has a consistent view on the data throughout a transaction.
ZODB has been around for over a decade now, so you are right in surmising this problem has already been solved before. :-)
The ZODB lets you plug in storages; the most common format is FileStorage, which stores everything in one Data.fs file, with an optional blob storage for large objects.
Some ZODB storages are wrappers around others to add functionality; DemoStorage, for example, keeps changes in memory to facilitate unit testing and demonstration setups (restart and you have a clean slate again). BeforeStorage gives you a window in time, only returning data from transactions before a given point in time. The latter has been instrumental in recovering lost data for me.
ZEO is such a plugin that introduces a client-server architecture. Using ZEO lets you access a given storage from multiple processes at a time; you won't need this layer if all you need is multi-threaded access from one process only.
The same could be achieved with RelStorage, which stores ZODB data in a relational database such as PostgreSQL, MySQL or Oracle.
For beginners: you can port your shelve databases to ZODB databases like this:
#!/usr/bin/env python
import sys
import shelve
import ZODB, ZODB.FileStorage
import transaction
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-i", "--input", dest="in_file", default=False,
                  help="original shelve database filename")
parser.add_option("-o", "--output", dest="out_file", default=False,
                  help="new ZODB database filename")
options, args = parser.parse_args()

if not options.in_file or not options.out_file:
    print("Need input and output database filenames")
    sys.exit(1)

db = shelve.open(options.in_file, writeback=True)
zstorage = ZODB.FileStorage.FileStorage(options.out_file)
zdb = ZODB.DB(zstorage)
zconnection = zdb.open()
newdb = zconnection.root()

for key, value in db.items():
    print("Copying key: " + str(key))
    newdb[key] = value

transaction.commit()
I suggest using TinyDB; it's much simpler to use.
https://tinydb.readthedocs.io/en/stable/
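For example, basic usage looks roughly like this (the file name and fields are illustrative; note that plain TinyDB, like shelve, does not coordinate concurrent access from several processes):
from tinydb import TinyDB, Query

db = TinyDB("data.json")                 # stored as a JSON file on disk
db.insert({"ones": 11, "type": "demo"})

Item = Query()
print(db.search(Item.ones == 11))        # -> [{'ones': 11, 'type': 'demo'}]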
I have a game I've been working on for a while. The core is C++, but I'm using Python for scripting and for Attacks/StatusEffects/Items etc.
I've kind of coded myself into a corner where I'm having to use eval to get the behaviour I want. Here's how it arises:
I have an xml document that I use to spec attacks, like so:
<Attack name="Megiddo Flare" mp="144" accuracy="1.5" targetting="EnemyParty">
<Flags>
<IgnoreElements/>
<Unreflectable/>
<ConstantDamage/>
<LockedTargetting/>
</Flags>
<Components>
<ElementalWeightComponent>
<Element name="Fire" weight="0.5"/>
</ElementalWeightComponent>
<ConstantDamageCalculatorComponent damage="9995" index="DamageCalculatorComponent"/>
</Components>
</Attack>
I parse this file in Python and build my Attacks. Each Attack consists of any number of Components that implement different behaviour. In this Attack's case, I specify a DamageCalculatorComponent, telling Python to use the ConstantDamage variant. I implement all these components in my script files. This is all well and good for component types I'm going to use often. But some attacks will be the only attack to use a particular Component variant. Rather than adding the component to my script files, I wanted to be able to specify the Component class in the XML file itself.
For instance, If I were to implement the classic White Wind attack from Final Fantasy (restores team HP by the amount of HP of the attacker)
<Attack name="White Wind" mp="41" targetting="AnyParty">
<Flags>
<LockedTargetting/>
</Flags>
<Components>
<CustomComponent index="DamageCalculatorComponent">
<![CDATA[
class WhiteWindDamageComponent(DamageCalculatorComponent):
def __init__(self, Owner):
DamageCalculatorComponent.__init__(self, Owner)
def CalculateDamage(self, Action, Mechanics):
Dmg = 0
character = Action.GetUsers().GetFirst()
SM = character.GetComponent("StatManagerComponent")
if (SM != None):
Dmg = -SM.GetCurrentHP()
return Dmg
return WhiteWindDamageComponent(Owner)
]]>
</CustomComponent>
</Components>
</Attack>
I was wondering if there might be a better way to do this. The only other way I can see is to put every possible Component variant definition into my Python files and expand my Component creators to check for the additional variants, which seems a bit wasteful for a single-use Component. Is there a better/safer alternative to generating types dynamically, or perhaps another solution I'm not seeing?
Thanks in advance
Inline Python is bad because:
Your source code files cannot be understood by Python source code editors, so you lose syntax highlighting.
Other tools that lint and validate source code, like pylint, will also fail.
Alternative
To the element <CustomComponent index="DamageCalculatorComponent"> add a script parameter:
<CustomComponent index="DamageCalculatorComponent" script="damager.py">
Then add the file damager.py somewhere on the file system.
Load it as described here: What is an alternative to execfile in Python 3?
In your main game engine code, construct the class from the loaded module, like:
damager_class = sys.modules["mymodulename"].my_factory_function()
All such Python modules must share some kind of agreed class names / entry point functions; a sketch of the loading step follows.
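A minimal sketch of that loading step with importlib (the path and factory function name are assumptions following the example above):
import importlib.util

def load_component_module(path, module_name):
    # Python 3 replacement for execfile: load a source file as a module.
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Hypothetical usage, matching the XML example above:
# damager = load_component_module("scripts/damager.py", "damager")
# damager_class = damager.my_factory_function()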
If you want a really pluggable architecture, use Python eggs and setup.py entry points:
http://wiki.pylonshq.com/display/pylonscookbook/Using+Entry+Points+to+Write+Plugins
Example
https://github.com/miohtama/vvv/blob/master/setup.py
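A rough sketch of how such an entry point might be declared and discovered (the group name, package, module and function names are all made up for illustration):
# setup.py of a hypothetical plugin package
from setuptools import setup

setup(
    name="mygame-extra-components",
    version="0.1",
    py_modules=["white_wind"],
    entry_points={
        "mygame.damage_components": [
            "WhiteWind = white_wind:make_component",
        ],
    },
)

# In the engine: discover every registered component without knowing its module.
import pkg_resources

components = {
    ep.name: ep.load()
    for ep in pkg_resources.iter_entry_points("mygame.damage_components")
}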
I read many files from my system. I want to read them faster, maybe like this:
results = []
for file in open("filenames.txt").readlines():
    results.append(open(file, "r").read())
I don't want to use threading. Any advice is appreciated.
The reason I don't want to use threads is that they would make my code unreadable; I want to find some trick to make it faster, with less code and easier to understand.
Yesterday I tested another solution with multiprocessing; it worked badly and I don't know why.
Here is the code:
def xml2db(file):
    s = pq(open(file, "r").read())
    dict = {}
    for field in g_fields:
        dict[field] = s("field[#name='%s']" % field).text()
    p = Product()
    for k, v in dict.iteritems():
        if v is None or v.strip() == "":
            pass
        else:
            if hasattr(p, k):
                setattr(p, k, v)
    session.commit()

#cost_time
#statistics_db
def batch_xml2db():
    from multiprocessing import Pool, Queue
    p = Pool(5)
    #q = Queue()
    files = glob.glob(g_filter)
    #for file in files:
    #    q.put(file)
    def P():
        while q.qsize() <> 0:
            xml2db(q.get())
    p.map(xml2db, files)
    p.join()
results = [open(f.strip()).read() for f in open("filenames.txt").readlines()]
This may be insignificantly faster, but it's probably less readable (depending on the reader's familiarity with list comprehensions).
Your main problem here is that your bottleneck is disk IO - buying a faster disk will make much more of a difference than modifying your code.
Well, if you want to improve performance then improve the algorithm, right? What are you doing with all this data? Do you really need it all in memory at the same time, potentially causing OOM if filenames.txt specifies too many files, or files that are too large?
If you're doing this with lots of files, I suspect you are thrashing, hence your 700+ second run time. Even my poor little HD can sustain 42 MB/s writes (42 MB/s * 714 s ≈ 30 GB). Take that with a grain of salt, knowing you must both read and write, but I'm guessing you don't have over 8 GB of RAM available for this application. A related SO question/answer suggested you use mmap, and the answer above that suggested an iterative/lazy read, like what you get in Haskell for free. These are probably worth considering if you really do have tens of gigabytes to munge.
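A rough sketch of such a lazy read in Python (process() is just a placeholder for whatever you do with each file):
def iter_file_contents(list_path):
    # Yield one file's contents at a time instead of building a huge list.
    with open(list_path) as name_file:
        for name in name_file:
            name = name.strip()
            if name:
                with open(name) as f:
                    yield f.read()

for contents in iter_file_contents("filenames.txt"):
    process(contents)   # hypothetical per-file processing step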
Is this a one-off requirement or something that you need to do regularly? If it's something you're going to be doing often, consider using MySQL or another database instead of a file system.
Not sure if this is still the code you are using.
A couple adjustments I would consider making.
Original:
def xml2db(file):
    s = pq(open(file, "r").read())
    dict = {}
    for field in g_fields:
        dict[field] = s("field[#name='%s']" % field).text()
    p = Product()
    for k, v in dict.iteritems():
        if v is None or v.strip() == "":
            pass
        else:
            if hasattr(p, k):
                setattr(p, k, v)
    session.commit()
Updated:
Remove the use of the dict; it costs extra object creation, iteration and collection.
def xml2db(file):
    s = pq(open(file, "r").read())
    p = Product()
    for k in g_fields:
        v = s("field[#name='%s']" % k).text()
        if v is None or v.strip() == "":
            pass
        else:
            if hasattr(p, k):
                setattr(p, k, v)
    session.commit()
You could profile the code using the Python profiler; this might tell you where the time is being spent.
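For example, a quick profiling run might look like this (it assumes batch_xml2db from your code; the output file name is arbitrary):
import cProfile, pstats

cProfile.run("batch_xml2db()", "xml2db.prof")
stats = pstats.Stats("xml2db.prof")
stats.sort_stats("cumulative").print_stats(20)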
It may be in session.commit(); if so, the commit may need to be reduced to once every couple of files. I have no idea what it does, so that is really a stab in the dark; you might also try running it without sending or writing any output.
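If the commit does turn out to be the problem, one option is batching it; a sketch (this assumes xml2db is changed to stop calling session.commit() per file, and the batch size is arbitrary):
COMMIT_EVERY = 50

def batch_xml2db():
    files = glob.glob(g_filter)
    for i, f in enumerate(files, 1):
        xml2db(f)                 # assumes xml2db no longer commits itself
        if i % COMMIT_EVERY == 0:
            session.commit()
    session.commit()              # commit whatever is left over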
If you can separate your code into reading, processing and writing stages, you can test the cost of each stage individually:
A) Reading cost: see how long it takes to read all the files.
B) Processing cost: load a single file into memory, then process it enough times to represent the entire job without any extra read IO.
C) Output cost: save a whole bunch of sessions representative of your job size.
This should show you what is taking the most time and whether any improvement can be made in any area.