I'm making a script that fetches movies and shows from different streaming services. I need functionality where, if a title is available on a streaming platform (e.g. Paramount) in both 4K and HD, only the 4K result is shown.
However, if the title is only available for purchase, I want to exclude it from the results.
resp = {
    # Fetches JSON response as dict from server
    # which contains offers as a list of dictionaries
    "offers": [
        {
            "monetization_type": "flatrate",
            "package_short_name": "pmp",
            "presentation_type": "4k",
        },
        {
            "monetization_type": "flatrate",
            "package_short_name": "pmp",
            "presentation_type": "hd",
        },
        {
            "monetization_type": "flatrate",
            "package_short_name": "fxn",
            "presentation_type": "hd",
        },
        {
            "monetization_type": "buy",
            "package_short_name": "itu",
            "presentation_type": "4k",
        },
    ]
}
def get_Stream_info(obj, results=[]):
    try:
        if obj["offers"]:
            count = 0
            for i in range(len(obj["offers"])):
                srv = obj["offers"][i]["monetization_type"]
                qty = obj["offers"][i]["presentation_type"]
                pkg = obj["offers"][i]["package_short_name"]
                if srv == "flatrate" and qty in ["4k", "hd"]:
                    results.append(f"Stream [{i+1}]: US - {pkg} - {qty}")
                    count = 1
                else:
                    errstr = "No streaming options available."
            if count == 0:
                results.append(errstr)
    except KeyError:
        results.append("Not available.")
    return "\n".join(results)

if __name__ == "__main__":
    print(get_Stream_info(resp))
Result:
Stream [1]: US - pmp - 4k # ParamountPlus
Stream [2]: US - pmp - hd # ParamountPlus
Stream [3]: US - fxn - hd # FoxNow
4K and HD are both available on ParamountPlus, but I only want to print the 4K stream,
and HD on every other service where 4K isn't available.
What if you create a dictionary that ranks the qualities? That could be useful if you later have streams in SD or other formats, and it means you always show only the best-quality stream from each service with minimal code:
qty_ratings = {
    '4k': 1,
    'hd': 2,
    'sd': 3
}
Then append only the highest-quality stream from each service (here qty is the collection of qualities a single service offers):
if monetize == 'flatrate':
    # If the same service offers more than one quality, keep only the best-rated one
    if len(qty) > 1:
        qty = min(qty, key=lambda x: qty_ratings[x])
    else:
        qty = qty[0]
    result.append(f"Stream [{i}]: US - {service} - {qty}")
return "\n".join(result)
Personally I would explicitly sort the streams, not because it's more efficient, but because it makes it clear what I'm doing.
Assuming your offers are defined as in the question:
ranking = ("sd", "hd", "4k")

def get_best_stream(streams):
    return max(streams, key=lambda x: ranking.index(x["presentation_type"]))

get_best_stream(offers)
As @Gustaf notes, this is forward compatible. I've used a tuple rather than a dict since we really only care about order (perhaps even more explicit would be an enum).
If you want to keep one offer from every source, I would encode this explicitly:
def get_streams(streams):
    sources = set(x["package_short_name"] for x in streams)
    return [
        get_best_stream(s for s in streams if s["package_short_name"] == source)
        for source in sources
    ]

get_streams(offers)
Of course if you have billions of streams it would be more efficient to build a mapping between source and offers, but for a few dozen the cost of an iterator is trivial.
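Putting the two answers together for the original question, here is a minimal sketch (assuming the resp dict from the question): it keeps only flatrate offers, so purchase-only titles drop out, and then keeps the best quality per service:
ranking = ("sd", "hd", "4k")

def best_streams(resp):
    # drop purchase-only offers entirely; keep only subscription ("flatrate") streams
    flatrate = [o for o in resp.get("offers", []) if o["monetization_type"] == "flatrate"]
    best = {}
    for offer in flatrate:
        pkg = offer["package_short_name"]
        # keep the highest-ranked presentation_type seen so far for each service
        if (pkg not in best
                or ranking.index(offer["presentation_type"]) > ranking.index(best[pkg]["presentation_type"])):
            best[pkg] = offer
    return [f"US - {o['package_short_name']} - {o['presentation_type']}" for o in best.values()]

print("\n".join(best_streams(resp)))
# US - pmp - 4k
# US - fxn - hd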
I am using Python 3.7 with SimPy 4. I have 4 Resources (say "First Level") with a capacity of 5, and each Resource has an associated Resource (say "Second Level") with a capacity of 1 (so 4 "First Level" Resources and 4 "Second Level" Resources in total). When an agent arrives, it requests any one of the "First Level" Resources; once it gets access, it then requests the associated "Second Level" Resource.
I am using AnyOf to choose any of the "First Level" Resources. It works, but I need to know which Resource is chosen by which agent. How can I do that?
Here is a representation of what I am doing so far:
import simpy
from simpy.events import AnyOf, Event

num_FL_Resources = 4
capacity_FL_Resources = 5
FL_Resources = [simpy.Resource(env, capacity=capacity_FL_Resources) for i in range(num_FL_Resources)]
events = [FirstLevelResource.request() for FirstLevelResource in FL_Resources]
yield AnyOf(env, events)
Note 1: I didn't use a Store or FilterStore for the "First Level" and randomly assign the agent to one of the available Stores, because agents keep coming and all of the Stores might be in use; they need to queue up. Also, please let me know if there is a good way of using a Store here.
Note 2: Resource.users gives me <Request() object at 0x...> so it isn't helpful.
Note 3: I am using a nested dictionary for the "First Level" and "Second Level" Resources, like below. However, for brevity I didn't add my longer code here.
{'Resource1': {'FirstLevel1': <simpy.resources.resource.Resource at 0x121f45690>,
'SecondLevel1': <simpy.resources.resource.Resource at 0x121f45710>},
'Resource2': {'FirstLevel2': <simpy.resources.resource.Resource at 0x121f457d0>,
'SecondLevel2': <simpy.resources.resource.Resource at 0x121f458d0>},
'Resource3': {'FirstLevel3': <simpy.resources.resource.Resource at 0x121f459d0>,
'SecondLevel3': <simpy.resources.resource.Resource at 0x121f45a90>},
'Resource4': {'FirstLevel4': <simpy.resources.resource.Resource at 0x121f47750>,
'SecondLevel4': <simpy.resources.resource.Resource at 0x121f476d0>}}
So I did it with a Store. In the Store I have groups of first-level objects that share a common second-level resource. Here is the code:
"""
example of a two stage resource grab using a store and resouces
A agent will queue up to get a first level resource object
and then use this object to get a second level rescource
However groups of the frist level resouce have one common second level resource
so there will also be a queue for the second level resource.
programer: Michael R. Gibbs
"""
import simpy
import random
class FirstLevel():
"""
A frist level object, a group of these objects will make a type of resource
each object in the group will have the same second level resource
"""
def __init__(self, env, groupId, secondLevel):
self.env = env
self.groupId = groupId
self.secondLevel = secondLevel
def agent(env, agentId, firstLevelStore):
"""
sims a agent/entity that will first grab a first level resource
then a second level resource
"""
print(f'agent {agentId} requesting from store with {len(firstLevelStore.items)} and queue {len(firstLevelStore.get_queue)}')
# queue and get first level resouce
firstLevel = yield firstLevelStore.get()
print(f"agent {agentId} got first level resource {firstLevel.groupId} at {env.now}")
# use the first level resource to queue and get the second level resource
with firstLevel.secondLevel.request() as req:
yield req
print(f"agent {agentId} got second level resource {firstLevel.groupId} at {env.now}")
yield env.timeout(random.randrange(3, 10))
print(f"agent {agentId} done second level resource {firstLevel.groupId} at {env.now}")
# put the first level resource back into the store
yield firstLevelStore.put(firstLevel)
print(f"agent {agentId} done first level resource {firstLevel.groupId} at {env.now}")
def agentGen(env, firstLevelStore):
"""
creates a sequence of agents
"""
id = 1
while True:
yield env.timeout(random.randrange(1, 2))
print(f"agent {id} arrives {env.now}")
env.process(agent(env,id, firstLevelStore))
id += 1
if __name__ == '__main__':
print("start")
num_FL_Resources = 4 # number of first level groups/pools
capacity_FL_Resources = 5 # number of first level in each group/pool
env = simpy.Environment()
# store of all first level, all mixed togethers
store = simpy.Store(env, capacity=(num_FL_Resources * capacity_FL_Resources))
for groupId in range(num_FL_Resources):
# create the second level resource for each group os first level resources
secondLevel = simpy.Resource(env,1)
for cap in range(capacity_FL_Resources):
# create the individual first level objects for the group
firstLevel = FirstLevel(env,groupId,secondLevel)
store.items.append(firstLevel)
env.process(agentGen(env, store))
env.run(200)
print("done")
I have a JSON file in the following format (an example):
{
    "Table1": {
        "Records": [
            {
                "Key1Tab1": "SomeVal",
                "Key2Tab1": "AnotherVal"
            },
            {
                "Key1Tab1": "SomeVal2",
                "Key2Tab1": "AnotherVal2"
            }
        ]
    },
    "Table2": {
        "Records": [
            {
                "Key1Tab1": "SomeVal",
                "Key2Tab1": "AnotherVal"
            },
            {
                "Key1Tab1": "SomeVal2",
                "Key2Tab1": "AnotherVal2"
            }
        ]
    }
}
The root keys are table names from an SQL database, and each corresponding value holds that table's rows.
I want to split the JSON file into separate parquet files, each representing a table,
i.e. Table1.parquet and Table2.parquet.
The big issue is the size of the file, which prevents me from loading it into memory.
Hence, I tried to use dask.bag to accommodate the nested structure of the file.
import dask.bag as db
from dask.distributed import Client
client = Client(n_workers=4)
lines = db.read_text("filename.json")
But inspecting the output with lines.take(4) shows that dask doesn't split on the newlines in a useful way:
('{\n', ' "Table1": {\n', ' "Records": [\n', ' {\n')
I've searched for solutions to this specific problem, but without luck.
Is there any chance the splitting can be solved with dask, or are there other tools that could do the job?
As suggested here, try the dask.dataframe.read_json() method.
This may be sufficient, though I am unsure how it will behave if you don't have enough memory to hold the entire resulting dataframe in memory.
import dask.dataframe as dd
from dask.distributed import Client
client = Client()
df = dd.read_json("filename.json")
df.to_parquet("filename.parquet", engine='pyarrow')
docs
https://distributed.dask.org/en/latest/manage-computation.html#dask-collections-to-futures
https://examples.dask.org/dataframes/01-data-access.html#Write-to-Parquet
If Dask doesn't process the file in chunks on a single system (it may not happily do so, as JSON is distinctly unfriendly to parse that way, though I unfortunately don't have access to my test system to verify this) and the system memory can't handle the giant file, you may be able to extend the system memory with disk space by creating a big swapfile.
Note that this creates a ~300G file (increase the count field for more) and may be incredibly slow compared to memory (but perhaps still fast enough for your needs, especially if it's a one-off).
# create and configure swapfile
dd if=/dev/zero of=swapfile.img bs=10M count=30000 status=progress
chmod 600 swapfile.img
mkswap swapfile.img
swapon swapfile.img
#
# run memory-greedy task
# ...
# ensure processes have exited
#
# disable and remove swapfile to reclaim disk space
swapoff swapfile.img # may hang for a long time
rm swapfile.img
The problem is that dask will split the file on newline characters by default, and you can't guarantee that this won't happen in the middle of one of your tables. Indeed, even if you get it right, you still need to manipulate the resulting text to make complete JSON objects for each partition.
For example:
import json
import dask.bag as db

def myfunc(x):
    x = "".join(x)
    if not x.endswith("}"):
        x = x[:-2] + "}"
    if not x.startswith("{"):
        x = "{" + x
    return [json.loads(x)]

db.read_text('temp.json',
             linedelimiter="\n },\n",
             blocksize=100).map_partitions(myfunc)
In this case, I have purposefully made the blocksize smaller than each part to demonstrate: you will get a JSON object or nothing for each partition.
_.compute()
[{'Table1': {'Records': [{'Key1Tab1': 'SomeVal', 'Key2Tab1': 'AnotherVal'},
{'Key1Tab1': 'SomeVal2', 'Key2Tab1': 'AnotherVal2'}]}},
{},
{'Table2': {'Records': [{'Key1Tab1': 'SomeVal', 'Key2Tab1': 'AnotherVal'},
{'Key1Tab1': 'SomeVal2', 'Key2Tab1': 'AnotherVal2'}]}},
{},
{},
{}]
Of course, in your case you can immediately do something with the JSON rather than return it, or you can map to your writing function next in the chain.
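For example, here is a hedged sketch of such a writing function (the names and the pandas/pyarrow dependency are assumptions, and it relies on each partition holding either one complete {"TableN": {"Records": [...]}} object or nothing, as above):
import json
import pandas as pd
import dask.bag as db

def write_tables(parsed):
    # parsed is the list of dicts produced by myfunc for one partition
    for obj in parsed:
        for table_name, content in obj.items():
            df = pd.DataFrame(content["Records"])
            # one parquet file per table; merging repeated table names is out of scope here
            df.to_parquet(f"{table_name}.parquet", engine="pyarrow")
    return parsed

bag = db.read_text("temp.json", linedelimiter="\n },\n", blocksize=100)
bag.map_partitions(myfunc).map_partitions(write_tables).compute()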
When working with large files, the key to success is processing the data as a stream, i.e. in filter-like programs.
The JSON format is easy to parse. The following program reads the input character by character (the I/O should be buffered) and cuts the top-level JSON object into separate objects. It properly follows the data structure, not the formatting.
The demo program just prints "--- NEXT OUTPUT FILE ---" where a real switch to a new output file should be implemented. Whitespace stripping is implemented as a bonus.
import collections

OBJ = 'object'
LST = 'list'

def out(ch):
    print(ch, end='')

with open('json.data') as f:
    stack = collections.deque(); push = stack.append; pop = stack.pop
    esc = string = False
    while (ch := f.read(1)):
        if esc:
            esc = False
        elif ch == '\\':
            esc = True
        elif ch == '"':
            string = not string
        if not string:
            if ch in {' ', '\t', '\r', '\n'}:
                continue
            if ch == ',':
                if len(stack) == 1 and stack[0] == OBJ:
                    out('}\n')
                    print("--- NEXT OUTPUT FILE ---")
                    out('{')
                    continue
            elif ch == '{':
                push(OBJ)
            elif ch == '}':
                if pop() is not OBJ:
                    raise ValueError("unmatched { }")
            elif ch == '[':
                push(LST)
            elif ch == ']':
                if pop() is not LST:
                    raise ValueError("unmatched [ ]")
        out(ch)
Here is a sample output for my testfile:
{"key1":{"name":"John","surname":"Doe"}}
--- NEXT OUTPUT FILE ---
{"key2":"string \" ] }"}
--- NEXT OUTPUT FILE ---
{"key3":13}
--- NEXT OUTPUT FILE ---
{"key4":{"sub1":[null,{"l3":true},null]}}
The original file was:
{
    "key1": {
        "name": "John",
        "surname": "Doe"
    },
    "key2": "string \" ] }", "key3": 13,
    "key4": {
        "sub1": [null, {"l3": true}, null]
    }
}
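A hedged sketch of the real output-file switch mentioned above (the part-numbering scheme is an assumption): out() writes to the current file, and next_file() would replace the "--- NEXT OUTPUT FILE ---" print in the parser:
part_num = 0
outfile = open(f"part{part_num}.json", "w")

def out(ch):
    # write to the current output part instead of stdout
    outfile.write(ch)

def next_file():
    # close the finished part and start a new one
    global part_num, outfile
    outfile.close()
    part_num += 1
    outfile = open(f"part{part_num}.json", "w")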
So I'm looking for a way to speed up the output of the following code, which calls Google's Natural Language API:
tweets = json.load(input)
client = language.LanguageServiceClient()
sentiment_tweets = []
iterations = 1000
start = timeit.default_timer()

for i, text in enumerate(d['text'] for d in tweets):
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)
    sentiment = client.analyze_sentiment(document=document).document_sentiment
    results = {'text': text, 'sentiment': sentiment.score, 'magnitude': sentiment.magnitude}
    sentiment_tweets.append(results)
    if (i % iterations) == 0:
        print(i, " tweets processed")

sentiment_tweets_json = [json.dumps(sentiments) for sentiments in sentiment_tweets]
stop = timeit.default_timer()
The issue is that the tweets list is around 100k entries, so iterating and making calls one by one doesn't produce output on a feasible timescale. I'm exploring asyncio for parallel calls, although as a beginner with Python and unfamiliar with the package, I'm not sure whether you can make a function a coroutine of itself so that each instance iterates through the list as expected, progressing sequentially. There is also the question of keeping the total number of calls the app makes within the API's quota limits. Just wanted to know if I'm going in the right direction.
I use this method for concurrent calls:
from concurrent import futures as cf


def execute_all(mfs: list, max_workers: int = None):
    """Execute an mfs list concurrently.

    Parameters
    ----------
    mfs : list
        [mfs1, mfs2, ...]
        mfsN = {
            tag: str,
            fn: function,
            kwargs: dict
        }
    max_workers : int
        Description of parameter `max_workers`.

    Returns
    -------
    dict
        {status, result, error}
        status = {tag1, tag2, ...}
        result = {tag1, tag2, ...}
        error = {tag1, tag2, ...}
    """
    result = {
        'status': {},
        'result': {},
        'error': {}
    }
    max_workers = len(mfs)
    with cf.ThreadPoolExecutor(max_workers=max_workers) as exec:
        my_futures = {
            exec.submit(x['fn'], **x['kwargs']): x['tag'] for x in mfs
        }
        for future in cf.as_completed(my_futures):
            tag = my_futures[future]
            try:
                result['result'][tag] = future.result()
                result['status'][tag] = 0
            except Exception as err:
                result['error'][tag] = err
                result['result'][tag] = None
                result['status'][tag] = 1
    return result
Each result is returned indexed by its tag (useful if you need to identify which call returned which result), e.g. when:
mfs = [
    {
        'tag': 'tweet1',
        'fn': process_tweet,
        'kwargs': {
            'tweet': tweet1
        }
    },
    {
        'tag': 'tweet2',
        'fn': process_tweet,
        'kwargs': {
            'tweet': tweet2
        }
    },
]

results = execute_all(mfs, 2)
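For 100k tweets you probably don't want to submit everything at once; one hedged sketch (process_tweet and CHUNK_SIZE are assumptions, not part of the answer above) is to drive execute_all one chunk at a time, which also gives you a knob for staying inside the API quota:
CHUNK_SIZE = 50  # tune to your quota / rate limits

all_results = []
for start in range(0, len(tweets), CHUNK_SIZE):
    chunk = tweets[start:start + CHUNK_SIZE]
    # build one mfs entry per tweet in this chunk
    mfs = [
        {'tag': str(start + i), 'fn': process_tweet, 'kwargs': {'tweet': t}}
        for i, t in enumerate(chunk)
    ]
    out = execute_all(mfs, max_workers=CHUNK_SIZE)
    all_results.extend(out['result'].values())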
While async is one way you could go, another option that might be easier is Python's built-in multiprocessing functionality.
from multiprocessing import Pool

def process_tweet(tweet):
    pass  # Fill in the blanks here

# Use five processes at once
with Pool(5) as p:
    processed_tweets = p.map(process_tweet, tweets, 1)
In this case "tweets" is an iterator of some sort, and each element of that iterator will get passed to your function. The map function will make sure the results come back in the same order the arguments were supplied.
I have an application that loads millions of documents to a collection, using 30-80 workers to load the data simultaneously. Sometimes the loading process doesn't complete smoothly, and with other databases I can simply delete the table and start over, but not with Firestore collections. I have to list the documents and delete them, and I've not found a way to scale this to the same capacity as my loading process. What I'm doing now is that I have two AppEngine-hosted Flask/Python methods: one gets a page of 1000 documents and passes it to another method that deletes them. This way the process that lists documents is not blocked by the process that deletes them. It's still taking days to complete, which is too long.
Here is the method that gets a page of documents and creates a task to delete them, which is single-threaded:
@app.route('/delete_collection/<collection_name>/<batch_size>', methods=['POST'])
def delete_collection(collection_name, batch_size):
    batch_size = int(batch_size)
    coll_ref = db.collection(collection_name)
    print('Received request to delete collection {} {} docs at a time'.format(
        collection_name,
        batch_size
    ))
    num_docs = batch_size
    while num_docs >= batch_size:
        docs = coll_ref.limit(batch_size).stream()
        found = 0
        deletion_request = {
            'doc_ids': []
        }
        for doc in docs:
            deletion_request['doc_ids'].append(doc.id)
            found += 1
        num_docs = found
        print('Creating request to delete docs: {}'.format(
            json.dumps(deletion_request)
        ))
        # Add to task queue
        queue = tasks_client.queue_path(PROJECT_ID, LOCATION, 'database-manager')
        task_meet = {
            'app_engine_http_request': {  # Specify the type of request.
                'http_method': 'POST',
                'relative_uri': '/delete_documents/{}'.format(
                    collection_name
                ),
                'body': json.dumps(deletion_request).encode(),
                'headers': {
                    'Content-Type': 'application/json'
                }
            }
        }
        task_response_meet = tasks_client.create_task(queue, task_meet)
        print('Created task to delete {} docs: {}'.format(
            batch_size,
            json.dumps(deletion_request)
        ))
Here is the method I use to delete the documents, which can scale. In effect it only processes 5-10 at a time, limited by the rate at which the other method passes pages of doc_ids to delete. Separating the two helps, but not by much.
@app.route('/delete_documents/<collection_name>', methods=['POST'])
def delete_documents(collection_name):
    # Validate we got a body in the POST
    if flask.request.json:
        print('Request received to delete docs from :{}'.format(collection_name))
    else:
        message = 'No json found in request: {}'.format(flask.request)
        print(message)
        return message, 400
    # Validate that the payload includes a list of doc_ids
    doc_ids = flask.request.json.get('doc_ids', None)
    if doc_ids is None:
        return 'No doc_ids specified in payload: {}'.format(flask.request.json), 400
    print('Received request to delete docs: {}'.format(doc_ids))
    for doc_id in doc_ids:
        db.collection(collection_name).document(doc_id).delete()
    return 'Finished'


if __name__ == '__main__':
    # Set environment variables for running locally
    app.run(host='127.0.0.1', port=8080, debug=True)
I've tried running multiple concurrent executions of delete_collection(), but I'm not certain that even helps, as I'm not sure whether every call to limit(batch_size).stream() gets a distinct set of documents or possibly returns duplicates.
How can I make this run faster?
This is what I came up with. It's not super fast (120-150 docs per second), but all the other examples I found in python didn't work at all:
now = datetime.now()
then = now - timedelta(days=DOCUMENT_EXPIRATION_DAYS)
doc_counter = 0
commit_counter = 0
limit = 5000
while True:
    docs = []
    print('Getting next doc handler')
    docs = [snapshot for snapshot in db.collection(collection_name)
            .where('id.time', '<=', then)
            .limit(limit)
            .order_by('id.time', direction=firestore.Query.ASCENDING)
            .stream()]
    batch = db.batch()
    for doc in docs:
        doc_counter = doc_counter + 1
        if doc_counter % 500 == 0:
            commit_counter += 1
            print('Committing batch {} from {}'.format(commit_counter, doc.to_dict()['id']['time']))
            batch.commit()
        batch.delete(doc.reference)
    batch.commit()
    if len(docs) == limit:
        continue
    break
print('Deleted {} documents in {} seconds.'.format(doc_counter, datetime.now() - now))
As mentioned in the other comments, .stream() has a 60 second deadline. This iterative structure sets a limit of 5000 after which .stream() is called again, which keeps it under the 60 second limit. If anybody knows how to speed this up, let me know.
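One hedged way to speed this up further (a sketch only, not benchmarked; it reuses db, collection_name, then and limit from the snippet above) is to fan the batch commits out over a thread pool, since each 500-document commit is an independent network round trip:
from concurrent.futures import ThreadPoolExecutor

def delete_chunk(chunk):
    # each worker commits its own batch of up to 500 deletes
    batch = db.batch()
    for snapshot in chunk:
        batch.delete(snapshot.reference)
    batch.commit()

while True:
    docs = list(db.collection(collection_name)
                .where('id.time', '<=', then)
                .limit(limit)
                .stream())
    chunks = [docs[i:i + 500] for i in range(0, len(docs), 500)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        pool.map(delete_chunk, chunks)
    if len(docs) < limit:
        break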
Here is my simple Python script that I used to test batch deletes. Like @Chris32 said, batch mode will delete thousands of documents per second if latency isn't too bad.
from time import time
from uuid import uuid4

from google.cloud import firestore

DB = firestore.Client()


def generate_user_data(entries=10):
    print('Creating {} documents'.format(entries))
    now = time()
    batch = DB.batch()
    for counter in range(entries):
        # Each transaction or batch of writes can write to a maximum of 500 documents.
        # https://cloud.google.com/firestore/quotas#writes_and_transactions
        if counter % 500 == 0 and counter > 0:
            batch.commit()
        user_id = str(uuid4())
        data = {
            "some_data": str(uuid4()),
            "expires_at": int(now)
        }
        user_ref = DB.collection(u'users').document(user_id)
        batch.set(user_ref, data)
    batch.commit()
    print('Wrote {} documents in {:.2f} seconds.'.format(entries, time() - now))


def delete_one_by_one():
    print('Deleting documents one by one')
    now = time()
    docs = DB.collection(u'users').where(u'expires_at', u'<=', int(now)).stream()
    counter = 0
    for doc in docs:
        doc.reference.delete()
        counter = counter + 1
    print('Deleted {} documents in {:.2f} seconds.'.format(counter, time() - now))


def delete_in_batch():
    print('Deleting documents in batch')
    now = time()
    docs = DB.collection(u'users').where(u'expires_at', u'<=', int(now)).stream()
    batch = DB.batch()
    counter = 0
    for doc in docs:
        counter = counter + 1
        if counter % 500 == 0:
            batch.commit()
        batch.delete(doc.reference)
    batch.commit()
    print('Deleted {} documents in {:.2f} seconds.'.format(counter, time() - now))


generate_user_data(10)
delete_one_by_one()
print('###')

generate_user_data(10)
delete_in_batch()
print('###')

generate_user_data(2000)
delete_in_batch()
This public documentation describes how, using a callable Cloud Function, you can take advantage of the firestore delete command in the Firebase Command Line Interface, deleting up to 4000 documents per second.
For a school project, I'm creating a game that has a score system, and I would like to create some sort of leaderboard. Once finished, the teachers will upload it to a shared server where other students can download a copy of the game, but unfortunately students can't save to that server; if we could, leaderboards would be a piece of cake. There would at most be a few hundred scores to keep track of, and all the computers have access to the internet.
I don't know much about servers or hosting, and I don't know java, html, or any other language commonly used in web development, so other related questions don't really help. My game prints the scoring information to a text file, and from there I don't know how to get it somewhere online that everyone can access.
Is there a way to accomplish such a task with just python?
Here I have the code for updating a leaderboard file (assuming it would just be a text file) once I have the scores. This would assume that I had a copy of the leaderboard and the score file in the same place.
This is the format of my mock-leaderboard (Leaderboards.txt):
Leaderboards
1) JOE 10001
2) ANA 10000
3) JAK 8400
4) AAA 4000
5) ABC 3999
This is what the log-file would print - the initials and score (log.txt):
ABC
3999
Code (works for both python 2.7 and 3.3):
def extract_log_info(log_file="log.txt"):
    with open(log_file, 'r') as log_info:
        new_name, new_score = [i.strip('\n') for i in log_info.readlines()[:2]]
        new_score = int(new_score)
    return new_name, new_score


def update_leaderboards(new_name, new_score, lb_file="Leaderboards.txt"):
    cur_index = None
    with open(lb_file, 'r') as lb_info:
        lb_lines = lb_info.readlines()
    lb_lines_cp = list(lb_lines)  # Make a copy for iterating over
    for line in lb_lines_cp:
        if 'Leaderboards' in line or line == '\n':
            continue
        # Now we're at the numbers
        position, name, score = [i for i in line.split()]
        if new_score > int(score):
            cur_index = lb_lines.index(line)
            cur_place = int(position.strip(')'))
            break

    # If you have reached the bottom of the leaderboard, and there
    # are no scores lower than yours
    if cur_index is None:
        # last_place essentially gets the number of entries thus far
        last_place = int(lb_lines[-1].split()[0].strip(')'))
        entry = "{}) {}\t{}\n".format((last_place + 1), new_name, new_score)
        lb_lines.append(entry)
    else:  # You've found a score you've beaten
        entry = "{}) {}\t{}\n".format(cur_place, new_name, new_score)
        lb_lines.insert(cur_index, entry)
        lb_lines_cp = list(lb_lines)  # Make a copy for iterating over
        for line in lb_lines_cp[cur_index + 1:]:
            position, entry_info = line.split(')', 1)
            new_entry_info = str(int(position) + 1) + ')' + entry_info
            lb_lines[lb_lines.index(line)] = new_entry_info

    with open(lb_file, 'w') as lb_file_o:
        lb_file_o.writelines(lb_lines)


if __name__ == '__main__':
    name, score = extract_log_info()
    update_leaderboards(name, score)
Some more info:
The score would be less than 1 000 000
Ideally, the solution would just be some code external to the game, so that I would just make an executable that the user would run after they've finished
I know it doesn't sound very secure (and it isn't), but that's OK; it doesn't need to be hackproof
The easiest is probably to just use MongoDB or something (MongoDB is a NoSQL type database that allows you to save dictionary data easily...)
You can use the free account at https://mongolab.com (that should give you plenty of space).
You will need pymongo as well: pip install pymongo.
Then you can simply save records there:
from pymongo import MongoClient, DESCENDING

uri = "mongodb://test1:test1@ds051990.mongolab.com:51990/joran1"
my_db_cli = MongoClient(uri)
db = my_db_cli.joran1  # select the database ...
my_scores = db.scores  # this will be created if it doesn't exist!
# add a new score
my_scores.insert({"user_name": "Leeeeroy Jenkins", "score": 124, "time": "11/24/2014 13:43:22"})
my_scores.insert({"user_name": "bob smith", "score": 88, "time": "11/24/2014 13:43:22"})
# get a list of high scores (from best to worst)
print(list(my_scores.find().sort("score", DESCENDING)))
Those credentials will actually work if you want to test the system (keep in mind I added leeroy a few times).
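To tie this back to the question's files, here is a small hedged sketch (reusing the extract_log_info() helper from the question and the my_scores collection above) that submits the logged score and rebuilds Leaderboards.txt from the database:
def submit_and_rebuild_leaderboard(lb_file="Leaderboards.txt"):
    # push the latest score from log.txt to the shared database
    name, score = extract_log_info()
    my_scores.insert({"user_name": name, "score": score})
    # rewrite the local leaderboard file from the top entries
    top = my_scores.find().sort("score", DESCENDING).limit(10)
    with open(lb_file, "w") as f:
        f.write("Leaderboards\n\n")
        for place, rec in enumerate(top, start=1):
            f.write("{}) {}\t{}\n".format(place, rec["user_name"], rec["score"]))

submit_and_rebuild_leaderboard()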