More efficient way to update multiple model objects each with unique values - python

I am looking for a more efficient way to update a bunch of model objects. Every night I have background jobs creating 'NCAABGame' objects from an API once the scores are final.
In the morning I have to update all the fields in the model with the stats that the API did not provide.
As of right now I get the stats formatted from an Excel file, and I copy and paste each update and run it like this:
NCAABGame.objects.filter(
    name__name='San Francisco', updated=False
).update(
    field_goals=38,
    field_goal_attempts=55,
    three_points=11,
    three_point_attempts=24,
    ...
)
The other day there were 183 games; most days there are between 20 and 30, so it can be very time-consuming doing it this way. I've looked into bulk_update and a few other things, but I can't really find a solution. I'm sure there is something simple that I'm just not seeing.
I appreciate any ideas or solutions you can offer.

If you need to manually update each object that gets created via the API anyway, I would not even bother going through Django. Just load your games from the API directly in Excel, make your edits there, and save as a CSV file. Then I would add the CSV directly into the database table, unless there is a specific reason the objects must be created via Django. You can of course do that with something like the below, which could be modified to also work with your current method via updates, but then you would first need to retrieve the correct pk of the object that you want to update.
import csv

with open("my_data.csv", 'r') as my_data_file:
    reader = csv.reader(my_data_file)
    for row in reader:
        # get_or_create returns a tuple. 'created' is a boolean that indicates
        # whether a new object was created, with 'game' holding the object that
        # was either retrieved or created
        game, created = NCAABGame.objects.get_or_create(
            name=row[0],
            field_goals=row[1],
            field_goal_attempts=row[2],
            ...,
        )
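Since you mention bulk_update: if the games already exist and you only need to fill in the stats, you can build the updated objects from the CSV and write them back in one batch. A minimal sketch, assuming Django 2.2+ and that the CSV's column headers match the model's field names (the "name" column matching your name__name lookup is also an assumption):

import csv

def update_games_from_csv(path):
    games = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            game = NCAABGame.objects.get(name__name=row["name"], updated=False)
            game.field_goals = int(row["field_goals"])
            game.field_goal_attempts = int(row["field_goal_attempts"])
            game.three_points = int(row["three_points"])
            game.three_point_attempts = int(row["three_point_attempts"])
            game.updated = True
            games.append(game)
    # one UPDATE per batch instead of one query per game
    NCAABGame.objects.bulk_update(
        games,
        ["field_goals", "field_goal_attempts",
         "three_points", "three_point_attempts", "updated"],
    )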

Related

Any idea how to create a price table associated with dates in Django (Python)?

I would like to create a price table organized by date. I tried to google this for Python and Django, but still have no idea how to approach it. I don't want to create a one-to-one relationship object as an option; I would like a database table associating dates and prices. Sorry, it may be a simple question.
Would it be a solution to create the database using PostgreSQL and read it with Django? Or is there any resource/reference that can point me in the right direction on this problem?
Thanks so much
Well, there is more to it than assigning a price to a date. You will need one or more tables that hold the establishment (hotel) data. These would include the room information, since all rooms will not have the same price. The price will also probably change over time, so you will need to track that. Then there is the reservation information to track. This is just some of the basics; it is not a simple task by any means. I would start with a simpler project to learn Django and how to get data in and out of it.
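To make the shape of that concrete, here is a minimal sketch of the kind of tables described above. All model and field names are hypothetical, and a real booking system would need much more:

from django.db import models

class Hotel(models.Model):
    name = models.CharField(max_length=100)

class Room(models.Model):
    hotel = models.ForeignKey(Hotel, on_delete=models.CASCADE)
    number = models.CharField(max_length=10)

class RoomRate(models.Model):
    # one price per room per date, so prices can change over time
    room = models.ForeignKey(Room, on_delete=models.CASCADE)
    date = models.DateField()
    price = models.DecimalField(max_digits=8, decimal_places=2)

    class Meta:
        unique_together = ('room', 'date')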

Is there any way mssql can notify my python application when any table or row has been updated?

I don't have much knowledge of databases, but I wanted to know if there is any technique by which, when I update or insert a specific entry in a table, my Python application can be notified, so I can then see what was updated and refresh that particular row in the data stored in the session or some temporary storage.
I need to handle filter and sort calls again and again, so I don't want to fetch the whole dataset from SQL each time; I decided to keep it local and process it from there. But I was worried that the DB might update in the meantime, and I could be passing the same old data to filter requests.
Any suggestions?
An RDBMS will only be updated through your program's own methods or functions, so you can simply print to the console or log inside those.
If you want to track what was updated, modified, or deleted outside of that, you would have to build another program that tracks the change logs for the RDBMS.
Thanks.
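One way to build that separate tracking program is simple polling. A minimal sketch, assuming the table has a SQL Server rowversion column (here called row_ver); the DSN, table, and column names are all hypothetical:

import time
import pyodbc

conn = pyodbc.connect("DSN=mydb")  # hypothetical connection string
last_seen = bytes(8)  # rowversion values are 8-byte binaries

while True:
    cursor = conn.cursor()
    cursor.execute(
        "SELECT id, row_ver FROM my_table WHERE row_ver > ? ORDER BY row_ver",
        (last_seen,),
    )
    for row_id, row_ver in cursor.fetchall():
        # a row changed since the last poll: refresh it in your local copy
        print("row %s changed" % row_id)
        last_seen = row_ver
    time.sleep(5)  # poll every few seconds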

Large Import into django postgres database

I have a CSV file with 4,500,000 rows in it that needs to be imported into my Django Postgres database. This file includes relations, so it isn't as easy as using COPY to import the CSV file straight into the database.
If I wanted to load it straight into Postgres, I could change the CSV file to match the database tables, but I'm not sure how to get the relationships in, since I need to know the inserted id in order to build the relationship.
Is there a way to generate SQL inserts that will get the last id and use that in future statements?
I initially wrote this using the Django ORM, but it's going to take way too long that way and it seems to be slowing down. I removed all of my indexes and constraints, so that shouldn't be the issue.
The database is running locally on my machine. I figured once I get the data into a database, it wouldn't be hard to dump and reload it on the production database.
So how can I get this data into my database with the correct relationships?
Note that I don't know Java, so the answer suggested here isn't super practical for me: Django with huge mysql database
EDIT:
Here are more details:
I have a model something like this:
class Person(models.Model):
    name = models.CharField(max_length=100)
    # string references, since Office and Job are defined below
    offices = models.ManyToManyField('Office')
    job = models.ForeignKey('Job')

class Office(models.Model):
    address = models.CharField(max_length=100)

class Job(models.Model):
    title = models.CharField(max_length=100)
So I have a person who can have 1 job but many offices. (My real model has more fields, but you get the idea).
My CSV file is something like this:
name,office_1,office_2,job
hailey,"123 test st","222 USA ave.",Programmer
There are more fields than that, but I'm only including the relevant ones.
So I need to make the person object and the office objects and relate them. The job objects are already created so all I need to do there is find the job and save it as the person's job.
The original data was not in a database before this. Only the flat file. We are trying to make it relational so there is more flexibility.
Thanks!!!
Well, this is a tough one.
When you say relations, are they all in a single CSV file? I mean, like this, presuming a simple data model with a relation to itself?
id;parent_id;name
4;1;Frank
1;;George
2;1;Costanza
3;1;Stella
If this is the case and it's out of order, I would write a Python script to reorder these and then import them.
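A minimal sketch of that reordering, assuming the id;parent_id;name layout from the sample above and a file small enough to hold in memory (the file name is hypothetical):

import csv

def reorder(rows):
    # emit each parent before any of its children
    by_id = {row["id"]: row for row in rows}
    ordered, seen = [], set()

    def visit(row):
        if row["id"] in seen:
            return
        parent = by_id.get(row["parent_id"])
        if parent is not None:
            visit(parent)  # make sure the parent is emitted first
        seen.add(row["id"])
        ordered.append(row)

    for row in rows:
        visit(row)
    return ordered

with open("people.csv") as f:
    rows = list(csv.DictReader(f, delimiter=";"))
for row in reorder(rows):
    print(row)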
I had a scenario a while back where I had a number of CSV files, but they were for individual models, so I loaded the first, parent one, then the second, and so on.
We wrote custom importers that would read the data from a single CSV and do some processing on it, like checking whether it already existed, whether certain things were valid, etc. A method for each CSV file.
For CSVs that were big enough, we just split them into smaller files (around 200k records each) and processed them one after the other. The difference is that all the previous data that the big CSV depended on was already in the database, imported by the same method described previously.
Without an example, I can't comment much more.
EDIT
Well, since you gave us your model, and based on the fact that the job model is already there, I would go for something like this:
Create a custom method, even just one you can invoke from the shell. A method/function or whatever, that will receive a single line of the file.
In that method, discover how many offices that person is related to. Search to see if each office already exists in the DB. If so, use it to relate the person and the office. If not, create it and relate them.
Look up the job. Does it exist? Yes, then use it. No? Create it and then use it.
Something like this:
def process_line(line):
    data = line.split(";")

    # look up the job first; per the question the Job rows already exist
    job_obj, job_created = Job.objects.get_or_create(title=data[5])

    person = Person()
    # fill in the person details that are in the CSV
    person.name = data[0]  # assuming name is the first column, as in the sample
    person.job = job_obj
    person.save()  # you'll need to save before you can use the m2m

    offices = get_offices_from_line(line)  # returns the plain data, not Office instances
    for office in offices:
        # get_or_create returns (object, created); the object is usable either way
        obj, created = Office.objects.get_or_create(address=office)
        person.offices.add(obj)
Be aware that the function above was not tested or guarded against any kind of errors. You'll need to:
Do that yourself;
Create the function that identifies the offices each person has. I don't know the data, but perhaps if you look at the field preceding the first office and at the first field after all the offices, you'll be able to grasp all of them;
You'll need to create a function to parse the high-level file, iterate over the lines, and pass them along to your shiny import function; there is a small sketch of that after the link below.
Here are the docs for get_or_create: https://docs.djangoproject.com/en/1.8/ref/models/querysets/#get-or-create
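For completeness, a minimal sketch of that top-level parser; the file name and the assumption of a header row are mine:

def import_file(path):
    with open(path) as f:
        next(f)  # skip the header row, if there is one
        for line in f:
            process_line(line.rstrip("\n"))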

Django Models Counter

I am using Django with a bunch of models linked to a MySQL database. Every so often, my project needs to generate a new number (sequentially, although this is not important) that becomes an ID for rows in one of the database tables. I cannot use the auto-increment feature in the models because multiple rows will end up having this number (it is not the primary key). Thus far, I have been using global variables in views.py, but every time I change anything and save, the variables are reset with the server. What is the best way to generate a new ID like this (without it being reset all the time), preferably without writing to a file every time? Thanks in advance!
One way is to create a table in your database and save the values you want in it. Another way is to use HTTP cookies to save values if you want to avoid the server-reset problem, though I do not prefer that way.
You can follow this link to see how to set and read values from cookies in Django:
https://docs.djangoproject.com/en/dev/topics/http/sessions/#s-setting-test-cookies
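Here is a minimal sketch of the counter-table approach, using select_for_update inside a transaction so two concurrent requests can't be handed the same number; the Counter model and next_id() helper are hypothetical names:

from django.db import models, transaction

class Counter(models.Model):
    name = models.CharField(max_length=50, unique=True)
    value = models.IntegerField(default=0)

def next_id(name):
    # lock the counter row for the duration of the transaction, then increment it
    with transaction.atomic():
        counter, _ = Counter.objects.select_for_update().get_or_create(name=name)
        counter.value += 1
        counter.save()
        return counter.value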

Please help me design a database schema for this:

I'm designing a python application which works with a database. I'm planning to use sqlite.
There are 15,000 objects, and each object has a few attributes. Every day I need to add some data for each object (maybe create a column with the date as its name).
However, I would also like to easily delete data that is too old, and it is very hard to delete columns in SQLite (and it might be slow, because I would need to copy the required columns and then delete the old table).
Is there a better way to organize this data than creating a column for every date? Or should I use something other than SQLite?
It'll probably be easiest to separate your data into two tables like so:
CREATE TABLE object(
    id INTEGER PRIMARY KEY,
    ...
);

CREATE TABLE extra_data(
    objectid INTEGER,
    date DATETIME,
    ...
    FOREIGN KEY(objectid) REFERENCES object(id)
);
This way, when you need to delete all of your entries from a given date, it'll be as easy as:
DELETE FROM extra_data WHERE date = curdate;
I would try to avoid altering tables all the time; it usually indicates a bad design.
For a DB of that size, I would use something else. I used SQLite once for a media library with about 10k objects and it was slow: around 5 minutes to query and display everything, and searches were painful. Switching to Postgres made life much easier. This is on the performance issue only.
It also might be better to create an index table that contains the date, the data/column you want to add, and a pk reference to the object it belongs to, and use that for your deletions instead of altering the table all the time. This can be done in SQLite if you give the pk an int type and save the object's pk into it, instead of using a foreign key like you would with MySQL/Postgres.
If your database is pretty much a collection of almost-homogeneous data, you could just as well go for a simpler key-value database. If the main action you perform on the data is scanning through everything, it will perform significantly better.
The Python standard library has bindings for popular ones under "anydbm". There is also a dict-imitating proxy over anydbm in "shelve". You could pickle your objects with their attributes using any serializer you want (simplejson, yaml, pickle).
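A minimal sketch of that shelve idea: one entry per object, with the dated measurements in a nested dict so old dates are easy to drop. The key layout and field names are made up for illustration:

import shelve

with shelve.open("objects.db", writeback=True) as db:
    # add today's data point for object 42
    entry = db.setdefault("42", {"attrs": {}, "daily": {}})
    entry["daily"]["2015-06-01"] = 3.14

    # drop all data older than a cutoff date (ISO date strings sort correctly)
    cutoff = "2015-01-01"
    for key in db:
        db[key]["daily"] = {d: v for d, v in db[key]["daily"].items() if d >= cutoff}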
