I am new to Django and I have been scouting the site to find good ways to store images generated using Python (there seem to be contradicting views on saving them to a local folder, file, or database). My website's sole purpose is to make QR codes all the time. On my local machine I would save it in my program folder like:
import pyqrcode
qr = pyqrcode.create("{'name': 'myqr'}")
qr.png("horn.png", scale=6)
print "All done!"
I am running into a situation where I will be dealing with many users doing the same thing. I doubt saving to a local folder would be a viable option. I am set on saving these images as blobs in MySQL.
Has anyone done this sort of thing? If so, what was the best way to implement it? Example code would also be helpful.
Storing images in the database is somewhat of an edge case. To relieve the database, I would not store them in the database, but in a folder on your server.
I understand your question to mean that some users might create the same QR codes. In that case, I would create a database table like this:
CREATE TABLE qrcodes (
    value TEXT PRIMARY KEY,
    fname TEXT
);
With this table you can find the QR codes by the encoded value. If an entry exists for a given value, then you can just return the contents of the file whose name is stored.
That leaves one question: how to create the file names. There are many possibilities. One would be to create a UUID and make a filename from it. Another option would be to use a global counter and give every file a new number; the counter would have to be stored in a database table, of course.
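A rough sketch of the UUID variant, assuming the qrcodes table above (the database write is left as a comment because it depends on your driver; the function and folder names are my own):

import uuid
import pyqrcode

def save_qrcode(value, folder='qrcodes'):
    # a random UUID gives a collision-free file name
    fname = '%s.png' % uuid.uuid4().hex
    pyqrcode.create(value).png('%s/%s' % (folder, fname), scale=6)
    # an INSERT INTO qrcodes (value, fname) VALUES (...) goes here,
    # so the next request for the same value can reuse the file
    return fname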
Of course you can still store the image in the database, if you want. Just don't use the fname field, but a blob field that stores the content. That solution should work on most databases, but is likely slower than the file approach when you have really large amounts of data.
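If you do go the blob route in Django, a hedged sketch might look like this (the model and field names are my own invention; BinaryField maps to a BLOB column on MySQL):

import io
import pyqrcode
from django.db import models

class QRCode(models.Model):
    value = models.TextField(unique=True)  # the encoded value, used for lookups
    image = models.BinaryField()           # the raw PNG bytes

def store_qrcode(value):
    # render the PNG into memory instead of onto disk
    buffer = io.BytesIO()
    pyqrcode.create(value).png(buffer, scale=6)
    return QRCode.objects.create(value=value, image=buffer.getvalue())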
I am building an image mosaic that detects whether the user's selected areas are taken or not.
My idea is to store the available_spots in a list, and I would just have to look through the list to check whether a spot is available or not.
The problem is that when I reload the website, available_spots also gets reset to a blank list,
so I want to store this list somewhere that is fast to read and write to.
I am currently thinking about a text file to store this, but that might take forever to read since the list length is over 1.4 million. Are there any other solutions that might be better?
You can't store the data in a file for a few reasons: (1) GAE standard won't let you, (2) the data is lost when your server is restarted, and (3) different instances will have different data.
Of course you can and should store the data in a database of your choice. Firestore is likely a better and cheaper option than SQL. It should be fast enough for you and you can implement caching if needed.
You might be able to store the data in a single Firestore entity and consider using compression if you are getting close to the max entity size.
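A hedged sketch of that single-entity approach with the google-cloud-firestore client (the collection and document names are made up; the zlib compression is there because of the roughly 1 MiB document size limit):

import json
import zlib
from google.cloud import firestore

db = firestore.Client()
doc = db.collection('mosaic').document('available_spots')

def save_spots(spots):
    # compress the JSON-encoded list to stay under the document size limit
    doc.set({'data': zlib.compress(json.dumps(spots).encode('utf-8'))})

def load_spots():
    snapshot = doc.get()
    if not snapshot.exists:
        return []
    return json.loads(zlib.decompress(snapshot.to_dict()['data']))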
If you want to store it in a database you can use the "sqlite3" module.
It is a simple database that gets stored in a file, so you don't have to install a database server. It is great for small projects.
If you want to do more complex stuff with databases you can use "sqlalchemy".
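A minimal sqlite3 sketch (the table layout and file name are illustrative):

import sqlite3

conn = sqlite3.connect('spots.db')  # a plain file, so it survives reloads
conn.execute('CREATE TABLE IF NOT EXISTS spots (id INTEGER PRIMARY KEY, taken INTEGER)')

def mark_taken(spot_id):
    conn.execute('INSERT OR REPLACE INTO spots (id, taken) VALUES (?, 1)', (spot_id,))
    conn.commit()

def is_taken(spot_id):
    row = conn.execute('SELECT taken FROM spots WHERE id = ?', (spot_id,)).fetchone()
    return bool(row and row[0])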
I am developing a website using Django 1.8 and Python 3.4 in an Anaconda environment. Basically, a file and a plot are generated and displayed on the web page after a user inputs parameters and submits them. Then the user can download the file and the image.
My approach to handling the file storage is to store them as static files named "result.csv" and "plot.png". The files are totally different for each user's request, but if more than one user requests something simultaneously, the system only saves one file per name, so results get overwritten. It is very dangerous.
I have no idea how to deal with this situation. Could anyone give me a suggestion or a direction? Thank you very much.
There are several ways to accomplish this. The first ones that spring to mind are, assuming you want to keep one set of results per user (i.e. the last generated):
1- Create unique names based on the user-id. This allows you to access the files without first consulting the user data in the DB.
It also has the advantage of not having to delete previous versions of the files.
2- Create unique filenames with the uuid library module
import uuid
user_filename = str(uuid.uuid4())  # uuid4() returns a UUID object; str() makes it usable in paths
csv_filename = user_filename + '.csv'
png_filename = user_filename + '.png'
and store the user_filename in the DB user record for later access.
3- Do the same but using a timestamp with enough resolution
Alternatively you can create a subdirectory with a unique name and store the files inside it with a static name for both.
Options 2 and 3 require you to remove previous versions of the files when generating new ones for the same user.
As mentioned by @Wtower, store the files in the MEDIA directory, maybe under a further subdirectory.
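A hedged sketch combining option 2 with the MEDIA directory advice (profile.result_name is an assumed CharField on a user profile model, and the helper name is my own):

import os
import uuid
from django.conf import settings

def new_result_paths(profile):
    # options 2 and 3 require removing the previous versions, if any
    if profile.result_name:
        for ext in ('.csv', '.png'):
            try:
                os.remove(os.path.join(settings.MEDIA_ROOT, 'results', profile.result_name + ext))
            except OSError:
                pass
    # remember the new base name on the user's record for later downloads
    profile.result_name = str(uuid.uuid4())
    profile.save()
    results_dir = os.path.join(settings.MEDIA_ROOT, 'results')
    os.makedirs(results_dir, exist_ok=True)
    base = os.path.join(results_dir, profile.result_name)
    return base + '.csv', base + '.png'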
I have a CSV file with 4,500,000 rows in it that needs to be imported into my Django Postgres database. This file includes relations, so it isn't as easy as using COPY to import the CSV file straight into the database.
If I wanted to load it straight into postgres, I can change the CSV file to match the database tables, but I'm not sure how to get the relationship since I need to know the inserted id in order to build the relationship.
Is there a way to generate SQL inserts that will get the last inserted id and use it in subsequent statements?
I initially wrote this using the Django ORM, but it's going to take way too long that way and it seems to be slowing down. I removed all of my indexes and constraints, so that shouldn't be the issue.
The database is running locally on my machine. I figured once I get the data into a database, it wouldn't be hard to dump and reload it on the production database.
So how can I get this data into my database with the correct relationships?
Note that I don't know Java, so the answer suggested here isn't super practical for me: Django with huge mysql database
EDIT:
Here are more details:
I have a model something like this:
class Person(models.Model):
    name = models.CharField(max_length=100)
    # string references, since Office and Job are defined below
    offices = models.ManyToManyField('Office')
    job = models.ForeignKey('Job')

class Office(models.Model):
    address = models.CharField(max_length=100)

class Job(models.Model):
    title = models.CharField(max_length=100)
So I have a person who can have 1 job but many offices. (My real model has more fields, but you get the idea).
My CSV file is something like this:
name,office_1,office_2,job
hailey,"123 test st","222 USA ave.",Programmer
There are more fields than that, but I'm only including the relevant ones.
So I need to make the person object and the office objects and relate them. The job objects are already created so all I need to do there is find the job and save it as the person's job.
The original data was not in a database before this. Only the flat file. We are trying to make it relational so there is more flexibility.
Thanks!!!
Well, this is a tough one.
When you say relations, are they all in a single CSV file? I mean, like this, presuming a simple data model with a relation to itself?
id;parent_id;name
4;1;Frank
1;;George
2;1;Costanza
3;1;Stella
If this is the case and it's out of order, I would write a Python script to reorder these and then import them.
I had a scenario a while back where I had a number of CSV files, but they were from individual models, so I loaded the first parent one, then the second, etc.
We wrote custom importers that would read the data from a single CSV and do some processing on it, like checking whether it already existed, whether some things were valid, etc. One method for each CSV file.
For CSVs that were big enough, we just split them into smaller files (around 200k records each) and processed them one after the other. The difference is that all the previous data that this big CSV depended on was already in the database, imported by the same method described previously.
Without an example, I can't comment much more.
EDIT
Well, since you gave us your model, and based on the fact that the job model is already there, I would go for something like this:
Create a custom method, even just one you can invoke from the shell: a method/function or whatever that will receive a single line of the file.
In that method, discover how many offices that person is related to. Search to see if each office already exists in the DB. If so, use it to relate the person and the office. If not, create it and relate them.
Look up the job. Does it exist? Yes, then use it. No? Create it and then use it.
Something like this:
import csv

def process_line(line):
    # csv.reader handles the quoted office fields correctly
    data = next(csv.reader([line]))
    person = Person()
    # fill in the person details that are in the CSV
    # (adjust the indices to your real column layout)
    person.name = data[0]
    person.save()  # you'll need to save before using the m2m
    offices = get_offices_from_line(line)  # returns the plain values, not Office instances
    for office in offices:
        # get_or_create returns (instance, created_flag)
        obj, created = Office.objects.get_or_create(address=office)
        person.offices.add(obj)
    job_obj, job_created = Job.objects.get_or_create(title=data[-1])
    person.job = job_obj
    person.save()
    # repeat for the remaining fields
Be aware that the function above was not tested or guarded against any kind of errors. You'll need to:
Do that yourself;
Create the function that identifies the offices each person has. I don't know the data, but perhaps if you look at the field preceding the first office and at the first field after all the offices, you'll be able to capture all of them;
Create a function to parse the top-level file, iterate over its lines, and pass each one along to your shiny import function (a minimal driver sketch follows the link below).
Here are the docs for get_or_create: https://docs.djangoproject.com/en/1.8/ref/models/querysets/#get-or-create
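A minimal driver for that last point, assuming the process_line sketch above and a file with a header row:

def import_file(path):
    with open(path) as f:
        next(f)  # skip the header row
        for line in f:
            process_line(line.rstrip('\n'))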
I am using Django with a bunch of models linked to a MySQL database. Every so often, my project needs to generate a new number (sequentially, although this is not important) that becomes an ID for rows in one of the database tables. I cannot use the auto-increment feature in the models because multiple rows will end up having this number (it is not the primary key). Thus far, I have been using global variables in views.py, but every time I change anything and save, the variables are reset with the server. What is the best way to generate a new ID like this (without it being reset all the time), preferably without writing to a file every time? Thanks in advance!
One way is to create a table in your database and store those values in it. Another way is to use HTTP cookies to save the values if you want to avoid the server-reset problem, though I do not prefer that approach.
You can follow this link to set and read values from cookies in Django:
https://docs.djangoproject.com/en/dev/topics/http/sessions/#s-setting-test-cookies
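A hedged sketch of the database-table approach with the Django ORM (the Counter model and next_id helper are my own names; select_for_update keeps two concurrent requests from reading the same value):

from django.db import models, transaction

class Counter(models.Model):
    # one named sequence per row
    name = models.CharField(max_length=50, unique=True)
    value = models.IntegerField(default=0)

def next_id(name='person_group'):
    # lock the row so concurrent requests can't get the same number
    with transaction.atomic():
        counter, _ = Counter.objects.select_for_update().get_or_create(name=name)
        counter.value += 1
        counter.save()
    return counter.value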
I'm looking for a way to access a csv file's cells in a random fashion. If I use Python's csv module, I can only iterate through all the lines, which is rather slow. I should also add that the file is pretty large (>100MB) and that I'm looking for a short response time.
I could preprocess the file into a different data format for faster row/column access. Perhaps someone has done this before and can share some experiences.
Background:
I'd like to show an extract of the csv on screen provided by a web server (depending on scroll position). Keeping the file in memory is not an option.
I have found SQLite good for this sort of thing. It is easy to set up and you can store the data locally, but you also get easier control over what you select than with csv files, and you get the facility to add indexes etc.
There is also a built-in facility for loading csv files into a table: http://www.sqlite.org/cvstrac/wiki?p=ImportingFiles.
Let me know if you want any further details on the SQLite route i.e. how to create the table, load the data in or query it from Python.
SQLite instructions to load a .csv file into a table
To create a database file you can just add the required filename as an argument when opening SQLite. Navigate to the directory containing the csv file from the command line (I am assuming here that you want the SQLite .db file to be contained in the same dir). If using Windows, add SQLite to your PATH environment variable if not already done (instructions here if you need them), and open SQLite with an argument for the name that you want to give your database file, e.g.:
sqlite3 example.db
Check the database file has been created by entering:
.databases
Create a table to hold the data. I am using an example for a simple customer table here. If data types are inconsistent for any columns use text:
create table customers (ID integer, Title text, Forename text, Surname text, Postcode text, Addr_Line1 text, Addr_Line2 text, Town text, County text, Home_Phone text, Mobile text, Comments text);
Specify the separator to be used:
.separator ","
Issue the command to import the data; the syntax takes the form .import filename.ext table_name, e.g.:
.import cust.csv customers
Check that the data has loaded in:
select count(*) from customers;
Add an index for columns that you are likely to filter on (full syntax described here) e.g.:
create index cust_surname on customers(surname);
You should now have fast access to the data when filtering on any of the indexed columns. To leave SQLite use .exit, to get a list of other helpful non-SQL commands use .help.
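To then read the data from Python, the standard library's sqlite3 module can query the same file directly; a minimal sketch (the surname value is just an example):

import sqlite3

conn = sqlite3.connect('example.db')
cur = conn.execute(
    'SELECT Forename, Surname FROM customers WHERE Surname = ?',
    ('Smith',),
)
for forename, surname in cur.fetchall():
    print(forename, surname)
conn.close()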
Python Alternative
Alternatively, if you want to stick with pure Python and pre-process the file, you could load the data into a dictionary, which would allow much faster access: the dictionary keys behave like an index, meaning that you can get to the values associated with a key quickly without going through the records one by one. I would need further details of your input data and what fields the lookups would be based on to provide further details on how to implement this.
However, unless you know in advance when the data will be required (so that you can pre-process the file before the request for data arrives), you would still have the overhead of loading the file from disk into memory every time you run this. Depending on your exact usage this may make the database solution more appropriate.
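As a rough sketch of the dictionary approach (the key_field column name is an assumption; substitute whichever column your lookups use):

import csv

def build_index(path, key_field):
    # map one column's values to their full rows for fast random access
    index = {}
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            index[row[key_field]] = row
    return index

rows = build_index('cust.csv', 'ID')
print(rows['42'])  # near-constant-time lookup, no scan through the whole file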