How to create a large number of tables - python

I've got a large dataset to work with in order to build a storage system that monitors movement in a store. There are over 300 products in that store, and the main structure of all the tables is the same; the only difference is the data inside. There's a larger database table called StoringTF, and I want to create a lot of tables called Product_1, Product_2, Product_3, etc.
The main large dataset (table) looks like this:
CREATE TABLE StoringTF (
    Store_code INTEGER,
    Store TEXT,
    Product_Date TEXT,
    Permission INTEGER,
    Product_Code INTEGER,
    Product_Name TEXT,
    Incoming INTEGER,
    Unit_Buying_Price INTEGER,
    Total_Buying_Price INTEGER,
    Outgoing INTEGER,
    Unit_Sell_Price INTEGER,
    Total_Sell_Price INTEGER,
    Description TEXT
)
I want the user to input a code into a Tkinter Entry widget called PCode; it looks like this:
PCode = Entry(root, width=40)
PCode.grid(row=0,column=0)
Then a function compares the input with all the codes in the main table and retrieves the table that has the matching Product_Code.
So the sequence is: first, a product table is created for every Product_Code in the main table, each holding all the rows from the main table with that Product_Code. Then, when the program is opened, the user enters a Product_Code, and the program picks the table with the matching code and shows it to the user.
Thanks a lot; I know it's hard, but I really need your help and I'm certain you can help me.
The product table should look like this:
CREATE TABLE Product_x (
    Product_Code INTEGER,
    Product_Name TEXT,               -- taken from the main-table rows that have the same product code
    Entry_Date TEXT,
    Permission_Number INTEGER,
    Incoming INTEGER,
    Outgoing INTEGER,
    Description TEXT,
    Total_Quantity_In_Store INTEGER, -- the main table's Incoming - Outgoing
    Total_Value_In_Store INTEGER     -- the main table's Total_Buying_Price - Total_Sell_Price
)
Thank you for your help; I hope you can figure it out because I'm really struggling with this.

From your comment:
I think I'd select some columns from the main table, but I don't know how I'd update only some columns with selected columns from the main table where product code = PCode.get() (which is the entry box). Is that possible?
Yes, it is definitely possible to present only certain rows and columns of data to the user.
However, there are many patterns (i.e. programming techniques) that you could follow for presenting data to the user, but every common, best-practice technique always separates the backend data (i.e. database) from the user interface. It is not necessary to limit presentation of data to one entire table at a time. In most cases the data should never be presented and/or exposed to the user exactly as it appears in a table. Of course sometimes the data is simple and direct enough to do that, but most applications re-format and group data in different views for proper presentation. (Here the term view is meant as a very general, abstract term for representing data in alternative ways from how it is stored. I mention specific sqlite views below.)
The entire philosophy behind modern databases is efficient, well-designed storage that can be queried to return just the data that is appropriate for each application. Much of this capability is based on the host-language data models, but sqlite directly supports features to help with this. For instance, a view can be defined to select only certain columns and rows at a time (i.e. choose certain Product_Code values). An sqlite view is just an SQL query that is saved and can have certain properties and actions defined for it. By default, a sqlite view is read-only, but triggers can be defined to allow updates to the underlying tables via the view.
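For instance, here is a minimal sketch with Python's sqlite3 module, assuming the StoringTF column names from the question (the view name, database file name, and the sample code 101 are made up):

import sqlite3

conn = sqlite3.connect("storage.db")  # database file name is an assumption
cur = conn.cursor()

# One view over the main table replaces the 300+ per-product tables.
cur.execute("""
    CREATE VIEW IF NOT EXISTS ProductLedger AS
    SELECT Product_Code, Product_Name, Product_Date AS Entry_Date,
           Permission AS Permission_Number, Incoming, Outgoing, Description
    FROM StoringTF
""")

# At runtime, filter the view by the code the user typed into PCode.
def rows_for_product(code):
    return cur.execute(
        "SELECT * FROM ProductLedger WHERE Product_Code = ?", (code,)
    ).fetchall()

print(rows_for_product(101))  # e.g. rows_for_product(int(PCode.get()))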
From my earlier comment: You should research data normalization. That is the key principle for designing relational databases. For instance, you should avoid duplicate data columns like Product_Name; that column should only be in StoringTF. Calculated columns are also usually redundant and unnecessary: don't store the Total_Value_In_Store column, rather calculate it when needed with a query and/or view. Duplicate columns invite mismatched data, or at least require extra care to keep every copy in sync when one is updated. Instead you can just query joined tables to get related values.
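Continuing the sketch above, the calculated totals from the question can be computed on demand with an aggregate query instead of being stored (column names again taken from the question's StoringTF definition):

cur.execute("""
    SELECT Product_Code,
           SUM(Incoming) - SUM(Outgoing) AS Total_Quantity_In_Store,
           SUM(Total_Buying_Price) - SUM(Total_Sell_Price) AS Total_Value_In_Store
    FROM StoringTF
    GROUP BY Product_Code
""")
for product_code, quantity, value in cur.fetchall():
    print(product_code, quantity, value)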
Honestly, these concepts can require much study before they are implemented properly. By all means, go forward with developing a solution that fits your needs, but a Stack Overflow answer is no place for full tutorials, which I suspect you might need. Really your question is more about overall design, and I think my answer can get you started on the right track. Anything more specific and you'll need to ask other questions later on.

Related

ForeignKeys and historical data

I have a Quote table that generates a price based on lots of parameters. Price is generated based on data from multiple other tables like: Charges, Coupon, Promotions etc. The best way to deal with it is using ForeignKeys. Here's the table (all _id fields are foreign keys):
Quote
    # user input fields
    charge_id
    promotion_id
    coupon_id
    tariff_id
Everything looks good. Each quote record has very detailed data that tells you where the price comes from. The problem is the data in the tables that it depends on isn't guaranteed to stay the same. It might get deleted or get changed. Let's say a Quote has a foreign key to a tariff and then it gets changed. The records associated with it no longer tell the same story. How do you deal with something like this? Would really appreciate if you recommend some theory related to this.
If you don't want your quote values to change when the foreign-key-related objects change, the best bet is to copy the relevant fields from the individual foreign-key models into the Quote model itself.
Then, while calculating the values of a Quote, you fetch the data from all the (now no longer) related objects and save them in the Quote table.
Any later changes to the foreign tables will then not affect the Quote table.
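A minimal sketch of that idea with Django models; the Tariff model and its fields here are placeholders, not your actual schema:

from django.db import models

class Tariff(models.Model):
    name = models.CharField(max_length=100)
    unit_price = models.DecimalField(max_digits=10, decimal_places=2)

class Quote(models.Model):
    # still useful for navigation, but not relied on for the price
    tariff = models.ForeignKey(Tariff, null=True, on_delete=models.SET_NULL)
    # snapshot copied at quote time, immune to later Tariff edits
    tariff_name = models.CharField(max_length=100)
    tariff_unit_price = models.DecimalField(max_digits=10, decimal_places=2)

    @classmethod
    def create_from_tariff(cls, tariff):
        return cls.objects.create(
            tariff=tariff,
            tariff_name=tariff.name,
            tariff_unit_price=tariff.unit_price,
        )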
Another option would be to use a library like django-simple-history, which keeps track of all changes to the models over time.

Is there any variable in PostgreSQL to store a list

We have ticketing software to manage our work. Every ticket is assigned to a tech in one field (the normal stuff), but now we want to assign the same ticket to several technicians, e.g. ticket 5432: tech_id(2,4,7), where 2, 4, 7 are tech IDs.
Of course we could do that using a separate table with the tech IDs and the ticket ID, but we would have to convert the data.
The "right" way to do this is to have a separate table of ticket
assignments. Converting the data for something like this is fairly simple on the database end. create table assign as select tech_id from ... followed by creating any necessary foreign key constraints.
Rewriting your interface code can be trickier, but you're going to
have to do that anyway to allow for more than one tech.
You could use an array type, but sometimes database interfaces don't understand Postgres array types. There isn't anything inherent in arrays that prevents duplicates or imposes ordering, but you could do that with an appropriate trigger.
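A rough sketch of the separate-table conversion with psycopg2; the tickets, techs and ticket_assignment names are assumptions about your schema:

import psycopg2

conn = psycopg2.connect("dbname=helpdesk")  # connection string is an assumption
cur = conn.cursor()

# One row per (ticket, tech) pair instead of a single tech_id column.
cur.execute("""
    CREATE TABLE ticket_assignment (
        ticket_id integer REFERENCES tickets(id),
        tech_id   integer REFERENCES techs(id),
        PRIMARY KEY (ticket_id, tech_id)   -- also prevents duplicate assignments
    )
""")

# Carry over the existing single-tech assignments.
cur.execute("""
    INSERT INTO ticket_assignment (ticket_id, tech_id)
    SELECT id, tech_id FROM tickets WHERE tech_id IS NOT NULL
""")
conn.commit()

The composite primary key both blocks duplicate assignments and gives you an index for ticket-to-tech lookups.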

Python sqlite3 user defined queries (selecting tables)

I have a uni assignment where I'm implementing a database that users interact with over a webpage. The goal is to search for books given some criteria. This is one module within a bigger project.
I'd like to let users be able to select the criteria and order they want, but the following doesn't seem to work:
cursor.execute("SELECT * FROM Books WHERE ? REGEXP ? ORDER BY ? ?", [category, criteria, order, asc_desc])
I can't work out why, because when I go
cursor.execute("SELECT * FROM Books WHERE title REGEXP ? ORDER BY price ASC", [criteria])
I get full results. Is there any way to fix this without resorting to string formatting, which would open the door to SQL injection?
The data is organised in a table where the book's ISBN is a primary key, and each row has many columns, such as the book's title, author, publisher, etc. The user should be allowed to select any of these columns and perform a search.
Generally, SQL engines only support parameters for values, not for the names of tables, columns, etc. This is true of sqlite itself, and of Python's sqlite3 module.
The rationale behind this is partly historical (traditional clumsy database APIs had explicit bind calls where you had to say which column number you were binding with which value of which type, etc.), but mainly because there isn't much good reason to parameterize anything other than values.
On the one hand, you don't need to worry about quoting or type conversion for table and column names. On the other hand, once you start letting end-user-sourced text specify a table or column, it becomes hard to limit the harm it can do.
Also, from a performance point of view (and if you read the sqlite docs—see section 3.0—you'll notice they focus on parameter binding as a performance issue, not a safety issue), the database engine can reuse a prepared optimized query plan when given different values, but not when given different columns.
So, what can you do about this?
Well, generating SQL strings dynamically is one option, but not the only one.
First, this kind of thing is often a sign of a broken data model that needs to be normalized one step further. Maybe you should have a BookMetadata table, where you have many rows—each with a field name and a value—for each Book?
Second, if you want something that's conceptually normalized as far as this code is concerned, but actually denormalized (either for efficiency, or because some other code needs it denormalized), SQL functions can help: register a wrapper with sqlite3's create_function, and you can pass parameters to that function when you execute the query.
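If you do go the dynamic-SQL route, the usual way to keep it safe is to whitelist the identifiers before splicing them into the statement, while still binding the search value as a parameter. A rough sketch; the column list is an assumption about the Books table:

# Only names we know exist in Books may appear in the SQL text.
ALLOWED_COLUMNS = {"title", "author", "publisher", "price", "isbn"}
ALLOWED_ORDER = {"ASC", "DESC"}

def search_books(cursor, category, criteria, order, asc_desc):
    if category not in ALLOWED_COLUMNS or order not in ALLOWED_COLUMNS:
        raise ValueError("unknown column")
    if asc_desc.upper() not in ALLOWED_ORDER:
        raise ValueError("bad sort direction")
    sql = "SELECT * FROM Books WHERE {} REGEXP ? ORDER BY {} {}".format(
        category, order, asc_desc.upper())
    # criteria stays a bound parameter; REGEXP assumes a regexp function is
    # registered, as in the working query from the question.
    return cursor.execute(sql, [criteria]).fetchall()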

Python: Dumping Database Data with Peewee

Background
I am looking for a way to dump the results of MySQL queries made with Python and Peewee to an Excel file, including the database column headers. I'd like the exported content to be laid out in nearly the same order as the columns in the database. Furthermore, I'd like this to work across multiple similar databases that may have slightly differing fields. To clarify, one database may have a user table containing "User, PasswordHash, DOB, [...]", while another has "User, PasswordHash, Name, DOB, [...]".
The Problem
My primary problem is getting the column headers out in an ordered fashion. All attempts thus far have produced unordered output, and all of them have been less than elegant.
Second, my methodology thus far has resulted in code which I'd (personally) hate to maintain, which I know is a bad sign.
Work so far
At present, I have used Peewee's pwiz.py script to generate the models for each of the preexisting database tables in the target databases, then went and entered all primary and foreign keys. The relations are setup, and some brief tests showed they're associating properly.
Code: I've managed to get the column headers out using something similar to:
for i, column in enumerate(User._meta.get_field_names()):
    ws.cell(row=0, column=i).value = column
As mentioned, this is unordered. Also, doing it this way forces me to do something along the lines of
getattr(some_object, title)
to dynamically populate the fields accordingly.
Thoughts and Possible Solutions
Manually write out the order I want in an array, and use that for looping through and populating the data. The pro of this is very strict/granular control; the con is that I'd need to specify it for every database.
Create (whether manually or via a method) a hash of all possibly encountered fields with an associated weight, then write a method for sorting "_meta.get_field_names()" according to weight. The con of this is that the columns may not end up 100% in the right order, such as Name coming before DOB in one DB, while after it in another.
Feel free to tell me I'm doing it all wrong or to suggest completely different ways of doing this, I'm all ears. I'm very much new to Python and Peewee (and ORMs in general, actually). I could switch back to Perl and do the database querying via DBI with little to no hassle, but its libraries for Excel would cause me just as many problems, and I'd like to take this as an opportunity to expand my knowledge.
There is a method on the model meta you can use:
for field in User._meta.get_sorted_fields():
    print(field.name)
This will print the field names in the order they are declared on the model.
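A rough sketch of putting that together with openpyxl; User is the pwiz-generated model from the question, and get_sorted_fields() is the method shown above (newer Peewee versions expose User._meta.sorted_fields instead):

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

fields = User._meta.get_sorted_fields()  # declaration order, per the answer above

# Header row first, then one row per record, in the same column order.
for col, field in enumerate(fields, start=1):
    ws.cell(row=1, column=col, value=field.name)
for row, user in enumerate(User.select(), start=2):
    for col, field in enumerate(fields, start=1):
        ws.cell(row=row, column=col, value=getattr(user, field.name))

wb.save("users.xlsx")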

Dynamic column formatting in SQL - and a backend to store the formatting

I'm trying to create a system in Python in which one can select a number of rows from a set of tables, which are to be formatted in a user-defined way. Let's say a table has a set of columns, some of which contain a date or timestamp value. The user-defined format for each column should be stored in another table, then queried and applied to the main query at runtime.
Let me give you an example: There are different ways of formatting a date column, e.g. using
SELECT to_char(column, 'YYYY-MM-DD') FROM table;
in PostgreSQL.
For example, I'd like the second parameter of the to_char() builtin to be queried dynamically from another table at runtime, and then applied if it has a value.
Reading the definition from a table is not that much of a problem; the harder part is designing a database schema that can receive, from a user interface, the user's choice of which formatting instructions to apply to which columns. The user should be able to pick which columns to include in the query, as well as a user-defined format for each column.
I've been thinking about how to do this in an elegant and efficient way for some days now, but to no avail. Having the user type the desired definition into a text field and splicing it into a query would be an open invitation for SQL injection attacks (although I could use escape() functions), and storing every possible combination doesn't seem feasible to me either.
It seems to me a stored procedure or a sub-select would work well here, though I haven't tested it. Let's say you store a date_format for each user in the users table.
SELECT to_char(column, (SELECT date_format FROM users WHERE users.id = 123)) FROM table;
Your mileage may vary.
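Since the format string is a value rather than an identifier, it can also be fetched separately and bound as an ordinary query parameter, which sidesteps the injection worry. A rough sketch with psycopg2; user_formats, orders and created_at are made-up names:

import psycopg2

conn = psycopg2.connect("dbname=app")  # connection string is an assumption
cur = conn.cursor()

# Look up this user's stored format, falling back to a default.
cur.execute("SELECT date_format FROM user_formats WHERE user_id = %s", (123,))
row = cur.fetchone()
fmt = row[0] if row else 'YYYY-MM-DD'

# The format is bound as a value, so no SQL is built from user-supplied text.
cur.execute("SELECT to_char(created_at, %s) FROM orders", (fmt,))
print(cur.fetchall())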
Pull the dates out as Unix timestamps and format them in Python:
SELECT DATE_PART('epoch', my_col::timestamp) FROM my_table;
my_date = datetime.datetime.fromtimestamp(row[0]) # Or equivalent for your toolkit
I've found a couple of advantages to this approach: Unix timestamps are the most space-efficient common format and are effectively language neutral, and the language you're querying the database from is richer than the underlying database, which gives you plenty of options if you start wanting friendlier formatting like "today", "yesterday", "last week" or "June 23rd".
I don't know what sort of application you're developing, but if it's something like a web app that will be used by multiple people, I'd also consider storing your database values in UTC so you can apply user-specific timezone settings when formatting, without having to account for them in all of your database operations.
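A small sketch of that Python-side formatting, assuming the query returns a UTC epoch value and the user's zone name comes from your own settings (the zone here is just an example):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

epoch_value = 1687522800  # e.g. row[0] from the query above

# Interpret the stored value as UTC, then render it in the user's zone.
utc_time = datetime.fromtimestamp(epoch_value, tz=timezone.utc)
local_time = utc_time.astimezone(ZoneInfo("Europe/Berlin"))
print(local_time.strftime("%Y-%m-%d %H:%M"))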
