I have some Python scripts that synchronize data from AGOL (Esri ArcGIS Online) to a PostgreSQL DB.
But every time I synchronize a table, the script drops the old table in the DB and inserts the new data frame under the same name (I pull the data into a Spatial Data Frame and insert it into the DB; the table names and the links to the AGOL feature layers are read from another table). Everything works fine up to this step.
The problem comes with the reporting. I have another script used for reporting, and if it runs as any user other than the postgres superuser it throws an error - technically it is trying to export data from a table that was previously dropped and recreated.
Can I add a rule that always grants SELECT and INSERT rights to specific users/roles on every newly created table in the DB, so that whenever a new table is created they have access to it?
I need the roles because we don't want to give superuser access to everyone who works in the reporting department; moreover, they want to see who last ran the sync and the reports.
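For what it's worth, a minimal sketch of the kind of rule being asked about, using PostgreSQL's ALTER DEFAULT PRIVILEGES. The role names sync_owner and reporting, the schema, and the connection string are all placeholders, and the statement only affects tables created by the named role after it has been run:

import psycopg2

# Run once as a superuser (or as the owning role). Afterwards, every table the
# sync role creates in the public schema is readable/writable by "reporting".
conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute("GRANT USAGE ON SCHEMA public TO reporting")
    cur.execute("GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public TO reporting")
    cur.execute(
        "ALTER DEFAULT PRIVILEGES FOR ROLE sync_owner IN SCHEMA public "
        "GRANT SELECT, INSERT ON TABLES TO reporting"
    )
conn.close()

The FOR ROLE part matters here: default privileges only apply to objects created by that role, so it should name whichever role the sync scripts connect as.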
I am trying to read XML file(s) in Python, parse them, extract the required fields, and insert the extracted data into a Postgres table. I am new to Python and Postgres, so I was hoping someone could clarify some questions I have in mind.
The requirement is that there will be 2 target tables in Postgres for every XML file (of a certain business entity, e.g. customers, products, etc.) that is received and read on a particular day - CURRENT and HISTORY.
The CURRENT table (e.g. CUST_CURR) is supposed to hold the latest data received for a particular run (the current day's file), and the HISTORY table (CUST_HIST) will contain the history of all the data received up to the previous run - i.e. it just keeps appending the records from every run into the HIST table.
However, the requirement is to make the HIST table a PARTITIONED table (to improve query response time by partition-pruning) based on the current process run date. In other words, during a particular run, the CURR table needs to be truncated and loaded with the day's extracted records, and the records already existing in the CURR table should be copied/inserted/appended into the HIST table in a NEW Partition (of the HIST table) based on the run date.
Now, when I searched the internet to learn more about partitioning tables in Postgres, it appears that to create NEW partitions, new tables need to be created manually (with a different name) every time, each representing that partition (per the documentation)?? The example shows a CREATE TABLE statement for creating a partition!!
CREATE TABLE CUST_HIST_20220630 PARTITION OF CUST_HIST
FOR VALUES FROM ('2022-06-30') TO ('2022-07-01');
I am sure I have misinterpreted this, but can anyone please correct me and help me clear up the confusion?
So if anyone has to query the HIST table with a run-date filter (assuming that the partitions are created on the run_dt column), does the user have to query that particular (sub)table (something like SELECT * FROM CUST_HIST_20220630 WHERE run_dt >= '2022-05-31') instead of the main partitioned table (SELECT * FROM CUST_HIST)?
In other RDBMSs (Oracle, Teradata, etc.) the partitions are created automatically when the data is loaded and they remain part of the same table. When a user queries the table based on the partitioning column, the optimizer understands this, prunes the unnecessary partitions, and reads only the required partition(s), greatly improving response time.
Could someone please clear up my confusion? Is there a way to automate partition creation while loading data into a Postgres table using Python (psycopg2)? I am new to Postgres and Python, so please forgive my naivety.
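As far as I know, Postgres's declarative partitioning does not create partitions automatically the way Oracle's interval partitioning does, but the loading script can create them on the fly. A minimal sketch of that idea, assuming a parent table cust_hist partitioned by RANGE (run_dt) and a placeholder connection string - users still query the parent table (SELECT * FROM cust_hist WHERE run_dt >= ...), and the planner prunes down to the relevant partitions:

from datetime import date, timedelta
import psycopg2
from psycopg2 import sql

run_dt = date(2022, 6, 30)                 # current process run date
part_name = f"cust_hist_{run_dt:%Y%m%d}"   # e.g. cust_hist_20220630

conn = psycopg2.connect("dbname=mydb user=etl_user")  # placeholder connection string
with conn, conn.cursor() as cur:
    # Create the day's partition of the parent table if it is not already there.
    cur.execute(
        sql.SQL(
            "CREATE TABLE IF NOT EXISTS {part} PARTITION OF cust_hist "
            "FOR VALUES FROM ({lo}) TO ({hi})"
        ).format(
            part=sql.Identifier(part_name),
            lo=sql.Literal(run_dt),
            hi=sql.Literal(run_dt + timedelta(days=1)),
        )
    )
    # Rows inserted through the parent are routed to the matching partition,
    # so the CURR contents can be appended to HIST and CURR reloaded.
    cur.execute("INSERT INTO cust_hist SELECT * FROM cust_curr")
    cur.execute("TRUNCATE cust_curr")
    # ... INSERTs of the day's parsed XML records into cust_curr go here ...
conn.close()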
I'm trying to migrate the database for an existing application from Access to SQLite. The application uses AutoNumbers to generate unique IDs in Access, and some tables reference rows in other tables by these unique IDs.
What's a good way to migrate these and keep this functionality intact?
From what I've read, SQLite uses auto-indexing for this. How would I create the links between the tables? Do I have to search the other tables for the row with that unique ID and replace the reference with the SQLite-generated ID?
example:
Table 1 has a column linkedID with a row containing the value {7F99297A-DE91-4BD6-9ED8-FC13D668CDA2}, which is linked to a row in table 2 whose primaryKey is {7F99297A-DE91-4BD6-9ED8-FC13D668CDA2}.
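For concreteness, a minimal sketch of the "replace the reference with the generated ID" idea described above, assuming the rows have already been copied into SQLite, that table 2 kept its old GUID in a column named old_guid next to a new INTEGER PRIMARY KEY named id, and that all table and column names here are placeholders:

import sqlite3

con = sqlite3.connect("migrated.db")
cur = con.cursor()

# Add an integer column on table1 to hold the new foreign key value.
cur.execute("ALTER TABLE table1 ADD COLUMN table2_id INTEGER")

# Map every old Access GUID to the new integer primary key in table2.
guid_to_id = dict(cur.execute("SELECT old_guid, id FROM table2"))

# Rewrite the GUID references in table1 as integer references.
for old_guid, new_id in guid_to_id.items():
    cur.execute(
        "UPDATE table1 SET table2_id = ? WHERE linkedID = ?",
        (new_id, old_guid),
    )

con.commit()
con.close()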
Well, there's not really an automated way to do this.
But here is what I do to migrate data:
I set up a linked table in Access and double-check that the linked table works (you need to install the ODBC driver).
Assuming you have a working linked table, you can then do this in VBA to export the Access table to SQLite:
Dim LocalTable As String ' name of local table link
Dim ServerTable As String ' name of table in SQLite
LocalTable = "Table1"
ServerTable = "TableSLite"
Dim strCon As String
strCon = CurrentDb.TableDefs("Test1").Connect
' above is a way to steal and get a working connection from a valid
' working linked table (I hate connection strings in code)
Debug.Print strCon
DoCmd.TransferDatabase acExport, "ODBC Database", strCon, acTable, LocalTable, ServerTable
Debug.Print "done export of " & LocalTable
That will get you the table in SQLite. But there is no DDL (data definition) command in SQLite to THEN change that PK id column from Access into a primary key with auto increment.
However, assuming you have, say, DB Browser?
Then simply export the table(s) as per above.
Now, in DB Browser, open up the table, choose Modify, and simply check the AI (auto increment) and then PK settings - in fact, if you check the AI box, the PK usually gets selected for you. Do this after you have done the above export (and you should consider closing down Access, since you had/have linked tables).
So, for just a few tables, the above is not really hard.
However, the above export (transfer) of data does not set the PK or auto increment for you.
If you need to do this with code, and this is not a one time export/transfer, then I don't have a good solution.
Unfortunately, SQLite does NOT allow an ALTER TABLE command to set a PK and auto increment (if that were possible, then after an export you could execute the DDL command in SQLite, or send the command from your client software, to make this alteration).
I'm not sure if SQLite can spit out the "create table" command that exists for a given table (but I think it can). So you might export the schema, get the DDL command, modify that command, drop the table, re-run the CREATE TABLE command (with the correct PK and auto increment), and THEN use an export or append query in Access.
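If it does have to happen in code, a minimal sketch of that rebuild, assuming a hypothetical table named Table1 whose exported ID column should become the autoincrementing primary key (the column list is made up - match it to your own export). SQLite does store the CREATE TABLE text in sqlite_master, and the usual workaround for the missing ALTER TABLE support is to create a corrected copy and move the rows over:

import sqlite3

con = sqlite3.connect("migrated.db")
cur = con.cursor()

# sqlite_master holds the CREATE TABLE text, which you can read to see the
# exported column definitions before writing the corrected version by hand.
row = cur.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'Table1'"
).fetchone()
print(row[0])

cur.executescript("""
    -- Recreate the table with a proper autoincrementing primary key.
    CREATE TABLE Table1_new (
        ID        INTEGER PRIMARY KEY AUTOINCREMENT,
        FirstName TEXT,
        LastName  TEXT
    );
    INSERT INTO Table1_new (ID, FirstName, LastName)
        SELECT ID, FirstName, LastName FROM Table1;
    DROP TABLE Table1;
    ALTER TABLE Table1_new RENAME TO Table1;
""")
con.commit()
con.close()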
But the transfer of the table(s) in question can be done quite easily as per above - the result just does not set or include the PK setting(s) for you.
However, if this is a one-time export? Then the export of the tables - and even the dirty work of figuring out the correct data types to use - is done for you. The above works well, but you have to open up the tables in a tool like DB Browser and then set the PK and auto increment.
I do the above quite often to transfer Access tables to SQLite tables, but it does then require some extra steps to set up the PK and auto increment.
Another possible way, if this has to be done more than one time?
I would export as per above, and then add the PK (and auto increment).
I would then grab, say, the 8 tables' CREATE TABLE commands from SQLite and save those commands in the client software.
Then you execute the correct CREATE TABLE command and do an append query from Access. So it really depends on whether this is a one-time export, or whether this process of having to create the table(s) in SQLite is to occur over and over.
I'm doing a web scraping + data analysis project that consists of scraping product prices every day, cleaning the data, and storing it in a PostgreSQL database. The final user won't have access to the data in this database (the daily scraping grows huge, so eventually I won't be able to upload it to GitHub), but I want to explain how to replicate the project. The steps are basically:
Scrape with Selenium (Python) and save the raw data into CSV files (already on GitHub);
Read these CSV files, clean the data, and store it in the database (the cleaning script is already on GitHub);
Retrieve the data from the database to create dashboards and anything that I want (not yet implemented).
To clarify, my question is about how I can teach someone who sees my project to replicate it, given that this person won't have the database info (tables, columns). My idea is:
Add SQL queries in a folder, showing how to create the database skeleton (same tables and columns);
Add info to the README, like how to create the environment variables used to access the database (see the sketch below);
Is it okay to do that? I'm looking for best practices in this context. Thanks!
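On the environment-variable point, a small sketch of what the connection code could look like (the variable names are just placeholders - whichever ones you pick are what the README needs to document):

import os
import psycopg2

# The README only has to list these names; each person supplies their own values.
conn = psycopg2.connect(
    host=os.environ["DB_HOST"],
    port=os.environ.get("DB_PORT", "5432"),
    dbname=os.environ["DB_NAME"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
)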
So, I have a local office that sends us a list of new subscribers from their location daily. But instead of just sending over the new data, they send a CSV with what I think is the last 25,000 records - I'm guessing the clerk just clicks some default option.
I have a simple Python script which inserts this data into a local MySQL DB, with sub_id set as a unique index to prevent duplicates.
My problem however, is that I have to send the new subscriber data over to another team.
I want to add this functionality to the existing Python script, and the solution I could think of is to add a "NEW" status when inserting into the DB, then export all rows with "NEW" status, and finally update the "NEW" status to "EXPORTED".
This feels inefficient to me - is there a better approach?
Add a created_at timestamp to the table and send everything with a created_at greater than when the process started. This is the most robust option, and also adds a bit of information to the database which may be useful later; you can figure out which row was made by which import.
Or, have the Python script remember the IDs of every row it inserted. Then select only those rows.
Alternatively, have the Python script build the report as it inserts the data.
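A minimal sketch of the created_at option above, assuming a hypothetical subscribers table with a created_at TIMESTAMP column, and placeholder credentials and CSV column names; the unique index on sub_id makes the duplicate rows no-ops, so everything stamped during this run is exactly what the other team needs:

import csv
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="subs"
)  # placeholder credentials
cur = conn.cursor()

# Take the start time from the database server so clock skew can't hide rows.
cur.execute("SELECT NOW()")
run_started = cur.fetchone()[0]

# Load the daily CSV; duplicates are skipped thanks to the unique sub_id index.
with open("daily.csv", newline="") as f:
    rows = [(r["sub_id"], r["name"]) for r in csv.DictReader(f)]
cur.executemany(
    "INSERT IGNORE INTO subscribers (sub_id, name, created_at) "
    "VALUES (%s, %s, NOW())",
    rows,
)
conn.commit()

# Only the rows created during this run are new subscribers to send on.
cur.execute(
    "SELECT sub_id, name FROM subscribers WHERE created_at >= %s",
    (run_started,),
)
new_subscribers = cur.fetchall()
conn.close()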
I want some user fields in Active Directory to be updated from SQL Server. Is it possible to do that, or is it possible to update the fields using Python? Any pointers would be greatly appreciated!
You can use something like Python LDAP to make changes in Active Directory via the LDAP interface. The challenge is knowing what/when data changes in your database table.
In MySQL, you can use triggers to perform actions when INSERT, UPDATE, or DELETE operations are committed. A trigger could be used to populate a second table that is essentially a changelog. Either remove items from the changelog table when processed and updated into AD or maintain a "last change processed" number within your code and retain the changelog data as an audit log.
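A small sketch of that changelog-driven pattern, assuming the changelog table lives in MySQL as described and using the python-ldap package; the table, column, DN, server, and account names are all hypothetical:

import ldap
import mysql.connector

# Read the pending changes written by the trigger.
db = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="hr"
)  # placeholder credentials
cur = db.cursor()
cur.execute("SELECT id, user_dn, attribute, new_value FROM ad_changelog")
changes = cur.fetchall()

# Push each change to Active Directory over LDAP.
ad = ldap.initialize("ldaps://dc01.example.com")
ad.simple_bind_s("svc_adsync@example.com", "service-account-password")

for change_id, user_dn, attribute, new_value in changes:
    ad.modify_s(user_dn, [(ldap.MOD_REPLACE, attribute, [new_value.encode()])])
    # Remove the processed row so it is not replayed on the next run
    # (or keep it as an audit log and track a "last change processed" id instead).
    cur.execute("DELETE FROM ad_changelog WHERE id = %s", (change_id,))

db.commit()
ad.unbind_s()
db.close()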