In MySQL, I have two different databases -- let's call them A and B.
Database A resides on server server1, while database B resides on server server2.
Both servers are physically close to each other, but they are on different machines and have different connection parameters (different username, different password, etc.).
In such a case, is it possible to perform a join between a table that is in database A and a table that is in database B?
If so, how do I go about it programmatically in Python? (I am using Python's MySQLdb to interact with each of the databases separately.)
Try the FEDERATED storage engine.
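A minimal sketch of what that could look like from Python with MySQLdb, assuming the FEDERATED engine is enabled on server1 (it often is not, by default); the table, column, and credential names are placeholders:

```python
# Hedged sketch: create a FEDERATED "proxy" table on server1 that points at a
# real table on server2, then join against it as if it were local.
import MySQLdb

conn = MySQLdb.connect(host="server1", user="user_a", passwd="secret_a", db="A")
cur = conn.cursor()

# The column definitions must match the remote table; the CONNECTION string
# embeds server2's own credentials.
cur.execute("""
    CREATE TABLE b_orders (
        id    INT NOT NULL,
        total DECIMAL(10,2),
        PRIMARY KEY (id)
    )
    ENGINE=FEDERATED
    CONNECTION='mysql://user_b:secret_b@server2:3306/B/orders'
""")

# A normal-looking join now works: server1 fetches the remote rows on the fly.
cur.execute("""
    SELECT c.name, o.total
    FROM   customers AS c
    JOIN   b_orders  AS o ON o.id = c.order_id
""")
for row in cur.fetchall():
    print(row)
```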
Workaround: it is possible to use another DBMS to retrieve data from the two databases; for example, you could do it with linked servers in MS SQL Server (see the sp_addlinkedserver stored procedure). From the documentation:
A linked server allows for access to distributed, heterogeneous queries against OLE DB data sources.
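For what it's worth, here is a hedged sketch of driving that workaround from Python with pyodbc; the linked-server name, ODBC DSN, and table names are invented, and the remote MySQL server would need an ODBC driver registered on the SQL Server machine:

```python
# Hypothetical sketch: register server2 as a linked server on a SQL Server
# instance, then join a local table against it via OPENQUERY.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;UID=user;PWD=secret",
    autocommit=True,
)
cur = conn.cursor()

# Register the remote server through an OLE DB provider (MSDASQL wraps ODBC).
cur.execute("""
    EXEC sp_addlinkedserver
        @server     = N'SERVER2_MYSQL',
        @srvproduct = N'MySQL',
        @provider   = N'MSDASQL',
        @datasrc    = N'server2_odbc_dsn'
""")

# Join the local table against a pass-through query to the linked server.
cur.execute("""
    SELECT a.id, a.name, b.total
    FROM   dbo.table_a AS a
    JOIN   OPENQUERY(SERVER2_MYSQL,
                     'SELECT id, total FROM B.orders') AS b
      ON   a.id = b.id
""")
print(cur.fetchall())
```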
It is very simple: select data from one server, select data from the other server, and aggregate using Python. If you would like an SQL query with a JOIN, put the results from both servers into separate tables in a local SQLite database and write a SELECT with a JOIN.
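A rough sketch of that approach with MySQLdb and the standard-library sqlite3 module; the table and column names are only illustrative:

```python
# Pull the rows you need from each server, stage them in an in-memory SQLite
# database, and let SQLite do the JOIN.
import MySQLdb
import sqlite3

conn_a = MySQLdb.connect(host="server1", user="user_a", passwd="secret_a", db="A")
conn_b = MySQLdb.connect(host="server2", user="user_b", passwd="secret_b", db="B")

cur_a = conn_a.cursor()
cur_a.execute("SELECT id, name FROM customers")
customers = cur_a.fetchall()

cur_b = conn_b.cursor()
cur_b.execute("SELECT customer_id, total FROM orders")
# Convert Decimal totals to float so sqlite3 can bind them directly.
orders = [(cid, float(total)) for cid, total in cur_b.fetchall()]

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
local.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
local.executemany("INSERT INTO customers VALUES (?, ?)", customers)
local.executemany("INSERT INTO orders VALUES (?, ?)", orders)

rows = local.execute("""
    SELECT c.name, SUM(o.total)
    FROM   customers AS c
    JOIN   orders    AS o ON o.customer_id = c.id
    GROUP  BY c.name
""").fetchall()
print(rows)
```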
No. It is not possible to do the join as you would like. But you may be able to sort something out by replicating the relevant database from one server to the other.
One data set is under the control of one MySQL server and the other data set is under the control of the other MySQL server. The query can only be processed by one of those servers.
If you create a copy of the second database on the first server, or vice versa (the one that gets the fewest updates is best), you can set up replication to keep the copy up to date. You will then be able to run the query as you want.
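If you go the replication route, the replica-side setup is mostly SQL that you could issue through MySQLdb too. A hedged sketch, assuming the replica's my.cnf already has a unique server-id (and, if you only want database B, replicate-do-db = B), a replication user exists on the primary, and the binlog coordinates were taken from SHOW MASTER STATUS:

```python
# Point server2 at server1 as its replication source (classic pre-8.0 syntax).
# All host names, credentials, and binlog coordinates are placeholders.
import MySQLdb

replica = MySQLdb.connect(host="server2", user="admin_b", passwd="secret_b")
cur = replica.cursor()

cur.execute("STOP SLAVE")
cur.execute("""
    CHANGE MASTER TO
        MASTER_HOST     = 'server1',
        MASTER_USER     = 'repl_user',
        MASTER_PASSWORD = 'repl_secret',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS  = 4
""")
cur.execute("START SLAVE")
```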
Related
I am trying to find a way, either in R or Python, to use a dataframe as a table in an Oracle SQL statement.
It is impractical, for my objective, to:
Create a string out of a column and use that as the criteria (there are more than 1,000 values, which is the limit for an IN list)
Create a new table in the database and use that (I don't have access)
Download the entire contents of the table and merge in pandas (there are millions of records in the database, and it would bog down the db and my system)
I have found packages that will allow you to "register" a dataframe and have it act as a table/view so you can run queries against it, but they will not allow it to be used in a query that goes through a different connection string. Can anyone point me in the right direction? Either a way to use two different connections in the same SQL statement (to Oracle and to a package like DuckDB) so I can do an inner join, or a way to link directly to the dataframe and use it as a table in a join?
SAS does this so effortlessly and I don't want to go back to SAS because the other functionality is not as good as Python / R, but this is a dealbreaker if I can't do database extractions.
Answering my own question here -- after much research.
In short, this cannot be done. Outside of passing a list of criteria or a concatenated string, you cannot create a dataframe in Python or R and pass it through a query into a SQL Server or Oracle database. It's unfortunate, but if you don't have permission to write temporary tables in the Oracle database, you're out of options.
PROC SQL can join a local table to one from a database, for example joining db.sales and work.customer_info. To my knowledge, there aren't any packages in R or Python that can do this; either the local table must be uploaded to the database and the join done there, or the table (in its entirety or a subset) must be queried into local memory as a dataframe and then joined with the flat file.
Is there actually a way to do this in Python or R? Or is SAS superior for querying like this?
In general, no client (here meaning Python Pandas, R, or SAS) ever actually joins server-stored database tables to client objects like data frames or data sets. Clients handle remote connections to backend systems and then import result sets for further local use. Specifically:
Data frames in Python Pandas and R live in RAM, where you can join/merge/append any data frame in the current global environment that originated from flat files or other structures. Python and R can also connect to relational databases and query the needed tables, importing them as data frames into the current session's global environment. From there they can be joined/merged/appended to other local data frames.
Data sets in SAS default libraries live on hard disk or in file-system folders, where you can join/merge/append any data set in the defined libraries, or as the 9.4 docs mention:
libref is a shortcut name or a “nickname” for the aggregate storage location where your SAS files are stored.
However, when connecting to a database using libname as in the SAS/ACCESS interface, the library is simply a visual representation, not the actual physical database tables and views, or as the 9.4 docs specify:
libref specifies any SAS name that serves as an alias to associate SAS with a database, schema, server, or group of tables and views. Like the global SAS LIBNAME statement, the SAS/ACCESS LIBNAME statement creates shortcuts or nicknames for data storage locations. A SAS libref is an alias for a virtual or physical directory. A SAS/ACCESS libref is an alias for the DBMS database, schema, or server where your tables and views are stored.
The one exception involving physical files would be file-level databases such as MS Access and SQLite, which are not server-level databases like Postgres, SQL Server, or Oracle:
libname mydata "C:\Path\To\Database.accdb";
libname mydata odbc complete = "Driver={SQLite3 ODBC Driver};Database=C:\Path\To\database.db";
Altogether, similar to Python and R, SAS does not operate on the physical database tables but on results imported from the actual backend. So in your example, for each reference to a database table in proc sql, SAS makes a remote connection much as Python or R would, retrieves the results, and then performs the join against the local data set as a subsequent client-side step. All processing may be handled in memory, with results saved to hard disk (i.e., the temporary Work folder). What the user sees as one step may be multiple steps for SAS.
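To make that client-side flow concrete, here is a minimal pandas sketch assuming SQLAlchemy with the cx_Oracle driver; the connection string, table, and column names are placeholders:

```python
# Pull only the rows/columns you need from the server, then join locally.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "oracle+cx_oracle://user:password@dbhost:1521/?service_name=orcl"
)

# Server-side step: Oracle filters and returns a result set.
db_sales = pd.read_sql(
    "SELECT customer_id, sale_amount FROM sales "
    "WHERE sale_date >= DATE '2023-01-01'",
    engine,
)

# Client-side step: the "join" happens in local memory, against a data frame
# that never existed in the database.
customer_info = pd.DataFrame(
    {"customer_id": [101, 102, 103], "region": ["N", "S", "E"]}
)
result = db_sales.merge(customer_info, on="customer_id", how="inner")
print(result.head())
```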
I am new to working with databases, and I have been given the task of combining data from two large databases as part of an internship program (heavily focused on the learning experience, but not many people at the job are familiar with databases). The options are either to create a new table or database, or to make a front-end that pulls data from both databases. Is it possible to just make a front-end for this? There is an issue of storage if a new database has to be created.
I'm still at the stage where I'm trying to figure out exactly how I'm going to go about doing this, and how to access the data in the first place. I have the table data for the two databases that already exist, and I know which items need to be pulled from both. The end goal is a website where the user can input one of the values and get back all the information about that item. One of the databases is an Oracle database (SQL) and the other is a Cisco Prime database. I am planning to work in Python if possible. Any guidance on this would be very helpful!
Yes, it is perfectly OK to access both data sources from a single frontend.
Having said that, it might be a problem if you need to combine data from both data sources in large quantities, because you might have to reimplement some relational database functions such as join, sort, and group by (a rough sketch of that is below).
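For example, a hypothetical sketch of what "reimplementing" those functions in the front end tends to look like with pandas; the two data frames stand in for result sets fetched separately from Oracle and from Cisco Prime:

```python
# Each source is queried separately; the join, sort, and group-by that a single
# database would normally do are redone client-side.
import pandas as pd

oracle_rows = pd.DataFrame(
    {"device_id": [101, 102, 103], "owner": ["alice", "bob", "carol"]}
)
prime_rows = pd.DataFrame(
    {"device_id": [101, 103, 103],
     "ip_address": ["10.0.0.1", "10.0.0.3", "10.0.0.4"]}
)

combined = (
    oracle_rows.merge(prime_rows, on="device_id", how="left")  # join
               .sort_values("owner")                           # sort
)
per_owner = combined.groupby("owner")["ip_address"].count()    # group by
print(combined)
print(per_owner)
```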
Python is perfectly capable of connecting to Oracle data sources. I'm not so sure about Cisco Prime (which is an unusual database).
I would recommend using Linux or Mac (not Windows) if you are new to Python, since both platforms are more Python-friendly than Windows.
I'm evaluating using redis to store some session values. When constructing the redis client (we will be using this python one) I get to pass in the db to use. Is it appropriate to use the DB as a sort of prefix for my keys? E.g. store all session keys in db 0 and some messages in db 1 and so on? Or should I keep all my applications keys in the same db?
Quoting my answer from this question:
It depends on your use case, but my rule of thumb is: if you have a very large quantity of related data keys that are unrelated to all the rest of your data in Redis, put them in a new database. Reasons being:
You may need to (non-ideally) use the keys command to get all of that data at some point, and having the data segregated makes that much cheaper.
You may want to switch to a second Redis server later, and having related data pre-segregated makes this much easier.
You can keep your databases named somewhere, so it's easier for you, or a new employee, to figure out where to look for particular data.
Conversely, if your data is related to other data, they should always live in the same database, so you can easily write pipelines and Lua scripts that can access both.
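As a small illustration with the Python redis client (the db numbers and key names here are arbitrary examples):

```python
# Two clients pointed at the same Redis instance, but at different logical
# databases: sessions in db 0, messages in db 1.
import redis

sessions = redis.Redis(host="localhost", port=6379, db=0)
messages = redis.Redis(host="localhost", port=6379, db=1)

sessions.set("session:abc123", "user-42")
messages.rpush("inbox:user-42", "hello")

# KEYS only scans the client's own database, so segregating the data keeps
# this kind of sweep cheaper -- and the two key spaces never collide.
print(sessions.keys("session:*"))
```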
Looks like there are several options:
mysqldump + rsync - can this be done for only specific data from an existing table, and not a whole table?
An insert to a federated table - this seems pretty untested and unknown at this point...
A Python script (pull into memory from A, then insert into B) - this would probably be pretty slow... (a rough sketch of this option appears below)
What kind of data warrants what method?
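For reference, a rough sketch of that third option with MySQLdb, batching the inserts so it is not quite as slow as a row-at-a-time loop; the table, column, and credential names are placeholders:

```python
# Pull only the rows you care about from server A, then bulk-insert them into
# server B in batches.
import MySQLdb

src = MySQLdb.connect(host="server_a", user="user_a", passwd="secret_a", db="A")
dst = MySQLdb.connect(host="server_b", user="user_b", passwd="secret_b", db="B")

read = src.cursor()
read.execute("SELECT id, name, total FROM source_table WHERE total > %s", (100,))

write = dst.cursor()
while True:
    batch = read.fetchmany(1000)
    if not batch:
        break
    write.executemany(
        "INSERT INTO target_table (id, name, total) VALUES (%s, %s, %s)",
        batch,
    )
dst.commit()
```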
You also have another option -- MySQL replication!
Can you not extract that single table into its own database and replicate just that to the second server (or as many servers as you like)?