I have some SQL Server tables that contain Image data types.
I want to make them somehow usable in PostgreSQL. I'm a Python programmer, so I have a lot to learn about this topic. Help?
What you need to understand first is that the interfaces at the database level are likely to be different. Your best option is to write an abstraction layer for the blobs (and maybe publish it as open source for the databases you want to support).
On the PostgreSQL side you need to figure out whether you want to go with bytea or large objects (lob). These are very different and have different features and limitations. If you are enterprising, you might at least build support into the spec for selecting between them. In general, bytea is better for smaller files, while lob has more management overhead but supports larger files as well as chunking, seeking, etc.
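To make the difference concrete, here is a minimal sketch of both options using psycopg2; the table name, column names, file name, and connection details are just placeholders.

    import psycopg2

    conn = psycopg2.connect(dbname="mydb", user="me")  # hypothetical connection details
    blob = open("photo.jpg", "rb").read()

    # Option 1: bytea -- the image is stored inline in an ordinary column.
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS images (id serial PRIMARY KEY, data bytea)")
        cur.execute("INSERT INTO images (data) VALUES (%s)", (psycopg2.Binary(blob),))

    # Option 2: large object -- stored separately; supports seeking and chunked reads.
    with conn:
        lo = conn.lobject(0, "wb")   # 0 lets the server choose an OID
        lo.write(blob)
        oid = lo.oid                 # keep this OID in your own table to find the object again
        lo.close()

    conn.close()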
I am very new to programming and to this site too... An online course I'm following told me that it is not possible to manage bigger databases with db.sqlite3. What does that mean, anyway?
The choice of Relational Database Management System (RDBMS) depends on your use case. The different options available have different pros and cons, so for different applications some are more suitable than others.
I typically use SQLite (only for development purposes) and then switch to MySQL for my Django projects.
SQLite: Is file-based. You can actually see the file in your project directory, so all the CRUD (Create, Retrieve, Update, Delete) is done directly on that file. Also, all the underlying code for the RDBMS is quite small. All this makes it good for applications which don't require intensive use of databases or which need offline/embedded storage, e.g. IoT devices, small websites, etc. When you try to use it for big projects that require intensive use of databases, e.g. online stores, you run into many problems, because the RDBMS is not as fully featured as MySQL or PostgreSQL. The primary problem is the lack of concurrency: only one connection can write to the database at a time, because write operations are serialised.
MySQL: Is one of the most popular options and my personal favourite (very easy to configure and use with Django). It's based on the client/server model rather than a single file like SQLite, and it is very scalable, i.e. it is capable of far more than SQLite and you can use it for many applications that make heavy use of the RDBMS. It has better security, allows concurrent operations, and often outperforms PostgreSQL for read-heavy workloads.
PostgreSQL: Is also a very strong option and capable of most of the things MySQL can do, but it handles clients differently and tends to have an edge over MySQL for complex SELECTs and concurrent INSERTs. MySQL is still much more widely used than PostgreSQL, though.
There are also many other options on the market. You can take a look at this article, which compares a bunch of them. But to answer your question: SQLite is much simpler than the other options and stores everything in a file in your project rather than on a server, so as a result there is little security, limited concurrency, etc. This is fine during development and for use cases that do not require heavy use of databases, but it will not cut it for big projects.
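As a hedged illustration of the "develop on SQLite, deploy on MySQL" approach mentioned above: in a Django project the switch is just a change to the DATABASES setting (this assumes the standard settings.py scaffold with BASE_DIR and os imported; names and credentials are placeholders).

    # settings.py -- development
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        }
    }

    # settings.py -- production (MySQL; a PostgreSQL backend is configured the same way)
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': 'shopdb',
            'USER': 'shopuser',
            'PASSWORD': 'change-me',
            'HOST': 'localhost',
            'PORT': '3306',
        }
    }

After changing the setting, running python manage.py migrate recreates the schema in the new database; the data itself has to be dumped and reloaded (e.g. with dumpdata/loaddata).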
This is not a matter of how big the DB is. An SQLite DB can be very big, hundreds of gigabytes.
It is a matter of how many users are using the application (you mention Django) concurrently. As SQLite only supports one writer at a time, the others are queued. Fortunately, you can have many concurrent readers.
So if you have a lot of concurrent accesses (that are not explicitly read-only), then SQLite is no longer a good choice. You'll prefer something like PostgreSQL.
BTW, everything is better explained in the documentation ;)
Selecting a database for your project is like selecting any other technology. It depends on your use case.
Size isn't the issue, complexity is. SQLite3 databases can grow as big as 281 terabytes. Limits on number of tables, columns & rows are also pretty decent.
If your application logic requires SQL operations like:
RIGHT OUTER JOIN, FULL OUTER JOIN
Most forms of ALTER TABLE, e.g. ADD CONSTRAINT (SQLite only supports RENAME and ADD COLUMN)
DELETE, INSERT, or UPDATE on a VIEW
Per-user read/write permissions (GRANT / REVOKE)
Then SQLite3 should not be your choice of database as these SQL features are not implemented in SQLite3.
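If the missing RIGHT OUTER JOIN is the only item in that list you need, a common workaround is to swap the tables and use a LEFT JOIN instead. A small sketch with the built-in sqlite3 module (the table and column names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
        INSERT INTO orders VALUES (10, 1);
    """)

    # "orders RIGHT JOIN customers" rewritten as "customers LEFT JOIN orders"
    rows = cur.execute("""
        SELECT customers.name, orders.id
        FROM customers LEFT JOIN orders ON orders.customer_id = customers.id
    """).fetchall()
    print(rows)  # e.g. [('Ada', 10), ('Bob', None)]
    conn.close()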
I am new to working with databases, and I have been given the task of combining data from two large databases as part of an internship program (heavily focused on the learning experience, but not many people at the job are familiar with databases). The options are either to create a new table or database, or to make a front-end that pulls data from both databases. Is it possible to just make a front-end for this? There is an issue of storage if a new database has to be created.
I'm still at the stage where I'm trying to figure out exactly how I'm going to go about doing this, and how to access the data in the first place. I have the table data for the two existing databases and I know which items need to be pulled from both. The end goal is a website where the user can input one of the values and get back all the information about that item. One of the databases is an Oracle SQL database and the other is a Cisco Prime database. I am planning to work in Python if possible. Any guidance on this would be very helpful!
Yes, it is perfectly OK to access both data sources from a single frontend.
Having said that, it might be a problem if you need to combine data from both data sources in large quantities, because you might have to reimplement some relational database operations such as join, sort, and group by yourself.
Python is perfectly capable of connecting to Oracle data sources. Not so sure about Cisco Prime (which is an unusual database).
I would recommend using Linux or Mac (not Windows) if you are new to Python, since both platforms are more Python-friendly than Windows.
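For the Oracle side, a minimal sketch with the cx_Oracle driver might look like the following; the host, SID, credentials, table, and column names are all assumptions for illustration.

    import cx_Oracle

    dsn = cx_Oracle.makedsn("db-host", 1521, "ORCL")
    conn = cx_Oracle.connect("report_user", "secret", dsn)

    cur = conn.cursor()
    # look up one item by the value the website user typed in
    cur.execute("SELECT item_id, item_name, location FROM inventory WHERE item_id = :id",
                id="ABC123")
    for row in cur:
        print(row)

    cur.close()
    conn.close()

Whatever mechanism you end up using for the Cisco Prime side, the front-end can then merge the two result sets in Python.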
I have collected a large Twitter dataset (>150 GB) that is stored in some text files. Currently I retrieve and manipulate the data using custom Python scripts, but I am wondering whether it would make sense to use a database technology to store and query this dataset, especially given its size. If anybody has experience handling Twitter datasets of this size, please share your experiences, especially if you have any suggestions as to which database technology to use and how long the import might take. Thank you
I recommend using a database for this, especially considering its size (this is without knowing anything about what the dataset holds). That said, for this and for future questions of this nature, I'd suggest the Software Recommendations site, plus adding more detail about what the dataset looks like.
As for suggesting a specific database, I recommend doing some research into what each one does, but for something that just holds data with no relations, any of them will do and could show a great query improvement over plain text files, since queries can be cached and data is faster to retrieve thanks to how databases store and look up records (whether via hashed values or whatever indexing they use).
Some popular databases:
MySQL, PostgreSQL - relational databases (simple, fast, and easy to use/set up, but they need some knowledge of SQL)
MongoDB - NoSQL database (also easy to use and set up, and no SQL needed; it relies more on dicts to access the DB through the API. It is also memory-mapped, so it can be faster than a relational database, but you need enough RAM for the indexes.)
ZODB - a pure-Python NoSQL database (kind of like MongoDB, but written in Python)
These are very light and brief explanations of each DB; be sure to do your research before using them, as they each have their pros and cons. Also, remember these are just a few of the many popular and highly used databases; there are also TinyDB, SQLite (which comes with Python), and pickleDB, which are pure Python but generally aimed at small applications.
My experience is mainly with PostgreSQL, TinyDB, and MongoDB, my favorites being MongoDB and PostgreSQL. For you, I'd look at either of those, but don't limit yourself: there's a slew of them, plus many drivers that help you write easier/less code if that's what you want. Remember, Google is your friend! And welcome to Stack Overflow!
Edit
If your dataset is and will remain fairly simple, just large, and you want to stick with text files, consider pandas together with a JSON or CSV format and library. It can greatly increase efficiency when querying and managing data like this from text files, with lower memory usage, as it doesn't always (or ever) need the entire dataset in memory. A sketch of the chunked approach is below.
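For instance, assuming the tweets are stored one JSON object per line (the usual output of the Twitter streaming API; the file name and the lang field are assumptions), pandas can stream the file in chunks rather than loading 150 GB at once:

    import pandas as pd

    total = 0
    for chunk in pd.read_json("tweets.jsonl", lines=True, chunksize=100000):
        # each chunk is an ordinary DataFrame, so normal pandas operations apply
        total += (chunk["lang"] == "en").sum()

    print("English tweets:", total)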
You can try using any NoSQL DB. MongoDB would be a good place to start.
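If you do go the MongoDB route, a minimal sketch of bulk-loading line-delimited tweet JSON with pymongo could look like this (the database, collection, and file names are placeholders):

    import json
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    tweets = client.twitter_data.tweets

    batch = []
    with open("tweets.jsonl") as f:
        for line in f:
            batch.append(json.loads(line))
            if len(batch) == 10000:          # insert in batches to keep memory bounded
                tweets.insert_many(batch)
                batch = []
    if batch:
        tweets.insert_many(batch)

    print(tweets.count_documents({}))

An index on the fields you query most (e.g. tweets.create_index("user.screen_name")) is what actually buys the speed-up over flat files.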
I am retrieving structured numerical data (floats with 2-3 decimal places) via HTTP requests from a server. The data comes in as sets of numbers which are then converted into an array/list. I want to store each set of data locally on my computer so that I can operate on it further.
Since there are very many of these data sets to collect, simply writing each incoming data set to a .txt file does not seem very efficient. On the other hand, I am aware that there are various solutions such as MongoDB, Python-to-SQL interfaces, etc., but I'm unsure which one I should use and which would be the most appropriate and efficient for this scenario.
Also, the database that is created must be able to be queried from different languages such as MATLAB.
If you just want to store it somewhere so MATLAB can work with it, pick one of the databases supported by MATLAB and then install the appropriate Python driver for that database.
Database drivers in Python follow a standard API (called the DB-API), so there is a uniform way of dealing with databases.
As you haven't told us how you intend to work with this data later on, it is difficult to provide any further specifics.
the idea is that I wish to essentially download all of the data onto my machine so that I can operate on it locally later (run analytics and perform certain mathematical operations on it) instead of having to call it from the server constantly.
For that purpose you can use any storage mechanism from text files to any of the databases supported by MATLAB - as all databases supported by MATLAB are supported by Python.
You can choose to store the data as "text" and then do the numeric calculations on the application side (i.e., the MATLAB side). Or you can choose to store the data as numbers/float/decimal (depending on the precision you need), which will allow you to do some calculations on the database side.
If you just want to store it as text and do calculations on the application side, then the easiest option is MongoDB, as it is schema-less. You would be storing the data as JSON, which may well be the format in which it is being retrieved from the web.
If you wish to take advantage of some math functions or other capabilities (for example, geospatial calculations), then a better choice is a traditional database that you are familiar with. You'll have to create a schema, define the datatypes for each of your incoming data objects, and then store them appropriately in order to take advantage of the database's query features.
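To illustrate the uniform DB-API pattern mentioned above, here is a tiny sketch with the built-in sqlite3 module and a numeric column; swapping in another driver (psycopg2, cx_Oracle, ...) keeps the same connect/cursor/execute/fetch shape, though the parameter placeholder style may differ. The table and file names are made up.

    import sqlite3

    conn = sqlite3.connect("readings.db")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS readings (ts TEXT, value REAL)")

    # store one incoming set of numbers
    incoming = [("2015-06-01T12:00:00", 3.142), ("2015-06-01T12:00:01", 2.718)]
    cur.executemany("INSERT INTO readings VALUES (?, ?)", incoming)
    conn.commit()

    # query it back, e.g. everything above a threshold
    cur.execute("SELECT value FROM readings WHERE value > ?", (3.0,))
    print(cur.fetchall())
    conn.close()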
I can recommend using a lightweight ORM like peewee, which can use a number of SQL databases as the storage backend. Then it becomes a matter of choosing the database you want. The simplest database to use is SQLite, but should you decide that's not fast enough, switching to another database like PostgreSQL or MySQL is trivial.
The advantage of the ORM is that you can use Python syntax to interact with the SQL database and don't have to learn any SQL.
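A minimal sketch of that idea with peewee, using made-up model and field names for the incoming float data:

    import datetime
    from peewee import SqliteDatabase, Model, FloatField, DateTimeField

    db = SqliteDatabase("measurements.db")

    class Reading(Model):
        value = FloatField()
        fetched_at = DateTimeField(default=datetime.datetime.now)

        class Meta:
            database = db

    db.connect()
    db.create_tables([Reading])

    # store one incoming set of numbers
    Reading.insert_many([{"value": v} for v in (1.23, 4.56, 7.89)]).execute()

    # query back using Python syntax instead of SQL
    for r in Reading.select().where(Reading.value > 2):
        print(r.value, r.fetched_at)

Switching to PostgreSQL later would mostly mean replacing SqliteDatabase with peewee's PostgresqlDatabase.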
Have you considered HDF5? It's very efficient for numerical data and is supported by both Python and MATLAB.
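A short sketch using h5py (the file name, dataset name, and data shape are assumptions); MATLAB can read the same file with h5read('readings.h5', '/values').

    import h5py
    import numpy as np

    values = np.round(np.random.rand(1000) * 100, 2)  # stand-in for the fetched floats

    with h5py.File("readings.h5", "w") as f:
        f.create_dataset("values", data=values, compression="gzip")

    with h5py.File("readings.h5", "r") as f:
        print(f["values"][:10])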
Could anyone shed some light on how to migrate my MongoDB to PostgreSQL? What tools do I need, and how do I handle primary keys, foreign key relationships, etc.?
I had MongoDB set up with Django, but would like to convert it back to PostgreSQL.
Whether the migration is easy or hard depends on a very large number of things including how many different versions of data structures you have to accommodate. In general you will find it a lot easier if you approach this in stages:
1. Ensure that all the Mongo data is consistent in structure with your RDBMS model and that the data structure versions are all the same.
2. Move your data. Expect that problems will be found and you will have to go back to step 1.
The primary problems you can expect are data validation problems because you are moving from a less structured data platform to a more structured one.
Depending on what you are doing regarding MapReduce you may have some work there as well.
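A rough sketch of step 2 (moving the data), assuming a simple, already-consistent document shape; the collection, table, and column names here are invented for illustration:

    import psycopg2
    from pymongo import MongoClient

    mongo = MongoClient("mongodb://localhost:27017")
    pg = psycopg2.connect(dbname="mydjangodb", user="me")

    with pg, pg.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS articles (
                id     serial PRIMARY KEY,
                title  text NOT NULL,
                body   text,
                author text
            )
        """)
        for doc in mongo.blog.articles.find():
            # this is where the validation problems surface: missing keys, wrong types, ...
            cur.execute(
                "INSERT INTO articles (title, body, author) VALUES (%s, %s, %s)",
                (doc["title"], doc.get("body"), doc.get("author")),
            )

    pg.close()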
In the meantime, a Postgres Foreign Data Wrapper for MongoDB has emerged (for PostgreSQL 9.1-9.4). With it, one can set up a view into MongoDB from PostgreSQL and then handle the data as SQL.
This would probably also make copying the data rather easy.
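A hedged sketch of that route, driven from Python via psycopg2 (the same SQL could be pasted into psql); the extension must already be installed on the server, and the server, table, and field names below are assumptions based on the mongo_fdw documentation.

    import psycopg2

    conn = psycopg2.connect(dbname="mydjangodb", user="postgres")

    with conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS mongo_fdw")
        cur.execute("""
            CREATE SERVER mongo_server
                FOREIGN DATA WRAPPER mongo_fdw
                OPTIONS (address '127.0.0.1', port '27017')
        """)
        cur.execute("CREATE USER MAPPING FOR postgres SERVER mongo_server")
        cur.execute("""
            CREATE FOREIGN TABLE mongo_articles (
                _id    NAME,
                title  text,
                author text
            )
            SERVER mongo_server
            OPTIONS (database 'blog', collection 'articles')
        """)
        # now it reads like any other table
        cur.execute("SELECT title, author FROM mongo_articles LIMIT 5")
        print(cur.fetchall())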
Limitations of FDW that I have faced:
objects within arrays (in MongoDB) do not seem to be addressable
objects with dynamic key names do not seem to be addressable
I know it's 2015 now. :)