I tend to start projects that are far beyond what I am capable of doing; whether that is a bad habit or a good way to force myself to learn, I don't know. Anyway, this project uses a PostgreSQL database, Python, and SQLAlchemy, and I am slowly learning everything from SQL to SQLAlchemy to Python itself. I have started to figure out models and the declarative approach, but I am wondering: what is the easiest way to populate the database with data that needs to be there from the beginning, such as an admin user for my project? How is this usually done?
Edit:
Perhaps this question was worded badly. What I wanted to know was the possible ways to insert initial data into my database. I tried using SQLAlchemy and checking whether each item already existed, inserting it if it did not. That seemed tedious and can't be the way to go if there is a lot of initial data. I am a beginner at this, and what better way to learn than to ask the people who do it regularly? Perhaps not a good fit for a Stack Overflow question, sorry.
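The check-then-insert approach described above can at least be factored into a small helper so the seed script stays short and is safe to re-run. A minimal sketch, where the `User` model, its module, and the connection URL are all hypothetical:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

from myapp.models import Base, User  # hypothetical declarative model

def get_or_create(session, model, **kwargs):
    # Look the row up first and only insert if it is missing,
    # so running the seed script twice does not duplicate data.
    instance = session.query(model).filter_by(**kwargs).first()
    if instance is None:
        instance = model(**kwargs)
        session.add(instance)
    return instance

engine = create_engine("postgresql://user:pass@localhost/mydb")  # placeholder URL
Base.metadata.create_all(engine)

with Session(engine) as session:
    get_or_create(session, User, name="admin")
    session.commit()
```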
You could use a schema change management tool like Liquibase. Normally it is used to keep your database schema in source control and to apply patches that update the schema.
You can also use Liquibase to load data from CSV files. So you could add a startup.csv file to your Liquibase changelog that is run the first time you run Liquibase against your database. You can also have it run every time, merging the data in the CSV into the database.
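If you would rather stay in Python than adopt a separate tool, the same load-from-CSV idea is only a few lines with the standard csv module and SQLAlchemy Core. A rough sketch, assuming a startup.csv whose header row matches the columns of an existing users table; the file name, table name, and connection URL are all assumptions:

```python
import csv
from sqlalchemy import create_engine, MetaData, Table

engine = create_engine("postgresql://user:pass@localhost/mydb")  # placeholder URL
metadata = MetaData()
users = Table("users", metadata, autoload_with=engine)  # reflect the existing table

with open("startup.csv", newline="") as f, engine.begin() as conn:
    # Each CSV row becomes one INSERT; the header row supplies the column names.
    # Note: DictReader yields strings, which is fine for text columns but may
    # need casting for numeric ones.
    conn.execute(users.insert(), list(csv.DictReader(f)))
```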
I'm building my first web app (hooray!). It's a very basic app that will take form input and spit it out in a script file for copying and pasting. I'm new to all of this: do I need a database for something this simple? I'm building it in Flask, but I know enough JavaScript to use that as well.
I played around with localStorage for a similar project, but have read that it can be a security risk. I've done some tutorials that used SQLAlchemy and SQLite, but even those seem unnecessary for something this small.
Really appreciate any help on this - excited to get this sorted out!
Unless you need to persist something, you don't need any database connection. You are just taking input from a form and saving it to a script or text file; a database does not come into the picture at all.
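A minimal sketch of that idea in Flask, since the question mentions it; the field name and output file are made up:

```python
from flask import Flask, request

app = Flask(__name__)

FORM = """
<form method="post">
  <textarea name="code"></textarea>
  <button type="submit">Generate</button>
</form>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Write the submitted text straight to a file: no database involved.
        with open("script.txt", "w") as f:
            f.write(request.form["code"])
        return "Saved to script.txt"
    return FORM

if __name__ == "__main__":
    app.run(debug=True)
```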
I am working on a project that needs to store a considerable amount of data, and I was wondering what the difference is between using SQL and the datascience library in Python. I intend to access SQL through its Python libraries, or to store the info in a CSV file if I go with "datascience". I am leaning heavily towards "datascience" for the following reasons:
It is subjectively very easy for me to use; I make far fewer mistakes with it.
With my limited knowledge of runtime performance, I think the datascience library will be more efficient.
Most importantly, it has many built-in functions that make it easier for me to write my own.
However, since so many people use SQL, I was wondering if I am missing something major, particularly around scalability.
Some people online said that SQL lets us store files in a database, but I do not see how that makes a difference: I can simply store the file in a folder on the system and save its path in the "datascience" table.
The "datascience library" is only intended to be a tool for teaching basic concepts in an academic entry level class. Unless you are taking such a class, you should ignore it and learn more standard tools.
If it helps you, you can learn Data Science using Pandas starting just from flat data files, such as CSV and JSON. You will absolutely need to learn to interface with SQL and NoSQL servers eventually. The advantages of a database over flat files are numerous and well described elsewhere.
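As a rough illustration of the flat-file route, here is a minimal Pandas sketch; the file and column names are invented:

```python
import pandas as pd

# Load a flat CSV file into a DataFrame (hypothetical file and columns).
df = pd.read_csv("measurements.csv")

# The kind of work SQL would do with WHERE and GROUP BY:
recent = df[df["year"] >= 2020]
summary = recent.groupby("station")["temperature"].mean()

summary.to_csv("summary.csv")
```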
It's up to you whether you want to learn Pandas first and SQL second, or SQL first. Many people in the real world would have learned SQL before Python/Pandas/Data Science, so you may want to go that route.
If you go ahead and study that datascience library, you will learn some concepts, but will then have to re-learn everything in there "for real." Maybe this is best for your learning style, maybe it isn't. We don't know you well enough. Do you want academic hand holding or do you want to do things the real way?
Good luck and enjoy your journey.
I'm currently building a Scrapy project that can crawl any website from the first depth to the last. I don't extract much data, but I store the full HTML of every page (response.body) in a database.
I am currently using Elasticsearch with the bulk API to store the raw HTML.
I had a look at Cassandra, but I did not find an equivalent of the Elasticsearch bulk API, and that hurts the performance of my spider.
I am interested in performance, and was wondering whether Elasticsearch is a good choice, and whether there is a more appropriate NoSQL database.
That very much depends on what you are planning to do with the scraped data later on.
Elasticsearch does some complex indexing work on insertion that makes subsequent searches in the database quite fast, but this costs processing time and introduces latency.
So, to answer your question of whether Elasticsearch is a good choice:
If you plan to build some kind of search engine later on, Elasticsearch is a good choice (as the name indicates). But you should take a close look at the configuration of Elasticsearch's indexing options to make sure it works the way you need it to.
If, on the other hand, you just want to store the data and run processing tasks on it later, Elasticsearch is a poor choice and you would be better off with Cassandra or another NoSQL database.
Which NoSQL database suits your needs best depends, again, on the actual usage scenario.
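For completeness, bulk indexing from Python with the refresh interval relaxed during heavy writes looks roughly like this; the index name and document shape are assumptions, and the keyword arguments follow the elasticsearch-py 8.x client (older versions take body= instead of settings=):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local node

# Pause near-real-time refresh while the spider is writing heavily;
# this trades search freshness for faster bulk indexing.
es.indices.put_settings(index="pages", settings={"index": {"refresh_interval": "-1"}})

# Stand-in for the spider's output: (url, raw HTML) pairs.
pages = [("https://example.com", "<html>...</html>")]

helpers.bulk(
    es,
    ({"_index": "pages", "_source": {"url": url, "html": html}} for url, html in pages),
)

# Restore refresh so the indexed data becomes searchable again.
es.indices.put_settings(index="pages", settings={"index": {"refresh_interval": "1s"}})
```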
So I have made a project website with a couple of HTML files, some CSS, and some JavaScript. I was thinking of adding login functionality and a few other things like that, so I set off to find out what would be good for it. It is hard for me to decide which one to use. I looked at Django in Python, but I would have to make a whole new project and start from scratch. I was just hoping to add a database to this existing site somehow; I am only doing some lightweight things. What should I use? Thank you.
Well, if you are not going to use Python with Django for this, you could use PHP and MySQL; I do it all the time for small sites. Basically, go to your hosting panel, create a new DB, use PHP to write a connection script, and go from there as far as adding to and using the database. Every web host I know of supports this. If you go with Django, you can simply use the SQLite3 DB, or something like MongoDB, or really any other DB out there. Please look up the exact documentation for whichever route you choose.
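For the Django route, the SQLite choice is just the framework's default configuration; this is roughly what a freshly generated settings.py already contains, so there is nothing extra to install:

```python
# settings.py: Django's default database configuration uses SQLite,
# a single file next to the project, with no database server to administer.
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    }
}
```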
I have made a Django app to display some stuff; its content comes from my web scraping script, so the data keeps growing.
There are a few methods to update the data in the database:
I know I can run a crontab job that calls a Python script to read a file and then uses python manage.py shell to update the data, but I am afraid I cannot easily capture logs and handle exceptions that way.
Use Celery to run in the background and write the data to the DB.
Use database tools (not under consideration).
Is there a better method for adding incremental data when the data is very big? Any of them is welcome.
Thanks a lot.
Whether to use Celery or crontab in this particular case is quite an opinionated choice; both solutions should work fine.
Celery is far more powerful in general and (once configured) will let you do more, such as running async tasks. But to get that, you will have to configure it and then manage it. It's not hard, but you will have to spend some time getting it to work.
I often use cron for simple projects because it's much simpler and doesn't require any additional effort. It seems like it would be sufficient in your case too.
At last, answering your question :) if you prefer cron, you should use custom management commands: https://docs.djangoproject.com/en/1.8/howto/custom-management-commands/. That way you don't have to write .sh scripts. The crontab entry would then look like this:
0 0 * * * /path/to/venv/bin/python /path/to/project/manage.py your_custom_command_name
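A minimal sketch of such a command; the app name, model, and file layout are placeholders:

```python
# myapp/management/commands/import_scraped.py
# (per the Django docs, the management/ and commands/ directories
# each contain an __init__.py so the command is discovered)
from django.core.management.base import BaseCommand

from myapp.models import Article  # hypothetical model fed by the scraper

class Command(BaseCommand):
    help = "Import newly scraped items into the database"

    def handle(self, *args, **options):
        imported = 0
        for item in self.read_new_items():
            # update_or_create keeps repeated runs from duplicating rows.
            Article.objects.update_or_create(
                url=item["url"], defaults={"html": item["html"]}
            )
            imported += 1
        # Anything written via self.stdout ends up in cron's captured
        # output, which addresses the logging concern in the question.
        self.stdout.write(self.style.SUCCESS(f"Imported {imported} items"))

    def read_new_items(self):
        # Placeholder for reading the scraper's output file.
        return []
```

You would then run it as python manage.py import_scraped, by hand or from the crontab line above, and wrap the body of handle() in try/except to handle failures however you like.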