I'll be setting up a webapp with Flask on an old Raspberry Pi B+ running Raspbian. The Pi will also handle the desktop stuff, so I'll try to keep it as light as possible.
The point of this question is mainly 1) which DB should I use? But I'm also wondering 2) would keeping it on an external USB stick help? Let's take it step by step.
Which DB: points to consider
I'd rather do the programming with SQLAlchemy, so that restricts the options
The schema is not complex (around 10 tables)
Only one local user at first, probably forever, so only a few queries and connections
Low overhead; the Pi will most likely struggle regardless, I'm just trying to minimize it
The second point is about SD card burnout. I read somewhere that any DB hits SD cards pretty hard, and it got me thinking.
I'll set up some kind of external backup for this DB anyway, but should I also keep the database itself on a USB stick (i.e. point the path there)? That would be really simple if I choose SQLite.
TYA
SQLite sounds like a perfect fit for this sort of use case with embedded systems, where you need a lightweight yet full-featured database. Many folks use SQLite databases on mobile devices for the same reason: fairly limited CPU/memory resources, and simple storage as a single file.
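If you go that route with SQLAlchemy, pointing the database at a file on the mounted stick is just a matter of the connection URI. Here is a minimal sketch using Flask-SQLAlchemy; the mount point /mnt/usb, the file name app.db, and the Note model are assumptions for illustration:

```python
# Minimal sketch: Flask + SQLAlchemy backed by a SQLite file on an external stick.
# The mount point /mnt/usb and the file name app.db are assumptions; adjust to your setup.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Absolute paths need four slashes after "sqlite:".
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:////mnt/usb/app.db"
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False  # avoid extra overhead on the Pi

db = SQLAlchemy(app)

class Note(db.Model):
    # Example table; replace with your own ~10-table schema.
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.String(200), nullable=False)

with app.app_context():
    db.create_all()  # creates the file on the stick on first run
```

Swapping the stick for the SD card later is then a one-line config change, and backups reduce to copying that single file.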
I work as a data scientist in a very small company, so the whole DS team is very "young" in this field.
We are currently experiencing issues with collaboration, specifically when writing code together.
We've tried VS Code Live Share, which is a great extension, but due to our PCs' limitations it becomes hard to use when we are working with big DataFrames.
I was looking at Deepnote, which sounds really great, but it has no support for MS SQL Server.
So, any alternatives? We're also thinking about a cloud migration, to Azure or AWS, but I was unable to find a proper way to do it, or whether we could co-edit in real time there.
Any help or advice?
I have a question on the general strategy of how to integrate data into an MSSQL database.
Currently, I use Python for my whole ETL process: I use it to clean, transform, and integrate the data into an MSSQL database. My data is small, so I think this process works fine for now.
However, I think it's a little awkward for my code to constantly read data from and write data to the database. I think this strategy will become an issue once I'm dealing with large amounts of data, and the constant read/write seems very inefficient. However, I don't know enough to tell whether this is a real problem or not.
I want to know if this is a feasible approach, or whether I should switch entirely to SSIS to handle it. SSIS to me is clunky, and I'd prefer not to rewrite my entire codebase. Any input on the general ETL architecture would be very helpful.
Is this practice alright? Maybe?
There are too many factors to give a definitive answer. Conceptually, what you're doing - extract data from a source, transform it, load it to a destination (ETL) - is all that SSIS does. It likely can do some things more efficiently than Python; at least, I've had a devil of a time getting a bulk load to work with memory-mapped data, whereas dumping to disk and bulk-inserting that via Python is no problem. But if the existing process works, then let it go until it doesn't work.
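For what it's worth, here is roughly what the "dump to disk, then bulk insert" pattern can look like from Python with pyodbc; the server, table, and file path below are placeholders, and the file has to live somewhere the SQL Server instance can read:

```python
# Sketch of the "dump to disk, then bulk insert" pattern with pyodbc.
# Server, database, table, and file path are placeholders.
import csv
import pyodbc

rows = [("2024-01-01", 42), ("2024-01-02", 17)]  # pretend this is the transformed data

# 1) Dump the transformed rows to a flat file the SQL Server machine can reach.
with open(r"\\fileshare\staging\facts.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# 2) Ask SQL Server to bulk-load the whole file in one statement.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
conn.autocommit = True
conn.execute(r"""
    BULK INSERT dbo.Facts
    FROM '\\fileshare\staging\facts.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
""")
conn.close()
```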
If your team knows Python, introducing SSIS just to do ETL is likely going to be a bigger maintenance cost than scaling up your existing approach. On the other hand, if it's standard-ish Python + libraries and you're on SQL Server 2017+, you might be able to execute your scripts from within the database itself via sp_execute_external_script.
If the ETL process runs on the same box as the database, then ensure you have sufficient resources to support both processes at their maximum observed levels of activity. If the ETL runs elsewhere, then you'll want to ensure you have fast, full duplex connectivity between the database server and the processing box.
Stand up a load testing environment that parallels production's resources. Dummy up a 10x increase in source data and observe how the ETL fares. 100x, 1000x. At some point, you'll identify what development sins you committed that do not scale and then you're poised to ask a really good, detailed question describing the current architecture, the specific code that does not perform well under load and how one can reproduce this load.
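A crude way to do that scale-up without any new tooling is to duplicate the source rows by some factor and time whatever entry point your pipeline already exposes; run_etl and rows below are placeholders for your own code and data:

```python
# Sketch: multiply the source data by N and time the existing ETL entry point.
# run_etl and rows are placeholders for your own pipeline function and source rows.
import time

def scale_test(run_etl, rows, factors=(1, 10, 100, 1000)):
    for factor in factors:
        scaled = rows * factor                      # naive duplication of the source rows
        start = time.time()
        run_etl(scaled)
        elapsed = time.time() - start
        print(f"{factor:>5}x rows ({len(scaled):,}): {elapsed:.1f}s")
```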
The above design considerations will hold true for Python, SSIS or any other ETL solution - prepackaged or bespoke.
My basic problem is that I am trying to have two Python programs run simultaneously and have access to the same database table. I feel like this should have a simple solution, but it has passed me by so far.
All my attempts at this have caused the database (SQLite) to become locked and the program to fall over.
I have tried being clever with the timing of how the programs run, so that as one program opens the connection the other closes it, copying data from one database to another, etc., but this just gets horrible and messy very quickly. Also, a big goal in my design is to keep latency to an absolute minimum.
The basic structure is pictured below.
I should add too that program one ('always running and adding to the database') operates on a millisecond timeframe.
Program two can be in the multiple seconds range. Obviously none of my solutions have been able to come close to that.
Any help, steps in the right direction or links to further reading is greatly appreciated!
Cheers
Although your title mentions MySQL, in your question you are only using SQLite. Now, SQLite is a perfectly capable database if you only have a single process accessing it, but it is not good for multiple simultaneous accesses. This is exactly where you need a proper client/server database - like MySQL.
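As a rough illustration of that split, each program simply keeps its own connection to the shared MySQL table; the host, credentials, and readings schema below are placeholders:

```python
# Sketch: both programs open their own connection to the same MySQL table.
# Host, credentials, and the readings table are placeholders.
import mysql.connector

def get_conn():
    return mysql.connector.connect(
        host="localhost", user="app", password="secret", database="sensors"
    )

# Program one (runs every few milliseconds): insert a new reading.
def write_reading(value):
    conn = get_conn()
    cur = conn.cursor()
    cur.execute("INSERT INTO readings (value) VALUES (%s)", (value,))
    conn.commit()
    conn.close()

# Program two (runs every few seconds): read the latest rows.
def read_latest(n=10):
    conn = get_conn()
    cur = conn.cursor()
    cur.execute("SELECT id, value FROM readings ORDER BY id DESC LIMIT %s", (n,))
    rows = cur.fetchall()
    conn.close()
    return rows
```

The server handles the concurrent access for you, so neither program ever has to wait for the other to release a file lock.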
So I have tried to find an answer, but I must not be searching correctly, or what I'm trying to do is the wrong way to go about it.
I have a simple Python script that creates a chess board and pieces in a command-line environment. You can input commands to move the pieces. One of my co-workers thought it would be cool to play each other over the network. I agreed and tried it by creating a text file to read and write to on a network share; then we would both run the script that reads that file. The problem I ran into is that I pretty much DoS-attacked that file share, since the script kept polling the file for updates.
I am still new to Python and haven't ever written code that travels the internet, or even a simple local network. So my question is: how should I go about properly letting 2 people access this data at the same time without hogging all the network resources?
Oh, also, I'm using version 2.6 because that's what everyone else uses and they refuse to change to newer syntax.
You need to do the networking properly, with sockets. It's not that hard for a simple networked program like yours.
Use the socket module from Python's stdlib: http://docs.python.org/library/socket.html (also take a look at the examples at the bottom of the page).
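A minimal sketch of what that could look like in Python 2.6 syntax: one player hosts, the other connects, and each move is exchanged as a short newline-terminated string (the port and the move format are arbitrary choices here):

```python
# Minimal socket sketch (Python 2.6 syntax). One player hosts, the other connects;
# each move travels as a newline-terminated string. The port is an arbitrary choice.
import socket

PORT = 5050

def host_game():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", PORT))
    srv.listen(1)
    conn, addr = srv.accept()          # block until the opponent connects
    print "opponent connected from", addr[0]
    return conn

def join_game(server_ip):
    conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    conn.connect((server_ip, PORT))
    return conn

def send_move(conn, move):
    conn.sendall(move + "\n")          # e.g. "e2e4"

def recv_move(conn):
    return conn.recv(64).strip()       # blocks until the opponent's move arrives
```

No polling, no shared file: each side just blocks until the other side's move arrives.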
First off, without knowing how many times you are checking the file with the moves, it is difficult to know why the file share is getting DoS-ed. Most networks and network shares these days can handle that level of traffic - they are all gigabit Ethernet - so unless you are transferring large chunks of data each time, you should be OK. If you are transferring the whole file each time, then I'd suggest you look at optimizing that.
That said, coming to your second question on how this is handled at a network level: to be honest, you are already doing it in a certain way - you are accessing a file on a network share and modifying it. The only optimization required is to do it efficiently. Networked operations in a concurrent world do the same thing; they just do it with a fast in-memory database storing the changes, a high-scale RDBMS, or, in the case of fast-serving web servers, better async I/O.
In the current case, since there are only two users playing the game, I suggest that you work on a way to transmit only the difference in moves each time over the network. So, instead of modifying the file over the network share, you can send each move to a server component and have it synchronize the changes to the file locally. Of course, this means you will need to create a server component that would do something like this:
user1's moves <--> server <--> user2's moves. The server will modify the moves file.
Once you start doing this, you get into the realm of server programming / preventing race conditions etc. It will be a good learning experience.
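If you go down that road, a very rough sketch of the relay could look like the following; the port, the moves file, and the fixed two-player turn order are assumptions just to show the shape of it:

```python
# Rough sketch of the relay idea: the server accepts two players and forwards
# each move to the other side, appending every move to a local file.
import socket

def run_relay(port=5051, moves_path="moves.txt"):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", port))
    srv.listen(2)
    players = [srv.accept()[0] for _ in range(2)]   # wait for both players to connect

    log = open(moves_path, "a")
    turn = 0                                        # player 0 moves first
    while True:
        move = players[turn].recv(64).strip()
        if not move:
            break                                   # a player disconnected
        log.write(move + "\n")
        log.flush()
        players[1 - turn].sendall(move + "\n")      # forward the move to the opponent
        turn = 1 - turn
    log.close()
```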
We have begun upgrading hardware and software to a 64-bit architecture using Apache with mod_jk and four Tomcat servers (the new hardware). We need to be able to test this equipment with a large number of simultaneous connections while still actually doing things in the app (logging in, etc.)
I currently am using Python with the Mechanize library to do this, but it's just not cutting it. Threading is not "real" in Python (the GIL keeps only one thread executing Python code at a time), and multiprocessing makes the local box work harder than the machines we are trying to test, since it has to load so much into memory for Mechanize.
The bottom line is that I need something that will really hammer this thing's connections and hold a session to make sure that the sticky sessions are working in mod_jk. I need to be able to code it quickly, it needs to be lightweight, and being able to do true multithreading would be a perk. Other than that, I am open-minded.
Any input will be greatly appreciated. Thanks.
Open Source Testing Tools
Not knowing the full requirements makes it difficult, however something from the list might fit the bill.
In order to accomplish what I wanted to do, I just went back to basics. Mechanize is somewhat bulky, and there was a lot of bloat involved in the main functionality tests I had before. So I started with a clean slate and just used cookielib.CookieJar and urllib2 to build a linear test, and then ran it in a while 1 loop. This provided enough strain on the Apache setup to see how it would react in the new environment, and for the record, it did VERY well.
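For anyone after the same thing, the stripped-down loop is roughly the following; the URLs and form fields are placeholders for whatever your app actually exposes:

```python
# Sketch of the stripped-down approach: a cookie-aware opener that logs in once,
# then re-uses the same session while hammering app pages in a loop.
# URLs and form fields are placeholders.
import cookielib
import urllib
import urllib2

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# Log in once so the session cookie (and the mod_jk sticky-session routing) is kept.
login_data = urllib.urlencode({"username": "testuser", "password": "secret"})
opener.open("http://target-host/app/login", login_data)

while 1:
    # Every request re-uses the same cookies, which exercises the sticky sessions.
    opener.open("http://target-host/app/dashboard").read()
    opener.open("http://target-host/app/report?id=1").read()
```

Running several copies of that script in parallel (one process per simulated user) is what actually generated the load.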