efficient database file trees [closed] - python

So I was making a simple chat app with Python. I want to store user-specific data in a database, but I'm unfamiliar with what makes storage efficient. I want to store usernames, public RSA keys, missed messages, missed group messages, URLs to profile pics, etc.
There are a couple of things in there that would have to be grabbed pretty often, like missed messages, profile pics, and a couple of hashes. So here's the question: which storage layout would be fastest while staying memory efficient? I want it to be able to handle around 10k users (like that's ever gonna happen).
Here are some layouts I thought of:
everything in one file (might be bad for memory, and takes time to load, which matters because I would need to reload it after every change)
separate files per user (slower, but memory efficient)
separate files per data value
a directory for each user, with separate files for each value
Thanks, and try to keep it objective so this isn't instantly closed!

The only answer possible at this point is 'try it and see'.
I would start with MySQL (mostly because it's the 'lowest common denominator', freely available everywhere); it should do everything you need up to several thousand users, and if you get that far you should have a far better idea of what you need and where the bottlenecks are.
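To make that concrete, here is a minimal sketch of the kind of schema the question describes. The table and column names are placeholders, and the standard-library sqlite3 module stands in for MySQL so the snippet is self-contained; the DDL carries over to MySQL with little change.

```python
# Minimal sketch only: table/column names are illustrative, sqlite3 stands in for MySQL.
import sqlite3

conn = sqlite3.connect("chat.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS users (
    id          INTEGER PRIMARY KEY,
    username    TEXT UNIQUE NOT NULL,
    rsa_pubkey  TEXT NOT NULL,          -- PEM-encoded public key
    avatar_url  TEXT
);

CREATE TABLE IF NOT EXISTS missed_messages (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(id),
    group_id  INTEGER,                  -- NULL for direct messages
    body      TEXT NOT NULL,
    sent_at   TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Index the column that the frequent "grab missed messages" query filters on.
CREATE INDEX IF NOT EXISTS idx_missed_user ON missed_messages(user_id);
""")

# Fetching one user's missed messages is then a single indexed query,
# rather than re-reading a whole file after every change.
rows = conn.execute(
    "SELECT body FROM missed_messages WHERE user_id = ?", (1,)
).fetchall()
```

The point of the sketch is that a database turns the "one big file vs. many small files" trade-off into an indexing problem, which scales to 10k users without any special effort.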

Related

Is it better to have one large file or many smaller files for data storage? [closed]

I have a C++ game which sends a Python-SocketIO request to a server, which loads the requested JSON data into memory for reference and then sends portions of it to the client as necessary. Most of the previous answers here assume the server has to repeatedly search the database, whereas in this case all of the data is kept in memory after the first read and released when the client disconnects.
I don't want a large spike in memory usage whenever a new client joins; however, most of what I have seen points away from small files (50-100 kB absolute maximum) and toward large files, which would cause exactly the memory usage I'm trying to avoid.
My question is this: would it still be beneficial to use one large file, or should I use smaller files, both from an organizational standpoint and from a performance one?
Both can potentially be better. Each has its advantages and disadvantages, and which is better depends on the details of the use case. It's quite possible that the best option is something in between, such as a few medium-sized files.
Regarding performance, the most accurate way to find out what is best is to try each approach and measure.
If you're only accessing small parts of the data at a time, split it into multiple files to keep memory usage down. For example, if you only need one player at a time, your folder structure could look like this:
players
- 0.json
- 1.json
other
- 0.json
Then you could write a function that fetches the player with a given id (0, 1, etc.), as sketched below.
If you plan to access all of the players, other objects, and more at once, keep the same folder structure and just merge the parts you need into one object in memory.
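A minimal sketch of that per-player layout, assuming the data/players/<id>.json structure shown above (the get_player and get_all_players helpers are hypothetical names, not from the question):

```python
import json
from pathlib import Path

DATA_DIR = Path("data")

def get_player(player_id):
    """Load a single player's JSON file, e.g. data/players/0.json."""
    path = DATA_DIR / "players" / f"{player_id}.json"
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)

def get_all_players():
    """Merge every per-player file into one in-memory dict keyed by id."""
    players = {}
    for path in (DATA_DIR / "players").glob("*.json"):
        players[int(path.stem)] = json.loads(path.read_text(encoding="utf-8"))
    return players
```

With this layout, a client that only ever touches one player costs you one small file's worth of memory, while the bulk loader is still a few lines when you do need everything.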

Web scraping a forum [closed]

I'm retrieving data from certain threads of a forum, and my concern is how to store it. I want to be able to plot as much information as I want, so I don't want to store everything in a rigid structure; I want to be able to use as much info as I can (most active time zones, most active time zones per user, keywords over the years, points per poster, etc.).
How should I store this? A tree with the upper nodes being pages and the lower nodes being posts? How do I store that tree in a way that is easy* to read?
* easy as in encapsulated in a format I could export easily to other stuff.
I suggest scraping only the posts (why would you ever need the pages?) into JSON, which you can keep in a PostgreSQL jsonb column; it lets you query the JSON flexibly.
Later you'd write one or more scripts that iterate over the posts and do useful work: cleaning the data, normalizing values, aggregating stats, and so on.
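A hedged sketch of the jsonb approach using psycopg2; the posts table, connection string, and post fields are assumptions for illustration only:

```python
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=forum")  # assumed local database name
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS posts (
            id   SERIAL PRIMARY KEY,
            data JSONB NOT NULL
        )
    """)
    # One scraped post, stored as-is with whatever fields the scraper found.
    post = {"author": "alice", "posted_at": "2017-03-01T12:00:00Z",
            "body": "Hello world", "points": 3}
    cur.execute("INSERT INTO posts (data) VALUES (%s)", (Json(post),))

    # jsonb lets you query inside the document, e.g. all posts by one author.
    cur.execute("SELECT data->>'body' FROM posts WHERE data->>'author' = %s",
                ("alice",))
    print(cur.fetchall())
```

The ->> operator extracts a JSON field as text, which is what makes ad-hoc queries over loosely structured scraped data practical without deciding on a rigid schema up front.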
See also
Someone wrote a post about PostgreSQL and querying JSON

Should I create a new file for save data with python? [closed]

It may seem like a stupid question, but I really can't find information about this on Google.
I am trying to develop a client-server application in Python, and I am looking for the correct way to save data on the client's computer.
When the client clicks the "Register" button, I want his computer to save the information so he can auto-login the next time he opens the program.
Should I create a new file, save the data in it on the computer, and then load and read it again later? I really don't know if this is the correct way.
There are different approaches to this problem. You could save the credentials/token/... to the local disk, but keep in mind that in some cases this might be considered a security risk. If you do so, you should at least store it under the user's home folder to keep it away from other (non-admin/root) users.
You could also store it encrypted with a "master password" (like Firefox does if you enable it).
Or you could connect to a third-party authentication server and store the information there. It all depends on the use case you are implementing and the complexity required.
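As a small sketch of the first option (saving under the user's home folder), assuming a hypothetical ~/.mychatapp/credentials.json file; this is not a substitute for encryption or a third-party auth server:

```python
import json
import os
from pathlib import Path

# Hypothetical location; pick a folder name that matches your app.
TOKEN_FILE = Path.home() / ".mychatapp" / "credentials.json"

def save_token(username, token):
    """Persist the login token after the user clicks Register."""
    TOKEN_FILE.parent.mkdir(parents=True, exist_ok=True)
    TOKEN_FILE.write_text(json.dumps({"username": username, "token": token}))
    os.chmod(TOKEN_FILE, 0o600)  # restrict to the current user (limited effect on Windows)

def load_token():
    """Return the saved credentials, or None if the user never registered."""
    if TOKEN_FILE.exists():
        return json.loads(TOKEN_FILE.read_text())
    return None
```

On startup the client calls load_token(); if it returns something, you attempt the auto-login, otherwise you show the Register screen.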

Python R/W to text file network [closed]

What could happen if multiple users run copies of the same Python script, which reads and writes data to a single text file stored on a network device, at the same time?
Will the processes stop working?
If so, what could be the solution?
Many bad things can happen. I don't think the processes will stop working, at least not because of concurrent access to the file, but what you can get is an inconsistent file: for example, if one process writes hello while another is writing at the same time, you might end up with a line like hhelllolo.
One solution is to use a database, as suggested; another is to create a mechanism that locks the file against concurrent access (which can be cumbersome because you're working over a network, not on the same computer). A rough sketch of the lock idea is below.
Another option is a simple server-side script that handles the requests and serializes access to the file. That is almost the same as using a database, except you'd be building a storage system from scratch, so why bother :)
Hope this helps!
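To illustrate the locking idea, here is a rough best-effort sketch using an exclusive lock file; the atomicity it relies on is not guaranteed on every network filesystem, which is exactly why the database or server-side approach is usually preferable:

```python
import os
import time
from contextlib import contextmanager

LOCK_PATH = "shared.txt.lock"  # placeholder path next to the shared file

@contextmanager
def file_lock(lock_path=LOCK_PATH, timeout=10.0):
    """Best-effort advisory lock: only one process can create the lock file."""
    deadline = time.time() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break                      # we own the lock
        except FileExistsError:
            if time.time() > deadline:
                raise TimeoutError("could not acquire lock")
            time.sleep(0.1)            # someone else is writing; wait and retry
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)           # release the lock for the next process

with file_lock():
    with open("shared.txt", "a") as f:
        f.write("hello\n")             # no interleaved writes while the lock is held
```

If the script crashes while holding the lock, the stale lock file has to be cleaned up by hand, which is another argument for letting a database or a single server process do the serializing.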

Search engine from scratch [closed]

I have a MySQL database with around 10,000 articles in it, but that number will probably go up with time. I want to be able to search through these articles and pull out the most relevant results based on some keywords. I know there are a number of projects I could plug in that would essentially do this for me. However, the application is very simple, and it would be nice to have direct control over, and working knowledge of, how the whole thing operates. Therefore, I would like to look into building a very simple search engine from scratch in Python.
I'm not even sure where to start, really. I could dump everything from the MySQL DB into a list and try to sort that list by relevance, but that seems like it would be slow, and get slower as the number of database items increases. I could use a basic MySQL search to get the 100 results MySQL thinks are most relevant and then sort those 100, but that is a two-step process which may be less efficient, and I risk missing an article if it falls just outside that range.
What are the best approaches I can take to this?
Your best bet for building a search engine over the 10,000 articles is to read "Programming Collective Intelligence" by Toby Segaran. It's a wonderful read, and to save time you can go straight to Chapter 4 (of the August 2007 edition).
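For a sense of what "from scratch" involves, here is a tiny sketch of the inverted-index plus TF-IDF idea that the chapter walks through, assuming the articles have already been pulled out of MySQL into a dict (names and data are illustrative only):

```python
import math
import re
from collections import Counter, defaultdict

# Stand-in for rows fetched from the MySQL articles table: {article_id: text}.
articles = {1: "python search engine tutorial",
            2: "cooking with python",
            3: "search engine ranking basics"}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: word -> {article_id: term frequency}.
index = defaultdict(Counter)
for doc_id, text in articles.items():
    for word in tokenize(text):
        index[word][doc_id] += 1

def search(query, limit=10):
    """Score articles by TF-IDF summed over the query terms."""
    scores = Counter()
    n_docs = len(articles)
    for word in tokenize(query):
        postings = index.get(word, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer words weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return scores.most_common(limit)

print(search("python search"))  # article 1 scores highest: it matches both terms
```

The index is built once (and updated incrementally as articles are added), so query time depends on the number of matching documents rather than the size of the whole table, which addresses the "gets slower as the database grows" worry.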
If you don't mind replacing the MySQL database with something else, then I suggest Elasticsearch, using pyes.
It has the functionality you would expect of a search engine, including full-text search, great performance, pagination, more-like-this, and pluggable scoring algorithms, and it is real-time, so when more data is added it shows up in the search results immediately.
If you don't want to remove the current database, you can easily run them side by side and treat MySQL as the master.
