I've successfully written a PowerShell script which:
queries AD to obtain the list of computers,
queries every computer through WMI to obtain software/hardware information,
inserts the collected data into a MySQL database.
The script works fine, but I don't like how it's implemented. It's procedural and there is a lot of code duplication, causing a mess every time I need to change something.
Now, what I would like to ask you is: what is the cleanest way to implement it in Python using OOP? I've thought of something like this (rough sketch):
class ADQuery:
    def get_computers(self, ou):
        """Return a list of computer names in the specified OU."""

class Computer:
    def __init__(self, computer_name):
        self.computer_name = computer_name
        self.ram = None
        self.cpu = None
        # ... more hardware/software properties

    def query(self):
        """Connect to the computer and pull the info through WMI."""

    def print_info(self):
        """Print the collected info to the console (debug)."""
Questions:
In order to save the collected data into a database, do I have to create another object (e.g. Database) and pass the Computer object to it, or should I add a member function to the Computer class (e.g. save_db())?
If I go for the second option, wouldn't that cause a massive number of MySQL connections when I'm dealing with multiple objects?
Thanks a lot and sorry for my bad English
That architecture looks reasonable to me.
You could do either; I'm not sure it really makes a huge difference for a small application like this.
Potentially. Depending on how it's implemented, you could get a lot of connections going on. If you're doing a reasonable number of inserts, I'd collect them in a list and insert them all at once, if that's possible with your code.
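For example, a rough sketch of that batching idea, assuming the mysql-connector-python driver and a hypothetical computers table (the table and column names are illustrative, not from your script):

import mysql.connector  # assumed driver; PyMySQL or mysqlclient would look similar

def save_computers(computers):
    # Build one row per Computer object, then insert them all in a single batch.
    rows = [(c.computer_name, c.ram, c.cpu) for c in computers]
    conn = mysql.connector.connect(host="localhost", user="inventory",
                                   password="secret", database="inventory")
    try:
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO computers (name, ram, cpu) VALUES (%s, %s, %s)", rows)
        conn.commit()
    finally:
        conn.close()

That way you open one connection per run instead of one per Computer object.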
You could also grab an Object-Oriented Design book from the internet or your local bookstore, e.g. Rumbaugh et al. I would also recommend reading up on Design Patterns, e.g. the book by Gamma et al. I'm currently doing that, and it is really helpful to look at standard patterns for solving a particular problem to shape your thought process about object-oriented programming.
PS: Your English isn't bad at all (note that I'm also not a native speaker ;)).
Related
I have a question and hope someone can direct me in the right direction. Basically, every week I have to run a query (SSMS) to get a table containing some information (date, clientnumber, clientID, orderid, etc.), and then I copy all the information in that table and paste it into a folder as a CSV file. It takes me about 15 minutes to do all this, but I am just thinking: can I automate this? If yes, how can I do that, and can I also schedule it so it runs by itself every week? I believe we live in a technological era and this should be done without human input, so I hope I can find someone here willing to show me how to do it using Python.
Many thanks for considering my request.
This should be pretty simple to automate:
Use a database adapter that can work with your database; for MSSQL, the one delivered by pyodbc will be fine,
Within the script, connect to the database, perform the query, and parse the output,
Save the parsed output to a .csv file (you can use the csv Python module),
Run the script as a periodic task using cron/schtasks if you work on Linux/Windows respectively.
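A rough sketch of steps 1-3, assuming pyodbc; the connection string, query, and output path are placeholders to replace with your own:

import csv
import pyodbc

# Placeholder connection string and query - replace with your own.
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;")
cursor = conn.cursor()
cursor.execute("SELECT date, clientnumber, clientID, orderid FROM orders")

with open(r"C:\reports\weekly_report.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor.fetchall())                      # data rows

conn.close()

Then point schtasks (or cron) at the script to run it weekly.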
Please note that your question is too broad, and shows no research effort.
You will find that Python can do the tasks you desire.
There are many different ways to interact with SQL servers, depending on your implementation. I suggest you learn Python+SQL using the built-in sqlite3 library. You will want to save your query as a string and pass it into an SQL connection manager of your choice; this depends on your server setup, and there are many different SQL packages for Python.
You can use pandas for parsing the data and saving it to a .csv file (the method is literally called to_csv).
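For instance, a minimal sketch of that approach; the sqlite3 connection, query, and file name are placeholders for whatever your setup uses:

import sqlite3
import pandas as pd

conn = sqlite3.connect("example.db")              # or any other DB-API/SQLAlchemy connection
df = pd.read_sql("SELECT * FROM orders", conn)    # placeholder query, read into a DataFrame
df.to_csv("weekly_report.csv", index=False)       # write it out as CSV
conn.close()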
Python does have many libraries for scheduling tasks, but I suggest you hold off for a while. Develop your code in a way that it can be run manually, which will still be much faster/easier than doing it without Python. Once you know your code works, you can easily implement a scheduler. The downside is that your program will always need to be running, and you will need to keep checking to see if it is running. Personally, I would keep it restricted to manually running the script; you could compile it to an .exe and bind it to a hotkey if you need the accessibility.
I have been studying programming for a few years and I am now working on my first desktop application. I am making a simple program that is able to keep track of information pertaining to DND (Dungeons and Dragons) characters. I want to find a way to store information about these characters so that the next time the application is launched, the characters will be saved. How do things like Spotify save information about each user? First, I will give some info about the program itself. I have written it in Python and it is organized as follows:
A file which serves as the brain of the application (app.py)
A file which defines a class representing a character
A file defining a class that is used to find information about the characters
Other files defining classes used to build the UI
So far in my studies, I have only gathered input from txt files, input functions, and APIs via requests. I have worked with JSON before and am thinking this may be an option, but I am not sure how this would work in this case. I also had the idea of storing data in txt files, but I want to learn the way it is done in the real world in order to make the best use of my time.
TL;DR: I am making a desktop application using Python and want an effective and common way to store information that I want to access the next time the program is run. I am looking for a local way to save the data that is also SAFE. If you have a recommendation that is server/cloud based, I would still like to hear how it may be done, as I am sure the knowledge will still be beneficial. I am looking for a way to SAFELY store information that will be saved even after the application is terminated. Any advice or anything you have personally used is appreciated.
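For what it's worth, a minimal sketch of the JSON option mentioned above, assuming the characters are reduced to plain dicts (the fields shown are made up for illustration):

import json

def save_characters(characters, path="characters.json"):
    # characters: a list of dicts, e.g. {"name": "Aria", "hp": 12, "class": "rogue"}
    with open(path, "w") as f:
        json.dump(characters, f, indent=2)

def load_characters(path="characters.json"):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []  # first launch: nothing saved yet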
I'm struggling with how to store some telemetry streams. I've played with a number of things, and I find myself feeling like I'm at a writer's block.
Problem Description
Via a UDP connection, I receive telemetry from different sources. Each source is decomposed into a set of devices. And for each device there's at most 5 different value types I want to store. They come in no faster than once per minute, and may be sparse. The values are transmitted with a hybrid edge/level triggered scheme (send data for a value when it is either different enough or enough time has passed). So it's a 2 or 3 level hierarchy, with a dictionary of time series.
The things I want to do most with the data are a) access the latest values and b) enumerate the timespans (begin/end/value). I don't really care about a lot of "correlations" between data. It's not the case that I want to compute averages or correlate between them. Generally, I look at the latest value for a given type, across all of the hierarchy or some hierarchy-derived subset. Or I focus on one value stream and enumerate the spans.
I'm not a database expert at all. In fact I know very little. And my three colleagues aren't either. I do python (and want whatever I do to be python3). So I'd like whatever we do to be as approachable as possible. I'm currently trying to do development using Mint Linux. I don't care much about ACID and all that.
What I've Done So Far
Our first version of this used the Gemstone Smalltalk database. Building a specialized Timeseries object worked like a charm. I've done a lot of Smalltalk, but my colleagues haven't, and the Gemstone system is NOT just a "jump in and be happy right away". And we want to move away from Smalltalk (though I wish the marketplace made it otherwise). So that's out.
Played with RRD (Round Robin Database). A novel approach, but we don't need the compression that badly, and because our data is edge triggered, RRD doesn't fit our capture model well.
A friend talked me into using sqlite3. I may try this again. My first attempt didn't work out so well. I may have been trying to be too clever. I was trying to do things the "normalized" way. I found that I got something working OK at first. But getting the "latest" value of a given field for a subset of devices was getting to be some hairy (for me) SQL. And the speed for doing so was kind of disappointing. So it turned out I'd need to learn about indexing too. I found I was getting into a hole I didn't want to be in, and headed right back to where we were with the Smalltalk DB: a lot of specialized knowledge, with me the only person who could work with it.
I thought I'd go the "roll your own" route. My data is not HUGE. Disk is cheap. And I know real well how to read/write files. And aren't filesystems hierarchical databases anyway? I'm sure that "people in the know" are rolling their eyes at this primitive approach, but this method was the most approachable. With a little bit of Python code, I used directories for my structuring, and then a 2-file scheme for each value (one file for the latest value, and an append log for the rest of the values). This has worked OK. But I'd rather not be liable for the wrinkles I haven't quite worked out yet. A lot of the code involved is just in how the data is serialized to/from disk (just using simple strings right now). One nice thing about this approach is that while I can write Python scripts to analyze the data, some things can be done just fine with classic command line tools. E.g. (a simple query to show all latest rssi values):
ls Telemetry/*/*/rssi | xargs cat
I spent this morning looking at alternatives. Browsed the NoSQL sites. Read up on PyTables. Scanned the ZODB tutorial. PyTables looks very suited for what I'm after: a hierarchy of named tables modeling timeseries. But I don't think PyTables works with Python 3 yet (at least, there is no debian/ubuntu package for Python 3 yet). Ditto for ZODB. And I'm afraid I don't know enough about what the many different NoSQL databases do to even take a stab at one.
Plea for Ideas
I find myself more bewildered and confused than at the start of this. I was probably too naive in thinking I'd find something that could be a little more "fire and forget" and be past this by now. Any advice and direction you have would be hugely appreciated. If someone can give me a recipe by which I can meet my needs without huge amounts of overhead/education/ingress, I'd mark that as the answer for sure.
Ok, I'm going to take a stab at this.
We use Elastic Search for a lot of our unstructured data: http://www.elasticsearch.org/. I'm no expert on this subject, but in my day-to-day, I rely on the indices a lot. Basically, you post JSON objects to the index, which lives on some server. You can query the index via the URL, or by posting a JSON object to the appropriate place. I use pyelasticsearch to connect to the indices---that package is well-documented, and the main class that you use is thread-safe.
The query language is pretty robust itself, but you could just as easily add a field to the records in the index that is "latest time" before you post the records.
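To make that concrete, here is a sketch of posting and querying documents over Elasticsearch's HTTP API with requests; the URL, index name, and fields are made up, and pyelasticsearch wraps the same kind of calls:

import requests

ES = "http://localhost:9200"   # assumed local Elasticsearch instance

# Post one telemetry reading as a JSON document (index, type, and fields are illustrative).
doc = {"source": "site-a", "device": "radio-3", "type": "rssi",
       "value": -71, "timestamp": "2014-01-01T12:00:00"}
requests.post(ES + "/telemetry/reading", json=doc)

# Fetch the most recent reading of a given type by sorting on the timestamp field.
query = {"query": {"term": {"type": "rssi"}},
         "sort": [{"timestamp": {"order": "desc"}}],
         "size": 1}
resp = requests.post(ES + "/telemetry/reading/_search", json=query)
print(resp.json()["hits"]["hits"])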
Anyway, I don't feel that this deserves a check mark (even if you go that route), but it was too long for a comment.
What you describe fits the database model (e.g. sqlite3).
Keep one table.
id, device_id, valuetype1, valuetype2, valuetype3, ... ,valuetypen, timestamp
I assume all devices are of the same type (i.e. have the same set of values that you care about). If they do not, consider simply setting the value to NULL when it doesn't apply to a specific device type.
Each time you get an update, duplicate the last row and update the newest value:
INSERT INTO DeviceValueTable (device_id, valuetype1, valuetype2, ..., timestamp)
SELECT device_id, valuetype1, #new_value, ..., CURRENT_TIMESTAMP
FROM DeviceValueTable
WHERE device_id = #device_id
ORDER BY timestamp DESC
LIMIT 1;
To get the latest values for a specific device:
SELECT *
FROM DeviceValueTable
WHERE device_id = #device_id
ORDER BY timestamp DESC
LIMIT 1;
To get the latest values for all devices:
SELECT
    a.*
FROM
    DeviceValueTable a
INNER JOIN
    (SELECT device_id, MAX(timestamp) AS newest
     FROM DeviceValueTable GROUP BY device_id) AS b
    ON a.device_id = b.device_id AND a.timestamp = b.newest;
You might be worried about the cost (in storage size) of the duplicate values. Rely on the database to handle compression.
Also, keep in mind simplicity over optimization. Make it work, then if it's too slow, find and fix the slowness.
Note, these queries were not tested on sqlite3 and may contain typos.
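If it helps, a small Python sketch of the single-table idea in sqlite3; the two value columns are illustrative stand-ins for your real value types:

import sqlite3

conn = sqlite3.connect("telemetry.db")
conn.execute("""CREATE TABLE IF NOT EXISTS DeviceValueTable (
                    id INTEGER PRIMARY KEY,
                    device_id TEXT,
                    rssi REAL,
                    temperature REAL,
                    timestamp TEXT DEFAULT CURRENT_TIMESTAMP)""")

def record(device_id, rssi, temperature):
    # One insert per update; let the database worry about the duplicated values.
    conn.execute("INSERT INTO DeviceValueTable (device_id, rssi, temperature) "
                 "VALUES (?, ?, ?)", (device_id, rssi, temperature))
    conn.commit()

def latest(device_id):
    # Latest row for one device.
    return conn.execute("SELECT * FROM DeviceValueTable WHERE device_id = ? "
                        "ORDER BY timestamp DESC LIMIT 1", (device_id,)).fetchone()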
It sounds to me like you want an on-disk, implicitly sorted data structure like a B-tree or similar.
Maybe check out:
http://liw.fi/larch/
http://www.egenix.com/products/python/mxBase/mxBeeBase/
Your issue isn't technical, it's poor problem specification.
If you are doing anything with sensor data then the old laboratory maxim applies "If you don't write it down, it didn't happen". In the lab, that means a notebook and pen, on a computer it means ACID.
You also seem to be prematurely optimizing the solution, which is well known to be the root of all evil. You don't say what size the data are, but you do say they "come in no faster than once per minute, and may be sparse". Assuming each reading is an even 1.0KB in size, that's about 1.5MB/day, or roughly 0.5GB/year. My StupidPhone has more storage than you would need in a year, and my laptop has sneezes that are larger.
The biggest problem is that you claim to "know very little" about databases, and that is the crux of the matter. Your data is standard old 1950s data-processing boring. You're jumping into buzzword storage technologies when SQLite would do everything you need if only you knew how to ask it. Given that you've got the Smalltalk DB down, I'd be quite surprised if it took more than a day's study to learn all the conventional RDBMS principles you need and then some.
After that, you'd be able to write a question that can be answered in more than generalities.
I have tried to find an answer but I must not be searching correctly, or what I'm trying to do is the wrong way to go about it.
I have a simple Python script that creates a chess board and pieces in a command-line environment. You can input commands to move the pieces. One of my coworkers thought it would be cool to play each other over the network. I agreed and tried it by creating a text file on a network share to read and write to. Then we would both run the script that reads that file. The problem I ran into is that I pretty much DoS attacked that file share, since the script kept checking the file on the network share for an update.
I am still new to Python and haven't ever written code that travels the internet, or even a simple local network. So my question is: how should I go about properly allowing 2 people to access this data at the same time without stealing all the network resources?
Oh, also I'm using version 2.6 because that's what everyone else uses and they refuse to change to newer syntax.
You need to do the networking the proper way. It's not that hard for a simple networked program like yours.
Use the socket module from Python's stdlib: http://docs.python.org/library/socket.html (also take a look at the examples at the bottom of the page).
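A bare-bones sketch of the idea, adapted from the echo example in those docs and kept Python 2.6 compatible; the port and the move format are placeholders:

import socket

PORT = 50007                              # placeholder port both players agree on

# Player hosting the game:
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("", PORT))
srv.listen(1)
conn, addr = srv.accept()                 # blocks until the other player connects

# The other player would instead do:
# conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# conn.connect(("hosting-players-hostname", PORT))

# Both sides then exchange moves over the same connection, no polling needed:
conn.sendall("e2e4")                      # send your move as a simple string
move = conn.recv(1024)                    # block until the opponent's move arrives
print(move)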
First off, without knowing how many times you are checking the file with the moves, it is difficult to know why the file share is getting DoS-ed. Most networks and network shares these days can handle that level of traffic - they are all gigabit Ethernet - so unless you are transferring large chunks of data each time, you should be OK. If you are transferring the whole file each time, then I'd suggest that you look at optimizing that.
That said, coming to your second question about how this is handled at the network level: to be honest, you are already doing it in a certain way - you are accessing a file on a network share and modifying it. The only optimization required is to do it efficiently. Concurrent systems over the network do the same thing; they just use a fast in-memory database to store the changes, a high-scale RDBMS, or, in the case of fast-serving web servers, better async I/O.
In the current case, since there are two users playing the game, I suggest that you work on a way to transmit only the difference in the moves each time over the network. So, instead of modifying the file over the network share, you can send the moves to a server component and have it synchronize the changes locally to the file. Of course, this means you will need to create a server component that would do something like this:
user1's moves <--> server <--> user2's moves. The server will modify the moves file.
Once you start doing this, you get into the realm of server programming / preventing race conditions etc. It will be a good learning experience.
I want to create a user group using Python on a CentOS system. When I say 'using Python', I mean I don't want to do something like os.system and give the Unix command to create a new group. I would like to know if there is any Python module that deals with this.
Searching on the net did not reveal much about what I want, except for Python user groups, so I had to ask this.
I learned about the grp module by searching here on SO, but couldn't find anything about creating a group.
EDIT: I don't know if I have to start a new question for this, but I would also like to know how to add (existing) users to the newly created group.
Any help appreciated.
Thank you.
I don't know of a Python module to do it, but the /etc/group and /etc/gshadow formats are pretty standard, so if you wanted, you could just open the files, parse their current contents, and then add the new group if necessary.
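A rough sketch of what that parsing could look like; the group name and the naive GID choice are purely illustrative, and note the caveats below before writing anything back:

def read_groups(path="/etc/group"):
    # Each line is name:password:GID:member,member,...
    groups = {}
    with open(path) as f:
        for line in f:
            name, _pw, gid, members = line.rstrip("\n").split(":")
            groups[name] = (int(gid), members.split(",") if members else [])
    return groups

groups = read_groups()
if "mygroup" not in groups:
    next_gid = max(gid for gid, _ in groups.values()) + 1   # naive: ignores distro GID ranges
    new_line = "mygroup:x:%d:\n" % next_gid                 # what you would append to /etc/group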
Before you go doing this, consider:
What happens if you try to add a group that already exists on the system
What happens when multiple instances of your program try to add a group at the same time
What happens to your code when an incompatible change is made to the group format a couple releases down the line
NIS, LDAP, Kerberos, ...
If you're not willing to deal with these kinds of problems, just use the subprocess module and run groupadd. It will be way less likely to break your customers' machines.
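That route is just a couple of subprocess calls; the group and user names here are placeholders, and this also covers the edit about adding existing users to the new group:

import subprocess

# Create the group (error handling for an already-existing group is left to the caller).
subprocess.check_call(["groupadd", "mygroup"])

# Add an existing user to it (-a appends, -G names the supplementary group).
subprocess.check_call(["usermod", "-a", "-G", "mygroup", "someuser"])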
Another thing you could do that would be less fragile than writing your own would be to wrap the code in groupadd.c (in the shadow package) in Python and do it that way. I don't see this buying you much versus just exec'ing it, though, and it would add more complexity and fragility to your build.
I think you should use the command-line programs from your program; a lot of care has gone into making sure that they don't break the groups file if something goes wrong.
However, the file format is quite straightforward, so you could write something yourself if you choose to go that way.
There are no library calls for creating a group. This is because there's really no such thing as creating a group. A GID is simply a number assigned to a process or a file. All these numbers exist already - there is nothing you need to do to start using a GID. With the appropriate privileges, you can call chown(2) to set the GID of a file to any number, or setgid(2) to set the GID of the current process (there's a little more to it than that, with effective IDs, supplementary IDs, etc).
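For example, from Python you can use the corresponding os wrappers directly; the path and GID here are arbitrary illustrations:

import os

# Set the group of a file to GID 5000 (-1 leaves the owner unchanged);
# no /etc/group entry for 5000 has to exist.
os.chown("/tmp/somefile", -1, 5000)

# Inspect or (with appropriate privileges) change the GID of the current process.
print(os.getgid())
# os.setgid(5000)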
Giving a name to a GID is done by an entry in /etc/group on basic Unix/Linux/POSIX systems, but that's really just a convention adhered to by the Unix/Linux/POSIX userland tools. Other network-based directories also exist, as mentioned by Jack Lloyd.
The man page group(5) describes the format of the /etc/group file, but it is not recommended that you write to it directly. Your distribution will have policies on how unnamed GIDs are allocated, such as reserving certain spaces for different purposes (fixed system groups, dynamic system groups, user groups, etc). The range of these number spaces differs on different distributions. These policies are usually encoded in the command-line tools that a sysadmin uses to assign unnamed GIDs.
This means the best way to add a group locally is to use the command-line tools.
If you are looking at Python, then try this program. It's fairly simple to use, and the code can easily be customized: http://aleph-null.tv/downloads/mpb-adduser-1.tgz