I will describe this. “IVM” stands for Intelligent Vacancy Manager, which is our system. It is a web-based application using a MySQL database, where we store the data extracted from CVs. Let’s say you’re a company.
When a company registers with the system, it can upload its job vacancies. The system then matches those vacancies against the CVs in the database that jobseekers have uploaded. What I want to do is apply data anonymization when sending the CVs to the companies; hiding the name, NIC, address, and telephone number is enough. I have to use Python for that. My idea is that the database just holds the normal (non-anonymized) data, and when a CV is sent to a company, my Python anonymization script runs in the middle, so the database itself is not affected. That way, when a company pays us and requests the original CV, we can just run the normal query and send the original CV without hiding anything.
So when suggesting CVs, I want my Python script to affect only the name, NIC, address, and telephone number.
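Since the anonymization step sits between the database query and the response sent to the company, it can be a pure function over the fetched record. A minimal sketch, assuming each CV is fetched from MySQL as a dict of named columns (the field names here are my assumptions, not from your actual schema):

```python
def anonymize(cv, hidden=("name", "nic", "address", "telephone")):
    """Return a masked copy of a CV record fetched from MySQL.

    The row in the database is never modified; only the copy
    sent to the company has its sensitive fields replaced."""
    return {key: ("HIDDEN" if key in hidden else value)
            for key, value in cv.items()}

cv = {"name": "A. Perera", "nic": "912345678V",
      "address": "12 Lake Rd, Colombo", "telephone": "0771234567",
      "skills": "Python, MySQL"}
masked = anonymize(cv)
# masked keeps "skills" intact but hides the other four fields
```

When a company pays for the original, you simply skip the `anonymize()` call and return the row as-is, exactly as you described.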
Apart from that, I am asked to use the Kerberos protocol to authenticate users to the system.
I hope you now have some idea of what I am going to do. If you have any thoughts on this, please guide me; I would be really grateful. I will be waiting for a reply soon :)
I have to do this as my university project, so I would like to hear all your suggestions.
Hello Python Programmers!
I have created a fully functioning user account system using MongoDB to store the usernames and passwords, along with data associated with each account. It works perfectly and I'm extremely happy with it. You can sign in, sign up, reset a password, and more. There's just one problem now.
I worry about users decompiling the source and being able to take the MongoDB login key/string and remotely accessing the database outside of the application. This is dangerous because it could allow for a data leak of usernames and passwords (along with other data) contained within the database. I am unaware of a method to "obfuscate" the string in a secure way to prevent that, or other methods of authenticating the connection to the database.
I could use PyArmor for obfuscation, but even that I don't trust enough. I know PyArmor has been deobfuscated before, and as a wise programmer said, "Anything that can be read by a computer can be read by humans". With that being said, if someone can deobfuscate it, they can get the string.
On top of this, I also don't know how to authenticate a user login. Once the user successfully logs in, two variables named "LoggedIn" and "LoggedUser" are set to their respective values. But as far as I'm aware, from a security perspective these values could be spoofed, giving easy access to any benefits a logged-in user would have... or even letting someone spoof being a logged-in user and change that user's values in the database.
If anyone knows ways to protect the string and authenticate the user better, please let me know.
Thanks!
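For what it's worth, the usual fix for both worries is the same: the client never holds the MongoDB string and never decides for itself that it is "LoggedIn". A small server you control holds the credentials, verifies a salted password hash, and hands the client a random session token it cannot forge. A minimal sketch of the server-side pieces, using only the standard library (the in-memory storage and all names are placeholders, not a real API):

```python
import hashlib
import hmac
import os
import secrets

def hash_password(password, salt=None):
    """Salted PBKDF2 hash; store (salt, digest), never the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)  # constant-time compare

SESSIONS = {}  # token -> username, kept server-side so it can't be spoofed

def log_in(username, password, salt, digest):
    """Return a random session token on success, None on failure."""
    if not verify_password(password, salt, digest):
        return None
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = username
    return token
```

The desktop app then sends the token with each request, and the server, not the client, looks up what that token is allowed to do. The MongoDB string stays on the server only.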
I've got a list of emails that got corrupted by some robots. On my webpage I have a "sign up to our newsletter" box that got abused with fake addresses, and now I can't tell the good addresses from the fake ones.
I would like to write a small script that checks the existence of all the addresses one by one, preferably without sending an email. The list isn't that long (about 300 emails).
Can I do this without breaking anti-spam rules? I know I should send an email with a link for people to verify their address, but I don't really want to do this: the people with real addresses have already opted in to my newsletter and will wonder why I'm asking them to do it again.
I would ideally do this with python as this is my scripting language of choice.
Any solution to this?
I'm not sure how you'd do it yourself; however, there are services for this. I use Kickbox. I typically use nodejs for the server, but they have a Python library, Kickbox-python. You can do 100 verifications a day for free, or pay for more. I use it to verify emails when users initially sign up.
EDIT: The kickbox pricing model has changed. Now you get 100 initial verifications free, and pay for any additional verifications after that threshold. Refer to the site for the current pricing plans.
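If you'd still like to try it yourself in Python, the usual trick is a syntax check followed by an SMTP probe: connect to the domain's mail server and issue RCPT TO without ever sending a message body. A rough sketch (you'd first need the domain's MX host, e.g. via the third-party dnspython package; the HELO name and sender below are placeholders, and many servers are catch-alls or block probes, so treat the result as a hint, not proof):

```python
import re
import smtplib

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_valid(address):
    """Cheap syntax check to throw out obviously fake entries first."""
    return bool(EMAIL_RE.match(address))

def smtp_probe(address, mx_host, timeout=10):
    """Ask the recipient's mail server whether the mailbox exists,
    without sending anything: HELO, MAIL FROM, RCPT TO, then QUIT."""
    try:
        with smtplib.SMTP(mx_host, timeout=timeout) as smtp:
            smtp.helo("example.com")          # placeholder HELO name
            smtp.mail("verify@example.com")   # placeholder sender
            code, _ = smtp.rcpt(address)
            return code == 250                # 250 = mailbox accepted
    except (smtplib.SMTPException, OSError):
        return None                           # couldn't tell
```

Probing 300 addresses this way is low-volume, but pace the requests anyway; some providers rate-limit or blacklist hosts that probe aggressively.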
I have a desktop Python application whose data backend is a MySQL database, but whose previous backend was a network-accessed XML file (or files). When it was XML-powered, I had a thread spawned at launch that simply checked the XML file for changes; whenever the date modified changed (because some user updated it), the app would refresh itself, so multiple users could use the app and see each other's changes as they went about their business.
Now the program has matured and is venturing toward an online presence so it can be used anywhere. XML is out the window and I'm using MySQL, with SQLAlchemy as the database access method. The plot thickens, however, because the information is no longer stored in one XML file but split across multiple tables in the SQL database. This complicates the idea of some sort of 'last modified' value or structure. Thus the question: how do you inform the users that the data has changed and the app needs to refresh? Here are some of my thoughts:
Each table needs a last-modified column (this seems like the worst option ever)
A separate table that holds some last modified column?
Some sort of push notification through a server?
It should be mentioned that I could run a very small Python script on the same server hosting the SQL db; the app could connect to it (through sockets?) and it could pass information to and from all connected clients.
Some extra information:
The information passed back and forth would be pretty low-bandwidth. Mostly text with the potential of some images (rarely over 50k).
Number of clients at present is very small, in the tens. But the project could be picked up by some bigger companies with client numbers possibly getting into the hundreds. Even still the bandwidth shouldn't be a problem for the foreseeable future.
Anyway, somewhat new territory for me, so what would you do? Thanks in advance!
As I understand it, this is not a client-server application but rather an application with a common remote storage.
One idea would be to switch to web services (this would solve most of your problems in the long run).
Another idea (if you don't want to switch to the web) is to refresh the data in your interface periodically using a timer.
Another way (more complicated) would be to have a server that receives all the updates, stores them in the database, and then pushes the changes to the other connected clients.
The first 2 ideas you mentioned will have maintenance, scalability, and design-ugliness issues.
The last 2 are a lot better in my opinion, but I still consider web services the best.
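Of the poll-based options, the "separate last-modified table" idea is simple to sketch: one single-row table holds a version counter, every write bumps it, and each client's timer compares it against the last value it saw. Shown here with the stdlib sqlite3 module purely for illustration; the same two SQL statements work through SQLAlchemy against MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (version INTEGER NOT NULL)")
conn.execute("INSERT INTO revision (version) VALUES (0)")

def bump_version(db):
    """Call inside every transaction that modifies application data."""
    with db:  # commits on success
        db.execute("UPDATE revision SET version = version + 1")

def current_version(db):
    return db.execute("SELECT version FROM revision").fetchone()[0]

# What each client's refresh timer does:
seen = current_version(conn)
bump_version(conn)                     # simulates another user writing data
changed = current_version(conn) != seen
# changed is True here, so the client would reload its views now
```

Polling one tiny row is cheap even for hundreds of clients, and it avoids per-table last-modified columns entirely; the push-notification approach only becomes worth it when polling latency stops being acceptable.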
I have a program that I wrote in Python that collects data. I want to be able to store the data on the internet somewhere and let another user access it from another computer, anywhere in the world with an internet connection. My original idea was to use an e-mail account, such as Gmail, to store the data by sending pickled strings to the address. Anyone could then access the account and simply read the newest e-mail to get the data. It worked perfectly, but the program needs to send a new e-mail every 5-30 seconds, so the method fell through because of the limit Gmail puts on e-mails, among other reasons, such as being unable to completely delete old e-mails.
Now I want to try a different idea, but I don't know much about network programming with Python. I want to set up a webpage with essentially nothing on it. The "master" program, the one actually collecting the data, will send a pickled string to the webpage, and any of the "remote" programs will be able to read the string. I will also need the master program to delete old strings as it updates the webpage. Ideally it would store multiple strings, so there is no chance of the master updating while a remote is reading.
I do not know if this is a feasible task in python, but any and all ideas are welcome. Also, if you have an ideas on how to do this a different way, I am all ears, well eyes in this case.
I would suggest taking a look at setting up a simple site on Google App Engine. It's free and you can build the site in Python. Then it would just be a matter of creating a simple RESTful service that you could send a POST to with your pickled data and store it in a database. Then just create a simple web front end onto the database.
Another option in addition to what Casey already provided:
Set up a remote MySQL database somewhere with user access levels that allow remote connections. Your Python program could then simply access the database and INSERT the data you're trying to store centrally (e.g. through the MySQLdb or pyodbc package). Your users could then either read the data through a client that supports MySQL, or you could write a simple front end in Python or PHP that displays the data from the database.
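The INSERT half of that is plain DB-API code. Sketched here with the stdlib sqlite3 module so it runs anywhere; against MySQL you'd swap the connect call for something like `MySQLdb.connect(host=..., user=..., passwd=..., db=...)` and use `%s` placeholders instead of `?`:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the remote MySQL connection
db.execute("CREATE TABLE readings (ts TEXT, value REAL)")

def store(db, ts, value):
    """Insert one reading, using parameter binding (never string
    formatting) so the values are safely escaped."""
    with db:  # commit on success
        db.execute("INSERT INTO readings (ts, value) VALUES (?, ?)",
                   (ts, value))

store(db, "2024-01-01T00:00:00", 21.4)
rows = db.execute("SELECT ts, value FROM readings").fetchall()
```

The readers on the other end can run the same SELECT through any MySQL client, which is what makes this option so low-effort.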
Adding this as an answer so that OP will be more likely to see it...
Make sure you consider security! If you just blindly accept pickled data, it can open you up to arbitrary code execution.
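To make that warning concrete: `pickle.loads` can execute arbitrary code baked into the payload, while `json.loads` can only ever produce plain data. One hedged way to keep the POST idea from the answers above but drop pickle (the endpoint URL is a placeholder):

```python
import json
import urllib.request

def post_reading(url, payload):
    """POST a JSON-encoded reading instead of a pickled one.
    json.loads on the receiving end cannot run code, unlike pickle.loads."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    return urllib.request.urlopen(req)

# The encoding is lossless for simple dicts/lists/numbers/strings:
reading = {"sensor": "temp-1", "value": 21.4}
```

If the data really must contain Python objects JSON can't express, authenticate the blob (e.g. with an HMAC) before unpickling, but plain JSON is the safer default.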
I suggest you use good middleware such as ZeroC ICE, Pyro4, or Twisted.
Note that Pyro4 can use pickle to serialize data.
For a college project for my course on Introduction to Programming, I decided to make a small program that traces IP addresses and presents them nicely in a GUI (PyQt). Not a big deal, I know, but I still like the idea.
So I Googled around and found MaxMind's free GeoIP databases and pygeoip, a Python API for the MaxMind GeoIP databases. Pretty cool, eh!
But the downside is that to query their database, I have to download separate databases for country and city. This is not good because it makes the end user download additional files (megabytes of them) just to look up an IP address.
So I am wondering, is there another way of doing this? How do I trace IP addresses? Note that I need them down to the city level, if possible. Something like this: aruljohn.com/track.pl
Thanks!
I would prefer "pygeoip", because it allows you to develop a complete solution that works locally. Of course, you will need to keep the database.
If you do not want to keep the database locally, you will have to depend on an external service to query for location of an IP. This will keep your solution small but dependent on this service.
For this check out: ipinfodb.com
http://ipinfodb.com/ip_location_api.php
They provide JSON and XML APIs, which should be easy enough to build against.
Check out more information at : http://ipinfo.info/html/geolocation_2.php
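Whichever hosted service you pick, the client side is just an HTTP GET plus JSON parsing. A sketch using only the standard library (the URL template and the `countryName`/`cityName` field names are assumptions for illustration; check the service's own docs for the real endpoint and response shape):

```python
import json
import urllib.request

def lookup_ip(ip, api_url_template):
    """Fetch the JSON geolocation record for an IP from a hosted API.
    api_url_template is e.g. "https://example.com/geo?ip={ip}" (placeholder)."""
    url = api_url_template.format(ip=ip)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_location(resp.read().decode("utf-8"))

def parse_location(body):
    """Pull country and city out of a response shaped like
    {"countryName": ..., "cityName": ...} (assumed field names)."""
    data = json.loads(body)
    return data.get("countryName"), data.get("cityName")

sample = '{"countryName": "UNITED STATES", "cityName": "MOUNTAIN VIEW"}'
```

This keeps the PyQt app download-free, at the cost of needing network access and depending on the service's uptime and rate limits.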
I have an even better idea. Why don't you make a very simple web app that does the actual lookup, and have your PyQt client make an HTTP request to it? Or maybe in that case you don't even need a client: just make a web page that takes an IP address and shows the city.