I'm creating a Python 3 spider that scrapes Tor hidden services for useful data. I'm storing this data in a PostgreSQL database using the psycopg2 library. Currently, the spider script and the database are hosted on the same network, so they have no trouble communicating. However, I plan to migrate the database to a remote server on a VPS so that I can have a team of users running the spider script from a number of remote locations, all contributing to the same database. For example, I could be running the script at my house, my friend could run it from his VPS, and my professor could run the script from a few different systems in the lab at the university, and all of these individual systems could synchronize with the PostgreSQL server running on my remote VPS.
This would be easy enough if I simply opened the database VPS to accept connections from anywhere, making the database public. However, I do not want to do this, for security reasons. I know I could tunnel the connection through SSH, but that would require giving each person a username and password that would grant them access to the server itself. I don't wish to do this. I'd prefer simply giving them access to the database without granting access to a shell account.
I'd prefer to limit connections to the local system 127.0.0.1 and create a Tor hidden service .onion address for the database, so that my remote spider clients can connect to the database .onion through Tor.
The problem is, I don't know how to connect to a remote database through a proxy using psycopg2. I can connect to remote databases, but I don't see any option for connecting through a proxy.
Does anyone know how this might be done?
This would be easy enough if I simply opened the database VPS to accept connections from anywhere
Here lies your issue. Simply lock down your VPS using fail2ban and ufw. Create a ufw rule that only allows connections to the Postgres port on your VPS from the IP addresses you want to give access to.
This way, you don't open your Postgres port to anyone (from *) but only to a specific other server or servers that you control. This is how you do it. Don't run an onion service to connect Postgres content because that will only complicate things and slow down the reads to your Postgres database that I am assuming an API will be consuming eventually to get to the "useful data" you will be scraping.
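As a rough sketch of that allow-list idea, the snippet below generates one ufw command per client IP. The helper name ufw_allow_rules and the example addresses are made up for illustration, and port 5432 is only the PostgreSQL default, so adjust it to your setup. It only builds the command strings; you would still review them and run them yourself with sudo on the VPS.

```python
import ipaddress


def ufw_allow_rules(client_ips, port=5432):
    """Return one ufw command string per client IP allowed to reach `port`.

    Hypothetical helper: it only builds the strings, it does not run ufw.
    """
    rules = []
    for ip in client_ips:
        # Raises ValueError if the address is malformed, so a typo
        # never ends up in a firewall command.
        ipaddress.ip_address(ip)
        rules.append(f"ufw allow from {ip} to any port {port} proto tcp")
    return rules


if __name__ == "__main__":
    # Example addresses taken from the documentation ranges (assumption).
    for rule in ufw_allow_rules(["203.0.113.10", "198.51.100.7"]):
        print(rule)
```

With ufw's default "deny incoming" policy in place, only the listed addresses can reach the Postgres port.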
I hope that at least points you in the right direction. Your question was pretty general, so I am keeping my answer along the same vein.
Related
I'm new to Python yet managed to create a lot of good stuff for myself. The problem I faced is how to connect to an SQL database on a remote machine (VPS, VDS, Cloud)
I know that you would likely point me out to other answers on StackOverflow. Unfortunately, there is no one solved question on the website. None of the solutions worked for me.
One more time, I don't want to connect to an SQL database on a local machine. I need to access it remotely.
Can anyone provide me with working instructions?
https://stackoverflow.com/questions/46913504/connecting-to-mysql-db-via-ssh-with-python
https://stackoverflow.com/questions/47069829/mysql-and-python-via-ssh
https://stackoverflow.com/questions/21903411/enable-python-to-connect-to-mysql-via-ssh-tunnelling
https://practicaldatascience.co.uk/data-science/how-to-connect-to-mysql-via-an-ssh-tunnel-in-python
As you can see, there are numerous upvotes. But none of the approaches helped the topic starter. Otherwise, it would be marked as solved.
If you have a VPS that you can access via SSH then you can also use SSH to forward the port of the MySQL server on the VPS to your local machine.
This is something you would do for the development process of your application
Use this command to forward the remote MySQL port to your local machine.
ssh -L 3306:localhost:3306 username@hostname
This way you will be able to access your VPS's MySQL server through port 3306 on your local machine.
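Before pointing a MySQL client at the tunnel, it can help to confirm that something is actually listening on the forwarded port. Here is a minimal standard-library sketch. The function name port_is_open is made up for this example, and 127.0.0.1:3306 assumes the forwarding command shown above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumes the tunnel forwards local port 3306, as in the ssh command above.
    if port_is_open("127.0.0.1", 3306):
        print("Tunnel is up; connect your MySQL client to 127.0.0.1:3306")
    else:
        print("Nothing is listening on 127.0.0.1:3306; is the ssh session still open?")
```

This only proves that the port accepts connections. It does not check that the service behind it is really the remote MySQL server.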
Here is some SSH documentation.
https://www.ssh.com/academy/ssh/tunneling/example
It is also possible to create ssh tunnels from within python but this is not recommended for a use case like yours.
Anyways, if you want to learn about this, you can read about it here
https://github.com/pahaz/sshtunnel/
I can read from my local psql instance like this:
engine = create_engine('postgresql://postgres:postgres@localhost/db_name')
df = pd.read_sql("select * from table_name;", engine)
I have a remote PostgreSQL server which I successfully accessed with SSH tunneling in both pgAdmin4 and PyCharm. I use a public key file to log in to the remote server. Now, my question is how do I access that database with pandas. I tried:
engine = create_engine('postgresql://username:password@localhost/db_name')
Here, username and password are of remote database. I get sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: password authentication failed for user. However, with the same username and password I can access the table in PgAdmin.
From what I read, because of ssh tunneling I have to use localhost and not the remote server address, right? In pgAdmin I can see that the server is running. So, my question is how do I read the table from remote postgresql database with ssh tunneling? In examples I have seen people using different port (different than 5432) but for me the setup only works if I use port 5432. I have disconnected all other servers to avoid the port conflict but I get the same error.
The tunnel created by pgAdmin4 is intended for its own use. It does not arrange for it to listen on 5432, it picks some arbitrary high numbered port and doesn't advertise what port that is. While you can discover what port it is listening on using system tools (like netstat) and then connect to it, you would probably be better served by finding some other way to set up your tunnel. There are python libraries that can help with that.
As for why you can connect to 5432 at all, clearly there is something listening there which is either PostgreSQL or pretending to be PostgreSQL, but it doesn't seem to be the one you intend. You can use netstat -ao to find the pid for it and then look up based on that.
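Once you have a tunnel of your own listening on a known local port, the engine URL just needs to point at 127.0.0.1 and that port. Below is a small sketch of building such a URL. The helper name postgres_url and port 6543 are assumptions for illustration, not something pgAdmin4 sets up. Percent-encoding the credentials matters if the password contains characters such as @ or /.

```python
from urllib.parse import quote_plus


def postgres_url(user, password, database, host="127.0.0.1", port=5432):
    """Build a postgresql:// URL, percent-encoding the credentials.

    Hypothetical helper; `port` should be the local end of your own tunnel.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # 6543 stands in for whatever local port your tunnel listens on (assumption).
    print(postgres_url("username", "p@ss/word", "db_name", port=6543))
```

The resulting string can be passed to create_engine(...) and then to pd.read_sql as in the question.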
I wish to connect to a database server through my local machine at work, but I do not have direct access to the database server (due to security reasons). The database server is accessible through another intermediary server which I can connect to.
I understand I can connect to the database if I run my script on the intermediary server, but is there any way through which I can connect to the database server directly through my local machine?
I am trying to do this in a Python script as I wish to read the data into a pandas dataframe (I can do this part once I can set up the connection).
If you have SSH access to that intermediary server you can connect via an SSH tunnel. This post describes how to do that: Enable Python to Connect to MySQL via SSH Tunnelling
Background
After about a year with a GoDaddy cloud service, I had been disappointed with it from the get-go, so when they announced that they would be discontinuing Cloud Server services, it was like a sign from the heavens.
I then created a Google Cloud account. One of the biggest reasons I got a cloud server to begin with was to have an Eclipse Che instance: an IDE wherever you are! I love it, but despite the temporary partnership between Bitnami and GoDaddy, launching an Eclipse Che instance with them was a mind-numbing task, since their internal Factory build still required a ton of Docker configuration...
And though I can appreciate that I learned the ins and outs of configuring Docker's network settings, which is nothing to sneeze at... as soon as I got my Google Cloud account, it was simply 1, 2, 3 and go!
Question
While I'm running an Eclipse Che instance, what is the proper way to port-forward a given workspace to my local machine? The scenario is simple...
I created a Python stack in which I am using Django, but when I run the server it binds, by default, to the project's local IP. I have yet to find the easy, and more than likely standard, way to run the Django server and have Eclipse Che create the URL to the project. I'm ninety-nine percent sure that I'm going about this the wrong way, given that even some of the demo stack projects with Node or Python are plug-and-play.
PS: I am able to ssh into the workspace no issue, I'm just confused on how to port forward from remote to local as I've only really done it the other way around.. ssh -R ... or -L?
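What you need is an SSH tunnel (local port forwarding), which is -L: it makes a port on the server reachable through a port on your local machine. Going the other way, exposing a port of your local machine on the server, is called a reverse SSH tunnel, which is -R.
So the command is simply:
ssh -L <localport>:127.0.0.1:<remoteport> <user>@<server>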
Some extension to the other answer mentioning ssh tunneling...
If you run a docker-dev on a server (e.g. 192.168.1.123) not being your local machine in eclipse-che that provides some web service you want to access, then find out the IP address of the docker-dev, e.g. by opening a terminal in your eclipse che workspace and executing ip addr. There you will see some 172.17.x.x that is accessible only from the server. Assume the service in docker-dev is listening on port 12345, then you need the following ssh port forwarding from your local machine to access it:
ssh -L 8888:172.17.0.2:12345 192.168.1.123
While the ssh connection is open, you can access the web service with your browser at http://127.0.0.1:8888/
There have been a few questions like this around the place, but none have really answered my question specifically (for example, Connecting to device behind firewall).
What I want is a central server that receives a heartbeat from multiple (say, hundreds of) embedded devices behind personal firewalls. These devices need to be able to do two things:
1. Grab new config from the server. I suspect I can just do this via an HTTP GET from the device to the server and pull down some XML, then reload its own config.
2. Open an ssh connection to the server to allow an admin to log in to the command line of the device and do maintenance and troubleshooting remotely, i.e. device => server <= admin, and the admin can get to a bash command line or equivalent.
The device is a low-powered embedded device that will be running Linux. A solution in Python would be preferable (I'm thinking something with paramiko for the ssh), but I'm open to other solutions. The main thing is that there will be no technical users in the private network, so it should be possible to plug the device into a consumer-grade ADSL modem, get a DHCP address, and have all this work. I can preload the device with anything beforehand, for example ssh certificates for passwordless ssh, etc.
Anybody got any ideas?
Cheers
Mark
You can set up an ssh tunnel (from a Python script or from the console):
ssh -N -R 10022:localhost:22 foo@mainserver.com
Then you can simply log in to the main server and run ssh bar@localhost -p 10022
You should have ssh keys, so you don't have to put password (google about "ssh without password").
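If you want to start the tunnel from a Python script on the device, one simple approach is to launch the system ssh client and restart it whenever it exits. This is only a sketch: the function names, the 30-second keep-alive and the 10-second retry delay are arbitrary choices for the example, and it assumes key-based login is already set up so ssh never prompts for a password.

```python
import subprocess
import time


def reverse_tunnel_cmd(server, remote_port=10022, local_port=22, user="foo"):
    """Build the ssh argument list for a reverse tunnel (device -> server)."""
    return [
        "ssh", "-N",                          # no remote command, just forwarding
        "-o", "ServerAliveInterval=30",       # notice dead connections
        "-o", "ExitOnForwardFailure=yes",     # exit if the remote port is taken
        "-R", f"{remote_port}:localhost:{local_port}",
        f"{user}@{server}",
    ]


def keep_tunnel_open(cmd, retry_delay=10):
    """Run ssh in a loop so the tunnel comes back after a network drop."""
    while True:
        subprocess.run(cmd)      # blocks until ssh exits
        time.sleep(retry_delay)  # wait a bit before reconnecting
```

On the device you would call keep_tunnel_open(reverse_tunnel_cmd("mainserver.com")), for example from a startup script. With hundreds of devices, each one needs its own remote_port on the server.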
A more elaborate method might be some type of firewall hole punching.
On second thought, maybe this is not necessary, since there is only one firewall involved. The trick is to get your embedded device to initiate an outbound connection first.