I've been using the Luigi visualizer for pipelining my Python code.
Now I've started using an AWS instance, and I want to access the visualizer from my own machine.
Any ideas on how I could do that?
We had the very same problem today on GCP, and solved it with the following steps:
setting firewall rules to allow incoming TCP connections on the port used by the service (8082 by default);
installing an apache2 server on the instance with a site.conf configuration that resolves incoming requests on ip-of-instance:8082.
That's it. Hope this can help.
Good question and I'm amazed I can't find a duplicate on StackOverflow. You broadly need to do two things:
Ensure the luigi webserver is hosting content correctly. You can probably do this via site.conf, or via luigi's default-scheduler-host property. This corresponds to #PierluigiPuce's second point.
Correctly expose and secure your EC2 instance. This is a VPC exercise (see docs) and an entire area to learn in itself, but in short you need to configure your VPC so that valid requests are routed to the instance on the correct port and invalid requests are blocked. This corresponds to #PierluigiPuce's first point.
Your primary consideration is whether it is okay for this to be public facing. Probably not. If not, you can secure the instance via IP address ranges, via a VPN, or even via SSH port forwarding through a jump host (sketched below).
Having it completely open is the easiest and worst solution. Putting the instance in a public subnet and restricting access by IP address is probably the second easiest solution, and might be a reasonable compromise for you.
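As an illustration of the last option, here's a minimal sketch using the sshtunnel package (an assumption on my part; a plain ssh -L command does the same job). The hostname, username, and key path are placeholders:

from sshtunnel import SSHTunnelForwarder

# Forward local port 8082 to the luigid web UI on the instance over SSH.
with SSHTunnelForwarder(
    "ec2-203-0-113-10.compute-1.amazonaws.com",  # placeholder public DNS
    ssh_username="ubuntu",
    ssh_pkey="~/.ssh/my-key.pem",                # placeholder key path
    remote_bind_address=("127.0.0.1", 8082),     # luigi's default port
    local_bind_address=("127.0.0.1", 8082),
):
    # While the tunnel is open, http://localhost:8082 shows the visualizer.
    input("Tunnel open; press Enter to close.")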
Good day to all. I'm writing a very sensitive web application that functions like a file browser. Instead of using SFTP/FTP or SSH, it uses purely HTTP/HTTPS. I'm using request.remote_addr to determine the client's IP address, and I reject the request if the IP isn't on the list.
from flask import Flask, request, abort  # assuming Flask, which provides request.remote_addr

app = Flask(__name__)
good_ips = ['127.0.0.1', '192.168.1.10', '192.168.1.1']

@app.before_request
def restrict_by_ip():
    if request.remote_addr not in good_ips:
        abort(403)  # reject the request instead of sys.exit(), which would kill the server
It works fine, but I would just like to ask how reliable and safe this is :).
The request is rejected if the IP is not on the list; otherwise the site runs fine :D.
Thank you and good day!
No, that's not sufficient to build a "sensitive" service as you are using it.
See https://en.wikipedia.org/wiki/IP_address_spoofing for just a start on what could go wrong.
You should use authentication, such as with a public key (which SSH supports), or a password. Kerberos is also a possibility.
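For example, a password check in the framework the question's snippet appears to use (Flask is an assumption; the credentials here are illustrative, and in practice you'd store a salted hash and serve over HTTPS only):

from flask import Flask, request, Response

app = Flask(__name__)

@app.before_request
def require_password():
    auth = request.authorization  # parsed HTTP Basic Auth header, if any
    if not auth or auth.username != "admin" or auth.password != "s3cret":
        # Returning a response here short-circuits the request.
        return Response("Login required", 401,
                        {"WWW-Authenticate": 'Basic realm="files"'})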
Whether source IP filtering is an adequate measure for you depends on your exact scenario: your threat model. #JohnZwinck is right that it usually is not enough by itself, but for some applications it can be.
While it is easy to fake the source IP in individual IP packets, HTTP goes over TCP, and a modern implementation of TCP is protected against address spoofing. Older implementations of TCP (in older operating systems) are vulnerable, though. So if your server has a recent OS, it is not at all straightforward to spoof a source IP. It's probably easier to compromise a client with an address on the allowed list.
Another thing that can go wrong is related to network address translation (NAT). Suppose you restrict access in your application to internal IP addresses (192.168.0.0/24). All works well, but then your security department decides there need to be reverse proxies in front of all web applications. A proxy is deployed and everything still works; however, your application now receives all requests from the proxy's internal address, so the restriction in the application doesn't make much sense anymore. Something similar can happen with clients: in some scenarios, clients may be behind NAT, meaning they all have the same apparent client IP address, which can be good or bad.
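If you do end up behind a reverse proxy and still want the original client address, the usual fix in Flask/Werkzeug (assuming that's the stack, and that the proxy sets X-Forwarded-For) is the ProxyFix middleware:

from flask import Flask
from werkzeug.middleware.proxy_fix import ProxyFix

app = Flask(__name__)
# Trust exactly one proxy hop; request.remote_addr then reflects the
# client taken from X-Forwarded-For instead of the proxy's address.
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1)

Only enable this when a trusted proxy really is in front of you; otherwise clients can forge the header.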
The best practice is of course to have proper authentication (via passwords, client certificates, or multi-factor, etc., however secure you want it to be), with IP restriction being an additional layer to provide more security.
I'm working on a project to expose a set of methods from various client machines to a server for the purpose of information gathering and automation. I'm using Python at the moment, and SimpleXMLRPCServer seems to work great on a local network, where I know the addresses of the client machines, and there's no NAT or firewall.
The problem is that the client/server model is backwards for what I want to do. Rather than have an RPC server running on the client machine, exposing a service to the software client, I'd like to have a server listening for connections from clients, which connect and expose the service to the server.
I'd thought about tunneling, remote port forwarding with SSH, or a VPN, but those options don't scale well, and introduce more overhead and complexity than I'd like.
I'm thinking I could write a server and client to reverse the model, but I don't want to reinvent the wheel if it already exists. It seems to me that this would be a common enough problem that there would be a solution for it already.
I'm also just cutting my teeth on Python and networked services, so it's possible I'm asking the wrong question entirely.
What you want is probably WAMP routed RPC.
It seems to address your issue and it's very convenient once you get used to it.
The idea is to put the WAMP router somewhere reachable (say, in the cloud), and both the RPC caller and the RPC callee are clients with outbound connections to the router.
I was also using a VPN to connect IoT devices over the internet, but switching to this router model really simplified things, and it scales pretty well.
By the way WAMP is implemented in different languages, including Python.
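To give a feel for it, here's a minimal callee sketch using Autobahn, one of the Python WAMP implementations; the router URL, realm, and procedure URI are placeholders for your own deployment:

from autobahn.asyncio.wamp import ApplicationSession, ApplicationRunner

class ClientService(ApplicationSession):
    async def onJoin(self, details):
        def get_status():
            return {"cpu": 0.42}  # illustrative payload
        # Register a procedure; the server-side caller invokes it by URI,
        # even though this machine only ever made an outbound connection.
        await self.register(get_status, "com.example.client1.get_status")

if __name__ == "__main__":
    runner = ApplicationRunner("ws://router.example.com:8080/ws", "realm1")
    runner.run(ClientService)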
Maybe Pyro can be of use? It allows for many forms of distributed computing in Python. You are not very clear about your requirements, so it is hard to say whether this will work for you, but I advise you to have a look at the documentation or the many Pyro examples and see if there's something that matches what you want to do.
Pyro abstracts most of the networking intricacy away, you simply invoke a method on a (remote) python object.
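As a taste of the API, a minimal sketch using Pyro5, the current incarnation (class and method names are illustrative):

import Pyro5.api

@Pyro5.api.expose
class Greeter:
    def hello(self, name):
        return "Hello, " + name

daemon = Pyro5.api.Daemon()        # network server for Pyro objects
uri = daemon.register(Greeter)     # register the class under a PYRO: uri
print("Object available at", uri)  # hand this URI to the calling side
daemon.requestLoop()               # wait for remote calls

# On the calling side:
#   proxy = Pyro5.api.Proxy(uri)
#   print(proxy.hello("world"))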
I'm developing a client-server game in python and I want to know more about port forwarding.
What I'm doing at the moment is going to my router (192.168.0.1) and configuring it to allow requests to my real IP address to be redirected to my local address 192.168.0.X. It works really well. But I'm wondering if I can do this automatically from code?
I think Skype works as a kind of P2P, and I can see in my router that Skype is automatically port forwarded to my PC's address. Can I do that in Python too?
There are different solutions here, but most are not trivial, and you'll have to do some reading, and you'll need some kind of fallback.
UPnP/IGD is the simplest. If your router supports it, and is configured to allow it, and you know how to write either low-level networking code or old-fashioned SOAP web service code, you can ask the router to assign you a port. If it responds with success, start using that port and you're basically done.
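If you'd rather not hand-roll the SOAP, the miniupnpc library has Python bindings that do the IGD dance for you. A sketch, assuming pip install miniupnpc, a cooperative router, and an illustrative port number:

import miniupnpc

upnp = miniupnpc.UPnP()
upnp.discoverdelay = 200   # ms to wait for devices to answer
upnp.discover()            # find UPnP devices on the LAN
upnp.selectigd()           # pick the Internet Gateway Device
# Ask the router to forward external TCP port 25565 to us on the same port.
upnp.addportmapping(25565, 'TCP', upnp.lanaddr, 25565, 'my game server', '')
print("mapped; external IP is", upnp.externalipaddress())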
If you can run a (very low-bandwidth) server with a public address for all of your users, Hole punching may solve the problem.
Think about how a behind-the-NAT client talks to a public server. You make a request to some IP and port, but the server is seeing your router's IP, not yours (which is a good thing, because yours isn't accessible). When it replies, your router has to know to forward it to you—which it does just by remembering that you're the behind-the-NAT client that just sent a request to that server.
What if, instead of talking to a public server, you talk to some other peer behind his own separate NAT? Well, your router doesn't know the difference; as long as you get a response from the same place, it'll get through. But how do you get a response, when your message isn't going to get through his NAT? He does the same thing, of course. One of the messages may get lost, but the other one will get through, and then you're both set and can communicate. You will need to send keep-alives regularly so the router doesn't forget you were communicating, but other than that, there's really nothing tricky.
The only problem is that you need to know the public IP address for the other peer, and the port that he's expecting you to come from, and he needs to know the same about you. That's why you need a server—to act as an introducer between peers.
Hole punching will work with UDP from most home networks. It won't work with TCP from many home networks, or with either UDP or TCP from many corporate networks. (Also, in corporate networks, you may have multiple layers of NATs, which means you need introducers at every interface rather than just one on the internet, or symmetric NATs, which can't be punched.)
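To make the idea concrete, here's a toy UDP sketch using only the standard library. In reality the peer's public address and port come from the introducer; they are hard-coded placeholders here:

import socket

PEER = ("203.0.113.7", 40000)   # placeholder: learned from the introducer
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 40000))   # the same port the introducer saw us on
sock.settimeout(1.0)

while True:
    sock.sendto(b"punch", PEER)          # outgoing packet opens our NAT mapping
    try:
        data, addr = sock.recvfrom(1024) # arrives once both sides have punched
        print("hole punched:", data, "from", addr)
        break
    except socket.timeout:
        continue                         # retry; this doubles as a keep-alive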
You can use STUN (or similar) services like ICE or TURN. This only works if there is an ICE, TURN, etc. service to use—which is generally not the case for two peers on different home NATs, unless you deploy your own server and build an introducer to help out, and if you're going to do that, you can just use hole punching. But in a corporate environment, this can be the best way to provide connectivity for P2P apps.
Finally, you can make the user configure port forwarding manually and enter the forwarded port number into your program. This is not ideal, but you should always provide it as a fallback (except maybe for apps meant only for corporate deployment), because nothing else is going to work for all of your users.
I believe Skype uses all of these. If you enable UPnP, it tries IGD with your router. Or you can configure it to use a TURN server. Or you can just enter a specific port that you've forwarded manually. If you do none of the above, it tries UDP hole punching, with an introducer that Skype runs.
So your application needs to do TCP/UDP networking, if I understand correctly. That means that at least one of the connecting clients needs a properly open port, and if both of them are behind NAT (a router) and have no configured open ports, your clients cannot connect.
There are several possible solutions for this, but not all are reliable: UPnP, as suggested here, can open ports on demand but isn't supported (or enabled) on all routers (and it is a security threat), and P2P solutions are complex and still require open ports on some clients.
The only reliable solution is to have a dedicated server that all clients can connect to that will negotiate the connections, and possibly proxy between them.
You could look at something like this (assuming your router supports it): http://en.wikipedia.org/wiki/Universal_Plug_and_Play#NAT_traversal
For implementing port forwarding in Python, there's a fantastic ActiveState recipe for an asynchronous port-forwarding server using only the Python standard library (socket, asyncore). Look at
http://code.activestate.com/recipes/483732-asynchronous-port-forwarding/
I'm running Python 2.6.5 on EC2 and I've replaced the old ftplib with the newer one from Python 2.7 that allows importing FTP_TLS. Yet the following hangs on me:
from ftplib import FTP_TLS
ftp = FTP_TLS('host', 'username', 'password')
ftp.retrlines('LIST')  # times out after 15-20 min
I'm able to run these three lines successfully in a matter of seconds on my local machine, but it fails on EC2. Any idea as to why this is?
Thanks.
It certainly sounds like a problem related to whether or not you're in PASSIVE mode on your FTP connection, and whether both ends of the connection can support it.
The ftplib documentation suggests that it is on by default, which is a shame, because I was going to suggest that you turn it on. Instead, I'll suggest that you set_debuglevel so you can see the lower levels of the protocol happening and see what mode you're in. That should give you information on how to proceed. Either you're in passive mode and the other end can't deal with it properly, or (hopefully) you're not, but you should be.
FTP and FTPS (but not SFTP) can be configured so that the server makes a backwards connection to the client for the actual transfers or so that the client makes a second forward connection to the server for the transfers. The former, especially, is prone to complications whenever network address translation is involved. Without the TLS, some firewalls can actually rewrite the FTP session traffic to make it magically work, but with TLS that's impossible due to encryption.
The fact that you are presumably authenticating and then timing out when you try to transfer data (LIST requires a second connection in one direction or the other) is usually the classic symptom of a setup that either needs passive mode, OR, there's this:
Connect as usual to port 21 implicitly securing* the FTP control connection before authenticating. Securing the data connection requires the user to explicitly ask for it by calling the prot_p() method.
ftps.prot_p() # switch to secure data connection
ftps.retrlines('LIST') # list directory content securely
I don't work with FTPS often, since SFTP is so much less problematic, but if you're not doing that, the far end server might not be cooperating.
*note, I suspect this sentence is trying to say that FTP_TLS "implicitly secures the FTP control connection" in contrast with the explicit securing of the data connection.
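Putting the pieces together, a minimal sketch built from the snippets above (host and credentials are the question's own placeholders):

from ftplib import FTP_TLS

ftps = FTP_TLS('host', 'username', 'password')
ftps.set_debuglevel(2)    # dump the protocol chatter to see which mode you're in
ftps.prot_p()             # switch to a secure data connection
ftps.retrlines('LIST')    # list directory content securely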
If you're still having trouble, could you try ruling out Amazon firewall problems? (I'm assuming you're not using a host-based firewall.)
If your EC2 instance is in a VPC then in the AWS Management Console could you:
ensure you have an internet gateway
ensure that the subnet your EC2 instance is in has a default route (0.0.0.0/0) configured pointing at the internet gateway
in the Security Group for both inbound and outbound allow All Traffic from all sources (0.0.0.0/0)
in the Network ACLs for both inbound and outbound allow All Traffic from all sources (0.0.0.0/0)
If your EC2 instance is NOT in a VPC then in the AWS Management Console could you:
in the Security Group for inbound allow All Traffic from all sources (0.0.0.0/0)
Only do this in a test environment! (obviously)
This will open your EC2 instance up to all traffic from the internet. Hopefully you'll find that your FTPS is now working. Then you can gradually reapply the security rules until you find out the cause of the problem. If it's still not working then the AWS firewall is not the cause of the problem (or you have more than one problem).
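If you'd rather script the test-environment opening than click through the console, a boto3 sketch along these lines would do it (the security group ID is a placeholder, and again: test environments only):

import boto3

ec2 = boto3.client('ec2')
ec2.authorize_security_group_ingress(
    GroupId='sg-0123456789abcdef0',  # placeholder: your instance's group
    IpPermissions=[{
        'IpProtocol': '-1',                     # all protocols and ports
        'IpRanges': [{'CidrIp': '0.0.0.0/0'}],  # all sources: TEST ONLY
    }],
)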
My app opens a TCP socket and waits for data from other users on the network using the same application. At the same time, it can broadcast data to a specified host on the network.
Currently, I need to manually enter the IP of the destination host to be able to send data. I want to be able to find a list of all hosts running the application and have the user pick which host to broadcast data to.
Is Bonjour/ZeroConf the right route to go to accomplish this? (I'd like it to be cross-platform: OS X/Win/*nix.)
it can broadcast data to a specified host on the network
This is a non-sequitur.
I'm presuming that you don't actually mean broadcast, you mean Unicast or just "send"?
Is Bonjour/ZeroConf the right route to go to accomplish this?
This really depends on your target environment and what your application is intended to do.
As Ignacio points out, you need to install the Apple software on Windows for Zeroconf/mDNS to work at the moment.
This might be suitable for small office / home use.
However larger networks may have Layer 2 Multicast disabled for a variety of reasons, at which point your app might be in trouble.
If you want it to work in the enterprise environment, then some configuration is required, but that doesn't have to be done at the edge (in the app client instances).
This could be via a DHCP option, or via DNS service records. In these cases you'd possibly be writing a queryable server to track active clients, much like a BitTorrent tracker.
Two things to consider while designing your networked app:
Would there ever be reason to run more than one "installation" of your application on a network?
Always consider the implications of versioning: if one client is more up to date than another, can they still talk to each other, or at least fail gracefully?
Zeroconf/DNS-SD is an excellent idea in this case. It's provided by Bonjour on OS X and Windows (but must be installed separately or as part of an Apple product on Windows), and by Avahi on FOSS *nix.
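For instance, with the python-zeroconf package (one of several bindings you could pick; the service type, name, address, and port below are illustrative), each instance of your app can announce itself so the others can browse for it:

import socket
from zeroconf import Zeroconf, ServiceInfo

info = ServiceInfo(
    "_myapp._tcp.local.",                        # illustrative service type
    "my-host._myapp._tcp.local.",                # this instance's name
    addresses=[socket.inet_aton("192.168.1.5")], # this machine's LAN address
    port=5000,
)
zc = Zeroconf()
zc.register_service(info)  # peers browsing for _myapp._tcp now see this host
try:
    input("Announced; press Enter to exit.")
finally:
    zc.unregister_service(info)
    zc.close()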
I think that ZeroConf is a very good start. You may find this document useful.
As an alternative, I keep a list of clients on a webpage, which is nice if you need communication across the internet:
<dl_service updated="2010-12-03 11:55:40+01:00">
  <client name="internal" ip="10.0.23.234" external_ip="1.1.1.1"/>
  <client name="bigone" ip="2.2.2.2" external_ip="2.2.2.2">
    <message type="connect" from="Bigone" to="internal" />
  </client>
</dl_service>
My initial idea was to add firewall punching and all that, but I just couldn't be bothered; too many of the hosts were using external IPs for it to be a problem.
But I really recommend Zeroconf, at least if you use Linux + Mac OS X; I don't know about Windows at all.