Is requests response likely to be corrupted?

I'm using Python requests to play with a REST API. The response format is JSON, and let's assume the server always sends correct data. Given that HTTP uses TCP for transmission, do I still have to check for the existence of a required key if no exception is thrown by requests?

For TCP transmissions, you don't need to verify the response if you assume that the server always sends correct data:
TCP provides reliable, ordered, and error-checked delivery of a stream of octets between applications running on hosts communicating by an IP network.
Source: Wikipedia
Of course, it's always a good idea to add some error handling and verification to your code just in case the server doesn't send what you'd expect.
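As a minimal sketch of that kind of verification, the helper below parses a JSON body and checks that the required keys are present. The function name parse_body and the key names in the usage note are illustrative assumptions, not part of requests.

```python
import json


def parse_body(text, required):
    """Parse a JSON response body and verify that the required keys exist.

    Raises ValueError if the body is not valid JSON, is not a JSON object,
    or lacks any of the required keys.
    """
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError("response is missing keys: %s" % ", ".join(missing))
    return data
```

With requests this could be called as parse_body(requests.get(url, timeout=10).text, ["id", "name"]), where url and the key names are placeholders. TCP protects the bytes in transit, while this check guards against a server that sends well-formed but unexpected data.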

Related

Creating a Charles proxy alternative using Python

I am using Charles proxy right now to monitor traffic between my devices and a website. The traffic is SSL and I am able to read it in Charles. The issue is that Charles makes the content hard to read when I am filtering through hundreds of variables in a JSON object. I created a program that filters the JSON after exporting the Charles log. My next step is to get rid of Charles completely and create my own proxy in Python that can view HTTP and HTTPS data. I was wondering whether scapy or any other existing library would work? I am interested in scapy because I can save the proxy log as a pcap file.

Reading through the mitmproxy source would be overwhelming, since it's a huge code base. If you would like to implement the proxy server from scratch, here is what I learned while developing Proxyman:
Learn how to set up a tiny proxy server: open a listening socket on your port (9090, for example), accept incoming connections, and read the first line of the HTTP message. This can be done with a lightweight http-parser or any Python parser. The raw HTTP message looks like:
CONNECT google.com:443 HTTP/1.1
Parse the line to get the host name (google.com) and its IP address: open a socket connection to the destination IP and relay data back and forth between the client and the destination server.
The first step is essential for implementing the HTTP proxy. Then use http-parser to parse the rest of the HTTP message, so that you can get the headers and body of each request and response and present them in your UI.
Learn how HTTPS and SSL work: use OpenSSL to generate a self-signed certificate, and learn how to generate the certificate chain too.
Learn how to import those certificates into the macOS keychain by using the security CLI or Apple's Security framework.
When that is done, it's time to start the HTTPS interception: repeat the second step, but perform the SSL handshake with the appropriate certificate on both sides (client -> your proxy server, and your proxy server -> destination).
Parse the HTTP message as usual and get the rest of the message.
Overall, there are a lot of open-source projects out there, but I suggest starting from a simple version before moving on.
Hope that helps.
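The "read the first line and find the destination" step can be sketched as follows. This is only an illustration, assuming the client sends a CONNECT request whose target is in host:port form; the function name parse_connect_target is made up for this example.

```python
def parse_connect_target(request_line):
    """Extract (host, port) from the first line of a CONNECT request.

    Assumes a target of the form host:port, for example
    "CONNECT example.com:443 HTTP/1.1".
    """
    parts = request_line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError("malformed request line: %r" % request_line)
    method, target, _version = parts
    if method.upper() != "CONNECT":
        raise ValueError("not a CONNECT request: %r" % method)
    host, sep, port = target.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % target)
    # IPv6 literals arrive in brackets, e.g. [::1]:443
    return host.strip("[]"), int(port)
```

The returned pair can be passed to socket.create_connection((host, port)) to open the upstream side of the tunnel before relaying bytes in both directions.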

HTTP Client: What should it do when multiple A records for a domain?

I’m writing an HTTP client, and wondering how it should behave when there are multiple A records for a domain.
For example, say a second concurrent connection request to a given domain is opened. Should it use another IP address? How about on connection error, should it try another IP automatically? Should it “remember” an IP failed for later requests, and try others first?

Python - How to detect whether coming connections using proxy or not

I am working on a simple program written in Python which sniffs coming network packets. It then lets the user enable add-on modules such as DoS detection or ping prevention. With the help of the sniffer, I can get an incoming connection's IP address, MAC address, protocol flags, and packet content. Now I want to add a new module that detects whether the sender is using a proxy and acts accordingly. I searched for methods that can be used with Python but could not find a useful one. What ways are there to detect a proxy in Python?
The sniffer part of my code looks something like this:
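import binascii
import socket
import struct
.....
# Raw packet socket (Linux only); 8 equals socket.htons(0x0800), i.e. IPv4, on little-endian hosts
sock = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, 8)
while True:
    packet = sock.recvfrom(2048)
    # Bytes 14-33: IPv4 header after the 14-byte Ethernet header (assumes no IP options)
    ipheader = packet[0][14:34]
    ip_hdr = struct.unpack("!8sB3s4s4s", ipheader)
    sourceIP = socket.inet_ntoa(ip_hdr[3])
    # Bytes 34-53: fixed 20-byte part of the TCP header
    tcpheader = packet[0][34:54]
    tcp_hdr = struct.unpack("!HH9ss6s", tcpheader)
    # Byte 13 of the TCP header carries the flag bits
    protoFlag = binascii.hexlify(tcp_hdr[3])
......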

Firstly, you mean incoming packets.
Secondly, from the server TCP's point of view, it is connected to the proxy, not the downstream client, so your server can't tell from the packet that a proxy is involved.
However, at the application level, for example with an HTTP proxy, an X-Forwarded-For header may be available that contains the original client IP. I say "may" because the proxy server decides whether or not to send this header to you. If you are expecting incoming HTTP connections to your server, you can take a look at Python's urllib2, although I'm not sure whether you can access X-Forwarded-For using this library.
From the docs:
urllib2.urlopen(url[, data][, timeout])
...
This function returns a file-like object with two additional methods:
geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed
info() — return the meta-information of the page, such as headers, in the form of an mimetools.Message instance (see Quick Reference to HTTP Headers)
So using info() will retrieve the headers. Hope you find what you're looking for in there.
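For a sniffer that already has the raw bytes of an unencrypted HTTP request, looking for that header could be sketched like this. It is a rough illustration that assumes the whole header block sits in one payload; the function name forwarded_for is made up for this example, and a proxy is free to omit or forge the header.

```python
from typing import List


def forwarded_for(payload: bytes) -> List[str]:
    """Return the addresses listed in X-Forwarded-For, or [] if absent.

    `payload` is expected to hold the raw bytes of an HTTP request,
    starting at the request line.
    """
    head, _, _ = payload.partition(b"\r\n\r\n")  # drop the body, if any
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"x-forwarded-for":
            items = value.decode("latin-1").split(",")
            return [item.strip() for item in items if item.strip()]
    return []
```

A non-empty result suggests that at least one forwarding proxy handled the request. An empty result proves nothing, since anonymous proxies and VPNs do not add the header.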

There aren't many ways to do this, as proxies / VPNs look like real traffic. To add to what Mid said, you can look for headers and/or user agents to help you determine if the user is using a proxy or a VPN.
The only free solution I know of is getIPIntel, which uses block lists, machine learning, and statistics to determine whether an IP is a proxy / VPN or not.
There are other paid solutions like MaxMind and blocked.
What you'll need to do is send API queries to these services and parse the results.

How do I safely get the user's ip address in Flask that has a proxy?

I am using Flask and need to get the user's IP address. This is usually done through request.remote_addr, but since this app is hosted by a third party (and uses Cloudflare) it just returns localhost.
Flask suggests getting the X-Forwarded-Host but then they immediately say it is a security risk. Is there a safe way to get the client's real ip?

The Problem
The issue here is not that the ProxyFix itself will cause the user to get access to your system, but rather the fact that the ProxyFix will take what was once mostly reliable information and replace it instead with potentially unreliable information.
For starters, when you don't use ProxyFix, the REMOTE_ADDR attribute is most likely retrieved from the source IP address in the TCP packets. While not impossible, the source IP address in TCP packets is tough to spoof. Therefore, if you need a reliable way to retrieve the user's IP address, REMOTE_ADDR is a good way to do it; in most cases, you can rely on request.remote_addr to give you something accurate.
The problem is, of course, in a reverse-proxy situation the TCP connection is not coming from the end user; instead, the end user makes a TCP connection with the reverse proxy, and the reverse proxy then makes a second TCP connection with your web app. Therefore, the request.remote_addr in your app will have the IP address of the reverse proxy rather than the original user.
A Potential Solution
ProxyFix is supposed to solve this problem so that request.remote_addr holds the user's IP address rather than the proxy's. It does this by looking at the typical HTTP header that reverse proxies (like Apache and Nginx) add to the request (X-Forwarded-For) and using the user's IP address it finds there. Note that Cloudflare uses a different HTTP header, so ProxyFix probably won't help you; you'll need to write your own implementation of this middleware to get request.remote_addr to use the original client's IP address. However, in the rest of this answer I will continue to refer to that fix as "ProxyFix".
This solution, however, is problematic. The problem is that while the TCP header is mostly reliable, the HTTP headers are not; if a user can bypass your reverse proxy and send data right to the server, they can put whatever they want in the HTTP header. For example, they can make the IP address in the HTTP header the IP address of someone else! If you use the IP address for authentication, the user can spoof that authentication mechanism. If you store the IP address in your database and then display it in your application to another user in HTML, the user could inject SQL or Javascript into the header, potentially causing SQL injection or XSS vulnerabilities.
So, to summarize: ProxyFix takes a known, mostly safe way of retrieving the user's IP address from a TCP packet and replaces it with the not-very-safe-by-itself approach of parsing an easily spoofed HTTP header.
Therefore, the recommendation to use ProxyFix ONLY in reverse proxy situations means just that: don't use it if you accept connections from places that are NOT the proxy. This often means having the reverse proxy (like Nginx or Apache) handle all your incoming traffic and keeping the application that actually uses ProxyFix safe behind a firewall.
You should also read this post, which explains how ProxyFix was broken in the past (although it is now fixed). It also explains how ProxyFix works and gives you ideas on how to set your num_proxies argument.
A Better Solution
Let's say your user is at point A, they send the request to Cloudflare (B) which eventually sends the request to your final application (point C). Cloudflare will send the IP address of A in the CF-Connecting-IP header.
As explained above, if the user finds the IP address to point C, they could send a specially crafted HTTP request directly to point C which includes any header info they want. ProxyFix will use its logic to determine what the IP address is from the HTTP header, which of course is problematic if you rely on that value for, well, mostly anything.
Therefore, you might want to look at using something like mod_cloudflare, which allows you to do these proxy fixes directly in the Apache mod, but only when the HTTP connection comes from Cloudflare IP addresses (as defined by the TCP IP source). You can also have it only accept connections from Cloudflare. See How do I restore original visitor IP to my server logs for more info on this and help doing this with other servers (like Nginx).
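If you have to do the equivalent in Python, the same idea (trust the header only when the TCP peer is a known proxy address) can be sketched as below. The network in TRUSTED_PROXY_NETS is a placeholder from the documentation address ranges, not a real Cloudflare range, and the function name client_ip is made up for this example; in practice you would load the ranges your proxy provider publishes.

```python
import ipaddress

# Placeholder only: replace with the address ranges published by your proxy provider.
TRUSTED_PROXY_NETS = [ipaddress.ip_network("203.0.113.0/24")]


def client_ip(remote_addr, headers, trusted=TRUSTED_PROXY_NETS,
              header_name="CF-Connecting-IP"):
    """Return the client IP, trusting the header only for trusted peers.

    `remote_addr` is the TCP peer address (e.g. request.remote_addr) and
    `headers` is any mapping with a .get() method.
    """
    peer = ipaddress.ip_address(remote_addr)
    if not any(peer in net for net in trusted):
        # Connection did not come through the proxy: ignore the header.
        return remote_addr
    candidate = headers.get(header_name, "").strip()
    try:
        # Accept only a syntactically valid IP address, which also keeps
        # markup or SQL fragments out of anything stored or displayed later.
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return remote_addr
```

In a Flask view this might be called as client_ip(request.remote_addr, request.headers). A plain dict works too, but unlike Flask's headers object its lookups are case-sensitive.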
This should give you a start. However, keep in mind that you're still not "safe": you've only shut down one possible attack vector, and that attack vector assumed that the attacker knew the IP address of your actual application. In that case, the malicious user could try to do a TCP attack with a spoofed Cloudflare IP address, although this would be extremely difficult. More likely, if they wanted to cause havoc, they would just DDoS your source server since they've bypassed Cloudflare. So, there are plenty more things to think about in securing your application. Hopefully this helps you understand how to make one part slightly safer.

Associating HTTP Requests with Responses in large packet capture

I am attempting to work with large packet captures from Wireshark that have been output in PDML format. These captures are then loaded into Python using the lxml library to traverse them. The issue I am having is that I can pull out information regarding a single HTTP response packet, and then I need a way to associate it with its HTTP request packet.
The current solution I was thinking of implementing is to search for an HTTP request packet that is part of the same TCP stream as the response. However, this seems like an inefficient solution to the problem, since it means continually separating out TCP streams and then searching through them for the request packet.
Is there a simple way to associate response packets with requests that I am missing?

Best solution I have come up with thus far is to use xpath under the assumption that each TCP connection only contains one request/response pair.
def find_request_packet(packet):
    # Get the stream index from the (response) packet
    streamIndex = packet.xpath('proto/field[@name="tcp.stream"]')[0].attrib['show']
    # Use that stream index to get the matching request packet in the same stream
    return packet.xpath('/pdml/packet[proto/field[@name="tcp.stream" and @show="' + streamIndex + '"] and proto/field[@name="http.request.full_uri"]]')[0]
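To avoid re-running an absolute XPath query for every response, one option is to index the request packets by stream in a single pass. The sketch below is an assumption-laden illustration: it uses the standard library's xml.etree.ElementTree instead of lxml so that it is self-contained (the same find() expressions should also work on lxml elements), the helper names are made up, and SAMPLE is a hand-written stand-in for real PDML.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def field_value(packet, name):
    """Return the 'show' attribute of the named field, or None if absent."""
    field = packet.find("proto/field[@name='%s']" % name)
    return None if field is None else field.get("show")


def index_requests_by_stream(root):
    """Map each tcp.stream value to the request packets seen in that stream."""
    index = defaultdict(list)
    for packet in root.iter("packet"):
        stream = field_value(packet, "tcp.stream")
        if stream is None:
            continue
        if field_value(packet, "http.request.full_uri") is not None:
            index[stream].append(packet)
    return index


# Hand-written, heavily simplified stand-in for a PDML export.
SAMPLE = """
<pdml>
  <packet>
    <proto name="tcp"><field name="tcp.stream" show="0"/></proto>
    <proto name="http"><field name="http.request.full_uri" show="http://example.com/a"/></proto>
  </packet>
  <packet>
    <proto name="tcp"><field name="tcp.stream" show="0"/></proto>
    <proto name="http"><field name="http.response.code" show="200"/></proto>
  </packet>
</pdml>
"""

requests_by_stream = index_requests_by_stream(ET.fromstring(SAMPLE))
```

Looking up the request for a response then becomes a dictionary access on the response's tcp.stream value. Streams that carry several request/response pairs keep their requests in capture order, so they can be matched by position.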
