squid external_acl_type concurrent response - python

Good afternoon. I'm asking for advice because I can't get Squid and Python to play nicely together. I'm writing an asynchronous helper for Squid. Squid is configured with:
external_acl_type aclproxy3 ttl=300 children-max=1 concurrency=100 %LOGIN python -u /opt/agent/helper.py proxy3
Squid sends requests to the helper with a channel number prepended: ['0', 'data'], ['1', 'data'].
The docs say:
The helper receives lines expanded per the above format specification and for each input line returns 1 line starting with OK/ERR/BH result code and optionally followed by additional keywords with more details.
But I don't understand how to form the answer. Requests arrive in order 1, 2, 3 but may be completed in order 2, 1, 3, so the answers need to be identified somehow as well. But how?
At this stage I solved the problem the way a gevent example suggested: all requests are queued first, then processed, and the OK/ERR results are returned in the same order the requests arrived; if requests 2 and 3 finish first, they wait for request 1 so everything can be answered in order.
This is a kludge, I understand. So I'm asking for advice in case someone has already dug into this topic. Thanks for any hint.

The answer was found in the documentation:
When using the concurrency= option the protocol is changed by
introducing a query channel tag in front of the request/response.
The query channel tag is a number between 0 and concurrency-1.
This value must be echoed back unchanged to Squid as the first part
of the response relating to its request.
That is, we get on stdin, for example: [0, data], [1, data] ...
and must return the replies tagged the same way, e.g. 1 OK, 0 ERR.
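In other words, the channel tag lets the helper answer out of order. A minimal sketch of such a helper, assuming the "channel data" line format above (threads stand in for gevent here, and the actual ACL check is a stub):

#!/usr/bin/env python
import sys
import threading

write_lock = threading.Lock()

def check(data):
    # ... the real (possibly slow) lookup would go here ...
    return "OK"  # or "ERR" / "BH"

def handle(channel, data):
    result = check(data)
    # Echo the channel tag back unchanged as the first token of the reply,
    # so Squid can match the answer to its request regardless of order.
    with write_lock:
        sys.stdout.write("%s %s\n" % (channel, result))
        sys.stdout.flush()

for line in sys.stdin:
    channel, _, data = line.rstrip("\n").partition(" ")
    # Each line can be answered independently, so hand it to a worker.
    threading.Thread(target=handle, args=(channel, data)).start()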

Related

Python Requests POST to form records incorrect payload (checkboxes)

I am having quite a bit of trouble getting the correct form data saved to a server via POST with the Requests (2.8.1) module.
I have previous code which does exactly what I want it to do: it encodes a bunch of key:value pairs into the correct header:value payload dict format, and successfully POSTS to the URI. I get a 200 response (what I'm looking for) and everything is great.
This is a section of the OLD payload encoding function, with a ton of key:value pairs omitted for brevity.
Note: the checkbox value set could be any sequence of numbers between 1 and 25, I just wrote it as
item in range(1,5)
to illustrate that the list is made up of ints, i.e. ["", 1, 2, 3, 4, 5, ...] or ["", 2, 7, 5, 1, 25, ...] etc.
checkboxList = [""]
for item in range(1, 5):
    checkboxList.append(item)
payload['checkbox[ids][]'] = checkboxList
...
response = requests.post(data_url, data=payload)
>> 200 OK!
Here is a print of what the payload dict (checkboxes) looks like before it's sent to the server:
{... "checkbox[ids][]" : [ "", 2, 17, 20, 5], ...}
And when I look on the page with a browser, all the payload information has been correctly recorded (omitted above) AND the checkboxes (shown above) are correct!
Originally, the checkbox values came from an excel file, as did the rest of the information that was put into the payload before being POSTed to the server. However, now I'm retrieving the information from an SQLite db.
Below is the NEW code that records the checkboxes incorrectly. I should note: I do not have access to the server, so I cannot easily tell if it's a server issue, but let's assume it's not the server's fault. I've had this issue previously, but I got it to work with the above code. However, now that I've started to store the values I need in a db, I cannot get the correct checkboxes recorded by the server.
This is what the data from the db column looks like:
12-5-1-22-4
(... I know this isn't great practice for DB mgmt, but I assume this isn't why the POST is recording the wrong data, and I wanted this question to be as closely representational to my code as possible.)
checkList = checkboxesFromDB.split('-')
payload['checkbox[ids][]'] = checkList
...
response = requests.post(data_url, data=payload)
>> 200 OK!
When I look at the site with the browser, it records the checkboxes incorrectly. I should note that 3 checkboxes are selected no matter what I pass to payload['checkbox[ids][]'].
It's ALWAYS the same 3, incorrect checkboxes, even if I completely omit checkbox[ids][] from the payload dict. Knowing that, we could assume it's a server issue. However, the nearly EXACT code from above works (when I grab the info from an excel file).
I've tried the following (with only one value as a test) without getting the correct checkboxes recorded by the server:
payload['checkbox[ids][]'] = '1'
payload['checkbox[ids][]'] = 1
payload['checkbox[ids][]'] = [1]
payload['checkbox[ids][]'] = ["",1]
payload['checkbox[ids][]'] = [1,""]
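For what it's worth, here is a way to see exactly what each of those variants puts on the wire, without sending anything (a local sketch; the URL is a placeholder):

import requests

payload = {'checkbox[ids][]': ['12', '5', '1', '22', '4']}
# Prepare (but do not send) the request to inspect the encoded body.
req = requests.Request('POST', 'http://example.com/form', data=payload).prepare()
print(req.body)
# A list value becomes one repeated form field per item:
# checkbox%5Bids%5D%5B%5D=12&checkbox%5Bids%5D%5B%5D=5&...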
When uploading images to the same server, I had an encoding issue when retrieving the image BLOB from the db and trying to pass the buffer object directly to Requests as a file, but I fixed this with cStringIO encoding. (It took me forever as I'm really new to programming, and still unsure of syntax, let alone ways to handle this sort of stuff....) I thought I might be having a similar encoding issue, but with the testing and research I've done, I cannot determine either way as I feel like I'm a bit over my head.
I apologize if this is completely NOOB, but I've done extensive research, trying so many different things that I could think of. I tried passing strings, lists, dicts, forcing encoding of lists as utf-8.
The main reason I'm so perplexed is my original code WORKS, and my new code is nearly identical but doesn't. The only real difference I can think of is now my information is coming from a SQLite db (this particular checkbox column is TEXT type)
Can anyone help me, or point me in a new direction I haven't thought of/know of?
I went through all payload pairs to find that it was an issue with HTML.
I was saving HTML in my SQLite db (via BeautifulSoup without prettifying it) as TEXT. Then I was retrieving it and sending it as a string. This was throwing off the server response.
I have since swapped that SQL column value type to VARCHAR (as is best for my use) and prettify it like this: foo = bar.prettify(formatter="html") before saving to the db. Now, when I retrieve the value and pass it to the payload, everything works as it should.
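A rough sketch of that flow (the schema, table name, and sample HTML are made up for illustration):

import sqlite3
from bs4 import BeautifulSoup

raw_html = "<p>example & text</p>"
# Normalize the markup (entities included) before it ever hits the db.
clean_html = BeautifulSoup(raw_html, "html.parser").prettify(formatter="html")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body VARCHAR)")
conn.execute("INSERT INTO records (body) VALUES (?)", (clean_html,))
conn.commit()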

grabbing HTTP GET parameter from url using Box API in python

I am dealing with the Box.com API using python and am having some trouble automating a step in the authentication process.
I am able to supply my API key and client secret key to Box. Once Box.com accepts my login credentials, they supply me with an HTTP GET parameter like
'http://www.myapp.com/finish_box?code=my_code&'
I want to be able to read and store my_code using python. Any ideas? I am new to python and dealing with APIs.
This is actually a more robust question than it seems, as it exposes some useful functions with web dev in general. You're basically asking how to separate my_code in the string 'http://www.myapp.com/finish_box?code=my_code&'.
Well let's take it in bits and pieces. First of all, you know that you only really need the stuff after the question mark, right? I mean, you don't need to know what website you got it from (though that would be good to save, let's keep that in case we need it later), you just need to know what arguments are being passed back. Let's start with str.split():
>>> return_string = 'http://www.myapp.com/finish_box?code=my_code&'
>>> step1 = return_string.split('?')
["http://www.myapp.com/finish_box","code=my_code&"]
This will return a list to step1 containing two elements, "http://www.myapp.com/finish_box" and "code=my_code&". Well hell, we're there! Let's split the second one again on the equals sign!
>>> step2 = step1[1].split("=")
["code","my_code&"]
Well lookie there, we're almost done! However, this doesn't really allow any more robust uses of it. What if instead we're given:
>>> return_string = r'http://www.myapp.com/finish_box?code=my_code&junk_data=ohyestheresverymuch&my_birthday=nottoday&stackoverflow=usefulplaceforinfo'
Suddenly our plan doesn't work. Let's instead break that second set on the & sign, since that's what's separating the key:value pairs.
step2 = step1[1].split("&")
["code=my_code",
"junk_data=ohyestheresverymuch",
"my_birthday=nottoday",
"stackoverflow=usefulplaceforinfo"]
Now we're getting somewhere. Let's save those as a dict, shall we?
>>> list_those_args = {}
>>> for each_item in step2:
...     key, value = each_item.split("=")
...     list_those_args[key] = value
Now we've got a dictionary in list_those_args that contains key and value for every argument the GET passed back to you! Science!
So how do you access it now?
>>> list_those_args['code']
my_code
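That said, the standard library will do this parsing for you, handling quoting and repeated keys; on Python 3 (Python 2 has the same functions in the urlparse module):

>>> from urllib.parse import urlparse, parse_qs
>>> return_string = 'http://www.myapp.com/finish_box?code=my_code&'
>>> parse_qs(urlparse(return_string).query)['code'][0]
'my_code'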
You need a webserver and a CGI script to do this. I have set up a single Python script solution to run this. You can see my code at:
https://github.com/jkitchin/box-course/blob/master/box_course/cgi-bin/box-course-authenticate
When you access the script, it redirects you to box for authentication. After authentication, if "code" is in the incoming request, the code is grabbed and redirected to the site where tokens are granted.
You have to set up a .htaccess file to store your secret key and id.

how to find the last available url which does not return 302 (Redirect) status code in a url list quickly

Now I am facing a problem like this:
Say I have a list of urls, e.g.
['http://example.com/1',
'http://example.com/2',
'http://example.com/3',
'http://example.com/4',
...,
'http://example.com/100']
And some of them are unavailable urls, requesting for those urls will result in 302 redirect status code. e.g. .../1 - .../50 are available urls, but .../51 will cause 302. Then .../50 is the url I want.
I want to find out which url is the last available url (which does not return a 302 code). I believe binary search will do the work, but I wonder how to implement it with better efficiency. I use python's urllib2 to detect the 302 status code.
This answer makes the assumption that your URLs are currently ordered in a meaningful way, and that all URLs up to some value n will be available and all URLs after n will result in a 302.
If this is the case, then you can adapt this binary search answer to fit your needs:
import requests

def binary_search_urls(urls, lo=0, hi=None):
    if hi is None:
        hi = len(urls)
    while lo < hi:
        mid = (lo + hi) // 2
        # A HEAD request is enough to read the status code without the body.
        status = requests.head(urls[mid]).status_code
        if status != 302:
            lo = mid + 1   # mid is good, so the boundary is to the right
        else:
            hi = mid       # mid redirects, so the boundary is at or before mid
    return lo - 1
This will give you the index of the last good URL, or -1 if there are no good URLs.
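For example, with the numbered URLs from the question:

urls = ['http://example.com/%d' % i for i in range(1, 101)]
i = binary_search_urls(urls)
if i >= 0:
    print(urls[i])  # the last URL that did not answer 302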
I would just check the entire lot; however, I would use requests instead of urllib2 and make sure to only make HEAD requests to keep bandwidth down (which is possibly going to be your greatest bottleneck anyway).
import requests
urls = [...]
results = [(url, requests.head(url).status_code) for url in urls]
Then go from there...
I don't see how a binary search could be at all faster than a straight in-order iteration, and in most cases it would end up being slower. Given n is the length of the list, if you are searching for the last good URL of the first good batch, only the case where urls[n/2 - 1] is your target would take the same number of requests as brute-force iteration; all others would take more. If you are looking for the last good URL in the entire list, the only target that would take the same number of requests as a reversed-order iteration would be urls[n/2 - 1]. Binary search is only faster when your dataset is ordered. For an unordered dataset, sampling the middle of the set tells you nothing about being able to exclude values to either side of it, so you still have to process the entire sequence before you can tell anything.
I suspect what you may really be wanting here is a way to sample your dataset at intervals so that you can run fewer requests before finding your target, which isn't quite the same as a binary search. Binary search relies on the fact that sampling a point in your sequence provides information on being able to exclude either one side or the other of the sequence from subsequent searches based upon a binary condition. What you have is a system where if a sample fails the test, you can exclude one side, but if it passes the test, it tells you nothing about what you can assume about any other values in the list. That doesn't really work for a binary search.
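A sketch of that reversed iteration, which makes no ordering assumption at all (urls is the question's list):

import requests

def last_good_url_index(urls):
    # Scan from the end: the first non-302 we meet is the last good URL
    # overall, no matter how good and bad batches are interleaved.
    for i in range(len(urls) - 1, -1, -1):
        if requests.head(urls[i]).status_code != 302:
            return i
    return -1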

figuring out the possible attributes of an object

regarding this code from python-blogger
def listposts(service, blogid):
    feed = service.Get('/feeds/' + blogid + '/posts/default')
    for post in feed.entry:
        print post.GetEditLink().href.split('/')[-1], post.title.text, "[DRAFT]" if is_draft(post) else ""
I want to know what fields exist in feed.entry but I'm not sure where to look in these docs to find out.
So I don't just want an answer; I want to know how I should've navigated the docs to find out for myself.
Try dir(feed.entry)
It may be useful for your case.
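For example, filtering out the underscore noise (output abbreviated; the exact attribute list depends on the feed class, but GetEditLink and title appear in your own code):

>>> post = feed.entry[0]
>>> [name for name in dir(post) if not name.startswith('_')]
['GetEditLink', 'title', ...]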
It's a case of working through it, step by step.
The first thing I did was click on service in the link you sent, based on feed = service.Get(...)
Which leads here: http://gdata-python-client.googlecode.com/hg/pydocs/gdata.service.html
Then looking at .Get(), it states:
Returns:
If there is no ResultsTransformer specified in the call, a GDataFeed
or GDataEntry depending on which is sent from the server. If the
response is neither a feed nor an entry and there is no ResultsTransformer,
return a string. If there is a ResultsTransformer, the returned value
will be that of the ResultsTransformer function.
So guessing you've got a GDataFeed, as you're iterating over it, and a quick google for "google GDataFeed" leads to: https://developers.google.com/gdata/jsdoc/1.10/google/gdata/Feed

MX Record lookup and check

I need to create a tool that will check a domain's live MX records against what should be expected (we have had issues with some of our staff fiddling with them and causing all incoming mail to be redirected into the void).
Now I won't lie, I'm not a competent programmer in the slightest! I'm about 40 pages into "dive into python" and can read and understand the most basic code. But I'm willing to learn rather than just being given an answer.
So would anyone be able to suggest which language I should be using?
I was thinking of using Python and starting with something along the lines of using os.system() to run dig +nocmd domain.com mx +noall +answer to pull up the records, but then I get a bit confused about how to compare this to an existing set of records.
Sorry if that all sounds like nonsense!
Thanks
Chris
With the dnspython module (not built-in; you must pip install it):
>>> import dns.resolver
>>> domain = 'hotmail.com'
>>> for x in dns.resolver.resolve(domain, 'MX'):
... print(x.to_text())
...
5 mx3.hotmail.com.
5 mx4.hotmail.com.
5 mx1.hotmail.com.
5 mx2.hotmail.com.
Take a look at dnspython, a module that should do the lookups for you just fine without needing to resort to system calls.
The above solutions are correct. Some things I would like to add and update:
dnspython has been updated to work with Python 3 and has superseded the dnspython3 library, so using dnspython is recommended.
The resolver strictly takes in the domain and nothing else.
For example: dnspython.org is a valid domain, not www.dnspython.org.
Here's a function if you want to get the mail servers for a domain.
from dns import resolver

def get_mx_server(domain: str = "dnspython.org") -> str:
    mail_servers = resolver.resolve(domain, 'MX')
    mail_servers = list(set(data.exchange.to_text() for data in mail_servers))
    return ",".join(mail_servers)
