Downloading multiple files with one request? (Python)

import os
import requests

for v, i in enumerate(assets_files):
    a = requests.get(domain + i).content
    split_filename = i.split('/')
    path = os.path.join(all_folder[4], split_filename[-1])
    with open(path, 'wb') as w:
        w.write(a)
    print('Downloaded: ', split_filename[-1], ' number: ', v)
I don't want my sys admin banning me for multiple connections. Is there a pythonic option to just download a list of files with one request? I would appreciate it.

requests has a Session object for this, as explained here.
Using the global requests.get will not reuse the connection, but session.get probably will.
I say probably because there is a limited connection pool used under the hood.
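Here is a minimal sketch of that approach, assuming the same domain, assets_files and all_folder variables from the question; a single Session reuses the underlying TCP connection via keep-alive, so the server sees one connection rather than one per file:

import os
import requests

with requests.Session() as session:
    for v, i in enumerate(assets_files):
        content = session.get(domain + i).content  # reuses the pooled connection
        path = os.path.join(all_folder[4], i.split('/')[-1])
        with open(path, 'wb') as w:
            w.write(content)
        print('Downloaded: ', path, ' number: ', v)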

Related

Twilio - making call, giving instructions, making call again, giving different instructions

New to coding here. I am trying to make an application that calls a number and gives a set of instructions; this part is easy. After the call hangs up I would like to call again and give a different set of instructions. While testing whether it's possible, I am only calling myself and playing DTMF tones so I can hear that it is functioning as I need. I am trying to pass the instructions to TwiML as a variable so I don't have to write multiple functions that perform similar instructions. However, XML doesn't take variables like that. I know the code I have included is completely wrong, but is there a way to achieve what I'm trying to do?
def dial_numbers(code):
    client.calls.create(to=numberToCall, from_=TWILIO_PHONE_NUMBER,
                        twiml='<Response> <Play digits=code></Play> </Response>')

if __name__ == "__main__":
    dial_numbers("1234")
    dial_numbers("2222")
As I understand the question: you need to define a function that sends Twilio instructions to the call?
In order to play digit tones, you need to import Play and VoiceResponse from twilio.twiml.voice_response and build the TwiML (XML) command with them.
EASY WAY: then URL-encode that TwiML into a request to the Twimlets Echo XML service and pass the resulting link as the url of the call function.
HARD WAY: the alternative is to use a Flask or FastAPI framework as a web server and create a public link via a tunnelling/DDNS service like ngrok; if you are interested, there is an official manual.
Try this one:
def dial_numbers(number_to_call, number_from, digit_code):
    from twilio.twiml.voice_response import Play, VoiceResponse  # Import response module
    import urllib.parse  # Import urllib to build the URL for the new XML

    response = VoiceResponse()  # Create VoiceResponse instance
    response.play('', digits=digit_code)  # Create XML string of the digit code
    url_of_xml = "http://twimlets.com/echo?Twiml="  # Use the Twimlets echo service to serve simple XML
    string_to_add = urllib.parse.quote(str(response))  # URL-encode the XML
    url_of_xml = url_of_xml + string_to_add  # Append our XML to the service URL
    client.calls.create(to=number_to_call, from_=number_from, url=url_of_xml)  # Make the call

dial_numbers(number_to_call=numberToCall, number_from=TWILIO_PHONE_NUMBER, digit_code="1234")

Creating state in a Flask web application

I'm building a web application with the Flask framework in Python. On the server I would like to preserve some state. I think the following code example makes my goal clear (and was also my initial idea):
name = ""
#app.route('/<input_name>')
def home(input_name):
global name
name = input_name
return "name set"
#app.route('/getname')
def getname():
global name
return name
However, when I deployed my website, the response to a /getname request behaves inconsistently, because there are multiple thread/process instances of the code (I could be wrong). I have some plausible solutions to overcome this problem, but I wonder if there would be a cleaner solution:
Solution 1: read and write the name from a database (a database seems like overkill if I only want to store 1 variable)
Solution 2: store the value of name in a file and set up a locking mechanism so that only one process/thread can write to the file at the same moment.
Goal: when client 'A' requests www.website.com/sven and after that client 'B' requests www.website.com/getname I want the response for client B to be 'sven'
Any suggestions?
Your example should not be done with global state; it will not work for the reason you mentioned - requests might land in different processes that have different global values.
You can store the shared value in a key-value cache such as memcached or Redis, or in a file-system based cache - check the Flask-Caching package, and in particular the docs on built-in cache backends: https://flask-caching.readthedocs.io/en/latest/#built-in-cache-backends
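For example, a minimal sketch using Flask-Caching with a Redis backend (the backend choice and the cache key 'name' are assumptions, not part of the original question):

from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
cache = Cache(app, config={'CACHE_TYPE': 'RedisCache',
                           'CACHE_REDIS_URL': 'redis://localhost:6379/0'})

@app.route('/<input_name>')
def home(input_name):
    cache.set('name', input_name)  # visible to every worker process
    return "name set"

@app.route('/getname')
def getname():
    return cache.get('name') or ""

Because the value lives in Redis rather than in a single Python process, every worker sees the same name regardless of which one handled the earlier request.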

Boto3 S3 resource gets stuck on "Object.get" method

I am trying to get a pickle file from an S3 resource using the "Object.get()" method of the boto3 library from several processes simultaneously. This causes my program to get stuck in one of the processes (no exception is raised, and the program does not continue to the next line).
I tried adding a "Config" variable to the S3 connection. That didn't help.
import os
import pickle
import boto3
from botocore.client import Config

s3_item = _get_s3_name(descriptor_key)  # Returns a path string of the desired file
config = Config(connect_timeout=5, retries={'max_attempts': 0})
s3 = boto3.resource('s3', config=config)
bucket_uri = os.environ.get(*ct.S3_MICRO_SERVICE_BUCKET_URI)  # Returns a string of the bucket URI

estimator_factory_logger.debug(f"Calling s3 with item {s3_item} from URI {bucket_uri}")
model_file_from_s3 = s3.Bucket(bucket_uri).Object(s3_item)

estimator_factory_logger.debug("Loading bytes...")
model_content = model_file_from_s3.get()['Body'].read()  # <- Program gets stuck here

estimator_factory_logger.debug("Loading from pickle...")
est = pickle.loads(model_content)
No error message is raised. It seems that the "get" method is stuck in a deadlock.
Your help will be much appreciated.
Is there a possibility that one of the files in the bucket is just huge and the program takes a long time to read it?
If that's the case, as a debugging step I'd look into the model_file_from_s3.get()['Body'] object, which is a botocore.response.StreamingBody object, and use set_socket_timeout() on it to try and force a timeout.
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html
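A minimal debugging sketch, assuming the same model_file_from_s3 object from the question (the 30-second timeout value is an arbitrary choice):

body = model_file_from_s3.get()['Body']  # botocore.response.StreamingBody
body.set_socket_timeout(30)              # raise instead of hanging if the read stalls
model_content = body.read()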
The problem was that we created a subprocess after our main process had opened several threads. Apparently, this is a big no-no on Linux.
We fixed it by using "spawn" instead of "fork".
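A minimal sketch of that fix (download_model is a hypothetical worker function standing in for the S3-loading code above):

import multiprocessing as mp

if __name__ == "__main__":
    # Start children with "spawn" so they do not inherit the parent's
    # thread and lock state, which a plain fork() would copy.
    mp.set_start_method("spawn")
    p = mp.Process(target=download_model)
    p.start()
    p.join()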

Global variable in Python server

Background: I am a complete beginner when it comes to servers, but I know my way around programming in Python.
I am trying to set up a simple server using the basic Python 2.7 modules (SimpleHTTPServer, CGIHTTPServer, etc.). This server needs to load a global, read-only variable with several GB of data from a file when it starts; then, when each user accesses the page, the server uses the big data to generate some output which is then given to the user.
For the sake of example, let's suppose I have a 4 GB file names.txt which contains all possible proper nouns of English:
Jack
John
Allison
Richard
...
Let's suppose that my goal is to read the whole list of names into memory, and then choose 1 name at random from this big list of proper nouns. I am currently able to use Python's native CGIHTTPServer module to accomplish this. To start, I just run the CGIHTTPServer module directly, by executing from a terminal:
python -m CGIHTTPServer
Then, someone accesses www.example-server.net:8000/foo.py and they are given one of these names at random. I have the following code in foo.py:
#!/usr/bin/env python
import random

name_list = list()
FILE = open('names.txt', 'r')
for line in FILE:
    name = line[:-1]
    name_list.append(name)
FILE.close()

name_to_return = random.choice(name_list)

print "Content-type: text/html"
print
print "<title>Here is your name</title>"
print "<p>" + name_to_return + "</p>"
This does what I want; however, it is extremely inefficient, because every access forces the server to re-read a 4 GB file.
How can I make this into an efficient process, where the variable name_list is created as global immediately when the server starts, and each access only reads from that variable?
Just for future reference, if anyone ever faces the same problem: I ended up sub-classing CGIHTTPServer's request handler and implementing a new do_POST() function. If you had a working CGI script without global variables, something like this should get you started:
import CGIHTTPServer
import random
import sys
import cgi

class MyRequestHandler(CGIHTTPServer.CGIHTTPRequestHandler):
    global super_important_list
    super_important_list = range(10)
    random.shuffle(super_important_list)

    def do_POST(s):
        """Respond to a POST request."""
        form = cgi.FieldStorage(fp=s.rfile, headers=s.headers,
                                environ={'REQUEST_METHOD': 'POST',
                                         'CONTENT_TYPE': s.headers['Content-Type'], })
        s.wfile.write("<html><head><title>Title goes here.</title></head>")
        s.wfile.write("<body><p>This is a test.</p>")
        s.wfile.write("<p>You accessed path: %s</p>" % s.path)
        s.wfile.write("<p>Also, super_important_list is:</p>")
        s.wfile.write(str(super_important_list))
        s.wfile.write("<p>Furthermore, you POSTed the following info: ")
        for item in form.keys():
            s.wfile.write("<p>Item: " + item)
            s.wfile.write("<p>Value: " + form[item].value)
        s.wfile.write("</body></html>")

if __name__ == '__main__':
    server_address = ('', 8000)
    httpd = CGIHTTPServer.BaseHTTPServer.HTTPServer(server_address, MyRequestHandler)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        sys.exit()
Whenever someone fills out your form and performs a POST, the variable form will be a dictionary-like object with key-value pairs which may differ for each user of your site, but the global variable super_important_list will be the same for every user.
Thanks to everyone who answered my question, especially Mike Steder, who pointed me in the right direction!
CGI works by spawning a new process to handle each request. You need to run a server process that stays in memory and handles HTTP requests.
You could use a modified BaseHTTPServer; just define your own Handler class. You'd load the dataset once in your code, and then the do_GET method of your handler would just pick one name randomly.
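A minimal sketch of that approach in Python 2.7, assuming the names.txt file from the question (the handler class name and port are illustrative choices):

import random
import BaseHTTPServer

with open('names.txt') as f:
    NAME_LIST = [line.rstrip('\n') for line in f]  # loaded once, at startup

class NameHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        name = random.choice(NAME_LIST)  # reuse the in-memory list on every request
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write("<p>%s</p>" % name)

if __name__ == '__main__':
    BaseHTTPServer.HTTPServer(('', 8000), NameHandler).serve_forever()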
Personally, I'd look into something like CherryPy as a simple solution that is IMO a lot nicer than BaseHTTPServer. There are tons of options other than CherryPy like bottle, flask, twisted, django, etc. Of course if you need this server to be behind some other webserver you'll need to look into setting up a reverse proxy or running CherryPy as a WSGI app.
You may want to store the names in a database, keyed by the letter they start with. Then you can pick a random letter between a and z, and randomize again within that letter to get a random name starting with your random letter.
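A rough sketch of that idea with SQLite, assuming a table names(first_letter TEXT, name TEXT) that was populated once beforehand (the table layout and database file name are assumptions):

import random
import sqlite3
import string

conn = sqlite3.connect('names.db')

def random_name():
    letter = random.choice(string.ascii_lowercase)  # random starting letter
    row = conn.execute(
        "SELECT name FROM names WHERE first_letter = ? ORDER BY RANDOM() LIMIT 1",
        (letter,)).fetchone()
    return row[0] if row else random_name()  # retry if no names start with that letter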
Build a prefix tree (a.k.a. trie) once and generate a random walk whenever you receive a query.
That should be pretty efficient.
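A small sketch of the trie idea (not the original answer author's code): build the trie once at startup, then do a random walk per request. Names sharing a prefix share storage, though note the walk is not uniform over names:

import random

END = None  # marker meaning "a name can end here"

def build_trie(names):
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root

def random_name(trie):
    node, letters = trie, []
    while True:
        key = random.choice(list(node))
        if key is END:
            return ''.join(letters)
        letters.append(key)
        node = node[key]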

How do I queue FTP commands in Twisted?

I'm writing an FTP client using Twisted that downloads a lot of files and I'm trying to do it pretty intelligently. However, I've been having the problem that I'll download several files very quickly (sometimes ~20 per batch, sometimes ~250) and then the downloading will hang, only to eventually have connections time out and then the download and hang start all over again. I'm using a DeferredSemaphore to only download 3 files at a time, but I now suspect that this is probably not the right way to avoid throttling the server.
Here is the code in question:
def downloadFiles(self, result, directory):
    # make download directory if it doesn't already exist
    if not os.path.exists(directory['filename']):
        os.makedirs(directory['filename'])
    log.msg("Downloading files in %r..." % directory['filename'])
    files = filterFiles(None, self.fileListProtocol)
    # from http://stackoverflow.com/questions/2861858/queue-remote-calls-to-a-python-twisted-perspective-broker/2862440#2862440
    # use a DeferredSemaphore to limit the number of files downloaded
    # simultaneously from the directory to 3
    sem = DeferredSemaphore(3)
    jobs = [sem.run(self.downloadFile, f, directory) for f in files]
    d = gatherResults(jobs)
    return d

def downloadFile(self, f, directory):
    filename = os.path.join(directory['filename'], f['filename']).encode('ascii')
    log.msg('Downloading %r...' % filename)
    d = self.ftpClient.retrieveFile(filename, FTPFile(filename))
    return d
You'll notice that I'm reusing an FTP connection (an active one, by the way) and using my own FTPFile instance to make sure the local file object gets closed when the file download connection is 'lost' (i.e. completed). Looking at FTPClient, I wonder if I should be using queueCommand directly. To be honest, I got lost following the retrieveFile command to _openDataConnection and beyond, so maybe it's already being used.
Any suggestions? Thanks!
I would suggest using queueCommand, as you suggested; I suspect the semaphore you're using is causing your issues. I believe using queueCommand will limit your FTPClient to a single active connection (though I'm just speculating), so you may want to think about creating a few FTPClient instances and passing download jobs to them if you want to do things quickly. If you use queueStringCommand, you get a Deferred that you can use to determine when each client is done, and even add another job to the queue for that client in the callback.
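A rough sketch of the "few FTPClient instances" idea, assuming self.ftpClients is a list of already-connected FTPClient objects and reusing the FTPFile class from the question (how the extra clients get connected is left out):

import os
from twisted.internet.defer import DeferredSemaphore, gatherResults

def downloadFilesAcrossClients(self, files, directory):
    # One semaphore per client, so each connection runs one transfer at a time
    sems = [DeferredSemaphore(1) for _ in self.ftpClients]
    jobs = []
    for n, f in enumerate(files):
        idx = n % len(self.ftpClients)  # round-robin the files over the clients
        jobs.append(sems[idx].run(self.downloadFileWith, self.ftpClients[idx], f, directory))
    return gatherResults(jobs)

def downloadFileWith(self, client, f, directory):
    filename = os.path.join(directory['filename'], f['filename']).encode('ascii')
    return client.retrieveFile(filename, FTPFile(filename))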

Categories

Resources