I'm trying to make a script that will auto-upload images that I place in a folder on my PC, but I'm having trouble getting my POST request to work.
Here is the form I'm trying to submit to...
<form method="POST" action="/uploadeventthumb.php?id=0" enctype="multipart/form-data">
<input type="file" name="image"><p>
<input type="hidden" name="id2" value="3kkggi1618601391"></p><p>
<input type="hidden" name="id3" value="1113887"></p><p>
<input type="Submit" name="Submit" value="Submit">
<br></p><p>Please only upload JPG images. <br>The dimensions should be <b>1280x720</b> pixels Wide and High.
</p></form>
Here is my code...
for f in glob.glob(upload+'*-t.jpg'):
    f = f.replace("\\", "/")
    sessionObj = requests.session()
    r = sessionObj.post(TheSportsDB.login, TheSportsDB.login_data)
    if r.status_code == 200:
        print("Logged in!")
        file = {'file' : open(f, 'rb')}
        fn = f.replace(upload, "")
        fn = fn.replace("-t.jpg", "")
        TSDB = TheSportsDB.thumb+fn
        r = sessionObj.post(TSDB, files=file)
        print(r.text)
When I run the script I get a successful "Logged in!" message, and then it prints out the text from the upload page, but it never actually uploads the image. Can somebody tell me what I am doing wrong, please?
What I intended to use this script for just wasn't possible, as the id2 field changes on every request, so I'm guessing it is a CSRF token, as mentioned by @NobbyNobbs in the comments.
I managed to easily create the script using Selenium rather than Requests and BeautifulSoup.
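For anyone who wants to stay with Requests: a token like id2 can sometimes be handled by fetching the form page inside the logged-in session, copying its hidden fields, and posting them back together with the file under the form's own field name (image, not file). The following is only a sketch under that assumption and is untested against the real site. HiddenFieldParser, extract_hidden_fields, upload_thumb and both URL parameters are made-up names for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden input fields found in the HTML."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def upload_thumb(session, form_url, action_url, image_path):
    """Hypothetical helper: `session` is assumed to be a logged-in requests.Session."""
    # Fetch the form first so that id2/id3 are fresh for this request.
    data = extract_hidden_fields(session.get(form_url).text)
    data["Submit"] = "Submit"
    with open(image_path, "rb") as fh:
        # The form names its file input "image".
        return session.post(action_url, data=data, files={"image": fh})
```

If the server still rejects the upload, the token is probably generated or checked in a way that needs a real browser, which is where Selenium comes in.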
Idea:
take ids from an HTML input
use the ids to run SQL and return the relevant usernames
download the output as a CSV on the front end when the "download" button is clicked
html
Enter comma delimited ids <input type="text" id="text1" name="text1"><br><br>
download
python
@app.route("/getPlotCSV", methods=['GET','POST'])
def getPlotCSV():
    text1 = request.form['text1']
    result = {}
    a = []
    x = []
    a.extend([str(x) for x in text1.split(",")])
    format_strings = ','.join(['%s'] * len(a))
    cursor = cnxn.cursor()
    sql = "SELECT DisplayName FROM dbo.[Users] where id IN ({seq})".format(
        seq=','.join(['?'] * len(a)))
    cursor.execute(sql,a)
    for row, in cursor:
        x.append(row)
    csv = x
    return Response(
        csv,
        mimetype="text/csv",
        headers={"Content-disposition":
                 "attachment; filename=myplot.csv"})
The SQL and the input work, because I have tested them separately without the CSV download and they return the correct data. The error I get at the moment is "400 Bad Request: The browser (or proxy) sent a request that this server could not understand." KeyError: 'text1'
What am I missing here?
The KeyError is because you haven't actually passed the value of text1 to your getPlotCSV route. A plain link on an HTML page doesn't send the input's value along with the request. Instead you need to use a form with an action page, like this:
<form action="/getPageCSV" method="post">
Enter comma delimited ids: <input type="text" id="text1" name="text1">
<input type="submit" name="Submit" id="submit">
</form>
This should then pass those values to the url in the form action attribute, in this case your getPageCSV. It doesn't have to be POST, I've just done that as an example.
Then, when your route receives the data:
@app.route('/getPageCSV', methods=['GET', 'POST'])
def getPlotCSV():
    if request.method == "POST": #in other words, if the form is being submitted
        text1 = request.form.get('text1') #use dict.get in case it isn't there
        # your code
        csv = x
        return redirect(url_for('getPlotCSV')) #this time a get method
    return Response(
        csv,
        mimetype="text/csv",
        headers={"Content-disposition":
                 "attachment; filename=myplot.csv"})
The above won't specifically work without you adding in your own way to move the POST process data/csv over to when the user is redirected. You could do it as a request header, store it in the session storage or even put it in the query string, it's up to you, but you have to be able to display the results of your POST process into a GET request when the user is redirected.
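As one alternative to the redirect, the POST branch could build the CSV text and return it straight away, so nothing has to be carried over to a second request. Here is a minimal sketch of just the CSV-building part, using only the standard library. rows_to_csv is a made-up helper name, and the DisplayName header is assumed from the column in the query.

```python
import csv
import io


def rows_to_csv(names, header="DisplayName"):
    """Turn a list of names into CSV text with one name per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for name in names:
        # Each row is a one-element list; csv handles quoting of commas.
        writer.writerow([name])
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(["alice", "bob"]))
```

In the route, the returned string could then be passed to Response(...) with mimetype="text/csv" and the Content-disposition header, in place of the bare list.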
I am quite inexperienced in HTML, JS and Flask, but I am working on a chatbot that is able to detect the sentiment of the sender's message.
My HTML code:
<div class="bottom_wrapper clearfix">
<div class="message_input_wrapper">
<form action = "{{ url_for('reply') }}" method = "POST">
<input
class="message_input"
id="text_message"
name = "sentimental_name"
placeholder="Tell me how you feel today..."
onkeydown="if (event.keyCode == 13)document.getElementById('send').click()">
</div>
<!--div class = "send_message1" id = 'audio' onclick = "start_dictation()">
<span style="font-size: 32px; color:black;">
<i class="fas fa-microphone"></i>
</span>
</div-->
<div class="send_message" id="send" onclick="get_message()">
<!--<div class="icon"></div>-->
<div class="text">Send</div>
</div>
</form>
</div>
This is my python-flask code:
@app.route('/senti', methods = ['POST'])
def reply():
    if request.method == 'POST':
        message = request.form['text_message']
        a = TextBlob(message).sentiment.polarity
        b = TextBlob(message).sentiment.subjectivity
My JS that is called by the onclick:
function get_message(){
    var message = document.getElementById("text_message").value;
    var json_data = {"msg":message}
    var sender = JSON.stringify(json_data)
    console.log(sender)
    console.log(message);
    insert_chat('me',message);
    interact(sender);
}
Console log:
POST http://127.0.0.1:5000/senti 500 (INTERNAL SERVER ERROR)
send @ jquery-3.4.1.js:9837
ajax @ jquery-3.4.1.js:9434
interact @ chat.js:34
get_message @ chat.js:55
onclick @ chat:58
It seems really simple but it is like I miss something. Thank you so much!
You would have to use "sentimental_name" in
request.form["sentimental_name"]
because your input has name = "sentimental_name".
But the page uses the JavaScript function get_message() to get the data when you press ENTER or click Send
<input ... onkeydown="if (event.keyCode == 13)document.getElementById('send').click()">
<div class="send_message" id="send" onclick="get_message()">
and converts it to JSON with the field "msg", so it is sent as data or JSON, not as a form.
function get_message(){
    var message = document.getElementById("text_message").value;
    var json_data = {"msg":message}
    var sender = JSON.stringify(json_data)
    console.log(sender)
    console.log(message);
    insert_chat('me',message);
    interact(sender);
}
In flask reply() you can check this using:
print(request.args)
print(request.data)
print(request.form)
print(request.json)
JavaScript may also expect reply() to return JSON, i.e.
return jsonify(list_or_dictionary).
In JavaScript I see interact(sender); so you would have to find this function and see what it sends and what result it expects.
BTW: you can also use request.json.get("msg") and request.form.get("msg") instead of ["msg"], because .get() returns None when it can't find "msg", and you can use if not message: to catch this problem. ["msg"] raises an error when there is no "msg", and you would have to use try:/except: to catch it.
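The lookup described above can be sketched as a small helper that tries the JSON body first and falls back to the form. This is only an illustration: extract_message is a made-up name, and whether the JSON body is actually parsed depends on how interact() sends it (for example, which Content-Type it sets).

```python
def extract_message(json_body, form):
    """Return the chat message from a JSON body or a form, or None if absent."""
    # JSON sent by get_message() uses the key "msg".
    if isinstance(json_body, dict) and json_body.get("msg"):
        return json_body["msg"]
    # A plain form submit would use the input's name attribute instead.
    return form.get("sentimental_name") or form.get("msg")


if __name__ == "__main__":
    print(extract_message({"msg": "I feel great"}, {}))
    print(extract_message(None, {"sentimental_name": "so-so"}))
```

Inside reply() this could be called as extract_message(request.get_json(silent=True), request.form), followed by an if not message: check before running TextBlob.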
I'm trying to scrape some data from a website where I need to be logged in to see the actual content. It all works fine but takes about 5 seconds per request, which is way too slow for my needs (>5000 URLs to scrape from). It seems there are faster ways, like the asyncio and aiohttp modules.
However, none of the examples I found on the web showed how to log in to a site and then use these tools.
So I basically need an easy-to-follow example of how to do such a thing.
I tried to rebuild this example:
https://realpython.com/python-concurrency/#what-is-concurrency
with my code, which did not work. I also tried AsyncHTMLSession() from requests_html which returned something but did not seem to remember the login.
This is my code so far:
import requests
from bs4 import BeautifulSoup
payload = {
"name" : "username",
"password" : "example_pass",
"destination" : "MAS_Management_UserConsole",
"loginType" : ""
}
links = [several urls]
### stuff with requests
with requests.Session() as c:
    c.get('http://boldsystems.org/')
    c.post('http://boldsystems.org/index.php/Login', data = payload)
    def return_id(link):
        page = c.get(link).content
        soup = BeautifulSoup(page, 'html.parser')
        return soup.find(id = 'processidLC').text
    for link in links:
        print(return_id(link))
It looks like you're already using requests so you can try requests-async. The example below should help you with "in reasonable time" part of your question, just adjust parse_html function accordingly to search for your HTML tag. By default it will run 50 requests in parallel (MAX_REQUESTS) to not exhaust resources on your system (file descriptors etc.).
Example:
import asyncio
import requests_async as requests
import time
from bs4 import BeautifulSoup
from requests_async.exceptions import HTTPError, RequestException, Timeout
MAX_REQUESTS = 50
URLS = [
'http://envato.com',
'http://amazon.co.uk',
'http://amazon.com',
'http://facebook.com',
'http://google.com',
'http://google.fr',
'http://google.es',
'http://google.co.uk',
'http://internet.org',
'http://gmail.com',
'http://stackoverflow.com',
'http://github.com',
'http://heroku.com',
'http://djangoproject.com',
'http://rubyonrails.org',
'http://basecamp.com',
'http://trello.com',
'http://yiiframework.com',
'http://shopify.com',
'http://airbnb.com',
'http://instagram.com',
'http://snapchat.com',
'http://youtube.com',
'http://baidu.com',
'http://yahoo.com',
'http://live.com',
'http://linkedin.com',
'http://yandex.ru',
'http://netflix.com',
'http://wordpress.com',
'http://bing.com',
]
class BaseException(Exception):
    pass
class HTTPRequestFailed(BaseException):
    pass
async def fetch(url, timeout=5):
    async with requests.Session() as session:
        try:
            resp = await session.get(url, timeout=timeout)
            resp.raise_for_status()
        except HTTPError:
            raise HTTPRequestFailed(f'Skipped: {resp.url} ({resp.status_code})')
        except Timeout:
            raise HTTPRequestFailed(f'Timeout: {url}')
        except RequestException as e:
            raise HTTPRequestFailed(e)
        return resp
async def parse_html(html):
    bs = BeautifulSoup(html, 'html.parser')
    if not html: print(html)
    title = bs.title.text.strip()
    return title if title else "Unknown"
async def run(sem, url):
    async with sem:
        start_t = time.time()
        resp = await fetch(url)
        title = await parse_html(resp.text)
        end_t = time.time()
        elapsed_t = end_t - start_t
        r_time = resp.elapsed.total_seconds()
        print(f'{url}, title: "{title}" (total: {elapsed_t:.2f}s, request: {r_time:.2f}s)')
        return resp
async def main():
    sem = asyncio.Semaphore(MAX_REQUESTS)
    tasks = [asyncio.create_task(run(sem, url)) for url in URLS]
    for f in asyncio.as_completed(tasks):
        try:
            result = await f
        except Exception as e:
            print(e)
if __name__ == '__main__':
    asyncio.run(main())
Output:
# time python req.py
http://google.com, title: "Google" (total: 0.69s, request: 0.58s)
http://yandex.ru, title: "Яндекс" (total: 2.01s, request: 1.65s)
http://github.com, title: "The world’s leading software development platform · GitHub" (total: 2.12s, request: 1.90s)
Timeout: http://yahoo.com
...
real 0m6.868s
user 0m3.723s
sys 0m0.524s
Now, this may still not help you with your login issue. The HTML tag that you're looking for (or the entire web page) could be generated by JavaScript, so you'll need a tool like requests-html, which uses a headless browser to read content rendered by JavaScript.
It's also possible that your login form is using CSRF protection, example with login to Django admin backend:
>>> import requests
>>> s = requests.Session()
>>> get = s.get('http://localhost/admin/')
>>> csrftoken = get.cookies.get('csrftoken')
>>> payload = {'username': 'admin', 'password': 'abc123', 'csrfmiddlewaretoken': csrftoken, 'next': '/admin/'}
>>> post = s.post('http://localhost/admin/login/?next=/admin/', data=payload)
>>> post.status_code
200
We use the session to perform a GET request first, to get the token from the csrftoken cookie, and then we log in, sending the two hidden form fields (csrfmiddlewaretoken and next) along with the credentials:
<form action="/admin/login/?next=/admin/" method="post" id="login-form">
<input type="hidden" name="csrfmiddlewaretoken" value="uqX4NIOkQRFkvQJ63oBr3oihhHwIEoCS9350fVRsQWyCrRub5llEqu1iMxIDWEem">
<div class="form-row">
<label class="required" for="id_username">Username:</label>
<input type="text" name="username" autofocus="" required="" id="id_username">
</div>
<div class="form-row">
<label class="required" for="id_password">Password:</label> <input type="password" name="password" required="" id="id_password">
<input type="hidden" name="next" value="/admin/">
</div>
<div class="submit-row">
<label> </label>
<input type="submit" value="Log in">
</div>
</form>
Note: examples are using Python 3.7+
Look at asyncio and the asyncio.gather function.
Wrap everything below the "links = [several urls]" line in an async method.
Be careful with shared state: don't change shared variables within the method.
Also, the tasks run concurrently, so it could be useful to await asyncio.sleep(randint(0,2)) to delay some of them, so they're not all firing at the same time.
Then, using asyncio, call the method with each url like so
tasks = []
for url in urls:
    tasks.append(wrapped_method(url))
results = await asyncio.gather(*tasks)
Hope that helps.
Otherwise look at https://github.com/jreese/aiomultiprocess
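One way to combine the gather idea with a login that already works in requests is to keep the blocking session and let asyncio run the calls in a thread pool. The following is a minimal sketch, not a drop-in solution: fetch_all and fake_fetch are made-up names, fake_fetch stands in for something like return_id(link), and sharing one requests.Session between threads is an assumption that usually holds for simple GETs but is not guaranteed to be thread safe.

```python
import asyncio


async def fetch_all(fetch, urls, limit=10):
    """Run a blocking fetch(url) for every url, at most `limit` at a time."""
    loop = asyncio.get_running_loop()
    sem = asyncio.Semaphore(limit)

    async def worker(url):
        async with sem:
            # Hand the blocking call to the default thread pool so the
            # event loop stays free to start other downloads.
            return await loop.run_in_executor(None, fetch, url)

    # gather() returns the results in the same order as `urls`.
    return await asyncio.gather(*(worker(url) for url in urls))


def fake_fetch(url):
    # Stand-in for a real call such as return_id(link) on a logged-in session.
    return url.upper()


if __name__ == "__main__":
    print(asyncio.run(fetch_all(fake_fetch, ["a", "b", "c"], limit=2)))
```

With the real session, the login would be done once up front and fetch would be the function that calls c.get(link) and parses the page.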
First of all, I know there are many similar threads. I read all of them and the S3 docs (please don't close this thread). The fix is the same everywhere:
Simply change the signature_version to v4, because eu-central was created after 2014 and does not support v2 anymore.
I have tried every syntax now and I am still getting the error.
session = boto3.Session(
aws_access_key_id=app.config['MY_AWS_ID'],
aws_secret_access_key=app.config['MY_AWS_SECRET'],
region_name='eu-central-1'
)
s3 = session.client('s3', config=Config(signature_version='s3v4'))
presigned_post = s3.generate_presigned_post(
Bucket = 'mybucket',
Key = 'videos/' + file_name,
Fields = {"acl": "public-read", "Content-Type": file_type},
Conditions = [
{"acl": "public-read"},
{"Content-Type": file_type}
],
ExpiresIn = 3600
)
I have tried changing it everywhere. I also downgraded my boto3 installation to versions 1.6.6 and 1.4.4, which did not work either. I upgraded it back to the newest version, which is boto3==1.7.26
The Error:
InvalidRequest
The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.
Every thread suggests the same fix. Does it perhaps not work because I use Python / Flask? Does something have to be done in a different way?
I am trying to upload huge video files directly from the client to S3, therefore I need to sign the request.
EDIT
I thought maybe there is an SSL issue. I am testing everything on localhost and the default option for use_ssl is true.
I tried to upload this version to the live site (there is SSL enabled). Did not work, still the same error.
I also tried to use use_ssl = False on localhost, still the same error.
The problem was in the HTML and how I named the input fields. I took an example from an older tutorial, but you have to build your form the way it is explained here by Amazon.
I have used every input they provided. I checked the response generated by my sign_s3 function and populated all the corresponding fields in the form.
Here is my sign function:
# Sign request for direct file upload through client for video
@app.route('/sign_s3/<path:file_name_data>/<path:file_type_data>/<up_type>', methods=["GET", "POST"])
@login_required
@check_confirmed
def sign_s3(file_name_data, file_type_data, up_type):
    if "localhost" in request.url_root:
        if up_type == "profile_vid":
            file_name = str(current_user.id) + get_random_code(5) + "local-profil-video." + file_name_data.split(".")[-1]
        else:
            file_name = str(current_user.id) + str(randint(1,100)) + "local-post-video-temp." + file_name_data.split(".")[-1]
    else:
        if up_type == "profile_vid":
            file_name = str(current_user.id) + get_random_code(5) + "-profil-video." + file_name_data.split(".")[-1]
        else:
            file_name = str(current_user.id) + str(randint(1,100)) + "-post-video-temp." + file_name_data.split(".")[-1]
    file_type = file_type_data
    session = boto3.Session(
        aws_access_key_id=app.config['MY_AWS_ID'],
        aws_secret_access_key=app.config['MY_AWS_SECRET'],
        region_name='eu-central-1'
    )
    s3 = session.client('s3', config=Config(signature_version='s3v4'))
    presigned_post = s3.generate_presigned_post(
        Bucket = 'mybucket',
        Key = 'videos/' + file_name,
        Fields = {"acl": "public-read", "Content-Type": file_type},
        Conditions = [
            {"acl": "public-read"},
            {"Content-Type": file_type}
        ],
        ExpiresIn = 3600
    )
    if up_type == "profile_vid":
        if current_user.profile_video != None:
            delete_file_from_aws("videos/", current_user.profile_video)
        setattr(current_user, "profile_video", file_name)
    else:
        print ('post video has been uploaded, no need to delete or set here')
    db_session.commit()
    return json.dumps({'data': presigned_post, 'url': 'https://s3.eu-central-1.amazonaws.com/mybucket/' + 'videos/' + file_name, 'created_file_name' : file_name})
I looked at the generated response in the dev console, there I had these values:
The HTML form I used is below. I did not use the input fields that are commented out; I simply include them because Amazon shows them all in their example:
<form id="direct_s3_profile_video_form" class="form-horizontal" role="form" method="POST" enctype="multipart/form-data">
<!-- Content-Type: -->
<input type="hidden" name="Content-Type">
<!-- <input type="hidden" name="x-amz-meta-uuid"> -->
<!-- <input type="hidden" name="x-amz-server-side-encryption"> -->
<input type="hidden" name="X-Amz-Credential">
<input type="hidden" name="X-Amz-Algorithm">
<input type="hidden" name="X-Amz-Date">
<!-- Tags for File: -->
<!-- <input type="hidden" name="x-amz-meta-tag"> -->
<input type="hidden" name="Policy">
<input type="hidden" name="X-Amz-Signature">
<input id="NEW_fileupload_video" type="file" name="file" accept="video/*">
<button type="submit"> Upload </button>
</form>
Also note here that the file input must be at the bottom because:
elements after this will be ignored
In my case the values for the form were dynamically created, so I populated the form with JS:
$('#direct_s3_profile_video_form').find('input[name="key"]').val(response_json_data.data.fields['key']);
$('#direct_s3_profile_video_form').find('input[name="acl"]').val(response_json_data.data.fields['acl']);
$('#direct_s3_profile_video_form').find('input[name="Content-Type"]').val(response_json_data.data.fields['Content-Type']);
$('#direct_s3_profile_video_form').find('input[name="X-Amz-Credential"]').val(response_json_data.data.fields['x-amz-credential']);
$('#direct_s3_profile_video_form').find('input[name="X-Amz-Algorithm"]').val(response_json_data.data.fields['x-amz-algorithm']);
$('#direct_s3_profile_video_form').find('input[name="X-Amz-Date"]').val(response_json_data.data.fields['x-amz-date']);
$('#direct_s3_profile_video_form').find('input[name="Policy"]').val(response_json_data.data.fields['policy']);
$('#direct_s3_profile_video_form').find('input[name="X-Amz-Signature"]').val(response_json_data.data.fields['x-amz-signature']);
$('#direct_s3_profile_video_form').attr('action', 'https://mybucket.s3.amazonaws.com');
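Since the root cause was a mismatch between the input names and what the signing call returned, another option is to generate the hidden inputs directly from the presigned response, so the names cannot drift apart. This is a rough sketch, assuming the usual shape of the dict returned by generate_presigned_post (a "url" string and a "fields" dict). render_s3_form and the sample values are made up for illustration.

```python
from html import escape


def render_s3_form(presigned_post, form_id="upload_form"):
    """Build an HTML upload form from a presigned POST dict, file input last."""
    action = escape(presigned_post["url"])
    lines = [
        f'<form id="{escape(form_id)}" action="{action}" '
        'method="POST" enctype="multipart/form-data">'
    ]
    for name, value in presigned_post["fields"].items():
        # Use the field names exactly as returned by the signing call.
        lines.append(
            f'  <input type="hidden" name="{escape(name)}" '
            f'value="{escape(str(value))}">'
        )
    # The file input goes last, after all signed fields.
    lines.append('  <input type="file" name="file">')
    lines.append('  <button type="submit">Upload</button>')
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "url": "https://example.invalid/",
        "fields": {"key": "videos/a.mp4", "policy": "abc"},
    }
    print(render_s3_form(sample))
```

The same idea works on the client side: loop over data.fields in JS and create one hidden input per entry instead of listing the names by hand.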
So there are two things that we do on a certain site, downloading new files and posting files, and they're easily definable (and tedious), so I've been trying to script it.
So anyway, I have the file downloading script done. I'm able to reuse the login code
#Cookies
cookies = http.cookiejar.LWPCookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookies))
urllib.request.install_opener(opener)
#Authenticate user
print("logging in")
url = "http://someurl.com/index.php?app=core&module=global&section=login&do=process"
values = {"username" : USERNAME,
          "password" : PASSWORD}
data = urllib.parse.urlencode(values).encode("utf-8") #POST data must be bytes in Python 3
req = urllib.request.Request(url, data)
urllib.request.urlopen(req)
But to post a thread with said file you have to attach it. I know how to send the proper title and text for the thread from what I learned about logging in. My problem is that to attach files you have to send a request to a form, not a page request.
E.g. you select the file you want from the file dialog and then click attach files, which then uploads it; you then finish writing up your thread and THEN submit the page.
Here's relevant html
<fieldset class='attachments'>
<script type='text/javascript'>
//<![CDATA[
ipb.lang['used_space'] = "Used <strong>[used]</strong> of your <strong>[total]</strong> global upload quota (Max. single file size: <strong>256MB</strong>)";
//]]>
</script>
<h3 class='bar'>Attachments</h3>
<!--SKINNOTE: traditional uploader needs this. -->
<div id='attach_error_box' class='message error' style='display:none'></div>
<input type='file' id='nojs_attach_0_1' class='input_upload' name='FILE_UPLOAD' tabindex='1' />
<input type='file' id='nojs_attach_0_2' class='input_upload' name='FILE_UPLOAD' tabindex='1' />
<ul id='attachments'><li style='display: none'></li></ul>
<br />
<span id='buttonPlaceholder'></span>
<input type='button' id='add_files_attach_0' class='input_submit' value='Attach This File' style='display: none; clear: both' tabindex='1' /> <span class='desc' id='space_info_attach_0'>Used <strong>9.45MB</strong> of your <strong>976.56MB</strong> global upload quota (Max. single file size: <strong>256MB</strong>)</span>
I have no idea how to code this, so I'm looking for direction.
Also, on a side note, if this sort of script might have been easier in another language, tell me which. I only used Python because I knew it.
Thanks a lot.
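As a possible direction: a file input like FILE_UPLOAD is sent as a multipart/form-data body, which urllib does not build for you. Below is a minimal standard-library sketch of such an encoder. encode_multipart is a made-up helper name, and which URL the attachment has to be posted to (and which other fields the forum expects) is not visible in the HTML above, so that part stays an assumption.

```python
import uuid


def encode_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of name -> text value
    files:  dict of name -> (filename, bytes content)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    for name, (filename, content) in files.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode(),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    # Closing boundary, followed by a final CRLF.
    parts += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

The result could then be sent with urllib.request.Request(url, data=body, headers={"Content-Type": content_type}) through the same opener that holds the login cookies.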