How to upload file using mechanize python? - python

I need to upload a text file through a form like this:
the structure of the add button is as follows:
after upload I have to send the file with the send button:
the structure of the send button is as follows:
I searched and found only examples like this:
browser.select_form(name = 'formForm')
browser.form.add_file(open(directory))
response = browser.submit()
but I didn't succeed. If anyone can help me, thank you very much.

this is what you're looking for:
br.form.add_file(open(filename), 'text/plain', filename)
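For context, the whole flow might look like the sketch below. It rests on assumptions: the URL and file path are placeholders, the form name 'formForm' is taken from the question, and upload_args is a hypothetical helper that is not part of mechanize. The file is opened in binary mode and kept open until submit() has run, because the form reads it at submission time.

```python
import mimetypes
import os


def upload_args(path):
    """Hypothetical helper: guess the (content_type, filename) pair for add_file."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return content_type, os.path.basename(path)


def upload(url, form_name, path):
    """Open the page, attach the file to the named form and submit it."""
    import mechanize  # third-party; imported here so the helper above works without it

    br = mechanize.Browser()
    br.open(url)                      # the form must be loaded before it can be selected
    br.select_form(name=form_name)
    content_type, filename = upload_args(path)
    with open(path, "rb") as fh:      # keep the file open until submit() returns
        br.form.add_file(fh, content_type, filename)
        return br.submit()


# Example call (placeholder URL and path, not from the question):
# response = upload("http://example.com/upload", "formForm", "notes.txt")
```

If the form has more than one file field, add_file also accepts a name= argument to pick the right control.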

Related

Download a xlsx file by clicking a website button using Python

I'm writing a Python script that creates a COVID-19 dashboard for my country and state and updates it daily.
However, I am struggling to download one of the necessary files.
Basically to download the file I have to access the website (https://covid.saude.gov.br/) and click on a button (class="btn-white md button button-solid button-has-icon-only ion-activatable ion-focusable hydrated ion-activated").
I tried to download via the download link but the site creates a different link every time you click the button and it still has a blob URL before HTTP.
I am very grateful to anyone who tries to help, because the data will be used to monitor the progress of the disease here where I live.
You can use their API to get the file name:
import requests
headers = {
    'authority': 'xx9p7hp1p7.execute-api.us-east-1.amazonaws.com',
    'x-parse-application-id': 'unAFkcaNDeXajurGB7LChj8SgQYS2ptm',
}
with requests.Session() as session:
    session.headers.update(headers)
    resp = session.get('https://xx9p7hp1p7.execute-api.us-east-1.amazonaws.com/prod/PortalGeral').json()
    path = resp['results'][0]['arquivo']['url']
The x-parse-application-id doesn't seem to change. If it does, you can get the correct one by querying https://xx9p7hp1p7.execute-api.us-east-1.amazonaws.com/prod/PortalGeralApi and extracting it from ['planilha']['arquivo'][url].
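Once the URL has been read from the response, the file still has to be saved. A minimal sketch of those two steps follows, assuming the JSON keeps the results[0]['arquivo']['url'] layout shown above. The names extract_file_url and download are hypothetical, and the output filename is a placeholder.

```python
def extract_file_url(payload):
    """Return payload['results'][0]['arquivo']['url'], or None if the layout differs."""
    try:
        return payload['results'][0]['arquivo']['url']
    except (KeyError, IndexError, TypeError):
        return None


def download(url, dest, session=None):
    """Stream the file at url to dest in chunks so it never sits fully in memory."""
    import requests  # third-party; imported lazily so extract_file_url has no dependencies

    http = session or requests
    with http.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(dest, 'wb') as fh:
            for chunk in resp.iter_content(chunk_size=8192):
                fh.write(chunk)
    return dest


# Example (placeholder filename):
# url = extract_file_url(resp)
# if url:
#     download(url, 'covid_data.xlsx')
```

Returning None instead of raising lets a daily script tell "the API layout changed" apart from a network failure.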

File upload through python mechanize

I am trying to upload an image file through a web form using mechanize.
Although there is no error, the uploaded file does not show up when I check manually in the browser (after submitting/saving).
I am using the following code to upload the file:
import mechanize as mc
br = mc.Browser()
br.set_handle_robots(False)
br.select_form(nr=0)
br.form.add_file(open("test.png"), content_type="image/png",
                 filename='before', name="ctl00$ContentPlaceHolder1$fileuploadBeforeimages")
br.submit("ctl00$ContentPlaceHolder1$cmdSave")
# this is supposed to save the form on the webpage. It saves the texts in the other fields, whereas the image does not show up.
The add file command seems to work. I can confirm this because when I print br.forms()[0] the file details show up (<FileControl(ctl00$ContentPlaceHolder1$fileuploadBeforeimages=before)>).
But there is no sign of the image file after this code snippet runs. I have checked several examples that call br.submit() without naming a specific button control, but when I do that nothing is saved on the website.
What am I missing?
Thanks in advance.
EDIT
When I manually try to upload the file, I see a pop-up asking for confirmation. Under inspect, this is present as
onchange="if (confirm('Upload ' + this.value + '?')) this.form.submit();"
I am not sure whether this is a JavaScript element that mechanize cannot get past when uploading. Can someone confirm this?
You can open the image in binary mode by passing 'rb' as the second argument to open(), like this:
br.form.add_file(open("test.png", 'rb'), 'image/png', filename, name='file')
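To illustrate why the mode matters, here is a small self-contained sketch, assuming Python 3 and a UTF-8 text encoding. It writes the eight-byte PNG signature to a temporary file and reads it back both ways. read_both_ways is a hypothetical helper used only for this demonstration.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file


def read_both_ways(data):
    """Write data to a temp file, then read it back in binary and in text mode."""
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        with open(path, "rb") as fh:
            binary = fh.read()            # raw bytes, exactly as stored
        try:
            with open(path, "r", encoding="utf-8") as fh:
                text = fh.read()          # tries to decode the bytes as text
        except UnicodeDecodeError:
            text = None                   # 0x89 is not a valid UTF-8 start byte
        return binary, text
    finally:
        os.remove(path)


binary, text = read_both_ways(PNG_SIGNATURE)
print(binary == PNG_SIGNATURE)  # True: binary mode preserves the data
print(text is None)             # True: text mode could not decode it
```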

Upload a text file to a site for analysis using MechanicalSoup

I'm trying to get a text file to TreeTagger Online to get it analyzed and get the link to the resulting file to download.
import mechanicalsoup
browser = mechanicalsoup.Browser()
homePage = browser.get("http://cental.fltr.ucl.ac.be/treetagger/")
formPart = homePage.soup.select("form[name=treetagger_form]")[0]
formPart.select("[name=file_to_tag]")[0]["name"]=open('test.txt', 'rb')
result = browser.post(formPart, homePage.url)
This gives me the following error:
: (, UnicodeEncodeError('ascii', u'No connection adapters were found for \'\n\n\n\n\n Texte \xe0 \xe9tiqueter : \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\'', 216, 217, 'ordinal not in range(128)'))
How should I proceed to get my file on site (using MechanicalSoup or another module)?
01/04/19 Edit
Even though I did not manage to get @Rolando Urquiza's answer to work on my machine, I was able to get it done using his suggestions.
import mechanicalsoup
browser = mechanicalsoup.Browser()
homePage = browser.get("http://cental.fltr.ucl.ac.be/treetagger/")
formPart = homePage.soup.select("form[name=treetagger_form]")[0]
form=mechanicalsoup.Form(formPart)
form.set('file_to_tag', 'test.txt')
upload=browser.submit(form,url="http://cental.fltr.ucl.ac.be/treetagger/")
Thanks, @Rolando Urquiza.
According to the MechanicalSoup documentation, you can upload a file by calling the set function on a mechanicalsoup.Form instance. For example, this is how you can use it:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open("http://cental.fltr.ucl.ac.be/treetagger/")  # open(), not get(): only open() records the page that select_form() searches
form = browser.select_form()
form.set('file_to_tag', 'test.txt')
result = browser.submit_selected()
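One version caveat, as far as I know: from MechanicalSoup 1.0 onward a file field has to be given an open file object rather than a path string, so passing 'test.txt' may only work on older releases. A hedged sketch for newer versions follows. submit_for_tagging is a hypothetical wrapper, and the form and field names are the ones from the question.

```python
import os

TREETAGGER_URL = "http://cental.fltr.ucl.ac.be/treetagger/"


def submit_for_tagging(path, url=TREETAGGER_URL):
    """Upload the file at path through the TreeTagger form and return the response."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)    # fail early, before any network traffic

    import mechanicalsoup  # third-party; imported lazily

    browser = mechanicalsoup.StatefulBrowser()
    browser.open(url)                    # open() stores the page for select_form()
    form = browser.select_form('form[name="treetagger_form"]')
    with open(path, "rb") as fh:         # assumption: MechanicalSoup >= 1.0
        form.set("file_to_tag", fh)
        return browser.submit_selected()


# Example call:
# result = submit_for_tagging("test.txt")
```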

How to parse web elements into notepad using Python?

Can anyone help me extract data from a site using Python? Here is the info:
I have folders named with sets of numbers (each number is the ID of an item), and I have to use that ID to open a page and then scrape info from the page into a text file. It's like this: http://www.somesite.com/pic.mhtml?id=[ID]. I need to extract the picture link from the page (the picture link always has ID.jpg at the end), write it to the text file, and then rename that text file to the name of the picture. The picture is always in title tags. Thanks in advance.
What you need is a data scraper - http://www.crummy.com/software/BeautifulSoup/ will help you pull data off of websites. You can then load that data into a variable, write it to a file, or do anything you normally do with data.
You could try parsing the html source for images.
Try something similar:
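import re
from urllib.request import urlopen

class Parser(object):
    # Group 2 is the image URL without its extension, group 3 is the extension.
    __rx = r'(url|src)="(http://www\.page\.com/path/\?ID=\d*)\.(jpeg|jpg|gif|png)'

    def __crawl(self, url):
        images = []
        # urllib.urlopen exists only in Python 2; Python 3 uses urllib.request.urlopen
        code = urlopen(url).read().decode('utf-8', errors='replace')
        for line in code.split('\n'):
            imagesearch = re.search(self.__rx, line)
            if imagesearch:
                image = '%s.%s' % (imagesearch.group(2), imagesearch.group(3))
                images.append(image)
        return images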
It's untested, so you may want to check the regex.
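As an alternative to a regex, the HTML can be walked with a real parser. The sketch below uses only the standard library's html.parser and assumes the picture appears as the src of an img tag whose file name is exactly ID.jpg. The question does not show the page markup, so that is a guess, and ImageLinkParser and find_picture_links are hypothetical names.

```python
from html.parser import HTMLParser


class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.links.append(value)


def find_picture_links(html, item_id):
    """Return the image links whose file name is exactly '<item_id>.jpg'."""
    parser = ImageLinkParser()
    parser.feed(html)
    wanted = "%s.jpg" % item_id
    return [link for link in parser.links if link.rsplit("/", 1)[-1] == wanted]


sample = ('<html><body><img src="/img/logo.png">'
          '<img src="http://www.somesite.com/pics/12345.jpg"></body></html>')
print(find_picture_links(sample, 12345))
# prints ['http://www.somesite.com/pics/12345.jpg']
```

Each returned link can then be written to a file with an ordinary open(..., 'w') call.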

cgi get submitted image data-url

Sorry if this question is dumb. I converted a jqplot image to a data URL and sent it with a form. On the new page, I used cgi.FieldStorage() to get the submitted data URL, but got nothing. Could anyone give me some suggestions on my approach? Thanks!
I am using Python on Google App Engine
Input page Javascript
var imgData = $('#chart1').jqplotToImageStr({});
$('<tr style="display:none"><td><input type="hidden" name="extract1"></td></tr>')
.appendTo('.getpdf')
.find('input')
.data(imgData);
Output page:
def post(self):
    form = cgi.FieldStorage()
    extract1 = form.getvalue('extract1')  # extract1 is empty
I tried printing the form and it looked like this: MiniFieldStorage('extract1', '"data:image/png;base64,iVBORw0KGgoAAAANSUhE
If I assigned the data-url (data:image/png;base64,iVBORw0KGg...), it worked.
I have (working) code that does something similar.
def post(self):
    img_data = self.request.get('extract1')
    # then strip off the prefix and convert from base64
is what I'm doing. Poking into cgi.FieldStorage isn't something I normally see done in Python App Engine apps.
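The "strip off the prefix and convert from base64" step could look roughly like this. It is a sketch, not the answerer's actual code. decode_data_url is a hypothetical helper, and it also drops the stray double quotes that appear around the value in the printed form above.

```python
import base64


def decode_data_url(data_url):
    """Split a 'data:<mime>;base64,<payload>' string into (mime_type, raw_bytes)."""
    cleaned = data_url.strip().strip('"')          # drop whitespace and stray quotes
    header, sep, payload = cleaned.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Build a small quoted sample value instead of using a real chart image.
sample = '"data:image/png;base64,' + base64.b64encode(b"\x89PNG").decode("ascii") + '"'
mime_type, raw = decode_data_url(sample)
print(mime_type)  # image/png
```

In the handler above, img_data would be passed to decode_data_url and the resulting bytes stored or embedded in the PDF.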
