I'm writing a Python script that creates a COVID-19 dashboard for my country and state and updates it daily.
However, I am struggling to download one of the necessary files.
Basically to download the file I have to access the website (https://covid.saude.gov.br/) and click on a button (class="btn-white md button button-solid button-has-icon-only ion-activatable ion-focusable hydrated ion-activated").
I tried downloading via the download link directly, but the site generates a different link every time you click the button, and the link is a blob URL (it has blob: in front of the http part).
I am very grateful to anyone who tries to help, because the data will be used to monitor the progress of the disease here where I live.
You can use their API to get the file name:
import requests

headers = {
    'authority': 'xx9p7hp1p7.execute-api.us-east-1.amazonaws.com',
    'x-parse-application-id': 'unAFkcaNDeXajurGB7LChj8SgQYS2ptm',
}
with requests.Session() as session:
    session.headers.update(headers)
    resp = session.get('https://xx9p7hp1p7.execute-api.us-east-1.amazonaws.com/prod/PortalGeral').json()
    path = resp['results'][0]['arquivo']['url']
The x-parse-application-id doesn't seem to change. If it does, you can get the correct one by querying https://xx9p7hp1p7.execute-api.us-east-1.amazonaws.com/prod/PortalGeralApi and extracting it from ['planilha']['arquivo']['url'].
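Once the JSON is available, the file itself can be fetched with an ordinary HTTP download. The sketch below is one possible way to do that, assuming the response keeps the results[0]['arquivo']['url'] layout shown above. The helper names extract_file_url and download_file are hypothetical, the sample payload is a stand-in, and the download step uses the standard library's urllib rather than requests.

```python
import shutil
import urllib.request


def extract_file_url(payload):
    """Return the file URL from the parsed PortalGeral JSON, or None if it is missing."""
    try:
        return payload['results'][0]['arquivo']['url']
    except (KeyError, IndexError, TypeError):
        # The layout differs from the one assumed here.
        return None


def download_file(url, dest):
    """Stream the file at `url` to the local path `dest` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=60) as resp, open(dest, 'wb') as out:
        shutil.copyfileobj(resp, out)
    return dest


# Stand-in payload for illustration; in the real script, pass `resp` from the API call.
sample = {'results': [{'arquivo': {'url': 'https://example.com/data.xlsx'}}]}
print(extract_file_url(sample))
```

In the daily script this would become something like download_file(extract_file_url(resp), 'covid.xlsx'), after checking that the URL is not None.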
I'm building a bot that logs into Zoom at specified times, and the links are being obtained from WhatsApp. So I was wondering if it is possible to retrieve those links from WhatsApp directly instead of having to copy and paste them into Python. Google is filled with guides on sending messages, but is there any way to READ and RETRIEVE those messages and then manipulate them?
You can, at most, try to read WhatsApp messages with Python using Selenium WebDriver since I strongly doubt that you can access WhatsApp APIs.
Selenium is basically an automation tool that lets you automate tasks in your browser so, perhaps, you could write a Python script using Selenium that automatically opens WhatsApp and parses HTML information regarding your WhatsApp web client.
We mentioned Selenium first, but it will only be used to automate opening and closing WhatsApp. We still need a way to read what's inside the WhatsApp client, and that's where the magic of web scraping comes in handy.
Web scraping is the process of extracting data from a website. In this case, the data is the Zoom link you need to obtain automatically, and the website is your WhatsApp Web client. To do this you need a way to extract (parse) information from the page. I suggest Beautiful Soup, but be aware that a minimum knowledge of how HTML works is required.
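As a rough illustration of the parsing step, the sketch below pulls Zoom links out of a chunk of HTML using only the standard library's html.parser instead of Beautiful Soup. The sample markup, the find_zoom_links helper and the regular expression are assumptions made for this example; real WhatsApp Web markup is different and changes over time.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: any http(s) URL on a zoom.us host, up to the next whitespace.
ZOOM_LINK = re.compile(r"https?://[\w.-]*zoom\.us/\S+")


class TextCollector(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_zoom_links(html):
    """Return every Zoom link found in the visible text of `html`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return ZOOM_LINK.findall(" ".join(collector.chunks))


# Made-up markup standing in for the page source Selenium would return.
sample_html = (
    '<div><span class="msg">Class today: '
    'https://us02web.zoom.us/j/123456789?pwd=abc see you</span>'
    '<span class="msg">ok</span></div>'
)
print(find_zoom_links(sample_html))
```

With Selenium, the HTML to feed in would come from the driver's page_source attribute.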
Sorry if this may not completely answer your question but this is all the knowledge I have on this specific topic.
You can open WhatsApp in a browser from Python using Selenium (https://selenium-python.readthedocs.io/).
I learned from and use the code on this site: "https://towardsdatascience.com/complete-beginners-guide-to-processing-whatsapp-data-with-python-781c156b5f0b". Go through the details written at that link.
You have to install the external Python library "whatsapp-web" from "https://pypi.org/project/whatsapp-web/". Just type "python -m pip install whatsapp-web" in Command Prompt / Windows Terminal.
It will show this result:
python -m pip install whatsapp-web
Collecting whatsapp-web
Downloading whatsapp_web-0.0.1-py3-none-any.whl (21 kB)
Installing collected packages: whatsapp-web
Successfully installed whatsapp-web-0.0.1
You can read all the cookies from WhatsApp Web, add them to your request headers, and use the requests module; alternatively, you can use Selenium for that.
Update:
WhatsApp has changed its elements' class names. Before using the following code, replace the class name in each XPath with the current one, which you can find with Inspect Element in WhatsApp Web.
I tried this while creating a WhatsApp bot in Python.
There are still many bugs, because I am also a beginner.
Steps based on my research:
Open a browser using Selenium WebDriver
Log in to WhatsApp using the QR code
If you know which number you will receive the meeting link from, use this step; otherwise, use the process described after it.
Find and open the chat room where you will receive the Zoom meeting link.
To get messages from a known chat room and act on them:
# user_name = "Name of the meeting link sender as in your contact list"
# Example:
user_name = "Anurag Kushwaha"
# In the variable above, replace `Anurag Kushwaha` with the name or number of your teacher
# (whoever will send you the Zoom meeting link), exactly as saved in your contact list.
user = webdriver.find_element_by_xpath('//span[@title="{}"]'.format(user_name))
user.click()
# Get the messages to act on
message = webdriver.find_elements_by_xpath("//span[@class='_3-8er selectable-text copyable-text']")
# In the line above, replace the class name in the XPath with the current one by inspecting the span element
# that contains a received text message in any chat room.
for i in message:
    try:
        if "zoom.us" in str(i.text):
            # Put your own code here to perform whatever action you need
            print("Perform Your Action")
    except Exception:
        pass
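Note that recent Selenium 4 releases removed the find_element_by_xpath style helpers in favour of find_element(By.XPATH, ...). A minimal sketch of the same contact lookup in that style follows. The names contact_xpath and open_chat are hypothetical helpers, driver is whatever WebDriver instance you created, and the Selenium import is deferred so the XPath builder can be used on its own.

```python
def contact_xpath(name):
    """Build the XPath for the chat-list entry whose title equals `name`."""
    # Assumes `name` contains no double quotes, which would break the expression.
    return '//span[@title="{}"]'.format(name)


def open_chat(driver, name):
    """Click the chat for `name` using the Selenium 4 locator API."""
    from selenium.webdriver.common.by import By  # deferred import

    driver.find_element(By.XPATH, contact_xpath(name)).click()


print(contact_xpath("Anurag Kushwaha"))
```

The same pattern applies to the message lookup: driver.find_elements(By.XPATH, ...) returns the list of matching elements.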
If you do not know which number you will receive the link from, you can take the div class of any unread contact block and open every chat room in the list that contains that unread div class.
Then check all the unread messages in each opened chat and read the message from its div class.
When you don't know who will send you the Zoom meeting link:
# To get the unread chats you can use
unread_chats = webdriver.find_elements_by_xpath("//span[@class='_38M1B']")
# In the line above, replace the class name in the XPath with the current one by inspecting the span element
# that shows the number of unread messages in a green circle on the contact card, before the chat room is opened.
# Open each chat in a loop and read its messages.
for chat in unread_chats:
    chat.click()
    # Get the messages to act on
    message = webdriver.find_elements_by_xpath("//span[@class='_3-8er selectable-text copyable-text']")
    # In the line above, replace the class name in the XPath with the current one by inspecting the span element
    # that contains a received text message in any chat room.
    for i in message:
        try:
            if "zoom.us" in str(i.text):
                # Put your own code here to perform whatever action you need
                print("Perform Your Action")
        except Exception:
            pass
Note: In the code above, 'webdriver' is the driver you use to open web.whatsapp.com
Example:
from selenium import webdriver
webdriver = webdriver.Chrome("ChromePath/chromedriver.exe")
webdriver.get("https://web.whatsapp.com")
# This webdriver variable is the one used in the code above.
# If you used a different name, either rename it in my code or assign your variable to this name as in the following line.
webdriver = your_webdriver_variable
A complete reference example:
from selenium import webdriver
import time

webdriver = webdriver.Chrome("ChromePath/chromedriver.exe")
webdriver.get("https://web.whatsapp.com")
time.sleep(25)  # Time to scan the QR code
# Please make sure that the QR code scan was successful.
confirm = int(input("Press 1 to proceed if successfully logged in or press 0 to retry : "))
if confirm == 1:
    print("Continuing...")
elif confirm == 0:
    webdriver.close()
    exit()
else:
    print("Sorry Please Try again")
    webdriver.close()
    exit()
while True:
    unread_chats = webdriver.find_elements_by_xpath("//span[@class='_38M1B']")
    # In the line above, replace the class name in the XPath with the current one by inspecting the span element
    # that shows the number of unread messages in a green circle on the contact card, before the chat room is opened.
    # Open each chat in a loop and read its messages.
    for chat in unread_chats:
        chat.click()
        time.sleep(2)
        # Get the messages to act on
        message = webdriver.find_elements_by_xpath("//span[@class='_3-8er selectable-text copyable-text']")
        # In the line above, replace the class name in the XPath with the current one by inspecting the span element
        # that contains a received text message in any chat room.
        for i in message:
            try:
                if "zoom.us" in str(i.text):
                    # Put your own code here to perform whatever action you need
                    print("Perform Your Action")
            except Exception:
                pass
Please make sure the indentation within each code block is consistent if you copy it.
You can read another answer of mine at the following link for more about WhatsApp Web with Python.
Line breaks in WhatsApp messages sent with Python
I am developing a WhatsApp bot using Python.
To contribute, you can contact me at: anurag.cse016@gmail.com
Please give a star on my https://github.com/4NUR46 if this answer helps you.
Try this. It's a bit of a hassle, but it might work.
import time
import pyautogui
import pyperclip
import webbrowser

grouporcontact = pyautogui.locateOnScreen("#group/contact", confidence=.6)  # Take a snip of the group or contact name/profile photo
link = pyperclip.paste()

def searchforgroup():
    global link
    time.sleep(5)
    webbrowser.open("https://web.whatsapp.com")
    time.sleep(30)  # Time for you to scan the QR code; if you have already done it, you can lower this to 10 or so
    grouporcontact = pyautogui.locateOnScreen("#group/contact", confidence=.6)
    # Check for a failed match before reading the coordinates
    if grouporcontact is None:
        # Do any other option; in my case I just gave it my usual link
        link = "mymeetlink"
    else:
        x = grouporcontact[0]
        y = grouporcontact[1]
        pyautogui.moveTo(x, y, duration=1)
        pyautogui.click()
# end of searching group

def findlink():
    global link
    meetlink = pyautogui.locateOnScreen("#", confidence=.6)  # Just take another snip of a meet link without the code after the "/"
    # Check for a failed match before reading the coordinates
    if meetlink is None:
        # Do any other option; in my case I just gave it my usual link
        link = "mymeetlink"
    else:
        f = meetlink[0]
        v = meetlink[1]
        pyautogui.moveTo(f, v, duration=.6)
        pyautogui.rightClick()
        pyautogui.moveRel(0, 0, duration=2)  # You have to play with this offset; it depends on your screen size, so edit it until the pointer reaches "Copy Link Address"
        pyautogui.click()
        link = pyperclip.paste()
    webbrowser.open(link)  # to test it out
So now you just have to install pyautogui and pyperclip,
then follow the comments in the snippet and everything should work :)
I'm trying to use the Python SDK for IBM Watson Language Translator v3, testing the beta functionality of translating actual documents. Below is my code:-
from ibm_watson import LanguageTranslatorV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

API = "1234567890abcdefg"
GATEWAY = 'https://gateway-lon.watsonplatform.net/language-translator/api'
document_list = []

"""The below authenticates to the IBM Watson service and initiates an instance"""
authenticator = IAMAuthenticator(API)
language_translator = LanguageTranslatorV3(
    version='2018-05-01',
    authenticator=authenticator
)
language_translator.set_service_url(GATEWAY)
submission = language_translator.translate_document(file="myfile.txt", filename="myfile.txt", file_content_type='text/plain', model_id=None, source='en', target='es', document_id=None)
document_list.append(submission.result['document_id'])
while len(document_list) > 0:
    for document in document_list:
        document_status = language_translator.get_document_status(document)
        if document_status.result['status'] == "available":
            translated_document = language_translator.get_translated_document(document)
            document_list.remove(document)
            language_translator.delete_document(document)
A few questions on this:-
When I check the content of 'translated_document', it doesn't actually contain any content. It contains the headers and the HTTP status of the response but no actual translated content.
I decided to use CURL to download my uploaded document. Instead of the content of the .txt file that was uploaded for translation, the downloaded translated file appears to contain the file name (myfile.txt) that was submitted, as opposed to the content of the file.
Researching this and looking at the actual IBM Watson GitHub repository, it appears that I may have to read the content of 'myfile.txt' into a variable and then pass this variable as 'file={my_variable}' when submitting the translation, but doesn't this defeat the purpose of being able to submit actual documents for translation? How is this different from the conventional service offered?
Can anybody advise me as to what I'm doing wrong? I've tried multiple approaches (writing the value of 'translated_content' to a file, for example), but I just don't seem to be able to grab the translated content, nor can I seem to upload the content of the file to the service; instead I simply appear to submit the filename.
Thanks all
The file parameter of translate_document is supposed to be the actual content to be translated. I realize that's not clear from the documentation, but that's how the service works. So try passing the actual content you want translated in the file parameter.
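A minimal sketch of what that could look like: open the file in binary mode and pass the open file object, not the file name, as file. The helpers submit_document and save_translation are hypothetical names. The second helper assumes that get_translated_document(...).get_result() returns a requests-style response whose content attribute holds the translated bytes, so verify that against the SDK version you use.

```python
import os


def submit_document(translator, path, source='en', target='es'):
    """Upload the contents of `path` for translation and return the document id."""
    # Open in binary mode and pass the file object, not the file name.
    with open(path, 'rb') as handle:
        response = translator.translate_document(
            file=handle,
            filename=os.path.basename(path),
            file_content_type='text/plain',
            source=source,
            target=target,
        )
    return response.get_result()['document_id']


def save_translation(translator, document_id, out_path):
    """Write the translated bytes to `out_path` (assumes a requests-style response)."""
    result = translator.get_translated_document(document_id).get_result()
    with open(out_path, 'wb') as out:
        out.write(result.content)
    return out_path
```

With the client from the question this would be called as submit_document(language_translator, "myfile.txt"), and save_translation(language_translator, document_id, "translated.txt") once the status is "available".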
I am trying to upload image file into the browser using mechanize.
Although there is no error, the uploaded file does not reflect when I check manually in the browser (post submit/saving).
I am using the following code to upload the files
import mechanize as mc
br = mc.Browser()
br.set_handle_robots(False)
br.select_form(nr=0)
br.form.add_file(open("test.png"), content_type="image/png",
                 filename='before', name="ctl00$ContentPlaceHolder1$fileuploadBeforeimages")
br.submit("ctl00$ContentPlaceHolder1$cmdSave")
# this is supposed to save the form on the webpage. It saves the texts in the other fields, whereas the image does not show up.
The add file command seems to work. I can confirm this because when I print br.forms()[0] the file details show up (<FileControl(ctl00$ContentPlaceHolder1$fileuploadBeforeimages=before)>).
But there is no sign of the image file post this code snippet. I have checked several examples which include br.submit() without any specific button control, when I do this no page is saved on the website.
What am I missing?
Thanks in advance.
EDIT
When I manually try to upload the file, I see a pop-up asking for confirmation. Under inspect, this is present as
onchange="if (confirm('Upload ' + this.value + '?')) this.form.submit();"
I am not sure whether this is a JavaScript element that mechanize cannot get past for the upload function. Can someone confirm this?
You can just pass 'rb' as the mode when opening the image, so that it is read as binary, like this:
br.form.add_file(open("test.png", 'rb'), 'image/png', filename, name='file')
I have a subscription to the site https://www.naturalgasintel.com/ for daily feeds of data that show up on their site directly as .txt files; their user login page being https://www.naturalgasintel.com/user/login/
For example a file for today's feed is given by the link https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2019/01/20190104td.txt and shows up on the site like the picture below:
What I'd like to do is to log in using my user_email and user_password and scrape this data in the form of an Excel file.
When I use Twill to try and 'point' me to the data by first logging me into the site I use this code:
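import datetime
from email.mime.text import MIMEText
from subprocess import Popen, PIPE
import twill
from twill.commands import *

# Assumption: NOW was not defined in the original snippet; the slicing below implies an ISO date string such as "2019-01-04".
NOW = str(datetime.date.today())
year = NOW[0:4]
month = NOW[5:7]
day = NOW[8:10]
date = year + month + day
path = "https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/"
end = "td.txt"
# user_email and user_password are expected to be defined before this point.
go("http://www.naturalgasintel.com/user/login")
fv("2", "user[email]", user_email)
fv("2", "user[password]", user_password)
fv("2", "commit", "Login")
datafilelocation = path + year + "/" + month + "/" + date + end
go(datafilelocation)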
However, logging in from the user login page sends me to this referrer link when I go to the data's location.
https://www.naturalgasintel.com/user/login?referer=%2Fext%2Fresources%2FData-Feed%2FDaily-GPI%2F2019%2F01%2F20190104td.txt
Rather than:
https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2019/01/20190104td.txt
I've tried using modules like requests as well to log in from the site and then access this data but whatever method I use sends me to the HTML source rather than the .txt data location itself.
I've posted my complete walk-through with the Python 2.7 module Twill which I attached a bounty to here:
Using Twill to grab .txt from login page Python
What would the best solution to being able to access these password protected files be?
If you have a compatible version of Firefox for this, then get the plugin javascript 0.0.1 by Chee and add the following to run on the page:
document.getElementById('user_email').value = "E-What";
document.getElementById('user_password').value = " ABC Password ";
Change the email and password as you like. It will load the page, then after that it will put in your username and password.
There are other ways to do this all by yourself with your own stand-alone process. You do not have to download other people's programs and try to learn them (beyond this little thing) if you change it this way.
I would have up voted this question.
I have a python script which is trying to export a confluence page as pdf and have tried several methods unsuccessfully:
1.WGET:
wget --ask-password --user xxxxxxxx -O out.pdf -q http://confluence.xxxx.com/spaces/flyingpdf/pdfpageexport.action?pageId=xxxxxxxx
This won't work because it just returns a login dialog rather than the actual pdf.
2.REMOTE API:
Using: https://developer.atlassian.com/confdev/deprecated-apis/confluence-xml-rpc-and-soap-apis/remote-confluence-methods#RemoteConfluenceMethods-Pages
There is an exportSpace method which works but I only want a single page, the getPage method doesn't export to pdf as far as I can tell. Also this is technically deprecated so Atlassian instead recommends:
3.REST API
Using: https://docs.atlassian.com/atlassian-confluence/REST/latest-server/
This doesn't seem to have an option to export a page as PDF
I would appreciate an answer that makes any of these methods work or if you have a completely different approach, I don't care as long as I can get the PDF of the page from a python script.
# This worked for me
# pip install atlassian-python-api
from atlassian import Confluence

# Create a connection object with your Confluence URL and credentials.
confluence = Confluence(
    url='https://confluence.xxxxx.com/',
    username='xxxxxxx',
    password='yyyyyyy')
# You can find the page id by going to the "Page Information" menu tab; it will be visible in the browser as viewinfo.action?pageId=244444444444.
# get_page_by_id returns a response of key:value pairs with the page details.
page = confluence.get_page_by_id(page_id=<some_id>)
your_fname = "abc.pdf"

# A function to create a PDF from the byte-stream response
def save_file(content):
    with open(your_fname, 'wb') as file_pdf:
        file_pdf.write(content)
    print("Completed")

# Get your Confluence page as a byte stream
response = confluence.get_page_as_pdf(page['id'])
# Call the function that creates the PDF and saves the file using the byte-stream response received above.
save_file(content=response)
Building on Michael Pillai's answer, I want to add that for Confluence Cloud you must pass the api_version='cloud' keyword:
confluence = Confluence(
    url='https://confluence.xxxxx.com/',
    username='xxxxxxx',
    password='yyyyyyy',
    api_version='cloud'  # <<< important for the pdf export <<<<
)
content = confluence.get_page_as_pdf(page_id)
Copied from the official pdf export example.
You can integrate your script with the Bob Swift Confluence CLI plugin. This plugin supports various types of exports.
Step 1: Install the plugin in both places, the frontend and the backend.
Step 2: Verify your installation with this command:
/location-of-plugin-installation-directory/.confluence.sh --action getServerInfo
Step 3: Use the command below to export your space
/location-of-plugin-installation-directory/.confluence.sh --action exportSpace --space "zconfluencecli" --file "target/output/export/exportSpacepdf.txt" --exportType "PDF"
Link to bob swift plugin
The newest docs use
confluence.export_page(page_id)
For anyone still looking for this info.