How to get resulting URL from search? - python

I am trying to write a program that runs a chemical search on https://echa.europa.eu/ and retrieves the result. The "Search for Chemicals" field is in the middle of the main page. I want to get the resulting URL for each chemical by providing its CAS number (e.g. 67-56-1). However, the URL I get does not include the CAS number I provided:
https://echa.europa.eu/search-for-chemicals?p_p_id=disssimplesearch_WAR_disssearchportlet&p_p_lifecycle=0&_disssimplesearch_WAR_disssearchportlet_searchOccurred=true&_disssimplesearch_WAR_disssearchportlet_sessionCriteriaId=dissSimpleSearchSessionParam101401584308302720
I tried inserting a different CAS number (71-23-8) into the "p_p_id" field, but it didn't give the expected search result:
https://echa.europa.eu/search-for-chemicals?p_p_id=71-23-8
I also examined the headers of the GET requests sent from Chrome, and they did not include the CAS number either.
Is the website using variables to store the input query? Is there a way, or a tool, to get the resulting URL for a search on a given CAS number?
Once I figure this out, I'll be using Python to get the data and save it as an Excel file.
Thanks.

You need to get the JSESSIONID cookie by requesting the main URL once, then send a POST to https://echa.europa.eu/search-for-chemicals. The POST also needs several required parameters.
Using curl and bash:
query="71-23-8"
millis=$(($(date +%s%N)/1000000))
curl -s -I -c cookie.txt 'https://echa.europa.eu/search-for-chemicals'
curl -s -L -b cookie.txt 'https://echa.europa.eu/search-for-chemicals' \
--data-urlencode "p_p_id=disssimplesearch_WAR_disssearchportlet" \
--data-urlencode "p_p_lifecycle=1" \
--data-urlencode "p_p_state=normal" \
--data-urlencode "p_p_col_id=column-1" \
--data-urlencode "p_p_col_count=2" \
--data-urlencode "_disssimplesearch_WAR_disssearchportlet_javax.portlet.action=doSearchAction" \
--data-urlencode "_disssimplesearch_WAR_disssearchportlet_backURL=https://echa.europa.eu/home?p_p_id=disssimplesearchhomepage_WAR_disssearchportlet&p_p_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_count=2" \
--data-urlencode "_disssimplesearchhomepage_WAR_disssearchportlet_sessionCriteriaId=" \
--data "_disssimplesearchhomepage_WAR_disssearchportlet_formDate=$millis" \
--data "_disssimplesearch_WAR_disssearchportlet_searchOccurred=true" \
--data "_disssimplesearch_WAR_disssearchportlet_sskeywordKey=$query" \
--data "_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer=on" \
--data "_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox=on"
Using Python and scraping with BeautifulSoup:
import requests
from bs4 import BeautifulSoup
import time
url = 'https://echa.europa.eu/search-for-chemicals'
query = '71-23-8'
s = requests.Session()
s.get(url)
r = s.post(url,
    params = {
        "p_p_id": "disssimplesearch_WAR_disssearchportlet",
        "p_p_lifecycle": "1",
        "p_p_state": "normal",
        "p_p_col_id": "column-1",
        "p_p_col_count": "2",
        "_disssimplesearch_WAR_disssearchportlet_javax.portlet.action": "doSearchAction",
        "_disssimplesearch_WAR_disssearchportlet_backURL": "https://echa.europa.eu/home?p_p_id=disssimplesearchhomepage_WAR_disssearchportlet&p_p_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_count=2",
        "_disssimplesearchhomepage_WAR_disssearchportlet_sessionCriteriaId": ""
    },
    data = {
        "_disssimplesearchhomepage_WAR_disssearchportlet_formDate": int(round(time.time() * 1000)),
        "_disssimplesearch_WAR_disssearchportlet_searchOccurred": "true",
        "_disssimplesearch_WAR_disssearchportlet_sskeywordKey": query,
        "_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer": "on",
        "_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox": "on"
    }
)
soup = BeautifulSoup(r.text, "html.parser")
table = soup.find("table")
data = [
    (
        t[0].find("a").text.strip(),
        t[0].find("a")["href"],
        t[0].find("div", {"class":"substanceRelevance"}).text.strip(),
        t[1].text.strip(),
        t[2].text.strip(),
        t[3].find("a")["href"] if t[3].find("a") else "",
        t[4].find("a")["href"] if t[4].find("a") else "",
    )
    for t in (t.find_all('td') for t in table.find_all("tr"))
    if len(t) > 0 and t[0].find("a") is not None
]
print(data)
Note that I've set the timestamp parameter (formDate) in case it is actually checked on the server.

Related

InfluxDB PythonAPI broken or am I?

Does Stack Overflow really auto-delete "Hey guys" from the beginning of a post? :D Hello, I have a problem I can't seem to wrap my mind around.
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
# You can generate a Token from the "Tokens Tab" in the UI
org = "myorg"
bucket = "mybucket"
token = 'valid_token'
client = InfluxDBClient(url="http://localhost:8086", token=token)
write_api = client.write_api(write_options=SYNCHRONOUS)
d='airSensors,sensor_id=TLM0201 temperature=70.97038159354763,humidity=55.23103248356096,co=0.78445310567793615 1637124357000000000'
write_api.write(bucket, org, d)
This runs and returns no error. I tried making a deliberate mistake, e.g. in the bucket name, and it raises; a bad token raises unauthorized, etc.
BUT there is no data in the database when I check. When I run this exact line through curl:
curl --request POST \
"http://localhost:8086/api/v2/write?org=myorg&bucket=mybucket&precision=ns" \
--header "Authorization: Token valid_token" \
--header "Content-Type: text/plain; charset=utf-8" \
--header "Accept: application/json" \
--data-binary '
airSensors,sensor_id=TLM0201 temperature=73.97038159354763,humidity=35.23103248356096,co=0.48445310567793615 1637024357000000000
airSensors,sensor_id=TLM0202 temperature=75.30007505999716,humidity=35.651929918691714,co=0.5141876544505826 1637024357000000000
'
This also runs with no errors, but this time it actually writes to the database.
Am I crazy or what? I tried everything, writing through Points, series, ... you name it, but it refuses to commit or something. Has anyone had a similar problem?
I run influxdb-client=1.23.0 on python=3.8.10 and InfluxDB=2.0.7.
Thanks for your time. Q.
I guess you should call write_api.close() at the end of your write, or use a with block:
with client.write_api() as write_api:
    write_api.write(bucket, org, d)
https://github.com/influxdata/influxdb-client-python#writes

Duo API Bash Call

I'm trying to use curl to make a call to the Duo API.
I reviewed their docs here: https://duo.com/docs/adminapi#authentication
The docs say to pass the credentials as an HMAC key for the request, but I'm not sure how to get that going.
This is what I've got so far:
curl --request GET \
--header 'Authorization: Basic 'Integration key:Secret key'' \
--header "Content-Type: application/x-www-form-urlencoded" \
"https://api-12345678.duosecurity.com/auth/v2/check"
Which returns
{"code": 40101, "message": "Missing request credentials", "stat": "FAIL"}
Can someone point me in the right direction with an example in Bash? If not, in Python.
First, your request format does not seem correct, because Integration key:Secret key'' is outside the header (look at the way the syntax is highlighted in the question).
Try:
curl --request GET \
--header 'Authorization: Basic' \
--header 'Integration key: Secret key' \
--header 'Date: Tue, 21 Aug 2012 17:29:18 -0000' \
--header "Content-Type: application/x-www-form-urlencoded" \
"https://api-12345678.duosecurity.com/auth/v2/check"
It's somewhat uncommon to have header names with a space and a lowercase like Integration key, so you may need to experiment with variants, like Integration-Key.
Second, the 401xx series errors mean:
401 The “Authorization”, “Date”, and/or “Content-Type” headers were missing or invalid.
You'll need to add the missing Date header, which the authenticator requires.
In case anyone else stumbles on this, here's what I came up with:
#!/bin/bash -u
FORM="Content-Type: application/x-www-form-urlencoded"
NOW=$(date -R)
#get these from the Duo Admin interface
INT="<integration key>"
KEY="<secret passcode>"
API="<api host>.duosecurity.com"
URL="/auth/v2/check"
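#canonical request: date, method, host, path, then an empty parameter line
#printf is used because bash's builtin echo does not expand \n without -e
#could also use awk here, or the --binary mode as suggested elsewhere
HMAC=$(printf '%s\n%s\n%s\n%s\n' "$NOW" "GET" "$API" "$URL" | openssl sha1 -hmac "$KEY" | cut -d" " -f 2)
AUTH=$(echo -n "$INT:$HMAC" | base64 -w0)
curl -s -H "Date: $NOW" -H "$FORM" -H "Authorization: Basic $AUTH" "https://$API$URL"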
Running this yields:
{"response": {"time": 1539726254}, "stat": "OK"}
Reference: Duo Api docs section on authentication
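Since the question also asks about Python, here is a minimal sketch of the same signing step using only the standard library. It assumes a request with no parameters, as in the Bash script above. The function name duo_auth_headers is hypothetical and not part of any Duo library. It only builds the headers; sending the request is left to whatever HTTP client you use.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def duo_auth_headers(ikey, skey, host, path, method="GET", date=None):
    """Build Date/Authorization headers for a parameterless Duo request.

    Hypothetical helper mirroring the Bash script: it signs
    date, method, host, path and an empty parameter line with HMAC-SHA1.
    """
    # RFC 2822 date, e.g. "Tue, 21 Aug 2012 17:29:18 -0000"
    date = date or formatdate()
    # Canonical request: one field per line; the trailing "" is the
    # (empty) parameter line, so the string ends with a newline.
    canon = "\n".join([date, method.upper(), host.lower(), path, ""])
    sig = hmac.new(skey.encode(), canon.encode(), hashlib.sha1).hexdigest()
    token = base64.b64encode(f"{ikey}:{sig}".encode()).decode()
    return {
        "Date": date,
        "Content-Type": "application/x-www-form-urlencoded",
        "Authorization": f"Basic {token}",
    }


if __name__ == "__main__":
    # Placeholder credentials; take the real ones from the Duo Admin interface.
    headers = duo_auth_headers("<integration key>", "<secret key>",
                               "<api host>.duosecurity.com", "/auth/v2/check")
    for name, value in headers.items():
        print(f"{name}: {value}")
```

The returned dictionary can be passed as the headers of a GET request to https://<api host>.duosecurity.com/auth/v2/check.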

Python requests module: Equivalent of cURL --data command

I'm trying to write a script that imitates a cURL command I make to change data on the target web page:
curl -u username:password "https://website.com/update/" --data "simChangesList=%5B%7B%22simId%22%3A760590802%2C%22changeType%22%3A2%2C%22targetValue%22%3A%220003077%22%2C%22effectiveDate%22%3Anull%7D%5D" --compressed
As you can see above, I am POSTing a url-encoded string to the target web page.
The following code does not work:
import requests
import urllib
enc = urllib.quote('[{"simId":760590802,"changeType":2,"targetValue":000307,"effectiveDate":null}]')
simChangesList = 'simChangesList=' + enc
print simChangesList
auth = s.post(url, data=simChangesList)
print auth.text
I'm fairly certain the above code imitates my cURL command, but it obviously doesn't.
I am getting a Required List parameter 'simChangesList' is not present error.
What is the equivalent of the cURL command to POST a url-encoded string with the requests module in Python?
EDIT:
I've tried to make multiple dictionaries with simChangesList as the key, but I cannot seem to do it.
Here are my attempts:
simChangesList: [{"simId":760590802,"changeType":2,"targetValue":000307,"effectiveDate":null}]
data = {'simChangesList': ['simId': 760590802, 'changeType': 2, 'targetValue': '0003077', 'effectiveDate': null]}
data['simChangesList'] = ['simId': 760590802, 'changeType': 2, 'targetValue': '0003077', 'effectiveDate': null]
simChangesList:[{"simId":760590802,"changeType":2,"targetValue":"000307","effectiveDate":null}]
payload = {
'simChangesList':
[{'simId': '760590802',
'changeType': '2',
'targetValue': '0003077',
'effectiveDate': 'null'}]
}
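One likely cause (an assumption, since the server is not available to test against): when requests is given a plain string as data, it sends the bytes as-is and does not set Content-Type: application/x-www-form-urlencoded, whereas curl --data does. The sketch below uses only the standard library. It shows that JSON-encoding the list and form-encoding it under the simChangesList key reproduces the curl body exactly. The commented call at the end shows how the same dictionary would be handed to requests.

```python
import json
from urllib.parse import urlencode

# The change list as a Python structure; None becomes JSON null and
# targetValue stays a string so its leading zeros survive.
changes = [{
    "simId": 760590802,
    "changeType": 2,
    "targetValue": "0003077",
    "effectiveDate": None,
}]

# Compact separators match the JSON that was encoded in the curl command.
form = {"simChangesList": json.dumps(changes, separators=(",", ":"))}

# This is the same body that curl sends with --data.
body = urlencode(form)
print(body)

# With requests, passing the dict lets the library do this encoding and
# set the form Content-Type header (credentials are placeholders):
#   requests.post("https://website.com/update/", data=form,
#                 auth=("username", "password"))
```

Note that the value of simChangesList is a JSON string, not a nested Python list; a list value in the data dictionary would be sent as repeated form fields instead.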

An idea or module to generate structured text in Python?

First of all, I'm sorry, because I wasn't sure how to phrase my question. Recently, in a security challenge, I was sending requests with curl. After a while spent on many manual tests to figure out how the challenge really works, I tried to write some Python code to generate my requests automatically and save some time.
Here are some of the requests I tried:
The basic one:
curl http://10.20.0.50:80/
Then I have to specify the path, for example:
curl http://10.20.0.50:80/transfert/solde
curl http://10.20.0.50:80/account/creat
...
Sometimes I add an authorization header or a cookie:
curl http://10.20.0.50:80/transfert/solde -H "Authorization:Basic bXlhcGk6U3VwZXJTZWNyZXRQYXMkdzByZA==" -H "cookie: PHPSESSID=23c3jc3spuh27cru38kf9l2au5;"
Or add some parameters:
curl http://10.20.0.50:80/transfert/solde -H "Authorization:Basic bXlhcGk6U3VwZXJTZWNyZXRQYXMkdzByZA==" -H "cookie: PHPSESSID=23c3jc3spuh27cru38kf9l2au5;" --data-raw '{"id":"521776"}' -v
So the thing is, I have to test a lot of things: with and without authorization, with and without a cookie, sometimes changing the cookie, and adding --data-raw ... I tried to write a script to do this for me, but it's ugly:
url = "http://10.20.0.50:80/"
auth = ' -H "Authorization:Basic bXlhcGk6U3VwZXJTZWNyZXRQYXMkdzByZA=="'
def generate(path,c=None,h=True,plus = None):
    #c cookie , h if we put authentification
    #plus add more code at the end of the request
    global auth # authentification
    global url
    if c:
        cook = ' -H "cookie: PHPSESSID={};"'.format(c)
    req = "curl "+url+path
    if h:#h bool
        req += auth
    if c :
        req += cook
    if plus :
        req += plus
    req+=" -v "
    return req
I removed one parameter, the --data-raw one, for readability. I would like to know whether there is a better way of doing this, and not just for this example but in general: for instance, if I want to write Python code that generates the source code of a class, where I specify the class name, the attributes and their types, and the code generates a template ...
I hope you can help me :D
PS: Sorry for my English if I made some mistakes.
Maybe one way to "improve" your code is to do something like this:
def generate(command = "", headers = None, raws = None, other = None, v = True):
    # iterate over the function's own parameters, not the globals h and raw
    if headers:
        command += "".join(" -H " + k for k in headers)
    if raws:
        command += "".join(" --data-raw " + k for k in raws)
    if v:
        command += " -v"
    if other:
        command += "".join(" " + k for k in other)
    return command
h = ['"Authorization:Basic bXlhcGk6U3VwZXJTZWNyZXRQYXMkdzByZA=="', '"cookie: PHPSESSID=23c3jc3spuh27cru38kf9l2au5;"']
raw = ["'{\"id\":\"521776\"}'"]
cmd = "curl http://10.20.0.50:80/transfert/solde"
command1 = generate(command=cmd,headers=h,raws= raw)
command2 = generate(command=cmd,headers=h,raws=raw, v=False)
command3 = generate(command=cmd,v = False)
print("command1:",command1)
print("command2:", command2)
print("command3:", command3)
Output:
command1: curl http://10.20.0.50:80/transfert/solde -H "Authorization:Basic bXlhcGk6U3VwZXJTZWNyZXRQYXMkdzByZA==" -H "cookie: PHPSESSID=23c3jc3spuh27cru38kf9l2au5;" --data-raw '{"id":"521776"}' -v
command2: curl http://10.20.0.50:80/transfert/solde -H "Authorization:Basic bXlhcGk6U3VwZXJTZWNyZXRQYXMkdzByZA==" -H "cookie: PHPSESSID=23c3jc3spuh27cru38kf9l2au5;" --data-raw '{"id":"521776"}'
command3: curl http://10.20.0.50:80/transfert/solde
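Another option, sketched here as an alternative rather than a correction of the answer above, is to collect the arguments in a list and let shlex.join from the standard library (Python 3.8+) handle the quoting. The function name build_curl and its parameters are made up for this example.

```python
import shlex


def build_curl(url, headers=(), data=None, verbose=False):
    """Return a shell-ready curl command line (hypothetical helper)."""
    argv = ["curl", url]
    for header in headers:
        argv += ["-H", header]
    if data is not None:
        argv += ["--data-raw", data]
    if verbose:
        argv.append("-v")
    # shlex.join quotes every argument that needs it for a POSIX shell.
    return shlex.join(argv)


base = "http://10.20.0.50:80/"
auth = "Authorization:Basic bXlhcGk6U3VwZXJTZWNyZXRQYXMkdzByZA=="
cookie = "cookie: PHPSESSID=23c3jc3spuh27cru38kf9l2au5;"

cmd = build_curl(base + "transfert/solde", headers=[auth, cookie],
                 data='{"id":"521776"}', verbose=True)
print(cmd)
```

Keeping the arguments as a list until the last moment means the headers and the JSON body no longer need hand-written quotes, and the same list could be passed directly to subprocess.run if you want to execute the request instead of printing it.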

Problems with python + json vs. curl

When I run the Python code, the server (Google) gives me a different response than when I run the curl command. Can someone tell me where I'm going wrong, please?
Code:
import urllib2, simplejson
def MapsWIFI(card):
    req = urllib2.Request("https://www.googleapis.com/geolocation/v1/geolocate?key=AI...")
    jWifi = """
    {
     "wifiAccessPoints": [
      {
       "macAddress": "64:D1:A3:0A:11:65",
       "channel": 6,
      },
      ... #some AP here
     ]
    }
    """
    print jWifi
    req.add_header("Content-Type", "application/json")
    jWifiReport = urllib2.urlopen(req,simplejson.dumps(jWifi)).read()
    print jWifiReport
    APdetected = str(len(wifiCell))
    mapsDict = simplejson.loads(jWifiReport)
    location = str(mapsDict.get("location",{}))[1:-1]
    accuracy = "Accuracy: "+str(mapsDict.get("accuracy",{}))[1:-1]
    mapMe = "|---"+location.split(",")[0]+"\n|---"+location.split(",")[1][1:]+"\n|---$
    return mapMe
MapsWIFI("wlp8s0")
And the command is:
curl -d @file2.json -H "Content-Type: application/json" -i "https://www.googleapis.com/geolocation/v1/geolocate?key=AI..."
where file2.json contains exactly jWifi in that format.
The problem is that, as I said, the location returned by the code is different from the location returned by curl. I don't get an error code, so I think the syntax is correct.
The data is already a JSON encoded string, you don't want to encode it twice.
Pass it in without encoding it again:
jWifiReport = urllib2.urlopen(req, jWifi).read()
You only need to encode if you have a Python data structure (a dictionary in this case).
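To see why the double encoding matters, here is a small sketch using the standard json module (used here as the Python 3 stand-in for simplejson, which is an assumption about your setup). Encoding a dictionary once gives JSON text for an object; encoding that text again gives JSON text for a string literal, which is not the object the API expects.

```python
import json

payload = {
    "wifiAccessPoints": [
        {"macAddress": "64:D1:A3:0A:11:65", "channel": 6},
    ]
}

once = json.dumps(payload)   # JSON text describing an object
twice = json.dumps(once)     # JSON text describing a *string*

print(type(json.loads(once)).__name__)   # dict
print(type(json.loads(twice)).__name__)  # str
```

In the question, jWifi is already JSON text, so it corresponds to once here and should be sent as-is.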
