Changing text variable from another imported script - python

So we have two scripts: the first is AdidasStock.py and the second is StockWindow.py. I am trying to replace the base URL in getVarientStock from StockWindow.py. Once again my apologies, I am really new to Python.
I am getting an error:
aulocale1() takes exactly 2 arguments (1 given)
class AdidasStock:
    def __init__(self, clientId, sku):
        self.session = requests.session()
        self.headers = {"User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36",
                        "Accept-Language" : "REPLACETHISPLIZZZ"}
        self.locale = ''
        self.clientId = clientId
        self.sku = sku
        self.skus = []

    def getVarientStock(self, sku, base):
        base = "http://www.adidas.com.au/on/demandware.store/Sites-adidas-AU-Site/en_AU"
        urlVariantStock = base + '/Product-GetVariants?pid=' + sku
        r = requests.get(urlVariantStock, headers=self.headers)
Here is how I am trying to change the above base, self.locale, and a portion of self.headers. I am using a Tkinter Checkbutton to trigger this function.
Checkbutton
aulocale = IntVar()
aucheck = Checkbutton(self.master, variable=aulocale, onvalue=1, offvalue=0, text="AU",command=self.aulocale1)
This is the Function
def aulocale1(self, base):
    base.replace = "http://www.adidas.com.au/on/demandware.store/Sites-adidas-AU-Site/en_AU"
    self.locale.replace = ('','AU')
    self.headers.replace = ('REPLACETHISPLIZZZ','en-AU,en;q=0.8')

def uklocale1(self, base):
    base.replace = "www.adidas.co.uk/on/demandware.store/Sites-adidas-GB-Site/en_GB"
    self.locale.replace = ('','GB')
    self.headers.replace = ('REPLACETHISPLIZZZ','en-GB,en;q=0.8')

The function def aulocale1(self, base): expects one argument (base), but when you assign it to the Checkbutton using command=self.aulocale1, the Checkbutton executes it without arguments: it runs self.aulocale1().
You can assign a function with arguments to command using lambda:
command=lambda: self.aulocale1("argument")
(BTW: if you use lambda in a for loop, you will run into other problems ;) )
base is a local variable, so you can't change it from outside ... but you can call this function with a base argument, so you can give that argument a default value:
def getVarientStock(self, sku, base="http://www.adidas.com.au/ ..."):
    urlVariantStock = base + '/Product-GetVariants?pid=' + sku
    r = requests.get(urlVariantStock, headers=self.headers)
If you run it without base
getVarientStock("XX")
then it uses "http://www.adidas.com.au/ ..." as base
but if you run it with second argument
getVarientStock("XX", "http://stackoverflow.com")
then it uses "http://stackoverflow.com" as base
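Putting the two fixes together, a minimal standalone sketch (the function name and URLs here are illustrative, stripped of the class and `requests` call, just to show the default-argument behavior):

```python
def get_variant_url(sku, base="http://www.adidas.com.au/on/demandware.store/Sites-adidas-AU-Site/en_AU"):
    # `base` falls back to the AU URL unless the caller overrides it
    return base + '/Product-GetVariants?pid=' + sku

print(get_variant_url("XX"))
# the second argument overrides the default:
print(get_variant_url("XX", "http://example.com"))

# In the Tkinter code, the same idea would be wired up with a lambda, e.g.:
# Checkbutton(..., command=lambda: self.aulocale1("http://www.adidas.com.au/..."))
```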

Related

Recreating python mechanize script in R

I'd like to recreate the Python script below, which uses mechanize and http.cookiejar, in R. I thought it would be straightforward using rvest, but I was unable to do so. Any insight on which packages to use and how to apply them would be extremely helpful. I realize reticulate may be a possibility, but I figure there has to be a straightforward way to do this in R.
import mechanize
import http.cookiejar
b = mechanize.Browser()
b.set_handle_refresh(True)
b.set_debug_redirects(True)
b.set_handle_redirect(True)
b.set_debug_http(True)
cj = http.cookiejar.CookieJar()
b.set_cookiejar(cj)
b.addheaders = [
('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36'),
('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
('Host', 'www.fangraphs.com'),
('Referer', 'https://www.fangraphs.com/auctiontool.aspx?type=pit&proj=atc&pos=1,1,1,1,5,1,1,0,0,1,5,5,0,18,0&dollars=400&teams=12&mp=5&msp=5&mrp=5&mb=1&split=&points=c|0,1,2,3,4,5|0,1,2,3,4,5&lg=MLB&rep=0&drp=0&pp=C,SS,2B,3B,OF,1B&players=')
]
b.open("https://www.fangraphs.com/auctiontool.aspx?type=pit&proj=atc&pos=1,1,1,1,5,1,1,0,0,1,5,5,0,18,0&dollars=400&teams=12&mp=5&msp=5&mrp=5&mb=1&split=&points=c|0,1,2,3,4,5|0,1,2,3,4,5&lg=MLB&rep=0&drp=0&pp=C,SS,2B,3B,OF,1B&players=")
def is_form1_form(form):
    return "id" in form.attrs and form.attrs['id'] == "form1"
b.select_form(predicate=is_form1_form)
b.form.find_control(name='__EVENTTARGET').readonly = False
b.form.find_control(name='__EVENTARGUMENT').readonly = False
b.form['__EVENTTARGET'] = 'AuctionBoard1$cmdCSV'
b.form['__EVENTARGUMENT'] = ''
print(b.submit().read())
The R code I was using to attempt to recreate this with rvest is below. The comments indicate the main sources of my confusion. In particular, the fields the Python code grabs were not showing up when I grabbed the form with rvest, and when I tried to insert them manually I got a Connection Refused error upon submitting.
library(rvest)
atc.pitcher.link = "https://www.fangraphs.com/auctiontool.aspx?type=pit&proj=atc&pos=1,1,1,1,5,1,1,0,0,1,5,5,0,18,0&dollars=400&teams=12&mp=5&msp=5&mrp=5&mb=1&split=&points=c|0,1,2,3,4,5|0,1,2,3,4,5&lg=MLB&rep=0&drp=0&pp=C,SS,2B,3B,OF,1B&players="
proj.data = html_session(atc.pitcher.link)
form.unfilled = proj.data %>% html_node("form") %>% html_form()
# note: I am surprised "__EVENTTARGET" and "__EVENTARGUMENT" are not included as attributes of the unfilled form. I can select them in the posted python script.
# If I try to create them with the appropriate values I get a Connection Refused error.
form.unfilled[[5]]$`__EVENTTARGET` = form.unfilled[[5]]$`__VIEWSTATE`
form.unfilled[[5]]$`__EVENTARGUMENT`= form.unfilled[[5]]$`__VIEWSTATE`
form.unfilled[[5]]$`__EVENTTARGET`$readonly = FALSE
form.unfilled[[5]]$`__EVENTTARGET`$value = "AuctionBoard1$cmdCSV"
form.unfilled[[5]]$`__EVENTARGUMENT`$value = ""
form.unfilled[[5]]$`__EVENTARGUMENT`$readonly = FALSE
form.filled = form.unfilled
session = submit_form(proj.data, form.filled)
Here is a way to do it using RSelenium, setting Chrome to be headless and enabling remote download to your working directory. It automatically brings up a headless browser and then lets the code drive it.
I believe that to do the equivalent in rvest you would need to write some native phantomjs.
library(RSelenium)
library(wdman)
eCaps <- list(
  chromeOptions = list(
    args = c('--headless', '--disable-gpu', '--window-size=1280,800'),
    prefs = list(
      "profile.default_content_settings.popups" = 0L,
      "download.prompt_for_download" = FALSE,
      "download.default_directory" = getwd()
    )
  )
)
cDrv <- wdman::chrome()
rD <- RSelenium::rsDriver(extraCapabilities = eCaps)
remDr <- rD$client
remDr$queryRD(
  ipAddr = paste0(remDr$serverURL, "/session/", remDr$sessionInfo[["id"]], "/chromium/send_command"),
  method = "POST",
  qdata = list(
    cmd = "Page.setDownloadBehavior",
    params = list(
      behavior = "allow",
      downloadPath = getwd()
    )
  )
)
atc.pitcher.link= "http://www.fangraphs.com/auctiontool.aspx?type=pit&proj=atc&pos=1,1,1,1,5,1,1,0,0,1,5,5,0,18,0&dollars=400&teams=12&mp=5&msp=5&mrp=5&mb=1&split=&points=c|0,1,2,3,4,5|0,1,2,3,4,5&lg=MLB&rep=0&drp=0&pp=C,SS,2B,3B,OF,1B&players="
remDr$navigate(atc.pitcher.link)
# sleep to be nice and give things time to load
Sys.sleep(8)
# find the button the page we want to click
option <- remDr$findElement('id', 'AuctionBoard1_cmdCSV')
#click it
option$clickElement()
list.files(getwd(),pattern = 'sysdata')
remDr$closeall()
cDrv$stop()
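For completeness: what the original mechanize script does is an ASP.NET postback, i.e. it scrapes the hidden form fields and re-submits them with `__EVENTTARGET` overridden. A minimal Python sketch of just that payload construction (field values are placeholders; whether the server accepts such a POST depends on its validation):

```python
def build_postback(hidden_fields, event_target, event_argument=""):
    # Start from the hidden fields scraped out of the page (e.g. __VIEWSTATE),
    # then override the two fields mechanize sets before submitting.
    payload = dict(hidden_fields)
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload

# Placeholder values stand in for the real scraped hidden fields.
scraped = {"__VIEWSTATE": "placeholder", "__EVENTVALIDATION": "placeholder"}
payload = build_postback(scraped, "AuctionBoard1$cmdCSV")
print(payload["__EVENTTARGET"])
```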

python requests enable cookies/javascript

I am trying to download an Excel file from a specific website. On my local computer it works perfectly:
>>> r = requests.get('http://www.health.gov.il/PublicationsFiles/IWER01_2004.xls')
>>> r.status_code
200
>>> r.content
b'\xd0\xcf\x11\xe0\xa1\xb1...\x00\x00' # Long binary string
But when I run it on a remote Ubuntu server, I get a message about enabling cookies/javascript.
r = requests.get('http://www.health.gov.il/PublicationsFiles/IWER01_2004.xls')
>>> r.status_code
200
>>> r.content
b'<HTML>\n<head>\n<script>\nChallenge=141020;\nChallengeId=120854618;\nGenericErrorMessageCookies="Cookies must be enabled in order to view this page.";\n</script>\n<script>\nfunction test(var1)\n{\n\tvar var_str=""+Challenge;\n\tvar var_arr=var_str.split("");\n\tvar LastDig=var_arr.reverse()[0];\n\tvar minDig=var_arr.sort()[0];\n\tvar subvar1 = (2 * (var_arr[2]))+(var_arr[1]*1);\n\tvar subvar2 = (2 * var_arr[2])+var_arr[1];\n\tvar my_pow=Math.pow(((var_arr[0]*1)+2),var_arr[1]);\n\tvar x=(var1*3+subvar1)*1;\n\tvar y=Math.cos(Math.PI*subvar2);\n\tvar answer=x*y;\n\tanswer-=my_pow*1;\n\tanswer+=(minDig*1)-(LastDig*1);\n\tanswer=answer+subvar2;\n\treturn answer;\n}\n</script>\n<script>\nclient = null;\nif (window.XMLHttpRequest)\n{\n\tvar client=new XMLHttpRequest();\n}\nelse\n{\n\tif (window.ActiveXObject)\n\t{\n\t\tclient = new ActiveXObject(\'MSXML2.XMLHTTP.3.0\');\n\t};\n}\nif (!((!!client)&&(!!Math.pow)&&(!!Math.cos)&&(!![].sort)&&(!![].reverse)))\n{\n\tdocument.write("Not all needed JavaScript methods are supported.<BR>");\n\n}\nelse\n{\n\tclient.onreadystatechange = function()\n\t{\n\t\tif(client.readyState == 4)\n\t\t{\n\t\t\tvar MyCookie=client.getResponseHeader("X-AA-Cookie-Value");\n\t\t\tif ((MyCookie == null) || (MyCookie==""))\n\t\t\t{\n\t\t\t\tdocument.write(client.responseText);\n\t\t\t\treturn;\n\t\t\t}\n\t\t\t\n\t\t\tvar cookieName = MyCookie.split(\'=\')[0];\n\t\t\tif (document.cookie.indexOf(cookieName)==-1)\n\t\t\t{\n\t\t\t\tdocument.write(GenericErrorMessageCookies);\n\t\t\t\treturn;\n\t\t\t}\n\t\t\twindow.location.reload(true);\n\t\t}\n\t};\n\ty=test(Challenge);\n\tclient.open("POST",window.location,true);\n\tclient.setRequestHeader(\'X-AA-Challenge-ID\', ChallengeId);\n\tclient.setRequestHeader(\'X-AA-Challenge-Result\',y);\n\tclient.setRequestHeader(\'X-AA-Challenge\',Challenge);\n\tclient.setRequestHeader(\'Content-Type\' , \'text/plain\');\n\tclient.send();\n}\n</script>\n</head>\n<body>\n<noscript>JavaScript must be enabled in order to view 
this page.</noscript>\n</body>\n</HTML>'
Locally I run macOS, which has Chrome installed (I'm not actively using it for the script, but maybe it's related?); remotely I run Ubuntu on DigitalOcean without any GUI browser installed.
The behavior of requests has nothing to do with what browsers are installed on the system; it does not depend on or interact with them in any way.
The problem here is that the resource you are requesting has some kind of "bot mitigation" mechanism enabled to prevent just this kind of access. It returns some javascript with logic that needs to be evaluated, and the results of that logic are then used for an additional request to "prove" you're not a bot.
Luckily, it appears that this specific mitigation mechanism has been solved before, and I was able to quickly get this request working utilizing the challenge-solving functions from that code:
from math import cos, pi, floor
import requests
URL = 'http://www.health.gov.il/PublicationsFiles/IWER01_2004.xls'
def parse_challenge(page):
    """
    Parse a challenge given by mmi and mavat's web servers, forcing us to solve
    some math stuff and send the result as a header to actually get the page.
    This logic is pretty much copied from https://github.com/R3dy/jigsaw-rails/blob/master/lib/breakbot.rb
    """
    top = page.split('<script>')[1].split('\n')
    challenge = top[1].split(';')[0].split('=')[1]
    challenge_id = top[2].split(';')[0].split('=')[1]
    return {'challenge': challenge, 'challenge_id': challenge_id, 'challenge_result': get_challenge_answer(challenge)}

def get_challenge_answer(challenge):
    """
    Solve the math part of the challenge and get the result
    """
    arr = list(challenge)
    last_digit = int(arr[-1])
    arr.sort()
    min_digit = int(arr[0])
    subvar1 = (2 * int(arr[2])) + int(arr[1])
    subvar2 = str(2 * int(arr[2])) + arr[1]
    power = ((int(arr[0]) * 1) + 2) ** int(arr[1])
    x = (int(challenge) * 3 + subvar1)
    y = cos(pi * subvar1)
    answer = x * y
    answer -= power
    answer += (min_digit - last_digit)
    answer = str(int(floor(answer))) + subvar2
    return answer
def main():
    s = requests.Session()
    r = s.get(URL)
    if 'X-AA-Challenge' in r.text:
        challenge = parse_challenge(r.text)
        r = s.get(URL, headers={
            'X-AA-Challenge': challenge['challenge'],
            'X-AA-Challenge-ID': challenge['challenge_id'],
            'X-AA-Challenge-Result': challenge['challenge_result']
        })
        yum = r.cookies
        r = s.get(URL, cookies=yum)
    print(r.content)

if __name__ == '__main__':
    main()
You can use this code to avoid the block; it uses requests-html to render the JavaScript:
from requests_html import HTMLSession

url = 'your url come here'
s = HTMLSession()
s.headers['user-agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
r = s.get(url)
r.html.render(timeout=8000)
print(r.status_code)
print(r.content)

How to change text in TextEdit actively [duplicate]

This question already has answers here:
Pyqt Gui Freezes while in loop
(2 answers)
Closed 7 years ago.
I'm having a problem with actively updating my TextEdit box from PyQt. I want to make an app that downloads files in parts (a new thread for each part, downloading in parallel) and updates the current status of each part in the textbox, but my app "freezes" for the duration of the download and only sets the textbox after the download is complete, even though if I print the result it looks fine: no freeze on the console.
I know this code is "a mess" right now, but I was changing many things and experimenting with different approaches. I marked the "print" which works fine, and just below it there is a setText which freezes my app for the duration of the download. If it's a problem with PyQt's "TextEdit" please let me know and I'll change it, but I didn't find any information like that so far. Thanks!
def supervi(self):
    import os
    import urllib2
    N = 2
    url = self.__url
    dir = self.path
    f_name = url.split("/")[len(url.split("/")) - 1]
    dir_tmp = dir + "\\TMP." + f_name
    if os.path.isdir(dir_tmp) == False:
        os.mkdir(dir_tmp)
    for n in range(0, N):
        with open(dir_tmp + "\\file" + str(n), "w+b") as f:
            #f.write("")
            pass
    data = urllib2.urlopen(url)
    file_size = int(data.headers["Content-Length"].strip())
    import multiprocessing as mp
    data_block = file_size / N
    p = mp.Pool(N)
    for i in range(0, N):
        start = i * data_block
        stop = 0
        if not i == N - 1:
            stop = i * data_block + data_block - 1
        else:
            stop = file_size
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0",
            "Accept-Encoding": "gzip, deflate, sdch",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            "Accept-Language": "pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4",
            "Connection": "keep-alive",
            "Range": "bytes=" + str(start) + "-" + str(stop)
        }
        req = urllib2.Request(url, headers=headers)
        from main import dziecko
        p.apply_async(dziecko, [i, req, dir_tmp])
    while True:
        sum = 0
        for n in range(0, N):
            sum = sum + os.path.getsize(dir_tmp + "\\file" + str(n))
        if not sum < file_size:
            from main import del_and_combine
            del_and_combine(dir, dir_tmp, f_name, N)
            break
        for n in range(0, N):
            size = os.path.getsize(dir_tmp + "\\file" + str(n))
            print size  ################## THIS ONE
            self.url.setText(str(os.path.getsize(dir_tmp + "\\file0")))
Add QtCore.QCoreApplication.processEvents() inside your loop. This will update the text every iteration.
Without this, PyQt will always freeze during loops, because the event loop never gets a chance to process the pending repaint events.
For more information:
Pyqt Gui Freezes while in loop
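To see why the call matters, here is a minimal non-GUI simulation. `process_events` stands in for `QtCore.QCoreApplication.processEvents()`, and the list records each chance the GUI would get to repaint; without that call inside the loop, the repaint count would stay at zero until the loop finishes:

```python
repaints = []

def process_events():
    # Stand-in for QtCore.QCoreApplication.processEvents(): in PyQt this
    # flushes pending events so the widget can repaint mid-loop.
    repaints.append(True)

part_sizes = [10, 20, 30]   # simulated per-iteration download progress
total = 0
for size in part_sizes:
    total += size
    process_events()        # without this, PyQt freezes until the loop ends

print(total)          # 60
print(len(repaints))  # 3 repaint opportunities, one per iteration
```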

parsing string from request data

I am using python requests to obtain a file's source code, and then parse a string from the source. The string I am trying to parse is magic: 8susjdhdyrhsisj3864jsud (not always the same string). If I observe the source by printing it to the screen, it shows up just fine. When I parse the string, sometimes I get a result and other times I get nothing. Please see the following screenshots: http://i.imgur.com/NW1zFZK.png, http://i.imgur.com/cb9e2cb.png. The string I want always appears in the source, so it must be a regex issue? I've tried findall and search, but both methods give me the same outcome: results sometimes, and other times nothing. What seems to be my issue?
class Solvemedia():
    def __init__(self, key):
        self.key = key

    def timestamp(self, source):
        timestamp_regex = re.compile(ur'chalstamp:\s+(\d+),')
        print re.findall(timestamp_regex, source)

    def magic(self, source):
        magic_regex = re.compile(ur'magic:\s+\'(\w+)\',')
        print re.findall(magic_regex, source)

    def source(self):
        solvemedia = requests.Session()
        solvemedia.headers.update({
            'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
        })
        source = solvemedia.get('http://api.solvemedia.com/papi/challenge.script?k={}'.format(self.key)).text
        return source

    def test(self):
        js_source = self.source()
        print js_source
        self.magic(js_source)
        self.timestamp(js_source)

solvemedia = Solvemedia('HUaZ-6d2wtQT3-LkLVDPJB5C.E99j9ZK')
solvemedia.test()
There is a . in one of the values, but \w doesn't match dots. Compare:
magic: 'AZJEXYx.ZsExcTHvjH9mwQ',
// ^
with:
magic: 'xfF9i4YBAQP1EgoNhgEBAw',
A better bet is to allow all characters except a quote:
magic_regex = re.compile(ur"magic:\s+'([^']+)',")
Demo:
>>> import re
>>> samples = [
... u"magic: 'xfF9i4YBAQP1EgoNhgEBAw',",
... u"magic: 'AZJEXYx.ZsExcTHvjH9mwQ',",
... ]
>>> magic_regex = re.compile(ur"magic:\s+'([^']+)',")
>>> for sample in samples:
... print magic_regex.search(sample).group(1)
...
xfF9i4YBAQP1EgoNhgEBAw
AZJEXYx.ZsExcTHvjH9mwQ

Header Check in Python (GAE)

I was wondering how I would go about checking HTTP headers to determine whether the request is valid or malformed. How can I do this in Python, more specifically, how can I do this in GAE?
For debugging and viewing the request with its headers, I use the following DDTHandler class.
import cgi
import wsgiref.handlers
import webapp2

class DDTHandler(webapp2.RequestHandler):
    def __start_display(self):
        self.response.out.write("<!--\n")

    def __end_display(self):
        self.response.out.write("-->\n")

    def __show_dictionary_items(self, dictionary, title):
        if (len(dictionary) > 0):
            request = self.request
            out = self.response.out
            out.write("\n" + title + ":\n")
            for key, value in dictionary.iteritems():
                out.write(key + " = " + value + "\n")

    def __show_request_members(self):
        request = self.request
        out = self.response.out
        out.write(request.url + "\n")
        out.write("Query = " + request.query_string + "\n")
        out.write("Remote = " + request.remote_addr + "\n")
        out.write("Path = " + request.path + "\n\n")
        out.write("Request payload:\n")
        if (len(request.arguments()) > 0):
            for argument in request.arguments():
                value = cgi.escape(request.get(argument))
                out.write(argument + " = " + value + "\n")
        else:
            out.write("Empty\n")
        self.__show_dictionary_items(request.headers, "Headers")
        self.__show_dictionary_items(request.cookies, "Cookies")

    def view_request(self):
        self.__start_display()
        self.__show_request_members()
        self.__end_display()

    def view(self, aString):
        self.__start_display()
        self.response.out.write(aString + "\n")
        self.__end_display()
Example:
class RootPage(DDTHandler):
    def get(self):
        self.view_request()
This will output the request, including its headers.
So check the code and get what you need. Though, as said, a malformed ("invalid") request probably won't reach your app in the first place.
<!--
http://localhost:8081/
Query =
Remote = 127.0.0.1
Path = /
Request payload:
Empty
Headers:
Referer = http://localhost:8081/_ah/login?continue=http%3A//localhost%3A8081/
Accept-Charset = ISO-8859-7,utf-8;q=0.7,*;q=0.3
Cookie = hl=en_US; dev_appserver_login="test#example.com:False:185804764220139124118"
User-Agent = Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17
Host = localhost:8081
Accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language = en-US,en;q=0.8,el;q=0.6
Cookies:
dev_appserver_login = test#example.com:False:185804764220139124118
hl = en_US
-->
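Building on the handler above, an explicit validity check could look like the following minimal sketch. The policy and the helper name are illustrative (not a GAE API): it just requires a non-empty value for a couple of headers you care about, applied to the same headers dictionary the handler prints.

```python
REQUIRED_HEADERS = ("Host", "User-Agent")

def looks_wellformed(headers):
    # One example policy: every required header must be present and non-empty.
    # `headers` is any mapping, e.g. self.request.headers in a webapp2 handler.
    return all(headers.get(name) for name in REQUIRED_HEADERS)

print(looks_wellformed({"Host": "localhost:8081", "User-Agent": "Mozilla/5.0"}))  # True
print(looks_wellformed({"Host": "localhost:8081"}))                               # False
```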
