CloudFlare Access Denied while running Download and Parse Script - python

I am dealing with a legal issue, and built a script so I didn't have to search a website by hand.
Script:
import sys, urllib
servno = 2000
servernomax = 2676
alldat = ""
while True:
    newdat = ""
    url = "http://coc-servers.com/servers/"+str(servno)
    wp = str(urllib.urlopen(url).read())
    print wp
    ind1 = wp.find('"IP: "')
    if ind1 != -1:
        ind1 += 7
        ind2 = wp.find('http',ind1)
        ind3 = wp.find('"',ind2)
        IPurl = wp[ind2:ind3]
        newdat += IPurl
    ind4 = wp.find("<th>Webiste</th>")
    if ind4 != -1:
        ind4 +=22
        ind5 = wp.find('http',ind4)
        ind6 = wp.find('"',ind5)
        Website = wp[ind5:ind6]
        newdat += ", "
        newdat += Website
    alldat += newdat
    servno +=1
    #print ind1, ind4
    if servno > 2676: break
print alldat
sys.exit()
Bug free, however some values need tweaking.
The output?
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Access denied | coc-servers.com used CloudFlare to restrict access</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>
<!--[if lte IE 9]><script type="text/javascript" src="/cdn-cgi/scripts/jquery.min.js"></script><![endif]-->
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
<div id="cf-error-details" class="cf-error-details-wrapper">
<div class="cf-wrapper cf-header cf-error-overview">
<h1>
<span class="cf-error-type" data-translate="error">Error</span>
<span class="cf-error-code">1010</span>
<small class="heading-ray-id">Ray ID: 24730841e07509a6 • 2015-11-18 10:36:04 UTC</small>
</h1>
<h2 class="cf-subheadline" data-translate="error_desc">Access denied</h2>
</div><!-- /.header -->
<section></section><!-- spacer -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="what_happened">What happened?</h2>
<p>The owner of this website (coc-servers.com) has banned your access based on your browser's signature (24730841e07509a6-ua48).</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper">
<p>
<span class="cf-footer-item">CloudFlare Ray ID: <strong>24730841e07509a6</strong></span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span data-translate="your_ip">Your IP</span>: 64.18.227.167</span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span data-translate="performance_security_by">Performance & security by</span> <a data-orig-proto="https" data-orig-ref="www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">CloudFlare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body>
</html>
Alright, so it wor- wait.. What? Access Denied? I have been banned? Based on my browser?
How can I get around this? I'm aware CloudFlare was built to prevent DDoS attacks, but this is not a DDoS at all.
I would try implementing a delay; however, every response, from the first to the last, is the same message.
Would rotating several browser user agents and adding a delay fix it, or am I done for?

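Following the article at http://wolfprojects.altervista.org/articles/change-urllib-user-agent/ , I was able to run the script without errors and without CloudFlare banning me. Error 1010 is a ban based on the browser signature, and urllib identifies itself with a default User-Agent of the form Python-urllib/x.y, so overriding that string with a browser-like one was enough here.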
The new script is:
import sys
from urllib import FancyURLopener
servno = 2224 #2000
servernomax = 2676
alldat = ""
class MyOpener(FancyURLopener):
    # Sent as the User-Agent header instead of the default Python-urllib/x.y
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
mopen = MyOpener()
while True:
    newdat = ""
    url = "http://coc-servers.com/servers/"+str(servno)
    wp = str(mopen.open(url).read())#str(urlopen(url).read())
    #print wp
    ind1 = wp.find('IP: ')
    if ind1 != -1:
        ind1 += 7
        ind2 = wp.find('http',ind1)
        ind3 = wp.find('"',ind2)
        IPurl = wp[ind2:ind3]
        newdat += IPurl
    ind4 = wp.find("<th>Website</th>")
    if ind4 != -1:
        ind4 +=22
        ind5 = wp.find('http',ind4)
        ind6 = wp.find('"',ind5)
        Website = wp[ind5:ind6]
        newdat += ", "
        newdat += Website
    newdat += ";;; "
    alldat += newdat
    servno +=1
    #print ind1, ind4
    if servno > servernomax: break
print alldat
sys.exit()
Who knew FancyURLopener would be so useful? :)
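For readers on Python 3, where urllib.urlopen and the top-level FancyURLopener import are not available, the same idea can be sketched with urllib.request.Request. This is only a sketch under assumptions: the helper names build_request and fetch and the User-Agent string are illustrative and not part of the original script, and whether a given site accepts the request still depends on its owner's CloudFlare rules.

```python
from urllib.request import Request, urlopen

# Illustrative browser-like User-Agent (an assumption; any realistic value could be used).
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that sends user_agent instead of the default Python-urllib/x.y."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download url and return the body as text (hypothetical helper)."""
    with urlopen(build_request(url), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (needs network access):
#   page = fetch("http://coc-servers.com/servers/2224")
```

Keeping the header construction in its own function makes it easy to check what will be sent before any request goes out.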

Related

Why can't I scrape all data from ecommerce websites?

I'm working on a project where I have to scrape data from e-commerce websites, but I can't get the data I want from these sites. For example, when I try to scrape all list items from https://evaly.com.bd/search-results?query=remax%20610d, I only get <li class="ais-InfiniteHits-sentinel"></li> as output. Also, when I print the HTML of the page using print(soup.prettify()), the full markup is not in the output. Here is my code for fetching all list items:
from bs4 import BeautifulSoup
import requests
link = "https://evaly.com.bd/search-results?query=remax%20610"
source = requests.get(link).text
soup = BeautifulSoup(source, 'lxml')
#print(soup.prettify())
li = soup.find_all("li")
print(li)
And here is the output when I run print(soup.prettify()) :
<!DOCTYPE html>
<html>
<head>
<style data-styled="" data-styled-version="5.2.0">
.lfkzsQ{background-color:white;-webkit-letter-spacing:0.025em;-moz-letter-spacing:0.025em;-ms-letter-spacing:0.025em;letter-spacing:0.025em;font-weight:500;font-size:15px;height:46px;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex:1;-ms-flex:1;flex:1;padding:0 17px;border:1px solid var(--primary);border-radius:6px 0 0 6px;outline:none;}/*!sc*/
@media (max-width:425px){.lfkzsQ{width:50%;min-width:50%;}}/*!sc*/
data-styled.g87[id="Searchbar__SeachInput-xnx3kr-0"]{content:"lfkzsQ,"}/*!sc*/
.jtCmJd{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex-direction:row;-ms-flex-direction:row;flex-direction:row;width:100%;height:100%;border-radius:5px;overflow:hidden;background-color:#f6f6f6;}/*!sc*/
data-styled.g88[id="Searchbar__Container-xnx3kr-1"]{content:"jtCmJd,"}/*!sc*/
.BVXNH{cursor:pointer;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;padding-right:29px;padding-left:29px;background:var(--primary);color:#fff;}/*!sc*/
@media (max-width:425px){.BVXNH{padding-right:5px;padding-left:5px;}}/*!sc*/
data-styled.g90[id="Searchbar__Button-xnx3kr-3"]{content:"BVXNH,"}/*!sc*/
.XBQPS{font-size:25px;}/*!sc*/
@media (max-width:768px){.XBQPS{font-size:20px;}}/*!sc*/
data-styled.g92[id="Searchbar___StyledMdSearch-xnx3kr-5"]{content:"XBQPS,"}/*!sc*/
.jCIuWZ{display:grid;grid-template-columns:repeat(auto-fill,minmax(200px,1fr));grid-gap:1vw;}/*!sc*/
@media (max-width:768px){.jCIuWZ{grid-template-columns:repeat(auto-fill,minmax(150px,1fr));grid-gap:1vw;}}/*!sc*/
data-styled.g246[id="algoliaConnectComponent__GridP-sc-1c85asy-0"]{content:"jCIuWZ,"}/*!sc*/
.jmbKPm{width:100%;max-width:100px;min-width:0;height:32px;padding:0 16px;-webkit-appearance:none;-moz-appearance:none;appearance:none;background-color:#f5f5fa;font-size:12px;border-radius:4px;}/*!sc*/
data-styled.g247[id="algoliaConnectComponent___StyledInput-sc-1c85asy-1"]{content:"jmbKPm,"}/*!sc*/
.eZHEjD{width:100%;max-width:100px;min-width:0;height:32px;padding:0 16px;-webkit-appearance:none;-moz-appearance:none;appearance:none;background-color:#f5f5fa;font-size:12px;color:#5d6494;border-radius:4px;}/*!sc*/
data-styled.g248[id="algoliaConnectComponent___StyledInput2-sc-1c85asy-2"]{content:"eZHEjD,"}/*!sc*/
.gqxLmc{display:block;height:32px;margin-left:8px;padding-left:16px;padding-right:16px;background:linear-gradient(90deg,#f5515f 0%,#9f041b 100%);color:#fff;border-radius:4px;box-shadow:0 4px 11px 0 rgba(37,44,97,0.15),0 2px 3px 0 rgba(93,100,148,0.2);-webkit-transition:all 0.2s ease-out;transition:all 0.2s ease-out;}/*!sc*/
data-styled.g249[id="algoliaConnectComponent___StyledButton-sc-1c85asy-3"]{content:"gqxLmc,"}/*!sc*/
.gWgnak{display:grid;grid-template-columns:6% 10% auto 25%;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;grid-template-areas:"logo menu search notification";}/*!sc*/
@media (max-width:768px){.gWgnak{grid-template-columns:25% 25% 25% 25%;grid-template-areas:"menu logo logo user" "notification notification notification notification" "search search search search";}.gWgnak .logo{justify-self:center;margin-bottom:1rem;max-width:76px;width:100%;}.gWgnak .menu{position:relative;justify-self:left;}}/*!sc*/
data-styled.g253[id="search-results__GridContainer-sc-6ln6mm-1"]{content:"gWgnak,"}/*!sc*/
.jpeNuX{min-height:3rem;}/*!sc*/
data-styled.g254[id="search-results___StyledDiv-sc-6ln6mm-2"]{content:"jpeNuX,"}/*!sc*/
.ejWvfj{right:30px;bottom:30px;background:linear-gradient(90deg,#f5515f 0%,#9f041b 100%);}/*!sc*/
@media (max-width:767px){.ejWvfj{bottom:75px;}}/*!sc*/
data-styled.g255[id="search-results___StyledButton-sc-6ln6mm-3"]{content:"ejWvfj,"}/*!sc*/
</style>
<link href="/static/manifest.json" rel="manifest"/>
<title>
E-valy Limited | Online Shopping Mall
</title>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1.0, shrink-to-fit=no, maximum-scale=1.0, user-scalable=no" name="viewport"/>
<meta content="E-valy Limited | Online Shopping Mall" property="og:title"/>
<meta content="article" property="og:type"/>
<meta content="https://s3-ap-southeast-1.amazonaws.com/media.evaly.com.bd/media/2019-08-04_090235.843922android-icon-200x200.png" property="og:image"/>
<meta content="450" property="og:image:width"/>
<meta content="298" property="og:image:height"/>
<meta content="https://evaly.com.bd" property="og:url"/>
<meta content="E-valy is an e-commerce site which will be capable of providing every kind of goods and products from every sector to every consumer located in Bangladesh." property="og:description"/>
<link href="/static/images/icons/favicon.ico" rel="shortcut icon"/>
<meta content="evaly://" property="al:android:url"/>
<meta content="Evaly" property="al:android:app_name"/>
<meta content="bd.com.evaly.evalymarchant" property="al:android:package"/>
<meta content="14" name="next-head-count"/>
<link as="style" href="/_next/static/css/d48fe9f040f8d2f97c7e.css" rel="preload"/>
<link href="/_next/static/css/d48fe9f040f8d2f97c7e.css" rel="stylesheet"/>
<link as="script" href="/_next/static/RZ7VftogY8QkgPiLg6BPz/pages/_app.js" rel="preload"/>
<link as="script" href="/_next/static/RZ7VftogY8QkgPiLg6BPz/pages/search-results.js" rel="preload"/>
<link as="script" href="/_next/static/runtime/webpack-6b3d3cda09a7b5b5debf.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/framework.7dfd02d307191d63a37e.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/b637e9a5.a705a21716e5b01f8145.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/0c9dcbbe.7fbd830a3d684b32423b.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/commons.afffbbb0420dd9af938a.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/6a597b002e9daab94e2e0adeb626acca4f1f6515.28c9d68d9749974f08e1.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/bba5516912876db85383b691379c4486ab998795.071cf6d38264238f2f49.js" rel="preload"/>
<link as="script" href="/_next/static/runtime/main-3c89e50e2c7d7034f938.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/252f366e.32bec51017e26b1dae31.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/95b64a6e.a74dcc7937bf0c356811.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/d7eeaac4.afdce0938beabe8eef9a.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/2dc48ec14d05924f473dce007726385374c258b9.0a52afc0ae53472a590f.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/3ad14741d7bfb55e1bcea5bfc6670f090f0855af.b5af8ef4be1abd2d5791.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/f6d549f16f3909adbb4f9a302aacab15937bfbda.94c734c42c1caf61b869.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/a9dd91d4607a584382b3e8a70a910ee9fb417c65.cabb84905704185ea6f6.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/4cbc61372435748121077b3b94e57617b6c8338d.5ae2119035f5c9d8c81c.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/411365f484ca502253106aae57d21ae3bb416d15.2f90a1a0cb46996155b4.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/69ef8573555555a232f56c2d2a1de6a4101c15d0.d8f92afd6f8ceb35f607.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/5d7bf10f24bff82d5530a050de689a7c020a359b.36ce757546da64e3337c.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/c8a8012dbcfaeb41f17a667b3a927ba45766e4a2.312913bb8463128a068e.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/c1f80152d80b1129cab9e73f90501b8957be40a7.04f2303ad32c2682fab1.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/8d4460396e9219a79f33af22e0a8f4fe429b291e.cda426e58b75b281586e.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/57f045ed70322177467d785413f62aff844e25d2.ad35b737612878a9f01a.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/0378a7d7ac3f1a3f5f0e99380b068fe3a41b14e6.46f0a10d89a7db3593b1.js" rel="preload"/>
<link as="script" href="/_next/static/chunks/680dd3e5bbe68ece4bf42804461f8830da8bd4e0.d71300269070cc46823a.js" rel="preload"/>
</head>
<body>
<div id="__next">
<div class="jsx-2334610719 min-h-screen pb-2" style="background-color:#F7F8FA">
<div class="ais-InstantSearch__root">
<div class="topbar bg-gray-100 py-1 text-gray-600 hidden md:block">
<div class="container flex justify-between text-sm">
<div class="flex">
<div class="mr-4">
<a href="https://merchant.evaly.com.bd/">
<svg class="w-3 h-3 mr-1 inline align-baseline">
<use href="/static/images/icons.svg#shop" xlink:href="/static/images/icons.svg#shop">
</use>
</svg>
Merchant zone
</a>
</div>
<div class="mr-4">
<a href="/feeds">
<svg class="w-3 h-3 mr-1 inline align-baseline">
<use href="/static/images/icons.svg#newsfeed" xlink:href="/static/images/icons.svg#newsfeed">
</use>
</svg>
News Feed
</a>
</div>
<div class="mr-4">
<a href="https://play.google.com/store/apps/details?id=bd.com.evaly.evalyshop">
<svg class="w-3 h-3 mr-1 inline align-baseline">
<use href="/static/images/icons.svg#mobile" xlink:href="/static/images/icons.svg#mobile">
</use>
</svg>
Download App
</a>
</div>
</div>
<div class="flex">
<div class="mr-4">
<a href="https://www.facebook.com/groups/EvalyHelpDesk/">
<svg class="w-3 h-3 mr-1 inline align-baseline">
<use href="/static/images/icons.svg#help" xlink:href="/static/images/icons.svg#help">
</use>
</svg>
<!-- -->
Help
</a>
</div>
<div>
<a href="https://www.facebook.com/evaly.com.bd/">
<svg class="w-3 h-3 mr-1 inline align-baseline">
<use href="/static/images/icons.svg#facebook" xlink:href="/static/images/icons.svg#facebook">
</use>
</svg>
<!-- -->
Follow us
</a>
</div>
</div>
</div>
</div>
<div class="bg-white header" style="box-shadow:0 4px 16px 0 rgba(0,0,0,0.04)">
<div class="search-results__Container-sc-6ln6mm-0 hFUCjp container py-5 px-8">
<div class="search-results__GridContainer-sc-6ln6mm-1 gWgnak">
<a class="logo xs:w-1/2" href="/" style="grid-area:logo">
<img alt="logo" class="" src="/static/images/logo.svg" style="max-width:76px"/>
</a>
<button class="text-2xl menu md:block mb-4 md:mb-0" style="grid-area:menu">
<svg class="m-auto text-gray-700" fill="currentColor" height="1em" stroke="currentColor" stroke-width="0" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg">
<path d="M3 18h18v-2H3v2zm0-5h18v-2H3v2zm0-7v2h18V6H3z">
</path>
</svg>
</button>
<div class="md:hidden mb-4" style="grid-area:user;justify-self:right">
<button class="flex items-center">
<span class="flex w-full items-center text-gray-700">
<span>
<svg color="#1D2531" fill="currentColor" height="25" size="25" stroke="currentColor" stroke-width="0" style="color:#1D2531" viewbox="0 0 1024 1024" width="25" xmlns="http://www.w3.org/2000/svg">
<path d="M858.5 763.6a374 374 0 0 0-80.6-119.5 375.63 375.63 0 0 0-119.5-80.6c-.4-.2-.8-.3-1.2-.5C719.5 518 760 444.7 760 362c0-137-111-248-248-248S264 225 264 362c0 82.7 40.5 156 102.8 201.1-.4.2-.8.3-1.2.5-44.8 18.9-85 46-119.5 80.6a375.63 375.63 0 0 0-80.6 119.5A371.7 371.7 0 0 0 136 901.8a8 8 0 0 0 8 8.2h60c4.4 0 7.9-3.5 8-7.8 2-77.2 33-149.5 87.8-204.3 56.7-56.7 132-87.9 212.2-87.9s155.5 31.2 212.2 87.9C779 752.7 810 825 812 902.2c.1 4.4 3.6 7.8 8 7.8h60a8 8 0 0 0 8-8.2c-1-47.8-10.9-94.3-29.5-138.2zM512 534c-45.9 0-89.1-17.9-121.6-50.4S340 407.9 340 362c0-45.9 17.9-89.1 50.4-121.6S466.1 190 512 190s89.1 17.9 121.6 50.4S684 316.1 684 362c0 45.9-17.9 89.1-50.4 121.6S557.9 534 512 534z">
</path>
</svg>
</span>
</span>
</button>
</div>
<div style="grid-area:search">
<form action="" novalidate="" role="search">
<div class="Searchbar__Container-xnx3kr-1 jtCmJd">
<input class="Searchbar__SeachInput-xnx3kr-0 lfkzsQ" placeholder="Search..." type="search" value="remax 610"/>
<figure class="Searchbar__Button-xnx3kr-3 BVXNH" color="black">
<svg _css2="
@media (max-width: ,768px,) {
,
font-size:20px;
,
}
" class="Searchbar___StyledMdSearch-xnx3kr-5 XBQPS" color="white" fill="currentColor" height="1em" stroke="currentColor" stroke-width="0" style="color:white" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg">
<path d="M15.5 14h-.79l-.28-.27C15.41 12.59 16 11.11 16 9.5 16 5.91 13.09 3 9.5 3S3 5.91 3 9.5 5.91 16 9.5 16c1.61 0 3.09-.59 4.23-1.57l.27.28v.79l5
4.99L20.49 19l-4.99-5zm-6 0C7.01 14 5 11.99 5 9.5S7.01 5 9.5 5 14 7.01 14 9.5 11.99 14 9.5 14z">
</path>
</svg>
</figure>
</div>
</form>
</div>
<div class="md:pl-4 notification hidden md:block" style="grid-area:notification">
<div class="flex justify-between items-center mb-4 mx-16 md:mx-0 md:mb-0 lg:ml-8">
<button class="text-2xl menu md:hidden">
<svg class="m-auto" fill="currentColor" height="1em" stroke="currentColor" stroke-width="0" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg">
<path d="M3 18h18v-2H3v2zm0-5h18v-2H3v2zm0-7v2h18V6H3z">
</path>
</svg>
</button>
<button class="relative">
<svg color="#1D2531" fill="currentColor" height="25" size="25" stroke="currentColor" stroke-width="0" style="color:#1D2531" view
How can I solve these problems?
EDIT: Using Selenium and ChromeDriver would be too time-consuming for my project.
Try the approach below, which uses requests and json. I built the script around the API URL, which I found by inspecting the network calls that Chrome makes on page load, and it generates the form data dynamically so that it can walk through every page of results.
What exactly the script does:
First, the script builds the form data for the API call, where page_no, the query string and the max values per facet are dynamic; page_no is incremented by 1 after each page has been processed.
requests then sends the form data to the URL using the POST method, and the response is parsed as JSON.
From the parsed data, the script navigates to the part of the JSON object where the data actually lives.
Finally, it loops over the batch of items on each page, one by one, and prints them.
Right now the script displays only a few fields; you can read more fields from the JSON object in the same way as I have done below.
import json
import requests
from urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
def scrap_evaly_data():
    QUERY = 'remax%20610' #query string can be changed to fetch another product data
    MAX_VALUES_PER_FACET = 10 #max. number of values returned per facet
    page_no = 0 # default page no.
    URL = 'https://eza2j926q5-3.algolianet.com/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20JavaScript%20(3.35.1)%3B%20Browser%20(lite)%3B%20react%20(16.13.1)%3B%20react-instantsearch%20(5.7.0)%3B%20JS%20Helper%20(2.28.1)&x-algolia-application-id=EZA2J926Q5&x-algolia-api-key=ca9abeea06c16b7d531694d6783a8f04' # API URL for querying
    while True:
        print('Hold on creating new form data...')
        form_data = {
            "requests":[{"indexName":"products","params":"query=" + QUERY + "&maxValuesPerFacet=" + str(MAX_VALUES_PER_FACET) + "&page=" + str(page_no) + "&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&facets=%5B%22price%22%2C%22category_name%22%2C%22brand_name%22%2C%22shop_name%22%2C%22color%22%5D&tagFilters="}]
        } # form_data which is dynamic and creates new set of results and send back
        response = requests.post(URL,json = form_data,verify = False) #requests for data using POST and JSON form data
        print('Created new form data going to fetch data...')
        result = json.loads(response.text) #load json data result
        hits = result['results'][0]['hits'] #product information for the current page
        if len(hits) == 0: #an empty page means we are past the last page, so leave the while loop
            break
        for item in hits: #loop on the product information JSON object
            print('-' * 100)
            print('Brand Name: ', item['brand_name'])
            print('Category Name: ' , item['category_name'])
            print('Discount Price: ' , item['discounted_price'])
            print('Max Price: ' , item['max_price'])
            print('Min Price: ' , item['min_price'])
            print('Product Name: ' , item['name'])
            print('Product Image: ' , item['product_image'])
            print('Shop Item ID: ' , item['shop_item_id'])
            print('Shop Name: ' , item['shop_name'])
            print('Slug Info: ' , item['slug'])
            print('-' * 100)
        page_no +=1 #Increment the page number by 1 after each traversal
scrap_evaly_data()
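The page-by-page traversal described above depends on stopping once a page comes back with no hits. A minimal offline sketch of that loop follows; it assumes the response shape used above (result['results'][0]['hits']). The names iter_hits, fake_fetch and FAKE_PAGES are hypothetical stand-ins for the real POST request, introduced only for illustration.

```python
def iter_hits(fetch_page, max_pages=1000):
    """Yield every hit, page by page, until a page comes back empty.

    fetch_page is a hypothetical callable: page number -> parsed JSON dict
    shaped like {'results': [{'hits': [...]}]}.
    """
    for page_no in range(max_pages):  # hard cap as a safety net against endless loops
        hits = fetch_page(page_no)['results'][0]['hits']
        if not hits:  # empty page: we are past the last page of results
            return
        for item in hits:
            yield item


# Stand-in data so the sketch runs without network access.
FAKE_PAGES = [
    {'results': [{'hits': [{'name': 'A'}, {'name': 'B'}]}]},
    {'results': [{'hits': [{'name': 'C'}]}]},
]


def fake_fetch(page_no):
    """Return a canned page, or an empty page once the canned data runs out."""
    if page_no < len(FAKE_PAGES):
        return FAKE_PAGES[page_no]
    return {'results': [{'hits': []}]}


names = [item['name'] for item in iter_hits(fake_fetch)]
print(names)  # ['A', 'B', 'C']
```

In a real run, fetch_page would wrap the requests.post call and return the parsed JSON for the requested page.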

Beautiful Soup Can't Redact Phone Number with Parentheses

I'm trying to redact phone number information from an html file ... and while I can identify all of the phone numbers easily enough I can't figure out why I am unable to replace the phone numbers that have parentheses in them. Sample below:
import re
from bs4 import BeautifulSoup
text = '''<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title>Big Title</title>
<style type="text/css">
.parsed {font-size: 75%; color: #474747;}
</style>
</head>
<body>
<div class="parsed">
<h1>Redacted Redacted</h1>
<h2> Contact Info</h2>
<ul>
<li>Position Title: My Fake Title</li>
<li>Email: Redacted#gmail.com</li>
<li>Phones: (555) 555-5555</li>
</ul><b>Category:</b> <ul><li>Title 2 </li><li>Fake Info</li></ul>
City, MO 11111 | (555) 111-1111 | myemail#gmail.com
Some Category / Some Name: 555-222-2222 | Record Number#:
</html>'''
soup = BeautifulSoup(text, 'html.parser')
def find_phone_numbers(text):
    phones = re.findall(r"((?:\d{3}|\(\d{3}\))?(?:\s|-|\.)?\d{3}(?:\s|-|\.)\d{4})", text)
    return phones
phones = find_phone_numbers(str(soup))
print(phones)
for i in phones:
    target = soup.find_all(text=re.compile(i, re.I))
    try:
        for v in target:
            v.replace_with(v.replace(i,'(XXX) XXX-XXXX'))
    except TypeError:
        pass
print(soup)
These are my results from running the above:
['(555) 555-5555', '(555) 111-1111', '555-222-2222']
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title>Big Title</title>
<style type="text/css">
.parsed {font-size: 75%; color: #474747;}
</style>
</head>
<body>
<div class="parsed">
<h1>Redacted Redacted</h1>
<h2> Contact Info</h2>
<ul>
<li>Position Title: My Fake Title</li>
<li>Email: Redacted#gmail.com</li>
<li>Phones: (555) 555-5555</li>
</ul><b>Category:</b> <ul><li>Title 2 </li><li>Fake Info</li></ul>
City, MO 11111 | (555) 111-1111 | myemail#gmail.com
Some Category / Some Name: (XXX) XXX-XXXX | Record Number#:
</div></body></html>
You can use .find_all(text=True) to obtain all text content from the HTML soup, and then replace it with re.sub (that way, you preserve all tags, including <li>):
for content in soup.find_all(text=True):
    s = re.sub(r'(\(?\d{3}\)?)([\s.-]*)(\d{3})([\s.-]*)(\d{4})', '(XXX) XXX-XXXX', content)
    content.replace_with(s)
print(soup)
Prints:
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title>Big Title</title>
<style type="text/css">
.parsed {font-size: 75%; color: #474747;}
</style>
</head>
<body>
<div class="parsed">
<h1>Redacted Redacted</h1>
<h2> Contact Info</h2>
<ul>
<li>Position Title: My Fake Title</li>
<li>Email: Redacted#gmail.com</li>
<li>Phones: (XXX) XXX-XXXX</li>
</ul><b>Category:</b> <ul><li>Title 2 </li><li>Fake Info</li></ul>
City, MO 11111 | (XXX) XXX-XXXX | myemail#gmail.com
Some Category / Some Name: (XXX) XXX-XXXX | Record Number#:
</div></body></html>
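As background on why the loop in the question skipped the numbers with parentheses: each found number is passed to re.compile() as a pattern, and in a regular expression "(555)" is a group that matches the digits 555, not the literal parentheses. The short sketch below illustrates this with re.escape(); the variable names are illustrative only.

```python
import re

line = "Phones: (555) 555-5555"
number = "(555) 555-5555"

# Unescaped, "(555)" is a capture group matching "555", so the pattern
# looks for "555 555-5555" and never sees the literal parentheses.
unescaped = re.search(number, line)
print(unescaped)  # None

# re.escape() turns the parentheses into literal characters, so the search succeeds.
escaped = re.search(re.escape(number), line)
print(escaped.group(0))  # (555) 555-5555

# A number without parentheses contains no group syntax, so it matches as-is.
plain = re.search("555-222-2222", "Some Name: 555-222-2222")
print(plain.group(0))  # 555-222-2222
```

In the question's loop, this would correspond to compiling re.escape(i) rather than i.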
Slight change of approach. Get all li tags, then for each tag, replace the phone numbers with your mask, if a phone number exists. I have used an interim variable for that (temp_text), just to keep the code a bit more readable.
all_li=soup.find_all('li')
for li in all_li:
    temp_text=re.sub(r"((?:\d{3}|\(\d{3}\))?(?:\s|-|\.)?\d{3}(?:\s|-|\.)\d{4})", '(XXX) XXX-XXXX', li.text)
    if temp_text:
        li.replace_with(temp_text)
print(soup) output:

Why Beautifulsoup is getting weird source code characters while downloading a web-page?

I'm new to Python and web crawling. I've been doing a few exercises crawling some websites, and it works great with BeautifulSoup. But recently, while crawling a Persian site (https://video.varzesh3.com) with the code below, I received weird characters. I have done the same procedure on other Persian websites, and I believe the problem is not with the encoding. This is my code:
import requests
from bs4 import BeautifulSoup
url = 'https://video.varzesh3.com'
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")
print(soup)
And this is a part of the result:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<link href="/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
<title>ÙÛدÛÙ Ùرزش س٠| خاÙÙ</title>
<meta content="sport ,varzesh ,football, soccer,livescores, live score, livescore, iran,football3,Daily soccer news , broadcast ,ÙÙتبا٠س٠, ÙتاÛج زÙد٠, Ø®ÙÛج Ùارس , perian gulf , ÙÛÚ¯ آزادگا٠, ÙÙرÙاÙا٠آسÛا , ÙÙرÙاÙا٠ارÙپا , ÙÛÚ¯ برتر , جا٠حذÙÛ , شبک٠س٠, Ùرزش , ÙÙتبا٠برتر , اÛرا٠, جا٠جÙاÙÛ , جا٠جÙاÙÛ 2010,ÙÙتبا٠3 ," name="keywords">
<meta content="پاÙگا٠ÙÛدÛÙ ÙØ±Ø²Ø´Û Ø¨Ø±Ø§Ù Ùارس٠زباÙا٠ÙÙ ÙÛدÛÙ Ø­Ùز٠Ùرزش (ÙÙتباÙØÙاÙÙبا٠ØبسÙتبا٠Ù...) را ارائ٠ÙÛ Ú©Ùد" name="description">
<link href="/Static/css/frontend.min.css?v=9" rel="stylesheet" type="text/css"/>
<link href="https://static2.farakav.com/v3/static/css/fonts.css?version=6" rel="stylesheet" type="text/css"/>
<link href="https://static2.farakav.com/varzesh3/assets/font/varzesh3-icon/varzesh3.min.css" rel="stylesheet" type="text/css"/>
<script src="https://static2.farakav.com/football3_jscripts/jquery-1.8.0.min.js" type="text/javascript"></script>
<script src="/Static/js/jquery.cookie.js" type="text/javascript"></script>
<script src="/Static/js/mustache.js" type="text/javascript"></script>
<script type="text/javascript">
now = new Date();
var head = document.getElementsByTagName('head')[0];
var script = document.createElement('script');
script.type = 'text/javascript';
var script_address = 'https://cdn.yektanet.com/js/varzesh3.com/article.v1.min.js';
script.src = script_address + '?v=' + now.getFullYear().toString() + '0' + now.getMonth() + '0' + now.getDate() + '0' + now.getHours();
head.appendChild(script);
</script>
Why do I get this weird like "پاÙگا٠ÙÛدÛÙ ÙØ±Ø²Ø´Û Ø¨Ø±Ø§Ù Ùارس" characters?
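You get those weird characters because of the encoding that requests assumed for the response. This typically happens when the server's Content-Type header declares a text type without a charset: requests then falls back to ISO-8859-1, even though the page itself is UTF-8.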
>>> source_code.encoding
'ISO-8859-1'
Try setting the encoding to UTF-8 before reading the text:
>>> source_code.encoding = 'UTF-8'
>>> plain_text = source_code.text
>>> BeautifulSoup(plain_text, "html.parser")
Output:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<link href="/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
<title>ویدیو ورزش سه | خانه</title>
<meta content="sport ,varzesh ,football, soccer,livescores, live score, livescore, iran,football3,Daily soccer news , broadcast ,فوتبال سه , نتایج زنده , خلیج فارس , perian gulf , لیگ آزادگان , قهرمانان آسیا , قهرمانان اروپا , لیگ برتر , جام حذفی , شبکه سه , ورزش , فوتبال برتر , ایران , جام جهانی , جام جهانی 2010,فوتبال 3 ," name="keywords">
<meta content="پايگاه ویدیو ورزشی براي فارسي زبانان كه ویدیو حوزه ورزش (فوتبال،واليبال ،بسكتبال و...) را ارائه می کند" name="description">
<link href="/Static/css/frontend.min.css?v=9" rel="stylesheet" type="text/css"/>
....
...
..
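To see why a wrong encoding guess produces exactly this kind of garbage, here is a small offline sketch. It assumes nothing about the site itself: it takes a sample Persian word, encodes it as UTF-8, decodes the bytes as ISO-8859-1 (which yields mojibake), and then reverses the mistake.

```python
original = "ورزش"  # sample Persian text, used only for illustration

# What a UTF-8 page contains on the wire: UTF-8 bytes.
raw = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 never fails, but yields mojibake.
garbled = raw.decode("iso-8859-1")

# Undoing the wrong decode and applying the right one recovers the text.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == original)  # True
```

An alternative to setting source_code.encoding by hand is to pass the raw bytes (source_code.content) to BeautifulSoup and let it work out the encoding from the document itself.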

Python Transcrypt addEventListener

I have written a Python program for translation with Transcrypt to Javascript.
I can not get the addEventListener function to work. Any ideas?
Here is the code as dom7.py:
class TestSystem:
    def __init__ (self):
        self.text = 'Hello, DOM!'
        self.para = 'A new paragraph'
        self.textblock = 'This is an expandable text block.'
        self.button1 = document.getElementById("button1")
        self.button1.addEventListener('mousedown', self.pressed)
    def insert(self):
        document.querySelector('output').innerText = self.text
        # document.querySelector('test').innerText = "Test"+self.button1+":"
    def pressed(self):
        container = document.getElementById('textblock')
        newElm = document.createElement('p')
        newElm.innerText = self.para
        container.appendChild(newElm)
testSystem = TestSystem()
And here follows the corresponding dom7.html for it:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<script src="__javascript__/dom7.js"></script>
<title>Titel</title>
</head>
<body onload=dom7.testSystem.insert()>
<button id="button1">Click me</button><br>
<main>
<h1>DOM examples</h1>
<p>Testing DOM</p>
<p>
<output></output>
</p>
<p>
<test>Test String:</test>
</p>
<div id="textblock">
<p>This is an expandable text block.</p>
</div>
</main>
</body>
</html>
The problem is that your TestSystem constructor is called before the DOM tree is ready. There are three ways to deal with this, the last of which is the best.
The first way is to include your script after you populated your body:
class TestSystem:
    def __init__ (self):
        self.text = 'Hello, DOM!'
        self.para = 'A new paragraph'
        self.textblock = 'This is an expandable text block.'
        self.button1 = document.getElementById("button1")
        self.button1.addEventListener('mousedown', self.pressed)
    def insert(self):
        document.querySelector('output').innerText = self.text
        # document.querySelector('test').innerText = "Test"+self.button1+":"
    def pressed(self):
        container = document.getElementById('textblock')
        newElm = document.createElement('p')
        newElm.innerText = self.para
        container.appendChild(newElm)
testSystem = TestSystem()
and:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Titel</title>
</head>
<body onload=dom7.testSystem.insert()>
<button id="button1">Click me</button><br>
<main>
<h1>DOM examples</h1>
<p>
Testing DOM
</p>
<p>
<output></output>
</p>
<p>
<test>Test String:</test>
</p>
<div id="textblock">
<p>This is an expandable text block.</p>
</div>
<script src="__javascript__/dom7.js"></script>
</main>
</body>
</html>
Your insert function may still be called too early, though, so this may not work.
The second way is to include the script at the beginning and call an initialization function to connect event handlers to the DOM:
class TestSystem:
    def __init__ (self):
        self.text = 'Hello, DOM!'
        self.para = 'A new paragraph'
        self.textblock = 'This is an expandable text block.'
        self.button1 = document.getElementById("button1")
        self.button1.addEventListener('mousedown', self.pressed)
    def insert(self):
        document.querySelector('output').innerText = self.text
        # document.querySelector('test').innerText = "Test"+self.button1+":"
    def pressed(self):
        container = document.getElementById('textblock')
        newElm = document.createElement('p')
        newElm.innerText = self.para
        container.appendChild(newElm)
def init ():
    testSystem = TestSystem()
and:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<script src="__javascript__/dom7.js"></script>
<title>Titel</title>
</head>
<body onload=dom7.testSystem.insert()>
<button id="button1">Click me</button><br>
<main>
<h1>DOM examples</h1>
<p>
Testing DOM
</p>
<p>
<output></output>
</p>
<p>
<test>Test String:</test>
</p>
<div id="textblock">
<p>This is an expandable text block.</p>
</div>
<script>dom7.init ();</script>
</main>
</body>
</html>
There is still a possibility that some browsers call the initialization function before the page is loaded, although this is rare. In addition, the insert method is again called too early.
The third and best way, which solves both problems, is to run your initialization after the page load event and to call insert after you create your testSystem, e.g. in the initialization function:
class TestSystem:
    def __init__ (self):
        self.text = 'Hello, DOM!'
        self.para = 'A new paragraph'
        self.textblock = 'This is an expandable text block.'
        self.button1 = document.getElementById("button1")
        self.button1.addEventListener('mousedown', self.pressed)
    def insert(self):
        document.querySelector('output').innerHTML = self.text
        # document.querySelector('test').innerText = "Test"+self.button1+":"
    def pressed(self):
        container = document.getElementById('textblock')
        newElm = document.createElement('p')
        newElm.innerText = self.para
        container.appendChild(newElm)
def init ():
    testSystem = TestSystem()
    testSystem.insert ()
and:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<script src="__javascript__/dom7.js"></script>
<title>Titel</title>
</head>
<body onload="dom7.init ();">
<button id="button1">Click me</button><br>
<main>
<h1>DOM examples</h1>
<p>
Testing DOM
</p>
<p>
<output></output>
</p>
<p>
<test>Test String:</test>
</p>
<div id="textblock">
<p>This is an expandable text block.</p>
</div>
</main>
</body>
</html>
I looked at your mondrian example in the tutorial section and saw that there is also a very simple way to attach event listeners once the document has loaded. You can register a handler for the DOMContentLoaded event from a script in the head of the HTML document:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<script src="__javascript__/addEventListener_example1.js"></script>
<script>document.addEventListener("DOMContentLoaded", addEventListener_example1.init)</script>
<title>Titel</title>
</head>
<body>
<button id="button1">Click me</button><br>
<main>
<h1>DOM examples</h1>
<p>
Testing DOM
</p>
<p>
<output></output>
</p>
<p>
<test>Test String:</test>
</p>
<div id="textblock">
<p>This is an expandable text block.</p>
</div>
</main>
</body>
</html>
and the code for addEventListener_example1.py would be:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
def init():
    insert()
def insert():
    document.querySelector('output').innerHTML = 'Hello, DOM!'
    button1 = document.getElementById("button1")
    button1.addEventListener('mousedown', pressed)
def pressed():
    para = 'A new paragraph'
    container = document.getElementById('textblock')
    newElm = document.createElement('p')
    newElm.innerText = para
    container.appendChild(newElm)
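Both answers come down to the same rule: only touch the DOM once it has been parsed. As a rough sketch of that idea (not something from the Transcrypt docs), the helper below checks document.readyState and either runs the callback right away or waits for DOMContentLoaded. The name run_when_ready is made up for illustration, and the document object is passed in as a parameter only so the function can also be exercised outside a browser; in Transcrypt you would pass the global document.

```python
def run_when_ready(document, callback):
    """Run callback once the DOM has been parsed.

    'document' is expected to behave like the browser's document object
    (readyState attribute, addEventListener method). Hypothetical helper,
    shown only to illustrate the timing issue discussed above.
    """
    if document.readyState == 'loading':
        # The parser is still working: defer until the DOM tree is complete.
        def on_ready(event):
            callback()
        document.addEventListener('DOMContentLoaded', on_ready)
    else:
        # readyState is 'interactive' or 'complete': the elements already exist.
        callback()
```

With the code above you would call run_when_ready(document, init) at module level instead of wiring init through the onload attribute, assuming init is the function that creates the listeners.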

Click a href button with selenium (PhantomJS) and python

So I got this HTML from a website:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" dir="rtl" lang="he" id="vbulletin_html">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta id="e_vb_meta_bburl" name="vb_meta_bburl" content="http://www.fxp.co.il" />
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<base href="//www.fxp.co.il" /><!--[if IE]></base><![endif]-->
<link rel="canonical" href="http://www.fxp.co.il/login.php?do=login" />
<link rel="shortcut icon" href="//images.fxp.co.il/images3/fav.png">
<meta name="generator" content="vBulletin 4.2.2" />
<meta name="keywords" content="FXP,פורום,פורומים,fxp,משחקים,סרטים,כיף,רשת,מחשבים,הורדות,הורדה,סרגל כלים,בדיקת IP,העלאת תמונות" />
<meta name="description" content="מחפשים אתר פורומים ? אתר FXP מכיל קהילות פורומים, משחקים, תמונות גולשים ועוד. הכנסו עכשיו אל קהילות הפורומים של FXP" />
<meta property="fb:app_id" content="415294715208536" />
<meta property="og:site_name" content="FXP" />
<meta property="og:description" content="מחפשים אתר פורומים ? אתר FXP מכיל קהילות פורומים, משחקים, תמונות גולשים ועוד. הכנסו עכשיו אל קהילות הפורומים של FXP" />
<meta property="og:url" content="http://www.fxp.co.il" />
<meta property="og:type" content="website" />
<link rel="stylesheet" type="text/css" href="//images.fxp.co.il/css_static_main/main_fxp_20.2.14.css?v=7.11" />
<link href="//www.fxp.co.il/clientscript/awesome/css/font-awesome.min.css" rel="stylesheet">
<script type="text/javascript" src="//images.fxp.co.il/clientscript/yui-2.9.js"></script>
<script type="text/javascript">
<!--
var SESSIONURL = "";
var SECURITYTOKEN = "1456672267-7067c7f37055c9dd77a4fa83ba3b7b6f316c82b1";
var IMGDIR_MISC = "//images.fxp.co.il/images_new/misc";
var IMGDIR_BUTTON = "//images.fxp.co.il/images_new/buttons";
var vb_disable_ajax = parseInt("0", 10);
var SIMPLEVERSION = "4116";
var BBURL = "http://www.fxp.co.il";
var LOGGEDIN = 1152224 > 0 ? true : false;
var THIS_SCRIPT = "login";
var RELPATH = "login.php?do=login";
var PATHS = {
forum : "",
cms : "",
blog : ""
};
var AJAXBASEURL = "http://www.fxp.co.il/";
//var AJAXBASEURL = "//www.fxp.co.il/";
// -->
</script>
<script type="text/javascript" src="//images.fxp.co.il/clientscript/vbulletin-core.js"></script>
<script type="text/javascript" src="//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script type="text/javascript" src="//images.fxp.co.il/css_static_main/jquery.cookie_new.js"></script>
<script type="text/javascript">
$(document).ready(function(){
$("#ajax").load('notifc.php?userid=1152224');
$("#noti").click(function () {
$("#ajax").load('notifc.php?userid=1152224');
});
$("#ajax_likes").load('likesno.php?userid=1152224');
$("#noti_like").click(function () {
$("#ajax_likes").load('likesno.php?userid=1152224');
});
});
</script>
<script type="text/javascript" src="//images.fxp.co.il/clientscript/set.js?v=6.5"></script>
<script type="text/javascript" src="//images.fxp.co.il/clientscript/lazyload.js"></script>
<script type="text/javascript">
$(function() {
if (getCookie_bar('bbc_lazyload_fxp') != '1') {
$(".postbody img").lazyload({placeholder : "clear.gif", effect: "fadeIn"});
}
});
</script>
<script type="text/javascript" src="//images.fxp.co.il/advertising/ads.js"></script>
<script type="text/javascript" src="//images.fxp.co.il/skinfxp/s.php"></script>
<title>FXP</title>
<script type="text/javascript">
var forumname = "";
var fxpcategory = "none";
</script>
<script type='text/javascript'>DY = {scsec : 8765235 ,API: function(){(DY.API.actions = DY.API.actions || []).push(arguments)}};</script>
<script type='text/javascript' src='//dy2.ynet.co.il/scripts/8765235/api_static.js'></script>
<script type='text/javascript' src='//dy2.ynet.co.il/scripts/8765235/api_dynamic.js'></script>
<script type="text/javascript">
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-598971-1', 'auto');
ga('require', 'displayfeatures');
ga('send', 'pageview');
</script>
<script type="text/javascript">
ga('set', 'dimension1', 'Registered');
</script>
</head>
<body>
<div class="standard_error">
<form class="block vbform" method="post" action="http://www.fxp.co.il/" name="postvarform">
<h2 class="blockhead">מעביר...</h2>
<div class="blockbody formcontrols">
<p class="blockrow restore">התחברת בהצלחה, Copy_Pasta.</p>
</div>
<div class="blockfoot actionbuttons redirect_button">
<div class="group" id="redirect_button">
לחץ כאן אם הדפדפן אינו מעביר אותך אוטומטית
</div>
</div>
</form>
</div>
<noscript>
<meta http-equiv="Refresh" content="2; URL=http://www.fxp.co.il/" />
</noscript>
<script type="text/javascript">
<!--
function exec_refresh()
{
window.status = "מעביר..." + myvar;
myvar = myvar + " .";
var timerID = setTimeout("exec_refresh();", 100);
if (timeout > 0)
{
timeout -= 1;
}
else
{
clearTimeout(timerID);
window.status = "";
window.location = "http://www.fxp.co.il/";
}
}
var myvar = "";
var timeout = 20;
exec_refresh();
//-->
</script>
</body>
</html>
The page above is supposed to redirect you to another page.
I'm looking to click on the button that redirects you immediately if you press it.
It's this button if anyone's wondering
This is the relevant code:
def login_and_redirect():
    # log in to the site
    usrnm=raw_input("Please enter your username: ")
    pswrd=raw_input("Please enter your password: ")
    print("Logging in, please stand by...")
    driver.find_element_by_id("navbar_username").send_keys(usrnm)
    driver.find_element_by_id("navbar_password_hint").click()
    driver.find_element_by_id("navbar_password").send_keys(pswrd)
    driver.find_element_by_id("navbar_password").submit()
    # redirect to another page in the site after logging in
    driver.get("http://www.fxp.co.il/forumdisplay.php?f=236")
For some weird reason the function gets stuck and PhantomJS isn't being redirected to the URL I wanted. My guess is that it's because of the redirection page, so I'm trying to get around it by clicking the button. I'd appreciate any help, thanks :)
You can try the XPaths below to click on that redirect button:
//div[@id='redirect_button']/a
or
//a[@href='http://www.fxp.co.il/']
Of course, you can also use a CSS selector here.
Thanks,
Murali
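A minimal sketch of how those XPaths might be used from Python, written with the @ attribute syntax that XPath requires. The helper name click_first_match and the REDIRECT_XPATHS tuple are made up for illustration. It uses find_elements, which returns an empty list instead of raising when nothing matches, and passes the locator strategy as the plain string "xpath" (the value of Selenium's By.XPATH constant) so the snippet needs no imports. Whether the link exists in the page PhantomJS actually receives is an assumption you would have to check.

```python
# Candidate locators for the redirect link, taken from the answer above.
REDIRECT_XPATHS = (
    "//div[@id='redirect_button']/a",
    "//a[@href='http://www.fxp.co.il/']",
)

def click_first_match(driver, xpaths=REDIRECT_XPATHS):
    """Click the first element found by any of the XPaths.

    Returns the XPath that matched, or None if none of them did.
    'driver' is expected to be a Selenium WebDriver instance.
    """
    for xpath in xpaths:
        # "xpath" is the string value of selenium's By.XPATH constant.
        matches = driver.find_elements("xpath", xpath)
        if matches:
            matches[0].click()
            return xpath
    return None
```

You would call click_first_match(driver) right after the submit() call and before driver.get(...). If it returns None, the redirect page may not have loaded yet, so it may be worth retrying after a short delay or using Selenium's WebDriverWait.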
