I am automating our Web application using Python with Selenium Webdriver.
I log into the application and I want to click the Administration button.
When I run my code, it cannot find the Administration button using my XPath. I have tried a few different ways.
If I enter //div[7]/div/div in Selenium IDE and click Find, it highlights the Administration button. I do not know why it won't find it when I run the code.
I would prefer to use CSS, as that is faster than XPath.
I need some help, please.
I get the following error:
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"html/body/div[2]/div[2]/div/div[2]/div/div[2]/div/div[7]/div/div"}
I inspected the HTML element. The full HTML is as follows:
<html style="overflow: hidden;">
<head>
<body style="margin: 0px;">
<html style="overflow: hidden;">
<head>
<body style="margin: 0px;">
<iframe id="__gwt_historyFrame" style="position: absolute; width: 0; height: 0; border: 0;" tabindex="-1" src="javascript:''">
<html>
</iframe>
<noscript> <div style="width: 22em; position: absolute; left: 50%; margin-left: -11em; color: red; background-color: white; border: 1px solid red; padding: 4px; font-family: sans-serif;"> Your web browser must have JavaScript enabled in order for this application to display correctly.</div> </noscript>
<script src="spinner.js" type="text/javascript">
<script type="text/javascript">
<script src="ClearCore/ClearCore.nocache.js" type="text/javascript">
<script defer="defer">
<iframe id="ClearCore" src="javascript:''" style="position: absolute; width: 0px; height: 0px; border: medium none;" tabindex="-1">
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<script>
<script type="text/javascript">
<script type="text/javascript">
</head>
<body>
</html>
</iframe>
<div style="position: absolute; z-index: -32767; top: -20cm; width: 10cm; height: 10cm; visibility: hidden;" aria-hidden="true"> </div>
<div style="position: absolute; left: 0px; top: 0px; right: 0px; bottom: 0px;">
<div style="position: absolute; z-index: -32767; top: -20ex; width: 10em; height: 10ex; visibility: hidden;" aria-hidden="true"> </div>
<div style="position: absolute; overflow: hidden; left: 0px; top: 0px; right: 0px; bottom: 0px;">
<div style="position: absolute; left: 0px; top: 0px; right: 0px; bottom: 0px;">
<div style="position: absolute; z-index: -32767; top: -20ex; width: 10em; height: 10ex; visibility: hidden;" aria-hidden="true"> </div>
<div style="position: absolute; overflow: hidden; left: 1px; top: 1px; right: 1px; bottom: 1px;">
<div class="gwt-TabLayoutPanel" style="position: absolute; left: 0px; top: 0px; right: 0px; bottom: 0px;">
<div style="position: absolute; z-index: -32767; top: -20ex; width: 10em; height: 10ex; visibility: hidden;" aria-hidden="true"> </div>
<div style="position: absolute; overflow: hidden; left: 0px; top: 0px; right: 0px; height: 30px;">
<div class="gwt-TabLayoutPanelTabs" style="position: absolute; left: 0px; right: 0px; bottom: 0px; width: 16384px;">
<div class="gwt-TabLayoutPanelTab GEGQEWXCK gwt-TabLayoutPanelTab-selected" style="background-color: rgb(254, 255, 238);">
<div class="gwt-TabLayoutPanelTab GEGQEWXCK" style="background-color: rgb(254, 255, 238);">
<div class="gwt-TabLayoutPanelTab GEGQEWXCK" style="background-color: rgb(254, 255, 238);">
<div class="gwt-TabLayoutPanelTab GEGQEWXCK" style="background-color: rgb(254, 255, 238);">
<div class="gwt-TabLayoutPanelTab GEGQEWXCK" style="background-color: rgb(254, 255, 238);">
<div class="gwt-TabLayoutPanelTab GEGQEWXCK" style="background-color: rgb(254, 255, 238);">
<div class="gwt-TabLayoutPanelTab GEGQEWXCK" style="background-color: rgb(254, 255, 238);">
<div class="gwt-TabLayoutPanelTabInner">
<div class="gwt-HTML">Administration</div>
</div>
</div>
</div>
</div>
<div style="position: absolute; overflow: hidden; left: 0px; top: 30px; right: 0px; bottom: 0px;">
</div>
</div>
<div style="position: absolute; overflow: hidden; top: 1px; right: 1px; width: 30px; height: 25px;">
<div style="position: absolute; overflow: hidden; left: 0px; top: -25px; right: 0px; height: 25px;">
</div>
</div>
</div>
<div style="display: none;" aria-hidden="true"></div>
</body>
</html>
My code is as follows:
element.py
from selenium.webdriver.support.ui import WebDriverWait
class BasePageElement(object):
    def __set__(self, obj, value):
        driver = obj.driver
        WebDriverWait(driver, 100).until(
            lambda driver: driver.find_element_by_name(self.locator))
        driver.find_element_by_name(self.locator).send_keys(value)
    def __get__(self, obj, owner):
        driver = obj.driver
        WebDriverWait(driver, 100).until(
            lambda driver: driver.find_element_by_name(self.locator))
        element = driver.find_element_by_name(self.locator)
        return element.get_attribute("value")
locators.py
from selenium.webdriver.common.by import By
class MainPageLocators(object):
    Submit_button = (By.ID, 'submit')
    usernameTxtBox = (By.ID, 'unid')
    passwordTxtBox = (By.ID, 'pwid')
    submitButton = (By.ID, 'button')
    AdministrationButton = (By.CSS_SELECTOR, 'div.gwt-HTML.firepath-matching-node')
    AdministrationButtonXpath = (By.XPATH, '//html/body/div[2]/div[2]/div/div[2]/div/div[2]/div/div[7]/div/div')
    AdministrationButtonCSS = (By.CSS_SELECTOR, '/body/div[2]/div[2]/div/div[2]/div/div[2]/div/div[7]/div/div')
    AdministrationButtonXpath2 = (By.XPATH, 'html/body/div[2]/div[2]/div/div[2]/div/div[2]/div/div[7]/div/div/text()')
    AdministrationButtonXpath3 = (By.XPATH, '//div[7]/div/div')
    contentFrame = (By.ID, 'ClearCore')
Page.py
from element import BasePageElement
from locators import MainPageLocators
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import NoAlertPresentException
class SearchTextElement(BasePageElement):
    pass  # class body not shown in the original post
class BasePage(object):
    def __init__(self, driver):
        self.driver = driver
class LoginPage(BasePage):
    search_text_element = SearchTextElement()
    def userLogin_valid(self):
        userName_textbox = self.driver.find_element(*MainPageLocators.usernameTxtBox)
        userName_textbox.clear()
        userName_textbox.send_keys("riaz.ladhani")
        password_textbox = self.driver.find_element(*MainPageLocators.passwordTxtBox)
        password_textbox.clear()
        password_textbox.send_keys("test123")
        submitButton = self.driver.find_element(*MainPageLocators.submitButton)
        submitButton.click()
        #mydriver.find_element_by_xpath(xpaths['usernameTxtBox']).clear()
    def clickAdministration_button(self):
        #administrationButton = self.driver.find_element(*MainPageLocators.AdministrationButton)
        content_frame = self.driver.find_element(*MainPageLocators.contentFrame)
        self.driver.switch_to.frame(content_frame)
        #self.driver.switch_to.frame(*MainPageLocators.contentFrame)
        #self.driver.Switch_to().Frame(*MainPageLocators.contentFrame)
        #administrationButtonCSS = self.driver.find_element(*MainPageLocators.AdministrationButtonCSS)
        #administrationButtonXpath= self.driver.find_element(*MainPageLocators.AdministrationButtonXpath)
        #administrationButtonXpath= self.driver.find_element(*MainPageLocators.AdministrationButton_CSS_regex)
        #administrationButtonCSS2 = self.driver.find_element(*MainPageLocators.AdministrationButtonCSS2)
        adminButton = self.driver.find_element(*MainPageLocators.AdministrationButtonXpath3)
        adminButton.click()
LoginPage_TestCase.py
import unittest
from selenium import webdriver
import page
class LoginPage_TestCase(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.get("http://my-pc.company.local:8080/clearcore")
    def test_login_valid_user(self):
        login_page = page.LoginPage(self.driver)
        login_page.userLogin_valid()
        login_page.clickAdministration_button()
    def tearDown(self):
        self.driver.close()
if __name__ == "__main__":
    unittest.main()
The “Administration button” is located inside the frame whose id is “ClearCore”, not in the top-level page. That is why the element cannot be located when the code runs.
So before clicking that button, you need to switch to that frame (or window) using either:
1. driver.switch_to.window("windowName")
2. driver.switch_to.frame("frameName")
(these replace the older switch_to_window / switch_to_frame / switch_to_default_content methods, which were removed in Selenium 4).
Once you are done working inside the frame, switch back to the top-level document with:
driver.switch_to.default_content()
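The switch-in/switch-back pattern above can be wrapped in a context manager so the driver always returns to the top-level document, even if a lookup inside the frame fails. This is a minimal sketch: the helper name inside_frame is hypothetical, it only calls driver.switch_to.frame(...) and driver.switch_to.default_content(), and "ClearCore" is the frame id taken from the question.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Temporarily switch the driver into a frame (name, id, index or element)."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Always go back to the top-level document, even if the with-block raised
        driver.switch_to.default_content()


# Usage (assumes a live WebDriver instance named driver):
# with inside_frame(driver, "ClearCore"):
#     driver.find_element("xpath", "//div[. = 'Administration']").click()
```

For nested frames, driver.switch_to.parent_frame() steps back one level instead of all the way out.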
I have finally managed to solve my issue. The developer said I had to wait for the page to finish loading: the page was still loading its JavaScript functions even though all the elements were already displayed on the screen.
I first tried time.sleep(30) and then clicked the button. It worked, but waiting 30 seconds every time is not efficient. I then used WebDriverWait, which is more efficient.
Here is the code I used:
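from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Wait until the button is clickable, then click it once
# (click() returns None, so it should not be the wait condition itself)
WebDriverWait(mydriver, 10).until(EC.element_to_be_clickable((By.XPATH, "//div[. = 'Administration']"))).click()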
You have to use
driver.switch_to.frame("__gwt_historyFrame")
before your Administration button click code. This takes WebDriver into the frame; only then is WebDriver able to find the button inside the frame.
If you want to come out of the frame to navigate outside it,
use
driver.switch_to.default_content()
* Here "__gwt_historyFrame" is the id of your frame.
I'm working on a crypto trading system. I don't have access to the exchange API at the moment, so I decided to try a solution using Selenium automation.
What I cannot figure out is how to move the Vue slider on the exchange (to set the buying amount to 100%).
This is my code:
from selenium.webdriver import ActionChains
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep
import time
from selenium.webdriver.common.keys import Keys
import io
import subprocess
#proc = subprocess.Popen("./ChannelMessages.py", stdout=subprocess.PIPE)
chrome_options = Options()
#chrome_options.add_argument('--no-sandbox')
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(options=chrome_options)
executor_url = driver.command_executor._url
session_id = driver.session_id
print(session_id)
print(executor_url)
driver.get("https://www.hotbit.io/exchange?symbol=XRP_USDT")
time.sleep(10)
en = driver.find_element('xpath', '//*[@id="app"]/div[1]/div[2]/div/div/div/div[1]/div[1]/div[4]/ul/li/form[1]/section[1]/div[4]/div/div/div[1]')
move = ActionChains(driver)
move.click_and_hold(en).move_by_offset(50, 0).release().perform()
This is a slider code:
<div data-v-33e6e6c8="" class="percent-box"><div class="vue-slider vue-slider-ltr v-left-slider" style="padding: 7px 0px; width: auto; height: 4px;"><div class="vue-slider-rail"><div class="vue-slider-process" style="height: 100%; top: 0px; left: 0%; width: 0%; transition-property: width, left; transition-duration: 0.5s;"></div><div class="vue-slider-marks"><div class="vue-slider-mark vue-slider-mark-active" style="height: 100%; width: 4px; left: 0%;"><div class="vue-slider-mark-step vue-slider-mark-step-active"></div></div><div class="vue-slider-mark" style="height: 100%; width: 4px; left: 25%;"><div class="vue-slider-mark-step"></div></div><div class="vue-slider-mark" style="height: 100%; width: 4px; left: 50%;"><div class="vue-slider-mark-step"></div></div><div class="vue-slider-mark" style="height: 100%; width: 4px; left: 75%;"><div class="vue-slider-mark-step"></div></div><div class="vue-slider-mark" style="height: 100%; width: 4px; left: 100%;"><div class="vue-slider-mark-step"></div></div></div><div aria-valuetext="0%" class="vue-slider-dot vue-slider-dot-hover" role="slider" aria-valuenow="0" aria-valuemin="0" aria-valuemax="100" aria-orientation="horizontal" tabindex="0" style="width: 14px; height: 14px; transform: translate(-50%, -50%); top: 50%; left: 0%; transition: left 0.5s ease 0s;"><div class="vue-slider-dot-handle"></div><div class="vue-slider-dot-tooltip vue-slider-dot-tooltip-top"><div class="vue-slider-dot-tooltip-inner vue-slider-dot-tooltip-inner-top"><span class="vue-slider-dot-tooltip-text">0%</span></div></div></div></div></div></div>
Maybe I'm looking for the wrong element in "driver.find_element". I tried different elements, though, so I'm not sure.
P.S. I tried to locate elements using "name" and "xpath", and tried basically every level of class, but I still couldn't move it, or even select it.
Any help will be much appreciated!
P.P.S: Resolved!
Needed to add a line:
from selenium.webdriver.common.by import By
and to modify "driver.find_element" line:
en = driver.find_element(By.CLASS_NAME, 'v-left-slider')
I’m trying to create a PDF file for a payment receipt, but I can’t figure out how to set a border for it.
As the border, I want to use this image:
But when converting it to PDF, the next page looks like this:
How can I make the border constant across all pages?
Python + Django code:
import os
from django.core.files import File
from django.template.loader import render_to_string
from weasyprint import HTML
html_string = render_to_string('receipt.html', DATA)
html = HTML(string=html_string)
result = html.write_pdf()
pdf_path = os.path.join(MEDIA_URL + "invoice_receipt/", 'temp.pdf')
# Close (and flush) the written file before reopening it for the model field
with open(pdf_path, 'wb') as f:
    f.write(result)
with open(pdf_path, 'rb') as pdf_file:
    transaction.receipt_file = File(pdf_file)
    transaction.save()
receipt.html template:
<style>
table tbody tr td{
border-top: unset !important;
}
table tbody tr:nth-child(7) td,
table tbody tr:nth-child(8) td,
table tbody tr:nth-child(9) td,
table tbody tr:nth-child(10) td,
table tbody tr:nth-child(11) td,
table tbody tr:nth-child(12) td
{
padding-top: 0;
padding-bottom: 0;
}
.amount-in-words{
border-bottom:3px solid black;
}
.table thead th {
vertical-align: bottom;
border-bottom: 4px solid black;
}
/* .invoice-template{
padding: 20px;
border: 20px solid transparent;
border-image: linear-gradient(to right,#633363 50%,#f3c53d 50%);
border-image-slice: 1;
} */
.logo{
margin-top: 2rem;
}
.logo2{
margin-top: 2rem;
height: 160px;
width:200px;
}
.invoice-template{
padding: 20px;
background-image: url('https://dev-api.test.com/files/files/DumpData/Frame.png');
background-repeat: no-repeat;
background-size: contain;
break-inside: auto;
}
.main-container{
border: 1px solid black;
padding: 20px 10px;
background: white;
}
p {
font-weight: 500;
}
</style>
</head>
<body>
<div class="container invoice-template">
<!-- <div class="main-container"> -->
<div class="row justify-content-center">
<div class="col-md-5 logo"><img src={{ logo }} class="logo2"></div>
<div class="col-md-5 text-right">
<ul style="list-style: none; color: purple; margin-top: 2rem;">
<li>{{ phone }}<span></span></li>
<li><p>{{ email }}<br>{{ website }}</p><span></span></li>
<li>Resource Factory Pvt. Ltd.<br>{{ shop_address|linebreaksbr }}<span></span></li>
</ul>
</div>
</div>
<div class="row text-center">
<div class="col-md-12"><h6>INVOICE</h6></div>
</div>
<div class="row justify-content-center">
<div class="col-md-5">
<p>
To,<br>
{{ user_name }}<br>
{{ user_address|linebreaksbr }}
</p>
<p>Client GST Number.:</p>
</div>
<div class="col-md-5 text-center">
<p>Date: {{ order_date|date:"d-m-Y" }}</p>
<p>Invoice No. {{ invoice }}</p>
</div>
</div>
This is a shortened version of my HTML code. Let me know if you need the full code.
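The behavior of box decorations (borders, padding, backgrounds) when a box is split across pages, like your main <div> here, is controlled by box-decoration-break. The default, slice, renders the decorations as if the box were unbroken and then slices them at the page break. clone instead draws the full decorations independently on each fragment of the box: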
.invoice-template {
box-decoration-break: clone;
}
I'm trying to fill a field with text inputs from a CSV. send_keys works fine with all fields except for the one below:
<div class="col-xs-12 col-md-6">
<div class="custom-select" data-qa="work-tags" data-testid="work-tags" aria-disabled="false">
<div class="custom-select__label">Tags</div>
<div class=" css-2b097c-container">
<div class=" css-yk16xz-control">
<div class=" css-1hwfws3">
<div class=" css-1wa3eu0-placeholder">Select</div>
<div class="css-1g6gooi">
<div class="" style="display: inline-block;"><input autocapitalize="none" autocomplete="off" autocorrect="off" id="react-select-10-input" spellcheck="false" tabindex="0" type="text" aria-autocomplete="list" value="" style="box-sizing: content-box; width: 2px; background: 0px center; border: 0px; font-size: inherit; opacity: 1; outline: 0px; padding: 0px; color: inherit;">
<div
style="position: absolute; top: 0px; left: 0px; visibility: hidden; height: 0px; overflow: scroll; white-space: pre; font-size: 14px; font-family: &quot;Open Sans&quot;, sans-serif; font-weight: 400; font-style: normal; letter-spacing: normal; text-transform: none;"></div>
</div>
</div>
</div>
<div class=" css-1wy0on6"><span class=" css-1okebmr-indicatorSeparator"></span>
<div aria-hidden="true" class=" css-tlfecz-indicatorContainer"><svg height="20" width="20" viewBox="0 0 20 20" aria-hidden="true" focusable="false" class="css-19bqh2r"><path d="M4.516 7.548c0.436-0.446 1.043-0.481 1.576 0l3.908 3.747 3.908-3.747c0.533-0.481 1.141-0.446 1.574 0 0.436 0.445 0.408 1.197 0 1.615-0.406 0.418-4.695 4.502-4.695 4.502-0.217 0.223-0.502 0.335-0.787 0.335s-0.57-0.112-0.789-0.335c0 0-4.287-4.084-4.695-4.502s-0.436-1.17 0-1.615z"></path></svg></div>
</div>
</div>
</div>
</div>
</div>
From the UI I can simply type text and save.
I have tried the following, but it didn't work:
driver.find_element_by_xpath("//div[@data-qa='work-tags']//div[@class=' css-2b097c-container']//div[@class=' css-yk16xz-control']").click()
time.sleep(1)
driver.find_element_by_xpath("//div[@data-qa='work-tags']//div[@class=' css-2b097c-container']//div[@class=' css-yk16xz-control']").send_keys(SSID_rows[SSIDs][1],Keys.TAB)
Thank you
You're trying to send text to a div. Send it to the input node instead:
driver.find_element_by_id("react-select-10-input").send_keys(SSID_rows[SSIDs][1],Keys.TAB)
I am having issues comparing two HTML files using Python's difflib. While I was able to generate a comparison file that highlighted any changes, when I opened the comparison file it displayed the raw HTML and CSS script/tags instead of the plain text.
E.g.:
<Html><Body><div class="a"> Text Here</div></Body></html>
instead of
Text Here
My Python Script is as follows:
import difflib
file1 = open('file1.html', 'r').readlines()
file2 = open('file2.html', 'r').readlines()
diffHTML = difflib.HtmlDiff()
htmldiffs = diffHTML.make_file(file1,file2)
with open('Comparison.html', 'w') as outFile:
outFile.write(htmldiffs)
My input files look something like this:
<!DOCTYPE html>
<html>
<head>
<title>Text here</title>
<style type="text/css">
@media all {
h1 {
margin: 0px;
color: #222222;
}
#page-title {
color: #222222;
font-size: 1.4em;
font-weight: bold;
}
body {
font: 0.875em/1.231 tahoma, geneva, verdana, sans-serif;
padding: 30px;
min-width: 800px;
}
.divider {
margin: 0.5em 15% 0.5em 10%;
border-bottom: 1px solid #000;
}
}
.section.header {
background-color: #303030;
color: #ffffff;
font-weight: bold;
margin: 0 0 5px 0;
padding: 5px 0 5px 5px;
}
.section.subheader {
background-color: #CFCFCF;
color: #000;
font-weight: bold;
padding: 1px 5px;
margin: 0px auto 5px 0px;
}
.response_rule_prefix {
font-style: italic;
}
.exception-scope
{
color: #666666;
padding-bottom: 5px;
}
.where-clause-header
{
color:#AAAA99;
}
.section {
padding: 0em 0em 1.2em 0em;
}
#generated-Time {
padding-top:5px;
float:right;
}
#page-title, #generated-Time {
display: inline-block;
}
</style></head>
<body>
<div id="title-section" class="section ">
<div id="page-branding">
<h1>Title</h1>
</div>
<div id="page-title">
Sub title
</div>
<div id="generated-Time">
Date & Time : Jul 2, 2020 2:42:48 PM
</div>
</div>
<div class="section header">General</div>
<div id="general-section" class="section">
<div class="general-detail-label-container">
<label id="policy-name-label">Text here</label>
</div>
<div class="general-detail-content-container">
<span id="policy-name-content" >Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-description-label">Description A :</label>
</div>
<div class="general-detail-content-container">
<span id="policy-description-content""></span>Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-label-label" class="general-detail-label">Description b:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-label-content" class="wrapping-text"></span>
</div>
<div class="general-detail-label-container">
<label id="policy-group-label" class="general-detail-label">Group:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-group-content" class="wrapping-text">Text here</span>
</div>
<div class="general-detail-label-container">
<label id="policy-status-label" class="general-detail-label">Status:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-status-content">
<label id="policy-status-message">Active</label>
</span>
</div>
<div class="general-detail-label-container">
<label id="policy-version-label" class="general-detail-label">Version:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-version-content" class="wrapping-text">7</span>
</div>
<div class="general-detail-label-container">
<label id="policy-last-modified-label" class="general-detail-label">Last Modified:</label>
</div>
<div class="general-detail-content-container">
<span id="policy-last-modified-content" class="wrapping-text">Jun 15, 2020 2:41:48 PM</span>
</div>
</div>
</body>
</html>
Assuming that you are only looking for text changes and not changes to the HTML, you could strip the HTML from the output after the comparison. There are a number of ways to achieve this. The two that I first thought of were:
RegEx, because it is built into Python (the re module), and
BeautifulSoup, because it was created to parse web pages
Create a function, using either of the above methods, to strip the HTML from the output,
e.g. using BeautifulSoup.
UPDATE
Reading through the documentation again, it seems to me that the comparison will yield the actual HTML too; however, you could create an additional output that only shows the text changes.
Also, to avoid showing the whole document, I've set the context parameter to True.
Using BeautifulSoup
import difflib
from bs4 import BeautifulSoup, Comment
def remove_html_bs(raw_html):
    """Return the visible text nodes of an HTML fragment."""
    data = BeautifulSoup(raw_html, 'html.parser')
    data = data.find_all(string=True)
    def visible(element):
        # Skip text that a browser would not render
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            return False
        # Skip HTML comments
        elif isinstance(element, Comment):
            return False
        return True
    result = list(filter(visible, data))
    return result
def first_text(raw_html):
    """First visible text of a line, or '' if the line has none (avoids IndexError)."""
    texts = remove_html_bs(raw_html)
    return texts[0].strip() if texts else ''
def compare_files(file1, file2):
    "Creates two comparison files"
    file1 = file1.readlines()
    file2 = file2.readlines()
    # Difference line by line - HTML
    difference_html = difflib.HtmlDiff(tabsize=8).make_file(file1, file2, context=True, numlines=5)
    # Difference line by line - raw
    difference_file = set(file1).difference(file2)
    # List of differences by line index
    difference_index = []
    for dt in difference_file:
        for index, t in enumerate(file1):
            if dt in t:
                other = file2[index] if index < len(file2) else ''
                difference_index.append(f'{index}, {first_text(dt)}, {first_text(other)}')
    # Write entire line with changes
    with open('comparison.html', 'w') as outFile:
        outFile.write(difference_html)
    # Write only text changes, by index
    with open('comparison.txt', 'w') as outFile:
        outFile.write('LineNo, File1, File2\n')
        outFile.write('\n'.join(difference_index))
    return difference_html, difference_file
with open('file1.html', 'r') as file1, open('file2.html', 'r') as file2:
    difference_html, difference_file = compare_files(file1, file2)
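Another option is to extract the visible text before the comparison and diff only that, so tags and CSS never reach difflib. A sketch using the standard library's html.parser in place of BeautifulSoup (a swapped-in technique, not the answer's approach); VisibleTextParser, visible_lines and text_diff_html are hypothetical names. Each run of visible text becomes one diff line, so line numbers refer to text runs, not to lines of the source files.

```python
import difflib
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text a browser would render, skipping <head>, <title>, <style>, <script>."""

    SKIP = {"head", "title", "style", "script"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.lines.append(text + "\n")


def visible_lines(html_text):
    parser = VisibleTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.lines


def text_diff_html(html1, html2):
    """Side-by-side HTML diff of the visible text only (changed regions plus context)."""
    return difflib.HtmlDiff().make_file(visible_lines(html1), visible_lines(html2), context=True)


# Usage:
# with open('file1.html') as f1, open('file2.html') as f2:
#     with open('text_comparison.html', 'w') as out:
#         out.write(text_diff_html(f1.read(), f2.read()))
```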
I'm getting this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2661' in position 1409: ordinal not in range(128)
I'm still very green at programming, so have mercy on me and my ignorance. But I understand the error to mean that it's not able to handle Unicode characters. There's at least that one Unicode character, but there could be countless others that'll crop up in that feed.
I've done some looking for others who've had similar problems, but I can't find a solution I understand or can make work.
#import library to do http requests:
import urllib
from xml.dom.minidom import parseString, parse
f = open('games.html', 'w')
document = urllib.urlopen('https://itunes.apple.com/us/rss/topfreemacapps/limit=300/genre=12006/xml')
dom = parse(document)
image = dom.getElementsByTagName('im:image')
title = dom.getElementsByTagName('title')
price = dom.getElementsByTagName('im:price')
address = dom.getElementsByTagName('id')
imglist = []
titlist = []
pricelist = []
addlist = []
i = 0
j = 20
k = 40
f.write('''\
<!DOCTYPE html>
<html>
<head>
<style type="text/css">
<!--
A:link {text-decoration: none; color: #246DA8;}
A:visited {text-decoration: none; color: #246DA8;}
A:active {text-decoration: none; color: #40A9E3;}
A:hover {text-decoration: none; color: #40A9E3;}
.box {
vertical-align:middle;
width: 180px;
height: 120px;
border: 1px solid #99c;
padding: 5px;
margin: 0px;
margin-left: auto;
margin-right: auto;
-moz-border-radius: 5px;
border-radius: 5px;
-webkit-border-radius: 5px;
background-color:#ffffff;
font-family: Arial, Helvetica, sans-serif; color: black;
font-size: small;
font-weight: bold;
}
-->
</style>
</head>
<body>
''')
for i in range(0,len(image)):
    if image[i].getAttribute('height') == '53':
        imglist.append(image[i].firstChild.nodeValue)
for i in range(1,len(title)):
    titlist.append(title[i].firstChild.nodeValue)
for i in range(0,len(price)):
    pricelist.append(price[i].firstChild.nodeValue)
for i in range(1,len(address)):
    addlist.append(address[i].firstChild.nodeValue)
for i in range(0,20):
    f.write('''
<div style="width: 600px;">
<div style="float: left; width: 200px;">
<div class="box" align="center">
<div align="center">
''' + titlist[i] + '''<br>
<img src="''' + imglist[i] + '''" alt="" width="53" height="53" border="0" ><br>
<span>''' + pricelist[i] + '''</span>
</div>
</div>
</div>
<div style="float: left; width: 200px;">
<div class="box" align="center">
<div align="center">
''' + titlist[i+j] + '''<br>
<img src="''' + imglist[i+j] + '''" alt="" width="53" height="53" border="0" ><br>
<span>''' + pricelist[i+j] + '''</span>
</div>
</div>
</div>
<div style="float: left; width: 200px;">
<div class="box" align="center">
<div align="center">
''' + titlist[i+k] + '''<br>
<img src="''' + imglist[i+k] + '''" alt="" width="53" height="53" border="0" ><br>
<span>''' + pricelist[i+k] + '''</span>
</div>
</div>
</div>
<br style="clear: left;" />
</div>
<br>
''')
f.write('''</body>''')
f.close()
The basic problem is that you're concatenating the Unicode strings with ordinary byte-strings without converting them using a proper encoding; in these cases, ASCII is used by default (which, clearly, can't handle extended characters).
The line in your script that does this is too long to quote, but another practical example which displays the same problem could look like this:
import sys
parameter = u"foo \u2661"
sys.stdout.write(parameter + " bar\n")
Instead, you need to encode the Unicode strings with an explicitly specified encoding, e.g.:
import sys
parameter = u"foo \u2661"
sys.stdout.write(parameter.encode("utf8") + " bar\n")
In your case, you can do this in your loops so as to not have to specify it on every concatenation:
for i in range(1,len(title)):
    titlist.append(title[i].firstChild.nodeValue.encode("utf8"))
--
Also, while we're at it, you can improve your code by not iterating through the elements using an integer index. For instance, instead of this:
title = dom.getElementsByTagName('title')
for i in range(1,len(title)):
    titlist.append(title[i].firstChild.nodeValue.encode("utf8"))
... you can do this instead:
# [1:] skips the feed's own <title>, just as range(1, ...) did
for title in dom.getElementsByTagName('title')[1:]:
    titlist.append(title.firstChild.nodeValue.encode("utf8"))
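The code in this thread is Python 2 (urllib.urlopen and u"" literals). For reference, a sketch of the same idea assuming Python 3, where str is already Unicode (and urlopen lives in urllib.request): instead of encoding each string, open the output file with an explicit encoding. titles_from_feed and write_page are hypothetical helper names; the parsing mirrors the question's getElementsByTagName loop.

```python
import html
from xml.dom.minidom import parseString


def titles_from_feed(xml_text):
    """Return the entry titles of a feed as str (already Unicode in Python 3)."""
    dom = parseString(xml_text)
    # [1:] skips the feed's own <title>, as range(1, len(title)) does in the question
    return [t.firstChild.nodeValue for t in dom.getElementsByTagName("title")[1:]]


def write_page(path, titles):
    """Write the titles into an HTML page, encoding the file explicitly as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write('<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8"></head>\n<body>\n')
        for title in titles:
            # html.escape keeps characters like & and < from breaking the markup
            f.write("<div>" + html.escape(title) + "</div>\n")
        f.write("</body>\n</html>\n")
```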