Text extraction using BeautifulSoup - python

I have html data like:
<!DOCTYPE html>
<html>
<head>
<script type="text/blzscript">
</script>
<title></title>
</head>
<body>
<p class="status-box">In some countries, this medicine may only be approved for veterinary use.</p>
<h3>Scheme</h3>
<p>Rec.INN</p>
<h3>CAS registry number (Chemical Abstracts Service)</h3>
<p>0000850-52-2</p>
<h3>Chemical Formula</h3>
<p>C21-H26-O2</p>
<h3>Molecular Weight</h3>
<p>310</p>
<h3>Therapeutic Category</h3>
<p>Progestin</p>
<h3>Chemical Names</h3>
<p>17α-Allyl-17-hydroxyesta-4,9,11-trien-3-one (WHO)</p>
<p>Estra-4,9,11-trien-3-one, 17β-hydroxy-17-(2-propenyl)- (USAN)</p>
<h3>Foreign Names</h3>
<ul>
<li>Altrenogestum (Latin)</li>
<li>Altrenogest (German)</li>
<li>
Altrénogest (French)
</li>
<li>Altrenogest (Spanish)</li>
</ul>
<h3>Generic Names</h3>
<ul>
<li>Altrenogest (OS: BAN, USAN)</li>
<li>
Altrénogest (OS: DCF)
</li>
<li>A 35957 (IS)</li>
<li>A 41300 (IS)</li>
<li>RH 2267 (IS)</li>
<li>RU 2267 (IS: RousselUclaf)</li>
</ul>
<h3>Brand Names</h3>
<div class='contentAdRight' id='third_ad_unit'>
<div class='adsense-ad adsense-ad-text-image-flash-html adsense-ad-300 adsense-ad-300x600 adsense-ad-international'>
<script type="text/blzscript">
google_ad_client="pub-3964816748264478";google_ad_channel="";google_ad_format="300x600_pas_abgc";google_ad_width="300";google_ad_height="600";google_ad_type="text,image,flash,html";google_color_border="FFFFFF";google_color_bg="FFFFFF";google_color_link="0000FF";google_color_text="000000";google_color_url="008000";google_analytics_domain_name="drugs.com";
</script>
<h1></h1>
</div>
</div>
</body>
</html>
and I want to extract the foreign names, generic names and brand names.
I tried:
test = soup.select('h1')[0].text.strip()
print(test)
But it does not give what I want. I also tried extracting from the script tag, but neither attempt gives the result I need.
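One possible approach, sketched here rather than given as a definitive answer: find each h3 heading by its text and collect the li items of any ul that follows it before the next h3. The helper name items_under and the trimmed copy of the HTML are assumptions made for illustration. In the HTML as posted, no ul follows the Brand Names heading (only the ad div), so that entry comes back empty.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (illustrative subset).
html = """
<h3>Foreign Names</h3>
<ul>
<li>Altrenogestum (Latin)</li>
<li>
Altrénogest (French)
</li>
</ul>
<h3>Generic Names</h3>
<ul>
<li>Altrenogest (OS: BAN, USAN)</li>
<li>A 35957 (IS)</li>
</ul>
<h3>Brand Names</h3>
<div class="contentAdRight"><h1></h1></div>
"""


def items_under(soup, heading):
    """Return the <li> texts of every <ul> between this <h3> and the next <h3>."""
    h3 = soup.find("h3", string=heading)
    if h3 is None:
        return []
    items = []
    for sibling in h3.find_next_siblings():
        if sibling.name == "h3":
            break  # reached the next section
        if sibling.name == "ul":
            items.extend(li.get_text(strip=True) for li in sibling.find_all("li"))
    return items


soup = BeautifulSoup(html, "html.parser")
names = {
    heading: items_under(soup, heading)
    for heading in ("Foreign Names", "Generic Names", "Brand Names")
}
print(names)
```

Stopping at the next h3 keeps a section without a list from picking up the list of a later section.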

Related

Can't select dropdown in selenium python

I am new to Selenium. I have managed to log in to our work practice management system, so the basic setup is fine.
I am then faced with this:
I need to drop down the Work dropdown and select a premade report (All Tasks For Export):
I have tried a lot of stuff...... CSS Selector, Class, ID
But I always get error: Message: no such element: Unable to locate element:
Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait
driver = webdriver.Chrome()
driver.get('https://xxxxx.senta.co/a/i/a')
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"input[name='email']"))).send_keys("xxx@xxx.com")
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='password']"))).send_keys("xxxxxx")
driver.find_element(By.CSS_SELECTOR, 'button#submit').click()
dropdown = Select(driver.find_element(By.CSS_SELECTOR,'major#navjobs'))
But maybe I am selecting the wrong element entirely. I will post the HTML below. Maybe I understand the Selenium but not the HTML!! Thanks in advance.
And then the HTML for the elements in the list look like this:
OK here is the HTML of the page. Not sure it's going to help much!
<!DOCTYPE html><html lang="en" ng-app="senta" se-file-drop="onFileSelect($files)" ng-controller="BodyCtrl" ng-class="{ selectfile:selectfile }"> <head> <base href="/"> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" /> <meta name="HandheldFriendly" content="true" /> <meta name="robots" content="noindex"> <link id="favicon" rel='icon' type='image/ico' sizes='32x32' href='https://dsik6juztonps.cloudfront.net/client/public/images/favicon.ico'> <title ng-bind="$root.notiftitle + $root.title + $root.appTitle">Loading...</title> <link href="https://dsik6juztonps.cloudfront.net/client/public/dist/lib/20211115/lib.css" rel="stylesheet" /> <link href="https://dsik6juztonps.cloudfront.net/client/public/dist/_m258abe0ebbf4d5628de49b9a35dff42e/style.css" rel="stylesheet" /> <script src="https://dsik6juztonps.cloudfront.net/client/public/dist/lib/20211115/prelib.min.js"></script> <script src="https://dsik6juztonps.cloudfront.net/client/public/dist/lib/20211115/momentjs/en-gb.js"></script> </head> <body id="{{$root.bodyId}}" class="droptarget {{userClass}} {{skinClass}}" ng-class="{ 'on-scrolled':notAtTop, 'preheader-on':preheaderOn, 'preheader2-on':preheader2On, 'preheadertimer-on':preheaderTimerOn }"> <div ng-if="$root.user.loggedin" ng-include="'https://dsik6juztonps.cloudfront.net/client/public/dist/_m258abe0ebbf4d5628de49b9a35dff42e/html/en-gb/header.html'" ng-controller="NavBarCtrl"></div> <div class="dropindic"> <div class="lightbox"></div> <div class="centred"> <p class="text" style="">Drop your files here to upload into Senta</p><i class="fa fa-file"></i> <p class="selectbutton">Alternatively: <input type="file" id="selectfile"> <button type="button" class="btn btn-primary pseudoselect" ng-click="selectFile()">Select file</button> <button type="button" class="btn btn-normal" ng-click="cancelSelectFile()">Cancel</button> </p> </div> 
</div> <div ng-if="deepheader" class="deepheader"></div> <div class="container"> <div ui-view> <div class="positioner"> <div class="notifier"> <span class="spinning"><span class="spinner"><i class="fa fa-spin fa-refresh"></i></span></span> <span class="msg">Loading...</span> </div> </div> </div> <div id="react-root"></div> </div> <div ng-if="$root.expressionfooter" ng-include="'https://dsik6juztonps.cloudfront.net/client/public/dist/_m258abe0ebbf4d5628de49b9a35dff42e/html/en-gb/settings/expression/footer.html'" ng-controller="ExpressionTesterCtrl" ></div> <div ng-if="$root.previewfooter" ng-include="'https://dsik6juztonps.cloudfront.net/client/public/dist/_m258abe0ebbf4d5628de49b9a35dff42e/html/en-gb/modal/preview-footer.html'"></div> <script src="https://dsik6juztonps.cloudfront.net/client/public/dist/lib/20211115/postlib.min.js"></script> <script src="https://dsik6juztonps.cloudfront.net/client/public/dist/_m258abe0ebbf4d5628de49b9a35dff42e/app.min.en-gb.js"></script> <script src="https://www.gstatic.com/charts/loader.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/1.9.638/pdf.min.js"></script> </body><script src="https://dsik6juztonps.cloudfront.net/react/static/js/en-gb/main.eda07c95.js"></script></html>
It looks like you are trying to use By.CSS_SELECTOR the way you would use By.XPATH.
For example, if you want to find the <ul> shown in your screenshot, you can use any of:
driver.find_element(By.CLASS_NAME, 'dropdown-menu')
driver.find_element(By.CSS_SELECTOR, '.dropdown-menu')
driver.find_element(By.XPATH, "//ul[@class='dropdown-menu']")
driver.find_element(By.XPATH, "//ul[@id='work-dropdown'][@class='dropdown-menu']")
but I can't tell exactly which element you are looking for.
UPDATE:
Try finding all <li> elements with ng-repeat='viewt in viewst track by viewt._id':
driver.find_elements(By.XPATH, '//li[@ng-repeat="viewt in viewst track by viewt._id"]')[index]  # index of the item you need, e.g. 0
and pick the one you need by index. It's really hard to help without the HTML code.

Injecting data into html using Flask

I have a Flask app for saving strings into some db files.
I have a base.html file, which acts as a navbar that I extend in every page. That navbar has a lot of links that require a specific string the user has to enter, so I want to know if there's a way to inject strings into that base.html file, because I can't make a route for a navbar base file, right?
The navbar base file is below:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="/static/css/base.css">
<title>
BukkitList - {% block title %}{% endblock %}
</title>
</head>
<body>
<div class="NAV_B Hidden" id="MENU">
<div class="NAV_B_LINKS">
<img src="/static/assets/img/cube.png" alt="">
<a class="SUS" href="/">Home</a>
</div>
<div class="NAV_B_LINKS">
<img src="/static/assets/img/list.png" alt="">
<a class="/List{{UserId}}" href="/List">List</a>
</div>
<div class="NAV_B_LINKS">
<img src="/static/assets/img/add.png" alt="">
<a class="/Task_Add/{{UserId}}">Add Task</a>
</div>
<div class="NAV_B_LINKS">
<img src="/static/assets/img/settings.png" alt="">
<a class="SUS">Settings</a>
</div>
</div>
<div class="NAV_S" id="NAV">
<img src="/static/assets/img/cube.png" alt="">
<h3>{% block navtitle %}
{% endblock %}
</h3>
<img src="/static/assets/img/menu.png" alt="" onclick="Menu()">
</div>
{% block main %}
{% endblock %}
</body>
<script src="/static/js/base.js"></script>
</html>
Yes, I need that UserId to be injected.
It is not clear from the question where the user enters the {{UserID}}, but as I understand it, either you can select that user ID from the db in the Python file and want to pass it into the HTML page, or, if your page has a sign-in, you can grab the ID when the user signs in using flask_session. Either way, to pass the user ID from the Python file you need to include it in your return. If you are using the session, the Python looks like this:
from flask import render_template, session

@app.route("/")
def main():
    UserIDpy = session["YourSessionVar"]
    return render_template("YourHTMLpage.html", UserID=UserIDpy)
UserID is the variable name passed into the HTML page, and UserIDpy is the Python variable that holds its value.
That call replaces every {{ UserID }} in your HTML page. The name is case-sensitive, so it has to match the template exactly (the navbar above uses {{UserId}}).
I believe you can do this with Flask's session variable. It allows you to create and update a global variable that can be referenced in templates even when you don't render them directly. This is similar to Lychas' answer, but should be more suited for your purpose.
Create/update a session variable in your login route (or wherever you want to update this value) with this line:
session['UserId'] = your_id_value_here
You can then use this session variable in your jinja templates with something like the following:
<a class="/Task_Add/{{ session['UserId'] }}">Add Task</a>
(Note: if you are not already using session, you will need to import it with from flask import session.)
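A minimal sketch of the session approach, run through Flask's test client so it can be tried without a browser. The /login/<user_id> route, the secret key value and the inline NAVBAR template are assumptions made for illustration, and the link uses href where the original navbar uses class. The point it shows is that the template reads session directly, without the view passing anything in.

```python
from flask import Flask, render_template_string, session

app = Flask(__name__)
app.secret_key = "dev-only-secret"  # assumption: sessions need some secret key

# Stand-in for the link in base.html; it reads session directly.
NAVBAR = '<a href="/Task_Add/{{ session["UserId"] }}">Add Task</a>'


@app.route("/login/<user_id>")  # hypothetical login route
def login(user_id):
    session["UserId"] = user_id
    return "ok"


@app.route("/")
def home():
    # No UserId is passed here; the template picks it up from session.
    return render_template_string(NAVBAR)


with app.test_client() as client:
    client.get("/login/42")  # stores the id in the session cookie
    html = client.get("/").get_data(as_text=True)

print(html)
```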

How to print variable present in python with the help of html?

I've written some code for deep learning text summarization, and I'm trying to render the template using the Flask library. I'm unable to see the results. The python code can be found below.
text = ' '.join([summ['summary_text'] for summ in res])
print(text)
return render_template('result.html', prediction=text)
I'm trying to print the prediction variable which is present in the above code. Below is the html code
<!DOCTYPE html>
<html>
<head>
<title></title>
<link rel="stylesheet" type="text/css" href="{{ url_for('static', filename='css/styles.css') }}">
</head>
<body>
<header>
<div class="container">
<div id="brandname">
Deep Learning App
</div>
<h2>Summarized text</h2>
</div>
</header>
<p style="color:blue;font-size:20;text-align: center;"><b>Result for Text</b></p>
<div class="results">
<p><strong>{prediction}</strong></p>
</div>
</body>
</html>
Below is output image
Can anyone help me display the text in the prediction variable on the web page?
You need double curly braces; with single braces Jinja2 outputs the text {prediction} literally:
<p><strong>{{ prediction }}</strong></p>
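The difference can be checked directly with Jinja2, the template engine Flask uses. This is a small sketch; the sample summary text is made up for illustration.

```python
from jinja2 import Template

text = "A short summary."  # made-up stand-in for the model output

# Single braces are plain text to Jinja2, so nothing is substituted.
single = Template("<strong>{prediction}</strong>").render(prediction=text)

# Double braces mark an expression, so the variable is rendered.
double = Template("<strong>{{ prediction }}</strong>").render(prediction=text)

print(single)
print(double)
```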

Django not showing dynamic list content

I am making a website where I show off the specs of each product using flip cards, with Django as the backend. I was trying to make the specs dynamic using the jinja format, but every time I try to put my multiple objects in a list it breaks the page.
views.py before
def prodospecs(request):
    product1 = product()
    product1.name = 'iPhone 12 Pro 5G'
    product1.screen = '6.1" Super Amoled Display 60hz'
    product1.chipset = 'A14'
    product1.camera = 'Triple Camera Setup (UltraWide, Telephoto, Wide)'
    product1.special = 'New Design'
    product1.price = 999
    product2 = product()
    product2.name = 'S21 Ultra 5G'
    product2.screen = '6.8" Amoled Display 120hz'
    product2.chipset = 'Snapdragon 888, Exynos 2100'
    product2.camera = 'Quad Camera Setup (UltraWide, 2 Telephoto, Wide)'
    product2.special = 'New Camera Design and S-Pen Support'
    product2.price = 1199
    product3 = product()
    product3.name = 'Asus Zenbook Duo'
    product3.screen = '14 inch 16:9'
    product3.chipset = 'i5 or i7'
    product3.camera = '720p Webcam'
    product3.special = 'Two Displays'
    product3.price = 999
    return render(request, 'prodospecs.html', {'product1': product1, 'product2': product2, 'product3': product3})
And this one works and shows all the information necessary
views.py after
def prodospecs(request):
    product1 = product()
    product1.name = 'iPhone 12 Pro 5G'
    product1.screen = '6.1" Super Amoled Display 60hz'
    product1.chipset = 'A14'
    product1.camera = 'Triple Camera Setup (UltraWide, Telephoto, Wide)'
    product1.special = 'New Design'
    product1.price = 999
    product2 = product()
    product2.name = 'S21 Ultra 5G'
    product2.screen = '6.8" Amoled Display 120hz'
    product2.chipset = 'Snapdragon 888, Exynos 2100'
    product2.camera = 'Quad Camera Setup (UltraWide, 2 Telephoto, Wide)'
    product2.special = 'New Camera Design and S-Pen Support'
    product2.price = 1199
    product3 = product()
    product3.name = 'Asus Zenbook Duo'
    product3.screen = '14 inch 16:9'
    product3.chipset = 'i5 or i7'
    product3.camera = '720p Webcam'
    product3.special = 'Two Displays'
    product3.price = 999
    prods = [product1, product2, product3]
    return render(request, 'prodospecs.html', {'products': prods})
While this one doesn't show any information
prodospecs.html
{% load static %}
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Listen to Talk Tech Teen Tech</title>
<link rel="stylesheet" href="{% static 'prodospecs.css' %}">
</head>
<body>
<div class="container">
<div class="menu">
<ul>
<li class = "logo"><img src="{% static 'images/icon-1.png' %}"></li>
<li>Home</li>
<li>Listen</li>
<li>Premium Techy</li>
<li class = "active">Product Specs</li>
<li>Contact</li>
<li><span>Sign Up</span></li>
</ul>
</div>
<div class="title">
<h1>Product Specs</h1>
<p>The list of important specs for each product that we talk about on Talk Tech Teen Tech</p>
</div>
<div class="flip-card">
<div class="flip-card-inner">
<div class="flip-card-front">
<img src="{% static 'images/iPhone12Pro.png' %}" alt="12pro" style="width:300px;height:300px;">
</div>
<div class="flip-card-back">
<h1>{{product1.name}}</h1>
<p>{{product1.screen}}</p>
<p>{{product1.chipset}}</p>
<p>{{product1.camera}}</p>
<p>{{product1.special}}</p>
<p>Price: ${{product1.price}} USD</p>
</div>
</div>
</div>
<div class="flip-card">
<div class="flip-card-inner">
<div class="flip-card-front">
<img src="{% static 'images/s21ultra.png' %}" alt="s21" style="width:300px;height:300px;">
</div>
<div class="flip-card-back">
<h1>{{product2.name}}</h1>
<p>{{product2.screen}}</p>
<p>{{product2.chipset}}</p>
<p>{{product2.camera}}</p>
<p>{{product2.special}}</p>
<p>Price: ${{product2.price}} USD</p>
</div>
</div>
</div>
<div class="flip-card">
<div class="flip-card-inner">
<div class="flip-card-front">
<img src="{% static 'images/zenbookduo.png' %}" alt="s21" style="width:300px;height:300px;">
</div>
<div class="flip-card-back">
<h1>{{product3.name}}</h1>
<p>{{product3.screen}}</p>
<p>{{product3.chipset}}</p>
<p>{{product3.camera}}</p>
<p>{{product3.special}}</p>
<p>Price: ${{product3.price}} USD</p>
</div>
</div>
</div>
</body>
Any help would be greatly appreciated
You want to be looping over the products:
{% for product in products %}
<div class="flip-card">
<div class="flip-card-inner">
<div class="flip-card-front">
<img src="{% static 'images/iPhone12Pro.png' %}" alt="12pro" style="width:300px;height:300px;">
</div>
<div class="flip-card-back">
<h1>{{product.name}}</h1>
<p>{{product.screen}}</p>
<p>{{product.chipset}}</p>
<p>{{product.camera}}</p>
<p>{{product.special}}</p>
<p>Price: ${{product.price}} USD</p>
</div>
</div>
</div>
{% endfor %}
in prodospecs.html
{% load static %}
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Listen to Talk Tech Teen Tech</title>
<link rel="stylesheet" href="{% static 'prodospecs.css' %}">
</head>
<body>
<div class="container">
<div class="menu">
<ul>
<li class = "logo"><img src="{% static 'images/icon-1.png' %}"></li>
<li>Home</li>
<li>Listen</li>
<li>Premium Techy</li>
<li class = "active">Product Specs</li>
<li>Contact</li>
<li><span>Sign Up</span></li>
</ul>
</div>
<div class="title">
<h1>Product Specs</h1>
<p>The list of important specs for each product that we talk about on Talk Tech Teen Tech</p>
</div>
{% for product in products %}
<div class="flip-card">
<div class="flip-card-inner">
<div class="flip-card-front">
<img src="{% static 'images/iPhone12Pro.png' %}" alt="12pro" style="width:300px;height:300px;">
</div>
<div class="flip-card-back">
<h1>{{product.name}}</h1>
<p>{{product.screen}}</p>
<p>{{product.chipset}}</p>
<p>{{product.camera}}</p>
<p>{{product.special}}</p>
<p>Price: ${{product.price}} USD</p>
</div>
</div>
</div>
{% endfor %}
</div>
</div>
</body>
You need to add the special tags {% for ... in ... %} and {{ variable }}, which are part of the Django Template Language. The for tag iterates over the list of objects, and {{ product.<field name> }} renders that field of the current product in the HTML template, generating a dynamic HTML document.
Your answer:
{% for product in products %}
<h1>{{product.name}}</h1>
<p>{{product.screen}}</p>
<p>{{product.chipset}}</p>
<p>{{product.camera}}</p>
<p>{{product.special}}</p>
<p>Price: ${{product.price}} USD</p>
{% endfor %}
One small suggestion (for code optimisation): instead of creating the product objects in the view, it's better to define a model in models.py like below.
models.py:
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=30, unique=True)
    screen = models.CharField(max_length=100)
    chipset = models.CharField(max_length=100)
    camera = models.CharField(max_length=100)
    special = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=6, decimal_places=2)

    def __str__(self):
        return self.name
Open the Command Line Tools, activate the virtual environment, go to the folder where the manage.py file is, and run the commands below:
python manage.py makemigrations
As an output you will get something like this:
Migrations for 'products':
boards/migrations/0001_initial.py
- Create model Product
- Add field name to Product
- Add field screen to Product
- Add field chipset to Product
- Add field camera to Product
- Add field special to Product
- Add field price to Product
At this point, Django created a file named 0001_initial.py inside the app/migrations directory. It represents the current state of our application’s models. In the next step, Django will use this file to create the tables and columns.
The next step now is to apply the migration we generated to the database:
python manage.py migrate
After this go to /admin and create your products there.
That’s it! Your Product objects are ready to be used.

Python renders wrong page using web.py

I am trying to teach myself Python. I have the following code in my controller.py file:
import web
urls = {
    '/', 'home',
    '/register', 'registerclick'
}
render = web.template.render("views/templates", base="MainLayout")
app = web.application(urls, globals())
# Classes/Routes
class home:
    def GET(self):
        return render.home()
class registerclick:
    def GET(self):
        return render.register()
if __name__ == "__main__":
    app.run()
And this is the code in my MainLayout.html:
$def with (page)
$var css: static/css/bootstrap.css
$var js1: static/js/jquery-3.1.0.min.js static/js/bootstrap.js static/js/material.min.js static/js/ripple.min.js static/js/scripty.js
<html lang="en">
<head>
<meta charset="UTF-8">
<title>CodeWizard</title>
$if self.css:
    $for style in self.css.split():
        <link rel="stylesheet" href="$style" />
</head>
<body>
<div id="app">
<div class="navbar navbar-info navbar-fixed-top">
<div class="navbar-header">
<a class="navbar-brand">CodeWizard</a>
</div>
<ul class="nav navbar-nav">
<li>
<a class="waves-effect" href="/">Home Feed<div class="ripple-container"></div></a>
</li>
<li>
Discover<div class="ripple-container"></div>
</li>
<li>
Profile<div class="ripple-container"></div>
</li>
<li>
Settings<div class="ripple-container"></div>
</li>
</ul>
<div class="pull-right">
Register
</div>
</div>
<br /><br />
$:page
</div>
$if self.js1:
    $for script in self.js1.split():
        <script src="$script"></script>
</body>
</html>
I have 2 additional files (home.html, and register.html) and I have bootstrap available (although that has nothing to do with my issue).
When I start the application and I open a browser and enter localhost:8080 as the url, MainLayout.html is loaded into the browser (which I expect) but the contents of register.html are loaded into $:page and I don't know why.
When I remove the second entry from the urls and remove the registerclick class from controller.py, the MainLayout.html page is loaded and nothing appears to be loaded into $:page.
Any ideas why the contents of register.html get presented? Any help is greatly appreciated.
Thanks.
By defining urls with braces, you made it a set, which is unordered. You need to define urls as a tuple which can be done using parentheses.
This answer explains it well: https://stackoverflow.com/a/46633252/2150542
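The effect can be seen in plain Python. The sketch below assumes that web.py reads the flat urls sequence two items at a time (pattern, then handler class name); pair_up is a hypothetical helper that mimics that grouping and is not part of web.py.

```python
def pair_up(urls):
    """Group a flat sequence into (pattern, handler) pairs, two items at a time."""
    items = list(urls)
    return [(items[i], items[i + 1]) for i in range(0, len(items), 2)]


urls_tuple = ("/", "home", "/register", "registerclick")
urls_set = {"/", "home", "/register", "registerclick"}

# A tuple keeps the order it was written in, so the pairs are the intended ones.
tuple_pairs = pair_up(urls_tuple)

# A set has no defined order, so "/" may end up paired with "registerclick",
# and the pairing can change from one run to the next.
set_pairs = pair_up(urls_set)

print(tuple_pairs)
print(set_pairs)
```

Replacing the braces with parentheses in controller.py is therefore enough to keep '/' paired with 'home'.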
