Form.is_valid is always false - python

I am writing a simple survey with a ModelForm. I searched online for this issue, and the usual explanation is that the form is unbound. For simplicity I only offered one choice field in models.py.
Edit: form.is_bound is True, so the cause must be something else.
form.errors shows <property object at 0x03A146C0>
models.py
Those are hardcoded as radio inputs in html
from django.db import models

class Office(models.Model):
    # Each choice is an issue ID ("R#") followed by a bid ID ("B#").
    Office_Space = (
        ('R1B1', 'R1B1'),
        ('R2B1', 'R2B1'),
        ('R3B1', 'R3B1'),
        ('R1B2', 'R1B2'),
        ('R2B2', 'R2B2'),
        ('R3B2', 'R3B2'),
        ('R1B3', 'R1B3'),
        ('R2B3', 'R2B3'),
        ('R3B3', 'R3B3')
    )
    space = models.CharField(max_length=4, choices=Office_Space)
form.py
from django import forms

class officeForm(forms.ModelForm):
    class Meta:
        model = Office
        fields = ['space',]
Views.py
from django.db import connection
from django.http import JsonResponse
from django.shortcuts import render

def get_SenarioChoice(request):
    form_class = officeForm(request.POST or None)
    if request.method == 'POST':
        if form_class.is_valid():
            space = request.POST.get('result')
            response_data = {}
            print(space + "is valid")  # here is the RxCx printed for debugging
            response_data['space'] = space
            form_class.save()
            print(connection.queries)  # the SQL log
            return JsonResponse(response_data)
    return render(request, 'Front.html', {'officeform': form_class})
Added: the template. I am very new to web development, so when I wrote this form I did not know that a Django form can render itself; I therefore hardcoded everything.
The survey consists of 3 bids; each bid has 3 issues and each issue has 3 options. (I could potentially have separated them, but I didn't know how, so I coded them in one choice field keyed by the issue ID ("R#") + bid ID ("B#").)
i.e. R1B1 = issue 1, bid 1
<tr>
<th>Bigger office</th>
</tr>
<tr>
<td>Bigger cubicle</td>
<td>5</td>
<td><input type="radio" name="R1B1" value="5" required><br></td>
<td> </td>
<td><input type="radio" name="R1B2" value="5" required><br></td>
<td> </td>
<td><input type="radio" name="R1B3" value="5" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>Shared office</td>
<td>60</td>
<td><input type="radio" name="R1B1" value="60"><br></td>
<td id =R1C1></td>
<td><input type="radio" name="R1B2" value="60"><br></td>
<td id =R1C2></td>
<td><input type="radio" name="R1B3" value="60"><br></td>
<td id = R1C3></td>
<td id =R1C1C></td>
<td id =R1C2C></td>
<td id = R1C3C></td>
</tr>
<tr>
<td>No change</td>
<td>30</td>
<td><input type="radio" name="R1B1" value="30" required><br></td>
<td> </td>
<td><input type="radio" name="R1B2" value="30" required><br></td>
<td> </td>
<td><input type="radio" name="R1B3" value="30" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<th>New and challenging individual assignments</th>
</tr>
<tr>
<td>Some teamwork, some individual work</td>
<td>80</td>
<td><input type="radio" name="R2B1" value="80" required><br></td>
<td> </td>
<td><input type="radio" name="R2B2" value="80" required><br></td>
<td> </td>
<td><input type="radio" name="R2B3" value="80" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>No (i.e., no change to current situation)</td>
<td>10</td>
<td><input type="radio" name="R2B1" value="10"><br></td>
<td id =R2C1></td>
<td><input type="radio" name="R2B2" value="10"><br></td>
<td id =R2C2></td>
<td><input type="radio" name="R2B3" value="10"><br></td>
<td id =R2C3></td>
<td id =R2C1C></td>
<td id =R2C2C></td>
<td id =R2C3C></td>
</tr>
<tr>
<td>Mostly Group Work</td>
<td>40</td>
<td><input type="radio" name="R2B1" value="40" required><br></td>
<td> </td>
<td><input type="radio" name="R2B2" value="40" required><br></td>
<td> </td>
<td><input type="radio" name="R2B3" value="40" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<th>Working hours</th>
</tr>
<tr>
<td>Yes, flextime and others</td>
<td>50</td>
<td><input type="radio" name="R3B1" value="50" required><br></td>
<td> </td>
<td><input type="radio" name="R3B2" value="50" required><br></td>
<td> </td>
<td><input type="radio" name="R3B3" value="50" required><br></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>No change</td>
<td>0</td>
<td><input type="radio" name="R3B1" value="0"><br></td>
<td id =R3C1></td>
<td><input type="radio" name="R3B2" value="0"><br></td>
<td id =R3C2></td>
<td><input type="radio" name="R3B3" value="0"><br></td>
<td id =R3C3></td>
<td id =R3C1C></td>
<td id =R3C2C></td>
<td id =R3C3C></td>
</tr>
<tr>
<tr>
<td>Work more</td>
<td>10</td>
<td><input type="radio" name="R3B1" value="10" required><br></td>
<td> </td>
<td><input type="radio" name="R3B2" value="10" required><br></td>
<td> </td>
<td><input type="radio" name="R3B3" value="10" required><br></td>
<td> </td>
Thanks in advance.
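Judging only from the code shown, one likely cause is a mismatch between what the form expects and what the template submits. The ModelForm has a single required field named space, so it looks for a POST key called space whose value is one of the declared choices. The hardcoded radio inputs instead submit keys such as R1B1 with values such as 5. The plain-Python sketch below imitates that check; it is not Django's implementation, and validate_space and the two sample dictionaries are hypothetical.

```python
# Plain-Python stand-in for the check a required choice field performs.
# validate_space is a hypothetical helper, not part of Django.
OFFICE_SPACE = [
    "R1B1", "R2B1", "R3B1",
    "R1B2", "R2B2", "R3B2",
    "R1B3", "R2B3", "R3B3",
]


def validate_space(post_data):
    """Return a dict of errors; it is empty when the data would be accepted."""
    errors = {}
    value = post_data.get("space")
    if not value:
        errors["space"] = "This field is required."
    elif value not in OFFICE_SPACE:
        errors["space"] = f"{value!r} is not one of the available choices."
    return errors


# What the hardcoded radio inputs submit: the key is the radio's name.
hardcoded_post = {"R1B1": "5", "R2B1": "80", "R3B1": "50"}
# What the form expects: a key named after the model field.
expected_post = {"space": "R1B1"}

print(validate_space(hardcoded_post))
print(validate_space(expected_post))
```

In the real view, printing form_class.errors on the form instance should show which case applies. Output like "property object at 0x..." usually means errors was read from the class rather than from an instance.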

Related

Xpath Python Extract Data From Table Between Two Headings

I'm trying to extract data from a table that lies between two headings in an HTML file using Python. In this case, the id to look up is in a span inside a heading (I need the table after id="Perlis", which lies between the Perlis and Kedah headings):
<h2>
<span class="mw-headline" id="Perlis">Perlis</span>
<span class="mw-editsection">
<span class="mw-editsection-bracket">[</span>
edit
<span class="mw-editsection-bracket">]</span>
</span>
</h2>
<table class="wikitable" style="text-align:center; font-size:90%; width:100%;">
<tbody>
<tr>
<th width="30"># </th>
<th width="150">Constituency s </th>
<th width="150">Winner </th>
<th width="80">Votes </th>
<th width="80">Majority </th>
<th width="150">Opponent(s) </th>
<th width="80">Votes </th>
<th width="150">Incumbent </th>
<th width="80">
<b>Incumbent Majority</b>
</th>
</tr>
<tr>
<td colspan="13">
BN
<b>2</b> | GS
<b>0</b> | PH
<b>1</b> | Independent
<b>0</b>
</td>
</tr>
<tr align="center">
<td rowspan="2">P1 </td>
<td rowspan="2">
Padang Besar
</td>
<td rowspan="2" bgcolor="#B5BED9">
Zahidi Zainul Abidin
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>15,032</b>
</td>
<td rowspan="2">
<b>1,438</b>
</td>
<td bgcolor="#F18A8F">Izizam Ibrahim <br /> ( <b>PH</b>- <b>PPBM</b>) </td>
<td>
<b>13,594</b>
</td>
<td rowspan="2" bgcolor="#B5BED9">
Zahidi Zainul Abidin
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>7,426</b>
</td>
</tr>
<tr>
<td bgcolor="#B2DBB2">Mokhtar Senik <br /> ( <b>GS</b>- <b>PAS</b>) </td>
<td>
<b>7,874</b>
</td>
</tr>
<tr align="center">
<td rowspan="2">P2 </td>
<td rowspan="2">
Kangar
</td>
<td rowspan="2" bgcolor="#C7F2F2">Noor Amin Ahmad <br /> ( <b>PH</b>- <b>PKR</b>) </td>
<td rowspan="2">
<b>20,909</b>
</td>
<td rowspan="2">
<b>5,603</b>
</td>
<td bgcolor="#B5BED9">Ramli Shariff <br /> ( <b>BN</b>- <b>UMNO</b>) </td>
<td>
<b>15,306</b>
</td>
<td rowspan="2" bgcolor="#B5BED9">
Shaharuddin Ismail
<br /> ( <b>BN</b>- <b>UMNO</b>)
</td>
<td rowspan="2">
<b>4,037</b>
</td>
</tr>
<tr>
<td bgcolor="#B2DBB2">Mohamad Zahid Ibrahim <br /> ( <b>GS</b>- <b>PAS</b>) </td>
<td>
<b>8,465</b>
</td>
</tr>
</tbody>
</table>
<h2>
<span class="mw-headline" id="Kedah">Kedah</span>
<span class="mw-editsection">
<span class="mw-editsection-bracket">[</span>
edit
<span class="mw-editsection-bracket">]</span>
</span>
</h2>
<table class="wikitable" style="text-align:center; font-size:90%; width:100%;"></table>
This is the resulting JSON that I am trying to construct:
[
{
"state": "Perlis",
"constituencies": [
{
"id": "P1",
"name": "Padang Besar"
},
{
"id": "P2",
"name": "Kangar"
}
]
}
]
I'd like to know how to reference the specific table so I can extract the data into JSON. I have used Scrapy before but am not sure how to approach this case. This is what I had in mind:
import scrapy

class PostSpider(scrapy.Spider):
    name = 'manual_spider'
    start_urls = [
        '%URL%'
    ]

    def parse(self, response):
        doc = response.xpath('//comment()').getall()  # This is the bit I need
        # code continues here
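One way to reference the table is by its position relative to the heading: take the h2 that contains the span with id="Perlis", then the first table that follows it. In XPath that would be //h2[span[@id="Perlis"]]/following-sibling::table[1], which could be passed to response.xpath in the spider. The sketch below shows the same idea with only the standard library, on a trimmed, well-formed copy of the markup; it is an illustration, not a Scrapy spider. The helper names are hypothetical, and picking constituency rows by align="center" is an assumption based on the sample shown.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the page in the question.
PAGE = """
<div>
  <h2><span class="mw-headline" id="Perlis">Perlis</span></h2>
  <table class="wikitable">
    <tbody>
      <tr><th>#</th><th>Constituency</th></tr>
      <tr><td colspan="13">BN 2 | GS 0 | PH 1</td></tr>
      <tr align="center"><td rowspan="2">P1 </td><td rowspan="2">Padang Besar</td></tr>
      <tr><td>Mokhtar Senik</td><td>7,874</td></tr>
      <tr align="center"><td rowspan="2">P2 </td><td rowspan="2">Kangar</td></tr>
      <tr><td>Mohamad Zahid Ibrahim</td><td>8,465</td></tr>
    </tbody>
  </table>
  <h2><span class="mw-headline" id="Kedah">Kedah</span></h2>
  <table class="wikitable"></table>
</div>
"""


def table_after_heading(root, state_id):
    """Return the first <table> after the <h2> whose span has the given id."""
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            is_heading = (
                child.tag == "h2"
                and child.find(f"span[@id='{state_id}']") is not None
            )
            if not is_heading:
                continue
            for sibling in children[index + 1:]:
                if sibling.tag == "table":
                    return sibling
    return None


def constituencies(table):
    """Read id and name from the rows marked align="center"."""
    rows = []
    for tr in table.iter("tr"):
        if tr.get("align") != "center":
            continue
        cells = tr.findall("td")
        rows.append({"id": cells[0].text.strip(), "name": cells[1].text.strip()})
    return rows


root = ET.fromstring(PAGE)
perlis_table = table_after_heading(root, "Perlis")
result = [{"state": "Perlis", "constituencies": constituencies(perlis_table)}]
print(json.dumps(result, indent=2))
```

The live page is unlikely to parse as XML, so in practice the XPath expression above (or an HTML-tolerant parser) would replace the ElementTree walk. The row-filtering logic stays the same.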

Loop through table rows and print text in selenium using python

I have an HTML Table:
<div class="report-data">
<table>
<thead>
<tr>
<td></td>
<td>All</td>
<td>Long</td>
<td>Short</td>
</tr>
</thead>
<tbody>
<tr>
<td>Net Profit</td>
<td>
<div>3644.65</div>
<div><span class="additional_percent_value">3.64 %</span></div>
</td>
<td>
<div>3713.90</div>
<div><span class="additional_percent_value">3.71 %</span></div>
</td>
<td>
<div><span class="neg">69.25</span></div>
<div><span class="additional_percent_value"><span class="neg">0.07 %</span></span>
</div>
</td>
</tr>
<tr>
<td>Net Profit</td>
<td>
<div>3644.65</div>
<div><span class="additional_percent_value">3.64 %</span></div>
</td>
<td>
<div>3713.90</div>
<div><span class="additional_percent_value">3.71 %</span></div>
</td>
<td>
<div><span class="neg">69.25</span></div>
<div><span class="additional_percent_value"><span class="neg">0.07 %</span></span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
Now I want to print the first td value of each row, so my output should be:
Net Profit
Net Profit
So I executed the below code:
for dt in driver.find_element_by_xpath("//div[@class='report-data']/following-sibling::table/tbody/tr"):
    text_label = dt.find_element_by_xpath(".//td").text
    print(text_label)
But it throws an error:
selenium.common.exceptions.NoSuchElementException: Message: no such
element: Unable to locate element:
{"method":"xpath","selector":"//div[@class='report-data']/following-sibling::table/tbody/tr"}
You're almost there, I believe. Try this:
content = driver.find_elements_by_xpath("//div[@class='report-data']/table/tbody/tr")
for dt in content:
    text_label = dt.find_element_by_xpath("./td").text
    print(text_label)
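The two changes that matter are the path and the method. The table is a child of the div, not a following sibling, so /table replaces /following-sibling::table. A loop also needs the list returned by find_elements, not the single element returned by find_element. The path itself can be checked without a browser, because the standard library's ElementTree supports this subset of XPath. The sketch below does that on a trimmed copy of the markup; the body wrapper and the shortened cells are assumptions made to keep the sample small.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the report markup; <body> is added so the div is
# a descendant of the root and can be matched by ".//div".
HTML = """
<body>
  <div class="report-data">
    <table>
      <thead>
        <tr><td></td><td>All</td><td>Long</td><td>Short</td></tr>
      </thead>
      <tbody>
        <tr><td>Net Profit</td><td><div>3644.65</div></td></tr>
        <tr><td>Net Profit</td><td><div>3644.65</div></td></tr>
      </tbody>
    </table>
  </div>
</body>
"""

root = ET.fromstring(HTML)

# Same path as in the answer, written relative to the root element.
rows = root.findall(".//div[@class='report-data']/table/tbody/tr")

# "./td" with find() returns only the first cell of each row.
labels = [row.find("./td").text for row in rows]

for label in labels:
    print(label)
```

In Selenium 4 the find_element_by_* and find_elements_by_* helpers were removed. The equivalent calls there are driver.find_elements(By.XPATH, ...) and dt.find_element(By.XPATH, ...), with By imported from selenium.webdriver.common.by.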

Flask checkboxes and textboxes issue

I am creating a Flask form that requires login; after login it goes to an entry form with checkboxes and text inputs.
My specific problem: I can read the values of the checkboxes but not the values of the text boxes.
I am using Flask and have tried every request method to print the text box values, without success.
Below is the code for my main file:
from flask import Flask, render_template
import os
from flask import redirect, url_for, request
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"]="sqlite:////OtrsSummary.db"
app.config["SECRET_KEY"]="thisiskey"
db = SQLAlchemy(app)
# @app.route("/ndex")
# def home():
#     names = os.getlogin().split(".")[0].title()
#     return render_template("index.html", name=names)
@app.route("/welcome", methods=['GET', 'POST'])
def welcome():
    if request.method=="POST":
        try:
            phase = request.form.get("phase")
            rphase = phase.replace("on", "1")
            print(rphase)
            sale = request.form.get("sale")
            rsale = sale.replace("on", "1")
            print(rsale)
            floor = request.form.get("floor")
            rfloor = floor.replace("on", "1")
            options = request.form.get("options")
            roptions = options.replace("on", "1")
            image = request.form.get("image")
            rimage = image.replace("on", "1")
            video = request.form.get("video")
            rvideo = video.replace("on", "1")
            possession = request.form.get("possession")
            rpossession = possession.replace("on", "1")
            amenities = request.form.get("amenities")
            ramenities = amenities.replace("on", "1")
            prdeactivation = request.form.get("prdeactivation")
            rprdeactivation = prdeactivation.replace("on", "1")
            np = request.form.get("np")
            rnp = np.replace("on", "1")
            newbooking = request.form.get("newbooking")
            rnewbooking = newbooking.replace("on", "1")
            bank = request.form.get("bank")
            rbank = bank.replace("on", "1")
            lat = request.form.get("lat")
            rlat = lat.replace("on", "1")
            usp = request.form.get("usp")
            rusp = usp.replace("on", "1")
            fact = request.form.get("fact")
            rfact = fact.replace("on", "1")
            prname = request.form.get("prname")
            rprname = prname.replace("on", "1")
            prdescription = request.form.get("prdescription")
            rprdescription = prdescription.replace("on", "1")
            prspecification = request.form.get("prspecification")
            rprspecification = prspecification.replace("on", "1")
            builderdetails = request.form.get("builderdetails")
            rbuilerdetails = builderdetails.replace("on", "1")
            tco = request.form.get("tco")
            rtco = tco.replace("on", "1")
            npdeactivation = request.form.get("npdeactivation")
            rnpdeactivation = npdeactivation.replace("on", "1")
            constuctionimages = request.form.get("constuctionimages")
            rconstuctionimages = constuctionimages.replace("on", "1")
            brochure = request.form.get("brochure")
            rbrochure = brochure.replace("on", "1")
            rera = request.form.get("rera")
            rrera = rera.replace("on", "1")
            rticketnumber = request.form["ticketnumber"]  ## here not getting the value
            rxidnumber = request.form["xidnumber"]  ## here not getting the value
            rreranumber = request.form["reranumber"]  ## here not getting the value
            print(rxidnumber)
            print(rticketnumber)
            msg = "Entry Submitted Successfully"
        except AttributeError:
            msg = "Please Do Not Submit Blank Form"
        #con.close()
    return render_template("same.html")
@app.route("/", methods=["GET", "POST"])
def log():
    names = os.getlogin().split(".")[0].title()
    error= None
    if request.method == "POST":
        if request.form["username"]!= os.getlogin() or request.form["password"]!="1234":
            error = "Invalid Credentials.Please Try again."
        else:
            return redirect(url_for("welcome"))
    return render_template("index.html", error=error, name=names)
app.run(debug=True)
This is my html:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Page Title</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- <link rel="stylesheet" type="text/css" media="screen" href="main.css" /> -->
<!-- <script src="main.js"></script> -->
</head>
<body>
<form action="" ALIGN="CENTRE" method="POST">
<h3>OTRS Basic Information form</h3>
<TABLE>
<TR>
<TD>
Phases and Tower</TD>
<TD>
<input type="checkbox" name="phase">
</TD>
<TD>
Saleable Mapping </TD>
<TD>
<input type="checkbox" name="sale">
</TD>
</TR>
<TR>
<TD>
Floor Plan </TD>
<TD>
<input type="checkbox" name="floor">
</TD>
<TD>
Options </TD>
<TD>
<input type="checkbox" name="options">
</TD>
</TR>
<TR>
<TD>
Images </TD>
<TD>
<input type="checkbox" name="image">
</TD>
<TD>
Video </TD>
<TD>
<input type="checkbox" name="video">
</TD>
</TR>
<TR>
<TD>
Possession Status/Date </TD>
<TD>
<input type="checkbox" name="possession">
</TD>
<TD>
Amenities </TD>
<TD>
<input type="checkbox" name="amenities">
</TD>
</TR>
<TR>
<TD>
Project Deactivation </TD>
<TD>
<input type="checkbox" name="prdeactivation">
</TD>
<TD>
Np Slot Changes/Refresh </TD>
<TD>
<input type="checkbox" name="np">
</TD>
</TR>
<TR>
<TD>
New Booking/Resale Lock </TD>
<TD>
<input type="checkbox" name="newbooking">
</TD>
<TD>
Bank
</TD>
<TD>
<input type="checkbox" name="bank">
</TD>
</TR>
<TR>
<TD>
Lat Long/Location </TD>
<TD>
<input type="checkbox" name="lat">
</TD>
<TD>
USP </TD>
<TD>
<input type="checkbox" name="usp">
</TD>
</TR>
<TR>
<TD>
Fact Table </TD>
<TD>
<input type="checkbox" name="fact">
</TD>
<TD>
Project Name </TD>
<TD>
<input type="checkbox" name="prname">
</TD>
</TR>
<TR>
<TD>
Project Description </TD>
<TD>
<input type="checkbox" name="prdescription">
</TD>
<TD>
Project Specification </TD>
<TD>
<input type="checkbox" name="prspecification">
</TD>
</TR>
<TR>
<TD>
Builder Details </TD>
<TD>
<input type="checkbox" name="builderdetails">
</TD>
<TD>
TCO/Payment Plan</TD>
<TD>
<input type="checkbox" name="tco">
</TD>
</TR>
<TR>
<TD>
NP Deactivation </TD>
<TD>
<input type="checkbox" name="npdeactivation">
</TD>
<TD>
Construction Images </TD>
<TD>
<input type="checkbox" name="constuctionimages">
</TD>
</TR>
<TR>
<TD>
Brochure </TD>
<TD>
<input type="checkbox" name="brochure">
</TD>
<TD>
Rera Available </TD>
<TD>
<input type="checkbox" name="rera">
</TD>
</TR>
<TR>
<TD>
Ticket Number </TD>
<TD><input type="text" name="ticketnumber">
</TD>
<TD>
XID Number </TD>
<TD>
<input type="text" name="xidnumber">
</TD>
<TD>
Rera Number </TD>
<TD>
<input type="text" name="reranumber">
</TD>
</TABLE>
<input type="submit" value="submit"><br>{{msg}}
</form>
</body>
</html>
Can anyone please suggest some solutions?
I ran your code, and I get the text box values only when every checkbox above them is checked.
You catch the AttributeError but do not handle it properly. If any checkbox is left unchecked, request.form.get returns None for it, and calling .replace on None raises "'NoneType' object has no attribute 'replace'". Control then jumps straight to the except block, so the remaining lines of the try block, including the ones that read the text boxes, never run.
My suggestion for working with checkboxes is to add a hidden input with the same name and value='off', so that a value is submitted even when the box is unchecked. Place it after the checkbox: request.form.get returns the first value submitted for a name, so a checked box then yields 'on' and an unchecked one yields 'off'.
<input type='hidden' name="checkbox_name" value="off">
request.form is a dictionary, so you can check if your checkbox has been checked like this:
sale = 'checked' if 'sale' in request.form else 'not_checked'
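Both suggestions rest on the same fact: a browser leaves unchecked checkboxes out of the submitted body entirely, so their names are simply missing on the server. The sketch below shows this with the standard library only, using urllib.parse.parse_qs as a stand-in for request.form. The read_form helper, the shortened field lists and the sample body are hypothetical.

```python
from urllib.parse import parse_qs

# A subset of the form's field names, to keep the example short.
CHECKBOXES = ["phase", "sale", "floor"]
TEXT_FIELDS = ["ticketnumber", "xidnumber", "reranumber"]


def read_form(body):
    """Split a form-encoded body into checkbox flags and text values."""
    # parse_qs returns a list per key; keep the first value of each.
    form = {key: values[0] for key, values in parse_qs(body).items()}
    # A checkbox is checked exactly when its name is present.
    flags = {name: "1" if name in form else "0" for name in CHECKBOXES}
    # Text boxes left empty are treated as empty strings.
    texts = {name: form.get(name, "") for name in TEXT_FIELDS}
    return flags, texts


# Only "phase" was checked; "sale" and "floor" are absent from the body.
body = "phase=on&ticketnumber=T-100&xidnumber=X-7&reranumber=R-9"
flags, texts = read_form(body)
print(flags)
print(texts)
```

In the Flask view the same membership test can be written as "1" if "phase" in request.form else "0". That removes the need for .replace and for the try/except around it.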

How to select only one tag to fill a data frame in a scrape?

In Python 3, I need to scrape tables on a website:
from bs4 import BeautifulSoup
import requests
import pandas as pd
res = requests.get("http://portal.stf.jus.br/processos/listarPartes.asp?termo=paulo%20salim%20maluf")
soup = BeautifulSoup(res.text, "lxml")
parts = soup.find_all('table', {'class': 'table m-b-0'})
print(parts)
[<table class="table m-b-0"> <th>Identificação</th> <th>Número Único</th> <th>Data Autuação</th> <th>Meio</th> <th>Publicidade</th> <tr> <td>Inq 138</td> <td>0000344-45.1983.0.01.0000</td> <td>29/04/1983</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 170</td> <td>0000243-71.1984.0.01.0000</td> <td>23/03/1984</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 202</td> <td>0000199-18.1985.0.01.0000</td> <td>26/02/1985</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 228</td> <td>0001497-45.1985.0.01.0000</td> <td>04/11/1985</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 229</td> <td>0001526-95.1985.0.01.0000</td> <td>11/11/1985</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AP 449</td> <td>0004490-89.2007.0.01.0000</td> <td>14/08/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 351</td> <td>0001562-69.1987.0.01.0000</td> <td>09/09/1987</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 1430</td> <td>0004092-60.1998.0.01.0000</td> <td>04/12/1998</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 1644</td> <td>0002280-12.2000.0.01.0000</td> <td>27/06/2000</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 1249</td> <td>0003249-66.1996.0.01.0000</td> <td>18/11/1996</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 1268</td> <td>0003587-40.1996.0.01.0000</td> <td>20/12/1996</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 2518</td> <td>0001854-53.2007.0.01.0000</td> <td>16/04/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 1901</td> <td>0000494-25.2003.0.01.0000</td> <td>12/02/2003</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 2469</td> <td>0000746-86.2007.0.01.0000</td> <td>21/02/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 1837</td> <td>0002951-64.2002.0.01.0000</td> <td>16/08/2002</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Rcl 2980</td> <td>0004831-23.2004.0.01.0000</td> <td>23/11/2004</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Rcl 2984</td> 
<td>0004822-61.2004.0.01.0000</td> <td>23/11/2004</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Pet 2188</td> <td>0003954-25.2000.0.01.0000</td> <td>14/11/2000</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Rcl 3338</td> <td>0002065-60.2005.0.01.0000</td> <td>18/05/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Pet 3923</td> <td>0001447-47.2007.0.01.0000</td> <td>23/03/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Pet 3891</td> <td>0000957-25.2007.0.01.0000</td> <td>05/03/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Pet 3960</td> <td>0002077-06.2007.0.01.0000</td> <td>26/04/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Pet 4466</td> <td>0006782-13.2008.0.01.0000</td> <td>07/11/2008</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Pet 4132</td> <td>0004655-39.2007.0.01.0000</td> <td>23/08/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Pet 4133</td> <td>0004636-33.2007.0.01.0000</td> <td>23/08/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Rcl 4899</td> <td>0000244-50.2007.0.01.0000</td> <td>18/01/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>MS 26863</td> <td>0004617-27.2007.0.01.0000</td> <td>22/08/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>HC 72731</td> <td>0001236-31.1995.0.01.0000</td> <td>18/05/1995</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 77205</td> <td> </td> <td>02/08/1973</td><td>Físico</td><td>Público</td> </tr> <tr> <td>HC 86828</td> <td>0004531-27.2005.0.01.0000</td> <td>29/09/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>HC 86759</td> <td>0004378-91.2005.0.01.0000</td> <td>22/09/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>HC 86964</td> <td>0004857-84.2005.0.01.0000</td> <td>18/10/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>HC 86991</td> <td>0004913-20.2005.0.01.0000</td> <td>20/10/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 93293</td> <td> </td> <td>23/09/1980</td><td>Físico</td><td>Público</td> </tr> 
<tr> <td>HC 97511</td> <td>0000296-75.2009.0.01.0000</td> <td>19/01/2009</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 148897</td> <td> </td> <td>27/11/1992</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 176743</td> <td> </td> <td>02/10/1995</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 223221</td> <td> </td> <td>09/07/1998</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 242546</td> <td> </td> <td>21/12/1998</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 323305</td> <td> </td> <td>01/12/2000</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 446706</td> <td> </td> <td>02/05/2003</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 457100</td> <td> </td> <td>24/06/2003</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 495324</td> <td> </td> <td>13/02/2004</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 489263</td> <td> </td> <td>07/01/2004</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 495714</td> <td> </td> <td>17/02/2004</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 548966</td> <td> </td> <td>07/06/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 540712</td> <td> </td> <td>19/03/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 577771</td> <td> </td> <td>13/02/2008</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 574636</td> <td> </td> <td>31/12/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 564510</td> <td> </td> <td>06/10/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 556727</td> <td> </td> <td>30/11/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 564740</td> <td> </td> <td>06/10/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 571596</td> <td> </td> <td>21/11/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 571366</td> <td> </td> <td>19/11/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 570742</td> <td> </td> <td>12/11/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 610979</td> <td> </td> 
<td>04/10/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 613045</td> <td> </td> <td>10/10/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 620055</td> <td> </td> <td>06/11/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 613330</td> <td> </td> <td>18/10/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 613011</td> <td> </td> <td>04/10/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 614436</td> <td> </td> <td>16/10/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 611851</td> <td> </td> <td>04/10/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 607605</td> <td> </td> <td>22/09/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 596356</td> <td> </td> <td>28/07/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 631129</td> <td> </td> <td>03/12/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 632771</td> <td> </td> <td>10/12/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 668455</td> <td> </td> <td>28/06/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 623286</td> <td> </td> <td>10/11/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 623444</td> <td> </td> <td>14/11/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 656365</td> <td> </td> <td>23/04/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 634273</td> <td> </td> <td>29/12/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 637744</td> <td> </td> <td>17/12/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 635450</td> <td> </td> <td>11/12/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 688217</td> <td> </td> <td>22/10/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 685015</td> <td> </td> <td>02/10/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 688019</td> <td> </td> <td>22/10/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 673117</td> <td> </td> <td>20/07/2007</td><td>Físico</td><td>Público</td> </tr> <tr> 
<td>AI 651181</td> <td> </td> <td>19/03/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 649978</td> <td> </td> <td>09/03/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 720168</td> <td> </td> <td>25/06/2008</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 720220</td> <td> </td> <td>26/06/2008</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 712490</td> <td> </td> <td>25/04/2008</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 675174</td> <td> </td> <td>03/08/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 696448</td> <td> </td> <td>19/12/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 689113</td> <td> </td> <td>26/10/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 697555</td> <td> </td> <td>07/01/2008</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 708183</td> <td> </td> <td>25/03/2008</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 739587</td> <td> </td> <td>28/12/2008</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 526303</td> <td> </td> <td>07/12/2004</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 525709</td> <td> </td> <td>03/01/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AR 1178</td> <td>0000716-91.1983.0.01.0000</td> <td>01/09/1983</td><td>Físico</td><td>Público</td> </tr> <tr> <td>MS 26865</td> <td>0004635-48.2007.0.01.0000</td> <td>22/08/2007</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 130440</td> <td> </td> <td>26/11/1992</td><td>Físico</td><td>Público</td> </tr> </table>, <table class="table m-b-0"> <th>Identificação</th> <th>Número Único</th> <th>Data Autuação</th> <th>Meio</th> <th>Publicidade</th> <tr> <td>RE 479887</td> <td> </td> <td>10/02/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 575141</td> <td> </td> <td>30/01/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 575232</td> <td> </td> <td>06/10/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 575233</td> <td> </td> 
<td>07/10/2005</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 576333</td> <td> </td> <td>16/01/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 612917</td> <td> </td> <td>12/10/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 602157</td> <td> </td> <td>26/08/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 602232</td> <td> </td> <td>29/08/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 606013</td> <td> </td> <td>06/09/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 594486</td> <td> </td> <td>29/06/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 596041</td> <td> </td> <td>18/07/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 600394</td> <td> </td> <td>21/08/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 634236</td> <td> </td> <td>08/12/2006</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 758671</td> <td> </td> <td>18/06/2009</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 755758</td> <td> </td> <td>03/06/2009</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 745737</td> <td> </td> <td>13/03/2009</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 763900</td> <td> </td> <td>30/07/2009</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 756173</td> <td> </td> <td>04/06/2009</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 777535</td> <td> </td> <td>23/11/2009</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 793282</td> <td> </td> <td>24/03/2010</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 779401</td> <td> </td> <td>06/12/2009</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>AI 777535</td> <td> </td> <td>23/11/2009</td><td>Físico</td><td>Público</td> </tr> <tr> <td>RE 605763</td> <td> </td> <td>10/11/2009</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>AI 824064</td> <td> </td> <td>28/10/2010</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AI 811146</td> <td> </td> <td>05/08/2010</td><td>Físico</td><td>Público</td> </tr> 
<tr> <td>AI 833777</td> <td> </td> <td>05/01/2011</td><td>Físico</td><td>Público</td> </tr> <tr> <td>ARE 647372</td> <td> </td> <td>28/06/2011</td><td>Físico</td><td>Público</td> </tr> <tr> <td>ARE 783482</td> <td> </td> <td>08/11/2013</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>Inq 3601</td> <td>9930137-38.2013.1.00.0000</td> <td>29/01/2013</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Inq 3638</td> <td>9955034-33.2013.1.00.0000</td> <td>22/03/2013</td><td>Físico</td><td>Público</td> </tr> <tr> <td>Pet 5019</td> <td>9985912-29.2012.0.01.0000</td> <td>14/12/2012</td><td>Físico</td><td>Público</td> </tr> <tr> <td>ARE 730457</td> <td> </td> <td>14/01/2013</td><td>Físico</td><td>Público</td> </tr> <tr> <td>AP 968</td> <td>9930137-38.2013.1.00.0000</td> <td>17/11/2015</td><td>Físico</td><td>Público</td> </tr> <tr> <td>ARE 962505</td> <td> </td> <td>11/04/2016</td><td>Físico</td><td>Público</td> </tr> <tr> <td>ARE 1092382</td> <td>2029851-65.2014.8.26.0000</td> <td>16/11/2017</td><td>Eletrônico</td><td>Público</td> </tr> </table>, <table class="table m-b-0"> <th>Identificação</th> <th>Número Único</th> <th>Data Autuação</th> <th>Meio</th> <th>Publicidade</th> <tr> <td>HC 151913</td> <td>0016064-06.2017.1.00.0000</td> <td>21/12/2017</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>AC 4373</td> <td>0016050-22.2017.1.00.0000</td> <td>20/12/2017</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>HC 152707</td> <td>0065290-43.2018.1.00.0000</td> <td>01/02/2018</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>HC 151919</td> <td>0016071-95.2017.1.00.0000</td> <td>21/12/2017</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>HC 152016</td> <td>0016194-93.2017.1.00.0000</td> <td>27/12/2017</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>HC 152385</td> <td>0064793-29.2018.1.00.0000</td> <td>16/01/2018</td><td>Eletrônico</td><td>Público</td> </tr> <tr> <td>ARE 1104590</td> <td>0400459-17.1996.8.26.0053</td> 
<td>29/01/2018</td><td>Eletrônico</td><td>Público</td> </tr> </table>]
This site has three tables (table class = "table m-b-0").
Each has the headers "Identificacao", "Numero_Unico", "Data_Autuacao", "Meio" and "Publicidade".
As the print output shows, the information I'm looking for is spread over several "td" elements.
My idea was to capture them all and then iterate through them to populate a pandas DataFrame with the values in the rows.
With values like these:
identificacao = 'Inq 138'
identificacao_link = 'http://portal.stf.jus.br/processos/detalhe.asp?incidente=1455705'
numero_unico = '0000344-45.1983.0.01.0000'
data_autuacao = '29/04/1983'
meio = 'Físico'
publicidade = 'Público'
...
...
But I got this error message when I tried to capture the "td":
lines = parts.find_all('td')
AttributeError Traceback (most recent call last)
<ipython-input-21-3885de8e7363> in <module>()
----> 1 lines = parts.find_all('td')
~/Documentos/Code/raspa/lib/python3.6/site-packages/bs4/element.py in __getattr__(self, key)
1805 def __getattr__(self, key):
1806 raise AttributeError(
-> 1807 "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
1808 )
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
Does anyone know a way to capture all of the "td" elements so that I can iterate over them afterwards?
As the error message states, you are working with a ResultSet here, and a ResultSet doesn't have the attribute find_all.
In other words: you can't call find_all on your variable parts.
You need to iterate over parts and call find_all on each of its members for this to work.
for part in parts:
    lines = part.find_all('td')
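The loop above overwrites lines on every pass, so to fill a data frame the cells still have to be grouped row by row and collected across all tables. The sketch below shows that grouping with the standard library's html.parser, used here as a stand-in for BeautifulSoup so that it runs without extra packages. The TableRows class, the column names and the two-table sample are assumptions for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>, grouped per <tr>, across all tables."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Two shortened tables in the same shape as the printed output.
SAMPLE = """
<table class="table m-b-0"> <th>Identificação</th> <th>Número Único</th>
<tr> <td>Inq 138</td> <td>0000344-45.1983.0.01.0000</td> <td>29/04/1983</td><td>Físico</td><td>Público</td> </tr>
</table>
<table class="table m-b-0">
<tr> <td>RE 479887</td> <td> </td> <td>10/02/2006</td><td>Físico</td><td>Público</td> </tr>
</table>
"""

COLUMNS = ["identificacao", "numero_unico", "data_autuacao", "meio", "publicidade"]

parser = TableRows()
parser.feed(SAMPLE)
records = [dict(zip(COLUMNS, row)) for row in parser.rows]
print(records)
```

With BeautifulSoup the same grouping is a nested loop: for each part in parts, for each tr in part.find_all('tr'), collect the text of tr.find_all('td'). A list of dictionaries like records can then be passed directly to pd.DataFrame.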

Extracting table data from html with python and BeautifulSoup

I'm new to Python and the BeautifulSoup library. I have tried many things, but with no luck.
My html code could be like:
<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td
and I need just the text inside each <b></b> and the value attribute of the matching input.
Many thanks!!
First find() your form by id, then find_all() the inputs inside it and read each one's value attribute:
from bs4 import BeautifulSoup
data = """<form method = "post" id="FORM1" name="FORM1">
<table cellpadding=0 cellspacing=1 border=0 align="center" bgcolor="#cccccc">
<tr>
<td class="producto"><b>Club</b><br>
<input value="CLUB TENIS DE MESA PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtClub" size="60" maxlength="55">
</td>
<tr>
<td colspan="2" class="producto"><b>Nombre Equipo</b><br>
<input value="C.T.M. PORTOBAIL" disabled class="txtmascaraform" type="TEXT" name="txtNomEqu" size="100" maxlength="80">
</td>
</tr>
<tr>
<td class="producto"><b>Telefono fijo</b><br>
<input value="63097005534" disabled class="txtmascaraform" type="TEXT" name="txtTelf" size="15" maxlength="10">
</td>
</tr>
</table>
</form>"""
soup = BeautifulSoup(data, "html.parser")
form = soup.find("form", {'id': "FORM1"})
print([item.get('value') for item in form.find_all('input')])
# UPDATE for getting table cell values
table = form.find("table")
print([item.text.strip() for item in table.find_all('td')])
prints:
['CLUB TENIS DE MESA PORTOBAIL', 'C.T.M. PORTOBAIL', '63097005534']
['Club', 'Nombre Equipo', 'Telefono fijo']
