I have a table like this (old html):
<table>
<!-- Begin Table Body -->
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">U.S. federal statutory income tax rate</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">35.0</td>
<td nowrap="">%</td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">35.0</td>
<td nowrap="">%</td>
</tr>
<tr valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Federal income tax at statutory rate</div></td>
<td> </td>
<td align="right" nowrap="">$</td>
<td align="right">(2,813</td>
<td nowrap="">)</td>
<td> </td>
<td align="right">$</td>
<td align="right">5,834</td>
<td> </td>
</tr>
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">State and local income taxes, net of federal income tax effect</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">(733</td>
<td nowrap="">)</td>
<td> </td>
<td> </td>
<td align="right">812</td>
<td> </td>
</tr>
<tr style="font-size: 1px">
<td><div style="margin-left:10px; text-indent:-10px"> </div></td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="1"/> </td>
<td> </td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="1"/> </td>
<td> </td>
</tr>
<tr valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Provision (benefit) for income taxes</div></td>
<td> </td>
<td align="right" nowrap="">$</td>
<td align="right">(3,546</td>
<td nowrap="">)</td>
<td> </td>
<td align="right">$</td>
<td align="right">6,646</td>
<td> </td>
</tr>
<tr style="font-size: 1px">
<td><div style="margin-left:10px; text-indent:-10px"> </div></td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="4"/> </td>
<td> </td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="4"/> </td>
<td> </td>
</tr>
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Effective income tax rate</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">44.1</td>
<td nowrap="">%</td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">39.9</td>
<td nowrap="">%</td>
</tr>
<!-- End Table Body -->
</table>
and I want it to look like:
U.S. federal statutory income tax rate 35.0% 35.0%
Federal income tax at statutory rate $(2,813) $5,834
State and local income taxes, net of federal income tax effect (733) 812
Provision (benefit) for income taxes $(3,546) $6,646
Effective income tax rate 44.1% 39.9%
I have two problems getting from the HTML to the table I want:
1. There are empty cells, like <td> </td>.
2. Some values are spread over several cells.
I want to get rid of the empty cells by decomposing them, and to concatenate cells that belong together, such as "(2,813" and ")" or "44.1" and "%".
I tried the following code for decomposing, but it does not work, and I have no clue how to concatenate cells in BeautifulSoup:
from bs4 import BeautifulSoup as bs
import pandas as pd

s = """<table>
<!-- Begin Table Body -->
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">U.S. federal statutory income tax rate</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">35.0</td>
<td nowrap="">%</td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">35.0</td>
<td nowrap="">%</td>
</tr>
<tr valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Federal income tax at statutory rate</div></td>
<td> </td>
<td align="right" nowrap="">$</td>
<td align="right">(2,813</td>
<td nowrap="">)</td>
<td> </td>
<td align="right">$</td>
<td align="right">5,834</td>
<td> </td>
</tr>
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">State and local income taxes, net of federal income tax effect</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">(733</td>
<td nowrap="">)</td>
<td> </td>
<td> </td>
<td align="right">812</td>
<td> </td>
</tr>
<tr style="font-size: 1px">
<td><div style="margin-left:10px; text-indent:-10px"> </div></td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="1"/> </td>
<td> </td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="1"/> </td>
<td> </td>
</tr>
<tr valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Provision (benefit) for income taxes</div></td>
<td> </td>
<td align="right" nowrap="">$</td>
<td align="right">(3,546</td>
<td nowrap="">)</td>
<td> </td>
<td align="right">$</td>
<td align="right">6,646</td>
<td> </td>
</tr>
<tr style="font-size: 1px">
<td><div style="margin-left:10px; text-indent:-10px"> </div></td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="4"/> </td>
<td> </td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="4"/> </td>
<td> </td>
</tr>
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Effective income tax rate</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">44.1</td>
<td nowrap="">%</td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">39.9</td>
<td nowrap="">%</td>
</tr>
<!-- End Table Body -->
</table>"""
soup = bs(s, "lxml")
table = soup.find('table')
for row in table.find_all('tr'):
    for cell in row.find_all('td'):
        if cell.text == '':
            cell.decompose()
df = pd.read_html(str(soup))
print(df)
Provided you can isolate the right table, just loop over the trs that have a valign attribute and concatenate the text of the tds that are not a single space (' '):
from bs4 import BeautifulSoup as bs
html = '''<table>
<!-- Begin Table Body -->
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">U.S. federal statutory income tax rate</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">35.0</td>
<td nowrap="">%</td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">35.0</td>
<td nowrap="">%</td>
</tr>
<tr valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Federal income tax at statutory rate</div></td>
<td> </td>
<td align="right" nowrap="">$</td>
<td align="right">(2,813</td>
<td nowrap="">)</td>
<td> </td>
<td align="right">$</td>
<td align="right">5,834</td>
<td> </td>
</tr>
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">State and local income taxes, net of federal income tax effect</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">(733</td>
<td nowrap="">)</td>
<td> </td>
<td> </td>
<td align="right">812</td>
<td> </td>
</tr>
<tr style="font-size: 1px">
<td><div style="margin-left:10px; text-indent:-10px"> </div></td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="1"/> </td>
<td> </td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="1"/> </td>
<td> </td>
</tr>
<tr valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Provision (benefit) for income taxes</div></td>
<td> </td>
<td align="right" nowrap="">$</td>
<td align="right">(3,546</td>
<td nowrap="">)</td>
<td> </td>
<td align="right">$</td>
<td align="right">6,646</td>
<td> </td>
</tr>
<tr style="font-size: 1px">
<td><div style="margin-left:10px; text-indent:-10px"> </div></td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="4"/> </td>
<td> </td>
<td> </td>
<td> </td>
<td align="right"><hr noshade="" size="4"/> </td>
<td> </td>
</tr>
<tr style="background: #eeeeee" valign="bottom">
<td><div style="margin-left:10px; text-indent:-10px">Effective income tax rate</div></td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">44.1</td>
<td nowrap="">%</td>
<td> </td>
<td align="right" nowrap=""> </td>
<td align="right">39.9</td>
<td nowrap="">%</td>
</tr>
<!-- End Table Body -->
</table>'''
soup = bs(html, 'lxml')
for tr in soup.select('table tr[valign]'):
    print(' '.join([td.text for td in tr.select('td') if td.text != ' ']))
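The loop above joins every remaining cell with a space, so pieces such as 35.0 and % stay separated. A minimal sketch of one way to glue them together follows. It works on plain lists of cell texts, so it runs without BeautifulSoup. The helper name merge_cells and the two symbol sets are assumptions made for illustration, based only on the symbols that appear in this table.

```python
# Hypothetical helper; the symbol sets are assumptions based on the table above.
PREFIXES = {"$"}        # symbols that belong to the value that follows
SUFFIXES = {")", "%"}   # symbols that belong to the value that precedes


def merge_cells(texts):
    """Drop blank cells and glue stray symbols onto their neighbouring value."""
    merged = []
    pending = ""                      # prefix symbols waiting for their value
    for text in texts:
        text = text.strip()           # also removes non-breaking spaces
        if not text:
            continue                  # empty cell: skip it
        if text in PREFIXES:
            pending += text
        elif text in SUFFIXES and merged:
            merged[-1] += text        # attach to the previous value
        else:
            merged.append(pending + text)
            pending = ""
    return merged


# Cell texts of the second data row, as td.text would return them
row = ["Federal income tax at statutory rate", " ", "$", "(2,813", ")",
       " ", "$", "5,834", " "]
print(merge_cells(row))
# ['Federal income tax at statutory rate', '$(2,813)', '$5,834']
```

Inside the loop above, this could be called as merge_cells([td.text for td in tr.select('td')]), which also makes the td.text != ' ' filter unnecessary.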
I am able to select a day and enter the value.
I have already tried solutions from these other questions:
1. Getting availability from datepicker
2. Python Selenium Date Picker
3. Python & Selenium Cannot select date in datepicker
When I try to select the month and year, I still have no luck getting the result.
Below is my code:
start_date = wait.until(EC.visibility_of_element_located((
    By.CSS_SELECTOR, "#departureDate_i")))
start_date.click()  # show the datepicker
browser.execute_script("document.getElementsByClassName('next')[0].click()")
current_month = browser.find_element_by_css_selector(".datepicker-months").text
print("current_month:", current_month)
Below is the HTML of the datepicker:
<div class="datepicker datepicker-dropdown dropdown-menu datepicker-orient-left datepicker-orient-top" style="display: none; top: 176.4px; left: 448.667px;">
<div class="datepicker-days" style="display: block;">
<table class=" table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: hidden;"></th>
<th colspan="5" class="datepicker-switch">January 2019</th>
<th class="next" style="visibility: visible;"></th>
</tr>
<tr>
<th class="dow">Su</th>
<th class="dow">Mo</th>
<th class="dow">Tu</th>
<th class="dow">We</th>
<th class="dow">Th</th>
<th class="dow">Fr</th>
<th class="dow">Sa</th>
</tr>
</thead>
<tbody>
<tr>
<td class="day disabled old">30</td>
<td class="day disabled old">31</td>
<td class="day disabled">1</td>
<td class="day disabled">2</td>
<td class="day">3</td>
<td class="day today">4</td>
<td class="day">5</td>
</tr>
<tr>
<td class="day">6</td>
<td class="day">7</td>
<td class="day">8</td>
<td class="day">9</td>
<td class="day">10</td>
<td class="day">11</td>
<td class="day">12</td>
</tr>
<tr>
<td class="day">13</td>
<td class="day">14</td>
<td class="day">15</td>
<td class="day active">16</td>
<td class="day">17</td>
<td class="day">18</td>
<td class="day">19</td>
</tr>
<tr>
<td class="day">20</td>
<td class="day">21</td>
<td class="day">22</td>
<td class="day">23</td>
<td class="day">24</td>
<td class="day">25</td>
<td class="day">26</td>
</tr>
<tr>
<td class="day">27</td>
<td class="day">28</td>
<td class="day">29</td>
<td class="day">30</td>
<td class="day">31</td>
<td class="day new">1</td>
<td class="day new">2</td>
</tr>
<tr>
<td class="day new">3</td>
<td class="day new">4</td>
<td class="day new">5</td>
<td class="day new">6</td>
<td class="day new">7</td>
<td class="day new">8</td>
<td class="day new">9</td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
<div class="datepicker-months" style="display: none;">
<table class="table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: hidden;"></th>
<th colspan="5" class="datepicker-switch">2019</th>
<th class="next" style="visibility: visible;"></th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="7" style=""><span class="month active">Jan</span><span class="month">Feb</span><span class="month">Mar</span><span class="month">Apr</span><span class="month">May</span><span class="month">Jun</span><span class="month">Jul</span><span class="month">Aug</span><span class="month">Sep</span><span class="month">Oct</span><span class="month">Nov</span><span class="month">Dec</span></td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
<div class="datepicker-years" style="display: none;">
<table class="table-condensed">
<thead>
<tr>
<th class="prev" style="visibility: hidden;"></th>
<th colspan="5" class="datepicker-switch">2010-2019</th>
<th class="next" style="visibility: visible;"></th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="7"><span class="year old disabled">2009</span><span class="year disabled">2010</span><span class="year disabled">2011</span><span class="year disabled">2012</span><span class="year disabled">2013</span><span class="year disabled">2014</span><span class="year disabled">2015</span><span class="year disabled">2016</span><span class="year disabled">2017</span><span class="year disabled">2018</span><span class="year active">2019</span><span class="year new">2020</span></td>
</tr>
</tbody>
<tfoot>
<tr>
<th colspan="7" class="today" style="display: none;">Today</th>
</tr>
<tr>
<th colspan="7" class="clear" style="display: none;">Clear</th>
</tr>
</tfoot>
</table>
</div>
</div>
I would much appreciate any suggestions on how to handle this.
Thank you
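One possible building block, sketched here without Selenium: the header of the days view (the datepicker-switch cell, "January 2019" in the HTML above) could be parsed to work out how many clicks on the next arrow are needed to reach a target month. The function name next_clicks_needed is hypothetical, and the "Month YYYY" header format with English month names is an assumption taken from the HTML above.

```python
from datetime import datetime


def next_clicks_needed(header_text, target_year, target_month):
    """Number of 'next' clicks from the month shown in the header to the target.

    A negative result means the target lies before the displayed month.
    Assumes the header looks like "January 2019" (English month names).
    """
    shown = datetime.strptime(header_text.strip(), "%B %Y")
    return (target_year - shown.year) * 12 + (target_month - shown.month)


print(next_clicks_needed("January 2019", 2019, 3))   # 2
print(next_clicks_needed("January 2019", 2020, 1))   # 12
```

Each click could then be issued with the execute_script line from the code above, re-reading the header afterwards to confirm that the picker actually moved.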
I'm trying to add a third axis, or a second Y-axis, to a grouped bar chart. I'm not sure if it is possible.
Ideally, I want to:
1) add a line to this chart that represents the percentage of arrests made for the given year and crime type.
2) sort the bars within each group by the value of the "rank" column in the data.
Here is my code and the current visualization. Your valuable feedback is much appreciated. Thank you.
import altair as alt
base = alt.Chart().encode(
    x=alt.X('primary_type', scale=alt.Scale(rangeStep=12), title=None,
            sort=alt.EncodingSortField(op='sum', field='rank')),
    color=alt.Color('primary_type:N')
)
bar = base.mark_bar().encode(
    alt.Y('sum(Number_of_Incidents):Q', title='Total Number of Incidents')
)
line = base.mark_line(color='red').encode(
    alt.Y('percent_arrest',
          axis=alt.Axis(title=None))
)
combined = alt.layer(bar, line, data=q13a)
combined.facet(
    column=alt.Column('year')
).resolve_scale(
    x='independent'
).configure_view(
    stroke='transparent'
)
Sample Data -
<table class="table table-bordered table-hover table-condensed">
<thead><tr><th title="Field #1">year</th>
<th title="Field #2">primary_type</th>
<th title="Field #3">Number_of_Incidents</th>
<th title="Field #4">number_of_arrests</th>
<th title="Field #5">percent_arrest</th>
<th title="Field #6">rank</th>
</tr></thead>
<tbody><tr>
<td align="right">2018</td>
<td>THEFT</td>
<td align="right">57330</td>
<td align="right">5503</td>
<td align="right">9.6</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2018</td>
<td>BATTERY</td>
<td align="right">44667</td>
<td align="right">8886</td>
<td align="right">19.89</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2018</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">24889</td>
<td align="right">1498</td>
<td align="right">6.02</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2018</td>
<td>ASSAULT</td>
<td align="right">18229</td>
<td align="right">2931</td>
<td align="right">16.08</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2018</td>
<td>DECEPTIVE PRACTICE</td>
<td align="right">15879</td>
<td align="right">713</td>
<td align="right">4.49</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2017</td>
<td>THEFT</td>
<td align="right">64334</td>
<td align="right">6459</td>
<td align="right">10.04</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2017</td>
<td>BATTERY</td>
<td align="right">49213</td>
<td align="right">10060</td>
<td align="right">20.44</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2017</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">29040</td>
<td align="right">1747</td>
<td align="right">6.02</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2017</td>
<td>ASSAULT</td>
<td align="right">19298</td>
<td align="right">3455</td>
<td align="right">17.9</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2017</td>
<td>DECEPTIVE PRACTICE</td>
<td align="right">18816</td>
<td align="right">805</td>
<td align="right">4.28</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2016</td>
<td>THEFT</td>
<td align="right">61600</td>
<td align="right">6518</td>
<td align="right">10.58</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2016</td>
<td>BATTERY</td>
<td align="right">50292</td>
<td align="right">10328</td>
<td align="right">20.54</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2016</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">31018</td>
<td align="right">1668</td>
<td align="right">5.38</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2016</td>
<td>ASSAULT</td>
<td align="right">18738</td>
<td align="right">3490</td>
<td align="right">18.63</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2016</td>
<td>DECEPTIVE PRACTICE</td>
<td align="right">18733</td>
<td align="right">815</td>
<td align="right">4.35</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2015</td>
<td>THEFT</td>
<td align="right">57335</td>
<td align="right">6771</td>
<td align="right">11.81</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2015</td>
<td>BATTERY</td>
<td align="right">48918</td>
<td align="right">11558</td>
<td align="right">23.63</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2015</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">28675</td>
<td align="right">1835</td>
<td align="right">6.4</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2015</td>
<td>NARCOTICS</td>
<td align="right">23883</td>
<td align="right">23875</td>
<td align="right">99.97</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2015</td>
<td>OTHER OFFENSE</td>
<td align="right">17552</td>
<td align="right">4795</td>
<td align="right">27.32</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2014</td>
<td>THEFT</td>
<td align="right">61561</td>
<td align="right">7415</td>
<td align="right">12.04</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2014</td>
<td>BATTERY</td>
<td align="right">49447</td>
<td align="right">12517</td>
<td align="right">25.31</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2014</td>
<td>NARCOTICS</td>
<td align="right">29116</td>
<td align="right">29000</td>
<td align="right">99.6</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2014</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">27798</td>
<td align="right">2095</td>
<td align="right">7.54</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2014</td>
<td>OTHER OFFENSE</td>
<td align="right">16979</td>
<td align="right">4159</td>
<td align="right">24.49</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2013</td>
<td>THEFT</td>
<td align="right">71530</td>
<td align="right">7727</td>
<td align="right">10.8</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2013</td>
<td>BATTERY</td>
<td align="right">54002</td>
<td align="right">12927</td>
<td align="right">23.94</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2013</td>
<td>NARCOTICS</td>
<td align="right">34127</td>
<td align="right">33819</td>
<td align="right">99.1</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2013</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">30853</td>
<td align="right">2107</td>
<td align="right">6.83</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2013</td>
<td>OTHER OFFENSE</td>
<td align="right">17993</td>
<td align="right">3400</td>
<td align="right">18.9</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2012</td>
<td>THEFT</td>
<td align="right">75460</td>
<td align="right">8249</td>
<td align="right">10.93</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2012</td>
<td>BATTERY</td>
<td align="right">59135</td>
<td align="right">13061</td>
<td align="right">22.09</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2012</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">35854</td>
<td align="right">2462</td>
<td align="right">6.87</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2012</td>
<td>NARCOTICS</td>
<td align="right">35488</td>
<td align="right">35226</td>
<td align="right">99.26</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2012</td>
<td>BURGLARY</td>
<td align="right">22843</td>
<td align="right">1285</td>
<td align="right">5.63</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2011</td>
<td>THEFT</td>
<td align="right">75148</td>
<td align="right">8468</td>
<td align="right">11.27</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2011</td>
<td>BATTERY</td>
<td align="right">60458</td>
<td align="right">14139</td>
<td align="right">23.39</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2011</td>
<td>NARCOTICS</td>
<td align="right">38605</td>
<td align="right">38544</td>
<td align="right">99.84</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2011</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">37332</td>
<td align="right">2583</td>
<td align="right">6.92</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2011</td>
<td>BURGLARY</td>
<td align="right">26619</td>
<td align="right">1272</td>
<td align="right">4.78</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2010</td>
<td>THEFT</td>
<td align="right">76754</td>
<td align="right">7844</td>
<td align="right">10.22</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2010</td>
<td>BATTERY</td>
<td align="right">65403</td>
<td align="right">14277</td>
<td align="right">21.83</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2010</td>
<td>NARCOTICS</td>
<td align="right">43393</td>
<td align="right">43294</td>
<td align="right">99.77</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2010</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">40653</td>
<td align="right">2641</td>
<td align="right">6.5</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2010</td>
<td>BURGLARY</td>
<td align="right">26422</td>
<td align="right">1382</td>
<td align="right">5.23</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2009</td>
<td>THEFT</td>
<td align="right">80973</td>
<td align="right">9900</td>
<td align="right">12.23</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2009</td>
<td>BATTERY</td>
<td align="right">68462</td>
<td align="right">16325</td>
<td align="right">23.85</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2009</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">47724</td>
<td align="right">3270</td>
<td align="right">6.85</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2009</td>
<td>NARCOTICS</td>
<td align="right">43543</td>
<td align="right">43193</td>
<td align="right">99.2</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2009</td>
<td>BURGLARY</td>
<td align="right">26766</td>
<td align="right">1412</td>
<td align="right">5.28</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2008</td>
<td>THEFT</td>
<td align="right">88433</td>
<td align="right">9291</td>
<td align="right">10.51</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2008</td>
<td>BATTERY</td>
<td align="right">75922</td>
<td align="right">15520</td>
<td align="right">20.44</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2008</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">52841</td>
<td align="right">3403</td>
<td align="right">6.44</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2008</td>
<td>NARCOTICS</td>
<td align="right">46507</td>
<td align="right">45459</td>
<td align="right">97.75</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2008</td>
<td>OTHER OFFENSE</td>
<td align="right">26533</td>
<td align="right">3496</td>
<td align="right">13.18</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2007</td>
<td>THEFT</td>
<td align="right">85156</td>
<td align="right">9783</td>
<td align="right">11.49</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2007</td>
<td>BATTERY</td>
<td align="right">79591</td>
<td align="right">19386</td>
<td align="right">24.36</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2007</td>
<td>NARCOTICS</td>
<td align="right">54454</td>
<td align="right">53251</td>
<td align="right">97.79</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2007</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">53749</td>
<td align="right">3994</td>
<td align="right">7.43</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2007</td>
<td>OTHER OFFENSE</td>
<td align="right">26863</td>
<td align="right">4230</td>
<td align="right">15.75</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2006</td>
<td>THEFT</td>
<td align="right">86240</td>
<td align="right">10108</td>
<td align="right">11.72</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2006</td>
<td>BATTERY</td>
<td align="right">80666</td>
<td align="right">18892</td>
<td align="right">23.42</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2006</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">57124</td>
<td align="right">4135</td>
<td align="right">7.24</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2006</td>
<td>NARCOTICS</td>
<td align="right">55813</td>
<td align="right">55236</td>
<td align="right">98.97</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2006</td>
<td>OTHER OFFENSE</td>
<td align="right">27100</td>
<td align="right">4010</td>
<td align="right">14.8</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2005</td>
<td>THEFT</td>
<td align="right">85685</td>
<td align="right">11338</td>
<td align="right">13.23</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2005</td>
<td>BATTERY</td>
<td align="right">83965</td>
<td align="right">19994</td>
<td align="right">23.81</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2005</td>
<td>NARCOTICS</td>
<td align="right">56234</td>
<td align="right">56121</td>
<td align="right">99.8</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2005</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">54548</td>
<td align="right">4083</td>
<td align="right">7.49</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2005</td>
<td>OTHER OFFENSE</td>
<td align="right">28028</td>
<td align="right">4726</td>
<td align="right">16.86</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2004</td>
<td>THEFT</td>
<td align="right">95463</td>
<td align="right">12068</td>
<td align="right">12.64</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2004</td>
<td>BATTERY</td>
<td align="right">87136</td>
<td align="right">20718</td>
<td align="right">23.78</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2004</td>
<td>NARCOTICS</td>
<td align="right">57060</td>
<td align="right">57034</td>
<td align="right">99.95</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2004</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">53164</td>
<td align="right">3965</td>
<td align="right">7.46</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2004</td>
<td>OTHER OFFENSE</td>
<td align="right">29532</td>
<td align="right">5386</td>
<td align="right">18.24</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2003</td>
<td>THEFT</td>
<td align="right">98875</td>
<td align="right">12889</td>
<td align="right">13.04</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2003</td>
<td>BATTERY</td>
<td align="right">88378</td>
<td align="right">20459</td>
<td align="right">23.15</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2003</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">55011</td>
<td align="right">4060</td>
<td align="right">7.38</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2003</td>
<td>NARCOTICS</td>
<td align="right">54288</td>
<td align="right">54283</td>
<td align="right">99.99</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2003</td>
<td>OTHER OFFENSE</td>
<td align="right">31147</td>
<td align="right">5856</td>
<td align="right">18.8</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2002</td>
<td>THEFT</td>
<td align="right">98327</td>
<td align="right">13697</td>
<td align="right">13.93</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2002</td>
<td>BATTERY</td>
<td align="right">94153</td>
<td align="right">21331</td>
<td align="right">22.66</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2002</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">55940</td>
<td align="right">4403</td>
<td align="right">7.87</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2002</td>
<td>NARCOTICS</td>
<td align="right">51789</td>
<td align="right">51781</td>
<td align="right">99.98</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2002</td>
<td>OTHER OFFENSE</td>
<td align="right">32599</td>
<td align="right">5701</td>
<td align="right">17.49</td>
<td align="right">5</td>
</tr>
<tr>
<td align="right">2001</td>
<td>THEFT</td>
<td align="right">99264</td>
<td align="right">15543</td>
<td align="right">15.66</td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">2001</td>
<td>BATTERY</td>
<td align="right">93447</td>
<td align="right">20463</td>
<td align="right">21.9</td>
<td align="right">2</td>
</tr>
<tr>
<td align="right">2001</td>
<td>CRIMINAL DAMAGE</td>
<td align="right">55851</td>
<td align="right">4548</td>
<td align="right">8.14</td>
<td align="right">3</td>
</tr>
<tr>
<td align="right">2001</td>
<td>NARCOTICS</td>
<td align="right">50567</td>
<td align="right">50559</td>
<td align="right">99.98</td>
<td align="right">4</td>
</tr>
<tr>
<td align="right">2001</td>
<td>ASSAULT</td>
<td align="right">31384</td>
<td align="right">7150</td>
<td align="right">22.78</td>
<td align="right">5</td>
</tr>
</tbody></table>
The trouble is that, as far as I know, you cannot draw lines across charts. When creating a grouped bar chart, you have to facet across a column of your data. In effect, this produces several charts that are horizontally concatenated. So, for each chart you have only one point (for each color). If you want to have a line across years, you have to define your x axis to be years, and not facet it, and plot it separately. I would suggest vertical concatenation, to have the lines below the bars.
Note that I have taken the data from your previous question (How to create a nested Grouped Bar Chart using Altair? - Added sample data) because the way you provided it is not practical and I already had this one.
import altair as alt
import pandas as pd
from io import StringIO
# Comma-separated so that multi-word crime types such as "CRIMINAL DAMAGE" stay in one field
q13a = pd.read_csv(StringIO("""year,primary_type,Number_of_Incidents,number_of_arrests,percent_arrest,rank
2018,THEFT,57330,5503,9.6,1
2018,BATTERY,44667,8886,19.89,2
2018,CRIMINAL DAMAGE,24889,1498,6.02,3
2018,ASSAULT,18229,2931,16.08,4
2018,DECEPTIVE PRACTICE,15879,713,4.49,5
2017,THEFT,64334,6459,10.04,1
2017,BATTERY,49213,10060,20.44,2
2017,CRIMINAL DAMAGE,29040,1747,6.02,3
2017,ASSAULT,19298,3455,17.9,4
2017,DECEPTIVE PRACTICE,18816,805,4.28,5
2016,THEFT,61600,6518,10.58,1
2016,BATTERY,50292,10328,20.54,2
2016,CRIMINAL DAMAGE,31018,1668,5.38,3
2016,ASSAULT,18738,3490,18.63,4
2016,DECEPTIVE PRACTICE,18733,815,4.35,5
2015,THEFT,57335,6771,11.81,1
2015,BATTERY,48918,11558,23.63,2
2015,CRIMINAL DAMAGE,28675,1835,6.4,3
2015,NARCOTICS,23883,23875,99.97,4
2015,OTHER OFFENSE,17552,4795,27.32,5
2014,THEFT,61561,7415,12.04,1
2014,BATTERY,49447,12517,25.31,2
2014,NARCOTICS,29116,29000,99.6,3
2014,CRIMINAL DAMAGE,27798,2095,7.54,4
2014,OTHER OFFENSE,16979,4159,24.49,5
2013,THEFT,71530,7727,10.8,1
2013,BATTERY,54002,12927,23.94,2
2013,NARCOTICS,34127,33819,99.1,3
2013,CRIMINAL DAMAGE,30853,2107,6.83,4
2013,OTHER OFFENSE,17993,3400,18.9,5"""))
bar = alt.Chart(height=200, width=100).mark_bar().encode(
    x=alt.X('primary_type:N',
            axis=None,
            title=None,
            sort=alt.EncodingSortField(op='sum', field='rank')),
    y=alt.Y('sum(Number_of_Incidents):Q',
            title='Total Number of Incidents'),
    color=alt.Color('primary_type:N')
).facet(
    column=alt.Column('year:O')
).resolve_scale(
    x='independent'
)
line = alt.Chart().mark_line(point=True, color='red').encode(
    x=alt.X('year:O', axis=alt.Axis(labelAngle=0)),
    y=alt.Y('percent_arrest:Q'),
    color=alt.Color('primary_type:N', legend=None)
).properties(height=80, width=680)
alt.vconcat(bar, line, data=q13a).configure_view(stroke='transparent')
Created on 2018-11-29 by the reprexpy package
I am trying to scrape a webpage with the following URL:
https://www.bseindia.com/corporates/shpSecurities.aspx?scripcd=500209&qtrid=96.00
I want to scrape the table with the HTML below and write it to a CSV file. I have tried a few things but have not been able to extract the table. The <tr> tags of the data rows are not closed, so separating the data into rows is a problem.
Thanks for help
--J
<table border='0' width='900' align='center' cellspacing='1' cellpadding='4'>
<tr>
<td class='innertable_header1' rowspan='3'>Category of shareholder</td>
<td class='innertable_header1' rowspan='3'>Nos. of shareholders</td>
<td class='innertable_header1' rowspan='3'>No. of fully paid up equity shares held</td>
<td class='innertable_header1' rowspan='3'>No. of shares underlying Depository Receipts</td>
<td class='innertable_header1' rowspan='3'>Total nos. shares held</td>
<td class='innertable_header1' rowspan='3'>Shareholding as a % of total no. of shares (calculated as per SCRR, 1957)As a % of (A+B+C2)</td>
<td class='innertable_header1' rowspan='3'> Number of equity shares held in dematerialized form</td>
</tr>
<tr></tr>
<tr></tr>
<tr>
<td class='TTRow_left'>(A) Promoter & Promoter Group</td>
<td class='TTRow_right'>19</td>
<td class='TTRow_right'>28,17,02,889</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>28,17,02,889</td>
<td class='TTRow_right'>12.90</td>
<td class='TTRow_right'>28,17,02,889</td>
<tr>
<td class='TTRow_left'>(B) Public</td>
<td class='TTRow_right'>9,16,058</td>
<td class='TTRow_right'>1,87,81,45,362</td>
<td class='TTRow_right'>1,32,95,642</td>
<td class='TTRow_right'>1,89,14,41,004</td>
<td class='TTRow_right'>86.61</td>
<td class='TTRow_right'>1,88,74,40,959</td>
<tr>
<td class='TTRow_left'>(C1) Shares underlying DRs</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'></td>
<td class='TTRow_right'></td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>0.00</td>
<td class='TTRow_right'></td>
<tr>
<td class='TTRow_left'>(C2) Shares held by Employee Trust</td>
<td class='TTRow_right'>1</td>
<td class='TTRow_right'>1,08,05,896</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>1,08,05,896</td>
<td class='TTRow_right'>0.49</td>
<td class='TTRow_right'>1,08,05,896</td>
<tr>
<td class='TTRow_left'>(C) Non Promoter-Non Public</td>
<td class='TTRow_right'>1</td>
<td class='TTRow_right'>1,08,05,896</td>
<td class='TTRow_right'></td>
<td class='TTRow_right'>1,08,05,896</td>
<td class='TTRow_right'>0.49</td>
<td class='TTRow_right'>1,08,05,896</td>
<tr>
<td class='TTRow_left'>Grand Total</td>
<td class='TTRow_right'>9,16,078</td>
<td class='TTRow_right'>2,17,06,54,147</td>
<td class='TTRow_right'>1,32,95,642</td>
<td class='TTRow_right'>2,18,39,49,789</td>
<td class='TTRow_right'>100.00</td>
<td class='TTRow_right'>2,17,99,49,744</td>
</tr>
</table>
You can try this:
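# Python 2 code (in Python 3, urlopen lives in urllib.request)
from bs4 import BeautifulSoup as soup
import urllib
import re
s = soup(str(urllib.urlopen('https://www.bseindia.com/corporates/shpSecurities.aspx?scripcd=500209&qtrid=96.00').read()), 'lxml')
# Collect the text of every data cell, stripping newlines and runs of whitespace;
# filter(None, ...) then drops the empty cells
results = filter(None, [re.sub(r'[\n\r]+|\s{2,}', '', i.text) for i in s.find_all('td', {'class':re.compile('TTRow_right|TTRow_left')})])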
Output:
[u'(A) Promoter & Promoter Group', u'19', u'28,17,02,889', u'28,17,02,889', u'12.90', u'28,17,02,889', u'(B) Public', u'9,16,058', u'1,87,81,45,362', u'1,32,95,642', u'1,89,14,41,004', u'86.61', u'1,88,74,40,959', u'(C1) Shares underlying DRs', u'0.00', u'(C2) Shares held by Employee Trust', u'1', u'1,08,05,896', u'1,08,05,896', u'0.49', u'1,08,05,896', u'(C) Non Promoter-Non Public', u'1', u'1,08,05,896', u'1,08,05,896', u'0.49', u'1,08,05,896', u'Grand Total', u'9,16,078', u'2,17,06,54,147', u'1,32,95,642', u'2,18,39,49,789', u'100.00', u'2,17,99,49,744']
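Note that the flat list above has lost the empty cells, so it can no longer be split back into rows of equal length. A minimal sketch of one way around this, assuming every opening <tr> marks a new row even when the previous one was never closed: the class RowCollector and the function table_to_rows are hypothetical names, and the sketch uses only the standard library's html.parser rather than BeautifulSoup, with a shortened sample instead of the live page.

```python
import csv
import io
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect <td> text, starting a new row at every opening <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per <tr>
        self.in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # an opening <tr> always starts a row
            self.in_cell = False
        elif tag == "td" and self.rows:
            self.rows[-1].append("")  # empty cells are kept as ""
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


def table_to_rows(html):
    """Return the table as a list of rows, dropping rows with no cells."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [[" ".join(cell.split()) for cell in row]
            for row in parser.rows if row]


# Shortened sample with the same unclosed <tr> pattern as the question.
sample = """<table>
<tr><td class='TTRow_left'>(A) Promoter & Promoter Group</td>
<td class='TTRow_right'>19</td><td class='TTRow_right'></td>
<tr><td class='TTRow_left'>(B) Public</td>
<td class='TTRow_right'>9,16,058</td><td class='TTRow_right'>1,32,95,642</td>
</tr></table>"""

rows = table_to_rows(sample)
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)  # csv quotes cells that contain commas
csv_text = buffer.getvalue()
```

To write a real file instead of a string, the same writerows call can be pointed at a file opened with newline=''.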
Hello all, I am hoping to get some help with taking the tables in my HTML file and importing them into a CSV file. I am very new to web scraping, so forgive me if my code is completely wrong. The HTML file holds three separate tables I am trying to extract: estimate, sampling error, and number of non-zero plots in estimate.
My code is shown below:
#import necessary libraries
import urllib2
import pandas as pd
#specify URL
table = "file:///C:/Users/TMccw/Anaconda2/FiaAPI/outFArea18.html"
#Query the website & return the html to the variable 'page'
page = urllib2.urlopen(table)
#import the bs4 functions to parse the data returned from the website
from bs4 import BeautifulSoup
#Parse the html in the 'page' variable & store it in bs4 format
soup = BeautifulSoup(page, 'html.parser')
#Print out the html code with the function prettify
print soup.prettify()
#Find the tables & check type
table2 = soup.find_all('table')
print(table2)
print type(table2)
#Create new table as a dataframe
new_table = pd.DataFrame(columns=range(0,4))
#Extract the info from the HTML code
soup.find('table').find_all('td'),{'align':'right'}
#Remove the tags and extract table info into CSV
???
Here is the HTML for the first table "Estimate":
` Estimate:
</b>
</caption>
<tr>
<td>
</td>
<td align="center" colspan="5">
<b>
Ownership group
</b>
</td>
</tr>
<tr>
<th>
<b>
Forest type group
</b>
</th>
<td>
<b>
Total
</b>
</td>
<td>
<b>
National Forest
</b>
</td>
<td>
<b>
Other federal
</b>
</td>
<td>
<b>
State and local
</b>
</td>
<td>
<b>
Private
</b>
</td>
</tr>
<tr>
<td nowrap="">
<b>
Total
</b>
</td>
<td align="right">
4,875,993
</td>
<td align="right">
195,438
</td>
<td align="right">
169,500
</td>
<td align="right">
392,030
</td>
<td align="right">
4,119,025
</td>
</tr>
<tr>
<td nowrap="">
<b>
White / red / jack pine group
</b>
</td>
<td align="right">
40,492
</td>
<td align="right">
3,426
</td>
<td align="right">
-
</td>
<td align="right">
10,850
</td>
<td align="right">
26,217
</td>
</tr>
<tr>
<td nowrap="">
<b>
Loblolly / shortleaf pine group
</b>
</td>
<td align="right">
38,267
</td>
<td align="right">
11,262
</td>
<td align="right">
997
</td>
<td align="right">
4,015
</td>
<td align="right">
21,993
</td>
</tr>
<tr>
<td nowrap="">
<b>
Other eastern softwoods group
</b>
</td>
<td align="right">
25,181
</td>
<td align="right">
-
</td>
<td align="right">
-
</td>
<td align="right">
-
</td>
<td align="right">
25,181
</td>
</tr>
<tr>
<td nowrap="">
<b>
Exotic softwoods group
</b>
</td>
<td align="right">
5,868
</td>
<td align="right">
-
</td>
<td align="right">
-
</td>
<td align="right">
662
</td>
<td align="right">
5,206
</td>
</tr>
<tr>
<td nowrap="">
<b>
Oak / pine group
</b>
</td>
<td align="right">
144,238
</td>
<td align="right">
9,592
</td>
<td align="right">
-
</td>
<td align="right">
21,475
</td>
<td align="right">
113,171
</td>
</tr>
<tr>
<td nowrap="">
<b>
Oak / hickory group
</b>
</td>
<td align="right">
3,480,272
</td>
<td align="right">
152,598
</td>
<td align="right">
123,900
</td>
<td align="right">
285,305
</td>
<td align="right">
2,918,470
</td>
</tr>
<tr>
<td nowrap="">
<b>
Oak / gum / cypress group
</b>
</td>
<td align="right">
76,302
</td>
<td align="right">
-
</td>
<td align="right">
12,209
</td>
<td align="right">
9,311
</td>
<td align="right">
54,782
</td>
</tr>
<tr>
<td nowrap="">
<b>
Elm / ash / cottonwood group
</b>
</td>
<td align="right">
652,001
</td>
<td align="right">
7,105
</td>
<td align="right">
25,431
</td>
<td align="right">
46,096
</td>
<td align="right">
573,369
</td>
</tr>
<tr>
<td nowrap="">
<b>
Maple / beech / birch group
</b>
</td>
<td align="right">
346,718
</td>
<td align="right">
10,871
</td>
<td align="right">
818
</td>
<td align="right">
12,748
</td>
<td align="right">
322,281
</td>
</tr>
<tr>
<td nowrap="">
<b>
Other hardwoods group
</b>
</td>
<td align="right">
21,238
</td>
<td align="right">
585
</td>
<td align="right">
-
</td>
<td align="right">
-
</td>
<td align="right">
20,653
</td>
</tr>
<tr>
<td nowrap="">
<b>
Exotic hardwoods group
</b>
</td>
<td align="right">
2,441
</td>
<td align="right">
-
</td>
<td align="right">
-
</td>
<td align="right">
-
</td>
<td align="right">
2,441
</td>
</tr>
<tr>
<td nowrap="">
<b>
Nonstocked
</b>
</td>
<td align="right">
42,975
</td>
<td align="right">
-
</td>
<td align="right">
6,144
</td>
<td align="right">
1,570
</td>
<td align="right">
35,261
</td>
</tr>
</table>
<br/>
<table border="4" cellpadding="4" cellspacing="4">
<caption>
<b>`
I made four tables almost identical to yours and put them into a fairly respectable page of HTML. Then I ran this code.
>>> import bs4
>>> import pandas as pd
>>> soup = bs4.BeautifulSoup(open('temp.htm').read(), 'html.parser')
>>> tables = soup.findAll('table')
>>> for t, table in enumerate(tables):
...     df = pd.read_html(str(table), skiprows=2)
...     df[0].to_csv('table%s.csv' % t)
The results were four files like this, named table0.csv through table3.csv.
,0,1,2,3,4,5
0,Total,4875993,195438,169500,392030,4119025
1,White / red / jack pine group,40492,3426,-,10850,26217
2,Loblolly / shortleaf pine group,38267,11262,997,4015,21993
3,Other eastern softwoods group,25181,-,-,-,25181
4,Exotic softwoods group,5868,-,-,662,5206
5,Oak / pine group,144238,9592,-,21475,113171
6,Oak / hickory group,3480272,152598,123900,285305,2918470
7,Oak / gum / cypress group,76302,-,12209,9311,54782
8,Elm / ash / cottonwood group,652001,7105,25431,46096,573369
9,Maple / beech / birch group,346718,10871,818,12748,322281
10,Other hardwoods group,21238,585,-,-,20653
11,Exotic hardwoods group,2441,-,-,-,2441
12,Nonstocked,42975,-,6144,1570,35261
Perhaps the main thing I should mention is that I skipped the same number of rows in each table that BeautifulSoup delivered. If the number of header lines in the tables varies then you will have to do something more clever or just discard lines in the output files and omit the skiprows parameter.
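If the number of header lines does vary, one possible "more clever" approach is to detect where the data starts instead of hard-coding skiprows. The sketch below assumes the rows have already been extracted as lists of strings, and that a data row is one whose cells after the label are all numbers or the "-" placeholder; is_data_row and drop_header_rows are hypothetical helper names, not part of pandas or BeautifulSoup.

```python
import re

# A number with optional thousands separators and decimals, e.g. "4,875,993".
NUMERIC = re.compile(r"^-?\d[\d,]*(\.\d+)?$")


def is_data_row(row):
    """True when every cell after the label is a number or the '-' placeholder."""
    values = [cell.strip() for cell in row[1:]]
    return bool(values) and all(v == "-" or NUMERIC.match(v) for v in values)


def drop_header_rows(rows):
    """Discard leading rows until the first row that looks like data."""
    for index, row in enumerate(rows):
        if is_data_row(row):
            return rows[index:]
    return []


# Rows shaped like the "Estimate" table in the question.
raw_rows = [
    ["", "Ownership group"],
    ["Forest type group", "Total", "National Forest", "Other federal",
     "State and local", "Private"],
    ["Total", "4,875,993", "195,438", "169,500", "392,030", "4,119,025"],
    ["White / red / jack pine group", "40,492", "3,426", "-", "10,850", "26,217"],
]

data_rows = drop_header_rows(raw_rows)
```

This heuristic would misfire on a table whose header cells are themselves numeric (years, for instance), so it is worth checking against each table before relying on it.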
Unsure as to what the exact question is here but right off the bat I can see an error that will throw you off a bit.
new_table = pd.DataFrame(columns=range(0-4))
Needs to be
new_table = pd.DataFrame(columns=range(0,4))
The expression 0-4 is a subtraction, so range(0-4) is actually range(-4), which is equivalent to range(0,-4) and is empty, whereas you want range(0,4). You can pass either range(4) or range(0,4) as the parameter.
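The difference is easy to check in plain Python, without pandas. The variable names below are only for illustration.

```python
# range() with one argument counts up from 0, so a negative stop yields nothing.
one_arg = list(range(0 - 4))   # same as range(-4), i.e. range(0, -4)
two_args = list(range(0, 4))
short_form = list(range(4))

print(one_arg)     # []
print(two_args)    # [0, 1, 2, 3]
print(short_form)  # [0, 1, 2, 3]
```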