How to convert text article with keywords to pandas data frame

I have about 5,000 text files similar to the one below, and I want to extract each article's text into one df column and its keywords, as a list, into another df column. I need this to build more training data.
In the sample below, the article I want to extract is everything from 'Addis Abeba' to 'private bank', and the keywords are all the terms after 'SUBJECT', without the percentages in brackets.
Sample of the dataset:
Addis Fortune
February 2011
Declaration? AU Action Needed in Favour of Democracy [opinion]
LENGTH: 692 words
Addis Abeba has been hosting delegates and heads of state for the AU Summit. It
is encouraging to see leaders of Africa discussing issues of continental
importance that accelerate the process of integration and thereby put Africa in
a better bargaining position in its relations with the outside world.
Indeed, "United We Stand, Divided We Fall."
It is time that the AU took a bold step to ensure that the leaders of the
continent win the hearts and minds of their citizens. It should ensure the
existence of democratic governments, which, at a minimum, guarantee popular
participation based on an acceptance of political equality among all citizens,
respect for civil liberties, and meaningful checks and balances on the power of
the executive.
This is also indispensable to the realisation of the age-old dream of the
formation of the United States of Africa. Donor countries and organisations also
have moral obligations to extend much needed support in this aspect.
Dawit Haile is a loan officer at a private bank.
SUBJECT: HEADS OF STATE & GOVERNMENT (90%); ELECTIONS (90%); INTERNATIONAL
ASSISTANCE (89%); INTERNATIONAL RELATIONS (73%); GROSS DOMESTIC PRODUCT (70%);
ECONOMIC NEWS (70%); EMBEZZLEMENT (68%); ELECTION FRAUD (68%) Ethiopia;
International Organizations and Africa
GEOGRAPHIC: AFRICA (96%); EGYPT (93%); UNITED STATES (93%); CHINA (92%);
ETHIOPIA (79%); TUNISIA (79%); ISRAEL (79%) Africa
LOAD-DATE: February 8, 2011
LANGUAGE: ENGLISH
PUBLICATION-TYPE: Newspaper
Copyright 2011 AllAfrica Global Media.
All Rights Reserved
2 of 1352 DOCUMENTS
Addis Fortune
February 2011
Gebrekidan Beyene's Prosecutors Repeat Request for 25 Years
BYLINE: Eden Sahle
LENGTH: 815 words
During the appeals hearing last week of Gebrekidan Beyene, a.k.a. Morocco,
general manager and a shareholder of a private limited company by the same name,
prosecutors of the Ethiopian Revenues and Customs Authority (ERCA) requested
almost the same sentence they originally had, in August 2010: a maximum jail
term and confiscation of properties.
However, the lower court's decision to mitigate the sentence was correct and the
Appeals Bench should release Gebrekidan, either as a free man or on parole, the
defence argued. His good behaviour in prison and the investment he had made in
his country should be counted as mitigating circumstances, the lawyer claimed,
also counting the defendant's poor health in mitigation. The case was adjourned
for a verdict until May 2, 2011.
An alleged similar offence involving money laundering and loan sharking against
Ayalew Tesema, board chairman and major shareholder of Ayat Real Estate, is
underway at the Federal High Court.
SUBJECT: LITIGATION (91%); JUSTICE DEPARTMENTS (90%); BANKING & FINANCE (90%);
EXCISE & CUSTOMS (90%); LIMITED LIABILITY COMPANIES (90%); SENTENCING (90%);
APPEALS (89%); LAW COURTS & TRIBUNALS (89%); JAIL SENTENCING (89%); LAWYERS
(89%); VERDICTS (89%); SUPREME COURTS (89%); FINES & PENALTIES (89%);
SETTLEMENTS & DECISIONS (78%); CRIMINAL CONVICTIONS (78%); DECISIONS & RULINGS
(78%); PRISONS (77%); SUITS & CLAIMS (77%); VALUE ADDED TAX (77%); JUDGES (73%);
INCOME TAX (72%); MONEY LAUNDERING (69%); COUNTERFEITING (68%); INTEREST RATES
(55%); ECONOMIC NEWS (55%) Ethiopia; Legal and Judicial Affairs
GEOGRAPHIC: MOROCCO (90%)
LOAD-DATE: March 1, 2011
LANGUAGE: ENGLISH
PUBLICATION-TYPE: Newspaper
My expected result would be:
df
content keywords
1 'string article 1' [HEADS OF STATE & GOVERNMENT, ELECTIONS, ...]
2 'string article 2' [LITIGATION, JUSTICE DEPARTMENTS, ...]
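One possible way to get from the sample to the expected result is a regex-based parser. The sketch below assumes that every article body sits between a line starting with "LENGTH:" and a line starting with "SUBJECT:", and that the keyword block ends at the next all-caps field label such as "GEOGRAPHIC:". The names RECORD, KEYWORD and parse_articles are hypothetical helpers, not part of pandas.

```python
import re

import pandas as pd

# Assumption: the body starts after the "LENGTH:" line and ends at "SUBJECT:";
# the keyword block runs until the next all-caps field label (e.g. "GEOGRAPHIC:").
RECORD = re.compile(
    r"^LENGTH:[^\n]*\n(?P<content>.*?)\n^SUBJECT:(?P<subject>.*?)(?=^[A-Z-]+:|\Z)",
    re.DOTALL | re.MULTILINE,
)
# A keyword is the text immediately before a "(NN%)" marker.
KEYWORD = re.compile(r"([^;()]+?)\s*\(\d+%\)")


def parse_articles(raw):
    """Return one row per article with its text and its list of keywords."""
    rows = []
    for match in RECORD.finditer(raw):
        # Collapse the hard line breaks so wrapped keywords are rejoined.
        content = " ".join(match.group("content").split())
        subject = " ".join(match.group("subject").split())
        keywords = [k.strip() for k in KEYWORD.findall(subject)]
        rows.append({"content": content, "keywords": keywords})
    return pd.DataFrame(rows, columns=["content", "keywords"])
```

Calling parse_articles on the text of one file returns a frame with the content and keywords columns; the frames from all files could then be combined with pd.concat. Terms without a percentage, such as "Ethiopia", are dropped by this pattern, and an article without a SUBJECT line would be merged into the next one, so it is worth checking the row count against the number of documents.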

Trying to rearrange multiple columns in a dataframe based on ranking row values

I'm working on matching company names, and I have a dataframe that returns output in the format below.
The table has an original name and, for each original name, there can be N matches. For each match there are 3 columns: match_name_0, score_0, match_index_0, and so on up to match_name_N.
I'm trying to figure out a way to return a new dataframe that sorts the columns after original_name by the highest match scores. Essentially, if score_2 was the highest, followed by score_0 and then score_1, the columns would be
original_name, match_name_2, score_2, match_index_2, match_name_0, score_0, match_index_0, match_name_1, score_1, match_index_1
In the event of a tie, the leftmost match should be ranked higher. I should note that sometimes they are already in the correct order, but 30-40% of the time they are not.
I've been staring at my screen for 2 hours and am totally stumped, so any help is greatly appreciated.
index | original_name | match_name_0 | score_0 | match_index_0 | match_name_1 | score_1 | match_index_1 | match_name_2 | score_2 | match_index_2 | match_name_3 | score_3 | match_index_3 | match_name_4 | score_4 | match_index_4
0 | aberdeen asset management plc | aberdeen asset management sa | 100 | 2114 | aberdeen asset management plc esop | 100 | 2128 | aberdeen asset management inc | 100 | 2123 | aberdeen asset management spain | 71.18779356 | 2132 | aberdeen asset management ireland | 69.50514818 | 2125
2 | agi partners llc | agi partners llc | 100 | 5274 | agi partners llc | 100 | 5273 | agr partners llc | 57.51100704 | 5378 | aci partners llc | 53.45090217 | 3097 | avi partners llc | 53.45090217 | 17630
3 | alberta investment management corporation | alberta investment management corporation | 100 | 6754 | alberta investment management corporation pension arm | 100 | 6755 | anchor investment management corporation | 17.50748486 | 10682 | cbc investment management corporation | 11.79760839 | 36951 | harvest investment management corporation | 31.70316571 | 85547

I am assuming you want to order the matches first by score and then by match_number, separately for each original_name.
Wide datasets are usually difficult to deal with, and this case is no exception. I suggest reshaping to a long dataset, where you can easily impose the required ordering with
sort_values(by=['original_name','score','match_number'], ascending=[True,False,True])
Finally, you can reshape it back to a wide dataset.
import pandas as pd
from io import StringIO
# sample data
df = """
original_name,match_name_0,score_0,match_index_0,match_name_1,score_1,match_index_1,match_name_2,score_2,match_index_2,match_name_3,score_3,match_index_3,match_name_4,score_4,match_index_4
aberdeen asset management plc,aberdeen asset management sa,100,2114,aberdeen asset management plc esop,100,2128,aberdeen asset management inc,100,2123,aberdeen asset management spain,71.18779356,2132,aberdeen asset management ireland,69.50514818,2125
agi partners llc,agi partners llc,100,5274,agi partners llc,100,5273,agr partners llc,57.51100704,5378,aci partners llc,53.45090217,3097,avi partners llc,53.45090217,17630
alberta investment management corporation,alberta investment management corporation,100,6754,alberta investment management corporation pension arm,100,6755,anchor investment management corporation,17.50748486,10682,cbc investment management corporation,11.79760839,36951,harvest investment management corporation,31.70316571,85547
"""
df= pd.read_csv(StringIO(df.strip()), sep=',', engine='python')
# wide to long
result = pd.wide_to_long(df, ['match_name','score','match_index'], i='original_name', j='match_number', sep='_').reset_index()
# sort matches as per requirement
result = result.sort_values(by=['original_name','score','match_number'], ascending=[True,False,True])
# overwrite ranking imposed by previous sort
# this ensures that the order is maintained once it is
# reshaped back to a wide dataset
result['match_number'] = result.groupby('original_name').cumcount()
# reshape long to wide
result = result.set_index(['original_name','match_number']).unstack()
# tidy up to match expected result
result = result.swaplevel(axis=1).sort_index(axis=1)
result = result.reindex(['match_name','score','match_index'], axis=1, level=1)
result.columns = [f'{col[1]}_{col[0]}' for col in result.columns]
As a result, for example, the previous match 4 of alberta investment management corporation is now match 2 (based on score). The order of matches 3 and 4 for agi partners llc remains the same because they have the same score.
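For comparison, the same ranking can be done row by row without reshaping. This is a different technique from the wide_to_long approach above: a stable NumPy argsort on the negated scores, so that ties keep their left-to-right order. The function name reorder_matches and its n_matches parameter are hypothetical, and the sketch assumes the column naming shown in the question.

```python
import numpy as np
import pandas as pd


def reorder_matches(df, n_matches, fields=("match_name", "score", "match_index")):
    """Reorder each row's match columns by descending score (ties stay left to right)."""
    score_cols = [f"score_{k}" for k in range(n_matches)]
    scores = df[score_cols].to_numpy(dtype=float)
    # Stable sort on negated scores: highest first, equal scores keep original order.
    order = np.argsort(-scores, axis=1, kind="stable")
    rows = np.arange(len(df))[:, None]
    # Apply the same per-row permutation to every group of columns.
    reordered = {
        field: df[[f"{field}_{k}" for k in range(n_matches)]].to_numpy()[rows, order]
        for field in fields
    }
    out = df[["original_name"]].copy()
    for rank in range(n_matches):
        for field in fields:
            out[f"{field}_{rank}"] = reordered[field][:, rank]
    return out
```

With the sample data this would be called as reorder_matches(df, 5). It avoids the round trip through a long frame, at the cost of relying on the column names following the name_k pattern exactly.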

Python beautifulsoup extract all urls from a website search results. New Python beginner

I am attempting to extract all the urls from the search results of this website. It has 754 search results across 26 pages. https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/search/searchterm/Integrated%20Data%20Infrastructure%20(IDI)/field/projeb/mode/exact/conn/and
This is the code I wrote, but it didn't return anything. Sorry, I am new to Python; can anyone give me a clue how to get there? Many thanks.
import requests
from bs4 import BeautifulSoup
url = 'https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/search/searchterm/Integrated%20Data%20Infrastructure%20(IDI)/field/projeb/mode/exact/conn/and'
reqs = requests.get(url,verify=False)
soup = BeautifulSoup(reqs.text, 'html.parser')
urls = []
for link in soup.find_all('a'):
    print(link.get('href'))

There are 754 books; I show an example with 35.
To get all the books, change 35 to 754 at the end of the URL.
import pandas as pd
import requests
url = 'https://cdm20045.contentdm.oclc.org/digital/api/search/collection/p20045coll17/searchterm/Integrated%20Data%20Infrastructure%20(IDI)/field/projeb/mode/exact/conn/and/maxRecords/35'
response = requests.get(url)
books = []
for book in response.json()['items']:
    books.append({
        'link': ('https://cdm20045.contentdm.oclc.org' + book['itemLink']).replace('singleitem', 'digital'),
        'title': book['metadataFields'][0]['value'],
        'subjec': book['metadataFields'][1]['value'],
        'date': book['metadataFields'][2]['value'],
        'publis': book['metadataFields'][3]['value']
    })
df = pd.DataFrame(books)
print(df.to_string())
OUTPUT:
link title subjec date publis
0 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1181 The Future of Work in New Zealand - An Empirical Examiniation (MAA2019-95) Business Practices; 2021 Auckland University of Technology;
1 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/993 In Fairness to Our Schools: Better measures for better outcomes\n Education 2019 The New Zealand Initiative
2 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1029 Dementia case-finding and prevalence estimation using routinely collected health data in the Integrated Data Infrastructure (IDI) (MAA2020-12) Health; 2020 University of Auckland
3 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1218 Investigating linkage bias in the IDI using education and census data (MAA2020-69) Meta-research; Education; 2020-12 University of Auckland;
4 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/757 Explaining Ethnic Differences in Student Success at University in New Zealand [MAA2018-09] Education and training 2018 Auckland University of Technology
5 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1357 Health and social harms from alcohol: what does NZ's data tell us? Health; 2022-03 University of Otago;
6 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1268 'What about the Menz?' Low employer attachment and ineligibility for partner parental leave Income and Work; People and Communities; 2021-08 Auckland Council; Social Wellbeing Agency;
7 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/936 Migrant Networks, Brain Waste and its Economic Impacts: Evidence from Immigrants in New Zealand [MAA2019-31] Employment; People and Communities 2019 University of Auckland
8 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1306 Pacific uptake of temporary work visas Employment; Business financials; People and Communities; 2020-05 NZIER; Ministry of Business, Innovation & Employment, MBIE; Ministry of Foreign Affairs and Trade;
9 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/123 Firm Productivity Growth and Skill (MAA2012-16) Income and work; Business practices 2015 Motu Economic and Public Policy Research
10 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/380 The Productivity Costs of Four Health Conditions in New Zealand [MAA2016-59] Health; Income and work 2016 University of Otago
11 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/732 Differential labour supply effects of pension eligibility to beneficiaries and non-beneficiaries [MAA2018-50] Income and Work; Benefits and Social Services; 2018 Auckland University of Technology;
12 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/189 Intergenerational Analyses Using the IDI People and communities 2017 COMPASS (The University of Auckland)
13 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/198 School to work: What matters? Education and Employment of Young People Born in 1991 Education and training 2016 Ministry of Education
14 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/975 Equity Index of socioeconomic disadvantage in education [MAA2019-85] Education; Benefits and Social Services; Income and Work 2019-12 Ministry of Education
15 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1225 Linkage bias in the IDI (MAA2020-58) Meta-research; 2020-11 University of Otago;
16 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/431 The relationship between exposure to the natural environment and children's health at different life stages [MAA2017-11] Health 2017 Massey University
17 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1025 Who are the 1M and 1X? Police engagement with citizens in mental distress (MAA2020-08) Justice; Health; 2020 University of Auckland
18 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1073 Who received the Wage Subsidy and Wage Subsidy Extension? (MAA2018-48) Benefits and Social Services; 2020 Ministry of Social Development, MSD;
19 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/929 Factsheet IDI first stage - Exploring work-related claims difference by Maori and non-Maori Business Practices; Health; Income and Work; People and Communities 2019-02 Worksafe New Zealand
20 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1138 Measuring Commute Patterns Over Time: Using administrative data to identify where employees live and work (MAA2018-55) Transport; Employment; 2020 Motu Economic and Public Policy Research; New Zealand Transport Agency;
21 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/835 Evaluating the Family Start Programme [MAA2018-87] People and Communities; Health; Education; Employment; Benefits and Social Services; Justice 2018-12 Ministry for Vulnerable Children Oranga Tamariki
22 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/980 Access to primary health care services for people in Canterbury with poor access: improving our understanding of people who are unenrolled or tenuously enrolled with a general practice team [MAA2019-51] Health 2019-11 Pegasus Health (Charitable) Limited
23 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/996 Intergenerational Analyses Using the IDI:\nAn update\n [MAA2016-53] Population 2020-03 COMPASS, University of Auckland;
24 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1351 Individualised Funding in Aotearoa Benefits and Social Services; 2020 Nicholson Consulting;
25 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/39 Comparing the Household Economic Survey to administrative records: an analysis of income and benefit receipt (MAA2015-27) Benefits and Social Services 2017 New Zealand Treasury
26 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1320 International students and graduates and their impact on the NZ housing market (MAA2017-31) Housing; Education and Training; 2021 Universities New Zealand;
27 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1002 The expression, experience and transcendence of low-skill in Aotearoa New Zealand (MAA2019-91) Education and Training; 2019 Auckland University of Technology
28 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/394 Comparison of NZDep2013 with Index of Multiple Deprivation (IMD2013) [MAA2017-70] Health; Income and Work; Housing; Justice 2017 University of Otago
29 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1005 Accessibility of Disability Support Services funding in NZ (MAA2019-102) Health; 2019 Nicholson Consulting;
30 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1135 Understanding Parental education and health of Pacific families: Background and study protocol: Parental Education and Pacific Health, study protocol (MAA2018-47) Education; People and Communities; Children; 2020 University of Otago;
31 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/47 Evaluation of the Impact of the Youth Service: Youth Payment and Young Parent Payment (MAA2013-16) Benefits and Social Services 2017 New Zealand Treasury
32 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/63 Using IDI data to estimate fiscal impacts of better social sector performance (MAA2013-16) Benefits and Social Services 2016 New Zealand Treasury
33 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/1040 Māori Student Transitions (MAA2020-35) Education and Training; Benefits and Social Services; Income and Work; Employment; 2020 Social Wellbeing Agency;
34 https://cdm20045.contentdm.oclc.org/digital/collection/p20045coll17/id/64 Financial wellbeing of older workers following injury: research utilising Statistics New Zealand’s Integrated Data Infrastructure Health; Income and work 2017 University of Otago

@bmcculley has already stated that the required data is loaded dynamically via JS, and BeautifulSoup cannot render JS. So you have two options: use Selenium (more complex) or use the API. Since the site uses an API, you can easily grab the required data from it with a GET request in JSON format, which is the more robust way.
I've handled the pagination with a for loop and the range function.
Example:
import requests
import pandas as pd
api_url = 'https://cdm20045.contentdm.oclc.org/digital/api/search/collection/p20045coll17/searchterm/Integrated%20Data%20Infrastructure%20(IDI)/field/projeb/mode/exact/conn/and/page/{page}/maxRecords/30'
data = []
for page in range(1, 27):
    r = requests.get(api_url.format(page=page))
    for link in r.json()['items']:
        url = 'https://cdm20045.contentdm.oclc.org' + link['itemLink'].replace('singleitem', 'digital')
        data.append({
            'URL': url
        })
df = pd.DataFrame(data)
print(df)
Output:
URL
0 https://cdm20045.contentdm.oclc.org/digital/co...
1 https://cdm20045.contentdm.oclc.org/digital/co...
2 https://cdm20045.contentdm.oclc.org/digital/co...
3 https://cdm20045.contentdm.oclc.org/digital/co...
4 https://cdm20045.contentdm.oclc.org/digital/co...
.. ...
749 https://cdm20045.contentdm.oclc.org/digital/co...
750 https://cdm20045.contentdm.oclc.org/digital/co...
751 https://cdm20045.contentdm.oclc.org/digital/co...
752 https://cdm20045.contentdm.oclc.org/digital/co...
753 https://cdm20045.contentdm.oclc.org/digital/co...
[754 rows x 1 columns]
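The hard-coded range(1, 27) follows from 754 results at 30 records per page. A minimal sketch of that arithmetic is below; it reuses the API URL pattern from the answer above and makes no network request. The names API_URL and page_urls are hypothetical, and it is an assumption that the endpoint accepts other page sizes in the same way.

```python
import math

# URL pattern taken from the answer above, with the page size made a parameter.
API_URL = (
    'https://cdm20045.contentdm.oclc.org/digital/api/search/collection/p20045coll17'
    '/searchterm/Integrated%20Data%20Infrastructure%20(IDI)/field/projeb'
    '/mode/exact/conn/and/page/{page}/maxRecords/{size}'
)


def page_urls(total_results, page_size):
    """Build one API URL per page; the last page may be only partly filled."""
    n_pages = math.ceil(total_results / page_size)
    return [API_URL.format(page=page, size=page_size) for page in range(1, n_pages + 1)]


urls = page_urls(754, 30)  # 754 results at 30 per page gives 26 pages
```

Each URL in urls can then be passed to requests.get as in the loop above, so the page count does not need to be adjusted by hand if the number of results changes.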

Python/regex - Bypass table of contents when extracting text

I have a dataframe consisting of the following:
identifier | text
34678 | 0000950123-04-010521.txt : 20040901.....
87902 | 0000950123-04-010521.txt : 20040901.....
I am trying to extract a portion of text from the "text" variable in Python that follows a line starting with "Item 5.02". I am placing the extracted text in a new variable called "important_text". With the help of fellow Stack Overflow users, I was able to construct the following code to extract the text:
import re
pattern = (
    r'\bItem\s+5\.02\s*([\w\W]*?)(?=\s*(?:Item\s+[89]\.01|Item\s+5\.03|Item\s+5\.07|Item\s+7\.01|SIGNATURES|SIGNATURE|'
    + r'Pursuant to the requirements of the Securities Exchange Act of 1934'.replace(' ', r'\s*')
    + r')\b)'
)
pd_00['important_text'] = pd_00['text'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
So, this is extracting all text between the first occurrence of "Item 5.02" and the first occurrence of various terms (i.e., "Item 8.01", "Item 9.01", "SIGNATURES", etc.).
In general, this does a really good job of extracting the portion of text I am looking for. However, in some instances, the text variable contains a Table of Contents that will have a line starting with "Item 5.02". In these instances, the regex code does not grab the portion of text I need. Does anyone have any advice for how to bypass the Table of Contents?
Here is an example that includes a Table of Contents (apologies for the large amount of text...I thought it would be best to give a full example):
<SEC-DOCUMENT>0000950137-05-007782.txt : 20050623
<SEC-HEADER>0000950137-05-007782.hdr.sgml : 20050623
<ACCEPTANCE-DATETIME>20050623154401
ACCESSION NUMBER: 0000950137-05-007782
CONFORMED SUBMISSION TYPE: 8-K/A
PUBLIC DOCUMENT COUNT: 3
CONFORMED PERIOD OF REPORT: 20050511
ITEM INFORMATION: Entry into a Material Definitive Agreement
ITEM INFORMATION: Departure of Directors or Principal Officers; Election of
Directors; Appointment of Principal Officers
ITEM INFORMATION: Financial Statements and Exhibits
FILED AS OF DATE: 20050623
DATE AS OF CHANGE: 20050623
FILER:
COMPANY DATA:
COMPANY CONFORMED NAME: HILLENBRAND INDUSTRIES INC
CENTRAL INDEX KEY: 0000047518
STANDARD INDUSTRIAL CLASSIFICATION: MISCELLANEOUS FURNITURE & FIXTURES [2590]
IRS NUMBER: 351160484
STATE OF INCORPORATION: IN
FISCAL YEAR END: 0930
FILING VALUES:
FORM TYPE: 8-K/A
SEC ACT: 1934 Act
SEC FILE NUMBER: 001-06651
FILM NUMBER: 05912533
BUSINESS ADDRESS:
STREET 1: 700 STATE ROUTE 46 E
CITY: BATESVILLE
STATE: IN
ZIP: 47006-8835
BUSINESS PHONE: 8129347000
</SEC-HEADER>
<DOCUMENT>
<TYPE>8-K/A
<SEQUENCE>1
<FILENAME>c96192ae8vkza.htm
<DESCRIPTION>AMENDMENT TO CURRENT REPORT
<TEXT>
e8vkza
Table of Contents
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
FORM 8-K/A
CURRENT REPORT
Pursuant to Section 13 or 15(d) of the Securities Exchange Act of 1934
Date of Report (Date of earliest event reported): May 11, 2005
HILLENBRAND INDUSTRIES, INC.
(Exact name of registrant as specified in its charter)
Indiana (State or other jurisdiction of incorporation) 1-6651 (Commission File Number) 35-1160484 (IRS Employer Identification No.)
700 State Route 46 East Batesville, Indiana (Address of principal executive offices) 47006-8835 (Zip Code)
Registrants telephone number, including area code: (812) 934-7000
Not Applicable
(Former name or former address, if changed since last report.)
Check the appropriate box below if the Form 8-K filing is intended to simultaneously satisfy
the filing obligation of the registrant under any of the following provisions:
o Written communications pursuant to Rule 425 under the Securities Act (17 CFR 230.425)
o Soliciting material pursuant to Rule 14a-12 under the Exchange Act (17 CFR 240.14a-12)
o Pre-commencement communications pursuant to Rule 14d-2(b) under the Exchange Act
(17 CFR 240.14d-2(b))
o Pre-commencement communications pursuant to Rule 13e-4(c) under the Exchange
Act
(17 CFR 240.13e-4(c))
1
TABLE OF CONTENTS
Item 1.01 ENTRY INTO A MATERIAL DEFINITIVE AGREEMENT
Item 5.02. DEPARTURE OF DIRECTORS OR PRINCIPAL OFFICERS; ELECTION OF DIRECTORS;
APPOINTMENT OF PRINCIPAL OFFICERS
Item 9.01. FINANCIAL STATEMENTS AND EXHIBITS
SIGNATURES
EXHIBIT INDEX
Employment Agreement
Stock Award
Table of Contents
Item 1.01 ENTRY INTO A MATERIAL DEFINITIVE AGREEMENT.
Item 5.02. DEPARTURE OF DIRECTORS OR PRINCIPAL OFFICERS; ELECTION OF DIRECTORS;
APPOINTMENT OF PRINCIPAL OFFICERS.
As previously disclosed, on May 11, 2005 Hillenbrand Industries, Inc.s Board of
Directors
appointed Rolf A. Classon to serve as President and Chief Executive Officer of
Hillenbrand on an
interim basis. At the time the Form 8-K announcing this appointment was filed,
Mr. Classon did not
have an employment agreement with Hillenbrand.
As in the above example, the Table of Contents will usually start with "Table of Contents" and end with "Table of Contents". To further complicate things, the text will sometimes randomly say "Table of Contents" towards the beginning of the text (this is also shown in the above example).
Here is what I would like to extract:
DEPARTURE OF DIRECTORS OR PRINCIPAL OFFICERS; ELECTION OF DIRECTORS;
APPOINTMENT OF PRINCIPAL OFFICERS.
As previously disclosed, on May 11, 2005 Hillenbrand Industries, Inc.s Board of
Directors
appointed Rolf A. Classon to serve as President and Chief Executive Officer of
Hillenbrand on an
interim basis. At the time the Form 8-K announcing this appointment was filed,
Mr. Classon did not
have an employment agreement with Hillenbrand.
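One heuristic that may help, offered as a sketch rather than a definitive fix: collect every "Item 5.02" candidate instead of only the first, and keep the longest capture. In a Table of Contents the entry is followed almost immediately by the next item, so its capture is short, while the real section runs on for paragraphs. The pattern is the one from the question; the function name longest_item_502 is hypothetical, and the "longest wins" rule is an assumption that will fail if a filing's real section is shorter than its Table of Contents entry.

```python
import re

# Same pattern as in the question, compiled once with IGNORECASE.
PATTERN = re.compile(
    r'\bItem\s+5\.02\s*([\w\W]*?)(?=\s*(?:Item\s+[89]\.01|Item\s+5\.03|Item\s+5\.07'
    r'|Item\s+7\.01|SIGNATURES|SIGNATURE'
    r'|Pursuant\s*to\s*the\s*requirements\s*of\s*the\s*Securities\s*Exchange\s*Act\s*of\s*1934)\b)',
    re.IGNORECASE,
)


def longest_item_502(text):
    """Return the longest text captured after any 'Item 5.02', or None if there is none."""
    candidates = [match.group(1) for match in PATTERN.finditer(text)]
    if not candidates:
        return None
    best = max(candidates, key=len)
    # Drop the period that follows "5.02" and any surrounding whitespace.
    return best.lstrip('.').strip()
```

Assuming every value in the column is a string, this could be applied with pd_00['text'].map(longest_item_502) in place of str.extract.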

Is there any way to convert specific text data to CSV format and give header names in Python?

I have a dataset in the following format in a text file.
The dataset is available at https://drive.google.com/file/d/1RqU2s0dqjd60dcYlxEJ8vnw9_z2fWixd/view?usp=sharing
PMID- 20301691
STAT- Publisher
DA - 20100320
DRDT- 20210311
CTDT- 20000204
PB - University of Washington, Seattle
DP - 1993
TI - Classic Galactosemia and Clinical Variant Galactosemia
BTI - GeneReviews((R))
AB - CLINICAL CHARACTERISTICS: The term "galactosemia" refers to disorders of
galactose metabolism that include classic galactosemia, clinical variant
galactosemia, and biochemical variant galactosemia (not covered in this chapter).
This GeneReview focuses on: Classic galactosemia, which can result in
life-threatening complications including feeding problems, failure to thrive,
hepatocellular damage, bleeding, and E coli sepsis in untreated infants. If a
lactose-restricted diet is provided during the first ten days of life, the
neonatal signs usually quickly resolve and the complications of liver failure,
sepsis, and neonatal death are prevented; however, despite adequate treatment
from an early age, children with classic galactosemia remain at increased risk
for developmental delays, speech problems (termed childhood apraxia of speech and
dysarthria), and abnormalities of motor function. Almost all females with classic
galactosemia manifest hypergonadatropic hypogonadism or premature ovarian
insufficiency (POI). Clinical variant galactosemia, which can result in
life-threatening complications including feeding problems, failure to thrive,
hepatocellular damage including cirrhosis, and bleeding in untreated infants.
This is exemplified by the disease that occurs in African Americans and native
Africans in South Africa. Persons with clinical variant galactosemia may be
missed with newborn screening as the hypergalactosemia is not as marked as in
classic galactosemia and breath testing is normal. If a lactose-restricted diet
is provided during the first ten days of life, the severe acute neonatal
complications are usually prevented. African Americans with clinical variant
galactosemia and adequate early treatment do not appear to be at risk for
long-term complications, including POI. DIAGNOSIS/TESTING: The diagnosis of
classic galactosemia and clinical variant galactosemia is established by
detection of elevated erythrocyte galactose-1-phosphate concentration, reduced
erythrocyte galactose-1-phosphate uridylyltranserase (GALT) enzyme activity,
and/or biallelic pathogenic variants in GALT. In classic galactosemia,
erythrocyte galactose-1-phosphate is usually >10 mg/dL and erythrocyte GALT
enzyme activity is absent or barely detectable. In clinical variant galactosemia,
erythrocyte GALT enzyme activity is close to or above 1% of control values but
probably never >10%-15%. However, in African Americans with clinical variant
galactosemia, the erythrocyte GALT enzyme activity may be absent or barely
detectable but is often much higher in liver and in intestinal tissue (e.g., 10%
of control values). Virtually 100% of infants with classic galactosemia or
clinical variant galactosemia can be detected in newborn screening programs that
include testing for galactosemia in their panel. However, infants with clinical
variant galactosemia may be missed if the program only measures blood total
galactose level and not erythrocyte GALT enzyme activity. MANAGEMENT: Treatment
of manifestations: Standard of care in any newborn who is "screen-positive" for
galactosemia is immediate dietary intervention while diagnostic testing is under
way. Once a diagnosis is confirmed, restriction of galactose intake is continued
and all milk products are replaced with lactose-free formulas (e.g., Isomil((R))
or Prosobee((R))) containing non-galactose carbohydrates; dietary restrictions on
all lactose-containing foods and other dairy products should continue throughout
life, although management of the diet becomes less important after infancy and
early childhood. In rare instances, cataract surgery may be needed in the first
year of life. Childhood apraxia of speech and dysarthria require expert speech
therapy. Developmental assessment at age one year by a psychologist and/or
developmental pediatrician is recommended in order to formulate a treatment plan
with the speech therapist and treating physician. For school-age children, an
individual education plan and/or professional help with learning skills and
special classrooms as needed. Hormone replacement therapy as needed for delayed
pubertal development and/or primary or secondary amenorrhea. Stimulation with
follicle-stimulating hormone may be useful in producing ovulation in some women.
Prevention of secondary complications: Recommended calcium, vitamin D, and
vitamin K intake to help prevent decreased bone mineralization; standard
treatment for gastrointestinal dysfunction. Surveillance: Biochemical genetics
clinic visits every three months for the first year of life or as needed
depending on the nature of the potential acute complications; every six months
during the second year of life; yearly thereafter. Routine monitoring for: the
accumulation of toxic analytes (e.g., erythrocyte galactose-1-phosphate and
urinary galactitol); cataracts; speech and development; movement disorder; POI;
nutritional deficiency; and osteoporosis. Agents/circumstances to avoid: Breast
milk, proprietary infant formulas containing lactose, cow's milk, dairy products,
and casein or whey-containing foods; medications with lactose and galactose.
Evaluation of relatives at risk: To allow for earliest possible diagnosis and
treatment of at-risk sibs: Perform prenatal diagnosis when the GALT pathogenic
variants in the family are known; or If prenatal testing has not been performed,
test the newborn for either the family-specific GALT pathogenic variants or
erythrocyte GALT enzyme activity. Pregnancy management: Women with classic
galactosemia should maintain a lactose-restricted diet during pregnancy. GENETIC
COUNSELING: Classic galactosemia and clinical variant galactosemia are inherited
in an autosomal recessive manner. Couples who have had one affected child have a
25% chance of having an affected child in each subsequent pregnancy. Molecular
genetic carrier testing for at-risk sibs and prenatal testing for pregnancies at
increased risk are an option if the GALT pathogenic variants in the family are
known. If the GALT pathogenic variants in a family are not known, prenatal
testing can rely on assay of GALT enzyme activity in cultured amniotic fluid
cells.
CI - Copyright (c) 1993-2021, University of Washington, Seattle. GeneReviews is a
registered trademark of the University of Washington, Seattle. All rights
reserved.
FED - Adam, Margaret P
ED - Adam MP
FED - Ardinger, Holly H
ED - Ardinger HH
FED - Pagon, Roberta A
I want to use the left-hand tags as column names and the right-hand values as the rows.
The output should be:
PMID STAT DA CTDT
33237688 Publisher 20201126 20201125
I tried reading the text file as a CSV, but that does not work:
import pandas as pd
medical = pd.read_csv("sepsis2015.txt",
sep="\n")
print(medical)

The simplest way I know is to read the data file and split it on blank lines:
with open("sepsis2015.txt") as file:
    lines = file.readlines()
lines = ''.join(lines).split('\n\n')
This gives you a list of your records:
['PMID- 20301691 \nSTAT- Publisher\nDA - 20100320\nDRDT- 20210311\nCTDT- 20000204\nPB - University of Washington, Seattle\nDP - 1993\nTI - Classic Galactosemia and Clinical Variant Galactosemia\nBTI - GeneReviews((R))', '\nPMID- 33237688\nSTAT- Publisher\nDA - 20201126\nCTDT- 20201125\nPB - University of Washington, Seattle\nDP - 1993\nTI - MIRAGE Syndrome\nBTI - GeneReviews((R))']
Convert the records stored in the lines list into a dictionary of dictionaries:
# split on the first '-' only, so hyphens inside a value are kept
data = {i: {item.split('-', 1)[0].strip(): item.split('-', 1)[1].strip() for item in row.split('\n') if '-' in item} for i, row in enumerate(lines)}
so you have:
{0: {'PMID': '20301691', 'STAT': 'Publisher', 'DA': '20100320', 'DRDT': '20210311', 'CTDT': '20000204', 'PB': 'University of Washington, Seattle', 'DP': '1993', 'TI': 'Classic Galactosemia and Clinical Variant Galactosemia', 'BTI': 'GeneReviews((R))'}, 1: {'PMID': '33237688', 'STAT': 'Publisher', 'DA': '20201126', 'CTDT': '20201125', 'PB': 'University of Washington, Seattle', 'DP': '1993', 'TI': 'MIRAGE Syndrome', 'BTI': 'GeneReviews((R))'}}
finally, convert this dictionary into a pandas.DataFrame with:
df = pd.DataFrame.from_dict(data, orient = 'index')
Complete Code
import pandas as pd

with open(r'data/data.csv') as file:
    lines = file.readlines()
# records are separated by blank lines
lines = ''.join(lines).split('\n\n')
# split on the first '-' only, so hyphens inside a value are kept
data = {i: {item.split('-', 1)[0].strip(): item.split('-', 1)[1].strip() for item in row.split('\n') if '-' in item} for i, row in enumerate(lines)}
print(data)
df = pd.DataFrame.from_dict(data, orient='index')
print(df)
PMID STAT DA DRDT CTDT PB DP TI BTI
0 20301691 Publisher 20100320 20210311 20000204 University of Washington, Seattle 1993 Classic Galactosemia and Clinical Variant Galactosemia GeneReviews((R))
1 33237688 Publisher 20201126 NaN 20201125 University of Washington, Seattle 1993 MIRAGE Syndrome GeneReviews((R))
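
In the full sample from the question, long values (such as the CI copyright notice) continue on lines that carry no tag of their own, and some of those lines contain hyphens, so a per-line split on '-' can drop them or misread them as tags. Below is a rough standard-library sketch of a line-by-line parser that handles this. The tag pattern (2 to 4 capital letters before the hyphen), the rule that a PMID tag starts a new record, and the "; " separator for repeated tags such as FED are my assumptions, not something the file format is confirmed to guarantee.

```python
import re

# Assumption: a tagged line is 2-4 capital letters, optional spaces, a hyphen, then the value.
TAG_LINE = re.compile(r"^([A-Z]{2,4})\s*-\s?(.*)$")


def parse_records(text):
    """Split tagged text into a list of {tag: value} dicts, one per record."""
    records = []
    current = None    # record currently being filled
    last_tag = None   # tag that a continuation line belongs to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        match = TAG_LINE.match(line)
        if match:
            tag, value = match.group(1), match.group(2).strip()
            # Assumption: every record starts with its PMID line.
            if tag == "PMID" or current is None:
                current = {}
                records.append(current)
            if tag in current:
                # Repeated tag (e.g. several FED lines): keep all values.
                current[tag] += "; " + value
            else:
                current[tag] = value
            last_tag = tag
        elif current is not None and last_tag is not None:
            # Untagged line: treat it as the continuation of the previous value.
            current[last_tag] += " " + line
    return records


sample = (
    "PMID- 1\n"
    "CI - Copyright (c) 1993-2021, A\n"
    "B\n"
    "FED - Adam, Margaret P\n"
    "FED - Ardinger, Holly H\n"
    "\n"
    "PMID- 2\n"
    "STAT- Publisher\n"
)
records = parse_records(sample)
print(records)
```

If pandas is available, pd.DataFrame(parse_records(text)) should then give one row per record, with NaN wherever a record lacks a tag.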

Maybe this will help. Given a file (text.csv in the code below) containing the text:
PMID- 20301691
STAT- Publisher
DA - 20100320
DRDT- 20210311
CTDT- 20000204
PB - University of Washington, Seattle
DP - 1993
TI - Classic Galactosemia and Clinical Variant Galactosemia
BTI - GeneReviews((R))
PMID- 33237688
STAT- Publisher
DA - 20201126
CTDT- 20201125
PB - University of Washington, Seattle
DP - 1993
TI - MIRAGE Syndrome
BTI - GeneReviews((R))
Try:
import pandas as pd
df = pd.read_csv('text.csv', sep='-', header=None)
# clean up
df[0] = df[0].str.strip()
df[1] = df[1].str.strip()
# create a dictionary
data = df.groupby(0)[1].apply(list).to_dict()
# create a dataframe and make sure the arrays are equal length
# borrowed from https://stackoverflow.com/a/19736406/9192284
df = pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in data.items() ]))
print(df)
Output:
BTI CTDT DA DP DRDT \
0 GeneReviews((R)) 20000204 20100320 1993 20210311
1 GeneReviews((R)) 20201125 20201126 1993 NaN
PB PMID STAT \
0 University of Washington, Seattle 20301691 Publisher
1 University of Washington, Seattle 33237688 Publisher
TI
0 Classic Galactosemia and Clinical Variant Gala...
1 MIRAGE Syndrome
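
One caveat with collecting the values per tag into lists: the values are aligned by position, not by record. In the sample this works because the missing DRDT belongs to the last record, but if an earlier record lacks a tag that a later one has, the later value would end up in row 0. A possible alternative, sketched here under the assumption that every record starts with a PMID line and that no tag repeats within a record, is to number the records first and then pivot. The inline sample text is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the FIRST record lacks DRDT, the second lacks DA.
text = """PMID- 20301691
STAT- Publisher
DA - 20100320
PMID- 33237688
STAT- Publisher
DRDT- 20210311
"""

# Split each line on the first '-' only, so hyphens inside values survive.
rows = [line.split("-", 1) for line in text.splitlines() if "-" in line]
tall = pd.DataFrame(rows, columns=["tag", "value"])
tall["tag"] = tall["tag"].str.strip()
tall["value"] = tall["value"].str.strip()

# Assumption: each PMID line opens a new record, so a running count of
# PMID lines gives every line the number of the record it belongs to.
tall["record"] = (tall["tag"] == "PMID").cumsum() - 1

# One row per record, one column per tag; missing tags become NaN.
# pivot raises an error if a tag occurs twice in the same record.
wide = tall.pivot(index="record", columns="tag", values="value")
print(wide)
```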

How to use python beautifulsoup to get image description from html?

I could not find an answer to this elsewhere, so I am asking for your help.
My Python code accesses http://news.yahoo.com/rss/entertainment
to get the titles and descriptions, but some of the description text is only available in an image's alt/title attribute.
This is my code:
for child in body_tag.contents[0].channel.children:
    if (child.__class__ != NavigableString):
        if child.title != None :
            print "------title----------"
            print(child.title.contents[0].encode('ascii','ignore'))
            print "-----description-class------------"
            mchild=child.find_next("description").contents[0]
            print mchild.__class__
            print "-------description---------"
            print mchild.find_next("img")
            print(mchild.encode('ascii','ignore'))
            print "-------end---------"
This is part of the output:
------title----------
University of Connecticut revokes Cosby's honorary degree
-----description-class------------
<class 'bs4.element.NavigableString'>
-------description---------
None
To display the description here, I replaced "<" and ">" with "(" and ")":
(p) (a href="http://news.yahoo.com/university-connecticut-revokes-cosbys-honorary-degree-155552959.html")
(img src="http://l.yimg.com/bt/api/res/1.2/cjgCZP4YBj7M6SmdpoGj.Q--/YXBwaWQ9eW5ld3NfbGVnbztmaT1maWxsO2g9ODY7cT03NTt3PTEzMA--/http://media.zenfs.com/en_us/News/ap_webfeeds/7b35f971ec59428491aef6308db4567e.jpg" width="130" height="86" alt="FILE - In this May 24, 2016 file photo, Bill Cosby departs the Montgomery County Courthouse after a preliminary hearing, in Norristown, Pa. A 72-year-old New Hampshire woman who says Bill Cosby raped her in 1965 has withdrawn her civil defamation lawsuit against the comedian after a federal judge had allowed the case to move forward. (AP Photo/Matt Rourke, File)" align="left" title="FILE - In this May 24, 2016 file photo, Bill Cosby departs the Montgomery County Courthouse after a preliminary hearing, in Norristown, Pa. A 72-year-old New Hampshire woman who says Bill Cosby raped her in 1965 has withdrawn her civil defamation lawsuit against the comedian after a federal judge had allowed the case to move forward. (AP Photo/Matt Rourke, File)" border="0" /((/a)STORRS, Conn. (AP) The University of Connecticut on Wednesday revoked an honorary degree awarded to Bill Cosby, saying he engaged in conduct "incongruent" with the university's values.(/p((br) clear="all"/)
-------end---------
-------end---------
How can I get the title inside the img tag?
title="FILE - In this May 24, 2016 file photo,
I tried find_next("img") and other approaches, but I could not get it.

So you want all the text from the description plus the title from any img tag. Find all the description tags, turn each description.text into its own BeautifulSoup object, then look for the img in that and try to pull either its title or its alt attribute. To get the matching title, find the previous title relative to the description tag:
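from bs4 import BeautifulSoup

# soup is the BeautifulSoup object for the RSS feed
for desc in soup.find_all("description"):
    # the description content is a string of HTML, so parse it again
    d = BeautifulSoup(desc.text, "lxml")
    img = d.find("img")
    print("Title = {}".format(desc.find_previous("title").text))
    # prefer the title attribute, fall back to alt, or use "" when there is no img
    img_text = (img.get("title") or img.get("alt", "")) if img else ""
    print("Decscription = {}\n".format((d.find(text=True) or "") + img_text))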
Which gives you:
Title = Entertainment News Headlines — Yahoo! News
Decscription = Get the latest entertainment news headlines from Yahoo! News. Find breaking entertainment news, including analysis and opinion on top entertainment stories.
Title = Spotify's Top 10 most streamed tracks
Decscription = The following list represents the most streamed tracks on Spotify, based on the number of people who shared it divided by the number who listened to it, from Monday, Oct. 20 to Sunday Oct. 26 via Facebook, Tumblr, Twitter and Spotify.FILE - In this Sept. 7, 2012 file photo, musician Robin Thicke performs during Macy's Passport presents Glamorama 2012 at The Orpheum Theatre in Los Angeles. Thicke's "Blurred Lines (feat. T.I. & Pharrell)" was the top streamed tracks on Spotify from Monday, June 10, to Sunday, June 16, 2013. (Photo by Matt Sayles/Invision/AP, File)
Title = Who will win at the Tony Awards? AP predicts
Decscription = NEW YORK (AP) — The great comedian W.C. Fields is credited with the line, "Never work with children or animals." He would have had trouble on Broadway this season.This theater image released by The O+M Company shows the cast during a performance of the musical "Kinky Boots." The Cyndi Lauper-scored "Kinky Boots," based on the 2005 British movie about a real-life shoe factory that struggles until it finds new life in fetish footwear, is nominated for 13 Tony Award nominations. The awards will be broadcast on CBS from Radio City Music Hall on June 9. (AP Photo/The O+M Company, Matthew Murphy)
Title = The top iPhone and iPad apps on App Store
Decscription = App Store Official Charts for the week ending November 3, 2014:
Title = Fairey: 'Vindicated' by dismissal of Detroit tagging case
Decscription = DETROIT (AP) — Graffiti artist Shepard Fairey says he feels "relieved and vindicated" now that a malicious destruction of property case in Detroit has been dismissed.
Title = FBI seeks Rockwell painting on 40th anniversary of its theft
Decscription = CHERRY HILL, N.J. (AP) — Federal authorities are seeking the public's help in recovering a 1919 Norman Rockwell painting on the 40th anniversary of its theft from a New Jersey home.
Title = APNewsBreak: Union has deal with 4th Atlantic City casino
Decscription = ATLANTIC CITY, N.J. (AP) — Atlantic City's main casino workers union reached agreement Thursday with four of the five casinos it had been targeting for a strike this weekend.Union members cheer as they discuss preparations for a strike against as many as five of the city's eight casinos in Atlantic City, N.J. on Wednesday June 29, 2016. Local 54 of the Unite-HERE union says it will go on strike Friday if it can't reach new contracts with three casinos owned by Caesars Entertainment (Bally's, Caesars and Harrah's) and two casinos owned by billionaire investor Carl Icahn (the Tropicana and the Trump Taj Mahal). About 6,500 of the union's nearly 10,000 workers are at the five hotels. (AP Photo/Wayne Parry)
Title = The Latest: APNewsBreak: Union has deal with 4th casino
Decscription = The Latest on contract negotiations with casinos (all times local): 4:35 p.m. Atlantic City's main casino workers union has reached agreement with the fourth of five casinos it had been targeting for a ...Union members cheer as they discuss preparations for a strike against as many as five of the city's eight casinos in Atlantic City, N.J. on Wednesday June 29, 2016. Local 54 of the Unite-HERE union says it will go on strike Friday if it can't reach new contracts with three casinos owned by Caesars Entertainment (Bally's, Caesars and Harrah's) and two casinos owned by billionaire investor Carl Icahn (the Tropicana and the Trump Taj Mahal). About 6,500 of the union's nearly 10,000 workers are at the five hotels. (AP Photo/Wayne Parry)
Title = CBS reporter traveling to 59 parks in a year
Decscription = NEW YORK (AP) — Conor Knighton didn't take the easy route when he proposed a "CBS Sunday Morning" story on the National Park Service's centennial.
Title = Here come the virtual reality Olympics ... for Samsung users
Decscription = NEW YORK (AP) — Athletes in Rio will compete to be the fastest sprinter and highest jumper at the Olympics this August. But there's another test underway as well: How well can virtual reality capture sporting events?This photo provided by NBC and HD Studio shows NBC's daytime and late night set for the Rio Olympics located on Copacabana Beach in Rio. NBC says it will provide 85 hours of virtual reality programming during the Rio Olympics in August, but only to users of Samsung Galaxy smartphones and the Samsung Gear VR headset. (HD Studio/Courtesy of NBC via AP)
Title = Oscars timetable for 2017 revealed
Decscription = Movie buffs, mark your calendars: your 2017 Oscars party will be on Sunday, February 26. The Academy of Motion Picture Arts and Sciences announced the timetable for the 89th Oscars on Thursday, one day after it announced that it had invited a record number of artists to join the body, the majority of them women and people of color.A view of the Oscars logo at the 88th Annual Academy Awards nominee luncheon on February 8, 2016 in Beverly Hills, California
Title = Queen marks deadly Somme centenary at Westminster Abbey
Decscription = LONDON (AP) — Queen Elizabeth II attended a service at Westminster Abbey on Thursday, the eve of the centenary of the Battle of the Somme, one of the deadliest chapters of World War I.
Title = Rob Wasserman, accomplished bass player, dead at 64
Decscription = NEW YORK (AP) — Rob Wasserman, a highly respected bass player and composer who performed and recorded with Lou Reed, Neil Young, Brian Wilson and many other musicians, has died. He was 64.
Title = Documents filed by some Prince claimants to become public
Decscription = CHASKA, Minn. (AP) — A Minnesota judge overseeing the legal proceedings about Prince's estate will allow documents filed by some claimants to become public.
Title = The Latest: Oprah Winfrey to appear at Essence Festival
Decscription = NEW ORLEANS (AP) — The Latest on the annual Essence Festival held over the July 4th holiday in New Orleans (all times local):FILE - In this Jan. 20, 2009, file photo, Mariah Carey performs at the Neighborhood Inaugural Ball in Washington. Music is at the heart of the annual Essence Festival in New Orleans, and this year is no different. Fans will get to hear from first-timers Mariah Carey, Puff Daddy and Jeremih as well as from festival veterans Charlie Wilson, Maxwell, New Edition, Tyrese and Lalah Hathaway - all of whom are scheduled to perform inside the Superdome Friday, July 1, 2016, through Sunday. (AP Photo/Alex Brandon, File)
Title = Brad Paisley: West Virginia floods shocking, heartbreaking
Decscription = CHARLESTON, W.Va. (AP) — Brad Paisley said he's shocked and heartbroken by the destruction from deadly flooding in his home state of West Virginia.Principal Mike Kelley walks through a hallway that is filled with slick mud at Herbert Hoover High School in Clendenin, W.Va., Monday, June 27, 2016. The first floor hallways and rooms of the school are caked in 3-5 inches mud, which was left by over six feet of flood water that swamped the building late last week. (Sam Owens/Charleston Gazette-Mail via AP)
Title = Chechen leader Kadyrov seeks apprentice on reality TV show
Decscription = MOSCOW (AP) — Another powerful, controversial man is taking to reality TV to find an assistant — not Donald Trump but the leader of Chechnya.FILE - In this Wednesday March 23, 2016 file photo, Chechen regional leader Ramzan Kadyrov addresses a rally marking the 13th anniversary of the adoption of the Constitution of Russian region of Chechnya, in the regional capital of Grozny, Russia. Russian state television on Thursday is to broadcast the opening episode of "Live - The Team," in which participants compete to become an assistant to leader of Chechnya Ramzan Kadyrov. (AP Photo/Musa Sadulayev, File)
Title = With an eye to Tuscany, Debi Mazar plots culinary future
Decscription = NEW YORK (AP) — Debi Mazar and her brood spend at least a month in Tuscany each year, but if the "Younger" actress had her way, the region would be a far more permanent fixture in her life.FILE - In this Wednesday, Jan. 6, 2016 file photo, Debi Mazar speaks during the "Younger" panel at the TV Land 2016 Winter TCA in Pasadena, Calif. After the success of her award-winning cooking show "Extra Virgin," Mazar's creative juices are still flowing, as the actress talks about the possibility of another show and more of her culinary dreams. (Photo by Richard Shotwell/Invision/AP)
Title = Wisecracking De Niro touts Catskills with NY governor
Decscription = BETHEL, N.Y. (AP) — Robert De Niro is conjuring the legacy — and the stand-up jokes — of comedians like Rodney Dangerfield, Henny Youngman and Milton Berle while praising the natural beauty of New York's Catskills region.
Title = Music Review: Sara Watkins branches out
Decscription = Sara Watkins, "Young in All the Wrong Ways" (New West Records)FILE - In this July 29, 2012 file photo, Sara Watkins performs at the Newport Folk Festival in Newport, R.I. Watkins describes her latest venture as “a breakup album with myself,” but it seems like there might have been someone else involved. The songs on her new album, “Young in All the Wrong Ways,” have bite to them. There is anger here, a jarring departure from Watkins’ previous work. A couple of the songs push into hard-edged rock, her voice straining against a jagged electric guitar. (AP Photo/Joe Giblin)
Title = Disney Animation's 'Wreck-It Ralph 2' set for March 2018
Decscription = LOS ANGELES (AP) — "Wreck-It Ralph" is headed back to the arcade, and theaters, in a sequel planned for release on March 9, 2018. Co-directors Rich Moore and Phil Johnston announced the sequel to the 2012 animated film Thursday morning on Facebook Live.FILE - In this Oct. 29, 2012 file photo, Director Rich Moore arrives at the world premiere of "Wreck-It Ralph" at El Capitan Theatre in Los Angeles. “Wreck-It Ralph” is headed back to the arcade, and theaters, in a sequel planned for release on March 9, 2018. Co-directors Rich Moore and Phil Johnston announced the sequel to the 2012 animated film Thursday, June 30, 2016 on Facebook Live. (Photo by Jordan Strauss/Invision/AP)
Title = Scarlett Johansson ranked Hollywood's top-grossing actress
Decscription = Scarlett Johansson has taken the crown as Hollywood's highest-grossing actress ever.FILE - In this April 21, 2015, file photo, Scarlett Johansson poses for photographers upon arrival at the premiere for the film 'The Avengers Age of Ultron' in London. Box Office Mojo has crowned Johansson as Hollywood's highest grossing actress on a list updated June 29, 2016.(Photo by Joel Ryan/Invision/AP, File)
Title = HLN's Nancy Grace leaving her legal show
Decscription = NEW YORK (AP) — Tough-talking former prosecutor Nancy Grace is leaving her prime-time show on the HLN network in October.FILE - In this Friday, Oct. 21, 2014, file photo, television host Nancy Grace arrives at the 7th annual GLSEN Respect Awards in Beverly Hills, Calif. Grace is leaving her prime-time show on the HLN network in October 2016. The CNN sister station said Grace told her staff on Thursday, June 30, 2016 that her show would be ending after 12 years. An HLN spokeswoman said the network had no immediate announcement on what program would go in its place. (AP Photo/Matt Sayles, File)
Title = Moviegoers to Hollywood: It better be good
Decscription = NEW YORK (AP) — As Hollywood girds for a low-key Fourth of July box office weekend and watches its summer season dip 15 percent below last year's, an even more worrisome trend has taken shape: Moviegoers are growing pickier.FILE - This image released by Warner Bros. Entertainment shows Alexander Skarsgard from "The Legend of Tarzan." For films that aren’t “the movie to see,” moviegoers are increasingly staying home. With word-of-mouth traveling at the speed of Twitter, quality has become a more vital currency. (Jonathan Olley/Warner Bros. Entertainment via AP, File)
Title = 8 rescued after Oklahoma City roller coaster gets stuck
Decscription = OKLAHOMA CITY (AP) — No one was injured when a roller coaster at an Oklahoma City amusement park stalled out and stranded eight people, including seven children.
Title = Smallest national park? Kosciuszko, forgotten son of liberty
Decscription = PHILADELPHIA (AP) — If the hip-hop Broadway smash "Hamilton" can reignite interest in the first U.S. treasury secretary, what will it take to drum up interest in another forgotten hero from America's fight for independence?FILE - In this April 1, 2013 file photo a statue of Poland's General Thaddeus Kosciuszko is enveloped in the early morning fog in Lafayette Park across from the White House in Washington. Kosciuszko was a military engineer from Poland, Kosciuszko came to Philadelphia in August 1776 to offer his services in the fight against the British. (AP Photo/Jacquelyn Martin, File)
Title = Pregnant Alanis Morissette posts nude underwater photo
Decscription = Alanis Morissette has posted a nude photo of herself sporting a large baby bump while floating underwater.FILE - In this Nov. 22, 2015, file photo, Souleye, left, and Alanis Morissette arrive at the American Music Awards in Los Angeles. Morissette posted a nude photo of herself sporting a large baby bump while floating underwater on Instagram on June 28, 2016. (Photo by Jordan Strauss/Invision/AP, File)
Title = New Orleans ready to 'party with a purpose' at Essence Fest
Decscription = NEW ORLEANS (AP) — Music has always been at the heart of the annual Essence Festival, now in its 22nd year, and this year will be no different.FILE - In this Jan. 20, 2009, file photo, Mariah Carey performs at the Neighborhood Inaugural Ball in Washington. Music is at the heart of the annual Essence Festival in New Orleans, and this year is no different. Fans will get to hear from first-timers Mariah Carey, Puff Daddy and Jeremih as well as from festival veterans Charlie Wilson, Maxwell, New Edition, Tyrese and Lalah Hathaway - all of whom are scheduled to perform inside the Superdome Friday, July 1, 2016, through Sunday. (AP Photo/Alex Brandon, File)
Title = Alvin Toffler, author of 'Future Shock,' dead at 87
Decscription = NEW YORK (AP) — Alvin Toffler, a guru of the post-industrial age whose million-selling "Future Shock" and other books anticipated the disruptions and transformations brought about by the rise of digital technology, has died. He was 87.
Title = Theater shows R-rated comedy trailer with "Finding Dory"
Decscription = CONCORD, Calif. (AP) — The owner of a California movie theater is apologizing after a trailer for an R-rated upcoming Seth Rogen comedy was shown ahead of a screening of Disney's "Finding Dory."FILE - This undated file image released by Disney shows the character Dory, voiced by Ellen DeGeneres, in a scene from "Finding Dory." In its second week, “Finding Dory” easily remained on top with an estimated $73.2 million, according to studio estimates Sunday, June 26, 2016. (Pixar/Disney via AP, File)
Title = Christie's to sell contents of Reagans' LA home
Decscription = NEW YORK (AP) — A two-day auction of the contents of Ronald and Nancy Reagan's ranch-style home in California will include everything from personal mementos from heads of state and friends to objects the couple took with them to the White House.This undated photo provided by Christie's shows a needlepoint cushion given to Ronald Reagan for his 70th birthday in 1981. The pillow, which will be sold by Christie's New York during a two-day auction of the contents of Ronald and Nancy Reagan's ranch-style home in California, has a pre-sale estimate of $1,000-1,500. Christie’s announced Thursday, June 30, 2016, highlights of the Sept. 21-22 sale in New York City. (Christie's via AP) MANDATORY CREDIT
Title = Asian actors too busy to fret over Hollywood 'white-washing'
Decscription = TOKYO (AP) — The film world of Asia, known for producing Akira Kurosawa, Satyajit Ray, Brillante Mendoza and other greats, is too busy making movies of its own to fret much about the debate slamming Hollywood — the casting of white people in roles written for Asians.FILE - In this Sept. 5, 2007, file photo, Japanese actress Kaori Momoi poses during the photo call for the movie "Sukiyaki Western Django" at the 64th Venice Film Festival, in Venice, Italy. The film world of Asia is too busy making movies of its own to fret much about the debate slamming Hollywood - the casting of white people in roles written for Asians. Momoi, who appeared in “Memoirs of a Geisha,” as well as Russian filmmaker Aleksandr Sokurov’s “The Sun,” suggested acting was ultimately about individual talent, not skin color or nationality. (AP Photo/Andrew Medichini, File)
Title = Film academy invites 683 new members to join
Decscription = LOS ANGELES (AP) — Six months after announcing intentions to double the number of female and minority members in its ranks by 2020, the Academy of Motion Picture Arts and Sciences has invited 683 new members to join the organization.FILE - In this March 2, 2014 file photo, an Oscar statue is displayed at the Oscars at the Dolby Theatre in Los Angeles. Six months after announcing intentions to double the number of female and minority members in its ranks, the Academy of Motion Picture Arts and Sciences has invited 683 new members to join the organization. The academy says its invitees are 46 percent female, 41 percent minority and represent 59 countries.(Photo by Matt Sayles/Invision/AP, File)
Title = Miss Teen USA pageant replaces swimsuits with athletic wear
Decscription = LAS VEGAS (AP) — The Miss Teen USA pageant is dropping the swimsuit portion of its competition.
Title = YouTube personality charged with making false police report
Decscription = LOS ANGELES (AP) — A gay YouTube personality who said he was assaulted outside a West Hollywood club has been charged with filing a false police report and faking his injuries.This Wednesday, June 29, 2016, photo released by Los Angeles County Sheriff's Department shows Calum McSwiggan. The London-native gay YouTube personality who said he was assaulted outside a West Hollywood club has been charged with filing a false police report and faking his injuries. (Los Angeles County Sheriff's Department via AP) MANDATORY CREDIT
Title = Jesus Christ film coming to virtual reality
Decscription = LOS ANGELES (AP) — The story of Jesus Christ is coming to virtual reality for the first time.This undated photo provided by Autumn VR Inc. and VRWERX, LLC, shows a production still from "Jesus VR - The Story of Christ." The story of Jesus Christ is coming to virtual reality for the first time. Autumn Productions and VRWerx announced plans Wednesday, June 29, 2016, to release the live-action film on all major VR platforms this Christmas. (Autumn VR Inc. and VRWERX, LLC via AP)
Title = The Latest: Celebrities record tribute to nightclub victims
Decscription = ORLANDO, Fla. (AP) — The Latest on the mass shooting at a gay Orlando nightclub that left 49 people dead (all times local):
Title = The Latest: Golfer Bubba Watson plans to help flood victims
Decscription = CHARLESTON, W.Va. (AP) — The Latest on flooding that has devastated parts of West Virginia (all times local):
Title = Twitter dominated by tongue-in-cheek #HeterosexualPrideDay
Decscription = What appears to be a tongue-in-cheek social media movement to mark June 29 as a day to celebrate heterosexual pride has become one of the day's top online trends.
Title = Miss Teen USA axes 'outdated' bikini competition
Decscription = One of America's top beauty pageants has axed its swimsuit competition, ditching bikinis for sportswear to fend off years of complaints that parading in a bikini is sexist and demeaning. The Miss Universe Organization, which operates the pageant, said from now on contestants would be judged on athletic wear, in addition to the evening wear and personality competitions. "Miss Teen USA's transition to athletic wear reads as less exploitative and more focused on the importance of physical fitness for its younger participants," it said.Miss Teen USA 2016 Katherine Haik (R) congratulates Miss District of Columbia USA 2016 Deshauna Barber during the 2016 Miss USA pageant at T-Mobile Arena on June 5, 2016 in Las Vegas, Nevada
Title = Kayne West, Adidas expand partnership for Yeezy line
Decscription = LOS ANGELES (AP) — Rapper Kanye West and Adidas are expanding their partnership that began almost two years ago with retail hubs for his Yeezy products and additional sportswear designs.FILE - In this Aug. 30, 2015, file photo, Kanye West accepts the video vanguard award at the MTV Video Music Awards at the Microsoft Theater in Los Angeles. West and Adidas are expanding their partnership that began almost two years ago with retail hubs for his Yeezy products and additional sportswear designs. The sportswear company announced the collaboration on Wednesday, June 29, 2016, and described it as the most significant partnership between a non-athlete and an athletic brand. (Photo by Matt Sayles/Invision/AP, File)
You cannot find every title first and then look for the description that follows it, because not every title has a related description, whereas every description has a related title.
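
The reason find_next("img") returned None in the question is that the description content is a plain string of HTML (a NavigableString), not parsed tags, so it has to be parsed a second time. As an illustration of that two-step idea without BeautifulSoup, here is a rough sketch using only the standard library (xml.etree.ElementTree for the feed, html.parser for the description). This is a swapped-in technique, not the code from the answer above, and the feed text, class name and function name are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect the visible text and the first img title/alt of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.img_text = ""

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img" and not self.img_text:
            attributes = dict(attrs)
            # Prefer title, fall back to alt, else keep the empty string.
            self.img_text = attributes.get("title") or attributes.get("alt") or ""

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def items_from_feed(xml_text):
    """Yield (title, text, img_text) for every <item> in an RSS document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        parser = DescriptionParser()
        # Step 2: the description is escaped HTML, so parse it separately.
        parser.feed(item.findtext("description") or "")
        parser.close()
        yield item.findtext("title"), " ".join(parser.text_parts), parser.img_text


# Hypothetical two-item feed, shaped like the description shown in the question.
feed = """<rss><channel><title>Feed</title>
<item><title>Story one</title>
<description>&lt;p&gt;&lt;a href="http://example.com/1"&gt;&lt;img src="a.jpg" alt="Alt text" title="Photo caption" /&gt;&lt;/a&gt;Body text.&lt;/p&gt;</description></item>
<item><title>Story two</title><description>Plain text only.</description></item>
</channel></rss>"""

result = list(items_from_feed(feed))
for title, text, img_text in result:
    print("Title = {}".format(title))
    print("Description = {} {}\n".format(text, img_text))
```

Because the loop walks the item elements, each description is paired with the title of its own item, which avoids the title/description mismatch mentioned above.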
