How to get a list of tickers in Jupyter Notebook? - python

Write code to get a list of tickers for all S&P 500 stocks from Wikipedia. As of 2/24/2021, there are 505 tickers in that list. You can use any method you want as long as the code actually queries the following website to get the list:
https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
One way would be to use the requests module to get the HTML code and then use the re module to extract the tickers. Another option would be the .read_html function in pandas, then export the tickers column to a list.
You should save the tickers in a list with the name sp500_tickers.

This will grab the data in the table named 'constituents'.
# find a specific table by table count
import pandas as pd
import requests
from bs4 import BeautifulSoup
res = requests.get("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
print(df[0].to_json(orient='records'))
Result:
[{"Symbol":"MMM","Security":"3M Company","SEC filings":"reports","GICS Sector":"Industrials","GICS Sub-Industry":"Industrial Conglomerates","Headquarters Location":"St. Paul, Minnesota","Date first added":"1976-08-09","CIK":66740,"Founded":"1902"},{"Symbol":"ABT","Security":"Abbott Laboratories","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Health Care Equipment","Headquarters Location":"North Chicago, Illinois","Date first added":"1964-03-31","CIK":1800,"Founded":"1888"},{"Symbol":"ABBV","Security":"AbbVie Inc.","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Pharmaceuticals","Headquarters Location":"North Chicago, Illinois","Date first added":"2012-12-31","CIK":1551152,"Founded":"2013 (1888)"},{"Symbol":"ABMD","Security":"Abiomed","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Health Care Equipment","Headquarters Location":"Danvers, Massachusetts","Date first added":"2018-05-31","CIK":815094,"Founded":"1981"},{"Symbol":"ACN","Security":"Accenture","SEC filings":"reports","GICS Sector":"Information Technology","GICS Sub-Industry":"IT Consulting & Other Services","Headquarters Location":"Dublin, Ireland","Date first added":"2011-07-06","CIK":1467373,"Founded":"1989"},{"Symbol":"ATVI","Security":"Activision Blizzard","SEC filings":"reports","GICS Sector":"Communication Services","GICS Sub-Industry":"Interactive Home Entertainment","Headquarters Location":"Santa Monica, California","Date first added":"2015-08-31","CIK":718877,"Founded":"2008"},{"Symbol":"ADBE","Security":"Adobe Inc.","SEC filings":"reports","GICS Sector":"Information Technology","GICS Sub-Industry":"Application Software","Headquarters Location":"San Jose, California","Date first added":"1997-05-05","CIK":796343,"Founded":"1982"},
Etc., etc., etc.
That's JSON. If you want a table, kind of like what you would use in Excel, simply print the df.
Result:
[ Symbol Security SEC filings GICS Sector \
0 MMM 3M Company reports Industrials
1 ABT Abbott Laboratories reports Health Care
2 ABBV AbbVie Inc. reports Health Care
3 ABMD Abiomed reports Health Care
4 ACN Accenture reports Information Technology
.. ... ... ... ...
500 YUM Yum! Brands Inc reports Consumer Discretionary
501 ZBRA Zebra Technologies reports Information Technology
502 ZBH Zimmer Biomet reports Health Care
503 ZION Zions Bancorp reports Financials
504 ZTS Zoetis reports Health Care
GICS Sub-Industry Headquarters Location \
0 Industrial Conglomerates St. Paul, Minnesota
1 Health Care Equipment North Chicago, Illinois
2 Pharmaceuticals North Chicago, Illinois
3 Health Care Equipment Danvers, Massachusetts
4 IT Consulting & Other Services Dublin, Ireland
.. ... ...
500 Restaurants Louisville, Kentucky
501 Electronic Equipment & Instruments Lincolnshire, Illinois
502 Health Care Equipment Warsaw, Indiana
503 Regional Banks Salt Lake City, Utah
504 Pharmaceuticals Parsippany, New Jersey
Date first added CIK Founded
0 1976-08-09 66740 1902
1 1964-03-31 1800 1888
2 2012-12-31 1551152 2013 (1888)
3 2018-05-31 815094 1981
4 2011-07-06 1467373 1989
.. ... ... ...
500 1997-10-06 1041061 1997
501 2019-12-23 877212 1969
502 2001-08-07 1136869 1927
503 2001-06-22 109380 1873
504 2013-06-21 1555280 1952
[505 rows x 9 columns]]
Alternatively, you can export the dataframe to a CSV file. Note that read_html returns a list of dataframes, so index into it first:
df[0].to_csv('constituents.csv')
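Since the task asks for a list named sp500_tickers, you can also pull the Symbol column straight out of the first table. A minimal sketch, assuming the column is labelled Symbol as in the output above:
import pandas as pd
# read_html returns a list of DataFrames; the constituents table is the first one
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
sp500_tickers = tables[0]["Symbol"].tolist()
print(len(sp500_tickers))  # 505 as of 2/24/2021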

Related

Python web scraper will not work on deeply nested tags

This is my second week of coding in Python. I wanted to write a scraper that would return a location and its phone number for all locations. The scraper is incomplete and I have tried a few versions but they all return either an empty list [] or an error.
import requests
from bs4 import BeautifulSoup
webpage_response = requests.get('https://www.orangetheory.com/en-us/locations/')
webpage = webpage_response.content
soup = BeautifulSoup(webpage, "html.parser")
soup.find_all(attrs={'class': 'aria-label'})
To get info about all fitness clubs in the USA from that page, you can use the following example:
import requests
import pandas as pd
url = "https://api.orangetheory.co/partners/v2/studios?country=United%20States&sort=studioName"
data = requests.get(url).json()
df = pd.DataFrame([d[0] for d in data["data"]])
df = pd.concat([df, df.pop("studioLocation").apply(pd.Series)], axis=1)
print(df)
df.to_csv("data.csv", index=False)
Prints:
studioId studioUUId mboStudioId studioName studioNumber description studioStatus openDate reOpenDate taxRate logoUrl contactEmail timeZone environment studioProfiles physicalAddress physicalCity physicalState physicalPostalCode physicalRegion physicalCountryId physicalCountry phoneNumber latitude longitude
0 2266 f627d35c-9e2b-452a-8017-bfbcccff5a4d 610952.0 14th Street, DC 0943 The science of excess post-exercise oxygen consumption(EPOC) takes your results to new heights in this exciting group fitness concept. You will feel new energy and see amazing results with only 2-4 workouts per week. Each 60-minute class is broken into intervals of high-energy cardiovascular training and strength training. Use a variety of equipment including treadmills, rowing machines, suspension training, and free weights to tone your body and gain energy throughout the day. Exciting and inspiring group classes motivate you to beat plateaus and stick to your goals. Pay-as-you-go or get deep discounts with customized packages.\r\n\r\nThe best part of the Orange Experience is the results. You can burn calories for up to 38 hours after your workout! Active 2017-09-02 00:00:00 2020-06-27 00:00:00 0.000000 https://clients.mindbodyonline.com/studios/OrangetheoryFitnessWashingtonDC0943/logo_mobile.png?imageversion=1513008044 studiomanager0943#orangetheoryfitness.com America/New_York PROD {'isWeb': 1, 'introCapacity': 1, 'isCrm': 1} 1925 14th Street NW Suite C Washington District of Columbia 20009 DC-01 1 United States 2028691700 38.91647339 -77.03197479
1 47914 01ddd24d-58bf-4959-bcb2-34587d6e48fc 660917.0 2021 Virtual Summit 10001 None Coming Soon None None 0.000000 None studiomanager10001#orangetheoryfitness.com America/New_York PROD {'isWeb': 0, 'introCapacity': 0, 'isCrm': 0} * * * * NV-01 1 United States -81.66339500 -15.58054000
2 2964 9fd74853-4bad-4f1d-a9c7-fcbf27eb1651 576312.0 Abilene, TX 0862 None Active 2018-09-22 00:00:00 2020-05-18 00:00:00 0.000000 None studiomanager0862#orangetheoryfitness.com America/Chicago PROD {'isWeb': 1, 'introCapacity': 1, 'isCrm': 1} 3950 Catclaw Drive Abilene Texas 79606 TX-06 1 United States 3254006191 32.40399933 -99.77462006
3 3139 2a5a5bc7-ea4a-4a2a-b166-56ce5e6ee7e2 415638.0 Acworth 1188 None Active 2019-04-17 00:00:00 2020-05-17 00:00:00 0.000000 None studiomanager1188#orangetheoryfitness.com America/New_York PROD {'isWeb': 1, 'introCapacity': 1, 'isCrm': 1} 4391 Acworth Dallas Rd NW Suite 212 Acworth Georgia 30101 GA-01 1 United States 7706748722 34.05842590 -84.72319794
...
and saves data.csv.
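If you only need the location and phone number that the question was after, you can keep just those columns from the dataframe. A small sketch, assuming the column names shown in the output above:
# keep only the name, address and phone fields from the full studio dump
cols = ["studioName", "physicalAddress", "physicalCity", "physicalState", "phoneNumber"]
print(df[cols])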

How to extract only the n-nth HTML title tag in a sequence in Python with BeautifulSoup?

I'm trying to extract data from a Wikipedia table (https://en.wikipedia.org/wiki/NBA_Most_Valuable_Player_Award) about the MVP winners over NBA history.
This is my code:
import requests
from bs4 import BeautifulSoup
wik_req = requests.get("https://en.wikipedia.org/wiki/NBA_Most_Valuable_Player_Award")
wik_webpage = wik_req.content
soup = BeautifulSoup(wik_webpage, "html.parser")
my_table = soup('table', {"class":"wikitable plainrowheaders sortable"})[0].find_all('a')
print(my_table)
for x in my_table:
    test = x.get("title")
    print(test)
However, this code prints all HTML title tags of the table as in the following (short version):
'1955–56 NBA season
Bob Pettit
Power Forward (basketball)
United States
St. Louis Hawks
1956–57 NBA season
Bob Cousy
Point guard
Boston Celtics'
Eventually, I want to create a pandas dataframe in which I store all the season years in a column, all the player years in a column, and so on and so forth. What code does the trick to only print one of the HTML tag titles (e.g. only the NBA season years)? I can then store those into a column to set up my dataframe and do the same with player, position, nationality and team.
All you should need for that dataframe is:
import pandas as pd
url = "https://en.wikipedia.org/wiki/NBA_Most_Valuable_Player_Award"
df=pd.read_html(url)[5]
Output:
print(df)
Season Player ... Nationality Team
0 1955–56 Bob Pettit* ... United States St. Louis Hawks
1 1956–57 Bob Cousy* ... United States Boston Celtics
2 1957–58 Bill Russell* ... United States Boston Celtics (2)
3 1958–59 Bob Pettit* (2) ... United States St. Louis Hawks (2)
4 1959–60 Wilt Chamberlain* ... United States Philadelphia Warriors
.. ... ... ... ... ...
59 2014–15 Stephen Curry^ ... United States Golden State Warriors (2)
60 2015–16 Stephen Curry^ (2) ... United States Golden State Warriors (3)
61 2016–17 Russell Westbrook^ ... United States Oklahoma City Thunder (2)
62 2017–18 James Harden^ ... United States Houston Rockets (4)
63 2018–19 Giannis Antetokounmpo^ ... Greece Milwaukee Bucks (4)
[64 rows x 5 columns]
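From that dataframe you can then take any single column, for example just the season years. A minimal sketch, assuming the column is labelled "Season" as printed above:
seasons = df["Season"].tolist()
print(seasons[:3])  # e.g. ['1955–56', '1956–57', '1957–58']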
If you really want to stick with BeautifulSoup, here's an example to get you started:
my_table = soup('table', {"class":"wikitable plainrowheaders sortable"})[0]
season_col = []
for row in my_table.find_all('tr')[1:]:
    season = row.findChildren(recursive=False)[0]
    season_col.append(season.text.strip())
I expect there may be some differences between columns, but as you indicated you want to get familiar with BeautifulSoup, that's for you to explore :)
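As a starting point for that exploration, the same loop can collect every cell of each row; a sketch, left without fixed column names because rowspans in the wikitable may make some rows shorter:
import pandas as pd
rows = []
for row in my_table.find_all('tr')[1:]:
    # grab the text of every direct child cell (th/td) in the row
    rows.append([c.text.strip() for c in row.findChildren(recursive=False)])
mvp_df = pd.DataFrame(rows)  # inspect the shape before assigning column names
print(mvp_df.head())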

how to scrape Wikipedia tables with Python

I want to extract a table; the URL is https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia
My code is not giving any data. How can I get it?
Code:
import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
ta=soup.find_all('table',class_="wikitable sortable jquery-tablesorter")
print(ta)
If I'm pulling a table and see <table> tags, I would always try pandas .read_html() first. It'll do the iterating over rows for you. Most of the time you can get exactly what you need, or at the very least only have to do some minor manipulating of the dataframe. In this case, it gives you the full table nicely:
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
table = pd.read_html(url)[1]
Output:
print (table.to_string())
0 1 2 3 4 5
0 Name Industry Sector Headquarters Founded Notes
1 Airfast Indonesia Consumer services Airlines Tangerang 1971 Private airline
2 Angkasa Pura Industrials Transportation services Jakarta 1962 State-owned airports
3 Astra International Conglomerates - Jakarta 1957 Automotive, financials, industrials, technology
4 Bank Central Asia Financials Banks Jakarta 1957 Bank
5 Bank Danamon Financials Banks Jakarta 1956 Bank
6 Bank Mandiri Financials Banks Jakarta 1998 Bank
7 Bank Negara Indonesia Financials Banks Jakarta 1946 Bank
8 Bank Rakyat Indonesia Financials Banks Jakarta 1895 Micro-finance bank
9 Bumi Resources Basic materials General mining Jakarta 1973 Mining
10 Djarum Consumer goods Tobacco Kudus and Jakarta 1951 Tobacco
11 Dragon Computer & Communication Technology Computer hardware Jakarta 1980 Computer hardware
12 Elex Media Komputindo Consumer services Publishing Jakarta 1985 Publisher
13 Femina Consumer services Media Jakarta 1972 Weekly magazine
14 Garuda Indonesia Consumer services Travel & leisure Tangerang 1949 State-owned airline
15 Gudang Garam Consumer goods Tobacco Kediri 1958 Tobacco
16 Gunung Agung Consumer services Specialty retailers Jakarta 1953 Bookstores
17 Indocement Tunggal Prakarsa Industrials Building materials & fixtures Jakarta 1985 Cement, part of HeidelbergCement (Germany)
18 Indofood Consumer goods Food products Jakarta 1968 Food production
19 Indonesian Aerospace Industrials Aerospace Bandung 1976 State-owned aircraft design
20 Indonesian Bureau of Logistics Consumer goods Food products Jakarta 1967 Food distribution
21 Indosat Telecommunications Fixed line telecommunications Jakarta 1967 Telecommunications network
22 Infomedia Nusantara Consumer services Publishing Jakarta 1975 Directory publisher
23 Jalur Nugraha Ekakurir (JNE) Industrials Delivery services Jakarta 1990 Express logistics
24 Kalbe Farma Health care Pharmaceuticals Jakarta 1966 Pharmaceuticals
25 Kereta Api Indonesia Industrials Railroads Bandung 1945 State-owned railway
26 Kimia Farma Health care Pharmaceuticals Jakarta 1971 State-owned pharma
27 Kompas Gramedia Group Consumer services Media agencies Jakarta 1965 Media holding
28 Krakatau Steel Basic materials Iron & steel Cilegon 1970 State-owned steel
29 Lion Air Consumer services Airlines Jakarta 2000 Low-cost airline
30 Lippo Group Financials Real estate holding & development Jakarta 1950 Development
31 Matahari Consumer services Broadline retailers Tangerang 1982 Department stores
32 MedcoEnergi Oil & gas Exploration & production Jakarta 1980 Energy, oil and gas
33 Media Nusantara Citra Consumer services Broadcasting & entertainment Jakarta 1997 Media
34 Panin Sekuritas Financials Investment services Jakarta 1989 Broker
35 Pegadaian Financials Consumer finance Jakarta 1901 State-owned financial services
36 Pelni Industrials Marine transportation Jakarta 1952 Shipping
37 Pos Indonesia Industrials Delivery services Bandung 1995 State-owned postal service
38 Pertamina Oil & gas Integrated oil & gas Jakarta 1957 State-owned oil and natural gas
39 Perusahaan Gas Negara Oil & gas Exploration & production Jakarta 1965 Gas
40 Perusahaan Gas Negara Utilities Gas distribution Jakarta 1965 State-owned natural gas transportation
41 Perusahaan Listrik Negara Utilities Conventional electricity Jakarta 1945 State-owned electrical distribution
42 Phillip Securities Indonesia, PT Financials Investment services Jakarta 1989 Financial services
43 Pindad Industrials Defense Bandung 1808 State-owned defense
44 PT Lapindo Brantas Oil & gas Exploration & production Jakarta 1996 Oil and gas
45 PT Metro Supermarket Realty Tbk Consumer services Food retailers & wholesalers Jakarta 1955 Supermarkets
46 Salim Group Conglomerates - Jakarta 1972 Industrials, financials, consumer goods
47 Sampoerna Consumer goods Tobacco Surabaya 1913 Tobacco
48 Semen Indonesia Industrials Building materials & fixtures Gresik 1957 Cement
49 Susi Air Consumer services Airlines Pangandaran 2004 Charter airline
50 Telkom Indonesia Telecommunications Fixed line telecommunications Bandung 1856 Telecommunication services
51 Telkomsel Telecommunications Mobile telecommunications Jakarta 1995 Mobile network, part of Telkom Indonesia
52 Trans Corp Conglomerates - Jakarta 2006 Media, consumer services, real estate, part of...
53 Unilever Indonesia Consumer goods Personal products Jakarta 1933 Personal care products, part of Unilever (Neth...
54 United Tractors Industrials Commercial vehicles & trucks Jakarta 1972 Heavy equipment
55 Waskita Industrials Heavy construction Jakarta 1961 State-owned construction
import requests
from bs4 import BeautifulSoup as bs
URL = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(URL).text
soup = bs(html, 'html.parser')
ta=soup.find_all('table',{'class':'wikitable'})
print(ta)
You can search for the table by class name the old way; it still seems to work.
Fixes:
Use URL instead of url in your code (line 4)
Use class wikitable
Optimized your code a little
Hence:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia")
soup = BeautifulSoup(page.content, 'html.parser')
ta = soup.find_all('table',class_="wikitable")
print(ta)
OUTPUT:
[<table class="wikitable sortable">
<tbody><tr>
<th>Rank
</th>
<th>Image
</th>
<th>Name
</th>
<th>2016 Revenues (USD $M)
</th>
<th>Employees
</th>
<th>Notes
.
.
.
Maybe it's not what you are looking for, but you can try this one.
import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
for data in soup.find_all('table', {"class":"wikitable"}):
    for td in data.find_all('td'):
        for link in td.find_all('a'):
            print(link.text)
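Instead of printing, you could also collect the link texts into a list for later use; a small sketch along the same lines:
names = []
for data in soup.find_all('table', {"class": "wikitable"}):
    for td in data.find_all('td'):
        for link in td.find_all('a'):
            names.append(link.text)
print(names)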
Try the below:
import requests
from bs4 import BeautifulSoup as bs
URL = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(URL).text
soup = bs(html, 'html.parser')
ta=soup.find("table",{"class":"wikitable sortable"})
print(ta)
To get all the tables:
ta=soup.find_all("table",{"class":"wikitable sortable"})
If you want to parse table data, you can do this using pandas. It is very efficient if you want to manipulate the table data, and you can navigate the table using a pandas DataFrame():
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
table = pd.read_html(url,header=0)
print(table[1])
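With header=0 the first row of the table becomes the column names, so you can then pull individual columns directly. A minimal sketch, assuming the "Name" column shown in the output above:
companies = table[1]["Name"].tolist()
print(companies[:3])  # e.g. ['Airfast Indonesia', 'Angkasa Pura', 'Astra International']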

Inserting a header row for pandas dataframe

I have just started python and am trying to rewrite one of my perl scripts in python. Essentially, I had a long script to convert a csv to json.
I've tried to import my csv into a pandas dataframe, and wanted to insert a header row at the top, since my csv lacks that.
Code:
import pandas
db=pandas.read_csv("netmedsdb.csv",header=None)
db
Output:
0 1 2 3
0 3M CAVILON NO STING BARRIER FILM SPRAY 28ML OTC 0 Rs.880.00 3M INDIA LTD
1 BACTI BAR SOAP 75GM OTC Rs.98.00 6TH SKIN PHARMACEUTICALS PVT LTD
2 KWIKNIC MINT FLAVOUR 4MG CHEW GUM TABLET 30'S NICOTINE Rs.180.00 A S V LABORATORIES INDIA PVT LTD
3 RIFAGO 550MG TABLET 10'S RIFAXIMIN 550MG Rs.298.00 AAREEN HEALTHCARE
4 999 OIL 60ML AYURVEDIC MEDICINE Rs.120.00 AAKASH PHARMACEUTICALS
5 AKASH SOAP 75GM AYURVEDIC PRODUCT Rs.80.00 AAKASH PHARMACEUTICALS
6 GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
7 GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
8 RHUNS OIL 30ML AYURVEDIC Rs.50.00 AAKASH PHARMACEUTICALS
9 VILLO CAPSULE 10'S AYURVEDIC MEDICINE Rs.70.00 AAKASH PHARMACEUTICALS
10 VITAWIN FORTE CAPSULE 10'S AYURVEDIC MEDICINE Rs.150.00 AAKASH PHARMACEUTICALS
I wrote the following code to insert the first element at row 0, column 0:
db.insert(loc=0,column='0',value='Brand')
db
Output:
0 0 1 2 3
0 Brand 3M CAVILON NO STING BARRIER FILM SPRAY 28ML OTC 0 Rs.880.00 3M INDIA LTD
1 Brand BACTI BAR SOAP 75GM OTC Rs.98.00 6TH SKIN PHARMACEUTICALS PVT LTD
2 Brand KWIKNIC MINT FLAVOUR 4MG CHEW GUM TABLET 30'S NICOTINE Rs.180.00 A S V LABORATORIES INDIA PVT LTD
3 Brand RIFAGO 550MG TABLET 10'S RIFAXIMIN 550MG Rs.298.00 AAREEN HEALTHCARE
4 Brand 999 OIL 60ML AYURVEDIC MEDICINE Rs.120.00 AAKASH PHARMACEUTICALS
5 Brand AKASH SOAP 75GM AYURVEDIC PRODUCT Rs.80.00 AAKASH PHARMACEUTICALS
6 Brand GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
7 Brand GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
8 Brand RHUNS OIL 30ML AYURVEDIC Rs.50.00 AAKASH PHARMACEUTICALS
9 Brand VILLO CAPSULE 10'S AYURVEDIC MEDICINE Rs.70.00 AAKASH PHARMACEUTICALS
10 Brand VITAWIN FORTE CAPSULE 10'S AYURVEDIC MEDICINE Rs.150.00 AAKASH PHARMACEUTICALS
But unfortunately I got the word "Brand" inserted at column 0 in all rows.
I'm trying to add the header columns "Brand", "Generic", "Price", "Company".
You only need the names parameter in read_csv:
import pandas as pd
from io import StringIO
temp = u"""a,b,10,d
e,f,45,r
"""
# after testing, replace StringIO(temp) with 'netmedsdb.csv'
df = pd.read_csv(StringIO(temp), names=["Brand", "Generic", "Price", "Company"])
print(df)
Brand Generic Price Company
0 a b 10 d
1 e f 45 r
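An alternative, since the CSV is already loaded with header=None, is to assign the column names to the existing dataframe afterwards; a small sketch:
import pandas as pd
db = pd.read_csv("netmedsdb.csv", header=None)
db.columns = ["Brand", "Generic", "Price", "Company"]
print(db.head())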

Creating a corpus from different JSON files

I would like to create a corpus composed by the body of different articles stored in a JSON format. They are in different files named after the year, for example:
with open('Scot_2005.json') as f:
data = [json.loads(line) for line in f]
corresponds to a newspaper, the Scotsman, for the year 2005. Moreover, the rest of the files for this newspaper are named APJ_2006 ... APJ_2015. Also, I have another newspaper, the Scottish Daily Mail, that goes only from the years 2014-2015: SDM_2014, SDM_2015. I would like to create a common list with the body of all these articles:
doc_set = [d['body'] for d in data]
My problem is looping the first part of the code that I posted so that data corresponds to all articles rather than just the ones from a given newspaper in a given year. Any ideas of how to accomplish this task? In my attempt I tried using pandas, like so:
for i in range(2005, 2016):
    df = pandas.DataFrame([json.loads(l) for l in open('Scot_%d.json' % i)])
doc_set = df.body
The problems with this method seem to be that it does not append all the years, and I am not sure how to include other newspapers with time intervals other than 2005-15. The outcome of this method looks like:
date
2015-12-31 The Institute of Directors (IoD) has added its...
2015-12-31 It is startling to see how much the Holyrood l...
2015-12-31 A hike in interest rates in the new year will ...
2015-12-31 The First Minister has resolved to make 2016 a...
2015-12-30 The Scottish Government announced yesterday th...
2015-12-30 The Footsie closed lower amid falling oil pric...
2015-12-28 BEFORE we start the guessing game for 2016, a ...
2015-12-27 AS WE ushered in 2015, few would have predicte...
2015-12-23 No matter how hard Derek McInnes and his Aberd...
2015-12-21 THE HEAD of a Scottish Government task force s...
2015-12-17 A Scottish local authority has fought off a le...
2015-12-17 Markets lifted after the Federal Reserve hiked...
2015-12-17 Significant increases in UK quotas for fish in...
2015-12-17 WAR of words with Donald Trump suggests its ti...
2015-12-16 SCOTLAND'S national performance companies have...
2015-12-15 Markets jumped ahead of what investors expect ...
2015-12-14 Political uncertainty in back seat as transpor...
2015-12-11 The International Monetary Fund (IMF) has warn...
2015-12-08 Scotland has a "spring in its step" with the j...
2015-12-07 London's leading share index struggled for dir...
2015-12-03 REDUCING carbon is just the start of it, write...
2015-11-26 One of the country's most prized salmon rivers...
2015-11-23 Tax and legislative changes undermine strong f...
2015-11-23 A second House of Lords committee has called f...
2015-11-14 At first glance, Scotland's economic performan...
2015-11-13 THE United States has long been viewed as the ...
2015-11-12 IT IS vital for a new governance group to rest...
2015-11-12 Former SSE chief Ian Marchant has criticised r...
2015-11-11 Telecoms firm TalkTalk said it will take a hit...
2015-11-09 Improvements to consumer rights legislation ma...
...
2015-02-25 Traders baulked at an assault on the 7,000 lev...
2015-02-24 BRITISH military personnel are to be deployed ...
2015-02-20 DAVID Cameron has announced a £859 million inv...
2015-02-16 Falling oil prices and slowing inflation have ...
2015-02-14 DEFENCE spending cuts and falling oil prices h...
2015-02-14 Brent crude rallied to a 2015 high and helped ...
2015-02-12 THE HOUSING markets in Scotland and Northern I...
2015-02-10 INVESTMENT in Scotland's commercial property m...
2015-02-09 Investors took flight after Greece's new gover...
2015-02-01 Experts say large numbers are delaying decisio...
2015-01-29 MORE than 300 jobs are at risk after Tesco sai...
2015-01-27 THE Three Bears have hit out at the Rangers bo...
2015-01-21 GEORGE Osborne has challenged the right of SNP...
2015-01-19 Employment figures this week should show Briti...
2015-01-19 Why haven't petrol pump prices fallen as fast ...
2015-01-18 Without an agreement on immediate action, the...
2015-01-17 A SECOND independence referendum could be trig...
2015-01-14 THE RETAILER, which like its rivals has come u...
2015-01-14 HOUSE prices in Scotland rose by more than 4 p...
2015-01-13 HOUSE builder Taylor Wimpey is preparing for a...
2015-01-13 Supermarket group Sainsbury's today said it wo...
2015-01-13 INFLATION has tumbled to its lowest level on r...
2015-01-12 BUSINESSES are bullish about their ­prospects ...
2015-01-11 FOR decades, oil has dripped through our natio...
2015-01-09 Shares in the housebuilding sector fell heavil...
2015-01-08 THE Bank of England is expected to leave inter...
2015-01-05 COMPANIES in Scotland are more optimistic abou...
2015-01-04 UK is doing OK, but uncertainty looms on mid-y...
2015-01-02 The London market began the new year in a subd...
2015-01-02 The famous election mantra of Bill Clinton's c...
Name: body, dtype: object
Assuming you have a file list:
file_name_list = ( 'Scot_2005.json', 'APJ_2006.json' )
You can append to a list like this:
data = list()
for file_name in file_name_list:
    with open(file_name, 'r') as json_file:
        for line in json_file:
            data.append(json.loads(line))
If you want to create the file_name_list programmatically, you can use the glob library.
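For example, a sketch that globs every JSON file in the folder and then builds the corpus of article bodies, assuming each line of each file is one JSON article with a 'body' field as in the question:
import glob
import json
data = []
# picks up Scot_*.json, APJ_*.json and SDM_*.json in one go
for file_name in glob.glob('*.json'):
    with open(file_name, 'r') as json_file:
        for line in json_file:
            data.append(json.loads(line))
doc_set = [d['body'] for d in data]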
