ARIMA exogenous variable out of sample - python

fit = statsmodels.api.tsa.ARIMA(efRates[0], (1,1,1), exog=ueRate).fit(transparams=False)
predicts = fit.predict(start=len(efRates[0]), end = len(efRates[0])+11, exog=ueRate, typ = 'levels')
Generates a
File "C:\Users\Saul Ramirez\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\tsa\arima_model.py", line 720, in predict
if self.k_exog == 1 and exog.ndim == 1:
AttributeError: 'bool' object has no attribute 'ndim'
Some info: efRates[0] and ueRate are lists of the same length.
efRates[0]
[0.030052056971642007,
0.03917330288542586,
0.02828475062426216,
0.03644101079605235,
0.03378605359919436,
0.02743587918046455,
0.03342745492501596,
0.026205917483282503,
0.030503758568976337,
0.024550760529053202,
0.03261189266424876,
0.03506521240864593,
0.027338276601998696,
0.053725765854704746,
0.02676967429100413,
0.03442977438269886,
0.033314687425925964,
0.027406120117972988,
0.037085495711527916,
0.021131004053371122,
0.03342530957311805,
0.02011467948214261,
0.03674645825546184,
0.030766279328527657,
0.022010347634637235,
0.048441932020847935,
0.055182794314502556,
0.037653187998947804,
0.054329400023020905,
0.030487014172364307,
0.04828703019272537,
0.029364609341652963,
0.04420916320116292,
0.0245732204143899,
0.04007219462688283,
0.030088483595491378,
0.04503547974992547,
0.050414257448672777,
0.03650945820093438,
0.0271939590858418,
0.043825558271225154,
0.02887263694287208,
0.034395655516300985,
0.033476222069816444,
0.02364138126589003,
0.034956784469719566,
0.025488157761323762,
0.03284135171594629,
0.0352266773873871,
0.02578522887525815,
0.030801158226067212,
0.017836011389627614,
0.03237266466197845,
0.020781381627205192,
0.03507981277516531,
0.030619701683938114,
0.0200645972051283,
0.02340543468851082,
0.022232375406303732,
0.031450255120488005,
0.030807264010862326,
0.02520300632649576,
0.02683432106844716,
0.01719544921035768,
0.022245308176032028,
0.015787396423808154,
0.02236691164709978,
0.022948859956318242,
0.018302596298743336,
0.02356268219722402,
0.020514907102090335,
0.029322000183361653,
0.030253386469667742,
0.02389996663574461,
0.026350732450672106,
0.018634569853141162,
0.02993859530565429,
0.01762489169698181,
0.028369112029450066,
0.024207088908217232,
0.019513438046869554,
0.02149236584384482,
0.020792834468107983,
0.0252767276304043,
0.025754940371044845,
0.01633653635317383,
0.02562719118582408,
0.01718720874173012,
0.02915438356543398,
0.017238835380189263,
0.028044663751279383,
0.027504015027686957,
0.020989801458819447,
0.025215885766374995,
0.02422123160263125,
0.03253702270430853,
0.02095284431753602,
0.03241141468118923,
0.018667854534336364,
0.03997670839216877,
0.022116655885610726,
0.030336876645878957,
0.03418820217137176,
0.018663800522544426,
0.02623414798030232,
0.020524065760586897]
ueRate
[4.9,
5,
5,
5,
4.9,
4.7,
4.8,
4.7,
4.7,
4.6,
4.6,
4.7,
4.7,
4.5,
4.4,
4.5,
4.4,
4.6,
4.5,
4.4,
4.5,
4.4,
4.6,
4.7,
4.6,
4.7,
4.7,
4.7,
5,
5,
4.9,
5.1,
5,
5.4,
5.6,
5.8,
6.1,
6.1,
6.5,
6.8,
7.3,
7.8,
8.3,
8.7,
9,
9.4,
9.5,
9.5,
9.6,
9.8,
10,
9.9,
9.9,
9.7,
9.8,
9.9,
9.9,
9.6,
9.4,
9.5,
9.5,
9.5,
9.5,
9.8,
9.4,
9.1,
9,
9,
9.1,
9,
9.1,
9,
9,
9,
8.8,
8.6,
8.5,
8.2,
8.3,
8.2,
8.2,
8.2,
8.2,
8.2,
8.1,
7.8,
7.8,
7.8,
7.9,
7.9,
7.7,
7.5,
7.5,
7.5,
7.5,
7.3,
7.2,
7.2,
7.2,
7,
6.7,
6.6,
6.7,
6.7,
6.3,
6.3]

Related

Best way to split a dict based on key

I have a dict that looks like this (shortened)
{'18/09/2022-morning': [5.4, 6.0, 6.5, 6.7, 6.9, 7.9, 8.5, 7.5, 7.9, 7.8, 7.6, 6.8],
'18/09/2022-night': [6.4, 5.7, 4.8, 5.4, 4.7, 4.3],
'19/09/2022-morning': [3.8],
'19/09/2022-night': [4.1, 4.4, 4.3, 3.8, 3.5, 2.8]}
What is the best way to split it into two different dictionaries based on morning/night?
I can't think of an easy way to this! Example of desired output:
dic1 = {'18/09/2022-morning': [5.4, 6.0, 6.5, 6.7, 6.9, 7.9, 8.5, 7.5, 7.9, 7.8, 7.6, 6.8], '19/09/2022-morning': [3.8]}
dic2 = {'18/09/2022-night': [6.4, 5.7, 4.8, 5.4, 4.7, 4.3], '19/09/2022-night': [4.1, 4.4, 4.3, 3.8, 3.5, 2.8]}
mornings = {}
nights = {}
for k, v in d.items():
if k.endswith("morning"):
mornings[k] = v
else:
nights[k] = v

Fit gamma distribution with fixed mean with scipy?

scipy.stats.rv_continuous.fit, allows you to fix parameters when fitting a distribution, but it's dependent on scipy's choice of parametrization. For the gamma distribution is uses the k, theta (shape, scale) parametrization, so it would be easy to fit while holding theta constant, for example. I want to fit to a data set where I know the mean, but the observed mean might vary due to sampling error. This would be easy if scipy used the parametrization that uses mu = k*theta instead of theta. Is there a way to make scipy do this? And if not, is there another library that can?
Here's some example code with a data set has an observed mean of 9.952, but I know the actual mean of the underlying distribution is 11:
from scipy.stats import gamma
observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6,
8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9,
6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2,
7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0,
3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1,
19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2,
1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8,
9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0,
4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4,
4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]
shape, _, scale = gamma.fit(observations, floc = 0)
print(shape*scale)
and this gives
9.952
but I would like a fit such that shape*scale = 11.0
The fit method of the SciPy distributions provides the maximum likelihood estimate of the parameters. You are correct that it only provides for fitting the shape, location and scale. (Actually, you said shape and scale, but SciPy also includes a location parameter. Sometimes this is called the three parameter gamma distribution.)
For most of the distributions in SciPy, the fit method uses a numerical optimizer to minimize the negative log-likelihood, as defined in the nnlf method. Instead of using the fit method, you could do this yourself with just a couple lines of code. This allows you to create an objective function with just one parameter, say the shape k, and within that function, set theta = mean/k, where mean is the desired mean, and call gamma.nnlf to evaluate the negative log-likelihood. Here's one way you could do it:
import numpy as np
from scipy.stats import gamma
from scipy.optimize import fmin
def nll(k, mean, x):
return gamma.nnlf(np.array([k[0], 0, mean/k[0]]), x)
observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6,
8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9,
6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2,
7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0,
3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1,
19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2,
1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8,
9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0,
4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4,
4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]
# This is the desired mean of the distribution.
mean = 11
# Initial guess for the shape parameter.
k0 = 3.0
opt = fmin(nll, k0, args=(mean, np.array(observations)),
xtol=1e-11, disp=False)
k_opt = opt[0]
theta_opt = mean / k_opt
print(f"k_opt: {k_opt:9.7f}")
print(f"theta_opt: {theta_opt:9.7f}")
This script prints
k_opt: 1.9712604
theta_opt: 5.5801861
Alternatively, one can modify the first order conditions for the extremum of the log-likelihood shown in wikipedia so that there is just one parameter, k. Then the condition for the extreme value can be implemented as a scalar equation whose root can be found with, say, scipy.optimize.fsolve. The following is a variation of the above script that uses this technique.
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve
def first_order_eq(k, mean, x):
mean_logx = np.mean(np.log(x))
return (np.log(k) - digamma(k) + mean_logx - np.mean(x)/mean
- np.log(mean) + 1)
observations = [17.6, 24.9, 3.9, 17.6, 11.8, 10.4, 4.1, 11.7, 5.7, 1.6,
8.6, 12.9, 5.7, 8.0, 7.4, 1.2, 11.3, 10.4, 1.0, 1.9,
6.0, 9.3, 13.3, 5.4, 9.1, 4.0, 12.8, 11.1, 23.1, 4.2,
7.9, 11.1, 10.0, 3.4, 27.8, 7.2, 14.9, 2.9, 5.5, 7.0,
3.9, 12.3, 10.6, 22.1, 5.0, 4.1, 21.3, 15.9, 34.5, 8.1,
19.6, 10.8, 13.4, 22.8, 27.6, 6.8, 5.9, 9.0, 7.1, 21.2,
1.0, 14.6, 16.9, 1.0, 6.5, 2.9, 7.1, 14.1, 15.2, 7.8,
9.0, 4.9, 2.1, 9.5, 5.6, 11.1, 7.7, 18.3, 3.8, 11.0,
4.2, 12.5, 8.4, 3.2, 4.0, 3.8, 2.0, 24.7, 24.6, 3.4,
4.3, 3.2, 7.6, 8.3, 14.5, 8.3, 8.4, 14.0, 1.0, 9.0]
# This is the desired mean of the distribution.
mean = 11
# Initial guess for the shape parameter.
k0 = 3
sol = fsolve(first_order_eq, k0, args=(mean, observations),
xtol=1e-11)
k_opt = sol[0]
theta_opt = mean / k_opt
print(f"k_opt: {k_opt:9.7f}")
print(f"theta_opt: {theta_opt:9.7f}")
Output:
k_opt: 1.9712604
theta_opt: 5.5801861

Sum combination of lists by element

I have a nested list, which can be of varying length (each sublist will always contain the same number of elements as the others):
list1=[[4.1,2.9,1.2,4.5,7.9,1.2],[0.7,1.1,2.0,0.4,1.8,2.2],[5.1,4.1,6.5,7.1,2.3,3.6]]
I can find every possible combination of sublists of length n using itertools:
n=2
itertools.combinations(list1,n)
[([4.1, 2.9, 1.2, 4.5, 7.9, 1.2], [0.7, 1.1, 2.0, 0.4, 1.8, 2.2]),
([4.1, 2.9, 1.2, 4.5, 7.9, 1.2], [5.1, 4.1, 6.5, 7.1, 2.3, 3.6]),
([0.7, 1.1, 2.0, 0.4, 1.8, 2.2], [5.1, 4.1, 6.5, 7.1, 2.3, 3.6])]
I would like to sum all lists in each tuple by index. In this example, I would end up with:
[([4.8, 4.0, 3.2, 4.9, 9.7, 3.4],
[9.2, 7.0, 7.7, 6.8, 10.2, 4.8],
[5.8, 5.2, 8.5, 7.5, 4.1, 5.8])]
I have tried:
[sum(x) for x in itertools.combinations(list1,n)]
[sum(x) for x in zip(*itertools.combinations(list1,n))]
Each run into errors.
You can use zip for this:
>>> [tuple(map(sum, zip(*x))) for x in itertools.combinations(list1, n)]
[(4.8, 4.0, 3.2, 4.9, 9.700000000000001, 3.4000000000000004),
(9.2, 7.0, 7.7, 11.6, 10.2, 4.8),
(5.8, 5.199999999999999, 8.5, 7.5, 4.1, 5.800000000000001)]
Try this :
>>> list1=[[4.1,2.9,1.2,4.5,7.9,1.2],[0.7,1.1,2.0,0.4,1.8,2.2],[5.1,4.1,6.5,7.1,2.3,3.6]]
>>> from itertools import combinations as c
>>> list(list(map(sum, zip(*k))) for k in c(list1, 2))
[[4.8, 4.0, 3.2, 4.9, 9.700000000000001, 3.4000000000000004], [9.2, 7.0, 7.7, 11.6, 10.2, 4.8], [5.8, 5.199999999999999, 8.5, 7.5, 4.1, 5.800000000000001]]

How can I convert a dict_keys list to integers

I am trying to find a way of converting a list within dict_keys() to an integer so I can use it as a trigger to send to another system. My code (below) imports a list of 100 words (a txt file with words each on a new line) which belong to 10 categories (e.g. the first 10 words belong to category 1, second 10 words belong to category 2 etc...).
Code:
from numpy.random import choice
from collections import defaultdict
number_of_elements = 10
Words = open('file_location').read().split()
categories = defaultdict(list)
for i in range(len(words)):
categories[i/number_of_elements].append(words[i])
category_labels = categories.keys()
category_labels
Output
dict_keys([0.0, 1.1, 2.0, 3.0, 4.9, 5.0, 0.5, 1.9, 8.0, 9.0, 1.3, 2.7, 3.9, 9.2, 9.4, 7.2, 4.2, 8.6, 5.1, 5.4, 3.3, 1.0, 6.6, 7.4, 7.7, 8.4, 5.8, 9.8, 0.7, 8.8, 2.1, 7.0, 6.4, 4.3, 0.1, 2.5, 3.8, 1.2, 6.9, 7.1, 5.6, 0.4, 5.3, 2.9, 7.3, 3.5, 9.5, 8.2, 2.8, 3.1, 0.9, 2.3, 8.1, 4.0, 6.3, 6.7, 4.5, 0.2, 1.7, 2.2, 8.9, 1.4, 7.6, 9.1, 7.8, 5.5, 4.8, 0.6, 3.2, 2.4, 6.5, 9.9, 9.6, 1.5, 6.0, 3.7, 4.7, 3.4, 5.9, 4.1, 1.6, 6.8, 9.3, 3.6, 8.5, 8.7, 0.3, 0.8, 7.5, 5.2, 2.6, 4.6, 5.7, 7.9, 6.1, 1.8, 8.3, 6.2, 9.7, 4.4])
What I need:
I would like the first number before the point (e.g. if it was 6.7, I just want the 6 as an int).
Thank you in advance for any help and/or advice!
Just convert your keys to integers using a list comprehension; note that there is no need to call .keys() here as iteration over the dictionary directly suffices:
[int(k) for k in categories]
You may want to bucket your values directly into integer categories rather than by floating point values:
categories = defaultdict(list)
for i, word in enumerate(words):
categories[int(i / number_of_elements)].append(word)
I used enumerate() to pair words up with their index, rather than use range() plus indexing back into words.

Adding new line to data for csv in python

I'm trying to scrape data from http://www.hoopsstats.com/basketball/fantasy/nba/opponentstats/16/12/eff/1-1 to create a CSV file using Python 3.5. I've figured out how to do so, but all the data is in the same row when I open the file in excel.
import sys
import requests
from bs4 import BeautifulSoup
import csv
r = requests.get('http://www.hoopsstats.com/basketball/fantasy/nba/opponentstats/16/12/eff/1-1')
soup = BeautifulSoup(r.text, "html.parser")
stats = soup.find_all('table', 'statscontent')
pgFile = open ('C:\\Users\\James\\Documents\\testpoop.csv', 'w')
for table in soup.find_all('table', 'statscontent','a'):
stats = [ stat.text for stat in table.find_all('center') ]
team = [team for team in table.find('a')]
p = (team,stats)
z = str(p)
a = z.replace("]",'')
b = a.replace("'", "")
c = b.replace(")", "") #Only way I knew how to clean up extra characters
d = c.replace("(", "")
e = d.replace("[", "")
print(e) #printing while testing
pgFile.writelines(e)
pgFile.close()
The data comes out nice in the python shell:
Boston, 1, 67, 47.9, 19.6, 5.2, 7.2, 1.8, 0.5, 4.3, 4.1, 4.3, 0.9, 6.8-16.1, .421, 1.6-4.9, .324, 4.4-5.4, .816, 19.7, -6.8
San Antonio, 2, 67, 47.8, 19.7, 5.0, 8.7, 1.9, 0.3, 3.5, 3.3, 4.2, 0.8, 7.4-18.0, .411, 1.5-4.6, .317, 3.4-4.2, .819, 20.7, -2.4
Atlanta, 3, 67, 48.7, 19.2, 5.6, 8.4, 2.3, 0.6, 4.1, 3.7, 4.6, 1.0, 7.1-17.6, .401, 2.0-5.8, .338, 3.2-3.8, .828, 20.8, -5.6
Miami, 4, 67, 49.8, 20.6, 5.2, 8.0, 1.9, 0.3, 3.2, 3.6, 4.3, 0.9, 7.6-18.5, .407, 1.9-5.3, .348, 3.7-4.5, .814, 21.0, 2.1
L.A.Clippers, 5, 66, 48.2, 21.0, 5.7, 8.7, 1.9, 0.2, 4.1, 4.5, 4.6, 1.1, 7.6-18.7, .405, 1.9-5.4, .346, 3.9-4.9, .799, 21.1, -7.0
Toronto, 6, 66, 48.0, 20.5, 5.3, 8.8, 1.7, 0.6, 3.8, 3.7, 4.4, 0.9, 7.4-18.0, .412, 2.1-5.9, .349, 3.6-4.4, .826, 21.6, -4.3
Charlotte, 7, 66, 48.1, 19.3, 6.0, 9.1, 1.6, 0.6, 3.4, 4.1, 5.1, 0.9, 7.1-17.8, .399, 2.0-6.4, .321, 3.0-3.7, .802, 21.7, -4.5
Milwaukee, 8, 68, 48.8, 19.3, 5.4, 9.1, 1.9, 0.3, 4.2, 3.5, 4.6, 0.8, 6.8-15.9, .425, 1.9-6.0, .311, 3.9-5.0, .788, 21.7, 2.1
Utah, 9, 67, 49.3, 21.9, 5.5, 8.1, 2.3, 0.4, 3.7, 3.4, 4.5, 1.0, 7.8-18.3, .424, 2.2-5.7, .382, 4.1-5.3, .787, 22.7, 5.8
Memphis, 10, 67, 48.7, 22.4, 5.1, 8.3, 1.6, 0.4, 3.9, 4.1, 4.3, 0.8, 7.7-17.7, .434, 2.5-7.0, .358, 4.6-5.7, .813, 22.9, -2.0
Detroit, 11, 67, 49.1, 22.3, 5.8, 8.4, 1.6, 0.3, 3.7, 4.2, 4.9, 0.9, 8.4-19.1, .441, 2.0-5.5, .362, 3.5-4.4, .801, 23.2, -0.1
Minnesota, 12, 67, 47.1, 21.9, 5.3, 8.7, 2.0, 0.3, 3.6, 3.9, 4.3, 1.0, 8.1-18.7, .434, 2.2-6.5, .336, 3.5-4.2, .826, 23.3, -2.8
Portland, 13, 68, 47.8, 22.5, 5.1, 8.1, 1.8, 0.5, 3.1, 3.7, 4.1, 1.0, 8.2-18.8, .438, 2.1-5.7, .370, 4.0-5.1, .777, 23.3, -1.0
New York, 14, 68, 47.5, 21.2, 6.0, 8.5, 1.9, 0.2, 3.0, 2.6, 4.9, 1.1, 7.7-18.3, .419, 1.8-5.2, .342, 4.1-5.0, .819, 23.3, 6.4
Houston, 15, 67, 50.9, 21.3, 6.2, 9.8, 2.3, 0.3, 5.0, 4.3, 5.3, 0.9, 7.7-18.4, .417, 2.3-6.7, .351, 3.6-4.4, .809, 23.3, 6.1
Indiana, 16, 67, 49.3, 23.3, 5.9, 8.3, 1.8, 0.4, 4.6, 3.9, 5.0, 0.9, 8.3-18.8, .443, 2.3-5.8, .387, 4.3-5.3, .813, 23.7, 5.4
Chicago, 17, 65, 48.9, 22.2, 6.4, 8.6, 2.1, 0.6, 2.9, 2.8, 5.2, 1.2, 8.2-20.3, .407, 1.8-5.6, .323, 3.9-5.2, .764, 23.8, 4.7
Golden State, 18, 66, 49.3, 24.5, 5.1, 8.4, 2.4, 0.2, 3.7, 4.1, 4.0, 1.2, 9.1-21.3, .427, 2.3-6.6, .350, 4.0-5.0, .802, 23.8, -14.7
Dallas, 19, 67, 49.5, 22.1, 6.0, 8.3, 2.0, 0.4, 3.3, 4.0, 5.1, 0.9, 8.3-18.7, .440, 2.1-6.1, .347, 3.4-4.4, .778, 24.0, 2.0
Washington, 20, 66, 49.5, 23.8, 5.8, 8.2, 2.0, 0.3, 4.4, 3.9, 5.0, 0.9, 8.9-20.1, .444, 2.5-6.4, .398, 3.5-4.1, .851, 24.1, -4.6
Cleveland, 21, 66, 49.3, 22.9, 5.7, 9.1, 1.9, 0.3, 3.5, 3.3, 4.9, 0.8, 8.3-19.4, .428, 2.0-5.5, .360, 4.3-5.1, .837, 24.3, 1.0
Denver, 22, 68, 48.6, 21.8, 5.9, 8.8, 1.9, 0.5, 3.3, 3.8, 4.9, 1.0, 7.8-17.9, .436, 2.4-6.5, .369, 3.9-4.9, .783, 24.5, 5.8
Philadelphia, 23, 67, 48.6, 21.9, 6.0, 8.8, 2.3, 0.5, 4.1, 3.4, 5.0, 0.9, 8.0-17.8, .447, 1.7-4.7, .366, 4.2-5.0, .837, 24.7, 2.8
Oklahoma City, 24, 67, 48.1, 22.6, 6.1, 8.5, 2.1, 0.3, 3.1, 3.8, 5.0, 1.1, 8.2-18.7, .440, 2.4-5.9, .405, 3.8-5.0, .750, 24.8, -10.4
Orlando, 25, 66, 49.6, 22.9, 6.7, 9.2, 1.9, 0.6, 4.3, 3.5, 5.7, 1.0, 8.2-18.5, .444, 2.3-6.1, .385, 4.2-5.2, .794, 25.6, 5.7
Brooklyn, 26, 67, 48.5, 23.0, 5.5, 9.0, 2.4, 0.3, 3.5, 3.2, 4.5, 1.0, 8.6-18.6, .463, 2.6-6.6, .390, 3.3-4.3, .768, 25.8, 3.4
Sacramento, 27, 66, 49.7, 23.7, 5.9, 9.5, 2.3, 0.4, 4.0, 3.6, 4.8, 1.0, 8.6-19.8, .436, 2.6-7.5, .346, 3.9-4.7, .834, 25.9, -0.3
New Orleans, 28, 66, 49.9, 24.3, 5.7, 8.9, 1.6, 0.4, 3.5, 3.6, 4.8, 0.9, 8.7-18.2, .475, 2.6-6.3, .415, 4.4-5.3, .821, 26.9, 0.8
L.A.Lakers, 29, 68, 49.5, 24.5, 6.0, 9.8, 1.9, 0.4, 3.4, 3.3, 4.9, 1.1, 9.3-20.6, .449, 2.3-6.7, .349, 3.6-4.5, .818, 26.9, 4.8
Phoenix, 30, 67, 49.0, 25.3, 5.8, 9.5, 2.3, 0.4, 4.1, 4.0, 4.7, 1.1, 9.2-20.3, .452, 2.6-6.6, .388, 4.4-5.6, .788, 27.0, 7.1
but when opened in excel each value is in it's own cell, but they're all in the first row. I want a new row for each team.
Use csv.writer to write CSV data to a CSV file:
import csv
import requests
from bs4 import BeautifulSoup
r = requests.get('http://www.hoopsstats.com/basketball/fantasy/nba/opponentstats/16/12/eff/1-1')
soup = BeautifulSoup(r.text, "html.parser")
with open("output.csv", "w") as f:
writer = csv.writer(f)
for table in soup.find_all('table', class_='statscontent'):
team = table.find('a').text
stats = [team] + [stat.text for stat in table.find_all('center')]
writer.writerow(stats)
Now, in the output.csv the following content would be written:
Boston,1,67,47.9,19.6,5.2,7.2,1.8,0.5,4.3,4.1,4.3,0.9,6.8-16.1,.421,1.6-4.9,.324,4.4-5.4,.816,19.7,-6.8
San Antonio,2,67,47.8,19.7,5.0,8.7,1.9,0.3,3.5,3.3,4.2,0.8,7.4-18.0,.411,1.5-4.6,.317,3.4-4.2,.819,20.7,-2.4
Atlanta,3,67,48.7,19.2,5.6,8.4,2.3,0.6,4.1,3.7,4.6,1.0,7.1-17.6,.401,2.0-5.8,.338,3.2-3.8,.828,20.8,-5.6
Miami,4,67,49.8,20.6,5.2,8.0,1.9,0.3,3.2,3.6,4.3,0.9,7.6-18.5,.407,1.9-5.3,.348,3.7-4.5,.814,21.0,2.1
L.A.Clippers,5,66,48.2,21.0,5.7,8.7,1.9,0.2,4.1,4.5,4.6,1.1,7.6-18.7,.405,1.9-5.4,.346,3.9-4.9,.799,21.1,-7.0
Toronto,6,66,48.0,20.5,5.3,8.8,1.7,0.6,3.8,3.7,4.4,0.9,7.4-18.0,.412,2.1-5.9,.349,3.6-4.4,.826,21.6,-4.3
Charlotte,7,66,48.1,19.3,6.0,9.1,1.6,0.6,3.4,4.1,5.1,0.9,7.1-17.8,.399,2.0-6.4,.321,3.0-3.7,.802,21.7,-4.5
Milwaukee,8,68,48.8,19.3,5.4,9.1,1.9,0.3,4.2,3.5,4.6,0.8,6.8-15.9,.425,1.9-6.0,.311,3.9-5.0,.788,21.7,2.1
...
Sacramento,27,66,49.7,23.7,5.9,9.5,2.3,0.4,4.0,3.6,4.8,1.0,8.6-19.8,.436,2.6-7.5,.346,3.9-4.7,.834,25.9,-0.3
New Orleans,28,66,49.9,24.3,5.7,8.9,1.6,0.4,3.5,3.6,4.8,0.9,8.7-18.2,.475,2.6-6.3,.415,4.4-5.3,.821,26.9,0.8
L.A.Lakers,29,68,49.5,24.5,6.0,9.8,1.9,0.4,3.4,3.3,4.9,1.1,9.3-20.6,.449,2.3-6.7,.349,3.6-4.5,.818,26.9,4.8
Phoenix,30,67,49.0,25.3,5.8,9.5,2.3,0.4,4.1,4.0,4.7,1.1,9.2-20.3,.452,2.6-6.6,.388,4.4-5.6,.788,27.0,7.1

Categories

Resources