I've recently been trying to use the NBA API to pull shot chart data. I'll link the documentation for the specific function I'm using here.
I keep getting a traceback as follows:
Traceback (most recent call last):
File "nbastatsrecieve2.py", line 27, in <module>
df.to_excel(filename, index=False)
File "C:\Users\*\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\generic.py", line 2023, in to_excel
formatter.write(
File "C:\Users\*\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\io\formats\excel.py", line 730, in write
writer = ExcelWriter(stringify_path(writer), engine=engine)
File "C:\Users\*\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\io\excel\_base.py", line 637, in __new__
raise ValueError(f"No engine for filetype: '{ext}'") from err
ValueError: No engine for filetype: ''
This is all of the code as I currently have it:
from nba_api.stats.endpoints import shotchartdetail
import pandas as pd
import json
print('Player ID?')
playerid = input()
print('File Name?')
filename = input()
response = shotchartdetail.ShotChartDetail(
    team_id=0,
    player_id=playerid
)
content = json.loads(response.get_json())
# transform contents into dataframe
results = content['resultSets'][0]
headers = results['headers']
rows = results['rowSet']
df = pd.DataFrame(rows)
df.columns = headers
# write to excel file
df.to_excel(filename, index=False)
Hoping someone can help because I'm very new to the JSON format.
You are getting this because the filename has no extension. Pandas uses the filename's extension (like xlsx or xls) to pick the right library for that format, unless you give it an ExcelWriter. Just try something like df.to_excel('filename.xlsx', index=False) and see how it goes.
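If you want to keep taking the name from input(), here is a minimal sketch (not from the original post) that appends a default '.xlsx' extension when the user leaves it off; the '.xlsx' default and the openpyxl requirement are assumptions on my part:
import os
# Sketch only: fall back to '.xlsx' when no extension is given (assumed default).
print('File Name?')
filename = input()
root, ext = os.path.splitext(filename)
if not ext:
    filename = root + '.xlsx'  # writing .xlsx requires openpyxl to be installed
df.to_excel(filename, index=False)  # df is the DataFrame built earlier in the script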
I want to write a pandas DataFrame to a CSV file every 10 seconds. The CSV file name includes the current timestamp. Here is part of the code:
import pandas as pd
import time
while True:
    df = pd.read_sql_query('select * from dbo.tbl_tag_values', cnxn)
    t = time.localtime()
    current_time = time.strftime('%Y-%m-%dT%H:%M:%S', t)
    csv_path = f'C:/Users/00_Projects/App/data-{current_time}.csv'
    df.to_csv(csv_path)
    time.sleep(10)
Without the f-string and with a static file name the script works fine, but with the f-string I get this error:
Traceback (most recent call last):
File "c:\Users\00_Projects\App\script.py", line 24, in <module>
df.to_csv(csv_path)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3466, in to_csv
return DataFrameRenderer(formatter).to_csv(
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\format.py", line 1105, in to_csv
csv_formatter.save()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 237, in save
with get_handle(
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\common.py", line 702, in get_handle
handle = open(
OSError: [Errno 22] Invalid argument: 'C:/Users/00_Projects/App/data-2023-01-02T15:33:19.csv'
I read this post, How to use f string in a path location, and tried Path from pathlib, but got the same error.
My OS is Windows.
Thanks for any help!
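No answer is quoted here, but the ':' characters in the timestamp are a likely culprit, since Windows does not allow them in file names. A small sketch of that guess, reusing the path from the question and swapping ':' for '-' in the time part:
import time
# Sketch: a timestamp format without ':' so the path is valid on Windows.
current_time = time.strftime('%Y-%m-%dT%H-%M-%S', time.localtime())
csv_path = f'C:/Users/00_Projects/App/data-{current_time}.csv'
# df.to_csv(csv_path) should then receive a path Windows will accept
print(csv_path)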
I am trying to read an .h5 file in Python.
The file can be found at this link and is called 'vstoxx_data_31032014.h5'. The code I am trying to run is from the book Python for Finance by Yves Hilpisch and goes like this:
import pandas as pd
h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')
futures_data = h5['futures_data'] # VSTOXX futures data
options_data = h5['options_data'] # VSTOXX call option data
h5.close()
I am getting the following error:
h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')
Traceback (most recent call last):
File "<ipython-input-692-dc4e79ec8f8b>", line 1, in <module>
h5 = pd.HDFStore('path.../vstoxx_data_31032014.h5', 'r')
File "C:\Users\Laura\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 466, in __init__
self.open(mode=mode, **kwargs)
File "C:\Users\Laura\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 637, in open
raise IOError(str(e))
OSError: HDF5 error back trace
File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5F.c", line 604, in H5Fopen
unable to open file
File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5Fint.c", line 1085, in H5F_open
unable to read superblock
File "C:\aroot\work\hdf5-1.8.15-patch1\src\H5Fsuper.c", line 277, in H5F_super_read
file signature not found
End of HDF5 error back trace
Unable to open/create file 'path.../vstoxx_data_31032014.h5'
where 'path.../' stands in for my working directory for the purpose of this question.
Does anyone know where this error might be coming from?
In order to open an HDF5 file with the h5py module you can use h5py.File(filename). The documentation can be found here.
import h5py
filename = "vstoxx_data_31032014.h5"
h5 = h5py.File(filename,'r')
futures_data = h5['futures_data'] # VSTOXX futures data
options_data = h5['options_data'] # VSTOXX call option data
h5.close()
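A small variant of the same idea, assuming the file sits in the working directory: opening it in a with block closes the file automatically, and list(h5.keys()) shows which groups or datasets the file actually contains:
import h5py
# Sketch: the context manager closes the file even if an exception is raised.
with h5py.File("vstoxx_data_31032014.h5", "r") as h5:
    print(list(h5.keys()))  # inspect the available groups/datasets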
import os
os.chdir('path of your working directory')  # change this to your working directory
wd = os.getcwd()  # check what the current working directory is
print(wd)
if __name__ == '__main__':
    # import required libraries
    import h5py as h5
    import numpy as np
    import matplotlib.pyplot as plt

    f = h5.File("hdf5 file with its path", "r")
    datasetNames = [n for n in f.keys()]
    for n in datasetNames:
        print(n)
I have a Python script that opens an .xls file, reads its lines into a list, and then does a bunch of other things. When I try to call it from Excel like so:
Sub SampleCall()
    RunPython ("import sov_reformat;sov_reformat.sov_convert()")
End Sub
Here are the first few lines of my script:
# In[1]:
import xlwings as xw
import pandas as pd
import numpy as np
import csv
# In[2]:
def sov_convert():
    wb = xw.Book.caller()
    df2 = pd.read_excel("Full SOV.XLS")
    temp_df = df2
I get an error:
Error
Traceback (most recent call last):
File "", line 1, in
File "...\sov_reformat.py", line 70, in
sov_convert()
File "...\sov_reformat.py", line 14, in sov_convert
df2 = pd.read_excel("Full SOV.XLS")
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 227, in init
self.book = xlrd.open_workbook(io)
File "C:\Python27\lib\site-packages\xlrd__init__.py", line 395, in open_workbook
with open(filename, "rb") as f:
IOError: [Errno 2] No such file or directory: 'Full SOV.XLS'
I think the line that's causing this error is the following:
with open('rates.csv', 'rb') as f:
    reader = csv.reader(f)
    rate_combinations = list(reader)
But I don't understand why. When I run the script on its own it does exactly what I want, so I know everything else is working.
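One guess, based on the traceback rather than the csv block: when Excel launches the script through xlwings, the working directory is usually not the script's folder, so relative names like "Full SOV.XLS" and 'rates.csv' are not found. A sketch of building the paths from the script's own location instead; everything beyond the read_excel call is left out:
import os
import pandas as pd
import xlwings as xw

# Sketch: resolve data files relative to this script, not the current directory.
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))

def sov_convert():
    wb = xw.Book.caller()
    df2 = pd.read_excel(os.path.join(SCRIPT_DIR, "Full SOV.XLS"))
    temp_df = df2
    # the same os.path.join(SCRIPT_DIR, ...) idea works for 'rates.csv' further down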
In Python 2.7, I'm connecting to an external data source using the following:
import pypyodbc
import pandas as pd
import datetime
import csv
import boto3
import os
# Connect to the DataSource
conn = pypyodbc.connect("DSN = FAKE DATA SOURCE; UID=FAKEID; PWD=FAKEPASSWORD")
# Specify the query we're going to run on it
script = ("SELECT * FROM table")
# Create a dataframe from the above query
df = pd.read_sql_query(script, conn)
I get the following error:
C:\Python27\python.exe "C:/Thing.py"
Traceback (most recent call last):
File "C:/Thing.py", line 30, in <module>
df = pd.read_sql_query(script,conn)
File "C:\Python27\lib\site-packages\pandas-0.18.1-py2.7-win32.egg\pandas\io\sql.py", line 431, in read_sql_query
parse_dates=parse_dates, chunksize=chunksize)
File "C:\Python27\lib\site-packages\pandas-0.18.1-py2.7-win32.egg\pandas\io\sql.py", line 1608, in read_query
data = self._fetchall_as_list(cursor)
File "C:\Python27\lib\site-packages\pandas-0.18.1-py2.7-win32.egg\pandas\io\sql.py", line 1617, in _fetchall_as_list
result = cur.fetchall()
File "build\bdist.win32\egg\pypyodbc.py", line 1819, in fetchall
File "build\bdist.win32\egg\pypyodbc.py", line 1871, in fetchone
ValueError: could not convert string to float: ?
It seems to me that in one of the float columns there is a '?' symbol for some reason. I've reached out to the owner of the data source, but they cannot change the underlying table.
Is there a way to replace incorrect data like this using pandas? I've tried using replace after the read_sql_query statement, but I get the same error.
Hard to know for certain without having your data, obviously, but you could try setting coerce_float to False, i.e. replace your last line with
df = pd.read_sql_query(script, conn, coerce_float=False)
See the documentation of read_sql_query.
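As a follow-up to that suggestion: with coerce_float=False the column comes back as strings/objects, so you can convert it yourself and let the bad values become NaN. A sketch reusing script and conn from the question; 'some_float_col' is a placeholder column name, not from the real table:
# Sketch: load without float coercion, then convert; '?' turns into NaN.
df = pd.read_sql_query(script, conn, coerce_float=False)
df['some_float_col'] = pd.to_numeric(df['some_float_col'], errors='coerce')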
Here is the data
Originally I was using openpyxl and the .split() method to separate the arrays of data. This still leaves some formatting issues, but most of all I would really like to be able to do this with pandas.
Any help would be great, thanks!
EDIT: Also, if anyone knows some good tutorials for pandas beginners, that would be great!
EDIT2:
Ami Tavory's answer throws this error:
Traceback (most recent call last):
File "C:\Users\David\Desktop\Python\Coursera\Computational Finance\CAPM\Scatter\JSONparser.py", line 7, in <module>
data = json.load(open('ETH_USD.txt'))
File "C:\Python27\lib\json\__init__.py", line 290, in load
**kw)
File "C:\Python27\lib\json\__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 13409 - line 1 column 13426 (char 13408 - 13425)
EDIT3: this is my code:
# Import the JSON parser
import json
# and pandas
import pandas as pd
# Assuming the data is in stuff.txt
data = json.load(open('ETH_USD.txt'))
#bpd.DataFrame(data)
[Finished in 1.1s]
EDIT4: this worked like a treat:
# Import the JSON parser
import json
# and pandas
import pandas as pd
URL = 'http://cryptocoincharts.info/fast/period.php?pair=ETH-USDT&market=poloniex&time=alltime&resolution=1d'
data = pd.read_json(URL)
data = pd.DataFrame(data)
data.to_csv('ETH_USD_PANDAS.csv')
There are several ways. Based on the format of the text to which you linked, here is the one I think is easiest:
# Import the JSON parser
import json
# and pandas
import pandas as pd
# Assuming the data is in stuff.txt
data = json.load(open('stuff.txt'))
pd.DataFrame(data)
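For what it's worth, the "Extra data" error in EDIT2 usually means the file holds more than one JSON value back to back, and json.load only accepts a single value; the pd.read_json call in the later edit sidesteps that by fetching the URL directly. If you do want to stay with the local file, a sketch that parses just the first value (file name taken from the question):
import json
import pandas as pd

# Sketch: raw_decode stops after the first complete JSON value instead of
# failing on whatever follows it.
with open('ETH_USD.txt') as f:
    text = f.read()
data, end = json.JSONDecoder().raw_decode(text)  # 'end' marks where the first value stops
df = pd.DataFrame(data)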