How can I get this json file in a python dataframe? https://data.cdc.gov/resource/8xkx-amqh.json
I tried to read the data using Socrata (sodapy) and it worked. However, it has a row limit and I need the whole dataset.
This is what I have:
from sodapy import Socrata
import pandas as pd

client = Socrata("data.cdc.gov", app_token=None)
# First 5000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
vcounty = client.get_all("8xkx-amqh", limit=5000)
# Convert to pandas DataFrame
vcounty_df = pd.DataFrame.from_records(vcounty)
But I want the whole dataset, and from what I understand Socrata has a limit that is lower than what I need.
The API is limited for unauthenticated users, but you can download the full dataset in CSV format and load it into a DataFrame. There are more than 1.5 million rows.
# pip install requests
# pip install pandas
import requests
import pandas as pd
import io
urlData = requests.get('https://data.cdc.gov/api/views/8xkx-amqh/rows.csv?accessType=DOWNLOAD').content
df = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
df
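If you would rather stay on the JSON API, paging through it is another option. Below is a minimal sketch, assuming the endpoint honors the standard SODA `$limit`, `$offset` and `$order` query parameters. `fetch_all_rows` and `socrata_page` are hypothetical names, and the page size of 50000 is an assumption, not a documented limit for this dataset.

```python
import pandas as pd


def fetch_all_rows(fetch_page, page_size=50000):
    """Call fetch_page(limit, offset) until a short page signals the end."""
    rows = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return pd.DataFrame.from_records(rows)


def socrata_page(limit, offset):
    """Hypothetical helper: fetch one page of the dataset as a list of dicts."""
    import requests  # imported here so the paging logic has no hard dependency

    resp = requests.get(
        "https://data.cdc.gov/resource/8xkx-amqh.json",
        # A stable sort order keeps pages from overlapping (assumed supported).
        params={"$limit": limit, "$offset": offset, "$order": ":id"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


# df = fetch_all_rows(socrata_page)  # many requests; uncomment to run
```

For 1.5+ million rows the single CSV download above is likely simpler and faster; paging is mainly useful when you also want to filter on the server side.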
Related
I am trying to download a CSV file with Python, but for some reason I cannot do it. I suppose I need to pass an additional argument to read_csv?
import pandas as pd
url = "https://raw.githubusercontent.com/UofGAnalyticsData/"\
"DPIP/main/assesment_datasets/assessment3/starwars.csv"
df = pd.read_csv(url)
Your code downloads the content from the URL and loads it into the DataFrame named 'df'; it does not write anything to disk.
To save it as a CSV file, add the to_csv line below. Because the path is relative, the output file is written to the current working directory, which is usually the directory you run the script from.
import pandas as pd
url = "https://raw.githubusercontent.com/UofGAnalyticsData/"\
"DPIP/main/assesment_datasets/assessment3/starwars.csv"
df = pd.read_csv(url)
df.to_csv('output.csv')
I would like to pull multiple tickers from Binance. I have managed to do so and write them to a CSV file. However, I am having trouble extracting only the OHLCV data from the columns so that I can then run ta-lib on it.
For example, I would like to keep the OHLCV data from each row for XRPBTC and NEOBTC, which are stored in columns, and either write it to a new file or run ta-lib directly on that data. It works fine for a single ticker, but I am having trouble extracting this for multiple tickers.
I understand that each cell holds a list. Can I split these lists to keep only the OHLCV values from each row and each column and write them to a new file? Is there an easier way of splitting a list?
screenshot of the data
Link to relevant binance documentation Klines candlestick data
import pandas as pd
import numpy as np
import csv
import talib as ta
from binance.client import Client
candlesticks = ['XRPBTC','NEOBTC'] # unable to split for each row in multiple columns
data = pd.DataFrame()
for candlestick in candlesticks:
    data[candlestick] = client.get_historical_klines(candlestick, Client.KLINE_INTERVAL_15MINUTE, "1 Jul, 2021")
data.to_csv("XRPNEO15M.csv")
print(data)
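One way to split the lists is to build one DataFrame per symbol and keep only the OHLCV columns. The sketch below assumes each kline is the 12-element list described in the Binance klines documentation (open time, open, high, low, close, volume, close time, and so on). The column names, the `klines_to_ohlcv` helper and the sample values are made up for illustration; in practice the lists would come from `get_historical_klines`.

```python
import pandas as pd

# Column labels chosen for this sketch; Binance returns bare lists, not names.
KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_volume", "taker_buy_quote_volume", "ignore",
]


def klines_to_ohlcv(klines):
    """Turn a list of raw kline lists into a float OHLCV frame indexed by open time."""
    df = pd.DataFrame(klines, columns=KLINE_COLUMNS)
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    ohlcv = df.set_index("open_time")[["open", "high", "low", "close", "volume"]]
    return ohlcv.astype(float)  # prices arrive as strings


# Made-up sample rows standing in for client.get_historical_klines(...) output.
sample = {
    "XRPBTC": [[1625097600000, "0.00002000", "0.00002100", "0.00001900",
                "0.00002050", "1000.0", 1625098499999, "0.02", 10,
                "500.0", "0.01", "0"]],
    "NEOBTC": [[1625097600000, "0.00110000", "0.00125000", "0.00105000",
                "0.00120000", "250.0", 1625098499999, "0.29", 7,
                "120.0", "0.14", "0"]],
}

frames = {symbol: klines_to_ohlcv(rows) for symbol, rows in sample.items()}

# Side by side, with (symbol, field) as a two-level column index.
combined = pd.concat(frames, axis=1)
```

Each per-symbol frame can then be written with `to_csv` or passed to ta-lib column by column, for example `frames["XRPBTC"]["close"].values`.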
I have a tool that processes data in-memory with pandas DataFrames, and I'd like to be able to use Spanner as a data source for some of that processing. How can I use Python to run a query in Spanner and then download all the query results to a pandas DataFrame?
A quick and dirty way to get spanner results into a pandas data frame.
import pandas as pd
from google.cloud import spanner
# Initialize client
client = spanner.Client()
# Get a Cloud Spanner instance by ID.
instance = client.instance('instance-name')
# Get a Cloud Spanner database by ID.
database = instance.database('database-name')
with database.snapshot() as snapshot:
    result = snapshot.execute_sql("SELECT * FROM somewhere")
    # Stream in rows
    rows = list()
    for row in result:
        rows.append(row)
    # Get column names
    cols = [x.name for x in result.fields]
    # Convert to pandas dataframe
    result_df = pd.DataFrame(rows, columns = cols)
This likely won't scale and you may run into issues with Spanner types vs pandas types, but it will solve the immediate problem of "I want to analyze data from Spanner in pandas."
pandas can read SQL through SQLAlchemy, so we can follow this doc: https://cloud.google.com/spanner/docs/use-sqlalchemy
pip install sqlalchemy-spanner
Then, in Python code (if the SQLAlchemy version is >= 1.4):
import pandas as pd
url = 'spanner+spanner:///projects/project-id/instances/instance-id/databases/database-id'
sql = 'SELECT * FROM my_table;'
df = pd.read_sql(sql, url, index_col='id_column')
Or, if the SQLAlchemy version is 1.3:
...
url = 'spanner:///projects/project-id/instances/instance-id/databases/database-id'
...
To use Python to query your data in Cloud Spanner, you need to install and use the Python Cloud Spanner client library.
As of now, there is no straightforward way to download data from Spanner into a pandas DataFrame.
I would suggest using the "StreamedResultSet API" to export your data to pandas.
Also, please take a look at this post about streaming data from Cloud Spanner to a pandas DataFrame, as it may prove helpful for your use case as well.
I have a large DataFrame stored in a CSV file, sample1, and I need to generate a new CSV file that contains only the first 100 rows. I wrote the code below, but I am getting a KeyError: the label [100] is not in the index.
This is what I have tried. Any help would be appreciated.
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv")
data_frame1 = data_frame[:100]
data_frame.to_csv("C:/users/raju/sample.csv")
The correct syntax is with iloc:
data_frame.iloc[:100]
A more efficient way to do it is to use the nrows argument, whose purpose is exactly to read only a portion of a file. This way you avoid wasting time and memory parsing rows you do not need:
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv", nrows=100) # nrows counts data rows; the header is not included
data_frame.to_csv("C:/users/raju/sample.csv")
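To check how nrows counts rows, here is a small self-contained sketch that uses an in-memory CSV instead of the file on disk. The 250-row sample data and the variable names are invented for the example.

```python
import io

import pandas as pd

# In-memory CSV: one header line followed by 250 data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(250))

# nrows limits the number of data rows parsed; the header is read separately.
first_100 = pd.read_csv(io.StringIO(csv_text), nrows=100)

# Equivalent result, but the whole file is parsed before slicing.
full = pd.read_csv(io.StringIO(csv_text))
same_rows = full.iloc[:100]

# index=False keeps the row index out of the output file.
out = io.StringIO()
first_100.to_csv(out, index=False)
```

Both approaches give the same 100 rows; the difference is only how much of the file gets parsed.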
I'm trying to do some basic analysis of historical Ether prices for a school project. My problem is quite a simple one, I think. I wrote a function that downloads the data from the URL, but the format is wrong: I get a DataFrame whose size is (0, ~14k). So the download works, but I'm not sure how to get the data into a form that I can use.
I see two possibilities: either I reformat the DataFrame after the download, which I will try to do, or I download it in the correct format in the first place, which would be the better and more elegant solution.
My problem is that I don't know how to do the second, and I may not succeed with the first, which is why I am posting this.
def get_stock_price_csv_from_poloniex():
    import requests
    from pandas import DataFrame
    from io import StringIO
    url = 'https://poloniex.com/public?command=returnChartData&currencyPair=USDT_ETH&start=1435699200&end=9999999999&period=14400'
    csv = requests.get(url)
    if csv.ok:
        return DataFrame.from_csv(StringIO(csv.text), sep=',')
    else:
        return None
The source data is not CSV, it's JSON. Luckily, pandas provides facilities for working with that as well.
import requests
import pandas as pd  # json_normalize is a top-level function since pandas 1.0
url = 'https://poloniex.com/public?command=returnChartData&currencyPair=USDT_ETH&start=1435699200&end=9999999999&period=14400'
resp = requests.get(url)
resp.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
data_frame = pd.json_normalize(resp.json())
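As a rough illustration of what the result looks like and how to make it usable for time-series work, here is a sketch that runs on a hard-coded sample instead of the live endpoint. The two records and their field names (`date` in Unix seconds, `open`, `high`, `low`, `close`, and so on) are assumptions about the response shape, not real data.

```python
import pandas as pd

# Invented records standing in for resp.json(); values are not real prices.
sample = [
    {"date": 1435713600, "high": 0.35, "low": 0.30, "open": 0.31,
     "close": 0.34, "volume": 5.2, "quoteVolume": 15.9, "weightedAverage": 0.33},
    {"date": 1435699200, "high": 0.33, "low": 0.225, "open": 0.33,
     "close": 0.225, "volume": 8.1, "quoteVolume": 12.5, "weightedAverage": 0.30},
]

# One row per record, one column per key.
df = pd.json_normalize(sample)

# Convert Unix seconds to timestamps and use them as a sorted index.
df["date"] = pd.to_datetime(df["date"], unit="s")
df = df.set_index("date").sort_index()
```

With a datetime index in place, things like `df["close"].resample("1D").last()` become available for the analysis.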