I call an API and get a lot of data back, but I only need part of it in a Pandas DataFrame, which I then want to write to a CSV file.
How can I do this? I need the currency and rate columns only.
import requests
import pandas as pd
url = 'https://api.apilayer.com/exchangerates_data/latest?base=EUR'
get_response = requests.get(url)
print(get_response.content)
The response is:
b'{\n "success": true,\n "timestamp": 1653291723,\n "base": "EUR",\n "date": "2022-05-23",\n "rates": {\n "AED": 3.891874,\n "AFN": 96.332724,\n "ALL": 120.12076,\n "AMD": 486.326147,\n "ANG": 1.910798,\n "AOA": 440.146399,\n "ARS": 125.559742,\n "AUD": 1.49136,\n "AWG": 1.907774,\n "AZN": 1.796984,\n "BAM": 1.958501,\n "BBD": 2.140687,\n "BDT": 92.747171,\n "BGN": 1.955884,\n "BHD": 0.399459,\n "BIF": 2179.202426,\n "BMD": 1.05958,\n "BND": 1.461113,\n "BOB": 7.300355,\n "BRL": 5.170217,\n "BSD": 1.060231,\n "BTC": 3.4686139e-05,\n "BTN": 82.245346,\n "BWP": 12.83561,\n "BYN": 3.578179,\n "BYR": 20767.765076,\n "BZD": 2.137314,\n "CAD": 1.355568,\n "CDF": 2124.457448,\n "CHF": 1.030337,\n "CLF": 0.032122,\n "CLP": 886.340415,\n "CNY": 7.067924,\n "COP": 4208.651167,\n "CRC": 711.74061,\n "CUC": 1.05958,\n "CUP": 28.078866,\n "CVE": 110.417129,\n "CZK": 24.584356,\n "DJF": 188.746891,\n "DKK": 7.440804,\n "DOP": 58.552672,\n "DZD": 154.438592,\n "EGP": 19.350685,\n "ERN": 15.893699,\n "ETB": 55.044138,\n "EUR": 1,\n "FJD": 2.284134,\n "FKP": 0.867087,\n "GBP": 0.843796,\n "GEL": 3.078048,\n "GGP": 0.867087,\n "GHS": 8.244159,\n "GIP": 0.867087,\n "GMD": 57.376242,\n "GNF": 9372.913662,\n "GTQ": 8.136894,\n "GYD": 221.832492,\n "HKD": 8.316976,\n "HNL": 26.044688,\n "HRK": 7.531387,\n "HTG": 118.758621,\n "HUF": 382.210602,\n "IDR": 15541.387462,\n "ILS": 3.544575,\n "IMP": 0.867087,\n "INR": 82.241939,\n "IQD": 1547.570278,\n "IRR": 44820.228002,\n "ISK": 138.696976,\n "JEP": 0.867087,\n "JMD": 163.866665,\n "JOD": 0.751235,\n "JPY": 135.279211,\n "KES": 123.494294,\n "KGS": 84.628222,\n "KHR": 4306.032696,\n "KMF": 494.770756,\n "KPW": 953.622101,\n "KRW": 1339.844028,\n "KWD": 0.324317,\n "KYD": 0.883605,\n "KZT": 451.435581,\n "LAK": 14057.368731,\n "LBP": 1603.386389,\n "LKR": 376.408573,\n "LRD": 161.585115,\n "LSL": 16.867967,\n "LTL": 3.128664,\n "LVL": 0.640929,\n "LYD": 5.0991,\n "MAD": 10.616176,\n "MDL": 20.305389,\n "MGA": 4285.919145,\n "MKD": 61.538618,\n "MMK": 1962.989296,\n "MNT": 3259.024764,\n "MOP": 8.570086,\n "MRO": 378.269824,\n "MUR": 45.988019,\n "MVR": 16.344051,\n "MWK": 866.149081,\n "MXN": 21.058086,\n "MYR": 4.645208,\n "MZN": 67.632728,\n "NAD": 16.868837,\n "NGN": 439.852629,\n "NIO": 37.997351,\n "NOK": 10.249848,\n "NPR": 131.573277,\n "NZD": 1.638153,\n "OMR": 0.40741,\n "PAB": 1.060346,\n "PEN": 3.9603,\n "PGK": 3.736248,\n "PHP": 55.384769,\n "PKR": 213.235927,\n "PLN": 4.617629,\n "PYG": 7251.300917,\n "QAR": 3.857965,\n "RON": 4.946967,\n "RSD": 117.499439,\n "RUB": 62.329807,\n "RWF": 1088.946708,\n "SAR": 3.974436,\n "SBD": 8.607563,\n "SCR": 14.438411,\n "SDG": 473.497309,\n "SEK": 10.486142,\n "SGD": 1.457722,\n "SHP": 1.459462,\n "SLL": 13581.160767,\n "SOS": 618.272012,\n "SRD": 22.260709,\n "STD": 21931.163629,\n "SVC": 9.277651,\n "SYP": 2662.141945,\n "SZL": 16.788471,\n "THB": 36.311502,\n "TJS": 13.261298,\n "TMT": 3.708529,\n "TND": 3.243904,\n "TOP": 2.459973,\n "TRY": 16.843815,\n "TTD": 7.198816,\n "TWD": 31.379439,\n "TZS": 2464.582977,\n "UAH": 31.322712,\n "UGX": 3864.924954,\n "USD": 1.05958,\n "UYU": 42.945451,\n "UZS": 11764.462107,\n "VEF": 226570195087.7927,\n "VND": 24545.167244,\n "VUV": 121.073594,\n "WST": 2.73302,\n "XAF": 656.781309,\n "XAG": 0.048294,\n "XAU": 0.000571,\n "XCD": 2.863568,\n "XDR": 0.790993,\n "XOF": 656.781309,\n "XPF": 120.315233,\n "YER": 265.159952,\n "ZAR": 16.77781,\n "ZMK": 9537.495082,\n "ZMW": 18.060768,\n "ZWL": 341.18428\n }\n}\n'
First, load the data into a DataFrame. Note that json.loads alone gives you a dict, so wrap it in pd.DataFrame:
import json
df = pd.DataFrame(json.loads(get_response.content))
Second, choose the rates (the currency is the index) and save to CSV:
df[["base", "rates"]].to_csv("path/to/csv")
import requests
import pandas as pd
import json
url = 'https://api.apilayer...'
get_response = requests.get(url)
# Parse the response to a dict
response_dict = get_response.json()
# Turn the rates nodes into a dataframe
data_items = response_dict['rates'].items()
data_list = list(data_items)
df = pd.DataFrame(data_list, columns=['currency', 'rate'])
# Export to csv (index=False writes just the two columns)
df.to_csv('export.csv', index=False)
IIUC, you can use:
import json
df = pd.DataFrame(json.loads(get_response.content.decode('utf-8')))[['base', 'rates']]
# for export to csv
# df.to_csv('filename.csv')
Output:
base rates
AED EUR 3.893237
AFN EUR 96.366461
ALL EUR 120.162829
AMD EUR 486.496485
ANG EUR 1.911467
.. ... ...
YER EUR 265.252922
ZAR EUR 16.796411
ZMK EUR 9540.830787
ZMW EUR 18.067093
ZWL EUR 341.303769
[168 rows x 2 columns]
get_response = requests.get(url)
# print(get_response.content)
in_json = get_response.json()
# print(in_json)
fd = pd.DataFrame(in_json)
y = fd[['rates']]
print(y)
Solved, but it doesn't look nice and maybe there is a simpler solution.
I converted to JSON, then to a DataFrame, and then I will convert to CSV.
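For what it's worth, a more compact sketch that builds the two-column frame straight from the rates dict (same response shape assumed; the 'rates.csv' filename is just an example):
import requests
import pandas as pd

url = 'https://api.apilayer.com/exchangerates_data/latest?base=EUR'
rates = requests.get(url).json()['rates']

# dict -> Series -> two named columns -> csv
(pd.Series(rates, name='rate')
   .rename_axis('currency')
   .reset_index()
   .to_csv('rates.csv', index=False))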
I have an example json data file which has the following structure:
{
    "Header": {
        "Code1": "abc",
        "Code2": "def",
        "Code3": "ghi",
        "Code4": "jkl"
    },
    "TimeSeries": {
        "2020-11-25T03:00:00+00:00": {
            "UnitPrice": 1000,
            "Amount": 10000
        },
        "2020-11-26T03:00:00+00:00": {
            "UnitPrice": 1000,
            "Amount": 10000
        }
    }
}
When I parse this in Databricks with the command:
df = spark.read.json("/FileStore/test.txt")
I get two objects as output: Header and TimeSeries. For the TimeSeries I want to flatten the structure so it has the following schema:
Date
UnitPrice
Amount
As the date field is a key, I am currently only able to access it by iterating through the column names and then using them in dot notation dynamically:
from pyspark.sql.functions import lit

def flatten_json(data):
    columnlist = data.select("TimeSeries.*")
    count = 0
    for name in columnlist.columns:
        df1 = data.select("Header.*").withColumn("Timeseries", lit(name)).withColumn("join", lit("a"))
        df2 = data.select("TimeSeries." + name + ".*").withColumn("join", lit("a"))
        if count == 0:
            df3 = df1.join(df2, on=['join'], how="inner")
        else:
            df3 = df3.union(df1.join(df2, on=['join'], how="inner"))
        count = count + 1
    return df3
This is far from ideal. Does anyone know a better method to create the described dataframe?
The idea:
Step 1: Extract Header and TimeSeries separately.
Step 2: For each field in the TimeSeries object, extract the Amount and UnitPrice, together with the name of the field, stuff them into a struct.
Step 3: Merge all these structs into an array column, and explode it.
Step 4: Extract Timeseries, Amount and UnitPrice from the exploded column.
Step 5: Cross join with the Header row.
import pyspark.sql.functions as F

header_df = df.select("Header.*")
timeseries_df = df.select("TimeSeries.*")
fieldNames = enumerate(timeseries_df.schema.fieldNames())
cols = [
    F.struct(
        F.lit(name).alias("Timeseries"),
        F.col(name).getItem("Amount").alias("Amount"),
        F.col(name).getItem("UnitPrice").alias("UnitPrice"),
    ).alias("ts_" + str(idx))
    for idx, name in fieldNames
]
combined = F.explode(F.array(cols)).alias("comb")
timeseries = timeseries_df.select(combined).select('comb.Timeseries', 'comb.Amount', 'comb.UnitPrice')
result = header_df.crossJoin(timeseries)
result.show(truncate=False)
Output:
+-----+-----+-----+-----+-------------------------+------+---------+
|Code1|Code2|Code3|Code4|Timeseries |Amount|UnitPrice|
+-----+-----+-----+-----+-------------------------+------+---------+
|abc |def |ghi |jkl |2020-11-25T03:00:00+00:00|10000 |1000 |
|abc |def |ghi |jkl |2020-11-26T03:00:00+00:00|10000 |1000 |
+-----+-----+-----+-----+-------------------------+------+---------+
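As an alternative sketch, if you can supply the schema up front, treating TimeSeries as a map avoids enumerating field names entirely. This assumes the Databricks/Spark session spark as above and that the file is read with multiLine=True:
from pyspark.sql import functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               LongType, MapType)

schema = StructType([
    StructField("Header", StructType([
        StructField(c, StringType()) for c in ["Code1", "Code2", "Code3", "Code4"]
    ])),
    StructField("TimeSeries", MapType(StringType(), StructType([
        StructField("UnitPrice", LongType()),
        StructField("Amount", LongType()),
    ]))),
])

df = spark.read.json("/FileStore/test.txt", schema=schema, multiLine=True)
# explode(map) yields one row per (key, value) pair
timeseries = (df.select(F.explode("TimeSeries").alias("Timeseries", "v"))
                .select("Timeseries", "v.Amount", "v.UnitPrice"))
result = df.select("Header.*").crossJoin(timeseries)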
I have a list called Symbols with 30 ticker symbols for stocks (e.g., Apple -> AAPL), and I would like to grab the current stock price for each ticker and populate a DataFrame with this info: two columns, the first with ticker symbols and the second with the current price. I keep getting the following error message when I run this part of my script:
"ValueError: If using all scalar values, you must pass an index"
Stock = []
Price = []
df_temp = []
for symbol in Symbols:
try:
params = {
'symbols': symbol,
'range': '1d',
'interval': '1d',
'indicators': 'close',
'includeTimestamps': 'false',
'includePrePost': 'false',
'corsDomain': 'finance.yahoo.com',
'.tsrc': 'finance'}
url = 'https://query1.finance.yahoo.com/v7/finance/spark'
r = requests.get(url, params=params)
data = r.json()
df_stock = pd.DataFrame({'Ticker' : symbol,
'Current Price' : data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]
})
df_temp.append(df_stock)
df_temp = pd.concat(df_temp, axis = 1)
except KeyError:
continue
You need to change only one part:
df_stock = pd.DataFrame({'Ticker' : [symbol],
'Current Price' : [data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]]
})
Output
Ticker Current Price
0 AAPL 118.119
Full Code
import requests
import pandas as pd

Stock = []
Price = []
df_temp = pd.DataFrame()
for symbol in ['AAPL', 'IBM', 'NKE', 'FB']:
try:
params = {
'symbols': symbol,
'range': '1d',
'interval': '1d',
'indicators': 'close',
'includeTimestamps': 'false',
'includePrePost': 'false',
'corsDomain': 'finance.yahoo.com',
'.tsrc': 'finance'}
url = 'https://query1.finance.yahoo.com/v7/finance/spark'
r = requests.get(url, params=params)
data = r.json()
df_stock = pd.DataFrame({'Ticker' : [symbol],
'Current Price' : [data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]]
})
        df_temp = pd.concat([df_temp, df_stock])  # DataFrame.append was removed in pandas 2.0
except KeyError:
continue
Explanation
The values you were passing to df_stock were scalars; wrapping each one in a list solves it.
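A minimal demonstration of the difference (the values are just illustrative):
import pandas as pd

# All scalar values and no index -> pandas cannot infer a row count:
#   pd.DataFrame({'Ticker': 'AAPL', 'Current Price': 118.119})
#   ValueError: If using all scalar values, you must pass an index

# Wrapping each scalar in a one-element list yields a one-row frame:
pd.DataFrame({'Ticker': ['AAPL'], 'Current Price': [118.119]})

# Equivalently, keep the scalars but pass an index explicitly:
pd.DataFrame({'Ticker': 'AAPL', 'Current Price': 118.119}, index=[0])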
When I query the AdWords API to get search volume data and trends through their TargetingIdeaSelector using the Python client library the returned data looks like this:
(TargetingIdeaPage){
totalNumEntries = 1
entries[] =
(TargetingIdea){
data[] =
(Type_AttributeMapEntry){
key = "KEYWORD_TEXT"
value =
(StringAttribute){
Attribute.Type = "StringAttribute"
value = "keyword phrase"
}
},
(Type_AttributeMapEntry){
key = "TARGETED_MONTHLY_SEARCHES"
value =
(MonthlySearchVolumeAttribute){
Attribute.Type = "MonthlySearchVolumeAttribute"
value[] =
(MonthlySearchVolume){
year = 2016
month = 2
count = 2900
},
...
(MonthlySearchVolume){
year = 2015
month = 3
count = 2900
},
}
},
},
}
This isn't JSON and appears to just be a messy Python list. What's the easiest way to flatten the monthly data into a Pandas dataframe with a structure like this?
Keyword | Year | Month | Count
keyword phrase 2016 2 10
The output is a sudsobject. I found that this code does the trick:
import suds.sudsobject as sudsobject
import pandas as pd
a = [sudsobject.asdict(x) for x in output]
df = pd.DataFrame(a)
Addendum: this was once correct, but newer versions of the API (I tested v201802) now return zeep objects. However, zeep.helpers.serialize_object should do the same trick.
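For the newer zeep-based client, a hedged sketch along the same lines (assuming output is again the list of response entries):
from zeep.helpers import serialize_object
import pandas as pd

# serialize_object recursively converts zeep objects to plain dicts
a = [serialize_object(x) for x in output]
df = pd.DataFrame(a)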
Here's the complete code that I used to query the TargetingIdeaSelector, with requestType STATS, and the method I used to parse the data into a usable dataframe; note the section starting "Parse results to pandas dataframe", as this takes the output given in the question above and converts it to a dataframe. Probably not the fastest or best, but it works! Tested with Python 2.7.
"""This code pulls trends for a set of keywords, and parses into a dataframe.
The LoadFromStorage method is pulling credentials and properties from a
"googleads.yaml" file. By default, it looks for this file in your home
directory. For more information, see the "Caching authentication information"
section of our README.
"""
from googleads import adwords
import pandas as pd
adwords_client = adwords.AdWordsClient.LoadFromStorage()
PAGE_SIZE = 10
# Initialize appropriate service.
targeting_idea_service = adwords_client.GetService(
'TargetingIdeaService', version='v201601')
# Construct selector object and retrieve related keywords.
offset = 0
stats_selector = {
'searchParameters': [
{
'xsi_type': 'RelatedToQuerySearchParameter',
'queries': ['donald trump', 'bernie sanders']
},
{
# Language setting (optional).
# The ID can be found in the documentation:
# https://developers.google.com/adwords/api/docs/appendix/languagecodes
'xsi_type': 'LanguageSearchParameter',
'languages': [{'id': '1000'}],
},
{
# Location setting
'xsi_type': 'LocationSearchParameter',
'locations': [{'id': '1027363'}] # Burlington,Vermont
}
],
'ideaType': 'KEYWORD',
'requestType': 'STATS',
'requestedAttributeTypes': ['KEYWORD_TEXT', 'TARGETED_MONTHLY_SEARCHES'],
'paging': {
'startIndex': str(offset),
'numberResults': str(PAGE_SIZE)
}
}
stats_page = targeting_idea_service.get(stats_selector)
##########################################################################
# Parse results to pandas dataframe
stats_pd = pd.DataFrame()
if 'entries' in stats_page:
for stats_result in stats_page['entries']:
stats_attributes = {}
for stats_attribute in stats_result['data']:
#print (stats_attribute)
if stats_attribute['key'] == 'KEYWORD_TEXT':
kt = stats_attribute['value']['value']
else:
for i, val in enumerate(stats_attribute['value'][1]):
data = {'keyword': kt,
'year': val['year'],
'month': val['month'],
'count': val['count']}
data = pd.DataFrame(data, index = [i])
stats_pd = stats_pd.append(data, ignore_index=True)
print(stats_pd)
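As a side note, growing a DataFrame with repeated append calls is quadratic; a sketch of the same parse (same stats_page structure assumed) that collects plain dicts and builds the frame once:
rows = []
if 'entries' in stats_page:
    for stats_result in stats_page['entries']:
        kt = None
        for stats_attribute in stats_result['data']:
            if stats_attribute['key'] == 'KEYWORD_TEXT':
                kt = stats_attribute['value']['value']
            else:
                for val in stats_attribute['value'][1]:
                    rows.append({'keyword': kt,
                                 'year': val['year'],
                                 'month': val['month'],
                                 'count': val['count']})
stats_pd = pd.DataFrame(rows)
print(stats_pd)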
The abc.txt file content is as follows:
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 43c51829-20f8-422d-a667-ce2ed917a33c | creating | New-Vol | 2 | None | false | |
| 7b388ad1-eec9-44fc-b9b1-240c0681d106 | available | New-Vol | 2 | None | false | |
| d4649bda-eb4f-40f9-a856-254f51f274ae | available | New-Vol | 2 | None | false | |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
How can I convert this content into a valid dictionary using Python?
Code I tried:
def _table2dict():
# f = open('abc.txt', 'wt')
# f.write(body)
# f.close()
table = [line.strip().split('|') for line in open("abc.txt", 'r')]
del table[0]
del table[1]
del table[-1]
result = {'volumes' : []}
for a_row in table[1:]:
tmp = {}
for key, value in zip(table[0][1:], a_row[1:]):
key = key.strip(' ')
value = value.strip(' ')
tmp[key] = value
result["volumes"].append(tmp)
return result
x = _table2dict()
print x
I tried the code above and it gives some sort of output.
You can try this (probably not the prettiest Python code imaginable):
def _table2dict():
entries = {'volumes' : []}
fields = ()
for line in open("abc.txt",'r'):
entry = {}
if not line.strip().startswith("+-"): # get rid of +--- lines
cells = [x.strip() for x in line.split("|")[1:-1] ] # weed out first and last | and get rid of whitespace
if len(fields) == 0: # first get field names if we don't have it already
fields = [cell for cell in cells]
# do not process this line and skip to next in file
continue
if len( fields) != 0: # we already found the field names
for (key,value) in zip(fields, cells):
entry[key] = value
entries["volumes"].append(entry)
return entries
x = _table2dict()
print x
Output (formatted for readability):
{'volumes':
[
{'Status': 'creating', 'Bootable': 'false', 'Attached to': '', 'Display Name': 'New-Vol', 'Volume Type': 'None', 'ID': '43c51829-20f8-422d-a667-ce2ed917a33c', 'Size': '2'},
{'Status': 'available', 'Bootable': 'false', 'Attached to': '', 'Display Name': 'New-Vol', 'Volume Type': 'None', 'ID': '7b388ad1-eec9-44fc-b9b1-240c0681d106', 'Size': '2'},
{'Status': 'available', 'Bootable': 'false', 'Attached to': '', 'Display Name': 'New-Vol', 'Volume Type': 'None', 'ID': 'd4649bda-eb4f-40f9-a856-254f51f274ae', 'Size': '2'}
]
}
Note that I created a dict for every entry, in addition to making the final output a dict. The order of the fields is therefore not the one in the original file, but every field can be retrieved by the name given in the header.
Does this do what you had in mind?
Edit: in the first version I had an extra empty field in the output because of the trailing '|' character, which I didn't take care of; now fixed.
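As an aside, a hypothetical alternative sketch that lets pandas do the parsing (same 'volumes' wrapper as above; handling of the empty 'Attached to' cells is assumed):
import pandas as pd

# comment='+' makes pandas skip the +---+ border lines entirely;
# the leading/trailing '|' produce two empty edge columns we drop.
df = pd.read_csv("abc.txt", sep="|", comment="+", dtype=str)
df = df.drop(columns=[df.columns[0], df.columns[-1]])
df.columns = df.columns.str.strip()
df = df.apply(lambda c: c.str.strip()).fillna("")
result = {'volumes': df.to_dict(orient='records')}
print(result)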