String to pandas Dataframe based on columns


I am sending an AJAX GET request to a Flask server (http://localhost:5000/req/?q=139,2,10,60,5,1462,7,5,6,9,17,78) to retrieve some values and put them into a DataFrame. Doing it manually works fine:
df = pd.DataFrame(data=[[139,2,10,60,5,1462,7,5,6,9,17,78]],columns=['col1','col2','col3','col4','col5','col6','col7','col8','col9','col10','col11','col12'])
But I need the numbers to come from request.args via AJAX and then be passed to the DataFrame as an array.
@app.route('/req/', methods=['GET'])
def foo():
    args = dict(request.args.to_dict())
    t = request.args["q"]
    return getResults(t), 200
And the getResults() would be something like:
def getResults(name):
    df = pd.DataFrame(data=[[name]], columns=['col1','col2','col3','col4','col5','col6','col7','col8','col9','col10','col11','col12'])
But of course this doesn't work. It raises: ValueError: 12 columns passed, passed data had 1 columns
How can I do this? I've tried splitting the string and converting it to an array, but nothing worked.

The argument arrives as a string: after t = request.args["q"], t is "139,2,10,60,5,1462,7,5,6,9,17,78". You need a list of ints:
@app.route('/req')  # GET is the default method
def foo():
    t = request.args["q"]
    t = [int(val) for val in t.split(",")]
    return getResults(t)  # 200 is the default status code
And
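def getResults(name):
    df = pd.DataFrame(data=[name],  # no extra []: name is already a list, so this is one row
                      columns=['col1','col2','col3','col4','col5','col6','col7','col8','col9','col10','col11','col12'])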
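Also be careful with the trailing slash. According to the Flask documentation, a route declared as /req/ also answers /req (Flask redirects to the URL with the slash), whereas a route declared as /req does not match /req/ and returns 404. So if you use @app.route('/req') as above, call it as /req?q=... rather than /req/?q=...; see "Unique URLs / Redirection Behavior" in the Flask quickstart for details.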
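To check the parsing step on its own, outside Flask, here is a minimal sketch. The helper name query_to_frame and the generated column names col1 to col12 are illustrative assumptions; the input string is the q value from the question's URL.

```python
import pandas as pd

# Assumed column names, matching the twelve columns used in the question
COLUMNS = [f"col{i}" for i in range(1, 13)]


def query_to_frame(q: str) -> pd.DataFrame:
    """Turn a comma-separated query value into a one-row DataFrame (hypothetical helper)."""
    values = [int(v) for v in q.split(",")]  # "139,2,..." -> [139, 2, ...]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    return pd.DataFrame([values], columns=COLUMNS)  # one inner list = one row


df = query_to_frame("139,2,10,60,5,1462,7,5,6,9,17,78")
print(df)
```

Inside the view, the same call would be query_to_frame(request.args["q"]); the length check turns a malformed query into a clear error instead of the constructor's ValueError.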

Related

Return DataFrame from within python function rather than list using pandas' pd.read_sql_query

I am struggling to understand why this returns a list containing a DataFrame rather than only the DataFrame. Is there something wrong with the code or a way to return only the DataFrame? It works as expected if not placed within a function.
import sqlite3
import pandas as pd
def get_tbl_info(db ='MyDatabase', table ='Measurements'):
    database = "/Users/Mary/Documents/Database/{DB}.db".format(DB=db)
    conn = sqlite3.connect(database)
    tbl_info_command = "PRAGMA TABLE_INFO({table});".format(table=table)
    result_all = pd.read_sql_query(tbl_info_command,conn)
    print(type(result_all))
    return [result_all]
out = get_tbl_info()
print(type(out))
gives:
<class 'pandas.core.frame.DataFrame'>
<class 'list'>

Because wrapping a variable in square brackets [ ] creates a new list containing that variable. Just replace the return statement of your function with:
return result_all
Alternatively, your function can return the DataFrame directly:
return pd.read_sql_query(tbl_info_command,conn)
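A self-contained sketch of the fixed function. It assumes an in-memory SQLite database with a made-up Measurements table in place of the file path from the question, and it takes the connection as a parameter instead of a database name; that signature change is a hypothetical variation for the example.

```python
import sqlite3

import pandas as pd


def get_tbl_info(conn, table="Measurements"):
    # PRAGMA statements cannot take bound parameters, so the table name is formatted in
    query = f"PRAGMA TABLE_INFO({table});"
    return pd.read_sql_query(query, conn)  # the DataFrame itself, not [DataFrame]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Measurements (id INTEGER PRIMARY KEY, value REAL)")
info = get_tbl_info(conn)
print(type(info))
print(info[["name", "type"]])
conn.close()
```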

Python Loop Addition

No matter what I do, I can't seem to add all the base volumes and quote volumes together easily! I want to end up with a total base volume and a total quote volume for all the data in the DataFrame. Can someone show me an easy way to do this?
I have tried summing and saving the data in a dictionary first and then adding it, but I just can't make it work!
import urllib.request
import pandas as pd
import json

def call_data():  # Call data from Poloniex
    global df
    datalink = 'https://poloniex.com/public?command=returnTicker'
    df = urllib.request.urlopen(datalink)
    df = df.read().decode('utf-8')
    df = json.loads(df)
    global current_eth_price
    for k, v in df.items():
        if 'ETH' in k:
            if 'USDT_ETH' in k:
                current_eth_price = round(float(v['last']),2)
                print("Current ETH Price $:",current_eth_price)

def calc_volumes():  # Calculate the base & quote volumes
    global volume_totals
    for k, v in df.items():
        if 'ETH' in k:
            basevolume = float(v['baseVolume'])*current_eth_price
            quotevolume = float(v['quoteVolume'])*float(v['last'])*current_eth_price
            if quotevolume > 0:
                percentages = (quotevolume - basevolume) / basevolume * 100
                volume_totals = {'key':[k],
                                 'basevolume':[basevolume],
                                 'quotevolume':[quotevolume],
                                 'percentages':[percentages]}
                print("volume totals:",volume_totals)
                print("#"*8)

call_data()
calc_volumes()

A few notes:
For the next 2 years, don't use the global keyword for anything.
Put function documentation in a docstring (triple quotes) directly under the def line.
Using the requests library would be much easier than urllib. However...
pandas can fetch the JSON and parse it all in one step.
OK, it doesn't have to be split up this much; I'm just showing you how to pass variables around properly instead of using globals.
I could not find "ETH" by itself. The data they sent contains these 3: ['BTC_ETH', 'USDT_ETH', 'USDC_ETH']. So I used "USDT_ETH"; I hope the substitution is OK.
calc_volumes seems to do the calculation and also act as some sort of filter (it's picky about what it prints). This function needs to be broken up into its two separate jobs: printing and calculating. (Maybe there was a filter step, but I leave that for homework.)
import pandas as pd

eth_price_url = 'https://poloniex.com/public?command=returnTicker'

def get_data(url=''):
    """ Call data from Poloniex and put it in a dataframe"""
    data = pd.read_json(url)
    return data

def get_current_eth_price(data=None):
    """ grab the price out of the dataframe """
    current_eth_price = round(float(data['USDT_ETH']['last']), 2)
    return current_eth_price

def calc_volumes(data=None, current_eth_price=None):
    """ Calculate the base & quote volumes """
    # use the data argument, not the global df
    data = data[data.columns[data.columns.str.contains('ETH')]].loc[['baseVolume', 'quoteVolume', 'last']]
    data = data.transpose().astype(float)  # one row per ETH market, numeric values
    data[['baseVolume','quoteVolume']] *= current_eth_price
    data['quoteVolume'] *= data['last']
    data['percentages'] = (data['quoteVolume'] - data['baseVolume']) / data['baseVolume'] * 100
    return data

df = get_data(url=eth_price_url)
the_price = get_current_eth_price(data=df)
print(f'the current eth price is: {the_price}')
volumes = calc_volumes(data=df, current_eth_price=the_price)
print(volumes)

This code seems kind of odd and inconsistent. For example, you're importing pandas and calling your variable df, but you're not actually using DataFrames. If you used df = pd.read_json('https://poloniex.com/public?command=returnTicker', orient='index')* to get a DataFrame, most of your data manipulation here would become much easier and wouldn't require any loops.
For example, the first function's code would become as simple as current_eth_price = df.loc['USDT_ETH','last'].
The second function's code would basically be:
eth_rows = df[df.index.str.contains('ETH')]
total_base_volume = (eth_rows.baseVolume * current_eth_price).sum()
total_quote_volume = (eth_rows.quoteVolume * eth_rows['last'] * current_eth_price).sum()
(*orient='index' tells pandas to treat the outer keys of the JSON object as rows and the inner keys as columns, rather than the default of outer keys as columns and inner keys as rows.)
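Since the live endpoint may no longer return this format, here is an offline sketch of the same approach. The sample is made up purely for illustration, shaped like the ticker response ({pair: {field: value}}) with numbers as strings; astype(float) is an added safeguard for that.

```python
import io
import json

import pandas as pd

# Made-up sample shaped like the ticker response: {pair: {field: value}}
sample = {
    "USDT_ETH": {"last": "200.0", "baseVolume": "1000.0", "quoteVolume": "5.0"},
    "BTC_ETH": {"last": "0.02", "baseVolume": "10.0", "quoteVolume": "400.0"},
    "USDT_BTC": {"last": "10000.0", "baseVolume": "2000.0", "quoteVolume": "0.2"},
}

# orient='index': outer keys (pairs) become rows, inner keys (fields) become columns
df = pd.read_json(io.StringIO(json.dumps(sample)), orient="index").astype(float)

current_eth_price = df.loc["USDT_ETH", "last"]
eth_rows = df[df.index.str.contains("ETH")]  # USDT_ETH and BTC_ETH, not USDT_BTC
total_base_volume = (eth_rows.baseVolume * current_eth_price).sum()
total_quote_volume = (eth_rows.quoteVolume * eth_rows["last"] * current_eth_price).sum()

print(total_base_volume, total_quote_volume)
```

With these sample numbers, the totals are 1000*200 + 10*200 = 202000 and 5*200*200 + 400*0.02*200 = 201600, with no explicit loop.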

issue in writing function to filter rows data frame

I am writing a function that will serve as a filter for the rows I want to use.
The sample DataFrame is as follows:
df = pd.DataFrame()
df ['Xstart'] = [1,2.5,3,4,5]
df ['Xend'] = [6,8,9,10,12]
df ['Ystart'] = [0,1,2,3,4]
df ['Yend'] = [6,8,9,10,12]
df ['GW'] = [1,1,2,3,4]
def filter(data,Game_week):
    pass_data = data [(data['GW'] == Game_week)]
When I call the function filter as follows, I get an error:
df1 = filter(df,1)
The error message is
AttributeError: 'NoneType' object has no attribute 'head'
But when I filter manually, it works:
pass_data = df [(df['GW'] == [1])]
This is my first issue.
My second issue is that I want to filter the rows with multiple GW (1,2,3) etc.
I can do that manually as follows:
pass_data = df [(df['GW'] == [1])|(df['GW'] == [2])|(df['GW'] == [3])]
If I want the function input to be a list such as [1,2,3], how can I write the function so that I can pass a range of 1 to 3?
Could anyone please advise?
Thanks,
Zep

Use isin to pass a list of values instead of a scalar. Also, filter is an existing built-in function in Python, so it is better to rename your function:
def filter_vals(data,Game_week):
    return data[data['GW'].isin(Game_week)]
df1 = filter_vals(df,range(1,4))

Because your function doesn't return anything, it returns None instead of the desired DataFrame. So return the result (note also that the parentheses inside data[...] are not needed):
def filter(data,Game_week):
    return data[data['GW'] == Game_week]
Also, isin may well be better:
def filter(data,Game_week):
    return data[data['GW'].isin(Game_week)]

For the first part, use return to return the data from the function. For the second, use isin:
def filter(data,Game_week):
    return data[data['GW'].isin(Game_week)]
Now apply the filter function:
df1 = filter(df,[1,2])
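Putting the answers together, here is a sketch of a filter that accepts either a single game week or a list or range of weeks. The name filter_gw and the scalar check via pandas.api.types.is_scalar are my own choices; renaming the function also avoids shadowing the built-in filter.

```python
import pandas as pd
from pandas.api.types import is_scalar

df = pd.DataFrame({
    "Xstart": [1, 2.5, 3, 4, 5],
    "Xend": [6, 8, 9, 10, 12],
    "Ystart": [0, 1, 2, 3, 4],
    "Yend": [6, 8, 9, 10, 12],
    "GW": [1, 1, 2, 3, 4],
})


def filter_gw(data, game_weeks):
    """Return the rows whose GW is in game_weeks (a single value or a list-like)."""
    if is_scalar(game_weeks):
        game_weeks = [game_weeks]  # isin needs a list-like, so wrap a single week
    return data[data["GW"].isin(game_weeks)]  # return the result instead of None


week_1 = filter_gw(df, 1)
weeks_1_to_3 = filter_gw(df, range(1, 4))
print(week_1.head())
print(weeks_1_to_3)
```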

DataFrame not being assigned given value

I have the following class, and the print statement outputs an empty DataFrame even though I'm sure my get_percent_change method is returning values. I even tried just assigning 3 to the Test column. Still an empty DataFrame.
Is it something to do with the fact it's inside a class? Inside the init method? I tried using self.metrics too.
class options_metrics:
    def __init__(self, calls, puts):
        self.calls, self.puts = calls, puts
        self.calls = self.calls.drop(["Type"])
        self.puts = self.puts.drop(["Type"])
        metrics = pd.DataFrame()
        metrics['Perc_Chg_Vol_Call'], metrics['Perc_Chg_Open_Int_Call'] = self.get_percent_change(self.calls)
        metrics['Test'] = 3
        print(metrics)
        input()

    def get_percent_change(self, option_df):
        perc_changes = option_df.pct_change(axis=1)
        print(perc_changes)
        return (perc_changes.ix['Vol',1], perc_changes.ix['Open_Int',1])
Here is the output:
Empty DataFrame
Columns: [Perc_Chg_Vol_Call, Perc_Chg_Open_Int_Call, Test]
Index: []

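Switching the DataFrame to a Series worked. The reason is that pd.DataFrame() starts with an empty index, so assigning scalar values to new columns creates the columns but no rows to hold the values. Alternatively, give the DataFrame an index when you create it, e.g. pd.DataFrame(index=[0]).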
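The behavior can be reproduced in a few lines. In this sketch the values 0.1 and 0.2 are placeholders for the two percentage changes returned by get_percent_change, and index=[0] is an arbitrary one-row index.

```python
import pandas as pd

# An empty DataFrame has no index, so a scalar assignment adds a column but no rows
empty = pd.DataFrame()
empty["Test"] = 3
print(empty)  # an empty frame: the Test column exists, but there are no rows

# Fix 1: create the frame with a one-row index first
metrics = pd.DataFrame(index=[0])
# 0.1 and 0.2 are placeholders for the values returned by get_percent_change
metrics["Perc_Chg_Vol_Call"], metrics["Perc_Chg_Open_Int_Call"] = 0.1, 0.2
metrics["Test"] = 3
print(metrics)

# Fix 2: collect the scalar results in a Series instead
as_series = pd.Series({"Perc_Chg_Vol_Call": 0.1, "Perc_Chg_Open_Int_Call": 0.2, "Test": 3})
print(as_series)
```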

DataFrame constructor not properly called! error

I am new to Python and I am having trouble creating a DataFrame in key/value format, i.e.:
data = [{'key':'[GlobalProgramSizeInThousands]','value':'1000'},]
Here is my code:
columnsss = ['key','value'];
query = "select * from bparst_tags where tag_type = 1 ";
result = database.cursor(db.cursors.DictCursor);
result.execute(query);
result_set = result.fetchall();
data = "[";
for row in result_set:
    data += "{'value': %s , 'key': %s }," % ( `row["tag_expression"]`, `row["tag_name"]` )
data += "]" ;
df = DataFrame(data , columns=columnsss);
But when I pass data to DataFrame, it raises:
pandas.core.common.PandasError: DataFrame constructor not properly called!
whereas if I print data and assign the printed value to the data variable, it works.

You are passing a string representation of a list of dicts to the DataFrame constructor, not the list itself. That is why you get that error.
So if you want to keep your code, you could do:
df = DataFrame(eval(data))
But it is better (and safer, since eval executes whatever expression the string contains) not to build the string in the first place and to put each row directly into a dict instead. Something roughly like:
data = []
for row in result_set:
    data.append({'value': row["tag_expression"], 'key': row["tag_name"]})
But even this is probably unnecessary: depending on exactly what is in your result_set, you could probably
pass it directly to the constructor: DataFrame(result_set)
or use the pandas read_sql_query function to do this for you (see its documentation).
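A runnable sketch of both suggestions. It assumes an in-memory SQLite database in place of the MySQL connection and DictCursor from the question, and the table rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

import pandas as pd

# Stand-in database: SQLite in memory instead of the question's MySQL connection
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bparst_tags (tag_name TEXT, tag_expression TEXT, tag_type INTEGER)")
conn.execute("INSERT INTO bparst_tags VALUES ('[GlobalProgramSizeInThousands]', '1000', 1)")
conn.execute("INSERT INTO bparst_tags VALUES ('[Other]', '5', 2)")

# Option 1: build a list of real dicts (not a string) and pass it to the constructor
cur = conn.execute("SELECT tag_name, tag_expression FROM bparst_tags WHERE tag_type = 1")
rows = [{"key": name, "value": expr} for name, expr in cur.fetchall()]
df_from_dicts = pd.DataFrame(rows, columns=["key", "value"])

# Option 2: let pandas run the query; column aliases give the key/value names directly
df_from_sql = pd.read_sql_query(
    'SELECT tag_name AS "key", tag_expression AS "value" FROM bparst_tags WHERE tag_type = ?',
    conn,
    params=(1,),
)
conn.close()

print(df_from_dicts)
print(df_from_sql)
```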

I just ran into the same error, but the answer above didn't help me.
My code, which looked like this, worked fine on my computer:
test_dict = {'x': '123', 'y': '456', 'z': '456'}
df=pd.DataFrame(test_dict.items(),columns=['col1','col2'])
However, it did not work on another platform; it gave the same error as in the original question. Simply wrapping the dictionary items in list() made it work smoothly. (In Python 3, dict.items() returns a view object rather than a list, and some older pandas versions do not accept it in the constructor.)
df=pd.DataFrame(list(test_dict.items()),columns=['col1','col2'])
Hopefully this helps anyone who runs into a similar situation.
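A sketch showing the list() fix next to an alternative that does not depend on how a given pandas version handles dict_items views: build a Series from the dict and turn its index into the first column. The column names col1 and col2 follow the snippet above.

```python
import pandas as pd

test_dict = {"x": "123", "y": "456", "z": "456"}

# Materialize the dict_items view into a list of (key, value) tuples
df_list = pd.DataFrame(list(test_dict.items()), columns=["col1", "col2"])

# Alternative: keys become the Series index, which reset_index turns into a column
df_series = pd.Series(test_dict, name="col2").rename_axis("col1").reset_index()

print(df_list)
print(df_series)
```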

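import json
import pandas as pd
# Open the JSON file; json.load returns the parsed
# object (here, a dictionary)
with open('data.json') as f:
    data1 = json.load(f)
# Convert the dictionary into a DataFrame (read_json expects a path or JSON text, not a dict)
df = pd.DataFrame.from_dict(data1, orient='index')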
