The code below worked fine until I tried to beautify the output using pandas. Can anyone please suggest how I can wrap the output in a tabular format with headers and borders?
Old code:
import eikon as ek
ek.set_app_key('8854542454521546fgf4f4gfg5f4')
df, err = ek.get_data('ESCO.NS',['TR.DivUnadjustedGross','TR.DivExDate','TR.DivType'],{'SDate': '2020-07-01','EDate': '2021-07-26','DivType': '61:70'})
print(df)
New code:
import eikon as ek
import pandas as pd
ek.set_app_key('8854542454521546fgf4f4gfg5f4')
df = pd.dataframe(ek.get_data('ESCO.NS',['TR.DivUnadjustedGross','TR.DivExDate','TR.DivType'],{'SDate': '2020-07-01','EDate': '2021-07-26','DivType': '61:70'}))
print(df, headers='Keys',tablefmt='psql')
The call ek.get_data returns two things (the data and an error object), so you need to unpack both before building the DataFrame. You can write it like this:
import eikon as ek
import pandas as pd
ek.set_app_key('8854542454521546fgf4f4gfg5f4')
data, err = ek.get_data('ESCO.NS',['TR.DivUnadjustedGross','TR.DivExDate','TR.DivType'],{'SDate': '2020-07-01','EDate': '2021-07-26','DivType': '61:70'})
df = pd.DataFrame(data)
print(df)
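If the goal is the bordered, psql-style table from the original attempt, the headers and tablefmt arguments belong to the tabulate package, not to print. A minimal sketch, assuming tabulate is installed (pip install tabulate) and df is the DataFrame built above:
from tabulate import tabulate

# Render the DataFrame with column headers and psql-style borders
print(tabulate(df, headers='keys', tablefmt='psql', showindex=False))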
Here is an example dataset, found via a Google search, that is close to the datasets in my environment. I'm trying to get output like this:
import pandas as pd
import numpy as np
data = {'Product':['Box','Bottles','Pen','Markers','Bottles','Pen','Markers','Bottles','Box','Markers','Markers','Pen'],
'State':['Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'],
'Sales':[14,24,31,12,13,7,9,31,18,16,18,14]}
df=pd.DataFrame(data, columns=['Product','State','Sales'])
df1=df.sort_values('State')
#df1['Total']=df1.groupby('State').count()
df1['line']=df1.groupby('State').cumcount()+1
print(df1.to_string(index=False))
The commented-out line throws this error:
ValueError: Columns must be same length as key
I tried with size(), but it gives NaN for all rows.
I hope someone can point me in the right direction.
Thanks in advance.
I think this should work for 'Total':
df1['Total']=df1.groupby('State')['Product'].transform(lambda x: x.count())
Try this:
df = pd.DataFrame(data).sort_values("State")
grp = df.groupby("State")
df["Total"] = grp["State"].transform("size")
df["line"] = grp.cumcount() + 1
I'm using pandas and requests to scrape IPs from https://free-proxy-list.net/, but how do I adjust this code
import requests
import pandas as pd
resp = requests.get('https://free-proxy-list.net/')
df = pd.read_html(resp.text)[0]
df = (df[(df['Anonymity'] == 'elite proxy')])
print(df.to_string(index=False))
so that the output is a list of IPs and nothing else? I managed to remove the index and filter to only elite proxies, but I can't create a variable that holds a list of just the IPs, without the index.
You can use loc to slice the column directly for the matching rows, and to_list to convert it to a list:
df.loc[df['Anonymity'].eq('elite proxy'), 'IP Address'].to_list()
output: ['134.119.xxx.xxx', '173.249.xxx.xxx'...]
To get the contents of the 'IP Address' column as a list, subset to that column and use .to_list().
Here's how:
print(df['IP Address'].to_list())
It looks like you are trying to accomplish something like below:
print(df['IP Address'].to_string(index=False))
Also, it would be a good idea, after filtering your DataFrame, to reset its index like below:
df = df.reset_index(drop=True)
So the code snippet would be something like this:
import requests
import pandas as pd
resp = requests.get('https://free-proxy-list.net/')
df = pd.read_html(resp.text)[0]
df = (df[(df['Anonymity'] == 'elite proxy')])
df = df.reset_index(drop=True)
print(df['IP Address'].to_string(index=False))
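If the end goal is a Python variable holding just the IPs, the answers above can be combined into one snippet; the name ips below is just for illustration:
import requests
import pandas as pd

resp = requests.get('https://free-proxy-list.net/')
df = pd.read_html(resp.text)[0]

# Filter to elite proxies and pull the IP column out as a plain Python list
ips = df.loc[df['Anonymity'] == 'elite proxy', 'IP Address'].to_list()
print(ips)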
I have a DataFrame generated from the code below:
import pandas as pd
import pandas_datareader.data as web
from datetime import datetime
start=datetime(2018, 3, 1)
end=datetime(2018,3,12)
symbol = ['AAPL' , 'IBM' , 'MSFT' , 'GOOG']
Morningstar=web.DataReader(symbol, data_source='morningstar',start=start, end=end)
dfResetMorningstar=Morningstar.reset_index()
pricemine=dfResetMorningstar[['Symbol','Date','Close']]
pricemine.set_index(['Symbol','Date'], inplace=True)
Result:
[image: DataFrame with a (Symbol, Date) MultiIndex and a single Close column]
I would like to transform the DataFrame into a format similar to the one below (the values would be the 'Close' data):
[image: table with Date as the rows and one column of Close prices per symbol]
I'm not sure how this can be achieved using the "groupby" method. Any feedback would be much appreciated. I'm also open to other approaches that don't use "groupby".
Thank you!
Use unstack on the first index level:
pricemine = pricemine['Close'].unstack(0)
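The morningstar data source has since been deprecated in pandas-datareader, so here is a self-contained sketch of the same reshape with made-up Close prices, just to show what unstack(0) does to a (Symbol, Date) MultiIndex:
import pandas as pd

# Stand-in for pricemine: a (Symbol, Date) MultiIndex with a Close column
idx = pd.MultiIndex.from_product(
    [['AAPL', 'IBM'], pd.to_datetime(['2018-03-01', '2018-03-02'])],
    names=['Symbol', 'Date'])
pricemine = pd.DataFrame({'Close': [175.0, 176.2, 154.5, 155.1]}, index=idx)

# Move the first index level (Symbol) into the columns: one column per ticker
print(pricemine['Close'].unstack(0))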
I have a generated file as follows:
[{"intervals": [{"overwrites": 35588.4, "latency": 479.52}, {"overwrites": 150375.0, "latency": 441.1485001192274}], "uid": "23"}]
I simplified the file a bit for space reasons (there are more columns besides "overwrites" and "latency"). I would like to import the data into a DataFrame so I can later plot the latency. I tried the following:
import os
import json
import pandas as pd
with open(os.path.join(path, "my_file.json")) as json_file:
    curr_list = json.load(json_file)
df = pd.Series(curr_list[0]['intervals'])
print df
which returned:
0 {u'overwrites': 35588.4, u'latency...
1 {u'overwrites': 150375.0, u'latency...
However, I couldn't store df in a data structure that lets me access the latency field as follows:
graph = df[['latency']]
graph.plot(title="latency")
Any ideas?
Thanks for the help!
I think you can use json_normalize:
import pandas as pd
from pandas.io.json import json_normalize
data = [{"intervals": [{"overwrites": 35588.4, "latency": 479.52},
{"overwrites": 150375.0, "latency": 441.1485001192274}],
"uid": "23"}]
result = json_normalize(data, 'intervals', ['uid'])
print result
latency overwrites uid
0 479.5200 35588.4 23
1 441.1485 150375.0 23
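To get the plot the question was after, the latency column of that normalized frame can be selected and plotted directly. Note that in newer pandas versions json_normalize is available at the top level as pd.json_normalize. A rough sketch:
import pandas as pd
import matplotlib.pyplot as plt

data = [{"intervals": [{"overwrites": 35588.4, "latency": 479.52},
                       {"overwrites": 150375.0, "latency": 441.1485001192274}],
         "uid": "23"}]

result = pd.json_normalize(data, 'intervals', ['uid'])

# Select the latency column and plot it, as the original attempt intended
result[['latency']].plot(title="latency")
plt.show()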
I've pulled some stock data from Quandl for both Crude Oil (WTI) prices and Caterpillar (CAT) prices. When I concatenate the two DataFrames I'm left with some NaNs. My ultimate goal is to run pearsonr() to assess the correlation (along with p-values), but I can't get pearsonr() to work because of all the NaNs, so I'm trying to clean them up. When I use the .fillna() function it doesn't seem to be working. I've even tried .interpolate() as well as .dropna(). None of them appear to work. Here is my working code.
import Quandl
import pandas as pd
import numpy as np
#WTI Data#
WTI_daily = Quandl.get("DOE/RWTC", collapse="daily",trim_start="1986-10-10", trim_end="1986-10-15")
WTI_daily.columns = ['WTI']
#CAT Data
CAT_daily = Quandl.get("YAHOO/CAT.6", collapse = "daily",trim_start="1986-10-10", trim_end="1986-10-15")
CAT_daily.columns = ['CAT']
#Combine Data Frames
daily_price_df = pd.concat([CAT_daily, WTI_daily], axis=1)
print daily_price_df
#Verify they are dataFrames:
def really_a_df(var):
    if isinstance(var, pd.DataFrame):
        print "DATAFRAME SUCCESS"
    else:
        print "Wahh Wahh"
    return 'done'
print really_a_df(daily_price_df)
#Fill NAs
#CAN'T GET THIS TO WORK!!
daily_price_df.fillna(method='pad', limit=8)
print daily_price_df
# Try to interpolate
#CAN'T GET THIS TO WORK!!
daily_price_df.interpolate()
print daily_price_df
#Drop NAs
#CAN'T GET THIS TO WORK!!
daily_price_df.dropna(axis=1)
print daily_price_df
For what it's worth, I've managed to get the function working when I create a DataFrame from scratch using this code:
import pandas as pd
import numpy as np
d = {'a' : 0., 'b' : 1., 'c' : 2.,'d':None,'e':6}
d_series = pd.Series(d, index=['a', 'b', 'c', 'd','e'])
d_df = pd.DataFrame(d_series)
d_df = d_df.fillna(method='pad')
print d_df
Initially I was thinking that perhaps my data wasn't in DataFrame form, but I used a simple test to confirm that both are in fact DataFrames. The only conclusion that remains (in my opinion) is that it is something about the structure of the Quandl DataFrame, or possibly its time-series nature. Please know I'm somewhat new to Python, so structure answers for a beginner/novice. Any help is much appreciated!
Pot shot: have you just forgotten to assign the result or to use the inplace flag?
daily_price_df = daily_price_df.fillna(method='pad', limit=8)
OR
daily_price_df.fillna(method='pad', limit=8, inplace=True)
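The same applies to the interpolate and dropna calls later in the question: by default these methods return a new DataFrame rather than modifying the existing one in place. A quick sketch of the assignment pattern, reusing daily_price_df from the question:
# Each call returns a new DataFrame, so capture the result
daily_price_df = daily_price_df.fillna(method='pad', limit=8)
daily_price_df = daily_price_df.interpolate()
daily_price_df = daily_price_df.dropna(axis=1)
print(daily_price_df)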