Python: DataFrame restructure - python

I have a DataFrame generated by the code below:
import pandas as pd
import pandas_datareader.data as web
from datetime import datetime
start=datetime(2018, 3, 1)
end=datetime(2018,3,12)
symbol = ['AAPL' , 'IBM' , 'MSFT' , 'GOOG']
Morningstar=web.DataReader(symbol, data_source='morningstar',start=start, end=end)
dfResetMorningstar=Morningstar.reset_index()
pricemine=dfResetMorningstar[['Symbol','Date','Close']]
pricemine.set_index(['Symbol','Date'], inplace=True)
Result:
[image: the resulting DataFrame, indexed by Symbol and Date with a Close column]
I would like to transform the DataFrame into a format similar to the one below, with the values taken from the ['Close'] column:
[image: the desired layout]
I'm not sure how this can be achieved using the "groupby" command. Any feedback would be much appreciated. I am also open to other approaches that do not use "groupby".
Thank you!

Use unstack by first level:
pricemine = pricemine['Close'].unstack(0)
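As a minimal sketch of what unstack(0) does here, the example below uses a small hand-made frame in place of the Morningstar download. The prices are made-up values, not real quotes, and the name `wide` is only for illustration.

```python
import pandas as pd

# Made-up closing prices standing in for the downloaded data.
raw = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "IBM", "IBM"],
    "Date": pd.to_datetime(
        ["2018-03-01", "2018-03-02", "2018-03-01", "2018-03-02"]
    ),
    "Close": [175.0, 176.2, 153.8, 154.5],
})
pricemine = raw.set_index(["Symbol", "Date"])

# Level 0 of the index (Symbol) moves to the columns;
# Date stays behind as the row index.
wide = pricemine["Close"].unstack(0)
print(wide)
```

If the data is still in long form with plain columns, raw.pivot(index="Date", columns="Symbol", values="Close") should give the same layout without setting the index first.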

Related

Pandas Dataframe display total

Here is an example dataset, found through a Google search, that is close to the datasets in my environment.
I'm trying to get output that includes a Total column per state:
import pandas as pd
import numpy as np
data = {'Product':['Box','Bottles','Pen','Markers','Bottles','Pen','Markers','Bottles','Box','Markers','Markers','Pen'],
'State':['Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'],
'Sales':[14,24,31,12,13,7,9,31,18,16,18,14]}
df=pd.DataFrame(data, columns=['Product','State','Sales'])
df1=df.sort_values('State')
#df1['Total']=df1.groupby('State').count()
df1['line']=df1.groupby('State').cumcount()+1
print(df1.to_string(index=False))
The commented-out line throws this error:
ValueError: Columns must be same length as key
I also tried size(), but it gives NaN for all rows.
I hope someone can point me in the right direction.
Thanks in advance.
I think this should work for 'Total':
df1['Total']=df1.groupby('State')['Product'].transform(lambda x: x.count())
Try this:
df = pd.DataFrame(data).sort_values("State")
grp = df.groupby("State")
df["Total"] = grp["State"].transform("size")
df["line"] = grp.cumcount() + 1
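To see why the commented-out line fails while transform works, here is a small sketch on a shortened, made-up version of the dataset. count() returns one row per state, which cannot be assigned to a single column of the original frame, whereas transform returns one value per original row.

```python
import pandas as pd

# Shortened, made-up version of the question's data.
df = pd.DataFrame({
    "Product": ["Box", "Pen", "Pen", "Box", "Markers"],
    "State": ["Texas", "Alaska", "Texas", "Texas", "Alaska"],
    "Sales": [14, 24, 31, 12, 13],
}).sort_values("State")

grp = df.groupby("State")

# count() collapses the frame to one row per state and keeps one
# column per non-key column, so it does not line up with df's rows.
per_state = grp.count()
print(per_state.shape)

# transform("size") broadcasts each group's size back onto every row.
df["Total"] = grp["State"].transform("size")
df["line"] = grp.cumcount() + 1
print(df.to_string(index=False))
```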

How to use pandas to get output in tabular format

The code below worked fine until I tried to beautify the output using pandas. Can anyone please suggest how I can wrap the output in a tabular format with headers and borders?
Old code:
import eikon as ek
ek.set_app_key('8854542454521546fgf4f4gfg5f4')
df, err = ek.get_data('ESCO.NS',['TR.DivUnadjustedGross','TR.DivExDate','TR.DivType'],{'SDate': '2020-07-01','EDate': '2021-07-26','DivType': '61:70'})
print(df)
New code :
import eikon as ek
import pandas as pd
ek.set_app_key('8854542454521546fgf4f4gfg5f4')
df = pd.dataframe(ek.get_data('ESCO.NS',['TR.DivUnadjustedGross','TR.DivExDate','TR.DivType'],{'SDate': '2020-07-01','EDate': '2021-07-26','DivType': '61:70'}))
print(df, headers='Keys',tablefmt='psql')
The call to ek.get_data returns two things (the data and an error object), so unpack them first and then build the DataFrame:
import eikon as ek
import pandas as pd
ek.set_app_key('8854542454521546fgf4f4gfg5f4')
data, err = ek.get_data('ESCO.NS',['TR.DivUnadjustedGross','TR.DivExDate','TR.DivType'],{'SDate': '2020-07-01','EDate': '2021-07-26','DivType': '61:70'})
df = pd.DataFrame(data)
print(df)
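For the borders and headers themselves, note that headers= and tablefmt= are not arguments of print(); they belong to the third-party tabulate package. The sketch below assumes tabulate may or may not be installed and falls back to plain DataFrame.to_string() when it is missing. The DataFrame is a hypothetical stand-in for what ek.get_data() returns; its column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by ek.get_data();
# the column names and values are made up for illustration.
df = pd.DataFrame({
    "Instrument": ["ESCO.NS", "ESCO.NS"],
    "Dividend": [2.5, 5.0],
    "Type": ["Interim", "Final"],
})

try:
    # tabulate is a separate package (pip install tabulate).
    from tabulate import tabulate
    table = tabulate(df, headers="keys", tablefmt="psql", showindex=False)
except ImportError:
    # Plain-text fallback without borders if tabulate is unavailable.
    table = df.to_string(index=False)

print(table)
```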

Pandas Python problem

import pandas as pd
nba = pd.read_csv("nba.csv")
names = pd.Series(nba['Name'])
data = nba['Salary']
nba_series = (data, index=[names])
print(nba_series)
Hello, I am trying to convert the columns 'Name' and 'Salary' from a DataFrame into a Series. I need to set the names as the index and the salaries as the values, but I cannot figure it out. This is my best attempt so far; any guidance is appreciated.
I think you are overthinking this. Simply construct it with pd.Series(). Note that the data must be passed with .values; otherwise pandas aligns the Salary series on the new index labels and you'll get NaNs.
import pandas as pd
nba = pd.read_csv("nba.csv")
nba_series = pd.Series(data=nba['Salary'].values, index=nba['Name'])
Maybe try set_index?
nba.set_index('Name', inplace=True)
nba_series = nba['Salary']
This might help you
import pandas as pd
nba = pd.read_csv("nba.csv")
names = nba['Name']
#It's automatically a series
data = nba['Salary']
#Set names as index of series
data.index = names
Assigning data.index = names is the intended step here, though whether it gives the result you want depends on the data.
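The NaN behaviour mentioned in the first answer comes from index alignment. A minimal sketch, using a made-up three-row frame in place of nba.csv (the player names and salaries are placeholders):

```python
import pandas as pd

# Small stand-in for nba.csv; names and salaries are made up.
nba = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Salary": [1000000.0, 2500000.0, 750000.0],
})

# Passing a Series as data re-aligns it on the new index labels.
# The old labels are 0, 1, 2, so none match the names: all NaN.
misaligned = pd.Series(data=nba["Salary"], index=nba["Name"])

# Option 1: drop the old index first by passing a plain array.
by_values = pd.Series(data=nba["Salary"].to_numpy(), index=nba["Name"])

# Option 2: let set_index do the work, then select the column.
by_set_index = nba.set_index("Name")["Salary"]

print(misaligned)
print(by_values)
print(by_set_index)
```

Both options give a Series of salaries indexed by name. The set_index version also keeps "Salary" as the Series name.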

Convert columns in a DataFrame with commas into numeric data for plotting

I'm new to plotting in Python. I started learning today by doing a mini project on my own: I tried to scrape some data and plot it. Here's my code:
import requests
import pandas as pd
from pandas import DataFrame
import numpy as np
import bs4
from bs4 import BeautifulSoup
import matplotlib.pyplot as plot
# Getting the HTML page
URL = "https://www.worldometers.info/coronavirus/#countries"
pag_html = requests.get(URL).text
# Extracting data with BeautifulSoup.
soup = BeautifulSoup(pag_html, 'html.parser')
tabla = soup.find("table", id="main_table_countries_today")
datos_tabla = tabla.tbody.find_all("tr")
Lista = []
for x in range(len(datos_tabla)):
    values = [j.string for j in datos_tabla[x].find_all('td')]
    Lista.append(values)
df = pd.DataFrame(Lista).iloc[7: , 1:9]
nombre_columna = ["Pais", "Casos totales", "Nuevos Casos", "Muertes totales", "Nuevas Muertes", "Total Recuperados", "Nuevos Recuperados", "Activos"]
df.columns = nombre_columna
df.plot(x="Pais", y="Casos totales", kind ="barh")
plot.show()
The error it gives me is: "TypeError: no numeric data to plot". I understand that this happens because the column "Casos totales" holds strings, not floats.
I tried to convert the columns of my DataFrame to floats, but every attempt raised an error.
Does anyone have any idea how I can plot my DataFrame?
Thanks.
As you say, after running the script the column "Casos totales" is interpreted as a string because of the commas in the values. You can fix this with .str.replace(',','') followed by .astype(float), right after renaming the columns of your DataFrame:
df['Casos totales'] = df['Casos totales'].str.replace(',','').astype(float)
df.plot(x="Pais", y="Casos totales", kind ="barh")
plot.show()
This plots the graph (the visualization is quite poor, but that's another story).
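If some scraped cells are empty or contain something other than digits and commas, .astype(float) can raise an error. A slightly more forgiving sketch uses pd.to_numeric with errors="coerce", which turns anything it cannot parse into NaN. The rows below are made up to mimic the shape of the scraped table.

```python
import pandas as pd

# Made-up rows shaped like the scraped table: numbers arrive as strings
# with thousands separators, and some cells may be missing (None).
df = pd.DataFrame({
    "Pais": ["A", "B", "C"],
    "Casos totales": ["1,234,567", "89,012", None],
})

# Strip the commas as literal characters, then convert;
# values that cannot be parsed become NaN instead of raising.
cleaned = df["Casos totales"].str.replace(",", "", regex=False)
df["Casos totales"] = pd.to_numeric(cleaned, errors="coerce")

print(df.dtypes)
print(df)
```

Once the column is numeric, the df.plot(x="Pais", y="Casos totales", kind="barh") call from the question has numeric data to work with.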

how to extract date/time parameters from a list of strings?

I have a pandas DataFrame with a column like this:
from pandas import DataFrame
df = DataFrame({ 'column_name' : [u'Monday,30 December,2013', u'Delivered', u'19:23', u'1']})
Now I want to extract everything from it and store it in three columns:
date status time
[30/December/2013] ['Delivered'] [19:23]
So far I have tried this:
import dateutil.parser as dparser
dparser.parse([u'Monday,30 December,2013', u'Delivered', u'19:23', u'1'])
but this throws an error. Can anyone please guide me to a solution?
You can apply() a function to a column. Here is the whole example:
from pandas import DataFrame
df = DataFrame({'date': ['Monday,30 December,2013'], 'delivery': ['Delivered'], 'time': ['19:23'], 'status':['1']})
# delete the status column
del df['status']
def splitter(val):
    parts = val.split(',')
    return parts[1]
df['date'] = df['date'].apply(splitter)
This yields a dataframe with date, delivery and the time.
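Note that splitter keeps only the middle part of the string ("30 December") and drops the year. If the 30/December/2013 layout from the question is wanted, one possible alternative is to parse the full string with an explicit format and re-render it. This is a sketch that assumes every value follows the same "Weekday,day Month,year" pattern and that month names are in English (the default locale).

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["Monday,30 December,2013"],
    "delivery": ["Delivered"],
    "time": ["19:23"],
})

# Parse the whole string, including the year, with an explicit format.
parsed = pd.to_datetime(df["date"], format="%A,%d %B,%Y")

# Render it in the day/month-name/year layout asked for in the question.
df["date"] = parsed.dt.strftime("%d/%B/%Y")
print(df)
```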
