import pandas as pd
nba = pd.read_csv("nba.csv")
names = pd.Series(nba['Name'])
data = nba['Salary']
nba_series = (data, index=[names])
print(nba_series)
Hello, I'm trying to convert the columns 'Name' and 'Salary' from a dataframe into a series. I need to set the names as the index and the salaries as the values, but I can't figure it out. This is my best attempt so far; any guidance is appreciated.
I think you are overthinking this. Simply construct it with pd.Series(). Note that the data needs to be passed with .values, otherwise you'll get NaNs from index alignment.
import pandas as pd
nba = pd.read_csv("nba.csv")
nba_series = pd.Series(data=nba['Salary'].values, index=nba['Name'])
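For illustration, a minimal sketch with a made-up two-row frame standing in for nba.csv:

```python
import pandas as pd

# Hypothetical stand-in for nba.csv
nba = pd.DataFrame({"Name": ["Avery Bradley", "Jae Crowder"],
                    "Salary": [7730337, 6796117]})

# Passing the Series itself as data would align its 0..n index against
# the 'Name' index and produce NaNs; .values strips the index first.
nba_series = pd.Series(data=nba["Salary"].values, index=nba["Name"])
print(nba_series)
```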
Maybe try set_index?
nba.set_index('Name', inplace=True)
nba_series = nba['Salary']
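Note the column is 'Name' (capital N) and the keyword is inplace; a minimal sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical stand-in for nba.csv
nba = pd.DataFrame({"Name": ["Avery Bradley", "Jae Crowder"],
                    "Salary": [7730337, 6796117]})

nba.set_index("Name", inplace=True)   # move 'Name' into the index
nba_series = nba["Salary"]            # a Series indexed by Name
print(nba_series)
```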
This might help you
import pandas as pd
nba = pd.read_csv("nba.csv")
names = nba['Name']
#It's automatically a series
data = nba['Salary']
#Set names as index of series
data.index = names
data.index = names might be correct but depends on the data
Here is an example dataset, found via a Google search, that is close to the datasets in my environment.
I'm trying to get output like this
import pandas as pd
import numpy as np
data = {'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
        'State': ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas', 'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
        'Sales': [14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14]}
df = pd.DataFrame(data, columns=['Product', 'State', 'Sales'])
df1=df.sort_values('State')
#df1['Total']=df1.groupby('State').count()
df1['line']=df1.groupby('State').cumcount()+1
print(df1.to_string(index=False))
The commented-out line throws this error:
ValueError: Columns must be same length as key
I also tried size(), but it gives NaN for all rows.
I hope someone can point me in the right direction. Thanks in advance.
I think this should work for 'Total':
df1['Total']=df1.groupby('State')['Product'].transform(lambda x: x.count())
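As background on the original error: groupby('State').count() aggregates down to one row per state, so its shape cannot line up with the full-length df1 column you assign it to, whereas transform keeps one value per original row. A sketch of the difference, using a cut-down version of the question's data:

```python
import pandas as pd

data = {"Product": ["Box", "Bottles", "Pen"],
        "State": ["Alaska", "California", "Alaska"],
        "Sales": [14, 24, 31]}
df1 = pd.DataFrame(data)

# count() aggregates: one row per state, one column per remaining column,
# so assigning it to a 3-row column raises "Columns must be same length as key".
counts = df1.groupby("State").count()
print(counts.shape)  # (2, 2)

# transform keeps the original row alignment, so it assigns cleanly.
df1["Total"] = df1.groupby("State")["Product"].transform("count")
print(df1["Total"].tolist())  # [2, 1, 2]
```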
Try this:
df = pd.DataFrame(data).sort_values("State")
grp = df.groupby("State")
df["Total"] = grp["State"].transform("size")
df["line"] = grp.cumcount() + 1
I am using investpy to get historical stock data for 2 stocks ( TRP_pb , TRP_pc )
import investpy
import pandas as pd
import numpy as np
TRP_pb = investpy.get_stock_historical_data(stock='TRP_pb',
                                            country='canada',
                                            from_date='01/01/2022',
                                            to_date='01/04/2022')
print(TRP_pb.head())
TRP_pc = investpy.get_stock_historical_data(stock='TRP_pc',
                                            country='canada',
                                            from_date='01/01/2022',
                                            to_date='01/04/2022')
print(TRP_pc.head())
I can combine the two tables with the append method (deprecated in recent pandas in favour of pd.concat):
appendedtable = TRP_pb.append(TRP_pc, ignore_index=False)
What I am trying to do is to use a loop function in order to combine these two tables
Here is what I have tried so far
preferredlist = ['TRP_pb', 'TRP_pc']
for i in preferredlist:
    new = investpy.get_stock_historical_data(stock=i,
                                             country='canada',
                                             from_date='01/01/2022',
                                             to_date='01/04/2022')
    new.append(new, ignore_index=True)
However, this doesn't work.
I would appreciate any help.
Since get_stock_historical_data returns a DataFrame, you can create an empty dataframe before the for loop and concat in the loop.
preferredlist = ['TRP_pb','TRP_pc']
final_list = pd.DataFrame()
for i in preferredlist:
    new = investpy.get_stock_historical_data(stock=i,
                                             country='canada',
                                             from_date='01/01/2022',
                                             to_date='01/04/2022')
    final_list = pd.concat([final_list, new])
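An alternative sketch that avoids growing the DataFrame inside the loop: collect the frames in a list and concat once at the end. fetch() below is a hypothetical stand-in for the investpy call, since the real one needs network access:

```python
import pandas as pd

def fetch(symbol):
    # Stand-in for investpy.get_stock_historical_data(stock=symbol, ...)
    return pd.DataFrame({"Close": [10.0, 11.0]}).assign(Symbol=symbol)

preferredlist = ["TRP_pb", "TRP_pc"]
frames = [fetch(sym) for sym in preferredlist]   # collect first...
combined = pd.concat(frames, ignore_index=True)  # ...then concat once
print(combined)
```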
So I'm using pandas and requests to scrape IPs from https://free-proxy-list.net/, but how do I change this code
import pandas as pd
import requests

resp = requests.get('https://free-proxy-list.net/')
df = pd.read_html(resp.text)[0]
df = df[df['Anonymity'] == 'elite proxy']
print(df.to_string(index=False))
so that the output is a list of IPs, with nothing else. I managed to remove the index and keep only the elite proxies, but I can't make a variable that is a list of just the IPs, without the index.
You can use loc to slice directly the column for the matching rows, and to_list to convert to list:
df.loc[df['Anonymity'].eq('elite proxy'), 'IP Address'].to_list()
output: ['134.119.xxx.xxx', '173.249.xxx.xxx'...]
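A self-contained sketch with a made-up table in place of the scraped one:

```python
import pandas as pd

# Hypothetical rows standing in for the scraped free-proxy-list table
df = pd.DataFrame({"IP Address": ["1.2.3.4", "5.6.7.8", "9.9.9.9"],
                   "Anonymity": ["elite proxy", "anonymous", "elite proxy"]})

# Filter the rows and select the column in one .loc, then convert to a list
ips = df.loc[df["Anonymity"].eq("elite proxy"), "IP Address"].to_list()
print(ips)  # ['1.2.3.4', '9.9.9.9']
```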
To get the contents of the 'IP Address' column, subset to that column and use .to_list().
Here's how:
print(df['IP Address'].to_list())
It looks like you are trying to accomplish something like below:
print(df['IP Address'].to_string(index=False))
Also, after filtering your dataframe, it would be a good idea to reset its index like below:
df = df.reset_index(drop=True)
So the code snippet would be something like this:
import pandas as pd
import requests

resp = requests.get('https://free-proxy-list.net/')
df = pd.read_html(resp.text)[0]
df = (df[(df['Anonymity'] == 'elite proxy')])
df = df.reset_index(drop=True)
print(df['IP Address'].to_string(index=False))
In a dataframe, I have a column "UnixTime" and want to convert it to a new column containing the UTC time.
import pandas as pd
from datetime import datetime
df = pd.DataFrame([1565691196, 1565691297, 1565691398], columns = ["UnixTime"])
unix_list = df["UnixTime"].tolist()
utc_list = []
for i in unix_list:
    i = datetime.utcfromtimestamp(i).strftime('%Y-%m-%d %H:%M:%S')
    utc_list.append(i)
df["UTC"] = utc_list
This works, but I guess there is a smarter approach?
Could you try this:
df["UTC"] = pd.to_datetime(df['UnixTime'], unit='s')
If by a smarter approach you mean the pandas way, with less code, then this is your answer:
df["UTC"] = pd.to_datetime(df["UnixTime"], unit = "s")
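One caveat: pd.to_datetime produces datetime64 values, while the original loop produced formatted strings. If the string form is what's needed, .dt.strftime can be chained on; a sketch:

```python
import pandas as pd

df = pd.DataFrame([1565691196, 1565691297, 1565691398], columns=["UnixTime"])

# Vectorized conversion; yields datetime64 values (Unix time is UTC-based)
df["UTC"] = pd.to_datetime(df["UnixTime"], unit="s")

# Formatted strings matching the original strftime pattern
df["UTC_str"] = df["UTC"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```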
Hope this helps.
I'm parsing some HTML data using Pandas like this:
rankings = pd.read_html('https://en.wikipedia.org/wiki/Rankings_of_universities_in_the_United_Kingdom')
university_guide = rankings[0]
This gives me a nice data frame:
What I want is to reshape this data frame so that there are only two columns (rank and university name). My current solution is to do something like this:
ug_copy = rankings[0][1:]
npa1 = ug_copy.as_matrix( columns=[0,1] )
npa2 = ug_copy.as_matrix( columns=[2,3] )
npa3 = ug_copy.as_matrix( columns=[4,5] )
npam = np.append(npa1,npa2)
npam = np.append(npam,npa3)
reshaped = npam.reshape((npam.size // 2, 2))
pd.DataFrame(data=reshaped)
This gives me what I want, but it doesn't seem like it could possibly be the best solution. I can't seem to find a good way to complete this all using a data frame. I've tried using stack/unstack and pivoting the data frame (as some of the other solutions here have suggested), but I haven't had any luck. I've tried doing something like this:
ug_copy.columns=['Rank','University','Rank','University','Rank','University']
ug_copy = ug_copy[1:]
ug_copy.groupby(['Rank', 'University'])
There has to be something small I'm missing!
This is probably a bit shorter (also note that you can use the header option in read_html to save a bit of work):
import pandas as pd
rankings = pd.read_html('https://en.wikipedia.org/wiki/Rankings_of_universities_in_the_United_Kingdom', header=0)
university_guide = rankings[0]
df = pd.DataFrame(university_guide.values.reshape((30, 2)), columns=['Rank', 'University'])
df = df.sort_values('Rank').reset_index(drop=True)
print(df)
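Another sketch that stays in pandas the whole way: slice each (Rank, University) column pair, give the pairs common column names, and concat them. The mini-frame below is hypothetical, standing in for the parsed Wikipedia table:

```python
import pandas as pd

# Hypothetical two-row version of the six-column table
ug = pd.DataFrame([[1, "A", 3, "C", 5, "E"],
                   [2, "B", 4, "D", 6, "F"]])

# Take columns (0,1), (2,3), (4,5) as separate frames with shared names
pairs = [ug.iloc[:, i:i + 2].set_axis(["Rank", "University"], axis=1)
         for i in range(0, ug.shape[1], 2)]

# Stack the pairs vertically and order by rank
df = pd.concat(pairs, ignore_index=True).sort_values("Rank").reset_index(drop=True)
print(df)
```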