Keep getting the ValueError 'Index contains duplicate entries, cannot reshape' - python

I want to pull ticker data for all the S&P 500 stocks from Yahoo.
I saved the S&P 500 ticker symbols into a list from a local CSV file that I made.
When I run the following code, I get:
ValueError 'Index contains duplicate entries, cannot reshape'
However, I noticed that this problem doesn't seem to occur with shorter stock lists, and I can't figure out why. Any help would be greatly appreciated.
import pandas as pd
import numpy as np
from pandas_datareader import data
from statsmodels.tsa.stattools import coint
import matplotlib.pyplot as plt
from pyfinance.ols import PandasRollingOLS
sp500=pd.read_csv('sp500 stocks list.csv')
sp500_list=[]
for i in sp500:
    sp500_list.append(i)
dataframe=data.DataReader(sp500_list, 'yahoo',start='2020/01/01')
print(dataframe)
I have tried dataframe = dataframe.drop_duplicates(sp500_list), but it still gives me the same ValueError.
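One possible cause, offered as a guess rather than a diagnosis: a longer ticker list is more likely to contain a repeated symbol, and reshaping long-format data into one column per symbol fails when the same (date, symbol) pair appears twice. If the ValueError is raised inside the DataReader call, a later drop_duplicates never runs; in addition, the first positional argument of drop_duplicates is subset, a list of column labels, not a list of values to remove. The sketch below uses made-up tickers and prices (no download) to reproduce the message with pivot and to show de-duplicating the list before using it.

```python
import pandas as pd

# Hypothetical ticker list containing a repeated symbol.
tickers = ["AAPL", "MSFT", "AAPL", "GOOG"]

# Long-format prices, one row per (Date, Symbol) pair; the values are made up.
long_prices = pd.DataFrame({
    "Date": ["2020-01-02"] * len(tickers),
    "Symbol": tickers,
    "Close": [1.0, 2.0, 1.0, 3.0],
})

# Reshaping to one column per symbol fails because ("2020-01-02", "AAPL")
# occurs twice.
try:
    long_prices.pivot(index="Date", columns="Symbol", values="Close")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

# Remove repeated symbols while keeping the original order.
unique_tickers = list(dict.fromkeys(tickers))

# With duplicates removed, the same reshape succeeds.
deduped = long_prices.drop_duplicates(subset=["Date", "Symbol"])
wide = deduped.pivot(index="Date", columns="Symbol", values="Close")

print(pivot_error)
print(unique_tickers)
print(wide)
```

Note also that iterating over a DataFrame yields its column labels, so the loop in the question collects tickers only if they are stored in the CSV header row. If they are stored in a column instead, something like sp500['Symbol'].tolist() would be needed, where 'Symbol' is an assumed column name.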

Related

'DataFrame' object without attribute error

I am trying to run basic from-scratch code for linear regression. It gives me this error even though the CSV file contains a column header named "studytime", which is the column the error refers to.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('/Users/brasilgu/Downloads/student/student-por.csv')
plt.scatter(data.studytime, data.score)
plt.show()
Check your data with
print(data.columns)
to ensure you have no typos, etc. If that doesn't reveal the problem, you'll need to include a sample of the data so the issue can be reproduced.
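As a sketch of what that check can reveal: if a file uses a delimiter other than a comma, read_csv with default settings reads the whole header as a single column, so attribute access such as data.studytime fails. The CSV text below is made up for illustration, and the semicolon delimiter is an assumption, not something confirmed about the file in the question.

```python
import io
import pandas as pd

# Made-up CSV text that uses ';' as the delimiter (an assumption).
csv_text = "studytime;score\n2;10\n3;12\n"

# Default settings split on ',' so the header becomes one column.
wrong = pd.read_csv(io.StringIO(csv_text))
wrong_columns = list(wrong.columns)

# Passing the actual delimiter gives the expected columns.
right = pd.read_csv(io.StringIO(csv_text), sep=";")
right_columns = list(right.columns)

print(wrong_columns)
print(right_columns)
```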

Visualising time series and float column data with Python

I have the following question:
What can you tell about the relationship between time and speed? Is there a best time of day to connect? Has it changed throughout the years?
This is my dataframe: [image: dataframe] [image: my columns] [image: data]
Does anyone have any suggestions on how I should approach this question?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('/Users/dimagoroh/Desktop/data_vis/big_file.csv', low_memory=False)
sns.lmplot(x="hours",y="speed",data=df)
I'm trying to make a plot but get this error. I think I need to convert the hour column to a different data type; right now it is set as object.
Please post the error you get. From the data I think you need to pass x="hour" and not x="hours". Also try
df.hour = pd.to_datetime(df.hour)
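A minimal sketch of how the conversion could feed into the "best time of day" question, assuming the hour column holds strings like "08:00" and that speed may also have been read as text. The column names, the time format, and the values are made up for illustration.

```python
import pandas as pd

# Made-up data; "hour" and "speed" are both object (string) columns here.
df = pd.DataFrame({
    "hour": ["08:00", "08:00", "20:00", "20:00"],
    "speed": ["10.0", "12.0", "30.0", "34.0"],
})

# Parse the time strings and keep only the hour of day as an integer.
df["hour_of_day"] = pd.to_datetime(df["hour"], format="%H:%M").dt.hour

# Convert speed to numbers; values that cannot be parsed become NaN.
df["speed"] = pd.to_numeric(df["speed"], errors="coerce")

# Average speed for each hour of the day.
mean_speed = df.groupby("hour_of_day")["speed"].mean()
print(mean_speed)
```

Plotting mean_speed (for example with mean_speed.plot()) then shows which hours tend to be faster; grouping by year as well would address whether the pattern has changed over time.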

Can't choose a column of a data frame on Python

I'm trying to plot a figure on Python but I get a KeyError. I can't read the column "Cost per Genome" for some reason.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("Sequencing_Cost_Data_Table_Aug2021 - Data Table.csv") #The data can be found here: https://docs.google.com/spreadsheets/d/1auLPEnAp0aI__zIyK9fKBAkLpwQpOFBx9qOWgJoh0xY/edit#gid=729639239
fig = plt.figure()
plt.plot(data["Date"],data["Cost per Genome"])
It looks like either the data was not read into the DataFrame the way you expected, or there is an error in the plot call. Read this; it might help you further: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
P.S. I couldn't access your spreadsheet. It is available on request only.
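One thing worth ruling out, sketched here with made-up CSV text since the spreadsheet is not accessible: a KeyError on a column that visibly exists is often caused by stray whitespace in the header. Whether that applies to this file is an assumption.

```python
import io
import pandas as pd

# Made-up CSV text whose second header has leading and trailing spaces.
csv_text = "Date, Cost per Genome \nSep-01,100\nMar-02,90\n"
data = pd.read_csv(io.StringIO(csv_text))

# The lookup fails because the real label is " Cost per Genome ".
try:
    data["Cost per Genome"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Strip whitespace from every column label, then the lookup works.
data = data.rename(columns=str.strip)
cost = data["Cost per Genome"]

print(lookup_failed)
print(list(data.columns))
```

Printing list(data.columns) before the rename shows the exact labels, including any hidden spaces.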

PyCharm not showing dataframe when I press ctrl+alt+f10

I'm trying to backtest a simple strategy of mine, and the first step is to retrieve historical data from yfinance. However, whenever I run this, I can't see the contents of hist. Instead, I just get this output: [image: this output]
# import all the libraries
import nsetools as ns
import pandas as pd
import numpy
from datetime import datetime
import yfinance as yf
import matplotlib.pyplot as plot
plot.style.use('classic')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="5m")
hist
I really just want to see the historical prices against the time but can't get the dataframe to appear. I would appreciate any input on this.
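A bare expression such as hist on the last line is only echoed in an interactive console or notebook; when the file is run as a script, nothing is displayed unless it is printed. The sketch below uses a small made-up DataFrame in place of the yfinance result (so it runs without a network connection) and prints every row and column.

```python
import pandas as pd

# Made-up stand-in for the DataFrame returned by ticker.history(...).
hist = pd.DataFrame(
    {"Open": [100.0, 101.0, 102.0], "Close": [101.0, 102.0, 103.0]},
    index=pd.date_range("2021-01-04 09:15", periods=3, freq="5min"),
)

# to_string() renders the whole frame without truncating rows or columns.
text = hist.to_string()
print(text)
```

For a month of 5-minute bars the full output is long, so print(hist.head()) or print(hist.tail()) may be more practical.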

python/pandas "Kernel died, restarting" while loading a csv file

While trying to load a big CSV file (150 MB), I get the error "Kernel died, restarting". The only code that I use is the following:
import pandas as pd
from pprint import pprint
from pathlib import Path
from datetime import date
import numpy as np
import matplotlib.pyplot as plt
basedaily = pd.read_csv('combined_csv.csv')
It used to work before, and I do not know why it no longer does. I tried to fix it using engine="python" as follows:
basedaily = pd.read_csv('combined_csv.csv', engine='python')
But it gives me an "execution aborted" error.
Any help would be welcome!
Thanks in advance!
You may be getting this error because of a lack of memory. You can split your data into several data frames, do your work on each, and then merge them again. Below is some code that you may find useful:
import pandas as pd
# the number of rows in each data frame
# you can put any value here according to your situation
chunksize = 1000
# the list that contains all the dataframes
list_of_dataframes = []
for df in pd.read_csv('combined_csv.csv', chunksize=chunksize):
    # process your data frame here
    # then add the current data frame into the list
    list_of_dataframes.append(df)
# if you want all the dataframes together, here it is
result = pd.concat(list_of_dataframes)
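Concatenating every chunk at the end needs roughly as much memory as loading the file in one go, so chunking helps most when each chunk is reduced before it is kept. A minimal sketch under assumptions: the column names, the tiny made-up CSV text, and the per-category sum are all illustrative, not taken from the question.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the large file.
csv_text = "category,value,unused\na,1,x\nb,2,x\na,3,x\nb,4,x\na,5,x\n"

running = None
reader = pd.read_csv(
    io.StringIO(csv_text),
    chunksize=2,                     # rows per chunk; tune to available memory
    usecols=["category", "value"],   # skip columns that are not needed
    dtype={"value": "int32"},        # a smaller dtype than the default int64
)
for chunk in reader:
    # Reduce each chunk to a small summary before keeping anything.
    partial = chunk.groupby("category")["value"].sum()
    if running is None:
        running = partial
    else:
        running = running.add(partial, fill_value=0)

totals = running
print(totals)
```

Only the running totals stay in memory, not the chunks themselves.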
