Python - input array has wrong dimensions - python

I'm an absolute beginner when it comes to coding and recently I discovered talib.
I've been trying to calculate an RSI, but I ran into an error. I've been searching the internet for a solution like I usually do, but without success. I'm guessing my data has the wrong datatype for the talib.RSI function, but that's about as far as my knowledge goes.
Would be great if someone could come up with a solution and expand a little bit on it so I might be able to learn a bit along the way :-)
Many thanks in advance,
Mattie
import pandas as pd
import talib
import numpy as np
data = pd.read_excel (r'name.xlsx')
df = pd.DataFrame(data, columns = ['close'])
RSI_PERIOD = 14
close_prices = pd.DataFrame(df, columns = ['close'])
np_close_prices = np.array(close_prices)
print(np_close_prices)
rsi = talib.RSI(np_close_prices, RSI_PERIOD)
print(rsi)
---------------------------------------------------------------------------
Exception Traceback (most recent call last) in
12 print(np_close_prices)
13
---> 14 rsi = talib.RSI(np_close_prices, RSI_PERIOD)
15 print(rsi)
~\anaconda3\lib\site-packages\talib\__init__.py in wrapper(*args, **kwargs)
25
26 if index is None:
---> 27 return func(*args, **kwargs)
28
29 # Use Series' float64 values if pandas, else use values as passed
talib\_func.pxi in talib._ta_lib.RSI()
talib\_func.pxi in talib._ta_lib.check_array()
Exception: input array has wrong dimensions

@kcw78 thanks for your reply.
I searched the internet some more before I saw your reply and managed to find an answer. I have no clue what lambda is or what it does yet, but hopefully one day I'll find out and understand how this fixes the problem :)
import pandas as pd
import talib
import numpy as np
RSI_PERIOD = 14
data = pd.read_excel (r'name.xlsx')
df = pd.DataFrame(data, columns = ['close'])
rsi = df.apply(lambda x: talib.RSI(x, RSI_PERIOD))
rsi.columns = ['RSI']
print(rsi)
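For anyone wondering why the second version works: as far as I can tell, the lambda itself is not the important part. np.array() on a one-column DataFrame gives a two-dimensional array of shape (n, 1), while df.apply() hands each column to the function as a one-dimensional Series. Here is a minimal sketch of that difference. It does not call talib at all, the prices are made up, and record is a hypothetical helper that only notes what apply passes in.

```python
import numpy as np
import pandas as pd

# Made-up prices standing in for the 'close' column read from name.xlsx.
df = pd.DataFrame({"close": [10.0, 10.5, 10.2, 10.8, 11.0]})

# Converting the whole one-column DataFrame keeps the column axis.
two_d = np.array(df)

# Selecting the column first gives a Series, which converts to a flat array.
one_d = df["close"].to_numpy(dtype="float64")

seen = []

def record(col):
    # Hypothetical helper: note the type and dimensionality of each column
    # that DataFrame.apply passes in, then return the column unchanged.
    seen.append((type(col).__name__, col.ndim))
    return col

df.apply(record)

print(two_d.shape)  # (5, 1)
print(one_d.shape)  # (5,)
print(seen[0])      # ('Series', 1)
```

If that reading is right, passing df['close'].to_numpy(dtype='float64') to talib.RSI should be another way to give it a one-dimensional input, though I have not run that against talib here.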

Related

Pandas Dataframe display total

Here is an example dataset, found via a Google search, that is close to the datasets in my environment.
I'm trying to get output like this
import pandas as pd
import numpy as np
data = {'Product':['Box','Bottles','Pen','Markers','Bottles','Pen','Markers','Bottles','Box','Markers','Markers','Pen'],
'State':['Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'],
'Sales':[14,24,31,12,13,7,9,31,18,16,18,14]}
df=pd.DataFrame(data, columns=['Product','State','Sales'])
df1=df.sort_values('State')
#df1['Total']=df1.groupby('State').count()
df1['line']=df1.groupby('State').cumcount()+1
print(df1.to_string(index=False))
The commented-out line throws this error:
ValueError: Columns must be same length as key
I also tried size(), but it gives NaN for all rows.
I hope someone can point me in the right direction.
Thanks in advance
I think this should work for 'Total':
df1['Total']=df1.groupby('State')['Product'].transform(lambda x: x.count())
Try this:
df = pd.DataFrame(data).sort_values("State")
grp = df.groupby("State")
df["Total"] = grp["State"].transform("size")
df["line"] = grp.cumcount() + 1
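To see why transform helps, it may be useful to compare the shapes involved. count() on the grouped frame returns one row per state and one column per remaining field, so it cannot be assigned to a single column. size() returns a Series indexed by state name, which does not line up with the integer row index and therefore fills in NaN. transform("size") broadcasts the group size back to every original row. A small sketch with a shortened, made-up version of the data:

```python
import pandas as pd

# Shortened, made-up version of the dataset from the question.
data = {
    "Product": ["Box", "Bottles", "Pen", "Markers", "Pen"],
    "State": ["Alaska", "California", "Texas", "Alaska", "Texas"],
    "Sales": [14, 24, 31, 9, 7],
}
df = pd.DataFrame(data).sort_values("State")
grp = df.groupby("State")

counts = grp.count()  # DataFrame: one row per state, columns Product and Sales
sizes = grp.size()    # Series indexed by state name, not by row label

# transform returns one value per original row, aligned on the row index.
df["Total"] = grp["State"].transform("size")
df["line"] = grp.cumcount() + 1

print(counts.shape)  # (3, 2)
print(df.to_string(index=False))
```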

For loop using enumerate runs more than expected for a pandas Data Frame

So, I was working on the Titanic dataset to extract the title (Mr, Ms, Mrs) from the Name column of a DataFrame (df). It has 1309 rows.
for ind, name in enumerate(df['Name']):
    if type(name) == str:
        inf = name.find(', ') + 2
        df.loc[ind+1, 'Title'] = name[inf:name.find('.')]
    else:
        print(name, ind)
This piece of code gives the following output:
nan 1309
It was supposed to stop at ind=1308, but it goes one step further even though nothing tells it to.
What could be the flaw here? Is it because I am using 1-based indexing for the data frame?
If so, what could be done to prevent this behaviour?
I am new to this platform, so please ask for clarification if anything is unclear.
Here is a short Example:-
import numpy as np
import pandas as pd
dict1 = {'Name':['Hey, Mr.','Hello, Ms.','Hi, Mrs,','Welcome, Master.','Yes, Mr.'],'ind':[1,2,3,4,5]}
df = pd.DataFrame(data = dict1)
df.set_index('ind')
for ind, name in enumerate(df['Name']):
    if type(name) == str:
        inf = name.find(', ') + 2
        df.loc[ind+1, 'Title'] = name[inf:name.find('.')]
    else:
        print(name, ind)
print(df['Title'])
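A possible explanation, assuming the frame still has its default 0-based index: df.set_index('ind') returns a new DataFrame and the result is not assigned, so df keeps the labels 0 to n-1. Writing to df.loc[ind+1, ...] on the last iteration then targets a label that does not exist, and .loc adds a new row whose Name is NaN. Running the loop again (for example by re-running a notebook cell) would then reach that extra row. The sketch below uses made-up names and iterates over the real index labels instead of enumerate positions.

```python
import pandas as pd

# Made-up names in the same "Greeting, Title." shape as the short example.
df = pd.DataFrame({"Name": ["Hey, Mr.", "Hello, Ms.", "Yes, Mr."]})

# Series.items() yields (index label, value), so the label always exists.
for label, name in df["Name"].items():
    if isinstance(name, str):
        start = name.find(", ") + 2
        df.loc[label, "Title"] = name[start:name.find(".")]

print(len(df))  # 3

# For comparison: assigning to a label that is not in the index grows the frame.
grown = df.copy()
grown.loc[len(grown), "Title"] = "Mr"  # label 3 did not exist before
print(len(grown))                      # 4
print(grown.loc[3, "Name"])            # nan
```

Assigning the result of set_index back to df (df = df.set_index('ind')) would be another way to make the 1-based labels real, under the same assumption.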

Convert Julia Dataframe to Python Pandas data frame

I am trying to convert a PyCall.jlwrap ('Julia') object to a Pandas dataframe. I'm using PyJulia to run an optimization algorithm in Julia, which spits out a dataframe object as a result. I would like to convert that object to a Pandas dataframe.
This is a similar question as posed 5 years ago here. However, there is not any code to suggest how to accomplish the transfer.
Any help would be useful!
Here is the code I currently have set-up. It's not that useful to know what is happening in the background of my 'optimization_program' but just to know that what is returned by the 'run_hybrid' and 'run_storage' commands returns a data frame:
### load in necessary modules for pyjulia
import pandas as pd
from julia import Main as jl
## load my user-defined module
jl.include("optimization_program_v3.jl")
## run function from module
results = jl.run_hybrid(generic_inputs)
## test type of item returned
jl.typeof(results)
## returns: <PyCall.jlwrap DataFrame>
## try to convert to pandas
test = pd.DataFrame(results)
ValueError Traceback (most recent call last)
in ()
----> 1 test = pd.DataFrame(results)
in __init__(self, data, index, columns, dtype, copy)
420 dtype=values.dtype, copy=False)
421 else:
422 raise ValueError('DataFrame constructor not properly called!')
423
424 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
I get an error (reading a Julia DataFrame in Python), if I use the DataFrames.jl package. However, it seems to work nicely with the Pandas.jl package:
>>> from julia import Main as jl
>>> import pandas as pd
>>> jl.eval('using Pandas')
>>> res = jl.eval('DataFrame(Dict(:age=>[27, 29, 27], :name=>["James", "Jill", "Jake"]))')
>>> jl.typeof(res)
#<PyCall.jlwrap PyObject>
>>> df = pd.DataFrame(res)
>>> df
   age   name
0   27  James
1   29   Jill
2   27   Jake
This was tested on Win10, with Python 3.8.2, and Julia 1.3.1

'expected string or buffer' when using re.match with pandas

I am trying to clean some data from a csv file. I need to make sure that whatever is in the 'Duration' category matches a certain format. This is how I went about that:
import re
import pandas as pd
data_path = './ufos.csv'
ufos = pd.read_csv(data_path)
valid_duration = re.compile('^[0-9]+ (seconds|minutes|hours|days)$')
ufos_clean = ufos[valid_duration.match(ufos.Duration)]
ufos_clean.head()
This gives me the following error:
TypeError Traceback (most recent call last)
<ipython-input-4-5ebeaec39a83> in <module>()
6
7 valid_duration = re.compile('^[0-9]+ (seconds|minutes|hours|days)$')
----> 8 ufos_clean = ufos[valid_duration.match(ufos.Duration)]
9
10 ufos_clean.head()
TypeError: expected string or buffer
I used a similar method to clean data before without the regular expressions. What am I doing wrong?
Edit:
MaxU got me the closest, but what ended up working was:
valid_duration_RE = '^[0-9]+ (seconds|minutes|hours|days)$'
ufos_clean = ufos
ufos_clean = ufos_clean[ufos.Duration.str.contains(valid_duration_RE)]
There's probably a lot of redundancy in there, I'm pretty new to python, but it worked.
You can use the vectorized .str.contains() method (.str.match() gives the same result here, because the pattern is anchored with ^ and $):
valid_duration_RE = '^[0-9]+ (seconds|minutes|hours|days)$'
ufos_clean = ufos[ufos.Duration.str.contains(valid_duration_RE)]
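The reason the original attempt fails is that a compiled pattern's match() expects a single string, while ufos.Duration is a whole Series. The .str methods apply the pattern to each element and return a boolean mask that can be used for filtering. Below is a small sketch with made-up durations standing in for ufos.csv. The na=False argument and the non-capturing group (?:...) are my additions: the first treats missing values as non-matches, the second avoids the pandas warning about match groups.

```python
import pandas as pd

# Made-up rows standing in for the 'Duration' column of ufos.csv.
ufos = pd.DataFrame(
    {"Duration": ["5 minutes", "about 2 hours", None, "30 seconds", "10 mins"]}
)

# Same pattern as in the question, with a non-capturing group.
valid_duration_RE = r"^[0-9]+ (?:seconds|minutes|hours|days)$"

# Element-wise test; na=False makes missing values count as False.
mask = ufos["Duration"].str.contains(valid_duration_RE, na=False)
ufos_clean = ufos[mask]

print(ufos_clean["Duration"].tolist())  # ['5 minutes', '30 seconds']
```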
I guess you want it the other way round (not tested):
import re
import pandas as pd
data_path = './ufos.csv'
ufos = pd.read_csv(data_path)
def cleanit(val):
    # your regex solution here
    pass
ufos['ufos_clean'] = ufos['Duration'].apply(cleanit)
After all, ufos is a DataFrame.

Load R data frame into Python and convert to Pandas data frame

I am trying to run the following code in an R data frame using Python.
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import os
import pandas as pd
import timeit
from rpy2.robjects import r
from rpy2.robjects import pandas2ri
pandas2ri.activate()
start = timeit.default_timer()
def f(x):
    return fuzz.partial_ratio(str(x["sig1"]), str(x["sig2"]))
def fu_match(file):
    f1 = r.load(file)
    f1 = pandas2ri.ri2py(f1)
    f1["partial_ratio"] = f1.apply(f, axis=1)
    f1 = f1.loc[f1["partial_ratio"] > 90]
    f1.to_csv("test.csv")
    stop = timeit.default_timer()
    print(stop - start)
fu_match('test_full.RData')
Here is the error.
AttributeError: 'numpy.ndarray' object has no attribute 'apply'
I guess the problem has to do with the conversion from R to Pandas data frame. I know this is a repeated question, but I have tried all the solutions given to previous questions with no success.
Please, any help will be much appreciated.
EDIT: Here is the head of .RData.
  city                        sig1                        sig2
1   19 claudiopillonrobertoscolari  almeidabartolomeufrancisco
2   19 claudiopillonrobertoscolari cruzricardosantasergiosilva
3   19 claudiopillonrobertoscolari             costajorgesilva
4   19 claudiopillonrobertoscolari    costafrancisconaifesilva
5   19 claudiopillonrobertoscolari          camarajoseluizreis
6   19 claudiopillonrobertoscolari    almeidafilhojoaopimentel
This line
f1=pandas2ri.ri2py(f1)
is setting f1 to be a numpy.ndarray when I think you expect it to be a pandas.DataFrame.
You can cast the array into a DataFrame with something like
f1 = pd.DataFrame(data=f1)
but you won't have your column names defined (which you use in f(x)). What is the structure of test_full.RData? Do you want to manually define your column names? If so
f1 = pd.DataFrame(data=f1, columns=("my", "column", "names"))
should do the trick.
BUT I would suggest you look at using a more standard data format, maybe .csv. pandas has good support for this, and I expect R does too. Check out the docs.
