Python Pandas: Increase Maximum Number of Rows - python

I am processing a large text file (500k lines), formatted as below:
S1_A16
0.141,0.009340221649748676
0.141,4.192618196894668E-5
0.11,0.014122135626540204
S1_A17
0.188,2.3292323316081486E-6
0.469,0.007928706856794138
0.172,3.726771730573038E-5
I'm using the code below to return the correlation coefficients of each series, e.g. S1_A16:
import numpy as np
import pandas as pd
import csv
pd.options.display.max_rows = None
fileName = 'wordUnigramPauseTEST.data'
df = pd.read_csv(fileName, names=['pause', 'probability'])
mask = df['pause'].str.match(r'^S\d+_A\d+')
df['S/A'] = (df['pause']
             .where(mask, np.nan)
             .fillna(method='ffill'))
df = df.loc[~mask]
result = df.groupby(['S/A']).apply(lambda grp: grp['pause'].corr(grp['probability']))
print(result)
However, on some large files, this returns the error:
Traceback (most recent call last):
File "/Users/adamg/PycharmProjects/Subj_AnswerCorrCoef/GetCorrCoef.py", line 15, in <module>
print(result)
File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 35, in __str__
return self.__bytes__()
File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 47, in __bytes__
return self.__unicode__().encode(encoding, 'replace')
File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 857, in __unicode__
result = self._tidy_repr(min(30, max_rows - 4))
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
I understand that this is related to the print statement, but how do I fix it?
EDIT:
This is related to the maximum number of rows. Does anyone know how to accommodate a greater number of rows?

The error message:
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
says that subtracting an int from None is a TypeError. The next-to-last line of the traceback shows that the only subtraction going on there is
max_rows - 4
So max_rows must be None. If you dive into /Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py, near line 857 and ask yourself how max_rows could end up being equal to None, you'll see that somehow
get_option("display.max_rows")
must be returning None.
This part of the code is calling _tidy_repr which is used to summarize the Series. None is the correct value to set when you want pandas to display all lines of the Series.
So this part of the code should not have been reached when max_rows is None.
I've made a pull request to correct this.
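Until a fixed version is installed, one option is to avoid the summarizing repr altogether. A minimal sketch, assuming a reasonably recent pandas; the Series below is made-up stand-in data for the `result` of the groupby in the question:

```python
import pandas as pd

# Stand-in for the groupby result in the question (made-up values).
result = pd.Series([i / 1000.0 for i in range(200)],
                   index=["S1_A%d" % i for i in range(200)])

# to_string() renders every row, independent of display.max_rows.
full_text = result.to_string()
print(full_text)

# Alternatively, lift the row limit for a single print call only.
with pd.option_context("display.max_rows", None):
    print(result)
```

Setting the option to a large integer instead of None is another way to sidestep the `max_rows - 4` subtraction shown in the traceback.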

Related

How to recognize float variable? getting ValueError: could not convert string to float:

I'm trying to create a GUI for a signal analysis simulation that I'm writing in Python. For that, I use appJar. However, when I call the function that generates the signal, I get a ValueError like the one in the title.
I've read every single ValueError post on Stack Overflow (I could have missed one, but I did my best), and all of them are about extra spaces, letters that cannot be parsed as a floating point number, etc. None of that seems to apply here.
Basically, I'm using this code to call a function to generate my signal:
signal_axes = app.addPlot("signal", *logic.signal(5, 2), 0, 0, 1)
And the relevant part of the function itself (in the file logic.py, which is imported)
def signal(electrodes, length):
    velocity = math.sqrt((3.2e-19 * kinetic_energy) / (mass * 1.66e-27))
    frequency = velocity / length
This is not the whole function; all the variables are declared, and the ones that look unused here are used later in the function.
The error specifically points to the line with "frequency = velocity / length", telling me:
TypeError: unsupported operand type(s) for /: 'float' and 'str'
When I try to fix it by using "float(length)" I get the error:
ValueError: could not convert string to float:
In one of the answers on Stack Exchange, someone suggested using .strip() to get rid of invisible spaces. So I tried using:
length.strip()
But that gives me the following error:
AttributeError: 'float' object has no attribute 'strip'
I am slowly descending into madness here. The following code, by the way, works stand-alone:
import numpy as np
kinetic_energy = 9000
mass = 40
length = 2e-2
velocity = np.sqrt((3.2e-19 * kinetic_energy) / (mass * 1.66e-27))
frequency = float(velocity) / float(length)
print(frequency)
Can anyone see what could be wrong? I've included all the relevant code below, it's not my complete file but this alone should give an output, at least.
run.py
import logic
from appjar import gui
def generate(btn):
    app.updatePlot("signal", *logic.signal(app.getEntry("electrodes"), app.getEntry("length")))
    showSignalLabels()
def showSignalLabels():
    signal_axes.set_xlabel("time (us)")
    signal_axes.set_ylabel("amplitude (uV)")
    app.refreshPlot("signal")
app = gui()
signal_axes = app.addPlot("signal", *logic.signal(5, 0.02), 0, 0, 1)
app.addLabelEntry("electrodes", 1, 0, 1)
app.addLabelEntry("length", 2, 0, 1)
showSignalLabels()
app.addButton("Generate", generate)
app.go()
logic.py
import numpy as np
import math
import colorednoise as cn
steps = 5000
amplitude = 1
offset_code = 0
kinetic_energy = 9000
mass = 40
centered = 1
def signal(electrodes, length):
    velocity = math.sqrt((3.2e-19 * kinetic_energy) / (mass * 1.66e-27))
    frequency = velocity / length
    time = 2 * (electrodes / frequency)
    --- irrelevant code ---
    return OutputTime, OutputSignal
edit: here is the full traceback.
Exception in Tkinter callback
Traceback (most recent call last):
File "E:\Internship IOM\WPy64-3720\python-3.7.2.amd64\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "E:\Internship IOM\PythonScripts\appJar\appjar.py", line 3783, in <lambda>
return lambda *args: funcName(param)
File "E:/Internship IOM/PythonScripts/appJar/testrun.py", line 12, in generate
app.updatePlot("signal", *logic.signal(app.getEntry("electrodes"), app.getEntry("length")))
File "E:\Internship IOM\PythonScripts\appJar\logic.py", line 33, in signal
frequency = velocity / length
TypeError: unsupported operand type(s) for /: 'float' and 'str'
You should convert at the call site, i.e.:
def generate(btn):
    app.updatePlot("signal", *logic.signal(app.getEntry("electrodes"),
                                           float(app.getEntry("length"))))
    ...
Otherwise logic.signal receives a different type depending on the caller: a str from the entry field, a float from your initial call. That is also why you got the other error, 'float' object has no attribute 'strip': elsewhere in your code you call
signal_axes = app.addPlot("signal", *logic.signal(5, 0.02), 0, 0, 1)
and here you pass it a float.
Since your original error was "could not convert string to float:" with an apparently empty string, you also need to guard against empty values coming from the app. You can use try ... except:
def generate(btn):
    try:
        length = float(app.getEntry("length"))
    except ValueError:
        # Fall back to a default (0.02 matches the initial call), or re-raise.
        # Note: a default of 0 would raise ZeroDivisionError in logic.signal.
        length = 0.02
    app.updatePlot("signal", *logic.signal(app.getEntry("electrodes"), length))
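The same guard can be pulled out into a small helper so that every entry field is converted the same way. This is only a sketch: `to_float` is a hypothetical name, not part of appJar, and the 0.02 default is simply borrowed from the initial call in the question.

```python
def to_float(text, default):
    """Convert an entry-field value to float, falling back to default."""
    try:
        # float() already ignores leading and trailing whitespace.
        return float(text)
    except (TypeError, ValueError):
        # Empty string, non-numeric text, or None.
        return default


# Values as they might come back from the entry fields (made-up inputs).
print(to_float("0.02", 0.02))
print(to_float("", 0.02))
print(to_float(" 2e-2 ", 0.02))
```

The "electrodes" entry presumably needs the same treatment, since logic.signal divides it by the frequency a line later.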

python : TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'str'

I want to build model data using this code:
def modeldata(filename, ratingMatrix):
    result={}
    itemmatrix = matrixconvert(ratingMatrix)
    current = 0
    total = len(itemmatrix)
    for item in itemmatrix:
        current+=1
        if current%100--0: print ("%d / %d" % (current,total))
        result[item] = neighbor
    #print result
    with open(filename+".csv", "wb") as f:
        pickle.dump(result, f)
The filename variable is the data resulting from a clustering process; it contains userid, itemid, and rating. ratingMatrix is a dictionary containing key (user), subkey (item), and rating, for example:
10 dict 1 {'255': 3.0}
neighbor contains similarity data, for example:
0 tuple 2 (1.0, '9790')
I want to build the model data from the things above. I run the function with this code:
modeldata(filename, ratingMatrix)
but I get this error:
1 / 306
.
.
304 / 306
305 / 306
306 / 306
Traceback (most recent call last):
File "<ipython-input-29-5af8931a8f1e>", line 1, in <module>
modeldata(filename, ratingMatrix)
File "<ipython-input-28-220883448026>", line 14, in modeldata
with open(filename+".txt", "wb") as f:
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'str'
Do you have any idea what's wrong with this code? Where does the error come from, and how can I make it work?
Thank you for your help....
The error is raised inside the modeldata function, specifically by the open() call that creates the output file. It says that you can't use "+" to append the string ".txt" to a numpy.ndarray, and that array is the filename variable. Make sure filename is the actual name of the file you want to write to (a string), not a NumPy array.
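One way to catch this earlier is to validate the name before building the path. A minimal sketch, assuming Python 3.6 or later: `save_model` is a hypothetical helper, not the asker's function, and the temporary directory is only there so the example can run anywhere.

```python
import os
import pickle
import tempfile


def save_model(result, filename, ext=".txt"):
    """Pickle result to filename + ext; filename must be a path, not data."""
    # os.fspath() raises TypeError for anything that is not str, bytes
    # or os.PathLike, so an array passed by mistake fails right here.
    path = os.fspath(filename) + ext
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return path


# Made-up model data written to a temporary directory.
target = os.path.join(tempfile.mkdtemp(), "model")
written = save_model({"9790": (1.0, "255")}, target)
print(written)
```

Passing the clustering data instead of a name then fails on the first line of the helper, before any file is opened.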

covariance of each key in a dictionary

I have a list, which is a set of tickers. For each ticker, I get the daily return going back six months. I then want to compute the covariance between each pair of tickers. I am having trouble with np.cov; here is my code to test COV:
newStockDict = {}
for i in newList_of_index:
    a = Share(i)
    dataB = a.get_historical(look_back_date, end_date)
    stockData = pd.DataFrame(dataB)
    stockData['Daily Return'] = ""
    yList = []
    for y in range(0,len(stockData)-1):
        stockData['Daily Return'][y] = np.log(float(stockData['Adj_Close'][y])/float(stockData['Adj_Close'][y+1]))
    yList = stockData['Daily Return'].values.tolist()
    newStockDict[stockData['Symbol'][0]] = yList
g = (np.cov(pd.Series((newStockDict[newList_of_index[0]]))), pd.Series(((newStockDict[newList_of_index[1]]))))
return g
My error is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Udaya\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "C:/Users/Udaya/Documents/Python Scripts/SR_YahooFinanceRead.py", line 150, in <module>
print CumReturnStdDev(stock_list)
File "C:/Users/Udaya/Documents/Python Scripts/SR_YahooFinanceRead.py", line 132, in CumReturnStdDev
g = (np.cov(pd.Series((newStockDict[newList_of_index[0]]))), pd.Series(((newStockDict[newList_of_index[1]]))))
File "C:\Users\Udaya\Anaconda\lib\site-packages\numpy\lib\function_base.py", line 1885, in cov
X -= X.mean(axis=1-axis, keepdims=True)
File "C:\Users\Udaya\Anaconda\lib\site-packages\numpy\core\_methods.py", line 66, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: unsupported operand type(s) for +: 'numpy.float64' and 'str'
>>> TypeError: unsupported operand type(s) for +: 'numpy.float64' and 'str'
I've tried using pd.cov on a dataframe, then np.cov. Nothing works. Here I am actually appending the daily returns to a list, then to a dictionary, before I manually calculate an n by n covariance matrix. But I am unable to get np.cov to work.
Please help. The idea is that I can easily construct a dataframe of N tickers, with each row being a daily return, but I am unable to compute cov with said dataframe, hence this df --> list --> dict process.
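Judging from the code above, one likely source of the 'numpy.float64' and 'str' error is that the 'Daily Return' column is initialised with "" and the loop stops one row short, so the last entry stays an empty string. In addition, the parenthesis of np.cov closes right after the first Series, so the second one is never passed to it. A minimal sketch of the intended calculation, using made-up prices for two hypothetical tickers (`log_returns` is an illustrative helper, not part of the original script):

```python
import numpy as np


def log_returns(prices):
    """Daily log returns for prices ordered newest first, as in the loop above."""
    p = np.asarray(prices, dtype=float)
    # One return fewer than prices, and no placeholder strings left over.
    return np.log(p[:-1] / p[1:])


# Made-up adjusted closes, newest first.
prices_a = [10.4, 10.1, 10.3, 10.0]
prices_b = [20.2, 20.6, 20.1, 20.0]

returns_a = log_returns(prices_a)
returns_b = log_returns(prices_b)

# Both series go inside the np.cov call; the result is a 2 x 2 matrix.
cov_matrix = np.cov(returns_a, returns_b)
print(cov_matrix)
```

With N tickers held as numeric columns of a DataFrame of returns, DataFrame.cov() should give the N by N matrix directly.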

Error with "unsupported operand type" when converting feet to inches

I am working on a homework problem, and got the following error:
Traceback (most recent call last):
File "/Users//Dropbox/Homework 3 - 2.py", line 15, in <module>
conversion = convert_feet_to_inches(get_feet)
File "/Users//Dropbox/Homework 3 - 2.py", line 4, in convert_feet_to_inches
calculate_conversion = feet*12
TypeError: unsupported operand type(s) for *: 'function' and 'int'
Here is my code. I'm trying to convert feet to inches:
def convert_feet_to_inches(feet):
    calculate_conversion = feet*12
    return calculate_conversion
def get_feet():
    ask_for_feet = float(input("Please enter number of feet for conversion "))
    return ask_for_feet
def printing_answer():
    print (convert_feet_to_inches)
asking_for_feet = get_feet()
conversion = convert_feet_to_inches(get_feet)
print_answer(printing_answer)
What am I doing wrong?
I think you meant to pass asking_for_feet to convert_feet_to_inches instead of the function get_feet in this line:
conversion = convert_feet_to_inches(get_feet)
So that should be:
conversion = convert_feet_to_inches(asking_for_feet)
The error occurs because get_feet is a function, and you passed the function itself to convert_feet_to_inches(), which multiplies its argument by 12. You can't multiply a function by an int, which is what the error says. I think you meant to pass asking_for_feet. So change
conversion = convert_feet_to_inches(get_feet)
to
conversion = convert_feet_to_inches(asking_for_feet)
After that, you have
print_answer(printing_answer)
The function print_answer is never defined, so change:
def printing_answer():
    print (convert_feet_to_inches)
to
def print_answer(answer):
    print (answer)
Then your final line of code would be:
print_answer(conversion)
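Put together, the corrected script might look like the sketch below. A fixed value stands in for the get_feet() prompt so that it runs without user input; that substitution is an assumption made for the example, not part of the original homework.

```python
def convert_feet_to_inches(feet):
    # Multiply the number of feet by 12 to get inches.
    return feet * 12


def print_answer(answer):
    print(answer)


# A fixed value stands in for get_feet(), which would normally call input().
asking_for_feet = 2.5
conversion = convert_feet_to_inches(asking_for_feet)
print_answer(conversion)
```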

Adding column to pyodbc list

I'm trying to add a column to a list returned from the fetchall() method in pyodbc, but it is giving me an error. Here is my code:
import pyodbc
import time
import calendar
from datetime import date
#variable declaration
today = date.today()
beginRange = date(today.year,today.month,1)
endRange = date(today.year,today.month,2) #limit data to 2 days for testing
#connect to database
connJobDtl = pyodbc.connect("DSN=Global_MWM;UID=Master")
cJobDtl = connJobDtl.cursor()
#database query
cJobDtl.execute("select job,suffix,seq,date_sequence,[...]")
dataJobDtl = cJobDtl.fetchall()
cJobDtl.close()
#add another column to the list, date_sequence formatted to YYYY-MM
dataJobDtl = [x + [x[3].strftime("%Y-%m")] for x in dataJobDtl]
I'm getting this error when I run the script:
File "...\jobDetailScript.py", line 23, in <module>
dataJobDtl = [x + [x[3].strftime("%Y-%m")] for x in dataJobDtl]
TypeError: unsupported operand type(s) for +: 'pyodbc.Row' and 'list'
As a test, I created a representative example in a Python shell and it worked fine, but I manually created a list of lists rather than generating a list from fetchall(). How can I resolve this error?
It seems fairly straightforward: as the error message states, you're trying to + two different types of objects. If you just convert the rows to lists it should work. From my own ad-hoc testing:
>>> cur.execute('<removed>') #one of my own tables
>>> tmp = cur.fetchall()
>>> type(tmp[0]) #this is KEY! You need to change the type
<type 'pyodbc.Row'>
>>> tt = [1,2,3]
>>> tmp[0] + tt #gives the same error you have
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    tmp[0] + tt
TypeError: unsupported operand type(s) for +: 'pyodbc.Row' and 'list'
>>> list(tmp[0]) + tt #returns a list as you wanted
[14520496, ..., 1, 2, 3]
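Applied to the comprehension in the question, the conversion would look roughly like this. It is a sketch only: plain tuples with made-up values stand in for the pyodbc.Row objects (tuples also refuse `+` with a list), and the position of date_sequence at index 3 is taken from the query in the question.

```python
from datetime import date

# Plain tuples stand in for the rows returned by fetchall() (made-up values).
rows = [
    ("J100", "A", 1, date(2015, 3, 1)),
    ("J101", "B", 2, date(2015, 3, 2)),
]

# Convert each row to a list first, then append the YYYY-MM string.
extended = [list(row) + [row[3].strftime("%Y-%m")] for row in rows]
print(extended[0])
```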
