pagerank python implementation - python

I want to convert PageRank MATLAB/Octave implementation to python, but when it comes to:
a=array([[inf]])
last_v = dot(ones(N,1),a)
there is a TypeError.
Traceback (most recent call last):
File "/home/googcheng/page_rank.py", line 18, in <module>
pagerank(0,0)
File "/home/googcheng/page_rank.py", line 14, in pagerank
last_v = dot(ones(N,1),a)
File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 1819, in ones
a = empty(shape, dtype, order)
TypeError: data type not understood
some code https://gist.github.com/3722398

The first argument to ones, the shape, should be a tuple. Change ones(N,1) to ones((N,1)).

Related

spatial regression in Python - read matrix from list

I have a following problem. I am following this example about spatial regression in Python:
import numpy
import libpysal
import spreg
import pickle
# Read spatial data
ww = libpysal.io.open(libpysal.examples.get_path("baltim_q.gal"))
w = ww.read()
ww.close()
w_name = "baltim_q.gal"
w.transform = "r"
Example above works. But I would like to read my own spatial matrix which I have now as a list of lists. See my approach:
ww = libpysal.io.open(matrix)
But I got this error message:
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/libpysal/io/fileio.py", line 90, in __new__
cls.__registry[cls.getType(dataPath, mode, dataFormat)][mode][0]
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/libpysal/io/fileio.py", line 105, in getType
ext = os.path.splitext(dataPath)[1]
File "/usr/lib/python3.8/posixpath.py", line 118, in splitext
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not list
this is how matrix looks like:
[[0, 2, 1], [2, 0, 4], [1, 4, 0]]
EDIT:
If I try to insert my matrix into the GM_Lag like this:
model = spreg.GM_Lag(
y,
X,
w=matrix,
)
I got following error:
warn("w must be API-compatible pysal weights object")
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 2, in <module>
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/spreg/twosls_sp.py", line 469, in __init__
USER.check_weights(w, y, w_required=True)
File "/home/vojta/Desktop/INTERNET_HANDEL/ZASILKOVNA/optimal-delivery-branches/venv/lib/python3.8/site-packages/spreg/user_output.py", line 444, in check_weights
if w.n != y.shape[0] and time == False:
AttributeError: 'list' object has no attribute 'n'
EDIT 2:
This is how I read the list of lists:
import pickle
with open("weighted_matrix.pkl", "rb") as f:
matrix = pickle.load(f)
How can I insert list of lists into spreg.GM_Lag ? Thanks
Why do you want to pass it to the libpysal.io.open method? If I understand correctly this code, you first open a file, then read it (and the read method seems to be returning a List). So in your case, where you already have the matrix, you don't need to neither open nor read any file.
What will be needed though is what w is supposed to look like here: w = ww.read(). If it is a simple matrix, then you can initialize w = matrix. If the read method also format the data a certain way, you'll need to do it another way. If you could describe the expected behavior of the read method (e.g. what does the input file contain, and what is returned), it would be useful.
As mentioned, as the data is formatted into a libpysal.weights object, you must build one yourself. This can supposedly be done with this method libpysal.weights.W. (Read the doc too fast).

How can I access the raw data from individual data frame cells using pandas?

I'm working on a custom elo/team rating calculator using a CSV file as input. I was able to get similar logic for this working in Excel with openpyxl but I am now trying to implement it in pandas for better integration with jupyter and matplotlib. I'm having issues running calculations on individual cells in the data frames, however.
def find_team_row(team_name):
switcher = {
'100T': 0,
'C9': 1,
'CG': 2,
'CLG': 3,
'FOX': 4,
'FLY': 5,
'GGS': 6,
'OPT': 7,
'TL': 8,
'TSM': 9,
}
return switcher.get(team_name, None)
def update_df():
for column in range(1, df.columns.get_loc(df.columns[-1]), 3):
for row in range(0,9):
init_rating = df.iloc[row,column]
opponent_name = df.iloc[row,column+1]
match_result = df.iloc[row,column+2]
oppo_rating = df.iloc[find_team_row(opponent_name),column]
These exceptions are thrown with respect to this code block:
ajisaksonmac:elo_calc ajisakson$ /Library/Frameworks/Python.framework/Versions/3.7/bin/python3 "/Users/ajisakson/Google Drive/swe_projects/elo_calc/test2.py"
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 235, in _has_valid_tuple
self._validate_key(k, i)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 2035, in _validate_key
"a [{types}]".format(types=self._valid_types)
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/ajisakson/Google Drive/swe_projects/elo_calc/test2.py", line 74, in <module>
update_df()
File "/Users/ajisakson/Google Drive/swe_projects/elo_calc/test2.py", line 27, in update_df
oppo_rating = df.iloc[find_team_row(opponent_name),column]
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1418, in __getitem__
return self._getitem_tuple(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 2092, in _getitem_tuple
self._has_valid_tuple(tup)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 239, in _has_valid_tuple
"[{types}] types".format(types=self._valid_types)
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
So I'm trying to access individual cells in the data frame using iloc but I receive this ValueError where oppo_rating is assigned. I tried a number of different things to convert both of the iloc parameters to integers including int(), .iat(), .at(), .loc(), etc. and I continue to receive errors suggesting that one of my parameters is not an integer.
Here is the first part of the data frame I'm trying to manipulate/make calculations on:
example of the pandas data frame

How do I print very large value of N%M using Python?

For example if I give input 5 and 18. I want to convert 5 to five ones i.e. (11111)%18 = 5. I can do this using print(int(('1'*N))%M)
but I want same with very large numbers i.e. N=338692981500, M=1838828
now my N should be converted in 111111111111111111........1111 (Ntimes)%1838828 = 482531. When I did this I'm getting memory error.
N,M=map(int,input().split())
print(int(('1'*N))%M)
338692981500 1838828
Traceback (most recent call last):
File "testing.py", line 687, in <module>
result=int ('1'*N)%M;
MemoryError

How to Read nrows From Pandas HDF Storage?

What am I trying to do?
pd.read_csv(... nrows=###) can read the top nrows of a file. I'd like to do the same while using pd.read_hdf(...).
What is the problem?
I am confused by the documentation. start and stop look like what I need but when I try it, a ValueError is returned. The second thing I tried was using nrows=10 thinking that it might be an allowable **kwargs. When I do, no errors are thrown but also the full dataset is returned instead of just 10 rows.
Question: How does one correctly read a smaller subset of rows from an HDF file? (edit: without having to read the whole thing into memory first!)
Below is my interactive session:
>>> import pandas as pd
>>> df = pd.read_hdf('storage.h5')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
df = pd.read_hdf('storage.h5')
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 367, in read_hdf
raise ValueError('key must be provided when HDF5 file '
ValueError: key must be provided when HDF5 file contains multiple datasets.
>>> import h5py
>>> f = h5py.File('storage.h5', mode='r')
>>> list(f.keys())[0]
'table'
>>> f.close()
>>> df = pd.read_hdf('storage.h5', key='table', start=0, stop=10)
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
df = pd.read_hdf('storage.h5', key='table', start=0, stop=10)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 370, in read_hdf
return store.select(key, auto_close=auto_close, **kwargs)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 740, in select
return it.get_result()
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 1447, in get_result
results = self.func(self.start, self.stop, where)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 733, in func
columns=columns, **kwargs)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 2890, in read
return self.obj_type(BlockManager(blocks, axes))
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 2795, in __init__
self._verify_integrity()
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 3006, in _verify_integrity
construction_error(tot_items, block.shape[1:], self.axes)
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 4280, in construction_error
passed, implied))
ValueError: Shape of passed values is (614, 593430), indices imply (614, 10)
>>> df = pd.read_hdf('storage.h5', key='table', nrows=10)
>>> df.shape
(593430, 614)
Edit:
I just attempted to use where:
mylist = list(range(30))
df = pd.read_hdf('storage.h5', key='table', where='index=mylist')
Received a TypeError indicating a Fixed format store (the default format value of df.to_hdf(...)):
TypeError: cannot pass a where specification when reading from a
Fixed format store. this store must be selected in its entirety
Does this mean I can't select a subset of rows if the format is Fixed format?
I ran into the same problem. I am pretty certain by now that https://github.com/pandas-dev/pandas/issues/11188 tracks this very problem. It is a ticket from 2015 and it contains a repro. Jeff Reback suggested that this is actually a bug, and he even pointed us towards a solution back in 2015. It's just that nobody built that solution yet. I might have a try.
Seems like this now works, at least with pandas 1.0.1. Just provide start and stop arguments:
df = pd.read_hdf('test.h5', '/floats/trajectories', start=0, stop=5)

broadcasting linalg.pinv on a 3D theano tensor

in the example below, there is a 3d numpy matrix of size (4, 3, 3)+ a solution about how to calculate pinv of each of 4 of those 3*3 matrices in numpy. I also tried to use the same function worked in numpy, in theano hoping that it is implemented the same, but it failed. Any idea how to do it in theano?
dt = np.dtype(np.float32)
a=[[[12,3,1],
[2,4,1],
[2,4,2],],
[[12,3,3],
[2,4,4],
[2,4,5],],
[[12,3,6],
[2,4,5],
[2,4,4],],
[[12,3,3],
[2,4,5],
[2,4,6]]]
a=np.asarray(a,dtype=dt)
print(a.shape)
apinv=np.zeros((4,3,3))
print(np.linalg.pinv(a[0,:,:]).shape)
#numpy solution
apinv = map(lambda n: np.linalg.pinv(n), a)
apinv = np.asarray(apinv,dtype=dt)
#theano solution (not working)
at=T.tensor3('a')
apinvt = map(lambda n: T.nlinalg.pinv(n), at)
The error is:
Original exception was:
Traceback (most recent call last):
File "pydevd.py", line 2403, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "pydevd.py", line 1794, in run
launch(file, globals, locals) # execute the script
File "exp_thn_pinv_map.py", line 35, in <module>
apinvt = map(lambda n: T.nlinalg.pinv(n), at)
File "theano/tensor/var.py", line 549, in __iter__
raise TypeError(('TensorType does not support iteration. '
TypeError: TensorType does not support iteration. Maybe you are using builtin.sum instead of theano.tensor.sum? (Maybe .max?)
The error message is
Traceback (most recent call last):
File "D:/Dropbox/source/intro_theano/pinv.py", line 32, in <module>
apinvt = map(lambda n: T.nlinalg.pinv(n), at)
File "d:\dropbox\source\theano\theano\tensor\var.py", line 549, in __iter__
raise TypeError(('TensorType does not support iteration. '
TypeError: TensorType does not support iteration. Maybe you are using builtin.sum instead of theano.tensor.sum? (Maybe .max?)
This is occurring because, as the error message indicates, the symbolic variable at is not iterable.
The fundamental problem here is that you're incorrectly mixing immediately executed Python code with delayed execution Theano symbolic code.
You need to use a symbolic loop, not a Python loop. The correct solution is to use Theano's scan operator:
at=T.tensor3('a')
apinvt, _ = theano.scan(lambda n: T.nlinalg.pinv(n), at, strict=True)
f = theano.function([at], apinvt)
print np.allclose(f(a), apinv)

Categories

Resources