Equations and math in python - python

I just don't know how to explain what I need. I'm not looking for code, just a tutorial and a direction to get to where I need to be.
Example: I have numbers in a csv file, a and b are in different columns:
header1,header2
a,b
a1,b1
a2,b2
a3,b3
a4,b4
a5,b5
a6,b6
So how would I compute something like
[a*b + a1*b1 + a2*b2 + ... + a6*b6] divided by [the sum of all b values]?
I know how to code the denominator using pandas, but how would I code the numerator?
What is this process called, and where can I find a tutorial for it?

I don't know if this is the best method, but it should work. You can create a new column in pandas which is the product a*b:
df['product'] = df['a']*df['b']
You can then simply use sum() to get the sums of column b and the product column, and divide one by the other:
ans = df['product'].sum() / df['b'].sum()
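Incidentally, what you are computing is a weighted average of a with weights b, and NumPy exposes it directly. A minimal sketch, assuming the columns are named header1 and header2 as in your CSV (the numbers here are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'header1': [1.0, 2.0, 3.0],   # the a values
                   'header2': [2.0, 2.0, 4.0]})  # the b values (the weights)

# sum(a*b) / sum(b) is exactly a weighted average of a with weights b
ans = np.average(df['header1'], weights=df['header2'])
```

So "weighted average" (or "weighted mean") is the term to search tutorials for.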

Not sure if this is the best method to use, but you could use a list comprehension along with the zip() function. With these two, you can get the numerator like this:
[a*b for a, b in zip(df['header1'], df['header2'])]
Chapter 3 of Dive into Python 3 has more on list comprehensions, and the Python documentation covers zip() along with a few examples of its usage.

Related

pandas apply lambda multiple arguments no query different dataframes

I noticed previous versions of my question suggested the use of queries, but I have distinct data frames that do not share column names. I want to code this formula without for loops, using only the apply function.
Here are the variables, initialized (mu = μ; the others are as follows):
mu=pd.DataFrame(0, index=['A','B','C'], columns=['x','y'])
pij=pd.DataFrame(np.random.randn(500,3),columns=['A','B','C'])
X=pd.DataFrame(np.random.randn(500,2),columns=['x','y'])
Next, I am able to solve this with nested for loops:
for j in range(len(mu)):
    for i in range(len(X)):
        # .ix is gone from modern pandas; .iloc does the positional lookup
        mu.iloc[j] += pij.iloc[i, j] * X.iloc[i]
    mu.iloc[j] = mu.iloc[j] / pij.iloc[:, j].sum()
mu
          x         y
A  0.147804  0.169263
B -0.299590 -0.828494
C -0.199637  0.363423
My question is whether it is possible to avoid the nested for loops, or at least remove one of them. My attempts so far have been feeble; even my initial ones result in multiple NaNs.
The code you pasted suggests you meant the index on mu on the left hand side of the formula to be j, so I'll assume that's the case.
Also, since you generated random matrices for your example, my results will turn out different from yours, but I checked that your pasted code gives the same results as my code on the matrices I generated.
The numerator of the RHS of the formula can be computed with the appropriate transpose and matrix multiplication:
>>> num = pij.transpose().dot(X)
>>> num
           x          y
A -30.352924 -22.405490
B  14.889298 -16.768464
C -24.671337   9.092102
The denominator is simply summing over columns:
>>> denom = pij.sum()
>>> denom
A 23.460325
B 20.106702
C -46.519167
dtype: float64
Then the "division" is element-wise division by column:
>>> num.divide(denom, axis='index')
          x         y
A -1.293798 -0.955037
B  0.740514 -0.833974
C  0.530348 -0.195449
I'd normalize pij first, then take the inner product with X. The formula looks like:
mu = (pij / pij.sum()).T.dot(X)
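To check that the two vectorized formulations agree, here is a minimal sketch (with a fixed seed, so the numbers are reproducible but different from the ones shown above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pij = pd.DataFrame(rng.standard_normal((500, 3)), columns=['A', 'B', 'C'])
X = pd.DataFrame(rng.standard_normal((500, 2)), columns=['x', 'y'])

# numerator / denominator form
mu_div = pij.T.dot(X).divide(pij.sum(), axis='index')
# normalize-first form
mu_norm = (pij / pij.sum()).T.dot(X)

# both should give the same 3x2 result
assert np.allclose(mu_div, mu_norm)
```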

Substitute elements of a matrix at specific coordinates in python

I am trying to solve a "very simple" problem. Not so simple in Python. Given a large matrix A and another smaller matrix B I want to substitute certain elements of A with B.
In Matlab it would look like this:
Given A, row_coord = [1,5,6], col_coord = [2,4], and a matrix B of size (3x2): A[row_coord, col_coord] = B
In Python I tried to use product(row_coord, col_coord) from itertools to generate the set of all indices that need to be accessed in A, but it does not work. All examples of submatrix substitution refer to block-wise row_coord = col_coord cases. Nothing concrete except for http://comments.gmane.org/gmane.comp.python.numeric.general/11912 seems to relate to the problem I am facing, and the code in that link does not work.
Note: I know that I can implement what I need via the double for-loop, but on my data such a loop adds 9 secs to the run of one iteration and I am looking for a faster way to implement this.
Any help will be greatly appreciated.
Assuming you're using numpy arrays then (in the case where your B is a scalar) the following code should work to assign the chosen elements to the value of B.
itertools.product will create all of the coordinate pairs which we then convert into a numpy array and use in indexing your original array:
import numpy as np
from itertools import product
A = np.zeros([20,20])
col_coord = [0,1,3]
row_coord = [1,2]
coords = np.array(list(product(row_coord, col_coord)))
B = 1
A[coords[:,0], coords[:,1]] = B
I used this excellent answer by unutbu to work out how to do the indexing.
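When B is a full matrix rather than a scalar, numpy's np.ix_ builds the same open mesh of coordinates in one call, which matches the Matlab semantics directly. A sketch with small illustrative shapes (the sizes here are made up):

```python
import numpy as np

A = np.zeros((7, 5))
row_coord = [1, 5, 6]
col_coord = [2, 4]
B = np.arange(6).reshape(3, 2)  # shape must be (len(row_coord), len(col_coord))

# np.ix_ turns the two index lists into an open mesh, so this assigns
# B[r, c] to A[row_coord[r], col_coord[c]] without any Python-level loop.
A[np.ix_(row_coord, col_coord)] = B
```

This avoids materializing the product of coordinate pairs and works for assignment as well as reads.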

Python Pandas, using the previous value in Dataframe

I have a function that uses the b(t-1) value, like:
def test_b(a, b_1):
    return a + b_1
Assume the following dataframe:
df = pd.DataFrame({'a':[1,2,3],'b':np.nan})
I am assigning the initial value of b:
df['b'][0] = 0
and then (using my Matlab experience), I use the loop:
for i in range(1, len(df)):
    df['b'][i] = test_b(df['a'][i], df['b'][i-1])
output:
   a  b
0  1  0
1  2  2
2  3  5
Is there a more elegant way to do the same?
You never want to do assignments like this, as this is chained indexing.
This is a recurrence relation, so there is no easy way at the moment to do this in a super-performant manner, though see here.
There is an open issue about this, with a pointer to a solution that uses ifilter to solve the relation.
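For this particular recurrence the loop can be avoided entirely: b[i] = a[i] + b[i-1] with b[0] = 0 is just a running sum, so cumsum does it in one vectorized step. Note this shortcut only works because test_b is plain addition; a general function would still need the loop:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
# b[0] = 0 and b[i] = a[i] + b[i-1] is the cumulative sum of a, offset by a[0]
df['b'] = df['a'].cumsum() - df['a'].iloc[0]
# df['b'] is now [0, 2, 5], matching the loop's output
```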

Return the sum of all values in a dictionary Python

I need help making the function sumcounts(D), where D is a dictionary with numbers as values, which returns the sum of all the values. Sample output should look like this:
>>> sumcounts({"a":2.5, "b":7.5, "c":100})
110.0
>>> sumcounts({ })
0
>>> sumcounts(strcount("a a a b"))
4
It's already there:
sum(d.values())
Or, wrapped in a function:
def sumcounts(d):
    return sum(d.values())
I'm not sure what you've been taught about dictionaries, so I'll assume the bare minimum.
1. Create a total variable and set it to 0.
2. Iterate over all keys in your dictionary using the normal for x in y syntax.
3. For every key, fetch its respective value from your dictionary using your_dict[key_name].
4. Add that value to total.
5. Get an A on your assignment.
Michael already posted the regular Pythonic solution.
The answer as given by Michael above is spot on!
I want to suggest that if you are going to work with large data sets, you look at the most excellent pandas framework. (Maybe overkill for your problem, but worth a look.)
It accepts dictionaries and transforms them into a pandas Series, for instance:
yourdict = {"a":2.5, "b":7.5, "c":100}
dataframe = pandas.Series(yourdict)
You now have a very powerful data structure that you can really do a lot of neat stuff with, including getting the sum:
total = dataframe.sum()
You can also easily plot it, save it to Excel or CSV, get the mean, standard deviation, etc.:
dataframe.plot()    # plots it with matplotlib
dataframe.mean()    # gets the mean
dataframe.std()    # gets the standard deviation
dataframe.to_csv('name.csv')    # writes to a CSV file
I can really recommend pandas. It changed the way I do data business with Python. It compares well with the R data frame, by the way.

Recursive generation + filtering. Better non-recursive?

I have the following need (in python):
generate all possible tuples of length 12 (could be more) containing either 0, 1 or 2 (basically, a ternary number with 12 digits)
filter these tuples according to specific criteria, culling the bad ones and keeping the ones I need.
As I had to deal with small lengths until now, the functional approach was neat and simple: a recursive function generates all possible tuples, then I cull them with a filter function. Now that I have a larger set, the generation step is taking too much time, much longer than needed as most of the paths in the solution tree will be culled later on, so I could skip their creation.
I have two solutions to solve this:
derecurse the generation into a loop, and apply the filter criteria to each new 12-digit entity
integrate the filtering into the recursive algorithm, to prevent it from stepping into paths that are already doomed.
My preference goes to 1 (seems easier) but I would like to hear your opinion, in particular with an eye towards how a functional programming style deals with such cases.
How about
import itertools
results = []
for x in itertools.product(range(3), repeat=12):
    if myfilter(x):
        results.append(x)
where myfilter does the selection. Here, for example, only allowing results with 10 or more 1s:
def myfilter(x):    # example filter, only take lists with 10 or more 1s
    return x.count(1) >= 10
That is, my suggestion is your option 1. For some cases it may be slower because (depending on your criteria) you may generate many lists that you don't need, but it's much more general and very easy to code.
Edit: This approach also has a one-liner form, as suggested in the comments by hughdbrown:
results = [x for x in itertools.product(range(3), repeat=12) if myfilter(x)]
itertools has functionality for dealing with this. However, here is a (hardcoded) way of handling it with a generator:
T = (0, 1, 2)
GEN = ((a,b,c,d,e,f,g,h,i,j,k,l) for a in T for b in T for c in T for d in T
       for e in T for f in T for g in T for h in T for i in T for j in T
       for k in T for l in T)
for VAL in GEN:
    # Filter VAL
    print(VAL)
I'd implement an iterative binary adder or Hamming code and run it that way.
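Option 2 (integrating the filter into the recursion) can also stay functional if the generator consults a prune predicate before descending into a subtree. This is a sketch; gen and the three-2s prune rule are illustrative names, not from the original post:

```python
def gen(prefix=(), n=12, prune=lambda p: False):
    """Yield all length-n ternary tuples, skipping any subtree
    whose prefix already fails the prune predicate."""
    if prune(prefix):
        return          # the whole subtree is doomed; never generated
    if len(prefix) == n:
        yield prefix
        return
    for digit in (0, 1, 2):
        yield from gen(prefix + (digit,), n, prune)

# Example: cut every branch that already contains three 2s.
results = list(gen(n=4, prune=lambda p: p.count(2) >= 3))
```

Because the predicate runs on every prefix, a doomed branch is abandoned before any of its descendants are built, which is exactly the saving that generate-then-filter (option 1) cannot give you.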
