import pandas as pd

def gen():
    yield '1', '2', 1, 2
    yield '1', '2', 1, 2
    yield '1', '3', 1, 2
    yield '2', '4', 1, 2

df = pd.DataFrame(gen(), columns=["a", "b", "c", "d"]).set_index(["a", "b"])
print(df)  # ('a','b') --> ('c','d')
We have:
     c  d
a b
1 2  1  2
  2  1  2
  3  1  2
2 4  1  2
When accessing:
print(df.loc[('1', '3')])  # Success
print(df.at[('1', '3')])   # KeyError: '3'
Also note that this also fails if df is a Series:
print(df['c'].at[('1', '3')])  # TypeError: _get_value() got multiple values for keyword argument 'takeable'
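A hedged workaround sketch, assuming recent pandas behaviour: on a DataFrame, .at expects both a row key and a column label, so passing only the index tuple makes pandas treat '3' as a column label, which explains the KeyError. Supplying the column explicitly works; for the Series case, .loc is a safe fallback:

# assumption: workaround sketch, not from the original post
print(df.at[('1', '3'), 'c'])    # 1 -- row tuple plus explicit column label
print(df['c'].loc[('1', '3')])   # 1 -- .loc accepts the full MultiIndex tuple on a Series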
How do I get the "rest of the list" after the current element for an iterator in a loop?
I have a list:
[ "a", "b", "c", "d" ]
They are not actually letters, they are words, but the letters are there for illustration, and there is no reason to expect the list to be small.
For each member of the list, I need to:
def f(depth, list):
    for i in list:
        print(f"{depth} {i}")
        f(depth+1, rest_of_the_list_after_i)

f(0, ["a", "b", "c", "d"])
The desired output (with spaces for clarity) would be:
0 a
 1 b
  2 c
   3 d
  2 d
 1 c
  2 d
 1 d
0 b
 1 c
  2 d
 1 d
0 c
 1 d
0 d
I explored enumerate with little luck.
The reality of the situation is that there is a yield terminating condition. But that's another matter.
I am using (and learning with) python 3.10
This is not homework. I'm 48 :)
You could also look at it like:
0 a 1 b 2 c 3 d
        2 d
    1 c 2 d
    1 d
0 b 1 c 2 d
    1 d
0 c 1 d
0 d
That illustrates the stream nature of the thing.
Seems like there are plenty of answers here already, but here is another way to solve your given problem:
def f(depth, l):
    for idx, item in enumerate(l):
        step = f"{depth * ' '} {depth} {item}"
        print(step)
        f(depth + 1, l[idx + 1:])

f(0, ["a", "b", "c", "d"])
def f(depth, alist):
    # you don't need a loop if you only care about the first element
    if not alist:          # stop when there is nothing left
        return
    print(f"{depth} {alist[0]}")
    next_depth = depth + 1
    rest_list = alist[1:]
    f(next_depth, rest_list)

This doesn't seem like a very useful method, though.
def f(depth, alist):
    # if you actually want to iterate it
    for i, item in enumerate(alist):
        print(f"{depth} {item}")
        next_depth = depth + 1
        rest_list = alist[i + 1:]
        f(next_depth, rest_list)
I guess this code is what you're looking for:
def f(depth, lst):
    for e, i in enumerate(lst):
        print(f"{depth} {i}")
        f(depth + 1, lst[e + 1:])

f(0, ["a", "b", "c", "d"])
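Since the question mentions a yield-based terminating condition, here is a hedged sketch (not from the original answers) of the same traversal written as a generator, which makes it easy to stop consuming early:

# assumption: illustrative generator variant, not part of any answer above
def walk(depth, seq):
    for i, item in enumerate(seq):
        yield f"{depth} {item}"
        yield from walk(depth + 1, seq[i + 1:])

for line in walk(0, ["a", "b", "c", "d"]):
    print(line)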
I have tried to get a crosstab of data specified by slices,
but something is wrong in my syntax.
data.csv looks like the following:
ia,ib,ic,id,ie,if,ig
a,0,0,0,e,0,g
0,b,0,0,e,f,0
0,0,c,d,0,f,g
And then I run python3 test.py with the following:
import pandas as pd
import enum

df = pd.read_csv('data.csv')

class Slices(enum.Enum):
    first = slice(0, 2)
    second = slice(4, 6)

def getCrosstab(*args):
    cols1 = []
    cols1.append(df.iloc[:, args[0].value])
    cols2 = []
    cols2.append(df.iloc[:, args[1].value])
    print(pd.crosstab(cols1, cols2))

if __name__ == '__main__':
    getCrosstab(Slices.first, Slices.second)
Expected result:
col2  ie  if  ig
col1
ia     1   0   1
ib     1   1   0
ic     0   1   1
But I had an error:
ValueError: Shape of passed values is (2, 2), indices imply (2, 3)
I can not fully understand the meaning of this error.
Please give me your guidance.
melt twice, once for each set of columns, and then call crosstab:
u = (df.melt(['ia', 'ib', 'ic'], var_name='C', value_name='D')
       .melt(['C', 'D'], var_name='A', value_name='B')
       .query("B != '0' and D != '0'"))

pd.crosstab(u.A, u.C)
C   id  ie  if  ig
A
ia   0   1   0   1
ib   0   1   1   0
ic   1   0   1   1
def crosstab_for(df, sliceA, sliceB):
    # .union() avoids relying on `|` between Index objects,
    # which newer pandas no longer treats as a set union
    cols = df.columns[sliceA].union(df.columns[sliceB])
    u = (df.reindex(cols, axis=1)
           .melt(df.columns[sliceA], var_name='C', value_name='D')
           .melt(['C', 'D'], var_name='A', value_name='B')
           .query("B != '0' and D != '0'"))
    return pd.crosstab(u.A, u.C)

crosstab_for(df, slice(0, 3), slice(4, 7))
C   ie  if  ig
A
ia   1   0   1
ib   1   1   0
ic   0   1   1
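As a side note, Python slices are end-exclusive, so the question's slice(0, 2) covers only ia and ib while the expected result spans ia..ic; the helper above therefore uses slice(0, 3) and slice(4, 7). A hedged usage sketch reusing the question's enum with widened values (an assumption, not from the original answer):

# assumption: widened slice values so each side covers three columns
class Slices(enum.Enum):
    first = slice(0, 3)    # ia, ib, ic
    second = slice(4, 7)   # ie, if, ig

crosstab_for(df, Slices.first.value, Slices.second.value)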
I have a scenario where I am running two functions in a script:
test.py :
def func1():
    df1 = pd.read_csv('test1.csv')
    val1 = df1['col1'].mean().round(2)
    return val1

def func2():
    df2 = pd.read_csv('test2.csv')
    val2 = df2['col1'].mean().round(2)
    return val2

def func3():
    dataf = pd.read_csv('test3.csv')
    col1 = dataf['area']
    col2 = dataf['overall']
    dataf['overall'] = val1  # value from val1 -> leads to error
    dataf['overall'] = val2  # value from val2 -> leads to error
Here I am reading the test1.csv and test2.csv files, storing the mean values in the variables "val1" and "val2" respectively, and returning them.
I want to store these values in a new test3.csv file that has two columns, with the values appended alternately, one after the other. The above does not work, and I couldn't find anything about this online. Any help would be great.
You need to pass the variables as parameters to func3, and if the only difference between func1 and func2 is the file name, create just one function with a parameter.
Thanks for idea cᴏʟᴅsᴘᴇᴇᴅ ;)
def func1(file):
    df = pd.read_csv(file)
    val = df['col1'].mean().round(2)
    return val

a = func1('test1.csv')
b = func1('test2.csv')

def func3(val1=a, val2=b):
    dataf = pd.read_csv('test3.csv')
    col1 = dataf['area']
    col2 = dataf['overall']
    dataf.iloc[::2, dataf.columns.get_loc('overall')] = val1
    dataf.iloc[1::2, dataf.columns.get_loc('overall')] = val2
    return dataf
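If the result should actually end up in test3.csv, as the question implies, a hedged follow-up (assumption: overwriting the file is acceptable):

dataf = func3()
dataf.to_csv('test3.csv', index=False)  # write the updated frame back out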
Sample:
dataf = pd.DataFrame({'overall':[1,7,8,9,4],
                      'col':list('abcde')})
print (dataf)
  col  overall
0   a        1
1   b        7
2   c        8
3   d        9
4   e        4
val1 = 20
val2 = 50
dataf.iloc[::2, dataf.columns.get_loc('overall')] = val1
dataf.iloc[1::2, dataf.columns.get_loc('overall')] = val2
print (dataf)
  col  overall
0   a       20
1   b       50
2   c       20
3   d       50
4   e       20
General solution for appending N values from a list - create an array with numpy.tile and then assign it to the new column:
import numpy as np

val = [1, 8, 4]
a = np.tile(val, int(len(dataf) / len(val)) + 2)[:len(dataf)]
dataf['overall'] = a
print (dataf)
  col  overall
0   a        1
1   b        8
2   c        4
3   d        1
4   e        8
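An equivalent sketch without numpy (an assumption, not in the original answer), repeating the list with itertools and trimming it to the frame length:

from itertools import cycle, islice
# cycle repeats [1, 8, 4] indefinitely; islice cuts it to exactly len(dataf) values
dataf['overall'] = list(islice(cycle(val), len(dataf)))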
I have a pandas dataframe with a text column.
I'd like to create a new column in which values are conditional on the start of the text string from the text column.
So if the first 30 characters of the text column:
== 'xxx...xxx' then return value 1
== 'yyy...yyy' then return value 2
== 'zzz...zzz' then return value 3
if none of the above return 0
It is possible to use multiple nested numpy.where calls, but if there are more conditions, use apply.
To select the start of each string, use indexing with str.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A':['xxxss','yyyee','zzzswee','sss'],
                   'B':[4,5,6,8]})
print (df)
         A  B
0    xxxss  4
1    yyyee  5
2  zzzswee  6
3      sss  8
# check the first 3 values
a = df.A.str[:3]
df['new'] = np.where(a == 'xxx', 1,
            np.where(a == 'yyy', 2,
            np.where(a == 'zzz', 3, 0)))
print (df)
         A  B  new
0    xxxss  4    1
1    yyyee  5    2
2  zzzswee  6    3
3      sss  8    0
def f(x):
    #print (x)
    if x == 'xxx':
        return 1
    elif x == 'yyy':
        return 2
    elif x == 'zzz':
        return 3
    else:
        return 0

df['new'] = df.A.str[:3].apply(f)
print (df)
         A  B  new
0    xxxss  4    1
1    yyyee  5    2
2  zzzswee  6    3
3      sss  8    0
EDIT:
If the lengths are different, you only need:
df['new'] = np.where(df.A.str[:3] == 'xxx', 1,
            np.where(df.A.str[:2] == 'yy', 2,
            np.where(df.A.str[:1] == 'z', 3, 0)))
print (df)
         A  B  new
0    xxxss  4    1
1    yyyee  5    2
2  zzzswee  6    3
3      sss  8    0
EDIT1:
Thanks to Quickbeam2k1 for the idea to use str.startswith to check the start of each string:
df['new'] = np.where(df.A.str.startswith('xxx'), 1,
            np.where(df.A.str.startswith('yy'), 2,
            np.where(df.A.str.startswith('z'), 3, 0)))
print (df)
         A  B  new
0    xxxss  4    1
1    yyyee  5    2
2  zzzswee  6    3
3      sss  8    0
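When there are several such conditions, numpy.select is often tidier than nesting numpy.where; a minimal sketch (an assumption, not part of the original answer):

# assumption: alternative to the nested np.where calls above
conds = [df.A.str.startswith('xxx'),
         df.A.str.startswith('yy'),
         df.A.str.startswith('z')]
df['new'] = np.select(conds, [1, 2, 3], default=0)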
A different and slower solution:
However, the advantage is that the mapping from patterns is a function parameter (with an implicit default value of 0).
def map_starts_with(pat_map):
    def map_string(t):
        pats = [pat for pat in pat_map.keys() if t.startswith(pat)]
        # get only the value of the "first" pattern if at least one pattern is found
        return pat_map.get(pats[0]) if len(pats) > 0 else 0
    return map_string
df = pd.DataFrame({'col':[ 'xx', 'aaaaaa', 'c']})

      col
0      xx
1  aaaaaa
2       c
mapping = { 'aaa':4 ,'c':3}
df.col.apply(lambda x: map_starts_with(mapping)(x))
0 0
1 4
2 3
Note that we also used currying here. I'm wondering if this approach can be implemented using additional pandas or numpy functionality.
Note that the "first" pattern match may depend on the traversal order of the dict keys. This is irrelevant if there is no overlap in the keys. (Jezrael's solution, or its direct generalization thereof, will also choose one element for the match, but in a more predictable manner.)
Is there a clear way to iterate over the items of each generator in a list? I believe the simplest way to show the essence of the question is to provide an example. Here it is.
0. Assume we have a function returning a generator:
def gen_fun(hint):
    for i in range(1, 10):
        yield "%s %i" % (hint, i)
1. Clear solution with straight iteration order:
hints = ["a", "b", "c"]
for hint in hints:
    for txt in gen_fun(hint):
        print(txt)
This prints
a 1
a 2
a 3
...
b 1
b 2
b 3
...
2. Cumbersome solution with inverted iteration order:
hints = ["a", "b", "c"]
generators = list(map(gen_fun, hints))
any = True
while any:
    any = False
    for g in generators:
        try:
            print(next(g))
            any = True
        except StopIteration:
            pass
This prints
a 1
b 1
c 1
a 2
b 2
...
This works as expected and does what I want.
Bonus points:
The same task, but the gen_fun ranges can differ, i.e.:
def gen_fun(hint):
    if hint == 'a':
        m = 5
    else:
        m = 10
    for i in range(1, m):
        yield "%s %i" % (hint, i)
The correct output for this case is:
a 1
b 1
c 1
a 2
b 2
c 2
a 3
b 3
c 3
a 4
b 4
c 4
b 5
c 5
b 6
c 6
b 7
c 7
b 8
c 8
b 9
c 9
The question:
Is there a way to implement case 2 cleaner?
If I understand the question correctly, you can use zip() to achieve the same thing as that whole while any loop:
hints = ["a", "b", "c"]
generators = list(map(gen_fun, hints))
for x in zip(*generators):
    for txt in x:
        print(txt)
output:
a 1
b 1
c 1
a 2
b 2
...
UPDATE:
If the generators are of different lengths, zip 'trims' them all to the shortest. You can use itertools.zip_longest (as suggested by this Q/A) to achieve the opposite behaviour and continue yielding until the longest generator is exhausted. You'll need to filter out the padded values, though:
from itertools import zip_longest

hints = ["a", "b", "c"]
generators = list(map(gen_fun, hints))
for x in zip_longest(*generators):
    for txt in x:
        if txt:  # skip the None fill values from exhausted generators
            print(txt)
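Another hedged option (an assumption, not part of the original answer) is the "roundrobin" recipe from the itertools documentation, which needs no fill values and drops each generator exactly when it is exhausted:

from itertools import cycle, islice

def roundrobin(*iterables):
    # roundrobin('ABC', 'D', 'EF') --> A D E B F C
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for nxt in nexts:
                yield nxt()
        except StopIteration:
            # remove the iterator we just exhausted from the cycle
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

for txt in roundrobin(*map(gen_fun, hints)):
    print(txt)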
You might want to look into itertools.product:
from itertools import product

# Case 1
for tup in product('abc', range(1, 4)):
    print('{0} {1}'.format(*tup))

print('---')

# Case 2
for tup in product(range(1, 4), 'abc'):
    print('{1} {0}'.format(*tup))
Output:
a 1
a 2
a 3
b 1
b 2
b 3
c 1
c 2
c 3
---
a 1
b 1
c 1
a 2
b 2
c 2
a 3
b 3
c 3
Note that the difference between case 1 and case 2 is just the order of the parameters passed into the product function and into the print call.