Data management with Python based on the VLOOKUP function - python

I am trying to transform a data matrix in Python.
I want to change from :
Well A B C D
Production 1 2 3 4
to
Well Production
A 1
B 2
C 3
D 4
It is a simple task in Excel but I would like to know how to do it in Python.
How do I do it? I am sure there is a very simple way, but I have not come across it.

I recommend setting the index to Well before transposing. If you transpose first, you are left with a meaningless column named 0, and Production becomes a row of data in the dataframe instead of the column header.
df.T
0 # this is your column
Well Production # this becomes an observation
A 1
B 2
C 3
D 4
Do this:
df.set_index('Well').T
Well Production
A 1
B 2
C 3
D 4
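For reference, here is a self-contained sketch of the set_index-then-transpose approach. It assumes the wide data sits in a one-row dataframe whose first column is named Well; that layout and the name long_df are assumptions made for illustration, not something the question states.

```python
import pandas as pd

# Wide layout assumed from the question: one row, one column per well.
df = pd.DataFrame({"Well": ["Production"], "A": [1], "B": [2], "C": [3], "D": [4]})

# Make "Well" the index so that its value ("Production") becomes the
# column label after transposing instead of a row of data.
long_df = df.set_index("Well").T

# Tidy the axis names so the old column labels A-D end up in a "Well" column.
long_df.index.name = "Well"
long_df.columns.name = None
long_df = long_df.reset_index()

print(long_df)
```

The final reset_index is optional; leave it out if you prefer the well names to stay as the index.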

If your data is contained in a dataframe you can simply transpose it.
data = data.transpose()
or equivalently
data = data.T

Put each record into a list and iterate over the two lists in parallel:
l1 = ['Well', 'A', 'B', 'C', 'D']
l2 = ['Production', '1', '2', '3', '4']
for i, j in zip(l1, l2):
    print('%s %4s' % (i, j))
Output:
Well Production
A 1
B 2
C 3
D 4

Related

Function Value with Combination(or Permutation) of Variables and Assign to Dataframe

I have n variables. Suppose n equals 3 in this case. I want to apply one function to all of the combinations (or permutations, depending on how you want to solve this) of the variables and store each result in the matching row and column of a dataframe.
a = 1
b = 2
c = 3
indexes = ['a', 'b', 'c']
df = pd.DataFrame({x:np.nan for x in indexes}, index=indexes)
If I apply sum (the function can be anything), then the result that I want to get looks like this:
a b c
a 2 3 4
b 3 4 5
c 4 5 6
I can only think of iterating over all the variables, applying the function one pair at a time, and using the index of the iterators to set the value in the dataframe. Is there a better solution?
You can use apply and return a pd.Series for that effect. In such cases, pandas uses the series indices as columns in the resulting dataframe.
s = pd.Series({"a": 1, "b": 2, "c": 3})
s.apply(lambda x: x+s)
Just note that the function passed to apply works on one element and the whole series at a time.
If performance is important, I believe you need a broadcast sum of an array created from the variables:
a = 1
b = 2
c = 3
indexes = ['a', 'b', 'c']
arr = np.array([a,b,c])
df = pd.DataFrame(arr + arr[:, None], index=indexes, columns=indexes)
print (df)
a b c
a 2 3 4
b 3 4 5
c 4 5 6
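Broadcasting works for operations NumPy can vectorize, such as the sum above. For an arbitrary Python function of two arguments, one possible sketch is a nested comprehension. The helper name pairwise_table is hypothetical, and this version will be slower than broadcasting for large inputs.

```python
import pandas as pd

values = {"a": 1, "b": 2, "c": 3}


def pairwise_table(values, func):
    """Apply func to every ordered pair of values (hypothetical helper)."""
    labels = list(values)
    # Row r, column c holds func(value of r, value of c).
    rows = [[func(values[r], values[c]) for c in labels] for r in labels]
    return pd.DataFrame(rows, index=labels, columns=labels)


table = pairwise_table(values, lambda x, y: x + y)
print(table)
```

Because every ordered pair is evaluated, this also covers functions that are not symmetric in their arguments.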

How to add multiple columns to dataframe by function

If I have a df such as this:
a b
0 1 3
1 2 4
I can use df['c'] = '' and df['d'] = -1 to add 2 columns and become this:
a b c d
0 1 3 -1
1 2 4 -1
How can I put this code in a function, so that I can apply it to df and add all the columns at once instead of adding them one by one as above? Thanks
Create a dictionary:
dictionary = {'c': '', 'd': -1}
def new_columns(df, dictionary):
    return df.assign(**dictionary)
then call it with your df:
df = new_columns(df, dictionary)
or just (if you don't need a function call; I'm not sure what your use case is):
df.assign(**dictionary)
def update_df(a_df, new_cols_names, new_cols_vals):
    for n, v in zip(new_cols_names, new_cols_vals):
        a_df[n] = v
update_df(df, ['c', 'd', 'e'], ['', 5, 6])
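The two answers differ in one respect that is easy to miss: assign returns a new dataframe, while assigning through df[name] = value changes the dataframe in place. A small sketch of the difference, where the names new_columns and extended are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
new_columns = {"c": "", "d": -1}

# assign builds a new DataFrame; the original df is left untouched.
extended = df.assign(**new_columns)
print(list(df.columns))
print(list(extended.columns))

# Item assignment mutates df itself, one column at a time.
for name, value in new_columns.items():
    df[name] = value
print(list(df.columns))
```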

loop through a single column in one dataframe compare to a column in another dataframe create new column in first dataframe using pandas

Right now I have two dataframes that look like this:
c = pd.DataFrame({'my_goal':[3, 4, 5, 6, 7],
'low_number': [0,100,1000,2000,3000],
'high_number': [100,1000,2000,3000,4000]})
and
a= pd.DataFrame({'a':['a', 'b', 'c', 'd', 'e'],
'Number':[50, 500, 1030, 2005 , 3575]})
What I want to do is this: if 'Number' falls between the low number and the high number, I want it to bring back the value in 'my_goal'. For example, for 'a' the 'Number' is 50, so I want it to bring back 3. I also want to create a dataframe that contains all the columns from dataframe a and the 'my_goal' column from dataframe c. I want the output to look like:
I tried making my high and low numbers into a separate list and running a for loop from that, but all that gives me are 'my_goal' numbers:
low_number= 'low_number': [0,100,1000,2000,3000]
for i in a:
if float(i) >= low_number:
a = c['my_goal']
print(a)
You can use pd.cut. When I see ranges, pd.cut is the first thing I think of:
dfa = pd.DataFrame(a)
dfc = pd.DataFrame(c)
dfa['my_goal'] = pd.cut(dfa['Number'],
bins=[0]+dfc['high_number'].tolist(),
labels=dfc['my_goal'])
Output:
a Number my_goal
0 a 50 3
1 b 500 4
2 c 1030 5
3 d 2005 6
4 e 3575 7
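The pd.cut call builds its bins from high_number only, so it relies on the ranges being contiguous. If the ranges might have gaps, a different technique is to look each number up in an IntervalIndex built from both bounds. The sketch below assumes the ranges do not overlap and that the upper bound is inclusive (closed="right"); both are assumptions, since the question does not say how boundary values should be treated.

```python
import pandas as pd

c = pd.DataFrame({"my_goal": [3, 4, 5, 6, 7],
                  "low_number": [0, 100, 1000, 2000, 3000],
                  "high_number": [100, 1000, 2000, 3000, 4000]})
a = pd.DataFrame({"a": ["a", "b", "c", "d", "e"],
                  "Number": [50, 500, 1030, 2005, 3575]})

# One interval per row of c; closed="right" means low < Number <= high.
intervals = pd.IntervalIndex.from_arrays(c["low_number"].to_numpy(),
                                         c["high_number"].to_numpy(),
                                         closed="right")

# Position of the interval that contains each number, or -1 if none does.
pos = intervals.get_indexer(a["Number"].to_numpy())

# Look up my_goal by position; -1 is not a label in c's index,
# so unmatched rows come back as missing values.
a["my_goal"] = c["my_goal"].reindex(pos).to_numpy()
print(a)
```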
I changed row 4 slightly to include a test case where the condition is not met. You can concat a with the rows of c where the condition is true. Note that between here compares each row of a only with the row of c at the same position.
a= pd.DataFrame({'a':['a', 'b', 'c', 'd', 'e'],'Number':[50, 500, 1030, 1995 , 3575]})
cond= a.Number.between( c.low_number, c.high_number)
pd.concat([a, c.loc[cond, ['my_goal']] ], axis = 1, join = 'inner')
Number a my_goal
0 50 a 3
1 500 b 4
2 1030 c 5
4 3575 e 7

Unpack a list outside of a function for pandas DataFrame multi-index

I want to add a multi-index column to an existing pandas dataframe df. An example:
d = {('a','b'):[1,2,3], ('c', 'd'): [4,5,6]}
df = pd.DataFrame(d)
The resulting dataframe is:
a c
b d
0 1 4
1 2 5
2 3 6
Now I want to add a new column to the dataframe. The correct way to do that would be to use df['e', 'f'] = [7,8,9]. However, I would like to use the list new_key as the key. Normally I could use the asterisk *, but apparently it cannot be used outside of functions. So I get the following errors.
new_key = ['e','f']
df[new_key] = [7,8,9]
> KeyError: "['e' 'f'] not in index"
df[*new_key] = [7,8,9]
> SyntaxError: invalid syntax
Does anyone know how to solve this?
Cast to a tuple first:
df[tuple(new_key)] = [7,8,9]
a c e
b d f
0 1 4 7
1 2 5 8
2 3 6 9
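As a sketch of why the tuple works: with MultiIndex columns, a tuple names a single column across both levels, whereas a list is read as a selection of several separate labels, which is where the KeyError comes from.

```python
import pandas as pd

df = pd.DataFrame({("a", "b"): [1, 2, 3], ("c", "d"): [4, 5, 6]})
new_key = ["e", "f"]

# tuple(new_key) is ("e", "f"): one key spanning both column levels.
df[tuple(new_key)] = [7, 8, 9]

print(df.columns.tolist())
```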

Python Pandas Dataframe: Using Values in Column to Create New Columns

I've searched several books and sites and I can't find anything that quite matches what I'm trying to do. I would like to create itemized lists from a dataframe and reconfigure the data like so:
Input:
   A   B
0  1  aa
1  2  bb
2  3  bb
3  3  aa
4  4  aa
5  4  bb
6  4  dd
7  5  cc
Desired output:
   A   B   C   D
0  1  aa
1  2  bb
2  3  bb  aa
3  4  aa  bb  dd
4  5  cc
I've experimented with grouping, stacking, unstacking, etc. but nothing that I've attempted has produced the desired result. If it's not obvious, I'm very new to python and a solution would be great but an understanding of the process I need to follow would be perfect.
Thanks in advance
Using pandas you can query all results e.g. where A=4.
A crude but working method would be to iterate through the various index values and gather all 'like' results into a numpy array and convert this into a new dataframe.
A rough sketch of that idea (adapt it to your data):
rows = []
for value in sorted(df['A'].unique()):
    # collect every B that shares this value of A into one row
    rows.append([value] + df.loc[df['A'] == value, 'B'].tolist())
df = pd.DataFrame(rows)
# or something of the sort
I hope that helps.
Update from comments:
animal_list = []
for animal in ['cat', 'dog']:  # ... plus any other values found in column A
    # keep only the rows whose A matches this animal
    newdf = df[df['A'] == animal]
    body = [animal]
    for item in newdf['B']:
        body.append(item)
    animal_list.append(body)
df = pd.DataFrame(animal_list)
A quick and dirty method that will work with strings. Customize the column naming as per needs.
import pandas as pd
data = {'A': [1, 2, 3, 3, 4, 4, 4, 5],
        'B': ['aa', 'bb', 'bb', 'aa', 'aa', 'bb', 'dd', 'cc']}
df = pd.DataFrame(data)
maxlen = df.A.value_counts().max()  # size of the largest group; used to pad
                                    # every list to the same length
newdata = {}
for n, gdf in df.groupby('A'):
    newdata[n] = list(gdf.B.values) + [''] * (maxlen - len(gdf.B))
# recreate DF with Col 'A' as index; experiment with other orientations
newdf = pd.DataFrame.from_dict(newdata, orient='index')
# customize this section
newdf.columns = list('BCD')
newdf['A'] = newdf.index
newdf.index = range(len(newdf))
newdf = newdf.reindex(columns=list('ABCD'))  # to set the desired order
print(newdf)
The result is:
A B C D
0 1 aa
1 2 bb
2 3 bb aa
3 4 aa bb dd
4 5 cc
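The same result can also be reached without an explicit loop. This is a different technique from the answers above: number the rows within each group with cumcount, then pivot on that number. It is only a sketch; the column name position is made up for illustration, and the B, C, D labels are hard-coded for groups of at most three values.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 3, 4, 4, 4, 5],
                   "B": ["aa", "bb", "bb", "aa", "aa", "bb", "dd", "cc"]})

# Number the rows inside each group of A: 0 for its first B, 1 for the next, ...
position = df.groupby("A").cumcount()

# Spread the B values across columns keyed by that position,
# filling the slots that shorter groups leave empty.
wide = (df.assign(position=position)
          .pivot(index="A", columns="position", values="B")
          .fillna(""))

# Rename the positional columns and turn A back into a regular column.
wide.columns = list("BCD")
wide = wide.reset_index()
print(wide)
```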
