I have 4 pieces of output as shown below. They come from 4 separate functions. I would like to store them in one single dataframe.
print("Number of Genes-", number_of_genes)
print(donor.ix[[0],0])
print(test.sum(axis=1).argmax(), test.sum(axis=1).max())
I tried something like this but it doesn't work well.
print(number_of_genes, donor.ix[[0],0], test.sum(axis=1).argmax(), test.sum(axis=1).max())
Appending it to a dataframe doesn't seem to work. Thanks for your help.
NB: each of these is for the same input.
Start by creating an empty DataFrame with your desired columns.
Assign variables to your data.
One way to append the row of new data is via a dictionary.
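import pandas as pd

df = pd.DataFrame(columns=['n', 'd', 'max1', 'max2'])
n = number_of_genes
d = donor.iloc[0, 0]  # .ix was removed in pandas 1.0; .iloc selects by position
max1 = test.sum(axis=1).argmax()
max2 = test.sum(axis=1).max()
# DataFrame.append never modified df in place and was removed in pandas 2.0,
# so concatenate a one-row frame built from the dictionary and reassign df
row = pd.DataFrame([{'n': n, 'd': d, 'max1': max1, 'max2': max2}])
df = pd.concat([df, row], ignore_index=True)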
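For reference, here is a self-contained sketch of the same idea. The contents of donor and test and the value of number_of_genes are made up for illustration, since the question does not show them. It collects the four values in a dictionary and builds the one-row DataFrame directly instead of growing an empty one. It uses idxmax, which returns the index label of the largest value; in recent pandas versions argmax returns the integer position instead, so pick whichever you actually need.

```python
import pandas as pd

# Hypothetical stand-ins for the inputs from the question
number_of_genes = 3
donor = pd.DataFrame({'id': ['D1', 'D2'], 'age': [41, 57]})
test = pd.DataFrame({'a': [1, 5, 2], 'b': [4, 6, 0]},
                    index=['g1', 'g2', 'g3'])

row_sums = test.sum(axis=1)        # g1 -> 5, g2 -> 11, g3 -> 2

row = {
    'n': number_of_genes,
    'd': donor.iloc[0, 0],         # first row, first column
    'max1': row_sums.idxmax(),     # index label of the largest row sum
    'max2': row_sums.max(),        # the largest row sum itself
}

# A list of dictionaries becomes a DataFrame with one row per dictionary
df = pd.DataFrame([row])
print(df)
```

If more inputs are processed later, appending one dictionary per input to a list and calling pd.DataFrame on that list once at the end follows the same pattern.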
I have 4 different dataframes containing time series data that all have the same structure.
My goal is to take each individual dataframe and pass it through a function I have defined that will group them by datestamp, sum the columns and return a new dataframe with the columns I want. So in total I want 4 new dataframes that have only the data I want.
I just looked through this post:
Loop through different dataframes and perform actions using a function
but applying this did not change my results.
Here is my code:
I am putting the dataframes in a list so I can iterate through them
dfs = [vds, vds2, vds3, vds4]
This is my function I want to pass each dataframe through:
def VDS_pre(df):
    df = df.groupby(['datestamp','timestamp']).sum().reset_index()
    df = df.rename(columns={'datestamp': 'Date','timestamp':'Time','det_vol': 'VolumeVDS'})
    df = df[['Date','Time','VolumeVDS']]
    return df
This is the loop I made to iterate through my dataframe list and pass each one through my function:
for df in dfs:
    df = VDS_pre(df)
However once I go through my loop and go to print out the dataframes, they have not been modified and look like they initially did. Thanks for the help!
"However once I go through my loop and go to print out the dataframes, they have not been modified and look like they initially did."
Yes, this is actually the case. The reason why they have not been modified is:
Assigning to item inside a "for item in lst:" loop only rebinds the name item. It changes neither lst nor the variables whose values were used to build lst, as the following code demonstrates:
v1=1; v2=2; v3=3
lst = [v1,v2,v3]
for item in lst:
    item = 0
print(lst, v1, v2, v3) # gives: [1, 2, 3] 1 2 3
To achieve the result you expect to obtain you can use a list comprehension and the list unpacking feature of Python:
vds,vds2,vds3,vds4=[VDS_pre(df) for df in [vds,vds2,vds3,vds4]]
or the following code, which uses a list of strings holding the variable names of the dataframes (exec can only rebind these names at module level, not inside a function):
sdfs = ['vds', 'vds2', 'vds3', 'vds4']
for sdf in sdfs:
    exec(f'{sdf} = VDS_pre({sdf})')
Now printing vds, vds2, vds3 and vds4 will output the modified dataframes.
Most pandas operations return a new DataFrame rather than modifying the original. Your loop stores each result in the variable df but never writes it back to your list, which is why nothing is kept after the loop finishes.
If you don't need to keep original frames, you may simply overwrite them:
for i, df in enumerate(dfs):
    dfs[i] = VDS_pre(df)
If you do need the originals, use a second list and append each result to it.
l = []
for df in dfs:
    df2 = VDS_pre(df)
    l.append(df2)
Or, even better, use a list comprehension to rewrite this snippet as a single line of code:
l = [VDS_pre(df) for df in dfs]
Now you are able to store the result of your processing.
Additionally, if your frames have the same structure and can be merged into a single frame, consider concatenating them first and then applying your function once to the result. That would be the more idiomatic pandas approach.
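To make both suggestions concrete, here is a minimal sketch with made-up sample data (the real frames are not shown in the question, so the values and the two-frame setup are assumptions). The first part keeps the results separate in a dictionary keyed by name, which avoids rebinding individual variables. The second part concatenates the frames with a hypothetical source column so they can still be told apart, then groups once.

```python
import pandas as pd

def VDS_pre(df):
    # Same steps as the function in the question
    df = df.groupby(['datestamp', 'timestamp']).sum().reset_index()
    df = df.rename(columns={'datestamp': 'Date', 'timestamp': 'Time',
                            'det_vol': 'VolumeVDS'})
    return df[['Date', 'Time', 'VolumeVDS']]

# Hypothetical sample data standing in for the real frames
vds = pd.DataFrame({'datestamp': ['2020-01-01', '2020-01-01'],
                    'timestamp': ['00:00', '00:00'],
                    'det_vol': [3, 4]})
vds2 = pd.DataFrame({'datestamp': ['2020-01-01'],
                     'timestamp': ['00:05'],
                     'det_vol': [10]})

# Option 1: one processed frame per input, looked up by name
frames = {'vds': vds, 'vds2': vds2}
processed = {name: VDS_pre(frame) for name, frame in frames.items()}

# Option 2: tag each frame, concatenate, and group a single time
combined = pd.concat(
    [frame.assign(source=name) for name, frame in frames.items()],
    ignore_index=True,
)
summary = (combined
           .groupby(['source', 'datestamp', 'timestamp'], as_index=False)['det_vol']
           .sum())
print(summary)
```

Without the source column, rows with the same date and time from different frames would be summed together, which may or may not be what you want.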
Essentially, I would like to add values to certain columns in an empty DataFrame with defined columns, but when I run the code, I get:
Empty DataFrame
Columns: [AP, AV]
Index: []
Code:
df = pd.DataFrame(columns=['AP', 'AV'])
df['AP'] = propName
df['AV'] = propVal
I think this could be a simple fix, but I've tried some different solutions to no avail. I've tried adding the values to an existing dataframe I have, and it works when I do that, but would like to have these values in a new, separate structure.
Thank you,
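The cause is the lack of an index: a scalar assigned to a column is broadcast over the existing rows, and an empty DataFrame has none.
If you create an empty DataFrame with an index: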
df = pd.DataFrame(index = [5])
Output
Empty DataFrame
Columns: []
Index: [5]
Then when you set the value, it will be set.
df[5] = 12345
Output
       5
5  12345
You can also create a completely empty DataFrame and, when setting a column, pass the value in a list. The index will then be created automatically.
df = pd.DataFrame()
df['qwe'] = [777]
Output
   qwe
0  777
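Applied to the question's column names, a short sketch of the difference looks like this. The values of propName and propVal are hypothetical, since the question does not show them.

```python
import pandas as pd

propName, propVal = 'color', 'red'   # hypothetical values

# Scalar assignment: broadcast over the existing rows, of which there are none
empty = pd.DataFrame(columns=['AP', 'AV'])
empty['AP'] = propName
empty['AV'] = propVal
print(len(empty))    # still zero rows

# Wrapping each value in a list creates one row and a default index
df = pd.DataFrame({'AP': [propName], 'AV': [propVal]})
print(df)
```

Building the frame in one call from a dictionary of lists also avoids the intermediate empty DataFrame altogether.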
Assign propName and propVal to a dictionary (named props here so it does not shadow the built-in dict):
props = {}
props[propName] = propVal
Then, push to an empty DataFrame, df:
df = pd.DataFrame()
df['AP'] = list(props.keys())
df['AV'] = list(props.values())
Probably not the most elegant solution, but works great for me.
I want to append 3 variables to an empty dataframe after each loop.
dfvol = dfvol.append([stock,mean,median],columns=['Stock','Mean','Median'])
Columns in Dataframe should be ['Stock','Median','Mean']
Result should be:
How can I solve the problem, because something with the append code is wrong.
You're trying to use a syntax for creating a new dataframe to append to it, which is not going to work.
Here is one way you can try to do what you want
df.loc[len(df)] = [stock,mean,median]
A better approach is to collect the entries in a list and create the dataframe from that list once your loop is done, instead of appending to df on every iteration.
Like this:
some_list = []
for a in b:
    some_list.append([stock,mean,median])
df = pd.DataFrame(some_list, columns = ['Stock','Mean','Median'])
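Here is a runnable version of that pattern. The volumes dictionary and its numbers are invented for illustration, and the mean and median come from Python's statistics module; the question does not say how those values are actually computed.

```python
import pandas as pd
from statistics import mean, median

# Hypothetical input: a few volume figures per stock
volumes = {'AAPL': [100, 200, 600], 'FB': [50, 70, 90]}

rows = []
for stock, values in volumes.items():
    # One list per row, in the same order as the column names below
    rows.append([stock, mean(values), median(values)])

# Build the DataFrame once, after the loop has finished
dfvol = pd.DataFrame(rows, columns=['Stock', 'Mean', 'Median'])
print(dfvol)
```

Creating the frame once is usually faster than growing it row by row, because every enlargement of a DataFrame copies the existing data.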
The append method doesn't work like that. You would only use the columns parameter if you were creating a DataFrame object. DataFrame.append has also been removed in pandas 2.0, so pd.concat is used here. You either want to create a second temporary DataFrame and concatenate it with the main DataFrame like this:
df_tmp = pd.DataFrame([[stock,mean,median]], columns=['Stock','Mean','Median'])
dfvol = pd.concat([dfvol, df_tmp], ignore_index=True)
...or you can build the temporary row from a dictionary like this:
dfvol = pd.concat([dfvol, pd.DataFrame([{'Stock':stock,'Mean':mean,'Median':median}])], ignore_index=True)
Like this:
In [256]: dfvol = pd.DataFrame()
In [257]: stock = ['AAPL', 'FB']
In [258]: mean = [600.356, 700.245]
In [259]: median = [281.788, 344.55]
In [265]: dfvol = pd.concat([dfvol, pd.DataFrame(list(zip(stock, mean, median)), columns=['Stock','Mean','Median'])])
In [265]: dfvol
Out[265]:
  Stock     Mean   Median
0  AAPL  600.356  281.788
1    FB  700.245  344.550
Check the pandas documentation for the notation; there are multiple ways to do it. With pd.concat, which replaces DataFrame.append in pandas 2.0 and later:
dfvol = pd.concat([dfvol, pd.DataFrame([[stock,mean,median]],columns=['Stock','Mean','Median'])])
I would like to use python 3.4 to compare columns.
I have two columns a and b
If A=B print A in column C.
If B > A, print all numbers between A and B including A and B in column C.
The subsequent compared rows would print in column C after the results of the previous test.
Any help is appreciated. My question wording must be off as I'm sure this has been done before, but I just can't find it here or elsewhere.
As brittenb noted, try the apply function in pandas.
import pandas as pd
df = pd.read_excel("somefile.xlsx")
df['c'] = df.apply(lambda r: list(range(r['a'], r['b']+1)), axis=1)
Update
If you want to add rows, writing in pandas may get complicated. If you don't care much about speed and memory, classic python style seems easier to understand.
ary = []
for i,r in df.iterrows():
    for j in range(r['a'], r['b']+1):
        ary.append( (r['a'], r['b'], j) )
df = pd.DataFrame(ary, columns = ['a','b','c'])
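If your pandas version has DataFrame.explode (0.25 or later), the row expansion can also stay in pandas. The sketch below uses a small made-up frame in place of the Excel file; it builds the list column as in the first snippet and then turns each list element into its own row.

```python
import pandas as pd

# Hypothetical data in place of somefile.xlsx
df = pd.DataFrame({'a': [1, 4, 7], 'b': [3, 4, 8]})

# One list per row: every integer from a to b, both ends included
df['c'] = df.apply(lambda r: list(range(r['a'], r['b'] + 1)), axis=1)

# explode gives each list element its own row and repeats a and b alongside it
long_df = df.explode('c').reset_index(drop=True)
print(long_df)
```

When a equals b the list holds just that one value. When b is smaller than a the list is empty and explode produces a row with a missing value, so such rows may need to be filtered out first.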
I am a newbie to Python. I am trying to iterate over the rows of individual columns of a dataframe. I want to create an adjacency list from the first two columns of the dataframe, which is read from CSV data that has 3 columns.
The following is the code to iterate over the dataframe and create a dictionary for adjacency list:
df1 = pd.read_csv('person_knows_person_0_0_sample.csv', sep=',', index_col=False, skiprows=1)
src_list = list(df1.iloc[:, 0:1])
tgt_list = list(df1.iloc[:, 1:2])
adj_list = {}
for src in src_list:
    for tgt in tgt_list:
        adj_list[src] = tgt
print(src_list)
print(tgt_list)
print(adj_list)
and the following is the output I am getting:
['933']
['4139']
{'933': '4139'}
I see that I am not getting the entire list when I use the list() constructor.
Hence I am not able to loop over the entire data.
Could anyone tell me where I am going wrong?
To summarize, Here is the input data:
A,B,C
933,4139,20100313073721718
933,6597069777240,20100920094243187
933,10995116284808,20110102064341955
933,32985348833579,20120907011130195
933,32985348838375,20120717080449463
1129,1242,20100202163844119
1129,2199023262543,20100331220757321
1129,6597069771886,20100724111548162
1129,6597069776731,20100804033836982
the output that I am expecting:
933: [4139,6597069777240, 10995116284808, 32985348833579, 32985348838375]
1129: [1242, 2199023262543, 6597069771886, 6597069776731]
Use groupby and create Series of lists and then to_dict:
#selecting by columns names
d = df1.groupby('A')['B'].apply(list).to_dict()
#selecting columns by positions
d = df1.iloc[:, 1].groupby(df1.iloc[:, 0]).apply(list).to_dict()
print (d)
{933: [4139, 6597069777240, 10995116284808, 32985348833579, 32985348838375],
1129: [1242, 2199023262543, 6597069771886, 6597069776731]}
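As for why the original code printed a single element: calling list() on a DataFrame (and df1.iloc[:, 0:1] is still a DataFrame) yields its column labels, not its values. The labels were likely '933' and '4139' because skiprows=1 skipped the header line, so the first data row was read as the header. The sketch below reproduces this on a shortened copy of the sample data, read from a string instead of the CSV file, and then builds the adjacency list with groupby.

```python
import io
import pandas as pd

# Shortened copy of the sample input from the question
csv_text = """A,B,C
933,4139,20100313073721718
933,6597069777240,20100920094243187
1129,1242,20100202163844119
"""

df1 = pd.read_csv(io.StringIO(csv_text))

# Iterating over a DataFrame yields column labels, not row values
labels = list(df1.iloc[:, 0:1])
print(labels)

# Selecting a single column gives a Series, which does hold the values
src_list = df1.iloc[:, 0].tolist()
print(src_list)

# Group the targets in B by their source in A
adj_list = df1.groupby('A')['B'].apply(list).to_dict()
print(adj_list)
```

Note also that the nested loop in the question would overwrite adj_list[src] on every pass, so each key would end up with only the last target even if the lists had been complete.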