I had code as follows to collect interesting rows into a new dataframe:
df = df1.iloc[[66,113,231,51,152,122,185,179,114,169,97][:]]
but I want to use a for loop to collect the data instead. I have read that I should collect the data in a list and then create the DataFrame from it, but all the examples I have seen use plain numbers, and I can't work out how to do the same for each row of a dataframe. At the moment I have the following:
data = ['A','B','C','D','E']
for n in range(10):
    data.append(dict(zip(df1.iloc[n, 4])))
df = pd.Dataframe(data)
(P.S. I have 4 in the code because I want the data to be selected via column E; the dataframe is already sorted, so I am just looking for the first 10 rows.)
Thanks in advance for your help.
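A loop-based version of what the question describes might look like the sketch below; the sample frame is made up, and `to_dict()` per row is one way to build the list (assuming `df1` is already sorted, as stated):

```python
import pandas as pd

# made-up stand-in for the already-sorted df1
df1 = pd.DataFrame({"A": range(20), "E": range(20, 0, -1)})

# collect each row as a dict, then build the new dataframe once at the end
rows = []
for n in range(10):
    rows.append(df1.iloc[n].to_dict())
df = pd.DataFrame(rows)
```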
Related
my sort_drop2 dataframe is shown in the picture below
https://imgur.com/a/mdZZa7n
new_dataframe = sort_drop2.filter(['City','Est','Nti'])
With sort_drop2.filter I am trying to copy specific details from the old dataset into a new dataframe.
I want to only take the top 5 values from the sort_drop2 dataframe
I have sorted sort_drop2 by Nti from largest to smallest with sort_values(by='Nti', ascending=False).
How do I copy only the top 5 values from the old dataframe to new?
You can get the top n rows of dataframe df with df.head(n). So in your case, take your sorted and filtered dataframe and do:
new_dataframe.head(5)
The default for n is 5, so you could also leave the parameter blank.
That will return the dataframe. If you want to save the result as a new dataframe, you would do:
df_top_5 = new_dataframe.head(5)
Use .head(5) on the old DataFrame sort_drop2 and assign the result to your new DataFrame like this:
new_dataframe = sort_drop2.filter(['City','Est','Nti']).sort_values(by='Nti', ascending=False).head(5)
Here's my answer expanded over multiple lines, which is closer to the code you described, so it may be easier to compare with your existing code:
new_dataframe = sort_drop2.filter(['City','Est','Nti'])
new_dataframe = new_dataframe.sort_values(by='Nti', ascending=False)
new_dataframe = new_dataframe.head(5)
I want to create about 10 data frames, each with the same number of rows and columns, which I want to specify.
Currently I am creating a df with the specified rows and then using pd.concat to add columns to it. I have to write 10 lines of code separately, one for each data frame. Is there a way to do it in one go for all the data frames? Say all the data frames have 15 rows and 50 columns.
Also, I don't want to use a loop. All values in the data frames are NaN, and I want to apply a different function to each data frame, so editing one data frame shouldn't change the values of the others.
You can simply create a numpy array of np.nan and then create a dataframe from it:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.full([15, 50], np.nan))
For creating 10 dataframes, you can just run this in a loop and append each one to a list.
dfs = []
for i in range(10):
    dfs.append(pd.DataFrame(np.full([15, 50], np.nan)))
Then you can index into dfs and change any value accordingly. It won't impact any other dataframe.
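As a quick check of the independence claim (same construction as above, using np.full as an equivalent way to get a NaN array):

```python
import numpy as np
import pandas as pd

dfs = [pd.DataFrame(np.full([15, 50], np.nan)) for _ in range(10)]

# editing one frame leaves the others untouched
dfs[0].iloc[0, 0] = 1.0
print(np.isnan(dfs[1].iloc[0, 0]))  # True
```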
You could do something like this:
index_list = range(10)
column_list = ['a','b','c','d']

for i in range(5):
    locals()["df_" + str(i)] = pd.DataFrame(index=index_list, columns=column_list)
This will create 5 different dataframes (df_0 to df_4), each with 10 rows and 4 columns named a, b, c, d, with all values NaN.
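Writing into locals() is fragile and the resulting variables are hard to iterate over later; a dictionary keyed by name is a common alternative. A sketch with the same sizes:

```python
import pandas as pd

index_list = range(10)
column_list = ['a', 'b', 'c', 'd']

# one NaN-filled frame per key; editing one does not affect the others
frames = {"df_" + str(i): pd.DataFrame(index=index_list, columns=column_list)
          for i in range(5)}
```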
import pandas as pd
row_num = 15
col_num = 50
temp = []
for col_name in range(0, col_num):
    temp.append(col_name)

# Creation of the DataFrame
df = pd.DataFrame(index=range(0, row_num), columns=temp)
This code creates a single data frame in pandas with the specified row and column numbers. But without a loop or some other form of iteration, multiple lines of the same code must be written.
Note: this is a pure pandas implementation. A GitHub gist can be found here.
I was working on Jupyter and arrived at a situation where I had to take differences of each column from every other column taken in permutation and then store them in a separate DataFrame. I tried using nested loops but got stuck while assigning the values to the DataFrame.
n = 0
for i in range(len(list(df.columns))-1):
    for j in range(i+1, len(list(df.columns))-1):
        df1[n] = pd.DataFrame(abs((df.iloc[:,i] - df.iloc[:,j]).dt.days))
        n = n + 1
df1
Also, I would like the column headers in this format: D1-D2, D1-D3, etc. The difference in dates has to be a positive integer. I would really appreciate it if anyone could help me with this code. Thanks!
A snippet of the DataFrame
import itertools
import pandas as pd
# create a sample dataframe
df = pd.DataFrame(data={"co1":[1,2,3,4], "co22":[4,3,2,1], "co3":[2,3,2,4]})
# iterate over all permutations of size 2 and write to dictionary
newcols = {}
for col1, col2 in itertools.permutations(df.columns, 2):
    newcols["-".join([col1, col2])] = df[col1] - df[col2]
# create dataframe from dict
newdf = pd.DataFrame(newcols)
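The question works with date columns and wants positive differences with headers like D1-D2; the same pattern adapts with itertools.combinations (each pair once) plus .dt.days and abs. The sample dates below are made up:

```python
import itertools
import pandas as pd

# made-up date columns standing in for the real data
df = pd.DataFrame({
    "D1": pd.to_datetime(["2021-01-10", "2021-02-01"]),
    "D2": pd.to_datetime(["2021-01-01", "2021-01-15"]),
    "D3": pd.to_datetime(["2021-01-05", "2021-03-01"]),
})

# combinations yields each column pair once; abs keeps the day counts positive
newcols = {}
for col1, col2 in itertools.combinations(df.columns, 2):
    newcols["-".join([col1, col2])] = (df[col1] - df[col2]).dt.days.abs()
df1 = pd.DataFrame(newcols)
```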
I need to filter multiple data frames and create new data frames based on them.
The multiple data frames are called as df[str(i)], i.e. df["0"], df["1"], and so on.
After filtering the rows, I need to create new dataframes. I am trying the following:
n = 5
for i in range(0, n):
    filtered = df[str(i)]
but at the end it returns only the dataframe from the last iteration.
I have also tried filtered[str(i)], but it gives me an error.
What I would like to have is:
filtered["0"] for df["0"]
filtered["1"] for df["1"]
...
I would appreciate your help to figure it out. Thanks
You could append your filtered dataframes to a list, then concatenate into a new dataframe.
import pandas as pd
n=5
dfs = []
for i in range(n):
    filtered = df[str(i)]  # apply your row filter here
    dfs.append(filtered)
df_filtered = pd.concat(dfs)
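If the goal is to keep one filtered frame per key (filtered["0"] matching df["0"], and so on) rather than one concatenated result, a dict works; the filter condition and sample frames here are made up:

```python
import pandas as pd

# made-up stand-in for df["0"], df["1"], ...
df = {str(i): pd.DataFrame({"x": [i, -i, i + 1]}) for i in range(5)}

# hypothetical condition: keep rows where x > 0, keyed like the originals
filtered = {key: frame[frame["x"] > 0] for key, frame in df.items()}
```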
I have a pandas dataframe called trg_data to collect data that I am producing in batches. Each batch is produced by a sub-routine as a smaller dataframe df with the same number of columns but fewer rows, and I want to insert the values from df into trg_data at a new row position each time.
However, when I use the following statement, df is always inserted at the top (i.e. rows 0 to len(df)).
trg_data.iloc[trg_pt:(trg_pt + len(df))] = df
I'm guessing, but I think the reason may be that even though the slice indicates the desired rows, pandas uses the index of df to decide where to put the data.
As a test I found that I can insert an ndarray at the right position no problem:
trg_data.iloc[trg_pt:(trg_pt + len(df))] = np.ones(df.shape)
How do I get it to ignore the index in df and insert the data where I want it? Or is there an entirely different way of achieving this? At the end of the day I just want to create the dataframe trg_data and then save to file at the end. I went down this route because there didn't seem to be a way of easily appending to an existing dataframe.
I've been working at this for over an hour and I can't figure out what to google to find the right answer!
I think I may have the answer (I thought I had already tried this but apparently not):
trg_data.iloc[trg_pt:(trg_pt + len(df))] = df.values
Still, I'm open to other suggestions. There's probably a better way to add data to a dataframe.
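That matches the index-alignment guess above: assigning a raw DataFrame aligns its index labels against the target labels instead of filling positionally, which is why .values (stripping the index) fixes it. A small made-up reproduction:

```python
import numpy as np
import pandas as pd

trg_data = pd.DataFrame(np.nan, index=range(6), columns=["a"])
df = pd.DataFrame({"a": [10.0, 20.0]})  # its index is 0, 1 -- not 2, 3

# .values strips the index, so the data lands exactly where the slice says
trg_pt = 2
trg_data.iloc[trg_pt:(trg_pt + len(df))] = df.values
print(trg_data["a"].tolist())
```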
The way I would do this is to save all the intermediate dataframes in a list and then concatenate them together:
import pandas as pd
dfs = []
# get all the intermediate dataframes somehow
# combine into one dataframe
trg_data = pd.concat(dfs)
Both
trg_data = pd.concat([df1, df2, ... dfn], ignore_index=True)
and
trg_data = pd.DataFrame()
for ...:  # loop that generates df
    trg_data = trg_data.append(df, ignore_index=True)  # you can reuse the name df
should work for you. (Note that DataFrame.append was removed in pandas 2.0; on newer versions, collect the frames in a list and call pd.concat once instead.)
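On pandas 2.0+, where DataFrame.append no longer exists, the loop version can collect the batches in a list and call pd.concat once at the end; the batch frames below are made up:

```python
import pandas as pd

# made-up stand-in for the batches a sub-routine would generate
batches = [pd.DataFrame({"a": [i, i + 1]}) for i in range(3)]

parts = []
for df in batches:  # loop that generates df
    parts.append(df)
trg_data = pd.concat(parts, ignore_index=True)
```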