I am trying to get the data from a list of DataFrames into one DataFrame.
I am using the following code:
main_df = pd.DataFrame(columns=data.columns[-len_data+1:], index=is_refs)
for index, dict in enumerate(dict_lists):
    df = pd.DataFrame(dict_lists[index])
    df = df.reindex(is_refs)
    main_df = main_df.append(df[df.columns])
The problem is that it returns the following DataFrame. Clearly I don't want repeated items in the index; I just want the financial data to fit into my rows. Any idea how to do this?
append stacks the frames row-wise, which is what duplicates your index. Merge them column-wise with join instead:
df_col = df[df.columns]
print(df_col)
main_df = main_df.join(df_col)
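A minimal, self-contained sketch of the idea, with made-up refs and columns standing in for your actual data:
import pandas as pd

# Hypothetical miniature: several frames share the same index (your is_refs)
# but each carries different columns.
is_refs = ["ref_a", "ref_b", "ref_c"]
pieces = [pd.DataFrame({"q1": [1, 2, 3]}, index=is_refs),
          pd.DataFrame({"q2": [4, 5, 6]}, index=is_refs)]

main_df = pd.DataFrame(index=is_refs)
for piece in pieces:
    main_df = main_df.join(piece)  # column-wise merge on the shared index

print(main_df)  # one row per ref, no repeated index entries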
I have pandas dataframes with the following data (in CSV):
#list1
poke_id,symbol
0,BTC
1,ETB
2,USDC
#list2
5,SOL
6,XRP
I am able to concatenate them into one dataframe using the following code:
df = pd.concat([df1, df2], ignore_index = True)
df = df.reset_index(drop = True)
df['poke_id'] = df.index
df = df[['poke_id','symbol']]
which gives me the output (in CSV):
poke_id,symbol
0,BTC
1,ETB
2,USDC
3,SOL
4,XRP
Is there any other way to do the same? Re-indexing the whole dataframe of ~4000 entries just to add ~100 more seems pointless and cumbersome. How can I make it pick the highest poke_id from list 1 (dataframe 1) and simply continue from i + 1 for the entries in list 2?
Your solution is good; it is possible to simplify it:
df = pd.concat([df1, df2], ignore_index = True).rename_axis('poke_id').reset_index()
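If you want to avoid renumbering the existing ~4000 rows at all, here is a minimal sketch of the incremental approach, assuming df1 already carries a valid, gap-free poke_id column:
start = df1['poke_id'].max() + 1                    # continue after the highest existing id
df2 = df2.drop(columns='poke_id', errors='ignore')  # discard the old 5, 6, ... ids
df2.insert(0, 'poke_id', range(start, start + len(df2)))
df = pd.concat([df1, df2], ignore_index=True)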
You can also use indexing to pull specific rows out of the dataframe. This is not efficient if you want large amounts of data, but it lets you take exactly the entries you need.
I want to assign a unique code based on the strings in the first and second columns, restarting from 1 whenever the first column's string changes.
Example
I use this code:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup()
dfs = [df[1] for df in dfs]
df = pd.concat(dfs)
but I got the following error:
Your code can be updated in the following way:
import pandas as pd
# set data
data = {"colummen1": ["Kenbele1", "Kenbele1", "Kenbele1", "Kenbele1", "Kenbele2", "Kenbele2", "Kenbele2", "Kenbele2"],
"colummun2": ["Commutity1", "Commutity2", "Commutity3", "Commutity4", "Commutity1", "Commutity2", "Commutity3", "Commutity4"]}
# create dataframe
df = pd.DataFrame(data)
dfs = df.groupby('colummen1')
dfs_updated = []
for _, df in dfs:
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup() + 1
    dfs_updated.append(df)
df_new = pd.concat(dfs_updated)
df_new
Returns
It is not very clear what you expect, but when you write [df[1] for df in dfs], each df is a key (for example 'Kenbele1') and df[1] is a single character (for example 'e', the second character of that string).
That is why you get the error: your list dfs ends up as two characters, ["e", "e"], which cannot be concatenated.
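A tiny demonstration of what happens, with placeholder values standing in for the grouped frames:
# Iterating over a dict yields its keys, so each df below is a key string
# and df[1] is that string's second character.
dfs = {"Kenbele1": None, "Kenbele2": None}
print([df[1] for df in dfs])  # ['e', 'e']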
I think with df[1] you meant the data frame that is associated with the key. If so, the code should look like this:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup()
dfs = [df for _, df in dfs.items()]
df = pd.concat(dfs)
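As an aside, the per-group numbering can also be computed without splitting the frame at all. A sketch using factorize on the same colummen1/colummun2 columns:
# one id per distinct colummun2 value, restarting at 1 inside each colummen1 group
df['id'] = df.groupby('colummen1')['colummun2'].transform(
    lambda s: pd.factorize(s)[0] + 1)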
I am looking into creating a big dataframe (pandas) from several individual frames. The data is organized in MF4 files, and the number of source files varies for each cycle. The goal is to automate this process.
Creation of Dataframes:
df = (MDF('File1.mf4')).to_dataframe(channels)
df1 = (MDF('File2.mf4')).to_dataframe(channels)
df2 = (MDF('File3.mf4')).to_dataframe(channels)
These Dataframes are then merged:
df = pd.concat([df, df1, df2], axis=0)
How can I do this without dynamically creating variables for df, df1 etc.? Or is there no other way?
I have all file paths in a list of the form:
Filepath = ['File1.mf4', 'File2.mf4', 'File3.mf4']
Now I am thinking of looping through it and creating the dataframes df, df1, ..., df1000 dynamically. Any advice here?
Edit: here is the full code:
df = (MDF('File1.mf4')).to_dataframe(channels)
df1 = (MDF('File2.mf4')).to_dataframe(channels)
df2 = (MDF('File3.mf4')).to_dataframe(channels)
#The Data has some offset:
x = df.index.max()
df1.index += x
x = df1.index.max()
df2.index += x
#With correct index now the data can be merged
df = pd.concat([df, df1, df2], axis=0)
The way I'm interpreting your question is that you have a predefined list you want. So just:
l = []
for f in [ list ... of ... files ]:
    df = load_file(f)  # however you load it
    l.append(df)
big_df = pd.concat(l)
del l, df, f  # if you want to clean it up
You therefore don't need to manually specify variable names for your data sub-sections. If you also want to run checks or rename columns between the various files, you can put that into the for-loop as well (or, if you simplify to a list comprehension, into the load_file function body).
Try this:
df_list = [(MDF(file)).to_dataframe(channels) for file in Filepath]
df = pd.concat(df_list)
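Since your edit also shifts each file's index by the previous file's maximum, here is a sketch that folds that offset into the loop (assuming MDF and channels as in your code):
dfs = []
offset = 0
for file in Filepath:
    part = MDF(file).to_dataframe(channels)
    part.index += offset        # continue where the previous file ended
    offset = part.index.max()
    dfs.append(part)
df = pd.concat(dfs)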
I want to concatenate two data frames of the same length by adding a column to the first one (df). But because certain df rows are filtered out, it seems the indexes don't match.
import io
import pandas as pd

df = pd.read_csv(io.StringIO(uploaded['customer.csv'].decode('utf-8')), sep=";")
df["Margin"] = df["Sales"] - df["Cost"]
df = df.loc[df["Margin"] > -100000]
df = df.loc[df["Sales"] > 1000]
df.reindex()
df
This returns:
So this operation:
customerCluster = pd.concat([df, clusters], axis = 1, ignore_index= True)
print(customerCluster)
Is returning:
So I've tried reindex and the argument ignore_index = True, as you can see in the code snippet above.
Thanks for all the answers. If anyone encounters the same problem, the solution I found was this:
customerID = df["CustomerID"]
customerID = customerID.reset_index(drop=True)
df = df.reset_index(drop=True)
So, basically, the indexes of both data frames are now matching, thus:
customerCluster = pd.concat((customerID, clusters), axis = 1)
This concatenates the two data frames correctly.
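An equivalent sketch that reuses the filtered frame's index instead of resetting both (assumption: clusters has exactly one entry per remaining row of df):
clusters.index = df.index                        # align the two frames positionally
customerCluster = pd.concat([df, clusters], axis=1)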
I have a list of time series price data in CSV format that is read as follows:
asxList = ['ANZ', 'NAB', 'WBC']
for asxCode in asxList:
    ohlcData = pd.DataFrame.from_csv(asxCode + '.CSV', header=0)
Example output:
How do I assemble all the ohlcData in a particular order: firstly by the DateTime index, secondly by the asxList ['ANZ', 'NAB', 'WBC'] code, followed by the data columns?
Create a list of dataframes, add a code column to each dataframe:
dfs = []
for asxCode in asxList:
    df = pd.DataFrame.from_csv(asxCode + '.CSV', header=0)
    df['code'] = asxCode
    dfs.append(df)
Concatenate the dataframes, add the code column to the index:
pd.concat(dfs).reset_index().set_index(['index', 'code'])
Almost the same as Dyz's answer, just using keys from concat:
asxList = ['ANZ', 'NAB', 'WBC']
l = []
for asxCode in asxList:
    l.append(pd.DataFrame.from_csv(asxCode + '.CSV', header=0))
pd.concat(l, keys=asxList)
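This puts the code as the outer index level. To get the ordering asked for (DateTime first, then code), a sketch that names, swaps and sorts the levels (assuming the CSV index parses as DateTime):
combined = pd.concat(l, keys=asxList, names=['code', 'DateTime'])
combined = combined.swaplevel('code', 'DateTime').sort_index()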