I am trying to get the data from a list of DataFrames into one DataFrame.
I am using the following code:
main_df = pd.DataFrame(columns=data.columns[-len_data+1:], index=is_refs)
for index, dict in enumerate(dict_lists):
    df = pd.DataFrame(dict_lists[index])
    df = df.reindex(is_refs)
    main_df = main_df.append(df[df.columns])
The problem is that it returns the following DataFrame. Clearly I don't want repeated items in the index; I just want the financial data to fit into my rows. Any idea how to do this?
append stacks the frames row-wise, which is what duplicates your index. Merge them column-wise with join instead:
df_col = df[df.columns]
print(df_col)
main_df = main_df.join(df_col)
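A minimal, self-contained sketch of the idea, with made-up refs and columns standing in for your actual data:
import pandas as pd

# Hypothetical miniature: several frames share the same index (your is_refs)
# but each carries different columns.
is_refs = ["ref_a", "ref_b", "ref_c"]
pieces = [pd.DataFrame({"q1": [1, 2, 3]}, index=is_refs),
          pd.DataFrame({"q2": [4, 5, 6]}, index=is_refs)]

main_df = pd.DataFrame(index=is_refs)
for piece in pieces:
    main_df = main_df.join(piece)  # column-wise merge on the shared index

print(main_df)  # one row per ref, no repeated index entries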
I have pandas dataframes with the following data (in CSV):
#list1
poke_id,symbol
0,BTC
1,ETB
2,USDC
#list2
5,SOL
6,XRP
I am able to concatenate them into one dataframe using the following code:
df = pd.concat([df1, df2], ignore_index = True)
df = df.reset_index(drop = True)
df['poke_id'] = df.index
df = df[['poke_id','symbol']]
which gives me the output (in CSV):
poke_id,symbol
0,BTC
1,ETB
2,USDC
3,SOL
4,XRP
Is there any other way to do the same? Re-indexing the whole dataframe of ~4000 entries just to add ~100 more seems pointless and cumbersome. How can I make it pick the highest poke_id from list 1 (dataframe 1) and simply continue from i + 1 for the entries in list 2?
Your solution is good; it is possible to simplify it:
df = pd.concat([df1, df2], ignore_index = True).rename_axis('poke_id').reset_index()
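If you want to avoid renumbering the existing ~4000 rows at all, here is a minimal sketch of the incremental approach, assuming df1 already carries a valid, gap-free poke_id column:
start = df1['poke_id'].max() + 1                    # continue after the highest existing id
df2 = df2.drop(columns='poke_id', errors='ignore')  # discard the old 5, 6, ... ids
df2.insert(0, 'poke_id', range(start, start + len(df2)))
df = pd.concat([df1, df2], ignore_index=True)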
You can also use indexing to pull specific rows out of the dataframe. This is not efficient if you want large amounts of data, but it lets you take exactly the entries you need.
I want to assign a unique code based on the strings in the first and second columns, restarting from 1 whenever the first column's string changes.
Example
I use this code:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup()
dfs = [df[1] for df in dfs]
df = pd.concat(dfs)
but I got the following error:
Your code can be updated in the following way:
import pandas as pd
# set data
data = {"colummen1": ["Kenbele1", "Kenbele1", "Kenbele1", "Kenbele1", "Kenbele2", "Kenbele2", "Kenbele2", "Kenbele2"],
"colummun2": ["Commutity1", "Commutity2", "Commutity3", "Commutity4", "Commutity1", "Commutity2", "Commutity3", "Commutity4"]}
# create dataframe
df = pd.DataFrame(data)
dfs = df.groupby('colummen1')
dfs_updated = []
for _, df in dfs:
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup() + 1
    dfs_updated.append(df)
df_new = pd.concat(dfs_updated)
df_new
Returns
It is not very clear what you expect, but when you write [df[1] for df in dfs], each df is a key (for example 'Kenbele1') and df[1] is a single character (for example 'e', the second character of that string).
That is why you get the error: your list dfs ends up as two characters, ["e", "e"], which cannot be concatenated.
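A tiny demonstration of what happens, with placeholder values standing in for the grouped frames:
# Iterating over a dict yields its keys, so each df below is a key string
# and df[1] is that string's second character.
dfs = {"Kenbele1": None, "Kenbele2": None}
print([df[1] for df in dfs])  # ['e', 'e']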
I think with df[1] you meant the data frame that is associated with the key. If so, the code should look like this:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup()
dfs = [df for _, df in dfs.items()]
df = pd.concat(dfs)
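As an aside, the per-group numbering can also be computed without splitting the frame at all. A sketch using factorize on the same colummen1/colummun2 columns:
# one id per distinct colummun2 value, restarting at 1 inside each colummen1 group
df['id'] = df.groupby('colummen1')['colummun2'].transform(
    lambda s: pd.factorize(s)[0] + 1)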
I am looking into creating a big dataframe (pandas) from several individual frames. The data is organized in MF4 files, and the number of source files varies for each cycle. The goal is to automate this process.
Creation of Dataframes:
df = (MDF('File1.mf4')).to_dataframe(channels)
df1 = (MDF('File2.mf4')).to_dataframe(channels)
df2 = (MDF('File3.mf4')).to_dataframe(channels)
These Dataframes are then merged:
df = pd.concat([df, df1, df2], axis=0)
How can I do this without dynamically creating variables for df, df1 etc.? Or is there no other way?
I have all file paths in a list of the form:
Filepath = ['File1.mf4', 'File2.mf4', 'File3.mf4']
Now I am thinking of looping through it and creating the dataframes df, df1, ..., df1000 dynamically. Any advice here?
Edit: here is the full code:
df = (MDF('File1.mf4')).to_dataframe(channels)
df1 = (MDF('File2.mf4')).to_dataframe(channels)
df2 = (MDF('File3.mf4')).to_dataframe(channels)
#The Data has some offset:
x = df.index.max()
df1.index += x
x = df1.index.max()
df2.index += x
#With correct index now the data can be merged
df = pd.concat([df, df1, df2], axis=0)
The way I'm interpreting your question is that you have a predefined list you want. So just:
l = []
for f in [ list ... of ... files ]:
    df = load_file(f)  # however you load it
    l.append(df)
big_df = pd.concat(l)
del l, df, f  # if you want to clean it up
You therefore don't need to manually specify variable names for your data sub-sections. If you also want to run checks or rename columns between the various files, you can put that into the for-loop as well (or, if you simplify to a list comprehension, into the load_file function body).
Try this:
df_list = [(MDF(file)).to_dataframe(channels) for file in Filepath]
df = pd.concat(df_list)
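Since your edit also shifts each file's index by the previous file's maximum, here is a sketch that folds that offset into the loop (assuming MDF and channels as in your code):
dfs = []
offset = 0
for file in Filepath:
    part = MDF(file).to_dataframe(channels)
    part.index += offset        # continue where the previous file ended
    offset = part.index.max()
    dfs.append(part)
df = pd.concat(dfs)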
I want to concatenate two data frames of the same length by adding a column to the first one (df). But because certain df rows are filtered out, it seems the indexes don't match.
import io
import pandas as pd

df = pd.read_csv(io.StringIO(uploaded['customer.csv'].decode('utf-8')), sep=";")
df["Margin"] = df["Sales"] - df["Cost"]
df = df.loc[df["Margin"] > -100000]
df = df.loc[df["Sales"] > 1000]
df.reindex()
df
This returns:
So this operation:
customerCluster = pd.concat([df, clusters], axis = 1, ignore_index= True)
print(customerCluster)
Is returning:
So I've tried reindex and the argument ignore_index = True, as you can see in the code snippet above.
Thanks for all the answers. If anyone encounters the same problem, the solution I found was this:
customerID = df["CustomerID"]
customerID = customerID.reset_index(drop=True)
df = df.reset_index(drop=True)
So, basically, the indexes of both data frames are now matching, thus:
customerCluster = pd.concat((customerID, clusters), axis = 1)
This concatenates the two data frames correctly.
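An equivalent sketch that reuses the filtered frame's index instead of resetting both (assumption: clusters has exactly one entry per remaining row of df):
clusters.index = df.index                        # align the two frames positionally
customerCluster = pd.concat([df, clusters], axis=1)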
I have a list of time series price data in CSV format that is read as follows:
asxList = ['ANZ', 'NAB', 'WBC']
for asxCode in asxList:
    ohlcData = pd.DataFrame.from_csv(asxCode + '.CSV', header=0)
Example output:
How do I assemble all the ohlcData in a particular order: firstly by the DateTime index, secondly by the asxList ['ANZ', 'NAB', 'WBC'] code, followed by the data columns?
Create a list of dataframes, add a code column to each dataframe:
dfs = []
for asxCode in asxList:
    df = pd.DataFrame.from_csv(asxCode + '.CSV', header=0)
    df['code'] = asxCode
    dfs.append(df)
Concatenate the dataframes, add the code column to the index:
pd.concat(dfs).reset_index().set_index(['index', 'code'])
Almost the same as Dyz's answer, just using keys from concat:
asxList = ['ANZ', 'NAB', 'WBC']
l = []
for asxCode in asxList:
    l.append(pd.DataFrame.from_csv(asxCode + '.CSV', header=0))
pd.concat(l, keys=asxList)
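This puts the code as the outer index level. To get the ordering asked for (DateTime first, then code), a sketch that names, swaps and sorts the levels (assuming the CSV index parses as DateTime):
combined = pd.concat(l, keys=asxList, names=['code', 'DateTime'])
combined = combined.swaplevel('code', 'DateTime').sort_index()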