frame = pd.DataFrame()
for i in range(1995, 2020):
    file_name = f"{i}"
    df = pd.read_csv(BytesIO(uploaded["%s.csv" % file_name]))
    df = pd.DataFrame(data, columns=['DATE', 'ARANGALI'])
    frame.append(df)
print(frame)
I tried to append all the data I have into one dataframe, but the output is an empty dataframe:
Empty DataFrame
Columns: []
Index: []
The code works if I treat the variables frame and df as lists and append everything into one large list. But I want a dataframe with all the data under the same column heads.
What am I doing wrong?
The append method returns a new DataFrame; it does not modify frame in place, so you have to assign the result back. Docs.
frame = pd.DataFrame()
for i in range(1995, 2020):
    file_name = f"{i}"
    df = pd.read_csv(BytesIO(uploaded["%s.csv" % file_name]))
    df = pd.DataFrame(df, columns=['DATE', 'ARANGALI'])
    frame = frame.append(df)
print(frame)
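Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; collecting the frames in a list and calling pd.concat once is both the supported and the faster pattern. A sketch, keeping the uploaded dict and column names from the question (here uploaded is a synthetic stand-in for Colab's files.upload() result):

```python
import pandas as pd
from io import BytesIO

# stand-in for Colab's files.upload() result: {"1995.csv": b"...", ...}
uploaded = {
    f"{year}.csv": f"DATE,ARANGALI\n{year}-01-01,{year % 100}\n".encode()
    for year in range(1995, 2020)
}

frames = []
for year in range(1995, 2020):
    df = pd.read_csv(BytesIO(uploaded[f"{year}.csv"]))
    frames.append(df[['DATE', 'ARANGALI']])  # keep only the two wanted columns

frame = pd.concat(frames, ignore_index=True)
```

Building the list first and concatenating once also avoids the quadratic cost of growing a DataFrame row-block by row-block.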
Related
Assigning a unique code based on the first and second column strings, restarting from 1 whenever the first column string changes
Example
I use this code:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1', 'colummun2']).ngroup()
dfs = [df[1] for df in dfs]
df = pd.concat(dfs)
but I got the following error:
Your code can be updated in the following way:
import pandas as pd

# set data
data = {"colummen1": ["Kenbele1", "Kenbele1", "Kenbele1", "Kenbele1", "Kenbele2", "Kenbele2", "Kenbele2", "Kenbele2"],
        "colummun2": ["Commutity1", "Commutity2", "Commutity3", "Commutity4", "Commutity1", "Commutity2", "Commutity3", "Commutity4"]}

# create dataframe
df = pd.DataFrame(data)

dfs = df.groupby('colummen1')
dfs_updated = []
for _, df in dfs:
    df['id'] = df.groupby(['colummen1', 'colummun2']).ngroup() + 1
    dfs_updated.append(df)
df_new = pd.concat(dfs_updated)
df_new
Returns
It is not very clear what you expect, but when you write df[1] for df in dfs, iterating over a dict yields its keys, so df is a key (for example "Kenbele1") and df[1] is a single character (for example "e", the second character of the string).
That is why you get this error: your list dfs is constructed out of 2 characters, ["e", "e"], so it cannot be concatenated into a DataFrame.
I think with df[1] you meant the data frame, that is associated with the key, if so, then the code should look like this:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1', 'colummun2']).ngroup()
dfs = [df for _, df in dfs.items()]
df = pd.concat(dfs)
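If the goal is just an id that restarts at 1 for every colummen1 value, the split-and-concat round trip can be skipped entirely with a single grouped transform. A sketch; note that factorize numbers values in order of appearance, whereas ngroup numbers them in sorted order, which coincides for this data:

```python
import pandas as pd

data = {"colummen1": ["Kenbele1"] * 4 + ["Kenbele2"] * 4,
        "colummun2": ["Commutity1", "Commutity2", "Commutity3", "Commutity4"] * 2}
df = pd.DataFrame(data)

# number colummun2 values within each colummen1 group, starting at 1
df['id'] = (df.groupby('colummen1')['colummun2']
              .transform(lambda s: pd.factorize(s)[0] + 1))
```

This also avoids the chained-assignment warning you can hit when writing a column into a group taken out of a dict.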
I am looking into creating a big dataframe (pandas) from several individual frames. The data is organized in MF4-Files and the number of source files varies for each cycle. The goal is to have this process automated.
Creation of Dataframes:
df = (MDF('File1.mf4')).to_dataframe(channels)
df1 = (MDF('File2.mf4')).to_dataframe(channels)
df2 = (MDF('File3.mf4')).to_dataframe(channels)
These Dataframes are then merged:
df = pd.concat([df, df1, df2], axis=0)
How can I do this without dynamically creating the variables df, df1, etc.? Or is there no other way?
I have all file paths in an array of the form:
Filepath = ['File1.mf4', 'File2.mf4', 'File3.mf4']
Now I am thinking of looping through it and dynamically creating the data frames df, df1, ..., df1000. Any advice here?
Edit: here is the full code:
df = (MDF('File1.mf4')).to_dataframe(channels)
df1 = (MDF('File2.mf4')).to_dataframe(channels)
df2 = (MDF('File3.mf4')).to_dataframe(channels)
#The Data has some offset:
x = df.index.max()
df1.index += x
x = df1.index.max()
df2.index += x
#With correct index now the data can be merged
df = pd.concat([df, df1, df2], axis=0)
The way I'm interpreting your question is that you have a predefined list of files you want to load. So just:
l = []
for f in [ list ... of ... files ]:
    df = load_file(f)  # however you load it
    l.append(df)
big_df = pd.concat(l)
del l, df, f  # if you want to clean it up
You therefore don't need to manually specify variable names for your data sub-sections. If you also want to do checks or column renaming between the various files, you can also just put that into the for-loop (or alternatively, if you want to simplify to a list comprehension, into the load_file function body).
Try this:
df_list = [(MDF(file)).to_dataframe(channels) for file in Filepath]
df = pd.concat(df_list)
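The index-offset correction from the question's edit can be folded into the same loop instead of being written out per variable. A sketch; concat_with_offset is my name for it, and it reproduces the question's exact logic, including the boundary rows that share an index value (df1.index += df.index.max()):

```python
import pandas as pd

def concat_with_offset(frames):
    """Shift each frame's index past the previous frame's max, then concat."""
    shifted, offset = [], 0
    for df in frames:
        df = df.copy()
        df.index = df.index + offset
        offset = df.index.max()
        shifted.append(df)
    return pd.concat(shifted)

# synthetic stand-ins for MDF(...).to_dataframe(channels) results
frames = [pd.DataFrame({'v': [1, 2]}, index=[0, 1]) for _ in range(3)]
big = concat_with_offset(frames)
```

In the real pipeline the list comprehension [(MDF(f)).to_dataframe(channels) for f in Filepath] would feed frames.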
How do I remove the Empty DataFrame, columns and index line?
Here is my code:
df = pd.read_csv(filename)
print(df)
if df.empty:
    print("There is no data availale.")
Here are the results:
Empty DataFrame
Columns: [Date, Time, Name]
Index: []
There is no data availale.
I want the results to be just:
There is no data availale.
Use len(df) and print the frame only in the else branch:
df = pd.read_csv(filename)
if len(df) < 1:
    print("There is no data availale.")
else:
    print(df)
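The df.empty check from the question is fine on its own; the stray "Empty DataFrame" output came from the unconditional print(df) before it. A minimal sketch with a stand-in empty frame (keeping the question's message string):

```python
import pandas as pd

# a header-only CSV parses to a frame with columns but no rows
df = pd.DataFrame(columns=['Date', 'Time', 'Name'])

if df.empty:
    print("There is no data availale.")
else:
    print(df)
```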
I am trying to get data from a list of dfs into one df.
I am using the following code:
main_df = pd.DataFrame(columns=data.columns[-len_data+1:], index=is_refs)
for index, dict in enumerate(dict_lists):
    df = pd.DataFrame(dict_lists[index])
    df = df.reindex(is_refs)
    main_df = main_df.append(df[df.columns])
The problem is that it returns the following DF. Clearly I don't want repeated items in the index and just want the financial data to fit into my rows. Any idea how to do this?
Use join instead of append, so the columns align on the existing index rather than stacking rows:
df_col = df[df.columns]
print(df_col)
main_df = main_df.join(df_col)
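To illustrate why join avoids the repeated index: each sub-frame shares the same row labels, so joining adds columns instead of stacking rows. A sketch with hypothetical labels and values (is_refs and the numbers here are made up, standing in for the financial line items):

```python
import pandas as pd

# hypothetical shared row labels, standing in for is_refs
is_refs = ['revenue', 'ebitda', 'net_income']

main_df = pd.DataFrame(index=is_refs)
frames = [
    pd.DataFrame({'2019': [100, 40, 25]}, index=is_refs),
    pd.DataFrame({'2020': [120, 50, 30]}, index=is_refs),
]
for df in frames:
    main_df = main_df.join(df)  # align on the shared index, add columns
```

Each pass leaves the index at is_refs and widens the frame by one period's worth of columns.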
I have a list of time series price data in CSV format that is read as follows:
asxList = ['ANZ', 'NAB', 'WBC']
for asxCode in asxList:
    ohlcData = pd.DataFrame.from_csv(asxCode + '.CSV', header=0)
Example output:
How do I assemble all the ohlcData into one frame, ordered firstly by the DateTime index and secondly by the asxList ['ANZ', 'NAB', 'WBC'] codes, followed by the data columns?
Create a list of dataframes, add a code column to each dataframe:
dfs = []
for asxCode in asxList:
    df = pd.DataFrame.from_csv(asxCode + '.CSV', header=0)
    df['code'] = asxCode
    dfs.append(df)
Concatenate the dataframes, add the code column to the index:
pd.concat(dfs).reset_index().set_index(['index', 'code'])
Almost the same as Dyz's answer, just using keys with concat:
asxList = ['ANZ', 'NAB', 'WBC']
l = []
for asxCode in asxList:
    l.append(pd.DataFrame.from_csv(asxCode + '.CSV', header=0))
pd.concat(l, keys=asxList)
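To get the ordering the question asks for (DateTime first, then code), the keys level can be moved inside and the index sorted. A sketch with synthetic data; pd.DataFrame.from_csv was removed in modern pandas, where pd.read_csv(..., index_col=0, parse_dates=True) is the replacement, and note that sort_index orders codes alphabetically, which happens to match asxList here:

```python
import pandas as pd

asxList = ['ANZ', 'NAB', 'WBC']
dates = pd.to_datetime(['2017-01-03', '2017-01-04'])

# synthetic stand-ins for the per-code OHLC frames
frames = [pd.DataFrame({'Close': [1.0, 2.0]}, index=dates) for _ in asxList]

combined = (pd.concat(frames, keys=asxList)  # outer level: code
              .swaplevel()                   # outer level: DateTime
              .sort_index())                 # DateTime first, then code
```

The result is a two-level index of (DateTime, code) with the data columns following, as requested.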