I have a list of time series price data in CSV format that is read as follows:
asxList = ['ANZ', 'NAB', 'WBC']
for asxCode in asxList:
    ohlcData = pd.DataFrame.from_csv(asxCode+'.CSV', header=0)
How do I assemble all the ohlcData into one dataframe, ordered firstly by the DateTime index, secondly by the asxList ['ANZ', 'NAB', 'WBC'] code, followed by the data columns?
Create a list of dataframes, add a code column to each dataframe:
dfs = []
for asxCode in asxList:
    df = pd.DataFrame.from_csv(asxCode+'.CSV', header=0)
    df['code'] = asxCode
    dfs.append(df)
Concatenate the dataframes, add the code column to the index:
pd.concat(dfs).reset_index().set_index(['index', 'code'])
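Note that pd.DataFrame.from_csv has since been removed from pandas. A minimal modern sketch of the same approach, assuming each CSV's first column holds the dates:

import pandas as pd

dfs = []
for asxCode in asxList:
    # read_csv with index_col/parse_dates replaces the removed DataFrame.from_csv
    df = pd.read_csv(asxCode + '.CSV', index_col=0, parse_dates=True)
    df['code'] = asxCode
    dfs.append(df)

# append the code to the existing DateTime index, then sort by date first, code second
combined = pd.concat(dfs).set_index('code', append=True).sort_index()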
Almost the same as Dyz's answer, just using keys from concat:
asxList = ['ANZ', 'NAB', 'WBC']
l = []
for asxCode in asxList:
    l.append(pd.DataFrame.from_csv(asxCode+'.CSV', header=0))
pd.concat(l, keys=asxList)
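The keys become the outer level of the resulting MultiIndex, so the result is ordered by code first. If you want the DateTime index first, as asked, a small sketch:

combined = pd.concat(l, keys=asxList, names=['code'])
combined = combined.swaplevel(0, 1).sort_index()  # DateTime level first, then code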
I want to assign a unique id by looking at the first column's string and the second column's string, restarting from 1 whenever the first column's string changes.
I use this code:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup()
dfs = [df[1] for df in dfs]
df = pd.concat(dfs)
but I got the following error:
Your code can be updated in the following way:
import pandas as pd
# set data
data = {"colummen1": ["Kenbele1", "Kenbele1", "Kenbele1", "Kenbele1", "Kenbele2", "Kenbele2", "Kenbele2", "Kenbele2"],
"colummun2": ["Commutity1", "Commutity2", "Commutity3", "Commutity4", "Commutity1", "Commutity2", "Commutity3", "Commutity4"]}
# create dataframe
df = pd.DataFrame(data)
dfs = df.groupby('colummen1')
dfs_updated = []
for _, df in dfs:
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup()+1
    dfs_updated.append(df)
df_new = pd.concat(dfs_updated)
df_new
Returns
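(Reconstructed output; running the snippet above should print something like:)

  colummen1   colummun2  id
0  Kenbele1  Commutity1   1
1  Kenbele1  Commutity2   2
2  Kenbele1  Commutity3   3
3  Kenbele1  Commutity4   4
4  Kenbele2  Commutity1   1
5  Kenbele2  Commutity2   2
6  Kenbele2  Commutity3   3
7  Kenbele2  Commutity4   4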
It is not very clear what you expect, but when you write df[1] for df in dfs, your df is a key (for example 'Kenbele1') and df[1] is a single character (for example 'e', the second character of the string).
That is why you get this error: your list dfs is constructed out of two characters ["e", "e"], so there is nothing to concatenate.
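A quick sketch of what actually happens when you iterate over the dict:

dfs = {"Kenbele1": "frame1", "Kenbele2": "frame2"}  # stand-in values for illustration
print([df for df in dfs])     # ['Kenbele1', 'Kenbele2'] -- iterating a dict yields its keys
print([df[1] for df in dfs])  # ['e', 'e'] -- indexing a key string takes its second character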
I think with df[1] you meant the data frame, that is associated with the key, if so, then the code should look like this:
dfs = dict(tuple(df.groupby('colummen1')))
for _, df in dfs.items():
    df['id'] = df.groupby(['colummen1','colummun2']).ngroup()
dfs = [df for _, df in dfs.items()]
df = pd.concat(dfs)
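For what it's worth, the same per-group numbering can be done without the dict and the loop. A minimal sketch, assuming ids should follow order of appearance within each colummen1 group:

# factorize codes values 0, 1, 2, ... in order of appearance; +1 starts ids at 1
df['id'] = df.groupby('colummen1')['colummun2'].transform(
    lambda s: pd.factorize(s)[0] + 1)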
frame = pd.DataFrame()
for i in range(1995, 2020):
    file_name = f"{i}"
    df = pd.read_csv(BytesIO(uploaded["%s.csv"%file_name]))
    df = pd.DataFrame(data, columns=['DATE','ARANGALI'])
    frame.append(df)
print(frame)
I tried to define a function to append all the data I have into one dataframe. The output appears to be an empty dataframe:
Empty DataFrame
Columns: []
Index: []
The code works if I treat the variables frame and df as lists and append everything into one large list. But I want a dataframe with all the data under the same column heads.
What am I doing wrong?
The append method returns a new DataFrame; see the docs. You need to assign the result back:
frame = pd.DataFrame()
for i in range(1995, 2020):
    file_name = f"{i}"
    df = pd.read_csv(BytesIO(uploaded["%s.csv"%file_name]))
    df = pd.DataFrame(data, columns=['DATE','ARANGALI'])
    frame = frame.append(df)
print(frame)
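Note that DataFrame.append was later deprecated and removed in pandas 2.0. The idiomatic pattern now is to collect the frames in a list and concatenate once; a sketch, assuming the same uploaded mapping of filenames to CSV bytes as in the question:

from io import BytesIO
import pandas as pd

frames = []
for i in range(1995, 2020):
    # 'uploaded' is assumed to map "<year>.csv" names to raw CSV bytes
    frames.append(pd.read_csv(BytesIO(uploaded[f"{i}.csv"])))

frame = pd.concat(frames, ignore_index=True)  # one concat is cheaper than repeated appends
print(frame)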
How do I suppress the Empty DataFrame, Columns and Index lines?
Here is my code:
df = pd.read_csv(filename)
print(df)
if df.empty:
    print("There is no data available.")
Here are the results:
Empty DataFrame
Columns: [Date, Time, Name]
Index: []
There is no data available.
I want the results to be just:
There is no data available.
Use len(df):
df = pd.read_csv(filename)
if len(df) < 1:
    print("There is no data available.")
else:
    print(df)
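df.empty, which the question already uses, works just as well once the print moves under the else; a sketch:

df = pd.read_csv(filename)
if df.empty:
    print("There is no data available.")
else:
    print(df)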
I am trying to get data from a list of dfs into one df.
I am using the following code:
main_df = pd.DataFrame(columns=data.columns[-len_data+1:], index = is_refs)
for index, dict in enumerate(dict_lists):
    df = pd.DataFrame(dict_lists[index])
    df = df.reindex(is_refs)
    main_df = main_df.append(df[df.columns])
The problem is that it returns the following DF. Clearly I don't want repeated items in the index and just want the financial data to fit into my rows. Any idea how to do this?
df_col = df[df.columns]
print(df_col)
main_df = main_df.join(df_col)
AttributeError: 'NoneType' object has no attribute 'transpose'
I have been trying to extract cells as dictionaries (from a pandas dataframe) and join them with the existing data.
For example, I have a CSV file which contains two columns, id and device_type. Each cell in the device_type column contains dictionary data. I am trying to split it and add it to the original data.
I am trying to do something like below:
import json
import pandas
df = pandas.read_csv('D:\\1. Work\\csv.csv',header=0)
sf = df.head(12)
sf['visitor_home_cbgs'].fillna("{}", inplace = True).transpose()
CSV file sample:
ID,device_type
3c30ee03047b478,{"060379800281":11,"061110053031":5,"060372062002":5}
f5d639a64a88496099,{}
I am looking for output like below:
id,device_type,ttype,tvalue
3c30ee03047b478,{"060379800281":11,"061110053031":5,"060372062002":5},"060379800281",11
3c30ee03047b478,{"060379800281":11,"061110053031":5,"060372062002":5},"061110053031",5
3c30ee03047b478,{"060379800281":11,"061110053031":5,"060372062002":5},"060372062002",5
f5d639a64a88496099,{},NIL,NIL
Avoid inplace=True:
sf['visitor_home_cbgs'].fillna("{}").transpose()
When you pass inplace=True, fillna modifies the dataframe in place and returns None, so chaining .transpose() onto the result fails.
If you want to use inplace=True, then do it like below:
sf['visitor_home_cbgs'].fillna("{}", inplace=True)
sf.transpose()
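A quick sketch of why the chained call raises the error:

result = sf['visitor_home_cbgs'].fillna("{}", inplace=True)
print(result)  # None -- so None.transpose() raises the AttributeError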
To create rows from column values
One solution is to iterate through the dataframe rows and create a new dataframe with the desired columns and values.
import json
import pandas as pd

def extract_JSON(row):
    df2 = pd.DataFrame(columns=['ID', 'device_type', 'ttype', 'tvalue'])
    device_type = row['device_type']
    d = json.loads(device_type)  # renamed from `dict` to avoid shadowing the builtin
    for key in d:
        df2.loc[len(df2)] = [row['ID'], row['device_type'], key, d[key]]
    if df2.empty:
        df2.loc[0] = [row['ID'], row['device_type'], '', '']
    return df2

df3 = pd.DataFrame(columns=['ID', 'device_type', 'ttype', 'tvalue'])
for _, row in df.iterrows():
    df3 = df3.append(extract_JSON(row))
df3
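Building a list of plain records and constructing the frame once avoids the repeated append (which was removed in pandas 2.0). A minimal sketch under the same column assumptions:

import json
import pandas as pd

records = []
for _, row in df.iterrows():
    d = json.loads(row['device_type'])
    if not d:
        # keep rows with an empty dict, mirroring the NIL row in the desired output
        records.append({'ID': row['ID'], 'device_type': row['device_type'],
                        'ttype': '', 'tvalue': ''})
    for key, value in d.items():
        records.append({'ID': row['ID'], 'device_type': row['device_type'],
                        'ttype': key, 'tvalue': value})

df3 = pd.DataFrame(records, columns=['ID', 'device_type', 'ttype', 'tvalue'])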