I have multiple sheets that are identical in column headers but not in terms of the number of rows. I want to combine the sheets to make one master sheet.
At the moment the code below produces a blank result: combineddata stays empty and data ends up equal to the last sheet. I decided to use a for loop iterating through data_sheetnames, which is a list. Below is the code I have used:
combineddata = pd.DataFrame()
for club in data_sheetnames:
    data = pd.read_excel(r'C:\Users\me\Desktop\Data.xlsx', header=1, index_col=2, sheet_name=club)
    combineddata.append(data)
If I were to change combineddata to a blank list then I get a dictionary of dataframes.
The problem is that append does not work in place; it returns the appended DataFrame. (Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is now the recommended way to do this.) Therefore
combineddata = pd.DataFrame()
for club in data_sheetnames:
    data = pd.read_excel(r'C:\Users\me\Desktop\Data.xlsx', header=1, index_col=2, sheet_name=club)
    combineddata = combineddata.append(data)
should solve the issue
An easier way is just to concatenate in one step (note that read_excel needs the file path as its first argument; the sheet name goes in the sheet_name parameter):

combined_data = pd.concat([pd.read_excel(r'C:\Users\me\Desktop\Data.xlsx', header=1, index_col=2, sheet_name=club) for club in data_sheetnames])
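Since pd.read_excel with sheet_name=None returns a dict mapping sheet names to DataFrames, the whole workbook can also be combined without naming the sheets at all. A minimal sketch, simulating that dict with in-memory frames instead of a real Excel file:

```python
import pandas as pd

# stand-in for pd.read_excel(path, sheet_name=None), which returns {sheet_name: DataFrame}
sheets = {'ClubA': pd.DataFrame({'player': ['a', 'b'], 'goals': [1, 2]}),
          'ClubB': pd.DataFrame({'player': ['c'], 'goals': [3]})}

# stack all sheets into one master frame
combined = pd.concat(sheets.values(), ignore_index=True)
print(len(combined))  # 3
```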
I have a dataframe that's created by a host of functions, and from it I need to create two more dataframes. I have another function that takes that master frame and does a few more transformations on it. One of those is changing the column names; however, that in turn changes the column names on the master, and I can't figure out why.
def create_y_df(c_dataframe: pd.DataFrame):
    x_col_list = [str(i) for i in c_dataframe.columns]
    for i, j in enumerate(x_col_list):
        if 'Unnamed:' in j:
            x_col_list[i] = x_col_list[i-1]
            x_col_list[i-1] = 'drop'
    c_dataframe.columns = x_col_list
    c_dataframe = c_dataframe.drop(['drop'], axis=1)
    c_dataframe = c_dataframe.apply(lambda x: pd.Series(x.dropna().values))
    return c_dataframe

master_df = create_master(params)
y_df = create_y_df(master_df)
After running this, if I export master_df again, its columns now include 'drop'. What's interesting is that if I remove the column-renaming loop from create_y_df but leave the x.dropna(), that portion is not applied to master_df. I just have no idea why the c_dataframe.columns = x_col_list from create_y_df() is applied to master_df.
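The behaviour comes down to mutation versus rebinding: the function receives a reference to the same DataFrame object the caller holds, so assigning to its columns attribute mutates that shared object, while statements like c_dataframe = c_dataframe.drop(...) only rebind the local name. A minimal sketch (the mutate function here is hypothetical, standing in for create_y_df):

```python
import pandas as pd

master = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

def mutate(frame: pd.DataFrame):
    frame.columns = ['x', 'y']          # mutates the object the caller also holds
    frame = frame.drop(['x'], axis=1)   # rebinds the local name only
    return frame

result = mutate(master)
print(list(master.columns))  # ['x', 'y'] -- master was changed too
print(list(result.columns))  # ['y']
```

Starting the function with c_dataframe = c_dataframe.copy() would leave master_df untouched.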
I can't get the alterations I make to dataframes inside a dictionary to stick. The changes are done with a for loop. The problem is that although the loop works and each iterated df is changed, the changes do not apply to the dictionary the dfs are in. The end goal is to merge all the dataframes, since they come from different Excel files and sheets.
Here is the code:
Import the two Excel files, assigning None to the sheet_name parameter in order to import all the sheets of each document into a dict. I have 8 sheets in the EEG Excel file and 5 in the SC file:
import numpy as np
import pandas as pd

eeg = pd.read_excel("path_file", sheet_name=None)
sc = pd.read_excel("path_file", sheet_name=None)
Merge the first dictionary with the second one with the update method. Now the eeg dict contains both EEG and SC, so I have a dict with 13 dfs inside:
eeg.update(sc)
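dict.update behaves the same on dicts of DataFrames as on any other dict; a toy sketch with made-up keys and frames:

```python
import pandas as pd

eeg = {'eeg_sheet1': pd.DataFrame({'a': [1]})}
sc = {'sc_sheet1': pd.DataFrame({'b': [2]})}

eeg.update(sc)  # eeg now holds the union of both dicts
print(sorted(eeg))  # ['eeg_sheet1', 'sc_sheet1']
```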
The for loop is needed in order to carry out some modifications inside each single df: reset the index to a specific column (common to all dfs), change its name, add a prefix (the df's key in the dict) to the variables, and lastly replace the 0s with NaN.
for key, df in eeg.items():
    df.set_index('Unnamed: 0', inplace=True)
    df.index.rename('Sbj', inplace=True)
    df = df.add_prefix(key + '_')
    df.replace(0, np.nan, inplace=True)
Although the loop runs over the dictionary items and each single iterated dataframe is modified correctly, I don't see the changes on the dfs in the dictionary, and therefore can't proceed to extract them into a list and merge them. As you can see in fig. 1, the single df inside the for loop looks right, but when I go back to the dfs in the dict, they are still as before.
You need to map your modified dataframe back into your dictionary:
for key, df in eeg.items():
    df.set_index('Unnamed: 0', inplace=True)
    df.index.rename('Sbj', inplace=True)
    df = df.add_prefix(key + '_')
    df.replace(0, np.nan, inplace=True)
    eeg[key] = df  # map df back into eeg
What you probably want is to apply the per-df changes first (the prefix depends on each df's key, so it has to be added inside the loop) and then merge everything into one dataframe:

# apply the changes to each df, collecting the results
frames = []
for key, df in eeg.items():
    df.set_index('Unnamed: 0', inplace=True)
    df.index.rename('Sbj', inplace=True)
    df = df.add_prefix(key + '_')
    df.replace(0, np.nan, inplace=True)
    frames.append(df)

# merge the modified dataframes into one
df1 = pd.concat(frames)
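The write-back can also be expressed as a dict comprehension over non-mutating calls, which sidesteps the in-place/rebinding pitfall entirely. A sketch with made-up data standing in for the real Excel sheets:

```python
import numpy as np
import pandas as pd

eeg = {'A': pd.DataFrame({'Unnamed: 0': [1, 2], 'v': [0, 5]}),
       'B': pd.DataFrame({'Unnamed: 0': [3, 4], 'v': [7, 0]})}

def transform(key, df):
    # every call returns a new frame, so nothing mutates the dict's originals
    df = df.set_index('Unnamed: 0').rename_axis('Sbj')
    return df.add_prefix(key + '_').replace(0, np.nan)

eeg = {key: transform(key, df) for key, df in eeg.items()}
print(eeg['A'].columns.tolist())  # ['A_v']
```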
I have multiple csv files. I was able to load them as dataframes into a dictionary by using keywords:
# reading files into dataframes
csvDict = {}
for index, rows in keywords.iterrows():
    eachKey = rows.Keyword
    csvFile = "SRT21" + eachKey + ".csv"
    csvDict[eachKey] = pd.read_csv(csvFile)
Now I have other functions to apply to a specific column of every dataframe. On a single dataframe the code would look like this:
df['Cat_Frames'] = df['Cat_Frames'].apply(process)
df['Cat_Frames'] = df['Cat_Frames'].apply(cleandata)
df['Cat_Frames'] = df['Cat_Frames'].fillna(' ')
My question is: how do I loop through every dataframe in the dictionary to apply those functions? I have tried

for item in csvDict.items():
    df = pd.DataFrame(item)
df

and it gives me an empty result. Any solution or suggestion?
You can chain the applys like this:
for key, df in csvDict.items():
    df['Cat_Frames'] = df['Cat_Frames'].apply(process).apply(cleandata).fillna(' ')
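A runnable version of that loop, with trivial stand-ins for the poster's process and cleandata functions (both hypothetical here):

```python
import pandas as pd

def process(x):    # hypothetical stand-in: trim whitespace
    return x.strip()

def cleandata(x):  # hypothetical stand-in: normalise case
    return x.upper()

csvDict = {'cats': pd.DataFrame({'Cat_Frames': [' a ', 'b ']})}

for key, df in csvDict.items():
    df['Cat_Frames'] = df['Cat_Frames'].apply(process).apply(cleandata).fillna(' ')

print(csvDict['cats']['Cat_Frames'].tolist())  # ['A', 'B']
```

Column assignment mutates each frame in place, which is why this works without writing df back into the dict.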
items() returns key/value tuples, so you should make your for loop unpack them, and you need to print the df if you aren't in Jupyter (a bare df at the bottom of a loop displays nothing):

for key, value in csvDict.items():
    df = pd.DataFrame(value)
    print(df)

I think this is how you should traverse the dictionary.
When processing one data set/frame doesn't involve another data set, don't collect the data sets; just process the current one and proceed to the next. The conventional name for a variable receiving an "unpacked value" that won't be used is _:

for _, df in csvDict.items():
    df['Cat_Frames'] = df['Cat_Frames'].apply(process).apply(…

But why ask for keys only to ignore them? Iterate the values:

for df in csvDict.values():
    df['Cat_Frames'] = df['Cat_Frames'].apply(process).apply(…
I am currently having a problem where I am splitting a column into two separate columns. My code runs without any errors, but the dataframe is still the same. I'm not sure where my mistake is.
import requests
import pandas as pd

url = 'https://www.teamrankings.com/nba/player/clint-capela/game-log'
html = requests.get(url).content
get_log = pd.read_html(html)
players_log = get_log[0]
game_log = players_log.head()

players_log.info()
players_log.join(players_log['FGM-FGA'].str.split('-', n=1, expand=True).rename(columns={0: 'A', 1: 'B'}))
players_log.info()
print(game_log)
Pandas operations by default don't change the original dataframe; they return a new object. So

players_log = players_log.join(players_log['FGM-FGA'].str.split('-', n=1, expand=True).rename(columns={0: 'A', 1: 'B'}))

should do it.
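The same pattern on a tiny made-up frame (the '5-11' style values are invented for illustration):

```python
import pandas as pd

players_log = pd.DataFrame({'FGM-FGA': ['5-11', '8-14']})

# join() returns a new frame, so the result must be assigned back
players_log = players_log.join(
    players_log['FGM-FGA'].str.split('-', n=1, expand=True)
                          .rename(columns={0: 'A', 1: 'B'})
)
print(players_log.columns.tolist())  # ['FGM-FGA', 'A', 'B']
```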
The aim is to create an interactive dataframe where I can edit cell values without coding.
To me it seems it should work in the following way:
creating an ipysheet.sheet
editing cells manually
converting it to a pandas dataframe
The problem is:
after creating an ipysheet.sheet I manually changed the values of some cells and then converted it to a pandas dataframe, but the changes are not reflected in this dataframe; if you just display the sheet without converting, you can see the changes.
import ipysheet
import pandas as pd

d = {'col1': [2, 8], 'col2': [3, 6]}
df = pd.DataFrame(data=d)

sheet1 = ipysheet.sheet(rows=len(df.columns) + 1, columns=3)
first_col = df.columns.to_list()
first_col.insert(0, 'Attribute')
column = ipysheet.column(0, value=first_col, row_start=0)
cell_value1 = ipysheet.cell(0, 1, 'Format')
cell_value2 = ipysheet.cell(0, 2, 'Description')
sheet1
creating a sheet1
ipysheet.to_dataframe(sheet1)
converting to pd.DataFrame
Solved by predefining all empty cells as np.nan. You can then edit them manually, and the values carry over to the DataFrame when converting.