I have a best-practice question. Today I learned how to read and write files in pandas: how to create a table, how to add a column and a row, and how to drop them.
I have an Excel file with the following content:
I create a new column "Price_average" by averaging "Price_min" and "Price_max", and write the result to output_1.xlsx:
#!/usr/bin/env python3
import pandas as pd
# Read the original spreadsheet into a DataFrame
df = pd.read_excel('original.xlsx')
print(df)
# Add the average of the two price columns as a new column
df['Price_average'] = (df.Price_min + df.Price_max)/2
# Write the result without the index column
df.to_excel('output_1.xlsx', sheet_name='sheet1', index=False)
print(df)
I then drop the columns "Price_min" and "Price_max" with:
df = df.drop(['Price_min', 'Price_max'], axis=1)
And let's say I now want to create this table:
I can either delete "Age" and "Price_average" and swap "email" with "brand", or can I simply select the columns I want and create a new spreadsheet from them?
What is the best and cleanest way to do it: remove the unwanted columns from the file, then rearrange and, if needed, rename the rest, or pick the needed columns and create a new file with them in the right order? Any suggestions?
You can try this:
selected = df[['Age', 'Price_average', 'Email', 'Brand']]
If you want to change column names:
renamed = selected.rename(columns={'Brand': 'brand', 'Email':'email'})
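Selecting with a list of column names both drops everything not listed and fixes the order in one step, so a separate drop and rearrange is usually not needed. A minimal sketch of the whole pipeline, assuming a small stand-in DataFrame (the "Name" column and all values are hypothetical, since the real spreadsheet is not shown):

```python
import pandas as pd

# Hypothetical stand-in for original.xlsx; the "Name" column and all values are assumptions.
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Age": [30, 41],
    "Email": ["a@example.com", "b@example.com"],
    "Brand": ["X", "Y"],
    "Price_min": [10.0, 20.0],
    "Price_max": [20.0, 40.0],
})
df["Price_average"] = (df["Price_min"] + df["Price_max"]) / 2

# The list passed to df[...] decides which columns survive and in which order;
# rename() then maps old labels to new ones.
result = df[["Name", "Brand", "Email"]].rename(
    columns={"Brand": "brand", "Email": "email"}
)

# Writing the result is left commented out because it needs an Excel engine such as openpyxl.
# result.to_excel("output_2.xlsx", index=False)
print(result)
```

The original df is left untouched, so further spreadsheets with other column subsets can be built from it in the same way.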
I saw this code in "combine rows and add up value in dataframe", but I want to add up the values in the cells for the same day, i.e. sum all data for a day. How do I modify the code to achieve this?
Check the code below:
import pandas as pd
df = pd.DataFrame({'Price':[10000,10000,10000,10000,10000,10000],
'Time':['2012.05','2012.05','2012.05','2012.06','2012.06','2012.07'],
'Type':['Q','T','Q','T','T','Q'],
'Volume':[10,20,10,20,30,10]
})
df.assign(daily_volume = df.groupby('Time')['Volume'].transform('sum'))
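Output: each row gets a new daily_volume column holding the summed Volume of all rows that share its Time value (40 for 2012.05, 50 for 2012.06, 10 for 2012.07).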
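The difference between keeping every row and collapsing to one row per period may be easier to see side by side. A short sketch using the same sample data; the variable names with_totals and per_period are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Price": [10000, 10000, 10000, 10000, 10000, 10000],
    "Time": ["2012.05", "2012.05", "2012.05", "2012.06", "2012.06", "2012.07"],
    "Type": ["Q", "T", "Q", "T", "T", "Q"],
    "Volume": [10, 20, 10, 20, 30, 10],
})

# transform('sum') broadcasts each group's total back onto every row of that group,
# so the frame keeps its six rows.
with_totals = df.assign(daily_volume=df.groupby("Time")["Volume"].transform("sum"))

# A plain sum() aggregation instead returns one row per distinct Time value.
per_period = df.groupby("Time", as_index=False)["Volume"].sum()

print(with_totals)
print(per_period)
```

Which form is preferable depends on whether the per-row details (Type, Price) still matter after summing.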
I am trying to create a list from a CSV. This CSV contains a two-dimensional table (540 rows and 8 columns), and I would like to create a list that contains the values of a specific column, column 4 to be specific.
I tried list(df.columns.values)[4]. It returns the name of the column, but I'm trying to get the values from the rows in column 4 and make them a list.
import pandas as pd
import urllib
#This is the empty list
company_name = []
#Uploading CSV file
df = pd.read_csv(r'Downloads\Dropped_Companies.csv')  # raw string so the backslash is not treated as an escape
#Extracting list of all companies name from column "Name of Stock"
companies_column=list(df.columns.values)[4] #This returns the name of the column.
companies_column = list(df.iloc[:,4].values)
So for this you can just add the following line after the code you've posted:
company_name = df.iloc[:, 4].tolist()
This selects the column at position 4 as a pandas Series (a one-dimensional labeled array) and then converts it to a regular Python list. Selecting by position avoids relying on companies_column, which holds the column's values rather than its name after the last line of your code.
Or, if you were to start from scratch, you can also just use these lines:
import pandas as pd
df = pd.read_csv(r'Downloads\Dropped_Companies.csv')
company_name = df[df.columns[4]].tolist()
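One detail worth checking is that positions are zero-based, so index 4 is the fifth column of the file. A self-contained sketch that reads an in-memory CSV instead of the real file (the header names other than "Name of Stock" and all values are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical stand-in for Dropped_Companies.csv; only "Name of Stock" comes from the question.
csv_text = (
    "c0,c1,c2,c3,Name of Stock\n"
    "1,2,3,4,Acme\n"
    "5,6,7,8,Globex\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# df.columns[4] is the label of the fifth column (positions start at 0).
column_label = df.columns[4]

# iloc selects by position, so this returns that column's values as a plain list.
company_name = df.iloc[:, 4].tolist()

print(column_label, company_name)
```

If the fourth column in the human sense is wanted, the index would be 3 instead.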
Another option: if this is the only thing you need to do with your CSV file, you can skip installing pandas and use the csv library that comes with Python.
If you want to learn more about how to get data out of your pandas DataFrame (the df variable in your code), you might find this blog post helpful.
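For the standard-library route, a rough sketch with the csv module could look like this. It reads from an in-memory string so it runs as-is; the sample rows are invented, and with a real file you would pass an opened file object (opened with newline="") to csv.reader instead:

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
csv_text = (
    "c0,c1,c2,c3,Name of Stock\n"
    "1,2,3,4,Acme\n"
    "5,6,7,8,Globex\n"
)

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)                       # first row holds the column names
company_name = [row[4] for row in reader]   # index 4 = fifth column of each remaining row

print(header[4], company_name)
```

Note that csv.reader returns every field as a string, so numeric columns would need an explicit conversion.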
I think you can try this to get all the values of a specific column:
companies_column = df['column name']
Replace 'column name' with the name of the column whose values you want. This returns a Series; append .tolist() if you need a plain list.
I am trying to create a dataframe where the column lengths are not equal. How can I do this?
I was trying to use groupby. But I think this will not be the right way.
import pandas as pd
data = {'filename':['file1','file1'], 'variables':['a','b']}
df = pd.DataFrame(data)
grouped = df.groupby('filename')
print(grouped.get_group('file1'))
Above is my sample code. The output of which is:
What can I do to just have one entry of 'file1' under 'filename'?
Eventually I need to write this to a csv file.
Thank you
If you only have one entry in a column, the other cells will be NaN. So you could just filter out the NaNs by doing something like df = df.loc[df["filename"].notnull()]
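Two other ways to end up with a single 'file1' entry, sketched under the assumption that either one row per file or a blanked-out repeat is acceptable in the CSV (the names collapsed and display are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"filename": ["file1", "file1"], "variables": ["a", "b"]})

# Option 1: one row per file, with that file's variables joined into a single cell.
collapsed = df.groupby("filename")["variables"].agg(", ".join).reset_index()

# Option 2: keep one row per variable, but blank the filename wherever it repeats.
display = df.copy()
display.loc[display["filename"].duplicated(), "filename"] = ""

# Passing a path to to_csv() would write a file; with no path it returns the text.
csv_text = display.to_csv(index=False)
print(collapsed)
print(csv_text)
```

Option 1 keeps the data tidy for later processing, while option 2 only changes how the file looks when opened in a spreadsheet.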
I'm currently teaching myself Python and running into some issues. My challenge requires me to count the number of unique values in a column of an Excel spreadsheet, considering only the rows that have no missing values. Here is what I've got so far, but I can't seem to get it to work:
import xlrd
import pandas as pd
workbook = xlrd.open_workbook("*name of excel spreadsheet*")
worksheet = workbook.sheet_by_name("*name of specific sheet*")
pd.value_counts(df.*name of specific column*)
s = pd.value_counts(df.*name of specific column*)
s1 = pd.Series({'nunique': len(s), 'unique values': s.index.tolist()})
s.append(s1)
print(s)
Thanks in advance for any help.
Use the built-in unique() method to find the unique values in a column.
Sharing an example with you:
import pandas as pd
df=pd.DataFrame(columns=["a","b"])
df["a"]=[1,3,3,3,4]
df["b"]=[1,2,2,3,4]
print(df["a"].unique())
will give the following result:
[1 3 4]
unique() returns a NumPy array, so if you want to store the result as a list in a variable, convert it with:
l_of_unique_vals=df["a"].unique().tolist()
and then find its length or do anything else you like with it.
df = pd.read_excel("nameoffile.xlsx", sheet_name=name_of_sheet_you_are_loading)
#in the line above we are reading the file in a pandas dataframe and giving it a name df
df["column you want to find vals from"].unique()
First you can use pandas read_excel and then unique, as @Inder suggested.
import pandas as pd
df = pd.read_excel('name_of_your_file.xlsx')
print(df['column_name'].unique())
See more here.
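Since the question asks for a count restricted to rows without missing values, dropna() followed by nunique() may be closer to what is needed. A small sketch on made-up data (the column names "a" and "b" and the values are only for illustration):

```python
import pandas as pd

# Made-up data with a few missing cells.
df = pd.DataFrame({
    "a": [1, 3, 3, None, 4, 3],
    "b": [1, 2, None, 3, 4, 5],
})

# Keep only the rows in which every column has a value.
complete_rows = df.dropna()

# nunique() counts the distinct values; value_counts() shows how often each occurs.
n_unique = complete_rows["a"].nunique()
counts = complete_rows["a"].value_counts()

print(n_unique)
print(counts)
```

Here four rows survive dropna(), and column "a" holds the values 1, 3, 4 and 3 in them, so the count is 3.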
I have an Excel sheet
and another like this
I want to add the aisle_id to the first sheet according to the product_id, like this
I need help doing this, preferably using Python DataFrames or SQL Server
Use the pandas read_excel function as follows:
I have removed the 'aisle_id' column from the first DataFrame, as it was not defined and would have led to a conflict.
import pandas as pd
input_file1 = 'sales_table.xlsx'
input_file2 = 'sales_table2.xlsx'
df1 = pd.read_excel(input_file1)
df1 = df1[['order_id', 'product_id', 'add_to_cart_order', 'reordered']]
df2 = pd.read_excel(input_file2)
merged = pd.merge(df1, df2)
This creates a new DataFrame. By default, merge joins on the columns the two DataFrames have in common (here product_id).
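Spelling out the join key and join type can make the intent clearer and keeps every order row even when a product has no aisle. A sketch with small in-memory frames standing in for the two sheets (all values are invented; only the column names come from the answer above):

```python
import pandas as pd

# Invented rows standing in for sales_table.xlsx and sales_table2.xlsx.
orders = pd.DataFrame({
    "order_id": [1, 1, 2],
    "product_id": [10, 20, 10],
    "add_to_cart_order": [1, 2, 1],
    "reordered": [0, 1, 1],
})
products = pd.DataFrame({
    "product_id": [10, 20, 30],
    "aisle_id": [5, 7, 9],
})

# how="left" keeps all rows of orders; on="product_id" names the lookup key explicitly.
merged = orders.merge(products, on="product_id", how="left")

print(merged)
```

With the default how="inner", orders whose product_id is missing from the second sheet would silently disappear from the result.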