I have the file below (file1.xlsx) as input. It has 32 columns and almost 2500 rows in total; the screenshot shows only 5 columns as an example.
I want to edit the same file with Python and get the output below (file1.xlsx).
Note that I am adding one column named short, whose values are the substring of the Name (column A) value up to the first dot, taken from the same Excel file.
Please help.
Regards
Kawaljeet
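Here is what you need:
import pandas as pd
file_name = "file1.xlsx"
df = pd.read_excel(file_name)  # Read the Excel file as a DataFrame
# .str[0] takes the first piece of every row; a plain [0] would select row 0 only
df['short'] = df['Name'].str.split(".").str[0]
df.to_excel(file_name, index=False)  # index=False avoids writing an extra index column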
Hello guys, I solved the problem with the code below:
import pandas as pd
import os
def add_column():
    file_name = "cmdb_inuse.xlsx"
    os.chmod(file_name, 0o777)  # make the file readable and writable by everyone
    df = pd.read_excel(file_name)  # Read the Excel file as a DataFrame
    df['short'] = [x.split(".")[0] for x in df['Name']]
    df.to_excel(file_name, index=False)
add_column()
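As a side note on the two approaches in this thread, the difference between indexing the split result with `[0]` and with `.str[0]` can be checked on a small in-memory frame. This is only a sketch: the host names are made up for illustration, and only the column names Name and short come from the question.

```python
import pandas as pd

# Hypothetical sample values standing in for the Name column
df = pd.DataFrame({"Name": ["server01.example.com", "db02.corp.local", "plainhost"]})

pieces = df["Name"].str.split(".")   # each row becomes a list of pieces

first_row_only = pieces[0]           # label 0: the whole list for the first row
df["short"] = pieces.str[0]          # first piece of every row

print(first_row_only)
print(df["short"].tolist())
```

A value without a dot, such as "plainhost", is returned unchanged, which matches the list-comprehension version in the accepted solution.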
Related
I am parsing a large Excel data file into another one, but the headers are very irregular. I tried read_excel with skiprows and that did not work. I also tried to specify the header rows with
df = pd.read_excel(user_input, header= [1:3], sheet_name = 'PN Projection'), but then I get this error: "ValueError: cannot join with no overlapping index names." To get around this I tried to name the columns by location, and that did not work either.
When I run the code shown below everything works fine, but past column "U" the header titles come out as "unnamed1, 2, ...". I understand this is because pandas treats the first row (which is empty) as the header, but how do I fix this? Is there a way to preserve the headers without manually typing in the format for each cell? Any and all help is appreciated, thank you!
small section of the excel file header
the code I am trying to run
#!/usr/bin/env python
import sys
import os
import pandas as pd
#load source excel file
user_input = input("Enter the path of your source excel file (omit 'C:'): ")
#reads the source excel file
df = pd.read_excel(user_input, sheet_name = 'PN Projection')
#Filtering dataframe
#Filters out rows with 'EOL' in column 'item status' and 'xcvr' in 'description'
df = df[~(df['Item Status'] == 'EOL')]
df = df[~(df['Description'].str.contains("XCVR", na=False))]
#Filters in rows with "XC" or "spartan" in 'description' column
df = df[(df['Description'].str.contains("XC", na=False) | df['Description'].str.contains("Spartan", na=False))]
print(df)
#Saving to a new spreadsheet called Filtered Data
df.to_excel('filtered_data.xlsx', sheet_name='filtered_data')
If you do not need the top 2 rows, then:
df = pd.read_excel(user_input, sheet_name='PN Projection', skiprows=range(0, 2))
This has worked for me when handling several strangely formatted files. Let me know if this isn't what you're looking for, or if there are additional issues.
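If skipping rows at read time is not enough, another option is to read the sheet without a header and promote the real header row afterwards. The sketch below works on an in-memory frame so it runs without an Excel file; the helper name `promote_header`, the row position, and the sample values are assumptions for illustration. Separately, note that the `header` argument of read_excel takes an integer or a list of integers such as `[1, 2]`, not a slice like `[1:3]`.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, header=None): junk rows first, real header in row 2
raw = pd.DataFrame([
    ["report", None, None],
    [None, None, None],
    ["Item Status", "Description", "Qty"],
    ["Active", "XC7 Spartan", 5],
    ["EOL", "XCVR module", 2],
])

def promote_header(frame, header_row):
    """Use the values in `header_row` as column names and keep the rows below it."""
    body = frame.iloc[header_row + 1:].reset_index(drop=True)
    body.columns = frame.iloc[header_row].tolist()
    return body

df = promote_header(raw, 2)
print(df.columns.tolist())
```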
Experts, I want to remove the very first row from an Excel file using Python. Here is a screenshot of my source Excel file:
I want the output to be:
I am using the Python code below to remove the first row, but when I read the file as a data frame and print it, the data is read as shown in the screenshot below:
The code I am using is:
import pandas as pd
import os
def remove_header():
    file_name = "AV Clients.xlsx"
    os.chmod(file_name, 0o777)
    df = pd.read_excel(file_name)  # Read Excel file as a DataFrame
    #df = df.drop([0])
    print(df)
    #df.to_excel("AV_Clients1.xlsx", index=False)
remove_header()
Please suggest how I can remove the very first row from the Excel file shown in the screenshot at the top.
Thanks in advance
Kawaljeet
Just add the skiprows argument when reading the Excel file.
import pandas as pd
import os
def remove_header():
    file_name = "AV Clients.xlsx"
    os.chmod(file_name, 0o777)
    df = pd.read_excel(file_name, skiprows=1)  # Skip the first row, then read the next row as the header
    print(df)
    df.to_excel("AV_Clients1.xlsx", index=False)
remove_header()
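To see what an integer skiprows does without needing an Excel file, here is a small sketch that uses read_csv on in-memory text instead; skiprows=1 behaves the same way in read_excel, dropping the first row before the header is read. The sample text and column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical export: a title line sits above the real header row
text = "Exported report\nHost,Status\npc-01,OK\npc-02,Stale\n"

# skiprows=1 drops the title line, so "Host,Status" becomes the header
df = pd.read_csv(io.StringIO(text), skiprows=1)

print(df.columns.tolist())
print(len(df))
```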
I have a file named Example.xls with data in the tabs Sales and Purchase.
Both tabs have data in columns A to E.
When I import this data with pandas, I want the result to have columns A to F, where column F shows the sheet name. How can I add the sheet name as a column in pandas?
I am using this code:
All = pd.read_excel('Example.xlsx', sheet_name=['Sales', 'Purchase'])
and then
df = pd.concat(All[frame] for frame in All.keys())
After that I want to put the tab names into the last column, F, of the combined data frame.
I think this is the simplest way.
import pandas as pd
path = r'path_of_your_file'
workbook = pd.read_excel(path, sheet_name=None)  # dict of {sheet name: DataFrame}
frames = []
for sheet_name, sheet in workbook.items():
    sheet['sheet'] = sheet_name
    frames.append(sheet)
# DataFrame.append was removed in pandas 2.0, so combine with pd.concat;
# ignore_index=True resets the index so there are no duplicates
df = pd.concat(frames, ignore_index=True)
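The same idea can be tried on an in-memory dictionary, which is the shape read_excel returns when sheet_name=None (sheet names as keys, DataFrames as values). The sheet contents below are invented for illustration; only the tab names come from the question.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None)
workbook = {
    "Sales": pd.DataFrame({"Item": ["pen", "ink"], "Amount": [10, 4]}),
    "Purchase": pd.DataFrame({"Item": ["paper"], "Amount": [7]}),
}

# assign() returns a copy with the extra column, so the originals stay untouched
df = pd.concat(
    [sheet.assign(sheet=name) for name, sheet in workbook.items()],
    ignore_index=True,
)

print(df)
```

The new column lands last, which corresponds to column F when the sheets have five columns.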
The below code will solve your problem:
import os
from glob import glob
import pandas as pd
f_mask = r'path\*.xlsx' ## The folder path where your Example.xlsx is stored
df = pd.concat(
    [df.assign(file=os.path.splitext(os.path.basename(f))[0], sheet=sheet)
     for f in glob(f_mask)
     for sheet, df in pd.read_excel(f, sheet_name=None).items()],
    ignore_index=True)
The code works in the following way:
Check the base folder and take all the .xlsx files in it
Read the files one by one, including every sheet in each file
Add two columns, one for the file name and one for the sheet name
This solution also works if you want to do the exercise for more than one .xlsx file
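The file-name column in that answer comes from stripping the folder and the extension off each path. A minimal sketch of just that step, with a hypothetical helper name `file_label` and made-up paths:

```python
import os

def file_label(path):
    """Return the file name without its directory or final extension."""
    return os.path.splitext(os.path.basename(path))[0]

print(file_label("reports/Example.xlsx"))  # Example
```

Only the last extension is removed, so a name such as sales.2020.xlsx keeps its inner dot.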
I have a table as below:
How can I print all the sources that have an 'X' in a particular column? For example, if I specify "Make", the output should be:
Delivery
Reputation
Profitability
PS: The idea is to import the excel file in python and do this operation.
Use pandas:
import pandas as pd
filename = "yourexcelfile"
dataframe = pd.read_excel(filename)
frame = dataframe.loc[dataframe["make"] == "X"]
print(frame["source"])
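Here is a self-contained sketch of the same filter on an in-memory frame. The original table was posted as an image, so the column names Source, Make and Buy and the rows below are assumptions chosen to reproduce the expected output from the question. Column labels are case-sensitive, so they need to match the spreadsheet headers exactly.

```python
import pandas as pd

# Hypothetical reconstruction of the table; blank cells are read as missing values
table = pd.DataFrame({
    "Source": ["Delivery", "Cost", "Reputation", "Profitability"],
    "Make": ["X", None, "X", "X"],
    "Buy": [None, "X", None, "X"],
})

def sources_marked(frame, column, mark="X"):
    """Return the Source values of rows whose `column` equals `mark`."""
    return frame.loc[frame[column] == mark, "Source"].tolist()

for name in sources_marked(table, "Make"):
    print(name)
```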
I am still learning python. I am trying to import multiple workbooks and all the worksheets into one data frame.
Here is what I have so far:
import glob
import pandas as pd
import numpy as np
import os  # checking the working directory
print(os.getcwd())
all_data = pd.DataFrame()  # creating an empty data frame
for file in glob.glob("*.xls"):  # import every file that ends in .xls
    df = pd.read_excel(file)
    all_data = pd.concat([all_data, df], ignore_index=True)
print(all_data.shape)  # 12796 rows with 19 columns; we will have to find a way to check if this is accurate
I am having real trouble finding any documentation that will confirm/explain whether or not this code imports all the data sheets in every workbook. Some of these files have 15-20 sheets
Here is a link to where I found the glob explanation: http://pbpython.com/excel-file-combine.html
Any and all advice is greatly appreciated. I am still really new to R and Python so if you could explain this in as much detail as possible I would greatly appreciate it!
What you are missing is importing all the sheets in each workbook.
import glob
import pandas as pd
import os  # checking the working directory
print(os.getcwd())
frames = []  # one DataFrame per sheet
rows = 0
for file in glob.glob("*.xls"):  # import every file that ends in .xls
    # pd.read_excel(file) alone would import only the first sheet
    xls = pd.ExcelFile(file)
    for sheet_name in xls.sheet_names:  # names of all the sheets
        df = pd.read_excel(xls, sheet_name=sheet_name)
        rows += df.shape[0]
        frames.append(df)
all_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(all_data.shape[0])  # total rows read; this should be equal to rows
print(rows)
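One way to check that nothing was dropped is to tag each sheet with its file and sheet name and compare per-sheet counts against the combined frame. The sketch below uses nested dictionaries as a stand-in for the workbooks on disk, so the file names, sheet names and values are all made up for illustration.

```python
import pandas as pd

# Stand-in for {file name: pd.read_excel(file, sheet_name=None)}
workbooks = {
    "a.xls": {"Sheet1": pd.DataFrame({"v": [1, 2]}), "Sheet2": pd.DataFrame({"v": [3]})},
    "b.xls": {"Sheet1": pd.DataFrame({"v": [4, 5, 6]})},
}

frames = []
expected_rows = 0
for file_name, sheets in workbooks.items():
    for sheet_name, sheet in sheets.items():
        expected_rows += len(sheet)
        frames.append(sheet.assign(file=file_name, sheet=sheet_name))

all_data = pd.concat(frames, ignore_index=True)
per_sheet = all_data.groupby(["file", "sheet"]).size()  # rows contributed by each sheet

print(len(all_data) == expected_rows)
print(per_sheet)
```

The per-sheet counts can then be compared by eye with the row counts shown in Excel for a few sheets.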