I'm trying to open demo.csv and convert it to an xlsx file so that I can sort a column whose header is Birthplace, but I can't wrap my head around why the column doesn't want to sort.
It does everything fine but doesn't sort the column.
import os
import time
from pathlib import Path
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
username = os.getenv("username")
filepath_in = Path(f'C:\\Users\\{username}\\Downloads\\demo.csv').resolve()
filepath_out = Path(f'C:\\Users\\{username}\\Downloads\\demo.xlsx').resolve()
pd.read_csv(filepath_in, delimiter=";").to_excel(filepath_out)
absolutePath = Path(f'C:\\Users\\{username}\\Downloads\\demo.xlsx').resolve()
os.system(f'start excel.exe "{absolutePath}"')
df = pd.read_excel(absolutePath)
print(df)
time.sleep(5)
df.sort_values(by='Birthplace',ascending=False, ignore_index=True).head()
print (df.sort_values)
I think I understand the confusion. Pandas will read the CSV file but it will not automatically save the results. You will have to save the file explicitly using something like df.to_excel or df.to_csv.
As OP wrote in their question, one can sort the dataframe using .sort_values(), but it is important to keep in mind that this function returns a new dataframe. We need to reassign the output of .sort_values() to df.
import pandas as pd
# the file in the question is semicolon-delimited, so pass the separator explicitly
df = pd.read_csv("demo.csv", sep=";")
df = df.sort_values(by="Birthplace", ascending=False, ignore_index=True)
df.to_excel("demo.xlsx")
Once you save demo.xlsx, you should see the rows sorted by the Birthplace column when you open the file in Excel.
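To see why the reassignment matters, here is a small sketch with made-up data (the names and birthplaces are invented for illustration and are not the contents of demo.csv). It calls sort_values once without keeping the result and once with reassignment, then compares the two.

```python
import pandas as pd

# Made-up rows for illustration only; not the contents of demo.csv.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid"], "Birthplace": ["Oslo", "Zurich", "Bergen"]}
)

# The sorted frame is returned and thrown away, so df keeps its original order.
df.sort_values(by="Birthplace", ascending=False)
unchanged = df["Birthplace"].tolist()

# Keeping the returned frame is what makes the sort stick.
sorted_df = df.sort_values(by="Birthplace", ascending=False, ignore_index=True)
in_order = sorted_df["Birthplace"].tolist()

print(unchanged)
print(in_order)
```

The first list comes back in the original order and the second in descending order, which matches the behaviour described in the question.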
Related
I need to capture the date from multiple CSV filenames and add that date to each file as a new column using Python. I have this code that works well with Excel files, and I am trying to do exactly the same with CSV files. If someone could help me, that would be much appreciated.
The filenames are as follows:
Scan_05-22-2021.csv
Scan_05-23-2021.csv
Scan_05-24-2021.csv and so on..
Excel code that works:
import openpyexcel
import os
import pandas as pd
import glob
import csv
from openpyexcel import load_workbook
import os
path_to_xls = os.getcwd() # or r'<path>'
for xls in os.listdir(r'C:\Python'):
    if xls.endswith(".csv") or xls.endswith(".xlsx"):
        f = load_workbook(filename=xls)
        sheet = f.active
        # Change here the name of the new column
        sheet.cell(row=1, column=25).value = "DateTest"
        for i in range(sheet.max_row-1):
            # xls.split('_')[1][:-5]  # takes value of Col1 and dumps/overwrites in column 3
            sheet.cell(row=i+2, column=25).value = xls.split('_')[1][:-5]
        f.save(xls)
        f.close()
You should be able to do this with pandas.
Use pd.read_csv to load the files as DataFrames.
You can use the iterrows method to go over the rows
and simply append to the new file.
This cheatsheet could be of use.
Good luck!
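The pandas route could look roughly like the sketch below. It assumes the files are comma-separated, follow the Scan_MM-DD-YYYY.csv pattern from the question, and may be overwritten in place. The helper names date_from_filename and add_date_column are hypothetical, and the sketch assigns the whole column at once instead of looping with iterrows.

```python
import re
from pathlib import Path

import pandas as pd

# Matches the date part of names such as Scan_05-22-2021.csv.
DATE_PATTERN = re.compile(r"_(\d{2}-\d{2}-\d{4})\.csv$")


def date_from_filename(name):
    """Return the MM-DD-YYYY text from a file name, or None if it is absent."""
    match = DATE_PATTERN.search(name)
    return match.group(1) if match else None


def add_date_column(folder, column="DateTest"):
    """Add the file-name date as a new column to every Scan_*.csv in folder."""
    processed = []
    for csv_path in sorted(Path(folder).glob("Scan_*.csv")):
        date = date_from_filename(csv_path.name)
        if date is None:
            continue  # skip files that do not follow the naming pattern
        df = pd.read_csv(csv_path)
        df[column] = date  # the same value is broadcast to every row
        df.to_csv(csv_path, index=False)  # overwrites the original file
        processed.append(csv_path.name)
    return processed
```

Calling add_date_column(r"C:\Python") would process the folder used in the question; the date is stored as text, just as the Excel version stores the sliced file name.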
I have a (theoretically) simple task. I need to pull out a single column of 4000ish names from a table and use it in another table.
I'm trying to extract the column using pandas and I have no idea what is going wrong. It keeps flagging an error:
TypeError: string indices must be integers
import pandas as pd
file ="table.xlsx"
data = file['Locus tag']
print(data)
You have only assigned the file name to a variable, so file is a plain string, and indexing a string with 'Locus tag' raises the TypeError. You never loaded the workbook with pandas. Read the file with the read_excel function first; after that you can read the data, extract the column, and so on.
Sample Code
import pandas as pd
import os
# raw strings keep the backslashes in Windows paths literal
p = os.path.dirname(os.path.realpath(r"C:\Car_sales.xlsx"))
name = 'Car_sales.xlsx'
path = os.path.join(p, name)
Z = pd.read_excel(path)
Z.head()
Sample Code
import pandas as pd
df = pd.read_excel("add the path")
df.head()
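Applied to the question, the column can be pulled out once the file has been loaded into a DataFrame. The sketch below is a hedged illustration: extract_column is a hypothetical helper, and the only names taken from the question are table.xlsx and the Locus tag header.

```python
import pandas as pd


def extract_column(df, column):
    """Return one column of df as a Series, with a clearer error if it is missing."""
    if column not in df.columns:
        raise KeyError(f"{column!r} not found; available columns: {list(df.columns)}")
    return df[column]


# Usage with the file from the question (needs an Excel engine such as openpyxl):
# data = extract_column(pd.read_excel("table.xlsx"), "Locus tag")
# print(data)
```

The difference from the original attempt is that the indexing happens on the DataFrame returned by read_excel, not on the file name string.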
For the past few days I've been trying to do a relatively simple task but I'd always encounter some errors so I'd really appreciate some help on this. Here goes:
I have an Excel file which contains a specific column (Column F) that has a list of IDs.
What I want to do is for the program to read this excel file and allow the user to input any of the IDs they would like.
When the user types in one of the IDs, I want the program to return all the IDs that contain the text the user entered. After that, I'd like to export those IDs to a new, separate Excel file where they are displayed in one column, one ID per row.
Here's my code so far, I've tried using arrays and stuff but nothing seems to be working for me :/
import pandas as pd
import numpy as np
import re
import xlrd
import os.path
import xlsxwriter
import openpyxl as xl;
from pandas import ExcelWriter
from openpyxl import load_workbook
# LOAD EXCEL TO DATAFRAME
xls = pd.ExcelFile('N:/TEST/TEST UTILIZATION/IA 2020/Dev/SCS-FT-IE-Report.xlsm')
df = pd.read_excel(xls, 'FT')
# GET USER INPUT (USE AD1852 AS EXAMPLE)
value = input("Enter a Part ID:\n")
print(f'You entered {value}\n\n')
i = 0
x = df.loc[i, "MFG Device"]
df2 = np.array(['', 'MFG Device', 'Loadboard Group','Socket Group', 'ChangeKit Group'])
for i in range(17367):
    # x = df.loc[i, "MFG Device"]
    if value in x:
        df = np.array[x]
        df2.append(df)
    i += 1
print(df2)
# create excel writer object
writer = pd.ExcelWriter('N:/TEST/TEST UTILIZATION/IA 2020/Dev/output.xlsx')
# write dataframe to excel
df2.to_excel(writer)
# save the excel
writer.save()
print('DataFrame is written successfully to Excel File.')
Any help would be appreciated, thanks in advance! :)
It looks like you're doing much more than you need to do. Rather than monkeying around with xlsxwriter, pandas.DataFrame.to_excel is your friend.
Just do
df2.to_excel("output.xlsx")
You don't need xlsxwriter; df.to_excel() is enough. In your code, df2 is a NumPy array, which has no to_excel method. First convert it into a pandas DataFrame with the index and columns you need, and then write it to Excel.
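One way to get such a DataFrame without the manual loop is to filter the ID column directly. This is a minimal sketch: filter_ids is a hypothetical helper, it assumes the IDs live in the "MFG Device" column as in the question's code, and it treats the user's input as plain text, not as a regular expression.

```python
import pandas as pd


def filter_ids(df, value, column="MFG Device"):
    """Return a one-column DataFrame of the IDs in `column` that contain `value`."""
    ids = df[column].dropna().astype(str)  # ignore empty cells
    matches = ids[ids.str.contains(value, regex=False)]  # plain substring match
    return matches.reset_index(drop=True).to_frame()


# Usage, assuming df was loaded with pd.read_excel as in the question:
# result = filter_ids(df, value)
# result.to_excel("output.xlsx", index=False)
```

Because the result is already a DataFrame, to_excel can be called on it directly, and index=False keeps the output to a single column of IDs.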
I have approximately 300 files which are to be renamed as per the excel sheet mentioned below
The folder looks something like this :
I have tried writing the following code, and I think a loop will be needed as well. But it is not able to rename even one file. Any clue how this can be corrected?
import os
import pandas as pd
os.path.abspath('C:\\Users\\Home\\Desktop')
master=pd.read_excel('C:\\Users\\Home\\Desktop\\Test_folder\\master.xlsx')
master['old']=('C:\\Users\\Home\\Desktop\\Test_folder\\'+master['oldname']+'.xlsx')
master['new']=('C:\\Users\\Home\\Desktop\\Test_folder\\'+master['newname']+'.xlsx')
newmaster=master[['old','new']]
os.rename(newmaster['old'],newmaster['new'])
Load stuff.
import os
import pandas as pd
master = pd.read_excel('C:\\Users\\Home\\Desktop\\Test_folder\\master.xlsx')
Set your current directory to the folder.
os.chdir('C:\\Users\\Home\\Desktop\\Test_folder\\')
Rename things one at a time. While it would be cool, os.rename is not designed to work with pandas.
for row in master.iterrows():
    oldname, newname = row[1]
    os.rename(oldname+'.xlsx', newname+'.xlsx')
Basically, you are passing two pandas Series into os.rename() which expects two strings. Consider passing each Series values elementwise using apply(). And use the os-agnostic, os.path.join to concatenate folder and file names:
import os
import pandas as pd
cd = r'C:\Users\Home\Desktop\Test_folder'
master = pd.read_excel(os.path.join(cd, 'master.xlsx'))
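def change_names(row):
    # index the row by column label; positional row[0] is deprecated for labelled Series
    os.rename(os.path.join(cd, row['oldname'] + '.xlsx'), os.path.join(cd, row['newname'] + '.xlsx'))
master[['oldname', 'newname']].apply(change_names, axis=1)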
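With about 300 files it can help to skip rows whose source file is missing or whose target already exists, so one bad row does not stop the run. The sketch below is one possible variant and not part of either answer above: rename_from_table is a hypothetical helper, and it assumes the sheet has oldname and newname columns holding names without the extension.

```python
from pathlib import Path

import pandas as pd


def rename_from_table(folder, table, suffix=".xlsx"):
    """Rename files in folder using the oldname/newname columns of table.

    Returns the (old, new) file-name pairs that were actually renamed.
    """
    folder = Path(folder)
    renamed = []
    for old, new in zip(table["oldname"], table["newname"]):
        src = folder / f"{old}{suffix}"
        dst = folder / f"{new}{suffix}"
        # Skip missing sources and never overwrite an existing target.
        if src.exists() and not dst.exists():
            src.rename(dst)
            renamed.append((src.name, dst.name))
    return renamed


# Usage with the paths from the question:
# cd = r'C:\Users\Home\Desktop\Test_folder'
# master = pd.read_excel(Path(cd) / 'master.xlsx')
# print(rename_from_table(cd, master))
```

The returned list makes it easy to compare what was renamed against the sheet and spot rows that were skipped.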
I'm looping thru several .xlsx files in a folder and spitting out their column names like so.
import openpyxl
import os
import glob
import numpy as np
import pandas as pd
glob.glob("c:/myfolder/*.xlsx")
all_sheets_data = pd.DataFrame()
for f in glob.glob("c:\\myfolder\\*.xlsx"):
    df = pd.read_excel(f)
    all_sheets_data = all_data.append(df,ignore_index=True)
    print (df)
I'm looking to add a new column called "RESULTS". I want to add/insert it in the very left column. I've searched for Add Column help but haven't found anything that works. Any suggestions, would really appreciate it.
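One possible approach, offered as a sketch and not a confirmed answer from the thread: pandas DataFrames have an insert method that places a new column at a given position, and position 0 is the far left. The helper name add_results_column and the empty-string default are assumptions made for illustration.

```python
import pandas as pd


def add_results_column(df, value=""):
    """Return a copy of df with a RESULTS column inserted as the first column."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out.insert(0, "RESULTS", value)  # loc=0 puts the column at the far left
    return out


# Usage inside the loop from the question:
# df = add_results_column(pd.read_excel(f))
```

DataFrame.insert modifies the frame in place and raises a ValueError if a RESULTS column already exists, which is why the sketch works on a copy.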