I have a (theoretically) simple task. I need to pull out a single column of 4000ish names from a table and use it in another table.
I'm trying to extract the column using pandas and I have no idea what is going wrong. It keeps flagging an error:
TypeError: string indices must be integers
import pandas as pd
file ="table.xlsx"
data = file['Locus tag']
print(data)
You have only assigned the file name to a variable, so file is a plain string, and indexing a string with 'Locus tag' is what raises the TypeError. You never actually loaded the workbook. Read it with the pandas read_excel function first; that gives you a DataFrame, from which you can extract the column.
Sample Code
import pandas as pd
import os
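p = "C:\\"  # folder that holds the workbook (backslash escaped)
name = "Car_sales.xlsx"  # file name only, without the folder
path = os.path.join(p, name)  # full path to the workbook
Z = pd.read_excel(path)  # load the first sheet into a DataFrame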
Z.head()
Sample Code
import pandas as pd
df = pd.read_excel("add the path")
df.head()
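Applied to the original question, the whole round trip might look like the sketch below. It assumes the column is called Locus tag, as in the question. The small in-memory tables stand in for pd.read_excel("table.xlsx") and for the second table, and the names source, target, and locus_tags are made up for illustration.

```python
import pandas as pd

# Stand-in for: source = pd.read_excel("table.xlsx")
# (illustrative data; the real table would come from the workbook)
source = pd.DataFrame({
    "Locus tag": ["b0001", "b0002", "b0003"],
    "Length": [66, 2463, 933],
})

# Indexing the DataFrame (not the file-name string) by label returns a Series
locus_tags = source["Locus tag"]

# Hypothetical second table that should receive the column
target = pd.DataFrame({"Score": [0.1, 0.5, 0.9]})

# Assigning a Series aligns on the index; .to_numpy() copies the values
# positionally instead, which assumes both tables have the same row order
target["Locus tag"] = locus_tags.to_numpy()

print(target)
```

If the two tables do not share the same row order, a merge on a common key column is probably the safer choice.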
Related
I am trying to process an Excel file with Pandas. The filter to be applied is by the values of the "Test Code" column, which has the format "XX10.X/XX12.X" (e.g. EF10.1). The problem is that the dataframe drops everything after the dot when reading the column, leaving just "XX10". The information after the dot is the most important part.
The original document classifies those cells as dates, which is probably altering the normal processing of the values.
[Image: screenshot of the Excel file]
The code I am using is:
import os
import pandas as pd
file = "H2020_TRI-HP_T6.2_PropaneIceFaultTests_v1"
folder = "J:\Downloads"
file_path = os.path.join(folder,file+".xlsx")
df = pd.read_excel(file_path,sheet_name="NF10")
df["Test Code"]
The output is:
[Image: screenshot of the output]
I'm trying to open demo.csv and convert it to an xlsx file so that I can sort a column whose header is Birthplace, but I can't wrap my head around why the column won't sort.
It does everything else fine but doesn't sort the column.
import os
import time
from pathlib import Path
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
username = os.getenv("username")
filepath_in = Path(f'C:\\Users\\{username}\\Downloads\\demo.csv').resolve()
filepath_out = Path(f'C:\\Users\\{username}\\Downloads\\demo.xlsx').resolve()
pd.read_csv(filepath_in, delimiter=";").to_excel(filepath_out)
absolutePath = Path(f'C:\\Users\\{username}\\Downloads\\demo.xlsx').resolve()
os.system(f'start excel.exe "{absolutePath}"')
df = pd.read_excel(absolutePath)
print(df)
time.sleep(5)
df.sort_values(by='Birthplace',ascending=False, ignore_index=True).head()
print (df.sort_values)
I think I understand the confusion. Pandas will read the CSV file but it will not automatically save the results. You will have to save the file explicitly using something like df.to_excel or df.to_csv.
As OP wrote in their question, one can sort the dataframe using .sort_values(), but it is important to keep in mind that this function returns a new dataframe. We need to reassign the output of .sort_values() to df.
import pandas as pd
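df = pd.read_csv("demo.csv", delimiter=";")  # same delimiter as in the question
# sort_values returns a new DataFrame, so reassign the result
df = df.sort_values(by="Birthplace", ascending=False, ignore_index=True)
df.to_excel("demo.xlsx")  # write the sorted data explicitly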
Once you save the file demo.xlsx, then you should see the sorted columns in Excel.
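To make the difference concrete, here is a small sketch with made-up data (the names and birthplaces are illustrative only). It shows that calling sort_values without keeping the result leaves df unchanged. Note also that print(df.sort_values), as in the question, prints the method object itself rather than a sorted frame.

```python
import pandas as pd

# Illustrative data, not taken from the question's demo.csv
df = pd.DataFrame({
    "Name": ["Ada", "Grace", "Linus"],
    "Birthplace": ["London", "New York", "Helsinki"],
})

# The sorted copy is discarded here, so df keeps its original order
df.sort_values(by="Birthplace", ascending=False, ignore_index=True)
unsorted_order = df["Birthplace"].tolist()

# Reassigning keeps the sorted copy
df = df.sort_values(by="Birthplace", ascending=False, ignore_index=True)
sorted_order = df["Birthplace"].tolist()

print(unsorted_order)
print(sorted_order)
```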
I am converting each SAS dataset in a directory into an individual pandas dataframe.
import os
import pandas as pd
import pyreadstat as pyd
os.chdir(r'XX\XX\XX\XXX')
Assume the working directory contains the following SAS datasets:
aa.sas7bdat
bb.sas7bdat
cc.sas7bdat
dd.sas7bdat
ee.sas7bdat
Now I am creating a dictionary by iterating over the SAS datasets and importing each one into its own dataframe with pyd.read_sas7bdat.
ddict={}
for file in os.listdir():
    if file.endswith(".sas7bdat"):
        name = os.path.splitext(file)[0]
        ddict[name]=pyd.read_sas7bdat(file,metadataonly=False)
But I still cannot get the above code to work; please help me achieve this. The expected output is a new dataframe for each SAS dataset, i.e. there should be multiple dataframes. Note: each dataframe should be named after its SAS dataset, without the extension.
For example, aa.sas7bdat (SAS dataset aa) should be created as dataframe aa.
I would do it like this:
import os
import pyreadstat as pyd
ddict={}
for file in os.listdir():
    if file.endswith(".sas7bdat"):
        name = os.path.splitext(file)[0]
        df, meta = pyd.read_sas7bdat(file)
        # store the dataframe in a dictionary
        ddict[name]= df
        # alternatively bind to a new variable name
        exec(name + "= df.copy()")
Remember that read_sas7bdat gives you a tuple of dataframe and metadata object, not a dataframe only.
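If the dictionary is enough, the exec step can be left out entirely. The sketch below wraps the loop in a function. The function name load_sas_folder and its reader parameter are hypothetical and not part of pyreadstat. The reader defaults to pyreadstat.read_sas7bdat and can be swapped for a stand-in when trying the code without real SAS files.

```python
from pathlib import Path


def load_sas_folder(folder, reader=None):
    """Return {dataset name: dataframe} for every .sas7bdat file in folder."""
    if reader is None:
        # Imported lazily so the function can be exercised without pyreadstat
        import pyreadstat
        reader = pyreadstat.read_sas7bdat

    frames = {}
    for path in sorted(Path(folder).glob("*.sas7bdat")):
        # The reader returns (dataframe, metadata); keep only the dataframe
        df, _meta = reader(str(path))
        # path.stem is the file name without its extension, e.g. "aa"
        frames[path.stem] = df
    return frames
```

With the files from the question, frames = load_sas_folder(".") would give frames["aa"], frames["bb"], and so on. Looking dataframes up by key avoids creating variables from file names, which can fail when a name is not a valid Python identifier.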
I have a large dataframe in a CSV file, sample1, from which I have to generate a new CSV file containing only the first 100 rows. I have written code for it, but I am getting a KeyError: the label [100] is not in the index.
This is what I have tried. Any help would be appreciated.
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv")
data_frame1 = data_frame[:100]
data_frame.to_csv("C:/users/raju/sample.csv")
The correct syntax is with iloc:
data_frame.iloc[:100]
A more efficient way is to use the nrows argument, whose purpose is exactly to read only part of a file. This way you avoid wasting time and memory parsing rows you do not need:
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv", nrows=100) # nrows counts data rows; the header line is not included
data_frame.to_csv("C:/users/raju/sample.csv")
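The two approaches can be compared on a small in-memory file. This is only a sketch: the CSV text below is generated for illustration and io.StringIO stands in for the real file path. It relies on nrows counting data rows only, with the header line read in addition.

```python
import io

import pandas as pd

# Illustrative CSV text: one header line followed by 250 data rows
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(250))

# Option 1: read everything, then slice the first 100 rows by position
full = pd.read_csv(io.StringIO(csv_text))
first_100_sliced = full.iloc[:100]

# Option 2: stop parsing after 100 data rows (the header is not counted)
first_100_read = pd.read_csv(io.StringIO(csv_text), nrows=100)

print(len(first_100_sliced), len(first_100_read))
print(first_100_sliced.equals(first_100_read))
```

Both options yield the same rows; the second simply never parses the remaining 150.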
I'm looping thru several .xlsx files in a folder and spitting out their column names like so.
import openpyxl
import os
import glob
import numpy as np
import pandas as pd
glob.glob("c:/myfolder/*.xlsx")
all_sheets_data = pd.DataFrame()
for f in glob.glob("c:\\myfolder\\*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    all_sheets_data = pd.concat([all_sheets_data, df], ignore_index=True)
    print (df)
I'm looking to add a new column called "RESULTS". I want to add/insert it in the very left column. I've searched for Add Column help but haven't found anything that works. Any suggestions, would really appreciate it.