How do I put a file path variable into pandas.read_csv? - python

I tried to apply it through os.environ like so:
import os
import pandas as pd
os.environ["FILE"] = "File001"
df = pd.read_csv('/path/$FILErawdata.csv/')
But pandas doesn't recognize $FILE and instead gives me $FILErawdata.csv not found
Is there an alternative way to do this?

New Answer:
If you like string interpolation, Python 3.6 and later support f-strings:
import os
import pandas as pd
filename = "File001"
df = pd.read_csv(f'/path/{filename}rawdata.csv')
Old Answer:
Python doesn't expand variables the way shell scripts do. Variables are not automatically inserted into strings.
To do this, you have to create a string with the variable inside.
Try this:
import os
import pandas as pd
filename = "File001"
df = pd.read_csv('/path/' + filename + 'rawdata.csv')

df = pd.read_csv('/path/%(FILE)srawdata.csv' % os.environ)
I suspect you need to remove the trailing '/'.
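If the shell-style syntax from the question is what you are after, the standard library's os.path.expandvars can expand it. A minimal sketch, assuming the variable is written as ${FILE} so that the name is not read as FILErawdata; the /path/ prefix is just the placeholder from the question:

```python
import os

os.environ["FILE"] = "File001"

# Braces mark where the variable name ends. Without them, "$FILErawdata"
# would be looked up as a variable called FILErawdata.
csv_path = os.path.expandvars("/path/${FILE}rawdata.csv")
print(csv_path)  # /path/File001rawdata.csv

# The expanded string can then be passed on, e.g. pd.read_csv(csv_path)
```

This only builds the path string; whether the file exists is a separate question.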

Related

How to sort a column in Python?

I'm trying to open demo.csv and convert it to an xlsx file in order to sort column x, whose header is Birthplace, but I can't wrap my head around why the column doesn't get sorted.
It does everything else fine but doesn't sort the column.
import os
import time
from pathlib import Path
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
username = os.getenv("username")
filepath_in = Path(f'C:\\Users\\{username}\\Downloads\\demo.csv').resolve()
filepath_out = Path(f'C:\\Users\\{username}\\Downloads\\demo.xlsx').resolve()
pd.read_csv(filepath_in, delimiter=";").to_excel(filepath_out)
absolutePath = Path(f'C:\\Users\\{username}\\Downloads\\demo.xlsx').resolve()
os.system(f'start excel.exe "{absolutePath}"')
df = pd.read_excel(absolutePath)
print(df)
time.sleep(5)
df.sort_values(by='Birthplace',ascending=False, ignore_index=True).head()
print (df.sort_values)
I think I understand the confusion. Pandas will read the CSV file but it will not automatically save the results. You will have to save the file explicitly using something like df.to_excel or df.to_csv.
As OP wrote in their question, one can sort the dataframe using .sort_values(), but it is important to keep in mind that this function returns a new dataframe. We need to reassign the output of .sort_values() to df.
import pandas as pd
df = pd.read_csv("demo.csv", delimiter=";")  # same delimiter as in the question
df = df.sort_values(by="Birthplace", ascending=False, ignore_index=True)
df.to_excel("demo.xlsx")
Once you save the file demo.xlsx, then you should see the sorted columns in Excel.
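To see why the reassignment matters, here is a small sketch with made-up data standing in for demo.csv (the names and places are hypothetical, only the Birthplace column comes from the question):

```python
import pandas as pd

# Hypothetical sample data standing in for demo.csv
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo"],
    "Birthplace": ["Lisbon", "Zurich", "Athens"],
})

# sort_values returns a new, sorted dataframe ...
sorted_df = df.sort_values(by="Birthplace", ascending=False, ignore_index=True)

# ... while the original keeps its row order unless it is reassigned.
print(df["Birthplace"].tolist())         # ['Lisbon', 'Zurich', 'Athens']
print(sorted_df["Birthplace"].tolist())  # ['Zurich', 'Lisbon', 'Athens']
```

Calling df.sort_values(...) without keeping the result, as in the question, therefore leaves df untouched.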

Create n data frames using a for loop

I would like to know how to name in a different way the data frames that I am going to create using the code below.
import pandas as pd
import glob
os.chdir("/Users/path")
dataframes=[]
paths = glob.glob("*.csv")
for path in paths:
    dataset= pd.read_csv(path)
    dataframes.append(dataset)
I would like to have something like this:
df1
df2
df3
....
in order to use each of them for different analysis purposes. In the folder I have files like
analysis_for_market.csv, dataset_for_analysis.csv, test.csv, ...
Suppose I have 23 csv files (this length is given by dataframes as it appends each of df).
For each of them I would like to create a dataframe df in python in order to run different analysis.
I would do for one of it:
df=pd.read_csv(path) (where path is "/path/analysis_for_market.csv").
and then I could work on it (adding columns, dropping them, and so on).
However, I would like also to be able to work with another dataset, let say dataset_for_analysis.csv, so I would need to create a new dataframe, df2. This could be useful in case I would like to compare rows.
And so on. Potentially I would need a df for each dataset, so I would need 23 df.
I think it could be done using a for loop, but I have no idea how to refer to each df (for example, to run df.describe for the two examples above).
Could you please tell me how to do this?
If you find a possible question related to mine, could you please add it in a comment, before closing my question (as a previous post was closed before solving my issues)?
Thank you for your help and understanding.
Update:
import os
import pandas as pd
import glob
os.chdir("/Users/path")
paths = glob.glob("*.csv")
dataframes=[]
df={}
for x in range(1,len(paths)):
    for path in paths:
        df["0".format(x)]=pd.read_csv(path)
        #dataframes[path] = df # it gives me the following error: TypeError: list indices must be integers or slices, not str
df["2"]
It works only for 0, as in the code, but I do not know how to let the value range between 1 and len(paths).
Storing each dataframe in a dictionary under its own key will do the job.
import pandas as pd
import glob
import os
os.chdir("/Users/path")
df = {}
paths = glob.glob("*.csv")
for index, path in enumerate(paths):
    df[str(index)]= pd.read_csv(path)
This works fine for me. Calling df['0'] gives me the first dataframe.
You can create a global variable with any name you like by doing
"globals()["df32"] = ..."
But that is usually viewed as poor coding practice (because you might be clobbering existing names without knowing it).
Instead, just create a dictionary mydfs (say) and do mydfs[1]=...
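A minimal sketch of the dictionary approach, keyed by file name instead of a number so each dataframe is easy to find again. The helper name load_csv_folder and the throwaway demo files are assumptions for illustration, not part of the original question:

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_csv_folder(folder):
    """Return {file name without extension: DataFrame} for every .csv in folder."""
    return {p.stem: pd.read_csv(p) for p in sorted(Path(folder).glob("*.csv"))}

# Demo with two throwaway files; the names mirror those in the question.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "test.csv").write_text("a,b\n1,2\n")
    Path(tmp, "analysis_for_market.csv").write_text("a,b\n3,4\n5,6\n")
    mydfs = load_csv_folder(tmp)

print(sorted(mydfs))  # ['analysis_for_market', 'test']
print(mydfs["analysis_for_market"].describe())
```

Each dataframe can then be used on its own, e.g. mydfs["test"], or compared with another one from the same dictionary.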
from glob import glob
import pandas as pd
for i, path in enumerate(glob('*.csv')):
    # Run at module level so that df000, df001, ... become global names
    exec(f"df{i:03d} = pd.read_csv(path, encoding='latin-1')")
You can adjust the 03d part to the number of leading zeros you'd like, or skip it altogether with df{i}.

Syntax error when assigning an absolute path using pandas

I am trying to import and graph a .csv dataset using pandas, though when assigning the file path for the csv to be read, It reads as:
File "C:\Users\17024\test.py", line 5
dataframe = pd.read_csv(C:/PY_ABS_PATH)
^
SyntaxError: invalid syntax
The code I used is below:
import pandas as pd
import matplotlib.pyplot as plt
#import os
dataframe = pd.read_csv(C:/PY_ABS_PATH/scottish_hills.csv)
print(dataframe.head(10))
The path has to be in quotes. You can write it as a raw string using the r prefix, like
dataframe = pd.read_csv(r'C:/PY_ABS_PATH/scottish_hills.csv')
or by replacing each forward slash with a double backslash and adding quotes
dataframe = pd.read_csv('C:\\PY_ABS_PATH\\scottish_hills.csv')
You need to pass the path of the file as a string, so it must be enclosed in quotes. Try the following:
dataframe = pd.read_csv(r'C:/PY_ABS_PATH/scottish_hills.csv')
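To check that the spellings suggested in these answers name the same file, here is a small sketch using pathlib.PureWindowsPath, which only manipulates the path text and does not touch the disk (so it runs on any operating system):

```python
from pathlib import PureWindowsPath

# Three spellings of the same Windows path
forward = PureWindowsPath("C:/PY_ABS_PATH/scottish_hills.csv")
escaped = PureWindowsPath("C:\\PY_ABS_PATH\\scottish_hills.csv")
raw = PureWindowsPath(r"C:\PY_ABS_PATH\scottish_hills.csv")

print(forward == escaped == raw)  # True
print(raw.name)                   # scottish_hills.csv
```

The r prefix only matters when the string contains backslashes; with forward slashes a plain quoted string is enough.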

How to Read file of specific pattern csv(regex) and create DataFrame in python using pandas

I am trying to create a DataFrame using the to_csv method. In place of the path I want to give a regex pattern so that all files matching the pattern are read, but I don't get the files I expect.
Please help me to solve the problem.
import pandas as pd
df=pd.to_csv(path+"^\d{8}_\d{6}$",sep="|",Header=none,names=col)
But this line does not fetch the files matching the pattern.
The regular expression seems to be used directly as the file name to search for. Please help me solve this.
The solution has two steps. First, find all paths that match the pattern. Second, read each file into a DataFrame and concatenate the results. As far as I know, pandas does not support the first step (this needs rechecking), so you can use the glob library for it.
Code sample:
import pandas as pd
import glob
root_path = './'
# glob patterns have no {n} repetition, so repeat the [0-9] class instead
datasheet_path_pattern = root_path + ('[0-9]' * 8) + '_' + ('[0-9]' * 6)
datasheet_paths = glob.glob(datasheet_path_pattern)
datasheets = []
for datasheet_path in datasheet_paths:
    # col is the list of column names from the question
    df = pd.read_csv(datasheet_path, sep="|", header=None, names=col)
    datasheets.append(df)
datasheet = pd.concat(datasheets)
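If you would rather keep the regular expression from the question than translate it into a glob pattern, one alternative is to list the folder and filter the names with the re module. A minimal sketch; the helper name find_matching_files and the demo file names are made up for illustration:

```python
import re
import tempfile
from pathlib import Path

# Same pattern as in the question: 8 digits, underscore, 6 digits.
NAME_PATTERN = re.compile(r"\d{8}_\d{6}")

def find_matching_files(folder, pattern=NAME_PATTERN):
    """Return the files in folder whose whole name matches the regex, sorted."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and pattern.fullmatch(p.name))

# Demo with throwaway files: two match, two do not.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("20200101_120000", "20200102_093000", "notes.txt", "2020_1"):
        Path(tmp, name).write_text("x|y\n")
    matches = [p.name for p in find_matching_files(tmp)]

print(matches)  # ['20200101_120000', '20200102_093000']
```

The returned paths can be passed to pd.read_csv one by one and the results concatenated, as in the answer above.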

Renaming all the excel files as per the list in DataFrame in Python

I have approximately 300 files which are to be renamed as per the excel sheet mentioned below
The folder looks something like this :
I have tried writing the following code, and I think a loop will be needed as well. But it is not able to rename even one file. Any clue how this can be corrected?
import os
import pandas as pd
os.path.abspath('C:\\Users\\Home\\Desktop')
master=pd.read_excel('C:\\Users\\Home\\Desktop\\Test_folder\\master.xlsx')
master['old']=('C:\\Users\\Home\\Desktop\\Test_folder\\'+master['oldname']+'.xlsx')
master['new']=('C:\\Users\\Home\\Desktop\\Test_folder\\'+master['newname']+'.xlsx')
newmaster=master[['old','new']]
os.rename(newmaster['old'],newmaster['new'])
Load stuff.
import os
import pandas as pd
master = pd.read_excel('C:\\Users\\Home\\Desktop\\Test_folder\\master.xlsx')
Set your current directory to the folder.
os.chdir('C:\\Users\\Home\\Desktop\\Test_folder\\')
Rename things one at a time. While it would be cool, os.rename is not designed to work with pandas.
for row in master.iterrows():
    oldname, newname = row[1]
    os.rename(oldname+'.xlsx', newname+'.xlsx')
Basically, you are passing two pandas Series into os.rename() which expects two strings. Consider passing each Series values elementwise using apply(). And use the os-agnostic, os.path.join to concatenate folder and file names:
import os
import pandas as pd
cd = r'C:\Users\Home\Desktop\Test_folder'
master = pd.read_excel(os.path.join(cd, 'master.xlsx'))
def change_names(row):
    # Look the values up by column name rather than by position
    os.rename(os.path.join(cd, row['oldname'] + '.xlsx'), os.path.join(cd, row['newname'] + '.xlsx'))
master[['oldname', 'newname']].apply(change_names, axis=1)
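With around 300 files it may be worth guarding against names in the sheet that have no file on disk, since os.rename raises an error on the first missing one. A possible sketch using pathlib; the helper name rename_files and the demo names are hypothetical, and the pairs would come from the oldname and newname columns of the question's sheet:

```python
import tempfile
from pathlib import Path

def rename_files(folder, pairs, suffix=".xlsx"):
    """Rename folder/old+suffix to folder/new+suffix for each (old, new) pair.

    Returns the old names whose files were missing, so they can be reviewed.
    """
    folder = Path(folder)
    missing = []
    for old, new in pairs:
        source = folder / f"{old}{suffix}"
        if source.exists():
            source.rename(folder / f"{new}{suffix}")
        else:
            missing.append(old)
    return missing

# Demo in a throwaway folder: "c" is listed but has no file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.xlsx").touch()
    Path(tmp, "b.xlsx").touch()
    # With the question's dataframe this could be:
    # pairs = zip(master["oldname"], master["newname"])
    missing = rename_files(tmp, [("a", "alpha"), ("b", "beta"), ("c", "gamma")])
    renamed = sorted(p.name for p in Path(tmp).iterdir())

print(renamed)  # ['alpha.xlsx', 'beta.xlsx']
print(missing)  # ['c']
```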
