My project structure:
et->datacollector
->eventprocessor->multilang->resources->python->tenderevent->rules->Table.py
->target->inpout->Read.csv
Table.py
import pandas as pd
df_LFB1 = pd.read_csv('Read.csv', sep = ',', usecols = [1,2,7,59])
In the code above I want to read Read.csv. How should I give the path to Read.csv in pd.read_csv?
import os
os.getcwd()
Out[42]: '/Users/Documents'
# inside a script, os.path.abspath(__file__) gives the script's own path instead
If I have the 'Read.csv' file in my current working directory '/Users/Documents', I can read the file like below.
df_LFB1 = pd.read_csv('Read.csv', sep = ',', usecols = [1,2,7,59])
If the file is not in the current working directory but somewhere else, let's say the et directory is in /home/project, use the absolute path:
df_LFB1 = pd.read_csv(r'/home/project/et/eventprocessor/target/inpout/Read.csv',
                      sep=',', usecols=[1, 2, 7, 59])
Above statement will read the file.
Note: when you provide an absolute path to the file, it does not matter where your script resides.
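To make the difference concrete, here is a small sketch (a throwaway temp directory stands in for /Users/Documents; that substitution is an assumption for illustration) showing that a bare name like 'Read.csv' is resolved against os.getcwd(), while an absolute path is used unchanged.

```python
import os
import tempfile

# Hypothetical working directory standing in for /Users/Documents
workdir = tempfile.mkdtemp()
os.chdir(workdir)

# A bare file name is resolved against the current working directory...
relative = os.path.abspath('Read.csv')

# ...whereas an absolute path does not depend on the working directory (POSIX path)
absolute = os.path.abspath('/home/project/et/eventprocessor/target/inpout/Read.csv')

print(relative)
print(absolute)
```

This is why the same pd.read_csv('Read.csv') call can succeed in one session and fail in another: only the working directory changed.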
Here's my code:
climate_data = np.genfromtxt('climate.txt', delimiter = ',', skip_header = 1)
I put this Python file in the same directory as the "climate.txt" file, and I already import numpy in my Python file.
And I get this error:
OSError: climate.txt not found.
What can I do to fix this?
Putting the Python script in the same directory as the data file doesn't guarantee that your code will be run from that directory. To make sure the file is found, use an absolute path to climate.txt, or build one from the directory of the script itself:
import os
import numpy as np
DIR_PATH = os.path.dirname(os.path.abspath(__file__))
climate_data = np.genfromtxt(os.path.join(DIR_PATH, 'climate.txt'), delimiter=',', skip_header=1)
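The sketch below simulates the failure and the fix without needing a real script. The temp folder plays the role of DIR_PATH, and the contents of climate.txt are made-up sample data, not the asker's file.

```python
import os
import tempfile

import numpy as np

# Hypothetical "script folder" holding a small climate.txt (made-up sample data)
script_dir = tempfile.mkdtemp()
with open(os.path.join(script_dir, 'climate.txt'), 'w') as f:
    f.write('year,temp\n2000,14.1\n2001,14.3\n')

# Simulate launching the script from some other working directory
os.chdir(tempfile.mkdtemp())

# A bare file name is looked up in the working directory, so this fails
try:
    np.genfromtxt('climate.txt', delimiter=',', skip_header=1)
    relative_found = True
except OSError:
    relative_found = False

# Anchoring the name to the script's folder works from any working directory
climate_data = np.genfromtxt(os.path.join(script_dir, 'climate.txt'),
                             delimiter=',', skip_header=1)
print(relative_found, climate_data.shape)
```

In a real script, script_dir would be the DIR_PATH computed from __file__.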
import os
import pandas as pd
FILES = os.listdir("/CADEC/original")
for file in FILES:
    if file.startswith("ARTHROTEC."):
        print(file)
ARTHROTEC.1.ann
ARTHROTEC.10.ann
ARTHROTEC.100.ann
ARTHROTEC.101.ann
ARTHROTEC.102.ann
ARTHROTEC.103.ann
ARTHROTEC.104.ann
ARTHROTEC.105.ann
ARTHROTEC.106.ann
ARTHROTEC.107.ann
ARTHROTEC.108.ann
ARTHROTEC.109.ann
ARTHROTEC.11.ann
ARTHROTEC.110.ann
ARTHROTEC.111.ann
ARTHROTEC.112.ann
ARTHROTEC.113.ann
ARTHROTEC.114.ann
ARTHROTEC.115.ann
...
I want to extract data from all the files under a directory whose names start with certain letters. As shown above, when I iterate over the directory and print every file name that matches, I get a column of file names (strings). Meanwhile, data = pd.read_csv("/CADEC/original/ARTHROTEC.1.ann", sep='\t', header=None) works perfectly well. However, running the following code just raises an error. Why is the file not found? What should I do to fix this?
for file in FILES:
    if file.startswith("ARTHROTEC."):
        data = pd.read_csv(file, sep='\t', header=None)
FileNotFoundError: [Errno 2] File ARTHROTEC.1.ann does not exist: 'ARTHROTEC.1.ann'
os.listdir returns only the file names in the directory, not their paths, and pandas needs the full (or relative) path to the file unless the file is in the current working directory.
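A minimal fix that keeps os.listdir is to join each name back onto the directory. In this sketch a temp folder with made-up .ann content stands in for /CADEC/original.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for /CADEC/original: two matching files and one to skip;
# the single tab-separated line is made-up sample content
data_dir = tempfile.mkdtemp()
for name in ('ARTHROTEC.1.ann', 'ARTHROTEC.2.ann', 'OTHER.1.ann'):
    with open(os.path.join(data_dir, name), 'w') as f:
        f.write('T1\tADR 0 5\tpain\n')

frames = {}
for file in os.listdir(data_dir):
    if file.startswith('ARTHROTEC.'):
        # os.listdir gives bare names, so join them back onto the directory
        frames[file] = pd.read_csv(os.path.join(data_dir, file), sep='\t', header=None)

print(sorted(frames))
```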
You'd be better off learning the pathlib module, which treats paths as objects with methods instead of strings.
.glob produces a generator of Path objects matching the pattern.
Python 3's pathlib Module: Taming the File System
pathlib may take some getting used to, but the attributes for extracting specific parts of a path, like .suffix for the file extension or .stem for the file name without its extension, make it worthwhile.
import pandas as pd
from pathlib import Path
# create the path object and get the files with .glob;
# wrap it in list() because a generator can only be iterated once
files = list(Path('/CADEC/original').glob('ARTHROTEC*.ann'))
# create a list of dataframes, 1 dataframe for each file
df_list = [pd.read_csv(file, sep='\t', header=None) for file in files]
# alternatively, create a dict of dataframes with the filename as the key
df_dict = {file.stem: pd.read_csv(file, sep='\t', header=None) for file in files}
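To see what .stem, .suffix and friends return for these file names, here is a quick check using PurePosixPath, which does no file-system access, so /CADEC/original need not exist. Note that .stem strips only the last extension, so the dict keys above would look like 'ARTHROTEC.1'.

```python
from pathlib import PurePosixPath

# Pure paths only manipulate the string; nothing is read from disk
p = PurePosixPath('/CADEC/original/ARTHROTEC.1.ann')

name = p.name          # full file name
stem = p.stem          # file name minus the last suffix
suffix = p.suffix      # last extension, including the dot
suffixes = p.suffixes  # every dotted part after the first
parent = p.parent      # containing directory

print(name, stem, suffix, suffixes, parent)
```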
Example
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] on win32
import os
from pathlib import Path
os.listdir('e:/PythonProjects/stack_overflow/t-files')
Out[2]:
['.ipynb_checkpoints',
'03900169.txt',
'142233.0.txt',
'153431.2.txt',
'17371271.txt',
'274301.5.txt',
'42010316.txt',
'429237.7.txt',
'570651.4.txt',
'65500027.txt',
'688599.3.txt',
'740103.5.txt',
'742537.6.txt',
'87505504.txt',
'90950222.txt',
't1.txt',
't2.txt',
't3.txt']
list(Path('e:/PythonProjects/stack_overflow/t-files').glob('*'))
Out[3]:
[WindowsPath('e:/PythonProjects/stack_overflow/t-files/.ipynb_checkpoints'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/03900169.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/142233.0.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/153431.2.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/17371271.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/274301.5.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/42010316.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/429237.7.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/570651.4.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/65500027.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/688599.3.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/740103.5.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/742537.6.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/87505504.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/90950222.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/t1.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/t2.txt'),
WindowsPath('e:/PythonProjects/stack_overflow/t-files/t3.txt')]
I have multiple Excel files in a folder ('folder_A') (file 1, 2, 3, etc.).
I want to import those files, do something with them (in pandas) and write each one to a CSV file in a different folder ('updated_folder_A').
I almost have it working, but for some reason the files don't end up in 'updated_folder_A'. Can someone tell me what I'm doing wrong?
test.py:
import glob
import pandas as pd
files = glob.glob('folder_A/*.xlxs')
for file in files:
    df = pd.read_excel(file)
    df['Col1'] = df['Col1'] / 60
    df.to_csv('updated_{}'.format(file), index = False)
Expanding on @Anteino's answer, assuming your folder structure is like this:
Parent folder
    folder_A
        file1.xlsx
        file2.xlsx
    updated_folder_A
Then, if your script is inside Parent folder and you run it from there, this should work:
import glob
import os
import pandas as pd
files = glob.glob('folder_A/*.xlsx')  # note: .xlsx, not .xlxs
for file in files:
    df = pd.read_excel(file)
    df['Col1'] = df['Col1'] / 60
    # keep only the bare name: drop the folder_A/ prefix and the .xlsx extension
    name = os.path.splitext(os.path.basename(file))[0]
    df.to_csv('updated_folder_A/updated_{}.csv'.format(name), index=False)
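Change the last line to (with import os at the top, since glob returns paths that include the folder_A/ prefix):
df.to_csv('updated_folder_A/updated_{}'.format(os.path.basename(file)), index = False)
And make sure that folder exists too.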
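A pathlib-based sketch of the same conversion is below. The folder names and the Col1 transformation come from the question; the helper names output_path and convert_folder are hypothetical. It also creates the output folder if it is missing. pd.read_excel needs an Excel engine such as openpyxl installed.

```python
from pathlib import Path

import pandas as pd


def output_path(src, out_dir):
    """Map folder_A/file1.xlsx -> updated_folder_A/updated_file1.csv."""
    src = Path(src)
    # .stem drops both the folder and the .xlsx extension in one step
    return Path(out_dir) / f'updated_{src.stem}.csv'


def convert_folder(src_dir='folder_A', out_dir='updated_folder_A'):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    for xlsx in Path(src_dir).glob('*.xlsx'):
        df = pd.read_excel(xlsx)
        df['Col1'] = df['Col1'] / 60
        df.to_csv(output_path(xlsx, out_dir), index=False)
```

Calling convert_folder() from Parent folder would then write updated_folder_A/updated_file1.csv and so on.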
I can read a CSV with a relative path like this:
import pandas as pd
file_path = './Data Set/part-0000.csv'
df = pd.read_csv(file_path)
But when there are multiple files I use glob, and the file paths come back with mixed forward and backward slashes, so I can't read the files because of the wrong path.
allPaths = glob.glob(path)
For path = "./Data Set/UserIdToUrl/*" the file path looks like this:
"./Data Set/UserIdToUrl\\part-0000.csv"
For path = ".\\Data Set\\UserIdToUrl\\*" the file path looks like this:
".\\Data Set\\UserIdToUrl\\part-0000.csv"
If I use
normalPath = os.path.normpath(path)
normalPath is missing the relative ./ or .\\ prefix, like this:
'Data Set\UserIdToUrl\part-00000.csv'
Either of the forms below could work. What is the best way to do this so that it works on both Windows and Linux?
".\\Data Set\\UserIdToUrl\\part-0000.csv"
or
"./Data Set/UserIdToUrl/part-0000.csv"
Please ask clarifying questions if anything is unclear. Thanks in advance for comments and answers.
More info:
I think the problem occurs only on Windows, not on Linux.
Below is the shortest program that shows the issue. Assume there are files in './Data Set/UserIdToUrl/*'; the path is correct, because I can read a file when I pass its path directly to pd.read_csv('./Data Set/UserIdToUrl/filename.csv').
import os
import glob
import pandas as pd
path = "./Data Set/UserIdToUrl/*"
allFiles = glob.glob(path)
np_array_list = []
for file_ in allFiles:
    normalPath = os.path.normpath(file_)
    print(file_)
    print(normalPath)
    df = pd.read_csv(file_, index_col=None, header=0)
    np_array_list.append(df.to_numpy())  # as_matrix() was removed in pandas 1.0
Update2
I just looked up the glob library. Its documentation describes it as 'glob — Unix style pathname pattern expansion'. I guess I need a utility function that works on both Unix and Windows.
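You can use abspath, but join it with the directory: os.path.abspath(file) on a bare file name resolves against the current working directory, not against 'Data Set'.
import os
import pandas as pd
data_dir = os.path.abspath('./Data Set/')
for file in os.listdir(data_dir):
    if file.endswith('.csv'):
        df = pd.read_csv(os.path.join(data_dir, file))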
Try this:
import pandas as pd
from pathlib import Path
dir_path = 'Data Set'
datas = []
for p in Path(dir_path).rglob('*.csv'):
    df = pd.read_csv(p)
    datas.append(df)
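Part of why pathlib sidesteps the slash problem: on Windows it treats / and \ as the same separator, so the mixed path glob produced and the fully backslashed one are the same Path. It also drops the leading '.', which is harmless because 'Data Set/...' and './Data Set/...' name the same relative location. The sketch uses PureWindowsPath so it behaves identically on any OS; the file name is the one from the question.

```python
from pathlib import PureWindowsPath

# The two spellings from the question: glob output with mixed separators,
# and a fully backslashed path
mixed = PureWindowsPath('./Data Set/UserIdToUrl\\part-0000.csv')
backslashed = PureWindowsPath('.\\Data Set\\UserIdToUrl\\part-0000.csv')

# Windows paths accept both separators, and the redundant '.' is collapsed
same = mixed == backslashed
posix_form = mixed.as_posix()
print(same, posix_form, mixed.parts)
```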
I'm learning Python and can't get pandas DataFrames to save. I'm not getting any errors; the file just doesn't appear in the folder.
I'm using a Windows 10 machine, Python 3 and Jupyter Notebook, and saving to a local Google Drive folder.
Any ideas?
import feedparser
import pandas as pd
rawrss = [
    'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
    'https://www.yahoo.com/news/rss/',
    'http://www.huffingtonpost.co.uk/feeds/index.xml',
    'http://feeds.feedburner.com/TechCrunch/',
]
posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))
df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])  # pass data to init
df.to_csv('c:\\Users\\username\\Documents\\myfilename.csv', index=False)
The file should be saved in the current working directory.
import os
cwd = os.getcwd()
print(cwd)
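A quick way to confirm this: save with a bare file name and look for the result under os.getcwd(). In the sketch, a temp directory stands in for wherever Jupyter was started (an assumption for illustration), and the one-row DataFrame is dummy data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the folder Jupyter was launched from
os.chdir(tempfile.mkdtemp())

df = pd.DataFrame({'title': ['a'], 'link': ['b'], 'summary': ['c']})
df.to_csv('myfilename.csv', index=False)  # relative name, no folder given

# The file lands in the current working directory
saved_to = os.path.join(os.getcwd(), 'myfilename.csv')
print(saved_to, os.path.exists(saved_to))
```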
1. Change the last line of your code to:
df.to_csv('C://myfilename.csv', index=False)
Now your file is saved on the C drive.
You can change the path as you wish, for example:
df.to_csv('C://Folder//myfilename.csv', index=False)
2. Alternatively, if you want to find where your file is stored:
import os
print(os.getcwd())
This prints the current working directory, which is where files saved with a relative path are stored.
You can also change the working directory at the beginning of your code:
import os
os.chdir("path_to_folder")
In that case there is no need to specify a path when saving to CSV.
You can write a function that saves the file and returns a boolean indicating whether it was written:
import os
def save_data(path, file, df):
    full_path = os.path.join(path, file + '.csv')
    df.to_csv(full_path, index=False)  # to_csv returns None when given a path,
    return os.path.exists(full_path)  # so check for the file itself instead
You still have to provide the right path, though.
Add this code to the bottom of your file:
import os
print(os.getcwd())
That's where your file is (when you save it with a relative path).
Try writing a simple file with a new script.
F = open(your_path_with_filename, 'w')
F.write("hello")
F.close()
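The same test can be written with pathlib, printing the resolved absolute location so you know exactly which folder to open in Explorer. The temp folder here is a placeholder assumption; substitute your own target directory.

```python
import tempfile
from pathlib import Path

# Hypothetical target folder; replace with the folder you expect the file in
target = Path(tempfile.mkdtemp()) / 'hello.txt'
target.write_text('hello')

# resolve() turns the path into an absolute one you can look up directly
print(target.resolve())
```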