How to load a file in Pandas from a parent directory - python

I've not used Pandas in a while but wanted to load a JSON file.
I've traditionally had an overarching directory (on Mac) called DataAnalysis and stored all the data I've collected in folders describing what they contain.
I then created a folder called IPythonnotebooks in which I kept my scripts.
Loading a file (let's call it 'dummy.json') was trivial. It's in a folder called dummy.
The code was simple:
import pandas as pd
df = pd.read_json('../dummy/dummy.json')
That doesn't work any more. What have I got wrong?
Update:
DataAnalysis
---dummy
----dummy.json
---IPythonnotebooks
----dummy.pynb
Apologies if this is not the correct way to present file structure. I start up the notebook file in the folder IPythonnotebooks

Sometimes JupyterLab (and possibly other notebook environments) starts with a different current working directory than you might expect. Check it with:
import os
os.getcwd()
Check whether '../dummy/dummy.json' makes sense relative to that directory, or check whether this works:
import os
import pandas as pd
fullpath = os.path.realpath("../dummy/dummy.json")
df = pd.read_json(fullpath)
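The check suggested above can be sketched as follows. This is a minimal illustration that assumes the folder layout from the question; the helper name resolve_from_cwd is hypothetical.

```python
from pathlib import Path


def resolve_from_cwd(relative_path):
    """Return the absolute path that a relative path points to from the
    current working directory, plus whether a file exists there."""
    target = (Path.cwd() / relative_path).resolve()
    return target, target.is_file()


# '../dummy/dummy.json' is the path from the question.
target, found = resolve_from_cwd("../dummy/dummy.json")
print("Working directory:", Path.cwd())
print("Resolves to:", target)
print("File found:", found)
```

If the resolved path is not the one you expected, the notebook was probably started from a different folder than IPythonnotebooks.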

Related

How can I import a csv from another folder in python?

I have a Python script and want to import a CSV from another folder. How can I do this? (For example, my .py file is in one folder and I want to reach data on the desktop.)
First of all, you need to understand how relative and absolute paths work.
Here is an example using relative paths. I have two folders on the desktop: scripts, which contains the Python files, and csvs, which contains the CSV files. So, the code would be:
df = pd.read_csv('../csvs/file.csv')
The path means:
.. (the parent folder; in this case, the desktop folder).
/csvs (csvs folder).
/file.csv (the csv file).
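To see how the .. segment is resolved, the sketch below joins the relative path onto a working directory and normalises the result. The folder /home/user/Desktop/scripts is a made-up example, and it is assumed to be the current working directory when the script runs. posixpath is used so the result looks the same on any operating system.

```python
import posixpath

# Hypothetical working directory (the scripts folder on the desktop).
working_dir = "/home/user/Desktop/scripts"

# Join the relative path onto it, then collapse the '..' segment.
joined = posixpath.join(working_dir, "../csvs/file.csv")
resolved = posixpath.normpath(joined)

print(joined)    # /home/user/Desktop/scripts/../csvs/file.csv
print(resolved)  # /home/user/Desktop/csvs/file.csv
```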
If you are on Windows:
Right-click on the file on your desktop, and go to its properties.
You should see a Location: tag that has a structure similar to this: C:\Users\<user_name>\Desktop
Then you can define the file path as a variable in Python as:
file_path = r'C:\Users\<your_user_name>\Desktop\<your_file_name>.csv'
To read it:
df = pd.read_csv(file_path)
Where possible, prefer relative paths over absolute paths like this in your code. Investing some time in learning the pathlib module would help a lot.
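As a small taste of pathlib, the sketch below builds a Desktop path without typing a user name into the code. It assumes the file sits directly in a Desktop folder under the current user's home directory, and file.csv is a placeholder name.

```python
from pathlib import Path

# Path.home() is the current user's home folder, so the user name
# does not have to be hard-coded.
file_path = Path.home() / "Desktop" / "file.csv"

print(file_path)
print("Exists:", file_path.exists())

# pandas accepts Path objects directly, e.g. pd.read_csv(file_path)
```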

Python: Excel file (xlsx) export with variable as the file path, using pandas

I defined an .xlsx file path as the variable output:
print(output)
r'C:\Users\Kev\Documents\Python code.xlsx'
I want to export a pandas dataframe as an .xlsx file, but need to have the file path as the output variable.
I can get the code to work with the file path. I've tried about a dozen ways (copying and/or piecing code together from documentation, stack overflow, blogs, etc.) and getting a variety of errors. None worked. Here is one that worked with the file path:
df = pd.DataFrame(file_list)
df.to_excel(r'C:\Users\Kev\Documents\Python code.xlsx', index=False)
I would want something like:
df.to_excel(output, index=False)
In any form or package, as long as it produces the same xlsx file and won’t need to be edited to change the file path and name (that would be done where the variable output is defined).
I've attempted several iterations from the XlsxWriter site, the openpyxl site, the pandas site, etc. (with the appropriate Python packages). Working in Jupyter Notebook, Python 3.8.
Any resources, packages, or code that will help me to use a variable in place of a file path for an xlsx export from a pandas dataframe?
Why I want it like this is a long story, but basically I'll have several places at the top of the code where myself and other (inexperienced) coders can quickly put file paths in and search for keywords (rather than hunt through code to find where to replace paths). The data itself is file paths that I'll iteratively search through (this is the beginning of a larger project).
Try writing the path this way:
output = "C://Users//Kev//Documents//Python code.xlsx"
df.to_excel(output, index=False)
This has always worked for me.
Or you can also do it like this:
output = "C://Users//Kev//Documents//"
df.to_excel(output + "Python code.xlsx", index=False)
The os module is probably the most useful here:
from os import path
output = path.abspath("your_excel_file.xlsx")
print(output)
This returns the current working directory joined with the file name you passed to abspath. For those wondering why some people write backslashes ("\") rather than forward slashes ("/") in file paths, see the Stack Overflow question "So what IS the right direction of the path's slash (/ or \) under Windows?".
You can use f-strings in Python 3:
import pandas as pd
df = pd.DataFrame({"a": ["b"], "c": ["d"]})
file_name = "filename.xlsx"
df.to_excel(f"/your/path/to/file/{file_name}", index=False)
Assuming that OP's dataframe is df, that OP is using Windows and wants to store the file in the Desktop, OP's username is cowboykevin05, and the filename that one wants is 0001.xlsx, one can use os.path as follows
from os import path
df.to_excel(path.join('C:\\Users\\cowboykevin05\\Desktop', '0001.xlsx'), index=False)
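One thing that may be worth checking, assuming the print(output) result in the question is shown verbatim: print() does not display quotes or an r prefix for an ordinary string, so seeing them suggests those characters are stored inside the string itself. The sketch below illustrates the idea; the helper name strip_literal_markers is hypothetical.

```python
def strip_literal_markers(text):
    """Remove a leading r prefix and surrounding quotes that were stored
    inside the string by mistake. Other strings only lose surrounding
    whitespace."""
    cleaned = text.strip()
    if (len(cleaned) >= 3 and cleaned[0] in "rR"
            and cleaned[1] in "'\"" and cleaned[-1] == cleaned[1]):
        return cleaned[2:-1]
    if len(cleaned) >= 2 and cleaned[0] in "'\"" and cleaned[-1] == cleaned[0]:
        return cleaned[1:-1]
    return cleaned


# A string whose *contents* include the r and the quotes.
stored = "r'C:\\Users\\Kev\\Documents\\Python code.xlsx'"
print(stored)  # r'C:\Users\Kev\Documents\Python code.xlsx'

output = strip_literal_markers(stored)
print(output)  # C:\Users\Kev\Documents\Python code.xlsx

# df.to_excel(output, index=False) would then receive a plain path.
```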

How do you know where a csv file is stored once you write a DataFrame onto disk?

import pandas as pd
hand_1=pd.DataFrame({
'Tables of 5':[5,10,15,20,25],
'Tables of 6':[6,12,18,24,30]})
hand_1.to_csv('Tables.csv')
How do I find out where Tables.csv is stored?
Is this where Python stores CSV files by default, and can this be changed?
It will be saved in your current working directory. To find out which directory that is, you can use the following code:
import os
current_directory = os.getcwd()
print(current_directory)
You can give a full path instead of 'Tables.csv' to store the file in another directory.
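The same idea can be sketched with pathlib, which shows the full location a bare file name maps to. The exports folder under the home directory is a made-up example, not something from the question.

```python
from pathlib import Path

# A bare file name is interpreted relative to the current working directory.
relative_name = Path("Tables.csv")
print("Would be written to:", relative_name.resolve())

# To choose the location explicitly, build a full path instead.
# 'exports' is a hypothetical folder name.
explicit_path = Path.home() / "exports" / "Tables.csv"
print("Explicit target:", explicit_path)

# hand_1.to_csv(explicit_path) would write there, provided the folder exists.
```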

Import file into python

I'm new to python and trying to learn a few exercises via colab. I want to import a CSV file that I saved to my desktop. Unfortunately, I keep getting a "cannot find file" error message. Not sure what I'm doing wrong.
Here's my code:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
mpg = pd.read_csv(r"C:\Users\micha\OneDrive\Desktop\mpg2018.csv.csv")
I tried to change csv.csv to csv.txt or leave as just .csv but nothing works. Any help would be great!
Forward slashes will work in this function:
mpg = pd.read_csv("C:/Users/micha/OneDrive/Desktop/filename.csv")
Since you've imported os, you could also use os.path.join():
p = os.path.join("C:\\", "Users", "micha", "OneDrive", "Desktop", "a.csv")
mpg = pd.read_csv(p)
Also, repeating the file extension within the name seems unnecessary and may lead to more confusion.
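When the exact file name is in doubt, as with mpg2018.csv.csv above, it can help to list what is actually in the folder. The sketch below assumes the code runs on the same machine as the file (which is not the case in Colab), and the helper name find_similar_files is hypothetical.

```python
from pathlib import Path


def find_similar_files(path_text):
    """Report whether the path exists and, if it does not, list files in
    the same folder whose names start with the same stem."""
    path = Path(path_text)
    if path.is_file():
        return True, [path.name]
    folder = path.parent
    if not folder.is_dir():
        return False, []
    # Text before the first dot: 'mpg2018' for 'mpg2018.csv.csv'.
    stem = path.name.split(".")[0]
    matches = sorted(p.name for p in folder.iterdir() if p.name.startswith(stem))
    return False, matches


exists, names = find_similar_files("mpg2018.csv")
print("Exists:", exists)
print("Similar names in that folder:", names)
```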
You are using colab.research.google.com, which runs in the cloud and has no access to the files on your personal machine. However, if you run
from google.colab import files
files.upload()
it will open a dialog box that lets you find and upload the file in the usual way.
Alternatively, in the Files section of Colab, click the Upload button and add your file.
Then simply read it with pd.read_csv(path).
To find the path: right-click the CSV file in the Files section, select "Copy path", and paste it in place of path in pd.read_csv().

cannot write file with full path in Python

This problem has been solved before (cannot write file with full path in Python). However, I followed the advice in that answer and it didn't work, which is why I'm posting this.
I'm trying to access a csv file to load into the pandas dataframe.
import os
output_path = os.path.join('Desktop/My_project_folder', 'train.csv')
This is returning:
IOError: File Desktop/My_project_folder/train.csv does not exist
edit: I don't understand because the train.csv file exists in my project folder.
The os.path.join() function is platform-agnostic, meaning it works across operating systems (Windows, Mac, Linux) without you having to separate directories and subdirectories with forward or back slashes yourself. Hence, simply pass the directory and file name as separate arguments:
myDir = '/path/to/Desktop/My_project_folder'
output_path = os.path.join(myDir, 'train.csv')
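A relative path such as 'Desktop/My_project_folder' is looked up from the current working directory, which may explain the error in the question. The following is a minimal sketch that builds the path from the home directory instead, assuming the project folder sits on the current user's Desktop.

```python
import os

# os.path.expanduser('~') returns the current user's home folder.
home = os.path.expanduser("~")
my_dir = os.path.join(home, "Desktop", "My_project_folder")
output_path = os.path.join(my_dir, "train.csv")

print(output_path)
print("Exists:", os.path.exists(output_path))
```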
However, if the Python script resides in the same directory as the data, have the script detect its own path and then load the data frame into pandas, which avoids hard-coding whole path names:
import os
import pandas as pd
# Directory that contains this script
cd = os.path.dirname(os.path.abspath(__file__))
traindf = pd.read_csv(os.path.join(cd, 'train.csv'))
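The same approach can be sketched with pathlib. Note that __file__ is defined when code runs as a script but not in a notebook or interactive session, so the hypothetical helper below takes the script path as an argument instead of reading __file__ directly.

```python
from pathlib import Path


def path_next_to(script_file, file_name):
    """Return the path of file_name in the same folder as script_file."""
    return Path(script_file).resolve().parent / file_name


# Inside a script, pass __file__; the path below is a made-up stand-in.
print(path_next_to("/path/to/project/analysis.py", "train.csv"))
```

In a real script this would be used as pd.read_csv(path_next_to(__file__, 'train.csv')).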
