I'm learning Python and can't seem to get pandas DataFrames to save. I'm not getting any errors; the file just doesn't appear in the folder.
I'm using a Windows 10 machine, Python 3, and Jupyter Notebook, and I'm saving to a local Google Drive folder.
Any ideas?
import feedparser
import pandas as pd
rawrss = [
'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
'https://www.yahoo.com/news/rss/',
'http://www.huffingtonpost.co.uk/feeds/index.xml',
'http://feeds.feedburner.com/TechCrunch/',
]
posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))
df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])  # pass data to init
df.to_csv('c:\\Users\\username\\Documents\\myfilename.csv', index=False)
The file should be saved in the current working directory.
import os
cwd = os.getcwd()
print(cwd)
In the last line of your code, change this:
df.to_csv('C://myfilename.csv', index=False)
Now your file is saved in the C drive.
You can change the path as you wish, e.g.:
df.to_csv('C://Folder//myfilename.csv', index=False)
2. Alternatively, if you want to locate where your file is stored:
import os
print(os.getcwd())
This gives you the directory where the files are stored.
You can also change your working directory as you wish.
Just add this at the beginning of your code:
import os
os.chdir("path_to_folder")
In that case there is no need to specify the path when saving to CSV.
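Put together, a minimal sketch of that flow might look like this (using a temporary folder as a stand-in for your own path):

```python
import os
import tempfile
import pandas as pd

# Stand-in for your own folder, e.g. r"C:\Users\username\Documents"
target = tempfile.mkdtemp()
os.chdir(target)

# With the working directory changed, a bare filename lands in `target`
pd.DataFrame({"title": ["example"]}).to_csv("myfilename.csv", index=False)
print(os.path.abspath("myfilename.csv"))  # full path of the saved file
```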
You can write a function that saves the file and returns a boolean, like the following. Note that df.to_csv() returns None when given a path, so you can't test its return value; wrap the call in try/except instead:
import os

def save_data(path, file, df):
    try:
        df.to_csv(os.path.join(path, file + '.csv'), index=False)
        return True
    except OSError:
        return False
But you have to provide the right path.
Add this code to the bottom of your file:
import os
print(os.getcwd())
That's where your file is.
Try writing a simple file with a new script:
with open(your_path_with_filename, 'w') as f:
    f.write("hello")
Related
I am trying to open a directory containing multiple JSON files in order to make a DataFrame from the data in each of them. I try this:
for file in os.listdir('Datasets/'):
    json_read = pd.read_json(file)
However, it gives me an error:
ValueError: Expected object or value
When I inspect the type of the files, it says they are of class str. Opening a single file in the directory with read_json does work correctly, as the file is recognized as JSON. I am not quite sure why the files are treated as strings, nor how to solve it. Do you have any tips?
Thanks in advance!
import os
import pandas as pd
base_dir = '/path/to/dir'
# Get all files in the directory
data_list = []
for file in os.listdir(base_dir):
    # If the file is JSON, construct its full path, open it, and append the data to the list
    if file.endswith('json'):
        json_path = os.path.join(base_dir, file)
        json_data = pd.read_json(json_path, lines=True)
        data_list.append(json_data)
print(data_list)
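If you want one combined DataFrame rather than a list, the collected pieces can be concatenated afterwards (a sketch, assuming the frames share the same columns):

```python
import pandas as pd

# Stand-ins for the per-file DataFrames collected in data_list above
data_list = [pd.DataFrame({"a": [1]}), pd.DataFrame({"a": [2]})]
combined = pd.concat(data_list, ignore_index=True)
print(combined)  # one frame with a fresh 0..n-1 index
```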
You probably need to build a list of DataFrames. You may not be able to process every file in the given directory, so try this:
import pandas as pd
from glob import glob
from os.path import join
BASEDIR = 'Datasets'
dataframes = []
for file in glob(join(BASEDIR, '*.json')):
    try:
        dataframes.append(pd.read_json(file))
    except ValueError:
        print(f'Unable to process {file}')
print(f'Successfully constructed {len(dataframes)} dataframes')
import os
import json
# Here's some information about getting the list of files in a folder: <https://www.geeksforgeeks.org/python-list-files-in-a-directory/>
# Here's some information about how to open a JSON file: <https://www.geeksforgeeks.org/read-json-file-using-python/>
path = "./data"
file_list = os.listdir(path)  # opens the directory at path and gets the file names
for i in range(len(file_list)):  # loop over the list by index
    current = open(path + "/" + file_list[i])  # opens the file at the current index
    data = json.load(current)  # loads the data from the file
    for k in data['01']:
        print(k)
Output:
main.json :
{'name': 'Nava', 'Surname': 'No need for to that'}
data.json :
{'name': 'Nava', 'watchs': 'Anime'}
Here's a link to run the code online: https://replit.com/#Nava10Y/opening-open-multiple-json-files-in-a-directory
I have 7 VCF files present in 2 directories. I want to concatenate all the files present in both folders and then read them through Python.
I am trying this code:
# Import modules
import os
import pandas as pd
import vcf

# Folder paths
path1 = "C://Users//USER//Desktop//Anas/VCFs_1/"
path2 = "C://Users//USER//Desktop//Anas/VCFs_2/"
#os.chdir(path1)

def read(f1, f2):
    reader = vcf.Reader(open(f1, f2))
    df = pd.DataFrame([vars(r) for r in reader])
    out = df.merge(pd.DataFrame(df.INFO.tolist()),
                   left_index=True, right_index=True)
    return out

# Read text file
def read_text_file(file_path1, file_path2):
    with open(file_path1, 'r') as f:
        with open(file_path2, 'r') as f:
            print(read(path1, path2))

# Iterate through all files
for file in os.listdir():
    # Check whether the file is a VCF
    if file.endswith(".vcf"):
        file_path1 = f"{path1}\{file}"
        file_path2 = f"{path2}\{file}"
        print(file_path1, "\n\n", file_path2)
        # Call the read-text-file function
        #data = read_text_file(path1,path2)
        print(read_text_file(path1, path2))
But it's giving me a permission error. I know we get this error when we try to read folders instead of files, but how can I read the files present in the folders? Any suggestions?
You may need to run your Python code with administrator privileges if you are trying to access another user's files.
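A more likely cause here, though, is that read_text_file(path1, path2) passes the directory paths themselves to open(), which raises a PermissionError on Windows. A sketch that collects the individual .vcf files from both folders and concatenates them with pandas (an assumption: it treats the VCF bodies as plain tab-separated text and skips all '#' header lines):

```python
import glob
import os
import pandas as pd

def concat_vcfs(folders):
    """Concatenate every .vcf file found in the given folders into one DataFrame."""
    frames = []
    for folder in folders:
        for vcf_file in sorted(glob.glob(os.path.join(folder, "*.vcf"))):
            # comment='#' skips both the '##' meta lines and the '#CHROM' header
            frames.append(pd.read_csv(vcf_file, sep="\t", comment="#", header=None))
    return pd.concat(frames, ignore_index=True)
```

Called as concat_vcfs([path1, path2]), this opens files rather than directories, which avoids the permission error.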
I am interested in getting this script to open an Excel file and save it again as a .csv or .txt file. I'm pretty sure the problem with this is the iteration: I haven't coded it correctly to iterate over the contents of the folder. I am new to Python, and I managed to get this code to successfully print a copy of the contents of the items in the folder via the commented-out part. Can someone please advise what needs to be fixed?
My error is: raise XLRDError('Unsupported format, or corrupt file: ' + msg)
from xlrd import open_workbook
from xlutils.copy import copy
import csv
import glob
import os
import openpyxl

cwd = os.getcwd()
print(cwd)
FileList = glob.glob('*.xlsx')
#print(FileList)
for i in FileList:
    rb = open_workbook(i)
    wb = copy(rb)
    wb.save('new_document.csv')
I would just use:
import pandas as pd
import glob
import os
file_list = glob.glob('*.xlsx')
for file in file_list:
    filename = os.path.split(file)[1]
    pd.read_excel(file).to_csv(filename.replace('xlsx', 'csv'), index=False)
It appears that your error is related to the Excel files themselves, not to your code.
- Check that your files aren't also open in Excel at the same time.
- Check that your files aren't encrypted.
- Check that your version of xlrd supports the files you are reading.
Check in that order; any of the above could have caused your error.
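One concrete case of the last point: xlrd 2.0 and later only read the legacy .xls format, so .xlsx files raise exactly this kind of error. A sketch of a converter that sidesteps xlrd by letting pandas use the openpyxl engine (the file names are placeholders):

```python
import pandas as pd

def xlsx_to_csv(xlsx_path, csv_path):
    # xlrd >= 2.0 dropped .xlsx support; openpyxl handles the modern format
    pd.read_excel(xlsx_path, engine="openpyxl").to_csv(csv_path, index=False)
```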
I can read a CSV with a relative path using the code below.
import pandas as pd
file_path = './Data Set/part-0000.csv'
df = pd.read_csv(file_path)
But when there are multiple files I am using glob, and the resulting file paths mix forward and backward slashes, so I am unable to read the files due to the wrong paths.
allPaths = glob.glob(path)
For path = "./Data Set/UserIdToUrl/*", the file paths look like:
"./Data Set/UserIdToUrl\\part-0000.csv"
For path = ".\\Data Set\\UserIdToUrl\\*", the file paths look like:
".\\Data Set\\UserIdToUrl\\part-0000.csv"
If I use
normalPath = os.path.normpath(path)
then normalPath is missing the relative ./ or .\\, like below:
'Data Set\UserIdToUrl\part-00000.csv'
Either of the below could work; what is the best way to do it so that it works on both Windows and Linux?
".\\Data Set\\UserIdToUrl\\part-0000.csv"
or
"./Data Set/UserIdToUrl/part-0000.csv"
Please ask clarifying questions, if any. Thanks in advance for comments and answers.
More Info:
I guess the problem only occurs on Windows, not on Linux.
Below is the shortest program that shows the issue. Assume there are files matching './Data Set/UserIdToUrl/*'; the path is correct, as I can read a file when providing its path directly to pd.read_csv('./Data Set/UserIdToUrl/filename.csv').
import os
import glob
import pandas as pd
path = "./Data Set/UserIdToUrl/*"
allFiles = glob.glob(path)
np_array_list = []
for file_ in allFiles:
    normalPath = os.path.normpath(file_)
    print(file_)
    print(normalPath)
    df = pd.read_csv(file_, index_col=None, header=0)
    np_array_list.append(df.to_numpy())  # as_matrix() was removed in newer pandas
Update 2
I just googled the glob library. Its description says 'glob — Unix style pathname pattern expansion'. I guess I need some utility function that works on both Unix and Windows.
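Since glob mirrors the OS separator, one portable option is pathlib, which accepts mixed separators on Windows and yields Path objects that pandas can read directly (a sketch; the folder name is the one from the question):

```python
from pathlib import Path
import pandas as pd

base = Path("./Data Set/UserIdToUrl")
for csv_path in base.glob("*.csv"):
    # Path objects use the right separator per OS, and pandas accepts them as-is
    df = pd.read_csv(csv_path)
    print(csv_path.as_posix())  # forward-slash form, identical on both OSes
```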
You can use abspath. Note that os.listdir returns bare file names, so join them back onto the directory:
import os
import pandas

base = os.path.abspath('./Data Set/')
for file in os.listdir(base):
    if file.endswith('.csv'):
        df = pandas.read_csv(os.path.join(base, file))
Try this:
import pandas as pd
from pathlib import Path
dir_path = 'Data Set'
datas = []
for p in Path(dir_path).rglob('*.csv'):
    df = pd.read_csv(p)
    datas.append(df)
I want to go into my current folder, edit whatever files are in the folder by writing the string "test" into cell (0,0), and then save the result as book1.xlsx, but my code is giving me an error. Can anyone help?
import xlrd
import os
import glob
from xlutils.copy import copy
fileDir = os.getcwd()
fileLocation = glob.glob("*.xlsx")
x = copy(fileLocation)
x.get_sheet(0).write(0,0,"test")
x.save('book1.xlsx')
glob.glob("*.xlsx") returns a list, so I think the error is in the copy statement. copy() also expects an opened xlrd workbook rather than a file name.
Try:
x = copy(xlrd.open_workbook(fileLocation[0]))
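One caveat: xlrd (and therefore xlutils) only understands the legacy .xls format, so with current xlrd even opening the workbook first will fail on .xlsx input. For .xlsx files, openpyxl can open, edit, and re-save the workbook directly; a sketch (the glob pattern and output name mirror the question):

```python
import glob
import openpyxl

def stamp_first_cell(pattern="*.xlsx", out="book1.xlsx", text="test"):
    # Write `text` into the top-left cell of each matching workbook
    for path in glob.glob(pattern):
        wb = openpyxl.load_workbook(path)
        wb.active.cell(row=1, column=1, value=text)  # openpyxl cells are 1-indexed
        wb.save(out)
```

As in the original loop, every match saves to the same book1.xlsx, so only the last match survives.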