I have hundreds of similar JSON files and I want to save the contents of these JSON files into one single CSV file. This is the code I wrote for that, but it's not doing what I want.
Desired output is csv file: https://drive.google.com/file/d/1cgwdbnvETLf6nO1tNnH0F_-fLxUOdT7L/view?usp=sharing
Please tell me what can be done to get the above output? Thanks
JSON file format: https://drive.google.com/file/d/1-OZYrfUtDJmwcRUjpBgn59zJt5MjtmWt/view?usp=sharing
list_ = ['politifact13565', 'politifact13601']
for i in list_:
    with open("{}/news content.json".format(i)) as json_input:
        json_data = json.load(json_input, strict=False)
        mydict = {}
        mydict["url"] = json_data["url"]
        mydict["text"] = json_data["text"]
        mydict["images"] = json_data["images"]
        mydict["title"] = json_data["title"]
        df = pd.DataFrame.from_dict(mydict, orient='index')
        df = df.T
        df.append(df, ignore_index=True)
        df.to_csv('out.csv')
        print(df)
SOLVED:
list_ = ['politifact13565', 'politifact13601']
for i in list_:
    with open("{}/news content.json".format(i)) as json_input:
        json_data = json.load(json_input, strict=False)
        mydict = {}
        mydict["url"] = json_data["url"]
        mydict["text"] = json_data["text"]
        mydict["images"] = json_data["images"]
        mydict["title"] = json_data["title"]
        df = pd.DataFrame.from_dict(mydict, orient='index')
        df = df.T
        df.append(df, ignore_index=True)
        df.to_csv('out.csv', mode='a', header=False)
        print(df)
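One thing to watch with mode='a' and header=False: the resulting CSV never gets a header row at all. A small sketch of the usual workaround (not from the original post; file and column names are made up) writes the header only on the first append:

```python
import os
import pandas as pd

def append_row(row, path):
    # Write the header only if the file does not exist yet, so the
    # CSV ends up with exactly one header row.
    pd.DataFrame([row]).to_csv(path, mode="a",
                               header=not os.path.exists(path), index=False)

append_row({"url": "u1", "text": "t1"}, "out.csv")
append_row({"url": "u2", "text": "t2"}, "out.csv")
print(pd.read_csv("out.csv").shape)  # → (2, 2)
```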
Your solution is quite close to the desired output, you just need to transpose the imported json:
import glob
directory = "your/path/to/jsons/*.json"
df = pd.concat([pd.read_json(f, orient="index").T for f in glob.glob(directory)], ignore_index=True)
Afterwards you can save the df using df.to_csv("tweets.csv")
Hopefully that helps you!
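To see what orient="index" plus .T does, here is a minimal sketch with two made-up files (list-valued fields like "images" are left out to keep it simple):

```python
import glob
import json
import pandas as pd

# Two tiny stand-ins for the per-article "news content.json" files
# (file names and values are made up for the demo).
for name, url in [("demo_a.json", "http://x"), ("demo_b.json", "http://y")]:
    with open(name, "w") as f:
        json.dump({"url": url, "title": "a title", "text": "some text"}, f)

# orient="index" reads each top-level key as a row label; .T turns the
# labels back into columns, so every file becomes one row.
df = pd.concat(
    [pd.read_json(f, orient="index").T for f in sorted(glob.glob("demo_*.json"))],
    ignore_index=True,
)
print(df.shape)  # → (2, 3)
```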
In my code, the csv writer is writing some unexpected values to the CSV file.
My goal is to read all CSV files in one directory, filter on a specific column, and write the filtered dataframe to a consolidated CSV file.
I am able to get the required output in the VS Code console, but I am not able to write it into a CSV file.
Kindly help me understand what I am doing incorrectly.
This is my sample input:
And this is the output I am getting:
Code:
import pandas as pd
import os
import glob
import csv
from pandas.errors import EmptyDataError

# use glob to get all the csv files in the folder
path = os.getcwd()
#print(path)
csv_files = glob.glob(os.path.join(path, "*.csv"))
print(csv_files)

col_name = input("Enter the column name to filter: ")
print(col_name)
State_Input = input("Enter the {} ".format(col_name))
print(State_Input)

df_empty = pd.DataFrame()
for i in csv_files:
    try:
        df = pd.read_csv(i)
        #print(df.head(5))
        State_Filter = df["State"] == State_Input
        print(df[State_Filter])
        df_child = df[State_Filter]
        with open('D:\\PythonProjects\\File-Split-Script\\temp\\output\\csv_fil111.csv', 'w') as csvfile:
            data_writer = csv.writer(csvfile, dialect='excel')
            for row in df_child:
                data_writer.writerows(row)
    except EmptyDataError as e:
        print('There was an error in your input, please try again :{0}'.format(e))
Use DataFrame.to_csv to write your file in one go. Prefer storing your filtered dataframes in a list, then concatenate them all into a new dataframe:
import pandas as pd
import pathlib

data_dir = pathlib.Path.cwd()

# Your input here
state = input('Enter the state: ')  # Gujarat, Bihar, ...
print(state)

data = []
for csvfile in data_dir.glob('*.csv'):
    df = pd.read_csv(csvfile)
    df = df.loc[df['State'] == state]
    data.append(df)

df = pd.concat(data, ignore_index=True)
df.to_csv('output.csv', index=False)
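The filter-then-concat pattern can be checked with in-memory frames (the states and values below are made up):

```python
import pandas as pd

# Stand-ins for two CSV files already loaded into DataFrames.
df1 = pd.DataFrame({"State": ["Gujarat", "Bihar"], "Value": [1, 2]})
df2 = pd.DataFrame({"State": ["Gujarat", "Kerala"], "Value": [3, 4]})

# Filter each frame, collect the pieces, then stack them row-wise.
# axis=0 (the default) stacks rows; axis=1 would place frames side by side.
data = [df.loc[df["State"] == "Gujarat"] for df in (df1, df2)]
out = pd.concat(data, ignore_index=True)
print(out["Value"].tolist())  # → [1, 3]
```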
I'm trying to write a read/write function for a CSV, but it doesn't return any value.
I'm reading from a CSV, replacing the ";" in the second column with " ", and then saving the processed CSV.
But for some reason it doesn't save my CSV; is my function wrong?
I'm starting out in the Python world, and I'm having a bit of trouble.
import pandas as pd

header_col = ['col0','col1','col2','col3','col4','col5','col6','col7','col8','col9']
df = pd.read_csv('myfile_<date>.csv', encoding="ISO-8859-1", sep=';', names=header_col, header=None)

def file_load(df):
    df['col1'] = df['col1'].str.replace(';', ' ')
    df.drop(columns=['col8'], inplace=True)
    df.drop(columns=['col9'], inplace=True)
    return df

def save_file(dataframe):
    df = dataframe
    df.to_csv('myfile_<date>_treat.csv', sep=';', encoding='utf-8', index=False)
import pandas as pd

def file_load(df):
    df['col1'] = df['col1'].str.replace(';', ' ')
    df.drop(columns=['col8'], inplace=True)
    df.drop(columns=['col9'], inplace=True)
    return df

def save_file(dataframe):
    df = dataframe
    df.to_csv('myfile_<date>_treat.csv', sep=',', encoding='utf-8',
              index=False)

def main():
    header_col = ['col0','col1','col2','col3','col4','col5','col6','col7','col8','col9']
    df = pd.read_csv('myfile_<date>.csv', encoding="ISO-8859-1", sep=';',
                     names=header_col, header=None)
    df1 = file_load(df)
    save_file(df1)

if __name__ == '__main__':
    main()
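The key point is the difference between str(series).replace(...), which stringifies the whole column at once, and the vectorized Series.str.replace, which operates element-wise (toy data below):

```python
import pandas as pd

s = pd.Series(["a;b", "c;d"])

# Element-wise replacement on each string in the column.
print(s.str.replace(";", " ").tolist())  # → ['a b', 'c d']

# str(s) renders the whole Series (index included) as one string,
# so replacing on it returns a plain str, not a usable column.
print(type(str(s).replace(";", " ")))  # → <class 'str'>
```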
I am trying to parse a list of .txt files within a zip archive, but it's only parsing one file from that list.
Code:
def custom_parse(self, response):
    self.logger.info(response.url)
    links = response.xpath("//a[contains(@href, '.zip')]/@href").getall()
    for link in list(set(links)):
        print(link)
        local_path = self.download_file("https://www.sec.gov" + link)
        zip_file = zipfile.ZipFile(local_path)
        zip_csv_files = [file_name for file_name in zip_file.namelist() if file_name.endswith(".txt") and "pre" not in file_name]
        zip_csv_file = zip_csv_files[0]
        with zip_file.open(zip_csv_file, "r") as zip:
            # df = pd.read_csv(BytesIO(zip.read()), dtype=object)
            df = pd.read_csv(zip, dtype=object, header=None, sep='delimiter')
            df = self.standardized(df)
            for k, row in df.iterrows():
                yield dict(row)

def standardized(self, df):
    # df.columns = [col.lower().strip().replace(" ", "_") for col in df.columns]
    df = df.fillna('')
    return df
I am going to assume it's due to zip_csv_file = zip_csv_files[0] but I am unsure how I can modify my current code to parse all the .txt files in a given zip folder.
You already pull out all the .txt files with your list comprehension, so just read those in a loop and concatenate them. This is untested, but should be close.
Replace the appropriate section of your code with this:
UPDATE:
zip_file = zipfile.ZipFile(local_path)
text_files = zip_file.infolist()
df_list = []
for file_name in text_files:
    if file_name.filename.endswith(".txt") and "pre" not in file_name.filename:
        df_list.append(pd.read_csv(zip_file.open(file_name.filename), dtype=object, header=None, sep='delimiter'))
df = pd.concat(df_list)
df = self.standardized(df)
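A self-contained version of that loop, using an in-memory zip with made-up member names (and a plain "|" separator instead of the sep='delimiter' trick):

```python
import io
import zipfile
import pandas as pd

# Build a small in-memory zip with two .txt members and one that
# should be skipped (names and contents are made up).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("one.txt", "a|1\nb|2\n")
    zf.writestr("two.txt", "c|3\n")
    zf.writestr("pre_skip.txt", "x|9\n")

zip_file = zipfile.ZipFile(buf)
df_list = []
for info in zip_file.infolist():
    if info.filename.endswith(".txt") and "pre" not in info.filename:
        # ZipFile.open() returns a file-like object pandas can read directly.
        df_list.append(pd.read_csv(zip_file.open(info.filename),
                                   header=None, sep="|", dtype=object))

df = pd.concat(df_list, ignore_index=True)
print(len(df))  # → 3
```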
Hi, I am working on a CSV file and I have data I want to append to it. But first I want to check whether the CSV file exists: if it does, open it, append the data, and save it; if not, create a DataFrame with the data and save it.
Note: I already have a CSV file, and I want to append this sample of data to it.
Thanks in advance.
Here is my attempt:
# sample of data
ID = 5
img_Latitude = 38786454
img_Longitude = 1118468
meta_lat = 45778
meta_long = 886556

# create a function
def create_csv(ID, img_Latitude, img_Longitude, meta_lat, meta_long):
    # check if the file exists; if True
    if os.path.isfile('C:/My/Path/compare_coordinates.csv'):
        # read the csv file
        df = pd.read_csv('compare_coordinates.csv')
        # make a pd.Series
        data = pd.Series([ID, img_Latitude, img_Longitude, meta_lat, meta_long],
                         index=['ID', 'img_Latitude', 'img_Longitude', 'meta_lat', 'meta_long'])
        # append the data to df
        df.append(data, ignore_index=True)
    else:
        data = [ID, img_Latitude, img_Longitude, meta_lat, meta_long]
        columns = ['ID', 'img_Latitude', 'img_Longitude', 'meta_lat', 'meta_long']
        df = pd.DataFrame(data, columns).T
    df.to_csv('C:/My/Path/compare_coordinates.csv', index=False)
The line df.append(data, ignore_index=True) needs to be:
df = df.append(data, ignore_index=True)
This is because DataFrame.append returns a new DataFrame with the appended lines; it does not append in-place:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
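Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions the same row append is spelled with pd.concat (values below are made up):

```python
import pandas as pd

df = pd.DataFrame({"ID": [1], "meta_lat": [45778]})
row = {"ID": 5, "meta_lat": 45778}

# pd.concat replaces df.append: wrap the new row in a one-row frame.
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
print(df["ID"].tolist())  # → [1, 5]
```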
The returned value needs to be saved in a variable, so the line df.append(data, ignore_index=True) should be edited to df = df.append(data, ignore_index=True). With the file-existence check, the code looks like this:
def create_csv(ID, img_Latitude, img_Longitude, meta_lat, meta_long):
    Path = os.path.isfile('My/path/compare_coordinates1.csv')
    if Path:
        df = pd.read_csv('compare_coordinates1.csv')
        data = pd.Series([ID, img_Latitude, img_Longitude, meta_lat, meta_long],
                         index=['ID', 'img_Latitude', 'img_Longitude', 'meta_lat', 'meta_long'])
        df = df.append(data, ignore_index=True)
    else:
        data = [ID, img_Latitude, img_Longitude, meta_lat, meta_long]
        columns = ['ID', 'img_Latitude', 'img_Longitude', 'meta_lat', 'meta_long']
        df = pd.DataFrame(data, columns).T
    df.to_csv('My/path/compare_coordinates1.csv', index=False)
I am reading in multiple files and adding them to a list:
import pandas as pd
import glob
import ntpath

path = r'C:\Folder1\Folder2\Folder3\Folder3'
all_files = glob.glob(path + "/*.dat")  # .dat files only

mylist = []
for filename in all_files:
    name = ntpath.basename(filename)  # for renaming the DF
    name = name.replace('.dat', '')   # remove extension
    try:
        name = pd.read_csv(filename, sep='\t', engine='python')
        mylist.append(name)
    except:
        print(f'File not read:(unknown)')
Now I want to just display the DFs in this list.
This is what I've tried:
for thing in mylist:
    print(thing.name)
AttributeError: 'DataFrame' object has no attribute 'name'
And
for item in mylist:
    print(item)
But that just prints the whole DF content.
name = pd.read_csv(filename, sep='\t', engine='python')
mylist.append(name)
Here, name is a dataframe, not the name of your dataframe.
To add name to your dataframe, use
df = pd.read_csv(filename, sep='\t', engine='python')
df_name="Sample name"
mylist.append({'data':df, 'name':df_name})
>>> print(thing['name'])
Sample name
You can use a dictionary for that.
Writing to dict:
import pandas as pd
import glob
import ntpath

path = r'C:\Folder1\Folder2\Folder3\Folder3'
all_files = glob.glob(path + "/*.dat")  # .dat files only

mydict = {}
for filename in all_files:
    name = ntpath.basename(filename)  # for renaming the DF
    name = name.replace('.dat', '')   # remove extension
    try:
        mydict[name] = pd.read_csv(filename, sep='\t', engine='python')
    except:
        print(f'File not read:(unknown)')
To read a df (say filename1) again:
df = mydict['filename1']
or to iterate over all df's in mydict:
for df in mydict.values():
    # use df...
or:
for key in mydict:
    print(key)
    df = mydict[key]
    # use df...
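For a runnable end-to-end check, here is the same idea with two throwaway files; pathlib.Path.stem is used as a variation on the ntpath/replace step, and the file names are made up:

```python
import pathlib
import pandas as pd

# Create two throwaway tab-separated files to stand in for the .dat files.
for name in ("run1", "run2"):
    pathlib.Path(f"{name}.dat").write_text("x\ty\n1\t2\n")

mydict = {}
for filename in pathlib.Path(".").glob("*.dat"):
    # Path.stem is the file name without its extension.
    mydict[filename.stem] = pd.read_csv(filename, sep="\t")

print(sorted(mydict))  # → ['run1', 'run2']
print(mydict["run1"].shape)  # → (1, 2)
```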