Reading the 2nd entry of a .json - python

I am trying to read the density entry of a list of arrays within a .json file. He's a small portion of the file from the beginning:
["time_tag","density","speed","temperature"],["2019-04-14 18:20:00.000","4.88","317.9","11495"],["2019-04-14 18:21:00.000","4.89","318.4","11111"]
This is the code I have thus far:
with open('plasma.json', 'r') as myfile:
data = myfile.read()
obj = json.loads(data)
print(str(obj['density']))
It should print everything under the density column but I'm getting an error saying that the file can't be opened

First, you json file is not correct. If you want to read it with a single call obj = json.load(data), the json file should be:
[["time_tag","density","speed","temperature"],["2019-04-14 18:20:00.000","4.88","317.9","11495"],["2019-04-14 18:21:00.000","4.89","318.4","11111"]]
Notice the extra square bracket, making it a single list of sublists.
This said, being obj a list of lists, there is no way print(str(obj['density'])) will work. You need to loop on the list to print what you want, or convert this to a dataframe before.
Looping directly
idx = obj[0].index('density') #get the index of the density entry
#from the first list in obj, the header
for row in obj[1:]: #looping on all sublists except the first one
print(row[idx]) #printing
Using a dataframe (pandas)
import pandas as pd
df = pd.DataFrame(obj[1:], columns=obj[0]) #converting to a dataframe, using
#first row as column header
print(df['density'])

Are you sure your data is a valid json and not a csv?
As the snippet of data provided above matches that of a csv file and not a json.
You will be able to read the density key of the csv with:
import csv
input_file = csv.DictReader(open("plasma.csv"))
for row in input_file:
print(row['density'])
Data formatted as csv
["time_tag","density","speed","temperature"]
["2019-04-14 18:20:00.000","4.88","317.9","11495"]
["2019-04-14 18:21:00.000","4.89","318.4","11111"]
Result
4.88
4.89

Related

pandas reading csv with one row spanning multiple lines

My csv starts out like this:
,index,spotify_id,artist_name,track_name,album_name,duration_ms,lyrics,lyrics_bert_embeddings
0,0,5Jk0vfT81ltt2rYyrWDzZ5,Hundred Waters,Xtalk - Kodak to Graph Remix,The Moon Rang Like a Bell,285327,not fetched,"[ 0.00722605 -0.23726921 0.15163635 -0.28774077 0.07081255 0.26606813
each row ends like this in a new line:
0.03439684 -0.29289168 0.13590978 0.2332756 -0.24305075 0.2034984 ]"
These values are from a big numpy array encoded with np.array2string() and span multiple lines in the csv.
When using pd.read_csv it throws an "ParserError: Error tokenizing data. C error: EOF inside string starting at row 90607". When using the parameter engine="python" it throws an "ParserError: unexpected end of data". When using the seperator sep='\t+' it just puts each line in a new row in the dataframe. When using csv.reader by using with open(file_path) and then iterating through each line, the same happens as with the sep='\t+'.
Is there a way to automatically append each row to the original row it belongs to or do I have to preprocess this by hand?
I tried to use your csv data to check it. I pasted the code along with the answer below,
import pandas as pd
import csv
data_path = 'dt.csv'
df = pd.read_csv(data_path, header = None, quoting=csv.QUOTE_NONE, encoding='utf-8')
dt_json = pd.DataFrame.to_json(df)
print(dt_json)
For an example, I just tried to change the data format from CSV to JSON using pandas dataframe.
{"0":{"0":null,"1":0.0},
"1":{"0":"index","1":"0"},
"2":{"0":"spotify_id","1":"5Jk0vfT81ltt2rYyrWDzZ5"},
"3":{"0":"artist_name","1":"Hundred Waters"},
"4":{"0":"track_name","1":"Xtalk - Kodak to Graph Remix"},
"5":{"0":"album_name","1":"The Moon Rang Like a Bell"},
"6":{"0":"duration_ms","1":"285327"},
"7":{"0":"lyrics","1":"not fetched"},
"8":{"0":"lyrics_bert_embeddings","1":"[ 0.00722605 -0.23726921 0.15163635 -0.28774077 0.07081255 0.26606813\r\n 0.03439684 -0.29289168 0.13590978 0.2332756 -0.24305075 0.2034984 ]"}}
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html?highlight=read_csv

how to edit a csv in python and add one row after the 2nd row that will have the same values in all columns except 1

I'm new in Python language and i'm facing a small challenge in which i havent been able to figure it out so far.
I receive a csv file with around 30-40 columns and 5-50 rows with various details in each cell. The 1st row of the csv has the title for each column and by the 2nd row i have item values.
What i want to do is to create a python script which will read the csv file and every time to do the following:
Add a row after the actual 1st item row, (literally after the 2nd row, cause the 1st row is titles), and in that new 3rd row to contain the same information like the above one with one difference only. in the column "item_subtotal" i want to add the value from the column "discount total".
all the bellow rows should remain as they are, and save this modified csv as a new file with the word "edited" added in the file name.
I could really use some help because so far i've only managed to open the csv file with a python script im developing, but im not able so far to add the contents of the above row to that newly created row and replace that specific value.
Looking forward any help.
Thank you
Here Im attaching the CSV with some values changed for privacy reasons.
order_id,order_number,date,status,shipping_total,shipping_tax_total,fee_total,fee_tax_total,tax_total,discount_total,order_total,refunded_total,order_currency,payment_method,shipping_method,customer_id,billing_first_name,billing_last_name,billing_company,billing_email,billing_phone,billing_address_1,billing_address_2,billing_postcode,billing_city,billing_state,billing_country,shipping_first_name,shipping_last_name,shipping_address_1,shipping_address_2,shipping_postcode,shipping_city,shipping_state,shipping_country,shipping_company,customer_note,item_id,item_product_id,item_name,item_sku,item_quantity,item_subtotal,item_subtotal_tax,item_total,item_total_tax,item_refunded,item_refunded_qty,item_meta,shipping_items,fee_items,tax_items,coupon_items,order_notes,download_permissions_granted,admin_custom_order_field:customer_type_5
15001_TEST_2,,"2017-10-09 18:53:12",processing,0,0.00,0.00,0.00,5.36,7.06,33.60,0.00,EUR,PayoneCw_PayPal,"0,00",0,name,surname,,name.surname#gmail.com,0123456789,"address 1",,41541_TEST,location,,DE,name,surname,address,01245212,14521,location,,DE,,,1328,302,"product title",103,1,35.29,6.71,28.24,5.36,0.00,0,,"id:1329|method_id:free_shipping:3|method_title:0,00|total:0.00",,id:1330|rate_id:1|code:DE-MWST-1|title:MwSt|total:5.36|compound:,"id:1331|code:#getgreengent|amount:7.06|description:Launchcoupon for friends","text string",1,
You can also use pandas to manipulate the data from the csv like this:
import pandas
import copy
Read the csv file into a pandas dataframe:
df = pandas.read_csv(filename)
Make a deepcopy of the first row of data and add the discount total to the item subtotal:
new_row = copy.deepcopy(df.loc[1])
new_row['item_subtotal'] += new_row['discount total']
Concatenate the first 2 rows with the new row and then everything after that:
df = pandas.concat([df.loc[:1], new_row, df.loc[2:]], ignore_index=True)
Change the filename and write the out the new csv file:
filename = filename.strip('.csv') + 'edited.csv'
df.to_csv(filename)
I hope this helps! Pandas is great for cleanly handling massive amounts of data, but may be overkill for what you are trying to do. Then again, maybe not. It would help to see an example data file.
The first step is to turn that .csv into something that is a little easier to work with. Fortunately, python has the 'csv' module which makes it easy to turn your .csv file into a much nicer list of lists. The below will give you a way to both turn your .csv into a list of lists and turn the modified data back into a .csv file.
import csv
import copy
def csv2list(ifile):
"""
ifile = the path of the csv to be converted into a list of lists
"""
f = open(ifile,'rb')
olist=[]
c = csv.reader(f, dialect='excel')
for line in c:
olist.append(line) #and update the outer array
f.close
return olist
#------------------------------------------------------------------------------
def list2csv(ilist,ofile):
"""
ilist = the list of lists to be converted
ofile = the output path for your csv file
"""
with open(ofile, 'wb') as csvfile:
csvwriter = csv.writer(csvfile, delimiter=',',
quotechar='|', quoting=csv.QUOTE_MINIMAL)
[csvwriter.writerow(x) for x in ilist]
Now, you can simply copy list[1] and change the appropriate element to reflect your summed value using:
listTemp = copy.deepcopy(ilist[1])
listTemp[n] = listTemp[n] + listTemp[n-x]
ilist.insert(2,listTemp)
As for how to change the file name, just use:
import os
newFileName = os.path.splitext(oldFileName)[0] + "edited" + os.path.splitext(oldFileName)[1]
Hopefully this will help you out!

Splitting this CSV file into a list

So I want to read in a csv file in Python3 and split it into a list. Where each index[0,1, ..., onwards] relates to each value separated by comma.
This is my csv file:
2017-04-1,14.9,30.1,0,8.2,10.8,NE,33,11:55,20.3,51,0,E,11,1023.9,29.5,25,0,ENE,7,1020.3
,2017-04-2,17.4,31.6,0,8.0,10.9,NE,35,08:56,23.5,34,0,NE,17,1021.4,30.7,20,0,SE,9,1018.6
2017-04-3,12.1,31.8,0,6.8,10.8,SSW,33,15:14,23.1,39,0,SSE,6,1022.7,29.3,34,0,SSW,19,1020.8
,2017-04-4,15.4,30.4,0,7.0,10.7,E,28,03:01,19.8,64,0,ESE,11,1024.9,29.7,29,0,S,9,1020.3
,2017-04-5,12.3,30.4,0,5.2,10.6,S,19,13:10,21.7,55,0,NE,6,1018.2,27.7,37,0,WSW,11,1013.5
,2017-04-6,13.7,24.4,0,5.2,8.1,SW,43,16:16,17.1,94,7,NE,2,1013.1,22.8,63,3,SSW,20,1011.7
,2017-04-7,14.8,22.3,0,5.4,8.9,SSE,35,06:26,16.4,56,5,SSE,17,1023.6,21.3,33,3,SSE,4,1021.6
,2017-04-8,8.7,23.6,0,5.0,10.5,SW,33,15:41,16.0,58,0,SE,7,1027.6,22.1,44,0,SW,17,1024.5
,2017-04-9,11.1,27.4,0,4.8,10.4,ESE,24,10:30,18.1,56,0,ENE,13,1027.4,26.8,26,0,NE,9,1023.1
,2017-04-10,10.0,30.1,0,5.6,10.4,SSW,24,16:38,19.3,51,4,E,9,1022.7,30.0,20,1,E,6,1018.4
,2017-04-11,13.1,28.1,0,6.6,10.5,SW,28,15:02,21.8,38,0,NE,9,1016.6,26.6,35,0,SW,13,1015.7
,2017-04-12,10.6,23.8,0,5.2,9.7,SW,35,16:19,17.4,69,6,ENE,9,1021.5,23.0,52,1,SW,15,1019.9
,2017-04-13,12.9,26.8,0,4.2,10.4,SSW,31,15:56,19.9,64,1,ESE,11,1024.3,25.5,49,1,SW,15,1020.1
,2017-04-14,12.8,29.0,0,5.8,6.2,SW,22,15:43,18.1,73,4,SSE,2,1019.3,27.6,42,5,SSW,11,1015.4
,2017-04-15,14.8,29.3,0,4.0,7.3,SSE,31,22:03,18.5,73,6,S,11,1015.7,28.2,38,7,SE,9,1011.7
,2017-04-16,17.2,25.1,0,5.4,7.0,SSE,35,00:43,18.8,66,4,ESE,11,1014.6,24.4,54,5,SW,11,1011.6
,2017-04-17,15.4,21.8,0,5.0,2.5,SSW,24,07:56,17.8,74,5,S,13,1015.3,21.4,59,8,SSW,11,1013.2
,2017-04-18,15.3,25.0,0,4.0,8.0,SSW,31,19:02,19.7,72,6,SSE,9,1013.0,22.8,63,1,SW,15,1010.4
Currently when I read it in, each element is being split at the end of the line. So if I accessed index[0] this would be the output.
2017-04-1,14.9,30.1,0,8.2,10.8,NE,33,11:55,20.3,51,0,E,11,1023.9,29.5,25,0,ENE,7,1020.3
What I need to understand is how to split the csv so that if I access index[0], I will be given 2017-04-1. And index[1] would give the next value after the comma.
Here is my code at the moment.
import matplotlib.pyplot as plt
# Opening and reading files
weatherdata = open('weather.csv', encoding='latin1')
data = weatherdata.readlines()
Encoding is required to be in latin one because it needs to be able to handle degree symbols.
Thanks for the guidance.
You have already read all lines into data:
weatherdata = open('weather.csv')
data = weatherdata.readlines()
data
data will be a string list:
['2017-04-1,14.9,30.1,0,8.2,10.8,NE,33,11:55,20.3,51,0,E,11,1023.9,29.5,25,0,ENE,7,1020.3\n',
'2017-04-2,17.4,31.6,0,8.0,10.9,NE,35,08:56,23.5,34,0,NE,17,1021.4,30.7,20,0,SE,9,1018.6 \n',
'2017-04-3,12.1,31.8,0,6.8,10.8,SSW,33,15:14,23.1,39,0,SSE,6,1022.7,29.3,34,0,SSW,19,1020.8\n',
'2017-04-4,15.4,30.4,0,7.0,10.7,E,28,03:01,19.8,64,0,ESE,11,1024.9,29.7,29,0,S,9,1020.3\n',
'2017-04-5,12.3,30.4,0,5.2,10.6,S,19,13:10,21.7,55,0,NE,6,1018.2,27.7,37,0,WSW,11,1013.5\n',
'2017-04-6,13.7,24.4,0,5.2,8.1,SW,43,16:16,17.1,94,7,NE,2,1013.1,22.8,63,3,SSW,20,1011.7\n',
'2017-04-7,14.8,22.3,0,5.4,8.9,SSE,35,06:26,16.4,56,5,SSE,17,1023.6,21.3,33,3,SSE,4,1021.6\n',
'2017-04-8,8.7,23.6,0,5.0,10.5,SW,33,15:41,16.0,58,0,SE,7,1027.6,22.1,44,0,SW,17,1024.5\n',
'2017-04-9,11.1,27.4,0,4.8,10.4,ESE,24,10:30,18.1,56,0,ENE,13,1027.4,26.8,26,0,NE,9,1023.1\n',
'2017-04-10,10.0,30.1,0,5.6,10.4,SSW,24,16:38,19.3,51,4,E,9,1022.7,30.0,20,1,E,6,1018.4\n',
'2017-04-11,13.1,28.1,0,6.6,10.5,SW,28,15:02,21.8,38,0,NE,9,1016.6,26.6,35,0,SW,13,1015.7\n',
'2017-04-12,10.6,23.8,0,5.2,9.7,SW,35,16:19,17.4,69,6,ENE,9,1021.5,23.0,52,1,SW,15,1019.9\n',
'2017-04-13,12.9,26.8,0,4.2,10.4,SSW,31,15:56,19.9,64,1,ESE,11,1024.3,25.5,49,1,SW,15,1020.1\n',
'2017-04-14,12.8,29.0,0,5.8,6.2,SW,22,15:43,18.1,73,4,SSE,2,1019.3,27.6,42,5,SSW,11,1015.4\n',
'2017-04-15,14.8,29.3,0,4.0,7.3,SSE,31,22:03,18.5,73,6,S,11,1015.7,28.2,38,7,SE,9,1011.7\n',
'2017-04-16,17.2,25.1,0,5.4,7.0,SSE,35,00:43,18.8,66,4,ESE,11,1014.6,24.4,54,5,SW,11,1011.6\n',
'2017-04-17,15.4,21.8,0,5.0,2.5,SSW,24,07:56,17.8,74,5,S,13,1015.3,21.4,59,8,SSW,11,1013.2\n',
'2017-04-18,15.3,25.0,0,4.0,8.0,SSW,31,19:02,19.7,72,6,SSE,9,1013.0,22.8,63,1,SW,15,1010.4']
Then use data[0].split(',')[0], it will give you:
'2017-04-1'
and data[0].split(',')[1], will be:
'14.9'
and so on.
Simply read and then split:
weatherdata = open('weather.csv')
data = [line.split(',') for line in weatherdata.read().splitlines()]
Or you can use pandas and it does it for you,Pandas is very useful to read dataset and manipulate them,it reads the data all at once and you can get different columns after reading them
import pandas as pd
df = pd.read_csv('weather.csv')
df.column[0]# this will get the first column

Python: How to append certain information from a CSV file onto an array

How is it possible to append all the values from the first column of a CSV file onto an array.
Given the csv
Anas,5/1/2015,2.875
Jack,5/1/2015,33.925
Conk,5/1/2015,136.85
The array would be
['Anas', 'Jack', 'Conk']
Also would it be the same code if I needed a separate array with the second column's values?
This is what i have tried
for line in file:
Data=line.split(",")
FromFile=Data[0]
questions=FromFile.append(line)
with open("example.csv") as f:
allData = [line.strip().split(",") for line in f]
firstColumn = [data[0] for data in allData]
secondColumn = [data[1] for data in allData]
thirdColumn = [data[2] for data in allData]

How to extract specific data from a downloaded csv file and transpose into a new csv file?

I'm working with an online survey application that allows me to download survey results into a csv file. However, the format of the downloaded csv puts each survey question and answer in a new column, whereas, I need the csv file to be formatted with each survey question and answer on a new row. There is also a lot of data in the downloaded csv file that I want to ignore completely.
How can I parse out the desired rows and columns of the downloaded csv file and write them to a new csv file in a specific format?
For example, I download data and it looks like this:
V1,V2,V3,Q1,Q2,Q3,Q4....
null,null,null,item,item,item,item....
0,0,0,4,5,4,5....
0,0,0,2,3,2,3....
The first row contains the 'keys' that I will need except V1-V3 must be excluded. Row 2 must be excluded altogether. Row 3 is my first subject so I need the values 4,5,4,5 to be paired with the keys Q1,Q2,Q3,Q4. And row 4 is a new subject which needs to be excluded as well since my program only handles one subject at a time.
The csv file that I need to create in order for my script to function properly looks like this:
Q1,4
Q2,5
Q3,4
Q4,5
I've tried using this izip to pivot the data, but I don't know how to specifically select the rows and columns I need:
from itertools import izip
a = izip(*csv.reader(open("CDI.csv", "rb")))
csv.writer(open("CDI_test.csv", "wb")).writerows(a)
Here is a simple python script that should do the job for you. It takes in arguments from the command line that designate the number of entries you want to skip at the beginning of the line,the input you want to skip at the end of the line, the input file and the output file. So for example, the command would look like
python question.py 3:7 input.txt output.txt
You can also substitute sys.argv[1] for 3, sys.argv[2] for "input.txt" and so on within the script if you don't want to state the arguments every time.
Text file version:
import sys
inputFile = open(sys.argv[2],"r")
outputFile = open(sys.argv[3], "w")
leadingRemoved=int(sys.argv[1])
#strips extra whitespace from each line in file then splits by ","
lines = [x.strip().split(",") for x in inputFile.readlines()]
#zips all but the first x number of elements in the first and third row
zipped = zip(lines[0][leadingRemoved:],lines[2][leadingRemoved:])
for tuples in zipped:
#writes the question/ number pair to a file.
outputFile.write(",".join(tuples))
inputFile.close()
outputFile.close()
#input from command line: python questions.py leadingRemoved pathToInput pathToOutput
CSV file version:
import sys
import csv
with open(sys.argv[2],"rb") as inputFile:
#removes null bytes
reader = csv.reader((line.replace('\0','') for line in inputFile),delimiter="\t")
outputFile = open(sys.argv[3], "wb")
leadingRemoved,endingremoved=[int(x) for x in sys.argv[1].split(":")]
#creates a 2d array of all the elements for each row
lines = [x for x in reader]
print lines
#zips all but the first x number of elements in the first and third row
zipped = zip(lines[0][leadingRemoved:endingremoved],lines[2][leadingRemoved:endingremoved])
writer = csv.writer(outputFile)
writer.writerows(zipped)
print zipped
outputFile.close()
Something similar I did using multiple values but could be changed to single values.
#!/usr/bin/env python
import csv
def dict_from_csv(filename):
'''
(file)->list of dictionaries
Function to read a csv file and format it to a list of dictionaries.
The headers are the keys with all other data becoming values
The format of the csv file and the headers included need to be know to extract the email addresses
'''
#open the file and read it using csv.reader()
#read the file. for each row that has content add it to list mf
#the keys for our user dict are the first content line of the file mf[0]
#the values to our user dict are the other lines in the file mf[1:]
mf = []
with open(filename, 'r') as f:
my_file = csv.reader(f)
for row in my_file:
if any(row):
mf.append(row)
file_keys = mf[0]
file_values= mf[1:] #choose row/rows you want
#Combine the two lists, turning into a list of dictionaries, using the keys list as the key and the people list as the values
my_list = []
for value in file_values:
my_list.append(dict(zip(file_keys, file_values)))
#return the list of dictionaries
return my_list
I suggest you read up on pandas for this type of activity:
http://pandas.pydata.org/pandas-docs/stable/io.html
import pandas
input_dataframe = pandas.read_csv("input.csv")
transposed_df = input_dataframe.transpose()
# delete rows and edit data easily using pandas dataframe
# this is a good library to get some experience working with
transposed_df.to_csv("output.csv")

Categories

Resources