How can I extract columns from my JSON dataset? - Python

My JSON looks like {"metadata"}, but the result is always an empty list [], and I want to properly extract the keys:
import gzip
import ijson

with gzip.open(gzip_file) as file:
    # items() yields each element of the list found at this prefix
    objects = ijson.items(file, 'meta.view.columns.item')
    columns = list(objects)
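The full JSON is not shown, so this is only a guess: the sample starts with a "metadata" key while the prefix starts with "meta", and ijson.items() simply yields nothing when no part of the document matches the prefix. The sketch below uses only the standard library (assuming the decompressed file fits in memory) to list every ijson-style prefix in the data and to collect the values at one prefix, mimicking what ijson.items() returns. The helper names load_gzip_json, prefixes and items_at are hypothetical.

```python
import gzip
import json


def load_gzip_json(path):
    """Read a whole gzip-compressed JSON document into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def prefixes(obj, prefix=""):
    """Return the set of ijson-style prefixes (dotted keys, 'item' for list elements)."""
    found = {prefix}
    if isinstance(obj, dict):
        for key, value in obj.items():
            found |= prefixes(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for value in obj:
            found |= prefixes(value, f"{prefix}.item" if prefix else "item")
    return found


def items_at(obj, prefix):
    """Collect every value located at an ijson-style prefix."""
    if prefix == "":
        return [obj]
    head, _, rest = prefix.partition(".")
    if head == "item" and isinstance(obj, list):
        # 'item' steps into every element of a list
        return [v for elem in obj for v in items_at(elem, rest)]
    if isinstance(obj, dict) and head in obj:
        return items_at(obj[head], rest)
    return []  # prefix does not exist in this document
```

Once prefixes(load_gzip_json(gzip_file)) shows the real path to the columns list, pass that string (ending in .item) to ijson.items() to keep streaming the file. Iterating over ijson.parse(file) and printing the prefix of each (prefix, event, value) tuple gives the same information without loading everything into memory.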

Related

How can I store a huge list in a CSV file?

For example:
header = ['a','b','c']
data = ['1','2','3','4','5','6',...,'100']
How can I group the data three items at a time so that each group forms one row under the header?
Here's how you can do this.
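import csv

header = ['a', 'b', 'c']
data = ['1', '2', '3', '4', '5', '6', '7', '8', '9']

# 'w' overwrites the file (append mode would repeat the header on every run);
# newline='' stops csv from writing blank lines between rows on Windows
with open("testing.csv", "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for i in range(0, len(data), 3):
        writer.writerow(data[i:i + 3])  # one row per chunk of three items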
The resulting testing.csv contains:
a,b,c
1,2,3
4,5,6
7,8,9
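The question's data runs up to '100', which is not a multiple of three, so the last row comes out shorter. A minimal sketch of the same slicing approach, generalised so the row width follows the header and all rows are written with writerows(); it writes to an io.StringIO buffer so it runs without touching the disk, and chunks and write_rows are hypothetical names. For a real file, pass open("testing.csv", "w", newline="") instead of the buffer.

```python
import csv
import io


def chunks(items, size):
    """Split items into consecutive lists of length size (the last may be shorter)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_rows(out, header, data):
    """Write the header, then the data grouped into rows as wide as the header."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(chunks(data, len(header)))


buf = io.StringIO()
write_rows(buf, ['a', 'b', 'c'], [str(n) for n in range(1, 101)])
print(buf.getvalue().splitlines()[:3])  # prints ['a,b,c', '1,2,3', '4,5,6']
```

Slicing keeps the leftover '100' as a one-item final row; the common zip(*[iter(data)] * 3) idiom would silently drop it instead.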

Extract Data from a Text File - Data is in a Strange Format

I have a text file that is comprised of integers and their corresponding countries. How can I import/read the data and load it into a Python data structure, e.g. two lists, a dictionary, or a DataFrame?
You can do something like this:
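codes = []
countries = []
with open('blablabla', 'r') as f:  # 'blablabla' is a placeholder for your file's path
    for line in f:
        if '=' not in line:
            continue  # skip blank or malformed lines
        # split only on the first '=' and strip surrounding whitespace
        code, country = line.split('=', 1)
        codes.append(code.strip())
        countries.append(country.strip())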
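If a dictionary suits you better than two parallel lists, here is a small variant of the same idea. Like the answer above, it assumes each line has the form code = country; converting the code with int() is an added assumption based on the question saying the codes are integers, and parse_codes is a hypothetical name.

```python
def parse_codes(lines):
    """Map integer codes to country names from lines of the form 'code = country'."""
    mapping = {}
    for line in lines:
        if '=' not in line:
            continue  # skip blank or malformed lines
        code, country = line.split('=', 1)
        mapping[int(code.strip())] = country.strip()
    return mapping
```

An open file object can be passed directly as lines. If you have pandas, pd.DataFrame(list(mapping.items()), columns=['code', 'country']) turns the result into a two-column DataFrame.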

How to name dataframes with a for loop?

I want to read several JSON files and load each one into a DataFrame with a for loop.
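import pandas as pd

def load_reviews():  # hypothetical wrapper so that `return` is valid
    review_categories = ["beauty", "pet"]
    for i in review_categories:
        filename = "D:\\Library\\reviews_{}.json".format(i)
        output = pd.read_json(path_or_buf=filename, lines=True)
    return output  # `output` is overwritten on every pass, so only the last file survives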
The problem is that I want each review category to have its own variable: a DataFrame called "beauty_reviews" and another called "pet_reviews", containing the data read from reviews_beauty.json and reviews_pet.json respectively.
I think it is easier to handle the DataFrames in a dictionary. Try the code below:
import pandas as pd

review_categories = ["beauty", "pet"]
reviews = {}
for review in review_categories:
    df_name = review + '_reviews'  # the name for the DataFrame
    filename = "D:\\Library\\reviews_{}.json".format(review)
    reviews[df_name] = pd.read_json(path_or_buf=filename, lines=True)
In reviews, each key maps to the DataFrame holding that category's data. To retrieve the data, just use:
reviews["beauty_reviews"]
Hope it helps.
Alternatively, you can first collect the DataFrames in a list:
import pandas as pd

reviews = []
review_categories = ["beauty", "pet"]
for i in review_categories:
    filename = "D:\\Library\\reviews_{}.json".format(i)
    reviews.append(pd.read_json(path_or_buf=filename, lines=True))
and then unpack your results into the variable names you wanted:
beauty_reviews, pet_reviews = reviews
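Both answers can be folded into a single dict comprehension. The sketch below takes the file reader as a parameter so it can be tried without the D:\Library files; load_all, template and reader are hypothetical names. Keying by name also avoids depending on the list order that the unpacking in the second answer relies on.

```python
def load_all(categories, template, reader):
    """Return {"<category>_reviews": reader(path)} for each category.

    template is a format string such as "reviews_{}.json"; reader is any
    callable that takes a path and returns the loaded data.
    """
    return {f"{c}_reviews": reader(template.format(c)) for c in categories}
```

With pandas, this would be called as load_all(["beauty", "pet"], "D:\\Library\\reviews_{}.json", lambda p: pd.read_json(p, lines=True)), after which reviews["pet_reviews"] is the pet DataFrame.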

Read a csv file with multiple data sections into an addressable structure

I have made a csv file which looks like this:
Now, in my Python file, I want to take the data from the place column of the food field, which is only:
a
b
c
d
e
Then, from the drink field, I want only the data from the taste column, and so on.
My question is: how do I build a database-like structure that has "fields" (i.e. food/drinks), where inside each field I can address the specific cells I described?
This question is pretty wide open, so I will show two possible ways to parse this data into a structure that can be accessed in the manner you described.
Solution #1
This code uses somewhat more advanced Python and libraries. It wraps a generator around a csv reader so that the multiple sections of the data can be read efficiently. The data of each section is then placed into its own pandas.DataFrame, and the data frames are stored in a dict keyed by section name.
The data can be accessed like:
ratings['food']['taste']
This will give a pandas.Series. A regular Python list can be had with:
list(ratings['food']['taste'])
Code to read the data into pandas DataFrames using a generator:
import csv
import pandas as pd

def csv_record_reader(csv_reader):
    """ Read a csv reader iterator until a blank line is found. """
    prev_row_blank = True
    for row in csv_reader:
        row_blank = not row or row[0] == ''  # csv.reader yields [] for an empty line
        if not row_blank:
            yield row
            prev_row_blank = False
        elif not prev_row_blank:
            # a blank row after data ends this section; leading blanks are skipped
            return

ratings = {}
ratings_reader = csv.reader(my_csv_data)
while True:
    category_row = list(csv_record_reader(ratings_reader))
    if len(category_row) == 0:
        break  # no sections left
    category = category_row[0][0]
    # get the generator for the data section
    data_generator = csv_record_reader(ratings_reader)
    # first row of data is the column names
    columns = next(data_generator)
    # use the rest of the data to build a data frame
    ratings[category] = pd.DataFrame(list(data_generator), columns=columns)
Solution #2
Here is a solution that reads the data into a dict. The data can be accessed with something like:
ratings['food']['taste']
which here gives a plain list. Code to read the CSV into a dict:
import csv
from collections import namedtuple

ratings_reader = csv.reader(my_csv_data)
ratings = {}
need_category = True
need_header = True
for row in ratings_reader:
    if not row or row[0] == '':
        if not (need_category or need_header):
            # this is the end of a data set
            need_category = True
            need_header = True
    elif need_category:
        # read the category (food, drink, ...)
        category = ratings[row[0]] = dict(rows=[])
        need_category = False
    elif need_header:
        # read the header (place, taste, ...)
        for key in row:
            category[key] = []
        DataEnum = namedtuple('DataEnum', row)
        need_header = False
    else:
        # read a row of data
        row_data = DataEnum(*row)
        category['rows'].append(row_data)
        for k, v in row_data._asdict().items():
            category[k].append(v)
Test Data:
my_csv_data = [x.strip() for x in """
food,,
,,
place,taste,day
a,good,1
b,good,2
c,awesome,3
d,nice,4
e,ok,5
,,
,,
,,
drink,,
,,
place,taste,day
a,good,1
b,good,2
c,awesome,3
d,nice,4
e,ok,5
""".split('\n')[1:-1]]
To read the data from a file:
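import csv
with open('ratings_file.csv', newline='') as ratings_file:  # text mode, as Python 3's csv expects
    ratings_reader = csv.reader(ratings_file)
    # keep the parsing loop inside this block so the file stays open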
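For comparison, here is a shorter standard-library sketch (no pandas) that splits the rows on runs of blank rows with itertools.groupby. It assumes the exact layout of the test data above: a title row, a blank row, then a header row followed by data rows, with sections separated by blank rows. read_sections and is_blank are hypothetical names, and all values stay as strings.

```python
import csv
from itertools import groupby


def is_blank(row):
    """True for rows with no non-empty cells, e.g. ',,' or an empty line."""
    return not any(cell.strip() for cell in row)


def read_sections(lines):
    """Parse blank-separated sections into {title: {column: [values, ...]}}."""
    rows = csv.reader(lines)
    # keep only the runs of consecutive non-blank rows
    blocks = [list(group) for blank, group in groupby(rows, key=is_blank) if not blank]
    ratings = {}
    # blocks alternate: [title row], [header row, data rows...]
    for title_block, table in zip(blocks[0::2], blocks[1::2]):
        title = title_block[0][0]
        header, *data = table
        ratings[title] = {name: [row[i] for row in data] for i, name in enumerate(header)}
    return ratings
```

With the test data, read_sections(my_csv_data)['food']['place'] gives ['a', 'b', 'c', 'd', 'e']. For a file, pass the file object opened with newline=''.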

Unicode text in pandas dataframe cannot parse to JSON

I'm trying to write Python code to build a nested JSON file from a flat table in a pandas DataFrame. I created a dictionary of dictionaries from the DataFrame. When I try to export the dict to JSON, I get an error that the unicode text is not a string. How can I convert dictionaries with unicode strings to JSON?
My current code is:
data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
columnList = tuple(data[0:])
for index, row in data.iterrows():
    dataRow = tuple(row)
    rowDict = dict(zip(dataRow[2:],columnList[2:]))
    memberId = str(tuple(row[1]))
    teamName = str(tuple(row[0]))
    memberDict1 = {memberId[1:2]:rowDict}
    memberDict2 = {teamName:memberDict1}
This produces a dict of dicts where each row looks like this:
'(1L,)': {'0': {(u'Doe',): (u'lastname',), (u'John',): (u'firstname',), (u'none',): (u'mobile',), (u'916-555-1234',): (u'phone',), (u'john.doe#wildlife.net',): (u'email',), (u'Anon',): (u'orgname',)}}}
But when I try to dump it to JSON, the unicode text can't be parsed as strings, so I get this error:
TypeError: key (u'teamname',) is not a string
How can I convert my nested dicts to JSON without invoking the error?
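The error points at the key (u'teamname',), and the printed output shows keys such as (u'Doe',): these are one-element tuples. The json module only accepts str, int, float, bool or None as dict keys, so the tuple is the problem rather than the unicode text. One option, sketched below for Python 3, is to rebuild the nested dicts with string keys before dumping; key_to_str and json_safe are hypothetical names, and joining multi-element tuples with ", " is an arbitrary choice. json.dumps(..., skipkeys=True) also avoids the error, but it silently drops those keys.

```python
import json


def key_to_str(key):
    """Convert a key json rejects (e.g. the 1-tuple ('Doe',)) into a string."""
    if isinstance(key, tuple):
        return ", ".join(str(part) for part in key)
    if key is None or isinstance(key, (str, int, float, bool)):
        return key  # already a valid JSON key
    return str(key)


def json_safe(obj):
    """Recursively copy obj, converting every dict key with key_to_str."""
    if isinstance(obj, dict):
        return {key_to_str(k): json_safe(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [json_safe(v) for v in obj]  # json writes tuples as arrays anyway
    return obj


member = {'1': {'0': {('Doe',): ('lastname',), ('John',): ('firstname',)}}}
print(json.dumps(json_safe(member)))
# prints {"1": {"0": {"Doe": ["lastname"], "John": ["firstname"]}}}
```

Not wrapping values in tuple(...) when building the dicts would avoid the conversion step entirely.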
