How to read JSON data from a database table into Python pandas?

Instead of loading data from a JSON file, I need to retrieve JSON data from a database and apply business logic to it. To do that I have used the Python "json" module to load the data, and I can see the data printed in my console. However, when I try to read that data into pandas to create a dataframe from it, nothing happens. Please see my code below:
def jsonRd():
    json_obj = json.loads("my table name")
    json_ead = pd.read_json(json_obj)
There is other confidential data, which I cannot post here, that is part of the above function. So, when I print json_obj, it shows the data. But when I try to print json_ead, nothing seems to happen, and I don't see any error either.
Please suggest.
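One common pattern, sketched below under assumptions the question does not confirm (the JSON lives in a text column of a table reachable through SQLAlchemy; the connection string, table, and column names are placeholders), is to query the column first, then parse and flatten each value:

import json
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string, table, and column names.
engine = create_engine("postgresql://user:password@host/dbname")

# Pull the raw JSON strings out of the table...
raw = pd.read_sql("SELECT json_col FROM my_table", engine)

# ...then parse each string and flatten the records into a dataframe.
records = [json.loads(s) for s in raw["json_col"]]
df = pd.json_normalize(records)
print(df.head())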

Related

Pandas txt to csv output only displays the first two lines of values, how do I get the full data to show?

My issue is as follows.
I've gathered some contact data from SurveyMonkey using the SM API, and I've converted that data into a txt file. When opening the txt file, I see the full data from the survey that I'm trying to convert into csv, however when I use the following code:
df = pd.read_csv("my_file.txt",sep =",", encoding = "iso-8859-10")
df.to_csv('my_file.csv')
It creates a csv file with only two lines of values (and cuts off in the middle of the second line). Similarly, if I try to organize the data within a pandas dataframe, it only registers the first two lines, meaning most of my txt file is not being read.
As I've never run into this problem before and I've been able to convert into CSV without issues, I'm wondering if anyone here has ideas as to what might be causing this issue to occur and how I could go about solving it?
All help is much appreciated.
Edit:
I was able to get the data to display properly in csv when I converted it directly into csv from json instead of converting it to a txt file first. I was not, however, able to figure out what went wrong in the conversion from txt to csv, as I tried multiple different encodings but came to the same result.
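For reference, a rough sketch of that json-to-csv route, assuming the API response was saved as a list of records (the file names here are placeholders, not the asker's actual files):

import json
import pandas as pd

with open("survey_data.json", encoding="utf-8") as f:
    records = json.load(f)

df = pd.json_normalize(records)        # flatten any nested fields into columns
df.to_csv("my_file.csv", index=False)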

Converting output into .csv file

I am new to Python, and I am currently writing code to parse through an excel sheet of websites, look at websites that have been modified more than three months ago, and then pull out the names and emails of contacts at those sites. My problem now is that whenever I run the code in my terminal, it only shows me some of the output, so I'd like to export it to a .csv file or really anything that lets me see all the values but I'm not sure how.
import pandas as pd
data = pd.read_csv("filename.csv")
data.sort_values("Last change", inplace=True)
filter1 = data["Last change"] < 44285
data.where(filter1, inplace=True)
print(data)
Note: the 44285 came from converting the dates in Excel to integers so I didn't have to do it in Python. Lazy, I know, but I'm learning.
You can try converting it to a csv.
data.to_csv('data.csv')
Alternatively, if you just want to view more records, for example 50, you could do this:
print(data.head(50))
If you can also share your parser code, I think we can save you from the hassle of editing the Excel file in the middle of the process, for example with something like the sketch below. Or maybe something from here with a few more lines.
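One way to skip the Excel step entirely is to parse the dates in pandas and compare against a real date instead of the serial number. This is a sketch only: it assumes the "Last change" column parses cleanly, and that serial 44285 corresponds to roughly 2021-03-30.

import pandas as pd

data = pd.read_csv("filename.csv", parse_dates=["Last change"])
cutoff = pd.Timestamp("2021-03-30")            # ~ Excel serial 44285
old_sites = data[data["Last change"] < cutoff]
old_sites.to_csv("resultfile.csv", index=False)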
To solve your problem use
data.to_csv("resultfile.csv")
or if you want an excel file
data.to_excel("resultfile.xlsx")

How to transform CSV or DB record data for Kafka and how to get it back into a csv or DF on the other side

I've successfully set up a Kafka instance at my job and I've been able to pass simple 'Hello World' messages through it.
However, I'm not sure how to do more interesting things. I've got a CSV that contains four records from a DB that I'm trying to move through Kafka, then take into a DataFrame on the other side and save it as a CSV again.
producer = KafkaProducer(bootstrap_servers='my-server-id:443',
....
df = pd.read_csv('data.csv')
df = df.to_json()
producer.send(mytopic, df.encode('utf8'))
This comes back as a tuple (consumer record object, bool) that contains a list of my data. I can access the data as:
msg[0][0][6].decode('utf8')
But that comes in as a single string that I can't pass to a dataframe simply (it just merges everything into one thing).
I'm not sure if I even need a dataframe or a to_json() method or anything. I'm really just not sure how to organize the data to send properly, then get it back and feed it into a dataframe so that I can either a) save it to a CSV or b) reinsert the dataframe into a DB with to_sql.
Kafka isn't really suited to sending entire matrices/dataframes around.
You can send a list of CSV rows, JSON arrays, or preferably some other compressible binary data format such as Avro or Protobuf as whole objects. If you are working exclusively in Python, you could pickle the data you send and receive.
When you read the data, you must deserialize it, but how you do that is ultimately your choice; there is no simple answer for any given application.
The solution, for this one case, would be json_normalize, then to_csv. And I would like to point out that Kafka isn't required for you to test that, as you definitely should be writing unit tests...
import json
import pandas as pd

df = pd.read_csv('data.csv')
jdf = df.to_json(orient='records')             # serialize the rows as a JSON array
msg_value = jdf                                # pretend you got a message from Kafka, as a JSON string
df = pd.json_normalize(json.loads(msg_value))  # parse the string, then back to a dataframe
df.to_csv()                                    # pass a file path here to actually write a file
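On the receiving side, a minimal consumer sketch (assuming the kafka-python package and the same placeholder topic name and broker address used in the question) could look like this:

import json
import pandas as pd
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'mytopic',
    bootstrap_servers='my-server-id:443',
    value_deserializer=lambda v: json.loads(v.decode('utf8')),  # bytes -> Python objects
)

for message in consumer:
    records = message.value                  # a list of row dicts if sent with orient='records'
    df = pd.json_normalize(records)          # rebuild the dataframe
    df.to_csv('received.csv', index=False)   # or df.to_sql(...) to push it into a database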

CSV to JSON without needing it to save in another json file using Python

I am looking for a way to convert CSV data into JSON data without needing to save it to another JSON file. Is that possible?
So the following functionality needs to be carried out.
Sample Code:
df = pd.read_csv("file_xyz.csv").to_json("another_file.json")
data_json = pd.read_json("another_file.json")
Now I want to do the same thing without having to save my data in "another_file.json". I want data_json to hold the JSON data by performing the operations directly on the CSV data. Is it possible? How can we do that?
Use DataFrame.to_json without a filename:
j = pd.read_csv("file_xyz.csv").to_json()
Or, if you want to convert the output to a dictionary for further processing, use DataFrame.to_dict:
d = pd.read_csv("file_xyz.csv").to_dict()
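If the goal is to go from CSV to parsed JSON objects entirely in memory, one sketch (reusing the question's file name) is to combine to_json with json.loads:

import json
import pandas as pd

j = pd.read_csv("file_xyz.csv").to_json(orient="records")  # JSON string, nothing written to disk
data_json = json.loads(j)                                  # list of row dicts, ready for further processing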

How to read API JSON data and store as Python dictionary

I am pulling in info from an API. The returned data is in JSON format. I have to iterate through and get the same data for multiple inputs. I want to save the JSON data for each input in a python dictionary for easy access. This is what I have so far:
import pandas
import requests
ddict = {}
read_input = pandas.read_csv('input.csv')
for d in read_input.values:
    print(d)
    url = "https://api.xyz.com/v11/api.json?KEY=123&LOOKUP={}".format(d)
    response = requests.get(url)
    data = response.json()
    ddict[d] = data

df = pandas.DataFrame.from_dict(ddict, orient='index')
with pandas.ExcelWriter('output.xlsx') as w:
    df.to_excel(w, 'output')
With the above code, I get the following output:
a.com
I also get an excel output with the data only from this first line. My input csv file has close to 400 rows so I should be seeing more than 1 line in the output and in the output excel file.
If you have a better way of doing this, that would be appreciated. In addition, the excel output I get is very hard to understand. I want to read the JSON data using dictionaries and subdictionaries but I don't completely understand the format of the underlying data - I think it looks closest to a JSON array.
I have looked at numerous other posts including Parsing values from a JSON file using Python? and How do I write JSON data to a file in Python? and Converting JSON String to Dictionary Not List and How do I save results of a "for" loop into a single variable? but none of the techniques have worked so far. I would prefer not to pickle, if possible.
I'm new to Python so any help is appreciated!
I'm not going to address your challenges with JSON here as I'll need more information on the issues you're facing. However, with respect to reading from CSV using Pandas, here's a great resource: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html.
Now, your output is being read the way it is because a.com is being considered the header (undesirable). Your read statement should be:
read_input = pandas.read_csv('input.csv', header=None)
Now, read_input is a DataFrame (documentation). So, what you're really looking for are the values in the first column. You can easily get an array of values with read_input.values; this gives you a separate array for each row. So your for loop would be:
for d in read_input.values:
    print(d[0])
    get_info(d[0])
For JSON, I'd need to see a sample structure and your desired way of storing it.
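As a general illustration only (the real response structure isn't shown in the question), pandas.json_normalize flattens nested JSON into columns, which usually makes the resulting Excel output easier to read:

import pandas
# A made-up response shape, purely for illustration.
data = {"lookup": "a.com", "contact": {"name": "Jane", "email": "jane@a.com"}}
flat = pandas.json_normalize(data)
print(flat)   # columns: lookup, contact.name, contact.email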
I think there is an awkwardness in your program.
Try with this:
ddict = {}
read_input = pandas.read_csv('input.csv')
for d in read_input.values:
    url = "https://api.xyz.com/v11/api.json?KEY=123&LOOKUP={}".format(d)
    response = requests.get(url)
    data = response.json()
    ddict[d] = data
Edit: iterate the read_input.values.
