I'm trying to insert a very large CSV file into InfluxDB and am inserting it as such in Python:
influx_pd = influxdb.DataFrameClient(host, port, user, password, db, verify_ssl=False)
for frame in pd.read_csv(infile, chunksize=batch_count):
frame.set_index(pd.DatetimeIndex(frame[date_pk]), inplace=True)
frame.dropna(axis=1, how='all')
influx_pd.write_points(frame, 'patients')
However, on the first call to write_points, I'm receiving this error (truncated):
raise InfluxDBClientError(response.content, response.status_code)
influxdb.exceptions.InfluxDBClientError: 400: {"error":"unable to parse 'enroll_pd Pt Id=\"21.0\",Admit Date=\"2010-12-05\", ... MRSA Screening=\"Negative\" 1291507200000000000': invalid field format\nunable to parse ... (ellipses used to truncate)
I had read about issues with InfluxDB and NaN values (which my CSV file does contain), so I tried inserting placeholder values for NaN values but receive the same result. Could someone please help me locate the issue in my code? It would be much appreciated.
I'm using an InfluxDB 1.3 Docker image just FYI.
So I realized that I had to explicitly specify the protocol to be json, as such:
influx_pd.write_points(frame, measurement='enroll_pd', protocol='json')
in addition to filling in NaN values (JSON has no support for those) with an imputation method. I thought the docs I was under the impression that json was the default, I guess that was not the case.
This, of course, might only be one solution. I welcome other, alternative solutions that work.
Related
Im trying to retrieve data from a database named RethinkDB, they output JSON when called with r.db("Databasename").table("tablename").insert([{ "id or primary key": line}]).run(), when doing so it outputs [{'id': 'ValueInRowOfid\n'}] and I want to parse that to just the value eg. "ValueInRowOfid". Ive tried with JSON in Python, but I always end up with the typeerror: list indices must be integers or slices, not str, and Ive been told that it is because the Database outputs invalid JSON format. My question is how can a JSON format be invalid (I cant see what is invalid with the output) and also what would be the best way to parse it so that the value "ValueInRowOfid" is left in a Operator eg. Value = ("ValueInRowOfid").
This part imports the modules used and connects to RethinkDB:
import json
from rethinkdb import RethinkDB
r = RethinkDB()
r.connect( "localhost", 28015).repl()
This part is getting the output/value and my trial at parsing it:
getvalue = r.db("Databasename").table("tablename").sample(1).run() # gets a single row/value from the table
print(getvalue) # If I print that, it will show as [{'id': 'ValueInRowOfid\n'}]
dumper = json.dumps(getvalue) # I cant use `json.loads(dumper)` as JSON object must be str. Which the output of the database isnt (The output is a list)
parsevalue = json.loads(dumper) # After `json.dumps(getvalue)` I can now load it, but I cant use the loaded JSON.
print(parsevalue["id"]) # When doing this it now says that the list is a str and it needs to be an integers or slices. Quite frustrating for me as it is opposing it self eg. It first wants str and now it cant use str
print(parsevalue{'id'}) # I also tried to shuffle it around as seen here, but still the same result
I know this is janky and is very hard to comprehend this level of stupidity that I might be on. As I dont know if it is the most simple problem or something that just isnt possible (Which it should or else I cant use my data in the database.)
Thank you for reading this through and not jumping straight into the comments and say that I have to read the JSON documentation, because I have and I havent found a single piece that could help me.
I tried reading the documentation and watching tutorials about JSON and JSON parsing. I also looked for others whom have had the same problems as me and couldnt find.
It looks like it's returning a dictionary ({}) inside a list ([]) of one element.
Try:
getvalue = r.db("Databasename").table("tablename").sample(1).run()
print(getvalue[0]['id'])
I am working on a program that reads the content of a Restful API from ImportIO. The connection works, and data is returned, but it's a jumbled mess. I'm trying to clean it to only return Asins.
I have tried using the split keyword and delimiter to no success.
stuff = requests.get('https://data.import.io/extractor***')
stuff.content
I get the content, but I want to extract only Asins.
results
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. the response will do that for you when you access .text.
response.txt
Because the decoding of bytes to str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
response.json()
The type of the return value of .json() is a dictionary, so you can access values in the object by key.
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
For More Info: https://realpython.com/python-requests/
What format is the return information in? Typically Restful API's will return the data as json, you will likely have luck parsing the it as a json object.
https://realpython.com/python-requests/#content
stuff_dictionary = stuff.json()
With that, you can load the content is returned as a dictionary and you will have a much easier time.
EDIT:
Since I don't have the full URL to test, I can't give an exact answer. Given the content type is CSV, using a pandas DataFrame is pretty easy. With a quick StackOverflow search, I found the following answer: https://stackoverflow.com/a/43312861/11530367
So I tried the following in the terminal and got a dataframe from it
from io import StringIO
import pandas as pd
pd.read_csv(StringIO("HI\r\ntest\r\n"))
So you should be able to perform the following
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(stuff.content))
If that doesn't work, consider dropping the first three bytes you have in your response: b'\xef\xbb\xf'. Check the answer from Mark Tolonen to get parse this.
After that, selecting the ASIN (your second column) from your dataframe should be easy.
asins = df.loc[:, 'ASIN']
asins_arr = asins.array
The response is the byte string of CSV content encoded in UTF-8. The first three escaped byte codes are a UTF-8-encoded BOM signature. So stuff.content.decode('utf-8-sig') should decode it. stuff.text may also work if the encoding was returned correctly in the response headers.
I'm aware there are a zillions posts about encoding / decoding problems on the forum but after going through half of them I wasn't able to find one that did the trick for me. So be nice if it is somewhere in the other half...
My issue :
I have a dbase (MS SQL) containing multilingual data (Latin1_General_CI_AS COLLATE), and I am using pymssql and pandas to convert it to a dataframe for use outside of python. All works fine except for the non latin characters and I'm completely stuck at this moment.
This is my (simplified) python 3 code:
import pandas as pd
import pymssql
def rm_main():
conn = pymssql.connect(server='***',port=4133, user='***', charset='UTF-8', password='***', database='**')
q="""
SELECT goodmorning FROM myTable
"""
df = pd.read_sql(q,conn)
df['encoded_goodmorning'] = df.goodmorning.str.encode('utf-8')
return df
what is in my database is a field called goodmorning, and it contains the following string : Dzień dobry
When calling the data as above, using just pymssql the data is retrieved correctly.
When I want to use the read_sql method form pandas I get the dreadfull question mark as follows : Dzie? dobry
Using the encoding options I get a bit further in the right direction as I get the following : b'Dziexc5x84 dobry', where c5 84 is the utf hex code for my small latin n with acute. So my content is complete but it is not very reader friendly.
Now where I fail miserably is to get this into the 'friendly format' again (so that it just says 'Dzień dobry' again).
What do I overlook here? Are there better approaches to do this? it seems like something very obvious but whatever I tried (encoding / decoding) either doesn't make a difference or it simply brakes the code.
I am having quite a bit of trouble with getting the correct form data saved to a server via POST with Requests (2.8.1) module.
I have previous code which does exactly what I want it to do: it encodes a bunch of key:value pairs into the correct header:value payload dict format, and successfully POSTS to the URI. I get a 200 response (what I'm looking for) and everything is great.
This is a section of the OLD payload encoding function, with a ton of key:value pairs omitted for brevity.
Note: the checkbox value set could be any sequence of numbers between 1 and 25, I just wrote it as
item in range(1,5)
to illustrate that the list is comprised of int numbers, i.e. [ "", 1, 2, 3, 4, 5,...] or [ "", 2, 7, 5, 1, 25,...] etc.
checkboxList = ["",]
for item in range(1,5):
checkboxList.append(item)
payload['checkbox[ids][]'] = checkboxList
...
response = request.post(data_url, data=payload)
>> 200 OK!
Here is a print of what the payload dict (checkboxes) looks like before it's sent to the server:
{... "checkbox[ids][]" : [ "", 2, 17, 20, 5], ...}
And when I look on the page with a browser, all the payload information has been correctly recorded (omitted above) AND the checkboxes (shown above) are correct!
Originally, the checkbox values came from an excel file, as did the rest of the information that was put into the payload before being POSTed to the server. However, now I'm retrieving the information from an SQLite db.
Below is the NEW code that records the checkboxes incorrectly. I should note: I do not have access to the server, so I cannot easily tell if it's a server issue, but let's assume it's not the servers fault. I've had this issue previously, but I got it to work with the above code. However, now that I've started to store the values I need in a db, I cannot get the correct checkboxes recorded by the server.
This is what the data from the db column looks like:
12-5-1-22-4
(... I know this isn't great practice for DB mgmt, but I assume this isn't why the POST is recording the wrong data, and I wanted this question to be as closely representational to my code as possible.)
checkList = checkboxesFromDB.split('-')
payload['checkbox[ids][]'] = checkList
...
response = request.post(data_url, data=payload)
>> 200 OK!
When I look at the site with the browser, it records the checkboxes incorrectly. Now, i should note that 3 checkboxes are selected no matter what I pass to payload[checkbox[ids][]]
It's ALWAYS the same 3, incorrect checkboxes, even if I completely omit checkbox[ids][] from the payload dict. Knowing that, we could assume its a server issue. However, the nearly EXACT code from above works (when I grab the info from an excel file).
I've tried the following (with only one value as a test) without getting the correct checkboxes recorded by the server:
payload['checkbox[ids][]'] = '1'
payload['checkbox[ids][]'] = 1
payload['checkbox[ids][]'] = [1]
payload['checkbox[ids][]'] = ["",1]
payload['checkbox[ids][]'] = [1,""]
When uploading images to the same server, I had an encoding issue when retrieving the image BLOB from the db and trying to pass the buffer object directly to Requests as a file, but I fixed this with cStringIO encoding. (It took me forever as I'm really new to programming, and still unsure of syntax, let alone ways to handle this sort of stuff....) I thought I might be having a similar encoding issue, but with the testing and research I've done, I cannot determine either way as I feel like I'm a bit over my head.
I apologize if this is completely NOOB, but I've done extensive research, trying so many different things that I could think of. I tried passing strings, lists, dicts, forcing encoding of lists as utf-8.
The main reason I'm so perplexed is my original code WORKS, and my new code is nearly identical but doesn't. The only real difference I can think of is now my information is coming from a SQLite db (this particular checkbox column is TEXT type)
Can anyone help me, or point me in a new direction I haven't thought of/know of?
I went through all payload pairs to find that it was an issue with HTML.
I was saving HTML in my SQLite db (via BeautifulSoup without prettifying it) as TEXT. Then I was retrieving it and sending it as a string. This was throwing off the server response.
I have since swapped that sql column value type to VARCHAR (as is best for my use) and prettify it like this foo = bar.prettify(formatter="html")before saving to the db. Now, when i retrieve the value and pass it to the payload, everything works as it should.
I am trying to use the requests library in Python to push data (a raw value) to a firebase location.
Say, I have urladd (the url of the location with authentication token). At the location, I want to push a string, say International. Based on the answer here, I tried
data = {'.value': 'International'}
p = requests.post(urladd, data = sjson.dumps(data))
I get <Response [400]>. p.text gives me:
u'{\n "error" : "Invalid data; couldn\'t parse JSON object, array, or value. Perhaps you\'re using invalid characters in your key names."\n}\n'
It appears that they key .value is invalid. But that is what the answer linked above suggests. Any idea why this may not be working, or how I can do this through Python? There are no problems with connection or authentication because the following works. However, that pushes an object instead of a raw value.
data = {'name': 'International'}
p = requests.post(urladd, data = sjson.dumps(data))
Thanks for your help.
The answer you've linked is a special case for when you want to assign a priority to a value. In general, '.value' is an invalid name and will throw an error.
If you want to write just "International", you should write the stringified-JSON version of that data. I don't have a python example in front of me, but the curl command would be:
curl -X POST -d "\"International\"" https://...
Andrew's answer above works. In case someone else wants to know how to do this using the requests library in Python, I thought this would be helpful.
import simplejson as sjson
data = sjson.dumps("International")
p = requests.post(urladd, data = data)
For some reason I had thought that the data had to be in a dictionary format before it is converted to stringified JSON version. That is not the case, and a simple string can be used as an input to sjson.dumps().