JSON Manipulation in Python

recent_json & historic_json Function Returns:
return(frame.to_json(orient='records'))
Main Function:
recentdata = recent_json(station)
historicdata = historic_json(station)
alldata = historicdata + recentdata
How can I add the data to the same JSON? The data has a break in it.
e.g.:
"Relative_Humidity":93.0}][{"STATIONS_ID":"44","Date":1526774400000,
The ][ shouldn't be there. This is where the historic data ends and the recent data begins.
It is probably due to me concatenating them wrong. How can I properly concatenate them and show them as one main JSON file like:
[{"STATIONS_ID":"44","Date":1356998400000,"Quality_Level":3,"Air_Temperature":8.4,"Relative_Humidity":91.0},
{"STATIONS_ID":"44","Date":1357002000000,"Quality_Level":3,"Air_Temperature":8.3,"Relative_Humidity":93.0}]

First off, you need to be aware that your to_json actually outputs strings. That said, you can use string manipulation to achieve your goal. The closing bracket is the last character of your first string and the opening bracket is the first character of your second string, so doing
alldata = historicdata[:-1] + ',' + recentdata[1:]
will give you your desired output as a string. You'll need to pass it to e.g. json.loads() to turn it into a Python object that can be accessed and worked with in conventional ways.
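An alternative that avoids slicing the strings is to parse both JSON strings and concatenate the resulting lists. The following is a minimal sketch, assuming each function returns a JSON array string as produced by to_json(orient='records'); the two sample strings are shortened stand-ins for the real output, not actual station data.

```python
import json

# Shortened stand-ins for the strings returned by historic_json() and recent_json().
historicdata = '[{"STATIONS_ID":"44","Date":1357002000000,"Relative_Humidity":93.0}]'
recentdata = '[{"STATIONS_ID":"44","Date":1526774400000,"Relative_Humidity":91.0}]'

# Parse each string into a list of dicts, then join the two lists.
merged = json.loads(historicdata) + json.loads(recentdata)

# Serialize back to a single JSON array string if a string is what you need.
alldata = json.dumps(merged)

# The slicing approach yields equivalent JSON as long as neither array is empty.
sliced = historicdata[:-1] + ',' + recentdata[1:]
```

If the DataFrames themselves are still available, concatenating them with pandas.concat before a single to_json call would likely be simpler than merging the strings afterwards.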

Related

Issue using a variable with an r-string in Python

Fairly new to Python, and I've got a batch job from which I now have to start saving some extracts out to a company SharePoint site. I've searched around and cannot seem to find a solution to the issue I keep running into. I need to pass a date into the filename, and was first having issues with using a normal string. If I just type out the entire thing as a raw string, I get the output I want:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\2021-02-15_aRoute.xlsx"
print (x)
The output is: \\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\2021-02-15_aRoute.xlsx
However, if I break the string into its parts so I can get a parameter in there, I wind up having to toss an extra double quote on the "x" parameter to keep the code from running into a "SyntaxError: EOL while scanning string literal" error:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\""
timestamp = date_time_obj.date().strftime('%Y-%m-%d')
filename = "_aRoute.xlsx"
print (x + timestamp + filename)
But the output I get passes that unwanted double quote into my string: \\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\"2021-02-15_aRoute.xlsx
The syntax I need is clearly escaping me, I'm just trying to get the path built so I can save the file itself. If it happens to matter, I'm using pandas to write the file:
data = pandas.read_sql(sql, cnxn)
data.to_excel(string_goes_here)
Any help would be greatly appreciated!
Per the comment from @Matthias, it turns out that a raw string literal can't end with an odd number of backslashes, so it can't end with a single one. The quick workaround, therefore, was:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts" + "\\"
The comment from @sammywemmy also linked to what looks to be a much more thorough solution.
Thank you both!
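Another way to sidestep the trailing-backslash problem is to leave the separator out of the literal and let a path-joining function insert it. This is only a sketch: the fixed date stands in for date_time_obj from the question, and ntpath is used so the Windows joining rules apply on any platform (on Windows, os.path.join behaves the same way).

```python
import ntpath
from datetime import date

# Raw string without a trailing backslash, so it is a valid literal.
base = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts"

# Stand-in for date_time_obj.date() from the question.
timestamp = date(2021, 2, 15).strftime("%Y-%m-%d")

# ntpath.join inserts the backslash between the folder and the file name.
path = ntpath.join(base, timestamp + "_aRoute.xlsx")
print(path)
```

The resulting string can then be passed to data.to_excel(path).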

How to deserialize split JSON data

I stream data via Server-Sent Events and get about 500,000 datasets, but instead of getting one JSON document I get this (an example with 2 of the 500,000 datasets; this is how it looks when opened in gedit, where all quotation marks are \" and all new lines are \n):
data:{\"data\":[\"Kendrick\",\"Lamar\"]}\n\ndata:{\"data\":[\"David\",\"Bowie\"]}\n\n
... -
My goal is to get this into a database. My plan was to put it into a dictionary and then create a pandas DataFrame, from which I should be able to get it into a database. But this turned out to be quite cumbersome. I ended up with something like this:
c1 = data_json[1:-1]
c2 = c1.replace('{data:{', '{\"data\":{')
c3 = c2.replace('}data:{', ', ')
c4 = '{' + c3 + '}'
but even here I have some problems, since I have to add \n\n for the new lines. But as soon as I change c3 to c2.replace('}\n\ndata:{', ', ') I get Process finished with exit code 137 (interrupted by signal 9: SIGKILL). Coming from .NET, I could handle this quite easily with a deserializer, and I am wondering if there is a similar way to deserialize the data.
I get the data via sseclient and would be able to store them as bytes instead of string, if this would help, just fyi.
Any suggestions?
Juggling replaces is of course a convoluted path: the language has parsers for this kind of escaping built in. The simplest would be to pass the string that contains JSON through an eval call. But eval is seldom needed and should be avoided in most cases as "not elegant", if not outright unsafe (although it is only unsafe when you have no control over the input data, and even then, ast.literal_eval instead of plain eval can mitigate that). Anyway, other problems with the format would prevent eval from working outright: the missing quotes around the outermost data:, for example.
Random rants apart, if your file content is actually:
data:{\"data\":[\"Kendrick\",\"Lamar\"]}\n\ndata:{\"data\":[\"David\",\"Bowie\"]}\n\n
It has two problems: "under-quoting" of the outermost data and "over-escaping" of the inner data.
On an interactive Python session, using the "raw string" marker I can input your example line as it will be read from a file:
In [263]: a = r"""data:{\"data\":[\"Kendrick\",\"Lamar\"]}\n\ndata:{\"data\":[\"David\",\"Bowie\"]}\n\n"""
In [264]: print(a)
data:{\"data\":[\"Kendrick\",\"Lamar\"]}\n\ndata:{\"data\":[\"David\",\"Bowie\"]}\n\n
So, on to removing one level of backslashes. Python has a "unicode_escape" text encoding, but it only decodes from bytes objects. We therefore resort to the "latin1" encoding first, as it provides a byte-for-byte conversion of the text in "a" to bytes (as long as every character is below U+0100), and then apply unicode_escape to remove the "\":
In [266]: b = a.encode("latin1").decode("unicode_escape")
In [267]: print(b, "\n", repr(b))
data:{"data":["Kendrick","Lamar"]}
data:{"data":["David","Bowie"]}
'data:{"data":["Kendrick","Lamar"]}\n\ndata:{"data":["David","Bowie"]}\n\n'
Now it is easy to parse: we split the resulting string at "\n\n" and get a list with one record (what you are calling a "dataset") per element. Then we resort to string manipulation to get rid of the leading "data:", and finally json.loads can work on the remaining part.
So:
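import json

with open("mystrangefile.pseudo_json") as f:
    raw_data = f.read()
# Undo one level of backslash escaping (assumes Latin-1-encodable content).
data = raw_data.encode("latin1").decode("unicode_escape")
# Skip empty chunks, drop each record's "data:" prefix and parse the rest.
records = [json.loads(record.split(":", 1)[-1]) for record in data.split("\n\n") if record.strip()]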
And "records" should now contain well-behaved Python dictionaries that you can put in a database. (Unless pandas can map the columns to a database automatically, it seems to be an unneeded step: a raw connection.executemany(""" INSERT ...""", records) with a properly opened DB connection should suffice.)
Also, as a side note, you mentioned that you could handle this easily with a .NET deserializer: that is only true if your files are not as broken as the sample you have shown us. No standard deserializer could know how to handle such a specific data format out of the box. But if you really are that much more proficient in another language or technology, you could write just a converter from the broken input to a properly encoded file, and use that as an intermediate step.
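The steps described in this answer can be put together in one runnable sketch. It assumes the input really contains literal backslash escapes, as in the sample from the question. The parse_events helper, the artists table and its column names are hypothetical, and an in-memory SQLite database stands in for whatever database is actually used.

```python
import json
import sqlite3

# Sample from the question; the raw string keeps the backslashes literal.
raw_data = r'data:{\"data\":[\"Kendrick\",\"Lamar\"]}\n\ndata:{\"data\":[\"David\",\"Bowie\"]}\n\n'


def parse_events(raw):
    """Yield one parsed JSON payload per 'data:' record."""
    # Remove one level of backslash escaping.
    text = raw.encode("latin1").decode("unicode_escape")
    for chunk in text.split("\n\n"):
        # Separate the 'data' prefix from the JSON payload; skip empty chunks.
        prefix, _, payload = chunk.strip().partition(":")
        if prefix == "data" and payload:
            yield json.loads(payload)


records = list(parse_events(raw_data))

# Hypothetical schema: one row per record, two text columns.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE artists (first_name TEXT, last_name TEXT)")
connection.executemany(
    "INSERT INTO artists (first_name, last_name) VALUES (?, ?)",
    [tuple(record["data"]) for record in records],
)
rows = connection.execute(
    "SELECT first_name, last_name FROM artists ORDER BY rowid"
).fetchall()
print(rows)
```

Filtering out empty chunks matters here because the input ends with a record separator, so the final split element is an empty string that json.loads would reject.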
I'm not completely sure if I understood the format in which you get the string correctly, so please correct me if I'm wrong here:
data_json = 'data:{\\"data\\":[\\"Kendrick\\",\\"Lamar\\"]}\\n\\ndata:{\\"data\\":[\\"David\\",\\"Bowie\\"]}\\n\\n'
Your first line seems to strip the first and last character, which I don't see. Are there any additional characters you are stripping away here?
The two following substring replacements seem to have no effect as the substrings are not present in the initial string (if I got it correctly in the first place).
And finally, in the last line you are wrapping your result with { and }, which is not correct for lists in JSON. It should be [...].
I can't really tell why you would get a SIGKILL here, though. It does not throw any errors for me, it just does not do what you want it to do. Maybe you're running out of memory with all the 500k examples?
However, this would be a working solution (again, given that I got the initial string correctly):
c1 = data_json.replace('\\n\\n', '') # removing escaped newlines
c2 = c1.replace('data:', ',') # replacing the additional 'data:' with json delimiter ','
c3 = c2.replace('\\', '') # removing artificial escapes
c4 = c3[1:] # removing leading ',' (introduced in c2); the newlines are already gone after c1
c5 = '[' + c4 + ']' # wrapping as list
Now you should be able to json.loads(c5) or whatever you need to do with that string.

How to remove double quotes from strings in a list in python?

I am trying to get some data into a list of dictionaries.
The data comes from a CSV file, so it's all strings.
The keys in the file all have double quotes, but since these are all strings, I want to remove them so they look like this in the dictionary:
{'key':value}
instead of this
{'"key"':value}
I tried simply using string = string[1:-1], but this doesn't work...
Here is my code:
csvDelimiter = ","
tsvDelimiter = "\t"
dataOutput = []
dataFile = open("browser-ww-monthly-201305-201405.csv","r")
for line in dataFile:
    line = line[:-1] # Removes \n after every line
    data = line.split(csvDelimiter)
    for i in data:
        if type(i) == str: # Doesn't work, I also tried if isinstance(i, str)
                           # but that didn't work either.
            print i
            i = i[1:-1]
            print i
    dataOutput.append({data[0] : data[1]})
dataFile.close()
print "Data output:\n"
print dataOutput
All the prints I get from print i are good, without double quotes, but when I append data to dataOutput, the quotes are back!
Any idea how to make them disappear forever?
Strip it. For example:
data[0].strip('"')
However, when reading CSV files, the best approach is to use the built-in csv module. It takes care of this for you.
As noted in the comments, when dealing with CSV files you truly ought to use Python's built-in csv module (linking to Python 2 docs since it seems that's what you're using).
Another thing to note is that when you do:
data = line.split(csvDelimiter)
every item in the returned list will be a string. There's no sense in doing a type check in the loop (though if there were a reason to, you would use isinstance). I don't know what "didn't work" about it, though it's possible you were using unicode strings. On Python 2 you can usually use isinstance(..., basestring), where basestring is a base class for both str and unicode. On Python 3 just use str unless you know you're dealing with bytes.
You said: "I tried simply using string = string[1:-1], but this doesn't work...". It seems to work fine for me:
In [101]: s="'word'"
In [102]: s[1:-1]
Out[102]: 'word'
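The slicing itself works, as shown above. The quotes come back in the question's code because i = i[1:-1] only rebinds the loop variable i; the strings stored in data are never replaced, so data[0] and data[1] still carry their quotes when they are appended. The csv module suggested earlier avoids the problem altogether. The following is a minimal sketch in Python 3 syntax, with a made-up sample standing in for the real file contents.

```python
import csv
import io

# Hypothetical sample standing in for browser-ww-monthly-201305-201405.csv.
sample = '"Date","Chrome"\n"2013-05","41.38"\n"2013-06","42.68"\n'

# csv.reader strips the surrounding double quotes while parsing each row.
with io.StringIO(sample) as data_file:
    data_output = [{row[0]: row[1]} for row in csv.reader(data_file)]

print(data_output)
```

With a real file, opening it with open(filename, newline="") in place of io.StringIO is the usual pattern for the csv module in Python 3.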

Splitting a string in Python 2.7

I want to know how to allow multiple inputs in Python.
Ex: If a message is "!comment postid customcomment"
I want to be able to take that post ID, put that somewhere, and then the customcomment, and put that somewhere else.
Here's my code:
import fb
token="access_token_here"
facebook=fb.graph.api(token)
#__________ Later on in the code: __________
elif msg.startswith('!comment '):
    postid = msg.replace('!comment ','',1)
    send('Commenting...')
    facebook.publish(cat="comments", id=postid, message="customcomment")
    send('Commented!')
I can't seem to figure it out.
Thank you in advance.
I can't quite tell what you are asking but it seems that this will do what you want.
Assuming that
msg = "!comment postid customcomment"
you can use the built-in string method split to turn the string into a list of strings, using " " as a separator and a maximum number of splits of 2:
msg_list=msg.split(" ",2)
The zeroth element will contain "!comment", so you can ignore it:
postid = msg_list[1]  # or int(msg_list[1]) if you need a numerical input
message = msg_list[2]
If you don't limit split and just use the default behavior (i.e. msg_list=msg.split()), you would have to rejoin the rest of the strings separated by spaces. To do so you can use the built-in string method join, which does just that:
message=" ".join(msg_list[2:])
and finally
facebook.publish(cat="comments", id=postid, message=message)
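The splitting step can be wrapped in a small helper that also copes with a missing post ID or comment text. This is only a sketch; parse_comment is a hypothetical name, and the facebook.publish call from the question is left out so the example runs on its own.

```python
def parse_comment(msg):
    """Split '!comment <postid> <text>' into (postid, text); return None if malformed."""
    # At most two splits, so spaces inside the comment text are preserved.
    parts = msg.split(" ", 2)
    if len(parts) != 3 or parts[0] != "!comment":
        return None
    _, postid, message = parts
    return postid, message


print(parse_comment("!comment 12345 Nice post, thanks!"))
print(parse_comment("!comment 12345"))
```

The returned pair could then be passed on as id=postid and message=message in the publish call shown above.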

String Delimiter in Python

I want to split a string using "},{" as the delimiter. I have tried various things but none of them work.
string="2,1,6,4,5,1},{8,1,4,9,6,6,7,0},{6,1,2,3,9},{2,3,5,4,3 "
Split it into something like this:
2,1,6,4,5,1
8,1,4,9,6,6,7,0
6,1,2,3,9
2,3,5,4,3
string.split("},{") works at the Python console, but if I write a Python script that performs this operation it does not work.
You need to assign the result of string.split("},{") to a new variable, because split returns a list and leaves the original string unchanged. For example:
string2 = string.split("},{")
I think that is the reason you think it works at the console but not in scripts. In the console it just prints out the return value, but in the script you want to make sure you use the returned value.
You need to return the result to the caller. Assigning to the string parameter doesn't change the caller's variable, so those changes are lost.
def convert2list(string):
    string = string.strip()
    # Drop the outer "{{" and "}}" before splitting on the inner delimiter.
    string = string[2:len(string)-2].split("},{")
    # Return to caller.
    return string
# Grab return value.
converted = convert2list("{{1,2},{3,4}}")
You could do it in steps:
Split at the commas between the groups to get "{...}" strings.
Remove leading and trailing curly braces.
It might not be the most Pythonic or efficient, but it's general and doable.
I was taking the input from the console in the form of arguments to the script....
So when I was taking the input as {{2,4,5},{1,9,4,8,6,6,7},{1,2,3},{2,3}} it was not coming properly in the arg[1] .. so the split was basically splitting on an empty string ...
If I run the below code from a script file (in Python 2.7):
string="2,1,6,4,5,1},{8,1,4,9,6,6,7,0},{6,1,2,3,9},{2,3,5,4,3 "
print string.split("},{")
Then the output I got is:
['2,1,6,4,5,1', '8,1,4,9,6,6,7,0', '6,1,2,3,9', '2,3,5,4,3 ']
And the below code also works fine:
string="2,1,6,4,5,1},{8,1,4,9,6,6,7,0},{6,1,2,3,9},{2,3,5,4,3 "
def convert2list(string):
    string=string.strip()
    string=string[:len(string)].split("},{")
    print string
convert2list(string)
Use This:
This will split the string considering },{ as a delimiter and print the list with line breaks.
string = "2,1,6,4,5,1},{8,1,4,9,6,6,7,0},{6,1,2,3,9},{2,3,5,4,3"
for each in string.split('},{'):
    print each
Output:
2,1,6,4,5,1
8,1,4,9,6,6,7,0
6,1,2,3,9
2,3,5,4,3
If you want to print the split items in the list only you can use this simple print option.
string = "2,1,6,4,5,1},{8,1,4,9,6,6,7,0},{6,1,2,3,9},{2,3,5,4,3"
print string.split('},{')
Output:
['2,1,6,4,5,1', '8,1,4,9,6,6,7,0', '6,1,2,3,9', '2,3,5,4,3']
Quite simply, you have to use the split() method with "},{" as the delimiter, keep the list it returns, and then print its items one by one, like the following:
parts = string.split("},{")
for i in range(0, len(parts)):
    print(parts[i])
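Pulling the answers together, here is a minimal sketch in Python 3 syntax that accepts either form of the input seen in this thread (with or without the outer double braces) and returns lists of integers. parse_groups is a hypothetical helper name, and converting to int is an assumption about what the values are for.

```python
def parse_groups(text):
    """Turn '{{1,2},{3,4}}' or '1,2},{3,4' into [[1, 2], [3, 4]]."""
    text = text.strip()
    # Drop the outer braces when the full '{{...}}' form is given.
    if text.startswith("{{") and text.endswith("}}"):
        text = text[2:-2]
    # Split into groups first, then split each group into numbers.
    return [[int(n) for n in group.split(",")] for group in text.split("},{")]


print(parse_groups("{{2,4,5},{1,9,4,8,6,6,7},{1,2,3},{2,3}}"))
print(parse_groups("2,1,6,4,5,1},{8,1,4,9,6,6,7,0},{6,1,2,3,9},{2,3,5,4,3 "))
```

If the value arrives as a command-line argument, quoting it in the shell is probably necessary, since shells such as bash can expand unquoted braces before the script ever sees them.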
