String conversion to dataframe

In this screenshot, data (a string) and df2 (a pandas dataframe) hold the same kind of data: a timestamp and a value.
How do I get data into a dataframe with the same layout, so that I can append its values to df2 and end up with all the data records and all the df2 records in one dataframe that matches the current format of df2?
I can post what I've tried so far, but all I get is errors :(

import ast
import pandas as pd
data = "[[1212.1221, -10.5],[2232.55, -19.44],[32432.87655, -445.88]]"
df = pd.DataFrame(ast.literal_eval(data),
                  columns=['index', 'data'])
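To end up with everything in one frame, the parsed rows can then be concatenated with the existing dataframe. A minimal sketch, assuming df2 has the same two columns; the df2 built here is a hypothetical stand-in, since the real one is only visible in the screenshot:

```python
import ast

import pandas as pd

# The string from the question: a list of [timestamp, value] pairs.
data = "[[1212.1221, -10.5],[2232.55, -19.44],[32432.87655, -445.88]]"

# Hypothetical stand-in for df2; the real column names may differ.
df2 = pd.DataFrame({"index": [100.5, 200.25], "data": [-1.0, -2.0]})

# Parse the string safely (literals only) and reuse df2's column names.
df = pd.DataFrame(ast.literal_eval(data), columns=df2.columns)

# Stack both frames and renumber the rows from 0.
combined = pd.concat([df2, df], ignore_index=True)
print(combined)
```

Reusing df2.columns keeps the new rows aligned with the existing format, and ignore_index=True avoids duplicate row labels in the result.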

It looks like your string data is correctly formatted JSON (which, as far as I know, looks much like Python lists and dictionaries but requires double quotes rather than single quotes). Try:
import json
parsed = json.loads(data)
This converts your string into the matching Python object (a list of lists for data like the example above, or a dict for a JSON object), from which you can easily create and manipulate DataFrames.
EDIT:
If any of your strings use single quotes, you can work around this with data.replace("'", "\"") to convert them to double quotes. This only causes problems if the data itself contains apostrophes or incorrectly paired quotes.
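A short sketch of the JSON route, assuming the string looks like the list-of-pairs example; the column names are illustrative and should match whatever df2 uses:

```python
import json

import pandas as pd

data = "[[1212.1221, -10.5],[2232.55, -19.44],[32432.87655, -445.88]]"

# For this input json.loads returns a list of lists, not a dict.
rows = json.loads(data)

# Column names are an assumption; adjust them to the target dataframe.
df = pd.DataFrame(rows, columns=["index", "data"])
print(df)
```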

Related

How to prevent NaN/NULL values from a pandas df getting quoted when using the csv.QUOTE_ALL option?

I have a pandas df which I need to load into a Postgres db. Since some values contain special characters and delimiters, I need to use the quoting=csv.QUOTE_ALL option as follows:
df.to_csv(r"test.csv",index=False,sep='|',quoting=csv.QUOTE_ALL)
csv.QUOTE_ALL ensures special characters are handled properly. However, it also quotes NaN values from the df in the csv file. So when I load from csv to db using the COPY command, it throws an error for non-character dtype columns:
invalid input syntax for timestamp - ""
This is because the output column data type is timestamp but the value is "", as all values were quoted.
So how can I prevent NaN values from the df getting quoted?
Thanks in advance.
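One possible workaround is to build the CSV text by hand, quoting every non-null field and leaving nulls as unquoted empty fields, which COPY in CSV mode reads as NULL by default. This is only a sketch: to_csv_quote_non_null is a hypothetical helper, not a pandas API, and the sample frame is made up.

```python
import pandas as pd


def to_csv_quote_non_null(df: pd.DataFrame, sep: str = "|") -> str:
    """Return CSV text with every non-null field quoted and nulls left empty."""

    def quote(value) -> str:
        # Double any embedded quote characters, as the CSV format expects.
        return '"' + str(value).replace('"', '""') + '"'

    lines = [sep.join(quote(col) for col in df.columns)]
    for row in df.itertuples(index=False, name=None):
        # Nulls (NaN, None, NaT) become empty, unquoted fields.
        lines.append(sep.join("" if pd.isna(v) else quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample data with a delimiter inside a value and two nulls.
df = pd.DataFrame(
    {
        "name": ["a|b", None],
        "created": pd.to_datetime(["2021-01-01 10:00:00", None]),
    }
)
csv_text = to_csv_quote_non_null(df)
print(csv_text)
```

The trade-off is that formatting of dates and floats falls back to str(), so any float_format or date_format you relied on in to_csv would need to be applied before calling the helper.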

unable to read head after converting excel to csv

I am trying to read an excel file, convert it into csv and load its head:
df = pd.read_excel("final.xlsx", sheet_name="NewCustomerList")
# df = df.to_csv()
print(df.head(3))
Without converting to csv, the results look like this:
Note: The data and information in this document is reflective of a hypothetical situation and client. \
0 first_name
1 Chickie
2 Morly
However, if I uncomment the conversion, I get this error:
'str' object has no attribute 'head'
I am guessing it's because of the first line of the data. How else can I convert this properly and read it?

to_csv() saves a table to disk when you pass it a path, and in that case it returns None. Called with no path, as in your commented line, it returns the CSV text as a string, so that line rebinds df to a str, which has no head() method.
If you just want to display the table on screen in a specific format, perhaps take a look at to_string().
If you absolutely MUST have each row of your df as a comma-separated string, then try a list comprehension:
my_csv_list = [','.join(map(str, row)) for row in df.itertuples(index=False)]
Beware of the csv format: if any data point contains a comma, you are in for a nightmare when decoding back to a table.

According to the documentation, pandas' to_csv() method returns None when it is given a path to write to, and a string otherwise.
You may then need to use something like the approach in this answer to turn the string into a dataframe again before calling head() on it.
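The round trip can be sketched as follows, assuming a small made-up frame in place of the one read from final.xlsx (the age column is hypothetical):

```python
import io

import pandas as pd

# Hypothetical stand-in for the frame read from the Excel file.
df = pd.DataFrame({"first_name": ["Chickie", "Morly"], "age": [30, 41]})

# With no path argument, to_csv returns the CSV text as a str.
csv_text = df.to_csv(index=False)

# Wrap the text in a file-like object to parse it back into a dataframe.
df_again = pd.read_csv(io.StringIO(csv_text))
print(df_again.head(3))
```

Keeping the CSV text in its own variable, rather than reassigning df, leaves the original dataframe available for head() and other methods.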

how to replace a string with different string in the column of a data frame

I have a dataframe Adult with a column workclass that has thousands of rows. The column contains different string objects. I would like to replace every string ? with the string Private. I have tried different variations of this code:
Adult.loc[:,'workclass'] = Adult.loc[:,'workclass'].replace(to_replace="?", value=str("Private"))
After running the code I do not get an error, but when I run Adult.workclass.unique() the ? is still in the dataframe. How would I go about replacing the string with the correct string?
Thanks in advance

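Try the following code (regex=False makes pandas treat ? as a literal character rather than a regular-expression operator):
Adult['workclass'] = Adult['workclass'].str.replace('?', 'Private', regex=False)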
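One common reason replace() appears to do nothing is that it matches whole values exactly, so a value such as " ?" with a leading space is not equal to "?". A hedged sketch, assuming that is the cause here; the sample frame is made up:

```python
import pandas as pd

# Hypothetical sample; the leading space in " ?" is an assumption about the data.
Adult = pd.DataFrame({"workclass": ["Private", " ?", "State-gov", "?"]})

# Strip surrounding whitespace first, then replace values that are exactly "?".
Adult["workclass"] = Adult["workclass"].str.strip().replace("?", "Private")

print(Adult["workclass"].unique())
```

Checking the raw values with repr(), for example [repr(v) for v in Adult.workclass.unique()], shows whether stray whitespace is present before you decide which approach to use.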

Python/Pandas: read nested JSON

I am reading a data table from an API that returns me the data in JSON format, and one of the columns is itself a JSON string. I succeed in creating a Pandas dataframe for the overall table, but in the process of reading it, double quotes in the JSON string get converted to single quotes, and I can't parse the nested JSON.
I can't provide a reproducible example, but here is the key code:
myResult = requests.get(myURL, headers = myHeaders).text
myDF = pd.read_json(myResult, orient = "records", dtype = {"custom": str}, encoding = "unicode_escape")
Where custom is the nested JSON string. Try as I might by setting the dtype and encoding arguments, I cannot force Pandas to preserve the double quotes in the string.
So what started off as:
"custom": {"Field1":"Value1","Field2":"Value2"}
gets into the dataframe as:
{'Field1':'Value1','Field2':'Value2'}
I found this question which suggests using a custom parser for read_csv - but I can't see that this option is available for read_json.
I found a few suggestions here but the only one I could try was manually replacing the double quotes - and this causes fresh errors because there are apostrophes contained within the nested field values themselves...
The JSON strings are formatted correctly within myResult so it's the parsing applied by read_json that's the problem. Is there any way to change that or do I need to find some other way of reading this in?
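The single quotes are most likely the text form of a Python dict: the nested object gets parsed into a dict and then converted to a string. One way around this is to parse the response with the json module and build the dataframe from the resulting records. A minimal sketch, assuming the response is a JSON array of objects; the payload below is a made-up stand-in for myResult:

```python
import json

import pandas as pd

# Hypothetical stand-in for requests.get(myURL, headers=myHeaders).text
myResult = '[{"id": 1, "custom": {"Field1": "Value1", "Field2": "O\'Brien"}}]'

records = json.loads(myResult)

# Option 1: flatten nested keys into columns such as "custom.Field1".
flat = pd.json_normalize(records)

# Option 2: keep one column holding valid JSON text for each row.
myDF = pd.DataFrame(records)
myDF["custom"] = myDF["custom"].apply(json.dumps)

print(flat)
print(myDF)
```

Because the nested values never pass through a quote-replacing step, apostrophes inside field values are preserved in both options.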

Convert numbers to strings when reading an excel spreadsheet into a pandas DataFrame

I'm reading some excel spreadsheets (xlsx format) into pandas using read_excel, which generally works great. The problem I have is that when a column contains numbers, pandas converts these to float64 type, and I would like them to be treated as strings. After reading them in, I can convert the column to str:
my_frame.my_col = my_frame.my_col.astype('str')
This works as far as assigning the right type to the column, but when I view the values in this column, the strings are formatted in scientific-format e.g. 8.027770e+14, which is not what I want. I'd like to work out how to tell pandas to read columns as strings, or do the conversion later so that I get values in their original (non-scientific) format.

pandas.read_csv() has a dtype argument:
dtype : Type name or dict of column -> type
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
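A small sketch of the dtype argument in action, using read_csv on in-memory text so it runs without a spreadsheet; the data is made up. read_excel accepts a dtype argument of the same form, so pd.read_excel(path, dtype={"my_col": str}) is the analogous call, with path standing in for your file:

```python
import io

import pandas as pd

# Hypothetical CSV text containing a long numeric identifier.
csv_text = "my_col,other\n802777000000000,1.5\n"

# Default behaviour: the column is inferred as a numeric type.
as_number = pd.read_csv(io.StringIO(csv_text))

# dtype=str for that column keeps the digits exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"my_col": str})

print(as_number.dtypes)
print(as_text["my_col"].iloc[0])
```

Reading the column as text at load time avoids the float conversion entirely, so there is no scientific notation to undo afterwards.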

I solved it with round: if you do round(number, 5), in most cases you will not lose data, and you will get zero in the case of 8.027770e+14.
