How to read and query headers of txt using Pandas? - python

I am writing a script to read txt file using Pandas.
I need to query on particular type of hearders.
Reading excel is possible but i cannot read txt file.
import pandas as pd
#df=pd.read_excel('All.xlsx','Sheet1',dtype={'num1':str},index=False) #works
df=pd.read_csv('read.txt',dtype={'PHONE_NUMBER_1':str}) #doest work
array=['A','C']
a = df['NAME'].isin(array)
b = df[a]
print(b)

try using this syntax.
you are not using the correct key value
df=pd.read_csv('read.txt',dtype={'BRAND_NAME_1':str})

You can try this:
import pandas as pd
df = pd.read_table("input.txt", sep=" ", names=('BRAND_NAME_1'), dtype={'BRAND_NAME_1':str})

You can read file txt then astype for column.
Read file:
pd.read_csv('file.txt', names = ['PHONE_NUMBER_1', 'BRAND_NAME_1'])
names: is name of columns
Assign type:
df['PHONE_NUMBER_1'] = df['PHONE_NUMBER_1'].astype(str)

Related

Bug on read csv format , assertin the header in jupyter notebook [duplicate]

I a importing a .csv file in python with pandas.
Here is the file format from the .csv :
a1;b1;c1;d1;e1;...
a2;b2;c2;d2;e2;...
.....
here is how get it :
from pandas import *
csv_path = "C:...."
data = read_csv(csv_path)
Now when I print the file I get that :
0 a1;b1;c1;d1;e1;...
1 a2;b2;c2;d2;e2;...
And so on... So I need help to read the file and split the values in columns, with the semi color character ;.
read_csv takes a sep param, in your case just pass sep=';' like so:
data = read_csv(csv_path, sep=';')
The reason it failed in your case is that the default value is ',' so it scrunched up all the columns as a single column entry.
In response to Morris' question above:
"Is there a way to programatically tell if a CSV is separated by , or ; ?"
This will tell you:
import pandas as pd
df_comma = pd.read_csv(your_csv_file_path, nrows=1,sep=",")
df_semi = pd.read_csv(your_csv_file_path, nrows=1, sep=";")
if df_comma.shape[1]>df_semi.shape[1]:
print("comma delimited")
else:
print("semicolon delimited")

Changing the string values in a column Pandas Python

I have a pandas csv file down below. In the symbol column I want to replace all the BTC/USD plots to BTCUSD. How would I be able to do that?
Code:
# read_csv function which is used to read the required CSV file
data = pd.read_csv("sample.txt")
csv file:
unix,date,symbol,open,high,low,close,Volume BTC
1544217600,2018-12-07 21:20:00,BTC/USD,3348.77,3350.41,3345.07,3345.12,3.11919918
1544217540,2018-12-07 21:19:00,BTC/USD,3342.24,3351.14,3342.24,3346.37,21.11950697
1544217480,2018-12-07 21:18:00,BTC/USD,3336.02,3336.02,3336.02,3336.02,0.0
1544217420,2018-12-07 21:17:00,BTC/USD,3332.26,3336.02,3330.69,3336.02,3.28495056
Expected Output:
unix,date,symbol,open,high,low,close,Volume BTC
1544217600,2018-12-07 21:20:00,BTCUSD,3348.77,3350.41,3345.07,3345.12,3.11919918
1544217540,2018-12-07 21:19:00,BTCUSD,3342.24,3351.14,3342.24,3346.37,21.11950697
1544217480,2018-12-07 21:18:00,BTCUSD,3336.02,3336.02,3336.02,3336.02,0.0
1544217420,2018-12-07 21:17:00,BTCUSD,3332.26,3336.02,3330.69,3336.02,3.28495056
# importing pandas module
import pandas as pd
# reading csv file
data = pd.read_csv("sample.txt")
# overwriting column with replaced value
data["Team"]= data["symbol"].str.replace("BTC/USD", "BTCUSD", case = False)
You can use str.replace like this:
df['symbol'] = df['symbol'].str.replace('BTC/USD','BTCUSD')

Python, how to add a new column in excel

I am having below file(file1.xlsx) as input. In total i am having 32 columns in this file and almost 2500 rows. Just for example i am mentioning 5 columns in screen print
I want to edit same file with python and want output as (file1.xlsx)
it should be noted i am adding one column named as short and data is a kind of substring upto first decimal of data present in name(A) column of same excel.
Request you to please help
Regards
Kawaljeet
Here is what you need...
import pandas as pd
file_name = "file1.xlsx"
df = pd.read_excel(file_name) #Read Excel file as a DataFrame
df['short'] = df['Name'].str.split(".")[0]
df.to_excel("file1.xlsx")
hello guys i solved the problem with below code:
import pandas as pd
import os
def add_column():
file_name = "cmdb_inuse.xlsx"
os.chmod(file_name, 0o777)
df = pd.read_excel(file_name,) #Read Excel file as a DataFrame
df['short'] = [x.split(".")[0] for x in df['Name']]
df.to_excel("cmdb_inuse.xlsx", index=False)

Python Datacompy library: how to save report string into a csv file?

I'm comparing between two data frames using Datacompy, but how can I save the final result as an excel sheet or csv file? I got a string as an output, but how can I save it as a CSV.
import pandas as pd
df1_1=pd.read_csv('G1-1.csv')
df1_2=pd.read_csv('G1-2.csv')
import datacompy
compare = datacompy.Compare(
df1_1,
df1_2,
join_columns='SAMPLED CONTENT (URL to content)',
)
print(compare.report())
I have tried this, and it worked for me:
with open('//Path', encoding='utf-8') as report_file:
report_file.write(compare.report())
If you just using pandas, you can try pandas's own way to write into csv:
> df = pd.DataFrame([['yy','rr'],['tt', 'rr'],['cc', 'rr']], index=range(3),
columns=['a', 'b'])
> df.to_csv('compare.csv')
I hadn't used datacompy, but I suggest that you can make your results into a dataframe, then you can use the to_csv way.
This is working fine me also Full code
compare = datacompy.Compare(
Oracle_DF1,PostgreSQL_DF2,
join_columns=['c_transaction_cd','c_anti_social_force_req_id'], #You can also specify a list of columns
abs_tol=0,
rel_tol=0,
df1_name = 'Oracle Source',
df2_name = 'PostgrSQL Reference'
)
compare.matches(ignore_extra_columns=False)
Report = compare.report() csvFileToWrite=r'D://Postgres_Problem_15Feb21//Oracle_PostgreSQLDataFiles//Sample//summary.csv'
with open(csvFileToWrite,mode='r+',encoding='utf-8') as report_file:
report_file.write(compare.report())

Read json file as pandas dataframe?

I am using python 3.6 and trying to download json file (350 MB) as pandas dataframe using the code below. However, I get the following error:
data_json_str = "[" + ",".join(data) + "]
"TypeError: sequence item 0: expected str instance, bytes found
How can I fix the error?
import pandas as pd
# read the entire file into a python array
with open('C:/Users/Alberto/nutrients.json', 'rb') as f:
data = f.readlines()
# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)
# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON object
# basically... add square brackets to the beginning
# and end, and have all the individual business JSON objects
# separated by a comma
data_json_str = "[" + ",".join(data) + "]"
# now, load it into pandas
data_df = pd.read_json(data_json_str)
From your code, it looks like you're loading a JSON file which has JSON data on each separate line. read_json supports a lines argument for data like this:
data_df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
Note
Remove lines=True if you have a single JSON object instead of individual JSON objects on each line.
Using the json module you can parse the json into a python object, then create a dataframe from that:
import json
import pandas as pd
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
data = json.load(f)
df = pd.DataFrame(data)
If you open the file as binary ('rb'), you will get bytes. How about:
with open('C:/Users/Alberto/nutrients.json', 'rU') as f:
Also as noted in this answer you can also use pandas directly like:
df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
if you want to convert it into an array of JSON objects, I think this one will do what you want
import json
data = []
with open('nutrients.json', errors='ignore') as f:
for line in f:
data.append(json.loads(line))
print(data[0])
The easiest way to read json file using pandas is:
pd.read_json("sample.json",lines=True,orient='columns')
To deal with nested json like this
[[{Value1:1},{value2:2}],[{value3:3},{value4:4}],.....]
Use Python basics
value1 = df['column_name'][0][0].get(Value1)
Please the code below
#call the pandas library
import pandas as pd
#set the file location as URL or filepath of the json file
url = 'https://www.something.com/data.json'
#load the json data from the file to a pandas dataframe
df = pd.read_json(url, orient='columns')
#display the top 10 rows from the dataframe (this is to test only)
df.head(10)
Please review the code and modify based on your need. I have added comments to explain each line of code. Hope this helps!

Categories

Resources