I'm fairly new to Python.
I want to parse a file with tab-separated (\t) values; images below. How do I remove the \t from the file and separate the values into columns?
Code below.
import pandas as pd
import io
import requests
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt"
s = requests.get(url).content
df = pd.read_csv(io.StringIO(s.decode('utf-8')))
How it looks right now
How I want it to look
Pass sep="\t" to pd.read_csv. The data is messy: some values are separated by a double tab, so those need to be replaced with a single tab first:
df = pd.read_csv(
    io.StringIO(s.decode('utf-8').replace("\t\t", "\t")),
    header=None, sep="\t")
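As an alternative to replacing the double tabs by hand, a regular-expression separator can treat any run of tabs as one delimiter. This is only a minimal sketch: the sample string below is made up to imitate the file, and it assumes that no field is meant to be empty (an intentionally empty field would be swallowed). A regex separator needs engine="python".

```python
import io

import pandas as pd

# Hypothetical sample imitating the file: the second row contains a double tab.
sample = "15.26\t14.84\t0.871\n14.88\t\t14.57\t0.8811\n"

# r"\t+" matches one or more tabs, so a double tab counts as a single separator.
# Regex separators are only supported by the Python parsing engine.
df = pd.read_csv(io.StringIO(sample), sep=r"\t+", header=None, engine="python")

print(df.shape)
```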
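If using the csv library is an option, you can try:
import pandas as pd
import requests
import csv
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt"
# .text is the decoded string, so it can be written to a file opened in text mode
raw_data = requests.get(url).text
with open("raw_data.txt", "w", newline="") as file:
    file.write(raw_data)
# In Python 3 the csv module expects a text-mode file, not 'rb'
with open("raw_data.txt", newline="") as file:
    data = list(csv.reader(file, delimiter="\t"))
df = pd.DataFrame.from_records(data)
print(df)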
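The temporary file is not strictly needed, since csv.reader accepts any iterable of lines. Here is a rough sketch that parses the text in memory and applies the same double-tab replacement used in the first answer. The sample string is invented for illustration and stands in for the downloaded text.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded text.
raw_text = "15.26\t14.84\t0.871\n14.88\t\t14.57\t0.8811\n"

# Replace double tabs so they do not produce empty fields.
cleaned = raw_text.replace("\t\t", "\t")

# io.StringIO wraps the string in a file-like object that csv.reader can iterate.
rows = list(csv.reader(io.StringIO(cleaned), delimiter="\t"))

print(rows)
```

Note that csv.reader returns every value as a string, so numeric columns would still need converting afterwards.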
I'd like to create a function in Python that imports CSV files from GitHub by just passing in the file name.
I tried the following code, but no DataFrame is created. Can anyone give me a hand? Many thanks.
import pandas as pd
def in_csv(file_name):
    file = 'https://raw.githubusercontent.com/USER/file_name.csv'
    file_name = pd.read_csv(file, header = 0)
in_csv('csv_name')
There are many ways to read a CSV, but with a pandas.DataFrame:
import pandas as pd
def in_csv(file_name):
    file_path = f'https://raw.githubusercontent.com/USER/{file_name}'
    df = pd.read_csv(file_path, header = 0)
    return df
df = in_csv('csv_name')
print(df.head())
Thanks @Alian and @Aaron.
Just add '.csv' after {file_name} and it works perfectly.
import pandas as pd
def in_csv(file_name):
    file_path = f'https://raw.githubusercontent.com/USER/{file_name}.csv'
    df = pd.read_csv(file_path, header = 0)
    return df
df = in_csv('csv_name')
To do this, you probably want to use a Python 3 f-string rather than a regular string. In this case, change the first line in the function to this:
file = f'https://raw.githubusercontent.com/USER/{file_name}.csv'
The f'{}' syntax substitutes the value of the variable or expression inside the braces, whereas your original string contains the literal text file_name.
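To make the difference concrete, here is a small sketch comparing the two strings. BASE_URL and csv_url are names made up for this example, and USER is the placeholder from the question, not a real account.

```python
# Hypothetical base URL; USER is the placeholder used in the question.
BASE_URL = "https://raw.githubusercontent.com/USER"


def csv_url(file_name):
    """Build the raw URL for a CSV file whose name is given without the extension."""
    return f"{BASE_URL}/{file_name}.csv"


# A regular string keeps the text "file_name" literally, whatever the argument is.
literal = "https://raw.githubusercontent.com/USER/file_name.csv"

print(csv_url("csv_name"))
print(literal)
```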
By using this code:
import pandas as pd
patients_df = pd.read_json('/content/students.json',lines=True)
patients_df.head()
the data is shown in tabular form, like this:
The main JSON file looks like this:
import json
data = []
for line in open('/content/students.json', 'r'):
    data.append(json.loads(line))
How can I split the scores column of the table into separate, organized columns named Exam, Quiz, and Homework?
A possible solution is the following:
# pip install pandas
import pandas as pd
import json
def separate_column(row):
    # Copy each entry of the "scores" list into its own column, named after its type
    for e in row["scores"]:
        row[e["type"]] = e["score"]
    return row
with open('/content/students.json', 'r') as file:
    # The file holds one JSON object per line
    data = [json.loads(line.rstrip()) for line in file]
df = pd.json_normalize(data)
df = df.apply(separate_column, axis=1)
df = df.drop(['scores'], axis=1)
print(df)
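The same reshaping can also be done on the plain dictionaries before any DataFrame is built. This is a rough sketch under assumptions: the two sample lines are invented, and the "_id" and "name" fields are hypothetical. Only the "scores" list with "type" and "score" keys is taken from the code above.

```python
import json

# Hypothetical lines in the assumed shape of students.json (one JSON object per line).
lines = [
    '{"_id": 0, "name": "aimee", "scores": [{"type": "exam", "score": 1.5},'
    ' {"type": "quiz", "score": 2.5}, {"type": "homework", "score": 3.5}]}',
    '{"_id": 1, "name": "bob", "scores": [{"type": "exam", "score": 4.0},'
    ' {"type": "quiz", "score": 5.0}, {"type": "homework", "score": 6.0}]}',
]


def flatten(record):
    """Return a copy of the record with each score moved to a key named after its type."""
    flat = {key: value for key, value in record.items() if key != "scores"}
    for entry in record["scores"]:
        flat[entry["type"]] = entry["score"]
    return flat


rows = [flatten(json.loads(line)) for line in lines]
print(rows[0])
```

Passing rows to pd.DataFrame should then give one column per score type, without the row-wise apply.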
I have the following code to import some data from a website, and I want to convert my data variable into a DataFrame.
I've tried pd.DataFrame and pd.read_csv(io.StringIO(data), sep=";"), but both always give me an error.
import requests
import io
# load file
data = requests.get('https://www.omie.es/sites/default/files/dados/AGNO_2020/MES_08/TXT/INT_PDBC_MARCA_TECNOL_1_01_08_2020_31_08_2020.TXT').content
# decode data
data = data.decode('latin-1')
# skip first 2 rows
data = data.split('\r\n')[2::]
del data[1]
# trying to fix csv structure
lines = []
lines_2 = []
for line in data:
    line = ';'.join(line.split(';'))
    if len(line) > 0 and line[0].isdigit():
        lines.append(line)
        lines_2.append(line)
    else:
        if len(lines) > 0:
            lines_2.append(lines_2[-1] + line)
            lines_2.remove(lines_2[-2])
        else:
            lines.append(line)
data = '\r\n'.join(lines_2)
print(data)
The expected output should look like this:
date 1 2
0 29/08/2020 HI RE ....
1 30/08/2020 HI RE ....
2 31/08/2020 HI RE ...
A few rows need to be appended to the previous one (the main rows should be the ones that start with a date).
Prayson's answer is correct, but the skiprows parameter should also be used (otherwise the metadata is interpreted as column names).
import pandas as pd
df = pd.read_csv(
    "https://www.omie.es/sites/default/files/dados/AGNO_2020/MES_08/TXT/INT_PDBC_MARCA_TECNOL_1_01_08_2020_31_08_2020.TXT",
    sep=";",
    skiprows=2,
    encoding='latin-1',
)
print(df)
You can read text/CSV data directly from a URL with pandas:
import pandas as pd
URI = 'https://www.omie.es/sites/default/files/dados/AGNO_2020/MES_08/TXT/INT_PDBC_MARCA_TECNOL_1_01_08_2020_31_08_2020.TXT'
df = pd.read_csv(URI, sep=';', encoding='latin1')
print(df)
pandas does the downloading for you, so there is no need for requests or io.StringIO.
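If some records really are split across several lines, as the question describes, the continuation lines could be merged before parsing. The following is only a sketch: merge_continuations is a made-up helper, the sample lines are invented, and it assumes that every main row starts with a digit (the date) while continuation lines do not.

```python
def merge_continuations(lines):
    """Append each line that does not start with a digit to the previous record."""
    merged = []
    for line in lines:
        if not line:
            continue  # skip empty lines
        if line[0].isdigit() or not merged:
            merged.append(line)  # a new record (or the very first line)
        else:
            merged[-1] += line  # continuation of the previous record
    return merged


# Hypothetical sample: two dated rows, each followed by a continuation line.
sample = ["29/08/2020;HI;RE;", "TER;NU;", "30/08/2020;HI;", "RE;"]
records = merge_continuations(sample)
print(records)
```

The merged lines can then be joined with "\n" and handed to pd.read_csv through io.StringIO with sep=";".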
Below is the code I am using to write messy data to a CSV file in table format:
import requests
from bs4 import BeautifulSoup
import csv
f = open('moneyControl-bonus','w' , newline = '')
writer = csv.writer(f)
f2 = open('moneyControl-dividend','w' , newline = '')
writer2 = csv.writer(f2)
url = 'https://www.moneycontrol.com/stocks/marketinfo/upcoming_actions/home.html'
headers = {'user-agent':'Mozilla/5.0'}
response = requests.get(url,headers)
soup = BeautifulSoup(response.content,'lxml')
div = soup.find_all('div',class_='tbldata36 PT10')[0]
for table in div.find_all('table'):
    for row in table.find_all('tr'):
        writer.writerow[row]
div2 = soup.find_all('div',class_='tbldata36 PT20')[0]
for table2 in soup.find_all('table'):
    for row2 in table2.find_all('tr'):
        writer2.writerow[row2]
Have you tried using pandas? It is a widely used library for writing data in CSV format.
# pip install pandas
import pandas as pd
You can build the DataFrame from either a dictionary or a list of lists, and then write it to a CSV file.
# From a dictionary: each key is a column
df = pd.DataFrame({'col1':[1,2,3,4],'col2':['a','b','c','d']})
# From a list of lists: each inner list is one row
df = pd.DataFrame([[1,'a'],[2,'b'],[3,'c'],[4,'d']],columns=['col1','col2'])
df.to_csv(path_and_name_of_file)
pandas can also write many other formats, such as Excel, plain text, and JSON.
Please take a look at the official DataFrame documentation.
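If you would rather stay with the csv module, note that the question's code indexes writer.writerow with square brackets. writerow is a method, so it has to be called with parentheses, and it expects a sequence of cell values rather than a whole row tag. Here is a minimal sketch. The rows below are invented and stand in for the text extracted from each table row, and an in-memory buffer is used instead of a file.

```python
import csv
import io

# Hypothetical cell texts standing in for what would be extracted from each <tr>.
table_rows = [["Company", "Ratio"], ["Example Ltd", "1:1"]]

buffer = io.StringIO()
writer = csv.writer(buffer)
for cells in table_rows:
    # Call the method with parentheses and pass a list of cell values.
    writer.writerow(cells)

csv_text = buffer.getvalue()
print(csv_text)
```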
I'm trying to make an API call and save the response as a DataFrame.
The problem is that I need the data from the 'result' column.
I haven't managed to extract it.
I'm basically just trying to save the API response as a CSV file in order to work with it.
P.S. When I do this with an online "JSON to CSV converter", it works as I wish (example: https://konklone.io/json/).
import requests
import pandas as pd
import json
res = requests.get("http://api.etherscan.io/api?module=account&action=txlist&address=0xddbd2b932c763ba5b1b7ae3b362eac3e8d40121a&startblock=0&endblock=99999999&sort=asc&apikey=YourApiKeyToken")
j = res.json()
j
df = pd.DataFrame(j)
df.head()
Try this:
import requests
import pandas as pd
import json
res = requests.get("http://api.etherscan.io/api?module=account&action=txlist&address=0xddbd2b932c763ba5b1b7ae3b362eac3e8d40121a&startblock=0&endblock=99999999&sort=asc&apikey=YourApiKeyToken")
j = res.json()
# print(j)
filename ="temp.csv"
df = pd.DataFrame(j['result'])
print(df.head())
df.to_csv(filename)
It looks like you need:
df = pd.DataFrame(j["result"])
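To see why this works without calling the API, here is a small sketch with a made-up response. The field names and values below are assumptions for illustration only. The point is that "result" holds a list of dictionaries, which pd.DataFrame turns into one row per dictionary.

```python
import pandas as pd

# Hypothetical response: a few top-level fields plus a "result" list of records.
j = {
    "status": "1",
    "message": "OK",
    "result": [
        {"blockNumber": "1", "value": "10"},
        {"blockNumber": "2", "value": "20"},
    ],
}

# Build the table from the list of records only, not from the whole response.
df = pd.DataFrame(j["result"])

# index=False leaves the row numbers out of the CSV output.
csv_text = df.to_csv(index=False)
print(csv_text)
```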