PowerQueryConnection into a Python pandas DataFrame - python

I am trying to retrieve/download a .csv file from an .odc file via SharePoint (Office 365).
Below is the connection string used by the Power Query connection in the .odc file (this is what Excel normally uses to retrieve the data through Power Query):
<xml id=msodc>
<odc:OfficeDataConnection
xmlns:odc="urn:schemas-microsoft-com:office:odc"
xmlns="http://www.w3.org/TR/REC-html40">
<odc:PowerQueryConnection odc:Type="OLEDB">
<odc:ConnectionString>Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location="GetViewData?ListViewID=52&csvformat=true";Extended Properties=""
</odc:ConnectionString>
<odc:CommandType>SQL</odc:CommandType>
<odc:CommandText>SELECT * FROM [GetViewData?ListViewID=52&csvformat=true]</odc:CommandText>
</odc:PowerQueryConnection>
</odc:OfficeDataConnection>
</xml>
I want to pull the CSV file this connection points to into a pandas DataFrame. I tried the code below but could not get it to work:
import pyodbc
import pandas as pd
conn = pyodbc.connect('Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location="GetViewData?ListViewID=52&csvformat=true";Extended Properties=""')
df = pd.read_sql("SELECT * FROM [GetViewData?ListViewID=52&csvformat=true]", conn)
conn.close()
Please help, thanks in advance.

import pandas as pd
df = pd.read_excel(r'Path where the Excel file is stored\File name.xlsx', sheet_name='your Excel sheet name')
print(df)
The above code might be useful for your requirement.
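If the data has to come from the SharePoint list rather than a local workbook: pyodbc speaks ODBC, not OLE DB, so it cannot use a Provider=Microsoft.Mashup.OleDb.1 connection string (that provider belongs to Power Query inside Excel). The Location value in the .odc file looks like a relative URL on the SharePoint site that returns the list view as CSV, so one option is to download it over HTTP and read it with pandas. A minimal sketch, assuming a placeholder site URL and whatever Office 365 authentication your tenant accepts (neither is given in the question):
import io

import pandas as pd
import requests

SITE_URL = "https://yourtenant.sharepoint.com/sites/yoursite"  # placeholder, not in the question
EXPORT_PATH = "GetViewData?ListViewID=52&csvformat=true"       # the Location from the .odc file

# Office 365 normally rejects unauthenticated requests; pass a valid
# session cookie, OAuth token, or other auth object your tenant accepts.
resp = requests.get(f"{SITE_URL}/{EXPORT_PATH}", auth=None)
resp.raise_for_status()

df = pd.read_csv(io.StringIO(resp.text))
print(df.head())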

Related

Uploading an Excel file to a table in PostgreSQL

I have the following code that successfully uploads an Excel file to PostgreSQL:
import os
import pandas as pd
from sqlalchemy import create_engine
dir_path = os.path.dirname(os.path.realpath(__file__))
df = pd.read_excel(dir_path + '/' + file_name, "Sheet1")
engine = create_engine('postgresql://postgres:!Password#localhost/Database')
df.to_sql('identifier', con=engine, if_exists='replace', index=False)
However, this leads to problems when trying to do simple queries such as updates in pgAdmin 4.
Are there any other ways to insert an Excel file into a PostgreSQL table using Python?
There is a faster way.
Take a look.
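One approach that is usually much faster than plain to_sql for larger files is to let pandas create the empty table and then bulk-load the rows with PostgreSQL's COPY through psycopg2. A sketch, with the file name, table name and connection string as placeholders:
import io
import os

import pandas as pd
from sqlalchemy import create_engine

file_name = "data.xlsx"  # placeholder
dir_path = os.path.dirname(os.path.realpath(__file__))
df = pd.read_excel(os.path.join(dir_path, file_name), sheet_name="Sheet1")

engine = create_engine("postgresql://user:password@localhost/Database")  # placeholder credentials

# Create an empty table with the right columns, then COPY the rows in bulk.
df.head(0).to_sql("identifier", con=engine, if_exists="replace", index=False)

buffer = io.StringIO()
df.to_csv(buffer, index=False, header=False)
buffer.seek(0)

raw = engine.raw_connection()  # underlying psycopg2 connection
try:
    with raw.cursor() as cur:
        cur.copy_expert("COPY identifier FROM STDIN WITH (FORMAT CSV)", buffer)
    raw.commit()
finally:
    raw.close()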

Loading data with to_sql comes up with the error "(psycopg2.errors.StringDataRightTruncation) value too long for type character varying(256)" - Redshift/python

I am trying to load a CSV file from S3 into a Redshift table using Python. I used boto3 to pull the data from S3, used pandas to convert data types (timestamp, string and integer), and tried to upload the DataFrame to the table using to_sql (SQLAlchemy). It ended up with the error
cursor.executemany(statement, parameters) psycopg2.errors.StringDataRightTruncation: value too long for type character varying(256)
Additional info: the string column contains a large amount of mixed data. Also, I am able to take the output as a CSV on my local machine.
My code is as follows:
import io
import boto3
import pandas as pd
from sqlalchemy import create_engine
from datetime import datetime
client = boto3.client('s3',
                      aws_access_key_id="",
                      aws_secret_access_key="")
response = client.get_object(Bucket='', Key='*.csv')
file = response['Body'].read()
df = pd.read_csv(io.BytesIO(file))
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
df['text'] = df['text'].astype(str)
df['count'] = df['count'].fillna(0).astype(int)
con = create_engine('postgresql://*.redshift.amazonaws.com:5439/dev')
select_list = ['date','text','count']
write = df[select_list]
df = pd.DataFrame(write)
df.to_sql('test', con, schema='parent', index=False, if_exists='replace')
I am a beginner, please help me to understand what I am doing wrong. Ignore any typo errors. Thanks.
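The error comes from the table that to_sql creates: string columns default to VARCHAR(256) on Redshift, and some values in the text column are longer than that. A sketch of one fix, continuing from the snippet above and widening the column through the dtype argument (65535 is Redshift's maximum VARCHAR length):
from sqlalchemy.types import VARCHAR

# df and con come from the code above; only the to_sql call changes.
df.to_sql(
    'test',
    con,
    schema='parent',
    index=False,
    if_exists='replace',
    dtype={'text': VARCHAR(65535)},  # widen the offending column
)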

Load a csv.gz file from Google Cloud Storage to BigQuery using python

I want to load a csv.gz file from Cloud Storage to BigQuery. Right now I am using the code below, but I am not sure it is an efficient way to load the data into BigQuery.
# -*- coding: utf-8 -*-
from io import BytesIO
import pandas as pd
from google.cloud import storage
import pandas_gbq as gbq
client = storage.Client.from_service_account_json(service_account)
bucket = client.get_bucket("bucketname")
blob = storage.blob.Blob("""somefile.csv.gz""", bucket)
content = blob.download_as_string()
df = pd.read_csv(BytesIO(content), delimiter=',', quotechar='"', low_memory=False)
df = df.astype(str)
df.columns = df.columns.str.replace("|", "")
df["dateinsert"] = pd.datetime.now()
gbq.to_gbq(df, 'desttable',
           'projectid',
           chunksize=None,
           if_exists='append')
Please assist me in writing this code in an efficient way.
I propose this process:
Perform a load job into BigQuery.
Add the schema (yes, 150 columns is boring...).
Add the skip-leading-rows option so the header row is skipped: job_config.skip_leading_rows = 1
Name your table like this: <dataset>.<tableBaseName>_<Datetime>. The datetime must be a string format compliant with BigQuery table names, for example YYYYMMDDHHMM.
When you query your data, you can query a subset of the tables and inject the table name into the query result, like this:
SELECT *,
       (SELECT table_id
        FROM `<project>.<dataset>.__TABLES_SUMMARY__`
        WHERE table_id LIKE '<tableBaseName>%')
FROM `<project>.<dataset>.<tableBaseName>*`
Of course, you can refine the * with the year, month, day, ...
I think this meets all your requirements. Comment if something goes wrong.
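A sketch of that load-job approach with the BigQuery client library, so nothing is downloaded or decompressed in Python (BigQuery reads the gzipped CSV straight from Cloud Storage; the bucket, project, dataset and table names are placeholders):
from google.cloud import bigquery

service_account = "service_account.json"  # path to your key file, as in the question
client = bigquery.Client.from_service_account_json(service_account)

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1   # skip the header row
job_config.autodetect = True       # or spell out the ~150-column schema explicitly

uri = "gs://bucketname/somefile.csv.gz"
table_id = "projectid.dataset.tableBaseName_202001011200"  # <tableBaseName>_<Datetime>

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # wait for the load job to complete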

Python: read a csv file generated dynamically by an API?

I want to read into pandas the csv generated by this URL:
https://www.alphavantage.co/query?function=FX_DAILY&from_symbol=EUR&to_symbol=USD&apikey=demo&datatype=csv
How should this be done?
I believe you can just read it with pd.read_csv
import pandas as pd
URL = 'https://www.alphavantage.co/query?function=FX_DAILY&from_symbol=EUR&to_symbol=USD&apikey=demo&datatype=csv'
df = pd.read_csv(URL)

Downloading data from two worksheets of an Excel file at a URL

I am looking to gather all the data from the penultimate worksheet in this Excel file, along with all the data in the last worksheet from "Maturity Years" of 5.5 onward. The code I have below currently grabs data from solely the last worksheet, and I was wondering what the necessary alterations would be.
import urllib2
import pandas as pd
import os
import xlrd
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
socket = urllib2.urlopen(url)
xd = pd.ExcelFile(socket)
df = xd.parse(xd.sheet_names[-1], header=None)
print df
I was thinking of using glob, but I haven't seen any application of it with an online Excel file.
Edit: I think the following allows me to combine two worksheets of data into a single DataFrame. However, if there is a better answer please feel free to show it.
import urllib2
import pandas as pd
import os
import xlrd
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
socket = urllib2.urlopen(url)
xd = pd.ExcelFile(socket)
df1 = xd.parse(xd.sheet_names[-1], header=None)
df2 = xd.parse(xd.sheet_names[-2], header=None)
bigdata = df1.append(df2, ignore_index=True)
print bigdata
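For what it's worth, on Python 3 and a current pandas the same thing can be done without urllib2, assuming the Bank of England URL still serves the workbook:
import pandas as pd

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

# sheet_name=None returns every worksheet as a dict of DataFrames,
# in workbook order, so the last two can be picked off and concatenated.
sheets = pd.read_excel(url, sheet_name=None, header=None)
last_two = list(sheets.values())[-2:]
bigdata = pd.concat(last_two, ignore_index=True)
print(bigdata)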
