I am trying to retrieve/download a .csv file from an .odc file via SharePoint (Office 365).
Below is the connection string from the .odc file that the Power Query connection uses (Excel normally uses this Power Query connection to retrieve the data):
<xml id=msodc>
<odc:OfficeDataConnection
xmlns:odc="urn:schemas-microsoft-com:office:odc"
xmlns="http://www.w3.org/TR/REC-html40">
<odc:PowerQueryConnection odc:Type="OLEDB">
<odc:ConnectionString>Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location="GetViewData?ListViewID=52&csvformat=true";Extended Properties=""
</odc:ConnectionString>
<odc:CommandType>SQL</odc:CommandType>
<odc:CommandText>SELECT * FROM [GetViewData?ListViewID=52&csvformat=true]</odc:CommandText>
</odc:PowerQueryConnection>
</odc:OfficeDataConnection>
</xml>
I want to load the CSV data this connection points to into a pandas DataFrame. I tried the code below but could not get it to work:
import pyodbc
import pandas as pd
conn = pyodbc.connect('Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location="GetViewData?ListViewID=52&csvformat=true";Extended Properties=""')
df = pd.read_sql("SELECT * FROM [GetViewData?ListViewID=52&csvformat=true]", conn)
conn.close()
Please help.
Thanks in advance.
import pandas as pd
df = pd.read_excel(r'Path where the Excel file is stored\File name.xlsx', sheet_name='your Excel sheet name')
print(df)
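The code above reads a worksheet from a saved Excel workbook into a DataFrame, which might cover your requirement if the data is available in a workbook.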
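As far as I know, pyodbc only talks to ODBC drivers, so it cannot load an OLE DB provider such as Microsoft.Mashup.OleDb.1 (Power Query's own provider, which as far as I know is meant to be used from inside Excel or Power BI). That would explain why the pyodbc attempt fails. A useful first step is to see exactly what the .odc points at: the file is plain XML, so the sketch below pulls out its ConnectionString, CommandType and CommandText with a regular expression. The function name odc_fields is hypothetical, and SAMPLE_ODC is the snippet from the question.

```python
import re

# The <odc:PowerQueryConnection> part of the snippet from the question.
SAMPLE_ODC = """
<odc:PowerQueryConnection odc:Type="OLEDB">
<odc:ConnectionString>Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location="GetViewData?ListViewID=52&csvformat=true";Extended Properties=""
</odc:ConnectionString>
<odc:CommandType>SQL</odc:CommandType>
<odc:CommandText>SELECT * FROM [GetViewData?ListViewID=52&csvformat=true]</odc:CommandText>
</odc:PowerQueryConnection>
"""


def odc_fields(odc_text):
    """Return the ConnectionString, CommandType and CommandText found in .odc text.

    A regex is used instead of an XML parser because the snippet contains a
    bare '&', which strict XML parsers reject.
    """
    fields = {}
    for tag in ("ConnectionString", "CommandType", "CommandText"):
        match = re.search(rf"<odc:{tag}>(.*?)</odc:{tag}>", odc_text, re.DOTALL)
        # In case the file escapes '&' as '&amp;', turn it back into '&'.
        fields[tag] = match.group(1).strip().replace("&amp;", "&") if match else None
    return fields


if __name__ == "__main__":
    for name, value in odc_fields(SAMPLE_ODC).items():
        print(f"{name}: {value}")
```

Reading your own file would just mean passing its text to odc_fields instead of SAMPLE_ODC.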
Related
I have the following code that successfully uploads an Excel file to PostgreSQL:
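import os
import pandas as pd
from sqlalchemy import create_engine
# file_name is defined elsewhere in the script
dir_path = os.path.dirname(os.path.realpath(__file__))
df = pd.read_excel(os.path.join(dir_path, file_name), sheet_name="Sheet1")
# URL format is postgresql://user:password@host/database ('@' before the host)
engine = create_engine('postgresql://postgres:!Password@localhost/Database')
df.to_sql('identifier', con=engine, if_exists='replace', index=False)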
However, this causes problems when I try to run simple queries, such as updates, in pgAdmin 4.
Are there any other ways to insert an Excel file into a PostgreSQL table using Python?
There is a faster way.
Take a look.
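One commonly used faster route is PostgreSQL's COPY command, which psycopg2 exposes as cursor.copy_expert. The sketch below assumes a psycopg2 connection and a target table you have already created yourself. Creating the table yourself also lets you give it a primary key: as far as I know, pgAdmin 4 only lets you edit rows in its data grid when the table has one, and to_sql(if_exists='replace') creates the table without a primary key. The helper names frame_to_csv_buffer and copy_frame are hypothetical.

```python
import io

import pandas as pd


def frame_to_csv_buffer(df):
    """Serialize df as header-less CSV in memory, ready for COPY ... FROM STDIN."""
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)  # missing values become empty fields
    buf.seek(0)
    return buf


def copy_frame(conn, df, table):
    """Bulk-load df into an existing table with COPY.

    conn is assumed to be a psycopg2 connection; the table must already exist,
    e.g. created with a PRIMARY KEY so pgAdmin can edit its rows.
    """
    columns = ", ".join(f'"{col}"' for col in df.columns)
    sql = f"COPY {table} ({columns}) FROM STDIN WITH (FORMAT csv)"
    with conn.cursor() as cur:
        cur.copy_expert(sql, frame_to_csv_buffer(df))
    conn.commit()


if __name__ == "__main__":
    # Hypothetical usage; fill in your own connection details and file name:
    #   import psycopg2
    #   conn = psycopg2.connect("dbname=Database user=postgres host=localhost")
    #   copy_frame(conn, pd.read_excel("file.xlsx", sheet_name="Sheet1"), "identifier")
    print(frame_to_csv_buffer(pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})).read())
```

In CSV format, COPY treats an unquoted empty field as NULL, so missing values from the sheet arrive as NULLs.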
I am trying to load a CSV file from S3 into a Redshift table using Python. I used boto3 to pull the data from S3, used pandas to convert the data types (timestamp, string and integer), and tried to upload the DataFrame to the table using to_sql (SQLAlchemy). It failed with this error:
cursor.executemany(statement, parameters) psycopg2.errors.StringDataRightTruncation: value too long for type character varying(256)
Additional info: the string column contains a large amount of mixed data. I am also able to write the output to a CSV file on my local machine.
My code is as follows:
import io
import boto3
import pandas as pd
from sqlalchemy import create_engine
from datetime import datetime
client = boto3.client('s3', aws_access_key_id="",
aws_secret_access_key="")
response = client.get_object(Bucket='', Key='*.csv')
file = response['Body'].read()
df = pd.read_csv(io.BytesIO(file))
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
df['text'] = df['text'].astype(str)
df['count'] = df['count'].fillna(0).astype(int)
con = create_engine('postgresql://*.redshift.amazonaws.com:5439/dev')
select_list = ['date','text','count']
write = df[select_list]
df = pd.DataFrame(write)
df.to_sql('test', con, schema='parent', index=False, if_exists='replace')
I am a beginner; please help me understand what I am doing wrong. Please ignore any typos. Thanks.
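A likely cause: to_sql creates string columns as TEXT, and Redshift stores a TEXT column as VARCHAR(256), so any value longer than 256 bytes is rejected with exactly this error. A sketch of one fix, assuming con is the SQLAlchemy engine from the question: measure the longest value in each string column (Redshift counts VARCHAR length in bytes, so multi-byte characters are measured as UTF-8 bytes) and pass explicit VARCHAR sizes through to_sql's dtype argument. varchar_sizes and write_to_redshift are hypothetical names; values longer than 65535 bytes, Redshift's VARCHAR maximum, will still fail.

```python
import pandas as pd

REDSHIFT_MAX_VARCHAR = 65535  # largest VARCHAR size Redshift allows, in bytes


def varchar_sizes(df):
    """Map each text column to the UTF-8 byte length of its longest value (capped)."""
    sizes = {}
    for col in df.columns:
        series = df[col]
        if not (pd.api.types.is_object_dtype(series) or isinstance(series.dtype, pd.StringDtype)):
            continue  # dates, integers, etc. keep pandas' default SQL types
        byte_lengths = series.dropna().astype(str).map(lambda s: len(s.encode("utf-8")))
        longest = int(byte_lengths.max()) if len(byte_lengths) else 1
        sizes[col] = min(max(longest, 1), REDSHIFT_MAX_VARCHAR)
    return sizes


def write_to_redshift(df, engine, table, schema):
    """Write df with VARCHAR columns sized to fit the data instead of VARCHAR(256)."""
    from sqlalchemy.types import VARCHAR  # SQLAlchemy is already used in the question

    dtype = {col: VARCHAR(size) for col, size in varchar_sizes(df).items()}
    df.to_sql(table, engine, schema=schema, index=False, if_exists="replace", dtype=dtype)
```

With the question's variables, the call would be write_to_redshift(df[select_list], con, "test", "parent").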
I want to load a .csv.gz file from Cloud Storage into BigQuery. Right now I am using the code below, but I am not sure it is an efficient way to load data into BigQuery.
# -*- coding: utf-8 -*-
from datetime import datetime
from io import BytesIO
import pandas as pd
from google.cloud import storage
import pandas_gbq as gbq
# service_account: path to your service-account JSON key file
client = storage.Client.from_service_account_json(service_account)
bucket = client.get_bucket("bucketname")
blob = bucket.blob("somefile.csv.gz")
content = blob.download_as_bytes()  # download_as_string() is deprecated
df = pd.read_csv(BytesIO(content), delimiter=',', quotechar='"', low_memory=False)
df = df.astype(str)
df.columns = df.columns.str.replace("|", "", regex=False)  # literal '|', not a regex
df["dateinsert"] = datetime.now()  # pd.datetime no longer exists in recent pandas
gbq.to_gbq(df, 'desttable',
           'projectid',
           chunksize=None,
           if_exists='append'
           )
Please help me write this code more efficiently.
I propose this process:
1. Perform a load job into BigQuery.
2. Add the schema (yes, writing out 150 columns is boring...).
3. Add the skip-leading-rows option to skip the header: job_config.skip_leading_rows = 1
4. Name your table like this: <dataset>.<tableBaseName>_<Datetime>. The datetime must be a string format that is valid in a BigQuery table name, for example YYYYMMDDHHMM.
When you query your data, you can query a subset of the tables with a wildcard table and put the table suffix into the query result, like this:
SELECT *, _TABLE_SUFFIX AS table_suffix
FROM `<project>.<dataset>.<tableBaseName>_*`
Of course, you can refine the * with the year, month, day, ...
I think this meets all your requirements. Comment if something goes wrong.
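The steps above can be sketched with the google-cloud-bigquery client library. This assumes the .csv.gz file stays in Cloud Storage (BigQuery load jobs can read gzip-compressed CSV directly, so nothing is downloaded or parsed with pandas) and that you supply the schema as (name, type) pairs. dated_table_id and load_csv_gz are hypothetical names, and the gs:// URI, project, dataset and column names in the usage comment are placeholders.

```python
from datetime import datetime


def dated_table_id(project, dataset, base_name, when):
    """Build <project>.<dataset>.<base_name>_<YYYYMMDDHHMM> for a dated table."""
    return f"{project}.{dataset}.{base_name}_{when:%Y%m%d%H%M}"


def load_csv_gz(uri, table_id, schema_fields):
    """Run a BigQuery load job from a gs:// URI and wait for it to finish."""
    from google.cloud import bigquery  # imported lazily: only needed for the real load

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        schema=[bigquery.SchemaField(name, field_type) for name, field_type in schema_fields],
    )
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    return job.result()  # blocks until the job completes and raises if it failed


if __name__ == "__main__":
    table_id = dated_table_id("projectid", "mydataset", "somefile", datetime.now())
    print(table_id)
    # load_csv_gz("gs://bucketname/somefile.csv.gz", table_id,
    #             [("col1", "STRING"), ("col2", "INTEGER")])
```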
I want to read into pandas the csv generated by this URL:
https://www.alphavantage.co/query?function=FX_DAILY&from_symbol=EUR&to_symbol=USD&apikey=demo&datatype=csv
How should this be done?
I believe you can just read it with pd.read_csv
import pandas as pd
URL = 'https://www.alphavantage.co/query?function=FX_DAILY&from_symbol=EUR&to_symbol=USD&apikey=demo&datatype=csv'
df = pd.read_csv(URL)
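If you also want the dates parsed and used as the index, read_csv can do that in the same call. This sketch assumes the CSV's date column is named timestamp (check the header of the actual response). The function name read_fx_csv is hypothetical, and the inline sample only stands in for the downloaded data so the example runs offline; its values are illustrative, not real quotes.

```python
import io

import pandas as pd

URL = "https://www.alphavantage.co/query?function=FX_DAILY&from_symbol=EUR&to_symbol=USD&apikey=demo&datatype=csv"


def read_fx_csv(source):
    """Read an FX_DAILY CSV (URL, path or buffer) with a parsed date index."""
    df = pd.read_csv(source, parse_dates=["timestamp"], index_col="timestamp")
    return df.sort_index()  # chronological order, handy for plotting or resampling


if __name__ == "__main__":
    # Offline sample with the assumed header; values are illustrative only.
    sample = io.StringIO(
        "timestamp,open,high,low,close\n"
        "2020-01-03,1.1000,1.1100,1.0900,1.1050\n"
        "2020-01-02,1.1200,1.1250,1.1150,1.1180\n"
    )
    print(read_fx_csv(sample))
    # For live data: print(read_fx_csv(URL))
```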
I am looking to gather all the data from the penultimate worksheet in this Excel file, along with all the data in the last worksheet from a "Maturity Years" value of 5.5 onward. The code below currently grabs data only from the last worksheet, and I am wondering what changes are needed.
import pandas as pd  # reading .xls files also needs the xlrd package installed
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
xd = pd.ExcelFile(url)  # pandas can open the workbook straight from the URL
df = xd.parse(xd.sheet_names[-1], header=None)
print(df)
I was thinking of using glob, but I haven't seen it applied to an online Excel file.
Edit: I think the following allows me to combine two worksheets of data into a single Dataframe. However, if there is a better answer please feel free to show it.
import pandas as pd  # reading .xls files also needs the xlrd package installed
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
xd = pd.ExcelFile(url)  # pandas can open the workbook straight from the URL
df1 = xd.parse(xd.sheet_names[-1], header=None)
df2 = xd.parse(xd.sheet_names[-2], header=None)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
bigdata = pd.concat([df1, df2], ignore_index=True)
print(bigdata)
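If the goal is one yield curve per date (the penultimate sheet's maturities plus the last sheet's maturities of 5.5 years and longer), stacking rows on top of each other will not line the two sheets up by date. A sketch of joining them side by side instead, assuming each sheet has already been parsed into a DataFrame indexed by date whose columns are maturities in years as floats. That layout is an assumption: for the real file you would need to choose the right header rows and index column. combine_curves is a hypothetical name.

```python
import pandas as pd


def combine_curves(short_end, long_end, cutoff=5.5):
    """Join two date-indexed curves side by side, keeping long-end maturities >= cutoff.

    Both frames are assumed to be indexed by date, with float maturity (years) columns.
    """
    long_part = long_end.loc[:, [m for m in long_end.columns if m >= cutoff]]
    # axis=1 aligns rows on the date index; dates missing from one sheet get NaN.
    return pd.concat([short_end, long_part], axis=1)
```

For example, combine_curves(df2, df1), with df2 parsed from xd.sheet_names[-2] and df1 from xd.sheet_names[-1] once both are shaped as described.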