How to insert a data frame result into one specific column in Python?

I have a DataFrame result which I need to insert into an existing table row by row. How do I insert the result into the specific column named imgtext?
[table structure screenshot]
In SQL I know I can write the query as:
INSERT INTO tableName(imgtext) VALUES('Learn MySQL INSERT Statement');
Python script:
This code takes URLs from a CSV file and scrapes some data with the help of BeautifulSoup; the resulting data is saved to a CSV file.
Problem:
Rather than saving to CSV, how do I insert the resulting data into the SQL table, in the specific column named imgtext?
Seeking solution:
How do I process the CSV data using a DataFrame so that the result is inserted into SQL rather than written to CSV?
import io
import pathlib

import pandas as pd
import requests
from bs4 import BeautifulSoup
from PIL import Image
import pytesseract as pt

img_text_list = []
df1 = pd.DataFrame(columns=['imgtext'])
img_formats = [".jpg", ".jpeg"]
df = pd.read_csv("urls.csv")
urls = df["urls"].tolist()
for y in urls:
    response = requests.get(y)
    soup = BeautifulSoup(response.text, 'html.parser')
    img_tags = soup.find_all('img', class_='pick')
    img_srcs = ["https://myimpact.in/" + img['src'].replace('\\', '/')
                if img.has_attr('src') else '-' for img in img_tags]
    for count, x in enumerate(img_srcs):
        if x != '-':
            if pathlib.Path(x).suffix in img_formats:
                response = requests.get(x)
                img = Image.open(io.BytesIO(response.content))
                text = pt.image_to_string(img, lang="hin")
                # how to insert this text value into sql table column name - imgtext
                img_text_list.append(text)
df1['imgtext'] = img_text_list  # column name must match the one defined above
df1.to_csv('data.csv', encoding='utf-8')

To add values from a CSV to your SQL table you will need a Python SQL driver (pyodbc). Please see the sample code below for connecting Python to SQL.
sample code:
import pyodbc
import pandas as pd

server = 'yourservername'
database = 'yourdatabasename'
username = 'username'
password = 'yourpassword'
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = cnxn.cursor()
# Insert DataFrame into SQL Server:
for index, row in df.iterrows():
    cursor.execute("Insert your QUERY here")
cnxn.commit()
cursor.close()
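For this question specifically, the placeholder query could become a parameterized insert into the imgtext column. A minimal sketch, assuming the df1 DataFrame and table name tableName from the question, plus the cnxn/cursor from the sample above:

# Sketch: insert each scraped OCR text into the imgtext column.
# "tableName" comes from the question; adjust to the real table name.
for index, row in df1.iterrows():
    cursor.execute("INSERT INTO tableName (imgtext) VALUES (?)", row['imgtext'])
cnxn.commit()
cursor.close()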
Prerequisite:
Please install the pyodbc package: https://mkleehammer.github.io/pyodbc/
Reference:
https://learn.microsoft.com/en-us/sql/machine-learning/data-exploration/python-dataframe-sql-server?view=sql-server-ver16

Related

Setting Data Frame Column Names with Data Frame includes extra characters: ('ColumnName',)

I've got a Python script set up to pull data and column names from a Pervasive PSQL database; it then creates the table and records in MS SQL. I'm creating DataFrames for the data and for the column names, then renaming the data's columns from the column-name DataFrame.
However, when the table is created in MS SQL the column names come in as = ('ColumnName',)
The desired column names would not have ('',) and should read as = ColumnName
Below is the code I'm using to get here. Any help on formatting the column names to not include those extra characters would be very helpful!
import pyodbc
import pandas
import urllib.parse
import sqlalchemy

'''
Start - Pull Table Data
'''
conn = pyodbc.connect(conn_str, autocommit=True)
cursor = conn.cursor()
statement = 'select * from ' + db_tbl_name
stRows = cursor.execute(statement)
df_data = pandas.DataFrame((tuple(t) for t in stRows))
df_data = df_data.applymap(str)
'''
End - Pull Table Data
'''
'''
Start - Pull Column Names
'''
conn = pyodbc.connect(conn_str, autocommit=True)
cursor_col = conn.cursor()
statement_col = "select CustomColumnName from "+db_col_tbl_name+" where CustomTableName = '"+db_tbl_name+"' and ODBCOptions > 0 order by FieldNumber"
stRows_col = cursor_col.execute(statement_col)
df_col = pandas.DataFrame((tuple(t) for t in stRows_col))
'''
End - Pull Column Names
'''
'''
Start - Add Column Names to Table data (df_data)
'''
df_data.columns = df_col
'''
End - Add Column Names to Table data (df_data)
'''
'''
Start - Create a sqlalchemy engine
'''
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
"SERVER=Server;"
"DATABASE=DB;"
"UID=UID;"
"PWD=PWD;")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
'''
End - Create a sqlalchemy engine
'''
'''
Start - Create sql table and rows from sqlalchemy engine
'''
df_data.to_sql(name=db_tbl_name, con=engine, if_exists='replace')
'''
End - Create sql table and rows from sqlalchemy engine
'''
This worked for me and should resolve the issue. Change
df_col = pandas.DataFrame((tuple(t) for t in stRows_col))
to
df_col = []
for row in stRows_col:
    df_col.append(row[0])
pyodbc wraps the data it captures in its own objects: type(stRows_col) would yield <class 'pyodbc.Cursor'> and each row would give you a <class 'pyodbc.Row'>. You can get the value out of a pyodbc.Row by using row[0].
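Equivalently, a list comprehension keeps it to one line (a sketch using the same stRows_col cursor from the question):

# Each pyodbc.Row is indexable; take the single selected column's value.
df_col = [row[0] for row in stRows_col]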

JSON response into database

OK, I have tried several kinds of solutions recommended by others on this site and other sites. However, I can't get it to work as I would like.
I get a JSON response which I normalize and then save to a CSV. This first part works fine.
Instead of saving it to CSV I would like to save it into an existing table in an Access database. Two problems with the second part below:
I would like to use an existing table instead of creating a new one.
The result is not separated with ";" into different columns; everything ends up in the same column, not separated.
import requests
import pandas as pd
import pyodbc

response = requests.get(u, headers=h).json()
dp = pd.json_normalize(response, 'Units')
response_list.append(dp)
export = pd.concat(response_list)
export.to_csv(r'C:\Users\username\Documents\Python Scripts\Test\Test2_' + str(now) + '.csv',
              index=False, sep=';', encoding='utf-8')
access_path = r"C:\Users\username\Documents\Python Scripts\Test\Test_db.accdb"
conn = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};"
                      .format(access_path))
strSQL = "SELECT * INTO projects2 FROM [text;HDR=Yes;FMT=sep(;);" + \
         "Database=C:\\Users\\username\\Documents\\Python Scripts\\Test].Testdata.csv;"
cur = conn.cursor()
cur.execute(strSQL)
conn.commit()
conn.close()
If you already have the data in a well-formed pandas DataFrame then you don't really need to dump it to a CSV file; you can use the sqlalchemy-access dialect to push the data directly into an Access table using pandas' to_sql() method:
from pprint import pprint
import urllib
import pandas as pd
import sqlalchemy as sa

connection_string = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\Users\Public\Database1.accdb;"
    r"ExtendedAnsiSQL=1;"
)
connection_uri = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
engine = sa.create_engine(connection_uri)

with engine.begin() as conn:
    # existing data in table
    pprint(
        conn.execute(sa.text("SELECT * FROM user_table")).fetchall(), width=30
    )
    """
    [('gord', 'gord@example.com'),
     ('jennifer', 'jennifer@example.com')]
    """

# DataFrame to insert
df = pd.DataFrame(
    [
        ("newdev", "newdev@example.com"),
        ("newerdev", "newerdev@example.com"),
    ],
    columns=["username", "email"],
)
df.to_sql("user_table", engine, index=False, if_exists="append")

with engine.begin() as conn:
    # updated table
    pprint(
        conn.execute(sa.text("SELECT * FROM user_table")).fetchall(), width=30
    )
    """
    [('gord', 'gord@example.com'),
     ('jennifer', 'jennifer@example.com'),
     ('newdev', 'newdev@example.com'),
     ('newerdev', 'newerdev@example.com')]
    """
(Disclosure: I am currently the maintainer of the sqlalchemy-access dialect.)
Solved with the following code:
SE_export_Tuple = list(zip(SE_export.Name, SE_export.URL, SE_export.ImageUrl, ......, SE_export.ID))
print(SE_export_Tuple)
access_path = r"C:\Users\username\Documents\Python Scripts\Test\Test_db.accdb"
conn = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};"
                      .format(access_path))
cursor = conn.cursor()
mySql_insert_query = "INSERT INTO Temp_table (UnitName,URL,ImageUrl,.......,ID) VALUES (?,?,?,......,?)"
cursor.executemany(mySql_insert_query, SE_export_Tuple)
conn.commit()
conn.close()
However, when I add many fields I get an error at "executemany", saying:
cursor.executemany(mySql_insert_query,SE_export_Tuple)
Error: ('HY004', '[HY004] [Microsoft][ODBC Microsoft Access Driver]Invalid SQL data type (67) (SQLBindParameter)')
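A common cause of this HY004 error with the Access ODBC driver is a DataFrame column whose values the driver cannot bind, such as numpy integer/float scalars or NaN. A hedged sketch of a workaround, assuming SE_export is the pandas DataFrame the tuples are built from, is to convert everything to plain Python types (and NaN to None) first:

import pandas as pd

# Sketch (assumption: SE_export is the source DataFrame).
# numpy scalars expose .item() to get the native Python value;
# NaN is mapped to None so it binds as SQL NULL.
records = [
    tuple(None if pd.isna(v) else (v.item() if hasattr(v, "item") else v)
          for v in row)
    for row in SE_export.itertuples(index=False, name=None)
]
cursor.executemany(mySql_insert_query, records)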

How do I create a csv file using Python from sybase db table depending on a value in a column of that table?

I am new to Python and trying to write a script which connects to a database (Sybase ASE) and copies the data present in a table into CSV files of a specific size, depending on one column in the table (index).
Code that I have tried:
import pyodbc
import csv

server = 'XYZ'
database = 'user_details'
username = ''
password = ''
# have used sql express for now
db = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
c = db.cursor()
c.execute("SELECT * FROM userinfo")
list1 = c.fetchall()
# print(list1)
cursor = db.cursor()
cursor.execute("SELECT * FROM userinfo;")
with open("C:\\Users\\ABC\\Desktop\\out.csv", "w", newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow([i[0] for i in cursor.description])  # write headers
    csv_writer.writerows(cursor)
Now I want to write the CSV in batches: for example, the first 100 records of the table in one file (file_1.csv) and the next 100 records in another file (file_2.csv), depending on the output of the select statement. A sketch of one approach follows.
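A minimal sketch, reusing the cursor and csv setup from the code above: fetchmany() pulls the next batch of rows, and each batch gets its own numbered file.

batch_size = 100
cursor.execute("SELECT * FROM userinfo;")
headers = [col[0] for col in cursor.description]

file_num = 1
while True:
    rows = cursor.fetchmany(batch_size)  # next batch of up to 100 rows
    if not rows:
        break
    with open("file_{}.csv".format(file_num), "w", newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)  # repeat the headers in every file
        writer.writerows(rows)
    file_num += 1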

Query DB table and convert to DataFrame ok, but data is truncated when exported

I am querying a MySQL DB and fetching data from a table. My code looks like this.
# login creds, etc., etc.,
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM myTable")
Everything seems OK, I guess, but the data gets truncated when I export it to a text file.
myresult = mycursor.fetchall()
df = pd.DataFrame(myresult)
# print(df)
df = str(df)
outF = open("C:\\Users\\ryans\\OneDrive\\Desktop\\test.txt", "w")
outF.writelines(df)
outF.close()
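The truncation happens because str(df) produces pandas' abbreviated display representation (with ... placeholders), not the full data. A minimal sketch of a fix, assuming the same df, is to let pandas write the file directly:

# Write the full DataFrame instead of its truncated string repr.
df.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\test.txt", sep="\t", index=False)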

How to copy entire SQL Server table into CSV including column headers?

Summary
I have a Python program (2.7) that connects to a SQL Server database using SQLAlchemy. I want to copy the entire SQL table into a local CSV file (including column headers). I'm new to SQLAlchemy (version 0.7) and so far I'm able to dump the entire CSV file, but I have to explicitly list my column headers.
Question
How do I copy an entire SQL table into a local CSV file (including column headers)? I don't want to explicitly type in my column headers. The reason is that I want to avoid changing the code if there's changes in the table's columns.
Code
import csv
import sqlalchemy

# Setup connection info, assume database connection info is correct
SQLALCHEMY_CONNECTION = (DB_DRIVER_SQLALCHEMY + '://'
                         + DB_UID + ":" + DB_PWD + "@" + DB_SERVER + "/" + DB_DATABASE)
engine = sqlalchemy.create_engine(SQLALCHEMY_CONNECTION, echo=True)
metadata = sqlalchemy.MetaData(bind=engine)
vw_AllCenterChatOverview = sqlalchemy.Table(
    'vw_AllCenterChatOverview', metadata, autoload=True)
metadata.create_all(engine)
conn = engine.connect()
# Run the SQL Select Statement
result = conn.execute("""SELECT * FROM
    [LifelineChatDB].[dbo].[vw_AllCenterChatOverview]""")
# Open file 'output.csv' and write SQL query contents to it
f = csv.writer(open('output.csv', 'wb'))
f.writerow(['StartTime', 'EndTime', 'Type', 'ChatName', 'Queue', 'Account',
            'Operator', 'Accepted', 'WaitTimeSeconds', 'PreChatSurveySkipped',
            'TotalTimeInQ', 'CrisisCenterKey'])  # Where I explicitly list table headers
for row in result:
    try:
        f.writerow(row)
    except UnicodeError:
        print "Error running this line ", row
result.close()
Table Structure
In my example, 'vw_AllCenterChatOverview' is the table. Here's the Table Headers:
StartTime, EndTime, Type, ChatName, Queue, Account, Operator, Accepted, WaitTimeSeconds, PreChatSurveySkipped, TotalTimeInQ, CrisisCenterKey
Thanks in advance!
Use ResultProxy.keys:
# Run the SQL Select Statement
result = conn.execute("""SELECT * FROM
[LifelineChatDB].[dbo].[vw_AllCenterChatOverview]""")
# Get column names
column_names = result.keys()
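Putting it together with the CSV writer from the question (a sketch in the question's Python 2 style), the header row then comes from the result set itself instead of a hard-coded list:

# Sketch: write headers pulled from the ResultProxy, then the rows.
f = csv.writer(open('output.csv', 'wb'))
f.writerow(column_names)  # no hard-coded header list needed
for row in result:
    f.writerow(row)
result.close()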
