Hope everyone is well and staying safe.
I'm uploading an Excel file to SQL using Python. I have three fields: CarNumber nvarchar(50), Status nvarchar(50), and DisassemblyStart date.
I'm having an issue importing the DisassemblyStart field dates. The connection and transaction using Python are successful.
However, I get zeroes all over, even though the Excel file is populated with dates. I've tried switching the column type to nvarchar(50), date, and datetime to see if I could at least get a string, but got nothing. I saved the Excel file as CSV and TXT and tried uploading those, and still got zeroes. I added 0.001 to every date in Excel (to add an artificial time) in case that would make it click, but nothing happened. Still zeroes. I'm sure there's a major oversight from being too deep in the weeds. I need help.
The Excel file has the three field columns.
This is the Python code:
import pandas as pd
import pyodbc
# Import CSV
data = pd.read_csv(r'PATH_TO_CSV\XXX.csv')
df = pd.DataFrame(data, columns=['CarNumber', 'Status', 'DisassemblyStart'])
df = df.fillna(value=0)
# Connect to SQL Server
conn = pyodbc.connect("Driver={SQL Server};Server=SERVERNAME,PORT;Database=DATABASENAME;Uid=USER;Pwd=PW;")
cursor = conn.cursor()
# Create Table
cursor.execute('DROP TABLE OPS.dbo.TABLE')
cursor.execute('CREATE TABLE OPS.dbo.TABLE (CarNumber nvarchar(50), Status nvarchar(50), DisassemblyStart date)')
# Insert READ_CSV INTO TABLE
for row in df.itertuples():
    cursor.execute('INSERT INTO OPS.dbo.TABLE (CarNumber,Status,DisassemblyStart) VALUES (?,?,Convert(datetime,?,23))',
                   row.CarNumber, row.Status, row.DisassemblyStart)
conn.commit()
conn.close()
Help will be much appreciated.
Thank you and be safe,
David
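One thing that might be worth trying, as a minimal sketch only and not a confirmed fix: parse the date column with pandas at read time and pass either an ISO-formatted string or NULL to SQL Server, instead of filling missing dates with 0. This reuses the placeholder paths and names from the code above.

import pandas as pd
import pyodbc
# Parse the date column while reading the CSV, instead of filling it with 0 afterwards
df = pd.read_csv(r'PATH_TO_CSV\XXX.csv', parse_dates=['DisassemblyStart'])
conn = pyodbc.connect("Driver={SQL Server};Server=SERVERNAME,PORT;Database=DATABASENAME;Uid=USER;Pwd=PW;")
cursor = conn.cursor()
for row in df.itertuples():
    # Pass an ISO date string, or None (SQL NULL) when the cell is empty
    start = row.DisassemblyStart.strftime('%Y-%m-%d') if pd.notna(row.DisassemblyStart) else None
    cursor.execute('INSERT INTO OPS.dbo.TABLE (CarNumber,Status,DisassemblyStart) VALUES (?,?,?)',
                   row.CarNumber, row.Status, start)
conn.commit()
conn.close()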
Related
I have a MySQL dump file in .sql format. Its size is around 100 GB. There are just two tables in it. I have to extract data from this file using Python or Bash. The issue is that the insert statement contains all the data on one line, and that line is extremely long. Hence, the usual approach causes a memory issue, as that whole line (i.e., all the data) is loaded inside the loop as well.
Is there any efficient way or tool to get data as CSV?
Just a little explanation: the following line contains the actual data, and it is very large.
INSERT INTO `tblEmployee` VALUES (1,'Nirali','Upadhyay',NULL,NULL,9,'2021-02-08'),(2,'Nirali','Upadhyay',NULL,NULL,9,'2021-02-08'),(3,'Nirali','Upadhyay',NULL,NULL,9,'2021-02-08'),....
The issue is that I cannot import it into MySQL due to resource issues.
I'm not sure if this is what you want, but pandas can turn a SQL table into a CSV. Try this:
import pandas as pd
import sqlite3
connect = sqlite3.connect("connections.db")
cursor = connect.cursor()
# save sqlite table in a DataFrame
dataframe = pd.read_sql('SELECT * FROM table', connect)
# write DataFrame to CSV file
dataframe.to_csv("filename.csv", index = False)
connect.commit()
connect.close()
If you want to change the delimiter, you can do dataframe.to_csv("filename.csv", index = False, sep='3') and just replace the '3' with the delimiter of your choice.
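Since the original concern was memory, a chunked variant may be worth trying once the dump has been imported into a database. This is only a sketch, assuming a SQLite file and the tblEmployee table from the question:

import sqlite3
import pandas as pd
connect = sqlite3.connect("connections.db")
# Read the table in chunks so the whole result never sits in RAM at once
chunks = pd.read_sql("SELECT * FROM tblEmployee", connect, chunksize=100000)
for i, chunk in enumerate(chunks):
    # Write the header only for the first chunk, then append
    chunk.to_csv("tblEmployee.csv", mode="w" if i == 0 else "a", header=(i == 0), index=False)
connect.close()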
I have a pandas dataframe that I want to insert into a sqlite3 database (writing in Python 3.6+).
However, when I insert the data into the table, it shows up as a text file, with text written continuously. I tried copying and pasting the text from the database file here on Stack Overflow, but Stack Overflow reports an error when I try to submit. As an aside, when I open Sublime Text and paste the text copied from the database, the only text that comes through is:
SQLite format 3
So I am pasting below a screenshot of the text from the database. It looks like Stack Overflow doesn't like those unusual question-mark characters.
My code to pass the pandas dataframe into sqlite3 is as follows. Note that I am only creating the table here (not inserting data yet) and still get this behavior.
cols = ["name", "website", "phone", "address", "city"]
df = pd.DataFrame(columns = cols)
conn = sqlite3.connect('TestDB2.db')
c = conn.cursor()
c.execute("""
CREATE TABLE STUDENTS(
id integer PRIMARY KEY,
name text,
age integer,
city text,
country text
)""")
conn.commit()
dfObj.to_sql('STUDENTS', conn, if_exists='replace', index = False)
The pandas dataframe looks fine in Python and when writing to an Excel file. I am new to the sqlite3 world but am familiar with Postgres. Any help would be much appreciated!
As @DinoCoderSaurus mentions:
A sqlite3 database is not a text file. The result here is what one gets when opening a SQLite database with a text editor. Use the command-line sqlite3 tool or some other tool (my fave is DB Browser for SQLite) to inspect the db contents. It is unclear from the posted code what dfObj is. The to_sql call is meant to insert the data.
I did not realize I was looking at the file in a text editor. I installed a plugin for Visual Studio and expected it to provide database visualization. It does not. I installed DB browser and my data is there.
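For reference, a minimal way to inspect the database contents from Python itself rather than a text editor (assuming the TestDB2.db file and STUDENTS table from the question):

import sqlite3
conn = sqlite3.connect('TestDB2.db')
c = conn.cursor()
# List the tables stored in the database file
c.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(c.fetchall())
# Show the first few rows of the STUDENTS table
c.execute("SELECT * FROM STUDENTS LIMIT 5")
print(c.fetchall())
conn.close()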
Is there any way to turn my Excel workbook into a MySQL database? Say, for example, my Excel workbook is named copybook.xls; then the MySQL database name would be copybook. I am unable to do it. Your help would be really appreciated.
Here is an outline and explanation of the process. As some of the finer details were missing from the original question, the approach needs to be tailored to your particular needs.
The solution
There are two steps in the process:
1) Import the Excel workbook as a Pandas data frame
Here we use the standard pandas.read_excel method to get the data out of the Excel file. If there is a specific sheet we want, it can be selected using sheet_name. If the file contains row labels (an index column), we can pick them up using the index_col parameter.
import pandas as pd
# Let's select Characters sheet and include column labels
df = pd.read_excel("copybook.xls", sheet_name = "Characters", index_col = 0)
df now contains the following imaginary data frame, which represents the data in the original Excel file:
first last
0 John Snow
1 Sansa Stark
2 Bran Stark
2) Write records stored in a DataFrame to a SQL database
Pandas has a neat method, pandas.DataFrame.to_sql, for interacting with SQL databases through the SQLAlchemy library. The original question mentioned MySQL, so here we assume we already have a running MySQL instance. To connect to the database, we use create_engine. Lastly, we write the records stored in the data frame to the SQL table called characters.
from sqlalchemy import create_engine
engine = create_engine('mysql://USERNAME:PASSWORD@localhost/copybook')
# Write records stored in a DataFrame to a SQL database
df.to_sql("characters", con = engine)
We can check that the data has been stored:
engine.execute("SELECT * FROM characters").fetchall()
Out:
[(0, 'John', 'Snow'), (1, 'Sansa', 'Stark'), (2, 'Bran', 'Stark')]
or, better, use pandas.read_sql_table to read the data back directly as a data frame:
pd.read_sql_table("characters", engine)
Out:
index first last
0 0 John Snow
1 1 Sansa Stark
2 2 Bran Stark
No MySQL instance available?
You can test the approach using an in-memory SQLite database. Just copy and paste the following code to play around:
import pandas as pd
from sqlalchemy import create_engine
# Create a new SQLite instance in memory
engine = create_engine("sqlite://")
# Create a dummy data frame for testing or read it from Excel file using pandas.read_excel
df = pd.DataFrame({'first' : ['John', 'Sansa', 'Bran'], 'last' : ['Snow', 'Stark', 'Stark']})
# Write records stored in a DataFrame to a SQL database
df.to_sql("characters", con = engine)
# Read SQL database table into a DataFrame
pd.read_sql_table('characters', engine)
Still learning Python, so bear with me. I use the following script to import a csv file into a local SQL database. My problem is that the csv file usually has a bunch of empty rows at the end of it and I get primary key errors upon import. What's the best way to handle this? If I manually edit the csv in a text editor I can delete all the rows of ,,,,,,,,,,,,,,,,,,,,,,,,,,, and it works perfectly.
Bonus question, is there an easy way to iterate through all .csv files in a directory, and then delete or move them after they've been processed?
import pandas as pd
data = pd.read_csv (r'C:\Bookings.csv')
df = pd.DataFrame(data, columns= ['BookingKey','BusinessUnit','BusinessUnitKey','DateTime','Number','Reference','ExternalId','AmountTax','AmountTotal','AmountPaid','AmountOpen','AmountTotalExcludingTax','BookingFee','MerchantFee','ProcessorFee','NumberOfPersons','Status','StatusDateTime','StartTime','EndTime','PlannedCheckinTime','ActualCheckinTime','Attendance','AttendanceDatetime','OnlineBookingCheckedDatetime','Origin','CustomerKey'])
df = df.fillna(value=0)
print(df)
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=D3VBUP\SQLEXPRESS;'
                      'Database=BRIQBI;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
for row in df.itertuples():
    cursor.execute('''
        INSERT INTO BRIQBI.dbo.Bookings (BookingKey,BusinessUnit,BusinessUnitKey,DateTime,Number,Reference,ExternalId,AmountTax,AmountTotal,AmountPaid,AmountOpen,AmountTotalExcludingTax,BookingFee,MerchantFee,ProcessorFee,NumberOfPersons,Status,StatusDateTime,StartTime,EndTime,PlannedCheckinTime,ActualCheckinTime,Attendance,AttendanceDatetime,OnlineBookingCheckedDatetime,Origin,CustomerKey)
        VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
        ''',
        row.BookingKey,
        row.BusinessUnit,
        row.BusinessUnitKey,
        row.DateTime,
        row.Number,
        row.Reference,
        row.ExternalId,
        row.AmountTax,
        row.AmountTotal,
        row.AmountPaid,
        row.AmountOpen,
        row.AmountTotalExcludingTax,
        row.BookingFee,
        row.MerchantFee,
        row.ProcessorFee,
        row.NumberOfPersons,
        row.Status,
        row.StatusDateTime,
        row.StartTime,
        row.EndTime,
        row.PlannedCheckinTime,
        row.ActualCheckinTime,
        row.Attendance,
        row.AttendanceDatetime,
        row.OnlineBookingCheckedDatetime,
        row.Origin,
        row.CustomerKey
    )
conn.commit()
Ended up being really easy. I added the dropna function so that all the rows that had no data in them would be dropped.
df = df.dropna(how = 'all')
Now off to find out how to iterate through multiple files in a directory and move them to another location.
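For what it's worth, a minimal sketch of that follow-up, using pathlib and shutil (the folder names here are hypothetical, so adjust the paths):

from pathlib import Path
import shutil
import pandas as pd
source_dir = Path(r'C:\Bookings')      # hypothetical folder holding the CSV files
done_dir = source_dir / 'processed'    # hypothetical destination for handled files
done_dir.mkdir(exist_ok=True)
for csv_file in source_dir.glob('*.csv'):
    df = pd.read_csv(csv_file)
    df = df.dropna(how='all')          # drop the fully empty rows, as above
    # ... insert df into SQL Server here, as in the original script ...
    shutil.move(str(csv_file), str(done_dir / csv_file.name))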
I have 3000 CSV files stored on my hard drive, each containing thousands of rows and 10 columns. Rows correspond to dates, and the number of rows as well as the exact dates differ across spreadsheets. The columns are the same in number (10) and label across all the spreadsheets. For each date, from the earliest date across all spreadsheets to the latest, I need to (i) access the columns in each spreadsheet for which data for that date exists, (ii) run some calculations, and (iii) store the results (a set of 3 or 4 scalar values) for that date. To clarify, the results should be a variable in my workspace that stores the results for each date across all CSVs.
Is there a way to load this data using Python that is both time and memory efficient? I tried creating a Pandas data frame for each CSV, but loading all the data into RAM takes almost ten minutes and almost completely fills up my RAM. Is it possible to check if the date exists in a given CSV, and if so, load the columns corresponding to that CSV into a single data frame? This way, I could load just the rows that I need from each CSV to do my calculations.
Simple Solution.
Go and download DB Browser for SQLite.
Open it and create a new database.
After that, go to File and import a table from CSV. (Do this for all of your CSV tables.) Alternatively, you can use a Python script with the sqlite3 library to automate creating the tables and inserting the values from your CSV sheets.
When you are done importing all the tables, play around with this function based on your details.
import sqlite3
import pandas as pd

data = pd.read_csv("my_CSV_file.csv")  # Your CSV data path

def create_database():  # Create the database and table
    con = sqlite3.connect('database.db')
    cur = con.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS my_CSV_data (id INTEGER PRIMARY KEY, name text, address text, mobile text, phone text, balance float, max_balance INTEGER)")
    con.commit()
    con.close()

def insert_into_company():  # Insert the CSV rows into the table
    con = sqlite3.connect('database.db')
    cur = con.cursor()
    for i in data.itertuples(index=False):  # iterate over rows, not column names
        cur.execute("INSERT INTO my_CSV_data VALUES(NULL,?,?,?,?,?,?)", (i[0], i[1], i[2], i[3], i[4], i[5]))
    con.commit()
    con.close()

def select_company():  # View the data in the table
    con = sqlite3.connect('database.db')
    cur = con.cursor()
    cur.execute("SELECT * FROM my_CSV_data")
    rows = cur.fetchall()
    con.close()
    return rows

create_database()
insert_into_company()
for j in select_company():
    print(j)
Do this once, and you can use it again and again. It will enable you to access the data in less than a second. Ask me if you need any other help; I'll be happy to guide you through it.
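As a follow-up to the per-date requirement in the question, here is a hedged sketch of pulling just the rows for a single date back into pandas. The date_col column name is hypothetical, so replace it with the actual date column in your CSVs:

import sqlite3
import pandas as pd
con = sqlite3.connect('database.db')
# Load only the rows for one date instead of the whole table
one_day = pd.read_sql_query(
    "SELECT * FROM my_CSV_data WHERE date_col = ?",
    con,
    params=('2021-02-08',),  # example date, assuming dates are stored as text
)
con.close()
# one_day is a regular DataFrame, so the per-date calculations can run on it
print(one_day.head())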