Perform update on basis of SELECT query column using Python

How do I write the update query so that, after selecting, a column is updated only if it initially has no value in the table, and is otherwise left alone, using Python?
The update query is mentioned below.
sql_update = """UPDATE table_name1
                SET column1 = %s, column2 = %s, column3 = %s, column4 = %s
                WHERE column5 = %s"""
params = ('a', 'b', 'c', 'd', 1)  # renamed from input, which shadows a builtin
cursor.execute(sql_update, params)
conn.commit()

With SQL you can restrict the update to rows where a column is empty:
update table_name set col_1 = 1, col_2 = 2 where col_3 is null
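Applied to the query from the question, a minimal sketch (assuming the same conn and cursor, and that "no value" means NULL) moves the check into the WHERE clause so the update only fires while column1 is still empty:
sql_update = """UPDATE table_name1
                SET column1 = %s
                WHERE column5 = %s AND column1 IS NULL"""
cursor.execute(sql_update, ('a', 1))  # changes nothing if column1 already has a value
conn.commit()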

Related

Pandas or python, loop through SQL queries held in a column in a dataframe, append the results in a dataframe

I have a dataframe consisting of thousands of different SQL queries. I would like to loop through them and append the results of each to another dataframe; below is an example of two rows.
index | SQLQuery
----------------
0 | select comment from tablea where columna = 'x' and columnb = 'y'
1 | select comment from tableb where columnc = 'w' and columnd = 'z'
I believe it is along the lines of the following, but I can't get it to work:
select_query = 'What should I set in here?'
df_list = []
for x in select_query:
    df = pd.read_sql(select_query, conn)
    df_list.append(df)
final_df = pd.concat(df_list)
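As written, the loop iterates over the characters of a string and never uses x. A corrected sketch, assuming conn is an open database connection and the queries live in the SQLQuery column of df:
import pandas as pd

df_list = []
for query in df['SQLQuery']:               # iterate over the actual query strings
    df_list.append(pd.read_sql(query, conn))
final_df = pd.concat(df_list, ignore_index=True)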

Identify column names from DataFrame in Python when Column names are unknown

I run a for loop that executes a few SQL queries and capture the results in a DataFrame (again inside the loop), as below, for two validations.
DataFrame for Test1:
index | column1 | column2
-------------------------
0 | jack | 100
1 | bill | 200
2 | Tom | 300
DataFrame for Test2:
index | column1
---------------
0 | 102345
1 | 102345
I have to write the results of the DataFrame for each test to another table in Oracle. To do this, I need the column names, but I cannot tell how many columns are present at a given point in the loop, since the DataFrame can have 1 to 5 columns depending on the SQL that ran. Is there a way to do this?
Code for reading from table and writing to DataFrame:
def get_src_query_metadata(cursor, sql_query):
    cursor.execute(sql_query)
    columns = [col[0] for col in cursor.description]
    cursor.rowfactory = lambda *args: dict(zip(columns, args))
    data = pd.DataFrame(cursor.fetchall())
    return data

def get_target_query_metadata(cursor, sql_query):
    cursor.execute(sql_query)
    columns = [col[0] for col in cursor.description]
    cursor.rowfactory = lambda *args: dict(zip(columns, args))
    data = pd.DataFrame(cursor.fetchall())
    return data

def main():
    _JobDict_src = get_src_query_metadata(cursor, src_query[i])
    _JobDict_tgt = get_target_query_metadata(cursor, target_query[i])
How do I get the column names and their values assigned to separate variables?
You can find and count the column names through this loop:
coln = 0
for col in df.columns:
    coln += 1
    print(col)
print(coln)
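(Equivalently, list(df.columns) returns the names directly and len(df.columns) the count.)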
and find the data types through the following:
for col in df.dtypes:
    print(col)
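To write a DataFrame with an unknown number of columns back to Oracle, one possible sketch builds the INSERT statement from df.columns; write_df_to_oracle and the assumption that the target table's columns match the DataFrame's are illustrative, not from the question:
def write_df_to_oracle(cursor, df, table_name):
    # Build the column list and positional bind variables from whatever
    # columns the DataFrame happens to have (anywhere from 1 to 5 here).
    cols = list(df.columns)
    binds = ', '.join(':%d' % (i + 1) for i in range(len(cols)))
    sql = 'INSERT INTO %s (%s) VALUES (%s)' % (table_name, ', '.join(cols), binds)
    cursor.executemany(sql, df.values.tolist())  # one parameter list per row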

How to parse a SQL INSERT INTO statement to get the values with PySpark

I have a SQL dump with several INSERT INTO statements like the following one:
query ="INSERT INTO `temptable` VALUES (1773,0,'morne',0),(6004,0,'ATT',0)"
I'm trying to get only the values into a dataframe:
(1773,0,'morne',0)
(6004,0,'ATT',0)
I tried
spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
and get
'InsertIntoTable 'UnresolvedRelation `temptable`, false, false
+- 'UnresolvedInlineTable [col1, col2, col3, col4], [List(1773, 0, morne, 0), List(6004, 0, ATT, 0)]
But I don't know how to retrieve those lists of values. Is there a way to get them without Hive?
If you are only trying to extract the lists of values from multiple INSERT statements, you can try the following:
listOfInserts = [
    ('''INSERT INTO temptable VALUES (1773,0,'morne',0),(6004,0,'ATT',0)''',),
    ('''INSERT INTO temptable VALUES (1673,0,'morne',0),(5004,0,'ATT',0)''',),
]
df = spark.createDataFrame(listOfInserts, ['VALUES'])
from pyspark.sql.functions import substring_index
# keep everything after the last occurrence of the word VALUES in each statement
df.select(substring_index(df.VALUES, 'VALUES', -1).alias('right')).show(truncate=False)
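If you only need the tuples in plain Python rather than through Spark, a small regex sketch (assuming the values themselves never contain parentheses) does the same extraction:
import re

query = "INSERT INTO `temptable` VALUES (1773,0,'morne',0),(6004,0,'ATT',0)"
tuples = re.findall(r'\(([^)]*)\)', query)  # text inside each (...) group
print(tuples)  # ["1773,0,'morne',0", "6004,0,'ATT',0"]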

Python insert array into mysql table column avoiding record duplication

I have a simple array like this:
a=['a','b','c']
I need to insert all the elements of a into the MySQL table items, column names, skipping any element that is already present in the names column, and avoiding iteration and multiple INSERT queries.
Thanks
1) You can use the MySQL-specific INSERT ... ON DUPLICATE KEY UPDATE syntax (I assume there is a PRIMARY KEY or UNIQUE KEY on the name column). Additionally, a = list(set(a)) removes duplicates within a itself.
a = ['a', 'b', 'c']
c = conn.cursor()
# executemany expects one parameter tuple per row, so wrap each name
c.executemany('INSERT INTO items (name) VALUES (%s) ON DUPLICATE KEY UPDATE name = name',
              [(name,) for name in a])
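Here name = name is a deliberate no-op update: rows with duplicate keys are simply skipped without being modified.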
2) If there is no uniqueness constraint on the name column, you can check which names are already in the database and remove them from the list before inserting:
a = ['a', 'b', 'c']
c = conn.cursor()
c.execute('SELECT names FROM items')
existent_names = [name[0] for name in c]
a = list(set(a) - set(existent_names))
c.executemany('INSERT INTO items (names) VALUES (%s)', [(name,) for name in a])
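Note that this check-then-insert approach can race with concurrent writers inserting the same names between the SELECT and the INSERT; the ON DUPLICATE KEY variant above avoids that.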
Hi, if you don't want duplicate names in your items table, maybe that column should be your primary key; then whenever you try to insert duplicate values, MySQL simply won't allow it. However, this code may help you if you are using a different primary key for your table:
import MySQLdb

conn = MySQLdb.connect(host="localhost",
                       user="root",
                       passwd="yourpass",
                       db="yourdb")
x = conn.cursor()
a = ['a', 'b', 'c']
for name in a:
    try:
        # only insert the name if it is not already present
        x.execute("""SELECT name FROM items WHERE name = %s""", (name,))
        if x.fetchone() is None:
            x.execute("""INSERT INTO items (name) VALUES (%s)""", (name,))
            conn.commit()
    except Exception:
        conn.rollback()
This is what I understand from your post: you want to add only unique values from the list into the database. You can use Python's set for that.
a = ['a', 'b', 'c']
set_a = list(set(a))
insert_into_db(set_a)  # insert_into_db stands in for your own insert routine
If holding all items.names won't be memory costly, you can run a query to select all items.names and keep these in a set.
cursor.execute('SELECT names from items')
names_set = set(name[0] for name in cursor)
Before you execute a query, filter out existing names in this set.
fresh_a = [v for v in a if v not in names_set]
If you are concerned about duplicate names within a, you can cast it to a set first.

Insert data into grouped DataFrame (pandas)

I have a pandas dataframe grouped by certain columns. Now I want to insert the mean of the numeric values of four adjacent columns into a new column. This is what I did:
df = pd.read_csv(filename)
# in this line I extract a unique ID from the filename
id = re.search(r'(\w\w\w)', filename).group(1)
Files look like this:
col1 | col2 | col3
-----------------------
str1a | str1b | float1
My idea was now the following:
# get the numeric values
df2 = pd.DataFrame(df.groupby(['col1', 'col2']).mean()['col3']).T
# insert the id into a new column
df2.insert(0, 'ID', id)
Now loop over all of it:
for j in range(len(df2.values)):
    for k in df['col1'].unique():
        df2.insert(j + 5, (k, 'mean'), df2.values[j])
df2.to_excel('text.xlsx')
But I get the following error, referring to the line with df2.insert:
TypeError: not all arguments converted during string formatting
and
if not allow_duplicates and item in self.items:
    # Should this be a different kind of error??
    raise ValueError('cannot insert %s, already exists' % item)
I am not sure what string formatting refers to here, since I have only numerical values being passed around.
The final output should have all values from col3 in a single row (indexed by id) and every fifth column should be the inserted mean value of the four preceding values.
If I had to work with files like yours, I would code a function to convert them to CSV-like rows, something like this:
import pandas

data = []
for lineInFile in file.read().splitlines():
    lineInFile_splited = [cell.strip() for cell in lineInFile.split('|')]
    if len(lineInFile_splited) > 1:  # keep only real rows, not the '-----' separator
        data.append(lineInFile_splited)
# the first parsed row is the header, the rest are the data
df = pandas.DataFrame(data[1:], columns=data[0])
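Alternatively, pandas can read such pipe-separated files directly; a short sketch, assuming the dashed separator is always the second line of the file:
import pandas as pd

df = pd.read_csv(filename, sep='|', skiprows=[1], skipinitialspace=True)
df.columns = [c.strip() for c in df.columns]  # drop the padding around header names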
Hope it helps!
