Replacing BCP utility with microservice - python

My team wants to implement an ASP.NET Core Web API based microservice to replace the bulk copy program (BCP) utility. Currently we use the BCP utility to return 200,000 rows with 30 columns; the data is returned in CSV format.
We created a RESTful endpoint and are using ADO.NET to connect to SQL Server and extract the same volume of data. Here is the code:
using (SqlConnection myConnection = new SqlConnection(con))
{
    string oString = "Select * from Employees where runid = 1";
    SqlCommand oCmd = new SqlCommand(oString, myConnection);
    myConnection.Open();
    using (SqlDataReader oReader = oCmd.ExecuteReader())
    {
        while (oReader.Read())
        {
            //Read data here
        }
    }
}
With this code, I am getting memory exceptions.
What is the best way to fix this issue, considering that in the future I will be asked to return higher volumes of data with more users making simultaneous requests? I am open to implementing this solution using C#, Java, Python or Node.js.

The following code streams the data directly from the database, so it should be quite memory efficient and performant. It uses the Sylvan CSV library to write CSV records directly from the SqlDataReader.
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.Infrastructure;
using Microsoft.Data.SqlClient;
using Sylvan.Data.Csv;
...
[HttpGet(Name = "Get")]
public IActionResult Get()
{
    return new FileCallbackResult("text/csv", async (outputStream, _) =>
    {
        using (var myConnection = new SqlConnection(_configuration["ConnectionStrings"]))
        {
            var cmdText = "Select * from Employees where runid = 1";
            var command = new SqlCommand(cmdText, myConnection);
            await myConnection.OpenAsync();
            using (SqlDataReader oReader = await command.ExecuteReaderAsync())
            {
                var streamWriter = new StreamWriter(outputStream);
                var csvDataWriter = CsvDataWriter.Create(streamWriter);
                await csvDataWriter.WriteAsync(oReader);
                // Flush so any buffered CSV output reaches the response stream.
                await streamWriter.FlushAsync();
            }
        }
    })
    {
        FileDownloadName = "employees.csv"
    };
}
FileCallbackResult is from: https://github.com/StephenClearyExamples/AsyncDynamicZip/blob/net6-ziparchive/Example/src/WebApplication/FileCallbackResult.cs
You can read about it here: https://blog.stephencleary.com/2016/11/streaming-zip-on-aspnet-core.html
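For orientation, the essential idea is a FileResult whose ExecuteResultAsync hands the response body stream to your callback. Below is a stripped-down sketch of that idea (simplified by me, not the linked implementation verbatim; the repository's version, which uses a proper FileResultExecutorBase-based executor, should be preferred in production):

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Simplified sketch of the FileCallbackResult idea, not the linked implementation verbatim.
public class FileCallbackResult : FileResult
{
    private readonly Func<Stream, ActionContext, Task> _callback;

    public FileCallbackResult(string contentType, Func<Stream, ActionContext, Task> callback)
        : base(contentType)
    {
        _callback = callback;
    }

    public override async Task ExecuteResultAsync(ActionContext context)
    {
        var response = context.HttpContext.Response;
        response.ContentType = ContentType;
        if (!string.IsNullOrEmpty(FileDownloadName))
        {
            response.Headers["Content-Disposition"] = $"attachment; filename=\"{FileDownloadName}\"";
        }
        // The callback writes straight to the response body, so rows are streamed to the
        // client as they are read from SQL Server instead of being buffered in memory.
        await _callback(response.Body, context);
    }
}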

Related

.NET Azure Function App using UpsertItemAsync to upload to CosmosDB is dramatically slow, especially compared to Python's CosmosClient

I have a .NET Function App running on Azure that is uploading data to CosmosDB like this:
foreach (var message in messages)
{
    try
    {
        await notificationsContainer.UpsertItemAsync(message, message.NTID);
    }
    catch (Exception exception)
    {
        //...
    }
}
The UpsertItemAsync is a wrapper:
public async Task<T> UpsertItemAsync(T document, string partitionKey)
{
    ItemResponse<T> response = await _container.UpsertItemAsync<T>(document, new PartitionKey(partitionKey));
    return response.Resource;
}
I'm doing a test with 6,500 messages. It took 16 minutes to upload 640(!) messages to the database. At the same time, using Python's CosmosClient, this call
container.create_item(message)
executed 6,500 times takes 131 seconds to complete.
Moreover, the Function App is running on Azure and the CosmosClient is set with direct connectivity mode:
CosmosClient client = clientBuilder
    .WithConnectionModeDirect()
    .WithThrottlingRetryOptions(new TimeSpan(0, 0, 0, 0, config.MaxRetryWaitTimeInMilliSeconds), config.MaxRetryCount)
    .WithBulkExecution(true)
    .Build();
The Python script, in contrast, is running on an on-prem VM.
What could be the explanation of this dramatic difference in performance? Isn't the function app incredibly slow?
Your problem is that you are enabling Bulk Mode (.WithBulkExecution(true)) but awaiting each operation individually.
When using Bulk Mode (reference: https://devblogs.microsoft.com/cosmosdb/introducing-bulk-support-in-the-net-sdk/), you need to create the operations but not await them individually. Something like:
List<Task> operations = new List<Task>();
foreach (var message in messages)
{
    operations.Add(notificationsContainer.UpsertItemAsync(message, message.NTID));
}

try
{
    await Task.WhenAll(operations);
}
catch (Exception ex)
{
    //...
}
Either that, or disable Bulk Mode if you are going to execute operations individually.
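For completeness, a minimal sketch of that second option, reusing the variable names from the question (clientBuilder, config, messages, notificationsContainer): drop bulk execution and keep awaiting each upsert.

// Sketch: individual-operation execution with Bulk Mode disabled.
CosmosClient client = clientBuilder
    .WithConnectionModeDirect()
    .WithThrottlingRetryOptions(new TimeSpan(0, 0, 0, 0, config.MaxRetryWaitTimeInMilliSeconds), config.MaxRetryCount)
    .WithBulkExecution(false)   // or simply omit this line; bulk execution is off by default
    .Build();

foreach (var message in messages)
{
    // Awaiting one operation at a time is fine when bulk execution is disabled.
    await notificationsContainer.UpsertItemAsync(message, message.NTID);
}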

How to process video that is being streamed to a Firebase signaling server in Python

I have managed to set up my webcam to point to a specific location in the Firebase Database and broadcast a video using WebRTC.
I do this as follows in JavaScript (and display it in my HTML):
<video id="yourVideo" autoplay muted playsinline></video>
...
var database = firebase.database().ref('node_on_firebase');
var yourVideo = document.getElementById("yourVideo");
var friendsVideo = document.getElementById("friendsVideo");
var yourId = Math.floor(Math.random() * 1000000000);
var servers = {'iceServers': [{'urls': 'stun:stun.services.mozilla.com'}, {'urls': 'stun:stun.l.google.com:19302'}, {'urls': 'turn:numb.viagenie.ca', 'credential': 'webrtc', 'username': 'websitebeaver#mail.com'}]};
var pc = new RTCPeerConnection(servers);
pc.onicecandidate = (event => event.candidate ? sendMessage(yourId, JSON.stringify({'ice': event.candidate})) : console.log("Sent All Ice"));
pc.onaddstream = (event => friendsVideo.srcObject = event.stream);

function sendMessage(senderId, data) {
    var msg = database.push({ sender: senderId, message: data });
    msg.remove();
}

function readMessage(data) {
    // works
    var msg = JSON.parse(data.val().message);
    var sender = data.val().sender;
    if (sender != yourId) {
        if (msg.ice != undefined)
            pc.addIceCandidate(new RTCIceCandidate(msg.ice));
        else if (msg.sdp.type == "offer")
            pc.setRemoteDescription(new RTCSessionDescription(msg.sdp))
                .then(() => pc.createAnswer())
                .then(answer => pc.setLocalDescription(answer))
                .then(() => sendMessage(yourId, JSON.stringify({'sdp': pc.localDescription})));
        else if (msg.sdp.type == "answer")
            pc.setRemoteDescription(new RTCSessionDescription(msg.sdp));
    }
};
database.on('child_added', readMessage);

function closeMyFace() {
    yourVideo.srcObject.getTracks().forEach(track => track.stop());
}

function showMyFace() {
    navigator.mediaDevices.getUserMedia({audio: false, video: true})
        .then(function(stream) {
            pc.addStream(stream);
            yourVideo.srcObject = stream;
        })
        .catch(function(error) {
            console.log(error);
        });
}

function showFriendsFace() {
    pc.createOffer()
        .then(offer => pc.setLocalDescription(offer))
        .then(() => sendMessage(yourId, JSON.stringify({'sdp': pc.localDescription})));
}
However, how do I download/stream this video and process the video in chunks, ideally in a Python script?
If you intend to download/process the video while it is streaming, then your (Python) client will need to create its own RTCPeerConnection so that it can also receive the video stream. I believe that would not be trivial in Python, though probably easier on other platforms. More info: WebRTC Python implementation
If your use case allows you to process the video after the recording is complete (or at least tolerates significant latency), then you could have the JavaScript client upload the data as it is received, or later in a batch (from the friendsVideo stream in the example above), possibly in chunks, to a location where your custom (Python) client could then download and process it.
Although not related to RTCPeerConnection, you can search here on SO for other users that have used Firebase for streaming video (with mixed results). Again though, that is somewhat different from what you are trying to do with RTCPeerConnection. Example: Firebase Storage Video Streaming
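To give a feel for the first option, here is a minimal, untested sketch assuming aiortc as the Python WebRTC implementation; the signaling part (exchanging the SDP answer and ICE candidates through the same Firebase node that sendMessage/readMessage use above) is omitted and would need to mirror the JavaScript logic:

# Minimal sketch, assuming aiortc (pip install aiortc); signaling via Firebase is omitted.
import asyncio
from aiortc import RTCPeerConnection, RTCSessionDescription

async def answer_offer(offer_sdp: str) -> str:
    pc = RTCPeerConnection()

    async def consume_video(track):
        while True:
            frame = await track.recv()               # an av.VideoFrame
            img = frame.to_ndarray(format="bgr24")   # process the frame, e.g. with OpenCV
            # ... your processing here ...

    @pc.on("track")
    def on_track(track):
        if track.kind == "video":
            asyncio.ensure_future(consume_video(track))

    # Apply the browser's offer, then produce an answer to send back via Firebase.
    await pc.setRemoteDescription(RTCSessionDescription(sdp=offer_sdp, type="offer"))
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return pc.localDescription.sdp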

How do I correctly make consecutive calls to a child process in Node.js?

I have a Node.js application which is currently a web-based API. For one of my API functions, I make a call to a short Python script that I've written to achieve some extra functionality.
After reading up on communicating between Node and Python using the child_process module, I gave it a try and achieved my desired results. I call my Node function with an email address and send it to Python through stdin; my Python script performs the necessary external API call using the provided email, writes the result to stdout, and sends it back to my Node function.
Everything works properly until I fire off several requests consecutively. Despite Python correctly logging the changed email address and making the request to the external API with the updated address, after the first request to my API (which returns the correct data) I keep receiving the same old data again and again.
My initial guess was that Python's input stream wasn't being flushed, but after testing the Python script on its own I saw that I was correctly receiving the updated email address from Node and getting the proper query results.
I think there are some underlying workings of the child_process module that I may not be understanding, since I'm fairly certain the corresponding data is being passed back and forth correctly.
Below is the Node function:
exports.callPythonScript = (email) =>
{
    let getPythonData = new Promise(function(success, fail){
        const spawn = require('child_process').spawn;
        const pythonProcess = spawn('python', ['./util/emailage_query.py']);

        pythonProcess.stdout.on('data', (data) => {
            let dataString = singleToDoubleQuote(data.toString());
            let emailageResponse = JSON.parse(dataString);
            success(emailageResponse);
        })
        pythonProcess.stdout.on('end', function(){
            console.log("python script done");
        })
        pythonProcess.stderr.on('data', (data) => {
            fail(data);
        })
        pythonProcess.stdin.write(email);
        pythonProcess.stdin.end();
    })
    return getPythonData;
}
And here is the Python script:
import sys
from emailage.client import EmailageClient

def read_in():
    lines = sys.stdin.readlines()
    return lines[0]

def main():
    client = EmailageClient('key', 'auth')
    email = read_in()
    json_response = client.query(email, user_email='authemail#mail.com')
    print(json_response)
    sys.stdout.flush()

if __name__ == '__main__':
    main()
Again, upon making a single call to callPythonScript everything is returned perfectly. It is only upon making multiple calls that I'm stuck returning the same output over and over.
I'm hitting a wall here and any and all help would be appreciated. Thanks all!
I've used a mutex lock for this kind of problem. I can't seem to find the question the code comes from, though, as I found it on SO when I had the same kind of issue:
class Lock {
    constructor() {
        this._locked = false;
        this._waiting = [];
    }

    lock() {
        const unlock = () => {
            let nextResolve;
            if (this._waiting.length > 0) {
                // Hand the lock to the next waiter in FIFO order.
                nextResolve = this._waiting.shift();
                nextResolve(unlock);
            } else {
                this._locked = false;
            }
        };
        if (this._locked) {
            return new Promise((resolve) => {
                this._waiting.push(resolve);
            });
        } else {
            this._locked = true;
            return new Promise((resolve) => {
                resolve(unlock);
            });
        }
    }
}

module.exports = Lock;
I would then implement it like this, with your code:
class Email {
    constructor(Lock) {
        this._lock = new Lock();
    }

    async callPythonScript(email) {
        // Wait for the lock so only one python child process runs at a time.
        const unlock = await this._lock.lock();
        let getPythonData = new Promise(function(success, fail){
            const spawn = require('child_process').spawn;
            const pythonProcess = spawn('python', ['./util/emailage_query.py']);

            pythonProcess.stdout.on('data', (data) => {
                let dataString = singleToDoubleQuote(data.toString());
                let emailageResponse = JSON.parse(dataString);
                success(emailageResponse);
            })
            pythonProcess.stdout.on('end', function(){
                console.log("python script done");
            })
            pythonProcess.stderr.on('data', (data) => {
                fail(data);
            })
            pythonProcess.stdin.write(email);
            pythonProcess.stdin.end();
        })
        try {
            // Only release the lock once this request's python output has been consumed.
            return await getPythonData;
        } finally {
            unlock();
        }
    }
}
I haven't tested this code, and I've implemented it in a case where I'm dealing with arrays, with each array value calling Python... but this should at least give you a good start.
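A hypothetical usage sketch (the require paths and email addresses are made up): because callPythonScript holds the lock until its own child process has finished, concurrent requests are serialized instead of racing for each other's stdout data.

// Hypothetical usage; adjust the require paths to your project layout.
const Lock = require('./lock');
const Email = require('./email');

const emailClient = new Email(Lock);

async function handleRequests() {
    // These calls queue behind the lock, so each one gets its own python output.
    const results = await Promise.all([
        emailClient.callPythonScript('first@example.com'),
        emailClient.callPythonScript('second@example.com'),
    ]);
    console.log(results);
}

handleRequests().catch(console.error);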

Possible to run Stored Procedure using documentDB SDK for Python?

I'm using Tornado and pydocumentDB to run a storage application on Azure. I also have a stored procedure:
function userIDSproc() {
    var collection = getContext().getCollection();

    // Query documents and take 1st item.
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        "SELECT * FROM root r WHERE r.id='user_ids_counter'",
        function (err, feed, options) {
            if (err) throw err;
            // Check the feed and if empty, set the body to 'no docs found',
            // else take 1st element from feed
            if (!feed || !feed.length) getContext().getResponse().setBody('no docs found');
            else tryUpdate(feed[0]);
        });

    if (!isAccepted) throw new Error('The query was not accepted by the server.');

    function tryUpdate(document) {
        document.counter += 1;
        getContext().getResponse().setBody(document['counter']);
        var isAccepted = collection.replaceDocument(document._self, document, function (err, document, options) {
            if (err) throw err;
            // If we have successfully updated the document - return it in the response body.
            getContext().getResponse().setBody(document);
        });
    }
}
What I'm trying to do is increase the counter property of my user_ids document every time a new user is generated and their document is added to the collection. Is it possible to call that sproc, update the counter, then query the document for the new counter, and then use that new counter as the ID for the new user? The DocumentDB SDK on GitHub shows a couple of methods like QueryStoredProcedures(self, collection_link, query, options=None) and ReadStoredProcedures(self, collection_link, options=None), but nothing to actually execute one.
For the DocumentDB Python SDK, you can call ExecuteStoredProcedure(self, sproc_link, params, options=None).
You can find a short example in the SDK's unit tests here:
https://github.com/Azure/azure-documentdb-python/blob/e605e7ca7b1ddd2454f1014f536a0fded9e6f234/test/crud_tests.py#L3408-L3409
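A minimal sketch of what that call could look like with pydocumentdb; the account URL, key, and stored procedure link below are placeholders based on the question, not tested values:

# Sketch: executing the sproc with pydocumentdb; host, key, and links are placeholders.
import pydocumentdb.document_client as document_client

client = document_client.DocumentClient(
    'https://<your-account>.documents.azure.com:443/',
    {'masterKey': '<your-master-key>'})

# Id-based link of the stored procedure: dbs/<db-id>/colls/<coll-id>/sprocs/<sproc-id>
sproc_link = 'dbs/mydb/colls/users/sprocs/userIDSproc'

# The second argument is the list of parameters passed to the sproc's JavaScript function (none here).
result = client.ExecuteStoredProcedure(sproc_link, [])
print(result)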

How do I import csv files into a ms sql database?

I'm really out of my league here, but I hear that it can be done.
I have a PBX phone server with raw phone data in thousands of different CSV files, captured every so often (between 8 minutes and 2 hours) and stored in dated folders.
The only way I can connect to the server is through WinSCP, which just gives me the file structure (it looks like FileZilla FTP).
So 2 things:
How would someone go about importing thousands of files into a SQL Server database (2008)?
How would someone go about setting up a timed event to import new CSV files as they are created?
I just need some direction. I don't even know where to start.
Thanks for your help.
If you have the permissions, you can do a BULK INSERT using sqlcmd with a batch file. In addition, you can create an SSIS package.
More information here and here.
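As a rough illustration of that first suggestion, a BULK INSERT statement run via sqlcmd from a scheduled batch file might look like the following; the table name, file path, and CSV layout are placeholders to adapt:

-- Hypothetical example: adjust the table, path, and delimiters to your CSV layout.
BULK INSERT dbo.CallData
FROM 'D:\pbx\2013-08-01\calls_0800.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2   -- skip the header row, if the files have one
);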
1. There are many ways to load CSV files into SQL Server, e.g. BCP, BULK INSERT, OPENROWSET (BULK provider), SSIS, or your own custom .NET application.
The most popular is BULK INSERT. See this blog post for a tutorial:
http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
2. Timed event? Again, there are many options, but the most straightforward is to set up nightly SQL Server Agent jobs.
using System;
using System.Data;
using Microsoft.VisualBasic.FileIO;

namespace ReadDataFromCSVFile
{
    static class Program
    {
        static void Main()
        {
            string csv_file_path = @"C:\Users\Administrator\Desktop\test.csv";
            DataTable csvData = GetDataTableFromCSVFile(csv_file_path);
            Console.WriteLine("Rows count:" + csvData.Rows.Count);
            Console.ReadLine();
        }

        private static DataTable GetDataTableFromCSVFile(string csv_file_path)
        {
            DataTable csvData = new DataTable();
            try
            {
                using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
                {
                    csvReader.SetDelimiters(new string[] { "," });
                    csvReader.HasFieldsEnclosedInQuotes = true;
                    // Use the header row to create the DataTable columns.
                    string[] colFields = csvReader.ReadFields();
                    foreach (string column in colFields)
                    {
                        DataColumn datecolumn = new DataColumn(column);
                        datecolumn.AllowDBNull = true;
                        csvData.Columns.Add(datecolumn);
                    }
                    while (!csvReader.EndOfData)
                    {
                        string[] fieldData = csvReader.ReadFields();
                        // Making empty values null
                        for (int i = 0; i < fieldData.Length; i++)
                        {
                            if (fieldData[i] == "")
                            {
                                fieldData[i] = null;
                            }
                        }
                        csvData.Rows.Add(fieldData);
                    }
                }
            }
            catch (Exception ex)
            {
                // Swallowing the exception returns an empty DataTable; consider logging ex.
            }
            return csvData;
        }
    }
}
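To complete the picture, the DataTable produced above can be pushed into SQL Server with SqlBulkCopy. A minimal sketch, with a placeholder connection string and destination table name:

using System.Data;
using System.Data.SqlClient;

static void WriteDataTableToServer(DataTable csvData)
{
    // Placeholder connection string and destination table; adjust to your environment.
    const string connectionString = "Data Source=.;Initial Catalog=PbxData;Integrated Security=True";

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.CallData";
            // Map columns by name so the CSV header order does not have to match the table.
            foreach (DataColumn column in csvData.Columns)
            {
                bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            }
            bulkCopy.WriteToServer(csvData);
        }
    }
}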
