Connecting to Fabric SqlEndpoints using AAD/Entra Token in Notebooks

Fabric Notebooks give great connectivity options to the default Lakehouse for Spark and Spark SQL, but they are more limited when it comes to connecting securely and smoothly to a SqlEndpoint, whether that of a Lakehouse (LH) or a Data Warehouse (DW).

We have two requirements:

  • Programmatically retrieve the connection string, rather than having to specify it manually or pass it in as a parameter.
  • Authenticate to the SqlEndpoint via an AAD token, rather than using a service principal or passing a username/password without MFA.

Sample code is below and also at the GitHub link that follows. Paste it into a notebook that is connected to a default lakehouse and has a data warehouse in the same workspace.
https://github.com/ProdataSQL/Fabric/blob/main/10_SqlEndPoint/SQL%20Endpoint.py

dw_name="FabricDW"   # Change this to your DW Name

import sempy.fabric as fabric
import struct
import sqlalchemy
import pyodbc
import pandas as pd
from notebookutils import mssparkutils

#Function to return a sqlalchemy ODBC engine, given a connection string and using integrated AAD auth to Fabric
def create_engine(connection_string : str):
    token = mssparkutils.credentials.getToken('https://analysis.windows.net/powerbi/api').encode("UTF-16-LE")
    token_struct = struct.pack(f'<I{len(token)}s', len(token), token)
    SQL_COPT_SS_ACCESS_TOKEN = 1256  # pre-connection attribute used to pass an AAD access token to the ODBC driver
    return sqlalchemy.create_engine("mssql+pyodbc://", creator=lambda: pyodbc.connect(connection_string, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct}))


# Get ODBC connection string for the default LH in this notebook
tenant_id=spark.conf.get("trident.tenant.id")
workspace_id=spark.conf.get("trident.workspace.id")
lakehouse_id=spark.conf.get("trident.lakehouse.id")
lakehouse_name=spark.conf.get("trident.lakehouse.name")
sql_end_point= fabric.FabricRestClient().get(f"/v1/workspaces/{workspace_id}/lakehouses/{lakehouse_id}").json()['properties']['sqlEndpointProperties']['connectionString']
connection_string = f"Driver={{ODBC Driver 18 for SQL Server}};Server={sql_end_point}"
print (f"connection_string={connection_string}")

engine = create_engine(connection_string)
with engine.connect() as alchemy_connection:
    #Run TSQL Query on a LH End Point
    query = f"exec {lakehouse_name}.[sys].[sp_server_info] 2"
    df = pd.read_sql_query(query, alchemy_connection)
    print (df)

    #Run TSQL Query on a DW End Point
    query = f"exec {dw_name}.[sys].[sp_server_info] 2"
    df = pd.read_sql_query(query, alchemy_connection)
    print (df)

    #Execute a TSQL Stored Procedure or DDL/DML on a Fabric DW
    connection = engine.raw_connection()
    cursor = connection.cursor()
    sql= f"USE {dw_name};CREATE TABLE tmpTable (Column1 INT NULL);DROP TABLE tmpTable"
    cursor.execute(sql)
    connection.commit()
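
One caveat with create_engine() above: the AAD token is fetched once, when the engine is created, so a pooled connection opened after the token expires (typically around an hour) will fail to authenticate. A minimal variation that moves the token fetch into the creator, so every new connection gets a fresh token, might look like the sketch below (create_engine_fresh is just an illustrative name; same imports as the sample above):

# Variation on create_engine: fetch a fresh AAD token for every new
# connection, so pooled reconnects after token expiry still authenticate.
def create_engine_fresh(connection_string: str):
    SQL_COPT_SS_ACCESS_TOKEN = 1256  # same pre-connection attribute as above

    def connect():
        token = mssparkutils.credentials.getToken('https://analysis.windows.net/powerbi/api').encode("UTF-16-LE")
        token_struct = struct.pack(f'<I{len(token)}s', len(token), token)
        return pyodbc.connect(connection_string, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct})

    return sqlalchemy.create_engine("mssql+pyodbc://", creator=connect)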

How could Microsoft improve this?

  1. While there is an official DW connector for Notebooks, it currently only supports Scala, for the one person on the moon still using Scala. If it could be improved to support PySpark, with the same kind of support Spark SQL has, that would be amazing.

    https://learn.microsoft.com/en-us/fabric/data-engineering/spark-data-warehouse-connector
  2. Currently mssparkutils.credentials.getToken() doesn't return a token that is immediately usable by ODBC or other database libraries (the packing the driver expects is sketched after this list). It would be great if that could happen, so we don't have to manually mess with the token structure, and also if the Fabric SqlEndPoint were an audience option.

    https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python#get-token
  3. Currently the AAD token is issued under the security context of the person who last saved the notebook. Ideally we need managed identity support, so a notebook can run under a system account rather than a user account. This is my biggest concern security-wise.

    This is not currently on the published roadmap for notebooks ;-(
    https://learn.microsoft.com/en-gb/fabric/release-plan/data-engineering
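
For reference, the manual token handling item 2 refers to: the ODBC driver expects the access token packed as a 4-byte little-endian byte count followed by the token bytes encoded as UTF-16-LE. Pulled out of the sample above into a standalone helper (pack_aad_token is just an illustrative name):

import struct

# Pack an AAD access token the way the SQL Server ODBC driver expects it:
# <4-byte little-endian length><token bytes in UTF-16-LE>
def pack_aad_token(raw_token: str) -> bytes:
    token_bytes = raw_token.encode("UTF-16-LE")
    return struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)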
Credits

https://stackoverflow.com/questions/57193301/pass-azure-ad-token-for-azure-sql-db-in-pre-connection-arguments-to-sqlalchemy-c

Thanks to Aidan@prodata.ie who did most of the coding and testing.

Also thanks to https://x.com/mim_djo for pushing me to publish this workaround 😉
