It's common to have spreadsheets in SharePoint that you want to ingest into OneLake in Fabric. Typically this is master data, mappings, budgets or any other data that does not reside within a core enterprise system.
We've developed a Python class for use in Fabric Notebooks that wraps the Graph API, allowing files to be downloaded from (or uploaded to) SharePoint and landed either as raw files on OneLake or as Lakehouse tables.
You can skip straight to the good stuff and grab the sample Fabric Notebook here:
https://github.com/ProdataSQL/Fabric/tree/main/07_SharePoint
Prerequisites
Before you can do any automation with SharePoint you will need a service principal identity created in AAD and the following details:
- Tenant ID
- Client ID
- Secret
- SharePoint URL
- Library Name
- Folder Name
- Filename wildcard
Creating the service principal is a bit cumbersome as it needs AAD rights, but there is a good point-and-click guide here (PowerShell option too!):
https://sposcripts.com/download-files-from-sharepoint-using-graph/
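Under the hood, these details drive a standard OAuth2 client credentials flow against AAD. As a rough sketch of what the wrapper does for you (illustrative, not the exact implementation), a Graph access token can be acquired with plain requests:

import requests

def get_graph_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    # OAuth2 client credentials flow against the AAD token endpoint
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",  # use the app's granted Graph permissions
    }
    response = requests.post(url, data=payload)
    response.raise_for_status()
    return response.json()["access_token"]

The token is then passed as a Bearer header on the subsequent Graph calls against the SharePoint site and library.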
Sample Code for PySpark Notebook
ConnectionSettings = '{"library": "Unittest", "tenant_id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","app_client_id":"app-fabricdw-dev-clientid","app_client_secret":"app-fabricdw-dev-clientsecret","keyvault":"kv-fabric-dev","sharepoint_url":"prodata365.sharepoint.com","site":"Fabric"}'
SourceSettings = '{}'
SourceDirectory = 'tst'
TargetDirectory = 'unittest/AW/tst'
SourceObject = '*.xlsx'
TargetFileName = ''
TargetSettings = ''
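Note that app_client_id and app_client_secret above are not the raw values: they are the names of Key Vault secrets, and the keyvault setting names the vault that holds them. A minimal sketch of that lookup in a Fabric notebook, assuming the notebook identity has read access to kv-fabric-dev (mssparkutils is built into the Fabric runtime):

# Resolve the service principal credentials from Azure Key Vault
# (sketch only - the AuthToken class does this for you)
vault_url = "https://kv-fabric-dev.vault.azure.net/"
client_id = mssparkutils.credentials.getSecret(vault_url, "app-fabricdw-dev-clientid")
client_secret = mssparkutils.credentials.getSecret(vault_url, "app-fabricdw-dev-clientsecret")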
from builtin.sharepoint import Sharepoint, AuthToken
import pandas
import json
from os.path import join
from pathlib import Path

# Fall back to empty JSON if the parameter cell left these blank
SourceSettings = SourceSettings or '{}'
ConnectionSettings = ConnectionSettings or '{}'
source_connection_options = json.loads(ConnectionSettings)
source_options = json.loads(SourceSettings)

# Authenticate against the Graph API with the service principal details
auth_token = AuthToken(**source_connection_options)
sharepoint = Sharepoint(auth_token, folder=SourceDirectory, file=SourceObject, **source_options, **source_connection_options)

# Download every file matching the wildcard and land it under the Lakehouse Files area
files = sharepoint.get_file_bytes()
for file_name, file_bytes in files.items():
    Path(join("/lakehouse/default/Files/", TargetDirectory)).mkdir(parents=True, exist_ok=True)
    with open(join("/lakehouse/default/Files/", TargetDirectory, file_name), "wb") as f:
        f.write(file_bytes.getbuffer())
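This lands the spreadsheets as raw files. As mentioned earlier, the same download can feed a Lakehouse table instead; a minimal sketch of that path, assuming the files are Excel workbooks, openpyxl is available in the runtime, and a default Lakehouse is attached (the table name Budgets is just an example):

# Sketch only: parse each downloaded workbook and write it to a Lakehouse table
for file_name, file_bytes in files.items():
    file_bytes.seek(0)                          # rewind the in-memory buffer
    df = pandas.read_excel(file_bytes)          # parse the Excel bytes into a DataFrame
    spark.createDataFrame(df).write.mode("overwrite").saveAsTable("Budgets")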
Next Steps
Hopefully this helps you get SharePoint integration into Fabric up and running. If you want to chat about Fabric or data warehousing, drop us an email at fabric@prodata.ie or catch me on Twitter:
https://twitter.com/bob_duffy