It's common to have spreadsheets in SharePoint that you want to ingest into OneLake in Fabric. Typically this is master data, mappings, budgets or any other data that does not reside within a core enterprise system.
We've developed a Python class for use in Fabric Notebooks that wraps the Graph API, allowing files to be downloaded from (or uploaded to) SharePoint and landed either as raw files on OneLake or as Lakehouse tables.
You can skip straight to the good stuff and grab the sample Fabric Notebook here:
https://github.com/ProdataSQL/Fabric/tree/main/07_SharePoint
Prerequisites
Before you can do any automation with SharePoint you will need a service principal identity created in AAD and the following details:
- Tenant ID
- Client ID
- Secret
- SharePoint URL
- Library Name
- Folder Name
- Filename wildcard
Creating the service principal is a bit cumbersome as it needs AAD rights, but there is a good point-and-click guide here (PowerShell option too!):
https://sposcripts.com/download-files-from-sharepoint-using-graph/
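Under the hood, these details drive a standard OAuth2 client credentials flow against AAD. As a rough sketch of what the wrapper does for you (illustrative, not the exact implementation), a Graph access token can be acquired with plain requests:

import requests

def get_graph_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    # OAuth2 client credentials flow against the AAD token endpoint
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",  # use the app's granted Graph permissions
    }
    response = requests.post(url, data=payload)
    response.raise_for_status()
    return response.json()["access_token"]

The token is then passed as a Bearer header on the subsequent Graph calls against the SharePoint site and library.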
Sample Code for PySpark Notebook
ConnectionSettings = '{"library": "Unittest", "tenant_id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","app_client_id":"app-fabricdw-dev-clientid","app_client_secret":"app-fabricdw-dev-clientsecret","keyvault":"kv-fabric-dev","sharepoint_url":"prodata365.sharepoint.com","site":"Fabric"}'
SourceSettings = '{}'
SourceDirectory = 'tst'
TargetDirectory = 'unittest/AW/tst'
SourceObject = '*.xlsx'
TargetFileName = ''
TargetSettings = ''
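Note that app_client_id and app_client_secret above are not the raw values: they are the names of Key Vault secrets, and the keyvault setting names the vault that holds them. A minimal sketch of that lookup in a Fabric notebook, assuming the notebook identity has read access to kv-fabric-dev (mssparkutils is built into the Fabric runtime):

# Resolve the service principal credentials from Azure Key Vault
# (sketch only - the AuthToken class does this for you)
vault_url = "https://kv-fabric-dev.vault.azure.net/"
client_id = mssparkutils.credentials.getSecret(vault_url, "app-fabricdw-dev-clientid")
client_secret = mssparkutils.credentials.getSecret(vault_url, "app-fabricdw-dev-clientsecret")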
from builtin.sharepoint import Sharepoint, AuthToken
import pandas
import json
from os.path import join
from pathlib import Path

# Fall back to empty JSON if the parameter cell left these blank
SourceSettings = SourceSettings or '{}'
ConnectionSettings = ConnectionSettings or '{}'
source_connection_options = json.loads(ConnectionSettings)
source_options = json.loads(SourceSettings)

# Authenticate against the Graph API with the service principal details
auth_token = AuthToken(**source_connection_options)
sharepoint = Sharepoint(auth_token, folder=SourceDirectory, file=SourceObject, **source_options, **source_connection_options)

# Download every file matching the wildcard and land it under the Lakehouse Files area
files = sharepoint.get_file_bytes()
for file_name, file_bytes in files.items():
    Path(join("/lakehouse/default/Files/", TargetDirectory)).mkdir(parents=True, exist_ok=True)
    with open(join("/lakehouse/default/Files/", TargetDirectory, file_name), "wb") as f:
        f.write(file_bytes.getbuffer())
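This lands the spreadsheets as raw files. As mentioned earlier, the same download can feed a Lakehouse table instead; a minimal sketch of that path, assuming the files are Excel workbooks, openpyxl is available in the runtime, and a default Lakehouse is attached (the table name Budgets is just an example):

# Sketch only: parse each downloaded workbook and write it to a Lakehouse table
for file_name, file_bytes in files.items():
    file_bytes.seek(0)                          # rewind the in-memory buffer
    df = pandas.read_excel(file_bytes)          # parse the Excel bytes into a DataFrame
    spark.createDataFrame(df).write.mode("overwrite").saveAsTable("Budgets")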
Next Steps
Hopefully this helps you get SharePoint integration into Fabric up and running. If you want to chat about Fabric or data warehousing, drop us an email at fabric@prodata.ie or catch me on Twitter:
https://twitter.com/bob_duffy