The world of source control for Power BI is somewhat in conflict:
- On one side, we need to use source control, and even Microsoft tools like Power BI deployment pipelines, or your own CI/CD, put PBIX files in Git.
- On the other side, Git has clear guidelines not to store large binary files.
If you have put Power BI files into Git in Azure Repos, then over time, with a lot of changes, the repo will become very large and bloated, because Git keeps the history of all changes.
In this case, we took a repo from 57.6 GB to 3.42 GB.
Here are the 9 steps to compress the Azure repos if this does get out of control.
Step by Step Cleaning Azure Repos
1. Check Permissions on Azure Repos
Part of the process rewrites history, so you need to be in an Azure Repos group that has been specifically granted that permission ("Force push (rewrite history, delete branches and tags)") in the repo's permissions in the portal.
2. Download BFG cleaner Tools
You're going to need the following tools, which I downloaded to S:\apps\bfg on a dev server:
- Java (JRE), installed from https://www.java.com/en/download/
- The git-sizer exe, copied into S:\apps\bfg from https://github.com/github/git-sizer/releases
- The Java BFG Repo-Cleaner, downloaded into S:\apps\bfg. In my case I renamed it to bfg.jar to make it simpler to call.
3. Ensure you have latest pull of code
Clone the Azure repo and make sure you have the latest copy.
4. Check the Size of the Repos
Using the git-sizer exe, we can check the starting size. In my case it was an impressive 57.6 GB, with one file at 608 MB (an old Power BI model not even used anymore).
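As a cross-check on git-sizer, plain Git can give a rough picture of the same thing. This is a minimal sketch, assuming a POSIX shell such as Git Bash; `largest_blobs` is a hypothetical helper name and is not part of git-sizer or the BFG.

```shell
# Hypothetical helper (not part of git-sizer): list the N largest blobs in the
# whole history of a repository as "<size in bytes> <path>", biggest first.
largest_blobs() {
  repo=$1        # path to the cloned repository
  count=${2:-10} # how many entries to show (defaults to 10)
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    grep '^blob ' |
    cut -d' ' -f2- |
    sort -rn |
    head -n "$count"
}

# Example (paths are placeholders):
# git -C /path/to/repo count-objects -vH   # overall size of the object store
# largest_blobs /path/to/repo 10           # ten largest files in history
```

Old PBIX files that nobody uses anymore tend to show up at the top of this list, which helps when choosing the size threshold for the next step.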
5. Use BFG to re-write history on all files > 1 MB
The BFG Repo-Cleaner is documented here. At the most basic level, you can pick a size and delete the history of all files larger than that size.
Note: the BFG will still keep your most “current” version, so work in progress and production code should not be affected. This is just history.
java -jar s:\apps\bfg\bfg.jar --strip-blobs-bigger-than 1M
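Before rewriting anything, it can be reassuring to preview which files are likely to be affected. The sketch below is an approximation written for this post, not a feature of the BFG: `preview_big_blobs` is a hypothetical helper, it assumes "1M" means 1048576 bytes, and it assumes only the files in the current HEAD commit are protected.

```shell
# Hypothetical helper: preview blobs in history that are larger than a byte
# limit and are NOT part of the current HEAD commit. It only approximates what
# --strip-blobs-bigger-than would target, and it changes nothing.
preview_big_blobs() {
  repo=$1
  limit=${2:-1048576}   # bytes; assumed equivalent of the BFG's "1M"
  keep=$(mktemp)
  # Blob ids in HEAD: the "current" versions that should be left alone.
  git -C "$repo" ls-tree -r HEAD | awk '{ print $3 }' > "$keep"
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v limit="$limit" -v keep="$keep" '
      BEGIN { while ((getline id < keep) > 0) protected[id] = 1 }
      $1 == "blob" && ($3 + 0) > (limit + 0) && !($2 in protected) {
        path = $0
        sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)   # keep paths that contain spaces
        print $3, path
      }' |
    sort -rn
  rm -f "$keep"
}

# Example (path is a placeholder):
# preview_big_blobs /path/to/repo 1048576
```

Each output line is a size in bytes followed by a path, largest first. Treat it as a sanity check only; the BFG's own report after the run is the authoritative record.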
6. Prune old files from repos
This may take a while: it took 30 minutes on a 16-core Azure server (and about 3 minutes on my desktop with an i9 and an NVMe SSD).
git reflog expire --expire=now --all && git gc --prune=now --aggressive
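To see why both halves of that command matter, here is a small throwaway demo. It is a sketch under stated assumptions: it runs in a temporary folder with a POSIX shell, uses 1 MB of random data as a stand-in for a PBIX file, and the file and branch names are made up. The large file is committed on a scratch branch that is then deleted, so only the reflog still points at it.

```shell
# Throwaway demo: a blob that only the reflog still references stays on disk
# until the reflog is expired and gc prunes unreachable objects.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
git config user.name "Demo"
git config user.email "demo@example.com"

echo "notes" > notes.txt
git add notes.txt
git commit -q -m "small file"

git checkout -q -b scratch
head -c 1048576 /dev/urandom > model.pbix   # 1 MB stand-in for a PBIX file
git add model.pbix
git commit -q -m "large file"
big_blob=$(git rev-parse HEAD:model.pbix)

git checkout -q -            # back to the first branch
git branch -q -D scratch     # the large commit is now reachable only via HEAD's reflog

git cat-file -e "$big_blob" && echo "before: blob still stored"

git reflog expire --expire=now --all
git gc --quiet --prune=now --aggressive

git cat-file -e "$big_blob" 2>/dev/null || echo "after: blob pruned"
```

The same logic applies after the BFG run: the rewritten history no longer references the big files, but the disk space is only released once the reflog entries are expired and the garbage collector has pruned the leftovers.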
7. Re-check Size of Azure Repos
Re-run the git-sizer.exe tool. In this case, size went down to 3.42 GB from 57.6 GB. Quite a saving!
8. Do a forced push into Azure Repos
So far, the cleaning and pruning has only happened in the local repo, so we can now push that into Azure Repos with the command below.
git push --force
9. Get all developers to do a pull
When developers next do a pull, it will take a good while (30-60 minutes for me) as the local Git repo repacks and compresses to shrink its size.
With so much pruning and recompressing of the local repo, it may be faster to just delete the local copy and do a fresh clone, which can take just a few minutes.
git pull --force
git reflog expire --expire=now --all && git gc --prune=now --aggressive
Optionally, you can reset the local repo back to the head of the Azure repo with the commands below. In some cases we ran into merge issues and had to do this.
git reset --hard origin/master
git pull --force
git reflog expire --expire=now --all && git gc --prune=now --aggressive
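The resync for a developer machine can also be wrapped in a small helper. This is a minimal sketch, assuming a POSIX shell, a remote named origin, and that the developer is on the branch being reset; `resync_branch` is a hypothetical name, and the default branch name master is taken from the command above. It fetches first, so the reset lands on the rewritten history rather than on a stale copy.

```shell
# Hypothetical helper: make the checked-out branch match the rewritten remote
# branch, then release the space held by the old history.
# WARNING: discards local commits and uncommitted changes on that branch.
resync_branch() {
  branch=${1:-master}   # pass your own branch name if it is not master
  git fetch --quiet origin &&
    git reset --quiet --hard "origin/$branch" &&
    git reflog expire --expire=now --all &&
    git gc --quiet --prune=now
}

# Example, run from inside the local repo:
# resync_branch master
```

Anything not yet pushed should be backed up before running this, since a hard reset does not keep local work.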
What’s Next and Lessons Learned?
One thing to bear in mind is that, as commits happen, the bloated size will just come back, so this maintenance may now be a regular thing. I’m looking into whether we can configure, say, Large File support for Azure Repos using an external blob store for Power BI files. I’m not 100% sure if it’s compatible yet. Still googling and asking around to see if anyone else has done it!
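For anyone wanting to experiment, Git LFS is one candidate for that kind of large file support. The following is only a sketch of what the setup could look like: it assumes the git-lfs extension is installed, `track_pbix_with_lfs` is a hypothetical helper name, and whether this fits the Power BI workflow described here is still the open question above. It also only affects new commits; history that is already bloated still needs the cleanup steps.

```shell
# Hypothetical helper: start tracking PBIX files with Git LFS in one repository.
# Assumes the git-lfs extension is installed on the machine.
track_pbix_with_lfs() {
  repo=$1
  git -C "$repo" lfs install --local &&   # enable the LFS filters for this repo only
    git -C "$repo" lfs track "*.pbix" &&  # writes the tracking rule to .gitattributes
    git -C "$repo" add .gitattributes     # stage the rule so it can be committed
}

# Example (path is a placeholder):
# track_pbix_with_lfs /path/to/repo
# git -C /path/to/repo check-attr filter -- "Sales Report.pbix"
```

The check-attr call is a quick way to confirm that a given file name would be picked up by the rule before anything is committed.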
Another lesson learned is that, when working on an enterprise project of which Power BI is just one small part, we may configure a separate repo just for Power BI. This will make Visual Studio projects much slimmer and CI/CD faster, while also allowing maintenance to focus on just the Power BI repo.