Optimising Microsoft Graph PowerShell scripts
We have all probably been there and developed a PowerShell script that took a fair amount of time to complete, haven’t we? Of course one could argue that as long as a script ‘works’ it is good enough, but depending on the use case and environment a PowerShell script that runs for 30 to 60 minutes exceeds the patience of most (IT) people and can also lead to increased costs. But what makes these scripts so awfully slow, and can’t we just tweak them to run faster?
The following examples and script will be related to PowerShell and the Microsoft Graph API but the mentioned approaches can be adapted for any kind of RESTful API and scripting language. For all interactions with the Graph API I use the Microsoft Graph PowerShell SDK which is available as a PowerShell module from the PowerShell gallery.
Reasons why your script is slow
I would say I have already read and tried to understand hundreds of PowerShell scripts out there, and I often notice the following flaws which lead to poor performance:
Slow algorithms and wrong data structures
Slow algorithms are inextricably linked to the understanding of data structures when it comes to PowerShell scripting. Here are the top two cases I often observe:
Where-Object queries within a loop looking for a particular element in a collection can be very time consuming when we actually expect a single result or match.
This pattern can be avoided by creating a hash table and checking whether the hash table contains the key we’re looking for and then directly accessing the element.
The creation of the hash table can be easily delegated to the Group-Object cmdlet. The specified property will then become the key to the hash table.
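As a minimal sketch of both patterns, assuming made-up sample data (the $users collection and the $wantedIds list are hypothetical and only serve as an illustration):

```powershell
# Hypothetical sample data: a collection of users and a few ids we want to look up.
$users = 1..1000 | ForEach-Object {
    [pscustomobject]@{ Id = "user$_"; DisplayName = "User $_" }
}
$wantedIds = 'user10', 'user500', 'user999'

# Slow pattern: every iteration scans the whole collection again.
$slow = foreach ($id in $wantedIds) {
    $users | Where-Object { $_.Id -eq $id }
}

# Faster pattern: build the lookup table once, then access elements by key.
# -AsHashTable returns a hash table keyed by the grouped property,
# -AsString makes sure the keys are plain strings.
$userById = $users | Group-Object -Property Id -AsHashTable -AsString

$fast = foreach ($id in $wantedIds) {
    if ($userById.ContainsKey($id)) {
        # Each value is the collection of objects sharing that key;
        # Id is unique here, so we take the first (and only) element.
        $userById[$id][0]
    }
}
```

Both loops return the same three users; the second one only pays the cost of walking the collection once, when the hash table is built.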
Voilà, you just improved the time complexity of that Where-Object snippet from O(n^2) (worst case: a linear search over n elements for each of n lookups) to O(n) (we still iterate over n elements, but each hash table lookup has O(1) complexity) by using a hash table. Don’t be scared by the O notation: it is just an estimate of the runtime behaviour of an algorithm, and the smaller the expression inside O gets, the faster the code gets.
Do you like to write $myArray += $someObject? Of course you can do so, but an array has a fixed size, so adding an element means that a new array is created and all existing elements are copied in each iteration, which is not optimal from a performance perspective.
PowerShell inherits all the different collection types from .NET and we can use them to get more performant collections such as lists. Lists and other dynamic data structures were designed to handle additions and removals of elements and perform better than frequent array copy operations.¹
Please also make sure that you no longer use the System.Collections.ArrayList type. Not only does its Add method return the index of the new element on every call, which pollutes your output unless you suppress it, but Microsoft also recommends against using it for new development. The System.Collections.Generic namespace provides a List implementation that we can and should use instead!²
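A small sketch of the difference, using made-up string items; the third variant is the pipeline assignment mentioned in the first footnote:

```powershell
# Slow pattern: += creates a new array and copies all elements on every iteration.
$array = @()
foreach ($i in 1..1000) {
    $array += "item$i"
}

# Preferred: a generic list grows in place.
$list = [System.Collections.Generic.List[string]]::new()
foreach ($i in 1..1000) {
    $list.Add("item$i")   # Add() returns nothing, so there is no output to suppress
}

# Alternative: let PowerShell collect the loop output in one assignment.
$collected = foreach ($i in 1..1000) {
    "item$i"
}
```

All three end up with the same 1000 elements, only the way they get there differs.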
More information: Everything you wanted to know about arrays — PowerShell | Microsoft Learn.
Bad architecture and script design
A bad design approach would be to first query all users of a tenant and then query the manager for each user individually within a loop. Imagine an Azure Active Directory tenant with 1000 users: this approach would lead to 1 + 1000 API requests (assuming that we have a page size of 1000 and only need one request to fetch a list of all users).
Compared to this naïve approach we could simply call the list users action and expand all the users with their respective manager: https://graph.microsoft.com/beta/users?$expand=manager. This leads to the same result but requires only one request.
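A hedged sketch of how such a request could be placed with the SDK's Invoke-MgGraphRequest cmdlet. The function name Get-UserWithManager is hypothetical, and the sketch assumes an existing session (for example Connect-MgGraph -Scopes 'User.Read.All'). It follows @odata.nextLink in case the result spans more than one page:

```powershell
# Hypothetical helper: returns all users with their manager expanded.
# Assumes Connect-MgGraph has already been called with sufficient permissions.
function Get-UserWithManager {
    # Single quotes keep PowerShell from treating $expand as a variable.
    $uri = 'https://graph.microsoft.com/beta/users?$expand=manager'
    $users = [System.Collections.Generic.List[object]]::new()

    while ($uri) {
        $response = Invoke-MgGraphRequest -Method GET -Uri $uri
        foreach ($user in $response.value) { $users.Add($user) }
        # Empty when there are no further pages, which ends the loop.
        $uri = $response.'@odata.nextLink'
    }

    $users
}
```

Calling Get-UserWithManager then yields one object per user, each carrying a manager property where a manager is set.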
To prevent bad architecture and design of scripts it’s essential to have profound knowledge of the API and even more important that the API documentation provides the information about the different endpoints.
A common flaw of RESTful APIs is under- and/or over-fetching: we either receive too much data (which we might not need) or too little data, so that we need to make multiple requests to actually retrieve the data we are interested in. Microsoft Graph API supports:
- Expanding entities to resolve linked entities automatically with the expand parameter (as described in the example above).
- Filtering to prevent over-fetching by using the filter parameter.
- Selecting the required properties by using the select request parameter to only get the properties we need.
When it comes to filter and select operations we want to have them as early as possible (this approach is sometimes referred to as filter left) within the process, ideally already when placing the API request.
So instead of:
You should prefer to do:
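As a sketch of the contrast, assuming the Get-MgUser cmdlet from the Microsoft Graph PowerShell SDK, an existing Connect-MgGraph session, and a made-up department value 'Sales' (the two function names are hypothetical):

```powershell
# Filtering late: every user is transferred and then filtered on the client.
# department is requested explicitly so that the client-side filter has data to work on.
function Get-SalesUserClientSide {
    Get-MgUser -All -Property 'id', 'displayName', 'department' |
        Where-Object { $_.Department -eq 'Sales' }
}

# Filtering left: the API only returns the matching users
# and only the properties we ask for.
function Get-SalesUserServerSide {
    Get-MgUser -All -Filter "department eq 'Sales'" -Property 'id', 'displayName', 'department'
}
```

Both functions return the same users, but the second one moves the work to the API and transfers far less data.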
Of course this only makes sense if we do not need the full list of users. If we do need all users, it is probably faster to fetch all of them initially and then process them, but you can still optimise your query by using the select parameter and only fetching the attributes you need. For certain operations Microsoft Graph also provides export operations for larger amounts of data.³
Aaaaaand if you made it that far: here’s where I actually want to set the focus of this post! Because after we have selected the right algorithms and data structures for our script and have the right design in place, we are still left with one of the most time-consuming parts: input and output, also referred to as I/O.
I/O can be output to the console or to a file:
So before you start printing each item of a collection to the console, consider whether you really need this information. Probably not. And in case you need it for debugging, use another PowerShell output stream such as the verbose or debug stream; you can then control the output with the Verbose and Debug parameters. More information: Use PowerShell to Write Verbose Output | Scripting Blog (microsoft.com).
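A minimal sketch of the verbose stream in action; the function Show-UserProgress and its input are made up for illustration:

```powershell
function Show-UserProgress {
    [CmdletBinding()]   # enables the common -Verbose and -Debug parameters
    param([string[]]$UserName)

    foreach ($name in $UserName) {
        # Only written to the console when the caller passes -Verbose.
        Write-Verbose "Processing $name"
        $name.ToUpper()
    }
}

# Regular run: no console noise, just the results.
$quiet = Show-UserProgress -UserName 'alice', 'bob'

# Debugging run: the same results plus one VERBOSE: line per user.
$chatty = Show-UserProgress -UserName 'alice', 'bob' -Verbose
```

The script itself stays the same; the caller decides whether the extra output is produced.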
Each Microsoft Graph API request is transmitted as an HTTP request to the API. This involves multiple components such as your machine (PowerShell, operating system, hardware), network and Microsoft to process your request and send you a response. Sounds obvious but the more requests you place the slooooooooower your scripts get. I count this also towards I/O.
So once you nailed all of the previously described points we can try to optimise or tune Microsoft Graph requests within the next chapter.
Approaches to optimise Microsoft Graph requests
Assuming we have a script that requires a lot of requests to Microsoft Graph resources we can try to optimise the process to save some runtime.
Before I started to play around with optimisations I needed a baseline to measure the success of my adjustments. In my dev tenant I have 1280 fake users that I fetched at the beginning of my testing. For the subsequent tests I simply place a request for each user’s details and measure only this part, as there is no possibility to improve the initial request mentioned above.
Let’s do batching
Microsoft Graph offers a feature called batching (batch requests). Multiple requests can be combined into a single request and the whole batch can be submitted to the API’s batch endpoint via POST. We then receive a response containing all the batch results. It is even possible to create batches with dependent actions, but in my case I just used the simple example of individual requests for users.
A maximum of 20 requests can be placed into a single batch. The responses of the individual batches then need to be reassembled to get the full list of responses.
Thanks to batching we can reduce the number of requests from 1280 individual requests to 1280 / 20 = 64 requests. Before even taking measurements this approach sounds very promising, as it drastically reduces the number of requests.
A slight downside is that the added batching logic makes our code fragment bigger and more difficult to understand, especially the part where we create the chunks with the individual requests and merge the responses together.
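The chunking and merging could be sketched roughly as follows. Both function names are hypothetical, an existing Connect-MgGraph session is assumed, and for brevity the sketch simply skips sub-requests that did not return status 200 (including throttled ones) instead of retrying them:

```powershell
# Hypothetical helper: splits user ids into batch bodies of at most 20 requests each.
function New-UserBatchBody {
    param(
        [Parameter(Mandatory)] [string[]] $UserId,
        [int] $BatchSize = 20   # maximum number of requests per JSON batch
    )
    for ($i = 0; $i -lt $UserId.Count; $i += $BatchSize) {
        $end = [Math]::Min($i + $BatchSize, $UserId.Count) - 1
        $requests = foreach ($id in $UserId[$i..$end]) {
            # The id only has to be unique within the batch; the url is relative.
            @{ id = $id; method = 'GET'; url = "/users/$id" }
        }
        # Emit one body per chunk; @() keeps a single request an array.
        @{ requests = @($requests) }
    }
}

# Hypothetical helper: submits the batches one after another and merges the results.
function Get-UserDetailBatched {
    param([Parameter(Mandatory)] [string[]] $UserId)

    $results = [System.Collections.Generic.List[object]]::new()
    foreach ($body in (New-UserBatchBody -UserId $UserId)) {
        $response = Invoke-MgGraphRequest -Method POST `
            -Uri 'https://graph.microsoft.com/v1.0/$batch' `
            -Body ($body | ConvertTo-Json -Depth 5) -ContentType 'application/json'

        foreach ($item in $response.responses) {
            if ($item.status -eq 200) { $results.Add($item.body) }
        }
    }
    $results
}
```

For 1280 user ids, New-UserBatchBody produces the 64 batch bodies calculated above.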
More information: Combine multiple requests in one HTTP call using JSON batching — Microsoft Graph | Microsoft Learn
Let’s do concurrency
With PowerShell 7 we have a new parameter for the ForEach-Object cmdlet. We can supply the -Parallel parameter to leverage concurrency⁵. Each script block then runs on a separate thread, and by default 5 threads run at a time.
Writing concurrent code or scripts is actually easy, but writing concurrent code or scripts that work is difficult! Concurrent operations run in no particular order (non-deterministic) and we are responsible for using types and collections that are thread safe. This means we cannot use a regular list and need to switch to a thread-safe collection⁴ like System.Collections.Concurrent.ConcurrentBag that can handle multiple threads modifying the collection.
Compared to the regular approach PowerShell does not need to wait until a single request has completed and can now place them in parallel and store the results within our collection.
What is also nice is that we only need a small change: using the Parallel parameter and the thread-safe collection. The downside is that this requires PowerShell 7, but with the Requires statement we can document this dependency in an understandable way.
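A minimal sketch of the concurrent variant. To keep it runnable without a tenant, the actual Graph request is replaced by a short sleep; the user ids are made up, and the commented-out line shows roughly where the real call would go:

```powershell
#Requires -Version 7.0

$userIds = 1..20 | ForEach-Object { "user$_" }   # hypothetical ids

# Thread-safe collection that several threads may add to at the same time.
$results = [System.Collections.Concurrent.ConcurrentBag[object]]::new()

$userIds | ForEach-Object -Parallel {
    # Variables from the calling scope have to be referenced with $using:.
    $bag = $using:results

    # Stand-in for the real request, which could look like this:
    # $user = Invoke-MgGraphRequest -Method GET -Uri "https://graph.microsoft.com/v1.0/users/$_"
    Start-Sleep -Milliseconds 50
    $user = [pscustomobject]@{ Id = $_ }

    $bag.Add($user)
} -ThrottleLimit 5   # 5 is also the default
```

The order of the elements in $results is not guaranteed, so sort afterwards if the order matters.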
Let’s do batching and concurrency
By combining the previous approaches we can leverage both batching and concurrency to submit and reassemble our batch requests to the API. The work is then shared amongst multiple threads and should hopefully be processed faster.
The snippet now has a serial part, where the individual batches are created, and a parallel/concurrent part, where the batches are submitted and reassembled. Compared to the baseline the snippet has become quite large and the intention of the code is probably not that self-explanatory anymore. So here is another side effect of optimisation: it can make your code more difficult to read.
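The structure of such a combined snippet could be sketched like this. The user ids are made up, and the batch submission is replaced by a simulated response so that the sketch runs without a tenant; the commented-out lines show roughly what the real call would look like:

```powershell
#Requires -Version 7.0

$userIds = 1..100 | ForEach-Object { "user$_" }   # hypothetical ids
$batchSize = 20

# Serial part: create the batch bodies.
$batches = for ($i = 0; $i -lt $userIds.Count; $i += $batchSize) {
    $end = [Math]::Min($i + $batchSize, $userIds.Count) - 1
    @{
        requests = @(foreach ($id in $userIds[$i..$end]) {
            @{ id = $id; method = 'GET'; url = "/users/$id" }
        })
    }
}

# Parallel part: submit the batches and merge the responses.
$results = [System.Collections.Concurrent.ConcurrentBag[object]]::new()

$batches | ForEach-Object -Parallel {
    $bag = $using:results

    # Stand-in for the real submission, which could look like this:
    # $response = Invoke-MgGraphRequest -Method POST -ContentType 'application/json' `
    #     -Uri 'https://graph.microsoft.com/v1.0/$batch' -Body ($_ | ConvertTo-Json -Depth 5)
    $response = @{
        responses = @($_.requests | ForEach-Object {
            @{ id = $_.id; status = 200; body = @{ id = $_.id } }
        })
    }

    foreach ($item in $response.responses) {
        if ($item.status -eq 200) { $bag.Add($item.body) }
    }
}
```

With 100 ids this creates 5 batches, which are then processed by up to 5 threads at a time.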
Comparison and results
I measured all the different code snippets with the built-in Measure-Command cmdlet to get the elapsed time for each script block. And I was actually very impressed by the results! As mentioned in the beginning, I was doing my tests with a little over 1,200 individual requests.
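As a small illustration of how Measure-Command can be used for such comparisons, here is a sketch that times two ways of building the same collection. The workload is made up and has nothing to do with Graph; the absolute numbers will differ between machines and PowerShell versions:

```powershell
# Measure-Command runs the script block and returns the elapsed time as a TimeSpan.
$viaArray = Measure-Command {
    $a = @()
    foreach ($i in 1..5000) { $a += $i }
}

$viaList = Measure-Command {
    $l = [System.Collections.Generic.List[int]]::new()
    foreach ($i in 1..5000) { $l.Add($i) }
}

# Print both timings in milliseconds.
'{0,-6} {1,10:N1} ms' -f 'array', $viaArray.TotalMilliseconds
'{0,-6} {1,10:N1} ms' -f 'list', $viaList.TotalMilliseconds
```

Dividing one TotalMilliseconds value by the other gives the speedup between two variants.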
I was already familiar with the batching approach, but combining it with PowerShell 7 concurrency is a real game changer. Bringing down the runtime from more than 3 minutes to just around 5 seconds is very promising. Imagine what this means at scale, for example for 10,000 requests.
The difference in performance gain between batching and concurrency is also interesting, although the concurrent variant could be tweaked further, as I only used the default setup of 5 threads.
An additional factor that may come into play when making a lot of requests is throttling. Because I was using the Microsoft Graph PowerShell SDK, which has built-in support for handling throttling, I actually do not know how many of the requests were throttled. Maybe this is a good thing, because you will face throttling in production as well. For the comparison, however, it could be that some of the requests were throttled and others were not, which would distort the results (although I could not find evidence for that during multiple runs).
Of course the absolute results from my measurements will probably not translate to other machines or environments, as these depend on a lot of factors such as hardware, OS, network and request types. But the calculated speedup should be reachable with similar setups.
I hope you learned one or another thing about PowerShell and Microsoft Graph and that you can now write faster PowerShell scripts although this does not necessarily mean that these scripts will be written faster because optimisation requires time and measurements 😉.
Resources and some footnotes
¹There are also other options such as pipeline assignments to avoid working with lists: Everything you wanted to know about arrays — PowerShell | Microsoft Learn
²Lists also have some overhead, but this is negligible for bigger amounts of data. So do not compare an array copy for two elements with a list. :)
³There exist special export APIs for reports and bulk exports of Intune and Azure AD data: Use Graph APIs to export Intune Reports | Microsoft Learn, Working with the authentication methods usage report API — Microsoft Graph beta | Microsoft Learn
⁴Thread-Safe collections | Microsoft Learn
⁵PowerShell ForEach-Object Parallel Feature — PowerShell Team (microsoft.com)
Thanks also to everyone for the positive notes on my recent tweet.