Progress on MiniScript 2 has been rapid lately, and it's now in the "final polish" stage. Today that included adding a cool new feature, an optimization that many users will never consciously notice, but which will be quietly improving things in almost every MiniScript program. This feature is computed lists.
Computed What, Now?
Here's the issue we set out to solve: a for loop in MiniScript must iterate over a list, string, or map; and very often, that's a list that comes from the range function. Here's a typical example:
sum = 0
for i in range(1000000)
sum += i
end for
That range function, until today, would allocate a million-and-one-element list and store the numbers from 1 million down to zero in it, just so we can assign each number to i in turn. And then it would release that list, to be garbage collected at some point. This is a lot of extra work that we really didn't need to do. Instead of storing all those numbers, we could instead just store the information we need to compute any element on demand. This saves a ton of allocation and deallocation, and in many cases (like when you bail out of your loop early) it saves computation too.
Python faced the same problem, and chose (starting with Python 3.0) to make range return an iterable rather than a list. That's fine for cases like the above, but in cases where you actually needed a list, you have to cast it through the list() function to get one. This is a gotcha that everybody trips over initially, and just extra noise in the code even once you're used to it.
Our Solution: Computed List
For MiniScript 2, we have chosen a different solution: a list may, under the hood, be a computed list. Rather than storing individual values, a computed list stores just three: the base, the step, and the count. But unlike Python, this is completely transparent to your code; it is still a real list, and may be used exactly like any other list.
So now when you run the code above in MiniScript 2, it no longer allocates and precomputes over a million elements. Instead it allocates space for just 3 elements. But it acts like it has a million and one elements, in every way (unless you use the info intrinsic to peek at its computed flag).
You get such a computed list, currently, in two ways:
- The
rangefunction always returns a computed list. - List replication, i.e.
[x] * n, returns a computed list ifxis any immutable value andnis an integer.
What Stays Computed
There are lots of things you can do with a computed list that leave the list in computed form, and just calculate the answer (very quickly). If foo is such a list, then all of these work directly on the computed data:
foo.len-
foo[i]for any indexi -
foo[i:j]slicing fromiup toj -
foo.pop(yes, it mutates the computed list!) -
foo.sumand other non-mutating intrinsics -
freeze foo(making it into a frozen computed list)
That loop above could have been written as simply range(1000000).sum, and it very quickly computes the right answer, without ever allocating a million-element list.
When It Can't Stay Computed
However, unless you freeze the list, it's still a mutable object and you can change its contents. This generally requires "materializing" the list, i.e., at this point MiniScript goes ahead and allocates as much space as needed to hold all the values. (The exception to this rule is pop, which keeps it computed and merely reduces the count.) So, any of these operations will materialize the list:
foo.push xfoo.remove ifoo.insert i, xfoo[i] = xfoo.sort
At this point you lose the benefit of having a computed list, but that's OK. You're no worse off than we were yesterday, when we didn't have computed lists at all. And critically, you don't have to think about this. You ask for a list, you get a list, MiniScript keeps it computed when it can and materializes it when it must. Everything just works.
Performance
We built some benchmarks to measure code that relies heavily on big lists without mutating them, as well as code that does need to mutate them (and so can't take advantage of computed lists). This new feature brought about a 100X speedup in the first case, and had no measurable impact on the second.
Of course the improvement will depend greatly on the details of what the code is doing, but the key point is, as expected, this is an overall win. Not only do we avoid allocating and deallocating big hunks of RAM, but also a computed list (which is only 3 elements) fits comfortably into all levels of the memory cache hierarchy, unlike bigger lists. Cache misses are a major bottleneck in a lot of compute-heavy code, so this optimization wins both ways.
Life is Good
I'm proud of this solution because it gives us enormous benefits in cases that were needlessly slow before, but doesn't require the user to learn or remember anything new. I mean, I just told you all about it, but that was for sheer "Gee Whiz"-ism — nobody needs to understand what you now know!
And it does all this in a way that is relatively simple and didn't require a ton of new code under the hood, either.
This new feature will ship with MiniScript 2 Preview 6 (probably tomorrow). If you use command-line MiniScript, or would like to, keep your eye on the releases page, and give it a try!
Top comments (2)
The transparent part is the real win here:
range(1000000)stops allocating a million-and-one element list, but users still get an ordinary list instead of learning a Python-style iterable distinction. I also like the small implementation shape, just base, step, and count, with clear materialization only when operations likepush, indexed assignment, orsortforce it. From a product/engineering angle, this is the best kind of performance work: it changes the cost model without changing the programmer's mental model, so the improvement compounds quietly across existing code.Awesome! Very elegant solution. The best kind for performance problems.