It sounds like you understand the limitations correctly.
As far as I know, it'd be up to you to write your own code that computes the delta between old and new and only re-processes the new entries.
The API would let you search through prior experimental results.
So you could load up the prior task, check which ids already show up in its output (maybe save these as a separate artifact for faster load times), and only evaluate the new inputs. Then perhaps copy the old outputs over to the new task for completeness.
That's how I'd approach it: use "data-creation" tasks and artifacts to roll your own "caching" logic (i.e., skipping evaluation) within the task itself.
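Roughly, the skip-already-evaluated logic could look like the sketch below. This is just an illustration of the delta idea, not any particular API: loading `prior_outputs` from the old task/artifact and the `evaluate` callable are placeholders you'd wire up to whatever your setup actually uses.

```python
from typing import Any, Callable, Dict


def run_incremental(
    new_inputs: Dict[str, Any],
    prior_outputs: Dict[str, Any],
    evaluate: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Re-process only the ids that weren't already evaluated in the prior task."""
    # ids already covered by the old task -- copy their outputs over as-is
    cached = {i: out for i, out in prior_outputs.items() if i in new_inputs}
    # ids that are genuinely new -- the only ones we pay to evaluate
    fresh = {i: evaluate(x) for i, x in new_inputs.items() if i not in prior_outputs}
    # merged result is complete for the new task
    return {**cached, **fresh}


if __name__ == "__main__":
    old = {"a": 1, "b": 2}                   # outputs loaded from the prior task/artifact
    new = {"a": "xa", "b": "xb", "c": "xc"}  # current inputs, with one new id "c"
    print(run_incremental(new, old, evaluate=len))  # {'a': 1, 'b': 2, 'c': 2}
```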
In the open source version, you don't get a whole lot (in my opinion) from using datasets over basic artifacts in tasks (scoped to just create a dataset). The real "power" of the datasets feature, I believe, comes with some of the pro features.