
Yes, sorry, the final cell has the flush followed by the close
I used `task.flush(wait_for_uploads=True)` in the final cell of the notebook
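A minimal sketch of what that final cell amounts to, assuming `task` is the object returned by `Task.init()` earlier in the notebook. The helper name `finish_task` is made up for illustration and is not part of ClearML:

```python
def finish_task(task):
    """Flush pending uploads, then close the task (meant for the last notebook cell)."""
    # Block until queued uploads have completed.
    task.flush(wait_for_uploads=True)
    # Only then mark the task as finished.
    task.close()
```

In the notebook this would simply be `finish_task(task)` in the last cell.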
My colleague, @ZealousCoyote89, has been looking at this. I think he has used the relevant kwarg in the component decorator to specify the packages, and I think it worked, but I’m not 100% sure. Connah?
Producing it now, thanks for your help. Won’t be more than a few mins
Ahh okay.
I’m an absolute numpty.
I had enabled caching on the Pipeline Task that was grabbing a load of ClearML IDs, so it was returning the cached IDs and trying to “get” datasets that had since been deleted.
Thanks for the nudge to minimal test – silly I didn’t do it before asking!
Appreciate your help.
But yeah, more generally having a different UI for different data types could be useful (e.g. categorical variables, integers, decimals, etc), just not a direct concern for me at this moment
Hi John, we are using a self-hosted server with:
WebApp 1.9.2-317
Server: 1.9.2-317
API: 2.23
edit: clearml==1.11.0
Ahh that’s great, thank you.
And then I could use storage manager or whatever to get the files. Perfect
Ah okay, that is a very easy solution to the problem. I wasn’t aware that you could build and run pipelines like that, and I especially wasn’t aware that you could return values from a pipeline and have them accessible to a script in the way that you have given.
Does this require you to run the pipeline locally (I see you have set a default execution queue) or need any other specific set-up?
I will give it a go tomorrow and report back – the only issue I foresee will be if doing this somehow inc...
Basically, for a bit more context, this is part of an effort to incorporate ClearML Pipelines in a CI/CD framework. Changes to the pipeline script `create_pipeline_a.py` that are pushed to a GitHub `master` branch would trigger the build and testing of the pipeline.
And I’d rather the testing/validation etc lived outside of the ClearML Pipeline itself, as stated earlier – and that’s what your pseudo code allows, so if it’s possible that would be great. 🙂
Thanks, I’ll check out those GitHub Actions examples but as you say, it’s the “template” step that is the key bit for this particular application.
the pipeline from tasks serializes itself to a configuration object that you can edit/create externally
I think if it has to come down to fiddling with lower-level objects, I’ll hold off for now and wait until something a bit more user-friendly comes along. Depends on how easy this is to work with.
This is something that we do need if we a...
The pseudo-code you wrote previously is what would be required, I believe
be able to get the pipeline’s Task ID back at the end
This is the missing piece. We can’t perform validation without this, afaik
The Dataset object itself is not being passed around. The point of showing you that was to say that the Dataset may change, and therefore the number of objects loaded from it (e.g. a number of pandas DataFrames that were CSVs in the dataset) could change
I basically just mean having a date input like you would in Excel, where it brings up a calendar (and a clock if it’s a time) and defaults to “now”
(including caching, even if the number of elements in the list of vals changes)
There are no experiments in the project, let alone the pipeline; they’ve all been archived
And the app has presumably crashed, because I can’t click the “Close” button. The whole page is totally unresponsive and I have to refresh it, at which point the pipeline still exists (i.e. was not deleted).
I have left it on the deletion screen (screenshot) for 20-30 mins at one point and it didn’t do anything, so this seems to be a bug
I’m just the messenger here, didn’t set up the web app...
I get an error about incorrect Task IDs. In the above pseudo code, it would be the ID of the `step` Task that was displayed in the error
e.g. pseudo for illustration only
```python
def get_list(dataset_id):
    from clearml import Dataset

    ds = Dataset.get(dataset_id=dataset_id)
    ds_dir = ds.get_local_copy()
    # etc...
    return list_of_objs  # one for each file, for example


def pipeline(dataset_id):
    list_of_obj = get_list(dataset_id)
    list_of_results = []
    for obj in list_of_obj:
        list_of_results.append(step(obj))  # step: a pipeline component, not shown
    combine(list_of_results)  # combine: the final pipeline component, not shown
```
One benefit is being able to make use of the Pipeline caching so if ne...
Yep, would be happy to run locally, but want to automate this so does running locally help with getting the pipeline run ID (programmatically)?
Yep, that’s it. Obviously would be nice to not have to go via the shell but that’s by the by (edit: I don’t know of a way to build or run a new version of a pipeline without going via the shell, so this isn’t a big deal).
Sorry, I think something’s got lost in translation here, but thanks for the explanation.
Hopefully this is clearer:
- Say we have a new ClearML pipeline as code on a new commit in our repo.
- We want to build and run this new pipeline and have it available on the ClearML Server.
- We want to run a suite of tests that validate/verify/etc the performance of this entire ClearML Pipeline, e.g. by having it run on a set of predefined inputs and checking the various artifacts that were creat...
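The artifact check in the last bullet could look roughly like the sketch below. It assumes the ID of the task that holds the artifacts is already known, which is exactly the open question in this thread. The helper names `missing_artifacts` and `validate_pipeline_run` and any artifact names passed in are hypothetical:

```python
def missing_artifacts(task, expected_names):
    """Return the expected artifact names that the task did not produce, sorted."""
    # `task.artifacts` is a dict-like mapping of artifact name -> artifact object.
    produced = set(task.artifacts.keys())
    return sorted(set(expected_names) - produced)


def validate_pipeline_run(task_id, expected_names):
    """Fetch a task from the ClearML Server and fail if any artifact is missing."""
    from clearml import Task  # imported lazily so the helper above needs no server

    task = Task.get_task(task_id=task_id)
    missing = missing_artifacts(task, expected_names)
    if missing:
        raise AssertionError(f"Missing artifacts: {missing}")
    return task
```

Keeping the comparison in a separate function means the CI test suite can exercise it without a ClearML Server, while `validate_pipeline_run` stays a thin wrapper around the lookup.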
So the DAG is getting confused when bringing the results of the Tasks together
I have already tested that the for loop does work, including caching, when spinning out multiple Tasks.
As I say, the issue is grouping the results of the tasks into a list and passing them into another step
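For reference, the intended data flow can be written as plain Python with no ClearML involved. This is only an illustration of the fan-out/fan-in shape and not a workaround: `step`, `combine` and `run` are stand-ins made up here, and in the real pipeline each of the first two would be a pipeline component.

```python
def step(obj):
    """Stand-in for a per-object pipeline step (here: the length of the object)."""
    return len(obj)


def combine(results):
    """Stand-in for the final step, which needs every per-object result at once."""
    return sum(results)


def run(objs):
    # Fan out: one step call per object; the count may differ between runs.
    results = [step(obj) for obj in objs]
    # Fan in: the whole list is handed to a single downstream step.
    return combine(results)
```

The fan-in call is the part that breaks in the pipeline: `combine` depends on a list whose length is only known once the dataset has been loaded.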
Ahh. This is a shame. I really want to use ClearML to efficiently compute features but it’s proving a challenge!
Thanks
Thanks, yes I am familiar with all of the above.
We want to validate the entire pipeline. I am not talking about using a ClearML Pipeline as the validator (which is the case in your examples).
Here is some further detail that will hopefully make things more obvious:
- The pipeline is a series of steps which creates a feature store – in fact, you might even call it a feature pipeline!
- Each pipeline step takes responsibility for a different bit of feature engineering.
- We want to val...
from tempfile import mkdtemp
new_folder = with_feature.get_mutable_local_copy(mkdtemp())
It’s this line that causes the issue