I can also do this via Mongo directly, but I was hoping to skip the K8S interaction there.
Any follow-up thoughts, SuccessfulKoala55 or CostlyOstrich36?
Right, so where can one find documentation about it?
The repo just has the variables without much explanation.
The deferred_init input argument to Task.init is bool by default, so checking type(deferred_init) == int makes no sense to begin with, and is altering the flow.
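For reference, a minimal sketch of how the flag is meant to be passed (the project and task names here are placeholders):

from clearml import Task

# deferred_init is a boolean flag; passing True defers the heavy Task.init work
task = Task.init(
    project_name="examples",
    task_name="deferred-init-demo",
    deferred_init=True,
)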
Last but not least - can I cancel the offline zip creation if I'm not interested in it 🤔
EDIT: I see it's not possible; I guess one has to patch ZipFile
...
We have the following, which works fine (we also use internal zip packaging for our models):
from clearml import OutputModel

# Create the output model, connect it to the task, and upload the saved weights file
model = OutputModel(task=self.task, name=self.job_name, tags=kwargs.get('tags', self.task.get_tags()), framework=framework)
model.connect(task=self.task, name=self.job_name)
model.update_weights(weights_filename=cc_model.save())
Heh, good @<1523704157695905792:profile|VivaciousBadger56> 😁
I was just repeating what @<1523701070390366208:profile|CostlyOstrich36> suggested, credits to him
@<1523701070390366208:profile|CostlyOstrich36> I added None btw
FWIW running clearml==1.9.1 with WebApp: 1.9.2-317 • Server: 1.9.2-317 • API: 2.23
Happens with the latest version indeed.
I can’t share our code, but the gist of it is:
pipe = PipelineController(name=..., project=..., version=...)
pipe.add_function_step(...) # Many calls
pipe.set_default_execution_queue(...)
pipe.start(queue=..., wait=True)
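For reference, a minimal self-contained sketch of that structure (the step function, names, and queues below are placeholders, not our actual code):

from clearml import PipelineController

# Hypothetical step function, only to illustrate add_function_step
def preprocess(raw):
    return raw * 2

pipe = PipelineController(name="demo-pipeline", project="examples", version="1.0.0")
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    function_kwargs=dict(raw=21),
    function_return=["processed"],
)
pipe.set_default_execution_queue("default")
pipe.start(queue="services", wait=True)  # blocks until the pipeline finishes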
Nothing I can spot --
ClearML results page:
ClearML pipeline page:
Launching the next 2 steps
Launching step [...]
Launching step [...]
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2023-02-21 13:53:48
ClearML Monitor: Could not detect iteration reporting, falling back to itera...
I wouldn't put it past ClearML automation (a lot of stuff depends on certain suffixes), but I don't think that's the case here, hmm
So the pipeline runs successfully, I can find all the different tasks, but I cannot see them in the Pipelines tab…
Thanks SuccessfulKoala55 and AgitatedDove14 ! We'll go through the hoops of setting up mongo on AWS then.
We're working to decouple the data from the helm chart; it seems like a dangerous idea to store long-term data on k8s in case of failure 😅
@<1523704157695905792:profile|VivaciousBadger56> It seems like whatever you pickled in the zip file relies on some additional files that are not pickled.
FWIW It’s also listed in other places @<1523704157695905792:profile|VivaciousBadger56> , e.g. None says:
In order to make sure we also automatically upload the model snapshot (instead of saving its local path), we need to pass a storage location for the model files to be uploaded to.
For example, upload all snapshots to an S3 bucket…
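In practice that means passing the bucket as the task's output_uri; a minimal sketch, assuming an S3 bucket (the bucket name and task names are placeholders):

from clearml import Task

# With output_uri pointing at a bucket, model snapshots are uploaded there
# instead of only their local path being recorded
task = Task.init(
    project_name="examples",
    task_name="s3-snapshots-demo",
    output_uri="s3://my-bucket/model-snapshots",
)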
Yes, you're correct, I misread the exception.
Maybe it hasn't completed uploading? At least for Datasets one needs to explicitly wait IIRC
I can only say I’ve found ClearML to be very helpful, even given the documentation issue.
I think they’ve been working on upgrading it for a while, hopefully something new comes out soon.
Maybe @<1523701205467926528:profile|AgitatedDove14> has further info 🙂
Well, you could start by setting output_uri to True in Task.init. It should store it on the fileserver; perhaps you're missing a configuration option somewhere?
FWIW, we prefer to set it in the agent’s configuration file, then it’s all automatic
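To illustrate both options, a minimal sketch (names are placeholders):

from clearml import Task

# output_uri=True uploads model snapshots to the ClearML fileserver
# instead of only recording their local paths
task = Task.init(
    project_name="examples",
    task_name="output-uri-demo",
    output_uri=True,
)

The agent-side equivalent is setting sdk.development.default_output_uri in clearml.conf, so nothing needs to change in the code itself.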
We're using a self-hosted account
Ah! Makes sense. Thanks!
Also, I appreciate the time you're taking to answer, AgitatedDove14 and CostlyOstrich36; I know Fridays are not working days in Israel, so thank you 🙂
I am indeed