AttractiveCockroach17 can I assume you are working with the hydra local launcher ?
Hi StrangeStork48
secrets manager per se,
Quick question, are you running the trains-server over http or https ?
Meanwhile you can just sleep for 24 hours and put it all on the services queue. It should work 🙂
Example here:
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
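Along the same lines, a minimal sketch of such a "sleep and repeat" service (the project and task names here are placeholders, not taken from the linked example):

```python
import time
from clearml import Task

# Enqueue this once on the services queue; it wakes up every 24 hours
task = Task.init(project_name="DevOps", task_name="daily-service")

while True:
    # ... periodic work goes here (cleanup, reports, etc.) ...
    time.sleep(24 * 60 * 60)  # sleep for 24 hours, then run again
```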
Okay, some progress, so what is the difference ?
Any chance the issue can be reproduced with a small toy code ?
Can you run the tqdm loop inside the code that exhibits the CR issue ? (maybe some initialization thing that is causing it to ignore the value?!)
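Something as small as this would do as the test loop (purely illustrative):

```python
import time
from tqdm import tqdm

# If the CR handling is broken, each update prints on a new line
# instead of refreshing the same line in place
for _ in tqdm(range(100)):
    time.sleep(0.05)
```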
Nothing, except that Draft makes sense: it feels like the task is being prepped, while Aborted feels like something went wrong
Yes, I guess that if we call execute remotely without a queue, it makes sense for you to edit it...
Is that the case TrickySheep9 ?
If it is, I think we should change it to Draft when it is not queued. Sounds good to you guys ?
Hi ClumsyElephant70
What's the clearml version you are using ?
(The first error is a by-product of a python process.Event created before a forkserver is created, some internal python issue. I thought it was solved, let me take a look at the code you attached)
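For reference, the usual way to avoid that ordering problem (a generic sketch, not taken from the attached code) is to create the Event from the forkserver context itself:

```python
import multiprocessing as mp

def worker(evt):
    evt.wait()  # block until the parent sets the event

if __name__ == "__main__":
    # Create the forkserver context first, then the Event from it,
    # so the Event never exists before the forkserver does
    ctx = mp.get_context("forkserver")
    event = ctx.Event()
    proc = ctx.Process(target=worker, args=(event,))
    proc.start()
    event.set()
    proc.join()
```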
SoreDragonfly16 notice that if you abort a task in the web UI, it will do exactly what you described: print a message and quit the process. Any chance someone did that?
Is `mark_completed` used to complete a task from a different process and `close` from the same process - is that the idea?
Yes. However, when I tried them out, `mark_completed` terminated the process that called `mark_completed`.
Yes, if you are changing the state of the Task, externally or internally, the SDK will kill the process. If you are calling `task.close()` from the process that created the Task it will gra...
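To make the two call sites concrete, a minimal sketch (the project/task names and the task id are placeholders):

```python
from clearml import Task

# --- in the process that created the task ---
task = Task.init(project_name="examples", task_name="demo")
# ... training code ...
task.close()  # flushes and closes cleanly from the creating process

# --- in a different (monitoring) process ---
other = Task.get_task(task_id="<task-id>")  # placeholder id
other.mark_completed()  # marks the task completed externally;
                        # the process running that task will be killed
```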
Yes, that makes sense. Then you would need to use either the AWS vault features, or the ClearML vault features ...
Hi ConvolutedSealion94
Yes 🙂
Task.set_random_seed(my_seed=123)  # disable setting random number generators by passing None
task = Task.init(...)
because fastai's tensorboard doesn't work in multi gpu
Keep me posted when this is solved, so we can also update the fastai2 interface.
It's the safest way to run multiple processes and make sure they are cleaned afterwards ...
Guys FYI:
params = task.get_parameters_as_dict()
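For context, a small sketch of what that call returns (names and values are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params-demo")
task.connect({"lr": 0.001, "batch_size": 32})

# Returns the parameters as a nested dict keyed by section,
# e.g. {"General": {"lr": "0.001", "batch_size": "32"}}
params = task.get_parameters_as_dict()
print(params)
```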
Usually in the /tmp folder under a temp filename (it is generated automatically when spun up)
In case of the services, this will be inside the docker itself
to avoid downgrade to clearml==1.9.1
I will make sure this is solved in clearml==1.9.3 & clearml-session==0.5.0 quickly
Hmm I assume it is not running from the code directory...
(I'm still amazed it worked the first time)
Are you actually using "." ?
Yes, actually ensuring pip is there cannot be skipped (I think in the past it caused too many issues, hence the version limit etc.)
Are you saying it takes a lot of time when running? How long is the actual process that the Task is running (just to normalize times here)
Not sure: they also have the feature store (data management), as mentioned, which is pretty MLOps-y.
Right, sorry, I was thinking about "Nuclio", my bad.
How would you compare those to ClearML?
At least based on the documentation and git state I would say this is very early stages. In terms of features they "tick all the boxes", but I'll be a bit skeptical on the ability to scale and support these features.
Taking a look at the screenshots from the docs, it also seem...
But what I get with `get_local_copy()` is the following path: ...
get_local_copy will return an immutable copy of the dataset; by definition this will not be the "source" storing the data.
(Also notice that the dataset itself is stored in zip files, and when you get the "local-copy" you get the extracted files)
Make sense ?
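For reference, a short sketch of the distinction (the dataset project/name are placeholders):

```python
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my-dataset")

# Immutable cached copy: the stored zip archives are extracted
# into a managed cache folder; do not modify these files in place
read_only_path = ds.get_local_copy()

# If you need to edit the files, request a mutable copy instead
writable_path = ds.get_mutable_local_copy(target_folder="./my_dataset_copy")
```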
This is odd... can you post the entire trigger code ?
also what's the clearml version?
JitteryCoyote63 so now everything works as expected ?
Hi WackyRabbit7 ,
Yes we had the same experience with kaggle competitions. We ended up having a flag that skipped the task init :(
Introducing offline mode is on the to do list, but to be honest it is there for a while. The thing is, since the Task object actually interacts with the backend, creating an offline mode means simulation of the backend response. I'm open to hacking suggestions though :)
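For what it's worth, that skip flag was essentially a pattern like this (the environment variable and project/task names here are made up for illustration):

```python
import os
from clearml import Task

# Skip Task.init entirely in environments that cannot reach the backend
# (e.g. a kaggle kernel), instead of simulating backend responses
if os.getenv("SKIP_CLEARML"):
    task = None
else:
    task = Task.init(project_name="examples", task_name="experiment")
```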
Does a pipeline step behave differently?
Are you disabling it in the pipeline step ?
(disabling it for the pipeline Task has no effect on the pipeline steps themselves)
Funny it's the extension "h5" , it is a different execution path inside keras...
Let me see what can be done 🙂
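To illustrate the two execution paths (a generic TF2-style keras example, not the user's code):

```python
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# The ".h5" extension routes through keras' legacy HDF5 saving code,
# while a plain directory path uses the TF SavedModel code path
model.save("model.h5")      # HDF5 execution path
model.save("saved_model")   # SavedModel execution path
```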
Yea the "-e ." seems to fit this problem the best.
🙂
It seems like whatever I add to `docker_bash_setup_script` is having no effect.
If this is running with the k8s glue, the console output of the `docker_bash_setup_script` is currently not logged into the Task (this bug will be solved in the next version), but the code is being executed. You can see the full logs with kubectl, or test with a simple export test:
docker_bash_setup_script = "export MY..."
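As a hedged sketch of such an export test (the names, docker image, and variable are placeholders; this assumes the `docker_bash_setup_script` argument of `Task.create`):

```python
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="bash-setup-test",
    docker="python:3.9",
    # Write a marker file so you can verify the script really ran,
    # even though its console output is not yet logged into the Task
    docker_bash_setup_script="export MY_TEST_VAR=42\necho $MY_TEST_VAR > /tmp/bash_setup_ran",
)
Task.enqueue(task, queue_name="default")
```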