So I think it makes more sense in this case to work with the former.
Totally !
This is a horrible setup: it means no authentication will pass, and it will literally break every JWT authentication scheme
Hi ClumsyElephant70
What's the clearml version you are using?
(The first error is a byproduct of a python process Event being created before the forkserver is created, some internal python issue. I thought it was solved; let me take a look at the code you attached.)
Are you inheriting from their docker file ?
Hi RobustRat47
My guess is it's something from converting the PyTorch code to TorchScript. I'm getting this error when trying the
I think you are correct see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
you have to convert the model to TorchScript for Triton to serve it
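Roughly what that conversion looks like (a minimal sketch, not the exact code from the linked example; the model variable and input shape are placeholders):

import torch

# assuming `model` is your trained torch.nn.Module
model.eval()
# trace (or torch.jit.script) the model so Triton can load it without the Python class
example_input = torch.rand(1, 1, 28, 28)  # placeholder shape, match your model's input
traced = torch.jit.trace(model, example_input)
traced.save("serving_model.pt")  # this TorchScript file is what Triton serves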
GrumpyPenguin23 could you help and point us to an overview/getting-started video?
While if I just download the right packages from the requirements.txt then I don't need to think about that
I see your point, the only question is how come these packages are not automatically detected?
Hi CurvedHedgehog15
I would like to optimize hparams saved in Configuration objects.
Yes, this is a tough one.
Basically the easiest way to optimize is with hyperparameter sections as they are basically key/value you can control from the outside (see the HPO process)
Configuration objects are, well, blobs of data that "someone" can parse. There is no real restriction on them, since there are many standards to store them (yaml, json, ini, dot notation, etc.)
The quickest way is to add...
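For reference, a minimal sketch of the hyperparameter-section approach (project/task/parameter names are just placeholders): move the values you want to tune into a key/value dict connected to the Task, and keep the rest as a configuration object:

from clearml import Task

task = Task.init(project_name="examples", task_name="hpo-ready task")

# key/value hyperparameters - these land in the Task's hyperparameter section
# and can be overridden from the outside (e.g. by the HPO process)
params = {"lr": 0.001, "batch_size": 64}
params = task.connect(params)

# everything else can stay a configuration object (an opaque blob)
# config = task.connect_configuration("config.yaml")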
Hi ArrogantBlackbird16
but it returns a task handle even after the Task has been closed.
It should not ... That is a good point!
Let's fix that 🙂
Ohh sorry you will also need to fix the
def _patched_task_function
The parameter order is important as the partial call relies on it.
My bad, no need for that 🙂
Hi VivaciousBadger56
Basically you can think of MLRun as "amazon lambda service without amazon". It is designed to run a "function" in scale on multiple nodes.
ClearML on the other hand is an MLOps platform. It does the experiment tracking, it orchestrates Tasks (think jobs), it does data management, and lastly we recently released the serving. These are two different use cases.
Am I making sense here?
What I'm trying to do is to filter between two datetimes... Is that possible?
could you expand ?
If you need to change the values: config_obj.set(...)
You might want to edit the object on a copy, not the original 🙂
You might be able to write a script to override the links ... wdyt?
IdealPanda97 Hmm I see...
Well, unfortunately, Trains is all about free access to all 🙂
That said, the Enterprise edition does add permissions and data management on top of Trains. You can get in touch through https://allegro.ai/enterprise/#contact , and I'm sure someone will get back to you soon.
I understand that it uses time in seconds when there is no report being logged... but it has already logged three times...
Hmm could it be the reporting started 3 min after the Task started ?
ReassuredTiger98 no, but I might be missing something.
How do you mean project-specific?
Okay, we got to the bottom of this. It was actually because of the load balancer timeout settings we had, which were also 30 seconds and were confusing us.
Nice!
btw:
in the clearml.conf we put this:
for future reference, you are missing the sdk section:
sdk.http.timeout: 300
the . notation works as well as the {} nested form
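i.e. in clearml.conf, something along these lines (same timeout value as above, two equivalent ways to write it):

# dot notation
sdk.http.timeout: 300

# or the equivalent nested form
sdk {
    http {
        timeout: 300
    }
}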
with PipelineController, is there any way to avoid creating a new development environment for each step of the pipeline?
You are in luck, we are expanding the PipelineController to support functions, basically allowing you to run the step on the node running the entire pipeline, but I'm not sure this covers all angles of the problem.
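Roughly what the function-step style looks like (a sketch only, assuming a clearml version that already ships add_function_step; the names and paths are placeholders):

from clearml import PipelineController

def preprocess(data_path):
    # runs as a pipeline step, no separate repo/environment definition per step
    return data_path

pipe = PipelineController(name="example pipeline", project="examples", version="1.0")
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    function_kwargs=dict(data_path="/tmp/data"),  # placeholder path
    function_return=["data_path"],
)
# run the steps on the same node that runs the pipeline logic
pipe.start_locally(run_pipeline_steps_locally=True)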
My main question here is: who/how is the initial setup created by clearml-agent?
I would like to be more efficient and re-use that ...
Thank you for saying! 🙂
Hi ProudMosquito87 trains-agent will automatically clone your code into the docker, no need to worry about it 🙂 Just make sure you configure the https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L16 , or that the trains-agent machine contains the git ssh keys in the home folder of the user executing the trains-agent.
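For reference, the section that link points to looks roughly like this (a sketch; the values are placeholders, leave them empty if you rely on ssh keys instead):

agent {
    # either set git credentials here ...
    git_user: ""
    git_pass: ""
    # ... or leave them empty and make sure the ssh keys are in ~/.ssh
    # of the user running trains-agent
}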
Can you test with the hydra example? if the example works, any chance you can send a toy to reproduce it ?
https://github.com/allegroai/clearml/tree/master/examples/frameworks/hydra
DistressedGoat23 check this example:
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
aSearchStrategy = RandomSearch
It will collect everything on the main Task
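In short, the pattern looks something like this (a trimmed-down sketch along the lines of the linked example; the base task id, queue, metric and parameter names are placeholders):

from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange, RandomSearch

task = Task.init(project_name="examples", task_name="HPO controller",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # the experiment whose hyperparameters get mutated
    hyper_parameters=[UniformParameterRange("General/lr", min_value=0.0001, max_value=0.1)],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()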
This is a crucial point for using clearml HPO, since comparing dozens of experiments in the UI and searching for the best is just not manageable.
You can of course do that (notice you can actually order them by scalars they report, and even do ...
remote_execute kills the thread so the multirun stops at the first sub-task.
Hmm
task = Task.init(...)
# config some stuff
task.execute_remotely(queue_name_here, exit_process=False)
# locally this enqueues the task but keeps the process alive;
# when running on the remote agent the call above is skipped and execution continues
if Task.running_locally():
    return  # bail out of the (hydra) function locally, the agent will run the rest
Hmm, this means the step should have included the git repo itself, which means the code should have been able to import the .py
Can you see the link to the git repository on the Pipeline step Task ?