JitteryCoyote63
I agree that its name is not search-engine friendly, LOL.
It was an internal joke; the guys decided to call it "trains" because, you know, it trains...
It was unstoppable, we should probably do a line of merch with AI.
Anyhow, this one definitely backfired...
UnevenDolphin73 sounds great, any chance you can open a GitHub issue on the clearml-agent repo for this feature request?
Is it not possible to serve a model with a preprocessing pipeline from scikit-learn using clearml-serving?
Of course it is. Did you first try the example here: None
If you need to run your own LogisticRegression call, you can use this example:
None
Notice this is where the custom endpoint actually calls the prediction: [None](https...
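On the scikit-learn side, a minimal sketch (assuming scikit-learn is installed; the dataset and step names here are illustrative) of a Pipeline that bundles the preprocessing together with the LogisticRegression, so the whole thing can be pickled and served as a single model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Bundle scaling + classifier into one estimator; serializing this
# pipeline captures the preprocessing along with the model weights.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=200)),
])

X, y = load_iris(return_X_y=True)
pipeline.fit(X, y)

# predict() applies the scaler automatically before the classifier.
print(pipeline.predict(X[:3]))
```

Because preprocessing lives inside the estimator, the serving endpoint only ever needs to call `predict` on the loaded object.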
NastyOtter17 can you provide some more info?
Well, that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init, and when you are done call Task.close. This can be done in parallel if you are running them from separate processes.
If you want to automate the process, you can start using the trains-agent, which could help you spin those experiments on as many machines as you l...
okay but still I want to take only a row of each artifact
What do you mean?
How do I get from the node to the task object?
pipeline_task = Task.get_task(task_id=Task.current_task().parent)
SmallDeer34 in theory there is no reason it will not work with it.
If you are doing a single node (from Ray's perspective), this should just work. The challenge might be multi-node ray+clearml, as you will have to use clearml to set the environment and ray as the messaging layer (think openmpi etc.)
What did you have in mind?
save off the "best" model instead of the last
It should be relatively easy to update the main Task with the best-performing model, no?
I think it fails because it tries to install trains twice. Could you remove the trains package and test? I'm also curious how you have both installed?!
Thanks @doru! BTW, if you are running code from outside the trains repo, do you still get the double package?
The versions don't need to match, any combination will work.
Yes, I mean trains-agent. Actually I am using 0.15.2rc0. But I am using local files: I clone the trains and trains-agent repos and install them. Their versions are 0.15.2rc0.
I see, that's why we get the git ref, not package version.
SarcasticSparrow10 sure, see execute_remotely, it does exactly that:
https://allegro.ai/docs/task.html#trains.task.Task.execute_remotely
It will stop the current process (after syncing everything) and launch itself remotely (i.e. enqueue itself).
When the same code is run by the trains-agent, the execute_remotely call becomes a no-op and is basically skipped.
TenseOstrich47 / PleasantGiraffe85
The next version (I think releasing today) will already contain scheduling, and the one after that (probably an RC right after) will include triggering. Currently the UI wizard for both (i.e. creating the triggers) is only available in the community hosted service. That said, I think that creating it from code (triggers/schedule) actually makes a lot of sense,
pipeline presented in a clear UI,
This is actually being actively worked on, I think Anxious...
So I'm guessing the cli will be in the folder of the python executable:
import sys
from pathlib2 import Path
(Path(sys.executable).parent / 'cli-util-here').as_posix()
Hi JitteryCoyote63
Just making sure, the package itself is installed as part of the "Installed packages", and it also installs a command line utility?
Yes, found the issue :) I'll see to it there is a fix in the next RC. ETA early next week
I have an idea, can you try with: task = Task.init(..., reuse_last_task_id=False)
I have a suspicion it starts the Tasks in parallel, and the "reuse_last_task_id" causes them to "reuse the same task locally" which makes them overwrite the configuration of one another.
Okay let me check if I can test on this git version.
Can you see it on the console?
JitteryCoyote63
somehow the previous iterations, not sure yet if it's coming from my code, ignite or clearml
ClearML will automatically continue reporting from the previous iteration (i.e. if before continuing the Task the last iteration was 100, then the next report with iteration=0 will actually be 101)
task.set_initial_iteration(engine.state.iteration)
Basically it is called automatically by ClearML (obviously only when you continue an aborted Task)
But how do you specify the data, hyperparameters, and input/output models to use when the agent runs the experiment?
They are autodetected if you are using Argparse / Hydra / python-fire / etc.
The first time you are running the code (either locally or with an agent), it will add the hyper parameter section for you.
That said, you can also provide it as part of the clearml-task command with --args
(btw: clearml-task --help will list all the options, https://clear.ml/docs/...
Thanks @<1569496075083976704:profile|SweetShells3> for bumping it!
Let me check where it stands, I think I remember a fix...
so you have a repo with poetry that some users update and some do not?
All working on the same branch ?
Hi RobustRat47
My guess is it's something from converting the PyTorch code to TorchScript. I'm getting this error when trying the
I think you are correct see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
you have to convert the model to TorchScript for Triton to serve it
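For reference, a minimal sketch of that conversion (assuming PyTorch is installed; the toy model and output path are illustrative, standing in for the trained MNIST network):

```python
import torch
import torch.nn as nn

# Toy model standing in for the trained network.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

# Trace with a dummy input to produce a TorchScript module.
example = torch.randn(1, 1, 28, 28)
scripted = torch.jit.trace(model, example)

# Triton's PyTorch backend loads a serialized TorchScript file.
scripted.save("model.pt")
```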
Notice that we are using the same version:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/clearml_serving/engines/triton/Dockerfile#L2
The reason was that the previous version did not support TorchScript (similar to the error you reported).
My question is, why don't you use the "allegroai/clearml-serving-triton:latest" container ?
RobustRat47 what's the Triton container you are using?
BTW, the Triton error is: model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.
https://github.com/triton-inference-server/server/issues/3877
it certainly does not use the tensorboard python lib
Hmm, yes, I assume this is why the automagic is not working
Does it have a pythonic interface for the metrics?