
however, this will also turn off metrics
For the sake of future readers, let me clarify this one: turning it off with auto_connect_frameworks={'pytorch': False}
only affects the auto-logging of torch.save/load.
(Side note: the reason is that PyTorch has no built-in metric reporting, i.e. it is usually done manually, and these days most probably with TensorBoard; for example, Lightning / Ignite use TensorBoard as the default metric reporting.)
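For context, a minimal sketch of what that call looks like (the project/task names here are made up):
` from clearml import Task

# Disable only the PyTorch hooks (torch.save/load capture);
# metrics reported through TensorBoard are still logged automatically.
task = Task.init(
    project_name="examples",
    task_name="torch-autolog-off",
    auto_connect_frameworks={'pytorch': False},
) `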
So, like a UI for creating pipelines that do different things on the different solutions?
I guess, or pipelines that you can compose after running experiments, to see that the experiments are connected to each other.
Hmm, what do you mean by "compose after running experiments"? Like a way to group them? What is the relation between one "item" and another?
If this is a sequence of Tasks, are they executed by a controller?
I see, something like:
` from mystandalone import my_func_that_also_calls_task_init

def task_factory():
    task = Task.create(project="my_project", name="my_experiment",
                       script="main_script.py", add_task_init_call=False)
    return task `
If the pipeline and my_func_that_also_calls_task_init
are in the same repo, this should actually work.
You can quickly test this pipeline with:
` pipe = PipelineController(...)
pipe.add_step(preprocess, ...)
pipe.add_step(base_task_facto... `
it fails but with COMPLETED status
Which Task is marked "completed", the pipeline Task or the Step?
WickedGoat98, are you running the agent with --gpus?
Hi WickedGoat98
"Failed uploading to //:8081/files_server:"
That seems like the problem: the host part of the URL is empty. What do you have defined as files_server in trains.conf?
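For reference, a sketch of the relevant api section in trains.conf (the host/ports here assume a default local server deployment):
` api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
} `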
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
I think I understand what's going on: in order for the pipeline logic to be "aware" of the pipeline component, the component needs to be declared in the pipeline logic script file (or scope, if you will).
Try adding from src.testagentcomponent import step_one
at the global scope of the pipeline script as well (not just inside the function); see the sketch below.
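A minimal sketch of what I mean (the pipeline name/project and step arguments are hypothetical):
` # pipeline logic script
from clearml import PipelineController
from src.testagentcomponent import step_one  # module-level import, not inside a function

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
pipe.add_function_step(name="step_one", function=step_one) `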
At runtime: every time add_step runs, it needs to create a new Task to be enqueued.
It fails during the add_step stage for the very first step, because task_overrides contains invalid keys.
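For reference, task_overrides keys must be valid field paths on the Task object; a sketch of a valid call (the project/task names are made up):
` pipe.add_step(
    name="step1",
    base_task_project="examples",
    base_task_name="base task",
    task_overrides={"script.branch": "main"},  # key must match a field path on the Task
) `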
I see, yes I guess it makes sense to mark the pipeline as Failed 🙂
Could you add a GitHub issue on this behavior, so we do not miss it?
@<1569496075083976704:profile|SweetShells3> remove these from your pbtxt:
` name: "conformer_encoder"
platform: "onnxruntime_onnx"
default_model_filename: "model.bin" `
Second, what do you have in your preprocess_encoder.py?
And where are you getting the error? (Is it from the Triton container, or from the REST request?)
Hi CooperativeFox72
Sure 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
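In context, a minimal sketch (the project/task names are made up):
` from clearml import Task

task = Task.init(project_name="examples", task_name="resource monitor timeout")
# report resource-monitor samples against seconds-from-start for the first 30 minutes
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800) `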
I see... In the Triton pod, when you run it, it should print the combined pbtxt. Can you print both the before/after ones, so that we can compare?
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility 🙂
Please do 🙂
could you send the entire log here?
i.e. from the "docker-compose" command line onward
Yes, you have to spin up the server in order to generate the access/secret key...
Hi ZippyAlligator65
You mean like env vars?
I think you are correct, and the first time you spin up the server it is not possible (I mean, you need the server up to get the access/secret key, and only then can you insert them into the helm values)... 🙂
Bummer... that seems like a bit of an oversight tbh.
There is no real solution for that, unless the helm chart "knows" something about the server before spinning it up the first time, which basically means a predefined access key; I do not think we want that 🙂
So essentially, the server helm chart would create a randomly generated secret pair and deploy it as a shared k8s secret that pods can access.
This is the tricky part: for the helm chart to be able to create it, it would have to log in to the server, which means there is a secret embedded in the helm chart that lets you access the default server. You see my point?
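For illustration, the manual equivalent today is creating the shared secret yourself once you have the keys (the secret and key names here are hypothetical):
` kubectl create secret generic clearml-credentials \
  --from-literal=access_key=<your-access-key> \
  --from-literal=secret_key=<your-secret-key> `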
Here you go:
` from clearml import Task, PipelineDecorator

@PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def pipeline(...):
    return

if __name__ == '__main__':
    Task.force_requirements_env_freeze(requirements_file="./requirements.txt")
    pipeline(...) `
If you need anything for the pipeline component you can do:
` @PipelineDecorator.component(packages="./requirements.txt")
def step(data):
    # some stuff `
PompousBeetle71 cool, next RC will have the argparse exclusion feature :)
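For reference, in current SDK versions the exclusion ended up looking like this (a sketch; the argument name is made up):
` from clearml import Task

# exclude a specific argparse argument from auto-logging
task = Task.init(
    project_name="examples",
    task_name="argparse exclusion",
    auto_connect_arg_parser={"verbose": False},
) `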
Hi TrickyFox41
is there a way to cache the docker containers used by the agents
You mean for the apt-get install part, or the venv?
(The apt packages themselves are cached on the host machine.)
for the venv I would recommend turning on the cache here:
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L131
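A sketch of the relevant agent section in clearml.conf (the values mirror the defaults shown in the linked example; adjust as needed):
` agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
} `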
I am running clearml-agent in docker mode btw.
Try -e PYTHONOPTIMIZE=1
in the docker args section, it should do the same 🙂
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONOPTIMIZE
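A sketch of setting it from code instead (the image name is just an example; the -e flag is passed through to docker run):
` from clearml import Task

task = Task.init(project_name="examples", task_name="optimized run")
task.set_base_docker(docker_image="python:3.9", docker_arguments="-e PYTHONOPTIMIZE=1") `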
Understood, then I would use Task.execute_remotely()
Basically:
` task = Task.init(...)
# config some stuff
task.execute_remotely(queue_name_here)
# everything from this line on will be executed on the remote machine only `
This will automatically log your code/repo via Task.init, and the call to task.execute_remotely() will stop the local process (on your machine that runs the hydra sweep) and continue on the remote machine.
This will allow you to both use the Hydra sweep and schedule/run on remote machines...
Hi DrabCockroach54
... and no logs for python script.
What do you mean by "no logs", is it the clearml logs or the k8s pod logs?
nice @<1724960458047229952:profile|EnergeticKoala33> !
The issue was that the agent was trying to start the docker container but had no credentials to do so; your solution is exactly what was needed.