HelplessCrocodile8 I just tried it, and everything seems to work (Ubuntu 20.04).
What OS are you using? Which Python version? Is it conda?
GrotesqueDog77 this should just work: decorate the functions with @PipelineDecorator.component and call them one after the other:
    paths = step_one()
    step_two(paths)
ClearML will serialize the strings and pass them to step two (of course, step two must actually run on a machine with access to the same folder, but that is a separate issue).
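To make the data flow concrete, here is a stdlib-only sketch of what "serialize the return value and pass it to the next step" amounts to. It does not use ClearML at all: step_one, step_two and run_step_boundary are hypothetical placeholders, and pickle plus a temporary file stand in for whatever storage ClearML actually uses (an assumption). Note that only the list of path strings crosses the step boundary, not the files themselves, which is why step two needs access to the same folder.

```python
import pickle
import tempfile
from pathlib import Path


def step_one(folder: Path) -> list[str]:
    """Hypothetical first step: writes two files and returns their paths as strings."""
    paths = []
    for name in ("a.txt", "b.txt"):
        p = folder / name
        p.write_text(f"content of {name}")
        paths.append(str(p))
    return paths


def step_two(paths: list[str]) -> int:
    """Hypothetical second step: reads the files behind the paths it received."""
    # Only works if the files are reachable from wherever step two runs.
    return sum(len(Path(p).read_text()) for p in paths)


def run_step_boundary(value, store: Path):
    """Stand-in for the pipeline: serialize a step's output, then deserialize it for the next step."""
    with open(store, "wb") as f:
        pickle.dump(value, f)
    with open(store, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    paths = step_one(folder)
    received = run_step_boundary(paths, folder / "step_one_output.pkl")
    total_chars = step_two(received)
```

If step two ran on a machine without that folder, the strings would still arrive intact, but Path(p).read_text() would fail.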
Thanks GiganticTurtle0!
I will try to reproduce it with the example you provided. Regardless, I already looked at the code, and I'm pretty sure I know what the issue is. We will be pushing a few fixes after the weekend, and I'm hoping this one will be included as well.
Yes, though the main caveat is that the data is not really immutable.
Ok, but when nvcc is not available, the agent uses the output from nvidia-smi, right? On one of my machines, nvcc is not installed, and in the experiment logs of the agent running there, the value of agent.cuda is the version shown by nvidia-smi.
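The fallback described above can be sketched as follows. This is not the agent's code: the function names and regular expressions are assumptions, and the sample strings only mimic the usual shape of `nvcc --version` output ("release 11.2") and the nvidia-smi header ("CUDA Version: 11.4"). Keep in mind that the version in the nvidia-smi header is the highest CUDA version the installed driver supports, not necessarily an installed toolkit, which is why the two sources can disagree.

```python
import re
from typing import Optional

# Assumed output shapes: nvcc prints "release X.Y", the nvidia-smi header prints "CUDA Version: X.Y".
NVCC_RE = re.compile(r"release (\d+)\.(\d+)")
SMI_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def parse_version(text: Optional[str], pattern: re.Pattern) -> Optional[str]:
    """Return 'major.minor' if the pattern matches the tool output, else None."""
    if not text:
        return None
    match = pattern.search(text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def detect_cuda_version(nvcc_output: Optional[str], smi_output: Optional[str]) -> Optional[str]:
    """Prefer nvcc (toolkit version); fall back to nvidia-smi (driver's max supported CUDA)."""
    return parse_version(nvcc_output, NVCC_RE) or parse_version(smi_output, SMI_RE)


nvcc_sample = "Cuda compilation tools, release 11.2, V11.2.152"
smi_sample = "| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"

print(detect_cuda_version(nvcc_sample, smi_sample))  # 11.2 (nvcc wins)
print(detect_cuda_version(None, smi_sample))         # 11.4 (no nvcc, falls back to nvidia-smi)
```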
Already added to the next agent version.
JitteryCoyote63 agent.cuda_version (or the CUDA_VERSION env variable) tells the agent which PyTorch wheel to download. The cuDNN library can be bundled inside any wheel, and it will work as long as CUDA / cudart exist on the system; for example, PyTorch wheels include the cuDNN they use. agent.cudnn_version should actually be deprecated; it is not used.
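A minimal sketch of turning a CUDA version into a wheel choice, under stated assumptions: resolve_cuda_version and torch_wheel_tag are hypothetical helpers, and the precedence (CUDA_VERSION env variable over the configured agent.cuda_version) is an assumption, not documented agent behaviour. The "cu113" tag style is the naming PyTorch uses for its CUDA-specific wheel indexes. Consistent with the note above, cuDNN plays no part in the decision.

```python
import os
from typing import Mapping, Optional


def resolve_cuda_version(configured: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the CUDA version used to choose a wheel.

    Assumption: a non-empty CUDA_VERSION env variable overrides the configured agent.cuda_version.
    """
    return env.get("CUDA_VERSION") or configured


def torch_wheel_tag(cuda_version: Optional[str]) -> str:
    """Map '11.3' (or '11.3.1') to 'cu113'; no CUDA version means a CPU-only wheel ('cpu')."""
    if not cuda_version:
        return "cpu"
    parts = cuda_version.split(".")
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else "0"
    return f"cu{major}{minor}"


print(torch_wheel_tag(resolve_cuda_version("11.3", env={})))                       # cu113
print(torch_wheel_tag(resolve_cuda_version("11.3", env={"CUDA_VERSION": "10.2"})))  # cu102
```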
For future reference, the dependency order:
1. Nvidia drivers
2. CUDA library and CUDA runtime libraries (libcuda.so / libcudart.so)
3. CUDN...
agent.cuda_driver_version = ...
agent.cuda_runtime_version = ...
Interesting idea! (I assume for reporting only, not configuration)
... The agent mentioned used output from nvcc (2) ...
The dependencies I shared are not how the agent works, but how Nvidia CUDA works.
Regarding the CUDA check with nvcc, I'm not saying this is a perfect solution; I just mentioned that this is how it is currently done.
I'm actually not sure if there is an easy way to get it from nvid...
There is some overhead, but it should be negligible.
FranticCormorant35 DeterminedCrab71 please continue the discussion in this thread
It just seems frozen at the place where it should be spinning up the tasks within the pipeline
And is there an agent for those? Usually there is one agent running logic Tasks (like pipelines) with --services-mode, which means multiple Tasks can be executed by the same agent, and other agents for compute Tasks, with a single Task per agent (but you can run multiple agents on the same machine).
Hi SteadySeagull18
However, it seems to be entirely hanging here in the "Running" state.
Did you set up an agent to listen to the "services" queue?
Someone needs to run the pipeline logic itself; it is sometimes part of the clearml-server deployment, but not a must.
sets up the venv correctly, prints
Starting Task Execution:
then does nothing
Can you provide a log?
Do you see the code/git reference in the Pipeline Task details, Execution tab?
oh dear ...
ScrawnyLion96 let me check with the front-end guys.
Is there a way to filter experiments in a hyperparameter sweep based on a given range of a parameter/metric in the UI?
Are you referring to the HPO example? or the Task comparison ?
(once you verify that the PR fixes it, I'll make sure it is merged)
Correct.
You can spin it up in two modes, either venv or docker. Notice that even in docker mode, it will still clone the code into the container and install the packages inside it, but it also inherits the system packages preinstalled in the docker image, so the installation process is a lot faster, and you can still change packages without having to build an entirely new docker image.
I just assumed it should only be triggered by dataset-related things, but after a lot of experimenting I realized it's also triggered by tasks...
VexedCat68 I think you are correct, and it should only be triggered by "Dataset" Tasks. That said, maybe there is a bug, in which case, if there are no additional filters, it will be triggered on any change in the project. That would explain why adding the tags filter solved the issue.
wdyt?
to set up the ClearML agent in Kubernetes with the SSH keys?
You can add the environment variable: CLEARML_AGENT__AGENT__FORCE_GIT_SSH_PROTOCOL="true"
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#dynamic-environment-variables
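As I read the linked docs, these dynamic environment variables follow a naming scheme: a prefix (CLEARML_AGENT for the agent), then the configuration path with each level upper-cased and separated by a double underscore. So CLEARML_AGENT__AGENT__FORCE_GIT_SSH_PROTOCOL should correspond to agent.force_git_ssh_protocol in clearml.conf. The two helpers below are a hypothetical illustration of that mapping (handy when generating a Kubernetes pod env list), not part of ClearML.

```python
def config_key_to_env_var(key: str, prefix: str = "CLEARML_AGENT") -> str:
    """Hypothetical helper: 'agent.force_git_ssh_protocol' -> 'CLEARML_AGENT__AGENT__FORCE_GIT_SSH_PROTOCOL'."""
    # Single underscores inside a key survive; only the '.' separators become '__'.
    parts = [prefix] + [part.upper() for part in key.split(".")]
    return "__".join(parts)


def env_var_to_config_key(name: str, prefix: str = "CLEARML_AGENT") -> str:
    """Inverse mapping: strip the prefix, split on '__', and lower-case each level."""
    head = prefix + "__"
    if not name.startswith(head):
        raise ValueError(f"{name!r} does not start with {head!r}")
    return ".".join(part.lower() for part in name[len(head):].split("__"))


# Example: the env entry for the setting discussed above.
pod_env = {config_key_to_env_var("agent.force_git_ssh_protocol"): "true"}
print(pod_env)
```

The double underscore is what keeps the mapping unambiguous, since config keys themselves already contain single underscores.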
Hi ColossalAnt7, I think we ran into it on a few dockers; I believe the bug was fixed in the latest trains-agent RC. Could you please verify?
Hi ScaryKoala63
Which versions are you using (clearml / lightning)?
BTW: what's the use case? Why do you need to open two Tasks in the same code/script?
JitteryCoyote63
I agree that its name is not search-engine friendly,
LOL
It was an internal joke: the guys decided to call it "trains" because, you know, it trains...
It was unstoppable; we should probably do a line of merch with AI
Anyhow, this one definitely backfired...
PompousParrot44
You can always manually store/load models, for example: https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/examples/reporting/model_config.py#L35
Sure, you can patch any framework with something similar to what we do in xgboost; any such PR will be greatly appreciated! https://github.com/allegroai/trains/blob/master/trains/binding/frameworks/xgboost_bind.py
Hi CooperativeFox72, trains 0.16 is out; did it solve this issue? (BTW: you can upgrade trains to 0.16 without upgrading trains-server.)
ModelCheckpoint('best_model', save_best_only=True)
That worked for me now, what's the diff?
Hi WorriedParrot51
Let me shed some light on this mechanism, because it is not very straightforward.
Basically, the agent signals the trains package that it should ignore the calls in the code and use a specific Task in the backend instead (i.e. in manual mode, the trains package logs the data into the trains-server; in agent (remote) mode, it does the opposite and takes the data from the trains-server "into" the code).
Specifically, just like in manual mode, calling argparse.parse is be...
Hi WorriedParrot51
Assume you run the code "manually" once (i.e. without the agent). When you call Task.init, it will register the argparser.
When running with the agent, the first time you call parse, it will automatically override the argparse defaults with the values stored in the Task.
Makes sense?
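The override behaviour described above can be illustrated with plain argparse. This is a toy model, not the trains code: parse_with_task_overrides is a hypothetical helper, and stored_task_params stands in for the values the agent would pull from the Task in the backend. It relies on argparse's `set_defaults`, plus the fact that argparse runs string defaults through the argument's `type`, so values stored as strings still come out as float/int.

```python
import argparse
from typing import Mapping, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.1)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


def parse_with_task_overrides(
    parser: argparse.ArgumentParser,
    argv: Sequence[str],
    stored_task_params: Optional[Mapping[str, object]] = None,
) -> argparse.Namespace:
    """Manual mode: no stored params, so the defaults written in the code are used.
    Agent (remote) mode: the stored values replace the defaults before parsing."""
    if stored_task_params:
        parser.set_defaults(**stored_task_params)
    return parser.parse_args(argv)


manual = parse_with_task_overrides(build_parser(), [])
remote = parse_with_task_overrides(build_parser(), [], {"lr": "0.01", "epochs": "3"})
print(manual)  # Namespace(lr=0.1, epochs=10)
print(remote)  # Namespace(lr=0.01, epochs=3)
```

A fresh parser is built for each call because `set_defaults` mutates the parser in place.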
I am getting None for Task.current_task() at the beginning of my script.
Task.init() is doing the magic; only after this call will you have current_task (either running manua...
Could you remove it and test?