Eureka! Oh, found it: temp.linux-aarch64-cpython-39
this is Arm?!
Hi SquareFish25
Sure, here are a few:
HPO
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
Pipeline
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py
Automation:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
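For reference, a minimal HPO sketch along the lines of that example, using the current clearml package names (the task IDs, parameter names, and metric names are placeholders, and exact arguments may differ slightly by version):
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange
from clearml.automation.optuna import OptimizerOptuna

# the HPO controller is itself a Task
task = Task.init(project_name='examples', task_name='HPO controller', task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id='<template task id>',  # the Task to clone and mutate
    hyper_parameters=[
        UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange('General/batch_size', values=[16, 32, 64]),
    ],
    objective_metric_title='validation',
    objective_metric_series='accuracy',
    objective_metric_sign='max',
    optimizer_class=OptimizerOptuna,
    execution_queue='default',
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()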
the task is being Aborted rather than being in Draft. Am I missing something?
Yes, the reason is so you do not lose anything that might have already been reported on it.
And usually execute_remotely will get the execution queue as a parameter (i.e. immediately launching the Task).
You can now (starting with v1.0) enqueue an aborted Task, so it should not make a difference; you can also reset the Task and edit it in the UI
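For example, a minimal sketch (the queue name is just a placeholder):
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')
# everything up to this call runs locally; execute_remotely stops the local run
# and enqueues the Task so a clearml-agent picks it up from the given queue
task.execute_remotely(queue_name='default', exit_process=True)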
Oh, you can achieve exactly the same with plotly and the REST API / Python interface.
Basically pull data from tasks, create a visualization, and log it on one of the Tasks or on a new one
Or can it also be right after
Task.init()
?
That would work as well 🙂
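A rough sketch of the plotly + Python interface approach mentioned above (the task ID and metric title are placeholders):
import plotly.graph_objects as go
from clearml import Task

source = Task.get_task(task_id='<source task id>')
scalars = source.get_reported_scalars()  # {title: {series: {'x': [...], 'y': [...]}}}

fig = go.Figure()
for series, points in scalars.get('loss', {}).items():
    fig.add_trace(go.Scatter(x=points['x'], y=points['y'], name=series))

# report the figure on a new Task (or reuse an existing one)
report_task = Task.init(project_name='examples', task_name='comparison report')
report_task.get_logger().report_plotly(title='loss comparison', series='tasks', iteration=0, figure=fig)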
And another question: is clearml-serving ready for serious use?
Define serious use? KFserving support is in the pipeline, if that helps.
Notice that clearml-serving is basically a control plane for the serving engine; not to neglect its importance, but the heavy lifting is done by Triton 🙂 (or any other backend we will integrate with, maybe Seldon)
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
So sorry I missed this thread 🙏
Basically your issue is the load balancer blocking the POST method. You can change that by adding the following line to any clearml.conf:
api.http.default_method: "put"
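The same setting in nested HOCON form, if you prefer (just a formatting alternative):
api {
    http {
        default_method: "put"
    }
}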
Ohh, sure, then editing the git config will solve it.
btw: why would you need to do that? The agent knows how to do this conversion on the fly
Hi @<1569496075083976704:profile|SweetShells3>
Try to do:
import torch.distributed as dist
from clearml import Task

# only the rank-0 ("master") process creates the Task
if dist.get_rank() == 0:
    task = Task.init(...)
This will make sure only the "master" process is logged
or
import os
if int(os.environ.get('RANK', '0')) == 0:
    task = Task.init(...)
IntriguedRat44 If the monitoring only shows a single GPU (the selected one), it means it reads the correct CUDA_VISIBLE_DEVICES (this is how it knows that you are only using a selected GPU, not all of them).
There is nothing else in the code that will change the OS environment.
Could you print os.environ['CUDA_VISIBLE_DEVICES'] while running the code to verify ?
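For example, somewhere inside the training script:
import os
print('CUDA_VISIBLE_DEVICES =', os.environ.get('CUDA_VISIBLE_DEVICES'))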
Ohh I see, so the force SSH did not replace the user in the SSH link (it only does that if the original was http), right ?
Suppose that a new model version 2 is trained, but it does not fulfill our target metrics. Is it possible to just save the model to the model repo and not serve it, if model version 1 is already being served?
Sure, just do not "publish" the model; it will be stored in the model repository, fully accessible, but clearml-serving will not serve it 🙂
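A minimal sketch of that flow, assuming you register the weights yourself with OutputModel (names, the weights file, and the metric check are placeholders):
from clearml import Task, OutputModel

task = Task.init(project_name='examples', task_name='train model v2')
model = OutputModel(task=task, framework='pytorch')
model.update_weights(weights_filename='model_v2.pt')  # stored in the model repository

new_version_meets_target_metrics = False  # hypothetical check against your target metrics
if new_version_meets_target_metrics:
    model.publish()  # only then will clearml-serving pick it up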
I think the non-master processes are trying to log something, but they have no Logger instance because they have no Task instance.
Hmm is your code calling Logger.current_logger()
directly ?
Do the logs in the master process include all training history, or do I need to concatenate logs from different nodes somehow?
So the main problem is that you need to pass the TASK ID that the master node creates to the second node, so it can report to the same Task.
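A rough sketch of that pattern (how you pass the ID between nodes, e.g. an env var or a dist broadcast, is up to you):
# on the master node (rank 0)
from clearml import Task
task = Task.init(project_name='examples', task_name='multi-node training')
shared_task_id = task.id  # pass this to the other nodes

# on a worker node, attach to the same Task and report into it
worker_task = Task.get_task(task_id=shared_task_id)
worker_task.get_logger().report_scalar(title='loss', series='rank-1', value=0.42, iteration=10)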
I know that the enterprise version of ClearML support...
I still see things being installed when the experiment starts. Why does that happen?
This only means no new venv is created; it basically means installing into the "default" python env (usually whatever is preset inside the docker)
Make sense ?
Why would you skip the entire python env setup ? Did you turn on venvs cache ? (basically caching the entire venv, even if running inside a container)
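Enabling it is roughly this section in the agent's clearml.conf (setting the path is what actually turns it on; values here are just the defaults):
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum required free space for a new cache entry
        free_space_threshold_gb: 2.0
        # set the path to enable caching of the entire venv
        path: ~/.clearml/venvs-cache
    }
}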
EnviousPanda91 please feel free to PR if it works 🙂
https://github.com/allegroai/clearml/blob/86586fbf35d6bdfbf96b6ee3e0068eac3e6c0979/clearml/binding/frameworks/catboost_bind.py#L114
EnviousStarfish54 we just fixed an issue that relates to "installed packages" on Windows.
The RC is due to be released in the upcoming days, I'll keep you posted
I like this approach more but it still requires resolved environment variables inside the clearml.conf
Yes 😞 maybe this is a feature request ?
Thanks ContemplativePuppy11 !
How would you pass data/args from one step of the pipeline to another ?
Or are you saying the pipeline class itself stores all the components ?
the SDK is unable to see each of the nodes?
Exactly! I mean, I love the idea of a "nested" component, but implementation-wise this is not trivial, and it would also hurt the ability to cache individual components. The workaround is to have all the "business logic" in the pipeline function itself; routing data between components is basically "free". The data does not actually go through the pipeline logic, it only passes a reference (unless the pipeline logic actually tries to access the data o...
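A minimal sketch of that structure with the decorator-based pipeline (names and the placeholder work are made up for illustration):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['data'], cache=True)
def step_one(url):
    # runs as its own Task; placeholder work
    return url.upper()

@PipelineDecorator.component(return_values=['length'], cache=True)
def step_two(data):
    return len(data)

@PipelineDecorator.pipeline(name='example pipeline', project='examples', version='1.0')
def pipeline_logic(url):
    # all the "business logic" lives here; passing `data` between components
    # only moves a reference, the data itself is not pulled into the controller
    data = step_one(url)
    print(step_two(data))

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    pipeline_logic(url='hello')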
Ohh that's why you don't have it 🙂
NaughtyFish36
No module named 'leap.learn.data_tools.merge_data.merge_data'
This seems to be the error but I cannot see leap
in the installed packages. Notice that if the Task has an "Installed Packages" section then the agent will use that, not the "requirements.txt"; only if this section is empty will it revert to the "requirements.txt" in the repo.
How did you create the Task in the first place?
I see that you added "leap" into the initial bash script, actually you should add i...
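One way to force a package into the Task's "Installed Packages" from code, assuming leap is actually installable as a package, is a sketch like this (must be called before Task.init):
from clearml import Task

Task.add_requirements('leap')  # force-add the package (optionally with a version)
task = Task.init(project_name='examples', task_name='training')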
I'm really in favor of adding an interface, but I was not able to locate a simple integration option with basically anything. Wdyt ?
NaughtyFish36
what's the error you are getting?
Also did you try setting: force_git_ssh_protocol: true
?
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L39
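i.e. in the agent's clearml.conf, roughly:
agent {
    # convert http(s) git urls to ssh when cloning
    force_git_ssh_protocol: true
}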
FYI matplotlib imshow will create a debug image, and on complex plots the plot might get converted to an image (but shown under the plots section). All in all you might not be aware of it, but you are uploading images to your files server
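A tiny sketch of how that happens implicitly (assuming the automatic matplotlib binding is on, which is the default):
import numpy as np
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name='examples', task_name='matplotlib demo')
plt.imshow(np.random.rand(64, 64))
plt.title('random noise')
plt.show()  # captured by ClearML and uploaded as a debug image to the files server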
but I cannot compare between them
I think we noticed it, and this will be fixed in the next server update (again, some plotly.js issue there)
Containers are not running
? but you are running the docker-compose, how come no containers are running ?
Hi SkinnyPanda43
I realized that the params are not being saved anymore
Could you test with clearml==1.0.4 ?
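i.e.:
pip install clearml==1.0.4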