Hi DilapidatedDucks58, what is your server version?
Hi @<1526371965655322624:profile|NuttyCamel41> , can you add the full log?
@<1533257278776414208:profile|SuperiorCockroach75> , excuse my ignorance, but doesn't it depend on the output model i.e. the training run that created it?
What do you mean? How are you running the pipeline - locally or remotely?
Hi @<1523701457835003904:profile|AbruptHedgehog21> , I'm not sure I understand - How do you use set_base_docker and what do you expect to happen?
Hi @<1813745484821434368:profile|SuccessfulPigeon84> , these are Enterprise-only features as far as I'm aware. I would suggest contacting ClearML's sales 🙂
Try setting the upload destination correctly and then repeating the same steps
Hi @<1535069219354316800:profile|PerplexedRaccoon19> , not sure I understand what you mean, can you please elaborate on what you mean by doing the evaluations within ClearML?
What versions of clearml-agent & clearml are you using? Is it a self hosted server?
How about this by the way?
https://clear.ml/docs/latest/docs/references/sdk/model_outputmodel#outputmodelset_default_upload_uri
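To illustrate, a minimal sketch of what I mean (the bucket path is just a placeholder - adjust it to your storage):
```python
from clearml import OutputModel, Task

# Placeholder destination - replace with your own bucket/path
OutputModel.set_default_upload_uri("s3://my-bucket/models")

task = Task.init(project_name="examples", task_name="upload destination example")
# Models saved from this point on should be uploaded to the URI set above
```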
Hi @<1806135344731525120:profile|GrumpyDog7> , it shows the reason in the log:
Python executable with version '3.9' requested by the Task, not found in path, using '/usr/bin/python3' (v3.12.3) instead
You either need a container with the relevant Python version available, or you can have it installed using the bash script section.
Makes sense?
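For example, a rough sketch of pointing the task at a container that already has the right Python (the image name here is just an example):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="python 3.9 container example")
# Example only: any image that already ships Python 3.9 would work
task.set_base_docker("python:3.9-slim")
# Clone + enqueue so the script actually runs on the agent, inside that container
task.execute_remotely(queue_name="default")
```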
TenseOstrich47, you could create a monitor task that reads model performance from your database and reports it as a scalar. Based on that scalar you can then create triggers 🙂
What do you think?
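Something along these lines, just as a sketch (the database query is a placeholder you'd replace with your own):
```python
import time
from clearml import Task

task = Task.init(project_name="monitoring", task_name="model performance monitor")
logger = task.get_logger()

def fetch_latest_accuracy():
    # Placeholder: replace with a real query against your database
    return 0.93

iteration = 0
while True:
    accuracy = fetch_latest_accuracy()
    logger.report_scalar(title="model", series="accuracy", value=accuracy, iteration=iteration)
    iteration += 1
    time.sleep(600)  # poll every 10 minutes
```
From there, something like the TriggerScheduler from clearml.automation (or your own logic watching that scalar) can kick off whatever action you need - that part depends on what you want the trigger to do.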
external trigger
What do you mean? Do you have a reference?
Any specific reason not to use the autoscaler? I would imagine it would be even more cost-effective
Hi @<1856144871656525824:profile|SparklingFly7> , can you describe the issue you're experiencing? I saw there is a new response on GitHub - None
Btw, what OS are you on?
Try with `pip install -U clearml==1.7.2rc1`
Hi @<1632913939241111552:profile|HighRaccoon77> , the most 'basic' solution would be adding a piece of code at the end of your script that shuts down the machine, but obviously that would be unpleasant to run locally without Task.execute_remotely() - None
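Roughly what I mean, as a sketch (the shutdown command itself is just an illustration and assumes the agent user is allowed to run it):
```python
import os
from clearml import Task

task = Task.init(project_name="examples", task_name="auto shutdown example")
# When executed locally this clones the task, enqueues it and exits;
# everything below only runs on the remote machine.
task.execute_remotely(queue_name="default")

# ... your training code ...

# Illustration only: power the machine off once the work is done
os.system("sudo shutdown -h now")
```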
Are you specifically using Sagemaker? Do you have any api interface you could work with to manipulate shutdown of machines?
Hi @<1523701066867150848:profile|JitteryCoyote63> , you mean a global "env" variable that can be passed along the pipeline?
Can you check the machine status? Is the storage running low?
Yes. Run all the pipeline examples and see how the parameters are added via code to the controller.
For example:
None
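As a rough sketch of what that looks like (names and values here are just examples):
```python
from clearml.automation.controller import PipelineController

def step_one(learning_rate):
    print(f"training with lr={learning_rate}")
    return learning_rate

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0.0")
# Example parameter; steps can reference it as ${pipeline.learning_rate}
pipe.add_parameter(name="learning_rate", default=0.001, description="example parameter")
pipe.add_function_step(
    name="step_one",
    function=step_one,
    function_kwargs=dict(learning_rate="${pipeline.learning_rate}"),
)
pipe.start_locally(run_pipeline_steps_locally=True)
```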
I think something might be blocking ports on your local machine. Did you change the port mapping for the ClearML dockers?
```
Status: Downloaded newer image for nvidia/cuda:10.2-runtime-ubuntu18.04
1657737108941 dynamic_aws:cpu_services:n1-standard-1:4834718519308496943 DEBUG docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
time="2022-07-13T18:31:45Z" level=error msg="error waiting for container: context canceled"
```
As can be seen here 🙂
Check the pre_execute_callback and post_execute_callback arguments of the component.
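A minimal sketch of wiring them into a step (the callback bodies are just placeholders):
```python
from clearml.automation.controller import PipelineController

def before_step(pipeline, node, parameters):
    # Placeholder: runs just before the step is launched; returning False skips the step
    print(f"about to run {node.name} with {parameters}")

def after_step(pipeline, node):
    # Placeholder: runs right after the step completes
    print(f"finished {node.name}")

def train():
    return 42

pipe = PipelineController(name="callbacks demo", project="examples", version="1.0.0")
pipe.add_function_step(
    name="train",
    function=train,
    pre_execute_callback=before_step,
    post_execute_callback=after_step,
)
pipe.start_locally(run_pipeline_steps_locally=True)
```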
Do you mean see the datasets in the UI?
Hi @<1695969549783928832:profile|ObedientTurkey46> , is this happening when running on top of the agent or locally?
GrievingTurkey78, I'm not sure. Let me check.
Do you have CPU/GPU tracking reported in your task from both PyTorch Lightning AND ClearML?
Hi @<1533619725983027200:profile|BattyHedgehong22> , does the package appear in the installed packages section of the experiment?
I'm sorry. I think I wrote something wrong. I'll elaborate:
The SDK detects all the packages that are used during the run, and the agent will install a venv with those packages.
I think there is also an option to specify a requirements file directly in the agent.
Is there a reason you want to install packages from a requirements file instead of just using the automatic detection + agent?
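For completeness, if you do want to force a requirements file from the code side, something like this sketch should work (called before Task.init):
```python
from clearml import Task

# Must run before Task.init(); adds the contents of an explicit requirements file
# on top of the automatic package detection
Task.add_requirements("requirements.txt")

task = Task.init(project_name="examples", task_name="explicit requirements")
```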