Reputation
Badges 1
25 × Eureka!GrittyHawk31 by default any user can login (i.e. no need for password), if you want user/password access:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config/#web-login-authentication
Notice no need to have anything else in the apiserver.conf , just the user/pass section, everything else will just be the default values.
well I do not think you set your pytorch lightining to use cuda:
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/code/.venv/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
MelancholyBeetle72 I think we collect them in Issue 81 on GitHub, feel free to add it if it is missing 🙂
https://github.com/allegroai/clearml/issues/81
ssh: Could not resolve hostname
: Name or service not known
@<1542316991337992192:profile|AverageMoth57> so is this the main issue? this seems unrelated to the Gerrit thing, just missing configuration of the .ssh on the agent machine, is that correct?
is everything on the same network?
SmarmySeaurchin8
updated_tags = task.tags
updated_tags.remove(tag)
task.tags = updated_tags
Ooops 😞
task.get_tags()
task.set_tags()
Hi MelancholyBeetle72
You mean the venv creation takes the bulk of the time, or it something else ?
The latest TAO doesn't use python for fine tuning, rather it uses the CLI entirely
It's a good question, but I think the CLI actually just runs a python code (the CLI is their interface). Generally speaking I'm pretty sure it will not be complicated to convert the TLT integration to support TAO (Nvidia helps with that, and I think we had a similar proces with Nvidia Clara/MONAI)
BTW: how are you using Nvidia TAO ?
It should actually work the same, if you find out it fails to properly register let me know (and then I guess a github issue is the next step)
Does what you suggested here >
Yes, it is basically the same underlying mechanism, only instead of 1-to-1 it's 1-to-many
Hi @<1541954607595393024:profile|BattyCrocodile47>
Can you trigger a pre-existing Pipeline via the ClearML REST API?
Yes
'd want to have a Lambda function trigger the Pipeline for a batch without needing to have all the Pipeline code in the lambda function.
Easiest is to use clearml SDK, which basically is clone / enqueue (notice that pipeline is also a kind of a Task). See here: [None](https://github.com/allegroai/clearml/blob/3ca6900c583af7bec18792a4a92592b94ae80cac/example...
Hi PompousParrot44
What do you have in the Execution/"script path" ?
Why would you need to manually change the current run? you just provided the values with either default/command-line ?
what am I missing here?
ResponsiveHedgehong88 I'm not sure I state dit, but the argparser arguments and values are collected automatically from your current run and put on the Task, there is no need to manually set them if you have the argparser running on your machine. Basically it collects the current (i.e. the process running on your machine) settings, and "copies" them ...
I managed to set up my (Windows) laptop as a worker and reproduce the issue.
Any insight on how we can reproduce the issue?
Thanks CharmingShrimp37 !
Could you PR the fix ?
It will be just in time for the 0.16 release 🙂
What's the host you have in the clearml.conf ?
is it something like " http://localhost:8008 " ?
Hi SubstantialElk6
Yes this is the queue the glue will pull jobs from and push into the k8s. You can create a new queue from the UI (go to the workers&queues page and to the Queue Tab and press on "create new" Ignore it 🙂 this is if you are using config maps and need TCP routing to your pods As you noted this is basically all the arguments you need to pass for (2). Ignore them for the time being This is the k8s overrides to use if launching the k8s job with kubectl (basically --override...
WittyOwl57 that is odd there is a specific catch for SystemExit
https://github.com/allegroai/clearml/blob/51d70efbffa87aa41b46c2024918bf4c584f29cf/clearml/backend_interface/task/repo/scriptinfo.py#L773
How do I reproduce this issue/warning ?
Also: "Repository and package analysis timed out (300.0 sec), giving up" seriously ove 5 minutes ?! how large is the git repo?
You can change it the CWD folder, if you put . in working dir it will be the root git repo, but you can do any subfolder, obviously you need to change the script path to match the folder, e.g. ./folder/script.py etc.
If there a way to do this without manually editing installed packages?
Running your code once with Task.init should automatically detect all the directly imported packages, then when trains-agent executes the Task, it will install them to a clean venv and put back all the packages inside the venv.
In order for all the used packages (e.g. bigquery) to appear in the "Installed packages" your cide needs to be executed once manually (i.e. not with trains-agent) then the ` tra...
I was using clearml == 0.17.5 and I also had this issue
I think it was introduced when we moved to subprocess reporting, with 0.17.5
You can disable it with the following in clearml.conf:sdk.development.report_use_subprocess = false
If you spin two agent on the same GPU, they are not ware of one another ... So this is expected behavior ...
Make sense ?
ReassuredTiger98
(for some reason it kind of jumps over PyTorch, but then installs torchvision?!)
Could you run with the latest with --debug
We just added but you will have to install from git:pip3 install git+Then run with --debug:clearml-agent --debug daemon ...
Hi DilapidatedDucks58 ,
Just making sure all 8 works have different worker ids? (you can see 8 in the workers page in the UI)
Also, are they running this docker or venv mode?
but now serving is not able to locate the model file itself,
from your screen shot the file seems to be in local folder somewhere "file://" it should be in the file server or in object storage, how did it get there? how is the file server configured
LOL I keep typing clara without noticing (maybe it's the nvidia thing I keep thinking about)
Carla makes much more sense 😄
Hi @<1523702932069945344:profile|CheerfulGorilla72>
Please tell me what RAM metric is tracked by ClearML?
Free RAM is the entire machine free RAM
Yeah htop shows odd numbers as it doesn't "count" allocated buffers
specifically you can see the code here:
None