could be nice to have a direct "task comparison" link in the UI somewhere,
you mean like a "cart" for comparison? or just to "save the state" so you can move between projects?
When looking at the worker details, it says "No queues currently assigned to this worker"
Yes, I think we should have better information there. The "AWS service" is not directly pulling jobs from any specific queue, hence nothing is listed there. It is "listening" to queues and launching machines, and those machines will be the ones listening to the queue. I wonder if it is just easier to also make sure the service is listed as "assigned" to those queues. wdyt?
Does it mean I can use the clearml-serving helm chart alone?
Unrelated: clearml-serving can be deployed on k8s or with docker-compose, regardless of where/how clearml-server is deployed
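For the docker-compose route, a rough sketch (the docker/ folder and file names follow the clearml-serving repo layout as I remember it, so double-check against the repo):
```
git clone https://github.com/allegroai/clearml-serving.git
cd clearml-serving/docker
# the env file holds the clearml-server API credentials for the serving service
docker-compose --env-file example.env -f docker-compose.yml up -d
```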
Yeah I think this kind of makes sense to me, any chance you can open a GH issue on this feature request?
Hi UnsightlyLion90
from my understanding the agent does the job of SLURM,
That is kind of correct (they overlap in some ways 🙂)
Any guide on how to integrate both of them?
The easiest way is to just add the Task.init() call to your code, and use SLURM to schedule the job. This will make sure all jobs are fully logged (this also includes automatically uploading the models, artifact support, etc.)
Full SLURM support (i.e. similar to the k8s glue support) is currently ou...
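A minimal sketch of the Task.init() approach above (project/task names here are placeholders):
```
from clearml import Task

# Task.init registers the run with clearml-server and enables auto-logging
# (console output, frameworks, and optionally models/artifacts)
task = Task.init(project_name="slurm-jobs", task_name="training-run")

# ...the rest of the training script, submitted to SLURM as usual (e.g. via sbatch)
```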
BTW: if you could implement _AzureBlobServiceStorageDriver with the new Azure package, that would be great:
Basically update this class:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L1620
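A rough sketch of what the upload path might look like with the newer azure-storage-blob package (BlobServiceClient is that package's API; the surrounding driver structure here is illustrative, not ClearML's actual interface):
```
from azure.storage.blob import BlobServiceClient

class _AzureBlobServiceStorageDriver:
    def __init__(self, connection_string):
        # the newer SDK replaces the legacy BlockBlobService client
        self.client = BlobServiceClient.from_connection_string(connection_string)

    def upload_object(self, container_name, blob_name, file_path):
        blob = self.client.get_blob_client(container=container_name, blob=blob_name)
        with open(file_path, "rb") as f:
            blob.upload_blob(f, overwrite=True)
```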
I think that just backing up /opt/clearml and moving it should be just fine 🤔
Ohh so even easier:
```
print(client.workers.get_all())
```
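For context, a minimal sketch of where that client comes from (assuming the standard APIClient):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# returns the list of currently registered workers
print(client.workers.get_all())
```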
Hi MagnificentSeaurchin79
This means tensorflow was not directly imported in the repository (which is odd; it might point to the auto package analysis failing to find the package, if this is the case please let me know)
Regardless, if you need to make sure a package is listed in the requirements, either import it or use:
```
Task.add_requirements('tensorflow')
# or with a specific version:
Task.add_requirements('tensorflow', '2.3.1')
```
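Note that, as far as I recall, add_requirements() has to be called before Task.init() to take effect; a minimal sketch (project/task names are placeholders):
```
from clearml import Task

# add_requirements() must be called before Task.init()
Task.add_requirements('tensorflow', '2.3.1')
task = Task.init(project_name="examples", task_name="tf-run")
```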
first try the current setup using pip, and if it fails, use poetry if poetry.lock exists
I guess the order here is not clear to me (the agent does the opposite): why would you start with pip if you are using poetry?
EnviousStarfish54 a fix is already available in the latest RC
Could you verify it solves your issue as well?
```
pip install trains==0.16.2rc0
```
Hmm DepressedChimpanzee34, my bad, it seems the loading is done via a YAML loader, but the dumping is straightforward str casting...
https://github.com/allegroai/clearml/blob/6e6271fb91f2aeb2aa7a13c6d07d4e635baaa670/clearml/backend_interface/task/task.py#L934
What would you expect to get? (BTW "value\blah" is not a valid way to write that text in Python, since \b is the backspace escape character; it should be "value\\blah", which translates into the text value\blah)
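To illustrate the escaping:
```
s = "value\blah"     # \b is the backspace escape (\x08), not a literal backslash
print(repr(s))       # 'value\x08lah'

s = "value\\blah"    # escaping the backslash gives the literal text
print(s)             # value\blah
```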
Hi EnviousStarfish54
Docker on Windows with NVIDIA runtime support is only available via WSL2 (I think)
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-wip
https://medium.com/@dalgibbard/docker-with-gpu-support-in-wsl2-ebbc94251cf5
I still wonder how no one noticed ... (maybe 100 unique title/series reports is a relatively high threshold)
Seems like settings on the clearml-server disappeared (specifically default queue tag?!)
Hi AmiableFish73
Hi all - is there an easier way to track the set of datasets used by a particular task?
I think the easiest is to give the Dataset an alias; it will automatically appear in the Configuration section:
```
Dataset.get(..., alias="train dataset")
```
wdyt?
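Expanded a bit, a minimal sketch (the dataset name is a placeholder):
```
from clearml import Dataset

# the alias is what shows up under the task's Configuration section
ds = Dataset.get(dataset_name="my-train-data", alias="train dataset")
local_path = ds.get_local_copy()
```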
Hi @<1610083503607648256:profile|DiminutiveToad80>
I think we will need more context for the log...
but I think there is something wrong with the GCP resource configuration of your autoscaler
Can you send the full autoscaler log and the configuration?
It is way too much to pass in an env variable 🙂
CrookedWalrus33
Force SSH git authentication; it will auto-mount the .ssh folder from the host into the docker container
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25
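If I remember the setting correctly, that is the force_git_ssh_protocol flag in clearml.conf (per the link above):
```
agent {
    # force all git links to use SSH; the agent will mount the host's
    # ~/.ssh into the docker container
    force_git_ssh_protocol: true
}
```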
@<1524922424720625664:profile|TartLeopard58> @<1545216070686609408:profile|EnthusiasticCow4>
Notice that when you are spinning multiple agents on the same GPU, the Tasks should request the "correct" fractional GPU container; i.e. if they pick a "regular" container, there will be no mem limit.
So something like:
```
CLEARML_WORKER_NAME=host-gpu0a clearml-agent daemon --gpus 0 clearml/fractional-gpu:u22-cu12.3-2gb
CLEARML_WORKER_NAME=host-gpu0b clearml-agent daemon --gpus 0 clearml/fractional-gpu:u22-cu12.3-2gb
```
why doesn't this happen on my other experiments?
same 100+ reports?
(My new theory is that calling Task.reload() will fix it, and it might be called internally for the other experiments, like when reporting models/artifacts)
Could that be the case?
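If you want to test that theory, a tiny workaround sketch (assuming you have the task object at hand):
```
# hypothetical workaround per the theory above: force a refresh
# of the task object's state from the backend
task.reload()
```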
So are you asking why we need to install a specific pip version?
You can "disable it" by selecting a very high versionpip_version: "<40"
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L67
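In context, the setting sits under the package_manager section of clearml.conf (per the link above):
```
agent {
    package_manager {
        # effectively disables the pin by allowing any pip version below 40
        pip_version: "<40"
    }
}
```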
Hi @<1600299043865497600:profile|MagnificentSeaurchin90>
Any chance you can provide more info on the error?
if I want to compare two experiments the scalar plots do not load (loading forever).
I'm assuming the issue is the Plots tab? Or is it the Scalars? What do you have in the Plots? Can you send an image of the single experiment?
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi-core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
wdyt?