it seems like each task is set up to run on a single pod/node based on attributes like gpu memory, os, num of cores, worker
BoredHedgehog47 of course you can scale on multiple nodes.
The way to do that is to create a k8s YAML with replicas; each pod actually runs the exact same code with the exact same setup. Notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...
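As a minimal sketch of the in-code side, assuming a PyTorch DDP job where the k8s pod spec injects RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT into every replica (none of this is clearml-specific):
` import os

import torch
import torch.distributed as dist

# every pod runs this exact same script; the env vars injected by the
# k8s spec are what differentiate the replicas
dist.init_process_group(
    backend="nccl",        # GPU-to-GPU communication between pods
    init_method="env://",  # reads MASTER_ADDR / MASTER_PORT from the env
    rank=int(os.environ["RANK"]),
    world_size=int(os.environ["WORLD_SIZE"]),
)

model = torch.nn.Linear(128, 10).cuda()
# DDP takes care of the cross-pod gradient synchronization
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[0]) `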
Hi @<1610808279263350784:profile|FriendlyShrimp96>
Is there a way to get a list of variants given a metric, or even just a full list of metrics and variants for a given task id?
Try this
` from clearml.backend_api.session.client import APIClient

c = APIClient()
# metric names reported for the given task, filtered by event type
metrics = c.events.get_task_metrics(tasks=["TASK_ID_HERE"], event_type="training_debug_image")
print(metrics) `
I think API ...
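If what you are after is scalar metrics and their variants, a sketch using the SDK should also work (assuming the task has reported scalars; get_reported_scalars() returns a metric -> variant -> series mapping):
` from clearml import Task

task = Task.get_task(task_id="TASK_ID_HERE")
# {metric: {variant: {"name": ..., "x": [...], "y": [...]}}}
scalars = task.get_reported_scalars()
for metric, variants in scalars.items():
    print(metric, "->", list(variants.keys())) `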
Oh no, I just saw the message @<1541954607595393024:profile|BattyCrocodile47> is this still relevant?
so i end up having to clone the other ones manually in my code
Hi ConvolutedChicken69
Yes, the problem is that there is no standard for multi-repo environments.
The best solution I can come up with is using git-submodules or packaging the auxiliary repo as wheels. wdyt?
oh, if this is the case, why not use the "main" server?
It's the same but done from outside; you want the same but "offline" as well, right?
Hmm, this means the step should have included the git repo itself, which means the code should have been able to import the .py.
Can you see the link to the git repository on the Pipeline step Task?
Hi PanickyMoth78
So do not tell anyone, but the next version will have reports built into clearml, as well as the ability to embed graphs in 3rd-party tools (think Notion, GitHub, markdown etc.)
Until then (ETA mid-Dec), the easiest is to download an image or just use the URL (it encodes the full view, so when someone clicks on it they get the exact view you are seeing)
So good news: (1) the Dashboard is being worked on as we speak. (2) We released clearml-serving doing exactly that; the next release of clearml-serving will include integration with kfserving (under the hood), essentially managing the serving endpoints on top of the k8s cluster. wdyt?
I get gaps in the graphs.
For example, the first time I run, I create a task and run a loop:
Hi SourOx12
Is this related to this one?
https://github.com/allegroai/clearml/issues/496
JitteryCoyote63 hacky but sure 🙂
` from trains.config import config_obj
print(config_obj) `
clearml doesn't do any "magic" in regard to this for tensorflow, pytorch etc., right?
No 🙂 and if you have an idea of how, that would be great.
Basically the problem is that there is no "standard" way to know which layer is in/out.
Hmm, it seems as if the task.set_initial_iteration(0) is ignored...
What's the clearml version you are using?
Is it the same one you have on the local machine?
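For context, a minimal sketch of how set_initial_iteration is meant to be used when resuming a task (the project/task names and continue_last_task here are assumptions for illustration):
` from clearml import Task

# resume the previously executed task instead of creating a new one
task = Task.init(project_name="examples", task_name="resume me",
                 continue_last_task=True)
# report from iteration 0 rather than offsetting by the last reported iteration
task.set_initial_iteration(0) `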
Hi PerplexedWalrus3
you should get something like the following on the console:
` ClearML Task: created new task id=1ca59ef1f86d44bd81cb517d529d9e5a
2021-07-25 13:59:09
ClearML results page:
2021-07-25 13:59:16 `
Hi DisgustedDove53
When you say "deployment" there are a lot of ways to interpret that 🙂 what exactly are you looking for?
why doesn't this happen on my other experiments?
same 100+ reports?
(My new theory is that calling Task.reload() will fix it, and it might be called internally for the other experiments, like when reporting models/artifacts)
Could that be the case ?
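If you want to test that theory, a quick sketch (Task.reload() is a real SDK call that re-fetches the task state from the backend; the task id is a placeholder):
` from clearml import Task

task = Task.get_task(task_id="TASK_ID_HERE")
task.reload()  # refresh the local task object from the server before reading reports `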
Hi CooperativeFly2
is it possible to create multiple trains-agents per gpu
Yes you can; that said, memory cannot actually be shared between GPU processes (GPU time is obviously shared), so you have to be careful with the Tasks actually being executed in parallel.
For instance:
` TRAINS_WORKER_NAME=host_a trains-agent daemon --gpus 0 --queue default
TRAINS_WORKER_NAME=host_b trains-agent daemon --gpus 0 --queue default `
I would like to be able to send a request to unload the model (because I cannot load all the models in gpu, only 7-8) o
Hmm, is this part of the gRPC interface of Triton? If it is, we should be able to add that quite easily.
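For reference, Triton does expose load/unload through its model repository API when the server is started with --model-control-mode=explicit; a minimal sketch over the HTTP flavor (host, port and model name are placeholders):
` import requests

# requires tritonserver running with --model-control-mode=explicit
TRITON = "http://localhost:8000"
MODEL = "my_model"

# free the GPU memory held by this model
requests.post(f"{TRITON}/v2/repository/models/{MODEL}/unload").raise_for_status()
# ...and load it back on demand
requests.post(f"{TRITON}/v2/repository/models/{MODEL}/load").raise_for_status() `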
SubstantialElk6
Notice that if you are using a manual setup, the default is "secure: false"; you have to change it to "secure: true":
https://github.com/allegroai/clearml-agent/blob/176b4a4cdec9c4303a946a82e22a579ae22c3355/docs/clearml.conf#L251
Hi @<1564422644407734272:profile|DistressedCoyote60>
I have an ML project that is not on git. It is separated into several files:
Yes you are correct: if your code is composed of multiple files, you have to have a git repo for the agent to be able to run it.
🙂
That said, it is free on any of these services: github/bitbucket/gitlab 🙂
So you could change it down the road if infra/hosting changes.
Internally this is doable and the Enterprise edition supports it; at the end this is stored in DBs 🙂
Also in this case, I'm uploading the data to the public file server URL, but my k8s pod can't reach that for security reasons.
Yes, this is solvable as well (again, sorry for pointing it out, but only in the enterprise version), where you can specify per client or globally:
` path_substitution = [
# Replace regis...
TrickyRaccoon92 actually Click is on the to-do list as well ...
It's more or less here:
https://github.com/allegroai/clearml-session/blob/0dc094c03dabc64b28dcc672b24644ec4151b64b/clearml_session/interactive_session_task.py#L431
I think that just replacing the package would be enough (I mean you could choose hub/lab, which makes sense to me)
All in all, seems like it will be fairly easy to add JupyterHub to clearml-session, and that would solve your issue, no?
(and it seems that, from an implementation perspective, this will not be a lot of work)
wdyt?
You could change infrastructure or hosting, and now your data is associated with the wrong URL
Yeah that makes sense, so have it on a specific DNS name? (this is usually the case with k8s deployments)
So this is an additional config file with enterprise?
Extension to the "clearml.conf" capabilities
Is this new config file deployable via helm charts?
Yes, you can also set it company/user-wide using the clearml Vault feature (again enterprise, sorry 🙂)
Basically what I want is a clearml-session but with a docker container running JupyterHub instead of JupyterLab.
I missed that 🙂
The idea of clearml-session is to launch a container with jupyterlab (or vscode) on a remote machine, and connect the user's machine (i.e. the machine executing the clearml-session CLI) directly into the container.
Replacing the jupyterlab with JupyterHub would be meaningless here, because the idea is that it spins an instance (contai...
When a remote task runs Dataset.get() it is not using the correct URL
BoredHedgehog47 it will get the link the data was Registered with, when creating the Dataset.
This has Nothing to do with the local configuration; it can point to any arbitrary file location on the internet.
It was created there because, at the time of the dataset creation, someone (manually or via the config) set a specific host as the file location, and to that host the files were uploaded (again ...
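To make that concrete, a sketch of where the URL gets baked in (names and output_url are placeholders; the point is that the storage location is fixed at upload time, not at Dataset.get() time):
` from clearml import Dataset

# creation side: files are uploaded to whatever output_url says,
# and that is the link every future consumer will receive
ds = Dataset.create(dataset_name="my-data", dataset_project="examples")
ds.add_files("./data")
ds.upload(output_url="s3://my-bucket/datasets")  # <- URL fixed here
ds.finalize()

# consumer side (e.g. inside the remote task): resolves the stored URL
ds = Dataset.get(dataset_project="examples", dataset_name="my-data")
local_path = ds.get_local_copy() `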