but DS in order for models to be uploaded,
you still have to set:
output_uri=True
in the
No, if you set the default_output_uri, there is no need to pass output_uri=True in the Task.init() 🙂
It basically sets it for you, makes sense?
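A minimal sketch of the two options discussed here, assuming the clearml package is installed and a server is configured. The function name and the project/task names are placeholders, not part of the original answer.

```python
def init_task(upload_explicitly=False):
    """Start a task; models get uploaded when an output URI is in effect."""
    from clearml import Task  # lazy import so the sketch loads without clearml

    # Placeholder project/task names, for illustration only
    kwargs = {"project_name": "examples", "task_name": "upload demo"}
    if upload_explicitly:
        # Per-task opt-in: True means "upload to the default files server"
        kwargs["output_uri"] = True
    # Otherwise rely on sdk.development.default_output_uri in clearml.conf
    return Task.init(**kwargs)
```

With default_output_uri set in clearml.conf, calling init_task() without the flag should be enough.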
Hi @<1697056701116583936:profile|JealousArcticwolf24>
Awesome deployment 🤩
Yes if you need another scalable model serving you can just run another instance of the clearml-serving-inference
https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/docker/docker-compose.yml#L77
So you end up with two of them, one per models environ...
Our datasets are more than 1TB in size and will grow in size (probably 4TB and up), this means we also need 4TB local storage
Yes, because somewhere you will have to store your unzipped files.
Or you point to the S3 bucket and fetch the data when you need to access it (or prefetch it) with the S3 links the Dataset stores, i.e. only when accessed
Hi @<1523708901155934208:profile|SubstantialBaldeagle49>
If you report on the same iteration with the same title/series you are essentially overwriting the data (as expected)
Regarding the plotly report size.
Two options:
- round down numbers (by default it will store all the digits, and usually anything after the fourth is quite useless; rounding will drastically decrease the plot size)
- Use logger.report_scatter2d , it is more efficient and has a mechanism to subsample extremely large graphs.
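To illustrate the first option, here is a small sketch that rounds a series to four decimals and compares the serialized size. The data is synthetic and the helper name round_series is made up for this example. JSON length is only a rough stand-in for the stored plot size.

```python
import json
import math


def round_series(points, digits=4):
    """Round every coordinate of [x, y] points to `digits` decimal places."""
    return [[round(x, digits), round(y, digits)] for x, y in points]


# Synthetic curve standing in for real reported values
points = [[i / 7.0, math.sin(i / 7.0)] for i in range(1000)]
rounded = round_series(points)

# Compare the serialized sizes as a rough proxy for plot size
full_size = len(json.dumps(points))
rounded_size = len(json.dumps(rounded))
print(f"full: {full_size} bytes, rounded: {rounded_size} bytes")
```

The rounded list can then be passed as the scatter argument of logger.report_scatter2d, as in the second option.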
The image is
allegroai/clearml:1.0.2-108
Yep, that makes sense, seems like a backwards compatibility issue
AttractiveCockroach17 can I assume you are working with the hydra local launcher ?
JitteryCoyote63 This seems like exactly what you are saying, elastic license issue...
Let me know if I can be of help 🙂
I would like to start off by saying that I absolutely love clearml.
@<1547028031053238272:profile|MassiveGoldfish6> thank you for saying that! 😍
Is it possible to download individual files from a dataset without downloading the entire dataset? If so, how do you do that?
Well by default files are packaged into multiple zip files, you can control the size of the zip file for finer granularity, but at the end when you download, you are downloading the entire packaged ...
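A rough sketch of controlling the zip size at upload time, assuming the clearml package and a configured server. The function name and the 64 MB default are illustrative choices, not recommendations from the thread.

```python
def upload_in_small_chunks(name, project, folder, chunk_mb=64):
    """Create a dataset version packaged into zip files of roughly chunk_mb each."""
    from clearml import Dataset  # lazy import: needs clearml and a server

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=folder)
    # chunk_size is given in MB; smaller chunks give finer-grained packages
    dataset.upload(chunk_size=chunk_mb)
    dataset.finalize()
    return dataset.id
```

Smaller chunks mean more, smaller packages, which is the finer granularity mentioned above.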
so I didn't have much time to upgrade all the packs because I have some issues with that but it is on my todo list
No worries 🙂
Quick question, if you run https://github.com/allegroai/trains/blob/master/examples/frameworks/keras/legacy/keras_tensorboard.py
Do you see models in the artifacts tab?
WickedGoat98 nice!!
Can you also pass the login screen (i.e. can you access the api server)
Let me know if you managed to get it working, then we can see if we can detect it automatically.
Hi @<1523711619815706624:profile|StrangePelican34>
if I am trying to deploy 100 models on a GPU that can handle 5 concurrently,
Main limitation is Triton's ability to dynamically load / unload models. We know Nvidia is adding this capability, but I think it is still not out. Once they support it, it should be transparent
i had a misconception that the conf comes from the machine triggering the pipeline
Sorry, this one :)
Hi SkinnyPanda43
Are you trying to access the same Task or an external one ?
Hi @<1529633468214939648:profile|CostlyElephant1>
Is it possible to get user ID of the current user
On the Task.data object itself there should be a field named "user", that's the user ID of the owner (creator) of the Task.
You can filter based on this id with
Task.get_tasks(..., task_filter={'user': ["user-id-here"]})
wdyt?
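Put together, it could look roughly like the sketch below, assuming the clearml package and a configured server. The helper names are made up for the example.

```python
def user_filter(user_id):
    """Build the task_filter dict that matches tasks owned by `user_id`."""
    return {"user": [user_id]}


def owner_of_current_task():
    """Read the owner's user ID from the running task, if there is one."""
    from clearml import Task  # lazy import: needs clearml and a server

    task = Task.current_task()  # None when no task was initialized
    return task.data.user if task is not None else None


def tasks_owned_by(user_id, project_name=None):
    """Return the tasks created by `user_id`, optionally within one project."""
    from clearml import Task

    return Task.get_tasks(project_name=project_name,
                          task_filter=user_filter(user_id))
```

For example, tasks_owned_by(owner_of_current_task()) would list the tasks created by the same user as the current one.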
BTW: the agent will resolve pytorch based on the installed CUDA version.
Hi @<1573119962950668288:profile|ObliviousSealion5>
Hello, I don't really like the idea of providing my own github credentials to the ClearML agent. We have a local ClearML deployment.
if you own the agent, that should not be an issue, no?
forward my SSH credentials using
ssh -A
and then starting the clearml agent?
When you are running the agent and you force git cloning with SSH, it will automatically map the .ssh folder into the container for git to use
Ba...
Yes... I think that this might be a bit much automagic even for clearml 😄
that does happen when you create a normal local task, that's why i was confused
The parts that are not passed in both cases are the configurations from the conf file. Only the environment is passed (e.g. git, python packages etc). For example, if you have storage credentials in your conf file, they are not passed to a remote agent; instead the credentials from the remote agent are used when it runs the task.
make sense?
So could it be that pip install --no-deps . is the missing issue ?
what happens if you add to the installed packages "/opt/keras-hannd" ?
btw: both should work fine
owning the agent helps, but still it's much better if the credentials don't show up in logs,
They are not, they are always filtered out,
- how does force_git_ssh_protocol help please? it doesn't solve the issue of the agent simply not having access
It automatically maps the host .ssh into the container, so that git can use SSH to clone.
What exactly is not working?
and how are you configuring it?
Hi @<1547028031053238272:profile|MassiveGoldfish6>
Is there a way for ClearML to simply save the model once training is done and to ignore the model checkpoints?
Yes, you can simply disable the auto logging of the model and manually save the checkpoint:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
...
task.update_output_model("/my/model.pt", ...)
Or for example, just "white-label" the final model
task = Task.init(..., auto_connect_frameworks={'pyt...
Can you clone the git with the .ssh credentials on the host machine ?
If so, can you do the same manually inside a docker (i.e. spin a docker with mount -v /home/hostuser/.ssh:/root/.ssh) ?
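One way to script that manual check is sketched below. The image name and repository URL are placeholders, and git ls-remote is used here instead of a full clone as a lighter access test (my substitution, not from the thread).

```python
import os


def ssh_check_command(repo_url, image, ssh_dir=None):
    """Build a `docker run` command that tests SSH git access in a container."""
    # Default to the current user's ~/.ssh on the host
    ssh_dir = ssh_dir or os.path.expanduser("~/.ssh")
    return [
        "docker", "run", "--rm",
        "-v", f"{ssh_dir}:/root/.ssh",  # same mount as suggested above
        image,
        "git", "ls-remote", repo_url,   # lists remote refs; fails without access
    ]


cmd = ssh_check_command(
    "git@github.com:example/repo.git",  # placeholder repository
    image="my-image-with-git",          # hypothetical image that has git + ssh
    ssh_dir="/home/hostuser/.ssh",
)
print(" ".join(cmd))
```

On a host with Docker, the list can be handed to subprocess.run(cmd, check=True); a non-zero exit suggests the container cannot use the mounted keys.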
JitteryCoyote63
So there will be no concurrent cached files access in the cache dir?
No concurrent creation of the same entry 🙂 It is optimized...