Hi SmallDeer34
Did you call Task.init ?
Was wondering how it can handle 10s, 100s of models.
Yes, it supports dynamically loading/unloading models based on requests
(load balancing multiple nodes is disconnected from it, but assuming they are under different endpoints, the load balancer can be configured to route accordingly)
GiganticTurtle0 your timing is great, the plan is to wrap up efforts and release early next week (I'm assuming the GitHub fixes will be pushed tomorrow; I'll post here once they are there)
Hi MammothGoat53
Do you mean working with RestAPI directly?
https://clear.ml/docs/latest/docs/references/api/events
you can also increase the limit here:
https://github.com/allegroai/clearml/blob/2e95881c76119964944eaa0289549617e8afeee9/docs/clearml.conf#L32
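For example, a minimal sketch of pulling console events through the API client that ships with the SDK (the task ID and batch size here are placeholders, and the exact response fields may differ slightly between server versions):
```python
# sketch, assuming the clearml package is installed and
# ~/clearml.conf holds valid api credentials
from clearml.backend_api.session.client import APIClient

client = APIClient()
# events.get_task_log is the REST endpoint behind the console log;
# '<task_id>' is a placeholder for your actual task ID
res = client.events.get_task_log(task='<task_id>', batch_size=100)
for event in res.events:
    print(event)
```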
So in theory you can clone yourself 2 extra times and push the clones into an execution queue, but the issue might be actually making sure the resources are available. What did you have in mind?
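For reference, cloning + enqueueing looks roughly like this (a sketch; the project, task, and queue names are placeholders):
```python
from clearml import Task

# assuming this runs inside the original task's script
task = Task.init(project_name='examples', task_name='self-cloning')

# clone the current task twice and push the clones into an execution queue
for i in range(2):
    cloned = Task.clone(source_task=task, name=f'{task.name} clone {i}')
    Task.enqueue(cloned, queue_name='default')
```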
It only happens in the clearml environment, works fine locally.
Hi BoredHedgehog47
what do you mean by "in the clearml environment" ?
based on this one:
https://stackoverflow.com/questions/31436407/git-ls-remote-returns-fatal-no-remote-configured-to-list-refs-from
I think this is a specific issue of the local git repo configuration, can you verify?
(btw: I tested with git 2.17.1; git ls-remote --get-url returns the remote url without an error)
when I run it on my laptop...
Then yes, you need to set the default_output_uri
on your laptop's clearml.conf (just like you set it on the k8s glue)
Make sense ?
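For reference, a sketch of the relevant clearml.conf entry (the bucket URL is a placeholder):
```
sdk {
    development {
        # artifacts / models will be uploaded here by default
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```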
Hmm reading this: None
How are you checking the health of the serving pod ?
So I have a task that just loads a model, but I don't see it as an artifact in the UI
You should see it under Artifacts, Input Model, if you are calling the Keras load function (or similar)
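i.e. a minimal sketch, assuming a Keras model file (the project, task, and file names are placeholders):
```python
from clearml import Task
from tensorflow import keras

task = Task.init(project_name='examples', task_name='load model')

# clearml hooks the framework's load call, so the loaded model
# should show up as an Input Model on the task
model = keras.models.load_model('my_model.h5')
```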
BTW: any specific reason for going the RestAPI way and not using the python SDK ?
Hi @<1571308003204796416:profile|HollowPeacock58>
could you share the full log ?
WackyRabbit7
Long story short, yes, only by name (hashing might be too slow on large files)
The easiest solution: if the hash is incorrect, delete the local copy it returns and ask again; it will re-download it.
I'm not sure if the hashing is exposed, but if it is not, we can add it.
What do you think?
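As a sketch of that delete-and-ask-again flow with the current API (the URL is a placeholder, and the expected hash would come from your own bookkeeping):
```python
import hashlib
import os
from clearml import StorageManager

url = 's3://my-bucket/data/file.bin'  # placeholder
expected_md5 = '...'  # known hash from your own records

local_path = StorageManager.get_local_copy(remote_url=url)
with open(local_path, 'rb') as f:
    actual_md5 = hashlib.md5(f.read()).hexdigest()

if actual_md5 != expected_md5:
    # stale cache entry: delete the local copy and ask again,
    # which triggers a fresh download
    os.remove(local_path)
    local_path = StorageManager.get_local_copy(remote_url=url)
```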
Any idea why the Pipeline Controller is Running despite the task passing?
What do you mean by "the task passing"?
If you have a requirements file then you can specify it: Task.force_requirements_env_freeze(requirements_file='requirements.txt')
If you just want the pip freeze output to be shown in your "Installed Packages" section, then use: Task.force_requirements_env_freeze()
Notice that in both cases you should call the function Before you call Task.init()
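i.e. a minimal sketch of the call order (project/task names are placeholders):
```python
from clearml import Task

# must be called before Task.init()
Task.force_requirements_env_freeze(requirements_file='requirements.txt')

task = Task.init(project_name='examples', task_name='freeze requirements')
```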
btw, what do you mean by "Packages will be installed from projects requirements file" ?
Hi @<1687643893996195840:profile|RoundCat60> , I just saw the message,
Just by chance I set the SSH deploy keys to write access and now we're able to clone the repo. Why would the SSH key need write access to the repo to be able to clone?
Let me explain: the default use case for the agent is to use user/pass (as configured in the clearml.conf file)
It will change any ssh links to https links and will add the credentials to clone the repository.
You can also provide SSH keys (basicall...
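For reference, a sketch of the user/pass part of the agent section in clearml.conf (the values are placeholders):
```
agent {
    # credentials the agent injects when cloning over https
    git_user: "my-git-user"
    git_pass: "my-git-token"
}
```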
Ohh if this is the case, you might also consider using offline mode, so there is no need for backend
https://clear.ml/docs/latest/docs/guides/set_offline#setting-task-to-offline-mode
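A minimal sketch of the offline flow (the project/task names and session path are placeholders):
```python
from clearml import Task

# run fully offline: everything is stored in a local session folder
Task.set_offline(offline_mode=True)
task = Task.init(project_name='examples', task_name='offline run')
# ... training code ...
task.close()

# later, on a machine with a backend, the session zip can be imported:
# Task.import_offline_session('/path/to/session.zip')
```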
But adding a simple force_download flag to the get_local_copy
That sounds like a good idea
trains-agent runs a container from that image, then clones ...
That is correct
I'd like the base_docker_image to not only be defined at runtime
I see, may I ask why not just build it once, push it into artifactory and then have trains-agent use it? (it will be much faster)
Hi CooperativeFox72 ,
From the backend guys, long story short, upgrade your machine => more cpu cores, more processes, it is that easy 🙂
Thanks, yes you are correct, the color is derived from the series name, so I guess the issue is that the name+ID is not kept in full screen
How can I ensure that additional tasks aren’t created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them ?
(Also, how is the git diff warning there with the latest clearml, I think there was some fix related to that)
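One way to keep it to a single task (a sketch; whether this fits your notebook flow is an assumption on my part):
```python
from clearml import Task

# close the current task explicitly before starting a new one,
# otherwise a second Task.init() call may spawn an additional task
task = Task.init(project_name='examples', task_name='notebook run')
# ... cells run here ...
task.close()

# reuse_last_task_id=True asks clearml to reuse the previous task
# instead of creating a new one (the default behavior, shown explicitly)
task = Task.init(project_name='examples', task_name='notebook run',
                 reuse_last_task_id=True)
```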
Yes, I think the writer.add_figure somehow crops the image
I'm going to follow your suggestion and just put the extra effort into distributing a pre-built image.
That sounds good 🙂
If you feel the need is important, I do have a hack in mind, it will be doable once we have support for entrypoint "-c python_code_here". But since this is still not available, I believe you are right and building an image would be the easiest.
A note on the docker image, remember that when running inside the docker we inherit the system packages installed on the d...
Hi FierceFly22
Hi, does anyone know where trains stores tensorboard data
Tensorboard data is stored wherever you point your file-writer to 🙂
What trains is doing is: while tensorboard writes its own data to disk, it takes the data (in-flight) and sends it to the trains-server. The trains-server puts everything in the DB, so later everything is viewable & searchable.
Basically you don't need to store your TB files after your experiment is done, you have all the data in the trains-s...
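For example, a minimal sketch with the PyTorch SummaryWriter (using the current clearml package; in older versions the import was from trains import Task):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name='examples', task_name='tensorboard demo')

# TB still writes its own event files to ./runs, while the in-flight
# reports are also sent to the server and stored in the DB
writer = SummaryWriter(log_dir='runs/demo')
for step in range(10):
    writer.add_scalar('loss', 1.0 / (step + 1), step)
writer.close()
```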
SmarmySeaurchin8 check the logs, maybe you can find something there