When are those keys used?
They are the default keys for internal access. Basically just make up something, otherwise someone could access the server with the default keys.
Interesting... TrickyRaccoon92 could it be the validation phase was creating a new TensorBoard file?
This will set more time before the timeout, right?
Correct.
task.freeze_monitor()    # proposed API: pause resource monitoring before a long call
download()               # the long-running operation that reports no iterations
task.defrost_monitor()   # proposed API: resume resource monitoring afterwards
Currently there isn't, but that's a good idea.
What would be the argument for using it vs increasing the timeout?
btw: setting the resource timeout to 99999 will basically mean that it will wait until the first reported iteration, not that it will just sleep for 99999 sec 🙂
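If it helps, here is a minimal sketch of raising that timeout from code; it assumes a clearml version that exposes Task.set_resource_monitor_iteration_timeout, so verify against your installed version:

from clearml import Task

task = Task.init(project_name="examples", task_name="long warmup")
# give the resource monitor up to 2 hours to see the first reported iteration
# before it falls back to seconds-based machine-usage reporting
task.set_resource_monitor_iteration_timeout(seconds_from_start=7200)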
Hi PungentLouse55
Hope you are not tired of me
Lol 🙂 No worries
I am using trains 0.16.1
Are you referring to the trains-server version or the python package? (They are not the same and can be of totally different versions.)
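For reference, a quick way to check the python package version (assuming the package exposes __version__, as recent releases do):

import trains
print(trains.__version__)  # e.g. 0.16.1; the trains-server version is a separate thing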
ValueError('Task object can only be updated if created or in_progress')
It seems the task is not "running", hence the error. Could that be the case?
Hi @<1523701066867150848:profile|JitteryCoyote63>
Hi, how does
agent.enable_git_ask_pass
work?
Basically it pushes the password to git through stdin when git asks for it (it is a git feature).
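For context, a minimal sketch of turning it on in the agent section of clearml.conf; this assumes an agent version that supports the flag:

agent {
    # hand the configured git credentials to git through its askpass mechanism
    enable_git_ask_pass: true
}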
JitteryCoyote63 I think I found the bug in clearml-task: it adds it at the end instead of before everything else.
With offline mode, you can later import the execution (including artifacts etc.) if you need to; you just need the zip file it creates when you are done.
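A minimal sketch of the full round trip, assuming the Task.set_offline / Task.import_offline_session API (the zip path below is hypothetical):

from clearml import Task

Task.set_offline(offline_mode=True)   # record everything locally, no server needed
task = Task.init(project_name="examples", task_name="offline run")
# ... your training code ...
task.close()                          # finalizes the local session zip

# later, on a machine that can reach the server:
Task.import_offline_session("/path/to/offline_session.zip")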
task = Task.current_task()
Will get me the task object. (right?)
PanickyMoth78 yes, always, from anywhere, this is a singleton object 🙂
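A tiny sketch of what the singleton behavior means in practice:

from clearml import Task

task = Task.init(project_name="examples", task_name="demo")

def deep_inside_some_module():
    # same object as the one Task.init returned above
    assert Task.current_task() is task

deep_inside_some_module()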
Hi WickedGoat98
Regardless of the ingress configuration (which it seems you have the hang of), the API instance itself needs to be configured with a persistent volume (the web / file server do not need direct access to the API server).
Can you get the API to run properly?
Regarding the trains-agent: once you have the API/Web/File server configured, you can configure it like the trains-agent-services is configured inside the docker-compose (e.g. set the environment variable with the c...
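For illustration, a sketch of the trains-agent-services block from a typical trains-server docker-compose of that era; the exact env var names here are from memory, so verify against your copy of the compose file:

trains-agent-services:
  image: allegroai/trains-agent-services:latest
  environment:
    TRAINS_API_HOST: http://apiserver:8008      # point the agent at the API server
    TRAINS_API_ACCESS_KEY: ${TRAINS_AGENT_ACCESS_KEY:-}
    TRAINS_API_SECRET_KEY: ${TRAINS_AGENT_SECRET_KEY:-}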
You are doing great 🙂 don't worry about it
PanickyMoth78 quick update: the fix is already being tested, I'm hoping for an RC tomorrow 🙂
Hi DilapidatedDucks58
how to force-reinstall package from github in Installed Packages
You mean make sure that the agent installs it from github?
The "Installed packages" section is equivalent to "requirements.txt" anything you can put in requirements.txt, you can put there.
For example, adding git+https://github.com/allegroai/clearml.git to "Installed Packages" will make sure you install the latest clearml from GitHub.
Notice that you cannot have two packages with the same name (just like with regular requirements.txt)...
Hi DeterminedToad86
I just verified on a clean sagemaker instance that everything should just work, see here: https://demoapp.demo.clear.ml/projects/0e919ea1cc5c499b99e1ab85004b6e97/experiments/887edef09d4549e88b829a34c87d4d5b/output/execution
Yes, if you have more than one file (either notebook or python script) then you must have a git repo, in order to run the task using the Agent.
@<1710827340621156352:profile|HungryFrog27> the venv-build folder is supposed to be deleted after each task is done. How did you end up with leftovers? Could it be windows was failing to delete it for some reason? That actually connects with your initial issue, no?
Hi RotundHedgehog76
we have issues with
clearml-agent
when using standalone mode. ...
What is the use case for standalone mode? Is this venv or docker mode?
Hi UpsetTurkey67
repository discovery stores github repo in the form:
...
while for others
git@github.com:...
Yes that depends on how they locally cloned the repo (via SSH or user/pass/token)
Interestingly, in the former case the ssh config is ignored and cloning the repository breaks on the worker
If you have passed git user/pass to the agent it should use them, not SSH. How did you configure the agent?
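For reference, a sketch of the relevant agent settings in clearml.conf (the credential values here are hypothetical):

agent {
    git_user: "myuser"
    git_pass: "my_personal_access_token"
    # when true the agent rewrites HTTPS repo URLs to SSH; leave it false
    # if you want the user/pass above to be used
    force_git_ssh_protocol: false
}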
I would like to be able to send a request to unload the model (because I cannot load all the models in gpu, only 7-8)
Hmm, is this part of the gRPC interface of Triton? If it is, we should be able to add that quite easily.
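For what it's worth, Triton does expose model load/unload over its gRPC/HTTP APIs when started with explicit model control; a minimal sketch with the tritonclient package (the model name is hypothetical):

from tritonclient.grpc import InferenceServerClient

# requires the server to run with --model-control-mode=explicit
client = InferenceServerClient(url="localhost:8001")
client.unload_model("my_model")   # free the GPU memory the model holds
client.load_model("my_model")     # bring it back when it is needed again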
I think the part that is missing for me is the context, in other words how would one configure the execution_plan and why would they configure it in a specific way?
My intuition, without fully understanding it, is that for some reason the internal DAG/decision is exposed to the user, and it feels like too much information. Basically I have a hunch that the users should not need to have such deep understanding to control the flow, and they should end up with an abstraction on top of it. ...
Are you referring to the same line? 47 in cache.py?
just to check. Does the k8s glue install torch by default?
SubstantialElk6 what do you mean the glue installs torch ?
The glue will take a Task from the queue and create a k8s job (basically using the same docker, and inside the docker running the agent to execute the requested Task). Where would the "torch" come into play?
Yes, that makes sense. But did you see the callback being executed? It seems it was supposed to, and then the next call would have been 2:30 hours later. Am I missing something?
I just tested the master with https://github.com/jkhenning/ignite/blob/fix_trains_checkpoint_n_saved/examples/contrib/mnist/mnist_with_trains_logger.py on the latest ignite master and Trains, it passed, but so did the previous commit...
Hi @<1618056041293942784:profile|GaudySnake67>
Task.create is designed to create an external Task, not from the current running process.
Task.init is for creating a Task from your current code, and this is why you have all the auto_connect parameters. Does that make sense?
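A short sketch of the difference (project/repo names are hypothetical):

from clearml import Task

# Task.init: instruments the current process; auto-connects frameworks, argparse, etc.
task = Task.init(project_name="examples", task_name="my experiment")

# Task.create: registers an external Task from a repo/script without running it here
external = Task.create(
    project_name="examples",
    task_name="external experiment",
    repo="https://github.com/someuser/somerepo.git",
    script="train.py",
)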
CooperativeFox72
Could you try to run the docker and then inside the docker try to do:
su root
whoami