Reputation
Badges 1
25 × Eureka!I don't see any requests
This points to configuration, specifically maybe it is directed to a different server?!
Hi IrritableJellyfish76
If you are running a code that uses clearml from kubeflow, you have out of the box integration between the two, what am I missing?
and about a month later for some reason the initial iteration seems to have changed to 0
Hmm, I see your point. Just so I fully understand, your are not saying Old experiments were changed, but new experiments (running the same code-ish) have a totally different max iterations value. Is this correct ?
Could it be it was never allocated to begin with ?
Ohh that cannot be pickled... how would you suggest to store it into a file?
JitteryCoyote63 this is standard ssh authorized server removal
https://superuser.com/a/30089
specifically you can try:ssh-keygen -R 10.105.1.77
Hi CheerfulGorilla72
the "installed packages" section is used as "requirements.txt for the agent.
Are you saying the autodetection fails to detect all packages? You can specify in "manual execution" (i.e not when the agent is running the code), to just take the requirements.txt locally:` Task.force_requirements_env_freeze(requirements_file="./requirements.txt")
notice the above call should be executed Before Task.init
task = Task.init(...) `3. If you clear all the "installed packages" se...
GiganticTurtle0 is it just --stop that throws this error ?
btw: if you add --queue default
to the command line I assume it will work, the thing is , without --queue it will look for any queue with the "default" tag on it, since there are none, we get the error.
regardless that should not happen with --stop
I will make sure we fix it
Just so we do not forget, can you please open an issue on clearml-agent github ?
But first I want to make sure the verify argument is actually used, hence False
Thanks @<1569496075083976704:profile|SweetShells3> for bumping it!
Let me check where it stands, I think I remember a fix...
without the ClearML Server in-between.
You mean the upload/download is slow? What is the reasoning behind removing the ClearML server ?
ClearML Agent per step
You can use the ClearML agent to build a socker per Task, so all you need is just to run the docker. will that help ?
Hi JuicyFox94
I think you are correct, this bug will explain the entire thing.
Basically what happens is that remote_execute stops the local run before the configuration is set on the Task. Then running remotely the code pull the configuration, sees that it is empty and does nothing.
Let me see if I can reproduce it...
training loop is within line 469, I think.
I think the model state is just post training loop (not inside the loop), no?
FiercePenguin76
So running the Task.init from the jupyter-lab works, but running the Task.init from the VSCode notebook does not work?
Correct:extra_docker_shell_script: ["apt-get install -y awscli", "aws codeartifact login --tool pip --repository my-repo --domain my-domain --domain-owner 111122223333"]
BTW MagnificentSeaurchin79 just making sure here:
but I don't see the loss plot in scalars
This is only with Detect API ?
BTW: how are you using them? should we have a direct interface to those ?
ShallowGoldfish8 this call does that:
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
Yes, no reason to attach the second one (imho)
Well, PipelineDecorator actually allows you to do the same thing, with the same ability that is clone / modify / enqueue.
(I mean, Pipeline with tasks is also great, I just want to clarify that they have the same capabilities in this respect).
Hmm how do you launch the autoscaler, code?
If you create an initial code base maybe we can merge it?
NastySeahorse61 it might that the frequency it tests the metric storage is only once a day (or maybe half a day), let me see if I can ask around
(just making sure you can still login to the platform?)
Why is it using an OutputModel and an InputModel?
So calling OutputModel will create the new Model entity and upload the data, InputModel will store it as required input Model.
Basically on the Task you have input & output section, when you clone the Task you are copying the input section into the newly created Task, and the assumption is that when you execute it, your code will create the output section.
Here when you clone the Task you will be clone the reference to the InputModel (i...
Hmm two questions: 1. How come it did not detect the packages when you were running the original task manually? 2. Could it be the poetry manager option is not working correctly?! Can you verify the venv is created with all packages? If so can you post the full log?
I should manually copy it to the remote services agents?
The code itself needs to run somewhere, currently this has to be your machine, either you manually run the AWS autoscaler or an agents runs it for you. Make sense ?