Then you have to pass the .ssh into the remote server; probably the easiest way is to have it in the "extra bash script".
I would like to use ClearML together with Hydra multirun sweeps, but I'm having some difficulties with the configuration of tasks.
Hi SoreHorse95
In theory that should work out of the box, why do you need to manually create a Task (as opposed to just have Task.init call inside the code) ?
now, I need to pass a variable to the Preprocess class
you mean for the construction ?
GiganticTurtle0
That definitely makes sense. Where can I specify callbacks in the PipelineDecorator API?
Hmm, there isn't one actually... (the interface I was thinking about was PipelineController ...)
Would it make sense to throw an exception in the pipeline execution code?
BTW: I just verified, if the pipeline step fails an exception is raised (ValueError)
Import Error sounds so out of place it should not be a problem :)
What I'm trying to do is give the DSes a lightweight base class that is independent of clearml, while a framework holds all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. It will also help avoid polluting clearml spaces with half-baked ideas.
So you want the DS to manually tell the base class what to store ?
then the base class will store it for them, for example with joblib , is this the...
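One possible shape for that split, as a minimal sketch only: the DS subclasses a small base class and explicitly names what to keep, while a swappable backend decides where it goes. `ExperimentBase`, `LocalStore`, `keep` and `save` are hypothetical names invented for illustration, and `pickle` stands in for joblib so the sketch needs nothing beyond the standard library. A ClearML-aware backend would expose the same `save(name, obj)` method and live in the framework, not in the DS code.

```python
import pickle
import tempfile
from pathlib import Path


class LocalStore:
    """Hypothetical ClearML-free backend: pickles objects into a local folder."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name, obj):
        # pickle is used here in place of joblib to stay stdlib-only
        path = self.root / f"{name}.pkl"
        with path.open("wb") as fh:
            pickle.dump(obj, fh)
        return path


class ExperimentBase:
    """Hypothetical lightweight base class that data scientists subclass."""

    def __init__(self, store):
        # Any object exposing save(name, obj) works as a backend
        self.store = store

    def keep(self, name, obj):
        # The DS states what to persist; the backend decides where it goes
        return self.store.save(name, obj)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exp = ExperimentBase(LocalStore(tmp))
        saved = exp.keep("threshold", {"value": 0.5})
        print(saved.name)
```

Swapping `LocalStore` for a framework-provided backend would then be the only change needed when an experiment is ready to be tracked.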
Hi @<1569858449813016576:profile|JumpyRaven4>
- The gunicorn logs do not show anything, including any error or trace of the 502; only siege reports the 502, as well as the ALB.
Is this an ALB or an ELB ?
What timeout is it configured with?
Do you have GPU instances as well? What's the clearml-serving-inference docker version ?
JitteryCoyote63 look for the latest RC it should have the fix (output_uri=False) 1.7.3rc1
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
I see now, give me a minute I'll check
Hmm this is odd, when you press on the parent dataset in the UI, and go to full-details, then the INFO tab. Can you copy here everything ?
I can then programmatically choose which file to import with importlib. Is there a way to tell clearml programmatically to analyze the files, so it can build up the requirements correctly?
Sadly no
It analyzes the running code, then if it decides it is not a self contained script it will analyze the entire repo ...
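To see why an `importlib` call is hard for this kind of analysis to pick up, here is a rough sketch of static import detection using the standard `ast` module. It is an illustration of the general technique only, not ClearML's actual analyzer; `static_imports` and `SAMPLE` are hypothetical names.

```python
import ast


def static_imports(source):
    """Return the top-level module names that appear in import statements."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            found.add(node.module.split(".")[0])
    return found


SAMPLE = '''
import importlib
import numpy as np
from sklearn.linear_model import Ridge

backend = importlib.import_module("torch")
'''

if __name__ == "__main__":
    print(sorted(static_imports(SAMPLE)))
```

The module name passed to `importlib.import_module` is just a string at analysis time, so `torch` does not show up in the result, while `importlib`, `numpy` and `sklearn` do.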
I just saw that Task.create takes
Task.create is not Task.init. It is meant to allow you to create new Tasks (think Jobs) from ...
Hi AbruptWorm50
the second "epoch loss" is the scalar for the "validation" process (see the "validation: epoch loss" series; "validation" is actually the TF file/folder prefix, added automatically)
Make sense ?
ConvolutedChicken69
basically clearml-data needs to store an immutable copy of the delta changes per version; if the files are already uploaded, there is a good chance they could be modified...
So in order to make sure you have a clean immutable copy, it will always upload the data (notice it also packages everything into a single zip file, so it is easy to manage).
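The idea of "delta changes per version, packaged into a single zip" can be sketched as follows. This is only an illustration under stated assumptions, not how clearml-data is implemented: `file_hashes` and `package_delta` are hypothetical helpers, and SHA-256 content hashes are assumed as the way to detect changed files.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def file_hashes(folder):
    """Map each file's relative path to the SHA-256 digest of its content."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def package_delta(folder, parent_hashes, archive_path):
    """Zip only the files that are new or changed relative to the parent version."""
    current = file_hashes(folder)
    changed = sorted(
        name for name, digest in current.items() if parent_hashes.get(name) != digest
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in changed:
            zf.write(Path(folder) / name, arcname=name)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "a.txt").write_text("unchanged")
        (data / "b.txt").write_text("old")
        parent = file_hashes(data)           # snapshot of the parent version
        (data / "b.txt").write_text("new")   # modified after the snapshot
        (data / "c.txt").write_text("added") # added after the snapshot
        print(package_delta(data, parent, Path(tmp) / "delta.zip"))
```

In the demo only `b.txt` and `c.txt` end up in the archive. Because the archive is a separate copy, later edits to the source files do not affect the stored version.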
ContemplativeGoat37
1. It seems the DNS resolution of the server fails? (Temporary failure in name resolution) Is this running on an agent, or manually ? "clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###" Is this you manually aborting the Task, or is it aborting itself due to the connectivity ?
4. What are the clearml / clearml-agent versions ?
oh sorry my bad, then you probably need to define all the OS environment variables for the python temp folder for the agent (the Task process itself is a child process so it will inherit them)
TMPDIR=/new/tmp TMP=/new/tmp TEMP=/new/tmp clearml-agent daemon ...
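The inheritance part can be checked without an agent. The sketch below launches a plain child Python process with the three variables set and asks it which temp folder it resolved; `child_tempdir` is a hypothetical helper, and the child process only stands in for the Task process.

```python
import os
import subprocess
import sys
import tempfile


def child_tempdir(tmp_root):
    """Start a child Python process with TMPDIR/TMP/TEMP set; return its temp dir."""
    env = dict(os.environ, TMPDIR=tmp_root, TMP=tmp_root, TEMP=tmp_root)
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The folder must exist and be writable, otherwise Python falls back
    # to its default temp locations.
    with tempfile.TemporaryDirectory() as new_tmp:
        print(child_tempdir(new_tmp))
```

Python's `tempfile` module consults `TMPDIR`, `TEMP` and `TMP` (in that order), which is why setting all three on the parent command line covers the child as well.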
Any idea where that could come from? Could we turn off the local logging as well - in these kinds of runs we don't need it?
It is supposed to create it automatically... I tested with other examples (clearml version 1.7.3rc1) everything seems to work
What am I missing? how do we recreate the issue ? can you verify it is still not working with the latest RC?
I like the idea of using the timeit interface, and I think we could actually hack it to do most of the heavy lifting for us
(I'll make sure it is added to the docstring, because apparently it was not there)
Hmm, I cannot think of something that will provide this on a per-user basis.
Wouldn't a global set of credentials that the agent is using be enough ?
(on the local machine, user can keep using the "definitions.py")
WackyRabbit7 just making sure I understand: MedianPredictionCollector.process_results is called after the pipeline is completed.
Then inside the function, Task.current_task() returns None.
Is this correct?
I have to admit mounting it to a different drive is a good reason to bring this feature back, the reasoning was it means the agent needs to make sure it manages them (e.g. multiple agents running on the same machine)
CourageousKoala93 when you call Task.close() it will mark the task as completed, there is no need to do that manually. The idea with mark_completed is that you can forcefully change the state if needed, or externally stop the task and mark it completed. Make sense?
Hi UnevenOstrich23
if --docker is enabled, does that mean every new experiment will be executed in a dedicated agent worker container?
Correct
I think the missing part is how to specify the docker for the experiment?
If this is the case, in the web UI, clone your experiment (which will create a draft copy, that you can edit), then in the Execution tab, scroll down to the "base docker image" and specify the docker image to use.
Notice that you can also add flags after the docker im...
No, clearml uses boto; this is an internal boto error, which points to a bucket size limit, see the error itself
Okay fixed, you will be able to override it with output_uri=False (which is ignored on remote execution if you have a project default or Task output uri set in the UI).
Make sense ?
EcstaticGoat95 I can see the experiment but I cannot access the notebook (I get Binder inaccessible)
Is this the exact script as here? https://clearml.slack.com/archives/CTK20V944/p1636536308385700?thread_ts=1634910855.059900&cid=CTK20V944
PompousBeetle71, these are cuda versions; I'm looking for the nvidia driver version, for example 440.xx or 418.xx.
The reason is, we set an OS environment variable for the driver, and I remember that old drivers did not support it. Basically they do not support NVIDIA_VISIBLE_DEVICES=all, so I'm trying to see if that's the case; then we could add a fix.
