so i end up having to clone the other ones manually in my code
Hi ConvolutedChicken69
Yes the problem is that there is no standard for multi repo environments
The best solution I can come up with is using git-submodules or packaging the auxiliary repo as wheels. wdyt?
OddAlligator72 FYI, in your current code you can always do "if use_trains: from trains import Task; Task.init()". Might be easier 😉
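A minimal sketch of that guard, assuming a boolean use_trains flag coming from your own configuration. The helper name maybe_init_task and the project/task names are hypothetical. trains is imported only when the flag is set, so the script still runs where the package is not installed.

```python
from typing import Any, Optional


def maybe_init_task(use_trains: bool) -> Optional[Any]:
    """Create a trains Task only when tracking is enabled."""
    if not use_trains:
        return None
    # Imported lazily so the package is only required when tracking is on.
    from trains import Task

    # Project and task names below are placeholders.
    return Task.init(project_name="examples", task_name="optional tracking")


# With the flag off nothing is imported and no Task is created.
task = maybe_init_task(use_trains=False)
```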
Hi DeliciousBluewhale87
When you say "workflow orchestration", do you mean like a pipeline automation ?
Hi DrabCockroach54
Do we know if gpu_0_mem_usage and gpu_0_mem_used_gb both show current GPU usage?
The first is the percentage used (memory % used at any specific moment) and the second is the memory used in GiB, both for the video memory.
How can we tell from this how much GPU is reserved for the task while the task is in progress?
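To make the relation between the two scalars concrete, here is a small sketch. It assumes the percentage is simply used video memory divided by the card's total memory. The helper name and the 16 GiB total are made up for illustration.

```python
def mem_usage_percent(used_gib: float, total_gib: float) -> float:
    """Convert used video memory in GiB into a percentage of the total."""
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    return 100.0 * used_gib / total_gib


# Hypothetical card with 16 GiB of video memory, 4 GiB currently in use.
percent = mem_usage_percent(used_gib=4.0, total_gib=16.0)
print(percent)
```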
What do you mean by how much is reserved ? Are you running with an agent?
Then you have to pass the .ssh into the remote server, probably the easiest is to have it in the "extra bash script"
I would like to use ClearML together with Hydra multirun sweeps, but I’m having some difficulties with the configuration of tasks.
Hi SoreHorse95
In theory that should work out of the box, why do you need to manually create a Task (as opposed to just having a Task.init call inside the code) ?
Hi FunnyTurkey96
Any chance you can try to run with the latest from GitHub (I just tested your code and it seemed to work on my machine). pip install git+
now, I need to pass a variable to the Preprocess class
you mean for the construction ?
GiganticTurtle0
That definitely makes sense. Where can I specify callbacks in the PipelineDecorator API?
Hmm there isn't one actually... (the interface I was thinking about was PipelineController ...)
Would it make sense to throw an exception in the pipeline execution code?
BTW: I just verified, if the pipeline step fails an exception is raised (ValueError)
Import Error sounds so out of place it should not be a problem :)
What I am trying to do is give the DSes a lightweight base class that is independent of clearml, while a framework holds all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. It will also help avoid polluting clearml spaces with half-baked ideas.
So you want the DS to manually tell the base class what to store ?
then the base class will store it for them, for example with joblib , is this the...
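One way to picture that split, as a rough sketch: a base class with no clearml dependency that persists whatever the DS registers. All names here (ExperimentBase, store, save_all) are hypothetical, and the standard-library pickle stands in for joblib so the example has no third-party dependency.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


class ExperimentBase:
    """Lightweight base class: no clearml import, only local storage."""

    def __init__(self, out_dir: str) -> None:
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self._objects: Dict[str, Any] = {}

    def store(self, name: str, obj: Any) -> None:
        """The DS explicitly registers what should be kept."""
        self._objects[name] = obj

    def save_all(self) -> Dict[str, Path]:
        """Write every registered object to disk and return the file paths."""
        paths: Dict[str, Path] = {}
        for name, obj in self._objects.items():
            path = self.out_dir / f"{name}.pkl"
            with path.open("wb") as handle:
                pickle.dump(obj, handle)
            paths[name] = path
        return paths


exp = ExperimentBase(tempfile.mkdtemp())
exp.store("params", {"lr": 0.01})
saved = exp.save_all()
```

A framework-side subclass could then override save_all to also hand the files to clearml, keeping the DS-facing class unchanged.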
Hi JumpyRaven4
- The gunicorn logs do not show anything, including any error or trace of the 502; only siege reports the 502, as well as the ALB.
Is this an ALB or an ELB ?
What timeout is it configured with?
Do you have GPU instances as well? What's the clearml-serving-inference docker version ?
JitteryCoyote63 look for the latest RC it should have the fix (output_uri=False) 1.7.3rc1
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
I see now, give me a minute I'll check
Hmm this is odd, when you press on the parent dataset in the UI, and go to full-details, then the INFO tab. Can you copy here everything ?
I can then programmatically choose which file to import with importlib. Is there a way to tell clearml programmatically to analyze the files, so it can build up the requirements correctly?
Sadly no 😞
It analyzes the running code, then if it decides it is not a self contained script it will analyze the entire repo ...
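For reference, the dynamic-import pattern from the question looks roughly like the sketch below. Because the module name is only known at run time, a static scan of the script cannot tell which module will actually be loaded. The load_backend helper is hypothetical, and a standard-library module stands in for the user's own files.

```python
import importlib
from types import ModuleType


def load_backend(name: str) -> ModuleType:
    """Import a module whose name is chosen at run time."""
    return importlib.import_module(name)


# "json" is a stand-in; in the question this would be one of several
# project files selected programmatically.
backend = load_backend("json")
print(backend.dumps({"ok": True}))
```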
I just saw that Task.create takes
Task.create is not Task.init. It is meant to allow you to create new Tasks (think Jobs) from ...
Hmm I just tested on the community version and it seems to work there, Let me check with frontend guys. Can you verify it works for you on https://app.community.clear.ml/ ?
Hi AbruptWorm50
The second "epoch loss" is the scalar for the "validation" process (see the "validation: epoch loss" series; the "validation" part is actually the TF file/folder prefix, added automatically).
Makes sense ?
I can read them programmatically using tensorboard and then log them using the clearml logger,
StaleButterfly40 this will be a great script to put somewhere (I'm sure you are not the only one with this problem). Maybe put it as a GitHub issue ? wdyt ?
ConvolutedChicken69
Basically clearml-data needs to store an immutable copy of the delta changes per version. If the files are already uploaded, there is a good chance they could be modified...
So in order to make sure you have a clean immutable copy, it will always upload the data (notice it also packages everything into a single zip file, so it is easy to manage).
ExcitedFish86 this is a general "dummy agent" that takes Tasks and executes them (no env created, no code cloned, as you suggested)
How does this work with HPO?
The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multiple machine support. In order to overcome it, I posted a dummy agent that just runs the Tasks.
(Notice...
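The clone / change arguments / enqueue / execute flow described above can be pictured with a toy, in-memory model. This is not the ClearML API: tasks are plain dicts, the queue is a deque, and run_sweep and the objective are hypothetical stand-ins.

```python
import copy
from collections import deque
from typing import Any, Callable, Deque, Dict, List

Task = Dict[str, Any]


def run_sweep(template: Task, values: List[float],
              execute: Callable[[Dict[str, Any]], float]) -> Task:
    """Clone a template per value, queue the clones, run them, pick the best."""
    queue: Deque[Task] = deque()
    for lr in values:
        task = copy.deepcopy(template)  # "clone" the template task
        task["args"]["lr"] = lr         # change one argument
        queue.append(task)              # push into the queue

    finished: List[Task] = []
    while queue:                        # the "dummy agent": just runs tasks
        task = queue.popleft()
        task["metric"] = execute(task["args"])
        finished.append(task)

    # "Monitoring" is reduced here to picking the lowest reported metric.
    return min(finished, key=lambda t: t["metric"])


template = {"name": "train", "args": {"lr": 0.5}}
best = run_sweep(template, [0.01, 0.1, 1.0],
                 execute=lambda args: (args["lr"] - 0.1) ** 2)
print(best["args"]["lr"])
```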
ContemplativeGoat37
1. It seems the DNS resolution of the server fails? (Temporary failure in name resolution) Is this running on an agent, or manually ? "clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###" Is this you manually aborting the Task, or is it aborting itself due to the connectivity ?
4. what's the clearml/clearml-agent versions ?
Oh sorry, my bad, then you probably need to define all the OS environment variables for the python temp folder for the agent (the Task process itself is a child process so it will inherit them)
TMPDIR=/new/tmp TMP=/new/tmp TEMP=/new/tmp clearml-agent daemon ...
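The inheritance can be checked with a quick sketch using only the standard library: Python's tempfile module consults TMPDIR, TEMP and TMP, and a child process sees whatever environment its parent passes on. The child_tempdir helper is hypothetical, and a freshly created directory stands in for /new/tmp.

```python
import os
import subprocess
import sys
import tempfile


def child_tempdir(tmp_path: str) -> str:
    """Start a child Python process and report the temp dir it resolves."""
    # Same three variables as on the agent command line.
    env = dict(os.environ, TMPDIR=tmp_path, TMP=tmp_path, TEMP=tmp_path)
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# An existing, writable directory is needed for tempfile to accept it.
new_tmp = tempfile.mkdtemp()
print(child_tempdir(new_tmp))
```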
They inherit from one another, so it does make sense. Also the add_tags is on the "main" Task and not the backend parent
Any idea where that could come from? Could we turn off the local logging as well - in these kinds of runs we don’t need it?
It is supposed to create it automatically... I tested with other examples (clearml version 1.7.3rc1) everything seems to work
What am I missing? how do we recreate the issue ? can you verify it is still not working with the latest RC?
Click on the "k8s_schedule" queue, then on the right hand side, you should see your Task, click on it, it will open the Task page. There click on the "Info" Tab, there look for "STATUS MESSAGE" and "STATUS REASON". What do you have there?
I like the idea of using the timeit interface, and I think we could actually hack it to do most of the heavy lifting for us 🙂
