Can I get GPU usage over a time frame via the API as well?
task.get_reported_scalars
But this will get you all the scalars. I think the next version of the server supports asking for a specific one as well.
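Since get_reported_scalars returns everything, here is a rough sketch of filtering the result down to one title and a time window on the client side. It assumes the returned dict is nested as title -> series -> {"x": [...], "y": [...]}, that x_axis="timestamp" gives milliseconds since epoch, and that the GPU metrics sit under a title like ":monitor:gpu" (that title and the helper names are my assumptions, so check them against your own task).

```python
from typing import Dict, List, Optional, Tuple


def select_series(
    scalars: Dict[str, Dict[str, Dict[str, list]]],
    title: str,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> Dict[str, List[Tuple[float, float]]]:
    """Keep only one title, and only the points with start <= x <= end."""
    selected = {}
    for series, points in scalars.get(title, {}).items():
        selected[series] = [
            (x, y)
            for x, y in zip(points["x"], points["y"])
            if (start is None or x >= start) and (end is None or x <= end)
        ]
    return selected


def fetch_gpu_scalars(task_id: str, start_ms=None, end_ms=None):
    """Hypothetical wrapper: pull all scalars, then filter locally."""
    from clearml import Task  # imported lazily so the helper above works without clearml

    task = Task.get_task(task_id=task_id)
    scalars = task.get_reported_scalars(x_axis="timestamp")
    # ":monitor:gpu" is an assumed title, verify it in the Scalars tab of your task
    return select_series(scalars, ":monitor:gpu", start_ms, end_ms)
```

A stateless monitor that starts every X minutes could call fetch_gpu_scalars with the last X minutes as the window.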
How are you implementing the alert monitoring?
Is it a stateless process starting every X min, or is it a stateful process running and monitoring?
PanickyMoth78 ScantMoth28
With several models saved by the training process (whose code is not task-aware)
You can actually specify which models are saved:
task = Task.init(..., auto_connect_frameworks={'pytorch': ['*.pt']})
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
This way you can upload only the model you need.
Thanks GentleSwallow91
That's a good tip, where in the docs would you add it?
what does it mean to run the steps locally?
start_locally: means the pipeline code itself (the logic that runs / controls the DAG) runs on the local machine (i.e. no agent). This control logic still creates/clones Tasks and enqueues them, and you need an agent to execute those Tasks.
run_pipeline_steps_locally=True: means the Tasks the pipeline creates, instead of being enqueued for an agent to run, are launched on the same local machine (think debugging, other...
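To make the difference concrete, here is a minimal sketch using PipelineController. The pipeline name, project, version and the step function are made up for illustration, and it needs a configured ClearML server to actually run.

```python
def double(x):
    """Toy step used only for illustration."""
    return x * 2


def run_pipeline(run_steps_locally: bool = False):
    """Run the pipeline controller on this machine.

    run_steps_locally=False: the controller runs here, the steps are enqueued
    and still need an agent to execute them.
    run_steps_locally=True: the steps are also launched on this machine.
    """
    from clearml import PipelineController  # lazy import, requires clearml and a server

    # name / project / version are hypothetical values
    pipe = PipelineController(name="local-debug-demo", project="examples", version="0.0.1")
    pipe.add_function_step(
        name="double",
        function=double,
        function_kwargs={"x": 21},
        function_return=["doubled"],
    )
    pipe.start_locally(run_pipeline_steps_locally=run_steps_locally)
    return pipe
```

Calling run_pipeline(True) is the "everything on my machine" debugging mode, while run_pipeline(False) keeps only the DAG logic local.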
I think we should open a GitHub Issue and get some more feedback, maybe we should just add support in the backend side ?
Why are all defined components shown in the UI Results/Plots/PipelineDetails/ExecutionDetails section? Wouldn't it make more sense to show only the ones that are used in that pipeline?
They are listed there (because of the decorator, you basically "say" these are steps so they are listed), the actual resolving (i.e. which steps are actually being called) is done in "real-time"
Does that make sense?
I specifically set it as empty with
export_data['script']['requirements'] = {}
in order to reduce overhead during launch. I have everything installed inside the container
Do you have everything inside the container installed inside a venv?
Hi SmallDeer34
Can you try with the latest RC? I think we fixed something with the jupyter/colab/vscode support!
pip install clearml==1.0.3rc1
Yes, I think writer.add_figure somehow crops the image
Hi EnthusiasticCoyote38
But once one process finished, it changed the task status to completed. Maybe you know some safe way to deal with such a situation? Or maybe the best way is to check the task status before uploading the object?
Well, you can actually forcefully set the state of the Task to running, then add artifacts, then close it?
would that work?
` my_other_task.reload()
my_other_task.mark_started(force=True)
my_other_task.upload_artifact(...)
my_other_task.flush(wait_for_uploads=True)
my_othe...
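Putting the "force it to running, add artifacts, then close it" idea together with the "check the status first" question, a rough sketch could look like this. The helper name is hypothetical, and closing with mark_completed() at the end is my assumption about how to "close it" (the snippet above is cut off before that point).

```python
def add_artifact_to_finished_task(task, name, artifact_object):
    """Upload an artifact to a task that another process may have already completed.

    `task` is expected to be a clearml Task object.
    Returns True if the task was still running when we checked.
    """
    task.reload()  # refresh the local copy of the task state
    was_running = task.get_status() == "in_progress"
    if not was_running:
        # forcefully move the task back to the running state
        task.mark_started(force=True)
    task.upload_artifact(name=name, artifact_object=artifact_object)
    task.flush(wait_for_uploads=True)  # block until the upload is done
    if not was_running:
        # assumption: restore the finished state once the artifact is stored
        task.mark_completed()
    return was_running
```

Note this does not protect against two processes doing this at the exact same moment, it only avoids re-opening a task that is still running.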
We were able to find a stable, free, open source, multiplatform way to do this
You mean to move the data from the gdrive to object storage ? or to just mount the gdrive ?
If it cannot find the Task ID I'm guessing it is trying to connect to the demo server and not your server (i.e. configuration is missing)
Ohh, hmm, that is odd, there should not be a limit there. Let me check ....
CLEARML_AGENT_GIT_USER is your git user (on whatever git host/server you are using, GitHub/GitLab/BitBucket etc.)
The 'on-premise' server fails to connect to the ClearML server because of the VPN I think
I think you are correct.
You can quickly test it: try to run curl http://local-server:8008 and see if that works
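If curl is not available on that machine, the same check can be sketched in Python with only the standard library. The function name is made up, and http://local-server:8008 is just the placeholder address from above.

```python
import urllib.error
import urllib.request


def server_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if anything answers HTTP at `url`, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the network path itself works
        return True
    except (urllib.error.URLError, OSError):
        # Refused, timed out, DNS failure, blocked by VPN/firewall, ...
        return False


if __name__ == "__main__":
    print(server_reachable("http://local-server:8008"))
```

A False here points at networking (VPN, DNS, firewall) rather than at the ClearML configuration.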
I'm looking into the savefig issue. Meanwhile, you can disable the popup by adding the following at the top of your code:
import matplotlib
matplotlib.rcParams['backend'] = 'agg'
import matplotlib.pyplot
matplotlib.pyplot.switch_backend('agg')
Command-line arguments for the arg parser should be passed via the "Args" section in the Configuration tab.
What is the working directory of the experiment?
ElegantCoyote26 could you upgrade the docker-compose ?
FranticCormorant35 DeterminedCrab71 please continue the discussion in this thread
Also, for a single parameter you can use:
cloned_task.set_parameter(name="Args/artifact_name", value="test-artifact", description="my help text that will appear in the UI next to the value")
This way, you are not overwriting the other parameters, you are adding to them.
(Similar to update_parameters, only for a single parameter)
BeefyCow3 On the plot itself click on the json download button
Trains is fully open-source. That said, properly publishing and maintaining the web client is still on our to-do list (I mean there is totally readable JavaScript code packaged in the trains-server and the dockers). It keeps getting pushed back because there are generally fewer contributions on the front-end with these kinds of projects. That said, if you guys are willing to help, it will greatly help in pushing it forward... LivelyLion31 what do you think, would you guys like to help with the fronte...
And the same behavior if I make the dependence explicit via the return of the first one
Wait, are you saying that in the code above, when you abort "step_a" , then "step_b" is executed ?
Okay, I'll make sure we change the default image to the runtime flavor of nvidia/cuda
I'm thinking it's generically a kernel gateway issue, but I'm not sure if other platforms are using that yet
The odd thing is that you can access the notebook, but it returns zero kernels ..
Yes, the one you create manually is not really of the same "type" as the one you create online, this is why you do not see it there
VexedCat68 yes, you can also pass the parent folder and it will zip the entire subfolders into a single artifact