FYI matplotlib imshow will create a debug image, and on complex plots the plot might get converted to an image (but still shown under the Plots section). All in all, you might not be aware of it, but you are uploading images to your files server
Now that we have the free tier (a.k.a community server) we might change the default behavior.
The idea is always to allow an easy way to on-board and test the system.
ReassuredTiger98
BTW: what's the scenario where your machine reverted to the default configuration (i.e. no configuration file) ?
Hi ShallowCat10
What's the TB version you are using?
Is this example working correctly for you?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/tensorboard_pr_curve.py
ModelCheckpoint('best_model', save_best_only=True)
That worked for me now, what's the diff?
Also, I accidentally created multiple services via
clearml-serving create --name <> --project <>
, and cannot get rid of them.
Find them in the UI (you can go to All Projects, then put their IDs in the search bar) and archive / delete them
So the part that is confusing to me is: None
Who / how is this Task running? Did you also set up a "services" queue (as part of the clear...
Done HandsomeCrow5 +1 added 🙂
BTW: if you feel you can share what your reports look like (a screenshot is great), this will greatly help in supporting this feature, thanks
Only those components that are imported in the script where the pipeline is defined would be included in the DAG plot, is that right?
Actually the way it works currently (and we might change it if there is a better way) is that every time you call PipelineDecorator.component
a new component is stored on the Pipeline Task, which is later translated into a DAG graph and table (the next version will have a very nice UI to display / edit them).
The idea is first to have a representation of the p...
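To make that concrete, here is a minimal sketch (project / function names are placeholders, not from this thread): each function decorated with PipelineDecorator.component becomes a node on the Pipeline Task, and the calls inside the pipeline function define the DAG edges.

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['data'], cache=True)
def load_data(source_url):
    # runs as its own Task, i.e. one node in the DAG
    return list(range(10))

@PipelineDecorator.component(return_values=['total'])
def aggregate(data):
    # second node; depends on load_data because it consumes its output
    return sum(data)

@PipelineDecorator.pipeline(name='demo pipeline', project='examples', version='0.1')
def run_pipeline(source_url='https://example.com/data.csv'):
    data = load_data(source_url)
    print(aggregate(data))

if __name__ == '__main__':
    # run everything in the local process for quick testing
    PipelineDecorator.run_locally()
    run_pipeline()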
Sorry @<1798525199860109312:profile|IntriguedGoldfish14> just noticed your reply
Yes, two inference containers, running simultaneously on the cluster. As you said, each one with its own environment (assuming here that the requirements of the models collide)
Make sense
so the docker didn't use the DNS of the host?
I'm assuming it is not configured on your DNS, otherwise it would have been resolved...
the separate experiments are not starting back at iteration 0
What do you mean by that?
BattyLion34 if everything is installed and it used to work, what's the difference from the previous run that worked?
(You can compare the working vs non-working runs in the UI and check the installed packages; it would highlight the diff, maybe the answer is there)
but the requirement was already satisfied.
I'm assuming it is satisfied in the host Python environment; do notice that the agent creates a new clean venv for each experiment. If you are not running in docker-mode, then you ca...
If I edit the OmegaConf directly in the UI then the port changes correctly
This will only work if you change the Hydra/allow_omegaconf_edit to True in the UI. Did you?
Hi SucculentBeetle7
The parameters passed to add_step
need to contain the section name (maybe we should warn if it is not there, I'll see if we can add it).
So maybe something like: {'Args/param1': 1}
Or {'General/param1': 1}
Can you verify it solves the issue?
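For illustration, a rough sketch of how that override is passed (project / task names are placeholders, and the constructor arguments assume a recent clearml version): the parameter key carries the section prefix.

from clearml import PipelineController

pipe = PipelineController(name='pipeline demo', project='examples', version='0.1')

pipe.add_step(
    name='stage_train',
    base_task_project='examples',
    base_task_name='training task',
    # the section prefix ('Args/' for argparse, 'General/' for connected dicts) is required
    parameter_override={'Args/param1': 1},
)

# enqueues the controller (by default on the 'services' queue)
pipe.start()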
${PWD} works!
This will be resolved every call to Task.init (so I would recommend against it), how about "$HOME/" ?
Change it to add_missing_installed_packages=False
here, and see if you end up with a git diff
https://github.com/allegroai/clearml/blob/1f82b0c4010799be6157f5c845c7f6ac48e71c0c/clearml/backend_interface/task/populate.py#L158
What do you mean, the same env for all components? If they are using/importing exactly the same packages and using the same container, then yes, it could
DistressedGoat23
We are running hyperparameter tuning (using some CV) which might take a long time and might even be aborted unexpectedly due to machine resources.
We therefore want to see the progress
On the HPO Task itself (not the individual experiments, the one controlling it all) there is the global progress of the optimization metric. Is this what you are looking for? Am I missing something?
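For reference, a minimal sketch of such a controller Task (task names, queue, and the base task ID are placeholders, and the exact arguments assume a recent clearml version); the objective metric declared here is what drives the global progress reported on the HPO Task:

from clearml import Task
from clearml.automation import (HyperParameterOptimizer, RandomSearch,
                                UniformIntegerParameterRange)

# the controller Task: this is where the global optimization progress shows up
task = Task.init(project_name='examples', task_name='HPO controller',
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id='<id of the training task to clone>',  # placeholder
    hyper_parameters=[
        UniformIntegerParameterRange('Args/batch_size', min_value=16, max_value=128, step_size=16),
    ],
    objective_metric_title='validation',
    objective_metric_series='accuracy',
    objective_metric_sign='max',
    optimizer_class=RandomSearch,
    max_number_of_concurrent_tasks=2,
    execution_queue='default',
)
optimizer.start()
optimizer.wait()
optimizer.stop()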
Hi @<1556450111259676672:profile|PlainSeaurchin97>
You mean instead of the parallel coordinates ?
None
DefeatedOstrich93 what do you mean by "I am wondering why do I need to create files before applying diff?" git diff
will not list files unless they are added (they are marked as "untracked"), think temp files, logs, etc. Until you add a file to git it will basically ignore that file. Make sense?
Hi GrotesqueMonkey62 any chance you can be a bit more specific? Maybe a screen grab?
Here is how it works: if you look at an individual experiment, scalars are grouped by title (i.e. multiple series on the same graph if they have the same title)
When comparing experiments, any unique combination of title/series will get its own graph, then the different series on the graph are the experiments themselves.
Where do you think the problem lies?
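To make the title/series grouping concrete, a small sketch (project and metric names are just placeholders): both series below land on a single 'loss' graph inside the experiment, and when comparing experiments each unique title/series pair gets its own graph.

from clearml import Task

task = Task.init(project_name='examples', task_name='scalar grouping demo')
logger = task.get_logger()

for i in range(100):
    # same title, two series -> two lines on the same 'loss' plot
    logger.report_scalar(title='loss', series='train', value=1.0 / (i + 1), iteration=i)
    logger.report_scalar(title='loss', series='validation', value=1.5 / (i + 1), iteration=i)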
When you clone the Task, it might be before it is done syncing git / packages.
Also, since you are using 0.16 you have to have a section name (Args or General etc.)
How will Task B use the parameters? (argparse / connect dict?)
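For illustration (project / task names are placeholders), the two options map onto the section names mentioned above: argparse arguments appear under 'Args', while a connected dict appears under 'General' by default, which is why the override key needs that prefix.

from argparse import ArgumentParser
from clearml import Task

task = Task.init(project_name='examples', task_name='task b')

# option 1: argparse -> parameters appear under the 'Args' section ('Args/param1')
parser = ArgumentParser()
parser.add_argument('--param1', type=int, default=1)
args = parser.parse_args()

# option 2: connected dict -> parameters appear under the 'General' section ('General/param1')
config = {'param1': 1}
task.connect(config)

print(args.param1, config['param1'])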
Hi ConvolutedSealion94
You can archive / delete the SERVING-CONTROL-PLANE
Task from the DevOps project in the UI.
Do notice you will need to make sure clearml-serving is updated with a new session ID or remove it (i.e. take down the pods / docker-compose)
Make sense ?
Were you able to interact with the service that was spun up? (How was it spun up?)
The imports inside the functions are there because the function itself becomes a stand-alone job running on a remote machine, not the entire pipeline code. This is also how the packages to install on the remote machine are picked up automatically. Make sense?
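A small sketch of that pattern (function and package names are just illustrative): the import lives inside the component function, so the remote job gets pandas installed even though the pipeline script never imports it at module level.

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['row_count'])
def count_rows(csv_path):
    # imported here, not at module level: this function runs as a stand-alone job
    # on a remote machine, and this import tells clearml to install pandas there
    import pandas as pd
    return len(pd.read_csv(csv_path))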
Hi LivelyLion31 I missed your S3 question, apologies. What did you guys end up doing?
BTW you could always upload the entire TB log folder as an artifact, it's as simple as task.upload_artifact('tensorboard', './tblogsfolder')
Agreed. MotionlessCoral18 could you open a feature request on the clearml-agent repo please? (I really do not want this feature to get lost, and I'm with you on its importance; let's make sure we have it configurable from the outside)
BroadMole98 as one can expect, a long answer as well 🙂
I have a workflow with 19000 job nodes in it.
Wow, 19k job nodes? As in a single pipeline with 19k steps?
The main idea of the trains-agent is to allow multi-node workloads and building pipelines on top of a scheduler without worrying about docker packaging (done automatically for you), and to have a proper scheduler with priority (which is missing from k8s)
If the first step is just "logging" all the steps, you can easily add "Task...