JitteryCoyote63 what am I missing?
What are the errors you are getting (with / without the envs)?
Hi @<1578555761724755968:profile|GrievingKoala83>
mount s3 as a cache folder
I'm not sure that would be fast enough for cache ...
How to override
/root/.cache/pip
path?
in your clearml.conf file (see the sketch below), then set it to your PV
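For reference, a rough sketch of the relevant clearml.conf entry (assuming the agent's docker_internal_mounts section; the mount path here is a placeholder for wherever your PV lives):

agent {
    docker_internal_mounts {
        # container-side pip cache location; point it at your PV mount instead of the default
        pip_cache: "/mnt/my-pv/pip-cache"
    }
}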
the latter is an ec2 instance
and the agent fails to install on the EC2 machine?
Hi ColossalAnt7, I think we ran into it on a few dockers; I believe the bug was fixed in the latest trains-agent RC. Could you verify, please?
No worries, and I will make sure we output a warning if section names are not used 🙂
So the naming is a by-product of the many TB files created (one per experiment); if you use different naming for the TB files, that is what you'll see in the UI. Makes sense?
VexedCat68 both are valid. In case the step was cached (i.e. already executed) the node.job will be None, so it is probably safer to get the Task based on the "executed" field which stores the Task ID used.
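A minimal sketch of the safer route (assumes node is the pipeline step's node object):

from clearml import Task

# node.job is None when the step was cached, but node.executed still holds
# the ID of the Task that was actually used for the step
step_task = Task.get_task(task_id=node.executed)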
Hmm StrangePelican34
Can you verify you call Task.init before TB is created? (basically at the start of everything)
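i.e. something like this ordering (a sketch, assuming torch's SummaryWriter; the same applies to any TB writer):

from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name='examples', task_name='tb logging')  # must come first
writer = SummaryWriter('runs/exp1')  # created after Task.init, so it gets auto-captured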
where is it persisted? if I have multiple sessions I want to persist, is that possible?
On the file server; yeah, it should support that. You can specify --continue-session to continue a previously used one.
Notice it does delete older "snapshots" (i.e. the previous workspace) when you are continuing a session (use --disable-session-cleanup to disable it)
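For example (a sketch; the session ID is a placeholder):

# resume a previously used session and keep the older workspace snapshots around
clearml-session --continue-session <previous-session-id> --disable-session-cleanup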
I am trying to use the
configuration vault
option but it doesn't seem to apply the variables I am using.
Hi EmbarrassedSpider34 I think this is an enterprise feature...
Managed to get the credentials attached to the configuration when the task is spun up,
I'm assuming env variables?
BTW: I tested the code you previously attached, and it showed the plot in the "Plots" section
(Tested with latest trains from GitHub)
Your code should have worked, i.e. you should see the 'model.h5' in the artifacts tab. What do you have there?
It should look something like this one:
https://demoapp.trains.allegro.ai/projects/531785e122644ca5b85b2e19b0321def/experiments/e185cf31b2634e95abc7f9fbdef60e0f/artifacts/output-model
BTW:
To manually register any model:
from trains import Task, OutputModel

task = Task.init('examples', 'my model')
OutputModel().update_weights('my_best_model.h5')
however, this will also turn off metrics
For the sake of future readers, let me clarify this one: turning it off with auto_connect_frameworks={'pytorch': False}
only affects the auto-logging of torch.save/load
(side note: the reason is that PyTorch does not have built-in metric reporting, i.e. it is usually done manually, these days most probably with TensorBoard; for example, Lightning / Ignite use TensorBoard as the default metric reporting).
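To make that concrete, a minimal sketch (project/task names are placeholders):

from clearml import Task

# turns off only the torch.save/load auto-logging; TensorBoard scalars are still captured
task = Task.init(
    project_name='examples',
    task_name='pytorch without autolog',
    auto_connect_frameworks={'pytorch': False},
)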
BTW: what's the OS and Python version?
AbruptWorm50 can you send the full image? (the X axis is missing from the graph)
Hi @<1547028116780617728:profile|TimelyRabbit96>
It should process the new request A (this is a multi-threading / async implementation)
Is this consistent with what you are seeing?
So I might be a bit out of sync, but I think there should be Triton serving and OpenVino serving built into it (or at least in progress).
Hi CleanPigeon16
can I make the steps in the pipeline use the latest commit in the branch?
Yes:
manually clone the step's Task (in the UI), edit the Execution section, change it to "last commit on branch" and specify the branch name; or do the same programmatically (clone + edit, see the sketch below)
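Roughly like this (a sketch; the task ID and branch name are placeholders, and the empty version_num is what tells the agent to pull the branch head):

from clearml import Task

cloned = Task.clone(source_task='<step-task-id>', name='step @ latest commit')
cloned.update_task(task_data={
    'script': {'branch': 'my-branch', 'version_num': ''}  # empty commit => last commit on branch
})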
ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found
Seems like the "run_experiment" step is not defined. Could that be ...
Hi SuperiorDucks36
you have such a great and clear GUI
🙂
I personally would love to do it with a CLI
Actually a lot of stuff is harder to get from the UI (like the current state of your local repository, etc.), but I think your point stands 🙂 We will start with the CLI because it is faster to deploy/iterate; then, when you guys say this is a winner, we will add a wizard in the UI.
What do you think?
Fixed in pip install clearml==1.8.1rc0
🙂
HealthyStarfish45 you mean like replacing the debug image viewer with a custom widget?
For the images themselves, you can get their URLs, then embed them in your static HTML.
You could also have your HTML talk directly to the server REST API (see the sketch below).
What did you have in mind?
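If you go the REST route, a very rough sketch (host, credentials and task ID are placeholders, and the exact request/response fields should be checked against the server API reference):

import requests

api = 'http://localhost:8008'  # your clearml api server
# exchange app credentials for a token
token = requests.post(api + '/auth.login', auth=('<access-key>', '<secret-key>')).json()['data']['token']
# ask for the latest debug images reported by a task
resp = requests.post(
    api + '/events.debug_images',
    headers={'Authorization': 'Bearer ' + token},
    json={'metrics': [{'task': '<task-id>'}], 'iters': 1},
)
print(resp.json())  # each returned event carries a 'url' you can drop into an <img> tag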
ResponsiveCamel97
BTW: any reason not to allow this flexibility?
ElegantKangaroo44 I tried to reproduce the "services mode" issue with no success. If it happens again, let me know; maybe we will better understand how it happened (i.e. the "master" trains-agent getting stuck for some reason)
I'll try to find the link...
If I have access to the logs, python env and git commits, is there an API to log those to the experiments too?
Sure: task.update_task
see here:
https://clear.ml/docs/latest/docs/references/sdk/task#update_task
example:
task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
The easiest way to get all the different sections (they should be relatively self-explanatory) is to call task.export_task(), which returns a dict with all the fields yo...
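For the export side, a quick sketch (the task ID is a placeholder):

from clearml import Task

task = Task.get_task(task_id='<task-id>')
fields = task.export_task()  # dict with every section: 'script', 'execution', etc.
print(fields['script'])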
single task in the DAG is an entire ClearML pipeline.
Just making sure details are not lost, "entire ClearML pipeline": the pipeline logic is process A running on machine AA.
Every step of that pipeline can be (1) a subprocess, but that means the exact same environment is used for everything, or (2) the DEFAULT behavior, where each step B runs on a different machine BB.
The non-ClearML steps would orchestrate putting messages into a queue, doing retry logic, and tr...
Hi SourSwallow36
What do you mean by "Log each experiment separately"? How would you differentiate between them?
You can install it, and after the wizard is done, uninstall it if you want to keep using trains from the git clone.