And is there an easy way to get all the metrics associated with a project?
Metrics are per Task, but you can get the min/max/last of all the tasks in a project. Is that it?
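For reference, a minimal sketch of pulling those per-task summaries for a whole project (the project name is a placeholder; get_last_scalar_metrics returns min/max/last per metric):

```python
from clearml import Task

# fetch every task in the project (project name is just an example)
tasks = Task.get_tasks(project_name="examples")
for t in tasks:
    # {"metric": {"variant": {"min": ..., "max": ..., "last": ...}}}
    print(t.name, t.get_last_scalar_metrics())
```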
Time-based, dataset creation, model publish (tag).
Anything you think is missing?
Good news: a dedicated class for exactly that will be out in a few days 🙂
Basically a task scheduler and a task trigger scheduler, running as a service, cloning/launching tasks either based on time (cron-like) or based on a trigger.
wdyt?
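For reference, once it landed the usage looks roughly like this (a sketch based on the clearml.automation TaskScheduler; the queue name and task ID are placeholders):

```python
from clearml.automation import TaskScheduler

# a minimal sketch: re-launch a template task every day at 07:30 (cron-like)
scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<template_task_id>",  # placeholder task ID
    queue="default",
    minute=30,
    hour=7,
)
# run the scheduler itself as a service on the services queue
scheduler.start_remotely(queue="services")
```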
Hi FlutteringWorm14
Is there some way to limit that?
What do you mean by that? Are you referring to the Free tier?
Hi PompousBeetle71
I remember it was an issue, but it was solved a while ago. Which Trains version are you using?
UnevenDolphin73 if the repo does not include a poetry file, it will revert to pip
Hi GreasyPenguin14
However the cleanup service is also running in a docker container. How is it possible that the cleanup service has access and can remove these model checkpoints?
The easiest solution is to launch the cleanup script with a mount point from the storage directory into the container (-v <host_folder>:<container_folder>)
The other option, which clearml version 1.0 and above supports, is using the Task.delete, that now supports deleting the artifacts and mod...
Let me check... I think you might need to docker exec
Anyhow, I would start by upgrading the server itself.
Sounds good?
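For the second option, a minimal sketch with clearml >= 1.0 (the task ID is a placeholder):

```python
from clearml import Task

# delete the task together with the artifacts/models it produced
task = Task.get_task(task_id="<task_id>")
task.delete(delete_artifacts_and_models=True)
```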
Hi ElegantCoyote26
what's the clearml version you are using?
Here you go 🙂
(using trains_agent for easier all data access)

```python
from trains_agent import APIClient

client = APIClient()
log_events = client.events.get_scalar_metric_data(
    task='11223344aabbcc', metric='valid_average_dice_epoch')
print(log_events)
```
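For what it's worth, with the newer clearml SDK the same scalars are reachable without the agent package; a sketch, reusing the task ID from above:

```python
from clearml import Task

task = Task.get_task(task_id='11223344aabbcc')
# full scalar history, e.g. {"title": {"series": {"x": [...], "y": [...]}}}
print(task.get_reported_scalars())
```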
ClumsyElephant70
Could it be the virtualenv package is not installed on the host machine?
(From the log it seems you are running in venv mode, is that correct?)
Hi GrittyCormorant73
In the end everything goes through session.send, so you could add a print there.
BTW: why would you print all the requests? What are we debugging here?
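If you do want to trace them, a debugging sketch (this hooks clearml's internal Session.send, which is not a public API and may change between versions):

```python
from clearml.backend_api import Session

# monkey-patch the low-level send to print every outgoing API request
_orig_send = Session.send

def _send_and_print(self, request, *args, **kwargs):
    print("API request:", type(request).__name__)
    return _orig_send(self, request, *args, **kwargs)

Session.send = _send_and_print
```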
Is this information stored anywhere or do I need to explicitly log this data somehow?
On the creating Task, alongside all the other reports.
Basically each model stores its creating Task (Task ID); using the Task ID you can query all the metrics reported by the task
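A sketch of that lookup (the model ID is a placeholder):

```python
from clearml import Model, Task

model = Model(model_id="<model_id>")      # placeholder model ID
task = Task.get_task(task_id=model.task)  # the task that created the model
# min/max/last of every scalar the task reported
print(task.get_last_scalar_metrics())
```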
Hi @<1528908687685455872:profile|MassiveBat21>
However no useful template is created for downstream executions - the source code template is all messed up,
Interesting, could you provide the code that is "created", or even better some way to reproduce it? It sounds like sort of a bug, or maybe a feature support that is missing.
My question is - what is a best practice in this case to be able to run exported scripts (python code not made availa...
I'm guessing this is done through code-server?
correct
I'm currently rolling a JupyterHub instance (multiuser, with code-server inside) on the same machine as clearml-server. That's where tasks are executed etc. so, all browser dev env.
Yeah, the idea with clearml-session is that each user can self-serve the container that works best for them. With JupyterHub they start to step on each other's toes very quickly...
I think you are correct, this is odd, let me check ...
is it planned to add a multicursor in the future?
CheerfulGorilla72 can you expand? What do you mean by multicursor?
based on this one:
https://stackoverflow.com/questions/31436407/git-ls-remote-returns-fatal-no-remote-configured-to-list-refs-from
I think this is a specific issue of the local git repo configuration, can you verify?
(BTW: I tested with git 2.17.1; git ls-remote --get-url will return the remote url without an error)
Yeah.. that should have worked ...
What's the exact error you are getting ?
Long story short, not any longer (in previous versions of k8s it was possible, but after the runtime container change it is not supported)
Yes that makes total sense to me. How about a GitHub issue on the clearml-docs ?
Another point I see is that in the workers & queues view the GPU usage is not being reported
It should be reported; if it is not, maybe you are running the trains-agent in CPU mode? (try adding --gpus)
Yes! I checked, it should work (it checks if you have a load(...) function on the preprocess class, and if you do it will use it):

```python
def load(self, local_file_name):
    # restore the main model from the locally downloaded weights file
    self._model = joblib.load(local_file_name)
    # also load a helper model, fetched by its (hard-coded) ClearML model ID
    self._preprocess_model = joblib.load(Model(hard_coded_model_id).get_weights())
```
Hi GiganticTurtle0
ClearML will only list the directly imported packages (not their requirements), meaning in your case it will only list "tf_funcs" (which you imported).
But I do not think there is a package named "tf_funcs", right?
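If the auto-detection misses something, one workaround is to add it explicitly before Task.init (a minimal sketch; the package name is just an example):

```python
from clearml import Task

# force a package into the captured requirements (call before Task.init)
Task.add_requirements("tensorflow")
task = Task.init(project_name="examples", task_name="requirements demo")
```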
controller_object.start_locally(). Only the pipeline controller should be running locally, right?
Correct, but do notice that if you are using the Pipeline decorator and calling run_locally(), the actual pipeline steps are also executed locally.
Which of the two are you using (Tasks as steps, or functions as steps with the decorator)?
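For the Tasks-as-steps flavor, a minimal sketch of a controller that runs locally while the steps go to their queues (project/task names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="demo pipeline", project="examples", version="0.1")
pipe.add_step(
    name="step1",
    base_task_project="examples",      # placeholder project
    base_task_name="step 1 template",  # placeholder template task
)
# controller logic runs locally; run_pipeline_steps_locally=False (the default)
# keeps the steps themselves enqueued for agents
pipe.start_locally(run_pipeline_steps_locally=False)
```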
Full markdown edit on the project, so you can create your own reports and share them (you can also put links to the experiments themselves inside the markdown). Notice this is not per-experiment reporting (we kind of assumed maintaining a per-experiment report is not realistic)
Ohh then YES!
the Task will be closed by the process; since the process lives inside Jupyter and the notebook kernel is still running, the Task is still considered running
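If you need it marked completed before the kernel dies, you can close it explicitly (a minimal sketch):

```python
from clearml import Task

# explicitly close the current task from inside the notebook
task = Task.current_task()
if task:
    task.close()
```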