From your jupyterlab, can you do:
!curl
Ohh so even easier:
print(client.workers.get_all())
It's just the print (__repr__) not showing the data:
for w in client.workers.get_all():
    print(w.data)
Hmm yes, we should probably provide metrics:
client.workers.get_stats(..., items=[dict(key='cpu_usage'), dict(key='gpu_usage')])
SmarmySeaurchin8
updated_tags = task.tags
updated_tags.remove(tag)
task.tags = updated_tags
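A slightly more defensive take on the snippet above, as a sketch only. It assumes nothing beyond what the snippet already does, namely that the task object exposes a list-like `tags` attribute that can be reassigned. `remove_tag` is a hypothetical helper name, not part of ClearML, and `demo_task` is a stand-in object for illustration.

```python
from types import SimpleNamespace


def remove_tag(task, tag):
    """Remove `tag` from `task.tags` if present; return True when it was removed.

    Assumption: `task.tags` is a list-like attribute that can be reassigned.
    """
    updated_tags = list(task.tags or [])  # work on a copy of the current tags
    if tag not in updated_tags:
        return False  # list.remove() would raise ValueError for a missing tag
    updated_tags.remove(tag)
    task.tags = updated_tags  # assign back so the change is actually stored
    return True


# Stand-in object for illustration only; a real task would come from ClearML.
demo_task = SimpleNamespace(tags=["baseline", "to-delete"])
removed = remove_tag(demo_task, "to-delete")
print(removed, demo_task.tags)
```

Checking for the tag first avoids the `ValueError` that a bare `remove()` raises when the tag is not there.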
When you say status, what do you mean? Is it active? Running a task?
Let me check, it was supposed to be automatically aborted
Hi @<1610083503607648256:profile|DiminutiveToad80>
do you have a full log? can you share the code you are trying to run?
Thanks @<1569496075083976704:profile|SweetShells3> ! let me see if I can reproduce the issue
But essentially Prefect also has agents to run jobs on machines where the processes run (which seems to be exactly the same model as in ClearML),
Yes, it is conceptually very similar
this data is highly regulated data, ...
The main difference is that with ClearML the agents are running on your machines (either local or on your cloud account); the clearml-server does not actually have access to the data streaming through it.
Does that make sense ?
Hi DepressedChimpanzee34
This is not a query call, this is a reporting call. see docs below
https://clear.ml/docs/latest/docs/references/api/workers#post-workersstatus_report
It is used by the worker to report its own status.
I think this is what you are looking for:
https://clear.ml/docs/latest/docs/references/api/workers#post-workersget_stats
I see... In the triton pod, when you run it, it should print the combined pbtxt. Can you print both before/after ones? so that we could compare ?
Thanks @<1569496075083976704:profile|SweetShells3> for bumping it!
Let me check where it stands, I think I remember a fix...
Yes you can drag it in the UI :) it's a new feature in v1
VictoriousPenguin97 I'm not sure there is an easy solution, basically you have to edit both MongoDB (artifacts) and Elastic (think debug samples) 😞
You are correct, the agent will clone the git repository and install the requirements, as written in the task's installed packages section. Regarding the git branch, notice it will pull the specific commit id as stated in the execution section, and it will apply any uncommitted changes. You can edit the execution section and change the commit to the latest in a specific version (you should probably also clear the uncommitted changes if you do that)
PlainSquid19 No worries 🙂
btw: If you could check whether the mangling of the working dir / script path happens with the latest trains, that would be appreciated, because if you were running the script in the first place from "stages/" then trains should have caught it ...
Click on the "k8s_schedule" queue, then on the right hand side, you should see your Task, click on it, it will open the Task page. There click on the "Info" Tab, there look for "STATUS MESSAGE" and "STATUS REASON". What do you have there?
So it sounds as if for some reason calling Task.init inside a notebook on your jupyterhub is not detecting the notebook.
Is there anything special about the jupyterhub deployment ? how is it deployed ? is it password protected ? is this reproducible ?
RipeGoose2 yes that will work 🙂
That said, we should probably fix the S3 credentials popup 😉
True, this is exactly the reason. That said, you can always manually add it. You can see the default values : https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf
is no agent listening to the "k8s_scheduler"
There should not be one, this is purely "virtual" , so users understand the k8s cluster is spinning their pod (sometimes it takes time, imagine EKS etc. , just visibility)
unfortunately I can't get info from the cluster
You should be able to see the pod in the cluster, no?!
What does the Task Info panel say, can you share a screenshot?
RipeGoose2 yes, the UI cannot embed the html yet, but if you go click on the link itself it will open the html in a new tab.
Could you verify it works ?
Hi JitteryCoyote63
The easiest is to inherit the ResourceMonitor class and change the default logging rate (you could also disable some of the metrics).
https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/task.py#L565
Then pass the new class to Task.init as auto_resource_monitoring
So basically the APIClient is a pythonic interface to the RestAPI, so you can do the following
See if this one works:
# stats from the last 60 seconds
for worker in workers:
    print(client.workers.get_stats(worker_ids=[worker.id], from_date=int(time()-60), to_date=int(time()), interval=60))
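To make the time window in the call above easier to reason about, here is a small sketch that builds the same keyword arguments separately from the call itself. `stats_request` and `print_recent_stats` are hypothetical helper names; the `items` keys (`cpu_usage`, `gpu_usage`) are the ones mentioned earlier in this thread, and the `client` argument is assumed to be an APIClient-like object with `workers.get_all()` and `workers.get_stats()`.

```python
from time import time


def stats_request(worker_id, window_sec=60, now=None, keys=("cpu_usage", "gpu_usage")):
    """Build keyword arguments for workers.get_stats covering the last `window_sec` seconds."""
    now = int(time() if now is None else now)
    return dict(
        worker_ids=[worker_id],
        from_date=now - window_sec,  # start of the window, in epoch seconds
        to_date=now,                 # end of the window
        interval=window_sec,         # one bucket spanning the whole window
        items=[dict(key=k) for k in keys],
    )


def print_recent_stats(client, window_sec=60):
    """Print recent stats for every worker known to `client` (assumed APIClient-like)."""
    for worker in client.workers.get_all():
        print(client.workers.get_stats(**stats_request(worker.id, window_sec)))


# Fixed timestamp so the example is reproducible; "worker-1" is a made-up id.
example = stats_request("worker-1", window_sec=60, now=1_700_000_000)
print(example)
```

Against a real server this would presumably be used as `print_recent_stats(APIClient())`, with `APIClient` imported from `clearml.backend_api.session.client`.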
That sounds like an issue with "working dir" , check the "Execution" "Working Directory" field.
'.' means the root of the git repository
'subfolder' means run the script from the subfolder etc. also make sure that the script path is adjusted accordingly.
btw: Trains should have filled in all the correct paths... If you have time, get the latest trains (0.14.3) and run again to see if the problem persists, we should probably fix that bug 🙂
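A rough illustration of how the "Working Directory" and script path combine, as described above. This is only a sketch of the path arithmetic, not the agent's actual code; `resolve_script`, the `/repo` root, and `train.py` are made-up names for the example.

```python
from pathlib import PurePosixPath


def resolve_script(repo_root, working_dir, script_path):
    """Return the script location implied by a working dir and a script path.

    Assumption: the script path is interpreted relative to the working dir,
    and the working dir is relative to the root of the git repository.
    """
    return PurePosixPath(repo_root) / working_dir / script_path


# Working dir '.' (repository root): the script path carries the subfolder.
from_root = resolve_script("/repo", ".", "stages/train.py")

# Working dir 'stages': the script path is adjusted to be relative to it.
from_subfolder = resolve_script("/repo", "stages", "train.py")

print(from_root)
print(from_subfolder)
```

Both combinations point at the same file; a mismatch between the two fields (for example working dir `stages` with script path `stages/train.py`) would point at a path that does not exist.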
BTW trains agent will not delete the venv until the next run, so you can check exactly what's missing there