
Hi SteadySeagull18
However, it seems to be entirely hanging here in the "Running" state.
Did you set an agent to listen to the "services" queue?
Someone needs to run the pipeline logic itself; it is sometimes part of the clearml-server deployment, but that is not a must.
PompousParrot44 , so you mean like a base conda env?
Configuring trains-agent to use conda is done here:
https://github.com/allegroai/trains-agent/blob/699d13bbb34649c7e5337b4187cda59b7fa6fd33/docs/trains.conf#L44
Then for every experiment trains-agent will create a new conda environment based on the requirements of that experiment.
You can tell it to inherit the base conda env (or the one it is running from, I think) by setting system_site_packages: true
https://github.com/allegroai/tr...
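For reference, a minimal sketch of the relevant trains.conf section (the exact line is in the link above; the values here are illustrative):
```
agent {
    package_manager {
        # use conda instead of pip to build the per-experiment environment
        type: conda,
        # inherit the packages of the conda env the agent itself runs in
        system_site_packages: true,
    }
}
```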
Hi @<1673501397007470592:profile|RelievedDuck3>
how can I configure my alerts to be notified when the distribution of my metrics (variables) changes on my heatmaps?
This can be done inside grafana, here is a simple example:
None
Specifically you need to create a new metric that is the distance of the current distribution (i.e. heatmap) from the previous window; then on the distance metric, ...
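As a rough illustration of that distance metric (this is not the Grafana config itself; numpy/scipy and the windowing are assumptions for the sketch):
```python
# Compare the current window's sample distribution to the previous window's,
# producing a single scalar you can set a Grafana alert threshold on.
import numpy as np
from scipy.stats import wasserstein_distance

def distribution_shift(previous_window: np.ndarray, current_window: np.ndarray) -> float:
    """Distance between two sample windows; alert when it crosses a threshold."""
    return wasserstein_distance(previous_window, current_window)
```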
Hi UnevenDolphin73
Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?
You can add a link as an artifact, that is probably the easiest:
task.upload_artifact(name="just link", artifact_object="<url-to-link-to>")
EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with .datasets. This creates several annoyances that I...
Hi @<1603198134261911552:profile|ColossalReindeer77>
Hello! does anyone know how to do HPO when your parameters are in a Hydra configuration?
Basically hydra parameters are overridden with "Hydra/param"
(this is equivalent to the "override" option of hydra in CLI)
OmegaConf is the configuration; the overrides are in the Hyperparameters "Hydra" section
None
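A minimal sketch of what that looks like with the optimizer (the base task ID, parameter path, ranges, and queue name are placeholders):
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

task = Task.init(project_name="examples", task_name="hydra hpo",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<base task id>",  # the task that uses Hydra
    hyper_parameters=[
        # the "Hydra/" prefix is equivalent to the CLI override `optimizer.lr=...`
        UniformParameterRange("Hydra/optimizer.lr", min_value=1e-5, max_value=1e-2),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    max_number_of_concurrent_tasks=2,
    execution_queue="default",  # assumed queue name
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```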
Hi @<1541954607595393024:profile|BattyCrocodile47>
It seems to me that instead of implementing webhooks to react to things like adding a tag to a model
Did you look at this example?
None
Can we straightforwardly stream ALL ClearML events to another system?
what would you consider an event?
The "basic" object type is Task, a state in task is changed via an api call, would that be an e...
Hi SpotlessFish46 ,
Is the artifact already in S3?
Is the S3 configured as the default files_server in the trains.conf?
You can always use the StorageManager upload to wherever and register the url on the artifacts.
You can also programmatically change the artifact destination server to S3, then upload the artifact as usual.
What would be the best match for you?
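Both options sketched (bucket, paths, and names are placeholders):
```python
from clearml import Task
from clearml.storage import StorageManager

task = Task.init(project_name="examples", task_name="s3 artifacts")

# Option 1: upload with StorageManager, then register the URL as a link artifact
url = StorageManager.upload_file("model.pkl", "s3://my-bucket/models/model.pkl")
task.upload_artifact(name="model link", artifact_object=url)

# Option 2: change the artifact destination to S3, then upload as usual
task.output_uri = "s3://my-bucket/artifacts"
task.upload_artifact(name="model", artifact_object="model.pkl")
```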
I guess only if autoscaling is used (one worker one machine)?
yes, basically depending on how you set autoscaling / k8s integration 🙂
Yes, I find myself trying to select "points" on the overview tab. And I find myself wanting to see more interesting info in the tooltip.
Yep that's a very good point.
The Overview panel would be extremely well suited for the task of selecting a number of projects for comparing them.
So what you are saying, this could be a way to multi select experiments for detailed comparison (i.e. selecting the "dots" on the overview graph), is this what you had in mind?
Hey LethalDolphin75 , when it works, could you PR it?
You will be able to set it.
You will just not see the output in the console log , but everything is running and being executed
WickedGoat98
The webUI will look like the demo server 🙂 https://demoapp.trains.allegro.ai/
2. curl http://server-ip:8008 should return something like:
{"meta":{"id":"78a9dc77081348e2930d1f429fd7e092","trx":"78a9dc77081348e2930d1f429fd7e092","endpoint":{"name":"","requested_version":1.0,"actual_version":null},"result_code":400,"result_subcode":0,"result_msg":"Invalid request path /","error_stack":null},"data":{}}
3. curl http://server-ip:8080 should return something like:
` <!d...
Hi @<1614069770586427392:profile|FlutteringFrog26>
So since you have the Task id. you do:
task = Task.get_task("task id here")
Then to get the models
models = task.models["output"]
models is both a list and a dict; if you want the last one, do last_model = models[-1]
if you know the best model's name, do model = models["best model"]
(notice the model name is the exact one you see in the UI. Once you have the model object you can get a copy with `model.get_lo...
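Put together, a short sketch (the task ID and model name are placeholders):
```python
from clearml import Task

task = Task.get_task(task_id="<task id here>")

models = task.models["output"]        # indexable by position and by model name
last_model = models[-1]               # the latest output model
# best_model = models["best model"]   # by the exact name shown in the UI
local_path = last_model.get_local_copy()  # download a local copy of the weights
```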
Glad to hear!
(yeah @<1603198134261911552:profile|ColossalReindeer77> I'm with you the override is not intuitive, I'll pass the info to the technical writers, hopefully they can find a way to make it easier to understand)
Gitlab has support for S3 based cache btw.
This might still be considered "slow" compared to local-dist/cluster mount
Would adding support for some sort of post task script help? Is something already there?
Interesting, can you expand on the use case? (currently there is only pre-task script, for setup)
Sorry @<1689446563463565312:profile|SmallTurkey79> just notice your reply
Hmm so I know the enterprise version has a built-in support for slurm, which would remove the need to deploy agents on the slurm cluster.
What you can do is, on the SLURM login server (i.e. a machine that can run sbatch), write a simple script that pulls the Task ID from the queue and calls sbatch with clearml-agent execute --id <task_id_here>. Would this be a good solution?
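A rough sketch of such a login-node script (the queue name and polling interval are assumptions):
```python
import subprocess
import time

from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_id = client.queues.get_all(name="slurm")[0].id  # assumed queue name

while True:
    result = client.queues.get_next_task(queue=queue_id)
    entry = getattr(result, "entry", None)
    if entry:
        # submit a SLURM batch job that executes the ClearML task
        subprocess.run(
            ["sbatch", "--wrap", f"clearml-agent execute --id {entry.task}"],
            check=True,
        )
    else:
        time.sleep(30)  # queue is empty, poll again later
```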
Is it possible to make a connection to an S3 bucket via this authentication method with the open source version on EKS?
Hi BoredBluewhale23
In your setup, are we talking about agents running inside the Kubernetes cluster, or clients connecting from their own machine?
Hi SkinnyPanda43
On your local machine, do not pass output_uri at all, so nothing will be uploaded.
In the agent's configuration file, configure default_output_uri to point to the S3 bucket
(Notice you can always override them in the UI, see the bottom of the execution Tab)
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L312
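For reference, a sketch of the relevant clearml.conf entry on the agent machine (the bucket path is a placeholder; the linked line shows the exact key):
```
sdk {
    development {
        # default destination for artifacts and models created by tasks
        default_output_uri: "s3://my-bucket/clearml-outputs"
    }
}
```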
The difference is that running the agent in daemon mode, means the "daemon" itself is a job in SLURM.
What I was saying is pulling jobs from the clearml queue and then pushing them as individual SLURM jobs, does that make sense?
Sure @<1523720500038078464:profile|MotionlessSeagull22> DM me 🙂
You can get a mutable copy of the entire dataset (original version) with get_mutable_local_copy()
Then change the files in the returned directory, create a new Dataset with the original version as its parent, then sync the folder.
You can also just update the specific file (without needing to download the entire original version)
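A minimal sketch of that flow (project/name and paths are placeholders):
```python
from clearml import Dataset

parent = Dataset.get(dataset_project="examples", dataset_name="my-dataset")
local_folder = parent.get_mutable_local_copy("/tmp/my-dataset-edit")

# ... modify files inside local_folder ...

child = Dataset.create(
    dataset_project="examples",
    dataset_name="my-dataset",
    parent_datasets=[parent.id],  # the original version becomes the parent
)
child.sync_folder(local_folder)   # only changed/added files are uploaded
child.upload()
child.finalize()
```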
Hi @<1661542579272945664:profile|SaltySpider22>
question 1: are parallel writes to a dataset with the same version possible?
When you are saying parallel, what do you mean? From multiple machines?
What's the recommended way to append the dataset in a future version?
Once a dataset was finalized the only way to add files is to add another version that inherits from the previous one (i.e. the finalized version becomes the parent of the new version)
If you are worried about multip...
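For appending only (no need to pull the original files first), a sketch along the same lines:
```python
from clearml import Dataset

parent = Dataset.get(dataset_project="examples", dataset_name="my-dataset")
child = Dataset.create(
    dataset_project="examples",
    dataset_name="my-dataset",
    parent_datasets=[parent.id],
)
child.add_files("/path/to/new_files")  # only the newly added files are uploaded
child.upload()
child.finalize()
```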
Hi @<1587615463670550528:profile|DepravedDolphin12>
Is there anyway to get the id of the pipeline using pipeline name?
In the UI, the top right "details" panel should have the Pipeline ID
Is this what you are looking for ?
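If you need it programmatically, a pipeline run is a Task under the hood, so something like this sketch should work (assuming the default hidden ".pipelines" project layout; the names are placeholders):
```python
from clearml import Task

pipeline_task = Task.get_task(
    project_name="my-project/.pipelines/my-pipeline",
    task_name="my-pipeline",
)
print(pipeline_task.id)
```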
Hi DeliciousBluewhale87
Yes, that should have worked. Can you verify the task status?
print(Task.get_task(...).get_status())
Hi MysteriousBee56 ,
what do you mean by:
Can we upload our project repository to trains server?