The "Optimizer task" will continue to run as long as there are sub-Tasks it launched.
Is anything else running/pending ?
I think there was an issue with the entire .ml domain name (at least for some dns providers)
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
No worries I totally feel you.
As a quick hack in the actual code of the Task itself, is it reasonable to have:
task = Task.init(....)
task.set_initial_iteration(0)
StorageManager
Oh, it has no remove 😞 StorageHelper.delete is the only way
Hi JitteryCoyote63
So the main issue is backing up the Elastic & Mongo DBs while they are running; once they are backed up and restored, the server will spin up as is. (Let me check regarding Redis, it might be that since it is used for caching there is no need to actually back up the content, only the configuration)
However, when we try to access the webapi from remote through the VPN we fail. The VPN logs don't show any blockage. Any ideas?
Maybe the VPN firewall blocks HTTP connections? Or it might be BrightRabbit75's case, that sounds quite logical to never show anywhere
ReassuredTiger98 you mean when calling clearml-init ? or default value?
And if this is the case, that would explain the empty elastic as well
That is a bit odd, but SSH keys have to have specific chmod flags for them to work (security issues)
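For the chmod point above, a minimal sketch of tightening a key file to owner-only access from Python. The helper name `restrict_key_permissions` is made up for illustration, and mode 0o600 is an assumption here (it is the usual choice for private keys with OpenSSH, not something stated in this thread).

```python
import os
import stat


def restrict_key_permissions(key_path):
    """Make the key file readable and writable by its owner only (mode 0o600)."""
    os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)
    # Return the resulting permission bits so the caller can verify them
    return stat.S_IMODE(os.stat(key_path).st_mode)
```

Calling it on the private key path and checking that the result equals `0o600` is a quick way to rule out permissions as the cause.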
What was the error ?
Funny this was mentioned just today 🙂
https://clearml.slack.com/archives/CTK20V944/p1620664770492400
Basically, if you want to always ignore the "installed packages", in your clearml.conf put:
agent.package_manager.force_repo_requirements_txt = true
This would be my only improvement, otherwise awesome!!!
output_model.update_weights(weights_filename=os.path.join(training_data_path, 'runs', 'train', 'yolov5s6_results', 'weights', 'best.onnx'))
FrustratingWalrus87 If you need active one, I think there is currently no alternative to TB tSNE 🙂 it is truly great 🙂
That said you can use plotly for the graph:
https://plotly.com/python/t-sne-and-umap-projections/#project-data-into-3d-with-tsne-and-pxscatter3d
and report it to ClearML with Logger report_plotly :
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/examples/reporting/plotly_reporting.py#L20
ohh, the copy paste thing when you generate credentials ?
do you know how I can save all the logs and all the metric images?
These are stored into clearml-server, no? what am I missing ?
Is it being used to ssh to the instance?
It is used for the SSH client so it "knows" the SSH server (does that make sense) ?
Hi SubstantialElk6
ClearML-Data doesn't actually "load" the data, it brings it locally and returns a folder with all your data files. From that point onward, it's up to your code to load it into the framework. Make sense ?
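To illustrate the "up to your code" part, here is a small sketch that walks a local folder and collects the data files to load. It assumes `dataset_folder` is the folder ClearML-Data handed back (for instance from `Dataset.get(...).get_local_copy()`); the helper name and the `.csv` suffix are hypothetical.

```python
from pathlib import Path


def list_data_files(dataset_folder, suffix=".csv"):
    """Return the sorted files under dataset_folder that end with suffix."""
    root = Path(dataset_folder)
    # rglob walks sub-folders too, so nested data files are included
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())
```

The returned paths can then be passed to whatever loader your framework uses.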
Hm GiganticTurtle0 let me quickly check it
EFS get downloaded to the k8 pod local volume?
EFS is an Amazon service that mounts a persistent FS into ec2 instances, I believe they have support for k8s as a service as well, which would make it kind of like a PV only as a service.
Does that make sense ?
But in credentials creation it still shows 8008. Are there any other places in docker-compose.yml where port from 8008 to 8011 should be replaced?
I think there is a way to "tell" it what to put there, not sure:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#configuration-files
PungentLouse55 from the screenshot I assume the experiment template you are trying to optimize is not the one from the trains/examples 🙂
In that case, and based on the screenshots, the prefix is "Args/" as this is the section name.
Regarding the objective metric, again based on your screenshots:
objective_metric_title="Accuracy"
objective_metric_series="Validation"
Make sense ?
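A rough sketch of how the naming fits together, in plain Python with no ClearML calls. The helper `build_hpo_settings` and the argument names `batch_size` and `epochs` are hypothetical; only the "Args" section and the metric title/series come from the messages above.

```python
def build_hpo_settings(section, param_names, metric_title, metric_series):
    """Collect the names an optimizer needs, with the section prefix applied."""
    return {
        # Hyperparameter names are addressed as "<section>/<name>"
        "hyper_parameters": [f"{section}/{name}" for name in param_names],
        "objective_metric_title": metric_title,
        "objective_metric_series": metric_series,
    }


settings = build_hpo_settings(
    section="Args",
    param_names=["batch_size", "epochs"],  # hypothetical argument names
    metric_title="Accuracy",
    metric_series="Validation",
)
print(settings["hyper_parameters"])
```

The point is only that each parameter name carries the section prefix, e.g. "Args/batch_size" rather than "batch_size".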
If you want to rename it (any pipeline), click on the "Full details" in the "Run Info" (right hand side panel), then in the full detail of the Pipeline Task you will be able to rename the pipeline execution
(Is renaming useful? should we add a right click to rename ?)
I have an idea, can you try with:
task = Task.init(..., reuse_last_task_id=False)
I have a suspicion it starts the Tasks in parallel, and the "reuse_last_task_id" causes them to "reuse the same task locally" which makes them overwrite the configuration of one another.
The quickest workaround would be, in your final code, to do something like:
my_params_for_hpo = {'key': omegaconf.key}
task.connect(my_params_for_hpo, name='hpo_params')
call_training_with_value(my_params_for_hpo['key'])
This will initialize my_params_for_hpo with the values from OmegaConf, and allow you to override them in the hyperparameter section (task.connect is two-way: in manual mode it stores the data on the Task, in agent mode it takes the values from the Task and puts them ba...
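The two directions of task.connect can be pictured with a toy stand-in. `FakeTask` below is hypothetical and only mimics the behaviour described in this message so the sketch runs without a server; with ClearML the `task` object would come from `Task.init(...)`. The values 0.1 and 0.5 are made up.

```python
class FakeTask:
    """Stand-in for a ClearML Task, only for this demo (not the real class)."""

    def __init__(self, overrides=None):
        # None mimics a manual run; a dict mimics values set for an agent run
        self.overrides = overrides
        self.stored = {}

    def connect(self, params, name=None):
        if self.overrides is None:
            # Manual run: record the values found in the code
            self.stored[name] = dict(params)
        else:
            # Agent run: push the overriding values back into the dict
            params.update(self.overrides)
        return params


def resolve_hpo_params(task, defaults):
    """Connect a plain dict so its values can be overridden from outside."""
    params = dict(defaults)
    task.connect(params, name="hpo_params")
    return params


print(resolve_hpo_params(FakeTask(), {"key": 0.1}))
print(resolve_hpo_params(FakeTask({"key": 0.5}), {"key": 0.1}))
```

In the first call the dict keeps the value from the code; in the second the overriding value wins, which is what lets an optimizer change it.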
If you take a look here, the returned objects are automatically serialized and stored on the files server or object storage, and also deserialized when passed to the next step.
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
You can of course do the same manually
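A minimal sketch of what doing it manually could look like. It assumes `pickle` as the format and a temporary local folder standing in for the files server or object storage; both are illustration choices, not a description of what ClearML does internally. The helper names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_step_output(obj, folder, name):
    """Pickle a step's return value to a file and return the file path."""
    path = Path(folder) / f"{name}.pkl"
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_step_input(path):
    """Load the object a previous step stored."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as shared_folder:
    # Step one produces a value and stores it
    stored = save_step_output({"rows": [1, 2, 3]}, shared_folder, "step_one")
    # Step two reads it back before doing its own work
    restored = load_step_input(stored)

print(restored)
```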
not sure what is the "right way" 🙂
But I do:
pkill -f "trains-agent --gpus 0"
This will kill a process that was started as "trains-agent --gpus 0". Notice it matches the cmd pattern, so it has to match the way you executed the agent. You can check it with:
ps -Af | grep trains-agent
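To see why the pattern has to follow the way the agent was launched, here is a small sketch that approximates the matching with Python's `re` module (for a plain string like this one it should behave the same as the pattern match pkill -f does against the full command line). The two command lines are made up for illustration.

```python
import re


def would_match(pattern, command_line):
    """Return True if the pattern is found anywhere in the full command line."""
    return re.search(pattern, command_line) is not None


pattern = "trains-agent --gpus 0"
# Same argument order as the pattern: this one is matched
print(would_match(pattern, "python -m trains-agent --gpus 0 daemon"))
# Different argument order: the pattern is not found
print(would_match(pattern, "trains-agent daemon --gpus 0"))
```

If the second form is how the agent was started, the pattern passed to pkill needs to be adjusted to it.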
Can my request be made as new feature so that we can tag same type of graphs under one main tag
Sure, open a Git Issue :)
Interesting, do you think you could PR a "fixed" version ?
https://github.com/allegroai/clearml-web/blob/2b6aa6043c3f36e3349c6fe7235b77a3fddd[…]app/webapp-common/shared/single-graph/single-graph.component.ts
RobustGoldfish9
I think you need to set the trains-agent docker to be aware of the host, so it knows how to mount data/cache/configurations into the sibling docker
It should look something like:
TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains"
So if running a docker:
docker run -e TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains" ...
IrateBee40
Check the first steps here:
https://clear.ml/docs/latest/docs/getting_started/ds/ds_first_steps
(Basically you have to generate credentials / configure your machine so it knows where the server is and how to access it)
Make sense ?
Hi PanickyMoth78 , an RC is out with a fix.
pip install clearml==1.6.3rc0
Thank you for noticing the graph issue.
Btw, do notice that since the data is being changed inside the controller loop, the parents are still kind of odd: the source of the data is not clear to the logic, so it assumes it depends on the current state (i.e. all the leaves)