Reputation
Badges 1
282 × Eureka!thanks. That seems to work. I got a question, does it save the best model or the model in the last epoch?
Hi HelpfulDeer76 , I'm facing similar issues. Would you mind describing in detail how you deploy clearml-agent? Is it running as a pod on k8s?
[root@2c7498711bef elasticsearch]# curl
`
{
"index" : "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b",
"shard" : 0,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2021-05-22T11:33:38.932Z",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisi...
[root@2c7498711bef elasticsearch]# curl -XGET
`
yellow open events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b 4hAFNtGkRr-CHNGnUYfbTA 1 1 4724 271 660.9kb 660.9kb
yellow open events-log-d1bd92a3b039400cbafc60a7a5b1e52b M3qgFy1HRU2PibDOr1YOdw 1 1 1221 20 1013.6kb 1013.6kb
red open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-05 EQK8mnlhRxCrrKK3clcUFA 1 1
red open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_...
Thanks that did solve the problem, the tasks are running again.
and yes, there are stuff in there. In fact its been running for a few weeks with no issue. This appears to have happened after i added new workers, though i can't be sure this is the cause. Is there a limit to the number of workers that i can add for community edition?
yes, previously run experiments. I will just kill clearml-elastic container if that may solve the problem.
I'm not familiar with elastic. What role does elastic play in ClearML?
docker exec clearml-elastic curl
zsh: no matches found:
It's a local deployment. I was only presented with username without a need to enter passwords. When I'm in, I don't see an option in my profile to set a password as well. Neither is there integration with ldap for example.
Hi CostlyOstrich36 , What you described is task. I was referring to the pipeline controller.
Hi. If we disable the API service, how will it affect the system? How do we disable?
The first is probably done using pipeline controllers, the second using Datasets or HyperDatasets. Its not very clear how the last one is achieved, especially on the searchable data catalogs.
Create immutable and differentiable versions on-prem or in the cloud with our data agnostic solution.
Share data across R&D teams with searchable data catalogs available on any environment.
Hi SuccessfulKoala55 , would they need the fileserver to route to minio then? E.g.
This will ensure that any actions by clearml-data and models are saved into the S3 object store.
api {
files_server: s3://ecs.ai:80/clearml-data/default
}
aws {
s3 {
credentials {
host: http://ecs.ai:80
## Insert the iam credentials provided by your SAs here.
}
}
}
But if user forgot to do above, they will be saved on ClearML server. If I switch off f...
Ok. Problem was resolved with latest version of clearml-agent and clearml.
Hi, it's a preference from my developers. They preferred that the they install the python libraries into the images, load them up into the registry. In other words, they prefer to have libraries installed at image time.
Any comments on using the global python libraries without the need to 'pip install' anything?
This is strange then, is it possible for clearml logs to register successfully saving into a S3 storage when actually it isn't? For example, i've seen in past experiences with certain S3 client that saved onto a local folder called 's3:/' instead of putting it on S3 storage itself.
Hi TimelyPenguin76 ,
If you notice in the last screenshot, it state the bucket name to be http://ecs.ai . It then it tries to open http://s3.amazonaws.com/ecs.ai/clearml-models/artifact/uploading_file?X-Amz-Algorithm= ....
I didn't track the version on this change in behaviour. But last I tried it was able to download the content after I provide the credentials.
I also see this on my logs, noting that the config is read in but its still printing the supposedly hidden keys on the logs and UI.agent.hide_docker_command_env_vars.enabled = true agent.hide_docker_command_env_vars.extra_keys.0='TRAINS_AGENT_GIT_USER' ..... docker_cmd=harbor.ai/public/detectron2:v3 --env TRAINS_AGENT_GIT_USER=gituser
Ah ok. So it will be fixed on the ClearML server web UI as well? (See screenshots).
ok, i'll wait till i get my hands on vault then. thanks.
Hi, by agent logs i suppose you meant the logs from the ClearML server console panel?
Hi just wondering if I did something wrong here. Would k8s-glue be the reason is not working? I'm purchasing the enterprise version and if vault has the same problem it'll be a big issue.