WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
But be careful, upgrading is extremely dangerous
Is it possible to split the large Elasticsearch indexes? I know Elasticsearch has a feature called rollover, but I'm not sure ClearML supports it.
What you want is a service script that cleans up archived tasks; here is what we used: None
I also think that if my package manager is set to uv, it should only use uv and ignore pip entirely.
ok, I found it.
Are S3 links supported?
I also see that Elasticsearch and MongoDB have some data.
hi, thanks for reaching out. Getting desperate here.
Yes, it's self-hosted.
No, only currently running experiments are deleted (the task itself is gone, but its debug images and models are still present in the fileserver folder).
What I do see is some random Elasticsearch errors popping up from time to time:
[2024-01-05 09:16:47,707] [9] [WARNING] [elasticsearch] POST None [status:N/A requ...
I was on version 1.7 and now I'm on the latest, 1.11.
Can't get a screenshot yet (copying data), will add it later.
What worries me is that config and agent folders are empty. I can reconfigure all agents, no problems.
But where is info about projects stored?
will it be appended in clearml?
"s3" is part of domain to the host
I need the zipping, chunking to manage millions of files
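The zipping/chunking idea can be sketched with the standard library alone; the function name and the chunk size below are illustrative, not part of any ClearML API:

```python
import os
import zipfile


def zip_in_chunks(file_paths, out_dir, chunk_size=1000):
    """Pack files into numbered zip archives, chunk_size files per archive.

    A sketch of packing millions of small files into a manageable number
    of archives before uploading them as artifacts.
    """
    archives = []
    for i in range(0, len(file_paths), chunk_size):
        archive = os.path.join(out_dir, f"chunk_{i // chunk_size:05d}.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in file_paths[i : i + chunk_size]:
                # store by basename so the archive layout stays flat
                zf.write(path, arcname=os.path.basename(path))
        archives.append(archive)
    return archives
```

Each resulting archive can then be uploaded as a single artifact instead of millions of individual files.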
Elasticsearch also takes about 15 GB of RAM.
I get the same when I copy the /opt/clearml/data folder into /mnt/data/clearml/data.
Well, I connected to MongoDB manually and it is empty, loaded with just the examples.
I also don't have the side panel for some reason.
Getting errors in Elasticsearch when deleting tasks; the call returns "can't delete experiment".
Here are my ClearML versions; Elasticsearch is taking up 50 GB.
But it seems like the data is gone; not sure how to get it back.
@<1523701070390366208:profile|CostlyOstrich36> Updated the webserver and the problem still persists.
This is the new stack:
WebApp: 1.15.1-478 • Server: 1.14.1-451 • API: 2.28
Notice we didn't update the API (we had running experiments).
We use a Ceph Storage Cluster; its interface is the same as S3.
I don't get what I've misconfigured.
The only thing I have not added is the "region" field in clearml.conf, because we literally don't have one; it's a self-hosted cluster.
You can try to replicate the S3 config I posted earlier.
It looks like I'm moving forward.
Setting the URL in clearml.conf without "s3" as suggested works (but I don't add the port there; not sure if that breaks something, we don't have a port):
host: " our-host.com "
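A minimal sketch of the matching clearml.conf section, assuming a self-hosted S3-compatible endpoint; the host, key, and secret values are placeholders:

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "our-host.com"  # no port, no "s3" prefix
                    key: "ACCESS_KEY"     # placeholder
                    secret: "SECRET_KEY"  # placeholder
                    secure: true
                    # region intentionally omitted (self-hosted cluster)
                }
            ]
        }
    }
}
```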
Then in test_task.py:

import clearml

task: clearml.Task = clearml.Task.init(
    project_name="project",
    task_name="task",
    output_uri=" None ",
)
I think the connection is created.
What I'm getting now is a bucket error; I suppose I have to specify it, so...
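One guess for the bucket error is to add a `bucket` field to the same clearml.conf credentials entry; the bucket name below is a placeholder and should match the one used in `output_uri`:

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    bucket: "my-bucket"   # placeholder
                    host: "our-host.com"
                    key: "ACCESS_KEY"     # placeholder
                    secret: "SECRET_KEY"  # placeholder
                    secure: true
                }
            ]
        }
    }
}
```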
Can I do it while I have multiple ongoing training runs?