
Yes, it was working.
When this error occurred, all experiment projects in the ClearML UI disappeared.
Then I ran docker container ls
Every container works fine except mongo, which is stuck in “restarting” status.
The error attached in this post is logs from mongo container.
We also tried changing the storage engine from “WiredTiger” to “mmapv1”.
Running out of space was indeed an issue, but we have already moved the clearml-storage to a bigger one.
After I ran docker-compose -f docker-compose.yml down and then docker-compose -f docker-compose.yml up -d,
the elasticsearch container got this error:
ElasticsearchException[failed to bind service]; nested: IOException[failed to test writes in data directory [/usr/share/elasticsearch/data/nodes/0/indices/mQ-x_DoZQ-iZ7OfIWGZ72g/_state] write permission is required]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes/0/indices/mQ-x_DoZQ-iZ7OfIWGZ72g/_state/.es_temp_file];
clearml-el...
Restarting MongoDB doesn’t help either.
mongo: 3.6.5, clearml-server: latest
Just an update: this issue was fixed as follows.
First, start the mongo container in bash mode. [sudo docker-compose -f docker-compose.yml run mongo bash]
Then run the MongoDB repair. [mongod --dbpath /data/db --repair]
The DB was repaired successfully and restored.
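The two repair steps above can also be scripted. This is only a minimal sketch, assuming it is run from the folder that holds docker-compose.yml, that the service is named mongo as in the commands above, and that the regular mongo container is stopped first. `repair_command` and `run_repair` are hypothetical helper names, and the sketch passes mongod directly to docker-compose run instead of opening bash first.

```python
import subprocess

COMPOSE_FILE = "docker-compose.yml"  # same file name as in the commands above


def repair_command(compose_file=COMPOSE_FILE, service="mongo", dbpath="/data/db"):
    """Build a one-off `docker-compose run` command that runs mongod --repair."""
    return [
        "docker-compose", "-f", compose_file, "run", service,
        "mongod", "--dbpath", dbpath, "--repair",
    ]


def run_repair(dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = repair_command()
    if dry_run:
        print(" ".join(cmd))
        return 0
    # Assumption: the current user may run docker-compose (prefix with sudo otherwise).
    return subprocess.run(cmd, check=False).returncode
```

Calling run_repair() with the default dry_run=True only prints the command, so it can be reviewed before anything touches the database files.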
I already ran chmod 777 on /opt/clearml/data.
Or are there other folders I need to grant permission on?
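One thing that may be worth checking: chmod 777 without -R only changes the top-level folder, while the Elasticsearch error points at a nested directory. Below is a minimal sketch that lists everything under a folder that is still not world-writable. `find_not_world_writable` is a hypothetical helper name, and whether world-writable is the right target (rather than changing the owner) is an assumption to verify against the ClearML Server docs.

```python
import os
import stat


def find_not_world_writable(root):
    """Return root and every path below it that lacks the o+w permission bit."""
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        # Collect every directory and file directly under dirpath.
        candidates.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    # lstat inspects the entry itself and does not follow symlinks.
    return sorted(p for p in candidates if not os.lstat(p).st_mode & stat.S_IWOTH)
```

For example, find_not_world_writable("/opt/clearml/data") returns an empty list only if the permission change reached every nested file and folder.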
Did I migrate the data correctly using the steps I took?
Since I have mounted a NAS, I want my server to access the NAS directly.
Hello, after following the steps you mentioned at https://clearml.slack.com/archives/CTK20V944/p1659702067809619?thread_ts=1659694970.919069&cid=CTK20V944
the server can now start properly, but the ClearML UI doesn’t show any of the experiments that I cloned from serverA. Any suggestions? Thank you!
{"type": "server", "timestamp": "2021-12-22T10:56:48,500Z", "level": "DEBUG", "component": "o.e.a.s.TransportSearchAction", "cluster.name": "clearml", "node.name": "clearml", "message": "All shards failed for phase: [query]", "cluster.uuid": "0aOZYv7bQD--rcBaPQvSJQ", "node.id": "6hqtnbZLSyCot85eDVoHAw" }
{"type": "server", "timestamp": "2021-12-22T10:56:48,500Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "clearml", "node.name": "clearml", "message": "path: /events-log-d...
I followed the installation steps on this page:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac/
Then I replaced ServerB’s /opt/clearml/data with ServerA’s /opt/clearml/data.
I’ll try that approach
Oh, I just realized that the mongo versions on ServerA and ServerB don’t match.
The problem was resolved by updating the mongo image to 4.0.23, the same as serverA.
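A mismatch like this can be spotted before copying data by comparing the image tags in the two docker-compose files. A minimal sketch follows, assuming each image is declared on a plain `image: name:tag` line. `image_tags` and `mismatched_images` are hypothetical helper names, and the regex is deliberately simple (it does not handle registries with a port number).

```python
import re

# Matches lines such as "    image: mongo:4.0.23" (optionally quoted).
IMAGE_RE = re.compile(
    r"^\s*image:\s*['\"]?(?P<repo>[^\s:'\"]+)(?::(?P<tag>[^\s'\"]+))?",
    re.MULTILINE,
)


def image_tags(compose_text):
    """Map image repository -> tag for every `image:` line in a compose file."""
    return {
        m.group("repo"): m.group("tag") or "latest"
        for m in IMAGE_RE.finditer(compose_text)
    }


def mismatched_images(text_a, text_b):
    """Return {repo: (tag_a, tag_b)} for images present in both files with different tags."""
    a, b = image_tags(text_a), image_tags(text_b)
    return {repo: (a[repo], b[repo]) for repo in a.keys() & b.keys() if a[repo] != b[repo]}
```

Reading both servers' docker-compose.yml files into strings and passing them to mismatched_images would report the mongo entry if one side uses 3.6.5 and the other 4.0.23.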
I mean migrating the data from serverA to serverB.
I just replaced serverB’s /opt/clearml/data with ServerA’s.
My code is attached above. The ClearML version is 1.0.5.
Example of filename: patch_ver3_dense_image_model_2 - srr_b4v2-large - step2 - patch_ver3_dense_image_model_2 - srr_b4v2-large - step1 - patch_ver3_dense_image_model_2_weights.03-1.31_weights.15-0.26_weights.10-0.18.h5
import tensorflow as tf
from clearml import Model
from clearml.model import InputModel


def get_model_id(model_name, tags=None):
    print("Model name: ", model_name)
    print("Model tags: ", tags)
    response = Model.query_models(model_name=model_name,
                                  tags=tags)
    if not response:
        raise ValueError('your model name and tags result in empty query')
    model_data = None
    for model_obj in response:
        if model_obj.name == model_name:
            ...
I call add_files and then upload.
Is it OK if the paths on ServerA and ServerB are different?
For example, ServerA stores files at /opt/clearml but ServerB stores them at /some_path/clearml
Sorry, how do I create a new version of a dataset?
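For reference, one common pattern is to create a child dataset that names the existing one as its parent, then add files, upload and finalize. This is only a sketch under assumptions: `create_child_version` and its arguments are hypothetical, and the exact behaviour of `Dataset.get`, `Dataset.create`, `add_files`, `upload` and `finalize` should be checked against the installed ClearML version.

```python
def create_child_version(project, name, new_files_path):
    """Create a new dataset version whose parent is the existing dataset."""
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Dataset

    # Look up the existing dataset that the new version builds on.
    parent = Dataset.get(dataset_project=project, dataset_name=name)

    # The child inherits the parent's files and only stores the changes.
    child = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent.id],
    )
    child.add_files(path=new_files_path)  # register new or changed files
    child.upload()                        # push the file contents to storage
    child.finalize()                      # close this version for further edits
    return child.id
```

The returned ID identifies the new version; the parent stays untouched.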
There’s another log from the elastic container.
The logs are too long, so I had to store them in a txt file.
I think these are all the errors in the container.
I also attached all the logs in a file.
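Since the Elasticsearch log lines shown earlier are JSON objects, a long log file can be narrowed down to the warnings and errors. A minimal sketch, assuming one JSON object per line with the `level`, `timestamp`, `component` and `message` fields seen above; `filter_log_lines` is a hypothetical helper name, and lines that are not valid JSON (for example truncated ones) are skipped.

```python
import json


def filter_log_lines(lines, levels=("WARN", "ERROR")):
    """Return (timestamp, level, component, message) for log lines at the given levels."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON log record (e.g. a stack trace line)
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed record
        if record.get("level") in levels:
            kept.append((
                record.get("timestamp"),
                record.get("level"),
                record.get("component"),
                record.get("message"),
            ))
    return kept
```

It accepts any iterable of strings, so an open text file can be passed in directly.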
Yes, I backed up only Mongo path.
OS: Ubuntu
It works fine, but after some time MongoDB stops working. We found out from docker-compose ps that MongoDB is restarting.
No, we ran out of space 3 days ago, but the MongoDB issue only occurred today.
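Spotting a container stuck in a restart loop can be automated. A minimal sketch, assuming the Docker CLI is on the PATH and that the status column of a restarting container starts with the word "Restarting"; `restarting_containers` and `check` are hypothetical helper names.

```python
import subprocess

# Prints one "name<TAB>status" line per container, including stopped ones.
PS_COMMAND = ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"]


def restarting_containers(ps_output):
    """Return the names of containers whose status begins with 'Restarting'."""
    names = []
    for line in ps_output.splitlines():
        name, _, status = line.partition("\t")
        if status.strip().lower().startswith("restarting"):
            names.append(name.strip())
    return names


def check():
    """Run docker ps and return the containers that are currently restarting."""
    out = subprocess.run(PS_COMMAND, capture_output=True, text=True, check=True).stdout
    return restarting_containers(out)
```

Running check() periodically would flag the mongo container as soon as it drops into the restarting state.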