now it is empty and I don't know where to find the credentials to connect one more Docker client
Can you otherwise interact with the WebApp normally?
Can you try to restart the agents and see if you get the same error?
Jake, the only way that I know to run the agents is to run docker
docker ps | grep clearml
2635cec202d9   allegroai/clearml:latest   "/opt/clearml/wrappe…"   3 days ago   Up 3 days   8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp   clearml-webserver
f8d307913fe0   allegroai/clearml:latest   "/opt/clearml/wrappe…"   3 days ago   Up About a minute   0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp   clearml-apiserver
afe2f21c44ce   redis:5.0   "docker-entrypoint.s…"   3 days ago   Up 3 days   6379/tcp   clearml-redis
a90c06e8de95   mongo:4.4.9   "docker-entrypoint.s…"   3 days ago   Up 3 days   27017/tcp   clearml-mongo
6df40da956ae   docker.elastic.co/elasticsearch/elasticsearch:7.16.2   "/bin/tini -- /usr/l…"   3 days ago   Restarting (1) 20 seconds ago   clearml-elastic
f9afae832275   allegroai/clearml:latest   "/opt/clearml/wrappe…"   3 days ago   Up 3 days   8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp   clearml-fileserver
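The Restarting (1) status of clearml-elastic is easy to miss in a wrapped docker ps listing. A minimal Python sketch for spotting it, assuming the listing is produced with docker ps -a --format '{{.Names}}\t{{.Status}}'; both helper names are hypothetical and not part of ClearML:

```python
import subprocess
from typing import Dict


def docker_ps_status() -> str:
    """Return 'name<TAB>status' lines for all containers (needs the docker CLI)."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def unhealthy_containers(ps_output: str, prefix: str = "clearml-") -> Dict[str, str]:
    """Map each ClearML container that is not cleanly 'Up' to its status string."""
    bad: Dict[str, str] = {}
    for line in ps_output.strip().splitlines():
        name, _, status = line.partition("\t")
        # 'Restarting (1) ...' or 'Exited (...)' fail the first check;
        # a failing healthcheck shows up as 'Up ... (unhealthy)'.
        if name.startswith(prefix) and (not status.startswith("Up") or "(unhealthy)" in status):
            bad[name] = status
    return bad


# Statuses copied from the listing above.
sample = (
    "clearml-webserver\tUp 3 days\n"
    "clearml-apiserver\tUp About a minute\n"
    "clearml-elastic\tRestarting (1) 20 seconds ago\n"
)
print(unhealthy_containers(sample))  # {'clearml-elastic': 'Restarting (1) 20 seconds ago'}
```

On the server itself, unhealthy_containers(docker_ps_status()) runs the same check against the live containers.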
so I ran docker-compose -f docker-compose.yml up in /opt/clearml/
just to make sure, and these are the errors I am seeing:
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.app_sequence] ################ API Server initializing #####################
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Initializing database connections
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb host mongo
clearml-apiserver | [2022-06-09 13:27:33,737] [9] [INFO] [clearml.database] Using override mongodb port 27017
clearml-apiserver | [2022-06-09 13:27:33,738] [9] [INFO] [clearml.database] Registering connection to auth-db ()
clearml-apiserver | [2022-06-09 13:27:33,739] [9] [INFO] [clearml.database] Registering connection to backend-db ()
clearml-apiserver | [2022-06-09 13:27:33,743] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 1 of 4. Waiting for 30sec
something red is going on with the apiserver
Hi Igor, profile is an old path; /settings/profile replaced it.
how did you navigate to it?
To be more precise: how do I clean up and reinstall? 🙂
and more logs 🙂 nice warning about running a dev server in production
clearml-apiserver | /usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py:208: ElasticsearchWarning: Legacy index templates are deprecated in favor of composable templates.
clearml-apiserver | warnings.warn(message, category=ElasticsearchWarning)
clearml-apiserver | [2022-06-09 13:28:03,875] [9] [INFO] [clearml.initialize] [{'mapping': 'events_plot', 'result': {'acknowledged': True}}, {'mapping': 'events_training_debug_image', 'result': {'acknowledged': True}}, {'mapping': 'events', 'result': {'acknowledged': True}}, {'mapping': 'events_log', 'result': {'acknowledged': True}}]
clearml-apiserver | [2022-06-09 13:28:03,876] [9] [INFO] [clearml.initialize] Applying mappings to ES host: [ConfigTree([('host', 'elasticsearch'), ('port', '9200')])]
clearml-apiserver | [2022-06-09 13:28:03,890] [9] [INFO] [clearml.initialize] [{'mapping': 'worker_stats', 'result': {'acknowledged': True}}, {'mapping': 'queue_metrics', 'result': {'acknowledged': True}}]
clearml-apiserver | [2022-06-09 13:28:03,890] [9] [INFO] [clearml.apiserver.mongo.initialize.migration] Started mongodb migrations
clearml-apiserver | [2022-06-09 13:28:03,897] [9] [INFO] [clearml.apiserver.mongo.initialize.migration] Finished mongodb migrations
clearml-apiserver | [2022-06-09 13:28:03,919] [9] [INFO] [clearml.service_repo] Loading services from /opt/clearml/apiserver/services
clearml-apiserver | [2022-06-09 13:28:04,068] [9] [INFO] [clearml.app_sequence] Exposed Services: auth.create_credentials auth.create_user auth.edit_credentials auth.edit_user auth.fixed_users_mode auth.get_credentials auth.get_token_for_user auth.login auth.logout auth.revoke_credentials auth.validate_token debug.ping events.add events.add_batch events.clear_scroll events.clear_task_log events.debug_images events.delete_for_task events.download_task_log events.get_debug_image_sample events.get_multi_task_plots events.get_scalar_metric_data events.get_scalar_metrics_and_variants events.get_task_events
events.get_task_latest_scalar_values events.get_task_log events.get_task_metrics events.get_task_plots events.get_vector_metrics_and_variants events.multi_task_scalar_metrics_iter_histogram events.next_debug_image_sample events.scalar_metrics_iter_histogram events.scalar_metrics_iter_raw events.vector_metrics_iter_histogram login.logout login.supported_modes models.add_or_update_metadata models.archive_many models.create models.delete models.delete_many models.delete_metadata models.edit models.get_all models.get_all_ex models.get_by_id models.get_by_id_ex models.get_by_task_id models.get_frameworks models.make_private models.make_public models.move models.publish_many models.set_ready models.unarchive_many models.update models.update_for_task organization.get_tags organization.get_user_companies pipelines.start_pipeline projects.create projects.delete projects.get_all projects.get_all_ex projects.get_by_id projects.get_hyper_parameters projects.get_hyperparam_values projects.get_model_metadata_keys projects.get_model_metadata_values projects.get_model_tags projects.get_project_tags projects.get_task_parents projects.get_task_tags projects.get_unique_metric_variants projects.make_private projects.make_public projects.merge projects.move projects.update projects.validate_delete queues.add_or_update_metadata queues.add_task queues.create queues.delete queues.delete_metadata queues.get_all queues.get_all_ex queues.get_by_id queues.get_default queues.get_next_task queues.get_queue_metrics queues.move_task_backward queues.move_task_forward queues.move_task_to_back queues.move_task_to_front queues.remove_task queues.update server.config server.endpoints server.get_stats server.info server.report_stats_option tasks.add_or_update_artifacts tasks.add_or_update_model tasks.archive tasks.archive_many tasks.clone tasks.close tasks.completed tasks.create tasks.delete tasks.delete_artifacts tasks.delete_configuration tasks.delete_hyper_params tasks.delete_many 
tasks.delete_models tasks.dequeue tasks.dequeue_many tasks.edit tasks.edit_configuration tasks.edit_hyper_params tasks.enqueue tasks.enqueue_many tasks.failed tasks.get_all tasks.get_all_ex tasks.get_by_id tasks.get_by_id_ex tasks.get_configuration_names tasks.get_configurations tasks.get_hyper_params tasks.get_types tasks.make_private tasks.make_public tasks.move tasks.ping tasks.publish tasks.publish_many tasks.reset tasks.reset_many tasks.set_requirements tasks.started tasks.stop tasks.stop_many tasks.stopped tasks.unarchive_many tasks.update tasks.update_batch tasks.validate users.create users.delete users.get_all users.get_all_ex users.get_by_id users.get_current_user users.get_preferences users.set_preferences users.update workers.get_activity_report workers.get_all workers.get_metric_keys workers.get_stats workers.register workers.status_report workers.unregister
clearml-apiserver | * Serving Flask app 'server' (lazy loading)
clearml-apiserver | * Environment: production
clearml-apiserver | WARNING: This is a development server. Do not use it in a production deployment.
clearml-apiserver | Use a production WSGI server instead.
clearml-apiserver | * Debug mode: off
clearml-apiserver | [2022-06-09 13:28:04,071] [9] [WARNING] [werkzeug] * Running on all addresses.
clearml-apiserver | WARNING: This is a development server. Do not use it in a production deployment.
clearml-apiserver | [2022-06-09 13:28:10,330] [9] [INFO] [clearml.service_repo] Returned 200 for debug.ping in 0ms
100 294 100 294 0 0 58800 0 --:--:-- --:--:-- --:--:-- 58800
Hi MortifiedDove27, try going to localhost:8080/login
all the Docker containers seem to be up, but a running experiment fails to communicate with the API server:
Retrying (Retry(total=239, connect=240, read=239, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=238, connect=240, read=238, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
Retrying (Retry(total=236, connect=240, read=236, redirect=240, status=240)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /auth.login
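ConnectionResetError(104) on /auth.login means the TCP connection to the API server port is being reset, which would be consistent with the apiserver not being ready. One way to check the API server independently of an experiment is to call debug.ping, which appears in the Exposed Services list and in the "Returned 200 for debug.ping" log line. A minimal sketch, assuming the server is published on localhost:8008 as in the docker ps output and that a plain GET on debug.ping is accepted; api_server_responds is a hypothetical helper:

```python
import http.client
import urllib.error
import urllib.request


def api_server_responds(base_url: str = "http://localhost:8008", timeout: float = 5.0) -> bool:
    """Return True if <base_url>/debug.ping answers with HTTP 200, else False."""
    url = base_url.rstrip("/") + "/debug.ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # URLError covers refused connections and non-2xx replies (HTTPError);
        # OSError covers resets such as ConnectionResetError(104) and timeouts.
        return False
```

Running api_server_responds() on the machine where the experiment fails separates "the API server is unreachable" from "the SDK credentials or configuration are wrong".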
are these ClearML connection errors? 🙂
How can I run experiments if ClearML doesn't work? It's just a service; why should it prevent an experiment from running?
MortifiedDove27, in the docker ps output you posted, everything seems to be running fine
docker ps | grep clearml
bd8fb61e8684   allegroai/clearml:latest   "/opt/clearml/wrappe…"   8 days ago   Up 8 days   8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp   clearml-webserver
f79c54472a6f   allegroai/clearml:latest   "/opt/clearml/wrappe…"   8 days ago   Up 8 days   0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp   clearml-apiserver
47c3340dd78d   docker.elastic.co/elasticsearch/elasticsearch:7.16.2   "/bin/tini -- /usr/l…"   8 days ago   Up 8 days   9200/tcp, 9300/tcp   clearml-elastic
1af43ee6ec38   mongo:4.4.9   "docker-entrypoint.s…"   8 days ago   Up 8 days   27017/tcp   clearml-mongo
d1523e810309   redis:5.0   "docker-entrypoint.s…"   8 days ago   Up 8 days   6379/tcp   clearml-redis
486e237b27d4   allegroai/clearml:latest   "/opt/clearml/wrappe…"   8 days ago   Up 8 days   8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp   clearml-fileserver
yes, I was only using the Experiments tab to compare scalars and view validation and training images, and I can see that information
Did you update the server while the agents were running?
Hi Shay, thanks for the reply
I just used the old path my browser remembered. Last week we updated the client and the server; both are running on our physical server
No, I always shut down the server first. But if you can give me step-by-step instructions for a clean install, I will be happy to do it
I was just wondering whether the agents somehow got confused when the server was changed
Credentials moved to the Workspace tab, but I get the feeling your API server is down
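To connect one more client or agent (the question at the start of the thread), the credentials created in the Workspace tab can also be passed as environment variables instead of through clearml.conf. A sketch, assuming the 8008/8080/8081 port mappings from the docker ps output and the CLEARML_* variable names ClearML documents for agents in Docker; the helper functions, host name and keys below are hypothetical placeholders:

```python
from typing import Dict, List


def clearml_client_env(host: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """ClearML connection settings for an extra client/agent, as environment variables."""
    return {
        "CLEARML_API_HOST": f"http://{host}:8008",    # clearml-apiserver
        "CLEARML_WEB_HOST": f"http://{host}:8080",    # clearml-webserver
        "CLEARML_FILES_HOST": f"http://{host}:8081",  # clearml-fileserver
        "CLEARML_API_ACCESS_KEY": access_key,
        "CLEARML_API_SECRET_KEY": secret_key,
    }


def docker_env_flags(env: Dict[str, str]) -> List[str]:
    """Turn the mapping into '-e KEY=VALUE' arguments for docker run."""
    flags: List[str] = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags


# Hypothetical host name and placeholder keys; use the ones from the Workspace tab.
env = clearml_client_env("clearml-host", "MY_ACCESS_KEY", "MY_SECRET_KEY")
print(docker_env_flags(env)[:2])  # ['-e', 'CLEARML_API_HOST=http://clearml-host:8008']
```

Keeping the keys in environment variables avoids baking them into an image; the same values could instead go into the api section of clearml.conf.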
Shay, you are correct, one of the containers is down. But aren't they all supposed to start as part of docker-compose -f docker-compose.yml up in /opt/clearml/?
I have restarted the containers with docker down several times, and nothing changes
This looks strange. Can you try re-running the dockers?