btw, there are "[2020-09-02 15:15:40,331] [9] [WARNING] [urllib3.connectionpool] Connection pool is full, discarding connection: elasticsearch" warnings in the apiserver logs again
we’re using the latest ClearML server and client versions (1.2.0)
Requirement already satisfied (use --upgrade to upgrade): celsusutils==0.0.1
I'm not sure it's related to the domain switch, since we upgraded to the newest ClearML server version at the same time
okay, so if there’s no workaround atm, should I create a GitHub issue?
if you click on the experiment name here, you get a 404 because the link looks like this:
https://DOMAIN/projects/PROJECT_ID/EXPERIMENT_ID
when it should look like this:
https://DOMAIN/projects/PROJECT_ID/experiments/EXPERIMENT_ID
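Until the UI link is fixed, a hypothetical client-side helper (not part of ClearML) can rewrite the broken form into the expected one, e.g. for scripts that post experiment links somewhere. It assumes the only difference is the missing `experiments` path segment, exactly as in the two URLs above:

```python
from urllib.parse import urlsplit, urlunsplit


def fix_experiment_link(url: str) -> str:
    """Hypothetical helper: insert the missing 'experiments' path segment.

    https://DOMAIN/projects/PROJECT_ID/EXPERIMENT_ID
      -> https://DOMAIN/projects/PROJECT_ID/experiments/EXPERIMENT_ID
    Links that already contain the segment are returned unchanged.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Broken form has exactly three segments: projects / PROJECT_ID / EXPERIMENT_ID
    if len(segments) == 3 and segments[0] == "projects":
        segments.insert(2, "experiments")
    new_path = "/" + "/".join(segments)
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


print(fix_experiment_link("https://DOMAIN/projects/PROJECT_ID/EXPERIMENT_ID"))
# https://DOMAIN/projects/PROJECT_ID/experiments/EXPERIMENT_ID
```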
m5.xlarge EC2 instance (4 vCPUs, 16 GB RAM), 100GB disk
not sure what you mean. I used to call task.set_initial_iteration(task.get_last_iteration()) in the task-resuming script, but in the training code I explicitly pass global_step=epoch to the TensorBoard writer
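One possible explanation, stated as an assumption rather than a confirmed diagnosis: if ClearML adds the offset set by `set_initial_iteration()` to every step it receives, then also passing an absolute `global_step=epoch` counts the already-finished epochs twice. A plain-Python sketch of that arithmetic (function names are hypothetical, no ClearML calls involved):

```python
from typing import List


def logged_iteration(initial_iteration: int, global_step: int) -> int:
    # ASSUMPTION: the logger shifts each reported step by the initial offset.
    return initial_iteration + global_step


def resumed_steps(last_iteration: int, epochs_to_run: int, absolute_steps: bool) -> List[int]:
    """Iterations that would show up in the UI after resuming at last_iteration."""
    steps = []
    for i in range(1, epochs_to_run + 1):
        epoch = last_iteration + i  # epoch counter continues from the checkpoint
        # absolute_steps=True mimics global_step=epoch;
        # False mimics a step counted from zero in the resumed run.
        step = epoch if absolute_steps else i
        steps.append(logged_iteration(last_iteration, step))
    return steps


print(resumed_steps(10, 3, absolute_steps=True))   # [21, 22, 23]  (offset applied twice)
print(resumed_steps(10, 3, absolute_steps=False))  # [11, 12, 13]
```

Under that assumption, only one of the two offsets should be kept: either the initial iteration or the absolute step, not both.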
do you have any idea why the cleanup task keeps failing, then? (it worked before the update)
sorry, my bad, after some manipulations I made it work. After initialization I have to manually change HTTP to HTTPS in the config file for the Web and Files servers (not the API server), but besides that it works
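The manual step could be scripted. This is a sketch that assumes the generated config has `web_server`, `api_server` and `files_server` entries in its `api { ... }` section; the helper name and host names are made up for illustration. It switches only the Web and Files server URLs to HTTPS and leaves the API server as it is:

```python
import re

# Keys to switch to HTTPS; api_server is deliberately left untouched.
HTTPS_KEYS = ("web_server", "files_server")


def force_https(conf_text: str, keys=HTTPS_KEYS) -> str:
    """Hypothetical helper: rewrite 'key: http://...' to 'key: https://...'."""
    pattern = re.compile(
        # key, separator (':' or '='), optional opening quote, then the scheme
        r'^(\s*(?:%s)\s*[:=]\s*"?)http://' % "|".join(map(re.escape, keys)),
        re.MULTILINE,
    )
    return pattern.sub(r"\1https://", conf_text)


sample = """api {
    web_server: http://clearml.example.com
    api_server: http://clearml.example.com:8008
    files_server: "http://clearml.example.com:8081"
}
"""
print(force_https(sample))
```

Run against the real file contents, it would leave every other line of the config as is.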
nice, thanks! I'll check if it solves the issue first thing tomorrow in the morning
nope, the old cleanup task fails with: trains_agent: ERROR: Could not find task id=e7725856e9a04271aab846d77d6f7d66 (for host: ) Exception: 'Tasks' object has no attribute 'id'
weirdly enough, curl http://apiserver:8008 from inside the container works
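To repeat that check from Python inside the same container (closer to what the SDK sees than curl), here is a stdlib-only sketch. The helper name is hypothetical; `http://apiserver:8008` is the address from the curl command above:

```python
import urllib.error
import urllib.request


def can_reach(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if the server answers at all, like the curl check.

    An HTTP error status still means the server responded, so it counts as
    reachable; only connection-level failures return False.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # got a response, just not a 2xx one
    except OSError:  # URLError, refused connections and timeouts all derive from OSError
        return False


print(can_reach("http://apiserver:8008"))
```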
copy-pasting entire training command into command line 😃
I updated S3 credentials, I'll check if they work later
it doesn't explain the inability to delete logged images and texts, though
more like collapse/expand, I guess. Or pipelines that you can compose after running experiments, to show how the experiments are connected to each other
parents and children. maybe tags, maybe separate tab or section, idk. I wonder if anyone else is interested in this functionality, for us this is a very common case
weird
this is what I got in installed packages without adding the direct link:
torch==1.6.0.dev20200430+cu101
torchvision==0.7.0.dev20200430+cu101
I decided to restart the containers one more time, this is what I got.
I had to restart Docker service to remove the containers
is it in documentation somewhere?
don't know if it's relevant, but I also added a new user to apiserver.conf today
Error
Failed to get Scalar Charts