thanks! we copy S3 URLs quite often. I know that it's better to avoid double spaces in task names, but shit happens
do you have any idea why the cleanup task keeps failing then? (it used to work before the update)
we do log a lot of the different metrics, maybe this can be part of the problem
maybe db somehow got corrupted or smth like this? I'm clueless
just DMed you a screenshot where you can see a part of the token
Requirement already satisfied (use --upgrade to upgrade): celsusutils==0.0.1
LOL
wow
I was trying to find how to create a queue using the CLI
not quite. for example, I'm not sure which info is stored in Elastic and which is in MongoDB
I don't think so because max value of each metric is calculated independently of other metrics
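To make the point about independence concrete, here is a minimal sketch of a per-metric maximum. The function name and the `(metric, value)` input shape are made up for illustration; this is not how the server actually computes it, just the idea that each metric only ever looks at its own values.

```python
def max_per_metric(reports):
    """Return the largest value seen for each metric name.

    `reports` is an iterable of (metric_name, value) pairs (hypothetical shape).
    """
    maxima = {}
    for name, value in reports:
        # Each metric is compared only against its own previous maximum.
        if name not in maxima or value > maxima[name]:
            maxima[name] = value
    return maxima


print(max_per_metric([("loss", 0.5), ("acc", 0.9), ("loss", 0.7)]))
# prints {'loss': 0.7, 'acc': 0.9}
```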
standalone-mode gives me "Could not freeze installed packages"
it will probably screw up my resource monitoring plots, but well, who cares
overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine values I would like to continue from the latest iteration
but for the metrics, I explicitly pass the number of the epoch that my training is currently on. it's kind of weird that it adds an offset to the values that are explicitly reported, no?
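A rough sketch of the behaviour I would expect, not what the library does today: only the resource-monitor series get shifted by the last iteration of the previous run, and explicitly reported iterations stay untouched. `effective_iteration` and its parameters are hypothetical names; only the `:monitor:` prefix comes from the series names above.

```python
MONITOR_PREFIX = ":monitor:"  # prefix of the resource-monitor series mentioned above


def effective_iteration(title, reported_iteration, last_iteration):
    """Hypothetical rule: continue monitor series, keep explicit iterations as-is."""
    if title.startswith(MONITOR_PREFIX):
        # Resource monitoring continues from where the previous run stopped.
        return last_iteration + reported_iteration
    # Explicitly reported metrics (e.g. the epoch number) are not offset.
    return reported_iteration


print(effective_iteration(":monitor:gpu", 3, 100))  # prints 103
print(effective_iteration("val_loss", 3, 100))      # prints 3
```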
sounds like overkill for this problem, but I don't see any other pretty solution
I added the link just in case anyway
also, is there any way to install a repo that we clone as a package? we often use absolute imports and do "pip install -e ." to utilize it
sorry there are so many questions, we just really want to migrate to trains-agent)
I'll get back to you with the logs when the problem occurs again
yeah, that sounds right! thanks, will try
python3 slack_alerts.py --channel trains-alerts --slack_api "OUR_KEY" --include_completed_experiments --include_manual_experiments
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server
http://apiserver:8008 ?
http://OUR_IP:8081 http://OUR_IP:8080
http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
`...
Error
Failed to get Scalar Charts
nope, old cleanup task fails with trains_agent: ERROR: Could not find task id=e7725856e9a04271aab846d77d6f7d66 (for host: )
Exception: 'Tasks' object has no attribute 'id
weirdly enough, curl http://apiserver:8008 from inside the container works
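For reference, the same check can be done from Python inside the container, which is closer to what the agent does than curl. This is a standard-library sketch only; `is_reachable` is a made-up helper, and it deliberately ignores proxy environment variables on the assumption that `apiserver` is an internal hostname.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP server answers at `url`, whatever the status code."""
    # Empty ProxyHandler: connect directly and ignore http_proxy/https_proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is still reachable.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout, and similar.
        return False
```

Calling `is_reachable("http://apiserver:8008")` from the same container should tell whether the hostname resolves and the port answers for Python too.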
what if cleanup service is launched using ClearML-Agent Services container (part of the ClearML server)? adding clearml.conf to the home directory doesn't help