
it prints an empty dict
I’m doing Task.init() in the script, maybe it somehow resets connected parameters… but it used to work before, weird
I don’t connect anything explicitly, I’m using argparse, it used to work before the update
nope, same problem even after creating a new experiment from scratch
ValueError: Task has no hyperparams section defined
same here, changing arguments in the Args section of Hyperparameters doesn’t work, training script starts with the default values.
trains 0.16.0
trains-agent 0.16.0
trains-server 0.16.0
I updated the version in the Installed packages section before starting the experiment
copy-pasting entire training command into command line 😃
sounds like overkill for this problem, but I don’t see any other pretty solution 😃
on a side note, is there any way to automatically give more meaningful names to the running docker containers?
nope, the only changes to config that we made are adding web-auth and non-responsive tasks watchdog
just in case, this warning disappeared after I followed https://stackoverflow.com/questions/49638699/docker-compose-restart-connection-pool-full
I change the arguments in Web UI, but it looks like they are not parsed by trains
thanks! we copy S3 URLs quite often. I know that it’s better to avoid double spaces in task names, but shit happens 😄
yeah, backups take much longer, and we had to increase our EC2 instance volume size twice because of these indices
got it, thanks, will try to delete older ones
yeah, I am aware of trains-agent, we are planning to start using it soon, but still, copying original training command would be useful
after the very first click, there is a popup with credentials request. nothing happens after that
I guess, this could overcomplicate ui, I don't see a good solution yet.
as a quick hack, we can just use a separate name (e.g. "best_val_roc_auc") for all metric values for the current best checkpoint. then we can just add columns with the last value of this metric
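A minimal sketch of that hack in plain Python, assuming higher is better for the metric. The helper name `best_so_far` and the sample values are made up for illustration, and the actual call that reports the value to trains is left out on purpose.

```python
from typing import Iterable, List, Tuple


def best_so_far(values: Iterable[float]) -> List[Tuple[int, float]]:
    """Return (iteration, best value seen up to that iteration) pairs."""
    best = float("-inf")
    history = []
    for iteration, value in enumerate(values):
        # Keep the running maximum; a worse epoch leaves it unchanged.
        if value > best:
            best = value
        history.append((iteration, best))
    return history


# Hypothetical validation ROC AUC per epoch.
val_roc_auc = [0.71, 0.78, 0.75, 0.81, 0.80]

# This series would be reported under its own name, e.g. "best_val_roc_auc",
# so that its last value always belongs to the current best checkpoint.
best_val_roc_auc = best_so_far(val_roc_auc)

for iteration, value in best_val_roc_auc:
    print(iteration, value)
```

With a series like this, the "last value" column for `best_val_roc_auc` shows the score of the best checkpoint instead of the score of the final epoch.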
I added the link just in case anyway 😃
also, is there any way to install a repo that we clone as a package? we often use absolute imports and do "pip install -e ." to utilize it
sorry there are so many questions, we just really want to migrate to trains-agent)
fantastic, everything is working perfectly
thanks guys
we're using the latest version of clearml, clearml agent and clearml server, but we've been using trains/clearml for 2.5 years, so there are some old tasks left, I guess 😃
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
trains_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
http://OUR_IP:8081 http://OUR_IP:8080
http://apiserver:8008
WARNING: You are using pip version 20.1.1; however, version 20.3.3 is available.
`...
nope, old cleanup task fails with trains_agent: ERROR: Could not find task id=e7725856e9a04271aab846d77d6f7d66 (for host: )
Exception: 'Tasks' object has no attribute 'id'
weirdly enough, curl http://apiserver:8008 from inside the container works
problem is solved. I had to replace /opt/trains/data/fileserver with /opt/clearml/data/fileserver in Agent configuration, and replace trains with clearml in Requirements
do you have any idea why the cleanup task keeps failing then? (it used to work before the update)