same here, changing arguments in the Args section of Hyperparameters doesn’t work; the training script starts with the default values.
I updated the version in the Installed packages section before starting the experiment
I don’t connect anything explicitly; I’m using argparse. it used to work before the update
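for reference, this is the pattern I mean — a minimal sketch assuming trains hooks into argparse automatically once Task.init is called (the project/task names are made up, and the Task.init call is commented out so the snippet runs standalone):

```python
import argparse

# trains would auto-connect the parser after this call; names are hypothetical:
# from trains import Task
# task = Task.init(project_name="examples", task_name="argparse demo")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--epochs", type=int, default=10)

# the real script would call parser.parse_args() with no arguments;
# an explicit list is passed here so the sketch is self-contained
args = parser.parse_args(["--lr", "0.001"])
print(args.lr, args.epochs)
```

the expectation is that whatever shows up in the Args section of the Web UI overrides these defaults when the experiment is cloned and enqueued.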
thanks! this bug and cloning problem seem to be fixed
copy-pasting the entire training command into the command line 😃
I change the arguments in the Web UI, but it looks like they are not parsed by trains
I get "The connection has timed out" when I'm trying to reach 8081 port
yeah, backups take much longer, and we had to increase our EC2 instance volume size twice because of these indices
got it, thanks, will try to delete older ones
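in case it's useful to anyone else, here's a rough sketch of how I'd pick which indices to drop, assuming they're date-suffixed like `events-2021.01.15` — the naming scheme and cutoff are just assumptions about our setup:

```python
from datetime import datetime, timedelta

def old_indices(names, cutoff_days, today):
    """Return date-suffixed index names older than the cutoff.

    Assumes names look like 'events-YYYY.MM.DD'; anything that
    doesn't parse as a date is kept (never auto-delete what we can't date).
    """
    cutoff = today - timedelta(days=cutoff_days)
    stale = []
    for name in names:
        try:
            day = datetime.strptime(name.rsplit("-", 1)[1], "%Y.%m.%d")
        except (IndexError, ValueError):
            continue
        if day < cutoff:
            stale.append(name)
    return stale

# made-up index names standing in for GET _cat/indices output
names = ["events-2021.01.15", "events-2021.03.01", "kibana"]
stale = old_indices(names, cutoff_days=30, today=datetime(2021, 3, 10))
# each stale name would then go to the Elasticsearch delete index API:
# DELETE /<index>
```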
yeah, we did. let me check if explicitly setting credentials helps
name: "John Doe"
well okay, it's probably not that weird considering that worker just runs the container
I guess I could manually explore the different containers and their contents 😃 as far as I remember, I had to update Elastic records when we moved to the new cloud provider in order to update the model URLs
not quite. for example, I’m not sure which info is stored in Elastic and which is in MongoDB
if you click on the experiment name here, you get a 404 because the link looks like this:
when it should look like this:
btw, are there any examples of exporting metrics using the Python client? I could only find the last_metrics attribute of the task
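fwiw, I think newer versions also have task.get_reported_scalars() — I'm assuming the payload is a nested dict of {metric: {variant: {"x": [...], "y": [...]}}}; if so, flattening it to rows would look something like this (the sample payload is made up):

```python
def flatten_scalars(scalars):
    """Flatten a nested scalars dict into (metric, variant, iteration, value) rows.

    Assumes the shape {metric: {variant: {"x": [...], "y": [...]}}},
    which is what task.get_reported_scalars() appears to return.
    """
    rows = []
    for metric, variants in scalars.items():
        for variant, series in variants.items():
            for x, y in zip(series["x"], series["y"]):
                rows.append((metric, variant, x, y))
    return rows

# made-up sample standing in for task.get_reported_scalars()
sample = {"loss": {"train": {"x": [0, 1], "y": [0.9, 0.7]}}}
rows = flatten_scalars(sample)
```

from there it's trivial to dump the rows to CSV or a DataFrame.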
we do log a lot of different metrics; maybe that's part of the problem
it will probably screw up my resource monitoring plots, but well, who cares 😃
btw, I use Docker for training, which means the log_dir contents are removed for the continued experiment
thank you, I'll let you know if setting it to zero worked
this is what I got in installed packages without adding the direct link:
docker mode. they do share the same folder with the training data mounted as a volume, but only for reading the data.
awesome news 👍
everything is working as expected
nope, that's the point. quite often we run experiments separately, but they are related to each other. currently there's no way to see that one experiment is using a checkpoint from a previous experiment, since we have to manually insert the S3 link as a hyperparameter. it would be useful to see these connections. maybe instead of grouping, we could see which experiments are using artifacts of this experiment
nope, same problem even after creating a new experiment from scratch
new version worked
AnxiousSeal95 yeah, got it! thanks!