
Can you tell me exactly which API call you are using for spinning up? I would like to debug and try to use boto3 myself in order to spin up an instance, so I can understand where the problem is coming from
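For debugging this by hand, here is a minimal sketch. It assumes the launch goes through the EC2 `run_instances` call, which the thread does not confirm, and uses `DryRun=True` so nothing is actually started. The helper name `dry_run_launch` and the placeholder AMI and instance type are made up for illustration.

```python
def dry_run_launch(ec2_client, image_id, instance_type):
    """Ask EC2 to validate a run_instances request without launching anything.

    Returns the error code EC2 reports. With DryRun=True, boto3 raises a
    ClientError whose code is "DryRunOperation" when the request would have
    succeeded, or e.g. "UnauthorizedOperation" when permissions are missing.
    """
    try:
        ec2_client.run_instances(
            ImageId=image_id,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
            DryRun=True,  # validate only, do not start an instance
        )
    except Exception as err:
        # botocore's ClientError carries the parsed response on `.response`
        response = getattr(err, "response", None)
        if response is None:
            raise  # not an AWS API error, let it surface
        return response.get("Error", {}).get("Code", "Unknown")
    return "NoError"


# Usage (needs boto3 and AWS credentials; values below are placeholders):
#   import boto3
#   client = boto3.client("ec2", region_name="us-east-1")
#   print(dry_run_launch(client, "ami-xxxxxxxx", "t3.micro"))
```

If this returns something other than "DryRunOperation", the returned code is usually a good hint whether the problem is permissions, the AMI, or the instance type.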
Anyway I checked the base task, and this is what it has in installed packages (seems like it doesn't list all the real packages in the environment)
I assume trains passes it as is, so I think the quoting I mentioned might work
:face_palm: 🤔 :man-tipping-hand:
Version 1.1.1
Snippet of which part exactly?
Yeah, logs saying "file not found", here is an example
This is part of a bigger process which takes quite some time and resources. I hope I can try this soon, if it will help get to the bottom of this
I'm asking because the DSes we have are working on multiple projects, and they have only one trains.conf file. I wouldn't want them to edit it each time they switch projects
so basically - if she has new commits locally that weren't pushed it won't work
But if she did not commit her latest changes, and now she enqueues - it will work?
Saving part from task A:
pipeline = trials.trials[index]['result']['pipeline']
output_prefix = 'best_iter_' if i == 0 else 'iter_'
task.upload_artifact(name=output_prefix + str(index), artifact_object=pipeline)
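For the loading side in task B, a minimal sketch, assuming the artifact names follow the same `best_iter_` / `iter_` prefix scheme as the saving code above. The helper names `artifact_name` and `load_pipeline` are hypothetical; `task.artifacts[...]` and `.get()` are the regular ClearML artifact accessors.

```python
def artifact_name(index, is_best):
    """Rebuild the name used when the artifact was uploaded in task A."""
    prefix = 'best_iter_' if is_best else 'iter_'
    return prefix + str(index)


def load_pipeline(source_task, index, is_best=False):
    """Fetch and deserialize one pipeline artifact from task A.

    `source_task.artifacts` maps artifact names to artifact objects,
    and `.get()` downloads and deserializes the stored object.
    """
    name = artifact_name(index, is_best)
    return source_task.artifacts[name].get()


# Usage (needs clearml and access to the server; the task id is a placeholder):
#   from clearml import Task
#   task_a = Task.get_task(task_id="<task A id>")
#   pipeline = load_pipeline(task_a, index=3, is_best=True)
```

If the "file not found" error shows up here, it is worth checking that the name built on the loading side matches what the UI lists under the task's artifacts.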
Manual model registration?
actually I was thinking about models that weren't trained using clearml, like pretrained models etc
I assume that at some points in the execution, the client (where the task is running) is sending JSONs to the mongo service, and that is what we see in the web UI.
Since we are talking about a case where there is no internet available, maybe these could be dumped into files/stdout and let the user manually insert them.
The manual insertion UX could be something like a CLI copy-paste or an endpoint for files - but since your UX is so good ( 🙂 ) I'm sure you'll figure this part out better
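To make the idea concrete, here is a rough sketch of what "dump into files and insert later" could look like on the client side. It only illustrates the suggestion: the event fields and the helper names `dump_event` and `load_events` are invented for the example and are not the server's real schema or API.

```python
import json


def dump_event(event, path):
    """Append one event as a single JSON line to a local file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")


def load_events(path):
    """Read the events back in order, e.g. before uploading them manually."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage (hypothetical event payloads):
#   dump_event({"type": "scalar", "name": "loss", "value": 0.42}, "events.jsonl")
#   for event in load_events("events.jsonl"):
#       print(event)
```

One JSON object per line keeps the file append-only, so a crash mid-run loses at most the last event.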
I just tried setting the conf in the section Martin said, it works perfectly
I don't fully get it - it says it has to be enqueued
later today or tomorrow, I'll update
If you want we can do live zoom or something so you can see what happens
Oh... from the docs I understood that I don't have to run the script, that I can either configure it in the UI or with the script (wizard), so I ignored it up until now
I'm using iteration = 0 at the moment, and I "choose" the max and it shows as a column... But the column is not the scalar name (because it cuts it and puts the > sign to signal max).
For the sake of comparing and sorting, it makes sense to log a scalar with a given name without the iteration dimension
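The workaround described above can be sketched roughly like this, assuming a ClearML `Logger` and its `report_scalar(title, series, value, iteration)` method. The wrapper name `report_summary_scalar` is hypothetical, and pinning `iteration=0` is just the convention from this message, not an official "no iteration" mode.

```python
def report_summary_scalar(logger, name, value):
    """Log a single summary value under a fixed iteration.

    Pinning iteration to 0 means the scalar has exactly one point, so the
    "last", "min" and "max" columns in the experiment table all show it.
    """
    logger.report_scalar(title=name, series=name, value=value, iteration=0)


# Usage (needs clearml and a running task):
#   from clearml import Task
#   task = Task.current_task()
#   report_summary_scalar(task.get_logger(), "best_val_accuracy", 0.93)
```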
Yep, the trains server is basically a docker-compose based service.
All you have to do is change the ports in the docker-compose.yml file.
If you followed the instructions in the docs you should find that file in /opt/trains/docker-compose.yml, and then you will see that there are multiple services (apiserver, elasticsearch, redis, etc.), and in each there might be a section called ports, which states the mapping of the ports.
The number on the left, is ...
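As a small illustration of editing such a mapping: in Docker Compose's short syntax a ports entry reads HOST:CONTAINER (optionally IP:HOST:CONTAINER), so only the host side needs to change. The helper `remap_host_port` is hypothetical and the port numbers are example values, not necessarily the ones in your file.

```python
def remap_host_port(mapping, new_host_port):
    """Return a compose short-syntax port mapping with a new host port.

    Handles "HOST:CONTAINER" and "IP:HOST:CONTAINER"; the container side
    (the last field) is left untouched.
    """
    parts = mapping.split(":")
    if len(parts) < 2:
        raise ValueError("expected HOST:CONTAINER, got %r" % mapping)
    parts[-2] = str(new_host_port)  # host port is the field before the last
    return ":".join(parts)


# Usage:
#   remap_host_port("8080:8080", 9090)            # "9090:8080"
#   remap_host_port("127.0.0.1:8008:8008", 9008)  # "127.0.0.1:9008:8008"
```

After changing the entries in the file, the services have to be recreated (for example with docker-compose down and up) for the new mapping to take effect.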
what should I paste here to diagnose it?
I suspect that it has something to do with remote execution / local execution of pipelines, because we play with this, so sometimes the pipeline task itself executes on the client, and sometimes on the host (where the agent also is)
AgitatedDove14 all I did was to create this metric as "last" and then turned on the "max" and "min" and then turned them off
I can't reproduce it now but:
I restarted the services and it didn't help
I deleted the columns, and created them again after a while and it helped
Hahahah thanks for the help SuccessfulKoala55 & CostlyOstrich36
I really do feel it would be nice to have the ability to easily configure the Cleanup Service to clean up only specific projects / tasks, as it's a common use case to have a project dedicated to debugging and the like