
So I assume trains expects nvidia-docker to be installed on the agent machine?
Moreover, since I'm going to use Task.execute_remotely
(and not go through the UI), is there any way to specify the docker image to be used in code?
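Something like this is what I have in mind - a sketch assuming Task.set_base_docker is the right hook; the image and queue name are placeholders:
```
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')

# ask the agent to run this task inside a specific docker image
# (the agent machine still needs docker / nvidia-docker installed)
task.set_base_docker('nvidia/cuda:11.4.2-runtime-ubuntu20.04')

# stop local execution here and enqueue the task for an agent
task.execute_remotely(queue_name='default')
```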
Let me try `docker-compose down --rmi all`
Yep, if communication is both ways, there is no way (that I can think of) it can be solved for offline mode.
But if the calls made from the server to the client are redundant in a specific setup (some functionality will not work, but enough valuable functionality remains), then it is possible to do it the manual way.
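A minimal sketch of that manual way, assuming clearml's offline-mode helpers are what's meant (project/task names and paths are placeholders):
```
from clearml import Task

# record everything locally instead of talking to the server
Task.set_offline(offline_mode=True)
task = Task.init(project_name='examples', task_name='offline run')
# ... run the experiment as usual ...

# later, from a machine that can reach the server, import the zipped session:
# Task.import_offline_session('/path/to/task_offline_session.zip')
```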
I set it to true and I have more packages installed now, but it still fails... here is the log, TimelyPenguin76
```
Successfully installed clearml-1.0.5 cloudpickle-1.6.0 cycler-0.10.0 hyperopt-0.2.5 kiwisolver-1.3.2 matplotlib-3.4.3 networkx-2.6.2 pandas-1.3.2 patsy-0.5.1 plotly-5.3.0 python-dateutil-2.8.2 statsmodels-0.12.2 tenacity-8.0.1 tqdm-4.62.2
Adding venv into cache: /home/elior/.clearml/venvs-builds/3.8
Running task id [24a54a473c234b00a126ec805d74046f]:
[.]$ /home/elior/.clearml/venvs...
```
And once this is done, what is the file server IP good for? Will it redirect to the bucket?
This just keeps getting better and better... 🤩
Martin: In your trains.conf, change the value `files_server: 's3://ip:port/bucket'`
Isn't this a client configuration ( trains-init
)? Shouldn't there also be a change to the server configuration ( /opt/trains/config...
)?
Continuing this discussion... what is the relationship between configuring files_server
(and all the rest we just talked about) and default_output_uri
?
To be clearer: how do I refrain from using the built-in file server altogether and use MinIO for every storage need?
I know I can configure the file server via trains-init
- but that only touches the client side. What about the container on the trains server?
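For reference, a sketch of what that client-side trains.conf / clearml.conf could look like when pointing everything at MinIO - host, port, bucket and credentials here are all placeholders:
```
api {
    # route "file server" traffic straight to the bucket instead of the built-in fileserver
    files_server: "s3://my-minio-host:9000/my-bucket"
}
sdk {
    development {
        # default upload destination for artifacts and models
        default_output_uri: "s3://my-minio-host:9000/my-bucket"
    }
    aws {
        s3 {
            credentials: [
                {
                    # point the S3 client at the MinIO endpoint
                    host: "my-minio-host:9000"
                    key: "minio-access-key"
                    secret: "minio-secret-key"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```
As far as I can tell, with settings like these the built-in fileserver container is simply bypassed for uploads, so nothing on the server side needs to know about the bucket.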
Loading part from task B:
```
def get_models_from_task(task: clearml.Task, model_artifact_substring: str = 'iter_') -> dict:
    """
    Extract all models saved as artifacts with the specified substring
    :param task: Task to fetch from
    :param model_artifact_substring: Substring for recognizing models among artifacts
    :return: Mapping between iter number and model instance
    """
    # Extract models from task (models are named iter-XXX where XXX is the iteration number)
    model_...
```
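A guess at how the truncated body might continue, based on the docstring and the traceback below - the iteration-number parsing and the use of get_local_copy() are assumptions on my part:
```
import pickle

import clearml

def get_models_from_task(task: clearml.Task, model_artifact_substring: str = 'iter_') -> dict:
    models = {}
    for name, artifact in task.artifacts.items():
        if model_artifact_substring not in name:
            continue
        # assumed naming scheme: 'iter_XXX' -> iteration number XXX
        iteration = int(name.split(model_artifact_substring)[-1])
        # download (or reuse a cached copy of) the pickled model
        pickle_path = artifact.get_local_copy()
        with open(pickle_path, 'rb') as f:
            models[iteration] = pickle.load(f)
    return models
```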
2021-10-11 10:07:19 ClearML results page:
```
2021-10-11 10:07:20
Traceback (most recent call last):
  File "tasks/hpo_n_best_evaluation.py", line 256, in <module>
    main(args, task)
  File "tasks/hpo_n_best_evaluation.py", line 164, in main
    trained_models = get_models_from_task(task=hpo_task)
  File "tasks/hpo_n_best_evaluation.py", line 72, in get_models_from_task
    with open(pickle_path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/elior/.clearml/c...
```
Moreover, in each pipeline I have 10 different settings of task A -> task B (and then task C), and in each run 1-2 of them fail randomly.
Thx DangerousDragonfly8 💪
I assume we are talking about the IP I would find here, right?
https://www.whatismyip.com/
I am noticing that the files are saved locally. Is there any chance that the files are overwritten during the run, or get deleted at some point and then replaced?
Yes, they are local - I don't think there is a possibility they are getting overwritten... But that depends on how clearml names them. I showed you the code that saves the artifacts, but this code runs multiple times from a given template with different values - essentially it creates the same task about 10 times with different param...
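For completeness, a sketch of the kind of save loop I mean - the placeholder objects stand in for the trained models, and the unique 'iter_<n>' names are what should keep uploads from clobbering each other:
```
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact save sketch')

# placeholder objects standing in for the models produced by the run
models = [{'weights': i} for i in range(10)]

for iteration, model in enumerate(models):
    # a unique name per artifact ('iter_0', 'iter_1', ...) avoids overwrites
    task.upload_artifact(name='iter_{}'.format(iteration), artifact_object=model)
```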