
And once this is done, what is the file server IP good for? Will it redirect to the bucket?
This just keeps getting better and better.... 🤩
Martin: In your trains.conf, change the value files_server: 's3://ip:port/bucket'
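For context, a sketch of where that value would sit in trains.conf - the host, port, and bucket below are placeholders, not values from this thread:
```
api {
    # web_server / api_server stay pointed at the trains server

    # Serve task files from the S3/MinIO bucket instead of the
    # built-in file server (placeholder endpoint and bucket):
    files_server: "s3://127.0.0.1:9000/bucket"
}
```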
Isn't this a client configuration (trains-init)? Shouldn't there be some change to the server configuration (/opt/trains/config...)?
Continuing on this discussion... What is the relationship between configuring files_server and all the rest we just talked about, and the default_output_uri?
To be clearer - how do I refrain from using the built-in file server altogether and use MinIO for any storage need?
I know I can configure the file server in trains-init - but that only touches the client side. What about the container on the trains server?
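If it helps, one client-side way to route a task's outputs to MinIO is the documented output_uri parameter of Task.init - a minimal sketch, with placeholder project, task, endpoint, and bucket names:
```python
from clearml import Task

# Send this task's artifacts and models to a MinIO bucket instead of
# the built-in file server (endpoint and bucket are placeholders).
task = Task.init(
    project_name='examples',
    task_name='minio storage',
    output_uri='s3://127.0.0.1:9000/clearml-bucket',
)
```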
Loading part from task B:
```python
def get_models_from_task(task: clearml.Task, model_artifact_substring: str = 'iter_') -> dict:
    """
    Extract all models saved as artifacts with the specified substring
    :param task: Task to fetch from
    :param model_artifact_substring: Substring for recognizing models among artifacts
    :return: Mapping between iter number and model instance
    """
    # Extract models from task (models are named iter-XXX where XXX is the iteration number)
    model_...
```
2021-10-11 10:07:19 ClearML results page:
2021-10-11 10:07:20
```
Traceback (most recent call last):
  File "tasks/hpo_n_best_evaluation.py", line 256, in <module>
    main(args, task)
  File "tasks/hpo_n_best_evaluation.py", line 164, in main
    trained_models = get_models_from_task(task=hpo_task)
  File "tasks/hpo_n_best_evaluation.py", line 72, in get_models_from_task
    with open(pickle_path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/elior/.clearml/c...
```
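For reference, a sketch of how the fetching part might be written defensively with ClearML's task.artifacts mapping and get_local_copy() - the helper name and the iter_ naming convention mirror the snippet above; everything else is an assumption:
```python
import pickle
import clearml

def get_models_from_task(task: clearml.Task, model_artifact_substring: str = 'iter_') -> dict:
    """Map iteration number -> unpickled model for matching artifacts."""
    models = {}
    for name, artifact in task.artifacts.items():
        if model_artifact_substring not in name:
            continue
        # Downloads (or reuses) a cached local copy of the artifact file;
        # may return None if the file cannot be fetched.
        pickle_path = artifact.get_local_copy()
        if pickle_path is None:
            raise RuntimeError(f'Could not fetch a local copy of artifact {name!r}')
        with open(pickle_path, 'rb') as f:
            iter_number = int(name.split(model_artifact_substring)[-1])
            models[iter_number] = pickle.load(f)
    return models
```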
Moreover, in each pipeline I have 10 different settings of task A -> task B (and then task C); in each run, 1-2 fail randomly.
Thx DangerousDragonfly8 💪
I assume we are talking about the IP I would find here, right?
https://www.whatismyip.com/
I am noticing that the files are saved locally. Is there any chance that the files are overwritten during the run, or get deleted at some point and then replaced?
Yes, they are local - I don't think there is a possibility they are getting overwritten... But that depends on how ClearML names them. I showed you the code that saves the artifacts, but this code runs multiple times from a given template with different values - essentially it creates the same task about 10 times with different param...
The scenario I'm going for is never to run on the dev machine, so all I'll need to do once the server + agents are up is to add task.execute_remotely() after the Task.init line - and then, when the script is executed on the dev machine, it won't actually run but rather enqueue itself for the agent to run it?
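That is the documented pattern; a minimal sketch, assuming a queue named 'default' (project, task, and queue names are placeholders):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')

# On the dev machine this enqueues the task for an agent and exits the
# local process; the agent then runs the full script from here on.
task.execute_remotely(queue_name='default', exit_process=True)

# Code below this line only actually executes on the agent.
```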
(it works now, with 20 GB)
So dynamic and static are basically the same thing, just with dynamic I can edit the artifact while the experiment is running?
Second, why would it be overwritten if I run a different run of the same experiment? As I saw, each object is stored under a directory with the task ID, which is unique per run, so I assume I won't be overwriting artifacts saved under the same name in different runs (regardless of static or dynamic).
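For reference, a sketch contrasting the two flavors as ClearML exposes them - upload_artifact stores a one-off snapshot, while register_artifact keeps a pandas DataFrame synced while the task runs (names and values below are placeholders):
```python
import pandas as pd
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact flavors')

# Static artifact: uploaded once as a snapshot; re-uploading the same
# name within the same task replaces the previous snapshot.
task.upload_artifact(name='iter_100', artifact_object={'accuracy': 0.91})

# Dynamic artifact: a DataFrame that ClearML keeps syncing while the
# task runs, so in-place edits show up on the server.
df = pd.DataFrame({'loss': [0.5]})
task.register_artifact(name='live_metrics', artifact=df)
df.loc[len(df)] = [0.4]  # picked up by the registered artifact
```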