
BTW: what happens if you pass the same s3://bucket to Task.init output_uri ? I assume you are getting the same access issue ?
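For reference, a minimal sketch of what I mean (the bucket path is a placeholder):
from clearml import Task

# everything the task outputs (artifacts, model checkpoints) gets uploaded here
task = Task.init(
    project_name="examples",
    task_name="output_uri check",
    output_uri="s3://my-bucket/clearml",  # placeholder bucket
)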
Hi ShallowArcticwolf27
from the command line to a remote machine while loading a local .env file as a configuration object?
Where would the ".env" go to ? Are we trying to pass it to the remote machine somehow ?
check if the fileserver docker is running with docker ps
While I'll look into it, you can do:
from clearml import OutputModel
output_model = OutputModel()
output_model.update_weights("best_model.onnx")
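If you want it tied to the current task, a sketch (project/task names are placeholders):
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model upload")
# binding the model to the task means update_weights registers (and uploads)
# the weights file as this task's output model
output_model = OutputModel(task=task)
output_model.update_weights("best_model.onnx")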
@<1523701099620470784:profile|ElegantCoyote26> what's the target upload? also how come you are uploading a local file and auto deleting it, and then uploading the same one as artifact ?
I am running clearml-agent in docker mode btw.
Try -e PYTHONOPTIMIZE=1
in the docker args section, should do the same 🙂
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONOPTIMIZE
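If you'd rather set it from code, a sketch assuming the set_base_docker API (the image name is a placeholder):
from clearml import Task

task = Task.init(project_name="examples", task_name="docker args")
# image followed by extra docker args; -e injects the env var into the container
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04 -e PYTHONOPTIMIZE=1")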
🙂 CooperativeFox72 please see if you can send a code snippet to reproduce the issue. I'd be happy to solve it ...
Hi BlandPuppy7 , is this Trains related, are you trying to integrate it, and need help?
BTW: is this on the community server or self-hosted (aka docker-compose)?
Are there any references (vlog/blog) on deploying a real-time model and doing the continuous training pipeline in clear-ml?
Something along the lines of this one ?
https://clear.ml/blog/creating-a-fully-automatic-retraining-loop-using-clearml-data/
Or this one?
https://www.youtube.com/watch?v=uNB6FKIi8Wg
Could you verify you have 8 subfolders named 'venv.X' in the cache folder ~/.trains ?
DeliciousBluewhale87 You can have multiple queues for the k8s glue in priority order:
python k8s_glue_example.py --queue glue_q_high glue_q_low
Then if someone is doing 100 experiments (say HPO), they push into the "glue_q_low" queue, which means the glue will first pop Tasks from the high priority queue and, if it is empty, pop from the low priority queue.
Does that make sense ?
I just tested the master with https://github.com/jkhenning/ignite/blob/fix_trains_checkpoint_n_saved/examples/contrib/mnist/mnist_with_trains_logger.py on the latest ignite master and Trains, it passed, but so did the previous commit...
Hi MinuteWalrus85
This is a great question, and super important when training models. This is why we designed a whole system to manage datasets (including storage querying, balancing data, and caching). Unfortunately this is only available in the paid tier of Allegro... You are welcome to contact the sales guys: https://allegro.ai/enterprise/
🙂
If you mean like Canary ? then yes, but only on the KFServing backend (coming soon), since the engines themselves do not support it (this is basically a "routing" feature)
I see, so basically fix old links that are now not accessible? If this is the case you might need to manually change the document on the mongodb running in the backend
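Something along these lines, heavily hedged: the "backend" database, "task" collection, and the field path below are assumptions about the schema, so inspect your own deployment (and back up mongo) before running anything:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["backend"]  # assumed database name

OLD, NEW = "http://old-fileserver:8081", "http://new-fileserver:8081"
# "comment" is purely illustrative; locate the real field holding the link first
for doc in db["task"].find({"comment": {"$regex": OLD}}):
    db["task"].update_one(
        {"_id": doc["_id"]},
        {"$set": {"comment": doc["comment"].replace(OLD, NEW)}},
    )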
VexedCat68 the remote checkpoints (i.e. Models) represent the local storage, so if you internally overwrite the files, this is exactly what will happen in the backend. So the following should work (and store the last 5 checkpoints):
epochs += 1
torch.save(model.state_dict(), "model_{}.pt".format(epochs % 5))
Regarding deleting / getting models:
Model.remove(task.models['output'][-1])
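Putting it together, a minimal sketch of the rolling-checkpoint idea (the Linear model stands in for a real one):
from clearml import Task
import torch
import torch.nn as nn

task = Task.init(project_name="examples", task_name="rolling checkpoints")
model = nn.Linear(4, 2)  # placeholder model

for epoch in range(20):
    # ... actual training step goes here ...
    # cycling through 5 filenames overwrites old checkpoints locally,
    # so the backend also keeps exactly 5 output Models, updated in place
    torch.save(model.state_dict(), "model_{}.pt".format(epoch % 5))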
Okay, what you can do is the following:
assuming you want to launch task id aabb12
The actual slurm command will be:
trains-agent execute --full-monitoring --id aabb12
You can test it on your local machine as well.
Make sure the trains.conf is available in the slurm job
(use trains-agent --config-file to point to a globally shared one)
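If it helps, a sketch of generating and submitting the sbatch script from Python (the partition name and config path are assumptions, adjust to your cluster):
import subprocess
import textwrap

task_id = "aabb12"
script = textwrap.dedent(f"""\
    #!/bin/bash
    #SBATCH --job-name=trains-{task_id}
    #SBATCH --partition=gpu
    trains-agent --config-file /shared/trains.conf execute --full-monitoring --id {task_id}
""")
with open("launch_task.sh", "w") as f:
    f.write(script)
subprocess.run(["sbatch", "launch_task.sh"], check=True)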
What do you think?
So what will you query ?
Is it only for modified changes and not untracked files?
basically everything that "git diff" will output.
Then the agent will re-apply it on a remote machine
Metadata might be expensive, it's a RestAPI call, and we have found users putting hundreds of artifacts, with preview entries ...
This is strange... Could you send the browser console log, maybe there is an exception there
Hmm that sounds like the agent needs to access a vault with credentials per user, unfortunately this is not covered in the open-source 🙂 I "think" this is supported in the enterprise version as part of the permission management
ShakyOstrich31
I am reusing an old task ...
Which means that the old Task stores the requirements on the Task itself (see "Installed Packages" section). Notice it also stores the exact git commit to use.
When you are cloning the Task (i.e. in the pipeline), you should probably:
set the commit / branch to the latest in the branch
clear the "installed packages" section, which would cause the agent to use the "requirements.txt" stored in the git repo itself.
As far as I understand this s...
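A sketch of that flow from the SDK; whether set_script / set_packages are available depends on your clearml version, so treat these calls as assumptions and double-check (names are placeholders):
from clearml import Task

template = Task.get_task(project_name="examples", task_name="training template")
cloned = Task.clone(source_task=template, name="training (latest)")
# point the clone at the branch head instead of the stored commit
cloned.set_script(branch="master", commit="")
# clearing the packages should make the agent fall back to the repo's requirements.txt
cloned.set_packages([])
Task.enqueue(cloned, queue_name="default")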