ElegantKangaroo44 it seems to work here?!
https://demoapp.trains.allegro.ai/projects/0e152d03acf94ae4bb1f3787e293a9f5/experiments/48907bb6e870479f8b230e6b564cd52e/output/metrics/plots
Hi @<1610083503607648256:profile|DiminutiveToad80>
This depends on how you configure the agents in your clearml.conf
You can use HTTPS if user/pass are configured, and you can force SSH, which will auto-mount your host's SSH folder into the container and use it.
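For illustration, the relevant agent section of clearml.conf looks roughly like the sketch below (values are placeholders; check the reference conf linked below for the exact fields):
```
agent {
    # HTTPS cloning: the agent uses these credentials (or a personal access token)
    git_user: "myuser"
    git_pass: "mytoken"

    # SSH cloning: force SSH and the agent mounts the host's ~/.ssh into the container
    force_git_ssh_protocol: true
}
```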
https://github.com/allegroai/clearml-agent/blob/0254279ed5987fbc69cebae245efaea33aec1ff2/docs/cl...
It seems to fail when trying to download the model:
    local_download = StorageManager.get_local_copy(uri, extract_archive=False)
  File "/opt/venv/lib/python3.7/site-packages/clearml/storage/manager.py", line 47, in get_local_copy
    cached_file = cache.get_local_copy(remote_url=remote_url, force_download=force_download)
  File "/opt/venv/lib/python3.7/site-packages/clearml/storage/cache.py", line 55, in get_local_copy
    if helper.base_url == "file://":
And based on the error I suspect the...
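If it helps to isolate it, a minimal sketch that tries the same download directly (the URI below is a hypothetical placeholder for the failing model URL):
```python
from clearml import StorageManager

# If this direct call fails too, the problem is the storage credentials / URI,
# not the code that wraps it.
local_path = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/path/to/model.pkl",  # hypothetical model URI
    extract_archive=False,
)
print(local_path)
```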
I would suggest deleting them immediately when they're no longer needed.
This is the idea for the next RC; it will delete them after it is done using them 🙂
Are Kwargs supported in functions decorated as a pipeline component?
They are, but I think the main issue is the casting; without prior knowledge, everything will be a string.
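As a sketch (component and argument names here are made up), giving the component parameters explicit types/defaults provides the "prior knowledge" so the values can be cast back instead of staying strings:
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["result"])
def scale_lr(epochs: int = 10, lr: float = 0.001):
    # typed defaults/annotations let the incoming kwargs be cast back
    # to int/float rather than arriving as plain strings
    return epochs * lr
```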
Hi GrotesqueOctopus42
Despite having reuse_last_task_id=True on Task.init, it always creates a new task ID. Anyone ever had this issue?
So the way "reuse_last_task_id=True" works is that if there are no artifacts on the Task it will reuse it, but when running inside jupyter it always has artifacts (the notebook itself), so it starts a new Task.
You can however pass a specific Task ID and it will reuse it "reuse_last_task_id=aabb11", would that help?
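For example (the project/task names and the Task ID are placeholders):
```python
from clearml import Task

# Reuse a specific previous Task instead of relying on the automatic
# "no artifacts" heuristic, which never triggers inside Jupyter.
task = Task.init(
    project_name="examples",
    task_name="notebook run",
    reuse_last_task_id="aabb11",  # the specific Task ID to reuse
)
```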
MelancholyElk85
How do I add files without uploading them anywhere?
The files themselves need to be packaged into a zip file (so we have an immutable copy of the dataset). This means you cannot "register" existing files (in your example, files on your S3 bucket?!). The idea is to protect your dataset against changes on the one hand, but on the other to allow you to change it and store only the changeset.
Does that make sense ?
But this will require some code changes...
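Roughly, the flow looks like this (dataset names and the local path are placeholders):
```python
from clearml import Dataset

# add_files() snapshots the files so an immutable zipped copy is stored;
# it does not just "register" remote paths.
ds = Dataset.create(dataset_name="my_dataset", dataset_project="datasets")
ds.add_files(path="/data/local_folder")
ds.upload()    # package the files into zip archives and upload them
ds.finalize()  # lock this version; later changes go into a child version
```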
BulkyTiger31 could it be there is some issue with the elastic container ?
Can you see any experiment's metrics ?
HighOtter69 , let me check something
Hi LethalDolphin75
I think you are right, there isn't one (although I remember a discussion about it...)
Anyhow it will be very easy to implement, just inherit from:
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/parameters.py#L111
And return the power of the parent value here:
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/parameters.py#L146
And
https://github.com/allegroai/...
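A very rough sketch of such a subclass (the class name and the exact base-class signature are assumptions; check the linked parameters.py for your version):
```python
from clearml.automation.parameters import UniformParameterRange

class LogUniformParameterRange(UniformParameterRange):
    """Sample the exponent uniformly, then return base ** exponent."""

    def __init__(self, name, min_value, max_value, base=10,
                 step_size=None, include_max_value=True):
        super().__init__(name, min_value, max_value,
                         step_size=step_size, include_max_value=include_max_value)
        self.base = base

    def get_value(self):
        # raise the uniformly sampled exponent to the chosen base
        return {k: self.base ** v for k, v in super().get_value().items()}
```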
SubstantialElk6
The `~<package name with first letter dropped> == a.b.c` entry is a known conda/pip temporary-install issue (a leftover from a previous package install).
The easiest fix is to find the site-packages folder and delete the broken package folder, or create a new virtual environment.
BTW:
pip freeze will also list these broken packages
ProudMosquito87 I think this is what you are looking for: https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L101
Hi @<1523702000586330112:profile|FierceHamster54>
I think I'm missing a few details on what is logged, and a reference to the git repo?
Hi PunyGoose16 ,
I think the website is probably the easiest 🙂
https://clear.ml/contact-us/
I think they get back to you quite quickly
Yes, it is fully supported and should work.
Could you share the full execution log ?
Hi ReassuredTiger98
I think DefiantCrab67 solved it 🙂
https://clearml.slack.com/archives/CTK20V944/p1617746462341100?thread_ts=1617703517.320700&cid=CTK20V944
on the host machine or inside the containers that are spinning on the host machine ?
Hi SmoggyGoat53
There is a storage limit on the file server (basically a 2GB per-file limit); this is the cause of the error.
You can upload the 10GB to any S3-like solution (or a shared folder). Just set the "output_uri" on the Task (either at Task.init or with Task.output_uri = "s3://bucket").
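For example (the bucket path is a placeholder):
```python
from clearml import Task

# Point model/artifact uploads at S3 instead of the built-in file server
task = Task.init(
    project_name="examples",
    task_name="large model training",
    output_uri="s3://my-bucket/models",
)
# or, on an existing task:
task.output_uri = "s3://my-bucket/models"
```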
main clearml repo?
Yep that sounds right 🙂 thank you!
DeliciousBluewhale87
Upon ssh-ing into the folders in both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there...
Hmm that means it is working...
Do you see any *.conf files there? What do they contain? (Do they point to the correct clearml-server config?)
MortifiedCrow63, hmmm, can you test with a manual upload and verify?
(also what's the clearml version you are using)
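Something along these lines should do as a manual test (paths and bucket are placeholders); if this also fails or hangs, the issue is with the storage credentials/endpoint rather than the Task upload:
```python
from clearml import StorageManager

remote_url = StorageManager.upload_file(
    local_file="/tmp/model.pt",                   # any local file
    remote_url="s3://my-bucket/debug/model.pt",   # hypothetical destination
)
print(remote_url)
```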
Okay that means it is running in virtual environment mode.
On the original Task (the one you enqueued) what were the installed packages (specifically the torch/torchvision) ?
Hi TrickyRaccoon92
... would any running experiment keep a cache of to-be-sent-data, fail the experiment, or continue the run, skipping the recordings until the server is back up?
Basically they will keep trying to send data to the server until it is up again (you should not lose any of the logs)
Are there any clever functionality for dumping experiment data to external storage to avoid filling up the server?
You mean artifacts or the database ?
You can try just pulling the "metric" section of the Task, but I cannot imagine the network bandwidth is the issue?
Could it be load on the clearml-server (i.e. it needs to handle lots of requests ?)
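For reference, pulling just the last scalar values of a Task is a fairly small request, something like (the Task ID is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id="aabbccdd11223344")  # hypothetical Task ID
# returns only the last reported scalar metrics, not the full history
print(task.get_last_scalar_metrics())
```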
Hi MotionlessSeagull22
Hmm, I'm not sure this is possible in the UI.
You can compare multiple experiments and view the images as thumbnails one next to the other, but the full view will be a single image...
You can however right click on the image and get a direct link, then open a new tab ... :(
Hi UnsightlySeagull42
Just making sure, the two scripts are on your git repo ?
Hi @<1523702000586330112:profile|FierceHamster54>
Nope 🙂 nothing to worry about.
That said, do notice the open-source file server is not secured; this does not mean it will spill data on the server, but it does mean you should probably put it behind a VPN or use S3/GCP/Azure if it is open to the public internet.
It appears that "they sell that" as Triton Management Service, part of
. It is possible to do through their API, but would need to be explicit.
We support that, but this is not dynamic loading; it is just removing and adding models, it does not unload them from the GPU memory (GRAM).
That's the main issue: when we unload the model, it is unloaded. To do dynamic loading, they need to be able to keep it in RAM and unload it from GRAM; that's the feature that is missing in all Triton deployme...