Yeah, the ultimate goal I'm trying to achieve is to flexibly run tasks, for example: before running, declare how many resources I need, and the agent will run the task as soon as it finds there are enough resources
Check out Task.execute_remotely()
You can put it anywhere in your code; when execution reaches it, if you are running without an agent it will stop the process and re-enqueue the task to be executed remotely. On the remote machine the call itself becomes a no-op.
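For example, a minimal sketch (the queue name here is just an example):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="remote execution")

# ... local setup / quick sanity checks run here ...

# Running locally: stops this process and re-enqueues the task on the
# "default" queue. Running under an agent: this call is a no-op.
task.execute_remotely(queue_name="default", exit_process=True)

# From here on, the code only runs on the agent's machine
print("running on the remote machine")
```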
I...
You should have a download button when you hover over the table, I guess that would be the easiest.
If needed I can send SDK code, but unfortunately there is no single call for that
So two folders with artifacts per experiment. I was wondering if there was a more efficient solution and if it could be combined.
Not sure I follow, the two subfolders are for two different things, isn't that how it is supposed to be?
If this is the case, then you have to set a shared PV for the pods, this way they can actually have a persistent cache, which would also be shared.
BTW: a single function call might not be a perfect match for a pipeline component; the overhead of starting a node might not be negligible, as it needs to install the required python packages, bring in the code, etc.
Does it mean I can use the clearml-serving helm chart alone?
Unrelated: clearml-serving can be deployed on k8s or with docker-compose, regardless of where/how clearml-server is deployed
ColossalDeer61 btw, it turns out the docker-compose services file was misconfigured on GitHub 😞 I suggest you get the latest copy of it:
`curl ... -o docker-compose.yml`
Thanks! I think I was able to locate the issue, but I wanted to verify 🙂
PompousHawk82 unfortunately this is kind of binary, either you have full tracking of load/save operations or you do not.
This warning message will disappear in the next version as we will be able to log multiple models under the same Task :)
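If you want the other direction, i.e. turning the automatic binding off for a framework, here's a minimal sketch (disabling PyTorch auto-logging is just an example):
```
from clearml import Task

# Framework auto-logging is all-or-nothing per framework:
# passing False for a framework disables tracking of its save/load calls.
task = Task.init(
    project_name="examples",
    task_name="no pytorch auto-logging",
    auto_connect_frameworks={"pytorch": False},
)
```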
Found it, definitely a bug in the callback; it has no effect on the HPO process itself
they are just neighboring modules to the function I am importing.
So I think that if you specify the repo, on the remote machine you will end up with the code of the component sitting at the root folder of the repo; from there I assume you can import the rest, the root git path should be part of your PYTHONPATH automatically.
wdyt?
To auto upload the model you have to tell clearml to upload it somewhere, usually by passing output_uri to Task.init or setting the default_output_uri in the clearml.conf
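For example, a minimal sketch (the bucket path is just a placeholder):
```
from clearml import Task

# output_uri tells clearml where to upload model snapshots
# (can also be a shared folder, gs:// or azure:// destination)
task = Task.init(
    project_name="examples",
    task_name="auto model upload",
    output_uri="s3://my-bucket/models",
)
```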
Hi @<1657918706052763648:profile|SillyRobin38>
I'm curious to know if it's possible to prevent uploading a duplicate endpoint.
...and we attempt to upload it again without any changes to the command content,
Basically you overwrite it, and yes, possible 🙂
any other aspect, could the system prevent the duplicate upload?
so basically check the hash and say, no need to upload?
Hi @<1545216070686609408:profile|EnthusiasticCow4>
is there a way to get the date from the InputModel?
You should be able to with model._get_model_data()
But I think we should have it all exposed, wdyt?
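Something along these lines should work; note this is an internal call, and the exact field name on the returned object (e.g. created) is my assumption:
```
from clearml import InputModel

model = InputModel(model_id="<model-id>")

# internal API, may change between versions
data = model._get_model_data()
print(data.created)  # creation date of the model entry (field name assumed)
```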
Okay how do I reproduce it ?
PompousParrot44 you can always manually store/load models, for example: https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/examples/reporting/model_config.py#L35
Sure, you can patch any framework with something similar to what we do in xgboost, any such PR will be greatly appreciated! https://github.com/allegroai/trains/blob/master/trains/binding/frameworks/xgboost_bind.py
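As a rough sketch of the manual flow (file names and the model ID are placeholders):
```
from clearml import Task, OutputModel, InputModel

task = Task.init(project_name="examples", task_name="manual model logging")

# manually register a weights file on the current task
# (uploads to the task's output destination, e.g. output_uri, if one is set)
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="model.pt")

# later, load a registered model by its ID and fetch a local copy of the weights
input_model = InputModel(model_id="<model-id>")
local_weights = input_model.get_local_copy()
```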
So why is it trying to upload to "//:8081/files_server:" ?
What do you have in the trains.conf on the machine running the experiment ?
CrookedWalrus33 can you send the entire log? (you can DM it to me)
Hi ExasperatedCrocodile76
It seems like it is using the conda package manager, were you using conda when you ran the code manually?
`ERROR: This cross-compiler package contains no program /home/ivan/miniconda3/envs/clearML/bin/x86_64-conda_cos6-linux-gnu-gfortran`
Why is it trying to install from source code?
BTW: can you test with the latest agent RC? (`pip install clearml-agent==1.4.0rc4`)
With offline mode, you can later import the execution (including artifacts etc.) if you need to; you just need the zip file it creates when you are done.
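A minimal sketch (the zip path is a placeholder):
```
from clearml import Task

# run completely disconnected; everything is stored in a local session folder
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
# ... training code, scalars, artifacts ...
task.close()  # the offline session zip is created when the task ends

# later, on a machine that can reach the server, import the run
Task.import_offline_session("/path/to/offline_session.zip")
```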
`post_optional_packages: ["google-cloud-storage", ]`
Will install it last (i.e. after all the other packages) but only if you have it in the "Installed packages" list
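In the agent's clearml.conf it would look roughly like this (the nesting under agent.package_manager is my assumption of where it usually lives):
```
agent {
    package_manager {
        # installed last, and only if it already appears in "Installed packages"
        post_optional_packages: ["google-cloud-storage", ]
    }
}
```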
ElegantKangaroo44 I think `TrainsCheckpoint` would probably be the easiest solution. I mean it will not be a must, but another option to deepen the integration, and allow us more flexibility.
Hi @<1581454875005292544:profile|SuccessfulOtter28>
Why would you archive an experiment?
Because you do not want to see it any longer (i.e. not very important), but you do not want to lose the ability to later do some forensics and look into it (meaning you do not want to completely delete it)
does that make sense ?
Here you go 🙂
(using trains_agent for easier all-data access)
```
from trains_agent import APIClient

client = APIClient()
log_events = client.events.get_scalar_metric_data(
    task='11223344aabbcc',
    metric='valid_average_dice_epoch')
print(log_events)
```
BTW: from the instance name it seems like it is a VM with preinstalled pytorch, why don't you add system site packages, so the venv will inherit all the preinstalled packages, it might also save some space 🙂
DeterminedToad86 see here:
https://github.com/allegroai/clearml-agent/blob/0462af6a3d3ef6f2bc54fd08f0eb88f53a70724c/docs/clearml.conf#L55
Change it in the agent's conf file to: `system_site_packages: true`
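i.e. roughly this section of the agent's clearml.conf:
```
agent {
    package_manager {
        # let the created venv inherit packages already installed on the system
        system_site_packages: true
    }
}
```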