so it would be better just to use the original code files and the same conda env, if possible…
Hmm you can actually run your code in "agent mode" assuming you have everything else set up.
This basically means you set a few environment variables prior to launching the code:
Basically:
```
export CLEARML_TASK_ID=<The_task_id_to_run>
export CLEARML_LOG_TASK_TO_BACKEND=1
export CLEARML_SIMULATE_REMOTE_TASK=1
python my_script_here.py
```
the easiest way possible would be if I could just somehow run the task and let the LSF manage the environment
You mean let the LSF set the conda/venv ? or do you also mean to get the code-base, changes etc ?
Need - in my CI, the URL used is HTTPS but I need the SSH URL to be used. I see that we can pass repo to Task.create but not to Task.init
Are you cloning an existing Task, or creating a new one ?
BTW: is this on the community server or self-hosted (aka docker-compose)?
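If it's a new one, something along these lines is what I had in mind (just a sketch; the project/task names, repo URL, branch and script are placeholders):
```python
from clearml import Task

# Task.create lets you point at the repository explicitly,
# so you can pass the SSH URL instead of the HTTPS one your CI detects
task = Task.create(
    project_name="examples",
    task_name="ci-created task",
    repo="git@github.com:org/repo.git",  # SSH URL instead of https://
    branch="main",
    script="train.py",
)
```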
Is this a bug, or an issue with clearml not working correctly with hydra?
It might be a bug?! Hydra is fully supported, i.e. logging the state and allowing you to change the Arguments from the UI.
Is this example working as expected ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
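For reference, the core of that example is roughly this (a minimal sketch; the project/task names and the config name are placeholders):
```python
import hydra
from omegaconf import DictConfig, OmegaConf
from clearml import Task

@hydra.main(config_path=".", config_name="config")
def main(cfg: DictConfig) -> None:
    # Task.init hooks into Hydra: the composed config shows up in the UI under
    # CONFIGURATION, and command-line overrides appear in the "Hydra" hyperparameter section
    Task.init(project_name="examples", task_name="hydra example")
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```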
If you're referring to the run executed by the agent, it ends after this message because my script does not get the right args and so does not know what to...
self.task.upload_artifact('trend_step', self.trend_step + 1)
Out of curiosity why would every request generate an artifact ? Wouldn't it be better to report as part of the statistics ?
What would be the size / type of the matrix X
(i.e. np.size / np.dtype) ?
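Something like this is what I meant by reporting it as statistics (a sketch; the function, title/series names and request_idx are placeholders of mine):
```python
from clearml import Task

def report_trend_step(trend_step: int, request_idx: int) -> None:
    # Report the value as a scalar series instead of uploading a new artifact per request
    Task.current_task().get_logger().report_scalar(
        title="trend", series="step", value=trend_step, iteration=request_idx
    )
```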
GrotesqueDog77 when you say "the second issue" , do you mean the fact that both step 1 and step 2 should have access to the same filesystem?
Hi @<1535069219354316800:profile|PerplexedRaccoon19>
On debugging, it looks like indices are corrupt.
ishhhhh, any chance you have a backup?
Question - why is this the expected behavior?
It is 🙂 I mean the original python version is stored, but pip does not support replacing the python version. It is doable with conda, but then you have to use conda for everything...
Hmm there was this one:
https://github.com/allegroai/clearml/commit/f3d42d0a531db13b1bacbf0977de6480fedce7f6
Basically always caching steps (hence the skip), you can install from the main branch to verify this is the issue. an RC is due in a few days (it was already supposed to be out but got a bit delayed)
AstonishingSeaturtle47 , makes sense?
is it possible to change an existing model's URL?
Edit the DBs ... That's basically the only way 🙂
OutrageousSheep60 so this should work, no?
ds.upload(output_url='gs://<BUCKET>/', compression=0, chunk_size=100000000000)
Notice the chunk size is the maximum size (in bytes) per chunk, so it should basically be very large
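For completeness, the surrounding flow would look roughly like this (a sketch; the dataset project/name and local path are placeholders):
```python
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("/path/to/local/files")
# compression=0 stores files uncompressed; chunk_size caps the size of each uploaded chunk
ds.upload(output_url="gs://<BUCKET>/", compression=0, chunk_size=100000000000)
ds.finalize()
```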
AstonishingWorm64 can you share the full log (In the UI under Results/Console there is a download button)?
Yes this is definitely the issue, the agent assumes the docker user is "root".
Let me check something
OmegaConf
is the configuration, the overrides are in the Hyperparameters "Hydra" section
None
Simple file transfer test gives me approximately 1 GBit/s transfer rate between the server and the agent, which is to be expected from the 1Gbit/s network.
Ohhh I missed that. What is the speed you get for uploading the artifacts to the server? (you can test it with a simple toy artifact upload script)
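Something like this should be enough to measure it (a toy sketch; the project/task names and the ~1 GB array size are arbitrary):
```python
import time
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="upload speed test")
payload = np.random.rand(128, 1024, 1024)  # ~1 GB of float64 data

start = time.time()
task.upload_artifact("speed_test", payload)
task.flush(wait_for_uploads=True)  # block until the upload actually finishes
elapsed = time.time() - start
print(f"uploaded {payload.nbytes / 1e6:.0f} MB in {elapsed:.1f}s "
      f"({payload.nbytes / 1e6 / elapsed:.1f} MB/s)")
```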
So the TB issue was reported images were not logged.
We are now talking about the caching, which is actually a UI thing which clearml-server version are you using ?
And where are the images stored (the default files server or is it S3/GS etc.) ?
Basically try with the latest RC 🙂
pip install trains==0.15.2rc0
SubstantialElk6 when you say "Triton does not support deployment strategies" what exactly do you mean?
BTW: updated documentation already up here:
https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving
Oh no, I just saw the message @<1541954607595393024:profile|BattyCrocodile47> is this still relevant?
Hi EnthusiasticCoyote38
But once one process finished, it changed the task status to completed. Maybe you know a safe way to deal with such a situation? Or maybe the best way is to check the task status before uploading the object?
Well, you can actually forcefully set the state of the Task to running, then add artifacts, then close it?
would that work?
```
my_other_task.reload()
my_other_task.mark_started(force=True)
my_other_task.upload_artifact(...)
my_other_task.flush(wait_for_uploads=True)
my_othe...
```
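And if you want to check the status first before deciding to force it (a sketch; the task id and artifact name/object are placeholders):
```python
from clearml import Task

my_other_task = Task.get_task(task_id="<task_id>")
my_other_task.reload()
# If the task was already marked completed/closed, force it back to "in progress" first
if my_other_task.get_status() in ("completed", "stopped", "closed"):
    my_other_task.mark_started(force=True)
my_other_task.upload_artifact("my_artifact", {"some": "object"})
my_other_task.flush(wait_for_uploads=True)
```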
I see... We could definitely add an argument to control it. I'll update here once there is an RC
why doesn't this happen on my other experiments?
same 100+ reports ?
(My new theory is that calling Task.reload() will fix it, and it might be called internally for the other experiments, like when reporting models/artifacts)
Could that be the case ?
K8s + clearml-agent integration.
Hmm is this an on-prem k8s cluster?
GrittyKangaroo27 any chance you can open a GitHub issue so this is not forgotten ?
(btw: I think 1.1.6 is going to be released later today, then we will have a few RCs with improvements on the pipeline, I will make sure we add that as well)
Hi EmbarrassedSpider34
clearml-init will try to create ~/clearml.conf
I'm assuming that when you execute under root it is resolved to /root/clearml.conf
That said you might be able to override it with:
CLEARML_CONFIG_FILE=$HOME/clearml.conf sudo clearml-init