
I'm just curious how the trains server on different nodes communicates about the task queue
We start manually: we tell the agent to just execute the task (notice we never enqueued it). If all goes well, we will get to the multi-node part :)
MysteriousBee56 what do you mean by "local repository"?
Like no git server, or local commit before pushing it ?
Thanks ScantChimpanzee51 !
Let me see what I can find, should be easy enough to fix now :)
BTW
Grafana Visualizing endpoint request latency as well as prediction result value distributions
I want to store only my raw data in my blob storage, and I want to create a Hyperdataset with all the artifacts, metrics, frames,
Yes that's exactly how it works.
This line adds a reference to the raw file (local/remote)
https://github.com/allegroai/clearml/blob/1b474dc0b057b69c76bc2daa9eb8be927cb25efa[…]es/hyperdatasets/data-registration/register_dataset_wit...
Is there any known issue with Amazon SageMaker and ClearML?
On the contrary it actually works better on Sagemaker...
Here is what I did on SageMaker:
- created a new SageMaker instance
- opened Jupyter notebook
- started a new notebook, conda_python3 / conda_py3_pytorch
Then, in the notebook, I just did "!pip install clearml" and Task.init
Is there any difference ?
Hi RobustRat47,
sorry for the delay,
Hi when we try and sign up a user with github.
wait, where are you getting this link?
So I think there are two bugs here?
- --args overrides="key=value" does not work
- request: add --hydra to override Hydra arguments (and if this is added, the first one is not needed)
Is that correct?
This is something that we do need if we are going to keep using ClearML Pipelines, and we need it to be reliable and maintainable, so I don't know whether it would be wise to cobble together a lower-level solution that has to be updated each time ClearML changes its serialisation code
Sorry if I was not clear, I do not mean for you to do unstable low-level access, I meant that pipelines are designed to be editable externally, they always deserialize themselves.
The only part that is mi...
VictoriousPenguin97 I'm assuming the exact same server version ?
You might also be able to find out exactly what needs to be pickled using the f_code of the function (but that's limited to the C implementation of Python).
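A minimal sketch of that idea, assuming Python 3 on CPython, where a function object exposes its code object as `__code__` (`f_code` is the attribute name on frame objects). The helper `referenced_names` and the example `make_scaler` are hypothetical, only there for illustration. It lists the names a function touches, which hints at what has to be available when the function is pickled or recreated elsewhere:

```python
import math


def make_scaler(factor):
    def scale(x):
        # `factor` comes from the enclosing scope, `math` is a global name
        return math.sqrt(x) * factor
    return scale


def referenced_names(func):
    """Return the global/attribute names and closure variables used by func."""
    code = func.__code__  # the compiled code object of the function
    return {
        "globals_and_attrs": set(code.co_names),
        "closure_vars": set(code.co_freevars),
    }


info = referenced_names(make_scaler(2.0))
print(info)
```

This only inspects the top-level code object of the function; nested functions and anything reached dynamically (for example via `getattr`) would not show up.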
Nice!
So "wait" is a better metaphor for me
So I would do something like (I might have a few typos but that's the gist):
from clearml import Model

def post_execute_callback_example(a_pipeline, a_node):
    # type: (PipelineController, PipelineController.Node) -> None
    print('Completed Task id={}'.format(a_node.executed))
    # wait until the model is tagged, then pass it as an argument
    while True:
        found = Model.query_models(...)  # model filter here, including tag and project
        if found:
            ...
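The "wait until something shows up" loop above can be sketched on its own, with a timeout so the callback cannot spin forever. This is only a sketch: `wait_for` and `fake_query` are hypothetical names, and `fake_query` stands in for the real model query (which would need a ClearML server):

```python
import time


def wait_for(query, timeout_sec=600.0, poll_sec=5.0):
    """Call query() until it returns something truthy, or give up after timeout_sec.

    Returns the query result, or None if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        found = query()
        if found:
            return found
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_sec)  # pause between checks instead of busy-looping


# Stand-in for the real query: returns a result on the third call
attempts = []


def fake_query():
    attempts.append(1)
    return ["model-a"] if len(attempts) >= 3 else []


result = wait_for(fake_query, timeout_sec=5.0, poll_sec=0.01)
print(result, "after", len(attempts), "attempts")
```

Inside the callback, the stand-in would be replaced by a small function wrapping the actual model filter.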
If you do not have a lot of workers, then I would guess it is the console outputs
Hi SmilingFrog76
Great question, sadly multi-node is never simple :)
Let's start with the basics, let's assume one worker is available and the other is not, what would you want to happen? (p.s. I'm not aware of flexible multi-node training frameworks, i.e. a framework that can detect another node is available and connect with it mid training, that said, it might exist :) )
Should I use update_weights_package
Yes
BTW, config.pbtxt you should pass when "registering" the endpoint with the CLI
Yes, that sounds like a good start, DilapidatedDucks58 can you open a github issue with the feature request ?
I want to make sure we do not forget
Hi ColossalAnt7
Following up on SuccessfulKoala55's answer
I saw that there is a config file where you can specify specific users and passwords, but it currently requires
- mount the configuration file (the one holding the user/pass) into the pod from a persistent volume .
I think the k8s way to do this would be to use mounted config maps and secrets.
You can use ConfigMaps to make sure the routing is always correct, then add a load-balancer (a.k.a a fixed IP) for the users a...
JitteryCoyote63 could you send the log maybe ?
Then it is, by default, the home folder (`~/.clearml`) that is running out of free space
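A quick way to check this, as a sketch: the helper `free_space_gb` below is hypothetical and uses only the standard library. It assumes the default `~/.clearml` location mentioned above, and walks up to the nearest existing folder in case that path has not been created yet:

```python
import shutil
from pathlib import Path


def free_space_gb(path):
    """Return the free space (in GiB) on the filesystem that holds `path`."""
    p = Path(path).expanduser()
    # If the folder does not exist yet, check its closest existing parent
    while not p.exists() and p != p.parent:
        p = p.parent
    return shutil.disk_usage(p).free / 1024 ** 3


clearml_home_free = free_space_gb("~/.clearml")
print("Free space near ~/.clearml: {:.1f} GiB".format(clearml_home_free))
```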
s like the
would be a really good starting place.
This is actually JS (TypeScript) ... not Python, not sure how to continue from there :)
GiganticTurtle0 this is exactly what I did, and ended up with two pipelines, comparing them produced what I expected (different arguments as passed by the script).
What are you getting ?
SillyPuppy19 are you aborting the experiment, or are you trying to protect against a crash? Is it like a callback functionality you are looking for?
TightElk12 I think this message belongs to a diff thread ;)
I believe AnxiousSeal95 is.
ElatedFish50 any specific reason for the question?
What sort of data would be stored in the venvs-build folder?
ClumsyElephant70 temporary (lifetime of the task execution) virtual environment, including the code etc. It is deleted and recreated for every new task launched (or restored from cache, if venvs_cache is enabled)
UnevenDolphin73 it's looking for any of the files:
None
quick update 1.0.2 will be ready in an hour, apologies :)
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
I think in the pipeline it was the original default, but it turns out for a lot of users this was not their default use case ...
Anyhow you can also pass repo="." which will load + detect the repo in the execution environment and automatically fill it in