Hi @<1547028031053238272:profile|MassiveGoldfish6>
What is the use case? The gist is that you want each component to run on a different machine, and you want clearml to do the routing of data and logic between them.
How would that work in your use case?
SarcasticSquirrel56
if I configure manually the pods for the different nodes, how do I make clearml server aware that those agents exist?
Basically the agents register themselves on your clearml-server, and they register which Queue(s) they listen to. In other words, the interface for choosing the different types of machines/GPUs is enqueuing the Task to different queues.
For example: Queue(1): "CUDA11_GPUx1" , Queue(2): "CUDA10_GPUx1"
Make sense ?
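A minimal sketch of that routing, assuming the two queue names from the example above. `pick_queue` and `enqueue_on` are hypothetical helper names (not part of clearml), and the enqueue call assumes the `clearml` package is installed and configured against your server:

```python
# Hypothetical mapping from a CUDA major version to the queue names used above.
QUEUE_BY_CUDA = {
    11: "CUDA11_GPUx1",
    10: "CUDA10_GPUx1",
}


def pick_queue(cuda_major: int) -> str:
    """Return the queue whose agents provide the requested CUDA version."""
    try:
        return QUEUE_BY_CUDA[cuda_major]
    except KeyError:
        raise ValueError(f"no queue configured for CUDA {cuda_major}") from None


def enqueue_on(task_id: str, cuda_major: int) -> None:
    """Send an existing Task to the queue served by the matching agents."""
    # Imported lazily so the mapping logic can be used without clearml installed.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    Task.enqueue(task, queue_name=pick_queue(cuda_major))
```

Any agent listening on the chosen queue will then pick the Task up, so the queue name is effectively the machine selector.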
EDIT:
I guess to achieve what I w...
So it seems to get the "hint" from the type:
This will work:
tf.summary.image('toy255', (ex * 255).astype(np.uint8), step=step, max_outputs=10)
wdyt, should it actually check min/max and manually cast it ?
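Something along these lines is what a manual min/max check could look like. This is just a sketch: `to_uint8_image` is a made-up helper, and treating floats in [0, 1] as normalized intensities is an assumption, not what the TensorBoard binding does today:

```python
import numpy as np


def to_uint8_image(img: np.ndarray) -> np.ndarray:
    """Cast an image to uint8, scaling only when it looks like a [0, 1] float image."""
    if img.dtype == np.uint8:
        return img
    lo, hi = float(img.min()), float(img.max())
    if lo >= 0.0 and hi <= 1.0:
        # Assumed convention: floats in [0, 1] are normalized intensities.
        img = img * 255.0
    # Clip so out-of-range values saturate instead of wrapping around.
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)
```

The obvious downside is ambiguity: a float image that happens to contain only values in [0, 1] but was meant as 0..255 would be scaled anyway, which is why an explicit cast on the caller side is the safer route.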
So good news: (1) The dashboard is being worked on as we speak. (2) We released clearml-serving, which does exactly that. The next release of clearml-serving will include integration with kfserving (under the hood), essentially managing the serving endpoints on top of the k8s cluster, wdyt?
Hi @<1523701079223570432:profile|ReassuredOwl55>
I want to kick off the pipeline and then check completion outside of the pipeline task.
Basically the pipeline is a Task (of a certain type).
You do the "standard" thing: you clone the pipeline Task, you enqueue it, and you wait for its status
task = Task.clone(source_task="<pipeline ID here>")
Task.enqueue(task, queue_name="services")
task.wait_for_status(...)
wdyt?
Could you maybe send a screenshot? This is very strange. Also, what's the trains version?
- I'm happy to hear you found a workaround
- Seems like there is something wrong with the way the pbtxt is being merged, but I need some more information
{'detail': "Error processing request: object of type 'NoneType' has no len()"}
Where are you seeing this error?
What are you seeing in the docker-compose log?
this topic is about the issue with reporting a configuration with a string inside a tuple that has backslash
So the encoding itself is done YAML style, and based on your example \b has to be encoded to \b because this is string encoding, the same way \n will become "new line"
Make sense ?
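To illustrate the escaping point with nothing but the standard library (JSON is only a stand-in here for the YAML-style encoder, which is an assumption; the actual encoder may differ in details):

```python
import json

backspace = "\b"        # one character: the backspace control code
literal = r"\b"         # two characters: a backslash followed by "b"

# JSON is used here only as a stand-in for the YAML-style encoder.
encoded_backspace = json.dumps(backspace)   # the control code is written as \b
encoded_literal = json.dumps(literal)       # the backslash itself is escaped

# Both round-trip to exactly what went in.
assert json.loads(encoded_backspace) == backspace
assert json.loads(encoded_literal) == literal
```

So if the value is meant to contain an actual backslash, it has to reach the encoder as a raw string (or with the backslash doubled), otherwise it is treated as an escape sequence.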
I think this is the discussion you are after:
https://clearml.slack.com/archives/C01H5VAUZ8R/p1612452197004900?thread_ts=1612273112.002400&cid=C01H5VAUZ8R
Hi DisgustedDove53
Now for the clearml-session tasks, a port-forward should be done each time if I need to access the Jupyter notebook UI for example.
So basically this is why the k8s glue has --ports-mode.
Essentially you set up a k8s service (doing the ingest TCP ports), then the template.yaml that is used by the k8s glue should specify said service. Then the clearml-session knows how to access the actual pod via the parameters the k8s glue sets on the Task.
Make sense ?
ERROR: Could not install packages due to an EnvironmentError:
[Errno 28] No space left on device
BTW: @<1523703080200179712:profile|NastySeahorse61> this sounds like docker is out of space on the main disk '/var/', where it stores all the images and temp file systems
This will cause your code to fail, as any runtime change to the container file system will raise this out of disk space error
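A quick way to confirm this from Python, as a sketch: `free_gib` and `has_room` are hypothetical helper names, and which path to check is an assumption about your setup (on a default Linux install, Docker keeps its data under /var/lib/docker):

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path: str, min_free_gib: float) -> bool:
    """True when at least `min_free_gib` GiB are free under `path`."""
    return free_gib(path) >= min_free_gib
```

For example, `has_room("/var/lib/docker", 10.0)` on the machine running the agent tells you whether there is still some headroom before the next package install.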
ZanyPig66 is this reproducible? This sounds like a bug, what's the TB version and OS you are using?
Is this example working for you (i.e. do you see debug images)?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
Can you post here the docker-compose.yml you are spinning? Maybe it is the wrong one?
Step 4 here:
https://github.com/thepycoder/asteroid_example#deployment-phase
TrickyRaccoon92
I guess elegant is the challenge 🙂
What exactly is the use case ?
corporate firewall... let's start with http 🙂
Hi SubstantialElk6
noted that clearml-serving does not support Spacy models out of the box and
So this is a good point.
To add any missing package to the preprocessing docker you can just add it to the following environment variable: https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/docker/docker-compose.yml#L83
EXTRA_PYTHON_PACKAGES="spacy>1"
Regarding a custom engine, basically this is supported with --engine custom
you c...
Multi-threaded multi-processes multi-nodes 🙂
Hi @<1523701295830011904:profile|CluelessFlamingo93>
What do you mean? What's the difference between ClearML server and self hosted? Both are self hosted, no?
Hi @<1535069219354316800:profile|PerplexedRaccoon19>
What do you mean by simulate?
You can manually set up and run a Task if you need:
'clearml-agent execute --id task_id' (add --docker for docker mode).
This will set up the env and run the task
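If you want to drive that from a script, a small wrapper could look like this. It is a sketch only: `build_execute_cmd` and `run_task_locally` are made-up names, and it assumes `clearml-agent` is installed and on the PATH:

```python
import subprocess
from typing import List


def build_execute_cmd(task_id: str, docker: bool = False) -> List[str]:
    """Build the clearml-agent command line for running a single Task."""
    cmd = ["clearml-agent", "execute", "--id", task_id]
    if docker:
        # Run the Task in docker mode instead of directly on the host.
        cmd.append("--docker")
    return cmd


def run_task_locally(task_id: str, docker: bool = False) -> int:
    """Run the Task with clearml-agent and return the process exit code."""
    return subprocess.run(build_execute_cmd(task_id, docker)).returncode
```

Keeping the command construction separate from the call makes it easy to print the exact command before running it.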
Yes the clearml-server AMI - we want to be able to back it up and encrypt it on our account
I think the easiest and safest way for you is to actually have full control over the AMI, and recreate once from scratch.
Basically any ubuntu/centos + docker and docker-compose should do the trick, wdyt ?
Hi @<1558986821491232768:profile|FunnyAlligator17>
What do you mean by:
We are able to set_initial_iteration to 0 but not get_last_iteration.
Are you saying that if your code looks like:
Task.set_initial_iteration(0)
task = Task.init(...)
and you abort and re-enqueue, you still have a gap in the scalars ?
Hi @<1555362936292118528:profile|AdventurousElephant3>
I think your issue is that Task supports two types of code:
- single script/jupyter notebook
- git repo + git diff
In your example (if I understand correctly) you have a notebook calling another notebook, which means the first notebook will be stored on the Task, but the second notebook (not being part of a repository) will not be stored on the Task, and this is why the agent fails to find the second notebook when it runs the code....
Then the type hints are not removed from helper and the code immediately crashes when being run
Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved
Hi VexedKangaroo32 , funny enough this is one of the fixes we will be releasing soon. There is a release scheduled for later this week, right after that I'll put here a link to an RC containing a fix to this exact issue.
send the agent's logs to log management and monitoring service,
These are stored in ELK, which was built to store large amounts of logs. I cannot see any reason why one would want to remove it?
Maybe if there would be a way to change their format, it could also help filtering them from my side.
You mean in the UI?
Do you think this is better ? (the API documentation is coming directly from the python doc-string, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633