OHH nice, I thought it was just some kind of job queue on up-and-running machines
It's much more than that, it's a way of life 🙂
But seriously now, it allows you to use any machine as part of your cluster and send jobs for execution from the web UI (any machine, even just a standalone GPU machine under your desk, or any cloud GPU instance, or a mix of the two together :)
Maybe I need to change something here:
apiserver.conf
Not sure, I'm still waiting on an answer... It...
(It would be nice to have all the PyPI releases tagged in GitHub btw)
I wanted to say, we listen ... and point to the tag, but for some reason it was not pushed LOL.
'config.pbtxt' could not be inferred. please provide specific config.pbtxt definition.
This basically means there is no configuration on how to serve the model, i.e. the size/type of the lower (input) layer and the output layer.
You can either store the configuration on the creating Task, as is done here:
https://github.com/allegroai/clearml-serving/blob/b5f5d72046f878bd09505606ca1147d93a5df069/examples/keras/keras_mnist.py#L51
Or you can provide it as a standalone file when registering the mo...
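For reference, a standalone config.pbtxt could look roughly like this (just a sketch; the platform, layer names and dims here are assumptions about your model, not taken from the example):
```
platform: "tensorflow_savedmodel"
input [
  {
    name: "dense_input"        # assumed input layer name
    data_type: TYPE_FP32
    dims: [ -1, 784 ]          # assumed flattened 28x28 input
  }
]
output [
  {
    name: "activation_2"       # assumed output layer name
    data_type: TYPE_FP32
    dims: [ -1, 10 ]           # 10 classes
  }
]
```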
One option is definitely having a base image that has the things needed. Anything else? Thanks!
This is a bit complicated: to get the cache to kick in you have to mount an NFS share into the pod as the cache (to create a persistent cache)
Basically, spin up an NFS pod to store the cache, and change the glue job template YAML to mount it into the pod (the default cache folders are /root/.cache/pip and /root/.clearml/pip-download-cache).
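Something along these lines in the pod template (just a sketch; the NFS server, export paths and volume names are placeholders for whatever your NFS pod exposes):
```yaml
spec:
  containers:
    - name: clearml-agent            # placeholder container name
      volumeMounts:
        - name: pip-cache
          mountPath: /root/.cache/pip
        - name: pip-download-cache
          mountPath: /root/.clearml/pip-download-cache
  volumes:
    - name: pip-cache
      nfs:
        server: <nfs-pod-service>    # placeholder NFS server address
        path: /exports/pip-cache
    - name: pip-download-cache
      nfs:
        server: <nfs-pod-service>
        path: /exports/pip-download-cache
```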
Make sense ?
Hi ItchyJellyfish73
You can always archive a Task/Model even when published
In the UI you can right-click and choose archive.
From code you need to add a system tag "archived":
from clearml import Task
t = Task.get_task(task_id='aabb')
t.set_system_tags(t.get_system_tags() + ['archived'])
And similarly for Model(model_id='aabb')
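i.e. something like this (a sketch, assuming the Model class exposes the same system-tags getter/setter as the Task):
```python
from clearml import Model

m = Model(model_id='aabb')
# assumption: same system-tags API as Task
m.set_system_tags(m.get_system_tags() + ['archived'])
```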
So, what I am referring to is the ability of a system to bring some rigor and robustness to experiment tracking, and also to enforce some thought on how things might be deployed, early on in the development process, whilst not being overly prescriptive and cumbersome
I cannot agree more!!
VivaciousPenguin66 We are working on trying to better understand how to solve this very delicate act of balance and offer some sort of "JIRA" for ML.
If this is okay with you, once product pe...
Hi, I would like to understand how I can set the pip cache location for my agent,
ClumsyElephant70 by default the pip cache (and all other cache folders) are mounted back into the host itself under ~/.clearml/
I'm assuming the idea is a shared cache; if this is the case, do:
docker_pip_cache = ~/my_shared_nfs/pip-cache
https://github.com/allegroai/clearml-agent/blob/e3e6a1dda81bee2dd20a64d09746568e415f1823/docs/clearml.conf#L139
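For context, in clearml.conf that sits under the agent section, something like (the NFS path is just an example):
```
agent {
    # shared pip cache mounted into the containers the agent spins up
    docker_pip_cache = ~/my_shared_nfs/pip-cache
}
```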
In theory, one could go over previously executed tasks, and create a copy of a specific scalar metric.
ShallowCat10 does that make sense in your scenario ?
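Something along these lines (a rough sketch; 'loss'/'train' are placeholder title/series names, and I'm assuming get_reported_scalars() returns the full history as x/y lists):
```python
from clearml import Task

source = Task.get_task(task_id='aabb')               # previously executed task
target = Task.init(project_name='examples', task_name='copy scalar')

# assumption: {title: {series: {'x': [iterations], 'y': [values]}}}
series = source.get_reported_scalars()['loss']['train']
for iteration, value in zip(series['x'], series['y']):
    target.get_logger().report_scalar(
        title='loss', series='train', value=value, iteration=int(iteration))
```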
but never executes/enqueues them (they are all in Draft mode).
None of the pipeline steps are enqueued?
Is the pipeline controller itself running?
Is this like a local minio?
What do you have under the sdk/aws/s3 section?
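For a local minio it usually looks something like this (a sketch; the host, bucket and keys are placeholders):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "localhost:9000"   # minio endpoint (placeholder)
                    bucket: "my-bucket"
                    key: "minio_access_key"
                    secret: "minio_secret_key"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```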
From the docs I think what's going on is that https://opennmt.net/OpenNMT-tf/package/opennmt.Runner.html#opennmt.Runner.train is spinning up a new subprocess, and the training itself happens in the subprocess.
If this is the case, this would explain the lack of automagic, as the subprocess is lacking the "Task.init" call
wdyt, could that be the case ?
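If that is the case, a possible check (just a sketch, assuming you can add a couple of lines to the code that actually runs in the subprocess) would be to re-attach to the task there, so the framework hooks get installed in that process too:
```python
# hypothetical: executed inside the training subprocess
from clearml import Task

# reuse the task created in the parent process if it is visible here,
# otherwise initialize one so the automagic bindings are installed
task = Task.current_task() or Task.init(project_name='opennmt', task_name='train')
```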
👍
Okay, but we should definitely output an error on that
Thanks TrickyRaccoon92
I think it's about time we remove the survey link anyhow 🙂
I'll make sure it happens ...
Exactly, just pointing to the fact that, that machine is yours ;)
Hi @<1694157594333024256:profile|DisturbedParrot38>
The dataset ID is also the task ID :)
Hi SoreHorse95
I am exploring hiding our clearml server behind
Do you mean adding an additional reverse proxy to authenticate clearml-server from the outside?
UnevenDolphin73
fatal: could not read Username for '': terminal prompts disabled
fatal: clone of '' into submodule path '/root/.clearml/vcs-cache/xxx.60db3666b11ac2df511a851e269817ef/xxx/xxx' failed
It seems it tries to clone a submodule and fails due to missing keys for the submodule.
https://stackoverflow.com/questions/7714326/git-submodule-url-not-including-username
wdyt?
Hi SarcasticSparrow10, so yes it does; this is more efficient when using pytorch loaders, and in some other situations.
To disable it, add to your clearml.conf:
sdk.development.report_use_subprocess = false
2. interesting error, maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why python has this limitation, let me check it...)
this?
ids = [t.id for t in top_task]
I am using importlib and this is probably why everything's weird.
Yes that will explain a lot 🙂
No worries, glad to hear it worked out
FlutteringWorm14 an RC is out (1.7.3dc1) with the ability to configure from clearml.conf
you can now set sdk.development.worker.report_event_flush_threshold from clearml.conf
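i.e. in clearml.conf (the value here is just an example):
```
sdk {
    development {
        worker {
            # flush reported events to the server every N events (example value)
            report_event_flush_threshold = 100
        }
    }
}
```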
RoughTiger69
Apparently, …, doesn't populate that dict with any keys that don't already exist in it.
Are you saying new entries are not added to the dict even if they are on the Task (i.e. only entries that already exist in the dict are populated)?
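Just so we're aligned, a minimal sketch of the behavior as I understand it (the dict keys here are made up):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='connect dict')

config = {'instance_type': 'g4dn.xlarge', 'max_idle_time_min': 15}  # made-up keys
task.connect(config)
# when executed by an agent, the values above get overridden from the UI;
# the report is that keys added only in the UI do not show up in `config`
```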
But you already have all the entries defined here:
https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L22
Since all this is ha...
I just cloned it from the examples that are available in the SaaS console upon account creation
Ohhh! that would explain it. Maybe it is broken there?! let me check a second
Hi RoundMosquito25
however they are not visible either in:
But can you see them in the UI?
Hi SharpHedgehog60
Task type is another way to declare the type of processing the Task performs.
Later you can filter based on the Task type (like you would with a Tag).
For example, Datasets are always of type "data_processing".
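For example, from code it would look something like this (a sketch; I'm assuming the task_filter "type" field accepts the type strings):
```python
from clearml import Task

# declare the type when creating the task
task = Task.init(project_name='examples', task_name='prepare data',
                 task_type=Task.TaskTypes.data_processing)

# later, filter by type (similar to filtering by a tag)
tasks = Task.get_tasks(project_name='examples',
                       task_filter={'type': ['data_processing']})
```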
I guess it won’t due to the nature of services?
Correct, the k8s glue works differently; that said, I would actually use the helm chart to spin a pod with the agent in services mode and venv mode.
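i.e. roughly (a sketch, double-check the flags against the agent docs):
```
clearml-agent daemon --queue services --services-mode --detached
```
(without the --docker flag, so it runs in venv mode)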
I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is it is trying to install the python 3.10 version of torch, when your command line suggest it is running python 3.8)
Yes, in the UI clone or reset the Task, then you can edit the installed packages section under the Execution tab
I see. If you are creating the task externally (i.e. from the controller), you should probably call task.close(); it will return when everything is in order (including artifacts uploaded, and other async stuff).
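Roughly something like this (a sketch, assuming the controller creates the task with Task.create):
```python
from clearml import Task

# hypothetical: the controller creates the task externally
task = Task.create(project_name='examples', task_name='externally created step')

# ... do the work, report metrics, upload artifacts ...

# close() returns only when everything is in order
# (artifacts uploaded and other async stuff flushed)
task.close()
```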
Will that work?