FWIW, we prefer to set it in the agent’s configuration file, then it’s all automatic
I dunno :man-shrugging: but Task.init is clearly incompatible with pytest and friends
This is with:
Task.set_offline_mode(True)
task = Task.init(..., auto_connect_streams=False)
Maybe @JuicyFox94 can answer some questions then…
For example, what’s the difference between agentk8sglue.nodeSelector and agentk8sglue.basePodTemplate.nodeSelector?
Am I correct in understanding that the former decides the node type that runs the “scaler” (listening to the given agentk8sglue.queue), and the latter applies to any newly booted instance/pod that will actually run the agent and the task?
Read: The former can be kept lightweight, as it does no...
I can also do this via Mongo directly, but I was hoping to skip the K8S interaction there.
UPDATE: Apparently the quotation type matters for furl? I switched the ' to \" and it seems to work now
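A minimal sketch of one way to sidestep the quoting issue: build the credentials snippet in Python with double-quoted string values, and let json.dumps handle the escaping when the snippet is embedded as a string value. The helper name build_s3_credentials_conf and the placeholder hosts and keys are hypothetical; the field names mirror an sdk.aws.s3.credentials setting, and the assumption that the surrounding configuration is JSON-like is mine.

```python
import json


def build_s3_credentials_conf(entries):
    """Render sdk.aws.s3.credentials as a HOCON-style snippet (hypothetical helper).

    String values are wrapped in double quotes; booleans are written bare.
    """
    blocks = []
    for entry in entries:
        lines = ["{"]
        for field, value in entry.items():
            if isinstance(value, bool):
                rendered = "true" if value else "false"
            else:
                rendered = json.dumps(str(value))  # double-quoted and escaped
            lines.append(f"{field}: {rendered}")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "sdk.aws.s3.credentials = [\n" + ",\n".join(blocks) + "\n]"


# Placeholder endpoints and keys, for illustration only.
entries = [
    {"host": "ip:9000", "key": "xxx", "secret": "xxx", "multipart": False, "secure": False},
    {"host": "ip2:9000", "key": "xxx", "secret": "xxx", "multipart": False, "secure": False},
]

conf_snippet = build_s3_credentials_conf(entries)

# Assumption: the snippet ends up as a string value in a JSON-like config,
# so json.dumps is used to escape the inner double quotes and newlines.
settings_fragment = json.dumps({"extra_clearml_conf": conf_snippet})
print(settings_fragment)
```

Generating the value this way keeps single quotes out of the snippet entirely, so the host string no longer carries a stray quote into the port.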
We’re using karpenter (more magic keywords for me), so my understanding is that that will manage the scaling part.
Yes, I’ve found that too (as mentioned, I’m familiar with the repository). My issue is still that there is no documentation as to what this actually offers.
Is this simply a helm chart to run an agent on a single pod? Does it scale in any way? Basically - is it a simple agent (similar to on-premise agents, running in the background, but here on K8s), or is it a more advanced one that offers scaling features? What is it intended for, and how does it work?
The official documentation is very spa...
Perfect, thanks for the answers Valeriano. These small details are missing from the documentation, but I now feel much more confident in setting this up.
Much much appreciated 🙏
- in the second scenario, I might have not changed the results of the step, but my refactoring changed the speed considerably and this is something I measure.
- in the third scenario, I might have not changed the results of the step and my refactoring just cleaned the code, but besides that, nothing substantially was changed. Thus I do not want a rerun.
Well, I would say then that in the second scenario it’s just rerunning the pipeline, and in the third it’s not running it at all 😄
(I ...
I see that the GUI AutoScaler is only in the paid version, wonder why the GCP driver is not open source?
Follow-up question/feature request (out of interest) - could the WebUI show the matching commit message?
IIRC, get_local_copy() downloads a local copy and returns the path to the downloaded file. So you might be interested in e.g. local_csv = pd.read_csv(a_task.artifacts['train_data'].get_local_copy())
With the models, you're looking for get_weights(). It acts the same as get_local_copy(), so it returns a path.
EDIT: I think also get_local_copy() for a model should work 👍
@CostlyOstrich36 I added None btw
AgitatedDove14
hmmm... they are important, but only when starting the process. any specific suggestion ?
(and they are deleted after the Task is done, so they are temp)
Ah, then no, sounds temporary. If they're only relevant when starting the process though, I would suggest deleting them immediately when they're no longer needed, and not wait for the end of the task (if possible, of course)
It's a small snippet that ensures identically named projects are still unique'd with a running number.
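The snippet itself is not shown here, so the following is only a hypothetical illustration of the idea: if a project name is already taken, append the smallest free running number. The function name uniquify_project_name and the `_<n>` suffix format are assumptions.

```python
from itertools import count


def uniquify_project_name(name, existing_names):
    """Return `name` if it is free, else `name_<n>` with the smallest free n.

    Hypothetical helper; the suffix format is an assumption.
    """
    taken = set(existing_names)
    if name not in taken:
        return name
    for n in count(1):
        candidate = f"{name}_{n}"
        if candidate not in taken:
            return candidate


print(uniquify_project_name("demo", ["demo", "demo_1", "demo_3"]))
```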
From the traceback (backend_interface/task/task.py, line 178, in __init__), notice it's not Task.init
It's not exactly "debugging", but rather a description of the generated model/framework (generated with pygraphviz).
I'm working on the config object references 😉
So basically what I'm looking for and what I have now is something like the following:
(Local) I have a well-defined aws_autoscaler.yaml that is used to run the AWS autoscaler. That same autoscaler is also run with CLEARML_CONFIG_FILE=....
(Remotely) The autoscaler launches, listens to the predefined queue, and is able to launch instances as needed. I would run a remote execution task object that's appended to the autoscaler queue. The autoscaler picks it up, launches a new instanc...
Ah, you meant “free python code” in that sense. Sure, I see that. The repo arguments also exist for functions though.
Sorry for hijacking your thread @VivaciousBadger56
Yeah I managed to work around those former two, mostly by using Task.create instead of Task.init. It's actually the whole bunch of daemons running in the background that takes a long time, not the zipping.
Regarding the second - I'm not doing anything per se. I'm running in offline mode and I'm trying to create a dataset, and this is the error I get...
There is a data object in it, but there is no script object attached to it (presumably again because of pytest?)
If I set the following:
"extra_clearml_conf": "sdk.aws.s3.credentials = [\n{\nhost: 'ip:9000'\nkey: 'xxx'\nsecret: 'xxx'\nmultipart: false\nsecure: false\n},\n{\nhost: 'ip2:9000'\nkey: 'xxx'\nsecret: 'xxx'\nmultipart: false\nsecure: false\n}\n]"
I run into a weird furl error:
ValueError: Invalid port '9000''.
Yes exactly that AgitatedDove14
Testing our logic maps correctly, etc for everything related to ClearML
Of course I'm using report_table in the above; it seems the support for Pandas DataFrame does not include support for MultiIndex other than by concatenating the indices together
That's fine (as in, it works), but it looks a bit weird and defeats the purpose of a MultiIndex
🤔 Was wondering if there are plans to add better support for it
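For reference, a hedged sketch of two ways to flatten a MultiIndex with plain pandas before handing the frame to report_table. The data and level names are made up, and the sketch deliberately stops short of calling ClearML.

```python
import pandas as pd

# Made-up example data with a two-level row index.
index = pd.MultiIndex.from_product(
    [["train", "val"], ["loss", "accuracy"]], names=["split", "metric"]
)
df = pd.DataFrame({"value": [0.42, 0.88, 0.51, 0.84]}, index=index)

# Option 1: concatenate the levels into a single label per row.
flat = df.copy()
flat.index = ["/".join(map(str, levels)) for levels in df.index]

# Option 2: turn each index level into an ordinary column,
# so the level information stays visible as separate table columns.
as_columns = df.reset_index()

print(flat)
print(as_columns)
```

The second option keeps each level as its own column, which is closer to what a MultiIndex is meant to convey than a concatenated label.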
I would expect the service to actually implicitly inject it to new instances prior to applying the user's extra configuration 🤔
Heh, my bad, the term "user" is very much ingrained in our internal way of working. You can think of it as basically any technically-inclined person in your team or company.
Indeed the options in the WebUI are too limited for our use case, so we've developed "apps" that take a yaml configuration file and build a matching pipeline.
With that, our users do not need to code directly, and we can offer much more fine control over the pipeline.
As for the imports, what I meant is that I encounter...
-ish, still debugging some weird stuff. Sometimes ClearML picks ip and sometimes ip2, and I can't tell why 🤔