Any leads TimelyPenguin76? I've also tried setting up a MinIO S3 bucket, but I'm not sure if the remote agent has copied the credentials and host 🤔
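Roughly the setup I mean, as a minimal sketch (the endpoint, bucket, and names below are placeholders, and the credentials themselves would still need to be present on the agent side, which is exactly the part I'm unsure got copied):
from clearml import Task

# Placeholder MinIO endpoint/bucket; the matching key/secret still have to live
# in the agent's clearml.conf or environment for remote uploads to work.
task = Task.init(
    project_name="my_project",      # placeholder
    task_name="minio_test",         # placeholder
    output_uri="s3://minio.local:9000/my-bucket",   # placeholder endpoint + bucket
)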
Not really - it will just show the string. A preview would be more like a low-res version of the uploaded image or similar.
Alternatively, it would be good to be able to both specify some requirements and auto-detect 🤔
A follow-up question (instead of opening a new thread): is there a way I could signal some files/directories to be copied to the execute_remotely task?
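One rough workaround I've been considering (not sure it's the intended mechanism; the artifact name and path are placeholders):
from clearml import Task

task = Task.init(project_name="my_project", task_name="remote_run")   # placeholders
# Attach the directory as an artifact before going remote; inside the remote run
# it could then be fetched with task.artifacts["config_dir"].get_local_copy()
task.upload_artifact(name="config_dir", artifact_object="./configs")
task.execute_remotely(queue_name="default")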
Ah, you meant “free python code” in that sense. Sure, I see that. The repo arguments also exist for functions though.
Sorry for hijacking your thread @<1523704157695905792:profile|VivaciousBadger56>
Does it make sense to you to run several such glue instances, to manage multiple resource requirements?
Latest (1.5.1 I believe?), full log incoming, but it's the same as what I've already posted elsewhere 🤔
It just sets up the environment and immediately crashes when trying to run the code.
The setup itself is done correctly.
Perfect now 👌 (also nice cleanup of the default_new_data_root duplicate code :D)
No, that does not seem to work, I get:
task.execute_remotely(queue_name="default")
2024-01-24 11:28:23,894 - clearml - WARNING - Calling task.execute_remotely is only supported on main Task (created with Task.init)
Defaulting to self.enqueue(queue_name=default)
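For reference, the pattern the warning seems to point at, as a minimal sketch (project/task names are placeholders):
from clearml import Task

# execute_remotely() is expected to be called on the main task created by Task.init(),
# not on a cloned/sub task object.
task = Task.init(project_name="my_project", task_name="debug_remote")   # placeholders
task.execute_remotely(queue_name="default")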
Any follow-up thoughts, @<1523701070390366208:profile|CostlyOstrich36> , or maybe @<1523701087100473344:profile|SuccessfulKoala55> ? 🤔
Perfect, thanks for the answers Valeriano. These small details are missing from the documentation, but I now feel much more confident in setting this up.
Sorry AgitatedDove14 , forgot to get back to this.
I've been trying to convince my team to drop poetry 😄
Added the following line under volumes for apiserver, fileserver, and agent-services:
- /data/clearml:/data/clearml
Consider e.g.:
# steps.py
class DataFetchingStep:
    def __init__(self, source, query, locations, timestamps):
        ...

    def run(self, queue=None, **kwargs):
        ...


class DataTransformationStep:
    def __init__(self, inputs, transformations):
        # inputs can include instances of DataFetchingStep, or local files, for example
        ...

    def run(self, queue=None, **kwargs):
        ...
And then the following SDK usage in a notebook:
from steps imp...
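Purely as an illustration of what I mean (everything here is hypothetical: the module and class names come from the steps.py sketch above, and none of this is an existing ClearML API):
from steps import DataFetchingStep, DataTransformationStep   # hypothetical module from the sketch above

fetch = DataFetchingStep(source="db", query="SELECT ...", locations=["eu"], timestamps=None)   # placeholder arguments
transform = DataTransformationStep(inputs=[fetch, "local_file.csv"], transformations=["normalize"])   # placeholder arguments

# Each step decides whether to run locally or be pushed to a queue
transform.run(queue="default")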
I guess in theory I could write a run_step.py, similarly to how the pipeline in ClearML works… 🤔 And then use Task.create() etc.?
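Something along these lines, as a rough sketch (the repo, names, and the run_step.py entry point are all assumptions on my part):
from clearml import Task

# Assumption: run_step.py would be a thin entry point that reconstructs and runs a single step.
task = Task.create(
    project_name="my_project",                  # placeholder
    task_name="data_fetching_step",             # placeholder
    repo="https://github.com/org/repo.git",     # placeholder repo
    script="run_step.py",                       # the hypothetical entry point mentioned above
)
Task.enqueue(task, queue_name="default")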
Hey @<1523701070390366208:profile|CostlyOstrich36> , thanks for the reply!
I’m familiar with the above repo; we have the ClearML Server and such deployed on K8s.
What’s lacking is documentation regarding the clearml-agent helm chart. What exactly does it offer, etc.
We’re interested in e.g. using karpenter to scale our deployments per demand, effectively replacing the AWS autoscaler.
Setting the endpoint will not be the only thing missing though, so unfortunately that's insufficient 😞
Any thoughts @<1523701070390366208:profile|CostlyOstrich36> ?
I wouldn’t want to run the entire notebook, just a specific part of it.
Hey @<1537605940121964544:profile|EnthusiasticShrimp49>! You’re mostly correct. The Step classes will be predefined (of course developers are encouraged to add/modify as needed), but as in the DataTransformationStep, there may be user-defined functions specified. That’s not a problem though, I can provide these functions with the helper_functions argument.
- The .add_function_step is indeed a failing point. I can’t really create a task from the notebook because calling `Ta...
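Just to make the helper_functions part above concrete, roughly what I mean (the step function and helper are placeholders; only the helper_functions argument is the point):
from clearml import PipelineController

def normalize(df):        # placeholder user-defined helper
    ...

def fetch_data(query):    # placeholder step function that calls the helper
    ...

pipe = PipelineController(name="my_pipeline", project="my_project", version="0.0.1")   # placeholders
pipe.add_function_step(
    name="fetch",
    function=fetch_data,
    function_kwargs=dict(query="SELECT ..."),
    helper_functions=[normalize],   # user-defined functions packaged alongside the step
)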
Yes, I’ve found that too (as mentioned, I’m familiar with the repository). My issue is still that there is no documentation as to what this actually offers.
Is this simply a helm chart to run an agent on a single pod? Does it scale in any way? Basically, is it a simple agent (similar to on-premise agents, running in the background, but here on K8s), or is it a more advanced one that offers scaling features? What is it intended for, and how does it work?
The official documentation is very spa...
We load the endpoint (and S3 credentials) from a .env file, so they're not immediately available at the time of from clearml import Task.
It's a convenience thing, rather than exporting many environment variables that are tied together.
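Concretely, the pattern looks roughly like this (assuming python-dotenv; the .env holds things like CLEARML_API_HOST, CLEARML_API_ACCESS_KEY, and the AWS_* keys):
from dotenv import load_dotenv

# Load the ClearML/S3 variables from .env before clearml reads its configuration
load_dotenv()

from clearml import Task   # imported only after the environment is populated

task = Task.init(project_name="my_project", task_name="example")   # placeholders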
I can elaborate in more detail if you have the time, but generally the code is just defined in some source files.
I’ve been trying to play around with pipelines for this purpose, but as suspected, it fails finding the definition for the pickled object…
The odd thing is that it was already defined, and then when I clicked an S3 link, it asked me to fill it in again, adding a duplicate credentials row
I am indeed