Unfortunately not, each task defines and constructs its own dataset. I want the cloned task to save that link 🤔
I created a new task with the project name `internal tests`, and no task name (so it's derived by ClearML). The task was a simple print out.
The project does not appear in the project space and does not turn up in searches (the task does)
Basically, when there are occasionally extreme values (i.e. most values fall in the [0, 50] range, and one value suddenly falls in the 50e+12 range), the plotting library (matplotlib or ClearML, unsure) hangs for a really long time
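A minimal workaround sketch (not from the thread; the array name and the robust cap are assumptions): clip the series at a robust threshold before handing it to the plotting layer, so a single 50e+12 value cannot blow up the axis range.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.uniform(0, 50, size=200)  # typical data in [0, 50]
values[100] = 50e12                    # one extreme outlier, as described above

# Cap at a robust threshold (median + 10 * IQR) before plotting/reporting,
# so the outlier cannot stretch the axis range
q1, med, q3 = np.percentile(values, [25, 50, 75])
cap = med + 10 * (q3 - q1)
plt.plot(np.minimum(values, cap))
plt.savefig("metric.png")
```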
I'll have a look; at least it seems to only use `from clearml import Task`, so unless mlflow changed their SDK, it might still work!
Sure! It looks like this.
My suspicion is that this relates to https://clearml.slack.com/archives/CTK20V944/p1643277475287779 , where the config file is loaded prematurely (upon `import`), so our `dotenv.load_dotenv()` call has not yet registered.
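If that suspicion is right, a minimal sketch of the workaround (the project name is made up): make sure `load_dotenv()` runs before anything from `clearml` is imported, since the SDK reads its configuration at import time.

```python
# Load .env *before* importing clearml - the SDK reads its config at import time
from dotenv import load_dotenv
load_dotenv()

from clearml import Task  # deliberately imported after load_dotenv()

task = Task.init(project_name="internal tests")
```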
Ah. Apparently getting a task ID while it’s running can cause this behaviour 🤔
We have a more complicated case but I'll work around it 😄
Follow-up though - can configuration objects refer to one another internally in ClearML?
BTW AgitatedDove14, following this discussion I ended up doing the regex way myself to sync these, so our code has something like the following. We abuse the object description here to store the desired file path.
```python
config_path = task.connect_configuration(configuration=config_path, name=config_fname)
included_files = find_included_files_in_source(config_path)
while included_files:
    file_to_include = included_files.pop()
    sub_config = task.connect_configuration(
        configurat...
```
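For completeness, a hedged sketch of how the truncated loop above might continue - `find_included_files_in_source` is the helper from the snippet, and everything past `configurat...` is my reconstruction, not the original code:

```python
config_path = task.connect_configuration(configuration=config_path, name=config_fname)
included_files = find_included_files_in_source(config_path)
while included_files:
    file_to_include = included_files.pop()
    sub_config = task.connect_configuration(
        configuration=file_to_include,
        name=file_to_include.name,
        # the "abuse": store the desired file path in the object description
        description=str(file_to_include),
    )
    # recurse into the included file's own includes
    included_files.extend(find_included_files_in_source(sub_config))
```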
And last but not least, for dictionaries for example, it would be really cool if one could do:

```python
my_config = task.connect_configuration(my_config, name=name)
my_other_config = task.connect_configuration(my_other_config, name=other_name)
my_other_config['bar'] = my_config  # Creates the link automatically between the dictionaries
```
And `task = Task.init(project_name=conf.get("project_name"), ...)` is basically a no-op in remote execution, so it does not matter if `conf` is empty, right?
After the task was initialized? 🤔
I'm not sure why ClearML internally tries to initialize a task when `get_task` is called...
It does not 🙂
We started discussing it here - https://clearml.slack.com/archives/CTK20V944/p1640955599257500?thread_ts=1640867211.238900&cid=CTK20V944
You suggested this solution - https://clearml.slack.com/archives/CTK20V944/p1640973263261400?thread_ts=1640867211.238900&cid=CTK20V944
And I eventually found this solution to work - https://clearml.slack.com/archives/CTK20V944/p1641034236266500?thread_ts=1640867211.238900&cid=CTK20V944
Because setting env vars and ensuring they exist on the remote machine during execution, etc., is more complicated 😁
There are always ways around, I was just wondering what is the expected flow 🙂
JitteryCoyote63, please do not get used to it :D There's an open ticket/feature request to either revert this or let the user/server choose the most comfortable way
For now this is okay - no data lost, really - but I'd like to make sure we're not missing any steps in the next upgrade
It's a small snippet that ensures identically named projects are still made unique with a running number.
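A minimal sketch of that idea (`existing` stands in for however the current project names are fetched, e.g. through the ClearML API):

```python
def uniquify(name: str, existing: set[str]) -> str:
    """Append a running number until the project name is unique."""
    if name not in existing:
        return name
    i = 2
    while f"{name} ({i})" in existing:
        i += 1
    return f"{name} ({i})"

print(uniquify("internal tests", {"internal tests", "internal tests (2)"}))
# -> internal tests (3)
```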
Added the following line under `volumes` for `apiserver`, `fileserver`, and `agent-services`:
`- /data/clearml:/data/clearml`
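For reference, this is roughly how that looks in `docker-compose.yml` (only the relevant keys shown; the rest of each service definition is omitted):

```yaml
services:
  apiserver:
    volumes:
      - /data/clearml:/data/clearml
  fileserver:
    volumes:
      - /data/clearml:/data/clearml
  agent-services:
    volumes:
      - /data/clearml:/data/clearml
```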
Does it make sense to you to run several such glue instances, to manage multiple resource requirements?
Perfect, thanks for the answers Valeriano. These small details are missing from the documentation, but I now feel much more confident in setting this up.
Yes, I’ve found that too (as mentioned, I’m familiar with the repository). My issue is still that there is no documentation as to what this actually offers.
Is this simply a helm chart to run an agent on a single pod? Does it scale in any way? Basically - is it a simple agent (similar to on-premise agents, running in the background, but here on K8s), or is it a more advanced one that offers scaling features? What is it intended for, and how does it work?
The official documentation is very sparse...
Maybe @<1523701827080556544:profile|JuicyFox94> can answer some questions then…
For example, what’s the difference between `agentk8sglue.nodeSelector` and `agentk8sglue.basePodTemplate.nodeSelector`?
Am I correct in understanding that the former decides the node type that runs the “scaler” (listening to the given `agentk8sglue.queue`), and the latter applies to any newly booted instance/pod that will actually run the agent and the task?
Read: the former can be kept lightweight, as it does no...
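If that reading is right, the split in the chart’s `values.yaml` would look something like this (the node labels are made up):

```yaml
agentk8sglue:
  # schedules the glue/"scaler" pod itself - can stay on a lightweight node
  nodeSelector:
    pool: system
  basePodTemplate:
    # applied to every pod the glue spawns to actually run an agent + task
    nodeSelector:
      pool: gpu-workers
```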
We’re using `karpenter` (more magic keywords for me), so my understanding is that that will manage the scaling part.
But... which queue does it listen to, and which type of instances will it use, etc.?