Okay this was a deep dive into clearml-agent code 😁
Took a long time to figure out that one specific Python version had an old virtualenv (Python 3.6.9 and Python 3.8 had the latest virtualenv, but Python 3.7.5 had an old one).
Then the task requested Python 3.7, and that old virtualenv version was broken.
As a result: could the agent maybe also output the virtualenv version used when setting up the environment for the first time?
I think I may have brought this up multiple times in different ways :D
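In hindsight, a quick diagnostic along these lines would have saved me time (just a sketch; the interpreter names on PATH are assumptions):
` # Print the virtualenv version each interpreter sees.
import subprocess

for py in ("python3.6", "python3.7", "python3.8"):
    try:
        out = subprocess.run(
            [py, "-m", "virtualenv", "--version"],
            capture_output=True, text=True, check=False,
        )
        print(py, (out.stdout or out.stderr).strip())
    except FileNotFoundError:
        print(py, "not found on PATH") `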
When dealing with long and complicated configurations (whether config objects, yaml, or otherwise), it's often useful to break them down into relevant chunks (think hydra, maybe).
In our case, we have a custom YAML instruction `!include`, i.e.
` # foo.yaml
bar: baz

# bar.yaml
obj: !include foo.yaml
maybe_another_obj: !include foo.yaml `
Say I upload each of these yamls as a configuration object (as with the above). Once I try to load bar.yaml remotely it will crash, since foo.yaml is missing (and is instead a clearml configuration object).
Does that make sense?
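For context, here's a minimal sketch of how such an `!include` constructor can be registered with PyYAML (not our exact implementation; paths are resolved naively against the working directory):
` import yaml

def include_constructor(loader, node):
    # Resolve the scalar (e.g. "foo.yaml") and load that file in place.
    path = loader.construct_scalar(node)
    with open(path) as f:
        return yaml.safe_load(f)

yaml.SafeLoader.add_constructor("!include", include_constructor)

with open("bar.yaml") as f:
    config = yaml.safe_load(f)  # obj / maybe_another_obj now hold foo.yaml's contents `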
Yes, exactly! I've added instructions for the users on creating their account and running `clearml-init`, and then they run the snippet that updates the api and sdk sections.
Or did you mean I can couple a short "mini config" with the package and redirect clearml to use this local one (instead of the one at ~/clearml.conf)?
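If it's the latter, something like this is what I'd imagine (a sketch; the bundled config file name is hypothetical):
` import os
from pathlib import Path

# Point ClearML at a package-local "mini config" instead of ~/clearml.conf.
# This must happen before the clearml import, since the config file is
# resolved at import time.
os.environ["CLEARML_CONFIG_FILE"] = str(Path(__file__).parent / "mini_clearml.conf")

from clearml import Task `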
I'll have yet another look at both the latest agent RC and at the docker-compose, thanks!
There was no "default" services agent btw, just the queue; I had to launch an agent myself (not sure if it's relevant).
Hey @<1523701070390366208:profile|CostlyOstrich36> , thanks for the reply!
I’m familiar with the above repo, we have the ClearML Server and such deployed on K8s.
What’s lacking is documentation regarding the clearml-agent helm chart. What exactly does it offer, etc.
We’re interested in e.g. using karpenter to scale our deployments per demand, effectively replacing the AWS autoscaler.
Right, and then for text (file path) use some regex or similar for extraction, and for a dictionary simply parse the values?
We're using the example autoscaler, nothing modified
Also, creating from functions allows dynamic pipeline creation without requiring the tasks to pre-exist in ClearML, which is IMO the strongest point to make about it
I'm not sure about the intended use of `connect_configuration` now.
I was under the assumption that in `connect_configuration(configuration, name=None, description=None)`, the `configuration` is only used in local execution.
But when I run `config = task.connect_configuration({}, name='General')` (in remote execution), the configuration is set to the empty dictionary.
There used to be a good example but it's now missing. I'm not sure what "Use only for automation (externally), otherwise use `Task.connect_configuration`" means when e.g. looking at `Task.set_configuration_object`, etc.
Could you clarify a bit, CostlyOstrich36 or AgitatedDove14?
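To be concrete, this is the behavior I assumed (a sketch, assuming a configured ClearML server; project/task names are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="config-demo")
defaults = {"lr": 0.001, "batch_size": 32}
# Locally: stores and returns the defaults.
# Remotely: I expected the task's stored configuration to be returned,
# but with {} I get the empty dict back instead.
config = task.connect_configuration(defaults, name="General")
print(config) `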
First bullet point - yes, exactly
Second bullet point - all of it, really. The SDK documentation and the examples.
For example, the `Task` object is heavily overloaded and its documentation would benefit from being separated into logical units of work. It would also make it easier for the ClearML team to spot any formatting issues.
Any linked GitHub example is welcome, but some visualization/inline code with explanations would also be very much appreciated.
Basically when running remotely, the first argument to any configuration (whether object or string, or whatever) is ignored, right?
That's exactly what I meant AgitatedDove14 🙂 It's just that to access that comparison page, you have to make a comparison first. It would be handy to have a link (in the side bar?) to an empty comparison
There's not much (or anything) in the log to provide...
` (.venv) 15:42 [0:user@server$~] CLEARML_CONFIG_FILE=~/agent_clearml.conf clearml-agent daemon --queue default on_prem --detached --order-fairness
Environment variables set from configuration: ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION']
... `
Ah, you meant “free python code” in that sense. Sure, I see that. The repo arguments also exist for functions though.
Sorry for hijacking your thread @<1523704157695905792:profile|VivaciousBadger56>
Generally, really. I've struggled recently (and in the past), because the documentation seems:
- Very complete wrt the available SDK (though the formatting is sometimes off)
- Very lacking wrt how things interact with one another
A lot of what I need I actually find by plunging into the source code.
I think ClearML would benefit a lot if it adopted a documentation structure similar to the numpy ecosystem (numpy, pandas, scipy, scikit-image, scikit-bio, scikit-learn, etc.)
I see, okay that already clarifies some stuff, I'll dig a bit more into this then! Thanks!
Sorry, found it on my end!
@<1523701205467926528:profile|AgitatedDove14> this
My suspicion is that this relates to https://clearml.slack.com/archives/CTK20V944/p1643277475287779 , where the config file is loaded prematurely (upon `import`), so our `dotenv.load_dotenv()` call has not yet registered.
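The fix we'd want is roughly this ordering (a sketch, assuming the relevant settings live in a .env file):
` from dotenv import load_dotenv

load_dotenv()  # must run before the clearml import below

from clearml import Task  # config is parsed at import time

task = Task.init(project_name="examples", task_name="dotenv-order-demo") `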
Honestly, this is all related to issue #340. The only reason we have this to begin with is because we need one separate "initializer" task that downloads the remote cache and prepares the agent environment for execution (downloading the configuration files, etc).
Otherwise it fits perfectly with pipelines, but we're not there yet.
In the local execution we don't have this initializer task, so we use `Task.init()` before starting to work on a model, and `task.close()` when we're done...
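i.e. the local flow looks roughly like this (a sketch; project/task names are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="model-a")
# ... work on the model ...
task.close()  # allows a subsequent Task.init in the same process `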
So basically what I'm looking for and what I have now is something like the following:
(Local) I have a well-defined `aws_autoscaler.yaml` that is used to run the AWS autoscaler. That same autoscaler is also run with `CLEARML_CONFIG_FILE=...`.
(Remotely) The autoscaler launches, listens to the predefined queue, and is able to launch instances as needed. I would run a remote execution task object that's appended to the autoscaler queue. The autoscaler picks it up, launches a new instanc...
I think you're looking for the `execute_remotely` function?
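Something along these lines (a minimal sketch, assuming a queue named "default" exists):
` from clearml import Task

task = Task.init(project_name="examples", task_name="remote-demo")
# Stops the local run here and enqueues the task for an agent to pick up.
task.execute_remotely(queue_name="default", exit_process=True)
# Anything below only runs in the remote (agent) execution. `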
Hey SuccessfulKoala55! Is the configuration file needed for `Task.running_locally()`? This is tightly related to issue #395, where we need additional files for remote execution but have no way to attach them to the task other than using the `StorageManager` as a temporary cache.
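For reference, the StorageManager workaround looks roughly like this (a sketch; the bucket URL is hypothetical):
` from clearml import StorageManager

# Fetch an auxiliary file from remote storage into the local cache.
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/configs/extra.yaml")
print(local_path)  # cached local file path `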
We're wondering how many on-premise machines we'd like to deprecate. For that, we want to see how often our "on premise" queue is used (how often a task is submitted and run), for how long, how many resources it consumes (on average), etc.
I don't think there's a PR or issue for that yet; at least I haven't created one.
I could have a look at this and maybe make a PR.
Not sure what would the recommended flow be like though 🤔