Reputation
Badges 1
25 × Eureka!EFS get downloaded to the k8 pod local volume?
EFS is an Amazon service that mounts a persistent FS into ec2 instances, I believe they have support for k8s as a service as well, which would make it kind of like a PV only as a service.
Does that make sense ?
GiganticTurtle0
I think that what you are looking for is:param_dict = {'key': 1234} task.connect(param_dict, name='general')
Notice that when this code runs manually (i.e. not by the agent), the dict is stored on "general" parameter section of the Task.
But when the code is executed by the Agent, the opposite happens and the parameters from the "general" section of the Task or put back into the param_dict
, here the casting is done based on the type of the original values.
Generall...
Could it be you have two entries of "console_cr_flush_period" ?
Hi @<1692345677285167104:profile|ThoughtfulKitten41>
Is it possible to trigger a pipeline run via API?
Yes! a pipeline is at the end a Task, you can take the pipeline ID and clone and enqueue it
pipeline_task = Task.clone("pipeline_id_here")
Task.enqueue(pipeline_task, queue_name="services")
You can also monitor the pipeline with the same Task inyerface.
wdyt?
Is there any way to make that increment from last run?
pipeline_task = Task.clone("pipeline_id_here", name="new execution run here")
Task.enqueue(pipeline_task, queue_name="services")
wdyt?
Since you are running in venv mode, adding the OS environment before the clearml-agent, will basically make sure it will propagate to the process itself.
ReassuredTiger98 make sense ?
Hmm so is the problem having the gituser inside the code? or the k8s_glue print ?
Funny this was mentioned just today 🙂
https://clearml.slack.com/archives/CTK20V944/p1620664770492400
Basically if you want to always ignore the "installed packages" in your clearml.conf, put:agent.package_manager.force_repo_requirements_txt = true
Hmm I think everything is generated inside the c++ library code, and python is just an external interface. That means there is no was to collect the metrics as they are created (i.e. inside the c++ code), which means the only was to collect them is to actively analyze/read the tfrecord created by catboost 😞
Is there a python code that does that (reads the tfrecords it creates) ?
In that case when you create the Tasks for the step,do not specify any packages/requirements, then the agent will just use the "requirements.txt" from the repository.
If you need you can also specify them when you create the Task itself see https://github.com/allegroai/clearml/blob/912f6f5ba2328b26de042de03f02de5802df360f/clearml/task.py#L608
https://github.com/allegroai/clearml/blob/912f6f5ba2328b26de042de03f02de5802df360f/clearml/task.py#L609
So the "packages" are the packages you need in the steps themselves ?
Are you trying to upload an artifact post execution ?
I am thinking about just installing this manually on the worker ...
If you install them system wide (i.e. with sudo) and add agent.package_manager.system_site_packages
then they will always be available for you 🙂
And then also use
priority_optional_packages: ["carla"]
This actually means that it will always try to install the package clara
first, but if it fails, it will no raise an error.
BTW: this would be a good use case for dockers, just saying :w...
The idea of queues is not to let the users have too much freedom on the one hand and on the other allow for maximum flexibility & control.
The granularity offered by K8s (and as you specified) is sometimes way too detailed for a user, for example I know I want 4 GPUs but 100GB disk-space, no idea, just give me 3 levels to choose from (if any, actually I would prefer a default that is large enough, since this is by definition for temp cache only), and the same argument for number of CPUs..
Ch...
I am trying to see if the user can submit a list of resource requirements (e.g 4GPUs, 12 cores, 100GB diskspace) for the task when queuing the task and the agents pick these tasks if they have the requested resources. With this, the user need not think about which queue to send the task to. The users just state what they need and the agents do the scheduling for them.
Can I assume we are talking Kubernetes under the hood for the resource allocation ?
I am trying to see if the user can submit a list of resource requirements (e.g 4GPUs, 12 cores, 100GB diskspace)
This will be quite easy to implement using the cleamrl k8s glue, just use user-properties and change the template based on it. I can point to where you need to modify the code
I can definitely see your point from the "DevOps" perspective, but from the user perspective it put the "liability" on me to "optimize" the resource, which to me sounds a bit much to put on my tiny shoulders, I just have a general knowledge on what I need. For example lots of CPUs (because I know my process scales well with more cpus), or large memory (because I have an entire dataset in memory). Personally (and really only my personal perspective), I'd rather have the option to select from a...
Hi PunyGoose16 ,
I think the website is probably the easiest 🙂
https://clear.ml/contact-us/
I think they get back to quite quickly
Let me rerun the code and check
Okay there is some odd stuff going on in the backend, I'll check with backend guys tomorrow and update 🙂
JitteryCoyote63 I think that with 0.17.2 we stopped mounting the venv build to the host machine. Which means it is all stored inside the docker.
Now I suspect what happened is it stayed on another node, and your k8s never took care of that
Could you test with the latest "cleaml"pip install git+
Task.add_requirement(".") should be supported now 🙂
Hi @<1655744373268156416:profile|StickyShrimp60>
My hydra OmegaConf configuration object is not always being picked up, and I am unable to consistently reproduce it.
... I am using clearml v1.14.4,
Hmm how can we reproduce it? what are you seeing what it does "miss" the hydra, i.e. are you seeing any Hydra section? how are you running the code (manually , agent ?)
Let me check, which helm chart are you referring to ?
It can also work by running on multiple known nodes.
Horovod sits on top of openmpi that needs ssh to open multiple nodes, I'm not sure how one would connect it without passing the SSH keys from one node to the other, and making sure they can directly communicate. (Not saying it is not possible, but just a few things to configure before it works, the enterprise edition remove the need for the direct SSH connection between the nodes)
How would i add a glue for multinode?
Basic...
@<1523701868901961728:profile|ReassuredTiger98> what do you have in the clearml.conf under "conda_channels" ?
Is this it ?
None