Hi ElegantCoyote26
sometimes the agents load an earlier version of one of my libraries.
I'm assuming this is some internal package that is installed from a wheel file, not a direct git repo+commit link?
great 🙂
two things:
1. I'm not sure argparse supports dict as a type (I mean it will take anything, but I'm not sure it will actually parse your arguments as a dict). I know there was an issue with argparsing, but I think it was solved. BTW: the way clearml-agent works, it does not actually pass the arguments on the command line but feeds them directly to the argparser at runtime (see the small sketch after the next point).
2. What happens if you clone the Task (the one with Args showing and without the explicit task.connect(_args)) and send it to the age...
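For reference, a minimal sketch (plain argparse, nothing ClearML-specific) of how you could get a dict parsed, by passing json.loads as the type; the --config argument name is just for illustration:
```
import argparse
import json

parser = argparse.ArgumentParser()
# json.loads turns the string '{"lr": 0.1, "epochs": 3}' into an actual dict
parser.add_argument("--config", type=json.loads, default={})

args = parser.parse_args(['--config', '{"lr": 0.1, "epochs": 3}'])
print(args.config["lr"])  # 0.1
```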
PreciousParrot26 I think this is really a matter of the CI process having very limited resources. Just to be clear, you are correct: the steps themselves are Not executed inside the CI environment, but it seems that even running the pipeline logic is somehow "too much" for the limited resources... Make sense?
OSError: [Errno 28] No space left on device
Hi PreciousParrot26
I think this says it all 🙂 there is no more storage left to run all those subprocesses
btw:
I am curious about why a ThreadPool of 16 threads is gathered,
This is the maximum number of simultaneous jobs it will try to launch (it will launch more after the current launching is done; notice this limits the launching, not the actual execution). It is just a way to throttle it.
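Conceptually something like this (plain Python, just to illustrate what the pool limits; launch() is a made-up stand-in for the launching logic):
```
from multiprocessing.pool import ThreadPool

def launch(job):
    # stand-in for launching a single job (the launching, not the execution)
    print("launching", job)

# at most 16 launches are in flight at any given moment
with ThreadPool(16) as pool:
    pool.map(launch, range(100))
```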
controller_object.start_locally(). Only the pipeline controller should be running locally, right?
Correct. Do notice that if you are using the Pipeline decorator and calling run_locally(), the actual pipeline steps are also executed locally.
Which of the two are you using (Tasks as steps, or functions as steps with the decorator)?
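To make the distinction concrete, a minimal sketch with the decorator (names and project are just for illustration, assuming a configured clearml installation):
```
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["result"])
def step_one(x):
    return x + 1

@PipelineDecorator.pipeline(name="demo pipe", project="demo", version="0.1")
def my_pipeline():
    print(step_one(41))

if __name__ == "__main__":
    # run_locally(): the pipeline logic AND all the steps execute on this machine
    PipelineDecorator.run_locally()
    my_pipeline()
```
With Tasks as steps (PipelineController), start_locally() runs only the controller logic locally; the steps themselves are still enqueued for the agents.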
Hi TrickyFox41
Hey since Hydra does not work with clearml-task
It should, shouldn't it? What does not work?
Hi DrabCockroach54
I think the Kubernetes integration (k8s glue) is not part of the open-source features, and is only available as an enterprise feature 🙂
Xeon E3-1240: 4 - 5 hours!
wow... yes, definitely worth upgrading 🙂
This is odd, how are you spinning up clearml-serving?
You can also do it synchronously:
predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)
Hi LovelyHamster1
Could you think of a toy code that reproduces this issue ?
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
wdyt?
Hi @<1566596960691949568:profile|UpsetWalrus59>
All correct, with the exception of "...or 1GB Metric": this is a limit, since metrics (and metadata) are always stored on the clearml-server, so they are metered. There is also an API limit, basically anti-abuse, which of course resets every month, but if you are running tens of experiments at the same time you will hit this limit. Make sense?
Switching to a process Pool might be a bit of overkill here (I think)
wdyt?
The API server by default spins up multiple processes (they might all be busy at a given moment with a huge flood of requests, but this is still multi-process). Let me check if there is an easy way to set more processes.
if I run my own ClearML self-hosted server?
Then you have everything on your end, it will not communicate with the SaaS offering, meaning no limits whatsoever.
(That said some of the cloud auto-scaling and compute features are not part of the open source)
Registering some metadata as a model doesn't feel correct to me.
Yes I'm with you 🙂
BTW, what kind of metadata would need versions during the lifetime of a Task?
SmallBluewhale13 in your code, what are you getting when you print the version:
from clearml import __version__
print(__version__)
Have to get glue setup, which I couldn't understand fully, so that's a different topic
I suggest using the apply-template setup (basically you provide a Job/Service template, and it uses that to set up k8s jobs based on the Tasks coming in from the specific queue)
Hi NonsensicalSeaanemone47
I'm assuming you mean k8s as compute cluster?
If so, then yes, clearml adds priority scheduling on top of your existing k8s cluster. It also allows you to reuse images: k8s spins up the base container image and then, inside the container, the agent sets up the environment of the experiment (clones the code, applies the diff, installs missing python packages, etc.)
It also gives visibility into the executed pods.
Make sense ?
from what I gather there is a lightly documented concept
Yes... 🙂 The reason for it is that actually one could do:
```
@PipelineDecorator.pipeline(...)
def pipeline(i):
    ...

if __name__ == '__main__':
    pipeline(0)
    pipeline(1)
    pipeline(2)
```
Basically rerunning the pipeline 3 times
This support was added as some users found a use case for it, but I think this would be a rare one
Hi RoughTiger69
Interesting question, maybe something like:
```
@PipelineDecorator.component(...)
def process_sub_list(things_to_do=[0, 1, 2]):
    r = []
    for i in things_to_do:
        print("doing", i)
        r.append("done{}".format(i))
    return r

@PipelineDecorator.pipeline(...)
def pipeline():
    # create some stuff to do:
    results = []
    for step in range(10):
        r = process_sub_list(list(range(step*10, (step+1)*10)))
        results.append(r)
    # push into one list with all results, this will ac...
```
1. One reason I don't like using the configuration section is that it makes debugging much much harder.
Debugging? Please explain how it relates to the configuration and presentation (i.e. preview)
2.
Yes in theory, but in your case it will not change things, unless these "configurations" are copied on any Task (which is just storage, otherwise no real harm)
3.
I was thinking of a "zip" file that the Task creates and uploads, and a new configuration type, say "external/zip", and in the c...
I was using clearml == 0.17.5 and I also had this issue
I think it was introduced when we moved to subprocess reporting, with 0.17.5
You can disable it with the following in clearml.conf:
sdk.development.report_use_subprocess = false
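Inside clearml.conf that maps to the nested form, i.e. (a sketch; the flat dotted form above means the same thing):
```
sdk {
    development {
        # report metrics from the main process instead of a subprocess
        report_use_subprocess: false
    }
}
```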
could one also limit the number of CPU cores available?
If you are running in docker mode you can add:
--cpus=<value>
see ref here: https://docs.docker.com/config/containers/resource_constraints/
Just add it to extra_docker_arguments:
https://github.com/allegroai/clearml-agent/blob/2cb452b1c21191f17635bcb6222fa8bfd82afe29/docs/clearml.conf#L142
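For example (a sketch of the agent section in clearml.conf; adjust the value to your needs):
```
agent {
    # arguments passed verbatim to `docker run` when the agent spins up the container
    extra_docker_arguments: ["--cpus=2"]
}
```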
An easier fix for now will probably be some kind of warning to the user that a task is created but not connected
That is a good point. Maybe if you do not have a "main" Task, then we print the warning (with some flag to disable the warning)?
hmm this might help:
https://pip.pypa.io/en/stable/topics/configuration/#environment-variables
basically you might be able to define:
PIP_NO_USE_PEP517=1
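e.g. export it in the shell that launches the agent, so the pip calls inside the environment setup pick it up (a sketch; the queue name is just an example):
```
export PIP_NO_USE_PEP517=1
clearml-agent daemon --queue default
```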
Great!
I'll make sure the agent outputs the proper error 🙂