Okay, so the way it works is that it moves all the logging to a background process. But if you have a lot of data, actually pushing the data between Python processes is not very efficient. This line basically tells it to just use a background thread (instead of a background process) for sending the data to the server.
The idea behind using a background process in the first place is to better support PyTorch workers that spin up a lot of subprocesses, and we do not want to add a thread per process and in...
Actually that is less interesting, as it is quite straightforward
a task of queue B if the next task is of type A it will have to wait,
It seems you imply there are two types of Tasks and they need to be executed one after the other?
ERROR: torch-1.12.0+cu102-cp38-cp38-linux_x86_64.whl is not a supported wheel on this platform
TartBear70 could it be you are running on a new Mac M1/M2?
Also quick question, any chance you can test with the latest RC?
pip3 install clearml-agent==1.3.1rc6
VirtuousFish83 is the exit(1) called from the main process or a subprocess? Are you running it with an agent?
Task.connect is "automagic", i.e. it sends the values to the server when in manual mode, and pulls them from the server when running in agent mode,
set_parameter is one-way only and should be used to set an external Task's parameters.
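A tiny sketch of the difference (project/task names and the parameter key are only placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")

# Task.connect is two-way ("automagic"): in a manual run it pushes the dict
# to the server; when executed by an agent it pulls the values back from it.
params = {"learning_rate": 0.001, "batch_size": 32}
task.connect(params)

# set_parameter is one-way only: explicitly write a parameter,
# e.g. on an external Task fetched with Task.get_task
external_task = Task.get_task(task_id="<external-task-id>")
external_task.set_parameter("General/learning_rate", 0.01)
```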
BTW: if you could implement _AzureBlobServiceStorageDriver
with the new Azure package, that would be great:
Basically update this class:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L1620
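Not a working implementation, just a rough sketch of the direction, assuming the new SDK is azure-storage-blob (v12+); the method names here are illustrative and the real driver interface in helper.py has more methods:
```
from azure.storage.blob import BlobServiceClient

class _AzureBlobServiceStorageDriver(object):
    def __init__(self, account_url, credential):
        # v12 SDK client replacing the old BlockBlobService-based code
        self._client = BlobServiceClient(account_url=account_url, credential=credential)

    def upload_object(self, local_path, container_name, blob_name):
        blob = self._client.get_blob_client(container=container_name, blob=blob_name)
        with open(local_path, "rb") as f:
            blob.upload_blob(f, overwrite=True)

    def download_object(self, container_name, blob_name, local_path):
        blob = self._client.get_blob_client(container=container_name, blob=blob_name)
        with open(local_path, "wb") as f:
            f.write(blob.download_blob().readall())
```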
That is awesome!
If you feel like writing a bit about the use-case and how you solved it, I think AnxiousSeal95 will be more than happy to publish something like that 🙂
Is task.parent something that could help?
Exactly 🙂 something like:
# my step is running here
the_pipeline_task = Task.get_task(task_id=task.parent)
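A slightly fuller sketch of the same idea, assuming it runs inside a pipeline step (so the step's own Task has the pipeline controller as its parent):
```
from clearml import Task

# my step is running here
task = Task.current_task()
the_pipeline_task = Task.get_task(task_id=task.parent)
```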
This should have worked with the latest clearml RC.
And you verified it is not working?
I think there is a bug in the UI that causes series with "." to use only the first part of the series name for the color selection. This means "epsilon 0" and "epsilon 0.1" will always get the same color, and this will explain why it works on other graphs
BroadMole98 Awesome, can't wait for your findings 🙂
I guess I got confused since the color choices in
One of the most beloved features we added 🙂
Hi @<1533257411639382016:profile|RobustRat47>
sorry for the delay,
Hi, when we try and sign up a user with GitHub.
wait, where are you getting this link?
That makes sense to me, what do you think about the following:
```
from clearml import PipelineDecorator

class AbstractPipeline(object):
    def __init__(self):
        pass

    @PipelineDecorator.pipeline(...)
    def run(self, run_arg):
        data = self.step1(run_arg)
        final_model = self.step2(data)
        self.upload_model(final_model)

    @PipelineDecorator.component(...)
    def step1(self, arg_a):
        # do something
        return value

    @PipelineDecorator.component(...)
    def step2(self, arg_b):
        # do ...
```
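For completeness, a hypothetical way such a class-based pipeline could be kicked off (the run_arg value is just a placeholder):
```
if __name__ == "__main__":
    pipe = AbstractPipeline()
    pipe.run(run_arg="my-dataset-id")
```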
Basically the links to the file server are saved in both mongo and elastic, so as long as these are host:ip based, at least in theory it should work
Hi PanickyMoth78
it was uploading fine for most of the day but now it is not uploading metrics and at the end
Where are you uploading metrics to (i.e. where is the clearml-server) ?
Are you seeing any retry logging on your console?
packages/clearml/backend_interface/metrics/reporter.py", line 124, in wait_for_events
This seems to be consistent with waiting for metrics to be flushed to the backend, but usually you will see retry messages on your console when that happens
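If it helps while debugging, a small sketch (just an assumption on my side that forcing a flush will narrow it down) using Task.flush to block until the queued events are sent:
```
from clearml import Task

task = Task.current_task()
# block until pending metric/artifact events are sent to the backend
task.flush(wait_for_uploads=True)
```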
We might need to change the default base docker image, but I remember it was there... Let me check again
I want pipeline / task dispatch to be reported and monitored outside of clearml. For example, I might want to log the dispatch event in some non-clearml system and then monitor the health of the pipeline and alert if it is pending for too long.
Hmm interesting, so like a callback?!
I'm thinking a callback that is executed after the Pipeline is sent, but once the callback is done, the pipeline process exits?
Does that make sense ?
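A rough sketch of the kind of callback I have in mind, assuming the PipelineController step callbacks (the external-reporting part is hypothetical, names are placeholders):
```
from clearml import PipelineController

def report_dispatch(pipeline, node, parameters):
    # hypothetical hook: log the dispatch event to a non-ClearML system
    print(f"step '{node.name}' is about to be dispatched with {parameters}")
    return True  # returning False would skip the step

pipe = PipelineController(name="demo pipeline", project="examples", version="1.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="training template",
    pre_execute_callback=report_dispatch,
)
pipe.start(queue="default")
```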
I might want to dispatch other jobs from within the same p...
I remember there were some issues with it ...
I hope not 😞 Anyhow the only things that do matter are the auto_connect arguments (meaning if you want to disable some, you should pass them when calling Task.init)
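For example (project/task names here are only placeholders), disabling some of the automatic connections when calling Task.init:
```
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="no auto connect",
    auto_connect_arg_parser=False,                  # do not capture argparse arguments
    auto_connect_frameworks={"matplotlib": False},  # keep the rest, disable matplotlib logging
)
```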
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_38e9acc8d56441999e806815abddee82.clearml'
Let me check this issue, it seems like the locking mechanism should have figured that there is no lock...
BTW: the new pipeline decorator interface example is here:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
With k8s glue going, want to finally look at clearml-session and how people are using it.
If used with the k8s glue, you will have to run the glue with --ports-mode; then the clearml-session will know how to connect to the container itself, since at runtime the container will register the gateway + port for the clearml-session client to connect to
hmm that is odd.
Can you send the full log ?
Not yet 😞
It should not be complex to implement,
The actual AWS auto-scaler class implements just two functions:
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L104
def spin_down_worker(self, instance_id):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L...
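So a custom scaler is roughly a subclass overriding those two methods; a minimal sketch (the cloud calls are hypothetical placeholders, and the real class also needs its settings/configuration objects):
```
from clearml.automation.auto_scaler import AutoScaler

class MyCloudAutoScaler(AutoScaler):
    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        # start an instance for `resource` and launch a clearml-agent on it,
        # listening on `queue_name` and named with `worker_id_prefix`
        my_cloud_start_instance(resource, worker_id_prefix, queue_name)  # hypothetical helper

    def spin_down_worker(self, instance_id):
        # terminate the instance started by spin_up_worker
        my_cloud_terminate_instance(instance_id)  # hypothetical helper
```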
think this is because of the version of xgboost that serving installs. How can I control these?
That might be
I absolutely need to pin the packages (incl main DS packages) I use.
you can basically change CLEARML_EXTRA_PYTHON_PACKAGES
https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L100
for example:
export CLEARML_EXTRA_PYTHON_PACKAGES="xgboost==1.2.3 numpy==1.2.3"
so for example, if there was an idle GPU and Q3 takes it, and then a task comes to Q2 (for which we specified 3 GPUs) but Q3 has already taken some of these GPUs, what will happen?
This is a standard "race": the first one to come will "grab" the GPU and the other will wait for it.
I'm pretty sure enterprise edition has preemption support, but this is not currently part of the open source version (btw: also the dynamic GPU allocation, I think, is part of the enterprise tier, in the opensource ...