
There may be cases where failure occurs before my code starts to run (and, perhaps, after it completes)
here is the log from the failing component:
```
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/utilities/locks/portalocker.py", line 140, in lock
    fcntl.flock(file_.fileno(), flags)
BlockingIOError: [Errno 11] Resource temporarily unavailable
```
I have tried this several times now. Sometimes one runs and the other fails, and sometimes both fail with this same error
Unfortunately, waiting a while did not make this go away 🙂
I get the same error with those added lines
Yes.
Some mechanism that would allow for followup code execution. Ideally in a way that would not be susceptible to the same things that may cause a task to fail.
I tried the first option and it worked 🙂 🙏
Oh sure, use
they will be visible on the Dataset page on the version in question
That sounds simple enough.
Though I imagine I'd need to explicitly report every figure. Correct?
I have a task where I create a dataset, but I also create a set of matplotlib figures, some numeric statistics, and a pandas table that describe the data, which I wish to have associated with the dataset and viewable from the clearml web page for the dataset.
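In case it helps later readers, here is a minimal sketch of attaching such reporting to a dataset version, assuming the suggestion above refers to the dataset's own logger (Dataset.get_logger()); the project and dataset names are placeholders:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from clearml import Dataset

# placeholder project / dataset names
dataset = Dataset.create(dataset_project="examples", dataset_name="my_dataset")

fig = plt.figure()
plt.hist(np.random.randn(1000), bins=30)

logger = dataset.get_logger()
# attach a matplotlib figure to this dataset version
logger.report_matplotlib_figure(
    title="value histogram", series="train", figure=fig, iteration=0
)
# attach a pandas table with summary statistics
logger.report_table(
    title="summary stats", series="train", iteration=0,
    table_plot=pd.DataFrame({"mean": [0.0], "std": [1.0]}),
)
```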
Thanks TimelyPenguin76 .
From your reply I understand that I have control over the destination, but that all files generated in a task get transferred regardless of the return_values decorator argument. Is that correct? Can I disable auto-save of artifacts?
Ideally, I'd like to have better control over what gets auto-saved. E.g. I'm happy for tensorboard events to be captured and shown in clearml and for matplotlib figures to be uploaded (perhaps to gcs) but I'd like to avoid ...
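For reference, Task.init does accept an auto_connect_frameworks argument that toggles auto-logging per framework; a sketch (the particular toggles below are just an example):
```python
from clearml import Task

task = Task.init(
    project_name="examples",           # placeholder
    task_name="selective auto-logging",
    auto_connect_frameworks={
        "tensorboard": True,   # keep capturing tensorboard events
        "matplotlib": True,    # keep uploading figures
        "pytorch": False,      # e.g. skip automatic model checkpoint uploads
    },
)
```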
I found that instead of returning some_returned_url (which triggers zipping and saving of the files under that url), I can wrap it in a dict: {"the url": some_returned_url}, which then lets me pass back the url to the pipeline, and only that dict gets uploaded (e.g. {'run_datasets_path': Path('/data/my_datasets_path/run_id_1')})
I can divert all files that I do want uploaded and tracked by clearml to gs://
by adding at the start of the task function: ` Logger.current_logger().se...
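Putting both of those together, a sketch of what the component ends up looking like; the paths and names are placeholders, and I'm assuming the truncated call above is Logger's set_default_upload_destination:
```python
from pathlib import Path
from clearml import Logger
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["paths"])
def make_data(run_id):
    # assumed completion of the truncated call above: divert uploads to gs://
    Logger.current_logger().set_default_upload_destination("gs://my-bucket")
    out = Path(f"/data/my_datasets_path/{run_id}")
    # ... write the dataset files under `out` ...
    # wrapping the path in a dict passes it back by value,
    # instead of triggering zip-and-upload of the folder itself
    return {"run_datasets_path": out}
```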
essentially, several running processes were performing:
```
model_evals_dataset = Dataset.get(
    dataset_project=dataset_project,
    dataset_name=f"model_evals",
)
model_evals_dataset.add_files(run_eval_path)
model_evals_dataset.upload()
```
the same occurs when I run a single training component instead of two
Note that if I change the component to return a regular meaningless string - "mock_path" - the pipeline completes rather quickly and the dataset is not uploaded.
I imagine that these phantom dependencies will prevent parallelization. Is there a workaround?
Sure. It is a minor change from the code in the clearml examples for pipelines.
I just repeat the last two pipeline steps from that code in a loop (x3)
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
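Roughly, the shape of my pipeline (step bodies here are placeholders, modeled on that example):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["model"])
def train(data, param):
    return {"param": param, "data": data}  # placeholder body

@PipelineDecorator.component(return_values=["score"])
def evaluate(model):
    return 0.0  # placeholder body

@PipelineDecorator.pipeline(name="loop demo", project="examples", version="0.0.1")
def pipeline_logic(data="some_dataset_id"):
    # the last two steps repeated in a loop (x3); the iterations are
    # independent of each other, so ideally they could run in parallel
    for param in (0.1, 0.2, 0.3):
        model = train(data, param)
        evaluate(model)
```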
switching back to version 1.6.2 cleared this issue (but re-introduced others for which I have been using the release candidate)
multi_instance_support=True lets me run the pipeline again 👍
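For anyone else hitting this, the flag goes on the pipeline decorator (the names here are placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(
    name="my pipeline",       # placeholder
    project="examples",       # placeholder
    version="0.0.1",
    multi_instance_support=True,  # allow launching while another instance runs
)
def my_pipeline():
    ...
```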
The second run prints out the same (non) "random" numbers as the first run
I'm connecting to the hosted clear.ml
packages in use are:
```
# Python 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
clearml == 1.6.2
fastai == 2.7.5
```
in case it matters, I'm running this code in a jupyter notebook within a docker container (to keep things well isolated). The /data path is volume mapped to my local filesystem (and, in fact, already contains the dataset files, so the fastai call to untar_data should see the data there and return immediately)
That same make_data fu...
I'll try a more carefully checked run a bit later but I know it's getting a bit late in your time zone
I suppose one way to perform this is with a https://clear.ml/docs/latest/docs/references/sdk/scheduler that kicks off a health check task (checking the exit state of executed tasks). It seems more efficient to support a triggered response to task failure.
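Something like this sketch, assuming a pre-created health-check task whose ID is a placeholder (and assuming `minute=30` means a periodic launch):
```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<health_check_task_id>",  # placeholder: task that inspects exit states
    queue="services",
    minute=30,  # intended as a periodic launch every 30 minutes
)
# run the scheduler itself as a service
scheduler.start_remotely(queue="services")
```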
Ooh nice.
I wasn't aware task.models["output"] also acts like a dict.
I can get the one I care about in my code with something like task.models["output"]["best_model"]
however, can you see the inconsistency between the key and the name there?
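For context, the access pattern I mean (task ID and model name are placeholders):
```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")  # placeholder ID

# task.models maps "input"/"output" to the task's models;
# per this thread, the "output" entry can also be indexed by model name
best_model = task.models["output"]["best_model"]
print(best_model.name, best_model.url)
```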
For a component, task = Task.current_task() will get me the task object. (right?)
This does not work for pipeline. Is pipeline a task?
Edit: The same works for pipeline
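So, for the record, a sketch of what works in both places (the names are placeholders):
```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["task_id"])
def my_component():
    # inside a component: returns the component's own Task
    return Task.current_task().id

@PipelineDecorator.pipeline(name="current-task demo", project="examples", version="0.0.1")
def my_pipeline():
    # per the edit above, this also works inside the pipeline body,
    # returning the controller's Task
    print(Task.current_task().id, my_component())
```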
I think this should be a valid use of pipelines. For example, at some step I choose to sweep across several values of some parameter, and the rest of the steps are duplicated for each value of that parameter.
The additional edges in the graph suggest that these steps somehow contain dependencies that I do not wish them to have.
Something else that I feel is missing from the docs regarding pipelines, as someone who has given kubeflow pipelines a try (in the http://vertex.ai pipelines environment), is some explanation of how functions become pipelines and components.
More specifically, I've learned to watch out for kubeflow pipeline code which is run at definition time (at compilation time, to be more accurate) instead of at pipeline execution time.
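A plain-Python illustration of that pitfall (not tied to any particular pipeline SDK):
```python
from datetime import datetime, timezone

def build_pipeline():
    # evaluated once, when the pipeline is defined/compiled:
    # every run sees this same frozen value
    compiled_at = datetime.now(timezone.utc).isoformat()

    def step():
        # evaluated at execution time: each run computes a fresh value
        started_at = datetime.now(timezone.utc).isoformat()
        return compiled_at, started_at

    return step

step = build_pipeline()
print(step())
```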
This whole experiment with random numbers started as my attempt ...
Restarting the autoscaler, the instances, and a single running pipeline - I still get the same error:
```
clearml.utilities.locks.exceptions.LockException: [Errno 11] Resource temporarily unavailable
```