no retry messages
CLEARML_FILES_HOST is a gs:// (Google Cloud Storage) URL.
CLEARML_API_HOST points at a self-hosted ClearML server (on Google Compute Engine).
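For context, a minimal sketch of how my environment is pointed at these (the addresses and bucket name below are hypothetical placeholders, not my real values):
```
import os

# Hypothetical values - the real ones point at my GCE-hosted server / GCS bucket
os.environ["CLEARML_API_HOST"] = "http://10.128.0.2:8008"
os.environ["CLEARML_WEB_HOST"] = "http://10.128.0.2:8080"
os.environ["CLEARML_FILES_HOST"] = "gs://my-clearml-artifacts"

from clearml import Task

task = Task.init(project_name="lavi-testing", task_name="upload-check")
```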
Note that earlier in the process the code uploads a dataset just fine
Thanks KindChimpanzee37. Where is that minimal example to be found?
It seems to be doing ok on the app side:
I didn't realise Datasets had tasks associated with them, but there is one and it seems to be doing ok.
I've attached its log file, which only mentions skipping one file (a warning).
BTW:
If I try to find the right model in task.models["output"] (this time there is just one, but in my code there may be several), it appears with the task name (see other attached screenshot).
What would make sense here? (I have to be honest, I'm not sure.)
If the model was saved with a file name (is that the trigger for auto-upload?), I think it makes sense for the model name to match the file name (not the task name), especially when there may be ...
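For illustration, this is roughly how I'm poking at it (the task id and the filename filter are hypothetical):
```
from clearml import Task

task = Task.get_task(task_id="<training-step-task-id>")
for model in task.models["output"]:
    # model.name currently shows the task name rather than the weights file name
    print(model.name, model.url)

# what I'd want with several output models: select by the saved file's name
candidates = [m for m in task.models["output"] if "run_2022_07_20" in (m.url or "")]
```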
My local environment has clearml version 1.6.3rc0, and agents in AWS were started with the AWS Autoscaler, which has no explicit place for Google credentials.
I see a place for "Additional ClearML Configuration" in the AWS autoscaler UI, which I suspect may help, but I don't see how I can pass a secrets file along with my agent.
I'll try and reproduce this in simpler code
I now get this error:
2022-07-18 21:51:29,168 - clearml.storage - ERROR - Failed creating storage object Reason: [Errno 2] No such file or directory: '~/gs.cred'
To be clear, I replaced <this is your GCP storage credentials file> with the contents of that file, escaping every " with a \" and removing newlines.
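For reference, this is the rough shape of what I'm pasting into "Additional ClearML Configuration" - my assumption, based on the sdk.google.storage section of clearml.conf; whether inline (escaped) contents are accepted there instead of a file path is exactly what I'm unsure about:
```
sdk {
    google.storage {
        # what I tried first - fails on the agent, since the file isn't there:
        # credentials_json: "~/gs.cred"
        # what I tried next - the file's contents, quotes escaped, newlines removed:
        credentials_json: "{\"type\": \"service_account\", \"project_id\": \"...\", ...}"
    }
}
```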
On the same topic: what if (I were able to iterate and) I wanted the pipeline calls to be blocking, so that the next pipeline executes only after the previous one completes?
Yes.
Some mechanism that would allow for follow-up code execution, ideally in a way that would not be susceptible to the same things that may cause a task to fail.
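Something along these lines is what I have in mind - a sketch assuming the pipeline run is discoverable as a Task (the project and name below are made up):
```
from clearml import Task

pipeline_task = Task.get_task(
    project_name="lavi-testing/.pipelines/fastai_image_classification_pipeline",
    task_name="fastai_image_classification_pipeline",
)
# block until the pipeline finishes; raise if it failed
pipeline_task.wait_for_status(
    status=(Task.TaskStatusEnum.completed,),
    raise_on_status=(Task.TaskStatusEnum.failed,),
    check_interval_sec=30,
)
# follow-up code (reporting, launching the next pipeline, ...) goes here
```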
sys.path.insert(0, "/src/clearml_evaluation/") is actually left-over code from when I was making things run locally (perhaps prior to connecting to the github repo), but I think that adding a non-existent path to the system path would be benign.
Actually, re-running pipeline_from_decorator.py a second time (and a third time) from the command line seems to have executed without that ValueError, so maybe that issue was some fluke.
Nevertheless, those runs exit prior to the line print('process completed')
and I would definitely prefer the command executing_pipeline not to kill the process that called it.
For example, maybe, having started the pipeline, I'd like my code to also report having started the pipeline to som...
For a component, task = Task.current_task() will get me the task object (right?).
This does not work for the pipeline. Is the pipeline a task?
Edit: The same works for the pipeline.
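i.e., inside a component body (sketch):
```
from clearml import Task

def component_body():
    task = Task.current_task()  # the Task this code is executing under
    print(task.id, task.name)   # works in a component and, per the edit, in the pipeline too
```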
I had several pipeline components getting it and uploading files to it concurrently.
Can Datasets handle that?
Perhaps anecdotal, but just calling random.seed() will set the seed using the system time for you:
https://docs.python.org/3/library/random.html#random.seed
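For example:
```
import random

random.seed()           # no argument: seeded from OS entropy / current time
print(random.random())  # differs from run to run
random.seed(42)         # explicit seed: reproducible
print(random.random())  # identical on every run
```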
Hi John. Sort of. It seems that archiving pipelines does not also archive the tasks that they contain, so /projects/lavi-testing/.pipelines/fastai_image_classification_pipeline is a very long list...
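If it helps, a sketch of how I'd archive those step tasks myself - assuming archiving amounts to adding the "archived" system tag (my understanding, not something I've verified):
```
from clearml import Task

steps = Task.get_tasks(
    project_name="lavi-testing/.pipelines/fastai_image_classification_pipeline"
)
for t in steps:
    # tag each contained task as archived rather than deleting it
    t.set_system_tags((t.get_system_tags() or []) + ["archived"])
```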
there may have been some interaction between the training task and a preceding dataset creation task :shrug:
To be specific, there is "model name", which is not unique, and there is the model key, which is unique to the Task.
I'm not sure why the two fields don't simply match; I guess there may be situations where a file name (without the full path) is used several times.
erm, this parallelization has led to the pipeline task issuing a bunch of:
model_path/run_2022_07_20T22_11_15.209_0.zip , err: [Errno 28] No space left on device
and quitting on me.
My train_image_classifier_component is programmed to save model files to a local path, which is returned (and, thanks to clearml, the path's contents are zipped and uploaded to the files service).
I take it that these files are also brought onto the pipeline task's local disk?
Why is that? If that is indeed what...
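For concreteness, a stripped-down sketch of what the component does (details hypothetical):
```
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["model_path"])
def train_image_classifier_component(run_id: str):
    import os
    model_path = f"model_path/run_{run_id}"
    os.makedirs(model_path, exist_ok=True)
    # ... training loop writes checkpoints into model_path ...
    # returning the path makes clearml zip the folder and upload it
    return model_path
```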
Hi TimelyPenguin76
Thanks for working on this. The ClearML GCP autoscaler is a major feature for us to have. I can't really evaluate ClearML without some means of instantiating multiple agents on GCP machines, and I'd really prefer not to have to set up a k8s cluster with agents and manage scaling it myself.
I tried the settings above with two resources, one for the default queue and one for the services queue (making sure I use the image you suggested above for both).
The autoscaler started up...
I'll try a more carefully checked run a bit later but I know it's getting a bit late in your time zone
first, thanks for having these discussions. I appreciate this kind of support is an effort 🙏
Yes. I perfectly understand that once a pipeline job (or a task) is sent off in this manner, it executes separately (and, most likely, on a different machine) from the process that instantiated it.
I still feel strongly that such a command should not be thought of as a fire-and-exit operation. I can think of several scenarios where continued execution of the instantiating process is desired:
I ...
I imagine that one workaround is to:
1. Disable automatic model uploads
2. Perform manual model upload (with the correct name)
Can you point me to how to do these? (A sketch of my guess follows.)
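A sketch, assuming the relevant knobs are auto_connect_frameworks on Task.init and OutputModel for the manual upload (names and paths below are made up):
```
from clearml import Task, OutputModel

task = Task.init(
    project_name="lavi-testing",
    task_name="train_image_classifier",
    auto_connect_frameworks={"pytorch": False},  # 1. disable automatic model uploads
)

# ... training writes model_path/run_.../weights.pth ...

# 2. manual upload, with the name I actually want
output_model = OutputModel(task=task, name="run_2022_07_20_weights")
output_model.update_weights(weights_filename="model_path/run_2022_07_20/weights.pth")
```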
Thanks AgitatedDove14 for all the guidance.
Also: are there plans for the pipeline view to show artefacts (as in, links to things returned from components)?
What I think would be preferable is that the pipeline be deployed and that the Python process that deployed it be allowed to continue on to whatever I had planned for it to do next (i.e. not exit).