PompousBeetle71 could you check that the "output:destination" is the same for both experiments ?
PleasantGiraffe85
it took the repo from the cache. When I delete the cache, it can't get the repo any longer.
what error are you getting ? (are we talking about the internal repo)
Hmm, this means the step should have included the git repo itself, which means the code should have been able to import the .py
Can you see the link to the git repository on the Pipeline step Task ?
Maybe WackyRabbit7 is a better approach as you will get a new object (instead of the runtime copy that is being used)
Ohh now I get it...
Wait a couple of hours, 0.16 is out today with the trains-agent --stop flag
JuicyFox94
NICE!!! this is exactly what I had in mind.
BTW: you do not need to put the default values there; it reads the defaults from the package itself (trains-agent/trains) and uses the conf file as overrides, so this section only needs to contain the parts that matter (like cache location, credentials, etc.)
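For example (just a sketch; the exact keys should be double-checked against the default conf file shipped with the package, and the credentials are placeholders), an overrides-only conf could look like:
agent {
    # only the parts that differ from the built-in defaults, e.g. cache / venv locations
    venvs_dir: ~/.trains/venvs-builds
    vcs_cache {
        path: ~/.trains/vcs-cache
    }
}
sdk {
    aws {
        s3 {
            key: "ACCESS_KEY_PLACEHOLDER"
            secret: "SECRET_KEY_PLACEHOLDER"
        }
    }
}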
Yes, but I'm not sure that they need to have a separate task
Hmm okay I need to check if this can be easily done
(BTW, the downside of that, you can only cache a component, not a sub-component)
RattySeagull0 I think you are correct, python 3.6 is the one installed inside the docker. Is it important to have 3.7? You might need another docker image (or change the installation script to install python 3.7 inside)
There is some explanation of how functions become pipelines and components.
That is a good point! I'll make sure we mention it in the pipeline section of the docs
This whole experiment with random numbers started as my attempt at verifying that code in clearml.pipeline
Correct. BTW: this is why we added PipelineDecorator.run_locally()
so it is easier to debug, you can also use PipelineDecorator.debug_pipeline()
to run the entire pipeline in a single process as python ...
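As a rough sketch of the two debug modes (the component / pipeline / project names below are made up for illustration):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["number"])
def draw_number():
    import random
    return random.random()

@PipelineDecorator.pipeline(name="random-demo", project="examples", version="0.1")
def my_pipeline():
    return draw_number()

if __name__ == "__main__":
    # run the pipeline logic locally; each component still runs as its own subprocess/Task
    PipelineDecorator.run_locally()
    # or: run everything in a single process, like plain python, for step-by-step debugging
    # PipelineDecorator.debug_pipeline()
    my_pipeline()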
It seems that I solved the problem by moving all of the local code (local repos) imports to after the Task.init
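i.e. roughly this pattern (module and project names are placeholders):
from clearml import Task

# create/attach the Task first, so the local repo is detected and logged
task = Task.init(project_name="my-project", task_name="my-task")

# only then import code from the local repos
from my_local_repo import helpers  # placeholder module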
PunyPigeon71 I'm confused, how did that solve the issue on the remote machine?
BTW: What's the TF / Keras version?
So this is very odd, it looks like a pip bug:
The agent is trying to install torch==2.1.0.*
because by default it ignores the 4th+ parts (they are unstable and torch has a tendency to remove them), and for some reason pip will not match 2.1.0.*
with for example "2.1.0.dev20230306+cu118"
but based on the docs it should work:
see here: None
As a workaround you can always edit and change to the final url for example: so ...
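e.g. (a sketch only) in the Task's "Installed Packages" you could replace the wildcard spec with the exact build you have locally:
# what the agent generated (pip fails to match the nightly build against this)
torch==2.1.0.*
# pinned to the exact local version instead
torch==2.1.0.dev20230306+cu118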
Any chance you actually run the second script with Popen (i.e. calling the python as a subprocess) ?
If you mean like Canary? Then yes, but only on the KFServing backend (coming soon), since the engines themselves do not support it (this is basically a "routing" feature)
there is probably some way to make an S3 path open up in the browser by default
You should have a pop-up asking for credentials ...
Could you check that if you add the credentials in the profile page it works ?
can i run a random task from a queue? like this
clearml-agent execute --id <TASK_ID>
or
ChubbyLouse32 This will just work out of the box
No need to enqueue the Task, just reset it (in the UI)
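(if you prefer doing the reset from code instead of the UI, a sketch, with a placeholder task id:)
from clearml import Task

task = Task.get_task(task_id="<TASK_ID>")  # placeholder id
task.reset()  # clears the previous run so the agent can execute it again
# then: clearml-agent execute --id <TASK_ID>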
Hm, one of the issues I have with this change is that now every dataset that doesn't have a semantic version cannot be loaded anymore
Okay we definitely need to solve that.
Any chance I can ask you to open a github issue (just so we do not forget)?
I will pass it quickly along so that we can maybe offer a fix in the next RC
yes that makes sense, I think what happened is that one of the processes completed the Task (i.e. closed it) before the others did, and so they threw an exception.
I switched to having all tasks in separate processes
I think that's probably the best (performance wise as well), nice!
Okay that makes sense. If this is the case, I'm assuming you have set the files server to point to your S3 bucket, is that correct?
could it be you are missing the credentials for that? (it is trying to upload the preprocessing code there, so the clearml-serving container will be able to pull it later)
Hmm I think the approach in general would be to create two pipeline tasks, then launch them from a third pipeline or trigger externally? If on the other hand it makes sense to see both pipelines on the same execution graph, then the nested components makes a lot of sense. Wdyt?
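Roughly what I mean by nested components (a sketch, names are illustrative only):
from clearml import PipelineDecorator

# each former "pipeline" becomes a component (or group of components) ...
@PipelineDecorator.component(return_values=["out_a"])
def sub_pipeline_a():
    return "a"

@PipelineDecorator.component(return_values=["out_b"])
def sub_pipeline_b(out_a):
    return out_a + "b"

# ... orchestrated by one top-level pipeline, so both appear on the same execution graph
@PipelineDecorator.pipeline(name="combined", project="examples", version="0.1")
def combined():
    out_a = sub_pipeline_a()
    return sub_pipeline_b(out_a)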
Notice you have in the Path: /home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/sfi
But you should have: /home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py/
Hi DepressedChimpanzee34
This is not a query call, this is a reporting call. see docs below
https://clear.ml/docs/latest/docs/references/api/workers#post-workersstatus_report
It is used by the worker to report its own status.
I think this is what you are looking for:
https://clear.ml/docs/latest/docs/references/api/workers#post-workersget_stats
(BTW: you can disable the auto-logging feature of joblib):
Task.init(..., auto_connect_frameworks={'scikit': False})
so for example, if there was an idle GPU and Q3 takes it, and then a task comes into Q2 (for which we specified 3 GPUs), but Q3 has now taken some of these GPUs, what will happen?
This is a standard "race" the first one to come will "grab" the GPU and the other will wait for it.
I'm pretty sure the enterprise edition has preemption support, but this is not currently part of the open source version (btw: the dynamic GPU allocation is also, I think, part of the enterprise tier; in the open source ...
is there something else in the conf that i should change ?
I'm assuming the google credentials?
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
SmallBluewhale13 the final path is automatically generated, you only need to specify the bucket itself. By default it will be your "files_server"
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/docs/clearml.conf#L10
You can either change the configuration (which will make sure all uploaded artifacts will always be there, including debug images etc.)
You can specify where you want the artifacts and debug images to be uploaded by setting:
https://allegro....
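For reference, a sketch of the per-task option (the bucket/prefix is a placeholder; the configuration alternative is the default_output_uri setting in clearml.conf):
from clearml import Task

# upload all artifacts, models and debug images for this task to the given destination
task = Task.init(
    project_name="my-project",
    task_name="my-task",
    output_uri="s3://my-bucket/some/prefix",  # placeholder bucket/prefix
)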
Xeon E3-1240: 4 - 5 hours!
wow... yes definitely worth upgrading