So the way it will work is that you will also need to have a Task.init in the main process (the one using the launch function) and the same Task.init in main_func. What it does is signal the sub-processes to use the main process task, so they all report to the same task. Obviously to test it you will need to wait for the RC (after the weekend :)
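A minimal sketch of that pattern, using torch.multiprocessing.spawn as a stand-in for whatever launch function you use (project/task names here are placeholders):

from clearml import Task
from torch.multiprocessing import spawn

def main_func(rank):
    # the same Task.init inside the sub-process picks up the main process task,
    # so all sub-processes report to the same task
    Task.init(project_name="examples", task_name="multi process")
    # ... training code for this rank ...

if __name__ == "__main__":
    # Task.init in the main process (the one calling the launch function)
    Task.init(project_name="examples", task_name="multi process")
    spawn(main_func, nprocs=2)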
Let me try to build a minimal reproducible version
Thank you!
Then what happens is that Task.current_task() returns None for the pipeline's task...
Hmm that sounds like the pipeline Task was closed?! could that be? where (in the code) is the call to Task.current_task ?
E.g., I'm creating a task using clearml.Task.create, and often it doesn't get the git diff correctly,
ShakyJellyfish91 Task.create does not store any "git diff" automatically, is there a reason not to use Task.init ?
Hmm can you try:
--args overrides="['log.clearml=True','train.epochs=200','clearml.save=True']"
FranticCormorant35 As far as I understand, what you have going is a multi-node setup that you manage yourself, something like Horovod, Torch distributed, or any MPI setup. Since Trains supports all of the above standard multi-node setups, the easiest way is to do the following:
On the master node set the OS environment variable:
OMPI_COMM_WORLD_NODE_RANK=0
Then on any client node:
OMPI_COMM_WORLD_NODE_RANK=unique_client_node_number
In all processes you can call Task.init - with all the automagic kicking in....
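A rough sketch of that setup (the rank value and project/task names are placeholders; normally the rank would come from your MPI/launcher environment rather than be hard-coded):

import os
from clearml import Task

# master node gets rank 0; every client node gets its own unique number,
# either exported in the shell or set here before Task.init
os.environ.setdefault("OMPI_COMM_WORLD_NODE_RANK", "0")

# every process calls Task.init and the automagic logging kicks in
task = Task.init(project_name="examples", task_name="multi node training")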
I'm assuming TF was not part of the original requirements, and was automatically pulled by one of the packages, hence the latest version ....
I basically moved the Task.init() call below the imports
Okay that is odd, can you copy paste the before/after of the import, so we can fix that?!
To summarize: The scheduler should assign tasks to the agent first, which gives a queue the highest priority.
The issue here is you assume both are idle and you need global priority based on resource preference. I understand your scenario now, but it will only hold if the enqueuing order is "optimal". For example, if machine Y is running a Task B that is about to be completed (e.g. in a minute), then machine X will still pick the new Task B, and again we end up in the scenario where Task A i...
So the thing is, clearml automatically detects the last iteration of the previous run; my assumption is that you also add it yourself, hence the double shift.
SourOx12 could that be it?
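If that is what happens, here is a quick way to check/override the automatic offset (a sketch, assuming continue_last_task and the get/set_initial_iteration calls are available in your SDK version):

from clearml import Task

task = Task.init(project_name="examples", task_name="resumed run", continue_last_task=True)

# this is the iteration offset clearml detected from the previous run
print(task.get_initial_iteration())

# if your own code already adds the previous iteration count when reporting,
# zero the automatic offset so the two are not summed (the "double shift")
task.set_initial_iteration(0)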
But this config should almost never need to change!
Exactly the idea 🙂
notice the password (initially random) is also fixed on your local machine, for the exact same reason
quick update 1.0.2 will be ready in an hour, apologies 🙂
SmarmySeaurchin8 what do you think?
https://github.com/allegroai/trains/issues/265#issuecomment-748543102
task.connect_configuration
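For reference, a minimal sketch of using it (the file path, dict contents and the name argument are placeholders; the exact signature may vary by SDK version):

from clearml import Task

task = Task.init(project_name="examples", task_name="config example")

# connect a configuration file - its content shows up under the task's configuration section
config_path = task.connect_configuration("path/to/config.yaml", name="my config")

# or connect a dictionary; the returned dict reflects any remote overrides
params = task.connect_configuration({"lr": 0.001, "batch_size": 32}, name="hyper params")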
Hi, I would like to understand how I can set the pip cache location for my agent,
ClumsyElephant70 by default the pip cache (and all other cache folders) are mounted back into the host itself ~/.clearml/
I'm assuming the idea is shared cache, if this is the case, do:
docker_pip_cache = ~/my_shared_nfs/pip-cache
https://github.com/allegroai/clearml-agent/blob/e3e6a1dda81bee2dd20a64d09746568e415f1823/docs/clearml.conf#L139
it seems it's following the path of the script i'm using to task.create, eg:
The folder it should run in is the script path you are passing (i.e. "script=ep_fn," )
Wrong path would imply that it is not finding the correct repository, is that the case ?
Jupyter Notebook is fully supported.
Could you try and restart the notebook kernel?
Thank you!
one thing i noticed is that it's not able to find the branch name on >=1.0.6x , while on 1.0.5 it can
That might be it! let me check the code again...
TenseOstrich47 / PleasantGiraffe85
The next version (I think releasing today) will already contain scheduling, and the next one (probably an RC right after) will include triggering. That said, currently the UI wizard for both (i.e. creating the triggers) is only available in the community-hosted service. Still, I think that creating it from code (triggers/schedule) actually makes a lot of sense,
pipeline presented in a clear UI,
This is actually being actively worked on, I think Anxious...
Exactly!
Regarding adding a feature store: probably not in the near future. A scalable feature store is quite the project; it is probably more realistic to somehow have a recipe to deploy with Feast
a task of queue B if the next task is of type A it will have to wait,
It seems you imply there are two types of Tasks and they need to be executed one after the other ?
Hi @<1541954607595393024:profile|BattyCrocodile47> and @<1523701225533476864:profile|ObedientDolphin41>
"we're already on AWS, why not use SageMaker?"
TBH, I've never gone through the ML workflow with SageMaker.
LOL I'm assuming this is why you are asking 🙂
- First, you can use SageMaker and still log everything to ClearML (2 lines integration, see the sketch after this list). At least you will have visibility to everything that is running/failing 🙂
- SageMaker job is a container, which means for ...
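The "2 lines integration" mentioned above is basically just (project/task names are placeholders):

from clearml import Task
task = Task.init(project_name="sagemaker jobs", task_name="training job")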
Hi BurlyRaccoon64
What do you mean by "custom_build_script" ? not sure I found it in "clearml.conf"
https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf
Hi FrothyShark37
is the task scheduler only accessible through the SDK?
yes, in the open source version this is strictly code based. I know the enterprise tier has a UI for it, but in terms of features I believe this is equivalent
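A rough code-based sketch (the task id, queue name and schedule are placeholders; check the TaskScheduler docstrings for the exact arguments in your SDK version):

from clearml.automation import TaskScheduler

scheduler = TaskScheduler()

# re-launch an existing task every day at 07:30 into the "default" queue
scheduler.add_task(
    schedule_task_id="aabbcc112233",  # placeholder task id
    queue="default",
    hour=7,
    minute=30,
)

# run the scheduler process itself (it can also be enqueued, e.g. to a services queue)
scheduler.start()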
. Curious what advantage it would be to use the StorageManager
Basically if you set the clearml cache folder to the EFS, users can always do:
from clearml import StorageManager
local_file = StorageManager.get_local_copy(" ")
where local_file is stored on persistent cache (EFS) and the cache is automatically cleaned based on last accessed file
Hi GrotesqueOctopus42 ,
BTW: is it better to post the long error message on a reply to avoid polluting the channel?
Yes, that is appreciated 🙂
Basically logs in the thread of the initial message.
To fix this I had to spin up the agent using the --cpu-only flag (--docker --cpu-only)
Yes, if you do not specify --cpu-only it will default to trying to access GPUs
Nice!
Is there a way to detect the repository when initialising a task?
SuperficialGrasshopper36 This should have happened automatically when you call Task.init()
Hi DepressedChimpanzee34
Why do you need to have the configuration added manually ? Isn't the clearml.conf easier ? If not, I think OS environment variables are easier, no? I ran the above code and everything worked with no exception/warning... What is it that the try/except solves exactly ?
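For example, a sketch of the environment-variable route (the values are placeholders; set them before the first Task.init call):

import os

# ClearML credentials via environment variables instead of clearml.conf
os.environ["CLEARML_API_HOST"] = "https://api.clear.ml"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"

from clearml import Task
task = Task.init(project_name="examples", task_name="env configured")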
Did you set an agent on a machine? (See clearml agent in docs for details)
Could you test with the same file? Maybe timeout has something to do with the file size ?