FranticCormorant35 As far as I understand, what you have is a multi-node setup that you manage yourself, something like Horovod, Torch distributed, or any MPI setup. Trains supports all of the above standard multi-node setups. The easiest way is to do the following:
On the master node set the OS environment variable:
OMPI_COMM_WORLD_NODE_RANK=0
Then on any client node:
OMPI_COMM_WORLD_NODE_RANK=unique_client_node_number
In all processes you can call Task.init - with all the automagic kicking in...
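A minimal sketch of that pattern (the project/task names here are placeholders):

```python
import os
from trains import Task  # the package is "clearml" in newer versions

# On the master node; each client node should set its own unique rank instead
os.environ['OMPI_COMM_WORLD_NODE_RANK'] = '0'

# Called in every process - the automagic logging kicks in from here
task = Task.init(project_name='examples', task_name='multi-node run')
```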
so if I plot an image with matplotlib, it would not upload? I need to use the logger.
Correct, if you have no "main" task, no automagic 🙂
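For reference, manual reporting through the logger looks roughly like this (a sketch, assuming you do have a Task in the process; the title/series names are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name='examples', task_name='manual matplotlib report')

fig = plt.figure()
plt.plot(np.arange(10))

# explicitly report the figure through the task's logger
task.get_logger().report_matplotlib_figure(
    title='my plot', series='series A', figure=fig, iteration=0
)
```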
so how can I make it run with the "automagic"?
Automagic logs a single instance... unless those are subprocesses, in which case, the main task takes care of "copying" itself to the subprocess.
Again what is the use case for multiple machines?
Correct, and that also means the code that runs is not auto-magically logged.
Should work out of the box, as long as the task was started. You can forcefully start the task with:
task.mark_started()
os.environ['TRAINS_PROC_MASTER_ID'] = '1:da0606f2e6fb40f692f5c885f807902a'
os.environ['OMPI_COMM_WORLD_NODE_RANK'] = '1'
task = Task.init(project_name="examples", task_name="Manual reporting")
print(type(task))
Should be: <class 'trains.task.Task'>
os.environ['TRAINS_PROC_MASTER_ID'] = args.trains_id
it should be '1:' + args.trains_id
os.environ['TRAINS_PROC_MASTER_ID'] = '1:{}'.format(args.trains_id)
Also str(randint(1, sys.maxsize))
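Putting the fixes together, a client-node sketch might look like this (the argparse part is just one way to pass in the master task id discussed above):

```python
import argparse
import os
import sys
from random import randint
from trains import Task

parser = argparse.ArgumentParser()
parser.add_argument('--trains-id', dest='trains_id', help='master task id')
args = parser.parse_args()

# note the '1:' prefix in front of the master task id
os.environ['TRAINS_PROC_MASTER_ID'] = '1:{}'.format(args.trains_id)
# any unique node rank works, e.g. a random one as suggested above
os.environ['OMPI_COMM_WORLD_NODE_RANK'] = str(randint(1, sys.maxsize))

task = Task.init(project_name='examples', task_name='Manual reporting')
print(type(task))  # expected: <class 'trains.task.Task'>
```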
Hi PoisedShark13
However, the INSTALLED PACKAGES of my task is missing many of the installed packages (any idea why?)
It automatically detects the directly imported packages, by literally analyzing your code base and looking for imports.
The derivative packages (i.e. the ones that any of the "main" packages need) will be listed after the first time the agent installs everything.
If something specific is missing, you can manually add it with:
Task.add_requiremen...
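For example (assuming the truncated call above is Task.add_requirements; note it has to be called before Task.init, and the package/version here are placeholders):

```python
from clearml import Task

# must be called before Task.init()
Task.add_requirements('tensorflow', '2.4.0')

task = Task.init(project_name='examples', task_name='requirements demo')
```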
You do not need the cudatoolkit package; it is automatically installed if the agent is using conda as its package manager. See your clearml.conf for the exact configuration you are running:
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79
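i.e. something along these lines in clearml.conf (a sketch; see the linked file for the full set of options):

```
agent {
    package_manager {
        # supported values include pip and conda
        type: conda,
    }
}
```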
Hi HollowPeacock58
I'm assuming this is the ARM support (i.e. you are running on a new Mac) fix we released in one of the last clearml-agent versions. Could you update to the latest clearml-agent?
pip3 install clearml-agent==1.6.0rc2
I see,
HollowPeacock58 can you please send the full log?
(The odd thing is it is trying to install the Python 3.10 version of torch, when your command line suggests it is running Python 3.8)
FlutteringWorm14 any insight on the Task that it fails to delete? Or a way to reproduce it?
SlipperyDove40 following up on the missing section name, this seems like a backwards compatibility issue. Try calling with backwards_compatibility=False:
my_params = Task.get_parameters(backwards_compatibility=False)
This should always add the section name prefix.
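For instance (the section/parameter names here are just illustrative, they depend on your task):

```python
my_params = task.get_parameters(backwards_compatibility=False)
# keys now always carry the section prefix, e.g.:
# {'Args/batch_size': '32', 'Args/lr': '0.001'}
```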
So the original looks good, could it be you tried to clone a Task that was executed with an agent with pip, and then pushed into an agent running conda?
Yes, in the UI: clone or reset the Task, then you can edit the installed packages section under the Execution tab.
Hi JitteryCoyote63
Could it be a Python version mismatch? Can you send the full log?
BTW: when I do
pip3.8 install pytorch3d==
I get the following versions:
pytorch3d== (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0)
Hi UnevenDolphin73
You mean this part?
https://github.com/allegroai/clearml-agent/blob/5afb604e3d53d3f09dd6de81fe0a494dacb2e94d/docs/clearml.conf#L212
(In other words, the Task's Environment section is a bit unclear)
Yes, we should expand the docs, but generally you are correct, it should work as you described 🙂
So you want to launch the second step before the task that is uploading the artifacts is completed, but after the artifacts themselves are uploaded?
Oh yes, you probably have a sort or filter applied there :)
Thank you EnviousStarfish54 !
This is very helpful!
I'm looking at Kedro and the project you shared, and a few thoughts came to mind:
I very much like the idea of using functions as "nodes" (and by extension, using notebook cells with tags as nodes). This got me thinking, I'm pretty sure we could have a similar implementation with ClearML. My thinking is using inspect or dill to convert the functions/cells into plain text code, automatically analyze the runtime requirements, and creat...
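The function-to-source part of that idea is straightforward; a hypothetical illustration (my_node is a made-up example, and this is not an existing ClearML API):

```python
import inspect

def my_node(x):
    # a stand-in for a Kedro-style "node" function
    return x * 2

# extract the function's source so it could be stored and re-executed
# as a standalone step; the requirement analysis would run on this text
source_code = inspect.getsource(my_node)
print(source_code)
```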
BTW: UnevenDolphin73 you should never actually do "task = clearml.Task.get_task(clearml.config.get_remote_task_id())"
You should just do Task.init() - it will automatically pick up the remote task id and do all sorts of internal setups; you will end up with the same object, but initialized in an ordered fashion.
Yes, even without any arguments given to Task.init(), it has everything from the server.
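In other words, a sketch of the two approaches side by side (the commented-out line is the one to avoid):

```python
from clearml import Task

# discouraged - fetches the task object directly, skipping internal setup:
# task = clearml.Task.get_task(clearml.config.get_remote_task_id())

# recommended - inside the remotely executed script this picks up
# the remote task id automatically and returns the same task object:
task = Task.init()
```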
FrothyShark37 any chance you can share a snippet to reproduce?
However, can you see the inconsistency between the key and the name there:
Yes, that was my point on "uniqueness"... 🙂
The model-key must be unique, and it is based on the filename itself (the context is known, it is inside the Task), but the Model Name is an entity, so it should have the Task Name as part of the entity name. Does that make sense?
JitteryCoyote63 This is odd, you have both python3.9 and python3.8 on the container, and since the task says (probably) that the agent should run python3.9, it's trying to use it for creating the environment (it does not matter that the agent itself is using python3.8).
1673431344706 agent-1 DEBUG /usr/bin/python3.9
/usr/bin/python3.9: No module named pip
This is the main issue: pip is missing for python3.9, and this is why it reverted to python3.8 when it was setting up the environment.
It should prob...
packages are updated, and I don't know which python version I get, + changing the python version of the OS is not really recommended
Wait I'm confused, this is inside a container, no?
and the python version running my code should not depend of the python version running the clearml-agent (especially for experiments running in containers)
Generally speaking you are correct, but some packages will not have the same version for all python versions
Specifically in this case I think...
I guess it's on me to check whether this slowdown is negligible or not
Usually the performance impact is negligible, especially with a GPU.
But if you really want the best:
Add --security-opt seccomp=unconfined to the extra_docker_arguments
See details:
https://betterprogramming.pub/faster-python-in-docker-d1a71a9b9917
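In clearml.conf that would look roughly like this (a sketch of the agent section; check your own file for the exact key location):

```
agent {
    # passed verbatim to the docker run command
    extra_docker_arguments: ["--security-opt", "seccomp=unconfined"]
}
```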
Please do, just so it won't be forgotten (it won't, but for the sake of transparency)
Will this still be considered as global site-packages?
This is a pip setting. I "think" it inherits from the local user's installation, but I would actually install with "sudo pip", which will definitely be "inherited".