I just cloned it from the examples that are available in the SaaS console upon account creation
Ohhh! That would explain it. Maybe it is broken there?! Let me check, one second.
Hi RoundMosquito25
however they are not visible either in:
But can you see them in the UI?
Hi SharpHedgehog60
Task type is another way to declare the kind of processing the Task performs.
Later you can filter based on the Task type (like you would with a Tag).
For example, Datasets are always of type "data processing".
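A minimal sketch in Python (project and task names are just placeholders, and the task_filter keys follow the server's tasks.get_all API):
from clearml import Task
# Declare the type when creating the Task
task = Task.init(project_name="examples", task_name="prepare data", task_type=Task.TaskTypes.data_processing)
# Later, filter by type the same way you would filter by a tag
tasks = Task.get_tasks(project_name="examples", task_filter={"type": ["data_processing"]})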
I guess it won't, due to the nature of services?
Correct, k8s glue works differently. That said, I would actually use the Helm chart to spin up a pod with the agent in services mode and venv mode.
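Inside that pod the agent command would look something along these lines (a sketch; the queue name is an assumption):
clearml-agent daemon --queue services --services-mode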
I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is it is trying to install the Python 3.10 version of torch, when your command line suggests it is running Python 3.8)
Yes, in the UI: clone or reset the Task, then you can edit the installed packages section under the Execution tab.
I see. If you are creating the Task externally (i.e. from the controller), you should probably call task.close(); it will return when everything is in order (including artifacts uploaded, and other async stuff).
Will that work?
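Something like this minimal sketch (project and artifact names are placeholders):
from clearml import Task
task = Task.init(project_name="examples", task_name="external task")
task.upload_artifact("results", artifact_object={"accuracy": 0.9})
# close() blocks until artifacts and other async uploads are flushed
task.close()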
Hi ConvolutedSealion94
You can archive / delete the SERVING-CONTROL-PLANE Task from the DevOps project in the UI.
Do notice you will need to make sure clearml-serving is updated with a new session ID, or remove it (i.e. take down the pods / docker-compose).
Make sense?
Were you able to interact with the service that was spun up? (How was it spun up?)
Hmm, if this is the case, you can add some prints in here:
None
the service/action will tell you what you are sending
wdyt?
What should I do to fix my problem?
What is the problem? We just proved the upload speed is fine, no?
UpsetBlackbird87
pipeline.start() will launch the pipeline itself on a remote machine (a machine running the services agent).
This is why your pipeline is "stuck": it is not actually running.
When you call start_locally() the pipeline logic itself runs on your machine and the nodes run on the workers.
Makes sense?
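For reference, a minimal sketch of the two options (pipeline name and project are placeholders):
from clearml import PipelineController
pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
# ... add steps here ...
pipe.start(queue="services")  # pipeline logic is enqueued to a remote services agent
# or instead:
# pipe.start_locally(run_pipeline_steps_locally=False)  # logic runs here, steps run on workers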
Hi ExcitedCat13
Sure, download the plugin from the git repo (Install instructions in the repo).
Regarding remote debugging, are you referring to SSH?
The plugin itself is designed to make sure that when you work on a remote machine with PyCharm, ClearML will log the local git repo and changes (as the .git folder is not synced to the remote machine).
So for this...
Sorry, what is exactly "this" ?
This seems more complicated than I thought... I think you are correct, and it fails to load the entire module. Let me check what I can do.
The main reason to add the timeout is that the warning was annoying to users :)
The secondary reason was that ClearML will start reporting based on seconds from start, then revert back to iterations once iterations start. But if the iterations are "epochs", the numbers are lower, so you end up with a graph that does not match the expected "iterations" x-axis. Make sense?
Hi AgitatedTurtle16, could you verify you can access the API server with curl?
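For example (assuming the default API server port, and that debug.ping is exposed as the lightweight health-check endpoint):
curl http://<your-api-server>:8008/debug.ping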
Thanks! The new doc site is scheduled for next week; it will also be on GitHub, so PR-ing fixes will be a breeze :)
It actually started executing your code, but it did not capture it correctly:
/root/.clearml/venvs-builds/3.10/bin/python -u /root/.clearml/venvs-builds/3.10/code/colab_kernel_launcher.py
Which I assume means the actual Task had bad code.
What do you have under the Task Execution tab in the UI (the one you were launching, i.e. enqueueing)?
Wait, why aren't you just calling Popen (or os.system)? I'm not sure how it relates to the torch multiprocessing example. What am I missing?
Correct, the serving Task ID is the clearml-serving session. It is the instance that holds all the information of this specific setup and its models.
FranticCormorant35 As far as I understand, what you have is a multi-node setup that you manage yourself, something like Horovod, Torch distributed, or any MPI setup. Trains supports all of these standard multi-node setups. The easiest way is to do the following:
On the master node, set the OS environment variable: OMPI_COMM_WORLD_NODE_RANK=0
Then on any client node: OMPI_COMM_WORLD_NODE_RANK=unique_client_node_number
In all processes you can call Task.init, with all the automagic kicking in...
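As a sketch, each process would do something like this (the rank value differs per machine):
import os
os.environ["OMPI_COMM_WORLD_NODE_RANK"] = "0"  # 0 on the master, a unique number on each client
from clearml import Task
task = Task.init(project_name="examples", task_name="multi-node run")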
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.
Correct, and to allow users to more easily create Tasks from code.
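For illustration, a minimal sketch of the decorator flow (all names are placeholders):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def step_one():
    return [1, 2, 3]

@PipelineDecorator.pipeline(name="decorated pipeline", project="examples", version="1.0")
def main():
    data = step_one()  # each step becomes its own Task, no manual Task.init needed

PipelineDecorator.run_locally()  # debug the whole flow locally
main()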
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload...)
I would like to bypass this behavior because my code needs a specific version of PyTorch.
DilapidatedCow43 you will get exactly the PyTorch version you need, but compiled for the CUDA version that is installed (the PyTorch people actually maintain multiple builds for different CUDA versions).
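If you do need to pin it explicitly, a sketch (the version string is just an example):
from clearml import Task
Task.add_requirements("torch", "==1.13.1")  # must be called before Task.init
task = Task.init(project_name="examples", task_name="train")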
The 'on-premise' server fails to connect to the ClearML server because of the VPN, I think
I think you are correct.
You can quickly test it: try to run curl http://local-server:8008 and see if that works.
Hi @<1523702652678967296:profile|DeliciousKoala34>
What's the clearml-server version you are working with?
Can you check with the latest RC?
pip3 install clearml==1.9.2rc2
Sure thing :)
BTW could you maybe PR this argument (marked out) so that we know for next time?
Hi @<1523702786867335168:profile|AdventurousButterfly15>
Make sure you pass output_uri=True in Task.init.
It will automatically upload your model to the file server. You can also configure it in clearml.conf; look for default_output_uri.
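For example (the bucket URI is a placeholder):
from clearml import Task
task = Task.init(project_name="examples", task_name="train", output_uri=True)  # True => upload to the file server
# or point it at object storage, e.g. output_uri="s3://my-bucket/models"
And the matching clearml.conf entry lives under:
sdk {
    development {
        default_output_uri: "s3://my-bucket/models"
    }
}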