LazyLeopard18 well done on locating the issue.
Yes, Docker on Windows is a bit flaky...
Hi PompousParrot44
Could you send the "Installed Packages" list?
I think there is a bug in the current trains-agent (there is already a fix but the RC is still not out),
where "package @ git+http" packages ignore the git+http link.
You can solve it manually by editing the "Installed packages" (when the Task is in draft mode, the section becomes editable): remove the "package @" part and leave only the "git+http" link.
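For example (the package name below is just an illustration), an entry that looks like:
mypackage @ git+https://github.com/example/mypackage.git
should be edited down to only the link:
git+https://github.com/example/mypackage.git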
(without having to execute it first on Machine C)
Someone somewhere has to create the definition of the environment...
The easiest way to go about it is to execute it once.
You can add the following line to your code:
task.execute_remotely(queue_name='default')
This will cause your code to stop running and enqueue itself on a specific queue.
Quite useful if you want to make sure everything works (like running a single step), then continue on another machine.
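A minimal sketch of the flow (project/task names are placeholders):

from clearml import Task  # or: from trains import Task, on the older package

task = Task.init(project_name='examples', task_name='remote execution demo')

# optionally run a quick sanity check here (e.g. a single step) before going remote

# the local process stops here and the Task enqueues itself on the 'default' queue;
# an agent listening on that queue will pick it up and continue the run
task.execute_remotely(queue_name='default')

# anything below this line only runs on the remote machine
print('running on the agent now')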
Notice that switching between cpu...
Hi EnthusiasticCoyote38
Does clearml-agent have an option
Fully supported 🙂
Should work out of the box, it will always clone with --recursive and will bring all submodules
Hi @<1541954607595393024:profile|BattyCrocodile47>
Do you mean to start a remote session instead of the cli directly from the vscode ui and connect to it? If so, that would be awesome!! We have a remote session from the web UI where it spins up your remote session and launches vscode inside the container so you can work on it in your browser. But a VSCode plugin is a great idea, do you have reference code for similar plugins?
Hi ReassuredTiger98
I think DefiantCrab67 solved it 🙂
https://clearml.slack.com/archives/CTK20V944/p1617746462341100?thread_ts=1617703517.320700&cid=CTK20V944
Hi HandsomeCrow5 .
Remember the debug images are events with links to the actual images, so you first have to get the events and then you can download the images with https://allegro.ai/docs/examples/examples_storagehelper/#storagemanager (which by definition has the credentials, because it was able to upload them 🙂)
To get the events:
from trains.backend_api.session.client import APIClient
client = APIClient()
client.events.debug_images(task='aabbcc')
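A rough sketch of putting the two together (the exact field names in the response may differ between server versions, so inspect the returned object first):

from trains.backend_api.session.client import APIClient
from trains.storage import StorageManager

client = APIClient()
res = client.events.debug_images(task='aabbcc')

# walk the returned events and fetch a local copy of every image URL;
# StorageManager already holds the credentials used to upload them
for metric in res.metrics:
    for iteration in metric.iterations:
        for event in iteration.events:
            print(StorageManager.get_local_copy(remote_url=event['url']))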
I could merge some steps, but as I may want to cache them in the future, I prefer to keep them separate
Makes total sense, my only question (and sorry if I'm dwelling too much on it) is how would you pass the data from step 2 to step 3, if this is a different process on the same machine?
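To illustrate what I mean, here is just a sketch using the pipeline decorators (the names, caching flags and dummy payloads are all made up):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['dataset'], cache=True)
def step_two(raw_path):
    # heavy preprocessing you may want to cache later
    return {'path': raw_path}

@PipelineDecorator.component(return_values=['model'])
def step_three(dataset):
    # the object returned by step_two is handed over here, even if each
    # step ends up running as a separate process (or on another machine)
    return 'trained-on-' + dataset['path']

@PipelineDecorator.pipeline(name='demo pipeline', project='examples', version='0.1')
def pipeline_logic(raw_path):
    dataset = step_two(raw_path)
    return step_three(dataset)

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    pipeline_logic('/tmp/raw_data')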
@<1523704157695905792:profile|VivaciousBadger56> regarding: None
Is this a discussion or PR ?
(general ranting is saved for our slack channel 🙂 )
Sure thing, hopefully I'll remember to ping tomorrow once GitHub is synced, I'd appreciate it if you could verify the fix works 🙂
Hi ConvolutedSealion94
You can archive / delete the SERVING-CONTROL-PLANE Task from the DevOps project in the UI.
Do notice you will need to make sure the clearml-serving is updated with a new session ID or remove it (i.e. take down the pods / docker-compose)
Make sense ?
Were you able to interact with the service that was spun up? (how was it spun up?)
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility 🙂
Please do 🙂
and I found our lab seems to only have shared user files, because I installed trains on one node but it doesn't appear on the others
Do you mean there is no shared filesystem among the different machines ?
Hi MagnificentSeaurchin79
This means the tensorflow was not directly imported in the repository (which is odd, it might point to the auto package analysis failing to find the package; if this is the case please let me know)
Regardless, if you need to make sure a package is listed in the requirements, either import it or use Task.add_requirements('tensorflow') or Task.add_requirements('tensorflow', '2.3.1')
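A minimal sketch (project/task names are placeholders):

from clearml import Task  # or: from trains import Task, on the older package

# call add_requirements before Task.init so the extra package is recorded
Task.add_requirements('tensorflow')  # or pin it: Task.add_requirements('tensorflow', '2.3.1')
task = Task.init(project_name='examples', task_name='requirements demo')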
(Do notice that even though you can spin two agents on the same GPU, the nvidia drivers cannot share allocated GPU memory, so if one Task consumes too much memory the other will not have enough free GPU memory to run)
Basically the same restriction as manually launching two processes using the same GPU
Sure thing! this feature is all you guys, ask and you shall receive 🙂
DepressedChimpanzee34 <character> will almost always be converted into \ because otherwise it will not support \t or \n etc.
What I'm looking here is some logic that will allow us not to break backwards compatibility on the one hand, but still will allow you to have something like "first\second" entry.
WDYT? any ideas? (I really want to make sure we fix it as soon as possible)
Hi @<1523701260895653888:profile|QuaintJellyfish58>
Based on the docs
None
I think this should have worked. Are you running the actual task_scheduler on your machine? on the services queue? what's the console output you see there?
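For reference, a bare-bones scheduler script usually looks something like this (the task id, queue and timing are placeholders):

from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# clone + enqueue an existing task every day at 07:30
scheduler.add_task(
    schedule_task_id='aabbcc',
    queue='default',
    minute=30,
    hour=7,
)
# the scheduler process itself keeps running, typically on the services queue
scheduler.start_remotely(queue='services')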
Why would you need to manually change the current run? you just provided the values with either default/command-line ?
what am I missing here?
ResponsiveHedgehong88 I'm not sure I stated it, but the argparser arguments and values are collected automatically from your current run and put on the Task, there is no need to manually set them if you have the argparser running on your machine. Basically it collects the current (i.e. the process running on your machine) settings, and "copies" them ...
Hi ResponsiveHedgehong88
With clearml-task the assumption is that you are using argparse. Does that make sense? You can also manually access it with task.get_parameters
https://clear.ml/docs/latest/docs/references/sdk/task#get_parameters
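Something along these lines (assuming the script already uses argparse; the names below are placeholders):

from clearml import Task

task = Task.init(project_name='examples', task_name='argparse demo')
# argparse values are picked up automatically when parse_args() runs;
# they can also be read back explicitly as a flat dict, e.g. {'Args/lr': '0.001'}
params = task.get_parameters()
print(params)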
Hi TrickySheep9
Could you post the pipeline code here?
Also which clearml version are you using ?
Hi RotundHedgehog76
I think it should work out of the box, I mean at the end both spin jupyter notebooks, which is what clearml interacts with. Are you getting any errors?
When I do the port forward on my own using ssh -L it also seems to fail for jupyterlab and vscode, too, which I find odd
The only external port exposed is the SSH one 10022, then the client forwards it locally (so you, the user, can always have the same connection, i.e. "ssh root@localhost -p 8022")
If you need to expose an additional port, then while the clearml-session is running, open another terminal and do:
ssh root@localhost -p 8022 -L 10123:localhost:6666
This should po...