Hi HappyDove3
task.set_script is a great way to add the info (assuming the .git is missing)
Are you running it using PyCharm? (If so use the clearml pycharm plugin, it basically passes the info from your local git to the remote machine via OS environment)
Hi HollowPeacock58
parameters = task.connect(config, name='config_params')
It seems that your DotDict does not support the python copy operator?
i.e.
from copy import copy
copy(DotDict())
fails ?
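To illustrate the copy point: the DotDict from the question is not shown, so the class below is a hypothetical stand-in, not the actual implementation. A common cause of this kind of failure is a `__getattr__` that raises KeyError (or returns a value) for unknown names, which confuses the attribute lookups `copy` performs. A minimal sketch of a copy-friendly version, assuming a plain dict subclass:

```python
from copy import copy, deepcopy


class DotDict(dict):
    """Hypothetical dict with attribute access that stays copy-friendly."""

    def __getattr__(self, name):
        # Raise AttributeError (not KeyError) for missing names, so that
        # copy/pickle can probe for optional special methods safely.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


config = DotDict(lr=0.1, layers=[64, 32])
shallow = copy(config)
deep = deepcopy(config)

print(type(shallow).__name__, shallow.lr)
print(deep.layers is config.layers)
```

If `copy(DotDict())` works with a class shaped like this but not with yours, the difference in `__getattr__` is the first thing I would check.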
For the on-prem you can check the k8s helm charts, they can spin agents for you (static agents).
For the GKE the best solution is the k8s glue:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
For example, could you test if this one works:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
Hi SubstantialElk6 I'll start at the end, you can run your code directly on the remote GPU machine 🙂
See the clearml-task documentation on how to create a task from existing code and launch it
https://github.com/allegroai/clearml/blob/master/docs/clearml-task.md
That said, the idea is that you add the Task.init call when you are writing/coding the code itself, then later when you want to run it remotely you already have everything defined in the UI.
Make sense ?
LovelyHamster1 verified, this is a UI bug where an old limitation is still enforced.
I will make sure they know about it, it should be fixed for the upcoming release 🙂
SmallBluewhale13 the final path is automatically generated, you only need to specify the bucket itself. By default it will be your "files_server"
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/docs/clearml.conf#L10
You can either change the configuration (which will make sure all uploaded artifacts will always be there, including debug images etc.)
You can specify where you want the artifacts and debug images to be uploaded by setting:
https://allegro....
Hi TrickySheep9
Long story short, clearml-session fully supports k8s (using k8s glue)
The --remote-gateway alongside ports mode will basically allow you to set up a k8s service so that every session will register with a specific port, so k8s does the ingress for you and routes the SSH connection to the pod itself; everything else is tunneled over the original SSH connection.
Make sense ?
It should have worked....
Can you run the examples from the repo and see if they work?
Hmm are you running from inside the Kaggle jupyter thing ?
I think that what happened was you are running it on the host machine (not inside the docker)
I probably missed a " somewhere
Thanks GrievingTurkey78 !
It seems that under the hood they use argparse
See here:
https://github.com/google/python-fire/blob/c507c093fa6622ab5efee21709ffbf25974e4cf7/fire/parser.py
Which means it might just work?!
What do you think?
I could take a look and figure that out.
This will greatly accelerate integration 😉
PompousParrot44
It should still create a new venv, but inherit the packages from the system-wide (or specific venv) installed packages. Meaning it will not reinstall packages you already installed, but it will give you the option of just replacing a specific package (or installing a new one) without reinstalling the entire venv
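For reference, the general mechanism of a venv that inherits already-installed packages exists in the Python standard library. The sketch below uses `venv.EnvBuilder` directly; it is only an illustration of the idea, and it is an assumption that the agent behaves the same way internally. The directory name `task_env` is made up for the example.

```python
import tempfile
import venv
from pathlib import Path


def create_inheriting_venv(target: Path) -> Path:
    """Create a venv that can see the packages of the interpreter it came from."""
    # system_site_packages=True keeps the parent site-packages importable;
    # with_pip=False keeps the sketch fast and free of network access.
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(target)
    return target / "pyvenv.cfg"


with tempfile.TemporaryDirectory() as tmp:
    cfg_text = create_inheriting_venv(Path(tmp) / "task_env").read_text()
    inherits = "include-system-site-packages = true" in cfg_text

print(inherits)
```

Anything installed into such a venv shadows the inherited copy, which is what lets you replace a single package without rebuilding everything.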
Hi PompousParrot44
Let's stick with a single question per thread, it will make my life a lot easier 🙂
What do you mean by "and not in the terminal directly when executed manually through script"?
trains-agent is (usually) executed as a daemon, pulling jobs and executing them.
The other option is to use it to manually execute a single task.
What am I missing?
Hi CleanPigeon16
I was wondering how (or if) you handle interruptions.
Good question, basically (and I might be missing a few details but I think that's the general gist).
A new instance will be spun up (spot/regular based on your "compute budget") as long as there is a job in the "monitored" queue. That means that if a worker was kicked by amazon (i.e. it is a spot instance) another one will be spun up instead, as long as there is a job in the queue. That means that what is probably missing in you...
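The spin-up rule described above can be sketched as a single polling pass. This is a simplified illustration, not the actual autoscaler code; `Instance`, `instances_to_spin` and `max_instances` are hypothetical names, and the cap on instances stands in for the "compute budget".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    spot: bool
    alive: bool = True  # False once the cloud provider reclaimed the machine


def instances_to_spin(pending_jobs: int, instances: List[Instance], max_instances: int) -> int:
    """Return how many new instances one polling pass should start."""
    if pending_jobs <= 0:
        return 0  # nothing waiting in the monitored queue
    alive = sum(1 for inst in instances if inst.alive)
    budget_left = max(0, max_instances - alive)
    return min(pending_jobs, budget_left)


# A reclaimed spot worker no longer counts, so a replacement is started
# as long as a job is still waiting in the queue.
fleet = [Instance("spot-1", spot=True, alive=False)]
print(instances_to_spin(pending_jobs=1, instances=fleet, max_instances=2))
```

Note that this only covers starting machines; it says nothing about what happens to the job that was running on the reclaimed worker.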
Hi DisgustedDove53
When you say "deployment" there are a lot of ways to interpret that 🙂 what exactly are you looking for ?
Sorry found the code on the Task, duh 🙂
# get_ipython().magic('pip install clearml')
import clearml
from clearml import Task
task = Task.init(project_name='examples', task_name='test param', reuse_last_task_id=False)
param = {
    'tuple_double_quotes_r': (r"value\blah", 1),
    'tuple_double_quotes': ("value\blah", 1),
    'tuple_single_quotes': ('value\blah', 1),
    "double_quotes_r": r"value\blah",
    'double_quotes': "value\blah",
    'single_quotes': 'value\blah'
...
VirtuousFish83 is the exit(1) called from the main process or a subprocess? Are you running it with an agent?
Hmm DepressedChimpanzee34 my bad, it seems the loading is done via the YAML loader, but the dumping is straightforward str casting...
https://github.com/allegroai/clearml/blob/6e6271fb91f2aeb2aa7a13c6d07d4e635baaa670/clearml/backend_interface/task/task.py#L934
What would you expect to get? (BTW "value\blah" is probably not the string you meant: in Python \b is the backspace escape character, so it should be "value\\blah" or r"value\blah", which translates into the text "value\blah")
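To make the backslash point concrete, here is a small stdlib-only check (the variable names are just for illustration). It shows how Python parses the three spellings, and what a plain str() cast of a tuple does to the backslash:

```python
plain = "value\blah"      # "\b" is parsed as one backspace character (0x08)
escaped = "value\\blah"   # "\\" is parsed as one literal backslash
raw = r"value\blah"       # raw string: the backslash is kept as-is

print(len(plain), len(escaped))   # 9 10
print(escaped == raw)             # True
print("\x08" in plain)            # True

# str() on a tuple uses repr() for its items, so the backslash shows up doubled
as_text = str((raw, 1))
print(as_text)                    # ('value\\blah', 1)
```

So the same value can look different depending on whether it went through str() on its own or as part of a container, which matters when the text is parsed back later.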
maybe worth updating the main Readme.md in the github.. if someone tries to follow the instructions there it breaks
Hmm, I thought we already did. Yes, you are absolutely correct, I'll make sure we do
Thanks SubstantialElk6 !
I believe an initial fix was pushed 😉 A full one (merging Task --env with k8s template) will be added soon
Would be very cool if you could include this use case!
I totally think we should, any chance you can open an Issue, so this feature is not lost?
Hmm that sounds like the agent needs to access a vault with credentials per user, unfortunately this is not covered in the open-source 😞 I "think" this is supported in the enterprise version as part of the permission management
SpotlessFish46
1. Yes, you can access the entire code in the uncommitted changes, you can test it with:
task = Task.get_task(task_id='aabb')
task_dict = task.export_task()
2. Correct, but then if you need the entire code base you need to clone the repo and apply the uncommitted changes. Basically trains-agent does that when executed with build:
trains-agent build --id aabb --target ~/my_task_env
3. See (2)