
Reputation
Badges 1
25 × Eureka!PompousBeetle71 Could you check with 0.14.3 that just released?
MysteriousBee56 I see...
So yes, you can with the APIClient you have full RESTful access to the backend.
I think there was a similar discussion https://allegroai-trains.slack.com/archives/CTK20V944/p1593524144116300
HandsomeCrow5 how did you end up solving it? I think you had a similar use case?!
Kind of as it tries to do "apt-get install"...
what did you have in mind ?
Thanks @<1523704157695905792:profile|VivaciousBadger56> ! great work on the docstring, I also really like the extended example. Let me make sure someone merges it
, I generate some more graphs with a file calledΒ
graphs.py
Β and want to attach/upload to this training task
Make total sense to use Task.get_task, I just want to make sure that you are aware of all the options, so you pick the correct one for you :)
I will probably just use everywhere an absolute path to be robust against different machine user accounts: /home/user/trains.conf
That sounds like good practice
Other than the wrong, trains.conf, I can't think of anything else... Well maybe if you have AWS environment variables with credentials ? they will override the conf file
Thank you! π
Finally managed; you keep saying "all projects" but you meant the "All Experiments" project instead. That's a good startΒ
Β Thanks!
Yes, my apologies you are correct: "all experiments"
I did not start with python -m, as a module. I'll try that
I do not think this is the issue.
It sounds like anything you do on your specific setup will end with the same error, which might point to a problem with the git/folder ?
Oh no, you are absolutely correct, it is broken (I mean I have no idea why it lists Hydra, or how it got there). I will let the guys know and fix it.
Bottom line, after you clone it, please edit the installed packages and remove the "Hydra" line and replace with just "hydra-core" (no need for version).
The format is the same as "requirements.txt" and will effect the venv created by the agent
Hi CloudySwallow27
how can I just "define" it on my local PC, but not actually run it.
You can use the clearml-task
CLI
https://clear.ml/docs/latest/docs/apps/clearml_task#how-does-clearml-task-work
Or you can add the following line in your code, that will cause the execution to stop, and to continue on a remote machine (basically creating the Task and pushing it into an execution queue, or just aborting it)task = Task.init(...) task.execute_remotely()
https://clear.ml/do...
Hi @<1630377234361487360:profile|RoughSeaturtle43>
code from gitlab repo with ssl cert.
what do you mean by ssl secret? is it SSH or app-token ?
Is it possible to substitute these steps using containers instead.
I'm not sure I follow, could you expand ?
So the way it will work, is you will also need to have a Task.init in main process (the one using the launch function) and the same Task.init in the main_func. What it does is it signals the sub processes to use the main process task. This way they all report to the same task. Obviously to test it you will need to wait for the RC (after the weekend :)
JitteryCoyote63
IAM role to the web app could access
you mean the web client key/secret to access S3 data ?
Any idea why the Pipeline Controller is Running despite the task passing?
What do you mean by "the task passing"
Hi VexedElephant56
Yes it is:
Define CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
(if running in doecker mode add -e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 as container args)
https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_env_var
I could improve the cost-efficiency of my provisionned GCP A100 instances
But their pricing is linear, if you do not need a100 get a cheaper instance ?! no?
TartSeal39 please let me know if it works, conda is a strange beast and we do our best to tame it.
Specifically when you execute manually on a conda env we collect (separately) the conda packages & the python packages (so later we can replicate on both conda & pip, or at least do our best)
Are you running both development env and agent with conda ?
I guess itβs on me to check whether this slowdown is negligible or not
Usually performance is negligible, especially with GPU
But if you really want the best:
Add --security-opt seccomp=unconfined
to the extra_docker_arguments
See detials:
https://betterprogramming.pub/faster-python-in-docker-d1a71a9b9917
And it is not working ? what's the Working Dir you have under the Execution Tab ?
Is
mark_completed
used to complete a task from a different process and
close
from the same process - is that the idea?
Yes
However, when I tried them out,
mark_completed
terminated the process that called
mark_completed
.
Yes if you are changing the state of the Task externally or internally the SDK will kill the process. If you are calling task.close()
from the process that created the Task it will gra...
my experiment logic
you mean the actual code doing the training ?
so that it gets lazily executed and not at task definition time
Task definition time -> when creating the Pipeline Task? remember the base_task_factory a the end creates a Task object (it does not run the code itslef).
BTW: if you have simple training logic you can use pipeline decorators , it might be a better fit?
https://clear.ml/docs/latest/docs/fundamentals/pipelines#pipeline-from-function-decorator
MoodyCentipede68 seems you did not pass any configuration (os env or conf file) so it does nor know how to find the server and authenticate. Make sense?
@<1542316991337992192:profile|AverageMoth57> it sounds like you should use SSH authentication for the agent, just setforce_git_ssh_protocol: true
None
And make sure you have the SSH kets on the agent's machine
So I checked the code, and the Pipeline constructor internally calls Task.init, that means that after you constructs the pipeline object, Task.current_task() should return a valid object....
let me know what you find out
Hi SarcasticSquirrel56
But if I then clone the task, and execute it by sending it to a queue, the experiment succeeds,
I'm assuming that on the remote machine the "files_server" is not configured the same way as the local execution. for example it points to an S3 bucket the credentials for the bucket are missing.
(in your specific example I'm assuming that the plot is non-interactive which means this is actually a PNG stored somewhere, usually the file-server configuration). Does tha...