
OmegaConf is the configuration; the overrides are in the Hyperparameters "Hydra" section
No worries, and I hope you manage to get that backup.
GentleSwallow91 notice this part:
Hi Martin. Sorry - missed your reply.
Yeap I am aware that docker_internal_mounts is inside agent section.
'-v', '/tmp/ssh-XXXXXXnfYTo5/agent.8946:/tmp/ssh-XXXXXXnfYTo5/agent.8946', '-e', 'SSH_AUTH_SOCK=/tmp/ssh-XXXXXXnfYTo5/agent.8946',
It is creating a copy of the ssh folder and setting the SSH_AUTH_SOCK env to it. You can just map the entire ssh folder automatically by un-setting SSH_AUTH_SOCK before running the agent:
SSH_AUTH_SOCK= clearml-agent ...
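If the agent is launched from a Python wrapper rather than a shell, the same idea can be sketched as below. This is only an illustration: the helper name `env_without_ssh_auth_sock` is hypothetical, and it removes the variable from the child environment, which is slightly different from the shell form above that sets it to an empty value.

```python
import os


def env_without_ssh_auth_sock(env=None):
    """Return a copy of the environment with SSH_AUTH_SOCK removed.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    cleaned = dict(os.environ if env is None else env)
    # Drop the variable if present; do nothing if it is not set.
    cleaned.pop("SSH_AUTH_SOCK", None)
    return cleaned


# Example launch (commented out; the queue name "default" is an assumption):
# import subprocess
# subprocess.run(
#     ["clearml-agent", "daemon", "--queue", "default", "--docker"],
#     env=env_without_ssh_auth_sock(),
# )
```

The original environment of the calling process is left untouched; only the child process sees the cleaned copy.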
Can you share the modified help/yaml ?
Did you run any specific migration script after the upgrade ?
How many apiserver instances do you have ?
How did you configure the elastic container? is it booting?
And other question is clearml-serving ready for serious use?
Define serious use? KFserving support is in the pipeline, if that helps.
Notice that clearml-serving is basically a control plane for the serving engine. Not to neglect its importance, but the heavy lifting is done by Triton (or any other backend we will integrate with, maybe Seldon)
But pytorch has no specific backend, it uses TB.
No?! Can you point me to an example? What I mostly find is how to calc metrics not standard way to then store them...
task.set_script(working_dir=dir, entry_point="my_script.py")
Why do you have this part? isn't it the same code, the script entry point is auto detected ?
... or when I run my_script.py locally (in order to create and enqueue the task)?
the latter, When the script is running locally
So something like
os.path.join(os.path.dirname(__file__), "requirements.txt")
is the right way?
Sure, this will work
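For reference, a minimal sketch of resolving a file that sits next to the running script. The function name `path_next_to` is hypothetical; in a real script you would pass `__file__` as the first argument.

```python
import os


def path_next_to(script_file, filename="requirements.txt"):
    """Build the path of `filename` located in the same folder as `script_file`."""
    # abspath guards against a relative script path when the working directory differs.
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, filename)


# Inside my_script.py this would typically be called as:
# requirements_path = path_next_to(__file__)
```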
Then, as you suggested, I would just use sys.path. It is probably the easiest and actually very safe (because the subfolders are always next to the "main" source code)
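A rough sketch of the sys.path approach, assuming the subfolders live next to the main script. The helper name and the folder name `utils` are made up for illustration.

```python
import os
import sys


def add_sibling_folders_to_path(script_file, subfolders):
    """Prepend folders located next to `script_file` to sys.path.

    Returns the list of paths that were actually added.
    """
    base = os.path.dirname(os.path.abspath(script_file))
    added = []
    for name in subfolders:
        candidate = os.path.join(base, name)
        # Skip entries that are already importable to avoid duplicates.
        if candidate not in sys.path:
            sys.path.insert(0, candidate)
            added.append(candidate)
    return added


# In the main script (hypothetical folder name):
# add_sibling_folders_to_path(__file__, ["utils"])
```

Because the folders are resolved relative to the script itself, the same call works locally and on the remote machine, as long as the repository layout is the same.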
FYI: if you need to query stuff you can always look directly in the RestAPI:
https://github.com/allegroai/clearml/blob/master/clearml/backend_api/services/v2_9/projects.py
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html
Hmm yes, that is a good point, maybe we should allow to specify a parameter on the model configuration to help with the actual type ...
JitteryCoyote63 Great to hear
BTW:
Would it be possible to extend Task.init with a force_reuse that would enforce reusing these tasks
You can pass continue_last_task=True
I think it should be equivalent to what you suggest
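A minimal sketch of what that call could look like. The project and task names are placeholders, and the helper `task_init_kwargs` is hypothetical; whether this matches the requested force_reuse behavior exactly is, as said above, an assumption.

```python
def task_init_kwargs(project_name, task_name):
    """Collect the arguments for Task.init, asking it to continue the last task."""
    return {
        "project_name": project_name,
        "task_name": task_name,
        # Continue the previously executed Task instead of creating a new one.
        "continue_last_task": True,
    }


def init_continuing_last_task(project_name, task_name):
    # Imported lazily so this sketch can be read and loaded without clearml installed.
    from clearml import Task

    return Task.init(**task_init_kwargs(project_name, task_name))


# Example (placeholder names):
# task = init_continuing_last_task("my_project", "my_task")
```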
Hi FiercePenguin76
Here's my workaround - ignore the fail messages, and manually create an SSH connection to the server with Jupyter port forwarded.
You are correct, clearml-session assumes it can SSH into the remote agent machine, from that point it automatically tunnels all other connections on top of the original SSH (well with some fancy SSH keep-alive proxy).
I'm assuming that from home you cannot connect to the SSH machine at the office, which makes sense, but out of curiosity...
Hi LovelyHamster1
You mean when as a section name or a variable?
Could you change this example to include a variable that breaks the support ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/.clearml/cache/storage_manager/datasets/.lock.000.ds_38e9acc8d56441999e806815abddee82.clearml'
Let me check this issue, it seems like the locking mechanism should have figured that there is no lock...
Hi RobustHippopotamus53
The way "latest from branch" works:
On the Task you specify the branch name (e.g. "master", no need to add the origin/ prefix).
The agent then pulls the latest commit from that branch and updates the Task back to the current commit ID (the latest on the branch at the time of execution).
This process ensures reproducibility and traceability, as we can always be certain of the exact commit that was executed.
Could it be that you "forced-push" a commit/squash, hence the "origina...
so you have a repo with poetry that some users update and some do not?
All working on the same branch ?
These paths are pathlib.Path. Would that be a problem?
No need to worry, it should work (I'm assuming "/src/clearml_evaluation/" actually exists on the remote machine, otherwise it is useless)
ElegantCoyote26 what you are after is:
docker run -v ~/clearml.conf:/root/clearml.conf -p 9501:8085
Notice the internal port (i.e. inside the docker is 8080, but the external one is changed to 9501)
Task.create will create a new Task (and return an object) but it does not do any auto-magic (like logging the console, tensorboard etc.)
But from the log it seems that:
you are not running as root in the docker?
Python3.8 is installed (and not python 3.6 as before)
CleanWhale17 nice ...
So the answer is Trains supports the Pipeline / Automation of it, but lacks that dataset integration (that is basically up to you to manage, with either artifacts or any other method)
The Allegro Enterprise allows you to rerun the code, on a new version of the dataset from the UI (or automation) without changing a single line of code
Hi SubstantialElk6
ClearML-Data doesn't actually "load" the data, it brings it locally and returns a folder with all your data files. From that point onward, it's up to your code to load it into the framework. Make sense?
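A small sketch of that flow: fetch the local folder, then let your own code decide how to read the files. The dataset project and name are placeholders, and both helper names are hypothetical; only `Dataset.get` and `get_local_copy` come from the clearml package.

```python
import os


def fetch_dataset_folder(dataset_project, dataset_name):
    """Download (or reuse the cached copy of) a dataset and return its local folder."""
    # Imported lazily so the rest of the sketch works without clearml installed.
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
    return dataset.get_local_copy()


def list_data_files(folder):
    """Walk the local folder and return every file path, sorted."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


# Example (placeholder names); loading the files is up to your framework:
# folder = fetch_dataset_folder("my_project", "my_dataset")
# for path in list_data_files(folder):
#     ...
```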
do you have your Task.init call inside the "train.py" script? (and if you do, what are you getting in the Execution tab of the task)?
SarcasticSparrow10 sure see "execute_remotely" it does exactly that:
https://allegro.ai/docs/task.html#trains.task.Task.execute_remotely
It will stop the current process (after syncing everything) and launch itself remotely (i.e. enqueue itself)
When the same code is run by the "trains-agent", the execute_remotely call becomes a no-operation and is basically skipped
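A hedged sketch of the pattern described above. It uses the `clearml` package name (the thread refers to the older `trains` package), and the project, task, and queue names are placeholders.

```python
def run_or_enqueue(project_name, task_name, queue_name):
    """Create the Task locally, then hand execution over to an agent queue.

    When run locally, execute_remotely enqueues the Task and exits the process.
    When run by the agent, the call is skipped and execution continues.
    """
    # Imported lazily so the sketch can be loaded without clearml installed.
    from clearml import Task

    task = Task.init(project_name=project_name, task_name=task_name)
    task.execute_remotely(queue_name=queue_name, exit_process=True)
    return task


# Example (placeholder names); code after this call runs only on the agent:
# task = run_or_enqueue("my_project", "my_task", "default")
```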
Hi ObedientTurkey46
Use --services-mode in the agent; it will run many Tasks on the same machine. This is usually associated with the services queue, but can be run on any queue. This way you could have the same machine easily running those multiple "control" tasks.
wdyt?
you can also specify additional packages on the decorator:
@PipelineDecorator.component(..., packages=["tqdm>=2.1", "scikit-learn"])
def step_one(...):
    # code here