The experiment finished completely this time again
With the RC version or the latest?
Try adding this environment variable:
export TRAINS_CUDA_VERSION=0
You can check the Keras example; run it twice. On the second run it will continue from the previous checkpoint, and you will have both an input and an output model.
https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py
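Roughly, the relevant pattern in that example looks like this (a minimal sketch from memory, not the exact code; the checkpoint file name is illustrative):
import os
from clearml import Task
from tensorflow import keras

task = Task.init(project_name="examples", task_name="keras tensorboard example")

model = keras.Sequential([keras.layers.Dense(10, activation="softmax", input_shape=(784,))])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# on the second run this checkpoint already exists, so it is loaded and
# ClearML registers it as the task's input model
if os.path.exists("weight.hdf5"):
    model.load_weights("weight.hdf5")

# ... model.fit(...) ...

# saving the weights is what ClearML picks up as the output model
model.save_weights("weight.hdf5")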
Notice that you have to have the task already started by the master process.
Then the dynamic GPU allocation is exactly what you need. I suggest talking to the sales people; I'm sure they can help. https://clear.ml/contact-us/
I see. By default it will look for a requirements.txt in the root of the repo (the actual repo).
That said, in code you can specify the requirements.txt:
from clearml import Task
Task.force_requirements_env_freeze(requirements_file='repo/project-a/requirements.txt')
task = Task.init(...)
Notice that you need to call it prior to the Task.init call.
Oh
Task.get_project_object().default_output_destination = None
This has no effect on the backend, meaning it does not actually change the value. Instead use the APIClient:
from clearml.backend_api.session.client import APIClient
c = APIClient()
c.projects.update(project="<project_id_here>", default_output_destination="s3://")
BTW, how / what is it used for in your workflow?
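If you need to look up the project ID by name first, something like this should work (a sketch; the project name here is a placeholder, and note that projects.get_all matches the name as a pattern):
from clearml.backend_api.session.client import APIClient

client = APIClient()
# take the first match returned for the project name
project = client.projects.get_all(name="MyProject")[0]
client.projects.update(project=project.id, default_output_destination="s3://")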
There are also "completed", "aborted", and "queued".
Archived is actually a tag (a system tag, not a user tag). There is a "state machine" for moving from one state to another. The special case is "published", which we probably should have called "locked": if a Task/Model is published, you cannot reset it (and even deleting it requires the force flag).
I would use additional user tags (or even system tags) to mark the "deployed" state, wdyt?
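Something along these lines (just a sketch, the "deployed" tag name is only an example):
from clearml import Task

task = Task.get_task(task_id="<task_id_here>")
task.add_tags(["deployed"])

# later on, list everything that was marked as deployed
deployed_tasks = Task.get_tasks(tags=["deployed"])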
Hi PlainSeaurchin97
You mean instead of the parallel coordinates?
Added -v /home/uname/.ssh:/root/.ssh and it resolved the issue. I assume this is some sort of a bug then?
That is supposed to be automatically mounted. Having SSH_AUTH_SOCK defined means that you have to add a mount for the SSH_AUTH_SOCK socket so that the container can access it.
Try running with SSH_AUTH_SOCK undefined and keep force_git_ssh_protocol (no need to manually add the .ssh mount; it will do that for you).
Hi PompousBeetle71, what exactly is the scenario / problem we are trying to solve?
oh dear ...
ScrawnyLion96 let me check with the front-end guys 😞
SmarmySeaurchin8
# read the current tags, remove the one we no longer want,
# then re-assign the list so the change is stored on the task
updated_tags = task.tags
updated_tags.remove(tag)
task.tags = updated_tags
SoggyBeetle95 maybe it makes sense to configure the agent with access-all credentials? Wdyt
My bad, I worded my question wrong, I see.
LOL no worries 🙂
Any chance you have some "debug" leftover in the Pipeline code:
https://github.com/allegroai/clearml/blob/7016138c849a4f8d0b4d296b319e0b23a1b7bd9e/examples/pipeline/pipeline_from_decorator.py#L113
Maybe we should show a warning when it is being called, or ignore it when running via an agent ...
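If I remember correctly, the "debug" leftover in that example is the local-execution override at the bottom of the script, something along the lines of:
from clearml.automation.controller import PipelineDecorator

# forces the pipeline (and all of its steps) to run in the local process
# instead of being launched on agents
PipelineDecorator.run_locally()

# executing_pipeline(...)  # the decorated pipeline function is then called as usual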
DrabCockroach54 that is quite cool!
Basically here is what I would do
Query Tasks that are both Running and do not have the system tag "development" (that means they are running on agents), filtered to tasks that started, say, at least 10 minutes ago.
Then go over the list and check whether (1) they have a GPU scalar reported (meaning the GPU is accessible) and (2) the min/max/value of the GPU utilization is under 5%.
wdyt?
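Something like this sketch (the filter keys, the ":monitor:gpu" title and the "utilization" series names are assumptions on my side, verify them against what your tasks actually report):
from datetime import datetime, timedelta, timezone
from clearml import Task

# running tasks that do NOT carry the "development" system tag, i.e. executed by agents
tasks = Task.get_tasks(
    task_filter={
        "status": ["in_progress"],
        "system_tags": ["-development"],
    }
)

cutoff = datetime.now(timezone.utc) - timedelta(minutes=10)
for t in tasks:
    # depending on the server version, `started` may be naive or timezone-aware
    started = t.data.started
    if started and started > cutoff:
        # started less than ~10 minutes ago, skip it
        continue
    metrics = t.get_last_scalar_metrics()  # {title: {series: {"last": ..., "min": ..., "max": ...}}}
    gpu = metrics.get(":monitor:gpu", {})
    if not gpu:
        print(f"{t.id}: no GPU scalars reported (GPU probably not accessible)")
        continue
    utilization = [v for series, v in gpu.items() if "utilization" in series]
    if utilization and all(v.get("max", 0) < 5 for v in utilization):
        print(f"{t.id}: GPU utilization under 5%")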
Oh, I was assuming you were passing the entire DB backup to the cloud.
Are you saying you just want the file server on the cloud? If that is the case, I would just use S3.
So this can be translated to:
CLEARML__SDK__AZURE__STORAGE__CONTAINERS__0__ACCOUNT_NAME=abcd
(the double underscores map to the nesting in clearml.conf, i.e. sdk.azure.storage.containers.0.account_name)
import os
from trains import Task

os.environ['TRAINS_PROC_MASTER_ID'] = '1:da0606f2e6fb40f692f5c885f807902a'
os.environ['OMPI_COMM_WORLD_NODE_RANK'] = '1'
task = Task.init(project_name="examples", task_name="Manual reporting")
print(type(task))
Should be: <class 'trains.task.Task'>
What sort of data would be stored in the venvs-build folder?
ClumsyElephant70 a temporary (lifetime of the task execution) virtual environment, including the code etc. It is deleted and recreated for every new task launched (or restored from cache, if venvs_cache is enabled).
Generally speaking the agent will convert the repo URL to the auth scheme it is configured with: ssh->http if using user/pass, and http->ssh if using SSH.
JitteryCoyote63 I remember something with "!" in the name or maybe "/" in the name that might cause this behavior. May I suggest checking with clearml-server 1.3?
Are you running the agent in docker mode or venv mode?
Can you manually SSH on port 10022 to the remote agent's machine?
ssh -p 10022 root@agent_ip_here