Hi JitteryCoyote63, is there a callback for that?
FierceHamster54 are you saying that inside the container it took 20 min to run, or that spinning up the GCP instance until it registered as an Agent took 20 min?
Most of the time is taken by building wheels for numpy and pandas ...
BTW: This happens if there is a version mismatch and pip decides it needs to build numpy from source. Can you send the full logs of that? Maybe we can somehow avoid that?
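One way to dodge the source build is to pin versions that ship prebuilt wheels for your Python version, e.g. (the exact versions here are illustrative):

    from clearml import Task

    # pinned versions with prebuilt wheels, so pip downloads binaries
    # instead of compiling from source (pick versions matching your Python)
    Task.add_requirements("numpy", "==1.21.6")
    Task.add_requirements("pandas", "==1.3.5")
    task = Task.init(project_name="examples", task_name="pinned deps")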
Hi FiercePenguin76
So currently the idea is that you have full control over per-user credentials (i.e. stored locally). Agents (depending on how they are deployed) can have shared credentials (with AWS the easiest is to push them to the OS env)
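For example, on the machine running the agent, the standard AWS environment variables serve as the shared credentials (values omitted):

    AWS_ACCESS_KEY_ID=<access-key>
    AWS_SECRET_ACCESS_KEY=<secret-key>
    AWS_DEFAULT_REGION=<region>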
Since my deps are listed in the dependencies of my setup.py, I don't want clearml to list the dependencies of the current environment
Makes sense 🙂
Okay let me check regarding the "." in the venv cache.
JitteryCoyote63 I remember something with "!" in the name or maybe "/" in the name that might cause this behavior. May I suggest checking with clearml-server 1.3?
Won't it be too harsh to have a system-wide restriction like that?
do you have a video showing the use case for clearml-session
I totally think we should, I'll pass it along 🙂
what is the difference between vscode via clearml-session and vscode via the remote ssh extension?
Nice! Remote vscode is usually thought of as SSH: basically you have your vscode running on your machine, and using SSH vscode automatically connects to the remote machine.
Clearml-Session also adds a new capability, VSCode inside your browser, where the VSCode itself as well...
Hi SpotlessWorm70
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program.
This seems like an OpenMP issue
I would assume something is off with the local environment (not really connected to clearml, but to one of the frameworks, for example TF, Keras, etc.)
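The usual quick workaround (the one the OMP hint itself goes on to suggest) is to allow the duplicate runtime, e.g.:

    import os

    # must be set before importing the frameworks that load OpenMP (TF, Keras, etc.)
    # note: the OMP hint itself warns this is unsafe and may cause crashes or wrong results
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"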
NVIDIA_VISIBLE_DEVICES=0,1
Basically it is used "as is" and the Nvidia drivers do the rest
Same goes for all, or 0-3, etc.
Okay, let's PR this fix?
PompousParrot44
Check out task.execute_remotely()
You can call it right after Task.init(), and it will enqueue your running Task and leave the process (if you want).
https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/trains/task.py#L1437
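For example (queue name is just an illustration):

    from clearml import Task

    task = Task.init(project_name="examples", task_name="remote run")
    # enqueue this very Task on the "default" queue and exit the local process
    task.execute_remotely(queue_name="default", exit_process=True)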
Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal Pytorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when serving multiple models. As well as the ability to quickly update models from the clearml model repository (just tag + publish and the end...
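A rough sketch of the tag + publish side (the model id is a placeholder, and publish() on the model object is worth double-checking against your clearml version):

    from clearml import InputModel

    # placeholder id: pick the model from the clearml model repository
    model = InputModel(model_id="<model-id>")
    # publishing marks the model as ready, so the serving endpoint can pick it up
    model.publish()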
Hi RoughTiger69
but still get the semantics of knowing when an (external) file changed?
How would you know it changed?
This implies you have a way to verify the hash, which means you download the data, no?
I think CostlyOstrich36 managed to reproduce?!
MortifiedCrow63 , hmmm can you test with manual upload and verify?
(also what's the clearml version you are using)
Hi DangerousDragonfly8
You mean you want to trigger something when users archive a Task?
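If so, a rough sketch with clearml's TriggerScheduler (assuming archiving is surfaced as the "archived" system tag, and that the callback receives the task id; parameter names are worth double-checking against your version):

    from clearml.automation import TriggerScheduler

    def on_archive(task_id):
        # placeholder action: fire a webhook, send a message, etc.
        print(f"Task {task_id} was archived")

    trigger = TriggerScheduler(pooling_frequency_minutes=1.0)
    trigger.add_task_trigger(
        name="on-archive",              # illustrative name
        trigger_on_tags=["archived"],   # assumption: archiving == "archived" tag
        schedule_function=on_archive,
    )
    trigger.start()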
MagnificentSeaurchin79
"requirements.txt" is ignored if the Task has an "installed packges" section (i.e. not completely empty) Task.add_requirements('pandas') needs to be called before Task.init() (I'll make sure there is a warning if called after)
Hi UnevenDolphin73
This differentiable storage - does it only work on file additions/removal, or also on intra-file changes?
This is on a file level, meaning if you change a single byte in a file, the entire file will be packaged in the new version.
Make sense?
same: Not Found (#404)
May I suggest you DM it to me (so it is not public)?
In that case I suggest you turn on the venv cache; it will accelerate the conda environment building because it will cache the entire conda env.
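In clearml.conf on the agent's machine, something along these lines (the path shown is the stock default from the sample conf):

    agent {
      venvs_cache: {
        # set the path to enable the cache
        path: ~/.clearml/venvs-cache
      }
    }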
It seems there is some async behavior going on. After ending a run, this prompt just hangs for a long time:
2021-04-18 22:55:06,467 - clearml.Task - INFO - Waiting to finish uploads
And there's no sign of updates on the dashboard
Hmm, that could point to an issue uploading the last images (which are larger than regular scalars). Could you try flushing and waiting?
i.e. task.flush() then sleep(45)
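For example:

    import time

    task.flush()    # push any pending reports / uploads
    time.sleep(45)  # give the async uploads time to finish
    # (task.flush(wait_for_uploads=True) will block until uploads are done)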
Nice SubstantialElk6 !
BTW: you can configure your clearml client to store the changes from the latest pushed commit (and not the default, which is the latest local commit)
see store_code_diff_from_remote:
in clearml.conf:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/docs/clearml.conf#L150
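i.e. in clearml.conf:

    sdk {
      development {
        # diff against the last pushed commit instead of the latest local commit
        store_code_diff_from_remote: true
      }
    }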
Hi FloppyDeer99
What is the meaning of "no real scheduling"?
I think the meaning is that from the moment a k8s job is created, k8s is in charge of actually spinning the container. Since k8s has no real priority/order, the scheduling order is not guaranteed from this point.
The idea of the clearml-k8s-glue is that the glue will launch a job on the k8s cluster only if it is sure there are enough resources to actually spin the job now (as opposed to sometime in the future), this mea...
current task fetches the good Task
Assuming you fork the process, then the "global instance" is passed to the subprocess. Assuming the sub-process was spawned (e.g. Popen), then an environment variable with the Task's unique ID is passed. Then when you call Task.current_task it "knows" the Task was already created, and it will fetch the state from the clearml-server and create a new Task object for you to work with.
BTW: please use the latest RC (we fixed an issue with exactly this...
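i.e. in the child process, roughly:

    from clearml import Task

    # returns the Task created by the parent process, either via the forked
    # global instance or via the environment variable carrying the Task's id
    task = Task.current_task()
    task.get_logger().report_text("reporting from the subprocess")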
ExcitedFish86 this is a general "dummy agent" that pulls Tasks and executes them (no env created, no code cloned, as you suggested)
how does this work with HPO?
The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multiple-machine support. In order to overcome it, I posted a dummy agent that just runs the Tasks.
(Notice...
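For reference, the skeleton of such a dummy agent could look roughly like this (queue name is illustrative, and the queue API calls should be double-checked against your clearml version):

    import time
    from clearml import Task
    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    queue_id = client.queues.get_all(name="dummy")[0].id

    while True:
        # pull the next task id from the queue (empty result == nothing queued)
        res = client.queues.get_next_task(queue=queue_id)
        if not res or not res.entry:
            time.sleep(5.0)
            continue
        task = Task.get_task(task_id=res.entry.task)
        # "dummy" execution: run the task in the current environment,
        # no venv creation and no code cloning
        print(f"running task {task.id}")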