You mean, is one solution better than combining/maintaining/automating 3+ solutions (dvc/lakefs + mlflow + kubeflow/airflow)?
Yes I'd say it is. BTW if you have airflow running for other automations you can very easily combine the automation with clearml and have a single airflow automation for everything, but the main difference now is that airflow only launches logic, never actual compute/data (which are launched and scaled via clearml)
Does that make sense?
Not sure I follow, you mean to launch it on the kubernetes cluster from the ClearML UI?
(like the clearml-k8s-glue ?)
and of course if your docker has packages preinstalled they are automatically used (not reinstalled)
how can you be snyk and lower than 0.96
Yep Snyk auto "patching" is great 🙂
as I mentioned wait for the GH sync tomorrow, a few more things are missing there
In the meantime you can just do ">= 0.109.1"
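In case it helps, this is roughly what that pin looks like in a requirements.txt (the package name below is a placeholder, since the flagged dependency isn't named here):
# requirements.txt - hypothetical entry for the Snyk-flagged dependency
some-flagged-package >= 0.109.1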
We are working hard on release 1.7; once that is out we will push an RC for review (I hope) 🙂
GiganticTurtle0 I'm not sure I follow, what do you mean by indexing the arguments? Can you post a short usage example ?
Hi @<1556450111259676672:profile|PlainSeaurchin97>
While testing the migration, we found that all of our models had their MODEL URL set to the IP of the old server.
Yes, all the artifacts/models/debug-samples are stored "as is", which means that if you configured your original setup with an IP, it is kind of stuck there; this is why it is always preferred to use a host-name ...
you apparently also need to rename all the model URLs
Yes 😞
Back to the feature request: if this is taken care of (both adding a missed package, and the S3 upload), do you still believe there is room for this kind of feature?
Then the dynamic gpu allocation is exactly what you need, I suggest talking to the sales ppl, I'm sure they can help. https://clear.ml/contact-us/
Test it on your local setup (I would hate to push a broken fix)
Is that possible?
Our datasets are more than 1TB in size and will grow in size (probably 4TB and up), this means we also need 4TB local storage
Yes, because somewhere you will have to store your unzipped files.
Or you point to the S3 bucket, and fetch the data when you need to access it (or prefetch it) with the S3 links the Dataset stores, i.e. only when accessed
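If it helps, here is a rough sketch of that link-based flow with the clearml SDK (the dataset/bucket names below are made up); add_external_files registers the S3 links without copying the data, and the files are only downloaded when the dataset is accessed:
from clearml import Dataset

# register the S3 objects as external links - only the links are stored, not the data
ds = Dataset.create(dataset_name="big-dataset", dataset_project="data")
ds.add_external_files(source_url="s3://my-bucket/training-data/")
ds.upload()    # uploads the dataset state/links; the files themselves stay in S3
ds.finalize()

# later, fetch (or prefetch) the actual files only when the dataset is accessed
local_path = Dataset.get(dataset_name="big-dataset", dataset_project="data").get_local_copy()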
Hi CrookedWalrus33
docker_setup_bash_script=["export PATH=/workspace/miniconda/bin:$PATH"]
Oh I think you are correct, this should do the trick:
docker_setup_bash_script=["export PATH=/workspace/miniconda/bin:$PATH", "export LOCAL_PYTHON=/workspace/miniconda/bin/python3"]
This will make sure both the agent and the script execute with the same python
but to run a script inside a docker which already has the environment built in.
If this is already activated, the latest agent w...
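For reference, a minimal sketch of wiring that up from the SDK (the project/task/image names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="conda-in-docker")
task.set_base_docker(
    docker_image="my-registry/miniconda-image:latest",  # hypothetical image with miniconda preinstalled
    docker_setup_bash_script=[
        "export PATH=/workspace/miniconda/bin:$PATH",
        "export LOCAL_PYTHON=/workspace/miniconda/bin/python3",
    ],
)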
SoggyBeetle95 the question is, where does clearml store these arguments, and the answer is: on the Task object (from there the agent will take them and apply them to the docker execution). Now since all users see all the tasks, they also see these arguments. Wdyt?
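One possible workaround, assuming the agent machine itself is trusted: keep the sensitive values in the agent's own clearml.conf instead of on the Task, e.g. via extra_docker_arguments, so they are injected at docker launch but never stored on the Task object (values below are placeholders):
# ~/clearml.conf on the agent machine
agent {
    extra_docker_arguments: ["-e", "MY_SECRET=s3cr3t"]
}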
BTW, how can I run 'execute_orchestrator' concurrently?
It is launching simultaneously, i.e. if you are not processing the output of the pipeline step function, the execution will not wait for its completion. Notice that the call itself might take a few seconds, as it creates a task and enqueues/sub-processes it, but it is Not waiting for it
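A rough sketch of that behavior with decorator-based pipelines (all names here are made up): both step calls below are dispatched immediately and run concurrently, and the pipeline logic only blocks when a step's return value is actually used:
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def step(x):
    return x * 2

@PipelineDecorator.pipeline(name="demo", project="examples", version="0.1")
def pipeline_logic():
    a = step(1)   # returns immediately, the step is enqueued/launched
    b = step(2)   # also launched right away, runs concurrently with the first
    print(a + b)  # using the results is what waits for their completion

if __name__ == "__main__":
    pipeline_logic()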
Hi @<1784754456546512896:profile|ConfusedSealion46>
clearml server takes up a lot of memory, especially for elasticsearch
Yeah, that depends on how many metrics/logs you have there, but you really need at least 8GB of RAM
delete old experiments ?
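Beyond deleting old experiments, one knob worth checking (assuming the default clearml-server docker-compose; the heap values here are just illustrative) is the Elasticsearch JVM heap:
# docker-compose.yml, elasticsearch service
environment:
  ES_JAVA_OPTS: "-Xms2g -Xmx2g"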
Hmm EmbarrassedPeacock82
Let's try with
--input-size -1 60 1 --aux-config input.format=FORMAT_NCHW
BTW: this seems like a triton LSTM configuration issue, we might want to move the discussion to the Triton server issue, wdyt?
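For context, a sketch of where those flags would sit in a clearml-serving call (the service/model ids and the endpoint/tensor names are placeholders, and the exact flag set may vary between versions):
clearml-serving --id <service_id> model add \
    --engine triton \
    --endpoint "lstm_model" \
    --model-id <model_id> \
    --input-size -1 60 1 --input-name "INPUT__0" --input-type float32 \
    --output-size -1 1 --output-name "OUTPUT__0" --output-type float32 \
    --aux-config input.format=FORMAT_NCHW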
Sure thing, feel free to ping 🙂
Think multiple hyper-parameter sections that we need to reference
(under the Tasks Configuration Tab, the Hyper parameters can have multiple sections)
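A small sketch of how those sections are created from code (the section and key names are made up), using task.connect with a name per section:
from clearml import Task

task = Task.init(project_name="examples", task_name="multi-section-hparams")
task.connect({"lr": 0.001, "batch_size": 32}, name="Training")   # one hyper-parameter section
task.connect({"threshold": 0.5}, name="Inference")               # a second, separate section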
No, after. Do you see the poetry lock removed in the uncommitted changes?
NastySeahorse61 I would try to open in incognito mode (i.e. no cookies etc.), did you also change the address of the server?
So basically development on a "shared" GPU?
but I cannot compare between them
I think we noticed it, and this will be fixed in the next server update (again, some plotly.js issue there)
Hmm I see, add this for example
extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]
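That entry goes in the agent section of the clearml.conf on the machine running the agent; a minimal sketch:
# ~/clearml.conf
agent {
    extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]
}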
CrookedWalrus33 any chance you can think of a sample code to reproduce?
Here this new entry in the log is 2 min after env completed =>
1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415
This seems to be something in your code; just add print("starting") in your entry python file, before any imports (because they might actually do something)
Because from the agent's perspective, after printing Starting Task Execution: it literally calls the python script, nothing else...
In order to clone the Task it needs to complete the sync, which implies closing. I guess the use case of execute_remotely while still running was not considered. How / why is this your workflow? Specifically, how does Jupyter get into the picture?
For future readers, see discussion here:
https://clearml.slack.com/archives/CTK20V944/p1629840257158900?thread_ts=1629091260.446400&cid=CTK20V944
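Relatedly, for the execute_remotely flow discussed above, the usual pattern looks roughly like this (queue/project/task names are placeholders); the local process closes at that call and the task continues on an agent:
from clearml import Task

task = Task.init(project_name="examples", task_name="remote-run")
task.execute_remotely(queue_name="default", exit_process=True)  # local run stops here
# everything below only runs on the agent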