Now I'm just wondering if I could remove the pip install at the very beginning, so it starts straightaway
AbruptCow41 CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 does exactly that. BTW, I would just set the venv cache, which means the agent will be able to restore the entire environment (even if you have changed the requirements)
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L115
The issue itself is the name of the function (bottom line, it has to be unique for every call). So the only very ugly hack is to copy-paste the function X times?!
(I'll see if we can push the fix to GitHub sooner)
OH I see. I think you should use the environment variable to override it:
so add to the docker args something like
-e CLEARML_AGENT__AGENT__PACKAGE_MANAGER__POETRY_INSTALL_EXTRA_ARGS=
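As a rough illustration of how that variable name is put together: the override is the prefix CLEARML_AGENT followed by the configuration path, with double underscores between levels. The helper names below (`to_env_override`, `to_docker_arg`) are hypothetical and not part of clearml-agent; this is only a sketch of the naming pattern, and the value passed is a placeholder.

```python
def to_env_override(config_key: str, prefix: str = "CLEARML_AGENT") -> str:
    """Build an override variable name from a dotted config key (illustrative helper)."""
    parts = [p.upper() for p in config_key.split(".") if p]
    return "__".join([prefix] + parts)


def to_docker_arg(config_key: str, value: str) -> str:
    """Format the override as a `-e NAME=value` docker argument."""
    return f"-e {to_env_override(config_key)}={value}"


# Dotted path assumed to mirror the section layout of clearml.conf
name = to_env_override("agent.package_manager.poetry_install_extra_args")
print(name)
```

The resulting string is what would go into the docker args, followed by `=` and whatever extra arguments you want poetry to receive.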
- Suppose that serving project A is serving model version 1, a new model is trained, and it starts serving model version 2, but at runtime, for some reason, we need to revert to model version 1. What would be the best way to achieve this?
If you archive the model, then clearml-serving will pick the "latest" non-archived model, essentially reverting to the previous version. Also notice that it supports multiple versions on a single endpoint (again also a feat...
In order for the sample to work you have to run the template experiment once. Then the HP optimizer will find the best HP for it.
One additional thing to notice: docker will not actually limit the "view of the memory", it will just kill the container if you pass the memory limit. This is a limitation of the docker runtime.
ReassuredTiger98 if you use the latest RC I sent and run with --debug, you will see the full /tmp/conda_envaz1ne897.yml content in the log
Here it is copied from your log, do you want to see if this one works:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- de...
BroadMole98
I'm still exploring what trains is for.
I guess you can think of Trains as Experiment manager + MLOps tied together.
The idea is to give a quick and easy way to move from coding/running on one machine to scaling it to multiple remote machines, with everything that comes with it.
In some ways it is like snakemake: it sets up your environment and executes the code. Snakemake also allows you to set up data, which in Trains is done via code (StorageManager), pipelines are also...
ElegantCoyote26 what you are after is:
docker run -v ~/clearml.conf:/root/clearml.conf -p 9501:8085
Notice the internal port (i.e. the one inside the docker) is 8080, but the external one is changed to 9501
Is there still an issue? Could it be the browser cannot access the file server directly?
TightElk12 I think this message belongs to a diff thread ;)
give me a minute to test
Oh that is odd. Is this reproducible? NuttyLobster9 what was the flow that required another task.init?
It is currently only enabled when using ports mode. It should be enabled by default, i.e. a new feature :)
This workflow however is the only way I have found to easily fix my previous "Module not found" errors
Hmm okay, makes sense.
Did you try to set these?
or even hack the sys.path with something like:
import sys, os
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__) + "/../"))
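A slightly more defensive version of the same hack, as a minimal sketch: it wraps the path manipulation in a function and avoids adding the same folder twice. The function name `add_parent_to_sys_path` and the example path are made up for illustration; in a real script you would pass `__file__`.

```python
import os
import sys


def add_parent_to_sys_path(file_path: str) -> str:
    """Prepend the parent of the folder containing `file_path` to sys.path."""
    script_dir = os.path.dirname(os.path.abspath(file_path))
    parent = os.path.abspath(os.path.join(script_dir, ".."))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# Made-up location standing in for __file__
example_script = os.path.join(os.getcwd(), "project", "scripts", "train.py")
root = add_parent_to_sys_path(example_script)
print(root)
```

With this layout, modules living next to the `scripts` folder become importable from `train.py`.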
you can also specify additional packages on the decorator:
@PipelineDecorator.component(..., packages=["tqdm>=2.1", "scikit-learn"])
def step_one(...):
    # code here
Do you have a specific numpy version you are installing? Why is it trying to install the wheel from code?
In your "Additional ClearML Configuration" (which is basically clearml.conf configuration)
Add the following:
environment {
    GOOGLE_APPLICATION_CREDENTIALS="~/gs.cred"
}
files {
    gsc {
        contents: "<this is your GCP storage credentials file>"
        path: "~/gs.cred"
    }
}
Reference:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L421
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a...
The agents are docker containers, how do I modify the startup script so it creates a queue?
Hmm actually not sure about that, might not be part of the helm chart.
So maybe the easiest is:
from clearml.backend_api.session.client import APIClient
c = APIClient()
c.queues.create(name="new_queue")
Can't say I have noticed that. Is this a delay on the send, which for some reason is correlated with the epochs? What was the case with 0.17.5?
Hi GrievingFish90
You mean the agent itself runs inside a docker, and then the agent spins sibling dockers for the Tasks?
when you are running the n+1 epoch you get the 2*n+1 reported
RipeGoose2 like twice the gap, i.e. internally it adds an offset of the last iteration... is this easily reproducible?
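To make the arithmetic concrete, here is a simplified model of the symptom (not the library's internal code; `reported_iteration` is a hypothetical name): if an offset equal to the last iteration n is added to every new report, epoch n+1 shows up as n + (n+1) = 2*n+1.

```python
def reported_iteration(current_iteration: int, last_iteration_offset: int) -> int:
    """Iteration shown when an offset from a previous run is added (simplified model)."""
    return last_iteration_offset + current_iteration


n = 10  # epochs completed before the offset was picked up
# Epoch n+1 is reported as n + (n+1) = 2*n+1
shown = reported_iteration(n + 1, last_iteration_offset=n)
print(shown)
```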
You mean to add these two to the model when deploying?
│   ├── model_NVIDIA_GeForce_RTX_3080.plan
│   ├── model_Tesla_T4.plan
Notice the preprocess.py is not running on the GPU instance, it is running on a CPU instance (technically not the same machine)
Okay I'll dig into it
The cloning is done in another task, which has the argv parameters I want the cloned task to inherit from
JitteryCoyote63 What do you mean by that?
Hmmm, make sure the task doing the cloning is using 0.16.1 or above, because with 0.16 we added sections and compatibility depends on the version. Meaning if you have tasks generated with trains 0.16, you need trains 0.16 to clone them from code (so you can properly control the arguments)
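If you want to guard the cloning code against an older install, a small version check along these lines could help. This is only a sketch: `parse_version` and `can_clone_with_sections` are hypothetical helpers, and the 0.16.1 threshold is simply the one mentioned above.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.16.1' into (0, 16, 1); non-numeric suffixes such as 'rc0' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def can_clone_with_sections(installed: str, minimum: str = "0.16.1") -> bool:
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(can_clone_with_sections("0.15.4"), can_clone_with_sections("0.16.1"))
```

In a real script the installed version string could come from `importlib.metadata.version("trains")` rather than a literal.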
JitteryCoyote63
Picks a new experiment on top of the long one running
This is very very strange. Is the long running experiment being logged (i.e. do you still see console output in the UI)?
Thanks VivaciousPenguin66 !
BTW: if you are running the local code with conda, you can set the agent to use conda as well (notice that if you are running locally with pip, the agent's conda env will use pip to install the packages to avoid version mismatch)