Do you want to open an issue in pip?
Funnily enough, this works:
pip3 install "torch >=2.1.0.*, <2.1.1.*" --extra-index-url
correct. notice you need two agents: one for the pipeline (logic) and one for the pipeline components.
that said you can run two agents on the same machine 🙂
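For example, a rough sketch of running both on one machine (queue names are placeholders):
# agent in services mode to run the pipeline logic
clearml-agent daemon --queue services --services-mode --detached
# regular agent to run the pipeline components
clearml-agent daemon --queue default --detached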
I think what you need is the triggers, check this one:
https://clear.ml/docs/latest/docs/references/sdk/trigger
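For example, a minimal sketch (the task ID, queue and project names are hypothetical):
from clearml.automation import TriggerScheduler

# check trigger conditions against the server every few minutes
trigger = TriggerScheduler(pooling_frequency_minutes=3)
# clone + enqueue an existing task whenever a model is published in the project
trigger.add_model_trigger(
    name='retrain-on-publish',
    schedule_task_id='<task-id-to-launch>',  # hypothetical task to run
    schedule_queue='default',
    trigger_project='my project',
    trigger_on_publish=True,
)
trigger.start()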
and sometimes there are hanging containers or containers that consume too much RAM.
Hmmm yes, but can't you see it in the ClearML dashboard?
unless I explicitly add container name in container arguments, it will have a random name,
it would be great if we could set default container name for each experiment (e.g., experiment id)
Sounds like a great feature, and with little implementation work 🙂 Can you add a GitHub issue on clearml-agent?
Hi JitteryCoyote63
Could it be a Python version mismatch? Can you send the full log?
BTW: when I do pip3.8 install pytorch3d==
I get the following versions: pytorch3d== (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0)
BTW, we figured out that the ' belongs to the echo
yep, looking at the full command it is apparent
It's just another flag when running the trains-agent
You can have multiple service-mode instances, there is no actual limit 🙂
That makes total sense.
So right now you can probably use clearml-session to spin a session in any container, add jupyterhub to the requirements like so:
clearml-session --packages jupyterhub
Then ssh + run jupyterhub + tunnel the port?
ssh root@IP -p 10022 -L 6666:localhost:6666
$ jupyterhub
Would that work?
Maybe it is better to add an option to use jupyterhub instead of jupyterlab ?
wdyt?
Is it CLEARML_CONFIG_FILE ? (I had to dig this from the GH code)
Yes it is !
https://clear.ml/docs/latest/docs/faq#clearml-configuration
(I will make sure we add it to https://clear.ml/docs/latest/docs/configs/env_vars#server-connection as well 🙂 )
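For example (the path and script name are hypothetical):
CLEARML_CONFIG_FILE=/path/to/alternate/clearml.conf python train.py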
so what should the value of "upload_uri" be set to, e.g. the fileserver_url ?
yes, that would work.
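For example, one way to set it per task in code (the URL is a placeholder for your fileserver_url):
from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='upload-uri-demo',
    output_uri='http://files.example.com:8081',  # your files server URL
)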
Yes! I checked, it should work (it checks whether you have a load(...) function on the preprocess class, and if you do it will use it):
import joblib
from clearml import Model

def load(self, local_file_name):
    self._model = joblib.load(local_file_name)  # model file downloaded for this endpoint
    self._preprocess_model = joblib.load(Model(hard_coded_model_id).get_weights())
It's a good abstraction for monitoring the state of the platform and callbacks, if this is what you are after.
If you just need "simple" cron, then you can always just loop/sleep 🙂
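i.e. something as minimal as this (the job function and interval are placeholders):
import time

while True:
    run_scheduled_job()  # hypothetical job function
    time.sleep(60 * 60)  # "simple cron": once an hour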
Suppose that a new model version 2 is trained, but it does not fulfill our target metrics, is it possible to just save the model to model repo and not serve it, if a model version 1 is already being served?
Sure, just do not "publish" the model; it will be stored in the model repository, fully accessible, but clearml-serving will not serve it 🙂
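A rough sketch of that gate (the task ID and the metrics check are hypothetical):
from clearml import Task

task = Task.get_task(task_id='<training-task-id>')
model = task.models['output'][-1]  # the model is already stored in the repository
if metrics_ok:  # your target-metrics check
    # publishing is what makes clearml-serving's auto-update pick the model up
    model.publish()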
PompousParrot44 obviously you can just archive a task and run the cleanup service, it will actually delete archived tasks older than X days.
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
One thought is to initialise a new ClearML task in each fold to capture the iteration-level metrics, and then create another task/experiment at the end to capture the aggregated metrics across folds.
That is probably the easiest, and the most scalable.
BTW: with the new reporting feature, you can integrate the comparison of the CV directly into your final report 🙂
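A rough sketch of the fold/aggregate layout (project, metric names and values are placeholders):
from clearml import Task

for fold in range(5):
    task = Task.init(project_name='cv-demo', task_name=f'fold-{fold}')
    # ... train on this fold, reporting per-iteration metrics ...
    task.get_logger().report_scalar('val', 'acc', value=0.9, iteration=100)
    task.close()

# one more task to hold the aggregated cross-fold metrics
summary = Task.init(project_name='cv-demo', task_name='cv-summary')
summary.get_logger().report_single_value('mean_val_acc', 0.9)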
PanickyMoth78
LockException: [Errno 11] Resource temporarily unavailable
I'm not sure I understand how you got to this error (obviously creating datasets and getting them back works), what is unique in the setup/flow itself ?
I still see things being installed when the experiment starts. Why does that happen?
This only means no new venv is created, it basically means install in "default" python env (usually whatever is preset inside the docker)
Make sense ?
Why would you skip the entire python env setup ? Did you turn on venvs cache ? (basically caching the entire venv, even if running inside a container)
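If you want to turn it on, it is the venvs_cache section in the agent's clearml.conf; something like this (defaults may differ between versions):
agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        # uncommenting the path is what enables the cache
        path: ~/.clearml/venvs-cache
    }
}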
okay thanks! let's pick it up on GitHub 🙂
EnviousStarfish54 we just fixed an issue that relates to "installed packages" on Windows.
RC is due to be released in the upcoming days, I'll keep you posted
Yep that will fix it, nice one!!
BTW I think we should add the ability to continue aborted datasets, wdyt?
GloriousPenguin2 could you open a GitHub issue on it? Just making sure this will actually get fixed 🙂
Hi EnviousStarfish54
You mean the console output? If that's the case, the Task.init call will monkey-patch sys.stdout/sys.stderr to report to clearml
as well as the console
PompousParrot44 Enterprise license pricing is usually custom-tailored to the size of the company and based on usage. If you are interested, feel free to leave details in the "contact us" form on the website, and someone from sales will contact you shortly after.
Yes, because when a container is executed, the agent creates a new venv and inherits from the system wide installed packages, but it cannot inherit or "understand" there is an existing venv, and where it is.
others from the local environment and this causes a conflict when importing the attr module
Inside the docker ? " local environment" ?
This is all under "root" no?
JitteryCoyote63
are the calls from the agents made asynchronously/in a non blocking separate thread?
You mean whether request processing on the apiserver is multi-threaded / multi-processed?
and then?
The thing is, programmatically this is not easy to expose as an API, because in the end the "function" (i.e. the CLI) never returns; it connects over SSH and stays connected
But you can query the Task it creates, the project is known, the user is known and it is of special type/tag
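Something along these lines could work (the project name and type filter are assumptions about clearml-session defaults, not confirmed here):
from clearml import Task

sessions = Task.get_tasks(
    project_name='DevOps',  # assumed default project for interactive sessions
    task_filter={'type': ['application']},  # assumed special task type
)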
preempting lower priority tasks to allow a higher priority task to come in
Well this is usually outside of the scope of "single researcher" / "tiny team"...
This is typically a large-scale problem
That said, it would be fairly easy to write a service that aborts Tasks, tags them to be "continued", then later (at night?!) pushes them back into a queue... wdyt?
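A rough sketch of such a service (project, tag and queue names are placeholders):
from clearml import Task

PROJECT, TAG, QUEUE = 'experiments', 'continue-me', 'default'

def preempt_running_tasks():
    for t in Task.get_tasks(project_name=PROJECT, task_filter={'status': ['in_progress']}):
        t.add_tags([TAG])           # remember it should be continued later
        t.mark_stopped(force=True)  # abort the run

def requeue_tagged_tasks():
    # e.g. run this at night; tasks may need a reset before re-enqueueing
    for t in Task.get_tasks(project_name=PROJECT, tags=[TAG]):
        Task.enqueue(t, queue_name=QUEUE)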
BTW: is this on the community server or self-hosted (aka docker-compose)?