and since the update the docs seem to be a bit off but I think I got it
Working on a whole new site 😉
yes.
Obviously when you import the offline session, you will need to set it to point to your server with the correct credentials
With pleasure, I'll make sure we officially release RC1 soon :)
It all depends on how we store the meta-data on the performance. You could actually retrieve it from, say, the val metric and deduce the epoch based on that
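A minimal sketch of that idea, assuming the validation metric is available as two parallel lists (iterations and values). The `best_epoch` helper and the sample numbers are hypothetical, made up for illustration; this is not a ClearML API.

```python
def best_epoch(x, y, maximize=True):
    """Return (epoch, value) for the best reported validation value."""
    if not x or len(x) != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    pick = max if maximize else min
    # Index of the best value; ties resolve to the earliest epoch
    idx = pick(range(len(y)), key=lambda i: y[i])
    return x[idx], y[idx]


# Hypothetical validation-accuracy series, one point per epoch
epochs = [0, 1, 2, 3, 4]
val_acc = [0.61, 0.72, 0.80, 0.78, 0.79]
epoch, value = best_epoch(epochs, val_acc)
print(epoch, value)
```

For a loss-like metric, pass `maximize=False` so the lowest value is selected instead.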
/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py
Yep I see it now, could you simulate locally (i.e. have the other folders in the path as well)?
could it be you also have a file somewhere that is called sfi or imagery or models or chip_classifier that it accidentally tries to import first from ?
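One way to check for this kind of name shadowing is to ask Python where it would import each name from, without actually importing it. A small sketch using only the standard library; the `resolve_module` helper is a hypothetical name, and the module names are the ones mentioned above:

```python
import importlib.util


def resolve_module(name):
    """Return the file a top-level module name would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin can itself be None for namespace packages (folders without __init__.py)
    return spec.origin


for name in ("sfi", "imagery", "models", "chip_classifier"):
    print(name, "->", resolve_module(name))
```

If one of the names resolves to a path outside the repository, that file or folder is likely the one being imported first.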
but maybe hyperparam aborts in those cases?
from the hyperparam perspective it will be trying to optimize the global minimum, basically "ignoring" the last value reported. Does that make sense ?
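To illustrate the difference between the last reported value and the global minimum, here is a tiny sketch. The `summarize_objective` helper and the loss values are hypothetical; it only shows the arithmetic, not how the optimizer is implemented.

```python
def summarize_objective(reported):
    """Return the last reported value and the global minimum of a metric series."""
    if not reported:
        raise ValueError("no values reported")
    return {"last": reported[-1], "global_min": min(reported)}


# Hypothetical validation loss that gets worse at the end of training
val_loss = [0.90, 0.42, 0.35, 0.51]
summary = summarize_objective(val_loss)
# An optimizer targeting the global minimum scores this run by
# summary["global_min"], regardless of the later, worse value.
print(summary)
```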
GrievingTurkey78
Both are now supported, they basically act the same way 🙂
and log overrides + the final omegaconf
the time taken to upload halved. It is puzzling because as you say it's not that much to upload.
Maybe it was the load on the server? meaning dealing with multiple requests at the same time delayed the requests?!
For now I've whittled down the number of entries to a more select but useful few and that has solved the issue. If it crops up again I will try connect_configuration properly.
Thanks for your help!
My pleasure 🙂
The docker crashes and I want to be able to debug it exactly as it is run by the agent
On your machine (any machine)
pip install clearml-agent
clearml-agent build --id <taskID> --docker "local_mydocker_name"
docker run -it local_mydocker_name bash
My question is what should be the path to the requirements.txt file?
Is it relative to the repo base?
This is actually in runtime (i.e. when running the code), so relative to the working directory. Make sense ? (you can specify absolute path, probably something I would avoid in the code base though...)
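A short sketch of what "relative to the working directory" means at runtime, using only the standard library. The `resolve_requirements` helper is hypothetical and only mirrors how a running process resolves a relative path:

```python
from pathlib import Path


def resolve_requirements(path="requirements.txt"):
    """Resolve a path the way the running process sees it: against the current working directory."""
    candidate = Path(path)
    # Absolute paths are used as-is; relative ones are anchored at the cwd
    return candidate if candidate.is_absolute() else Path.cwd() / candidate


print(resolve_requirements("requirements.txt"))
```

If the script is launched from a sub-folder of the repository, the same relative string points at a different file, which is why the working directory matters here.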
if they're mission critical, but rather the clearml cache folder?
hmmm... they are important, but only when starting the process. any specific suggestion ?
(and they are deleted after the Task is done, so they are temp)
the error for uploading is weird
wait, are you still getting this error?
There is no way to create an artifact/model/dataset without a task, right?
Models are an entity of their own, and you can actually create one without a Task.
(just for my own interest: how much does the enterprise version divert from the open source version? Is it just extended or are there core changes to the enterprise version)
It adds a few security layers on top, and adds a few features that are just not part of the open source (RBAC, hyper-datasets, advanced scheduling, cu...
These both point to nvidia docker runtime installation issue.
I'm assuming that in both cases you cannot run the docker manually as well, which is essentially what the agent will have to do ...
That makes total sense.
So right now you can probably use clearml-session to spin a session in any container, add the jupyterhub to the requirements like so:
clearml-session --packages jupyterhub
Then ssh + run jupyterhub + tunnel port?
ssh root@IP -p 10022 -L 6666:localhost:6666
$ jupyterhub
Would that work?
Maybe it is better to add an option to use jupyterhub instead of jupyterlab ?
wdyt?
Basically what I want is a clearml-session but with a docker container running JupyterHub instead of JupyterLab.
I missed that 🙂
The idea of clearml-session is to launch a container with jupyterlab (or vscode) on a remote machine, and connect the user's machine (i.e. the machine that executed the clearml-session CLI) directly into the container.
Replacing the jupyterlab with JupyterHub will be meaningless here, because the idea is it spins an instance (contai...
But this is not copy, this is mount, your log showed cp failing
Yes it should
here is fastai example, just in case 🙂
https://github.com/allegroai/clearml/blob/master/examples/frameworks/fastai/fastai_with_tensorboard_example.py
Hi CrookedAlligator14
Hi, I just started using clearml, and it is amazing!
Thank you! 😍
When I enqueue the task, the venv is setup and starts to install all the packages from the requirements.txt file, but at the end I get the following in the console:
Can you try with the latest agent, we improved the support for pytorch (they now have a proper pypi compatible repo), can you see if that solves it?
pip3 install clearml-agent==1.5.0rc0
My data is already in a directory on the clearml-server machine and I do not want to copy it, just add it to clearml as dataset.
So the short answer is, no, it needs to package it (read "zip it")
The reason is clearml-data creates an immutable copy, and just "pointing" to files located somewhere will usually break very easily.
That said, actually it will be relatively easy to add as dataset itself stores links to the files and these links could actually point to an S3 bucket (for exa...
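To see why merely pointing at files is fragile, here is a sketch that snapshots content hashes when a "dataset" is registered and later detects that a file was edited in place. The `build_manifest` and `changed_files` helpers are hypothetical illustrations, not how clearml-data is implemented.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map each file under root (relative path) to the SHA-256 of its content."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """Return relative paths that are missing or whose content differs from the manifest."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_text("a,b\n1,2\n")
    manifest = build_manifest(tmp)   # snapshot taken at registration time
    data.write_text("a,b\n9,9\n")    # the file is later edited in place
    broken = changed_files(tmp, manifest)
    print(broken)
```

With a packaged copy the registered content cannot drift like this; with links, a check of this kind would be needed to notice the change.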
FranticWhale40 I might have found something, let me see if I can reproduce it
I think it's inside the container since it's after the worker pulls the image
Oh that makes more sense, I mean it should not build from source, but it makes sense
To solve the build from source:
Add to the "Additional ClearML Configuration" section the following line:
agent.package_manager.pip_version: "<21"
You can also turn on venv caching
Add to the "Additional ClearML Configuration" section the following line:
agent.venvs_cache.path: ~/.clearml/venvs-cache
I will make sure w...
yes, it worked. thank you very much.
ScantCrab97 nice!
it was indeed a matter of the subnets...
BrightRabbit75 you are awesome, thank you!
(now we probably need to add it to the faq somewhere?!)
No, an old experiment changed, nothing was rerun
ohh, that is odd. I think the max iteration value is stored on the DB, which is odd if it changed after an update.
BTW: just making sure, could it be these Tasks were imported ? (i.e. offline execution + import)
But I have no idea what will be input of step2.
What do you mean by that? the assumption is that somehow the output of step 1 will be passed (a string reference) to step 2, what am I missing ?
It's in my local conda environment though.
Meaning this is a wheel installed manually in conda? or is it a folder inside the conda environment ?
Yes, albeit not actually "intercept" as the user will be able to directly put Tasks in queues B_machine_a/B_machine_b, but any time the user is pushing Tasks into queue B, this service will pull it and push to the individual machine's queue.
what do you think?
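The dispatching service described above can be sketched with plain in-memory queues. Everything here is an assumption for illustration: the `dispatch` helper, the round-robin policy, and the task IDs are hypothetical, and a real service would pull from and push to ClearML queues instead.

```python
from collections import deque
from itertools import cycle


def dispatch(shared_queue, machine_queues):
    """Move every task from the shared queue to the machine queues in round-robin order."""
    if not machine_queues:
        raise ValueError("at least one machine queue is required")
    targets = cycle(sorted(machine_queues))
    while shared_queue:
        task_id = shared_queue.popleft()
        machine_queues[next(targets)].append(task_id)
    return machine_queues


# Hypothetical task IDs pushed by users into queue B
queue_b = deque(["task-1", "task-2", "task-3"])
machines = {"B_machine_a": deque(), "B_machine_b": deque()}
dispatch(queue_b, machines)
print({name: list(q) for name, q in machines.items()})
```

Users can still push directly into `B_machine_a` or `B_machine_b`; the service only drains the shared queue.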