quick update: 1.0.2 will be ready in an hour, apologies 😞
Hi, I would like to understand how I can set the pip cache location for my agent,
ClumsyElephant70 by default the pip cache (and all other cache folders) is mounted back into the host itself, under ~/.clearml/
I'm assuming the idea is a shared cache. If this is the case, do:
docker_pip_cache = ~/my_shared_nfs/pip-cache
https://github.com/allegroai/clearml-agent/blob/e3e6a1dda81bee2dd20a64d09746568e415f1823/docs/clearml.conf#L139
it seems it's following the path of the script I'm passing to task.create, e.g.:
The folder it should run in is the script path you are passing (i.e. "script=ep_fn")
A wrong path would imply that it is not finding the correct repository, is that the case?
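For reference, a rough Task.create sketch (the repo URL, branch and script path below are placeholders, not your actual values); the script path is expected to be relative to the repository:

from clearml import Task

# the agent will clone the repository and run the given script from the repo root
task = Task.create(
    project_name="examples",
    task_name="remote run",
    repo="https://github.com/your-org/your-repo.git",  # placeholder repository
    branch="main",
    script="path/to/ep_fn.py",  # placeholder path inside the repository
)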
Jupyter Notebook is fully supported.
Could you try and restart the notebook kernel?
Thank you!
one thing I noticed is that it's not able to find the branch name on >=1.0.6x, while on 1.0.5 it can
That might be it! let me check the code again...
Exactly!
Regarding adding a feature store: probably not in the near future, a scalable feature store is quite the project. It is probably more realistic to have a recipe for deploying with Feast.
a task of queue B if the next task is of type A it will have to wait,
It seems you imply there are two types of Tasks and they need to be executed one after the other ?
Hi @<1541954607595393024:profile|BattyCrocodile47> and @<1523701225533476864:profile|ObedientDolphin41>
"we're already on AWS, why not use SageMaker?"
TBH, I've never gone through the ML workflow with SageMaker.
LOL I'm assuming this is why you are asking 🙂
- First, you can use SageMaker and still log everything to ClearML (2-line integration, see the sketch after this list). At least you will have visibility to everything that is running/failing 🙂
- SageMaker job is a container, which means for ...
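For reference, a minimal sketch of the two-line integration mentioned above, added at the top of the SageMaker training script (project and task names are just placeholders):

from clearml import Task

task = Task.init(project_name="sagemaker-jobs", task_name="training run")  # placeholder names

From that point on, console output, scalars and models should be picked up by ClearML's automatic logging as usual.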
Hi FrothyShark37
is the task scheduler only accessible through the SDK?
yes, in the open source version this is strictly code based. I know the enterprise tier has a UI for it, but in terms of features I believe this is equivalent
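For example, a rough sketch of the code-based scheduler (the task ID, queue name and schedule below are assumptions for illustration, not real values):

from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# re-launch an existing task every day at 07:00 on the "default" queue
scheduler.add_task(
    schedule_task_id="aabbccdd11223344",  # placeholder ID of the task to clone & enqueue
    queue="default",
    hour=7,
    minute=0,
    day=1,
)
scheduler.start()  # runs the scheduling loop in this process (start_remotely() is also available)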
Curious what the advantage would be of using the StorageManager
Basically if you set the clearml cache folder to the EFS, users can always do:
from clearml import StorageManager
local_file = StorageManager.get_local_copy(" ")
where local_file is stored on the persistent cache (EFS) and the cache is automatically cleaned based on the last accessed file
Did you set up an agent on a machine? (See clearml-agent in the docs for details)
Could you test with the same file? Maybe timeout has something to do with the file size ?
Hmmm that sounds like a good direction to follow, I'll see if I can come up with something as well. Let me know if you have a better handle on the issue...
I'm sorry JitteryCoyote63 No 😞
I do know that the enterprise edition has these features (a.k.a. vault & permissions), basically to address these types of situations.
If I have an alternative location for the vscode, where should I indicate it in the configuration?
We might need to add support for that, but it should not be a problem to override (e.g. downloadable link like http/s3/ etc.)
Is this something that is doable ?
I have a client that runs clearml-session and I saw from the agent's logs that the installation of vscode fails.
That makes sense, it downloads vscode at runtime. Do you have an alternative location? Or maybe it is easier to build a container with vscode pre-installed?
Hi TrickyRaccoon92
... would any running experiment keep a cache of to-be-sent-data, fail the experiment, or continue the run, skipping the recordings until the server is back up?
Basically they will keep trying to send data to the server until it is up again (you should not lose any of the logs)
Are there any clever functionality for dumping experiment data to external storage to avoid filling up the server?
You mean artifacts or the database ?
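If you mean artifacts/models, a minimal sketch of pointing their uploads at external storage (the bucket path is a placeholder):

from clearml import Task

# models/artifacts are uploaded to the external storage instead of the ClearML file server
task = Task.init(
    project_name="examples",
    task_name="external storage run",
    output_uri="s3://my-bucket/clearml",  # placeholder bucket/prefix
)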
Hi @<1524922424720625664:profile|TartLeopard58>
Yes, this is the default; it is designed to serve multiple models and scale horizontally
Hmm yes, that is a good point, maybe we should allow specifying a parameter on the model configuration to help with the actual type ...
same: Not Found (#404)
May I suggest you DM it to me (so it is not public)?
with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent.
This is odd, could you send the full Task log?
LOL that's the spirit, making your team happy is key to success in adoption 🙂
I see... In the triton pod, when you run it, it should print the combined pbtxt. Can you print both the before/after ones, so that we could compare?
Hi @<1694157594333024256:profile|DisturbedParrot38>
You mean how to tell the agent to pull only some submodules of your git?
If this is the case you can actually remove them on your git branch; a submodule is a file with a soft link. Wdyt?
Hi @<1523701601770934272:profile|GiganticMole91>
Do you mean something like a git ops triggered by PR / tag etc ?
RipeGoose2
The HTML file is not standalone and has some dependencies that require networking...
Really? I thought that when jupyter converts its own notebook it packages everything into a single html, no?
hmm can you share the log of the Task? (the clearml-session created Task)
@<1556812486840160256:profile|SuccessfulRaven86> is the issue with flask reproducible? If so, could you open a GitHub issue, so we do not forget to look into it?