Tested with two sub folders, seems to work.
Could you please test with the latest RC:
pip install clearml==0.17.5rc4
Hi @<1657918706052763648:profile|SillyRobin38>
You should either disable certificate verification or add the self-signed certificate to your urllib
or set
export REQUESTS_CA_BUNDLE="/path/to/cert/file"
export SSL_CERT_FILE="/path/to/cert/file"
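If you prefer to do this from Python instead of the shell, a minimal sketch (the certificate path is a placeholder, and it must run before the first HTTPS request is made):

```python
import os

# Point Python HTTP clients (requests / urllib) at the self-signed certificate.
# "/path/to/cert/file" is a placeholder for your actual PEM bundle.
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/cert/file"
os.environ["SSL_CERT_FILE"] = "/path/to/cert/file"
```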
I keep getting a "failed getting token" error
MiniatureCrocodile39 what's the server you are using ?
Hi SmallDeer34
ClearML automagical logging works on the current Python process. But in your example, your bash script runs another Python script (that has nothing to do with the original notebook), hence the ClearML automagic is not aware of it (i.e. it cannot "patch" the tensorboard calls).
In order to make it work, you should do something like:
from joeynmt import train
train.main(...)
Or something similar 🙂
Make sense ?
, I can see the shape is
[136, 64, 80, 80]
. Is that correct?
Yes that's correct. In case of the name, just try input__0
Notice you also need to convert it to torchscript
Hi @<1556812486840160256:profile|SuccessfulRaven86>
it does not when I run a flask command inside my codebase. Is it an expected behavior? Do you have some workarounds for this?
Hmm where do you have your Task.init ?
(btw: what's the use case of a flask app tracking?)
Then I deleted those workers,
How did you delete those workers? the autoscaler is supposed to spin the ec2 instances down when they are idle, in theory there is no need for manual spin down.
Okay I have an idea, it could be a lock that another agent/user is holding on the cache folder or similar
Let me check something
Hi @<1545216070686609408:profile|EnthusiasticCow4>
is there a way to get the date from the InputModel?
You should be able to with model._get_model_data()
But I think we should have it all exposed, wdyt?
Hi GreasyPenguin66
So the way ClearML can store your notebook is by using the jupyter-notebook REST API. It assumes that it can communicate with it, as the kernel is running on the same machine. What exactly is the setup? Is the jupyter-lab/notebook running inside the docker? Maybe the docker itself is running with some --network argument?
Thank you JuicyOtter4 ! 😍
. Is there a way to programmatically set that in the code?
Something like?
` task = Task.init(...)
task.set_comment("best thing ever") `
(probably we should change that to description ?!)
HealthyStarfish45 you mean like replace the debug image viewer with custom widget ?
For the images themselves, you can get their URLs, then embed them in your static HTML.
You could also have your html talk directly with the server REST API.
What did you have in mind?
No I mean configure the files_server in the clearml.conf
Can you try to run the example code, see if that works for you?
I have no idea what string reference could be used when steps come from Task?
Oh I see, you are correct. When it comes to Tasks, the assumption is you are passing strings (with selectors on the strings, i.e. the curly brackets), but there is no fancy serialization/deserialization as you have with pipelines from decorators / functions. The reason for that is that the Task itself is a standalone, there is no way for the pipeline logic to actually "pull data" from it and "pass" it to the o...
Simple file transfer test gives me approximately 1 GBit/s transfer rate between the server and the agent, which is to be expected from the 1Gbit/s network.
Ohhh I missed that. What is the speed you get for uploading the artifacts to the server? (you can test it with simple toy artifact upload code)
Hi RipeGoose2
I think it "should" take care of uploading the artifacts as well (they are included in the zip file created by the offline package)
Notice that the "default_output_uri" on the remote machine is meaningless as it stored them locally anyhow. It will only have an effect on the machine that actually imports the offline session.
Make sense ?
One option is definitely having a base image that has the things needed. Anything else? Thanks!
This is a bit complicated, to get the cache to kick in you have to mount an NFS file into the pod as the cache (to create a persistent cache)
Basically, spin NFS pod to store the cache, change the glue job template yaml to mount it into the pod (see default cache folders:
/root/.cache/pip and /root/.clearml/pip-download-cache)
Make sense ?
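As a rough sketch, the glue pod template change described above could look something like this (the NFS server address, export path, and names are all placeholders):

```yaml
# Hypothetical fragment of the k8s glue job/pod template
spec:
  volumes:
    - name: clearml-cache
      nfs:
        server: 10.0.0.10              # placeholder NFS server address
        path: /exports/clearml-cache   # placeholder export path
  containers:
    - name: clearml-task               # placeholder container name
      volumeMounts:
        - name: clearml-cache
          mountPath: /root/.clearml/pip-download-cache
        # mount /root/.cache/pip the same way for the pip wheel cache
```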
See the last package in the package list:
- wget~=3.2
- trains~=0.14.1
- pybullet~=2.6.5
- gym-cartpole-swingup~=0.0.4
- //github.com/ajliu/pytorch_baselines
WickedGoat98
for such pods instantiating additional workers listening on queues
I would recommend creating a "devops" user and spreading its credentials across all agents. Sounds good?
EDIT:
There is no limit on number of users on the system, so login as a new one and create credentials in the "profile" page :)
My pleasure 🙂
I do not think it should change anything, just pull the latest agent and reinstall:
pip3 install -U clearml-agent
Hi ShinyPuppy47 ,
Yes that is correct. Use Task.init for automagic logging
Hi @<1715900788393381888:profile|BitingSpider17>
Notice that you need __ (double underscore) to replace the "." separators from the clearml.conf path,
this means agent.docker_internal_mounts.sdk_cache becomes CLEARML_AGENT__AGENT__DOCKER_INTERNAL_MOUNTS__SDK_CACHE
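The conversion rule can be sketched as a small helper (the function name is just for illustration):

```python
def conf_key_to_env_var(key: str, prefix: str = "CLEARML_AGENT") -> str:
    """Map a dotted clearml.conf key to its override environment variable:
    every "." becomes a double underscore and everything is upper-cased."""
    return prefix + "__" + key.upper().replace(".", "__")

print(conf_key_to_env_var("agent.docker_internal_mounts.sdk_cache"))
# CLEARML_AGENT__AGENT__DOCKER_INTERNAL_MOUNTS__SDK_CACHE
```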
All in all, seems like it will be fairly easy to add JupyterHub to clearml-session, and that would solve your issue, no?
(and it seems from implementation perspective, this will not be a lot of work)
wdyt?
could one also limit the number of CPU cores available?
If you are running in docker mode you can add:
--cpus=<value>
see ref here: https://docs.docker.com/config/containers/resource_constraints/
Just add it to extra_docker_arguments :
https://github.com/allegroai/clearml-agent/blob/2cb452b1c21191f17635bcb6222fa8bfd82afe29/docs/clearml.conf#L142
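For example, a minimal clearml.conf fragment (the CPU count of 2 is arbitrary, pick whatever fits your machine):

```
agent {
    # extra arguments passed to "docker run" for every task container
    extra_docker_arguments: ["--cpus=2"]
}
```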
You cannot change the 8008 port, it has to be 8008 externally (i.e. from the client side).
You can however do subdomains, but only these will work:
api.mydomain.com
app.mydomain.com
files.mydomain.com
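On the client side, a sketch of how the clearml.conf api section could point at those subdomains (assuming standard HTTPS on each):

```
api {
    web_server: https://app.mydomain.com
    api_server: https://api.mydomain.com
    files_server: https://files.mydomain.com
}
```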
BeefyCow3 On the plot itself click on the json download button
SpotlessFish46
1. Yes, you can access the entire code in the uncommitted changes, you can test it with:
task = Task.get_task(task_id='aabb')
task_dict = task.export_task()
2. Correct, but then if you need the entire code base you need to clone the repo and apply the uncommitted changes. Basically trains-agent does that when you execute with build:
trains-agent build --id aabb --target ~/my_task_env
3. See (2)
shared "warm" folder without having to download the dataset locally.
This is already supported 🙂
Configure the sdk.storage.cache.default_base_dir in your clearml.conf to point to a shared (mounted) folder
https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L205
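For example, a clearml.conf fragment assuming the shared folder is mounted at /mnt/shared/clearml-cache (the path is a placeholder for your actual mount point):

```
sdk {
    storage {
        cache {
            # shared (mounted) folder used as the "warm" dataset cache
            default_base_dir: "/mnt/shared/clearml-cache"
        }
    }
}
```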
That's it 🙂