
So if I do this in my local repo, will it mess up my git state, or should I do it in a fresh directory?
It will install everything fresh into the target folder (including venv and code + uncommitted changes)
Hi StaleHippopotamus38
I imagine I could make the changes specified in the warning to /etc/security/limits.conf
Yep, seems like an Elasticsearch memory issue, but I think the helm chart takes care of it.
You can see a reference in the docker compose:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L41
According to you the VPN shouldn't be a problem right?
Correct, as long as all parties are on the same VPN it should work; all the connections are always HTTP, so it is basically trivial communication.
2 and 3 - I want to manage access control over the RestAPI
Long story short, put a load-balancer in front of the entire thing (see the k8s setup), and have the load-balancer verify JWT token as authentication (this is usually the easiest)
1- Exactly, custom code
Yes, we need to add a custom example there (somehow forgotten)
Could you open an Issue for that?
in the meantime:
```python
# Preprocess class: must be named "Preprocess"
# No need to inherit or to implement all methods
class P...
```
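If it helps, here is a minimal sketch of what such a preprocessing file could look like (this is my own outline based on the clearml-serving preprocess examples; exact method signatures may differ between versions, so double-check against the repo):

```python
from typing import Any


# Must be named "Preprocess"; no need to inherit from anything,
# and you only implement the methods you actually need.
class Preprocess(object):
    def __init__(self):
        # called once when the serving endpoint is loaded
        pass

    def preprocess(self, body: dict, state: dict = None, collect_custom_statistics_fn=None) -> Any:
        # turn the incoming request payload into model input
        return body["data"]

    def postprocess(self, data: Any, state: dict = None, collect_custom_statistics_fn=None) -> dict:
        # turn the model output into the response payload
        return {"output": data}
```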
I guess. or pipelines that you can compose after running experiments to see that experiments are connected to each other
hmm, what do you mean by "compose after running experiments"? Like a way to group them? What is the relation between one "item" and another?
If this is a sequence of Tasks, are they executed by a controller?
I want to be able to delete only the logs since they are taking a lot of space in my case.
I see... I do not think this is possible.
You can disable the auto logging though: pass auto_connect_streams=False to Task.init
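For example, a minimal sketch (assuming a clearml version recent enough to support the auto_connect_streams argument):

```python
from clearml import Task

# disable automatic capture of stdout/stderr/logging into the task's console log
task = Task.init(
    project_name="examples",
    task_name="no console capture",
    auto_connect_streams=False,
)
```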
Is it mentioned anywhere in the docs that clearml-agent needs to be installed from the system Python? If not, I suggest it gets added.
You are right, I will check and fix if not 🙂
Thank you so much for helping.
My pleasure
Hi SkinnyPanda43
Yes, I think you are right, the documentation might be missing it. I'll make sure they know 🙂
In the meantime: task.update_output_model
https://github.com/allegroai/clearml/blob/d3929033c016476c580557639ff44f900e65904a/clearml/backend_interface/task/task.py#L734
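For example, a rough sketch (argument names can vary slightly between clearml versions, and the file/model names here are made up):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="manual output model")

# ... training code that writes a weights file, e.g. "model_weights.pt" ...

# register the local weights file as this task's output model
task.update_output_model(model_path="model_weights.pt", name="my_model")
```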
GiddyTurkey39 can you ping the server-address
(just making sure, this should be the IP of the server not 'localhost')
and I have no way to save those as clearml artifacts
You could do (at the end of the code): task.upload_artifact('profiler', Path('./fil-result/'))
wdyt?
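Something along these lines, as a self-contained sketch (the folder name matches the fil profiler output mentioned above):

```python
from pathlib import Path
from clearml import Task

task = Task.init(project_name="examples", task_name="profiled run")

# ... run the training code under the profiler, writing its report into ./fil-result/ ...

# upload the whole results folder as a single artifact (folders are packaged automatically)
task.upload_artifact("profiler", artifact_object=Path("./fil-result/"))
```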
Yes my bad 🙂
Let's try again:
```
docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.7rjdh80a.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.ppsd9sze:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/dwhitena/.clearml/pip-cache:/root/.cache/pip ...
```
Hi SkinnyPanda43
Can you attache the full log?
Clearml agent is installed before your requirements.txt, so at least in theory it should not collide.
Seems like someone sitting in the middle is rerouting the request (maybe both https and port)?!
Hi UptightBeetle98
The hyperparameter example assumes you have agents (trains-agent) connected to your account. These agents will pull the jobs from the queue (which they are now, aka pending), set up the environment for the jobs (venv or docker+venv), and execute the job with the specific arguments the optimizer chose.
Make sense ?
Also, can you right-click on the image and save it to your machine, to see if it is cropped or it is just a UI issue?
GiddyTurkey39 Hmm I'm assuming that by default it cannot access that IP range.
Are you using virtual-box for the VM?
EDIT:
Can I assume the machine running the VM (a.k.a the host) can access the trains-server
?
GiddyTurkey39
I would guess your VM cannot access the trains-server, meaning an actual network configuration issue.
What are the VM IP and the trains-server IP? (the first two numbers are enough, e.g. 10.1.X.Y, 174.4.X.Y)
pip cache & git cache & venvs cache
Are all supported, you just need to map the folders.
If you do not want to spin a PVC with NFS mount, you can just mount an S3 bucket with s3fs as part of the container extra bash script,
https://github.com/allegroai/clearml-agent/blob/b39b54bbafab39e6731cb742fdf317bc6dcae54a/docs/clearml.conf#L140
S3 FUSE filesystems:
https://github.com/kahing/goofys
https://github.com/s3fs-fuse/s3fs-fuse
WDYT?
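For reference, a rough sketch of what that could look like in clearml.conf (assuming your clearml-agent version supports agent.extra_docker_shell_script; the bucket name and mount point are placeholders):

```
agent {
    # commands executed inside the docker container before the task starts
    extra_docker_shell_script: [
        "apt-get update && apt-get install -y s3fs",
        "mkdir -p /root/.clearml/cache",
        "s3fs my-cache-bucket /root/.clearml/cache -o iam_role=auto",
    ]
}
```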
I suppose one way to perform this is with a
that kicks
Yes, that was my thinking.
It seems more efficient to support a triggered response to task fail.
Not sure I follow this one. I mean, the pipeline logic itself monitors the execution. If I'm not mistaken, try/except will catch a step that fails, and a global try/except will catch the entire pipeline. Am I missing something?
is how you would create different queues,
SarcasticSquirrel56 you can create them from the UI, when the server is already running
(if you are saying, how do I create them in the first installation, then yes you are correct, this is possible in the helm chart, I think 🙂)
SarcasticSquirrel56
if I configure manually the pods for the different nodes, how do I make clearml server aware that those agents exist?
Basically the agents register themselves on your clearml-server, and they register which Queue(s) they listen to. In other words, the interface for choosing between the different types of machines/GPUs is enqueuing the Task to different queues.
For example: Queue(1): "CUDA11_GPUx1" , Queue(2): "CUDA10_GPUx1"
Make sense ?
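For example, a sketch of doing it from code (queue and task names are placeholders; the same can be done from the UI by cloning a task and enqueuing it):

```python
from clearml import Task

# clone the template task and send the clone to the queue that matches the hardware you want;
# an agent listening on that queue will pick it up and run it
template = Task.get_task(project_name="examples", task_name="train model")
cloned = Task.clone(source_task=template, name="train model on CUDA11 GPU")
Task.enqueue(cloned, queue_name="CUDA11_GPUx1")
```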
EDIT:
I guess to achieve what I w...
Found it
GiganticTurtle0 you are 🧨! Thank you for stumbling across this one as well.
Fix will be pushed later today 🙂
This is odd... can you post the entire trigger code ?
also what's the clearml version?
Hi CharmingBeetle38
On the base task, do you see those arguments under the Configuration tab?
Also, if they are under Args section, you should add "Args/" prefix to the HP optimization (this is how you differentiate between the sections)
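For example, a sketch of how the prefix looks in the optimizer setup (parameter names, ranges, and the base task ID are placeholders):

```python
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    UniformIntegerParameterRange,
)

task = Task.init(project_name="examples", task_name="HP optimizer", task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",
    hyper_parameters=[
        # the "Args/" prefix maps to the Args section of the base task's hyperparameters
        UniformIntegerParameterRange("Args/batch_size", min_value=16, max_value=128, step_size=16),
        DiscreteParameterRange("Args/lr", values=[0.001, 0.01, 0.1]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    max_number_of_concurrent_tasks=2,
    execution_queue="default",  # agents must be listening on this queue
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```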
Could you amend the original snippet (or verify that it also produces plots in debug samples)?
(Basically I need something that I can run 🙂)
Hi RoughHedgehog31
I'm assuming your git diff is just too big to be stored as is (probably some binary files)
it should not really have any effect on the execution, it just means the clearml-agent will not be able to reproduce the uncommitted changes.
Make sense ?
MagnificentSeaurchin79 you can delay it with: task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
... I generate some more graphs with a file called graphs.py and want to attach/upload to this training task
Makes total sense to use Task.get_task, I just want to make sure that you are aware of all the options, so you pick the correct one for you :)
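For instance, graphs.py could look roughly like this (project/task names and the figure itself are placeholders):

```python
# graphs.py -- attach extra plots to an existing training task
import matplotlib.pyplot as plt
from clearml import Task

# grab the existing training task by name (or pass task_id=... directly)
task = Task.get_task(project_name="examples", task_name="my training run")

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# the figure shows up under the task's PLOTS section
task.get_logger().report_matplotlib_figure(
    title="extra graphs", series="graphs.py", figure=fig, iteration=0
)
task.get_logger().flush()
```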
Hi MagnificentSeaurchin79
Unfortunately there is currently no way to reorder the plots, but you have a valid point. I suggest a GitHub UX issue ?
Regarding the debug samples, the difference is that the confusion matrix report is actually metadata; you can get these numbers via the API or by downloading them, but the debug samples are static images...
BTW: you can try to produce an interactive side-by-side confusion matrix with plotly, and use report_plotly
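For example, a sketch with made-up numbers:

```python
import plotly.graph_objects as go
from clearml import Task

task = Task.init(project_name="examples", task_name="interactive confusion matrix")

labels = ["cat", "dog"]
cm = [[50, 2], [5, 43]]  # made-up confusion-matrix values

fig = go.Figure(data=go.Heatmap(z=cm, x=labels, y=labels, colorscale="Blues"))
task.get_logger().report_plotly(
    title="confusion matrix", series="validation", figure=fig, iteration=0
)
```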