The problem is due to tight security on this k8s cluster; the k8s pod cannot reach the public file server URL which is associated with the dataset.
Understood, that makes sense. If that's the case, then the path_substitution feature is exactly what you are looking for
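For reference, a minimal sketch of what that could look like in clearml.conf; the prefixes here are made-up examples, adjust them to your setup:
    sdk {
        storage {
            path_substitution = [
                {
                    # hypothetical mapping: rewrite the registered public URL
                    # to a mount the pod can actually reach
                    registered_prefix: "https://files.example.com/datasets"
                    local_prefix: "file:///mnt/shared/datasets"
                }
            ]
        }
    }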
It looks somewhat familiar ... 😞
SuccessfulKoala55 any idea?
That said, if you could open a GitHub issue and explain the idea behind it, I think a lot of people would be happy to have such a process, i.e. a CI process verifying code. And I think we should have a "CI" flag doing exactly what we have in the "hack". wdyt?
CheerfulGorilla72 sounds like a great idea, I'll pass it along to the documentation ppl 🙂
server-->agent is fast, but agent-->server is slow.
Then multiple connections will not help; this is the bottleneck of the upload speed of your machine, regardless of what the target is (file server, S3, etc...)
I would like to force the usage of those requirements when running any script
How would you force it? Would you just ignore the "Installed Packages" section?
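If the idea is to always use the repository's requirements.txt and skip "Installed Packages", there is an agent-side setting for that; a sketch for clearml.conf, assuming the key name matches your agent version:
    agent {
        package_manager {
            # use the repo's requirements.txt instead of the task's
            # "Installed Packages" section
            force_repo_requirements_txt: true
        }
    }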
Hmm, interesting. Is it a drop-in replacement for poetry?
I... did not, ashamed to admit.
UnevenDolphin73 😄 I actually think you are correct, meaning I "think" what you are asking is for the low-level logging (for example, debug messages that are usually not printed to the console) to also be logged? Is that correct?
Hi CheerfulGorilla72
is it ideological...
Lol, no 😀
Since some of the comparisons are done client-side (in the browser, mostly the text comparisons) it is a bit heavy, so we added a limit. We want to change it so some of it is done on the backend, but in the meantime we can actually expand the limit, and maybe only lazily compare the text areas. Hopefully in the next version 🤞
We actually plan to create different queues for different types of workloads; we are still watching what the actual usage looks like to define what types of workloads make sense for us.
That sounds like a great path to take. It will make it very clear for users what they will be getting and why they should use a specific queue.
As for the memory, yes, the reasoning is clear. The main thing we'll have to see is how to define the limits, because we have nodes with quite different resources availab...
So maybe the path is related to the fact I have venv caching on?
hmmm could be...
Can you quickly disable the caching and try?
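In case it helps, venv caching is controlled from clearml.conf; a sketch assuming the standard agent.venvs_cache section (commenting out the path disables the cache):
    agent {
        venvs_cache: {
            max_entries: 10
            free_space_threshold_gb: 2.0
            # comment out the path to disable venv caching
            # path: ~/.clearml/venvs-cache
        }
    }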
To be honest, I'm not sure I have a good explanation of why ... (unless in some scenarios an exception was thrown and silently caught, and that caused it)
Hi DefeatedCrab47
You mean by trains-agent, or accumulated over all experiments?
however, setting up the interpreter in PyCharm is different on Mac for some reason, and the video just didn't match what I see
MiniatureCrocodile39 Are you running on a remote machine (i.e. PyCharm + remote SSH)?
FiercePenguin76 in the Task's Execution tab, under "script path", change it to "-m filprofiler run catboost_train.py".
It should work (assuming the "catboost_train.py" is in the working directory).
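To illustrate: since the agent prepends the python interpreter to whatever is in "script path", the effective command should end up roughly as:
    python -m filprofiler run catboost_train.py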
Hi @<1683648242530652160:profile|ApprehensiveSeaturtle9>
I send a request to the endpoint but the model is never unloaded (the GPU memory keeps increasing when I infer with a new model).
They are not unloaded after the request is done. See the discussion here: None
You can however remove the model from the serving session (but I do not think this is what you meant)
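A sketch of removing a model endpoint with the clearml-serving CLI; the service ID and endpoint name are placeholders, and it's worth checking clearml-serving model remove --help for the exact flags:
    clearml-serving --id <service_id> model remove --endpoint "my_model_endpoint"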
I'm assuming you want to run multiple models on a single GPU with not en...
if we look at the host machine we can see a single python process that is actually busy
Only one?! Can you see the other python processes?
The API server by default spins up multiple processes (they all might be busy at the time with a huge flood of requests, but this is still multi-process). Let me check if there is an easy way to set more processes
MysteriousBee56 that is so weird ... last one, I promise 🙂
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo \$(which python3) && echo \$(which trains-agent)"
BoredHedgehog47
is this ( https://clearml.slack.com/archives/CTK20V944/p1665426268897429?thread_ts=1665422655.799449&cid=CTK20V944 ) the same issue (or solution) ?
DilapidatedDucks58
is there any way to post Slack alerts for the frozen experiments?
The latest RC should solve the PyTorch data loader issue, do you want to test it?
pip install clearml==0.17.5rc2
(It would be nice to have all the PyPI releases tagged in GitHub btw)
I wanted to say, we listen ... and point to the tag, but for some reason it was not pushed LOL.
Hi JitteryCoyote63
The easiest is to inherit the ResourceMonitor class and change the default logging rate (you could also disable some of the metrics).
https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/task.py#L565
Then pass the new class to Task.init as auto_resource_monitoring
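A minimal sketch, assuming the ResourceMonitor import path and the report_frequency_sec keyword match your clearml version:
    from clearml import Task
    from clearml.utilities.resource_monitor import ResourceMonitor  # assumed import path

    class SlowResourceMonitor(ResourceMonitor):
        # report resource metrics every 5 minutes instead of the default rate
        def __init__(self, *args, **kwargs):
            kwargs["report_frequency_sec"] = 300.0  # assumed kwarg name
            super(SlowResourceMonitor, self).__init__(*args, **kwargs)

    task = Task.init(
        project_name="examples",
        task_name="custom resource monitoring",
        auto_resource_monitoring=SlowResourceMonitor,
    )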
WickedGoat98
I will try to collect the installation steps in a document and share it to the community once ready
Thank you! This will be awesome!
We're here if you need anything 🙂
EnviousPanda91 notice that when passing these arguments to clearml-agent you are actually passing default args; if you want an additional argument to always be used, set the extra_docker_arguments here:
https://github.com/allegroai/clearml-agent/blob/9eee213683252cd0bd19aae3f9b2c65939d75ac3/docs/clearml.conf#L170
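For example, in clearml.conf (the arguments shown are placeholders):
    agent {
        # these arguments are appended to every docker run the agent launches
        extra_docker_arguments: ["--ipc=host", "--privileged"]
    }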