Hi @<1533620191232004096:profile|NuttyLobster9>
I, but no system stats. ,,,
If the job is too short (I think 30 seconds), it doesn't have enough time to collect stats (basically it collects them over a 30 sec window, but the task ends before it sends them)
does that make sense ?
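To make the timing argument concrete, here is a minimal sketch. It assumes the 30-second figure guessed above and that stats are only sent once a full window has elapsed; `reports_sent` is a hypothetical helper for illustration, not part of ClearML.

```python
def reports_sent(job_seconds: float, window_seconds: float = 30.0) -> int:
    """Count the complete reporting windows that fit in the job's runtime.

    Assumption (from the explanation above): stats are collected over a
    fixed window and only sent when that window completes.
    """
    if job_seconds < 0 or window_seconds <= 0:
        raise ValueError("job_seconds must be >= 0 and window_seconds > 0")
    return int(job_seconds // window_seconds)
```

Under these assumptions a 10-second job gets zero reports out, while a 95-second job gets three.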
In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?
Yes exactly! it should be very easy
Just inherit from RandomSearch and override create_job
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
Hi @<1597762318140182528:profile|EnchantingPenguin77>
, but it seems like clearml always creates a virtual environment
Yes that's correct, but the new venv inside the container inherits from the system packages (so if nothing changes it does nothing)
Is there a way to have clearml-task automatically use the custom virtual environment that is already activated in my docker, and run the scripts there?
You can but the "correct" way to work with python and co...
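The "venv inherits from the system packages" point can be illustrated with the standard library alone. This is only a sketch of the general mechanism (a venv created with system site-packages enabled), not the agent's actual code; `create_inheriting_venv` is a made-up helper name.

```python
import tempfile
import venv
from pathlib import Path


def create_inheriting_venv(target_dir: str) -> Path:
    """Create a venv that can see the interpreter's system site-packages.

    Returns the path to the venv's pyvenv.cfg so the setting can be inspected.
    """
    # with_pip=False keeps creation fast and avoids needing ensurepip
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(target_dir)
    return Path(target_dir) / "pyvenv.cfg"
```

A venv created this way records `include-system-site-packages = true` in its `pyvenv.cfg`, so packages already installed in the container stay importable and only missing ones need installing.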
Maybe WackyRabbit7 is a better approach as you will get a new object (instead of the runtime copy that is being used)
It's the safest way to run multiple processes and make sure they are cleaned afterwards ...
HugeArcticwolf77 changing the color is definitely a feature we will have in the next version. Right now I think you cannot; it is randomly chosen based on the title/series, and I think your example is a great failure case of that randomness.
I can't seem to figure out what the names should be from the pytorch example - where did INPUT__0 come from
This is actually the layer name in the model:
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666ee85fd126fcb49e2e/examples/pytorch/train_pytorch_mnist.py#L24
Which is just the default name PyTorch gives the layer
https://discuss.pytorch.org/t/how-to-get-layer-names-in-a-network/134238
it appears I need to convert it into TorchScript?
Yes, this ...
I see, that means xarray is not an actual package but a folder added to the python path.
This explains why Task.add_requirements fails, as it is supposed to add python packages to the equivalent of "requirements.txt" ...
Is the folder part of the git repository? How would you pass it to the remote machine the clearml-agent is running on?
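One way to see the difference between "importable because a folder is on the python path" and "installed as a package" is sketched below, using only the standard library. The helper names are hypothetical, and note that a distribution name can differ from its import name.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name: str) -> bool:
    """True if a top-level module with this name can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None


def is_installed_distribution(dist_name: str) -> bool:
    """True if pip-style metadata exists for this distribution name."""
    try:
        metadata.distribution(dist_name)
        return True
    except metadata.PackageNotFoundError:
        return False
```

A folder added to the python path would typically come back as importable but not as an installed distribution, which is consistent with it not belonging in a requirements list.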
Could it be that clone has to be False? (I assume the reasoning is the cloning feature)
WickedGoat98 did you set up a machine with trains-agent pulling from the "default" queue?
Nope - confirmed to be running on the OS's Python environment,
okay so bare metal root is definitely not recommended.
I'm not sure how/why it gets stuck though
Any chance you can run the agent as non-root?
Also maybe preferred in docker mode, so it is easier for you to control the environment of the Task
Hi @<1798887585121046528:profile|WobblyFrog79>
. When I execute the pipeline remotely in Kubernetes, those components
Two things. One, make sure you specify the repo you need the components from in the decorator function. What will happen is that the repo will be cloned into the container running on k8s, and then your script (i.e. the pipeline step) will run from inside the repo root.
[None](https://github.com/clearml/clearml/blob/9c93aa9e538075c848647dcd88e3e12bec051b5f/clearml/automation/con...
yes i can communicate with the server, i managed to put tasks in the queue and retrieve them as well as running tasks with metrics reporting
Through the UI or python code ?
We are working hard on release 1.7; once that is out we will push an RC for review (I hope)
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
I do expect it to `pip` install though, which doesn't need root access I think
Correct, it is installed in a venv (exactly for that).
It will not fail if the apt-get fails (only warnings)
Let me know if it worked
AgitatedTurtle16 could you check with the latest clearml RC (I remember a similar issue was fixed).
`pip install clearml==0.17.5rc3`
Then run again
`clearml-task ...`
SmugOx94
after having installed `numpy==1.16` in the first case or `numpy==1.19` in the second case. Is it correct?
Correct
the reason is simply that I'd like to setup an MLOps system where
I see the rationale here (obviously one would have to maintain their requirements.txt)
The current way trains-agent works is that if there is a list of "installed packages" it will use it, and if it is empty it will default to the requirements.txt
We cou...
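The fallback rule described above can be sketched like this. It is only an illustration of the stated logic ("installed packages" first, requirements.txt if that list is empty), not the agent's implementation; `resolve_requirements` is a hypothetical name.

```python
from typing import List


def resolve_requirements(installed_packages: List[str],
                         requirements_txt: List[str]) -> List[str]:
    """Prefer the task's recorded package list; fall back to requirements.txt."""
    recorded = [line.strip() for line in installed_packages if line.strip()]
    if recorded:
        return recorded
    # Empty "installed packages": use requirements.txt, skipping blanks/comments
    return [
        line.strip()
        for line in requirements_txt
        if line.strip() and not line.strip().startswith("#")
    ]
```

So a task with `numpy==1.19` recorded would get that version even if requirements.txt pins `numpy==1.16`; requirements.txt only matters once the recorded list is empty.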
I would do something like:
```
from clearml import Logger

def forward(...):
    self.iteration += 1
    weights = self.compute_weights(...)
    m = (weights * (target - preds)).mean()
    # report the value as a scalar so it shows up in the task's scalar plots
    Logger.current_logger().report_scalar(
        title="debug", series="mean_weight", value=m, iteration=self.iteration
    )
    return m
```
agent.cuda_driver_version = ...
agent.cuda_runtime_version = ...
Interesting idea! (I assume for reporting only, not configuration)
... The agent mentioned used output from nvcc (2) ...
The dependencies I shared are not how the agent works, but how Nvidia CUDA works
Regarding the cuda check with nvcc, I'm not saying this is a perfect solution, I just mentioned that this is how it is currently done.
I'm actually not sure if there is an easy way to get it from nvid...
Hi @<1562973095227035648:profile|ThoughtfulOctopus83>
The host should be just the host name, no https prefix, I'm assuming that's the issue
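If the value comes from a copied URL, something like the following sketch can strip it down to the bare host name. Which setting "host" refers to is not stated here, and `bare_hostname` is a hypothetical helper built on the standard library's `urlsplit`.

```python
from urllib.parse import urlsplit


def bare_hostname(value: str) -> str:
    """Return only the host name, dropping any scheme, port, or path."""
    value = value.strip()
    # urlsplit only treats the text as a network location when it has a
    # scheme separator or starts with "//", so add "//" for bare hosts
    parts = urlsplit(value if "://" in value else "//" + value)
    return parts.hostname or ""
```

For example, `bare_hostname("https://example.com:8080/path")` and `bare_hostname("example.com")` both give `example.com`.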
JitteryCoyote63 if this is simulating an agent, the assumption is that the Task was already created, hence the task ID.
If I am working with Task.set_offline(True)
How would the two combine? I mean offline is by definition not executed by an agent, what am I missing?
I have to leave, I'll be back online in a couple of hours.
Meanwhile, check that the ports are correct (just curl each port and see if you get an answer). If everything is okay, try running the text example again.
Fixed in pip install clearml==1.8.1rc0
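If curl is not handy, a rough equivalent of the port check can be done from Python. This sketch only tests that a TCP connection can be opened (it does not inspect an HTTP response the way curl would); the helper names are made up, and the host and ports to pass in depend on your own setup.

```python
import socket
from typing import Dict, Iterable


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable, or name resolution failed
        return False


def check_ports(host: str, ports: Iterable[int]) -> Dict[int, bool]:
    """Map each port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `check_ports("localhost", (8080, 8008, 8081))` would cover the usual ClearML web, API, and file server ports, assuming a default deployment.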
CleanPigeon16 , just making sure, docker is installed and configured on the host machine (i.e. Azure machine)?
TBH ClearML doesn't seem to be picking the model up so I need to do it manually
This is odd, clearml will pick up framework-level serialization, but not just any pickle call
Why do I need an output_uri for the model saving? The dataset API can figure this out on its own
So that it knows where to upload it. If you set it to True, this will be the default files server; you can also set it to a shared file system, S3, GCP storage, etc.
If no value is passed, it will just log th...
Apparently the error comes when I try to access from `get_model_and_features` the pipeline component `load_model`. If it is not set as pipeline component and only as helper function (provided it is declared before the components that call it; I already understood that and fixed it, different from the code I sent above).
ShallowGoldfish8 so now I'm a bit confused, are you saying that now it works as expected ?
Hi DizzyPelican17
I'd like to configure requirements file, docker image, docker command for my pipeline controller, but it seems I cannot set it up. Am I missing something?
The decorator itself accepts those as arguments:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#pipelinedecoratorcomponent
https://github.com/allegroai/clearml/blob/90f30e8d9a5ca9a1afa6b2e5ffccb96b0afe9c78/examples/pipeline/pipeline_from_decorator.py#L8
I'd like to setup up...