I'm getting a lot of bizarre errors running without a docker image attached
I think there is a mix-up in terminology here.
ClearML Agent can run in two different modes:
- virtual env mode - where it creates a new venv for every Task executed
- docker mode - where it spins up a docker container as the base environment, then inside the container (in real time) it fetches the code, installs missing python packages, etc. There is no need to build a specific docker container, for example you can use the "python:3.10-bullseye" d...
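For reference, agents are started in docker mode with something like clearml-agent daemon --queue default --docker python:3.10-bullseye, and a Task can also request a specific base image from code. A minimal sketch, assuming a configured clearml setup (project, task and image names are just examples):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker mode demo")
# ask an agent running in docker mode to execute this Task inside a plain
# python base image; the agent clones the repo and pip-installs the
# requirements inside that container
task.set_base_docker("python:3.10-bullseye")
```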
JitteryCoyote63 Not sure how/why the X-Pack feature was on (it is not used by the system), but you can disable it with an environment variable in the docker-compose: xpack.security.enabled=false. Should solve the problem ...
and I've made a script to edit it to our needs as part of the installation process
Thanks Martin!
My pleasure, btw: there is no actual need to configure all the clearml.conf values. It will actually take the defaults from the clearml package itself. This means you only need something like:
` api {
    # server config here
}
sdk.aws.s3 {
    # minio config here
} `
Hi SmallDeer34
Can you see it in TB? And if so, where?
Sorry if it's something trivial. I recently started working with ClearML.
No worries, this has actually more to do with how you work with Dask
The Task ID is the unique id of any Task in the system (task.id will return the UID string)
Can you post a toy Dask code snippet here? I'll explain how to make it compatible with clearml
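For reference, a minimal sketch of what the Task ID is and how it can be used to fetch the same Task later (project and task names are just examples):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="task id demo")
print(task.id)  # the unique Task ID, a UID string

# the same ID can later be used to fetch the Task object from the server
same_task = Task.get_task(task_id=task.id)
print(same_task.name)
```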
We suddenly have a need to set up our logging after every
task.close()
Hmm that gives me a handle on things, any chance it is easily reproducible ?
Seems like someone is sitting in the middle and rerouting the request (maybe both https and port)?!
Basically two options: spin up the clearml-k8s-glue as a k8s service.
This service takes clearml jobs and creates k8s jobs on your cluster.
The second option is to spin up agents inside pods statically; inside the pods the agents work in venv mode.
I know the enterprise edition has more sophisticated k8s integration where the glue also retains the clearml scheduling capabilities.
https://github.com/allegroai/clearml-agent/#kubernetes-integration-optional
@<1687643893996195840:profile|RoundCat60> can you access the web UI over https ?
Hi JitteryCoyote63
So that I could simply do
task._update_requirements(".[train]")
but when I do this, the clearml agent (latest version) does not try to grab the matching cuda version, it only takes the cpu version. Is it a known bug?
The easiest way to go about it is to add: Task.add_requirements("torch", "==1.11.0") before task = Task.init(...). Then it will auto-detect your custom package, and will always add the torch version. The main issue with relying on the package...
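Put together, a minimal sketch of that suggestion (the torch version is just an example; pin whichever build matches your CUDA):

```python
from clearml import Task

# must be called before Task.init() so the requirement is recorded for the agent
Task.add_requirements("torch", "==1.11.0")
task = Task.init(project_name="examples", task_name="pinned torch version")
```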
You mean like for your internal support channel inside your company ?
IrritableJellyfish76 point taken, suggestions on improving the interface ?
The package is just a subdir, by the way. So it should not be in installed packages anyway, right?
Correct. Also, when the agent runs the code it will automatically add the root of the git repository to the PYTHONPATH, so you should be able to load the package.
This is something that we do need if we are going to keep using ClearML Pipelines, and we need it to be reliable and maintainable, so I don't know whether it would be wise to cobble together a lower-level solution that has to be updated each time ClearML changes its serialisation code
Sorry if I was not clear, I do not mean for you to do unstable low-level access. I meant that pipelines are designed to be editable externally; they always deserialize themselves.
The only part that is mi...
MelancholyBeetle72 there is an RC with a fix, check the GitHub issue for details :)
more like testing especially before a pipeline
Hmm yes, that makes sense.
Any chance you can open a github issue on it?
Let me see if I understand: basically, do not limit the clone on execute_remotely, right?
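For context, a minimal sketch of execute_remotely and its clone flag (the queue name is just an example):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# clone=False enqueues this Task itself instead of a cloned copy;
# exit_process=True stops the local run once the Task has been enqueued
task.execute_remotely(queue_name="default", clone=False, exit_process=True)
```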
When did this PipelineDecorator come in? Looks interesting
A few days ago (I think)
It is very cool! Check out the full object proxy interaction on the actual pipeline logic. This might be better for your workflow: https://github.com/allegroai/clearml/blob/c85c05ef6aaca4e...
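For anyone curious, a minimal sketch of a decorator-based pipeline (names and the local-run call are just for illustration; see the linked example for the full object-proxy behaviour):

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def step_one():
    # each component becomes its own Task when the pipeline runs remotely
    return [1, 2, 3]

@PipelineDecorator.pipeline(name="toy pipeline", project="examples", version="0.0.1")
def run_pipeline():
    data = step_one()  # the returned value can be used like a regular object
    print(sum(data))

if __name__ == "__main__":
    # run all steps locally instead of enqueueing them to agents
    PipelineDecorator.run_locally()
    run_pipeline()
```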
HealthyStarfish45 the pycharm plugin is mainly for remote debugging, you can of course use it for local debugging but the value is just to be able to configure your user credentials and trains-server.
In remote debugging, it will make sure the correct git repo/diff are stored alongside the experiment (this is due to the fact that pycharm will not sync the .git folder to the remote machine, so without the plugin Trains will not know the git repo etc.)
Is that helpful ?
task.project is the project ID (not the name); task.get_project_name() will return the project name
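A minimal sketch of the difference (project and task names are just examples):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="project name demo")
print(task.project)             # the project ID, a UID string
print(task.get_project_name())  # the human-readable project name, e.g. "examples"
```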
Manually I was installing the
leap
package through
python -m pip install .
when building the docker container.
NaughtyFish36 what happens if you add /opt/keras-hannd to your "installed packages"? This should translate to "pip install /opt/keras-hannd", which seems like exactly what you want, no?
We should probably make sure it is properly stated in the documentation...
Can you see all the agents in the UI? (That basically means they are configured correctly and can connect to the server.)
Still I wonder if it is normal behavior that clearml exits the experiments with status "completed" and not with failure
Well that depends on the process exit code, if for some reason (not sure why) the process exits with return code 0, it means everything was okay.
I assume this "Detected an exited process, so exiting main" is an internal print of your code; I guess it just leaves the process with exit code 0
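To illustrate the exit-code point, a toy sketch (the error here is simulated):

```python
import sys

try:
    raise RuntimeError("simulated failure")
except RuntimeError as exc:
    print(f"Detected an exited process, so exiting main: {exc}")
    # exiting with 0 here would leave the Task marked "completed";
    # a non-zero exit code is what marks it as "failed"
    sys.exit(1)
```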
If you passed the correct path it should work (if it fails it would have failed right at the beginning).
BTW: I think it is clearml-agent --config-file <file here> daemon ...
UnevenDolphin73 if you have the time to help fix / make it work it will be greatly appreciated
Hi JealousParrot68
spinning the clearml-agent with docker support (i.e. each experiment is running inside its own container):
https://clear.ml/docs/latest/docs/clearml_agent#docker-mode
Basically you can specify a default docker to use (per agent) and a specific docker container to use per Task (configured in the UI under execution at the bottom)
Seems like the server returned a 400 error; verify that you are working with your trains-server and not the demo server :)
actually the issue is that the packages are not being detected
what happens if you do the following? Task.add_requirements("tensorflow") and then task = Task.init(...)
JitteryCoyote63 There is a basic elastic license that should always be there. If for some reason it was deleted/expired then the following command should fix it:
curl -XPOST 'http://localhost:9200/_xpack/license/start_basic'