Can you send the full log? This is odd; by default the agent will use the Python executable it (the agent) is running with.
Regardless, you can specify the python executable to be used here:
https://github.com/allegroai/clearml-agent/blob/bd411a19843fbb1e063b131e830a4515233bdf04/docs/clearml.conf#L44
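For reference, the relevant key sits in the agent section of clearml.conf; a sketch (the interpreter path is a placeholder):
```
agent {
    # force the agent to use this python interpreter
    # instead of the one the agent itself is running with
    python_binary: "/usr/bin/python3.8"
}
```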
What's the exact error you are getting?
(Maybe this is a permissions error on the cache folder; what folders is it using? You can see them in the configuration as well.)
BattyLion34 let me see if I understand.
The same base_task_id, when cloned in the UI and enqueued on the same queue as the pipeline, will work, but when the pipeline runs the same Task it fails?!
Could it be that you enqueue them on different queues?
Meanwhile you can just sleep for 24 hours and put it all on the services queue. It should work 🙂
Example here:
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
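Something along these lines, a minimal sketch (project/task names and the clone/enqueue logic are placeholders):
```python
from time import sleep
from clearml import Task

# register this script as a service task (names are placeholders)
task = Task.init(
    project_name="DevOps",
    task_name="daily trigger",
    task_type=Task.TaskTypes.service,
)

while True:
    # ... clone the base task and enqueue it here ...
    # then sleep for 24 hours until the next run
    sleep(60 * 60 * 24)
```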
Hmm, I think everything is generated inside the C++ library code, and Python is just an external interface. That means there is no way to collect the metrics as they are created (i.e. inside the C++ code), which means the only way to collect them is to actively analyze/read the tfrecord files created by catboost 😞
Is there Python code that does that (reads the tfrecords it creates)?
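If those tfrecords turn out to be TensorBoard-style event files (an assumption), a sketch like this could read them:
```python
import tensorflow as tf

# iterate over the records in an event file (path is a placeholder)
for event in tf.compat.v1.train.summary_iterator("./train/events.out.tfevents.0"):
    for value in event.summary.value:
        # keep only scalar summaries
        if value.HasField("simple_value"):
            print(event.step, value.tag, value.simple_value)
```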
I think the main issue is that for some reason the running container changed one of the files inside the temp folder. Then the host machine is "stuck" with a file that the root user owned/changed, and now it cannot reuse / delete the temp folder.
I think the fix is to make sure the container deletes the temp folder when it is done.
Retrying (Retry(total=239, connect=240, read=240, redirect=240, status=240)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))': /auth.login
Oh, that makes sense. I'm assuming the certificate is installed on your local machine but not on the remote machines / containers.
Add the following to your clearml.conf:
api.verify_certificate: false
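i.e. in clearml.conf section form, a sketch (adjust to your file's layout):
```
api {
    # skip SSL certificate verification
    # (only do this if you trust the server)
    verify_certificate: false
}
```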
This is odd. I was running the example code from:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
It is stored inside a repo, but the steps that are created (i.e. checking the Task that is created) do not have any repo linked to them.
What's the difference?
There is no way to create an artifact/model/dataset without a task, right?
Models are an entity of their own, and you can actually create one without a Task.
(Just for my own interest: how much does the enterprise version diverge from the open source version? Is it just extended, or are there core changes to the enterprise version?)
It adds a few security layers on top, and adds a few features that are just not part of the open source (RBAC, hyper-datasets, advanced scheduling, cu...
That should have worked. Do you want to send the log?
Hi UnsightlySeagull42
Basically you can get the agent to always add additional arguments to the docker run command, such as -v for mounting:
https://github.com/allegroai/clearml-agent/blob/948fc4c6ce1ecf33a74619ad570d69b8188f6db9/docs/clearml.conf#L133
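For example, something like this in the agent's clearml.conf (the mount path is a placeholder):
```
agent {
    # extra arguments appended to every "docker run" the agent launches,
    # e.g. mounting a host folder into the container
    extra_docker_arguments: ["-v", "/host/data:/mnt/data"]
}
```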
Let's try:
` echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && b...
Is the fact that clearml-agent needs to be installed from the system python mentioned anywhere in the docs? If not, I suggest it gets added.
You are right, I will check and fix if not 🙂
Thank you so much for helping.
My pleasure
Is there code that can reproduce it?
Using the dataset.create command and the subsequent add_files and upload commands, I can see the upload action as an experiment, but the data is not seen in the Datasets webpage.
ScantCrab97 it might be that you need the latest clearml package installed on the client side (as well as the new server with the UI).
What is your clearml package version?
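For reference, a minimal sketch of that flow (names/paths are placeholders; note the final finalize() call, which closes the dataset version):
```python
from clearml import Dataset

# create a new dataset version (names are placeholders)
ds = Dataset.create(dataset_name="my_dataset", dataset_project="datasets")
ds.add_files(path="./data")  # register the local files
ds.upload()                  # upload the file contents
ds.finalize()                # close the version so it is marked completed
```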
Hi @<1547390415320125440:profile|SilkySparrow85>
Because it is trying to send a debug sample to the fileserver!
Yes, you should always configure the "files server" to point to your minio S3, basically in clearml.conf (address/bucket are placeholders):
files_server: "s3://<minio-address>:<port>/<bucket>"
But do not forget to also configure the credentials here:
[None](https://github.com/allegroai/clearml/blob/40c6db9d95016382c721546d42...
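i.e. something along these lines (host/key/secret are placeholders; secure: false assumes minio over plain http):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "my-minio:9000"    # placeholder
                    key: "minio-access-key"  # placeholder
                    secret: "minio-secret"   # placeholder
                    multipart: false
                    secure: false            # assuming http, not https
                }
            ]
        }
    }
}
```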
Hi IrritableGiraffe81
Can you share a code snippet ?
Generally I would try `task = Task.init(..., auto_connect_frameworks={'pytorch': False, 'tensorflow': False})`
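i.e. as a full call, a sketch (project/task names are placeholders):
```python
from clearml import Task

# disable automatic pytorch/tensorflow framework binding
task = Task.init(
    project_name="examples",     # placeholder
    task_name="manual logging",  # placeholder
    auto_connect_frameworks={"pytorch": False, "tensorflow": False},
)
```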
I'm kind of at a point where I don't know a lot of what to even search for.
We feel you 💗. Yes, there still isn't a very good source of information on where to get started...
This is because the entire field is constantly changing and evolving, and one solution will usually only apply to a specific use case...
I would start with the MLOps community Slack channel, and YouTube talks (specifically those by companies describing how they built their own internal infrastructure, i...
Hi AverageBee39
It seems the JSON is corrupted, could that be?
FlatStarfish45
In the parent task, the libs appear installed.
What do you mean by "parent Task"? Is this the base task we are optimizing (i.e. the experiment / model we are optimizing)?
Or is it the "Optimization Task" itself?
PompousBeetle71 is this an ArgParser argument or a connected dictionary?
LazyLeopard18 you can point the artifact directly to your azure object storage and have StorageManager download and cache it for you:
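A minimal sketch (the azure URL is a placeholder):
```python
from clearml import StorageManager

# download (and locally cache) an object directly from azure blob storage
local_path = StorageManager.get_local_copy(
    remote_url="azure://my-account/my-container/path/to/artifact.pkl"
)
```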
Another option is that the download fails (i.e. missing credentials on the client side, i.e. in clearml.conf).
I think we should open a GitHub issue and get some more feedback; maybe we should just add support on the backend side?
Hmm, I still wonder what the "correct" answer is for most people; is an empty string in argparse redundant anyhow? Will anyone ever use it?
Hi MagnificentPig49, unfortunately it's only in the trains-server docker; we are working on making it "presentable" and uploading it to its own repo.
It's written in Angular (v8, I think). Do you want to help out? It will definitely incentivize the guys to tidy up the code and upload it :)