
And does the code above reproduce the issue/bug? Because obviously this should not happen.
SubstantialElk6 I just realized 3 weeks passed, wow!
So the good news is that we have some new examples:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_functions.py
The bad news is that the documentation was postponed a bit, as we are still massaging the interface (the community is constantly pushing great ideas and use cases, and they are just too good to miss out on :) )...
Or can it also be right after
Task.init()
?
That would work as well :)
@<1523701079223570432:profile|ReassuredOwl55> did you try adding it manually?
./path/to/package
You can also do that from code:
from clearml import Task

# Note: Task.add_requirements must be called before Task.init
Task.add_requirements("./path/to/package")
task = Task.init(...)
Hi @<1523701111020589056:profile|DefiantSpider5>
So there are two answers here, I'll start with the open-source version of both
Is there a way in clear ml to interactively view subsets of images based on a lasso of embedding plots
The ClearML Datasets have no "query" capabilities of the data inside the dataset. That means you can see preview images, statistics and download the datasets, but no query capabilities. On the other hand, there is no limitation on the type and format of me...
How do I best utilize clearml in this scenario such that any coworker of mine is able to reproduce my work with the same pipeline?
Basically this sounds to me like proper software development design (i.e. the class vs. stages).
To make sure anyone can reproduce it, do you mean that anyone can rerun the "pipeline"? If this is the case, just add Task.init (maybe use a specific Task type) and the agents will make sure this is fully reproducible.
If you mean the data itself is stored, the...
send the agent's logs to log management and monitoring service,
These are stored in ELK, which was built to store large amounts of logs. I cannot see any reason why one would want to remove them?
Maybe if there would be a way to change their format, it could also help filtering them from my side.
You mean in the UI?
Retrying (Retry(total=239, connect=240, read=240, redirect=240, status=240)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))': /auth.login
Oh, that makes sense. I'm assuming that on your local machine the certificate is installed, but not on the remote machines / containers.
Add the following to your clearml.conf:
api.verify_certificate: false
[None](https...
Hi @<1730396272990359552:profile|CluelessMouse37>
However, the caching doesn't seem to be working correctly. Despite not changing the configuration, the first step runs every time.
How are you creating the cached component?
is this a standalone script or a git repo link?
These parameters are dictionaries of specific configurations (dict of dict) that are the same but might not be taken into account properly by the caching mechanism.
hmm for the component to be cached (or reuse...
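One thing that can be checked independently of ClearML is whether the dict-of-dict parameters really serialize identically between runs. A minimal sketch in plain Python, not ClearML's actual caching code (`config_cache_key` is a hypothetical helper, and the sample configurations are made up for illustration):

```python
import hashlib
import json


def config_cache_key(config: dict) -> str:
    """Return a stable hash for a (possibly nested) configuration dict."""
    # sort_keys=True makes the serialization independent of key insertion order.
    # default=str is a fallback for values json cannot serialize natively.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical configurations: a and b differ only in key order, c differs in a value.
a = {"model": {"lr": 0.1, "layers": 3}, "data": {"split": 0.2}}
b = {"data": {"split": 0.2}, "model": {"layers": 3, "lr": 0.1}}
c = {"model": {"lr": 0.2, "layers": 3}, "data": {"split": 0.2}}

same_key = config_cache_key(a) == config_cache_key(b)
changed_key = config_cache_key(a) != config_cache_key(c)
```

If two runs that should be identical produce different keys here, something in the configuration (a timestamp, a temp path, an object whose string form changes) is varying between runs, and that would be a reasonable first suspect.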
Click on the "k8s_schedule" queue, then on the right-hand side you should see your Task. Click on it and it will open the Task page. There, click on the "Info" tab and look for "STATUS MESSAGE" and "STATUS REASON". What do you have there?
HandsomeCrow5 Ideas on improvement are always welcome :)
Hi DeliciousBluewhale87
My theory is that the clearml-agent is configured correctly (which means you see it in the clearml-server). The issue (I think) is that the Task itself (running inside the docker) is missing the configuration. The way the agent passes the configuration into the docker is by mapping a temporary configuration file into the docker itself. If the agent is running bare-metal, this is quite straightforward. If the agent is running on k8s (or basically inside a docker) th...
The warning just lets you know that the current process stopped and it is being launched on a remote machine.
What am I missing? Is the agent failing to run the job that you create manually?
(notice that when creating a job manually, there is no "execute_remotely", you just enqueue it, as it is not actually "running locally")
Makes sense?
I'm having another problem now because I am using the OptunaOptimizer.
Hmm let me check a sec
YummyWhale40 from the code snippet, it seems like the argument is passed.
"reuse_last_task_id=True" is the default, and it means that if the previous run of the task did not create any artifacts/models and was executed within the last 72 hours (configurable), the Task will be reset (i.e. all logs cleared) and will be reused in the current run.
Hi JitteryCoyote63, when you run the trains-agent it tells you where it puts the logs; it's a temporary auto-generated filename, usually under /tmp/
Running TRAINS-AGENT daemon in background mode, writing stdout/stderr to /tmp/.trains_agent_daemon_out4uahki3i.txt
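If you missed that line in the console, the files can be located afterwards. A rough sketch, assuming the log files follow the `.trains_agent_daemon_out*.txt` naming seen in the message above (`find_agent_logs` is a hypothetical helper, and other agent versions may use a different prefix):

```python
import glob
import os
import tempfile


def find_agent_logs(directory: str = tempfile.gettempdir()) -> list:
    """Return agent daemon log files found in `directory`, newest first."""
    # The pattern starts with a dot on purpose: glob only matches hidden
    # files when the pattern itself begins with ".".
    pattern = os.path.join(directory, ".trains_agent_daemon_out*.txt")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
```

Calling `find_agent_logs()` with no arguments searches the system temp folder; pass another directory if your temp location is different.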
(I'll make sure it is added to the docstring because apparently it was not there)
Hi JitteryCoyote63
I think that what happens is that the agents are registered under the same name (id). How many agents do you see in the "Workers" tab?
It should move you directly into the queue pages.
Let me double check (working on the community server)
ERROR: Could not install packages due to an EnvironmentError:
[Errno 28] No space left on device
BTW: @<1523703080200179712:profile|NastySeahorse61> this sounds like docker is out of space on the main disk '/var/', where it stores all the images and temp file systems.
This will cause your code to fail, as any runtime change to the container file system will raise this out-of-disk-space error.
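A quick way to confirm this from the host is to check the free space on that disk. A minimal sketch using only the standard library (`free_space_gb` and `has_enough_space` are hypothetical helpers, and the 5 GiB threshold is an arbitrary assumption, not a ClearML or docker requirement):

```python
import shutil


def free_space_gb(path: str = "/var") -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def has_enough_space(path: str = "/var", min_free_gb: float = 5.0) -> bool:
    """Return True when at least `min_free_gb` GiB are free under `path`."""
    return free_space_gb(path) >= min_free_gb
```

For example, `has_enough_space("/var", min_free_gb=10.0)` could be run before enqueuing a job that pulls a large image.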
Verified, you are correct: "." in the label enumeration will break the clone.
I'll make sure this bug is passed to the backend guys to fix. Thanks TenseOstrich47!
Meanwhile, maybe use "_" instead? :)
Ohh, hmm, that is odd, there should not be a limit there. Let me check ....
PompousParrot44 these are the default plotly colors. You can change any of the layout properties with the
https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/trains/logger.py#L600
But I am starting to wonder whether it would be easier to just change sys.path in the scripts that use the sibling libs.
That depends. How would the sibling packages get to a remote machine?
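For reference, the sys.path approach could look roughly like this. It is only a sketch: `add_sibling_to_path` is a hypothetical helper, and it assumes the sibling folder sits one level above the script's folder and lives in the same repository, so that it also exists on the remote machine once the repository is cloned there:

```python
import sys
from pathlib import Path


def add_sibling_to_path(script_file: str, sibling_name: str) -> Path:
    """Prepend <parent of the script's folder>/<sibling_name> to sys.path."""
    sibling = Path(script_file).resolve().parent.parent / sibling_name
    if str(sibling) not in sys.path:
        sys.path.insert(0, str(sibling))
    return sibling
```

In a script this would typically be called as `add_sibling_to_path(__file__, "libs")` before importing from the sibling package. If the sibling packages live outside the repository, this will not help on a remote machine.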
Hi @<1552101458927685632:profile|FreshGoldfish34>
Self-hosted, you mean the open source? If so, then yes, totally free :)
That said, I would recommend keeping the server inside your VPN, just in case, from a security perspective.
If I have an alternative location for the vscode, where should I indicate it in the configuration?
We might need to add support for that, but it should not be a problem to override (e.g. downloadable link like http/s3/ etc.)
Is this something that is doable ?
Hi UnevenDolphin73
Took a long time to figure out that there was a specific Python version with a specific virtualenv that was old ...
NICE!
Then the task requested to use Python 3.7, and that old virtualenv version was broken.
Yes, if the Task is using a specific python version it will first try to find this one (i.e. which python3.7) and then use it to create the new venv.
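The lookup described above can be reproduced by hand when debugging which interpreter would be picked up. A small sketch of the same idea, not the agent's actual code (`find_python` is a hypothetical helper):

```python
import shutil
from typing import Optional


def find_python(version: str) -> Optional[str]:
    """Return the path of `python<version>` on PATH, or None if it is missing."""
    # shutil.which performs the same PATH search as the shell's `which`.
    return shutil.which(f"python{version}")
```

`find_python("3.7")` returns the full path when a python3.7 executable is on PATH and `None` otherwise, which is a quick way to see whether the requested version exists on the agent machine.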
As a result -> Could the agent maybe also output the virtualenv version used ...