A quick fix will be:
```python
import os
import dotenv

# Load the .env file before importing clearml, so the credentials are picked up
# (expand "~" explicitly, load_dotenv does not do that for us)
dotenv.load_dotenv(os.path.expanduser('~/.env'))

from clearml import Task  # Now we can load it.
import argparse

if __name__ == "__main__":
    # do stuff
    pass
```
wdyt?
maybe this can cause the issue?
Not likely.
In the original pipeline (the one executed from the Pycharm) do you see the "Pipeline" section under Configuration -> "Config objects" in the UI?
Is this like a local minio?
What do you have under the sdk/aws/s3 section?
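For reference, a hedged sketch of what an sdk.aws.s3 entry for a local MinIO usually looks like in clearml.conf (host, keys and flags below are placeholders):
```
sdk {
    aws {
        s3 {
            # default credentials (can stay empty if only the per-host entry is used)
            key: ""
            secret: ""
            region: ""

            credentials: [
                {
                    # placeholder values - point this at your MinIO endpoint
                    host: "my-minio-host:9000"
                    key: "minio-access-key"
                    secret: "minio-secret-key"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```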
There are also "completed", "aborted", "queued".
Archived is actually a tag (a system tag, not a user tag). There is a "state machine" for moving from one state to another. The special case is "published", which we probably should have called "locked". The idea is that if a Task/Model is published, you cannot reset it (and even deleting requires the force flag).
I would use additional user tags (or even system-tags) to mark "deployed" state, wdyt?
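For example, a minimal sketch of tagging via the SDK (the task ID, project name and the "deployed" tag are placeholders, and filtering by tags= assumes a reasonably recent clearml version):
```python
from clearml import Task

# Mark a finished experiment as deployed using a user tag
task = Task.get_task(task_id="<task-id>")  # placeholder ID
task.add_tags(["deployed"])

# Later, list everything in a project that carries the tag
deployed = Task.get_tasks(project_name="my-project", tags=["deployed"])
```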
Ok, but when `nvcc` is not available, the agent uses the output from `nvidia-smi`, right? On one of my machines, `nvcc` is not installed, and in the experiment logs of the agent running there, `agent.cuda =` shows the version reported by `nvidia-smi`.
Already added to the next agent's version
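In the meantime, I believe you can pin the version the agent assumes in clearml.conf; a rough sketch (version numbers are placeholders):
```
agent {
    # force the CUDA / cuDNN version the agent uses when resolving packages
    # (placeholder values - match them to your actual driver / toolkit)
    cuda_version: "11.2"
    cudnn_version: "8.0"
}
```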
Hi GreasyPenguin14
This is what I did, but I could not reproduce the hang, how is this different from your code?
```python
from multiprocessing import Process

import numpy as np
from matplotlib import pyplot as plt
from clearml import Task, StorageManager


class MyProcess(Process):
    def run(self):
        # in another process
        global logger
        # Create a plot
        N = 50
        x = np.random.rand(N)
        y = np.random.rand(N)
        colors = np.random.rand(N)
        area = ...
```
The only workaround I can think of is: `series = series + 'IoU>X'`
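A minimal sketch of what I mean, assuming this is the series argument of Logger.report_scalar (the title, values and the 'IoU>X' suffix are illustrative):
```python
from clearml import Logger

logger = Logger.current_logger()

series = "val"
# workaround: bake the qualifier into the series name itself
series = series + 'IoU>X'
logger.report_scalar(title="metrics", series=series, value=0.73, iteration=10)
```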
It doesn't look that bad
Hi SoggyCow20
How did you configure the clearml.conf? See here an example:
None
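In case it helps, a hedged sketch of the basic api section in clearml.conf (server addresses and credentials are placeholders for your own):
```
api {
    # placeholder endpoints - replace with your own server addresses
    web_server: "http://localhost:8080"
    api_server: "http://localhost:8008"
    files_server: "http://localhost:8081"

    credentials {
        "access_key" = "<your-access-key>"
        "secret_key" = "<your-secret-key>"
    }
}
```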
ElegantCoyote26 could you upgrade the docker-compose?
Okay this is indeed reported in the UI, but the trains-agent is running the experiment, and seems to be failing to clone the repository in question.
Seems like a "https" error, git is actually failing to clone the repository error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.
Can you manually run the clone command on that machine ? I would guess there is some kind of firewall sitting in the middle of the https connection, and that is causing the git to ...
Sure thing, thanks FlutteringWorm14 !
we concluded that we don't want to run it through ClearML after all, so we ran it standalone
out of curiosity, what was the conclusion and why?
Hi EnviousStarfish54
Verified with the frontend / backend guys.
The backend allows searching for "all" tags, and the frontend will add a toggle button in the UI to select or/all for the selected tags.
Should be part of the next release
Hi DeliciousBluewhale87
When you say "workflow orchestration", do you mean like a pipeline automation ?
That makes total sense, this is exactly an OS scenario for signal 9
Do you think such a feature exists in ClearML?
Currently this is "fixed" for iterations (which is actually just a monotonic integer value) or the timestamp.
But I cannot see any reason why we could not allow users to control the x-axis title, and to be able to set it in code, I'm assuming this is what you have in mind?
I'm guessing the extra index URL can be a URL to the github repo of interest?
The extra index URL is exactly what you would be passing to pip install, meaning it has to comply with the PyPI (package repository) API.
Make sense?
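For reference, a hedged sketch of passing it to the agent through clearml.conf (the URL is a placeholder for your own PyPI-compatible index):
```
agent {
    package_manager {
        # extra PyPI-compatible index handed to pip by the agent
        extra_index_url: ["https://my-private-pypi.example.com/simple"]
    }
}
```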
I think task.init flag would be great!
Hi RoundCat60
anyone with access to the server
Is that a thing? If you have access to the server, I'm not sure how "protected" you are even if using a key ring...
(unfortunately I do not think we support anything else, but what did you have in mind?)
Yes, but only with git clone.
It is not stored on ClearML; this way you can work with the experiment manager without explicitly giving away all your code.
GrievingTurkey78 MagnificentSeaurchin79 do you guys want to start a PR branch we can all work on?
Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init
A new experiment is created on the server; it automatically stores the git repo link, commit ID, and the local uncommitted changes, all on the experiment entry in the server.
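A minimal sketch of that first step (project and task names are placeholders):
```python
from clearml import Task

# Running this locally registers the experiment on the server and records
# the git repo link, commit ID and uncommitted diff automatically
task = Task.init(project_name="examples", task_name="my experiment")

# ... the rest of the training code ...
```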
Now assume the trains-agent is running on a different machine (which is always the case even if it is actually on the same machine).
The trains-agent will create a new virtual-environmen...
SmarmySeaurchin8 I might be missing something in your description. The way the pipeline works,
the Tasks in the DAG are pre-executed (either with "execute_remotely" or actually fully executed once).
The DAG nodes themselves are executed on the trains-agent, which means they reproduce the code / env for every cloned Task in the DAG (not on the original Tasks).
WDYT?
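For context, a rough sketch of the kind of DAG I mean, using the PipelineController interface (project, task and queue names are placeholders):
```python
from clearml import PipelineController

# Each step clones a pre-executed Task and runs the clone on an agent
pipe = PipelineController(name="my pipeline", project="examples", version="1.0")

pipe.add_step(
    name="stage_data",
    base_task_project="examples",
    base_task_name="data preparation",
)
pipe.add_step(
    name="stage_train",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="model training",
)

pipe.start(queue="default")
```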
Hi JollyChimpanzee19
What are the versions (clearml, TF, PT)? Also, could you add one more line from the stack trace (i.e. which call triggered the exception)?
What's the "working directory" ?
What's the trains-agent version?
(yes this should have worked, as long as the package "test" is there)
If the only issue is this line: `task.execute_remotely(..., exit_process=True)`
It has to finish the static analysis of the entire repository (which usually happens in the background, but now we have to wait for it). If the repo is large this could actually take 20 sec (depending on the CPU/drive of the machine itself).
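For completeness, a minimal sketch of how that call is typically placed (the queue name is a placeholder):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# Everything above runs locally; this call enqueues the task and, with
# exit_process=True, terminates the local process once the repository
# analysis is finished.
task.execute_remotely(queue_name="default", exit_process=True)

# Code below this point only runs on the agent that pulls the task
```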
Hmm are you running the clearml-agent on this machine? (This is the orchestration module, it will spin up the Tasks and the docker containers on the GPUs)
RobustRat47 are you saying updating the nvidia drivers solved the issue ?