
Oh that makes sense.
So now you can also get the models as a dict (basically ClearML lets you access them both as a list, so it is easy to get the last one created, and as a dict, so you can match the filenames).
This one will get the list of models:
print(task.models["output"].keys())
Now you can just pick the best one:
model = task.models["output"]["epoch13-..."]
my_model_file = model.get_local_copy()
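For reference, a minimal sketch of the two access patterns described above (the task ID and the model name below are hypothetical placeholders):

```python
from clearml import Task

# Hypothetical task ID, replace with your own
task = Task.get_task(task_id="<your_task_id>")

output_models = task.models["output"]

# List-style access: the last entry is the most recently created model
last_model = output_models[-1]

# Dict-style access: match a model by its name (name here is hypothetical)
named_model = output_models["epoch13-step4000"]

# Download a local copy of the weights file
weights_path = last_model.get_local_copy()
print(weights_path)
```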
Bummer... that seems like a bit of an oversight tbh.
There is no real solution for that, unless the helm chart "knows" something about the server before spinning it up the first time, which basically means a predefined access key. I do not think we want that 😉
This is very odd ... let me check something
Isn't that risky, not knowing you need a package?
How do you actually install it on the remote machine with the agent?
Hi Guys,
I hear you guys, and I know this is planned, but it has probably been bumped down in priority.
I know the main issue is the "Execution Tab" comparison, the rest is not an issue.
Maybe a quick hack: only compare the first 10 in the Execution tab, and remove the limit on the other tabs? (The main issue with the Execution tab is the git-diff / installed-packages comparison, which is quite taxing on the FE.)
Thoughts ?
Hi EnviousStarfish54
After the pop-up, do you see the plot on the web UI?
I have also tried with type hints and it still broadcasts to string. Very weird...
Type hints are ignored, it's the actual value you pass that is important:
@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    ...

@PipelineDecorator.pipeline(name='custom pipeline logic', project='examples', version='0.0.5')
def executing_pipeline(pickle_url, mock_parameter='mock'):
    da...
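If it helps, here is a minimal sketch of what "the actual value matters" implies when invoking the pipeline above (the URL is a hypothetical placeholder; this assumes the decorated functions shown in the snippet):

```python
# Hypothetical invocation: ClearML records the pipeline parameters based on the
# concrete values passed here, not on the functions' type hints.
if __name__ == '__main__':
    # Run the pipeline logic locally for quick testing
    PipelineDecorator.run_locally()
    # Passing a real string keeps the parameter a string; passing an actual int
    # would keep an int-typed parameter, regardless of any annotation.
    executing_pipeline(pickle_url='https://example.com/data.pkl', mock_parameter='mock')
```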
Is it possible to add a checkbox in the profile settings that would set the maximum limit for comparison?
This feature is becoming more and more relevant.
So we are working on a better UI for it, so that this is not limited (it's actually the UI that is the limit here)
Specifically, you can add custom columns to the experiment table (like accuracy, loss, etc.) and sort based on those (sorting on multiple columns is also supported, just hold the Shift key). This way you can quickly explore ...
I want to be able to compare scalars of more than 10 experiments, otherwise there is no strong need yet
Makes sense. In the next version, not the one that will be released next week but the one after, with reports (shhh, don't tell anyone 🙂), they tell me this is solved 🎊
Retrying (Retry(total=239, connect=240, read=240, redirect=240, status=240)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))': /auth.login
Oh, that makes sense. I'm assuming the certificate is installed on your local machine but not on the remote machines / containers.
Add the following to your clearml.conf:
api.verify_certificate: false
[None](https...
BitterStarfish58 I would suspect the upload was corrupted (I think this explains the discrepancy between the file size logged and the actual file size uploaded).
For example, opening a project or experiment page might take half a minute.
This implies a MongoDB performance issue.
What's the size of the MongoDB?
Yeah we should definitely have get_requirements 🙂
What's the python, torch, clearml version?
Any chance this is reproducible?
What's the full error trace/stack you are getting?
Can you try to debug it to where exactly it fails here?
https://github.com/allegroai/clearml/blob/86586fbf35d6bdfbf96b6ee3e0068eac3e6c0979/clearml/binding/import_bind.py#L48
RoughTiger69 wdyt?
AgitatedTurtle16 from the screenshot, it seems the Task is stuck in the queue, which means there is no agent running to actually run the interactive session.
Basic setup:
A machine running clearml-agent (this is the "remote machine")
A machine running clearml-session (let's call it laptop 🙂)
You need to first start the agent on the "remote machine" (basically call clearml-agent daemon --docker --queue default). Once the agent is running on the remote machine, from your laptop ru...
Yes, or at least credentials and API...
Maybe inside your code you can later copy the model into a fixed location?
This way you have the model in the model repository and a copy in a fixed location (StorageManager can upload to a specific bucket/folder with the same credentials you already have)
Would that work?
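For illustration, a minimal sketch of that idea (the bucket path and local file name below are hypothetical):

```python
from clearml import StorageManager

# Hypothetical paths: copy the model weights file to a fixed, well-known location,
# using the same storage credentials already configured in clearml.conf.
local_weights = "model_checkpoint.pt"  # hypothetical local file
fixed_location = "s3://my-bucket/models/latest/model_checkpoint.pt"  # hypothetical target

uploaded_url = StorageManager.upload_file(
    local_file=local_weights,
    remote_url=fixed_location,
)
print(uploaded_url)
```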
WickedGoat98 Notice this is not the "clearml-agent-services" docker but "clearml-agent" docker image
Also the default docker image is "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
Other than that quite similar :)
Thanks BitterStarfish58 !
ChubbyLouse32 could it be the configuration file is not passed to the agent machine itself?
(Were you able to run anything against this internal server? I mean to connect to it from code, clearml / clearml-agent?)
might it be related to the docker socket not being mounted to the agent daemon running inside a docker container?
Oh yes, if the daemon is running inside a docker container then you need both --privileged and mounting of the docker socket to get it to work.
Hi CleanPigeon16
You need to pass the private repository docker credentials to the aws instance, I would use the custom bash script option of the aws autoscaler to create the docker credentials file.
Hi FiercePenguin76
Artifacts are as you mentioned: you can create as many as you like, but in the end there is no "versioning" on top; it can easily be used this way with name+counter.
Models, on the other hand, do let you create multiple entries with the same name, and the version is implied by order. Wdyt?
So the thing is, clearml automatically detects the last iteration of the previous run; my assumption is you also add it, hence the double shift.
SourOx12 could that be it?
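If that is indeed the cause, one possible workaround (a sketch, assuming you want to manage the iteration offset yourself) is to reset the automatically detected continuation offset:

```python
from clearml import Task

# Assuming the task has already been initialized (e.g. via Task.init) earlier in the script
task = Task.current_task()

# Reset the continuation offset so reported iterations are not shifted
# by the last iteration of the previous run.
task.set_initial_iteration(0)
```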
Is gpu_0_utilization also in % then?
Correct 🙂
I was trying to find what the min and max values for the above metrics are.
Oh, that makes sense. Notice that you can get the values over time, so you can track the usage over the experiment's lifetime (you can of course see it in the Scalars tab of the experiment).
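As a rough sketch of pulling those values programmatically (the task ID is hypothetical, and the metric title/series names are assumptions based on how the resource-monitoring scalars usually appear; check the Scalars tab for the exact names):

```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")  # hypothetical task ID

# All reported scalars, including machine monitoring metrics, returned as
# {title: {series: {"x": [...], "y": [...]}}}
scalars = task.get_reported_scalars()

# Assumed title/series for GPU utilization over the experiment lifetime
gpu = scalars.get(":monitor:gpu", {}).get("gpu_0_utilization", {})
values = gpu.get("y", [0])
print("min:", min(values), "max:", max(values))
```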
Fixing that would make this feature great.
Hmm, I guess that is doable. This is a good point: searching for the GUID is not always trivial (or maybe at least we can put the project/dataset/version in the description).
Correct 🙂
You can spin it in two modes, either venv or docker (notice that even in docker mode it will still clone the code into the docker and install the packages inside the docker, but it also inherits the docker's preinstalled system packages, so the installation process is a lot faster, while you still have the ability to change packages without having to build an entire new docker image).
Hi @<1691620877822595072:profile|FlutteringMouse14>
Yes, Feast has been integrated by at least a couple of users, if I remember correctly.
Basically there are two ways: offline and online feature transformation. For offline, your pipeline is exactly what would be recommended. The main difference is online transformation, where I think Feast is a great start.
WittyOwl57 I can verify the issue reproduces! 🎉 !
And I know what happens: TQDM is sending an "up arrow" key. If you are running inside bash, that looks like a CR (i.e. move the cursor to the beginning of the line), but when running inside other terminals (like PyCharm or the ClearML log) this "arrow key" is just a unicode character to print; it does nothing, and we end up with multiple lines.
Let me see if we can fix it 🙂
Apparently it ignores it and replaces everything...