Hmm, I think the issue is here (the docker command mount): `'-v', '/tmp/.clearml_agent.de0n48pm.cfg:/root/clearml.conf'`
I suspect it failed to create the file on the host and then mount it into the docker container.
PompousParrot44 please try to reply on the thread, so we do not create a mess in the main channel 🙂
What's the "working directory" in the execution section? Do you have package "test" in the installed packages?
Let me know if you managed to get it working, then we can see if we can detect it automatically.
I see.
You can get the offline folder programmatically and then copy the folder content (it's the same as the zip; you can also pass a folder instead of a zip to the import function): `task.get_offline_mode_folder()`
You can also have a soft link to the offline folder (if you are working on a Linux machine): `ln -s myoffline_folder ~/.trains/cache/offline`
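A minimal sketch of the flow (project/task names and the session path are placeholders):
```python
from clearml import Task

# run fully offline: nothing is sent to the server, everything is stored locally
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")

# the local offline session folder (same content as the session zip)
print(task.get_offline_mode_folder())
task.close()

# later, with server access, import the session (a folder or a zip both work)
Task.set_offline(offline_mode=False)
Task.import_offline_session("~/.trains/cache/offline/<session-id>")
```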
PompousParrot44 unfortunately not yet 😞
But the gist is:
MongoDB stores experiment data (e.g. execution parameters, git ref, etc.)
ElasticSearch stores results (e.g. metrics, console logs, debug image links, etc.)
Does that help?
JuicyDog96 Yes please!
Let me check what's the status with the docs repository, and I'll get back to you soon 🙂
Hi DeliciousKoala34
Happened when cloning and running a task on an agent on a different machine. I ...
Sounds like a torch internal issue, can you send the full log of the remote Task?
Hi, I changed it to 1.13.0, but it still threw the same error.
This is odd. Just so we can make the agent better, any chance you can send the Task log?
Hi @<1545216070686609408:profile|EnthusiasticCow4>
Oh dear, I think this argument is not exposed 😞
- You can open a GH issue
- If you want to add a PR, this is very simple:
```python
    include_archived=False,
):
    if not include_archived:
        system_tags = ["__$all", cls.__tag, "__$not", "archived"]
    else:
        system_tags = [cls.__tag]
    ...
    system_tag...
```
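Once exposed, calling it would presumably look something like this (hypothetical until the argument actually lands):
```python
from clearml import Model

# include_archived is the not-yet-exposed argument discussed above (hypothetical)
models = Model.query_models(project_name="examples", include_archived=True)
```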
It talks about referencing an issue.
Yes please, just better visibility 🙏
```python
@PipelineDecorator.component(
    name="my step", return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    # stuff here
```
This seemed to work for me
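For completeness, a minimal pipeline wrapping that component might look like this (names and the local-run call are illustrative):
```python
from clearml import PipelineDecorator

@PipelineDecorator.pipeline(name="my pipeline", project="examples", version="0.1")
def pipeline_logic(pickle_data_url: str):
    # the decorated component above runs as a (cached) pipeline step
    return step_one(pickle_data_url, extra=43)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug locally instead of launching on agents
    pipeline_logic("https://example.com/data.pkl")
```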
repeat it until they are all dead 🙂
But out of curiosity, what's the point of doing a hyperparam search on the value of the loss at the last epoch of the experiment?
The problem is that you might end up with a global min that is really nice, but it was 3 epochs ago, and you have the last checkpoint ...
BTW, the global min and the last min should not be very different if the model converges, wdyt?
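One way around it is to track the running minimum yourself and checkpoint at the same step, so the reported objective and the saved weights always match; a rough sketch (validate() and the metric names are placeholders):
```python
import torch
from clearml import Logger

best_loss = float("inf")
for epoch in range(num_epochs):
    val_loss = validate(model)  # placeholder validation function
    if val_loss < best_loss:
        best_loss = val_loss
        # checkpoint exactly at the global minimum, so the best weights are never lost
        torch.save(model.state_dict(), "best.pt")
    # report both; an HPO can optimize on "best_loss" instead of the last-epoch value
    Logger.current_logger().report_scalar("val", "loss", value=val_loss, iteration=epoch)
    Logger.current_logger().report_scalar("val", "best_loss", value=best_loss, iteration=epoch)
```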
So clearml server already contains an authentication layer (JWT Token), and you do have a full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying is: if you add HTTPS on top of the communication and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc., unfortunately this is not part of the open source, but I know they have it in the scale/ent...
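For reference, the linked page enables web-login by listing fixed users in the apiserver config, along these lines (values are examples):
```
# /opt/clearml/config/apiserver.conf
auth {
    fixed_users {
        enabled: true
        users: [
            { username: "jane", password: "12345678", name: "Jane Doe" }
        ]
    }
}
```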
This week I have met at least two people combining ClearML with other tools (one with Kedro and the other with Luigi)
I would love to hear how/what is the use case 🙂
If I run the pipeline twice, changing only parameters or code of taskB, ...
I'll start at the end, yes you can clone a pipeline in the UI (or from code) and instruct it to reuse previous runs.
Let's take our A+B example: say I have a pipeline P, and it executed A and then B (which relies on A's output...
DS, this way they only need to remember (and I only need to teach them where to find) one id.
Yes, that's the point. This ID is the Model UID (as opposed to the Task ID). The reason I kind of "insist" on it is that the Model ID is built into the system, meaning this is how you register it, as opposed to the Task ID that somehow needs to be hacked/passed externally
TBH the main reason I went with our API is that because of the custom model loading, we need to use the "custom" framew...
I want to run only that sub-dag on all historical data in ad-hoc manner
But wouldn't that be covered by the caching mechanism ?
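For context, step caching is opt-in per step; a sketch with our A+B example (project/task names are illustrative):
```python
from clearml import PipelineController

pipe = PipelineController(name="P", project="examples", version="0.1")
pipe.add_step(name="stepA", base_task_project="examples", base_task_name="A",
              cache_executed_step=True)  # reuse A's previous run when inputs are unchanged
pipe.add_step(name="stepB", base_task_project="examples", base_task_name="B",
              parents=["stepA"], cache_executed_step=True)
pipe.start()
```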
make sure you follow all the steps :
https://clear.ml/docs/latest/docs/deploying_clearml/upgrade_server_linux_mac
(basically make sure you get the latest docker-compose.yml and then pull it):
```bash
# URL per the upgrade guide linked above
curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
docker-compose -f /opt/clearml/docker-compose.yml pull
docker-compose -f /opt/clearml/docker-compose.yml up -d
```
while I'm looking to upload local weights
Oh, so this is not "importing an uploaded (existing) model" but manually creating a Model.
The easiest way to do that is actually to create a Task for the model upload, because the model itself will be uploaded to a unique destination path, and this is built on top of the Task.
Does that make sense?
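A minimal sketch of that (names and the weights path are placeholders):
```python
from clearml import Task, OutputModel

# a Task dedicated to registering the model; the upload destination is derived from it
task = Task.init(project_name="examples", task_name="register model",
                 output_uri="s3://my-bucket/models")  # example upload destination
model = OutputModel(task=task, name="my model", framework="PyTorch")
model.update_weights(weights_filename="/path/to/local_weights.pt")
print(model.id)  # the Model UID you can share and reference later
```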
Which one of those? the 3d ball dots or the 3d face mesh?
My apologies, you are correct, 1.8.1rc0 🙂
`AttributeError: 'PosixPath' object has no attribute 'loc'`
SarcasticSquirrel56 I'm assuming the artifact is pandas and you forgot to either import it before or add it as a requirement for the Task 🙂
This causes the artifact `.get()` method to revert to returning the local path to the artifact, instead of actually de-serializing it.
(We should print a warning though, I'll make sure we do 🙂 )
EDIT: basically ClearML failed to realize you also need pandas, because it was never imported ...
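In other words, on the consuming side make sure pandas is importable (or listed as a requirement) before calling .get(); a sketch (the task id and artifact name are placeholders):
```python
import pandas as pd  # must be importable, otherwise .get() falls back to the local file path
from clearml import Task

task = Task.get_task(task_id="<task-id>")
df = task.artifacts["data_frame"].get()  # de-serializes back into a DataFrame
print(df.head())
```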
Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
Basically you should have the docker service log in before running the agent; the agent then uses docker to run the image from ECR.
Make sense?
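i.e. something along these lines on the agent machine (region/account/queue are placeholders):
```bash
# authenticate docker against ECR before starting the agent
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# the agent then pulls/runs the ECR image via docker as usual
clearml-agent daemon --queue default --docker
```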
I just set the git credentials in the `clearml.conf` and it works out of the box
Git has issues with passing the user/token from the main repo to the submodules, hence my surprise that it is working out-of-the-box.
Do notice that if you are using an ssh-key this is a non-issue.
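For reference, the relevant clearml.conf entries are along these lines (values are placeholders):
```
agent {
    git_user: "my-git-user"
    git_pass: "my-personal-access-token"
}
```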
Nope, no `.netrc` defined anywhere, ...
If this is the case, can you try to add the following to your "extra_vm_bash_script":
```bash
echo machine example.com > ~/.netrc && \
echo login <git-user> >> ~/.netrc && \
echo password <git-token> >> ~/.netrc  # <git-user>/<git-token> are placeholders, standard .netrc format
```