Sorry for pinging you on this old thread.
...
And what was the optimizer? Adam? RMSProp?
Sorry, missed it...
I would actually use the HPO to test various setups (it uses Optuna under the hood, so really SOTA Hyperband/Bayesian optimization on top of it)
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
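A minimal sketch of wiring it up (project/task names, parameter ranges and the base task ID below are placeholders):
```python
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    UniformParameterRange,
)
from clearml.automation.optuna import OptimizerOptuna

# the optimizer run itself is tracked as a Task
task = Task.init(project_name="examples", task_name="HPO demo",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # the experiment to clone & optimize
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("General/batch_size", values=[16, 32, 64]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,  # Optuna under the hood
    max_number_of_concurrent_tasks=2,
    total_max_jobs=20,
)
optimizer.start()  # launches the clones via agents
optimizer.wait()
optimizer.stop()
```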
You need to set CLEARML_DEFAULT_BASE_SERVE_URL so it knows how to access itself.
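For example (the exact host/port/path are deployment-specific, this is just a placeholder):
```
export CLEARML_DEFAULT_BASE_SERVE_URL="http://<serving-host>:8080/serve"
```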
Looks like at the end of the day we removed proxy_set_header Host $host; and used the FQDN for the proxy_pass line.
And did that solve the issue?
Hi RoughTiger69
I like the direction this is taking, let me add some more complexity.
My thinking is that if we have “input datasets”, I'd also like to be able to clone the Task and automagically change them (without the need to export the dataset_id as an argument). Basically I'm thinking:
train = Dataset.get('aabbcc1', name='train')
valid = Dataset.get('aabbcc2', name='validation')
custom = Dataset.get('aabbcc3', name='custom')
Then you end up with HyperParameter Section: "Input Datas...
You’ll just need the user to name them as part of loading them in the code (in case they are loading multiple datasets/models).
Exactly! (and yes UI visualization is coming 🙂 )
This is odd. Can you send the full Task log? (remove any password/user/repo details that you think are sensitive)
Hmm, maybe we should add a test once the download is done, comparing the expected file size and the actual file size, and if they are different we should redownload?
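Something along these lines (a rough sketch, not the actual clearml internals, assuming the server reports Content-Length):
```python
import os
import requests

def download_with_size_check(url, dest, retries=3):
    # re-download when the size on disk does not match
    # the Content-Length reported by the server
    for _ in range(retries):
        resp = requests.get(url, stream=True)
        expected = int(resp.headers.get("Content-Length", -1))
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
        if expected < 0 or os.path.getsize(dest) == expected:
            return dest
    raise RuntimeError("size mismatch after {} attempts: {}".format(retries, url))
```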
EnviousStarfish54 we just fixed an issue that relates to "installed packages" on Windows.
The RC is due to be released in the upcoming days, I'll keep you posted.
LudicrousParrot69 this is an implementation issue; this entire page is based on "task comparison", and a single Task means a totally different interface for querying the data 🙂
Just so I understand:
the scheduler executes main every 60 sec,
main spins up X sub-processes,
and each subprocess needs to report scalars?
However, there is still a delay of approximately 2 minutes between the completion of setup,
Where is that delay in the log?
(btw: it seems your container is missing clearml-agent & git, installing those might add some time)
(This is why we recommend using pip, because it is stable and clearml-agent takes care of pytorch/cuda versions)
os.system
Yes, that's the culprit. It actually runs a new process, and clearml assumes that there are no other scripts in the repository being used, so it does not analyze them.
A few options:
1. Manually add the missing requirement with Task.add_requirements('package_name') (make sure you call it before Task.init)
2. Import the second script from the first script. This will tell clearml to analyze it as well.
3. Force clearml to analyze the whole repository: https://g...
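For options 1 and 2, something like this (the package and module names are just placeholders):
```python
from clearml import Task

# Option 1: declare the requirement the analysis missed
# (must be called before Task.init)
Task.add_requirements("some_package")

task = Task.init(project_name="examples", task_name="os.system demo")

# Option 2: importing the second script (instead of only
# launching it via os.system) lets clearml analyze it too
import second_script  # noqa: F401
```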
Hmmm, yes we should definitely add --debug (if you can, please add a GitHub issue so it is not forgotten).
FiercePenguin76 Specifically, are you able to ssh manually to <external_address>:<external_ssh_port>?
Or can it also be right after Task.init()?
That would work as well 🙂
Hi @<1541954607595393024:profile|BattyCrocodile47>
But the files API is still open to the world, right?
No, of course not 🙂 (i.e. the API is authenticated with a JWT header, which is why you need to generate the secret/key in the UI)
That said, the login process itself is user/pass stored on the server, but other than that the web/api are secured. The file server on the other hand is plain HTTP storage and does not verify the connection like the API does. So if you are going the self-ho...
So this should be easier to implement, and would probably be safer.
You can basically query all the workers (i.e. agents) and check if they are running a Task; if they are not (for a while), remove the "protection flag".
wdyt?
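Something along these lines, using the APIClient (the "protection flag" handling itself is hypothetical):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    # a worker with no associated task is idle
    if not getattr(worker, "task", None):
        remove_protection_flag(worker.id)  # hypothetical helper
```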
BoredHedgehog47 if you are running it on K8s, then the setup script runs before everything else, even before an agent appears on the machine. Unfortunately this means the output is not logged yet, hence the missing console lines (I think the next version of the glue will fix that).
In order to test you can do:
export TEST_ME
then inside your code you will be able to see it:
os.environ['TEST_ME']
Make sense?
Is the agent itself registered on the clearml-server (a.k.a can you see it in the UI?)
Sure, you can pass ${stage_data.id} as an argument, and the actual Task will get the referenced step's Task ID of the current execution.
Make sense?
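A minimal sketch (project/task names and the parameter key are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0")
pipe.add_step(name="stage_data",
              base_task_project="examples", base_task_name="data task")
pipe.add_step(
    name="stage_train",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="train task",
    # resolved at runtime to the executed stage_data step's Task ID
    parameter_override={"General/dataset_task_id": "${stage_data.id}"},
)
pipe.start()
```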
Do you think the local agent will be supported someday in the future?
We can take this code sample and extend it. I can't see any harm in that.
It would make it very easy to run "sweeps" without any "real agent" installed.
I'm thinking of rolling out multiple experiments at once
You mean as multiple subprocesses? Sure, if you have the memory for it.
Yep 🙂 but only in the RC (or on GitHub)
Hi SourOx12
I think that you do not actually need this one:
step = step - cfg.start_epoch + 1
you can just do:
step += 1
ClearML will take care of the offset itself.
It looks somewhat familiar ... 😞
SuccessfulKoala55 any idea?
DilapidatedDucks58 trains-agent adds the artifactory URL as --extra-index-url. Are you sure you are getting the correct torch version in the container? Because the torch HTML page is not an artifactory page, it is a list of links, and I just want to make sure you are getting the correct version, because otherwise it can default to the CPU version, which we don't want 🙂 Anyhow, you can use the direct link in the "installed packages" and just put there https://download.pytorch.org/whl/nightly/cu101...
Quick update: Nexus supports direct HTTP upload, which means that, as CostlyOstrich36 mentioned, just pointing to the Nexus HTTP upload endpoint would work:
output_uri="http://<nexus>:<port>/repository/something/"
See docs:
https://support.sonatype.com/hc/en-us/articles/115006744008-How-can-I-programmatically-upload-files-into-Nexus-3-
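In code that would look like this (host/port/repository path are placeholders):
```python
from clearml import Task

# all artifact/model uploads go to the Nexus raw repository endpoint
task = Task.init(
    project_name="examples",
    task_name="nexus upload demo",
    output_uri="http://<nexus>:<port>/repository/something/",
)
```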
AstonishingRabbit13
https://github.com/googleapis/google-cloud-python/issues/4941#issuecomment-369472576
Check the openssl version and the system date; this seems like a low-level SSL error (even before authentication).