ColossalDeer61 btw, it turns out the docker-compose services section on GitHub was misconfigured 😞 I suggest you get the latest copy of it:
curl -o docker-compose.yml
but not as a component (using the decorator)
Hmm yes, I think that a component calling another component as an external component is not supported yet
(basically the difference is: is it actually running as a function, or running on a different machine as another pipeline component)
I noticed that when a pipeline step returns an instance of a class, it tries to pickle.
Yes, this is how the serialization works when we pass data from one node to another (by design it supports multiple machines)...
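If it helps, here is a rough sketch of what that looks like with the decorator (class and step names are made up, not from your code); the returned instance is pickled when it crosses the step boundary, so it has to be picklable and importable on the consuming side as well:

from clearml.automation.controller import PipelineDecorator

# hypothetical class returned by a step; it gets pickled when passed to the next step
class ModelBundle:
    def __init__(self, weights):
        self.weights = weights

@PipelineDecorator.component(return_values=["bundle"])
def produce_bundle():
    return ModelBundle(weights=[0.1, 0.2])

@PipelineDecorator.component()
def consume_bundle(bundle):
    print(bundle.weights)

@PipelineDecorator.pipeline(name="pickle demo", project="examples", version="0.1")
def run_pipeline():
    consume_bundle(produce_bundle())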
I'll try to create a more classic image.
That is always better, though I remember we have some flag to allow that, you can try with:
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 clearml-agent ...
Hi PompousBeetle71
I remember it was an issue, but it was solved a while ago. Which Trains version are you using?
Still, this issue inside a child thread was not detected as a failure and the training task resulted in "completed". This error happens now with the Task.init inside the
if __name__ == "__main__":
as seen above in the code snippet.
I'm not sure I follow, the error seems like an issue in your internal code, does that mean clearml works as expected?
DeliciousBluewhale87 this is exactly how it works,
The glue puts a k8s job with the requested docker image (the one on the Task), the job itself (k8s job) starts the agent inside the requested docker, then the agent inside the docker will install all the required packages.
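For reference, the glue itself is usually launched with the example script from the clearml-agent repo, something along these lines (the queue name is just a placeholder):

# a sketch, assuming the k8s glue example script shipped in the clearml-agent repository
python k8s_glue_example.py --queue k8s_scheduler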
Instead you can do:
TRAINS_WORKER_NAME="trains-agent:$DYNAMIC_INSTANCE_ID"
Then the Worker ID will have the running instance appended to the worker name. This means that even if you use the same $DYNAMIC_INSTANCE_ID twice, you will not have two agents registering under the same name.
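Something along these lines, as a sketch (how you derive the instance id is up to you, the hostname here is only an example):

# give each instance a unique worker name by appending its dynamic id
DYNAMIC_INSTANCE_ID=$(hostname)
TRAINS_WORKER_NAME="trains-agent:$DYNAMIC_INSTANCE_ID" trains-agent daemon --queue default --docker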
Hi PunyPigeon71
Can you send the log from the remote execution?
Can you see on the Task in the UI, under the execution tab, the correct git repo reference, commit ID, and uncommitted changes?
When I have:
n = 20
duration = 1000
now = time.mktime(time.localtime())
timestamps = np.linspace(now, now + duration, n)
dates = [dt.datetime.fromtimestamp(ts) for ts in timestamps]
values = np.sin((timestamps - now) / duration * 2 * np.pi)
fig = go.Figure(data=go.Scatter(x=dates, y=values, mode='markers'))
task.get_logger().report_plotly(title="plotly", series="b", iteration=0, figure=fig)
Everything looks okay
From creating the event to actually sending it ... 30 min sounds like enough "time"...
As I suspected, from your log:
agent.package_manager.system_site_packages = false
Which is exactly the problem of the missing tensorflow (basically it creates a new venv inside the docker, but without the flag turned on it does not inherit the docker preinstalled packages)
This flag should have been true.
Could it be that the clearml.conf you are providing for the glue includes this value?
(basically you should only have the sections that are either credentials or missing from the default, there...
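For reference, a minimal sketch of the override in the clearml.conf you pass to the glue (only the relevant section shown, everything else can stay at the defaults):

agent {
    package_manager {
        # let the venv created inside the docker inherit the docker preinstalled packages
        system_site_packages: true
    }
}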
I'll make sure we fix the example, because as you pointed out, it is broken :(
Hi @<1597399925723762688:profile|IrritableStork32>
I think that if you have clearml installed and configured on your machine it should just work:
None
Hi WickedBee96
Queue1 will take 3GPUs, Queue2 will take another 3GPUs, so in Queue3 can I put 2-4 GPUs??
Yes exactly !
if there are idle GPUs so take them to process the task?
Correct, basically you are saying: this queue needs a minimum of 2 GPUs, but if there are more available, allocate them to the Task it pulled (up to a maximum of 4 GPUs)
Make sense ?
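If it helps, a rough sketch of what that could look like with the agent's dynamic GPU allocation (queue names and GPU ranges are placeholders, and the exact flags depend on your clearml-agent version):

# one agent managing all 8 GPUs: 3 per Task from queue_a, 3 from queue_b, 2-4 from queue_c
clearml-agent daemon --dynamic-gpus --gpus 0-7 --queue queue_a=3 queue_b=3 queue_c=2-4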
Hi ResponsiveCamel97
The agent generates a new configuration file to be mounted into the docker, with all the new folders as they will be seen inside the docker itself. One of the changes is system_site_packages, as inside the docker we want the new venv to inherit everything from the docker's system installed packages.
Make sense ?
Hover over the border (I would suggest to use the full screen, i.e. maximize)
Hi LazyLeopard18
I remember someone deploying, specifically on the Azure k8s (can't remember now what they call it).
What is exactly the feedback you are after?
ElegantKangaroo44 my bad 😞 I missed the nuance in the description
There seems to be an issue in the web ui -> viewing plots in "view in experiment table" doesn't respect the "scalars to display" one sets when viewing in "view in fullscreen".
Yes, the info-panel does not respect the full view selection. It's on the to-do list to add this ability, but it is still not implemented...
I set up the alert rule on this metric by defining a threshold to trigger the alert. Did I understand correctly?
Yes exactly!
Or the new metric should...
basically combining the two, yes looks good.
Can you verify it fixes the timeout issue as well? (or some insight on how to reproduce the issue?)
JitteryCoyote63 oh dear, let me see if we can reproduce (version 1.4 is already in internal testing, I want to verify this was fixed)
just got the pipeline to run
Nice!
using the default queue okay?
Using the default queue is fine. The different queue is the "services" queue, for which by default the "trains-server" is running an agent that will pull jobs from it.
With "services" mode, an agent will pull jobs right after the other (not waiting for the previous job to finish), as opposed to regular queue (any other) that the trains-agent will pull a job only after the previous one completed .
Are you running a jupyter notebook inside vscode ?
It's always the details... Is the new Task running inside a new subprocess ?
basically there is a difference between:
1. remote task spawning new tasks (as subprocesses, or as jobs on a remote machine), remote task still running
2. remote task is being replaced by a spawned task (same process?!)
UnevenDolphin73 am I missing a 3rd option? Which of these is your case? (rough sketch of both options below)
p.s. I have a suspicion that there might be a misuse of "Task" here?! What are you considering a Task? (from clearml's perspective a Task...
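To make the two options concrete, a rough sketch (project and queue names are placeholders, not from your setup):

from clearml import Task

# option 1: the running task stays alive and spawns a new task as a job on a remote machine
child = Task.create(project_name="examples", task_name="spawned job",
                    repo="https://github.com/your/repo.git", script="train.py")
Task.enqueue(child, queue_name="default")

# option 2: the current task re-launches itself on a queue and the local process exits
task = Task.init(project_name="examples", task_name="replace me")
task.execute_remotely(queue_name="default", exit_process=True)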
so what should the value of "upload_uri" be set to, fileserver_url e.g. ?
yes, that would work.
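For example, one way to do that from code is roughly the following (this uses the output_uri argument of Task.init; the address is a placeholder, put your own files server URL there):

from clearml import Task

# send artifact/model uploads to the files server (same value as api.files_server in clearml.conf)
task = Task.init(project_name="examples", task_name="upload demo",
                 output_uri="http://files.your-clearml-server:8081")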
No worries, basically they are independent, spin up your JupyterHub, then every user will have to set their own credentials on the JupyterLab instance they use. Maybe there is a way to somehow connect a specific OS environment user->JupyterLab in JupyterHub, that would mean users do not have to worry about credentials. wdyt?
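For example, each user could export something like this in their own JupyterLab environment (values are placeholders, taken from the credentials they create in the webapp; older trains setups use the TRAINS_ prefixed equivalents):

# per-user ClearML credentials via environment variables, instead of a clearml.conf
export CLEARML_API_HOST="http://your-clearml-server:8008"
export CLEARML_API_ACCESS_KEY="<user-access-key>"
export CLEARML_API_SECRET_KEY="<user-secret-key>"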
