AgitatedDove14

49 Questions, 8054 Answers

Active since 10 January 2023

Last activity 9 months ago

Reputation

Badges 1

25 × Eureka!

Questions 49
Answers 8054

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Slack Security ... Go Figure

Slack security ... Go figure 😉

clearml

4 years ago

0 Votes

3 Answers

542 Views

0 Votes 3 Answers 542 Views

These Are Xgboost Internal Metrics That Are Automatically Picked By Clearml

@<1523703325881536512:profile|ConvolutedSealion94> these are xgboost internal metrics that are automatically picked by clearml

xgboost

2 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

<!everyone> Trains v0.14.2 is out (<https://github.com/allegroai/trains/releases/tag/0.14.2|Change log>) Highlights: <https://github.com/allegroai/trains/blob/master/trains/storage/manager.py#L13|trains.storage.StorageManager> - with caching for any http

Trains v0.14.2 is out ( https://github.com/allegroai/trains/releases/tag/0.14.2 ) Highlights: https://github.com/allegroai/trains/blob/master/trains/storage/...

clearml

4 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi ! trains 0.16.2 is finally out with the new pipelines interface! Check out the new example https://github.com/allegroai/trains/blob/master/examples/pipeli...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Hello Everyone!

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

<!here> Gals/Guys/:robot_face: If you have ideas on improving the Slack Monitoring service, please add them on the dedicated Github Issue : <https://github.com/allegroai/trains/issues/161> For example: generate an alert if my experiment reaches a certain

Gals/Guys/ :robot_face: If you have ideas on improving the Slack Monitoring service, please add them on the dedicated Github Issue : https://github.com/alleg...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

New releases: ```pip install trains==0.13.3``` <https://github.com/allegroai/trains/releases/tag/0.13.3> ```pip install trains-agent==0.13.2``` <https://github.com/allegroai/trains-agent/releases/tag/0.13.2>

New releases: pip install trains==0.13.3https://github.com/allegroai/trains/releases/tag/0.13.3 pip install trains-agent==0.13.2https://github.com/allegroai/...

clearml

4 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi ClearML v0.17.1 and ClearML-Agent v0.17.0 are now the official packages & repositories 🎉 🎊 👋 🛤️ This new name brings on many changes, mainly replace a...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Hi Guys! I Have Great News, We Finally Fully Implemented Support For Continuing Previously Trained Models

Hi Guys! I have great news, we finally fully implemented support for continuing previously trained models 🎉 Here is a quick example (this is torch, but any ...

clearml

4 years ago

0 Votes

10 Answers

660 Views

0 Votes 10 Answers 660 Views

Happy Friday Everyone ! We Have A New Repo Release We Would Love To Get Your Feedback On

Happy Friday everyone ! We have a new repo release we would love to get your feedback on 🚀 🎉 Finally easy FRACTIONAL GPU on any NVIDIA GPU 🎊 Run our nvidi...

clearml

10 months ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

<https://allegro.ai/docs>

https://allegro.ai/docs

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

I Would Guess Connectivity Issues, The Tls Is Probably Python Inaccurate Response (I Mean In A Way, It Is Also A Tls Error, But I Would Imagine This Has More To Do With The Actual Network Connection)

I would guess connectivity issues, the TLS is probably python inaccurate response (I mean in a way, it is also a TLS error, but I would imagine this has more...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

@YummyWhale40 you are saying the example code is not working when running with the demo server? Also I think I was able to view your experiment on the demo server, and do get the Scalars without any issues...

YummyWhale40 you are saying the example code is not working when running with the demo server? Also I think I was able to view your experiment on the demo se...

clearml

4 years ago

0 Votes

9 Answers

1K Views

0 Votes 9 Answers 1K Views

Hi https://github.com/allegroai/trains/releases/tag/0.15.1 / https://github.com/allegroai/trains-server/releases/tag/0.15.1 / https://github.com/allegroai/tr...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Is You Server Using Https ?!

Is you server using https ?!

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

YEY!!!! *Download as CSV* :exploding_head:

YEY!!!! Download as CSV 🤯

clearml

2 years ago

0 Votes

2 Answers

546 Views

0 Votes 2 Answers 546 Views

Omg Look Who Just Joined The Pytorch Ecosystem

OMG Look who just joined the PyTorch EcoSystem None Yes! it is TRAINS 🚆 🎉 🎈

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

you set it :slightly_smiling_face:

you set it 🙂

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Apparently Everyone Can ...

apparently everyone can ...

clearml

4 years ago

Show more results

0 We Are Facing Performance Issues Of Our Self-Hosted Clearml Server Looking At The Cpu Utilization \ Memory \ Networking We Couldn'T Identify A Bottleneck We Are At The Moment Using ~100 Workers For Some Hpo, And The Main Performance Issues We Observe Are

Hi DepressedChimpanzee34
I think main issue here is slow response time from the API server, I "think" you can increase the number of API server processes, but considering the 16GB, I'm not sure you have the headroom.
At peak usage, how much free RAM so you have on the machine ?

3 years ago

0 Hi, A Question About Dataset Storage Suppose I Create A Dataset Like This

Hi MelancholyElk85
So the way datasets now work, is they are actually an entity (folder) inside a project , all under TFW hidden .datasets sub project
This is so all data and tasks are both on the same project , but at the same time will not intersect with subprojects by the same name. Does that make sense?

one year ago

0 Hi, I’M Trying To Create A Dataset On Clearml Server From My Aws S3 Bucket Via:

It is available of course, but I think you have to have clearmls-server 1.9+
Which version are you running ?

one year ago

0 Hi All, There Is A Way To Get From A Task-Object The Experiment Source Code? In Other Words, Assume I Have Access To A Specific Trains Server And Want To Store From A Particular Task The Experiment Source Code In A Temp File. There Is A Convenient Way To

SpotlessFish46 unless all the code is under "uncommitted changes" section, what you have is a link to the git repo + commit id

4 years ago

0 Hi Friends! I'M Trying To Upgrade The

Not intentional! When I launched the AMI it was running an older version

I think this is exactly the reason they decided to change the location 🙂 so you will have to manually upgrade, reasoning is we changed directory names (maybe a few more things)
Yes shutdown the current docker copse curl the new docker compose rename folder spin it up againFull instructions here:
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_aws_ec2_ami.html#upgrading

3 years ago

0 Hi! I Deployed Clearml Server Along With Jupyterhub On Azure K8S (Aks). The Way It Works Is That Every User Is Assigned A New Pod That Is Spawned With A Docker Image Of A Choice (One Of Them With Clearml Sdk Installed). I Managed To Configure Most Of The

But they are all running inside the same pod, correct ?

3 years ago

0 Hi There. When Trying To Launch My Specific Docker, It Fails Launching Clientml-Agent Inside The Container Due To This...

Hmm I think you have a point here, the confusing part is the cp cmd. Can you send the full log? (Regradless , can I assume you are running a rootless container ?)

2 years ago

0 What Happens To File That Are Downloaded To A Remote_Execution Via Storagemanager? Are They Removed At The End Of The Run, Or Does It Continuously Increases Disk Space?

UnevenDolphin73 following the discussion https://clearml.slack.com/archives/CTK20V944/p1643731949324449 , I suggest this change in the pseudo code
` # task code
task = Task.init(...)

if not task.running_locally() and task.is_main_task():
# pre-init stage
StorageManager.download_folder(...) # Prepare local files for execution
else:
StorageManager.upload_file(...) # Repeated for many files needed
task.execute_remotely(...) `Now when I look at is, it kinds of make sense to h...

2 years ago

0 Hi

Sure, it will revert to the old behavior and run in threads

3 years ago

0 I'M Getting This When Running With Keras Framework. Clearml.Storage - Error - Failed Uploading: [Errno 21] Is A Directory: 'Model.Savedmodel'.

It reflects what is stored by Keras, so if Keras stores the best model this is what you get. BTW if you pass output_uri=True it will automatically upload the models

3 years ago

0 Hi There,

Hmm okay, this does point to a mem leak, any chance this is reproducible?

one year ago

0 Hi, I Am Trying To Run Experiment From Clearml Web Ui. I Did Experiment Copy, Enqueue, But In The Execution Log I See That It Runs Command

orchestration module
When you previously mention clone the Task I the UI and then run it, how do you actually run it?
regarding the exception stack
It's pointing to a stdout that was closed?! How could that be? Any chance you can provide a toy example for us to debug?

3 years ago

0 One More Follow-Up Still; We'Re Trying To Run Non-Gpu Scaler, And I'Ve Finally Sorted Out Subnet And Security Groups Issues, Only To Run Into This:

works seamlessly throughout and in our current on premise servers...

I'm assuming via something close to what I suggested above with .netrc ?

2 years ago

0 If I Update The Annotation Files For Some Dataset And Upload It. Can It Call The Previous Version Of Dataset Using The Dataset Id ?

Make sense ?

3 years ago

0 One More Follow-Up Still; We'Re Trying To Run Non-Gpu Scaler, And I'Ve Finally Sorted Out Subnet And Security Groups Issues, Only To Run Into This:

UnevenDolphin73
fatal: could not read Username for ' ': terminal prompts disabled .. fatal: clone of ' ' into submodule path '/root/.clearml/vcs-cache/xxx.60db3666b11ac2df511a851e269817ef/xxx/xxx' failedIt seems it tries to clone a submodule and fails due to to missing keys for the submodule.
https://stackoverflow.com/questions/7714326/git-submodule-url-not-including-username
wdyt?

2 years ago

0 Hey There, Since A Bit I Often Find Experiments Being Stuck While Training A Model. It Seems To Happen Randomly And I Could Not Find A Reproducible Scenario So Far, But It Happens Often Enough To Be Annoying (I'D Say 1 Out Of 5 Experiments). The Symptoms

Most likely yes, but I don't see how clearml would have an impact here, I am more inclined to think it would be a pytorch dataloader issue, although I don't see why

These are most certainly dataloader process. But clearml-agent when killing the process should also kill all subprocesses, and it might be there is something going on that prenets it from killing the subprocesses ...

Is this easily reproducible ? Can you verify it is still the case with the latest RC of clearml-agent ?

one year ago

0 With Clearml 1.0 It Seems That Console Logs Are Only Shown In The Web Ui When The Task Has Finished. Is This Expected Behaviour? With Previous Versions I Was Able To See "Live" Output. I Tested This With The Pytorch_Tensorboardx.Py Example. I Run The Scri

Guys, any chance you can verify the RC solves the issue?
pip install clearml==1.0.2rc0

3 years ago

0 Hi, I Would Like To Follow-Up In This

So you mean 1.3.1 should fix this bug?

Yes it should see the release notes, there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0

2 years ago

0 Hello! How Can I Use "Report_Scatter2D" In Order To Report Timestamp In The X-Axis?

Should work in all cases, plotly/matplotlib/scalar_rerport

3 years ago

0 Hi, I Have Another Problem

okay, now it should work 🙂

4 years ago

0 Hi Everyone, I Am Running A Pipeline Using The Autoscaler, I Am Able To Spin Up The Vm Instance Using The Autoscaler And The Docker Is Also Getting Installed In There Perfectly. The Issue I Am Facing Is That During Executing A Pipeline Task While Cloning

@<1610083503607648256:profile|DiminutiveToad80> try to turn on:
None

enable_git_ask_pass: true

one year ago

Hi DepressedChimpanzee34 , took me a while but I think there is a solution:
In your docker file, replace:
https://github.com/allegroai/clearml-server/blob/a64c4d264d00eadd2d11818b37151d3cc6266d99/docker/docker-compose.yml#L5
with
entrypoint: /bin/bash command: -c "mkdir -p /var/log/clearml && cd /opt/clearml/ && python3 -m apiserver.apierrors_generator && gunicorn -w 4 -t 600 --bind=0.0.0.0:8008 apiserver.server:app"

3 years ago

0 Would Appreciate Some Help. Getting This Error. Valueerror: Node Train_Model, Parameter '${Split_Dataset.Split_Dataset_Id}', Input Type 'Split_Dataset_Id' Is Invalid

Hi VexedCat68
So if I understand correctly, the issue is this argument:
parameter_override={'Args/dataset_id': '${split_dataset.split_dataset_id}', 'Args/model_id': '${get_latest_model_id.clearml_model_id}'},I think that what is missing is telling it this an artifact:
parameter_override={'Args/dataset_id': '${split_dataset.artifacts.split_dataset_id.url}', 'Args/model_id': '${get_latest_model_id.clearml_model_id}'},You can see the example here:
https://clear.ml/docs/latest/docs/ref...

3 years ago

0 Did Someone Here Already Try The

I can then programmatically choose which file to import with importlib. Is there a way to tell clearml programmatically to analyze the files, so it can built up the requirements correctly?

Sadly no 😞
It analyzes the running code, then if it decides it is not a self contained script it will analyze the entire repo ...

I just saw that

Task.create

takes

Task.create is Not Task.init. It is meant to allow you to create new Tasks (think Jobs) from ...

3 years ago

0 Hi There

set a parameter in that task and enqueue it

how do you do that?

4 years ago

0 Hi, I Want To Run A Script Remotely On My Agent, But For It To Work I Need It To Download To The Agent The Whole Directory The Script Is In, Is It Possible?

🤞

3 years ago

0 If I Have A Dataset And I Process It And I Want The Processed Data As Another Dataset, Is Parent The Right Approach?

LOL AlertBlackbird30 had a PR and pulled it 🙂
Major release due next week after that we will put a a roadmap on the main GitHub page.
Anything specific you have in mind ?

3 years ago

0 Any Chance Storagemanager Could Re-Download Files Only If Their Size Is Different From File In Cache (As An Option)?

any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?

I think there is force argument, to force download.
I think the main issue is getting the size from different backends (i.e. s3 /https / etc.)
Maybe we should add it as a GitHub feature request issue?
The main limitation is that the driver "list()" does not return file size.
For example it might be an issue with the default http files-server.
wdyt?

3 years ago

0 Is It Possible To Add A Callback For A Pipeline From A Step?

Is task.parent something that could help?

Exactly 🙂 something like:
# my step is running here the_pipeline_task = Task.get_task(task_id=task.parent)

3 years ago

0 When Running An Experiment From A Notebook, It Knows It’S A Notebook And Automatically Adds The Notebook As An Artifact Right? And The Uncommited Changes Becomes The Nottebook Converted To A Script? In One Case I Am Seeing Actual Git Diff Coming In Instea

I always have my notebooks in git repo but suddenly it's not running them correctly.

What do you mean?

Can I switch off git diff (change detection?)

Yes, Task.init(..., auto_connect_frameworks={"detect_repository": False})

3 years ago

Show more results