AgitatedDove14
Moderator
48 Questions, 8043 Answers
  Active since 10 January 2023
  Last activity 5 months ago

Reputation

0

Badges 1

25 × Eureka!
0 Hey, Using K8S With Trains 0.16.1-320, All Of A Sudden The Entire Data (I.E Experiments, Tasks, Api Creds) Is Not Showing In The Ui Anymore. All Logs Seems To Be Fine Afai Can Tell... Any Idea What Went Wrong?

so if the node went down and then some other node came up, the data is lost

That might be the case. Where is the k8s running? A cloud service?

3 years ago
0 Hi! I Have Local Minio Setup, Via Minio Browser I Can Upload 50-100 Mb Per Second As Its Local. But When I Try To Use Task.Upload_Artifact It Uploads 500 Kb Per Second. Does Anyone Have An Idea About This?

None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
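from time import time

# make sure nothing else is still uploading before timing starts
task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
# block until the artifact upload has finished
task.flush(wait_for_uploads=True)
print(time() - tic)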

4 years ago
0 Hi Everyone. I'M New To Trains. I Do Not Have Sudo Access To My Departmental Servers. Can I Still Use Trains Beyond The Demo Server?

Hmm, you will have to set up the trains-server on a machine somewhere; it can be any machine (Windows / Mac / Linux)

4 years ago
0 Hi Folks. I'Ve Installed Clearml On K8S Cluster Using Helm Chart 7.11.0, If It Matters. When I Try To Create "App Credentials" From Workspace Settings And Then Paste Them To Clearml-Init - I Got The Error:

And are you sure you are pointing to the correct API server and not mixing up the API address with the WEB address?
Also what's the clearml-server version?

one month ago
0 Thread Re: Pipelines And How They'Re Meant To Be Used / How Long They Take To Orchestrate.

if i put pipe.start earlier in the code, the pipeline fails to execute the actual steps.

pipe.start should be called after the pipeline was constructed and should be the "last" call of the script.
Not sure I follow what is "before" the code?
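
A minimal sketch of that ordering, assuming a recent `clearml` release that provides `clearml.automation.PipelineController`. The project, pipeline, step, and base-task names are hypothetical placeholders, and the base tasks are assumed to already exist on the server:

```python
def build_and_start_pipeline():
    """Construct the whole pipeline first, then call start() as the last step."""
    # Imported lazily so the sketch can be loaded without clearml installed.
    from clearml.automation import PipelineController

    # Hypothetical names: replace with your own project and pipeline.
    pipe = PipelineController(
        name="example-pipeline",
        project="examples",
        version="1.0.0",
    )

    # Add every step before starting; each step clones an existing task.
    pipe.add_step(
        name="stage_data",
        base_task_project="examples",
        base_task_name="data preparation",
    )
    pipe.add_step(
        name="stage_train",
        parents=["stage_data"],
        base_task_project="examples",
        base_task_name="training",
    )

    # start() goes last, after the pipeline is fully constructed.
    pipe.start()
    return pipe
```

Calling `build_and_start_pipeline()` at the very end of the script keeps `pipe.start()` as the "last" call, as described above.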

4 months ago
0 In Order To Use The Aws Autoscaling, With Spot And Without Spot Instances - Should We Create A Custom Policy With The Associated Iam Or Will One Of The Two Aws Managed Policies (Or Both) Will Suffice?

WackyRabbit7 you can configure the AWS autoscaler with two types of instances, with priority given to one of them. So in theory you do not need two autoscaler processes; with that in mind I "think" a single IAM should suffice

3 years ago
0 Hi I Wanted To Use Method Task.Reset() Or Task.Delete() However None Of That Seems To Be Able To Delete

I want to be able to delete only the logs since they are taking a lot of space in my case.

I see... I do not think this is possible 😞
You can disable the auto logging though ... pass auto_connect_streams=False to Task.init
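
A minimal sketch of that suggestion, assuming the `clearml` package. The helper names and the project/task names passed in are hypothetical, and this only stops new console output from being captured; it does not delete logs that were already stored:

```python
def task_init_kwargs(project_name, task_name):
    """Build the Task.init arguments with console stream capture turned off."""
    return {
        "project_name": project_name,
        "task_name": task_name,
        # Do not capture stdout / stderr / logging output into the task log.
        "auto_connect_streams": False,
    }


def init_task_without_console_logs(project_name, task_name):
    """Create a task that does not record console output."""
    # Imported lazily so the sketch can be loaded without clearml installed.
    from clearml import Task

    return Task.init(**task_init_kwargs(project_name, task_name))
```

For example, `init_task_without_console_logs("examples", "quiet run")` would replace the usual `Task.init(...)` call at the top of the training script.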

one year ago
0 <image>

No need, it should auto close it if you started it with Task.init (or the agent executed it)

3 years ago
0 <image>

Not sure what I'm seeing here

3 years ago
0 <image>

How do I reproduce it ? (all the processes are on the same machine?)

3 years ago
0 <image>

Let me check, see what can be learned ...

3 years ago
0 Hi, I Am Getting Following Error While Trying To Checkout A GitHub Rep. Error: Rpc Failed; Curl 56 Gnutls Recv Error (-54): Error In The Pull Function. Fatal: The Remote End Hung Up Unexpectedly Fatal: Early Eof Fatal: Index-Pack Failed Repository Cloni

Could you right-click on the failed experiment, select reset, and send it again for execution?
Could that error be a random network issue?
(Basically this seems like a generic network error not actually related to the trains-agent)
Is the trains-agent running in docker mode or venv mode?

4 years ago
0 Monitoring Related Question

Hi @<1607909176359522304:profile|UnevenCow76>

followed the below documentation to implement the clearml monitoring using prometheus and grafana

Did you try following this example? It includes both deploying a model and adding grafana metrics.

one year ago
0 I Have Another Small Technical Question, I Am Trying To See The Workers Status Programmatically Using The Following:

Hmm yes we should probably provide metrics:
client.workers.get_stats(..., items=[dict(key='cpu_usage'), dict(key='gpu_usage')])

2 years ago
0 Hi! I Have Local Minio Setup, Via Minio Browser I Can Upload 50-100 Mb Per Second As Its Local. But When I Try To Use Task.Upload_Artifact It Uploads 500 Kb Per Second. Does Anyone Have An Idea About This?

When I give my Minio to output_uri argument, it uploads 500 KB /sec as before.

But it worked well when using StorageManager and uploading to the minio directly, is that correct?

.. I give my Minio to output_uri argument

How long did it take to run the demo code I posted?
(The one you mentioned took 0.16s to run locally)

4 years ago
0 Hi, I'M Trying To Run Task.Init Inside A Jupyter Notebook For The First Time (Used It A Lot Before In Normal Python Scripts), And I Get A Warning-

but this gives me an idea, I will try to check if the notebook is considered as trusted, perhaps it isn't and that causes issues?

This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)

3 years ago
0 Hi, I'M Trying To Run Task.Init Inside A Jupyter Notebook For The First Time (Used It A Lot Before In Normal Python Scripts), And I Get A Warning-

ThickDove42 looking at the code, I suspect it fails interacting with the actual jupyter server (that is running on the same machine, but still).
Any chance you have a firewall on the Windows machine ?

3 years ago
0 What Is The Suggested Way Of Running Trains-Agent With Slurm? I Was Able To Do A Very Naive Setup: Trains-Agent Runs A Slurm Job. It Has The Disadvantage That This Slurm Job Is Blocking A Gpu Even If The Worker Is Not Running Any Task. Is There An Easy Wa

Hi HealthyStarfish45
Funny just today I had a similar discussion on slurm:
https://allegroai-trains.slack.com/archives/CTK20V944/p1603794531453000

Anyhow, when you say "[scale up agents]" are you referring to a machine constantly running an agent pulling jobs from the queue, where the machine itself (aka the resource) is managed as a slurm job?

3 years ago
0 Help Please, After Creating My Data Drift Monitoring Dashboard Using Clearml Serving And Grafana, How Can I Configure My Alerts To Be Notified When The Distribution Of My Metrics (Variables) Changes On My Heatmaps?

Hi @<1673501397007470592:profile|RelievedDuck3>

how can I configure my alerts to be notified when the distribution of my metrics (variables) changes on my heatmaps?

This can be done inside grafana.
Specifically, you need to create a new metric that is the distance of the current distribution (i.e. heatmap) from the previous window, then on the distance metric, ...
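
To illustrate the idea of a "distance" metric between two windows, here is a small Python sketch. It is not Grafana configuration and not necessarily the distance used in the example referred to above: it assumes each window is summarized as a list of histogram bucket counts, uses total variation distance as one possible choice, and the `threshold` value is a hypothetical alerting level.

```python
def normalize(counts):
    """Turn raw bucket counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return [c / total for c in counts]


def total_variation_distance(previous_counts, current_counts):
    """Distance in [0, 1] between two histograms over the same buckets."""
    if len(previous_counts) != len(current_counts):
        raise ValueError("histograms must use the same buckets")
    p = normalize(previous_counts)
    q = normalize(current_counts)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def drift_detected(previous_counts, current_counts, threshold=0.2):
    """Flag drift when the distance exceeds a (hypothetical) threshold."""
    return total_variation_distance(previous_counts, current_counts) > threshold
```

An alert rule would then fire on the distance value itself, for example whenever it stays above the chosen threshold for some period.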

5 months ago
0 Hello, I Downloaded The Docker-Compose For Windows But When Starting It Up I'M Getting The Following Error For Mongo:

Hi SmoothSheep78
Do you need to import the previous state of the trains-server, or are you starting from scratch ?

3 years ago